Kacprzyk 
Pedrycz 
Editors 


Springer


Springer Handbook 
of Computational Intelligence 


Springer Handbooks provide a concise compilation of approved key information on
methods of research, general principles, and functional relationships in physical
and applied sciences. The world’s leading experts in the fields of physics and
engineering will be assigned by one or several renowned editors to write the
chapters comprising each volume. The content is selected by these experts from
Springer sources (books, journals, online content) and other systematic and
approved recent publications of scientific and technical information.
The volumes are designed to be useful as a readable desk reference book to give
a fast and comprehensive overview and easy retrieval of essential reliable key
information, including tables, graphs, and bibliographies. References to
extensive sources are provided.


Springer Handbook


of Computational Intelligence 
Janusz Kacprzyk, Witold Pedrycz (Eds.) 


Editors 

Janusz Kacprzyk 

Polish Academy of Sciences 
Systems Research Inst. 

ul. Newelska 6 

01-447 Warsaw, Poland 
kacprzyk@ibspan.waw.pl


Witold Pedrycz 

University of Alberta 

Dep. Electrical and Computer Engineering 
116 Street 9107 

T6J 2V4, Edmonton, Alberta, Canada 
wpedrycz@ualberta.ca


ISBN: 978-3-662-43504-5 e-ISBN: 978-3-662-43505-2 
DOI 10.1007/978-3-662-43505-2 
Springer Dordrecht Heidelberg London New York 


Library of Congress Control Number: 2015936335 


© Springer-Verlag Berlin Heidelberg 2015 

This work is subject to copyright. All rights are reserved, whether the whole 
or part of the material is concerned, specifically the rights of translation, 
reprinting, reuse of illustrations, recitation, broadcasting, reproduction on 
microfilm or in any other way, and storage in data banks. Duplication of 
this publication or parts thereof is permitted only under the provisions of 
the German Copyright Law of September 9, 1965, in its current version, 
and permission for use must always be obtained from Springer. Violations 
are liable to prosecution under the German Copyright Law. 

The use of general descriptive names, registered names, trademarks, 
etc. in this publication does not imply, even in the absence of a specific 
statement, that such names are exempt from the relevant protective laws 
and regulations and therefore free for general use. 


Production and typesetting: le-tex publishing services GmbH, Leipzig 
Senior Manager Springer Handbook: Dr. W. Skolaut, Heidelberg 
Typography and layout: schreiberVIS, Seeheim 

Illustrations: Hippmann GbR, Schwarzenbruck 

Cover design: eStudio Calamar Steinen, Barcelona 

Cover production: WMXDesign GmbH, Heidelberg 

Printing and binding: Printer Trento s.r.l., Trento 


Printed on acid free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


Preface 


We are honored and happy to be able to make available this Springer Handbook of
Computational Intelligence, a large and comprehensive account of the state of the
art of the research discipline, complemented with some historical remarks, main
challenges, and perspectives of the future. To follow a predominant tradition, we have
divided this Springer Handbook into parts that correspond to main fields that are meant 
to constitute the area of computational intelligence, that is, fuzzy sets theory and fuzzy 
logic, rough sets, evolutionary computation, neural networks, hybrid approaches and 
systems, all of them complemented with a thorough coverage of some foundational 
issues, methodologies, tools, and techniques. 

We hope that the handbook will serve as an indispensable and useful source of 
information for all readers interested in both the theory and various applications of 
computational intelligence. The format of the Springer Handbook as a convenient
single-volume publication should help readers find a proper tool
or technique for solving their problems simply by browsing through the clearly
composed and well-indexed contents. The authors of the particular chapters, who are
the best-known specialists in their respective fields worldwide, are the best assurance
that the handbook will serve as an excellent and timely reference.

On behalf of the entire computational intelligence community, we wish to express
sincere thanks, first of all, to the Part Editors responsible for the scope, authors, and
composition of the particular parts, for their great work in arranging the most appropriate
topics and their coverage and in identifying expert authors. Second, we wish to thank all the
authors for their great contributions in terms of clarity, comprehensiveness, novelty,
vision, and, above all, understanding of the real needs of readers of diverse interests.

All these efforts would not have succeeded without the total and multifaceted
dedication and support of the publisher. We wish to thank very much Dr. Werner Skolaut,
Ms. Constanze Ober, and their collaborators from Springer, Heidelberg, and le-tex
publishing GmbH, Leipzig, respectively, for their extremely effective and efficient
handling of this huge and difficult project.


September 2014 
Janusz Kacprzyk Warsaw 
Witold Pedrycz Edmonton 


About the Editors 


Janusz Kacprzyk graduated from the Department of Electronics, Warsaw University
of Technology, Poland with an MSc in Automatic Control, a PhD in Systems
Analysis and a DSc (Habilitation) in Computer Science from the Polish Academy of
Sciences. He is Professor of Computer Science at the Systems Research Institute, Polish
Academy of Sciences, Professor of Computerized Management Systems at WIT (Warsaw
School of Information Technology), and Professor of Automatic Control at
PIAP (Industrial Institute of Automation and Measurements), in Warsaw, Poland, and
at the Department of Electrical and Computer Engineering, Cracow University of
Technology, Poland. He is the author of 5 books, (co)editor of ca. 70 volumes, (co)author of
ca. 500 papers. He is Editor-in-Chief of 6 book series and of 2 journals, and on the
Editorial Boards of more than 40 journals.


Witold Pedrycz is a Professor and Canada Research Chair (CRC) in Computational 
Intelligence in the Department of Electrical and Computer Engineering, University 
of Alberta, Edmonton, Canada. He is also with the Systems Research Institute of the 
Polish Academy of Sciences, Warsaw. He also holds an appointment of special professorship
in the School of Computer Science, University of Nottingham, UK. His main
research directions involve computational intelligence, fuzzy modeling and granular 
computing, knowledge discovery and data mining, fuzzy control, pattern recognition, 
knowledge-based neural networks, relational computing, and software engineering. He 
has published numerous papers and is the author of 15 research monographs covering 
various aspects of computational intelligence, data mining, and software engineering. 
He currently serves as an Associate Editor of IEEE Transactions on Fuzzy Systems 
and is a member of a number of Editorial Boards of other international journals. 




About the Part Editors 


Cesare Alippi


Part D 


Politecnico di Milano 
Dip. Elettronica, Informazione e 
Ingegneria 

20133 Milano, Italy
alippi@elet.polimi.it 


Thomas Bartz-Beielstein


Cologne University of Applied Sciences 
Faculty of Computer Science and 
Engineering Science 

51643 Gummersbach, Germany 
thomas.bartz-beielstein@fh-koeln.de


Christian Blum 


University of the Basque Country 
Dep. Computer Science and Artificial 
Intelligence 

20018 San Sebastian, Spain 
christian.blum@ehu.es


Oscar Castillo 


Cesare Alippi received his PhD in 1995 from Politecnico di Milano, Italy. Currently, 
he is Professor at the same institution. He has been a visiting researcher at UCL (UK), 
MIT (USA), ESPCI (F), CASIA (RC), USI (CH). Alippi is an IEEE Fellow, Vice- 
President Education of the IEEE Computational Intelligence Society, Associate Editor 
of the IEEE Computational Intelligence Magazine, Past Editor of the IEEE-TIM and 
IEEE-TNN(LS). In 2004 he received the IEEE Instrumentation and Measurement 
Society Young Engineer Award and in 2013 the IBM Faculty Award. His current 
research focuses on learning in non-stationary environments and intelligence for 
embedded systems. He holds 5 patents, has published 1 monograph book, 6 edited 
books and about 200 papers in international journals and conference proceedings. 


Part E 


Thomas Bartz-Beielstein is a Professor of Applied Mathematics at Cologne University 
of Applied Sciences (CUAS). His expertise lies in optimization, simulation, and 
statistical analysis of complex real-world problems. He has more than 100 publications 
on computational intelligence, optimization, simulation, and experimental research. 
He has been on the program committees of several international conferences and 
organizes the prestigious track Evolutionary Computation in Practice at GECCO. His 
books on experimental research are considered as milestones in this emerging field. 
He is speaker of the research center Computational Intelligence plus at CUAS and 
head of the SPOTSeven team. 


Part F


Christian Blum holds a Master’s Degree in Mathematics (1998) from 
the University of Kaiserslautern, Germany, and a PhD degree in Applied 
Sciences (2004) from the Free University of Brussels, Belgium. He
currently occupies a permanent post as Ikerbasque Research Professor at 
the University of the Basque Country, San Sebastian, Spain. His research 
interests include the development of swarm intelligence techniques and 
the combination of metaheuristics with exact approaches for solving 
difficult optimization problems. So far he has co-authored about 150 
research papers. 


Part G 


Tijuana Institute of Technology
22379 Tijuana, Mexico
ocastillo@tectijuana.mx


Oscar Castillo holds the Doctor in Science degree in Computer Science
from the Polish Academy of Sciences. He is a Professor of Computer
Science in the Graduate Division, Tijuana Institute of Technology, 
Tijuana, Mexico. In addition, he serves as Research Director of 
Computer Science. Currently, he is Vice-President of HAFSA (Hispanic 
American Fuzzy Systems Association) and served as President of IFSA 
(International Fuzzy Systems Association). He belongs to the Mexican 
Research System with level II and is also a member of NAFIPS, IFSA, 
and IEEE. His research interests are in type-2 fuzzy logic, fuzzy control, 
and neuro-fuzzy and genetic-fuzzy hybrid approaches. 




Carlos A. Coello Coello 


Part E 


CINVESTAV-IPN 

Dep. Computación 

D.F. 07300, México, Mexico 
ccoello@cs.cinvestav.mx 




Bernard De Baets 


Ghent University 

Dep. Mathematical Modelling, Statistics 
and Bioinformatics 

9000 Ghent, Belgium 
bernard.debaets@ugent.be


Roderich Groß 


University of Sheffield 


Carlos A. Coello Coello received a PhD in Computer Science from Tulane University 
in 1996. He has made pioneering contributions to the research area currently known 
as evolutionary multi-objective optimization, mainly regarding the development of 
new algorithms. He is currently Professor at the Computer Science Department
of CINVESTAV-IPN (Mexico City, México). He has co-authored more than 350
publications (his h-index is 62). He is Associate Editor of several journals, including 
IEEE Transactions on Evolutionary Computation and Evolutionary Computation. He 
has received Mexico’s National Medal of Science in Exact Sciences and the IEEE 
Kiyo Tomiyasu Award. He is also an IEEE Fellow. 


Part A 


Bernard De Baets (1966) holds an MSc degree in Mathematics, a postgraduate degree 
in Knowledge Technology, and a PhD degree in Mathematics. He is a full professor at 
UGent (Belgium), where he leads KERMIT, an interdisciplinary team in mathematical 
modeling, having delivered 50 PhD graduates to date. His bibliography comprises 
nearly 400 journal papers, 60 book chapters, and 300 conference contributions. He 
acts as Co-Editor-in-Chief (2007) of Fuzzy Sets and Systems. He is a recipient of
a Government of Canada Award, Honorary Professor of Budapest Tech (Hungary),
Fellow of the International Fuzzy Systems Association, and has been nominated for 
the Ghent University Prometheus Award for Research. 


Part F


Dep. Automatic Control and Systems Engineering
Sheffield, S1 3JD, UK
rgross@sheffield.ac.uk


Roderich Groß received a Diploma degree in Computer Science from
TU Dortmund University in 2001 and a PhD degree in Engineering
Sciences from the Université libre de Bruxelles in 2007. From 2005 to
2009 he was a fellow of the Japan Society for the Promotion of Science,
a Research Associate at the University of Bristol, a Marie Curie Fellow
at Unilever, and a Marie Curie Fellow at EPFL. Since 2010 he has been
with the Department of Automatic Control and Systems Engineering at
the University of Sheffield, where he is currently Senior Lecturer. His
research interests include evolutionary and distributed robotics. He has
authored over 60 publications on these topics. He is a Senior Member of
the IEEE.


Enrique Herrera-Viedma


Part B


University of Granada
Dep. Computer Science and Artificial Intelligence
18003 Granada, Spain
viedma@decsai.ugr.es


Enrique Herrera-Viedma received his PhD degree in Computer Science
from Granada University in 1996. He is Professor at Granada University
in the Department of Computer Science and Artificial Intelligence and
a member of the BoG in IEEE SMC. His research interests are computing
with words, fuzzy decision making, consensus, aggregation, social
media, recommender systems, libraries, and bibliometrics. His h-index
is 44, with over 7000 citations (WoS). In 2014 he was identified
as Highly Cited Researcher by Thomson Reuters and Top Author in
Computer Science according to Microsoft Academic Search.


Luis Magdalena


Part B


European Centre for Soft Computing 
33600 Mieres, Spain 
luis.magdalena@softcomputing.es


Luis Magdalena received the MS and PhD degrees in Telecommunication Engineering 
from the Technical University of Madrid, Spain, in 1988 and 1994. He has been 
Assistant (1990-1995) and Associate Professor (1995-2006) in Computer Science at 
the Technical University of Madrid. Since 2006 he has been Director General of the 
European Center for Soft Computing. His research interests include soft computing 
and its application. He has authored over 150 publications in the field. He has been 
President of the European Society for Fuzzy Logic and Technologies (2001-2005), 
Vice-President of the International Fuzzy Systems Association (2007-2011), and 
member of the IEEE Computational Intelligence Society AdCom (2011-2013). 




Jörn Mehnen 


Part E 


Cranfield University 
Manufacturing Dep. 
Cranfield, MK43 0AL, UK
j.mehnen@cranfield.ac.uk 


Patricia Melin 


Tijuana Institute of Technology 
Dep. Computer Science 

Chula Vista, CA 91909, USA 
pmelin@tectijuana.mx 


Dr Jörn Mehnen is Reader in Computational Manufacturing at Cranfield University, 
UK and Privatdozent at TU Dortmund, Germany. He is also Deputy Director of the 
EPSRC Centre in Through-life Engineering Services at Cranfield University. His 
research activities are in real-world applications of computer sciences in mechanical 
engineering with special focus on evolutionary optimization, cloud manufacturing, 
and additive manufacturing. 


Part G 


Patricia Melin holds the Doctor in Science degree in Computer Science 
from the Polish Academy of Sciences. She has been a Professor of Computer
Science in the Graduate Division, Tijuana Institute of Technology,
Tijuana, Mexico since 1998. She serves as Director of Graduate Studies in 
Computer Science. Currently, she is Vice President of HAFSA (Hispanic 
American Fuzzy Systems Association). She is the founding Chair of the 
Mexican Chapter of the IEEE Computational Intelligence Society. She 
is member of NAFIPS, IFSA, and IEEE and belongs to the Mexican 
Research System with level III. Her research interests are in type-2 fuzzy
logic, modular neural networks, pattern recognition, fuzzy control, and 
neuro-fuzzy and genetic-fuzzy hybrid approaches. She has published over 
200 journal papers, 6 authored books, 20 edited books, and 200 papers in 
conference proceedings. 


Peter Merz


Part E


University of Applied Sciences and Arts, Hannover
Dep. Business Administration and Computer Science
30459 Hannover, Germany
peter.merz@hs-hannover.de


Radko Mesiar


STU in Bratislava
Dep. Mathematics and Descriptive Geometry
813 68 Bratislava, Slovakia
radko.mesiar@stuba.sk


Frank Neumann


Peter Merz received his PhD degree in Computer Science from the
University of Siegen, Germany in 2000. Since 2009, he has been
Professor at the University of Applied Sciences and Arts in Hannover.
He is a well-known scientist in the field of evolutionary computation
and meta-heuristics. His research interests center on fitness landscapes
of combinatorial optimization problems and their analysis.


Part A 


Radko Mesiar received his PhD from Comenius University, Faculty of Mathematics 
and Physics, in 1979. He has been a member of the Department of Mathematics in the 
Faculty of Civil Engineering, STU Bratislava since 1978. He received his DSc in 1996 
from the Czech Academy of Sciences. He has been a full professor since 1998. He is a 
fellow member of the Institute of Information and Automation at the Czech Academy 
of Sciences and of IRAFM, University of Ostrava (Czech Republic). He is co-author 
of two scientific monographs and five edited volumes. He is the author of more than 
200 papers in WOS in leading journals. He is the co-founder of conferences AGOP, 
FSTA, ABLAT, and ISCAMI. 


Part E 


The University of Adelaide 

School of Computer Science 
Adelaide, SA 5005, Australia 
frank.neumann@adelaide.edu.au 


Frank Neumann received his diploma and PhD from the University of Kiel in 
2002 and 2006, respectively. Currently, he is an Associate Professor and leader 
of the Optimisation and Logistics Group at the School of Computer Science, The 
University of Adelaide, Australia. He is the General Chair of ACM GECCO 2016. 
He is Vice-Chair of the IEEE Task Force on Theoretical Foundations of Bio-Inspired 
Computation, and Chair of the IEEE Task Force on Evolutionary Scheduling and 
Combinatorial Optimization. In his work, he considers algorithmic approaches and 
focuses on theoretical aspects of evolutionary computation as well as high impact 
applications in the areas of renewable energy, logistics, and sports. 




Marios Polycarpou 


University of Cyprus 

Dep. Electrical and Computer 
Engineering and KIOS Research Center 
for Intelligent Systems and Networks 
1678 Nicosia, Cyprus 
mpolycar@ucy.ac.cy 


Günther Raidl


Part D 


Marios Polycarpou is Professor of Electrical and Computer Engineering 
and the Director of the KIOS Research Center for Intelligent Systems 
and Networks at the University of Cyprus. His research expertise is in 
the areas of intelligent systems and control, computational intelligence, 
fault diagnosis, cooperative and adaptive control, and distributed agents. 
He is a Fellow of the IEEE. He has participated in more than 60 research
projects/grants, funded by several agencies and industries in Europe and 
the United States. In 2011, he was awarded the prestigious European 
Research Council (ERC) Advanced Grant. 


Part E 


Vienna University of Technology 
Inst. Computer Graphics and Algorithms 


1040 Vienna, Austria 


raidl@ads.tuwien.ac.at 


Oliver Schütze


Günther Raidl is Professor at the Vienna University of Technology,
Austria, and heads the Algorithms and Data Structures Group. He 
received his PhD in 1994 and completed his Habilitation in Practical 
Computer Science in 2003. In 2005 he received a professorship 
position for combinatorial optimization. His research interests include 
algorithms and data structures in general and combinatorial optimization 
in particular, with a specific focus on metaheuristics, mathematical 
programming, and hybrid optimization approaches. 


Part E 


CINVESTAV-IPN 

Dep. Computación 

D.F. 07300, México, Mexico 
schuetze@cs.cinvestav.mx 


Roman Słowiński


Poznań University of Technology 
Inst. Computing Science 

60-965 Poznań, Poland 

Roman.Slowinski@cs.put.poznan.pl


Carsten Witt 


Oliver Schütze received a PhD in Mathematics from the University of Paderborn,
Germany in 2004. He is currently Professor at Cinvestav-IPN in Mexico City 
(Mexico). His research interests focus on numerical and evolutionary optimization 
where he addresses scalar and multi-objective optimization problems. He has co- 
edited 5 books and is co-author of more than 90 papers. He is a co-founder of SON 
(Set Oriented Numerics) and founder of the NEO (Numerical and Evolutionary 
Optimization) workshop series. 


Part C 


Technical University of Denmark 
DTU Compute, Algorithms, Logic and 
Graphs 

2800 Kgs., Lyngby, Denmark 
cawi@imm.dtu.dk 


Roman Słowiński is Professor and Founding Chair of the Laboratory of Intelligent
Decision Support Systems at Poznań University of Technology. He is Academician
and President of the Poznań Branch of the Polish Academy of Sciences and a 
member of Academia Europaea. In his research, he combines operations research and 
computational intelligence. He is renowned for his seminal research on using rough 
sets in decision analysis. He was laureate of the EURO Gold Medal (1991) and won 
the 2005 Prize of the Foundation for Polish Science. He is Doctor Honoris Causa 
of Polytech’ Mons (2000), the University Paris Dauphine (2001), and the Technical 
University of Crete (2008). 


Part E 


Carsten Witt is Associate Professor at the Technical University of Denmark.
He received his PhD in Computer Science from the Technical
University of Dortmund in 2004. His main research interests are the theoretical
aspects of nature-inspired algorithms, in particular evolutionary
algorithms, ant colony optimization and particle swarm optimization. He
is a member of the Editorial Boards of Evolutionary Computation and 
Theoretical Computer Science and has co-authored a textbook. 




Yiyu Yao


Part C 


University of Regina 

Dep. Computer Science 

Regina, Saskatchewan, S4S 0A2, Canada 
yyao@cs.uregina.ca 


Yiyu Yao is Professor of Computer Science in the Department of 
Computer Science, the University of Regina, Canada. His research 
interests include three-way decisions, rough sets, fuzzy sets, interval 
sets, granular computing, information retrieval, Web intelligence, and 
data mining. He is currently working on a triarchic theory of granular 
computing, a theory of three-way decisions and generalized rough sets. 




List of Authors 


Enrique Alba 

Universidad de Malaga 
E.T.S.I. Informatica
Campus de Teatinos (3.2.12)
29071 Malaga, Spain
e-mail: eat@lcc.uma.es


Jose M. Alonso 

European Centre for Soft Computing 
Cognitive Computing 

33600 Mieres, Spain 

e-mail: jose.alonso@softcomputing.es 


Jhon Edgar Amaya 

Universidad Nacional Experimental del Tachira 
Dep. Electronic Engineering 

Av. Universidad. Paramillo 

San Cristobal, Venezuela 

e-mail: jedgar@unet.edu.ve 


Plamen P. Angelov 

Lancaster University 

School of Computing and Communications 
Bailrigg, Lancaster, LA1 4YW, UK 

e-mail: p.angelov@lancaster.ac.uk 


Dirk V. Arnold 

Dalhousie University 

Faculty of Computer Science 

6050 University Avenue 

Halifax, Nova Scotia, B3H 4R2, Canada 
e-mail: dirk@cs.dal.ca 


Anne Auger 

University Paris-Sud Orsay 

CR Inria 

LRI (UMR 8623) 

91405 Orsay Cedex, France 
e-mail: anne.auger@inria.fr 


Davide Bacciu 

Università di Pisa 

Dip. Informatica 

L.Go B. Pontecorvo, 3 

56127 Pisa, Italy 

e-mail: bacciu@di.unipi.it 


Michał Baczynski 

University of Silesia 

Inst. Mathematics 

Bankowa 14 

40-007 Katowice, Poland 

e-mail: michal.baczynski@us.edu.pl


Edurne Barrenechea 

Universidad Pública de Navarra 

Dep. Automática y Computación 

31006 Pamplona (Navarra), Spain 

e-mail: edurne.barrenechea@unavarra.es


Thomas Bartz-Beielstein 

Cologne University of Applied Sciences 

Faculty of Computer Science and Engineering 
Science 

Steinmüllerallee 1 

51643 Gummersbach, Germany 

e-mail: thomas.bartz-beielstein@fh-koeln.de


Lubica Benuskova 

University of Otago 

Dep. Computer Science 

133 Union Street East 

9016 Dunedin, New Zealand 
e-mail: lubica@cs.otago.ac.nz 


Dirk Biermann 

TU Dortmund University 

Dep. Mechanical Engineering 
Baroper Str. 303 

44227 Dortmund, Germany 
e-mail: biermann@isf.de 


Sašo Blažič 

University of Ljubljana 

Faculty of Electrical Engineering 
Tržaška 25 

1000 Ljubljana, Slovenia 

e-mail: saso.blazic@fe.uni-lj.si 


Christian Blum 

University of the Basque Country 

Dep. Computer Science and Artificial Intelligence 
Paseo Manuel Lardizabal 1 

20018 San Sebastian, Spain 

e-mail: christian.blum@ehu.es 




Andrea Bobbio 

Universita del Piemonte Orientale 
DiSit - Computer Science Section 
Viale Teresa Michel, 11 

15121 Alessandria, Italy 

e-mail: andrea.bobbio@unipmn.it 


Josh Bongard 

University of Vermont 

Dep. Computer Science 

33 Colchester Ave. 

Burlington, VT 05405, USA 
e-mail: josh.bongard@uvm.edu 


Piero P. Bonissone 

Piero P. Bonissone Analytics, LLC 
3103 28th Street 

San Diego, CA 92104, USA 
e-mail: bonissone@gmail.com 


Dario Bruneo 

Universita’ di Messina 

Dip. Ingegneria Civile, Informatica 
Contrada di Dio — S. Agata 

98166 Messina, Italy 

e-mail: dbruneo@unime.it 


Alberto Bugarin Diz 
University of Santiago de Compostela 


Research Centre for Information Technologies 


15782 Santiago de Compostela, Spain 
e-mail: alberto.bugarin.diz@usc.es


Humberto Bustince 

Universidad Publica de Navarra 
Dep. Automatica y Computación 
31006 Pamplona (Navarra), Spain 
e-mail: bustince@unavarra.es 


Martin V. Butz 

University of Tubingen 

Computer Science, Cognitive Modeling 
Sand 14 

72076 Tubingen, Germany 

e-mail: martin.butz@uni-tuebingen.de 


Alexandre Campo 

Université Libre de Bruxelles 

Unit of Social Ecology 

Boulevard du triomphe, 

Campus de la Plaine 

1050 Brussels, Belgium 

e-mail: alexandre.campo@ulb.ac.be 


Angelo Cangelosi 

Plymouth University 

Centre for Robotics and Neural Systems 
Drake Circus 

Plymouth, PL4 8AA, UK 

e-mail: A.Cangelosi@plymouth.ac.uk 


Robert Carrese 

LEAP Australia Pty. Ltd. 

Clayton North, Australia 

e-mail: robert.carrese@leapaust.com.au 


Ciro Castiello 

University of Bari 

Dep. Informatics 

via E. Orabona, 4 

70125 Bari, Italy 

e-mail: ciro.castiello@uniba.it


Oscar Castillo 

Tijuana Institute of Technology 
Calzada Tecnologico s/n

22379 Tijuana, Mexico 

e-mail: ocastillo@tectijuana.mx 


Davide Cerotti 

Politecnico di Milano 

Dip. Elettronica, Informazione e Bioingegneria 
Via Ponzio 34/5 

20133 Milano, Italy 

e-mail: davide.cerotti@polimi.it 


Badong Chen 

Xi'an Jiaotong University 

Inst. Artificial Intelligence and Robotics 
710049 Xi'an, China 

e-mail: chenbd@mail.xjtu.edu.cn 


Ke Chen 

The University of Manchester 

School of Computer Science 

G10 Kilburn Building, Oxford Road 
Manchester, M13 9PL, UK 

e-mail: chen@cs.manchester.ac.uk 


Davide Ciucci 

University of Milano-Bicocca 

Dep. Informatics, Systems and Communications 
viale Sarca 336/14 

20126 Milano, Italy 

e-mail: ciucci@disco.unimib.it




Carlos A. Coello Coello 

CINVESTAV-IPN 

Dep. Computación 

Av. Instituto Politécnico Nacional No. 2508, Col. 
San Pedro Zacatenco 

D.F. 07300, México, Mexico 

e-mail: ccoello@cs.cinvestav.mx 


Chris Cornelis 

Ghent University 

Dep. Applied Mathematics and Computer Science 
Krijgslaan 281 (S9) 

9000 Ghent, Belgium 

e-mail: chriscornelis@ugr.es 


Nikolaus Correll 

University of Colorado at Boulder 
Dep. Computer Science 

Boulder, CO 80309, USA 

e-mail: ncorrell@colorado.edu 


Carlos Cotta Porras 

Universidad de Málaga 

Dep. Lenguajes y Ciencias de la Computación 
Avda Louis Pasteur, 35 

29071 Málaga, Spain 

e-mail: ccottap@lcc.uma.es 


Damien Coyle 

University of Ulster 

Intelligent Systems Research Centre 
Northland Rd 

Derry, Northern Ireland, BT48 7JL, UK 
e-mail: dh.coyle@ulster.ac.uk 


Guy De Tré 

Ghent University 

Dep. Telecommunications and 
Information Processing 
Sint-Pietersnieuwstraat 41 
9000 Ghent, Belgium 

e-mail: guy.detre@ugent.be


Kalyanmoy Deb 

Michigan State University 

Dep. Electrical and Computer Engineering 
428 S. Shaw Lane 

East Lansing, MI 48824, USA 

e-mail: kdeb@egr.msu.edu 


Clarisse Dhaenens 

University of Lille 

CRIStAL laboratory 

M3 building — Cité scientifique 

59655 Villeneuve d'Ascq Cedex, France 
e-mail: clarisse.dhaenens@univ-lille1.fr


Luca Di Gaspero 

Universita degli Studi di Udine 
Dip. Ingegneria Elettrica, 
Gestionale e Meccanica 

via delle Scienze 208 

33100 Udine, Italy 

e-mail: luca.digaspero@uniud.it


Didier Dubois 

Université Paul Sabatier 

IRIT — Equipe ADRIA 

118 route de Narbonne 

31062 Toulouse Cedex 9, France 
e-mail: dubois@irit.fr 


Antonio J. Fernandez Leiva 

Universidad de Malaga 

Dep. Lenguajes y Ciencias de la Computación 
Avda Louis Pasteur, 35 

29071 Malaga, Spain 

e-mail: afdez@lcc.uma.es


Javier Fernandez 

Universidad Publica de Navarra 

Dep. Automatica y Computación 

31006 Pamplona (Navarra), Spain 

e-mail: fcojavier.fernandez@unavarra.es 


Martin H. Fischer 

University of Potsdam 

Psychology Dep. 
Karl-Liebknecht-Str. 24/25 

14476 Potsdam OT Golm, Germany 
e-mail: martinf@uni-potsdam.de 


Janos C. Fodor 

Obuda University 

Dep. Applied Mathematics 
Bécsi út 96/b 

1034 Budapest, Hungary 
e-mail: fodor@uni-obuda.hu 




Jairo Alonso Giraldo 

Universidad de los Andes 

Dep. Electrical and Electronics Engineering 
Cra 1Este # 19A-40 

111711 Bogota, Colombia 

e-mail: ja.giraldo908@uniandes.edu.co 


Siegfried Gottwald 

Leipzig University 

Inst. Philosophy 

Beethovenstr. 15 

04107 Leipzig, Germany 

e-mail: gottwald@uni-leipzig.de 


Salvatore Greco 

University of Catania 

Dep. Economics and Business 
Corso Italia 55 

95129 Catania, Italy 

e-mail: salgreco@unict.it


Marco Gribaudo 

Politecnico di Milano 

Dip. Elettronica, Informazione e Bioingegneria 
Via Ponzio 34/5 

20133 Milano, Italy 

e-mail: marco.gribaudo@polimi.it 


Roderich Groß 

University of Sheffield 

Dep. Automatic Control and Systems Engineering 
Mappin Street 

Sheffield, S1 3JD, UK 

e-mail: r.gross@sheffield.ac.uk 


Jerzy W. Grzymala-Busse 

University of Kansas 

Dep. Electrical Engineering and Computer Science 
3014 Eaton Hall, 1520 W. 15th St. 

Lawrence, KS 66045-7621, USA 

e-mail: jerzygb@ku.edu 


Hani Hagras 

University of Essex 

The Computational Intelligence Centre 
Wivenhoe Park 

Colchester, CO4 3SQ, UK

e-mail: hani@essex.ac.uk 


Heiko Hamann 

University of Paderborn

Dep. Computer Science 

Zukunftsmeile 1 

33102 Paderborn, Germany 

e-mail: heiko.hamann@uni-paderborn.de 


Thomas Hammerl 

Westbahnstraße 25/1/7

1070 Vienna, Austria 

e-mail: thomas.hammerl@gmail.com 


Julie Hamon 

Ingenomix 

Dep. Research and Development 
Pole de Lanaud 

87220 Boisseuil, France 

e-mail: julie.hamon@ingenomix.fr


Nikolaus Hansen 

Université Paris-Sud 

Machine Learning and Optimization Group (TAO) 
Rue Noetzlin 

91405 Orsay Cedex, France 

e-mail: hansen@lri.fr


Mark W. Hauschild 

University of Missouri-St. Louis 

Dep. Mathematics and Computer Science 
1 University Blvd 

St. Louis, MO 314-972-2419, USA 

e-mail: markhauschild@gmail.com 


Sebastien Hélie 

Purdue University 

Dep. Psychological Sciences 

703 Third Street 

West Lafayette, IN 47907-2081, USA 
e-mail: shelie@purdue.edu 


Jano I. van Hemert

Optos 

Queensferry House, Carnegie Business Park 
Dunfermline, KY11 8GR, UK 

e-mail: jano@vanhemert.co.uk 


Holger H. Hoos 

University of British Columbia 
Dep. Computer Science 

2366 Main Mall 

Vancouver, BC V6T 1Z4, Canada 
e-mail: hoos@cs.ubc.ca 




Tania Iglesias 

University of Oviedo 

Dep. Statistics and O.R. 

3360 Oviedo, Spain 

e-mail: iglesiasctania@uniovi.es 


Giacomo Indiveri 

University of Zurich and ETH Zurich 
Inst. Neuroinformatics 

Zurich, Switzerland 

e-mail: giacomo@ini.uzh.ch 


Masahiro Inuiguchi 

Osaka University 

Dep. Systems Innovation, Graduate School of 
Engineering Science 

1-3 Machikaneyama-cho 

560-8531 Toyonaka, Osaka, Japan 

e-mail: inuiguti@sys.es.osaka-u.ac.jp 


Hisao Ishibuchi 

Osaka Prefecture University 

Dep. Computer Science and Intelligent Systems, 
Graduate School of Engineering 

1-1 Gakuen-Cho, Sakai 

599-8531 Osaka, Japan 

e-mail: hisaoi@cs.osakafu-u.ac.jp 


Emiliano Iuliano

CIRA, Italian Aerospace Research Center 
Fluid Dynamics Lab. 

Via Maiorise 

81043 Capua (CE), Italy 

e-mail: e.iuliano@cira.it


Julie Jacques 

Alicante LAB 

50, rue Philippe de Girard 

59113 Seclin, France 

e-mail: julie.jacques@alicante.fr 


Andrzej Jankowski 

Knowledge Technology Foundation 
Nowogrodzka 31 

00-511 Warsaw, Poland 

e-mail: andrzej.adgam@gmail.com 


Balasubramaniam Jayaram 

Indian Institute of Technology Hyderabad 
Dep. Mathematics 

ODF Estate, Yeddumailaram 

502 205 Hyderabad, India 

e-mail: jbala@iith.ac.in 


Laetitia Jourdan 

University of Lille 1 

INRIA/UFR IEEA/laboratory CRIStAL/CNRS 
59655 Lille, France 

e-mail: laetitia.jourdan@univ-lille1.fr 


Nikola Kasabov 

Auckland University of Technology 
KEDRI — Knowledge Engineering and 
Discovery Research Inst. 

120 Mayoral Drive 

Auckland, New Zealand 

e-mail: nkasabov@aut.ac.nz 


Petra Kersting 

TU Dortmund University 

Dep. Mechanical Engineering 
Baroper Str. 303 

44227 Dortmund, Germany 
e-mail: pkersting@isf.de 


Erich P. Klement 

Johannes Kepler University 

Dep. Knowledge-Based Mathematical Systems 
Altenberger Strasse 69 

4040 Linz, Austria 

e-mail: ep.klement@jku.at 


Anna Kolesárová 

Slovak University of Technology in Bratislava 
Faculty of Chemical and Food Technology 
Radlinského 9 

812 37 Bratislava, Slovakia 

e-mail: anna.kolesarova@stuba.sk 


Magda Komorníková

Slovak University of Technology 
Dep. Mathematics 

Radlinského 11 

813 68 Bratislava, Slovakia 
e-mail: magda@math.sk 


Mark Kotanchek 

Evolved Analytics LLC 

3411 Valley Drive 

Midland, MI 48640, USA 

e-mail: mark@evolved-analytics.com 


Robert Kozma 

University of Memphis 

Dep. Mathematical Sciences 
Memphis, TN 38152, USA 
e-mail: rkozma@memphis.edu 




Tomas Kroupa 

Institute of Information Theory and Automation 
Dep. Decision-Making Theory 

Pod Vodarenskou věží 4 

182 08 Prague, Czech Republic 

e-mail: kroupa@utia.cas.cz 


Rudolf Kruse 

University of Magdeburg 

Faculty of Computer Science 
Universitätsplatz 2

39114 Magdeburg, Germany 

e-mail: kruse@iws.cs.uni-magdeburg.de 


Tufan Kumbasar 

Istanbul Technical University 
Control Engineering Dep. 
34469 Maslak, Istanbul, Turkey 
e-mail: kumbasart@itu.edu.tr 


James T. Kwok 

Hong Kong University of Science and Technology 
Dep. Computer Science and Engineering 

Clear Water Bay 

Hong Kong, Hong Kong 

e-mail: jamesk@cse.ust.edu.hk 


Rhyd Lewis 

Cardiff University 

School of Mathematics 
Cardiff, CF10 4AG, UK 
e-mail: lewisR9@cf.ac.uk 


Xiaodong Li 

RMIT University 

School of Computer Science and 
Information Technology 
Melbourne, 3001, Australia 
e-mail: xiaodong.li@rmit.edu.au 


Paulo J.G. Lisboa 

Liverpool John Moores University 
Dep. Mathematics & Statistics 
Byrom St 

Liverpool, L3 3AF, UK 

e-mail: p.j.lisboa@ljmu.ac.uk 


Weifeng Liu 

Jump Trading 

600 W. Chicago Ave. 
Chicago, IL 60654, USA 
e-mail: weifeng@ieee.org 


Fernando G. Lobo 

Universidade do Algarve 

Dep. Engenharia Electronica e Informatica 
Campus de Gambelas 

8005-139 Faro, Portugal 

e-mail: fernando.lobo@gmail.com 


Antonio Lopez Jaimes 

CINVESTAV-IPN 

Dep. Computación 

Av. Instituto Politécnico Nacional No. 2508, Col. 
San Pedro Zacatenco 

D.F. 07300, México, Mexico 

e-mail: tonio.jaimes@gmail.com 


Francisco Luna 

Centro Universitario de Mérida 
Santa Teresa de Jornet 38 
06800 Mérida, Spain 

e-mail: fluna@unex.es 


Luis Magdalena 

European Centre for Soft Computing 
Gonzalo Gutiérrez Quirós s/n 

33600 Mieres, Spain 

e-mail: luis.magdalena@softcomputing.es


Sebastia Massanet 

University of the Balearic Islands 

Dep. Mathematics and Computer Science 
Crta. Valldemossa km. 7,5 

07122 Palma de Mallorca, Spain 

e-mail: s.massanet@uib.es 


Benedetto Matarazzo 
University of Catania 

Dep. Economics and Business 
Corso Italia 55 

95129 Catania, Italy 

e-mail: matarazz@unict.it 


Sergi Mateo Bellido 

Polytechnic University of Catalonia 
Dep. Computer Architecture 

08034 Barcelona, Spain 

e-mail: sergim@ac.upc.edu 


James McDermott 

University College Dublin 

Lochlann Quinn School of Business 
Belfield 

Dublin 4, Ireland 

e-mail: jnmcd@jmmcd.net 




Patricia Melin 

Tijuana Institute of Technology 
Dep. Computer Science 

Chula Vista, CA 91909, USA 
e-mail: pmelin@tectijuana.mx 


Corrado Mencar 

University of Bari 

Dep. Informatics 

via E. Orabona, 4 

70125 Bari, Italy 

e-mail: corrado.mencar@uniba.it


Radko Mesiar 

STU in Bratislava 

Dep. Mathematics and Descriptive Geometry 
Radlinského 11

813 68 Bratislava, Slovakia 

e-mail: radko.mesiar@stuba.sk 


Ralf Mikut 

Karlsruhe Institute of Technology (KIT) 

Inst. Applied Computer Science 
Hermann-von-Helmholtz-Platz 1 

76344 Eggenstein-Leopoldshafen, Germany 
e-mail: ralf.mikut@kit.edu 


Ali A. Minai 

University of Cincinnati 

School of Electronic & Computing Systems 
2600 Clifton Ave. 

Cincinnati, OH 45221-0030, USA 

e-mail: ali.minai@uc.edu 


Sadaaki Miyamoto 

University of Tsukuba 

Risk Engineering 

1-1-1 Tennodai 

305-8573 Tsukuba, Japan 

e-mail: miyamoto@risk.tsukuba.ac.jp 


Christian Moewes 
University of Magdeburg 
Faculty of Computer Science 
Universitätsplatz 2

39114 Magdeburg, Germany 
e-mail: cmoewes@ovgu.de 


Javier Montero 

Complutense University, Madrid 

Dep. Statistics and Operational Research 
Plaza de las Ciencias, 3

28040 Madrid, Spain 

e-mail: monty@mat.ucm.es 


Ignacio Montes 

University of Oviedo 

Dep. Statistics and O.R. 
3360 Oviedo, Spain 

e-mail: imontes@uniovi.es 


Susana Montes 

University of Oviedo 

Dep. Statistics and O.R. 
3360 Oviedo, Spain 

e-mail: montes@uniovi.es 


Oscar H. Montiel Ross 

Av. del Parque No. 1312 

B.C. 22414, Mesa de Otay, Tijuana, Mexico 
e-mail: oross@citedi.mx 


Manuel Mucientes 

University of Santiago de Compostela 
Research Centre for Information Technologies 
15782 Santiago de Compostela, Spain 

e-mail: manuel.mucientes@usc.es


Nysret Musliu 

Vienna University of Technology 
Inst. Information Systems 
Favoritenstraße 9

1000 Vienna, Austria 

e-mail: musliu@dbai.tuwien.ac.at 


Yusuke Nojima 

Osaka Prefecture University 

Dep. Computer Science and Intelligent Systems, 
Graduate School of Engineering 

1-1 Gakuen-Cho, Sakai 

599-8531 Osaka, Japan 

e-mail: nojima@cs.osakafu-u.ac.jp 


Stefano Nolfi 

Consiglio Nazionale delle Ricerche (CNR-ISTC) 
Inst. Cognitive Sciences and Technologies 
Via S. Martino della Battaglia, 44 

00185 Roma, Italy 

e-mail: stefano.nolfi@istc.cnr.it 




Una-May O'Reilly 

Massachusetts Institute of Technology 

Computer Science and Artificial Intelligence Lab. 
32 Vassar St. 

Cambridge, MA 02139, USA 

e-mail: unamay@csail.mit.edu 


Miguel Pagola 

Universidad Publica de Navarra 

Dep. Automatica y Computación 
31006 Pamplona (Navarra), Spain 
e-mail: miguel.pagola@unavarra.es 


Lynne Parker 

University of Tennessee 

Dep. Electrical Engineering and Computer Science 
1520 Middle Drive 

Knoxville, TN 37996, USA 

e-mail: leparker@utk.edu 


Kevin M. Passino 

The Ohio State University 

Dep. Electrical and Computer Engineering 
2015 Neil Avenue 

Columbus, OH 43210-1272, USA 

e-mail: passino@ece.osu.edu 


Martin Pelikan 

1271 Lakeside Dr. #3123 

Sunnyvale, CA 94085, USA 

e-mail: martin@martinpelikan.net 


Irina Perfilieva 

University of Ostrava 

Inst. Research and Applications of Fuzzy Modeling 
30. dubna 22 

70103 Ostrava, Czech Republic 

e-mail: Irina.Perfilieva@osu.cz


Henry Prade 

Université Paul Sabatier 

IRIT — Equipe ADRIA 

118 route de Narbonne 

31062 Toulouse Cedex 9, France 
e-mail: prade@irit.fr 


Mike Preuss 

WWU Münster

Inst. Wirtschaftsinformatik 
Leonardo-Campus 3 

48149 Münster, Germany

e-mail: mike.preuss@tu-dortmund.de


José C. Principe 

University of Florida 

Dep. Electrical and Computer Engineering 
Gainesville, FL 32611, USA 

e-mail: principe@cnel.ufl.edu 


Domenico Quagliarella 

CIRA, Italian Aerospace Research Center 
Fluid Dynamics Lab. 

Via Maiorise 

81043 Capua (CE), Italy 

e-mail: d.quagliarella@cira.it 


Nicanor Quijano 

Universidad de los Andes 

Dep. Electrical and Electronics Engineering 
Cra 1Este # 19A-40 

111711 Bogota, Colombia 

e-mail: nquijano@uniandes.edu.co 


Jaroslav Ramik 

Silesian University in Opava 

Dep. Informatics and Mathematics 
University Sq. 1934/3 

73340 Karviná, Czech Republic 
e-mail: ramik@opf.slu.cz 


Ismael Rodriguez Fdez 

University of Santiago de Compostela 
Research Centre for Information Technologies 
15782 Santiago de Compostela, Spain 

e-mail: ismael.rodriguez@usc.es 


Franz Rothlauf 

Johannes Gutenberg University Mainz 

Gutenberg School of Management and Economics 
Jakob Welder-Weg 9 

55099 Mainz, Germany 

e-mail: rothlauf@uni-mainz.de 


Jonathan E. Rowe 

University of Birmingham 

School of Computer Science 
Birmingham, B15 2TT, UK 

e-mail: J.E.Rowe@cs.bham.ac.uk 


Imre J. Rudas 

Óbuda University 

Dep. Applied Mathematics 
Bécsi út 96/b 

1034 Budapest, Hungary 
e-mail: rudas@uni-obuda.hu 




Günter Rudolph

Technische Universität Dortmund

Fak. Informatik 

Otto-Hahn-Str. 14 

44227 Dortmund, Germany 

e-mail: guenter.rudolph@cs.tu-dortmund.de 


Gabriele Sadowski 

Technische Universität Dortmund

Bio- und Chemieingenieurwesen 

Emil-Figge-Str. 70 

44227 Dortmund, Germany 

e-mail: gabriele.sadowski@bci.tu-dortmund.de 


Marco Scarpa 

Università di Messina

Dip. Ingegneria Civile, Informatica 
Contrada di Dio — S. Agata 

98166 Messina, Italy 

e-mail: mscarpag@unime.it


Werner Schafhauser 

XIMES 

Hollandstraße 12/12

1020 Vienna, Austria 

e-mail: schafhauser@ximes.com 


Roberto Sepúlveda Cruz 

Av. del Parque No. 1319 

B.C. 22414, Mesa de Otay, Tijuana, Mexico 
e-mail: rsepulve@citedi.mx 


Jennie Si 

Arizona State University 

School of Electrical, Computer and 
Energy Engineering 

Tempe, AZ 85287-5706, USA 
e-mail: si@asu.edu 


Marco Signoretto 

Katholieke Universiteit Leuven 

Kasteelpark Arenberg 10 

3001 Leuven, Belgium 

e-mail: marco.signoretto@esat.kuleuven.be 


Andrzej Skowron 

University of Warsaw 

Faculty of Mathematics, 
Computer Science and Mechanics 
Banacha 2 

02-097 Warsaw, Poland 

e-mail: skowron@mimuw.edu.pl 


Igor Skrjanc 

University of Ljubljana 

Faculty of Electrical Engineering 
Tržaška 25 

1000 Ljubljana, Slovenia 

e-mail: igor.skrjanc@fe.uni-lj.si 


Roman Słowiński

Poznan University of Technology 

Inst. Computing Science 

Piotrowo 2 

60-965 Poznan, Poland 

e-mail: roman.slowinski@cs.put.poznan.pl


Guido Smits 

Dow Benelux BV 

Core R&D 

Herbert H. Dowweg 5 

4542 NM Hoek, The Netherlands 
e-mail: gfsmits@dow.com 


Ronen Sosnik 

Holon Institute of Technology (H.I.T.) 
Electrical, Electronics and Communication 
Engineering 

52 Golomb St. 

5810201 Holon, Israel 

e-mail: ronens@hit.ac.il 


Alessandro Sperduti 

University of Padova 

Dep. Pure and Applied Mathematics 
Via Trieste, 63 

35121 Padova, Italy

e-mail: sperduti@math.unipd.it 


Kasper Støy 

IT University of Copenhagen 
Rued Langgaards Vej 7 

2300 Copenhagen S, Denmark 
e-mail: ksty@itu.dk 


Harrison Stratton 

Arizona State University & Barrow 
Neurological Institute 

Phoenix, AZ 85013, USA 

e-mail: Harrison.Stratton@asu.edu




Thomas Stützle

Université libre de Bruxelles (ULB)
IRIDIA, CP 194/6

Av. F. Roosevelt 50 

1050 Brussels, Belgium 

e-mail: stuetzle@ulb.ac.be 


Dirk Sudholt 

University of Sheffield 

Dep. Computer Science 

211 Portobello 

Sheffield, S1 4DP, UK 

e-mail: d.sudholt@sheffield.ac.uk 


Ron Sun 

Rensselaer Polytechnic Institute 
Cognitive Science Dep. 

110 Eighth Street, Carnegie 302A 
Troy, NY 12180, USA 

e-mail: rsun@rpi.edu 


Johan A. K. Suykens 
Katholieke Universiteit Leuven 
Kasteelpark Arenberg 10 

3001 Leuven, Belgium 


e-mail: johan.suykens@esat.kuleuven.be


Roman W. Swiniarski (deceased) 


El-Ghazali Talbi 

University of Lille 

Computer Science CRISTAL 
Bat.M3 cité scientifique 

59655 Villeneuve d'Ascq, France 


e-mail: el-ghazali.talbi@univ-lille1.fr 


Lothar Thiele 


Swiss Federal Institute of Technology Zurich 
Computer Engineering and Networks Lab. 


Gloriastrasse 35 
8092 Zurich, Switzerland 
e-mail: thiele@ethz.ch 


Peter Tino 

University of Birmingham 
School of Computer Science 
Edgbaston 

Birmingham, B15 2TT, UK 
e-mail: P.Tino@cs.bham.ac.uk 


Joan Torrens 

University of the Balearic Islands 

Dep. Mathematics and Computer Science 
Crta. Valldemossa km. 7,5 

07122 Palma de Mallorca, Spain 

e-mail: jts224@uib.es 


Vito Trianni 

Consiglio Nazionale delle Ricerche 

Ist. Scienze e Tecnologie della Cognizione 
via San Martino della Battaglia 44 

00185 Roma, Italy 

e-mail: vito.trianni@istc.cnr.it 


Enric Trillas 

European Centre for Soft Computing 
Fundamentals of Soft Computing 

33600 Mieres, Spain 

e-mail: enric.trillas@softcomputing.es 


Fevrier Valdez 

Tijuana Institute of Technology 

Calzada del Tecnológico S/N, Tomas Aquino 
B.C. 22414, Tijuana, Mexico 

e-mail: fevrier@tectijuana.mx 


Nele Verbiest 

Ghent University 

Dep. Applied Mathematics, 
Computer Science and Statistics 
Krijgslaan 281 (S9) 

9000 Ghent, Belgium 

e-mail: nele.verbiest@ugent.be


Thomas Villmann 

University of Applied Sciences Mittweida 
Dep. Mathematics, Natural and Computer 
Sciences 

Technikumplatz 17 

09648 Mittweida, Germany 

e-mail: thomas.villmann@hs-mittweida.de 


Milan Vlach 

Charles University 

Theoretical Computer Science and 
Mathematical Logic 

Malostranské náměstí 25 

118 00 Prague, Czech Republic 
e-mail: Milan.Vlach@mff.cuni.cz




Ekaterina Vladislavleva 

Evolved Analytics Europe BVBA 

A. Coppenslaan 27 

2300 Turnhout, Belgium 

e-mail: katya@evolved-analytics.com 


Tobias Wagner 

TU Dortmund University 

Dep. Mechanical Engineering 
Baroper Str. 303 

44227 Dortmund, Germany 
e-mail: wagner@isf.de 


Jun Wang 

The Chinese University of Hong Kong 

Dep. Mechanical & Automation Engineering 
Shatin, New Territories 

Hong Kong, Hong Kong

e-mail: jwang@mae.cuhk.edu.hk 


Simon Wessing 

Technische Universität Dortmund

Fak. Informatik 

Otto-Hahn-Str. 14 

44227 Dortmund, Germany 

e-mail: simon.wessing@tu-dortmund.de 


Wei-Zhi Wu 

Zhejiang Ocean University 

School of Mathematics, Physics and 
Information Science 

No.1 Haida South Road, Lincheng District 
316022 Zhoushan, Zhejiang, China 
e-mail: wuwz@zjou.edu.cn 


Lei Xu 

The Chinese University of Hong Kong 
Dep. Computer Science and Engineering 
Shatin, New Territories 

Hong Kong, Hong Kong 

e-mail: lxu@cse.cuhk.edu.hk


JingTao Yao 

University of Regina 

Dep. Computer Science 

3737 Wascana Parkway 

Regina, Saskatchewan, S4S 0A2, Canada 
e-mail: jtyao@cs.uregina.ca 


Yiyu Yao 

University of Regina 

Dep. Computer Science 

3737 Wascana Parkway 

Regina, Saskatchewan, S4S 0A2, Canada 
e-mail: yyao@cs.uregina.ca 


Andreas Zabel 

TU Dortmund University 

Dep. Mechanical Engineering 
Baroper Str. 303 

44227 Dortmund, Germany 
e-mail: zabel@isf.de 


Sławomir Zadrożny

Polish Academy of Sciences 

Systems Research Inst. 

ul. Newelska 6 

01-447 Warsaw, Poland 

e-mail: Slawomir.Zadrozny@ibspan.waw.pl


Zhigang Zeng 

Huazhong University of Science and Technology 
Dep. Control Science and Engineering 

No. 1037, Luoyu Road 

430074 Wuhan, China 

e-mail: zgzeng@hust.edu.cn 


Yan Zhang 

University of Regina 

Dep. Computer Science 

3737 Wascana Parkway 

Regina, Saskatchewan, S4S 0A2, Canada 
e-mail: zhang83y@cs.uregina.ca 


Zhi-Hua Zhou 

Nanjing University 

National Key Lab. for Novel Software Technology 
210023 Nanjing, China 

e-mail: zhouzh@nju.edu.cn 




Contents 


List of Abbreviations


1 Introduction
Janusz Kacprzyk, Witold Pedrycz
1.1 Details of the Contents
1.2 Conclusions and Acknowledgments


Part A Foundations 


2 Many-Valued and Fuzzy Logics
Siegfried Gottwald
2.1 Basic Many-Valued Logics
2.2 Fuzzy Sets
2.3 t-Norm-Based Logics
2.4 Particular Fuzzy Logics
2.5 Some Generalizations
2.6 Extensions with Graded Notions of Inference
2.7 Some Complexity Results
2.8 Concluding Remarks
References


3 Possibility Theory and Its Applications: Where Do We Stand?
Didier Dubois, Henry Prade
3.1 Historical Background
3.2 Basic Notions of Possibility Theory
3.3 Qualitative Possibility Theory
3.4 Quantitative Possibility Theory
3.5 Some Applications
3.6 Some Current Research Lines
References


4 Aggregation Functions on [0,1]
Radko Mesiar, Anna Kolesárová, Magda Komorníková
4.1 Historical and Introductory Remarks
4.2 Classification of Aggregation Functions
4.3 Properties and Construction Methods
4.4 Concluding Remarks
References


5 Monotone Measures-Based Integrals
Erich P. Klement, Radko Mesiar
5.1 Preliminaries, Choquet, and Sugeno Integrals
5.2 Benvenuti Integral
5.3 Universal Integrals
5.4 General Integrals Which Are Not Universal
5.5 Concluding Remarks, Application Fields
References


6 The Origin of Fuzzy Extensions
Humberto Bustince, Edurne Barrenechea, Javier Fernández, Miguel Pagola, Javier Montero
6.1 Considerations Prior to the Concept of Extension of Fuzzy Sets
6.2 Origin of the Extensions
6.3 Type-2 Fuzzy Sets
6.4 Interval-Valued Fuzzy Sets
6.5 Atanassov's Intuitionistic Fuzzy Sets or Bipolar Fuzzy Sets of Type 2 or IF Fuzzy Sets
6.6 Atanassov's Interval-Valued Intuitionistic Fuzzy Sets
6.7 Links Between the Extensions of Fuzzy Sets
6.8 Other Types of Sets
6.9 Conclusions
References

7 F-Transform
Irina Perfilieva
7.1 Fuzzy Modeling
7.2 Fuzzy Partitions
7.3 Fuzzy Transform
7.4 Discrete F-Transform
7.5 F-Transforms of Functions of Two Variables
7.6 F1-Transform
7.7 Applications
7.8 Conclusions
References


8 Fuzzy Linear Programming and Duality
Jaroslav Ramík, Milan Vlach
8.1 Preliminaries
8.2 Fuzzy Linear Programming
8.3 Duality in Fuzzy Linear Programming
8.4 Conclusion
References


9 Basic Solutions of Fuzzy Coalitional Games
Tomáš Kroupa, Milan Vlach
9.1 Coalitional Games with Transferable Utility
9.2 Coalitional Games with Fuzzy Coalitions
9.3 Final Remarks
References


Part B Fuzzy Logic 


10 Basics of Fuzzy Sets
János C. Fodor, Imre J. Rudas
10.1 Classical Mathematics and Logic
10.2 Fuzzy Logic, Membership Functions, and Fuzzy Sets
10.3 Connectives in Fuzzy Logic
10.4 Concluding Remarks
References


11 Fuzzy Relations: Past, Present, and Future
Susana Montes, Ignacio Montes, Tania Iglesias
11.1 Fuzzy Relations
11.2 Cut Relations
11.3 Fuzzy Binary Relations
11.4 Particular Cases of Fuzzy Binary Relations
11.5 Present and Future of Fuzzy Relations
References

12 Fuzzy Implications: Past, Present, and Future
Michał Baczyński, Balasubramaniam Jayaram, Sebastia Massanet, Joan Torrens
12.1 Fuzzy Implications: Examples, Properties, and Classes
12.2 Current Research on Fuzzy Implications
12.3 Fuzzy Implications in Applications
12.4 Future of Fuzzy Implications
References

13 Fuzzy Rule-Based Systems
Luis Magdalena
13.1 Components of a Fuzzy Rule Based-System
13.2 Types of Fuzzy Rule-Based Systems
13.3 Hierarchical Fuzzy Rule-Based Systems
13.4 Fuzzy Rule-Based Systems Design
13.5 Conclusions
References


14 Interpretability of Fuzzy Systems: Current Research Trends and Prospects
Jose M. Alonso, Ciro Castiello, Corrado Mencar
14.1 The Quest for Interpretability
14.2 Interpretability Constraints and Criteria
14.3 Interpretability Assessment
14.4 Designing Interpretable Fuzzy Systems
14.5 Interpretable Fuzzy Systems in the Real World
14.6 Future Research Trends on Interpretable Fuzzy Systems
14.7 Conclusions
References


15 Fuzzy Clustering - Basic Ideas and Overview
Sadaaki Miyamoto
15.1 Fuzzy Clustering
15.2 Fuzzy c-Means
15.3 Hierarchical Fuzzy Clustering
15.4 Conclusion
References


16 An Algebraic Model of Reasoning to Support Zadeh's CWW
Enric Trillas
16.1 A View on Reasoning
16.2 Models
16.3 Reasoning
16.4 Reasoning and Logic
16.5 A Possible Scheme for an Algebraic Model of Commonsense Reasoning
16.6 Weak and Strong Deduction: Refutations and Conjectures in a BFA (with a Few Restrictions)
16.7 Toward a Classification of Conjectures
16.8 Last Remarks
16.9 Conclusions
References


17 Fuzzy Control
Christian Moewes, Ralf Mikut, Rudolf Kruse
17.1 Knowledge-Driven Control
17.2 Classical Control Engineering
17.3 Using Fuzzy Rules for Control
17.4 A Glance at Some Industrial Applications
17.5 Automatic Learning of Fuzzy Controllers
17.6 Conclusions
References


18 Interval Type-2 Fuzzy PID Controllers
Tufan Kumbasar, Hani Hagras
18.1 Fuzzy Control Background
18.2 The General Fuzzy PID Controller Structure
18.3 Simulation Studies
18.4 Conclusion
References


19 Soft Computing in Database and Information Management
Guy De Tré, Sławomir Zadrożny
19.1 Challenges for Modern Information Systems
19.2 Some Preliminaries
19.3 Soft Computing in Information Modeling
19.4 Soft Computing in Querying
19.5 Conclusions
References


20 Application of Fuzzy Techniques to Autonomous Robots
Ismael Rodríguez Fdez, Manuel Mucientes, Alberto Bugarín Diz
20.1 Robotics and Fuzzy Logic
20.2 Wall-Following
20.3 Navigation
20.4 Trajectory Tracking
20.5 Moving Target Tracking
20.6 Perception
20.7 Planning
20.8 SLAM
20.9 Cooperation
20.10 Legged Robots
20.11 Exoskeletons and Rehabilitation Robots
20.12 Emotional Robots
20.13 Fuzzy Modeling
20.14 Comments and Conclusions
References


Part C Rough Sets 


21 Foundations of Rough Sets
Andrzej Skowron, Andrzej Jankowski, Roman W. Swiniarski
21.1 Rough Sets: Comments on Development
21.2 Vague Concepts
21.3 Rough Set Philosophy
21.4 Indiscernibility and Approximation
21.5 Decision Systems and Decision Rules
21.6 Dependencies
21.7 Reduction of Attributes
21.8 Rough Membership
21.9 Discernibility and Boolean Reasoning
21.10 Rough Sets and Induction
21.11 Rough Set-Based Generalizations
21.12 Rough Sets and Logic
21.13 Conclusions
References


22 Rough Set Methodology for Decision Aiding
Roman Słowiński, Salvatore Greco, Benedetto Matarazzo
22.1 Data Inconsistency as a Reason for Using Rough Sets
22.2 The Need for Replacing the Indiscernibility Relation by the Dominance Relation when Reasoning About Ordinal Data
22.3 The Dominance-based Rough Set Approach to Multi-Criteria Classification
22.4 The Dominance-based Rough Set Approach to Multi-Criteria Choice and Ranking
22.5 Important Extensions of DRSA
22.6 DRSA to Operational Research Problems
22.7 Concluding Remarks on DRSA Applied to Multi-Criteria Decision Problems
References


23 Rule Induction from Rough Approximations
Jerzy W. Grzymala-Busse
23.1 Complete and Consistent Data
23.2 Inconsistent Data
23.3 Decision Table with Numerical Attributes
23.4 Incomplete Data
23.5 Conclusions
References


24 Probabilistic Rough Sets 


Yiyu Yao, Salvatore Greco, Roman Słowiński .......... 387
24.1 Motivation for Studying Probabilistic Rough Sets .......... 388
24.2 Pawlak Rough Sets .......... 388
24.3 A Basic Model of Probabilistic Rough Sets .......... 390
24.4 Variants of Probabilistic Rough Sets .......... 391
24.5 Three Fundamental Issues of Probabilistic Rough Sets .......... 394
24.6 Dominance-Based Rough Set Approaches .......... 398
24.7 A Basic Model of Dominance-Based Probabilistic Rough Sets .......... 399
24.8 Variants of Probabilistic Dominance-Based Rough Set Approach .......... 400
24.9 Three Fundamental Issues of Probabilistic Dominance-Based
Rough Sets .......... 403
24.10 Conclusions .......... 409
References .......... 409


25 Generalized Rough Sets 


JingTao Yao, Davide Ciucci, Yan Zhang .......... 413
25.1 Definition and Approximations of the Models .......... 414
25.2 Theoretical Approaches .......... 420
25.3 Conclusion .......... 422
References .......... 423


26 Fuzzy-Rough Hybridization 


Masahiro Inuiguchi, Wei-Zhi Wu, Chris Cornelis, Nele Verbiest .......... 425
26.1 Introduction to Fuzzy-Rough Hybridization .......... 425
26.2 Classification- Versus Approximation-Oriented
Fuzzy Rough Set Models .......... 427
26.3 Generalized Fuzzy Belief Structures with Application
in Fuzzy Information Systems .......... 437
26.4 Applications of Fuzzy Rough Sets .......... 444
References .......... 447




Part D Neural Networks 
27 Artificial Neural Network Models 


Peter Tino, Lubica Benuskova, Alessandro Sperduti
27.1 Biological Neurons
27.2 Perceptron
27.3 Multilayered Feed-Forward ANN Models
27.4 Recurrent ANN Models
27.5 Radial Basis Function ANN Models
27.6 Self-Organizing Maps
27.7 Recursive Neural Networks
27.8 Conclusion
References


28 Deep and Modular Neural Networks
Ke Chen
28.1 Overview
28.2 Deep Neural Networks
28.3 Modular Neural Networks
28.4 Concluding Remarks
References
29 Machine Learning 
James T. Kwok, Zhi-Hua Zhou, Lei Xu
29.1 Overview
29.2 Supervised Learning
29.3 Unsupervised Learning
29.4 Reinforcement Learning
29.5 Semi-Supervised Learning
29.6 Ensemble Methods
29.7 Feature Selection and Extraction
References
30 Theoretical Methods in Machine Learning 
Badong Chen, Weifeng Liu, José C. Principe
30.1 Background Overview
30.2 Reproducing Kernel Hilbert Spaces
30.3 Online Learning with Kernel Adaptive Filters
30.4 Illustration Examples
30.5 Conclusion
References


31 Probabilistic Modeling in Machine Learning 
Davide Bacciu, Paulo J.G. Lisboa, Alessandro Sperduti, Thomas Villmann . 


31.1 Probabilistic and Information-Theoretic Methods
31.2 Graphical Models
31.3 Latent Variable Models
31.4 Markov Models
31.5 Conclusion and Further Reading
References




32 Kernel Methods 


Marco Signoretto, Johan A. K. Suykens .......... 577
32.1 Background .......... 578
32.2 Foundations of Statistical Learning .......... 580
32.3 Primal-Dual Methods .......... 586
32.4 Gaussian Processes .......... 593
32.5 Model Selection .......... 596
32.6 More on Kernels .......... 597
32.7 Applications .......... 600
References .......... 601


33 Neurodynamics 


Robert Kozma, Jun Wang, Zhigang Zeng .......... 607
33.1 Dynamics of Attractor and Analog Networks .......... 607
33.2 Synchrony, Oscillations, and Chaos in Neural Networks .......... 611
33.3 Memristive Neurodynamics .......... 629
33.4 Neurodynamic Optimization .......... 634
References .......... 639


34 Computational Neuroscience — Biophysical Modeling 
of Neural Systems 


Harrison Stratton, Jennie Si .......... 649
34.1 Anatomy and Physiology of the Nervous System .......... 649
34.2 Cells and Signaling Among Cells .......... 652
34.3 Modeling Biophysically Realistic Neurons .......... 656
34.4 Reducing Computational Complexity
for Large Network Simulations .......... 660
34.5 Conclusions .......... 662
References .......... 662


35 Computational Models of Cognitive and Motor Control 


Ali A. Minai .......... 665
35.1 Overview .......... 665
35.2 Motor Control .......... 667
35.3 Cognitive Control and Working Memory .......... 670
35.4 Conclusion .......... 674
References .......... 674


36 Cognitive Architectures and Agents 


Sébastien Hélie, Ron Sun .......... 683
36.1 Background .......... 683
36.2 Adaptive Control of Thought-Rational (ACT-R) .......... 685
36.3 Soar .......... 688
36.4 CLARION .......... 690
36.5 Cognitive Architectures as Models of Multi-Agent Interaction .......... 693
36.6 General Discussion .......... 694
References .......... 695




37 Embodied Intelligence 


Angelo Cangelosi, Josh Bongard, Martin H. Fischer, Stefano Nolfi
37.1 Introduction to Embodied Intelligence
37.2 Morphological Computation for Body-Behavior Coadaptation
37.3 Sensory-Motor Coordination in Evolving Robots
37.4 Developmental Robotics for Higher Order Embodied Cognitive
Capabilities
37.5 Conclusion
References


38 Neuromorphic Engineering 


Giacomo Indiveri
38.1 The Origins
38.2 Neural and Neuromorphic Computing
38.3 The Importance of Fundamental Neuroscience
38.4 Temporal Dynamics in Neuromorphic Architectures
38.5 Synapse and Neuron Circuits
38.6 Spike-Based Multichip Neuromorphic Systems
38.7 State-Dependent Computation in Neuromorphic Systems
38.8 Conclusions
References


39 Neuroengineering 


Damien Coyle, Ronen Sosnik
39.1 Overview - Neuroengineering in General
39.2 Human Motor Control
39.3 Modeling the Motor System - Internal Motor Models
39.4 Sensorimotor Learning
39.5 MRI and the Motor System - Structure and Function
39.6 Electrocorticographic Motor Cortical Surface Potentials
39.7 MEG and EEG - Extra Cerebral Magnetic and Electric Fields
of the Motor System
39.8 Extracellular Recording - Decoding Hand Movements
from Spikes and Local Field Potential
39.9 Translating Brainwaves into Control Signals - BCIs
39.10 Conclusion
References


40 Evolving Connectionist Systems: 
From Neuro-Fuzzy-, to Spiking- and Neuro-Genetic 
Nikola Kasabov
40.1 Principles of Evolving Connectionist Systems (ECOS)
40.2 Hybrid Systems and Evolving Neuro-Fuzzy Systems
40.3 Evolving Spiking Neural Networks (eSNN)
40.4 Computational Neuro-Genetic Modeling (CNGM)
40.5 Conclusions and Further Directions
References




41 Machine Learning Applications 


Piero P. Bonissone .......... 783
41.1 Motivation .......... 784
41.2 Machine Learning (ML) Functions .......... 786
41.3 CI/ML Applications in Industrial Domains:
Prognostics and Health Management (PHM) .......... 787
41.4 CI/ML Applications in Financial Domains: Risk Management .......... 797
41.5 Model Ensembles and Fusion .......... 807
41.6 Summary and Future Research Challenges .......... 812
References .......... 817


Part E Evolutionary Computation 
42 Genetic Algorithms 


Jonathan E. Rowe .......... 825
42.1 Algorithmic Framework .......... 826
42.2 Selection Methods .......... 828
42.3 Replacement Methods .......... 831
42.4 Mutation Methods .......... 832
42.5 Selection-Mutation Balance .......... 834
42.6 Crossover Methods .......... 836
42.7 Population Diversity .......... 838
42.8 Parallel Genetic Algorithms .......... 839
42.9 Populations as Solutions .......... 841
42.10 Conclusions .......... 842
References .......... 843


43 Genetic Programming 


James McDermott, Una-May O'Reilly .......... 845
43.1 Evolutionary Search for Executable Programs .......... 845
43.2 History .......... 846
43.3 Taxonomy of AI and GP .......... 848
43.4 Uses of GP .......... 853
43.5 Research Topics .......... 857
43.6 Practicalities .......... 861
References .......... 862


44 Evolution Strategies 


Nikolaus Hansen, Dirk V. Arnold, Anne Auger .......... 871
44.1 Overview .......... 871
44.2 Main Principles .......... 873
44.3 Parameter Control .......... 877
44.4 Theory .......... 886
References .......... 895


45 Estimation of Distribution Algorithms 


Martin Pelikan, Mark W. Hauschild, Fernando G. Lobo .......... 899
45.1 Basic EDA Procedure .......... 900




45.2 Taxonomy of EDA Models .......... 903
45.3 Overview of EDAs .......... 908
45.4 EDA Theory .......... 916
45.5 Efficiency Enhancement Techniques for EDAs .......... 917
45.6 Starting Points for Obtaining Additional Information .......... 920
45.7 Summary and Conclusions .......... 921
References .......... 921


46 Parallel Evolutionary Algorithms 




Dirk Sudholt .......... 929
46.1 Parallel Models .......... 931
46.2 Effects of Parallelization .......... 935
46.3 On the Spread of Information in Parallel EAs .......... 938
46.4 Examples Where Parallel EAs Excel .......... 943
46.5 Speedups by Parallelization .......... 949
46.6 Conclusions .......... 956
References .......... 957


47 Learning Classifier Systems


Martin V. Butz .......... 961
47.1 Background .......... 962
47.2 XCS
47.3 XCSF .......... 970
47.4 Data Mining .......... 972
47.5 Behavioral Learning .......... 973
47.6 Conclusions .......... 977
47.7 Books and Source Code .......... 978
References .......... 979


48 Indicator-Based Selection 


Lothar Thiele .......... 983
48.1 Motivation .......... 983
48.2 Basic Concepts .......... 984
48.3 Selection Schemes .......... 987
48.4 Preference-Based Selection .......... 990
48.5 Concluding Remarks .......... 992
References .......... 993


49 Multi-Objective Evolutionary Algorithms 


Kalyanmoy Deb .......... 995
49.1 Preamble .......... 995
49.2 Evolutionary Multi-Objective Optimization (EMO) .......... 996
49.3 A Brief Timeline for the Development of EMO Methodologies .......... 999
49.4 Elitist EMO: NSGA-II .......... 1000
49.5 Applications of EMO .......... 1002
49.6 Recent Developments in EMO .......... 1004
49.7 Conclusions .......... 1010
References .......... 1011




50 Parallel Multiobjective Evolutionary Algorithms 
Francisco Luna, Enrique Alba
50.1 Multiobjective Optimization and Parallelism
50.2 Parallel Models for Evolutionary Multi-Objective Algorithms
50.3 An Updated Review of the Literature
50.4 Conclusions and Future Works
References


51 Many-Objective Problems: Challenges and Methods 


Antonio López Jaimes, Carlos A. Coello Coello
51.1 Background
51.2 Basic Concepts and Notation
51.3 Sources of Difficulty to Solve Many-Objective
Optimization Problems
51.4 Current Approaches to Deal with Many-Objective Problems
51.5 Recombination Operators and Mating Restrictions
51.6 Scalarization Methods
51.7 Conclusions and Research Paths
References

52 Memetic and Hybrid Evolutionary Algorithms 
Jhon Edgar Amaya, Carlos Cotta Porras, Antonio J. Fernández Leiva
52.1 Overview
52.2 A Bird's View of Evolutionary Algorithms
52.3 From Hybrid Metaheuristics to Hybrid EAs
52.4 Memetic Algorithms
52.5 Cooperative Optimization Models
52.6 Conclusions
References
53 Design of Representations and Search Operators 

Franz Rothlauf
53.1 Representations
53.2 Search Operators
53.3 Problem-Specific Design of Representations and Search Operators
53.4 Summary and Conclusions
References


54 Stochastic Local Search Algorithms: An Overview 
Holger H. Hoos, Thomas Stützle
54.1 The Nature and Concept of SLS
54.2 Greedy Construction Heuristics and Iterative Improvement
54.3 Simple SLS Methods
54.4 Hybrid SLS Methods
54.5 Population-Based SLS Methods
54.6 Recent Research Directions
References




55 Parallel Evolutionary Combinatorial Optimization 
El-Ghazali Talbi
55.1 Motivation
55.2 Parallel Design of EAs
55.3 Parallel Implementation of EAs
55.4 Parallel EAs Under ParadisEO
55.5 Conclusions and Perspectives
References


56 How to Create Generalizable Results 
Thomas Bartz-Beielstein
56.1 Test Problems in Computational Intelligence
56.2 Features of Optimization Problems
56.3 Algorithm Features
56.4 Objective Functions
56.5 Case Studies
56.6 Summary and Outlook
References


57 Computational Intelligence in Industrial Applications 
Ekaterina Vladislavleva, Guido Smits, Mark Kotanchek
57.1 Intelligence and Computation
57.2 Computational Modeling for Predictive Analytics
57.3 Methods
57.4 Workflows
57.5 Examples
57.6 Conclusions
References


58 Solving Phase Equilibrium Problems
by Means of Avoidance-Based Multiobjectivization
Mike Preuss, Simon Wessing, Günter Rudolph, Gabriele Sadowski
58.1 Coping with Real-World Optimization Problems
58.2 The Phase-Equilibrium Calculation Problem
58.3 Multiobjectivization-Assisted Multimodal Optimization: MOAMO
58.4 Solving General Phase-Equilibrium Problems
58.5 Conclusions and Outlook
References


59 Modeling and Optimization of Machining Problems 
Dirk Biermann, Petra Kersting, Tobias Wagner, Andreas Zabel
59.1 Elements of a Machining Process
59.2 Design Optimization
59.3 Computer-Aided Design and Manufacturing
59.4 Modeling and Simulation of the Machining Process
59.5 Optimization of the Process Parameters
59.6 Process Monitoring
59.7 Visualization




59.8 Summary and Outlook .......... 1180
References .......... 1180
60 Aerodynamic Design with Physics-Based Surrogates 
Emiliano Iuliano, Domenico Quagliarella .......... 1185
60.1 The Aerodynamic Design Problem .......... 1186
60.2 Literature Review of Surrogate-Based Optimization .......... 1187
60.3 POD-Based Surrogates .......... 1190
60.4 Application Example of POD-Based Surrogates .......... 1191
60.5 Strategies for Improving POD Model Quality: Adaptive Sampling .......... 1199
60.6 Aerodynamic Shape Optimization by Surrogate Modeling
and Evolutionary Computing .......... 1201
60.7 Conclusions .......... 1207
References .......... 1208
61 Knowledge Discovery in Bioinformatics 
Julie Hamon, Julie Jacques, Laetitia Jourdan, Clarisse Dhaenens .......... 1211
61.1 Challenges in Bioinformatics .......... 1211
61.2 Association Rules by Evolutionary Algorithm in Bioinformatics .......... 1212
61.3 Feature Selection for Classification and Regression
by Evolutionary Algorithm in Bioinformatics .......... 1215
61.4 Clustering by Evolutionary Algorithm in Bioinformatics .......... 1218
61.5 Conclusion .......... 1220
References .......... 1221
62 Integration of Metaheuristics and Constraint Programming 
Luca Di Gaspero .......... 1225
62.1 Constraint Programming and Metaheuristics .......... 1225
62.2 Constraint Programming Essentials .......... 1226
62.3 Integration of Metaheuristics and CP .......... 1230
62.4 Conclusions .......... 1234
References .......... 1235


63 Graph Coloring and Recombination 


Rhyd Lewis .......... 1239
63.1 Graph Coloring .......... 1239
63.2 Algorithms for Graph Coloring .......... 1240
63.3 Setup .......... 1244
63.4 Experiment 1 .......... 1246
63.5 Experiment 2 .......... 1249
63.6 Conclusions and Discussion .......... 1251
References .......... 1252


64 Metaheuristic Algorithms and Tree Decomposition 


Thomas Hammerl, Nysret Musliu, Werner Schafhauser .......... 1255
64.1 Tree Decompositions .......... 1256
64.2 Generating Tree Decompositions by Metaheuristic Techniques .......... 1258
64.3 Conclusion .......... 1268
References .......... 1269




65 Evolutionary Computation and Constraint Satisfaction 
Jano I. van Hemert
65.1 Informal Introduction to CSP
65.2 Formal Definitions
65.3 Solving CSP with Evolutionary Algorithms
65.4 Performance Indicators
65.5 Specific Constraint Satisfaction Problems
65.6 Creating Rather than Solving Problems
65.7 Conclusions and Future Directions
References


Part F Swarm Intelligence 


66 Swarm Intelligence in Optimization and Robotics 
Christian Blum, Roderich Groß
66.1 Overview
66.2 SI in Optimization
66.3 SI in Robotics: Swarm Robotics
66.4 Research Challenges
References


67 Preference-Based Multiobjective
Particle Swarm Optimization for Airfoil Design
Robert Carrese, Xiaodong Li
67.1 Airfoil Design
67.2 Shape Parameterization and Flow Solver
67.3 Optimization Algorithm
67.4 Case Study: Airfoil Shape Optimization
67.5 Conclusion
References


68 Ant Colony Optimization for the Minimum-Weight Rooted
Arborescence Problem
Christian Blum, Sergi Mateo Bellido
68.1 Introductory Remarks
68.2 The Minimum-Weight Rooted Arborescence Problem
68.3 DP-Heur: A Heuristic Approach to the MWRA Problem
68.4 Ant Colony Optimization for the MWRA Problem
68.5 Experimental Evaluation
68.6 Conclusions and Future Work
References


69 An Intelligent Swarm of Markovian Agents 
Dario Bruneo, Marco Scarpa, Andrea Bobbio, Davide Cerotti,
Marco Gribaudo
69.1 Swarm Intelligence: A Modeling Perspective
69.2 Markovian Agent Models
69.3 A Consolidated Example: WSN Routing




69.4 Ant Colony Optimization .......... 1354
69.5 Conclusions .......... 1358
References .......... 1359


70 Honey Bee Social Foraging Algorithm for Resource Allocation 




Jairo Alonso Giraldo, Nicanor Quijano, Kevin M. Passino .......... 1361
70.1 Honey Bee Foraging Algorithm .......... 1363
70.2 Application in a Multizone Temperature Control Grid .......... 1365
70.3 Results .......... 1371
70.4 Discussion .......... 1373
70.5 Conclusions .......... 1374
References .......... 1374
71 Fundamental Collective Behaviors in Swarm Robotics

Vito Trianni, Alexandre Campo .......... 1377
71.1 Designing Swarm Behaviours .......... 1378
71.2 Getting Together: Aggregation .......... 1379
71.3 Acting Together: Synchronization .......... 1381
71.4 Staying Together: Coordinated Motion .......... 1383
71.5 Searching Together: Collective Exploration .......... 1386
71.6 Deciding Together: Collective Decision Making .......... 1388
71.7 Conclusions .......... 1390
References .......... 1391
72 Collective Manipulation and Construction

Lynne Parker, Yu Zhang .......... 1395
72.1 Object Transportation .......... 1395
72.2 Object Sorting and Clustering .......... 1401
72.3 Collective Construction and Wall Building .......... 1402
72.4 Conclusions .......... 1404
References .......... 1404


73 Reconfigurable Robots 


Kasper Støy .......... 1407
73.1 Mechatronics System Integration .......... 1409
73.2 Connection Mechanisms .......... 1410
73.3 Energy .......... 1411
73.4 Distributed Control .......... 1412
73.5 Programmability and Debugging .......... 1417
73.6 Perspective .......... 1418
73.7 Further Reading .......... 1419
References .......... 1419


74 Probabilistic Modeling of Swarming Systems 


Nikolaus Correll, Heiko Hamann .......... 1423
74.1 From Biological to Artificial Swarms .......... 1423
74.2 The Master Equation .......... 1424
74.3 Non-Spatial Probabilistic Models .......... 1424


Contents 


74.4 Spatial Models: Collective Optimization ............... cee cece cece eeees 
PHS “COMMISION eiennenn sac bads sadsse eadeees ae EE aai 
OTST SN GES osc loud occ eos E E 


Part G Hybrid Systems 


75 A Robust Evolving Cloud-Based Controller
Plamen P. Angelov, Igor Škrjanc, Sašo Blažič
75.1 Overview of Some Adaptive and Evolving Control Approaches
75.2 Structure of the Cloud-Based Controller
75.3 Evolving Methodology for RECCo
75.4 Simulation Study
75.5 Conclusions
References


76 Evolving Embedded Fuzzy Controllers
Oscar H. Montiel Ross, Roberto Sepúlveda Cruz
76.1 Overview
76.2 Type-1 and Type-2 Fuzzy Controllers
76.3 Host Technology
76.4 Hardware Implementation Approaches
76.5 Development of a Standalone IT2FC
76.6 Developing of IT2FC Coprocessors
76.7 Implementing a GA in an FPGA
76.8 Evolving Fuzzy Controllers
References


77 Multiobjective Genetic Fuzzy Systems
Hisao Ishibuchi, Yusuke Nojima
77.1 Fuzzy System Design
77.2 Accuracy Maximization
77.3 Complexity Minimization
77.4 Single-Objective Approaches
77.5 Evolutionary Multiobjective Approaches
77.6 Conclusion
References


78 Bio-Inspired Optimization of Type-2 Fuzzy Controllers
Oscar Castillo
78.1 Related Work in Type-2 Fuzzy Control
78.2 Fuzzy Logic Systems
78.3 Bio-Inspired Optimization Methods
78.4 General Overview of the Area and Future Trends
78.5 Conclusions
References




79 Pattern Recognition with Modular Neural Networks and Type-2 Fuzzy Logic
Patricia Melin
79.1 Related Work in the Area
79.2 Overview of Fuzzy Edge Detectors
79.3 Experimental Setup
79.4 Experimental Results
79.5 Conclusions
References

80 Fuzzy Controllers for Autonomous Mobile Robots
Patricia Melin, Oscar Castillo
80.1 Fuzzy Control of Mobile Robots
80.2 The Chemical Optimization Paradigm
80.3 The Mobile Robot
80.4 Fuzzy Logic Controller
80.5 Experimental Results
80.6 Conclusions
References

81 Bio-Inspired Optimization Methods
Fevrier Valdez
81.1 Bio-Inspired Methods
81.2 Bio-Inspired Optimization Methods
81.3 A Brief History of GPUs
81.4 Experimental Results
81.5 Conclusions
References

Acknowledgements
About the Authors
Detailed Contents
Index




List of Abbreviations 


Symbols 


1-D one-dimensional 

2-D two-dimensional 

3-CNF-SAT three variables/clause-conjunctive normal form-satisfiability

3-D three-dimensional 

A 

A2A all-to-all 

AaaS analytics-as-a-service 

AANN auto-associative neural network 

ABC artificial bee colony 

ACC anterior cingulate cortex 

ACO ant colony optimization 

ACP active categorical perception 

ACS action-centered subsystem 

ACS ant colony system 

ACT-R adaptive control of thought-rational 

AD anomaly detection 

ADC analog digital converter 

ADF additively decomposable function 

ADF automatically defined function 

ADGLIB adaptive genetic algorithm optimization library

AER address event representation 

AFPGA adaptive full POD genetic algorithm 

AFSA artificial fish swarm algorithm 

AI anomaly identification

AI artificial intelligence

AIC Akaike information criterion 

AICOMP comparable based AI model 

AIGEN generative AI model 

ALCS anticipatory learning classifier system 

ALD approximate linear dependency 

ALM asset–liability management

ALU arithmetic logic unit 

ALU arithmetic unit 

AM amplitude modulation 

amBOA adaptive variant of mBOA 

AMPGA adaptive mixed-flow POD genetic algorithm

AMS anticipated mean shift 

AMT active media technology 

ANN artificial neural network 


analysis of variance
Angelov–Yager
alternating-position crossover
automatic programming
affine projection algorithm
application programming interface
aggregation pheromone system
auto power spectral density
approximate reasoning
average ranking
automatic relevance determination
adaptive representation genetic optimization technique
adaptive range MOGA
application-specific integrated circuit
answer-set programming
adenosine triphosphate
AUC area under curve
AUC area under ROC curve
AVITEWRITE adaptive vector integration to endpoint handwriting

blood brain barrier
brain–computer interface
bee colony optimization
BDAS Berkeley data analytics stack
BER bit error rate
BeRoSH behavior-based multiple robot system with host for object manipulation
BFA basic fuzzy algebra
BG basal ganglia
BIC Bayesian information criterion
BINCSP binary constraint satisfaction problem
BioHEL bioinformatics-oriented hierarchical evolutionary learning
BKS Bandler–Kohout subproduct
BLB bag of little bootstrap
BMA Bayes model averaging
BMDA bivariate marginal distribution algorithm
BMF binary matrix factorization
BMI brain–machine interface
BnB branch and bound
BOA Bayesian optimization algorithm
BP bereitschafts potential




back-propagation 
back-propagation through time 
base system builder 

bipolar satisfaction degree 
blind source separation 
Bayesian Yin-Yang


c-granule complex granule
CA cellular automata
CA classification accuracy
CA complete F-transform-based fusion algorithm
CAD computer-aided design
CAE contrastive auto-encoder
CAM computer-assisted manufacturing
CART classification analysis and regression tree
CBLS constraint-based local search
CBR case-based reasoner
CBR case-based reasoning
CC coherence criterion
CCF cross correlation function
CCG controlling crossed genes
CD contrastive divergence
CEA cellular evolutionary algorithm
CEBOT cellular robot
CF collaborative filtering
CF compact flash
cf convergence factor
CFD computational fluid dynamics
CFG context-free grammar
CFS correlation feature selection
CG center of gravity
CG Cohen-Grossberg
cGA compact genetic algorithm
CGP Cartesian GP
CI computational intelligence
CIP cross information potential
CIS Computational Intelligence Society
CLB configurable logic block
clk clock
CLM component level model
CMA cingulate motor area
CMA covariance matrix adaptation
CML coupled map lattice
cMOEA cellular MOEA
CMOS complementary metal-oxide-semiconductor
CNF conjunctive normal form


computational neuro-genetic modeling 
cellular neural network 

central nervous system 

center of area 

center of gravity 

coverage-based genetic induction 
cluster of processors 

constrained optimization problem 
Computing Research and Education 
center of set 

cluster of workstations 

constraint programming 
contrapositive symmetry 
conditional preference network 
centralized Pareto front 

central pattern generator 

cross power spectral density 
cumulative prospect theory
central processing unit 
commonsense reasoning 

control register 

chemical reaction algorithm 
compositional rule of inference 

cell saving 

cognitive system 

contractual service agreement 
cumulative step-size adaptation 
covariate shift minimization 
common spatial pattern 

constraint satisfaction problem 
class-shape transformation 
corticospinal tract 

continuous-time finite Markov chain 
compute unified device architecture 
computing with words 

control word 

computing with words 

cycle crossover 


dopamine 

design and analysis of computer 
experiments 

denoising auto-encoder 

directed acyclic graph 

logic for data analysis 

database 

deep belief network 
decision-graph BOA 




DC direct current
DC/AD change/activate-deactivate
DCA de-correlated component analysis
DE differential evolution
dEA distributed evolutionary algorithm
DENFIS dynamic neuro-fuzzy inference system
deSNN dynamic eSNN
DEUM density estimation using Markov random fields algorithm
DEUM distribution estimation using Markov random fields
DGA direct genetic algorithm
DIC deviance information criterion
DL deep learning
DLPFC dorsolateral prefrontal cortex
DLR German Aerospace Center
DLS dynamic local search
DM displacement mutation operator
DM decision maker
DMA direct memory access
dMOEA distributed MOEA
DNA deoxyribonucleic acid
DNF disjunctive normal form
DNN deep neural network
DOE design of experiment
DOF degree of freedom
DP dynamic programming
DPF distributed Pareto front
DPLL Davis–Putnam–Logemann–Loveland
DPR dynamic partial reconfiguration
DRC domain relational calculus
DREAM distributed resource evolutionary algorithm machine
DRRS dynamically reconfigurable robotic system
DRS dominance resistant solution
DRSA dominance-based rough set approach
DSA data space adaptation
DSMGA dependency-structure matrix genetic algorithm
DSP digital signal processing
DSP digital signal processor
DSS decision support system
dtEDA dependency-tree EDA
DTI diffusion tensor imaging
DTLZ Deb–Thiele–Laumanns–Zitzler
DTRS decision-theoretic rough set
DW data word


E 

EA evolutionary algorithm 

EAPR early access partial reconfiguration 

EBNA estimation of Bayesian network algorithm

EC embodied cognition 

EC evolutionary computation 

EC evolutionary computing 

ECGA extended compact genetic algorithm 

ECGP extended compact genetic programming 

ECJ Java evolutionary computation 

ECoG electrocorticography 

ECOS evolving connectionist system 

EDA estimation of distribution algorithm 

EDP estimation of distribution programming 

EEG electroencephalogram 

EEG electroencephalography 

EFRBS evolutionary FRBS 

EFuNN evolving fuzzy neural network 

EGA equilibrium genetic algorithm 

EGNA estimation of Gaussian networks algorithm

EGO efficient global optimization 

EHBSA edge histogram based sampling algorithm 

EHM edge histogram matrix 

EI expected improvement 

EKM enhanced KM 

EKMANI enhanced Karnik–Mendel algorithm with new initialization

ELSA evolutionary local selection algorithm 

EM exchange mutation operator 

EM expectation maximization 

EMG electromyography 

EMNA estimation of multivariate normal algorithm

EMO evolutionary multiobjective optimization 

EMOA evolutionary multiobjective algorithm 

EMSE excess mean square error 

EODS enhanced opposite directions searching 

EP evolutionary programming 

EP exchange property 

EPTV extended possibilistic truth value 

ER edge recombination 

ERA epigenetic robotics architecture 

ERA Excellence in Research for Australia 

ERD event-related desynchronization 

ERM empirical risk minimization 

ERS event-related synchronization 

ES embedding system 

ES evolution strategy 




ESA enhanced simple algorithm
ESN echo state network
eSNN evolving spiking neural network
ESOM evolving self-organized map
ETS evolving Takagi-Sugeno system
EV extreme value
EvoStar Main European Events on Evolutionary Computation
EvoWorkshops European Workshops on Applications of Evolutionary Computation
EW-KRLS exponentially weighted KRLS
EX-KRLS extended kernel recursive least square

F

FA factor analysis
FA firefly algorithm
FA fractional anisotropy
FA-DP fitness assignment and diversity preservation
FATI first aggregation then inference
FB-KRLS fixed-budget KRLS
FCA formal concept analysis
FCM fuzzy c-means algorithm
FDA factorized distribution algorithm
FDRC fuzzy domain relational calculus
FDT fuzzy decision tree
FEMO fair evolutionary multi-objective optimizer
FGA fuzzy generic algorithm
FIM fuzzy inference mechanism
FIM fuzzy instance based model
FIR finite impulse response
FIS fuzzy inference system
FIS1 type-1 fuzzy inference system
FIS2 type-2 fuzzy inference system
FITA first inference then aggregation
FL fuzzy logic
FLC fuzzy logic controller
FlexCo flexible coprocessor
FLP fuzzy linear programming
FLS fuzzy logic system
FM fuzzy modeling
FMG full multi-grid
FMM finite mixture model
FMM fuzzy mathematical morphology
fMRI functional magneto-resonance imaging
FNN fuzzy neural network
FNN fuzzy nearest neighbor
FOM full-order model


footprint of uncertainty 

field programmable gate array 

full POD genetic algorithm 

fuzzy PID 

fractal prediction machine 

fuzzy particle swarm optimization 
floating point unit 

fuzzy query language 

fuzzy rule-based 

fuzzy rule-based classification systems 
fuzzy rule-based system 

fuzzy-rule based classifier 

fuzzy relational inference 

fuzzy set 

fuzzy system 

fuzzy support vector machine 
F-transform image compression 
unordered fuzzy rule induction algorithm 
foreign exchange 


general achievement 

genetic algorithm 
gamma-aminobutyric acid 

generalized approximate cross-validation 
genetic algorithm gradient 

genetic algorithm optimization toolbox 
granular computing 

graph coloring problem 

grammatical evolution 

Genetic and Evolutionary Computation 
Conference 

generalized fuzzy relational database 
green fluorescent protein 

genetic fuzzy system 

grouping genetic algorithm 
grammar-based genetic programming 
gray matter 

generalized modus ponens 

grammar model-based program evolution 
genetic algorithm 

genetic programming 

Gaussian process 

g-protein coupled receptor 
general-purpose GPU 

globus pallidus 

general-purpose input/output interface 
genetic pattern search 

graphics processing unit 




greedy partition crossover 

greedy randomized adaptive search 
procedure 

Gaussian RBM 

gene/protein regulatory network 
gene regulatory network 
generalized T2FS 

genome-wide association studies 
global workspace theory 


hierarchical probabilistic incremental 
program evolution 

hierarchical BOA 

hyper-cube framework 

hill climbing with learning 

hardware description language 
higher frequency band 
Hodgkin–Huxley

health management 

hidden Markov model 
high-performance computing 

Hilbert space 

heuristic space search 

Hough transform 

hardware 

hardware internal configuration access 
point 


independent, identically distributed 
iterative algorithm with stop condition 
indicator based 

indicator-based evolutionary algorithm 
incremental Bayesian optimization 
algorithm 

intelligent controller 

interrupt controller 

independent component analysis 
internal configuration access point 
induced chromosome element exchanger 
International Conference on Machine 
Learning 

intelligent distribution agent 

iterated density estimation evolutionary 
algorithm 

inference engine 


Institute of Electrical and Electronics 
Engineers 

intuitionistic fuzzy 

independent factor analysis 

interval valued fuzzy set 

iterated greedy 

interactive granular computing 
iterative heuristic algorithm 
iterated local search 

interneuron 

International Neuroinformatics 
Coordinating Facility 

input/output hidden Markov model 
input/output block 

internet of things 

identity principle 

inductive programming 

intellectual property 

intellectual property interface 
inferior parietal lobe 

increasing population size 
interactive rough granular computing 
iteratively re-weighted least squares 
indiscernibility-based rough set approach 
Integrated Synthesis Environment 
inter-spike interval 

insertion mutation operator 

interval type-2 

interval T2FC 

interval T2FS 

integral time absolute error 
information theoretic learning 
incremental univariate marginal 
distribution algorithm 

inversion mutation operator 


Java distributed evolutionary algorithms 
library 

John Eddy genetic algorithm 

java multi-criteria and multi-attribute 
analysis framework 


kernel adaptive filter 

kernel affine projection algorithm 
knowledge base 

knowledge discovery and data mining 




KGA Kriging-driven genetic algorithm 
KKT Karush–Kuhn–Tucker
KL Kullback–Leibler
KLMS kernel least mean square
KM Karnik–Mendel
KMC kernel Maximum Correntropy 
KNN k nearest neighbor 
KPCA kernel principal component analysis 
KRLS kernel recursive least square 
KUR Kursawe
L 
LAN local area network
LASSO least absolute shrinkage and selection operator
LB logic block 
LCS learning classifier system 
LDA latent Dirichlet allocation 
LDA linear discriminant analysis 
LDS limited discrepancy search 
LED light emitting diode 
LEM learning from examples module 
LERS learning from examples using rough sets 
LFA local factor analysis 
LFB lower frequency band 
LFDA learning FDA 
LFM linguistic fuzzy modeling 
LFP local field potential 
LGP linear GP 
LHS latin hypercube sampling 
LI law of importation 
LIFM leaky integrate-and-fire 
LLE liquid-liquid equilibrium 
LMI linear matrix inequalities 
LMS least mean square 
LNS large neighborhood search 
LO leading one 
LOCVAL locational value 
LOO leave-one-out 
LOOCV leave-one-out cross-validation 
LOTZ leading ones trailing zeroes 
LP logic programming 
LQR linear-quadratic regulator 
LR logistic regression 
LRP lateralized readiness potential 
LS least square 
LS local search 
LSM liquid state machine 
LSTM long short term memory 


linguistic term
look-up table
linguistic variable
linguistic-variable-term
locally-weighted projection regression algorithm
leading zero

motor cortex
machine-to-machine
Markovian agent
memetic algorithm
mean of the absolute error
Java mimetic algorithms framework
Markovian agent model
multiple algorithms, multiple problems
multiple algorithms and multiple problem instances
maximum a posteriori
multivariate adaptive regression splines
multiple algorithms and one single problem
mixed Bayesian optimization algorithm
minor component analysis
multi-criteria decision analysis
multiple criteria decision aiding
multiple criteria decision-making
maximum cardinality search
meta-cognitive subsystem
minimum description length
Markov decision process
multidimensional scaling
magnetoencephalogram
magnetoencephalography
minimal epistemic logic
membership function
Mackey–Glass
morphological gradient
metaheuristic
MIL multi-instance learning
MIMIC mutual information maximizing input clustering
MIML multi-instance, multi-label learning
MISO multiple inputs-single output
MKL multiple kernel learning
ML machine learning
ML maximum likelihood
MLEM2 modified LEM2 algorithm




multilayer perceptron
multiple linear regression
multi-response linear regression
mathematical morphology
multimemetic algorithm
MAX-MIN ant system
model-based multiobjective evolutionary algorithm
man–machine learning dilemma
Markov network EDA
memristor-based neural network
modular neural network
multiobjectivization-assisted multimodal optimization
multiobjective evolutionary
mixture of experts
multiobjective evolutionary algorithm
multiobjective evolutionary algorithm based on decomposition
multiobjective genetic algorithm
multiobjective genetic fuzzy system
mean of maxima
multi-objective messy GA
multi-objective optimization
many-objective optimization problem
multiobjective problem
MOP multiobjective optimization problem
MOPSO multiobjective particle swarm optimization
MOSAIC modular selection and identification for control
MOSES meta-optimizing semantic evolutionary search
MOT movement time
MPE mean percentage error
MPE most probable explanation
MPGA mixed-flow POD genetic algorithm
MPI message passing interface
MPM marginal product model
MPP massively parallel machine
MPS multiprocessor system
MR maximum ranking
MRCP movement-related cortical potentials
MRI magnetic resonance imaging
mRMR minimal-redundancy-maximal-relevance
mRNA messenger RNA
MRNN memristor-based recurrent neural network
MS master/slave
MS motivational subsystem
MSA minor subspace analysis


MSE mean square error 

MSG max-set of Gaussian landscape generator 

msMOEA master-slave MOEA 

MT medial temporal 

MTFL multi-task feature learning 

MTL multi-task learning 

MV maximum value 

MWRA minimum-weight rooted arborescence 

N 

NACS non-action-centered subsystem 

NASA National Aeronautics and Space Administration

NC neural computation 

NC novelty criterion 

NC numerical control 

NCL negative correlation learning 

NDS nonlinear dynamical systems 

NEAT neuro-evolution of augmenting topologies

NES natural evolution strategy 

NeuN neuronal nuclei antibody 

NFA non-Gaussian factor analysis 

NFI neuro-fuzzy inference system 

NFL no free lunch 

NHBSA node histogram based sampling algorithm

NIL nondeterministic information logic 

NIPS neural information processing system 

NLMS normalized LMS 

NLPCA nonlinear principal components 

NMF negative matrix and tensor factorization 

NMF nonnegative matrix factorization 

NN neural network 

NOW networks of workstation 

NP neutrality principle 

NP nondeterministic polynomial-time 

NPV net present value 

NR noise reduction 

NS negative slope 

NSGA nondominated sorting genetic algorithm 

NSPSO nondominated sorting particle swarm optimization

NURBS nonuniform rational B-spline 


ordinary differential equation 
original equipment manufacturer 




online kernel learning
online analytical processing
operational momentum
ordered modular average
ordering property
on-chip peripheral bus
open programming language
operations research
operational research
overshoot
ordered weighted average
ordered weighted maximum
order crossover
order-based crossover

probably approximately correct
Pareto-archived evolution strategy
place and route
perception-based computing
population-based incremental learning
programming by optimization
PC probabilistic computing
PC-SAFT perturbed chain statistical associating fluid theory
PCA principal component analysis
PCVM probabilistic classifier vector machine
PD Parkinson disease
PD proportional-differential
PDDL planning domain definition language
PDE partial differential equation
pdf probability density function
PDGP parallel and distributed GP
PEEL program evolution with explicit learning
PERT program evaluation and review technique
PESA Pareto-envelope based selection algorithm
PET positron emission tomography
PFC Pareto front computation
PFC prefrontal cortex
PFM precise fuzzy modeling
PHM prognostics and health management
PIC peripheral interface controller
PID proportional-integral-derivative
PII probabilistic iterative improvement
PIPE probabilistic incremental program evolution


PLA programmable logic array
PLB processor local bus
PLS partial least square
pLSA probabilistic latent semantic analysis
PLV phase lock value
PM parallel model
PMBGA probabilistic model-building genetic algorithm
PMC premotor cortex
PMI partial mutual information
PMX partially-mapped crossover
PN pyramidal neuron
PNS peripheral nervous system
POD proper orthogonal decomposition
PoE product of experts
POR preference order relation
POS position-based crossover
PP parallel platform
PPSN parallel problem solving in nature
PR partial reconfiguration
PRAS polynomial-time randomized approximation scheme
PRM partially reconfigurable module
PRODIGY program distribution estimation with grammar model
PRR partially reconfigurable region
PS pattern search
PSA principal subspace analysis
PSCM problem-space computational model
PSD power spectral density
PSD predictive sparse decomposition
PSEA Pareto sorting evolutionary algorithm
PSNR peak signal-to-noise ratio
PSO particle swarm optimization
PSS problem space search
PSTH peri-stimulus-time histogram
PTT pursuit-tracking task
PV principal value

persistent vegetative state 

pulse width modulation 


quantile–quantile

quadratic assignment problem 
quantum-inspired eSNN 
quadratic information potential 
quantized KLMS 

quadratic programming 




R 

r.k. reproducing kernel 

RAF representable aggregation function 

RAM random access memory 

RANS Reynolds-averaged Navier-Stokes 

RB rule base 

RBF radial basis function 

RBM restricted Boltzmann machine

rBOA real-coded BOA 

RECCo robust evolving cloud-based controller 

RecNN recursive neural network 

REGAL relational genetic algorithm learner 

REML restricted maximum likelihood estimator

RET relevancy transformation 

RFID radio frequency identification 

RFP red fluorescent protein 

RGB red-green-blue 

RGN random generation number 

RHT randomized Hough transform 

RII randomized iterative improvement 

RISC reduced instruction set computer

RKHS reproducing kernel Hilbert space 

RL reinforcement learning 

RLP randomized linear programming 

RLS randomized local search 

RLS recursive least square 

RM-MEDA regularity model based multiobjective EDA

RMI remote method invocation 

RMSE root-mean-square error 

RMSEP root-mean-square error of prediction 

RMTL regularized multi-task learning 

RN regularization network 

RNA ribonucleic acid 

RNN recurrent neural network 

ROC receiver operating characteristic 

ROI region of interest 

ROM read only memory 

ROM reduced-order model 

ROS robot operating system 

RP readiness potential 

RPC remote procedure call 

RPCL rival penalized competitive learning 

RS rough set 

rst reset 

RT reaction time 

RT real-time 

RTL register transfer logic 

RTRL real-time recurrent learning 


remaining useful life
roulette wheel selection

simple ant colony optimization
section-bit
semi-supervised support vector machine
simple F-transform-based fusion algorithm
simulated annealing
SAE sparse auto-encoder
SAMP one single algorithm and multiple problems
SARSA state-action-reward-state-action
SASP one single algorithm and one single problem
SAT satisfiability
SBF subspace-based function
SBO surrogate-based optimization
SBR similarity based reasoning
SBS sequential backward selection
SBSO surrogate based shape optimization
SBX simulated binary crossover
SC soft computing
SC surprise criterion
SCH school
SCNG sparse coding neural gas
SD structured data
SDE stochastic differential equation
SDPE standard deviation percentage error
SEAL simulated evolution and learning
SEMO simple evolutionary multi-objective optimizer
SF scaling factor
SFS sequential forward selection
SG-GP stochastic grammar-based genetic programming
SHCLVND stochastic hill climbing with learning by vectors of normal distribution
SI swarm intelligence
SIM simple-inversion mutation operator
SISO single input single output
SLAM simultaneous localization and mapping
SLF superior longitudinal fasciculus
SLS stochastic local search
SM scramble mutation operator
SM surrogate model
SMA supplementary motor area




SMO sequential minimum optimization
SMP symmetric multiprocessor
SMR sensorimotor rhythm
sMRI structural magnetic resonance imaging
SMS-EMOA S-metric selection evolutionary multiobjective algorithm
SNARC spatial–numerical association of response code
SNE stochastic neighborhood embedding
SNP single nucleotide polymorphism
SNR signal-noise-ratio
SOC self-organized criticality
SOFM self-organized feature maps
SOFNN self-organizing fuzzy neural network
SOGA single-objective genetic algorithm
SOM self-organizing map
SPAM set preference algorithm for multiobjective optimization
SPAN spike pattern association neuron
SPD strictly positive definite
SPEA strength Pareto evolutionary algorithm
SPOT sequential parameter optimization toolbox
SPR static partial reconfiguration
SQL structured query language
SR stochastic resonance
SR symbolic regression
SRD standard reference dataset
SRF strength raw fitness
SRM spike response model
SRM structural risk minimization
SRN simple recurrent network
SRT serial reaction time
SSM state-space model
SSOCF subset size-oriented common features
SSSP single-source shortest path problem
StdGP standard GP
STDP spike-timing dependent plasticity
STDP spike-timing dependent learning
STGP strongly typed GP
SU single unit
SURE-REACH sensorimotor, unsupervised, redundancy-resolving control architecture
SUS stochastic universal sampling
SVaR simplified value at risk
SVC support vector classification
SVD singular value decomposition
SVM support vector machine
SW software
SW-KRLS sliding window KRLS


T 
T1 type-1 
T1FC type-1 fuzzy controller
T1FS type-1 fuzzy set
T2 type-2 
T2FC type-2 fuzzy controller 
T2FS type-2 fuzzy set 
T2IC type-2 intelligent controller 
T2MF type-2 membership function 
TAG3P tree adjoining grammar-guided genetic programming
TD temporal difference 
TDNN time delay neural network 
TET total experiment time 
TFA temporal factor analysis 
TGBF truncated generalized Bell function 
TN thalamus 
TOGA target objective genetic algorithm 
TR type reducer 
TRC tuple relational calculus 
TS tabu search 
TS time saving 
TSK Takagi-Sugeno—Kang 
TSP traveling salesman problem 
TIGA trainable threshold gate array 
TWNFI transductive weighted neuro-fuzzy 


æ 
= 
4g 


MDA 
ML 
PMOPSO 
S EPA 


Geeqdceae ace 


Cc 


JW 


V 


VB 
VC 
VC 
VCR 
VEGA 


inference system 


universal asynchronous 
receiver/transmitter 

unmanned aerial vehicle 

user constraint file 

supervised classifier system 

uniform cycle crossover 

univariate marginal distribution algorithm 
universal modeling language 
user-preference multiobjective PSO 
United States Environmental Protection 
Agency 

underwriter 


variational Bayes 
Vapnik-Chervonenkis 
variable consistency 
variance ratio criterion 
vector-evaluated GA 




VHDL VHSIC hardware description language 
VHS virtual heading system 
VHSIC very high speed integrated circuit 
VLNS very large neighborhood search 
VLPFC ventrolateral prefrontal cortex 
VLSI very large scale integration 
VND variable neighborhood descent 
VNS variable neighborhood search 
VPRSM variable precision rough set model 
VQ vector quantization 
VQRS vaguely quantified rough set 

W
W2T wisdom web of things
WAN wide area network
WC Wilson-Cowan
WEP weight error power
WFG walking fish group
WisTech Wisdom Technology
WM white matter
WM working memory
wireless sensor network
Wu-Tan
winner-take-all
weighted-weighted nearest neighbor

x-anticipatory classifier system
Xie-Beni cluster validity index
X classifier system
XCS for function approximation
exponential natural evolution strategy
Xilinx platform studio
Xilinx system generator

zeroth level classifier system
Zitzler-Deb-Thiele
Zonal Euler-Navier-Stokes


1. Introduction

Janusz Kacprzyk, Witold Pedrycz


This Springer Handbook of Computational Intelligence is the result of
a broad project that we launched in response to an urgent need of a wide
scientific and scholarly community for a comprehensive reference source
on Computational Intelligence, a field of science that has for some
decades enjoyed growing popularity in terms of theory and methodology
as well as numerous applications. As is always the case in such
situations, once an area has reached maturity and there is to some
extent a consensus in the community as to which paradigms, tools, and
techniques may be useful and promising, the time comes when
a state-of-the-art exposition, exemplified by this Springer Handbook,
would be welcome. We think that this is the right moment.


1.1 Details of the Contents
1.1.1 Part A Foundations
1.1.2 Part B Fuzzy Logic
1.1.3 Part C Rough Sets
1.1.4 Part D Neural Networks
1.1.5 Part E Evolutionary Computation
1.1.6 Part F Swarm Intelligence
1.1.7 Part G Hybrid Systems
1.2 Conclusions and Acknowledgments


The first and most important question that can be posed by many people,
notably those who work in more traditional and relatively well-defined
fields of science, is what Computational Intelligence is. There are many
definitions that try to capture the very essence of that field,
emphasize different aspects, and, by necessity, somehow reflect the
individual research interests, preferences, prospective application
areas, etc.

However, it seems that in recent years there has been a wider and wider
consensus as to what Computational Intelligence basically is. Let us
start with a citation from the Constitution of the IEEE (Institute of
Electrical and Electronics Engineers) CIS (Computational Intelligence
Society), Article I, Section 5:


The Field of Interest of the Society is the theory, design,
application, and development of biologically and linguistically
motivated computational paradigms emphasizing neural networks,
connectionist systems, genetic algorithms, evolutionary programming,
fuzzy systems, and hybrid intelligent systems in which these paradigms
are contained.


It seems that this is extremely to the point, and we have basically
followed this general philosophy in the composition of the Springer
Handbook.

Let us first extend a little bit the above essential and comprehensive
description of what computational intelligence is interested in, what
it deals with, which tools and techniques it uses, etc. Computational
Intelligence is a broad and diverse collection of nature-inspired
computational methodologies and approaches, and tools and techniques
that are meant to be used to model and solve complex real-world
problems in various areas of science and technology in which the
traditional approaches based on strict and well-defined tools and
techniques, exemplified by hard mathematical modeling, optimization,
control theory, stochastic analyses, etc., are either not feasible or
not efficient. Of course, the term nature-inspired should be understood
in a broader sense of being biologically inspired, socially inspired,
etc.

Those complex problems that are of interest to computational
intelligence are often what may be called ill-posed, which may make
their exact solution, using the traditional hard tools and techniques,
impossible.




However, we all know that such problems are quite effectively and
efficiently solved in real life by human beings, or, more generally, by
living species. One can easily come to the conclusion that one should
develop new tools, maybe less precise and not so well mathematically
founded, that would provide a solution, maybe not optimal but good
enough. This is exactly what Computational Intelligence is meant to
provide.

Briefly speaking, it is most often considered that computational
intelligence includes as its main components fuzzy logic, (artificial)
neural networks, and evolutionary computation. Of course, these main
elements should be properly understood. For instance, one should
understand that fuzzy logic is to be complemented by rough set theory
or multivalued logic, neural networks should be more generally
understood as including all kinds of connectionist systems and also
learning systems, and evolutionary computation should be viewed as the
area including swarm intelligence, artificial immune systems, bacterial
algorithms, etc. One can also add in this context many other approaches
like the Dempster-Shafer theory, chaos theory, etc.


1.1 Details of the Contents


The aforementioned view of the very essence of computational
intelligence has been followed by us when dividing the handbook into
parts, which correspond to the particular fields that constitute the
area of Computational Intelligence, and also in the selection of field
editors, and then their selection of proper authors. We have been very
fortunate to be able to attract as the field editors and authors of
chapters the best people in the respective fields.


1.1.1 Part A Foundations


For obvious reasons, we start the Springer Handbook of Computational
Intelligence with Part A, Foundations, which delivers a constructive
survey of some carefully chosen topics that are of importance for
virtually all ensuing parts of the handbook. This part, edited by
Professors Radko Mesiar and Bernard De Baets, involves foundational
works on multivalued logics, possibility theory, aggregation functions,
measure-based integrals, the essence of extensions of fuzzy sets,
F-transforms, mathematical programming, and games under imprecision and
fuzziness. It is easy to see that the contributions cover topics that
are of profound relevance.


1.1.2 Part B Fuzzy Logic


Part B, Fuzzy Logic, edited by Professors Luis Magdalena and Enrique
Herrera-Viedma, attempts to present the most relevant elements and
issues related to the vast area of fuzzy logic. First of all,
a comprehensive account of foundations of fuzzy set theory has been
provided, emphasizing both theoretical and application-oriented
aspects. This has been followed by a state-of-the-art presentation of
the concept, properties, and applications of fuzzy relations, including
a brief historical perspective and future challenges. A similar account
of the past, present, and future of the extremely important concept of
fuzzy implications has then been authoritatively covered.

Then, various issues related to fuzzy systems mod- 
eling have been presented; notably the concept and 
properties of fuzzy-rule-based systems which are the 
core of fuzzy modeling, and the problem of inter- 
pretability of fuzzy systems. Fuzzy clustering, which 
is one of the most widely used tools and techniques, 
encountered in virtually all problems related to data 
analysis, modeling, control, etc., is then exposed, with 
focus on the past, present, and future challenges. Then, 
issues related to Zadeh’s seminal idea of comput- 
ing with words have been thoroughly studied from 
many perspectives, in particular a logical and algebraic 
one. 

Since fuzzy (logic) control is undoubtedly the most vigorously reported
industrial application of fuzzy logic, this subject has been presented
in detail, both for the conventional fuzzy sets and their extensions,
especially type-2 and interval type-2 fuzzy sets. Applications of fuzzy
logic in autonomous robotics have also been presented.

An account of fundamental issues and solutions 
related to the use of fuzzy logic in database and infor- 
mation management has then been given. 


1.1.3 Part C Rough Sets 


Part C, Rough Sets, edited by Professors Roman Słowiński and Yiyu Yao,
starts with a comprehensive, rigorous, yet readable presentation of
foundations of rough sets, followed by a similar exposition focused on
the use of rough sets in decision making, aiding, and support. Then,
rule induction is considered as a tool for modeling, decision making,
and data analysis.

A number of important extensions of the basic con- 
cept of a rough set have then been presented, including 
the concept of a probabilistic rough set and a general- 
ized rough set, along with a lucid exposition of their 
properties and possible applications. 

A crucial problem of a fuzzy-rough hybridization is 
then discussed, followed by a more general exposition 
of rough systems. 


1.1.4 Part D Neural Networks 


Part D, Neural Networks, edited by Professors Cesare Alippi and Marios
Polycarpou, starts with a general presentation of artificial neural
network models, followed by presentations of some more specific types
exemplified by deep and modular neural networks.

Much attention in this part is devoted to machine 
learning, starting from a very general overview of 
the area and main tools and techniques employed, 
theoretical methods in machine learning, probabilistic 
modeling in machine learning, kernel methods in ma- 
chine learning, etc. 

An important problem area called neurodynamics 
is the subject of a next state-of-the-art survey, followed 
by a review of basic aspects, models, and challenges of 
computational neuroscience considered basically from 
the point of view of biophysical modeling of neural sys- 
tems. Cognitive architectures, notably for agent-based 
systems, and a related problem of computational mod- 
els of cognitive and motor control have then been 
presented in much detail. 

Advanced issues involved in the so-called embodied intelligence,
neuroengineering, and neuromorphic engineering, emerging as promising
paradigms for modeling and problem solving, have then been dealt with.

Evolving connectionist systems, which constitute novel, very promising
architectures for the modeling of various processes and systems, have
been surveyed. A survey of real-world applications of machine learning
completes this part.


1.1.5 Part E Evolutionary Computation 


Part E, Evolutionary Computation, edited by Professors Frank Neumann,
Carsten Witt, Peter Merz, Carlos A. Coello Coello, Oliver Schütze,
Thomas Bartz-Beielstein, Jörn Mehnen, and Günther Raidl, concerns the
third fundamental element of what is traditionally considered to be the
core of Computational Intelligence.

First, comprehensive surveys of genetic algorithms, genetic
programming, evolution strategies, and parallel evolutionary algorithms
are presented, which are readable and constructive so that a large
audience might find them useful and, to some extent, ready to use. Some
more general topics like the estimation of distribution algorithms,
indicator-based selection, etc., are also discussed.

An important problem, from a theoretical and prac- 
tical point of view, of learning classifier systems is 
presented in depth. 

Multiobjective evolutionary algorithms, which constitute one of the
most important groups, both from the theoretical and applied points of
view, are discussed in detail, followed by an account of parallel
multiobjective evolutionary algorithms, and then a more general
analysis of many multiobjective problems.

Considerable attention has also been paid to a pre- 
sentation of hybrid evolutionary algorithms, such as 
memetic algorithms, which have emerged as a very 
promising tool for solving many real-world problems in 
a multitude of areas of science and technology. More- 
over, parallel evolutionary combinatorial optimization 
has been presented. 

Search operators, which are crucial in all kinds of 
evolutionary algorithms, have been prudently analyzed. 
This analysis was followed by a thorough analysis of 
various issues involved in stochastic local search algo- 
rithms. 

An interesting survey of various technological and 
industrial applications in mechanical engineering and 
design has been provided. Then, an account of the use 
of evolutionary combinatorial optimization in bioinfor- 
matics is given. 

An analysis of a synergistic integration of meta- 
heuristics, notably evolutionary computation, and con- 
straint satisfaction, constraint programming, graph col- 
oring, tree decomposition, and similar relevant prob- 
lems completes the part. 


1.1.6 Part F Swarm Intelligence 


Part F, Swarm Intelligence, edited by Professors Christian Blum and
Roderich Gross, starts with a concise yet comprehensive introduction to
swarm intelligence in optimization and robotics, two fields of science
in which this type of metaheuristics has been considered, and
demonstrated to be powerful and useful.

Then, a preference-based multiobjective particle swarm optimization
model is covered as a good tool for airfoil design. An ant colony
optimization model for the minimum-weight rooted arborescence problem,
which may be a good model for many diverse problems in computer
science, decision analysis, data analysis, etc., is discussed.

An intelligent swarm of Markovian agents is 
the topic of a thorough analysis, which highlights 
the power and universality of this model. More- 
over, a probabilistic modeling of swarm systems is 
surveyed. 

Then some explicitly nature-inspired algorithms based on how some
species behave are presented, notably a honey bee social foraging
algorithm for resource allocation. Collective behavior modes and
reconfigurability are discussed in swarm robotics, complemented by an
exposition of problems and solutions related to collective manipulation
and construction.


1.1.7 Part G Hybrid Systems 


Part G, Hybrid Systems, edited by Professors Oscar 
Castillo and Patricia Melin, starts with papers on var- 
ious types of controllers developed with the aid of 
computational intelligence tools and techniques that are 
employed in a highly synergistic way. 

First, an interesting and visionary study of a robust evolving
cloud-based controller, which combines new conceptual ideas with
a novel computing architecture, is provided. Then evolving embedded
fuzzy controllers as well as the bio-inspired optimization of type-2
fuzzy controllers are surveyed. New hybrid modeling and solution tools
are then presented, notably multiobjective genetic fuzzy systems. The
use of modular neural networks and type-2 fuzzy logic is shown to be
effective and efficient in various pattern recognition problems.

A novel idea of using chemical algorithms for the optimization of
interval type-2 and type-1 fuzzy controllers for autonomous mobile
robots is shown.

Finally, the implementation of bio-inspired optimization methods on
graphics processing units is presented and its efficiency is
emphasized.


1.2 Conclusions and Acknowledgments 


To summarize, the coverage of topics in the particular 
parts has certainly provided a comprehensive, rigorous 
yet readable state-of-the-art survey of main research 
directions, developments and challenges in Computa- 
tional Intelligence. 


Both more advanced readers, who may look for 
details, and novice readers, who may look for some 
more general and readable introduction that could be 
employed in their later works, would certainly find this 
handbook useful. 


Part A Foundations


Ed. by Bernard De Baets, Radko Mesiar


Many-Valued and Fuzzy Logics
Siegfried Gottwald, Leipzig, Germany


Possibility Theory and Its Applications:
Where Do We Stand?

Didier Dubois, Toulouse Cedex 9, France
Henry Prade, Toulouse Cedex 9, France


Aggregation Functions on [0,1]

Radko Mesiar, Bratislava, Slovakia

Anna Kolesárová, Bratislava, Slovakia
Magda Komornikova, Bratislava, Slovakia


Monotone Measures-Based Integrals
Erich P. Klement, Linz, Austria
Radko Mesiar, Bratislava, Slovakia


The Origin of Fuzzy Extensions 

Humberto Bustince, Pamplona (Navarra), 
Spain 

Edurne Barrenechea, Pamplona (Navarra), 
Spain 

Javier Fernández, Pamplona (Navarra), 
Spain 

Miguel Pagola, Pamplona (Navarra), Spain 
Javier Montero, Madrid, Spain 


F-Transform 
Irina Perfilieva, Ostrava, Czech Republic 


Fuzzy Linear Programming and Duality 
Jaroslav Ramik, Karvina, Czech Republic 
Milan Vlach, Prague, Czech Republic 


Basic Solutions of Fuzzy Coalitional 
Games 

Tomas Kroupa, Prague, Czech Republic 
Milan Vlach, Prague, Czech Republic 


2. Many-Valued and Fuzzy Logics 


Siegfried Gottwald 


In this chapter, we consider particular classes 
of infinite-valued propositional logics which are 
strongly related to t-norms as conjunction con- 
nectives and to the real unit interval as a set of 
their truth degrees, and which have their impli- 
cation connectives determined via an adjointness 
condition. 

Such systems have in the last 10 years been of 
considerable interest, and the topic of important 
results. They generalize well-known systems of 
infinite-valued logic, and form a link to as differ- 
ent areas as, e.g., linear logic and fuzzy set theory. 

We survey the most important ones of these 
systems, always explaining suitable algebraic se- 
mantics and adequate formal calculi, but also 
mentioning complexity issues. 

Finally, we mention a type of extension which 
allows for graded notions of provability and 
entailment. 


2.1 Basic Many-Valued Logics
2.1.1 The Gödel Logics
2.1.2 The Łukasiewicz Logics
2.1.3 The Product Logic
2.1.4 The Post Logics
2.1.5 Algebraic Semantics
2.2 Fuzzy Sets
2.2.1 Set Algebra for Fuzzy Sets
2.2.2 Fuzzy Sets and Many-Valued Logic
2.2.3 t-Norms and t-Conorms
2.3 t-Norm-Based Logics
2.3.1 Basic Ideas
2.3.2 Left and Full Continuity of t-Norms
2.3.3 Extracting an Algebraic Framework
2.4 Particular Fuzzy Logics
2.4.1 The Logic BL of All Continuous t-Norms
2.4.2 The Logic MTL of All Left Continuous t-Norms
2.4.3 Extensions of MTL
2.4.4 Logics of Particular t-Norms
2.4.5 Extensions to First-Order Logics
2.5 Some Generalizations
2.5.1 Adding a Projection Operator
2.5.2 Adding an Idempotent Negation
2.5.3 Logics with Additional Strong Conjunctions
2.5.4 Logics Without a Truth Degree Constant
2.5.5 Logics with a Noncommutative Strong Conjunction
2.6 Extensions with Graded Notions of Inference
2.6.1 Pavelka-Style Approaches
2.6.2 A Lattice Theoretic Approach
2.7 Some Complexity Results
2.8 Concluding Remarks
References


Classical two-valued logic is characterized by two basic principles.
The principle of extensionality states that the truth value of any
compound sentence depends only on the truth values of the components.
The principle of bivalence, also known as tertium non datur, states
that any sentence is either true or false, nothing else is possible.
Intuitively, a sentence is understood here as a formulation which has
a truth value, i.e., which is true or false. In everyday language this
excludes formulations like questions, requests, and commands.
Nevertheless, this explanation sounds like a kind of circular
formulation. So formally one first fixes a certain formalized language,
and then lays down formal criteria of what should count as
a well-formed formula of this language, and particularly what should
count as a sentence.




The principle of bivalence hence excludes self-contradictory
formulations which are true as well as false, and it also excludes
so-called truth value gaps, i.e., formulations which are neither true
nor false.

Based on the understanding that a sentence is a formulation which has
a truth value, many-valued logic generalizes the understanding of what
a truth value is and hence allows for more values than only the two
classical values true and false. To indicate this generalization, we
speak here of truth degrees in the case of many-valued logics. And this
allows the additional convention to have the name truth value reserved
for the values true and false in their standard understanding in the
sense of classical logic.

There are many possibilities to choose particular sets of truth
degrees. Thus there are quite different systems of many-valued logic.
However, each particular one of these systems $L$ has a fixed set of
truth degrees $W^{L}$. Furthermore, each such system has its set
$D^{L} \subseteq W^{L}$ of designated truth degrees: formulas of the
corresponding formalized language are logically valid iff they always
have a designated truth degree.

Instead of the principle of bivalence each system $L$ of many-valued
logic satisfies a principle of multivalence in the sense that any
sentence has to have exactly one truth degree out of $W^{L}$. And the
principle of extensionality now states that the truth degree of any
compound sentence depends only on the truth degrees of the components.

Fuzzy logics are particular infinite-valued logics which have, at least
in their most simple forms, the real unit interval $[0, 1]$ as their
truth degree sets, and which have the degree 1 as their only designated
truth degree.


2.1 Basic Many-Valued Logics


If one looks systematically for many-valued logics which have been
designed for quite different applications, one finds four main types of
systems:


The Łukasiewicz logics $L_\kappa$ as explained in [2.1];
The Gödel logics $G_\kappa$ from [2.2];
The product logic $\Pi$ studied in [2.3];
The Post logics $P_m$ for $2 \le m \in \mathbb{N}$ from [2.4].


The first two types of many-valued logics each offer a uniformly
defined family of systems which differ in their sets of truth degrees
and comprise finitely valued logics for each one of the truth degree
sets $W_n = \{0, \frac{1}{n-1}, \frac{2}{n-1}, \ldots, 1\}$, $n \ge 2$,
together with an infinite-valued system with truth degree set
$W_\infty = [0, 1]$. Common reference to the finite-valued and the
infinite-valued cases is formally indicated by choosing
$\kappa \in \{n \in \mathbb{N} \mid n \ge 2\} \cup \{\infty\}$ as an
index.

For the fourth type an infinite-valued version is lacking.

In their original presentations, these logics look rather different,
regarding their propositional parts. For the first-order extensions,
however, there is a unique strategy: one adds a universal and an
existential quantifier such that quantified formulas get, respectively,
as their truth degrees the infimum and the supremum of all the
particular cases in the range of the quantifiers.

As a reference for these and also other many-valued logics in general,
the reader may consult [2.5].

Our primary interest here is in the infinite-valued versions of these
logics. These ones have the closest connections to the fuzzy logics
discussed later on. Therefore, we further on write simply $L$ instead
of $L_\infty$, and $G$ instead of $G_\infty$.

For simplicity of notation, later on we often will use the same symbol
for a connective and its truth degree function. It shall always become
clear from the context what actually is meant.


2.1.1 The Gödel Logics 


The simplest ones of these logics are the Gödel logics $G_\kappa$ which
have a conjunction $\wedge$ and a disjunction $\vee$ defined by the
minimum and the maximum, respectively, of the truth degrees of the
constituents

$$u \wedge v = \min\{u, v\}, \qquad u \vee v = \max\{u, v\}. \tag{2.1}$$

These Gödel logics also have a negation $\sim$ and an implication
$\to_G$ defined by the truth degree functions

$$\sim u = \begin{cases} 1, & \text{if } u = 0; \\ 0, & \text{if } u > 0, \end{cases}
\qquad
u \to_G v = \begin{cases} 1, & \text{if } u \le v; \\ v, & \text{if } u > v. \end{cases} \tag{2.2}$$

The systems differ in their truth degree sets: for each
$2 \le \kappa \le \infty$ the truth degree set of $G_\kappa$ is
$W_\kappa$.
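
The truth degree functions (2.1) and (2.2) can be tried out directly. The following is a minimal Python sketch, not part of the original chapter: the function names (`truth_degrees`, `g_and`, `g_or`, `g_not`, `g_imp`) are hypothetical, and exact rationals are used only to avoid rounding effects. It also checks, on a finite truth degree set, the adjointness between the minimum and the Gödel implication that the chapter abstract refers to.

```python
from fractions import Fraction


def truth_degrees(n):
    """Finite truth degree set W_n = {0, 1/(n-1), ..., 1}, assuming n >= 2."""
    return [Fraction(k, n - 1) for k in range(n)]


def g_and(u, v):
    """Goedel conjunction, eq. (2.1)."""
    return min(u, v)


def g_or(u, v):
    """Goedel disjunction, eq. (2.1)."""
    return max(u, v)


def g_not(u):
    """Goedel negation, eq. (2.2)."""
    return 1 if u == 0 else 0


def g_imp(u, v):
    """Goedel implication, eq. (2.2)."""
    return 1 if u <= v else v


def adjoint_on(degrees):
    """Check min(u, w) <= v  iff  w <= (u ->_G v) for all degrees."""
    return all(
        (g_and(u, w) <= v) == (w <= g_imp(u, v))
        for u in degrees for v in degrees for w in degrees
    )
```

For instance, `g_imp(Fraction(3, 4), Fraction(1, 4))` evaluates to `Fraction(1, 4)`, and `adjoint_on(truth_degrees(5))` holds on the five-element set $W_5$.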




As shown by Dummett [2.6], the infinite-valued 
propositional Gödel logic G has an adequate axiomati- 
zation which is provided by an adequate axiomatization 
of the intuitionistic propositional logic enriched with 
the additional axiom schema 


$$(\varphi \to_G \psi) \vee (\psi \to_G \varphi). \tag{2.3}$$


Later on, in Sect. 2.5, we will recognize another 
axiomatization because G is a particular t-norm-based 
logic. 


2.1.2 The Łukasiewicz Logics


The Łukasiewicz logics $L_\kappa$, again with $2 \le \kappa \le \infty$,
have originally been designed in [2.1] with only two primitive
connectives, an implication $\to_L$ and a negation $\neg$
characterized by the truth degree functions

$$\neg u = 1 - u, \qquad u \to_L v = \min\{1, 1 - u + v\}. \tag{2.4}$$

The systems differ in their truth degree sets: for each
$2 \le \kappa \le \infty$ the truth degree set of $L_\kappa$ is
$W_\kappa$.

However, it is possible to define further connectives from these
primitive ones. With

$$\varphi \mathbin{\&} \psi =_{\mathrm{df}} \neg(\varphi \to_L \neg\psi), \qquad
\varphi \veebar \psi =_{\mathrm{df}} \neg\varphi \to_L \psi \tag{2.5}$$

one gets a (strong) conjunction and a (strong) disjunction with truth
degree functions

$$u \mathbin{\&} v = \max\{u + v - 1, 0\}, \qquad
u \veebar v = \min\{u + v, 1\}, \tag{2.6}$$

usually called the Łukasiewicz (arithmetical) conjunction and the
Łukasiewicz (arithmetical) disjunction. It should be mentioned that
these connectives are linked together via a De Morgan's law using the
standard negation of this system

$$\neg(u \mathbin{\&} v) = \neg u \veebar \neg v. \tag{2.7}$$

With the additional definitions

$$\varphi \wedge \psi =_{\mathrm{df}} \varphi \mathbin{\&} (\varphi \to_L \psi), \qquad
\varphi \vee \psi =_{\mathrm{df}} (\varphi \to_L \psi) \to_L \psi \tag{2.8}$$

one gets another (weak) conjunction $\wedge$ with truth degree function
min, and a further (weak) disjunction $\vee$ with max as truth degree
function, i.e., one has the conjunction and the disjunction of the
Gödel logics also available.
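
The claims attached to (2.5) through (2.8) are routine calculations, and a small numerical check may help the reader follow them. The Python sketch below is an illustration added here under stated assumptions: the function names are hypothetical, and the check runs over an arbitrarily chosen grid of eleven rational truth degrees rather than over the whole unit interval. It builds every derived connective from the two primitives of (2.4) only.

```python
from fractions import Fraction


def l_not(u):
    """Lukasiewicz negation, eq. (2.4)."""
    return 1 - u


def l_imp(u, v):
    """Lukasiewicz implication, eq. (2.4)."""
    return min(1, 1 - u + v)


def l_strong_and(u, v):
    """Strong conjunction defined from the primitives, eq. (2.5)."""
    return l_not(l_imp(u, l_not(v)))


def l_strong_or(u, v):
    """Strong disjunction defined from the primitives, eq. (2.5)."""
    return l_imp(l_not(u), v)


def l_weak_and(u, v):
    """Weak conjunction, eq. (2.8)."""
    return l_strong_and(u, l_imp(u, v))


def l_weak_or(u, v):
    """Weak disjunction, eq. (2.8)."""
    return l_imp(l_imp(u, v), v)


def check_derived_connectives(degrees):
    """Compare the defined connectives with the closed forms (2.6)-(2.8)."""
    for u in degrees:
        for v in degrees:
            assert l_strong_and(u, v) == max(u + v - 1, 0)   # (2.6)
            assert l_strong_or(u, v) == min(u + v, 1)        # (2.6)
            assert l_not(l_strong_and(u, v)) == l_strong_or(l_not(u), l_not(v))  # (2.7)
            assert l_weak_and(u, v) == min(u, v)             # (2.8)
            assert l_weak_or(u, v) == max(u, v)              # (2.8)
    return True


GRID = [Fraction(k, 10) for k in range(11)]  # sample degrees, an arbitrary choice
```

Calling `check_derived_connectives(GRID)` runs all five comparisons on every pair of grid points; a grid check of this kind illustrates the identities but does not replace the algebraic argument.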

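The infinite-valued propositional Łukasiewicz logic $L$, with
implication and negation as primitive connectives, has an adequate
axiomatization consisting of the axiom schemata:


(L$_\infty$1) $\varphi \to_L (\psi \to_L \varphi)$,

(L$_\infty$2) $(\varphi \to_L \psi) \to_L ((\psi \to_L \chi) \to_L (\varphi \to_L \chi))$,

(L$_\infty$3) $(\neg\psi \to_L \neg\varphi) \to_L (\varphi \to_L \psi)$,

(L$_\infty$4) $((\varphi \to_L \psi) \to_L \psi) \to_L ((\psi \to_L \varphi) \to_L \varphi)$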

together with the rule of detachment as the only infer- 
ence rule. 
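
That the four axiom schemata always take the designated degree 1, and that detachment preserves it, can be sampled numerically. The following Python sketch is an added illustration, not the author's method: the names `AXIOMS`, `is_tautology_on`, and `contraction` are hypothetical, and the test runs over an arbitrary finite grid of rationals, so it is a plausibility check rather than a soundness proof. The contraction schema is included only as a contrast, since it fails in this logic.

```python
from fractions import Fraction
from itertools import product


def neg(u):
    """Lukasiewicz negation, eq. (2.4)."""
    return 1 - u


def imp(u, v):
    """Lukasiewicz implication, eq. (2.4)."""
    return min(1, 1 - u + v)


# Truth degree functions of the four axiom schemata; p, q, r stand for the
# degrees of the subformulas (not every schema uses all three).
AXIOMS = {
    "L1": lambda p, q, r: imp(p, imp(q, p)),
    "L2": lambda p, q, r: imp(imp(p, q), imp(imp(q, r), imp(p, r))),
    "L3": lambda p, q, r: imp(imp(neg(q), neg(p)), imp(p, q)),
    "L4": lambda p, q, r: imp(imp(imp(p, q), q), imp(imp(q, p), p)),
}


def contraction(p, q, r):
    """(p -> (p -> q)) -> (p -> q): classically valid, used here as a contrast."""
    return imp(imp(p, imp(p, q)), imp(p, q))


def is_tautology_on(schema, degrees):
    """True if the schema takes degree 1 for every choice of degrees."""
    return all(schema(p, q, r) == 1 for p, q, r in product(degrees, repeat=3))


def detachment_preserves_one(degrees):
    """If p and p -> q both have degree 1, then q has degree 1."""
    return all(q == 1 for p in degrees for q in degrees
               if p == 1 and imp(p, q) == 1)


GRID = [Fraction(k, 10) for k in range(11)]  # sample degrees, an arbitrary choice
```

On this grid all four schemata pass `is_tautology_on`, whereas `contraction` does not: with $p = 1/2$ and $q = 0$ it takes the degree $1/2$.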

Later on, in Sect. 2.5, we will recognize another 
axiomatization because L is a particular t-norm-based 
logic. 


2.1.3 The Product Logic 


The product logic $\Pi$, explained in detail in [2.3], has the real
unit interval as truth degree set, has a fundamental conjunction
$\odot$ with the ordinary product of reals as its truth degree
function, and has an implication $\to_\Pi$ with the truth degree
function

$$u \to_\Pi v = \begin{cases} 1, & \text{if } u \le v; \\ \dfrac{v}{u}, & \text{if } u > v. \end{cases} \tag{2.9}$$

Additionally, it has a truth degree constant $\overline{0}$ to denote
the truth degree zero.

In this context, a negation and a further conjunction are defined as

$$\sim\varphi =_{\mathrm{df}} \varphi \to_\Pi \overline{0}, \qquad
\varphi \wedge \psi =_{\mathrm{df}} \varphi \odot (\varphi \to_\Pi \psi). \tag{2.10}$$

Routine calculations show that both connectives coincide with the
corresponding ones of the infinite-valued Gödel logic $G$. And also the
disjunction $\vee$ of this Gödel logic becomes available, now via the
definition

$$\varphi \vee \psi =_{\mathrm{df}} ((\varphi \to_\Pi \psi) \to_\Pi \psi) \wedge ((\psi \to_\Pi \varphi) \to_\Pi \varphi). \tag{2.11}$$

There is, however, no natural way to combine with this
(infinite-valued) product logic a whole family of finite-valued systems
by simply restricting the set of truth degrees to some $W_n$ as in the
previous two cases: besides $W_2$ no such set is closed under the
ordinary product, and for $W_2$ the product coincides, e.g., with the
minimum operation.

Later on, in Sect. 2.4, we will recognize an adequate axiomatization
because also the product logic $\Pi$ is a particular t-norm-based
logic. Contrary to the previous cases of $G$ and $L$, however, no
axiomatization essentially different from this later one is known.
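
The routine calculations behind (2.10) and (2.11), and the remark that only $W_2$ is closed under the ordinary product, can be spot-checked as follows. This Python sketch is an added illustration with hypothetical function names; it works with exact rationals on finite sample sets, which is an assumption made for testability and not a restriction of the logic itself.

```python
from fractions import Fraction


def p_and(u, v):
    """Product conjunction: the ordinary product of reals."""
    return u * v


def p_imp(u, v):
    """Product implication, eq. (2.9); u > v implies u > 0, so v / u is defined."""
    return Fraction(1) if u <= v else Fraction(v) / Fraction(u)


def p_not(u):
    """Negation defined via the constant 0, eq. (2.10)."""
    return p_imp(u, Fraction(0))


def p_min(u, v):
    """Defined conjunction of eq. (2.10)."""
    return p_and(u, p_imp(u, v))


def p_max(u, v):
    """Defined disjunction of eq. (2.11)."""
    return p_min(p_imp(p_imp(u, v), v), p_imp(p_imp(v, u), u))


def truth_degrees(n):
    """Finite truth degree set W_n = {0, 1/(n-1), ..., 1}, assuming n >= 2."""
    return [Fraction(k, n - 1) for k in range(n)]


def closed_under_product(n):
    """True if W_n is closed under the ordinary product."""
    degrees = set(truth_degrees(n))
    return all(u * v in degrees for u in degrees for v in degrees)


def coincides_with_goedel(degrees):
    """Compare the defined connectives with the Goedel ones of (2.1), (2.2)."""
    return all(
        p_min(u, v) == min(u, v)
        and p_max(u, v) == max(u, v)
        and p_not(u) == (1 if u == 0 else 0)
        for u in degrees for v in degrees
    )
```

Here `closed_under_product(2)` holds while `closed_under_product(n)` fails for the other small values of `n` tried in the test, in line with the remark about $W_2$; for $W_3$ the witness is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} \notin W_3$.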


2.1.4 The Post Logics 


The Post system $P_m$ for $m \ge 2$ has truth degree set $W_m$. These
propositional systems have been originally formulated uniformly in
negation and disjunction as basic connectives with the following truth
degree functions

$$\neg u = \begin{cases} 1, & \text{for } u = 0, \\ u - \dfrac{1}{m-1}, & \text{for } u \ne 0, \end{cases}
\qquad
u \vee v = \max\{u, v\}.$$

Contrary to the previous systems, the definition of negation here does
not seem to be given in a uniform way independent of the number of
truth degrees. However, it is always just a cyclic permutation of all
the truth degrees (in their natural order).
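
The cyclic behavior of the Post negation is easy to see by iterating it. The short Python sketch below is an added illustration; `post_not` and `negation_cycle` are hypothetical names, and the number of truth degrees `m` is passed explicitly because the negation depends on it.

```python
from fractions import Fraction


def post_not(u, m):
    """Post negation on W_m: 0 goes to 1, every other degree moves one step down."""
    return Fraction(1) if u == 0 else u - Fraction(1, m - 1)


def negation_cycle(m):
    """Degrees visited when the negation is applied repeatedly, starting from 1."""
    u = Fraction(1)
    cycle = [u]
    for _ in range(m - 1):
        u = post_not(u, m)
        cycle.append(u)
    return cycle
```

For `m = 4` the call `negation_cycle(4)` visits 1, 2/3, 1/3, 0 in this order, and one further application of the negation returns to 1, so every truth degree of $W_4$ is reached exactly once.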

For the sets of designated truth degrees, a canonical choice does not
exist; already Post [2.4] has discussed the possibility that truth
degrees different from 1 may be chosen as designated ones.
Nevertheless, $D^{P} = \{1\}$ is a kind of standard choice.

The set of basic connectives of each one of the Post systems $P_m$ is
functionally complete, i.e., allows one to represent every possible
truth degree function (over $W_m$). Therefore, each one of the Post
systems $P_m$, with $D^{P} = \{1\}$ as the set of designated truth
degrees, covers its corresponding Łukasiewicz system with the same set
of truth degrees, in the sense that the set of $L_m$-tautologies is
a subset of the set of $P_m$-tautologies, and that this set of
$P_m$-tautologies does not contain any formula $\varphi$ whose
Łukasiewicz negation $\neg\varphi$ is $L_m$-satisfiable. And the same
holds true for the corresponding $m$-valued Gödel system $G_m$.

If one enriches all the finitely many-valued (propositional)
Łukasiewicz systems $L_m$ with truth degree constants for all their
truth degrees, then these enriched systems $L_m^{*}$ become
functionally complete. And this means that the extended $m$-valued
Łukasiewicz systems $L_m^{*}$ and the $m$-valued Post logics become
interdefinable (for each fixed number $m$ of truth degrees). Hence
there is in principle no essential difference between both types of
(finitely valued) systems: all that can be expressed in the Post world
can also be expressed in the (extended) Łukasiewicz world, and vice
versa.

We omit a discussion of adequate axiomatizations because these Post
logics will not be of particular interest later on in this chapter. The
interested reader might consult [2.5].


2.1.5 Algebraic Semantics 


All these previously discussed many-valued logics have 
been introduced by their standard semantics. 

Besides these standard semantics, all these many- 
valued logics have also algebraic semantics determined 
by suitable classes K of truth degree structures. The 
situation is similar here to the case of classical logic: 
the logically valid formulas in classical logic are also 
just all those formulas which are valid in all Boolean 
algebras. 

Of course, these structures have the same signature
as the language $\mathcal{L}$ of the corresponding logic, and they
have to have (in the case that one discusses the
corresponding first-order logics) suprema and infima for
all those subsets which may appear as value sets of
formulas. In particular, they have to be (partially)
ordered, or at least preordered.

For each formula $\varphi$ of the language $\mathcal{L}$ of the
corresponding logic, for each such (generalized truth degree)
structure A, and for each evaluation e which maps the
set of atomic formulas of $\mathcal{L}$ into the carrier of A, one
has to define a value $\mathrm{Val}(\varphi, e)$, and finally one has to
define what it means that such a formula $\varphi$ is valid in
A. Then a formula $\varphi$ is logically valid w.r.t. this class
K iff $\varphi$ is valid in all structures from K.


Gödel and Łukasiewicz Logics
It is remarkable that for both these types of many- 
valued logics corresponding algebraic semantics have 
mainly been developed for the infinite-valued systems, 
and have been considered in the context of complete- 
ness proofs. 

For the infinite-valued Gödel logic G such a class of
structures is, according to the completeness proof given
by Dummett [2.6], the class of all Heyting algebras, i.e.,
of all relatively pseudo-complemented lattices, which
satisfy the prelinearity condition

$(u \rightharpoonup v) \cup (v \rightharpoonup u) = 1$ . (2.12)

Here $\cup$ is the lattice join and $\rightharpoonup$ the relative
pseudo-complement.


Many-Valued and Fuzzy Logics | 2.2 Fuzzy Sets 


For the infinite-valued Łukasiewicz logic L the
corresponding class of structures is the class of all
MV-algebras, first introduced again within a completeness
proof by Chang [2.7], and more recently extensively
studied in [2.8].

It is interesting to recognize that all these structures
(prelinear Heyting algebras, MV-algebras, and
product algebras) are Abelian lattice-ordered semigroups
with an additional residuation operation.

For the finite-valued logics from both families,
separately developed algebraic semantics have not yet
attracted considerable interest.


Product Logic 
The product logic, as introduced in [2.3], was from the 
very beginning designed as a logic which had, in par- 
allel, a standard semantics — provided by the real unit 
interval and by a product-based conjunction as a funda- 
mental connective — as well as an algebraic semantics, 
formed by the class of all product algebras — introduced 
in [2.3] again within a completeness proof. 

We shall not explain more details here because this 
whole approach proved to become paradigmatic for 
the development of t-norm-based infinite-valued logics, 
a topic which shall be discussed later on, starting with 
Sect. 2.3. 


Post Logics 
Contrary to the situation for the Łukasiewicz and the
Gödel systems, for the Post systems in their original
form there exist only very few syntactically oriented
studies toward constituting or investigating logical
calculi for these systems. Instead, for the Post systems one
mainly was interested in the corresponding algebraic
structures, which were suitable to form an algebraic
semantics, and investigated such structures earlier, and in
more detail, than similar structures for the Łukasiewicz
and the Gödel systems. Rosenbloom in a paper [2.9] of
1942 was the first one to do this. His algebraic
structures shall here be called P-algebras for short, but shall
not be considered in detail: the interested reader might, e.g.,
consult [2.5].

One of the main reasons for the difficulty and
complexity of the defining conditions of P-algebras is the
fact that the Post systems as well as the P-algebras have
only two primitive notions, their connectives resp. their
basic operations, but have maximal expressive power
in the sense of being functionally complete. That this
choice of the primitive notions really is the main
obstacle toward a simplification became clear when
Epstein [2.10] in 1960 changed these basic operations and
found a much simpler class of definitionally equivalent
algebras, now called Post algebras.

What is not covered by these basic considerations
are possible infinite-valued generalizations of these
logical calculi, or of these Post algebras. Approaches
toward this problem started, e.g., with papers on
generalizations of the notion of Post algebras like [2.11-13].
The most influential paper, however, which also
discussed the corresponding logical systems was the
paper [2.14] of Rasiowa in which Post algebras of the
order $\omega + 1$ and the corresponding systems of infinitely
many-valued (first-order) logic have been introduced.
The algebraic theory of these Post algebras of the order
$\omega + 1$ is partly given in [2.14].

Another such infinitely many-valued generalization
of the standard Post systems, based on Post algebras of
the order $\omega + \omega^*$, is discussed, e.g., in [2.15, 16].

The Post algebras of finite or infinite order and
the systems of many-valued logic related with them
seem to be of particular importance for investigations in
computer science which rely on many-valued logic as
a toolbox, because these Post systems are functionally
complete and well suited to study the representability
of truth degree functions on the basis of some
predetermined set of basic truth degree functions, as
determined, e.g., by available electronic components,
cf. [2.17] for a still good introduction.


2.2 Fuzzy Sets


A fuzzy set A is usually a fuzzy subset of a given
set X and characterized by its membership function
$\mu_A : X \to [0, 1]$. The set X is often called the universe
of discourse. This notation derives from [2.18]. So these
fuzzy sets are (possibly) first-level objects of a cumulative
hierarchy, with the elements of X as urelements.
But the usual applications do not need higher level
fuzzy sets. And also for our discussion of the background
logic such higher level fuzzy sets do not matter.


2.2.1 Set Algebra for Fuzzy Sets 


Mathematically, it is customary to identify such a fuzzy 
subset of X with its membership function. Accordingly, 




$\mu_A(a)$ and $A(a)$ both are used to denote the membership
degree of the object $a \in X$ w.r.t. the fuzzy set A.
For any binary operation $*$ between membership degrees
the pointwise approach means to define from it
a binary operation $\circledast$ for fuzzy sets such that the fuzzy
set $A \circledast B$ is characterized by


$(A \circledast B)(x) = A(x) * B(x)$ for all $x \in X$. (2.13)


Hence, Zadeh's standard intersection $A \cap B$ and union
$A \cup B$ are characterized by


$(A \cap B)(x) = \min\{A(x), B(x)\}$ ,
$(A \cup B)(x) = \max\{A(x), B(x)\}$ . (2.14)


Additionally, again following the first proposal
from [2.18], one usually also defines the complement
$C_X A$ of a fuzzy set A by the condition


$C_X A(x) = 1 - A(x)$ for all $x \in X$. (2.15)


However, in [2.18] also other versions of such binary
operations had been mentioned: an algebraic product
$AB$ and an algebraic sum $A + B$ defined through


$AB(x) = A(x) \cdot B(x)$ ,
$(A + B)(x) = \min\{A(x) + B(x), 1\}$ (2.16)


as well as an absolute difference $|A - B|$ defined by

$|A - B|(x) = |A(x) - B(x)|$ . (2.17)


It is interesting to notice, and shall be explained in
more detail in Sect. 2.2.2, that the operations (2.16) can
be seen as generalized kinds of intersection and union
operations, respectively. However, these operations are
not idempotent: one has in general $AA \ne A$ as well as
$A + A \ne A$.
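
The pointwise approach (2.13) and the operations (2.14) to (2.17) can be sketched as follows. This is only an illustration under stated assumptions: fuzzy sets over a finite universe are represented as Python dictionaries from elements to membership degrees, and the names `pointwise`, `intersection`, `union`, `alg_product`, `alg_sum`, `abs_difference`, and `complement`, as well as the sample sets A and B, are hypothetical.

```python
from fractions import Fraction


def pointwise(op):
    """Lift a binary operation on membership degrees to fuzzy sets, cf. (2.13)."""
    def lifted(A, B):
        return {x: op(A[x], B[x]) for x in A}
    return lifted


intersection = pointwise(min)                          # (2.14)
union = pointwise(max)                                 # (2.14)
alg_product = pointwise(lambda u, v: u * v)            # (2.16)
alg_sum = pointwise(lambda u, v: min(u + v, 1))        # (2.16)
abs_difference = pointwise(lambda u, v: abs(u - v))    # (2.17)


def complement(A):
    """Complement of a fuzzy set, cf. (2.15)."""
    return {x: 1 - A[x] for x in A}


# Two sample fuzzy subsets of the (assumed) universe X = {"a", "b", "c"}.
A = {"a": Fraction(1, 2), "b": Fraction(1), "c": Fraction(0)}
B = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(3, 4)}

idempotent_min = intersection(A, A) == A       # True
idempotent_product = alg_product(A, A) == A    # False: 1/2 * 1/2 = 1/4
idempotent_sum = alg_sum(A, A) == A            # False: min{1/2 + 1/2, 1} = 1
```

Exact fractions are used instead of floating point numbers so that the failure of idempotency is not blurred by rounding.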


2.2.2 Fuzzy Sets and Many-Valued Logic 


It is well known that there is a strong parallelism
between the standard set algebraic operations of
intersection, union, and complementation, and classical
logic, namely, the operations of conjunction, disjunction,
and negation, determined by their truth value
functions et, vel, non, respectively. So one usually
defines, e.g., the intersection $M \cap N$ of sets M, N by the
condition


$x \in M \cap N \iff x \in M \wedge x \in N$ , (2.18)


with $\wedge$ for the conjunction operation of classical logic
here.


In more abstract terms, the idea here is that these
operations are defined in such a way that the power set
algebra $P(M)$ of any set M is (isomorphic to) the direct
product $W^M = \prod_{x \in M} W$ of the Boolean algebra
$W = \langle \{1, 0\}, \mathrm{et}, \mathrm{vel}, \mathrm{non} \rangle$ of truth values of classical logic.

A similar relationship can be recognized between
the set algebra of fuzzy sets and suitable many-valued
logics. It is simply necessary to consider the set of
membership degrees for the fuzzy sets as the set of truth
degrees for a corresponding many-valued logic. So one
can consider the operations (2.14) as intersection and
union related to the Gödel logic G, or also related to
the Łukasiewicz logic L. Similarly, the complementation
(2.15) is related to the negation operation of
the Łukasiewicz logics, and the algebraic operations
in (2.16) are an intersection operation with respect to
product logic, and a union operation with respect to
(the strong disjunction of) Łukasiewicz logic. Even the
operation (2.17) can be defined via Łukasiewicz logic:
one gets immediately via the corresponding truth degree
functions


$|u - v| = \neg((u \to v) \mathbin{\&} (v \to u))$ . (2.19)
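
Equation (2.19) can be checked numerically. The sketch below assumes the usual truth degree functions of the Łukasiewicz systems, namely $\min\{1, 1 - u + v\}$ for the implication, $\max\{0, u + v - 1\}$ for the strong conjunction, and $1 - u$ for the negation; the function names are hypothetical.

```python
from fractions import Fraction


def luk_imp(u, v):
    """Lukasiewicz implication."""
    return min(1, 1 - u + v)


def luk_and(u, v):
    """Lukasiewicz strong conjunction."""
    return max(0, u + v - 1)


def luk_neg(u):
    """Lukasiewicz negation."""
    return 1 - u


def abs_diff_via_logic(u, v):
    """Right-hand side of (2.19)."""
    return luk_neg(luk_and(luk_imp(u, v), luk_imp(v, u)))


# Compare both sides of (2.19) on a finite grid of exact truth degrees.
grid = [Fraction(k, 10) for k in range(11)]
all_equal = all(abs_diff_via_logic(u, v) == abs(u - v) for u in grid for v in grid)
```

A grid check of this kind is of course no proof, but it makes the case distinction behind (2.19) easy to follow: for $u \le v$ the first implication has degree 1, and the conjunction reduces to $1 - v + u$.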


In more abstract terms, again, the set algebraic operations
with respect to a particular [0, 1]-valued logic L
should be defined in such a way that the class
$F(X) = [0, 1]^X$ of fuzzy subsets of a universe of discourse X is
(isomorphic to) the direct product $W^X = \prod_{x \in X} W$ of
the truth degree algebra $W = \langle [0, 1], \ldots \rangle$ of this
particular [0, 1]-valued logic.


2.2.3 t-Norms and t-Conorms 


For the previously mentioned nonidempotent intersec- 
tion operation, i.e., the algebraic product from (2.16), 
and for further similar possibilities the mathematically 
oriented part of the fuzzy community reached, mainly 
in the first half of the 1980s, a consensus that such 
generalized intersection operations should be defined 
via (2.13) from a triangular norm $*$. Such triangular
norms (t-norms for short) had first been considered in
the context of probabilistic metric spaces to get a suitable
version of a triangle inequality, cf., e.g., [2.19], and
have since attracted independent interest in different contexts,
cf. [2.20, 21]. They are isotonic, associative, and commutative
binary operations in the unit interval which
have 1 as their neutral element. This means that they
make the unit interval an ordered monoid.

The class of all t-norms is, however, very large
and not yet really well understood. So the question
arises of restricting to suitable subclasses, e.g., to the
continuous t-norms or to the left-continuous ones. (For
a t-norm T left-continuity means that all the unary functions
$T_a$ with $T_a(x) = T(a, x)$ for each $a \in [0, 1]$ are
left-continuous. For continuity the conditions (i) that
T is continuous as a binary function and (ii) that all
the $T_a$ are continuous coincide [2.5, 20].) Standard examples
for continuous t-norms are the min-operation
in [0, 1], also called Gödel t-norm $T_G$, the arithmetic
product in [0, 1], also called product t-norm $T_P$, and
the Łukasiewicz t-norm $T_L : (u, v) \mapsto \max\{u + v - 1, 0\}$
which is the truth degree function of the strong conjunction
in the Łukasiewicz many-valued systems. And
a standard example for a left-continuous t-norm which
is not continuous is the nilpotent minimum $T_{NM}$ defined as


$T_{NM}(u, v) = \begin{cases} \min\{u, v\}, & \text{if } u + v > 1, \\ 0 & \text{otherwise.} \end{cases}$ (2.20)


These examples for continuous t-norms are even
characteristic in the sense that each continuous t-norm
is an ordinal sum of isomorphic versions of $T_L$, $T_P$, $T_G$,
cf. [2.20] and also [2.5].
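
The four standard examples can be written down directly, and the defining properties of a t-norm can be tested on a finite grid. The following is a minimal sketch, not a proof: the grid check only samples the unit interval, and the names `t_godel`, `t_product`, `t_lukasiewicz`, `t_nilpotent_min`, and `is_t_norm_on_grid` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def t_godel(u, v):
    """Goedel t-norm T_G."""
    return min(u, v)


def t_product(u, v):
    """Product t-norm T_P."""
    return u * v


def t_lukasiewicz(u, v):
    """Lukasiewicz t-norm T_L."""
    return max(u + v - 1, 0)


def t_nilpotent_min(u, v):
    """Nilpotent minimum T_NM, cf. (2.20)."""
    return min(u, v) if u + v > 1 else 0


GRID = [Fraction(k, 8) for k in range(9)]


def is_t_norm_on_grid(T, grid=GRID):
    """Check commutativity, associativity, isotonicity and neutral element 1 on a grid."""
    comm = all(T(u, v) == T(v, u) for u, v in product(grid, repeat=2))
    assoc = all(T(T(u, v), w) == T(u, T(v, w)) for u, v, w in product(grid, repeat=3))
    mono = all(T(u, w) <= T(v, w) for u, v, w in product(grid, repeat=3) if u <= v)
    unit = all(T(u, 1) == u for u in grid)
    return comm and assoc and mono and unit
```

Among these four operations only $T_G$ is idempotent: for the argument pair (1/2, 1/2) one gets 1/2, 1/4, 0, and 0, respectively.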

To explain what is meant by an isomorphic version
of some t-norm, one has to start from an order automorphism
f of the unit interval, i.e., from a continuous 1-1
onto map $f : [0, 1] \to [0, 1]$ with $f(0) = 0$ and $f(1) = 1$.
If now T is a t-norm and $T^* : [0, 1]^2 \to [0, 1]$ is defined by


$T^*(x, y) = f^{-1}(T(f(x), f(y)))$ , (2.21)


which equivalently means


$f(T^*(x, y)) = T(f(x), f(y))$ , (2.22)


then $T^*$ is again a t-norm and called an isomorphic version
of T, and T, $T^*$ are isomorphic t-norms.

Parallel with t-norms one often also considers
t-conorms: these are isotonic, associative, and commutative
binary operations in the unit interval which have 0
as their neutral element. For the set algebra of fuzzy sets
they define (possibly) nonidempotent unions, and for
the background logics they constitute (possibly)
nonidempotent disjunctions.

There is a natural 1-1 duality between t-norms and
t-conorms. By


$1 - S(u, v) = T(1 - u, 1 - v)$ (2.23)


one determines a t-conorm S for any t-norm T, and conversely
determines a t-norm T for any t-conorm S. This
relationship connects, e.g., the truth degree function
$(u, v) \mapsto \max\{u + v - 1, 0\}$ of the Łukasiewicz strong
conjunction with that one of the corresponding strong
disjunction $(u, v) \mapsto \min\{u + v, 1\}$.

Obviously (2.23) constitutes, for the background
logic, a de Morgan connection between suitably chosen
conjunctions and disjunctions, as long as the function
$u \mapsto 1 - u$ acts as the truth degree function of a negation.
And indeed this is the truth degree function of
the negation of the Łukasiewicz systems, which was already
used in the definition (2.15) of the complement of
a fuzzy set.

Summing up, one has for the background logic
idempotent weak connectives for conjunction and disjunction,
determined by the minimum and the maximum
operation in [0, 1]. Furthermore, one is interested
to have (possibly) nonidempotent strong connectives
for conjunction and disjunction, determined by a t-norm
and a t-conorm, usually one the dual of the other according
to (2.23).


2.3 t-Norm-Based Logics


From the point of view of many-valued logic, a t-norm
is a suitable candidate for a truth degree function of
some generalized conjunction connective. Accepting
this, one is essentially concerned with systems of many-valued
logic with infinite truth degree set [0, 1]. And
additionally one prefers to consider such systems which
have the truth degree 1 as the only designated truth degree.
(This means, e.g., that a formula of the language
of such a system counts as logically valid just in case
it always assumes this designated truth degree 1. This
notion, as well as the other notions from many-valued
logic, are explained in detail, e.g., in [2.5].)
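
The duality (2.23) between t-norms and t-conorms and the construction (2.21) of isomorphic versions, both from Sect. 2.2.3, can be sketched as follows. The helper names `dual_conorm` and `isomorphic_version` are hypothetical, and the automorphism $f(x) = x^2$ with inverse $\sqrt{x}$ is chosen only as an example; since it involves square roots, that part of the sketch works with floating point numbers.

```python
import math
from fractions import Fraction


def t_lukasiewicz(u, v):
    """Lukasiewicz t-norm."""
    return max(u + v - 1, 0)


def t_product(u, v):
    """Product t-norm."""
    return u * v


def dual_conorm(T):
    """t-conorm S determined by 1 - S(u, v) = T(1 - u, 1 - v), cf. (2.23)."""
    return lambda u, v: 1 - T(1 - u, 1 - v)


def isomorphic_version(T, f, f_inv):
    """T*(x, y) = f^{-1}(T(f(x), f(y))), cf. (2.21)."""
    return lambda x, y: f_inv(T(f(x), f(y)))


# Duals of the Lukasiewicz, product and minimum t-norms.
s_lukasiewicz = dual_conorm(t_lukasiewicz)   # (u, v) -> min{u + v, 1}
s_product = dual_conorm(t_product)           # (u, v) -> u + v - u * v
s_max = dual_conorm(min)                     # (u, v) -> max{u, v}


# An isomorphic version of the Lukasiewicz t-norm under f(x) = x^2 (example choice).
def square(x):
    return x * x


t_star = isomorphic_version(t_lukasiewicz, square, math.sqrt)
```

For instance, `s_lukasiewicz(Fraction(3, 4), Fraction(1, 2))` evaluates to 1, in accordance with the strong disjunction $\min\{u + v, 1\}$, and `t_star(0.5, 1.0)` returns 0.5, showing that 1 remains the neutral element.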


2.3.1 Basic Ideas 


Such a system of many-valued logic is called t-norm
based (on some particular t-norm T) iff all the other
connectives of it have associated truth degree functions
which are defined from this t-norm T, using possibly
some truth degree constants. Usually one considers
together with the conjunction connective & with the truth
degree function T an implication connective $\to_T$ with the
truth degree function $I_T$ characterized by

$I_T(u, v) =_{\mathrm{df}} \sup\{z \mid T(u, z) \le v\}$ , (2.24)

the so-called R-implication connected with T, and
a standard negation connective $-_T$ with truth degree
function $n_T$ given as

$n_T(u) =_{\mathrm{df}} I_T(u, 0)$ . (2.25)

As shall be explained in Sect. 2.3.2, the definition
(2.24) determines a reasonable implication function just
in the case that the t-norm T is left continuous. Here
reasonable essentially means that $\to_T$ satisfies a suitable
version of the rule of detachment.

In more technical terms it means that for left continuous
t-norms T condition (2.24) defines a residuation
operator $I_T$, previously sometimes also called
$\varphi$-operator, cf. [2.22]. And it means also, under this
assumption of left continuity of T, that condition (2.24) is
equivalent to the adjointness condition

$T(u, w) \le v \iff w \le I_T(u, v)$ , (2.26)

i.e., that the operations T and $I_T$ form an adjoint pair.
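
For the three standard continuous t-norms the R-implication (2.24) has a well-known closed form, and the adjointness condition (2.26) can be tested on a grid. The sketch below states these closed forms as assumptions of the illustration; `residuum_on_grid` is a hypothetical finite stand-in for the supremum in (2.24) and agrees with the true residuum only when the grid is closed under the operations involved, as is the case here for the Gödel and the Łukasiewicz t-norm.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(k, 10) for k in range(11)]


def t_godel(u, v):
    return min(u, v)


def t_product(u, v):
    return u * v


def t_lukasiewicz(u, v):
    return max(u + v - 1, 0)


def i_godel(u, v):
    """Residuum of the Goedel t-norm."""
    return 1 if u <= v else v


def i_product(u, v):
    """Residuum of the product t-norm (u > v >= 0 in the second case, so u != 0)."""
    return 1 if u <= v else v / u


def i_lukasiewicz(u, v):
    """Residuum of the Lukasiewicz t-norm."""
    return min(1, 1 - u + v)


def residuum_on_grid(T, grid=GRID):
    """max{z in grid | T(u, z) <= v}: a finite stand-in for the sup in (2.24)."""
    return lambda u, v: max(z for z in grid if T(u, z) <= v)


def adjoint_on_grid(T, I, grid=GRID):
    """Check T(u, w) <= v  <=>  w <= I(u, v) for all grid points, cf. (2.26)."""
    return all((T(u, w) <= v) == (w <= I(u, v))
               for u, v, w in product(grid, repeat=3))
```

A mismatched pair such as the product t-norm with the Gödel residuum fails the check, which illustrates that the residuum is determined by the t-norm.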

Forced by these results one usually restricts, in this 
logical context, the considerations to left continuous — 
or even to continuous — t-norms. 

But together with this restriction of the t-norms, 
a generalization of the possible truth degree sets some- 
times is useful: one may accept each subset of the unit 
interval [0, 1] as a truth degree set which is closed under 
the particular t-norm T and its residuum. 

The restriction to continuous t-norms enables even
the definition of the operations max and min, which
make [0, 1] into a (linearly) ordered lattice. On the one
hand, one has from straightforward calculations that always


$\min\{u, v\} = T(u, I_T(u, v))$ , (2.27)


and on the other hand one gets always [2.23, 24]


$\max\{u, v\} = \min\{I_T(I_T(u, v), v), I_T(I_T(v, u), u)\}$ . (2.28)
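
Both identities (2.27) and (2.28) can be checked on a grid for the three standard continuous t-norms. The sketch assumes the usual closed forms of their residua; the names `PAIRS`, `min_via_residuum`, `max_via_residuum`, and `recovers_lattice` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ONE = Fraction(1)
ZERO = Fraction(0)

# Each entry pairs a continuous t-norm with its residuum (assumed closed forms).
PAIRS = {
    "Goedel": (lambda u, v: min(u, v),
               lambda u, v: ONE if u <= v else v),
    "product": (lambda u, v: u * v,
                lambda u, v: ONE if u <= v else v / u),
    "Lukasiewicz": (lambda u, v: max(u + v - 1, ZERO),
                    lambda u, v: min(ONE, 1 - u + v)),
}

GRID = [Fraction(k, 12) for k in range(13)]


def min_via_residuum(T, I):
    """Right-hand side of (2.27)."""
    return lambda u, v: T(u, I(u, v))


def max_via_residuum(I):
    """Right-hand side of (2.28)."""
    return lambda u, v: min(I(I(u, v), v), I(I(v, u), u))


def recovers_lattice(T, I, grid=GRID):
    """Check that (2.27) and (2.28) yield min and max on all grid points."""
    lo, hi = min_via_residuum(T, I), max_via_residuum(I)
    return all(lo(u, v) == min(u, v) and hi(u, v) == max(u, v)
               for u, v in product(grid, repeat=2))
```

Exact fractions are used so that the quotient v / u in the product case does not introduce rounding errors.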


It is a routine matter to check that the infinite-valued
Gödel logic G, the infinite-valued Łukasiewicz logic L,
and also the product logic $\Pi$ all are t-norm-based logics
in the present sense.

The systems of fuzzy logic we discuss here are
also sometimes called R-fuzzy logics, stressing the
fact that our implication connectives $\to_T$ have as truth
degree functions $I_T$ the residuation operations,
characterized by (2.24) or (2.26). Besides these R-fuzzy
logics one occasionally, e.g., in [2.25, 26], discusses
so-called S-fuzzy logics which are also based on some
t-norm, but additionally take the Łukasiewicz negation
$n_L(u) = 1 - u$ or also some other negation function,
sometimes together with a further t-conorm, as a basic
connective.

These S-fuzzy logics define their implication connective
like material implication might be defined in
classical logic. However, these logics lose, in general,
the rule of detachment as a sound rule of inference if
they have the degree 1 as the only designated truth degree;
the alternative is to allow all positive reals from (0, 1] as
designated truth degrees.

For a complete development of such t-norm-based
logics one needs adequate axiomatizations. This seems
to be, however, a difficult goal, essentially because
of its dependence on the particular choice of the
t-norms which determine these logics. Therefore, the first
successful approaches intended to axiomatize common
parts of a whole class of such logics. This will be
discussed later in Sect. 2.4.


2.3.2 Left and Full Continuity of t-Norms 


As had been mentioned in the previous section, the 
adjointness condition is an algebraic equivalent of the 
analytical notion of left continuity. This will be proved 
here. 


Definition 2.1 

A t-norm T is left continuous (continuous) iff all the
unary functions $T_a : x \mapsto T(x, a)$ for $a \in [0, 1]$ are left
continuous (continuous).


This definition of continuity for t-norms via their 
unary parametrizations coincides with the usual defini- 
tion of continuity for a binary function, cf. [2.20]. 


Proposition 2.1 
A t-norm T is left continuous iff T and its R-implication
$I_T$ form an adjoint pair.


Proofs are given, e.g., in [2.5, 20, 22]. 




It is interesting, and important later on, to also no- 
tice that the continuity of a t-norm has an algebraic 
equivalent. 


Proposition 2.2 
A t-norm T is continuous iff T and $I_T$ satisfy the equation


$T(a, I_T(a, b)) = \min\{a, b\}$ . (2.29)


Proof: Assume first that T is continuous. Then one
has for $a \le b \in [0, 1]$ immediately $T(a, I_T(a, b)) =
T(a, 1) = a = \min\{a, b\}$. And one has for $b < a$


$T(a, I_T(a, b)) = T(a, \max\{z \mid T(a, z) \le b\})
 = \max\{T(a, z) \mid T(a, z) \le b\} \le b$ (2.30)


already by the left continuity of T. Continuity of T furthermore
gives from $0 = T(a, 0) \le b < a = T(a, 1)$ the
existence of some $c \in [0, 1]$ with $b = T(a, c)$, and thus
$T(a, I_T(a, b)) = b = \min\{a, b\}$ by (2.30).

Assume conversely (2.29). Then the adjointness
condition forces T to be left continuous. Hence for the
continuity of T one has to show that T is also right continuous.

Suppose that this is not the case. Then there exist
$a, b \in [0, 1]$, and also a decreasing sequence $(x_i)_{i \ge 0}$ with
$\lim_{i \to \infty} x_i = b$ such that $T(a, b) \ne \inf_i T(a, x_i)$, i.e.,
such that $T(a, b) < \inf_i T(a, x_i)$. Consider now some
d with $T(a, b) < d < \inf_i T(a, x_i) \le a$. Then there does
not exist some $c \in [0, 1]$ with $d = T(a, c)$, because otherwise
one would have $d = T(a, c) > T(a, b)$, hence
$c > b$ and thus $\inf_i T(a, x_i) \le T(a, c) = d$ from the fact
that $b = \lim_{i \to \infty} x_i$ and there thus exists some integer k
with $x_k < c$. Since $d < a$, however, condition (2.29) yields
$T(a, I_T(a, d)) = \min\{a, d\} = d$, i.e., such a c has to exist.
This means that the lack of right continuity
for T contradicts condition (2.29). $\Box$
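
Proposition 2.2 can be illustrated with the nilpotent minimum (2.20), which is left continuous but not continuous. The sketch below assumes the known closed form of its R-implication, $I_{NM}(u, v) = 1$ for $u \le v$ and $\max\{1 - u, v\}$ otherwise; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)
GRID = [Fraction(k, 8) for k in range(9)]


def t_nm(u, v):
    """Nilpotent minimum, cf. (2.20)."""
    return min(u, v) if u + v > 1 else ZERO


def i_nm(u, v):
    """R-implication of the nilpotent minimum (assumed closed form)."""
    return ONE if u <= v else max(1 - u, v)


def t_luk(u, v):
    """Lukasiewicz t-norm, a continuous t-norm for comparison."""
    return max(u + v - 1, ZERO)


def i_luk(u, v):
    """R-implication of the Lukasiewicz t-norm."""
    return min(ONE, 1 - u + v)


def adjoint_on_grid(T, I, grid=GRID):
    """Adjointness (2.26), checked on all grid points."""
    return all((T(u, w) <= v) == (w <= I(u, v))
               for u, v, w in product(grid, repeat=3))


def divisibility_failures(T, I, grid=GRID):
    """All grid pairs (a, b) which violate equation (2.29)."""
    return [(a, b) for a in grid for b in grid if T(a, I(a, b)) != min(a, b)]
```

For a = 3/4 and b = 1/8 one gets $I_{NM}(a, b) = 1/4$ and $T_{NM}(3/4, 1/4) = 0$, which differs from $\min\{a, b\} = 1/8$. So the nilpotent minimum satisfies adjointness on the grid but violates (2.29), whereas the Łukasiewicz t-norm produces no violation.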


2.3.3 Extracting an Algebraic Framework 


For the problem of adequate axiomatization of (classes 
of) t-norm-based systems of many-valued logic there is 
an important difference to the standard approach toward 
semantically based systems of many-valued logic: here 
there is no single, standard semantical matrix for the 
general approach. 

The most appropriate way out of this situation 
seems to be: to find some suitable class(es) of algebraic 
structures which can be used to characterize these log- 
ical systems, and which preferably should be algebraic 
varieties, i. e., equationally definable. 


From an algebraic point of view, the following con- 
ditions seem to be structurally important for t-norms: 


• $([0, 1], T, 1)$ is a commutative semigroup with
a neutral element, i.e., a commutative monoid,

• $\le$ is a (lattice) ordering in [0, 1] which has 0 as
universal lower bound and 1 as universal upper bound,

• Both structures fit together: T is nondecreasing
w.r.t. this lattice ordering.


Thus it seems reasonable to consider commutative 
lattice-ordered monoids as the truth degree structures 
for the t-norm-based systems. 

In general, however, commutative lattice-ordered
monoids may have different elements as the universal
upper bound of the lattice and as the neutral element of
the monoid. This is not the case for the t-norm-based
systems: they make [0, 1] into an integral commutative
lattice-ordered monoid as truth degree structure,
namely, one in which the universal upper bound of the
lattice ordering and the neutral element of the monoidal
structure coincide.

Furthermore, one also likes to have the t-norm T 
combined with another operation, its R-implication op- 
erator, which forms together with T an adjoint pair: i. e., 
the commutative lattice-ordered monoid formed by the 
truth degree structure has also to be a residuated one. 

Summing up, hence, we are going to consider residuated
lattices, i.e., algebraic structures
$\langle L, \cap, \cup, *, \rightharpoonup, 0, 1 \rangle$ such that L is a lattice under $\cap$, $\cup$ with the
universal lower bound 0 and the universal upper bound 1,
and a commutative lattice-ordered monoid under $*$ with
neutral element 1, and such that the operations $*$ and $\rightharpoonup$
form an adjoint pair, i.e., satisfy

$x * z \le y \iff z \le (x \rightharpoonup y)$ . (2.31)

In this framework one additionally introduces, following
the understanding of the negation connective given
in (2.25), a further operation $-$ by


$-x =_{\mathrm{df}} x \rightharpoonup 0$ . (2.32)


Definition 2.2

A lattice-ordered monoid $\langle L, *, 1, \le \rangle$ is divisible iff for
all $a, b \in L$ with $a \le b$ there exists some $c \in L$ with
$a = b * c$.


For linearly ordered residuated lattices, one has an- 
other nice and useful characterization of divisibility. 




Proposition 2.3 

A linearly ordered residuated lattice
$\langle L, \cap, \cup, *, \rightharpoonup, 0, 1 \rangle$ is divisible, i.e., corresponds to a divisible
lattice-ordered monoid $\langle L, *, 1, \le \rangle$, iff one has
$a \cap b = a * (a \rightharpoonup b)$ for all $a, b \in L$. (Of course, $\le$ here is the lattice
ordering of the lattice $\langle L, \cap, \cup \rangle$.)


Proof: We first show that one has in each residuated lattice

$a * (a \rightharpoonup b) = b \iff \exists x\,(a * x = b)$ (2.33)

for all $a, b \in L$. Of course, in the case $a * (a \rightharpoonup b) = b$
there exists an x such that $a * x = b$. So suppose $a * c = b$
for some $c \in L$. If one then would have $a * (a \rightharpoonup b) \ne b$,
this would mean $a * (a \rightharpoonup b) < b = a * c$ because one
always has $a * (a \rightharpoonup b) \le b$ by the adjointness condition,
and this hence would mean $c \not\le a \rightharpoonup b$ (because
otherwise $c \le a \rightharpoonup b$ and hence $b = a * c \le a * (a \rightharpoonup b)$
would be the case) and therefore also $a * c = c * a \not\le b$
by the adjointness condition, a contradiction. Thus
(2.33) is established.

Supposing now the divisibility of
$\langle L, \cap, \cup, *, \rightharpoonup, 0, 1 \rangle$, then one has for all $b \le a \in L$ from the existence
of an x such that $b = a * x$ immediately $a * (a \rightharpoonup b) =
b = a \cap b$. Otherwise one has $a \le b$ by the linearity of
the ordering and hence $a \rightharpoonup b = 1$ from the adjointness
property, thus $a * (a \rightharpoonup b) = a * 1 = a = a \cap b$.

Assume on the other hand that one always has
$a \cap b = a * (a \rightharpoonup b)$. Then, for all $a \le b \in L$, from
$a = a \cap b = b \cap a$ one gets the equation $a = b * (b \rightharpoonup a)$,
and hence there is an x such that $a = b * x$. $\Box$

Using this result, we can restate Proposition 2.2 in 
the following way. 


Corollary 2.1 
A t-algebra $[0, 1]_T = \langle [0, 1], \min, \max, T, I_T, 0, 1 \rangle$ is
divisible iff the t-norm T is continuous.


A further restriction is suitable w.r.t. the class of
residuated lattices because each t-algebra $[0, 1]_T$ is linearly
ordered, and thus makes particularly the wff
$(\varphi \to \psi) \vee (\psi \to \varphi)$ valid. Following Hájek [2.23, 24],
one calls BL-algebras those divisible residuated lattices
which also satisfy the prelinearity condition (2.12).


Definition 2.3

A structure $L = \langle L, \vee, \wedge, *, \rightharpoonup, 0, 1 \rangle$ is a BL-algebra
iff:

i) $\langle L, \vee, \wedge, 0, 1 \rangle$ is a bounded lattice with lattice ordering $\le$,

ii) $\langle L, *, 1, \le \rangle$ is a lattice-ordered Abelian monoid,

iii) the operations $*$ and $\rightharpoonup$ satisfy the adjointness
condition

$x * y \le z \iff x \le y \rightharpoonup z$ , (2.34)

iv) the prelinearity condition (2.12) is satisfied,

v) the divisibility condition is satisfied, i.e., one has
always


$x * (x \rightharpoonup y) = x \wedge y$ . (2.35)


It is interesting to notice that the prelinearity condition
(2.12) can equivalently be characterized in another
form, which will become important later on.


Proposition 2.4
In residuated lattices the following conditions are equivalent:


(i) $(x \rightharpoonup y) \cup (y \rightharpoonup x) = 1$ ,
(ii) $((x \rightharpoonup y) \rightharpoonup z) * ((y \rightharpoonup x) \rightharpoonup z) \le z$ .


The proof is by routine calculations, cf., e.g., [2.5, 23].


2.4 Particular Fuzzy Logics


Now we shall discuss the core systems of t-norm-based
logics. Of course, it would be preferable to be able
to axiomatize each single t-norm-based logic directly.
At present, however, there is no way to do so. Hence other
approaches have been developed. The core idea is first
to develop systems which cover large parts which are
common to all those t-norm-based logics.

The first successful approach came from Hájek who
presented in 1998, in the seminal monograph [2.23], the
logic BL of all continuous t-norms, i.e., the common
part of all the t-norm-based logics which are determined
by a continuous t-norm. Inspired by this work, a short
time later Esteva and Godo [2.27] introduced in 2001 the
logic MTL of all left-continuous t-norms.




These logics are characterized by algebraic semantics:
BL by the class of all t-algebras with a continuous
t-norm, and MTL by the class of all t-algebras with
a left-continuous t-norm. All those t-algebras are particular
cases of residuated lattices.

It should be noticed, however, that already in 1996
Höhle [2.28] introduced the monoidal logic ML,
characterized by the class of all residuated lattices as its
algebraic semantics.

And it should also be mentioned that, in the case of 
logics which are determined by an algebraic semantics, 
the problem of their adequate axiomatization becomes 
particularly well manageable, if the algebraic semantics 
is given as a variety of algebraic structures, i.e., as an 
equationally definable class of algebraic structures. 


2.4.1 The Logic BL of All Continuous t-Norms


The class of t-algebras (with a continuous t-norm or 
not) is not a variety: it is not closed under direct prod- 
ucts because each t-algebra is linearly ordered. Hence 
one may expect that it would be helpful for the devel- 
opment of a logic of continuous t-norms to extend the 
class of all divisible t-norm algebras in a moderate way 
to get a variety. 

And indeed this idea works: it was developed by 
Hájek and in detail explained in [2.23]. 

The core point is that one considers instead of the
divisible t-algebras $[0, 1]_T$, which are linearly ordered
integral monoids, lattice-ordered integral monoids
which satisfy the condition (2.35), which have an additional
residuation operation connected with the semigroup
operation via an adjointness condition (2.26), and
which also satisfy the prelinearity condition


$(x \rightharpoonup y) \vee (y \rightharpoonup x) = 1$ , (2.36)


or equivalently


$((x \rightharpoonup y) \rightharpoonup z) \rightharpoonup (((y \rightharpoonup x) \rightharpoonup z) \rightharpoonup z) = 1$ . (2.37)
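
That the prelinearity conditions (2.36) and (2.37) survive the passage from linearly ordered structures to direct products can be seen in a small example. The sketch below takes a finite Gödel chain with five elements (an assumption made only to keep the search space finite) and forms its direct square with componentwise operations; the names `imp`, `join`, and `leq` are hypothetical.

```python
from fractions import Fraction
from itertools import product

CHAIN = [Fraction(k, 4) for k in range(5)]
ONE = (Fraction(1), Fraction(1))


def i_godel(u, v):
    """Residuum of the minimum on the chain."""
    return Fraction(1) if u <= v else v


# Direct product of two copies of the chain: pairs with componentwise operations.
PAIRS = list(product(CHAIN, repeat=2))


def imp(x, y):
    """Componentwise residuum."""
    return (i_godel(x[0], y[0]), i_godel(x[1], y[1]))


def join(x, y):
    """Componentwise lattice join."""
    return (max(x[0], y[0]), max(x[1], y[1]))


def leq(x, y):
    """Componentwise ordering of the product."""
    return x[0] <= y[0] and x[1] <= y[1]


is_chain = all(leq(x, y) or leq(y, x) for x, y in product(PAIRS, repeat=2))
prelinear_236 = all(join(imp(x, y), imp(y, x)) == ONE
                    for x, y in product(PAIRS, repeat=2))
prelinear_237 = all(imp(imp(imp(x, y), z), imp(imp(imp(y, x), z), z)) == ONE
                    for x, y, z in product(PAIRS, repeat=3))
```

The product is not linearly ordered, since, e.g., (0, 1) and (1, 0) are incomparable, yet both forms of prelinearity hold in it. This is the kind of structure that has to be admitted when one extends the class of t-algebras to a variety.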


The axiomatization of Hájek [2.23] for the basic
t-norm logic BL (in [2.5] denoted BTL), i.e., for the class
of all well-formed formulas which are valid in all
BL-algebras, is given in a language $\mathcal{L}_T$ which has as basic
vocabulary the connectives $\to$, & and the truth degree
constant 0, taken in each BL-algebra
$\langle L, \cap, \cup, *, \rightharpoonup, 0, 1 \rangle$ as the operations $\rightharpoonup$, $*$ and the element 0.


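This t-norm-based logic BL has the following axiom
schemata:


(Ax_BL1) $(\varphi \to \psi) \to ((\psi \to \chi) \to (\varphi \to \chi))$ ,

(Ax_BL2) $\varphi \mathbin{\&} \psi \to \varphi$ ,

(Ax_BL3) $\varphi \mathbin{\&} \psi \to \psi \mathbin{\&} \varphi$ ,

(Ax_BL4) $(\varphi \to (\psi \to \chi)) \to (\varphi \mathbin{\&} \psi \to \chi)$ ,

(Ax_BL5) $(\varphi \mathbin{\&} \psi \to \chi) \to (\varphi \to (\psi \to \chi))$ ,

(Ax_BL6) $\varphi \mathbin{\&} (\varphi \to \psi) \to \psi \mathbin{\&} (\psi \to \varphi)$ ,

(Ax_BL7) $((\varphi \to \psi) \to \chi) \to (((\psi \to \varphi) \to \chi) \to \chi)$ ,

(Ax_BL8) $0 \to \varphi$ ,


and has as its (only) inference rule the rule of detachment.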
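
The soundness of these schemata w.r.t. the standard t-algebras can be illustrated by evaluating them over a finite grid of truth degrees. The sketch below is no substitute for a soundness proof; it assumes the usual closed forms of the residua of the three standard continuous t-norms, and the names `ALGEBRAS`, `bl_axioms`, and `all_axioms_valid` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)
GRID = [Fraction(k, 6) for k in range(7)]

# Each entry pairs a continuous t-norm with its residuum (assumed closed forms).
ALGEBRAS = {
    "Goedel": (lambda u, v: min(u, v),
               lambda u, v: ONE if u <= v else v),
    "product": (lambda u, v: u * v,
                lambda u, v: ONE if u <= v else v / u),
    "Lukasiewicz": (lambda u, v: max(u + v - 1, ZERO),
                    lambda u, v: min(ONE, 1 - u + v)),
}


def bl_axioms(T, I):
    """Truth degree functions of the eight BL schemata in variables p, q, r."""
    return [
        lambda p, q, r: I(I(p, q), I(I(q, r), I(p, r))),         # schema 1
        lambda p, q, r: I(T(p, q), p),                           # schema 2
        lambda p, q, r: I(T(p, q), T(q, p)),                     # schema 3
        lambda p, q, r: I(I(p, I(q, r)), I(T(p, q), r)),         # schema 4
        lambda p, q, r: I(I(T(p, q), r), I(p, I(q, r))),         # schema 5
        lambda p, q, r: I(T(p, I(p, q)), T(q, I(q, p))),         # schema 6
        lambda p, q, r: I(I(I(p, q), r), I(I(I(q, p), r), r)),   # schema 7
        lambda p, q, r: I(ZERO, p),                              # schema 8
    ]


def all_axioms_valid(T, I, grid=GRID):
    """True iff every schema takes the designated degree 1 on all grid points."""
    return all(ax(p, q, r) == 1
               for ax in bl_axioms(T, I)
               for p, q, r in product(grid, repeat=3))
```

Calling `all_axioms_valid(*ALGEBRAS["product"])`, and likewise for the other two entries, returns True on this grid.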

Starting from the primitive connectives $\to$, &, and
the truth degree constant 0, the language $\mathcal{L}_T$ of BL is
extended by definitions of further connectives


$\varphi \wedge \psi =_{\mathrm{df}} \varphi \mathbin{\&} (\varphi \to \psi)$ , (2.38)

$\varphi \vee \psi =_{\mathrm{df}} ((\varphi \to \psi) \to \psi) \wedge ((\psi \to \varphi) \to \varphi)$ , (2.39)

$\neg\varphi =_{\mathrm{df}} \varphi \to 0$ , (2.40)


where $\varphi$, $\psi$ are formulas of the language of that system.
Calculations (in BL-algebras) show that the additional
connectives $\wedge$, $\vee$ just have the lattice operations
$\cap$, $\cup$ as their truth degree functions.
The system BL is an implicative logic in the sense
of Rasiowa [2.29]. So one gets a general soundness and
completeness result.


Theorem 2.1 General Completeness 
A formula $\varphi$ of the language $\mathcal{L}_T$ is derivable within the
axiomatic system BL iff $\varphi$ is valid in all BL-algebras.


However, it is shown in [2.23] that already the 
class of all BL-chains, i. e., of all linearly ordered BL- 
algebras, provides an adequate algebraic semantics. 


Theorem 2.2 General Chain Completeness 
A formula $\varphi$ of $\mathcal{L}_T$ is derivable within the axiomatic
system BL iff $\varphi$ is valid in all BL-chains.


But even more is provable and leads back to the 
starting point of the whole approach: the theorems of BL 
are just those formulas which hold true w.r.t. all divis- 
ible t-algebras. This was, extending preliminary results 
from [2.24], finally proved in [2.30]. 


Theorem 2.3 Standard Completeness 
The class of all formulas which are provable in the system
BL coincides with the class of all formulas which
are logically valid in all t-algebras with a continuous
t-norm.


The main steps in the proof are to show (i) that each 
BL-algebra is a subdirect product of subdirectly irre- 
ducible BL-chains, i. e., of linearly ordered BL-algebras 
which are not subdirect products of other BL-chains, 
and (ii) that each subdirectly irreducible BL-chain can 
be embedded into the ordinal sum of some BL-chains 
which are either trivial one-element BL-chains, or lin- 
early ordered MV-algebras, or linearly ordered product 
algebras, such that (iii) each such ordinal summand 
is locally embeddable into a t-norm-based residuated
lattice with a continuous t-norm, cf. [2.24,30] and 
again [2.5]. 

This is a lot more algebraic machinery than is necessary
for the proof of the General Completeness Theorem 2.1
and thus offers a further indication that the
extension of the class of divisible t-algebras to the class
of BL-algebras made the development of the intended
logical system easier. But even more can be seen from
this proof: the class of BL-algebras is the smallest variety
which contains all the divisible t-algebras, i.e., all
the t-algebras determined by a continuous t-norm. And
the algebraic reason for this is that each variety may be
generated from its subdirectly irreducible elements, cf.
again [2.31, 32].

Yet another generalization of Theorem 2.1 deserves to be mentioned. To state it, let us call a schematic extension of BL every extension which consists in adding a set $\mathcal{C}$ of axiom schemata to the axiom schemata of BL, and let us denote such an extension by BL + $\mathcal{C}$. Furthermore, let us call BL($\mathcal{C}$)-algebra each BL-algebra $\mathbf{A}$ which makes all formulas of $\mathcal{C}$ $\mathbf{A}$-valid, i.e., which is a model of $\mathcal{C}$.

Then one can prove, as done in [2.23], an even more 
general completeness result. 


Theorem 2.4 Strong General Completeness 
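For each set $\mathcal{C}$ of axiom schemata and any formula $\varphi$ of $\mathcal{L}_T$ the following are equivalent:


i) $\varphi$ is derivable within BL + $\mathcal{C}$;
ii) $\varphi$ is valid in all BL($\mathcal{C}$)-algebras;
iii) $\varphi$ is valid in all BL($\mathcal{C}$)-chains.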


For the standard semantics this result holds true only in a restricted form: one has to restrict the consideration to finite sets $\mathcal{C}$ of axiom schemata, i.e., to finite theories. For the Łukasiewicz logic Ł, which is the extension of BL by the schema $\neg\neg\varphi \to \varphi$ of double negation, this has already been shown in [2.23]. And for arbitrary continuous t-norms this follows from results of Haniková [2.33, 34].


Theorem 2.5 Strong Standard Completeness 
For each finite set $\mathcal{C}$ of axiom schemata and any formula $\varphi$ of $\mathcal{L}_T$ the following are equivalent:


i) $\varphi$ is derivable within BL + $\mathcal{C}$;
ii) $\varphi$ is valid in all t-algebras which are models of $\mathcal{C}$.


2.4.2 The Logic MTL of All Left Continuous t-Norms


The guess of Esteva and Godo [2.27] has been that one 
should arrive at the logic of left continuous t-norms 
if one starts from the logic of continuous t-norms and 
deletes the continuity condition, i.e., the divisibility 
condition (2.35). 

The algebraic approach needs only a small modi- 
fication: in the definition of the BL-algebras one has 
simply to delete the divisibility condition. The resulting 
algebraic structures have been called MTL-algebras. 
They again form a variety. 

Following this idea, one has to modify the previous axiom system in a suitable way. In particular, one has to delete the definition (2.38) of the connective $\wedge$, because this definition (together with suitable axioms) essentially codes the divisibility condition. The definition (2.39) of the connective $\vee$ remains unchanged.

As a result one now considers a new system MTL 
of mathematical fuzzy logic, characterized semantically 
by the class of all MTL-algebras. It is connected with 
the axiom system: 


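(Ax$_{\mathrm{MTL}}$1) $(\varphi \to \psi) \to ((\psi \to \chi) \to (\varphi \to \chi))$,
(Ax$_{\mathrm{MTL}}$2) $\varphi \,\&\, \psi \to \varphi$,
(Ax$_{\mathrm{MTL}}$3) $\varphi \,\&\, \psi \to \psi \,\&\, \varphi$,
(Ax$_{\mathrm{MTL}}$4) $(\varphi \to (\psi \to \chi)) \to (\varphi \,\&\, \psi \to \chi)$,
(Ax$_{\mathrm{MTL}}$5) $(\varphi \,\&\, \psi \to \chi) \to (\varphi \to (\psi \to \chi))$,
(Ax$_{\mathrm{MTL}}$6) $\varphi \wedge \psi \to \varphi$,
(Ax$_{\mathrm{MTL}}$7) $\varphi \wedge \psi \to \psi \wedge \varphi$,
(Ax$_{\mathrm{MTL}}$8) $\varphi \,\&\, (\varphi \to \psi) \to \varphi \wedge \psi$,
(Ax$_{\mathrm{MTL}}$9) $0 \to \varphi$,
(Ax$_{\mathrm{MTL}}$10) $((\varphi \to \psi) \to \chi) \to (((\psi \to \varphi) \to \chi) \to \chi)$,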


together with the rule of detachment (w.r.t. the implication connective $\to$) as (the only) inference rule.




Again, the system MTL is an implicative logic in the 
sense of Rasiowa [2.29], giving a general soundness 
and completeness result as for the previous system BL. 
Proofs of these results were given in [2.27]. 


Theorem 2.6 General Completeness 
A formula $\varphi$ of the language $\mathcal{L}_T$ is derivable within the system MTL iff $\varphi$ is valid in all MTL-algebras.


Furthermore it is shown, in [2.27], that again al- 
ready the class of all MTL-chains provides an adequate 
algebraic semantics. 


Theorem 2.7 General Chain Completeness 
A formula $\varphi$ of $\mathcal{L}_T$ is derivable within the axiomatic system MTL iff $\varphi$ is valid in all MTL-chains.


And again, similarly to the BL-case, even more is provable: the system MTL characterizes just those formulas which hold true w.r.t. all those t-norm-based logics which are determined by a left continuous t-norm, cf. [2.35].


Theorem 2.8 Standard Completeness 

The class of formulas which are provable in the sys- 
tem MTL coincides with the class of formulas which are 
logically valid in all t-algebras with a left continuous 
t-norm. 


This result again means, as the similar one for the logic of continuous t-norms, that the variety of all MTL-algebras is the smallest variety which contains all t-algebras with a left continuous t-norm.

Also for MTL an extended completeness theorem similar to Theorem 2.4 holds true. (The notions MTL + $\mathcal{C}$ and MTL($\mathcal{C}$)-algebra are used in the same way as in the BL case.)


Theorem 2.9 Strong General Completeness 
For each set $\mathcal{C}$ of axiom schemata and any formula $\varphi$ of $\mathcal{L}_T$ the following are equivalent:


i) $\varphi$ is derivable within the system MTL + $\mathcal{C}$;
ii) $\varphi$ is valid in all MTL($\mathcal{C}$)-algebras;
iii) $\varphi$ is valid in all MTL($\mathcal{C}$)-chains.


For much more information on completeness mat- 
ters for different systems of fuzzy logic the reader may 
consult [2.36]. 


2.4.3 Extensions of MTL 


Because the BL-algebras are exactly the divisible MTL-algebras, one gets another adequate axiomatization of the basic t-norm logic BL.


Proposition 2.5 


$$\mathrm{BL} = \mathrm{MTL} + \{\varphi \wedge \psi \to \varphi \,\&\, (\varphi \to \psi)\}.$$


Proof: Routine calculations in MTL-algebras give $x * (x \rightarrowtail y) \le x$ and $x * (x \rightarrowtail y) \le y$, and hence the inequality $x * (x \rightarrowtail y) \le x \cap y$. In those MTL-algebras which are models of

$$\varphi \wedge \psi \to \varphi \,\&\, (\varphi \to \psi) \tag{2.41}$$

the converse inequality also holds true, hence even $x * (x \rightarrowtail y) = x \cap y$. Thus the class of models of (2.41) is the class of all BL-algebras. So the result follows from the Completeness Theorem 2.1. □
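
The role of divisibility in this proof can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it compares the Łukasiewicz t-norm (continuous) with the nilpotent minimum, a standard example of a left continuous but not continuous t-norm that is assumed here purely for illustration. The function names and the sampling grid are hypothetical. For both t-norms the value of x * (x -> y) stays below min(x, y), but equality on the whole grid is observed only for the continuous one.

```python
from fractions import Fraction

def luk(x, y):
    # Lukasiewicz t-norm (continuous).
    return max(Fraction(0), x + y - 1)

def luk_res(x, y):
    # Residuum of the Lukasiewicz t-norm.
    return min(Fraction(1), 1 - x + y)

def nm(x, y):
    # Nilpotent minimum (left continuous, not continuous); illustrative assumption.
    return min(x, y) if x + y > 1 else Fraction(0)

def nm_res(x, y):
    # Residuum of the nilpotent minimum.
    return Fraction(1) if x <= y else max(1 - x, y)

GRID = [Fraction(i, 8) for i in range(9)]

def upper_bound_holds(tnorm, residuum):
    # Checks x * (x -> y) <= min(x, y) on the sample grid.
    return all(tnorm(x, residuum(x, y)) <= min(x, y)
               for x in GRID for y in GRID)

def divisibility_failures(tnorm, residuum):
    # Grid points where x * (x -> y) differs from min(x, y).
    return [(x, y) for x in GRID for y in GRID
            if tnorm(x, residuum(x, y)) != min(x, y)]

luk_failures = divisibility_failures(luk, luk_res)
nm_failures = divisibility_failures(nm, nm_res)
```

On this grid `luk_failures` is empty, whereas `nm_failures` contains, e.g., the pair (1/2, 1/4). A grid check is of course only a sanity test, not a proof.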


Proposition 2.6 


$$\text{Ł} = \mathrm{BL} + \{\neg\neg\varphi \to \varphi\}.$$


Proof: BL-algebras which also satisfy the equation $(x \rightarrowtail 0) \rightarrowtail 0 = x$ can be shown to be MV-algebras, cf. [2.5, 37]. And each MV-algebra is also a BL-algebra. Hence BL + $\{\neg\neg\varphi \to \varphi\}$ is characterized by the class of all MV-algebras, so it is Ł according to Sect. 2.4.1. (There is also a syntactic proof available, given in [2.23].)


Proposition 2.7 


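$$\Pi = \mathrm{BL} + \{\varphi \wedge \neg\varphi \to 0,\ \neg\neg\chi \to ((\varphi \,\&\, \chi \to \psi \,\&\, \chi) \to (\varphi \to \psi))\}.$$


Proof: This is essentially the original characterization of the product logic Π as given in [2.3]. □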


Proposition 2.8 


$$\mathrm{G} = \mathrm{BL} + \{\varphi \to \varphi \,\&\, \varphi\}.$$




Proof: The prelinear Heyting algebras are just those BL-algebras for which the semigroup operation $*$ coincides with the lattice meet: $* = \cap$, cf. [2.5, 38], and each prelinear Heyting algebra is also a BL-algebra. So the result follows again via Sect. 2.4.1. □

Similar remarks apply to further extensions of MTL. 


2.4.4 Logics of Particular t-Norms 


It is easy to recognize that two isomorphic (left continuous) t-norms $T_1, T_2$ determine the same t-norm-based logic: any order automorphism of $[0,1]$ which transforms $T_1$ into $T_2$ according to (2.21) is an isomorphism between the t-algebras $[0,1]_{T_1}$ and $[0,1]_{T_2}$.

A continuous t-norm $T$ is called archimedean iff $T(x, x) < x$ holds true for all $0 < x < 1$. And a t-norm $T$ has zero divisors iff there exist $0 < a, b$ such that $T(a, b) = 0$.
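
As a small illustration of these two notions, the following sketch tests them on a finite sample grid for the three basic continuous t-norms (Łukasiewicz, product, and minimum). The function names and the grid are assumptions made for this example; a grid test can only suggest, not prove, the properties.

```python
from fractions import Fraction

GRID = [Fraction(i, 10) for i in range(11)]

def t_luk(x, y):
    # Lukasiewicz t-norm.
    return max(Fraction(0), x + y - 1)

def t_prod(x, y):
    # Product t-norm.
    return x * y

def t_min(x, y):
    # Minimum (Goedel) t-norm.
    return min(x, y)

def archimedean_on_grid(t):
    # T(x, x) < x for all sampled 0 < x < 1.
    return all(t(x, x) < x for x in GRID if 0 < x < 1)

def zero_divisors_on_grid(t):
    # Pairs a, b > 0 from the grid with T(a, b) = 0.
    return [(a, b) for a in GRID for b in GRID
            if a > 0 and b > 0 and t(a, b) == 0]
```

On the grid, `t_luk` passes the archimedean test and has zero divisors such as (1/2, 1/2); `t_prod` passes the archimedean test without zero divisors; `t_min` fails the archimedean test because min(x, x) = x.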

It is well known that each continuous archimedean t-norm with zero divisors is isomorphic to the Łukasiewicz t-norm $T_{\text{Ł}}$, cf. [2.5, 20].


Proposition 2.9 

Each t-norm-based logic which is determined by a continuous archimedean t-norm with zero divisors has the same axiomatizations as the infinite-valued Łukasiewicz logic Ł.


Furthermore, a continuous t-norm $T$ is called strict iff it is strictly monotone, i.e., satisfies for all $z \neq 0$

$$x < y \iff T(x, z) < T(y, z).$$

Again it is well known that each strict continuous t-norm is isomorphic to the product t-norm $T_{\mathrm{P}}$, cf. [2.5, 20].


Proposition 2.10 

Each t-norm-based logic which is determined by a strict continuous t-norm has the same axiomatizations as the infinite-valued product logic Π.


But there is a general solution of the axiomatization problem of those t-norm-based logics which are determined by a continuous t-norm.

In [2.39], Esteva et al. study the variety $\mathcal{BL}$ of all BL-algebras. They prove that each of its subvarieties which is generated by a single t-algebra $[0, 1]_T$, with $T$ a continuous t-norm, is finitely axiomatizable. Additionally, they provide an algorithm to determine these finitely many axioms.


So the following main result is reached: 


Theorem 2.10 

Each t-norm-based fuzzy logic $\mathcal{L}_T$ determined by a continuous t-norm $T$ is a finite axiomatic extension of the basic fuzzy logic BL.


For left continuous t-norms a similar result is lack- 
ing. 


2.4.5 Extensions to First-Order Logics 


The extension of these propositional logics to first-order ones follows the standard lines of approach: one has to start from a first-order language $\mathcal{L}$ (with the two standard quantifiers $\forall, \exists$) and a suitable residuated lattice $\mathbf{A}$, and has to define $\mathbf{A}$-interpretations $\mathfrak{M}$ by fixing a nonempty domain $M = |\mathfrak{M}|$ and by assigning to each predicate symbol of $\mathcal{L}$ an $\mathbf{A}$-valued relation in $M$ (of suitable arity) and to each constant an element from (the support of) $\mathbf{A}$.

Usually one supposes that the first-order language $\mathcal{L}$ has only predicate symbols and no function symbols.
The insertion of function symbols proves to be a del- 
icate matter, essentially because it is not completely 
clear what the basic properties of the identity predicate 
should be. The core problem is whether such an iden- 
tity relation should be a crisp one or should be really 
graded. The paper [2.40] also surveys these problems 
of the identity relation. 

The satisfaction relation is defined in the standard way. The quantifiers $\forall$ and $\exists$ are interpreted as taking the infimum or supremum, respectively, of all the values of the relevant instances.

To be sure that this approach works well one 
has either to suppose that the underlying lattices of 
the interpretations are complete lattices, or at least 
that all the necessary infima and suprema do exist in 
these lattices. Interpretations over lattices which sat- 
isfy this last mentioned condition are called safe by 
Hájek [2.23].

For the logic BL of continuous t-norms, Hájek [2.23] added the axioms:


($\forall$1) $(\forall x)\varphi(x) \to \varphi(t)$, where $t$ is substitutable for $x$ in $\varphi$,
($\exists$1) $\varphi(t) \to (\exists x)\varphi(x)$, where $t$ is substitutable for $x$ in $\varphi$,
($\forall$2) $(\forall x)(\chi \to \varphi) \to (\chi \to (\forall x)\varphi)$, where $x$ is not free in $\chi$,
($\exists$2) $(\forall x)(\varphi \to \chi) \to ((\exists x)\varphi \to \chi)$, where $x$ is not free in $\chi$,
($\forall$3) $(\forall x)(\chi \vee \varphi) \to \chi \vee (\forall x)\varphi$, where $x$ is not free in $\chi$,


and the rule of generalization to the propositional calculus, yielding the system BL$\forall$.

Then he was able to prove the following complete- 
ness theorem. 


Theorem 2.11 General Chain Completeness 
A first-order formula $\varphi$ is BL$\forall$-provable iff it is valid in all safe interpretations over BL-chains.


This result can be extended to many other first-order fuzzy logics, e.g., to MTL$\forall$.

We will not discuss further completeness results 
here but refer to the extended survey [2.40]. But it 
should be mentioned that, as suprema are not always 
maxima and infima not always minima, the truth de- 
gree of an existentially/universally quantified formula 
may not be the maximum/minimum of the truth degrees 
of the instances. It is, however, interesting to have conditions which characterize models in which the truth degree of each existentially/universally quantified formula is witnessed as the truth degree of an instance. Cintula and Hájek [2.41] study this problem.
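
A simple way to see how witnessing can fail is sketched below. The example assumes a hypothetical unary predicate P on the positive integers whose instance P(n) has truth degree 1 - 1/n; neither the predicate nor the function names come from the text. On every finite part of the domain the existential quantifier is evaluated as a maximum and is therefore witnessed, but the values grow toward 1 without ever reaching it, so over the whole domain the supremum 1 is not the truth degree of any instance.

```python
from fractions import Fraction

def p(n):
    # Hypothetical truth degree of the instance P(n), n a positive integer.
    return 1 - Fraction(1, n)

def exists_degree(domain):
    # (Ex)P(x) is evaluated as a supremum; on a finite domain it is a maximum.
    return max(p(n) for n in domain)

# Truth degrees of (Ex)P(x) over the finite domains {1..k}.
partial = [exists_degree(range(1, k + 1)) for k in (1, 10, 100, 1000)]
```

The list `partial` is increasing and stays strictly below 1, which is the behavior that separates witnessed from unwitnessed models.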

The topic of first-order fuzzy logics with identity 
deserves some attention. The core problem is, as in any 
many-valued logic, whether the identity symbol should 
be interpreted by the standard, i. e., two-valued identity 
relation, or whether one should allow for graded iden- 
tity relations inside the interpretations. 

Direct translations of the identity axioms of classical first-order logic into, e.g., the language of the Łukasiewicz systems force that the interpretation of the identity symbol has to be the standard identity relation, cf. [2.42]. Similarly, for a wide class of first-order fuzzy logics the addition of the axioms:


Id1 $x \approx y \vee \neg\, x \approx y$
Id2 $x \approx x$
Id3 $x \approx y \to (\varphi(x, z) \to \varphi(y, z))$, for $y$ substitutable for $x$ in $\varphi$


forces that the identity symbol $\approx$ can only be understood as meaning standard identity. A general completeness theorem like Theorem 2.11 remains valid in this case too, cf. [2.40].

For the case of the Łukasiewicz logics, however, a slight modification of the standard identity axioms, particularly of the Leibniz schema, as given in [2.43], allows for graded identity relations, cf. also [2.5]. For fuzzy logics in general, similarity relations, i.e., graded equivalence relations, offer such an approach [2.23, 44]. For the restricted case of Horn formulas an approach is offered by Bělohlávek and Vychodil [2.45, 46]. They consider a first-order language with function symbols and the identity symbol $\approx$ as the only predicate symbol. Their models for sets of Horn formulas therefore have to be algebraic structures with graded identity relations. However, the aim of these authors is not to develop an identity logic; they are mainly interested in using the approach to characterize classes of algebraic structures with graded identity relations and to find fuzzified versions of results from universal algebra.

These authors even consider fuzzy sets of Horn formulas, i.e., they work in a Pavelka-style fuzzy logic as explained later in Sect. 2.6.1. But because this type of approach can be mirrored in standard fuzzy logics with sufficiently many truth degree constants (Sect. 2.6.1), this approach is already discussed here.


2.5 Some Generalizations


The standard approach toward t-norm-based logics, as explained in Sects. 2.4.1 and 2.4.2, has been modified in various ways. The main background ideas are the extension or the modification of the expressive power of these logical systems.


2.5.1 Adding a Projection Operator


A first, quite fundamental addition to the standard vocabulary of the languages of t-norm-based systems was proposed in [2.47]: a unary propositional operator $\Delta$, also known as Baaz' Delta, which has for t-algebras the semantics

$$\Delta(x) = \begin{cases} 1 & \text{for } x = 1, \\ 0 & \text{for } x \neq 1. \end{cases} \tag{2.42}$$

This unary connective can be added to the systems BL and MTL via the additional axioms


($\Delta$1) $\Delta\varphi \vee \neg\Delta\varphi$,
($\Delta$2) $\Delta(\varphi \vee \psi) \to (\Delta\varphi \vee \Delta\psi)$,



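($\Delta$3) $\Delta\varphi \to \varphi$,
($\Delta$4) $\Delta\varphi \to \Delta\Delta\varphi$,
($\Delta$5) $\Delta(\varphi \to \psi) \to (\Delta\varphi \to \Delta\psi)$.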


This addition leaves all the essential theoretical results, 
like correctness and completeness theorems, valid: of 
course w.r.t. suitably expanded algebraic structures. 
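
The semantics (2.42) of the projection operator is easy to experiment with. The following sketch assumes the usual reading of the five Delta axioms (excluded middle for Delta-formulas, distribution over disjunction, Delta implies the formula, Delta implies Delta Delta, and distribution over implication) and evaluates them in the Łukasiewicz t-algebra on a sample grid, with the weak disjunction interpreted as maximum. The dictionary keys, function names and grid are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import product

ONE, ZERO = Fraction(1), Fraction(0)
GRID = [Fraction(i, 6) for i in range(7)]

def delta(x):
    # Baaz' Delta as in (2.42): 1 for x = 1, otherwise 0.
    return ONE if x == 1 else ZERO

def imp(x, y):
    # Lukasiewicz residuum, used as truth function of implication.
    return min(ONE, 1 - x + y)

def neg(x):
    # Residual negation: x -> 0.
    return imp(x, ZERO)

# Truth functions of the five Delta axioms in two variables x, y.
AXIOMS = {
    "D1": lambda x, y: max(delta(x), neg(delta(x))),
    "D2": lambda x, y: imp(delta(max(x, y)), max(delta(x), delta(y))),
    "D3": lambda x, y: imp(delta(x), x),
    "D4": lambda x, y: imp(delta(x), delta(delta(x))),
    "D5": lambda x, y: imp(delta(imp(x, y)), imp(delta(x), delta(y))),
}

# For each axiom: does it take the value 1 at every grid point?
results = {name: all(f(x, y) == 1 for x, y in product(GRID, GRID))
           for name, f in AXIOMS.items()}
```

All five entries of `results` are `True` on this grid, in line with the soundness of the axioms with respect to the expanded t-algebras.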


2.5.2 Adding an Idempotent Negation 


A second stream of papers discusses the addition of an idempotent negation, i.e., a negation which satisfies the double negation law, for those cases where the standard negation of the t-norm-based system is not idempotent. This is, e.g., the case for the product logic which, as explained at the end of Sect. 2.1.3, has the Gödel negation (2.2) as its standard negation. By the way, it should be noticed that (routine calculations show that) this nonidempotent Gödel negation is the standard negation of all those t-norm algebras with a t-norm $\otimes$ which does not have zero-divisors. A very general approach is given in [2.48], and a more particular axiomatization problem is discussed in [2.49].
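
The routine calculation mentioned here can be sketched for the product t-norm. The code below assumes the standard residuum of the product t-norm (1 if x <= y, and y/x otherwise) and derives the residual negation as x -> 0; the function names are hypothetical. It shows that this negation is the two-valued Gödel negation, which violates the double negation law, in contrast to the negation 1 - x.

```python
from fractions import Fraction

def prod_res(x, y):
    # Residuum of the product t-norm.
    return Fraction(1) if x <= y else y / x

def neg_residual(x):
    # Standard (residual) negation of the product t-algebra: x -> 0.
    return prod_res(x, Fraction(0))

def neg_involutive(x):
    # The negation 1 - x, which satisfies the double negation law.
    return 1 - x

half = Fraction(1, 2)
double_residual = neg_residual(neg_residual(half))       # not equal to 1/2
double_involutive = neg_involutive(neg_involutive(half))  # equal to 1/2
```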


2.5.3 Logics with Additional Strong Conjunctions


A third stream of papers, partly related to the previously mentioned one, is devoted to the problem of a unified treatment of different, usually two, t-norms and their related connectives within one logical system. Here the focus is on the join of the systems based on the Łukasiewicz t-norm and on the product t-norm. The great advantage of this unification is that the Łukasiewicz t-norm essentially allows one to treat addition, as may be seen from the truth degree function (2.6) of the Łukasiewicz (arithmetical) disjunction, and that the product t-norm adds the treatment of the usual product: this means that elementary arithmetic (in the unit interval) can be discussed in this combined system. This combined system has been considered in two strongly related forms, denoted by ŁΠ and ŁΠ½. The distinction between both systems is that ŁΠ has both t-norms $\&$ and $\odot$ and their related (residual) implications and negations among its basic connectives, and that ŁΠ½ adds a truth degree constant for the truth degree $\tfrac{1}{2}$. These two systems are discussed in detail in [2.50-54].


2.5.4 Logics Without a Truth Degree Constant


A fourth stream of papers intends to weaken the systems BL and MTL in such a way that one deletes the explicit reference to the truth degree constant 0 and considers the falsity free fragments of the previous systems. From the algebraic point of view their characteristic structures become the hoops, which in general are defined as algebraic structures $\mathbf{H} = (H, *, \Rightarrow, 1)$ such that $(H, *, 1)$ is a commutative monoid and that the further binary operation $\Rightarrow$ satisfies the equations

$$x \Rightarrow x = 1,$$
$$x * (x \Rightarrow y) = y * (y \Rightarrow x),$$
$$(x * y) \Rightarrow z = x \Rightarrow (y \Rightarrow z).$$

The definition

$$x \sqsubseteq y \iff x \Rightarrow y = 1$$

provides an ordering $\sqsubseteq$ with the universal upper bound 1 which makes $(H, *, 1)$ an ordered monoid, and which has the additional property that the operations $*, \Rightarrow$ become an adjoint pair w.r.t. this ordering.

In particular, hoops with the additional property

$$(x \Rightarrow y) \Rightarrow z \sqsubseteq ((y \Rightarrow x) \Rightarrow z) \Rightarrow z$$

can in a natural way be generated from t-algebras with continuous t-norms, as has been shown in [2.55]. So one has a kind of competing generalization of t-algebras. And for this kind of algebraic semantics, one can find adequate axiomatizations for the corresponding hoop logics quite similar to the approaches of Sects. 2.4.1 and 2.4.2. The details have been developed in [2.56].
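
A concrete falsity free example may help here. The sketch below takes the half-open interval (0, 1] with the ordinary product and its residuum, which is assumed here as an illustrative hoop, and checks the three hoop equations on a sample of rational points. It also checks the additional property in the form (x => y) => z below ((y => x) => z) => z, which is the reading of the displayed condition assumed in this example. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ONE = Fraction(1)
POINTS = [Fraction(i, 6) for i in range(1, 7)]  # sample of (0, 1]; no zero element

def mul(x, y):
    # Monoid operation: ordinary product.
    return x * y

def res(x, y):
    # Residuum of the product on (0, 1].
    return ONE if x <= y else y / x

def leq(x, y):
    # Hoop ordering: x is below y iff (x => y) = 1.
    return res(x, y) == ONE

def hoop_equations_hold():
    for x, y, z in product(POINTS, repeat=3):
        if res(x, x) != ONE:
            return False
        if mul(x, res(x, y)) != mul(y, res(y, x)):
            return False
        if res(mul(x, y), z) != res(x, res(y, z)):
            return False
    return True

def additional_property_holds():
    return all(leq(res(res(x, y), z), res(res(res(y, x), z), z))
               for x, y, z in product(POINTS, repeat=3))
```

Exact rational arithmetic is used so that the equations are tested without rounding effects; the check covers only the sampled points.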


2.5.5 Logics with a Noncommutative Strong Conjunction


And a fifth stream discusses the generalization of the algebraic semantics from the case of commutative lattice-ordered monoids with residuation to the case of noncommutative lattice-ordered semigroups. In this context, one tries to define noncommutative BL-algebras or noncommutative MTL-algebras, and similarly defines noncommutative t-norms, also called pseudo-t-norms. These considerations are combined with the design of an adequate axiomatization, with similar results as in Sects. 2.4.1 and 2.4.2. Important papers on this topic are [2.57-63].




And finally it should be mentioned that Hájek [2.64] even gives a common generalization of all of these generalized fuzzy logics, thus giving up divisibility, the falsity constant, and commutativity. The corresponding algebras are called fleas (or flea algebras), and the logic
is the flea logic FIL. There are examples of fleas on (0, 1] 
not satisfying divisibility, nor commutativity, and hav- 
ing no least element. 


2.6 Extensions with Graded Notions of Inference 


The systems of t-norm-based logics discussed up to now have been designed to formalize the logical background for fuzzy sets, and they have degrees of truth for their formulas. But they all have crisp notions of consequence, i.e., of entailment and of provability.

Having in mind that fuzzy logics, also in their form 
as formalized logical systems, should be a (mathemat- 
ical) tool for approximate reasoning makes it desirable 
that they should be able to deal with graded inferences 
too. This means inferences which start from fuzzy sets 
of formulas, and offer consequence hulls which again 
are fuzzy sets of formulas. 


2.6.1 Pavelka-Style Approaches 


This problem was first treated by Pavelka [2.65-67]. The basic monograph elaborating this approach is [2.44]. Accordingly, such approaches are sometimes called Pavelka-style, but, with emphasis on the syntactic side of the matter, they have also been called approaches with evaluated syntax. Here we will call them GI-approaches.

Such an approach with graded inferences has to deal with fuzzy sets $\Sigma^{\sim}$ of formulas, i.e., besides formulas $\varphi$ also with their membership degrees $\Sigma^{\sim}(\varphi)$ in $\Sigma^{\sim}$. And these membership degrees are just the truth degrees. We may assume that these degrees again form a residuated lattice $\mathbf{L} = (L, \cap, \cup, *, \rightarrowtail, 0, 1)$. Thus we (slightly) generalize the standard notion of fuzzy set (with membership degrees from the real unit interval). Therefore, the appropriate language has the same logical connectives as in the previous considerations.

A GI-approach is an easy matter as long as the entailment relationship is considered. An evaluation $e$ is a model of a fuzzy set $\Sigma^{\sim}$ of formulas iff

$$\Sigma^{\sim}(\varphi) \le e(\varphi) \tag{2.43}$$

holds for each formula $\varphi$. This immediately yields that the semantic consequence hull of $\Sigma^{\sim}$ should be characterized by the membership degrees

$$C^{\mathrm{sem}}(\Sigma^{\sim})(\psi) = \bigwedge \{ e(\psi) \mid e \text{ model of } \Sigma^{\sim} \} \tag{2.44}$$

for each formula $\psi$.
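
The two conditions (2.43) and (2.44) can be made concrete with a toy computation. The sketch below is only an illustration under strong simplifying assumptions: the pool of evaluations is a small hand-written list over two formula names, not the full class of evaluations of a logic, and fuzzy sets of formulas are represented as dictionaries whose unlisted formulas have membership degree 0. All names and values are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical finite pool of evaluations: formula name -> truth degree.
EVALUATIONS = [
    {"p": F(1), "q": F(3, 4)},
    {"p": F(1, 2), "q": F(1, 2)},
    {"p": F(3, 4), "q": F(1, 4)},
]

def is_model(e, sigma):
    # Condition (2.43): the membership degree of each formula is below its value under e.
    return all(degree <= e[phi] for phi, degree in sigma.items())

def c_sem(sigma, psi):
    # Equation (2.44): infimum of e(psi) over all models e of sigma.
    values = [e[psi] for e in EVALUATIONS if is_model(e, sigma)]
    return min(values) if values else F(1)  # the empty infimum is the top element

sigma = {"p": F(3, 4)}  # fuzzy set containing p to the degree 3/4
```

With this pool, the first and third evaluations are models of `sigma`, so `c_sem(sigma, "q")` is 1/4 and `c_sem(sigma, "p")` is 3/4.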

For a syntactic characterization of this entailment relation, it is necessary to have some calculus $\mathbb{K}$ which treats formulas of the language together with truth degrees. So the language of this calculus has to extend the language of the basic logical system by having also symbols for the truth degrees. We indicate these symbols by overlined letters like $\bar{a}, \bar{c}$, and realize the common treatment of formulas and truth degrees by considering evaluated formulas, i.e., ordered pairs $(\bar{a}, \varphi)$ consisting of a truth degree symbol and a formula. This transforms each fuzzy set $\Sigma^{\sim}$ of formulas into a (usual) set of evaluated formulas, again denoted by $\Sigma^{\sim}$.

So $\mathbb{K}$ has to allow the derivation of evaluated formulas from sets of evaluated formulas, using suitable axioms and rules of inference. These axioms are usually only formulas $\varphi$ which, however, are used in the derivations as the corresponding evaluated formulas $(\bar{1}, \varphi)$. The rules of inference have to deal with evaluated formulas.

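Each $\mathbb{K}$-derivation of an evaluated formula $(\bar{a}, \varphi)$ counts as a derivation of $\varphi$ to the degree $a \in L$. The provability degree of $\varphi$ from $\Sigma^{\sim}$ in $\mathbb{K}$ is the supremum over all these degrees. The syntactic consequence hull of $\Sigma^{\sim}$ is the fuzzy set $C^{\mathrm{syn}}_{\mathbb{K}}(\Sigma^{\sim})$ of formulas characterized by the membership function

$$C^{\mathrm{syn}}_{\mathbb{K}}(\Sigma^{\sim})(\psi) = \bigvee \{ a \in L \mid \mathbb{K} \text{ derives } (\bar{a}, \psi) \text{ out of } \Sigma^{\sim} \} \tag{2.45}$$

for each formula $\psi$.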

Despite the fact that $\mathbb{K}$ is a standard calculus for evaluated formulas, this is, for infinite truth degree structures, an infinitary notion of provability for usual formulas.




For the infinite-valued Łukasiewicz logic Ł, this machinery works particularly well because it needs in an essential way the continuity of the residuation operation. The corresponding calculus $\mathbb{K}_{\text{Ł}}$ has as axioms any axiom system of the infinite-valued Łukasiewicz logic Ł which provides, together with the rule of detachment, an adequate axiomatization of Ł, but $\mathbb{K}_{\text{Ł}}$ replaces this standard rule of detachment by the generalized form

$$\frac{(\bar{a}, \varphi) \qquad (\bar{c}, \varphi \to \psi)}{(\overline{a * c}, \psi)} \tag{2.46}$$

for evaluated formulas.

The soundness result then says that the $\mathbb{K}_{\text{Ł}}$-provability of an evaluated formula $(\bar{a}, \varphi)$ implies that $a \le e(\varphi)$ holds for every valuation $e$. And this just means that the formula $\bar{a} \to \varphi$ is valid; however, as a formula of an extended propositional language which has all the truth degree constants among its vocabulary. Of course, for this extended language the evaluations $e$ have to satisfy $e(\bar{a}) = a$ for each $a \in [0, 1]$.

The soundness and completeness results for $\mathbb{K}_{\text{Ł}}$ say that a strong completeness theorem holds true, giving

$$C^{\mathrm{sem}}(\Sigma^{\sim})(\psi) = C^{\mathrm{syn}}_{\mathbb{K}_{\text{Ł}}}(\Sigma^{\sim})(\psi) \tag{2.47}$$

for each formula $\psi$ and each fuzzy set $\Sigma^{\sim}$ of formulas.
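
The generalized rule of detachment combines the two degrees by the Łukasiewicz t-norm. The following sketch shows one possible encoding, chosen only for illustration: an evaluated formula is a pair (degree, formula), and an implication is represented as a pair (antecedent, consequent). The second function checks on a sample grid the semantic fact behind the soundness of the rule, namely that lower bounds a and c for the premises yield the lower bound a * c for the conclusion.

```python
from fractions import Fraction
from itertools import product

GRID = [Fraction(i, 8) for i in range(9)]

def t_luk(a, c):
    # Lukasiewicz t-norm.
    return max(Fraction(0), a + c - 1)

def imp_luk(x, y):
    # Lukasiewicz residuum.
    return min(Fraction(1), 1 - x + y)

def detach(ev_premise, ev_implication):
    # From (a, phi) and (c, phi -> psi) infer (a * c, psi).
    a, phi = ev_premise
    c, (antecedent, psi) = ev_implication
    if antecedent != phi:
        raise ValueError("antecedent does not match the premise")
    return (t_luk(a, c), psi)

def rule_is_sound_on_grid():
    # If a <= e(phi) and c <= e(phi -> psi), then a * c <= e(psi).
    for e_phi, e_psi, a, c in product(GRID, repeat=4):
        if a <= e_phi and c <= imp_luk(e_phi, e_psi):
            if not t_luk(a, c) <= e_psi:
                return False
    return True

conclusion = detach((Fraction(3, 4), "p"), (Fraction(3, 4), ("p", "q")))
```

Here `conclusion` is the evaluated formula (1/2, "q"): two premises known to the degree 3/4 yield a conclusion known to the degree 1/2.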

If one takes the previously mentioned turn and extends the standard language of propositional Ł by truth degree constants for all degrees $a \in [0, 1]$, and if one reads each evaluated formula $(\bar{a}, \varphi)$ as the formula $\bar{a} \to \varphi$, then a slight modification $\mathbb{K}_{\text{Ł}}^{+}$ of the former calculus $\mathbb{K}_{\text{Ł}}$ again provides an adequate axiomatization: one has to add the bookkeeping axioms

$$(\bar{a} \,\&\, \bar{c}) \equiv \overline{a * c},$$
$$(\bar{a} \to \bar{c}) \equiv \overline{a \rightarrowtail c},$$

as explained, e.g., in [2.44]. And if one is interested in having evaluated formulas together with the extension of the language by truth degree constants, one also has to add the logical constant introduction rule

$$\frac{(\bar{a}, \varphi)}{\bar{a} \to \varphi}.$$

However, even a stronger result is available which refers only to a notion of derivability over a countable language. The completeness result (2.47), for $\mathbb{K}_{\text{Ł}}^{+}$ instead of $\mathbb{K}_{\text{Ł}}$, becomes already provable if one adds truth degree constants only for all the rationals in $[0, 1]$, as was shown in [2.23]. And this extension of Ł, known as Rational Pavelka Logic, is even a conservative one, cf. [2.68], i.e., $\mathbb{K}_{\text{Ł}}^{+}$ proves only such constant-free formulas of the language with rational constants which are already provable in the standard infinite-valued Łukasiewicz logic Ł.

So the GI-approach with graded notions of provability and entailment can suitably be mirrored inside standard fuzzy logics with sufficiently many truth degree constants.

For more details the reader may also consult, 
e.g., [2.23, 44, 69]. 


2.6.2 A Lattice Theoretic Approach 


For completeness, we also mention an approach toward fuzzy logics with graded notions of entailment which is much more abstract than the one explained previously for the t-norm-based fuzzy logics.

The background for this generalization by Gerla, explained in detail in [2.70], is that (already) in systems of classical logic the syntactic as well as the semantic consequence relations, i.e., the provability as well as the entailment relations, are closure operators within the set of formulas. This is a fundamental observation made by Tarski [2.71] already in 1930. And the same holds true for the Pavelka-style extensions of Sect. 2.6.1 and the operators $C^{\mathrm{sem}}$ and $C^{\mathrm{syn}}$ introduced in (2.44) and (2.45), respectively: they are generalized closure operators.

The context chosen in [2.70] is that of $L$-fuzzy sets, with $\mathbf{L} = (L, \le)$ an arbitrary complete lattice. A closure operator in $\mathbf{L}$ is a mapping $J : L \to L$ satisfying for arbitrary $x, y \in L$ the well-known conditions

$$x \le J(x) \quad \text{(increasingness)},$$
$$x \le y \Rightarrow J(x) \le J(y) \quad \text{(isotonicity)},$$
$$J(J(x)) = J(x) \quad \text{(idempotency)}.$$

And a closure system in $\mathbf{L}$ is a subclass $C \subseteq L$ which is closed under arbitrary lattice meets.
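
The three closure conditions and their link to closure systems can be checked mechanically on a small complete lattice. The sketch below assumes, purely as an example, the powerset of a four-element set ordered by inclusion, and as closure operator the upward closure under the usual order of the elements. The names `UNIVERSE`, `J` and `CLOSED` are hypothetical.

```python
from itertools import combinations

UNIVERSE = (0, 1, 2, 3)
# All subsets of UNIVERSE: the elements of the complete lattice.
SUBSETS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, r)]

def J(xs):
    # Hypothetical closure operator: upward closure under the usual order.
    return frozenset(y for y in UNIVERSE if any(x <= y for x in xs))

# The three closure conditions, with <= on frozensets meaning inclusion.
increasing = all(X <= J(X) for X in SUBSETS)
isotone = all(J(X) <= J(Y) for X in SUBSETS for Y in SUBSETS if X <= Y)
idempotent = all(J(J(X)) == J(X) for X in SUBSETS)

# The fixed points of J form a closure system: they are closed under meets.
CLOSED = {X for X in SUBSETS if J(X) == X}
meet_closed = all((X & Y) in CLOSED for X in CLOSED for Y in CLOSED)
```

In this example the fixed points of `J` are exactly the five up-sets of the chain 0 < 1 < 2 < 3, including the empty set.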

For fuzzy logic such closure operators and closure systems are considered in the lattice $\mathcal{F}_L(F)$ of all fuzzy subsets of the set $F$ of formulas of some suitable formalized language.

An abstract fuzzy deduction system now is an ordered pair $\mathcal{D} = (\mathcal{F}_L(F), D)$ determined by a closure operator $D$ in the lattice $\mathcal{F}_L(F)$. And the fuzzy theories $T$ of such an abstract fuzzy deduction system, also called $\mathcal{D}$-theories, are the fixed points of $D$: $T = D(T)$, i.e., the deductively closed fuzzy sets of formulas.




A rather abstract setting is also chosen for the semantics of such an abstract fuzzy deduction system: an abstract fuzzy semantics $\mathcal{M}$ is nothing but a class of elements of the lattice $\mathcal{F}_L(F)$, i.e., a class of fuzzy sets of formulas. These fuzzy sets of formulas are called models. The only restriction is that the universal set over $F$, i.e., the fuzzy subset of $F$ which always has membership degree one, is not allowed as a model. The background idea here is that, for each standard interpretation $\mathfrak{A}$ (in the sense of many-valued logic, including an evaluation of the individual variables) for the formulas of $F$, a model $M$ is determined as the fuzzy set which has for each formula $\varphi \in F$ the truth degree of $\varphi$ in $\mathfrak{A}$ as membership degree. Accordingly, the satisfaction relation $\models_{\mathcal{M}}$ coincides with inclusion: for models $M \in \mathcal{M}$ and fuzzy sets $\Sigma$ of formulas one has

$$M \models_{\mathcal{M}} \Sigma \iff \Sigma \subseteq M. \tag{2.48}$$

In this setting, one has a semantic and a syntactic consequence operator, both being closure operators, i.e., one has for each fuzzy set $\Sigma$ of formulas from $F$ a semantic as well as a syntactic consequence hull, given by

$$\begin{aligned} C^{\mathrm{sem}}(\Sigma) &= \bigcap \{ M \in \mathcal{M} \mid M \models_{\mathcal{M}} \Sigma \}, \\ C^{\mathrm{syn}}(\Sigma) &= D(\Sigma). \end{aligned} \tag{2.49}$$

Similar to the classical case one has $C^{\mathrm{sem}}(M) = M$ for each model $M \in \mathcal{M}$, i.e., each such model provides a $C^{\mathrm{sem}}$-theory.

However, a general completeness theorem is not available. What one needs instead, in the search for a completeness result, are specifications which restrict the full generality of this approach, and lead mainly back to situations which have been discussed in the previous sections.


2.7 Some Complexity Results


Each (left-continuous) t-norm $T$ determines four important sets of formulas:


• 1TAUT($T$): The set of all 1-tautologies.
• posTAUT($T$): The set of all positive tautologies.
• 1SAT($T$): The set of 1-satisfiable formulas.
• posSAT($T$): The set of all positively satisfiable formulas.


Here a 1-tautology is a formula valid in $[0, 1]_T$, i.e., having for each evaluation of propositional variables by elements of $[0, 1]$ the value 1 in $[0, 1]_T$. And a positive tautology is a formula which has for each evaluation a positive value in $[0, 1]_T$. Similarly 1-satisfiability means to have the $[0, 1]_T$-value 1 for some evaluation, and positive satisfiability means to have for some evaluation a positive $[0, 1]_T$-value.

In the same way, one defines analogous sets corresponding to sets of t-norms; in particular, with BL referring to the set of all continuous t-norms, one defines

$$\mathrm{1TAUT}(\mathrm{BL}) = \bigcap \{ \mathrm{1TAUT}(T) \mid T \text{ a continuous t-norm} \},$$
$$\mathrm{posTAUT}(\mathrm{BL}) = \bigcap \{ \mathrm{posTAUT}(T) \mid T \text{ a continuous t-norm} \},$$

and similarly for the satisfiability cases.


There are interesting results on the computational complexity of these sets. So it was already shown in [2.23] that if the t-norm $T$ is $T_{\text{Ł}}$, or $T_{\mathrm{G}}$, or $T_{\mathrm{P}}$, then 1TAUT($T$) and posTAUT($T$) are co-NP-complete, and 1SAT($T$) and posSAT($T$) are NP-complete. This result was partly strengthened in [2.34], yielding that 1TAUT($T$) is co-NP-complete for each continuous t-norm $T$.

The corresponding results have been proved 
in [2.72, 73] for the logic BL of continuous t-norms. 


Theorem 2.12 
1TAUT(BL) and posTAUT(BL) are co-NP-complete, and 
1SAT(BL) as well as posSAT(BL) are NP-complete. 


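Furthermore, there are several results on equality or inequality among the sets involved [2.23, 73]. So one has, e.g.,

$$\mathrm{1SAT}(\mathrm{G}) = \mathrm{posSAT}(\mathrm{G}) = \mathrm{1SAT}(\Pi) = \mathrm{posSAT}(\Pi),$$

but also

$$\mathrm{1SAT}(\text{Ł}) \neq \mathrm{posSAT}(\text{Ł})$$

and

$$\mathrm{posSAT}(\mathrm{BL}) = \mathrm{posSAT}(\text{Ł}).$$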
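
The difference between 1-satisfiability and positive satisfiability in the Łukasiewicz case can be seen on a single formula. The sketch below evaluates the formula "p and not p", with the weak conjunction interpreted as minimum, under the Łukasiewicz negation 1 - x and under the Gödel negation. The choice of formula, the function names and the sample grid are assumptions made for this illustration.

```python
from fractions import Fraction

GRID = [Fraction(i, 10) for i in range(11)]

def neg_luk(x):
    # Lukasiewicz negation.
    return 1 - x

def neg_godel(x):
    # Goedel negation: 1 for x = 0, otherwise 0.
    return Fraction(1) if x == 0 else Fraction(0)

def values_of_p_and_not_p(neg):
    # Truth degrees of "p and not p" (weak conjunction = min) on the grid.
    return [min(x, neg(x)) for x in GRID]

luk_vals = values_of_p_and_not_p(neg_luk)
godel_vals = values_of_p_and_not_p(neg_godel)
```

Under the Łukasiewicz negation the formula reaches the positive value 1/2 at p = 1/2, and since min(x, 1 - x) never exceeds 1/2 it cannot reach 1, so it is positively satisfiable without being 1-satisfiable. Under the Gödel negation its value is 0 throughout.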


For the 1-tautologicity the papers [2.74-76] contain interesting results relating the property of a formula being a 1-tautology of one of the logics Ł, G, Π to the property of being a 1-tautology of finitely many finite-valued logics of estimated complexity, and similar results for 1TAUT(BL). For example, $\varphi$ is in 1TAUT(Ł) if and only if it is a 1-tautology of the finitely valued Łukasiewicz logic Ł$_m$ for $m$ being $2^{\#(\varphi)}$, where $\#(\varphi)$ is the number of occurrences of variables in $\varphi$.
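
Checking 1-tautologicity over a finite Łukasiewicz chain is a plain brute-force computation. The following sketch assumes a hypothetical encoding of formulas as nested tuples and takes the m-valued chain to consist of the degrees 0, 1/(m-1), ..., 1; this indexing convention is an assumption of the example and may differ from the one used in the cited papers. The code only tests a formula for a given m; it does not by itself implement the reduction stated above.

```python
from fractions import Fraction
from itertools import product

def chain(m):
    # Truth degrees of the m-valued chain (m >= 2): 0, 1/(m-1), ..., 1.
    return [Fraction(i, m - 1) for i in range(m)]

# Formulas as nested tuples: ("var", name), ("neg", a), ("imp", a, b), ("or", a, b).
def value(f, e):
    op = f[0]
    if op == "var":
        return e[f[1]]
    if op == "neg":
        return 1 - value(f[1], e)
    if op == "imp":
        return min(Fraction(1), 1 - value(f[1], e) + value(f[2], e))
    if op == "or":
        return max(value(f[1], e), value(f[2], e))
    raise ValueError(op)

def variables(f):
    # Set of variable names occurring in the formula.
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def is_1_tautology(f, m):
    # Does f take the value 1 under every evaluation in the m-valued chain?
    names = sorted(variables(f))
    return all(value(f, dict(zip(names, vals))) == 1
               for vals in product(chain(m), repeat=len(names)))

p, q = ("var", "p"), ("var", "q")
weakening = ("imp", p, ("imp", q, p))      # p -> (q -> p)
excluded_middle = ("or", p, ("neg", p))    # p or not p (weak disjunction)
```

For instance, `weakening` passes the test for every m tried, while `excluded_middle` passes for m = 2 but fails for m = 3, where p = 1/2 gives the value 1/2.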

Recall that for predicate logics $L_K\forall$, the general models are safe interpretations over any linearly ordered $L_K$-algebra, and the standard models (for t-norm-based logics) are interpretations over any t-algebra which is also an $L_K$-algebra.


Definition 2.4 
Let $\varphi$ be a closed formula of the language of $L_K\forall$:


1. $\varphi$ is a general $L_K\forall$-tautology if $\varphi$ is valid in each safe interpretation over any linearly ordered $L_K$-algebra; genTAUT($L_K\forall$) is the set of all general $L_K\forall$-tautologies.

2. $\varphi$ is a standard $L_K\forall$-tautology if $\varphi$ is valid in each safe interpretation over any standard $L_K$-algebra; stTAUT($L_K\forall$) is the set of all standard $L_K\forall$-tautologies.

3. $\varphi$ is $L_K\forall$-satisfiable if $\varphi$ is valid in some safe interpretation over some linearly ordered $L_K$-algebra; genSAT($L_K\forall$) is the set of all $L_K\forall$-satisfiable sentences.

4. $\varphi$ is standardly $L_K\forall$-satisfiable if $\varphi$ is valid in some interpretation over some standard $L_K$-algebra; stSAT($L_K\forall$) is the set of all standardly $L_K\forall$-satisfiable formulas.


It was already shown in [2.23] that if $\mathcal{L}$ is the
logic BL or one of its specifications Ł, G, Π, then
$\mathrm{genTAUT}(\mathcal{L}\forall)$ is $\Sigma_1$-complete and $\mathrm{genSAT}(\mathcal{L}\forall)$ is
$\Pi_1$-complete. This result has been extended in [2.77] to
any t-norm-based logic $\mathcal{L}_T\forall$ for a continuous t-norm T.

For standard semantics the situation is different.
Ragaz [2.78, 79] already proved that the set $\mathrm{stTAUT}(\text{Ł}\forall)$ of
standard tautologies of the infinite-valued Łukasiewicz
logic is $\Pi_2$-complete. Hájek [2.23]
also showed that $\mathrm{stTAUT}(G\forall) = \mathrm{genTAUT}(G\forall)$ and that
therefore the set $\mathrm{stTAUT}(G\forall)$ of standard tautologies of
the infinite-valued Gödel logic is $\Sigma_1$-complete.

These results have been considerably extended by 
Montagna [2.80] yielding the following facts. 


Theorem 2.13 


1. For each set K of continuous t-norms containing
a t-norm different from $T_G$, the set $\mathrm{stTAUT}(\mathcal{L}_K\forall)$ is
$\Pi_2$-hard.

2. If K is a nonempty set of continuous t-norms
containing a t-norm which has, in its ordinal sum
representation, a product component or a nonextremal
(this means being neither first nor last summand)
Łukasiewicz component, then $\mathrm{stTAUT}(\mathcal{L}_K\forall)$ is not
arithmetical.


The arithmetical complexity of the set $\mathrm{stTAUT}(\mathcal{L}_T\forall)$
remains undetermined if T is, e.g., one of the t-norms
Ł $\oplus$ Ł, G $\oplus$ Ł, Ł $\oplus$ G, or Ł $\oplus$ G $\oplus$ Ł, with $\oplus$ denoting the
ordinal sum operation.

For standard satisfiability Hájek [2.23] proved that
the sets $\mathrm{stSAT}(G\forall)$ and $\mathrm{stSAT}(\text{Ł}\forall)$ are $\Pi_1$-complete. He
also proved, in [2.81], that the set $\mathrm{stSAT}(\Pi\forall)$ is not
arithmetical, and gave in [2.82] the following result.


Theorem 2.14
If $\otimes$ is a continuous t-norm whose first ordinal sum
component is G or Ł, then $\mathrm{stSAT}(\mathcal{L}_\otimes\forall)$ is $\Pi_1$-complete.


The reason is that one has, under these assumptions,
$$\mathrm{stSAT}(\mathcal{L}_\otimes\forall) = \mathrm{stSAT}(G\forall)$$

or
$$\mathrm{stSAT}(\mathcal{L}_\otimes\forall) = \mathrm{stSAT}(\text{Ł}\forall),$$
respectively.


Montagna [2.83] added for the product logic Π the
following more general result.


Theorem 2.15 

If K is a nonempty set of continuous t-norms containing
$T_P$, or a t-norm whose first ordinal summand is $T_P$,
then $\mathrm{stSAT}(\mathcal{L}_K\forall)$ is not arithmetical.


The complexity of $\mathrm{stSAT}(\mathcal{L}_T\forall)$ for continuous
t-norms T which do not have a first component in their
ordinal sum representation is an open problem.

More complexity results are surveyed in [2.77] and 
more recently in [2.84, 85]. 




2.8 Concluding Remarks 


The reader who is interested in further results, or in
more details, might consult the recent Handbook of
Mathematical Fuzzy Logic [2.86]. This Handbook surveys
the whole field of mathematical fuzzy logics and
offers the most up-to-date account of the state of the art
in this field. For the wider topic of many-valued logics,
[2.5] is still the best reference.

There is one approach, however, which is not discussed
here and only briefly mentioned in [2.86]:
a version of Church-style type theory based on suitable
mathematical fuzzy logics, called fuzzy type theory, and
developed by Novák [2.87]. This fuzzy type theory is
particularly used for linguistic modeling. A good survey
is [2.88].

There are further topics also often treated under the
heading fuzzy logic which have been omitted in this survey.
Here the focus was on what is called mathematical
fuzzy logic or also fuzzy logic in the narrow sense. Those
other topics are mainly classified as fuzzy logic in the wider
sense. They include topics like fuzzy implications and
approximate and commonsense reasoning, and some of
them are discussed elsewhere in this Handbook.


References


2.1 J. Łukasiewicz, A. Tarski: Untersuchungen über den
Aussagenkalkül, C. R. Séances Soc. Sci. Lett. Vars. Cl.
III 23, 30-50 (1930)

2.2 K. Gödel: Zum intuitionistischen Aussagenkalkül,
Anz. Akad. Wiss. Wien: Math.-naturwiss. Kl. 69, 65-
66 (1932)

2.3 P. Hájek, L. Godo, F. Esteva: A complete many-valued
logic with product-conjunction, Arch. Math. Log. 35,
191-208 (1996)

2.4 E.L. Post: Introduction to a general theory of el- 
ementary propositions, Am. J. Math. 43, 163-185 
(1921) 

2.5 S. Gottwald: A Treatise on Many-Valued Logics, Stud-
ies in Logic and Computation, Vol. 9 (Research Stud- 
ies, Baldock 2001) 

2.6 M. Dummett: A propositional calculus with denu- 
merable matrix, J. Symb. Log. 24, 97-106 (1959) 

2.7 C.C. Chang: Algebraic analysis of many valued logics, 
Trans. Am. Math. Soc. 88, 476-490 (1958) 

2.8 R. Cignoli, I.M.L. D'Ottaviano, D. Mundici: Algebraic
Foundations of Many-Valued Reasoning, Trends in 
Logic — Studia Logica Library, Vol. 7 (Kluwer, Dor- 
drecht 2000) 

2.9 P.C. Rosenbloom: Post algebras. I. postulates and 
general theory, Am. J. Math. 64, 167-188 (1942) 

2.10 G. Epstein: The lattice theory of post algebras, Trans. 
Am. Math. Soc. 95, 300-317 (1960) 

2.11 N. Cat-Ho: Generalized Post Algebras and Their Ap- 
plications to Some Infinitary Many-Valued Logics, 
Diss. Math., Vol. 57 (PWN, Warsaw 1973) 

2.12 P. Dwinger: Generalized post algebras, Bull. Acad. 
Polon. Sci. Sér. Sci. Math. Astronom. Phys. 16, 559- 
563 (1968) 

2.13 P. Dwinger: A survey of the theory of post alge- 
bras and their generalizations. In: Modern Uses of 
Multiple-Valued Logic, ed. by J.M. Dunn, G. Epstein 
(Reidel, Dordrecht 1977) pp. 53-75 

2.14 H. Rasiowa: On generalised post algebras of order
$\omega^{+}$ and $\omega^{+}$-valued predicate calculi, Bull. Acad.
Polon. Sci. Sér. Sci. Math. Astron. Phys. 21, 209-219
(1973)

2.15 G. Epstein, H. Rasiowa: Theory and uses of post
algebras of order $\omega + \omega^{*}$. Part I, 20th Int. Symp.
Multiple-Valued Log., Charlotte/NC 1990 (IEEE Computer
Society, New York 1990) pp. 42-47

2.16 G. Epstein, H. Rasiowa: Theory and uses of post
algebras of order $\omega + \omega^{*}$. Part II, 21st Int. Symp.
Multiple-Valued Log., Victoria/B.C., 1991 (IEEE Computer
Society, New York 1991) pp. 248-254

2.17 D.C. Rine (Ed.): Computer Science and Multiple Val- 
ued Logic, 2nd edn. (North-Holland, Amsterdam 
1984) 

2.18 L.A. Zadeh: Fuzzy sets, Inf. Control 8, 338-353 (1965) 

2.19 B. Schweizer, A. Sklar: Probabilistic Metric Spaces 
(North-Holland, Amsterdam 1983) 

2.20 E.P. Klement, R. Mesiar, E. Pap: Triangular Norms 
(Kluwer, Dordrecht 2000) 

2.21 C. Alsina, M.J. Frank, B. Schweizer: Associative func- 
tions. Triangular Norms and Copulas (World Scien- 
tific, Hackensack 2006) 

2.22 S. Gottwald: Fuzzy Sets and Fuzzy Logic: Founda- 
tions of Application - From a Mathematical Point 
of View. Artificial Intelligence (Verlag Vieweg, Wies- 
baden, and Tecnea, Toulouse 1993) 

2.23 P. Hájek: Metamathematics of Fuzzy Logic, Trends in
Logic, Vol. 4 (Kluwer, Dordrecht 1998)

2.24 P. Hájek: Basic fuzzy logic and BL-algebras, Soft Comput.
2, 124-128 (1998)

2.25 D. Butnariu, E.P. Klement, S. Zafrany: On triangular
norm-based propositional fuzzy logics, Fuzzy Sets
Syst. 69, 241-255 (1995)

2.26 J. Hekrdla, E.P. Klement, M. Navara: Two approaches
to fuzzy propositional logics, J. Multiple-Valued Log.
Soft Comput. 9, 343-360 (2003)

2.27 F. Esteva, L. Godo: Monoidal t-norm based logic: To- 
ward a logic for left-continuous t-norms, Fuzzy Sets 
Syst. 124, 271-288 (2001) 


2.28 U. Höhle: On the fundamentals of fuzzy set theory,
J. Math. Anal. Appl. 201, 786-826 (1996)

2.29 H. Rasiowa: An Algebraic Approach to Non-Classical
Logics (North-Holland/PWN, Amsterdam/Warsaw 1974)

2.30 R. Cignoli, F. Esteva, L. Godo, A. Torrens: Basic fuzzy
logic is the logic of continuous t-norms and their
residua, Soft Comput. 4, 106-112 (2000)

2.31 S. Burris, H.P. Sankappanavar: A Course in Universal
Algebra (Springer, New York 1981)

2.32 K. Denecke, S.L. Wismath: Universal Algebra and
Applications in Theoretical Computer Science (Chapman
Hall/CRC, Boca Raton 2002)

2.33 Z. Haniková: Standard algebras for fuzzy propositional
calculi, Fuzzy Sets Syst. 124, 309-320 (2001)

2.34 Z. Haniková: A note on the complexity of propositional
logics of individual t-algebras, Neural Netw.
World 12, 453-460 (2002)

2.35 S. Jenei, F. Montagna: A proof of standard completeness
for Esteva and Godo's logic MTL, Stud. Log. 70,
183-192 (2002)

2.36 P. Cintula, F. Esteva, J. Gispert, L. Godo, F. Montagna,
C. Noguera: Distinguished algebraic semantics
for t-norm based fuzzy logics: Methods and algebraic
equivalencies, Ann. Pure Appl. Log. 160, 53-81 (2009)

2.37 U. Höhle: Presheaves over GL-monoids. In: Non-Classical
Logics and Their Applications to Fuzzy Subsets,
Theory and Decision Library, Series B, Vol. 32, ed.
by U. Höhle, E.P. Klement (Kluwer, Dordrecht 1995)
pp. 127-157

2.38 U. Höhle: Commutative, residuated l-monoids. In:
Non-Classical Logics and Their Applications to Fuzzy
Subsets, Theory and Decision Library, Series B, Vol. 32,
ed. by U. Höhle, E.P. Klement (Kluwer, Dordrecht
1995) pp. 53-106

2.39 F. Esteva, L. Godo, F. Montagna: Equational characterization
of the subvarieties of BL generated by
t-norm algebras, Stud. Log. 76, 161-200 (2004)

2.40 P. Cintula, P. Hájek: Triangular norm based predicate
fuzzy logics, Fuzzy Sets Syst. 161, 311-346 (2010)

2.41 P. Cintula, P. Hájek: On theories and models in fuzzy
predicate logics, J. Symb. Log. 71, 863-880 (2006)

2.42 H. Thiele: Theorie der endlichwertigen Łukasiewiczschen
Prädikatenkalküle der ersten Stufe, Z. Math.
Log. Grundl. Math. 4, 108-142 (1958)

2.43 S. Gottwald: A generalized Łukasiewicz-style identity
logic. In: Mathematical Logic and Formal Systems,
Lecture Notes in Pure and Applied Mathematics,
Vol. 94, ed. by L.P. de Alcantara (Marcel Dekker, New
York 1985) pp. 183-195

2.44 V. Novák, I. Perfilieva, J. Močkoř: Mathematical Principles
of Fuzzy Logic (Kluwer, Boston 1999)

2.45 R. Bělohlávek, V. Vychodil: Fuzzy Horn logic. I. Proof
theory, Arch. Math. Log. 45, 3-51 (2006)

2.46 R. Bělohlávek, V. Vychodil: Fuzzy Horn logic. II.
Implicationally defined classes, Arch. Math. Log. 45,
149-177 (2006)

2.47 M. Baaz: Infinite-valued Gödel logics with 0-1 projections
and relativizations. In: Gödel '96, Lecture
Notes in Logic, Vol. 6, ed. by P. Hájek (Springer, New
York 1996) pp. 23-33

2.48 F. Esteva, L. Godo, P. Hájek, M. Navara: Residuated
fuzzy logic with an involutive negation, Arch. Math.
Log. 39, 103-124 (2000)

2.49 S. Gottwald, S. Jenei: A new axiomatization for involutive
monoidal t-norm based logic, Fuzzy Sets Syst.
124, 303-307 (2001)

2.50 P. Cintula: The ŁΠ and ŁΠ½ propositional and predicate
logics, Fuzzy Sets Syst. 124, 289-302 (2001)

2.51 P. Cintula: An alternative approach to the ŁΠ logic,
Neural Netw. World 124, 561-575 (2001)

2.52 P. Cintula: Advances in ŁΠ and ŁΠ½ logics, Arch.
Math. Log. 42, 449-468 (2003)

2.53 F. Esteva, L. Godo: Putting together Łukasiewicz
and product logics, Mathw. Soft Comput. 6, 219-234
(1999)

2.54 F. Esteva, L. Godo, F. Montagna: The ŁΠ and
ŁΠ½ logics: Two complete fuzzy systems joining
Łukasiewicz and product logics, Arch. Math. Log. 40,
39-67 (2001)

2.55 P. Agliano, I.M.A. Ferreirim, F. Montagna: Basic
hoops: An algebraic study of continuous t-norms,
Stud. Log. 87, 73-98 (2007)

2.56 F. Esteva, L. Godo, P. Hájek, F. Montagna: Hoops and
fuzzy logic, J. Log. Comput. 13, 531-555 (2003)

2.57 A. di Nola, G. Georgescu, A. Iorgulescu: Pseudo-BL
algebras. I and II, J. Multiple-Valued Log. 8, 671-750
(2002)

2.58 P. Flondor, G. Georgescu, A. Iorgulescu: Pseudo t-norms
and pseudo-BL algebras, Soft Comput. 5,
355-371 (2001)

2.59 P. Hájek: Embedding standard BL-algebras into
non-commutative pseudo-BL-algebras, Tatra Mt. Math.
Publ. 27, 125-130 (2003)

2.60 P. Hájek: Fuzzy logics with non-commutative conjunctions,
J. Log. Comput. 13, 469-479 (2003)

2.61 P. Hájek: Observations on non-commutative fuzzy
logics, Soft Comput. 8, 28-43 (2003)

2.62 S. Jenei, F. Montagna: A proof of standard completeness
for non-commutative monoidal t-norm logic,
Neural Netw. World 13, 481-488 (2003)

2.63 J. Kühr: Pseudo-BL algebras and PRI-monoids,
Math. Bohem. 128, 199-208 (2003)

2.64 P. Hájek: Fleas and fuzzy logic, J. Multiple-Valued
Log. Soft Comput. 11, 137-152 (2005)

2.65 J. Pavelka: On fuzzy logic. Part I, Z. Math. Log. Grundl.
Math. 25, 45-52 (1979)

2.66 J. Pavelka: On fuzzy logic. Part II, Z. Math. Log.
Grundl. Math. 25, 119-134 (1979)

2.67 J. Pavelka: On fuzzy logic. Part III, Z. Math. Log.
Grundl. Math. 25, 447-464 (1979)

2.68 P. Hájek, J. Paris, J. Shepherdson: Rational Pavelka
predicate logic is a conservative extension of
Łukasiewicz predicate logic, J. Symb. Log. 65, 669-
682 (2000)


2.69 E. Turunen: Well-defined fuzzy sentential logic,
Math. Log. Quart. 41, 236-248 (1995)

2.70 G. Gerla: Fuzzy Logic. Mathematical Tools for
Approximate Reasoning, Trends in Logic, Vol. 11 (Kluwer,
Dordrecht 2001)

2.71 A. Tarski: Fundamentale Begriffe der Methodologie
der deduktiven Wissenschaften, Monatsh. Math.
Phys. 37, 361-404 (1930)

2.72 M. Baaz, P. Hájek, F. Montagna, H. Veith: Complexity
of t-tautologies, Ann. Pure Appl. Log. 113, 3-11 (2002)

2.73 P. Hájek: Basic fuzzy logic and BL-algebras II, Soft
Comput. 7, 179-183 (2003)

2.74 S. Aguzzoli, A. Ciabattoni: Finiteness in infinite-valued
Łukasiewicz logic, J. Log. Lang. Inf. 9, 5-29
(2000)

2.75 S. Aguzzoli, B. Gerla: Finite-valued reductions of
infinite-valued logics, Arch. Math. Log. 41, 361-399
(2002)

2.76 S. Aguzzoli, B. Gerla: On countermodels in basic
logic, Neural Netw. World 12, 407-420 (2002)

2.77 P. Hájek: Arithmetical complexity of fuzzy predicate
logics - A survey, Soft Comput. 9(12), 935-941 (2005)

2.78 M. Ragaz: Arithmetische Klassifikation von Formelmengen
der unendlichwertigen Logik, Ph.D. Thesis
(Abteilung Mathematik der Eidgenössischen Technischen
Hochschule Zürich, Zürich 1981)

2.79 M. Ragaz: Die Unentscheidbarkeit der einstelligen
unendlichwertigen Prädikatenlogik, Arch. Math. Log.
Grundlagenforsch. 23, 129-139 (1983)

2.80 F. Montagna: On the predicate logics of continuous
t-norm BL-algebras, Arch. Math. Log. 44, 97-114
(2005)

2.81 P. Hájek: Fuzzy logic and arithmetical hierarchy III,
Stud. Log. 68, 129-142 (2001)

2.82 P. Hájek: Fuzzy logic and arithmetical hierarchy IV,
First-Order Logic Revisited, Proc. Conf. FOL75 - 75
Years of First-Order Logic, Berlin, ed. by V. Hendricks
(Logos, Berlin 2004) pp. 107-115

2.83 F. Montagna: Three complexity problems in quantified
fuzzy logic, Stud. Log. 68, 143-152 (2001)

2.84 Z. Haniková: Computational complexity of propositional
fuzzy logics. In: Handbook of Mathematical
Fuzzy Logic, Studies in Logic, Vol. 2, ed. by P. Cintula,
P. Hájek, C. Noguera (College Publ., London 2011)
pp. 793-851

2.85 P. Hájek, F. Montagna, C. Noguera: Computational
complexity of first-order fuzzy logics. In: Handbook
of Mathematical Fuzzy Logic, Studies in Logic, ed. by
P. Cintula, P. Hájek, C. Noguera (College Publ., London
2011) pp. 853-908

2.86 P. Cintula, P. Hájek, C. Noguera (Eds.): Handbook of
Mathematical Fuzzy Logic, Studies in Logic, Vol. 37
(College Publ., London 2011)

2.87 V. Novák: On fuzzy type theory, Fuzzy Sets Syst. 149(2),
235-273 (2005)

2.88 V. Novák: Reasoning about mathematical fuzzy logic
and its future, Fuzzy Sets Syst. 192, 25-44 (2012)




3. Possibility Theory and Its Applications: Where Do We Stand?


Didier Dubois, Henry Prade


This chapter provides an overview of possibility
theory, emphasizing its historical roots and its recent
developments. Possibility theory lies at the
crossroads between fuzzy sets, probability, and
nonmonotonic reasoning. Possibility theory can be
cast either in an ordinal or in a numerical setting.
Qualitative possibility theory is closely related to
belief revision theory, and commonsense reasoning
with exception-tainted knowledge in artificial
intelligence. Possibilistic logic provides a rich
representation setting, which enables the handling
of lower bounds of possibility theory measures,
while remaining close to classical logic. Qualitative
possibility theory has been axiomatically
justified in a decision-theoretic framework in the
style of Savage, thus providing a foundation for
qualitative decision theory. Quantitative possibility
theory is the simplest framework for statistical
reasoning with imprecise probabilities. As such, it
has close connections with random set theory and
confidence intervals, and can provide a tool for
uncertainty propagation with limited statistical or
subjective information.


3.1 Historical Background
  3.1.1 G.L.S. Shackle
  3.1.2 D. Lewis
  3.1.3 L.J. Cohen
  3.1.4 L.A. Zadeh
3.2 Basic Notions of Possibility Theory
  3.2.1 Possibility Distributions
  3.2.2 Specificity
  3.2.3 Possibility and Necessity Functions
  3.2.4 Certainty Qualification
  3.2.5 Joint Possibility Distributions
  3.2.6 Conditioning
  3.2.7 Independence
  3.2.8 Fuzzy Interval Analysis
  3.2.9 Guaranteed Possibility
  3.2.10 Bipolar Possibility Theory
3.3 Qualitative Possibility Theory
  3.3.1 Possibility Theory and Modal Logic
  3.3.2 Comparative Possibility
  3.3.3 Possibility Theory and Nonmonotonic Inference
  3.3.4 Possibilistic Logic
  3.3.5 Ranking Function Theory
  3.3.6 Possibilistic Belief Networks
  3.3.7 Fuzzy Rule-Based and Case-Based Approximate Reasoning
  3.3.8 Preference Representation
  3.3.9 Decision-Theoretic Foundations
3.4 Quantitative Possibility Theory
  3.4.1 Possibility as Upper Probability
  3.4.2 Conditioning
  3.4.3 Probability-Possibility Transformations
3.5 Some Applications
  3.5.1 Uncertain Database Querying and Preference Queries
  3.5.2 Description Logics
  3.5.3 Information Fusion
  3.5.4 Temporal Reasoning and Scheduling
  3.5.5 Risk Analysis
  3.5.6 Machine Learning
3.6 Some Current Research Lines
References


Possibility theory is an uncertainty theory devoted to
the handling of incomplete information. To a large extent,
it is comparable to probability theory because
it is based on set functions. It differs from the latter
by the use of a pair of dual set functions (possibility
and necessity measures) instead of only one. Besides,
it is not additive and makes sense on ordinal structures.
The name Theory of Possibility was coined by
Zadeh [3.1], who was inspired by a paper by Gaines and
Kohout [3.2]. In Zadeh's view, possibility distributions
were meant to provide a graded semantics to natural
language statements; on this basis, possibility degrees
can be attached to other statements, as well as dual
necessity degrees expressing graded certainty. However,
possibility and necessity measures can also be the basis
of a full-fledged representation of partial belief that
parallels probability, without compulsory reference to
linguistic information [3.3, 4]. It can be seen either as
a coarse, nonnumerical version of probability theory, or
a framework for reasoning with extreme probabilities,
or yet a simple approach to reasoning with imprecise
probabilities [3.5].

Besides, possibility distributions can also be interpreted
as representations of preference, thus standing
for a counterpart to a utility function. In this
case, possibility degrees estimate degrees of feasibility
of alternative choices, while necessity measures
can represent priorities [3.6]. The possibility theory
framework is also bipolar [3.7] because distributions
may either restrict the possible states of the world
(negative information pointing out the impossible), or
model sets of actually observed possibilities (positive
information pointing out the possible). Negative
information refers to pieces of knowledge that are
supposedly correct and act as constraints. Possibility
and necessity measures rely on negative information.
Positive information refers to reports of actually observed
states, or to sets of preferred choices. It induces
two other set functions, guaranteed possibility
measures and their duals, which are decreasing w.r.t. set
inclusion [3.8].

After reviewing pioneering contributions to possibility
theory, we recall its basic concepts, namely the
four set functions at work in possibility theory. Then we
present the two main directions along which possibility
theory has developed: the qualitative and quantitative
settings. Both approaches share the same basic maxitivity
axiom. They differ when it comes to conditioning,
and to independence notions. We point out the connections
with a coarse numerical integer-valued approach
to belief representation, proposed by Spohn [3.9], now
known as ranking theory [3.10].

In each setting, we discuss current and prospective
lines of research. In the qualitative approach, we review
the connections between possibility theory and modal
logic, possibilistic logic and its applications to nonmonotonic
reasoning, logic programming and the like,
possibilistic counterparts of Bayesian belief networks,
the framework of soft constraints and the possibilistic
approach to qualitative decision theory, and more recent
investigations in formal concept analysis and learning.
On the quantitative side, we review quantitative possibilistic
networks, the connections between possibility
theory, belief functions and imprecise probabilities, the
connections with non-Bayesian statistics, and the application
of quantitative possibility to risk analysis.


3.1 Historical Background


Zadeh was not the first scientist to speak about formalising
notions of possibility. The modalities possible
and necessary have been used in philosophy
at least since the Middle Ages in Europe, based
on Aristotle's and Theophrastus' works [3.11]. More
recently these notions became the building blocks
of modal logics that emerged at the beginning of
the 20th century from the works of C.I. Lewis
(see Cresswell [3.12]). In this approach, possibility
and necessity are all-or-nothing notions, and handled
at the syntactic level. More recently, and independently
from Zadeh's view, the notion of possibility,
as opposed to probability, was central in the
works of one economist, and in those of two philosophers.


3.1.1 G.L.S. Shackle 


A graded notion of possibility was introduced as a full- 
fledged approach to uncertainty and decision in 1940- 
1970 by the English economist Shackle [3.13], who 
called degree of potential surprise of an event its de- 
gree of impossibility, that is, retrospectively, the degree 
of necessity of the opposite event. Shackle’s notion of 
possibility is basically epistemic, it is a character of the 
chooser’s particular state of knowledge in his present. 
Impossibility is understood as disbelief. Potential sur- 
prise is valued on a disbelief scale, namely a positive 
interval of the form [0, y*], where y* denotes the ab- 
solute rejection of the event to which it is assigned. 
In case everything is possible, all mutually exclusive
hypotheses have zero surprise. At least one elementary
hypothesis must carry zero potential surprise. The
degree of surprise of an event, a set of elementary
hypotheses, is the degree of surprise of its least surprising
realization. Shackle also introduces a notion of conditional
possibility, whereby the degree of surprise of
a conjunction of two events A and B is equal to the maximum
of the degree of surprise of A, and of the degree of
surprise of B, should A prove true. The disbelief notion
introduced later by Spohn [3.9, 10] employs the same
type of convention as potential surprise, but uses the set
of natural integers as a disbelief scale; his conditioning
rule uses the subtraction of natural integers.


3.1.2 D. Lewis 


In his 1973 book [3.14], the philosopher David Lewis 
considers a graded notion of possibility in the form of 
a relation between possible worlds he calls compara- 
tive possibility. He connects this concept of possibility 
to a notion of similarity between possible worlds. This 
asymmetric notion of similarity is also comparative, 
and is meant to express statements of the form: a world
j is at least as similar to world i as world k is. Comparative
similarity of j and k with respect to i is interpreted
as the comparative possibility of j with respect to k
viewed from world i. Such relations are assumed to be
complete pre-orderings and are instrumental in defining
the truth conditions of counterfactual statements (of the
form If I were rich, I would buy a big boat). Comparative
possibility relations $\geq_\Pi$ obey the key axiom: for
all events A, B, C

$$A \geq_\Pi B \text{ implies } C \cup A \geq_\Pi C \cup B.$$

This axiom was later independently proposed by the
first author [3.15] in an attempt to derive a possibilistic
counterpart to comparative probabilities. Independently,
the connection between numerical possibility
degrees and similarity was investigated by Sudkamp [3.16].


3.1.3 L.J. Cohen 


A framework very similar to the one of Shackle was 
proposed by the philosopher Cohen [3.17] who con- 
sidered the problem of legal reasoning. He introduced 
so-called Baconian probabilities understood as degrees 
of provability. The idea is that it is hard to prove someone
guilty in a court of law by means of purely statistical
arguments. The basic feature of degrees of provability is
that a hypothesis and its negation cannot both be prov- 
able together to any extent (the contrary being a case 
for inconsistency). Such degrees of provability coincide 
with what is known as necessity measures. 


3.1.4 L.A. Zadeh 


In his seminal paper [3.1], Zadeh proposed an inter- 
pretation of membership functions of fuzzy sets as 
possibility distributions encoding flexible constraints 
induced by natural language statements. Zadeh tenta- 
tively articulated the relationship between possibility 
and probability, noticing that what is probable must 
preliminarily be possible. However, the view of pos- 
sibility degrees developed in his paper refers to the 
idea of graded feasibility (degrees of ease, as in the 
example of how many eggs can Hans eat for his break- 
fast) rather than to the epistemic notion of plausibility 
laid bare by Shackle. Nevertheless, the key axiom of 
maxitivity for possibility measures is highlighted. In 
the two subsequent articles [3.18, 19], Zadeh acknowl- 
edged the connection between possibility theory, belief 
functions and upper/lower probabilities, and proposed 
their extensions to fuzzy events and fuzzy information 
granules. 


3.2 Basic Notions of Possibility Theory 


The basic building blocks of possibility theory orig- 
inate in Zadeh’s paper [3.1] and were first ex- 
tensively described in the authors’ book [3.20], 
then further on in [3.3,21]. More recent accounts 
are in [3.4,5]. In this section, possibility theory 
is envisaged as a stand-alone theory of uncer- 
tainty. 


3.2.1 Possibility Distributions 


Let S be a set of states of affairs (or descriptions
thereof), or states for short. This set can be the domain
of an attribute (numerical or categorical), the Cartesian
product of attribute domains, the set of interpretations
of a propositional language, etc. A possibility distribution
is a mapping $\pi$ from S to a totally ordered scale L,
with top denoted by 1 and bottom by 0. In the finite
case $L = \{1 = \lambda_1 > \cdots > \lambda_n > \lambda_{n+1} = 0\}$. The possibility
scale can be the unit interval as suggested by Zadeh,
or generally any finite chain, or even the set of nonnegative
integers. It is often assumed that L is equipped with
an order-reversing map denoted by $\lambda \in L \mapsto 1 - \lambda$.

The function $\pi$ represents the state of knowledge of
an agent (about the actual state of affairs), also called
an epistemic state, distinguishing what is plausible from
what is less plausible, what is the normal course of
things from what is not, what is surprising from what
is expected. It represents a flexible restriction on what
is the actual state with the following conventions (similar
to probability, but opposite to Shackle's potential
surprise scale; if $L = \mathbb{N}$, the conventions are opposite:
0 means possible and $\infty$ means impossible):


• $\pi(s) = 0$ means that state s is rejected as impossible;

• $\pi(s) = 1$ means that state s is totally possible (=
plausible).


The larger $\pi(s)$, the more possible, i.e., plausible
the state s is. Formally, the mapping $\pi$ is the membership
function of a fuzzy set [3.1], where membership
grades are interpreted in terms of plausibility. If the universe
S is exhaustive, at least one of the elements of S
should be the actual world, so that $\exists s, \pi(s) = 1$
(normalization). This condition expresses the consistency of
the epistemic state described by $\pi$.

Distinct values may simultaneously have a degree
of possibility equal to 1. In the Boolean case, $\pi$ is just
the characteristic function of a subset $E \subseteq S$ of mutually
exclusive states (a disjunctive set [3.22]), ruling out all
those states considered as impossible. Possibility theory
is thus a (fuzzy) set-based representation of incomplete
information.
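
As a small illustration of these conventions, the following Python sketch stores a possibility distribution on a finite set of states as a dictionary and checks normalization. The state names, the degrees, and the helper names `is_normalized` and `from_epistemic_set` are assumptions made for the example, not notation from the chapter.

```python
def is_normalized(pi):
    """True if at least one state is fully possible (degree 1)."""
    return max(pi.values()) == 1.0


def from_epistemic_set(states, E):
    """Boolean case: pi is the characteristic function of the set E."""
    return {s: 1.0 if s in E else 0.0 for s in states}


# Hypothetical epistemic state about tomorrow's weather.
pi = {"sunny": 1.0, "cloudy": 1.0, "rainy": 0.5, "snowy": 0.0}

# Boolean counterpart: only "sunny" and "cloudy" are not ruled out.
boolean_pi = from_epistemic_set(pi.keys(), {"sunny", "cloudy"})
```

Here two distinct states have possibility 1, which is allowed: normalization only requires that some state is fully possible.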


3.2.2 Specificity 


A possibility distribution $\pi$ is said to be at least as specific
as another $\pi'$ if and only if for each state of affairs
s: $\pi(s) \leq \pi'(s)$ [3.23]. Then, $\pi$ is at least as restrictive
and informative as $\pi'$, since it rules out at least as many
states with at least as much strength. In the possibilistic
framework, extreme forms of partial knowledge can be
captured, namely:


• Complete knowledge: for some $s_0$, $\pi(s_0) = 1$ and
$\pi(s) = 0$, $\forall s \neq s_0$ (only $s_0$ is possible)


• Complete ignorance: $\pi(s) = 1$, $\forall s \in S$ (all states are
possible).
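
The specificity ordering and its two extreme cases can be sketched in a few lines of Python. The dictionary encoding and the function names below are assumptions chosen for this illustration.

```python
def at_least_as_specific(pi, pi_prime):
    """True if pi(s) <= pi_prime(s) for every state s."""
    return all(pi[s] <= pi_prime[s] for s in pi)


def complete_knowledge(states, s0):
    """Only s0 is possible; every other state is impossible."""
    return {s: 1.0 if s == s0 else 0.0 for s in states}


def complete_ignorance(states):
    """Every state is fully possible."""
    return {s: 1.0 for s in states}


# Hypothetical three-state example.
states = ["a", "b", "c"]
pi = {"a": 1.0, "b": 0.5, "c": 0.0}
```

In this example complete knowledge about "a" is at least as specific as `pi`, which in turn is at least as specific as complete ignorance; the converse comparisons fail.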


Possibility theory is driven by the principle of min- 
imal specificity. It states that any hypothesis not known 
to be impossible cannot be ruled out. It is a minimal 
commitment, cautious information principle. Basically, 
we must always try to maximize possibility degrees, 
taking constraints into account. 

Given a piece of information in the form x is F,
where $F$ is a fuzzy set restricting the values of the ill-known
quantity $x$, the knowledge is represented by the inequality
$\pi \le \mu_F$, where $\mu_F$ is the membership function of $F$.
The minimal specificity principle enforces the possibility
distribution $\pi = \mu_F$, if no other piece of knowledge
is available. Generally there may be impossible values
of $x$ due to other piece(s) of information. Thus,
given several pieces of knowledge of the form x is $F_i$,
for $i = 1, \dots, n$, each of them translates into the constraint
$\pi \le \mu_{F_i}$; hence, several constraints lead to the
inequality $\pi \le \min_{i=1}^{n} \mu_{F_i}$ and, on behalf of the minimal
specificity principle, to the possibility distribution

$$\pi = \min_{i=1}^{n} \pi_i ,$$

where $\pi_i$ is induced by the information item x is $F_i$.
It justifies the use of the minimum operation for combining
information items. It is noticeable that this way
of combining pieces of information fully agrees with
classical logic, since a classical logic base is equivalent
to the logical conjunction of the logical formulas that
belong to the base, and its set of models is obtained by
intersecting the sets of models of its formulas. Indeed, in
propositional logic, asserting a proposition $\phi$ amounts
to declaring that any interpretation (state) that makes $\phi$
false is impossible, as being incompatible with the state
of knowledge.
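
The conjunctive combination just described can be illustrated on a finite domain. The following is only a minimal sketch: the temperature values, the two membership functions, and the helper name combine_min are hypothetical and chosen for illustration.

```python
def combine_min(*constraints):
    """Least specific distribution satisfying pi <= mu_Fi for every i."""
    states = constraints[0].keys()
    return {s: min(mu[s] for mu in constraints) for s in states}

# Hypothetical membership functions over four candidate temperatures.
mu_warm = {15: 0.2, 20: 1.0, 25: 1.0, 30: 0.6}      # "x is warm"
mu_not_hot = {15: 1.0, 20: 1.0, 25: 0.7, 30: 0.0}   # "x is not hot"

pi = combine_min(mu_warm, mu_not_hot)
print(pi)  # {15: 0.2, 20: 1.0, 25: 0.7, 30: 0.0}
```

Each state keeps the largest degree allowed by all constraints, which is what the minimal specificity principle requires.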


3.2.3 Possibility and Necessity Functions 


Given a simple query of the form does event A occur?
(is the corresponding proposition $\phi$ true?) where $A$ is
a subset of states, the response to the query can be
obtained by computing degrees of possibility and necessity,
respectively (if the possibility scale $L = [0, 1]$)

$$\Pi(A) = \sup_{s \in A} \pi(s) ; \qquad N(A) = \inf_{s \notin A} 1 - \pi(s) .$$

$\Pi(A)$ evaluates to what extent $A$ is consistent with
$\pi$, while $N(A)$ evaluates to what extent $A$ is certainly
implied by $\pi$. The possibility-necessity duality is expressed
by $N(A) = 1 - \Pi(A^c)$, where $A^c$ is the complement
of $A$. Generally, $\Pi(S) = N(S) = 1$ and $\Pi(\emptyset) =
N(\emptyset) = 0$ (since $\pi$ is normalized to 1). In the Boolean
case, the possibility distribution comes down to the
disjunctive (epistemic) set $E \subseteq S$ [3.3, 24]:


• $\Pi(A) = 1$ if $A \cap E \neq \emptyset$, and 0 otherwise: function $\Pi$
checks whether proposition $A$ is logically consistent
with the available information or not.

• $N(A) = 1$ if $E \subseteq A$, and 0 otherwise: function $N$
checks whether proposition $A$ is logically entailed
by the available information or not.


More generally, possibility and necessity measures
represent degrees of plausibility and belief, respectively,
in agreement with other uncertainty theories (see
Sect. 3.4). Possibility measures satisfy the basic maxitivity
property $\Pi(A \cup B) = \max(\Pi(A), \Pi(B))$. Necessity
measures satisfy an axiom dual to that of possibility
measures, namely $N(A \cap B) = \min(N(A), N(B))$.
On infinite spaces, these axioms must hold for infinite
families of sets. As a consequence of the normalization
of $\pi$, $\min(N(A), N(A^c)) = 0$ and $\max(\Pi(A), \Pi(A^c)) = 1$,
where $A^c$ is the complement of $A$, or equivalently
$\Pi(A) = 1$ whenever $N(A) > 0$, which totally fits the intuition
behind this formalism, namely that something
somewhat certain should be fully possible, i.e., consistent
with the available information.
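
On a finite state space the two set functions reduce to a maximum and a minimum. The sketch below assumes a hypothetical three-state distribution; the function names are illustrative only.

```python
def possibility(pi, A):
    """Pi(A): max of pi over A (0 for the empty set)."""
    return max((pi[s] for s in A), default=0.0)

def necessity(pi, A):
    """N(A): min of 1 - pi(s) over states outside A (1 if A is the whole space)."""
    return min((1.0 - pi[s] for s in pi if s not in A), default=1.0)

# Hypothetical normalized distribution (some state has degree 1).
pi = {"s1": 1.0, "s2": 0.75, "s3": 0.25}
A = {"s1", "s2"}
A_complement = set(pi) - A

print(possibility(pi, A))  # 1.0
print(necessity(pi, A))    # 0.75
```

Here N(A) > 0 and accordingly Pi(A) = 1, as stated above; the duality N(A) = 1 - Pi(A^c) can be checked directly on A_complement.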


3.2.4 Certainty Qualification 


Human knowledge is often expressed in a declarative
way using statements to which belief degrees are
attached. Certainty-qualified pieces of uncertain information
of the form A is certain to degree $\alpha$ can then
be modeled by the constraint $N(A) \ge \alpha$. It represents
a family of possible epistemic states $\pi$ that obey this
constraint. The least specific possibility distribution
among them exists and is defined by [3.3]

$$\pi_{(A,\alpha)}(s) = \begin{cases} 1 & \text{if } s \in A , \\ 1 - \alpha & \text{otherwise} . \end{cases} \tag{3.1}$$

If $\alpha = 1$, we get the characteristic function of $A$. If $\alpha = 0$,
we get total ignorance. This possibility distribution
is a key building block to construct possibility distributions
from several pieces of uncertain knowledge.
Indeed, e.g., in the finite case, any possibility distribution
can be viewed as a collection of nested certainty-qualified
statements. Let $E_i = \{s : \pi(s) \ge \lambda_i \in L\}$ be the
$\lambda_i$-cut of $\pi$. Then it is easy to check that $\pi(s) =
\min_{i : s \notin E_i} 1 - N(E_i)$ (with the convention $\min_\emptyset = 1$).

We can also consider possibility-qualified statements
of the form $\Pi(A) \ge \beta$; however, the least specific
epistemic state compatible with this constraint
expresses total ignorance.
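
As an illustration only, the following sketch builds a distribution from two nested certainty-qualified statements via (3.1) and then recovers it from its cuts. The states, the degrees, and all helper names are hypothetical.

```python
def necessity(pi, A):
    return min((1.0 - pi[s] for s in pi if s not in A), default=1.0)

def certainty_qualified(states, A, alpha):
    """Least specific pi with N(A) >= alpha, as in (3.1)."""
    return {s: 1.0 if s in A else 1.0 - alpha for s in states}

def combine_min(*dists):
    return {s: min(d[s] for d in dists) for s in dists[0]}

def rebuild_from_cuts(pi):
    """pi(s) = min over cuts E_i not containing s of 1 - N(E_i)."""
    levels = sorted(set(pi.values()), reverse=True)
    cuts = [{s for s in pi if pi[s] >= lam} for lam in levels]
    return {
        s: min((1.0 - necessity(pi, E) for E in cuts if s not in E), default=1.0)
        for s in pi
    }

states = ["s1", "s2", "s3", "s4"]
pi = combine_min(
    certainty_qualified(states, {"s1", "s2"}, 0.75),
    certainty_qualified(states, {"s1", "s2", "s3"}, 1.0),
)
print(pi)  # {'s1': 1.0, 's2': 1.0, 's3': 0.25, 's4': 0.0}
```

In this example rebuild_from_cuts(pi) returns the same distribution, which is the nested-statement decomposition mentioned in the text.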


3.2.5 Joint Possibility Distributions 


Possibility distributions over Cartesian products of attribute
domains $S_1 \times \cdots \times S_m$ are called joint possibility
distributions $\pi(s_1, \dots, s_m)$. The projection $\pi_k^{\downarrow}$ of the
joint possibility distribution $\pi$ onto $S_k$ is defined as

$$\pi_k^{\downarrow}(s_k) = \Pi(S_1 \times \cdots \times S_{k-1} \times \{s_k\} \times S_{k+1} \times \cdots \times S_m)
= \sup_{s_i \in S_i,\, i \neq k} \pi(s_1, \dots, s_m) .$$

Clearly, $\pi(s_1, \dots, s_m) \le \min_{k=1}^{m} \pi_k^{\downarrow}(s_k)$, that is, a joint
possibility distribution is at least as specific as the
Cartesian product of its projections. When the equality
holds, $\pi(s_1, \dots, s_m)$ is called separable.
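
A minimal sketch of projection and the separability test on a two-attribute domain follows. The joint distributions and the function names are hypothetical; tuples index the joint distribution.

```python
def projection(joint, k):
    """Project a joint distribution onto its k-th attribute (max over the others)."""
    proj = {}
    for tup, degree in joint.items():
        proj[tup[k]] = max(proj.get(tup[k], 0.0), degree)
    return proj

def is_separable(joint):
    """True when the joint equals the min-combination of its projections."""
    m = len(next(iter(joint)))
    projs = [projection(joint, k) for k in range(m)]
    return all(
        degree == min(projs[k][tup[k]] for k in range(m))
        for tup, degree in joint.items()
    )

# Hypothetical joint distributions over {a, b} x {x, y}.
separable = {("a", "x"): 1.0, ("a", "y"): 0.5, ("b", "x"): 0.25, ("b", "y"): 0.25}
non_separable = {("a", "x"): 1.0, ("a", "y"): 0.5, ("b", "x"): 0.25, ("b", "y"): 0.0}

print(projection(separable, 0))     # {'a': 1.0, 'b': 0.25}
print(is_separable(separable))      # True
print(is_separable(non_separable))  # False
```

In the second case the joint degree of ("b", "y") is strictly below the minimum of its projected degrees, so only the inequality holds.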


3.2.6 Conditioning 


Notions of conditioning exist in possibility theory. Conditional
possibility can be defined similarly to probability
theory using a Bayesian-like equation of the
form [3.3]

$$\Pi(B \cap A) = \Pi(B \mid A) \star \Pi(A) ,$$

where $\Pi(A) > 0$ and $\star$ is a t-norm (a nondecreasing
Abelian semigroup operation on the unit interval
having identity 1 and absorbing element 0 [3.25]);
moreover $N(B \mid A) = 1 - \Pi(B^c \mid A)$. The above equation
makes little sense for necessity measures, as it
becomes trivial when $N(A) = 0$, that is under lack
of certainty, while in the above definition, the equation
becomes problematic only if $\Pi(A) = 0$, which
is natural as then $A$ is considered impossible. If operation
$\star$ is the minimum, the equation $\Pi(B \cap A) =
\min(\Pi(B \mid A), \Pi(A))$ fails to characterize $\Pi(B \mid A)$,
and we must resort to the minimal specificity principle
to come up with the qualitative conditioning rule [3.3]

$$\Pi(B \mid A) = \begin{cases} 1 & \text{if } \Pi(B \cap A) = \Pi(A) > 0 , \\ \Pi(B \cap A) & \text{otherwise} . \end{cases} \tag{3.2}$$




It is clear that $N(B \mid A) > 0$ if and only if $\Pi(B \cap A) >
\Pi(B^c \cap A)$. Moreover, if $\Pi(B \mid A) > \Pi(B)$ then
$\Pi(B \mid A) = 1$, which points out the limited expressiveness
of this qualitative notion (no gradual positive
reinforcement of possibility). However, it is possible to
have that $N(B) > 0$, $N(B^c \mid A_1) > 0$, $N(B \mid A_1 \cap A_2) > 0$
(i.e., oscillating beliefs). Extensive works on conditional
possibility, especially qualitative, handling the
case $\Pi(A) = 0$, have been recently carried out by Coletti
and Vantaggi [3.26, 27] in the spirit of De Finetti's
approach to subjective probabilities defined in terms of
conditional measures and allowing for conditioning on
impossible events.

In the numerical setting, due to the need of preserving
for $\Pi(B \mid A)$ continuity properties of $\Pi$, we must
choose $\star$ = product, so that

$$\Pi(B \mid A) = \frac{\Pi(B \cap A)}{\Pi(A)} ,$$

which makes possibilistic and probabilistic conditionings
very similar [3.28] (now, gradual positive reinforcement
of possibility is allowed). But there is yet
another definition of numerical possibilistic conditioning,
not based on the above equation, as seen later in this
chapter.

3.2.7 Independence 


There are also several variants of possibilistic indepen- 
dence between events. Let us mention here the two 
basic approaches: 


• Unrelatedness: $\Pi(A \cap B) = \min(\Pi(A), \Pi(B))$.
When it does not hold, it indicates an epistemic
form of mutual exclusion between $A$ and $B$. It is
symmetric but sensitive to negation. When it holds
for all pairs made of $A$, $B$ and their complements,
it is an epistemic version of logical independence
related to separability.

• Causal independence: $\Pi(B \mid A) = \Pi(B)$. This notion
is different from the former one and stronger.
It is a form of directed epistemic independence
whereby learning $A$ does not affect the plausibility
of $B$. It is neither symmetric nor insensitive to negation:
for instance, it is not equivalent to $N(B \mid A) =
N(B)$.


Generally, independence in possibility theory is
neither symmetric, nor insensitive to negation. For
Boolean variables, independence between events is
not equivalent to independence between variables. But
since the possibility scale can be qualitative or quantitative,
and there are several forms of conditioning,
there are also various possible forms of independence.
For studies of various notions and their properties
see [3.29-32]. More discussions and references appear
in [3.4].


3.2.8 Fuzzy Interval Analysis 


An important example of a possibility distribution is
a fuzzy interval [3.3, 20]. A fuzzy interval is a fuzzy
set of reals whose membership function is unimodal
and upper semicontinuous. Its $\alpha$-cuts are closed intervals.
The calculus of fuzzy intervals is an extension
of interval arithmetic based on a possibilistic counterpart
of computing with random variables. To compute
the addition of two fuzzy intervals $A$ and $B$ one has
to compute the membership function of $A \oplus B$ as the
degree of possibility $\mu_{A \oplus B}(z) = \Pi(\{(x, y) : x + y = z\})$,
based on the possibility distribution $\min(\mu_A(x), \mu_B(y))$.
There is a large literature on possibilistic interval analysis;
see [3.33] for a survey of 20th-century references.
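
The sup-min computation of $A \oplus B$ can be sketched on a discretized support. This is only an approximation of the continuous case, assuming integer-valued supports; the two fuzzy numbers and the function name fuzzy_add are hypothetical.

```python
def fuzzy_add(mu_a, mu_b):
    """mu_{A+B}(z) = max over x + y = z of min(mu_A(x), mu_B(y))."""
    out = {}
    for x, a in mu_a.items():
        for y, b in mu_b.items():
            z = x + y
            out[z] = max(out.get(z, 0.0), min(a, b))
    return out

# Hypothetical discretized fuzzy numbers "about 2" and "about 5".
about_2 = {1: 0.5, 2: 1.0, 3: 0.5}
about_5 = {4: 0.5, 5: 1.0, 6: 0.5}

print(fuzzy_add(about_2, about_5))
# {5: 0.5, 6: 0.5, 7: 1.0, 8: 0.5, 9: 0.5}
```

The result peaks at 7 and its support is the interval sum of the two supports, in line with ordinary interval arithmetic applied cut by cut.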


3.2.9 Guaranteed Possibility 


Possibility distributions originally represent negative
information in the sense that their role is essentially
to rule out impossible states. More recently,
another type of possibility distribution has been considered
[3.34, 35] where the information has a positive nature,
namely it points out actually possible states, such as
observed cases, examples of solutions, etc. Positively-flavored
possibility distributions will be denoted by $\delta$
and serve as evidential support functions. The conventions
for interpreting them contrast with usual possibility
distributions:


• $\delta(s) = 1$ means that state $s$ is actually possible because
of a high evidential support (for instance, $s$ is
a case that has been actually observed);

• $\delta(s) = 0$ means that state $s$ has not been observed
(yet: potential impossibility).


Note that $\pi(s) = 1$ indicates potential possibility,
while $\delta(s) = 1$ conveys more information. In contrast,
$\delta(s) = 0$ expresses ignorance.

A measure of guaranteed possibility can be defined,
that differs from functions $\Pi$ and $N$ [3.34, 35]

$$\Delta(A) = \inf_{s \in A} \delta(s) .$$




It estimates to what extent all states in $A$ are actually
possible according to evidence. $\Delta(A)$ can be used as
a degree of evidential support for $A$. Of course, this
function possesses a conjugate $\nabla$ such that $\nabla(A) =
1 - \Delta(A^c) = \sup_{s \notin A} 1 - \delta(s)$. Function $\nabla(A)$ evaluates
the degree of potential necessity of $A$, as it is 1
only if some state $s$ outside $A$ is potentially impossible.

Uncertain statements of the form A is possible to degree
$\beta$ often mean that any realization of $A$ is possible
to degree $\beta$ (e.g., it is possible that the museum is open
this afternoon). They can then be modeled by a constraint
of the form $\Delta(A) \ge \beta$. It corresponds to the idea
of observed evidence.

This type of information is better exploited by assuming
an informational principle opposite to the one
of minimal specificity, namely, any situation not yet observed
is tentatively considered as impossible. This is
similar to the closed-world assumption. The most specific
distribution $\delta_{(A,\beta)}$ in agreement with $\Delta(A) \ge \beta$ is

$$\delta_{(A,\beta)}(s) = \begin{cases} \beta & \text{if } s \in A , \\ 0 & \text{otherwise} . \end{cases}$$

Note that while possibility distributions induced from
certainty-qualified pieces of knowledge combine conjunctively,
by discarding possible states, evidential
support distributions induced by possibility-qualified
pieces of evidence combine disjunctively, by accumulating
possible states. Given several pieces of knowledge
of the form x is $F_i$ is possible, for $i = 1, \dots, n$,
each of them translates into the constraint $\delta \ge \mu_{F_i}$;
hence, several constraints lead to the inequality $\delta \ge
\max_{i=1}^{n} \mu_{F_i}$ and on behalf of another minimal commitment
principle based on maximal specificity, we get the
possibility distribution

$$\delta = \max_{i=1}^{n} \delta_i ,$$

where $\delta_i$ is induced by the information item x is $F_i$ is
possible. It justifies the use of the maximum operation
for combining evidential support functions. Acquiring
pieces of possibility-qualified evidence leads to updating
$\delta_{(A,\beta)}$ into some wider distribution $\delta \ge \delta_{(A,\beta)}$. Any
possibility distribution can be represented as a collection
of nested possibility-qualified statements of the
form $(E_i, \Delta(E_i))$, with $E_i = \{s : \delta(s) \ge \lambda_i\}$, since $\delta(s) =
\max_{i : s \in E_i} \Delta(E_i)$, dually to the case of certainty-qualified
statements.
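
The disjunctive accumulation of evidence can be sketched as follows. The example (days on which a museum was reported open), the degrees, and the helper names are hypothetical.

```python
def guaranteed_possibility(delta, A):
    """Delta(A): min of delta over A (1 for the empty set, by convention)."""
    return min((delta[s] for s in A), default=1.0)

def possibility_qualified(states, A, beta):
    """Most specific delta with Delta(A) >= beta: beta on A, 0 elsewhere."""
    return {s: beta if s in A else 0.0 for s in states}

def combine_max(*deltas):
    return {s: max(d[s] for d in deltas) for s in deltas[0]}

states = ["mon", "tue", "wed", "thu"]
delta = combine_max(
    possibility_qualified(states, {"mon", "tue"}, 0.75),
    possibility_qualified(states, {"tue", "wed"}, 0.5),
)

print(delta)  # {'mon': 0.75, 'tue': 0.75, 'wed': 0.5, 'thu': 0.0}
print(guaranteed_possibility(delta, {"tue", "wed"}))  # 0.5
print(guaranteed_possibility(delta, {"wed", "thu"}))  # 0.0
```

Each new report can only raise degrees of support, in contrast with the min-based combination of certainty-qualified statements.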


3.2.10 Bipolar Possibility Theory 


A bipolar representation of information using pairs
$(\delta, \pi)$ may provide a natural interpretation of interval-valued
fuzzy sets [3.8]. Although positive and negative
information are represented in separate and different
ways via $\delta$ and $\pi$ functions, respectively, there is a coherence
condition that should hold between positive
and negative information. Indeed, observed information
should not be impossible. Likewise, in terms of
preferences, solutions that are preferred to some extent
should not be unfeasible. This leads to enforce the
coherence constraint $\delta \le \pi$ between the two representations.

This condition should be maintained when new information
arrives and is combined with the previous
one. This does not go for free since degrees $\delta(s)$ tend
to increase while degrees $\pi(s)$ tend to decrease due
to the disjunctive and conjunctive processes that, respectively,
govern their combination. Maintaining this
coherence requires a revision process that works as follows.
If the current information state is represented by
the pair $(\delta, \pi)$, receiving a new positive (resp. negative)
piece of information represented by $\delta^{new}$ (resp.
$\pi^{new}$) to be enforced, leads to revising $(\delta, \pi)$ into
$(\max(\delta, \delta^{new}), \pi^{rev})$ (resp. into $(\delta^{rev}, \min(\pi, \pi^{new}))$), using,
respectively,

$$\pi^{rev} = \max(\pi, \delta^{new}) ; \tag{3.3}$$
$$\delta^{rev} = \min(\pi^{new}, \delta) . \tag{3.4}$$


It is important to note that when both positive and negative
pieces of information are collected, there are two
options:


• Either priority is given to positive information over
negative information: it means that (past) positive
information cannot be ruled out by (future) negative
information. This may be found natural when very
reliable observations (represented by $\delta$) contradict
tentative knowledge (represented by $\pi$). Then revising
$(\delta, \pi)$ by $(\delta^{new}, \pi^{new})$ yields the new pair

$$(\delta^{rev}, \pi^{rev}) = (\max(\delta, \delta^{new}),\ \max(\min(\pi, \pi^{new}), \max(\delta, \delta^{new})))$$

• Priority is given to negative information over positive
information. It makes sense when handling
preferences. Indeed, then, positive information may
be viewed as wishes, while negative information
reflects constraints. Then, revising $(\delta, \pi)$ by
$(\delta^{new}, \pi^{new})$ would yield the new pair

$$(\delta^{rev}, \pi^{rev}) = (\min(\min(\pi, \pi^{new}), \max(\delta, \delta^{new})),\ \min(\pi, \pi^{new})) .$$
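
The two priority options can be written down pointwise. The sketch below uses hypothetical distributions over three states; in it, previously observed evidence for state b conflicts with the new negative information.

```python
def revise_positive_priority(delta, pi, delta_new, pi_new):
    """Positive information wins: pi is raised where needed to stay above delta."""
    d = {s: max(delta[s], delta_new[s]) for s in pi}
    p = {s: max(min(pi[s], pi_new[s]), d[s]) for s in pi}
    return d, p

def revise_negative_priority(delta, pi, delta_new, pi_new):
    """Negative information wins: delta is cut down where needed to stay below pi."""
    p = {s: min(pi[s], pi_new[s]) for s in pi}
    d = {s: min(p[s], max(delta[s], delta_new[s])) for s in pi}
    return d, p

delta = {"a": 0.5, "b": 0.75, "c": 0.0}
pi = {"a": 1.0, "b": 1.0, "c": 0.5}
delta_new = {"a": 0.75, "b": 0.0, "c": 0.0}
pi_new = {"a": 1.0, "b": 0.25, "c": 1.0}

d_pos, p_pos = revise_positive_priority(delta, pi, delta_new, pi_new)
d_neg, p_neg = revise_negative_priority(delta, pi, delta_new, pi_new)
print(d_pos, p_pos)  # {'a': 0.75, 'b': 0.75, 'c': 0.0} {'a': 1.0, 'b': 0.75, 'c': 0.5}
print(d_neg, p_neg)  # {'a': 0.75, 'b': 0.25, 'c': 0.0} {'a': 1.0, 'b': 0.25, 'c': 0.5}
```

In both outputs the revised support stays below the revised possibility in every state, so the pair remains coherent for this example.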


It can be checked that the two latter revision rules
generalize the two previous ones. With both revision
options, it can be checked that if $\delta \le \pi$ and $\delta^{new} \le \pi^{new}$
hold, revising $(\delta, \pi)$ by $(\delta^{new}, \pi^{new})$ yields a new coherent
pair. This revision process should not be confused
with another one pertaining only to the negative part of
the information: computing $\min(\pi, \pi^{new})$ may
yield a possibility distribution that is not normalized, in
the case of inconsistency. If such an inconsistency takes
place, it should be resolved (by some appropriate renormalization)
before one of the two above bipolar revision
mechanisms can be applied.


3.3 Qualitative Possibility Theory


This section is restricted to the case of a finite state
space $S$; typically $S$ is the set of interpretations of a formal
propositional language $\mathcal{L}$ based on a finite set of
Boolean attributes $V$. The usual connectives $\wedge$ (conjunction),
$\vee$ (disjunction), and $\neg$ (negation) are used.
The possibility scale is then taken as a finite chain, or
the unit interval understood as an ordinal scale, or even
just a complete preordering of states. At the other end,
one may use the set of natural integers (viewed as an impossibility
scale) equipped with addition, which comes
down to a countable subset of the unit interval, equipped
with the product t-norm, instrumental for conditioning.
However, the qualitative nature of the latter setting is
questionable, even if authors using it do not consider it
as genuinely quantitative.


3.3.1 Possibility Theory and Modal Logic


In this section, the possibility scale is Boolean ($L =
\{0, 1\}$) and a possibility distribution reduces to a subset
of states $E$, for instance the models of a set of
formulas $K$ representing the beliefs of an agent in
propositional logic. The presence of a proposition $p$
in $K$ can be modeled by $N([p]) = 1$, or $\Pi([\neg p]) = 0$,
where $[p]$ is the set of interpretations of $p$; more generally
the degrees of possibility and necessity can be
defined by [3.36]:


• $N([p]) = \Pi([p]) = 1$ if and only if $K \vdash p$ (the agent
believes $p$)

• $N([p]) = \Pi([p]) = 0$ if and only if $K \vdash \neg p$ (the
agent believes $\neg p$)

• $N([p]) = 0$ and $\Pi([p]) = 1$ if and only if $K \nvdash p$ and
$K \nvdash \neg p$ (the agent is unsure about $p$)


However, in propositional logic, it cannot be syntactically
expressed that $N([p]) = 0$ nor $\Pi([p]) = 1$. To
do so, a modal language is needed [3.12], that prefixes
propositions with modalities such as necessary ($\Box$) and
possible ($\Diamond$). Then $\Box p$ encodes $N([p]) = 1$ (instead of
$p \in K$ in classical logic), $\Diamond p$ encodes $\Pi([p]) = 1$. Only
a very simple modal language $\mathcal{L}_\Box$ is needed that encapsulates
the propositional language $\mathcal{L}$. Atoms of this
logic are of the form $\Box p$, where $p$ is any propositional
formula. Well-formed formulas in this logic are obtained
by applying standard conjunction and negation
to these atoms

$$\mathcal{L}_\Box = \Box p,\ p \in \mathcal{L} \mid \neg\phi \mid \phi \wedge \psi .$$


The well-known conjugateness between possibility and
necessity reads: $\Diamond p = \neg\Box\neg p$. Maxitivity and minitivity
axioms of possibility and necessity measures, respectively,
read $\Diamond(p \vee q) = \Diamond p \vee \Diamond q$ and $\Box(p \wedge q) = \Box p \wedge
\Box q$ and are well known to hold in regular modal logics,
and the consistency of the epistemic state is ensured
by axiom D: $\Box p \to \Diamond p$. This is the minimal epistemic
logic (MEL) [3.37] needed to account for possibility
theory. It corresponds to a small fragment of the logic
KD without modality nesting and without objective formulas
($\mathcal{L}_\Box \cap \mathcal{L} = \emptyset$). Models of such modal formulas
are epistemic states: for instance, $E$ is a model of $\Box p$
means that $E \subseteq [p]$ [3.37, 38]. This logic is sound and
complete with respect to this semantics, and enables
propositions whose truth status is explicitly unknown
to be reasoned about.
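
The epistemic-state semantics can be illustrated by brute force on a tiny language. In this sketch, which is not an implementation of MEL itself, propositions are given as Python predicates, and the two atoms (rain, wind) and the epistemic state E are hypothetical.

```python
from itertools import product

# Interpretations of two hypothetical atoms (rain, wind), as pairs of Booleans.
states = set(product([False, True], repeat=2))

def models(formula):
    """[p]: the set of interpretations satisfying the predicate."""
    return {s for s in states if formula(*s)}

def box(E, formula):
    """E is a model of Box p iff E is included in [p]."""
    return E <= models(formula)

def diamond(E, formula):
    """E is a model of Diamond p iff E intersects [p]."""
    return bool(E & models(formula))

# The agent knows it is windy and is unsure about rain.
E = {(True, True), (False, True)}

knows_wind = box(E, lambda rain, wind: wind)            # True
knows_rain = box(E, lambda rain, wind: rain)            # False
knows_no_rain = box(E, lambda rain, wind: not rain)     # False
rain_possible = diamond(E, lambda rain, wind: rain)     # True
```

The pair (knows_rain, knows_no_rain) being both false is exactly the explicit ignorance that plain propositional logic cannot express.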


3.3.2 Comparative Possibility 


A plausibility ordering is a complete preorder of states
denoted by $\ge_\pi$, which induces a well-ordered partition
$\{E_1, \dots, E_k\}$ of $S$. It is the comparative counterpart of
a possibility distribution $\pi$, i.e., $s \ge_\pi s'$ if and only if
$\pi(s) \ge \pi(s')$. Indeed it is more natural to expect that
an agent will supply ordinal rather than numerical information
about his beliefs. By convention, $E_1$ contains
the most normal states of fact, $E_k$ the least plausible,
or most surprising ones. Denoting by $\max(A)$ any most
plausible state $s_0 \in A$, ordinal counterparts of possibility
and necessity measures [3.15] are then defined as
follows: $\{s\} \ge_\Pi \emptyset$ for all $s \in S$ and

$$A \ge_\Pi B \text{ if and only if } \max(A) \ge_\pi \max(B) ,$$
$$A \ge_N B \text{ if and only if } \max(B^c) \ge_\pi \max(A^c) .$$

Possibility relations $\ge_\Pi$ were proposed by Lewis [3.14]
and they satisfy his characteristic property

$$A \ge_\Pi B \text{ implies } C \cup A \ge_\Pi C \cup B ,$$

while necessity relations can also be defined as $A \ge_N
B$ if and only if $B^c \ge_\Pi A^c$, and they satisfy a similar
axiom

$$A \ge_N B \text{ implies } C \cap A \ge_N C \cap B .$$

The latter coincides with epistemic entrenchment relations
in the sense of belief revision theory [3.39]
(provided that $A >_\Pi \emptyset$, if $A \neq \emptyset$). Conditioning a possibility
relation $\ge_\Pi$ by a nonimpossible event $C >_\Pi \emptyset$
means deriving a relation $\ge_\Pi^C$ such that

$$A \ge_\Pi^C B \text{ if and only if } A \cap C \ge_\Pi B \cap C .$$

These results show that possibility theory is implicitly
at work in the principal axiomatic approach to belief
revision [3.40], and that conditional possibility obeys its
main postulates [3.41]. The notion of independence for
comparative possibility theory was studied by Dubois
et al. [3.31], for independence between events, and Ben
Amor et al. [3.32] between variables.


3.3.3 Possibility Theory 
and Nonmonotonic Inference 


Suppose $S$ is equipped with a plausibility ordering. The
main idea behind qualitative possibility theory is that
the state of the world is always believed to be as normal
as possible, neglecting less normal states. $A \ge_\Pi B$
really means that there is a normal state where $A$ holds
that is at least as normal as any normal state where $B$
holds. The dual case $A \ge_N B$ is intuitively understood
as A is at least as certain as B, in the sense that there
are states where $B$ fails to hold that are at least as normal
as the most normal state where $A$ does not hold. In
particular, the events accepted as true are those which
are true in all the most plausible states, namely the ones
such that $A >_N \emptyset$. These assumptions lead us to interpret
the plausible inference $A \mathrel{|\!\sim} B$ of a proposition $B$
from another $A$, under a state of knowledge $\ge_\Pi$, as follows:
$B$ should be true in all the most normal states
where $A$ is true, which means $B >_\Pi^A B^c$ in terms of ordinal
conditioning, that is, $A \cap B$ is more plausible than
$A \cap B^c$. $A \mathrel{|\!\sim} B$ also means that the agent considers $B$ as
an accepted belief in the context $A$.

This kind of inference is nonmonotonic in the sense
that $A \mathrel{|\!\sim} B$ does not always imply $A \cap C \mathrel{|\!\sim} B$ for any
additional information $C$. This is similar to the fact that
a conditional probability $P(B \mid A \cap C)$ may be low even
if $P(B \mid A)$ is high. The properties of the consequence
relation $\mathrel{|\!\sim}$ are now well understood, and are precisely
the ones laid bare by Lehmann and Magidor [3.42]
for their so-called rational inference. Monotonicity is
only partially restored: $A \mathrel{|\!\sim} B$ implies $A \cap C \mathrel{|\!\sim} B$ provided
that $A \mathrel{|\!\sim} C^c$ does not hold (i.e., that states where
$A$ is true do not typically violate $C$). This property is
called rational monotony, and, along with some more
standard ones (like closure under conjunction), characterizes
default possibilistic inference $\mathrel{|\!\sim}$. In fact, the set
$\{B, A \mathrel{|\!\sim} B\}$ of accepted beliefs in the context $A$ is deductively
closed, which corresponds to the idea that the
agent reasons with accepted beliefs in each context as
if they were true, until some event occurs that modifies
this context. This closure property is enough to justify
a possibilistic approach [3.43] and adding the rational
monotonicity property ensures the existence of a single
possibility relation generating the consequence relation
$\mathrel{|\!\sim}$ [3.44].
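
The nonmonotonic behavior can be checked on the classic bird and penguin example. The sketch below assumes a hypothetical numerical encoding of the plausibility ordering (only the ordering of the degrees matters); the function names are illustrative.

```python
from itertools import product

def degree(bird, penguin, flies):
    """Hypothetical plausibility degrees for states (bird, penguin, flies)."""
    if penguin and not bird:
        return 0.0                      # penguins are birds
    if penguin:
        return 0.25 if flies else 0.5   # flying penguins are most surprising
    if bird and not flies:
        return 0.5                      # non-flying birds are exceptional
    return 1.0                          # fully normal states

pi = {s: degree(*s) for s in product([False, True], repeat=3)}

def possibility(pi, event):
    return max((d for s, d in pi.items() if event(s)), default=0.0)

def entails(pi, a, b):
    """A |~ B iff Pi(A and B) > Pi(A and not B)."""
    return possibility(pi, lambda s: a(s) and b(s)) > possibility(
        pi, lambda s: a(s) and not b(s)
    )

def bird(s): return s[0]
def penguin(s): return s[1]
def flies(s): return s[2]
def bird_and_penguin(s): return s[0] and s[1]

print(entails(pi, bird, flies))              # True
print(entails(pi, bird_and_penguin, flies))  # False
```

Adding the information penguin retracts the conclusion flies. This does not contradict rational monotony, since bird |~ not penguin holds in this ordering.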

Plausibility orderings can be generated by a set of
if-then rules tainted with unspecified exceptions. This
set forms a knowledge base supplied by an agent. Each
rule if A then B is modeled by a constraint of the form
$A \cap B >_\Pi A \cap B^c$ on possibility relations. There exists
a single minimally specific element in the set of possibility
relations satisfying all constraints induced by
rules (unless the latter are inconsistent). It corresponds
to the most compact plausibility ranking of states induced
by the rules [3.44]. This ranking can be computed
by an algorithm originally proposed by Pearl [3.45].

Qualitative possibility theory has been studied from
the point of view of cognitive psychology. Experimental
results [3.46] suggest that there are situations where
people reason about uncertainty using the rules of possibility
theory, rather than with those of probability
theory, namely people jump to plausible conclusions
based on assuming the current world is normal.




3.3.4 Possibilistic Logic 


Qualitative possibility relations can be represented by
(and only by) possibility measures ranging on any totally
ordered set $L$ (especially a finite one) [3.15]. This
absolute representation on an ordinal scale is slightly
more expressive than the purely relational one. For
instance, one can express that a proposition is fully
plausible ($\Pi(A) = 1$), while using a possibility relation,
one can only say that it is among the most plausible
ones. When the finite set $S$ is large and generated
by a propositional language, qualitative possibility distributions
can be efficiently encoded in possibilistic
logic [3.47-49].

A possibilistic logic base $K$ is a set of pairs $(p_i, \alpha_i)$,
where $p_i$ is an expression in classical (propositional or
first-order) logic and $\alpha_i > 0$ is an element of the value
scale $L$. This pair encodes the constraint $N(p_i) \ge \alpha_i$,
where $N(p_i)$ is the degree of necessity of the set of models
of $p_i$. Each prioritized formula $(p_i, \alpha_i)$ has a fuzzy
set of models (via certainty qualification described in
Sect. 3.2) and the fuzzy intersection of the fuzzy sets
of models of all prioritized formulas in $K$ yields the
associated plausibility ordering on $S$ encoded by a possibility
distribution $\pi_K$. Namely, an interpretation $s$ is
all the less possible as it falsifies formulas with higher
weights, i.e.,

$$\pi_K(s) = 1 \quad \text{if } s \models p_i,\ \forall (p_i, \alpha_i) \in K , \tag{3.5}$$
$$\pi_K(s) = 1 - \max\{\alpha_i : (p_i, \alpha_i) \in K,\ s \not\models p_i\} \quad \text{otherwise} . \tag{3.6}$$

This distribution is obtained by applying the minimal
specificity principle, since it is the largest one that satisfies
the constraints $N(p_i) \ge \alpha_i$. If the classical logic
base $\{p_i : (p_i, \alpha_i) \in K\}$ is inconsistent, $\pi_K$ is not normalized,
and a level of inconsistency equal to $inc(K) =
1 - \max \pi_K$ can be attached to the base $K$. However, the
set of formulas $\{p_i : (p_i, \alpha_i) \in K,\ \alpha_i > inc(K)\}$ is always
consistent.
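
Equations (3.5) and (3.6) can be evaluated by enumerating interpretations. The following is a brute-force sketch for small propositional bases only; formulas are given as Python predicates, and the base (a penguin example), the atom names, and the function names are hypothetical.

```python
from itertools import product

def interpretations(atoms):
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def pi_K(base, s):
    """1 minus the highest weight among formulas falsified by s (1 if none)."""
    violated = [alpha for formula, alpha in base if not formula(s)]
    return 1.0 - max(violated, default=0.0)

def inconsistency_level(base, atoms):
    """inc(K) = 1 - max over interpretations of pi_K."""
    return 1.0 - max(pi_K(base, s) for s in interpretations(atoms))

atoms = ["bird", "penguin", "flies"]
K = [
    (lambda s: not s["penguin"] or s["bird"], 1.0),        # penguins are birds
    (lambda s: not s["penguin"] or not s["flies"], 0.75),  # penguins do not fly
    (lambda s: not s["bird"] or s["flies"], 0.5),          # birds fly
    (lambda s: s["penguin"], 1.0),                         # it is a penguin
]

flying_penguin = {"bird": True, "penguin": True, "flies": True}
print(pi_K(K, flying_penguin))        # 0.25
print(inconsistency_level(K, atoms))  # 0.5
```

The base is classically inconsistent, with level 0.5; the formulas weighted strictly above 0.5 (the two penguin rules and the fact) remain jointly consistent.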

Syntactic deduction from a set of prioritized clauses
is achieved by refutation using an extension of the standard
resolution rule, whereby $(p \vee q, \min(\alpha, \beta))$ can
be derived from $(p \vee r, \alpha)$ and $(q \vee \neg r, \beta)$. This rule,
which evaluates the validity of an inferred proposition
by the validity of the weakest premise, goes back to
Theophrastus, a disciple of Aristotle. Another way of
presenting inference in possibilistic logic relies on the
fact that $K \vdash (p, \alpha)$ if and only if $K_\alpha = \{p_i : (p_i, \alpha_i) \in
K,\ \alpha_i \ge \alpha\} \vdash p$ in the sense of classical logic. In particular,
$inc(K) = \max\{\alpha : K_\alpha \vdash \bot\}$. Inference in possibilistic
logic can use this extended resolution rule and
proceeds by refutation since $K \vdash (p, \alpha)$ if and only if
$inc(\{(\neg p, 1)\} \cup K) \ge \alpha$. Computational inference methods
in possibilistic logic are surveyed in [3.50].
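
The refutation test can be sketched semantically, by enumeration rather than by resolution. As before this is a toy illustration for small propositional bases; the base, the atom names, and the function names are hypothetical.

```python
from itertools import product

def interpretations(atoms):
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def pi_K(base, s):
    violated = [alpha for formula, alpha in base if not formula(s)]
    return 1.0 - max(violated, default=0.0)

def inconsistency_level(base, atoms):
    return 1.0 - max(pi_K(base, s) for s in interpretations(atoms))

def entailment_degree(base, atoms, p):
    """inc(K + (not p, 1)): the largest alpha such that K |- (p, alpha)."""
    refutation_base = base + [(lambda s: not p(s), 1.0)]
    return inconsistency_level(refutation_base, atoms)

atoms = ["bird", "penguin", "flies"]
K = [
    (lambda s: not s["penguin"] or s["bird"], 1.0),        # penguins are birds
    (lambda s: not s["penguin"] or not s["flies"], 0.75),  # penguins do not fly
    (lambda s: not s["bird"] or s["flies"], 0.5),          # birds fly
    (lambda s: s["bird"], 1.0),                            # it is a bird
]

print(inconsistency_level(K, atoms))                          # 0.0
print(entailment_degree(K, atoms, lambda s: s["flies"]))      # 0.5
print(entailment_degree(K, atoms, lambda s: s["penguin"]))    # 0.0
```

The conclusion flies inherits the weight 0.5 of the weakest formula needed to derive it. A degree is only informative when it exceeds inc(K), which is 0 for this base.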

Possibilistic logic is an inconsistency-tolerant extension
of propositional logic that provides a natural
semantic setting for mechanizing nonmonotonic reasoning
[3.51], with a computational complexity close
to that of propositional logic. Namely, once a possibility
distribution on models is generated by a set of if-then
rules $p_i \to q_i$ (as explained in Sect. 3.3.3 and modeled
here using qualitative conditioning as $N(q_i \mid p_i) > 0$),
weights $\alpha_i = N(\neg p_i \vee q_i)$ can be computed, and the corresponding
possibilistic base built [3.51]. See [3.52] for
an efficient method involving compilation.

Variants of possibilistic logic have been proposed
in later works. A partially ordered extension of possibilistic
logic has been proposed, whose semantic
counterpart consists of partially ordered models [3.53].
Another approach for handling partial orderings between
weights is to encode formulas with partially
constrained weights in a possibilistic-like many-sorted
propositional logic [3.54]. Namely, a formula $(p, \alpha)$ is
rewritten as a classical two-sorted clause $p \vee ab_\alpha$, where
$ab_\alpha$ means the situation is $\alpha$-abnormal, and thus the
clause expresses that $p$ is true or the situation is abnormal,
while more generally $(p, \min(\alpha, \beta))$ is rewritten
as the clause $p \vee ab_\alpha \vee ab_\beta$. Then a known constraint
between unknown weights such as $\alpha \ge \beta$ is translated
into a clause $\neg ab_\alpha \vee ab_\beta$. In this way, a possibilistic
logic base, where only partial information about the relative
ordering between the weights is available under
the form of constraints, can be handled as a set of classical
logic formulas that involve symbolic weights.

An efficient inference process has been proposed
using the notion of forgetting variables. This approach
provides a technique for compiling a standard possibilistic
knowledge base in order to process inference
in polynomial time [3.55]. Let us also mention quasi-possibilistic
logic [3.56], an extension of possibilistic
logic based on the so-called quasi-classical logic,
a paraconsistent logic whose inference mechanism is
close to classical inference (except that it is not allowed
to infer $p \vee q$ from $p$). This approach copes with inconsistency
between formulas having the same weight.
Other types of possibilistic logic can also handle constraints
of the form $\Pi(\phi) \ge \alpha$, or $\Delta(\phi) \ge \alpha$ [3.49].

There is a major difference between possibilistic
logic and weighted many-valued logics [3.57]. Namely,
in the latter, a weight $\tau \in L$ attached to a (many-valued,
thus nonclassical) formula $p$ acts as a truth-value
threshold, and $(p, \tau)$ in a fuzzy knowledge base expresses
the Boolean requirement that the truth value of
$p$ should be at least equal to $\tau$ for $(p, \tau)$ to be valid. So
in such fuzzy logics, while truth of $p$ is many-valued,
the validity of a weighted formula is two-valued. On the
contrary, in possibilistic logic, truth is two-valued (since
$p$ is Boolean), but the validity of a possibilistic formula
$(p, \alpha)$ is many-valued. In particular, it is possible
to cast possibilistic logic inside a many-valued logic.
The idea is to consider many-valued atomic sentences
$\phi$ of the form $(p, \alpha)$, where $p$ is a formula in classical
logic. Then, one can define well-formed formulas
such as $\phi \vee \psi$, $\phi \wedge \psi$, or yet $\phi \to \psi$, where the external
connectives linking $\phi$ and $\psi$ are those of the chosen
many-valued logic. From this point of view, possibilistic
logic can be viewed as a fragment of a many-valued
logic that uses only one external connective: conjunction
interpreted as minimum. This approach involving
a Boolean algebra embedded in a nonclassical one has
been proposed by Boldrin and Sossai [3.58] with a view
to augment possibilistic logic with fusion modes cast at
the object level. It is also possible to replace classical
logic by a many-valued logic inside possibilistic logic.
For instance, possibilistic logic has been extended to
Gödel many-valued logic [3.59]. A similar technique
has been used by Hájek et al. to extend possibilistic
logic to a many-valued modal setting [3.60].

Lehmke [3.61] has cast fuzzy logics and possibilistic
logic inside the same framework, considering weighted
many-valued formulas of the form $(p, \theta)$, where $p$ is
a many-valued formula with truth set $T$, and $\theta$ is a label
defined as a monotone mapping from the truth-set
$T$ to a validity set $L$ (a set of possibility degrees). $T$
and $L$ are supposed to be complete lattices, and the set
of labels has properties that make it a fuzzy extension
of a filter. Labels encompass fuzzy truth-values in the
sense of Zadeh [3.62], such as very true, more or less
true that express uncertainty about (many-valued) truth
in a graded way.

Rather than expressing statements such as it is half- 
true that John is tall, which presupposes a state of 
complete knowledge about John’s height, one may be 
interested in handling states of incomplete knowledge, 
namely assertions of the form all we know is that John 
is tall. One way to do it is to introduce fuzzy constants 
in a possibilistic first-order logic. Dubois, Prade, and
Sandri [3.63] have noticed that an imprecise restriction
on the scope of an existential quantifier can be
handled in the following way. From the two premises
$\forall x \in A, \neg p(x, y) \lor q(x, y)$, and $\exists x \in B, p(x, a)$, where a
is a constant, we can conclude that $\exists x \in B, q(x, a)$ provided
that $B \subseteq A$. Thus, letting $p(B, a)$ stand for
$\exists x \in B, p(x, a)$, one can write


$\forall x \in A, \neg p(x, y) \lor q(x, y), \; p(B, a) \vdash q(B, a)$


if $B \subseteq A$, B being an imprecise constant. Letting A and
B be fuzzy sets, the following pattern can be validated
in possibilistic logic


$(\neg p(x, y) \lor q(x, y), \min(\mu_A(x), \alpha)), \; (p(B, a), \beta) \vdash (q(B, a), \min(N_B(A), \alpha, \beta))$,


where $N_B(A) = \inf_t \max(\mu_A(t), 1 - \mu_B(t))$ is the necessity
measure of the fuzzy event A based on fuzzy
information B. Note that A, which appears in the weight
slot of the first possibilistic formula, plays the role of
a fuzzy predicate, since the formula expresses that the
more x is A, the more certain (up to level $\alpha$) it is that if p is true
for (x, y), q is true for them as well.
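The weight attached to the conclusion of this pattern can be computed directly on a finite domain. The following is a minimal sketch, assuming membership functions stored as dictionaries over a shared finite set of values; the function names and the numerical data are illustrative only and do not come from the cited works.

```python
def necessity_of_fuzzy_event(mu_a, mu_b):
    """N_B(A) = inf_t max(mu_A(t), 1 - mu_B(t)) over a shared finite domain."""
    return min(max(mu_a[t], 1.0 - mu_b[t]) for t in mu_a)


def conclusion_weight(mu_a, mu_b, alpha, beta):
    """Weight min(N_B(A), alpha, beta) of the inferred formula (q(B, a), .)."""
    return min(necessity_of_fuzzy_event(mu_a, mu_b), alpha, beta)


# Hypothetical fuzzy sets A and B over a small domain of values.
mu_a = {150: 0.0, 160: 0.25, 170: 0.75, 180: 1.0, 190: 1.0}
mu_b = {150: 0.0, 160: 0.0, 170: 0.5, 180: 1.0, 190: 0.5}

n_b_a = necessity_of_fuzzy_event(mu_a, mu_b)
weight = conclusion_weight(mu_a, mu_b, alpha=0.9, beta=0.8)
```

When B is crisp and included in the core of A, the sketch returns $N_B(A) = 1$ and the conclusion inherits the weight $\min(\alpha, \beta)$, in agreement with the crisp pattern above.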

Alsinet and Godo [3.64,65] have applied possi- 
bilistic logic to logic programming that allows for 
fuzzy constants [3.65,66]. They have developed pro- 
gramming environments based on possibility theory. In 
particular, the above inference pattern can be strengthened,
replacing B by its cut $B_\beta$ in the expression of
$N_B(A)$ and extended to a sound resolution rule. They
have further developed possibilistic logic programming 
with similarity reasoning [3.67] and more recently ar- 
gumentation [3.68, 69]. 

Lastly, in order to improve the knowledge repre- 
sentation power of the answer-set programming (ASP) 
paradigm, the stable model semantics has been ex- 
tended by taking into account a certainty level, ex- 
pressed in terms of necessity measure, on each rule 
of a normal logic program. It leads to the definition 
of a possibilistic stable model for weighted answer- 
set programming [3.70]. Bauters et al. [3.71] introduce 
a characterization of answer sets of classical and pos- 
sibilistic ASP programs in terms of possibilistic logic 
where an ASP program specifies a set of constraints on 
possibility distributions. 


3.3.5 Ranking Function Theory 


A theory that parallels possibility theory to a large extent
and that has been designed for handling issues
in belief revision, nonmonotonic reasoning and causation,
just like qualitative possibility theory, is that of
ranking functions by Spohn [3.9, 10, 72]. The main difference
is that it is not really a qualitative theory, as it
uses the set of integers including $\infty$ (denoted by $\mathbb{N}^+$)
as a value scale. Hence, it is more expressive than qual- 
itative possibility theory, but it is applied to the same 
problems. 

Formally [3.10], a ranking function is a mapping
$\kappa : 2^S \to \mathbb{N}^+$ such that:


• $\kappa(\{s\}) = 0$ for some $s \in S$;
• $\kappa(A) = \min_{s \in A} \kappa(\{s\})$;
• $\kappa(\emptyset) = \infty$.


It is immediate to verify that the set function
$\Pi(A) = 2^{-\kappa(A)}$ is a possibility measure. So a ranking
function is an integer-valued measure of impossibility
(disbelief). The function $\beta(A) = \kappa(A^c)$ is an integer-valued
necessity measure used by Spohn for measuring
belief, and it is clear that the rescaled necessity measure
is $N(A) = 1 - 2^{-\beta(A)}$. Interestingly, ranking functions
also bear close connection to probability theory [3.72],
viewing $\kappa(A)$ as the exponent of an infinitesimal probability,
of the form $P(A) = \epsilon^{\kappa(A)}$. Indeed the order of
magnitude of $P(A \cup B)$ is then $\epsilon^{\min(\kappa(A), \kappa(B))}$. Integers
also come up naturally if we consider Hamming dis- 
tances between models in the Boolean logic context, if 
for instance, the degree of possibility of an interpreta- 
tion is a function of its Hamming distance to the closest 
model of a classical knowledge base. 
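The correspondence between ranks and possibility degrees can be checked on a small example. The sketch below assumes a finite state space and a ranking function given by its values on singletons; the helper names (`rank`, `possibility`, `belief`, `rescaled_necessity`) are hypothetical.

```python
import math


def rank(kappa, event):
    """kappa(A) = min over s in A of kappa({s}); kappa(empty set) = infinity."""
    return min((kappa[s] for s in event), default=math.inf)


def possibility(kappa, event):
    """Pi(A) = 2 ** (-kappa(A)), a possibility degree in [0, 1]."""
    return 2.0 ** (-rank(kappa, event))


def belief(kappa, event):
    """beta(A) = kappa(complement of A), Spohn's integer-valued degree of belief."""
    return rank(kappa, set(kappa) - set(event))


def rescaled_necessity(kappa, event):
    """N(A) = 1 - 2 ** (-beta(A))."""
    return 1.0 - 2.0 ** (-belief(kappa, event))


# Hypothetical ranking: s1 is fully plausible, s3 is strongly disbelieved.
kappa = {"s1": 0, "s2": 1, "s3": 3}

pi_s2_s3 = possibility(kappa, {"s2", "s3"})      # 2 ** -1
n_s1 = rescaled_necessity(kappa, {"s1"})         # 1 - Pi({s2, s3})
```

The min-decomposability of $\kappa$ translates into the maxitivity of $\Pi$, since $2^{-\min(a, b)} = \max(2^{-a}, 2^{-b})$.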

Spohn [3.9] also introduces conditioning concepts,
especially:


• The so-called A-part of $\kappa$, which is a conditioning
operation by event A defined by $\kappa(B \mid A) = \kappa(B \cap A) - \kappa(A)$;

• The (A, n)-conditionalization of $\kappa$, $\kappa(\cdot \mid (A \to n))$,
which is a revision operation by an uncertain input
enforcing $\kappa'(A^c) = n$, and defined by


$\kappa(s \mid (A \to n)) = \begin{cases} \kappa(s \mid A) & \text{if } s \in A \\ n + \kappa(s \mid A^c) & \text{otherwise.} \end{cases}$ (3.7)


This operation makes A more believed than $A^c$ by n
steps, namely,


$\beta(A^c \mid (A \to n)) = 0; \quad \beta(A \mid (A \to n)) = n$.


It is easy to see that the conditioning of ranking
functions comes down to the product-based conditioning
of numerical possibility measures, and to the
infinitesimal counterpart of usual Bayesian conditioning
of probabilities. The other conditioning rule can
be obtained by means of Jeffrey’s rule of conditioning [3.73]


$P(B \mid (A, \alpha)) = \alpha P(B \mid A) + (1 - \alpha) P(B \mid A^c)$


by a constraint of the form $P(A) = \alpha$. Both qualitative
and quantitative counterparts of this revision rule
in possibility theory have been studied in detail [3.74, 
75]. In fact, ranking function theory is formally en- 
compassed by numerical possibility theory. Moreover, 
there is no fusion rule in Spohn theory, while fusion is 
one of the main applications of possibility theory (see 
Sect. 3.5). 
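Both conditioning operations on ranking functions are easy to experiment with. The following is a minimal sketch, assuming a finite state space and assuming that A and its complement both have finite rank; the function names are hypothetical, and the A-part is taken here as $\kappa(s \mid A) = \kappa(s) - \kappa(A)$.

```python
import math


def rank(kappa, event):
    """kappa(A) = min over s in A of kappa({s}); infinity for the empty set."""
    return min((kappa[s] for s in event), default=math.inf)


def condition(kappa, event):
    """A-part of kappa: kappa(s | A) = kappa(s) - kappa(A) for s in A."""
    k_event = rank(kappa, event)
    if k_event == math.inf:
        raise ValueError("cannot condition on an event of infinite rank")
    return {s: kappa[s] - k_event for s in event}


def conditionalize(kappa, event, n):
    """(A, n)-conditionalization, following (3.7)."""
    event = set(event)
    inside = condition(kappa, event)
    outside = condition(kappa, set(kappa) - event)
    return {s: inside[s] if s in event else n + outside[s] for s in kappa}


# Hypothetical prior ranking in which A = {a, b} is initially disbelieved.
kappa = {"a": 2, "b": 3, "c": 0, "d": 1}
revised = conditionalize(kappa, {"a", "b"}, n=2)
```

After revision, A has rank 0 and its complement has rank n, so A is believed to degree n. Moving to the scale $2^{-\kappa}$ turns the subtraction in `condition` into the division used by product-based possibilistic conditioning.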


3.3.6 Possibilistic Belief Networks 


Another compact representation of qualitative possi- 
bility distributions is the possibilistic directed graph, 
which uses the same conventions as Bayesian nets, but 
relies on conditional possibility [3.76]. The qualitative 
approach is based on a symmetric notion of qualitative
independence $\Pi(B \cap A) = \min(\Pi(A), \Pi(B))$ that
is weaker than the causal-like condition $\Pi(B \mid A) = \Pi(B)$ [3.31].
Like joint probability distributions, joint
possibility distributions can be decomposed into a conjunction
of conditional possibility distributions (using
minimum or product) in a way similar to Bayes
nets [3.76]. A joint possibility distribution associated
with variables $X_1, \dots, X_n$ can be decomposed by the chain rule


$\pi(X_1, \dots, X_n) = \min(\pi(X_n \mid X_1, \dots, X_{n-1}), \dots, \pi(X_2 \mid X_1), \pi(X_1))$.


Such a decomposition can be simplified by assuming 
conditional independence relations between variables, 
as reflected by the structure of the graph. The form of 
independence between variables at work here is conditional
noninteractivity: two variables X and Y are independent
in the context Z, if for each instance (x, y, z) of
(X, Y, Z) we have: $\pi(x, y \mid z) = \min(\pi(x \mid z), \pi(y \mid z))$.
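As an illustration of the min-based chain rule, the sketch below builds the joint distribution of a three-variable chain X → Y → Z from local conditional tables and computes a marginal by maximization. The network structure, the variable domains and all numbers are assumptions made for the example.

```python
# Hypothetical local tables of a min-based possibilistic chain X -> Y -> Z.
pi_x = {"x0": 1.0, "x1": 0.4}
pi_y_given_x = {("y0", "x0"): 1.0, ("y1", "x0"): 0.3,
                ("y0", "x1"): 0.6, ("y1", "x1"): 1.0}
pi_z_given_y = {("z0", "y0"): 1.0, ("z1", "y0"): 0.2,
                ("z0", "y1"): 0.5, ("z1", "y1"): 1.0}

X_VALUES = ("x0", "x1")
Y_VALUES = ("y0", "y1")
Z_VALUES = ("z0", "z1")


def joint(x, y, z):
    """Chain rule with the graph's independences: min(pi(z|y), pi(y|x), pi(x))."""
    return min(pi_z_given_y[(z, y)], pi_y_given_x[(y, x)], pi_x[x])


def marginal_z(z):
    """Possibilistic marginalization replaces the sum by a maximum."""
    return max(joint(x, y, z) for x in X_VALUES for y in Y_VALUES)


joint_table = {(x, y, z): joint(x, y, z)
               for x in X_VALUES for y in Y_VALUES for z in Z_VALUES}
```

Each local table is normalized (its maximum is 1 for every parent value), which is enough for the joint distribution to be normalized as well.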
Ben Amor and Benferhat [3.77] investigate the 
properties of qualitative independence that enable lo- 
cal inferences to be performed in possibilistic nets. 
Uncertainty propagation algorithms suitable for possi- 
bilistic graphical structures have been studied in [3.78]. 
It is also possible to propagate uncertainty in nondi- 
rected decompositions of joint possibility measures 
as done quite early by Borgelt et al. [3.79]. Counterparts
of product-based numerical possibilistic nets
using ranking functions exist as well [3.10]. Quali- 
tative possibilistic counterparts of decision trees and 
influence diagrams for decision trees have been re- 
cently investigated [3.80,81]. Compilation techniques 
for inference in possibilistic networks have been de- 
vised [3.82]. Finally, the study of possibilistic networks 
from the standpoint of causal reasoning has been investigated,
using the concept of intervention, that comes
down to enforcing the values of some variables so as to 
lay bare their influence on other ones [3.83, 84]. 


3.3.7 Fuzzy Rule-Based and Case-Based 
Approximate Reasoning 


A typology of fuzzy rules has been devised in the set- 
ting of possibility theory, distinguishing rules whose 
purpose is to propagate uncertainty through reasoning 
steps, from rules whose main purpose is similarity- 
based interpolation [3.85], depending on the choice 
of a many-valued implication connective that models 
a rule. The bipolar view of information based on $(\delta, \pi)$
pairs sheds new light on the debate between conjunctive 
and implicative representation of rules [3.86]. Repre- 
senting a rule as a material implication focuses on 
counterexamples to rules, while using a conjunction 
between antecedent and consequent points out exam- 
ples of the rule and highlights its positive content. 
Traditionally in fuzzy control and modeling, the lat- 
ter representation is adopted, while the former is the 
logical tradition. Introducing fuzzy implicative rules in 
modeling accounts for constraints or landmark points 
the model should comply with (as opposed to observed 
data) [3.87]. The bipolar view of rules in terms of ex- 
amples and counterexamples may turn out to be very 
useful when extracting fuzzy rules from data [3.88]. 

Fuzzy rules have been applied to case-based rea- 
soning (CBR). In general, CBR relies on the following 
implicit principle: similar situations may lead to similar 
outcomes. Thus, a similarity relation S between problem
descriptions or situations, and a similarity measure
T between outcomes are needed. This implicit CBR 
principle can be expressed in the framework of fuzzy 
rules as: “the more similar (in the sense of S) are the 
attribute values describing two situations, the more pos- 
sible the similarity (in the sense of T) of the values 
of the corresponding outcome attributes.” Given a situation
$s_0$ associated to an unknown outcome $t_0$ and
a current case (s, t), this principle enables us to conclude
on the possibility of $t_0$ being equal to a value
similar to t [3.89]. This acknowledges the fact that, often
in practice, a database may contain cases that are
rather similar with respect to the problem description 
attributes, but which may be distinct with respect to 
outcome attribute(s). This emphasizes that case-based 
reasoning can only lead to cautious conclusions. 

The possibility rule the more similar s and $s_0$, the
more possible t and $t_0$ are similar, is modeled in terms
of a guaranteed possibility measure [3.90]. This leads
to enforcing the inequality $\Delta_0(T(t, \cdot)) \geq \mu_S(s, s_0)$, which
expresses that the guaranteed possibility that $t_0$ belongs
to a high degree to the fuzzy set of values that are T-similar
to t, is lower bounded by the S-similarity of s
and $s_0$. Then the fuzzy set F of possible values $t'$ for $t_0$
with respect to case (s, t) is given by


$F_{t_0}(t') = \min(\mu_T(t, t'), \mu_S(s, s_0))$,


since the maximally specific distribution such that
$\Delta(A) \geq \alpha$ is $\delta = \min(\mu_A, \alpha)$. What is obtained is the
fuzzy set $T(t, \cdot)$ of values $t'$ that are T-similar to t,
whose possibility level is truncated at the global degree
$\mu_S(s, s_0)$ of similarity of s and $s_0$. The max-based aggregation
of the various contributions obtained from the
comparison with each case (s, t) in the memory M of
cases acknowledges the fact that each new comparison
may suggest new possible values for $t_0$ and agrees with
the positive nature of the information in the repository
of cases. Thus, we obtain the following fuzzy set $E_{s_0}$ of
the possible values $t'$ for $t_0$


$E_{s_0}(t') = \max_{(s,t) \in M} \min(S(s, s_0), T(t, t'))$.


This latter expression can be put in parallel with the 
evaluation of a flexible query [3.91]. This approach has 
been generalized to imprecisely or fuzzily described sit- 
uations, and has been related to other approaches to 
instance-based prediction [3.92, 93]. 
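The max-min prediction scheme can be sketched in a few lines. The example below assumes numerical situations and outcomes, and uses triangular similarity functions chosen purely for illustration; the memory of cases and the candidate outcomes are likewise hypothetical.

```python
def possible_outcomes(cases, s0, candidates, sim_s, sim_t):
    """E_{s0}(t') = max over stored cases (s, t) of min(S(s, s0), T(t, t'))."""
    return {
        t_prime: max(min(sim_s(s, s0), sim_t(t, t_prime)) for s, t in cases)
        for t_prime in candidates
    }


def sim_s(a, b):
    """Hypothetical similarity between situations (triangular, width 4)."""
    return max(0.0, 1.0 - abs(a - b) / 4)


def sim_t(a, b):
    """Hypothetical similarity between outcomes (triangular, width 2)."""
    return max(0.0, 1.0 - abs(a - b) / 2)


# Memory M of (situation, outcome) cases and a new situation s0.
memory = [(10, 100), (14, 104)]
prediction = possible_outcomes(memory, s0=11, candidates=[100, 101, 102, 104],
                               sim_s=sim_s, sim_t=sim_t)
```

Each stored case contributes a truncated fuzzy set of outcomes, and adding a case to the memory can only raise the degrees in the result, which reflects the positive (guaranteed possibility) reading of the cases.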


3.3.8 Preference Representation 


Possibility theory also offers a framework for prefer- 
ence modeling in constraint-directed reasoning. Both 
prioritized and soft constraints can be captured by pos- 
sibility distributions expressing degrees of feasibility 
rather than plausibility [3.6]. Possibility theory offers 
a natural setting for fuzzy optimization whose aim is to 
balance the levels of satisfaction of multiple fuzzy con- 
straints (instead of minimizing an overall cost) [3.94]. 
In such problems, some possibility distributions repre- 
sent soft constraints on decision variables, other ones 
can represent incomplete knowledge about uncontrol- 
lable state variables. Qualitative decision criteria are 
particularly adapted to the handling of uncertainty in 
this setting. Possibility distributions can also model 
ill-known constraint coefficients in linear and nonlin- 
ear programming, thus leading to variants of chance- 
constrained programming [3.95]. 

Optimal solutions of fuzzy constraint-based prob- 
lems maximize the satisfaction of the most violated 
constraint, which does not ensure the Pareto dominance
of all such solutions. More demanding optimality no-
tions have been defined, by canceling equally satisfied 
constraints (the so-called discrimin ordering) or using 
a leximin criterion [3.94, 96, 97]. 
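The difference between the three orderings can be seen on a pair of solutions described by their vectors of constraint satisfaction degrees. The following is a minimal sketch with hypothetical function names and data; it follows the usual definitions of the discrimin and leximin orderings rather than any specific implementation from the cited papers.

```python
def maximin_value(degrees):
    """Satisfaction level of the most violated constraint."""
    return min(degrees)


def discrimin_prefers(a, b):
    """True if a is strictly preferred to b after canceling the
    constraints that both solutions satisfy to the same degree."""
    differing = [(x, y) for x, y in zip(a, b) if x != y]
    if not differing:
        return False
    return min(x for x, _ in differing) > min(y for _, y in differing)


def leximin_key(degrees):
    """Sorting key: reorder degrees increasingly, then compare lexicographically."""
    return sorted(degrees)


# Two hypothetical solutions evaluated against three fuzzy constraints.
sol_a = (0.5, 0.5, 1.0)
sol_b = (0.5, 0.8, 0.6)

tie_under_maximin = maximin_value(sol_a) == maximin_value(sol_b)
b_beats_a_discrimin = discrimin_prefers(sol_b, sol_a)
b_beats_a_leximin = leximin_key(sol_b) > leximin_key(sol_a)
```

Both solutions are equally good for the plain maximin criterion, whereas the two refinements separate them in favor of the second one.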

Besides, the possibilistic logic setting provides 
a compact representation framework for preferences, 
where possibilistic logic formulas represent priori- 
tized constraints on Boolean domains. This approach 
has been compared to qualitative conditional prefer- 
ence networks (CP nets), based on a systematic ceteris 
paribus assumption (preferential independence between 
decision variables). CP nets induce partial orders of so- 
lutions rather than complete preorders, as possibilistic 
logic does [3.98]. Possibilistic networks can also model 
preference on the values of variables, conditional to the 
value of other ones, and offer an alternative to condi- 
tional preference networks [3.98]. 

Bipolar possibility theory has been applied to preference
problems where one can distinguish between
imperative constraints (modeled by propositions with
a degree of necessity), and nonimperative wishes (modeled
by propositions with a degree of guaranteed possibility)
[3.99]. Another kind of bipolar approach to
qualitative multifactorial evaluation based on possibility
theory consists in comparing objects in terms of their
pros and cons, where the decision maker focuses on the
most important assets or defects. Such qualitative multifactorial
bipolar decision criteria have been defined, axiomatized
[3.100], and empirically tested [3.101]. They
are qualitative counterparts of the cumulative prospect theory
criteria of Kahneman and Tversky [3.102].

Two issues in preference modeling based on possi- 
bility theory in a logic format are as follows: 


• Preference statements of the form $\Pi(p) > \Pi(q)$
provide an incomplete description of a preference
relation. One question is then how to complete this
description by default. The principle of minimal
specificity then means that a solution not explicitly
rejected is satisfactory by default. The dual maximal
specificity principle says that a solution not supported
is rejected by default. It is not always clear
which principle is the most natural.

• A statement according to which it is better to satisfy
a formula p than a formula q can in fact be
interpreted in several ways. For instance, it may
mean that the best solution satisfying p is better than
the best solution satisfying q, which reads $\Pi(p) > \Pi(q)$
and can be encoded in possibilistic logic
under minimal specificity assumption; a stronger
statement is that the worst solution satisfying p is
better than the best solution satisfying q, which reads
$\Delta(p) > \Pi(q)$. Other possibilities are $\Delta(p) > \Delta(q)$,
and $\Pi(p) > \Delta(q)$. This question is studied in some
detail by Kaci [3.103].


3.3.9 Decision-Theoretic Foundations 


Zadeh [3.1] hinted that since our intuition concerning 
the behavior of possibilities is not very reliable, our un- 
derstanding of them 


would be enhanced by the development of an ax- 
iomatic approach to the definition of subjective 
possibilities in the spirit of axiomatic approaches 
to the definition of subjective probabilities. 


Decision-theoretic justifications of qualitative possibil- 
ity were devised, in the style of Von Neumann and 
Morgenstern, and Savage [3.104] more than 15 years 
ago [3.105, 106]. 

On top of the set of states, assume there is a set
X of consequences of decisions. A decision, or act, is
modeled as a mapping f from S to X assigning to each
state s its consequence f(s). The axiomatic approach
consists in proposing properties of a preference relation
$\succeq$ between acts so that a representation of this relation
by means of a preference functional W(f) is ensured,
that is, act f is as good as act g (denoted by $f \succeq g$) if
and only if $W(f) \geq W(g)$. W(f) depends on the agent’s
knowledge about the state of affairs, here supposed to
be a possibility distribution $\pi$ on S, and the agent’s goal,
modeled by a utility function u on X. Both the utility
function and the possibility distribution map to the same
finite chain L. A pessimistic criterion $W^-_\pi(f)$ is of the
form


$W^-_\pi(f) = \min_{s \in S} \max(n(\pi(s)), u(f(s)))$,


where n is the order-reversing map of L. $n(\pi(s))$ is the
degree of certainty that the state is not s (hence the degree
of surprise of observing s), u(f(s)) the utility of
choosing act f in state s. $W^-_\pi(f)$ is all the higher as all
states are either very surprising or have high utility. This
criterion is actually a prioritized extension of the Wald
maximin criterion. The latter is recovered if $\pi(s) = 1$
(top of L) $\forall s \in S$. According to the pessimistic criterion,
acts are chosen according to their worst consequences,
restricted to the most plausible states $S^* = \{s, \pi(s) > n(W^-_\pi(f))\}$.
The optimistic counterpart of this criterion
is


$W^+_\pi(f) = \max_{s \in S} \min(\pi(s), u(f(s)))$.


$W^+_\pi(f)$ is all the higher as there is a very plausible state
with high utility. The optimistic criterion was first pro- 
posed by Yager [3.107] and the pessimistic criterion by 
Whalen [3.108]. See Dubois et al. [3.109] for the res- 
olution of decision problems under uncertainty using 
the above criterion, and cast in the possibilistic logic 
framework. Such criteria can be refined by the classical 
expected utility criterion [3.110]. 

These optimistic and pessimistic possibilistic crite- 
ria are particular cases of a more general criterion based 
on the Sugeno integral [3.111] specialized to possibility 
and necessity of fuzzy events [3.1, 20] 


$S_{\gamma,u}(f) = \max_{\lambda \in L} \min(\lambda, \gamma(F_\lambda))$,


where $F_\lambda = \{s \in S, u(f(s)) \geq \lambda\}$, and $\gamma$ is a monotonic set
function that reflects the decision-maker attitude in
front of uncertainty: $\gamma(A)$ is the degree of confidence in
event A. If $\gamma = \Pi$, then $S_{\Pi,u}(f) = W^+_\pi(f)$. Similarly, if
$\gamma = N$, then $S_{N,u}(f) = W^-_\pi(f)$.
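The two possibilistic criteria and their expression as Sugeno integrals can be compared numerically. The sketch below assumes the finite chain L = {0, 0.25, 0.5, 0.75, 1} with order-reversing map n(x) = 1 - x; the states, the act and the utility values are hypothetical.

```python
SCALE = (0.0, 0.25, 0.5, 0.75, 1.0)   # assumed finite chain L


def possibility(pi, event):
    return max((pi[s] for s in event), default=0.0)


def necessity(pi, event):
    return 1.0 - possibility(pi, set(pi) - set(event))


def pessimistic(pi, u, act):
    """min over s of max(n(pi(s)), u(f(s))) with n(x) = 1 - x."""
    return min(max(1.0 - pi[s], u[act[s]]) for s in pi)


def optimistic(pi, u, act):
    """max over s of min(pi(s), u(f(s)))."""
    return max(min(pi[s], u[act[s]]) for s in pi)


def sugeno(pi, u, act, gamma):
    """max over levels lam of min(lam, gamma(F_lam)), F_lam = {s : u(f(s)) >= lam}."""
    return max(
        min(lam, gamma(pi, {s for s in pi if u[act[s]] >= lam}))
        for lam in SCALE
    )


pi = {"s1": 1.0, "s2": 0.5, "s3": 0.25}
act = {"s1": "good", "s2": "bad", "s3": "awful"}   # act f: state -> consequence
u = {"good": 0.75, "bad": 0.5, "awful": 0.0}

w_minus = pessimistic(pi, u, act)
w_plus = optimistic(pi, u, act)
```

On this example the Sugeno integral with respect to the possibility measure returns the optimistic value, and the one with respect to the necessity measure returns the pessimistic value.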

For any acts f, g, and any event A, let fAg denote an 
act consisting of choosing f if A occurs and g if its complement
occurs. Let $f \land g$ (resp. $f \lor g$) be the act whose
results yield the worst (resp. best) consequence of the 
two acts in each state. Constant acts are those whose 
consequence is fixed regardless of the state. A result 
in [3.112, 113] provides an act-driven axiomatization of 
these criteria, and enforces possibility theory as a ra- 
tional representation of uncertainty for a finite state 
space S: 


Theorem 3.1 
Suppose the preference relation $\succeq$ on acts obeys the following
properties:


1. $(X^S, \succeq)$ is a complete preorder.
2. There are two acts such that $f \succ g$.
3. $\forall A$, $\forall g$ and $h$ constant, $\forall f$, $g \succeq h$ implies $gAf \succeq hAf$.
4. If f is constant, $f \succ h$ and $g \succ h$ imply $f \land g \succ h$.
5. If f is constant, $h \succ f$ and $h \succ g$ imply $h \succ f \lor g$.


Then there exists a finite chain L, an L-valued
monotonic set function $\gamma$ on S and an L-valued utility
function u, such that $\succeq$ is representable by a Sugeno
integral of u(f) with respect to $\gamma$. Moreover, $\gamma$ is a necessity
(resp. possibility) measure as soon as property
(4) (resp. (5)) holds for all acts. The preference functional
is then $W^-_\pi(f)$ (resp. $W^+_\pi(f)$).


Axioms (4 and 5) contradict expected utility theory.
They become reasonable if the value scale is finite, decisions
are one-shot (no compensation) and provided
that there is a big step between any level in the qualitative
value scale and the adjacent ones. In other words,
the preference pattern $f \succ h$ always means that f is
significantly preferred to h, to the point of considering
the value of h negligible in front of the value of
f. The above result provides decision-theoretic foundations
of possibility theory, whose axioms can thus be
tested from observing the choice behavior of agents.
See [3.114] for another approach to comparative possibility
relations, more closely relying on Savage axioms,
but giving up any comparability between utility and
plausibility levels. The drawback of these and other
qualitative decision criteria is their lack of discrimination
power [3.115]. To overcome it, refinements of
possibilistic criteria were recently proposed, based on
lexicographic schemes. These refined criteria turn out
to be representable by a classical (but big-stepped) expected utility
criterion [3.110], and Sugeno integral can be refined by
a Choquet integral [3.116]. For extension of this qualitative
decision-making framework to multiple-stage
decision, see [3.117].


3.4 Quantitative Possibility Theory


The phrase quantitative possibility refers to the case
when possibility degrees range in the unit interval, and
are considered in connection with belief function and
imprecise probability theory. Quantitative possibility
theory is the natural setting for a reconciliation between
probability and fuzzy sets. In that case, a precise
articulation between possibility and probability theories
is useful to provide an interpretation to possibility
and necessity degrees. Several such interpretations can
be consistently devised: a degree of possibility can
be viewed as an upper probability bound [3.118], and
a possibility distribution can be viewed as a likelihood
function [3.119]. A possibility measure is also a special
case of a Shafer plausibility function [3.120]. Following
a very different approach, possibility theory can account
for probability distributions with extreme values,
infinitesimal [3.72] or having big steps [3.121]. There
are finally close connections between possibility theory
and idempotent analysis [3.122]. The theory of large de- 
viations in probability theory [3.123] also handles set 
functions that look like possibility measures [3.124]. 
Here we focus on the role of possibility theory in the 
theory of imprecise probability. 


3.4.1 Possibility as Upper Probability 


Let $\pi$ be a possibility distribution where $\pi(s) \in [0, 1]$.
Let $\mathcal{P}(\pi)$ be the set of probability measures P such
that $P \leq \Pi$, i.e., $\forall A \subseteq S, P(A) \leq \Pi(A)$. Then the possibility
measure $\Pi$ coincides with the upper probability
function $P^*$ such that $P^*(A) = \sup\{P(A), P \in \mathcal{P}(\pi)\}$
while the necessity measure N is the lower probability
function $P_*$ such that $P_*(A) = \inf\{P(A), P \in \mathcal{P}(\pi)\}$;
see [3.118, 125] for details. P and $\pi$ are said to be consistent
if $P \in \mathcal{P}(\pi)$. The connection between possibility
measures and imprecise probabilistic reasoning is es- 
pecially promising for the efficient representation of 
nonparametric families of probability functions, and it 
makes sense even in the scope of modeling linguistic 
information [3.126]. 

A possibility measure can be computed from
nested confidence subsets $\{A_1, A_2, \dots, A_m\}$ where $A_i \subseteq A_{i+1}$,
$i = 1, \dots, m-1$. Each confidence subset $A_i$ is
attached a positive confidence level $\lambda_i$ interpreted as
a lower bound of $P(A_i)$, hence a necessity degree. It is
viewed as a certainty qualified statement that generates
a possibility distribution $\pi_i$ according to Sect. 3.2. The
corresponding possibility distribution is


$\pi(s) = \min_i \pi_i(s) = \begin{cases} 1 & \text{if } s \in A_1 \\ 1 - \lambda_{j-1} & \text{if } j = \max\{i : s \notin A_{i-1}\} > 1. \end{cases}$


The information modeled by $\pi$ can also be viewed
as a nested random set $\{(A_i, \nu_i), i = 1, \dots, m\}$, where
$\nu_i = \lambda_i - \lambda_{i-1}$. This framework allows for imprecision
(reflected by the size of the $A_i$’s) and uncertainty (the
$\nu_i$’s). And $\nu_i$ is the probability that the agent only knows
that $A_i$ contains the actual state (it is not $P(A_i)$). The
random set view of possibility theory is well adapted
to the idea of imprecise statistical data, as developed
in [3.127, 128]. Namely, given a bunch of imprecise
(not necessarily nested) observations (called focal sets),
$\pi$ supplies an approximate representation of the data, as
$\pi(s) = \sum_{i : s \in A_i} \nu_i$.
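The two readings of nested confidence sets, as certainty-qualified statements and as a nested random set, can be checked against each other. The sketch below assumes a finite state space, increasing confidence levels, and the conventions $\lambda_0 = 0$ and $\lambda_m = 1$ (the largest set being the whole space); the function names and the data are illustrative.

```python
def possibility_from_confidence_sets(states, nested_sets, levels):
    """pi(s) = min_i pi_i(s), with pi_i(s) = 1 if s in A_i and 1 - lambda_i otherwise."""
    return {
        s: min((1.0 - lam for a, lam in zip(nested_sets, levels) if s not in a),
               default=1.0)
        for s in states
    }


def masses_from_levels(levels):
    """nu_i = lambda_i - lambda_{i-1}, with lambda_0 = 0."""
    previous = [0.0] + list(levels[:-1])
    return [lam - prev for lam, prev in zip(levels, previous)]


def contour_function(states, focal_sets, masses):
    """pi(s) = sum of the masses nu_i of the focal sets A_i containing s."""
    return {s: sum(v for a, v in zip(focal_sets, masses) if s in a) for s in states}


states = [1, 2, 3, 4]
nested_sets = [{1}, {1, 2}, {1, 2, 3, 4}]      # A_1 within A_2 within A_3 = S
levels = [0.25, 0.75, 1.0]                      # lambda_1 <= lambda_2 <= lambda_3

pi_from_levels = possibility_from_confidence_sets(states, nested_sets, levels)
masses = masses_from_levels(levels)
pi_from_masses = contour_function(states, nested_sets, masses)
```

Under these conventions both constructions return the same distribution. The function `contour_function` also accepts non-nested focal sets, in which case its result is only an approximate representation of the data, as noted above.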

In the continuous case, a fuzzy interval M can be
viewed as a nested set of $\alpha$-cuts, which are intervals
$M_\alpha = \{x : \mu_M(x) \geq \alpha\}$, $\forall \alpha > 0$. In this case,
note that the degree of necessity is $N(M_\alpha) = 1 - \alpha$,
and the corresponding probability set is $\mathcal{P}(\mu_M) = \{P : P(M_\alpha) \geq 1 - \alpha, \forall \alpha > 0\}$.
Representing uncertainty by
the family of pairs $\{(M_\alpha, 1 - \alpha) : \forall \alpha > 0\}$ is very similar
to the basic approach of info-gap theory [3.129].

The set $\mathcal{P}(\pi)$ contains many probability distributions,
arguably too many. Neumaier [3.130] has recently
proposed a related framework, in a different
terminology, for representing smaller subsets of probability
measures using two possibility distributions instead
of one. He basically uses a pair of distributions
$(\delta, \pi)$ (in the sense of Sect. 3.2), which he
calls a cloud, where $\delta$ is a guaranteed possibility distribution
(in our terminology) such that $\pi \geq \delta$. A cloud
models the (generally nonempty) set $\mathcal{P}(\pi) \cap \mathcal{P}(1 - \delta)$,
viewing $1 - \delta$ as a standard possibility distribution. The
precise connections between possibility distributions,
clouds and other simple representations of numerical
uncertainty are studied in [3.131].


3.4.2 Conditioning 


There are two kinds of conditioning that can be en- 
visaged upon the arrival of new information E. The 
first method presupposes that the new information alters
the possibility distribution $\pi$ by declaring all states
outside E impossible. The conditional measure $\pi(\cdot \mid E)$
is such that $\Pi(B \mid E) \cdot \Pi(E) = \Pi(B \cap E)$. This is formally
Dempster rule of conditioning of belief functions,
specialized to possibility measures. The conditional
possibility distribution representing the weighted set of
confidence intervals is


$\pi(s \mid E) = \begin{cases} \frac{\pi(s)}{\Pi(E)} & \text{if } s \in E \\ 0 & \text{otherwise.} \end{cases}$

De Baets et al. [3.28] provide a mathematical justifi- 
cation of this notion in an infinite setting, as opposed 
to the min-based conditioning of qualitative possibil- 
ity theory. Indeed, the maxitivity axiom extended to 
the infinite setting is not preserved by the min-based 
conditioning. The product-based conditioning leads to 
a notion of independence of the form $\Pi(B \cap E) = \Pi(B) \cdot \Pi(E)$
whose properties are very similar to the
ones of probabilistic independence [3.30]. 

Another form of conditioning [3.132, 133], more 
in line with the Bayesian tradition, considers that the 
possibility distribution $\pi$ encodes imprecise statistical
information, and event E only reflects a feature of
the current situation, not of the state in general. Then
the value $\Pi(B \,\|\, E) = \sup\{P(B \mid E), P(E) > 0, P \leq \Pi\}$
is the result of performing a sensitivity analysis of the
usual conditional probability over $\mathcal{P}(\pi)$ [3.134]. Interestingly,
the resulting set function is again a possibility
measure, with distribution


$\pi(s \,\|\, E) = \begin{cases} \max\left(\pi(s), \frac{\pi(s)}{\pi(s) + N(E)}\right) & \text{if } s \in E \\ 0 & \text{otherwise.} \end{cases}$


It is generally less specific than $\pi$ on E, as clear
from the above expression, and becomes noninformative
when N(E) = 0 (i.e., if there is no information
about E). This is because $\pi(\cdot \,\|\, E)$ is obtained from
the focusing of the generic information $\pi$ over the
reference class E. On the contrary, $\pi(\cdot \mid E)$ operates
a revision process on $\pi$ due to additional knowledge
asserting that states outside E are impossible. See De 
Cooman [3.133] for a detailed study of this form of 
conditioning. 
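The contrast between revision and focusing shows up clearly on a small finite example. The following is a minimal sketch under stated assumptions: the state space is finite, the function names `revise` and `focus` are hypothetical, and states of zero possibility are kept at zero in the focusing rule to avoid a 0/0 expression.

```python
def possibility(pi, event):
    return max((pi[s] for s in event), default=0.0)


def necessity(pi, event):
    return 1.0 - possibility(pi, set(pi) - set(event))


def revise(pi, event):
    """Product-based conditioning: pi(s | E) = pi(s) / Pi(E) on E, 0 elsewhere."""
    p_event = possibility(pi, event)
    if p_event == 0.0:
        raise ValueError("cannot condition on an impossible event")
    return {s: (pi[s] / p_event if s in event else 0.0) for s in pi}


def focus(pi, event):
    """Focusing: pi(s || E) = max(pi(s), pi(s) / (pi(s) + N(E))) on E, 0 elsewhere."""
    n_event = necessity(pi, event)
    result = {}
    for s, degree in pi.items():
        if s not in event or degree == 0.0:   # assumption: impossible states stay impossible
            result[s] = 0.0
        else:
            result[s] = max(degree, degree / (degree + n_event))
    return result


pi = {"a": 1.0, "b": 0.25, "c": 0.5}
revised = revise(pi, {"a", "b"})     # Pi(E) = 1, so pi is simply restricted to E
focused = focus(pi, {"a", "b"})      # N(E) = 0.5
vacuous = focus(pi, {"b", "c"})      # N(E) = 0: no information about E
```

On E = {a, b} the focused distribution is nowhere below the revised one, and when N(E) = 0 it is equal to 1 on every possible state of E, which is the noninformative behavior described above.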


3.4.3 Probability—Possibility 
Transformations 


The problem of transforming a possibility distribution 
into a probability distribution and conversely is mean- 
ingful in the scope of uncertainty combination with 
heterogeneous sources (some supplying statistical data, 
other linguistic data, for instance). It is useful to cast all 
pieces of information in the same framework. The basic
requirement is to respect the consistency principle
$\Pi \geq P$. The problem is then either to pick a probability
measure in $\mathcal{P}(\pi)$, or to construct a possibility measure
dominating P.

There are two basic approaches to possibility/ 
probability transformations, which both respect a form 
of probability—possibility consistency. One, due to 
Klir [3.135, 136] is based on a principle of information 
invariance, the other [3.137] is based on optimizing in- 
formation content. Klir assumes that possibilistic and 
probabilistic information measures are commensurate. 
Namely, the choice between possibility and probabil- 
ity is then a mere matter of translation between lan- 
guages neither of which is weaker or stronger than 
the other (quoting Klir and Parviz [3.138]). It sug- 
gests that entropy and imprecision capture the same 
facet of uncertainty, albeit in different guises. The 
other approach, recalled here, considers that going from 
possibility to probability leads to increase the preci- 
sion of the considered representation (as we go from 


a family of nested sets to a random element), while 
going the other way around means a loss of speci- 
ficity. 


From Possibility to Probability 
The most basic example of transformation from possi- 
bility to probability is the Laplace principle of insuf- 
ficient reason claiming that what is equally possible 
should be considered as equally probable. A general- 
ized Laplacean indifference principle is then adopted 
in the general case of a possibility distribution $\pi$: the
weights $\nu_i$ bearing on the sets $A_i$ from the nested family
of level cuts of $\pi$ are uniformly distributed on
the elements of these cuts $A_i$. Let $P_i$ be the uniform
probability measure on $A_i$. The resulting probability
measure is $P = \sum_{i=1,\dots,m} \nu_i \cdot P_i$. This transformation,
already proposed in 1982 [3.139], comes down to selecting
the center of gravity of the set $\mathcal{P}(\pi)$ of probability
distributions dominated by $\pi$. This transformation also
coincides with Smets’ pignistic transformation [3.140]
and with the Shapley value of the unanimity game (another
name of the necessity measure) in game theory.
The rationale behind this transformation is to minimize
arbitrariness by preserving the symmetry properties of
the representation. This transformation from possibility
to probability is one-to-one. Note that the definition of
this transformation does not use the nestedness property
of cuts of the possibility distribution. It applies all
the same to nonnested random sets (or belief functions)
defined by pairs $\{(A_i, \nu_i), i = 1, \dots, m\}$, where $\nu_i$ are
nonnegative reals such that $\sum_{i=1,\dots,m} \nu_i = 1$.
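The generalized indifference principle amounts to a short loop over the level cuts. The sketch below assumes a normalized possibility distribution on a finite set; the function name and the example distribution are hypothetical.

```python
def possibility_to_probability(pi):
    """Spread the mass of each level cut uniformly over its elements.

    Assumes pi is normalized (its maximum is 1) on a finite set.
    """
    levels = sorted({v for v in pi.values() if v > 0.0}, reverse=True)
    prob = dict.fromkeys(pi, 0.0)
    for i, lam in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0.0
        cut = [s for s in pi if pi[s] >= lam]      # level cut A_i
        mass = lam - next_level                    # weight nu_i of the cut
        for s in cut:
            prob[s] += mass / len(cut)
    return prob


pi = {"a": 1.0, "b": 0.5, "c": 0.25}
prob = possibility_to_probability(pi)
```

The result is a probability distribution that is dominated by the possibility measure on every event and that preserves the ordering of the states induced by $\pi$.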
From Objective Probability to Possibility 

From probability to possibility, the rationale of the 
transformation is not the same according to whether 
the probability distribution we start with is subjec- 
tive or objective [3.106]. In the case of a statistically 
induced probability distribution, the rationale is to preserve
as much information as possible. This is in
line with the handling of $\Delta$-qualified pieces of information
representing observed evidence, considered
in Sect. 3.2; hence we select as the result of the
transformation of a probability measure P, the most
specific possibility measure in the set of those dominating
P [3.137]. This most specific element is generally
unique if P induces a linear ordering on S. Suppose
S is a finite set. The idea is to let $\Pi(A) = P(A)$, for
those sets A having minimal probability among other
sets having the same cardinality as A. If $p_1 > p_2 > \dots > p_n$,
then $\Pi(A) = P(A)$ for sets A of the form
$\{s_i, \dots, s_n\}$, and the possibility distribution is defined
as $\pi_P(s_i) = \sum_{j=i,\dots,n} p_j$, with $p_j = P(\{s_j\})$. Note that
$\pi_P$ is a kind of cumulative distribution of P, already
known as a Lorenz curve in the mathematical literature
[3.141]. If there are equiprobable elements, the
uniqueness of the transformation is preserved if equipossibility
of the corresponding elements is enforced. In this
case it is a bijective transformation as well. Recently, 
this transformation was used to prove a rather surpris- 
ing agreement between probabilistic indeterminateness 
as measured by Shannon entropy, and possibilistic non- 
specificity. Namely it is possible to compare probability 
measures on finite sets in terms of their relative peaked- 
ness (a concept adapted from Birnbaum [3.142]) by 
comparing the relative specificity of their possibilis- 
tic transforms. Namely, let $P$ and $Q$ be two probability
measures on $S$ and $\pi_P$, $\pi_Q$ the possibility distribu-
tions induced by our transformation. It can be proved
that if $\pi_P \geq \pi_Q$ (i.e., $P$ is less peaked than $Q$) then
the Shannon entropy of $P$ is higher than the one of
$Q$ [3.143]. This result gives some grounds to the in-
tuitions developed by Klir [3.135], without assuming 
any commensurability between entropy and specificity 
indices. 
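
As an illustration of the finite-case transformation just described, the following minimal sketch computes $\pi_P$ by cumulating probabilities from the least probable outcome upward, giving tied outcomes the same degree, and then checks by brute force that the induced possibility measure dominates $P$. The function names, the dictionary encoding of distributions, and the exact-equality test for ties are assumptions of this sketch, not part of the chapter.

```python
from itertools import combinations


def prob_to_poss(p):
    """Return pi with pi(s) = sum of p(t) over all t such that p(t) <= p(s)."""
    items = sorted(p.items(), key=lambda kv: kv[1])  # least probable first
    poss, running, i = {}, 0.0, 0
    while i < len(items):
        # Group outcomes with exactly equal probability (ties).
        j = i
        while j < len(items) and items[j][1] == items[i][1]:
            running += items[j][1]
            j += 1
        for k in range(i, j):
            poss[items[k][0]] = running  # equipossibility inside a tie group
        i = j
    return poss


def dominates(poss, p, tol=1e-12):
    """Check that max of poss over A is at least P(A) for every nonempty A."""
    outcomes = list(p)
    for r in range(1, len(outcomes) + 1):
        for event in combinations(outcomes, r):
            if max(poss[s] for s in event) < sum(p[s] for s in event) - tol:
                return False
    return True


p = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
pi = prob_to_poss(p)
print(pi)
print(dominates(pi, p))
```

The brute-force dominance check is exponential in the number of outcomes and is only meant for small illustrative examples.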


Possibility Distributions Induced by Prediction Intervals
In the continuous case, moving from objective prob- 
ability to possibility means adopting a representation 
of uncertainty in terms of prediction intervals around 
the mode viewed as the most frequent value. Extract- 
ing a prediction interval from a probability distribution 
or devising a probabilistic inequality can be viewed 
as moving from a probabilistic to a possibilistic rep- 
resentation. Namely, suppose a nonatomic probability
measure $P$ on the real line, with unimodal density $\varphi$,
and suppose one wishes to represent it by an interval $I$
with a prescribed level of confidence $P(I) = \gamma$ of hitting
it. The most natural choice is the most precise interval
ensuring this level of confidence. It can be proved that
this interval is of the form of a cut of the density, i.e.,
$I_\gamma = \{s, \varphi(s) \geq \theta\}$ for some threshold $\theta$. Moving the
degree of confidence from 0 to 1 yields a nested family
of prediction intervals that form a possibility distribu-
tion $\pi$ consistent with $P$, the most specific one actually,
having the same support and the same mode as $P$ and
defined by [3.137]

$$\pi(\inf I_\gamma) = \pi(\sup I_\gamma) = 1 - \gamma = 1 - P(I_\gamma).$$


This kind of transformation again yields a kind of 
cumulative distribution according to the ordering in-
duced by the density $\varphi$. Similar constructs can be
found in the statistical literature (Birnbaum [3.142]).
More recently Mauris et al. [3.144] noticed that starting 
from any family of nested sets around some charac- 
teristic point (the mean, the median,...), the above 
equation yields a possibility measure dominating P. 
Well-known inequalities of probability theory, such as
those of Chebyshev and Camp-Meidell, can also be
viewed as possibilistic approximations of probability
functions. It turns out that for symmetric unimodal den-
sities, each side of the optimal possibilistic transform
is a convex function. Given such a probability density
on a bounded interval $[a, b]$, the triangular fuzzy num-
ber whose core is the mode of $\varphi$ and whose support is
$[a, b]$ is thus a possibility distribution dominating $P$ re-
gardless of its shape (and the tightest such distribution).
These results justify the use of symmetric triangu-
lar fuzzy numbers as fuzzy counterparts to uniform
probability distributions. They provide much tighter
probability bounds than the Chebyshev and Camp-Meidell
inequalities for symmetric densities with bounded sup-
port. This setting is adapted to the modeling of sensor 
measurements [3.145]. These results are extended to 
more general distributions by Baudrit et al. [3.146], 
and provide a tool for representing poor probabilis- 
tic information. More recently, Mauris [3.147] unifies, 
by means of possibility theory, many old techniques 
independently developed in statistics for one-point esti- 
mation, relying on the idea of dispersion of an empirical 
distribution. The efficiency of different estimators can 
be compared by means of fuzzy set inclusion applied 
to optimal possibility transforms of probability distribu- 
tions. This unified approach does not presuppose a finite 
variance. 
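
The link between prediction intervals and triangular fuzzy numbers described above can be checked numerically. The sketch below assumes symmetric intervals $[c - d, c + d]$ around the center $c$ of a symmetric distribution on $[0, 2]$ and sets $\pi(c \pm d) = 1 - P([c - d, c + d])$. The two example distributions (uniform and symmetric triangular density) and all function names are illustrative choices, not taken from the chapter.

```python
def poss_from_nested_intervals(cdf, center, x):
    """pi(x) = 1 - P([center - d, center + d]) with d = |x - center|."""
    d = abs(x - center)
    return 1.0 - (cdf(center + d) - cdf(center - d))


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 2]."""
    return min(1.0, max(0.0, x / 2.0))


def peaked_cdf(x):
    """CDF of the symmetric triangular density on [0, 2] with mode 1."""
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return x * x / 2.0 if x <= 1.0 else 1.0 - (2.0 - x) ** 2 / 2.0


def triangular_fuzzy_number(x):
    """Membership of the triangular fuzzy number with core 1, support [0, 2]."""
    return max(0.0, 1.0 - abs(x - 1.0))


for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(x,
          poss_from_nested_intervals(uniform_cdf, 1.0, x),
          poss_from_nested_intervals(peaked_cdf, 1.0, x),
          triangular_fuzzy_number(x))
```

On this example the transform of the uniform distribution coincides with the triangular fuzzy number, while the transform of the peaked density lies below it, in line with the dominance statement in the text.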


Subjective Possibility Distributions 
The case of a subjective probability distribution is dif- 
ferent. Indeed, the probability function is then supplied 
by an agent who is in some sense forced to express 
beliefs in this form due to rationality constraints, and 
the setting of exchangeable bets. However, his actual
knowledge may be far from justifying the use of a sin-
gle well-defined probability distribution. For instance, in
case of total ignorance about some value, apart from its 
belonging to an interval, the framework of exchange- 
able bets enforces a uniform probability distribution, 
on behalf of the principle of insufficient reason. Based 
on the setting of exchangeable bets, it is possible to 
define a subjectivist view of numerical possibility the- 
ory, that differs from the proposal of Walley [3.134]. 
The approach developed by Dubois et al. [3.148] re-
lies on the assumption that when an agent constructs
a probability measure by assigning prices to lotteries, 
this probability measure is actually induced by a be- 
lief function representing the agent’s actual state of 
knowledge. We assume that going from an underly- 
ing belief function to an elicited probability measure 
is achieved by means of the above mentioned pignis- 
tic transformation, changing focal sets into uniform 
probability distributions. The task is to reconstruct this 
underlying belief function under a minimal commit- 
ment assumption. In the paper [3.148], we pose and 
solve the problem of finding the least informative be- 
lief function having a given pignistic probability. We 
prove that it is unique and consonant, thus induced by 
a possibility distribution. The obtained possibility dis- 
tribution can be defined as the converse of the pignistic 
transformation (which is one-to-one for possibility dis- 
tributions). It is subjective in the same sense as in the 
subjectivist school in probability theory. However, it is 
the least biased representation of the agent’s state of 
knowledge compatible with the observed betting be- 
havior. In particular, it is less specific than the one 
constructed from the prediction intervals of an objec- 
tive probability. This transformation was first proposed 
in [3.149] for objective probability, interpreting the em- 
pirical necessity of an event as summing the excess of 
probabilities of realizations of this event with respect 
to the probability of the most likely realization of the 
opposite event. 
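
A minimal sketch of the subjective transformation on a finite set follows. It assumes the usual closed form for the converse of the pignistic transformation, $\pi(s) = \sum_{t} \min(p(s), p(t))$, which the text above does not spell out, and checks it by applying the pignistic transformation (each cut's weight spread uniformly over the cut) to recover the original probabilities. Function names and the dictionary encoding are hypothetical.

```python
def subjective_possibility(p):
    """Converse pignistic transform: pi(s) = sum over t of min(p(s), p(t))."""
    return {s: sum(min(ps, pt) for pt in p.values()) for s, ps in p.items()}


def pignistic(pi):
    """Spread the weight of each cut of pi uniformly over that cut."""
    order = sorted(pi, key=pi.get, reverse=True)  # most possible first
    levels = [pi[s] for s in order] + [0.0]
    prob = {s: 0.0 for s in pi}
    for k in range(len(order)):
        mass = levels[k] - levels[k + 1]  # weight of the cut order[0..k]
        for s in order[:k + 1]:
            prob[s] += mass / (k + 1)
    return prob


p = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
pi = subjective_possibility(p)
print(pi)             # degrees close to 1.0, 0.8, 0.6
print(pignistic(pi))  # recovers p up to rounding
```

On this example the resulting distribution (1, 0.8, 0.6) is less specific than the one obtained by cumulating probabilities from the least probable outcome, (1, 0.5, 0.2), as the text states.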


Possibility Theory and Defuzzification 
Possibilistic mean values can be defined using Choquet 
integrals with respect to possibility and necessity mea- 
sures [3.133, 150], and come close to defuzzification 
methods [3.151]. Interpreting a fuzzy interval $M$, asso-
ciated with a possibility distribution $\mu_M$, as a family of
probabilities, upper and lower mean values $E^*(M)$ and
$E_*(M)$ can be defined as [3.152]

$$E_*(M) = \int_0^1 \inf M_\alpha \, d\alpha; \qquad E^*(M) = \int_0^1 \sup M_\alpha \, d\alpha,$$

where $M_\alpha$ is the $\alpha$-cut of $M$.

Then the mean interval $E(M) = [E_*(M), E^*(M)]$ of
$M$ is the interval containing the mean values of all
random variables consistent with $M$, that is
$E(M) = \{E(P) \mid P \in \mathcal{P}(\mu_M)\}$, where $E(P)$ represents the ex-
pected value associated with the probability measure
$P$. That the mean value of a fuzzy interval is an in-
terval seems to be intuitively satisfactory. In particular,
the mean interval of a (regular) interval $[a, b]$ is this
interval itself. The upper and lower mean values are
linear with respect to the addition of fuzzy numbers.
Define the addition $M + N$ as the fuzzy interval whose
cuts are $M_\alpha + N_\alpha = \{s + t, s \in M_\alpha, t \in N_\alpha\}$, defined ac-
cording to the rules of interval analysis. Then
$E(M + N) = E(M) + E(N)$, and similarly for the scalar multi-
plication $E(aM) = aE(M)$, where $aM$ has membership
grades of the form $\mu_M(s/a)$ for $a \neq 0$. In view of this
property, it seems that the most natural defuzzification
method is the middle point $\hat{E}(M)$ of the mean interval
(originally proposed by Yager [3.153]). Other defuzzi-
fication techniques do not generally possess this kind
of linearity property. $\hat{E}(M)$ has a natural interpreta-
tion in terms of simulation of a fuzzy variable [3.154],
and is the mean value of the pignistic transformation
of $M$. Indeed, it is the mean value of the empirical
probability distribution obtained by the random process
defined by picking an element $\alpha$ in the unit interval
at random, and then an element $s$ in the cut $M_\alpha$ at
random.


3.5 Some Applications


Possibility theory has not been the main framework
for engineering applications of fuzzy sets in the past.
However, on the basis of its connections to symbolic
artificial intelligence, to decision theory and to im-
precise statistics, we consider that it has significant
potential for further applied developments in a number
of areas, including some where fuzzy sets are not yet
always accepted. Only some directions are pointed out
here.
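
As a numerical illustration of the mean interval of a fuzzy interval and of its midpoint used for defuzzification (see Possibility Theory and Defuzzification), the sketch below integrates the bounds of the cuts over the unit interval. It assumes a trapezoidal fuzzy interval encoded as a tuple (a, b, c, d) with support [a, d] and core [b, c]; this encoding, the midpoint integration rule, and the function names are choices made for the example only.

```python
def alpha_cut(trap, alpha):
    """Cut of a trapezoidal fuzzy interval (a, b, c, d) at level alpha."""
    a, b, c, d = trap
    return a + alpha * (b - a), d - alpha * (d - c)


def mean_interval(trap, steps=1000):
    """Approximate the integrals of the cut bounds over alpha in (0, 1]."""
    lower = upper = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule
        lo, hi = alpha_cut(trap, alpha)
        lower += lo / steps
        upper += hi / steps
    return lower, upper


def defuzzify(trap):
    """Middle point of the mean interval."""
    lower, upper = mean_interval(trap)
    return (lower + upper) / 2.0


def add(m, n):
    """Sum of two trapezoids: the cuts add according to interval arithmetic."""
    return tuple(x + y for x, y in zip(m, n))


M = (0.0, 1.0, 2.0, 4.0)
N = (1.0, 2.0, 2.0, 3.0)
print(mean_interval(M), defuzzify(M))
print(mean_interval(add(M, N)))  # close to the sum of the two mean intervals
```

For a trapezoid the bounds of the cuts are linear in the level, so the lower and upper mean values reduce to (a + b)/2 and (c + d)/2; the numerical integration is kept only to mirror the integral definition.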


3.5.1 Uncertain Database Querying and Preference Queries


The evaluation of a flexible query in the face of incom- 
plete or fuzzy information amounts to computing the 
possibility and the necessity of the fuzzy event express- 
ing the gradual satisfaction of the query [3.155]. This 
evaluation, known as fuzzy pattern matching [3.156, 
157], corresponds to the extent to which fuzzy sets 


G'E | Y Hed 


50 PartA | Foundations 


G'E | Y Hed 


(representing the query) overlap, or include the possi- 
bility distributions (representing the available informa- 
tion). Such an evaluation procedure has been extended 
to symbolic labels that are no longer represented by 
possibility distributions, but which belong to possi- 
bilistic ontologies where approximate similarity and 
subsumption between labels are estimated in terms of 
possibility and necessity degrees, respectively [3.158]. 
These approaches presuppose a total lack of depen- 
dencies between ill-known attributes. A more general 
approach based on possible world semantics has been 
envisaged [3.159]. However, as for the probabilistic 
counterpart of this latter view, evaluating queries has 
a high computational cost [3.160]. This is why it has 
been proposed to only use certainty qualified values (or 
disjunctions of values), as in possibilistic logic, rather 
than general possibility distributions, for representing 
attribute values pervaded with uncertainty. It has been 
shown that it leads to a tractable extension of relational 
algebra operations [3.161, 162]. 
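
The two degrees computed by fuzzy pattern matching can be sketched on a finite domain as follows. The sketch assumes the usual max-min form for the possibility of a fuzzy event and the dual min-max form for its necessity; the attribute values, membership grades, and function names are invented for illustration.

```python
def possibility_of_match(query, info):
    """Degree of overlap: max over s of min(mu_query(s), pi_info(s))."""
    domain = set(query) | set(info)
    return max(min(query.get(s, 0.0), info.get(s, 0.0)) for s in domain)


def necessity_of_match(query, info):
    """Degree of inclusion: min over s of max(mu_query(s), 1 - pi_info(s))."""
    domain = set(query) | set(info)
    return min(max(query.get(s, 0.0), 1.0 - info.get(s, 0.0)) for s in domain)


# Ill-known age of a person, as a possibility distribution (hypothetical data).
age_info = {30: 0.5, 35: 1.0, 40: 0.7}
# Fuzzy query "young", as a membership function on the same domain.
young = {30: 1.0, 35: 0.6, 40: 0.2}

print(possibility_of_match(young, age_info))  # close to 0.6
print(necessity_of_match(young, age_info))    # close to 0.3
```

The necessity degree never exceeds the possibility degree when the possibility distribution is normalized, which is why the pair can be read as a lower and an upper estimate of query satisfaction.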

Besides, possibility theory is not only useful for 
representing qualitative uncertainty, but it may also 
be of interest for representing preferences, and as 
such may be applied to the handling of preference
queries [3.163]. Thus, requirements of the form "A and
preferably B" (i.e., it is more satisfactory to have A and
B than A alone), or "A or at least B" can be expressed
using appropriate priority orderings, as in possibilistic
logic [3.164]. Lastly, in bipolar queries [3.165-167],
flexible constraints that are more or less compulsory
are distinguished from additional wishes that are op-
tional, as for instance in the request "find the apartments
that are cheap and maybe near the train station". Indeed,
negative preferences express what is (more or less, or 
completely) impossible or undesirable, and by com- 
plementation state flexible constraints restricting the 
possible or acceptable values. Positive preferences are 
not compulsory, but rather express wishes; they state 
what attribute values would be really satisfactory. 


3.5.2 Description Logics 


Description logics (initially named terminological log- 
ics) are tractable fragments of first-order logic repre- 
sentation languages that handle notions of concepts, 
roles and instances, referring at the semantic level to 
the respective notions of set, binary relations, mem- 
bership, and cardinality. They are useful for describing 
ontologies that consist in hierarchies of concepts in 
a particular domain, for the semantic web. Two ideas 
that, respectively, come from fuzzy sets and possibil-
ity theory, and that may be combined, may be used for
extending the expressive power of description logics. 
On one hand, vague concepts can be approximated in 
practice by pairs of nested sets corresponding to the 
cores and the supports of fuzzy sets, thus sorting out 
the typical elements, in a way that agrees with fuzzy set 
operations and inclusions. On the other hand, a possi- 
bilistic treatment of uncertainty and exceptions can be 
performed on top of a description logic in a possibilistic 
logic style [3.168]. In both cases, the underlying prin- 
ciple is to remain as close as possible to classical logic 
for preserving computational efficiency as much as pos- 
sible. Thus, formal expressions such as $(P \sqsubseteq_X^{\alpha} Q, \beta)$
are intended to mean that it is certain at least at level $\beta$ that
the degree of subsumption of concept $P$ in $Q$ is at least
$\alpha$, in the sense of some $X$-implication (e.g., Gödel, or
Kleene-Dienes implication). In particular, it can be ex-
pressed that typical $P$s are $Q$s, or that typical $P$s are
typical $Q$s, or that an instance is typical of a concept.
Such ideas have been developed by Qi et al. [3.169]
toward implemented systems in connection with web 
research. 


3.5.3 Information Fusion 


Possibility theory offers a simple, flexible framework 
for information fusion that can handle incompleteness 
and conflict. For instance, intervals or fuzzy intervals 
can be merged, coming from several sources. The ba- 
sic fusion modes are the conjunctive and disjunctive 
modes, presupposing, respectively, that all sources of 
information are reliable and that at least one is [3.170, 
171]. In the conjunctive mode, the use of the minimum 
operation avoids assuming sources are independent. If 
they are, the product rule can be applied, whereby 
low plausibility degrees reinforce toward impossibil- 
ity. Quite often, the results of a conjunctive aggregation
are subnormalized, which indicates a conflict. It
is then common to apply a renormalization step, which makes
this mode of combination brittle in case of strong con-
flict; in any case, the more numerous the sources, the
more conflicting they become. A weighted average of
possibility degrees can be used, but it does not pre-
serve the properties of possibility measures. The use of
the disjunctive mode is more cautious: it avoids the 
conflict at the expense of losing information. When 
many sources are involved the result becomes totally 
uninformative. 
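
The basic fusion modes can be sketched on a finite domain as follows: pointwise minimum or product for the conjunctive mode, pointwise maximum for the disjunctive mode, and renormalization by the height of the result. Measuring conflict as one minus the height is a common convention assumed here; the source distributions and function names are invented for the example.

```python
def conjunctive(pi1, pi2, op=min):
    """Conjunctive fusion; op is min (no independence assumed) or a product."""
    return {s: op(pi1[s], pi2[s]) for s in pi1}


def product(x, y):
    """Product rule, suitable when the sources are independent."""
    return x * y


def disjunctive(pi1, pi2):
    """Disjunctive fusion: at least one source is assumed reliable."""
    return {s: max(pi1[s], pi2[s]) for s in pi1}


def conflict(pi):
    """Degree of conflict, taken here as 1 minus the height of pi."""
    return 1.0 - max(pi.values())


def renormalize(pi):
    """Divide by the height; undefined under total conflict."""
    height = max(pi.values())
    if height == 0.0:
        raise ValueError("total conflict: renormalization is undefined")
    return {s: v / height for s, v in pi.items()}


source1 = {"x": 1.0, "y": 0.6, "z": 0.1}
source2 = {"x": 0.3, "y": 1.0, "z": 0.2}

merged = conjunctive(source1, source2)
print(merged, conflict(merged))
print(renormalize(merged))
print(conjunctive(source1, source2, op=product))
print(disjunctive(source1, source2))
```

The division in `renormalize` shows why the conjunctive mode is brittle: as the height approaches zero, small changes in the inputs are strongly amplified.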

To cope with this problem, some ad hoc adap- 
tive combination rules have been proposed that fo- 
cus on maximal subsets of sources that are either 


Possibility Theory and Its Applications: Where Do We Stand? | 3.5 Some Applications 51 


fully consistent or not completely inconsistent [3.170]. 
This scheme has been further improved by Oussalah 
et al. [3.172]. Oussalah [3.173] has proposed a num- 
ber of postulates a possibilistic fusion rule should 
satisfy. Another approach is to merge the set of cuts 
of the possibility distributions based on the maximal 
consistent subsets of sources (consistent subsets of 
cuts are merged using conjunction, and the results are 
merged disjunctively). The result is then a belief func- 
tion [3.174]. Another option is to make a guess on 
the number of reliable sources and merge informa- 
tion inside consistent subsets of sources having this 
cardinality. 

Possibilistic information fusion can be performed 
syntactically on more compact representations such as 
possibilistic logic bases [3.175] (the merging of possi- 
bilistic networks [3.176] has also been recently consid- 
ered). The latter type of fusion may be of interest both 
from a computational and from representational point 
of view. Still it is important to make sure that the syntac- 
tic operations are counterparts of semantic ones. Fusion 
should be performed both at the semantic and at the syn- 
tactic levels equivalently. For instance, the conjunctive 
merging of two possibility distributions corresponds to 
the mere union of the possibilistic bases that represent 
them. More details for other operations can be found 
in [3.175, 177], and in the bipolar case in [3.99]. This 
line of research is pursued by Qi et al. [3.178]. They 
also proposed an approach to measuring conflict be- 
tween possibilistic knowledge bases [3.179]. 

The distance-based approach [3.180] that applies to 
the fusion of classical logic bases can be embedded 
in the possibilistic fusion setting as well [3.177]. The 
distance between an interpretation $s$ and each classical
base $K$ is usually defined as $d(s, K) = \min\{H(s, s^*) : s^* \models K\}$
(where $H(s, s^*)$ is the Hamming distance that
evaluates the number of literals with different signs in $s$
and $s^*$). It is then easy to encode the distance $d(s, K)$
into a possibilistic knowledge base (interpreting pos-
sibility as Hamming-distance-based similarity to the
models of $K$, i.e., $\pi(s) = a^{d(s,K)}$, $a \in (0, 1)$). The result
of the possibilistic fusion is a possibilistic knowledge 
base, the highest weight layer of which is the classical 
database that is searched for, provided that the distance 
merging operation is suitably translated to a possibilis- 
tic merging operation. 
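
The encoding of the Hamming distance into possibility degrees can be sketched as follows. Interpretations are represented as tuples of truth values and a classical base is represented directly by the list of its models; both encodings, the value chosen for the parameter `a`, and the function names are assumptions of this sketch.

```python
def hamming(s, t):
    """Number of propositional variables on which s and t disagree."""
    return sum(x != y for x, y in zip(s, t))


def distance_to_base(s, models):
    """d(s, K): smallest Hamming distance from s to a model of K."""
    return min(hamming(s, m) for m in models)


def possibility(s, models, a=0.5):
    """pi(s) = a ** d(s, K), with a strictly between 0 and 1."""
    return a ** distance_to_base(s, models)


# Models of a hypothetical base K over three variables.
models_of_K = [(True, True, False), (True, False, False)]

print(possibility((True, True, False), models_of_K))   # a model of K
print(possibility((False, True, True), models_of_K))   # two literals away
```

Interpretations that satisfy the base get possibility 1, and the degree decreases geometrically with the distance, so the ordering of interpretations induced by the distance is preserved.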

A similar problem exists in belief revision where an 
epistemic state, represented either by a possibility dis- 
tribution or by a possibilistic logic base, is revised by 
an input information $p$ [3.181]. Revision can be viewed
as prioritized fusion, using for instance conditioning,
or other operations, depending on whether in the revised epis-
temic state one wants to enforce $N(p) = 1$, or $N(p) > 0$
only, or whether we are dealing with an uncertain input $(p, \alpha)$.
Then, the uncertain input may be understood as enforc-
ing $N(p) \geq \alpha$ in any case, or as taking it into account
only if it is sufficiently certain w.r.t. the current epis-
temic state.


3.5.4 Temporal Reasoning and Scheduling 


Temporal reasoning may refer to time intervals or to 
time points. When handling time intervals, the basic 
building block is the one provided by Allen relations 
between time intervals. There are 13 relations that de- 
scribe the possible relative locations of two intervals. 
For instance, given the two intervals $A = [a, a']$ and
$B = [b, b']$, A is before (resp. after) B means $a' < b$
(resp. $b' < a$), A meets (resp. is met by) B means $a' = b$
(resp. $b' = a$), A overlaps (resp. is overlapped by) B iff
$b > a$ and $a' > b$ and $b' > a'$ (resp. $a > b$ and $b' > a$
and $a' > b'$). The introduction of fuzzy features in tem-
poral reasoning can be related to two different issues: 


• First, it can be motivated by the need for a grad-
ual, linguistic-like description of temporal relations
even in the face of complete information. An
extension of Allen relational calculus has been
proposed for this purpose, based on fuzzy comparators
expressing linguistic tolerance, which are used in
place of the exact relations >, =, and <. Fuzzy
Allen relations are thus defined from three fuzzy
relations between dates that can be, for instance,
approximately equal, clearly greater, and clearly
smaller, where, e.g., the extent to which x is ap-
proximately equal to y is the degree of membership
of x - y to some fuzzy set expressing something like
small [3.182, 183].

• Second, the possibilistic handling of fuzzy or in-
complete information leads to pervading classical
Allen relations, and more generally fuzzy Allen
relations, with uncertainty. Patterns for prop-
agating uncertainty and composing the different
(fuzzy) Allen relations in a possibilistic way have
then been laid bare [3.184, 185].
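
The crisp Allen relations recalled before the list, and the kind of fuzzy comparator mentioned in its first item, can be sketched as follows. The classifier returns one of the 13 relations for two proper intervals given as pairs (start, end). The relation names and the trapezoidal shape chosen for "small" (parameters `core` and `support`) are assumptions of this sketch, not the definitions used in the cited works.

```python
def allen_relation(A, B):
    """Classify the relative position of two intervals with start < end."""
    a, a2 = A
    b, b2 = B
    if a2 < b:
        return "before"
    if b2 < a:
        return "after"
    if a2 == b:
        return "meets"
    if b2 == a:
        return "met-by"
    if a == b and a2 == b2:
        return "equals"
    if a == b:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a > b else "finished-by"
    if b < a and a2 < b2:
        return "during"
    if a < b and b2 < a2:
        return "contains"
    return "overlaps" if a < b else "overlapped-by"


def approx_equal(x, y, core=1.0, support=3.0):
    """Degree to which x - y is small: 1 up to core, 0 beyond support."""
    gap = abs(x - y)
    if gap <= core:
        return 1.0
    if gap >= support:
        return 0.0
    return (support - gap) / (support - core)


print(allen_relation((1, 5), (4, 6)))
print(approx_equal(10.0, 12.0))
```

With such a comparator, a graded version of "A meets B" can be obtained as `approx_equal(a2, b)` instead of the exact test `a2 == b`.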


Besides, the handling of temporal reasoning in 
terms of relations between time points can also be ex- 
tended in case of uncertain information [3.186]. Uncer- 
tain relations between temporal points are represented 
by means of possibility distributions over the three basic 
relations >, =, and <. Operations for computing in-
verse relations, for composing relations, for combining
relations coming from different sources and pertaining 
to the same temporal points, or for handling negation, 
have been defined. This shows that possibilistic tempo- 
ral uncertainty can be handled in the setting of point 
algebra. The possibilistic approach can then be favor- 
ably compared with a probabilistic approach previously 
proposed (first, the approach can be purely qualitative, 
thus avoiding the necessity of quantifying uncertainty 
if information is poor, and second, it is capable of 
modeling ignorance in a nonbiased way). Possibilis- 
tic logic has also been extended to a timed version
where time intervals during which a proposition is more or
less certainly true are attached to classical propositional
formulas [3.187].

Applications of possibility theory-based decision- 
making can be found in scheduling. One issue is to 
handle fuzzy due dates of jobs using the calculus of 
fuzzy constraints [3.188]. Another issue is to han- 
dle uncertainty in task durations in basic scheduling 
problems such as program evaluation and review tech- 
nique (PERT) networks. A large literature exists on this 
topic [3.189, 190] where the role of fuzzy sets is not al- 
ways very clear. Convincing solutions to this problem
start with the works of Chanas and Zielinski [3.191,
192], where the problem is posed in terms of projecting
a joint possibility distribution on quantities of interest (earli-
est finishing times, or slack times) and where tasks can
be possibly or certainly critical. A full solution apply-
ing Boolean possibility theory to interval uncertainty of
task durations is described in [3.193], and its fuzzy
extension in [3.194]. Other scheduling problems are 
solved in the same possibilistic framework by Kasper- 
ski and colleagues [3.195, 196], as well as more general 
optimization problems [3.197, 198]. 


3.5.5 Risk Analysis 


The aim of risk analysis studies is to perform un- 
certainty propagation under poor data and without 
independence assumptions (see the papers in the spe- 
cial issue [3.199]). Finding the potential of possibilis- 
tic representations in computing conservative bounds 
for such probabilistic calculations is certainly a ma- 
jor challenge [3.200]. An important research direc- 
tion is the comparison between fuzzy interval anal- 
ysis [3.33] and random variable calculations with
a view to unifying them [3.201]. Methods for joint
propagation of possibilistic and probabilistic infor- 
mation have been devised [3.202], based on casting 
both in a random set setting [3.203]; the case of 
probabilistic models with fuzzy interval parameters 
has also been dealt with [3.204]. The active area of 
fuzzy random variables is also connected to this ques- 
tion [3.205]. 


3.5.6 Machine Learning 


Applications of possibility theory to learning have 
started to be investigated rather recently in differ- 
ent directions. For instance, taking advantage of the 
proximity between reinforcement learning and partially 
observed Markov decision processes, a possibilistic 
counterpart of reinforcement learning has been pro- 
posed after developing the possibilistic version of the 
latter [3.206]. Besides, by looking for big-stepped prob- 
ability distributions, defined by discrete exponential 
distributions, one can mine databases for discovering
default rules [3.207]. Big-stepped probabilities mimic
possibility measures in the sense that $P(A) > P(B)$
if and only if $\max_{s \in A} p(s) > \max_{s \in B} p(s)$. The ver-
sion space approach to learning presents interesting 
similarities with the binary bipolar possibilistic rep- 
resentation setting, thinking of examples as positive 
information and of counterexamples as negative in- 
formation [3.208]. The general bipolar setting, where 
intermediary degrees of possibility are allowed, pro- 
vides a basis for extending version space approach 
in a graded way, where examples and counterex-
amples can be weighted according to their impor-
tance. The graded version space approach agrees with 
the possibilistic extension of inductive logic program- 
ming [3.209]. Indeed, where the background knowl- 
edge may be associated with certainty levels, the 
examples may be more or less important to cover, 
and the set of rules that is learnt may be stratified 
in order to have a better management of exceptions 
in multiple-class classification problems, in agree- 
ment with the possibilistic approach to nonmonotonic 
reasoning. 
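
The way big-stepped probabilities mimic possibility measures can be checked by brute force on a small example. The sketch assumes the usual definition (each probability exceeds the sum of all smaller ones) and restricts the comparison to disjoint nonempty events, a restriction added here because the equivalence cannot hold when, for instance, B is a proper subset of A containing the most probable element of A. The example distributions and function names are illustrative.

```python
from itertools import product


def is_big_stepped(probs):
    """True if each probability exceeds the sum of all smaller ones."""
    ordered = sorted(probs, reverse=True)
    return all(ordered[i] > sum(ordered[i + 1:])
               for i in range(len(ordered) - 1))


def agrees_with_max(probs):
    """Check P(A) > P(B) iff max over A > max over B, for disjoint A and B."""
    n = len(probs)
    # Label each outcome: 0 = in neither event, 1 = in A, 2 = in B.
    for labels in product((0, 1, 2), repeat=n):
        a = [probs[i] for i in range(n) if labels[i] == 1]
        b = [probs[i] for i in range(n) if labels[i] == 2]
        if not a or not b:
            continue
        if (sum(a) > sum(b)) != (max(a) > max(b)):
            return False
    return True


big = [0.6, 0.25, 0.1, 0.05]
flat = [0.25, 0.25, 0.25, 0.25]
print(is_big_stepped(big), agrees_with_max(big))
print(is_big_stepped(flat), agrees_with_max(flat))
```

The enumeration visits every labeling of the outcomes, so it is only practical for a handful of outcomes.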

Other applications of possibility theory can be 
found in fields such as data analysis [3.79, 210, 211],
diagnosis [3.212, 213], belief revision [3.181], argu-
mentation [3.68, 214, 215], etc. 


3.6 Some Current Research Lines


A number of ongoing works deal with new research 
lines where possibility theory is central. In the follow- 
ing, we outline a few of those: 


• Formal concept analysis: Formal concept analysis
(FCA) studies Boolean data tables relating objects
and attributes. The key issue of FCA is to ex-
tract so-called concepts from such tables. A concept
is a maximal set of objects sharing a maximal
number of attributes. The enumeration of such con-
cepts can be carried out via a Galois connection
between objects and attributes, and this Galois con-
nection uses operators similar to the $\Delta$ function of
possibility theory. Based on this analogy, other cor-
respondences can be laid bare using the three other
set functions of possibility theory [3.216, 217]. In
particular, one of these correspondences detects in-
dependent subtables [3.22]. This approach can be
systematized to fuzzy or uncertain versions of for-
mal concept analysis.

• Generalized possibilistic logic: Possibilistic logic,
in its basic version, attaches degrees of necessity
to formulas, which turns them into graded modal
formulas of the necessity kind. However, only con-
junctions of weighted formulas are allowed. Yet
very early we noticed that it makes sense to ex-
tend the language toward handling constraints on
the degree of possibility of a formula. This re-
quires allowing for negations and disjunctions of
necessity-qualified propositions. This extension, still
under study [3.218], puts together the KD modal
logic and basic possibilistic logic. Recently it has
been shown that nonmonotonic logic programming
languages can be translated into generalized pos-
sibilistic logic, making the meaning of negation
by default in rules much more transparent [3.219].
This move from basic to generalized possibilistic
logic also enables further extensions to the mul-
tiagent and the multisource case [3.220] to be
considered. Besides, it has been recently shown
that a Sugeno integral can also be represented in
terms of possibilistic logic, which enables us to lay
bare the logical description of an aggregation pro-
cess [3.221].

• Qualitative capacities and possibility measures:
While a numerical possibility measure is equiva-
lent to a convex set of probability measures, it turns
out that in the qualitative setting, a monotone set
function can be represented by means of a family
of possibility measures [3.222, 223]. This line of re-
search enables qualitative counterparts of results in
the study of Choquet capacities in the numerical set-
tings to be established. In particular, a monotone set
function can be seen as the counterpart of a belief
function, and various concepts of evidence the-
ory can be adapted to this setting [3.224]. The Sugeno
integral can be viewed as a lower possibilistic ex-
pectation in the sense of Sect. 3.3.9 [3.223]. These
results enable the structure of qualitative monotonic
set functions to be laid bare, with possible con-
nection with neighborhood semantics of nonregular
modal logics [3.225].

• Regression and kriging: Fuzzy regression analy-
sis is seldom envisaged from the point of view of
possibility theory. One exception is the possibilis-
tic regression initiated by Tanaka and Guo [3.211],
where the idea is to approximate precise or set-
valued data in the sense of inclusion by means
of a set-valued or fuzzy set-valued linear function
obtained by making the linear coefficients of a lin-
ear function fuzzy. The alternative approach is the
fuzzy least squares of Diamond [3.226] where fuzzy
data are interpreted as functions and a crisp dis-
tance between fuzzy sets is often used. However, in
this approach, fuzzy data are questionably seen as
objective entities [3.227]. The introduction of pos-
sibility theory in regression analysis of fuzzy data
comes down to an epistemic view of fuzzy data
whereby one tries to construct the envelope of all
linear regression results that could have been ob-
tained, had the data been precise [3.228]. This view
has been applied to the kriging problem in geo-
statistics [3.229]. Another use of possibility theory
consists in exploiting possibility-probability trans-
forms to develop a form of quantile regression on
crisp data [3.230], yielding a fuzzy function that is
much more faithful to the data set than what a fuzzi-
fied linear function can offer.
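
For the formal concept analysis item of the list above, the enumeration of concepts through the Galois connection can be sketched as follows. The context is a Boolean table stored as a dictionary from objects to attribute sets; the toy data, the function names, and the brute-force enumeration over all subsets of objects are assumptions of this sketch and are only suitable for very small tables.

```python
from itertools import combinations


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all if it is empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_having(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs closed under the Galois connection."""
    concepts = set()
    names = list(context)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((extent, intent))
    return concepts


context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}
for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

The operator `common_attributes` plays the role that the text compares to the $\Delta$ function: it returns what is guaranteed to hold for every element of the given set of objects.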




References 

3.1 L.A. Zadeh: Fuzzy sets as a basis for a theory of possibility, Fuzzy Set. Syst. 1, 3-28 (1978)
3.2 B.R. Gaines, L. Kohout: Possible automata, Proc. Int. Symp. Multiple-Valued Logics (Bloomington, Indiana 1975) pp. 183-196
3.3 D. Dubois, H. Prade: Possibility Theory (Plenum, New York 1988)
3.4 D. Dubois, H.T. Nguyen, H. Prade: Fuzzy sets and probability: Misunderstandings, bridges and gaps. In: Fundamentals of Fuzzy Sets, ed. by D. Dubois, H. Prade (Kluwer, Boston 2000) pp. 343-438, see also the bibliography in http://www.scholarpedia.org/article/Possibility_theory
3.5 D. Dubois, H. Prade: Possibility theory: Qualitative and quantitative aspects. In: Handbook of Defeasible Reasoning and Uncertainty Management Systems, Vol. 1, ed. by D.M. Gabbay, P. Smets (Kluwer, Dordrecht 1998) pp. 169-226
3.6 D. Dubois, H. Fargier, H. Prade: Possibility theory in constraint satisfaction problems: Handling priority, preference and uncertainty, Appl. Intell. 6, 287-309 (1996)
3.7 S. Benferhat, D. Dubois, S. Kaci, H. Prade: Modeling positive and negative information in possibility theory, Int. J. Intell. Syst. 23, 1094-1118 (2008)
3.8 D. Dubois, H. Prade: An overview of the asymmetric bipolar representation of positive and negative information in possibility theory, Fuzzy Set. Syst. 160, 1355-1366 (2009)
3.9 W. Spohn: Ordinal conditional functions: A dynamic theory of epistemic states. In: Causation in Decision, Belief Change, and Statistics, Vol. 2, ed. by W.L. Harper, B. Skyrms (Kluwer, Dordrecht 1988) pp. 105-134
3.10 W. Spohn: The Laws of Belief: Ranking Theory and its Philosophical Applications (Oxford Univ. Press, Oxford 2012)
3.11 I.M. Bocheński: La Logique de Théophraste (Librairie de l'Université de Fribourg en Suisse, Fribourg 1947)
3.12 B.F. Chellas: Modal Logic, an Introduction (Cambridge Univ. Press, Cambridge 1980)
3.13 G.L.S. Shackle: Decision, Order and Time in Human Affairs, 2nd edn. (Cambridge Univ. Press, Cambridge 1961)
3.14 D.L. Lewis: Counterfactuals (Basil Blackwell, Oxford 1973)
3.15 D. Dubois: Belief structures, possibility theory and decomposable measures on finite sets, Comput. Artif. Intell. 5, 403-416 (1986)
3.18 L.A. Zadeh: Fuzzy sets and information granularity. In: Advances in Fuzzy Set Theory and Applications, ed. by M.M. Gupta, R. Ragade, R.R. Yager (Amsterdam, North-Holland 1979) pp. 3-18
3.19 L.A. Zadeh: Possibility theory and soft data analysis. In: Mathematical Frontiers of Social and Policy Sciences, ed. by L. Cobb, R. Thrall (Westview, Boulder 1982) pp. 69-129
3.20 D. Dubois, H. Prade: Fuzzy Sets and Systems: Theory and Applications (Academic Press, New York 1980)
3.21 G.J. Klir, T. Folger: Fuzzy Sets, Uncertainty and Information (Prentice Hall, Englewood Cliffs 1988)
3.22 D. Dubois, H. Prade: Possibility theory and formal concept analysis: Characterizing independent subcontexts, Fuzzy Set. Syst. 196, 4-16 (2012)
3.23 R.R. Yager: An introduction to applications of possibility theory, Hum. Syst. Manag. 3, 246-269 (1983)
3.24 R.R. Yager: A foundation for a theory of possibility, Cybern. Syst. 10(1-3), 177-204 (1980)
3.25 E.P. Klement, R. Mesiar, E. Pap: Triangular Norms (Kluwer, Dordrecht 2000)
3.26 G. Coletti, B. Vantaggi: Comparative models ruled by possibility and necessity: A conditional world, Int. J. Approx. Reason. 45(2), 341-363 (2007)
3.27 G. Coletti, B. Vantaggi: T-conditional possibilities: Coherence and inference, Fuzzy Set. Syst. 160(3), 306-324 (2009)
3.28 B. De Baets, E. Tsiporkova, R. Mesiar: Conditioning in possibility with strict order norms, Fuzzy Set. Syst. 106, 221-229 (1999)
3.29 G. De Cooman: Possibility theory. Part I: Measure- and integral-theoretic groundwork; Part II: Conditional possibility; Part III: Possibilistic independence, Int. J. Gen. Syst. 25, 291-371 (1997)
3.30 L.M. De Campos, J.F. Huete: Independence concepts in possibility theory, Fuzzy Set. Syst. 103, 487-506 (1999)
3.31 D. Dubois, L. Fariñas del Cerro, A. Herzig, H. Prade: Qualitative relevance and independence: A roadmap, Proc. 15th Int. Jt. Conf. Artif. Intell. Nagoya (1997) pp. 62-67
3.32 N. Ben Amor, K. Mellouli, S. Benferhat, D. Dubois, H. Prade: A theoretical framework for possibilistic independence in a weakly ordered setting, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10, 117-155 (2002)
3.33 D. Dubois, E. Kerre, R. Mesiar, H. Prade: Fuzzy interval analysis. In: Fundamentals of Fuzzy Sets, ed. by D. Dubois, H. Prade (Kluwer, Boston 2000) pp. 483-581
3.16 T. Sudkamp: Similarity and the measurement of possibility, Actes Rencontres Francophones sur la
Logique Floue et ses Applications (Cepadues Edi- 3.34 D. Dubois, H. Prade: Possibility theory as a basis 
tions, Toulouse 2002) pp. 13-26 for preference propagation in automated reason- 

3.17 L.J. Cohen: The Probable and the Provable (Claren- ing, Proc. 1st IEEE Int. Conf. Fuzzy Systems (FUZZ- 


don, Oxford 1977) 


IEEE'92), San Diego (1992) pp. 821-832 




3.35 D. Dubois, P. Hajek, H. Prade: Knowledge-driven versus data-driven logics, J. Log. Lang. Inform. 9, 65-89 (2000)

3.36 D. Dubois, H. Prade: Possibility theory, probability theory and multiple-valued logics: A clarification, Ann. Math. Artif. Intell. 32, 35-66 (2001)

3.37 M. Banerjee, D. Dubois: A simple modal logic for reasoning about revealed beliefs, Lect. Notes Artif. Intell. 5590, 805-816 (2009)

3.38 M. Banerjee, D. Dubois: A simple logic for reasoning about incomplete knowledge, Int. J. Approx. Reason. 55, 639-653 (2014)

3.39 D. Dubois, H. Prade: Epistemic entrenchment and possibilistic logic, Artif. Intell. 50, 223-239 (1991)

3.40 P. Gardenfors: Knowledge in Flux (MIT Press, Cambridge 1988)

3.41 D. Dubois, H. Prade: Belief change and possibility theory. In: Belief Revision, ed. by P. Gardenfors (Cambridge Univ. Press, Cambridge 1992) pp. 142-182

3.42 D. Lehmann, M. Magidor: What does a conditional knowledge base entail?, Artif. Intell. 55, 1-60 (1992)

3.43 D. Dubois, H. Fargier, H. Prade: Ordinal and probabilistic representations of acceptance, J. Artif. Intell. Res. 22, 23-56 (2004)

3.44 S. Benferhat, D. Dubois, H. Prade: Nonmonotonic reasoning, conditional objects and possibility theory, Artif. Intell. 92, 259-276 (1997)

3.45 J. Pearl: System Z: A natural ordering of defaults with tractable applications to default reasoning, Proc. 3rd Conf. Theor. Aspects Reason. About Knowl. (Morgan Kaufmann, San Francisco 1990) pp. 121-135

3.46 E. Raufaste, R. Da Silva Neves, C. Mariné: Testing the descriptive validity of possibility theory in human judgements of uncertainty, Artif. Intell. 148, 197-218 (2003)

3.47 H. Farreny, H. Prade: Default and inexact reasoning with possibility degrees, IEEE Trans. Syst. Man Cybern. 16(2), 270-276 (1986)

3.48 D. Dubois, J. Lang, H. Prade: Possibilistic logic. In: Handbook of Logic in AI and Logic Programming, Vol. 3, ed. by D.M. Gabbay (Oxford Univ. Press, Oxford 1994) pp. 439-513

3.49 D. Dubois, H. Prade: Possibilistic logic: A retrospective and prospective view, Fuzzy Set. Syst. 144, 3-23 (2004)

3.50 J. Lang: Possibilistic logic: Complexity and algorithms. In: Algorithms for Uncertainty and Defeasible Reasoning (Kluwer, Dordrecht 2001) pp. 179-220

3.51 S. Benferhat, D. Dubois, H. Prade: Practical handling of exception-tainted rules and independence information in possibilistic logic, Appl. Intell. 9, 101-127 (1998)

3.52 S. Benferhat, S. Yahi, H. Drias: A new default theories compilation for MSP-entailment, J. Autom. Reason. 45(1), 39-59 (2010)


3.53 S. Benferhat, S. Lagrue, O. Papini: Reasoning with partially ordered information in a possibilistic logic framework, Fuzzy Set. Syst. 144, 25-41 (2004)

3.54 S. Benferhat, H. Prade: Encoding formulas with partially constrained weights in a possibilistic-like many-sorted propositional logic, Proc. 9th Int. Jt. Conf. Artif. Intell. (IJCAI'05) (2005) pp. 1281-1286

3.55 S. Benferhat, H. Prade: Compiling possibilistic knowledge bases, Proc. 17th Eur. Conf. Artif. Intell. (Riva del Garda, Italy 2006)

3.56 D. Dubois, S. Konieczny, H. Prade: Quasi-possibilistic logic and its measures of information and conflict, Fundam. Inform. 57, 101-125 (2003)

3.57 D. Dubois, F. Esteva, L. Godo, H. Prade: Fuzzy-set based logics - An history-oriented presentation of their main developments. In: Handbook of the History of Logic, the Many-Valued and Nonmonotonic Turn in Logic, Vol. 8, ed. by D.M. Gabbay, J. Woods (Elsevier, Amsterdam 2007) pp. 325-449

3.58 L. Boldrin, C. Sossai: Local possibilistic logic, J. Appl. Non-Class. Log. 7, 309-333 (1997)

3.59 P. Dellunde, L. Godo, E. Marchioni: Extending possibilistic logic over Gödel logic, Int. J. Approx. Reason. 52, 63-75 (2011)

3.60 P. Hajek, D. Harmancova, R. Verbrugge: A qualitative fuzzy possibilistic logic, Int. J. Approx. Reason. 12(1), 1-19 (1995)

3.61 S. Lehmke: Logics which Allow Degrees of Truth and Degrees of Validity, Ph.D. Thesis (Universität Dortmund, Germany 2001)

3.62 L.A. Zadeh: Fuzzy logic and approximate reasoning (In memory of Grigore Moisil), Synthese 30, 407-428 (1975)

3.63 D. Dubois, H. Prade, S. Sandri: Possibilistic logic with fuzzy constants and fuzzily restricted quantifiers. In: Logic Programming and Soft Computing, ed. by T.P. Martin, F. Arcelli-Fontana (Research Studies Press Ltd., Baldock, England 1998) pp. 69-90

3.64 T. Alsinet, L. Godo: Towards an automated deduction system for first-order possibilistic logic programming with fuzzy constants, Int. J. Intell. Syst. 17, 887-924 (2002)

3.65 T. Alsinet: Logic Programming with Fuzzy Unification and Imprecise Constants: Possibilistic Semantics and Automated Deduction, Ph.D. Thesis (Technical University of Catalunya, Barcelona 2001)

3.66 T. Alsinet, L. Godo, S. Sandri: Two formalisms of extended possibilistic logic programming with context-dependent fuzzy unification: A comparative description, Electr. Notes Theor. Comput. Sci. 66(5), 1-21 (2002)

3.67 T. Alsinet, L. Godo: Adding similarity-based reasoning capabilities to a Horn fragment of possibilistic logic with fuzzy constants, Fuzzy Set. Syst. 144, 43-65 (2004)

3.68 T. Alsinet, C. Chesñevar, L. Godo, S. Sandri, G. Simari: Formalizing argumentative reasoning in a possibilistic logic programming setting with fuzzy unification, Int. J. Approx. Reason. 48, 711-729 (2008)

3.69 T. Alsinet, C. Chesñevar, L. Godo, G. Simari: A logic programming framework for possibilistic argumentation: Formalization and logical properties, Fuzzy Set. Syst. 159(10), 1208-1228 (2008)

3.70 P. Nicolas, L. Garcia, I. Stéphan, C. Lefèvre: Possibilistic uncertainty handling for answer set programming, Ann. Math. Artif. Intell. 47(1/2), 139-181 (2006)

3.71 K. Bauters, S. Schockaert, M. De Cock, D. Vermeir: Possibilistic answer set programming revisited, Proc. 26th Conf. Uncertainty in Artif. Intell. (UAI'10), Catalina Island (2010) pp. 48-55

3.72 W. Spohn: A general, nonprobabilistic theory of inductive reasoning. In: Uncertainty in Artificial Intelligence, Vol. 4, ed. by R.D. Shachter (North Holland, Amsterdam 1990) pp. 149-158

3.73 R. Jeffrey: The Logic of Decision, 2nd edn. (Chicago Univ. Press, Chicago 1983)

3.74 D. Dubois, H. Prade: A synthetic view of belief revision with uncertain inputs in the framework of possibility theory, Int. J. Approx. Reason. 17(2-3), 295-324 (1997)

3.75 S. Benferhat, D. Dubois, H. Prade, M.-A. Williams: A framework for iterated belief revision using possibilistic counterparts to Jeffrey's rule, Fundam. Inform. 99(2), 147-168 (2010)

3.76 S. Benferhat, D. Dubois, L. Garcia, H. Prade: On the transformation between possibilistic logic bases and possibilistic causal networks, Int. J. Approx. Reason. 29, 135-173 (2002)

3.77 N. Ben Amor, S. Benferhat: Graphoid properties of qualitative possibilistic independence relations, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 13, 59-97 (2005)

3.78 N. Ben Amor, S. Benferhat, K. Mellouli: Anytime propagation algorithm for min-based possibilistic graphs, Soft Comput. 8, 50-161 (2003)

3.79 C. Borgelt, J. Gebhardt, R. Kruse: Possibilistic graphical models. In: Computational Intelligence in Data Mining, ed. by G.D. Riccia (Springer, Wien 2000) pp. 51-68

3.80 W. Guezguez, N. Ben Amor, K. Mellouli: Qualitative possibilistic influence diagrams based on qualitative possibilistic utilities, Eur. J. Oper. Res. 195, 223-238 (2009)

3.81 H. Fargier, N. Ben Amor, W. Guezguez: On the complexity of decision making in possibilistic decision trees, Proc. UAI (2011) pp. 203-210

3.82 R. Ayachi, N. Ben Amor, S. Benferhat: Experimental comparative study of compilation-based inference in Bayesian and possibilitic networks, Lect. Notes Comput. Sci. 6857, 155-163 (2011)

3.83 S. Benferhat: Interventions and belief change in possibilistic graphical models, Artif. Intell. 174(2), 177-189 (2010)

3.84 S. Benferhat, S. Smaoui: Inferring interventions in product-based possibilistic causal networks, Fuzzy Set. Syst. 169(1), 26-50 (2011)

3.85 D. Dubois, H. Prade: What are fuzzy rules and how to use them, Fuzzy Set. Syst. 84, 169-185 (1996)

3.86 D. Dubois, H. Prade, L. Ughetto: A new perspective on reasoning with fuzzy rules, Int. J. Intell. Syst. 18, 541-567 (2003)

3.87 S. Galichet, D. Dubois, H. Prade: Imprecise specification of ill-known functions using gradual rules, Int. J. Approx. Reason. 35, 205-222 (2004)

3.88 D. Dubois, E. Huellermeier, H. Prade: A systematic approach to the assessment of fuzzy association rules, Data Min. Knowl. Discov. 13, 167-192 (2006)

3.89 D. Dubois, F. Esteva, P. Garcia, L. Godo, R. de Lopez Mantaras, H. Prade: Fuzzy set modelling in case-based reasoning, Int. J. Intell. Syst. 13(4), 345-373 (1998)

3.90 D. Dubois, E. Huellermeier, H. Prade: Fuzzy set-based methods in instance-based reasoning, IEEE Trans. Fuzzy Syst. 10, 322-332 (2002)

3.91 D. Dubois, E. Huellermeier, H. Prade: Fuzzy methods for case-based recommendation and decision support, J. Intell. Inf. Syst. 27, 95-115 (2006)

3.92 E. Huellermeier, D. Dubois, H. Prade: Model adaptation in possibilistic instance-based reasoning, IEEE Trans. Fuzzy Syst. 10, 333-339 (2002)

3.93 E. Huellermeier: Case-Based Approximate Reasoning (Springer, Berlin 2007)

3.94 D. Dubois, P. Fortemps: Computing improved optimal solutions to max-min flexible constraint satisfaction problems, Eur. J. Oper. Res. 118, 95-126 (1999)

3.95 M. Inuiguchi, H. Ichihashi, Y. Kume: Modality constrained programming problems: A unified approach to fuzzy mathematical programming problems in the setting of possibility theory, Inf. Sci. 67, 93-126 (1993)

3.96 D. Dubois, H. Fargier, H. Prade: Refinements of the maximin approach to decision-making in fuzzy environment, Fuzzy Set. Syst. 81, 103-122 (1996)

3.97 D. Dubois, P. Fortemps: Selecting preferred solutions in the minimax approach to dynamic programming problems under flexible constraints, Eur. J. Oper. Res. 160, 582-598 (2005)

3.98 S. Kaci, H. Prade: Mastering the processing of preferences by using symbolic priorities in possibilistic logic, Proc. 18th Eur. Conf. Artif. Intell. (ECAI'08), Patras (2008) pp. 376-380

3.99 S. Benferhat, D. Dubois, S. Kaci, H. Prade: Bipolar possibility theory in preference modeling: Representation, fusion and optimal solutions, Inf. Fusion 7, 135-150 (2006)

3.100 D. Dubois, H. Fargier, J.-F. Bonnefon: On the qualitative comparison of decisions having positive and negative features, J. Artif. Intell. Res. 32, 385-417 (2008)




3.101 J.-F. Bonnefon, D. Dubois, H. Fargier, S. Leblois: Qualitative heuristics for balancing the pros and the cons, Theor. Decis. 65, 71-95 (2008)

3.102 A. Tversky, D. Kahneman: Advances in prospect theory: Cumulative representation of uncertainty, J. Risk Uncertain. 5, 297-323 (1992)

3.103 S. Kaci: Working with Preferences: Less Is More (Springer, Berlin 2011)

3.104 L.J. Savage: The Foundations of Statistics (Dover, New York 1972)

3.105 D. Dubois, L. Godo, H. Prade, A. Zapico: On the possibilistic decision model: From decision under uncertainty to case-based decision, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 7, 631-670 (1999)

3.106 D. Dubois, H. Prade, P. Smets: New semantics for quantitative possibility theory, Lect. Notes Artif. Intell. 2143, 410-421 (2001)

3.107 R.R. Yager: Possibilistic decision making, IEEE Trans. Syst. Man Cybern. 9, 388-392 (1979)

3.108 T. Whalen: Decision making under uncertainty with various assumptions about available information, IEEE Trans. Syst. Man Cybern. 14, 888-900 (1984)

3.109 D. Dubois, D. Le Berre, H. Prade, R. Sabbadin: Using possibilistic logic for modeling qualitative decision: ATMS-based algorithms, Fundam. Inform. 37, 1-30 (1999)

3.110 H. Fargier, R. Sabbadin: Qualitative decision under uncertainty: Back to expected utility, Artif. Intell. 164, 245-280 (2005)

3.111 M. Grabisch, T. Murofushi, M. Sugeno (Eds.): Fuzzy Measures and Integrals - Theory and Applications (Physica, Heidelberg 2000) pp. 314-322

3.112 D. Dubois, H. Prade, R. Sabbadin: Qualitative decision theory with Sugeno integrals. In: Fuzzy Measures and Integrals - Theory and Applications (Physica, Heidelberg 2000) pp. 314-322

3.113 D. Dubois, H. Prade, R. Sabbadin: Decision-theoretic foundations of possibility theory, Eur. J. Oper. Res. 128, 459-478 (2001)

3.114 D. Dubois, H. Fargier, P. Perny, H. Prade: Qualitative decision theory with preference relations and comparative uncertainty: An axiomatic approach, Artif. Intell. 148, 219-260 (2003)

3.115 D. Dubois, H. Fargier, H. Prade, R. Sabbadin: A survey of qualitative decision rules under uncertainty. In: Decision-making Process-Concepts and Methods, ed. by D. Bouyssou, D. Dubois, M. Pirlot, H. Prade (Wiley, London 2009) pp. 435-473

3.116 D. Dubois, H. Fargier: Making discrete Sugeno integrals more discriminant, Int. J. Approx. Reason. 50(6), 880-898 (2009)

3.117 R. Sabbadin, H. Fargier, J. Lang: Towards qualitative approaches to multi-stage decision making, Int. J. Approx. Reason. 19(3-4), 441-471 (1998)

3.118 D. Dubois, H. Prade: When upper probabilities are possibility measures, Fuzzy Set. Syst. 49, 65-74 (1992)


3.119 D. Dubois, S. Moral, H. Prade: A semantics for possibility theory based on likelihoods, J. Math. Anal. Appl. 205, 359-380 (1997)

3.120 G. Shafer: Belief functions and possibility measures. In: Analysis of Fuzzy Information, Vol. I: Mathematics and Logic, ed. by J.C. Bezdek (CRC, Boca Raton 1987) pp. 51-84

3.121 S. Benferhat, D. Dubois, H. Prade: Possibilistic and standard probabilistic semantics of conditional knowledge bases, J. Log. Comput. 9, 873-895 (1999)

3.122 V. Maslov: Méthodes Opératorielles (Mir Publications, Moscow 1987)

3.123 A. Puhalskii: Large Deviations and Idempotent Probability (Chapman Hall, London 2001)

3.124 H.T. Nguyen, B. Bouchon-Meunier: Random sets and large deviations principle as a foundation for possibility measures, Soft Comput. 8, 61-70 (2003)

3.125 G. De Cooman, D. Aeyels: Supremum-preserving upper probabilities, Inf. Sci. 118, 173-212 (1999)

3.126 P. Walley, G. De Cooman: A behavioural model for linguistic uncertainty, Inf. Sci. 134, 1-37 (1999)

3.127 J. Gebhardt, R. Kruse: The context model, Int. J. Approx. Reason. 9, 283-314 (1993)

3.128 C. Joslyn: Measurement of possibilistic histograms from interval data, Int. J. Gen. Syst. 26, 9-33 (1997)

3.129 Y. Ben-Haim: Info-Gap Decision Theory: Decisions Under Severe Uncertainty, 2nd edn. (Academic Press, London 2006)

3.130 A. Neumaier: Clouds, fuzzy sets and probability intervals, Reliab. Comput. 10, 249-272 (2004)

3.131 S. Destercke, D. Dubois, E. Chojnacki: Unifying practical uncertainty representations Part I: Generalized p-boxes, Int. J. Approx. Reason. 49, 649-663 (2008); Part II: Clouds, Int. J. Approx. Reason. 49, 664-677 (2008)

3.132 D. Dubois, H. Prade: Bayesian conditioning in possibility theory, Fuzzy Set. Syst. 92, 223-240 (1997)

3.133 G. De Cooman: Integration and conditioning in numerical possibility theory, Ann. Math. Artif. Intell. 32, 87-123 (2001)

3.134 P. Walley: Statistical Reasoning with Imprecise Probabilities (Chapman Hall, London 1991)

3.135 G.J. Klir: A principle of uncertainty and information invariance, Int. J. Gen. Syst. 17, 249-275 (1990)

3.136 J.F. Geer, G.J. Klir: A mathematical analysis of information-preserving transformations between probabilistic and possibilistic formulations of uncertainty, Int. J. Gen. Syst. 20, 143-176 (1992)

3.137 D. Dubois, H. Prade, S. Sandri: On possibility/probability transformations. In: Fuzzy Logic: State of the Art, ed. by R. Lowen, M. Roubens (Kluwer, Dordrecht 1993) pp. 103-112

3.138 G.J. Klir, B.B. Parviz: Probability-possibility transformations: A comparison, Int. J. Gen. Syst. 21, 291-310 (1992)

3.139 D. Dubois, H. Prade: On several representations of an uncertain body of evidence. In: Fuzzy Information and Decision Processes, ed. by M. Gupta, E. Sanchez (North-Holland, Amsterdam 1982) pp. 167-181

3.140 P. Smets: Constructing the pignistic probability function in a context of uncertainty. In: Uncertainty in Artificial Intelligence, Vol. 5, ed. by M. Henrion (North-Holland, Amsterdam 1990) pp. 29-39

3.141 A.W. Marshall, I. Olkin: Inequalities: Theory of Majorization and Its Applications (Academic, New York 1979)

3.142 Z.W. Birnbaum: On random variables with comparable peakedness, Ann. Math. Stat. 19, 76-81 (1948)

3.143 D. Dubois, E. Huellermeier: Comparing probability measures using possibility theory: A notion of relative peakedness, Inter. J. Approx. Reason. 45, 364-385 (2007)

3.144 D. Dubois, L. Foulloy, G. Mauris, H. Prade: Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities, Reliab. Comput. 10, 273-297 (2004)

3.145 G. Mauris, V. Lasserre, L. Foulloy: Fuzzy modeling of measurement data acquired from physical sensors, IEEE Trans. Meas. Instrum. 49, 1201-1205 (2000)

3.146 C. Baudrit, D. Dubois: Practical representations of incomplete probabilistic knowledge, Comput. Stat. Data Anal. 51, 86-108 (2006)

3.147 G. Mauris: Possibility distributions: A unified representation of usual direct-probability-based parameter estimation methods, Int. J. Approx. Reason. 52, 1232-1242 (2011)

3.148 D. Dubois, H. Prade, P. Smets: A definition of subjective possibility, Int. J. Approx. Reason. 48, 352-364 (2008)

3.149 D. Dubois, H. Prade: Unfair coins and necessity measures: A possibilistic interpretation of histograms, Fuzzy Set. Syst. 10(1), 15-20 (1983)

3.150 D. Dubois, H. Prade: Evidence measures based on fuzzy information, Automatica 21, 547-562 (1985)

3.151 W. Van Leekwijck, E.E. Kerre: Defuzzification: Criteria and classification, Fuzzy Set. Syst. 108, 303-314 (2001)

3.152 D. Dubois, H. Prade: The mean value of a fuzzy number, Fuzzy Set. Syst. 24, 279-300 (1987)

3.153 R.R. Yager: A procedure for ordering fuzzy subsets of the unit interval, Inf. Sci. 24, 143-161 (1981)

3.154 S. Chanas, M. Nowakowski: Single value simulation of fuzzy variable, Fuzzy Set. Syst. 25, 43-57 (1988)

3.155 P. Bosc, H. Prade: An introduction to the fuzzy set and possibility theory-based treatment of soft queries and uncertain or imprecise databases. In: Uncertainty Management in Information Systems, ed. by P. Smets, A. Motro (Kluwer, Dordrecht 1997) pp. 285-324

3.156 M. Cayrol, H. Farreny, H. Prade: Fuzzy pattern-matching, Kybernetes 11(2), 103-116 (1982)

3.157 D. Dubois, H. Prade, C. Testemale: Weighted fuzzy pattern matching, Fuzzy Set. Syst. 28, 313-331 (1988)

3.158 Y. Loiseau, H. Prade, M. Boughanem: Qualitative pattern matching with linguistic terms, AI Commun. 17(1), 25-34 (2004)

3.159 P. Bosc, O. Pivert: Modeling and querying uncertain relational databases: A survey of approaches based on the possible worlds semantics, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 18(5), 565-603 (2010)

3.160 P. Bosc, O. Pivert: About projection-selection-join queries addressed to possibilistic relational databases, IEEE Trans. Fuzzy Syst. 13, 124-139 (2005)

3.161 P. Bosc, O. Pivert, H. Prade: A model based on possibilistic certainty levels for incomplete databases, Proc. 3rd Int. Conf. Scalable Uncertainty Management (SUM 2009) (Springer, Washington 2009) pp. 80-94

3.162 P. Bosc, O. Pivert, H. Prade: An uncertain database model and a query algebra based on possibilistic certainty, Proc. 2nd Int. Conf. Soft Comput. and Pattern Recognition (SoCPaR10), ed. by T.P. Martin (IEEE, Paris 2010) pp. 63-68

3.163 A. HadjAli, S. Kaci, H. Prade: Database preference queries - A possibilistic logic approach with symbolic priorities, Ann. Math. Artif. Intell. 63, 357-383 (2011)

3.164 P. Bosc, O. Pivert, H. Prade: A possibilistic logic view of preference queries to an uncertain database, Proc. 19th IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE10), Barcelona (2010)

3.165 D. Dubois, H. Prade: Handling bipolar queries in fuzzy information processing. In: Fuzzy Information Processing in Databases, Vol. 1, ed. by J. Galindo (Information Science Reference, Hershey 2008) pp. 97-114

3.166 S. Zadrozny, J. Kacprzyk: Bipolar queries - An aggregation operator focused perspective, Fuzzy Set. Syst. 196, 69-81 (2012)

3.167 P. Bosc, O. Pivert: On a fuzzy bipolar relational algebra, Inf. Sci. 219, 1-16 (2013)

3.168 D. Dubois, J. Mengin, H. Prade: Possibilistic uncertainty and fuzzy features in description logic. A preliminary discussion. In: Fuzzy Logic and the Semantic Web, ed. by E. Sanchez (Elsevier, Amsterdam 2005)

3.169 G. Qi, Q. Ji, J.Z. Pan, J. Du: Extending description logics with uncertainty reasoning in possibilistic logic, Int. J. Intell. Syst. 26(4), 353-381 (2011)

3.170 D. Dubois, H. Prade: Possibility theory and data fusion in poorly informed environments, Control Eng. Pract. 2(5), 811-823 (1994)

3.171 D. Dubois, H. Prade, R.R. Yager: Merging fuzzy information. In: Fuzzy Sets in Approximate Reasoning and Information Systems, The Handbooks of Fuzzy Sets Series, ed. by J. Bezdek, D. Dubois, H. Prade (Kluwer, Boston 1999) pp. 335-401

3.172 M. Oussalah, H. Maaref, C. Barret: From adaptive to progressive combination of possibility distributions, Fuzzy Set. Syst. 139(3), 559-582 (2003)

3.173 M. Oussalah: Study of some algebraic properties of adaptive combination rules, Fuzzy Set. Syst. 114(3), 391-409 (2000)




3.174 S. Destercke, D. Dubois, E. Chojnacki: Possibilistic information fusion using maximum coherent subsets, IEEE Trans. Fuzzy Syst. 17, 79-92 (2009)

3.175 S. Benferhat, D. Dubois, H. Prade: From semantic to syntactic approaches to information combination in possibilistic logic. In: Aggregation and Fusion of Imperfect Information, ed. by B. Bouchon-Meunier (Physica-Verlag, Heidelberg 1998) pp. 141-161

3.176 S. Benferhat: Merging possibilistic networks, Proc. 17th Eur. Conf. Artif. Intell. (Riva del Garda, Italy 2006)

3.177 S. Benferhat, D. Dubois, S. Kaci, H. Prade: Possibilistic merging and distance-based fusion of propositional information, Ann. Math. Artif. Intell. 34(1-3), 217-252 (2002)

3.178 G. Qi, W. Liu, D.H. Glass, D.A. Bell: A split-combination approach to merging knowledge bases in possibilistic logic, Ann. Math. Artif. Intell. 48(1/2), 45-84 (2006)

3.179 G. Qi, W. Liu, D.A. Bell: Measuring conflict and agreement between two prioritized knowledge bases in possibilistic logic, Fuzzy Set. Syst. 161(14), 1906-1925 (2010)

3.180 S. Konieczny, J. Lang, P. Marquis: Distance based merging: A general framework and some complexity results, Proc. Int. Conf. Principles of Knowledge Representation and Reasoning (2002) pp. 97-108

3.181 S. Benferhat, D. Dubois, H. Prade, M.-A. Williams: A practical approach to revising prioritized knowledge bases, Stud. Log. 70, 105-130 (2002)

3.182 S. Barro, R. Marin, J. Mira, A.R. Paton: A model and a language for the fuzzy representation and handling of time, Fuzzy Set. Syst. 61, 153-175 (1994)

3.183 D. Dubois, H. Prade: Processing fuzzy temporal knowledge, IEEE Trans. Syst. Man Cybern. 19(4), 729-744 (1989)

3.184 D. Dubois, A. Hadj Ali, H. Prade: Fuzziness and uncertainty in temporal reasoning, J. Univer. Comput. Sci. 9(9), 1168-1194 (2003)

3.185 M.A. Cardenas Viedma, R. Marin, I. Navarrete: Fuzzy temporal constraint logic: A valid resolution principle, Fuzzy Set. Syst. 117(2), 231-250 (2001)

3.186 D. Dubois, A. Hadj Ali, H. Prade: A possibility theory-based approach to the handling of uncertain relations between temporal points, Int. J. Intell. Syst. 22, 157-179 (2007)

3.187 D. Dubois, J. Lang, H. Prade: Timed possibilistic logic, Fundam. Inform. 15, 211-234 (1991)

3.188 D. Dubois, H. Fargier, H. Prade: Fuzzy constraints in job-shop scheduling, J. Intell. Manuf. 6, 215-234 (1995)

3.189 R. Slowinski, M. Hapke (Eds.): Scheduling under Fuzziness (Physica, Heidelberg 2000)

3.190 D. Dubois, H. Fargier, P. Fortemps: Fuzzy scheduling: Modelling flexible constraints vs. coping with incomplete knowledge, Eur. J. Oper. Res. 147, 231-252 (2003)


3.191 S. Chanas, P. Zielinski: Critical path analysis in the network with fuzzy activity times, Fuzzy Set. Syst. 122, 195-204 (2001)

3.192 S. Chanas, D. Dubois, P. Zielinski: Necessary criticality in the network with imprecise activity times, IEEE Trans. Man Mach. Cybern. 32, 393-407 (2002)

3.193 J. Fortin, P. Zielinski, D. Dubois, H. Fargier: Criticality analysis of activity networks under interval uncertainty, J. Sched. 13, 609-627 (2010)

3.194 D. Dubois, J. Fortin, P. Zielinski: Interval PERT and its fuzzy extension. In: Production Engineering and Management Under Fuzziness, ed. by C. Kahraman, M. Yavuz (Springer, Berlin 2010) pp. 171-199

3.195 S. Chanas, A. Kasperski: Possible and necessary optimality of solutions in the single machine scheduling problem with fuzzy parameters, Fuzzy Set. Syst. 142, 359-371 (2004)

3.196 A. Kasperski: A possibilistic approach to sequencing problems with fuzzy parameters, Fuzzy Set. Syst. 150, 77-86 (2005)

3.197 J. Fortin, A. Kasperski, P. Zielinski: Some methods for evaluating the optimality of elements in matroids with ill-known weights, Fuzzy Set. Syst. 16, 1341-1354 (2009)

3.198 A. Kasperski, P. Zielinski: Possibilistic bottleneck combinatorial optimization problems with ill-known weights, Int. J. Approx. Reason. 52, 1298-1311 (2011)

3.199 J.C. Helton, W.L. Oberkampf (Eds.): Alternative representations of uncertainty, Reliab. Eng. Syst. Saf. 85(1-3), (2004)

3.200 D. Guyonnet, B. Bourgine, D. Dubois, H. Fargier, B. Côme, J.-P. Chilès: Hybrid approach for addressing uncertainty in risk assessments, J. Environ. Eng. 129, 68-78 (2003)

3.201 D. Dubois, H. Prade: Random sets and fuzzy interval analysis, Fuzzy Set. Syst. 42, 87-101 (1991)

3.202 C. Baudrit, D. Guyonnet, D. Dubois: Joint propagation and exploitation of probabilistic and possibilistic information in risk assessment, IEEE Trans. Fuzzy Syst. 14, 593-608 (2006)

3.203 C. Baudrit, I. Couso, D. Dubois: Joint propagation of probability and possibility in risk analysis: Towards a formal framework, Inter. J. Approx. Reason. 45, 82-105 (2007)

3.204 C. Baudrit, D. Dubois, N. Perrot: Representing parametric probabilistic models tainted with imprecision, Fuzzy Set. Syst. 159, 1913-1928 (2008)

3.205 M. Gil (Ed.): Fuzzy random variables, Inf. Sci. 133, (2001) Special Issue

3.206 R. Sabbadin: Towards possibilistic reinforcement learning algorithms, FUZZ-IEEE 2001, 404-407 (2001)

3.207 S. Benferhat, D. Dubois, S. Lagrue, H. Prade: A big-stepped probability approach for discovering default rules, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 11(Supplement), 1-14 (2003)




3.208 H. Prade, M. Serrurier: Bipolar version space learning, Int. J. Intell. Syst. 23, 1135-1152 (2008)

3.209 H. Prade, M. Serrurier: Introducing possibilistic logic in ILP for dealing with exceptions, Artif. Intell. 171, 939-950 (2007)

3.210 O. Wolkenhauer: Possibility Theory with Applications to Data Analysis (Research Studies Press, Chichester 1998)

3.211 H. Tanaka, P.J. Guo: Possibilistic Data Analysis for Operations Research (Physica, Heidelberg 1999)

3.212 D. Cayrac, D. Dubois, H. Prade: Handling uncertainty with possibility theory and fuzzy sets in a satellite fault diagnosis application, IEEE Trans. Fuzzy Syst. 4, 251-269 (1996)

3.213 S. Boverie: Online diagnosis of engine dyno test benches: A possibilistic approach, Proc. 15th Eur. Conf. Artif. Intell. (IOS, Lyon, Amsterdam 2002) pp. 658-662

3.214 L. Amgoud, H. Prade: Reaching agreement through argumentation: A possibilistic approach, Proc. 9th Int. Conf. Principles of Knowledge Representation and Reasoning (KR'04), Whistler, BC, Canada (AAAI, Palo Alto 2004) pp. 175-182

3.215 L. Amgoud, H. Prade: Using arguments for making and explaining decisions, Artif. Intell. 173(3-4), 413-436 (2009)

3.216 D. Dubois, F. de Dupin Saint-Cyr, H. Prade: A possibility-theoretic view of formal concept analysis, Fundam. Inform. 75, 195-213 (2007)

3.217 Y. Djouadi, H. Prade: Possibility-theoretic extension of derivation operators in formal concept analysis over fuzzy lattices, Fuzzy Optim. Decis. Mak. 10, 287-309 (2011)

3.218 D. Dubois, H. Prade: Generalized possibilistic logic, Lect. Notes Comput. Sci. 6929, 428-432 (2011)

3.219 D. Dubois, H. Prade, S. Schockaert: Rules and meta-rules in the framework of possibility theory and possibilistic logic, Sci. Iran. 18(3), 566-573 (2011)

3.220 D. Dubois, H. Prade: Toward multiple-agent extensions of possibilistic logic, Proc. IEEE Int. Conf. on Fuzzy Syst. (FUZZ-IEEE'07), London (2007) pp. 187-192

3.221 D. Dubois, H. Prade, A. Rico: A possibilistic logic view of Sugeno integrals. In: Proc. Eurofuse Workshop on Fuzzy Methods for Knowledge-Based Systems (EUROFUSE 2011), Advances in Intelligent and Soft Computing, Vol. 107, ed. by P. Melo-Pinto, P. Couto, C. Serôdio, J. Fodor, B. De Baets (Springer, Berlin 2011) pp. 19-30

3.222 G. Banon: Constructive decomposition of fuzzy measures in terms of possibility and necessity measures, Proc. VIth IFSA World Congress, Vol. I (São Paulo, Brazil 1995) pp. 217-220

3.223 D. Dubois: Fuzzy measures on finite scales as families of possibility measures, Proc. 7th Conf. Eur. Soc. Fuzzy Logic Technol. (EUSFLAT'11) (Atlantis, Annecy 2011) pp. 822-829

3.224 H. Prade, A. Rico: Possibilistic evidence, Lect. Notes Artif. Intell. 6717, 713-724 (2011)

3.225 D. Dubois, H. Prade, A. Rico: Qualitative capacities as imprecise possibilities, Lect. Notes Comput. Sci. 7958, 169-180 (2011)

3.226 P. Diamond: Fuzzy least squares, Inf. Sci. 46, 141-157 (1988)

3.227 K. Loquin, D. Dubois: Kriging and epistemic uncertainty: A critical discussion. In: Methods for Handling Imperfect Spatial Information, Vol. 256, ed. by R. Jeansoulin, O. Papini, H. Prade, S. Schockaert (Springer, Berlin 2010) pp. 269-305

3.228 D. Dubois, H. Prade: Gradualness, uncertainty and bipolarity: Making sense of fuzzy sets, Fuzzy Set. Syst. 192, 3-24 (2012)

3.229 K. Loquin, D. Dubois: A fuzzy interval analysis approach to kriging with ill-known variogram and data, Soft Comput. 16(5), 769-784 (2012)

3.230 H. Prade, M. Serrurier: Maximum-likelihood principle for possibility distributions viewed as families of probabilities, Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE'11), Taipei (2011) pp. 2987-2993


4. Aggregation Functions on [0,1]


Radko Mesiar, Anna Kolesárová, Magda Komorníková


After a brief presentation of the history of aggregation, we recall the
concept of aggregation functions on [0,1] and on a general interval
$I \subseteq [-\infty, \infty]$. We give a list of basic examples as well
as some peculiar examples of aggregation functions. After discussing the
classification of aggregation functions on [0,1] and presenting the
prototypical examples for each introduced class, we also recall several
construction methods for aggregation functions, including optimization
methods, extension methods, constructions based on given aggregation
functions, and introduction of weights. Finally, a remark on aggregation
of more general inputs, such as intervals, distribution functions, or
fuzzy sets, is added.


4.1 Historical and Introductory Remarks
4.2 Classification of Aggregation Functions
4.3 Properties and Construction Methods
4.4 Concluding Remarks
References


Aggregation (fusion, joining) of several input values into one, in some
sense the most informative value, is a basic processing method in any
field dealing with quantitative information. We only recall mathematics,
physics, economics, sociology or finance, among others. Basic
arithmetical operations of addition and multiplication on $[0, \infty]$
are typical examples of aggregation functions. As another example let us
recall integration and its application to geometry allowing us to compute
areas, surfaces, volumes, etc.


4.1 Historical and Introductory Remarks 


It is in the field of integration that one finds the first written
historical traces of aggregation. Recall the Moscow mathematical papyrus
and its problem no. 14, dating back to 1850 BC, concerning the
computation of the volume of a pyramidal frustum [4.1], or the method of
exhaustion, proposed by Eudoxus of Cnidos around 370 BC, which allows
several types of areas to be computed [4.2]. The roots of a recent
penalty-based method of constructing aggregation functions [4.3] can be
found in the books of Apollonius of Perga (who lived about 262-190 BC).
Motivated by center of gravity problems, he proposed an approach leading
to the centroid, i.e., to the arithmetic mean, which minimizes the sum of
squares of the Euclidean distances of the given n points from an unknown
but fixed one. The generalization of the method of Apollonius of Perga
based on a general norm is known as the Fréchet mean, or also as the
Karcher mean, and it was discussed in depth in [4.4].

Another type of mean, the Heronian mean of two nonnegative numbers x and
y, is given by the formula

$$\mathrm{He}(x, y) = \frac{1}{3}\left(x + \sqrt{xy} + y\right). \tag{4.1}$$

It is named after Hero of Alexandria (10-70 AD) who used this
aggregation function for finding the volume of a conical or pyramidal
frustum. He showed that this volume is equal to the product of the height
of the frustum and the Heronian mean of the areas of the parallel bases.
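
The Heronian mean and its use for the frustum volume can be sketched in a few lines of Python. This is only an illustration: the function names and the sample frustum (a square frustum with bottom side 4, top side 2, and height 6) are assumptions chosen for the example, not taken from the text.

```python
import math


def heronian_mean(x: float, y: float) -> float:
    """Heronian mean (4.1) of two nonnegative numbers."""
    if x < 0 or y < 0:
        raise ValueError("inputs must be nonnegative")
    return (x + math.sqrt(x * y) + y) / 3.0


def frustum_volume(height: float, area_bottom: float, area_top: float) -> float:
    """Volume of a conical or pyramidal frustum: height times He of the base areas."""
    return height * heronian_mean(area_bottom, area_top)


# Hypothetical square frustum: bottom side 4, top side 2, height 6.
volume = frustum_volume(6.0, 4.0 ** 2, 2.0 ** 2)
```

For these sample values the base areas are 16 and 4, their Heronian mean is 28/3, and the volume evaluates to 56 up to rounding.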

Another interesting historical example can be found in multivalued
logic. Already Aristotle (384-322 BC) was a classical logician who did
not fully accept the law of excluded middle, but he did not create
a system of multivalued logic to explain this isolated remark (in the
work De Interpretatione, chapter IX). Systems of multivalued logics
considering 3, n (finitely many), and later also infinitely many truth
degrees were introduced by Łukasiewicz [4.5], Post [4.6], and
Gödel [4.7], respectively, and in each of these systems the aggregation
of truth values was considered (conjunction, disjunction).

Though several particular aggregation functions (or classes of
aggregation functions) were discussed in many earlier works (we only
recall means discussed around 1930 by Kolmogorov [4.8] and Nagumo [4.9],
or later by Aczél [4.10], and triangular norms and copulas studied by
Schweizer and Sklar in the 1960s and summarized in [4.11]), an
independent theory of aggregation can be dated only about 20 years back,
and the roots of its axiomatization can be found in [4.12–14]. Probably
the first monograph devoted purely to aggregation is the one by
Calvo et al. [4.15]. As basic literature for any scientist interested in
aggregation we recommend the monographs [4.16–18].

In this chapter, we summarize not only some earlier but also some recent
results concerning aggregation, including classification, construction
methods, and several examples. We will deal with inputs and outputs from
the unit interval [0,1]. Note that though, in general, we can consider
an arbitrary interval $I \subseteq [-\infty, \infty]$, there is no loss
of generality (up to isomorphism) when restricting our considerations to
$I = [0,1]$. As an example, consider the aggregation of nonnegative
inputs, i.e., fix $I = [0, \infty[$. Then any aggregation function A on
$[0, \infty[$ can be seen as an isomorphic transform of some aggregation
function B on [0,1], restricted to $[0,1[$ and satisfying two
constraints:


i) $B(\mathbf{x}) = 1$ if and only if $\mathbf{x} = (1,\dots,1)$,
ii) $\sup\{B(\mathbf{x}) \mid \mathbf{x} \in [0,1[^n\} = 1$, $n \in \mathbb{N}$.


Note that any increasing bijection $g : [0,1[ \to [0,\infty[$ can be
applied as the considered isomorphism. For more details about
aggregation on a general interval $I \subseteq [-\infty, \infty]$ refer
to [4.17].

We can consider either aggregation functions with a fixed number
$n \in \mathbb{N}$, $n \ge 2$, of inputs or extended aggregation
functions defined for any number $n \in \mathbb{N}$ of inputs. The number
n is called the arity of the aggregation function.


Definition 4.1

For a fixed $n \in \mathbb{N}$, $n \ge 2$, a function
$A : [0,1]^n \to [0,1]$ is called an (n-ary) aggregation function
whenever it is increasing in each variable and satisfies the boundary
conditions

$$A(0,\dots,0) = 0 \quad\text{and}\quad A(1,\dots,1) = 1.$$

A mapping $A : \bigcup_{n \in \mathbb{N}} [0,1]^n \to [0,1]$ is called an
extended aggregation function whenever $A(x) = x$ for each
$x \in [0,1]$, and for each $n \in \mathbb{N}$, $n \ge 2$,
$A\,|\,[0,1]^n$ is an n-ary aggregation function.


The framework of extended aggregation functions 
is rather general, not relating different arities, and thus 
some additional constraints are often considered, such 
as associativity, decomposability, neutral element, etc. 

The Heronian mean He given in (4.1) is an example 
of a binary aggregation function. Prototypical examples 
of extended aggregation functions on [0, 1] are: 


• The smallest extended aggregation function $A_s$ given by

  $$A_s(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} = (1,\dots,1), \\ 0 & \text{else.} \end{cases}$$

• The greatest extended aggregation function $A_g$ given by

  $$A_g(\mathbf{x}) = \begin{cases} 0 & \text{if } \mathbf{x} = (0,\dots,0), \\ 1 & \text{else.} \end{cases}$$

• The arithmetic mean M given by

  $$M(x_1,\dots,x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

• The geometric mean G given by

  $$G(x_1,\dots,x_n) = \left(\prod_{i=1}^{n} x_i\right)^{1/n}.$$

• The product $\Pi$ given by

  $$\Pi(x_1,\dots,x_n) = \prod_{i=1}^{n} x_i.$$

• The minimum Min given by

  $$\mathrm{Min}(x_1,\dots,x_n) = \min\{x_1,\dots,x_n\}.$$

• The maximum Max given by

  $$\mathrm{Max}(x_1,\dots,x_n) = \max\{x_1,\dots,x_n\}.$$

• The truncated sum $S_L$ (also known as the Łukasiewicz t-conorm) given by

  $$S_L(x_1,\dots,x_n) = \min\left\{1, \sum_{i=1}^{n} x_i\right\}.$$

• The 3-$\Pi$-operator E introduced in [4.19] and given by

  $$E(x_1,\dots,x_n) = \frac{\prod_{i=1}^{n} x_i}{\prod_{i=1}^{n} x_i + \prod_{i=1}^{n} (1 - x_i)},$$

  with some convention covering the case $\frac{0}{0}$.

• The Pascal weighted arithmetic mean $W_P$ given by

  $$W_P(x_1,\dots,x_n) = \frac{1}{2^{n-1}}\sum_{i=1}^{n} \binom{n-1}{i-1} x_i.$$
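
The prototypical examples listed above can be written down directly as Python functions of a tuple of inputs. This is a minimal sketch with hypothetical function names. Two points are assumptions rather than statements of the text: the value used for the case 0/0 of the 3-Π-operator (set to 0 here, since the text only says that some convention is needed), and the reading of the Pascal weights as the binomial coefficients C(n-1, i-1) divided by 2^(n-1).

```python
import math
from typing import Sequence


def smallest(x: Sequence[float]) -> float:
    return 1.0 if all(v == 1.0 for v in x) else 0.0


def greatest(x: Sequence[float]) -> float:
    return 0.0 if all(v == 0.0 for v in x) else 1.0


def arithmetic_mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def geometric_mean(x: Sequence[float]) -> float:
    return math.prod(x) ** (1.0 / len(x))


def product(x: Sequence[float]) -> float:
    return math.prod(x)


def truncated_sum(x: Sequence[float]) -> float:
    return min(1.0, sum(x))


def three_pi(x: Sequence[float], zero_over_zero: float = 0.0) -> float:
    # zero_over_zero is an assumed convention for inputs mixing 0 and 1.
    num = math.prod(x)
    den = num + math.prod(1.0 - v for v in x)
    return zero_over_zero if den == 0.0 else num / den


def pascal_mean(x: Sequence[float]) -> float:
    # Assumed weights: row n-1 of Pascal's triangle, normalized by 2^(n-1).
    n = len(x)
    return sum(math.comb(n - 1, i) * v for i, v in enumerate(x)) / 2 ** (n - 1)


ALL_EXAMPLES = [smallest, greatest, arithmetic_mean, geometric_mean, product,
                min, max, truncated_sum, three_pi, pascal_mean]
sample = (0.2, 0.5, 0.8)
```

Each function in `ALL_EXAMPLES` returns 0 on an all-zero input and 1 on an all-one input, which are the boundary conditions of Definition 4.1.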


As distinguished examples of n-ary aggregation functions for a fixed
arity $n \ge 2$, recall the projections $P_i$ and order statistics
$OS_i$, $i = 1,\dots,n$, given by

$$P_i(x_1,\dots,x_n) = x_i$$

and

$$OS_i(x_1,\dots,x_n) = x_{\sigma(i)},$$

where $\sigma$ is an arbitrary permutation of $(1,\dots,n)$ such that
$x_{\sigma(1)} \le x_{\sigma(2)} \le \dots \le x_{\sigma(n)}$. Observe
that the first projection $P_F = P_1$ and the last projection
$P_L = P_n$ can be seen as instances of extended aggregation functions
$P_F$ and $P_L$, respectively. On the other hand, for any fixed
$n \ge 2$, $OS_1$ is just $\mathrm{Min}\,|\,[0,1]^n$ and
$OS_n = \mathrm{Max}\,|\,[0,1]^n$.

As a peculiar example of an extended aggregation function we can
introduce the mapping
$V : \bigcup_{n \in \mathbb{N}} [0,1]^n \to [0,1]$ given by

$$V(x_1,\dots,x_n) = \min\left\{1, \left(\sum_{i=1}^{n} x_i^{\,n}\right)^{1/n}\right\}. \tag{4.2}$$


4.2 Classification of Aggregation Functions 


Let us denote by $\mathcal{A}$ the class of all extended aggregation
functions, and by $\mathcal{A}_n$ (for a fixed $n \ge 2$) the class of
all n-ary aggregation functions. Several classifications of n-ary
aggregation functions can be straightforwardly extended to the class
$\mathcal{A}$. The basic classification proposed by Dubois and
Prade [4.20] distinguishes (both for n-ary and extended aggregation
functions):


• Conjunctive aggregation functions,
  $\mathcal{C} = \{A \in \mathcal{A} \mid A \le \mathrm{Min}\}$,

• Disjunctive aggregation functions,
  $\mathcal{D} = \{A \in \mathcal{A} \mid A \ge \mathrm{Max}\}$,

• Averaging aggregation functions,
  $\mathcal{A}v = \{A \in \mathcal{A} \mid \mathrm{Min} \le A \le \mathrm{Max}\}$,

• Mixed aggregation functions,
  $\mathcal{M} = \mathcal{A} \setminus (\mathcal{C} \cup \mathcal{D} \cup \mathcal{A}v)$.


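Considering purely averaging aggregation functions
$\mathcal{A}v^p = \mathcal{A}v \setminus \{\mathrm{Min}, \mathrm{Max}\}$,
we can see that the set
$\{\mathcal{C}, \mathcal{D}, \mathcal{A}v^p, \mathcal{M}\}$ forms
a partition of $\mathcal{A}$. Note that the classes $\mathcal{A}$,
$\mathcal{C}$, $\mathcal{D}$, $\mathcal{A}v$, $\mathcal{A}v^p$ are
convex, which is not the case of the class $\mathcal{M}$. For the
previously introduced examples it holds:

$$M,\ G,\ W_P,\ P_F,\ P_L \in \mathcal{A}v^p, \qquad \Pi \in \mathcal{C}, \qquad S_L,\ V \in \mathcal{D}, \qquad E \in \mathcal{M}.$$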
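
The basic classification can be probed numerically by comparing an aggregation function with Min and Max on a finite grid. The sketch below is only a grid-based check under stated assumptions (the function names, the grid resolution, and the 0/0 convention of the 3-Π-operator are choices made here); a grid can refute membership in a class but cannot prove it.

```python
import itertools
import math


def classify(agg, n: int = 2, steps: int = 11, tol: float = 1e-12) -> str:
    """Grid-based guess of the class (C, D, Av or M) of an n-ary function."""
    grid = [k / (steps - 1) for k in range(steps)]
    below_min = above_max = between = True
    for x in itertools.product(grid, repeat=n):
        value = agg(x)
        if value > min(x) + tol:
            below_min = False
        if value < max(x) - tol:
            above_max = False
        if value < min(x) - tol or value > max(x) + tol:
            between = False
    if below_min:
        return "conjunctive"
    if above_max:
        return "disjunctive"
    if between:
        return "averaging"
    return "mixed"


def arithmetic_mean(x):
    return sum(x) / len(x)


def truncated_sum(x):
    return min(1.0, sum(x))


def three_pi(x):
    num = math.prod(x)
    den = num + math.prod(1.0 - v for v in x)
    return 0.0 if den == 0.0 else num / den  # assumed convention for 0/0


labels = {
    "product": classify(math.prod),
    "truncated_sum": classify(truncated_sum),
    "arithmetic_mean": classify(arithmetic_mean),
    "three_pi": classify(three_pi),
}
```

Note that Min and Max lie on the boundary between classes; with the order of tests used here, Min is reported as conjunctive and Max as disjunctive.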


Observe that the n-ary aggregation functions $P_i$ and $OS_i$,
$i = 1,\dots,n$, are averaging, and so are their convex sums, i.e.,
weighted arithmetic means

$$W = \sum_{i=1}^{n} w_i P_i,$$

and ordered weighted averages (OWA operators) [4.21],

$$\mathrm{OWA} = \sum_{i=1}^{n} w_i\, OS_i,$$

with $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$. The binary Heronian mean
He given in (4.1) is a convex combination of the arithmetic mean M and
the geometric mean G, $\mathrm{He} = \frac{2}{3}M + \frac{1}{3}G$, and
thus it is also averaging.

Consider two binary aggregation functions
$A_1, A_2 : [0,1]^2 \to [0,1]$ given by

$$A_1(x, y) = \mathrm{Med}(0, 1, x + y - 0.5) \tag{4.3}$$

and

$$A_2(x, y) = \mathrm{Med}(x + y, 0.5, x + y - 1), \tag{4.4}$$

where Med is the standard median operator. Then
$A_1, A_2 \in \mathcal{M}$ but
$\frac{1}{2}A_1 + \frac{1}{2}A_2 = M \in \mathcal{A}v$. The 3D plots of
aggregation functions $A_1$, $A_2$ and M are depicted in Figs. 4.1-4.3.

Fig. 4.1 3D plot of the aggregation function $A_1$ defined by (4.3)

Fig. 4.2 3D plot of the aggregation function $A_2$ defined by (4.4)
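
The following Python sketch illustrates weighted arithmetic means, OWA operators, and the two mixed functions (4.3) and (4.4), whose average is the arithmetic mean. Function names and sample values are illustrative assumptions.

```python
import math
from statistics import median


def weighted_mean(x, w):
    """Convex sum of projections: sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def owa(x, w):
    """Convex sum of order statistics: weights act on the increasingly sorted inputs."""
    return sum(wi * xi for wi, xi in zip(w, sorted(x)))


def a1(x, y):
    """Aggregation function (4.3)."""
    return median([0.0, 1.0, x + y - 0.5])


def a2(x, y):
    """Aggregation function (4.4)."""
    return median([x + y, 0.5, x + y - 1.0])


def heronian(x, y):
    return (x + math.sqrt(x * y) + y) / 3.0


grid = [k / 10 for k in range(11)]
# Largest deviation of (A1 + A2)/2 from the arithmetic mean on the grid.
max_gap = max(abs(0.5 * a1(x, y) + 0.5 * a2(x, y) - (x + y) / 2)
              for x in grid for y in grid)
```

With weights (1, 0, ..., 0) the function `owa` returns the minimum and with (0, ..., 0, 1) the maximum, in line with $OS_1$ = Min and $OS_n$ = Max.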

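More refined classifications of n-ary aggregation functions are related
to order statistics $OS_i$, $i = 1,\dots,n$. The conjunctive
classification [4.22] deals with the partition of the class
$\mathcal{A}_n$ given by
$\{\mathcal{C}_1,\dots,\mathcal{C}_n,\mathcal{R}_C\}$, where the class
$\mathcal{C}_i$ of i-conjunctive aggregation functions,
$i = 1,\dots,n$, is defined by

$$\mathcal{C}_i = \left\{A \in \mathcal{A}_n \;\middle|\; \min\left\{\operatorname{card}\{j \mid x_j \ge A(\mathbf{x})\} \mid \mathbf{x} \in [0,1]^n\right\} = i\right\}
= \left\{A \in \mathcal{A}_n \mid A \le OS_{n-i+1} \text{ but not } A \le OS_{n-i}\right\},$$

where formally $OS_0 = 0$.

In other words, A is i-conjunctive if and only if the aggregated value
$A(\mathbf{x})$ is dominated by at least i input values independently
of $\mathbf{x} \in [0,1]^n$, but not by $(i+1)$ values, in general.

Clearly, the classes $\mathcal{C}_1,\dots,\mathcal{C}_n$ are pairwise
disjoint and the remaining aggregation functions are members of the
class $\mathcal{R}_C = \mathcal{A}_n \setminus \bigcup_{i=1}^{n} \mathcal{C}_i$.
If we come back to the above-mentioned basic classification of
aggregation functions (applied to $\mathcal{A}_n$), we obtain
$\mathcal{C} = \mathcal{C}_n$ and
$\mathcal{W}_C = \bigcup_{i=1}^{n-1} \mathcal{C}_i = \mathcal{A}v \setminus \{\mathrm{Min}\}$.
The class $\mathcal{W}_C$ is called weakly conjunctive [4.22].

Similarly, we have a disjunctive type of classification of
$\mathcal{A}_n$ related to the partition
$\{\mathcal{D}_1,\dots,\mathcal{D}_n,\mathcal{R}_D\}$, with

$$\mathcal{D}_i = \{A \in \mathcal{A}_n \mid A \ge OS_i \text{ but not } A \ge OS_{i+1}\}, \qquad i = 1,\dots,n.$$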


Fig. 4.3 3D plot of the aggregation function $\frac{1}{2}A_1 + \frac{1}{2}A_2 = M$


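Then $\mathcal{D}_n = \mathcal{D}$ and for the class of weakly
disjunctive aggregation functions
$\mathcal{W}_D = \bigcup_{i=1}^{n-1} \mathcal{D}_i$ we have
$\mathcal{W}_D = \mathcal{A}v \setminus \{\mathrm{Max}\}$. Hence
$\mathcal{W}_C \cap \mathcal{W}_D = \mathcal{A}v^p$, and
$A \in \bigcup_{i=1}^{n} \mathcal{C}_i$ if and only if
$A \le \mathrm{Max}$, while $A \in \bigcup_{i=1}^{n} \mathcal{D}_i$ if
and only if $A \ge \mathrm{Min}$.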

Note that the conjunctive and disjunctive classifica- 
tions can be applied to aggregation functions defined 
on posets, too [4.22], and that this approach to the 
classification of aggregation functions on [0, 1] was al- 
ready proposed by Marichal in [4.23] as i-tolerant and 
i-intolerant aggregation functions (Marichal’s approach 
based on order statistics is applicable when considering 
chains only). 

Observe that this approach to classification has no 
direct extension to extended aggregation functions. On 
the other hand, we have the next classification valid for 
extended aggregation functions only. We distinguish: 


• Dimension decreasing aggregation functions forming the class
  $\mathcal{A}_{\searrow}$, satisfying
  $A(x_1,\dots,x_n,x_{n+1}) \le A(x_1,\dots,x_n)$ for any
  $n \in \mathbb{N}$, $x_1,\dots,x_{n+1} \in [0,1]$, but violating the
  equality, in general.

• Dimension increasing aggregation functions forming the class
  $\mathcal{A}_{\nearrow}$, satisfying
  $A(x_1,\dots,x_n,x_{n+1}) \ge A(x_1,\dots,x_n)$ for any
  $n \in \mathbb{N}$, $x_1,\dots,x_{n+1} \in [0,1]$, but violating the
  equality, in general.

• Dimension averaging aggregation functions forming the class
  $\overset{\leftrightarrow}{\mathcal{A}}$, satisfying
  $A(x_1,\dots,x_n,0) \le A(x_1,\dots,x_n) \le A(x_1,\dots,x_n,1)$ for
  any $n \in \mathbb{N}$, $x_1,\dots,x_n \in [0,1]$, and attaining
  strict inequalities for at least one $\mathbf{x} \in [0,1]^n$.


Evidently, the classes $\mathcal{A}_{\searrow}$,
$\mathcal{A}_{\nearrow}$, and $\overset{\leftrightarrow}{\mathcal{A}}$
are disjoint and they, together with their remainder
$\mathcal{A} \setminus (\mathcal{A}_{\searrow} \cup \mathcal{A}_{\nearrow} \cup \overset{\leftrightarrow}{\mathcal{A}})$,
form a partition of $\mathcal{A}$. Let us note that each associative
conjunctive aggregation function is dimension decreasing, and thus
$\Pi, \mathrm{Min} \in \mathcal{A}_{\searrow}$. Similarly, each
associative disjunctive aggregation function is dimension increasing,
so $S_L, \mathrm{Max} \in \mathcal{A}_{\nearrow}$.

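Recently, Yager has introduced extended aggregation functions with the
self-identity property [4.24] characterized by the equality

$$A(x_1,\dots,x_n, A(x_1,\dots,x_n)) = A(x_1,\dots,x_n)$$

for any $n \in \mathbb{N}$ and $x_1,\dots,x_n \in [0,1]$ (e.g., the
arithmetic mean M or the geometric mean G satisfy this property).
Evidently, each such aggregation function satisfies

$$A(x_1,\dots,x_n,0) \le A(x_1,\dots,x_n,x_{n+1}) \le A(x_1,\dots,x_n,1)$$

for all $n \in \mathbb{N}$, $x_1,\dots,x_{n+1} \in [0,1]$. Taking
$x_{n+1} = A(x_1,\dots,x_n)$, the middle term equals
$A(x_1,\dots,x_n)$ by the self-identity property, and thus, if the
strict inequalities are attained for some $n \in \mathbb{N}$ and
$x_1,\dots,x_{n+1} \in [0,1]$, A belongs to
$\overset{\leftrightarrow}{\mathcal{A}}$. So, for example,
$M, G \in \overset{\leftrightarrow}{\mathcal{A}}$. The extended
aggregation function V (4.2) also belongs to
$\overset{\leftrightarrow}{\mathcal{A}}$. On the other hand, the first
projection $P_F$ does not belong to
$\mathcal{A}_{\searrow} \cup \mathcal{A}_{\nearrow} \cup \overset{\leftrightarrow}{\mathcal{A}}$,
and the last projection $P_L$ belongs to
$\overset{\leftrightarrow}{\mathcal{A}}$. Recall that if
$A \in \mathcal{A}_{\searrow}$, it is also said to have the downward
attitude property [4.24]. Similarly, the upward attitude property
introduced in [4.24] corresponds to the class $\mathcal{A}_{\nearrow}$.
Dimension increasing aggregation functions were also considered
in [4.25].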
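
How adding one more input affects the output can be explored numerically. The sketch below checks, on a coarse grid and for small arities only, whether appending an argument never increases the value, never decreases it, or keeps it between the values obtained by appending 0 and 1. The function name, the labels, and the grid are assumptions of this illustration, and the result is evidence rather than proof. Strictness for the averaging case is interpreted here as both inequalities being strict at a common point.

```python
import itertools
import math


def dimension_class(agg, max_n: int = 3, steps: int = 5, tol: float = 1e-12) -> str:
    grid = [k / (steps - 1) for k in range(steps)]
    never_up = never_down = sandwiched = True
    strict = False
    for n in range(1, max_n + 1):
        for x in itertools.product(grid, repeat=n):
            base = agg(x)
            low, high = agg(x + (0.0,)), agg(x + (1.0,))
            if low > base + tol or high < base - tol:
                sandwiched = False
            if low < base - tol and high > base + tol:
                strict = True
            for extra in grid:
                value = agg(x + (extra,))
                if value > base + tol:
                    never_up = False
                if value < base - tol:
                    never_down = False
    if never_up and never_down:
        return "none"          # the extra input never changes the output
    if never_up:
        return "decreasing"
    if never_down:
        return "increasing"
    if sandwiched and strict:
        return "averaging"
    return "none"


def arithmetic_mean(x):
    return sum(x) / len(x)


def truncated_sum(x):
    return min(1.0, sum(x))


def first_projection(x):
    return x[0]


def last_projection(x):
    return x[-1]
```

On this grid the sketch reports Min and the product as decreasing, Max and the truncated sum as increasing, the arithmetic mean and the last projection as averaging, and the first projection as belonging to none of the three classes.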

Let us return to the basic classification of aggre- 
gation functions and recall several distinguished types 
of aggregation functions belonging to the classes C, D, 
Av’, and M: 


• Conjunctive aggregation functions: Triangular norms [4.26, 27],
  copulas [4.27, 28], quasi-copulas [4.29, 30], and
  semicopulas [4.31].

• Disjunctive aggregation functions: Triangular conorms [4.26, 27],
  dual copulas [4.28].

• Averaging aggregation functions: (Weighted) quasi-arithmetic
  means [4.10], idempotent uninorms [4.32], integrals based on
  capacities, including the Choquet and Sugeno
  integrals [4.18, 33-36], also covering OWA [4.21], ordered weighted
  maximum (OWMax) [4.37] and ordered modular average (OMA) [4.38]
  operators, as well as lattice polynomials [4.39].

• Mixed aggregation functions: nonidempotent uninorms [4.40],
  gamma-operators [4.41], special convex sums in fuzzy linear
  programming [4.42].


For more details concerning these aggregation func- 
tions see [4.17] or references given above. 




4.3 Properties and Construction Methods 


Properties of aggregation functions are mostly related to the field of
their application, such as multicriteria decision aid, multivalued
logics, or probability theory. Besides the standard analytical
properties of functions, such as continuity or the Lipschitz property,
and (perhaps adapted) algebraic properties, such as symmetry,
associativity, bisymmetry, neutral element, annihilator, cancellativity,
or idempotency [4.17, Chapter 2], the above-mentioned applied fields
have brought into aggregation theory such properties as
decomposability, conjunctivity, or n-increasingness. Each of the
mentioned properties can be introduced for n-ary aggregation functions
(except decomposability), and thus also for extended aggregation
functions. However, in the case of extended aggregation functions, some
properties can be introduced in a stronger form, involving different
arities in a single formula.

For example, the (weak) idempotency of $A \in \mathcal{A}$ means the
idempotency of each $A\,|\,[0,1]^n$, which means that for each
$n \in \mathbb{N}$ and $x \in [0,1]$,

$$A(\underbrace{x,\dots,x}_{n\text{-times}}) = x.$$

Note that an extended aggregation function A is idempotent if and only
if it is averaging, i.e., $A \in \mathcal{A}v$. The strong
idempotency [4.15] of an extended aggregation function
$A \in \mathcal{A}$ means that

$$A(\underbrace{\mathbf{x},\dots,\mathbf{x}}_{k\text{-times}}) = A(\mathbf{x})$$

for each $k \in \mathbb{N}$ and
$\mathbf{x} \in \bigcup_{n \in \mathbb{N}} [0,1]^n$. For example, the
extended aggregation function $W_P$ is idempotent but not strongly
idempotent.
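
The difference between the two notions can be seen on a small numerical example. The sketch below assumes that the Pascal weighted arithmetic mean uses the normalized binomial weights C(n-1, i-1)/2^(n-1); the function names and the sample vector are hypothetical.

```python
import math


def pascal_mean(x):
    # Assumed weights: row n-1 of Pascal's triangle, normalized by 2^(n-1).
    n = len(x)
    return sum(math.comb(n - 1, i) * v for i, v in enumerate(x)) / 2 ** (n - 1)


def arithmetic_mean(x):
    return sum(x) / len(x)


def repeat(x, k):
    """Concatenate k copies of the input vector x."""
    return tuple(x) * k


x = (0.0, 1.0, 0.0)
single = pascal_mean(x)               # weights (1, 2, 1)/4
doubled = pascal_mean(repeat(x, 2))   # weights (1, 5, 10, 10, 5, 1)/32
```

Here `single` is 0.5 while `doubled` is 0.3125, so repeating the whole input vector changes the output. Repeating a single value does not, and the arithmetic mean gives the same result for `x` and for `repeat(x, 2)`.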

Similarly, $e \in [0,1]$ is a (weak) neutral element of an extended
aggregation function $A \in \mathcal{A}$ if and only if for each
$n \ge 2$ and $\mathbf{x} \in [0,1]^n$ such that $x_j = e$ for
$j \ne i$ it holds $A(\mathbf{x}) = x_i$. On the other hand, e is
a strong neutral element of an extended aggregation function
$A \in \mathcal{A}$ if and only if for any $n \ge 2$,
$\mathbf{x} \in [0,1]^n$ with $x_i = e$, it holds

$$A(x_1,\dots,x_{i-1}, e, x_{i+1},\dots,x_n) = A(x_1,\dots,x_{i-1}, x_{i+1},\dots,x_n).$$

Obviously, if e is a strong neutral element of $A \in \mathcal{A}$ then
it is also a (weak) neutral element of A. As an example, consider the
extended copula $D \in \mathcal{A}$ given by

$$D(x_1,\dots,x_n) = x_1 \cdot \min\{x_2,\dots,x_n\}.$$

Obviously, $e = 1$ is a weak neutral element of D. However,
$D\left(1, \frac{1}{2}, \frac{1}{2}\right) = \frac{1}{2} \ne \frac{1}{4} = D\left(\frac{1}{2}, \frac{1}{2}\right)$,
i.e., $e = 1$ is not a strong neutral element of D. For a deeper
discussion and exemplification of properties of aggregation functions
we recommend [4.17].
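
The distinction between a weak and a strong neutral element can be checked directly for the function D above. In this sketch the function names are hypothetical and the unary case D(x) = x follows the convention for extended aggregation functions.

```python
def d_copula(x):
    """Extended function D: first input times the minimum of the remaining ones."""
    if len(x) == 1:
        return x[0]
    return x[0] * min(x[1:])


def drop(x, i):
    """Remove the i-th coordinate (0-based) of the tuple x."""
    return x[:i] + x[i + 1:]


# Weak neutrality of e = 1: all inputs but one equal 1.
weak_cases = [d_copula((0.3, 1.0, 1.0)), d_copula((1.0, 0.3, 1.0)), d_copula((1.0, 1.0, 0.3))]

# Strong neutrality would require that deleting a coordinate equal to 1 changes nothing.
x = (1.0, 0.5, 0.5)
with_one = d_copula(x)
without_one = d_copula(drop(x, 0))
```

All three weak cases return 0.3, whereas `with_one` is 0.5 and `without_one` is 0.25, which reproduces the counterexample in the text.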

Aggregation functions in many fields are constrained by the required
properties, i.e., the axioms of each considered field. As a typical
example recall multivalued logics (fuzzy logics) with the truth values
domain [0,1], where conjunction is modeled by means of triangular
norms [4.26, 43, 44]. Recall that a binary aggregation function
$T : [0,1]^2 \to [0,1]$ is called a triangular norm (t-norm for short)
whenever it is symmetric, associative and $e = 1$ is its neutral
element. Due to associativity, there is a genuine extension of a t-norm
T into an extended aggregation function (we will also use the same
notation T in this case). Then $e = 1$ is a strong neutral element for
the extended T. However, without some additional properties we still
cannot determine a t-norm convenient for our purposes. Requiring, for
example, the idempotency of T, we obtain that the only solution is
$T = \mathrm{Min}$, the strongest triangular norm. Considering
continuous triangular norms satisfying the diagonal inequalities
$0 < T(x,x) < x$ for all $x \in\, ]0,1[$, we can show that T is
isomorphic to the product $\Pi$, i.e., there is an automorphism
$\varphi : [0,1] \to [0,1]$ such that
$T(x,y) = \varphi^{-1}(\Pi(\varphi(x), \varphi(y)))$, and in the
extended form,
$T(x_1,\dots,x_n) = \varphi^{-1}(\Pi(\varphi(x_1),\dots,\varphi(x_n)))$.
For more details and several other results we recommend [4.26].
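
As an illustration of such an isomorphism, the sketch below uses the Hamacher product $T_H(x,y) = xy/(x + y - xy)$, a continuous t-norm with $0 < T_H(x,x) < x$ on $]0,1[$. The choice of this t-norm and of the automorphism $\varphi(x) = \exp(1 - 1/x)$ (with $\varphi(0) = 0$) is an assumption made for the example and is not taken from the text.

```python
import math


def phi(x: float) -> float:
    """Assumed automorphism of [0,1]: phi(x) = exp(1 - 1/x), phi(0) = 0."""
    return 0.0 if x == 0.0 else math.exp(1.0 - 1.0 / x)


def phi_inv(u: float) -> float:
    """Inverse of phi: u -> 1 / (1 - ln u), with phi_inv(0) = 0."""
    return 0.0 if u == 0.0 else 1.0 / (1.0 - math.log(u))


def t_from_product(xs) -> float:
    """Extended t-norm obtained as an isomorphic transform of the product."""
    return phi_inv(math.prod(phi(x) for x in xs))


def hamacher_product(x: float, y: float) -> float:
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x + y - x * y)


grid = [k / 10 for k in range(1, 11)]
max_gap = max(abs(t_from_product((x, y)) - hamacher_product(x, y))
              for x in grid for y in grid)
```

On the grid, the transform of the product agrees with the closed form of the Hamacher product up to rounding, and `t_from_product` extends it to any number of arguments.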

As another example consider probability theory, namely the relationship
between the joint distribution function $F_Z$ of a random vector
$Z = (X_1,\dots,X_n)$, and the corresponding marginal one-dimensional
distribution functions $F_{X_1},\dots,F_{X_n}$. By the Sklar
theorem [4.45], for all $(x_1,\dots,x_n) \in \mathbb{R}^n$ we have

$$F_Z(x_1,\dots,x_n) = C\left(F_{X_1}(x_1),\dots,F_{X_n}(x_n)\right)$$

for some n-ary aggregation function C. Obviously, constrained by the
basic properties of probabilities, C should possess a neutral element
$e = 1$ and annihilator (zero element) $a = 0$, and the function C
should be n-increasing (i.e., probability
$P(Z \in [u_1,v_1] \times \dots \times [u_n,v_n]) \ge 0$ for any
n-dimensional box $[u_1,v_1] \times \dots \times [u_n,v_n]$), which
yields an axiomatic definition of copulas. More details for interested
readers can be found in [4.28]. Considering some additional
constraints, we obtain special subclasses of copulas. For example, if
we fix $n = 2$ and consider the stability of copulas with respect to
positive powers, i.e., the property
$C(x^\lambda, y^\lambda) = (C(x,y))^\lambda$ for each
$\lambda \in\, ]0,\infty[$ and each $(x,y) \in [0,1]^2$, then we obtain
extreme value copulas (EV copulas) [4.46, 47]. Recall that a copula
$C : [0,1]^2 \to [0,1]$ is an EV copula if and only if there is
a convex function $d : [0,1] \to [0,1]$ such that for each
$t \in [0,1]$, $\max\{t, 1-t\} \le d(t) \le 1$ and for all
$(x,y) \in\, ]0,1[^2$,

$$C(x,y) = (xy)^{d\left(\frac{\log x}{\log xy}\right)}$$

(observe that on $[0,1]^2 \setminus\, ]0,1[^2$ for each copula it holds
$C(x,y) = \min\{x,y\}$).
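
A standard instance of this representation is the Gumbel copula, for which one may take $d(t) = (t^\theta + (1-t)^\theta)^{1/\theta}$ with $\theta \ge 1$. The sketch below builds a copula from a given function d and checks the power stability numerically; the function names and the parameter values are illustrative assumptions.

```python
import math


def ev_copula(x: float, y: float, d) -> float:
    """C(x, y) = (xy)^{d(log x / log xy)} inside the open unit square."""
    if x in (0.0, 1.0) or y in (0.0, 1.0):
        return min(x, y)   # on the boundary every copula equals min
    lx, ly = math.log(x), math.log(y)
    t = lx / (lx + ly)
    return math.exp((lx + ly) * d(t))


def gumbel_dependence(theta: float):
    """Convex function d of the Gumbel family (theta >= 1)."""
    def d(t: float) -> float:
        return (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)
    return d


def gumbel_closed_form(x: float, y: float, theta: float) -> float:
    s = (-math.log(x)) ** theta + (-math.log(y)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


theta, lam = 2.0, 2.5
d = gumbel_dependence(theta)
x, y = 0.3, 0.7
lhs = ev_copula(x ** lam, y ** lam, d)   # C(x^lambda, y^lambda)
rhs = ev_copula(x, y, d) ** lam          # C(x, y)^lambda
```

The quantity t is unchanged when x and y are raised to the same positive power, which is why `lhs` and `rhs` coincide.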

Our third example comes from economics. In multicriteria decision
problems, we often meet the requirement of the comonotone additivity of
the considered n-ary (extended) aggregation function A, i.e., we expect
that $A(\mathbf{x} + \mathbf{y}) = A(\mathbf{x}) + A(\mathbf{y})$ for
all $\mathbf{x}, \mathbf{y} \in [0,1]^n$ such that
$\mathbf{x} + \mathbf{y} \in [0,1]^n$ and
$(x_i - x_j)(y_i - y_j) \ge 0$ for any $i,j \in \{1,\dots,n\}$. The
comonotonicity of $\mathbf{x}$ and $\mathbf{y}$ means that the ordering
on $\{1,\dots,n\}$ induced by $\mathbf{x}$ does not contradict the one
induced by $\mathbf{y}$. Due to Schmeidler [4.48], we know that then A
is necessarily the Choquet integral based on the fuzzy measure
$m : 2^{\{1,\dots,n\}} \to [0,1]$, $m(E) = A(\mathbf{1}_E)$, given
by (4.6).

The axiomatic approach to aggregation character- 
izes some special classes of aggregation functions. 
Another important look at aggregation involves con- 
struction methods. We can roughly divide them into the 
next four groups: 


• Optimization methods,

• Extension methods,

• Constructions based on the given aggregation functions,

• Introduction of weights.


An exhaustive overview of construction methods for 
aggregation functions can be found in [4.17, Chapter 6]. 
Here we briefly recall the most distinguished ones. 

A typical optimization method is the penalty-based 
approach proposed in [4.49] and generalized in [4.3], 
where dissimilarity functions were introduced, see 
also [4.50]. 


Definition 4.2

A function $D : [0,1]^2 \to [0,\infty[$ given by

$$D(x,y) = K(f(x) - f(y)),$$

where $f : [0,1] \to \mathbb{R}$ is a continuous strictly monotone
function and $K : \mathbb{R} \to [0,\infty[$ is a convex function
attaining the unique minimum $K(0) = 0$, is called a dissimilarity
function.


Theorem 4.1

Let $D : [0,1]^2 \to [0,\infty[$ be a dissimilarity function. Then for
any $n \in \mathbb{N}$, $x_1,\dots,x_n \in [0,1]$, the function
$h : [0,1] \to \mathbb{R}$ given by $h(t) = \sum_{i=1}^{n} D(x_i, t)$
attains its minimal value exactly on a closed interval $[a,b]$ and the
formula

$$A(x_1,\dots,x_n) = \frac{a+b}{2}$$

defines a strongly idempotent symmetric extended aggregation function A
on [0,1].


The construction given in Theorem 4.1 covers:


• the arithmetic mean ($D(x,y) = (x-y)^2$),
• quasi-arithmetic means ($D(x,y) = (f(x) - f(y))^2$),
• the median ($D(x,y) = |x-y|$),


among others. This method is a generalization of the method of
Apollonius of Perga. Note that in general, a function D need not be
symmetric, i.e., K need not be an even function (compare with the
symmetry of metrics). As a typical example, let us recall the
dissimilarity function $D_c : [0,1]^2 \to [0,\infty[$,
$c \in\, ]0,\infty[$, given by

$$D_c(x,y) = \begin{cases} x - y & \text{if } x \ge y, \\ c\,(y - x) & \text{if } x < y, \end{cases}$$

yielding by means of Theorem 4.1 the $\alpha$-quantile of a sample
$(x_1,\dots,x_n)$ with $\alpha = \frac{1}{1+c}$.

As a possible generalization of Theorem 4.1, one can consider different
dissimilarity functions $D_i$ (which violates the symmetry of the
constructed aggregation function A). Consider, for example,
$D_1(x,y) = |x-y|$ and
$D_2(x,y) = \dots = D_n(x,y) = \dots = (x-y)^2$. Then the minimization
of the sum $\sum_{i=1}^{n} D_i(x_i, t)$ results in the extended
aggregation function
$A : \bigcup_{n \in \mathbb{N}} [0,1]^n \to [0,1]$ given by

$$A(x_1,\dots,x_n) = \mathrm{Med}\left(x_1,\; M(x_2,\dots,x_n) - \frac{1}{2(n-1)},\; M(x_2,\dots,x_n) + \frac{1}{2(n-1)}\right) \tag{4.5}$$

whenever $n > 1$ (the offsets follow from setting the derivative of the
sum with respect to t to zero; for $n = 2$ they are equal to 0.5).

Some other generalizations based on a generalized 
approach to dissimilarity (penalty) functions can be 
found in [4.16]. 
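
The penalty-based construction of Theorem 4.1 can be imitated numerically by a brute-force search over a grid of candidate outputs t. The sketch below is an approximation under stated assumptions: the grid resolution, the tolerance used to detect the set of minimizers, and all function names are choices made for this illustration, and the result is accurate only up to the grid step.

```python
def penalty_aggregate(xs, dissimilarity, steps: int = 1001, tol: float = 1e-12) -> float:
    """Midpoint of the (approximate) set of minimizers of t -> sum_i D(x_i, t)."""
    grid = [k / (steps - 1) for k in range(steps)]
    values = [sum(dissimilarity(x, t) for x in xs) for t in grid]
    best = min(values)
    minimizers = [t for t, v in zip(grid, values) if v <= best + tol]
    return (minimizers[0] + minimizers[-1]) / 2.0


def squared(x, t):
    return (x - t) ** 2                 # leads to the arithmetic mean


def absolute(x, t):
    return abs(x - t)                   # leads to the median


def quadratic_generator(x, t):
    return (x ** 2 - t ** 2) ** 2       # f(x) = x^2, a quasi-arithmetic mean


def quantile_penalty(c: float):
    def d_c(x, t):
        return (x - t) if x >= t else c * (t - x)
    return d_c


mean_value = penalty_aggregate((0.2, 0.4, 0.9), squared)
median_value = penalty_aggregate((0.2, 0.8), absolute)
quantile_value = penalty_aggregate((0.1, 0.2, 0.3, 0.4), quantile_penalty(3.0))
quadratic_value = penalty_aggregate((0.6, 0.8), quadratic_generator)
```

For the two-point sample the absolute-value penalty is constant on the whole interval between the inputs, so the midpoint rule of Theorem 4.1 is what singles out the value 0.5.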


Extension methods are based on partial information that is available
about an aggregation function. As a typical example, we recall
integral-based aggregation functions. Suppose that for a fixed arity n
the values of an aggregation function A are known at Boolean inputs
only, i.e., we know $A\,|\,\{0,1\}^n$ only. Identifying subsets of the
space $X = \{1,\dots,n\}$ with the corresponding characteristic
functions, we get the set function $m : 2^X \to [0,1]$ given by
$m(E) = A(\mathbf{1}_E)$. Obviously, m is monotone, i.e.,
$m(E_1) \le m(E_2)$ whenever $E_1 \subseteq E_2 \subseteq X$, and
$m(\emptyset) = A(0,\dots,0) = 0$, $m(X) = A(1,\dots,1) = 1$. Note that
m is often called a fuzzy measure [4.51, 52] or a capacity [4.17].

Among several integral-based extension methods we recall:


• The Choquet integral [4.53], $Ch_m : [0,1]^n \to [0,1]$,

  $$Ch_m(\mathbf{x}) = \sum_{i=1}^{n} x_{\sigma(i)} \left(m(E_{\sigma,i}) - m(E_{\sigma,i+1})\right), \tag{4.6}$$

  where $\sigma : X \to X$ is a permutation such that
  $x_{\sigma(1)} \le x_{\sigma(2)} \le \dots \le x_{\sigma(n)}$,
  $E_{\sigma,i} = \{\sigma(i),\dots,\sigma(n)\}$ for $i = 1,\dots,n$,
  and $E_{\sigma,n+1} = \emptyset$. Note that the Choquet integral can
  be seen as a weighted arithmetic mean with the weights dependent on
  the ordinal structure of the input vector $\mathbf{x}$. If the
  capacity m is additive, i.e., $m(E) = \sum_{i \in E} m(\{i\})$, then

  $$Ch_m(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i,$$

  where for the weights it holds $w_i = m(\{i\})$, $i \in X$ (hence
  $\sum_{i} w_i = 1$).

• The Sugeno integral [4.51], $Su_m : [0,1]^n \to [0,1]$,

  $$Su_m(\mathbf{x}) = \max\left\{\min\{x_{\sigma(i)}, m(E_{\sigma,i})\} \mid i \in X\right\}.$$

  If m is maxitive, i.e., $m(E) = \max\{m(\{i\}) \mid i \in E\}$, then
  we recognize the weighted maximum
  $Su_m(\mathbf{x}) = \max\{\min\{x_i, v_i\} \mid i \in X\}$, with
  weights $v_i = m(\{i\})$ (hence $\max\{v_i \mid i \in X\} = 1$).

• The copula-based integral [4.34], $I_{C,m} : [0,1]^n \to [0,1]$,
  where $C : [0,1]^2 \to [0,1]$ is a binary copula,

  $$I_{C,m}(\mathbf{x}) = \sum_{i=1}^{n} \left(C(x_{\sigma(i)}, m(E_{\sigma,i})) - C(x_{\sigma(i)}, m(E_{\sigma,i+1}))\right).$$

  This integral covers the Choquet integral if C is equal to the
  product copula $\Pi$, $I_{\Pi,m} = Ch_m$, as well as the Sugeno
  integral in the case of the greatest copula Min,
  $I_{\mathrm{Min},m} = Su_m$. Observe that if the capacity m is
  symmetric, i.e., $m(E) = v_{\operatorname{card} E}$, where
  $0 = v_0 \le v_1 \le \dots \le v_n = 1$, then $I_{C,m}$ turns into
  the OMA operator introduced in [4.38]. Its special instances are the
  OWA operators [4.21] based on the Choquet integral,

  $$\mathrm{OWA}(\mathbf{x}) = \sum_{i=1}^{n} w_i\, x_{\sigma(n-i+1)},$$

  with $w_i = v_i - v_{i-1}$, and the OWMax operator [4.37],

  $$\mathrm{OWMax}(\mathbf{x}) = \max\left\{\min\{x_{\sigma(i)}, v_{n-i+1}\} \mid i \in X\right\}.$$


For better understanding, fix $n = 2$, i.e., consider $X = \{1,2\}$.
Then $m(\{1\}) = a$ and $m(\{2\}) = b$ are any constants from [0,1],
and $m(\emptyset) = 0$, $m(X) = 1$ due to the boundary conditions. The
following equalities hold:


• $Ch_m(x,y) = \begin{cases} a x + (1-a) y & \text{if } x \ge y, \\ (1-b) x + b y & \text{else,} \end{cases}$

• $Su_m(x,y) = \max\{\min\{a,x\}, \min\{b,y\}, \min\{x,y\}\}$,

• $I_{C,m}(x,y) = \begin{cases} C(x,a) + y - C(y,a) & \text{if } x \ge y, \\ C(y,b) + x - C(x,b) & \text{else.} \end{cases}$


The considered capacity m is symmetric if and only if $a = b$, and
then:


• $Ch_m(x,y) = \mathrm{OWA}(x,y) = (1-a) \cdot \min\{x,y\} + a \cdot \max\{x,y\}$,

• $Su_m(x,y) = \mathrm{OWMax}(x,y) = \mathrm{Med}(x,a,y)$ is the
  so-called a-median [4.54, 55],

• $I_{C,m}(x,y) = \mathrm{OMA}(x,y) = f_1(\min\{x,y\}) + f_2(\max\{x,y\})$,
  where $f_1, f_2 : [0,1] \to [0,1]$ are given by
  $f_1(t) = t - C(t,a)$ and $f_2(t) = C(t,a)$.


For more details concerning integral-based 
constructions of aggregation functions we recom- 
mend [4.34, 36, 56] or [4.34] by Klement, Mesiar, and 
Pap. 
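
The three integrals can be sketched in Python for a capacity stored as a dictionary from frozensets of indices to values. This representation, the 0-based indexing, the function names, and the sample capacity with a = 0.3 and b = 0.6 are assumptions of the illustration.

```python
def _upper_sets(x):
    """Yield (index, E_i, E_{i+1}) along the increasing ordering of the inputs."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k, i in enumerate(order):
        yield i, frozenset(order[k:]), frozenset(order[k + 1:])


def choquet(x, m):
    return sum(x[i] * (m[e] - m[e_next]) for i, e, e_next in _upper_sets(x))


def sugeno(x, m):
    return max(min(x[i], m[e]) for i, e, _ in _upper_sets(x))


def copula_integral(x, m, copula):
    return sum(copula(x[i], m[e]) - copula(x[i], m[e_next])
               for i, e, e_next in _upper_sets(x))


# Hypothetical capacity on X = {0, 1} with m({0}) = a and m({1}) = b.
a, b = 0.3, 0.6
m = {frozenset(): 0.0, frozenset({0}): a, frozenset({1}): b, frozenset({0, 1}): 1.0}


def product_copula(u, v):
    return u * v


def min_copula(u, v):
    return min(u, v)


x_first_larger = (0.8, 0.5)
x_second_larger = (0.2, 0.9)
```

For `x_first_larger` the Choquet integral equals a·0.8 + (1 - a)·0.5, and for `x_second_larger` it equals (1 - b)·0.2 + b·0.9, matching the binary formulas; the copula-based integral reduces to `choquet` for the product copula and to `sugeno` for the minimum.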

Another kind of extension methods exploiting capacities is based on the
Möbius transform. Recall that for a capacity $m : 2^X \to [0,1]$, its
Möbius transform $\mu : 2^X \to \mathbb{R}$ is given by

$$\mu(E) = \sum_{L \subseteq E} (-1)^{\operatorname{card}(E \setminus L)}\, m(L).$$


Theorem 4.2

[4.57] Let $C : [0,1]^n \to [0,1]$ be an n-ary copula, and
$m : 2^X \to [0,1]$ a capacity. Then the function
$A_{C,m} : [0,1]^n \to [0,1]$ given by

$$A_{C,m}(\mathbf{x}) = \sum_{E \subseteq X} \mu(E) \cdot C\left(\mathbf{x} \vee \mathbf{1}_{E^c}\right)$$

is an aggregation function.


Special instances of Theorem 4.2 are the Lovász extension [4.58]
corresponding to the strongest copula Min
($A_{\mathrm{Min},m} = I_{\Pi,m} = Ch_m$ is just the Choquet integral),
and the Owen extension [4.59] corresponding to the product copula $\Pi$
($A_{\Pi,m}(\mathbf{x}) = \sum_{E \subseteq X} \left(\mu(E) \prod_{i \in E} x_i\right)$).
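
The Möbius transform and the two extensions can be sketched as follows. The capacity is again stored as a dictionary over frozensets of 0-based indices, and coordinates outside E are replaced by 1 before the copula is applied, which is how the term $C(\mathbf{x} \vee \mathbf{1}_{E^c})$ is read here. Names and sample values are assumptions.

```python
import math
from itertools import combinations


def subsets(s):
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mobius(m, universe):
    """mu(E) = sum over L subset of E of (-1)^{card(E \\ L)} m(L)."""
    return {e: sum((-1) ** (len(e) - len(l)) * m[l] for l in subsets(e))
            for e in subsets(universe)}


def extension(x, m, copula):
    """A_{C,m}(x) = sum over E of mu(E) * C(x with coordinates outside E set to 1)."""
    n = len(x)
    mu = mobius(m, frozenset(range(n)))
    total = 0.0
    for e, weight in mu.items():
        lifted = tuple(x[i] if i in e else 1.0 for i in range(n))
        total += weight * copula(lifted)
    return total


def choquet(x, m):
    order = sorted(range(len(x)), key=lambda i: x[i])
    return sum(x[i] * (m[frozenset(order[k:])] - m[frozenset(order[k + 1:])])
               for k, i in enumerate(order))


m = {frozenset(): 0.0, frozenset({0}): 0.3, frozenset({1}): 0.6, frozenset({0, 1}): 1.0}
x = (0.8, 0.5)
lovasz = extension(x, m, min)         # copula Min
owen = extension(x, m, math.prod)     # product copula
```

For this capacity the Möbius transform is 0.3, 0.6 and 0.1 on {0}, {1} and {0, 1}, so the Lovász extension gives 0.59, the same value as the Choquet integral, and the Owen extension gives 0.58.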

Several extension methods were introduced for binary copulas, for
example, in the case when only the information about their diagonal
section $\delta_C : [0,1] \to [0,1]$, $\delta_C(x) = C(x,x)$, is
available. If $\delta : [0,1] \to [0,1]$ is any increasing 2-Lipschitz
function such that $\delta(0) = 0$, $\delta(1) = 1$, and
$\delta(x) \le x$ for each $x \in [0,1]$, then the formula

$$D(x,y) = \min\left\{x,\; y,\; \frac{\delta(x) + \delta(y)}{2}\right\}, \qquad (x,y) \in [0,1]^2,$$

defines a binary copula with $\delta_D = \delta$. Note that D is the
greatest symmetric copula with the given diagonal section. Among
numerous papers dealing with such types of extensions we recommend the
overview paper [4.60]. Similarly, one can extend horizontal or vertical
sections to copulas [4.61]. An overview of extension methods for
triangular norms can be found in [4.26].
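
A small Python sketch of this diagonal construction follows. The diagonal $\delta(x) = x^2$ (the diagonal section of the product copula) is an assumed example; it is increasing, 2-Lipschitz, and satisfies $\delta(x) \le x$ on [0,1]. The 2-increasing property is only checked on the cells of a finite grid.

```python
def diagonal_copula(delta):
    """Return D(x, y) = min{x, y, (delta(x) + delta(y)) / 2}."""
    def d(x: float, y: float) -> float:
        return min(x, y, (delta(x) + delta(y)) / 2.0)
    return d


def is_2_increasing_on_grid(c, steps: int = 21, tol: float = 1e-12) -> bool:
    """Check that the C-volume of every grid cell is nonnegative."""
    grid = [k / (steps - 1) for k in range(steps)]
    for x1, x2 in zip(grid, grid[1:]):
        for y1, y2 in zip(grid, grid[1:]):
            volume = c(x2, y2) - c(x1, y2) - c(x2, y1) + c(x1, y1)
            if volume < -tol:
                return False
    return True


def delta(x: float) -> float:
    return x * x   # assumed diagonal section for the example


D = diagonal_copula(delta)
```

By construction `D(x, x)` returns `delta(x)`, and 1 acts as a neutral element while 0 acts as an annihilator, as required of a copula.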

The third group of construction methods involves methods creating new
aggregation functions from the given ones. These methods are applied
either to aggregation functions with a fixed arity n, or to extended
aggregation functions. Some of them can be applied to any kind of
aggregation functions. As a typical example, recall the transformation
of aggregation functions by means of an automorphism
$\varphi : [0,1] \to [0,1]$ (i.e., an isomorphic transformation) given
by

$$A_\varphi(x_1,\dots,x_n) = \varphi^{-1}\left(A(\varphi(x_1),\dots,\varphi(x_n))\right). \tag{4.7}$$

Transformation (4.7) preserves all algebraic properties as well as the
classification of aggregation functions. However, some analytical
properties can be broken, for example, the Lipschitz property or
n-increasingness. Some special classes of aggregation functions can be
characterized by a unique member and its isomorphic transforms.
Consider, for example, triangular norms. Then strict triangular norms
are isomorphic to the product t-norm $\Pi$, and nilpotent t-norms are
isomorphic to the Łukasiewicz t-norm $T_L$. Similarly, quasi-arithmetic
means with no annihilator are isomorphic to the arithmetic mean M. The
only n-ary aggregation functions invariant under isomorphic
transformations are the lattice polynomials [4.62], i.e., the Choquet
integrals with respect to {0,1}-valued capacities. So, for $n = 2$,
only Min, Max, $P_F$ and $P_L$ are invariant under isomorphic
transformations. There are several generalizations of (4.7). One can
consider, for example, decreasing bijections
$\eta : [0,1] \to [0,1]$ and define $A_\eta$ via (4.7). This type of
transformation reverses the conjunctivity of an aggregation function
into disjunctivity, and vice versa. It preserves the existence of
a neutral element (annihilator); however, if e is a neutral element of
A (a is an annihilator of A) then $\eta^{-1}(e)$ is a neutral element
of $A_\eta$ ($\eta^{-1}(a)$ is an annihilator of $A_\eta$). If $\eta$
is involutive, i.e., if $\eta \circ \eta = \mathrm{id}_{[0,1]}$, then
$(A_\eta)_\eta = A$, so there is a duality between A and $A_\eta$. The
most applied duality is based on the standard (or Zadeh's) negation
$\eta : [0,1] \to [0,1]$ given by $\eta(x) = 1 - x$. In that case, we
use the notation $A^d = A_\eta$ and
$A^d(x_1,\dots,x_n) = 1 - A(1 - x_1,\dots,1 - x_n)$. As
a distinguished example recall the class of triangular conorms which
are just the dual aggregation functions to triangular norms, i.e., S is
a triangular conorm [4.26] if and only if there is a triangular norm T
such that $S = T^d$.

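Further generalizations of (4.7) consider different
automorphisms $\varphi, \varphi_1, \ldots, \varphi_n : [0, 1] \to [0, 1]$,

$$A_{\varphi, \varphi_1, \ldots, \varphi_n}(x_1, \ldots, x_n) = \varphi\left(A(\varphi_1(x_1), \ldots, \varphi_n(x_n))\right). \tag{4.8}$$

Moreover, it is enough to suppose that $\varphi_1, \ldots, \varphi_n$ are
monotone (not necessarily strictly) and satisfy $\varphi_i(0) = 0$,
$\varphi_i(1) = 1$, $i = 1, \ldots, n$, as in such a case it also holds
that for any aggregation function $A$, $A_{\varphi, \varphi_1, \ldots, \varphi_n}$ given
by (4.8) is an aggregation function.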
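
As a small illustration of transformation (4.7) and of duality with respect to the standard negation, the following sketch may help. The helper names `transform` and `dual` and the choice $\varphi(x) = x^2$ are assumptions of this example only.

```python
import math


def transform(A, phi, phi_inv):
    """A_phi(x1, ..., xn) = phi^{-1}(A(phi(x1), ..., phi(xn))), cf. (4.7)."""
    def A_phi(*xs):
        return phi_inv(A(*(phi(x) for x in xs)))
    return A_phi


def dual(A):
    """A^d(x1, ..., xn) = 1 - A(1 - x1, ..., 1 - xn) (standard negation)."""
    def A_d(*xs):
        return 1.0 - A(*(1.0 - x for x in xs))
    return A_d


def arithmetic_mean(*xs):
    return sum(xs) / len(xs)


def product(*xs):
    return math.prod(xs)


def square(x):
    return x * x


# phi(x) = x**2 turns the arithmetic mean into the quadratic mean.
quadratic_mean = transform(arithmetic_mean, square, math.sqrt)

# The dual of the product t-norm is the probabilistic sum x + y - xy.
prob_sum = dual(product)

# Min is a lattice polynomial, hence invariant under the transformation.
min_phi = transform(min, square, math.sqrt)
```

Applying `dual` twice returns the original function (up to rounding), which reflects the involutivity of $\eta(x) = 1 - x$.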

Another construction well known from functional
theory is linked to the composition of functions. We
have two kinds of composition methods. In the first one,
considering a k-ary aggregation function $B : [0, 1]^k \to [0, 1]$,
we can choose arbitrary k aggregation functions
$C_1, \ldots, C_k$ (either all of them are extended aggregation
functions, or all of them are n-ary aggregation functions
for some fixed $n > 1$), and then we can introduce a new
aggregation function $A$ (either extended, with the convention
$A(x) = x$, $x \in [0, 1]$; or n-ary) such that

$$A(\mathbf{x}) = B(C_1(\mathbf{x}), \ldots, C_k(\mathbf{x})). \tag{4.9}$$

As a typical example of construction (4.9), consider $B$
to be a weighted arithmetic mean $W$, $W(x_1, \ldots, x_k) = \sum_{i=1}^{k} w_i x_i$. Then

$$A(\mathbf{x}) = \sum_{i=1}^{k} w_i C_i(\mathbf{x}),$$

i.e., $A$ is a convex combination of the aggregation functions
$C_1, \ldots, C_k$.

The second method is based on a partition of the
space of coordinates $\{1, \ldots, n\}$ into subspaces

$$\{1, \ldots, n_1\}, \{n_1 + 1, \ldots, n_1 + n_2\}, \ldots, \{n_1 + \cdots + n_{k-1} + 1, \ldots, n\}.$$

Then, considering a k-ary aggregation function $B : [0, 1]^k \to [0, 1]$
and aggregation functions $C_i : [0, 1]^{n_i} \to [0, 1]$,
$i = 1, \ldots, k$, we can define a composite aggregation
function $A : [0, 1]^n \to [0, 1]$ by

$$A(x_1, \ldots, x_n) = B\left(C_1(x_1, \ldots, x_{n_1}), C_2(x_{n_1 + 1}, \ldots, x_{n_1 + n_2}), \ldots, C_k(x_{n_1 + \cdots + n_{k-1} + 1}, \ldots, x_n)\right). \tag{4.10}$$

This method can be generalized by considering an
arbitrary partition of $\{1, \ldots, n\}$ into $\{J_1, \ldots, J_k\}$. As an
example, consider the n-ary copula $C : [0, 1]^n \to [0, 1]$
defined for a fixed partition $\{J_1, \ldots, J_k\}$ of $\{1, \ldots, n\}$ by

$$C(x_1, \ldots, x_n) = \prod_{i=1}^{k} \min\{x_j \mid j \in J_i\}.$$

For more details, see [4.63].

The third group containing constructions based
on some given aggregation functions can be seen as
a group of patchwork methods. As typical examples,
we can recall several types of ordinal sums. Besides
the well-known Min-based ordinal sums for conjunctive
aggregation functions (especially for triangular norms
and copulas) [4.26, 64], W-ordinal sums for copulas
(or quasi-copulas) [4.65], as well as g-ordinal sums
for copulas [4.66], we recall one kind of ordinal sums
introduced in [4.67] which is applicable to arbitrary
aggregation functions.

Theorem 4.3
Let $f : [0, 1] \to [-\infty, \infty]$ be a continuous strictly monotone
function, and let $0 = a_0 < a_1 < \cdots < a_k = 1$ be
a given sequence of real constants. Then for any system
$(A_j)_{j=1}^{k}$ of n-ary (extended) aggregation functions the
function $A : [0, 1]^n \to [0, 1]$ ($A : \bigcup_{n \in \mathbb{N}} [0, 1]^n \to [0, 1]$)
given by

$$A(\mathbf{x}) = f^{-1}\left(\sum_{j=1}^{k} f\left(a_{j-1} + (a_j - a_{j-1}) A_j(\mathbf{x}^{(j)})\right) - \sum_{j=1}^{k-1} f(a_j)\right), \tag{4.11}$$

where

$$x_i^{(j)} = \max\left\{0, \min\left\{1, \frac{x_i - a_{j-1}}{a_j - a_{j-1}}\right\}\right\},$$

is an n-ary (extended) aggregation function.


Observe that if all $A_j$'s are triangular norms (copulas,
quasi-copulas, triangular conorms, continuous aggregation
functions, idempotent aggregation functions,
symmetric aggregation functions) then so is the newly
constructed aggregation function $A$.
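
The ordinal sum of Theorem 4.3 can be sketched as follows. This is only an illustration under stated assumptions: the function name `ordinal_sum`, the identity as default generator $f$, the knots $0 < 0.5 < 1$, and the two summands (product and Łukasiewicz t-norm) are choices made for this example.

```python
def ordinal_sum(summands, knots, f=lambda t: t, f_inv=lambda t: t):
    """Sketch of (4.11): knots = [a_0, ..., a_k] with a_0 = 0 and a_k = 1."""
    k = len(summands)
    if len(knots) != k + 1 or knots[0] != 0.0 or knots[-1] != 1.0:
        raise ValueError("need knots a_0 = 0 < ... < a_k = 1, one more than summands")

    def A(*xs):
        total = 0.0
        for j in range(1, k + 1):
            lo, hi = knots[j - 1], knots[j]
            # Rescale and clip the inputs to the j-th interval: x^(j).
            xj = [max(0.0, min(1.0, (x - lo) / (hi - lo))) for x in xs]
            total += f(lo + (hi - lo) * summands[j - 1](*xj))
        # Subtract the generator values at the inner knots a_1, ..., a_{k-1}.
        total -= sum(f(a) for a in knots[1:k])
        return f_inv(total)

    return A


def product(x, y):
    return x * y


def lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


A = ordinal_sum([product, lukasiewicz], [0.0, 0.5, 1.0])
```

With the identity generator and t-norm summands, the sketch behaves like the familiar Min-based ordinal sum: inside $[0, 0.5]^2$ it is a rescaled product, inside $[0.5, 1]^2$ a rescaled Łukasiewicz t-norm, and elsewhere it returns the minimum of the inputs.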

The fourth group contains construction methods
allowing one to introduce weights into the aggregation
procedure. In the quantitative view, weights are
seen as the corresponding repetition of inputs, i.e.,
the weights roughly correspond to the number of occurrences of
the single input arguments. For example, when considering
a strongly idempotent (symmetric) aggregation
function constructed by means of a dissimilarity function
$D$ (see Theorem 4.1) and weights $w_1, \ldots, w_n$
(all of them nonnegative and at least one of them positive),
we look for minimizers of the
sum $\sum_{i=1}^{n} w_i D(x_i, t)$. For example, if $D(x, y) = (x - y)^2$,
then we obtain the weighted arithmetic mean

$$W(x_1, \ldots, x_n) = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$
This approach can also be introduced in the case when
different dissimilarity functions are applied. As an example,
consider the aggregation function $A : [0, 1]^n \to [0, 1]$
given by (4.5). We look for minimizers of the
expression $w_1 |x_1 - t| + \sum_{i=2}^{n} w_i (x_i - t)^2$ and the resulting
weighted aggregation function $A_{\mathbf{w}} : [0, 1]^n \to [0, 1]$
is given by

$$A_{\mathbf{w}}(x_1, \ldots, x_n) = \mathrm{Med}\left(x_1,\; M(x_2, \ldots, x_n) - \frac{w_1}{2 \sum_{i=2}^{n} w_i},\; M(x_2, \ldots, x_n) + \frac{w_1}{2 \sum_{i=2}^{n} w_i}\right).$$
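
The closed form above can be checked against a brute-force minimization of the penalty. The sketch below reads $M(x_2, \ldots, x_n)$ as the mean of $x_2, \ldots, x_n$ weighted by $w_2, \ldots, w_n$, which is what the first-order condition of the minimization yields; this reading, the function names, and the sample data are assumptions of the example, and $w_2 + \cdots + w_n > 0$ is required.

```python
def weighted_mean(xs, ws):
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


def median3(a, b, c):
    return sorted((a, b, c))[1]


def A_w(xs, ws):
    """Closed form Med(x1, M - w1/(2S), M + w1/(2S)) with S = w2 + ... + wn."""
    S = sum(ws[1:])
    M = weighted_mean(xs[1:], ws[1:])  # assumption: M weighted by w2, ..., wn
    shift = ws[0] / (2.0 * S)
    return median3(xs[0], M - shift, M + shift)


def penalty(xs, ws, t):
    """w1 * |x1 - t| + sum_{i >= 2} wi * (xi - t)**2."""
    tail = sum(w * (x - t) ** 2 for w, x in zip(ws[1:], xs[1:]))
    return ws[0] * abs(xs[0] - t) + tail


def grid_minimizer(xs, ws, steps=10000):
    """Brute-force minimizer of the penalty over a uniform grid on [0, 1]."""
    return min((i / steps for i in range(steps + 1)),
               key=lambda t: penalty(xs, ws, t))


xs, ws = (0.9, 0.2, 0.4), (1.0, 2.0, 2.0)
closed_form = A_w(xs, ws)          # Med(0.9, 0.175, 0.425)
numerical = grid_minimizer(xs, ws)
```

Since the penalty is convex in $t$, the grid minimizer agrees with the closed form up to the grid resolution.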
Considering integer weights $\mathbf{w} = (w_1, \ldots, w_n)$, for
an extended aggregation function $A$ which is symmetric
and strongly idempotent, we obtain the weighted aggregation
function $A_{\mathbf{w}} : [0, 1]^n \to [0, 1]$ given by

$$A_{\mathbf{w}}(x_1, \ldots, x_n) = A(\underbrace{x_1, \ldots, x_1}_{w_1\text{-times}}, \underbrace{x_2, \ldots, x_2}_{w_2\text{-times}}, \ldots, \underbrace{x_n, \ldots, x_n}_{w_n\text{-times}}).$$
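
The repetition of inputs can be sketched in a few lines. The helper name `weighted_by_repetition` is hypothetical, and the arithmetic mean and the median from Python's standard library merely serve as sample extended aggregation functions.

```python
import statistics


def weighted_by_repetition(A, xs, ws):
    """A_w(x1, ..., xn) = A(x1 repeated w1 times, ..., xn repeated wn times)."""
    repeated = [x for x, w in zip(xs, ws) for _ in range(w)]
    return A(repeated)


def arithmetic_mean(values):
    return sum(values) / len(values)


# Integer weights (3, 1) reproduce the weighted mean (3 * 0.2 + 1 * 0.8) / 4.
mean_31 = weighted_by_repetition(arithmetic_mean, [0.2, 0.8], [3, 1])

# Multiplying all weights by k = 2 leaves the result unchanged.
mean_62 = weighted_by_repetition(arithmetic_mean, [0.2, 0.8], [6, 2])

# A weighted median obtained the same way.
wmedian = weighted_by_repetition(statistics.median, [0.1, 0.5, 0.9], [1, 1, 3])
```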


The strong idempotency of $A$ also allows one to introduce
rational weights into aggregation. Observe that
for each $k \in \mathbb{N}$, the weights $k \cdot \mathbf{w}$ result in the same
weighted aggregation function as the
weights $\mathbf{w}$ themselves. For general weights the limit approach
described in [4.17, Proposition 6.27] should be applied.

The qualitative approach to weights considers
a transformation of the inputs $x_1, \ldots, x_n$ according to the
considered weights (importances) $w_1, \ldots, w_n \in [0, 1]$,
with the constraint that $w_i = 1$ holds for at least one $i$. This
approach is applied when we consider an extended
aggregation function $A$ with a strong neutral element
$e \in [0, 1]$. Then the weighted aggregation function $A_{\mathbf{w}} :
[0, 1]^n \to [0, 1]$ is given by

$$A_{\mathbf{w}}(x_1, \ldots, x_n) = A\left(h(w_1, x_1), \ldots, h(w_n, x_n)\right),$$

where $h : [0, 1]^2 \to [0, 1]$ is a relevancy transformation
(RET) operator [4.24, 68] satisfying $h(0, x) = e$,
$h(1, x) = x$, which is increasing in the second coordinate
as well as in the first coordinate for all $x \ge e$, while
$h(\cdot, x)$ is decreasing for all $x \le e$. As an example, consider
the RET operator $h$ given by

$$h(w, x) = wx + (1 - w)e.$$

For more details, we recommend [4.17, Chapter 6].


4.4 Concluding Remarks


As already mentioned, all introduced results (sometimes
for special types of aggregation functions only)
can be straightforwardly extended to any interval $I \subseteq
[-\infty, \infty]$. Moreover, one can aggregate more general
objects than real numbers. For example, a rapidly
expanding field is interval mathematics. The
aggregation of interval inputs can be done coordinatewise,

$$A([x_1, y_1], \ldots, [x_n, y_n]) = [A_1(x_1, \ldots, x_n), A_2(y_1, \ldots, y_n)],$$

where $A_1$, $A_2$ are an arbitrary couple of classical aggregation
functions such that $A_1 \le A_2$ (mostly $A_1 = A_2$ is
considered). However, there are also more sophisticated
approaches [4.69].

Already in 1942, Menger [4.43] introduced the aggregation
of distribution functions whose supports are
contained in $[0, \infty]$ (distance functions), which led
not only to the concept of triangular norms [4.44],
but also to triangle functions directly aggregating
such distribution functions [4.70]. Some triangle functions
are derived from special aggregation functions
(triangular norms), while others have a more complex
background (as a distinguished example recall
the standard convolution of distribution functions).
For an overview and details we recommend [4.71,
72].

In 1965, Zadeh [4.73] introduced fuzzy sets. Their
aggregation, in particular union and intersection, is
again built by means of special aggregation functions
on [0, 1], namely by means of triangular conorms and
triangular norms [4.26]. Triangular norms also play an
important role in the Zadeh extension principle [4.74–76],
which allows one to extend standard aggregation functions
acting on real inputs to generalized aggregation
functions acting on fuzzy inputs. As a typical example
recall the arithmetic of fuzzy numbers [4.77]. In
some special fuzzy logics, uninorms have also found
application in modeling conjunctions. Among recent
generalizations of fuzzy set theory recall the type-2
fuzzy sets, including interval-valued fuzzy sets, or
n-fuzzy sets. In all these fields, a deep study of aggregation
functions is one of the major theoretical tasks to
build a sound background.

Observe that all mentioned particular domains are
covered by the aggregation on posets, where so far
only a few general results are known [4.22,
78]. We expect an enormous growth of interest in this
field, as can be seen, for example, in its special subdomain
dealing with computing and aggregation with
words [4.79–81].


References


4.1 R.C. Archibald: Mathematics before the Greeks Science, Science 71(1831), 109-121 (1930)
4.2 D. Smith: History of Mathematics (Dover, New York 1958)
4.3 T. Calvo, R. Mesiar, R.R. Yager: Quantitative weights and aggregation, IEEE Trans. Fuzzy Syst. 12(1), 62-69 (2004)
4.4 H. Karcher: Riemannian center of mass and mollifier smoothing, Commun. Pure Appl. Math. 30(5), 509-541 (1977), published online: 13 October 2006
4.5 J. Łukasiewicz: O logice trójwartościowej (in Polish), Ruch Filoz. 5, 170-171 (1920); English translation: On three-valued logic. In: Selected Works by Jan Łukasiewicz, ed. by L. Borkowski (North-Holland, Amsterdam 1970) pp. 87-88
4.6 E.L. Post: Introduction to a general theory of elementary propositions, Am. J. Math. 43, 163-185 (1921)
4.7 K. Gödel: Zum intuitionistischen Aussagenkalkül, Anz. Akad. Wiss. Wien 69, 65-66 (1939)
4.8 A.N. Kolmogoroff: Sur la notion de la moyenne, Accad. Naz. Lincei Mem. Cl. Sci. Fis. Mat. Nat. Sez. 12, 388-391 (1930)
4.9 M. Nagumo: Über eine Klasse der Mittelwerte, Jpn. J. Math. 6, 71-79 (1930)
4.10 J. Aczél: Lectures on Functional Equations and Their Applications (Academic, New York 1966)
4.11 B. Schweizer, A. Sklar: Probabilistic Metric Spaces, Ser. Probab. Appl. Math, Vol. 5 (North-Holland, New York 1983)
4.12 G.J. Klir, T.A. Folger: Fuzzy Sets, Uncertainty, and Information (Prentice-Hall, Hemel Hempstead 1988)
4.13 A. Kolesárová, M. Komorníková: Triangular norm-based iterative aggregation and compensatory operators, Fuzzy Sets Syst. 104, 109-120 (1999)
4.14 R. Mesiar, M. Komorníková: Triangular norm-based aggregation of evidence under fuzziness. In: Studies in Fuzziness and Soft Computing, Aggregation and Fusion of Imperfect Information, Vol. 12, ed. by B. Bouchon-Meunier (Physica, Heidelberg 1998) pp. 11-35
4.15 T. Calvo, A. Kolesárová, M. Komorníková, R. Mesiar: A Review of Aggregation Operators (Univ. of Alcala Press, Alcala de Henares, Madrid 2001)
4.16 G. Beliakov, A. Pradera, T. Calvo: Aggregation Functions: A Guide for Practitioners (Springer, Berlin 2007)
4.17 M. Grabisch, J.-L. Marichal, R. Mesiar, E. Pap: Aggregation Functions, Encyclopedia of Mathematics and Its Applications, Vol. 127 (Cambridge Univ. Press, Cambridge 2009)
4.18 Y. Narukawa (Ed.): Modeling Decisions: Information Fusion and Aggregation Operators, Cognitive Technologies (Springer, Berlin, Heidelberg 2007)
4.19 R.R. Yager, D.P. Filev: Essentials of Fuzzy Modelling and Control (Wiley, New York 1994)
4.20 D. Dubois, H. Prade: On the use of aggregation operations in information fusion processes, Fuzzy Sets Syst. 142, 143-161 (2004)
4.21 R.R. Yager: On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man. Cybern. 18, 183-190 (1988)
4.22 M. Komorníková, R. Mesiar: Aggregation functions on bounded partially ordered sets and their classification, Fuzzy Sets Syst. 175(1), 48-56 (2011)
4.23 J.-L. Marichal: k-intolerant capacities and Choquet integrals, Eur. J. Oper. Res. 177(3), 1453-1468 (2007)
4.24 R.R. Yager: Aggregation operators and fuzzy systems modeling, Fuzzy Sets Syst. 67(2), 129-146 (1995)
4.25 M. Gagolewski, P. Grzegorzewski: Arity-monotonic extended aggregation operators, Commun. Comput. Inform. Sci. 80, 693-702 (2010)
4.26 E.P. Klement, R. Mesiar, E. Pap: Triangular Norms (Kluwer, Dordrecht 2000)
4.27 C. Alsina, M.J. Frank, B. Schweizer: Associative Functions, Triangular Norms and Copulas (World Scientific, Hackensack 2006)
4.28 R.B. Nelsen: An Introduction to Copulas, Lecture Notes in Statistics, Vol. 139, 2nd edn. (Springer, New York 2006)
4.29 C. Alsina, R.B. Nelsen, B. Schweizer: On the characterization of a class of binary operations on distribution functions, Stat. Probab. Lett. 17(2), 85-89 (1993)
4.30 C. Genest, J.J. Quesada Molina, J.A. Rodriguez Lallena, C. Sempi: A characterization of quasi-copulas, J. Multivar. Anal. 69, 193-205 (1999)
4.31 B. Bassano, F. Spizzichino: Relations among univariate aging, bivariate aging and dependence for exchangeable lifetimes, J. Multivar. Anal. 93, 313-339 (2005)
4.32 B. De Baets: Idempotent uninorms, Eur. J. Oper. Res. 180, 631-642 (1999)
4.33 D. Denneberg: Non-Additive Measure and Integral (Kluwer, Dordrecht 1994)
4.34 E.P. Klement, R. Mesiar, E. Pap: A universal integral as common frame for Choquet and Sugeno integral, IEEE Trans. Fuzzy Syst. 18, 178-187 (2010)


4.35 E. Pap: Null-Additive Set Functions (Kluwer, Dordrecht 1995)
4.36 Z. Wang, G.J. Klir: Generalized Measure Theory (Springer, New York 2009)
4.37 D. Dubois, H. Prade: A review of fuzzy set aggregation connectives, Inform. Sci. 36, 85-121 (1985)
4.38 R. Mesiar, A. Mesiarová-Zemánková: The ordered modular averages, IEEE Trans. Fuzzy Syst. 19, 42-50 (2011)
4.39 M. Couceiro, J.-L. Marichal: Representations and characterizations of polynomial functions on chains, J. Multiple-Valued Log. Soft Comput. 16(1-2), 65-86 (2010)
4.40 J.C. Fodor, R.R. Yager, A. Rybalov: Structure of uninorms, Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 5, 411-427 (1997)
4.41 H.J. Zimmermann, P. Zysno: Latent connectives in human decision making, Fuzzy Sets Syst. 4, 37-51 (1980)
4.42 M.K. Luhandjula: Compensatory operators in fuzzy linear programming with multiple objectives, Fuzzy Sets Syst. 8(3), 245-252 (1982)
4.43 K. Menger: Statistical metrics, Proc. Natl. Acad. Sci. 28, 535-537 (1942)
4.44 B. Schweizer, A. Sklar: Statistical metric spaces, Pac. J. Math. 10(1), 313-334 (1960)
4.45 A. Sklar: Fonctions de répartition à n dimensions et leurs marges, Vol. 8 (Institut de Statistique, L'Université de Paris, Paris 1959) pp. 229-231
4.46 J. Galambos: The Asymptotic Theory of Extreme Order Statistics, 2nd edn. (Krieger, Melbourne 1987)
4.47 J.A. Tawn: Bivariate extreme value theory: Models and estimation, Biometrika 75, 397-415 (1988)
4.48 D. Schmeidler: Integral representation without additivity, Proc. Am. Math. Soc. 97(2), 255-261 (1986)
4.49 R.R. Yager: Fusion of ordinal information using weighted median aggregation, Int. J. Approx. Reason. 18, 35-52 (1998)
4.50 T. Calvo, G. Beliakov: Aggregation functions based on penalties, Fuzzy Sets Syst. 161, 1420-1436 (2010)
4.51 M. Sugeno: Theory of fuzzy integrals and applications, Ph.D. Thesis (Tokyo Inst. of Technology, Tokyo 1974)
4.52 Z. Wang, G.J. Klir: Fuzzy Measure Theory (Plenum, New York 1992)
4.53 G. Choquet: Theory of capacities, Ann. Inst. Fourier 5(54), 131-295 (1953)
4.54 J.C. Fodor: An extension of Fung-Fu's theorem, Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 4, 235-243 (1996)
4.55 L.W. Fung, K.S. Fu: An axiomatic approach to rational decision making in a fuzzy environment. In: Fuzzy Sets and Their Applications to Cognitive and Decision Processes, ed. by L.A. Zadeh, K.S. Fu, K. Tanaka, M. Shimura (Academic, New York 1975) pp. 227-256


4.56 M. Grabisch, T. Murofushi, M. Sugeno (Eds.): Fuzzy Measures and Integrals. Theory and Applications (Physica, Heidelberg 2000)
4.57 A. Kolesárová, A. Stupňanová, J. Beganová: Aggregation-based extensions of fuzzy measures, Fuzzy Sets Syst. 194, 1-14 (2012)
4.58 L. Lovász: Submodular functions and convexity. In: Mathematical Programming: The State of the Art, ed. by A. Bachem, M. Grötschel, B. Korte (Springer, Berlin, Heidelberg 1983) pp. 235-257
4.59 G. Owen: Multilinear extensions of games, Manag. Sci. 18, 64-79 (1972)
4.60 F. Durante, A. Kolesárová, R. Mesiar, C. Sempi: Copulas with given diagonal sections: novel constructions and applications, Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 15(4), 397-410 (2007)
4.61 F. Durante, A. Kolesárová, R. Mesiar, C. Sempi: Copulas with given values on a horizontal and a vertical section, Kybernetika 43(2), 209-220 (2007)
4.62 S. Ovchinnikov, A. Dukhovny: Integral representation of invariant functionals, J. Math. Anal. Appl. 244, 228-232 (2000)
4.63 R. Mesiar, V. Jágr: d-dimensional dependence functions and Archimax copulas, Fuzzy Sets Syst. 228, 78-87 (2013)
4.64 R. Mesiar, C. Sempi: Ordinal sums and idempotents of copulas, Aequ. Math. 79(1-2), 39-52 (2010)
4.65 R. Mesiar, J. Szolgay: W-ordinal sums of copulas and quasi-copulas, Proc. MAGIA 2004 Conf. Kočovce (2004) pp. 78-83
4.66 R. Mesiar, V. Jágr, M. Juráňová, M. Komorníková: Univariate conditioning of copulas, Kybernetika 44(6), 807-816 (2008)
4.67 R. Mesiar, B. De Baets: New construction methods for aggregation operators, IPMU'2000 Int. Conf. Madrid (Springer, Berlin, Heidelberg 2000) pp. 701-706
4.68 M. Šabo, A. Kolesárová, Š. Varga: RET operators generated by triangular norms and copulas, Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 9, 169-181 (2001)
4.69 G. Deschrijver, E.E. Kerre: Aggregation operators in interval-valued fuzzy and Atanassov's intuitionistic fuzzy set theory. In: Fuzzy Sets and Their Extensions: Representation, Aggregation and Models, Studies in Fuzziness and Soft Computing, ed. by H. Bustince, F. Herrera, J. Montesa (Springer, Berlin, Heidelberg 2008) pp. 183-203
4.70 A.N. Serstnev: On a probabilistic generalization of metric spaces, Kazan. Gos. Univ. Ucen. Zap. 124, 3-11 (1964)
4.71 S. Saminger-Platz, C. Sempi: A primer on triangle functions I, Aequ. Math. 76(3), 201-240 (2008)
4.72 S. Saminger-Platz, C. Sempi: A primer on triangle functions II, Aequ. Math. 80(3), 239-268 (2010)
4.73 L.A. Zadeh: Fuzzy sets, Inform. Control 8, 338-353 (1965)


4.74 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part I, Inform. Sci. 8, 199-251 (1976)
4.75 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part II, Inform. Sci. 8, 301-357 (1975)
4.76 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part III, Inform. Sci. 9, 43-80 (1976)
4.77 D. Dubois, E.E. Kerre, R. Mesiar, H. Prade: Fuzzy interval analysis. In: Fundamentals of Fuzzy Sets, The Handbook of Fuzzy Sets Series, ed. by D. Dubois, H. Prade (Kluwer, Boston 2000) pp. 483-582
4.78 G. De Cooman, E.E. Kerre: Order norms on bounded partially ordered sets, J. Fuzzy Math. 2, 281-310 (1994)
4.79 F. Herrera, S. Alonso, F. Chiclana, E. Herrera-Viedma: Computing with words in decision making: Foundations, trends and prospects, Fuzzy Optim. Decis. Mak. 8(4), 337-364 (2009)
4.80 L.A. Zadeh: Computing with Words - Principal Concepts and Ideas, Stud. Fuzziness Soft Comput, Vol. 277 (Springer, Berlin, Heidelberg 2012)
4.81 L.A. Zadeh (Ed.): Computing with Words in Information/Intelligent System 1: Foundations, Stud. Fuzziness Soft Comput, Vol. 33 (Springer, Berlin, Heidelberg 2012) p.1


5. Monotone Measures-Based Integrals


Erich P. Klement, Radko Mesiar


The theory of classical measures and integrals reflects
a genuine property of several quantities
in standard physics and/or geometry, namely
σ-additivity. Though monotone measures not assuming
σ-additivity appeared naturally in models
extending the classical ones (for example, inner
and outer measures, where the related integral
was considered by Vitali already in 1925), their
intensive research was initiated in the past 40
years by computer science applications in areas
reflecting human decisions, such as economy,
psychology, multicriteria decision support, etc. In
this chapter, we summarize basic types of monotone
measures together with the basic monotone
measures-based integrals, including the Choquet
and Sugeno integrals, and we introduce the concept
of universal integrals proposed by Klement
et al. to give a common roof for all mentioned
integrals. Benvenuti's integrals linked to semicopulas
are shown to be a special class of universal
integrals. Besides several other integrals, we also
introduce decomposition integrals due to Even and
Lehrer, and show which decomposition integrals
are inside the framework of universal integrals.


5.1 Preliminaries, Choquet, and Sugeno Integrals
5.2 Benvenuti Integral
5.3 Universal Integrals
5.4 General Integrals Which Are Not Universal
5.5 Concluding Remarks, Application Fields
References


Before Cauchy, there was no definition of the integral
in the actual sense of the word definition, though
integration was already a well-established method applied
in many areas. Recall that constructive approaches
to integration can be traced as far back as ancient
Egypt around 1850 BC; the Moscow Mathematical Papyrus
(Problem 14) contains a formula for the volume of a frustum
of a square pyramid [5.1]. The first documented systematic
technique capable of determining integrals is
the method of exhaustion of the ancient Greek astronomer
Eudoxus of Cnidos (ca. 370 BC) [5.2], who
tried to find areas and volumes by approximating them
by a (large) number of shapes for which the area or volume
was known. This method was further developed
by Archimedes in the third century BC, who calculated the
area of parabolas and gave an approximation to the
area of a circle. Similar methods were independently
developed in China around the third century AD by Liu
Hui, who used them to find the area of the circle. This
was further developed in the fifth century by the Chinese
mathematicians Zu Chongzhi and Zu Geng to find
the volume of a sphere. In the same century, the Indian
mathematician Aryabhata used a similar method
in order to find the circumference of a circle. More
than 1000 years later, Johannes Kepler invented the
Kepler'sche Fassregel [5.3] (also known as Simpson's
rule) in order to compute the (approximate) volume
of (wine) barrels.

Based on the fundamental work of Isaac Newton
and Gottfried Wilhelm Leibniz in the 17th century
(see [5.4,5]), the first rigorous approach to integration
was given by Bernhard Riemann in his Habilitation
Thesis at the University of Göttingen [5.6]. Note that
Riemann generalized Cauchy's definition of the integral,
which was given for continuous real functions (of one
variable) defined on a closed interval [a, b].

Among several other developments of the integration
theory, recall the Lebesgue approach covering
measurable functions defined on a measurable space
and general σ-additive measures. Here we recall the
final words of H. Lebesgue from his lecture held at
a conference in Copenhagen on May 8, 1926, entitled
The Development of the Notion of the Integral (for the
full text see [5.7]):


... if you will, that a generalization made not for
the vain pleasure of generalizing, but rather for the
solution of problems previously posed, is always
a fruitful generalization. The diverse applications
which have already taken the concepts which we
have just examined prove this super-abundantly.

All approaches to integration mentioned so far are
related to measurable spaces, measurable real functions
and (σ-)additive real-valued measures. Though there
are many generalizations and modifications concerning
the range and domain of the considered functions and measures
(and thus integrals), in this chapter we will stay in
the above-mentioned framework, with the only exception
that the (σ-)additivity of measures is relaxed into
their monotonicity, thus covering many natural generalizations
of (σ-)additive measures, such as outer or inner
measures, lower or upper envelopes of systems of measures,
etc.

Perhaps the first approach to integration not relying
on additivity is due to Vitali [5.8]. Vitali
was looking for integration with respect to lower/upper
measures, and his approach is completely covered by
the later, more general approach of Choquet [5.9],
see Sect. 5.1. Note that the Choquet integral is a generalization
of the Lebesgue integral in the sense
that they coincide whenever the considered measure
is σ-additive (i.e., when the Lebesgue integral is
meaningful).

A completely different approach, influenced by the
early development of fuzzy set theory [5.10], is due
to Sugeno [5.11]. Sugeno even called his integral
a fuzzy integral (and considered set functions as fuzzy
measures), though there is no fuzziness in this concept
(Sect. 5.1). Later, several approaches generalizing
or modifying the above-mentioned integrals were introduced.
In this chapter, we give a brief overview
of these integrals, i.e., integrals based on monotone
measures. In the next section, some preliminaries and
basic notions are recalled, as well as the Choquet and
Sugeno integrals. Section 5.2 brings a generalization
of both the Choquet and Sugeno integrals, now known as
the Benvenuti integral. In Sect. 5.3, universal integrals
are introduced and discussed as a rather general framework
for monotone measures-based integrals, including copula-based
integrals, among others. In Sect. 5.4, we present
some integrals not giving back the underlying measure.
Finally, some possible applications are indicated and
some concluding remarks are added. Note that we will
not discuss integrals defined only for some special subclasses
of monotone measures, such as pseudoadditive
integrals [5.12, 13] or t-conorms-based integrals of Weber
[5.14]. Moreover, we restrict our considerations to
normed measures satisfying $m(X) = 1$, and to functions
with range contained in [0, 1]. This is done for the sake
of greater transparency, and the generalizations for
$m(X) \in \,]0, \infty]$ and functions with different ranges will
be covered by the relevant citations only.


5.1 Preliminaries, Choquet, and Sugeno Integrals 


For a fixed measurable space $(X, \mathcal{A})$, where $\mathcal{A}$ is a σ-algebra
of subsets of the universe $X$, we denote by
$\mathcal{F}_{(X,\mathcal{A})}$ the set of all $\mathcal{A}$-measurable functions $f : X \to
[0, 1]$, and by $\mathcal{M}_{(X,\mathcal{A})}$ the set of all monotone measures
$m : \mathcal{A} \to [0, 1]$ (i.e., $m(\emptyset) = 0$, $m(X) = 1$ and $m(A) \le
m(B)$ whenever $A \subseteq B \subseteq X$). Note that functions $f$ from
$\mathcal{F}_{(X,\mathcal{A})}$ can be seen as membership functions of fuzzy
events on $(X, \mathcal{A})$, and that monotone measures are in
different references also called fuzzy measures, capacities,
pre-measures, etc. Moreover, if $X$ is finite, we will
always consider $\mathcal{A} = 2^X$ only. In such a case, any monotone
measure $m \in \mathcal{M}_{(X,\mathcal{A})}$ is determined by $2^{|X|} - 2$
weights from [0, 1] (measures of proper subsets of $X$)
constrained by the monotonicity condition only, and to
each monotone measure $m : \mathcal{A} \to [0, 1]$ we can assign
its Möbius transform $M_m : \mathcal{A} \to \mathbb{R}$ given by

$$M_m(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} \cdot m(B). \tag{5.1}$$

Then

$$m(A) = \sum_{B \subseteq A} M_m(B). \tag{5.2}$$

Moreover, the dual monotone measure $m^d : \mathcal{A} \to [0, 1]$ is
given by $m^d(A) = 1 - m(A^c)$.
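
For a finite universe, formulas (5.1) and (5.2) and the dual measure can be computed directly. The following is a minimal sketch; the representation of a monotone measure as a dictionary keyed by `frozenset`, the helper names, and the sample measure on a two-element universe are assumptions of this example.

```python
from itertools import combinations


def subsets(s):
    """All subsets of s, returned as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def mobius(m, X):
    """M_m(A) = sum over B within A of (-1)^{|A minus B|} * m(B), cf. (5.1)."""
    return {A: sum((-1) ** len(A - B) * m[B] for B in subsets(A))
            for A in subsets(X)}


def from_mobius(M, X):
    """m(A) = sum over B within A of M_m(B), cf. (5.2)."""
    return {A: sum(M[B] for B in subsets(A)) for A in subsets(X)}


def dual(m, X):
    """m^d(A) = 1 - m(complement of A)."""
    X = frozenset(X)
    return {A: 1.0 - m[X - A] for A in subsets(X)}


X = {1, 2}
m = {frozenset(): 0.0, frozenset({1}): 0.3,
     frozenset({2}): 0.5, frozenset({1, 2}): 1.0}

M = mobius(m, X)          # M({1, 2}) = 1 - 0.3 - 0.5 = 0.2
m_back = from_mobius(M, X)
m_dual = dual(m, X)       # m^d({1}) = 1 - m({2}) = 0.5
```

The enumeration is exponential in $|X|$, so this sketch is meant for small universes only.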


Among several distinguished subclasses of monotone
measures from $\mathcal{M}_{(X,\mathcal{A})}$ we recall these classes,
supposing the finiteness of $X$:


• Additive measures, $m(A \cup B) = m(A) + m(B)$ whenever
$A \cap B = \emptyset$;

• Maxitive measures, $m(A \cup B) = m(A) \vee m(B)$ (these
measures are also called possibility measures [5.15,
16]);

• k-additive measures, $M_m(A) = 0$ whenever $|A| > k$
(hence additive measures are 1-additive);

• Belief measures, $M_m(A) \ge 0$ for all $A \subseteq X$;

• Plausibility measures, $m^d$ is a belief measure;

• Symmetric measures, $M_m(A)$ depends on $|A|$ only.


For more details on monotone measures, we recommend
[5.17–19] and [5.20].

Concerning the functions, for any $c \in [0, 1]$, $A \in \mathcal{A}$
we define a basic function $b(c, A) : X \to [0, 1]$ by

$$b(c, A)(x) = \begin{cases} c & \text{if } x \in A, \\ 0 & \text{else.} \end{cases}$$

Obviously, basic functions can be related to the characteristic
functions, $1_A = b(1, A)$ and $b(c, A) = c \cdot 1_A$.
However, as we are considering more general types of
multiplication than the standard product, in general, we
prefer not to depend in our considerations on the standard
product.

The first integral introduced for monotone measures
was proposed by Choquet [5.9] in 1953.


Definition 5.1
For a fixed monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$, a functional
$\mathrm{Ch}_m : \mathcal{F}_{(X,\mathcal{A})} \to [0, 1]$ given by

$$\mathrm{Ch}_m(f) = \int_0^1 m(f \ge t)\, dt \tag{5.3}$$

is called the Choquet integral (with respect to $m$), where
the right-hand side of (5.3) is the classical Riemann integral.

Note that the Choquet integral is well defined because
of the monotonicity of $m$. Observe that if $m$ is
σ-additive, i.e., if it is a probability measure on $(X, \mathcal{A})$,
then the function $h : [0, 1] \to [0, 1]$ given by $h(t) =
m(f \ge t)$ is the standard survival function of the random
variable $f$, and then $\mathrm{Ch}_m(f) = \int_0^1 h(t)\, dt = \int_X f\, dm$ is the
standard expectation of $f$ (i.e., Lebesgue integral of $f$
with respect to $m$).
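
On a finite universe, the Riemann integral in (5.3) reduces to a finite sum, because $t \mapsto m(f \ge t)$ is a step function. The sketch below illustrates this; the dictionary-based representation of $f$ and $m$, the function name `choquet`, and the sample data are assumptions of this example.

```python
from itertools import combinations


def choquet(f, m):
    """Ch_m(f) as the integral over [0, 1] of m({f >= t}) dt, for finite X.

    f maps each element of X to a value in [0, 1];
    m maps each frozenset of elements of X to a value in [0, 1].
    """
    total, prev = 0.0, 0.0
    for t in sorted(set(f.values())):
        # On the interval (prev, t] the level set {f >= s} equals {f >= t}.
        upper = frozenset(x for x in f if f[x] >= t)
        total += (t - prev) * m[upper]
        prev = t
    return total


X = ("a", "b", "c")
all_subsets = [frozenset(c) for r in range(len(X) + 1)
               for c in combinations(X, r)]

# An additive (probability) measure and a non-additive symmetric one.
p = {"a": 0.2, "b": 0.3, "c": 0.5}
additive = {A: sum(p[x] for x in A) for A in all_subsets}
squared = {A: (len(A) / len(X)) ** 2 for A in all_subsets}

f = {"a": 0.1, "b": 0.6, "c": 0.4}
expectation = sum(p[x] * f[x] for x in X)

ch_additive = choquet(f, additive)  # coincides with the expectation of f
ch_squared = choquet(f, squared)    # 0.1 * 1 + 0.3 * (4/9) + 0.2 * (1/9)
```

For the additive measure the result coincides with the ordinary expectation, as stated above; for the non-additive measure it generally does not.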


Due to Schmeidler [5.21, 22], we have the following
axiomatization of the Choquet integral.


Theorem 5.1

A functional $I : \mathcal{F}_{(X,\mathcal{A})} \to [0, 1]$, $I(1_X) = 1$, is the Choquet
integral with respect to the monotone measure $m \in
\mathcal{M}_{(X,\mathcal{A})}$ given by $m(A) = I(1_A)$ if and only if $I$ is
comonotone additive, i.e., if $I(f + g) = I(f) + I(g)$ for
all $f, g \in \mathcal{F}_{(X,\mathcal{A})}$ such that $f + g \in \mathcal{F}_{(X,\mathcal{A})}$ and $f$ and $g$
are comonotone, $(f(x) - f(y)) \cdot (g(x) - g(y)) \ge 0$ for any
$x, y \in X$.


We recall some properties of the Choquet integral.

It is evident that the Choquet integral $\mathrm{Ch}_m$ is an
increasing functional, $\mathrm{Ch}_m(f) \le \mathrm{Ch}_m(g)$ for any $m \in
\mathcal{M}_{(X,\mathcal{A})}$, $f, g \in \mathcal{F}_{(X,\mathcal{A})}$ such that $f \le g$. Moreover, for
each $A \in \mathcal{A}$ it holds $\mathrm{Ch}_m(b(c, A)) = c \cdot m(A)$, and especially
$\mathrm{Ch}_m(1_A) = m(A)$.
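
The properties just listed, together with the comonotone additivity from Theorem 5.1, can be checked numerically on a small example. The sketch below is self-contained; the function name `choquet`, the symmetric measure $m(A) = (|A|/3)^2$, and the sample functions are assumptions made for illustration.

```python
from itertools import combinations


def choquet(f, m):
    """Discrete Choquet integral of f (dict x -> value) w.r.t. m (dict on frozensets)."""
    total, prev = 0.0, 0.0
    for t in sorted(set(f.values())):
        upper = frozenset(x for x in f if f[x] >= t)
        total += (t - prev) * m[upper]
        prev = t
    return total


X = (0, 1, 2)
all_subsets = [frozenset(c) for r in range(len(X) + 1)
               for c in combinations(X, r)]
m = {A: (len(A) / len(X)) ** 2 for A in all_subsets}


def add(f, g):
    return {x: f[x] + g[x] for x in f}


f = {0: 0.1, 1: 0.2, 2: 0.3}
g = {0: 0.0, 1: 0.3, 2: 0.5}   # comonotone with f (same ordering of X)
h = {0: 0.5, 1: 0.3, 2: 0.0}   # not comonotone with f (reversed ordering)

# Basic function b(0.5, {0, 1}).
basic = {0: 0.5, 1: 0.5, 2: 0.0}

lhs_comonotone = choquet(add(f, g), m)
rhs_comonotone = choquet(f, m) + choquet(g, m)
lhs_other = choquet(add(f, h), m)
rhs_other = choquet(f, m) + choquet(h, m)
```

Additivity holds for the comonotone pair, while for the non-comonotone pair the two sides differ; for this particular (supermodular) measure the integral of the sum is the larger one.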


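Remark 5.1

i) Due to results of Sipos [5.23], see also [5.24],
the comonotone additivity of the functional $I$ in
Theorem 5.1, which implies its positive homogeneity,
$I(cf) = c \cdot I(f)$ for all $c > 0$ and $f \in \mathcal{F}_{(X,\mathcal{A})}$
such that $cf \in \mathcal{F}_{(X,\mathcal{A})}$, can be replaced by the positive
homogeneity of $I$ and its horizontal additivity,
i.e.,

$$I(f) = I(f \wedge a) + I(f - f \wedge a)$$

for all $f \in \mathcal{F}_{(X,\mathcal{A})}$ and $a \in [0, 1]$.
ii) The Choquet integral $\mathrm{Ch}_m : \mathcal{F}_{(X,\mathcal{A})} \to [0, 1]$ is continuous
from below,

$$\lim_{n \to \infty} \mathrm{Ch}_m(f_n) = \mathrm{Ch}_m(f)$$

whenever for $(f_n)_{n \in \mathbb{N}} \in \mathcal{F}_{(X,\mathcal{A})}^{\mathbb{N}}$ we have $f_n \le f_{n+1}$
for all $n \in \mathbb{N}$ and $f = \lim_{n \to \infty} f_n$, if and only if $m$ is
continuous from below,

$$\lim_{n \to \infty} m(A_n) = m(A)$$

whenever for $(A_n)_{n \in \mathbb{N}} \in \mathcal{A}^{\mathbb{N}}$ we have $A_n \subseteq A_{n+1}$
for all $n \in \mathbb{N}$ and $A = \bigcup_{n \in \mathbb{N}} A_n$.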

iii) The Choquet integral $\mathrm{Ch}_m : \mathcal{F}_{(X,\mathcal{A})} \to [0,1]$ is subadditive (superadditive),

$$\mathrm{Ch}_m(f+g) \le \mathrm{Ch}_m(f) + \mathrm{Ch}_m(g) \quad \bigl(\mathrm{Ch}_m(f+g) \ge \mathrm{Ch}_m(f) + \mathrm{Ch}_m(g)\bigr)$$

for all $f, g, f+g \in \mathcal{F}_{(X,\mathcal{A})}$, if and only if $m$ is submodular (supermodular),

$$m(A \cup B) + m(A \cap B) \le m(A) + m(B) \quad \bigl(m(A \cup B) + m(A \cap B) \ge m(A) + m(B)\bigr)$$

for all $A, B \in \mathcal{A}$.
iv) For any $m \in \mathcal{M}_{(X,\mathcal{A})}$ and $f \in \mathcal{F}_{(X,\mathcal{A})}$ it holds

$$\mathrm{Ch}_{m^d}(f) = 1 - \mathrm{Ch}_m(1 - f),$$

i.e., in the framework of aggregation functions [5.25] the dual to a Choquet integral (with respect to a monotone measure $m$) is again the Choquet integral (with respect to the dual monotone measure $m^d$).

For the proofs and details about the above results on the Choquet integral, we recommend [5.18, 19, 24, 26].

Restricting our considerations to finite universes, we have also the next evaluation formula due to Chateauneuf and Jaffray [5.27]

$$\mathrm{Ch}_m(f) = \sum_{A \subseteq X} M_m(A) \cdot \min\,(f(x) \mid x \in A). \tag{5.4}$$

In the Dempster-Shafer theory of evidence [5.28, 29], belief measures are considered, and then the Möbius transform $M_m : 2^X \setminus \{\emptyset\} \to [0,1]$ of a belief measure $m$ is called a basic probability assignment. Evidently, $M_m$ can be seen as a probability measure (of singletons) on the finite space $2^X \setminus \{\emptyset\}$ (with cardinality $2^{|X|} - 1$), and defining a function $F : 2^X \setminus \{\emptyset\} \to [0,1]$ by $F(A) = \min\,(f(x) \mid x \in A)$, the formula (5.4) can be seen as the Lebesgue integral of $F$ with respect to $M_m$ (i.e., it is the standard expectation of the variable $F$)

$$\mathrm{Ch}_m(f) = \sum_{A \in 2^X \setminus \{\emptyset\}} F(A) \cdot M_m(A).$$

Another genuine relationship of Choquet and Lebesgue integrals in the framework of the Dempster-Shafer theory is based on the fact that each belief measure $m$ can be seen as a lower envelope of the class of dominating probability measures, i.e., for each $A \subseteq X$ ($X$ is finite)

$$m(A) = \inf\,\{P(A) \mid P \ge m\}.$$

Then

$$\mathrm{Ch}_m(f) = \inf\left\{\int_X f\,\mathrm{d}P \;\middle|\; P \ge m\right\}.$$

Similarly, for the related plausibility measure $m^d$, it holds

$$\mathrm{Ch}_{m^d}(f) = \sup\left\{\int_X f\,\mathrm{d}P \;\middle|\; P \le m^d\right\} = \sup\left\{\int_X f\,\mathrm{d}P \;\middle|\; P \ge m\right\}.$$

For interested readers, we recommend the collection [5.30].

In general, for any monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$ and any measurable (continuous from below) function $f \in \mathcal{F}_{(X,\mathcal{A})}$ there is a probability measure $P_{m,f}$ on $(X, \mathcal{A})$ so that

$$\mathrm{Ch}_m(f) = \int_X f\,\mathrm{d}P_{m,f}, \tag{5.5}$$


see, e.g., [5.24, Theorem 2.6], where the right-hand side of (5.5) is the standard Lebesgue integral. Moreover, if $f, g \in \mathcal{F}_{(X,\mathcal{A})}$ are comonotone, one can find a single probability measure $P$ that expresses the Choquet integrals of both $f$ and $g$ with respect to $m$ as the Lebesgue integrals of $f$ and $g$ with respect to $P$, respectively. As an immediate consequence of (5.5), Jensen's inequality for the Choquet integral can be shown to be valid. Similarly, if $f$ and $g$ are comonotone, the above observations can be used to prove the Minkowski and Chebyshev inequalities. For more details, see [5.31].

For $k \in \mathbb{N}$, consider a probability measure $P$ on the product space $(X^k, \mathcal{A}^k)$, and define a set function $m : \mathcal{A} \to [0,1]$ by $m(A) = P(A^k)$. Then $m \in \mathcal{M}_{(X,\mathcal{A})}$ is a $k$-additive monotone measure (and a belief measure as well), and for all $f \in \mathcal{F}_{(X,\mathcal{A})}$ it holds

$$\mathrm{Ch}_m(f) = \int_{X^k} F\,\mathrm{d}P, \tag{5.6}$$

where $F : X^k \to [0,1]$ is given by $F(x_1, \ldots, x_k) = \min\,(f(x_1), \ldots, f(x_k))$. For more details see [5.32].




The Sugeno integral (called the fuzzy integral in the original sources) was introduced by Sugeno in 1972 in Japanese in [5.33] and in English in 1974 in [5.11]. Inspired by the fuzzy set theory introduced by Zadeh [5.10], Sugeno proposed a way to formalize human subjectivity in a spirit similar to randomness, but based only on ordinal scales. His concept is not fuzzy, though both fuzzy set theory and Sugeno's integral theory exploit the same aggregation functions (sup and inf), and considering functions $f \in \mathcal{F}_{(X,\mathcal{A})}$ as membership functions of fuzzy subsets of $X$, the corresponding Sugeno integral can be seen as a version of expectation of fuzzy sets.


Definition 5.2 
For a fixed monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$, a functional $\mathrm{Su}_m : \mathcal{F}_{(X,\mathcal{A})} \to [0,1]$ given by

$$\mathrm{Su}_m(f) = \sup\,\{\min\,(t, m(f \ge t)) \mid t \in [0,1]\} \tag{5.7}$$

is called the Sugeno integral (with respect to $m$).


There is an equivalent formula for the Sugeno integral, compare [5.11, Definition 3.1],

$$\mathrm{Su}_m(f) = \sup\,\{\min\,(m(A), \inf\,(f(x) \mid x \in A)) \mid A \in \mathcal{A}\}, \tag{5.8}$$

which in the case of finite $X$ (and using also the lattice notation $\sup = \vee$, $\min = \wedge$) can be rewritten as

$$\mathrm{Su}_m(f) = \bigvee_{A \subseteq X} \bigl(M_m^{\vee}(A) \wedge \min\,(f(x) \mid x \in A)\bigr), \tag{5.9}$$

showing the striking similarity with the evaluation formula (5.4) for the Choquet integral. Here the set function $M_m^{\vee} : 2^X \setminus \{\emptyset\} \to [0,1]$ is the so-called possibilistic Möbius transform introduced by Mesiar in [5.34] and given by

$$M_m^{\vee}(A) = \begin{cases} 0 & \text{if } m(A) = m(B) \text{ for some } B \subsetneq A, \\ m(A) & \text{else.} \end{cases}$$
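For a finite universe the two formulas (5.7) and (5.8) can be compared directly. The sketch below is only an illustration under stated assumptions: the helper names `sugeno` and `sugeno_over_sets`, the dictionary representation, and the sample measure $m(A) = (\operatorname{card} A / 3)^2$ are hypothetical choices, not taken from the source. It relies on the observation that in (5.7) the supremum over $t$ is attained at one of the finitely many values of $f$, because $m(f \ge t)$ is constant between consecutive values.

```python
from itertools import combinations

def sugeno(f, m):
    """Sugeno integral via (5.7); the sup over t is attained at a value of f."""
    xs = sorted(f, key=f.get)                       # increasing values of f
    return max(min(f[x], m[frozenset(xs[i:])]) for i, x in enumerate(xs))

def sugeno_over_sets(f, m):
    """Sugeno integral via (5.8): sup over nonempty A of min(m(A), min of f on A)."""
    return max(min(m[A], min(f[x] for x in A)) for A in m if A)

# Hypothetical example measure on X = {1, 2, 3}: m(A) = (card A / 3)^2.
X = (1, 2, 3)
m = {frozenset(s): (len(s) / len(X)) ** 2
     for r in range(len(X) + 1) for s in combinations(X, r)}

f = {1: 0.2, 2: 0.9, 3: 0.5}
print(sugeno(f, m), sugeno_over_sets(f, m))         # both evaluate to 4/9
```

The first variant needs only one pass over the sorted values, while the second enumerates all nonempty subsets and is therefore useful mainly as a cross-check on small universes.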


The Sugeno integral has properties similar to those of the Choquet integral. Indeed, it is a nondecreasing functional such that $\mathrm{Su}_m(b(c, A)) = c \wedge m(A)$, and in particular $\mathrm{Su}_m(1_A) = m(A)$. Moreover, $\mathrm{Su}_m$ is comonotone maxitive, i.e., $\mathrm{Su}_m(f \vee g) = \mathrm{Su}_m(f) \vee \mathrm{Su}_m(g)$ for any comonotone $f, g \in \mathcal{F}_{(X,\mathcal{A})}$, and min-homogeneous, $\mathrm{Su}_m(c \wedge f) = c \wedge \mathrm{Su}_m(f)$. We have the next axiomatization of the Sugeno integral due to Marichal [5.35] (compare with Theorem 5.1 for the Choquet integral).


Theorem 5.2 


A functional $I: \mathcal{F}_{(X,\mathcal{A})} \to [0,1]$, $I(1_X) = 1$, is the Sugeno integral with respect to the monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$ given by $m(A) = I(1_A)$ if and only if $I$ is comonotone maxitive and min-homogeneous.


For alternative axiomatizations see [5.24]. 
The Choquet and Sugeno integrals with respect to a monotone measure $m$ differ by at most $\tfrac{1}{4}$, i.e., for all $f \in \mathcal{F}_{(X,\mathcal{A})}$ it holds

$$|\mathrm{Ch}_m(f) - \mathrm{Su}_m(f)| \le \frac{1}{4}.$$

Moreover, $\mathrm{Ch}_m(f) = \mathrm{Su}_m(f)$ for all $f \in \mathcal{F}_{(X,\mathcal{A})}$ if and only if $m(A) \in \{0, 1\}$ for all $A \in \mathcal{A}$, and then

$$\mathrm{Ch}_m(f) = \mathrm{Su}_m(f) = \sup\,\{\inf\,\{f(x) \mid x \in A\} \mid m(A) = 1\},$$

which in case $X$ is finite turns out to be a lattice polynomial.


Note that if $X$ has cardinality $n$ and $m(A) \in \{0, 1\}$ for all $A \subseteq X$, then $\mathrm{Ch}_m = \mathrm{Su}_m : [0,1]^n \to [0,1]$ are the only $n$-ary continuous aggregation functions invariant under each automorphism $\varphi : [0,1] \to [0,1]$, i.e., $\varphi \circ \mathrm{Ch}_m(f) = \mathrm{Ch}_m(\varphi \circ f)$ for each $f \in [0,1]^n$ (for $f = (a_1, \ldots, a_n)$, $\varphi \circ f = (\varphi(a_1), \ldots, \varphi(a_n))$). For more details see [5.36].


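Example 5.1

i) Let $X = \{1, 2, 3\}$ and define $m : 2^X \to [0,1]$ by

$$m(A) = \begin{cases} 0 & \text{if } \operatorname{card} A \le 1, \\ 1 & \text{otherwise.} \end{cases}$$

Then, for each $f = (x, y, z) \in [0,1]^3$,

$$\mathrm{Ch}_m(f) = \mathrm{Su}_m(f) = (x \wedge y) \vee (x \wedge z) \vee (y \wedge z) = \operatorname{med}(x, y, z)$$

yields the classical median.
ii) Let $X = \{1, 2\}$ and define $m : 2^X \to [0,1]$ by $m(A) = \frac{\operatorname{card} A}{2}$. Then, for each $f = (x, y) \in [0,1]^2$,

$$\mathrm{Ch}_m(f) = \frac{x + y}{2}$$

(i.e., $\mathrm{Ch}_m$ is the standard arithmetic mean), while

$$\mathrm{Su}_m(f) = (x \wedge y) \vee \left(\tfrac{1}{2} \wedge (x \vee y)\right).$$

For $f_1 = \left(\tfrac{1}{2}, 1\right)$, $\mathrm{Ch}_m(f_1) = \tfrac{3}{4}$ and $\mathrm{Su}_m(f_1) = \tfrac{1}{2}$. For $f_2 = \left(0, \tfrac{1}{2}\right)$, $\mathrm{Ch}_m(f_2) = \tfrac{1}{4}$ and $\mathrm{Su}_m(f_2) = \tfrac{1}{2}$. In general,

$$|\mathrm{Ch}_m(f) - \mathrm{Su}_m(f)| = \frac{1}{2}\,\bigl(|x - y| \wedge |x + y - 1|\bigr) \le \frac{1}{4}.$$

iii) Let $X = [0,1]$, $\mathcal{A} = \mathcal{B}([0,1])$ and let $m : \mathcal{A} \to [0,1]$ be given by $m(A) = \lambda^p(A)$, where $p \in\, ]0, \infty[$ is a fixed constant and $\lambda : \mathcal{A} \to [0,1]$ is the standard Lebesgue measure. For any Lebesgue measure preserving function $f : X \to [0,1]$, such as $f(x) = x$, $f(x) = 1 - x$, or $f(x) = |2x - 1|$, we have

$$\mathrm{Ch}_m(f) = \int_0^1 m(f \ge t)\,\mathrm{d}t = \int_0^1 (1 - t)^p\,\mathrm{d}t = \frac{1}{p + 1},$$

$$\mathrm{Su}_m(f) = \sup\,\{\min\,(t, (1 - t)^p) \mid t \in [0,1]\} = c_p,$$

where $c_p$ is the unique solution of the equation $t = (1 - t)^p$, $t \in\, ]0, 1[$. Hence,

if $p = 1$, $\mathrm{Ch}_m(f) = \mathrm{Su}_m(f) = \tfrac{1}{2}$;
if $p = 2$, $\mathrm{Ch}_m(f) = \tfrac{1}{3}$ and $\mathrm{Su}_m(f) = \tfrac{3 - \sqrt{5}}{2} \approx 0.382$;
if $p = \tfrac{1}{2}$, $\mathrm{Ch}_m(f) = \tfrac{2}{3}$ and $\mathrm{Su}_m(f) = \tfrac{\sqrt{5} - 1}{2} \approx 0.618$.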
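The constant $c_p$ from part iii) of the example, i.e., the solution of $t = (1-t)^p$ in $]0,1[$, has no closed form for general $p$, but it is easy to approximate. The following sketch uses plain bisection (the function names and the tolerance are illustrative assumptions) and compares the result with $\frac{1}{p+1}$, the Choquet integral of the same function.

```python
def sugeno_value(p, tol=1e-12):
    """Approximate c_p, the solution of t = (1 - t)**p on ]0, 1[, by bisection.

    t - (1 - t)**p is strictly increasing on [0, 1], so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid < (1 - mid) ** p:
            lo = mid        # root lies to the right of mid
        else:
            hi = mid        # root lies at or to the left of mid
    return (lo + hi) / 2

def choquet_value(p):
    """Integral of (1 - t)**p over [0, 1]."""
    return 1 / (p + 1)

for p in (0.5, 1.0, 2.0):
    print(p, choquet_value(p), sugeno_value(p))
```

For $p = 2$ and $p = \tfrac{1}{2}$ the bisection reproduces the golden-ratio values $\tfrac{3-\sqrt{5}}{2}$ and $\tfrac{\sqrt{5}-1}{2}$, and for $p = 1$ both integrals equal $\tfrac{1}{2}$.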
5.2 Benvenuti Integral

Comparing Theorems 5.1 and 5.2, we see a striking similarity in the axiomatic characterization of the Choquet and Sugeno integrals. This similarity was generalized under a common roof by Benvenuti et al. [5.24], who called the integral introduced there the general fuzzy integral. This integral is now also known as the Benvenuti integral (compare [5.25]).

The Choquet integral is linked to the standard arithmetic operations $+$ and $\cdot$ on $[0, \infty]$, while the Sugeno integral deals with the lattice operations $\vee$ and $\wedge$ on $[0, 1]$. To generalize these two couples of operations, a pseudoaddition $\oplus$ and a pseudomultiplication $\odot$ were introduced in [5.24].

Definition 5.3

Let $u \in [1, \infty]$ be a fixed constant. An operation $\oplus : [0, u]^2 \to [0, u]$ is called a pseudoaddition on $[0, u]$ whenever it is associative, nondecreasing in both components, 0 is its neutral element, and $\oplus$ is continuous.

Observe that the structure $([0, u], \oplus)$ with $\oplus$ a pseudoaddition on $[0, u]$ is just an $I$-semigroup of Mostert and Shields [5.37] and hence $\oplus$ is also commutative. Moreover, considering the principles of Galois connections, we can introduce a pseudodifference $\ominus$ related to $\oplus$ satisfying, for all $a, b, c \in [0, u]$, $(a \ominus b) \le c$ if and only if $a \le b \oplus c$.

It is not difficult to see the link to the pseudodifference considered already by Weber [5.14].

Lemma 5.1

Let $\oplus : [0, u]^2 \to [0, u]$ be a given pseudoaddition on $[0, u]$. The related pseudodifference $\ominus : [0, u]^2 \to [0, u]$ is given by

$$a \ominus b = \inf\,\{c \in [0, u] \mid b \oplus c \ge a\}.$$


Considering the standard addition $+$ on $[0, \infty]$, and $a \ge b$, the corresponding (pseudo-)difference is the standard difference $a - b$. On the other hand, $\vee$ is a pseudoaddition on any interval $[0, u]$, and its corresponding pseudodifference $\ominus_{\vee}$ is given by

$$a \ominus_{\vee} b = \begin{cases} 0 & \text{if } a \le b, \\ a & \text{otherwise.} \end{cases}$$


Due to [5.37], each pseudoaddition $\oplus$ on $[0, u]$ can be represented as an ordinal sum,

$$a \oplus b = \begin{cases} g_k^{-1}\bigl(g_k(\beta_k) \wedge (g_k(a) + g_k(b))\bigr) & \text{if } (a, b) \in\, ]\alpha_k, \beta_k[^2, \\ a \vee b & \text{otherwise,} \end{cases}$$

where $(]\alpha_k, \beta_k[)_{k \in K}$ is a disjoint system of open subintervals of $[0, u]$, and $g_k : [\alpha_k, \beta_k] \to [0, \infty]$ is a continuous strictly increasing function such that $g_k(\alpha_k) = 0$, $k \in K$ ($K$ can also be empty). Two extremal cases correspond to $\oplus = \vee$ (when $K$ is empty) and to an Archimedean pseudoaddition $\oplus$ on $[0, u]$ generated by $g : [0, u] \to [0, \infty]$ (when $K$ is a singleton, say $K = \{1\}$, and $\alpha_1 = 0$, $\beta_1 = u$),

$$a \oplus b = g^{-1}\bigl(g(u) \wedge (g(a) + g(b))\bigr).$$

Then $g$ is called an additive generator of $\oplus$ and it is unique up to a positive multiplicative constant.

Note that if $g$ is a bijection, i.e., $g(u) = \infty$, then $a \oplus b = g^{-1}(g(a) + g(b))$ and $\oplus$ is called a strict pseudoaddition.
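To make the generator construction concrete, the following sketch builds an Archimedean pseudoaddition on $[0, 1]$ from an additive generator and evaluates the related pseudodifference $a \ominus b = \inf\{c \mid b \oplus c \ge a\}$ by bisection. Everything named here is an assumption chosen for illustration: the helper names, the choice $u = 1$, and the two sample generators $g(x) = x$ (nonstrict, giving the truncated sum) and $g(x) = -\log(1 - x)$ (strict, giving $a + b - ab$).

```python
import math

def archimedean_add(g, g_inv, u):
    """Pseudoaddition a (+) b = g_inv(min(g(u), g(a) + g(b))) on [0, u]."""
    g_u = g(u)
    def add(a, b):
        return g_inv(min(g_u, g(a) + g(b)))
    return add

def pseudo_difference(add, a, b, u, tol=1e-12):
    """inf{c in [0, u] : add(b, c) >= a}; bisection works because add is
    continuous and nondecreasing in its second argument."""
    if add(b, 0.0) >= a:
        return 0.0
    lo, hi = 0.0, u
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if add(b, mid) >= a:
            hi = mid
        else:
            lo = mid
    return hi

# Nonstrict example: g(x) = x, so a (+) b = min(1, a + b).
trunc = archimedean_add(lambda x: x, lambda y: y, 1.0)
# Strict example: g(x) = -log(1 - x) with g(1) = inf, so a (+) b = a + b - a*b.
prob = archimedean_add(lambda x: math.inf if x >= 1 else -math.log1p(-x),
                       lambda y: 1.0 - math.exp(-y), 1.0)

print(trunc(0.7, 0.6), prob(0.5, 0.5))
print(pseudo_difference(trunc, 0.7, 0.3, 1.0))   # close to 0.4
print(pseudo_difference(max, 0.7, 0.3, 1.0))     # close to 0.7, the max case
```

Passing the built-in `max` as the pseudoaddition reproduces the pseudodifference $\ominus_{\vee}$ given above, which is a convenient sanity check for the bisection.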

For a fixed pseudoaddition $\oplus$ on $[0, u]$, Benvenuti et al. [5.24] have introduced a $\oplus$-fitting pseudomultiplication $\odot$.


Definition 5.4 

Fix $u, v \in [1, \infty]$ and let $\oplus$ be a given pseudoaddition on $[0, u]$. A mapping $\odot : [0, u] \times [0, v] \to [0, u]$ is called a $\oplus$-fitting pseudomultiplication whenever it is nondecreasing in both components, 0 is its annihilator, i.e., $0 \odot b = a \odot 0 = 0$ for all $a \in [0, u]$, $b \in [0, v]$, it is left distributive over $\oplus$, i.e., $(a \oplus b) \odot c = (a \odot c) \oplus (b \odot c)$ for all $a, b \in [0, u]$, $c \in [0, v]$, and it is lower semicontinuous, i.e.,

$$\Bigl(\bigvee_{n \in \mathbb{N}} a_n\Bigr) \odot \Bigl(\bigvee_{m \in \mathbb{N}} b_m\Bigr) = \bigvee_{n, m \in \mathbb{N}} (a_n \odot b_m).$$


The left distributivity of a pseudomultiplication $\odot$ over $\vee$ simply means the nondecreasingness of $\odot$ in the first coordinate, and thus there are several kinds of $\vee$-fitting pseudomultiplications $\odot$. On the other hand, this is a rather restrictive constraint when $\oplus$ is Archimedean, i.e., generated by an additive generator $g : [0, u] \to [0, \infty]$.


Proposition 5.1 

Let $\oplus : [0, u]^2 \to [0, u]$ be an Archimedean pseudoaddition generated by an additive generator $g : [0, u] \to [0, \infty]$. A mapping $\odot : [0, u] \times [0, v] \to [0, u]$ is a $\oplus$-fitting pseudomultiplication if and only if there is a lower semicontinuous nondecreasing function $h : [0, v] \to [0, \infty]$ such that $h(w) = 0$ for some $w \in [0, v]$, and $g(u) \cdot h(a) \ge g(u)$ for all $a \in\, ]w, v]$, so that

$$a \odot b = g^{-1}\bigl(g(u) \wedge g(a) \cdot h(b)\bigr).$$

In particular, if $\oplus$ is a strict pseudoaddition, then $h : [0, v] \to [0, \infty]$ is a lower semicontinuous nondecreasing function satisfying $h(0) = 0$, and $a \odot b = g^{-1}(g(a) \cdot h(b))$.


Definition 5.5 

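Let $u, v \in [1, \infty]$ be fixed constants and let $\oplus : [0, u]^2 \to [0, u]$ be a given pseudoaddition, and $\odot : [0, u] \times [0, v] \to [0, u]$ be a given $\oplus$-fitting pseudomultiplication such that $1 \odot 1 \le 1$. For a fixed monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$, a functional $B_m^{\oplus,\odot} : \mathcal{F}_{(X,\mathcal{A})} \to [0, 1]$ given by

$$B_m^{\oplus,\odot}(f) = \sup\left\{\bigoplus_{i=1}^{n} (a_i \odot m(A_i)) \;\middle|\; n \in \mathbb{N},\ \bigoplus_{i=1}^{n} b(a_i, A_i) \le f,\ (A_i)_{i=1}^{n} \text{ is a chain}\right\}$$

is called the Benvenuti integral (with respect to $m$, based on $\oplus$ and $\odot$).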


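Observe that if $s \in \mathcal{F}_{(X,\mathcal{A})}$ is a simple function, $\operatorname{range} s = \{b_1, \ldots, b_n\}$, $b_1 < b_2 < \cdots < b_n$, then

$$B_m^{\oplus,\odot}(s) = \bigoplus_{i=1}^{n} (b_i \ominus b_{i-1}) \odot m(s \ge b_i),$$

with the convention $b_0 = 0$. Then for any $f \in \mathcal{F}_{(X,\mathcal{A})}$,

$$B_m^{\oplus,\odot}(f) = \sup\,\bigl\{B_m^{\oplus,\odot}(s) \mid s \in \mathcal{F}_{(X,\mathcal{A})} \text{ is simple},\ s \le f\bigr\}.$$

Evidently,

$$B_m^{\oplus,\odot}(b(a, A)) = a \odot m(A)$$

and hence

$$B_m^{\oplus,\odot}(1_A) = m(A)$$

for all $m \in \mathcal{M}_{(X,\mathcal{A})}$, $A \in \mathcal{A}$ only if $1 \odot b = b$ for all $b \in [0, 1]$.

If $\oplus$ is a strict pseudoaddition on $[0, u]$ generated by an additive generator $g$, this means that $\odot$ restricted to $[0, 1]^2$ is given by

$$a \odot b = g^{-1}\left(\frac{g(a) \cdot g(b)}{g(1)}\right).$$

If $\oplus$ is a nonstrict pseudoaddition, then there is no $\oplus$-fitting pseudomultiplication $\odot$ such that $1 \odot b = b$ for all $b \in [0, 1]$.

Note that for the standard arithmetic operations $+$ and $\cdot$ on $[0, \infty]$, $B_m^{+,\cdot} = \mathrm{Ch}_m$, i.e., the Choquet integral is recovered. Similarly, $B_m^{\vee,\wedge} = \mathrm{Su}_m$.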
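On a finite universe every function is simple, so the formula for simple functions can be evaluated directly with the operations passed in as parameters. The sketch below is an illustration under assumptions: the function name `benvenuti`, the dictionary representation, and the sample measure $m(A) = (\operatorname{card} A / 3)^2$ are hypothetical. Plugging in $(+, -, \cdot)$ gives the Choquet integral and plugging in $(\vee, \ominus_{\vee}, \wedge)$ gives the Sugeno integral, as stated above.

```python
from functools import reduce
from itertools import combinations

def benvenuti(f, m, add, diff, mul):
    """Evaluate (+)_{i} (b_i (-) b_{i-1}) (.) m(f >= b_i) for a finite f,
    where add, diff, mul play the roles of pseudoaddition, pseudodifference
    and pseudomultiplication."""
    terms, prev = [], 0.0
    for b in sorted(set(f.values())):                # b_1 < b_2 < ... < b_n
        level_set = frozenset(x for x in f if f[x] >= b)
        terms.append(mul(diff(b, prev), m[level_set]))
        prev = b
    return reduce(add, terms, 0.0)                   # 0 is neutral for add

# Hypothetical example measure on X = {1, 2, 3}: m(A) = (card A / 3)^2.
X = (1, 2, 3)
m = {frozenset(s): (len(s) / len(X)) ** 2
     for r in range(len(X) + 1) for s in combinations(X, r)}
f = {1: 0.2, 2: 0.9, 3: 0.5}

choquet = benvenuti(f, m, lambda a, b: a + b, lambda a, b: a - b,
                    lambda a, b: a * b)
sugeno = benvenuti(f, m, max, lambda a, b: a if a > b else 0.0, min)
print(choquet, sugeno)
```

Keeping the three operations as arguments mirrors the role of $\oplus$, $\ominus$ and $\odot$ in the definition; the sketch does not check that the supplied operations actually satisfy the axioms of Definitions 5.3 and 5.4.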




Example 5.2 

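i) Let $u = v = 1$, $\oplus = \vee$ and $\odot : [0,1]^2 \to [0,1]$ be given by $a \odot b = a^p \cdot b^q$, $p, q \in\, ]0, \infty[$. Then $B_m^{\vee,\odot}(f) = \sup\,\{t^p \cdot (m(f \ge t))^q \mid t \in [0, 1]\}$ for any $m \in \mathcal{M}_{(X,\mathcal{A})}$ and $f \in \mathcal{F}_{(X,\mathcal{A})}$, and $B_m^{\vee,\odot}(1_A) = (m(A))^q$. Note that if $p = q = 1$, the Shilkret integral $\mathrm{Sh}_m = B_m^{\vee,\cdot}$ is recovered, see [5.19, 38]. In general, $B_m^{\vee,\odot}(f) = \mathrm{Sh}_{m^q}(f^p)$ for any $f \in \mathcal{F}_{(X,\mathcal{A})}$.

ii) For a strict pseudoaddition $\oplus$ on $[0, u]$ and a $\oplus$-fitting pseudomultiplication $\odot$ on $[0, u] \times [0, v]$, see Proposition 5.1, the constraint $1 \odot 1 \le 1$ means $h(1) \le 1$, and then $B_m^{\oplus,\odot}(f) = g^{-1}\bigl(\mathrm{Ch}_{h(m)}(g(f))\bigr)$, i.e., $B_m^{\oplus,\odot}$ is obtained as a transformation of the Choquet integral.

For more details, we recommend the original source [5.24], but also [5.25, 39].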


Remark 5.2
When considering $u = 1$, a pseudoaddition $\oplus$ on $[0, 1]$ becomes a (continuous) $t$-conorm. Integrals based on $t$-conorms closely related to Benvenuti integrals were discussed by Murofushi and Sugeno [5.40], resulting in two classes of $t$-conorm-based integrals. Those based on the smallest $t$-conorm $\vee$ coincide with the Benvenuti integral based on $\vee$, with stronger requirements on the corresponding $\vee$-fitting pseudomultiplication $\odot$. The second class, based on continuous Archimedean $t$-conorms, is a special transform of the Choquet integral, compare Example 5.2 ii),

$$\mathrm{MS}_m(f) = k\bigl(\mathrm{Ch}_{h(m)}(g(f))\bigr),$$

with appropriately chosen functions $k, h, g : [0, 1] \to [0, \infty]$. Note that the Murofushi-Sugeno integral covers also the integral of Weber [5.14] based on strict $t$-conorms. Another closely related approach to integration, fixing $u = v = \infty$, can be found in [5.41], where Choquet-like integrals were introduced and discussed. For more details on these types of integrals we refer to [5.42, 43].


5.3 Universal Integrals


The concept of universal integrals on $[0, \infty]$ was proposed and discussed in [5.44]. As already mentioned, we will restrict our considerations to the interval $[0, 1]$.


Definition 5.6
Let $\mathcal{S}$ be the class of all measurable spaces. A mapping

$$I : \bigcup_{(X,\mathcal{A}) \in \mathcal{S}} \bigl(\mathcal{M}_{(X,\mathcal{A})} \times \mathcal{F}_{(X,\mathcal{A})}\bigr) \to [0, 1]$$

is called a universal integral whenever it satisfies

UI1 $I$ is nondecreasing in both components;

UI2 there is a semicopula $\otimes : [0, 1]^2 \to [0, 1]$ (i.e., $\otimes$ is nondecreasing in both components and $1 \otimes a = a \otimes 1 = a$ for all $a \in [0, 1]$) such that $I(m, b(a, E)) = a \otimes m(E)$ for all $a \in [0, 1]$, any $(X, \mathcal{A}) \in \mathcal{S}$, $m \in \mathcal{M}_{(X,\mathcal{A})}$ and $E \in \mathcal{A}$;

UI3 $I(m_1, f_1) = I(m_2, f_2)$ whenever $(m_i, f_i) \in \mathcal{M}_{(X_i,\mathcal{A}_i)} \times \mathcal{F}_{(X_i,\mathcal{A}_i)}$, $i = 1, 2$, and $m_1(f_1 \ge t) = m_2(f_2 \ge t)$ for all $t \in [0, 1]$.


Observe that the axiom (UI1) reflects the standard monotonicity of integrals. On the other hand,


(UI2) expresses the fact that an integral of a basic function $b(a, E)$ with respect to a monotone measure $m$ depends on the values $a$ and $m(E)$ only, independently of the considered measurable space $(X, \mathcal{A})$ and the monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$ (compare the truth values principle in propositional logic). Finally, (UI3) generalizes the well-known fact from probability theory that two random variables (defined possibly on two different probability spaces) have the same expectation whenever their distribution functions coincide (in fact, for a probability measure $P$, $P(f \ge t)$ defines a survival function which is complementary to the related distribution function).

There are several construction methods for universal integrals. First of all, for any given semicopula $\otimes : [0, 1]^2 \to [0, 1]$, one can introduce the smallest universal integral $I_{\otimes}$ and the greatest universal integral $I^{\otimes}$ related to $\otimes$ through (UI2):

$$I_{\otimes}(m, f) = \sup\,\{t \otimes m(f \ge t) \mid t \in [0, 1]\}$$

and

$$I^{\otimes}(m, f) = \operatorname{essup}_m(f) \otimes m(\operatorname{supp} f),$$

where

$$\operatorname{essup}_m(f) = \sup\,\{t \in [0, 1] \mid m(f \ge t) > 0\}$$

and

$$\operatorname{supp} f = \{x \in X \mid f(x) > 0\}.$$

Observe that $I_{\wedge}(m, \cdot) = \mathrm{Su}_m$ is the Sugeno integral, $I_{\Pi}(m, \cdot) = \mathrm{Sh}_m$ is the Shilkret integral ($\Pi$ denotes the product semicopula), while $I_T$ with $T$ a strict $t$-norm is an integral introduced by Weber in [5.45].
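For a finite universe the smallest universal integral $I_{\otimes}$ can be evaluated by scanning the values of $f$, since $m(f \ge t)$ is constant between consecutive values and $\otimes$ is nondecreasing in its first argument. The sketch below is a minimal illustration; the helper name `smallest_universal`, the dictionary representation, and the sample measure $m(A) = (\operatorname{card} A / 3)^2$ are assumptions, not taken from the source.

```python
from itertools import combinations

def smallest_universal(f, m, semicopula):
    """sup over t of semicopula(t, m(f >= t)); on a finite universe the
    supremum is attained at one of the values of f."""
    return max(
        semicopula(t, m[frozenset(x for x in f if f[x] >= t)])
        for t in set(f.values())
    )

# Hypothetical example measure on X = {1, 2, 3}: m(A) = (card A / 3)^2.
X = (1, 2, 3)
m = {frozenset(s): (len(s) / len(X)) ** 2
     for r in range(len(X) + 1) for s in combinations(X, r)}
f = {1: 0.2, 2: 0.9, 3: 0.5}

sugeno = smallest_universal(f, m, min)                      # semicopula = min
shilkret = smallest_universal(f, m, lambda a, b: a * b)     # semicopula = product
print(sugeno, shilkret)
```

Because the product is pointwise below the minimum on $[0,1]^2$, the Shilkret value printed here cannot exceed the Sugeno value.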

Considering the Benvenuti integral based on a pseudoaddition $\oplus$ on $[0, u]$ and a $\oplus$-fitting pseudomultiplication $\odot : [0, u] \times [0, v] \to [0, u]$, $u, v \in [1, \infty]$, such that $\otimes = \odot|_{[0, 1]^2}$ is a semicopula, one gets a universal integral given by

$$I^{\oplus,\odot}(m, f) = B_m^{\oplus,\odot}(f).$$

Note that $I^{+,\cdot}(m, f) = \mathrm{Ch}_m(f)$ and $I^{\vee,\wedge}(m, f) = \mathrm{Su}_m(f)$.

As an important class of universal integrals we introduce copula-based integrals. Recall that a semicopula $C : [0, 1]^2 \to [0, 1]$ is called a copula [5.46] whenever it is supermodular, i.e., for any $x, y \in [0, 1]^2$ it holds

$$C(x \vee y) + C(x \wedge y) \ge C(x) + C(y).$$

Note that there is a one-to-one correspondence between copulas and probability measures on Borel subsets of $[0, 1]^2$ with uniformly distributed margins; this relation is stated by the equality

$$P_C([0, a] \times [0, b]) = C(a, b), \quad (a, b) \in [0, 1]^2.$$

The next result is extracted from [5.44], also compare [5.47, 48].


Proposition 5.2 
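Let $C : [0, 1]^2 \to [0, 1]$ be a fixed copula. Then the mapping

$$K_C : \bigcup_{(X,\mathcal{A}) \in \mathcal{S}} \bigl(\mathcal{M}_{(X,\mathcal{A})} \times \mathcal{F}_{(X,\mathcal{A})}\bigr) \to [0, 1]$$

given by

$$K_C(m, f) = P_C\bigl(\{(u, v) \in [0, 1]^2 \mid v \le m(f \ge u)\}\bigr)$$

is a universal integral (with $C$ being the corresponding semicopula).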


Note that for the product copula $\Pi$, $K_{\Pi}(m, \cdot) = \mathrm{Ch}_m$ is the Choquet integral, while for the greatest copula $\wedge = \mathrm{Min}$, $K_{\wedge}(m, \cdot) = \mathrm{Su}_m$ is the Sugeno integral. For the smallest copula $W : [0, 1]^2 \to [0, 1]$ given by

$$W(a, b) = \max\,(0, a + b - 1),$$

$K_W$ was called the opposite Sugeno integral in [5.49] and it is given by

$$K_W(m, f) = \lambda\bigl(\{t \in [0, 1] \mid m(f \ge t) \ge 1 - t\}\bigr),$$

where $\lambda$ is the standard Lebesgue measure on Borel subsets of $[0, 1]$.


Remark 5.3 
The class of universal integrals is convex, i.e., for universal integrals $I_1$, $I_2$ and a constant $c \in [0, 1]$, also

$$I = c\,I_1 + (1 - c)\,I_2$$

is a universal integral (related to the semicopula $\otimes = c \cdot \otimes_1 + (1 - c) \cdot \otimes_2$).
Though the class of semicopulas is also convex, for the weakest universal integrals we can ensure only the inequality

$$I_{c \cdot \otimes_1 + (1 - c) \cdot \otimes_2} \le c\,I_{\otimes_1} + (1 - c)\,I_{\otimes_2}.$$

On the other hand, for the convex class of copulas it holds

$$K_{c\,C_1 + (1 - c)\,C_2} = c\,K_{C_1} + (1 - c)\,K_{C_2},$$

i.e., the class of copula-based integrals is convex.




5.4 General Integrals Which Are Not Universal 


There are several integrals defined on any measurable space $(X, \mathcal{A})$, for any monotone measure $m \in \mathcal{M}_{(X,\mathcal{A})}$ and any function $f \in \mathcal{F}_{(X,\mathcal{A})}$, which are not universal. We recall two of them based on the standard arithmetic operations $+$ and $\cdot$.


Definition 5.7 
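A mapping

$$G : \bigcup_{(X,\mathcal{A}) \in \mathcal{S}} \bigl(\mathcal{M}_{(X,\mathcal{A})} \times \mathcal{F}_{(X,\mathcal{A})}\bigr) \to [0, \infty]$$

given by

$$G(m, f) = \sup\left\{\sum_{i=1}^{n} a_i \cdot m(A_i) \;\middle|\; n \in \mathbb{N},\ a_1, \ldots, a_n \ge 0,\ \sum_{i=1}^{n} b(a_i, A_i) \le f \text{ and } (A_i)_{i=1}^{n} \text{ is a disjoint subsystem of } \mathcal{A}\right\}$$

is called a PAN-integral.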


Note that this integral was introduced by Yang [5.50], see also [5.51], in a more general setting on $[0, \infty]$ involving operations $\oplus$ and $\odot$. Due to the results of [5.52], each PAN-integral on $[0, 1]$ is either a transformation of the integral given in Definition 5.7, $I(m, f) = g^{-1}(G(g(m), g(f)))$ for some automorphism $g : [0, 1] \to [0, 1]$, or, if $\oplus = \vee$, it is a special instance of the integrals $I_{\otimes}$ discussed in Sect. 5.3. Also observe that a deep discussion of the PAN-integral $G$ can be found in [5.53].

The PAN-integral allows one to recognize the underlying monotone measure $m$ only if $m$ is superadditive. Moreover, as a major defect of this integral we recall that it does not exclude the equality of integrals based on two different monotone measures, i.e., there are monotone measures $m_1, m_2 \in \mathcal{M}_{(X,\mathcal{A})}$, $m_1 \ne m_2$, such that $G(m_1, f) = G(m_2, f)$ for all $f \in \mathcal{F}_{(X,\mathcal{A})}$. Note that the PAN-integral coincides with the Lebesgue integral whenever $m$ is $\sigma$-additive. A similar situation is linked to the concave integral introduced by Lehrer [5.54], see also [5.55].


Definition 5.8 
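A mapping

$$L : \bigcup_{(X,\mathcal{A}) \in \mathcal{S}} \bigl(\mathcal{M}_{(X,\mathcal{A})} \times \mathcal{F}_{(X,\mathcal{A})}\bigr) \to [0, \infty]$$

given by

$$L(m, f) = \sup\left\{\sum_{i=1}^{n} a_i \cdot m(A_i) \;\middle|\; n \in \mathbb{N},\ a_1, \ldots, a_n \ge 0,\ \sum_{i=1}^{n} b(a_i, A_i) \le f\right\}$$

is called a concave integral.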


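Observe that this integral is concave in the sense that for each $m \in \mathcal{M}_{(X,\mathcal{A})}$, $f, g \in \mathcal{F}_{(X,\mathcal{A})}$ and $c \in [0, 1]$,

$$L(m, cf + (1 - c)g) \ge c\,L(m, f) + (1 - c)\,L(m, g).$$

The concave integral coincides with the Choquet integral whenever $m$ is supermodular. However, also here $L(m_1, f) = L(m_2, f)$ may hold for all $f \in \mathcal{F}_{(X,\mathcal{A})}$ for some monotone measures $m_1, m_2 \in \mathcal{M}_{(X,\mathcal{A})}$, $m_1 \ne m_2$. Finally, recall that it trivially holds

$$L(m, f) \ge G(m, f) \quad \text{and} \quad L(m, f) \ge \mathrm{Ch}_m(f)$$

for all $m \in \mathcal{M}_{(X,\mathcal{A})}$ and $f \in \mathcal{F}_{(X,\mathcal{A})}$.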


Example 5.3 

i) Consider $X = [0, 1]$, $\mathcal{A} = \mathcal{B}([0, 1])$ and $\lambda$ the standard Lebesgue measure on $\mathcal{A}$. Let $m = \lambda^p$, $p \in\, ]0, 1[$. Then for any $f \in \mathcal{F}_{(X,\mathcal{A})}$ with nonvanishing support (i.e., $m(f > 0) > 0$) it holds

$$G(m, f) = L(m, f) = +\infty.$$

On the other hand, for $m = \lambda^2$ (observe that $m$ is supermodular, and thus also superadditive) we get, considering $f = \mathrm{id}_X$,

$$G(m, f) = \frac{2}{\sqrt{3}} - 1 \quad \text{while} \quad L(m, f) = \mathrm{Ch}_m(f) = \frac{1}{3}.$$


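ii) For $X = \{1, 2, 3\}$ and $\mathcal{A} = 2^X$, let $m_a : \mathcal{A} \to \mathbb{R}$ be given by $m_a(\emptyset) = 0$, $m_a(A) = 0.1$ if $\operatorname{card} A = 1$, $m_a(A) = a$ if $\operatorname{card} A = 2$ and $m_a(X) = 1$. Evidently, $m_a \in \mathcal{M}_{(X,\mathcal{A})}$ if and only if $a \in [0.1, 1]$. Let $f \in \mathcal{F}_{(X,\mathcal{A})}$ be given by $f(1) = \tfrac{1}{3}$, $f(2) = \tfrac{2}{3}$, $f(3) = 1$. Then

$$G(m_a, f) = \sup\left\{\tfrac{1}{3} \cdot 1,\ \tfrac{1}{3} \cdot 0.1 + \tfrac{2}{3} \cdot a\right\} = \begin{cases} \tfrac{1}{3} & \text{if } a \in [0.1, 0.45], \\ \tfrac{0.1 + 2a}{3} & \text{if } a \in\, ]0.45, 1], \end{cases}$$

and

$$L(m_a, f) = \sup\left\{\tfrac{1}{3} \cdot 1 + \tfrac{1}{3} \cdot a + \tfrac{1}{3} \cdot 0.1,\ \tfrac{1}{3} \cdot 1 + \tfrac{1}{3} \cdot 0.1 + \tfrac{2}{3} \cdot 0.1,\ \tfrac{1}{3} \cdot a + \tfrac{2}{3} \cdot a\right\} = \begin{cases} \tfrac{13}{30} & \text{if } a \in [0.1, 0.2[, \\ \tfrac{1.1 + a}{3} & \text{if } a \in [0.2, 0.55], \\ a & \text{if } a \in\, ]0.55, 1]. \end{cases}$$

Moreover,

$$\mathrm{Ch}_{m_a}(f) = \frac{1.1 + a}{3}.$$

Observe that $m_a$ is supermodular if and only if $a \in [0.2, 0.55]$ and then

$$L(m_a, f) = \mathrm{Ch}_{m_a}(f) = \frac{1.1 + a}{3}.$$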


iii) For $X$ finite and $m \in \mathcal{M}_{(X,\mathcal{A})}$ such that $m(A) \in \{0, 1\}$ for all $A \subseteq X$, all universal integrals coincide, independently of the underlying semicopula $\otimes$, $I(m, f) = \sup\,\{\min\,(f(x) \mid x \in A) \mid m(A) = 1\}$. However, this holds neither for the PAN-integral $G(m, \cdot)$ nor for the concave integral $L(m, \cdot)$. Consider as an example the greatest monotone measure $m^* \in \mathcal{M}_{(X, 2^X)}$ given by

$$m^*(A) = \begin{cases} 0 & \text{if } A = \emptyset, \\ 1 & \text{else.} \end{cases}$$

Then for any universal integral $I$ it holds $I(m^*, f) = \max\,(f(x) \mid x \in X)$, but $G(m^*, f) = L(m^*, f) = \sum_{x \in X} f(x)$.

iv) The only monotone measures $m \in \mathcal{M}_{(X, 2^X)}$, $X$ finite, such that all universal integrals as well as the PAN and concave integrals coincide, are the so-called unanimity measures $m_B$, $B \subseteq X$, $B \ne \emptyset$,

$$m_B(A) = \begin{cases} 1 & \text{if } B \subseteq A, \\ 0 & \text{else.} \end{cases}$$

Then

$$I(m_B, f) = G(m_B, f) = L(m_B, f) = \min\,(f(x) \mid x \in B).$$
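The PAN-integral in Example 5.3 ii) can be checked by brute force on a finite universe. The sketch below rests on two observations that are assumptions of this illustration rather than statements from the source: a disjoint subsystem can always be completed to a partition without lowering the sum (all terms are nonnegative), and for a fixed block $A_i$ the best coefficient is $a_i = \min\,(f(x) \mid x \in A_i)$. The helper names `partitions`, `m_a` and `pan` are hypothetical.

```python
def partitions(items):
    """Yield all set partitions of a list, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                     # first forms its own block
        for i in range(len(part)):                 # or joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def m_a(a):
    """Measure from Example 5.3 ii): depends only on the cardinality of A."""
    by_card = {0: 0.0, 1: 0.1, 2: a, 3: 1.0}
    return lambda A: by_card[len(A)]

def pan(f, m):
    """PAN-integral on a finite universe: maximise over partitions the sum
    of (min of f on block) * m(block)."""
    return max(
        sum(min(f[x] for x in block) * m(block) for block in part)
        for part in partitions(list(f))
    )

f = {1: 1 / 3, 2: 2 / 3, 3: 1.0}
for a in (0.3, 0.45, 0.8):
    print(a, pan(f, m_a(a)))
```

For these three values of $a$ the search returns $\tfrac{1}{3}$, $\tfrac{1}{3}$ and $\tfrac{0.1 + 2 \cdot 0.8}{3}$, in agreement with a switch at $a = 0.45$. The enumeration grows with the Bell numbers, so it is only meant for very small universes.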


Recently, a new concept of decomposition integrals 
was proposed in [5.56], unifying the PAN integral G, 
the concave integral L, the Choquet integral Ch, and the 
Shilkret integral Sh. 


Definition 5.9 

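Let $(X, \mathcal{A})$ be a measurable space and let $\mathcal{H}$ be a system of some finite subsystems (i.e., of collections) from $\mathcal{A}$. Then the mapping

$$D_{\mathcal{H}} : \mathcal{M}_{(X,\mathcal{A})} \times \mathcal{F}_{(X,\mathcal{A})} \to [0, \infty]$$

given by

$$D_{\mathcal{H}}(m, f) = \sup\left\{\sum_{i \in I} a_i\,m(A_i) \;\middle|\; a_i \ge 0,\ i \in I,\ \sum_{i \in I} b(a_i, A_i) \le f,\ (A_i)_{i \in I} \in \mathcal{H}\right\}$$

is called an $\mathcal{H}$-decomposition integral.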


Consider the next decomposition systems 


$\mathcal{H}^{(n)} = \{(A_i)_{i=1}^{n} \text{ is a chain in } \mathcal{A}\}$, $n \in \mathbb{N}$;
$\mathcal{H}_G = \{(A_i)_{i \in I} \text{ is a finite measurable partition of } X\}$;
HL=A; 

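$\mathcal{H}_{\mathrm{Ch}} = \{\mathcal{B} \mid \mathcal{B} \text{ is a finite chain in } \mathcal{A}\}$.


Then
$D_{\mathcal{H}^{(1)}}(m, \cdot) = \mathrm{Sh}_m$;
$D_{\mathcal{H}_G} = G$;
$D_{\mathcal{H}_L} = L$;
$D_{\mathcal{H}_{\mathrm{Ch}}}(m, \cdot) = \mathrm{Ch}_m$.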


Fig. 5.1a-c The function $\lambda(\mathrm{id}_X \ge t)$ with shaded areas expressing the corresponding integrals $D_{\mathcal{H}^{(1)}}(\lambda, \mathrm{id}_X)$ (a), $D_{\mathcal{H}^{(2)}}(\lambda, \mathrm{id}_X)$ (b), $\mathrm{Ch}_{\lambda}(\mathrm{id}_X)$ (c)


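Further, the only decomposition integrals which are also universal integrals are the Choquet integral and the $\mathcal{H}^{(n)}$-decomposition integrals $D_{\mathcal{H}^{(n)}}$, and they satisfy

$$\mathrm{Sh} = D_{\mathcal{H}^{(1)}} \le D_{\mathcal{H}^{(2)}} \le \cdots \le D_{\mathcal{H}^{(n)}} \le \cdots \le \mathrm{Ch}.$$

Observe that if $X$ is finite, $\operatorname{card} X = n$, then $D_{\mathcal{H}^{(n)}} = \mathrm{Ch}$ and that

$$\mathrm{Ch} = \lim_{n \to \infty} D_{\mathcal{H}^{(n)}} = \sup\,\{D_{\mathcal{H}^{(n)}} \mid n \in \mathbb{N}\}.$$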

For more details and further discussion about decom- 
position integrals, we recommend [5.56-58]. 


Example 5.4 
Using the notation from Example 5.3 i), it holds

$$D_{\mathcal{H}^{(n)}}(\lambda, \mathrm{id}_X) = \frac{n}{2(n + 1)}$$

and

$$\lim_{n \to \infty} \frac{n}{2(n + 1)} = \frac{1}{2} = \mathrm{Ch}_{\lambda}(\mathrm{id}_X).$$

For better understanding, see Fig. 5.1 with the graph of the function $\lambda(\mathrm{id}_X \ge t)$ and with shaded areas expressing the corresponding integrals.
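The value $\frac{n}{2(n+1)}$ can be made tangible with one explicit chain. The sketch below assumes, as an illustration not spelled out in the text, the equally spaced chain $A_i = [\tfrac{i}{n+1}, 1]$ with coefficients $a_i = \tfrac{1}{n+1}$, $i = 1, \ldots, n$; the resulting staircase $\sum_i a_i 1_{A_i}$ stays below $\mathrm{id}_X$, and its weighted sum of Lebesgue measures equals $\frac{n}{2(n+1)}$ exactly. The function name `chain_value` is hypothetical.

```python
from fractions import Fraction

def chain_value(n):
    """Sum of a_i * lambda(A_i) for the chain A_i = [i/(n+1), 1] with
    a_i = 1/(n+1), i = 1..n, computed in exact rational arithmetic."""
    step = Fraction(1, n + 1)
    return sum(step * (1 - i * step) for i in range(1, n + 1))

for n in (1, 2, 10, 1000):
    print(n, chain_value(n), float(chain_value(n)))
```

For $n = 1$ this gives $\tfrac{1}{4}$ and for $n = 2$ it gives $\tfrac{1}{3}$, and the values increase towards $\tfrac{1}{2}$ as $n$ grows, matching the limit stated in the example.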


5.5 Concluding Remarks, Application Fields 


We have recalled and discussed several kinds of integrals defined on any measurable space for any monotone measure and any nonnegative measurable function, restricting our considerations to the unit interval $[0, 1]$. There are several possible extensions of these integrals to the bipolar scale $[-1, 1]$, i.e., for integrating functions with range in $[-1, 1]$. Recall only the case of the Choquet integral with bipolar extensions of different kinds, such as:

• Asymmetric Choquet integral,

$$\mathrm{Ch}_m^{\mathrm{as}}(f) = \mathrm{Ch}_m(f^+) - \mathrm{Ch}_{m^d}(f^-),$$

where $f^+ : X \to [0, 1]$ is given by $f^+(x) = \max\,(0, f(x))$, $f^- : X \to [0, 1]$ is given by $f^-(x) = \max\,(0, -f(x))$, and $m^d : \mathcal{A} \to [0, 1]$ is the monotone measure dual to $m$. For more details see [5.18, 19, 26];

• Symmetric (Šipoš) Choquet integral,

$$\mathrm{Ch}_m^{\mathrm{sy}}(f) = \mathrm{Ch}_m(f^+) - \mathrm{Ch}_m(f^-),$$

see [5.18, 19, 23, 26];

• In the case when $X$ is finite, two further extensions, called the balanced Choquet integral [5.59] and the merging Choquet integral [5.60], reflecting (partial) compensation of positive and negative inputs, were also introduced and discussed. Further generalizations form the background of the cumulative prospect theory (CPT) of Tversky and Kahneman [5.61, 62]; there, however, two monotone measures are considered,

$$\mathrm{Ch}_{m_1, m_2}(f) = \mathrm{Ch}_{m_1}(f^+) - \mathrm{Ch}_{m_2}(f^-).$$

Observe that the economic applications of CPT contributed to the Nobel Prize in Economics awarded to Kahneman in 2002 (Tversky had died in 1996).


Some of the integrals discussed here were introduced to solve practical problems. For example, the concave integral of Lehrer [5.54] is a solution of an optimization problem looking for a maximal global performance.

Among the many fields where the integrals discussed in this chapter are an important tool, we recall decision making under multiple criteria, multiobjective optimization, multiperson decision making, pattern recognition and classification, image analysis, etc. For more details, we recommend [5.25, Appendix B] or [5.19].
References 
5.1 R.C. Archibald: Mathematics before the Greeks, Science 71(1831), 109-121 (1930)
5.2 D. Smith: History of Mathematics (Dover Publications, New York 1958)
5.3 J. Kepler: Nova Stereometria Doliorum Vinariorum (Linz 1615)
5.4 G.W. Leibniz: Nova methodus pro maximis et minimis (New method for maximums and minimums; 1684). In: A Source Book in Mathematics, ed. by D.J. Struik (Harvard Univ. Press, Cambridge 1969) p. 271
5.5 I. Newton: Principia (1687) (S. Chandrasekhar: Newton's Principia for the Common Reader, Oxford Univ. Press, Oxford, 1995)
5.6 G.F.B. Riemann: On the Hypotheses Which Underlie Geometry, Habilitation Thesis (Universität Göttingen, Göttingen 1854), published first in Proc. R. Philos. Soc. Göttingen 13, 87-132 (1854) in German
5.7 S.B. Chae: Lebesgue Integration (Marcel Dekker, Inc., New York 1980)
5.8 G. Vitali: Sulla definizione di integrale delle funzioni di una variabile, Ann. Mat. Pura Appl. IV 2, 111-121 (1925)
5.9 G. Choquet: Theory of capacities, Ann. Inst. Fourier (Grenoble) 5, 131-292 (1953)
5.10 L.A. Zadeh: Fuzzy sets, Inform. Control 8, 338-353 (1965)
5.11 M. Sugeno: Theory of Fuzzy Integrals and Applications, Ph.D. Thesis (Tokyo Inst. of Technology, Tokyo 1974)
5.12 M. Sugeno, T. Murofushi: Pseudo-additive measures and integrals, J. Math. Anal. Appl. 122, 197-222 (1987)
5.13 E. Pap: Integral generated by decomposable measure, Univ. u Novom Sadu Zb. Rad. Prirod.-Mat. Fak. Ser. Mat. 20(1), 135-144 (1990)
5.14 S. Weber: ⊥-decomposable measures and integrals for Archimedean t-conorms ⊥, J. Math. Anal. Appl. 101, 114-138 (1984)
5.15 L.A. Zadeh: Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst. 1, 3-28 (1978)
5.16 D. Dubois, H. Prade: Fuzzy Sets and Systems, Theory and Applications (Academic, New York 1980)
5.17 M. Grabisch, T. Murofushi, M. Sugeno (Eds.): Fuzzy Measures and Integrals, Theory and Applications (Physica, Heidelberg 2000)
5.18 E. Pap: Null-Additive Set Functions (Kluwer, Dordrecht 1995)
5.19 Z. Wang, G.J. Klir: Generalized Measure Theory (Springer, New York 2009)
5.20 E. Pap (Ed.): Handbook of Measure Theory (Elsevier, Amsterdam 2002)
5.21 D. Schmeidler: Integral representation without additivity, Proc. Am. Math. Soc. 97(2), 255-261 (1986)
5.22 D. Schmeidler: Subjective probability and expected utility without additivity, Econometrica 57, 571-587 (1989)
5.23 J. Šipoš: Integral with respect to a pre-measure, Math. Slov. 29, 141-155 (1979)
5.24 P. Benvenuti, R. Mesiar, D. Vivona: Monotone set functions-based integrals. In: Handbook of Measure Theory, ed. by E. Pap (Elsevier, Amsterdam 2002) pp. 1329-1379
5.25 M. Grabisch, J.-L. Marichal, R. Mesiar, E. Pap: Aggregation Functions, Encyclopedia of Mathematics and Its Applications, Vol. 127 (Cambridge Univ. Press, Cambridge 2009)
5.26 D. Denneberg: Non-Additive Measure and Integral (Kluwer, Dordrecht 1994)
5.27 A. Chateauneuf, J.-Y. Jaffray: Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion, Math. Soc. Sci. 17, 263-283 (1989)
5.28 A.P. Dempster: Upper and lower probabilities induced by a multi-valued mapping, Ann. Math. Stat. 38, 325-339 (1967)
5.29 G. Shafer: A Mathematical Theory of Evidence (Princeton Univ. Press, Princeton, NJ 1976)
5.30 R.R. Yager, L. Liu: Classic works of the Dempster-Shafer theory of belief functions. In: Studies in Fuzziness and Soft Computing, ed. by R.R. Yager, L. Liu (Springer, Berlin 2008)
5.31 R. Mesiar, J. Li, E. Pap: The Choquet integral as Lebesgue integral and related inequalities, Kybernetika 46(6), 1098-1107 (2010)
5.32 R. Mesiar: k-order additive fuzzy measures, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 7(6), 561-568 (1999)
5.33 M. Sugeno: Fuzzy measure and fuzzy integral, Trans. Soc. Instrum. Control Eng. 8, 95-102 (1972)
5.34 R. Mesiar: k-order Pan-discrete fuzzy measures, Proc. IFSA'97 1, 488-490 (1997)
5.35 J.L. Marichal: An axiomatic approach of the discrete Sugeno integral as a tool to aggregate interacting criteria in a qualitative framework, IEEE Trans. Fuzzy Syst. 9(1), 164-172 (2001)
5.36 S. Ovchinnikov, A. Dukhovny: Integral representation of invariant functionals, J. Math. Anal. Appl. 244, 228-232 (2000)
5.37 P.S. Mostert, A.L. Shields: On the structure of semigroups on a compact manifold with boundary, Ann. Math. 65, 117-143 (1957)
5.38 N. Shilkret: Maxitive measures and integration, Indag. Math. 33, 109-116 (1971)
5.39 W. Sander, J. Siedekum: Multiplication, distributivity and fuzzy-integral II & III, Kybernetika 41(4), 497-518 (2005)
5.40 T. Murofushi, M. Sugeno: Fuzzy t-conorm integrals with respect to fuzzy measures: generalizations of Sugeno integral and Choquet integral, Fuzzy Sets Syst. 42, 51-57 (1991)
5.41 R. Mesiar: Choquet-like integrals, J. Math. Anal. Appl. 194, 477-488 (1995)
5.42 E. Pap: Pseudo-convolution and its applications. In: Fuzzy Measures and Integrals, Theory and Applications, ed. by M. Grabisch, T. Murofushi, M. Sugeno (Physica, Heidelberg 2000) pp. 171-204
5.43 W. Sander, J. Siedekum: Multiplication, distributivity and fuzzy-integral I, Kybernetika 41(3), 397-422 (2005)
5.48 E.P. Klement, R. Mesiar, E. Pap: Measure-based aggregation operators, Fuzzy Sets Syst. 142(1), 3-14 (2004)
5.49 H. Imaoka: On a subjective evaluation model by a generalized fuzzy integral, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 5, 517-529 (1997)
5.50 Q. Yang: The pan-integral on fuzzy measure space, Fuzzy Math. 3, 107-114 (1985), in Chinese
5.51 Z. Wang, G.J. Klir: Fuzzy Measure Theory (Plenum, New York 1992)
5.52 R. Mesiar, J. Rybarik: Pan-operations structure, Fuzzy Sets Syst. 74, 365-369 (1995)
5.53 Q. Zhang, R. Mesiar, J. Li, P. Struk: Generalized Lebesgue integral, Int. J. Approx. Reason. 52(3), 427-443 (2011)
5.54 E. Lehrer: A new integral for capacities, Econ. Theory 39, 157-176 (2009)
5.55 E. Lehrer, R. Teper: The concave integral over large spaces, Fuzzy Sets Syst. 159, 2130-2144 (2008)
5.56 Y. Even, E. Lehrer: Decomposition-integral: unifying Choquet and the concave integrals, Econ. Theory 56, 33-58 (2014)
5.57 R. Mesiar, A. Stupňanová: Decomposition integrals, Int. J. Approx. Reason. 54(8), 1252-1259 (2013)
5.58 A. Stupňanová: Decomposition integrals, Comm. Comput. Info. Sci. 300, 542-548 (2012)
ul 5.44 E.P. Klement, R. Mesiar, E. Pap: A universal integral 5.59 A. Mesiarova-Zemankova, R. Mesiar, K. Ahmad: The 
as common frame for Choquet and Sugeno integral, balancing Choquet integral, Fuzzy Sets Syst. 161(7), 
IEEE Trans. Fuzzy Syst. 18, 178-187 (2010) 2243-2255 (2010) 
5.45 S. Weber: Two integrals and some modified version- 5.60 R. Mesiar, A. Mesiarovaé-Zemankova, K. Ahmad: 
critical remarks, Fuzzy Sets Syst. 20, 97-105 (1986) Discrete Choquet integral and some of its sym- 
5.46 R.B. Nelsen: An Introduction to Copulas, Lecture metric extensions, Fuzzy Sets Syst. 184(1), 148-155 
Notes in Statistics, Vol. 139, 2nd edn. (Springer, New (2011) 
York 2006) 5.61 A. Tversky, D. Kahneman: Advances in prospect the- 
5.47 H. Imaoka: Comparison between three fuzzy in- ory: Cumulative representation of uncertainty, J. Risk 
tegrals. In: Fuzzy Measures and Integrals, Theory Uncertain. 5, 297-323 (1992) 
and Applications, ed. by M. Grabisch, T. Muro- 5,62 A. Tversky, D. Kahneman: Rational choice and 


fushi, M. Sugeno (Physica, Heidelberg 2000) pp. 273- 
286 


the framing of decisions, J. Bus. 59(278), 251-278 
(1986) 


89 


6. The Origin of Fuzzy Extensions 


Humberto Bustince, Edurne Barrenechea, Javier Fernández, Miguel Pagola, Javier Montero 


Many different kinds of sets have been defined within the framework
of fuzzy sets. This chapter focusses on those fuzzy set extensions
that address the difficulties that experts find in order to build
the membership values. In particular, we analyze type-2 fuzzy sets,
interval-valued fuzzy sets, Atanassov's intuitionistic fuzzy sets,
or bipolar sets of type-2 and Atanassov's interval-valued fuzzy
sets. After stating a general approach to these extensions, we
remark some structural problems in the extension problem and stress
some applications for which the results obtained with extensions
are better than those obtained with Zadeh's fuzzy sets.


6.1 Considerations Prior to the Concept of Extension of Fuzzy Sets
    6.1.1 Brouwer's Intuitionistic Logic
    6.1.2 Lukasiewicz's Multivalued Logics
    6.1.3 Zadeh's Fuzzy Logic. First Generalization by Goguen
6.2 Origin of the Extensions
6.3 Type-2 Fuzzy Sets
    6.3.1 Type-2 Fuzzy Sets as a Lattice
    6.3.2 Remarks on the Notation
    6.3.3 A First Definition of Operations Between Type-2 Fuzzy Sets: Lattice-Based Approach
    6.3.4 Problems with the Lattice-Based Definitions. Operations Based on Zadeh's Extension Principle
    6.3.5 Second Definition of the Operations: Zadeh's Extension Principle Approach
    6.3.6 About Computational Efficiency
    6.3.7 Applications
6.4 Interval-Valued Fuzzy Sets
    6.4.1 Two Interpretations of Interval-Valued Fuzzy Sets
    6.4.2 Shadowed Sets Are a Particular Case of Interval-Valued Fuzzy Sets
    6.4.3 Interval-Valued Fuzzy Sets Are a Particular Case of Type-2 Fuzzy Sets
    6.4.4 Some Problems with Interval-Valued Fuzzy Sets
    6.4.5 Applications
6.5 Atanassov's Intuitionistic Fuzzy Sets or Bipolar Fuzzy Sets of Type 2 or IF Fuzzy Sets
    6.5.1 Relation Between Interval-Valued Fuzzy Sets and Atanassov's Intuitionistic Fuzzy Sets: Two Different Concepts
    6.5.2 Some Problems with the Intuitionistic Sets Defined by Atanassov
    6.5.3 Applications
    6.5.4 The Problem of the Name
6.6 Atanassov's Interval-Valued Intuitionistic Fuzzy Sets
6.7 Links Between the Extensions of Fuzzy Sets
6.8
    6.8.2 Fuzzy Multisets and n-Dimensional Fuzzy Sets
    6.8.3 Bipolar Valued Set or Bipolar Set
    6.8.4 Neutrosophic Sets or Symmetric Bipolar Sets
    6.8.5 Hesitant Sets
    6.8.6 Fuzzy Soft Sets
    6.8.7 Fuzzy Rough Sets
6.9 Conclusions
References




Many different types of fuzzy sets have appeared in the 
literature since Zadeh introduced the concept of fuzzy 
set (or type-1 fuzzy set) [6.1]. Roughly speaking, the 
basic characteristics of all those definitions are the fol- 
lowing: 


i) They are particular instances of the L-fuzzy sets de- 
fined by Goguen [6.2]. 

ii) They arise from theoretical problems and are very
efficient at solving such theoretical problems.

iii) The specific characteristics of the new definitions
do not usually play a formal role, quite often becoming
an easy adaptation of Zadeh's fuzzy sets.

iv) It is not always shown to what extent the new pro- 
posal implies a practical advantage when compared 
to Zadeh’s fuzzy sets. 


The last point gives rise to a key criticism when additional
information is needed for the management of a new kind of fuzzy
sets, but the improvement we obtain in practice cannot be justified
by the effort required to obtain such information. But more
important than that is the previous criticism (iii), about the
difficulty of building the best family of sets for the application
we are considering. Surprisingly, this key issue has not captured
the attention of many researchers.

In this chapter, we shall focus on those sets conceived to address
the problem stated by Zadeh in 1971, namely the difficulty of
finding the membership degree of each element (we shall refer to
these sets as extensions of fuzzy sets), and then we shall point
out applications that can be found in the literature in which the
use of some extensions provides better results than the use of
type-1 fuzzy sets, according to the comparison carried out in the
papers where this improvement is shown. Once the definition of
extension of fuzzy sets has been introduced, we shall describe some
of its properties and remark on the structural problems of the
different types of these extensions. Among those extensions we
shall consider type-2 fuzzy sets, interval-valued fuzzy sets,
Atanassov's intuitionistic fuzzy sets or type-2 bipolar fuzzy sets
and Atanassov's interval-valued fuzzy sets.

We have organized this chapter as follows. In Sect. 6.1 we start by
recalling the reasons that led Zadeh to introduce fuzzy sets. We
also recall the basic notions of Brouwer's intuitionistic theory in
order to later justify the terminological problems linked to the
sets defined by Atanassov. In Sect. 6.2 we present the origin of
the extensions of fuzzy sets as well as the definitions.
Section 6.3 is devoted to type-2 fuzzy sets. We stress the problems
related to the definition of the basic operations and the
terminology. In Sect. 6.4 we analyze a particular case of the
previous sets, namely, interval-valued fuzzy sets. We present their
properties and different construction methods, depending on the
application that we are dealing with. We also refer to the papers
in which it is shown that the results that we obtain with these
sets are better than those obtained with other techniques. In
Sects. 6.5 and 6.6 we describe the sets defined by Atanassov.
Section 6.7 explains the links between the considered extensions.
In Sect. 6.8 we exhibit some other definitions of fuzzy sets in the
literature that do not fall into the scope of our notion of
extension. We finish with some conclusions and references.


6.1 Considerations Prior to the Concept of Extension of Fuzzy Sets 


In classical logic, propositions can only be either true or false.
Aristotle formulated the basic principles of this logic: the
noncontradiction principle (a statement cannot be true and false at
the same time) and the excluded-middle principle (every statement
is either true or false).

It is easy to note that there are many situations for which more
than two truth values are needed. This fact led C.S. Peirce to say
that Aristotle's formulation is the simplest hypothesis we can work
with. In fact, since human knowledge representation is based upon
concepts [6.3], and these concepts are not crisp in nature, we
should not expect human beings to use binary logic very often in
their daily life.


Everyday situations such as taste, meaning of ad- 
jectives, etc., can only be studied precisely if gradings 
more complex than true or false are considered. Even 
very widely used mathematical models can lead to 
paradoxes. For instance, quite often we are forced to 
establish arbitrary cuts in order to make reality fit our 
binary model. 

These considerations led to the proposal of different logical
formulations that allow for more than two truth values, like
Brouwer's intuitionistic logic (partially captured by the so-called
intuitionistic propositional calculus modeled by Heyting algebras),
the multivalued logics presented by Lukasiewicz, or Zadeh's fuzzy
logic (which replaces the set {0, 1} by the set [0, 1]), for
example.




6.1.1 Brouwer's Intuitionistic Logic 


In 1907, the Dutch mathematician L.E.J. Brouwer (1881-1966)
introduced intuitionistic logic. Among the precursors of
intuitionistic logic, we can include Kronecker, Poincaré, Borel,
and Weyl.

For intuitionistic researchers, the objects of study in mathematics
are just some intuitions of the mind and the constructions that can
be made with them. Hence, intuitionistic mathematics only handles
constructed objects and only recognizes the properties assigned to
these objects in their construction. In particular, the negation of
the impossibility of a fact is not a construction of such a fact,
and so neither the double negation principle nor the reductio ad
absurdum method is acceptable for the intuitionist. In the same
way, it may happen that it is impossible to build both a fact and
its negation, so the excluded-middle principle is also rejected by
intuitionism.

In 1930 Heyting, a disciple of Brouwer, went one step further and
defined a propositional calculus in terms of axioms and rules in
Hilbert's style. This calculus is known as intuitionistic
propositional calculus (intuitionistic logic). For several decades,
research in intuitionism almost stopped. But it has reappeared with
strength in the logic of categories and topoi [6.4, 5]. In this
sense, the studies by Takeuti and Titani in 1984 [6.6] on
intuitionistic fuzzy logic and intuitionistic fuzzy set theory are
of special interest for us. In [6.7], it is stated that


Takeuti and Titani's intuitionistic fuzzy logic is simply an
extension of intuitionistic logic, i.e., all formulas provable in
the intuitionistic logic are provable in their logic. They give
a sequent calculus which extends Heyting intuitionistic logic, an
extension that does not collapse to classical logic and keeps the
flavor of intuitionism.


6.1.2 Lukasiewicz's Multivalued Logics 


In the 1920s, Jan Lukasiewicz (1878-1956), along with Leśniewski,
founded a school of logic in Warsaw that became one of the most
important mathematical teams in the world, and among whose members
was Alfred Tarski.

Lukasiewicz's idea consists in distributing the truth values
uniformly on the $[0, 1]$ interval: if $n$ values are considered,
they should be $0, \frac{1}{n-1}, \frac{2}{n-1}, \dots,
\frac{n-2}{n-1}, 1$; if they are infinite, we should take
$\mathbb{Q} \cap [0, 1]$. Negation is defined as $n(x) = 1 - x$,
and the following operation is also defined:
$x \oplus y = \min(1, x + y)$.
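
The construction above can be made concrete with a small sketch. It is only an illustration: the function names `truth_values`, `negation`, and `bounded_sum` are ours, and exact rational arithmetic is used so that the uniformly spaced values are represented without rounding.

```python
from fractions import Fraction

def truth_values(n):
    """Return the n uniformly spaced truth values 0, 1/(n-1), ..., 1."""
    if n < 2:
        raise ValueError("at least two truth values are needed")
    return [Fraction(k, n - 1) for k in range(n)]

def negation(x):
    """Negation n(x) = 1 - x."""
    return 1 - x

def bounded_sum(x, y):
    """The operation x (+) y = min(1, x + y)."""
    return min(1, x + y)

values = truth_values(5)  # 0, 1/4, 1/2, 3/4, 1
```

With five truth values, the bounded sum of 3/4 and 1/2 saturates at 1, while the negation of 1/4 is 3/4; both results stay inside the chosen set of values.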


6.1.3 Zadeh's Fuzzy Logic. 
First Generalization by Goguen 


Consistently with Lukasiewicz's studies, Zadeh [6.1] introduced
fuzzy logic in his 1965 paper, Fuzzy Sets.
Born in Azerbaijan in 1921, he moved to the Univer- 
sity of California at Berkeley in 1959. His ideas on 
fuzzy sets were soon applied to different areas such as 
artificial intelligence, natural language, decision mak- 
ing, expert systems, neural networks, control theory, 
etc. 

In mathematics, every subset of a given referential universe $U$
can be identified with its characteristic function $f$; that is,
the function $f: U \to \{0, 1\}$ which takes the value 1 if the
element belongs to the considered subset and 0 otherwise. In
contrast, a fuzzy set is a mapping from the universe $U$ to
$[0, 1]$; that is,


Definition 6.1
A fuzzy set (or type-1 fuzzy set) $A$ over a referential set $U$
is an object

$$A = \{(u_i, \mu_A(u_i)) \mid u_i \in U\},$$

where $\mu_A: U \to [0, 1]$.


$\mu_A(u_i)$ represents the degree of membership of the element
$u_i \in U$ to the set $A$. The elements for which
$\mu_A(u_i) = 1$ belong to the set $A$; those for which
$\mu_A(u_i) = 0$ do not belong to $A$, and there are elements with
a greater or smaller degree of membership to $A$ depending on
$\mu_A(u_i)$.

We are going to denote by $FS(U)$ the class of fuzzy sets defined
over $U$; that is, $FS(U) = [0, 1]^U$. The membership degree of an
element $u_i \in U$ to the fuzzy set $A$ is usually denoted by
$A(u_i)$ instead of $\mu_A(u_i)$.

From the classical definition of union and intersection for crisp
sets, Zadeh proposes the following definitions:

$$\begin{aligned}
A \cup B(u_i) &= \max(A(u_i), B(u_i)), \\
A \cap B(u_i) &= \min(A(u_i), B(u_i)).
\end{aligned} \tag{6.1}$$
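
As a minimal sketch of these two definitions, a fuzzy set over a finite referential can be stored as a dictionary from elements to membership degrees. The element names and the degrees below are made up for illustration, and both sets are assumed to share the same referential.

```python
def fuzzy_union(a, b):
    """Pointwise maximum of two fuzzy sets over the same referential."""
    return {u: max(a[u], b[u]) for u in a}

def fuzzy_intersection(a, b):
    """Pointwise minimum of two fuzzy sets over the same referential."""
    return {u: min(a[u], b[u]) for u in a}

# Hypothetical fuzzy sets over U = {u1, u2, u3}
A = {"u1": 0.2, "u2": 0.7, "u3": 1.0}
B = {"u1": 0.5, "u2": 0.4, "u3": 0.0}

union = fuzzy_union(A, B)                 # {"u1": 0.5, "u2": 0.7, "u3": 1.0}
intersection = fuzzy_intersection(A, B)   # {"u1": 0.2, "u2": 0.4, "u3": 0.0}
```

When all degrees are 0 or 1, the same two functions reproduce the crisp union and intersection.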


A key concept in the following developments is that of lattice. We
now review its definition, which can be found, for instance,
in [6.8].




Recall that an order relationship over a set $L$ is a relation
$\leq_L$ such that


i) $x \leq_L x$ for all $x \in L$ (reflexivity);

ii) if $x \leq_L y$ and $y \leq_L z$ then $x \leq_L z$ for any
$x, y, z \in L$ (transitivity);

iii) if $x \leq_L y$ and $y \leq_L x$, then $x = y$, for any
$x, y \in L$ (antisymmetry).


If $\leq_L$ is an order relationship over $L$ then $(L, \leq_L)$
is called a partially ordered set. Now, in order to define
a lattice we need first to introduce the following definition.


Definition 6.2

Let $(L, \leq_L)$ be a partially ordered set and $A \subseteq L$
(in the sense of the usual set theory). The greatest lower bound
of $A$ (if it exists) is the element $x_{\inf} \in L$ such that:


i) $x_{\inf} \leq_L z$ for all $z \in A$ and
ii) for any $y \in L$ such that $y \leq_L z$ for all $z \in A$ it
follows that $y \leq_L x_{\inf}$.


Analogously, the least upper bound of $A$ (if it exists) is the
element $x_{\sup} \in L$ such that


i) $z \leq_L x_{\sup}$ for all $z \in A$ and
ii) for any $y \in L$ such that $z \leq_L y$ for all $z \in A$ it
follows that $x_{\sup} \leq_L y$.


Now we can introduce the notion of lattice. 


Definition 6.3

A lattice is a partially ordered set $(L, \leq_L)$ such that any
two elements $x, y \in L$ have a greatest lower bound or meet,
denoted by $x \wedge y$, and a least upper bound or join, denoted
by $x \vee y$. A lattice $L$ is called complete if any subset of
$L$ has a least upper bound and a greatest lower bound.


Given a lattice $(L, \leq_L)$, we will call supremum of $L$, and
denote by $1_L$, the least upper bound of $L$ (if it exists).
Analogously, we will call infimum of $L$, and denote by $0_L$, the
greatest lower bound of $L$. In case both $1_L$ and $0_L$ exist,
$L$ is called a bounded lattice.

Observe that if we know how the join and meet operations are
defined for any two elements of a set $L$, we can recover the
ordering $\leq_L$ just by defining, for any $x, y \in L$,

$$x \leq_L y \text{ if and only if } x \wedge y = x
\text{ if and only if } x \vee y = y.$$
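
A quick way to see this equivalence at work is to check it on a small concrete lattice. The sketch below uses, as an assumption of ours and not an example from the text, the divisors of 12 ordered by divisibility, where the meet is the greatest common divisor and the join is the least common multiple.

```python
from math import gcd

def lcm(x, y):
    """Least common multiple, the join in the divisibility lattice."""
    return x * y // gcd(x, y)

# Illustrative lattice: divisors of 12 ordered by "x divides y"
L = [d for d in range(1, 13) if 12 % d == 0]

def leq_via_meet(x, y):
    """x <= y recovered from the meet: x ^ y = x."""
    return gcd(x, y) == x

def leq_via_join(x, y):
    """x <= y recovered from the join: x v y = y."""
    return lcm(x, y) == y

# Both characterizations agree with the divisibility order on every pair.
agree = all(
    leq_via_meet(x, y) == leq_via_join(x, y) == (y % x == 0)
    for x in L for y in L
)
```

In this lattice the infimum is 1 and the supremum is 12, so it is also bounded in the sense introduced above.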


Taking into account (6.1) and Definition 6.3, it is easy to prove
the following theorem.


Theorem 6.1
$(FS(U), \cup, \cap)$ is a complete lattice.


From Theorem 6.1 and the concept of lattice, we can define the
following partial order relation: for $A, B \in FS(U)$,

$$A \leq_{FS} B \text{ if and only if } A(u_i) \leq B(u_i)
\text{ for every } u_i \in U.$$


The first criticism of fuzzy set theory arises from this order
relation $\leq_{FS}$. Although Zadeh presented fuzzy sets to
represent uncertainty, $\leq_{FS}$ turns out to be a crisp
relation. Note that the following may happen: let $U$ be
a referential set with 1000 elements and let $A$ and $B$ be two
fuzzy sets over $U$ such that $A(u_i) < B(u_i)$ for every element
except one, for which $A(u_i) > B(u_i)$. Then, according to the
previous relation, $A \leq_{FS} B$ does not hold. This fact led
Willmott [6.9], Bandler and Kohout [6.10] and others to consider
the concept of inclusion measure. These measures have been widely
used in fuzzy mathematical morphology [6.11], in image
processing [6.12], etc.
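
The situation described above is easy to reproduce. In the sketch below the membership degrees are invented for illustration, and `fraction_dominated` is a hypothetical helper of ours that merely counts elements; it is not one of the inclusion measures of [6.9] or [6.10], and it only shows why a graded notion of inclusion is attractive.

```python
def leq_fs(a, b):
    """Crisp order on fuzzy sets: A(u) <= B(u) for every u."""
    return all(a[u] <= b[u] for u in a)

def fraction_dominated(a, b):
    """Share of elements with A(u) <= B(u) (illustrative only)."""
    return sum(a[u] <= b[u] for u in a) / len(a)

U = range(1000)
A = {u: 0.3 for u in U}
B = {u: 0.6 for u in U}
A[0] = 0.9  # the single element where A exceeds B

order_holds = leq_fs(A, B)           # False: one element is enough
share = fraction_dominated(A, B)     # 999 of the 1000 elements satisfy A <= B
```

A single exceptional element makes the crisp order fail, even though the pointwise inequality holds almost everywhere.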

It is easy to see that with the operations defined in (6.1) and
the standard negation, $n(x) = 1 - x$ for all $x \in [0, 1]$,
neither the noncontradiction principle nor the excluded-middle
principle holds. Nowadays, the operations in (6.1) are given in
terms of t-norms and t-conorms [6.13-16].
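
A one-element check is enough to see why both principles fail. The sketch below assumes a made-up fuzzy set with one element of degree 0.5 and one crisp element, and uses the standard negation to build the complement.

```python
def complement(a):
    """Complement with the standard negation n(x) = 1 - x."""
    return {u: 1 - a[u] for u in a}

# Hypothetical fuzzy set: u1 is a borderline element, u2 is crisp
A = {"u1": 0.5, "u2": 1.0}
Ac = complement(A)

# A intersected with its complement (min) and A united with it (max)
contradiction = {u: min(A[u], Ac[u]) for u in A}
excluded_middle = {u: max(A[u], Ac[u]) for u in A}
```

For the crisp element u2 the classical values 0 and 1 are obtained, but for u1 both results equal 0.5, so the intersection is not empty and the union is not the whole referential.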

Definition 6.1 can clearly be extended to consider mappings valued
over any kind of set. In particular, for our future developments
and following Goguen's work [6.2], it is interesting to consider
the case of mappings that take values over a lattice $L$. In this
case, we speak of $L$-fuzzy sets.

Taking into account Definition 6.3, Goguen presents the concept of
$L$-fuzzy set as follows:


Definition 6.4
Let $(L, \vee, \wedge)$ be a lattice. An $L$-fuzzy set over the
referential set $U$ is a mapping

$$A: U \to L.$$

For a given lattice $L$, we will denote by $L\text{-}FS(U)$ the
space of $L$-fuzzy sets over the referential $U$. That is,
$L\text{-}FS(U) = L^U$.




Union and intersection of $L$-fuzzy sets can be easily defined as
follows.


Definition 6.5

Let $L$ be a lattice, and let $\vee$ and $\wedge$ be its join and
meet operators, respectively. Then intersection and union are
defined, respectively, by:


i) $\cap_L: L\text{-}FS(U) \times L\text{-}FS(U) \to L\text{-}FS(U)$
given by

$$\cap_L(A, B)(u_i) = A(u_i) \wedge B(u_i).$$

In order to recover the usual notation for fuzzy sets, we will
write $\cap_L(A, B)$ as $A \cap_L B$;

ii) $\cup_L: L\text{-}FS(U) \times L\text{-}FS(U) \to L\text{-}FS(U)$
given by

$$\cup_L(A, B)(u_i) = A(u_i) \vee B(u_i).$$

In order to recover the usual notation for fuzzy sets, we will
write $\cup_L(A, B)$ as $A \cup_L B$.
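
The definition is generic in the lattice, which the following sketch tries to reflect by passing the meet and join as arguments. As an illustrative assumption of ours, $L$ is taken to be $[0, 1]^2$ with the componentwise order, so that the meet and join are the componentwise minimum and maximum; the membership values are made up.

```python
def meet(x, y):
    """Meet in [0, 1]^2 with the componentwise order."""
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    """Join in [0, 1]^2 with the componentwise order."""
    return tuple(max(a, b) for a, b in zip(x, y))

def l_intersection(A, B, meet_op):
    """Pointwise meet of two L-fuzzy sets over the same referential."""
    return {u: meet_op(A[u], B[u]) for u in A}

def l_union(A, B, join_op):
    """Pointwise join of two L-fuzzy sets over the same referential."""
    return {u: join_op(A[u], B[u]) for u in A}

# Hypothetical L-fuzzy sets over U = {u1, u2}
A = {"u1": (0.2, 0.9), "u2": (0.5, 0.5)}
B = {"u1": (0.6, 0.3), "u2": (0.1, 0.8)}

inter = l_intersection(A, B, meet)   # u1 -> (0.2, 0.3), u2 -> (0.1, 0.5)
union = l_union(A, B, join)          # u1 -> (0.6, 0.9), u2 -> (0.5, 0.8)
```

Since the values at each element are incomparable in this lattice, the meet and join differ from both arguments, something that cannot happen in the totally ordered case $L = [0, 1]$.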


We can state the following result for $L$-fuzzy sets.


Proposition 6.1

Let $L$ be a bounded lattice with a supremum given by $1_L$ and an
infimum given by $0_L$. Let $\vee$ and $\wedge$ be the join and
meet operators of $L$, respectively. Then, the set
$(L\text{-}FS(U), \leq_{L\text{-}FS(U)})$ is a bounded lattice,
where the order is defined as

$$A \leq_{L\text{-}FS(U)} B \text{ if and only if } A \cup_L B = B$$

or equivalently

$$A \leq_{L\text{-}FS(U)} B \text{ if and only if } A \cap_L B = A.$$

That is

$$A \leq_{L\text{-}FS(U)} B \text{ if and only if }
A(u_i) \vee B(u_i) = B(u_i) \text{ for all } u_i \in U$$

or equivalently

$$A \leq_{L\text{-}FS(U)} B \text{ if and only if }
A(u_i) \wedge B(u_i) = A(u_i) \text{ for all } u_i \in U.$$

The supremum of this lattice is given by

$$1_{L\text{-}FS(U)}: U \to L, \qquad u_i \mapsto 1_L,$$

and the infimum is given by

$$0_{L\text{-}FS(U)}: U \to L, \qquad u_i \mapsto 0_L.$$


6.2 Origin of the Extensions


In 1971, Zadeh stated in his paper [6.17] that the construction of
the fuzzy sets, that is, the determination of the membership
degree of each element to the set, is the biggest problem for
using fuzzy set theory in applications. This fact led him to
introduce the concept of type-2 fuzzy set.

Later, on December 11, 2008, on the bisc-group mailing list, Zadeh
proposed the following definitions.


Definition 6.6

Fuzzy logic is a precise system of reasoning, deduction, and
computation in which the objects of discourse and analysis are
associated with information which is, or is allowed to be,
imperfect.


Definition 6.7

Imperfect information is defined as information which in one or
more respects is imprecise, uncertain, vague, incomplete,
partially true, or partially possible.


On the same date and place, Zadeh made the fol- 
lowing remarks: 


1. In fuzzy logic everything is or is allowed to be 
a matter of degree. Degrees are allowed to be 
fuzzy. 

2. Fuzzy logic is not a replacement for bivalent logic 
or bivalent-logic-based probability theory. Fuzzy 
logic adds to bivalent logic and bivalent-logic- 
based probability theory a wide range of concepts 
and techniques for dealing with imperfect informa- 
tion. 

3. Fuzzy logic is designed to address problems in rea- 
soning, deduction, and computation with imperfect 
information which are beyond the reach of tradi- 
tional methods based on bivalent logic and bivalent- 
logic-based probability theory. 

4. In fuzzy logic the writing instrument is a spray pen 
(Fig. 6.1) with a precisely known adjustable spray 
pattern. In bivalent logic the writing instrument is 
a ballpoint pen. 




5. The importance of fuzzy logic derives from the fact
that in much of the real world imperfect information
is the norm rather than the exception.


Fig. 6.1 The writing instrument is a spray pen


All these considerations justify the use of fuzzy set theory
whenever objects are linked to soft concepts, those that do not
show clear boundaries. Of course, applications might require tools
other than fuzzy ones [6.18]. In any case, if we decide to use
fuzzy sets and it is hard for us to build the membership functions
of the involved sets, then we must use set representations that
take into account these difficulties, and focus on those fuzzy
sets that we call extensions.

So the origin of the concept of extension of fuzzy sets is
directly associated with the idea of building fuzzy sets that
allow us to represent objects that are described through imperfect
information, and that also allow us to represent the lack of
knowledge or uncertainty associated with the membership degrees
that are given by the experts.

It is clear that working with extensions implies that we need to
use more information than in the basic model of Zadeh. As already
pointed out, in order to justify the use of these extensions in
practice, the results obtained with them must be better than those
obtained with usual fuzzy sets.


6.3 Type-2 Fuzzy Sets


The idea of taking into account the experts' uncertainty when they
build the membership degrees of the elements to a given fuzzy set
led Zadeh to present in 1971 the notion of type-2 fuzzy set [6.17]
as follows: A type-2 fuzzy set is a fuzzy set over a referential
set $U$ for which the membership degrees of the elements are given
by fuzzy sets defined over the referential set $[0, 1]$.

The mathematical formalization of this concept was made in 1976 by
Mizumoto and Tanaka in [6.19] and in 1979 by Dubois and Prade
in [6.20] as follows:


Definition 6.8
A type-2 fuzzy set is a mapping $A: U \to FS([0, 1])$.


In Fig. 6.2 we show an example of a type-2 fuzzy set.
We denote by $T2FS(U)$ the set of all type-2 fuzzy sets over $U$.
That is

$$T2FS(U) = (FS([0, 1]))^U.$$


6.3.1 Type-2 Fuzzy Sets as a Lattice


From Definition 6.8, the following result is obvious.


Corollary 6.1
Type-2 fuzzy sets are a particular type of Goguen's $L$-fuzzy
sets.


Taking into account Corollary 6.1, it is clear that we can define
the following operations over type-2 fuzzy sets [6.21].


Definition 6.9
The operations of union $\cup_{T2}$ and intersection $\cap_{T2}$
of $A, B \in T2FS(U)$ (in the sense of lattices) are defined,
respectively, as

$$\cup_{T2}(A, B): U \to FS([0, 1]) \text{ given by }
A \cup_{T2} B(u_i) = A(u_i) \cup B(u_i)$$

and

$$\cap_{T2}(A, B): U \to FS([0, 1]) \text{ given by }
A \cap_{T2} B(u_i) = A(u_i) \cap B(u_i).$$


Proposition 6.2
The set $(T2FS(U), \cup_{T2}, \cap_{T2})$ is a bounded lattice
with respect to the order

$$A \leq_{T2FS(U)} B \text{ if and only if } A \cup_{T2} B = B$$

or equivalently

$$A \leq_{T2FS(U)} B \text{ if and only if } A \cap_{T2} B = A.$$

That is

$$A \leq_{T2FS(U)} B \text{ if and only if }
A(u_i) \cup B(u_i) = B(u_i) \text{ for all } u_i \in U$$

or equivalently

$$A \leq_{T2FS(U)} B \text{ if and only if }
A(u_i) \cap B(u_i) = A(u_i) \text{ for all } u_i \in U.$$

The supremum of this lattice is given by
$1_{T2FS(U)}: U \to FS([0, 1])$ where, for every $u_i \in U$,
$1_{T2FS(U)}(u_i)$ is the fuzzy set that assigns to every
$t \in [0, 1]$ membership equal to 1. The infimum is given by
$0_{T2FS(U)}: U \to FS([0, 1])$ where, for every $u_i \in U$,
$0_{T2FS(U)}(u_i)$ is the fuzzy set that assigns to every
$t \in [0, 1]$ membership equal to 0.


6.3.2 Remarks on the Notation 


Mizumoto and Tanaka in 1976 [6.19] and Mendel and John in
2000 [6.22] used the following notation:

$$A = \int_{u \in U} \int_{t \in J_u} A(u, t)/(u, t),
\qquad J_u \subseteq [0, 1],$$

where $J_u$ is the primary membership of $u \in U$ and, for each
fixed $u = u_0$, the fuzzy set $\int_{t \in J_{u_0}} A(u_0, t)/t$
is the secondary membership of $u_0$.

From our point of view, this notation is not the most appropriate
one, so we now try to introduce a clearer notation. Observe that
a type-2 fuzzy set assigns to an element $u$ in the referential
$U$ a mapping $A(u): [0, 1] \to [0, 1]$. To represent fuzzy sets
(or type-1 fuzzy sets) defined by a mapping $A$, the following
notation is quite usual:

$$\{(u_i, A(u_i)) \mid u_i \in U\}. \tag{6.2}$$

In this type-1 case, $A(u_i)$ is a real number in $[0, 1]$ for
every $u_i \in U$. In the case of type-2 fuzzy sets, if we imitate
this notation, we are formally led to
$\{(u_i, A(u_i)) \mid u_i \in U\}$. But now, for each
$u_i \in U$, we have that $A(u_i)$ is not a real number but
a mapping (a type-1 fuzzy set)

$$A(u_i): [0, 1] \to [0, 1], \qquad t \mapsto A(u_i)(t).$$

Fig. 6.2 Example of a type-2 fuzzy set 


Taking into account these considerations, Harding et al. [6.21]
and Aisbett et al. [6.23] suggested the following notation for
a type-2 fuzzy set $A$:

$$A = \{(u_i, (t, A(u_i)(t))) \mid u_i \in U,\ t \in [0, 1]\}.$$

But an easier notation to use could be the following.


Definition 6.10
Let $A: U \to FS([0, 1])$ be a type-2 fuzzy set. Then $A$ is
denoted as

$$\{(u_i, A(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\},$$

where $A(u_i, \cdot): [0, 1] \to [0, 1]$ is defined as
$A(u_i, t) = A(u_i)(t)$.
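
The relation $A(u_i, t) = A(u_i)(t)$ is simply the passage between a mapping that returns functions and a function of two arguments. The sketch below illustrates it under assumptions of ours: the triangular shape of the secondary memberships, the helper name `triangular`, and the element names are all hypothetical.

```python
def triangular(peak, width):
    """Hypothetical secondary membership: a triangle centred at peak."""
    def mu(t):
        return max(0.0, 1.0 - abs(t - peak) / width)
    return mu

# A type-2 fuzzy set as a mapping u -> (fuzzy set over [0, 1])
A = {
    "u1": triangular(0.3, 0.2),
    "u2": triangular(0.8, 0.1),
}

def A2(u, t):
    """Two-argument form of Definition 6.10: A(u, t) = A(u)(t)."""
    return A[u](t)

full_at_peak = A2("u1", 0.3)    # 1.0, the peak of the secondary membership
zero_far_away = A2("u1", 0.9)   # 0.0, outside the support
```

Both views carry the same information; the two-argument form is only more convenient when operations are written pointwise in $t$.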


6.3.3 A First Definition of Operations 
Between Type-2 Fuzzy Sets: 
Lattice-Based Approach 

With Definition 6.10, if we have two type-2 fuzzy sets

$$A = \{(u_i, A(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\}$$

and

$$B = \{(u_i, B(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\}$$

we have (Fig. 6.3)

$$A \cup_{T2FS} B = \{(u_i, A \cup B(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\},$$

where, for each $u_i \in U$ and each $t \in [0, 1]$, we have

$$A \cup B(u_i, t) = \max(A(u_i, t), B(u_i, t))
= \max(A(u_i)(t), B(u_i)(t)). \tag{6.3}$$


Analogously,

$$A \cap_{T2FS} B = \{(u_i, A \cap B(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\},$$

where, for each $u_i \in U$ and each $t \in [0, 1]$, we have

$$A \cap B(u_i, t) = \min(A(u_i, t), B(u_i, t))
= \min(A(u_i)(t), B(u_i)(t)). \tag{6.4}$$

Observe that this notation is very similar to that proposed by
Deschrijver and Kerre [6.24, 25].


6.3.4 Problems with the Lattice-Based
Definitions. Operations Based
on Zadeh's Extension Principle


Although meaningful from a mathematical point of view, as pointed
out by Dubois and Prade in [6.26], from these definitions we do
not recover the usual ones for fuzzy sets. To see it, just
consider a finite referential set $U = \{u_1, u_2, u_3\}$ with
three elements, and consider the following two fuzzy sets
over $U$. We use the notation of (6.2) for the sake of brevity.

$$A = \{(u_1, \tfrac{1}{2}), (u_2, \tfrac{1}{3}), (u_3, 1)\}$$

and

$$B = \{(u_1, \tfrac{1}{4}), (u_2, b_2), (u_3, b_3)\},$$

where $b_2, b_3 \in [0, 1]$ denote fixed membership degrees with
$b_2 \neq \tfrac{1}{3}$ and $b_3 \neq 1$. Then we have, for
instance,

$$A \cup B = \{(u_1, \tfrac{1}{2}), (u_2, \max(\tfrac{1}{3}, b_2)), (u_3, 1)\}.$$

On the other hand, we can also see $A$ and $B$ as type-2 fuzzy
sets, that we denote by $A_{T2}$ and $B_{T2}$, respectively, just
taking

$$A_{T2}(u_1)(t) = \begin{cases} 1 & \text{if } t = \tfrac{1}{2} \\ 0 & \text{otherwise,} \end{cases}
\quad
A_{T2}(u_2)(t) = \begin{cases} 1 & \text{if } t = \tfrac{1}{3} \\ 0 & \text{otherwise,} \end{cases}
\quad
A_{T2}(u_3)(t) = \begin{cases} 1 & \text{if } t = 1 \\ 0 & \text{otherwise} \end{cases}$$

and

$$B_{T2}(u_1)(t) = \begin{cases} 1 & \text{if } t = \tfrac{1}{4} \\ 0 & \text{otherwise,} \end{cases}
\quad
B_{T2}(u_2)(t) = \begin{cases} 1 & \text{if } t = b_2 \\ 0 & \text{otherwise,} \end{cases}
\quad
B_{T2}(u_3)(t) = \begin{cases} 1 & \text{if } t = b_3 \\ 0 & \text{otherwise.} \end{cases}$$

Then we have

$$A_{T2} \cup_{T2FS} B_{T2}(u_1)(t) = \begin{cases} 1 & \text{if } t = \tfrac{1}{4} \text{ or } t = \tfrac{1}{2} \\ 0 & \text{otherwise,} \end{cases}$$

$$A_{T2} \cup_{T2FS} B_{T2}(u_2)(t) = \begin{cases} 1 & \text{if } t = \tfrac{1}{3} \text{ or } t = b_2 \\ 0 & \text{otherwise} \end{cases}$$

and

$$A_{T2} \cup_{T2FS} B_{T2}(u_3)(t) = \begin{cases} 1 & \text{if } t = b_3 \text{ or } t = 1 \\ 0 & \text{otherwise,} \end{cases}$$

which does not coincide with our previous result. Moreover,
observe that we do not even recover a fuzzy set but a true type-2
fuzzy set.

In order to solve this problem, several authors [6.19, 22, 26]
proposed the following definitions of the operations of union and
intersection.

6.3.5 Second Definition of the Operations: 
Zadeh's Extension Principle Approach 


Definition 6.11
Given two type-2 fuzzy sets

$$A = \{(u_i, A(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\}$$

and

$$B = \{(u_i, B(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\}$$

we can define (Fig. 6.4)

$$A \sqcap B = \{(u_i, A \sqcap B(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\}$$

with

$$A \sqcap B(u_i, t) = \sup_{\min(z, w) = t} \min(A(u_i, z), B(u_i, w))$$

and

$$A \sqcup B = \{(u_i, A \sqcup B(u_i, t)) \mid u_i \in U,\ t \in [0, 1]\}$$

with

$$A \sqcup B(u_i, t) = \sup_{\max(z, w) = t} \min(A(u_i, z), B(u_i, w)).$$

For instance, let us recover our previous example. Consider the
type-2 fuzzy sets $A_{T2}$ and $B_{T2}$. Then we have that

$$A_{T2} \sqcup B_{T2}(u_1, t) =
\begin{cases}
0 & \text{if } t \notin \{\tfrac{1}{4}, \tfrac{1}{2}\} \\
\sup_{\max(z, w) = t} \min(A_{T2}(u_1, z), B_{T2}(u_1, w)) & \text{otherwise.}
\end{cases}$$

But if $t = \tfrac{1}{4}$ then, as $\tfrac{1}{2} > \tfrac{1}{4}$
and since $A_{T2}(u_1, z) = 0$ for all $z < \tfrac{1}{2}$, it
follows that $\min(A_{T2}(u_1, z), B_{T2}(u_1, w)) = 0$ whenever
$\max(z, w) = \tfrac{1}{4}$. Finally, if $t = \tfrac{1}{2}$, then
$\min(A_{T2}(u_1, \tfrac{1}{2}), B_{T2}(u_1, \tfrac{1}{4})) = 1$,
so we finally arrive at

$$A_{T2} \sqcup B_{T2}(u_1, t) =
\begin{cases}
0 & \text{if } t \neq \tfrac{1}{2} \\
1 & \text{if } t = \tfrac{1}{2}.
\end{cases}$$

Since for $u_2$ and $u_3$ the same arguments work, we see that we
indeed recover the fuzzy case. In particular, with respect to
these new operations, we have the following result [6.21].
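
The contrast between the two definitions of union can be checked numerically for the element $u_1$. The sketch below is only an approximation under stated assumptions: the interval $[0, 1]$ is replaced by a five-point grid, so each supremum becomes a maximum over finitely many pairs, and the names `embed`, `lattice_union`, and `extension` are ours.

```python
from fractions import Fraction as F

# Assumption: [0, 1] is replaced by a finite grid, so every supremum
# becomes a maximum over finitely many pairs (z, w).
GRID = [F(k, 4) for k in range(5)]  # 0, 1/4, 1/2, 3/4, 1

def embed(a):
    """View the type-1 degree a as the function that is 1 at t = a, 0 elsewhere."""
    return {t: (1 if t == a else 0) for t in GRID}

def lattice_union(f, g):
    """Pointwise maximum, as in the lattice-based definition (6.3)."""
    return {t: max(f[t], g[t]) for t in GRID}

def extension(f, g, combine):
    """Extension-principle operation: combine=max is the union, combine=min the intersection."""
    return {
        t: max(min(f[z], g[w]) for z in GRID for w in GRID if combine(z, w) == t)
        for t in GRID
    }

a, b = F(1, 2), F(1, 4)   # membership degrees of u1 in A and in B
A1, B1 = embed(a), embed(b)

lattice_result = lattice_union(A1, B1)         # 1 at both 1/4 and 1/2
union_result = extension(A1, B1, max)          # 1 only at max(a, b) = 1/2
intersection_result = extension(A1, B1, min)   # 1 only at min(a, b) = 1/4
```

On this grid the lattice-based union yields a function with two peaks, which is not the embedding of any type-1 degree, whereas the extension-principle union and intersection return exactly the embeddings of $\max(a, b)$ and $\min(a, b)$.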


€°9 | Y Hed 


Proposition 6.3 
Let U be a referential set. (T2FS(U), U, M) is not a lat- 
tice. 


In fact, the problem is that the absorption laws

\[ A \sqcap (A \sqcup B) = A \]
and
\[ A \sqcup (A \sqcap B) = A \]

do not hold. Nevertheless, it is also possible to provide a positive result [6.21].

Proposition 6.4
Let $U$ be a referential set. Then for any $A, B, C \in T2FS(U)$ the following properties hold:
i) $A \sqcup A = A$ and $A \sqcap A = A$;
ii) $A \sqcup B = B \sqcup A$ and $A \sqcap B = B \sqcap A$;
iii) $A \sqcup (B \sqcup C) = (A \sqcup B) \sqcup C$.

That is, $(T2FS(U), \sqcup, \sqcap)$ is a bisemilattice.

Remark 6.1
We should remark the following:
1. If we work with the operations defined in Eqs. (6.3) and (6.4), and consider fuzzy sets as particular instances of type-2 fuzzy sets, then we do not recover the classical operations defined by Zadeh.
2. If we use the operations in Definition 6.11, then we recover Zadeh’s classical operations for fuzzy sets, but we do not have a lattice structure. This fact makes the use of type-2 fuzzy sets in many applications, such as decision making, very complicated.

Fig. 6.3a-c Two different type-2 fuzzy sets (a) $A \cup_{T2FS} B$ (b) $A \cap_{T2FS} B$ (c)

Fig. 6.4 Example of intersection of two membership sets $A(u_i, t)$ and $B(u_i, t)$, computed as $A \sqcap B(u_i, t) = \sup_{\min(z,w)=t} \min(A(u_i, z), B(u_i, w))$. The green line is the set obtained


Obviously, an interesting problem is to analyze 
which further conclusions and results can be obtained 
from this new formulation of the operations between 
type-2 fuzzy sets. 
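
The sup-min operations of Definition 6.11 can be sketched for the membership of a single element $u_i$ when only finitely many primary grades carry a nonzero secondary grade. The dictionary representation and the function names below are assumptions made for illustration, not part of the original text. The sketch shows that singletons (fuzzy grades seen as type-2 memberships) reproduce Zadeh’s max and min, and gives one instance in which absorption fails.

```python
from itertools import product

def t2_combine(a, b, level_op):
    """Sup-min combination of two secondary membership functions.

    a and b map a primary grade in [0, 1] to its secondary grade;
    grades that are not listed are treated as 0.
    level_op is max for the join and min for the meet.
    """
    out = {}
    for (z, az), (w, bw) in product(a.items(), b.items()):
        t = level_op(z, w)
        # Supremum over all pairs (z, w) that produce the same level t.
        out[t] = max(out.get(t, 0.0), min(az, bw))
    return out

def t2_join(a, b):
    return t2_combine(a, b, max)

def t2_meet(a, b):
    return t2_combine(a, b, min)

# Fuzzy grades embedded as singletons: Zadeh's max and min are recovered.
a = {0.3: 1.0}
b = {0.7: 1.0}
print(t2_join(a, b))  # {0.7: 1.0}
print(t2_meet(a, b))  # {0.3: 1.0}

# Absorption can fail, here because d is not normal (its height is 0.4).
c = {0.5: 1.0}
d = {0.5: 0.4}
print(t2_meet(c, t2_join(c, d)))  # {0.5: 0.4}, which differs from c
```

The quadratic loop over pairs of grades also hints at the extra cost of type-2 operations compared with a single max or min per element.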


6.3.6 About Computational Efficiency

Note also that, although the computational cost and running time of type-2 fuzzy sets are not as high as they used to be a few years ago, it is clear that the use of these kinds of sets introduces additional complexity in any given problem. For this reason, in many applications the possible improvement of results does not compensate for the cost of replacing type-1 fuzzy sets by type-2 fuzzy sets.

On the other hand, we can also define type-3 fuzzy sets as those fuzzy sets in which the membership of each element is given by a type-2 fuzzy set [6.27]. Moreover, it is possible to define recursively type-n fuzzy sets as those fuzzy sets whose membership values are type-(n-1) fuzzy sets. The computational efficiency of these sets decreases as the complexity level of the construction increases. From a theoretical point of view, we consider that it is necessary to carry out a complete analysis of the structures and operations of type-n fuzzy sets. But up to now no application has been developed on the basis of type-n fuzzy sets.


6.3.7 Applications

It is worth mentioning the works by Mendel in computing with words and perceptual computing [6.28-31], of Hagras [6.32,33], of Sepulveda et al. [6.34] in control, of Xia et al. in mobiles [6.35], and of Wang in neural networks [6.36]. We will see in the next section that the advantage of using these kinds of sets versus usual fuzzy sets has been shown only for a particular type of them, namely, the so-called interval-valued fuzzy sets.


6.4 Interval-Valued Fuzzy Sets

These sets were introduced in the 1970s. In May 1975, Sambuc [6.37] presented, in his doctoral thesis, the concept of an interval-valued fuzzy set named a $\Phi$-fuzzy set. That same year, Jahn [6.38] wrote about these sets. One year later, Grattan-Guinness [6.39] established a definition of an interval-valued membership function. In that decade interval-valued fuzzy sets appeared in the literature in various guises and it was not until the 1980s, with the work of Gorzalczany and Türksen [6.40-45], that the importance of these sets, as well as their name, was definitely established.

Let us denote by $L([0,1])$ the set of all closed subintervals in $[0,1]$, that is,

\[ L([0,1]) = \{ x = [\underline{x}, \overline{x}] \mid (\underline{x}, \overline{x}) \in [0,1]^2 \text{ and } \underline{x} \le \overline{x} \}. \tag{6.5} \]


Definition 6.12
An interval-valued fuzzy set (or interval type-2 fuzzy set) $A$ on the universe $U \neq \emptyset$ is a mapping

\[ A: U \to L([0,1]), \]

such that the membership degree of $u \in U$ is given by $A(u) = [\underline{A}(u), \overline{A}(u)] \in L([0,1])$, where $\underline{A}: U \to [0,1]$ and $\overline{A}: U \to [0,1]$ are mappings defining the lower and the upper bounds of the membership interval $A(u)$, respectively (Fig. 6.5).

From Definition 6.12, it is clear that for these sets the membership degree of each element $u_i \in U$ to $A$ is given by a closed subinterval in $[0,1]$; that is, $A(u_i) = [\underline{A}(u_i), \overline{A}(u_i)]$. Obviously, if for every $u_i \in U$ we have $\underline{A}(u_i) = \overline{A}(u_i)$, then the considered set is a fuzzy set. So fuzzy sets are particular cases of interval-valued fuzzy sets.

Fig. 6.5 Example of interval valued fuzzy set

In 1989, Deng [6.46] presented the concept of Grey 
sets. Later Dubois proved that these are also interval- 
valued fuzzy sets. 

We denote by $IVFS(U)$ the class of all interval-valued fuzzy sets over $U$; that is, $IVFS(U) = L([0,1])^U$. From Zadeh’s definitions of union and intersection, Sambuc proposed the following definition:


Definition 6.13
Given $A, B \in IVFS(U)$.
\[ A \cup_{L([0,1])} B(u_i) = [\max(\underline{A}(u_i), \underline{B}(u_i)), \max(\overline{A}(u_i), \overline{B}(u_i))] \]
\[ A \cap_{L([0,1])} B(u_i) = [\min(\underline{A}(u_i), \underline{B}(u_i)), \min(\overline{A}(u_i), \overline{B}(u_i))] \]


These operations can be generalized by the use of the widely analyzed concepts of IV t-conorm and IV t-norm [6.47-49].
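
As a minimal sketch of Definition 6.13, an interval-valued fuzzy set over a finite universe can be stored as a dictionary from elements to (lower, upper) pairs. This representation, the element names, and the function names are assumptions made here for illustration.

```python
def iv_union(a, b):
    # Componentwise max of the lower bounds and of the upper bounds.
    return {u: (max(a[u][0], b[u][0]), max(a[u][1], b[u][1])) for u in a}

def iv_intersection(a, b):
    # Componentwise min of the lower bounds and of the upper bounds.
    return {u: (min(a[u][0], b[u][0]), min(a[u][1], b[u][1])) for u in a}

A = {"u1": (0.2, 0.5), "u2": (0.6, 0.9)}
B = {"u1": (0.4, 0.4), "u2": (0.1, 0.95)}  # B(u1) is degenerate: a plain fuzzy grade

print(iv_union(A, B))         # {'u1': (0.4, 0.5), 'u2': (0.6, 0.95)}
print(iv_intersection(A, B))  # {'u1': (0.2, 0.4), 'u2': (0.1, 0.9)}
```

When every interval is degenerate (lower bound equal to upper bound), both functions reduce to Zadeh’s max and min.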


Corollary 6.2 
Interval valued fuzzy sets are a particular case of 
L-fuzzy sets. 


Proof: Just note that $L([0,1])$ with the operations in Definition 6.13 is a lattice. $\square$


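Proposition 6.5
The set $(IVFS(U), \cup_{L([0,1])}, \cap_{L([0,1])})$ is a bounded lattice, where the order is defined as

\[ A \le_{IVFS(U)} B \text{ if and only if } A \cup_{L([0,1])} B = B \]

or equivalently

\[ A \le_{IVFS(U)} B \text{ if and only if } A \cap_{L([0,1])} B = A. \]

That is, $A \le_{IVFS(U)} B$ if and only if

\[ \max(\underline{A}(u_i), \underline{B}(u_i)) = \underline{B}(u_i) \text{ and } \max(\overline{A}(u_i), \overline{B}(u_i)) = \overline{B}(u_i) \]

for all $u_i \in U$, or equivalently $A \le_{IVFS(U)} B$ if and only if

\[ \min(\underline{A}(u_i), \underline{B}(u_i)) = \underline{A}(u_i) \text{ and } \min(\overline{A}(u_i), \overline{B}(u_i)) = \overline{A}(u_i) \]

for all $u_i \in U$.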


From Proposition 6.5, we deduce that the order $A \le_{IVFS(U)} B$ if and only if $\underline{A}(u_i) \le \underline{B}(u_i)$ and $\overline{A}(u_i) \le \overline{B}(u_i)$ for all $u_i \in U$ is not linear. The use of these sets in decision making has led several authors to consider the problem of defining total orders between intervals [6.50]. In this sense, in [6.51] a construction method for such orders by means of aggregation functions can be found.
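
A small sketch may help to see why this order is not linear: two intervals can be incomparable. The lexicographic comparison used at the end is only one illustrative total order that refines the partial order; it is an assumption made here and is not claimed to be the construction of [6.50] or [6.51].

```python
def iv_leq(x, y):
    # Partial order on L([0,1]): both bounds must be ordered.
    return x[0] <= y[0] and x[1] <= y[1]

def lex_leq(x, y):
    # Illustrative total order: compare lower bounds first, then upper bounds.
    return (x[0], x[1]) <= (y[0], y[1])

x = (0.2, 0.9)
y = (0.4, 0.5)

print(iv_leq(x, y), iv_leq(y, x))    # False False: x and y are incomparable
print(lex_leq(x, y), lex_leq(y, x))  # True False: the total order decides
```

A different total order (for instance, one that compares upper bounds first) would rank x above y, which illustrates why the choice of order matters for the final outcome in decision making.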


6.4.1 Two Interpretations 
of Interval-Valued Fuzzy Sets 




From our point of view, interval-valued fuzzy sets can be understood in two different ways [6.52]:

1. The membership degree of an element to the set is a value that belongs to the considered interval. The interval representation is used since we cannot say precisely what that number is. For this reason, we provide bounds for that number. We think this is the correct interpretation for these sets.

2. The membership degree of each element is the whole closed subinterval provided as membership. From a mathematical point of view, this interpretation is very interesting, but, in our opinion, it is very difficult to understand it in the applied field. Moreover, in this case, we find the following paradox [6.53]:

For fuzzy sets and with the standard negation it holds that $\min(A(u_i), 1 - A(u_i)) \le 0.5$ for all $u_i \in U$. But for interval-valued fuzzy sets, if we use the standard negation $N(A(u_i)) = [1 - \overline{A}(u_i), 1 - \underline{A}(u_i)]$, we have that there is no equivalent bound for

\[ \min\left( [\underline{A}(u_i), \overline{A}(u_i)],\ [1 - \overline{A}(u_i), 1 - \underline{A}(u_i)] \right). \]

Fig. 6.6 Construction of type-2 fuzzy sets from interval-valued fuzzy sets
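
The paradox in the second interpretation can be checked numerically. The sketch below assumes that the minimum of two intervals is taken componentwise, as in Definition 6.13; the helper names are hypothetical.

```python
def iv_neg(x):
    # Standard negation on L([0,1]): N([a, b]) = [1 - b, 1 - a].
    return (1 - x[1], 1 - x[0])

def iv_min(x, y):
    # Componentwise minimum, as in Definition 6.13.
    return (min(x[0], y[0]), min(x[1], y[1]))

# Fuzzy case: min(a, 1 - a) never exceeds 0.5.
grades = [k / 10 for k in range(11)]
print(all(min(a, 1 - a) <= 0.5 for a in grades))  # True

# Interval case: the upper bound of the result can be far above 0.5.
x = (0.25, 1.0)
print(iv_min(x, iv_neg(x)))  # (0.0, 0.75)

# Complete ignorance [0, 1] is its own negation, so the result is again [0, 1].
total = (0.0, 1.0)
print(iv_min(total, iv_neg(total)))  # (0.0, 1.0)
```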


6.4.2 Shadowed Sets Are a Particular Case 
of Interval-Valued Fuzzy Sets 


The so-called shadowed sets were suggested by Pedrycz [6.54] and developed later together with Vukovic [6.55,56]. A shadowed set $B$ induced by a given fuzzy set $A$ defined in $U$ is an interval-valued fuzzy set in $U$ that maps elements of $U$ into 0, 1, and the unit interval $[0,1]$, i.e., $B$ is a mapping $B: U \to \{0, 1, [0,1]\}$, where 0, 1, and $[0,1]$ denote complete exclusion from $B$, complete inclusion in $B$, and complete ignorance, respectively. Shadowed sets are isomorphic with a three-valued logic.


6.4.3 Interval-Valued Fuzzy Sets 
Are a Particular Case of Type-2 
Fuzzy Sets 


In 1995, Klir and Yuan proved in [6.27] that from an 
interval-valued fuzzy set, we can build a type-2 fuzzy 
set as pointed out in Fig. 6.6. 

Later, in 2007, Deschrijver and Kerre [6.24, 25] and Mendel [6.57] proved that interval-valued fuzzy sets are particular cases of type-2 fuzzy sets.


6.4.4 Some Problems 
with Interval-Valued Fuzzy Sets 


1. Taking into account the definition of interval-valued fuzzy sets, we follow Gorzalczany [6.41] and define the compatibility degree between two interval-valued fuzzy sets as an element in $L([0,1])$. The other information measures [6.58-62] (interval-valued entropy, interval-valued similarity, etc.) should also be given by an interval. However, in most of the works about these measures, the results are given by a number, and not by an interval. This consideration leads us to conclude that, from a theoretical point of view, we should distinguish between two different types of information measures: those which give rise to a number and those which give rise to an interval. Obviously, the problem of interpreting both types of measures arises. Moreover, if the result of the measure is an interval, we should consider its amplitude as a measure of the lack of knowledge [6.63] linked to the considered measure.

2. In [6.57], Mendel writes:

It turns out that an interval type-2 fuzzy set is the same as an interval-valued fuzzy set for which there is a very extensive literature. These two seemingly different kinds of fuzzy sets were historically approached from very different starting points, which as we shall explain next has turned out to be a very good thing.

Nonetheless, we consider that interval-valued fuzzy sets are a particular case of interval type-2 fuzzy sets and therefore they are not the same thing.

3. Due to the current characteristics of computers, we can say that the computational cost of working with these sets is not much higher than the cost of working with type-1 fuzzy sets [6.64].

4. We have already said that the commonly used order is not linear. This is a problem for some applications, such as decision making. In [6.65], it is shown that the choice of the order should depend on the considered application. Often experts do not have enough information to choose a total order. This is a big problem since the choice of the order strongly influences the final outcome.


6.4.5 Applications 


We can say that there already exist applications of 
interval-valued fuzzy sets that provide results which are 
better than those obtained with fuzzy sets. For instance: 


1. In classification problems. Specifically, in [6.66-69] a methodology to enhance the performance of fuzzy rule-based classification systems (FRBCSs) is presented. The methodology used in these papers has the following structure:

   1) An initial FRBCS is generated by using a fuzzy rule learning algorithm.

   2) The linguistic labels of the learned fuzzy rules are modeled with interval-valued fuzzy sets in order to take into account the ignorance degree associated with the assignment of a number as the membership degree of the elements to the sets. These sets are constructed starting from the fuzzy sets used in the learning process and their shape is determined by the value of one or two parameters.

   3) The fuzzy reasoning method is extended so as to take into account the ignorance represented by the interval-valued fuzzy sets throughout the inference process.

   4) The values of the system’s parameters, for instance the ones determining the shape of the interval-valued fuzzy sets, are tuned by applying evolutionary algorithms. See [6.66-69] for details about the specific features of each proposal.

   The methodology statistically outperforms the following approaches:

   a) In [6.66], the initial FRBCS generated by the Chi et al. algorithm [6.70] and the fuzzy hybrid genetics-based machine learning method [6.71] are outperformed. In addition, the results of the GAGRAD (genetic algorithm gradient) approach [6.72] are notably improved.

   b) A new tuning approach is defined in [6.67], which outperforms the results obtained by tuning the lateral position of the linguistic labels [6.73] and the tuning approach based on the linguistic 3-tuples representation [6.74].

   c) Fuzzy decision trees (FDTs) are used as the learning method in [6.68]. In this contribution, numerous decision trees are enhanced, including crisp decision trees, FDTs, and FDTs constructed using genetic algorithms. For instance, the well-known C4.5 decision tree [6.75] and the fuzzy decision tree proposed by Janikow [6.76] are outperformed.

   d) The proposal presented in [6.69] is the most remarkable one, since it outperforms two state-of-the-art fuzzy classifiers, namely, the FARC-HD method [6.77] and the unordered fuzzy rule induction algorithm (FURIA) [6.78]. Furthermore, the fuzzy counterpart of the presented approach is outperformed as well.

2. Image processing. In [6.63,79-85], it has been shown that if we use interval-valued fuzzy sets to represent those areas of an image for which the experts have problems building the fuzzy membership degrees, then the results of edge detection, segmentation, etc., are much better.

3. In some decision-making problems, it has also been shown that the results obtained with interval-valued fuzzy sets are better than the ones obtained with fuzzy sets [6.86]. They have also been used in Web problems [6.87], pattern recognition [6.88], medicine [6.89], etc.; see also [6.90, 91].


Construction of Interval-Valued Fuzzy Sets 

In many cases, it is easier for experts to give the mem- 
bership degrees by means of numbers instead of by 
means of intervals. In this case it may happen that the 
obtained results are not the best ones. If this is so, we 
should build intervals from the numerical values provided by the experts. For this reason, we study methods to build intervals from real numbers. For any such method, we require the following:


i) The numerical value provided by the expert should 
be interior to the considered interval. We require 
this property since we assume that the membership 
degree for the expert is a number but he or she is 
not able to fix it exactly so he or she provides two 
bounds for it. 

ii) The amplitude of the built interval is going to repre- 
sent the degree of ignorance of the expert to fix the 
numerical value he or she has provided us. 


The previous considerations have led us to define in [6.63] the concept of ignorance degree $G_i$ associated with the value given by an expert. In that definition, it is established that if the degree of membership given by the expert is equal to 0 or 1, then the ignorance is equal to 0, since the expert is sure of the fact that the element belongs or does not belong to the considered set. However, if the provided membership degree is equal to 0.5, then ignorance is maximal, since the expert does not know at all whether the element belongs to the set or not. Different considerations and construction methods for such ignorance functions using overlap functions can be found in [6.92].

Taking into account the previous argumentation, in Fig. 6.7 we show the schema of construction of an interval from a membership degree $\mu$ given by the expert and from an ignorance function $G_i$ chosen for the considered problem [6.63].

Fig. 6.7 Construction with ignorance functions (the membership function of the fuzzy set and its ignorance function yield an interval of length $G(\mu(x))$)
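
The following is only a sketch of how such a construction might look. Both the particular ignorance function (twice the distance to the nearest of 0 and 1) and the way the interval is placed around $\mu$ are assumptions made here for illustration; the actual construction is the one given in [6.63]. The sketch only respects what the text states: ignorance is 0 at membership 0 or 1, maximal at 0.5, the interval has length $G(\mu)$, and $\mu$ lies inside it.

```python
def ignorance(mu):
    # Hypothetical ignorance function: 0 at mu = 0 or 1, maximal (1) at mu = 0.5.
    return 2.0 * min(mu, 1.0 - mu)

def interval_from_ignorance(mu):
    # Hypothetical placement: split the length g around mu in proportion to
    # the room available on each side, so the interval stays inside [0, 1].
    g = ignorance(mu)
    return (mu - g * mu, mu + g * (1.0 - mu))

for mu in (0.0, 0.25, 0.5, 1.0):
    lo, hi = interval_from_ignorance(mu)
    print(mu, (lo, hi), "length:", hi - lo)
```

With this choice, a membership of 0.5 yields the whole interval [0, 1] (complete ignorance), while memberships of 0 or 1 yield degenerate intervals.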

There exist other methods for constructing interval-valued fuzzy sets. The choice of the method depends on the application we are working on. One of the most used methods in magnetic resonance image processing (within fuzzy theory) is the following: several doctors are asked to build, for a specific region of an image, a fuzzy set representing that region. At the end, we will have several fuzzy sets, and with them we build an interval-valued fuzzy set as follows. For each element’s membership, we take as lower bound the minimum of the values provided by the doctors, and as the upper bound, the maximum. This method has proved very useful for particular images [6.83]. In Fig. 6.8, we represent the proposed construction.
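
The min/max construction just described can be sketched as follows. The pixel names and the three membership assignments are made-up illustrative data, and the dictionary representation is an assumption.

```python
def ivfs_from_experts(expert_sets):
    """Build an interval-valued fuzzy set from several fuzzy sets.

    expert_sets is a list of dicts, one per expert, mapping each element
    to the membership degree that expert assigned to it.
    """
    elements = expert_sets[0].keys()
    return {
        u: (min(fs[u] for fs in expert_sets), max(fs[u] for fs in expert_sets))
        for u in elements
    }

# Three doctors rate the membership of two pixels to a region (illustrative values).
doctors = [
    {"p1": 0.8, "p2": 0.3},
    {"p1": 0.9, "p2": 0.2},
    {"p1": 0.85, "p2": 0.4},
]

print(ivfs_from_experts(doctors))  # {'p1': (0.8, 0.9), 'p2': (0.2, 0.4)}
```

The width of each interval then reflects how much the experts disagree about that element.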


Fig. 6.8 Construction with different experts


In [6.63], it is shown that for some specific ultrasound images, if we use fuzzy theory to obtain the objects in the image, results are worse than if we use interval-valued fuzzy sets built with the method proposed by Tizhoosh in [6.84]. This method consists of the following (see Fig. 6.9): from the numerical membership degree $\mu_A$ given by the expert and from a numerical coefficient $\alpha > 1$, associated with the doubt of the expert when he or she constructs $\mu_A$, we generate the membership interval

\[ \left[ \mu_A^{\alpha}(u),\ \mu_A^{1/\alpha}(u) \right]. \]

Fig. 6.9 Tizhoosh’s construction (the fuzzy membership, together with the lower and upper limits of the resulting interval-valued fuzzy set)
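
A minimal sketch of this construction, assuming the interval is $[\mu^{\alpha}, \mu^{1/\alpha}]$ with $\alpha > 1$; the function name and the sample values are hypothetical.

```python
def tizhoosh_interval(mu, alpha):
    """Interval [mu**alpha, mu**(1/alpha)] built from a membership degree mu."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    # For mu in [0, 1] and alpha > 1: mu**alpha <= mu <= mu**(1/alpha).
    return (mu ** alpha, mu ** (1.0 / alpha))

lower, upper = tizhoosh_interval(0.25, 2.0)
print(lower, upper)

# Larger alpha (more doubt) gives a wider interval around the same degree.
narrow = tizhoosh_interval(0.25, 1.5)
wide = tizhoosh_interval(0.25, 3.0)
print(narrow[1] - narrow[0] < wide[1] - wide[0])
```

Memberships of exactly 0 or 1 produce degenerate intervals for any $\alpha$, which matches the idea that the expert has no doubt in those cases.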


6.5 Atanassov’s Intuitionistic Fuzzy Sets or Bipolar Fuzzy Sets of Type 2 or IF Fuzzy Sets


In 1983, Atanassov presented his definition of intu- 
itionistic fuzzy set [6.93]. This paper was written in 
Bulgarian, and in 1986 he presented his ideas in English 
in [6.94]. 


Definition 6.14
An intuitionistic fuzzy set over $U$ is an expression $A$ given by

\[ A = \{ (u_i, \mu_A(u_i), \nu_A(u_i)) \mid u_i \in U \}, \]

where $\mu_A: U \to [0,1]$ and $\nu_A: U \to [0,1]$ are such that $0 \le \mu_A(u_i) + \nu_A(u_i) \le 1$ for every $u_i \in U$.


Atanassov also introduced the following two essential characteristics of these sets:

1. The complement of
\[ A = \{ (u_i, \mu_A(u_i), \nu_A(u_i)) \mid u_i \in U \} \]
is
\[ A_c = \{ (u_i, \nu_A(u_i), \mu_A(u_i)) \mid u_i \in U \}. \]

2. For each $u_i \in U$, the intuitionistic or hesitance index of such element in the considered set $A$ is given by
\[ \pi_A(u_i) = 1 - \mu_A(u_i) - \nu_A(u_i). \]

$\pi_A(u_i)$ is a measure of the hesitance of the expert to assign a numerical value to $\mu_A(u_i)$ and $\nu_A(u_i)$. For this reason, we consider that these sets are an extension of fuzzy sets. It is clear that if for each $u_i \in U$ we take $\nu_A(u_i) = 1 - \mu_A(u_i)$, then the considered set $A$ is a fuzzy set in Zadeh’s sense. So fuzzy sets are a particular case of those defined by Atanassov.
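
The complement and the hesitance index can be sketched as follows, storing each element as a (membership, nonmembership) pair. The representation and names are assumptions made for illustration.

```python
def hesitance(mu, nu):
    # Hesitance index pi = 1 - mu - nu, defined when mu + nu <= 1.
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0):
        raise ValueError("need mu, nu in [0, 1] with mu + nu <= 1")
    return 1.0 - mu - nu

def complement(a):
    # Swap membership and nonmembership for every element.
    return {u: (nu, mu) for u, (mu, nu) in a.items()}

A = {"u1": (0.5, 0.25), "u2": (0.75, 0.25)}

print({u: hesitance(mu, nu) for u, (mu, nu) in A.items()})  # {'u1': 0.25, 'u2': 0.0}
print(complement(A))  # {'u1': (0.25, 0.5), 'u2': (0.25, 0.75)}
```

Element u2 satisfies nonmembership = 1 - membership, so its hesitance is 0 and it behaves as in an ordinary fuzzy set.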

In 1993, Gau and Buehrer [6.95] introduced the concept of vague set, and later, in 1994, it was shown that these are the same as those introduced by Atanassov in 1983 [6.96].

We denote by $A\text{-}IFS(U)$ the class of all intuitionistic sets (in the sense of Atanassov) defined over the referential $U$. Atanassov also gave the following definition:


Definition 6.15
Given $A, B \in A\text{-}IFS(U)$.

\[ A \cup_{A\text{-}IFS} B = \{ (u_i, \max(\mu_A(u_i), \mu_B(u_i)), \min(\nu_A(u_i), \nu_B(u_i))) \mid u_i \in U \} \]

\[ A \cap_{A\text{-}IFS} B = \{ (u_i, \min(\mu_A(u_i), \mu_B(u_i)), \max(\nu_A(u_i), \nu_B(u_i))) \mid u_i \in U \}. \]


Definitions of connectives for Atanassov’s sets in terms of t-norms, etc. can be found in [6.49, 97].
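
Definition 6.15 can be sketched with the same (membership, nonmembership) pair representation, which is an assumption made for illustration.

```python
def aifs_union(a, b):
    # Max of the memberships, min of the nonmemberships.
    return {u: (max(a[u][0], b[u][0]), min(a[u][1], b[u][1])) for u in a}

def aifs_intersection(a, b):
    # Min of the memberships, max of the nonmemberships.
    return {u: (min(a[u][0], b[u][0]), max(a[u][1], b[u][1])) for u in a}

A = {"u1": (0.5, 0.25)}
B = {"u1": (0.25, 0.125)}

print(aifs_union(A, B))         # {'u1': (0.5, 0.125)}
print(aifs_intersection(A, B))  # {'u1': (0.25, 0.25)}
```

In both results the sum of membership and nonmembership stays at most 1, so the outcomes are again intuitionistic fuzzy sets in the sense of Definition 6.14.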


Corollary 6.3 
Atanassov’s intuitionistic fuzzy sets are a particular 
case of L-fuzzy sets. 


Proof: Just note that $L = \{ (x_1, x_2) \mid x_1 + x_2 \le 1 \text{ with } x_1, x_2 \in [0,1] \}$ with the operations in Definition 6.15 is a lattice. $\square$


Proposition 6.6
The set $(A\text{-}IFS(U), \cup_{A\text{-}IFS}, \cap_{A\text{-}IFS})$ is a bounded lattice, where the order is defined as

\[ A \le_{A\text{-}IFS} B \text{ if and only if } A \cup_{A\text{-}IFS} B = B \]

or equivalently

\[ A \le_{A\text{-}IFS} B \text{ if and only if } A \cap_{A\text{-}IFS} B = A. \]

From Proposition 6.6, we see that the order

\[ A \le_{A\text{-}IFS} B \text{ if and only if } \mu_A(u_i) \le \mu_B(u_i) \text{ and } \nu_A(u_i) \ge \nu_B(u_i) \text{ for all } u_i \in U \]

is not linear. Different methods to get linear orders for these sets can be found in [6.50, 51].


6.5.1 Relation Between Interval-Valued Fuzzy Sets and Atanassov’s Intuitionistic Fuzzy Sets: Two Different Concepts


In 1989, Atanassov and Gargov [6.98] and later Deschrijver and Kerre [6.24] proved that from an interval-valued fuzzy set we can build an intuitionistic fuzzy set and vice versa.


Theorem 6.2
The mapping

\[ \Phi: IVFS(U) \to A\text{-}IFS(U), \qquad A \mapsto A', \]

where $A' = \{ (u_i, \underline{A}(u_i), 1 - \overline{A}(u_i)) \mid u_i \in U \}$, is a bijection.
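
The mapping of Theorem 6.2 and its inverse can be sketched directly: the lower bound becomes the membership, and one minus the upper bound becomes the nonmembership. The dictionary representation and the function names are assumptions.

```python
def ivfs_to_aifs(a):
    # [lower, upper]  ->  (membership = lower, nonmembership = 1 - upper)
    return {u: (lo, 1 - hi) for u, (lo, hi) in a.items()}

def aifs_to_ivfs(a):
    # (membership, nonmembership)  ->  [membership, 1 - nonmembership]
    return {u: (mu, 1 - nu) for u, (mu, nu) in a.items()}

A = {"u1": (0.25, 0.75), "u2": (0.5, 0.5)}

A_prime = ivfs_to_aifs(A)
print(A_prime)                     # {'u1': (0.25, 0.25), 'u2': (0.5, 0.5)}
print(aifs_to_ivfs(A_prime) == A)  # True: the round trip returns the original set
```

Under this mapping the hesitance index $1 - \mu - \nu$ equals the width of the interval: for u1 it is 1 - 0.25 - 0.25 = 0.5, the length of [0.25, 0.75].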


Theorem 6.2 shows that interval-valued fuzzy sets and Atanassov’s intuitionistic fuzzy sets are equivalent from a mathematical point of view. But, as pointed out in [6.52], the absence of a structural component in their description might explain this result, since from a conceptual point of view they are very different models:


a) The representation of the membership of an element to a set using an interval means that the expert doubts about the exact value of such membership, so such an expert provides two bounds, and we never consider the representation of the nonmembership to a set.

b) By means of the intuitionistic index, we represent the hesitance of the expert in simultaneously building the membership and the nonmembership degrees.


From an applied point of view, the conceptual difference between both concepts has also been clearly displayed in [6.99]. On page 204 of this paper, Ye adapts an example by Herrera and Herrera-Viedma that appeared in 2000 [6.100]. Ye’s example runs as follows: n experts are asked about a money investment in four different companies. Ye considers that the membership to the set that represents each company is given by the number of experts who would invest their money in that company (normalized by n), and the nonmembership is given by the number of experts who would not invest their money in that company. Clearly, the intuitionistic index corresponds to the experts that do not provide either a positive or a negative answer about investing in that company. In this way, Ye proves that:


1. The results obtained with this representation are 
more realistic than those obtained in [6.100] using 
Zadeh’s fuzzy sets. 

2. In the considered problem, the interval interpreta- 
tion does not make much sense besides its use as 
a mathematical tool. 
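
The voting interpretation in Ye’s example can be sketched as follows. The vote counts are made-up illustrative numbers, and normalizing the nonmembership by n as well is an assumption made here so that the three values add up to 1.

```python
def aifs_from_votes(yes, no, n):
    """Membership, nonmembership and hesitance for one company.

    yes: experts who would invest, no: experts who would not,
    n: total number of experts; the rest gave no answer either way.
    """
    if yes < 0 or no < 0 or yes + no > n:
        raise ValueError("inconsistent vote counts")
    undecided = n - yes - no
    return (yes / n, no / n, undecided / n)

# Eight experts: four would invest, two would not, two do not answer.
mu, nu, pi = aifs_from_votes(4, 2, 8)
print(mu, nu, pi)  # 0.5 0.25 0.25
```

Here the hesitance index has a direct reading as the share of undecided experts, which is the point of the conceptual distinction drawn in the text.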


6.5.2 Some Problems with the Intuitionistic 
Sets Defined by Atanassov 


Besides the missing structural component pointed out in [6.52]:


1. In these sets, each element has two associated values. For this reason, we consider that information measures such as entropy [6.59, 61], similarity [6.101, 102], etc. should also be given by two numerical values. That is, in our opinion, we should distinguish between those measures that provide a single number and those others that provide two numbers. This fact is discussed in [6.103], where the two concepts of entropy given in [6.59] and [6.61] are jointly used to represent the uncertainty linked to Atanassov’s intuitionistic fuzzy sets. So we think that it is necessary to carry out a conceptual revision of the definitions of similarity, dissimilarity, entropy, comparability, etc., given for these sets, even more so since nowadays working with two numbers instead of a single one does not imply a much larger computational cost.

2. As in the case of interval-valued fuzzy sets, in many applications there is the problem of choosing the most appropriate linear order associated with that application [6.50,51]. We should remark that the chosen order directly influences the final outcome, so it is necessary to study the conditions that determine the choice of one order or another [6.65].


6.5.3 Applications 


These extensions have proved very useful in decision-making problems [6.99, 104-108]. In general, they work very well in problems for which we have to represent the difference between the positive and the negative representation of something [6.109], in particular in cognitive psychology and medicine [6.110]. They have also often been used in image processing, as in [6.111, 112]. We should remark that, because of the mathematical equivalence between these sets and interval-valued fuzzy sets, in many applications in which interval-valued fuzzy sets are useful, Atanassov’s intuitionistic fuzzy sets are useful as well [6.113].


6.5.4 The Problem of the Name 


From Sect. 6.1.1, it is clear that the term intuitionistic was used in 1907 by Brouwer, in 1930 by Heyting, etc. So, 75 years before Atanassov used it, it already had a specific meaning in logic. Moreover, one year after Atanassov first used it in Bulgarian, Takeuti and Titani (1984) presented a set representation for Heyting’s ideas, using the expression intuitionistic fuzzy sets. From our point of view, this means that in fact the correct terminology is that of Takeuti and Titani. Nevertheless, all these facts have given rise to a serious notation problem in the literature about the subject.

In 2005, in order to solve these problems, Dubois et al. published a paper [6.7] on the subject in which they proposed to replace the name intuitionistic fuzzy sets by bipolar fuzzy sets, justifying this change. Later, Atanassov answered in [6.114], where he defends the reasons he had to choose the name intuitionistic and states a clear fact: the sets he defined are much more cited and used than those defined by Takeuti and Titani, so in his opinion the name must not change.

In Dubois and Prade’s works about bipolarity types [6.115, 116], these authors stated that Atanassov’s sets are included in the type-2 bipolar sets, so they call these sets fuzzy bipolar sets of type-2.

But we must say that nine years before Dubois et al.’s paper about the notation, Zhang in [6.117, 118] used the word bipolar in connection with fuzzy set theory and presented the concept of bipolar-valued set.

All these considerations have led some authors to propose the name Atanassov’s intuitionistic fuzzy sets. However, Atanassov himself disagrees with this notation and asserts that his notation must be kept; that is, intuitionistic fuzzy sets. Other authors use the name IF-sets (intuitionistic fuzzy) [6.119].

In any case, only time will fix the appropriate 
names. 


6.6 Atanassov's Interval-Valued Intuitionistic Fuzzy Sets 


In 1989, Atanassov and Gargov presented the following 
definition [6.98]: 


Definition 6.16
An Atanassov’s interval-valued intuitionistic fuzzy set over $U$ is an expression $A$ given by

\[ A = \{ (u_i, M_A(u_i), N_A(u_i)) \mid u_i \in U \}, \]

where $M_A: U \to L([0,1])$ and $N_A: U \to L([0,1])$ are such that $0 \le \overline{M}_A(u_i) + \overline{N}_A(u_i) \le 1$ for every $u_i \in U$.


In this definition, the authors adapt Atanassov’s intuitionistic sets to Zadeh’s ideas on the problem of building the membership degrees of the elements to the fuzzy set. Moreover, if for every $u_i \in U$ we have that $\underline{M}_A(u_i) = \overline{M}_A(u_i)$ and $\underline{N}_A(u_i) = \overline{N}_A(u_i)$, then we recover an Atanassov’s intuitionistic fuzzy set, so the latter are a particular case of Atanassov’s interval-valued intuitionistic fuzzy sets. As in the case of Atanassov’s intuitionistic fuzzy sets, the complement of a set is obtained by interchanging the membership and nonmembership intervals.

We represent by $A\text{-}IVIFS(U)$ the class of all Atanassov’s interval-valued intuitionistic fuzzy sets over a referential set $U$.


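Definition 6.17
Given $A, B \in A\text{-}IVIFS(U)$.

\[ A \cup_{A\text{-}IVIFS} B = \{ (u_i, A \cup_{A\text{-}IVIFS} B(u_i)) \mid u_i \in U \} \]
where
\[ A \cup_{A\text{-}IVIFS} B(u_i) = \left( [\max(\underline{M}_A(u_i), \underline{M}_B(u_i)), \max(\overline{M}_A(u_i), \overline{M}_B(u_i))],\ [\min(\underline{N}_A(u_i), \underline{N}_B(u_i)), \min(\overline{N}_A(u_i), \overline{N}_B(u_i))] \right). \]

\[ A \cap_{A\text{-}IVIFS} B = \{ (u_i, A \cap_{A\text{-}IVIFS} B(u_i)) \mid u_i \in U \} \]
where
\[ A \cap_{A\text{-}IVIFS} B(u_i) = \left( [\min(\underline{M}_A(u_i), \underline{M}_B(u_i)), \min(\overline{M}_A(u_i), \overline{M}_B(u_i))],\ [\max(\underline{N}_A(u_i), \underline{N}_B(u_i)), \max(\overline{N}_A(u_i), \overline{N}_B(u_i))] \right). \]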

Corollary 6.4 
Atanassov’s interval-valued intuitionistic fuzzy sets are 
a particular case of L-fuzzy sets. 


Proof: Just note that $LL([0,1]) = \{ (x, y) \in L([0,1])^2 \mid \overline{x} + \overline{y} \le 1 \}$ with the operations in Definition 6.17 is a lattice. $\square$


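Proposition 6.7
The set $(A\text{-}IVIFS(U), \cup_{A\text{-}IVIFS}, \cap_{A\text{-}IVIFS})$ is a bounded lattice, where the order is defined as

\[ A \le_{A\text{-}IVIFS} B \text{ if and only if } A \cup_{A\text{-}IVIFS} B = B \]

or equivalently

\[ A \le_{A\text{-}IVIFS} B \text{ if and only if } A \cap_{A\text{-}IVIFS} B = A. \]

Note that $A \le_{A\text{-}IVIFS} B$ if and only if $M_A(u_i) \le_{IVFS} M_B(u_i)$ and $N_A(u_i) \ge_{IVFS} N_B(u_i)$ for all $u_i \in U$; that is, $A \le_{A\text{-}IVIFS} B$ if and only if $\underline{M}_A(u_i) \le \underline{M}_B(u_i)$, $\overline{M}_A(u_i) \le \overline{M}_B(u_i)$, $\underline{N}_A(u_i) \ge \underline{N}_B(u_i)$, and $\overline{N}_A(u_i) \ge \overline{N}_B(u_i)$ for all $u_i \in U$. This order is not linear.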

We make the following remarks regarding these extensions:


1. It is necessary to study two different types of information measures: those whose outcome is a single number [6.120] and those whose outcomes are two intervals in [0, 1] [6.120]. Both types need to be studied.

2. Nowadays, there are many works using these sets [6.121-123]. However, none of them displays an example where the results obtained with these sets are better than those obtained with fuzzy sets or other techniques. As happened until recent years with interval-valued fuzzy sets, it is necessary to find an application that provides better results using these extensions rather than using other sets. To do so, we should compare the results with those obtained with other techniques, which is something that is not done for the moment in the papers that make use of these sets. For the moment, most of the studies are just theoretical [6.124-126].


6.7 Links Between the Extensions of Fuzzy Sets


Taking into account the study carried out in previous sections, we can describe the following links between the different extensions.


1. $FS \subset IVFS = \text{Grey Sets} = A\text{-}IFS = \text{Vague sets} \subset A\text{-}IVIFS \subset L\text{-}FS$.

2. If we consider the operations in Definition 6.11, we have the sequence of inclusions: $FS \subset IVFS = \text{Grey Sets} = A\text{-}IFS = \text{Vague sets} \subset T2FS \subset L\text{-}FS$.


6.8 Other Types of Sets


In this section, we present the definition of other types of sets that have arisen from the idea of Zadeh’s fuzzy set. However, for us none of them should be considered an extension of a fuzzy set, since we do not represent with them the degree of ignorance or uncertainty of the expert.


6.8.1 Probabilistic Sets

These sets were introduced in 1981 by Hirota [6.127].

Definition 6.18
Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $\mathcal{B}(0,1)$ denote the family of Borel sets in $[0,1]$. A probabilistic set $A$ over the universe $U$ is a function

\[ A: U \times \Omega \to ([0,1], \mathcal{B}(0,1)), \]

where $A(u_i, \cdot)$ is measurable for each $u_i \in U$.


6.8.2 Fuzzy Multisets and n-Dimensional 
Fuzzy Sets 


The idea of multiset was given by Yager in 1986 [6.128] 
and later developed by Miyamoto [6.129]. In these mul- 
tilevel sets, several degrees of membership are assigned 
to each element. 


Definition 6.19
Let $U$ be a nonempty set and $n \in \mathbb{N}^+$. A fuzzy multiset $A$ over $U$ is given by

\[ A = \{ (u_i, \mu_{A_1}(u_i), \mu_{A_2}(u_i), \ldots, \mu_{A_n}(u_i)) \mid u_i \in U \}, \]

where $\mu_{A_i}: U \to [0,1]$ is called the $i$th membership degree of $A$.


If in Definition 6.19 we require that $\mu_{A_1} \le \mu_{A_2} \le \cdots \le \mu_{A_n}$, we have an n-dimensional fuzzy set [6.130, 131]. Nevertheless, it is worth pointing out the relation of these families of fuzzy sets with the classification model proposed in [6.132], and the particular model proposed in [6.133], where fuzzy preference intensity was arranged according to the basic preference attitudes.


6.8.3 Bipolar Valued Set or Bipolar Set 


In 1996, Zhang presented the concept of bipolar set as 
follows [6.117]: 


Definition 6.20
A bipolar-valued set or a bipolar set on $U$ is an object

\[ A = \{ (u_i, \varphi^+(u_i), \varphi^-(u_i)) \mid u_i \in U \} \]

with $\varphi^+: U \to [0,1]$ and $\varphi^-: U \to [-1,0]$.


In these sets, the value $\varphi^-(u_i)$ must be understood as how much the environment of the problem opposes the fulfillment of $\varphi^+(u_i)$. Nowadays interesting studies exist about these sets [6.134-138].


6.8.4 Neutrosophic Sets or Symmetric 
Bipolar Sets 


These sets were first studied by Smarandache in 
2002 [6.139]. They arise from Atanassov’s intuitionis- 
tic fuzzy sets ignoring the restriction on the sum of the 
membership and the nonmembership degrees. 


Definition 6.21
A neutrosophic set or symmetric bipolar set on $U$ is an object

\[ A = \{ (u_i, \mu_A(u_i), \nu_A(u_i)) \mid u_i \in U \}, \]

with $\mu_A: U \to [0,1]$ and $\nu_A: U \to [0,1]$.


6.8.5 Hesitant Sets 


These sets were introduced by Torra and Naru- 
kawa in 2009 to deal with decision-making 
problems [6.140, 141]. 


Definition 6.22 

Let $\wp([0, 1])$ be the set of all subsets of the unit interval
and U be a nonempty set. Let $\mu_A: U \to \wp([0, 1])$; then
a hesitant fuzzy set (HFS in short) A defined over U is
given by

$$A = \{(u_i, \mu_A(u_i)) \mid u_i \in U\}. \qquad (6.6)$$
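As a small illustration of Definition 6.22, the following Python sketch assumes a finite universe and finite sets of membership degrees (the definition itself allows arbitrary subsets of [0, 1]). The names `from_fuzzy_set` and `bounds` are hypothetical helpers introduced here only for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

# Assumed representation: element of U -> finite subset of [0, 1].
HesitantFuzzySet = Dict[Hashable, FrozenSet[float]]


def is_hesitant_fuzzy_set(A: HesitantFuzzySet) -> bool:
    """Every possible membership degree lies in the unit interval."""
    return all(0.0 <= m <= 1.0 for degrees in A.values() for m in degrees)


def from_fuzzy_set(mu: Dict[Hashable, float]) -> HesitantFuzzySet:
    """An ordinary fuzzy set seen as an HFS with singleton degree sets."""
    return {u: frozenset({m}) for u, m in mu.items()}


def bounds(A: HesitantFuzzySet) -> Dict[Hashable, Tuple[float, float]]:
    """Smallest and largest degree per element (nonempty sets only)."""
    return {u: (min(d), max(d)) for u, d in A.items() if d}


# Illustrative data (hypothetical values).
A = {"u1": frozenset({0.2, 0.5}), "u2": frozenset({0.9})}
```

Here the element u1 hesitates between the degrees 0.2 and 0.5, while u2 behaves as in an ordinary fuzzy set.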


6.8.6 Fuzzy Soft Sets 


Based on the definition of soft set [6.142], Maji et al. 
present the following definition [6.143]. 


Definition 6.23 
A pair (F, A) is called a fuzzy soft set over U, where F
is a mapping given by $F: A \to FP(U)$, and $FP(U)$ denotes
the set of all fuzzy subsets of U.


6.8.7 Fuzzy Rough Sets


Building on the concept of rough set given by Pawlak
in [6.144], Dubois and Prade proposed the following
definition in 1990 [6.145]. From several points of view,
these sets can be considered an extension of fuzzy sets
in our sense. However, they are being studied
exhaustively, and for this reason we consider that they
deserve a chapter of their own.


Definition 6.24 

Let U be a referential set and R be a fuzzy similarity
relation on U. Take $A \in FS(U)$. A fuzzy rough
set over U is a pair $(R{\downarrow}A, R{\uparrow}A) \in FS(U) \times FS(U)$,
where

• $R{\downarrow}A: U \to [0, 1]$ is given by
  $R{\downarrow}A(u) = \inf_{v \in U} \max(1 - R(v, u), A(v))$
• $R{\uparrow}A: U \to [0, 1]$ is given by
  $R{\uparrow}A(u) = \sup_{v \in U} \min(R(v, u), A(v))$.
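The two approximations of Definition 6.24 can be computed directly when U is finite, since the infimum and supremum become a minimum and a maximum. The following Python sketch assumes that R is stored as a dictionary keyed by pairs (v, u) and A as a dictionary of membership degrees; the data values are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

Relation = Dict[Tuple[Hashable, Hashable], float]
FuzzySet = Dict[Hashable, float]


def lower_approximation(R: Relation, A: FuzzySet, U: Sequence) -> FuzzySet:
    """(R down A)(u) = min over v in U of max(1 - R(v, u), A(v))."""
    return {u: min(max(1.0 - R[(v, u)], A[v]) for v in U) for u in U}


def upper_approximation(R: Relation, A: FuzzySet, U: Sequence) -> FuzzySet:
    """(R up A)(u) = max over v in U of min(R(v, u), A(v))."""
    return {u: max(min(R[(v, u)], A[v]) for v in U) for u in U}


# Illustrative data: a reflexive, symmetric, min-transitive relation.
U = ["a", "b", "c"]
R = {(v, u): 1.0 if v == u else 0.3 for v in U for u in U}
R[("a", "b")] = R[("b", "a")] = 0.8
A = {"a": 1.0, "b": 0.6, "c": 0.1}

lower = lower_approximation(R, A, U)   # approx. {a: 0.6, b: 0.6, c: 0.1}
upper = upper_approximation(R, A, U)   # approx. {a: 1.0, b: 0.8, c: 0.3}
```

Because R is reflexive in this example, the lower approximation lies below A and the upper approximation lies above it for every element of U.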




6.9 Conclusions 


In this chapter, we have reviewed the main types of 
fuzzy sets defined since 1965. We have classified these 
sets in two groups: those that take into account the 
problem of building the membership functions, which 
we have included in the so-called extensions of fuzzy 
sets, and those that appear as an answer to such a key 
issue. 

We have introduced the definitions and first properties
of the extensions, that is, type-2 fuzzy sets, interval-valued
fuzzy sets, Atanassov's intuitionistic fuzzy sets
or type-2 bipolar fuzzy sets, and Atanassov's interval-valued
fuzzy sets. We have described the properties and
problems linked to type-2 fuzzy sets, and we have presented
several construction methods for interval-valued
fuzzy sets, depending on the application. We have also
referred to some papers where it is shown that the use of
interval-valued fuzzy sets improves the results obtained
with fuzzy sets.

In general, we have stated the main problem in
extensions of fuzzy sets, namely, finding applications for
which the results obtained with these sets are better
than those obtained with other techniques. Up to now, this
has been proved only for interval-valued fuzzy sets.
We think that the great challenge for sets that were
initially justified as a theoretical need is to prove their
practical usefulness.


References 
6.1 L.A. Zadeh: Fuzzy sets, Inf. Control 8, 338-353 (1965)

6.2 J.A. Goguen: L-fuzzy sets, J. Math. Anal. Appl. 18, 145-174 (1967)

6.3 J.T. Cacioppo, W.L. Gardner, C.G. Berntson: Beyond bipolar conceptualizations and measures: The case of attitudes and evaluative space, Pers. Soc. Psychol. Rev. 1, 3-25 (1997)

6.4 R. Goldblatt: Topoi: The Categorial Analysis of Logic (North-Holland, Amsterdam 1979)

6.5 S. Mac Lane, I. Moerdijk: Sheaves in Geometry and Logic (Springer, New York 1992)

6.6 G. Takeuti, S. Titani: Intuitionistic fuzzy logic and intuitionistic fuzzy set theory, J. Symb. Log. 49, 851-866 (1984)

6.7 D. Dubois, S. Gottwald, P. Hajek, J. Kacprzyk, H. Prade: Terminological difficulties in fuzzy set theory - The case of intuitionistic fuzzy sets, Fuzzy Sets Syst. 156(3), 485-491 (2005)

6.8 G. Birkhoff: Lattice Theory (American Mathematical Society, Providence 1973)

6.9 R. Willmott: Mean Measures in Fuzzy Power-Set Theory, Report No. FRP-6 (Dep. Math., Univ. Essex, Colchester 1979)

6.10 W. Bandler, L. Kohout: Fuzzy power sets, fuzzy implication operators, Fuzzy Sets Syst. 4, 13-30 (1980)

6.11 B. De Baets, E.E. Kerre, M. Gupta: The fundamentals of fuzzy mathematical morphology - part 1: Basic concepts, Int. J. Gen. Syst. 23(2), 155-171 (1995)

6.12 L.K. Huang, M.J. Wang: Image thresholding by minimizing the measure of fuzziness, Pattern Recognit. 29(1), 41-51 (1995)

6.13 H. Bustince, J. Montero, E. Barrenechea, M. Pagola: Semiautoduality in a restricted family of aggregation operators, Fuzzy Sets Syst. 158(12), 1360-1377 (2007)

6.14 T. Calvo, A. Kolesárová, M. Komornikova, R. Mesiar: Aggregation operators: Properties, classes and construction methods. In: Aggregation Operators New Trends and Applications, ed. by T. Calvo, G. Mayor, R. Mesiar (Physica, Heidelberg 2002) pp. 3-104

6.15 J. Fodor, M. Roubens: Fuzzy preference modelling 
and multicriteria decision support, Theory and De- 
cision Library (Kluwer, Dordrecht 1994) 

6.16 E.P. Klement, R. Mesiar, E. Pap: Triangular norms, 
trends in logic, Studia Logica Library (Kluwer, Dor- 
drecht 2000) 

6.17 L.A. Zadeh: Quantitative fuzzy semantics, Inf. Sci. 
3, 159-176 (1971) 

6.18 E.E. Kerre: A first view on the alternatives of fuzzy 
sets theory. In: Computational Intelligence in The- 
ory and Practice, ed. by B. Reusch, K.-H. Temme 
(Physica, Heidelberg 2001) pp. 55-72 

6.19 M. Mizumoto, K. Tanaka: Some properties of fuzzy 
sets of type 2, Inf. Control 31, 312-340 (1976) 

6.20 D. Dubois, H. Prade: Operations in a fuzzy-valued 
logic, Inf. Control 43(2), 224-254 (1979) 

6.21 J. Harding, C. Walker, E. Walker: The variety gener- 
ated by the truth value algebra of type-2 fuzzy sets, 
Fuzzy Sets Syst. 161, 735-749 (2010) 

6.22 J.M. Mendel, R.I. John: Type-2 fuzzy sets made sim- 
ple, IEEE Trans. Fuzzy Syst. 10, 117-127 (2002) 

6.23 J. Aisbett, J.T. Rickard, D.G. Morgenthaler: Type-2 
fuzzy sets as functions on spaces, IEEE Trans. Fuzzy 
Syst. 18(4), 841-844 (2010) 

6.24 G. Deschrijver, E.E. Kerre: On the position of in- 
tuitionistic fuzzy set theory in the framework of 
theories modelling imprecision, Inf. Sci. 177, 1860- 
1866 (2007) 

6.25 G. Deschrijver, E.E. Kerre: On the relationship be- 
tween some extensions of fuzzy set theory, Fuzzy 
Sets Syst. 133, 227-235 (2003) 


6.26 D. Dubois, H. Prade: Fuzzy Sets and Systems: Theory and Applications (Academic, New York 1980)

6.27 G.J. Klir, B. Yuan: Fuzzy Sets and Fuzzy Logic: Theory and Applications (Prentice-Hall, New Jersey 1995)

6.28 J.M. Mendel: Type-2 fuzzy sets for computing with words, IEEE Int. Conf. Granul. Comput., Atlanta (2006), GA 8-8

6.29 J.M. Mendel: Computing with words and its relationships with fuzzistics, Inf. Sci. 177(4), 988-1006 (2007)

6.30 J.M. Mendel: Historical reflections on perceptual computing, Proc. 8th Int. FLINS Conf. (FLINS'08) (World Scientific, Singapore 2008) pp. 181-187

6.31 J.M. Mendel: Computing with words: Zadeh, Turing, Popper and Occam, IEEE Comput. Intell. Mag. 2, 10-17 (2007)

6.32 H. Hagras: Type-2 FLCs: A new generation of fuzzy controllers, IEEE Comput. Intell. Mag. 2, 30-43 (2007)

6.33 H. Hagras: A hierarchical type-2 fuzzy logic control architecture for autonomous mobile robots, IEEE Trans. Fuzzy Syst. 12, 524-539 (2004)

6.34 R. Sepulveda, O. Castillo, P. Melin, A. Rodriguez-Diaz, O. Montiel: Experimental study of intelligent controllers under uncertainty using type-1 and type-2 fuzzy logic, Inf. Sci. 177, 2023-2048 (2007)

6.35 X.S. Xia, Q.L. Liang: Crosslayer design for mobile ad hoc networks using interval type-2 fuzzy logic systems, Int. J. Uncertain. Fuzziness Knowl. Syst. 16(3), 391-408 (2008)

6.36 C.H. Wang, C.S. Cheng, T.T. Lee: Dynamical optimal training for interval type-2 fuzzy neural network (T2FNN), IEEE Trans. Syst. Man Cybern. B 34(3), 1462-1477 (2004)

6.37 R. Sambuc: Fonction Φ-Flous, Application a l'aide au Diagnostic en Pathologie Thyroidienne, These de Doctorat en Medicine (Univ. Marseille, Marseille 1975)

6.38 K.U. Jahn: Intervall-wertige Mengen, Math. Nachr. 68, 115-132 (1975)

6.39 I. Grattan-Guinness: Fuzzy membership mapped onto interval and many-valued quantities, Z. Math. Log. Grundl. Math. 22, 149-160 (1976)

6.40 A. Dziech, M.B. Gorzalczany: Decision making in signal transmission problems with interval-valued fuzzy sets, Fuzzy Sets Syst. 23(2), 191-203 (1987)

6.41 M.B. Gorzalczany: A method of inference in approximate reasoning based on interval-valued fuzzy sets, Fuzzy Sets Syst. 21, 1-17 (1987)

6.42 M.B. Gorzalczany: An interval-valued fuzzy inference method. Some basic properties, Fuzzy Sets Syst. 31(2), 243-251 (1989)

6.43 I.B. Türksen: Interval valued fuzzy sets based on normal forms, Fuzzy Sets Syst. 20(2), 191-210 (1986)

6.44 I.B. Türksen, Z. Zhong: An approximate analogical reasoning schema based on similarity measures and interval-valued fuzzy sets, Fuzzy Sets Syst. 34, 323-346 (1990)


6.45 I.B. Türksen, D.D. Yao: Representation of connectives in fuzzy reasoning: The view through normal forms, IEEE Trans. Syst. Man Cybern. 14, 191-210 (1984)

6.46 J.L. Deng: Introduction to grey system theory, J. Grey Syst. 1, 1-24 (1989)

6.47 H. Bustince: Indicator of inclusion grade for interval-valued fuzzy sets. Application to approximate reasoning based on interval-valued fuzzy sets, Int. J. Approx. Reason. 23(3), 137-209 (2000)

6.48 G. Deschrijver: The Archimedean property for t-norms in interval-valued fuzzy set theory, Fuzzy Sets Syst. 157(17), 2311-2327 (2006)

6.49 G. Deschrijver, C. Cornelis, E.E. Kerre: On the representation of intuitionistic fuzzy t-norms and t-conorms, IEEE Trans. Fuzzy Syst. 12(1), 45-61 (2004)

6.50 Z. Xu, R.R. Yager: Some geometric aggregation operators based on intuitionistic fuzzy sets, Int. J. Gen. Syst. 35, 417-433 (2006)

6.51 H. Bustince, J. Fernandez, A. Kolesárová, M. Mesiar: Generation of linear orders for intervals by means of aggregation functions, Fuzzy Sets Syst. 220, 69-77 (2013)

6.52 J. Montero, D. Gomez, H. Bustince: On the relevance of some families of fuzzy sets, Fuzzy Sets Syst. 158(22), 2429-2442 (2007)

6.53 H. Bustince, F. Herrera, J. Montero: Fuzzy Sets and Their Extensions: Representation Aggregation and Models (Springer, Berlin 2007)

6.54 W. Pedrycz: Shadowed sets: Representing and processing fuzzy sets, IEEE Trans. Syst. Man Cybern. B 28, 103-109 (1998)

6.55 W. Pedrycz, G. Vukovich: Investigating a relevance of fuzzy mappings, IEEE Trans. Syst. Man Cybern. B 30, 249-262 (2000)

6.56 W. Pedrycz, G. Vukovich: Granular computing with shadowed sets, Int. J. Intell. Syst. 17, 173-197 (2002)

6.57 J.M. Mendel: Advances in type-2 fuzzy sets and systems, Inf. Sci. 177, 84-110 (2007)

6.58 H. Bustince, J. Montero, M. Pagola, E. Barrenechea, D. Gomez: A survey of interval-valued fuzzy sets. In: Handbook of Granular Computing, ed. by W. Pedrycz (Wiley, New York 2008)

6.59 P. Burillo, H. Bustince: Entropy on intuitionistic fuzzy sets and on interval-valued fuzzy sets, Fuzzy Sets Syst. 78, 305-316 (1996)

6.60 A. Jurio, M. Pagola, D. Paternain, C. Lopez-Molina, P. Melo-Pinto: Interval-valued restricted equivalence functions applied on clustering techniques, Proc. Int. Fuzzy Syst. Assoc. World Congr. Eur. Soc. Fuzzy Log. Technol. Conf. (2009) pp. 831-836

6.61 E. Szmidt, J. Kacprzyk: Entropy for intuitionistic fuzzy sets, Fuzzy Sets Syst. 118(3), 467-477 (2001)

6.62 H. Rezaei, M. Mukaidono: New similarity measures of intuitionistic fuzzy sets, J. Adv. Comput. Intell. Inf. 11(2), 202-209 (2007)

6.63 H. Bustince, M. Pagola, E. Barrenechea, J. Fernandez, P. Melo-Pinto, P. Couto, H.R. Tizhoosh, J. Montero: Ignorance functions. An application to the calculation of the threshold in prostate ultrasound images, Fuzzy Sets Syst. 161(1), 20-36 (2010)

6.64 D. Wu: Approaches for reducing the computational cost of interval type-2 fuzzy logic systems: Overview and comparisons, IEEE Trans. Fuzzy Syst. 21(1), 80-99 (2013)

6.65 H. Bustince, M. Galar, B. Bedregal, A. Kolesárová, R. Mesiar: A new approach to interval-valued Choquet integrals and the problem of ordering in interval-valued fuzzy set applications, IEEE Trans. Fuzzy Syst. 21(6), 1150-1162 (2013)

6.66 J. Sanz, H. Bustince, F. Herrera: Improving the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets and genetic amplitude tuning, Inf. Sci. 180(19), 3674-3685 (2010)

6.67 J. Sanz, A. Fernandez, H. Bustince, F. Herrera: A genetic tuning to improve the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets: Degree of ignorance and lateral position, Int. J. Approx. Reason. 52(6), 751-766 (2011)

6.68 J. Sanz, A. Fernandez, H. Bustince, F. Herrera: IIVFDT: Ignorance functions based interval-valued fuzzy decision tree with genetic tuning, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 20(Suppl. 2), 1-30 (2012)

6.69 J. Sanz, A. Fernandez, H. Bustince, F. Herrera: IVTURS: A linguistic fuzzy rule-based classification system based on a new interval-valued fuzzy reasoning method with tuning and rule selection, IEEE Trans. Fuzzy Syst. 21(3), 399-411 (2013)

6.70 Z. Chi, H. Yan, T. Pham: Fuzzy Algorithms with Applications to Image Processing and Pattern Recognition (World Scientific, Singapore 1996)

6.71 H. Ishibuchi, T. Yamamoto, T. Nakashima: Hybridization of fuzzy GBML approaches for pattern classification problems, IEEE Trans. Syst. Man Cybern. B 35(2), 359-365 (2005)

6.72 J. Dombi, Z. Gera: Rule based fuzzy classification using squashing functions, J. Intell. Fuzzy Syst. 19(1), 3-8 (2008)

6.73 R. Alcala, J. Alcala-Fdez, F. Herrera: A proposal for the genetic lateral tuning of linguistic fuzzy systems and its interaction with rule selection, IEEE Trans. Fuzzy Syst. 15(4), 616-635 (2007)

6.74 R. Alcala, J. Alcala-Fdez, M. Graco, F. Herrera: Rule base reduction and genetic tuning of fuzzy systems based on the linguistic 3-tuples representation, Soft Comput. 11(5), 401-419 (2007)

6.75 J. Quinlan: C4.5: Programs for Machine Learning (Morgan Kaufmann, San Mateo 1993)

6.76 C.Z. Janikow: Fuzzy decision trees: Issues and methods, IEEE Trans. Syst. Man Cybern. B 28(1), 1-14 (1998)

6.77 J. Alcala-Fdez, R. Alcala, F. Herrera: A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning, IEEE Trans. Fuzzy Syst. 19(5), 857-872 (2011)


6.78 J. Hühn, E. Hüllermeier: FURIA: An algorithm for unordered fuzzy rule induction, Data Min. Knowl. Discov. 19(3), 293-319 (2009)

6.79 E. Barrenechea, H. Bustince, B. De Baets, C. Lopez-Molina: Construction of interval-valued fuzzy relations with application to the generation of fuzzy edge images, IEEE Trans. Fuzzy Syst. 19(5), 819-830 (2011)

6.80 H. Bustince, P.M. Barrenechea, J. Fernandez, J. Sanz: "Image thresholding using type II fuzzy sets." Importance of this method, Pattern Recognit. 43(9), 3188-3192 (2010)

6.81 H. Bustince, E. Barrenechea, M. Pagola, J. Fernandez: Interval-valued fuzzy sets constructed from matrices: Application to edge detection, Fuzzy Sets Syst. 160(13), 1819-1840 (2009)

6.82 M. Galar, F. Fernandez, G. Beliakov, H. Bustince: Interval-valued fuzzy sets applied to stereo matching of color images, IEEE Trans. Image Process. 20, 1949-1961 (2011)

6.83 M. Pagola, C. Lopez-Molina, J. Fernandez, E. Barrenechea, H. Bustince: Interval type-2 fuzzy sets constructed from several membership functions. Application to the fuzzy thresholding algorithm, IEEE Trans. Fuzzy Syst. 21(2), 230-244 (2013)

6.84 H.R. Tizhoosh: Image thresholding using type-2 fuzzy sets, Pattern Recognit. 38, 2363-2372 (2005)

6.85 M.E. Yuksel, M. Borlu: Accurate segmentation of dermoscopic images by image thresholding based on type-2 fuzzy logic, IEEE Trans. Fuzzy Syst. 17(4), 976-982 (2009)

6.86 C. Shyi-Ming, W. Hui-Yu: Evaluating students' answer scripts based on interval-valued fuzzy grade sheets, Expert Syst. Appl. 36(6), 9839-9846 (2009)

6.87 F. Liu, H. Geng, Y.-Q. Zhang: Interactive fuzzy interval reasoning for smart web shopping, Appl. Soft Comput. 5(4), 433-439 (2005)

6.88 C. Byung-In, C.-H.R. Frank: Interval type-2 fuzzy membership function generation methods for pattern recognition, Inf. Sci. 179(13), 2102-2122 (2009)

6.89 H.M. Choi, G.S. Min, J.Y. Ahn: A medical diagnosis based on interval-valued fuzzy sets, Biomed. Eng. Appl. Basis Commun. 24(4), 349-354 (2012)

6.90 J.M. Mendel, H. Wu: Type-2 fuzzistics for symmetric interval type-2 fuzzy sets: Part 1: Forward problems, IEEE Trans. Fuzzy Syst. 14(6), 781-792 (2006)

6.91 D. Wu, J.M. Mendel: A vector similarity measure for linguistic approximation: Interval type-2 and type-1 fuzzy sets, Inf. Sci. 178(2), 381-402 (2008)

6.92 A. Jurio, H. Bustince, M. Pagola, A. Pradera, R.R. Yager: Some properties of overlap and grouping functions and their application to image thresholding, Fuzzy Sets Syst. 229, 69-90 (2013)

6.93 K.T. Atanassov: Intuitionistic fuzzy sets, VII ITKRs Session, Central Sci.-Tech. Libr. Bulg. Acad. Sci., Sofia (1983) pp. 1684-1697, (in Bulgarian)

6.94 K.T. Atanassov: Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20, 87-96 (1986)


6.95 W.L. Gau, D.J. Buehrer: Vague sets, IEEE Trans. Syst. Man Cybern. 23(2), 610-614 (1993)

6.96 H. Bustince, P. Burillo: Vague sets are intuitionistic fuzzy sets, Fuzzy Sets Syst. 79(3), 403-405 (1996)

6.97 H. Bustince, E. Barrenechea, P. Pagola: Generation of interval-valued fuzzy and Atanassov's intuitionistic fuzzy connectives from fuzzy connectives and from Kw operators: Laws for conjunctions and disjunctions, amplitude, Int. J. Intell. Syst. 23(6), 680-714 (2008)

6.98 K.T. Atanassov, G. Gargov: Interval valued intuitionistic fuzzy sets, Fuzzy Sets Syst. 31(3), 343-349 (1989)

6.99 J. Ye: Fuzzy decision-making method based on the weighted correlation coefficient under intuitionistic fuzzy environment, Eur. J. Oper. Res. 205(1), 202-204 (2010)

6.100 F. Herrera, E. Herrera-Viedma: Linguistic decision analysis: Steps for solving decision problems under linguistic information, Fuzzy Sets Syst. 115, 67-82 (2000)

6.101 L. Baccour, A.M. Alimi, R.I. John: Similarity measures for intuitionistic fuzzy sets: State of the art, J. Intell. Fuzzy Syst. 24(1), 37-49 (2013)

6.102 E. Szmidt, J. Kacprzyk, P. Bujnowski: Measuring the amount of knowledge for Atanassov's intuitionistic fuzzy sets, Lect. Notes Comput. Sci. 6857, 17-24 (2011)

6.103 N.R. Pal, H. Bustince, M. Pagola, U.K. Mukherjee, D.P. Goswami, G. Beliakov: Uncertainties with Atanassov's intuitionistic fuzzy sets: fuzziness and lack of knowledge, Inf. Sci. 228, 61-74 (2013)

6.104 U. Dudziak, B. Pekala: Equivalent bipolar fuzzy relations, Fuzzy Sets Syst. 161(2), 234-253 (2010)

6.105 Z. Xu: Approaches to multiple attribute group decision making based on intuitionistic fuzzy power aggregation operators, Knowl. Syst. 24(6), 749-760 (2011)

6.106 Z. Xu, H. Hu: Projection models for intuitionistic fuzzy multiple attribute decision making, Int. J. Inf. Technol. Decis. Mak. 9(2), 257-280 (2010)

6.107 Z. Xu: Priority weights derived from intuitionistic multiplicative preference relations in decision making, IEEE Trans. Fuzzy Syst. 21(4), 642-654 (2013)

6.108 X. Zhang, Z. Xu: A new method for ranking intuitionistic fuzzy values and its application in multi-attribute decision making, Fuzzy Optim. Decis. Mak. 11(2), 135-146 (2012)

6.109 T. Chen: Multi-criteria decision-making methods with optimism and pessimism based on Atanassov's intuitionistic fuzzy sets, Int. J. Syst. Sci. 43(5), 920-938 (2012)

6.110 S.K. Biswas, A.R. Roy: An application of intuitionistic fuzzy sets in medical diagnosis, Fuzzy Sets Syst. 117, 209-213 (2001)

6.111 I. Bloch: Lattices of fuzzy sets and bipolar fuzzy sets, and mathematical morphology, Inf. Sci. 181(10), 2002-2015 (2011)


6.112 P. Melo-Pinto, P. Couto, H. Bustince, E. Barrenechea, M. Pagola, F. Fernandez: Image segmentation using Atanassov's intuitionistic fuzzy sets, Expert Syst. Appl. 40(1), 15-26 (2013)

6.113 P. Couto, A. Jurio, A. Varejao, M. Pagola, H. Bustince, P. Melo-Pinto: An IVFS-based image segmentation methodology for rat gait analysis, Soft Comput. 15(10), 1937-1944 (2011)

6.114 K.T. Atanassov: Answer to D. Dubois, S. Gottwald, P. Hajek, J. Kacprzyk and H. Prade's paper Terminological difficulties in fuzzy set theory - The case of Intuitionistic fuzzy sets, Fuzzy Sets Syst. 156(3), 496-499 (2005)

6.115 D. Dubois, H. Prade: An introduction to bipolar representations of information and preference, Int. J. Intell. Syst. 23, 866-877 (2008)

6.116 D. Dubois, H. Prade: An overview of the asymmetric bipolar representation of positive and negative information in possibility theory, Fuzzy Sets Syst. 160(10), 1355-1366 (2009)

6.117 W.R. Zhang: NPN fuzzy sets and NPN qualitative algebra: a computational framework for bipolar cognitive modeling and multiagent decision analysis, IEEE Trans. Syst. Man Cybern. B 26(4), 561-574 (1996)

6.118 W.R. Zhang: Bipolar logic and bipolar fuzzy partial orderings for clustering and coordination, Proc. 6th Joint Conf. Inf. Sci. (2002) pp. 85-88

6.119 P. Grzegorzewski: On some basic concepts in probability of IF-events, Inf. Sci. 232, 411-418 (2013)

6.120 H. Bustince, P. Burillo: Correlation of interval-valued intuitionistic fuzzy sets, Fuzzy Sets Syst. 74(2), 237-244 (1995)

6.121 J. Wu, F. Chiclana: Non-dominance and attitudinal prioritisation methods for intuitionistic and interval-valued intuitionistic fuzzy preference relations, Expert Syst. Appl. 39(18), 13409-13416 (2012)

6.122 Z. Xu, Q. Chen: A multi-criteria decision making procedure based on interval-valued intuitionistic fuzzy bonferroni means, J. Syst. Sci. Syst. Eng. 20(2), 217-228 (2011)

6.123 J. Ye: Multicriteria decision-making method using the Dice similarity measure based on the reduct intuitionistic fuzzy sets of interval-valued intuitionistic fuzzy sets, Appl. Math. Model. 36(9), 4466-4472 (2012)

6.124 A. Aygunoglu, B.P. Varol, V. Cetkin, H. Aygun: Interval-valued intuitionistic fuzzy subgroups based on interval-valued double t-norm, Neural Comput. Appl. 21(1), S207-S214 (2012)

6.125 M. Fanyong, Z. Qiang, C. Hao: Approaches to multiple-criteria group decision making based on interval-valued intuitionistic fuzzy Choquet integral with respect to the generalized lambda-Shapley index, Knowl. Syst. 37, 237-249 (2013)

6.126 W. Wang, X. Liu, Y. Qin: Interval-valued intuitionistic fuzzy aggregation operators 14, J. Syst. Eng. Electron. 23(4), 574-580 (2012)


6.127 K. Hirota: Concepts of probabilistic sets, Fuzzy Sets Syst. 5, 31-46 (1981)

6.128 R.R. Yager: On the theory of bags, Int. J. Gen. Syst. 13, 23-37 (1986)

6.129 S. Miyamoto: Multisets and fuzzy multisets. In: Soft Computing and Human-Centered Machines, ed. by Z.-Q. Liu, S. Miyamoto (Springer, Berlin 2000) pp. 9-33

6.130 Y. Shang, X. Yuan, E.S. Lee: The n-dimensional fuzzy sets and Zadeh fuzzy sets based on the finite valued fuzzy sets, Comput. Math. Appl. 60, 442-463 (2010)

6.131 B. Bedregal, G. Beliakov, H. Bustince, T. Calvo, R. Mesiar, D. Paternain: A class of fuzzy multisets with a fixed number of memberships, Inf. Sci. 189, 1-17 (2012)

6.132 A. Amo, J. Montero, G. Biging, V. Cutello: Fuzzy classification systems, Eur. J. Oper. Res. 156, 459-507 (2004)

6.133 J. Montero: Arrow's theorem under fuzzy rationality, Behav. Sci. 32, 267-273 (1987)

6.134 A. Mesiarová, J. Lazaro: Bipolar Aggregation operators, Proc. AGOP2003, Alcalá de Henares (2003) pp. 119-123

6.135 A. Mesiarova-Zemankova, R. Mesiar, K. Ahmad: The balancing Choquet integral, Fuzzy Sets Syst. 161(17), 2243-2255 (2010)


6.136 A. Mesiarova-Zemankova, K. Ahmad: Multi-polar Choquet integral, Fuzzy Sets Syst. 220, 1-20 (2013)

6.137 W.R. Zhang, L. Zhang: YinYang bipolar logic and bipolar fuzzy logic, Inf. Sci. 165(3/4), 265-287 (2004)

6.138 W.R. Zhang: YinYang Bipolar T-norms and T-conorms as granular neurological operators, Proc. IEEE Int. Conf. Granul. Comput., Atlanta (2006) pp. 91-96

6.139 F. Smarandache: A unifying field in logics: Neutrosophic logic, Multiple-Valued Logic 8(3), 385-438 (2002)

6.140 V. Torra: Hesitant fuzzy sets, Int. J. Intell. Syst. 25, 529-539 (2010)

6.141 V. Torra, Y. Narukawa: On hesitant fuzzy sets and decision, Proc. Conf. Fuzzy Syst. (FUZZ IEEE) (2009) pp. 1378-1382

6.142 D. Molodtsov: Soft set theory. First results, Comput. Math. Appl. 37, 19-31 (1999)

6.143 P.K. Maji, R. Biswas, R. Roy: Fuzzy soft sets, J. Fuzzy Math. 9(3), 589-602 (2001)

6.144 Z. Pawlak: Rough sets, Int. J. Comput. Inf. Sci. 11, 341-356 (1982)

6.145 D. Dubois, H. Prade: Rough fuzzy-sets and fuzzy rough sets, Int. J. Gen. Syst. 17(3), 191-209 (1990)


7. F-Transform


Irina Perfilieva


The theory of the F-transform is presented and
discussed from the perspective of the latest developments
and applications. Various fuzzy partitions
are considered. The definition of the F-transform
is given with respect to a generalized fuzzy partition,
and the main properties of the F-transform
are listed. The applications to image processing,
namely image compression, fusion and edge detection,
are discussed with sufficient technical details.


7.1 Fuzzy Modeling
7.2 Fuzzy Partitions
    7.2.1 Fuzzy Partition with the Ruspini Condition
    7.2.2 Fuzzy Partitions with the Generalized Ruspini Condition
    7.2.3 Generalized Fuzzy Partitions
7.3 Fuzzy Transform
    7.3.1 Direct F-Transform
    7.3.2 Inverse F-Transform
7.4 Discrete F-Transform
7.5 F-Transforms of Functions of Two Variables
7.6 $F^1$-Transform
7.7 Applications
    7.7.1 Image Compression and Reconstruction
    7.7.2 Image Fusion
    7.7.3 $F^1$-Transform Edge Detector
7.8 Conclusions
References


7.1 Fuzzy Modeling


Fuzzy modeling is still regarded as a modern technique
with a nonclassical background. The goal of this chapter
is to bridge standard mathematical methods and
methods for the construction of fuzzy approximation
models. We will present the theory of the fuzzy transform
(the F-transform), which was introduced in [7.1]
for the purpose of encompassing both classical (usually,
integral) transforms and approximation models
based on fuzzy IF-THEN rules (fuzzy approximation
models). We start with an informal characterization of
integral transforms, and from this discussion, we examine
the similarities and differences among integral
transforms, the F-transform, and fuzzy approximation
models. An integral transform is performed using some
kernel. The kernel is represented by a function of two
variables and can be understood as a collection of local
factors or closeness areas around elements of an
original space. Each factor is then assigned an average
value of a transforming object (usually, a function).
Consequently, the transformed object is a new function
defined on a space of local factors. The F-transform
can be implicitly characterized by a discrete kernel that
is associated with a finite collection of fuzzy subsets
(local factors or closeness areas around chosen nodes)
of an original space. We say that this collection establishes
a fuzzy partition of the space. Then, similar to
integral transforms, the F-transform assigns an average
value of a transforming object to each fuzzy subset
from the fuzzy partition of the space. Consequently,
the F-transformed object is a finite vector of average
values.

Similar to the F-transform, a fuzzy approximation 
model can also be implicitly characterized by a discrete 
kernel that establishes a fuzzy partition of an original 
space. Each element of the established fuzzy partition 
is a fuzzy set in the IF part (antecedent) of the respective
fuzzy IF-THEN rule. The rule characterizes
a correspondence between an antecedent and an aver- 
age value of a transforming object (singleton model) or 
a fuzzy subset of a space of object values (fuzzy set 
model). 

To emphasize the differences among integral trans- 
forms, the F-transform, and fuzzy approximation mod- 
els, we note that the last two are actually finite collec- 
tions of local descriptions of a considered object. Each 
collection produces a global description of the consid- 
ered object in the form of the direct F-transform or the 
system of fuzzy IF-THEN rules. 

The idea of producing collections of local descrip- 
tions by fuzzy IF-THEN rules originates from the 
early works of Zadeh [7.2-5] and from the Takagi- 
Sugeno [7.6] approximation models. 

Similar to the conventional integral transforms (the 
Fourier and Laplace transforms, for example), the F- 
transform performs a transformation of an original 
universe of functions into a universe of their skeleton 


7.2 Fuzzy Partitions 


In this section, we present a short overview of various 
fuzzy partitions of a universe in which transforming 
objects (functions) are defined. As we learned from 
Sect. 7.1, a fuzzy partition is a finite collection of fuzzy 
subsets of the universe that determines a discrete kernel 
and thus a respective transform. Therefore, we have as 
many F-transforms as fuzzy partitions. 


7.2.1 Fuzzy Partition 
with the Ruspini Condition 


The fuzzy partition with the Ruspini condition (7.1) 
(simply, Ruspini partition) was introduced in [7.1]. 
This condition implies normality of the respective 
fuzzy partition, i. e., the partition-of-unity. It then leads 
to a simplified version of the inverse F-transform. 
In later publications [7.15,16], the Ruspini condition 
was weakened to obtain an additional degree of free- 
dom and a better approximation by the inverse F- 
transform. 


Definition 7.1 

Let xı <---<x, be fixed nodes within [a,b] such 
that xı = a,x, = b and n> 2. We say that the fuzzy 
sets A,,...,A,, identified with their membership func- 
tions defined on [a, b], establish a Ruspini partition of 


models (vectors of F-transform components) for which 
further computations are easier (see, e.g., an application 
to the initial value problem with fuzzy initial condi- 
tions [7.7]). In this respect, the F-transform can be as 
useful in applications as traditional transforms (see ap- 
plications to image compression [7.8, 9] and time series 
processing [7.10-14], for example). Moreover, some- 
times the F-transform can be more efficient than its 
counterparts; see the details below. 

The structure of this chapter is as follows. In 
Sect. 7.2, we consider various fuzzy partitions: uniform 
and with and without the Ruspini condition, among oth- 
ers; in Sect. 7.3, definitions of the F-transforms (direct 
and inverse) and their main properties are considered; 
in Sect. 7.4, the discrete F-transform is defined; in 
Sect. 7.5, the direct and inverse F-transform of a func- 
tion of two variables is introduced; in Sect. 7.6, a higher 
degree F-transform is considered; in Sect. 7.7, applications
of the F-transform and F¹-transform to image
processing are discussed.


$[a,b]$ if they fulfill the following conditions for $k = 1,\dots,n$:

1. $A_k : [a,b] \to [0,1]$, $A_k(x_k) = 1$

2. $A_k(x) = 0$ if $x \notin (x_{k-1}, x_{k+1})$, where for uniformity
of notation, we set $x_0 = a$ and $x_{n+1} = b$

3. $A_k(x)$ is continuous

4. $A_k(x)$, for $k = 2,\dots,n$, strictly increases on
$[x_{k-1}, x_k]$ and $A_k(x)$, for $k = 1,\dots,n-1$, strictly decreases
on $[x_k, x_{k+1}]$

5. for all $x \in [a,b]$,

$$\sum_{k=1}^{n} A_k(x) = 1 . \tag{7.1}$$


The condition (7.1) is known as the Ruspini condition.
The membership functions $A_1,\dots,A_n$ are called
the basic functions. A point $x \in [a,b]$ is covered by the
basic function $A_k$ if $A_k(x) > 0$.

The shape of the basic functions is not predetermined
and can therefore be chosen according to
additional requirements (e.g., smoothness). Let us give
examples of various fuzzy partitions with the Ruspini
condition. In Fig. 7.1, two such partitions with triangular
and cosine basic functions are shown. The following
formulas represent generic fuzzy partitions with the
Ruspini condition and triangular functions

$$A_1(x) = \begin{cases} 1 - \dfrac{x - x_1}{h_1}, & x \in [x_1, x_2], \\ 0, & \text{otherwise}, \end{cases}$$

$$A_k(x) = \begin{cases} \dfrac{x - x_{k-1}}{h_{k-1}}, & x \in [x_{k-1}, x_k], \\ 1 - \dfrac{x - x_k}{h_k}, & x \in [x_k, x_{k+1}], \\ 0, & \text{otherwise}, \end{cases}$$

$$A_n(x) = \begin{cases} \dfrac{x - x_{n-1}}{h_{n-1}}, & x \in [x_{n-1}, x_n], \\ 0, & \text{otherwise}, \end{cases}$$

where $k = 2,\dots,n-1$ and $h_k = x_{k+1} - x_k$.
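The triangular formulas can be checked numerically. The following is a minimal sketch, not part of the original chapter: the function name `triangular_partition` and the node values are illustrative assumptions. It builds the basic functions for arbitrary (possibly non-uniform) nodes and measures how far their sum deviates from 1, which is the Ruspini condition (7.1).

```python
def triangular_partition(nodes):
    """Return triangular basic functions A_1..A_n for strictly increasing nodes.

    Hypothetical helper: nodes[0] plays the role of a, nodes[-1] the role of b.
    """
    n = len(nodes)

    def make(k):  # k is a 0-based index into nodes
        def A(x):
            xk = nodes[k]
            # rising branch on [x_{k-1}, x_k]
            if k > 0 and nodes[k - 1] <= x <= xk:
                return (x - nodes[k - 1]) / (xk - nodes[k - 1])
            # falling branch on [x_k, x_{k+1}]
            if k < n - 1 and xk <= x <= nodes[k + 1]:
                return 1.0 - (x - xk) / (nodes[k + 1] - xk)
            return 0.0
        return A

    return [make(k) for k in range(n)]


nodes = [0.0, 0.5, 2.0, 3.0]            # assumed example nodes, a = 0, b = 3
basic = triangular_partition(nodes)

# Ruspini condition (7.1): memberships should sum to 1 at every point of [a, b].
samples = [3.0 * i / 300 for i in range(301)]
ruspini_gap = max(abs(sum(A(x) for A in basic) - 1.0) for x in samples)
```

On each interval between neighboring nodes exactly two basic functions are nonzero, and they are complementary, which is why the sum stays at 1 regardless of the node spacing.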

We say that a Ruspini partition of $[a,b]$ is h-uniform
if its nodes $x_1,\dots,x_n$, where $n \geq 3$, are equidistant,
i.e., $x_k = a + h(k-1)$, for $k = 1,\dots,n$, where $h = (b-a)/(n-1)$,
and the two additional properties are met:


6. $A_k(x_k - x) = A_k(x_k + x)$, for all $x \in [0,h]$, $k = 2,\dots,n-1$,

7. $A_k(x) = A_{k-1}(x - h)$, for all $k = 2,\dots,n-1$ and
$x \in [x_k, x_{k+1}]$, and $A_{k+1}(x) = A_k(x - h)$, for all
$k = 2,\dots,n-1$ and $x \in [x_k, x_{k+1}]$.


7.2.2 Fuzzy Partitions with the 
Generalized Ruspini Condition 


Fuzzy partitions with the generalized Ruspini con- 
dition were introduced in [7.15]. The generalization 
consists in replacing partition-of-unity (7.1) by fuzzy r- 
partition (7.2). This type of partition was investigated 
in [7.15,17], where the focus was on smoothing or 
filtering data using the inverse F-transform. The follow- 
ing definition is taken from [7.15]. 


Definition 7.2
Let $r \geq 1$ and $n \geq 2$ be fixed integers such that $r < n$.
Let $a = x_1 < \dots < x_n = b$ be nodes within $[a,b]$, and
let $x_{-r+2} < \dots < x_0 < a$ and $b < x_{n+1} < \dots < x_{n+r-1}$ be
nodes outside of $[a,b]$. A fuzzy r-partition of $[a,b]$ is
a family of $n + 2r - 2$ continuous, normal, convex fuzzy
sets

$$A^{(r)}_{-r+2},\dots,A^{(r)}_{0},\; A^{(r)}_{1},\dots,A^{(r)}_{n},\; A^{(r)}_{n+1},\dots,A^{(r)}_{n+r-1}$$

such that the following conditions are fulfilled:

1. For $k = 1,\dots,n$, $A^{(r)}_k$ is a continuous function on
$[a,b]$ such that $A^{(r)}_k(x_k) = 1$ and $A^{(r)}_k(x) = 0$ for
$x \notin [\max(x_{k-r}, a), \min(x_{k+r}, b)]$

2. For $k = 1,\dots,n$, $A^{(r)}_k$ is increasing on
$[\max(x_{k-r}, a), x_k]$ and decreasing on
$[x_k, \min(x_{k+r}, b)]$

3. For $k = -r+2,\dots,0$, $A^{(r)}_k$ is decreasing on
$[\max(x_k, a), x_{k+r}]$

4. For $k = n+1,\dots,n+r-1$, $A^{(r)}_k$ is increasing on
$[x_{k-r}, \min(x_k, b)]$

5. For all $x \in [a,b]$, the following partition-of-r condition
holds

$$\sum_{k=-r+2}^{n+r-1} A^{(r)}_k(x) = r . \tag{7.2}$$


Fig. 7.1a,b Two Ruspini partitions with triangular (a) and cosine
basic functions (b)


If $r = 1$, then a fuzzy r-partition in the sense of Definition 7.2
becomes the standard fuzzy partition in the
sense of Definition 7.1, i.e., the partition-of-unity. In
Fig. 7.2, the fuzzy 2-partition with triangular basic
functions is shown.


Fig. 7.2 An example of a fuzzy 2-partition with triangular basic
functions

7.2.3 Generalized Fuzzy Partitions 


A generalized fuzzy partition appeared in [7.16] in 
connection with the notion of the higher degree F- 
transform. Its even weaker version was implicitly in- 
troduced in [7.18] with the purpose of meeting the 
requirements of image compression. We summarize 
both these notions and propose the following definition. 


Definition 7.3
Let $[a,b]$ be an interval on $\mathbb{R}$, $n \geq 2$, and let
$x_0, x_1, \dots, x_n, x_{n+1}$ be nodes such that

$$a = x_0 \leq x_1 < \dots < x_n \leq x_{n+1} = b .$$

We say that the fuzzy sets

$$A_1,\dots,A_n : [a,b] \to [0,1]$$

constitute a generalized fuzzy partition of $[a,b]$ if for
every $k = 1,\dots,n$ there exist $h'_k, h''_k \geq 0$ such that

$$h'_k + h''_k > 0, \qquad [x_k - h'_k, x_k + h''_k] \subseteq [a,b]$$

and the following three conditions are fulfilled:

1. (locality) $A_k(x) > 0$ if $x \in (x_k - h'_k, x_k + h''_k)$, and
$A_k(x) = 0$ if $x \in [a,b] \setminus [x_k - h'_k, x_k + h''_k]$

2. (continuity) $A_k$ is continuous on $[x_k - h'_k, x_k + h''_k]$

3. (covering) for $x \in [a,b]$, $\sum_{k=1}^{n} A_k(x) > 0$.


It is important to remark that by the conditions of locality
and continuity,

$$\int_a^b A_k(x)\,dx > 0 .$$


An $(h,h')$-uniform generalized fuzzy partition of
$[a,b]$ is defined for equidistant nodes

$$x_k = a + h(k-1), \quad k = 1,\dots,n,$$

where $h = (b-a)/(n-1)$, $h' > h/2$ and two additional
properties are satisfied:


4. $A_k(x) = A_{k-1}(x - h)$ for all $k = 2,\dots,n-1$ and
$x \in [x_k, x_{k+1}]$, and $A_{k+1}(x) = A_k(x - h)$ for all
$k = 2,\dots,n-1$ and $x \in [x_k, x_{k+1}]$.

5. $h'_1 = h''_n = 0$, $h''_1 = h'_2 = \dots = h''_{n-1} = h'_n = h'$ and
for all $k = 2,\dots,n-1$ and all $x \in [0,h']$,
$A_k(x_k - x) = A_k(x_k + x)$.


An $(h,h')$-uniform generalized fuzzy partition of
$[a,b]$ can also be defined using the generating function
$A_0 : [-1,1] \to [0,1]$, which is assumed to be even, continuous,
and positive everywhere except on the boundaries,
where it vanishes. (The function $A_0 : [-1,1] \to \mathbb{R}$
is even if for all $x \in [0,1]$, $A_0(-x) = A_0(x)$.) Then, the basic
functions $A_k$ of an $(h,h')$-uniform generalized fuzzy
partition are shifted copies of $A_0$ in the sense that

$$A_1(x) = \begin{cases} A_0\!\left(\dfrac{x - x_1}{h'}\right), & x \in [x_1, x_1 + h'], \\ 0, & \text{otherwise}, \end{cases}$$

and for $k = 2,\dots,n-1$,

$$A_k(x) = \begin{cases} A_0\!\left(\dfrac{x - x_k}{h'}\right), & x \in [x_k - h', x_k + h'], \\ 0, & \text{otherwise}, \end{cases}$$

$$A_n(x) = \begin{cases} A_0\!\left(\dfrac{x - x_n}{h'}\right), & x \in [x_n - h', x_n], \\ 0, & \text{otherwise}. \end{cases} \tag{7.3}$$


As an example, we note that the function $A_0(x) = 1 - |x|$
is a generating function for all uniform triangular partitions.
The difference between them is in the parameters $h$
and $h'$. An $(h,h)$-uniform generalized fuzzy partition is
simply called an h-uniform one (Fig. 7.3).


Fig. 7.3 Generating function $A_0$ of an h-uniform generalized fuzzy partition (after [7.19])
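The construction (7.3) translates directly into code. The sketch below is illustrative only: the names `generalized_partition`, `triangle`, and `raised_cosine` are assumptions, and the raised cosine is just one admissible choice of an even, continuous generating function that vanishes at the boundaries.

```python
import math

def generalized_partition(a, b, n, h_prime, A0):
    """Basic functions of an (h, h')-uniform generalized partition, see (7.3)."""
    h = (b - a) / (n - 1)

    def make(k):                       # 0-based k, node x_k = a + h*k
        xk = a + h * k
        def A(x):
            t = (x - xk) / h_prime     # rescale the support to [-1, 1]
            return A0(t) if (a <= x <= b and -1.0 <= t <= 1.0) else 0.0
        return A

    return [make(k) for k in range(n)]

triangle = lambda t: 1.0 - abs(t)                              # A_0(x) = 1 - |x|
raised_cosine = lambda t: 0.5 * (1.0 + math.cos(math.pi * t))  # assumed alternative

grid = [4.0 * i / 400 for i in range(401)]

# h' = 0.75 h > h/2: the covering condition holds, but the sum is no longer 1.
narrow = generalized_partition(0.0, 4.0, 5, 0.75, triangle)
min_cover = min(sum(A(x) for A in narrow) for x in grid)

# h' = h with the triangular generator gives back a Ruspini partition.
uniform = generalized_partition(0.0, 4.0, 5, 1.0, triangle)
ruspini_gap = max(abs(sum(A(x) for A in uniform) - 1.0) for x in grid)
```

Choosing $h' \leq h/2$ would leave the midpoints between nodes uncovered, which is why the definition requires $h' > h/2$.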


Remark 7.1 
A generalized fuzzy partition can also be consid- 
ered in connection with radial membership functions; 


7.3 Fuzzy Transform 


The F-transform establishes a correspondence between 
a set of continuous functions on an interval of real num- 
bers and the set of n-dimensional (real) vectors. Each 
component of the resulting vector is a weighted local 
mean of a corresponding function over an area covered 
by a corresponding basic function. The vector of the 
F-transform components is a simplified representation 
of an original function that can be used instead of the 
original function in many applications. Among them, 
let us mention applications to image compression [7.8, 
9], image fusion, image reduction, time series process- 
ing [7.10-14], and the initial value problem with fuzzy 
initial conditions [7.7]. 


7.3.1 Direct F-Transform 


In this section, we give the definition of the F-transform
according to [7.1] and recall its main properties.
We assume that the universe is an interval $[a,b]$
and $x_1 < \dots < x_n$ are fixed nodes from $[a,b]$ such that
$x_1 = a$, $x_n = b$ and $n \geq 2$. Let us formally extend the
set of nodes by $x_0 = a$ and $x_{n+1} = b$. Let $A_1,\dots,A_n$
be the basic functions that form a fuzzy partition of
$[a,b]$ according to Definition 7.3. Let $C([a,b])$ be the
set of continuous functions on the interval $[a,b]$. The
following definition introduces the fuzzy transform of
a function $f \in C([a,b])$.


Definition 7.4

Let $A_1,\dots,A_n$ be the basic functions that form a generalized
fuzzy partition of $[a,b]$ and $f$ be any function
from $C([a,b])$. We say that the n-tuple of real numbers
$F[f] = (F_1,\dots,F_n)$ given by

$$F_k = \frac{\int_a^b f(x) A_k(x)\,dx}{\int_a^b A_k(x)\,dx}, \quad k = 1,\dots,n, \tag{7.4}$$

is the (integral) F-transform of $f$ with respect to
$A_1,\dots,A_n$.


see [7.20]. In this case, every basic function has
a generic representation in terms of a kernel $\varphi : \mathbb{R}^+ \to \mathbb{R}$
such that

$$A_k(x) = \varphi(|x - x_k|), \quad k = 1,\dots,n.$$


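The elements $F_1,\dots,F_n$ are called the components of
the F-transform. If $A_1,\dots,A_n$ is an h-uniform Ruspini
partition, then (7.4) may be simplified as follows,

$$F_1 = \frac{2}{h} \int_{x_1}^{x_2} f(x) A_1(x)\,dx ,$$

$$F_n = \frac{2}{h} \int_{x_{n-1}}^{x_n} f(x) A_n(x)\,dx ,$$

$$F_k = \frac{1}{h} \int_{x_{k-1}}^{x_{k+1}} f(x) A_k(x)\,dx , \quad k = 2,\dots,n-1. \tag{7.5}$$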
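The formulas (7.4) and (7.5) can be compared numerically. The following sketch is an illustration under stated assumptions, not the authors' implementation: the integrals are approximated by the midpoint rule, the partition is the h-uniform triangular one, and the helper names (`tri`, `integrate`, `f_transform`) are hypothetical.

```python
import math

def tri(x, xk, h):
    """Triangular basic function centered at node xk with half-width h."""
    return max(0.0, 1.0 - abs(x - xk) / h)

def integrate(g, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    dx = (hi - lo) / steps
    return dx * sum(g(lo + (i + 0.5) * dx) for i in range(steps))

def f_transform(f, a, b, n):
    """Components F_1..F_n from the general formula (7.4)."""
    h = (b - a) / (n - 1)
    F = []
    for k in range(n):                              # 0-based k, node a + h*k
        xk = a + h * k
        lo, hi = max(a, xk - h), min(b, xk + h)     # support of A_k inside [a, b]
        num = integrate(lambda x: f(x) * tri(x, xk, h), lo, hi)
        den = integrate(lambda x: tri(x, xk, h), lo, hi)
        F.append(num / den)
    return F

a, b, n = 0.0, math.pi, 7
h = (b - a) / (n - 1)
F = f_transform(math.sin, a, b, n)

# Simplified interior formula (7.5) for the fourth node x_4 = a + 3h.
x4 = a + 3 * h
F4_simplified = integrate(lambda x: math.sin(x) * tri(x, x4, h), x4 - h, x4 + h) / h

# Integral of f recovered from the components for an h-uniform Ruspini partition.
integral_from_components = h * (F[0] / 2 + sum(F[1:-1]) + F[-1] / 2)
```

For the interior node both formulas coincide because the integral of a triangular $A_k$ over its support equals $h$; the last line reproduces $\int_0^\pi \sin x\,dx = 2$ up to quadrature error.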
The following is a list of some properties of the F-transform
of $f$ with respect to a generalized fuzzy
partition of $[a,b]$:


(a) If for all $x \in [a,b]$, $f(x) = C$, then
$F_k = C$, $k = 1,\dots,n$.

(b) If $f = \alpha g + \beta h$, then
$F[f] = \alpha F[g] + \beta F[h]$.

(c) If $[c,d] = \{f(x) \mid x \in [a,b]\}$, then $F_k$ is the minimizer
$F_k = \arg\min_{y \in [c,d]} \int_a^b (f(x) - y)^2 A_k(x)\,dx$, $k = 1,\dots,n$.

(d) If $f$ is twice continuously differentiable on $[a,b]$,
then $F_k = f(x_k) + O(h^2)$, $k = 1,\dots,n$. (This is true
for an h-uniform Ruspini partition of $[a,b]$ only.
A similar estimation of the F-transform component
$F_k$ as a linear combination of
$f(x_{k-r+1}),\dots,f(x_k),\dots,f(x_{k+r-1})$ can be established
for a fuzzy r-partition [7.15].)

(e) If a generalized fuzzy partition is $(h,h')$-uniform,
then for each $k = 1,\dots,n-1$,

$$|f(t) - F_k| \leq 2\omega(\bar h, f), \qquad |f(t) - F_{k+1}| \leq 2\omega(\bar h, f),$$

where $\bar h = \max(h,h')$, $t \in [x_k, x_{k+1}]$, and

$$\omega(\bar h, f) = \max_{|\delta| \leq \bar h}\; \max_{x \in [a, b-\delta]} |f(x+\delta) - f(x)| . \tag{7.6}$$

(f)

$$\int_a^b f(x)\,dx = h \left( \frac{F_1}{2} + \sum_{k=2}^{n-1} F_k + \frac{F_n}{2} \right).$$

(This is true for an h-uniform Ruspini partition of
$[a,b]$ only.)


7.3.2 Inverse F-Transform 


It is clear that an original nonconstant function $f$ cannot
be precisely reconstructed from its F-transform $F[f]$
because we lose information when passing from $f$ to
$F[f]$. However, the inverse F-transform $\hat f$ that can be
reconstructed (using the inversion formula (7.7)) approximates
$f$ in such a way that universal convergence
can be established.


Definition 7.5
Let $A_1,\dots,A_n$ be the basic functions that form a generalized
fuzzy partition of $[a,b]$ and $f$ be a function from
$C([a,b])$. Let $F[f] = (F_1,\dots,F_n)$ be the F-transform
of $f$ with respect to $A_1,\dots,A_n$. Then, the function
$\hat f : [a,b] \to \mathbb{R}$ represented by

$$\hat f(x) = \frac{\sum_{k=1}^{n} F_k A_k(x)}{\sum_{k=1}^{n} A_k(x)} \tag{7.7}$$

is called the inverse F-transform.


Remark 7.2

If a fuzzy partition of $[a,b]$ fulfills the generalized
Ruspini condition (7.2) with $r \geq 1$, then the inversion
formula (7.7) can be simplified to

$$\hat f(x) = \frac{1}{r} \sum_{k=1}^{n} F_k A_k(x)$$

or to (in the case of the Ruspini partition for which
$r = 1$)

$$\hat f(x) = \sum_{k=1}^{n} F_k A_k(x) .$$
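To make the inversion formula (7.7) concrete, the sketch below computes the components of a test function and then evaluates the inverse F-transform on a grid. It is a hedged illustration: the test function `math.sin`, the interval $[0,6]$, the midpoint-rule quadrature, and all helper names are assumptions chosen for the example. The largest error should shrink as the number of basic functions grows.

```python
import math

def basic(k, x, a, h):
    """Triangular basic function with node x_k = a + h*k (k = 0..n-1)."""
    return max(0.0, 1.0 - abs(x - (a + h * k)) / h)

def components(f, a, b, n, steps=6000):
    """F-transform components (7.4); integrals by the midpoint rule."""
    h = (b - a) / (n - 1)
    dx = (b - a) / steps
    xs = [a + (i + 0.5) * dx for i in range(steps)]
    F = []
    for k in range(n):
        w = [basic(k, x, a, h) for x in xs]
        F.append(sum(f(x) * wk for x, wk in zip(xs, w)) / sum(w))
    return F

def inverse(F, x, a, b):
    """Inverse F-transform (7.7) at x; the general quotient form is kept."""
    h = (b - a) / (len(F) - 1)
    w = [basic(k, x, a, h) for k in range(len(F))]
    return sum(Fk * wk for Fk, wk in zip(F, w)) / sum(w)

def max_error(f, a, b, n, samples=400):
    """Largest deviation between f and its inverse F-transform on a grid."""
    F = components(f, a, b, n)
    grid = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - inverse(F, x, a, b)) for x in grid)

err_coarse = max_error(math.sin, 0.0, 6.0, 10)   # 10 basic functions
err_fine = max_error(math.sin, 0.0, 6.0, 40)     # 40 basic functions
```

For a Ruspini partition the denominator in `inverse` is 1, so the quotient could be dropped; it is kept here so that the same function also works for generalized partitions.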


The following theorem demonstrates that the inverse
F-transform $\hat f$ can approximate a continuous
function $f$ with arbitrary precision. Thus, it explains
why the F-transform has convincing applications in
various fields, including image and time series processing,
and data mining [7.21]. In Fig. 7.4, we illustrate
how the inverse F-transform approximates the function
$10e^{-(x-\pi)^2}$.


Fig. 7.4 The function $f(x) = 10e^{-(x-\pi)^2}$ (gray) and its
inverse F-transform (brown) with respect to the uniform
Ruspini partition of [0,6] by 29 triangular-shaped basic
functions. The F-transform components are marked by
small circles


Theorem 7.1

Let $f$ be a continuous function on $[a,b]$. Then, for any
$\varepsilon > 0$, there exist $n_\varepsilon$ and a generalized fuzzy partition
$A_1,\dots,A_{n_\varepsilon}$ of $[a,b]$ such that for all $x \in [a,b]$,

$$|f(x) - \hat f_\varepsilon(x)| \leq \varepsilon , \tag{7.8}$$

where $\hat f_\varepsilon$ is the inverse F-transform of $f$ with respect to
the fuzzy partition $A_1,\dots,A_{n_\varepsilon}$.


From Theorem 7.2, which is given below, we learn 
that for a pointwise approximation (as in Theorem 7.1), 
it is sufficient to compute the F-transform with respect 
to the simplest triangular fuzzy partition. Therefore, al- 
most all applications of the F-transform are based on 
this type of partition. 


Theorem 7.2

Let $f$ be any continuous function on $[a,b]$, and let
$A'_1,\dots,A'_n$ and $A''_1,\dots,A''_n$, for $n \geq 3$, be the basic functions
that form different $(h,h')$-uniform generalized
fuzzy partitions of $[a,b]$. Let $\hat f'$ and $\hat f''$ be the two inverse
F-transforms of $f$ with respect to the different sets of
basic functions $A'_1,\dots,A'_n$ and $A''_1,\dots,A''_n$. Then, for arbitrary
$x \in [a,b]$,

$$|\hat f'(x) - \hat f''(x)| \leq 4\omega(\bar h, f) ,$$

where $h = \frac{b-a}{n-1}$, $\bar h = \max(h,h')$ and $\omega(\bar h, f)$ is the modulus
of continuity (7.6) of $f$ on the interval $[a,b]$.


The proofs of Theorems 7.1 and 7.2 can be obtained
from the respective proofs in [7.22, Theorems 2 and 3]
after some necessary changes caused by the usage of
the generalized fuzzy partition.

Below, we list some properties of the inverse F-transform
$\hat f$ of $f$ that were considered and proved
in [7.1, 15, 23]. If not specially mentioned, it is assumed
that the F-transform is computed with respect to a generalized
fuzzy partition of $[a,b]$:


(a) If for all $x \in [a,b]$, $f(x) = C$, then $\hat f(x) = C$

(b) If $f = \alpha g + \beta h$, then $\hat f = \alpha \hat g + \beta \hat h$

(c) $\int_a^b \hat f(x)\,dx = \int_a^b f(x)\,dx$ (This is true for the fuzzy
r-partition ($r \geq 1$) of $[a,b]$ only.)

(d) Let $A_1,\dots,A_n$ be an h-uniform Ruspini partition of
$[a,b]$, where $h = (b-a)/(n-1)$ and $n \geq 3$. Let
$s : [a,b] \to \mathbb{R}$ be a continuous function such that one
of the following two conditions is fulfilled:


(i) $s$ is 2h-periodic and for all $x \in [0,h]$,
$s(x_k - x) = -s(x_k + x)$, where $k = 2,\dots,n-1$

(ii) $s$ is h-periodic and $\int_{x_{k-1}}^{x_k} s(x)\,dx = 0$, where
$k = 2,\dots,n-1$.


Then, for $x \in [x_2, x_{n-1}]$,
$\hat f(x) = \widehat{f+s}(x)$.


The last property is known as noise removal. This
phrase implies that both functions $f$ (non-noisy) and
$f + s$ (noisy) have the same inverse F-transform. The noise
is represented by $s$ and characterized by conditions (i)
or (ii). We illustrate this property in Fig. 7.5.


Fig. 7.5 (a) Function $f(x) = 10e^{-(x-\pi)^2}$ (gray) and its inverse
F-transform (brown) with respect to the Ruspini
partition given by the triangular-shaped basic functions
$A_1,\dots,A_5$ (gray). (b) Noisy function $f + s$ (gray), where
$s(x) = \sin(2x) + 0.6 \sin(8x) + 0.3 \sin(16x)$, and its inverse
F-transform (brown) with respect to the same fuzzy partition.
Both inverse F-transforms $\hat f$ and $\widehat{f+s}$ are equal on
$[x_2, x_4]$


7.4 Discrete F-Transform


The discrete case of the F-transform, for which an original
function $f$ is defined (may be computed) on a finite
set $P = \{p_1,\dots,p_l\} \subset [a,b]$, was introduced in [7.1].
We will adapt the mentioned definition to the case of
a generalized fuzzy partition of $[a,b]$.

We assume that the domain $P$ of the function $f$ is
sufficiently dense with respect to the fixed partition, i.e.,

$$(\forall k)(\exists j)\; A_k(p_j) > 0 .$$

Then, the (discrete) F-transform of $f$ is defined as follows.


Definition 7.6
Let $A_1,\dots,A_n$, for $n \geq 2$, be the basic functions that
form a generalized fuzzy partition of $[a,b]$, and let function
$f$ be defined on the set $P = \{p_1,\dots,p_l\} \subset [a,b]$,
which is sufficiently dense with respect to the partition.
We say that the n-tuple of real numbers $(F_1,\dots,F_n)$ is
the discrete F-transform of $f$ with respect to $A_1,\dots,A_n$
if

$$F_k = \frac{\sum_{j=1}^{l} f(p_j) A_k(p_j)}{\sum_{j=1}^{l} A_k(p_j)} . \tag{7.9}$$

It is not difficult to demonstrate that the components of 
the discrete F-transform have similar properties to those 
listed in Sect. 7.3.1. 

In the discrete case, we define the inverse F- 
transform on the same set P on which the original 
function is defined. 




Definition 7.7

Let $A_1,\dots,A_n$, for $n \geq 2$, be the basic functions that
form a generalized fuzzy partition of $[a,b]$, and let function
$f$ be defined on the set $P = \{p_1,\dots,p_l\} \subset [a,b]$,
which is sufficiently dense with respect to the partition.
Moreover, let $F[f] = (F_1,\dots,F_n)$ be the discrete
F-transform of $f$ w.r.t. $A_1,\dots,A_n$. Then, the function
$\hat f : P \to \mathbb{R}$ represented by

$$\hat f(p_j) = \frac{\sum_{k=1}^{n} F_k A_k(p_j)}{\sum_{k=1}^{n} A_k(p_j)} \tag{7.10}$$

is the inverse discrete F-transform of $f$.


Remark 7.3

If a fuzzy partition of $[a,b]$ fulfills the generalized
Ruspini condition (7.2) with $r \geq 1$, i.e., for all $p_j \in P$,
$\sum_{k=1}^{n} A_k(p_j) = r$, then the inversion formula (7.10) can
be simplified to

$$\hat f(p_j) = \frac{1}{r} \sum_{k=1}^{n} F_k A_k(p_j)$$

or (in the case of Ruspini partition, i.e., $r = 1$) to

$$\hat f(p_j) = \sum_{k=1}^{n} F_k A_k(p_j) .$$
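The discrete formulas (7.9) and (7.10) and the noise-removal property can be tried out on sampled data. The sketch below is a hedged illustration: the interval $[0, 2\pi]$, the 401 equidistant sample points, the five triangular basic functions, and the helper names are assumptions chosen so that the noise `s` (the one quoted for Fig. 7.5) is 2h-periodic and odd about the interior nodes.

```python
import math

def A(k, x, a, h):
    """Triangular basic function with node x_k = a + h*k, k = 0..n-1."""
    return max(0.0, 1.0 - abs(x - (a + h * k)) / h)

def discrete_ft(values, points, a, b, n):
    """Discrete F-transform (7.9) of samples values[j] = f(points[j])."""
    h = (b - a) / (n - 1)
    F = []
    for k in range(n):
        w = [A(k, p, a, h) for p in points]
        F.append(sum(v * wk for v, wk in zip(values, w)) / sum(w))
    return F

def inverse_discrete_ft(F, points, a, b):
    """Inverse discrete F-transform (7.10) evaluated on the same points."""
    n = len(F)
    h = (b - a) / (n - 1)
    out = []
    for p in points:
        w = [A(k, p, a, h) for k in range(n)]
        out.append(sum(Fk * wk for Fk, wk in zip(F, w)) / sum(w))
    return out

a, b, n = 0.0, 2.0 * math.pi, 5                    # h = pi/2, nodes at multiples of pi/2
points = [a + (b - a) * j / 400 for j in range(401)]
f = lambda x: 10.0 * math.exp(-(x - math.pi) ** 2)
s = lambda x: math.sin(2 * x) + 0.6 * math.sin(8 * x) + 0.3 * math.sin(16 * x)

clean = inverse_discrete_ft(
    discrete_ft([f(p) for p in points], points, a, b, n), points, a, b)
noisy = inverse_discrete_ft(
    discrete_ft([f(p) + s(p) for p in points], points, a, b, n), points, a, b)

# points[100:301] are the samples lying in [x_2, x_{n-1}] = [pi/2, 3*pi/2].
interior_gap = max(abs(c - d) for c, d in zip(clean[100:301], noisy[100:301]))
```

Because the sample points are symmetric about each interior node, the contributions of `s` to the interior components cancel, so the two reconstructions agree on $[x_2, x_{n-1}]$ up to rounding. Near the boundary nodes they may differ.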


Analogous to Theorem 7.1, we can show that
the inverse discrete F-transform $\hat f$ can approximate
the original discrete function $f$ on $P$ with arbitrary
precision [7.1]. Moreover, the properties (a)-(c)
that are listed in Sect. 7.3.2 have valid discrete
analogies.

An interesting comparison between the discrete F-transform
and the least-squares approximation was made
in [7.20]. It was demonstrated that the discrete F-transform
is invariant with respect to the interpolating
and least-squares approximation of the set
$\{(p_j, f(p_j)) \mid j = 1,\dots,l\}$. This means that the best approximation
of $f$ on $P$ in the form of $\sum_{i=1}^{n} a_i A_i$, where $n < l$,
has the same direct discrete F-transform as the original
$f$.


7.5 F-Transforms of Functions of Two Variables 


The direct and inverse F-transform of a function of two 
(and more) variables is a direct generalization of the 
case of one variable. We introduce it briefly and refer 
to [7.1] for more details. 

Suppose that the universe is a rectangle $[a,b] \times [c,d] \subseteq \mathbb{R} \times \mathbb{R}$
and that $x_1 < \dots < x_n$ are the fixed nodes
of $[a,b]$ and $y_1 < \dots < y_m$ are the fixed nodes of $[c,d]$
such that $x_1 = a$, $x_n = b$, $y_1 = c$, $y_m = d$ and $n, m \geq 2$.
Let us formally extend the set of nodes by setting
$x_0 = a$, $y_0 = c$, $x_{n+1} = b$, and $y_{m+1} = d$. Assume that
$A_1,\dots,A_n$ are the basic functions that form a generalized
fuzzy partition of $[a,b]$ and $B_1,\dots,B_m$ are basic
functions that form a generalized fuzzy partition of
$[c,d]$. Then, the rectangle $[a,b] \times [c,d]$ is partitioned
into fuzzy sets $A_k \times B_l$ with the membership functions
$(A_k \times B_l)(x,y) = A_k(x) B_l(y)$, $k = 1,\dots,n$, $l = 1,\dots,m$.
Let $C([a,b] \times [c,d])$ be the set of continuous functions
of two variables on this domain, and let $f \in C([a,b] \times [c,d])$.


Definition 7.8

Let $A_1,\dots,A_n$ be the basic functions that form a generalized
fuzzy partition of $[a,b]$ and $B_1,\dots,B_m$ be the
basic functions that form a generalized fuzzy partition
of $[c,d]$. Let $f$ be any function from $C([a,b] \times [c,d])$.
We say that the $n \times m$-matrix of real numbers
$F[f] = (F_{kl})_{n \times m}$ is the (integral) F-transform of $f$ with respect
to $A_1,\dots,A_n$ and $B_1,\dots,B_m$ if for each $k = 1,\dots,n$,
$l = 1,\dots,m$,

$$F_{kl} = \frac{\int_c^d \int_a^b f(x,y) A_k(x) B_l(y)\,dx\,dy}{\int_c^d \int_a^b A_k(x) B_l(y)\,dx\,dy} . \tag{7.11}$$


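The components $F_{kl}$ (7.11) have properties (adapted
to the case of two variables) similar to those listed
in Sect. 7.3.1. For example, the property (f) has the
following form (we assume that $A_1,\dots,A_n$ form an $h_1$-uniform
Ruspini partition of $[a,b]$ and $B_1,\dots,B_m$ form
an $h_2$-uniform Ruspini partition of $[c,d]$)

$$\int_c^d \int_a^b f(x,y)\,dx\,dy = \frac{h_1 h_2}{4} \left( F_{11} + F_{1m} + F_{n1} + F_{nm} \right) + \frac{h_1 h_2}{2} \left( \sum_{k=2}^{n-1} F_{k1} + \sum_{k=2}^{n-1} F_{km} + \sum_{l=2}^{m-1} F_{1l} + \sum_{l=2}^{m-1} F_{nl} \right) + h_1 h_2 \sum_{k=2}^{n-1} \sum_{l=2}^{m-1} F_{kl} .$$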


In the discrete case, when an original function $f$ is
known only at points $(p_i, q_j) \in [a,b] \times [c,d]$, where
$i = 1,\dots,N$ and $j = 1,\dots,M$, the (discrete) F-transform
of $f$ can be introduced in a manner analogous to the
case of a function of one variable. This case is important
for applications of the F-transform to image processing
[7.8, 9, 18, 24-26].


Definition 7.9

Let a function $f$ be given at nodes $(p_i, q_j) \in [a,b] \times [c,d]$,
for which $i = 1,\dots,N$ and $j = 1,\dots,M$, and
$A_1,\dots,A_n$ and $B_1,\dots,B_m$, where $n < N$ and $m < M$, be
the basic functions that form generalized fuzzy partitions
of $[a,b]$ and $[c,d]$, respectively. Suppose that sets $P$
and $Q$ of these nodes are sufficiently dense with respect
to the chosen partitions. We say that the $n \times m$-matrix of
real numbers $F[f] = (F_{kl})_{n \times m}$ is the discrete F-transform
of $f$ with respect to $A_1,\dots,A_n$ and $B_1,\dots,B_m$ if

$$F_{kl} = \frac{\sum_{j=1}^{M} \sum_{i=1}^{N} f(p_i, q_j) A_k(p_i) B_l(q_j)}{\sum_{j=1}^{M} \sum_{i=1}^{N} A_k(p_i) B_l(q_j)} \tag{7.12}$$

holds for all $k = 1,\dots,n$, $l = 1,\dots,m$.


The inverse F-transform of a function of two variables
is a simple extension of (7.7). It will be given
below for the continuous version of a function.


Definition 7.10

Let $A_1,\dots,A_n$ and $B_1,\dots,B_m$ be the basic functions
that form generalized fuzzy partitions of $[a,b]$
and $[c,d]$, respectively. Let $f$ be a function from
$C([a,b] \times [c,d])$ and $F[f]$ be the F-transform of $f$ with
respect to $A_1,\dots,A_n$ and $B_1,\dots,B_m$. Then, the function
$\hat f : [a,b] \times [c,d] \to \mathbb{R}$ represented by

$$\hat f(x,y) = \frac{\sum_{k=1}^{n} \sum_{l=1}^{m} F_{kl} A_k(x) B_l(y)}{\sum_{k=1}^{n} \sum_{l=1}^{m} A_k(x) B_l(y)} \tag{7.13}$$

is called the inverse F-transform.


Similar to the case of a function of one variable, we
can prove that the inverse F-transform $\hat f$ can approximate
the original continuous function $f$ with arbitrary
precision, and the (adapted) properties (a)-(c), which
are listed in Sect. 7.3.2, are fulfilled.


7.6 F¹-Transform


In [7.16], a higher degree F-transform was introduced
for the purpose of advanced applications in time series
and image processing [7.26, 27]. In this section, we give
a description of the $F^1$-transform, which has working
applications, and refer to [7.16] for the $F^m$-transform
for which $m > 1$.

Throughout this section, we assume that $A_1,\dots,A_n$,
$n \geq 2$, is an h-uniform generalized fuzzy partition of
$[a,b]$ such that there exists a generating function
$A_0 : [-1,1] \to [0,1]$ such that for all $k = 1,\dots,n$, $A_k$ is defined
by (7.3) (the illustration is in Fig. 7.3).

Let $k$ be a fixed integer from $\{1,\dots,n\}$, and let
$L_2(A_k)$ be a normed space of square-integrable functions
$f : [x_{k-1}, x_{k+1}] \to \mathbb{R}$, where the norm $\|f\|_k$ is
given by

$$\|f\|_k = \sqrt{\int_{x_{k-1}}^{x_{k+1}} f^2(x) A_k(x)\,dx} .$$

By $L_2(A_1,\dots,A_n)$ we denote a set of functions
$f : [a,b] \to \mathbb{R}$ such that for all $k = 1,\dots,n$,
$f|_{[x_{k-1}, x_{k+1}]} \in L_2(A_k)$, where $f|_{[x_{k-1}, x_{k+1}]}$ is the restriction of $f$ on
$[x_{k-1}, x_{k+1}]$.

For any function $f$ from $L_2(A_1,\dots,A_n)$ we define
the $F^1$-transform of $f$ with respect to $A_1,\dots,A_n$ as the
vector of linear functions

$$F^1[f] = \left( c_{1,0} + c_{1,1}(x - x_1), \dots, c_{n,0} + c_{n,1}(x - x_n) \right), \tag{7.14}$$

where for every $k = 1,\dots,n$,

$$c_{k,0} = \frac{\int_{x_{k-1}}^{x_{k+1}} f(x) A_k(x)\,dx}{h s_0} , \qquad c_{k,1} = \frac{\int_{x_{k-1}}^{x_{k+1}} f(x)(x - x_k) A_k(x)\,dx}{\int_{x_{k-1}}^{x_{k+1}} (x - x_k)^2 A_k(x)\,dx} , \tag{7.15}$$

and

$$s_0 = \int_{-1}^{1} A_0(x)\,dx .$$

The kth component of the vector $F^1[f]$ is denoted by
$F^1_k$.
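A small numerical sketch of the coefficients in (7.15) follows. It is an illustration under assumptions, not the implementation from [7.16]: the partition is the h-uniform triangular one, integrals are replaced by midpoint sums, the function name `f1_transform` is hypothetical, and $c_{k,0}$ is normalized by the integral of $A_k$, which equals $h s_0$ for interior nodes. For a linear function the interior components should reproduce the function itself.

```python
def f1_transform(f, a, b, n, steps=4000):
    """Coefficients (c_k0, c_k1) of (7.15) for a triangular h-uniform partition."""
    h = (b - a) / (n - 1)
    dx = (b - a) / steps
    xs = [a + (i + 0.5) * dx for i in range(steps)]   # midpoint quadrature nodes
    coeffs = []
    for k in range(n):                                # 0-based k, node a + h*k
        xk = a + h * k
        w_sum = fw_sum = ftw_sum = ttw_sum = 0.0
        for x in xs:
            w = max(0.0, 1.0 - abs(x - xk) / h)       # A_k(x)
            if w > 0.0:
                t = x - xk
                fx = f(x)
                w_sum += w                            # approximates the integral of A_k
                fw_sum += fx * w                      # integral of f * A_k
                ftw_sum += fx * t * w                 # integral of f * (x - x_k) * A_k
                ttw_sum += t * t * w                  # integral of (x - x_k)^2 * A_k
        coeffs.append((fw_sum / w_sum, ftw_sum / ttw_sum))
    return coeffs

# Linear test function f(x) = 2 + 3x on [0, 4] with nodes 0, 1, 2, 3, 4.
coeffs = f1_transform(lambda x: 2.0 + 3.0 * x, 0.0, 4.0, 5)
c20, c21 = coeffs[2]      # interior node x = 2
```

The common factor `dx` cancels in both quotients, so it is omitted from the sums. At the boundary nodes only half of the triangle lies inside $[a,b]$, so the symmetric cancellation used for interior nodes does not apply there.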

The following is a list of properties of the $F^1$-transform
of $f$ with respect to a generalized fuzzy
partition of $[a,b]$. They are particular cases of the properties
of the $F^m$-transform proved in [7.16]:


(a) Let $F_k$ and $c_{k,0} + c_{k,1}(x - x_k)$, for $k = 1,\dots,n$, be
the respective kth components of $F[f]$ and $F^1[f]$. Then,
$F_k = c_{k,0}$.

(b) If for all $x \in [a,b]$, $f(x) = d + cx$, then all the components
of the $F^1$-transform of $d + cx$ are equal to
$(d + c x_k) + c(x - x_k)$, $k = 1,\dots,n$.

(c) If $f = \alpha g + \beta h$, then $F^1[f] = \alpha F^1[g] + \beta F^1[h]$.

(d) $c_{k,0} + c_{k,1}(x - x_k) = \arg\min \|f(x) - (d + c(x - x_k))\|_k$,
$k = 1,\dots,n$, where the minimum is considered
over the set of functions of the form $d + c(x - x_k)$.

(e) If $f$ is four times continuously differentiable on
$[a,b]$, then

$$c_{k,0} = f(x_k) + O(h^2), \qquad c_{k,1} = f'(x_k) + O(h^2), \quad k = 1,\dots,n.$$


In Fig. 7.6, we show a schematic representation of
the $F^1$-transform components of a generic function $f$.


Fig. 7.6 Function $f$, its $F^1$-transform components $F^1_1,\dots,F^1_n$,
and the components $F_1,\dots,F_k,\dots,F_n$ (star nodes) (after [7.16])


Finally, we give simplified expressions of $F^1$-transform
components with respect to an h-uniform
triangular fuzzy partition [7.16]

$$c_{k,0} = \frac{\int_{x_{k-1}}^{x_{k+1}} f(x) A_k(x)\,dx}{h} , \tag{7.16}$$

$$c_{k,1} = \frac{12 \int_{x_{k-1}}^{x_{k+1}} f(x)(x - x_k) A_k(x)\,dx}{h^3} , \tag{7.17}$$

where $k = 1,\dots,n$.


7.7 Applications 


In this section, we consider applications of the F-transform
and $F^1$-transform to image processing.


7.7.1 Image Compression 
and Reconstruction 


A method of lossy image compression and reconstruction
using fuzzy relations was proposed in [7.19]. The
dominant idea was a choice of suitable granulation (represented
by a fuzzy relation) of an image domain. We
will refer to this method as FEQ. F-transform image
compression (FTR) is based on the same idea of granulation
but connects it with fuzzy partitions [7.1, 9]. In
the cited papers, two approaches were proposed: a uniform
fuzzy partition of the entire domain [7.1] and
a two-step partition [7.9] in which the entire domain
is first partitioned into blocks and then each block is
uniformly partitioned into fuzzy sets. Both approaches
were compared with JPEG and other compression techniques
(including FEQ) [7.9], and the conclusion was
that the F-transform-based method is slightly worse
than JPEG but better than FEQ. Two further improvements
of the F-transform-based compression have been
proposed in [7.18, 28], where an advantage over JPEG
was achieved in many cases.

In this section, after reiterating the principles of 
image compression and reconstruction using the F- 
transform and its inverse, we explain how a proper 
choice of a fuzzy partition improves the quality of the 
reconstructed image. A detailed elaboration and com- 
parison with other existing techniques is in [7.18, 28] 
and will be presented in subsequent papers. 


Principles of Image Compression 

Using the F-Transform 
Let a grayscale image of size $N \times M$ pixels be represented
by a function of two variables $u : N \times M \to [0,1]$.
The value $u(i,j)$ represents the intensity of the pixel $(i,j)$
on the gray scale. The problem of image compression
is to reduce the image's size to save space or transmission
time. A desirable size $n \times m$ (where $n < N$ and
$m < M$) of a compressed image can be obtained from
the compression ratio, $\rho = nm/(NM)$. If a compression
method is lossy (JPEG, FEQ, and the F-transform, for
example), then the respective reconstruction $\hat u$ to a full
size image is compared with the original image using
the two quality indices PSNR (peak signal-to-noise ratio)
and RMSE (root-mean-square error), where

$$\mathrm{PSNR} = 20 \log_{10} \frac{255}{\mathrm{RMSE}}$$

and

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{M} \left( u(i,j) - \hat u(i,j) \right)^2}{NM}} .$$
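The two quality indices are easy to compute for images stored as nested lists. The sketch below is a hedged illustration: it assumes the conventional base-10 logarithm and a peak value of 255 (8-bit gray levels), both exposed as explicit choices, and the function names are hypothetical.

```python
import math

def rmse(u, v):
    """Root-mean-square error between two equally sized images (lists of rows)."""
    n_pixels = len(u) * len(u[0])
    sq = sum((p - q) ** 2 for ru, rv in zip(u, v) for p, q in zip(ru, rv))
    return math.sqrt(sq / n_pixels)

def psnr(u, v, peak=255.0):
    """Peak signal-to-noise ratio in decibels; `peak` is an assumed default."""
    err = rmse(u, v)
    # Identical images have zero error; report an infinite PSNR in that case.
    return math.inf if err == 0.0 else 20.0 * math.log10(peak / err)

original = [[0.0, 0.0], [0.0, 0.0]]
shifted = [[25.5, 25.5], [25.5, 25.5]]     # every pixel off by one tenth of the peak
value = psnr(original, shifted)
```

If the image is stored with intensities in $[0,1]$ rather than 0 to 255, `peak=1.0` would be the matching choice.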


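Simple F-Transform Compression
In [7.1], we proposed representing a compressed image
by the $n \times m$ matrix $U$ of F-transform components

$$U = \begin{pmatrix} U_{11} & \dots & U_{1m} \\ \vdots & & \vdots \\ U_{n1} & \dots & U_{nm} \end{pmatrix}$$

computed over uniform fuzzy partitions (usually, triangular)
$A_1,\dots,A_n$ and $B_1,\dots,B_m$ of the entire domains
$[1,N]$ and $[1,M]$, respectively

$$U_{kl} = \frac{\sum_{j=1}^{M} \sum_{i=1}^{N} u(i,j) A_k(i) B_l(j)}{\sum_{j=1}^{M} \sum_{i=1}^{N} A_k(i) B_l(j)} , \quad k = 1,\dots,n, \; l = 1,\dots,m.$$

We proposed reconstructing $U$ to a full-size image using
the inverse F-transform of $u$ such that

$$\hat u(i,j) = \sum_{k=1}^{n} \sum_{l=1}^{m} U_{kl} A_k(i) B_l(j) .$$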

This method does not take advantage of any property 
of the original image and therefore, its quality is not 
very high. Let us illustrate it on the image Camera- 
man taken from the Corel Gallery. In Fig. 7.7, we 
show the original image and its reconstruction using the 
simple F-transform compression described above. The 
compression ratio is p = 0.25, and PSNR = 25.422 
(compare with PSNR = 38.8 for JPEG with a similar 
compression ratio). 


F-Transform Compression with Block 

Decomposition 
This F-transform-based compression [7.9] was inspired
by the JPEG method in which, at first, the entire domain
was decomposed into blocks and then, each block was
compressed according to a compression ratio. In [7.9],
the same principle is used. In the first step, a decomposition
into blocks of the same size is performed, where
the size (chosen experimentally) is such that a certain
quality of approximation by the inverse F-transform
should be guaranteed (Theorem 7.1). Each block is then
uniformly partitioned into cosine-shaped fuzzy sets and
compressed by the simple F-transform method according
to a compression ratio. In comparison with the
simple F-transform compression, this method considers
the peculiarities of the original images when making
the block decomposition. In Fig. 7.8, we show the
PSNR quality measure of the image Cameraman compressed
using three methods: FEQ, F-transform with
block decomposition and JPEG. It is easily observed
that the JPEG method is still better than the F-transform
with block decomposition, whereas the latter is better
than FEQ. However, for the particular image Cameraman
and the compression ratio $\rho = 0.25$, the value of
PSNR of the F-transform with block decomposition
is similar to that of the simple F-transform compression:
25.0676 versus 25.422, respectively. This means
that the uniform partition, even when applied to both
steps independently, is not effective with respect to
the quality estimated by PSNR. In the next subsection,
we propose an F-transform compression method [7.18]
that is almost nonlossy and is based on a nonuniform
generalized partition adapted to each particular
image.


Fig. 7.7a,b Original image Cameraman (a) and its reconstruction
after applying the simple F-transform compression (b) with
PSNR = 25.422


Fig. 7.8 The PSNR values of the image Cameraman compressed
using three methods: FEQ, the F-transform with
block decomposition, and JPEG (after [7.29]); axes: PSNR quality
versus compression rate


Advanced Image Compression
If we analyze the properties of the F-transform
(Sect. 7.3.1), then it is immediate from (a) that the
more the function behaves like a constant, the better is
the approximation quality of the inverse F-transform.
Thus, the following recommendation regarding the
choice of a proper generalized fuzzy partition can be
made:

• A generalized fuzzy partition of the domain $[1,N] \times [1,M]$
into fuzzy sets $A_k \times B_l$, where $k = 1,\dots,n$
and $l = 1,\dots,m$, should guarantee that the difference
between extremal values of the image over
each $A_k \times B_l$ is not greater than $\varepsilon > 0$ or (if the
preceding condition cannot be fulfilled) the area of
$A_k \times B_l$ is not greater than $\delta > 0$.


Fig. 7.9 The quad tree algorithm and the generalized fuzzy
partition on its base


Fig. 7.10a,b Two reconstructions of the image Cameraman after 
applying the advanced F-transform compression (the ratio is 0.188) 
with the histogram restoring (a) and without it (b). The PSNR val- 
ues are 29 (a) and 30 (b) 


There are several algorithms that can produce a gen- 
eralized fuzzy partition with the mentioned property. 
In [7.18], we used the quad tree algorithm for this pur- 
pose; see the illustration in Fig. 7.9. 
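The recommendation above can be turned into a simple recursive splitting rule. The following sketch is only an assumed reading of how a quad tree could enforce it, not the algorithm of [7.18]: a block is kept when its intensity range is at most `eps` or its area is at most `min_area`, and is otherwise split into four sub-blocks. The function name and the block representation `(row, col, rows, cols)` are hypothetical.

```python
def quadtree_blocks(img, eps, min_area, r0=0, c0=0, rows=None, cols=None):
    """Return leaf blocks (row, col, rows, cols) of a quad tree over img."""
    if rows is None:
        rows, cols = len(img), len(img[0])
    vals = [img[r][c] for r in range(r0, r0 + rows) for c in range(c0, c0 + cols)]
    # Stop when the block is nearly constant or already small enough.
    if max(vals) - min(vals) <= eps or rows * cols <= min_area:
        return [(r0, c0, rows, cols)]
    rh, ch = max(1, rows // 2), max(1, cols // 2)    # sizes of the first halves
    blocks = []
    for r, nr in ((r0, rh), (r0 + rh, rows - rh)):
        for c, nc in ((c0, ch), (c0 + ch, cols - ch)):
            if nr > 0 and nc > 0:                    # skip empty halves of thin blocks
                blocks += quadtree_blocks(img, eps, min_area, r, c, nr, nc)
    return blocks

# 4 x 4 test image that is constant except for one bright pixel.
img = [[0.0] * 4 for _ in range(4)]
img[0][0] = 1.0
leaves = quadtree_blocks(img, eps=0.0, min_area=1)
```

Each leaf block could then carry one pair of basic functions, so that flat regions are covered by large fuzzy sets and detailed regions by small ones.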

Let us add that the advanced image compression al- 
gorithm [7.18] uses the following two tricks to increase 
the quality of the reconstructed image: 


• Preserve sharp edges
• Restore the histogram of the original image.


Figure 7.10 shows how the histogram restoration 
influences the quality of the reconstructed image. In 
Fig. 7.11, we see that the PSNR values of the advanced 
F-transform and the JPEG are almost equal. 




7.7.2 Image Fusion 


Image fusion aims to integrate complementary distorted 
multisensor, multitemporal, and/or multiview scenes 
into one new image that contains the best parts of each 
scene. Thus, the primary problem in image fusion is to 
find the least distorted scene for every pixel. 

A local focus measure is traditionally used for the 
selection of an undistorted scene. The scene that maxi- 
mizes the focus measure is selected. Usually, the focus 
measure is a measure of high-frequency occurrences 
in the image spectrum. This measure is used when 
a source of distortion is connected with blurring, which 
suppresses high frequencies in an image. In this case, 
it is desirable that a focus measure decreases with an 
increase in blurring. 

There are various fusion methodologies that are cur- 
rently in use. They can be classified according to the 
primary technique: aggregation operators [7.22], fuzzy 
methods [7.30], optimization methods (e.g., neural net- 
works and genetic algorithms [7.29]), and multiscale 
decomposition methods based on various transforms 
(e.g., discrete wavelet transforms; [7.31]). 

The F-transform approach to image fusion was 
proposed in [7.32,33]. The primary idea is a com- 
bination of (at least) two fusion operators, both of 
which are based on the F-transform. The first fu- 
sion operator is applied to the F-transform compo- 
nents of scenes and is based on a robust partition 
of the scene domain. The second fusion operator is 
applied to the residuals of scenes with respect to
inverse F-transforms with fused components and is
based on a finer partition of the same domain. Although
this approach is not explicitly based on focus
measures, it uses the fusion operator, which is able
to choose an undistorted scene among the available
blurred scenes.


Fig. 7.11 The PSNR values of the image Cameraman
compressed using four methods: FEQ, the F-transform
with block decomposition, the advanced F-transform, and
JPEG (vertical axis: PSNR; horizontal axis: compression rate)


Principles of Image Fusion Using the F-Transform
In this subsection, we present a short overview of the 
two methods of fusion that were proposed in [7.32, 33] 
and introduce a new method [7.34] that is a weighted 
combination of those two. We will demonstrate that the 
new method is computationally more effective than the 
first two. 

The F-transform fusion is based on a certain decomposition
of an image. We assume that the image u is
a discrete real function u = u(x, y) defined on the N × M
array of pixels P = {(i, j) | i = 1, ..., N, j = 1, ..., M}
such that u : P → R. Moreover, let fuzzy sets A_1, ..., A_n
and B_1, ..., B_m, where 2 ≤ n ≤ N, 2 ≤ m ≤ M, establish
uniform Ruspini partitions of [1, N] and [1, M],
respectively. We begin with the following representation
of u on P,


u(x, y) = u_{nm}(x, y) + e(x, y) ,   (7.18)
e(x, y) = u(x, y) - u_{nm}(x, y) ,   (7.19)


where u_{nm} is the inverse F-transform of u and e is the
respective first difference. If we replace e in (7.18) by
its inverse F-transform e_{NM} with respect to the finest
partition of [1, N] × [1, M], the above representation can
then be rewritten as follows,


u(x, y) = u_{nm}(x, y) + e_{NM}(x, y) .   (7.20)


We call (7.20) a one-level decomposition of u on P.
If u is smooth, then the function e_{NM} is small (this
claim follows from the property (e) in Sect. 7.3.1),
and we can stop at this level. In the opposite case,
we continue with the decomposition of the first difference
e in (7.18). We decompose e into its inverse
F-transform e_{n'm'} (with respect to a finer fuzzy partition
of [1, N] × [1, M] with n' : n < n' ≤ N and m' : m <
m' ≤ M basic functions) and the second difference e'.
Thus, we obtain the second-level decomposition of u
on P


u(x, y) = u_{nm}(x, y) + e_{n'm'}(x, y) + e'(x, y) ,
e'(x, y) = e(x, y) - e_{n'm'}(x, y) .




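In the same manner, we can obtain a higher-level decomposition
of u on P


u(x, y) = u_{nm}(x, y) + e^{(1)}_{n_1 m_1}(x, y) + ⋯
          + e^{(k-2)}_{n_{k-2} m_{k-2}}(x, y) + e^{(k-1)}(x, y) ,   (7.21)

where

0 < n ≤ n_1 ≤ ⋯ ≤ n_{k-2} ≤ N ,
0 < m ≤ m_1 ≤ ⋯ ≤ m_{k-2} ≤ M ,
e^{(1)}(x, y) = u(x, y) - u_{nm}(x, y) ,
e^{(i)}(x, y) = e^{(i-1)}(x, y) - e^{(i-1)}_{n_{i-1} m_{i-1}}(x, y) ,
i = 2, ..., k-1 .   (7.22)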
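The multilevel decomposition can be sketched in a few lines of Python. This is a minimal illustration, assuming triangular basic functions and the standard discrete direct and inverse F-transform formulas, which are not repeated in this section. The helper names `triangular_partition`, `ft_direct`, `ft_inverse`, and `decompose` are hypothetical.

```python
import numpy as np

def triangular_partition(size, n):
    """Basic functions A_1..A_n (rows) of a uniform triangular Ruspini partition
    of [1, size], sampled at the pixels 1, ..., size (assumes 2 <= n <= size)."""
    pixels = np.arange(1, size + 1, dtype=float)
    nodes = np.linspace(1.0, float(size), n)
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(pixels[None, :] - nodes[:, None]) / h, 0.0, None)

def ft_direct(u, A, B):
    """Components F_kl = sum_ij u(i,j) A_k(i) B_l(j) / sum_ij A_k(i) B_l(j)."""
    return (A @ u @ B.T) / np.outer(A.sum(axis=1), B.sum(axis=1))

def ft_inverse(F, A, B):
    """Inverse F-transform: sum_kl F_kl A_k(i) B_l(j) for every pixel (i, j)."""
    return A.T @ F @ B

def decompose(u, sizes):
    """sizes = [(n, m), (n_1, m_1), ...] in nondecreasing order.
    Returns the list of inverse F-transform terms and the last difference."""
    N, M = u.shape
    terms, residual = [], np.asarray(u, dtype=float)
    for n, m in sizes:
        A, B = triangular_partition(N, n), triangular_partition(M, m)
        approx = ft_inverse(ft_direct(residual, A, B), A, B)
        terms.append(approx)
        residual = residual - approx   # next difference e^(i)
    return terms, residual

# Hypothetical 9 x 13 image; the last partition is the finest one.
rng = np.random.default_rng(0)
u = rng.random((9, 13))
terms, last_difference = decompose(u, [(3, 4), (5, 7), (9, 13)])
```

By construction the terms and the last difference sum back to u, and with the finest partition as the last level the remaining difference vanishes.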


Three Algorithms for Image Fusion 
In [7.33], we proposed two algorithms: 


1. The simple F-transform-based fusion algorithm 
(SA) and 

2. The complete F-transform-based fusion algorithm 
(CA). 


The principal role in the fusion algorithms CA and
SA is played by the fusion operator κ : R^K → R, which
is defined as follows:


κ(x_1, ..., x_K) = x_p , if |x_p| = max(|x_1|, ..., |x_K|) .   (7.23)
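A possible vectorized form of this operator is sketched below. The function name `fuse_max_abs` is hypothetical, and ties are resolved in favor of the first argument, which is an assumption because (7.23) does not specify tie-breaking.

```python
import numpy as np

def fuse_max_abs(stack):
    """Apply the operator of (7.23) along axis 0 of an array of shape (K, ...):
    at every position keep the entry with the largest absolute value."""
    stack = np.asarray(stack, dtype=float)
    idx = np.argmax(np.abs(stack), axis=0)          # first maximizer on ties
    return np.take_along_axis(stack, idx[None, ...], axis=0)[0]

# Two hypothetical channels with two values each.
fused = fuse_max_abs([[1.0, -5.0], [-3.0, 2.0]])
```

The sign of the selected value is preserved; only the selection uses absolute values.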


The Simple F-Transform-Based Fusion Algorithm
In this subsection, we present a block description of
the SA without technical details, which can be found
in [7.33]. We assume that K ≥ 2 input (channel) images
c_1, ..., c_K with various types of degradation are given.
Our aim is to recognize undistorted parts in the given
images and to fuse them into one image. The algorithm
is based on the decompositions given in (7.20), which
are applied to each channel image:


1. Choose values n and m such that 2 ≤ n ≤ N, 2 ≤ m ≤ M
and create a fuzzy partition of [1, N] × [1, M]
by fuzzy sets A_k × B_l, where k = 1, ..., n and l =
1, ..., m.

2. Decompose the input images c_1, ..., c_K into inverse
F-transforms and error functions according to the
one-level decomposition (7.20).

3. Apply the fusion operator (7.23) to the respective F-transform
components of c_1, ..., c_K, and obtain the
fused F-transform components of a new image.


4. Apply the fusion operator to the respective F-transform
components of the error functions e_i, i =
1, ..., K, and obtain the fused F-transform components
of a new error function.

5. Reconstruct the fused image from the inverse F-transforms
with the fused components of the new
image and the fused components of the new error
function.
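The five steps can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [7.33]: triangular basic functions are used, and the error functions are transformed with respect to the finest partition as in (7.20), where the F-transform components coincide with the pixel values. All function names are hypothetical.

```python
import numpy as np

def triangular_partition(size, n):
    """Rows are the n triangular basic functions of a uniform Ruspini
    partition of [1, size], sampled at the pixels 1, ..., size."""
    pixels = np.arange(1, size + 1, dtype=float)
    nodes = np.linspace(1.0, float(size), n)
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(pixels[None, :] - nodes[:, None]) / h, 0.0, None)

def ft_direct(u, A, B):
    """Direct F-transform components of the image u."""
    return (A @ u @ B.T) / np.outer(A.sum(axis=1), B.sum(axis=1))

def ft_inverse(F, A, B):
    """Inverse F-transform evaluated at every pixel."""
    return A.T @ F @ B

def fuse_max_abs(stack):
    """Fusion operator (7.23) applied along axis 0."""
    idx = np.argmax(np.abs(stack), axis=0)
    return np.take_along_axis(stack, idx[None, ...], axis=0)[0]

def simple_fusion(channels, n, m):
    """Sketch of steps 1-5 for K channel images of equal size N x M."""
    channels = np.asarray(channels, dtype=float)                      # shape (K, N, M)
    _, N, M = channels.shape
    A, B = triangular_partition(N, n), triangular_partition(M, m)     # step 1
    comps = np.stack([ft_direct(c, A, B) for c in channels])          # step 2
    errors = np.stack([c - ft_inverse(F, A, B)
                       for c, F in zip(channels, comps)])
    fused_comps = fuse_max_abs(comps)                                 # step 3
    # Step 4: on the finest partition the components of e_i are its pixel values.
    fused_error = fuse_max_abs(errors)
    return ft_inverse(fused_comps, A, B) + fused_error                # step 5

# Hypothetical input: two identical channels must be fused into the same image.
rng = np.random.default_rng(1)
u = rng.random((12, 10))
fused = simple_fusion([u, u], n=4, m=3)
```

The choice of n and m is left to the caller, which mirrors the dependence on the fuzzy partition discussed next.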


The SA-based fusion is very efficient if we can 
guess values n and m that characterize a proper fuzzy 
partition. Usually, this is performed manually according 
to the user’s skills. The dependence on fuzzy partition 
parameters can be considered as a primary shortcoming 
of this otherwise effective algorithm. Two recommen- 
dations follow from our experience: 


• For complex images (with many small details),
higher values of n and m yield better results.

• If a triangular shape for a basic function is chosen,
then the generic choice of n and m is such that the
corresponding values of n_p and m_p are equal to 3 (recall
that n_p is the number of points that are covered
by every full basic function A_k).


The Complete F-Transform-Based Fusion Algorithm
The CA-based fusion does not depend on the choice
of only one fuzzy partition (as in the case of the SA)
because it runs through a sequence (7.22) of increasing
values of n and m. The algorithm is based on the decomposition
presented in (7.21), which is applied to each
channel image. The description of the CA is similar to
that of the SA except for step 4, which is repeated in
a cycle. Therefore, the quality of fusion is high, but the
implementation of the CA is rather slow and memory
consuming, especially for large images. For an illustration,
see Fig. 7.12 and Tables 7.1 and 7.2.


Table 7.1 Basic characteristics of the three algorithms applied
to the image Balls


Image   Resolution     Time (s)             Memory (MB)
                       CA     SA     ESA    CA     SA     ESA
Balls   1600 × 1200    340    1.2    36     270    St3    sy)


Table 7.2 MSE (mean-square error) and PSNR characteristics
of the three fusion methods applied to the image
Balls


Image set   MSE                     PSNR
            CA      SA      ESA     CA       SA       ESA
Balls       1.28    6.03    0.86    48.91    43.81    52.57




Enhanced Simple Fusion Algorithm 
In [7.34], we proposed an algorithm that is as fast as the 
SA and as efficient as the CA. We aimed at achieving 
the following goals: 


• Avoid running through a long sequence of possible
partitions (as in the case of CA).

• Automatically adjust the parameters of the fusion
algorithm according to the level of blurring and the
location of blurred areas in input images.


The algorithm adds another run of the F-transform
over the first difference (7.18). The explanation is as follows:
the first run of the F-transform is aimed at edge
detection in each input image, whereas the second run
propagates only sharp edges (and their local areas) to
the fused image. We refer to this algorithm as the enhanced
simple algorithm (ESA) and give its informal
description:

for all input (channel) images do
    Compute the inverse F-transform
    Compute the first absolute difference between the
        original image and its inverse F-transform
    Compute the second absolute difference between
        the first one and its inverse F-transform and set
        it as the pixel weights
end for
for all pixels in an image do
    Compute the value of sow, the sum of the weights
        over all input images
    for all input images do
        Compute the value of wr, the ratio between the
            weight of a current pixel and sow
    end for
    Compute the fused value of a pixel in the resulting
        image as a weighted (by wr) sum of input image
        values
end for
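The informal description can be turned into a short runnable sketch. It assumes triangular basic functions, the same partition in both runs of the F-transform, and equal weights wherever all pixel weights vanish; none of these choices is fixed by the description above. The function names are hypothetical.

```python
import numpy as np

def triangular_partition(size, n):
    """Rows are the n triangular basic functions of a uniform Ruspini
    partition of [1, size], sampled at the pixels 1, ..., size."""
    pixels = np.arange(1, size + 1, dtype=float)
    nodes = np.linspace(1.0, float(size), n)
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(pixels[None, :] - nodes[:, None]) / h, 0.0, None)

def ft_smooth(u, A, B):
    """Inverse F-transform of u (direct transform followed by inversion)."""
    F = (A @ u @ B.T) / np.outer(A.sum(axis=1), B.sum(axis=1))
    return A.T @ F @ B

def enhanced_simple_fusion(channels, n, m, eps=1e-12):
    """Weighted fusion of K channel images following the ESA description."""
    channels = np.asarray(channels, dtype=float)          # shape (K, N, M)
    K, N, M = channels.shape
    A, B = triangular_partition(N, n), triangular_partition(M, m)
    weights = []
    for c in channels:
        first = np.abs(c - ft_smooth(c, A, B))            # first absolute difference
        second = np.abs(first - ft_smooth(first, A, B))   # second absolute difference
        weights.append(second)                            # used as pixel weights
    weights = np.stack(weights)
    sow = weights.sum(axis=0)                             # sum of weights per pixel
    # wr: weight ratio; equal weights where sow is (numerically) zero (assumption).
    wr = np.where(sow > eps, weights / np.maximum(sow, eps), 1.0 / K)
    return (wr * channels).sum(axis=0)

# Hypothetical input: two random channels of size 12 x 10.
rng = np.random.default_rng(2)
a, b = rng.random((12, 10)), rng.random((12, 10))
fused = enhanced_simple_fusion([a, b], n=4, m=3)
same = enhanced_simple_fusion([a, a], n=4, m=3)
```

Because the ratios wr sum to one at every pixel, each fused value is a convex combination of the input values at that pixel.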


The primary advantages of the ESA are:


• Time: the execution time is smaller than for the CA
(Table 7.1).

• Quality: the quality of the ESA fusion is better than
that of the SA and, for particular cases (Table 7.2), it
is better than that of the CA.


Fig. 7.12a-c The SA (a), CA (b) and
ESA (c) fusions of the image Balls.
The ESA fusion has the best quality
(Table 7.2)


Because of space limitations, we present only one 
illustration of the F-transform fusion performed using 
the three algorithms, SA, CA, and ESA. We chose the 
image Balls with geometric figures to demonstrate that 
our fusion methods are able to reconstruct edges. In 
Fig. 7.13, two (channel) inputs of the image Balls are 
given, and in Fig. 7.12, three fusions of the same image 
are demonstrated. 

In Table 7.1, we demonstrate that the complexity 
(measured by the execution time or by the memory 
used) of the ESA is greater than the complexity of the 
SA and less than the complexity of the CA. 

In Table 7.2, we demonstrate that for the particular 
image Balls, the quality of fusion (measured by the val- 
ues of MSE and PSNR) of the ESA result is better (the 
MSE value is smaller) than the quality of the SA result 
and even than the quality of the CA result. 
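For reference, the two quality measures can be computed as in the sketch below. It assumes 8-bit images with a peak value of 255 and a single gray channel. The chapter does not state the conventions behind Table 7.2, so this sketch is not claimed to reproduce the tabulated numbers.

```python
import numpy as np

def mse(reference, image):
    """Mean-square error between a reference image and a fused image."""
    diff = np.asarray(reference, dtype=float) - np.asarray(image, dtype=float)
    return float(np.mean(diff ** 2))

def psnr(reference, image, peak=255.0):
    """Peak signal-to-noise ratio in dB; the peak value 255 is an assumption."""
    err = mse(reference, image)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

# Hypothetical 4 x 4 images that differ by one gray level everywhere.
reference = np.zeros((4, 4))
shifted = np.ones((4, 4))
```

A smaller MSE always gives a larger PSNR for a fixed peak value, which is why the two columns of Table 7.2 rank the methods in the same order.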


7.7.3 F¹-Transform Edge Detector


Edge detection is indispensable in image processing. In particular,
it is a first step in feature extraction and image
segmentation. We focused on the Canny edge detector [7.35],
which is widely used in computer vision.
It was developed to ensure three basic criteria: good
detection, good localization, and minimal response. In
these aspects, the Canny detector can be considered an
optimal edge detector. In [7.26], we proposed using the
F¹-transform with the purpose of simplifying the first
two steps of the Canny algorithm. Below, we provide
the details of our proposal.


Fig. 7.13a,b Two inputs for the image Balls. The central
ball is blurred in (a), and conversely, it is the only sharp
ball in (b)


Fig. 7.14a-d Original images (a,c) and their F¹-transform
edges (b,d)

The Canny algorithm is a multistep procedure for 
detecting edges as the local maxima of the gradient 
magnitude. The first step, performed using a Gaus- 
sian filter, is image smoothing and filtering noise. 
The second step is computation of a gradient of the 
image function to find the local maxima of the gra- 
dient magnitude and the gradient’s direction at each 
point. This step is performed using a convolution of 
the original image with directional masks (edge de- 
tection operators, such as those of Roberts, Prewitt, 
and Sobel, are some examples of these filters). The 
next step is called nonmaximum suppression [7.36], 
and it selects those points whose gradient magnitudes 
are maximal in the corresponding gradient direction. 
The final step is tracing edges and hysteresis thresh- 
olding, which leads to preserving the continuity of 
edges. 

In our experiment, we removed the first two steps in
the Canny algorithm and replaced them by computation
of approximate gradient values using the F¹-transform.
The reason is that the F¹-transform (similar to the
ordinary F-transform) filters out noise when computing
approximate values of the first partial derivatives
given by (7.15). We assume that the image is represented
by a discrete function u : P → R of two variables,
where P = {(i, j) | i = 1, ..., N, j = 1, ..., M} is
an N × M array of pixels, and the fuzzy sets A_1, ..., A_n
and B_1, ..., B_m establish a uniform triangular fuzzy partition
of [1, N] and [1, M], respectively.

Let x_1, ..., x_n ∈ [1, N] and y_1, ..., y_m ∈ [1, M] be
the h_x- and h_y-equidistant nodes of [1, N] and [1, M], respectively.

According to property (e) in Sect. 7.6, the coefficients
c^{1}_{k,l} and c^{2}_{k,l} of the linear polynomials of the F¹-transform
components are approximate values of the first partial
derivatives of the image function at nodes (x_k, y_l)
(for simplicity, we assume k = 2, ..., n-1 and l =
2, ..., m-1), where by (7.17) and (7.5) the following
hold,

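c^{1}_{k,l} = \frac{12}{h_x^3 h_y} \sum_{i=1}^{N} \sum_{j=1}^{M} u(i, j) (i - x_k) A_k(i) B_l(j) ,

c^{2}_{k,l} = \frac{12}{h_x h_y^3} \sum_{i=1}^{N} \sum_{j=1}^{M} u(i, j) (j - y_l) A_k(i) B_l(j) .


Then, we can write approximations of the first partial
derivatives as the respective inverse F-transforms


G_x(i, j) ≈ \sum_{k=1}^{n} \sum_{l=1}^{m} c^{1}_{k,l} A_k(i) B_l(j)


and


G_y(i, j) ≈ \sum_{k=1}^{n} \sum_{l=1}^{m} c^{2}_{k,l} A_k(i) B_l(j) .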
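The gradient approximation can be sketched as follows. The numerators are the double sums from the formulas above; as an assumption of this sketch, they are normalized by the corresponding discrete sums of squared offsets instead of the closed-form factor, so that the coefficients are exact for linear images at interior nodes. The names `f1_gradient` and `triangular_partition` are hypothetical.

```python
import numpy as np

def triangular_partition(size, n):
    """Return (A, nodes): rows of A are the triangular basic functions
    sampled at the pixels 1, ..., size; nodes are the equidistant nodes."""
    pixels = np.arange(1, size + 1, dtype=float)
    nodes = np.linspace(1.0, float(size), n)
    h = nodes[1] - nodes[0]
    A = np.clip(1.0 - np.abs(pixels[None, :] - nodes[:, None]) / h, 0.0, None)
    return A, nodes

def f1_gradient(u, n, m):
    """Approximate partial derivatives of u from F1-transform coefficients.
    Assumes n < N and m < M, so every basic function covers several pixels."""
    u = np.asarray(u, dtype=float)
    N, M = u.shape
    A, x = triangular_partition(N, n)
    B, y = triangular_partition(M, m)
    dx = np.arange(1, N + 1)[None, :] - x[:, None]    # offsets (i - x_k), shape (n, N)
    dy = np.arange(1, M + 1)[None, :] - y[:, None]    # offsets (j - y_l), shape (m, M)
    # Coefficients: weighted sums of u times the offset, discretely normalized.
    c1 = ((A * dx) @ u @ B.T) / np.outer((A * dx ** 2).sum(axis=1), B.sum(axis=1))
    c2 = (A @ u @ (B * dy).T) / np.outer(A.sum(axis=1), (B * dy ** 2).sum(axis=1))
    # Inverse F-transforms of the coefficient arrays give pixel-wise gradient maps.
    return A.T @ c1 @ B, A.T @ c2 @ B, c1, c2

# Hypothetical linear image u(i, j) = 2i - 3j + 1 on a 17 x 13 pixel array.
i = np.arange(1, 18, dtype=float)[:, None]
j = np.arange(1, 14, dtype=float)[None, :]
G_x, G_y, c1, c2 = f1_gradient(2.0 * i - 3.0 * j + 1.0, n=5, m=4)
magnitude = np.hypot(G_x, G_y)
```

The gradient magnitude and direction obtained from G_x and G_y can then be passed to the remaining steps of the Canny procedure.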


All other steps of the Canny algorithm — namely, 
finding the local maxima of the gradient magnitude and 
its direction, nonmaximum suppression, tracing edges 
through the image and hysteresis thresholding — are the 
same as in the original procedure. 

In the two examples in Fig. 7.14, we demonstrate
the results of the F¹-transform edge detector on
images chosen from the dataset available at
ftp://figment.csee.usf.edu/pub/ROC/edge_comparison_dataset.tar.gz.

We observe that many thin edges/lines are detected
and that their connectedness and smoothness are preserved. Moreover,
the following properties are retained:


• Smoothness of circular lines
• Concentricity of circles
• Smoothness of sharp connections.


7.8 Conclusions


In this chapter, the theory of the F-transform has been
discussed from the perspective of the latest developments
and applications. The importance of a proper
choice of fuzzy partition has been stressed. Various
fuzzy partitions have been considered, including the
most general partition (currently known). The definition
of the F-transform has been adapted to the generalized
fuzzy partition, and the main properties of the
F-transform have been re-established. The applications
to image processing, namely image compression, fusion
and edge detection, have been discussed with
sufficient technical details.


References


7.1 I. Perfilieva: Fuzzy transforms: Theory and applications, Fuzzy Sets Syst. 157, 993-1023 (2006)

7.2 L.A. Zadeh: Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst. Man Cybern. SMC-3, 28-44 (1973)

7.3 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part I, Inf. Sci. 8, 199-257 (1975)

7.4 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part II, Inf. Sci. 8, 301-357 (1975)

7.5 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part III, Inf. Sci. 9, 43-80 (1975)

7.6 T. Takagi, M. Sugeno: Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. 15, 116-132 (1985)

7.7 I. Perfilieva, H. De Meyer, B. De Baets, D. Piskova: Cauchy problem with fuzzy initial condition and its approximate solution with the help of fuzzy transform, Proc. WCCI 2008, IEEE Int. Conf. Fuzzy Syst., Hong Kong (2008) pp. 2285-2290

7.8 I. Perfilieva: Fuzzy transforms and their applications to image compression, Lect. Notes Artif. Intell. 3849, 19-31 (2006)

7.9 F. Di Martino, V. Loia, I. Perfilieva, S. Sessa: An image coding/decoding method based on direct and inverse fuzzy transforms, Int. J. Approx. Reason. 48, 110-131 (2008)

7.10 V. Novák, M. Štěpnička, A. Dvořák, I. Perfilieva, V. Pavliska, L. Vavřičková: Analysis of seasonal time series using fuzzy approach, Int. J. Gen. Syst. 39, 305-328 (2010)

7.11 I. Perfilieva, V. Novák, V. Pavliska, A. Dvořák, M. Štěpnička: Analysis and prediction of time series using fuzzy transform, Proc. WCCI 2008, IEEE Int. Conf. Neural Netw., Hong Kong (2008) pp. 3875-3879

7.12 F. Di Martino, V. Loia, S. Sessa: Fuzzy transforms method in prediction data analysis, Fuzzy Sets Syst. 180, 146-163 (2011)

7.13 M. Štěpnička, A. Dvořák, V. Pavliska, L. Vavřičková: A linguistic approach to time series modeling with the help of F-transform, Fuzzy Sets Syst. 180, 164-184 (2011)

7.14 L. Troiano, P. Kriplani: Supporting trading strategies by inverse fuzzy transform, Fuzzy Sets Syst. 180, 121-145 (2011)

7.15 L. Stefanini: F-transform with parametric generalized fuzzy partitions, Fuzzy Sets Syst. 180, 98-120 (2011)

7.16 I. Perfilieva, M. Daňková, B. Bede: Towards F-transform of a higher degree, Fuzzy Sets Syst. 180, 3-19 (2011)

7.17 M. Holčapek, T. Tichý: A smoothing filter based on fuzzy transform, Fuzzy Sets Syst. 180, 69-97 (2011)

7.18 P. Hurtik, I. Perfilieva: Image compression methodology based on fuzzy transform, Int. Jt. Conf. CISIS'12-ICEUTE'12-SOCO'12 Special Sessions, Adv. Intell. Soft Comput., Vol. 189 (Springer, Berlin, Heidelberg 2013) pp. 525-532

7.19 K. Hirota, W. Pedrycz: Fuzzy relational compression, IEEE Trans. Syst. Man Cybern. 29, 407-415 (1999)

7.20 G. Patanè: Fuzzy transform and least-squares approximation: Analogies, differences, and generalizations, Fuzzy Sets Syst. 180, 41-54 (2011)

7.21 I. Perfilieva, V. Novák, A. Dvořák: Fuzzy transform in the analysis of data, Int. J. Approx. Reason. 48, 36-46 (2008)

7.22 R.S. Blum: Robust image fusion using a statistical signal processing approach, Inf. Fusion 6, 119-128 (2005)

7.23 I. Perfilieva, R. Valášek: Fuzzy transforms in removing noise. In: Computational Intelligence, Theory and Applications, ed. by B. Reusch (Springer, Heidelberg 2005) pp. 225-234

7.24 F. Di Martino, V. Loia, S. Sessa: A segmentation method for images compressed by fuzzy transforms, Fuzzy Sets Syst. 161, 56-74 (2010)

7.25 I. Perfilieva, M. Daňková: Image fusion on the basis of fuzzy transforms, Proc. 8th Int. FLINS Conf., Madrid (2008) pp. 471-476

7.26 I. Perfilieva, P. Hodáková, P. Hurtik: F¹-transform edge detector inspired by Canny's algorithm. In: Advances on Computational Intelligence (IPMU2012), ed. by S. Greco, B. Bouchon-Meunier, G. Coletti, M. Fedrizzi, B. Matarazzo, R.R. Yager (Catania, Italy 2012) pp. 230-239


7.27 V. Novák, I. Perfilieva, V. Pavliska: The use of higher-order F-transform in time series analysis, World Congr. IFSA 2011 AFSS 2011, Surabaya (2011) pp. 2211-2216

7.28 I. Perfilieva, B. De Baets: Fuzzy transform of monotonous functions, Inf. Sci. 180, 3304-3315 (2010)

7.29 A. Mumtaz, A. Masjid: Genetic algorithms and its applications to image fusion, IEEE Int. Conf. Emerg. Technol., Rawalpindi (2008) pp. 6-10

7.30 R. Ranjan, H. Singh, T. Meitzler, G.R. Gerhart: Iterative image fusion technique using fuzzy and neuro fuzzy logic and applications, Proc. IEEE Fuzzy Inf. Process. Soc., Detroit (2005) pp. 706-710

7.31 G. Piella: A general framework for multiresolution image fusion: From pixels to regions, Inf. Fusion 4, 259-280 (2003)

7.32 I. Perfilieva, M. Daňková, H.P.M. Vajgl: The use of F-transform for image fusion algorithms, Proc. Int. Conf. Soft Comput. Pattern Recognit. (SoCPaR2010), Cergy Pontoise (2010) pp. 472-477

7.33 I. Perfilieva, M. Daňková, P. Hodáková, M. Vajgl: F-transform based image fusion. In: Image Fusion, ed. by O. Ukimura (InTech, Rijeka 2011), pp. 3-22, available online from http://www.intechopen.com/books/image-fusion/f-transform-based-image-fusion

7.34 M. Vajgl, I. Perfilieva, P. Hodáková: Advanced F-transform-based image fusion, Adv. Fuzzy Syst. 2012, 125086 (2012)

7.35 J. Canny: A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679-698 (1986)

7.36 A. Rosenfeld, M. Thurston: Edge and curve detection for visual scene analysis, IEEE Trans. Comput. C-20(5), 562-569 (1971)


8. Fuzzy Linear Programming and Duality 


Jaroslav Ramik, Milan Vlach 


The chapter is concerned with linear programming 
problems whose input data may be fuzzy while 
the values of variables are always real numbers. We 
propose a rather general approach to these types of 
problems, and present recent results for problems 
in which the notions of feasibility and optimality 
are based on the fuzzy relations of possibility and 
necessity. Special attention is devoted to the weak 
and strong duality. 


Formulation of an abstract model applicable to a com- 
plex decision problem usually involves a tradeoff be- 
tween the accuracy of the problem description and 
the tractability of the resulting model. One of the 
widespread models of decision problems is based on 
the assumption of linearity of constraints and optimiza- 
tion criteria, in spite of the fact that, in most instances 
of real decision problems, not all constraints and opti- 
mization criteria are linear. Fortunately, in many such 
cases, solutions of decision problems obtained through 
linear programming are exact or numerically tractable 
approximations. Given the practical relevance of lin- 
ear programming, it is not surprising that attempts to 
extend linear programming theory to problems involv- 
ing fuzzy data have been appearing since the early 
days of fuzzy sets. To obtain a meaningful extension 
of linear programming to problems involving fuzzy 
data, one has to specify a suitable class of permit- 
ted fuzzy numbers, introduce fundamental arithmetic 
operations with such fuzzy numbers, define inequalities
between fuzzy numbers, and clarify the meaning
of feasibility and optimality. Because this can
be done in many different ways, we can hardly expect
a unique extension that would be as clean and
clear as the theory of linear programming without
fuzzy data. Instead, there exist several variants
of the theory for fuzzy linear programming, the results
of which resemble in various degrees some of
the useful results established in the conventional linear
programming.


8.1 Preliminaries
    8.1.1 Linear Programming
    8.1.2 Sets and Fuzzy Sets
8.2 Fuzzy Linear Programming
8.3 Duality in Fuzzy Linear Programming
    8.3.1 Early Approaches
    8.3.2 More General Approach
8.4 Conclusion
References

Certainly, the most influential papers for the early 
development of optimization theory for problems with 
fuzzy data were papers written by Bellman and 
Zadeh [8.1], and Zimmermann [8.2]. As pointed out 
in a recent paper by Dubois [8.3], fuzzy optimization 
that is based on the Bellman and Zadeh, and Zimmermann
ideas comes down to max-min bottleneck
optimization. Thus, strictly speaking, the fuzzy linear 
programming problems are not necessarily linear in the 
standard sense. 

Throughout the chapter, we assume that some or all 
of the input data defining the problem may be fuzzy 
while the values of variables are always real num- 
bers. For problems with fuzzy decision variables, see 
e.g. [8.4]. Moreover, we do not always satisfy the requirement
of the symmetric model of [8.1, 2], which demands
that the constraints and criteria be treated in the
same way. In general, we take into consideration the
fact that in many situations, in practice, the degree of 
feasibility may be essentially different from the degree 
of optimality attainment. 

The structure of the chapter is briefly described as
follows. In the next section, we first recall the basic
results of the conventional linear programming, especially
the results on duality. As a canonical problem we
consider the problem of the form: Given real numbers
b_1, b_2, ..., b_m, c_1, c_2, ..., c_n, a_{11}, a_{12}, ..., a_{mn},


maximize    c_1 x_1 + c_2 x_2 + ⋯ + c_n x_n
subject to  a_{i1} x_1 + a_{i2} x_2 + ⋯ + a_{in} x_n ≤ b_i ,  i = 1, 2, ..., m ,
            x_j ≥ 0 ,  j = 1, 2, ..., n .


Then we review the basic notions and terminology of
fuzzy set theory, which we need for precise formulation
and description of results of linear programming
problems involving fuzzy data. After these necessary
preliminaries, in Sect. 8.2, we introduce and study fuzzy
linear programming problems. We focus attention on
the analogous canonical form, namely on the following
problem


maximize    \tilde{c}_1 x_1 + ⋯ + \tilde{c}_n x_n
subject to  \tilde{a}_{i1} x_1 + ⋯ + \tilde{a}_{in} x_n \tilde{P} \tilde{b}_i ,  i = 1, 2, ..., m ,
            x_j ≥ 0 ,  j = 1, 2, ..., n ,


where \tilde{c}_j, \tilde{a}_{ij}, and \tilde{b}_i are fuzzy quantities and the meanings
of subject to and maximize are based on the
standard possibility and necessity relations introduced
in [8.5]. The final section is devoted to duality theory
for fuzzy linear programming problems. First, we
recall some of the early approaches that are based
on the ideas of Bellman and Zadeh [8.1], and Zimmermann [8.2].
Then we present recent results of
Ramik [8.6, 7].


8.1 Preliminaries


8.1.1 Linear Programming


Linear programming is concerned with optimization
problems whose objective functions are linear in the unknowns
and whose constraints are linear inequalities or
linear equalities in the unknowns. The form of a linear
programming problem may differ from one problem
to another but, fortunately, there are several standard
forms to which any linear programming problem can
be transformed. We shall use the following canonical
form.


Given real numbers b_1, b_2, ..., b_m, c_1, c_2, ..., c_n,
a_{11}, a_{12}, ..., a_{mn},

maximize    c_1 x_1 + c_2 x_2 + ⋯ + c_n x_n   (8.1)
subject to  a_{i1} x_1 + a_{i2} x_2 + ⋯ + a_{in} x_n ≤ b_i ,  i = 1, 2, ..., m ,   (8.2)
            x_j ≥ 0 ,  j = 1, 2, ..., n .   (8.3)


The set of all n-tuples (x_1, x_2, ..., x_n) of real numbers
that simultaneously satisfy inequalities (8.2) and (8.3) is
called the feasible region of problem (8.1)-(8.3) and the
elements of the feasible region are called feasible solutions.
A feasible solution \hat{x} such that no other feasible solution
x satisfies


c_1 x_1 + c_2 x_2 + ⋯ + c_n x_n > c_1 \hat{x}_1 + c_2 \hat{x}_2 + ⋯ + c_n \hat{x}_n


is called an optimal solution of (8.1)-(8.3), and the set
of all optimal solutions is called the optimal region.

Using the same data b_1, b_2, ..., b_m, c_1, c_2, ..., c_n,
a_{11}, a_{12}, ..., a_{mn}, we can associate with problem (8.1)-(8.3)
another linear programming problem, namely, the
problem


minimize    y_1 b_1 + y_2 b_2 + ⋯ + y_m b_m   (8.4)
subject to  y_1 a_{1j} + y_2 a_{2j} + ⋯ + y_m a_{mj} ≥ c_j ,  j = 1, 2, ..., n ,   (8.5)
            y_i ≥ 0 ,  i = 1, 2, ..., m .   (8.6)


Analogously to the case of maximization, we say
that the set of all m-tuples (y_1, y_2, ..., y_m) of real numbers
that simultaneously satisfy inequalities (8.5) and
(8.6) is the feasible region of problem (8.4)-(8.6), and
that an element \hat{y} of the feasible region such that no
other element y of the feasible region satisfies


b_1 y_1 + b_2 y_2 + ⋯ + b_m y_m < b_1 \hat{y}_1 + b_2 \hat{y}_2 + ⋯ + b_m \hat{y}_m


is an optimal solution of (8.4)-(8.6).

The problem (8.1)-(8.3) is then called the primal
problem and the associated problem (8.4)-(8.6) is
called the dual problem to (8.1)-(8.3). However, this
terminology is relative because if we rewrite the dual
problem into the form of the equivalent primal problem
and again construct the corresponding dual, then we obtain
a linear programming problem which is equivalent
to the original primal problem. In other words, the dual
to the dual is the primal. Consequently, it is just a matter
of convenience which of these problems is taken as
the primal problem.

The main theoretical results on linear programming 
are concerned with mutual relationship between the 
primal problem and its dual problem. They can be sum- 
marized as follows, see also [8.8, 9]. 

Let R^n and R^n_+ denote the set of real n-vectors and
real nonnegative n-vectors equipped with the usual Euclidean
distance. For n = 1, we simplify the notation to
R and R_+. The scalar product of vectors x and y from
R^n is denoted by xy:


1. If x is a feasible solution of the primal problem and
if y is a feasible solution of the dual problem, then
cx ≤ yb.

2. If x is a feasible solution of the primal problem, and
if y is a feasible solution of the dual problem, and
if cx = yb, then x is optimal for the primal problem
and y is optimal for the dual problem.

3. If the feasible region of the primal problem is
nonempty and the objective function x ↦ cx is not
bounded above on it, then the feasible region of the
dual problem is empty.

4. If the feasible region of the dual problem is
nonempty and the objective function y ↦ yb is not
bounded below on it, then the feasible region of the
primal problem is empty.


It turns out that the following deeper results con- 
cerning mutual relation between the primal and dual 
problems hold: 


5. If either of the problems (8.1)-(8.3) or (8.4)-(8.6)
has an optimal solution, so does the other, and the
corresponding values of the objective functions are
equal.

6. If both problems (8.1)-(8.3) and (8.4)-(8.6) have
feasible solutions, then both of them have optimal
solutions and the corresponding optimal values are
equal.

7. A necessary and sufficient condition that feasible
solutions x and y of the primal and dual problems
are optimal is that


x_j > 0 ⇒ yA^j = c_j ,  1 ≤ j ≤ n ,
x_j = 0 ⇐ yA^j > c_j ,  1 ≤ j ≤ n ,
y_i > 0 ⇒ A_i x = b_i ,  1 ≤ i ≤ m ,
y_i = 0 ⇐ A_i x < b_i ,  1 ≤ i ≤ m ,


where A^j and A_i stand for the j-th column and i-th
row of A = {a_{ij}}, respectively.
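Results 1, 2, and 7 can be checked numerically on a small instance. The data below (A, b, c and the candidate solutions) are a hypothetical textbook-style example chosen for illustration, not taken from the chapter, and the helper names are assumptions.

```python
def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def primal_feasible(A, b, x, tol=1e-9):
    """Check (8.2) and (8.3): Ax <= b and x >= 0."""
    return (all(xj >= -tol for xj in x)
            and all(dot(row, x) <= bi + tol for row, bi in zip(A, b)))

def dual_feasible(A, c, y, tol=1e-9):
    """Check (8.5) and (8.6): yA >= c and y >= 0."""
    columns = list(zip(*A))
    return (all(yi >= -tol for yi in y)
            and all(dot(y, col) >= cj - tol for col, cj in zip(columns, c)))

def complementary_slackness(A, b, c, x, y, tol=1e-9):
    """x_j > 0 implies yA^j = c_j, and y_i > 0 implies A_i x = b_i.
    For feasible x and y the two remaining implications of result 7 follow."""
    columns = list(zip(*A))
    ok_x = all(xj <= tol or abs(dot(y, col) - cj) <= tol
               for xj, col, cj in zip(x, columns, c))
    ok_y = all(yi <= tol or abs(dot(row, x) - bi) <= tol
               for yi, row, bi in zip(y, A, b))
    return ok_x and ok_y

# Hypothetical data: maximize 3 x_1 + 5 x_2 under three constraints.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
c = [3, 5]
x_opt, y_opt = [2, 6], [0, 1.5, 1]      # optimal pair, both objective values equal 36
x_other, y_other = [1, 1], [3, 3, 0]    # feasible but not optimal
```

For the optimal pair the objective values coincide and the slackness conditions hold; for the second pair only the weak inequality cx ≤ yb holds.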


It is also well known that the essential duality results 
of linear programming can be expressed as a saddle- 
point property of the Lagrangian function, see [8.10]: 


8. Let L : R^n_+ × R^m_+ → R be the Lagrangian function
for the primal problem (8.1)-(8.3), that is, L(x, y) =
cx + y(b - Ax). The necessary and sufficient condition
that \hat{x} ∈ R^n_+ be an optimal solution of the
primal problem (8.1)-(8.3) and \hat{y} ∈ R^m_+ be an optimal
solution of the dual problem (8.4)-(8.6) is that
(\hat{x}, \hat{y}) be a saddle point of L; that is, for all x ∈ R^n_+
and y ∈ R^m_+,


L(x, \hat{y}) ≤ L(\hat{x}, \hat{y}) ≤ L(\hat{x}, y) .   (8.7)
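The saddle-point inequalities can be probed on random nonnegative samples, as in the sketch below. Sampling can only refute the property, not prove it, and the data are a hypothetical example rather than one from the chapter.

```python
import random

def dot(u, v):
    """Scalar product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def lagrangian(A, b, c, x, y):
    """L(x, y) = cx + y(b - Ax)."""
    slack = [bi - dot(row, x) for row, bi in zip(A, b)]
    return dot(c, x) + dot(y, slack)

def saddle_on_samples(A, b, c, x_hat, y_hat, trials=1000, scale=10.0, seed=0, tol=1e-9):
    """Test L(x, y_hat) <= L(x_hat, y_hat) <= L(x_hat, y) on random x, y >= 0."""
    rng = random.Random(seed)
    middle = lagrangian(A, b, c, x_hat, y_hat)
    for _ in range(trials):
        x = [rng.uniform(0.0, scale) for _ in c]
        y = [rng.uniform(0.0, scale) for _ in b]
        if lagrangian(A, b, c, x, y_hat) > middle + tol:
            return False
        if middle > lagrangian(A, b, c, x_hat, y) + tol:
            return False
    return True

# Hypothetical data: maximize 3 x_1 + 5 x_2 under three constraints.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
c = [3, 5]
holds_at_optimum = saddle_on_samples(A, b, c, [2, 6], [0, 1.5, 1])
holds_elsewhere = saddle_on_samples(A, b, c, [1, 1], [3, 3, 0])
```

At the optimal pair the test passes, whereas the feasible but non-optimal pair violates the left inequality for samples with a small second coordinate.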


8.1.2 Sets and Fuzzy Sets


A well-known fact about subsets of a given set is 
that their properties and their mutual relations can be 
studied by means of their characteristic functions. How- 
ever, these two notions are different, and the notion of 
characteristic function of a subset of a set is more com- 
plicated than that of a subset of a set. Indeed, because 
the characteristic function χ_A of a subset A of a fixed
given set X is a mapping from X into the set {0, 1}, we 
not only need the underlying set X and its subset A but 
also one additional set; in particular, the set {0, 1}. In 
addition, we also need the notion of an ordered pair and 
the notion of the Cartesian product of sets because func- 
tions are specially structured binary relations; in this 
case, special subsets of X x {0, 1}. 

The phrases the membership function of a fuzzy 
set ... or the fuzzy set defined by membership func- 
tion ... (and similar ones), which are very common in 
the fuzzy set literature, clearly indicate that a fuzzy set 
and its membership function are different mathematical 
objects. If we introduce fuzzy sets by means of their 
membership functions, that is, by replacing the range 
{0, 1} of characteristic functions with the unit interval 
[0, 1] of real numbers ordered by the standard ordering
≤, then we are tacitly assuming that the membership
functions of fuzzy sets on X are related to fuzzy sets on 
X in an analogous way as the characteristic functions 
of subsets of X are related to subsets of X. What are 
those objects that we call fuzzy sets on X in set-theoretic 
terms? Obviously, they are more complex than just sub- 
sets of X because the class of functions mapping X into 
the lattice ([0, 1], ≤) is much richer than the class of
functions mapping X into {0, 1}. We follow the opinion
that fuzzy sets are special nested families of subsets of
a set, see [8.11].


Definition 8.1

A fuzzy subset of a nonempty set X (or a fuzzy set on
X) is a family {A_α}_{α∈[0,1]} of subsets of X such that
A_0 = X, A_β ⊂ A_α whenever 0 ≤ α < β ≤ 1, and A_β =
⋂_{0≤α<β} A_α whenever 0 < β ≤ 1.


Definition 8.2

If A = {A_α}_{α∈[0,1]} is a fuzzy subset of X, then the membership
function of A is the function μ_A from X into the
unit interval [0, 1] defined by μ_A(x) = sup{α : x ∈ A_α}.


Remark 8.1

It is worth noting that by defining a fuzzy subset of
a set X as a special family of subsets of X, we can
easily avoid certain troublesome phrases. For example,
we are used to saying a subset A of X and not a subset
χ_A : X → {0, 1} of a set X. Similarly, it is more natural
to say a fuzzy subset A of X than to say a fuzzy subset
μ_A : X → [0, 1] in X. Moreover, if a fuzzy set on X
were defined as a function μ from X to [0, 1], then
we would obtain statements like fuzzy set μ is function
μ, or a fuzzy set μ is convex if and only if μ is quasiconcave.


Let A be a subset of a set X and let $\{A_\alpha\}_{\alpha \in [0,1]}$
be the family of subsets of X defined by $A_0 = X$ and
$A_\alpha = A$ for each positive $\alpha$ from [0, 1]. It can easily
be seen that this family is a fuzzy set on X and that
its membership function is equal to the characteristic
function of A; see [8.12, 13] for details. This one-to-one
correspondence between the characteristic functions of
subsets of X and the membership functions of certain
fuzzy sets on X provides an embedding of the set of
subsets of X into the set of fuzzy sets on X. Consequently,
we can view subsets of X as special fuzzy sets
on X. When we need to distinguish the latter from the
other fuzzy sets on X, we call them the crisp fuzzy sets
on X. Moreover, we can also view the elements of X as
special fuzzy sets on X by additionally employing the
one-to-one correspondence that assigns to each element
x of X the singleton {x}. When we need to distinguish
an element $x \in X$ from the crisp fuzzy set on X corresponding
to {x}, we write k(x) for the latter.

We denote the collection of all fuzzy sets on X by
$\mathcal{F}(X)$. When A is from $\mathcal{F}(X)$ and $\mu_A$ is the membership
function of A, then we use the following terminology.
The value $\mu_A(x)$ is called the membership degree
of x in A. The set $\{x \in X : \mu_A(x) = 1\}$ is called the
core of A. If the core of A is nonempty, then A is
said to be normalized. The complement of A is the
fuzzy set c(A) on X whose membership function is
$\mu_{c(A)}(x) = 1 - \mu_A(x)$. For each $\alpha \in [0, 1]$, the set
$\{x \in X \mid \mu_A(x) \ge \alpha\}$ is called the $\alpha$-cut of A and is denoted
by $[A]_\alpha$. If X is a nonempty subset of a real finite-dimensional
normed space, then a fuzzy set A in X is
called closed, bounded, compact, or convex if the $\alpha$-cut
$[A]_\alpha$ is a closed, bounded, compact, or convex subset of
X for every $\alpha \in (0, 1]$, respectively.

Following the terminology of [8.7], we say that
a fuzzy subset A of $\mathbb{R}$ is a fuzzy quantity whenever A
is normalized, compact, and its membership function $\mu_A$ is
semistrictly quasiconcave in the following sense: the
membership function $\mu_A$ of A is semistrictly quasiconcave
on $\mathbb{R}$ if there exist $a, b, c, d \in \mathbb{R}$,
$-\infty < a \le b \le c \le d < +\infty$, such that

$$
\begin{aligned}
&\mu_A(t) = 0 \quad \text{if } t < a \text{ or } t > d, \\
&\mu_A \text{ is strictly increasing on the interval } [a, b], \\
&\mu_A(t) = 1 \quad \text{if } b \le t \le c, \\
&\mu_A \text{ is strictly decreasing on the interval } [c, d].
\end{aligned}
$$

The set of all fuzzy quantities is denoted by $\mathcal{F}_0(\mathbb{R})$.
Note that $\mathcal{F}_0(\mathbb{R})$ contains well-known classes of fuzzy
numbers: crisp (real) numbers, crisp intervals, triangular,
trapezoidal, and bell-shaped fuzzy numbers, etc.
However, $\mathcal{F}_0(\mathbb{R})$ does not contain fuzzy
sets with stair-like membership functions.
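
As an illustration of a fuzzy quantity, the sketch below models a trapezoidal fuzzy number with linear sides, one of the classes mentioned above. The class name `Trapezoid` and its methods are assumptions made for this example only; the closed-form cut endpoints hold for this particular shape, not for fuzzy quantities in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy quantity with linear sides; requires a <= b <= c <= d."""
    a: float
    b: float
    c: float
    d: float

    def mu(self, t):
        """Membership degree of the real number t."""
        if self.b <= t <= self.c:
            return 1.0
        if t <= self.a or t >= self.d:
            return 0.0
        if t < self.b:
            return (t - self.a) / (self.b - self.a)  # increasing side on [a, b]
        return (self.d - t) / (self.d - self.c)      # decreasing side on [c, d]

    def cut(self, alpha):
        """Endpoints of the alpha-cut {t : mu(t) >= alpha}, for 0 < alpha <= 1."""
        return (self.a + alpha * (self.b - self.a),
                self.d - alpha * (self.d - self.c))


tri = Trapezoid(1, 2, 2, 4)    # triangular fuzzy number (b == c)
crisp = Trapezoid(3, 3, 3, 3)  # the crisp number 3

print(tri.mu(1.5), tri.cut(0.5))      # 0.5 (1.5, 3.0)
print(crisp.mu(3), crisp.cut(0.5))    # 1.0 (3.0, 3.0)
```

A crisp number is the degenerate case a = b = c = d, and a crisp interval is the case a = b and c = d.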

Recall that the binary relations on X are subsets
of the Cartesian product X × X and that the fuzzy sets
on X × X are called fuzzy binary relations on X, or
simply fuzzy relations on X. Because the binary relations
on X are subsets of X × X, we can view them as
special fuzzy relations on X; namely, as those fuzzy relations
on X whose membership functions are equal to
the characteristic functions of the corresponding binary
relations. Again, we call them crisp. Since the membership
functions of fuzzy sets provide a mathematical tool
for introducing grades in the notion of set membership,
the fuzzy relations on X can be used for introducing
grades in comparison of elements of X. However, if we
need to compare not only elements of X but also fuzzy
sets on X, then we need binary relations and fuzzy binary
relations on the set of fuzzy sets on X, that is, on
$\mathcal{F}(X) \times \mathcal{F}(X)$.

Let R be a fuzzy relation on X and let Q be a fuzzy
relation on $\mathcal{F}(X)$, that is, R belongs to $\mathcal{F}(X \times X)$ and Q
belongs to $\mathcal{F}(\mathcal{F}(X) \times \mathcal{F}(X))$. We say that Q is a fuzzy
extension (or briefly an extension) of R from X to $\mathcal{F}(X)$
if, for each pair x and y in X,

$$
\mu_Q(k(x), k(y)) = \mu_R(x, y). \tag{8.8}
$$


8.2 Fuzzy Linear Programming 


As mentioned in the beginning, we can hardly expect
that some unique extension of conventional linear
programming to problems with fuzzy data can be established
that would be as clean and clear as the
theory of conventional linear programming in finite-dimensional
spaces. This can also be easily seen from
the current literature, where we can find a number of
different extensions whose results resemble, in
various degrees, some of the useful results established
in conventional linear programming.

When dealing with problems that arise from the
canonical linear programming problem (8.1)–(8.3) by
permitting the input data $c_j$, $a_{ij}$, and $b_i$ in (8.1)–(8.3) to
be fuzzy quantities, we distinguish the fuzzy quantities
from real numbers by writing the tilde above the corresponding
symbol. Thus, we write $\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$ and
consequently $\mu_{\tilde c_j} : \mathbb{R} \to [0, 1]$, $\mu_{\tilde a_{ij}} : \mathbb{R} \to [0, 1]$, and
$\mu_{\tilde b_i} : \mathbb{R} \to [0, 1]$, respectively, for $i \in M = \{1, 2, \ldots, m\}$ and
$j \in N = \{1, 2, \ldots, n\}$. When the tilde is omitted, it signifies
that the corresponding data or values of variables
are considered to be real numbers. Notice that if $\tilde c_j$ and
$\tilde a_{ij}$ are fuzzy quantities, then, for every $(x_1, x_2, \ldots, x_n)$
from $\mathbb{R}^n$, the fuzzy subsets $\tilde c_1 x_1 + \cdots + \tilde c_n x_n$ and
$\tilde a_{i1} x_1 + \cdots + \tilde a_{in} x_n$ of $\mathbb{R}$ defined by the extension
principle are again fuzzy quantities. Also notice that it is
possible to consider the conventional linear programming
problems as special cases of such fuzzy problems
because the real numbers can be identified with crisp
fuzzy quantities.
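
For trapezoidal data and nonnegative coefficients, the linear combination defined by the extension principle can be computed parameter by parameter. The sketch below relies on two assumptions that are not part of the text above: every fuzzy quantity is a trapezoid given as a tuple (a, b, c, d) with linear sides, and all $x_j \ge 0$. The helper names are illustrative.

```python
def linear_combination(quantities, x):
    """Trapezoid (a, b, c, d) of sum_j x_j * q_j for trapezoids q_j and x_j >= 0."""
    if any(xj < 0 for xj in x):
        raise ValueError("this shortcut assumes nonnegative coefficients")
    return tuple(sum(xj * q[k] for q, xj in zip(quantities, x)) for k in range(4))


def cut(q, alpha):
    """Endpoints of the alpha-cut of a trapezoid q, for 0 < alpha <= 1."""
    a, b, c, d = q
    return (a + alpha * (b - a), d - alpha * (d - c))


c_tilde = [(1, 2, 2, 3), (2, 3, 3, 4)]   # hypothetical fuzzy cost coefficients
z = linear_combination(c_tilde, (2, 1))  # fuzzy value of c1*x1 + c2*x2 at x = (2, 1)

print(z)            # (4, 7, 7, 10)
print(cut(z, 0.5))  # (5.5, 8.5)
```

The cut of the combination equals the same combination of the cut endpoints: 2 · 1.5 + 1 · 2.5 = 5.5 on the left and 2 · 2.5 + 1 · 3.5 = 8.5 on the right.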

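As the canonical fuzzy counterpart of the canonical
linear programming problem (8.1)–(8.3), we consider
the problem

$$
\begin{aligned}
&\text{maximize} && \tilde c_1 x_1 + \cdots + \tilde c_n x_n \\
&\text{subject to} && (\tilde a_{i1} x_1 + \cdots + \tilde a_{in} x_n)\ \tilde P_i\ \tilde b_i, \quad i \in M, \\
& && x_j \ge 0, \quad j \in N,
\end{aligned}
\tag{8.9}
$$

where, for each $i \in M$, the fuzzy quantities $\tilde a_{i1} x_1 + \cdots + \tilde a_{in} x_n$
and $\tilde b_i$ from $\mathcal{F}_0(\mathbb{R})$ are compared by
a fuzzy relation $\tilde P_i$ on $\mathcal{F}_0(\mathbb{R})$, and where the meanings
of "subject to" and "maximize", that is, the meanings of feasibility
and optimality, remain to be specified.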


Because the set of conventional binary relations
on X can be embedded into the set of fuzzy relations
on X, we also obtain from (8.8) extensions of conventional
binary relations on X to fuzzy relations on $\mathcal{F}(X)$.


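Primarily, we shall study the case in which all $\tilde P_i$ appearing
in the constraints of problem (8.9) are the same.
Namely, let $\tilde P$ be a fuzzy relation on $\mathcal{F}_0(\mathbb{R})$ and let us assume
that $\tilde P_i = \tilde P$ for all $i \in M$. Then (8.9) simplifies to

$$
\begin{aligned}
&\text{maximize} && \tilde c_1 x_1 + \cdots + \tilde c_n x_n \\
&\text{subject to} && (\tilde a_{i1} x_1 + \cdots + \tilde a_{in} x_n)\ \tilde P\ \tilde b_i, \quad i \in M, \\
& && x_j \ge 0, \quad j \in N,
\end{aligned}
\tag{8.10}
$$

where the meanings of feasibility and optimality are
specified as follows.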


• Feasibility: Let $\beta$ be a positive number from (0, 1].
By a $\beta$-feasible region of problem (8.10) we understand
the $\beta$-cut of the fuzzy subset $\tilde X$ of $\mathbb{R}^n$ whose
membership function $\mu_{\tilde X}$ is given by

$$
\mu_{\tilde X}(x) =
\begin{cases}
\min\limits_{1 \le i \le m} \mu_{\tilde P}(\tilde a_{i1} x_1 + \cdots + \tilde a_{in} x_n,\ \tilde b_i) & \text{if } x_j \ge 0 \text{ for all } j \in N, \\
0 & \text{otherwise.}
\end{cases}
\tag{8.11}
$$

The elements of the $\beta$-feasible region are called $\beta$-feasible
solutions of problem (8.10), and $\tilde X$ defined
by (8.11) is called the feasible region of problem
(8.10). It is worth mentioning that when the data in
(8.10) are crisp, then $\tilde X$ becomes the feasible region
of the canonical linear programming problem (8.1)–(8.3).

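• Optimality: When specifying the meaning of optimization,
we have to take into account that the set of
fuzzy values of the objective function is not linearly
ordered, and that the relation used for comparing
elements of this set may be independent of
that used in the notion of feasibility. We propose
to use the notion of $\alpha$-efficient ($\alpha$-nondominated)
solution of the fuzzy linear programming (FLP)
problem. (Some other approaches can be found in
the literature; for example, see [8.6].)
First, we observe that a feasible solution $\hat x$ of the nonfuzzy
problem (8.1)–(8.3) is optimal exactly when
there is no other feasible solution x such that $cx > c\hat x$.
This suggests the introduction of a suitable fuzzy
extension of >. Let $\tilde Q$ be a fuzzy relation on $\mathbb{R}$ and
let $\alpha \in (0, 1]$. If $\tilde a$ and $\tilde b$ are fuzzy quantities, then
we write

$$
\tilde a\ \tilde Q_\alpha\ \tilde b, \quad \text{if } \mu_{\tilde Q}(\tilde a, \tilde b) \ge \alpha, \tag{8.12}
$$

and call $\tilde Q_\alpha$ the $\alpha$-relation on $\mathbb{R}$ associated to $\tilde Q$. We
also write

$$
\tilde a\ \tilde Q^*_\alpha\ \tilde b, \quad \text{if } \bigl(\tilde a\ \tilde Q_\alpha\ \tilde b \text{ and } \mu_{\tilde Q}(\tilde b, \tilde a) < \alpha\bigr), \tag{8.13}
$$

and call $\tilde Q^*_\alpha$ the strict $\alpha$-relation on $\mathbb{R}$ associated
to $\tilde Q$. Now let $\alpha$ and $\beta$ be positive numbers from
[0, 1]. We say that a $\beta$-feasible solution $\hat x$ of (8.10)
is an $(\alpha, \beta)$-maximal solution of (8.10) if there is no
$\beta$-feasible solution x of (8.10) different from $\hat x$ such
that

$$
\tilde c_1 \hat x_1 + \tilde c_2 \hat x_2 + \cdots + \tilde c_n \hat x_n\ \ \tilde Q^*_\alpha\ \ \tilde c_1 x_1 + \tilde c_2 x_2 + \cdots + \tilde c_n x_n. \tag{8.14}
$$


Remark 8.2
Note that $\tilde Q_\alpha$ and $\tilde Q^*_\alpha$ are binary relations on the set of
fuzzy quantities $\mathcal{F}_0(\mathbb{R})$ that are constructed from the fuzzy
relation $\tilde Q$ at the level $\alpha \in (0, 1]$, and that relation $\tilde Q^*_\alpha$ is
the strict relation associated with the relation $\tilde Q_\alpha$. Also
notice that if $\tilde a$ and $\tilde b$ are crisp fuzzy numbers corresponding
to real numbers a and b, respectively, and $\tilde Q$
is a fuzzy extension of relation ≤, then $\tilde a\ \tilde Q_\alpha\ \tilde b$ holds if
and only if $a \le b$ does. Then, for $\alpha \in (0, 1)$, $\tilde a\ \tilde Q^*_\alpha\ \tilde b$ if
and only if $a < b$.


Significance and usefulness of duality results for
linear programming problems with fuzzy data depend
crucially on the choice of fuzzy relations $\tilde P$ and $\tilde Q$ appearing
in the definition of feasibility and optimality. In
what follows, we use the natural extensions of binary
relations ≤ and ≥ on $\mathbb{R}$ to fuzzy relations on $\mathcal{F}(\mathbb{R})$ that
are based on the possibility and necessity relations Pos
and Nec defined on $\mathcal{F}(\mathbb{R})$ by

$$
\mu_{\mathrm{Pos}}(\tilde a, \tilde b) = \sup\{\min(\mu_{\tilde a}(x), \mu_{\tilde b}(y), \mu_{\le}(x, y)) \mid x, y \in \mathbb{R}\}, \tag{8.15}
$$

$$
\mu_{\mathrm{Nec}}(\tilde a, \tilde b) = \inf\{\max(1 - \mu_{\tilde a}(x), 1 - \mu_{\tilde b}(y), \mu_{\le}(x, y)) \mid x, y \in \mathbb{R}\}, \tag{8.16}
$$

where $\mu_{\le}$ is the characteristic function of the relation ≤ on $\mathbb{R}$.
Equivalently, we write $\tilde a\ \tilde{\le}^{\mathrm{Pos}}\ \tilde b$ and $\tilde a\ \tilde{\le}^{\mathrm{Nec}}\ \tilde b$, instead of
$\mu_{\mathrm{Pos}}(\tilde a, \tilde b)$ and $\mu_{\mathrm{Nec}}(\tilde a, \tilde b)$, respectively, and by $\tilde a\ \tilde{\ge}^{\mathrm{Pos}}\ \tilde b$
we mean $\tilde b\ \tilde{\le}^{\mathrm{Pos}}\ \tilde a$.

The proofs of the following propositions can be
found in [8.7].


Proposition 8.1
Let $\tilde a, \tilde b \in \mathcal{F}_0(\mathbb{R})$ be fuzzy quantities. Then, for each
$\alpha \in (0, 1)$, we have

$$
\mu_{\mathrm{Pos}}(\tilde a, \tilde b) \ge \alpha \quad \text{iff} \quad \inf[\tilde a]_\alpha \le \sup[\tilde b]_\alpha, \tag{8.17}
$$

$$
\mu_{\mathrm{Nec}}(\tilde a, \tilde b) \ge \alpha \quad \text{iff} \quad \sup[\tilde a]_{1-\alpha} \le \inf[\tilde b]_{1-\alpha}. \tag{8.18}
$$


Let $\tilde d \in \mathcal{F}_0(\mathbb{R})$ be a fuzzy quantity, let $\beta \in [0, 1]$, and let
$\tilde d^L(\beta)$ and $\tilde d^R(\beta)$ be defined by

$$
\begin{aligned}
\tilde d^L(\beta) &= \inf\{t \mid t \in [\tilde d]_\beta\} = \inf[\tilde d]_\beta, \\
\tilde d^R(\beta) &= \sup\{t \mid t \in [\tilde d]_\beta\} = \sup[\tilde d]_\beta.
\end{aligned}
\tag{8.19}
$$


Proposition 8.2
i) Let $\tilde P = \tilde{\le}^{\mathrm{Pos}}$ and let $\beta \in [0, 1]$. A vector
$x = (x_1, \ldots, x_n)$ is a $\beta$-feasible solution of the FLP problem
(8.10) if and only if it is a nonnegative solution
of the system of inequalities

$$
\sum_{j \in N} \tilde a^L_{ij}(\beta)\, x_j \le \tilde b^R_i(\beta), \quad i \in M.
$$

ii) Let $\tilde P = \tilde{\le}^{\mathrm{Nec}}$. A vector $x = (x_1, \ldots, x_n)$ is a $\beta$-feasible
solution of the FLP problem (8.10) if and
only if it is a nonnegative solution of the system of
inequalities

$$
\sum_{j \in N} \tilde a^R_{ij}(1 - \beta)\, x_j \le \tilde b^L_i(1 - \beta), \quad i \in M.
$$


The following proposition is a simple consequence of
the above results applied to the particular fuzzy relations
$\tilde P = \tilde{\le}^{\mathrm{Pos}}$ and $\tilde P = \tilde{\le}^{\mathrm{Nec}}$.


Proposition 8.3
Let $\tilde a$ and $\tilde b$ be fuzzy quantities, $\alpha \in (0, 1]$.

i) Let $\tilde P = \tilde{\le}^{\mathrm{Pos}}$ be a fuzzy relation on $\mathbb{R}$ defined by
(8.15). Then

$$
\begin{aligned}
\tilde a\ \tilde P_\alpha\ \tilde b \quad &\text{iff} \quad \tilde a^L(\alpha) \le \tilde b^R(\alpha), \\
\tilde a\ \tilde P^*_\alpha\ \tilde b \quad &\text{iff} \quad \tilde a^R(\alpha) < \tilde b^L(\alpha).
\end{aligned}
\tag{8.20}
$$

ii) Let $\tilde P = \tilde{\le}^{\mathrm{Nec}}$ be a fuzzy relation on $\mathbb{R}$ defined by
(8.16). Then

$$
\begin{aligned}
\tilde a\ \tilde P_\alpha\ \tilde b \quad &\text{iff} \quad \tilde a^R(1 - \alpha) \le \tilde b^L(1 - \alpha), \\
\tilde a\ \tilde P^*_\alpha\ \tilde b \quad &\text{iff} \quad \tilde a^R(1 - \alpha) \le \tilde b^L(1 - \alpha) \text{ and } \tilde a^L(1 - \alpha) < \tilde b^R(1 - \alpha).
\end{aligned}
\tag{8.21}
$$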
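
The cut characterizations (8.17) and (8.18) can be compared with a brute-force evaluation of (8.15) and (8.16). This is only a rough sketch under stated assumptions: the fuzzy quantities are trapezoids given as tuples (a, b, c, d), the supremum and infimum are approximated on a finite grid (so the grid value of Nec is only accurate up to the grid step), and all function names are illustrative.

```python
def mu(q, t):
    """Membership degree of t in the trapezoid q = (a, b, c, d)."""
    a, b, c, d = q
    if b <= t <= c:
        return 1.0
    if t <= a or t >= d:
        return 0.0
    return (t - a) / (b - a) if t < b else (d - t) / (d - c)


def cut(q, alpha):
    """(q^L(alpha), q^R(alpha)) for a trapezoid q."""
    a, b, c, d = q
    return (a + alpha * (b - a), d - alpha * (d - c))


def pos(p, q, grid):
    """Grid version of (8.15): mu_<= is 1 for x <= y and 0 otherwise."""
    return max(min(mu(p, x), mu(q, y)) for x in grid for y in grid if x <= y)


def nec(p, q, grid):
    """Grid version of (8.16): only pairs with x > y can lower the infimum."""
    return min(max(1 - mu(p, x), 1 - mu(q, y)) for x in grid for y in grid if x > y)


def pos_at_least(p, q, alpha):
    """Right-hand side of (8.17)."""
    return cut(p, alpha)[0] <= cut(q, alpha)[1]


def nec_at_least(p, q, alpha):
    """Right-hand side of (8.18)."""
    return cut(p, 1 - alpha)[1] <= cut(q, 1 - alpha)[0]


grid = [i / 20 for i in range(121)]       # 0.00, 0.05, ..., 6.00
a, b = (2, 3, 3, 5), (1, 2, 2, 3)         # hypothetical data

print(pos(a, b, grid))                                     # 0.5
print(pos_at_least(a, b, 0.4), pos_at_least(a, b, 0.6))    # True False
print(nec_at_least(b, a, 0.4), nec_at_least(b, a, 0.6))    # True False
```

For these data both indices equal 0.5 in the continuous setting, which is why the cut tests switch from true to false between the levels 0.4 and 0.6.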


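As to the optimal solution of the FLP problem, we obtain
the following result, see also [8.7].


Proposition 8.4
Let $\alpha, \beta \in (0, 1)$ and let $\tilde X$ be a feasible region of the
FLP problem (8.10) with $\tilde P = \tilde{\le}^{\mathrm{Pos}}$. Let $c_j$ be such that
$\tilde c^L_j(\alpha) \le c_j \le \tilde c^R_j(\alpha)$ for all $j \in N$. If $x^* = (x^*_1, \ldots, x^*_n)$
is an optimal solution of the LP problem

$$
\begin{aligned}
&\text{maximize} && \sum_{j \in N} c_j x_j \\
&\text{subject to} && \sum_{j \in N} \tilde a^L_{ij}(\beta)\, x_j \le \tilde b^R_i(\beta), \quad i \in M, \\
& && x_j \ge 0, \quad j \in N,
\end{aligned}
\tag{8.22}
$$

then $x^*$ is an $(\alpha, \beta)$-maximal solution of the FLP problem
(8.10).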
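
Propositions 8.2 and 8.4 reduce $\beta$-feasibility to crisp linear inequalities in the cut endpoints. The sketch below checks these inequalities for a given point. It assumes trapezoidal data given as tuples (a, b, c, d) and a level $\beta$ strictly between 0 and 1; the function names and the data are hypothetical.

```python
def cut(q, level):
    """(q^L(level), q^R(level)) for a trapezoid q = (a, b, c, d)."""
    a, b, c, d = q
    return (a + level * (b - a), d - level * (d - c))


def pos_feasible(x, A, b, beta):
    """Proposition 8.2(i): sum_j aL_ij(beta) x_j <= bR_i(beta) for all i, x >= 0."""
    return all(xj >= 0 for xj in x) and all(
        sum(cut(aij, beta)[0] * xj for aij, xj in zip(row, x)) <= cut(bi, beta)[1]
        for row, bi in zip(A, b)
    )


def nec_feasible(x, A, b, beta):
    """Proposition 8.2(ii): sum_j aR_ij(1-beta) x_j <= bL_i(1-beta) for all i, x >= 0."""
    return all(xj >= 0 for xj in x) and all(
        sum(cut(aij, 1 - beta)[1] * xj for aij, xj in zip(row, x)) <= cut(bi, 1 - beta)[0]
        for row, bi in zip(A, b)
    )


# One fuzzy constraint in two variables (hypothetical data).
A = [[(1, 2, 2, 3), (0, 1, 1, 2)]]
b = [(3, 4, 4, 6)]

print(pos_feasible((2, 2), A, b, 0.5))    # True:  1.5*2 + 0.5*2 = 4 <= 5
print(nec_feasible((2, 2), A, b, 0.5))    # False: 2.5*2 + 1.5*2 = 8 > 3.5
print(nec_feasible((1, 0.5), A, b, 0.5))  # True:  2.5 + 0.75 = 3.25 <= 3.5
```

The constraint rows of the crisp LP (8.22) are exactly the inequalities tested by `pos_feasible`, so any LP solver could be applied to them once the endpoints are computed.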


8.3 Duality in Fuzzy Linear Programming 


8.3.1 Early Approaches 


Dual Pairs of Rödder and Zimmermann
One of the early approaches to duality in linear programming
problems involving fuzziness is due to Rödder
and Zimmermann [8.14]. To be able to state the
problems considered by Rödder and Zimmermann concisely,
we first observe that conditions (8.7) bring up the
pair of optimization problems

$$
\text{maximize } \min_{y \ge 0} L(x, y) \quad \text{subject to } x \in \mathbb{R}^n_+, \tag{8.23}
$$

$$
\text{minimize } \max_{x \ge 0} L(x, y) \quad \text{subject to } y \in \mathbb{R}^m_+. \tag{8.24}
$$
x=0 Sy 


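Let $\mu$ and $\mu'$ be real-valued functions on $\mathbb{R}^n_+$
and $\mathbb{R}^m_+$, respectively, and let $\{\nu_x\}_{x \in \mathbb{R}^n_+}$ and $\{\nu'_y\}_{y \in \mathbb{R}^m_+}$
be families of real-valued functions on $\mathbb{R}^m_+$ and $\mathbb{R}^n_+$,
respectively. Furthermore, let $\varphi_y$ and $\psi_x$ be real-valued
functions on $\mathbb{R}^n_+$ and $\mathbb{R}^m_+$ defined by

$$
\varphi_y(x) = \min(\mu(x), \nu_x(y)), \tag{8.25}
$$

$$
\psi_x(y) = \min(\mu'(y), \nu'_y(x)). \tag{8.26}
$$

Now let us consider the following pair of families of
optimization problems:

Family $\{P_y\}$: Given $y \in \mathbb{R}^m_+$,
maximize $\varphi_y(x)$ subject to $x \in \mathbb{R}^n_+$.

Family $\{D_x\}$: Given $x \in \mathbb{R}^n_+$,
maximize $\psi_x(y)$ subject to $y \in \mathbb{R}^m_+$.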

Motivated and supported by economic interpretation,
Rödder and Zimmermann [8.14] propose to specify
functions $\mu$ and $\mu'$ and families $\{\nu_x\}$ and $\{\nu'_y\}$ as follows:
Given an m × n matrix A, m × 1 vector b, 1 × n
vector c, and real numbers $\gamma$ and $\delta$, define the functions
$\mu$, $\mu'$, $\nu_x$, and $\nu'_y$ by

$$
\mu(x) = \min(1, 1 - (\gamma - cx)), \qquad
\mu'(y) = \min(1, 1 - (yb - \delta)), \tag{8.27}
$$

$$
\nu_x(y) = \max(0, y(b - Ax)), \qquad
\nu'_y(x) = \max(0, (yA - c)x). \tag{8.28}
$$


Strictly speaking, we do not obtain a duality scheme
as conceived by Kuhn because there is no relationship
between the numbers $\gamma$ and $\delta$. Indeed, if the family
$\{P_y\}_{y \ge 0}$ is considered to be the primal problem, then
we have the situation in which the primal problem is
completely specified by data A, b, c, and $\gamma$. However,
these data are not sufficient for specification of family
$\{D_x\}_{x \ge 0}$ because the definition of $\{D_x\}_{x \ge 0}$ requires
knowledge of $\delta$. Thus, from the point of view that the
dual problem is to be constructed only on the basis
of the primal problem data, every choice of $\delta$ determines
a certain family dual to $\{P_y\}_{y \ge 0}$. In this sense, we
could say that every choice of $\delta$ gives a duality, the $\delta$-duality.
Analogously, if the primal problem is $\{D_x\}_{x \ge 0}$,
then every choice of $\gamma$ determines some family $\{P_y\}_{y \ge 0}$
dual to $\{D_x\}_{x \ge 0}$, and we obtain the $\gamma$-duality. In other
words, for every $\gamma, \delta$, we obtain $(\gamma, \delta)$-duality. It is
worth noticing that families $\{P_y\}$ and $\{D_x\}$ consist of
uncountably many linear optimization problems. Moreover,
every problem of each of these families may have
uncountably many optimal solutions. Consequently, the
solution of the problem given by family $\{P_y\}_{y \ge 0}$ is the
family $\{X(y)\}_{y \ge 0}$ of subsets of $\mathbb{R}^n_+$ where $X(y)$ is the set
of maximizers of $\varphi_y$ over $\mathbb{R}^n_+$. Analogously, the family
$\{Y(x)\}_{x \ge 0}$ of maximizers of $\psi_x$ over $\mathbb{R}^m_+$ is the solution
of the problem given by family $\{D_x\}_{x \ge 0}$. Rödder and Zimmermann
propose to replace the families $\{P_y\}$ and $\{D_x\}$
by the families $\{P'_u\}$ and $\{D'_x\}$ of problems defined as
follows:


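• Family $\{P'_u\}$: For every $u \ge 0$,

$$
\begin{aligned}
&\text{maximize} && \lambda \\
&\text{subject to} && \lambda \le 1 + cx - \gamma, \\
& && \lambda \le u(b - Ax), \\
& && x \ge 0.
\end{aligned}
\tag{8.29}
$$

• Family $\{D'_x\}$: For every $x \ge 0$,

$$
\begin{aligned}
&\text{minimize} && \eta \\
&\text{subject to} && \eta \ge ub - \delta - 1, \\
& && \eta \ge (c - uA)x, \\
& && u \ge 0.
\end{aligned}
\tag{8.30}
$$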


They call these families of optimization problems
the fuzzy dual pair and claim that the families $\{P_y\}$ and
$\{D_x\}$ become families $\{P'_u\}$ and $\{D'_x\}$ when $\mu$, $\mu'$, $\nu_x$,
and $\nu'_y$ are defined by (8.27) and (8.28). To see that
this claim cannot be substantiated, it suffices to observe
that the value of function $\varphi_y$ cannot be greater than 1,
whereas the value of $\lambda$ is not bounded above whenever
A and b are such that both $cx$ and $-yAx$ are positive for
some $x \in \mathbb{R}^n_+$.
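
The objection can be reproduced on a one-dimensional instance. The sketch below evaluates $\varphi_y$ from (8.25), (8.27), (8.28) and the largest $\lambda$ admitted by the two constraints of (8.29) at the same point. The data (A = [−1], b = 0, c = 1, $\gamma$ = 0, y = u = 1) are a hypothetical choice made only so that $cx$ and $-yAx$ are both positive for x > 0.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def slack(A, b, x):
    """Componentwise b - Ax."""
    return [bi - dot(row, x) for row, bi in zip(A, b)]


def phi(x, y, A, b, c, gamma):
    """phi_y(x) = min(mu(x), nu_x(y)) with mu, nu from (8.27) and (8.28)."""
    mu = min(1.0, 1.0 - (gamma - dot(c, x)))
    nu = max(0.0, dot(y, slack(A, b, x)))
    return min(mu, nu)


def lambda_bound(x, u, A, b, c, gamma):
    """Largest lambda allowed at x by the two constraints of (8.29)."""
    return min(1.0 + dot(c, x) - gamma, dot(u, slack(A, b, x)))


A, b, c, gamma = [[-1.0]], [0.0], [1.0], 0.0
y = [1.0]

for t in (1.0, 5.0, 50.0):
    print(t, phi([t], y, A, b, c, gamma), lambda_bound([t], y, A, b, c, gamma))
# phi stays at 1.0 while the lambda bound grows with t (1.0, 5.0, 50.0)
```

On this instance the objective of (8.29) is unbounded, whereas $\varphi_y$ never exceeds 1, so the two problems cannot coincide without an extra bound on $\lambda$.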

To obtain a valid conversion, one needs to add the
inequalities $\lambda \le 1$ and $\eta \ge -1$ to the constraints. Thus,
it seems that a more suitable choice of functions $\nu_x$ and $\nu'_y$
in the Rödder and Zimmermann duality scheme would
be

$$
\nu_x(y) = \min(1, 1 + y(b - Ax)), \tag{8.31}
$$

$$
\nu'_y(x) = \min(1, 1 + (yA - c)x). \tag{8.32}
$$


Another objection to the Rödder and Zimmermann
model arises from the fact that the duality results for
the proposed fuzzy dual pair do not reduce to the standard
duality results for the crisp scenario, that is, for
$\lambda = 1$, $\eta = -1$. Again an easy remedy is to work with
$\nu_x$ and $\nu'_y$ defined by (8.31) and (8.32) instead of $\nu_x$
and $\nu'_y$ from (8.28). Similar approaches can be found
in [8.15, 16].


Dual Pairs of Bector and Chandra
In contrast to the usual practice, in the Rödder and Zimmermann
model, the range of membership functions $\mu$
and $\mu'$ is $(-\infty, 1]$, and the range of membership functions
$\nu_x$ and $\nu'_y$ is $[0, \infty)$ or $[1, \infty)$ instead of the usual
[0, 1]. Bector and Chandra [8.17] proposed to replace
the relations ≤ and ≥ appearing in the dual pair of
linear programming problems by suitable fuzzy relations
on $\mathbb{R}$. In particular, the inequality ≤ appearing in
the i-th constraint of the primal problem (8.1)–(8.3) is
replaced by the fuzzy relation $\tilde{\le}_i$ whose membership
function $\mu_{\tilde{\le}_i} : \mathbb{R} \times \mathbb{R} \to [0, 1]$ is defined by

$$
\mu_{\tilde{\le}_i}(\alpha, \beta) =
\begin{cases}
1 & \text{if } \alpha \le \beta, \\
1 - \dfrac{\alpha - \beta}{p_i} & \text{if } \beta < \alpha \le \beta + p_i, \\
0 & \text{if } \beta + p_i < \alpha,
\end{cases}
$$

where $p_i$ is a positive number. Analogously, the inequality
≥ appearing in the j-th constraint of the dual
problem (8.4)–(8.6) is replaced by the fuzzy relation $\tilde{\ge}_j$
with the membership function

$$
\mu_{\tilde{\ge}_j}(\alpha, \beta) =
\begin{cases}
1 & \text{if } \alpha \ge \beta, \\
1 - \dfrac{\beta - \alpha}{q_j} & \text{if } \beta > \alpha \ge \beta - q_j, \\
0 & \text{if } \beta - q_j > \alpha,
\end{cases}
$$

where $q_j$ is a positive number. The degree of satisfaction
with which $x \in \mathbb{R}^n$ fulfills the i-th fuzzy constraint
$A_i x\ \tilde{\le}_i\ b_i$ of the primal problem is expressed by the
fuzzy subset of $\mathbb{R}^n$ whose membership function $\mu_i$
is defined by $\mu_i(x) = \mu_{\tilde{\le}_i}(A_i x, b_i)$, and the degree of
satisfaction with which $y \in \mathbb{R}^m$ fulfills the j-th fuzzy
constraint $yA^j\ \tilde{\ge}_j\ c_j$ of the dual problem is expressed by
the fuzzy subset of $\mathbb{R}^m$ whose membership function $\mu'_j$
is defined by $\mu'_j(y) = \mu_{\tilde{\ge}_j}(yA^j, c_j)$.

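Similarly, we can express the degree of satisfaction
with a prescribed aspiration level $\gamma$ of the objective
function value $cx$ by the fuzzy subset of $\mathbb{R}^n$ given by
$\mu_0(x) = \mu_{\tilde{\ge}_0}(cx, \gamma)$ where, for the tolerance given by
a positive number $p_0$, the membership function $\mu_{\tilde{\ge}_0}$ is
defined by

$$
\mu_{\tilde{\ge}_0}(\alpha, \beta) =
\begin{cases}
1 & \text{if } \alpha \ge \beta, \\
1 - \dfrac{\beta - \alpha}{p_0} & \text{if } \beta > \alpha \ge \beta - p_0, \\
0 & \text{if } \beta - p_0 > \alpha.
\end{cases}
$$

Analogously, for the degree of satisfaction with the aspiration
level $\delta$ and tolerance $q_0$ in the dual problem, we
have $\mu'_0(y) = \mu_{\tilde{\le}_0}(yb, \delta)$ where

$$
\mu_{\tilde{\le}_0}(\alpha, \beta) =
\begin{cases}
1 & \text{if } \alpha \le \beta, \\
1 - \dfrac{\alpha - \beta}{q_0} & \text{if } \beta < \alpha \le \beta + q_0, \\
0 & \text{if } \beta + q_0 < \alpha.
\end{cases}
$$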

This leads to the following pair of linear programming
problems.

Given positive numbers $p_0, p_1, \ldots, p_m$, and a real
number $\gamma$, maximize $\lambda$ subject to

$$
\begin{aligned}
(\lambda - 1)p_0 &\le cx - \gamma, \\
(\lambda - 1)p_i &\le b_i - A_i x, \quad 1 \le i \le m, \\
0 \le \lambda &\le 1, \quad x \ge 0.
\end{aligned}
\tag{8.33}
$$

Given positive numbers $q_0, q_1, \ldots, q_n$, and a real
number $\delta$, minimize $-\eta$ subject to

$$
\begin{aligned}
(\eta - 1)q_0 &\le \delta - yb, \\
(\eta - 1)q_j &\le yA^j - c_j, \quad 1 \le j \le n, \\
0 \le \eta &\le 1, \quad y \ge 0.
\end{aligned}
\tag{8.34}
$$

Bector and Chandra call this pair the modified fuzzy
pair of primal dual linear programming problems, and
they show that if $x, \lambda$ and $y, \eta$ are feasible solutions of
the corresponding problems, then

$$
(\lambda - 1) \sum_{i=1}^{m} y_i p_i + (\eta - 1) \sum_{j=1}^{n} q_j x_j
\le \sum_{i=1}^{m} y_i b_i - \sum_{j=1}^{n} c_j x_j,
$$

$$
(\lambda - 1)p_0 + (\eta - 1)q_0
\le \sum_{j=1}^{n} c_j x_j - \sum_{i=1}^{m} y_i b_i + (\delta - \gamma).
$$

It follows that, for the crisp scenario $\lambda = 1$ and $\eta = 1$,
we have

$$
\sum_{j=1}^{n} c_j x_j \le \sum_{i=1}^{m} y_i b_i \le \sum_{j=1}^{n} c_j x_j + (\delta - \gamma).
$$

Moreover, for $\gamma \le \delta$, feasible solutions $x, \lambda$ and $y, \eta$ are
optimal if

• $(\lambda - 1) \sum_{i=1}^{m} y_i p_i + (\eta - 1) \sum_{j=1}^{n} q_j x_j = \sum_{i=1}^{m} y_i b_i - \sum_{j=1}^{n} c_j x_j$,
• $(\lambda - 1)p_0 + (\eta - 1)q_0 = \sum_{j=1}^{n} c_j x_j - \sum_{i=1}^{m} y_i b_i + (\delta - \gamma)$.

Again we see that the dual problem is not stated
by using only the data available in the primal problem.
Indeed, if problem (8.33) is considered to be the primal
problem, then to state its dual problem one needs
additional information; namely, a number $\delta$ and numbers
$q_0, q_1, \ldots, q_n$; if problem (8.34) is considered to
be primal, then one needs a number $\gamma$ and numbers
$p_0, p_1, \ldots, p_m$.


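Dual Pairs of Verdegay
Verdegay's approach to duality in fuzzy linear problems
presented in [8.18] is based on two natural ideas:
(i) solutions to problems involving fuzziness should be
fuzzy; (ii) the dual problem to a problem with fuzziness
only in constraints should involve fuzziness only in the
objective.

The primal problem considered in [8.19] has the
form

$$
\begin{aligned}
&\text{maximize} && cx \\
&\text{subject to} && A_i x\ \tilde{\le}_i\ b_i, \quad i = 1, 2, \ldots, m, \\
& && x \ge 0,
\end{aligned}
\tag{8.35}
$$

where the valued relation $\tilde{\le}_i$ in the i-th constraint is the
same as in the previous section, that is,

$$
\mu_i(\alpha, \beta) =
\begin{cases}
1 & \text{if } \alpha \le \beta, \\
1 - \dfrac{\alpha - \beta}{p_i} & \text{if } \beta < \alpha \le \beta + p_i, \\
0 & \text{if } \beta + p_i < \alpha.
\end{cases}
$$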

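The fuzzy solution of problem (8.35) is given by the
fuzzy subset of $\mathbb{R}^n$ each of whose $\gamma$-cuts, $0 < \gamma \le 1$, is the
solution set of the problem

$$
\begin{aligned}
&\text{maximize} && cx \\
&\text{subject to} && \mu_i(A_i x, b_i) \ge \gamma, \quad i = 1, 2, \ldots, m, \\
& && x \ge 0.
\end{aligned}
\tag{8.36}
$$

Consequently, we obtain the following problem of
parametric linear programming.
For $0 < \gamma \le 1$,

$$
\begin{aligned}
&\text{maximize} && cx \\
&\text{subject to} && A_i x \le b_i + (1 - \gamma)p_i, \quad i = 1, 2, \ldots, m, \\
& && x \ge 0.
\end{aligned}
\tag{8.37}
$$

Consider now the ordinary dual problem to (8.37), that
is, for $0 < \gamma \le 1$,

$$
\begin{aligned}
&\text{minimize} && \sum_{i=1}^{m} u_i (b_i + (1 - \gamma)p_i) \\
&\text{subject to} && uA \ge c, \\
& && u \ge 0.
\end{aligned}
\tag{8.38}
$$

This suggests introducing variables $y_1, y_2, \ldots, y_m$ by

$$
y_i = b_i + \delta p_i, \quad i = 1, 2, \ldots, m,
$$

with $\delta = 1 - \gamma$, and considering the family of problems:
Given $0 \le \delta < 1$ and $u \ge 0$ with $uA \ge c$,

$$
\begin{aligned}
&\text{minimize} && \sum_{i=1}^{m} u_i y_i \\
&\text{subject to} && y_i \ge b_i + \delta p_i, \quad i = 1, 2, \ldots, m.
\end{aligned}
\tag{8.39}
$$

Consequently, in terms of the membership functions
$\mu_i$, we obtain the family of problems: Given $0 \le \delta < 1$
and $u \ge 0$ with $uA \ge c$,

$$
\begin{aligned}
&\text{minimize} && \sum_{i=1}^{m} u_i y_i \\
&\text{subject to} && \mu_i(y_i, b_i) \le 1 - \delta, \quad i = 1, 2, \ldots, m.
\end{aligned}
\tag{8.40}
$$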

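8.3.2 More General Approach
In this section, we return to the general canonical FLP
problem, that is, to the problem

$$
\begin{aligned}
&\text{maximize} && \tilde c_1 x_1 + \cdots + \tilde c_n x_n \\
&\text{subject to} && (\tilde a_{i1} x_1 + \cdots + \tilde a_{in} x_n)\ \tilde P\ \tilde b_i, \quad i \in M, \\
& && x_j \ge 0, \quad j \in N.
\end{aligned}
\tag{8.41}
$$

We will call it the primal FLP problem and denote it by
$\mathcal{P}$. The feasible region of $\mathcal{P}$ is introduced by (8.11) and
the meaning of $(\alpha, \beta)$-maximal solution is explained in
(8.14) and (8.22).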

To introduce the dual problem to problem $\mathcal{P}$, we
first define a suitable notion of duality for fuzzy relations
on $\mathcal{F}(X)$. Let $\Phi$ and $\Psi$ be mappings from $\mathcal{F}(X \times X)$
into $\mathcal{F}(\mathcal{F}(X) \times \mathcal{F}(X))$, and let $\Theta$ be a nonempty subset
of $\mathcal{F}(X \times X)$. We say that mapping $\Phi$ is dual to
mapping $\Psi$ on $\Theta$, if

$$
\Phi(c(P)) = c(\Psi(P)) \tag{8.42}
$$

for each $P \in \Theta$. Moreover, if P is in $\Theta$ and $\Phi$ is dual
to $\Psi$ on $\Theta$, then we say that the fuzzy relation $\Phi(P)$ on
$\mathcal{F}(X)$ is dual to fuzzy relation $\Psi(P)$.


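The dual FLP problem (denoted by $\mathcal{D}$) to problem
$\mathcal{P}$ is formulated as

$$
\begin{aligned}
&\text{minimize} && \tilde b_1 y_1 + \cdots + \tilde b_m y_m \\
&\text{subject to} && \tilde c_j\ \tilde Q\ (\tilde a_{1j} y_1 + \cdots + \tilde a_{mj} y_m), \quad j \in N, \\
& && y_i \ge 0, \quad i \in M.
\end{aligned}
\tag{8.43}
$$

Here, $\tilde P$ and $\tilde Q$ are dual fuzzy relations to each other,
particularly $\tilde P = \tilde{\le}^{\mathrm{Pos}}$, $\tilde Q = \tilde{\le}^{\mathrm{Nec}}$, or $\tilde P = \tilde{\le}^{\mathrm{Nec}}$,
$\tilde Q = \tilde{\le}^{\mathrm{Pos}}$. In problem $\mathcal{P}$, maximization is considered with
respect to fuzzy relation $\tilde P$; in problem $\mathcal{D}$, minimization
is considered with respect to fuzzy relation $\tilde Q$. The pair
of FLP problems $\mathcal{P}$ and $\mathcal{D}$, that is, (8.41) and (8.43), is
called the primal-dual pair of FLP problems. Now, we
introduce a concept of feasible region of problem $\mathcal{D}$,
which is a modification of the feasible region of primal
problem $\mathcal{P}$, see also [8.20].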

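Let $\mu_{\tilde a_{ij}}$ and $\mu_{\tilde c_j}$, $i \in M$, $j \in N$, be the membership
functions of fuzzy quantities $\tilde a_{ij}$ and $\tilde c_j$, respectively. Let
$\tilde P$ be a fuzzy extension of a binary relation P on $\mathbb{R}$.
A fuzzy set $\tilde Y$, whose membership function $\mu_{\tilde Y}$ is defined
for all $y \in \mathbb{R}^m$ by

$$
\mu_{\tilde Y}(y) =
\begin{cases}
\min\{\mu_{\tilde P}(\tilde c_1, \tilde a_{11} y_1 + \cdots + \tilde a_{m1} y_m), \ldots, \mu_{\tilde P}(\tilde c_n, \tilde a_{1n} y_1 + \cdots + \tilde a_{mn} y_m)\} & \text{if } y_i \ge 0 \text{ for all } i \in M, \\
0 & \text{otherwise,}
\end{cases}
\tag{8.44}
$$

is called a fuzzy set of feasible region or shortly feasible
region of dual FLP problem (8.43). Moreover, if $\beta \in (0, 1]$,
then the vectors belonging to $[\tilde Y]_\beta$ are called $\beta$-feasible
solutions of problem (8.43).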

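In a parallel way, we define an optimal solution
of the dual FLP problem $\mathcal{D}$.

Let $\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$, $i \in M$, $j \in N$, be fuzzy quantities
on $\mathbb{R}$. Let $\tilde Q$ be a fuzzy relation on $\mathcal{F}(\mathbb{R})$ that is a fuzzy
extension of the usual binary relation ≤ on $\mathbb{R}$, and let
$\alpha, \beta \in (0, 1]$. A $\beta$-feasible solution $y \in [\tilde Y]_\beta$ of (8.43) is
called an $(\alpha, \beta)$-minimal solution of (8.43) if there is
no $y' \in [\tilde Y]_\beta$, $y' \ne y$, such that

$$
\tilde b_1 y'_1 + \tilde b_2 y'_2 + \cdots + \tilde b_m y'_m\ \ \tilde Q^*_\alpha\ \ \tilde b_1 y_1 + \tilde b_2 y_2 + \cdots + \tilde b_m y_m, \tag{8.45}
$$

where $\tilde Q^*_\alpha$ is the strict $\alpha$-relation on $\mathbb{R}$ associated to $\tilde Q$.

Let P be the usual binary relation ≤ on $\mathbb{R}$. Now,
we shall investigate FLP problems (8.41) and (8.43)
with pairs of dual fuzzy relations in the constraints, particularly
$\tilde P = \tilde{\le}^{\mathrm{Pos}}$, $\tilde Q = \tilde{\le}^{\mathrm{Nec}}$, or $\tilde P = \tilde{\le}^{\mathrm{Nec}}$, $\tilde Q = \tilde{\le}^{\mathrm{Pos}}$.
The values of objective functions $\tilde z$ and $\tilde w$ are maximized
and minimized with respect to fuzzy relations $\tilde P$ and $\tilde Q$,
respectively.

The feasible region of the primal FLP problem $\mathcal{P}$
is denoted by $\tilde X$, the feasible region of the dual FLP
problem $\mathcal{D}$ by $\tilde Y$. Clearly, $\tilde X$ is a fuzzy subset of $\mathbb{R}^n$, and $\tilde Y$
is a fuzzy subset of $\mathbb{R}^m$.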




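Note that in the crisp case, that is, when the parameters
$\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$ are crisp fuzzy quantities, then
by (8.15) and (8.16), the relations $\tilde{\le}^{\mathrm{Pos}}$ and $\tilde{\le}^{\mathrm{Nec}}$ coincide
with ≤. Hence, $\mathcal{P}$ and $\mathcal{D}$ form a primal-dual pair
of linear programming problems in the classical sense.
The following proposition is a useful modification of
Proposition 8.4 and gives a sufficient condition for $y^*$
to be an $(\alpha, \beta)$-minimal solution of the FLP problem
(8.43).


Proposition 8.5

Let $\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$ be fuzzy quantities for all $i \in M$ and
$j \in N$, $\alpha, \beta \in (0, 1)$. Let $\tilde Y$ be a feasible region of the
FLP problem (8.43) with $\tilde P = \tilde{\le}^{\mathrm{Pos}}$. Let $b_i$ be such that
$\tilde b^L_i(\alpha) \le b_i \le \tilde b^R_i(\alpha)$ for all $i \in M$. If $y^* = (y^*_1, \ldots, y^*_m)$
is an optimal solution of the LP problem

$$
\begin{aligned}
&\text{minimize} && \sum_{i \in M} b_i y_i \\
&\text{subject to} && \sum_{i \in M} \tilde a^R_{ij}(\beta)\, y_i \ge \tilde c^L_j(\beta), \quad j \in N, \\
& && y_i \ge 0, \quad i \in M,
\end{aligned}
\tag{8.46}
$$

then $y^*$ is an $(\alpha, \beta)$-minimal solution of the FLP problem
(8.43).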


Dual Pairs of Ramík

When presenting duality theorems obtained by Ramík
in [8.21] (see also [8.7]), we always present two versions:
i) for fuzzy relation $\tilde{\le}^{\mathrm{Pos}}$ and ii) for fuzzy relation
$\tilde{\le}^{\mathrm{Nec}}$. In order to prove duality results we assume that
the level of satisfaction $\alpha$ of the objective function is
equal to the level of satisfaction $\beta$ of the constraints.
Otherwise, the duality theorems in our formulation do
not hold. The proofs of the following theorems can be
found in [8.7].


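Theorem 8.1 First Weak Duality Theorem

Let $\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$ be fuzzy quantities, $i \in M$ and $j \in N$,
$\alpha \in (0, 1)$.

i) Let $\tilde X$ be a feasible region of the FLP problem (8.41)
with $\tilde P = \tilde{\le}^{\mathrm{Pos}}$, and $\tilde Y$ be a feasible region of
the FLP problem (8.43) with $\tilde Q = \tilde{\le}^{\mathrm{Nec}}$. If a vector
$x = (x_1, \ldots, x_n) \ge 0$ belongs to $[\tilde X]_\alpha$ and
$y = (y_1, \ldots, y_m) \ge 0$ belongs to $[\tilde Y]_{1-\alpha}$, then

$$
\sum_{j \in N} \tilde c^R_j(\alpha)\, x_j \le \sum_{i \in M} \tilde b^R_i(\alpha)\, y_i. \tag{8.47}
$$

ii) Let $\tilde X$ be a feasible region of the FLP problem
(8.41) with $\tilde P = \tilde{\le}^{\mathrm{Nec}}$, and $\tilde Y$ be a feasible region of
the FLP problem (8.43) with $\tilde Q = \tilde{\le}^{\mathrm{Pos}}$. If a vector
$x = (x_1, \ldots, x_n) \ge 0$ belongs to $[\tilde X]_{1-\alpha}$ and
$y = (y_1, \ldots, y_m) \ge 0$ belongs to $[\tilde Y]_\alpha$, then

$$
\sum_{j \in N} \tilde c^L_j(\alpha)\, x_j \le \sum_{i \in M} \tilde b^L_i(\alpha)\, y_i. \tag{8.48}
$$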

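Theorem 8.2 Second Weak Duality Theorem
Let $\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$ be fuzzy quantities for all $i \in M$ and
$j \in N$, $\alpha \in (0, 1)$.

i) Let $\tilde X$ be a feasible region of the FLP problem
(8.41) with $\tilde P = \tilde{\le}^{\mathrm{Pos}}$, and $\tilde Y$ be a feasible region of
the FLP problem (8.43) with $\tilde Q = \tilde{\le}^{\mathrm{Nec}}$. If for
some $x = (x_1, \ldots, x_n) \ge 0$ belonging to $[\tilde X]_\alpha$ and
$y = (y_1, \ldots, y_m) \ge 0$ belonging to $[\tilde Y]_{1-\alpha}$ it holds

$$
\sum_{j \in N} \tilde c^R_j(\alpha)\, x_j = \sum_{i \in M} \tilde b^R_i(\alpha)\, y_i, \tag{8.49}
$$

then x is an $(\alpha, \alpha)$-maximal solution of the FLP
problem $\mathcal{P}$, (8.41), and y is a $(1 - \alpha, 1 - \alpha)$-minimal
solution of the FLP problem $\mathcal{D}$, (8.43).

ii) Let $\tilde X$ be a feasible region of the FLP problem (8.41)
with $\tilde P = \tilde{\le}^{\mathrm{Nec}}$, and $\tilde Y$ be a feasible region of the FLP
problem (8.43) with $\tilde Q = \tilde{\le}^{\mathrm{Pos}}$.
If for some $x = (x_1, \ldots, x_n) \ge 0$ belonging to
$[\tilde X]_{1-\alpha}$ and $y = (y_1, \ldots, y_m) \ge 0$ belonging to $[\tilde Y]_\alpha$
it holds

$$
\sum_{j \in N} \tilde c^L_j(\alpha)\, x_j = \sum_{i \in M} \tilde b^L_i(\alpha)\, y_i, \tag{8.50}
$$

then x is a $(1 - \alpha, 1 - \alpha)$-maximal solution of the
FLP problem $\mathcal{P}$, (8.41), and y is an $(\alpha, \alpha)$-minimal
solution of the FLP problem $\mathcal{D}$, (8.43).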


Remark 8.3 
In the crisp case, Theorems 8.1 and 8.2 are the standard 
linear programming weak duality theorems. 


Remark 8.4

Let $\alpha \ge 0.5$. Then $[\tilde X]_\alpha \subset [\tilde X]_{1-\alpha}$, $[\tilde Y]_\alpha \subset [\tilde Y]_{1-\alpha}$,
hence in the first weak duality theorem we can change
the assumptions as follows: $x \in [\tilde X]_\alpha$ and $y \in [\tilde Y]_\alpha$.
However, the statements of the theorem remain unchanged.
The same holds for the second weak duality
theorem.


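Finally, let us direct our attention to strong duality.
Motivated by the pair of Propositions 8.4 and 8.5
and by Theorem 8.2, we consider a pair of dual LP problems
corresponding to FLP problems (8.41) and (8.43) with
fuzzy relations $\tilde P = \tilde{\le}^{\mathrm{Pos}}$, $\tilde Q = \tilde{\le}^{\mathrm{Nec}}$, and $\alpha = \beta$:

$$
\text{(P1)} \qquad
\begin{aligned}
&\text{maximize} && \sum_{j \in N} \tilde c^R_j(\alpha)\, x_j \\
&\text{subject to} && \sum_{j \in N} \tilde a^L_{ij}(\alpha)\, x_j \le \tilde b^R_i(\alpha), \quad i \in M, \\
& && x_j \ge 0, \quad j \in N,
\end{aligned}
\tag{8.51}
$$

$$
\text{(D1)} \qquad
\begin{aligned}
&\text{minimize} && \sum_{i \in M} \tilde b^R_i(\alpha)\, y_i \\
&\text{subject to} && \sum_{i \in M} \tilde a^L_{ij}(\alpha)\, y_i \ge \tilde c^R_j(\alpha), \quad j \in N, \\
& && y_i \ge 0, \quad i \in M.
\end{aligned}
\tag{8.52}
$$

Moreover, we consider a pair of dual LP problems with
fuzzy relations $\tilde P = \tilde{\le}^{\mathrm{Nec}}$, $\tilde Q = \tilde{\le}^{\mathrm{Pos}}$:

$$
\text{(P2)} \qquad
\begin{aligned}
&\text{maximize} && \sum_{j \in N} \tilde c^L_j(\alpha)\, x_j \\
&\text{subject to} && \sum_{j \in N} \tilde a^R_{ij}(\alpha)\, x_j \le \tilde b^L_i(\alpha), \quad i \in M, \\
& && x_j \ge 0, \quad j \in N,
\end{aligned}
\tag{8.53}
$$

$$
\text{(D2)} \qquad
\begin{aligned}
&\text{minimize} && \sum_{i \in M} \tilde b^L_i(\alpha)\, y_i \\
&\text{subject to} && \sum_{i \in M} \tilde a^R_{ij}(\alpha)\, y_i \ge \tilde c^L_j(\alpha), \quad j \in N, \\
& && y_i \ge 0, \quad i \in M.
\end{aligned}
\tag{8.54}
$$

Notice that (P1) and (D1) are classical dual linear
programming problems and the same holds for (P2)
and (D2).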
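
Since (P1) and (D1) are classical dual LPs, their optimal values coincide whenever both are feasible. The sketch below builds both problems from trapezoidal data at $\alpha = 1/2$ and solves them exactly. Everything in it is an assumption made for illustration: the data are hypothetical, the fuzzy quantities are trapezoids (a, b, c, d) with linear sides, and `solve_max_2d` is a toy vertex-enumeration routine for two variables, not a general LP solver.

```python
from fractions import Fraction as F
from itertools import combinations


def left(q, alpha):
    """q^L(alpha) for a trapezoid q = (a, b, c, d)."""
    return q[0] + alpha * (q[1] - q[0])


def right(q, alpha):
    """q^R(alpha) for a trapezoid q = (a, b, c, d)."""
    return q[3] - alpha * (q[3] - q[2])


def solve_max_2d(c, rows):
    """Maximize c.x over {x : a.x <= r for (a, r) in rows} by testing pairwise
    intersections of constraint lines (two variables only; assumes the optimum
    is finite and attained at a vertex)."""
    best = None
    for (a, r), (b, s) in combinations(rows, 2):
        det = a[0] * b[1] - a[1] * b[0]
        if det == 0:
            continue  # parallel lines, no vertex
        x = ((r * b[1] - a[1] * s) / det, (a[0] * s - r * b[0]) / det)
        if all(p[0] * x[0] + p[1] * x[1] <= q for p, q in rows):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value > best[0]:
                best = (value, x)
    return best


alpha = F(1, 2)
c_t = [(1, 2, 2, 3), (2, 3, 3, 4)]
a_t = [[(0, 1, 1, 2), (1, 2, 2, 3)], [(2, 3, 3, 4), (0, 1, 1, 2)]]
b_t = [(5, 6, 6, 7), (9, 10, 10, 11)]

cR = [right(q, alpha) for q in c_t]
aL = [[left(q, alpha) for q in row] for row in a_t]
bR = [right(q, alpha) for q in b_t]
nonneg = [((F(-1), F(0)), F(0)), ((F(0), F(-1)), F(0))]  # -x_1 <= 0, -x_2 <= 0

# (P1): maximize cR.x subject to aL x <= bR, x >= 0
p_val, x_star = solve_max_2d(
    cR, [((aL[i][0], aL[i][1]), bR[i]) for i in range(2)] + nonneg)

# (D1): minimize bR.y subject to aL^T y >= cR, y >= 0, rewritten as a maximization
neg_val, y_star = solve_max_2d(
    [-bR[0], -bR[1]],
    [((-aL[0][j], -aL[1][j]), -cR[j]) for j in range(2)] + nonneg)
d_val = -neg_val

print(p_val, d_val)  # 279/14 279/14
```

Exact rational arithmetic is used so that the two optimal values can be compared with `==` rather than with a floating-point tolerance.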




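Theorem 8.3 Strong Duality Theorem
Let $\tilde c_j$, $\tilde a_{ij}$, and $\tilde b_i$ be fuzzy quantities for all $i \in M$ and
$j \in N$.

i) Let $\tilde X$ be a feasible region of the FLP problem
(8.41) with $\tilde P = \tilde{\le}^{\mathrm{Pos}}$, and $\tilde Y$ be a feasible region of the
FLP problem (8.43) with $\tilde Q = \tilde{\le}^{\mathrm{Nec}}$. If for some
$\alpha \in (0, 1)$, $[\tilde X]_\alpha$ and $[\tilde Y]_{1-\alpha}$ are nonempty, then
there exist $x^*$, an $(\alpha, \alpha)$-maximal solution of the
FLP problem $\mathcal{P}$, and $y^*$, a $(1 - \alpha, 1 - \alpha)$-minimal
solution of the FLP problem $\mathcal{D}$,
such that

$$
\sum_{j \in N} \tilde c^R_j(\alpha)\, x^*_j = \sum_{i \in M} \tilde b^R_i(\alpha)\, y^*_i. \tag{8.55}
$$

ii) Let $\tilde X$ be a feasible region of the FLP problem
(8.41) with $\tilde P = \tilde{\le}^{\mathrm{Nec}}$, and $\tilde Y$ be a feasible region of the
FLP problem (8.43) with $\tilde Q = \tilde{\le}^{\mathrm{Pos}}$. If for some
$\alpha \in (0, 1)$, $[\tilde X]_{1-\alpha}$ and $[\tilde Y]_\alpha$ are nonempty, then there exist
$x^*$, a $(1 - \alpha, 1 - \alpha)$-maximal solution of the
FLP problem $\mathcal{P}$, and $y^*$, an $(\alpha, \alpha)$-minimal solution
of the FLP problem $\mathcal{D}$, such that

$$
\sum_{j \in N} \tilde c^L_j(\alpha)\, x^*_j = \sum_{i \in M} \tilde b^L_i(\alpha)\, y^*_i. \tag{8.56}
$$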

Remark 8.5

In the crisp case, Theorem 8.3 is the standard linear programming
(strong) duality theorem.


Remark 8.6

Let $\alpha \ge 0.5$. Then $[\tilde X]_\alpha \subset [\tilde X]_{1-\alpha}$, $[\tilde Y]_\alpha \subset [\tilde Y]_{1-\alpha}$,
hence in the strong duality theorem, we can assume
$x \in [\tilde X]_\alpha$ and $y \in [\tilde Y]_\alpha$. Evidently, the statement of the
theorem remains unchanged.


Remark 8.7

Theorem 8.3 provides only the existence of the $(\alpha, \alpha)$-maximal
solution (or $(1 - \alpha, 1 - \alpha)$-maximal solution)
of the FLP problem $\mathcal{P}$, and of the $(1 - \alpha, 1 - \alpha)$-minimal
solution (or $(\alpha, \alpha)$-minimal solution) of the FLP problem $\mathcal{D}$
such that (8.55) or (8.56) holds. However, the proof of
the theorem also gives a method for finding the solutions
by solving linear programming problems (P1)
and (D1).




8.4 Conclusion 


The leading idea of this chapter is based on the fact 
that, in many cases, the solutions of decision prob- 
lems obtained through linear programming are numer- 
ically tractable approximations of the original nonlin- 
ear problems. Because of the practical relevance of 
linear programming and taking into account a vast 
literature on this subject, we extended linear pro- 
gramming theory to problems involving fuzzy data. 
To obtain a meaningful extension of linear program- 
ming to problems involving fuzzy data, we specified 
a suitable class of permitted fuzzy values called fuzzy 
quantities or fuzzy numbers, introduced fundamental 
arithmetic operations with such fuzzy numbers, defined
inequalities between fuzzy numbers, and clarified
the meaning of feasibility and optimality concepts.
On the other hand, we did not deal with linear programming
problems involving fuzzy variables. In the literature,
the extension has been done in many different ways;
here we focused on those variants of the theory of
fuzzy linear programming whose results resemble,
to various degrees, some of the useful results established
in conventional linear programming. The
final and main section of this work has been devoted
to duality theory for fuzzy linear programming problems.
We recalled some of the early approaches that
are based on the ideas of Bellman and Zadeh, and
Zimmermann, and then we presented our own recent
results.


References


8.1 R.E. Bellman, L.A. Zadeh: Decision making in a fuzzy environment, Manag. Sci. 17, B141–B164 (1970)

8.2 H.-J. Zimmermann: Fuzzy programming and linear programming with several objective functions, Fuzzy Sets Syst. 1, 45–55 (1978)

8.3 D. Dubois: The role of fuzzy sets in decision sciences: Old techniques and new directions, Fuzzy Sets Syst. 184, 3–28 (2011)

8.4 C. Stanciulescu, P. Fortemps, M. Installé, V. Wertz: Multiobjective fuzzy linear programming problems with fuzzy decision variables, Eur. J. Oper. Res. 149, 654–675 (2003)

8.5 D. Dubois, H. Prade: Ranking fuzzy numbers in the setting of possibility theory, Inf. Sci. 30, 183–224 (1983)

8.6 J. Ramík: Duality in fuzzy linear programming: Some new concepts and results, Fuzzy Optim. Decis. Mak. 4, 25–39 (2005)

8.7 J. Ramík: Duality in fuzzy linear programming with possibility and necessity relations, Fuzzy Sets Syst. 157, 1283–1302 (2006)

8.8 A.L. Soyster: A duality theory for convex programming with set-inclusive constraints, Oper. Res. 22, 892–898 (1974)

8.9 D.J. Thuente: Duality theory for generalized linear programs with computational methods, Oper. Res. 28, 1005–1011 (1980)

8.10 H.W. Kuhn: Nonlinear programming - A historical view, SIAM-AMS 9, 1–26 (1976)

8.11 J. Ramík, M. Vlach: Generalized Concavity in Optimization and Decision Making (Kluwer, Dordrecht 2001)

8.12 D.A. Ralescu: A generalization of the representation theorem, Fuzzy Sets Syst. 51, 309–311 (1992)

8.13 J. Ramík, M. Vlach: A non-controversial definition of fuzzy sets, Lect. Notes Comput. Sci. 3135, 201–207 (2004)

8.14 W. Rödder, H.-J. Zimmermann: Duality in Fuzzy Linear Programming, Extremal Methods and System Analysis (Springer, New York 1980) pp. 415–429

8.15 H. Rommelfanger, R. Slowinski: Fuzzy linear programming with single or multiple objective functions, Handb. Fuzzy Sets Ser. 1, 179–213 (1998)

8.16 H.-C. Wu: Duality theory in fuzzy linear programming problems with fuzzy coefficients, Fuzzy Optim. Decis. Mak. 2, 61–73 (2003)

8.17 C.R. Bector, C. Chandra: On duality in linear programming under fuzzy environment, Fuzzy Sets Syst. 125, 317–325 (2002)

8.18 J.L. Verdegay: A dual approach to solve the fuzzy linear programming problem, Fuzzy Sets Syst. 14, 131–141 (1984)

8.19 M. Inuiguchi, H. Ichihashi, Y. Kume: Some properties of extended fuzzy preference relations using modalities, Inf. Sci. 61, 187–209 (1992)

8.20 M. Inuiguchi, J. Ramík, T. Tanino, M. Vlach: Satisficing solutions and duality in interval and fuzzy linear programming, Fuzzy Sets Syst. 135, 151–177 (2003)

8.21 H. Hamacher, H. Lieberling, H.-J. Zimmermann: Sensitivity analysis in fuzzy linear programming, Fuzzy Sets Syst. 1, 269–281 (1978)




9. Basic Solutions of Fuzzy Coalitional Games 


Tomas Kroupa, Milan Vlach 


This chapter is concerned with basic concepts of
solution for coalitional games with fuzzy coalitions
in the case of finitely many players and transferable
utility. The focus is on those solutions which
occupy the main part of cooperative game theory
(the core and the Shapley value). A detailed
discussion, or even a comprehensive overview, of
current trends in fuzzy games is beyond the reach
of this chapter. Nevertheless, we mention current
developments and briefly discuss other solution
concepts.


The theory of cooperative games builds and analyses 
mathematical models of situations in which players can 
form coalitions and make binding agreements on how 
to share results achieved by these coalitions. One of 
the basic models of cooperative games is a cooperative 
game in coalitional form (briefly a coalitional game or 
a game). Following Osborne and Rubinstein [9.1] we 
assume that the data specifying a coalitional game are 
composed of: 


• A nonempty set $\Omega$ (the set of players) and
a nonempty set $X$ (the set of consequences),

• A mapping $V$ that assigns to every subset $S$ of $\Omega$
a subset $V(S)$ of $X$, and

• A family $\{\succsim_i\}_{i \in \Omega}$ of binary relations on $X$ (players’
preference relations).


The set $\Omega$ of all players is usually referred to as the
grand coalition, subsets of $\Omega$ are called coalitions, and
the mapping V is called the characteristic function (or 
coalition function) of the game. 

This definition provides a rather general frame- 
work for analyzing many classes of coalitional games. 
The games of this type are usually called coalitional 
games without side payments or without transferable 
payoff (or utility). Obviously, for many purposes, this 
framework is too general because it neither speci- 
fies some useful structure of the set of consequences 
nor properties of preference relations. At the same
time, this framework is also too restrictive because of
requiring that the domain of the characteristic func- 
tion must be the system of all subsets of the player 
set. 

In this chapter, we are mainly concerned with coali- 
tional games in which the number of players is finite. 
The number of players will be denoted by n and, 
without loss of generality, the players will be named 
by integers $1, 2, \dots, n$. In other words, we set $\Omega = N$,
where $N = \{1, 2, \dots, n\}$. Moreover, we assume that
the sets $V(S)$ of consequences are subsets of the $n$-dimensional
real linear space $\mathbb{R}^n$, and that each player $i$
prefers $(x_1, \dots, x_n)$ to $(y_1, \dots, y_n)$ if and only if $x_i \ge y_i$.
Furthermore, we significantly restrict the generality by 
considering only the so-called coalitional games with 
transferable payoff or utility. This class of games is 
a subclass of games without transferable utility that is 
characterized by the property: for each coalition S, there 
exists a real number v(S) such that 


$$V(S) = \Big\{ x \in \mathbb{R}^n : \sum_{i \in S} x_i \le v(S) \text{ and } x_j = 0 \text{ if } j \notin S \Big\}.$$


Evidently, each such game can be identified with the 
corresponding real-valued function v defined on the sys- 
tem of all subsets of N. 

In coalitional games, whether with transferable or
nontransferable utility, each player has only two alternatives
of participation in a nonempty coalition: full
participation or no participation. This assumption is too
restrictive in many situations, and there has been a need
for models that give players the possibility of participation
at some or all intermediate levels between these
two extreme involvements.

The first mathematical models in the form of coali- 
tional games in which the players are permitted to 
participate in a coalition not only fully or not at all 
but also partially were proposed by Butnariu [9.2] 
and Aubin [9.3]. Aubin notices that the idea of par- 
tial participation in a coalition was used already in the 
Shapley—Shubik paper on market games [9.4]. In these 
models, the subsets of N no longer represent every pos- 
sible coalition. Instead, a notion of a coalition has to be 
introduced that makes it possible to represent the partial 
membership degrees. 

It has become customary to assume that a membership
degree of player $i \in N$ is determined by a number
$a_i$ in the unit interval $I = [0, 1]$, and to call the resulting
vector $a = (a_1, \dots, a_n) \in I^n$ a fuzzy coalition. The
$n$-dimensional cube $I^n$ is thus identified with the set
of all fuzzy coalitions. Every subset $S$ of $N$, that is,
every coalition $S$, can be viewed as an $n$-vector from
$\{0, 1\}^n$ whose $i$th component is 1 when $i \in S$ and 0
when $i \notin S$. These special fuzzy coalitions are often
called crisp coalitions. Hence, we may think of the set
of all fuzzy coalitions $I^n$ as the convex closure of the set
$\{0, 1\}^n$ of all crisp coalitions. This leads to the notion of
an $n$-player coalitional game with fuzzy coalitions and
transferable utility (briefly a fuzzy game) as a bounded
function $v\colon I^n \to \mathbb{R}$ satisfying $v(0) = 0$.

It turns out that most classes of coalitional games 
with transferable utility and most solution concepts 
have natural counterparts in the theory of fuzzy games 
with transferable utility. Therefore, in what follows, we 
start with the classical case (Sect. 9.1) and then deal 
with the fuzzy case (Sect. 9.2). Taking into account 
that, in comparison with the classical case, the theory 
of fuzzy games is relatively less developed, we focus 
attention on two well-established solution concepts of 
fuzzy games: the core and the Shapley value. 


9.1 Coalitional Games with Transferable Utility 


We know from the beginning of this chapter that from 
the mathematical point of view, every n-player coali- 
tional game with transferable utility can be identified 
with a real-valued function v defined on the system 
of all subsets of the set $N = \{1, 2, \dots, n\}$. For convenience,
we always assume that $v(\emptyset) = 0$.

It is customary to interpret the value $v(S)$ of the
characteristic function $v$ at coalition $S$ as the worth of
coalition $S$ or the total payoff that coalition $S$ will be
able to distribute among its members, provided exactly
the coalition $S$ forms. However, equally well, the number
$v(S)$ may represent the total cost of reaching some
common goal of coalition $S$ that must be shared by the
members of $S$, or some other quantity, depending on the
application field. In conformity with the players’ preferences
stated previously, we usually assume that $v(S)$
represents the total payoff that $S$ can distribute among
its members.

Since the preferences are fixed, we denote the game
given through $N$ and $v$ by $(N, v)$, or simply $v$, and
the collection of all games with fixed $N$ by $G_N$. The
sum $v + w$ of games from $G_N$ defined by $(v + w)(S) =
v(S) + w(S)$ for each coalition $S$ is again a game from
$G_N$. Moreover, if multiplication of $v \in G_N$ by a real
number $\alpha$ is defined by $(\alpha v)(S) = \alpha v(S)$ for each
coalition $S$, then $\alpha v$ also belongs to $G_N$. An important and
well-known fact is that $G_N$ endowed with these two
algebraic operations is a real linear space.


Example 9.1 Simple games 

If the range of a game $v$ is the two-element set $\{0, 1\}$
only, then the game can be viewed as a model of a voting
system where each coalition $A \subseteq N$ is either winning
($v(A) = 1$) or losing ($v(A) = 0$). Then it is natural to
assume that the game also satisfies monotonicity; that
is, if coalition $A$ is winning and $B$ is a coalition with
$A \subseteq B$, then $B$ is also winning. It is also natural to consider
only games with at least one winning coalition.
Thus, we define a simple game [9.5, Section 2.2.3] to
be a $\{0, 1\}$-valued coalitional game $v$ such that the grand
coalition is winning and $v(A) \le v(B)$ whenever $A \subseteq B$,
for each $A, B \subseteq N$.


We say that a game $v$ is superadditive if $v(A \cup B) \ge
v(A) + v(B)$ for every pair of disjoint coalitions $A, B \subseteq
N$. Consequently, in a superadditive game, it may be
advantageous for members of disjoint coalitions $A$ and $B$
to form coalition $A \cup B$ because every pair of disjoint
coalitions can obtain jointly at least as much as they
could have obtained separately. It is therefore
advantageous to form the largest possible coalition, that
is, the grand coalition.

A strengthening of the property of superadditivity
is the assumption of nondecreasing marginal contribution
of a player to each coalition with respect to
coalition inclusion: a game $v$ is said to be convex
whenever

$$v(A \cup \{i\}) - v(A) \le v(B \cup \{i\}) - v(B)$$

for each $i \in N$ and every $A \subseteq B \subseteq N \setminus \{i\}$. It can be
directly checked that convexity of $v$ is equivalent to

$$v(A \cup B) + v(A \cap B) \ge v(A) + v(B)$$

for every $A, B \subseteq N$.
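For small player sets, superadditivity and convexity can be verified by brute force. The following is a minimal Python sketch, assuming a game is represented as a function on frozensets of players; the helper names (`coalitions`, `is_superadditive`, `is_convex`) and the two three-player games are illustrative assumptions.

```python
from itertools import combinations

def coalitions(players):
    """Yield every subset of `players` as a frozenset (the empty set included)."""
    players = list(players)
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)

def is_superadditive(v, players):
    """Check v(A | B) >= v(A) + v(B) for every pair of disjoint coalitions."""
    subs = list(coalitions(players))
    return all(v(a | b) >= v(a) + v(b)
               for a in subs for b in subs if not (a & b))

def is_convex(v, players):
    """Check v(A | B) + v(A & B) >= v(A) + v(B) for all coalitions A, B."""
    subs = list(coalitions(players))
    return all(v(a | b) + v(a & b) >= v(a) + v(b)
               for a in subs for b in subs)

# Hypothetical three-player games used only for illustration.
players = [1, 2, 3]
square = lambda s: len(s) ** 2                 # marginal contributions grow with size
majority = lambda s: 1 if len(s) >= 2 else 0   # simple majority voting game
```

In this sketch the game `square` passes both checks, whereas the majority game is superadditive but not convex (take $A = \{1, 2\}$ and $B = \{2, 3\}$). The enumeration is exponential in the number of players, so it is only meant for toy examples.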


Example 9.2 
Let $B$ be a nonempty coalition in a simple game $(N, v)$.
Then the game $v_B$ given by

$$v_B(A) = \begin{cases} 1, & A \supseteq B, \\ 0, & \text{otherwise}, \end{cases} \qquad A \subseteq N,$$

is a convex simple game.


Example 9.3 Bankruptcy game [9.6] 

Let $e > 0$ be the total value of assets held in
a bankruptcy estate of a debtor and let $N$ be the set of all
creditors. Furthermore, let $d_i > 0$ be the debt to creditor
$i \in N$. Assume that $e \le \sum_{i \in N} d_i$. The bankruptcy game
is then the game such that, for every $A \subseteq N$,

$$v(A) = \max\Big( 0,\ e - \sum_{i \in N \setminus A} d_i \Big).$$

It can be shown that the bankruptcy game is convex.
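The bankruptcy game is easy to tabulate for a concrete estate. The sketch below is a minimal Python illustration; the function name `bankruptcy_game`, the estate of 200, and the debts 100, 200, 300 are assumptions chosen for the example. It also confirms convexity by brute force for these particular data.

```python
from itertools import combinations

def bankruptcy_game(estate, debts):
    """Return v with v(A) = max(0, estate - total debt owed to creditors outside A)."""
    def v(coalition):
        outside = sum(d for i, d in debts.items() if i not in coalition)
        return max(0, estate - outside)
    return v

def all_coalitions(players):
    """Every subset of the player set, as frozensets."""
    players = list(players)
    return [frozenset(c) for r in range(len(players) + 1)
            for c in combinations(players, r)]

# Hypothetical data: three creditors and an estate smaller than the total debt.
debts = {1: 100, 2: 200, 3: 300}
estate = 200
v = bankruptcy_game(estate, debts)

subs = all_coalitions(debts)
# Brute-force test of v(A | B) + v(A & B) >= v(A) + v(B).
convex = all(v(a | b) + v(a & b) >= v(a) + v(b) for a in subs for b in subs)
```

Here a coalition is worth only what remains after all outside creditors have been paid in full, so for these data only the coalitions $\{2, 3\}$ and $N$ have positive worth.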


There is a variety of solution concepts for coali- 
tional games with n players. Some, like the core, stable 
set or bargaining set, may consist of sets of real n- 
vectors, while others offer as a solution of a game 
a single real n-vector. 


9.1.1 The Core 


Let $v$ be an $n$-player coalitional game with transferable
utility. The core of $v$ is the set of all efficient payoff
vectors $x \in \mathbb{R}^n$ upon which no coalition can improve,
that is,

$$C(v) = \Big\{ x \in \mathbb{R}^n \;\Big|\; \sum_{i \in N} x_i = v(N) \text{ and } \sum_{i \in A} x_i \ge v(A) \text{ for each } A \subseteq N \Big\}. \tag{9.1}$$


The Bondareva—Shapley theorem [9.5, Theorem 3.1.4] 
gives a necessary and sufficient condition for the core 
nonemptiness in terms of the so-called balanced sys- 
tems. It is easy to see that the core of every game is 
a (possibly empty) convex polytope. Moreover, the core 
of a convex game is always nonempty and its vertices 
can be explicitly characterized [9.7]. 
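Membership in the core (9.1) amounts to one equality and finitely many inequalities, which can be tested directly. The following Python sketch assumes a payoff vector stored as a dictionary from players to payoffs; the name `in_core`, the tolerance `tol`, and the two example games are illustrative assumptions.

```python
from itertools import combinations

def in_core(x, v, players, tol=1e-9):
    """Test the two conditions of (9.1) for the payoff vector x."""
    players = list(players)
    # Efficiency: the payoffs add up to the worth of the grand coalition.
    if abs(sum(x[i] for i in players) - v(frozenset(players))) > tol:
        return False
    # Coalitional rationality for every proper nonempty coalition.
    for r in range(1, len(players)):
        for combo in combinations(players, r):
            if sum(x[i] for i in combo) < v(frozenset(combo)) - tol:
                return False
    return True

# Hypothetical three-player games used only for illustration.
players = [1, 2, 3]
square = lambda s: len(s) ** 2                 # a convex game
majority = lambda s: 1 if len(s) >= 2 else 0   # a game with empty core

equal_split = {1: 3.0, 2: 3.0, 3: 3.0}
dictator = {1: 9.0, 2: 0.0, 3: 0.0}
thirds = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
```

In the convex game `square` the equal split lies in the core, while giving everything to player 1 violates the constraints of the singletons $\{2\}$ and $\{3\}$. In the majority game even the symmetric vector fails, since every pair receives $2/3 < 1$.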


9.1.2 The Shapley Value 


Let $f = (f_1, f_2, \dots, f_n)$ be a mapping that assigns to every
game $v$ from some collection of games from $G_N$
a real $n$-vector $f(v) = (f_1(v), f_2(v), \dots, f_n(v))$. Following
the basic interpretation of values of a characteristic
function as the total payoff, we can interpret the values
of components of such a function as payoffs to individual
players in game $v$.

Let $A$ be a nonempty collection of games from $G_N$.
A solution function on $A$ is a mapping $f$ from $A$ into
the $n$-dimensional real linear space $\mathbb{R}^n$. If the domain
$A$ of $f$ is not explicitly specified, then it is assumed to
be $G_N$. The collection of such mappings is too broad to
contain only the mappings that lead to sensible solution 
concepts. Hence, to obtain reasonable solution concepts 
we have to require that the solution functions have some 
reasonable properties. One of the natural properties in 
many contexts is the following property of efficiency. 


Property 9.1 Efficiency 

A solution function $f$ on a subset $A$ of $G_N$ is efficient on
$A$ if $f_1(v) + f_2(v) + \dots + f_n(v) = v(N)$ for every game $v$
from $A$.


This property can be interpreted as a combination
of the requirements of feasibility, defined by
$f_1(v) + f_2(v) + \dots + f_n(v) \le v(N)$, and collective rationality,
defined by $f_1(v) + f_2(v) + \dots + f_n(v) \ge v(N)$.

In addition to satisfying the efficiency condition, 
solution functions are required to satisfy a number of 
other desirable properties. To introduce some of them, 
we need further definitions. 

Player $i$ from $N$ is a null player in game $v$ if
$v(S \cup \{i\}) = v(S)$ for every coalition $S$ that does not contain
player $i$; that is, participation of a null player in
a coalition does not contribute anything to the coalition
in question.

Player $i$ from $N$ is a dummy player in game $v$ if $v(S \cup
\{i\}) = v(S) + v(\{i\})$ for every coalition $S$ that does not
contain player $i$; that is, a dummy player contributes to
every coalition the same amount, his or her value of the
characteristic function.

Players $i$ and $j$ from $N$ are interchangeable in
game $v$ if $v(S \cup \{i\}) = v(S \cup \{j\})$ for every coalition $S$
that contains neither player $i$ nor player $j$. In other
words, two players are interchangeable if they can replace
each other in every coalition that contains one of
them.


Property 9.2 Null player 
A solution function $f$ satisfies the null player property
if $f_i(v) = 0$ whenever $v \in G_N$ and $i$ is a null player in $v$.


Property 9.3 Dummy player

A solution function $f$ satisfies the dummy player property
if $f_i(v) = v(\{i\})$ whenever $v \in G_N$ and $i$ is a dummy
player in $v$.


Property 9.4 Equal treatment

A solution function $f$ satisfies the equal treatment property
if $f_i(v) = f_j(v)$ for every $v \in G_N$ and every pair of
players $i, j$ that are interchangeable in $v$.


These three properties are quite reasonable and 
attractive, especially from the point of fairness and im- 
partiality: a player who contributes nothing should get 
nothing; a player who contributes the same amount to 
every coalition cannot expect to get anything else than 
he or she contributed; and two players who contribute 
the same to each coalition should be treated equally by 
the solution function. 

The next property reflects the natural requirement
that the solution function should be independent of the
players’ names. Let $v$ be a game from $G_N$ and $\pi\colon N \to N$
be a permutation of $N$, and let the image of coalition $S$
under $\pi$ be denoted by $\pi(S)$. It is obvious that, for every
$v \in G_N$, the function $\pi v$ defined by $(\pi v)(S) =
v(\pi(S))$ for each coalition $S$ is again a game from $G_N$.
Evidently, the game $\pi v$ differs from game $v$ only in
players’ names; they are interchanged by the permutation $\pi$.


Property 9.5 Anonymity 
A solution function $f$ is said to be anonymous if, for
every permutation $\pi$ of $N$, we have $f_i(\pi v) = f_{\pi(i)}(v)$ for
every game $v \in G_N$ and every player $i \in N$.


When a game consists of two independent games 
played separately by the same players or if a game is 
split into a sum of games, then it is natural to require 
the following property of additivity. 


Property 9.6 Additivity 

A solution function $f$ on $G_N$ is said to be additive if
$f(u + v) = f(u) + f(v)$ for every pair of games $u$ and $v$
from $G_N$.


The requirement of additivity differs from the previ- 
ous conditions in one important aspect. It involves two 
different games that may or may not be mutually depen- 
dent. In contrast, the dummy player and equal treatment 
properties involve only one game, and the anonymity 
property involves only those games which are com- 
pletely determined by a single game. 


Remark 9.1 

The terminology introduced in the literature for various 
properties of players and solution functions is not com- 
pletely standardized. For example, some authors use the 
term dummy player and symmetric players (or substi- 
tutes), for what we call null player and interchangeable 
players, respectively. Moreover, the term symmetry is 
sometimes used for our equal treatment and sometimes 
for our anonymity. 


One of the most studied and most influential single-valued
solution concepts for coalitional games with
transferable utility is the Shapley solution function
or briefly the Shapley value, proposed by Shapley in 
1953 [9.8]. The simplest way of introducing the Shap- 
ley value is to define it explicitly by the following 
well-known formula for calculation of its components. 


Definition 9.1 

The Shapley value on a subset $A$ of $G_N$ is a solution
function $\varphi$ on $A$ whose components $\varphi_1(v), \varphi_2(v), \dots,
\varphi_n(v)$ at game $v \in A$ are defined by

$$\varphi_i(v) = \sum_{S \subseteq N,\ i \in S} \frac{(s-1)!\,(n-s)!}{n!} \big[ v(S) - v(S \setminus \{i\}) \big], \tag{9.2}$$

where the sum is meant over all coalitions $S$ containing
player $i$, and $s$ generically stands for the number of
players in coalition $S$.
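Formula (9.2) translates directly into a computation over all coalitions containing a given player. The sketch below is a minimal Python illustration using exact rational arithmetic; the function name `shapley_value` and the three-player game `veto`, in which player 1 needs at least one partner to obtain a payoff of 1, are assumptions made for the example.

```python
from itertools import combinations
from math import factorial
from fractions import Fraction

def shapley_value(v, players):
    """Evaluate formula (9.2) for every player; v maps frozensets to numbers."""
    players = list(players)
    n = len(players)
    phi = {}
    for i in players:
        others = [j for j in players if j != i]
        total = Fraction(0)
        for r in range(n):
            for combo in combinations(others, r):
                coalition = frozenset(combo) | {i}   # a coalition S containing i
                s = len(coalition)
                weight = Fraction(factorial(s - 1) * factorial(n - s), factorial(n))
                total += weight * (v(coalition) - v(coalition - {i}))
        phi[i] = total
    return phi

# Hypothetical game: a coalition wins iff it contains player 1 and someone else.
veto = lambda s: 1 if 1 in s and len(s) >= 2 else 0
phi = shapley_value(veto, [1, 2, 3])
```

For this game the computation yields $\varphi_1 = 2/3$ and $\varphi_2 = \varphi_3 = 1/6$, and the components sum to $v(N) = 1$, in agreement with efficiency.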




To clarify the basic idea behind this definition, we 
first recall the notion of players’ marginal contributions 
to coalitions. 


Definition 9.2 

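For each player $i$ and each coalition $S$, a marginal contribution
of player $i$ to coalition $S$ in game $v$ from $G_N$ is
the number $m_i(S)$ defined by

$$m_i(S) = \begin{cases} v(S) - v(S \setminus \{i\}) & \text{if } i \in S, \\ v(S \cup \{i\}) - v(S) & \text{if } i \notin S. \end{cases}$$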


Now imagine a procedure for dividing the total payoff
$v(N)$ among the members of $N$ in which the players enter
a room in some prescribed order and each player
receives as payoff his or her marginal contribution to
the coalition of players already in the room. Suppose
that the prescribed order is $(\pi(1), \pi(2), \dots, \pi(n))$
where $\pi\colon N \to N$ is a fixed permutation of $N$. Then the
procedure under consideration determines the payoffs
to individual players as follows: before the first player
$\pi(1)$ entered the room, there was the empty coalition
waiting in the room. After player $\pi(1)$ enters, the coalition
in the room becomes $\{\pi(1)\}$ and the player receives
$v(\{\pi(1)\}) - v(\emptyset)$. Similarly, before the second player
$\pi(2)$ entered, there was coalition $\{\pi(1)\}$ waiting in the
room. After player $\pi(2)$ enters, coalition $\{\pi(1), \pi(2)\}$
is formed in the room and player $\pi(2)$ receives
$v(\{\pi(1), \pi(2)\}) - v(\{\pi(1)\})$. This continues till the last
player $\pi(n)$ enters and receives $v(N) - v(N \setminus \{\pi(n)\})$.

Let $S_i^\pi$ denote the coalition of players preceding
player $i$ in the order given by $(\pi(1), \pi(2), \dots, \pi(n))$;
that is, $S_i^\pi = \{\pi(1), \pi(2), \dots, \pi(j-1)\}$ where $j$ is the
uniquely determined member of $N$ such that $i = \pi(j)$.
Because there are $n!$ possible orders, the arithmetical
average of the marginal contributions of player $i$
taken over all possible orderings is equal to the number
$(1/n!) \sum_\pi m_i(S_i^\pi)$, where the sum is understood over
all permutations $\pi$ of $N$. This number is exactly the $i$-th
component of the Shapley value. Therefore, in addition
to the equality (9.2) we also have the equality

$$\varphi_i(v) = \frac{1}{n!} \sum_{\pi} m_i(S_i^\pi) \tag{9.3}$$

for computing the components of the Shapley value.
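The room-entering procedure behind (9.3) can be simulated literally by running through all $n!$ orderings. The following Python sketch assumes the same representation of games as functions on frozensets; the name `shapley_by_orderings` and the example game `veto` are illustrative assumptions.

```python
from itertools import permutations
from fractions import Fraction

def shapley_by_orderings(v, players):
    """Average each player's marginal contribution over all orderings, as in (9.3)."""
    players = list(players)
    totals = {i: Fraction(0) for i in players}
    count = 0
    for order in permutations(players):
        room = frozenset()                        # coalition already in the room
        for i in order:
            totals[i] += v(room | {i}) - v(room)  # marginal contribution of i
            room = room | {i}
        count += 1                                # count ends up equal to n!
    return {i: totals[i] / count for i in players}

# Hypothetical game: a coalition wins iff it contains player 1 and someone else.
veto = lambda s: 1 if 1 in s and len(s) >= 2 else 0
phi_orderings = shapley_by_orderings(veto, [1, 2, 3])
```

Player 1 is the one who completes a winning coalition in four of the six orderings, which gives $\varphi_1 = 4/6 = 2/3$. The enumeration grows factorially with $n$, so for larger games one would sample orderings at random instead; that variant only approximates the value.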

In addition to satisfying the condition of efficiency,
the Shapley value has a number of other useful properties.
In particular, it satisfies all of Properties 9.2-9.6.
Remarkably, no other efficient solution function on $G_N$ satisfies
the properties of null player, equal treatment, and
additivity at the same time.


Theorem 9.1 Shapley 

For each $N$, there exists a unique solution function on
$G_N$ satisfying the properties of efficiency, null player,
equal treatment, and additivity; this solution function is
the Shapley value introduced by Definition 9.1.


The standard proof of this basic result follows from 
the following facts: 


• The collection $\{u_T : T \ne \emptyset,\ T \subseteq N\}$ of unanimity
games defined by

$$u_T(S) = \begin{cases} 1 & \text{if } T \subseteq S, \\ 0 & \text{otherwise}, \end{cases} \tag{9.4}$$

forms a basis of the linear space $G_N$.

• The null player and equal treatment properties guarantee
that $\varphi$ is determined uniquely on multiples of
unanimity games.

• The property of additivity (combined with the fact
that the unanimity games form a basis) makes it
possible to extend $\varphi$ in a unique way to the whole
space $G_N$.
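The three facts above can be traced on a concrete game. The Python sketch below relies on two standard facts that are not stated in the text and are used here as assumptions: the coordinates $c_T$ of $v$ in the unanimity basis are obtained by Möbius inversion, $c_T = \sum_{S \subseteq T} (-1)^{|T| - |S|} v(S)$, and on a multiple $c\,u_T$ the Shapley value gives $c/|T|$ to each member of $T$ and nothing to the others. The function names and the example game `veto` are illustrative.

```python
from itertools import combinations
from fractions import Fraction

def nonempty_coalitions(players):
    """Yield every nonempty subset of `players` as a frozenset."""
    players = list(players)
    for r in range(1, len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)

def unanimity_coordinates(v, players):
    """Coordinates c_T such that v = sum over T of c_T * u_T (Moebius inversion)."""
    return {t: sum((-1) ** (len(t) - len(s)) * v(s) for s in nonempty_coalitions(t))
            for t in nonempty_coalitions(players)}

def shapley_from_coordinates(coords, players):
    """By additivity, player i collects c_T / |T| from every T that contains i."""
    return {i: sum(Fraction(c) / len(t) for t, c in coords.items() if i in t)
            for i in players}

# Hypothetical game: a coalition wins iff it contains player 1 and someone else.
veto = lambda s: 1 if 1 in s and len(s) >= 2 else 0
players = [1, 2, 3]
coords = unanimity_coordinates(veto, players)
phi_basis = shapley_from_coordinates(coords, players)
```

For this game the only nonzero coordinates are $c_{\{1,2\}} = c_{\{1,3\}} = 1$ and $c_{\{1,2,3\}} = -1$, so that $\varphi_1 = 1/2 + 1/2 - 1/3 = 2/3$.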


In addition to the original axiomatization by Shap- 
ley, there exist several equally beautiful alternative 
axiomatizations of the Shapley value that do not use the 
property of additivity [9.9, 10]. 


9.1.3 Probabilistic Values 


Let us fix some player $i$ and, for every coalition $S$ that
does not contain player $i$, denote by $\alpha_i(S)$ the number
$s!\,(n - s - 1)!/n!$. The family $\{\alpha_i(S) : S \subseteq N \setminus \{i\}\}$ is
a probability distribution over the set of coalitions not
containing player $i$. Because the $i$-th component of the
Shapley value can be computed by

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \alpha_i(S) \big[ v(S \cup \{i\}) - v(S) \big],$$

we see that the $i$-th component of the Shapley value
is the expected marginal contribution of player $i$ with
respect to the probability measure $\{\alpha_i(S) : S \subseteq N \setminus \{i\}\}$
and that the Shapley value belongs to the following
class of solution functions:


Definition 9.3 
A solution function $f$ on a subset $A$ of $G_N$ is called
probabilistic on $A$ if, for each player $i$, there exists
a probability distribution $\{p_i(S) : S \subseteq N \setminus \{i\}\}$ on
the collection of coalitions not containing $i$ such
that

$$f_i(v) = \sum_{S \subseteq N \setminus \{i\}} p_i(S) \big[ v(S \cup \{i\}) - v(S) \big] \tag{9.5}$$

for every $v \in A$.


The family of probabilistic solution functions embraces
an enormous number of functions [9.11]. The efficient
probabilistic solution functions are often called quasivalues,
and the anonymous probabilistic solution functions
are called semivalues. Since the Shapley value
is anonymous and efficient on $G_N$, we know that it is
both a quasivalue and a semivalue on $G_N$. Moreover, the
Shapley value is the only probabilistic solution function
with these properties.


Theorem 9.2 Weber 

If $N$ has at least three elements, then the Shapley value
is the unique probabilistic solution function on $G_N$ that
is anonymous and efficient.


Another widely known probabilistic solution func- 
tion is the function proposed originally only for voting 
games by Banzhaf [9.12]. 


Definition 9.4 
The Banzhaf value on $G_N$ is a solution function $\psi$ on
$G_N$ whose components at game $v$ are defined by

$$\psi_i(v) = \frac{1}{2^{n-1}} \sum_{S \subseteq N \setminus \{i\}} \big[ v(S \cup \{i\}) - v(S) \big]. \tag{9.6}$$


Again, by simple computation, we can verify that the
Banzhaf solution is a probabilistic solution function.
Consequently, the $i$-th component of the Banzhaf solution
is the expected marginal contribution of player $i$
with respect to the probability measure $\{\beta_i(S) : S \subseteq N \setminus
\{i\}\}$, where $\beta_i(S) = 1/2^{n-1}$ for each subset $S$ of $N \setminus \{i\}$.
From the probabilistic point of view, the Banzhaf solution
is based on the assumption that each player $i$ is
equally likely to join any subcoalition of $N \setminus \{i\}$. On the
other hand, the Shapley value is based on the assumption
that the coalition the player $i$ enters is equally likely
to be of any size $s$, $0 \le s \le n - 1$, and that all coalitions
of this size are equally likely.
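The difference between the two probability measures shows up already in a three-player game. The Python sketch below evaluates (9.6) with exact rationals; the name `banzhaf_value` and the example game `veto` are assumptions made for the illustration.

```python
from itertools import combinations
from fractions import Fraction

def banzhaf_value(v, players):
    """Evaluate (9.6): the unweighted mean marginal contribution over S in N \\ {i}."""
    players = list(players)
    n = len(players)
    psi = {}
    for i in players:
        others = [j for j in players if j != i]
        total = sum(v(frozenset(c) | {i}) - v(frozenset(c))
                    for r in range(n) for c in combinations(others, r))
        psi[i] = Fraction(total) / 2 ** (n - 1)
    return psi

# Hypothetical game: a coalition wins iff it contains player 1 and someone else.
veto = lambda s: 1 if 1 in s and len(s) >= 2 else 0
psi = banzhaf_value(veto, [1, 2, 3])
```

For this game the sketch gives $\psi_1 = 3/4$ and $\psi_2 = \psi_3 = 1/4$. The components sum to $5/4$ rather than $v(N) = 1$, so the Banzhaf value is not efficient here, which is consistent with Theorem 9.2.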


9.2 Coalitional Games with Fuzzy Coalitions 


Since the publication of Aubin’s seminal paper [9.3],
cooperative scenarios allowing for players’ fractional
membership degrees in coalitions have been studied. In
such situations, the subsets of $N$ no longer model every
possible coalition. Instead, a notion of coalitions has
to be introduced that makes it possible to represent the
partial membership degrees. It has become customary
to assume that a membership degree of player $i \in N$ is
determined by a number $a_i$ in the unit interval $I = [0, 1]$,
and to call the resulting vector $a = (a_1, \dots, a_n) \in I^n$
a fuzzy coalition. (The choice of $I^n$ is not the only possible
choice, see [9.13] or the discussion in [9.14].)
The $n$-dimensional cube $I^n$ is thus identified with the
set of all fuzzy coalitions. Every subset $A$ of $N$, that
is, every classical coalition, can be viewed as a vector
$1_A \in \{0, 1\}^n$ with coordinates

$$(1_A)_i = \begin{cases} 1 & \text{if } i \in A, \\ 0 & \text{otherwise}. \end{cases}$$

These special fuzzy coalitions are also called crisp
coalitions. When $A = \{i\}$ is a singleton, we write
simply $1_i$ in place of $1_{\{i\}}$. Hence, we may think
of the set of all fuzzy coalitions $I^n$ as the convex
closure of the set $\{0, 1\}^n$ of all crisp coalitions;
see [9.14] for further explanation of this convexification
process.

Several definitions of fuzzy games appear in the literature
[9.3, 15]. We adopt the one used by Azrieli and
Lehrer [9.13]. However, note that the authors of [9.13]
use a slightly more general definition, since they consider
a fuzzy coalition $a$ to be any nonnegative real
vector such that $a \le q$, where $q \in \mathbb{R}^n$ is a given
nonnegative vector.


Definition 9.5 

An $n$-player game (with fuzzy coalitions and transferable
utility) is a bounded function $v\colon I^n \to \mathbb{R}$ satisfying
$v(1_\emptyset) = 0$.


If we want to emphasize the dependence of Definition 9.5
on the number $n$ of players, then we write $(I^n, v)$
in place of $v$. Further, by $\bar{v}$ we denote the restriction of
$v$ to all crisp coalitions

$$\bar{v}(A) = v(1_A), \qquad A \subseteq N. \tag{9.7}$$




Hence, every game with fuzzy coalitions $v$ induces
a classical coalition game $\bar{v}$ with transferable utility.
Most solution concepts of cooperative game theory
have been generalized to games with fuzzy coalitions.
A payoff vector is any vector $x$ with $n$ real coordinates,
$x = (x_1, \dots, x_n) \in \mathbb{R}^n$. In a particular game with
fuzzy coalitions $(I^n, v)$, each player $i \in N$ obtains the
amount of utility $x_i$ as a result of his or her cooperative activity.
Consequently, a fuzzy coalition $a \in I^n$ gains the amount

$$\langle a, x \rangle = \sum_{i=1}^{n} a_i x_i,$$

which is just the weighted sum of the players’ payoffs
$x$ with respect to their participation levels in the
fuzzy coalition $a$. By a feasible payoff in game $(I^n, v)$,
we understand a payoff vector $x$ with $\langle 1_N, x \rangle \le v(1_N)$.

The following general definition captures most so- 
lution concepts for games with fuzzy coalitions. 


Definition 9.6 

Let $\Gamma_N$ be the class of all games with fuzzy coalitions
$(I^n, v)$ and let $A_N$ be its nonempty subclass. A solution
on $A_N$ is a function $\sigma$ that associates with each game
$(I^n, v)$ in $A_N$ a subset $\sigma(I^n, v)$ of the set

$$\{ x \in \mathbb{R}^n \mid \langle 1_N, x \rangle \le v(1_N) \}$$

of all feasible payoffs in game $(I^n, v)$.


The choice of $\sigma$ is governed by all thinkable rules
of economic rationality. Every solution $\sigma$ is thus
determined by a system of restrictions on the set of all
feasible payoff vectors in the game. For example, we
may formulate a set of axioms for $\sigma$ to satisfy or single
out inequalities making the payoffs in $\sigma(I^n, v)$ stable, in
some sense.


9.2.1 Multivalued Solutions 


Core 
The core is a solution concept $\sigma$ defined on the whole
class of games with fuzzy coalitions $\Gamma_N$. We present the
definition that appeared in [9.3].


Definition 9.7
Let $N = \{1, \dots, n\}$ be the set of all players and $v \in \Gamma_N$.
The core of $v$ is the set

$$C(v) = \{ x \in \mathbb{R}^n \mid \langle 1_N, x \rangle = v(1_N),\ \langle a, x \rangle \ge v(a) \text{ for every } a \in I^n \}. \tag{9.8}$$

In words, the core of $v$ is the set of all payoff vectors
$x$ such that no coalition $a \in I^n$ is better off when
accepting any other payoff vector $y \notin C(v)$. This is
a consequence of the two conditions in (9.8): Pareto
efficiency $\langle 1_N, x \rangle = v(1_N)$ requires that the profit of the
grand coalition is distributed among all the players in $N$,
and coalitional rationality $\langle a, x \rangle \ge v(a)$ means that no
coalition $a \in I^n$ accepts less than its profit $v(a)$.

Observe that the core $C(v)$ of a game with fuzzy
coalitions $v$ is an intersection of uncountably many
halfspaces $\langle a, x \rangle \ge v(a)$ with the affine hyperplane
$\langle 1_N, x \rangle = v(1_N)$. This implies that the core is a possibly
empty compact convex subset of $\mathbb{R}^n$, since $C(v)$ is
included in the core (9.1) of the classical coalition game $\bar{v}$
given by (9.7). In this way, we may think of Aubin’s
core $C(v)$ as a refinement of the classical core (9.1).
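Since (9.8) involves uncountably many inequalities, a direct computer test can only examine finitely many fuzzy coalitions. The Python sketch below checks Pareto efficiency exactly and coalitional rationality on a regular grid in $I^n$; passing this test is therefore only a necessary condition for core membership. The function names, the grid resolution `steps`, and the two-player game $v(a) = \min(a_1, a_2)$ are assumptions made for the illustration.

```python
from itertools import product

def inner(a, x):
    """The bilinear form <a, x> = sum of a_i * x_i."""
    return sum(ai * xi for ai, xi in zip(a, x))

def passes_grid_core_test(x, v, n, steps=10, tol=1e-9):
    """Necessary test for x in C(v): efficiency plus rationality on a grid in I^n."""
    ones = (1.0,) * n
    if abs(inner(ones, x) - v(ones)) > tol:
        return False
    grid = [k / steps for k in range(steps + 1)]
    return all(inner(a, x) >= v(a) - tol for a in product(grid, repeat=n))

# Hypothetical two-player fuzzy game: joint output is limited by the lower
# of the two participation levels.
v = lambda a: min(a)

inside = (0.5, 0.5)
outside = (2.0, -1.0)   # efficient, but the coalition a = (0, 1) would receive -1 < 0
```

The crisp coalitions are among the grid points, so every vector that passes the test lies in the classical core of the induced game $\bar{v}$, in line with the inclusion noted above.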

A payoff $x$ in the core $C(v)$ must meet uncountably
many restrictions represented by all coalitions in $I^n$. This
raises several questions:


1. When is $C(v)$ nonempty/empty?

2. When is $C(v)$ reducible to the intersection of finitely
many sets only?

3. For every $a \in I^n$, is there a core element $x \in C(v)$
giving coalition $a$ exactly its worth $v(a)$?

4. Is there an allocation rule for assigning payoffs in
$C(v)$ to fuzzy coalitions?


Azrieli and Lehrer formulated a necessary and sufficient
condition for the core nonemptiness [9.13], thus
generalizing the well-known Bondareva–Shapley theorem
for classical coalition games. We will need an
additional notion in order to state their result. The
strong superadditive cover of a game $v \in \Gamma_N$ is a game
$\hat{v} \in \Gamma_N$ such that, for every $a \in I^n$,

$$\hat{v}(a) = \sup \Big\{ \sum_{k=1}^{\ell} \lambda_k v(a^k) \;\Big|\; \ell \in \mathbb{N},\ a^k \le a,\ \lambda_k \ge 0,\ \sum_{k=1}^{\ell} \lambda_k a^k = a,\ k = 1, \dots, \ell \Big\}.$$

The nonemptiness of $C(v)$ depends on the value of $\hat{v}$ at
one point only.


Theorem 9.3 Azrieli and Lehrer [9.13]
Let $v \in \Gamma_N$. The core $C(v)$ is nonempty if and only if
$v(1_N) = \hat{v}(1_N)$.


The above theorem answers Question 1. Nevertheless,
it may be difficult to check the condition $v(1_N) =
\hat{v}(1_N)$. Can we simplify this task for some classes of
games? In particular, can we show that the shape of the
core is simpler on some class of games? This leads naturally
to Question 2. Branzei et al. [9.16] showed that
the class of games for which this holds true is the class
of convex games. We say that a game $v \in \Gamma_N$ is convex
whenever the inequality

$$v(a + c) - v(a) \le v(b + c) - v(b) \tag{9.9}$$

is satisfied for every $a, b, c \in I^n$ such that $b + c \in I^n$ and
$a \le b$. A word of caution is in order here: in general,
as shown in [9.13], the convexity of the game $v \in \Gamma_N$
does not imply and is not implied by the convexity of $v$
as an $n$-place real function. The game-theoretic convexity
captures the economic principle of nondecreasing
marginal utility. Interestingly, this property makes it
possible to simplify the structure of $C(v)$.


Theorem 9.4 Branzei et al. [9.16]

Let $v \in \Gamma_N$ be a convex game. Then $C(v) \neq \emptyset$ and,
moreover, $C(v)$ coincides with the core $C(\bar v)$ of the classical
coalition game $\bar v$.


The previous theorem, which settles Question 2, in fact provides
a complete characterization of the core on
the class of convex games with fuzzy coalitions. Indeed,
since the game is convex, we can use the result of
Shapley [9.7] to describe the shape of $C(v) = C(\bar v)$.
Question 3 motivates the following definition. A
game $v \in \Gamma_N$ is said to be exact whenever for every
$a \in I^n$ there exists $x \in C(v)$ such that $\langle a, x\rangle = v(a)$. The
class of exact games can be explicitly described [9.13].


Theorem 9.5
Let $v \in \Gamma_N$. Then the following properties are equivalent:


i) $v$ is exact;
ii) $v(a) = \min\{\langle a, x\rangle \mid x \in C(v)\}$ for every $a \in I^n$;
iii) $v$ is simultaneously
a) a concave, positively homogeneous function on $I^n$, and
b) $v(\lambda a + (1-\lambda) 1_N) = \lambda v(a) + (1-\lambda) v(1_N)$, for
every $a \in I^n$ and every $0 \le \lambda \le 1$.


The second equivalent property enables us to generate
many examples of exact games: it is enough to take the
minimum of a family of linear functions, all of which
take the same value at the point $1_N$.
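The recipe just described can be tried out numerically. The following is only a minimal sketch under stated assumptions: the three payoff vectors, the function names `inner` and `make_min_game`, and the sample coalition are all illustrative choices, not taken from [9.13]. The game is the minimum of the linear functions $a \mapsto \langle a, x\rangle$ over a finite family of payoffs $x$ that all sum to the same value, and the last lines check property iii b) of Theorem 9.5 at one sample point.

```python
def inner(a, x):
    """Inner product <a, x> of a fuzzy coalition a and a payoff x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def make_min_game(payoffs):
    """Return the game v(a) = min over the given payoffs x of <a, x>.

    All payoffs must give the same total to the grand coalition 1_N,
    i.e. the linear functions coincide at the point 1_N.
    """
    total = sum(payoffs[0])
    if any(abs(sum(x) - total) > 1e-12 for x in payoffs):
        raise ValueError("all payoff vectors must have the same sum")

    def v(a):
        return min(inner(a, x) for x in payoffs)

    return v


# Hypothetical family of payoffs for three players, each summing to 6.
payoffs = [(3.0, 1.0, 2.0), (1.0, 3.0, 2.0), (2.0, 2.0, 2.0)]
v = make_min_game(payoffs)

ones = (1.0, 1.0, 1.0)
a = (0.5, 0.2, 1.0)
lam = 0.3

# Property iii b): v(lam*a + (1-lam)*1_N) = lam*v(a) + (1-lam)*v(1_N).
mixed = tuple(lam * ai + (1.0 - lam) for ai in a)
lhs = v(mixed)
rhs = lam * v(a) + (1.0 - lam) * v(ones)
```

Every payoff in the family satisfies $\langle a, x\rangle \ge v(a)$ by construction and gives $v(1_N)$ to the grand coalition, so it belongs to the core, which is why the minimum in property ii) is attained.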


Question 4 amounts to asking for the existence of
allocation rules in the sense of Lehrer [9.17] or of dynamic
procedures for approximating core elements
in the sense of Wu [9.18]. A bargaining procedure for recovering the
elements of Aubin's core $C(v)$ is discussed in [9.19],
where the authors present the so-called Cimmino-style
bargaining scheme. For a game $v \in \Gamma_N$ and some initial
payoff $x^0 \in \mathbb{R}^n$, the goal is to construct a sequence of payoffs
converging to a core element, provided that $C(v) \neq \emptyset$.
We consider a probability measure that captures the
bargaining power of coalitions $a \in I^n$: a coalitional assessment
is any complete probability measure $\nu$ on $I^n$.
In what follows, we will require that $\nu$ is such that, for
every Lebesgue measurable set $A \subseteq I^n$,

$$\nu(A) > 0, \quad \text{whenever } A \text{ is open or } 1_N \in A. \qquad (9.10)$$


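Let $x \in \mathbb{R}^n$ be an arbitrary payoff and $a \in I^n$. We denote

$$C_a(v) = \begin{cases} \{y \in \mathbb{R}^n \mid \langle a, y\rangle \ge v(a)\}, & a \in I^n \setminus \{1_N\},\\ \{y \in \mathbb{R}^n \mid \langle 1_N, y\rangle = v(1_N)\}, & a = 1_N. \end{cases}$$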


What happens when the payoff $x$ is accepted by $a$, that
is, $x \in C_a(v)$? Then coalition $a$ has no incentive to bargain
for another payoff. On the contrary, if $x \notin C_a(v)$,
then $a$ may seek the payoff $P_a x \in C_a(v)$ such that $P_a x$
is the closest to $x$ in some sense. Specifically, we will
assume that $P_a x$ minimizes the Euclidean distance of $x$
from the set $C_a(v)$. This yields the formula


$$P_a x = \arg\min_{y \in C_a(v)} \|y - x\| = \begin{cases} x + \dfrac{\max\{0,\ v(a) - \langle a, x\rangle\}}{\|a\|^2}\, a, & a \in I^n \setminus \{1_\emptyset, 1_N\},\\[2ex] x + \dfrac{v(1_N) - \langle 1_N, x\rangle}{n}\, 1_N, & a = 1_N,\\[2ex] x, & a = 1_\emptyset, \end{cases}$$

where $\|\cdot\|$ is the Euclidean norm. After all coalitions
$a \in I^n$ have raised their requests on the new payoff $P_a x$,
we average their demands with respect to the coalitional
assessment $\nu$ in order to obtain a new proposal
payoff vector $Px$. Hence, $Px$ is computed as

$$Px = \int_{I^n} P_a x \, d\nu(a).$$

The integral on the right-hand side is well defined
whenever $v$ is Lebesgue measurable. The amalgamated
projection operator $P$ is the main tool in the Cimmino-style
bargaining procedure: an initial payoff $x^0$ is arbitrary,
and we put $x^k = P x^{k-1}$ for each $k = 1, 2, \ldots$
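To make the iteration concrete, here is a minimal sketch of one possible discretization. It rests on a loud assumption: the coalitional assessment is replaced by a finitely supported weighting over a handful of coalitions, which does not satisfy (9.10) and is used only for illustration. The function names, the additive test game, and the chosen coalitions are hypothetical and are not taken from [9.19].

```python
import math


def inner(a, x):
    """Inner product <a, x>."""
    return sum(ai * xi for ai, xi in zip(a, x))


def project(a, x, v):
    """Euclidean projection of the payoff x onto the set C_a(v)."""
    n = len(x)
    if all(ai == 0 for ai in a):
        # Empty coalition: it raises no request, x is kept.
        return list(x)
    if all(ai == 1 for ai in a):
        # Grand coalition: project onto the hyperplane <1_N, y> = v(1_N).
        shift = (v(a) - sum(x)) / n
        return [xi + shift for xi in x]
    # Any other coalition: project onto the halfspace <a, y> >= v(a).
    gap = max(0.0, v(a) - inner(a, x))
    scale = gap / inner(a, a)
    return [xi + scale * ai for ai, xi in zip(a, x)]


def cimmino_step(x, v, coalitions, weights):
    """One step x -> Px, averaging projections with the given weights.

    Assumption: the weights form a finitely supported stand-in for the
    coalitional assessment; they must be nonnegative and sum to one.
    """
    new_x = [0.0] * len(x)
    for a, w in zip(coalitions, weights):
        for i, p_i in enumerate(project(a, x, v)):
            new_x[i] += w * p_i
    return new_x


def run_cimmino(x0, v, coalitions, weights, steps):
    """Iterate x^k = P x^(k-1) for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = cimmino_step(x, v, coalitions, weights)
    return x


# Illustrative additive game v(a) = <a, m>; its core is the single payoff m.
m = (1.0, 2.0)


def additive_game(a):
    return inner(a, m)


coalitions = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.5, 1.0), (1.0, 0.5)]
weights = [0.2] * 5
x_final = run_cimmino((0.0, 0.0), additive_game, coalitions, weights, 500)
distance = math.dist(x_final, m)
```

In this toy setting the constraints of the five chosen coalitions already single out the payoff `m`, so the iterates approach it. With a genuine coalitional assessment the average would be an integral over $I^n$, which in practice would have to be approximated, for instance by sampling.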


Theorem 9.6
Let $v \in \Gamma_N$ be a continuous game with fuzzy coalitions
and let $\nu$ be a coalitional assessment satisfying (9.10):


1. If the sequence $(x^k)_{k \in \mathbb{N}}$ generated by the Cimmino
procedure is bounded and

lim ed dv(a) =0, (9.11) 


k—oo 
[0.1]” 


then $C(v) \neq \emptyset$ and $\lim_{k \to \infty} x^k \in C(v)$.


2. If the sequence $(x^k)_{k \in \mathbb{N}}$ is unbounded or (9.11) does
not hold, then $C(v) = \emptyset$.


The interested reader is invited to consult [9.19] for fur- 
ther details and numerical experiments. 


9.2.2 Single-Valued Solutions 


Shapley value
Aubin defined a Shapley value on spaces of games
with fuzzy coalitions possessing nice analytical properties
[9.3, 14, Chap. 13.4]. Specifically, let a function
$v: \mathbb{R}^n \to \mathbb{R}$ be positively homogeneous and Lipschitz
in a neighborhood of $1_N$. Such functions are
termed generalized sharing games with side payments
by Aubin [9.14, Chap. 13.4]. The restriction of $v$ to
the cube $I^n$ is clearly a game with fuzzy coalitions, and
therefore we do not make any distinction between $v$
and its restriction to $I^n$. In addition, assume that the function
$v$ is continuously differentiable at $1_N$ and denote by
$G_N^1$ the class of all such games with fuzzy coalitions.
Hence, we may put

$$\sigma(v) = \nabla v(1_N), \quad v \in G_N^1. \qquad (9.12)$$

Each coordinate $\sigma_i(v)$ of the gradient vector $\sigma(v)$ captures
the marginal contribution of player $i \in N$ to the
grand coalition $1_N$. As pointed out by Aubin, the gradient
measures the roles of the players as pivots in the game $v$.
Moreover, the operator $\sigma$ given by (9.12) can be considered
as a generalized Shapley value on the class of
games $G_N^1$ (cf. Theorem 9.1): Aubin proved [9.14, Chap. 13.4]
that the operator defined by (9.12) satisfies

$$\langle 1_N, \sigma(v)\rangle = v(1_N),$$

for every game $v \in G_N^1$, and

$$\sigma_i(\pi v) = \sigma_{\pi(i)}(v),$$

for every player $i \in N$ and every permutation $\pi$ of $N$.
Moreover, $\sigma$ fulfills a certain variant of the Dummy
Property.

When defining a value on games with fuzzy coalitions,
many other authors [9.15, 20] proceed in the
following way: a classical cooperative game is extended
from the set of all crisp coalitions to the set of all fuzzy
coalitions. The main issue is to decide on the nature of
this extension procedure and to check that the extended
game with fuzzy coalitions inherits all or at least some
properties of the function that is extended (such as superadditivity
or convexity). Clearly, there are as many
choices for the extension as there are possible interpolations
of a real function on $\{0, 1\}^n$ to the cube $[0, 1]^n$.

Tsurumi et al. [9.20] used the Choquet integral as
an extension. Specifically, for every $a \in I^n$, let
$V_a = \{a_i \mid a_i > 0,\ i \in N\}$ and let $n_a = |V_a|$. Without loss of
generality, we may assume that the elements of $V_a$ are
ordered and write them as $b_1 < \cdots < b_{n_a}$. Further, put
$[a]_y = \{i \in N \mid a_i \ge y\}$, for each $a \in I^n$ and for each
$y \in [0, 1]$.


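Definition 9.8
A game with fuzzy coalitions $(I^n, v)$ is a game with
Choquet integral form whenever

$$v(a) = \sum_{l=1}^{n_a} \bar v\big([a]_{b_l}\big)\,(b_l - b_{l-1}), \quad a \in I^n,$$

where $b_0 = 0$ and $\bar v$ denotes the restriction of $v$ to all crisp
coalitions. Let $\Gamma_N^C$ be the class of all games with
Choquet integral form.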
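The sum in Definition 9.8 is straightforward to evaluate. The sketch below is an illustration only: players are indexed from 0, a crisp coalition is represented as a `frozenset` of player indices, and the crisp game $S \mapsto |S|^2$ is a hypothetical example rather than one taken from [9.20].

```python
def choquet_extension(crisp_v, a):
    """Evaluate v(a) = sum_l crisp_v([a]_{b_l}) * (b_l - b_{l-1}), b_0 = 0.

    crisp_v maps a frozenset of player indices to a real number;
    a is a sequence of membership degrees in [0, 1].
    """
    levels = sorted({ai for ai in a if ai > 0})  # b_1 < ... < b_{n_a}
    total = 0.0
    previous = 0.0
    for b in levels:
        # Level set [a]_b: players participating at least to degree b.
        level_set = frozenset(i for i, ai in enumerate(a) if ai >= b)
        total += crisp_v(level_set) * (b - previous)
        previous = b
    return total


def squared_size(coalition):
    """Hypothetical crisp game: worth of a coalition is its size squared."""
    return float(len(coalition) ** 2)


value_fuzzy = choquet_extension(squared_size, (0.2, 0.5, 0.5))
value_crisp = choquet_extension(squared_size, (1.0, 0.0, 1.0))
```

For the fuzzy coalition $(0.2, 0.5, 0.5)$ the two levels contribute $9 \cdot 0.2$ and $4 \cdot 0.3$, and on a crisp coalition the extension returns the crisp worth, as an extension should.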


In the above definition the function $v$ is the so-called
Choquet integral [9.21] of $a$ with respect to the restriction
$\bar v$ of $v$ to all crisp coalitions. It was shown that
every game $v \in \Gamma_N^C$ is monotone whenever $\bar v$ is monotone
[9.20, Lemma 2] and that $v$ is a continuous function
on $I^n$ [9.20, Theorem 2]. The authors define a mapping

$$f: \Gamma_N^C \to \big([0, \infty)^n\big)^{I^n},$$

which is called a Shapley function, by the following assignment

$$f_i(v)(a) = \sum_{l=1}^{n_a} f'_i(v)\big([a]_{b_l}\big)\,(b_l - b_{l-1}), \quad i \in N,\ v \in \Gamma_N^C,\ a \in I^n,$$

where

$$f'_i(v)(B) = \sum_{\substack{A \subseteq B \\ i \in A}} \frac{(|A| - 1)!\,(|B| - |A|)!}{|B|!}\,\big(\bar v(A) - \bar v(A \setminus \{i\})\big), \quad B \subseteq N,$$

whenever $i \in B$, and $f'_i(v)(B) = 0$ otherwise. Observe
that $f_i(v)(a)$ is the Choquet integral of $a$ with respect
to $f'_i(v)$ and that $f'_i(v)(B)$ is the Shapley value of $\bar v$
with the grand coalition $N$ replaced with the coalition $B$.
Before we show that the Shapley function has some ex- 
pected properties, we prepare the following definitions. 
Let a € I” and i,j € N. For each b e I” with b < a, de- 
fine a vector bj, whose coordinates are 


bi AG k=i, 
(bj), = yb AG k=j, keN. 
by otherwise , 
For an arbitrary b € 7”, put 
b k=i, 
(b; lal), = bi k=j, ken. 


b, otherwise , 


Clearly, we have both bi < aand bi; [a] < a. The follow- 
ing theorem is proved in [9.20]. 


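Theorem 9.7
The operator $f: \Gamma_N^C \to \big([0, \infty)^n\big)^{I^n}$ has the following
properties:


1. If $v \in \Gamma_N^C$ and $a \in I^n$, then
$$\sum_{i \in N} f_i(v)(a) = v(a) \quad \text{and} \quad f_j(v)(a) = 0,$$
for every $j \in N$ such that $a_j = 0$.


2. If $v \in \Gamma_N^C$, $a \in I^n$, and $b \in I^n$ are such that
$v(b \wedge c) = v(c)$ for every $c \in I^n$ with $c \le a$, then

$$f_i(v)(a) = f_i(v)(b) \quad \text{for every } i \in N.$$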
3. fve, a er, aj; is such that v(a; ^c) = v(b), 


for every c € I” with c < a, and v(b) = v(bi,) for ev- 
ery b € I” with b < aj then 


fO =f). 


4. If $v, w \in \Gamma_N^C$, then $v + w \in \Gamma_N^C$, and

$$f_i(v + w)(a) = f_i(v)(a) + f_i(w)(a),$$

for every $i \in N$ and every $a \in I^n$.


The previous theorem thus says that the Shapley function
$f$ on the class of games $\Gamma_N^C$ has properties
analogous to those of the Shapley value: efficiency, the carrier
property, symmetry, and additivity.
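The efficiency property can be checked numerically on a small example. The following sketch is an illustration under assumptions: players are indexed from 0, crisp coalitions are `frozenset` objects, the function names are hypothetical, and the crisp game $S \mapsto |S|^2$ is an arbitrary monotone example, not one from [9.20]. It computes the Shapley value restricted to a coalition $B$ and then integrates it over the level sets of a fuzzy coalition.

```python
from itertools import combinations
from math import factorial


def shapley_on(crisp_v, B, i):
    """Shapley value of player i when coalition B replaces the grand coalition."""
    if i not in B:
        return 0.0
    others = [j for j in B if j != i]
    size_b = len(B)
    value = 0.0
    for r in range(len(others) + 1):
        for rest in combinations(others, r):
            A = frozenset(rest) | {i}
            weight = (factorial(len(A) - 1) * factorial(size_b - len(A))
                      / factorial(size_b))
            value += weight * (crisp_v(A) - crisp_v(A - {i}))
    return value


def shapley_function(crisp_v, a, i):
    """Choquet-style sum of shapley_on over the level sets of a."""
    levels = sorted({ak for ak in a if ak > 0})
    total, previous = 0.0, 0.0
    for b in levels:
        B = frozenset(k for k, ak in enumerate(a) if ak >= b)
        total += shapley_on(crisp_v, B, i) * (b - previous)
        previous = b
    return total


def squared_size(coalition):
    """Hypothetical monotone crisp game."""
    return float(len(coalition) ** 2)


a = (0.2, 0.5, 0.5)
shares = [shapley_function(squared_size, a, i) for i in range(3)]
```

Here the shares come out as roughly $0.6$, $1.2$, and $1.2$; they add up to the worth $3.0$ of the fuzzy coalition in the Choquet integral form of this game, in line with the efficiency property.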

Butnariu and Kroupa [9.15] studied a value operator
on the class of fuzzy games $(I^n, v)$ satisfying

$$v(a) = \sum_{t \in [0,1]} \psi(t)\, v(a^t), \quad a \in I^n,$$

where $\psi: [0, 1] \to \mathbb{R}$ fulfills
($\psi(t) = 0$ iff $t = 0$) and $\psi(1) = 1$,
and

$$a^t = \{i \in N \mid a_i = t\}.$$

The class of such fuzzy games is denoted by $\Gamma_N^\psi$. The
so-called Shapley mapping can be axiomatized
on $\Gamma_N^\psi$ [9.15, Axioms 1-3]: it turns out that there
is only one Shapley mapping $\Phi: \Gamma_N^\psi \to (\mathbb{R}^n)^{I^n}$ [9.15,
Theorem 1].


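Theorem 9.8
There exists a unique Shapley mapping $\Phi: \Gamma_N^\psi \to (\mathbb{R}^n)^{I^n}$
and it is given by the following formula:

$$\Phi_i(v)(a) = \begin{cases} \psi(t) \displaystyle\sum_{S \in P_i(a^t)} \frac{(|S| - 1)!\,(|a^t| - |S|)!}{|a^t|!}\,\big(v(S) - v(S \setminus \{i\})\big), & \text{if } a_i = t > 0,\\[2ex] 0, & \text{otherwise}, \end{cases}$$

where

$$P_i(a^t) = \{R \subseteq N \mid i \in R \text{ and } R \subseteq a^t\}.$$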


The expected total allocation of player $i \in N$ is then obtained as

$$\hat\Phi_i(v) = \int_{I^n} \Phi_i(v)(a)\, da,$$

provided that the above Lebesgue integral exists. The
operator $\hat\Phi = (\hat\Phi_1, \ldots, \hat\Phi_n)$ is called the cumulative
value of $v$. If the weight function $\psi$ is bounded and
Lebesgue integrable, then [9.15, Theorem 2] shows that
the cumulative value is well-defined and its coordinates
are

$$\hat\Phi_i(v) = v(1_{\{i\}}) \int_0^1 \psi(t)\, dt$$

for each $i \in N$.

Owen's approach to the classical Shapley value [9.22]
cannot be, strictly speaking, classified as an attempt to
define a Shapley-style value on some class of games
with fuzzy coalitions, but we mention his construction
for the sake of completeness. The idea is to extend
a game $v \in \mathcal{G}_N$ with crisp coalitions from its domain
$\{1_A \mid A \subseteq N\}$ to the whole unit cube $I^n$ by way of
multilinear interpolation. The resulting multilinear extension
$\tilde v$ can be described explicitly as the function

$$\tilde v(a) = \sum_{A \subseteq N} \Big(\prod_{i \in A} a_i\Big) \Big(\prod_{i \notin A} (1 - a_i)\Big)\, v(A), \quad a = (a_1, \ldots, a_n) \in I^n. \qquad (9.13)$$
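Formula (9.13) can be evaluated directly for small games. The sketch below is illustrative only: the three-player majority game, the function names, and the midpoint quadrature are assumptions made for the example. The second function approximates Owen's diagonal formula from [9.22], which integrates the partial derivative of the multilinear extension along the diagonal of the cube. Because the extension is linear in each variable, that partial derivative equals the difference between the extension with the $i$-th coordinate set to 1 and set to 0.

```python
from itertools import combinations


def multilinear_extension(crisp_v, a):
    """Evaluate the multilinear extension (9.13) of crisp_v at the point a."""
    n = len(a)
    total = 0.0
    for r in range(n + 1):
        for A in combinations(range(n), r):
            members = set(A)
            weight = 1.0
            for i in range(n):
                weight *= a[i] if i in members else 1.0 - a[i]
            total += weight * crisp_v(frozenset(A))
    return total


def shapley_diagonal(crisp_v, n, i, steps=2000):
    """Midpoint-rule approximation of the diagonal integral for player i."""
    h = 1.0 / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        with_i = [t] * n
        with_i[i] = 1.0
        without_i = [t] * n
        without_i[i] = 0.0
        # Multilinearity turns the partial derivative into a difference.
        acc += (multilinear_extension(crisp_v, with_i)
                - multilinear_extension(crisp_v, without_i))
    return acc * h


def majority(coalition):
    """Hypothetical three-player majority game: two or more players win."""
    return 1.0 if len(coalition) >= 2 else 0.0


center_value = multilinear_extension(majority, (0.5, 0.5, 0.5))
share_0 = shapley_diagonal(majority, 3, 0)
```

For the symmetric majority game each player's share should be close to $1/3$. Direct enumeration of all $2^n$ coalitions is only practical for small $n$, which is one reason sampling-based variants of the diagonal formula are of interest.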


The function $\tilde v$ is linear in each of its variables separately
and $\tilde v(1_A) = v(A)$ for each $A \subseteq N$. The usual formula
(9.13) for the Shapley value $\varphi(v)$ of $v$ now takes the
following diagonal form [9.22]

$$\varphi_i(v) = \int_0^1 \frac{\partial \tilde v}{\partial x_i}(t, \ldots, t)\, dt. \qquad (9.14)$$

Hence, $\varphi_i(v)$ is completely determined by the behavior
of the function $\tilde v$ in the neighborhood of the diagonal
in $I^n$. The formula (9.14) is important from the computational
point of view: its use in connection with
statistical techniques can enhance computations with
the Shapley value; see [9.23, Chap. XII.4] for further
details.

Since the space of games with crisp coalitions is finite
dimensional, unlike the space of games with fuzzy
coalitions, there is no general approach to the Shapley
value of fuzzy games. Even a direct comparison of
the cumulative value introduced above with the Shapley
function on the space of games $\Gamma_N^C$ of Tsurumi
et al. [9.20] is hardly possible since the domains of the
Shapley operators are essentially different. The selection
of the right space of games and an appropriate
solution thus varies from one application to another.


9.3 Final Remarks


We presented results concerning basic solution concepts
for coalitional games with fuzzy coalitions and
finitely many players in the case of transferable utility.
We concentrated on those solutions which occupy
the main part of cooperative game theory (the core and
the Shapley value). A detailed discussion or even a
comprehensive overview of the current trends in fuzzy
games is beyond the scope of this chapter. Nevertheless,
in this section we mention current developments
and briefly discuss other solution concepts. The reader
should always consult the relevant reference for the
specification of the concepts used by the cited authors;
for example, we can find at least two definitions of
a convex fuzzy game:


1. Azrieli and Lehrer [9.13] and Branzei et al. [9.16] use the
definition (9.9) employed herein;

2. Tsurumi et al. [9.20] call a game with fuzzy coalitions
$v$ convex whenever

$$v(a \vee b) + v(a \wedge b) \ge v(a) + v(b)$$

holds true for every $a, b \in I^n$.


Shellshear [9.24] employs the concavification of
the fuzzy game, that is, the strong superadditive cover, in
order to show [9.24, Theorem 4.4] that the strong superadditive
cover has a stable core if and only if the
original game has a stable core. Further, he investigates
important properties of the concavification and its
superdifferential; new necessary and sufficient conditions
for core stability are given in [9.23, Chap. XII.4].

Yang et al. [9.25] introduced the concept of bargaining
sets for games with fuzzy coalitions; they prove
that the bargaining set coincides with the Aubin core
whenever a game is continuous and convex. Liu and
Liu [9.26] extended the results from [9.25] in order
to overcome some weaknesses of the previously used
fuzzy bargaining sets. The concept of the classical
Mas-Colell bargaining set was also generalized and the
authors proved existence theorems for such fuzzy bargaining
sets. Moreover, both the Aumann and Maschler and
the Mas-Colell fuzzy bargaining sets of a continuous convex
cooperative fuzzy game coincide with its Aubin
core.




A fuzzy game is represented as a convex program 
in [9.27]. It is shown that the optimum of the program 
determines the optimal coalitions as well as the optimal 
rewards for the players. Further, this framework seems 
to unify a number of existing representations of solu- 
tions: the core, the least core, and the nucleolus. 

Wu [9.28] investigates various types of cores based
on the dominance among payoff vectors and the concepts
of the true payoff and quasi-payoff of a fuzzy
coalition.

Interpretational difficulties related to fuzzy games
are pointed out by Mareš and Vlach in [9.29]. The
authors propose an alternative model for a fuzzy coalition,
namely a collection of crisp coalitions, and discuss some
of its consequences.


References


9.1 M.J. Osborne, A. Rubinstein: A Course in Game Theory (MIT Press, Cambridge 1994)

9.2 D. Butnariu: Fuzzy games: A description of the concept, Fuzzy Sets Syst. 1, 181-192 (1978)

9.3 J.-P. Aubin: Coeur et valeur des jeux flous à paiements latéraux, Comptes Rendus de l'Académie des Sciences Série A 279, 891-894 (1974)

9.4 L.S. Shapley, M. Shubik: On Market Games, J. Econ. Theory 1, 9-25 (1969)

9.5 B. Peleg, P. Sudhölter: Introduction to the Theory of Cooperative Games, Theory and Decision Library: Series C. Game Theory, Vol. 34, 2nd edn. (Springer, Berlin 2007)

9.6 R.J. Aumann, M. Maschler: Game theoretic analysis of a bankruptcy problem from the Talmud, J. Econ. Theory 36(2), 195-213 (1985)

9.7 L.S. Shapley: Cores of convex games, Int. J. Game Theory 1, 11-26 (1972)

9.8 L.S. Shapley: A value for n-person games. In: Contributions to the Theory of Games. Vol. II, Annals of Mathematics Studies, Vol. 28, ed. by H.W. Kuhn, A.W. Tucker (Princeton Univ. Press, Princeton 1953) pp. 307-317

9.9 H.P. Young: Monotonic solutions of cooperative games, Int. J. Game Theory 14, 65-72 (1985)

9.10 S. Hart, A. Mas-Colell: Potential, value and consistency, Econometrica 57, 589-614 (1989)

9.11 R. Weber: Probabilistic values for games. In: The Shapley Value, ed. by A.E. Roth (Cambridge Univ. Press, Cambridge 1988) pp. 101-120

9.12 J.F. Banzhaf III: Weighted voting does not work: A mathematical analysis, Rutgers Law Rev. 19, 317-343 (1965)

9.13 Y. Azrieli, E. Lehrer: On some families of cooperative fuzzy games, Int. J. Game Theory 36(1), 1-15 (2007)

9.14 J.-P. Aubin: Optima and Equilibria, Graduate Texts in Mathematics, Vol. 140, 2nd edn. (Springer, Berlin 1998)

9.15 D. Butnariu, T. Kroupa: Shapley mappings and the cumulative value for n-person games with fuzzy coalitions, Eur. J. Oper. Res. 186(1), 288-299 (2008)

9.16 R. Branzei, D. Dimitrov, S. Tijs: Models in Cooperative Game Theory, Lecture Notes in Economics and Mathematical Systems, Vol. 556 (Springer, Berlin 2005)

9.17 E. Lehrer: Allocation processes in cooperative games, Int. J. Game Theory 31(3), 341-351 (2003)

9.18 L.S.Y. Wu: A dynamic theory for the class of games with nonempty cores, SIAM J. Appl. Math. 32(2), 328-338 (1977)

9.19 D. Butnariu, T. Kroupa: Enlarged cores and bargaining schemes in games with fuzzy coalitions, Fuzzy Sets Syst. 160(5), 635-643 (2009)

9.20 M. Tsurumi, T. Tanino, M. Inuiguchi: A Shapley function on a class of cooperative fuzzy games, Eur. J. Oper. Res. 129(3), 596-618 (2001)

9.21 D. Denneberg: Non-Additive Measure and Integral, Theory and Decision Library B. Mathematical and Statistical Methods Series, Vol. 27 (Kluwer, Dordrecht 1994)

9.22 G. Owen: Multilinear extensions of games, Manag. Sci. 18, P64-P79 (1971)

9.23 G. Owen: Game Theory, 3rd edn. (Academic, San Diego 1995)

9.24 E. Shellshear: A note on characterizing core stability with fuzzy games, Int. Game Theory Rev. 13(01), 105-118 (2011)

9.25 W. Yang, J. Liu, X. Liu: Aubin cores and bargaining sets for convex cooperative fuzzy games, Int. J. Game Theory 40(3), 467-479 (2011)

9.26 J. Liu, X. Liu: Fuzzy extensions of bargaining sets and their existence in cooperative fuzzy games, Fuzzy Sets Syst. 188(1), 88-101 (2012)

9.27 M. Keyzer, C. van Wesenbeeck: Optimal coalition formation and surplus distribution: Two sides of one coin, Eur. J. Oper. Res. 215(3), 604-615 (2011)

9.28 H.-C. Wu: Proper cores and dominance cores of fuzzy games, Fuzzy Optim. Decis. Mak. 11(1), 47-72 (2012)

9.29 M. Mareš, M. Vlach: Disjointness of fuzzy coalitions, Kybernetika 44(3), 416-429 (2008)


Part B Fuzzy Logic

Ed. by Enrique Herrera Viedma, Luis Magdalena


10 Basics of Fuzzy Sets
János C. Fodor, Budapest, Hungary
Imre J. Rudas, Budapest, Hungary

11 Fuzzy Relations: Past, Present, and Future
Susana Montes, Oviedo, Spain
Ignacio Montes, Oviedo, Spain
Tania Iglesias, Oviedo, Spain

12 Fuzzy Implications: Past, Present, and Future
Michał Baczyński, Katowice, Poland
Balasubramaniam Jayaram, Hyderabad, India
Sebastia Massanet, Palma de Mallorca, Spain
Joan Torrens, Palma de Mallorca, Spain

13 Fuzzy Rule-Based Systems
Luis Magdalena, Mieres, Spain

14 Interpretability of Fuzzy Systems: Current Research Trends and Prospects
Jose M. Alonso, Mieres, Spain
Ciro Castiello, Bari, Italy
Corrado Mencar, Bari, Italy

15 Fuzzy Clustering - Basic Ideas and Overview
Sadaaki Miyamoto, Tsukuba, Japan

16 An Algebraic Model of Reasoning to Support Zadeh's CWW
Enric Trillas, Mieres, Spain

17 Fuzzy Control
Christian Moewes, Magdeburg, Germany
Ralf Mikut, Eggenstein-Leopoldshafen, Germany
Rudolf Kruse, Magdeburg, Germany

18 Interval Type-2 Fuzzy PID Controllers
Tufan Kumbasar, Maslak, Istanbul, Turkey
Hani Hagras, Colchester, UK

19 Soft Computing in Database and Information Management
Guy De Tré, Ghent, Belgium
Sławomir Zadrożny, Warsaw, Poland

20 Application of Fuzzy Techniques to Autonomous Robots
Ismael Rodríguez Fdez, Santiago de Compostela, Spain
Manuel Mucientes, Santiago de Compostela, Spain
Alberto Bugarín Diz, Santiago de Compostela, Spain


10. Basics of Fuzzy Sets

János C. Fodor, Imre J. Rudas


In this chapter we summarize basic knowledge on
fuzzy logics and fuzzy sets. After a short historical
overview of ideas strongly connected to and
preceding the notion of fuzzy logics and fuzzy
sets, we outline links between many-valued and
fuzzy logics. Then fuzzy subsets of a universe are
introduced. Interpretations of unary and binary
connectives in fuzzy logics as appropriate functions
(operations) on the unit interval are central to the
approach. Fundamental knowledge on these function
classes is then presented, including results on
triangular norms and conorms, as well as on implications.
Our concluding remarks suggest further
reading, beyond the basics.


10.1 Classical Mathematics and Logic
10.2 Fuzzy Logic, Membership Functions, and Fuzzy Sets
10.3 Connectives in Fuzzy Logic
10.3.1 Negations
10.3.2 Triangular Norms and Conorms
10.3.3 Fuzzy Implications
10.4 Concluding Remarks
References


In everyday life we use and process vague, imprecise
linguistic terms like young, hot, or around midnight.
Classical mathematics is inadequate for providing
models that can express the complex semantics of
such terms. Fuzzy sets, introduced by Zadeh [10.1] on
the basis of his observation that


more often than not, the classes of objects encountered
in the real physical world do not have
precisely defined criteria of membership,


are appropriate for modeling the semantics of vague linguistic
terms. Fuzzy sets offer a framework to deal with
predicates whose satisfaction is a matter of degree.
Some forerunners discussed ideas or formal definitions
for describing vague predicates or classes
with imprecise boundaries, very close to the basic
notions introduced by Zadeh [10.1]. We should mention
Peirce [10.2], Russell [10.3], Łukasiewicz [10.4],
Black [10.5], Weyl [10.6], Kaplan and Schott [10.7].
The mathematician Karl Menger was the first (in 1951)
to use the term ensemble flou (the French counterpart
of fuzzy set) in the title of a paper in French [10.8].
In addition, Menger's work on probabilistic metric
spaces also led to the introduction of so-called triangular
norms and conorms, extensively studied by
Schweizer and Sklar [10.9], which later turned
out to be basic operators for fuzzy sets [10.10]. For
more historical facts we refer to [10.11].

We want to emphasize that Zadeh's motivations
and background were quite different from those of the
above-mentioned authors. He introduced the concept of
a fuzzy set completely independently of their proposals
in order to provide a tool for representing and reasoning
with the available information in a manner similar
to the way humans express knowledge and summarize
data.

This chapter is organized as follows. In the next
section we briefly recall some notions from classical
mathematics and its underlying two-valued (Boolean)
logic. We extend this material, and in Sect. 10.2 we
introduce key terms related to fuzzy sets. Sect. 10.3
contains the core knowledge on interpretations of connectives
in fuzzy logic and fuzzy set-theoretic operations.
This includes fundamentals of negations, triangular
norms and conorms, together with the most
important parametric families and particular operations.
Fuzzy implications are also handled in a similar
way. Concluding remarks are given at the end, including
several suggested literature items for further
reading.




10.1 Classical Mathematics and Logic 


Classical mathematics is based on two-valued logic,
in which the set of truth values consists of two elements:
$\{0, 1\}$. There are two basic binary operations $\wedge$
(AND) and $\vee$ (OR), and the unary complement $\neg$ (NOT).
All other logical operations, e.g., the implication $\to$,
the logical equivalence $\leftrightarrow$, and the exclusive or XOR,
can be constructed from the three basic operations
$\wedge, \vee, \neg$.

A proposition is either an atomic propositional variable
$p_1, p_2, \ldots$, or a compound expression $(p \wedge q)$, $(p \vee q)$,
or $\neg p$, where $p$ and $q$ are propositions. A proposition
is either true (with truth value 1) or false (with
truth value 0), but not both.

A set $A$ is a collection of objects in a given universe
$X$, where each possible object $x$ from $X$
either belongs to the set $A$ (in symbols: $x \in A$) or not
($x \notin A$). A set $A$ is a subset of $B$ if all objects in $A$ are in
$B$ as well (in symbols: $A \subseteq B$). We write $A \subset B$ if $A \subseteq B$
and there is at least one element in $B$ which is not in
$A$. The set of all subsets of $X$ is denoted by $\mathcal{P}(X)$. The
empty set, which does not contain any object, is denoted
by $\emptyset$.


We consider three fundamental operations on sets.
The intersection of two sets $A$ and $B$, denoted by $A \cap B$,
is the set of objects from $X$ which belong both to $A$ and
$B$. The union of two sets $A$ and $B$, denoted by $A \cup B$, is
the set of objects from $X$ which belong to at least one
of the sets $A$ and $B$. The complement of a set $A$, denoted
by $A^c$, is the set of objects from $X$ which do not belong
to $A$.

A function $\chi_A: X \to \{0, 1\}$ is called the characteristic
function of the set $A$ if

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A,\\ 0 & \text{if } x \notin A. \end{cases}$$

The characteristic function discriminates between members
and nonmembers of the set $A$. With the help of
characteristic functions, set operations can be expressed
as follows

$$\chi_{A \cap B}(x) = \chi_A(x) \wedge \chi_B(x),$$
$$\chi_{A \cup B}(x) = \chi_A(x) \vee \chi_B(x),$$
$$\chi_{A^c}(x) = \neg \chi_A(x).$$


10.2 Fuzzy Logic, Membership Functions, and Fuzzy Sets 


The idea behind fuzzy logic is to replace the set of 
truth values {0, 1} by the entire unit interval [0, 1]. Then 
a fuzzy set on a universe X is represented by a function 
which maps each element x € X to a degree of member- 
ship from the unit interval [0, 1]. Larger values indicate 
higher degrees of membership. 

For several decades, many-valued logic was considered
a purely mathematical topic. The introduction
of fuzzy sets [10.1] gave new impetus to the
investigation of many-valued logics. Informally speaking,
fuzzy logic is understood as an extension of
many-valued logics, with the ultimate goal of providing
foundations for approximate reasoning with imprecise
propositions using fuzzy set theory as the principal tool.

A many-valued propositional logic in which the
class of truth values is modelled by the unit interval
$[0, 1]$, and which forms an extension of classical
Boolean logic, i.e., the two-valued logic with truth values
$\{0, 1\}$, is quite often called a fuzzy logic [10.10].
For the sake of simplicity, it is assumed that all fuzzy logics
have the same syntax; they may differ only in their
semantics.


A membership function $\mu_A$ is a mapping from the
universal set $X$ to the unit interval, i.e., $\mu_A: X \to [0, 1]$.
Membership functions are direct generalizations
of characteristic functions. In a logical setting, the degree
of membership $\mu_A(x)$ can also be seen as the truth
value of the statement "$x$ is an element of $A$".

Notice that a membership grade can have three
meanings:


• Degree of similarity. The membership grade $\mu_A(x)$
represents the degree of proximity of $x$ to prototype
elements of $A$.

• Degree of preference. $A$ represents a set of more or
less preferred objects, and $\mu_A(x)$ represents an intensity
of preference in favor of object $x$.

• Degree of uncertainty. The degree $\mu_A(x)$ can be
viewed as the degree of plausibility that a parameter
$p$ has value $x$, given that all that is known about
it is that $p$ is $A$.


These three semantics of fuzzy sets appear in the
works of Zadeh and he was the first to propose each of
them.




A fuzzy set on $X$ (or a fuzzy subset of $X$) is defined as
the collection of the ordered pairs of elements of $X$ and
their membership grades. Practically, a fuzzy set $A$ on
$X$ is identified with the membership function $\mu_A$. The
family of all fuzzy subsets of $X$ is denoted by $\mathcal{F}(X)$.
Classical subsets of $X$ are special fuzzy subsets of $X$,
and are called crisp sets. Note that one may represent
membership grades not only by the unit interval but also
by a (partially or completely) ordered set.

Given two fuzzy subsets $A$ and $B$ of $X$, we say that
$A$ is equal to $B$ (in symbols $A = B$) if $\mu_A = \mu_B$, and that
$A$ is a subset of $B$ (in symbols $A \subseteq B$) if $\mu_A \le \mu_B$.
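These definitions translate directly into code when the universe is finite. The sketch below is only an illustration: the membership function for the vague term young, its breakpoints at 25 and 45 years, and all function names are hypothetical choices made for the example, not part of the chapter.

```python
def young(age):
    """Hypothetical membership function for the vague term 'young'."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def very_young(age):
    """A second fuzzy set, obtained here by squaring the grades of 'young'."""
    return young(age) ** 2


def crisp(subset):
    """Membership function of a classical set: its characteristic function."""
    return lambda x: 1.0 if x in subset else 0.0


def is_subset(universe, mu_a, mu_b):
    """A is a subset of B if mu_a(x) <= mu_b(x) for every x in the universe."""
    return all(mu_a(x) <= mu_b(x) for x in universe)


def is_equal(universe, mu_a, mu_b):
    """A equals B if the two membership functions agree on the universe."""
    return all(mu_a(x) == mu_b(x) for x in universe)


ages = range(0, 101)
teenagers = crisp(set(range(13, 20)))
```

With these definitions, `is_subset(ages, very_young, young)` holds because squaring a number in the unit interval never increases it, and the crisp set of teenagers is a fuzzy subset of young since every teenager has grade 1 there.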


As is emphasized in [10.11], membership functions
express a vertical view of fuzzy sets. Another view
is to consider a fuzzy set as a nested family of classical
sets, by using the notion of $\alpha$-cuts. For any $\alpha \in [0, 1]$ we
can introduce the $\alpha$-cut $A_\alpha$ of a fuzzy set $A$. By definition,
$A_\alpha$ is the crisp subset of $X$ that contains all the
elements of $X$ that have a membership grade greater
than or equal to the specified value $\alpha$. More formally,
$A_\alpha = \{x \in X \mid \mu_A(x) \ge \alpha\}$, $\alpha \in [0, 1]$.

In the sequel, membership functions and fuzzy sets
will be denoted by the same symbol: we write simply
$A(x)$ instead of $\mu_A(x)$ for $A \in \mathcal{F}(X)$ and $x \in X$.


10.3 Connectives in Fuzzy Logic


In order to generalize the classical set-theoretical operations
like intersection, union and complement, it is
quite natural to use interpretations of the logic connectives
$\wedge$, $\vee$ and $\neg$, respectively. Indeed, the values $(A \cap B)(x)$,
$(A \cup B)(x)$ and $A^c(x)$ describe the truth values of the
statements "$x$ is an element of $A$ AND $x$ is an element of $B$",
"$x$ is an element of $A$ OR $x$ is an element of $B$", and "$x$ is NOT
an element of $A$", respectively.

We introduce appropriate classes of functions
$N: [0, 1] \to [0, 1]$, $T: [0, 1]^2 \to [0, 1]$ and $S: [0, 1]^2 \to [0, 1]$
in order to interpret the logic operations $\neg$, $\wedge$ and
$\vee$, respectively, on the evaluation set $[0, 1]$. In addition,
fuzzy implications are also introduced later on.

Then, the complement of a fuzzy set $A$, the intersection
and the union of fuzzy sets $A$, $B$ are specified by the
functions $N$, $T$ and $S$, respectively, such that

$$A^c_N(x) = N(A(x)),$$
$$(A \cap_T B)(x) = T(A(x), B(x)),$$
$$(A \cup_S B)(x) = S(A(x), B(x)), \qquad (10.1)$$

where $x \in X$, $A, B \in \mathcal{F}(X)$. Therefore, desired properties
of fuzzy set-theoretic (or equivalently, logic) operations
can be obtained through the corresponding
properties of the above functions $N$, $T$ and $S$.


10.3.1 Negations


Starting with the negation $\neg$, it is clear that its interpretation
should map 1 to 0 and 0 to 1, in order to
be an extension of the interpretation of the classical
two-valued negation. Another natural property is that
the interpretation of the negation $\neg$ be a non-increasing
function. To simplify notation, and since there is no
confusion possible, an interpretation of the negation $\neg$
will also be called a negation.


Definition 10.1

A decreasing (i.e., non-increasing) function $N: [0, 1] \to [0, 1]$ with $N(0) = 1$,
$N(1) = 0$ is called a negation. A strictly decreasing
continuous negation is called a strict negation. A strict
negation $N$ is said to be a strong negation if $N$ is also
involutive: $N(N(x)) = x$ holds for all $x \in [0, 1]$.


Since a strict negation $N$ is a strictly decreasing and
continuous function, its inverse $N^{-1}$ is also a strict
negation, generally different from $N$. Obviously, we
have $N^{-1} = N$ if and only if $N$ is involutive, i.e., $N(N(x)) = x$
holds for all $x \in [0, 1]$. This means that the graph of
the function $N$ is symmetric with respect to the line
$\{(x, y) \mid x = y\}$.

Another important property of a strict negation $N$
is that there exists a unique value $0 < \nu < 1$ such that
$N(\nu) = \nu$. Then we also have $N^{-1}(\nu) = \nu$.

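A negation which is neither strong nor strict is the
Gödel (or intuitionistic) negation given by

$$N_G(x) = \begin{cases} 1 & \text{if } x = 0,\\ 0 & \text{if } x \in\, ]0, 1]. \end{cases}$$

By duality, we can define the dual Gödel negation as
follows

$$N_{dG}(x) = \begin{cases} 1 & \text{if } x \in [0, 1[,\\ 0 & \text{if } x = 1. \end{cases}$$

It is easy to see that for any negation $N$ we have

$$N_G \le N \le N_{dG}.$$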


A strict but not strong negation can be given by $N(x) = 1 - x^2$.

A parametric family of strong negations is defined
as follows (see [10.12] under the name $\lambda$-complement)

$$N_\lambda(x) = \frac{1 - x}{1 + \lambda x}, \quad \lambda > -1.$$


The standard negation $N_s$ is defined simply as $N_s(x) = 1 - x$, $x \in [0, 1]$. This is the most frequently used negation, which is obviously a strong negation. It plays a key role in the representation of strong negations presented in the following theorem. In this chapter we call a continuous, strictly increasing function $\varphi : [0, 1] \to [0, 1]$ with $\varphi(0) = 0$, $\varphi(1) = 1$ an automorphism of the unit interval.


Theorem 10.1 

A function $N : [0, 1] \to [0, 1]$ is a strong negation if and only if there exists an automorphism $\varphi$ of the unit interval such that [10.13]

$$N(x) = \varphi^{-1}(1 - \varphi(x)), \qquad x \in [0, 1]. \tag{10.2}$$


In this case $N_\varphi$ denotes $N$ in (10.2) and is called a $\varphi$-transform of the standard negation. If the complement of fuzzy sets on $X$ is defined by $N_\varphi$, we use the short notation $A^c_\varphi$ for $A \in \mathcal{F}(X)$, instead of writing $A^{N_\varphi}$.
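The representation (10.2) can be tried out numerically. The sketch below builds $N_\varphi$ from an automorphism and its inverse, and checks that the automorphism $\varphi(x) = \ln(1 + \lambda x) / \ln(1 + \lambda)$ reproduces the $\lambda$-complement $(1 - x)/(1 + \lambda x)$. This particular generator and the value $\lambda = 2$ are assumptions chosen for the example; the identity follows by direct substitution.

```python
import math

def phi_negation(phi, phi_inv):
    """Build N_phi(x) = phi^{-1}(1 - phi(x)) from an automorphism and its inverse."""
    return lambda x: phi_inv(1.0 - phi(x))

lam = 2.0  # illustrative parameter; any lam > -1 with lam != 0 fits this generator

def phi(x):
    """Automorphism of [0, 1]: phi(0) = 0, phi(1) = 1, strictly increasing."""
    return math.log(1.0 + lam * x) / math.log(1.0 + lam)

def phi_inv(y):
    """Inverse of phi."""
    return ((1.0 + lam) ** y - 1.0) / lam

def sugeno(x):
    """Closed form of the lambda-complement."""
    return (1.0 - x) / (1.0 + lam * x)

n_phi = phi_negation(phi, phi_inv)
grid = [i / 10 for i in range(11)]

# Largest deviation between the phi-transform and the closed form on the grid.
max_gap = max(abs(n_phi(x) - sugeno(x)) for x in grid)
# Largest deviation from involutivity, N(N(x)) = x, on the grid.
involutive_gap = max(abs(n_phi(n_phi(x)) - x) for x in grid)
```

Both gaps should vanish up to floating-point error, in line with Theorem 10.1.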


10.3.2 Triangular Norms and Conorms 


It is assumed that the conjunction $\wedge$, which is always in the tuple of connectives, is interpreted by a t-norm, which, in a canonical way, is a generalization of the interpretation of the conjunction in Boolean logic. In a logical sense, a t-conorm is an ideal candidate for the interpretation of the disjunction $\vee$, since it is a canonical extension of the interpretation of the two-valued disjunction. This is clear from the following definition.


Definition 10.2 

A triangular norm (shortly: a t-norm) is a function $T : [0, 1]^2 \to [0, 1]$ which is associative, commutative and increasing, and satisfies the boundary condition $T(1, x) = x$ for all $x \in [0, 1]$.

A triangular conorm (shortly: a t-conorm) is a function $S : [0, 1]^2 \to [0, 1]$ which is associative, commutative and increasing, with boundary condition $S(0, x) = x$ for all $x \in [0, 1]$.


The class of t-norms (with slightly different axioms) was introduced by Menger [10.14] in the theory of statistical (probabilistic) metric spaces as a tool for generalizing the classical triangle inequality (see also Schweizer and Sklar [10.9], Alsina et al. [10.15], and Klement et al. [10.10]).

Notice that continuity of a t-norm and a t-conorm is not taken for granted. Moreover, the conditions in Definition 10.2 do not even imply that all t-norms, as two-place functions, are measurable (see [10.16] for a counter-example).

However, the definition implies the following properties

$$T(x, y) \le \min(x, y), \qquad S(x, y) \ge \max(x, y) \qquad (x, y \in [0, 1]),$$

and

$$T(0, y) = 0, \qquad S(1, y) = 1 \qquad \text{for all } y \in [0, 1].$$


The smallest t-norm is the drastic product $T_D$ given by

$$T_D(x, y) = \begin{cases} 0 & \text{if } (x, y) \in [0, 1[^2 \\ \min(x, y) & \text{otherwise.} \end{cases}$$

The greatest (and the only idempotent) t-norm is obviously $T_M = \min$, the minimum t-norm. Thus, for any t-norm $T$ we have

$$T_D \le T \le T_M .$$

The smallest and greatest t-norms ($T_D$ and $T_M$) together with the product t-norm $T_P(x, y) = xy$, and the Łukasiewicz t-norm $T_L$ given by

$$T_L(x, y) = \max(0, x + y - 1)$$

are called basic t-norms.
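The four basic t-norms are easy to write down directly. The following sketch, with function names chosen for this example, also checks on a grid the bounds $T_D \le T \le T_M$ stated above, together with the well-known ordering $T_L \le T_P$ (which follows from $(1 - x)(1 - y) \ge 0$).

```python
def t_min(x, y):
    """Minimum t-norm T_M."""
    return min(x, y)

def t_prod(x, y):
    """Product t-norm T_P."""
    return x * y

def t_luk(x, y):
    """Lukasiewicz t-norm T_L."""
    return max(0.0, x + y - 1.0)

def t_drastic(x, y):
    """Drastic product T_D: min(x, y) if one argument equals 1, else 0."""
    return min(x, y) if max(x, y) == 1.0 else 0.0

grid = [i / 20 for i in range(21)]
eps = 1e-12  # tolerance for floating-point comparisons

# Pointwise ordering T_D <= T_L <= T_P <= T_M on the grid.
ordered = all(
    t_drastic(x, y) <= t_luk(x, y) + eps
    and t_luk(x, y) <= t_prod(x, y) + eps
    and t_prod(x, y) <= t_min(x, y) + eps
    for x in grid for y in grid
)
```

A grid check of this kind is only a sanity test, not a proof of the inequalities.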
The first known left-continuous but not continuous t-norm is the nilpotent minimum [10.17] defined by

$$T_{nM}(x, y) = \begin{cases} 0 & \text{if } x + y \le 1 \\ \min(x, y) & \text{otherwise.} \end{cases}$$

A t-conorm $S$ is called the dual to the t-norm $T$ if $S(x, y) = 1 - T(1 - x, 1 - y)$ holds for all $x, y \in [0, 1]$.

The t-conorm $S_M = \max$ is the smallest t-conorm and it is dual to the greatest t-norm $\min$. The dual to the drastic product is the t-conorm $S_D$ given by

$$S_D(x, y) = \begin{cases} 1 & \text{if } (x, y) \in \,]0, 1]^2 \\ \max(x, y) & \text{otherwise.} \end{cases}$$

For each t-conorm $S$, we have $S_M \le S \le S_D$.


Basics of Fuzzy Sets | 10.3 Connectives in Fuzzy Logic 


The dual t-conorm to the product $T_P$ is called the probabilistic sum and it is denoted by $S_P$, with $S_P(x, y) = x + y - xy$.

The Łukasiewicz t-conorm $S_L$, also called the bounded sum, is given by $S_L(x, y) = \min(1, x + y)$.

From an algebraic point of view, a function $T : [0, 1]^2 \to [0, 1]$ is a t-norm if and only if $([0, 1], T, \le)$ is a fully ordered commutative semigroup with neutral element 1 and annihilator 0. Similarly, a function $S : [0, 1]^2 \to [0, 1]$ is a t-conorm if and only if $([0, 1], S, \le)$ is a fully ordered commutative semigroup with neutral element 0 and annihilator 1.

Clearly, for every t-norm $T$ and strict negation $N$, the operation $S$ defined by

$$S(x, y) = N^{-1}(T(N(x), N(y))), \qquad x, y \in [0, 1] \tag{10.3}$$

is a t-conorm. In addition, if $N$ is a strong negation then $N^{-1} = N$, and we have for $x, y \in [0, 1]$ that $T(x, y) = N(S(N(x), N(y)))$. In this case $S$ and $T$ are called $N$-duals. In case of the standard negation (i.e., when $N = N_s$) we simply speak about duals. Obviously, equality (10.3) expresses De Morgan's law.
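Equation (10.3) translates directly into a higher-order function. The sketch below (helper names are assumptions for this example) builds the dual t-conorm from a t-norm and a negation, and compares the duals of the product and Łukasiewicz t-norms under the standard negation with the probabilistic sum and the bounded sum.

```python
def n_dual(t_norm, negation, negation_inv=None):
    """S(x, y) = N^{-1}(T(N(x), N(y))), as in (10.3).

    For a strong negation N^{-1} = N, so `negation_inv` may be omitted.
    """
    inv = negation_inv or negation
    return lambda x, y: inv(t_norm(negation(x), negation(y)))

def standard(x):
    """Standard negation N_s(x) = 1 - x."""
    return 1.0 - x

s_from_product = n_dual(lambda x, y: x * y, standard)
s_from_luk = n_dual(lambda x, y: max(0.0, x + y - 1.0), standard)

grid = [i / 10 for i in range(11)]

# Deviation from the probabilistic sum x + y - xy.
gap_prob_sum = max(abs(s_from_product(x, y) - (x + y - x * y))
                   for x in grid for y in grid)
# Deviation from the bounded sum min(1, x + y).
gap_bounded = max(abs(s_from_luk(x, y) - min(1.0, x + y))
                  for x in grid for y in grid)
```

Passing a strict, non-involutive negation requires supplying its inverse explicitly through `negation_inv`.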


Definition 10.3 

A triplet (T, S, N) is called a De Morgan triplet if and 
only if T is a t-norm, S is a t-conorm, N is a strong 
negation and they satisfy (10.3). 


It is worth noting that, given a De Morgan triplet $(T, S, N)$, the tuple $([0, 1], T, S, N, 0, 1)$ can never be a Boolean algebra [10.18]: in order to satisfy distributivity we must have $T = \min$ and $S = \max$, in which case it is impossible to have both $T(x, N(x)) = 0$ and $S(x, N(x)) = 1$ for all $x \in [0, 1]$. Depending on the operations used, one can, however, obtain rather general and useful structures such as, for instance, De Morgan algebras, residuated lattices, l-monoids, Girard algebras, MV-algebras, see [10.10].

There are several examples of De Morgan triplets. We list in Table 10.1 those that are related to the examples above. Let $\varphi$ be an automorphism of the unit interval, $N_\varphi$ the corresponding strong negation, and $x, y \in [0, 1]$.

A function $K : [0, 1]^2 \to [0, 1]$ will often be called a binary operation on $[0, 1]$. For an automorphism $\varphi$ of $[0, 1]$, the $\varphi$-transform $K_\varphi$ of such a $K$ is defined by $K_\varphi(x, y) = \varphi^{-1}(K(\varphi(x), \varphi(y)))$, $x, y \in [0, 1]$. Thus, Table 10.1 contains $\varphi$-transforms of some fundamental t-norms and t-conorms.


Continuous Archimedean t-Norms and t-Conorms
A broad class of problems consists of the representation of multi-place functions in general by composition of simpler functions and functions of fewer variables (see Ling [10.19] for a brief survey), such as

$$K(x, y) = g(f(x) + f(y)),$$

where $K$ is a two-place function and $f$, $g$ are real functions. In that general framework, the representation of (two-place) associative functions by appropriate one-place functions is a particular problem. It was Abel who first obtained such a representation in 1826 [10.20], by assuming also commutativity, strict monotonicity and differentiability. Since Abel's result, many contributions have been made to representations of associative functions (and, generally speaking, of abstract semigroups).

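For any $x \in [0, 1]$, any $n \in \mathbb{N}$, and any associative binary operation $K$ on $[0, 1]$, denote by $x_K^{(n)}$ the $n$-th power of $x$, defined by

$$x_K^{(1)} = x \qquad \text{and} \qquad x_K^{(n)} = K(\underbrace{x, \ldots, x}_{n\text{-times}}) \quad \text{for } n \ge 2.$$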


Definition 10.4 
A t-norm $T$ (resp. a t-conorm $S$) is said to be:

a) Continuous if $T$ (resp. $S$) as a function is continuous on the unit interval;
b) Archimedean if for each $(x, y) \in \,]0, 1[^2$ there is an $n \in \mathbb{N}$ such that $x_T^{(n)} < y$ (resp. $x_S^{(n)} > y$).

Note that the definition of the Archimedean property is borrowed from the theory of semigroups.


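Table 10.1 Some $N_\varphi$-dual triangular norms and conorms

T | S
$\min(x, y)$ | $\max(x, y)$
$\varphi^{-1}(\varphi(x)\varphi(y))$ | $\varphi^{-1}(\varphi(x) + \varphi(y) - \varphi(x)\varphi(y))$
$\varphi^{-1}(\max(\varphi(x) + \varphi(y) - 1, 0))$ | $\varphi^{-1}(\min(\varphi(x) + \varphi(y), 1))$
$0$ if $\varphi(x) + \varphi(y) \le 1$, $\min(x, y)$ otherwise | $\max(x, y)$ if $\varphi(x) + \varphi(y) < 1$, $1$ otherwise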




We state here the representation theorem of continuous Archimedean t-norms and t-conorms, very often attributed to Ling [10.19]. In fact, her main theorem can be deduced from previously known results on topological semigroups, see [10.21-23]. Nevertheless, the advantage of Ling's approach is twofold: it treats two different cases in a unified manner and it establishes elementary proofs.


Theorem 10.2 
A t-norm $T$ is continuous and Archimedean if and only if there exists a strictly decreasing and continuous function $t : [0, 1] \to [0, \infty]$ with $t(1) = 0$ such that

$$T(x, y) = t^{(-1)}(t(x) + t(y)) \qquad (x, y \in [0, 1]), \tag{10.4}$$

where $t^{(-1)}$ is the pseudoinverse of $t$ defined by

$$t^{(-1)}(x) = \begin{cases} t^{-1}(x) & \text{if } x \le t(0) \\ 0 & \text{otherwise.} \end{cases}$$

Moreover, representation (10.4) is unique up to a positive multiplicative constant. [10.19]


We say that $T$ is generated by $t$ if $T$ has representation (10.4). In this case $t$ is said to be an additive generator of $T$.
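Representation (10.4) can be sketched in a few lines. The helper below takes a generator, its ordinary inverse and the value $t(0)$, and returns the generated t-norm; the two generators used, $t(x) = 1 - x$ and $t(x) = -\log x$, are standard illustrative choices and not taken from the text above.

```python
import math

def t_norm_from_generator(t, t_inv, t_at_zero):
    """T(x, y) = t^(-1)(t(x) + t(y)) with the pseudoinverse from (10.4)."""
    def pseudo_inverse(u):
        # Ordinary inverse on [0, t(0)], constant 0 beyond t(0).
        return t_inv(u) if u <= t_at_zero else 0.0
    return lambda x, y: pseudo_inverse(t(x) + t(y))

# Generator t(x) = 1 - x with t(0) = 1: yields the Lukasiewicz t-norm.
t_luk = t_norm_from_generator(lambda x: 1.0 - x, lambda u: 1.0 - u, 1.0)

def neg_log(x):
    """Generator t(x) = -log x, extended by t(0) = +infinity."""
    return math.inf if x == 0.0 else -math.log(x)

# Generator t(x) = -log x with t(0) = +inf: yields the product t-norm.
t_prod = t_norm_from_generator(neg_log, lambda u: math.exp(-u), math.inf)
```

Multiplying a generator by a positive constant leaves the generated t-norm unchanged, which mirrors the uniqueness statement of the theorem.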


Theorem 10.3 
A t-conorm $S$ is continuous and Archimedean if and only if there exists a strictly increasing and continuous function $s : [0, 1] \to [0, \infty]$ with $s(0) = 0$ such that

$$S(x, y) = s^{(-1)}(s(x) + s(y)) \qquad (x, y \in [0, 1]), \tag{10.5}$$

where $s^{(-1)}$ is the pseudoinverse of $s$ defined by

$$s^{(-1)}(x) = \begin{cases} s^{-1}(x) & \text{if } x \le s(1) \\ 1 & \text{otherwise.} \end{cases}$$

Moreover, representation (10.5) is unique up to a positive multiplicative constant. [10.19]


We say that a continuous Archimedean t-conorm $S$ is generated by $s$ if $S$ has representation (10.5). In this case $s$ is said to be an additive generator of $S$.
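As a worked illustration of (10.5), consider two generators chosen here as examples: $s(x) = x$ and $s(x) = -\log(1 - x)$. Substituting them into the representation recovers the bounded sum and the probabilistic sum, respectively.

```latex
% Example 1: s(x) = x, so s(1) = 1 and s^{-1}(u) = u for u \le 1.
S(x, y) = s^{(-1)}(x + y)
        = \begin{cases} x + y & \text{if } x + y \le 1 \\ 1 & \text{otherwise} \end{cases}
        = \min(1, x + y) = S_L(x, y).

% Example 2: s(x) = -\log(1 - x), so s(1) = +\infty and s^{-1}(u) = 1 - e^{-u}.
s(x) + s(y) = -\log\bigl((1 - x)(1 - y)\bigr),
\qquad
S(x, y) = 1 - (1 - x)(1 - y) = x + y - xy = S_P(x, y).
```

In the first example the pseudoinverse truncates at $s(1) = 1$; in the second, $s(1)$ is infinite and no truncation occurs.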

Note that Aczél published the representation of strictly increasing, continuous and associative two-place functions on open or half-open real intervals [10.24-26]. This was the starting point that Ling generalized in [10.19].


Definition 10.5 

We say that a t-norm $T$ has zero divisors if there exist $x, y \in \,]0, 1[$ such that $T(x, y) = 0$. $T$ is said to be positive if $x, y > 0$ imply $T(x, y) > 0$. A t-norm $T$ or a t-conorm $S$ is called strict if it is a continuous and strictly increasing function in each place on $]0, 1[^2$. $T$ is called nilpotent if it is continuous and Archimedean with zero divisors. Triangular conorms which are duals of nilpotent t-norms are also called nilpotent.


The representation theorem of t-norms (resp. t-conorms) does not impose any condition on the value of a generator function at 0 (resp. at 1). On the basis of this value, one can classify continuous Archimedean t-norms (resp. t-conorms), as stated in the following theorem.


Theorem 10.4 

Let $T$ be a continuous Archimedean t-norm with additive generator $t$, and $S$ be a continuous Archimedean t-conorm with additive generator $s$. Then:

a) $T$ is nilpotent if and only if $t(0) < +\infty$;
b) $T$ is strict if and only if $t(0) = \lim_{x \to 0} t(x) = +\infty$;
c) $S$ is nilpotent if and only if $s(1) < +\infty$;
d) $S$ is strict if and only if $s(1) = \lim_{x \to 1} s(x) = +\infty$.


Using the general representation theorem of continuous Archimedean t-norms, we can give another form of representation for a class of continuous t-norms with zero divisors: more exactly, for continuous t-norms $T$ such that $T(x, N(x)) = 0$ holds with a strict negation $N$ for all $x \in [0, 1]$. Such t-norms are Archimedean, as was proved in [10.27]. The following theorem is established after [10.28].


Theorem 10.5 

A continuous t-norm $T$ is such that $T(x, N(x)) = 0$ holds for all $x \in [0, 1]$ with a strict negation $N$ if and only if there exists an automorphism $\varphi$ of the unit interval such that for all $x, y \in [0, 1]$ we have

$$T(x, y) = \varphi^{-1}(\max\{\varphi(x) + \varphi(y) - 1, 0\})$$

and

$$N(x) \le \varphi^{-1}(1 - \varphi(x)). \tag{10.6}$$




As a consequence, we obtain that any nilpotent t-norm is isomorphic to (i.e., is a $\varphi$-transform of) the Łukasiewicz t-norm $T_L(x, y) = \max(x + y - 1, 0)$. Similarly, any strict t-norm $T$ is isomorphic to the algebraic product: there is an automorphism $\varphi$ of $[0, 1]$ such that $T(x, y) = \varphi^{-1}(\varphi(x)\varphi(y))$, for all $x, y \in [0, 1]$.

Similar statements can be proved for t-conorms, see [10.29] for more details. For instance, any strict t-conorm is isomorphic to the probabilistic sum $S_P$, and any nilpotent t-conorm is isomorphic to the bounded sum $S_L$.
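The isomorphism statements above can be made concrete with a small sketch of the $\varphi$-transform. The automorphism $\varphi(x) = x^2$ is an assumption chosen for illustration; it turns the Łukasiewicz t-norm into the nilpotent t-norm $\sqrt{\max(x^2 + y^2 - 1, 0)}$, while leaving the product unchanged.

```python
import math

def phi_transform(k, phi, phi_inv):
    """K_phi(x, y) = phi^{-1}(K(phi(x), phi(y))) for a binary operation K on [0, 1]."""
    return lambda x, y: phi_inv(k(phi(x), phi(y)))

def square(x):
    """Illustrative automorphism phi(x) = x^2; its inverse is math.sqrt."""
    return x * x

def t_luk(x, y):
    """Lukasiewicz t-norm."""
    return max(0.0, x + y - 1.0)

def t_prod(x, y):
    """Product t-norm."""
    return x * y

# Nilpotent t-norm isomorphic to T_L: sqrt(max(x^2 + y^2 - 1, 0)).
t_nilpotent = phi_transform(t_luk, square, math.sqrt)
# The same automorphism maps the product onto itself: sqrt(x^2 * y^2) = x * y.
t_prod_transformed = phi_transform(t_prod, square, math.sqrt)
```

Here `t_nilpotent(0.5, 0.5)` is zero although both arguments are positive, so the transformed t-norm has zero divisors, as a nilpotent t-norm must.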


Continuous t-Norms and t-Conorms
Suppose that $\{[\alpha_m, \beta_m]\}_{m \in M}$ is a countable family of non-overlapping, closed, proper subintervals of $[0, 1]$. With each $[\alpha_m, \beta_m]$ associate a continuous Archimedean t-norm $T_m$. Let $T$ be a function defined on $[0, 1]^2$ by

$$T(x, y) = \begin{cases} \alpha_m + (\beta_m - \alpha_m)\, T_m\!\left(\dfrac{x - \alpha_m}{\beta_m - \alpha_m}, \dfrac{y - \alpha_m}{\beta_m - \alpha_m}\right) & \text{if } (x, y) \in [\alpha_m, \beta_m]^2 \\ \min(x, y) & \text{otherwise.} \end{cases} \tag{10.7}$$

Then $T$ is a continuous t-norm. In this case $T$ is called the ordinal sum of $\{([\alpha_m, \beta_m], T_m)\}_{m \in M}$ and each $T_m$ is called a summand.

A similar construction works for t-conorms $S$. Just replace $T_m$ with a continuous Archimedean t-conorm $S_m$, and $\min$ with $\max$ in (10.7). The function $S$ thus defined is a continuous t-conorm, called the ordinal sum of $\{([\alpha_m, \beta_m], S_m)\}_{m \in M}$, where each $S_m$ is called a summand.

Assume now that $T$ is a continuous t-norm. Then, $T$ is either the minimum, or $T$ is Archimedean, or there exists a family $\{([\alpha_m, \beta_m], T_m)\}_{m \in M}$ with continuous Archimedean summands $T_m$ such that $T$ is equal to the ordinal sum of this family, see [10.19, 22]. It has also been proved there that a continuous t-conorm $S$ is either the maximum, or Archimedean, or there exists a family $\{([\alpha_m, \beta_m], S_m)\}_{m \in M}$ with continuous Archimedean summands $S_m$ such that $S$ is the ordinal sum of this family.
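The ordinal sum construction (10.7) can be sketched as follows. The summands and intervals used here (product on $[0, 0.5]$, Łukasiewicz on $[0.5, 1]$) are assumptions made for the example, and the list-of-tuples interface is hypothetical.

```python
def ordinal_sum(summands):
    """Ordinal sum of t-norms following (10.7).

    `summands` is a list of (a, b, T_m) with non-overlapping intervals [a, b]
    inside [0, 1] and a < b.
    """
    def t(x, y):
        for a, b, t_m in summands:
            if a <= x <= b and a <= y <= b:
                # Rescale to [0, 1], apply the summand, and scale back.
                return a + (b - a) * t_m((x - a) / (b - a), (y - a) / (b - a))
        # Outside every square [a, b]^2 the ordinal sum is the minimum.
        return min(x, y)
    return t

def t_prod(x, y):
    return x * y

def t_luk(x, y):
    return max(0.0, x + y - 1.0)

# Product on [0, 0.5] and Lukasiewicz on [0.5, 1] (illustrative choice).
t = ordinal_sum([(0.0, 0.5, t_prod), (0.5, 1.0, t_luk)])
```

For instance, `t(0.25, 0.25)` evaluates the rescaled product, `t(0.75, 0.75)` the rescaled Łukasiewicz t-norm, and `t(0.25, 0.75)` falls back to the minimum.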


Parametric Families of Triangular Norms 
We close this subsection by giving a taste of the wide variety of parametric t-norm families. For a comprehensive list with detailed properties, see [10.10].


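Frank t-norms $\{T^F_\lambda\}_{\lambda \in [0, \infty]}$. Let $\lambda > 0$, $\lambda \ne 1$ be a real number. Define a continuous Archimedean t-norm $T^F_\lambda$ in the following way

$$T^F_\lambda(x, y) = \log_\lambda\left(1 + \frac{(\lambda^x - 1)(\lambda^y - 1)}{\lambda - 1}\right) \qquad (x, y \in [0, 1]).$$

We can extend this definition for $\lambda = 0$, $\lambda = 1$ and $\lambda = \infty$ by taking the appropriate limits. Thus we get $T^F_0$, $T^F_1$ and $T^F_\infty$ as follows

$$T^F_0(x, y) = \lim_{\lambda \to 0} T^F_\lambda(x, y) = \min\{x, y\},$$

$$T^F_1(x, y) = \lim_{\lambda \to 1} T^F_\lambda(x, y) = xy,$$

$$T^F_\infty(x, y) = \lim_{\lambda \to \infty} T^F_\lambda(x, y) = \max\{x + y - 1, 0\}.$$

Each $T^F_\lambda$ is a strict t-norm for $\lambda \in \,]0, \infty[$. The corresponding additive generators $t^F_\lambda$ are given by

$$t^F_\lambda(x) = \begin{cases} -\log x & \text{if } \lambda = 1 \\ -\log \dfrac{\lambda^x - 1}{\lambda - 1} & \text{if } \lambda \in \,]0, \infty[, \ \lambda \ne 1. \end{cases}$$

The family $\{T^F_\lambda\}_{\lambda \in [0, \infty]}$ is called the Frank family of t-norms (see Frank [10.30]). Note that members of this family are decreasing functions of the parameter $\lambda$ (see e.g. [10.31]).

The De Morgan law enables us to define the Frank family of t-conorms $\{S^F_\lambda\}_{\lambda \in [0, \infty]}$ by

$$S^F_\lambda(x, y) = 1 - T^F_\lambda(1 - x, 1 - y)$$

for any $\lambda \in [0, \infty]$.
In [10.30] one can find the following interesting characterization of these parametric families.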


Theorem 10.6 
A t-norm $T$ and a t-conorm $S$ satisfy the functional equation

$$T(x, y) + S(x, y) = x + y \qquad (x, y \in [0, 1]) \tag{10.8}$$

if and only if

a) there is a number $\lambda \in [0, \infty]$ such that $T = T^F_\lambda$ and $S = S^F_\lambda$, or

b) $T$ is representable as an ordinal sum of t-norms, each of which is a member of the family $\{T^F_\lambda\}$, $0 < \lambda \le \infty$, and $S$ is obtained from $T$ via (10.8). [10.30]
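The Frank family and the functional equation (10.8) lend themselves to a quick numerical check. The sketch below implements $T^F_\lambda$ and its dual $S^F_\lambda$, treating $\lambda = 0$, $1$ and $\infty$ through their limit forms; the function names and the grid are assumptions made for this example.

```python
import math

def frank(lam):
    """Frank t-norm T^F_lam; lam = 0, 1 and inf use the limit cases."""
    if lam == 0:
        return lambda x, y: min(x, y)
    if lam == 1:
        return lambda x, y: x * y
    if lam == math.inf:
        return lambda x, y: max(0.0, x + y - 1.0)
    # General case: logarithm to base lam of 1 + (lam^x - 1)(lam^y - 1)/(lam - 1).
    return lambda x, y: math.log(
        1.0 + (lam ** x - 1.0) * (lam ** y - 1.0) / (lam - 1.0), lam
    )

def frank_conorm(lam):
    """Dual t-conorm S^F_lam(x, y) = 1 - T^F_lam(1 - x, 1 - y)."""
    t = frank(lam)
    return lambda x, y: 1.0 - t(1.0 - x, 1.0 - y)

grid = [i / 10 for i in range(11)]

def frank_identity_gap(lam):
    """Largest deviation from T(x, y) + S(x, y) = x + y on the grid."""
    t, s = frank(lam), frank_conorm(lam)
    return max(abs(t(x, y) + s(x, y) - (x + y)) for x in grid for y in grid)
```

Calling `frank_identity_gap` for several parameter values should return values at the level of floating-point error, consistent with part a) of Theorem 10.6.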




Hamacher t-norms $\{T^H_\lambda\}_{\lambda \in [0, \infty]}$. Let us define three parameterized families of t-norms, t-conorms and strong negations, respectively, as follows.

$$T^H_\lambda(x, y) = \frac{xy}{\lambda + (1 - \lambda)(x + y - xy)}, \qquad \lambda \ge 0,$$

$$S^H_\beta(x, y) = \frac{x + y + (\beta - 1)xy}{1 + \beta xy}, \qquad \beta \ge -1,$$

$$N_\gamma(x) = \frac{1 - x}{1 + \gamma x}, \qquad \gamma > -1.$$

Hamacher proved the following characterization theorem [10.32].


Theorem 10.7
$(T, S, N)$ is a De Morgan triplet such that

$$T(x, y) = T(x, z) \implies y = z,$$

$$S(x, y) = S(x, z) \implies y = z,$$

$$\forall z \le x \ \ \exists y, y' \ \text{such that} \ T(x, y) = z, \ S(z, y') = x$$

and $T$ and $S$ are rational functions if and only if there are numbers $\lambda > 0$, $\beta > -1$ and $\gamma > -1$ such that $\lambda = \frac{1 + \beta}{1 + \gamma}$ and $T = T^H_\lambda$, $S = S^H_\beta$ and $N = N_\gamma$.


Note that another characterization of the Hamacher family of t-norms with positive parameter has been obtained in [10.33] as solutions of a functional equation.


Dombi t-norms $\{T^D_\lambda\}_{\lambda \in [0, \infty]}$. The formula for this t-norm family is given by

$$T^D_\lambda(x, y) = \begin{cases} T_D(x, y) & \text{if } \lambda = 0 \\ T_M(x, y) & \text{if } \lambda = \infty \\ \dfrac{1}{1 + \left(\left(\frac{1 - x}{x}\right)^\lambda + \left(\frac{1 - y}{y}\right)^\lambda\right)^{1/\lambda}} & \text{if } \lambda \in \,]0, \infty[. \end{cases}$$

Essential properties of these t-norms and other well-known families can be found in [10.31].


10.3.3 Fuzzy Implications


Turning to the interpretation of the implication $\to$ in fuzzy logics, it becomes apparent that there are several logical formulae which, in the Boolean two-valued logic, are equivalent to the implication, but give rise to different interpretations when replacing $\{0, 1\}$ by the unit interval $[0, 1]$. As in the case of the negation, we call an interpretation of the implication $\to$ also an implication (or sometimes, a fuzzy implication). A comprehensive study of fuzzy implications can be found in the book [10.34].

In a very broad sense, any function $I : [0, 1]^2 \to [0, 1]$ which is decreasing in its first and increasing in its second argument (decreasing/increasing, for short) and preserves the values of the crisp implication on $\{0, 1\}$ is considered as a fuzzy implication.


Definition 10.6 

A function $I : [0, 1]^2 \to [0, 1]$ is called a fuzzy implication if and only if it satisfies the following conditions:

I1. $I(0, 0) = I(0, 1) = I(1, 1) = 1$; $I(1, 0) = 0$.

I2. If $x \le z$ then $I(x, y) \ge I(z, y)$ for all $y \in [0, 1]$.

I3. If $y \le t$ then $I(x, y) \le I(x, t)$ for all $x \in [0, 1]$.


The reason behind I1 is obvious, while a fuzzy implication is required to be decreasing/increasing (i.e., I2 and I3 should be satisfied) because it measures the extent to which the consequent is more true than the antecedent [10.35].

Clearly, a fuzzy implication $I$ has the following properties (as a consequence of the definition):


I4. $I(0, x) = 1$ for all $x \in [0, 1]$.
I5. $I(x, 1) = 1$ for all $x \in [0, 1]$.


Note that originally we defined a fuzzy implica- 
tion in a slightly different form [10.29, Definition 1.15], 
which is equivalent to Definition 10.6. 
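Conditions I1-I3 can be checked numerically for a candidate function. The sketch below tests them on a finite grid; it can refute a candidate but cannot prove that one is a fuzzy implication. The Łukasiewicz implication $\min(1, 1 - x + y)$ and the failing candidate $xy$ are illustrative choices, and the function name `is_fuzzy_implication` is hypothetical.

```python
def is_fuzzy_implication(imp, steps=20, eps=1e-12):
    """Grid check of I1-I3 from Definition 10.6 (a necessary test only)."""
    grid = [i / steps for i in range(steps + 1)]
    # I1: boundary values coincide with the crisp implication.
    i1 = imp(0, 0) == imp(0, 1) == imp(1, 1) == 1 and imp(1, 0) == 0
    # I2: decreasing in the first argument.
    i2 = all(imp(x, y) + eps >= imp(z, y)
             for y in grid for x in grid for z in grid if x <= z)
    # I3: increasing in the second argument.
    i3 = all(imp(x, y) <= imp(x, t) + eps
             for x in grid for y in grid for t in grid if y <= t)
    return i1 and i2 and i3

def lukasiewicz(x, y):
    """Illustrative candidate: min(1, 1 - x + y)."""
    return min(1.0, 1.0 - x + y)

def not_an_implication(x, y):
    """Illustrative failing candidate: x * y violates I1 at (0, 0)."""
    return x * y
```

Here `is_fuzzy_implication(lukasiewicz)` returns `True`, while `is_fuzzy_implication(not_an_implication)` returns `False`.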

Further properties may be required for a fuzzy implication; these can also be important in some applications:


I6. $I(1, x) = x$ for all $x \in [0, 1]$. [10.36]

I7. $I(x, I(y, z)) = I(y, I(x, z))$ for all $x, y, z \in [0, 1]$. [10.36]

I8. $x \le y$ if and only if $I(x, y) = 1$ for all $x, y \in [0, 1]$. [10.37]

I9. $N(x) = I(x, 0)$ is a strong negation ($x \in [0, 1]$).

I10. $I(x, y) \ge y$ for all $x, y \in [0, 1]$. [10.38]

I11. $I(x, x) = 1$ for all $x \in [0, 1]$. [10.39]

I12. $I(x, y) = I(N(y), N(x))$ with a strong negation $N$, for all $x, y \in [0, 1]$.

I13. $I$ is a continuous function.


Property I6 expresses that a tautology cannot justify anything. Condition I7 is called the exchange principle, and is based on the following equivalence:


if $P_1$ then (if $P_2$ then $P_3$) $\iff$ if ($P_1$ AND $P_2$) then $P_3$.




I8 expresses that implication defines an ordering, I9 reflects that $P \to Q \equiv \neg P$ if $Q$ is false. I10 is the numerical counterpart of $P \to (Q \to P)$. I11 is called the identity principle and it yields that $P \to P$ is always true. I12, the contraposition law (or in other words, the contrapositive symmetry), expresses a relationship between modus ponens and modus tollens, see [10.35]. In general, this is a strong condition, see [10.17]. I13 prevents implication from reacting in a chaotic way to a small change of the truth value of either the antecedent or the consequent. This is also a fairly restrictive condition.


Fuzzy Implications Defined by t-Norms, t-Conorms and Negations
To be consistent, implications and conjunctions (or implications and disjunctions) cannot be studied independently. Thus, we introduce two particular classes of fuzzy implications based on t-norms, t-conorms and negations. These were identified in [10.36, 40-44].

For a left-continuous t-norm $T$, its $T$-residuum [10.10] $I_T$ generalizes the Boolean implication; we prefer the name R-implication for $I_T$ (see the next definition and the remark after it).

Another way to introduce an implication (called S-implication) which is an extension of the Boolean implication is to exploit the fact that, in a two-valued logic, the formulae $p \to q$ and $\neg p \vee q$ are equivalent.


Definition 10.7 
Suppose $(T, S, N)$ is a De Morgan triplet.

An R-implication $I_T$ associated with the t-norm $T$ is defined by

$$I_T(x, y) = \sup\{z \mid T(x, z) \le y\} \qquad (x, y \in [0, 1]). \tag{10.9}$$

An S-implication $I_{S,N}$ associated with the t-conorm $S$ and the strong negation $N$ is defined by

$$I_{S,N}(x, y) = S(N(x), y) \qquad (x, y \in [0, 1]). \tag{10.10}$$


It is easy to see that both $I_T$ and $I_{S,N}$ satisfy properties I1-I3 for any t-norm $T$, t-conorm $S$ and strong negation $N$, thus they are fuzzy implications. Note that if $T$ is a continuous Archimedean t-norm with additive generator $t$ then

$$I_T(x, y) = t^{-1}(\max\{t(y) - t(x), 0\}) \qquad (x, y \in [0, 1]).$$
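Definitions (10.9) and (10.10) can be sketched numerically. In the code below the supremum in (10.9) is only approximated over a finite grid of candidate values $z$, which is an assumption of this sketch; the closed forms named in the comments (Gödel, Goguen, Łukasiewicz and Kleene-Dienes implications) are the well-known results for the minimum, product and Łukasiewicz t-norms and for $S = \max$ with the standard negation.

```python
def r_implication(t_norm, steps=1000, eps=1e-12):
    """Grid approximation of I_T(x, y) = sup{z | T(x, z) <= y}, see (10.9)."""
    zs = [k / steps for k in range(steps + 1)]
    # z = 0 always qualifies because T(x, 0) = 0, so the maximum exists.
    return lambda x, y: max(z for z in zs if t_norm(x, z) <= y + eps)

def s_implication(s_conorm, negation):
    """I_{S,N}(x, y) = S(N(x), y), see (10.10)."""
    return lambda x, y: s_conorm(negation(x), y)

def t_min(x, y):
    return min(x, y)

def t_prod(x, y):
    return x * y

def t_luk(x, y):
    return max(0.0, x + y - 1.0)

godel = r_implication(t_min)    # closed form: 1 if x <= y, else y
goguen = r_implication(t_prod)  # closed form: min(1, y / x) for x > 0
luk_r = r_implication(t_luk)    # closed form: min(1, 1 - x + y)
kleene_dienes = s_implication(max, lambda x: 1.0 - x)  # max(1 - x, y)
```

Evaluating `godel(x, 0)` gives 1 at `x = 0` and 0 for `x > 0`, while `luk_r(x, 0)` approximates `1 - x`; this illustrates how property I9 may or may not hold depending on the underlying t-norm.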


Let us emphasize an important link between R-implications defined by left-continuous t-norms, and residua in lattice-ordered monoids.

Assume that $L$ is a non-empty set, $(L, \preceq)$ is a lattice and $(L, *)$ is a semigroup with neutral element. We introduce some definitions; for more details see [10.45].


i) The triplet $(L, *, \preceq)$ is called a lattice-ordered monoid (or an l-monoid) if for all $x, y, z \in L$ we have:

LM1) $x * (y \vee z) = (x * y) \vee (x * z)$,
LM2) $(x \vee y) * z = (x * z) \vee (y * z)$.

ii) An l-monoid $(L, *, \preceq)$ is said to be commutative if the semigroup $(L, *)$ is commutative.

iii) A commutative l-monoid $(L, *, \preceq)$ is called a commutative, residuated l-monoid if there exists a further binary operation $\to_*$ on $L$, i.e., a function $\to_* : L^2 \to L$ (the $*$-residuum), such that for all $x, y, z \in L$ we have (R) $x * y \preceq z$ if and only if $x \preceq (y \to_* z)$.

iv) An l-monoid $(L, *, \preceq)$ is called integral if there is a greatest element in the lattice $(L, \preceq)$ (often called the universal upper bound) which coincides with the neutral element of the semigroup $(L, *)$.


It is evident that $([0, 1], T, \le)$ is a commutative integral l-monoid if and only if the function $T : [0, 1]^2 \to [0, 1]$ is a t-norm. It turns out that the left-continuity of a t-norm can be characterized by the fact that the corresponding l-monoid is residuated. In this case the $T$-residuum $I_T$ is given by (10.9), see [10.10]. Because of its interpretation in $[0, 1]$-valued logics, the $T$-residuum is also called a residual implication (or briefly, an R-implication).

Given a left-continuous t-norm $T$, the R-implication $I_T$ is left-continuous in its first and right-continuous in its second argument, and it is continuous if and only if the underlying t-norm is nilpotent [10.29, 46].

For the sake of completeness we mention a third type of connective, used in quantum logic and called QL-implication, defined as follows

$$I_{T,S,N}(x, y) = S(N(x), T(x, y)) \qquad (x, y \in [0, 1]). \tag{10.11}$$

For the idea behind QL-implications, see [10.47]. In general, $I_{T,S,N}$ violates property I2, so it is not a fuzzy implication in the sense of Definition 10.6. Conditions under which I2 is satisfied can be found in [10.17].
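A small example shows how a QL-implication can violate I2. The sketch below uses $T = \min$, $S = \max$ and the standard negation, a triplet chosen here for illustration, so that $I(x, y) = \max(1 - x, \min(x, y))$.

```python
def ql_implication(t_norm, s_conorm, negation):
    """I_{T,S,N}(x, y) = S(N(x), T(x, y)), see (10.11)."""
    return lambda x, y: s_conorm(negation(x), t_norm(x, y))

# Illustrative De Morgan triplet: (min, max, 1 - x).
ql = ql_implication(min, max, lambda x: 1.0 - x)

# With y = 1 fixed the values are 1.0, 0.5, 1.0 for x = 0, 0.5, 1.
values = [ql(x, 1.0) for x in (0.0, 0.5, 1.0)]

# I2 demands that I(x, y) is decreasing in x; here it increases from 0.5 to 1.
violates_i2 = ql(0.5, 1.0) < ql(1.0, 1.0)
```

The boundary values required by I1 still hold for this function, so the failure is confined to the monotonicity condition I2.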




Negations Defined by Implications
As we have seen, several types of negations can be introduced in fuzzy logic. The link between fuzzy implications and negations can be expressed by requiring that the function $N$ defined by

$$N(x) = I(x, 0) \qquad \text{for all } x \in [0, 1],$$

be a negation, where $I$ is a fuzzy implication. This is motivated by the corresponding classical rule.

Suppose that $I : [0, 1]^2 \to [0, 1]$ is a function satisfying I3, I7, I8 and define $N(x) = I(x, 0)$ ($x \in [0, 1]$). Then (a) $N$ is a negation; (b) $x \le N(N(x))$ for all $x \in [0, 1]$; (c) $N(N(N(x))) = N(x)$ for all $x \in [0, 1]$. If, in addition, $N$ is continuous then it is involutive [10.29]. Thus, under the above conditions, $N$ cannot be a noninvolutive strict negation: it is either discontinuous or a strong negation. If $N$ is continuous then $I$ fulfils I12 with this $N$.

For a positive t-norm $T$ (like $\min$ or the algebraic product), the negation obtained via its R-implication is not continuous. In fact, in this case we have that

$$I_T(x, 0) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x > 0. \end{cases}$$


10.4 Concluding Remarks


The study of fuzzy implications, triangular norms and their extensions is a never-ending story. During such research, fundamental new properties and classes have been discovered, and essential results have been proved and applied to diverse problem classes. These are beyond the scope of the present chapter. Nevertheless, we name just a few directions.

Firstly, we mention uninorms [10.48,49], a joint extension of both t-norms and t-conorms, with neutral element being an arbitrary number between 0 and 1. Their study includes the Frank functional equation [10.50], their residual operators [10.51], different extensions [10.52,53], characterizing their mathematical properties [10.54] and important subclasses such as idempotent [10.55] and representable uninorms [10.56]. It turns out that some special uninorms were already hidden, without using the name uninorm, in the classical expert system MYCIN [10.57].

Secondly, some recent papers on fuzzy implications are briefly listed, in which the interested reader can find further references. After the book [10.34] was published, several important contributions have been made by the authors themselves, like [10.58]. Some algebraic properties of fuzzy implications such as distributivity [10.59] and contrapositive symmetry [10.60], the law of importation or the exchange principle [10.61], typically in the form of functional equations, have also been studied intensively. New construction methods have also been introduced and deeply studied [10.62-65].


References 
10.1 L. Zadeh: Fuzzy sets, Inf. Control 8, 338-353 (1965)
10.2 C. Hartshorne, P. Weiss (Eds.): Principles of Philosophy, Collected Papers of Charles Sanders Peirce (Harvard University Press, Cambridge 1931)
10.3 B. Russell: Vagueness, Austr. J. Philos. 1, 84-92 (1923)
10.4 J. Łukasiewicz: Philosophical remarks on many-valued systems of propositional logic. In: Selected Works, Studies in Logic and the Foundations of Mathematics, ed. by L. Borkowski (North-Holland, Amsterdam 1970) pp. 153-179
10.5 M. Black: Vagueness, Philos. Sci. 4, 427-455 (1937)
10.6 H. Weyl: The ghost of modality. In: Philosophical Essays in Memory of Edmund Husserl, ed. by M. Farber (Cambridge, Cambridge 1940) pp. 278-303
10.7 A. Kaplan, H.F. Schott: A calculus for empirical classes, Methods III, 165-188 (1951)
10.8 K. Menger: Ensembles flous et fonctions aléatoires, C. R. Acad. Sci. Paris 232, 2001-2003 (1951)
10.9 B. Schweizer, A. Sklar: Probabilistic Metric Spaces (North-Holland, Amsterdam 1983)
10.10 E.P. Klement, R. Mesiar, E. Pap: Triangular Norms (Kluwer, Dordrecht 2000)
10.11 D. Dubois, W. Ostasiewicz, H. Prade: Fuzzy sets: History and basic notions. In: Fundamentals of Fuzzy Sets (Kluwer, Dordrecht 2000), Chap. 1, pp. 21-124
10.12 M. Sugeno: Fuzzy measures and fuzzy integrals: A survey. In: Fuzzy Automata and Decision Processes, ed. by G.N. Saridis, M.M. Gupta, B.R. Gaines (North-Holland, Amsterdam 1977) pp. 89-102


10.13 E. Trillas: Sobre funciones de negación en la teoría de conjuntos difusos, Stochastica III, 47-60 (1979)
10.14 K. Menger: Statistical metric spaces, Proc. Natl. Acad. Sci. USA 28, 535-537 (1942)
10.15 C. Alsina, M.J. Frank, B. Schweizer: Associative Functions: Triangular Norms and Copulas (World Scientific, Hoboken 2006)
10.16 E.P. Klement: Operations on fuzzy sets - An axiomatic approach, Inf. Sci. 27, 221-232 (1982)
10.17 J.C. Fodor: Contrapositive symmetry of fuzzy implications, Fuzzy Sets Syst. 69, 141-156 (1995)
10.18 R. Sikorski: Boolean Algebras (Springer, Berlin 1964)
10.19 C.H. Ling: Representation of associative functions, Publ. Math. Debr. 12, 189-212 (1965)
10.20 N.H. Abel: Untersuchung der Functionen zweier unabhängig veränderlichen Grössen x und y wie f(x, y), welche die Eigenschaft haben, dass f(z, f(x, y)) eine symmetrische Function von x, y und z ist, J. Reine Angew. Math. 1, 11-15 (1826)
10.21 W.M. Faucett: Compact semigroups irreducibly connected between two idempotents, Proc. Am. Math. Soc. 6, 741-747 (1955)
10.22 P.S. Mostert, A.L. Shields: On the structure of semigroups on a compact manifold with boundary, Ann. Math. 65, 117-143 (1957)
10.23 A.B. Paalman-de Miranda: Topological Semigroups, Technical Report (Mathematisch Centrum, Amsterdam 1964)
10.24 J. Aczél: Über eine Klasse von Funktionalgleichungen, Comment. Math. Helv. 54, 247-256 (1948)
10.25 J. Aczél: Sur les opérations définies pour des nombres réels, Bull. Soc. Math. Fr. 76, 59-64 (1949)
10.26 J. Aczél: Lectures on Functional Equations and their Applications (Academic, New York 1966)
10.27 S. Ovchinnikov, M. Roubens: On fuzzy strict preference, indifference and incomparability relations, Fuzzy Sets Syst. 47, 313-318 (1992)
10.28 S. Ovchinnikov, M. Roubens: On strict preference relations, Fuzzy Sets Syst. 43, 319-326 (1991)
10.29 J. Fodor, M. Roubens: Fuzzy Preference Modelling and Multicriteria Decision Support (Kluwer, Dordrecht 1994)
10.30 M.J. Frank: On the simultaneous associativity of F(x, y) and x + y - F(x, y), Aeq. Math. 19, 194-226 (1979)
10.31 E.P. Klement, R. Mesiar, E. Pap: A characterization of the ordering of continuous t-norms, Fuzzy Sets Syst. 86, 189-195 (1997)
10.32 H. Hamacher: Über logische Aggregationen nicht-binär explizierter Entscheidungskriterien; Ein axiomatischer Beitrag zur normativen Entscheidungstheorie (Fischer, Frankfurt 1978)
10.33 J.C. Fodor, T. Keresztfalvi: Characterization of the Hamacher family of t-norms, Fuzzy Sets Syst. 65, 51-58 (1994)
10.34 M. Baczyński, B. Jayaram: Fuzzy Implications (Springer, Berlin 2008)


10.35 P. Smets, P. Magrez: Implication in fuzzy logic, Int. J. Approx. Reason. 1, 327-347 (1987)
10.36 E. Trillas, L. Valverde: On some functionally expressable implications for fuzzy set theory, Proc. 3rd Int. Seminar on Fuzzy Set Theory (Johannes Kepler Universität, Linz 1981) pp. 173-190
10.37 B.R. Gaines: Foundations of fuzzy reasoning, Int. J. Man-Mach. Stud. 8, 623-668 (1976)
10.38 R.R. Yager: An approach to inference in approximate reasoning, Int. J. Man-Mach. Stud. 13, 323-338 (1980)
10.39 W. Bandler, L.J. Kohout: Fuzzy power sets and fuzzy implication operators, Fuzzy Sets Syst. 4, 13-30 (1980)
10.40 E. Trillas, L. Valverde: On implication and indistinguishability in the setting of fuzzy logic. In: Management Decision Support Systems using Fuzzy Sets and Possibility Theory, ed. by J. Kacprzyk, R.R. Yager (Verlag TÜV Rheinland, Köln 1985) pp. 198-212
10.41 H. Prade: Modèles mathématiques de l'imprécis et de l'incertain en vue d'applications au raisonnement naturel, Ph.D. Thesis (Université P. Sabatier, Toulouse 1982)
10.42 S. Weber: A general concept of fuzzy connectives, negations and implications based on t-norms and t-conorms, Fuzzy Sets Syst. 11, 115-134 (1983)
10.43 D. Dubois, H. Prade: Fuzzy logics and the generalized modus ponens revisited, Int. J. Cybern. Syst. 15, 293-331 (1984)
10.44 D. Dubois, H. Prade: Fuzzy set-theoretic differences and inclusions and their use in the analysis of fuzzy equations, Control Cybern. 13, 129-145 (1984)
10.45 G. Birkhoff: Lattice Theory, Collected Publications, Vol. 25 (Am. Math. Soc., Providence 1967)
10.46 U. Bodenhofer: A Similarity-Based Generalization of Fuzzy Orderings, Schriften der Johannes-Kepler-Universität Linz, Vol. 26 (Universitätsverlag Rudolf Trauner, Linz 1999)
10.47 D. Dubois, H. Prade: Fuzzy sets in approximate reasoning, part 1: Inference with possibility distributions, Fuzzy Sets Syst. 40, 143-202 (1991)
10.48 R.R. Yager, A. Rybalov: Uninorm aggregation operators, Fuzzy Sets Syst. 80, 111-120 (1996)
10.49 J.C. Fodor, R.R. Yager, A. Rybalov: Structure of uninorms, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 5(4), 411-427 (1997)
10.50 T. Calvo, B. De Baets, J. Fodor: The functional equations of Frank and Alsina for uninorms and nullnorms, Fuzzy Sets Syst. 120, 385-394 (2001)
10.51 B. De Baets, J. Fodor: Residual operators of uninorms, Soft Comput. 3, 89-100 (1999)
10.52 M. Mas, G. Mayor, J. Torrens: T-operators and uninorms on a finite totally ordered set, Int. J. Intell. Syst. 14, 909-922 (1999)
10.53 M. Mas, M. Monserrat, J. Torrens: On left and right uninorms, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 9, 491-507 (2001)


10.54 M. Mas, G. Mayor, J. Torrens: The modularity condition for uninorms and t-operators, Fuzzy Sets Syst. 126, 207-218 (2002)

10.55 B. De Baets: Idempotent uninorms, Eur. J. Oper. Res. 118, 631-642 (1999)

10.56 J. Fodor, B. De Baets: A single-point characterization of representable uninorms, Fuzzy Sets Syst. 202, 89-99 (2012)

10.57 B. De Baets, J.C. Fodor: Van Melle's combining function in MYCIN is a representable uninorm: An alternative proof, Fuzzy Sets Syst. 104, 133-136 (1999)

10.58 B. Jayaram, M. Baczyński, R. Mesiar: R-implications and the exchange principle: The case of border continuous t-norms, Fuzzy Sets Syst. 224, 93-105 (2013)

10.59 M. Baczyński: On two distributivity equations for fuzzy implications and continuous, Archimedean t-norms and t-conorms, Fuzzy Sets Syst. 211, 34-54 (2013)

10.60 M. Baczyński, F. Qin: Some remarks on the distributive equation of fuzzy implication and the contrapositive symmetry for continuous, Archimedean t-norms, Int. J. Approx. Reason. 54, 290-296 (2013)

10.61 S. Massanet, J. Torrens: The law of importation versus the exchange principle on fuzzy implications, Fuzzy Sets Syst. 168, 47-69 (2011)

10.62 S. Massanet, J. Torrens: On a new class of fuzzy implications: h-implications and generalizations, Inf. Sci. 181, 2111-2127 (2011)

10.63 S. Massanet, J. Torrens: On some properties of threshold generated implications, Fuzzy Sets Syst. 205, 30-49 (2012)

10.64 S. Massanet, J. Torrens: Threshold generation method of construction of a new implication from two given ones, Fuzzy Sets Syst. 205, 50-75 (2012)

10.65 S. Massanet, J. Torrens: On the vertical threshold generation method of fuzzy implication and its properties, Fuzzy Sets Syst. 206, 32-52 (2013)


11. Fuzzy Relations: Past, Present, and Future 


Susana Montes, Ignacio Montes, Tania Iglesias 


Relations are used in many branches of mathematics to model concepts such as is lower than, is equal to, etc. Initially, only crisp relations were considered, but in recent years fuzzy relations have proved to be a very useful tool in psychology, engineering, medicine, economics, and other mathematically based fields. This chapter gives a first approach to the concept of fuzzy relations. Operations on fuzzy relations are defined in general. For the particular case of fuzzy binary relations, their main properties are studied. Some particular classes of fuzzy binary relations are also considered and related to one another. Of course, this chapter is just a starting point for a detailed study of the more specialized literature.


11.1 Fuzzy Relations
   11.1.1 Operations on Fuzzy Relations
   11.1.2 Specific Operations on Fuzzy Relations
11.2 Cut Relations
11.3 Fuzzy Binary Relations
   11.3.1 Reflexivity
   11.3.2 Irreflexivity
   11.3.3 Symmetry
   11.3.4 Antisymmetry
   11.3.5 Asymmetry
   11.3.6 Transitivity
   11.3.7 Negative Transitivity
   11.3.8 Semitransitivity
   11.3.9 Completeness
   11.3.10 Linearity
11.4 Particular Cases of Fuzzy Binary Relations
   11.4.1 Similarity Relation
   11.4.2 Fuzzy Order
11.5 Present and Future of Fuzzy Relations
References


The notion of relation plays a central role in various fields of mathematics. As a consequence, it is a very important concept in engineering, science, and all mathematically based fields.

Crisp or classical relations have a drawback: they cannot express partial levels of relationship between two elements. This is a problem in many practical situations, since an element is not always clearly related to another one. Valued theory arose with the aim of assigning degrees to the relations between alternatives. As is well known, according to fuzzy set theory [11.1], the connection established between two alternatives admits different degrees of intensity, and that intensity is represented by a value in the interval [0, 1]. The idea of working with values different from 0 and 1 to express the relationship between two elements was already considered by Łukasiewicz in the 1920s, when he introduced his three-valued logic, and later by Luce [11.2] and Menger [11.3], but it was Zadeh [11.4] who formally defined the concept of a fuzzy (or multivalued) relation.

In the history of fuzzy mathematics, fuzzy relations 
were early considered to be useful in various appli- 
cations: fuzzy modeling, fuzzy diagnosis, and fuzzy 
control. They also have applications in fields such as 
psychology, medicine, economics, and sociology. For 
this reason, they have been extensively investigated. For 
a contemporary general approach to fuzzy relations one 
should look at Bělohlávek's book [11.5], and also at 
other general publications, for instance the books by 
Klir and Yuan [11.6] and Turunen [11.7]. 

Since this chapter is entirely devoted to fuzzy rela- 
tions, our aim is to give a detailed introduction to them 
for a nonexpert reader. 




11.1 Fuzzy Relations 


Assume that X and Y are two given sets. A fuzzy relation R is a mapping from the Cartesian product $X \times Y$ to the interval [0, 1]. Therefore, R is basically a fuzzy set in the universe $X \times Y$. This means that, for any $x \in X$ and any $y \in Y$, the value R(x, y) measures the strength with which R connects x with y. If R(x, y) is close to 1, x is strongly related to y by R; if R(x, y) is close to 0, R hardly connects x with y; intermediate values express intermediate degrees of relationship.

This definition can be extended to Cartesian products of more than two sets, which yields n-ary fuzzy relations. Note that fuzzy sets may be viewed as degenerate, 1-ary fuzzy relations.


Example 11.1
If we consider the case X = Y = [0, 3], we could define the fuzzy relation approximately equal to as follows

$$ R(x, y) = e^{-(x-y)^2}, \quad \forall (x, y) \in [0, 3] \times [0, 3], $$

which is represented in Fig. 11.1.


The domain of a fuzzy relation R is a fuzzy set on X, whose membership function is given by

$$ \operatorname{dom} R(x) = \sup_{y \in Y} R(x, y), $$

and the range is the fuzzy set on Y given by

$$ \operatorname{ran} R(y) = \sup_{x \in X} R(x, y). $$

When X and Y are finite sets, we can consider a matrix representation for any fuzzy relation. The entry in row x and column y of the associated matrix is the value R(x, y).


Example 11.2

Let us consider the set X formed by three papers, $X = \{p_1, p_2, p_3\}$, and let Y be a set formed by five different topics, $Y = \{t_1, t_2, t_3, t_4, t_5\}$. The fuzzy relation R measuring the degree of relationship of any paper with any topic is defined by

$$
\begin{array}{c|ccccc}
R   & t_1 & t_2 & t_3 & t_4 & t_5 \\ \hline
p_1 & 1.0 & 0.7 & 0.9 & 0.4 & 0.2 \\
p_2 & 0.5 & 0.8 & 1.0 & 0.3 & 0.9 \\
p_3 & 0.7 & 0.5 & 0.8 & 0.3 & 0.8
\end{array}
$$

Its domain is the fuzzy subset of X

$$ \operatorname{dom} R = \{(p_1, 1), (p_2, 1), (p_3, 0.8)\} $$

and its range is the fuzzy subset of Y

$$ \operatorname{ran} R = \{(t_1, 1), (t_2, 0.8), (t_3, 1), (t_4, 0.4), (t_5, 0.9)\}. $$
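The domain and range of Example 11.2 can be checked with a few lines of Python. This is only a minimal sketch: the nested-dictionary layout and the function names `domain` and `fuzzy_range` are illustrative choices, not part of the chapter, and the suprema reduce to maxima because X and Y are finite.

```python
# Fuzzy relation of Example 11.2, stored as R[paper][topic] = degree in [0, 1].
R = {
    "p1": {"t1": 1.0, "t2": 0.7, "t3": 0.9, "t4": 0.4, "t5": 0.2},
    "p2": {"t1": 0.5, "t2": 0.8, "t3": 1.0, "t4": 0.3, "t5": 0.9},
    "p3": {"t1": 0.7, "t2": 0.5, "t3": 0.8, "t4": 0.3, "t5": 0.8},
}


def domain(rel):
    """dom R(x) = sup_y R(x, y); on finite sets the supremum is a maximum."""
    return {x: max(row.values()) for x, row in rel.items()}


def fuzzy_range(rel):
    """ran R(y) = sup_x R(x, y), computed column by column."""
    columns = next(iter(rel.values())).keys()
    return {y: max(row[y] for row in rel.values()) for y in columns}


if __name__ == "__main__":
    print(domain(R))
    print(fuzzy_range(R))
```

The two dictionaries printed by the script should agree with the fuzzy sets dom R and ran R listed in the example.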


Fig. 11.1 Membership function of the fuzzy relation R (approximately equal to) introduced in Example 11.1


11.1.1 Operations on Fuzzy Relations


All concepts and operations applicable to fuzzy sets are applicable to fuzzy relations as well. Thus, for any fuzzy relations R and Q on the Cartesian product $X \times Y$, we have:


• Given a t-norm T (for a complete study of t-norms and t-conorms, we refer to [11.8]), the T-intersection (or just intersection if there is no ambiguity) of R and Q is the fuzzy relation on $X \times Y$ defined by

  $$ (R \cap_T Q)(x, y) = T(R(x, y), Q(x, y)), \quad \forall (x, y) \in X \times Y. $$

  Initially, T was taken to be the minimum operator.

• Given a t-conorm S, the S-union of R and Q is the fuzzy relation on $X \times Y$ defined by

  $$ (R \cup_S Q)(x, y) = S(R(x, y), Q(x, y)), \quad \forall (x, y) \in X \times Y. $$

  In the initial proposal, the maximum t-conorm was considered.

• The transpose or inverse of the fuzzy relation R, denoted as in the classical case by $R^{-1}$, is the fuzzy relation that satisfies

  $$ R^{-1}(x, y) = R(y, x), \quad \forall (x, y) \in X \times Y. $$


• The complement of a fuzzy relation is not unique; it depends on the negator n we choose. The n-complement of the fuzzy relation R, denoted by $R^c$, is the fuzzy relation defined by

  $$ R^c(x, y) = n(R(x, y)), \quad \forall (x, y) \in X \times Y. $$

  Although the definition is given for any negator, the most widely used one is the standard negator, $n(x) = 1 - x$. In this case, $R^c$ is called the standard complement and is given by $R^c(x, y) = 1 - R(x, y)$.

• The dual of the fuzzy relation R is defined and denoted as in the classical case. The fuzzy relation $R^d$ is the complement of the transpose of R,

  $$ R^d(x, y) = n(R(y, x)), \quad \forall (x, y) \in X \times Y. $$

  That is, $R^d = (R^{-1})^c$.

• We say that R is contained in Q, and we denote it by $R \subseteq Q$, if and only if the inequality $R(x, y) \le Q(x, y)$ holds for all $(x, y) \in X \times Y$.

• R and Q are said to be equal if and only if $R(x, y) = Q(x, y)$ for all $(x, y) \in X \times Y$, that is, $R \subseteq Q$ and $Q \subseteq R$.


11.1.2 Specific Operations on Fuzzy Relations


In the previous items, we only used the fact that fuzzy relations can be seen as fuzzy sets on $X \times Y$, and we adapted the corresponding definitions. However, fuzzy relations involve additional concepts and operations. The most important are compositions, projections, and cylindrical extensions.


• Let R and Q be two fuzzy relations on $X \times Y$ and $Y \times Z$, respectively. The T-composition of R and Q is the fuzzy relation on $X \times Z$ defined by

  $$ (R \circ_T Q)(x, z) = \sup_{y \in Y} T(R(x, y), Q(y, z)), \quad \forall (x, z) \in X \times Z. $$

  Due to the associativity and nondecreasingness of t-norms, the following result can be easily proven.


Proposition 11.1
Let R, Q, and P be three fuzzy relations on $X \times Y$, $Y \times Z$, and $Z \times U$, respectively. Then:

i) $R \circ_T (Q \circ_T P) = (R \circ_T Q) \circ_T P$;
ii) if $R'$ is another fuzzy relation on $X \times Y$ such that $R \subseteq R'$, then $R \circ_T Q \subseteq R' \circ_T Q$.

• Let R be a fuzzy relation on $X \times Y$. We can project R onto X and Y as follows:

  $$ R_X(x) = \sup_{y \in Y} R(x, y), \quad \forall x \in X, $$

  and

  $$ R_Y(y) = \sup_{x \in X} R(x, y), \quad \forall y \in Y, $$

  where $R_X$ and $R_Y$ denote the projections of R onto X and Y, respectively.

  It is clear that the projection onto X coincides with the domain of R and the projection onto Y with the range. The definition given for 2-ary fuzzy relations can be generalized to n-ary relations. Thus, if R is a fuzzy relation on $X_1 \times X_2 \times \dots \times X_n$, the projection of R onto the subspace $X_{i_1} \times X_{i_2} \times \dots \times X_{i_k}$ is defined by

  $$ R_{X_{i_1} \times \dots \times X_{i_k}}(x_{i_1}, \dots, x_{i_k}) = \sup_{x_{j_1}, \dots, x_{j_m}} R(x_1, x_2, \dots, x_n), $$

  where $X_{j_1}, \dots, X_{j_m}$ represent the omitted dimensions and $X_{i_1}, \dots, X_{i_k}$ the remaining ones. That is,

  $$ \{i_1, \dots, i_k\} \cup \{j_1, \dots, j_m\} = \{1, 2, \dots, n\} $$

  and

  $$ \{i_1, \dots, i_k\} \cap \{j_1, \dots, j_m\} = \emptyset. $$

• Another operation on relations, which is in some sense an inverse of the projection, is the cylindrical extension. If A is a fuzzy subset of X, then its cylindrical extension to Y is the fuzzy relation on $X \times Y$ defined by

  $$ \operatorname{cyl} A(x, y) = A(x), \quad \forall (x, y) \in X \times Y. $$
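The operations of Sects. 11.1.1 and 11.1.2 can be tried out on small finite universes. The following is a hedged sketch, not a reference implementation: relations are stored as dictionaries keyed by pairs, all function names and the sample relations R, Q, P are assumptions made for illustration, and the defaults are the minimum t-norm, the maximum t-conorm, and the standard negator.

```python
def t_min(a, b):
    return min(a, b)


def s_max(a, b):
    return max(a, b)


def n_std(a):
    return 1.0 - a


def intersection(R, Q, T=t_min):
    return {p: T(R[p], Q[p]) for p in R}


def union(R, Q, S=s_max):
    return {p: S(R[p], Q[p]) for p in R}


def complement(R, n=n_std):
    return {p: n(v) for p, v in R.items()}


def transpose(R):
    return {(y, x): v for (x, y), v in R.items()}


def dual(R, n=n_std):
    # Complement of the transpose.
    return complement(transpose(R), n)


def compose(R, Q, T=t_min):
    """Sup-T composition; the sup is a max because Y is finite."""
    xs = sorted({x for x, _ in R})
    ys = sorted({y for _, y in R})
    zs = sorted({z for _, z in Q})
    return {(x, z): max(T(R[(x, y)], Q[(y, z)]) for y in ys)
            for x in xs for z in zs}


def project_first(R):
    out = {}
    for (x, _), v in R.items():
        out[x] = max(out.get(x, 0.0), v)
    return out


def project_second(R):
    return project_first(transpose(R))


def cylindrical_extension(A, Y):
    return {(x, y): a for x, a in A.items() for y in Y}


# Hypothetical relations on X = {a, b}, Y = {u, v}, Z = {s, t}, U = {w}.
R = {("a", "u"): 0.5, ("a", "v"): 1.0, ("b", "u"): 0.25, ("b", "v"): 0.0}
Q = {("u", "s"): 0.75, ("u", "t"): 0.0, ("v", "s"): 0.25, ("v", "t"): 1.0}
P = {("s", "w"): 0.5, ("t", "w"): 0.75}

if __name__ == "__main__":
    print(compose(R, Q))
    print(project_first(R), project_second(R))
```

On this data the composition is associative, as stated in Proposition 11.1, and the cylindrical extension of the projection onto X contains R, which is the sense in which the extension acts as an inverse of the projection.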




11.2 Cut Relations


Any fuzzy relation R has an associated family of crisp relations $\{R_\alpha \mid \alpha \in [0, 1]\}$, called cut relations, which are defined by

$$ R_\alpha = \{(x, y) \in X \times Y \mid R(x, y) \ge \alpha\}. $$

It is clear that they are just the $\alpha$-cuts of R, considered as a fuzzy set. Thus, it is immediate that they form a chain (nested family) of relations, that is,

$$ \emptyset \subseteq R_{\alpha_m} \subseteq R_{\alpha_{m-1}} \subseteq \dots \subseteq R_{\alpha_1} \subseteq X \times Y $$

if $0 \le \alpha_1 \le \dots \le \alpha_{m-1} \le \alpha_m \le 1$.

Moreover, it is possible to represent a fuzzy relation by means of its cut relations, since

$$ R(x, y) = \sup_{\alpha \in [0, 1]} \min(\alpha, R_\alpha(x, y)), \quad \forall (x, y) \in X \times Y, $$

where $R_\alpha(x, y) \in \{0, 1\}$ is the characteristic function of the cut relation. This is denoted by

$$ R = \sup_{\alpha \in [0, 1]} \alpha R_\alpha. $$


11.3 Fuzzy Binary Relations


In the particular case X = Y, fuzzy relations are called fuzzy binary relations or valued binary relations, and they have specific and interesting properties. Detailed proofs of the results presented here can be found in [11.9].

The first specific characteristic of fuzzy binary relations is that, apart from the matrix representation, they admit a graph representation if X is finite. In this directed graph, X is the set of nodes (vertices) and R is the set of arcs (edges). The arc from x to y exists if and only if x and y are related in some sense (R(x, y) > 0). A number on each arc represents the membership degree of the corresponding pair in R.


Example 11.3
If we consider X = {x, y, z, t}, the fuzzy binary relation

$$
\begin{array}{c|cccc}
R & x & y & z & t \\ \hline
x & 1.0 & 0.4 & 0.2 & 0.0 \\
y & 0.6 & 0.9 & 0.0 & 0.0 \\
z & 0.0 & 0.0 & 0.0 & 0.0 \\
t & 0.0 & 0.0 & 0.8 & 0.0
\end{array}
$$

can also be represented by the graph in Fig. 11.2.


In the following, we list some basic properties of fuzzy binary relations. Usually, these properties are translations of their counterparts for the particular case of (crisp) binary relations.


11.3.1 Reflexivity


The most widely used definition of reflexivity for fuzzy binary relations was given by Zadeh in 1971 [11.4]. Thus, a fuzzy binary relation R on X is said to be reflexive iff

$$ R(x, x) = 1, \quad \forall x \in X. $$

This means that every vertex in the graph originates a simple loop.

Other less restrictive definitions have also been considered in the literature. Thus, we say that R is $\varepsilon$-reflexive [11.10], with $\varepsilon \in (0, 1]$, iff

$$ R(x, x) \ge \varepsilon, \quad \forall x \in X, $$

and weakly reflexive [11.10] iff

$$ R(x, x) \ge R(x, y), \quad \forall x, y \in X. $$

Of course, in the particular case of crisp binary relations, all of them reduce to the usual definition of reflexivity. Moreover, if R is reflexive, it is $\varepsilon$-reflexive for any $\varepsilon \in (0, 1]$ and weakly reflexive. The remaining implications are not true in general.

Fig. 11.2 Directed graph associated to a fuzzy binary relation

A cutworthy study of this property is given in the following proposition.


Proposition 11.2

Let R be a fuzzy binary relation on X. If R is reflexive, then its associated cut relations $R_\alpha$, with $\alpha \in (0, 1]$, are reflexive.


The cutworthy property is not fulfilled, in general, by $\varepsilon$-reflexive or weakly reflexive fuzzy binary relations.
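Cut relations and the three notions of reflexivity can be explored numerically. The sketch below is only illustrative: the function names and the two sample relations R and Q are assumptions, and the reconstruction from cuts uses a finite list of levels that is assumed to contain every nonzero degree taken by the relation.

```python
def alpha_cut(R, alpha):
    """Crisp cut relation: all pairs whose degree is at least alpha."""
    return {pair for pair, degree in R.items() if degree >= alpha}


def is_crisp_reflexive(C, X):
    return all((x, x) in C for x in X)


def is_reflexive(R, X):
    return all(R[(x, x)] == 1.0 for x in X)


def is_eps_reflexive(R, X, eps):
    return all(R[(x, x)] >= eps for x in X)


def is_weakly_reflexive(R, X):
    return all(R[(x, x)] >= R[(x, y)] for x in X for y in X)


def rebuild_from_cuts(R, levels):
    """R(x, y) = max over alpha of min(alpha, indicator of the alpha-cut)."""
    cuts = {a: alpha_cut(R, a) for a in levels}
    return {
        pair: max((a for a in levels if pair in cuts[a]), default=0.0)
        for pair in R
    }


X = ["x", "y"]
# Reflexive relation: every cut with alpha in (0, 1] is reflexive.
R = {("x", "x"): 1.0, ("x", "y"): 0.5, ("y", "x"): 0.25, ("y", "y"): 1.0}
# 0.5-reflexive and weakly reflexive, but not reflexive.
Q = {("x", "x"): 0.5, ("x", "y"): 0.25, ("y", "x"): 0.25, ("y", "y"): 0.5}
levels = [0.25, 0.5, 1.0]

if __name__ == "__main__":
    for a in levels:
        print(a, sorted(alpha_cut(R, a)), is_crisp_reflexive(alpha_cut(R, a), X))
    print(is_crisp_reflexive(alpha_cut(Q, 0.75), X))
```

The relation Q illustrates why the cutworthy property fails for the weaker notions: it is 0.5-reflexive and weakly reflexive, yet its cut at level 0.75 is empty and therefore not reflexive.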


11.3.2 Irreflexivity 


A fuzzy binary relation is irreflexive, or antireflexive, if no element is related in any degree to itself. Formally,

$$ R(x, x) = 0, \quad \forall x \in X. $$

This means that no vertex in the graph originates a simple loop.

Analogously to the case of reflexivity, we can consider some generalizations of this concept:


• $\varepsilon$-irreflexive, with $\varepsilon \in [0, 1)$: $R(x, x) \le \varepsilon$, $\forall x \in X$
• Weakly irreflexive: $R(x, x) \le R(x, y)$, $\forall x, y \in X$.


The behavior of cut relations is similar to the previous case.


Proposition 11.3

Let R be a fuzzy binary relation on X. If R is irreflexive, then its associated cut relations $R_\alpha$, with $\alpha \in (0, 1]$, are irreflexive.


Again, this property is not fulfilled, in general, by $\varepsilon$-irreflexive or weakly irreflexive relations.


11.3.3 Symmetry 


For symmetry, there is no change with respect to the classical definition for crisp relations. Thus, a fuzzy binary relation R on X is said to be symmetric if

$$ R(x, y) = R(y, x), \quad \forall x, y \in X. $$

This is equivalent to requiring that R and its inverse be equal,

$$ R = R^{-1}. $$

Of course, if R is symmetric, so are its associated cut relations for any $\alpha \in (0, 1]$.


11.3.4 Antisymmetry 


In the crisp case, a binary relation R is antisymmetric iff xRy and yRx imply x = y. This is equivalent to requiring that $x \ne y$ implies $(x, y) \notin R \cap R^{-1}$. Thus, the intersection can be used to define antisymmetry. In the fuzzy case, the intersection is defined, as usual, by means of a t-norm T. The definition therefore depends directly on the t-norm, and the t-norm used should appear in the name of the property. Thus, a fuzzy binary relation R on X is said to be T-antisymmetric if

$$ x \ne y \Rightarrow T(R(x, y), R(y, x)) = 0 $$

or, equivalently,

$$ (R \cap_T R^{-1})(x, y) = 0, \quad \forall x \ne y. $$

It is immediate that if T and T' are t-norms such that $T' \le T$, then the T-antisymmetry of a fuzzy relation implies its T'-antisymmetry.

In 1971, Zadeh [11.4] proposed using the minimum t-norm for this purpose, and he called the resulting property perfect antisymmetry or just antisymmetry. In that case, the cut relations are antisymmetric for any $\alpha \in (0, 1]$. This is also true for T-antisymmetry with a positive t-norm ($x, y > 0 \Rightarrow T(x, y) > 0$), since in this case T-antisymmetry and perfect antisymmetry are equivalent. However, the cutworthy property is not fulfilled, in general, for any other t-norm.

Clearly, perfect antisymmetry implies T-antisymmetry for any t-norm T. However, perfect antisymmetry can be too restrictive in many cases, since it excludes relations where R(x, y) and R(y, x) are almost zero. If we consider t-norms with zero divisors, this problem is solved: the case in which R(x, y) and R(y, x) are both high is avoided, but equality to zero is not required. For instance, if we consider the Łukasiewicz t-norm, $T(x, y) = \max(x + y - 1, 0)$, T-antisymmetry is equivalent to requiring that

$$ R(x, y) + R(y, x) \le 1, \quad \forall x, y \in X \text{ such that } x \ne y. $$
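The dependence of antisymmetry on the t-norm can be seen on a two-element example. This is a small sketch under stated assumptions: the helper names and the relation R are hypothetical, and a tolerance `tol` is used only to guard floating-point comparisons.

```python
def t_min(a, b):
    return min(a, b)


def t_lukasiewicz(a, b):
    return max(0.0, a + b - 1.0)


def is_T_antisymmetric(R, X, T, tol=1e-12):
    """T(R(x, y), R(y, x)) = 0 for every pair of distinct elements."""
    return all(T(R[(x, y)], R[(y, x)]) <= tol
               for x in X for y in X if x != y)


def alpha_cut(R, alpha):
    return {pair for pair, degree in R.items() if degree >= alpha}


def is_crisp_antisymmetric(C, X):
    return not any((x, y) in C and (y, x) in C
                   for x in X for y in X if x != y)


X = ["a", "b"]
# Hypothetical relation with small but nonzero degrees in both directions.
R = {("a", "a"): 1.0, ("a", "b"): 0.25, ("b", "a"): 0.5, ("b", "b"): 1.0}

if __name__ == "__main__":
    print(is_T_antisymmetric(R, X, t_min))          # perfect antisymmetry
    print(is_T_antisymmetric(R, X, t_lukasiewicz))  # Lukasiewicz antisymmetry
    print(is_crisp_antisymmetric(alpha_cut(R, 0.25), X))
```

Here R is not perfectly antisymmetric, but it is antisymmetric with respect to the Łukasiewicz t-norm because R(a, b) + R(b, a) = 0.75 does not exceed 1. Its cut at level 0.25 contains both (a, b) and (b, a), which illustrates the loss of the cutworthy property for a t-norm with zero divisors.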

11.3.5 Asymmetry


Asymmetry is a stronger condition, since the requirement applies not only to pairs of elements (x, y) such that $x \ne y$, but also to any pair of elements in $X \times X$. Thus, the T-asymmetry of a fuzzy binary relation R on X is defined by

$$ T(R(x, y), R(y, x)) = 0, \quad \forall x, y \in X $$

or, equivalently,

$$ R \cap_T R^{-1} = \emptyset. $$

Clearly, T-asymmetry implies T'-asymmetry if $T' \le T$, and hence classical asymmetry (T = min) implies T-asymmetry for any t-norm T. Usually it is called just asymmetry. In particular, asymmetry is equivalent to T-asymmetry if T is positive. In that case, the associated cut relations are crisp asymmetric relations for any $\alpha \in (0, 1]$.

It is possible to relate asymmetry and irreflexivity as follows:


Proposition 11.4
Let R be a T-asymmetric fuzzy binary relation on X. The following statements hold:

i) R is irreflexive if and only if T is a positive t-norm;
ii) R is $\varepsilon$-irreflexive for $\varepsilon < 1$ if and only if T has zero divisors and $\varepsilon$ belongs to the interval $(0, \sup\{x \in [0, 1] \mid T(x, x) = 0\})$.


11.3.6 Transitivity 


The pairwise comparison of possible alternatives is a first step in many approaches to decision making. If this first step lacks coherence, the whole decision process might become meaningless. A popular criterion for coherence is the transitivity of the involved relations, expressing that the strength of the link between two alternatives cannot be weaker than the strength of any chain involving another alternative [11.11].

The usual definition of transitivity for fuzzy relations is related to a t-norm and generalizes the proposal given by Zadeh in 1971 [11.4]. Thus, a fuzzy binary relation R on X is said to be T-transitive if

$$ T(R(x, y), R(y, z)) \le R(x, z) $$

for all $x, y, z \in X$.

As with T-asymmetry and T-antisymmetry, and unlike the classical case, T-transitivity is not a single property: it depends on the chosen t-norm. When T is the minimum t-norm, the definition can also be expressed as

$$ R(x, z) \ge \max_{y \in X} \min(R(x, y), R(y, z)), \quad \forall x, z \in X. $$

It is then sometimes called max-min transitivity. This coincides with the initial definition proposed by Zadeh.
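How T-transitivity varies with the t-norm can be checked by brute force on a finite set. The sketch below is illustrative only: the checker `is_T_transitive`, the tolerance, and the two sample matrices are assumptions chosen so that all arithmetic is exact in floating point.

```python
def t_min(a, b):
    return min(a, b)


def t_product(a, b):
    return a * b


def t_lukasiewicz(a, b):
    return max(0.0, a + b - 1.0)


def is_T_transitive(R, X, T, tol=1e-12):
    """Check T(R(x, y), R(y, z)) <= R(x, z) for all x, y, z in X."""
    return all(T(R[(x, y)], R[(y, z)]) <= R[(x, z)] + tol
               for x in X for y in X for z in X)


def from_matrix(M):
    n = len(M)
    return {(i, j): M[i][j] for i in range(n) for j in range(n)}


X = [0, 1, 2]
# Hypothetical relation: product-transitive but not max-min transitive.
R1 = from_matrix([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])
# Hypothetical relation: max-min transitive, hence transitive for all three.
R2 = from_matrix([[1.0, 0.75, 0.25],
                  [0.75, 1.0, 0.25],
                  [0.25, 0.25, 1.0]])

if __name__ == "__main__":
    for name, T in [("min", t_min), ("product", t_product),
                    ("lukasiewicz", t_lukasiewicz)]:
        print(name, is_T_transitive(R1, X, T), is_T_transitive(R2, X, T))
```

Since the Łukasiewicz t-norm is below the product, which is below the minimum, max-min transitivity is the most demanding of the three conditions, and R1 shows that the weaker ones do not imply it.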

T-transitivity is a natural way of extending the original definition by Zadeh, especially after t-norms and t-conorms began to be used in the 1970s by different authors to generalize the intersection and the union. However, many other types of transitivity have been defined. Among the least restrictive ones are the minimal transitivity [11.12], defined by

$$ R(x, y) = 1 \text{ and } R(y, z) = 1 \Rightarrow R(x, z) = 1, $$

and the preference sensitive transitivity [11.13], also called quasitransitivity [11.12], defined by

$$ R(x, y) > 0 \text{ and } R(y, z) > 0 \Rightarrow R(x, z) > 0. $$

The weak and parametric transitivities [11.14] are defined, respectively, by

$$ R(x, y) > R(y, x) \text{ and } R(y, z) > R(z, y) \Rightarrow R(x, z) > R(z, x) $$

and by

$$ R(x, y) > \theta > R(y, x) \text{ and } R(y, z) > \theta > R(z, y) \Rightarrow R(x, z) > \theta > R(z, x), $$

where $\theta$ is a fixed value in the interval [0, 1).

Finally, the weighted mean transitivity [11.15] states that the inequalities R(x, y) > 0 and R(y, z) > 0 require the existence of some $\theta \in (0, 1)$ such that

$$ R(x, z) \ge \theta \cdot \max(R(x, y), R(y, z)) + (1 - \theta) \cdot \min(R(x, y), R(y, z)), $$

among others.

T-norms offer a way of defining transitivity for fuzzy relations, but it is known that these operators are too restrictive in some cases. If we look at the properties that an operator defining transitivity must satisfy, associativity is only necessary if we try to extend the definition to more than three elements. Concerning commutativity and the boundary condition, a much weaker condition is sufficient to generalize the classical definition: commutativity and the boundary condition on {0, 1}. Thus, recent studies on transitivity for fuzzy binary relations are not restricted to t-norms; they consider a much more general definition of transitivity, obtained by keeping only the necessary conditions [11.16]. To this end, we consider a conjunctor, that is, an increasing binary operator $f : [0, 1]^2 \to [0, 1]$ which coincides with the Boolean conjunction on {0, 1}. This definition preserves the concept of conjunction for classical relations. It is also clear that the notion of conjunctor is much more general than that of t-norm: neither associativity nor commutativity is required, and conjunctors are not even required to have neutral element 1.

Given a conjunctor f, we can define the f-transitivity of a fuzzy relation R in the same way as for t-norms:

$$ f(R(x, y), R(y, z)) \le R(x, z), \quad \forall x, y, z \in X. $$

Since conjunctors form a much wider family of operators, this definition includes more types of transitivity than the definition given just for t-norms. Trivially, if we restrict this definition to t-norms, we recover the classical definition of T-transitivity.

The definition for conjunctors is too general in one particular case: if a reflexive relation R is f-transitive, where f is a conjunctor, then that conjunctor must be bounded by the minimum. That is, only conjunctors smaller than or equal to the minimum can define the transitivity of reflexive fuzzy relations.


11.3.7 Negative Transitivity 


Another important property is negative transitivity, which is a dual property of transitivity. Thus, given a t-conorm S, a fuzzy binary relation R on X is said to be negatively S-transitive if

$$ R(x, z) \le S(R(x, y), R(y, z)) $$

for all $x, y, z \in X$.

If T is a t-norm and S is its dual t-conorm, R is T-transitive if and only if its dual $R^d$ is negatively S-transitive.

Clearly, if R is negatively S-transitive, then it is negatively S'-transitive for any other t-conorm S' such that $S \le S'$. In particular, negative transitivity with respect to the maximum implies negative S-transitivity for any t-conorm S.
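The duality between T-transitivity of R and negative S-transitivity of its dual can be verified on small examples. The following sketch assumes the minimum t-norm, its dual maximum t-conorm, and the standard negator; the helper names and the two sample relations are hypothetical.

```python
def is_T_transitive(R, X, T, tol=1e-12):
    return all(T(R[(x, y)], R[(y, z)]) <= R[(x, z)] + tol
               for x in X for y in X for z in X)


def is_negatively_S_transitive(R, X, S, tol=1e-12):
    """Check R(x, z) <= S(R(x, y), R(y, z)) for all x, y, z in X."""
    return all(R[(x, z)] <= S(R[(x, y)], R[(y, z)]) + tol
               for x in X for y in X for z in X)


def dual(R):
    """Dual relation under the standard negator: 1 - R(y, x)."""
    return {(x, y): 1.0 - R[(y, x)] for (x, y) in R}


def from_matrix(M):
    n = len(M)
    return {(i, j): M[i][j] for i in range(n) for j in range(n)}


X = [0, 1, 2]
# Max-min transitive relation (hypothetical data).
R1 = from_matrix([[1.0, 0.75, 0.25],
                  [0.75, 1.0, 0.25],
                  [0.25, 0.25, 1.0]])
# Relation that is not max-min transitive (hypothetical data).
R2 = from_matrix([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])

if __name__ == "__main__":
    for R in (R1, R2):
        print(is_T_transitive(R, X, min),
              is_negatively_S_transitive(dual(R), X, max))
```

For both relations the two checks agree, as the equivalence stated above predicts for a t-norm and its dual t-conorm.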


11.3.8 Semitransitivity 


In the classical case, a crisp relation R is semitransitive if xRy and yRz imply that, for every $t \in X$, xRt or tRz.

If we consider t-norms and t-conorms to generalize AND and OR, respectively, we obtain that a fuzzy binary relation R on X is T-S-semitransitive if

$$ T(R(x, y), R(y, z)) \le S(R(x, t), R(t, z)) $$

for every $x, y, z, t \in X$.

It is clear that T-S-semitransitivity implies T'-S'-semitransitivity for any t-norm T' such that $T' \le T$ and any t-conorm S' such that $S \le S'$. Thus, classical semitransitivity, with the minimum t-norm and the maximum t-conorm, implies T-S-semitransitivity for any t-norm T and any t-conorm S.

As a consequence of the definition, for any T-S-semitransitive fuzzy relation R we have that:


• If R is reflexive, then it is negatively S-transitive
• If R is irreflexive, then it is T-transitive.


Moreover, we can easily prove the following propositions [11.9].


Proposition 11.5
If R is T-transitive and negatively S-transitive, then R is T-S-semitransitive.


Proposition 11.6

Suppose that T is a continuous t-norm in the De Morgan triple (T, S, n). If R is T-asymmetric and negatively S-transitive, then R is T-S-semitransitive.


11.3.9 Completeness 


In crisp set theory, the concept of completeness is clear: the relation R defined on X is complete if every two elements are related by R, that is, if xRy or yRx holds for any pair of distinct elements x and y in X. For reflexive relations, this is equivalent to the concept of strong completeness (xRy or yRx, $\forall x, y \in X$). In the setting of classical relations, completeness is equivalent to the absence of incomparability: there is no pair of elements that cannot be compared, since they are related at least by R or $R^{-1}$. When we try to generalize this concept to fuzzy relations, the problem arises when trying to fuzzify the notion related at least by R or $R^{-1}$. In the classical case, it is clear that x and y are related by R if and only if R(x, y) = 1 or R(y, x) = 1. This could be a first way to define the concept of completeness for fuzzy relations.

Thus, a fuzzy binary relation R defined on X is strongly complete if

$$ \max(R(x, y), R(y, x)) = 1, \quad \forall x, y \in X. $$

Perny and Roy [11.17] call it just complete.

But this condition could be considered too restrictive for fuzzy relations. Consider, for example, the case in which both R(x, y) = 0.95 and R(y, x) = 0.95. By the definition given above (strong completeness), R is not complete, but it is clear that x and y are related by R. Taking this type of situation into account, other less restrictive completeness conditions were proposed.

Among the most employed in the literature, we find the one known as weak completeness. A fuzzy relation R defined on X is weakly complete if

$$ R(x, y) + R(y, x) \ge 1, \quad \forall x, y \in X. $$

This condition is called connectedness in [11.13, 18], while in other works [11.14] this name refers to strong completeness.

It is clear that this definition is much less restrictive than strong completeness: strongly complete relations are a particular type of weakly complete relations.

If we take a careful look at these two definitions, we can express them by means of a t-conorm. On the one hand, we can quickly identify the maximum t-conorm as the operator that relates R and its transpose in the first definition. On the other hand, since $R(x, y) + R(y, x) \ge 1$ is equivalent to $\min(R(x, y) + R(y, x), 1) = 1$, weak completeness relates R and $R^{-1}$ by means of the Łukasiewicz t-conorm.

These two conditions are special cases of what is called S-completeness. A fuzzy relation R defined on X is called S-complete [11.19], where S is a t-conorm, if

$$ S(R(x, y), R(y, x)) = 1, \quad \forall x, y \in X. $$

It is immediate that this is equivalent to requiring that

$$ R \cup_S R^{-1} = X \times X. $$

Note that the previous definition corresponds, according to [11.19], to strong S-completeness, while the concept of S-completeness only requires the equality S(R(x, y), R(y, x)) = 1 for pairs of different elements, $x \ne y$.

A direct consequence of the definition is that for any two t-conorms S and S' such that $S \le S'$, S-completeness (respectively, strong S-completeness) implies S'-completeness (respectively, strong S'-completeness). It is easy to check that strong completeness can be identified not only with the S-completeness defined by the maximum t-conorm, but also with that defined by any t-conorm whose dual t-norm has no zero divisors. The S-completeness of a fuzzy relation R is equivalent to the T-antisymmetry of its dual $R^d$ [11.19], where T is the dual t-norm of S by means of a strong negation n.

The behavior of cut relations is the same as for the previous properties. Thus, if R is a max-complete (resp. strongly max-complete) fuzzy binary relation, then its associated cut relations $R_\alpha$ are complete (resp. strongly complete) crisp binary relations, for any $\alpha \in (0, 1]$.
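The different completeness notions can be compared on the 0.95 example discussed above. This is a hedged sketch: the function `is_S_complete`, its `strong` flag, the helper `make`, and the diagonal values 0.5 and 0.4 are assumptions introduced for illustration, and `tol` guards floating-point comparisons.

```python
def s_max(a, b):
    return max(a, b)


def s_lukasiewicz(a, b):
    return min(1.0, a + b)


def is_S_complete(R, X, S, strong=True, tol=1e-12):
    """S(R(x, y), R(y, x)) = 1 for all pairs (strong) or only for x != y."""
    return all(S(R[(x, y)], R[(y, x)]) >= 1.0 - tol
               for x in X for y in X if strong or x != y)


def make(diag, off):
    """Two-element relation with a common diagonal and off-diagonal degree."""
    return {("x", "x"): diag, ("y", "y"): diag,
            ("x", "y"): off, ("y", "x"): off}


X = ["x", "y"]
R = make(1.0, 0.95)   # the example from the text, made reflexive
Q = make(0.5, 0.95)   # diagonal at 0.5
P = make(0.4, 0.95)   # diagonal below 0.5

if __name__ == "__main__":
    print(is_S_complete(R, X, s_max))                    # strong completeness
    print(is_S_complete(R, X, s_lukasiewicz))            # weak completeness
    print(is_S_complete(Q, X, s_lukasiewicz))
    print(is_S_complete(P, X, s_lukasiewicz))
    print(is_S_complete(P, X, s_lukasiewicz, strong=False))
```

R fails strong completeness but is weakly complete. Q is strongly complete for the Łukasiewicz t-conorm although it is only 0.5-reflexive, whereas P loses strong completeness on the diagonal and remains complete only for pairs of different elements.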

As in the crisp case, we can relate completeness and reflexivity. In this case, we obtain different results depending on the chosen t-norm.


Proposition 11.7
Let R be a strongly S-complete fuzzy binary relation on X:


1. R is reflexive if and only if the dual t-norm associated to S is a positive t-norm.

2. R is $\varepsilon$-reflexive with $\varepsilon < 1$ if, and only if, S is a nilpotent t-conorm and $\varepsilon$ belongs to the interval $[\inf\{x \in [0, 1] \mid S(x, x) = 1\}, 1)$.


Table 11.1 Some properties of fuzzy binary relations


Property                  Definition

Reflexivity               $R(x, x) = 1$, $\forall x \in X$
Irreflexivity             $R(x, x) = 0$, $\forall x \in X$
Symmetry                  $R(x, y) = R(y, x)$, $\forall x, y \in X$
T-antisymmetry            $T(R(x, y), R(y, x)) = 0$, $\forall x, y \in X$, $x \ne y$
T-asymmetry               $T(R(x, y), R(y, x)) = 0$, $\forall x, y \in X$
T-transitivity            $T(R(x, y), R(y, z)) \le R(x, z)$, $\forall x, y, z \in X$
Negative S-transitivity   $R(x, z) \le S(R(x, y), R(y, z))$, $\forall x, y, z \in X$
T-S-semitransitivity      $T(R(x, y), R(y, z)) \le S(R(x, t), R(t, z))$, $\forall x, y, z, t \in X$
S-completeness            $S(R(x, y), R(y, x)) = 1$, $\forall x, y \in X$
Strong completeness       $\max(R(x, y), R(y, x)) = 1$, $\forall x, y \in X$
Weak completeness         $R(x, y) + R(y, x) \ge 1$, $\forall x, y \in X$
T-linearity               $n_T(R(x, y)) \le R(y, x)$, $\forall x, y \in X$




Completeness also plays an important role in relating transitivity and negative transitivity [11.9]. Thus, given a De Morgan triple (T, S, n), with T a continuous t-norm, and a strongly S-complete fuzzy binary relation R, the T-transitivity of R implies:


• Its negative S-transitivity
• The T-transitivity of $R^c$.


11.3.10 Linearity 


S-completeness is also closely related to the concept of T-linearity. Given a t-norm T, a fuzzy relation R defined on X is called T-linear [11.20] if

$$ n_T(R(x, y)) \le R(y, x), \quad \forall x, y \in X, $$

where $n_T$ stands for the negator $n_T(x) = \sup\{z \in [0, 1] \mid T(x, z) = 0\}$.

S-completeness is equivalent to T-linearity whenever T is nilpotent and S is the dual t-conorm of T by using $n_T$, that is, $S(x, y) = n_T(T(n_T(x), n_T(y)))$.

As the Łukasiewicz t-norm $T_L$ is in particular a nilpotent t-norm, weak completeness, that is, $S_L$-completeness, is equivalent to $T_L$-linearity.
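The equivalence for the Łukasiewicz case can be worked out explicitly. The short derivation below is a sketch that only uses the definitions given above, with $T_L(x, z) = \max(x + z - 1, 0)$ and the dual t-conorm obtained through $n_{T_L}$.

```latex
\begin{align*}
n_{T_L}(x) &= \sup\{z \in [0,1] \mid \max(x + z - 1, 0) = 0\}
            = \sup\{z \in [0,1] \mid z \le 1 - x\} = 1 - x, \\
S_L(x, y)  &= n_{T_L}\bigl(T_L(1 - x, 1 - y)\bigr)
            = 1 - \max(1 - x - y, 0) = \min(x + y, 1), \\
n_{T_L}(R(x,y)) \le R(y,x)
           &\iff 1 - R(x,y) \le R(y,x)
            \iff R(x,y) + R(y,x) \ge 1
            \iff S_L(R(x,y), R(y,x)) = 1 .
\end{align*}
```

So the negator induced by $T_L$ is the standard one, and $T_L$-linearity, weak completeness, and $S_L$-completeness express the same inequality.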

We summarize the properties and definitions we 
have introduced in this section in Table 11.1. 


11.4 Particular Cases of Fuzzy Binary Relations 


In this section, we deal with some particular cases of 
fuzzy binary relations, which are very important in sev- 
eral fields and they generalize classic concepts. 


11.4.1 Similarity Relation 


The notion of similarity is essentially a generalization 
of the notion of equivalence. 

More concretely, a T-indistinguishability relation R 
is a fuzzy binary relation which is reflexive, sym- 
metric, and 7-transitive. Sometimes it is also called 
fuzzy equivalence relation or equality relation. R(x, y) 
is interpreted as the degree of indistinguishability (or 
similarity) between x and y. 

In this definition, reflexivity expresses the fact that every object is completely indistinguishable from itself. Symmetry says that the degree to which x and y are indistinguishable is the same as the degree to which y and x are indistinguishable. Transitivity, as it depends on a t-norm, is a more flexible property. In particular, when we use the product t-norm, we obtain the so-called probabilistic relations introduced by Menger in [11.3]; if we choose the Łukasiewicz t-norm, we obtain the relations called likeness introduced by Ruspini [11.21]; while for the minimum t-norm we obtain similarity relations [11.22].

When the transitivity is not required, the relation R 
is said to be a proximity relation. 
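The definitions above can be checked mechanically on a finite set. The following is a minimal sketch, assuming the relation is stored as a square matrix of membership degrees; the function names and the example matrix are illustrative and not taken from the chapter. The entries are chosen to be exactly representable in binary floating point so that the comparisons are exact.

```python
from itertools import product

# Three common t-norms.
def t_min(a, b):
    return min(a, b)

def t_prod(a, b):
    return a * b

def t_luk(a, b):
    return max(0.0, a + b - 1.0)

def is_reflexive(R):
    return all(R[i][i] == 1.0 for i in range(len(R)))

def is_symmetric(R):
    n = len(R)
    return all(R[i][j] == R[j][i] for i, j in product(range(n), repeat=2))

def is_t_transitive(R, T):
    # T-transitivity: T(R(x, y), R(y, z)) <= R(x, z) for all x, y, z.
    n = len(R)
    return all(T(R[i][j], R[j][k]) <= R[i][k]
               for i, j, k in product(range(n), repeat=3))

def is_indistinguishability(R, T):
    return is_reflexive(R) and is_symmetric(R) and is_t_transitive(R, T)

# Hypothetical relation on a three-element set.
R = [[1.0, 0.75, 0.5],
     [0.75, 1.0, 0.75],
     [0.5, 0.75, 1.0]]

for name, T in [("minimum", t_min), ("product", t_prod), ("Lukasiewicz", t_luk)]:
    print(name, is_indistinguishability(R, T))
```

With this matrix the relation is a proximity relation for every t-norm, but it is T-transitive only for the Łukasiewicz t-norm, which illustrates how the choice of T makes transitivity more or less demanding.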


11.4.2 Fuzzy Order 


Next, we give a brief overview of several fuzzy ordering relations, focusing on the properties they must satisfy. Consider a fuzzy binary relation R on the set X. R is called:


• Partial T-preorder or T-quasiorder if R is reflexive and T-transitive;
• Total T-preorder or linear T-quasiorder if R is strongly complete and T-transitive;
• Partial T-order if R is antisymmetric and T-transitive;
• Strict partial T-order if R is asymmetric and T-transitive;
• Total T-order or linear T-order if R is a complete partial T-order, that is, R is antisymmetric, T-transitive and complete;
• Strict total T-order if R is a complete strict partial T-order, that is, R is asymmetric, T-transitive and complete.


As in the previous concepts, when T is the t-norm 
of the minimum, we call R simply total preorder, total 
order, etc. 

The previous definitions are summarized in Ta- 
ble 11.2. 



Table 11.2 Fuzzy binary relations by properties

                         Reflexivity  Symmetry  Antisymmetry  Asymmetry  Transitivity
Preorder                 Yes                                             Yes
Order                    Yes          No        Yes                      Yes
Strict order             Yes          No        Yes           Yes        Yes
Proximity                Yes          Yes       No            No
T-indistinguishability   Yes          Yes       No            No         Yes


11.5 Present and Future of Fuzzy Relations 


In this chapter, we have tried to give the reader a first introduction to the concept of fuzzy relations. Of course, this is just a starting point. Given the current development of the topics related to fuzzy relations, a researcher interested in this notion should study more specialized materials in detail.

Here we have presented the definition of classic fuzzy relations, which take values in the interval [0, 1]. However, lattice-valued fuzzy relations are considered an interesting tool in some areas. In that case, the relation R is a map from X × Y into a complete lattice L. Some approaches and interesting references for these lattice-valued relations are in [11.23].

A particularly interesting case is when the lattice is a chain of linguistic labels. Some approaches in this direction can be found, for instance, in [11.24–26].


References


11.1 L.A. Zadeh: Fuzzy sets, Inf. Control 8, 338–353 (1965)

11.2 R.D. Luce: Individual Choice Behavior (Wiley, New York 1959)

11.3 K. Menger: Probabilistic theories of relations, Proc. Natl. Acad. Sci. USA 37, 178–180 (1951)

11.4 L.A. Zadeh: Similarity relations and fuzzy ordering, Inf. Sci. 3, 177–200 (1971)

11.5 R. Bělohlávek: Fuzzy Relational Systems (Kluwer, Dordrecht 2002)

11.6 G.J. Klir, B. Yuan: Fuzzy Sets and Fuzzy Logic (Prentice Hall, Upper Saddle River 1995)

11.7 E. Turunen: Mathematics Behind Fuzzy Logic (Physica, Heidelberg 1999)

11.8 E.P. Klement, R. Mesiar, E. Pap: Triangular Norms (Kluwer, Dordrecht 2000)

11.9 J. Fodor, M. Roubens: Fuzzy Preference Modelling and Multicriteria Decision Support (Kluwer, Dordrecht 1994)

11.10 R.T. Yeh: Toward an algebraic theory of fuzzy relational systems, Proc. Int. Congr. Cybern. Namur (1973) pp. 205–223

11.11 D. Dubois, H. Prade: Fuzzy Sets and Systems: Theory and Applications (Academic, New York 1980)

11.12 P.C. Fishburn: Binary choice probabilities: On the varieties of stochastic transitivity, J. Math. Psychol. 10, 327–352 (1973)

11.13 C. Alsina: On a family of connectives for fuzzy sets, Fuzzy Sets Syst. 16, 231–235 (1985)

11.14 J. Fodor: A new look at fuzzy connectives, Fuzzy Sets Syst. 57, 141–148 (1993)


11.15 J. Aczél, F.S. Roberts, Z. Rosenbaum: On scientific laws without dimensional constants, J. Math. Anal. Appl. 119, 389–416 (1986)

11.16 S. Díaz, B. De Baets, S. Montes: General results on the decomposition of transitive fuzzy relations, Fuzzy Optim. Decis. Mak. 9, 1–29 (2010)

11.17 P. Perny, B. Roy: The use of fuzzy outranking relations in preference modelling, Fuzzy Sets Syst. 49, 33–53 (1992)

11.18 J. Fodor: Traces of fuzzy binary relations, Fuzzy Sets Syst. 50, 331–341 (1992)

11.19 J.J. Buckley: Ranking alternatives using fuzzy numbers, Fuzzy Sets Syst. 15, 21–31 (1985)

11.20 J. Aczél: Lectures on Functional Equations and Applications (Academic, New York 1966)

11.21 E. Ruspini: Recent Developments. In: Fuzzy Clustering (Pergamon, Oxford 1982) pp. 133–147

11.22 J. Fodor, M. Roubens: Aggregation of strict preference relations in MCDM procedures. In: Fuzzy Approach to Reasoning and Decision-Making, ed. by V. Novák, J. Ramík, M. Mareš, M. Černý, J. Nekola (Academia, Prague 1992) pp. 163–171

11.23 J. Jiménez, S. Montes, B. Šešelja, A. Tepavčević: Lattice-valued approach to closed sets under fuzzy relations: Theory and applications, Comput. Math. Appl. 62(10), 3729–3740 (2011)

11.24 L. Martínez, L.G. Pérez, M.I. Barroco: Filling incomplete linguistic preference relations up by priorizating experts’ opinions, Proc. EUROFUSE09 (2009) pp. 3–8

11.25 R.M. Rodríguez, L. Martínez, F. Herrera: A group decision making model dealing with comparative linguistic expressions based on hesitant fuzzy linguistic term set, Inf. Sci. 241, 28–42 (2013)

11.26 J.M. Tapia-García, M.J. del Moral, M.A. Martínez, E. Herrera-Viedma: A consensus model for group decision making problems with linguistic interval fuzzy preference relations, Expert Syst. Appl. 39, 10022–10030 (2012)


12. Fuzzy Implications: Past, Present, and Future


Michał Baczyński, Balasubramaniam Jayaram, Sebastià Massanet, Joan Torrens


Fuzzy implications are a generalization of the classical two-valued implication to the multi-valued setting. They play a very important role both in theory and in applications, as can be seen from their use in, among others, multivalued mathematical logic, approximate reasoning, fuzzy control, image processing, and data analysis. The goal of this chapter is to present the evolution of fuzzy implications from their beginnings to the present day. From the theoretical point of view, we present the basic facts, as well as the main topics and lines of research around fuzzy implications. We also devote a specific section to state and recall a list of main application fields where fuzzy implications are employed, as well as another one to the main open problems on the topic.


12.1 Fuzzy Implications: Examples, Properties, and Classes
12.2 Current Research on Fuzzy Implications
  12.2.1 Functional Equations and Properties
  12.2.2 New Classes and Generalizations
  12.2.3 New Construction Methods
  12.2.4 Fuzzy Implications in Nonclassical Settings
12.3 Fuzzy Implications in Applications
  12.3.1 FL_n – Fuzzy Logic in the Narrow Sense
  12.3.2 Approximate Reasoning
  12.3.3 Fuzzy Subsethood Measures
  12.3.4 Fuzzy Control
  12.3.5 Fuzzy Mathematical Morphology
12.4 Future of Fuzzy Implications
References


Fuzzy logic connectives play a fundamental role in the theory of fuzzy sets and fuzzy logic. The basic fuzzy connectives that perform the role of generalized And, Or, and Not are t-norms, t-conorms, and negations, respectively, whereas fuzzy conditionals are usually managed through fuzzy implications. Fuzzy implications play a very important role both in theory and applications, as can be seen from their use in, among others, multivalued mathematical logic, approximate reasoning, fuzzy control, image processing, and data analysis. Thus, it is hardly surprising that many researchers have devoted their efforts to the study of implication functions. This interest has become more evident in the last decade, when many works have appeared and have led to some surveys [12.1,2] and even some research monographs entirely devoted to this topic [12.3,4]. Thus, most of the known results and applications of fuzzy implications until the publication date were collected in [12.3], and very recently the edited volume [12.4] has been published, complementing the earlier monograph with the most recent lines of investigation on fuzzy implications.

In this regard, we have decided to devote this chapter, as the title suggests, to presenting the evolution of fuzzy implications from their beginnings to the present time. The idea is not to focus on a list of results already collected in other works, but to unravel the relations and highlight the importance of the development and progress that fuzzy implications have experienced over time. From the theoretical point of view we present the basic facts, as well as the main topics and lines of research around fuzzy implications, recalling in most cases where the corresponding results can be found, instead of listing them. Of course, we also devote a specific section to state and recall a list of the main application fields where fuzzy implications are employed. A final section looks ahead to the future by listing some of the main open problems, solutions of which are certain to enrich the existing literature on the topic.




12.1 Fuzzy Implications: Examples, Properties, and Classes 


Fuzzy implications are a generalization of the classical implication to fuzzy logic. It is a well-established fact that fuzzy concepts have to generalize the corresponding crisp ones, and consequently fuzzy implications restricted to $\{0, 1\}^2$ must coincide with the classical implication. Currently, the most accepted definition of a fuzzy implication is the following one.


Definition 12.1 [12.3, Definition 1.1.1]
A function $I\colon [0, 1]^2 \to [0, 1]$ is called a fuzzy implication if it satisfies the following conditions:

(I1) $I(x, z) \ge I(y, z)$ when $x \le y$, for all $z \in [0, 1]$
(I2) $I(x, y) \le I(x, z)$ when $y \le z$, for all $x \in [0, 1]$
(I3) $I(0, 0) = I(1, 1) = 1$ and $I(1, 0) = 0$.


This definition is flexible enough to allow uncountably many fuzzy implications. This great repertoire of fuzzy implications allows a researcher to pick out, depending on the context, that fuzzy implication which satisfies some desired additional properties. Many additional properties, all of them arising from tautologies in classical logic, have been postulated in many works. The most important of them are collected below:


• (NP): The left neutrality principle,

$$ I(1, y) = y, \quad y \in [0, 1]. $$

• (EP): The exchange principle,

$$ I(x, I(y, z)) = I(y, I(x, z)), \quad x, y, z \in [0, 1]. $$

• (OP): The ordering property,

$$ x \le y \iff I(x, y) = 1, \quad x, y \in [0, 1]. $$

• (IP): The identity principle,

$$ I(x, x) = 1, \quad x \in [0, 1]. $$

• (CP(N)): The contrapositive symmetry with respect to a fuzzy negation N,

$$ I(x, y) = I(N(y), N(x)), \quad x, y \in [0, 1]. $$

Given a fuzzy implication I, its natural negation is defined as $N_I(x) = I(x, 0)$ for all $x \in [0, 1]$. This function is always a fuzzy negation. For the definitions of basic fuzzy logic connectives like fuzzy negations, t-norms and t-conorms please see [12.5]. Moreover, $N_I$ can be continuous, strict, or strong and these are also additional properties usually required of a fuzzy implication I.

Table 12.1 lists the most well-known fuzzy implications along with the additional properties they satisfy [12.3, Chap. 1].


Table 12.1 Basic fuzzy implications and the additional properties they satisfy, where $N_C$, $N_{D_1}$, and $N_{D_2}$ stand for the classical, the least and the greatest fuzzy negations, respectively

Name            Formula
Łukasiewicz     $I_{LK}(x, y) = \min\{1, 1 - x + y\}$
Gödel           $I_{GD}(x, y) = 1$ if $x \le y$;  $y$ if $x > y$
Reichenbach     $I_{RC}(x, y) = 1 - x + xy$
Kleene–Dienes   $I_{KD}(x, y) = \max\{1 - x, y\}$
Goguen          $I_{GG}(x, y) = 1$ if $x \le y$;  $y/x$ if $x > y$
Rescher         $I_{RS}(x, y) = 1$ if $x \le y$;  $0$ if $x > y$
Yager           $I_{YG}(x, y) = 1$ if $(x, y) = (0, 0)$;  $y^x$ if $(x, y) \ne (0, 0)$
Weber           $I_{WB}(x, y) = 1$ if $x < 1$;  $y$ if $x = 1$
Fodor           $I_{FD}(x, y) = 1$ if $x \le y$;  $\max\{1 - x, y\}$ if $x > y$

Name            (NP)  (EP)  (IP)  (OP)  (CP(N))  $N_I$
Łukasiewicz     ✓     ✓     ✓     ✓     $N_C$    $N_C$
Gödel           ✓     ✓     ✓     ✓     ×        $N_{D_1}$
Reichenbach     ✓     ✓     ×     ×     $N_C$    $N_C$
Kleene–Dienes   ✓     ✓     ×     ×     $N_C$    $N_C$
Goguen          ✓     ✓     ✓     ✓     ×        $N_{D_1}$
Rescher         ×     ×     ✓     ✓     $N_C$    $N_{D_1}$
Yager           ✓     ✓     ×     ×     ×        $N_{D_1}$
Weber           ✓     ✓     ✓     ×     ×        $N_{D_2}$
Fodor           ✓     ✓     ✓     ✓     $N_C$    $N_C$


In addition, the following two implications

$$ I_0(x, y) = \begin{cases} 1, & \text{if } x = 0 \text{ or } y = 1, \\ 0, & \text{if } x > 0 \text{ and } y < 1, \end{cases} $$

$$ I_1(x, y) = \begin{cases} 1, & \text{if } x < 1 \text{ or } y > 0, \\ 0, & \text{if } x = 1 \text{ and } y = 0, \end{cases} $$

are the least and the greatest fuzzy implications, respectively, of the family of all fuzzy implications.
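Some entries of Table 12.1 can be spot-checked numerically. The sketch below is an illustration only: the function names and the grid are assumptions, exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding issues, and a check on a finite grid can refute a property but cannot prove it on the whole unit square.

```python
from fractions import Fraction
from itertools import product

ONE = Fraction(1)
GRID = [Fraction(i, 10) for i in range(11)]  # 0, 1/10, ..., 1

# Four of the basic implications from Table 12.1.
def i_lukasiewicz(x, y):
    return min(ONE, 1 - x + y)

def i_goedel(x, y):
    return ONE if x <= y else y

def i_reichenbach(x, y):
    return 1 - x + x * y

def i_kleene_dienes(x, y):
    return max(1 - x, y)

# Grid versions of the additional properties.
def satisfies_np(I):
    return all(I(ONE, y) == y for y in GRID)

def satisfies_ep(I):
    return all(I(x, I(y, z)) == I(y, I(x, z))
               for x, y, z in product(GRID, repeat=3))

def satisfies_ip(I):
    return all(I(x, x) == 1 for x in GRID)

def satisfies_op(I):
    return all((x <= y) == (I(x, y) == 1)
               for x, y in product(GRID, repeat=2))

def natural_negation(I):
    # N_I(x) = I(x, 0)
    return lambda x: I(x, Fraction(0))

for I in (i_lukasiewicz, i_goedel, i_reichenbach, i_kleene_dienes):
    print(I.__name__, satisfies_np(I), satisfies_ep(I),
          satisfies_ip(I), satisfies_op(I))
```

On this grid the Łukasiewicz and Gödel implications pass all four checks, while the Reichenbach and Kleene–Dienes implications pass (NP) and (EP) but fail (IP) and (OP), in agreement with the table.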

Beyond these examples of fuzzy implications, sev- 
eral families of these operations have been proposed 
and deeply studied. There exist basically two strate- 
gies in order to define classes of fuzzy implications. 
The most usual strategy is based on some combina- 
tions of aggregation functions. In this way, t-norms and 
t-conorms [12.5] were the first classes of aggregation 
functions used to generate fuzzy implications. Thus, the 
following are the three most important classes of fuzzy 
implications of this type: 


1) (S,N)-implications defined as

$$ I_{S,N}(x, y) = S(N(x), y), \quad x, y \in [0, 1], $$

where S is a t-conorm and N a fuzzy negation. They are the immediate generalization of the classical Boolean material implication $p \to q \equiv \neg p \lor q$. If N is involutive, they are called strong or S-implications.

2) Residual or R-implications defined by

$$ I_T(x, y) = \sup\{z \in [0, 1] \mid T(x, z) \le y\}, \quad x, y \in [0, 1], $$

where T is a t-norm. When they are obtained from left-continuous t-norms, they come from residuated lattices based on the residuation property

$$ T(x, y) \le z \iff I_T(x, z) \ge y, \quad \text{for all } x, y, z \in [0, 1]. $$

3) QL-operations defined by

$$ I_{T,S,N}(x, y) = S(N(x), T(x, y)), \quad x, y \in [0, 1], $$

where S is a t-conorm, T is a t-norm and N is a fuzzy negation. They originate in quantum logic.

Note that R- and (S,N)-implications are always implications in the sense of Definition 12.1, whereas QL-operations are not implications in general (they are called QL-implications when they actually are). A characterization of those QL-operations which are also implications is still open (Sect. 12.4), but a common necessary condition is $S(N(x), x) = 1$ for all $x \in [0, 1]$. Yet another class of fuzzy implications is that of Dishkant or D-operations [12.6], which are the contraposition of QL-operations with respect to a strong fuzzy negation.
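The three constructions can be sketched as higher-order functions. This is only an illustration under stated assumptions: the helper names are hypothetical, the supremum in the residual is approximated by a maximum over a finite grid (exact here only because the Łukasiewicz residuum of grid points lies on the grid), and exact rationals are used throughout.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)
GRID = [Fraction(i, 20) for i in range(21)]

def t_luk(x, y):          # Lukasiewicz t-norm
    return max(ZERO, x + y - 1)

def s_luk(x, y):          # Lukasiewicz t-conorm
    return min(ONE, x + y)

def n_c(x):               # classical negation
    return 1 - x

def sn_implication(S, N):
    return lambda x, y: S(N(x), y)

def r_implication(T, grid=GRID):
    # sup{z | T(x, z) <= y}, approximated by a maximum over the grid;
    # the set is never empty because T(x, 0) = 0 <= y.
    return lambda x, y: max(z for z in grid if T(x, z) <= y)

def ql_operation(T, S, N):
    return lambda x, y: S(N(x), T(x, y))

def i_lukasiewicz(x, y):
    return min(ONE, 1 - x + y)

def same_on_grid(I, J):
    return all(I(x, y) == J(x, y) for x, y in product(GRID, repeat=2))

sn = sn_implication(s_luk, n_c)
res = r_implication(t_luk)
ql_good = ql_operation(min, s_luk, n_c)   # S(N(x), x) = 1 holds
ql_bad = ql_operation(min, max, n_c)      # S(N(x), x) = 1 fails

print(same_on_grid(sn, i_lukasiewicz), same_on_grid(res, i_lukasiewicz))
print(ql_bad(Fraction(1, 2), ONE), ql_bad(ONE, ONE))
```

In this sketch all of `sn`, `res`, and `ql_good` coincide with the Łukasiewicz implication on the grid, whereas `ql_bad` increases in its first argument between x = 1/2 and x = 1 (with y = 1), so it violates (I1) and is not a fuzzy implication.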

These initial classes were successfully general- 
ized considering more general classes of aggregation 
functions, mainly uninorms, generating new classes 
of fuzzy implications with interesting properties. In 
this way, (U, N), RU-implications and QLU-operations 
have been deeply analyzed [12.3, Chap. 5], [12.6]. 

A second approach to obtain fuzzy implications is based on the direct use of unary monotonic functions. In this way, the most important families are Yager’s f- and g-generated fuzzy implications, which can be seen as implications generated from additive generators of continuous Archimedean t-norms and t-conorms, respectively [12.3, Chap. 3]:


1) Yager’s f-generated implications are defined as

$$ I_f(x, y) = f^{-1}(x \cdot f(y)), \quad x, y \in [0, 1], $$

with the understanding $0 \cdot \infty = 0$, where $f\colon [0, 1] \to [0, \infty]$ is a strictly decreasing and continuous function with $f(1) = 0$.

2) Yager’s g-generated implications are defined as

$$ I_g(x, y) = g^{-1}\left(\min\left\{\frac{1}{x} \cdot g(y),\; g(1)\right\}\right), \quad x, y \in [0, 1], $$

with the understanding $\frac{1}{0} = \infty$ and $\infty \cdot 0 = \infty$, where $g\colon [0, 1] \to [0, \infty]$ is a strictly increasing and continuous function with $g(0) = 0$.
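As a hedged illustration of the two definitions, the sketch below builds an f-generated implication from the generator f(t) = −ln t and a g-generated implication from g(t) = t. These generator choices and all function names are assumptions made for the example; with them the constructions reduce to the Yager implication y^x and the Goguen implication of Table 12.1, respectively.

```python
import math

def f_generated(f, f_inv):
    """Yager's f-generated implication I_f(x, y) = f^{-1}(x * f(y))."""
    def implication(x, y):
        # Convention 0 * inf = 0 (plain float arithmetic would give NaN).
        scaled = 0.0 if x == 0.0 else x * f(y)
        return f_inv(scaled)
    return implication

def g_generated(g, g_inv, g_at_one):
    """Yager's g-generated implication g^{-1}(min{g(y) / x, g(1)})."""
    def implication(x, y):
        # Conventions 1/0 = inf and inf * 0 = inf.
        scaled = math.inf if x == 0.0 else g(y) / x
        return g_inv(min(scaled, g_at_one))
    return implication

# f(t) = -ln t with f(0) = inf; its inverse is exp(-s).
def f(t):
    return math.inf if t == 0.0 else -math.log(t)

def f_inv(s):
    return math.exp(-s)

i_f = f_generated(f, f_inv)                            # equals y ** x
i_g = g_generated(lambda t: t, lambda s: s, 1.0)       # equals min(1, y / x)

print(i_f(0.5, 0.25), i_g(0.5, 0.25), i_g(0.25, 0.5))
```

The explicit branch for x = 0 is the design choice worth noting: it implements the stated conventions directly instead of relying on IEEE arithmetic, where 0 · ∞ is undefined.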


The above classes give rise to fuzzy implications with different additional properties, which are collected in Table 12.2. All the results referred to in Table 12.2 are from [12.3, Chaps. 2 and 3].


Table 12.2 Classes of fuzzy implications and the additional properties they satisfy

Class / Properties   (NP)  (EP)         (IP)         (OP)         (CP(N))       $N_I$
(S,N)-imp.           ✓     ✓            Thm. 2.4.17  Thm. 2.4.19  Prop. 2.4.3   $N$
R-imp. with l-c. T   ✓     ✓            ✓            ✓            Prop. 2.5.28  $N_T$
QL-imp.              ✓     Thm. 2.6.19  Sect. 2.6.3  Sect. 2.6.4  Sect. 2.6.5   $N$
f-gen.               ✓     ✓            ×            ×            Thm. 3.1.7    Prop. 3.1.6
g-gen.               ✓     ✓            Thm. 3.2.8   Thm. 3.2.9   ×             $N_{D_1}$


One of the main topics in this field is the characterization of each of these families of fuzzy implications through algebraic properties. This is an essential step in order to understand the behavior of these families. The available characterization results of the above families of implications are collected below.


Theorem 12.1 [12.3, Theorem 2.4.10]
For a function $I\colon [0, 1]^2 \to [0, 1]$ the following statements are equivalent:

i) I is an (S,N)-implication with a continuous (strict, strong) fuzzy negation N.
ii) I satisfies (I1), (EP), and $N_I$ is a continuous (strict, strong) fuzzy negation.

Moreover, in this case the representation $I(x, y) = S(N(x), y)$ is unique with $N = N_I$ and $S(x, y) = I(\mathfrak{R}_{N_I}(x), y)$ (for the definition of $\mathfrak{R}_{N_I}$ see [12.3, Lemma 1.4.10]).


Theorem 12.2 [12.3, Theorem 2.5.17]
For a function $I\colon [0, 1]^2 \to [0, 1]$ the following statements are equivalent:

i) I is an R-implication generated from a left-continuous t-norm.
ii) I satisfies (I2), (EP), (OP) and it is right continuous with respect to the second variable.

Moreover, the representation

$$ I(x, y) = \max\{t \in [0, 1] \mid T(x, t) \le y\} $$

is unique with

$$ T(x, y) = \min\{t \in [0, 1] \mid I(x, t) \ge y\}. $$


As already said, it is still an open question when QL-operations are fuzzy implications. However, in the continuous case, when S and N are the $\varphi$-conjugates of the Łukasiewicz t-conorm $S_{LK}$ and the classical negation $N_C$, respectively, for some order automorphism $\varphi$ on the unit interval, the QL-operation has the following expression

$$ I_{T,S,N}(x, y) = I_{\varphi,T}(x, y) = \varphi^{-1}\bigl(1 - \varphi(x) + \varphi(T(x, y))\bigr), \quad x, y \in [0, 1], $$

and we have the following characterization result.


Theorem 12.3 [12.3, Theorem 2.6.12]
For a QL-operation $I_{\varphi,T}$, where T is a t-norm and $\varphi$ is an automorphism on the unit interval, the following statements are equivalent:

i) $I_{\varphi,T}$ is a QL-implication.
ii) $T_{\varphi^{-1}}$ satisfies the Lipschitz condition, i.e.,

$$ |T_{\varphi^{-1}}(x_1, y_1) - T_{\varphi^{-1}}(x_2, y_2)| \le |x_1 - x_2| + |y_1 - y_2|, \quad x_1, x_2, y_1, y_2 \in [0, 1]. $$

In addition, (U, N)-implications are characterized in [12.3, Theorem 5.3.12] and, more recently, Yager’s f- and g-generated implications [12.7] and RU-implications [12.8] have also been characterized. Finally, due to its importance in many results, we recall the characterization of the family of the conjugates of the Łukasiewicz implication.


Theorem 12.4 [12.3, Theorem 7.5.1]
For a function $I\colon [0, 1]^2 \to [0, 1]$ the following statements are equivalent:

i) I is continuous and satisfies both (EP) and (OP).
ii) I is $\varphi$-conjugate with the Łukasiewicz implication $I_{LK}$, i.e., there exists an automorphism $\varphi$ on the unit interval, which is uniquely determined, such that I has the form

$$ I(x, y) = (I_{LK})_\varphi(x, y) = \varphi^{-1}\bigl(\min\{1 - \varphi(x) + \varphi(y), 1\}\bigr), \quad x, y \in [0, 1]. $$
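One direction of this characterization can be illustrated numerically. The sketch below assumes the automorphism φ(t) = t², chosen only for the example, builds the corresponding conjugate of the Łukasiewicz implication, and tests (EP) and (OP) on a grid with a small floating-point tolerance. The names and the tolerance are assumptions, and a grid test is a sanity check rather than a proof.

```python
import math
from itertools import product

def conjugate_lukasiewicz(phi, phi_inv):
    """phi-conjugate of the Lukasiewicz implication."""
    def implication(x, y):
        return phi_inv(min(1.0 - phi(x) + phi(y), 1.0))
    return implication

# Assumed automorphism of [0, 1]: phi(t) = t^2, with inverse sqrt.
I = conjugate_lukasiewicz(lambda t: t * t, math.sqrt)

GRID = [i / 10 for i in range(11)]
TOL = 1e-9  # tolerance for floating-point rounding

ep_holds = all(
    math.isclose(I(x, I(y, z)), I(y, I(x, z)), abs_tol=TOL)
    for x, y, z in product(GRID, repeat=3)
)
op_holds = all(
    (x <= y) == (I(x, y) >= 1.0 - TOL)
    for x, y in product(GRID, repeat=2)
)
print(ep_holds, op_holds)
```

For this choice of φ the implication is I(x, y) = sqrt(min{1 − x² + y², 1}), which is continuous and, on the grid, satisfies both properties.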
Fig. 12.1 Intersections between the main classes of fuzzy implications


For the conjugates of the other basic implications 
in Table 12.1, see the characterization results in [12.3, 
Sect. 7.5]. 

The great number of classes of fuzzy implications motivates the study of the intersections between the different classes, which brings out both the unity that exists among this diversity of classes and where the basic implications from Table 12.1 are located. The intersections among the main classes of fuzzy implications were studied in [12.3, Chap. 4] and are graphically displayed in Fig. 12.1 (note that $\mathcal{FI}$, $\mathbb{I}_{S,N}$, $\mathbb{I}_T$, $\mathbb{I}_{QL}$, $\mathbb{I}_F$ and $\mathbb{I}_G$ denote the families of all fuzzy implications, (S,N)-implications, R-implications, QL-implications, Yager’s f-generated implications and Yager’s g-generated implications, respectively). In this figure, we have included the fuzzy implications of Table 12.1 and the following fuzzy implications which are examples of implications lying in some intersection between some families


3 y 
Ia (x, y) = min f1, =| : 
XX 


1, ifx=0, 


Í f — 
Dt.) l ifx>0, 
Ipc(x, y) = 1 — (max{x(x + xy? — 2y), 0})? . 


Also note that it is still an open problem to prove whether $(\mathbb{I}_{QL} \cap \mathbb{I}_T) \setminus \mathbb{I}_{S,N} = \emptyset$.


12.2 Current Research on Fuzzy Implications 


In the previous section, we have seen some functional equations, namely, the exchange principle (EP), the contrapositive symmetry (CP(N)) and the like. In this section, we deal with a few functional equations (or inequalities) involving fuzzy implications. These equations, once again, arise as generalizations of the corresponding tautologies in classical logic involving Boolean implications.


12.2.1 Functional Equations and Properties 


A study of such equations stems from their applica- 
bility. The need for a plethora of fuzzy implications 
possessing various properties is quite obvious. On the 
one hand, they allow us to clearly classify and charac- 
terize different fuzzy implications, while on the other 
hand, they make themselves appealing to different ap- 
plications. Thus, the functional equations presented in 
this section are chosen to reflect this dichotomy. 


Distributivity over other Fuzzy Logic Operations
The distributivity of fuzzy implications over different fuzzy logic connectives, like t-norms, t-conorms, and uninorms, is reduced to four equations

$$ I(x, C_1(y, z)) = C_2(I(x, y), I(x, z)), \tag{12.1} $$
$$ I(x, D_1(y, z)) = D_2(I(x, y), I(x, z)), \tag{12.2} $$
$$ I(C(x, y), z) = D(I(x, z), I(y, z)), \tag{12.3} $$
$$ I(D(x, y), z) = C(I(x, z), I(y, z)), \tag{12.4} $$

satisfied for all $x, y, z \in [0, 1]$, where I is some generalization of classical implication, $C, C_1, C_2$ are some generalizations of classical conjunction and $D, D_1, D_2$ are some generalizations of classical disjunction.

All the above equations can be investigated in two different ways. On the one hand, one can assume that the function I belongs to some known class of fuzzy implications and investigate the connectives $C_i$, $D_i$ that satisfy (12.1)–(12.4), as is done, e.g., by Trillas and Alsina [12.9], Balasubramaniam and Rao [12.10], Ruiz-Aguilera and Torrens [12.11, 12] and Massanet and Torrens [12.13]. On the other hand, one can assume that the connectives $C_i$, $D_i$ come from known classes of functions and investigate the fuzzy implications I that satisfy (12.1)–(12.4). See the works of Baczyński [12.14, 15], Baczyński and Jayaram [12.16], Baczyński and Qin [12.17, 18] for such an approach.

The above distributive equations play an important 
role in reducing the complexity of fuzzy systems, since 
the number of rules directly affects the computational 
duration of the overall application (we will discuss this 
problem again in Sect. 12.3.2). 
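A small numerical sketch of (12.1) and (12.3) follows. It relies on an elementary fact that follows from (I1) and (I2): with minimum as conjunction and maximum as disjunction, both equations hold for every fuzzy implication, whereas other choices of connectives may fail. The checker names and the grid are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

ONE = Fraction(1)
GRID = [Fraction(i, 10) for i in range(11)]
TRIPLES = list(product(GRID, repeat=3))

def i_lukasiewicz(x, y):
    return min(ONE, 1 - x + y)

def t_prod(x, y):
    return x * y

def satisfies_12_1(I, C1, C2):
    # I(x, C1(y, z)) = C2(I(x, y), I(x, z))
    return all(I(x, C1(y, z)) == C2(I(x, y), I(x, z)) for x, y, z in TRIPLES)

def satisfies_12_3(I, C, D):
    # I(C(x, y), z) = D(I(x, z), I(y, z))
    return all(I(C(x, y), z) == D(I(x, z), I(y, z)) for x, y, z in TRIPLES)

print(satisfies_12_1(i_lukasiewicz, min, min))        # monotonicity in y
print(satisfies_12_3(i_lukasiewicz, min, max))        # antitonicity in x
print(satisfies_12_1(i_lukasiewicz, t_prod, t_prod))  # fails for the product
```

For instance, with x = y = z = 1/2 the product version of (12.1) gives I(1/2, 1/4) = 3/4 on the left and 1 · 1 = 1 on the right, so that choice of connectives does not distribute for the Łukasiewicz implication.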


Law of Importation
One of the desirable properties of a fuzzy implication is the law of importation as given below

$$ I(x, I(y, z)) = I(T(x, y), z), \quad x, y, z \in [0, 1], \tag{12.5} $$

where T is a t-norm (or, in general, some conjunction). It generalizes the classical tautology $(p \land q) \to r \equiv (p \to (q \to r))$ into the fuzzy logic context. This equation has been investigated for many different families


of fuzzy implications (for results connected with main 
classes see [12.3, Sect. 7.3]). Fuzzy implications satis- 
fying (12.5) have been found extremely useful in fuzzy 
relational inference mechanisms, since one can obtain 
an equivalent hierarchical scheme which significantly 
decreases the computational complexity of the system 
without compromising on the approximation capability 
of the inference scheme. For more on this, we refer the reader to [12.19, 20]. A related question is whether (12.5) is equivalent to (EP). This remained an open problem until the recent paper [12.21], where it is proved that (12.5) is stronger than (EP), and that the two are equivalent when $N_I$ is continuous.
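The law of importation (12.5) depends on pairing the implication with a suitable t-norm. The following sketch, with illustrative names and an assumed grid of exact rationals, checks three pairings: the Łukasiewicz implication with the Łukasiewicz t-norm, the Goguen implication with the product t-norm, and the Łukasiewicz implication with the minimum.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)
GRID = [Fraction(i, 10) for i in range(11)]

def i_lukasiewicz(x, y):
    return min(ONE, 1 - x + y)

def i_goguen(x, y):
    return ONE if x <= y else y / x   # x > y >= 0, so x != 0 here

def t_luk(x, y):
    return max(ZERO, x + y - 1)

def t_prod(x, y):
    return x * y

def satisfies_importation(I, T):
    # (12.5): I(x, I(y, z)) = I(T(x, y), z)
    return all(I(x, I(y, z)) == I(T(x, y), z)
               for x, y, z in product(GRID, repeat=3))

print(satisfies_importation(i_lukasiewicz, t_luk))   # holds on the grid
print(satisfies_importation(i_goguen, t_prod))       # holds on the grid
print(satisfies_importation(i_lukasiewicz, min))     # fails
```

A counterexample for the last pairing is x = y = 1/2, z = 0: the left-hand side is I(1/2, 1/2) = 1, while the right-hand side is I(1/2, 0) = 1/2.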


T-Conditionality or Modus Ponens
Another property investigated in the scientific literature, which is of great practical importance (see also Sect. 12.3.1), is the so-called T-conditionality, defined in the following way. If I is a fuzzy implication and T is a t-norm, then I is called an MP-fuzzy implication for T, if

$$ T(x, I(x, y)) \le y, \quad x, y \in [0, 1]. \tag{12.6} $$

Investigations of (12.6) have been done for the three main families of fuzzy implications, namely, (S, N)-, R-, and QL-implications [12.3, Sect. 7.4].
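Inequality (12.6) can likewise be tested on a grid. In this minimal sketch (names and grid are assumptions) the Kleene–Dienes implication is an MP-fuzzy implication for the Łukasiewicz t-norm but not for the minimum, while the Gödel implication, being the residual of the minimum, satisfies (12.6) for the minimum.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)
GRID = [Fraction(i, 10) for i in range(11)]

def i_lukasiewicz(x, y):
    return min(ONE, 1 - x + y)

def i_goedel(x, y):
    return ONE if x <= y else y

def i_kleene_dienes(x, y):
    return max(1 - x, y)

def t_luk(x, y):
    return max(ZERO, x + y - 1)

def satisfies_mp(I, T):
    # (12.6): T(x, I(x, y)) <= y
    return all(T(x, I(x, y)) <= y for x, y in product(GRID, repeat=2))

print(satisfies_mp(i_lukasiewicz, t_luk))    # True on the grid
print(satisfies_mp(i_goedel, min))           # True on the grid
print(satisfies_mp(i_kleene_dienes, t_luk))  # True on the grid
print(satisfies_mp(i_kleene_dienes, min))    # False
```

The failing case is x = 1/2, y = 0, where min{1/2, max{1/2, 0}} = 1/2 exceeds y.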


Nonsaturating Fuzzy Implications 

Investigations connected with subsethood measures (see Sect. 12.3.3) and with constructing strong equality functions by aggregation of implication functions through the formula $W(x, y) = M(I(x, y), I(y, x))$, where M is some symmetric function, have led researchers to consider under which properties a fuzzy implication I satisfies the following conditions:


(P1) $I(x, y) = 1$ if and only if $x = 0$ or $y = 1$;
(P2) $I(x, y) = 0$ if and only if $x = 1$ and $y = 0$.


In [12.22], the authors considered the possible re- 
lationships between these two properties and the prop- 
erties usually required of implication operations. More- 
over, they developed different construction methods of 
strong equality indexes using fuzzy implications that 
satisfy these two additional properties. 


Special Fuzzy Implications 
Special implications were introduced by Hájek and Kohout [12.23] in their investigations on some statistics on marginals. The authors have further shown that they are related to special GUHA-implicative quantifiers (see, for instance, [12.24–26]). Thus, special fuzzy implications are related to data mining. In their quest to obtain some many-valued connectives as extremal values of some statistics on contingency tables with fixed marginals, they especially focussed on special homogeneous implicational quantifiers and showed that:


Each special implicational quantifier determines 
a special implication. Conversely, each special 
implication is given by a special implicational 
quantifier. 


Definition 12.2
A fuzzy implication I is said to be special, if for any $\varepsilon > 0$ and for all $x, y \in [0, 1]$ such that $x + \varepsilon, y + \varepsilon \in [0, 1]$ the following condition is satisfied

$$ I(x, y) \le I(x + \varepsilon, y + \varepsilon). \tag{12.7} $$


Recently, Jayaram and Mesiar [12.27] have investigated the above functional inequality. Their study shows that among the main classes of fuzzy implications, no f-implication is a special implication, while the Goguen implication $I_{GG}$ is the only special g-implication. Based on the available results, they have conjectured that the (S, N)-implications that are special also turn out to be R-implications. However, in the case of R-implications (generated from any t-norm) they have obtained the following result.


Theorem 12.5 [12.27, Theorem 4.6]
Let T be any t-norm and $I_T$ be the R-implication obtained from T. Then the following statements are equivalent:

i) $I_T$ satisfies (12.7).
ii) T satisfies the 1-Lipschitz condition.
iii) T has an ordinal sum representation $(\langle e_\alpha, d_\alpha, T_\alpha \rangle)_{\alpha \in A}$ where each t-norm $T_\alpha$, $\alpha \in A$, is generated by a convex additive generator (for the definition of ordinal sum, see [12.5]).


Having shown that the families of (S,N)-, f-, and 
g-implications do not lead to any new special implica- 
tions, Jayaram and Mesiar [12.27] turned to the most 
natural question: Are there any other special implica- 
tions, than those that could be obtained as residuals 
of t-norms? This led them to propose some interest- 
ing constructions of fuzzy implications which were 
also special — one such construction is given in Defi- 
nition 12.4 in Sect. 12.2.2. 
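Condition (12.7) is easy to probe on a grid. The sketch below, whose helper names and grid are assumptions, checks three residual implications whose t-norms (Łukasiewicz, minimum, product) are 1-Lipschitz, and exhibits a single counterexample for the Yager implication, which is f-generated.

```python
from fractions import Fraction
from itertools import product

ONE = Fraction(1)
GRID = [Fraction(i, 10) for i in range(11)]

def i_lukasiewicz(x, y):
    return min(ONE, 1 - x + y)

def i_goedel(x, y):
    return ONE if x <= y else y

def i_goguen(x, y):
    return ONE if x <= y else y / x

def i_yager(x, y):
    # Float version of y^x with I(0, 0) = 1.
    return 1.0 if (x == 0 and y == 0) else float(y) ** float(x)

def is_special_on_grid(I):
    # (12.7): I(x, y) <= I(x + e, y + e) whenever both shifted points stay in [0, 1].
    return all(
        I(x, y) <= I(x + e, y + e)
        for x, y in product(GRID, repeat=2)
        for e in GRID[1:]
        if x + e <= 1 and y + e <= 1
    )

print(is_special_on_grid(i_lukasiewicz),
      is_special_on_grid(i_goedel),
      is_special_on_grid(i_goguen))
# Counterexample for the Yager implication: shifting (0, 0) by e = 1/2.
print(i_yager(0, 0), i_yager(0.5, 0.5))
```

The Yager values are 1 at (0, 0) and about 0.707 at (1/2, 1/2), so (12.7) fails there, consistent with the statement that no f-implication is special.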




12.2.2 New Classes and Generalizations 


Another current research line on fuzzy implications is 
devoted to the study of new classes and generalizations 
of the already known families. The research in this di- 
rection has been extensively developed in recent years. 
Among many generalizations of already known classes 
of implications that have been dealt with in the litera- 
ture, we highlight the following ones. 


Generalizations of R-implications 
The family of residual implications is one of the most 
commonly selected families for generalization. As al- 
ready mentioned in Sect. 12.1, the RU-implications 
were the first generalization obtained via residuation 
from uninorms instead of from t-norms. In the same 
line, many other families of aggregation functions have 
been used to derive residual implications: 


1. Copulas, quasi-copulas, and semicopulas were used in [12.28]. The main results in this work relate to the axiomatic characterizations of those functions I that are the residual implications of left-continuous commutative semicopulas, the residuals of quasi-copulas, and the residuals of associative copulas. For details on these characterizations, which involve up to ten different axioms, see [12.28].

2. Representable aggregation functions (RAFs) were used in [12.29]. These are aggregation functions constructed from additive generators of continuous Archimedean t-conorms and strong negations. The interest in the residual implications obtained from them lies in the fact that they are always continuous and in many cases they also satisfy the modus ponens with a nilpotent t-conorm. In particular, residual implications that depend only on a strong negation N are deduced from the general method just by considering specific generators of continuous Archimedean t-conorms.

3. A more general situation is studied in [12.30], where residual implications derived from binary functions $F\colon [0, 1]^2 \to [0, 1]$ are studied. In this case, the paper deals with the minimal conditions that F must satisfy in order to obtain an implication by residuation. The same is done in order to obtain residual implications satisfying each one of the most usual properties.

4. It is well known that residual implications derived from continuous Archimedean t-norms can be expressed directly from the additive generator of the t-norm. A generalization of this idea is presented in [12.31], where strictly decreasing functions $f\colon [0, 1] \to [0, +\infty]$ with $f(1) = 0$ are used to derive implications as follows

$$ I(x, y) = \begin{cases} 1, & \text{if } x \le y, \\ f^{(-1)}\bigl(f(y^+) - f(x)\bigr), & \text{if } x > y, \end{cases} $$

where $f(y^+) = \lim_{t \to y^+} f(t)$ and $f(1^+) = f(1)$. Properties of these implications are studied and many new examples are also derived in [12.31].


Generalizations of (S, N)-Implications 
Once again a first generalization of this class of im- 
plications has been done using uninorms leading to 
the (U, N)-implications mentioned in Sect. 12.1, but 
recently many other aggregation functions were also 
employed. 

This is the case, for instance, in [12.32], where the authors make use of TS-functions obtained from a t-norm T, a t-conorm S and a continuous, strictly monotone function $f\colon [0, 1] \to [-\infty, +\infty]$ through the expression

$$ TS_{\lambda,f}(x, y) = f^{-1}\bigl((1 - \lambda) f(T(x, y)) + \lambda f(S(x, y))\bigr) $$

for $x, y \in [0, 1]$, where $\lambda \in (0, 1)$. Operators defined by $I(x, y) = TS_{\lambda,f}(N(x), y)$ are studied in [12.32], giving the conditions under which they are fuzzy implications.

Another approach is based on the use of dual repre- 
sentable aggregation functions G, that are simply the 
N-dual of RAFs, introduced earlier. In this case, the 
corresponding (G, N)-operator is always a fuzzy impli- 
cation and several examples and properties of this class 
can be found in [12.33]. See also [12.34] where it is 
proven that they satisfy (EP) (or (12.5)) if and only if G 
is in fact a nilpotent t-conorm. 


Generalizations of Yager's Implications 
In this case, the generalizations usually deal with the possibility of varying the generator used in the
definition of the implication. A first step in this line was taken in [12.35] by considering multiplicative
generators of t-conorms, but it was proven in [12.36] that this new class is included in the family of all
(S, N)-implications obtained from t-conorms and continuous fuzzy negations.

Another approach was given in [12.37], introducing (f, g)-implications. In this case, the idea is to
generalize f-generated Yager's implications by substituting the factor $x$ by $g(x)$, where
$g\colon [0,1] \to [0,1]$ is an increasing function satisfying $g(0) = 0$ and $g(1) = 1$.




In the same direction, a generalization of f- and 
g-generated Yager’s implications based on aggregation 
operators is presented and studied in [12.38], where the 
implications are constructed by replacing the product 
t-norm in Yager’s implications by any aggregation func- 
tion. 

Finally, h-implications were introduced in [12.39] 
and are constructed from additive generators of repre- 
sentable uninorms as follows. 


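Definition 12.3 ([12.39])

Let $h\colon [0,1] \to [-\infty,+\infty]$ be a strictly increasing and continuous function with
$h(0) = -\infty$, $h(e) = 0$ for an $e \in (0,1)$ and $h(1) = +\infty$. The function
$I^h\colon [0,1]^2 \to [0,1]$ defined by

$$
I^h(x,y) = \begin{cases}
1, & \text{if } x = 0, \\
h^{-1}\bigl(x \cdot h(y)\bigr), & \text{if } x > 0 \text{ and } y \le e, \\
h^{-1}\bigl(\tfrac{1}{x} \cdot h(y)\bigr), & \text{if } x > 0 \text{ and } y > e,
\end{cases}
$$

is called an h-implication.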


This kind of implications maintains several properties 
of those satisfied by Yager’s implications, like (EP) and 
(12.5) with the product t-norm, but at the same time 
they satisfy other interesting ones. For more details on 
this kind of implications, as well as some generaliza- 
tions of them, see [12.39]. 
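As a minimal numerical sketch of an h-implication, the snippet below picks one admissible generator. The choice $h(t) = \ln(t/(1-t))$ with $e = 1/2$ is an assumption made here purely for illustration (it is strictly increasing and continuous, with $h(0) = -\infty$, $h(1/2) = 0$ and $h(1) = +\infty$); the function names are hypothetical and are not taken from [12.39].

```python
import math

E = 0.5  # the point e with h(E) = 0 for the illustrative generator below


def h(t):
    """Illustrative generator h(t) = ln(t / (1 - t)) (an assumption, not from the source)."""
    if t == 0.0:
        return -math.inf
    if t == 1.0:
        return math.inf
    return math.log(t / (1.0 - t))


def h_inv(z):
    """Inverse of h (the logistic function), written so that exp never overflows."""
    if z == -math.inf:
        return 0.0
    if z == math.inf:
        return 1.0
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h_implication(x, y):
    """Three-branch formula for an h-implication on [0, 1]^2."""
    if x == 0.0:
        return 1.0
    if y <= E:
        return h_inv(x * h(y))
    return h_inv(h(y) / x)


print(h_implication(0.7, 0.5))  # equals e for every x > 0, because h(e) = 0
print(h_implication(1.0, 0.3))  # close to 0.3: with x = 1 the second argument is returned
```

With this generator the boundary values behave as expected of a fuzzy implication: the value is 1 whenever $x = 0$, and $I^h(1, 0) = 0$.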


12.2.3 New Construction Methods 


In this section, we recall some construction meth- 
ods of fuzzy implications. The relevance of these 
methods is based on their capability of preserv- 
ing the additional properties satisfied by the ini- 
tial implication(s). First, note that some of them 
were already collected in [12.3, Chaps. 6 and 7], 
like: 


• The $\varphi$-conjugation of a fuzzy implication $I$

$$
I_\varphi(x,y) = \varphi^{-1}\bigl(I(\varphi(x), \varphi(y))\bigr), \quad x,y \in [0,1],
$$

where $\varphi$ is an order automorphism on $[0,1]$.
• The min and max operations from two given fuzzy implications

$$
(I \vee J)(x,y) = \max\{I(x,y), J(x,y)\}, \quad x,y \in [0,1], \\
(I \wedge J)(x,y) = \min\{I(x,y), J(x,y)\}, \quad x,y \in [0,1].
$$

• The convex combinations of two fuzzy implications, where $\lambda \in [0,1]$

$$
I^{\lambda}_{I,J}(x,y) = \lambda \cdot I(x,y) + (1-\lambda) \cdot J(x,y), \quad x,y \in [0,1].
$$

• The N-reciprocation of a fuzzy implication $I$

$$
I_N(x,y) = I(N(y), N(x)), \quad x,y \in [0,1],
$$

where $N$ is a fuzzy negation.
• The upper, lower, and medium contrapositivization of a fuzzy implication $I$ defined, respectively, as

$$
\begin{aligned}
I^{u}_N(x,y) &= \max\{I(x,y), I_N(x,y)\} = (I \vee I_N)(x,y), \\
I^{l}_N(x,y) &= \min\{I(x,y), I_N(x,y)\} = (I \wedge I_N)(x,y), \\
I^{m}_N(x,y) &= \min\{I(x,y) \vee N(x),\; I_N(x,y) \vee y\},
\end{aligned}
$$

where $N$ is a fuzzy negation and $x, y \in [0,1]$. Please note that the lower (upper) contrapositivization is
based on applying the min (max) method to a fuzzy implication $I$ and its N-reciprocal.
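The constructions listed above are all higher-order operations: they take implications (and possibly a negation or an automorphism) and return a new binary function. The following sketch expresses a few of them as Python closures. The function names, the choice of the Gödel and Łukasiewicz implications as inputs, the classical negation $N_C(x) = 1 - x$ and the automorphism $\varphi(t) = t^2$ are illustrative assumptions.

```python
import math


def conjugate(I, phi, phi_inv):
    """phi-conjugate: (x, y) -> phi^{-1}(I(phi(x), phi(y)))."""
    return lambda x, y: phi_inv(I(phi(x), phi(y)))


def reciprocal(I, N):
    """N-reciprocal: (x, y) -> I(N(y), N(x))."""
    return lambda x, y: I(N(y), N(x))


def convex_combination(I, J, lam):
    """Convex combination lam * I + (1 - lam) * J for lam in [0, 1]."""
    return lambda x, y: lam * I(x, y) + (1.0 - lam) * J(x, y)


def upper_contrapositivization(I, N):
    """Pointwise maximum of I and its N-reciprocal."""
    I_N = reciprocal(I, N)
    return lambda x, y: max(I(x, y), I_N(x, y))


def lower_contrapositivization(I, N):
    """Pointwise minimum of I and its N-reciprocal."""
    I_N = reciprocal(I, N)
    return lambda x, y: min(I(x, y), I_N(x, y))


# Illustrative inputs (assumptions chosen for this sketch).
def godel(x, y):
    return 1.0 if x <= y else y


def lukasiewicz(x, y):
    return min(1.0, 1.0 - x + y)


def n_c(x):
    return 1.0 - x


upper = upper_contrapositivization(godel, n_c)
print(upper(0.75, 0.5))                # max(1 - 0.75, 0.5) = 0.5
print(upper(0.75, 0.5) == upper(n_c(0.5), n_c(0.75)))  # contrapositive symmetry at this point
```

For the Gödel implication and $N_C$, working through the two branches shows that the upper contrapositivization equals 1 when $x \le y$ and $\max(1 - x, y)$ otherwise, which is symmetric under $(x, y) \mapsto (N_C(y), N_C(x))$.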


It should be emphasized that the first major work to explore contrapositivization in detail, in its own right,
was that of Fodor [12.40], where he discusses the contrapositive symmetry of fuzzy implications for the three
main families, namely, S-, R-, and QL-implications. In fact, during this study Fodor discovered the nilpotent
minimum t-norm $T_{nM}$, which is the first left-continuous but noncontinuous t-norm known in
the literature. This study had a major impact on the development of left-continuous t-norms with strong
natural negation; see, for instance, the early works of Jenei [12.41, and references therein].

The above fact clearly illustrates how the study of functional equations involving fuzzy implications has
also had interesting spin-offs and has immensely benefited other areas and topics in fuzzy logic connectives.

The new construction methods proposed in the recent literature can be roughly divided into
the following categories.


Implications Generated from Negations 
The first method was introduced by Jayaram and 
Mesiar in [12.42], while they were studying special im- 
plications (see Definition 12.2). From this study, they 
introduced the neutral special implications with a given 
negation and they studied the main properties of this 
new class. 




Definition 12.4 [12.42]
Let $N$ be a fuzzy negation such that $N \le N_C$. Then the function
$I_{[N]}\colon [0,1]^2 \to [0,1]$ given by

$$
I_{[N]}(x,y) = \begin{cases}
1, & \text{if } x \le y, \\
y + \dfrac{N(x-y)(1-x)}{1-x+y}, & \text{if } x > y,
\end{cases}
$$

with the understanding $\frac{0}{0} = 0$, is called the neutral special implication generated from $N$.


The second method of generation of fuzzy implica- 
tions from fuzzy negations was introduced in [12.43]. 


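Definition 12.5 [12.43]
Let $N$ be a fuzzy negation. The function $I^{[N]}\colon [0,1]^2 \to [0,1]$ is defined by

$$
I^{[N]}(x,y) = \begin{cases}
1, & \text{if } x \le y, \\
\dfrac{(1 - N(x))\, y}{x} + N(x), & \text{if } x > y.
\end{cases}
$$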


Again, several properties of these new implications can be derived, especially when the following classes of
fuzzy negations are considered

$$
N_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A, \end{cases}
\qquad
N_{A,\beta}(x) = \begin{cases} 1, & \text{if } x \in A, \\ \dfrac{1-x}{1+\beta x}, & \text{if } x \notin A, \end{cases}
$$

where $A = [0,\alpha)$ with $\alpha \in (0,1)$ or $A = [0,\alpha]$ with $\alpha \in [0,1)$. Note that
$N_{\{0\}} = N_{D_1}$ and $N_{\{0\},\beta}$ is the Sugeno class of negations. Note also that $I^{[N]}$ can be expressed
as $I^{[N]}(x,y) = S_P(N(x), I_{GG}(x,y))$ for all $x,y \in [0,1]$.
From this observation, replacing $S_P$ by any t-conorm $S$ and $I_{GG}$ by any implication $I$, the function

$$
I^{[N,S,I]}(x,y) = S(N(x), I(x,y)), \quad x,y \in [0,1],
$$

is always a fuzzy implication.
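The representation through the probabilistic sum and the Goguen implication can be checked numerically. The sketch below assumes the standard formulas $S_P(a,b) = a + b - ab$ and $I_{GG}(x,y) = 1$ if $x \le y$, $y/x$ otherwise; expanding $S_P(N(x), y/x)$ for $x > y$ gives $(1 - N(x))\,y/x + N(x)$. All function names are hypothetical.

```python
def goguen(x, y):
    """Goguen implication: 1 if x <= y, y / x otherwise."""
    return 1.0 if x <= y else y / x


def prob_sum(a, b):
    """Probabilistic sum t-conorm S_P(a, b) = a + b - a * b."""
    return a + b - a * b


def implication_from_negation(N):
    """Piecewise form obtained by expanding S_P(N(x), I_GG(x, y))."""
    def I(x, y):
        if x <= y:
            return 1.0
        return (1.0 - N(x)) * y / x + N(x)
    return I


def via_t_conorm(N, S=prob_sum, I=goguen):
    """General construction (x, y) -> S(N(x), I(x, y))."""
    return lambda x, y: S(N(x), I(x, y))


def n_c(x):
    return 1.0 - x


direct = implication_from_negation(n_c)
composed = via_t_conorm(n_c)
print(direct(0.8, 0.4), composed(0.8, 0.4))  # both close to 0.6
```

With the classical negation $N_C(x) = 1 - x$ the expression for $x > y$ simplifies to $1 - x + y$, so in this particular case the construction reproduces the Łukasiewicz implication.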


Implications Constructed from Two Given Implications
In this section, we present methods that generate a fuzzy 
implication from two given ones. 

The first method is based on an adequate scaling of 
the second variable of the two initial implications and it 
is called the threshold generation method [12.44]. 


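Definition 12.6 [12.44]
Let $I_1$ and $I_2$ be two fuzzy implications and $e \in (0,1)$.
The function $I_{I_1 - I_2}\colon [0,1]^2 \to [0,1]$ defined by

$$
I_{I_1 - I_2}(x,y) = \begin{cases}
1, & \text{if } x = 0, \\
e \cdot I_1\!\left(x, \dfrac{y}{e}\right), & \text{if } x > 0 \text{ and } y \le e, \\
e + (1-e) \cdot I_2\!\left(x, \dfrac{y-e}{1-e}\right), & \text{if } x > 0 \text{ and } y > e,
\end{cases}
$$

is called the e-threshold generated implication from $I_1$ and $I_2$.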


This method allows for a certain degree of con- 
trol over the rate of increase in the second variable of 
the generated implication. Furthermore, the importance 
of this method derives from the fact that it allows us 
to characterize h-implications as the threshold gener- 
ated implications of an f-generated and a g-generated 
implication [12.13, Theorem 2 and Remark 30]. Fur- 
ther, in contrast to many other generation methods of 
fuzzy implications from two given ones, it preserves 
(EP) and (12.5) if the initial implications possess them. 
Moreover, for an e € (0,1), the e-threshold generated 
implications can be characterized as those implications 
that satisfy I(x, e) = e for all x > 0. 
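A short sketch of the horizontal threshold generation method may help to see the scaling at work: the first implication is squeezed into the band $y \le e$ and the second into the band $y > e$. The function names and the choice of the Łukasiewicz and Gödel implications as building blocks are assumptions made for illustration.

```python
def threshold_generated(I1, I2, e):
    """e-threshold generated implication from I1 and I2, with 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("e must lie strictly between 0 and 1")

    def I(x, y):
        if x == 0.0:
            return 1.0
        if y <= e:
            # I1 acts on the lower band, rescaled from [0, e] to [0, 1].
            return e * I1(x, y / e)
        # I2 acts on the upper band, rescaled from (e, 1] to (0, 1].
        return e + (1.0 - e) * I2(x, (y - e) / (1.0 - e))

    return I


def lukasiewicz(x, y):
    return min(1.0, 1.0 - x + y)


def godel(x, y):
    return 1.0 if x <= y else y


I = threshold_generated(lukasiewicz, godel, 0.5)
print([I(x, 0.5) for x in (0.25, 0.5, 1.0)])  # the value e = 0.5 for every x > 0
print(I(1.0, 0.25), I(1.0, 0.75))             # 0.25 and 0.75
```

The first printed line illustrates the characterizing property $I(x, e) = e$ for all $x > 0$ stated above.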

The threshold generation method given above is 
based on splitting the domain of the implication with 
a horizontal line and then scaling the two initial impli- 
cations in order to be well defined in those two regions. 
An alternate but analogous method can be proposed 
by using a vertical line instead of a horizontal line. 
This is the idea behind the vertical threshold generation 
method of fuzzy implications. This method does not 
preserve as many properties as the horizontal threshold 
method, but some results can still be proven. In partic- 
ular, they are characterized as those fuzzy implications 
such that I(e, y) = e for all y < 1 [12.45].

The following two construction methods were presented in [12.46]. Given two implications $I$, $J$, the
following operations are introduced

$$
(I \triangledown J)(x,y) = I(J(y,x), J(x,y)), \\
(I \circledast J)(x,y) = I(x, J(x,y)),
$$

for all $x, y \in [0,1]$. The properties of these new operations, as well as the structure of the set of all
implications $\mathbb{FI}$ equipped with each one of these operations, are studied in [12.46].




Other Construction Methods 
In addition to the above methods, we would like to 
recall the following interesting method based on condi- 
tional probability and conditional distribution functions 
presented by Grzegorzewski in [12.47]. 


Definition 12.7 [12.47, 48]
The function $I_C\colon [0,1]^2 \to [0,1]$ given by

$$
I_C(x,y) = \begin{cases}
1, & \text{if } x = 0, \\
\dfrac{C(x,y)}{x}, & \text{if } x > 0,
\end{cases}
$$

where $C$ is a copula, is called a probabilistic implication based on copula $C$.


Conditions on the copula $C$ ensuring that the corresponding $I_C$ is an implication, as well as properties of
these implications, are detailed in [12.48]. The main interest in this kind of implications lies in the fact that
they are a powerful link between probability theory and fuzzy implications theory that can be useful in
approximate reasoning. Moreover, results on these probabilistic implications can also be useful for examining
and interpreting the behavior of some stochastic events. Some early results in this direction have appeared in
[12.49, 50], where some generalizations of the previous idea are considered. In particular, survival
implications, based on the probability that a given object in a population will survive a fixed time, are
studied in [12.51]. In this case, the survival implications are defined by


$$
I^{*}_C(x,y) = \begin{cases}
1, & \text{if } x = 0, \\
\dfrac{x + y - 1 + C(1-x, 1-y)}{x}, & \text{if } x > 0,
\end{cases}
$$

where $C$ is again a copula.
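Both copula-based constructions are easy to prototype. The sketch below uses the minimum copula $C(u,v) = \min(u,v)$ as an illustrative choice (an assumption; as noted above, not every copula yields an implication). The function names are hypothetical.

```python
def probabilistic_implication(C):
    """(x, y) -> 1 if x == 0, C(x, y) / x otherwise."""
    def I(x, y):
        return 1.0 if x == 0.0 else C(x, y) / x
    return I


def survival_implication(C):
    """(x, y) -> 1 if x == 0, (x + y - 1 + C(1 - x, 1 - y)) / x otherwise."""
    def I(x, y):
        if x == 0.0:
            return 1.0
        return (x + y - 1.0 + C(1.0 - x, 1.0 - y)) / x
    return I


def copula_min(u, v):
    """Minimum copula, used here only as an example."""
    return min(u, v)


I_prob = probabilistic_implication(copula_min)
I_surv = survival_implication(copula_min)
print(I_prob(0.5, 0.25), I_surv(0.5, 0.25))  # both 0.5, i.e. y / x
print(I_prob(0.25, 0.5), I_surv(0.25, 0.5))  # both 1.0, since x <= y
```

For the minimum copula, both formulas reduce to 1 when $x \le y$ and to $y/x$ when $x > y$, as a direct substitution shows.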

Finally, we only briefly mention that there exist 
other construction methods. For instance, Massanet 
and Torrens [12.13,44,45] have proposed methods of 
constructing implications derived from a given impli- 
cation I and a fuzzy negation N as part of their study 
on some properties of horizontal and vertical threshold 
generated implications. 


12.2.4 Fuzzy Implications in Nonclassical Settings


When we deal with uncertainty through fuzzy sets and fuzzy logic, the natural framework is the unit
interval [0, 1] and hence the logical connectives to be used are interpreted as operators on this interval. However,
there are many different tools that have been proposed 
for managing uncertainty. In this context, some ex- 
tensions of fuzzy logic and fuzzy sets have also been 
developed. One can list at least the following exten- 
sions: interval-valued fuzzy sets, Atanassov intuitionis- 
tic fuzzy sets (that are equivalent to the interval-valued 
approach, [12.52]), interval-valued intuitionistic fuzzy 
sets, type-2 fuzzy sets, fuzzy multisets, n-dimensional 
fuzzy sets, and hesitant fuzzy sets. 

For all these extensions, the usual logical connec- 
tives like fuzzy conjunctions and fuzzy disjunctions 
need to be studied to develop a comprehensive theory, 
and especially fuzzy implications in order to make in- 
ferences in each one of these extensions. Due to space 
constraints, we only recall some aspects of interval- 
valued (or intuitionistic) fuzzy implications and the 
references where they can be found. 


Interval-Valued Approach 

A good compilation of the known results related to fuzzy implications (and other operations) in the
interval-valued framework can be found in [12.53] or [12.54], wherein interval-valued or intuitionistic (S, N)-
and R-implications are developed and some of their properties are presented. Works that deal with the
construction of these classes of interval-valued implications can also be found in the literature. For instance,
in [12.55] a construction method for the residual implication associated with a representable t-norm
(constructed from two standard t-norms $T_1$ and $T_2$ with $T_1 \le T_2$) is presented. Similarly, (S, N)- and
R-implications generated from:

i) Aggregation functions and a standard fuzzy negation are presented in [12.56].

ii) Some classes of interval-valued aggregation functions based on t-norms and t-conorms are dealt with
in [12.57].

iii) The so-called $K_\alpha$-operators have been proposed in [12.58].


Discrete Approach 
Note that all the above mentioned tools are mainly used 
in the management of imprecise quantitative informa- 
tion. However, experts deal with many problems where 
qualitative information is usually expressed through 
linguistic terms. Qualitative information is often inter- 
preted to take values in a totally ordered finite scale like 


$$
\mathcal{L} = \{\text{Extremely Bad}, \text{Very Bad}, \text{Bad}, \text{Fair}, \text{Good}, \text{Very Good}, \text{Extremely Good}\}. \tag{12.8}
$$




In these cases, the representative finite chain $L_n = \{0, 1, \dots, n\}$ is usually considered to model these
linguistic hedges and several researchers have developed an extensive study of operations on $L_n$, usually
called discrete operations. This approach allows avoiding numerical interpretations and, consequently,
the fuzzification and defuzzification steps become unnecessary. In this framework, the smoothness condition
is usually considered as the discrete counterpart of continuity. In fact, in the discrete framework
this property is equivalent to the divisibility property as well as to the Lipschitz condition. In this
way, smooth discrete t-norms and t-conorms were studied and characterized in [12.59], and discrete fuzzy
implications derived from them have also been introduced.

As in the case of [0,1], the four most usual ways to construct discrete implications from t-norms and
t-conorms on $L_n$ are (S, N)-, R-, QL-, and D-implications. The first two classes derived from smooth t-norms
and t-conorms and the only strong negation on $L_n$ (given by $N_0(x) = n - x$) were studied in [12.60]. In
the smooth case, it is proven that the intersection between (S, N)- and R-implications contains only the
Łukasiewicz implication [12.60, Proposition 10]. Further, the nonsmooth case has also been investigated,
showing a parameterized family of nonsmooth t-norms $T$ for which the corresponding R-implication coincides
with the (S, N)-implication derived from the $N_0$-dual of $T$. The case of discrete QL- and D-operators is studied
in [12.61], where characterization results on when such operators are in fact implications are given and,
moreover, it is proven that both these classes coincide in the smooth case.


However, the modeling of linguistic information is 
limited because the information provided by experts for 
each variable must be expressed by a simple linguistic 
term. In most cases, this is a problem for experts be- 
cause their opinion does not agree with a concrete term. 
On the contrary, experts’ values are usually expressions 
like better than Good, between Fair and Very Good, or 
other even more complex expressions. 

To avoid the limitation above, an approach has recently appeared that tries to increase the flexibility of
the elicitation of linguistic information. This approach deals with the possibility of extending monotonic
operations on $L_n$ to operations on the set of discrete fuzzy numbers whose support is a subinterval of $L_n$,
usually denoted by $\mathcal{A}_1^{L_n}$. The idea lies in the fact that any discrete fuzzy number
$A \in \mathcal{A}_1^{L_6}$ can be considered (identifying the scale $\mathcal{L}$ given in (12.8) with the
chain $L_6$) as an assignment of a [0, 1]-value to each term in our linguistic scale. As an example, the above
mentioned expression between Fair and Very Good can be represented, for instance, by a discrete fuzzy number
$A \in \mathcal{A}_1^{L_6}$, with support given by the subinterval

[Fair, Very Good] = {Fair, Good, Very Good},

(that corresponds to the subinterval [3, 5] in $L_6$). The values of $A$ in its support should be described by
experts, allowing in this way a complete flexibility of the qualitative valuation. Usual operations like
t-norms, t-conorms, strong negations, aggregation functions, and also fuzzy implications have been introduced
in this framework. The case of (S, N)-, QL- and D-implications can be found in [12.62, 63] and the case
of R-implications in [12.64].


12.3 Fuzzy Implications in Applications 


So far, we have discussed the theoretical aspects of 
fuzzy implications, namely, analytical and algebraic. In 
this section, we discuss their applicational value which 
shows a wide spectrum of areas wherein they are em- 
ployed and how the gamut of properties that a fuzzy 
implication possesses plays an important role in its em- 
ployability. 


12.3.1 FL_n – Fuzzy Logic in the Narrow Sense


Boolean implications are employed in inference schemas like modus ponens, modus tollens, etc., where
the reasoning is done with statements or propositions whose truth-values are two valued. Fuzzy implications
play a similar role in the generalizations of the above inference schemas, where reasoning is done with
fuzzy statements whose truth-value lies in [0, 1] instead of {0, 1}.


Fuzzy Propositions 
An expression of the form x is A, where A is a fuzzy set on an appropriate domain U, with reference to the
context, is termed as a Fuzzy Statement or a Fuzzy Proposition. (The above two interpretations bear a close
resemblance to the Adjunctive and Connective interpretations as given in [12.65, p. 331], though they are
originally given for a binary operator. For other views and interpretations of the above statement, see, for
instance, Bezdek et al. [12.66].)

Let it be given that x is A and also that x assumes 
the precise value, let us say, x = u, where u € U, the 
domain of A. Then the truth value of the above fuzzy 
statement is obtained as follows 


t(x is A) = A(u) , 


i.e., the truth value of the above fuzzy statement, given that x is precisely known, is equal to the degree to
which u, the value x assumes, is itself compatible with the fuzzy set A. Thus, the greater the membership
degree of u in the concept A, the higher the truth value of the fuzzy statement.

Consider the statement John is Tall and that x, the height of John, is precisely given to be 5'10'' ∈ U.
Now, A(5'10'') gives the membership degree of 5'10'' in the concept A = Tall, which can be interpreted as
how much John belongs to the set of all Tall men, or equivalently, how much John is Tall is true, which is
nothing but the truth-value t(John is Tall).


Fuzzy Conditionals or Fuzzy IF-THEN Rules 
A fuzzy statement of the type discussed above, $\tilde{x}$ is $A$, can be interpreted in yet another way, namely,
as a linguistic statement, i.e., as an assignment of a fuzzy set to a variable.

Let $A\colon U \to [0,1]$ be a fuzzy set on a suitable domain $U$. Then $A$ can be taken to represent a concept.
A linguistic variable of $U$ is a symbol $\tilde{x}$ that can assume or be assigned any fuzzy subset of $U$.
Then a linguistic statement $\tilde{x}$ is $A$ is interpreted as the linguistic variable $\tilde{x}$ taking the
linguistic value $A$.

For example, let $U$ denote the set of all values in degrees centigrade. If the linguistic variable $\tilde{x}$
denotes Temperature, then it can assume the following linguistic values $A$, namely, high, more or less high,
medium, cool, very cold, etc. Each of the linguistic values (say $A$ = cool) is represented by a fuzzy set on
the domain $U$ of the linguistic variable $\tilde{x}$, i.e., $A\colon U \to [0,1]$.

The shape of the graph of the function represents 
the concept (say high temperature). The concept of high 
temperature is itself again context dependent. For ex- 
ample, high temperature (fever) for a human being is 
different from the high temperature in a blast furnace, 
and accordingly the domain of the linguistic variable is 
selected. 


A fuzzy IF-THEN rule is of the form

$$
\text{IF } \tilde{x} \text{ is } A \text{ THEN } \tilde{y} \text{ is } B, \tag{12.9}
$$

where $A$, $B$ are linguistic expressions/values assumed by the linguistic variables $\tilde{x}$, $\tilde{y}$.
For example,

IF $\tilde{x}$ (temperature) is $A$ (high)
THEN $\tilde{y}$ (pressure) is $B$ (low).


Generalized Modus Ponens 
Let $\alpha$, $\beta$ be two fuzzy propositions as given above and let $\alpha \to \beta$ be the fuzzy
conditional, which is a fuzzy IF-THEN rule as above. In classical logic, one uses rules of deduction, like
modus ponens and modus tollens, to deduce new knowledge from a given set of propositions. For instance,
modus ponens states that $\alpha \wedge (\alpha \to \beta) \vdash \beta$.

In fuzzy logic, since we deal with fuzzy propositions whose truth values vary over the entire [0, 1]
interval, we employ fuzzy logic operations. Typically $\wedge$ is interpreted as a t-norm $T$ and for the
$\to$ a fuzzy implication is used.

Unlike with classical propositions, when we deal with fuzzy propositions it is not always given that from
$\alpha \wedge (\alpha \to \beta)$ one obtains $\beta$. This type of deduction is known as generalized modus
ponens (GMP) and the study of pairs of operators $(\wedge, \to)$, or alternately, a t-norm and fuzzy
implication $(T, I)$, that can be employed in GMP becomes important. It can be shown that this
property translates to studying pairs $(T, I)$ that satisfy the functional equation
$T(x, I(x,y)) \le y$ for $x, y \in [0,1]$, which is nothing but T-conditionality as dealt with in
Sect. 12.2.1.
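Whether a given pair $(T, I)$ satisfies $T(x, I(x,y)) \le y$ can be explored numerically on a grid. The sketch below is only a finite check, not a proof; the grid resolution, the tolerance and the particular operators (minimum and Łukasiewicz t-norms; Gödel, Łukasiewicz and Kleene-Dienes implications) are illustrative assumptions.

```python
def t_min(x, y):
    return min(x, y)


def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def i_godel(x, y):
    return 1.0 if x <= y else y


def i_lukasiewicz(x, y):
    return min(1.0, 1.0 - x + y)


def i_kleene_dienes(x, y):
    return max(1.0 - x, y)


def satisfies_t_conditionality(T, I, steps=20, tol=1e-12):
    """Check T(x, I(x, y)) <= y on a regular grid (up to a rounding tolerance)."""
    grid = [k / steps for k in range(steps + 1)]
    return all(T(x, I(x, y)) <= y + tol for x in grid for y in grid)


print(satisfies_t_conditionality(t_min, i_godel))                # True
print(satisfies_t_conditionality(t_lukasiewicz, i_lukasiewicz))  # True
print(satisfies_t_conditionality(t_min, i_kleene_dienes))        # False
```

The failing pair is easy to see by hand: for $x = 0.5$ and $y = 0$, the Kleene-Dienes implication gives $0.5$ and $\min(0.5, 0.5) = 0.5 > 0$.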


Proof by Contradiction 
In classical logic, one often proves a statement of the form $\alpha \to \beta$ by proving its contrapositive,
i.e., $\neg\beta \to \neg\alpha$. However, in the setting of fuzzy logic, often the negation $\neg$ used is
noninvolutive, i.e., $\neg\neg\alpha \ne \alpha$.

For instance, when the underlying fuzzy logic operations come from the Gödel residuated lattice
$([0,1], T_M, I_{GD}, \wedge, \vee)$, the natural negation of the fuzzy implication $I_{GD}$ is not involutive
and $I_{GD}$ is not contrapositive w.r.t. any fuzzy negation. This led to the study of contrapositivization of
fuzzy implications, which was begun by Fodor [12.40] and is dealt with in Sect. 12.2.3 above.




12.3.2 Approximate Reasoning 


One of the best known application areas of fuzzy logic is approximate reasoning (AR), wherein from imprecise
inputs and fuzzy premises or rules we obtain, often, imprecise conclusions [12.67]. AR with fuzzy sets
encompasses a wide variety of inference schemes and has been readily embraced in many fields, notably
decision making, expert systems, and control. Fuzzy implications play a vital role in many of
these inference mechanisms, a brief discussion of which is presented below.


Inference Mechanisms in AR 
Let us be given a set of $n$ fuzzy IF-THEN rules of the form

$$
\text{IF } \tilde{x} \text{ is } A_i \text{ THEN } \tilde{y} \text{ is } B_i, \quad i = 1, 2, \dots, n, \tag{12.10}
$$

where $A_i$, $B_i$ are fuzzy sets on input and output domains. Now, given a fuzzy input, i.e., a fuzzy
proposition or a statement of the form $\tilde{x}$ is $A'$, the role of an inference mechanism is to obtain a
fuzzy output $B'$ that satisfies some desirable properties [12.68, 69].

Note that, if we denote the fuzzy rules as $A_i \to B_i$, $i = 1, 2, \dots, n$, as is typically done, then these
are exactly the fuzzy conditionals discussed above in Sect. 12.3.1. Further, if we denote the input as $A'$, then
an inference mechanism implements the generalized modus ponens by composing the fuzzy input $A'$ with
all the rules $A_i \to B_i$ to obtain the fuzzy output $B'$.

There are two established ways to accomplish the 
above, namely, fuzzy relational inference (FRI) and sim- 
ilarity based reasoning (SBR). Fuzzy implications play 
a major role in both the types of inference mechanisms 
as detailed below. 


Fuzzy Relational Inference (FRI) 
In a fuzzy relational inference, all the rules $A_i \to B_i$ are combined into a single fuzzy relation $R$ and
the output $B'$ is obtained as an image of the input $A'$ composed with $R$.

A fuzzy IF-THEN rule base of the form (12.10) is modeled as a fuzzy relation $R\colon X \times Y \to [0,1]$ as
follows

$$
R(x,y) = \bigwedge_{i=1}^{n} \bigl(A_i(x) \to B_i(y)\bigr) = \bigwedge_{i=1}^{n} I\bigl(A_i(x), B_i(y)\bigr), \tag{12.11}
$$

which reflects the conditional nature of the rules and where $I$ is usually a fuzzy implication. Then, given a
fact $\tilde{x}$ is $A'$, the inferred output $B'$ is obtained either as:

i) A sup-T composition, as in the compositional rule of inference (CRI) of Zadeh [12.70], or

ii) An inf-I composition, as in the Bandler-Kohout subproduct (BKS) [12.71],

of $A'(x)$ and $R(x,y)$, i.e.,

$$
B'(y) = A'(x) \circ R(x,y) = \sup_{x \in X} T\bigl(A'(x), R(x,y)\bigr), \tag{12.12}
$$

$$
B'(y) = A'(x) \triangleleft R(x,y) = \inf_{x \in X} I\bigl(A'(x), R(x,y)\bigr), \tag{12.13}
$$

where $T$ can be any t-norm and $I$ is any fuzzy implication.

Equations (12.12) and (12.13) make clear the important role that fuzzy implications and their properties play
in the goodness of an inference scheme. In the following subsection, we present a few issues where this role is
highlighted.
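On finite (discretized) domains the supremum and infimum become a maximum and a minimum, so both compositions can be written in a few lines. The sketch below is a toy example under stated assumptions: fuzzy sets are stored as dictionaries from domain points to membership degrees, the rule base contains a single rule, the Gödel implication models the rule and the BKS composition, and the minimum t-norm is used in the CRI. All names are hypothetical.

```python
def godel(a, b):
    """Goedel implication, used here both to model rules and in the BKS composition."""
    return 1.0 if a <= b else b


def build_relation(rules, implication, xs, ys):
    """R(x, y) = min over rules of I(A_i(x), B_i(y)), with min playing the role of the infimum."""
    return {(x, y): min(implication(A[x], B[y]) for A, B in rules)
            for x in xs for y in ys}


def cri(a_in, relation, t_norm, xs, ys):
    """sup-T composition: B'(y) = max over x of T(A'(x), R(x, y))."""
    return {y: max(t_norm(a_in[x], relation[(x, y)]) for x in xs) for y in ys}


def bks(a_in, relation, implication, xs, ys):
    """inf-I composition: B'(y) = min over x of I(A'(x), R(x, y))."""
    return {y: min(implication(a_in[x], relation[(x, y)]) for x in xs) for y in ys}


xs = [0, 1, 2]
ys = [0, 1, 2]
A1 = {0: 1.0, 1: 0.5, 2: 0.0}   # antecedent of the single rule
B1 = {0: 0.0, 1: 0.5, 2: 1.0}   # consequent of the single rule

R = build_relation([(A1, B1)], godel, xs, ys)
out_cri = cri(A1, R, min, xs, ys)
out_bks = bks(A1, R, godel, xs, ys)
print(out_cri)  # {0: 0.0, 1: 0.5, 2: 1.0}
print(out_bks)  # {0: 0.0, 1: 0.5, 2: 1.0}
```

In this toy setting, feeding the antecedent $A_1$ back in as the input returns exactly the consequent $B_1$ under both compositions; whether such a reproduction property holds in general depends on the chosen operators and on the rule base.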


Issues in FRI 
While the rule base (12.10) is an example of the single-input single-output (SISO) case, in practice we need
multi-input single-output (MISO) rules of the form given below, with $m$ input domains $X_j$,
$j = 1, 2, \dots, m$,

$$
R_i\colon \text{IF } \tilde{x}_1 \text{ is } A_{i1} \text{ AND } \tilde{x}_2 \text{ is } A_{i2} \text{ AND } \dots \text{ AND } \tilde{x}_m \text{ is } A_{im} \text{ THEN } \tilde{y} \text{ is } B_i.
$$

While MISO rule bases are of great practical necessity, they give rise to some new issues when they are
employed in FRIs.


Combinatorial Explosion of Rules and Distributivity of Fuzzy Implications
Let there be $k_j$ fuzzy sets defined on each of the domains $X_j$, $j = 1, 2, \dots, m$. Then in a complete
MISO rule base, we will have $n = k_1 \times k_2 \times \cdots \times k_m$ rules. Clearly, as $m$ or $k_j$
increases, $n$ increases and we have a combinatorial explosion of rules.

In a seminal work on this issue, Combs and Andrews [12.72] proposed an equivalent transformation of the CRI
to mitigate the computational cost. The authors showed that the distributivity of fuzzy implications over
t-norms plays a major role in this transformation. This was further studied by Balasubramaniam and Rao [12.10]
and its use in SBR was also demonstrated later by Jayaram [12.73].
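The growth of the rule count and one form of distributivity can both be illustrated in a few lines. The passage does not spell out the distributivity equation, so the form used below, $I(T(x,y), z) = S(I(x,z), I(y,z))$ with a t-norm $T$ and a t-conorm $S$, is an assumption made for this sketch, as are the function names and the choice of operators. The check is a finite grid test, not a proof.

```python
import math


def rule_count(ks):
    """Number of rules in a complete MISO rule base with ks[j] fuzzy sets on domain j."""
    return math.prod(ks)


def godel(x, y):
    return 1.0 if x <= y else y


def lukasiewicz(x, y):
    return min(1.0, 1.0 - x + y)


def distributes(I, T, S, steps=10, tol=1e-12):
    """Grid check of I(T(x, y), z) == S(I(x, z), I(y, z)) (assumed form of distributivity)."""
    grid = [k / steps for k in range(steps + 1)]
    return all(abs(I(T(x, y), z) - S(I(x, z), I(y, z))) <= tol
               for x in grid for y in grid for z in grid)


print(rule_count([5, 5, 5]))                                # 125
print(distributes(godel, min, max))                         # True
print(distributes(lukasiewicz, min, max))                   # True
print(distributes(lukasiewicz, lambda a, b: a * b, max))    # False
```

With $T = \min$ and $S = \max$ the equation holds for any function that is decreasing in its first argument, which is why both implications pass; with the product t-norm the Łukasiewicz implication fails, for example at $x = y = 0.5$, $z = 0$.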




Computational Complexity, Hierarchical Systems, and the Law of Importation
Let us consider a MISO rule base. From (12.11), it is clear that the relation $R$ obtained is a multidimensional
matrix, with $R\colon X_1 \times X_2 \times \cdots \times X_m \times Y \to [0,1]$. In fact,
when one uses the First-Infer-Then-Aggregate mechanism in an FRI, either CRI or BKS, one needs to store
$n$ such $m$-dimensional matrices. Further, the input $A'$ is also an $m$-dimensional matrix and the
computation of the output gets costlier.

To overcome this, Jayaram [12.19] proposed an alternate hierarchical inference scheme which can be
shown to be equivalent in both the CRI [12.19] and BKS [12.20] settings, when the underlying
operators are such that the t-norm $T$ and the fuzzy implication $I$ satisfy the law of importation (12.5).
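Equation (12.5) is stated outside this passage; the sketch below assumes it has the usual form of the law of importation, $I(T(x,y), z) = I(x, I(y,z))$, and tests it on a grid for a few pairs. The operators, the grid and the tolerance are illustrative choices, and a grid test can only suggest, not prove, that the law holds.

```python
def t_product(x, y):
    return x * y


def t_min(x, y):
    return min(x, y)


def goguen(x, y):
    return 1.0 if x <= y else y / x


def godel(x, y):
    return 1.0 if x <= y else y


def law_of_importation_holds(I, T, steps=10, tol=1e-9):
    """Grid check of I(T(x, y), z) == I(x, I(y, z)) (assumed form of the law)."""
    grid = [k / steps for k in range(steps + 1)]
    return all(abs(I(T(x, y), z) - I(x, I(y, z))) <= tol
               for x in grid for y in grid for z in grid)


print(law_of_importation_holds(goguen, t_product))  # True
print(law_of_importation_holds(godel, t_min))       # True
print(law_of_importation_holds(godel, t_product))   # False
```

The last pair fails, for instance, at $x = y = 0.5$, $z = 0.3$: the left-hand side is $I_{GD}(0.25, 0.3) = 1$ while the right-hand side is $I_{GD}(0.5, 0.3) = 0.3$. This illustrates that the law ties a specific t-norm to a specific implication.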


12.3.3 Fuzzy Subsethood Measures 


Inclusion or subsethood of sets is an important concept. The first such definition of inclusion of a fuzzy
set $A$ over $X$ in another fuzzy set $B$ was given by Zadeh [12.74] as follows

$$
A \subseteq_Z B \iff A(x) \le B(x) \quad \text{for all } x \in X.
$$

Note that this definition is crisp, since $A$ is either contained in $B$ or not. A more
general notion of degree of inclusion is missing from the above definition. Subsequently, many fuzzy
subsethood measures, usually denoted Inc, were proposed.


Axiomatic Studies on Fuzzy Subsethood Measures
From the isomorphism that exists between classical set theory and classical logic, we know that
$A \subseteq B$ is equivalent to $\chi_A \Rightarrow \chi_B$, where $\chi_X$ is the characteristic function
of the set $X$. Thus, early fuzzy subsethood measures also mimicked this equivalence and were defined based
on fuzzy implications. Many researchers, in particular Sinha and Dougherty [12.75], Kitainik [12.76], and
Bandler and Kohout [12.77], proposed sets of axioms for an Inc to satisfy.

It is easy to see that all of the above axiomatic approaches eventually lead to employing implications as
the underlying operators to define the corresponding Inc measure, as given below

$$
\begin{aligned}
\mathrm{Inc}_{SD}(A,B) &= \inf_{x \in X} \min\bigl(1, \lambda(A(x)) + \lambda(1 - B(x))\bigr), \\
\mathrm{Inc}_{K}(A,B) &= \inf_{x \in X} \varphi\bigl(I_{KD}(B(x), A(x)),\; 1 - I_{KD}(A(x), B(x))\bigr), \\
\mathrm{Inc}_{BK}(A,B) &= \inf_{x \in X} I\bigl(A(x), B(x)\bigr),
\end{aligned}
$$

where $\lambda\colon [0,1] \to [0,1]$ is a decreasing function with some additional properties,
$\varphi\colon \Delta \to [0,1]$ is a function with additional properties, where
$\Delta = \{(x,y) \in [0,1]^2 \mid x \ge y\}$, and $I$ is any fuzzy implication.

From the above formulae, the important position a fuzzy implication $I$ holds in measuring fuzzy subsethood
is apparent. Note that the Inc measure is used extensively in similarity based reasoning (SBR) and in
fuzzy mathematical morphology (FMM), which are discussed below.
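An implication-based inclusion degree of the infimum type is straightforward to compute on a finite universe, where the infimum is a minimum. The sketch below represents fuzzy sets as equal-length lists of membership degrees and compares the graded measure with Zadeh's crisp inclusion; the names, the sample sets and the two implications are illustrative assumptions.

```python
def lukasiewicz(a, b):
    return min(1.0, 1.0 - a + b)


def godel(a, b):
    return 1.0 if a <= b else b


def inclusion_degree(A, B, implication):
    """Degree to which A is included in B: min over the universe of I(A(x), B(x))."""
    return min(implication(a, b) for a, b in zip(A, B))


def zadeh_included(A, B):
    """Zadeh's crisp inclusion: A(x) <= B(x) at every point."""
    return all(a <= b for a, b in zip(A, B))


A = [0.2, 0.5, 1.0]
B = [0.3, 0.5, 1.0]

print(zadeh_included(A, B), inclusion_degree(A, B, lukasiewicz))  # True 1.0
print(zadeh_included(B, A), inclusion_degree(B, A, lukasiewicz))  # False, and a degree close to 0.9
print(inclusion_degree(B, A, godel))                              # 0.2
```

Both implications used here return 1 exactly when the first argument does not exceed the second, so the graded measure equals 1 precisely when Zadeh's inclusion holds; when it fails, different implications grade the violation differently.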


12.3.4 Fuzzy Control 


While Sect. 12.3.2 dealt with FRIs which are largely 
used in the context of decision making and expert sys- 
tems, in this section we deal with another type of fuzzy 
inference mechanism (FIM) that is used in fuzzy con- 
trol, where the approximation properties of the FIM are 
important. 


Similarity-Based Reasoning (SBR) 
Let us once again consider a fuzzy IF-THEN rule base of the form (12.10) and a fuzzy input $A'$. In an SBR
inference scheme, the following steps are employed to produce the output:

• Matching: The input $A'$ is matched against each of the antecedents $A_i$ of the rules (12.10) using
a matching function $M$ to obtain the corresponding similarity values $s_i = M(A', A_i) \in [0,1]$ for
$i = 1, 2, \dots, n$.

• Modification: Each of the similarity values $s_i$ is used to modify the corresponding consequent $B_i$ of the
rule (12.10) using a modification function $J$ to obtain the modified output $B'_i = J(s_i, B_i)$.

• Aggregation: Finally, all the modified outputs $B'_i$ are aggregated to obtain an overall output
$B' = G(B'_1, \dots, B'_n)$.

In notations, we can write the above as

$$
B'(y) = G_{i=1}^{n}\Bigl(J\bigl(M(A', A_i), B_i(y)\bigr)\Bigr), \quad y \in Y. \tag{12.14}
$$
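The three steps above translate directly into code. In the sketch below, fuzzy sets are dictionaries on finite domains, the matching function is an implication-based inclusion degree of the input in the antecedent, the modification function is the Gödel implication, and the aggregation is the minimum. Each of these choices is an assumption made for illustration, not a prescription from the text.

```python
def godel(a, b):
    return 1.0 if a <= b else b


def match(a_in, antecedent, implication):
    """Matching step: min over x of I(A'(x), A_i(x)) (one possible matching function M)."""
    return min(implication(a_in[x], antecedent[x]) for x in a_in)


def sbr(rules, a_in, implication, aggregate=min):
    """Similarity based reasoning: match, modify each consequent, then aggregate."""
    ys = list(rules[0][1])                      # output domain taken from the first consequent
    sims = [match(a_in, A, implication) for A, _ in rules]
    out = {}
    for y in ys:
        modified = [implication(s, B[y]) for s, (_, B) in zip(sims, rules)]
        out[y] = aggregate(modified)
    return out


A1 = {0: 1.0, 1: 0.5, 2: 0.0}
A2 = {0: 0.0, 1: 0.5, 2: 1.0}
B1 = {0: 1.0, 1: 0.0}
B2 = {0: 0.0, 1: 1.0}

result = sbr([(A1, B1), (A2, B2)], A1, godel)
print(result)  # {0: 1.0, 1: 0.0}
```

Here the input coincides with the first antecedent, so the first rule fires with similarity 1 and returns $B_1$ unchanged, while the second rule has similarity 0 and its modified consequent is constantly 1, which is neutral under the minimum.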




Fuzzy Implications and Matching Functions 
Clearly, since $A', A_i \in \mathcal{F}(X)$, the matching function is a map
$M\colon \mathcal{F}(X) \times \mathcal{F}(X) \to [0,1]$. Typically, a fuzzy subsethood measure Inc is
employed as an $M$. While there exist $M$ that are not based on fuzzy implications, it is seen that those that
are based on fuzzy implications often satisfy many of the desirable properties required of the matching
function $M$ in different contexts, for instance, when the SBR is required to be interpolative or
monotonic, or to possess good approximation properties. For more on this topic, see the works of
Jayaram [12.73] or Mandal and Jayaram [12.78].


Fuzzy Implications and Modification Functions 
From (12.14), it is clear that the modification function $J$ can be seen simply as a binary function on [0, 1].
While any fuzzy logic operation could be used for $J$, fuzzy implications are preferred either due to their
properties or due to the conditional nature of the underlying rules. For instance, when $J = I$ is a fuzzy
implication, if the original output $B_i$ is normal then the modified output $B'_i$ is also normal, which is
usually not the case when one uses, say, a t-norm. In fact, different properties of $J$ like (OP), (IP) and
the nature of its natural negation $N_J$ all play a role in the reasonableness of the final output
of an SBR.

In real-life systems, the input and output domains X, Y are
subsets of R. Now, let the consequents B_i be of bounded
support, i.e., {y ∈ Y ⊆ R | B_i(y) > 0} = [a, b] ⊆ Y for
some finite a, b ∈ R. When a J whose natural negation N_J is
not the Gödel (least) negation N_D1 is employed, the support
of the modified output B'_i becomes larger than that of B_i,
and when N_J is involutive the support of B'_i becomes the
whole of the set Y. This often makes the modified output sets
B'_i nonconvex (and of larger support), which makes it
difficult to apply standard defuzzification methods. For more
on this, see the work of Štěpnička and De Baets [12.79]. The
above discussion brings out an interesting aspect of fuzzy
implications. While fuzzy implications I whose N_I are strong
are to be preferred in the setting of fuzzy logic FL_n for
inferencing, as noted in Sect. 12.3.1 above, a J whose N_J is
not even continuous is to be preferred in the inference
mechanisms used in fuzzy control.
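The effect of the natural negation on the support can be seen numerically. In the sketch below, the Goguen implication (whose natural negation is the Gödel least negation) is contrasted with the Łukasiewicz implication (whose natural negation 1 - x is involutive). The pointwise form B'(y) = J(s, B(y)), the sampled consequent, and the matching degree are assumptions of the example.

```python
def goguen_implication(s, b):
    """Goguen implication: 1 if s <= b, else b / s; I(s, 0) = 0 for s > 0."""
    return 1.0 if s <= b else b / s


def lukasiewicz_implication(s, b):
    """Lukasiewicz implication; I(s, 0) = 1 - s is an involutive negation."""
    return min(1.0, 1.0 - s + b)


def support(fuzzy_set):
    """Indices of the sampled universe with strictly positive membership."""
    return [k for k, mu in enumerate(fuzzy_set) if mu > 0.0]


consequent = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]  # bounded support {2, 3, 4}
s = 0.5                                           # matching degree

goguen_out = [goguen_implication(s, b) for b in consequent]
lukasiewicz_out = [lukasiewicz_implication(s, b) for b in consequent]

print(support(consequent))       # support of the original consequent
print(support(goguen_out))       # unchanged support
print(support(lukasiewicz_out))  # support spreads over the whole universe
```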

By the core of a fuzzy set B on Y, we mean the set
{y ∈ Y | B(y) = 1}. Now, a J that possesses (OP) or (IP) is
preferred in an SBR to ensure that there is an overlap
between the cores of the modified outputs B'_i, a property
that is important for coherence in the system [12.80] and
that, once again, allows standard defuzzification methods to
be applied.


12.3.5 Fuzzy Mathematical Morphology 


Consider a 2D binary image P, i.e., an image whose value at
each pixel is either 0 or 1. P can be seen as a function
P : X ⊆ R² → {0, 1} or simply as a classical subset of R².
Mathematical morphology (MM) is a set-theoretic method for
the extraction of shape information from a scene. Here, a set
Y ⊆ R², which can be seen as another image Q and is often
referred to as the structuring element, is used to transform
the original image P by means of well-defined local operators
termed dilation and erosion, defined as

\[ D(P, Q) = \{ v \in \mathbb{R}^2 \mid A_v(Q) \cap P \neq \emptyset \}, \tag{12.15} \]
\[ E(P, Q) = \{ v \in \mathbb{R}^2 \mid A_v(Q) \subseteq P \}, \tag{12.16} \]

where A_v(Q) = {u ∈ R² | u − v ∈ Q} is the translation of
Q by v ∈ R².
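Definitions (12.15) and (12.16) can be tried out directly on a small pixel grid. The following sketch represents binary images as sets of integer coordinates; restricting v to a finite grid instead of the whole plane, as well as the particular image and structuring element, are assumptions made for the example.

```python
def translate(q, v):
    """A_v(Q) = {u | u - v in Q}: the structuring element shifted by v."""
    return {(qx + v[0], qy + v[1]) for qx, qy in q}


def dilation(p, q, grid):
    """Points v whose translated structuring element meets P, as in (12.15)."""
    return {v for v in grid if translate(q, v) & p}


def erosion(p, q, grid):
    """Points v whose translated structuring element fits inside P, as in (12.16)."""
    return {v for v in grid if translate(q, v) <= p}


grid = {(x, y) for x in range(5) for y in range(5)}     # 5x5 pixel grid
p = {(x, y) for x in range(1, 4) for y in range(1, 4)}  # 3x3 square image
q = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}          # cross-shaped element

dilated = dilation(p, q, grid)
eroded = erosion(p, q, grid)
print(len(dilated), sorted(eroded))
```

With this cross-shaped element the dilation grows the square by one pixel in each axis direction, while the erosion keeps only the center pixel.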

Fuzzy mathematical morphology (FMM) is the extension of MM to
gray-level images by using fuzzy sets and possibility theory.
Note that a gray-level image P can be interpreted as a fuzzy
set P : X ⊆ R² → [0, 1], where the pixel value is interpreted
as its membership degree to the original data set. This
fuzzified image is then processed via morphological operators
that are extensions of the Boolean ones.

In the literature, one finds two approaches to this
extension:


i) As a formal translation of the crisp equations using
t-norms and negations, by employing a fuzzy intersection
for ∩ in (12.15) and a fuzzy subsethood measure Inc for
⊆ in (12.16), and

ii) Using adjunction and residual implications. 


While the first approach is based on the duality be- 
tween dilation and erosion, the second approach stems 
more from an algebraic setting. 

De Baets [12.81, 82] took the second approach and defined the
fuzzy dilation and erosion as follows

\[ D(P, Q)(y) = \sup_{x \in A_y(Y) \cap X} C\bigl(Q(x - y), P(x)\bigr), \]

\[ E(P, Q)(y) = \inf_{x \in A_y(Y)} I\bigl(Q(x - y), P(x)\bigr), \]

where C is any fuzzy conjunction and I is a fuzzy implication.
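A minimal sketch of these two operators, reduced to one-dimensional gray-level signals to keep it short, is given below. It assumes that the structuring element Q is evaluated at the offset x - y and the image P at x, that P is 0 outside its domain for the erosion, and it picks the adjoint pair formed by the minimum t-norm and its residual (Gödel) implication. All names and sample values are hypothetical.

```python
def t_min(a, b):
    """Minimum t-norm, used as the fuzzy conjunction C."""
    return min(a, b)


def godel_implication(a, b):
    """Residual implication of the minimum t-norm."""
    return 1.0 if a <= b else b


def fuzzy_dilation(p, q, conj=t_min):
    """D(P, Q)(y) = sup over x in A_y(Y) and X of C(Q(x - y), P(x))."""
    return {
        y: max((conj(q[x - y], p[x]) for x in p if (x - y) in q), default=0.0)
        for y in p
    }


def fuzzy_erosion(p, q, impl=godel_implication):
    """E(P, Q)(y) = inf over x in A_y(Y) of I(Q(x - y), P(x)); P is 0 off-domain."""
    return {
        y: min(impl(degree, p.get(y + offset, 0.0)) for offset, degree in q.items())
        for y in p
    }


p = {0: 0.0, 1: 0.25, 2: 1.0, 3: 0.5, 4: 0.0}  # gray-level signal P
q = {-1: 0.5, 0: 1.0, 1: 0.5}                   # fuzzy structuring element Q

dilated = fuzzy_dilation(p, q)
eroded = fuzzy_erosion(p, q)
print(dilated)
print(eroded)
```

Because Q has degree 1 at offset 0, the erosion stays below P and the dilation stays above it at every position in this example.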

When the pair of operations (C, I) satisfies the adjunction
property or, equivalently, I is the residual implication
obtained from C, many interesting aspects emerge. Firstly, it
can be shown that the opening and closing operations, which
are morphological operations obtained from the D and E defined
above, turn out to be idempotent, which is highly desirable
[12.83]. Secondly, it can be shown, as was done by Nachtegael
and Kerre [12.84], that this approach is more general and that
many other approaches become special cases of it. Thirdly,
Bloch [12.85] recently showed that the two approaches above,
based on duality and on adjunction, are equivalent under some
rather general and mild conditions, which are moreover those
that often lead to highly desirable settings.

Recently, the approach initiated by De Baets has been extended
by considering uninorms instead of t-norms, together with
their residual implications, with good results in edge
detection as well as in noise reduction [12.86, 87].


12.4 Future of Fuzzy Implications


Since the publication of [12.2, 3], the surge of interest in
fuzzy implications has led to rapid progress in attempts to
solve open problems on this topic. In particular, many open
problems were presented in [12.3], covering all the subtopics
of this field: characterizations, intersections, additional
properties, etc. Many of these problems have already been
solved, and the solutions have been collected in [12.88].
However, many open problems involving fuzzy implications still
remain. In this section, we list some of them, chosen either
for the importance of the problem or for the significance of
its solution.

The first subset corresponds to open problems dealing with the
satisfaction of particular additional properties of fuzzy
implications. The first one deals with the law of importation
(LI). Recently, some works have dealt with this property and
its equivalence to the exchange principle, and from them some
new characterizations of (S,N)- and R-implications based on
(12.5) have been proposed, see [12.21]. However, some
questions are still open. Firstly, (12.5) with a t-norm (or
a more general conjunction) and (EP) are equivalent when N_I
is a continuous negation, but the equivalence in general is
not fully determined.


Problem 12.1
Characterize all the cases when (LI) and (EP) are equivalent.


Secondly, it is not yet known which fuzzy implications satisfy
(LI) when the conjunction operation is fixed.


Problem 12.2
Given a conjunction C (usually a t-norm or a conjunctive
uninorm), characterize all fuzzy implications I that satisfy
(LI) with this conjunction C. For instance, which implications
I satisfy the following functional equation

I(xy, z) = I(x, I(y, z))

that comes from (LI) with T = T_P?
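The functional equation I(xy, z) = I(x, I(y, z)) of Problem 12.2 can be probed numerically for individual implications. The sketch below measures the largest gap between the two sides on a regular grid; it is only a sanity check on sample points, not a proof, and the grid resolution and function names are assumptions.

```python
import itertools


def goguen(x, y):
    """Goguen implication, the residual of the product t-norm."""
    return 1.0 if x <= y else y / x


def reichenbach(x, y):
    """Reichenbach implication I(x, y) = 1 - x + x * y."""
    return 1.0 - x + x * y


def lukasiewicz(x, y):
    """Lukasiewicz implication I(x, y) = min(1, 1 - x + y)."""
    return min(1.0, 1.0 - x + y)


def max_li_gap(impl, steps=10):
    """Largest |I(x * y, z) - I(x, I(y, z))| over a regular grid on [0, 1]^3."""
    grid = [k / steps for k in range(steps + 1)]
    return max(
        abs(impl(x * y, z) - impl(x, impl(y, z)))
        for x, y, z in itertools.product(grid, repeat=3)
    )


for name, impl in [("Goguen", goguen), ("Reichenbach", reichenbach),
                   ("Lukasiewicz", lukasiewicz)]:
    print(name, max_li_gap(impl))
```

On this grid the Goguen and Reichenbach implications agree with the equation up to rounding, while the Łukasiewicz implication does not, for instance at x = y = 0.5, z = 0.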


Another problem, now concerning only the exchange principle,
follows.


Problem 12.3
Give a necessary condition on a nonborder continuous t-norm T
for the corresponding I_T to satisfy (EP).


It should be mentioned that some related work on 
the above problem appeared in [12.89]. 

Some other open problems concerning the satisfaction of
particular additional properties deal with the preservation of
these properties from some initial fuzzy implications to the
one generated by construction methods such as max, min, or
convex combination.


Problem 12.4
Characterize all fuzzy implications I, J such that I ∨ J,
I ∧ J and K^λ satisfy (EP) or (LI), where λ ∈ [0, 1].
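To see why preservation is not automatic, the sketch below builds a convex combination of two implications that both satisfy (EP) and evaluates the exchange principle I(x, I(y, z)) = I(y, I(x, z)) at one sample point. It assumes that the convex combination is K(x, y) = λ I(x, y) + (1 - λ) J(x, y); the chosen implications, λ = 0.5, and the sample point are illustrative only.

```python
def lukasiewicz(x, y):
    """Lukasiewicz implication, which satisfies (EP)."""
    return min(1.0, 1.0 - x + y)


def reichenbach(x, y):
    """Reichenbach implication, which also satisfies (EP)."""
    return 1.0 - x + x * y


def convex_combination(i, j, lam):
    """K(x, y) = lam * I(x, y) + (1 - lam) * J(x, y)  (assumed definition)."""
    return lambda x, y: lam * i(x, y) + (1.0 - lam) * j(x, y)


def ep_gap(impl, x, y, z):
    """|I(x, I(y, z)) - I(y, I(x, z))|; zero when (EP) holds at (x, y, z)."""
    return abs(impl(x, impl(y, z)) - impl(y, impl(x, z)))


k = convex_combination(lukasiewicz, reichenbach, 0.5)
point = (0.5, 0.8, 0.5)

print(ep_gap(lukasiewicz, *point))
print(ep_gap(reichenbach, *point))
print(ep_gap(k, *point))
```

At this point both generators give a gap of zero up to rounding, while their combination gives a clearly positive gap, so (EP) is lost for this particular pair and weight.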


The above problem is also related to the following 
one: 


Problem 12.5 

Characterize the convex closures of the following fam- 
ilies of fuzzy implications: (S,N)-, R- and Yager’s f- 
and g-generated implications. 


Another open problem of immense practical value is the
satisfaction of T-conditionality by Yager's families of fuzzy
implications.


Problem 12.6

Characterize Yager's f-generated and g-generated implications
satisfying the T-conditionality property with some t-norm T.
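T-conditionality is taken here to mean T(x, I(x, y)) ≤ y for all x, y in [0, 1]. The sketch below searches a grid for violations, using the Yager implication I(x, y) = y^x (the f-generated implication with f(x) = -ln x) together with two t-norms. The grid, the tolerance, and the function names are assumptions, and a grid search can only suggest, not establish, that the property holds.

```python
def yager(x, y):
    """Yager implication I(x, y) = y ** x, with the convention I(0, 0) = 1."""
    return 1.0 if x == 0.0 and y == 0.0 else y ** x


def product_t_norm(a, b):
    return a * b


def lukasiewicz_t_norm(a, b):
    return max(0.0, a + b - 1.0)


def tc_violations(t_norm, impl, steps=20, tol=1e-12):
    """Grid points (x, y) where T(x, I(x, y)) exceeds y by more than tol."""
    grid = [k / steps for k in range(steps + 1)]
    return [
        (x, y) for x in grid for y in grid
        if t_norm(x, impl(x, y)) > y + tol
    ]


print(len(tc_violations(product_t_norm, yager)))
print(len(tc_violations(lukasiewicz_t_norm, yager)))
```

With the product t-norm violations appear, for example at (x, y) = (0.5, 0.05), whereas none show up on this grid for the Łukasiewicz t-norm.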


The following two open problems are related to the
characterization of some particular classes of fuzzy
implications.


Problem 12.7
What is the characterization of (S,N)-implications generated
from noncontinuous negations?


Problem 12.8
Characterize triples (T, S, N) such that the corresponding
QL-operation I_{T,S,N} satisfies (I1).


Finally, a fruitful topic where many open problems 
are still to be solved is the study of the intersections 
among the classes of fuzzy implications (Fig. 12.1). 


Problem 12.9

i) Is there a fuzzy implication I, other than the Weber
implication I_WB, which is both an (S,N)-implication and an
R-implication obtained from a nonborder continuous t-norm,
and which cannot be obtained as the residual of any other
left-continuous t-norm?

ii) If the answer to the above question is affirmative,
characterize the above nonempty intersection.


Problem 12.10

i) Characterize the nonempty intersection between
(S,N)-implications and QL-implications, i.e.,
I_{S,N} ∩ I_{QL}.

ii) Is the Weber implication I_WB the only QL-implication that
is also an R-implication obtained from a non-left-continuous
t-norm? If not, give other examples from the above
intersection and hence characterize the nonempty intersection
between R-implications and QL-implications.

iii) Prove or disprove by giving an example: there is no fuzzy
implication that is both a QL- and an R-implication but is not
an (S,N)-implication, i.e., (I_{QL} ∩ I_T) \ I_{S,N} = ∅.


References

12.1 M. Baczyński, B. Jayaram: (S,N)- and R-implications: A state-of-the-art survey, Fuzzy Sets Syst. 159(14), 1836-1859 (2008)
12.2 M. Mas, M. Monserrat, J. Torrens, E. Trillas: A survey on fuzzy implication functions, IEEE Trans. Fuzzy Syst. 15(6), 1107-1121 (2007)
12.3 M. Baczyński, B. Jayaram: Fuzzy Implications, Studies in Fuzziness and Soft Computing, Vol. 231 (Springer, Berlin, Heidelberg 2008)
12.4 M. Baczyński, G. Beliakov, H. Bustince, A. Pradera (Eds.): Advances in Fuzzy Implication Functions, Studies in Fuzziness and Soft Computing, Vol. 300 (Springer, Berlin, Heidelberg 2013)
12.5 E.P. Klement, R. Mesiar, E. Pap: Triangular norms (Kluwer, Dordrecht 2000)
12.6 M. Mas, M. Monserrat, J. Torrens: Two types of implications derived from uninorms, Fuzzy Sets Syst. 158(23), 2612-2626 (2007)
12.7 S. Massanet, J. Torrens: On the characterization of Yager's implications, Inf. Sci. 201, 1-18 (2012)
12.8 I. Aguiló, J. Suñer, J. Torrens: A characterization of residual implications derived from left-continuous uninorms, Inf. Sci. 180(20), 3992-4005 (2010)
12.9 E. Trillas, C. Alsina: On the law [(p ∧ q) → r] = [(p → r) ∨ (q → r)] in fuzzy logic, IEEE Trans. Fuzzy Syst. 10(1), 84-88 (2002)
12.10 J. Balasubramaniam, C.J.M. Rao: On the distributivity of implication operators over T and S norms, IEEE Trans. Fuzzy Syst. 12(2), 194-198 (2004)
12.11 D. Ruiz-Aguilera, J. Torrens: Distributivity of strong implications over conjunctive and disjunctive uninorms, Kybernetika 42(3), 319-336 (2006)
12.12 D. Ruiz-Aguilera, J. Torrens: Distributivity of residual implications over conjunctive and disjunctive uninorms, Fuzzy Sets Syst. 158(1), 23-37 (2007)
12.13 S. Massanet, J. Torrens: On some properties of threshold generated implications, Fuzzy Sets Syst. 205(16), 30-49 (2012)
12.14 M. Baczyński: On the distributivity of fuzzy implications over continuous and Archimedean triangular conorms, Fuzzy Sets Syst. 161(10), 1406-1419 (2010)
12.15 M. Baczyński: On the distributivity of fuzzy implications over representable uninorms, Fuzzy Sets Syst. 161(17), 2256-2275 (2010)
12.16 M. Baczyński, B. Jayaram: On the distributivity of fuzzy implications over nilpotent or strict triangular conorms, IEEE Trans. Fuzzy Syst. 17(3), 590-603 (2009)
12.17 F. Qin, M. Baczyński, A. Xie: Distributive equations of implications based on continuous triangular norms (I), IEEE Trans. Fuzzy Syst. 20(1), 153-167 (2012)
12.18 M. Baczyński, F. Qin: Some remarks on the distributive equation of fuzzy implication and the contrapositive symmetry for continuous, Archimedean t-norms, Int. J. Approx. Reason. 54(2), 290-296 (2012)
12.19 B. Jayaram: On the law of importation (x ∧ y) → z ≡ (x → (y → z)) in fuzzy logic, IEEE Trans. Fuzzy Syst. 16(1), 130-144 (2008)


12.20 M. Štěpnička, B. Jayaram: On the suitability of the Bandler-Kohout subproduct as an inference mechanism, IEEE Trans. Fuzzy Syst. 18(2), 285-298 (2010)
12.21 S. Massanet, J. Torrens: The law of importation versus the exchange principle on fuzzy implications, Fuzzy Sets Syst. 168(1), 47-69 (2011)
12.22 H. Bustince, J. Fernandez, J. Sanz, M. Baczyński, R. Mesiar: Construction of strong equality index from implication operators, Fuzzy Sets Syst. 211(16), 15-33 (2013)
12.23 P. Hajek, L. Kohout: Fuzzy implications and generalized quantifiers, Int. J. Uncertain. Fuzziness Knowl. Syst. 4(3), 225-233 (1996)
12.24 P. Hajek, M.H. Chytil: The GUHA method of automatic hypotheses determination, Computing 1(4), 293-308 (1966)
12.25 P. Hajek, T. Havránek: The GUHA method - its aims and techniques, Int. J. Man-Mach. Stud. 10(1), 3-22 (1977)
12.26 P. Hajek, T. Havránek: Mechanizing Hypothesis Formation: Mathematical Foundations for a General Theory (Springer, Heidelberg 1978)
12.27 B. Jayaram, R. Mesiar: On special fuzzy implications, Fuzzy Sets Syst. 160(14), 2063-2085 (2009)
12.28 F. Durante, E. Klement, R. Mesiar, C. Sempi: Conjunctors and their residual implicators: Characterizations and construction methods, Mediterr. J. Math. 4(3), 343-356 (2007)
12.29 M. Carbonell, J. Torrens: Continuous R-implications generated from representable aggregation functions, Fuzzy Sets Syst. 161(17), 2276-2289 (2010)
12.30 Y. Ouyang: On fuzzy implications determined by aggregation operators, Inf. Sci. 193, 153-162 (2012)
12.31 V. Biba, D. Hliněná: Generated fuzzy implications and known classes of implications, Acta Univ. M. Belii Ser. Math. 16, 25-34 (2010)
12.32 H. Bustince, J. Fernandez, A. Pradera, G. Beliakov: On (TS, N)-fuzzy implications, Proc. AGOP 2011, Benevento, ed. by B. De Baets, R. Mesiar, L. Troiano (2011) pp. 93-98
12.33 I. Aguiló, M. Carbonell, J. Suñer, J. Torrens: Dual representable aggregation functions and their derived S-implications, Lect. Notes Comput. Sci. 6178, 408-417 (2010)
12.34 S. Massanet, J. Torrens: An overview of construction methods of fuzzy implications. In: Advances in Fuzzy Implication Functions, Studies in Fuzziness and Soft Computing, Vol. 300, ed. by M. Baczyński, G. Beliakov, H. Bustince, A. Pradera (Springer, Berlin, Heidelberg 2013) pp. 1-30
12.35 J. Balasubramaniam: Yager's new class of implications J_f and some classical tautologies, Inf. Sci. 177(3), 930-946 (2007)
12.36 M. Baczyński, B. Jayaram: Yager's classes of fuzzy implications: Some properties and intersections, Kybernetika 43(2), 157-182 (2007)


12.37 A. Xie, H. Liu: A generalization of Yager's f-generated implications, Int. J. Approx. Reason. 54(1), 35-46 (2013)
12.38 S. Massanet, J. Torrens: On a generalization of Yager's implications, Commun. Comput. Inf. Sci. Ser. 298, 315-324 (2012)
12.39 S. Massanet, J. Torrens: On a new class of fuzzy implications: h-implications and generalizations, Inf. Sci. 181(11), 2111-2127 (2011)
12.40 J.C. Fodor: Contrapositive symmetry of fuzzy implications, Fuzzy Sets Syst. 69(2), 141-156 (1995)
12.41 S. Jenei: New family of triangular norms via contrapositive symmetrization of residuated implications, Fuzzy Sets Syst. 110(2), 157-174 (2000)
12.42 B. Jayaram, R. Mesiar: I-Fuzzy equivalence relations and I-fuzzy partitions, Inf. Sci. 179(9), 1278-1297 (2009)
12.43 Y. Shi, B.V. Gasse, D. Ruan, E.E. Kerre: On dependencies and independencies of fuzzy implication axioms, Fuzzy Sets Syst. 161(10), 1388-1405 (2010)
12.44 S. Massanet, J. Torrens: Threshold generation method of construction of a new implication from two given ones, Fuzzy Sets Syst. 205, 50-75 (2012)
12.45 S. Massanet, J. Torrens: On the vertical threshold generation method of fuzzy implication and its properties, Fuzzy Sets Syst. 226, 32-52 (2013)
12.46 N.R. Vemuri, B. Jayaram: Fuzzy implications: Novel generation process and the consequent algebras, Commun. Comput. Inf. Sci. Ser. 298, 365-374 (2012)
12.47 P. Grzegorzewski: Probabilistic implications, Proc. EUSFLAT-LFA 2011, ed. by S. Galichet, J. Montero, G. Mauris (Aix-les-Bains, France 2011) pp. 254-258
12.48 P. Grzegorzewski: Probabilistic implications, Fuzzy Sets Syst. 226, 53-66 (2013)
12.49 P. Grzegorzewski: On the properties of probabilistic implications. In: Eurofuse 2011, Advances in Intelligent and Soft Computing, Vol. 107, ed. by P. Melo-Pinto, P. Couto, C. Serôdio, J. Fodor, B. De Baets (Springer, Berlin, Heidelberg 2012) pp. 67-78
12.50 A. Dolati, J. Fernandez Sanchez, M. Ubeda-Flores: A copula-based family of fuzzy implication operators, Fuzzy Sets Syst. 211(16), 55-61 (2013)
12.51 P. Grzegorzewski: Survival implications, Commun. Comput. Inf. Sci. Ser. 298, 335-344 (2012)
12.52 G. Deschrijver, E. Kerre: On the relation between some extensions of fuzzy set theory, Fuzzy Sets Syst. 133(2), 227-235 (2003)
12.53 G. Deschrijver, E. Kerre: Triangular norms and related operators in L*-fuzzy set theory. In: Logical, Algebraic, Analytic, Probabilistic Aspects of Triangular Norms, ed. by E. Klement, R. Mesiar (Elsevier, Amsterdam 2005) pp. 231-259
12.54 G. Deschrijver: Implication functions in interval-valued fuzzy set theory. In: Advances in Fuzzy Implication Functions, Studies in Fuzziness and Soft Computing, Vol. 300, ed. by M. Baczyński, G. Beliakov, H. Bustince, A. Pradera (Springer, Berlin, Heidelberg 2013) pp. 73-99


12.55 C. Alcalde, A. Burusco, R. Fuentes-Gonzalez: A constructive method for the definition of interval-valued fuzzy implication operators, Fuzzy Sets Syst. 153(2), 211-227 (2005)
12.56 H. Bustince, E. Barrenechea, V. Mohedano: Intuitionistic fuzzy implication operators - an expression and main properties, Int. J. Uncertain. Fuzziness Knowl. Syst. 12(3), 387-406 (2004)
12.57 G. Deschrijver, E. Kerre: Implicators based on binary aggregation operators in interval-valued fuzzy set theory, Fuzzy Sets Syst. 153(2), 229-248 (2005)
12.58 R. Reiser, B. Bedregal: K-operators: An approach to the generation of interval-valued fuzzy implications from fuzzy implications and vice versa, Inf. Sci. 257, 286-300 (2013)
12.59 G. Mayor, J. Torrens: Triangular norms in discrete settings. In: Logical, Algebraic, Analytic, and Probabilistic Aspects of Triangular Norms, ed. by E.P. Klement, R. Mesiar (Elsevier, Amsterdam 2005) pp. 189-230
12.60 M. Mas, M. Monserrat, J. Torrens: S-implications and R-implications on a finite chain, Kybernetika 40(1), 3-20 (2004)
12.61 M. Mas, M. Monserrat, J. Torrens: On two types of discrete implications, Int. J. Approx. Reason. 40(3), 262-279 (2005)
12.62 J. Casasnovas, J. Riera: S-implications in the set of discrete fuzzy numbers, Proc. IEEE-WCCI 2010, Barcelona (2010), pp. 2741-2747
12.63 J.V. Riera, J. Torrens: Fuzzy implications defined on the set of discrete fuzzy numbers, Proc. EUSFLAT-LFA 2011 (2011) pp. 259-266
12.64 J.V. Riera, J. Torrens: Residual implications in the set of discrete fuzzy numbers, Inf. Sci. 247, 131-143 (2013)
12.65 P. Smets, P. Magrez: Implication in fuzzy logic, Int. J. Approx. Reason. 1(4), 327-347 (1987)
12.66 J.C. Bezdek, D. Dubois, H. Prade: Fuzzy Sets in Approximate Reasoning and Information Systems (Kluwer, Dordrecht 1999)
12.67 D. Driankov, H. Hellendoorn, M. Reinfrank: An Introduction to Fuzzy Control, 2nd edn. (Springer, London 1996)
12.68 D. Dubois, H. Prade: Fuzzy sets in approximate reasoning, Part 1: Inference with possibility distributions, Fuzzy Sets Syst. 40(1), 143-202 (1991)
12.69 G.J. Klir, B. Yuan: Fuzzy Sets and Fuzzy Logic: Theory and Applications (Prentice Hall, Hoboken 1995)
12.70 L.A. Zadeh: Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst. Man Cybern. 3(1), 28-44 (1973)
12.71 W. Bandler, L.J. Kohout: Semantics of implication operators and fuzzy relational products, Int. J. Man-Mach. Stud. 12(1), 89-116 (1980)
12.72 W.E. Combs, J.E. Andrews: Combinatorial rule explosion eliminated by a fuzzy rule configuration, IEEE Trans. Fuzzy Syst. 6(1), 1-11 (1998)


12.73 B. Jayaram: Rule reduction for efficient inferencing in similarity based reasoning, Int. J. Approx. Reason. 48(1), 156-173 (2008)
12.74 L.A. Zadeh: Fuzzy sets, Inf. Control 8(3), 338-353 (1965)
12.75 D. Sinha, E.R. Dougherty: Fuzzification of set inclusion: Theory and applications, Fuzzy Sets Syst. 55(1), 15-42 (1991)
12.76 L. Kitainik: Fuzzy inclusions and fuzzy dichotomous decision procedures. In: Optimization models using fuzzy sets and possibility theory, ed. by J. Kacprzyk, S. Orlovski (Reidel, Dordrecht 1987) pp. 154-170
12.77 W. Bandler, L. Kohout: Fuzzy power sets and fuzzy implication operators, Fuzzy Sets Syst. 4(1), 13-30 (1980)
12.78 S. Mandal, B. Jayaram: Approximation capability of SISO SBR fuzzy systems based on fuzzy implications, Proc. AGOP 2011, ed. by B. De Baets, R. Mesiar, L. Troiano (University of Sannio, Benevento 2011) pp. 105-110
12.79 M. Štěpnička, B. De Baets: Monotonicity of implicative fuzzy models, Proc. FUZZ-IEEE, 2010 Barcelona (2010), pp. 1-7
12.80 D. Dubois, H. Prade, L. Ughetto: Checking the coherence and redundancy of fuzzy knowledge bases, IEEE Trans. Fuzzy Syst. 5(3), 398-417 (1997)
12.81 B. De Baets: Idempotent closing and opening operations in fuzzy mathematical morphology, Proc. ISUMA-NAFIPS'95, Maryland (1995), pp. 228-233
12.82 B. De Baets: Fuzzy morphology: A logical approach. In: Uncertainty Analysis in Engineering and Science: Fuzzy Logic, Statistics, Neural Network Approach, ed. by B.M. Ayyub, M.M. Gupta (Kluwer, Dordrecht 1997) pp. 53-68
12.83 J. Serra: Image Analysis and Mathematical Morphology (Academic, London, New York 1988)
12.84 M. Nachtegael, E.E. Kerre: Connections between binary, gray-scale and fuzzy mathematical morphologies, Fuzzy Sets Syst. 124(1), 73-85 (2001)
12.85 I. Bloch: Duality vs. adjunction for fuzzy mathematical morphology and general form of fuzzy erosions and dilations, Fuzzy Sets Syst. 160(13), 1858-1867 (2009)
12.86 M. González-Hidalgo, A. Mir Torres, D. Ruiz-Aguilera, J. Torrens Sastre: Edge-images using a uninorm-based fuzzy mathematical morphology: Opening and closing. In: Advances in Computational Vision and Medical Image Processing, Computational Methods in Applied Sciences, Vol. 13, ed. by J. Tavares, N. Jorge (Springer, Berlin, Heidelberg 2009) pp. 137-157
12.87 M. González-Hidalgo, A. Mir Torres, J. Torrens Sastre: Noisy image edge detection using an uninorm fuzzy morphological gradient, Proc. ISDA 2009 (IEEE Computer Society, Los Alamitos 2009) pp. 1335-1340
12.88 M. Baczyński, B. Jayaram: Fuzzy implications: Some recently solved problems. In: Advances in Fuzzy Implication Functions, Studies in Fuzziness and Soft Computing, Vol. 300, ed. by M. Baczyński, G. Beliakov, H. Bustince, A. Pradera (Springer, Berlin, Heidelberg 2013) pp. 177-204
12.89 B. Jayaram, M. Baczyński, R. Mesiar: R-implications and the exchange principle: The case of border continuous t-norms, Fuzzy Sets Syst. 224, 93-105 (2013)


13. Fuzzy Rule-Based Systems 


Luis Magdalena 


Fuzzy rule-based systems are one of the most important areas
of application of fuzzy sets and fuzzy logic. Constituting an
extension of classical rule-based systems, these have been
successfully applied to a wide range of problems in different
domains for which uncertainty and vagueness emerge in multiple
ways. In a broad sense, fuzzy rule-based systems are
rule-based systems where fuzzy sets and fuzzy logic are used
as tools for representing different forms of knowledge about
the problem at hand, as well as for modeling the interactions
and relationships existing between its variables. The use of
fuzzy statements as one of the main constituents of the rules
allows capturing and handling the potential uncertainty of the
represented knowledge. On the other hand, thanks to the use of
fuzzy logic, inference methods have become more robust and
flexible. This chapter will mainly analyze what a fuzzy
rule-based system is (from both conceptual and structural
points of view), how it is built, and how it can be used. The
analysis will start by considering the two main conceptual
components of these systems, knowledge and reasoning, and how
they are represented. Then, a review of the main structural
approaches to fuzzy rule-based systems will be considered.
Hierarchical fuzzy systems will also be analyzed. Once the
components, structure, and approaches to those systems have
been defined, the question of design will be considered.
Finally, some conclusions will be presented.


13.1 Components of a Fuzzy Rule-Based System
     13.1.1 Knowledge Base
     13.1.2 Processing Structure
13.2 Types of Fuzzy Rule-Based Systems
     13.2.1 Linguistic Fuzzy Rule-Based Systems
     13.2.2 Variants of Mamdani Fuzzy Rule-Based Systems
     13.2.3 Takagi-Sugeno-Kang Fuzzy Rule-Based Systems
     13.2.4 Singleton Fuzzy Rule-Based Systems
     13.2.5 Fuzzy Rule-Based Classifiers
     13.2.6 Type-2 Fuzzy Rule-Based Systems
     13.2.7 Fuzzy Systems with Implicative Rules
13.3 Hierarchical Fuzzy Rule-Based Systems
13.4 Fuzzy Rule-Based Systems Design
     13.4.1 FRBS Properties
     13.4.2 Designing FRBSs
13.5 Conclusions
References


From the point of view of applications, one of the most
important areas of fuzzy set theory is that of fuzzy
rule-based systems (FRBSs). This kind of system constitutes an
extension of classical rule-based systems, considering IF-THEN
rules whose antecedents and consequents are composed of fuzzy
logic (FL) statements instead of classical logic ones.

Conventional approaches to knowledge representation are based
on bivalent logic, which has a serious associated shortcoming:
the inability to reason in situations of uncertainty and
imprecision. As a consequence, conventional approaches do not
provide an adequate framework for this mode of reasoning,
which is familiar to humans and into which most commonsense
reasoning falls.

In a broad sense, an FRBS is a rule-based system where fuzzy
sets and FL are used as tools for representing different forms
of knowledge about the problem at hand, as well as for
modeling the interactions and relationships existing between
its variables.

The use of fuzzy statements as one of the main constituents of
the rules allows capturing and handling the potential
uncertainty of the represented knowledge. On the other hand,
thanks to the use of fuzzy logic, inference methods have
become more robust and flexible.

Due to these properties, FRBSs have been successfully applied
to a wide range of problems in different domains for which
uncertainty and vagueness emerge in multiple ways [13.1-5].

The analysis of FRBSs will start by considering the two main
conceptual components of these systems, knowledge and
reasoning, and how they are represented. Then, a review of the
main structural approaches to FRBSs will be considered.
Hierarchical fuzzy systems could fit in that section but,
since the hierarchical approach can be combined with any of
the structural models defined there, it seems better to
consider them independently. Once the components, structure,
and approaches to those systems have been defined, the
question of design will be considered. Finally, some
conclusions will be presented. It is important to note that
this chapter concentrates on the general aspects of FRBSs
without going deeper into the foundations of FL, which are
widely covered in previous chapters.


13.1 Components of a Fuzzy Rule-Based System 


Knowledge representation in FRBSs is enhanced with
the use of linguistic variables and their linguistic
values, which are defined by context-dependent fuzzy sets
whose meanings are specified by gradual membership
functions [13.6-8]. On the other hand, FL inference
methods such as generalized Modus Ponens, general- 
ized Modus Tollens, etc., form the basis for approx- 
imate reasoning [13.9]. Hence, FL provides a unique 
computational framework for inference in rule-based 
systems. This idea implies the presence of two clearly 
different concepts in FRBSs: knowledge and reasoning. 
This clear separation between knowledge and reason- 
ing (the knowledge base (KB) and processing structure 
shown in Fig. 13.1) is the key aspect of knowledge- 
based systems, so that from this point of view, FRBSs 
can be considered as a type of knowledge-based system. 

The first implementation of an FRBS dealing with real inputs
and outputs was proposed by Mamdani [13.10], who, considering
the ideas published just a few months before by Zadeh [13.9],
was able to augment his initial formulation, allowing the
application of fuzzy systems (FSs) to a control problem and so
creating the first fuzzy control application. These kinds of
FSs are also referred to as FRBSs with fuzzifier and
defuzzifier or, more commonly, as fuzzy logic controllers
(FLCs), as proposed by the author in his pioneering paper
[13.11], or as Mamdani FRBSs. From the beginning, the term FLC
became popular, since control systems design constituted the
main application of Mamdani FRBSs. At present, control is only
one of the many application areas of FRBSs.

The generic structure of a Mamdani FRBS is shown in Fig. 13.1.
The KB stores the available knowledge about the problem in the
form of fuzzy IF-THEN rules. The processing structure, by
means of these rules, puts into effect the inference process
on the system inputs. The fulfillment of a rule's antecedent
gives rise to the execution of its consequent, i.e., an output
is produced. The overall process includes several steps. The
input and output scalings produce domain adaptations. The
fuzzification interface establishes a mapping between crisp
values in the input domain U and fuzzy sets defined on the
same universe of discourse. On the other hand, the
defuzzification interface performs the opposite operation by
defining a mapping between fuzzy sets defined in the output
domain V and crisp values defined in the same universe. The
central step of the process is inference.

Fig. 13.1 General structure of a Mamdani FRBS. The knowledge
base comprises scaling functions, fuzzy rules, and fuzzy
partitions; the processing structure comprises input scaling,
fuzzification, the inference engine, defuzzification, and
output scaling

The next two subsections analyze in depth the two 
main components of an FRBS, the KB and the pro- 
cessing structure, considering the case of a Mamdani 
FRBS. 
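The chain of steps described above (input scaling, fuzzification, inference, defuzzification, output scaling) can be sketched for a single-input, single-output system as follows. Everything concrete here is an assumption for illustration: the two-term partitions, the two rules, min for applying the firing degree, max for aggregation, and centre-of-gravity defuzzification on a sampled output universe.

```python
def triangle(a, b, c):
    """Triangular membership function with peak at b and support (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Knowledge base (hypothetical): partitions on the normalized domain [0, 1].
inputs = {"low": triangle(-1.0, 0.0, 1.0), "high": triangle(0.0, 1.0, 2.0)}
outputs = {"slow": triangle(-1.0, 0.0, 1.0), "fast": triangle(0.0, 1.0, 2.0)}
rules = [("low", "slow"), ("high", "fast")]  # IF X is ... THEN Y is ...


def infer(x_raw, in_range=(0.0, 100.0), out_range=(0.0, 10.0), samples=101):
    # 1. Input scaling: map the raw input onto the normalized universe.
    x = (x_raw - in_range[0]) / (in_range[1] - in_range[0])
    # 2. Fuzzification and matching: firing degree of each rule at x.
    firing = [(inputs[term](x), out_term) for term, out_term in rules]
    # 3. Inference: clip each consequent by its firing degree (min) and
    #    aggregate the clipped consequents with max.
    ys = [k / (samples - 1) for k in range(samples)]
    agg = [max(min(w, outputs[t](y)) for w, t in firing) for y in ys]
    # 4. Defuzzification: centre of gravity of the aggregated fuzzy set
    #    (assumes at least one rule fires, true for inputs inside in_range).
    y = sum(m * v for m, v in zip(agg, ys)) / sum(agg)
    # 5. Output scaling back to the external output domain.
    return out_range[0] + y * (out_range[1] - out_range[0])


print(infer(10.0), infer(50.0), infer(90.0))
```

For the mid-range input both rules fire equally and the crisp output falls in the middle of the output range; lower and higher inputs pull it toward "slow" and "fast", respectively.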


13.1.1 Knowledge Base 


The KB of an FRBS serves as the repository of the
problem-specific knowledge, which models the relationship
between input and output of the underlying system, and upon
which the inference process reasons to obtain an associated
output from an observed input.

This knowledge is represented in the form of rules, and the
most common rule structure in Mamdani FRBSs involves the use
of linguistic variables [13.6-8]. Hence, when dealing with
multiple-input single-output (MISO) systems, these linguistic
rules take the following form


IF X_1 is LT_1 and ... and X_n is LT_n
THEN Y is LT_o ,    (13.1)


with X_i and Y being, respectively, the input and output
linguistic variables, and with LT_i being linguistic terms
associated with these variables.
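One possible in-memory representation of a rule of the form (13.1) is sketched below. The class name `LinguisticRule`, its fields, and the variable and term names are hypothetical and serve only to make the rule structure concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinguisticRule:
    """A MISO linguistic rule: a conjunction of (variable, term) pairs
    in the antecedent and a single (variable, term) pair in the consequent."""
    antecedent: tuple
    consequent: tuple

    def __str__(self):
        conditions = " and ".join(
            f"{variable} is {term}" for variable, term in self.antecedent
        )
        return f"IF {conditions} THEN {self.consequent[0]} is {self.consequent[1]}"


rule = LinguisticRule(
    antecedent=(("X1", "small"), ("X2", "long")),
    consequent=("Y", "good"),
)
print(rule)
```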

Note that the KB contains two different information
levels, i.e., the linguistic variables (providing fuzzy
rule semantics in the form of fuzzy partitions) and
the linguistic rules representing the expert knowledge.
Apart from that, a third component, scaling functions, is
added in many FRBSs to act as an interfacing component
for domain adaptation between the external world
and the universes of discourse used at the level of the
fuzzy partitions. This conceptual distinction leads to
the three separate entities that constitute the KB:


• The fuzzy partitions (also called Frames of Cognition)
  describe the sets of linguistic terms associated
  with each variable and considered in the linguistic
  rules, and the membership functions defining
  the semantics of these linguistic terms. Each
  linguistic variable involved in the problem has
  an associated fuzzy partition of its domain.
  Figure 13.2 shows a fuzzy partition using triangular
  membership functions. This structure provides
  a natural framework to include expert knowledge
  in the form of fuzzy rules. The fuzzy partition
  shown in the figure uses five linguistic terms {very
  small, small, medium, large, and very large}
  (represented as VS, S, M, L, and VL, respectively),
  with the interval [l, r] being its domain (universe
  of discourse). The figure also shows the membership
  function associated with each of these five terms.

• A rule base (RB) comprises a collection of
  linguistic rules (such as the one shown in (13.1)) that
  are joined by the also operator. In other words,
  multiple rules can fire simultaneously for the same
  input.

• Moreover, the KB also comprises the scaling
  functions or scaling factors that are used to
  transform between the universe of discourse in
  which the fuzzy sets are defined and the domain
  of the system input and output variables.
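
A fuzzy partition like the one described for Fig. 13.2 can be sketched in a few lines. The following is only an illustration under stated assumptions: the helper names (`triangular`, `uniform_partition`), the evenly spaced peaks, and the domain [0, 10] are hypothetical choices, not something prescribed by the chapter.

```python
def triangular(a, b, c):
    """Return a triangular membership function with support [a, c] and peak b."""
    def mu(x):
        if x == b:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu


def uniform_partition(left, right, labels):
    """Evenly spaced triangular sets over [left, right], one peak per label."""
    step = (right - left) / (len(labels) - 1)
    return {
        label: triangular(left + (k - 1) * step, left + k * step, left + (k + 1) * step)
        for k, label in enumerate(labels)
    }


# Assumed domain [l, r] = [0, 10] with the five terms of the example partition.
partition = uniform_partition(0.0, 10.0, ["VS", "S", "M", "L", "VL"])

# Membership degree of the crisp value 3.0 in every linguistic term.
degrees = {label: mu(3.0) for label, mu in partition.items()}
print(degrees)
```

With this particular layout, adjacent terms cross at membership 0.5 and a crisp value belongs to at most two neighbouring terms.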


It is important to note that the RB can present 
several structures. The usual one is the list of rules, 
although a decision table (also called rule matrix) be- 
comes an equivalent and more compact representation 
for the same set of linguistic rules when only a few in- 
put variables (usually one or two) are considered by the 
FRBS. 

Fig. 13.2 Example of a fuzzy partition (five triangular membership functions VS, S, M, L, and VL over the domain [l, r])

Let us consider an FRBS where two input variables
($x_1$ and $x_2$) and a single output variable ($y$)
are involved, with the following term sets associated:
{small, medium, large}, {short, medium, long} and
{bad, medium, good}, respectively. The following RB,
composed of five linguistic rules

$R_1$: IF $X_1$ is small and $X_2$ is short THEN $Y$ is bad, also
$R_2$: IF $X_1$ is small and $X_2$ is medium THEN $Y$ is bad, also
$R_3$: IF $X_1$ is medium and $X_2$ is short THEN $Y$ is medium, also
$R_4$: IF $X_1$ is large and $X_2$ is medium THEN $Y$ is medium, also
$R_5$: IF $X_1$ is large and $X_2$ is long THEN $Y$ is good , (13.2)

can be represented by the decision table shown in
Table 13.1.
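
The equivalence between the rule list and the decision table can be made concrete with a small sketch. The data structures below (a list of triples and a dictionary keyed by antecedent terms) and the `render` helper are illustrative assumptions, one of many possible encodings.

```python
# Rule base (13.2) as a list of (x1_term, x2_term, y_term) triples.
rules = [
    ("small", "short", "bad"),
    ("small", "medium", "bad"),
    ("medium", "short", "medium"),
    ("large", "medium", "medium"),
    ("large", "long", "good"),
]

x1_terms = ["small", "medium", "large"]
x2_terms = ["short", "medium", "long"]

# Decision table: (x1_term, x2_term) -> y_term; cells without a rule are absent.
table = {(a1, a2): y for a1, a2, y in rules}


def render(table):
    """Format the table with x2 terms as rows and x1 terms as columns."""
    lines = ["x2 \\ x1".ljust(10) + "".join(t.ljust(8) for t in x1_terms)]
    for a2 in x2_terms:
        cells = (table.get((a1, a2), "-").ljust(8) for a1 in x1_terms)
        lines.append(a2.ljust(10) + "".join(cells))
    return "\n".join(lines)


print(render(table))
```

The dictionary lookup makes the compactness argument visible: each cell of the table holds at most one consequent, and the four empty cells correspond to input combinations not covered by any rule.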

Before concluding this section, we should notice 
two aspects. On one hand, the structure of a lin- 
guistic rule may be more generic if a connective 
other than the and operator is used to aggregate the 
terms in the rule antecedent. However, it has been 
demonstrated that the above rule structure is generic 
enough to subsume other possible rule representa- 
tions [13.12]. The above rules are therefore com- 
monly used throughout the literature due to their sim- 
plicity and generality. On the other hand, linguistic 
rules are not the only option and rules with a dif- 
ferent structure can be considered, as we shall see in 
Sect. 13.2. 


13.1.2 Processing Structure 


The functioning of FRBSs has been described as the
interaction of knowledge and reasoning. With the
knowledge component briefly considered, this section
analyzes the reasoning (processing) structure. The
processing structure of a Mamdani FRBS is composed of
the following five components:


• The input scaling that transforms the values of the
  input variables from their domain to the one where the
  input fuzzy partitions are defined.

• A fuzzification interface that transforms the crisp
  input data into fuzzy values that serve as the input to
  the fuzzy reasoning process.

• An inference engine that infers from the fuzzy
  inputs to several resulting output fuzzy sets according
  to the information stored in the KB.

• A defuzzification interface that converts the fuzzy
  sets obtained from the inference process into a crisp
  value.

• The output scaling that transforms the defuzzified
  value from the domain of the output fuzzy partitions
  to that of the output variables, constituting the
  global output of the FRBS.


Table 13.1 Example of a decision table

x2 \ x1    small    medium    large
short      bad      medium    -
medium     bad      -         medium
long       -        -         good


In the following, the five elements will be briefly 
described. 


The Input/Output Scaling 
Input/output scaling maps (applying the corresponding 
scaling functions or factors contained in the KB) the in- 
put/output variables to/from the universes of discourse 
over which the corresponding linguistic variables were 
defined. 

This mapping can be performed with different func- 
tions ranging from a simple scaling factor to linear and 
nonlinear functions. 

The initial idea for scaling was the use of scaling 
factors with a tuning purpose [13.13], giving a certain 
adaptation capability to the fuzzy system. 

Additional degrees of freedom could be obtained by 
using a more complex scaling function. A second op- 
tion is the use of linear scaling with a function of the 
form 


$f(x) = \lambda \cdot x + \nu$ , (13.3)


where the scaling factor $\lambda$ enlarges or reduces the
operating range, which in turn decreases or increases the
sensitivity of the system with respect to that input
variable, or the corresponding gain in the case of an output
variable. The parameter $\nu$ shifts the operating range and
plays the role of an offset for the corresponding
variable.

Finally, it is possible to use more complex mappings 
generating nonlinear scaling. A common nonlinear 
scaling function is 


$f(x) = \mathrm{sign}(x) \cdot |x|^{\alpha}$ . (13.4)


This nonlinear scaling increases ($\alpha > 1$) or decreases
($\alpha < 1$) the relative sensitivity in the region closer to the
central point of the interval and has the opposite effect
when moving far from the central point [13.14].
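
The two scaling functions (13.3) and (13.4) translate directly into code. The sketch below is illustrative only: the function and parameter names (`factor`, `offset`, `exponent`) and the sensor range [20, 80] are assumptions chosen for the example.

```python
import math


def linear_scaling(x, factor, offset):
    """Linear scaling of the form (13.3): factor * x + offset."""
    return factor * x + offset


def nonlinear_scaling(x, exponent):
    """Nonlinear scaling of the form (13.4): sign(x) * |x| ** exponent."""
    return math.copysign(abs(x) ** exponent, x)


# Hypothetical input range [20, 80] mapped onto the normalized domain [-1, 1].
low, high = 20.0, 80.0
factor = 2.0 / (high - low)
offset = -(high + low) / (high - low)

normalized = [linear_scaling(x, factor, offset) for x in (20.0, 50.0, 80.0)]
warped = [nonlinear_scaling(u, 2.0) for u in normalized]
print(normalized, warped)
```

A common arrangement, assumed here, is to apply the linear map first so that the nonlinear function operates on a domain that is symmetric around its central point.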


The Fuzzification Interface 

The fuzzification interface enables Mamdani FRBSs
to handle crisp input values. Fuzzification establishes
a mapping from crisp input values to fuzzy sets defined
in the universe of discourse of those inputs. The fuzzy
set $A'$ defined over the universe of discourse $U$ and
associated with a crisp input value $x_0$ is computed as

$A' = F(x_0)$ , (13.5)

in which $F$ is a fuzzification operator.

The most common choice for the fuzzification operator
$F$ is the pointwise or singleton fuzzification, where
$A'$ is built as a singleton with support $x_0$, i.e., it presents
the following membership function:

$\mu_{A'}(x) = \begin{cases} 1, & \text{if } x = x_0 \\ 0, & \text{otherwise} \end{cases}$ . (13.6)

Nonsingleton options [13.15] are also possible and have
been considered in some cases as a tool to represent the
imprecision of measurements.


The Inference System 
The inference system is the component that derives 
the fuzzy outputs from the input fuzzy sets accord- 
ing to the relation defined through the fuzzy rules. The 
usual fuzzy inference scheme employs the generalized 
Modus Ponens, an extension to the classical Modus Po- 
nens [13.9] 


    IF X is A THEN Y is B
    X is A'
    ---------------------      (13.7)
    Y is B'

In this expression, IF X is A THEN Y is B describes
a conditional statement that in this case is a fuzzy
conditional statement, since A and B are fuzzy sets, and
X and Y are linguistic variables. A fuzzy conditional
statement like this one represents a fuzzy relation
between A and B defined in $U \times V$. This fuzzy relation is
expressed again by a fuzzy set ($R$) whose membership
function $\mu_R(x, y)$ is given by

$\mu_R(x, y) = I(\mu_A(x), \mu_B(y))$ , $\forall x \in U, y \in V$ , (13.8)

in which $\mu_A(x)$ and $\mu_B(y)$ are the membership functions
of the fuzzy sets $A$ and $B$, and $I$ is a fuzzy implication
operator that models the existing fuzzy relation.


Going back to (13.7), the result of applying gen- 
eralized Modus Ponens is obtaining the fuzzy set B’ 
(through its membership function) by means of the 
compositional rule of inference [13.9]: 


If R is a fuzzy relation defined in U and V, and A’ 
is a fuzzy set defined in U, then the fuzzy set B’, in- 
duced by A’, is obtained from the composition of R 
and A’, 


that is

$B' = A' \circ R$ . (13.9)


Now it is needed to compute the fuzzy set B’ from 
A’ and R. According to the definition of composition 
(T-composition) given in the chapter devoted to fuzzy 
relations, the result will be 


$\mu_{B'}(y) = \sup_{x \in U} T(\mu_{A'}(x), \mu_R(x, y))$ , (13.10)


where T is a triangular norm (t-norm). The concept and 
properties of t-norms have been previously introduced 
in the chapter devoted to fuzzy sets. 

Given now an input value $X = x_0$, obtaining $A'$ in
accordance with (13.6) (where $\mu_{A'}(x) = 0 \ \forall x \neq x_0$),
and considering the properties of t-norms ($T(1, a) = a$,
$T(0, a) = 0$), the previous expression is reduced to

$\mu_{B'}(y) = T(\mu_{A'}(x_0), \mu_R(x_0, y)) = T(1, \mu_R(x_0, y)) = \mu_R(x_0, y)$ . (13.11)
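
The reduction from (13.10) to (13.11) can be checked numerically on finite universes, where the supremum becomes a maximum. The sketch below assumes small, hypothetical universes U and V and an arbitrary relation R; it only illustrates the algebra and is not tied to any particular FRBS.

```python
def compose(a_prime, relation, t_norm=min):
    """Sup-T composition (13.10) on finite universes.

    a_prime maps x -> membership, relation maps (x, y) -> membership.
    """
    ys = sorted({y for (_, y) in relation})
    return {
        y: max(t_norm(a_prime[x], relation[(x, y)]) for x in a_prime)
        for y in ys
    }


# Hypothetical discrete universes and fuzzy relation R on U x V.
U = [0, 1, 2]
V = ["high", "low"]
R = {
    (0, "low"): 0.9, (0, "high"): 0.1,
    (1, "low"): 0.4, (1, "high"): 0.6,
    (2, "low"): 0.0, (2, "high"): 1.0,
}

x0 = 1
singleton = {x: 1.0 if x == x0 else 0.0 for x in U}   # singleton fuzzification (13.6)

b_min = compose(singleton, R, t_norm=min)
b_prod = compose(singleton, R, t_norm=lambda a, b: a * b)

# Both t-norms give back the row of R selected by x0, as stated in (13.11).
print(b_min, b_prod)
```

Because every t-norm satisfies T(1, a) = a and T(0, a) = 0, the choice of T is irrelevant once the input is a singleton; it matters only for nonsingleton fuzzification.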


The only additional point to arrive at the final value
of $\mu_{B'}(y)$ is the definition of $R$, the fuzzy relation
representing the implication. This is a somewhat
controversial question. Since the very first applications of
FRBSs [13.10, 11] this relation has been implemented
with the minimum (the product has also been a common
choice). If we analyze the definition of fuzzy
implication given in the corresponding chapter, it is clear
that the minimum does not satisfy all the conditions
to be a fuzzy implication, so why is it used? It can
be said that initially it was a sort of heuristic
decision, which produced really good results and was
therefore accepted and reproduced in all subsequent
applications. Further analysis can offer different
explanations for this choice [13.16-18].

In any case, assuming the minimum as the represen- 
tation for R, (13.11) produces the following final result: 


$\mu_{B'}(y) = \min(\mu_A(x_0), \mu_B(y))$ . (13.12)
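
Expression (13.12) amounts to clipping the consequent fuzzy set at the height given by the matching degree. A minimal sketch follows, assuming triangular sets with hypothetical parameters; the product variant mentioned above is included for comparison.

```python
def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak b (requires a < b < c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def mu_a(x):
    """Antecedent fuzzy set A (hypothetical parameters)."""
    return tri(x, 0.0, 5.0, 10.0)


def mu_b(y):
    """Consequent fuzzy set B (hypothetical parameters)."""
    return tri(y, 0.0, 50.0, 100.0)


def infer(h, consequent, relation=min):
    """Output set of one rule: B'(y) = relation(h, B(y)), cf. (13.12)."""
    return lambda y: relation(h, consequent(y))


x0 = 4.0
h = mu_a(x0)                                       # matching degree of the input

clipped = infer(h, mu_b)                           # minimum: B is clipped at h
scaled = infer(h, mu_b, lambda a, b: a * b)        # product: B is scaled by h

print(h, clipped(50.0), clipped(20.0), scaled(20.0))
```

The minimum flattens the top of the consequent, whereas the product preserves its shape and rescales it; both leave the support of B unchanged.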


Considering now an n-dimensional input space, the
inference will establish a mapping between fuzzy sets
defined in the Cartesian product
($U = U_1 \times U_2 \times \cdots \times U_n$) of the universes of
discourse of the input variables $X_1, \ldots, X_n$, and fuzzy
sets defined in $V$, the universe of discourse of the output
variable $Y$. Therefore, when applied to the $i$th rule of
the RB, defined as

$R_i$: IF $X_1$ is $A_{i1}$ and ... and $X_n$ is $A_{in}$ THEN $Y$ is $B_i$ , (13.13)

considering an input value $x_0 = (x_1, \ldots, x_n)$, the
output fuzzy set $B'$ will be obtained by replacing
$\mu_A(x_0)$ in (13.12) with

$\mu_{A_i}(x_0) = T(\mu_{A_{i1}}(x_1), \ldots, \mu_{A_{in}}(x_n))$ ,

where $T$ is a fuzzy conjunctive operator (a t-norm).


The Defuzzification Interface
The inference process in Mamdani-type FRBSs operates
at the level of individual rules. Thus, the application
of the compositional rule of inference to the current
input, using the m rules in the KB, generates m output
fuzzy sets $B'_i$. The defuzzification interface has to
aggregate the information provided by the m individual
outputs and obtain a crisp output value from the
aggregated set. This task can be done in two different
ways [13.1, 12, 19]: Mode A-FATI (first aggregate, then
infer) and Mode B-FITA (first infer, then aggregate).

Mamdani originally suggested the mode A-FATI
in his first conception of FLCs [13.10]. In the last
few years, the Mode B-FITA is becoming more popular
[13.19-21], in particular, in real-time applications
which demand a fast response time.


Mode A-FATI: First Aggregate, then Infer. In this
case, the defuzzification interface operates as follows:

• Aggregate the individual fuzzy sets $B'_i$ into an
  overall fuzzy set $B'$ by means of a fuzzy aggregation
  operator $G$ (usually named the also operator):

  $\mu_{B'}(y) = G\{\mu_{B'_1}(y), \mu_{B'_2}(y), \ldots, \mu_{B'_m}(y)\}$ . (13.14)

• Employ a defuzzification method, $D$, transforming
  the fuzzy set $B'$ into a crisp output value $y_0$:

  $y_0 = D(\mu_{B'}(y))$ . (13.15)

Usually, the aggregation operator $G$ is implemented
by the maximum (a t-conorm), and the defuzzifier $D$
is the center of gravity (CG) or the mean of maxima
(MOM), whose expressions are as follows:

• CG:

  $y_0 = \dfrac{\int_Y y \cdot \mu_{B'}(y) \, dy}{\int_Y \mu_{B'}(y) \, dy}$ . (13.16)

• MOM:

  $y_{\inf} = \inf\{z \mid \mu_{B'}(z) = \max_y \mu_{B'}(y)\}$ ,
  $y_{\sup} = \sup\{z \mid \mu_{B'}(z) = \max_y \mu_{B'}(y)\}$ ,
  $y_0 = \dfrac{y_{\inf} + y_{\sup}}{2}$ . (13.17)


Mode B-FITA: First Infer, then Aggregate. In this
second approach, the contribution of each fuzzy set is
considered separately and the final crisp value is
obtained by means of an averaging or selection operation
applied to the set of crisp values derived from each of
the individual fuzzy sets $B'_i$.

The most common choice is either the CG or the
maximum value (MV), then weighted by the matching
degree. Its expression is shown as follows:

$y_0 = \dfrac{\sum_{i=1}^{m} h_i \cdot y_i}{\sum_{i=1}^{m} h_i}$ , (13.18)

with $y_i$ being the CG or the MV of the fuzzy set $B'_i$,
inferred from rule $R_i$, and $h_i = \mu_{A_i}(x_0)$ being the
matching degree between the system input $x_0$ and the
antecedent (premise) of rule $i$.

Hence, this approach avoids aggregating the rule 
outputs to generate the final fuzzy set B’, reducing the 
computational burden compared to mode A-FATI de- 
fuzzification. 

This defuzzification mode constitutes a different
approach to the notion of the also operator, and it is
directly related to the idea of interpolation and the
approach of Takagi-Sugeno-Kang (TSK) fuzzy systems,
as can be seen by comparing (13.18) and (13.25).
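
The two defuzzification modes can be compared on a small numerical example. The sketch below is illustrative: the two fired rules, their matching degrees, the triangular consequents, and the 0.5-wide sampling grid on [0, 100] are all assumptions, and the integrals in the CG are replaced by sums over that grid.

```python
def tri(y, a, b, c):
    """Triangular membership with support [a, c] and peak b (requires a < b < c)."""
    return max(0.0, min((y - a) / (b - a), (c - y) / (c - b)))


# Hypothetical fired rules: (matching degree h_i, consequent triangle (a, b, c)).
fired = [(0.25, (0.0, 25.0, 50.0)), (0.75, (50.0, 75.0, 100.0))]

ys = [k * 0.5 for k in range(201)]          # sampling grid on the output domain


def clipped(h, abc):
    """Sampled output set of one rule under minimum inference, cf. (13.12)."""
    return [min(h, tri(y, *abc)) for y in ys]


def centre_of_gravity(mu):
    """Discrete approximation of the CG defuzzifier (13.16)."""
    return sum(y * m for y, m in zip(ys, mu)) / sum(mu)


def mean_of_maxima(mu):
    """MOM defuzzifier (13.17): midpoint of the smallest and largest maximizers."""
    top = max(mu)
    peak = [y for y, m in zip(ys, mu) if m == top]
    return (peak[0] + peak[-1]) / 2


outputs = [clipped(h, abc) for h, abc in fired]

# Mode A-FATI: aggregate with the maximum (13.14), then defuzzify (13.15).
aggregated = [max(ms) for ms in zip(*outputs)]
y_fati_cg = centre_of_gravity(aggregated)
y_fati_mom = mean_of_maxima(aggregated)

# Mode B-FITA: defuzzify each rule output, then average weighted by h_i (13.18).
y_fita = (sum(h * centre_of_gravity(mu) for (h, _), mu in zip(fired, outputs))
          / sum(h for h, _ in fired))

print(y_fati_cg, y_fati_mom, y_fita)
```

In this example the three crisp outputs differ: MOM ignores the weaker rule altogether, whereas the two CG-based results weight it differently. In Mode B the per-rule values $y_i$ depend only on the rule base when the MV is used, so they could be precomputed once, which is one way to read the reduced computational burden mentioned above.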




13.2 Types of Fuzzy Rule-Based Systems 


As discussed earlier, the first proposal of an FRBS was
that of Mamdani, and this kind of system has been
taken as the basis for the general description in the
previous section. This section focuses on the different
structures that can be considered when building an
FRBS.


13.2.1 Linguistic Fuzzy Rule-Based Systems 


This approach corresponds to the original Mam- 
dani FRBS [13.10,11], being the main tool to de- 
velop Linguistic models, and is the approach that has 
been mainly considered to this point in the chap- 
ter. 

A Mamdani FRBS provides a natural framework 
to include expert knowledge in the form of linguis- 
tic rules. This knowledge can be easily combined with 
rules which are automatically generated from data sets 
that describe the relation between system input and 
output. In addition, this knowledge is highly inter- 
pretable. The fuzzy rules are composed of input and 
output variables, which take values from their term sets 
having a meaning (a semantics) associated with each 
linguistic term. Therefore, each rule is a description of
a condition-action statement that exhibits a clear
interpretation to a human; for this reason, these kinds
of systems are usually called linguistic or descriptive
Mamdani FRBSs. This property makes Mamdani
FRBSs appropriate for applications in which the
emphasis lies on model interpretability, such as fuzzy
control [13.20, 22, 23] and linguistic modeling [13.4,
21].


13.2.2 Variants of Mamdani Fuzzy 
Rule-Based Systems 


Although Mamdani FRBSs possess several advantages, 
they also come with some drawbacks. One of the prob- 
lems, especially in linguistic modeling applications, 
is their limited accuracy in some complex problems, 
which is due to the structure of the linguistic rules. 
The analyses in [13.24] and [13.25] conclude that the
structure of the fuzzy linguistic IF-THEN rule is subject
to certain restrictions because of the use of linguistic
variables:


• There is a lack of flexibility in the FRBS due
  to the rigid partitioning of the input and output
  spaces.

• When the input variables are mutually dependent, it
  becomes difficult to find a proper fuzzy partition of
  the input space.

• The homogeneous partition of the input and output
  space becomes inefficient and does not scale well
  as the dimensionality and complexity of the
  input-output mapping increases.

• The size of the KB increases rapidly with the number
  of variables and linguistic terms in the system.
  This problem is known as the curse of dimensionality.
  In order to obtain an accurate FRBS, a fine
  level of granularity is needed, which requires
  additional linguistic terms. This increase in granularity
  causes the number of rules to grow, which complicates
  the interpretability of the system by a human.
  Moreover, in the vast majority of cases, it is possible
  to obtain an equivalent FRBS that achieves the
  same accuracy with fewer rules whose
  fuzzy sets are not restricted to a fixed input space
  partition.
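
The growth of the KB can be quantified with a short calculation. The sketch below counts the distinct rule antecedents available on a complete grid partition, which is the upper bound on the number of non-redundant linguistic rules of the form (13.1); the function name and the chosen granularities are illustrative assumptions.

```python
from math import prod


def grid_size(terms_per_input):
    """Number of distinct antecedents on a complete grid partition."""
    return prod(terms_per_input)


# Five linguistic terms per input, for 2, 4, and 6 input variables.
sizes = {n: grid_size([5] * n) for n in (2, 4, 6)}
print(sizes)   # {2: 25, 4: 625, 6: 15625}
```

Refining every partition from five to seven terms in the six-input case raises the bound from 15 625 to 117 649 antecedents, which illustrates why finer granularity quickly compromises interpretability.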


Both variants of linguistic Mamdani FRBSs de- 
scribed in this section attempt to solve the said prob- 
lems by making the linguistic rule structure more 
flexible. 


DNF Mamdani Fuzzy Rule-Based Systems 
The first extension to Mamdani FRBSs aims at a different
rule structure, the so-called disjunctive normal form
(DNF) fuzzy rule, which has the following form [13.26,
27]:

IF $X_1$ is $A_1$ and ... and $X_n$ is $A_n$ THEN $Y$ is $B$ , (13.19)

where each input variable $X_i$ takes as its value a set
of linguistic terms $A_i$, whose members are joined by
a disjunctive operator, while the output variable remains
a usual linguistic variable with a single label associated.
Thus, the complete syntax for the antecedent of the rule
is

$X_1$ is $A_1 = \{A_{11}$ or ... or $A_{1 l_1}\}$ and ... and $X_n$ is $A_n = \{A_{n1}$ or ... or $A_{n l_n}\}$ . (13.20)

An example of this kind of rule is shown as follows. Let
us suppose we have three input variables, $X_1$, $X_2$, and
$X_3$, and one output variable, $Y$, such that the linguistic
term sets $D_i$ ($i = 1, 2, 3$) and $F$, associated with each
variable, are

$D_1 = \{A_{11}, A_{12}, A_{13}\}$
$D_2 = \{A_{21}, A_{22}, A_{23}, A_{24}, A_{25}\}$
$D_3 = \{A_{31}, A_{32}\}$
$F = \{B_1, B_2, B_3\}$ . (13.21)

In this case, an example of DNF rule will be

IF $X_1$ is $\{A_{11}$ or $A_{12}\}$ and $X_2$ is $\{A_{23}$ or $A_{24}\}$ and $X_3$ is $\{A_{31}$ or $A_{32}\}$ THEN $Y$ is $B_2$ . (13.22)


This expression contains an additional connective
different from the and considered in all previous rules.
The or connective is computed through a t-conorm, the
maximum being the most commonly used.

The main advantage of this rule structure is its
ability to integrate in a single expression (a single
DNF rule) the information corresponding to several
elemental rules (the rules commonly used in Mamdani
FRBSs). In this example, (13.22) corresponds to
8 (2 x 2 x 2) rules of the equivalent system expressed as
(13.1). This property produces a certain level of
compression of the rule base, which is quite helpful when
the number of input variables increases, alleviating the
effect of the curse of dimensionality.
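
The correspondence between the DNF rule (13.22) and its eight elemental rules can be sketched as follows. The dictionary encoding, the `matching` helper, and the membership degrees assigned to the current input are hypothetical; the minimum and maximum are used as the and/or operators, as suggested in the text.

```python
from functools import reduce
from itertools import product

# Antecedent of the DNF rule (13.22): admissible terms per input variable.
dnf_rule = {"X1": ["A11", "A12"], "X2": ["A23", "A24"], "X3": ["A31", "A32"]}

# Expansion into elemental rules of the form (13.1): one term per variable.
elemental_rules = [dict(zip(dnf_rule, combo)) for combo in product(*dnf_rule.values())]


def matching(rule, degrees, t_norm=min, t_conorm=max):
    """Matching degree: t-conorm within each variable, t-norm across variables."""
    per_input = [
        reduce(t_conorm, (degrees[var][term] for term in terms))
        for var, terms in rule.items()
    ]
    return reduce(t_norm, per_input)


# Hypothetical membership degrees of the current input in every linguistic term.
degrees = {
    "X1": {"A11": 0.2, "A12": 0.7, "A13": 0.0},
    "X2": {"A21": 0.0, "A22": 0.0, "A23": 0.5, "A24": 0.4, "A25": 0.0},
    "X3": {"A31": 0.9, "A32": 0.1},
}

h_dnf = matching(dnf_rule, degrees)
h_best_elemental = max(
    min(degrees[var][term] for var, term in rule.items()) for rule in elemental_rules
)
print(len(elemental_rules), h_dnf, h_best_elemental)
```

With the minimum and maximum, the matching degree of the DNF rule coincides with the largest matching degree among its elemental rules, because the minimum distributes over the maximum. For other t-norm and t-conorm pairs this equality is not guaranteed.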


Approximate Mamdani-Type Fuzzy Rule-Based Systems
While the previous DNF fuzzy rule structure does not
involve an important loss in the linguistic Mamdani
FRBS interpretability, the point of departure for the
second extension is to obtain an FRBS which achieves
a better accuracy at the cost of reduced interpretability.
These systems are called approximate Mamdani-type
FRBSs [13.1, 25, 28-30], in opposition to the previous
descriptive or linguistic Mamdani FRBSs.

Fig. 13.3a,b Comparison between a descriptive KB and an approximate fuzzy rule base. (a) Descriptive knowledge base: fuzzy partitions {NB, NM, NS, ZR, PS, PM, PB} for X and for Y, and rules R1 to R7 of the form "If X is NB then Y is NB". (b) Approximate fuzzy rule base: rules R1 to R4, each carrying its own input and output fuzzy sets

The structure of an approximate FRBS is similar to 
that of a descriptive one shown in Fig. 13.1. The dif- 
ference is that in this case, the rules do not refer in 
their definition to predefined fuzzy partitions of the lin- 
guistic variables. In an approximate FRBS, each rule 
defines its own fuzzy sets instead of using a linguistic 
label pointing to a particular fuzzy set of the partition 
of the underlying linguistic variable. Thus, an approxi- 
mate fuzzy rule has the following form: 


IF $X_1$ is $A_1$ and ... and $X_n$ is $A_n$ THEN $Y$ is $B$ . (13.23)


The major difference with respect to the rule structure
considered in linguistic Mamdani FRBSs is the fact
that the input variables $X_i$ and the output one $Y$ are fuzzy
variables instead of linguistic variables and, thus, $A_i$ and
$B$ are not linguistic terms ($LT_i$) as they were in (13.1),
but independently defined fuzzy sets that elude an
intuitive linguistic interpretation. In other words, rules of
approximate nature are semantics-free, whereas descriptive
rules operate in the context formulated by means of
the semantics of the linguistic terms.

Therefore, approximate FRBSs do not rely on
fuzzy partitions defining a semantic context in the form
of linguistic terms. The fuzzy partitions are in effect
integrated into the fuzzy rule base, in which each rule
subsumes the definition of its underlying input and
output fuzzy sets, as shown in Fig. 13.3(b).

Approximate FRBSs demonstrate some specific ad- 
vantages over linguistic FRBSs making them particu- 
larly useful for certain types of applications [13.25]: 


• The major advantage of the approximate approach
  is that each rule employs its own distinct fuzzy sets,
  resulting in additional degrees of freedom and an
  increase in expressiveness. This means that tuning
  a certain fuzzy set in a rule has no effect on
  other rules, while changing a fuzzy set of a fuzzy
  partition in a descriptive model affects all rules
  that use the corresponding linguistic label.

• Another important advantage is that the number of
  rules can be adapted to the complexity of the problem.
  Simple input-output relationships are modeled
  with a few rules, but more rules can be added as
  the complexity of the problem increases. Therefore,
  approximate FRBSs constitute a potential remedy
  to the curse of dimensionality that emerges when
  scaling to multidimensional systems.


These properties enable approximate FRBSs to 
achieve a better accuracy than linguistic FRBS in com- 
plex problem domains. However, despite their benefits, 
they also come with some drawbacks: 


• Their main drawback compared to the descriptive
  FRBS is the degradation in terms of interpretability
  of the RB as the fuzzy variables no longer
  share a unique linguistic interpretation. Still,
  unlike other kinds of approximate models such as
  neural networks that store knowledge implicitly,
  the knowledge in an approximate FRBS remains
  explicit as the system behavior is described by
  local rules. Therefore, approximate FRBSs can be
  considered as a compromise between the apparent
  interpretability of descriptive FRBSs and the type of
  black-box behavior typical of nondescriptive,
  implicit models.

• The capability to approximate a set of training data
  accurately can lead to over-fitting and therefore to
  a poor generalization capability to cope with
  previously unseen input data.


According to their properties, fuzzy model- 
ing [13.1] constitutes the major application of approx- 
imate FRBSs, as model accuracy is more relevant than 
description ability. Approximate FRBSs are usually not 
the first choice for linguistic modeling and fuzzy control 
problems. Hence, descriptive and approximate FRBSs 
are considered as complementary rather than competi- 
tive approaches. Depending on the problem domain and 
requirements on the obtained model, one should use one 
or the other approach. Approximate FRBSs are
recommended when one is willing to trade interpretability
for improved accuracy.


13.2.3 Takagi-Sugeno-Kang 
Fuzzy Rule-Based Systems 


Instead of working with linguistic rules of the kind
introduced in the previous section, Sugeno et al. [13.31,
32] proposed a new model based on rules whose
antecedent is composed of linguistic variables and the
consequent is represented by a function of the input
variables. The most common form of this kind of rules
is the one in which the consequent expression
constitutes a linear combination of the variables involved in
the antecedent

IF $X_1$ is $A_1$ and ... and $X_n$ is $A_n$ THEN $Y = p_0 + p_1 \cdot X_1 + \cdots + p_n \cdot X_n$ , (13.24)

where $X_i$ are the input variables, $Y$ is the output variable,
and $p = (p_0, p_1, \ldots, p_n)$ is a vector of real parameters.
Regarding $A_i$, they are either a direct specification of
a fuzzy set (thus $X_i$ being fuzzy variables) or a linguistic
label that points to a particular member of a fuzzy
partition of a linguistic variable. These rules, and
consequently the systems using them, are usually called TSK
fuzzy rules, in reference to the names of their first
proponents.

The output of a TSK FRBS, using a KB composed
of m rules, is obtained as a weighted average of the
individual outputs provided by each rule, $Y_i$, $i = 1, \ldots, m$,
as follows:

$\dfrac{\sum_{i=1}^{m} h_i \cdot Y_i}{\sum_{i=1}^{m} h_i}$ , (13.25)

in which $h_i = T(A_{i1}(x_1), \ldots, A_{in}(x_n))$ is the matching
degree between the antecedent part of the $i$th rule
and the current inputs to the system, $x_0 = (x_1, \ldots, x_n)$.
$T$ stands for a conjunctive operator modeled by
a t-norm. Therefore, to design the inference engine of TSK
FRBSs, the designer only selects this conjunctive
operator $T$, with the most common choices being the
minimum and the product. As a consequence, TSK
systems do not need defuzzification, since their outputs
are real numbers.

This type of FRBS divides the input space in several
fuzzy subspaces and defines a linear input-output
relationship in each one of these subspaces [13.31].
In the inference process, these partial relationships are
combined in the said way for obtaining the global
input-output relationship, taking into account the
dominance of the partial relationships in their respective
areas of application and the conflicts emerging in the
overlapping zones. As a result, the overall system
performs as a sort of interpolation of the local models
represented by each individual rule.

TSK FRBSs have been successfully applied to
a large variety of practical problems. The main
advantage of these systems is that they present a set of
compact system equations that allows the parameters $p_i$
to be estimated by means of classical regression methods,
which facilitates the design process. However, the
main drawback associated with TSK FRBSs is the form
of the rule consequents, which does not provide a natural
framework for representing expert knowledge that is
afflicted with uncertainty. Still, it becomes possible to
integrate expert knowledge in these FRBSs by slightly
modifying the rule consequent: for each linguistic rule
with consequent $Y$ is $B$, provided by an expert, its
consequent is substituted by $Y = p_0$, with $p_0$ standing for
the modal point of the fuzzy set associated with the
label $B$. These kinds of rules are usually called simplified
TSK rules or zero-order TSK rules.
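
The weighted average (13.25) with linear consequents (13.24) can be sketched as follows. The two rules, their triangular antecedents, and the parameter vectors are hypothetical; setting every $p_j$ with $j \geq 1$ to zero in a rule would turn it into a zero-order TSK rule.

```python
from functools import reduce


def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak b (requires a < b < c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


# Hypothetical two-input rule base: (antecedent triangles per input, [p0, p1, p2]).
rules = [
    ([(-10.0, 0.0, 10.0), (-10.0, 0.0, 10.0)], [1.0, 0.5, 0.0]),
    ([(0.0, 10.0, 20.0), (-10.0, 0.0, 10.0)], [4.0, -0.2, 1.0]),
]


def tsk_output(x, rules, t_norm=min):
    """Weighted average (13.25) of the rule outputs Y_i with weights h_i."""
    numerator = denominator = 0.0
    for antecedent, p in rules:
        # Matching degree h_i = T(A_i1(x_1), ..., A_in(x_n)).
        h = reduce(t_norm, (tri(xj, *abc) for xj, abc in zip(x, antecedent)))
        # Linear consequent Y_i = p0 + p1 * x1 + ... + pn * xn, as in (13.24).
        y = p[0] + sum(pj * xj for pj, xj in zip(p[1:], x))
        numerator += h * y
        denominator += h
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


x = (2.5, 5.0)
y_min = tsk_output(x, rules)                                  # minimum as T
y_prod = tsk_output(x, rules, t_norm=lambda a, b: a * b)      # product as T
print(y_min, y_prod)
```

The explicit check for a zero denominator is a design choice of this sketch: an input matched by no rule leaves (13.25) undefined, and raising an error makes the incomplete coverage visible rather than returning an arbitrary value.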

TSK FRBSs are nevertheless more difficult to interpret
than Mamdani FRBSs, for two reasons:


• The structure of the rule consequents is difficult
  for human experts to understand, except for
  zero-order TSK.

• Their overall output simultaneously depends on the
  activation of the rule antecedents and on the
  evaluation of the function defining the rule consequent,
  which itself depends on the crisp inputs rather than
  being constant.


TSK FRBSs are used in fuzzy modeling [13.4, 31]
as well as control problems [13.31, 33].

As with Mamdani FRBSs, it is also possible to build
descriptive as well as approximate TSK systems.


13.2.4 Singleton Fuzzy Rule-Based Systems 


The singleton FRBS, where the rule consequent takes
a single real-valued number, may be considered as
a particular case of the linguistic FRBS (the consequent
is a fuzzy set where the membership function is one
for a specific value and zero for the remaining ones) or
of the TSK-type FRBS (the previously described
zero-order TSK systems).

Its rule structure is the following

IF $X_1$ is $A_1$ and ... and $X_n$ is $A_n$ THEN $Y$ is $y_0$ . (13.26)


Since the single consequent seems to be more easily
interpretable than a polynomial function, the singleton
FRBS may be used to develop linguistic fuzzy models.
Nevertheless, compared with the linguistic FRBS,
the fact of having a different consequent value for each
rule (no global semantics is used for the output variable)
worsens the interpretability.


13.2.5 Fuzzy Rule-Based Classifiers 


Previous sections have implicitly considered FRBSs
working with inputs and, more importantly, outputs
that are real variables. These kinds of fuzzy
systems show an interpolative behavior where the overall
output is a combination of the individual outputs of
the fired rules. This interpolative behavior is explicit in
TSK models but it is also present in Mamdani systems.
It gives FRBSs a smooth output, with soft transitions
between rules, which is one of their significant
properties.

A completely different situation is that of having
a problem where the output takes values from a finite
list of possible values representing categories or classes.
Under those circumstances, the interpolative approach
of the previously defined aggregation and defuzzification
methods makes no sense. As a consequence, some
additional comments will be added to highlight the main
characteristics of fuzzy rule-based classifiers (FRBCs),
and the differences with other FRBSs.

A fuzzy rule-based classifier is an automatic
classification system that uses fuzzy rules as its knowledge
representation tool. Therefore, the fuzzy classification
rule structure is as follows

IF $X_1$ is $A_1$ and ... and $X_n$ is $A_n$ THEN $Y$ is $C$ , (13.27)

with $Y$ being a categorical variable and $C$ being a class
label. The processing structure is similar to that
previously described as far as the evaluation of the
matching degree between each rule's antecedent and the
current input is concerned, i.e., for each rule $R_i$ we obtain
$h_i = T(A_{i1}(x_1), \ldots, A_{in}(x_n))$. Once $h_i$ has been
obtained, the winner-rule criterion can be applied, so that
the overall output is the consequent of the rule achieving
the highest matching degree (highest value of $h_i$). More
elaborate schemes, such as voting, are also possible.

Other alternative representations that include
a certainty degree or weight for each rule have also been
considered [13.34]. In this case, the previously
described rule will also include a rule weight $w_i$ that
weights the matching degree during the inference process.
The effect will be that the winning rule will be
that achieving the highest value of $h_i \cdot w_i$, or in the case
of voting schemes, the influence of the vote of the rule
will be proportional to this value.
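
The winner-rule and voting criteria with rule weights can be sketched as follows. The fired rules, their matching degrees, and their weights are hypothetical, and the additive vote used here is only one of several possible voting schemes.

```python
from collections import defaultdict

# Hypothetical fired rules: (class label C, matching degree h_i, rule weight w_i).
fired = [("A", 0.5, 1.0), ("B", 0.6, 0.7), ("B", 0.4, 0.9)]


def winner_rule(fired):
    """Class of the single rule with the highest weighted matching h_i * w_i."""
    label, _, _ = max(fired, key=lambda rule: rule[1] * rule[2])
    return label


def weighted_vote(fired):
    """Class whose rules accumulate the highest sum of h_i * w_i."""
    votes = defaultdict(float)
    for label, h, w in fired:
        votes[label] += h * w
    return max(votes, key=votes.get)


print(winner_rule(fired), weighted_vote(fired))   # A B
```

In this example the two criteria disagree: the single strongest rule favours class A, while the combined support of the two weaker rules favours class B. Without the weights, the winner rule would be the second one (matching degree 0.6), which shows how the rule weights can change the decision.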


13.2.6 Type-2 Fuzzy Rule-Based Systems 


The idea of extending fuzzy sets by allowing membership
functions to include some kind of uncertainty was
already mentioned by Zadeh in early papers [13.6-8].
The idea, which was not really exploited for a long period,
has now achieved a significant presence in the literature
with the proposal of Type-2 fuzzy systems and Interval
type-2 fuzzy systems [13.35]. The main concept is that
the membership degree is not a value but a fuzzy set or
an interval, respectively. The effect is that additional
degrees of freedom become available in the design
process, at the price of a more complex processing
structure, which now requires a type-reduction step
added to the overall process described in the previous
section. As the complexity of the type-reduction process
is much lower for Interval type-2 fuzzy systems than
in the general case of type-2 fuzzy systems, interval
approaches are now the most widely considered in the area
of Type-2 fuzzy sets.


13.2.7 Fuzzy Systems with Implicative Rules 


The rule-based systems mentioned up to this point consider rules that, having the form if X is A then Y is B, model the inference through a t-norm, usually minimum or product (Sect. 13.1.2). With this interpretation, rules are described as conjunctive rules, representing joint sets of possible input and output values. As mentioned in the chapter devoted to fuzzy control, these rules should be seen not as logical implications but rather as input-output associations.

That kind of rule is the one commonly used in real applications to date. However, different authors have pointed out that the same rule has a completely different meaning when modeled in terms of material implication (the approach for Boolean if-then statements in propositional logic) [13.18]. As a result, in addition to the common interpretation of fuzzy rules that is widely considered in the literature, some authors are exploring the modeling of fuzzy rules (with exactly the same structure previously mentioned) by means of material implications [13.36]. Even though they are at a quite preliminary stage of development, it is interesting to mention these ideas, since they constitute a completely different interpretation of FRBSs and thus offer new possibilities to the field.
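The difference between the two readings can be seen pointwise. In the sketch below, a and b stand for the membership degrees A(x) and B(y); the conjunctive reading uses the minimum t-norm, while the implicative reading uses max(1 - a, b), the Kleene-Dienes implication. That operator is chosen here only as one common fuzzy counterpart of material implication; the text does not fix a particular implication operator, so this choice is an assumption.

```python
def conjunctive(a: float, b: float) -> float:
    """Conjunctive reading of 'if X is A then Y is B': min(A(x), B(y))."""
    return min(a, b)


def implicative(a: float, b: float) -> float:
    """Implicative reading with the Kleene-Dienes implication (assumed operator)."""
    return max(1.0 - a, b)


# Compare both readings for a few antecedent and consequent degrees.
for a in (0.0, 0.5, 1.0):
    for b in (0.0, 0.3, 1.0):
        print(f"A(x)={a:.1f} B(y)={b:.1f} "
              f"conjunctive={conjunctive(a, b):.1f} implicative={implicative(a, b):.1f}")
```

When the antecedent is not satisfied (a = 0), the conjunctive rule marks no output as possible, whereas the implicative rule leaves every output fully possible, i.e., it imposes no restriction. When a = 1 both readings coincide with b.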


13.3 Hierarchical Fuzzy Rule-Based Systems 


The knowledge structure of FRBSs offers different 
options to introduce hierarchical structures. Rules, par- 
titions, or variables can be distributed at different levels 
according to their specificity, granularity, relevance, etc. 
This section will introduce different approaches to hier- 
archical FRBSs. 

Hierarchical fuzzy systems could be considered either as a different type of FRBS, and thus included in the previous section, or as a design option to build simpler FRBSs, and thus included in the next section. Including them in the previous section could be somewhat confusing, since the hierarchical approach can be combined with several of the structural models defined there; it therefore seems better to consider them independently and devote a separate section to them.

The definition of hierarchical fuzzy systems as a method to solve problems with a higher level of complexity than those usually addressed with FRBSs has produced some good results. In most cases, the underlying idea is to cope with the complexity of a problem by applying some kind of decomposition that generates a hierarchy of lower-complexity systems [13.37].

Several methods to establish hierarchies in fuzzy controllers have been proposed. These methods can be grouped according to the way they structure the inference process and the knowledge applied.

A first approach defines the hierarchy as a prioritization of rules, in such a way that rules with a different level of specificity receive a different priority, with more specific rules having higher priority [13.38, 39]. With this kind of hierarchy, a generic rule is applied only when no suitable specific rule is available. In this case, the hierarchy is the effect of a particular implication mechanism that applies the rules taking into account their priority. This methodology defines the hierarchy (the decomposition) at the level of rules: the rules are grouped into prioritized levels to design a hierarchical fuzzy controller.
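One simple way to picture a rule hierarchy based on priorities is sketched below in Python. It is a deliberately crude reading of the idea (the first level in which some rule fires decides the output) and is not the mechanism defined in [13.38, 39]; the single-input rules, the crisp consequents, the firing threshold, and the weighted average used inside a level are all assumptions of this example.

```python
from typing import Callable, List, Optional, Tuple

# A rule is (antecedent membership function, crisp output); illustrative only.
Rule = Tuple[Callable[[float], float], float]


def triangular(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def prioritized_inference(levels: List[List[Rule]], x: float,
                          threshold: float = 0.0) -> Optional[float]:
    """Scan levels from most to least specific; use the first level that fires."""
    for rules in levels:
        fired = [(mu(x), out) for mu, out in rules]
        fired = [(h, out) for h, out in fired if h > threshold]
        if fired:
            total = sum(h for h, _ in fired)
            # Weighted average of the consequents of the rules fired at this level.
            return sum(h * out for h, out in fired) / total
    return None  # no rule at any level covers the input


specific = [(triangular(4.0, 5.0, 6.0), 10.0)]  # exception around x = 5
generic = [(lambda x: 1.0, 0.0)]                # default rule, always applicable
levels = [specific, generic]

print(prioritized_inference(levels, 5.0), prioritized_inference(levels, 2.0))
```

Here the generic rule only determines the output for inputs that the specific rule does not cover, which mirrors the statement that a generic rule is applied only when no suitable specific rule is available.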

Another option is to consider a hierarchy of fuzzy partitions with different granularity [13.40]. From that point, an FRBS is structured in layers, where each layer contains fuzzy partitions with a different granularity, as well as the rules using those fuzzy partitions. Usually, every partition in a given layer has the same number of fuzzy terms. In this case, rules at different layers have different granularity, which is somehow related to the idea of specificity of the previous paragraph. It is even possible to generate a multilevel grid-like partition where a higher granularity is considered only for some specific regions of the input space (usually those regions showing poor performance) [13.41], with an approach similar to that already considered in some neuro-fuzzy systems [13.42].

A completely different point of view is to introduce the decomposition at the level of variables. In this case, the input space is decomposed into subspaces of lower dimensionality, and each input variable is only considered at a certain level of the hierarchy. The result is a cascade structure of FRBSs where, in addition to a subset of the input variables, the output of each level is considered as one of the inputs to the following level [13.43]. As a result, the system is decomposed into a finite number of reduced-order subsystems, eliminating the need for a large-sized inference engine. This decomposition is usually stated as a way to keep under control the problems generated by the so-called curse of dimensionality, the exponential growth of the number of rules with the number of variables of the system.

The number of rules in the complete rule base of an FRBS with $n$ input variables and $l$ linguistic terms per variable will be $l^n$. In this approach to hierarchical FRBSs, the variables (and rules) are divided into different levels in such a way that the variables considered most influential are chosen as input variables at the first level, the next most important variables are chosen as input variables at the second level, and so on. The output variable of each level is introduced as an input variable at the following level.

With that structure, the rules at the first level of the FRBS have a structure similar to that of any Mamdani FRBS, i.e., that described by (13.1), but at the $k$-th level ($k > 1$) the rules include the output of the previous level as an input:

IF $X_{N_k+1}$ is $LT_{N_k+1}$ and ... and $X_{N_k+n_k}$ is $LT_{N_k+n_k}$ and $O_{k-1}$ is $LT_{O_{k-1}}$ THEN $O_k$ is $LT_{O_k}$ , (13.28)

where the value $N_k$ determines the input variables considered in previous levels

$N_k = \sum_{t=1}^{k-1} n_t$ , (13.29)

with $n_t$ being the number of system variables applied at level $t$. Variable $O_k$ represents the output of the $k$-th level of the hierarchy. All outputs are intermediate variables except for the output of the last level, which will be $Y$ (the overall output of the system).

With this structure it is shown [13.43] that the number of rules in a complete rule base can be reduced to a linear function of the number of variables, while in a conventional FRBS it is an exponential function of the number of variables.
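The reduction in the number of rules can be checked with a short calculation. The Python sketch below counts the rules of a complete flat rule base and of a cascade in which level $k > 1$ takes $n_k$ system variables plus the output of the previous level as inputs. It assumes that every variable, including each intermediate output, uses the same number $l$ of linguistic terms and that every level holds a complete rule base; the function names are hypothetical.

```python
from typing import Sequence


def flat_rule_count(n_vars: int, n_terms: int) -> int:
    """Complete rule base of a conventional FRBS: l ** n rules."""
    return n_terms ** n_vars


def hierarchical_rule_count(vars_per_level: Sequence[int], n_terms: int) -> int:
    """Sum of complete rule bases over the levels of the cascade.

    Level 1 has n_1 inputs; level k > 1 has n_k system variables plus the
    output of level k - 1 (assumed to use n_terms linguistic terms as well).
    """
    total = 0
    for k, n_k in enumerate(vars_per_level):
        n_inputs = n_k if k == 0 else n_k + 1
        total += n_terms ** n_inputs
    return total


n, l = 6, 5
cascade = [2, 1, 1, 1, 1]  # two variables at the first level, one per later level

print(flat_rule_count(n, l))                # 15625
print(hierarchical_rule_count(cascade, l))  # 125
```

For this particular cascade every level has two inputs, so the total is $(n-1) \cdot l^2$, which grows linearly with the number of variables $n$.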


13.4 Fuzzy Rule-Based Systems Design 


Once the components and functioning of an FRBS have been defined, it is time to consider its design, i.e., how to build an FRBS to solve a certain problem while showing some specific properties. The present section focuses on this question.

An FRBS can be characterized according to its structure and its behavior. When referring to its structure, we can consider questions such as the dimension of the system (number of variables, fuzzy sets, rules, etc.) as well as other aspects related to properties of its components (distinguishability of the fuzzy sets, redundancy of the fuzzy rules, etc.). On the other hand, the characterization related to the behavior mostly analyzes properties of the input-output relation defined by the FRBS. In this area, we can include questions such as stability or accuracy. Finally, there is a third question that simultaneously involves structure and behavior. This question is interpretability, a central aspect in fuzzy systems design that is considered in an independent chapter.


13.4.1 FRBS Properties 


All the structural properties to be mentioned are related 
to properties of the KB, and basically cover charac- 
teristics related to the individual fuzzy sets, the fuzzy 
partitions related to each input and output variable, the 
fuzzy rules, and the rule set as a whole. 

The elemental components of the KB are fuzzy sets. At this level, we have several questions to be analyzed, such as normality, convexity, or differentiability of fuzzy sets, all of them related to the properties of the membership function ($\mu_A(x)$) defining the fuzzy set ($A$). In most applications the considered fuzzy sets adopt predefined shapes such as triangular, trapezoidal, Gaussian, or bell; the fuzzy sets are then defined by only changing some parameters of these parameterized functions. In summary, most fuzzy sets considered in FRBSs are normal and convex sets belonging to one of two possible families: piecewise linear functions and differentiable functions. Piecewise linear functions are basically triangular and trapezoidal functions, offering a reduced complexity from the processing point of view. On the other hand, differentiable functions are mainly Gaussian, bell, and sigmoidal functions, which are better adapted to some kinds of differential learning approaches, such as those used in neuro-fuzzy systems, but add complexity from the processing point of view.
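The two families can be written down in a few lines. The following Python sketch gives one possible parameterization of triangular, trapezoidal, and Gaussian membership functions; the parameter names and the particular parameterization are assumptions of this example, since several equivalent conventions are in use.

```python
import math


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Piecewise linear: support (a, c), peak at b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    """Piecewise linear: support (a, d), core [b, c] (requires a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x: float, center: float, sigma: float) -> float:
    """Differentiable: maximum 1 at the center, width controlled by sigma > 0."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Each set is normal: its membership function reaches 1 somewhere.
print(triangular(5.0, 0.0, 5.0, 10.0),
      trapezoidal(4.0, 0.0, 2.0, 6.0, 8.0),
      gaussian(3.0, 3.0, 1.0))
```

Changing only the parameters (a, b, c, d, center, sigma) yields different fuzzy sets of the same shape, which is what keeps the design space small.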

Once individual fuzzy sets have been considered, the following level is that of the fuzzy partitions related to each variable. The main characteristics of a fuzzy partition are cardinality, coverage, and distinguishability. Cardinality corresponds to the number of fuzzy sets that compose the fuzzy partition. In most cases, this number ranges from 3 to 9, with 9 being an upper limit commonly accepted after the ideas of Miller [13.44]. The larger the number of fuzzy sets in the partition, the more difficult the design and interpretation of the FRBS. Coverage corresponds to the minimum membership degree with which any value of the variable ($x$), throughout its universe of discourse ($U$), will be assigned to at least one fuzzy set ($A_i$) in the partition. Coverage is then defined as

$\min_{x \in U} \max_{i=1 \ldots n} \mu_{A_i}(x)$ , (13.30)

with $n$ being the cardinality of the partition. As an example, the fuzzy partition in Fig. 13.2 has a coverage of 0.5. Finally, distinguishability of fuzzy sets is related to the level of overlapping of their membership functions and is analyzed with different expressions.
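Coverage as defined in (13.30) can be estimated numerically by sampling the universe of discourse. The Python sketch below does so for a uniformly distributed strong partition of triangular sets on [0, 1]; this partition, the sampling grid, and the function names are assumptions of the example and are not claimed to reproduce Fig. 13.2.

```python
from typing import Callable, List


def uniform_strong_partition(n_sets: int, lo: float = 0.0,
                             hi: float = 1.0) -> List[Callable[[float], float]]:
    """n_sets triangular sets with equally spaced peaks; neighbors cross at 0.5."""
    step = (hi - lo) / (n_sets - 1)
    centers = [lo + i * step for i in range(n_sets)]

    def make(center: float) -> Callable[[float], float]:
        return lambda x: max(0.0, 1.0 - abs(x - center) / step)

    return [make(c) for c in centers]


def coverage(partition: List[Callable[[float], float]], lo: float, hi: float,
             samples: int = 1001) -> float:
    """Grid estimate of min over x in U of max over i of mu_Ai(x)."""
    xs = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    return min(max(mu(x) for mu in partition) for x in xs)


partition = uniform_strong_partition(5)
print(coverage(partition, 0.0, 1.0))
```

For this assumed partition the worst-covered points are the crossing points between neighboring sets, where both memberships equal 0.5. The estimate is only as good as the grid, which here contains those crossing points.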

The fuzzy rules are built on the basis of the fuzzy sets and fuzzy partitions. The first structural question regarding fuzzy rules is the type of fuzzy rule to be considered: Mamdani, TSK, descriptive or approximate, DNF, etc. If we now consider the interaction between the different fuzzy rules of a fuzzy system, questions such as knowledge consistency or redundancy appear, i.e., does a fuzzy system include pieces of knowledge (usually rules) providing contradictory (or redundant) information for a specific situation? Finally, when considering the rule base as a whole, completeness and complexity are to be considered. Completeness refers to the fact that any potential input value will fire at least one rule.

Considering now behavioral properties, the most widely analyzed are stability and accuracy. It is also possible to take into account other properties, such as continuity or robustness, but we will concentrate on those with the largest presence in the literature. Behavioral properties are related to the overall system, i.e., to the processing structure as well as to the KB.

Stability is a key aspect of dynamical systems analysis, and plays a central role in control theory. FRBSs are nonlinear dynamical systems, and after their early application to control problems, the absence of a formal stability analysis was seriously criticized. As a consequence, the stability question received significant attention from the very beginning, and is at present a widely studied problem [13.45] for both Mamdani and TSK fuzzy systems, considering the use of different approaches such as Lyapunov's methods, the Popov criterion, or norm-based analysis, among others.

Another question with a continuous presence in the literature is that of accuracy and the somehow related concept of universal approximation. The idea of fuzzy systems as universal approximators states that, given any continuous real-valued function on a compact subset of $\mathbb{R}^n$, we can, at least in theory, find an FRBS that approximates this function to any degree. This property has been established for different types of fuzzy systems [13.46-48]. On this basis, the idea of building fuzzy models with an unbounded level of accuracy can be considered. In any case, it is important to notice that the previous papers prove the existence of such a model while assuming, at the same time, an unbounded complexity, i.e., the number of fuzzy sets and rules involved in the fuzzy system will usually grow as the accuracy improves. That means that improving accuracy is possible, but always at a cost related either to the complexity of the system or to the relaxation of some of its properties (usually interpretability).
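The trade-off between accuracy and complexity can be observed on a toy problem. The Python sketch below builds a single-input FRBS with a uniform strong partition of triangular sets and one rule per set whose consequent is the value of the target function at the peak of that set, combined by a weighted average. This construction, the target function sin on [0, pi], and the function names are assumptions made for illustration; it is an experiment, not a proof, and it is not the construction used in [13.46-48].

```python
import math
from typing import Callable


def build_frbs(f: Callable[[float], float], n_sets: int,
               lo: float, hi: float) -> Callable[[float], float]:
    """One rule per fuzzy set: IF x is A_i THEN y = f(center_i)."""
    step = (hi - lo) / (n_sets - 1)
    centers = [lo + i * step for i in range(n_sets)]
    consequents = [f(c) for c in centers]

    def model(x: float) -> float:
        # Matching degree of each rule, then weighted average of consequents.
        degrees = [max(0.0, 1.0 - abs(x - c) / step) for c in centers]
        return sum(h * y for h, y in zip(degrees, consequents)) / sum(degrees)

    return model


def max_error(f: Callable[[float], float], model: Callable[[float], float],
              lo: float, hi: float, samples: int = 2001) -> float:
    """Largest absolute error over a sampling grid of [lo, hi]."""
    xs = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - model(x)) for x in xs)


for n_sets in (5, 9, 17):
    model = build_frbs(math.sin, n_sets, 0.0, math.pi)
    print(n_sets, max_error(math.sin, model, 0.0, math.pi))
```

In this setting the error shrinks as the number of fuzzy sets (and therefore rules) grows, which illustrates that the additional accuracy is paid for with a larger and less readable knowledge base.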


13.4.2 Designing FRBSs 


Given a modeling, classification, or control problem to be solved, and assuming it will be addressed through an FRBS, there are several steps in the design process. The first decision is the choice between the different types of systems mentioned in Sect. 13.2, particularly the Mamdani and TSK approaches. They offer different characteristics related to questions such as their accuracy and interpretability, as well as different methods for the derivation of their KB.

Once a type of FRBS has been chosen, its design implies the construction of its processing structure as well as the derivation of its KB. Even though there are several options to modify the processing structure of the system (Sect. 13.1.2), most designers consider a standard inference engine and concentrate on the knowledge extraction problem.

Going now to the knowledge extraction problem, some of its parts are common to any modeling process (fuzzy or not). Questions such as the selection of the input and output variables and the determination of the range of those variables are generic to any modeling approach. The specific aspects related to the fuzzy environment are the definition of the fuzzy sets or the fuzzy partition related to each of those variables, and the derivation of a suitable set of fuzzy rules. These two components can be jointly derived in a single process, or sequentially derived by first considering the design of the fuzzy partition associated with each variable and then the fuzzy rules. The design process can be based on two main sources of information: expert knowledge and experimental data.

If we first consider the definition of fuzzy sets and fuzzy partitions, quite different approaches [13.49] can be applied. Even the idea of simply generating a uniformly distributed strong fuzzy partition of a certain cardinality is widely considered.

Going now to rules, Mamdani FRBSs are particularly adapted to expert knowledge extraction, and knowledge elicitation for that kind of system has been widely considered in the literature. In any case, there is no standard methodology for fuzzy knowledge extraction from experts, and at present most practical works consider either a direct data-driven approach or the integration of expert and data-driven knowledge extraction [13.50].

When considering data-driven knowledge extraction, there is an almost endless list of approaches. Some options are the use of ad hoc methods based on data covering measures (as [13.46]), the generation of fuzzy decision trees [13.51], the use of clustering techniques [13.52], and the use of hybrid systems, where genetic fuzzy systems [13.53] and neuro-fuzzy systems [13.54] represent the most widely considered approaches to fuzzy systems design. Some of those techniques produce both the partitions (or fuzzy sets) and the rules in a single process.


13.5 Conclusions


Fuzzy rule-based systems constitute a tool for representing knowledge and reasoning on it. Jointly with fuzzy clustering techniques, FRBSs are probably the developments of fuzzy sets theory leading to the largest number of applications. These systems, being a kind of rule-based system, can be analyzed as knowledge-based systems showing a structure with two main components: knowledge and processing. The processing structure relies on many concepts presented in previous chapters, such as fuzzy implications, connectives, relations, and so on. In addition, some new concepts, such as fuzzification and defuzzification, are required when constructing a fuzzy rule-based system. But the central concept of fuzzy rule-based systems is the fuzzy rule. Different types of fuzzy rules have been considered, particularly those having a fuzzy (or not) consequent, producing different types of FRBS. In addition, new formulations are being considered, e.g., implicative rules. Eventually, the representation capabilities of fuzzy sets have been considered as too limited to represent some specific kinds of knowledge or information, and some extended types of fuzzy sets have been defined. Type-2 fuzzy sets are an example of extension of fuzzy sets.

As already said, FRBSs are knowledge-based systems and, as a consequence, their design involves, apart from aspects related to the processing structure, the elicitation of a suitable KB properly describing the way to solve the problem under consideration. Even considering the large number of problems solved using FRBSs, there is no clear design methodology defining a well-established design protocol. In addition, two completely different sources of knowledge, requiring different extraction approaches, have been considered when building FRBSs: expert knowledge and data. Many expert and data-driven knowledge extraction techniques and methods are described in the literature and can be considered. Connected to this question, as part of the process to provide automatic knowledge extraction capabilities to FRBSs, many hybrid approaches have been proposed, genetic fuzzy systems and neuro-fuzzy systems being the most widely considered.

In summary, FRBSs are a powerful tool to solve real-world problems, but many theoretical aspects and design questions remain open for further investigation.




References 

13.1 A. Bardossy, L. Duckstein: Fuzzy Rule-Based Modeling with Application to Geophysical, Biological and Engineering Systems (CRC, Boca Raton 1995)

13.2 Z. Chi, H. Yan, T. Pham: Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition (World Scientific, Singapore 1996)

13.3 K. Hirota: Industrial Applications of Fuzzy Technology (Springer, Berlin, Heidelberg 1993)

13.4 W. Pedrycz: Fuzzy Modelling: Paradigms and Practice (Kluwer Academic, Dordrecht 1996)

13.5 R.R. Yager, L.A. Zadeh: An Introduction to Fuzzy Logic Applications in Intelligent Systems (Kluwer Academic, Dordrecht 1992)

13.6 L.A. Zadeh: The concept of a linguistic variable and its applications to approximate reasoning - Part I, Inf. Sci. 8(3), 199-249 (1975)

13.7 L.A. Zadeh: The concept of a linguistic variable and its applications to approximate reasoning - Part II, Inf. Sci. 8(4), 301-357 (1975)

13.8 L.A. Zadeh: The concept of a linguistic variable and its applications to approximate reasoning - Part III, Inf. Sci. 9(1), 43-80 (1975)

13.9 L.A. Zadeh: Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst. Man Cybern. 3, 28-44 (1973)

13.10 E.H. Mamdani: Applications of fuzzy algorithm for control of simple dynamic plant, Proc. IEE 121(12), 1585-1588 (1974)

13.11 E.H. Mamdani, S. Assilian: An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man-Mach. Stud. 7, 1-13 (1975)

13.12 L.X. Wang: Adaptive Fuzzy Systems and Control: Design and Analysis (Prentice Hall, Englewood Cliffs 1994)

13.13 T.J. Procyk, E.H. Mamdani: A linguistic self-organizing process controller, Automatica 15(1), 15-30 (1979)

13.14 L. Magdalena: Adapting the gain of an FLC with genetic algorithms, Int. J. Approx. Reas. 17(4), 327-349 (1997)

13.15 G.C. Mouzouris, J.M. Mendel: Nonsingleton fuzzy logic systems: Theory and application, IEEE Trans. Fuzzy Syst. 5, 56-71 (1997)

13.16 F. Klawonn, R. Kruse: Equality relations as a basis for fuzzy control, Fuzzy Sets Syst. 54(2), 147-156 (1993)

13.17 J.M. Mendel: Fuzzy logic systems for engineering: A tutorial, Proc. IEEE 83(3), 345-377 (1995)

13.18 D. Dubois, H. Prade: What are fuzzy rules and how to use them, Fuzzy Sets Syst. 84, 169-185 (1996)

13.19 O. Cordón, F. Herrera, A. Peregrin: Applicability of the fuzzy operators in the design of fuzzy logic controllers, Fuzzy Sets Syst. 86, 15-41 (1997)

13.20 D. Driankov, H. Hellendoorn, M. Reinfrank: An Introduction to Fuzzy Control (Springer, Berlin, Heidelberg 1993)


13.21 M. Sugeno, T. Yasukawa: A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Syst. 1(1), 7-31 (1993)

13.22 C.C. Lee: Fuzzy logic in control systems: Fuzzy logic controller - Part I, IEEE Trans. Syst. Man Cybern. 20(2), 404-418 (1990)

13.23 C.C. Lee: Fuzzy logic in control systems: Fuzzy logic controller - Part II, IEEE Trans. Syst. Man Cybern. 20(2), 419-435 (1990)

13.24 A. Bastian: How to handle the flexibility of linguistic variables with applications, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 3(4), 463-484 (1994)

13.25 B. Carse, T.C. Fogarty, A. Munro: Evolving fuzzy rule based controllers using genetic algorithms, Fuzzy Sets Syst. 80, 273-294 (1996)

13.26 A. Gonzalez, R. Pérez, J.L. Verdegay: Learning the structure of a fuzzy rule: A genetic approach, Fuzzy Syst. Artif. Intell. 3(1), 57-70 (1994)

13.27 L. Magdalena, F. Monasterio: A fuzzy logic controller with learning through the evolution of its knowledge base, Int. J. Approx. Reas. 16(3/4), 335-358 (1997)

13.28 R. Alcala, J. Casillas, O. Cordón, F. Herrera: Building fuzzy graphs: Features and taxonomy of learning for non-grid-oriented fuzzy rule-based systems, J. Intell. Fuzzy Syst. 11(3/4), 99-119 (2001)

13.29 O. Cordón, F. Herrera: A three-stage evolutionary process for learning descriptive and approximate fuzzy logic controller knowledge bases from examples, Int. J. Approx. Reas. 17(4), 369-407 (1997)

13.30 L. Koczy: Fuzzy if ... then rule models and their transformation into one another, IEEE Trans. Syst. Man Cybern. 26(5), 621-637 (1996)

13.31 T. Takagi, M. Sugeno: Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. 15(1), 116-132 (1985)

13.32 M. Sugeno, G.T. Kang: Structure identification of fuzzy model, Fuzzy Sets Syst. 28(1), 15-33 (1988)

13.33 R. Palm, D. Driankov, H. Hellendoorn: Model Based Fuzzy Control (Springer, Berlin, Heidelberg 1997)

13.34 H. Ishibuchi, T. Nakashima: Effect of rule weights in fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. 9, 506-515 (2001)

13.35 J.M. Mendel: Type-2 fuzzy sets and systems: An overview, Comput. Intell. Mag. IEEE 2(1), 20-29 (2007)

13.36 H. Jones, B. Charnomordic, D. Dubois, S. Guillaume: Practical inference with systems of gradual implicative rules, IEEE Trans. Fuzzy Syst. 17, 61-78 (2009)

13.37 V. Torra: A review of the construction of hierarchical fuzzy systems, Int. J. Intell. Syst. 17(5), 531-543 (2002)

13.38 R.R. Yager: On a hierarchical structure for fuzzy modeling and control, IEEE Trans. Syst. Man Cybern. 23(4), 1189-1197 (1993)




13.39 R.R. Yager: On the construction of hierarchical fuzzy systems models, IEEE Trans. Syst. Man Cybern. C 28(1), 55-66 (1998)

13.40 O. Cordón, F. Herrera, I. Zwir: Linguistic modeling by hierarchical systems of linguistic rules, IEEE Trans. Fuzzy Syst. 10, 2-20 (2002)

13.41 E. D'Andrea, B. Lazzerini: A hierarchical approach to multi-class fuzzy classifiers, Exp. Syst. Appl. 40(9), 3828-3840 (2013)

13.42 H. Takagi, N. Suzuki, T. Koda, Y. Kojima: Neural networks designed on approximate reasoning architecture and their applications, IEEE Trans. Neural Netw. 3(5), 752-760 (1992)

13.43 G.V.S. Raju, J. Zhou, R.A. Kisner: Hierarchical fuzzy control, Int. J. Control 54(5), 1201-1216 (1991)

13.44 G.A. Miller: The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychol. Rev. 63, 81-97 (1956)

13.45 K. Michels, F. Klawonn, R. Kruse, A. Nürnberger: Fuzzy Control: Fundamentals, Stability and Design of Fuzzy Controllers (Springer, Berlin, Heidelberg 2006)

13.46 L.-X. Wang, J.M. Mendel: Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. Neural Netw. 3(5), 807-813 (1992)

13.47 B. Kosko: Fuzzy systems as universal approximators, IEEE Trans. Comput. 43(11), 1329-1333 (1994)


13.48 J.L. Castro: Fuzzy logic controllers are universal approximators, IEEE Trans. Syst. Man Cybern. 25(4), 629-635 (1995)

13.49 R. Krishnapuram: Membership function elicitation and learning. In: Handbook of Fuzzy Computation, ed. by E.H. Ruspini, P.P. Bonissone, W. Pedrycz (IOP Publ., Bristol 1998) pp. 349-368

13.50 J.M. Alonso, L. Magdalena: HILK++: An interpretability-guided fuzzy modeling methodology for learning readable and comprehensible fuzzy rule-based classifiers, Soft Comput. 15(10), 1959-1980 (2011)

13.51 N.R. Pal, S. Chakraborty: Fuzzy rule extraction from id3-type decision trees for real data, IEEE Trans. Syst. Man Cybern. B 31(5), 745-754 (2001)

13.52 M. Delgado, A.F. Gómez-Skarmeta, F. Martin: A fuzzy clustering-based rapid prototyping for fuzzy rule-based modeling, IEEE Trans. Fuzzy Syst. 5, 223-233 (1997)

13.53 O. Cordón, F. Herrera, F. Hoffmann, L. Magdalena: Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases (World Scientific, Singapore 2001)

13.54 D.D. Nauck, A. Nürnberger: Neuro-fuzzy systems: A short historical review. In: Computational Intelligence in Intelligent Data Analysis, ed. by C. Moewes, A. Nürnberger (Springer, Berlin, Heidelberg 2013) pp. 91-109




14. Interpretability of Fuzzy Systems:
Current Research Trends and Prospects 


Jose M. Alonso, Ciro Castiello, Corrado Mencar 


Fuzzy systems are universally acknowledged as valuable tools to model complex phenomena while preserving a readable form of knowledge representation. The resort to natural language for expressing the terms involved in fuzzy rules, in fact, is a key factor to conjugate mathematical formalism and logical inference with human-centered interpretability. That makes fuzzy systems specifically suitable in every real-world context where people are in charge of crucial decisions. This is because the self-explanatory nature of fuzzy rules profitably supports expert assessments. Additionally, as far as interpretability is investigated, it appears that (a) the simple adoption of fuzzy sets in modeling is not enough to ensure interpretability; (b) fuzzy knowledge representation must confront the problem of preserving the overall system accuracy, thus yielding a trade-off which is frequently debated. Such issues have attracted a growing interest in the research community and have come to assume a central role in the current literature panorama of computational intelligence. This chapter gives an overview of the topics related to fuzzy system interpretability, facing the ambitious goal of proposing some answers to a number of open challenging questions: What is interpretability? Why is interpretability worth considering? How to ensure interpretability, and how to assess (quantify) it? Finally, how to design interpretable fuzzy models?

The objective of this chapter is to provide some answers for the questions posed above. Section 14.1 deals with the challenging task of setting a proper definition of interpretability. Section 14.2 introduces the main constraints and criteria that can be adopted to ensure interpretability when designing interpretable fuzzy systems. Section 14.3 gives a brief overview of the soundest indexes for assessing interpretability. Section 14.4 presents the most popular approaches for designing fuzzy systems endowed with a good interpretability-accuracy trade-off. Section 14.5 enumerates some application fields where interpretability is a main concern. Section 14.6 sketches a number of challenging tasks which should be addressed in the near future. Finally, some conclusions are drawn in Sect. 14.7.


14.1 The Quest for Interpretability
14.1.1 Why Is Interpretability So Important?
14.1.2 A Historical Review
14.2 Interpretability Constraints and Criteria
14.2.1 Constraints and Criteria for Fuzzy Sets
14.2.2 Constraints and Criteria for Fuzzy Partitions
14.2.3 Constraints and Criteria for Fuzzy Rules
14.2.4 Constraints and Criteria for Fuzzy Rule Bases
14.3 Interpretability Assessment
14.4 Designing Interpretable Fuzzy Systems
14.4.1 Design Strategies for the Generation of a KB Regarding the Interpretability-Accuracy Trade-Off
14.4.2 Design Decisions at Fuzzy Processing Level
14.5 Interpretable Fuzzy Systems in the Real World
14.6 Future Research Trends on Interpretable Fuzzy Systems
14.7 Conclusions
References




The key factor for the success of fuzzy logic lies in the ability to model and process perceptions instead of measurements [14.1]. In most cases, such perceptions are expressed in natural language. Thus, fuzzy logic acts as a mathematical underpinning for modeling and processing perceptions described in natural language.

Historically, it has been acknowledged that fuzzy 
systems are endowed with the capability to conjugate 
a complex behavior and a simple description in terms 
of linguistic rules. In many cases, the compilation of 
fuzzy systems has been accomplished manually, with 
human knowledge purposely injected in fuzzy rules in 
order to model the desired behavior (the rules could 
be eventually tuned to improve the system accuracy). 
In addition, the great success of fuzzy logic led to the 
development of many algorithms aimed at acquiring 
knowledge from data (expressing it in terms of fuzzy 
rules). This made the automatic design of fuzzy sys- 
tems (through data-driven design techniques) feasible. 
Moreover, theoretical studies proved the universal ap- 
proximation capabilities of such systems [14.2]. 


The adoption of data-driven design techniques is
a common practice nowadays. Nevertheless, while
fuzzy sets can generally be used to model perceptions,
some of them do not lead to a straightforward
interpretation in natural language. In consequence,
the adoption of accuracy-driven algorithms for
acquiring knowledge from data often results in
unintelligible models. In those cases, the fundamental
advantage of fuzzy logic is lost and the derived models
are comparable to other measurement-based models
(like neural networks) in terms of knowledge
interpretability.

In a nutshell, interpretability is not granted by
the adoption of fuzzy logic, which represents
a necessary yet not sufficient requirement for modeling
and processing perceptions. However, interpretability
is a quality that is not easy to define and quantify.
Several open and challenging questions arise when
considering interpretability in fuzzy modeling: What is
interpretability? Why is interpretability worth
considering? How to ensure interpretability? How to
assess (quantify) interpretability? How to design
interpretable fuzzy models? And so on.


14.1 The Quest for Interpretability


Answering the question What is interpretability? is
not straightforward. Defining interpretability is
a challenging task since it deals with the analysis of
the relation occurring between two heterogeneous
entities: a model of the system to be designed (usually
formalized through a mathematical definition) and
a human user (meant not as a passive beneficiary of
a system’s outcome, but as an active reader and
interpreter of the model’s working engine). In this
sense, interpretability is a quality which is inherent
in the model and yet it refers to an act performed by
the user who is willing to grasp and explain the
meaning of the model.

To pave the way for the definition of such a relation,
a common ground must be established. This could be
represented by a number of fundamental properties to be
incorporated into a model, so that its formal description
becomes compatible with the user’s knowledge
representation. In this way, the human user may
interface with the mathematical model by relying on
concepts that are suitable for dealing with it. The
quest for interpretability, therefore, calls for the
identification of several features. Among them,
resorting to an appropriate framework for knowledge
representation is a crucial element, and the adoption
of a fuzzy inference engine based on fuzzy rules is
a straightforward way to approach the linguistic-based
formulation of concepts which is typical of human
abstract thought.

A distinguishing feature of a fuzzy rule-based 
model is the double level of knowledge representation. 
The lower level of representation is constituted by the 
formal definition of the fuzzy sets in terms of their 
membership functions, as well as the aggregation func- 
tions used for inference. This level of representation 
defines the semantics of a fuzzy rule-based model as 
it determines the behavior of the model, i.e. the in- 
put/output mapping for which it is responsible. 

On the higher level of representation, knowledge is 
represented in the form of rules. They define a formal 
structure where linguistic variables are involved and re- 
ciprocally connected by some formal operators, such as 
AND, THEN, and so on. Linguistic variables correspond 
to the inputs and outputs of the model. The (sym- 
bolic) values they assume are related to linguistic terms 
which, in turn, are mapped to the fuzzy sets defined 
in the lower level of representation. The formal oper- 
ators are likewise mapped to the aggregation functions. 
This mapping provides the interpretative transition that 
is quite common in the mathematical context: a formal
structure is assigned semantics by mapping symbols
(linguistic terms and operators) to objects (fuzzy sets 
and aggregation functions). 
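
The two levels of representation described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the triangular membership functions, their numeric parameters, the choice of the minimum as the aggregation function for AND, and the output variable Valve are all hypothetical and not taken from the chapter. The point is that the rule itself is pure symbolic structure, and it acquires semantics only through the mapping from terms and operators to fuzzy sets and aggregation functions.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple

MembershipFn = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> MembershipFn:
    """Triangular membership function with prototype b, assuming a < b < c."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Lower level (semantics): fuzzy sets and aggregation functions.
# All numeric parameters are made up for illustration.
FUZZY_SETS: Dict[str, Dict[str, MembershipFn]] = {
    "Temperature": {"High": triangular(20.0, 35.0, 50.0)},
    "Pressure": {"Low": triangular(-5.0, 0.0, 5.0)},
}
AGGREGATION: Dict[str, Callable[[float, float], float]] = {
    "AND": min,  # minimum t-norm, one common choice among many
}

# Higher level (symbols): a rule is a formal structure over linguistic terms.
Proposition = Tuple[str, str]  # (linguistic variable, linguistic term)
RULE_CONDITION: List[Proposition] = [("Temperature", "High"), ("Pressure", "Low")]
RULE_CONSEQUENT: Proposition = ("Valve", "Open")  # hypothetical output variable


def firing_strength(condition: List[Proposition], inputs: Dict[str, float]) -> float:
    """Interpret the symbolic condition by mapping each symbol to its object."""
    degrees = [FUZZY_SETS[var][term](inputs[var]) for var, term in condition]
    return reduce(AGGREGATION["AND"], degrees)


strength = firing_strength(RULE_CONDITION, {"Temperature": 29.0, "Pressure": 2.0})
print(f"IF Temperature is High AND Pressure is Low fires with degree {strength:.2f}")
```

Swapping the entry of AGGREGATION or the parameters in FUZZY_SETS changes the behavior of the model without touching the rule, which is what makes the mapping between the two levels the place where cointension has to be checked.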

In principle, the mapping of linguistic terms to 
fuzzy sets is arbitrary. It just suffices that identical lin- 
guistic terms are mapped to identical fuzzy sets. Of 
course, this is not completely true for formal opera- 
tors (e.g., t-norms, implications, etc.). The correspond- 
ing aggregation functions should satisfy a number of 
constraints; however some flexibility is possible. Nev- 
ertheless, the mere use of symbols in the high level of 
knowledge representation implies the establishment of 
a number of semiotic relations that are fundamental for 
the quest of interpretability of a fuzzy model. In partic- 
ular, linguistic terms — as usually picked from natural 
language — must be fully meaningful for the expected 
reader since they denote concepts, i.e. mental representations
that allow people to draw appropriate inferences
about the entities they encounter. 

Concepts and fuzzy sets, therefore, are both denoted 
by linguistic terms. Additionally, concepts and fuzzy 
sets play a similar role: the former (being part of the 
human knowledge) contribute to determine the behav- 
ior of a person; the latter (being the basic elements of 
a fuzzy rule base) contribute to determine the behavior 
of a system to be modeled. As a consequence, concepts 
and fuzzy sets are implicitly connected by means of 
common linguistic terms they are related to, which re- 
fer to object classes in the real world. The key essence 
of interpretability is therefore the property of cointen- 
sion [14.3] between fuzzy sets and concepts, consisting 
in the possibility of referring to similar classes of ob- 
jects: such a possibility is assured by the use of common 
linguistic terms. 

Semantic cointension is a key issue when dealing 
with interpretability of fuzzy systems. It has been in- 
troduced and centered on the role of fuzzy sets, but 
it can be easily extended to refer to some more com- 
plex structures, such as fuzzy rules or the whole fuzzy 
models. In this regard, a clear statement of the
importance of cointension at the level of the whole
model is given by Michalski’s Comprehensibility
Postulate [14.4]:


The results of computer induction should be sym- 
bolic descriptions of given entities, semantically 
and structurally similar to those a human expert 
might produce observing the same entities. Com- 
ponents of these descriptions should be compre- 
hensible as single chunks of information, directly 
interpretable in natural language, and should relate
quantitative and qualitative concepts in an integrated
fashion.


It should be observed that the above postulate 
has been formulated in the general area of machine 
learning. Nevertheless, the assertion made by Michal- 
ski has important consequences in the specific area 
of fuzzy modeling (FM) too. According to the Com- 
prehensibility Postulate, results of computer induction 
should be described symbolically. Symbols are nec- 
essary to communicate information and knowledge; 
hence, pure numerical methods, such as neural net- 
works, are not suited for meeting interpretability unless 
an interpretability-oriented postprocessing of the result- 
ing knowledge is performed. 

The key point of Michalski’s postulate is the
human centrality of the results of a computer induc- 
tion process. The importance of the human compo- 
nent implicitly suggests a novel aspect to be taken 
into account in the quest for interpretability. Actu- 
ally, the semantic cointension is related to one facet 
of the interpretability process, which can be referred 
to as comprehensibility of the content and behavior of 
a fuzzy model. In other words, cointension concerns 
the semantic interpretation performed by a user de- 
termined to comprehend such a model. On the other 
hand, when we turn to consider the cognitive capa- 
bilities of human brains and their intrinsic limitations, 
then a different facet of the interpretability process 
can be defined in terms of readability of the bulk 
of information conveyed by a fuzzy model. In that 
case, simplicity is required to perform the interpretation 
process because of the limited ability to store informa- 
tion in the human brain’s short-term memory [14.5]. 
Therefore, structural measures concerning the com- 
plexity of a rule base affect the cognitive efforts of 
a user determined to read and interpret a fuzzy model. 


Comprehensibility and readability represent two 
facets of a common issue and both of them are to 
be considered while assessing the interpretability pro- 
cess. In particular, this distinction should be acknowl- 
edged when criteria are specifically designed to provide 
a quantitative definition of interpretability. 


14.1.1 Why Is Interpretability So Important? 


A great number of inductive modeling techniques are 
currently available to acquire knowledge from data. 
Many of these techniques provide predictive models 
that are very accurate and flexible enough to be applied
in a wide range of applications. Nevertheless, the resulting
models are usually considered as black boxes,
i.e. models whose behavior cannot be easily explained 
in terms of the model structure. On the other hand, the 
use of fuzzy rule-based models is a matter of design 
choice: whenever interpretability is a key factor, fuzzy 
rule-based models should be naturally preferred. It is 
worth noting that interpretability is a distinguishing fea- 
ture of fuzzy rule-based models. Several reasons justify 
a choice inclined toward interpretability. They include 
but are not limited to: 


• Integration: In an interpretable fuzzy rule-based
model the acquired knowledge can be easily verified
and related to the domain knowledge of a human
expert. In particular, it is easy to verify if
the acquired knowledge expresses new and interesting
relations about the data; also, the acquired
knowledge can be refined and integrated with expert
knowledge.

• Interaction: The use of natural language as a means
of knowledge communication enables interaction
between the user and the model.
Interactivity is meant to explore the acquired knowledge.
In practice, it can be done at the symbolic level
(by adding new rules or modifying existing ones)
and/or at the numerical level (by modifying the fuzzy
sets denoted by linguistic terms, or by adding new
linguistic terms denoting new fuzzy sets).

• Validation: The acquired knowledge can be easily
validated against common-sense knowledge and
domain-specific knowledge. This capability enables
the detection of semantic inconsistencies that may
have different causes (misleading data involved in
the inductive process, a local minimum where the
inductive process may have been trapped, data
overfitting, etc.). This kind of anomaly detection is
important to drive the inductive process toward
a qualitative improvement of the acquired knowledge.

• Trust: The most important reason to adopt interpretable
fuzzy models is their inherent ability to
convince end users about the reliability of a model
(especially those users not concerned with knowledge
acquisition techniques). An interpretable fuzzy
rule-based model is endowed with the capability of
explaining its inference process so that users may
be confident about how it produces its outcomes. This
is particularly important in such domains as medical
diagnosis, where a human expert is ultimately
responsible for a decision.


14.1.2 A Historical Review 


A long time has passed since Zadeh’s seminal work on
fuzzy sets [14.6] and nowadays there are many fruitful
research lines related to fuzzy logic [14.7]. Hence,
we can state that fuzzy sets and systems have become
the subjects of a mature research field comprising many
works, both theoretical and applied in scope.
Figure 14.1 shows the distribution of publications per year
regarding interpretability issues. Three main phases can 
be identified taking into account the historical evolution 
of FM. 


From 1965 to 1990 

During this initial period, interpretability emerged 
naturally as the main advantage of fuzzy systems. 
Researchers concentrated on building fuzzy models 
mainly working with expert knowledge and a few simple
linguistic variables [14.8–10] and linguistic rules
usually referred to as Mamdani rules [14.11]. As a result,
those designed fuzzy models were characterized
by their high interpretability. Moreover, interpretability
was assumed to be an intrinsic property of fuzzy systems.
Therefore, there were only a few publications regarding
interpretability issues. Note that the first proposal
of a fuzzy rule-based system (FRBS) was presented 
by Mamdani who was able to augment Zadeh’s initial 
formulation allowing the application of fuzzy systems 
to a control problem. These kinds of fuzzy systems 
are also referred to as fuzzy logic controllers, as pro- 
posed by the author in his pioneering paper. In addition, 
Mamdani-type FRBSs soon became the main tool to de- 
velop linguistic models. Of course, many other rule for- 
mats were arising and gaining importance. In addition 
to Mamdani FRBSs, probably the most famous FRBSs 
are those proposed by Takagi and Sugeno [14.12], the 
popular TSK fuzzy systems, where the conclusion is 
a function of the input values. Due to their current popu- 
larity, in the following we will use the term fuzzy system 
to denote Mamdani-type FRBSs and their subsequent 
extensions. 


From 1990 to 2000 
In the second period the focus was set on accuracy. 
Researchers realized that expert knowledge was not 
enough to deal with complex systems. Thus, they ex- 
plored the use of fuzzy machine learning techniques 
to automatically extract knowledge from data [14.13, 
14]. Accordingly, those designed fuzzy models became 
composed of extremely complicated fuzzy rules with 
high accuracy but at the cost of disregarding
interpretability as a side effect. Obviously, automatically
generated rules were rarely as readable as desired.
During this period some researchers started claiming
that fuzzy models are not interpretable per se:
interpretability is a matter of careful design. Thus,
interpretability issues must be deeply analyzed and
seriously discussed. Although the number of publications
related to interpretability issues was still small in this
period, note that publications began to grow
exponentially at the end of this second phase.

Fig. 14.1 Publications per year related to interpretability issues (x-axis: year, 1983 to 2013; y-axis: number of publications; three phases are marked: expert-driven FM, data-driven FM, and interpretability-oriented FM)


From 2000 to 2013 
After the two previous periods, researchers realized that 
both expert-driven (from 1965 to 1990) and data-driven 
(from 1990 to 2000) design approaches have their own 
advantages and drawbacks, but they are somehow com- 
plementary. For instance, expert knowledge is general 
and easy to interpret but hard to formalize. On the con- 
trary, knowledge derived from data can be extracted 
automatically but it becomes quite specific and its inter- 
pretation is usually hard [14.15]. Moreover, researchers 
were aware of the need of taking into account simulta- 
neously interpretability and accuracy during the design 
of fuzzy models. As a result, during this third phase 
the main challenge was how to combine expert knowl- 
edge and knowledge extracted from data, with the aim 
of designing compact and robust systems with a good 
interpretability-accuracy trade-off. When considering
both interpretability and accuracy in FM, two main 
strategies turn up naturally [14.16]: linguistic fuzzy 
modeling (LFM) and precise fuzzy modeling (PFM). On 
the one hand, in LFM, designers first focus on the inter- 
pretability of the model, and then they try to improve its 
accuracy [14.17]. On the other hand, in PFM, designers
first build a fuzzy model maximizing its accuracy,
and then they try to improve its interpretability [14.18]. 
As an alternative, since accuracy and interpretability 
represent conflicting goals by nature, multiobjective 
fuzzy modeling strategies (considering accuracy and 
interpretability as objectives) have become very popu- 
lar [14.19, 20]. 

At the same time, there has been a great effort 
for formalizing interpretability issues. As a result, the 
number of publications has grown. Researchers have 
actively looked for the right definition of interpretabil- 
ity. In addition, several interpretability constraints have 
been identified. Moreover, interpretability assessment 
has become a hot research topic. In fact, several in- 
terpretability indexes (able to guide the FM design 
process) have been defined. Nevertheless, a universally
accepted index is still missing. Hence, further
research on interpretability issues is needed.

Unfortunately, although the number of publications 
was growing exponentially until 2009, later it started 
decreasing. We would like to emphasize the impact of 
the two pioneering books [14.17, 18] edited in 2003. They
contributed to make the fuzzy community aware of the 
need to take into account again interpretability as a main 
research concern. It is worth noting that the first formal 
definition of interpretability (in the fuzzy literature) was 
included in [14.18]. It was given by Bodenhofer and 
Bauer [14.21] who established an axiomatic treatment 
of interpretability at the level of linguistic variables. 

We encourage the fuzzy community to keep paying
attention to interpretability issues because there is
still a lot of research to be done. Interpretability must
be the central point of system modeling. In fact, some
of the hottest and most recent research topics, like
precisiated natural language, computing with words, and
human-centric computing, strongly rely on the
interpretability of the designed models. The challenge is
to better exploit fuzzy logic techniques for improving
the human-centric character of many intelligent systems.
Therefore, interpretability deserves consideration
as a main research concern and the number of
publications should grow again in the coming years.


14.2 Interpretability Constraints and Criteria 


Interpretability is a quality of fuzzy systems that is not
straightforward to quantify. Nevertheless, a quantitative
definition is required both for assessing the interpretability
of a fuzzy system and for designing new fuzzy systems. 
This requirement is especially stringent when fuzzy 
systems are automatically designed from data, through 
some knowledge extraction procedure. 

A common approach for defining interpretability 
is based on the adoption of a number of constraints 
and criteria that, taken as a whole, provide for a def- 
inition of interpretability. This approach is inherent 
to the subjective nature of interpretability, because 
the validity of some conditions/criteria is not univer- 
sally acknowledged and may depend on the application 
context. 

In the literature, a large number of interpretability 
constraints and criteria can be found. Some of them 
are widely accepted, while others are controversial. The 
nature of these constraints and criteria is also diverse. 
Some are neatly defined as a mathematical condition, 
others have a fuzzy character and their satisfaction is 
a matter of degree. This section aims to give
a brief yet homogeneous outline of the best known
interpretability constraints and criteria. The reader is
referred to the specialized literature for deeper
insights on this topic [14.22, 23].

Fig. 14.2 Interpretability constraints and criteria in different abstraction levels (from low level to high level: fuzzy sets: normality, continuity, convexity; fuzzy partitions: justifiable number of elements, distinguishability, coverage, relation preservation, prototypes on special elements; fuzzy rules: description length, granular output; fuzzy rule bases: compactness, average firing rules, logical view, completeness, locality)

Several ways are available to categorize inter- 
pretability constraints and criteria. It could be possible 
to refer to their specific nature (e.g., crisp vs. fuzzy), 
to the components of the fuzzy system where they are 
applied, or to the description level of the fuzzy system 
itself. Here, as depicted in Fig. 14.2, we choose a hi- 
erarchical organization that starts from the most basic 
components of a fuzzy system, namely the involved 
fuzzy sets, and goes on toward more complex levels, 
such as fuzzy partitions, fuzzy rules, up to considering 
the model as a whole. 


14.2.1 Constraints and Criteria for Fuzzy Sets 


Fuzzy sets are the basic elements of fuzzy systems and 
their role is to express elementary yet imprecise con- 
cepts that can be denoted by linguistic labels. Here 
we assume that fuzzy sets are defined on a universe 
of discourse represented by a closed interval of the 
real line (this is the case of most fuzzy systems, espe- 
cially those acquired from data). Thus, fuzzy sets are 
the building blocks to translate a numerical domain into
a linguistically quantified domain that can be used to
communicate knowledge. 

Generally speaking, single fuzzy sets are employed 
to express elementary concepts and, through the use of 
connectives, are combined to represent more complex 
concepts. However, not all fuzzy sets can be related to 
elementary concepts, since the membership function of 
a fuzzy set may be very awkward but still legitimate 
from a mathematical viewpoint. Actually, a subclass of 
fuzzy sets should be considered, so that its members 
can be easily associated with elementary concepts and 
tagged by the corresponding linguistic labels. Fuzzy 
sets of this subclass must verify a number of basic in- 
terpretability constraints, including: 


• Normality: At least one element of the universe
of discourse is a prototype for the fuzzy set, i.e.
it is characterized by a full membership degree.
A normal fuzzy set represents a concept that fully
qualifies at least one element of the universe of
discourse, i.e. the concept has at least one example that
fulfills it. On the other hand, a subnormal fuzzy set
is usually a consequence of a partial contradiction
(it is easy to show that the degree of inclusion of
a subnormal fuzzy set in the empty set is nonzero).

• Continuity: The membership function is continuous
on the universe of discourse. As a matter of
fact, most concepts that can be naturally represented
through fuzzy sets derive from a perceptual act,
which comes from external stimuli that usually vary
continuously. Therefore, continuous fuzzy sets are
better in accordance with the perceptive nature of
the represented concepts.

• Convexity: In a convex fuzzy set, given three
elements linearly placed on the axis related to the
universe of discourse, the degree of membership of
the middle element is always greater than or equal
to the minimum membership degree of the side
elements [14.24]. This constraint encodes the rule that
if a property is satisfied by two elements, then it is
also satisfied by an element located between them.
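
As a rough illustration of how these three constraints might be checked in practice, the following sketch evaluates a membership function on an evenly spaced grid and tests the samples. The function names, the grid size, and the tolerance are assumptions made for this example. Note that continuity cannot be established from finitely many samples, so `largest_jump` is only a heuristic indicator.

```python
from typing import Callable, List

MembershipFn = Callable[[float], float]


def sample(mu: MembershipFn, lo: float, hi: float, n: int = 201) -> List[float]:
    """Evaluate mu on n evenly spaced points of the closed interval [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [mu(lo + i * step) for i in range(n)]


def is_normal(values: List[float], tol: float = 1e-9) -> bool:
    """Normality: at least one sampled element has full membership."""
    return max(values) >= 1.0 - tol


def is_convex(values: List[float], tol: float = 1e-9) -> bool:
    """Convexity: mu(y) >= min(mu(x), mu(z)) whenever x <= y <= z.

    On an ordered grid this holds when the samples never decrease
    before the first maximum and never increase after it.
    """
    peak = values.index(max(values))
    rising = all(values[i] <= values[i + 1] + tol for i in range(peak))
    falling = all(values[i] + tol >= values[i + 1]
                  for i in range(peak, len(values) - 1))
    return rising and falling


def largest_jump(values: List[float]) -> float:
    """Heuristic only: a large jump between neighbours hints at a discontinuity."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


def triangle(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership degree of x, with prototype b and a < b < c."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


# Hypothetical fuzzy sets on the universe of discourse [0, 10].
well_formed = sample(lambda x: triangle(x, 2, 5, 8), 0, 10)
subnormal = [0.8 * v for v in well_formed]
two_humps = sample(lambda x: max(triangle(x, 0, 2, 4), triangle(x, 6, 8, 10)), 0, 10)
crisp = sample(lambda x: 1.0 if 3 <= x <= 7 else 0.0, 0, 10)

for name, values in [("well_formed", well_formed), ("subnormal", subnormal),
                     ("two_humps", two_humps), ("crisp", crisp)]:
    print(name, is_normal(values), is_convex(values), round(largest_jump(values), 3))
```

The two-humped set is normal but not convex, which is the kind of mathematically legitimate yet awkward membership function that is hard to tag with a single linguistic label.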


14.2.2 Constraints and Criteria 
for Fuzzy Partitions 


The key success factor of fuzzy logic in modeling
is the ability to express knowledge linguistically.
Technically, this is realized by linguistic variables, i.e. 
variables that assume symbolic values called linguis- 
tic terms. The peculiarity of linguistic variables with 
respect to classical symbolic approaches is the interpre- 
tation of linguistic terms as fuzzy sets. The collection of 
fuzzy sets used as interpretation of the linguistic terms 
of a linguistic variable forms a fuzzy partition of the 
universe of discourse. 

To understand the role of a fuzzy partition, we 
should consider that it is meant to define a relation 
among fuzzy sets. Such a relation must be co-intensive 
with the one connecting the elementary concepts repre- 
sented by the fuzzy sets involved in the fuzzy partition. 
That is the reason why the design of fuzzy partitions 
is so crucial for the overall interpretability of a fuzzy 
system. The most critical interpretability constraints for 
fuzzy partitions are: 


• Justifiable number of elements: The number of
fuzzy sets included in a linguistic variable must be
small enough so that they can be easily remembered
and recalled by users. Psychological studies suggest
at most nine fuzzy sets or even fewer [14.5, 25]. Usually,
three to five fuzzy sets are convenient choices
to set the partition cardinality.

• Distinguishability: Since fuzzy sets are denoted
by distinct linguistic terms, they should refer to
well-distinguished concepts. Therefore, fuzzy sets
in a partition should be well separated, although
some overlapping is admissible because usually
perception-based concepts are not completely disjoint.
Several alternatives are available to quantify
distinguishability, including similarity and
possibility [14.26].

• Coverage: Distinguishable fuzzy sets are necessary,
but if they are too widely separated they risk
under-representing some subset of the universe of
discourse. The coverage constraint requires that each
element of the universe of discourse must belong to
at least one fuzzy set of the partition with a membership
degree not less than a threshold [14.22].
This requirement implies that each element of the
universe of discourse has some quality that is well
represented in the fuzzy partition. On the other
hand, the lack of coverage is a signal of incompleteness
of the fuzzy partition that may hamper the
overall comprehensibility of the system’s knowledge.
Coverage and distinguishability are somewhat
conflicting requirements that are usually balanced
by fuzzy partitions in which adjacent fuzzy sets
intersect at elements whose membership degree is
equal to a threshold (usually the value of this
threshold is set to 0.5).

• Relation preservation: The concepts that are
represented by the fuzzy sets in a fuzzy partition are
usually cross related. The most immediate relation
which can be conceived among concepts is related
to the order (e.g., Low preceding Medium, preceding
High, and so on). Relations of this type must
be preserved by the corresponding fuzzy sets in the
fuzzy partition [14.27].

• Prototypes on special elements: In many problems,
some elements of the universe of discourse have
some special meaning. A common case is the meaning
of the bounds of the universe of discourse,
which usually represent some extreme qualities
(e.g., Very Large or Very Small). Other special
elements may lie away from the bounds of the
universe of discourse and be more problem-specific
(e.g., prototypes could be conceived for the
freezing point of water, the typical
human body temperature, etc.). In all these cases,
the prototypes of some fuzzy sets of the partition
must coincide with such special elements.
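
The interplay between coverage and distinguishability can be illustrated numerically. The sketch below builds a partition of evenly spaced triangular fuzzy sets whose prototypes include the bounds of the universe of discourse, then measures coverage as the smallest best-matching membership degree and distinguishability through a Jaccard-type similarity. The helper names, the grid size, and the choice of the Jaccard index (with min and max as intersection and union) are assumptions made for this example; the chapter only states that similarity and possibility measures are available.

```python
from typing import Callable, List

MembershipFn = Callable[[float], float]


def uniform_partition(lo: float, hi: float, k: int) -> List[MembershipFn]:
    """k triangular fuzzy sets with evenly spaced prototypes on [lo, hi].

    The first and last prototypes sit on the bounds of the universe of
    discourse, and adjacent sets cross at membership degree 0.5.
    """
    width = (hi - lo) / (k - 1)

    def make(center: float) -> MembershipFn:
        return lambda x: max(0.0, 1.0 - abs(x - center) / width)

    return [make(lo + i * width) for i in range(k)]


def grid(lo: float, hi: float, n: int = 401) -> List[float]:
    """n evenly spaced sample points of [lo, hi]."""
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def coverage(partition: List[MembershipFn], xs: List[float]) -> float:
    """Smallest degree to which any sampled element belongs to its best-matching set."""
    return min(max(mu(x) for mu in partition) for x in xs)


def jaccard_similarity(mu_a: MembershipFn, mu_b: MembershipFn, xs: List[float]) -> float:
    """Ratio of intersection to union; lower values mean more distinguishable sets."""
    inter = sum(min(mu_a(x), mu_b(x)) for x in xs)
    union = sum(max(mu_a(x), mu_b(x)) for x in xs)
    return inter / union if union > 0 else 0.0


xs = grid(0.0, 10.0)
partition = uniform_partition(0.0, 10.0, 5)

justifiable = len(partition) <= 9            # "at most nine fuzzy sets"
cov = coverage(partition, xs)                # compare against a threshold such as 0.5
sim = jaccard_similarity(partition[1], partition[2], xs)

print(f"sets={len(partition)} justifiable={justifiable} "
      f"coverage={cov:.3f} adjacent similarity={sim:.3f}")
```

Moving the prototypes further apart while keeping the same widths would lower the similarity between adjacent sets but would also push the coverage below the threshold, which is the conflict described above.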


14.2.3 Constraints and Criteria 
for Fuzzy Rules 


In most cases, a fuzzy system is defined over a multi- 
dimensional universe of discourse that can be split into 
many one-dimensional universes of discourse, each of 
them associated with a linguistic variable. A subset of 
these linguistic variables is used to represent the input 
of a system, while the remaining variables (usually only 
one variable) are used to represent the output. The in- 
put/output behavior is expressed in terms of rules. Each 
rule prescribes a linguistic output value when the input 
matches the rule condition (also called rule premise), 
usually expressed as a logical combination of soft con- 
straints. A soft constraint is a linguistic proposition 
(specification) that ties a linguistic variable to a linguis- 
tic term (e.g., Temperature is High). Furthermore, the 
soft constraints combined in a rule condition may in- 
volve different linguistic variables (e.g., Temperature is 
High AND Pressure is Low). 

A fuzzy rule is a unit of knowledge that has the 
twofold role of determining the system behavior and 
communicating this behavior in a linguistic form. The
latter role calls for a number of interpretability
constraints in addition to those
required for fuzzy sets and fuzzy partitions. Some of the
most general interpretability constraints and criteria for 
fuzzy rules are as follows: 


• Description length: The description length of
a fuzzy rule is the sum of the number of soft constraints
occurring in the condition and in the consequent
of the rule (it is usually known as total rule
length). In most cases, only one linguistic variable
is represented in a rule consequent, therefore the
description length of a fuzzy rule is directly related
to the complexity of the condition. A small number
of soft constraints in a rule implies both high
readability and semantic generality; hence, short rules
should be preferred in fuzzy systems.

• Granular outputs: The main strength of fuzzy systems
is their ability to represent and process imprecision
in both data and knowledge. Imprecision
is part of fuzzy inference, therefore the inferred
output of a fuzzy system should carry information
about the imprecision of its knowledge. This can be
accomplished by using fuzzy sets as outputs.
Defuzzification collapses fuzzy sets into single scalars;
it should therefore be used only when strictly necessary
and in those situations where outputs are not
the object of user interpretation.
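
Both criteria can be sketched in a few lines. In the example below, a rule is stored as lists of soft constraints so that its description length is a simple count, and the rule output is kept as a fuzzy set before being optionally collapsed into a scalar. The class and function names, the output variable Valve, the clipping of the consequent at the firing strength, and the centroid defuzzifier are illustrative assumptions and are not prescribed by the chapter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MembershipFn = Callable[[float], float]
Proposition = Tuple[str, str]  # soft constraint: (linguistic variable, linguistic term)


@dataclass
class FuzzyRule:
    condition: List[Proposition]
    consequent: List[Proposition]

    def description_length(self) -> int:
        """Total rule length: soft constraints in the condition plus the consequent."""
        return len(self.condition) + len(self.consequent)


def clipped_output(strength: float, mu_out: MembershipFn) -> MembershipFn:
    """Granular output: the consequent fuzzy set clipped at the firing strength."""
    return lambda y: min(strength, mu_out(y))


def centroid(mu: MembershipFn, ys: List[float]) -> float:
    """Defuzzification: collapses the fuzzy output into a single scalar."""
    total = sum(mu(y) for y in ys)
    return sum(y * mu(y) for y in ys) / total


rule = FuzzyRule(
    condition=[("Temperature", "High"), ("Pressure", "Low")],
    consequent=[("Valve", "Open")],  # hypothetical output variable and term
)

# Hypothetical fuzzy set for "Open" on the output universe [0, 10].
mu_open: MembershipFn = lambda y: max(0.0, 1.0 - abs(y - 5.0) / 5.0)
ys = [i / 10 for i in range(101)]

granular = clipped_output(0.6, mu_open)  # 0.6 is an assumed firing strength
scalar = centroid(granular, ys)

print("description length:", rule.description_length())
print("height of granular output:", max(granular(y) for y in ys))
print("defuzzified output:", round(scalar, 3))
```

The granular output still shows that the rule fired only partially, whereas the defuzzified scalar is the same value that a fully firing rule would produce for this symmetric set, so the information about imprecision is lost.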


14.2.4 Constraints and Criteria 
for Fuzzy Rule Bases 


As previously stated, the interpretability of a rule base 
taken as a whole has two facets: (1) a structural facet 
(readability), which is mainly related to the ease
of reading the rules; (2) a semantic facet (compre- 
hensibility), which is related to the information con- 
veyed to the users who are willing to understand the 
system behavior. The following interpretability con- 
straints and criteria are commonly defined to ensure 
the structural and semantic interpretability of fuzzy rule 
bases. 


• Compactness: A compact rule base is defined by
a small number of rules. This is a typical structural
constraint that advocates for simple representation
of knowledge in order to allow easy reading and
understanding. Nevertheless, a small number of rules
usually involves low accuracy; it is therefore very
common to balance compactness and accuracy in
a trade-off that mainly depends on user needs.

• Average firing rules: When an input is applied to
a fuzzy system, the rules whose conditions are verified
to a degree greater than zero are firing, i.e.
they contribute to the inference of the output. On
average, the number of firing rules should be as
small as possible, so that users are able to understand
the contributions of the rules in determining
the output.

• Logical view: Fuzzy rules resemble logical propositions
when their linguistic description is considered.
Since linguistic description is the main means of
communicating knowledge, it is necessary that logical
laws are applicable to fuzzy rules; otherwise,
the system behavior may be counterintuitive.
Therefore, the validity of some basic laws of the
propositional logic (like Modus Ponens) and the
truth-preserving operations (e.g., application of
distributivity, De Morgan laws, etc.) should also be
verified for fuzzy rules.

• Completeness: The behavior of a fuzzy system is
well defined for all inputs in the universe of discourse;
however, when the maximum firing strength
determined by an input is too small, it is not easy to
justify the behavior of the system in terms of the
activated rules. It is therefore required that for each
possible input at least one rule is activated with a firing
strength greater than a threshold value (usually
set to 0.5) [14.22].

• Locality: Each rule should define a local model,
i.e. a fuzzy region in the universe of discourse
where the behavior of the system is mainly due
to the rule and only marginally to other rules that
are simultaneously activated [14.28]. This requirement
is necessary to avoid that the final output
of the system is a consequence of an interpolative
behavior of different rules that are simultaneously
activated with high firing strengths. On the other
hand, a moderate overlapping of local models is
admissible in order to enable a smooth transition
from a local model to another when the input

14.3 Interpretability Assessment 


The interpretability constraints and criteria presented 
in previous section belong to two main classes: (1) 
structural constraints and criteria referring to the static 
description of a fuzzy model in terms of the elements 
that compose it; (2) semantic constraints and criteria 
quantifying interpretability by looking at the behav- 
ior of the fuzzy system. Whilst structural constraints 
address the readability of a fuzzy model, semantic con- 
straints focus on its comprehensibility. 

Of course, interpretability assessment must regard 
both global (description readability) and local (infer- 
ence comprehensibility) points of view. It must also 
take into account both structural and semantic issues 
when considering all components (fuzzy sets, fuzzy 
partitions, linguistic partitions, linguistic propositions, 
fuzzy rules, fuzzy operators, etc.) of the fuzzy system 
under study. 

Thus, assessing interpretability represents a chal- 
lenging task mainly because the analysis of inter- 
pretability is extremely subjective. In fact, it clearly 
depends on the feeling and background (knowledge, ex- 
perience, etc.) of the person who is in charge of making 
the evaluation. Even though having subjective indexes 
would be really appreciated for personalization pur- 
poses, looking for a universal metric widely admitted 
also makes the definition of objective indexes manda- 
tory. Hence, it is necessary to consider both objective 
and subjective indexes. On the one hand, objective in- 
dexes are aimed at making feasible fair comparisons 
among different fuzzy models designed for solving 


values gradually shift from one fuzzy region to an- 
other. 


In summary, a number of interpretable constraints 
and criteria apply to all levels of a fuzzy system. This 
section highlights only the constraints that are general 
enough to be applied independently on the model- 
ing problem; however, several problem-specific con- 
straints are also reported in the literature (e.g., attribute 
correlation). Sometimes interpretability constraints are 
conflicting (as exemplified by the dichotomy distin- 
guishability versus coverage) and, in many cases, they 
conflict with the overall accuracy of the system. A bal- 
ance is therefore required, asking in its turn for a way 
to assess interpretability in a qualitative but also quanti- 
tative way. This is the main subject of the next section. 


the same problem. On the other hand, subjective in- 
dexes are thought for guiding the design of customized 
fuzzy models, thus making easier to take into account 
users’ preferences and expectations during the design 
process. 
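
As an illustration of an objective, behavior-based measurement, the completeness and average-firing-rules criteria of the previous section can be checked numerically on sampled inputs. The following Python lines are only a minimal sketch under stated assumptions: the triangular membership functions, the minimum as conjunction, the sample grid, and all function names are hypothetical choices made here, not part of the cited works.

```python
from typing import Callable, Dict, List

Membership = Callable[[float], float]
Rule = Dict[str, Membership]  # antecedent only: variable name -> membership function


def triangle(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with support [a, c] and peak at b."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu


def firing_strengths(rules: List[Rule], x: Dict[str, float]) -> List[float]:
    """Firing strength of each rule, using the minimum as conjunction (assumption)."""
    return [min(mu(x[var]) for var, mu in rule.items()) for rule in rules]


def uncovered_inputs(rules: List[Rule], samples: List[Dict[str, float]],
                     threshold: float = 0.5) -> List[Dict[str, float]]:
    """Sampled inputs for which no rule reaches the threshold.

    Assumption: '>=' is used, so that the crossover points of a strong
    partition (membership exactly 0.5) count as covered.
    """
    return [x for x in samples if max(firing_strengths(rules, x)) < threshold]


def average_firing_rules(rules: List[Rule], samples: List[Dict[str, float]]) -> float:
    """Mean number of rules with firing strength greater than zero."""
    counts = [sum(1 for s in firing_strengths(rules, x) if s > 0.0) for x in samples]
    return sum(counts) / len(counts)


# Hypothetical one-variable rule base on the universe [0, 10].
low, medium, high = triangle(0, 0, 5), triangle(0, 5, 10), triangle(5, 10, 10)
rules = [{"temp": low}, {"temp": medium}, {"temp": high}]
samples = [{"temp": i * 0.5} for i in range(21)]

print(uncovered_inputs(rules, samples))      # inputs violating completeness
print(average_firing_rules(rules, samples))  # average number of firing rules
```

Removing the rule for the middle term leaves a gap around the center of the universe, which `uncovered_inputs` reports; this is the kind of situation the completeness criterion is meant to exclude.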

The rest of this section gives an overview of
the most popular interpretability indexes found in
the specialized literature. Firstly, Zhou and
Gan [14.29] established a two-level taxonomy regarding
interpretability issues. They distinguished between
low-level (also called fuzzy set level) and high-level
(or fuzzy rule level). This taxonomy was extended
by Alonso et al. [14.30] who introduced a conceptual
framework for characterizing interpretability. They
considered both fuzzy partitions and fuzzy rules at several
abstraction levels. Moreover, in [14.31] Mencar et al.
stressed the need to distinguish between readability
(related to structural issues) and comprehensibility
(related to semantic issues). Later, Gacto et al. [14.32]
proposed a double axis taxonomy regarding semantic
and structural properties of fuzzy systems, at both
partition and rule base levels. Accordingly, they pointed
out four groups of indexes. Below, we briefly introduce
the two best-known indexes inside each group (they
are summarized in Fig. 14.3):


G1. Structural-based interpretability at fuzzy partition
level:
• Number of features.
• Number of membership functions.


Fig. 14.3 Interpretability indexes considered in this work.
The figure arranges the four groups by kind of interpretability and by level:
G1 (structural-based, fuzzy partition level): number of features; number of membership functions.
G2 (structural-based, fuzzy rule base level): number of rules; number of conditions.
G3 (semantic-based, fuzzy partition level): context-adaptation-based index; GM3M index.
G4 (semantic-based, fuzzy rule base level): semantic-cointension-based index; co-firing-based comprehensibility index.


G2. Structural-based interpretability at fuzzy rule base
level:
• Number of rules. This index is the most widely
used [14.30].
• Number of conditions. This index corresponds
to the previously mentioned total rule length
which was coined by Ishibuchi et al. [14.33].


G3. Semantic-based interpretability at fuzzy partition
level:
• Context-adaptation-based index [14.34]. This
index was introduced by Botta et al. with the
aim of guiding the so-called context adaptation
approach for multiobjective evolutionary design
of fuzzy rule-based systems. It is actually an
interpretability index based on fuzzy ordering
relations.
• GM3M index [14.35]. Gacto et al. proposed an
index defined as the geometric mean of three
single metrics. The first metric computes the
displacement of the tuned membership functions
with respect to the initial ones. The second
metric evaluates the changes in the shapes of
membership functions in terms of lateral amplitude
rate. The third metric measures the area
similarity. This index was used to preserve
the semantic interpretability of fuzzy partitions
along multiobjective evolutionary rule selection
and tuning processes aimed at designing fuzzy
models with a good interpretability-accuracy
trade-off.


G4. Semantic-based interpretability at fuzzy rule base
level:
• Semantic-cointension-based index [14.36]. This
index exploits the cointension concept coined
by Zadeh [14.3]. In short, two different concepts
referring almost to the same entities are taken as
cointensive. Thus, a fuzzy system is deemed
comprehensible only when the explicit semantics
(defined by fuzzy sets attached to linguistic
terms as well as fuzzy operators) embedded in
the fuzzy model is cointensive with the implicit
semantics inferred by the user while reading
the linguistic representation of the rules. In the
case of classification problems, semantic cointension
can be evaluated through a logical view
approach, which evaluates the degree of fulfillment
of a number of logical laws exhibited by
a given fuzzy rule base [14.31]. The idea mainly
relies on the assumption that linguistic propositions
resemble logical propositions, for which
a number of basic logical laws are expected to
hold.
• Co-firing-based comprehensibility index
[14.37]. It measures the complexity of understanding
the fuzzy inference process in terms of
information related to co-firing rules, i.e., rules
firing simultaneously with a given input vector.
This index emerges in relation with a novel
approach for fuzzy system comprehensibility
analysis, based on visual representations of the
fuzzy rule-based inference process. Such
representations are called fuzzy inference-grams
(fingrams) [14.38,39]. Given a fuzzy rule
base, a fingram plots it graphically as a social
network made of nodes representing fuzzy rules
and edges connecting nodes in terms of rule
interaction at the inference level. Edge weights
are computed by paying attention to the number
of co-firing rules. Thus, by looking carefully at
all the information provided by a fingram, it
becomes easy and intuitive to understand the
structure and behavior of the fuzzy rule base it
represents.
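
To make the counting nature of groups G1 and G2 concrete, and to show the raw information on which co-firing analysis builds, a small Python sketch follows. It is only an illustration under assumptions made here: the rule base encoding, the choice of counting membership functions only for features that appear in rules, and the plain co-firing counts (without the normalization used in [14.38,39]) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A rule is (antecedent, consequent term); the antecedent maps each
# input variable to the linguistic term used in the condition.
Rule = Tuple[Dict[str, str], str]


def structural_indexes(rules: List[Rule],
                       partitions: Dict[str, List[str]]) -> Dict[str, int]:
    """Counting indexes of groups G1 and G2 for a linguistic rule base."""
    used = {var for antecedent, _ in rules for var in antecedent}
    return {
        "number_of_features": len(used),
        # Assumption: only partitions of features used in rules are counted.
        "number_of_membership_functions": sum(len(partitions[v]) for v in used),
        "number_of_rules": len(rules),
        # Total rule length: one condition per antecedent entry.
        "number_of_conditions": sum(len(antecedent) for antecedent, _ in rules),
    }


def cofiring_counts(firing_matrix: Sequence[Sequence[float]]) -> List[List[int]]:
    """counts[i][j] = number of samples for which rules i and j fire together.

    firing_matrix[k][i] is the firing degree of rule i for sample k.
    The diagonal counts[i][i] is the number of samples firing rule i.
    """
    n_rules = len(firing_matrix[0])
    counts = [[0] * n_rules for _ in range(n_rules)]
    for row in firing_matrix:
        fired = [i for i, degree in enumerate(row) if degree > 0.0]
        for i in fired:
            for j in fired:
                counts[i][j] += 1
    return counts


# Hypothetical rule base and firing degrees, for illustration only.
partitions = {"temp": ["low", "medium", "high"], "hum": ["low", "high"]}
rules = [({"temp": "low", "hum": "high"}, "fan_off"), ({"temp": "high"}, "fan_on")]
firing = [[0.8, 0.0], [0.3, 0.4], [0.0, 1.0]]

print(structural_indexes(rules, partitions))
print(cofiring_counts(firing))
```

The off-diagonal entries of the returned matrix are the kind of quantity from which fingram edge weights can be derived; how they are normalized is left to the cited papers.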


Notice that most published interpretability indexes
deal only with structural issues, so they correspond to
groups G1 and G2. Indexes belonging to these groups
are mainly quantitative. They essentially analyze the
structural complexity of a fuzzy model by counting the
number of elements (membership functions, rules, etc.)
it contains. As a result, these indexes can be deemed
objective. Although these indexes are usually
quite simple (which is why we have merely listed
them above), they are by far the most popular ones. In
contrast, only a few interpretability indexes are able
to assess the comprehensibility of a fuzzy model by
dealing with semantic issues (they belong to groups G3 and
G4). This is mainly because these indexes
must take into account not only quantitative but also
qualitative aspects of the modeled fuzzy system. They
are inherently subjective and therefore not easy to
formalize (which is why we have provided more
details above). The interested reader is referred
to the cited papers for further information. Moreover,
a much more exhaustive list of indexes can be found
in [14.32].

Even though great effort has been devoted in recent
years to proposing new interpretability indexes, a universal
index is still missing. Hence, defining such an index
remains a challenging task. We would also like to
highlight another encouraging challenge, namely the
careful design of interpretable fuzzy
systems guided by one or more of the already existing
interpretability indexes.


14.4 Designing Interpretable Fuzzy Systems 


Linguistic (Mamdani-type) fuzzy systems are widely 
known as a powerful tool to develop linguistic mod- 
els [14.11]. They are made up of two main components: 


• The inference engine, that is the component of the
fuzzy system in charge of the fuzzy processing tasks.

• The knowledge base (KB), that is the component of
the fuzzy system that stores the knowledge about
the problem being solved. It is composed of:

– The fuzzy partitions, describing the linguistic
terms along with the corresponding membership
functions defining their semantics, and

– The fuzzy rule base, constituted by a collection
of linguistic rules with the following structure


IF $X_1$ is $A_1$ and ... and $X_n$ is $A_n$
THEN $Y_1$ is $B_1$ and ... and $Y_m$ is $B_m$


with $X_i$ and $Y_j$ being input and output linguistic
variables, respectively, and $A_i$ and $B_j$ being
linguistic terms defined by the corresponding
fuzzy partitions. This structure provides a natural
framework to include expert knowledge
in the form of linguistic fuzzy rules. In addition
to expert knowledge, induced knowledge
automatically extracted from experimental data
(describing the relation between system input
and output) can also be easily formalized in the
same rule base. Expert and induced knowledge
are complementary. Furthermore, they are
represented in a highly interpretable structure. The
fuzzy rules are composed of input and output
linguistic variables which take values from their
term sets having a meaning associated with each
linguistic label. As a result, each rule is a
description of a condition-action statement that
offers a clear interpretation to a human.
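
The two parts of the KB described above can be pictured as a simple data structure. The sketch below is a hypothetical Python encoding chosen for illustration (the class names, the triangular parameter triples, and the example variables are assumptions); it stores fuzzy partitions and linguistic rules, renders a rule in its IF-THEN form, and checks that every term used in a rule is defined by the partition of its variable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: each linguistic term is a triangle given by (left, peak, right).
Triangle = Tuple[float, float, float]


@dataclass
class LinguisticRule:
    antecedent: Dict[str, str]  # input variable -> linguistic term
    consequent: Dict[str, str]  # output variable -> linguistic term

    def __str__(self) -> str:
        conditions = " and ".join(f"{v} is {t}" for v, t in self.antecedent.items())
        actions = " and ".join(f"{v} is {t}" for v, t in self.consequent.items())
        return f"IF {conditions} THEN {actions}"


@dataclass
class KnowledgeBase:
    partitions: Dict[str, Dict[str, Triangle]]  # variable -> term -> parameters
    rules: List[LinguisticRule] = field(default_factory=list)

    def undefined_terms(self) -> List[Tuple[str, str]]:
        """(variable, term) pairs used in rules but missing from the partitions."""
        missing = []
        for rule in self.rules:
            pairs = list(rule.antecedent.items()) + list(rule.consequent.items())
            for var, term in pairs:
                if term not in self.partitions.get(var, {}):
                    missing.append((var, term))
        return missing


# Hypothetical example with one input and one output variable.
kb = KnowledgeBase(
    partitions={
        "temperature": {"low": (0.0, 0.0, 40.0), "high": (0.0, 40.0, 40.0)},
        "fan_speed": {"slow": (0.0, 0.0, 100.0), "fast": (0.0, 100.0, 100.0)},
    },
    rules=[
        LinguisticRule({"temperature": "high"}, {"fan_speed": "fast"}),
        LinguisticRule({"temperature": "low"}, {"fan_speed": "slow"}),
    ],
)

for rule in kb.rules:
    print(rule)
print(kb.undefined_terms())
```

Keeping the rules purely symbolic and the semantics in the partitions mirrors the separation between rule base and fuzzy partitions: expert rules and rules induced from data can be stored side by side in the same list.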


The accuracy of a fuzzy system directly depends 
on two aspects, the composition of the KB (fuzzy 
partitions and fuzzy rules) and the way in which it im- 
plements the fuzzy inference process. Therefore, the 
design process of a fuzzy system includes two main 
tasks which are going to be further explained in the fol- 
lowing subsections, regarding both interpretability and 
accuracy: 


• Generation of the KB in order to formulate and describe
the knowledge that is specific to the problem
domain.

• Conception of the inference engine, that is the
choice of the different fuzzy operators that are employed
by the inference process.


Mamdani-type fuzzy systems favor interpretability. 
Therefore, they are usually considered when looking for 
interpretable fuzzy systems. However, it is important 
to remark that they are not interpretable per se. Notice 
that designing interpretable fuzzy systems is a matter of 
careful design. 


14.4.1 Design Strategies for the Generation of a KB Regarding the Interpretability-Accuracy Trade-Off


The two main objectives to be addressed in the FM
field are interpretability and accuracy. Of course,
the ideal aim would be to satisfy both objectives to
a high degree but, since they represent conflicting
goals, it is generally not possible. Regardless of the
approach, a common scheme is found in the existing
literature:

• Firstly, the main objective (interpretability or accuracy)
is tackled by defining a specific model structure
to be used, thus setting the FM approach.
• Then, the modeling components (model structure
and/or modeling process) are improved by means
of different mechanisms to achieve the desired ratio
between interpretability and accuracy.

This procedure resulted in four different possibilities:

1. LFM with improved interpretability,
2. LFM with improved accuracy,
3. PFM with improved interpretability, and
4. PFM with improved accuracy.

Option (1) gives priority to interpretability. Although
a fuzzy system designed by LFM uses a model
structure with high descriptive power, it has some problems
(curse of dimensionality, excessive number of
input variables or fuzzy rules, garbled fuzzy sets, etc.)
that make it not as interpretable as desired. In consequence,
there is a need for interpretability improvements
to restore the pursued balance.

On the contrary, option (4) considers accuracy as
the main concern. However, obtaining more accuracy in
PFM does not pay attention to the interpretability of the
model. Thus, this approach departs from the aim of
this chapter. It acts close to black-box techniques, so it
does not follow the original objective of FM (not taking
advantage of the features that distinguish it from other
modeling techniques).

Finally, the two remaining options, (2) and (3), propose
improvement mechanisms to compensate for the
initial imbalance in the quest for the best trade-off between
interpretability and accuracy. In summary, three
main approaches exist depending on how the two objectives
are optimized (sequentially or at once):

• First interpretability then accuracy (LFM with improved
accuracy).

• First accuracy then interpretability (PFM with improved
interpretability).

• Multiobjective design. Both objectives are optimized
at the same time.

The rest of this section provides additional details
related to each of these approaches.

First Interpretability Then Accuracy
LFM has some inflexibility due to the use of linguistic
variables with global semantics that establishes a general
meaning of the used fuzzy sets [14.40]:

1. There is a lack of flexibility in the fuzzy system
because of the rigid partitioning of the input and
output spaces.
2. When the system input variables are dependent, it
is very hard to find the right fuzzy partitions of the
input spaces.
3. The usual homogeneous partitioning of the input
and output spaces does not scale to high-dimensional
spaces. It leads to the well-known
curse of dimensionality problem that is characteristic
of fuzzy systems.
4. The size of the KB directly depends on the number
of variables and linguistic terms in the model. The
derivation of an accurate linguistic fuzzy system
usually requires a large number of linguistic terms.
Unfortunately, this fact causes the number of rules
to rise significantly, which may cause the system to
lose the capability of being readable by human beings.
Of course, in most cases it would be possible
to obtain an equivalent fuzzy system with a much
smaller number of rules by giving up that kind
of rigidly partitioned input space.


However, some measures can be taken to address
the disadvantages enumerated above. Basically,
the accuracy in LFM can be improved in two ways,
by acting on:


• The model structure, slightly changing the rule
structure to make it more flexible, or on

• The modeling process, extending the model design
to other components beyond the rule base, such as
the fuzzy partitions, or even considering more sophisticated
derivations of it.


Note that the so-called strong fuzzy partitions are
widely used because they satisfy most of the interpretability
constraints introduced in Sect. 14.2.2. The
design of fuzzy partitions may be integrated within the
whole derivation process of a fuzzy system with different
schemata:


• Preliminary design. It involves extracting fuzzy
partitions automatically by induction (usually performed
by nonsupervised clustering techniques)
from the available dataset.

• Embedded design. Following a meta-learning process,
this approach first derives different fuzzy
partitions and then samples their efficacy by running
an embedded basic learning method of the entire
KB [14.41].

• Simultaneous design. The process of designing
fuzzy partitions is developed together with the
derivation of other components such as the fuzzy
rule base [14.42].

• A posteriori design. This approach involves tuning
of the previously defined fuzzy partitions once the
remaining components have been obtained. Usually,
the tuning process changes the membership
function shapes with the aim of improving the
accuracy of the linguistic model [14.43]. Nevertheless,
sometimes it also takes care of getting better
interpretability (e.g., merging membership
functions [14.44]).
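
The strong fuzzy partitions mentioned before this list are those whose membership degrees sum to one at every point of the universe of discourse. As a minimal Python sketch, assuming uniformly spaced triangular terms (the function name and the chosen universe are hypothetical), such a partition can be built and the defining property checked as follows.

```python
from typing import Callable, List

Membership = Callable[[float], float]


def uniform_strong_partition(lo: float, hi: float, n_terms: int) -> List[Membership]:
    """Uniformly spaced triangular terms forming a strong partition on [lo, hi].

    Adjacent triangles cross at membership 0.5, and inside [lo, hi] the
    membership degrees of all terms add up to 1.
    """
    if n_terms < 2:
        raise ValueError("a partition needs at least two terms")
    step = (hi - lo) / (n_terms - 1)

    def make_term(peak: float) -> Membership:
        # Triangle of half-width `step` centered on `peak`; outside [lo, hi]
        # the first and last terms are never evaluated in this sketch.
        return lambda x: max(0.0, 1.0 - abs(x - peak) / step)

    return [make_term(lo + i * step) for i in range(n_terms)]


terms = uniform_strong_partition(0.0, 10.0, 5)
xs = [i / 10 for i in range(101)]

# Largest deviation of the sum of memberships from 1 over the sampled universe.
worst = max(abs(sum(mu(x) for mu in terms) - 1.0) for x in xs)
print(worst)
```

A tuning step that moves the peaks independently would generally break this property unless neighboring terms are adjusted together, which is one reason why unconstrained a posteriori tuning can harm interpretability.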


It is also possible to opt for using more sophisticated
rule base learning methods while the fuzzy partitions
and the model structure are kept unaltered. Usually, all
these improvements have the final goal of enhancing the
interpolative reasoning the fuzzy system develops. For
instance, the COR (cooperative rules) method follows
the primary objective of inducing a better cooperation
among linguistic rules [14.45].

As an alternative, other authors advocate the exten- 
sion of the usual linguistic model structure to make 
it more flexible. As Zadeh highlighted in [14.46], 
a way to do so without losing the description abil- 
ity to a high degree is to use linguistic hedges (also 
called linguistic modifiers in a wider sense). In ad- 
dition, the rule structure can be extended through 
the definition of double-consequent rules, weighted 
rules, rules with exceptions, hierarchical rule bases, 
etc. 


First Accuracy Then Interpretability
The birth of more flexible fuzzy systems such as TSK
or approximate ones (allowing the FM to achieve higher
accuracy) entailed the emergence of PFM. Nevertheless,
the modeling tasks with these kinds of fuzzy systems
increasingly resembled black-box processes. Consequently,
several researchers nowadays share the idea
of rescuing the seminal intent of FM, i.e., to preserve
the good interpretability advantages offered by fuzzy
systems. This is usually attained by reducing the
complexity of the model [14.47]. Furthermore, there are
approaches aimed at improving the local description of
TSK-type fuzzy rules:


• Merging/removing fuzzy sets in precise fuzzy systems.
The interpretability of TSK-type fuzzy systems
may be improved by removing those fuzzy
sets that, after an automatic adaptation and/or
acquisition, do not contribute significantly to the
model behavior. Two aspects must be considered:

– Redundancy. It refers to the coexistence of similar
fuzzy sets representing compatible concepts.
In consequence, models become more complex
and difficult to understand (the distinguishability
constraint is not satisfied).

– Irrelevancy. It arises when fuzzy sets with
a constant membership degree equal to 1, or
close to it, are used. These kinds of fuzzy sets
do not furnish relevant information.

The use of similarity measures between fuzzy sets
has been proposed to automatically detect these
undesired fuzzy sets [14.48]. Through first
merging/removing fuzzy sets and then merging fuzzy
rules, the precise fuzzy model goes through an
interpretability improvement process that makes it
less complex (more compact) and more easily
interpretable (more transparent).

• Ordering/selecting TSK-type fuzzy rules. An efficient
way to improve the interpretability in FM is
to select a subset of significant fuzzy rules that
represent in a more compact way the system to be
modeled. Moreover, as a side effect this selection
of important rules reduces the possible redundancy
existing in the fuzzy rule base, thus improving
the generalization capability of the system, i.e.,
its accuracy. For instance, resorting to orthogonal
transformations [14.49] is one of the most successful
approaches in this sense.

• Exploiting the local description of TSK-type fuzzy
rules. TSK-type fuzzy systems are usually considered
as the combination of simple models (the rules)
that describe local behaviors of the system to be
modeled. Hence, insofar as each fuzzy rule is either
forced to have a smoother consequent polynomial
function or to develop an isolated action, the
interpretability will be improved:

– Smoothing the consequent polynomial function
[14.50]. By imposing several constraints
on the weights involved in the polynomial
function of each rule consequent,
a convex combination of the input variables is
performed. This contributes to a better understanding
of the model.

– Isolating the fuzzy rule actions [14.47]. The
description of each fuzzy rule is improved when
the overlapping between adjacent input fuzzy
sets is reduced. Note that the performance region
of a rule is more clearly defined when
other rules are prevented from having a high
firing degree in the same area.
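
The detection of redundant and irrelevant fuzzy sets by means of a similarity measure, as described in the first item of this list, can be sketched in a few lines of Python. The sketch assumes a Jaccard-type similarity computed on a sampled universe, trapezoidal membership functions, and a threshold of 0.8; these are illustrative choices made here, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership function with support [a, d] and core [b, c]."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def jaccard_similarity(mu_a: Membership, mu_b: Membership,
                       universe: Sequence[float]) -> float:
    """|A intersection B| / |A union B| with min/max, on a sampled universe."""
    intersection = sum(min(mu_a(x), mu_b(x)) for x in universe)
    union = sum(max(mu_a(x), mu_b(x)) for x in universe)
    return intersection / union if union > 0.0 else 1.0


def redundant_pairs(fuzzy_sets: Dict[str, Membership], universe: Sequence[float],
                    threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pairs of fuzzy sets similar enough to be candidates for merging."""
    return [(name_a, name_b)
            for (name_a, mu_a), (name_b, mu_b) in combinations(fuzzy_sets.items(), 2)
            if jaccard_similarity(mu_a, mu_b, universe) >= threshold]


def irrelevant_sets(fuzzy_sets: Dict[str, Membership], universe: Sequence[float],
                    threshold: float = 0.8) -> List[str]:
    """Fuzzy sets close to the universal set (membership 1 everywhere)."""
    universal = lambda x: 1.0
    return [name for name, mu in fuzzy_sets.items()
            if jaccard_similarity(mu, universal, universe) >= threshold]


# Hypothetical fuzzy sets on the universe [0, 10].
universe = [i / 10 for i in range(101)]
fuzzy_sets = {
    "low": trapezoid(0, 0, 2, 5),
    "lowish": trapezoid(0, 0, 2.2, 5.2),
    "high": trapezoid(5, 8, 10, 10),
    "any": trapezoid(-1, 0, 10, 11),
}

print(redundant_pairs(fuzzy_sets, universe))  # candidates for merging
print(irrelevant_sets(fuzzy_sets, universe))  # candidates for removal
```

In this example "low" and "lowish" are flagged as redundant and "any" as irrelevant; how flagged sets are then merged or removed, and how the rules are updated, depends on the method adopted.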


Multiobjective Design 

Since interpretability and accuracy are widely recog- 
nized as conflicting goals, the use of multiobjective 
evolutionary (MOE) strategies is becoming more and 
more popular in the quest for the best interpretability- 
accuracy trade-off [14.19,51]. Ducange and Marcel- 
loni [14.52] proposed the following taxonomy of mul- 
tiobjective evolutionary fuzzy systems: 


• MOE Tuning. Given an already defined fuzzy system,
its main parameters (typically membership
function parameters but also fuzzy inference parameters)
are refined through MOE strategies [14.53,
54].

• MOE Learning. The components of a fuzzy system
KB, both the fuzzy partitions forming the
database (DB) and the fuzzy rules forming the rule base
(RB), are automatically generated from experimental
data.

– MOE DB Learning. The most relevant variables
are identified and the optimum membership
function parameters are defined from scratch.
It usually wraps a RB heuristic-based learning
process [14.55].

– MOE RB Selection. Starting from an initial RB,
a set of nondominated RBs is generated by
selecting subsets of rules exhibiting different
trade-offs between interpretability and accuracy
[14.56]. In some works [14.35,57], MOE
RB selection and MOE tuning are carried out
together.

– MOE RB Learning. The entire set of fuzzy rules
is fully defined from scratch. In this approach,
uniformly distributed fuzzy partitions are usually
considered [14.58].

– MOE KB Learning. Simultaneous evolutionary
learning of all KB components (DB and
RB). Concurrent learning of fuzzy partitions
and fuzzy rules proved to be a powerful tool
in the quest for a good balance between
interpretability and accuracy [14.59].
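
All the MOE strategies above return a set of nondominated solutions rather than a single model. The following Python sketch shows only the dominance filter that underlies this notion, assuming each candidate rule base is summarized by two objectives to be minimized, an error value and the number of rules; the candidate values are invented for illustration and no evolutionary search is implemented.

```python
from typing import List, Sequence, Tuple

# Each candidate is (error, number_of_rules); both objectives are minimized.
Objectives = Tuple[float, int]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(candidates: List[Objectives]) -> List[Objectives]:
    """Candidates that are not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative (made-up) rule bases: accurate but large, compact but coarse, etc.
candidates = [(0.10, 20), (0.12, 12), (0.15, 12), (0.20, 5), (0.11, 25)]
print(nondominated(candidates))
```

The surviving candidates represent different interpretability-accuracy trade-offs among which the designer can choose according to user needs.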


It is worth noting that, for the sake of clarity, we
have only cited some of the most relevant papers in the
field of MOE fuzzy systems. For further details, the
interested reader is referred to [14.51,52] where a much
more exhaustive review of related works is carried out.


14.4.2 Design Decisions at Fuzzy Processing Level


Although there are studies analyzing the behavior of the
existing fuzzy operators for different purposes,
unfortunately this question has not yet been considered as
a whole from the interpretability point of view. Keeping
in mind the interpretability requirement, the implementation
of the inference engine must address the
following careful design choices:


• Select the right conjunctive operator to be used in
the antecedent of the rule. Different operators
(belonging to the t-norm family) are available to make
this choice [14.60].

• Select the operator to be used in the fuzzy implication
of IF-THEN rules. Mamdani proposed to
use the minimum operator as the t-norm for implication.
Since then, various other t-norms have
been suggested as implication operators [14.60],
for instance the algebraic product. Another important
family of implication operators is that of fuzzy
implication functions [14.61], one of the most common
being the Łukasiewicz implication. Less common
implication operators such as force-implications [14.62],
t-conorms and operators not belonging to any of the
best-known implication operator families [14.63,
64] have been considered too.

• Choose the right inference mechanism. Two main
strategies are available:

– FATI (First Aggregation Then Inference). All
antecedents of the rules are aggregated to form
a multidimensional fuzzy relation. Via the
composition principle the output fuzzy set is derived.
This strategy is preferred when dealing with
implicative rules [14.65].

– FITA (First Inference Then Aggregation). The
output of each rule is first inferred, and then all
individual fuzzy outputs are aggregated. This is
the common approach when working with the
usual conjunctive rules. This strategy has become
by far the most popular, especially in the case
of real-time applications. The choice of an output
aggregation method (in some cases this is
called the also operator) is closely related to
the considered implication operator since it has
to be related to the interpretation of the rules
(which is connected to the kind of implication).

• Choose the most suitable defuzzification interface
operation mode. Several options exist, the most
widely used being the center of area, also called
center of gravity, and the mean of maxima. Even
though most methods are based on geometrical or
statistical interpretations, there are also parametric
methods, adaptive methods including human
knowledge, and even evolutionary adaptive methods
[14.66].
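
The design choices listed above can be seen as interchangeable parameters of the inference engine. The Python sketch below is a minimal illustration of the FITA strategy under common but here assumed defaults: minimum as conjunction and as implication, maximum as output aggregation, and center of gravity computed on a sampled output universe. All names, membership functions, and the example rule base are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence, Tuple

Membership = Callable[[float], float]
# A rule pairs an antecedent (input variable -> membership function)
# with the membership function of its consequent term.
Rule = Tuple[Dict[str, Membership], Membership]
Operator = Callable[[float, float], float]


def fita_inference(rules: List[Rule], inputs: Dict[str, float],
                   y_grid: Sequence[float], t_norm: Operator = min,
                   implication: Operator = min,
                   aggregation: Operator = max) -> List[float]:
    """First Inference Then Aggregation on a sampled output universe."""
    output = [0.0] * len(y_grid)
    for antecedent, consequent in rules:
        degrees = [mu(inputs[var]) for var, mu in antecedent.items()]
        strength = reduce(t_norm, degrees)  # conjunction of the conditions
        for k, y in enumerate(y_grid):
            inferred = implication(strength, consequent(y))  # per-rule output
            output[k] = aggregation(output[k], inferred)     # 'also' operator
    return output


def centroid_defuzzify(y_grid: Sequence[float], mu: Sequence[float]) -> float:
    """Center of gravity of a sampled output fuzzy set."""
    total = sum(mu)
    if total == 0.0:
        raise ValueError("no rule fired: the output fuzzy set is empty")
    return sum(y * m for y, m in zip(y_grid, mu)) / total


# Hypothetical heater controller: temperature in [0, 20], power in [0, 100].
cold = lambda x: max(0.0, 1.0 - x / 20.0)
hot = lambda x: max(0.0, min(1.0, x / 20.0))
low_power = lambda y: max(0.0, 1.0 - y / 50.0)
high_power = lambda y: max(0.0, (y - 50.0) / 50.0)

rules = [({"temp": cold}, high_power), ({"temp": hot}, low_power)]
y_grid = [float(y) for y in range(101)]

fuzzy_output = fita_inference(rules, {"temp": 10.0}, y_grid)
print(centroid_defuzzify(y_grid, fuzzy_output))
```

Replacing `implication=min` with the algebraic product (for example `operator.mul`) or swapping the defuzzifier changes the numerical behavior without touching the rules, which is why these choices deserve the same care as the design of the KB.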


14.5 Interpretable Fuzzy Systems in the Real World 


Interpretable fuzzy systems have an immediate impact
on real-world applications. In particular, their usefulness
is appreciable in all application areas that put
humans at the center of computing. Interpretable fuzzy
systems, in fact, combine knowledge acquisition
capabilities with the ability to communicate knowledge in
a human-understandable way.

Several application areas can benefit from
the use of interpretable fuzzy systems. In the following,
some of them are briefly outlined, along with a few
notes on specific applications and potentialities.


• Environment: Environmental issues are often challenging
because of the complex dynamics, the
high number of variables and the consequent uncertainty
characterizing the behavior of subjects
under study. Computational intelligence techniques
come into play when tolerance for imprecision
can be exploited to design convenient models
that are suitable to understand phenomena and
make decisions. Interpretable fuzzy systems show
a clear advantage over black-box systems in providing
knowledge that is capable of explaining
complex and nonlinear relationships by using linguistic
models. Real-world environmental applications
of interpretable fuzzy systems include: harmful
bioaerosol detection [14.67]; modeling habitat
suitability in river management [14.68]; modeling
pesticide loss caused by meteorological factors in
agriculture [14.69], and so on.

• Finance: This is a sector where human-computer
cooperation is very tight. Cooperation is carried out
in different ways, including the use of computers
to provide business intelligence for decision support
in financial operations. In many cases financial
decisions are ultimately made by experts, who can
benefit from automated analyses of the large masses of
data flowing daily in markets. To this end, computational
intelligence approaches are spreading
among the tools used by financial experts in their
decisions, including interpretable fuzzy systems for
stock return predictions [14.70], exchange rate
forecasting [14.71], portfolio risk monitoring [14.72],
etc.

• Industry: Industrial applications could benefit
from interpretable fuzzy systems when there is
the need to explain the behavior of complex systems
and phenomena, as in fault detection [14.73].
Also, control plans for systems and processes can
be designed with the help of fuzzy systems. In such
cases, a common practice is to start with an initial
expert knowledge (used to design rules which
are usually highly interpretable) that is then tuned
to increase the accuracy of the controller. However,
any unconstrained tuning could destroy the original
interpretability of the knowledge base, whilst,
by taking into account interpretability, the possibility
of revising and modifying the controller (or the
process manager) can be enhanced [14.74].

• Medicine and Health-care: In almost
all medical contexts intelligent systems can be
invaluable decision support tools, but people are the
ultimate actors in any decision process. As a consequence,
people need to rely on intelligent systems,
whose reliability can be enhanced if their outcomes
can be explained in terms that are comprehensible
to human users. Interpretable fuzzy systems could
play a key role in this area because of the possibility
of acquiring knowledge from data and communicating
it to users. In the literature, several approaches
have been proposed to apply interpretable fuzzy
systems in different medical problems, like assisted
diagnosis [14.75], prognosis prediction [14.76],
patient subgroup discovery [14.77], etc.

• Robotics: The complexity of robot behavior modeling
can be tackled by an integrated approach where
a first modeling stage is carried out by combining
human expert and empirical knowledge acquired
from experimental trials. This integrated approach
requires that the final knowledge base is provided
to experts for further maintenance: this task could
be done effectively only if the acquired knowledge
is interpretable by the user. Some concrete applications
of this approach can be found in robot localization
systems [14.78] and motion analysis [14.79,
80].

• Society: The focus of intelligent systems on social
issues has noticeably increased in recent years. For
reasons that are common to all the previous application
areas, interpretable fuzzy systems have been
applied in a wide variety of settings, including quality
of service improvement [14.81], data mining
with privacy preservation [14.82], social network
analysis [14.37], and so on.


14.6 Future Research Trends on Interpretable Fuzzy Systems 


Research on interpretable fuzzy systems is open in sev- 
eral directions. Future trends involve both theoretical 
and methodological aspects of interpretability. In the 
following, some trends are outlined amongst the pos- 
sible lines of research development [14.7]. 


• Interpretability definition: The blurred nature of interpretability requires continuous investigations on possible definitions that enable a computable treatment of this quality in fuzzy systems. This requirement casts the research on interpretable fuzzy systems toward cross-disciplinary investigations. For instance, this research line includes investigations on computable definitions of some conceptual qualities, like vagueness (which has to be distinguished from imprecision and fuzziness). Also, the problem of interpretability of fuzzy systems can be viewed as a particular instance of the more general problem of communication between granular worlds [14.83], where many aspects of interpretability could be treated in a more abstract way.

• Interpretability assessment: A prominent objective is the adoption of a common framework for characterizing and assessing interpretability with the aim of avoiding misleading notations. Within such a framework, novel metrics could be devised, especially for assessing subjective aspects of interpretability, and integrated with objective interpretability measures to define more significant interpretability indexes.

• Design of interpretable fuzzy models: A current research trend in designing interpretable fuzzy models makes use of multiobjective genetic algorithms in order to deal with the conflicting design objectives of accuracy and interpretability. The effectiveness and usefulness of these approaches, especially those concerning advanced schemes, have to be verified against a number of indexes, including indexes that integrate subjective measures. This verification process is particularly required when tackling high-dimensional problems. In this case, the combination of linguistic and graphical approaches could be promising for descriptive and exploratory analysis of interpretable fuzzy systems.

• Representation of fuzzy systems: For very complex problems, the use of novel forms of representation (different from the classical rule-based one) may help in representing complex relationships in comprehensible ways, thus yielding a valid aid in designing interpretable fuzzy systems. For instance, a multilevel representation could enhance the interpretability of fuzzy systems by providing different granularity levels for knowledge representation. The highest granulation levels give a coarse (yet immediately comprehensible) description of knowledge, while lower levels provide more detailed knowledge.


As a final remark, it is worth observing that interpretability is one aspect of the multifaceted problem of human-centered design of fuzzy systems [14.84]. Other facets include acceptability (e.g., according to ethical rules), interestingness of fuzzy rules, applicability (e.g., with respect to law), etc. Many of them are not yet in the research mainstream but they clearly represent promising future trends.


14.7 Conclusions


Interpretability is an indispensable requirement for designing fuzzy systems, yet it cannot be assumed to hold by the simple fact of using fuzzy sets for modeling. Interpretability must be encoded in some computational methods in order to drive the design of fuzzy systems, as well as to assess the interpretability of existing models. The study of interpretability issues started about two decades ago and led to a number of theoretical and methodological results of paramount value in fuzzy modeling. Nevertheless, research is still open both in depth, through new ways of encoding and assessing interpretability, and in breadth, by integrating interpretability in the more general realm of human-centered computing.


References

14.1 L.A. Zadeh: From computing with numbers to computing with words - from manipulation of measurements to manipulation of perceptions, IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 45(1), 105-119 (1999)

14.2 L.-X. Wang, J.M. Mendel: Fuzzy basis functions, universal approximation, and orthogonal least squares learning, IEEE Trans. Neural Netw. 3, 807-814 (1992)

14.3 L.A. Zadeh: Is there a need for fuzzy logic?, Inf. Sci. 178(13), 2751-2779 (2008)

14.4 R.S. Michalski: A theory and methodology of inductive learning, Artificial Intell. 20(2), 111-161 (1983)

14.5 G.A. Miller: The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychol. Rev. 63, 81-97 (1956)

14.6 L.A. Zadeh: Fuzzy sets, Inf. Control 8, 338-353 (1965)

14.7 J.M. Alonso, L. Magdalena: Editorial: Special issue on interpretable fuzzy systems, Inf. Sci. 181(20), 4331-4339 (2011)

14.8 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning. Part I, Inf. Sci. 8, 199-249 (1975)

14.9 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning. Part II, Inf. Sci. 8, 301-357 (1975)

14.10 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning. Part III, Inf. Sci. 9, 43-80 (1975)

14.11 E.H. Mamdani: Application of fuzzy logic to approximate reasoning using linguistic synthesis, IEEE Trans. Comput. 26(12), 1182-1191 (1977)

14.12 T. Takagi, M. Sugeno: Fuzzy identification of systems and its applications to modelling and control, IEEE Trans. Syst. Man Cybern. B Cybern. 15, 116-132 (1985)

14.13 E. Hüllermeier: Fuzzy methods in machine learning and data mining: Status and prospects, Fuzzy Sets Syst. 156(3), 387-406 (2005)

14.14 E. Hüllermeier: Fuzzy sets in machine learning and data mining, Appl. Soft Comput. 11(2), 1493-1505 (2011)

14.15 S. Guillaume: Designing fuzzy inference systems from data: An interpretability-oriented review, IEEE Trans. Fuzzy Syst. 9(3), 426-443 (2001)

14.16 R. Alcalá, J. Alcalá-Fdez, J. Casillas, O. Cordón, F. Herrera: Hybrid learning models to get the interpretability-accuracy trade-off in fuzzy modeling, Soft Comput. 10(9), 717-734 (2006)

14.17 J. Casillas, O. Cordón, F. Herrera, L. Magdalena (Eds.): Accuracy Improvements in Linguistic Fuzzy Modeling, Studies in Fuzziness and Soft Computing, Vol. 129 (Springer, Berlin, Heidelberg 2003)

14.18 J. Casillas, O. Cordón, F. Herrera, L. Magdalena (Eds.): Interpretability Issues in Fuzzy Modeling, Studies in Fuzziness and Soft Computing, Vol. 128 (Springer, Berlin, Heidelberg 2003)

14.19 O. Cordón: A historical review of evolutionary learning methods for Mamdani-type fuzzy rule-based systems: Designing interpretable genetic fuzzy systems, Int. J. Approx. Reason. 52, 894-913 (2011)

14.20 F. Herrera: Genetic fuzzy systems: Taxonomy, current research trends and prospects, Evol. Intell. 1, 27-46 (2008)

14.21 U. Bodenhofer, P. Bauer: A formal model of interpretability of linguistic variables, Stud. Fuzzin. Soft Comput. 128, 524-545 (2003)

14.22 C. Mencar, A.M. Fanelli: Interpretability constraints for fuzzy information granulation, Inf. Sci. 178(24), 4585-4618 (2008)

14.23 J. de Valente Oliveira: Semantic constraints for membership function optimization, IEEE Trans. Syst. Man Cybern. A 29(1), 128-138 (1999)

14.24 W. Pedrycz, F. Gomide: An Introduction to Fuzzy Sets. Analysis and Design (MIT Press, Cambridge 1998)

14.25 T.L. Saaty, M.S. Ozdemir: Why the magic number seven plus or minus two, Math. Comput. Model. 38(3-4), 233-244 (2003)

14.26 C. Mencar, G. Castellano, A.M. Fanelli: Distinguishability quantification of fuzzy sets, Inf. Sci. 177(1), 130-149 (2007)

14.27 U. Bodenhofer, P. Bauer: Interpretability of linguistic variables: A formal account, Kybernetika 41(2), 227-248 (2005)

14.28 A. Riid, E. Rüstern: Transparent Fuzzy Systems in Modelling and Control, Studies in Fuzziness and Soft Computing, Vol. 128 (Springer, Berlin, Heidelberg 2003) pp. 452-476

14.29 S.-M. Zhou, J.Q. Gan: Low-level interpretability and high-level interpretability: A unified view of data-driven interpretable fuzzy system modelling, Fuzzy Sets Syst. 159(23), 3091-3131 (2008)

14.30 J.M. Alonso, L. Magdalena, G. González-Rodríguez: Looking for a good fuzzy system interpretability index: An experimental approach, Int. J. Approx. Reason. 51(1), 115-134 (2009)

14.31 C. Mencar, C. Castiello, R. Cannone, A.M. Fanelli: Design of fuzzy rule-based classifiers with semantic cointension, Inf. Sci. 181(20), 4361-4377 (2011)




14.32 M.J. Gacto, R. Alcalá, F. Herrera: Interpretability of linguistic fuzzy rule-based systems: An overview of interpretability measures, Inf. Sci. 181(20), 4340-4360 (2011)

14.33 H. Ishibuchi, T. Nakashima, T. Murata: Three-objective genetics-based machine learning for linguistic rule extraction, Inf. Sci. 136(1-4), 109-133 (2001)

14.34 A. Botta, B. Lazzerini, F. Marcelloni, D.C. Stefanescu: Context adaptation of fuzzy systems through a multi-objective evolutionary approach based on a novel interpretability index, Soft Comput. 13(5), 437-449 (2009)

14.35 M.J. Gacto, R. Alcalá, F. Herrera: Integration of an index to preserve the semantic interpretability in the multiobjective evolutionary rule selection and tuning of linguistic fuzzy systems, IEEE Trans. Fuzzy Syst. 18(3), 515-531 (2010)

14.36 C. Mencar, C. Castiello, R. Cannone, A.M. Fanelli: Interpretability assessment of fuzzy knowledge bases: A cointension based approach, Int. J. Approx. Reason. 52(4), 501-518 (2011)

14.37 J.M. Alonso, D.P. Pancho, O. Cordón, A. Quirin, L. Magdalena: Social network analysis of co-fired fuzzy rules. In: Soft Computing: State of the Art Theory and Novel Applications, ed. by R.R. Yager, A.M. Abbasov, M. Reformat, S.N. Shahbazova (Springer, Berlin, Heidelberg 2013) pp. 13-128

14.38 D.P. Pancho, J.M. Alonso, O. Cordón, A. Quirin, L. Magdalena: FINGRAMS: Visual representations of fuzzy rule-based inference for expert analysis of comprehensibility, IEEE Trans. Fuzzy Syst. 21(6), 1133-1149 (2013)

14.39 D.P. Pancho, J.M. Alonso, L. Magdalena: Quest for interpretability-accuracy trade-off supported by fingrams into the fuzzy modeling tool GUAJE, Int. J. Comput. Intell. Syst. 6(1), 46-60 (2013)

14.40 A. Bastian: How to handle the flexibility of linguistic variables with applications, Int. J. Uncertain. Fuzzin. and Knowl.-Based Syst. 2(4), 463-484 (1994)

14.41 O. Cordón, F. Herrera, P. Villar: Generating the knowledge base of a fuzzy rule-based system by the genetic learning of the data base, IEEE Trans. Fuzzy Syst. 9(4), 667-674 (2001)

14.42 A. Homaifar, E. McCormick: Simultaneous design of membership functions and rule sets for fuzzy controllers using genetic algorithms, IEEE Trans. Fuzzy Syst. 3(2), 129-139 (1995)

14.43 B.-D. Liu, C.-Y. Chen, J.-Y. Tsao: Design of adaptive fuzzy logic controller based on linguistic-hedge concepts and genetic algorithms, IEEE Trans. Syst. Man Cybern. B Cybern. 31(1), 32-53 (2001)

14.44 J. Espinosa, J. Vandewalle: Constructing fuzzy models with linguistic integrity from numerical data - AFRELI algorithm, IEEE Trans. Fuzzy Syst. 8(5), 591-600 (2000)


14.45 J. Casillas, O. Cordón, F. Herrera: COR: A methodology to improve ad hoc data-driven linguistic rule learning methods by inducing cooperation among rules, IEEE Trans. Syst. Man Cybern. B Cybern. 32(4), 526-537 (2002)

14.46 L.A. Zadeh: Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst. Man. Cybern. 3(1), 28-44 (1973)

14.47 A. Riid, E. Rüstern: Identification of transparent, compact, accurate and reliable linguistic fuzzy models, Inf. Sci. 181(20), 4378-4393 (2011)

14.48 M. Setnes, R. Babuška, U. Kaymak, H.R. van Nauta Lemke: Similarity measures in fuzzy rule base simplification, IEEE Trans. Syst. Man Cybern. B Cybern. 28(3), 376-386 (1998)

14.49 P.A. Mastorocostas, J.B. Theocharis, V.S. Petridis: A constrained orthogonal least-squares method for generating TSK fuzzy models: Application to short-term load forecasting, Fuzzy Sets Syst. 118(2), 215-233 (2001)

14.50 A. Fiordaliso: A constrained Takagi-Sugeno fuzzy system that allows for better interpretation and analysis, Fuzzy Sets Syst. 118(2), 307-318 (2001)

14.51 M. Fazzolari, R. Alcalá, Y. Nojima, H. Ishibuchi, F. Herrera: A review of the application of multi-objective evolutionary fuzzy systems: Current status and further directions, IEEE Trans. Fuzzy Syst. 21(1), 45-65 (2013)

14.52 P. Ducange, F. Marcelloni: Multi-objective evolutionary fuzzy systems, Lect. Notes Artif. Intell. 6857, 83-90 (2011)

14.53 J. Alcalá-Fdez, F. Herrera, F. Márquez, A. Peregrín: Increasing fuzzy rules cooperation based on evolutionary adaptive inference systems, Int. J. Intell. Syst. 22(4), 1035-1064 (2007)

14.54 P. Fazendeiro, J. De Valente Oliveira, W. Pedrycz: A multiobjective design of a patient and anaesthetist-friendly neuromuscular blockade controller, IEEE Trans. Bio-Med. Eng. 54(9), 1667 (2007)

14.55 R. Alcalá, M.J. Gacto, F. Herrera: A fast and scalable multi-objective genetic fuzzy system for linguistic fuzzy modeling in high-dimensional regression problems, IEEE Trans. Fuzzy Syst. 19(4), 666-681 (2011)

14.56 H. Ishibuchi, T. Murata, I.B. Türksen: Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems, Fuzzy Sets Syst. 89(2), 135-150 (1997)

14.57 R. Alcalá, Y. Nojima, F. Herrera, H. Ishibuchi: Multiobjective genetic fuzzy rule selection of single granularity-based fuzzy classification rules and its interaction with the lateral tuning of membership functions, Soft Comput. 15(12), 2303-2318 (2011)

14.58 J. Casillas, P. Martínez, A.D. Benítez: Learning consistent, complete and compact sets of fuzzy rules in conjunctive normal form for regression problems, Soft Comput. 13(5), 451-465 (2009)




14.59 M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni: Learning concurrently data and rule bases of Mamdani fuzzy rule-based systems by exploiting a novel interpretability index, Soft Comput. 15(10), 1981-1998 (2011)

14.60 M.M. Gupta, J. Qi: Design of fuzzy logic controllers based on generalized T-operators, Fuzzy Sets Syst. 40(3), 473-489 (1991)

14.61 E. Trillas, L. Valverde: On implication and indistinguishability in the setting of fuzzy logic. In: Management Decision Support Systems Using Fuzzy Logic and Possibility Theory, ed. by J. Kacprzyk, R.R. Yager (Verlag TÜV Rheinland, Köln 1985) pp. 198-212

14.62 C. Dujet, N. Vincent: Force implication: A new approach to human reasoning, Fuzzy Sets Syst. 69(1), 53-63 (1995)

14.63 J. Kiszka, M. Kochanska, D. Sliwinska: The influence of some fuzzy implication operators on the accuracy of a fuzzy model - Part I, Fuzzy Sets Syst. 15, 111-128 (1985)

14.64 J. Kiszka, M. Kochanska, D. Sliwinska: The influence of some fuzzy implication operators on the accuracy of a fuzzy model - Part II, Fuzzy Sets Syst. 15, 223-240 (1985)

14.65 H. Jones, B. Charnomordic, D. Dubois, S. Guillaume: Practical inference with systems of gradual implicative rules, IEEE Trans. Fuzzy Syst. 17(1), 61-78 (2009)

14.66 O. Cordon, F. Herrera, F.A. Marquez, A. Peregrin: A study on the evolutionary adaptive defuzzification methods in fuzzy modeling, Int. J. Hybrid Int. Syst. 1(1), 36-48 (2004)

14.67 P. Pulkkinen, J. Hytonen, H. Koivisto: Developing a bioaerosol detector using hybrid genetic fuzzy systems, Eng. Appl. Artif. Intell. 21(8), 1330-1346 (2008)

14.68 E. Van Broekhoven, V. Adriaenssens, B. de Baets: Interpretability-preserving genetic optimization of linguistic terms in fuzzy models for fuzzy ordered classification: An ecological case study, Int. J. Approx. Reason. 44(1), 65-90 (2007)

14.69 S. Guillaume, B. Charnomordic: Interpretable fuzzy inference systems for cooperation of expert knowledge and data in agricultural applications using FisPro, IEEE Int. Conf. Fuzzy Syst., Barcelona (2010) pp. 2019-2026

14.70 A. Kumar: Interpretability and mean-square error performance of fuzzy inference systems for data mining, Intell. Syst. Account. Finance Manag. 13(4), 185-196 (2005)

14.71 F. Cheong: A hierarchical fuzzy system with high input dimensions for forecasting foreign exchange rates, Int. J. Artif. Intell. Soft Comput. 1(1), 15-24 (2008)

14.72 A. Ghandar, Z. Michalewicz, R. Zurbruegg: Enhancing profitability through interpretability in algorithmic trading with a multiobjective evolutionary fuzzy system, Lect. Notes Comput. Sci. 7492, 42-51 (2012)

14.73 S. Altug, M.-Y. Chow, H.J. Trussell: Heuristic constraints enforcement for training of and rule extraction from a fuzzy/neural architecture. Part II: Implementation and application, IEEE Trans. Fuzzy Syst. 7(2), 151-159 (1999)

14.74 A. Riid, E. Rüstern: Interpretability of fuzzy systems and its application to process control, IEEE Int. Conf. Fuzzy Syst., London (2007) pp. 1-6

14.75 I. Gadaras, L. Mikhailov: An interpretable fuzzy rule-based classification methodology for medical diagnosis, Artif. Intell. Med. 47(1), 25-41 (2009)

14.76 J.M. Alonso, C. Castiello, M. Lucarelli, C. Mencar: Modelling interpretable fuzzy rule-based classifiers for medical decision support. In: Medical Applications of Intelligent Data Analysis: Research Advancements, ed. by R. Magdalena, E. Soria, J. Guerrero, J. Gómez-Sanchis, A.J. Serrano (IGI Global, Hershey 2012) pp. 254-271

14.77 C.J. Carmona, P. Gonzalez, M.J. del Jesus, M. Navio-Acosta, L. Jimenez-Trevino: Evolutionary fuzzy rule extraction for subgroup discovery in a psychiatric emergency department, Soft Comput. 15(12), 2435-2448 (2011)

14.78 J.M. Alonso, M. Ocaña, N. Hernandez, F. Herranz, A. Llamazares, M.A. Sotelo, L.M. Bergasa, L. Magdalena: Enhanced WiFi localization system based on soft computing techniques to deal with small-scale variations in wireless sensors, Appl. Soft Comput. 11(8), 4677-4691 (2011)

14.79 J.M. Alonso, L. Magdalena, S. Guillaume, M.A. Sotelo, L.M. Bergasa, M. Ocaña, R. Flores: Knowledge-based intelligent diagnosis of ground robot collision with non detectable obstacles, J. Int. Robot. Syst. 48(4), 539-566 (2007)

14.80 M. Mucientes, J. Casillas: Quick design of fuzzy controllers with good interpretability in mobile robotics, IEEE Trans. Fuzzy Syst. 15(4), 636-651 (2007)

14.81 F. Barrientos, G. Sainz: Interpretable knowledge extraction from emergency call data based on fuzzy unsupervised decision tree, Knowl.-Based Syst. 25(1), 77-87 (2011)

14.82 L. Troiano, L.J. Rodriguez-Muniz, J. Ranilla, I. Diaz: Interpretability of fuzzy association rules as means of discovering threats to privacy, Int. J. Comput. Math. 89(3), 325-333 (2012)

14.83 A. Bargiela, W. Pedrycz: Granular Computing: An Introduction (Kluwer Academic Publishers, Boston, Dordrecht, London 2003)

14.84 A. Bargiela, W. Pedrycz: Human-Centric Information Processing Through Granular Modelling, Studies in Computational Intelligence, Vol. 182 (Springer, Berlin, Heidelberg 2009)




15. Fuzzy Clustering — Basic Ideas and Overview 


Sadaaki Miyamoto 


This chapter overviews basic formulations as well as recent studies in fuzzy clustering. A major part is devoted to the discussion of fuzzy c-means and its variations. Recent topics such as kernel-based fuzzy c-means and clustering with semi-supervision are mentioned. Moreover, fuzzy hierarchical clustering is overviewed and a fundamental theorem is given.


15.1 Fuzzy Clustering
15.2 Fuzzy c-Means
    15.2.1 Notations
    15.2.2 Fuzzy c-Means Algorithm
    15.2.3 A Natural Classifier
    15.2.4 Variations of Fuzzy c-Means
    15.2.5 Possibilistic Clustering
    15.2.6 Kernel-Based Fuzzy c-Means
    15.2.7 Clustering with Semi-Supervision
15.3 Hierarchical Fuzzy Clustering
15.4 Conclusion
References


15.1 Fuzzy Clustering


Data clustering is an old subject [15.1-4], but recently more researchers are developing different techniques and the application fields are enlarging. Fuzzy clustering [15.5-10] is also popular in a variety of fuzzy systems. This chapter reviews basic ideas of fuzzy clustering and provides a brief overview of recent studies.

First, we consider the most popular method in fuzzy clustering, i.e., fuzzy c-means. There are many variations, extensions, and applications of fuzzy c-means, some of which are described here. Recent studies on kernel-based methods and clustering with semi-supervision are also discussed in relation to fuzzy c-means.

Moreover, another fuzzy clustering method is briefly mentioned, which uses the transitive closure of fuzzy relations [15.10]. This method is shown to be equivalent to the well-known single linkage method of agglomerative hierarchical clustering [15.11].


15.2 Fuzzy c-Means


We begin with basic notations and then introduce the method of fuzzy c-means by Dunn [15.5,6] and Bezdek [15.7,8].


15.2.1 Notations


The set of objects for clustering is denoted by $X = \{x_1, \ldots, x_N\}$, where each object is a point of the p-dimensional Euclidean space $\mathbb{R}^p$: $x_k = (x_k^1, \ldots, x_k^p)$, $k = 1, \ldots, N$. Clusters are denoted either by $G_i$ or simply by $i$ when no confusion arises. Clustering uses a similarity or dissimilarity measure. In this section, a dissimilarity measure denoted by $D(x, y)$, $x, y \in \mathbb{R}^p$, is used.

Although we have different choices for the dissimilarity measure, a standard measure is the squared Euclidean distance

$$D(x, y) = \|x - y\|^2 = \sum_{j=1}^{p} (x^j - y^j)^2 . \tag{15.1}$$

In fuzzy c-means and related methods, the number of clusters, denoted by $c$, is assumed to be given beforehand. The membership of object $x_k$ to cluster $i$ is assumed to be given by $u_{ki}$. Moreover, the collection of all memberships is denoted by the matrix $U = (u_{ki})$. It is natural to assume that $u_{ki} \in [0,1]$ for all $1 \le i \le c$ and $1 \le k \le N$, and, moreover, $\sum_{i=1}^{c} u_{ki} = 1$ for all $1 \le k \le N$.

The method of fuzzy c-means also uses a center for a cluster, which is denoted by $v_i = (v_i^1, \ldots, v_i^p) \in \mathbb{R}^p$ for cluster $i$. For ease of reference, all cluster centers are summarized into $V = (v_1, \ldots, v_c)$.


Basic K-Means Algorithm
Many studies of clustering treat K-means [15.12] as a standard method.


Algorithm 15.1 KM: Basic K-means algorithm

KM0: Generate randomly c cluster centers.
KM1: Allocate each object $x_k$ ($k = 1, \ldots, N$) to the cluster of the nearest center.
KM2: Calculate new cluster centers as the centroids (the centers of gravity). If all cluster centers are convergent, stop. Otherwise, go to KM1.
End KM.

Note that the centroid of a cluster $G_i$ is given by $v_i = \frac{1}{|G_i|} \sum_{x_k \in G_i} x_k$, where $|G_i|$ is the number of objects in $G_i$.
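The steps KM0 to KM2 can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the function names `sq_dist` and `k_means` are hypothetical, the initial centers are passed in by the caller instead of being generated randomly (so that the run is reproducible), convergence is taken to mean that the centers no longer change, and an empty cluster simply keeps its old center.

```python
def sq_dist(x, y):
    """Squared Euclidean distance D(x, y) of (15.1)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(points, centers, max_iter=100):
    """Steps KM1 and KM2, starting from caller-supplied centers (KM0)."""
    centers = [tuple(v) for v in centers]
    for _ in range(max_iter):
        # KM1: allocate each object to the cluster of the nearest center.
        labels = [min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
                  for x in points]
        # KM2: recompute each center as the centroid of its cluster.
        new_centers = []
        for i, old in enumerate(centers):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if not members:  # assumption: an empty cluster keeps its center
                new_centers.append(old)
                continue
            new_centers.append(tuple(sum(x[j] for x in members) / len(members)
                                     for j in range(len(old))))
        if new_centers == centers:  # centers no longer move: converged
            break
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = k_means(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

For this toy data set the two groups of three points are separated after the first allocation, and the centers settle at the two centroids.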


15.2.2 Fuzzy c-Means Algorithm 


It should first be noted that the basic idea of fuzzy c-means is an alternative optimization of an objective function proposed by Dunn [15.5,6] and Bezdek [15.7,8]

$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D(x_k, v_i) \quad (m > 1), \tag{15.2}$$

where $D(x_k, v_i)$ is the squared Euclidean distance (15.1).

Using this objective function, the following alternative optimization is carried out.

Algorithm 15.2 FCM: Fuzzy c-means algorithm

FCM0: Generate randomly initial fuzzy clusters. Let the solutions be $(\bar{U}, \bar{V})$.
FCM1: Minimize $J(U, \bar{V})$ with respect to $U$. Let the optimal solution be a new $\bar{U}$.
FCM2: Minimize $J(\bar{U}, V)$ with respect to $V$. Let the optimal solution be a new $\bar{V}$.
FCM3: If the solution $(\bar{U}, \bar{V})$ is convergent, stop. Else go to FCM1.
End FCM.


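Note that the optimization with respect to $U$ is subject to the constraint

$$u_{ki} \in [0,1], \quad \forall\, 1 \le i \le c,\ 1 \le k \le N, \qquad \sum_{j=1}^{c} u_{kj} = 1, \quad \forall\, 1 \le k \le N, \tag{15.3}$$

while the optimization with respect to $V$ is without any constraint.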
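It is not difficult to have the optimal solutions as follows

$$\bar{u}_{ki} = \frac{\left( \dfrac{1}{D(x_k, \bar{v}_i)} \right)^{\frac{1}{m-1}}}{\displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{D(x_k, \bar{v}_j)} \right)^{\frac{1}{m-1}}}, \tag{15.4}$$

$$\bar{v}_i = \frac{\sum_{k=1}^{N} (\bar{u}_{ki})^m x_k}{\sum_{k=1}^{N} (\bar{u}_{ki})^m}. \tag{15.5}$$

The derivations are omitted; the readers should refer to [15.8] or other textbooks.

Note also that (15.4) appears ill-defined when $x_k = \bar{v}_i$. In such a case, we use

$$\bar{u}_{ki} = \frac{1}{1 + \displaystyle\sum_{j \ne i} \left( \dfrac{D(x_k, \bar{v}_i)}{D(x_k, \bar{v}_j)} \right)^{\frac{1}{m-1}}}, \tag{15.6}$$

which has the same value as (15.4) without a singular point.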


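Moreover, we write these equations without the use of bars like

$$u_{ki} = \frac{\left( \dfrac{1}{D(x_k, v_i)} \right)^{\frac{1}{m-1}}}{\displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{D(x_k, v_j)} \right)^{\frac{1}{m-1}}}, \qquad v_i = \frac{\sum_{k=1}^{N} (u_{ki})^m x_k}{\sum_{k=1}^{N} (u_{ki})^m},$$

for simplicity and without any confusion.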
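The alternation between the membership update and the center update can be sketched as follows. This is a minimal plain-Python sketch, not a reference implementation: the names `update_memberships`, `update_centers`, and `fuzzy_c_means` are hypothetical, the iteration starts from caller-supplied centers rather than from random fuzzy clusters, the stopping rule (largest squared center shift below `tol`) is an assumed convergence test, and an object that coincides with one or more centers is assumed to share its membership equally among those centers.

```python
def sq_dist(x, y):
    """Squared Euclidean distance D(x, y) of (15.1)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def update_memberships(points, centers, m):
    """FCM1: optimal U for fixed V, written in the form of (15.6)."""
    e = 1.0 / (m - 1.0)
    U = []
    for x in points:
        d = [sq_dist(x, v) for v in centers]
        if 0.0 in d:
            # x coincides with a center (assumed handling, see lead-in).
            hits = [dj == 0.0 for dj in d]
            U.append([1.0 / sum(hits) if h else 0.0 for h in hits])
        else:
            U.append([1.0 / sum((di / dj) ** e for dj in d) for di in d])
    return U


def update_centers(points, U, m, c):
    """FCM2: optimal V for fixed U, the weighted means of (15.5)."""
    dim = len(points[0])
    centers = []
    for i in range(c):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append(tuple(sum(wk * x[j] for wk, x in zip(w, points)) / total
                             for j in range(dim)))
    return centers


def fuzzy_c_means(points, centers, m=2.0, tol=1e-9, max_iter=200):
    centers = [tuple(v) for v in centers]
    for _ in range(max_iter):
        U = update_memberships(points, centers, m)
        new_centers = update_centers(points, U, m, len(centers))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # FCM3 (assumed test): centers have stopped moving
            break
    # Report memberships that match the final centers.
    return update_memberships(points, centers, m), centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
U, V = fuzzy_c_means(points, centers=[(1.0, 1.0), (8.0, 8.0)])
```

Each row of `U` sums to one, as required by (15.3), and the objects in each group receive a membership close to one in their own cluster because the two groups are far apart.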


15.2.3 A Natural Classifier 


These solutions lead us to the following natural fuzzy classifier with a given set of cluster centers $V$

$$U_i(x; V) = \frac{\left( \dfrac{1}{D(x, v_i)} \right)^{\frac{1}{m-1}}}{\displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{D(x, v_j)} \right)^{\frac{1}{m-1}}}. \tag{15.7}$$

There is nothing strange in (15.7), since $U_i(x; V)$ has been derived from $u_{ki}$ simply by replacing the object $x_k$ by the variable $x$.

This replacement appears rather trivial, and it also appears that $U_i(x; V)$ carries no further information than $u_{ki}$. On the contrary, this function is important if we wish to observe theoretical properties of fuzzy c-means.

The following propositions are not difficult to prove 
and hence the proofs are omitted [15.13]. In particular, 
the first proposition is trivial. 


Proposition 15.1
$U_i(x_k; V) = u_{ki}$, i.e., the fuzzy classifier interpolates the membership value $u_{ki}$.


Proposition 15.2
When $\|x\|$ goes to infinity, $U_i(x; V)$, $i = 1, \ldots, c$, approaches the same value of $1/c$

$$\lim_{\|x\| \to \infty} U_i(x; V) = \frac{1}{c}.$$


Proposition 15.3
The maximum value of $U_i(x; V)$, $i = 1, \ldots, c$, is at $x = v_i$

$$\max_{x \in \mathbb{R}^p} U_i(x; V) = U_i(v_i; V) = 1.$$


The significance of the function $U_i(x; V)$ is shown in these propositions. An object $x_k$ is a fixed point, while $x$ is a variable. Without such a variable, we cannot observe theoretical properties of fuzzy c-means.
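Propositions 15.2 and 15.3 can be checked numerically by evaluating the classifier at a center and at a point far from all centers. The sketch below is illustrative only: `fuzzy_classifier` is a hypothetical name, the three centers are made-up values assumed to be distinct, and the function is evaluated in the singularity-free form of (15.6) so that it is defined at the centers themselves.

```python
def sq_dist(x, y):
    """Squared Euclidean distance D(x, y) of (15.1)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def fuzzy_classifier(x, centers, m=2.0):
    """Return [U_1(x; V), ..., U_c(x; V)] of (15.7) for an arbitrary x."""
    d = [sq_dist(x, v) for v in centers]
    if 0.0 in d:
        # x sits exactly on a center; with distinct centers this gives
        # membership 1 for that center and 0 for the others.
        hits = [dj == 0.0 for dj in d]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    e = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** e for dj in d) for di in d]


V = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]        # illustrative centers, c = 3
at_center = fuzzy_classifier((4.0, 0.0), V)      # Proposition 15.3
far_away = fuzzy_classifier((1.0e6, -2.0e6), V)  # Proposition 15.2
```

At the second center the classifier returns membership one for that cluster and zero for the others, while far from all centers every membership is close to 1/3.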


15.2.4 Variations of Fuzzy c-Means 


Many variations of fuzzy c-means have been studied, 
among which we first mention fuzzy c-varieties [15.8], 
fuzzy c-regressions [15.14], and the method of 
Gustafson and Kessel [15.15] to take clusterwise co- 
variance into account. Note that these are relatively old 
variations and they all are based on variations of objec- 
tive functions including the change of D(x, v). 

In this section, we use the additional symbol $\langle x, y \rangle = x^T y = y^T x$, which is the standard scalar product of the Euclidean space $\mathbb{R}^p$. Moreover, we introduce

$$D(x, v; S) = (x - v)^T S^{-1} (x - v),$$

which is the squared Mahalanobis distance.


Fuzzy c-Varieties 
Let us first consider a q-dimensional subspace

$$\mathrm{span}\{s_1, \ldots, s_q\} = \{a_1 s_1 + \cdots + a_q s_q : -\infty < a_k < +\infty,\ k = 1, \ldots, q\},$$

where $s_1, \ldots, s_q$ is a set of orthonormal vectors with $q < p$. $s_0$ is a given vector of $\mathbb{R}^p$. A linear variety $L$ in $\mathbb{R}^p$ is represented by

$$L = \{l = s_0 + a_1 s_1 + \cdots + a_q s_q : -\infty < a_k < +\infty,\ k = 1, \ldots, q\}.$$

Let

$$P(x, l) = \arg\max_{l \in L} \langle x - s_0, l - s_0 \rangle$$

be the projection of $x$ onto $L$. We then define

$$D(x_k, l_i) = \|x_k - s_0\|^2 - P(x_k, l_i)^2 .$$

We consider the objective function for fuzzy c-varieties

$$J(U, L) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D(x_k, l_i) \quad (m > 1), \tag{15.8}$$

where $L = (l_1, \ldots, l_c)$.

The derivation of the solutions is omitted here, but the solutions are as follows:

$$u_{ki} = \frac{\left( \dfrac{1}{D(x_k, l_i)} \right)^{\frac{1}{m-1}}}{\displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{D(x_k, l_j)} \right)^{\frac{1}{m-1}}}, \tag{15.9}$$

$$s_0^{(i)} = \frac{\sum_{k=1}^{N} (u_{ki})^m x_k}{\sum_{k=1}^{N} (u_{ki})^m}, \tag{15.10}$$

while $s_j^{(i)}$ ($j = 1, \ldots, q$) are the normalized eigenvectors corresponding to the $q$ maximum eigenvalues of the matrix

$$A_i = \sum_{k=1}^{N} (u_{ki})^m \left( x_k - s_0^{(i)} \right) \left( x_k - s_0^{(i)} \right)^T . \tag{15.11}$$

Note that the superscript $(i)$ shows those vectors for cluster $i$.

Therefore, alternative optimization of (15.8) is done by calculating (15.9), (15.10), and the eigenvectors for (15.11) repeatedly until convergence.
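As a small illustration of the dissimilarity used for fuzzy c-varieties, the sketch below computes the squared distance from a point to a linear variety. It rests on an assumption that is not spelled out in this form in the text: for an orthonormal basis $s_1, \ldots, s_q$, the squared length of the projection of $x - s_0$ onto the subspace is taken to be $\sum_{j=1}^{q} \langle x - s_0, s_j \rangle^2$ (the Pythagorean identity). The names `dot` and `variety_sq_dist` and the numerical values are hypothetical.

```python
def dot(a, b):
    """Standard scalar product <a, b>."""
    return sum(p * q for p, q in zip(a, b))


def variety_sq_dist(x, s0, basis):
    """Squared distance from x to the variety s0 + span(basis).

    Assumes the vectors in `basis` are orthonormal, so that the squared
    projection length is the sum of squared scalar products.
    """
    r = [a - b for a, b in zip(x, s0)]
    return dot(r, r) - sum(dot(r, s) ** 2 for s in basis)


# A line (q = 1) through s0 = (1, 1) with direction (1, 0) in the plane (p = 2).
d_line = variety_sq_dist((3.0, 4.0), s0=(1.0, 1.0), basis=[(1.0, 0.0)])

# With an empty basis (q = 0) the variety reduces to the point s0, and the
# dissimilarity is the squared Euclidean distance of (15.1).
d_point = variety_sq_dist((3.0, 4.0), s0=(1.0, 1.0), basis=[])
```

Here the point (3, 4) lies three units above the horizontal line through (1, 1), so the dissimilarity to the line is 9, whereas the dissimilarity to the single point (1, 1) is 13.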



Fuzzy c-Regression Models 
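In this section, we assume that $x = (x^1, \ldots, x^p)$ is a p-dimensional independent variable, while $y$ is a scalar-valued dependent variable. Hence the data set $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ is handled. We consider c regression models

$$y = \sum_{j=1}^{p} \beta_i^j x^j + \beta_i^{p+1}, \quad i = 1, \ldots, c.$$

Hence, the squared error is taken to be the dissimilarity

$$D((x_k, y_k), B_i) = \left( y_k - \sum_{j=1}^{p} \beta_i^j x_k^j - \beta_i^{p+1} \right)^2,$$

and an objective function

$$J(U, B) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D((x_k, y_k), B_i) \quad (m > 1), \tag{15.12}$$

is considered, where $B_i = (\beta_i^1, \ldots, \beta_i^{p+1})^T$ and $B = (B_1, \ldots, B_c)$. To express the solutions in a compact manner, we introduce two vectors

$$z_k = (x_k^1, \ldots, x_k^p, 1)^T, \qquad B_i = (\beta_i^1, \ldots, \beta_i^{p+1})^T .$$

Then we have

$$u_{ki} = \frac{\left( \dfrac{1}{D((x_k, y_k), B_i)} \right)^{\frac{1}{m-1}}}{\displaystyle\sum_{j=1}^{c} \left( \dfrac{1}{D((x_k, y_k), B_j)} \right)^{\frac{1}{m-1}}}, \tag{15.13}$$

$$B_i = \left( \sum_{k=1}^{N} (u_{ki})^m z_k z_k^T \right)^{-1} \sum_{k=1}^{N} (u_{ki})^m y_k z_k . \tag{15.14}$$

Thus the alternative optimization of $J(U, B)$ is to calculate (15.13) and (15.14) iteratively until convergence.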
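The update (15.14) is a weighted least-squares fit with weights $(u_{ki})^m$. For a single independent variable ($p = 1$) the matrix to be inverted is only 2 x 2, so the solution can be written out by hand. The sketch below covers just this special case; the name `weighted_line_fit` and the toy data are hypothetical, and a full implementation would alternate this step with (15.13) for every cluster.

```python
def weighted_line_fit(xs, ys, weights):
    """Solve (15.14) for p = 1 and return (slope, intercept) of one model.

    With z_k = (x_k, 1)^T the normal equations form the 2x2 system
        [sxx sx] [slope    ]   [sxy]
        [sx  s1] [intercept] = [sy ],
    where every sum is weighted by weights[k] = (u_ki)^m.
    """
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sx = sum(w * x for w, x in zip(weights, xs))
    s1 = sum(weights)
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    sy = sum(w * y for w, y in zip(weights, ys))
    det = sxx * s1 - sx * sx  # must be nonzero for a unique solution
    slope = (s1 * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept


# Toy data: three points on y = 2x + 1 followed by three points on y = -x + 5.
xs = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0, 5.0, 4.0, 3.0]
m = 2.0
u_first = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # memberships in the first model
slope, intercept = weighted_line_fit(xs, ys, [u ** m for u in u_first])
```

With memberships concentrated on the first three objects, the fit recovers the first line (slope 2, intercept 1) and ignores the objects of the other model.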


The Method of Gustafson and Kessel 
The method of Gustafson and Kessel enables us to incorporate clusterwise covariance variables denoted by $S_1, \ldots, S_c$. We consider

$$J(U, V, S) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D(x_k, v_i; S_i) \quad (m > 1), \tag{15.15}$$

where a simplified symbol $S = (S_1, \ldots, S_c)$ and the clusterwise squared Mahalanobis distance $D(x_k, v_i; S_i)$ are used. Note also that $S_i$ is subject to the constraint

$$|S_i| = \rho_i \quad (\rho_i > 0), \tag{15.16}$$

where $\rho_i$ is a fixed parameter and $|S_i|$ is the determinant of $S_i$. We assume, for simplicity, $\rho_i = 1$ [15.16].

The solutions are as follows

$$u_{ki} = \left[ \sum_{j=1}^{c} \left( \frac{D(x_k, v_i; S_i)}{D(x_k, v_j; S_j)} \right)^{\frac{1}{m-1}} \right]^{-1}, \tag{15.17}$$

$$v_i = \frac{\sum_{k=1}^{N} (u_{ki})^m x_k}{\sum_{k=1}^{N} (u_{ki})^m}, \tag{15.18}$$

$$S_i = \frac{1}{|\hat{S}_i|^{\frac{1}{p}}} \sum_{k=1}^{N} (u_{ki})^m (x_k - v_i)(x_k - v_i)^T, \tag{15.19}$$

where $\hat{S}_i = \sum_{k=1}^{N} (u_{ki})^m (x_k - v_i)(x_k - v_i)^T$.

Since three types of variables are used for the method of Gustafson and Kessel, the alternative optimization iteratively calculates (15.17)-(15.19) until convergence.
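The effect of the determinant constraint (15.16) can be seen in a small two-dimensional sketch of (15.19) with $\rho_i = 1$. The sketch is restricted to $p = 2$ so that the determinant and the inverse can be written in closed form; the function names `gk_covariance_2d` and `mahalanobis_sq_2d` and the four sample points are hypothetical.

```python
def gk_covariance_2d(points, u_col, center, m=2.0):
    """S_i of (15.19) for p = 2 and rho_i = 1, as a 2x2 nested list."""
    a = b = d = 0.0  # entries of the symmetric matrix S_hat_i = [[a, b], [b, d]]
    for (x1, x2), u in zip(points, u_col):
        w = u ** m
        e1, e2 = x1 - center[0], x2 - center[1]
        a += w * e1 * e1
        b += w * e1 * e2
        d += w * e2 * e2
    det = a * d - b * b  # |S_hat_i|, assumed positive
    scale = det ** 0.5   # |S_hat_i|^(1/p) with p = 2
    return [[a / scale, b / scale], [b / scale, d / scale]]


def mahalanobis_sq_2d(x, v, S):
    """D(x, v; S) = (x - v)^T S^{-1} (x - v) for a symmetric 2x2 matrix S."""
    (a, b), (_, d) = S
    det = a * d - b * b
    e1, e2 = x[0] - v[0], x[1] - v[1]
    return (d * e1 * e1 - 2.0 * b * e1 * e2 + a * e2 * e2) / det


# An elongated cluster around the origin: wide along the first axis.
points = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
S = gk_covariance_2d(points, u_col=[1.0, 1.0, 1.0, 1.0], center=(0.0, 0.0))
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
d_long = mahalanobis_sq_2d((2.0, 0.0), (0.0, 0.0), S)
d_short = mahalanobis_sq_2d((0.0, 1.0), (0.0, 0.0), S)
```

The normalized matrix has determinant one, and the points on the long and the short axis of the cluster obtain the same Mahalanobis dissimilarity, which is how the method adapts to elongated clusters.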


15.2.5 Possibilistic Clustering 


Possibilistic clustering [15.17,18], proposed by Krishnapuram and Keller, does not use the constraint (15.3) in the alternative optimization algorithm FCM. Rather, the optimization with respect to $U$ is without any constraint. Since handling $\arg\min_U J(U, V)$ of (15.2) without a constraint leads to the trivial solution $U = O$ (the zero matrix), they proposed a modified objective function

$$J_{\mathrm{pos}}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D(x_k, v_i) + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{N} (1 - u_{ki})^m \quad (m > 1), \tag{15.20}$$

where $\eta_i$ ($i = 1, \ldots, c$) is a positive constant.

We easily have the optimal $U$

$$u_{ki} = \frac{1}{1 + \left( \dfrac{D(x_k, v_i)}{\eta_i} \right)^{\frac{1}{m-1}}}, \tag{15.21}$$

while the optimal $V$ is given by (15.5).


Fuzzy Clustering — Basic Ideas and Overview | 15.2 Fuzzy c-Means 


The natural fuzzy classifier derived from the possibilistic clustering is

\[
U_i(x; v_i) = \frac{1}{1 + \left( \dfrac{D(x, v_i)}{\eta_i} \right)^{\frac{1}{m-1}}}.
\tag{15.22}
\]

This classifier has the following properties:

Proposition 15.4
$U_i(x_k; v_i) = u_{ki}$ by (15.21), i.e., the possibilistic classifier interpolates the membership value $u_{ki}$.

Proposition 15.5
When $\|x\|$ goes to infinity, $U_i(x; v_i)$ ($i = 1, \dots, c$) approaches zero:

\[
\lim_{\|x\| \to \infty} U_i(x; v_i) = 0 .
\]

Proposition 15.6
The maximum value of $U_i(x; v_i)$ ($i = 1, \dots, c$) is attained at $x = v_i$:

\[
\max_{x \in \mathbb{R}^p} U_i(x; v_i) = U_i(v_i; v_i) = 1 .
\]
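The classifier (15.22) and Propositions 15.5 and 15.6 are easy to check numerically. The sketch below assumes that D is the squared Euclidean distance (an assumption made here for illustration; the function names are hypothetical as well).

```python
def squared_euclidean(x, v):
    """Assumed dissimilarity D(x, v) = ||x - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def possibilistic_membership(dist_sq, eta, m=2.0):
    """Membership 1 / (1 + (D / eta)^(1/(m-1))), as in (15.21) and (15.22)."""
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    return 1.0 / (1.0 + (dist_sq / eta) ** (1.0 / (m - 1.0)))


def possibilistic_classifier(x, center, eta, m=2.0):
    """U_i(x; v_i) for one cluster with center v_i and constant eta_i."""
    return possibilistic_membership(squared_euclidean(x, center), eta, m)
```

Evaluating `possibilistic_classifier` at the center returns the maximum value 1, and the value decays toward zero as the point moves away. Unlike the fuzzy c-means memberships, the values for different clusters are computed independently and need not sum to 1.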


15.2.6 Kernel-Based Fuzzy c-Means 


Support vector machines [15.19, 20] are now among the most popular methods of supervised classification. Since positive definite kernels are frequently used in support vector machines, kernels have also been studied by many researchers (e.g., [15.21]). Positive definite kernels can also be used for fuzzy c-means, as we see in this section.

The reason why we use kernels for clustering is that the K-means and fuzzy c-means essentially have linear boundaries between clusters.

Note that the K-means classifier uses the nearest center allocation rule for a given $x \in \mathbb{R}^p$

\[
x \to G_i \iff i = \arg\min_{1 \le j \le c} D(x, v_j),
\]

and hence the rule generates the Voronoi regions [15.22] with the centers $v_1, \dots, v_c$, which have piecewise linear boundaries.

For fuzzy c-means, the classifiers are fuzzy, but if we simplify the rules by crisp reallocation [15.13],

\[
x \to G_i \iff i = \arg\max_{1 \le j \le c} U_j(x; V),
\]

then we again have the Voronoi regions with the centers $v_1, \dots, v_c$.

The introduction of the covariance variables enables 
the cluster boundaries to be quadratic, but more flexible 
nonlinear boundaries cannot be obtained. 

In order to have clusters with nonlinear boundaries, we can use positive-definite kernels. Kernels are introduced by using a high-dimensional mapping $\Phi: \mathbb{R}^p \to H$, where $H$ is a Hilbert space with the inner product $\langle \cdot, \cdot \rangle_H$ and the norm $\| \cdot \|_H$.

Given objects $x_1, \dots, x_N$, we consider their images under the mapping $\Phi$: $\Phi(x_1), \dots, \Phi(x_N)$. Note that the method of kernels does not assume that an explicit form of $\Phi(x_1), \dots, \Phi(x_N)$ is known, but their inner product $\langle \Phi(x_i), \Phi(x_j) \rangle_H$ is assumed to be known. Specifically, a positive-definite function $K(x, y)$ is given and we assume

\[
K(x, y) = \langle \Phi(x), \Phi(y) \rangle_H .
\]

This assumption seems abstract, but if we are given an actual kernel function, the method becomes simple. For example, a well-known kernel is the Gaussian kernel

\[
K(x, y) = \exp(-C \|x - y\|^2) .
\]

Then what we handle is $\langle \Phi(x), \Phi(y) \rangle_H = \exp(-C \|x - y\|^2)$.

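We now proceed to consider kernel-based fuzzy c-means [15.23]. The objective function uses $\Phi(x_1), \dots, \Phi(x_N)$ and cluster centers $w_1, \dots, w_c$ of $H$

\[
J(U, W) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m \|\Phi(x_k) - w_i\|_H^2 \quad (m > 1),
\tag{15.23}
\]

where $W = (w_1, \dots, w_c)$. We have

\[
u_{ki} = \frac{\left( \dfrac{1}{\|\Phi(x_k) - w_i\|_H^2} \right)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \dfrac{1}{\|\Phi(x_k) - w_j\|_H^2} \right)^{\frac{1}{m-1}}},
\tag{15.24}
\]

\[
w_i = \frac{\sum_{k=1}^{N} (u_{ki})^m \Phi(x_k)}{\sum_{k=1}^{N} (u_{ki})^m} .
\tag{15.25}
\]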

244 Part B 


Fuzzy Logic 




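Note, however, that the explicit form of $\Phi(x_k)$ and hence $w_i$ is not available. Therefore, we eliminate $w_i$ from the iterative calculation. Thus, the update of $w_i$ is replaced by the update of

\[
D_H(x_k, w_i) = \|\Phi(x_k) - w_i\|_H^2 .
\]

We then have

\[
D_H(x_k, w_i) = K(x_k, x_k)
- \frac{2}{\sum_{j=1}^{N} (u_{ji})^m} \sum_{j=1}^{N} (u_{ji})^m K(x_j, x_k)
+ \frac{1}{\left( \sum_{j=1}^{N} (u_{ji})^m \right)^2} \sum_{j=1}^{N} \sum_{\ell=1}^{N} (u_{ji} u_{\ell i})^m K(x_j, x_\ell) .
\tag{15.26}
\]

Using (15.26), we calculate

\[
u_{ki} = \frac{\left( \dfrac{1}{D_H(x_k, w_i)} \right)^{\frac{1}{m-1}}}{\sum_{j=1}^{c} \left( \dfrac{1}{D_H(x_k, w_j)} \right)^{\frac{1}{m-1}}} .
\tag{15.27}
\]

Thus the alternative optimization repeats (15.26) and (15.27) until convergence.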
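The alternation between (15.26) and (15.27) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the published algorithm: the Gaussian kernel is used, the initial membership matrix `U0` is supplied by the caller with positive entries in every column, and the stopping rule (largest membership change below `tol`) and all function names are choices made for this sketch.

```python
import math


def gaussian_kernel(x, y, C=1.0):
    """K(x, y) = exp(-C * ||x - y||^2)."""
    return math.exp(-C * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distances(K, U, m):
    """D[k][i] = ||Phi(x_k) - w_i||^2 computed from the Gram matrix K, eq. (15.26)."""
    n, c = len(U), len(U[0])
    D = [[0.0] * c for _ in range(n)]
    for i in range(c):
        w = [U[j][i] ** m for j in range(n)]      # weights (u_ji)^m
        total = sum(w)
        # Third term of (15.26); it does not depend on k.
        quad = sum(w[j] * w[l] * K[j][l]
                   for j in range(n) for l in range(n)) / total ** 2
        for k in range(n):
            cross = sum(w[j] * K[j][k] for j in range(n)) / total
            D[k][i] = max(K[k][k] - 2.0 * cross + quad, 0.0)  # clip rounding noise
    return D


def update_memberships(D, m, eps=1e-12):
    """u_ki proportional to (1 / D_ki)^(1/(m-1)), normalized over clusters, eq. (15.27)."""
    U = []
    for row in D:
        inv = [(1.0 / max(d, eps)) ** (1.0 / (m - 1.0)) for d in row]
        s = sum(inv)
        U.append([v / s for v in inv])
    return U


def kernel_fcm(X, U0, m=2.0, C=1.0, max_iter=100, tol=1e-6):
    """Alternate (15.26) and (15.27) until the memberships stop changing."""
    K = [[gaussian_kernel(a, b, C) for b in X] for a in X]
    U = U0
    for _ in range(max_iter):
        U_new = update_memberships(kernel_distances(K, U, m), m)
        change = max(abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change < tol:
            break
    return U
```

Only the Gram matrix K is ever touched, which is the point of eliminating the centers: the mapping into the Hilbert space never has to be evaluated. The cost per iteration is quadratic in the number of objects for each cluster.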

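Fuzzy classifiers of kernel-based fuzzy c-means can also be derived [15.13]. We omit the details and show the function in the following:

\[
D(x; v_i) = K(x, x)
- \frac{2}{\sum_{j=1}^{N} (u_{ji})^m} \sum_{j=1}^{N} (u_{ji})^m K(x, x_j)
+ \frac{1}{\left( \sum_{j=1}^{N} (u_{ji})^m \right)^2} \sum_{j=1}^{N} \sum_{\ell=1}^{N} (u_{ji} u_{\ell i})^m K(x_j, x_\ell),
\tag{15.28}
\]

\[
U_i(x) = \left[ \sum_{j=1}^{c} \left( \frac{D(x; v_i)}{D(x; v_j)} \right)^{\frac{1}{m-1}} \right]^{-1} .
\tag{15.29}
\]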


A Simple Numerical Example 
A well-known and simple example to see how the kernel method works to produce clusters with nonlinear boundaries is given in Fig. 15.1. There is a circular cluster inside another group of objects of a ring shape. We call it the ring around circle data.

Figure 15.2 shows the shape of a fuzzy classifier 
(15.29) with m = 2 and c = 2 obtained from the ring 
around circle data. Thus the ring and the circle inside 
the ring are perfectly separated. 


15.2.7 Clustering with Semi-Supervision


Recently many studies have considered semisupervised learning (e.g., [15.24, 25]). In this section we briefly overview the literature on fuzzy clustering with semisupervision.

We begin with two classes of semisupervised learning after Zhu and Goldberg [15.25]. They defined semisupervised classification, which has a set of labeled samples and another set of unlabeled samples. Another class is called constrained clustering, which has two sets of must-links $ML = \{(x, y), \dots\}$ and cannot-links $CL = \{(z, w), \dots\}$. Two objects $x$ and $y$ in the must-link set have to be allocated to the same cluster, while $z$ and $w$ in the cannot-link set have to be allocated to different clusters.
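As a small illustration of what the two constraint sets demand, the following sketch checks a crisp cluster assignment against must-link and cannot-link pairs. The representation (a dictionary from object to cluster label, and lists of pairs) and the function name are assumptions made here for illustration only.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs that a crisp labelling violates.

    labels      : dict mapping each object to its cluster label
    must_link   : iterable of pairs that should share a cluster
    cannot_link : iterable of pairs that should be in different clusters
    """
    bad_ml = [(a, b) for a, b in must_link if labels[a] != labels[b]]
    bad_cl = [(a, b) for a, b in cannot_link if labels[a] == labels[b]]
    return bad_ml, bad_cl
```

A constrained clustering method tries to make both returned lists empty, either strictly or, as in objective functions with penalty terms, by making violations costly.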


Fig. 15.1 An example of a circle and a ring around the circle (scatter plot on the unit square, both axes running from 0 to 1)



Let us briefly mention two studies in the first class of semisupervised classification. Bouchachia and Pedrycz [15.26] used the following objective function that has an additional term

\[
J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \left\{ (u_{ki})^m D(x_k, v_i) + \alpha (u_{ki} - \tilde{u}_{ki})^m D(x_k, v_i) \right\},
\]

where $\tilde{u}_{ki}$ is a given membership showing semisupervision.

Miyamoto [15.27] proved that an objective func- 
tion with entropy term [15.13,28-31] can generalize 
the EM solution of the mixture of Gaussian densities 
with semisupervision [15.25, p. 27]. 

Another class of constrained clustering [15.32, 33] has also been studied using a modified objective function [15.34] with additional terms of the must-link and cannot-link

\[
J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ki})^m D(x_k, v_i)
+ \alpha \sum_{k=1}^{N} \left\{ \sum_{(x_k, x_j) \in ML} \sum_{i \ne \ell} u_{ki} u_{j\ell}
+ \sum_{(x_k, x_j) \in CL} \sum_{i=1}^{c} u_{ki} u_{ji} \right\} .
\]

Fig. 15.2 Two clusters and a fuzzy classifier from the ring around circle data; fuzzy c-means with m = 2 is used


To summarize, the method of fuzzy c-means with semisupervision, including constrained clustering, has not yet gained wide popularity, because the number of studies is limited compared with other fields of machine learning, where many papers and a number of books have been published [15.24, 25, 32]. Hence more results can be expected in this area of study.


15.3 Hierarchical Fuzzy Clustering


There is still another method of fuzzy clustering that is very different from the above fuzzy c-means and is related to the single linkage in agglomerative hierarchical clustering.

In this section, we assume that objects $X = \{x_1, \dots, x_N\}$ are not necessarily in a Euclidean space. Rather, a relation $S(x, y)$ satisfying reflexivity and symmetry

\[
S(x, x) = 1, \quad \forall x \in X,
\tag{15.30}
\]

\[
S(x, y) = S(y, x), \quad \forall x, y \in X
\tag{15.31}
\]

is assumed, where a larger value of $S(x, y)$ means that $x$ and $y$ are more similar.

We then describe the general algorithm of agglomerative hierarchical clustering as follows:


Algorithm 15.3 AHC: Algorithm of Agglomerative Hierarchical Clustering

AHC1: Let initial clusters be individual objects $G_i = \{x_i\}$, $i = 1, \dots, N$, and put $K = N$.

AHC2: Find the pair of clusters of maximum similarity

\[
(G_p, G_q) = \arg\max_{i, j} S(G_i, G_j) .
\]

Merge $G_r = G_p \cup G_q$. Put $K = K - 1$ and if $K = 1$, stop.

AHC3: Update $S(G_r, G')$ for all other clusters $G'$. Go to AHC2.

End AHC.


The updating step of AHC3 admits different choices of similarity between clusters, among which the single linkage method uses

\[
S(G, G') = \max_{x \in G,\ y \in G'} S(x, y) .
\]

Although there are other choices, discussion in this section is focused upon the single linkage.
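The AHC procedure with the single linkage update can be sketched as below. This is an illustrative version only: the similarity is assumed to be given as a symmetric list-of-lists matrix, clusters are identified by integer ids, and the function returns the merge history rather than a dendrogram object; the name `single_linkage` is hypothetical.

```python
def single_linkage(S):
    """Agglomerative clustering on a symmetric similarity matrix S.

    Returns a list of (level, merged_cluster) tuples, one per merge,
    where level is the similarity at which the merge happened.
    """
    n = len(S)
    # AHC1: every object starts as its own cluster.
    clusters = {i: frozenset([i]) for i in range(n)}
    sim = {(i, j): S[i][j] for i in range(n) for j in range(i + 1, n)}
    history = []
    next_id = n
    while len(clusters) > 1:
        # AHC2: pick the pair of clusters with maximum similarity and merge it.
        (p, q), level = max(sim.items(), key=lambda kv: kv[1])
        merged = clusters.pop(p) | clusters.pop(q)
        # AHC3: single linkage update, S(G_r, G') = max(S(G_p, G'), S(G_q, G')).
        new_sim = {pair: s for pair, s in sim.items()
                   if p not in pair and q not in pair}
        for g in clusters:
            s_p = sim[(min(g, p), max(g, p))]
            s_q = sim[(min(g, q), max(g, q))]
            new_sim[(g, next_id)] = max(s_p, s_q)
        clusters[next_id] = merged
        sim = new_sim
        history.append((level, merged))
        next_id += 1
    return history
```

Because a merged cluster's similarity to any other cluster is the maximum over its members, the merge levels in the returned history never increase, which is what makes the result a hierarchy.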

On the other hand, studies in the 1970s including Zadeh's [15.10] proposed hierarchical clustering using the transitive closure of fuzzy relations $S(x, y)$. To define the transitive closure, we introduce the max-min composition

\[
(S \circ T)(x, z) = \max_{y \in X} \min\{S(x, y), T(y, z)\},
\]

where $S$ and $T$ are fuzzy relations on $X$. Using the max-min composition, we can define the transitive closure $S^*$ of $S$

\[
S^*(x, y) = \max\{S(x, y), S^2(x, y), S^3(x, y), \dots\},
\]

where $S^2 = S \circ S$ and $S^k = S \circ S^{k-1}$. It is also not difficult to see that $S^* = S^{N-1}$ when $S$ is reflexive and symmetric.
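The two definitions above translate directly into code. The sketch below represents a fuzzy relation on a finite set as a square list-of-lists matrix (an assumption of this illustration) and computes the closure by accumulating powers until nothing changes; the function names are hypothetical.

```python
def max_min_compose(S, T):
    """(S o T)(x, z) = max over y of min(S(x, y), T(y, z))."""
    n = len(S)
    return [[max(min(S[x][y], T[y][z]) for y in range(n)) for z in range(n)]
            for x in range(n)]


def transitive_closure(S):
    """S* = max(S, S^2, S^3, ...), accumulated until a fixed point is reached.

    The loop terminates because every entry of every power is one of the
    finitely many values that occur in S.
    """
    closure = [row[:] for row in S]
    while True:
        step = max_min_compose(closure, S)
        nxt = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(closure, step)]
        if nxt == closure:
            return closure
        closure = nxt
```

For a reflexive relation on three objects the closure is already reached at the second power, in agreement with the statement that the closure equals the (N-1)-th power.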

When $S$ is reflexive and symmetric, the transitive closure $S^*$ is also reflexive and symmetric, and moreover transitive

\[
S^*(x, y) \ge \min\{S^*(x, z), S^*(z, y)\}, \quad \forall z \in X .
\]

If a fuzzy relation is reflexive, symmetric, and transitive, then it is called a fuzzy equivalence relation: it has the property that every $\alpha$-cut is a crisp equivalence relation

\[
[S^*]_\alpha(x, x) = 1, \quad \forall x \in X,
\tag{15.32}
\]

\[
[S^*]_\alpha(x, y) = [S^*]_\alpha(y, x), \quad \forall x, y \in X,
\tag{15.33}
\]

\[
[S^*]_\alpha(x, y) = 1,\ [S^*]_\alpha(y, z) = 1 \ \Rightarrow\ [S^*]_\alpha(x, z) = 1,
\tag{15.34}
\]

where $[S^*]_\alpha(x, y)$ is the $\alpha$-cut of $S^*(x, y)$

\[
[S^*]_\alpha(x, y) = 1 \iff S^*(x, y) \ge \alpha ; \qquad
[S^*]_\alpha(x, y) = 0 \iff S^*(x, y) < \alpha .
\]


Thus each $\alpha$-cut of $S^*$ induces a partition of $X$ into equivalence classes, and moreover, as $\alpha$ decreases, the equivalence classes become coarser; therefore $S^*$ defines hierarchical clusters.

We can now state a fundamental theorem on fuzzy hierarchical clustering.


Theorem 15.1 Miyamoto [15.11]

Given a set of objects $X = \{x_1, \dots, x_N\}$ and a similarity measure $S(x, y)$ for all $x, y \in X$, the following four methods give the same hierarchical clusters:

1. Clusters by the single linkage,
2. Clusters by the transitive closure $S^*$,
3. Clusters as vertices of connected components of the fuzzy graph with vertices $X$ and edges $X \times X$ with membership values $S(x, y)$, and
4. Clusters generated from the maximum spanning tree of the network with vertices $X$ and edges $X \times X$ with weight $S(x, y)$.


The above theorem needs some more explanation. Connected components of a fuzzy graph means the family of connected components of all $\alpha$-cuts of the fuzzy graph. Since connected components grow with decreasing $\alpha$, those sets of vertices form hierarchical clusters. The minimum spanning tree is well known, but the maximum spanning tree is used here instead. The way in which hierarchical clusters are generated is the same as for the connected components of the fuzzy graph.

Although this theorem shows the importance of fuzzy hierarchical clustering, it appears that no new results that are useful in applications are included in this theorem. Miyamoto [15.35] showed, however, that other methods such as DBSCAN [15.36] and Wishart's mode analysis [15.37] have close relations to the above results, and he discusses the possibility of further application of this theorem, e.g., to nonsymmetric similarity measures.


15.4 Conclusion


We overviewed fuzzy c-means and related studies. Kernel-based clustering algorithms and clustering with semisupervision were also discussed. Moreover, fuzzy hierarchical clustering, which is based on fuzzy graphs and is very different from fuzzy c-means, was briefly reviewed.


New methods and algorithms based on the idea of fuzzy c-means are still being developed, as the fundamental idea has enough potential to produce many new techniques. On the other hand, fuzzy hierarchical clustering is rarely mentioned in the literature. However, there are possibilities for having new theory, methods, and applications in this area, as the fundamental mathematical structure is well established.

Many important studies were not mentioned in this overview; for example, Ruspini's method [15.38] and cluster validation measures are important. Readers may read books on fuzzy clustering [15.8, 9, 13, 16, 39] for details of fuzzy clustering. Also, Miyamoto [15.11] can still be used for studying the fundamental theorem on fuzzy hierarchical clustering.

References

15.1 R.O. Duda, P.E. Hart: Pattern Classification and Scene Analysis (John Wiley, New York 1973)
15.2 B.S. Everitt: Cluster Analysis, 3rd edn. (Arnold, London 1993)
15.3 A.K. Jain, R.C. Dubes: Algorithms for Clustering Data (Prentice Hall, Englewood Cliffs, NJ 1988)
15.4 L. Kaufman, P.J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis (Wiley, New York 1990)
15.5 J.C. Dunn: A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. 3, 32-57 (1974)
15.6 J.C. Dunn: Well-separated clusters and optimal fuzzy partitions, J. Cybern. 4, 95-104 (1974)
15.7 J.C. Bezdek: Fuzzy Mathematics in Pattern Classification, Ph.D. Thesis (Cornell Univ., Ithaca 1973)
15.8 J.C. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum, New York 1981)
15.9 J.C. Bezdek, J. Keller, R. Krishnapuram, N.R. Pal: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing (Kluwer, Boston 1999)
15.10 L.A. Zadeh: Similarity relations and fuzzy orderings, Inf. Sci. 3, 177-200 (1971)
15.11 S. Miyamoto: Fuzzy Sets in Information Retrieval and Cluster Analysis (Kluwer, Dordrecht 1990)
15.12 J.B. MacQueen: Some methods of classification and analysis of multivariate observations, Proc. 5th Berkeley Symp. Math. Stat. Prob. (Univ. of California Press, Berkeley 1967) pp. 281-297
15.13 S. Miyamoto, H. Ichihashi, K. Honda: Algorithms for Fuzzy Clustering (Springer, Berlin 2008)
15.14 R.J. Hathaway, J.C. Bezdek: Switching regression models and fuzzy clustering, IEEE Trans. Fuzzy Syst. 1(3), 195-204 (1993)
15.15 E.E. Gustafson, W.C. Kessel: Fuzzy clustering with a fuzzy covariance matrix, IEEE CDC (1979) pp. 761-766
15.16 F. Höppner, F. Klawonn, R. Kruse, T. Runkler: Fuzzy Cluster Analysis (Wiley, New York 1999)
15.17 R. Krishnapuram, J.M. Keller: A possibilistic approach to clustering, IEEE Trans. Fuzzy Syst. 1, 98-110 (1993)
15.18 R.N. Davé, R. Krishnapuram: Robust clustering methods: A unified view, IEEE Trans. Fuzzy Syst. 5, 270-293 (1997)
15.19 V.N. Vapnik: Statistical Learning Theory (Wiley, New York 1998)
15.20 V.N. Vapnik: The Nature of Statistical Learning Theory, 2nd edn. (Springer, New York 2000)
15.21 B. Schölkopf, A.J. Smola: Learning with Kernels (MIT Press, Cambridge 2002)
15.22 T. Kohonen: Self-Organizing Maps, 2nd edn. (Springer, Berlin 1997)
15.23 S. Miyamoto, D. Suizu: Fuzzy c-means clustering using kernel functions in support vector machines, J. Adv. Comput. Intell. Intell. Inf. 7(1), 25-30 (2003)
15.24 O. Chapelle, B. Schölkopf, A. Zien (Eds.): Semi-Supervised Learning (MIT Press, Cambridge 2006)
15.25 X. Zhu, A.B. Goldberg: Introduction to Semi-Supervised Learning (Morgan Claypool, San Rafael 2009)
15.26 A. Bouchachia, W. Pedrycz: A semi-supervised clustering algorithm for data exploration, IFSA 2003, Lect. Notes Artif. Intell. 2715, 328-337 (2003)
15.27 S. Miyamoto: An overview of hierarchical and non-hierarchical algorithms of clustering for semi-supervised classification, LNAI 7647, 1-10 (2012)
15.28 R.-P. Li, M. Mukaidono: A maximum entropy approach to fuzzy clustering, Proc. 4th IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE/IFES'95) (1995) pp. 2227-2232
15.29 S. Miyamoto, M. Mukaidono: Fuzzy c-means as a regularization and maximum entropy approach, Proc. 7th Int. Fuzzy Syst. Assoc. World Congr. (IFSA'97), Vol. 2 (1997) pp. 86-92
15.30 H. Ichihashi, K. Honda, N. Tani: Gaussian mixture PDF approximation and fuzzy c-means clustering with entropy regularization, Proc. 4th Asian Fuzzy Syst. Symp., Vol. 1 (2000) pp. 217-221
15.31 H. Ichihashi, K. Miyagishi, K. Honda: Fuzzy c-means clustering with regularization by K-L information, Proc. 10th IEEE Int. Conf. Fuzzy Syst., Vol. 2 (2001) pp. 924-927
15.32 S. Basu, I. Davidson, K.L. Wagstaff: Constrained Clustering (CRC, Boca Raton 2009)
15.33 N. Shental, A. Bar-Hillel, T. Hertz, D. Weinshall: Computing Gaussian mixture models with EM using equivalence constraints, Advances in Neural Information Processing Systems 16, ed. by S. Thrun, L.K. Saul, B. Schölkopf (MIT Press, Cambridge 2004)
15.34 N. Wang, X. Li, X. Luo: Semi-supervised kernel-based fuzzy c-means with pairwise constraints, Proc. WCCI 2008 (2008) pp. 1099-1103
15.35 S. Miyamoto: Statistical and non-statistical models in clustering: An introduction and recent topics, Analysis and Modelling of Complex Data in Behavioural and Social Sciences, JCS-CLADAG 12, ed. by A. Okada, D. Vicari, G. Ragozini (2012) pp. 3-6
15.36 M. Ester, H.-P. Kriegel, J. Sander, X.W. Xu: A density-based algorithm for discovering clusters in large spatial databases with noise, Proc. 2nd Int. Conf. Knowl. Discov. Data Min. (KDD-96) (AAAI Press, Menlo Park 1996)
15.37 D. Wishart: Mode analysis: A generalization of nearest neighbour which reduces chaining effects, Numer. Taxn., Proc. Colloq., ed. by A.J. Cole (1968) pp. 283-311
15.38 E.H. Ruspini: A new approach to clustering, Inf. Control 15, 22-32 (1969)
15.39 D. Dumitrescu, B. Lazzerini, L.C. Jain: Fuzzy Sets and Their Application to Clustering and Training (CRC Press, Boca Raton 2000)


16. An Algebraic Model of Reasoning 
to Support Zadeh's CWW 


Enric Trillas 


In the very wide setting of a Basic Fuzzy Alge- 
bra, a formal algebraic model for Commonsense 
Reasoning is presented with fuzzy and crisp sets 
including, in particular, the usual case of the Stan- 
dard Algebras of Fuzzy Sets. The aim with which 
the model is constructed is that of, first, adding to 
Zadeh's Computing with Words a wide perspective 
of ordinary reasoning in agreement with some ba- 
sic characteristics of it, and second, presenting an 
operational ground on which linguistic terms can 
be represented, and schemes of inference posed. 
Additionally, the chapter also tries to express the 
author's belief that reasoning deserves to be stud- 
ied like an Experimental Science. 


16.1 A View on Reasoning
16.2 Models
16.3 Reasoning
     16.3.1 A Remark on the Mathematical Reasoning
     16.3.2 A Remark on Medical Reasoning
16.4 Reasoning and Logic
16.5 A Possible Scheme for an Algebraic Model of Commonsense Reasoning
16.6 Weak and Strong Deduction: Refutations and Conjectures in a BFA (with a Few Restrictions)
16.7 Toward a Classification of Conjectures
16.8 Last Remarks
16.9 Conclusions
References


16.1 A View on Reasoning 


Thinking is a natural and complex neurophysiological phenomenon that is not yet scientifically well understood. It is shown by people thanks to their brains, and is mostly and significantly externalized in observable physical ways, such as talking by means of uttered or written words. Only recently has the functioning of the brain's systems started to be studied with the current methods of experimental science, which has made some knowledge of its internal working possible.

Talking reaches full development in a typically social human manifestation called telling. Its two main modalities, discourse and narrative, appear in oral, hand-sign, or written forms; together with abstraction, they could be considered among the highest expressions of the brain's capability of thinking, surely reinforced during evolution by the physical ability of humans to handle objects and to consider their possible usefulness. Telling can be roughly described as consisting of chains of sentences organized with some purpose.

Thinking and telling are but names for abstract concepts covering the totality of those human actions designated by to think, to tell, and to discuss, of which only the last two can be directly observed by a layperson. Telling and discussing not only show reasoning but also convey abstraction. In this sense, it can be said that telling and discussing cannot exist without reasoning, and that they are intermingled in some inextricable form that allows us to guess in order to foresee what will come in the future, to imagine what could happen in it, and to express it in words. At this point, the human capability of conjecturing appears as something fundamental [16.1].

Foreseeing and imagining proved essential for the growth and expansion of mankind on Earth, and are assisted by the human capabilities of guessing (or conjecturing) and refuting, to say nothing of those emotions that so often drive human reasoning toward creative thinking. Of course, as thinking comprises more features than reasoning, such as feelings and imagining with sounds, images, etc., the two concepts should not be confused.

This Introduction refers only generically, and just in a collateral form, to telling as supporting discourse and narrative in natural language, for whose understanding the context-dependent and purpose-driven concept of the meaning of statements [16.2], their semantics, is essential. This chapter mainly deals with an algebraic analysis of reasoning which, generated by the physical processes in the brain that thinking consists of, is externalized by means of the language of signs, oral or written expressions, figures, etc. It aims to be nothing more than a first attempt toward a possible algebra of reasoning more general than Boolean algebras, orthomodular lattices, and standard algebras of fuzzy sets, the algebraic structures in which classical propositional calculus, quantum physics' reasoning, and approximate reasoning are, respectively, represented and formally presented [16.3, 4].

Reasoning is considered the manifestation of rationality [16.5], a concept that comes from the Latin word ratio (which referred to comparing statements) and that, since ancient times, has supported the belief in a clear-cut division among living species under which the human species is the only rational one, the only one that can reason. Reasoning is also, in turn, an abstract concept referring to several ways of obtaining conclusions from previous knowledge, information, or evidence given by statements called the premises; it is sometimes said that the premises are the reasons for the conclusions, or the reasons that support them. Conclusions are also statements, and without previous knowledge neither reasoning nor understanding is possible.

Apparent processes of reasoning are what, to some extent, is observable and can be submitted to a Menger's kind of exact thinking [16.6], by analyzing them in sufficiently general algebraic terms. This is actually the final goal of this chapter, whose aim is to be placed at the ground of Zadeh's Computing with Words (CWW) [16.7], helping it to adopt the point of view of ordinary reasoning and not only that of deductive reasoning, and to bring CWW closer to the mathematical modeling of natural language and ordinary reasoning.


Note

Due, in large part, to the author's lack of knowledge, there are many topics that appear as manifestations of thinking and, to some extent, overlap with reasoning, but that cannot even be slightly taken into account in a paper that is, in itself, of very limited scope and only contains generic reflections in its nontechnical Sects. 16.1-16.4. Among these topics, there are some that, as far as the author knows, have not yet been submitted to a systematic scientific study. This is the case, for instance, of what could be called sudden direct action, or action under pressure, as well as of thinking and reasoning in both the beaux arts and musical creation [16.8].

Those topics still deserve some scientific and subsequent philosophical reflection. In some of them, for instance in modern painting, there appears a still mysterious kind of play between actual and virtual situations, in which the same objects of reasoning can be seen as unfinished, but not as not-finished [16.9]; that is, in which the antonym seems to play a role different from that of the negate [16.10].

Since those topics are still open to more analysis, perhaps some of the methodologies of analogical or case-based reasoning could suggest some ways of exploring them in terms of fuzzy sets [16.11], with a view to a possible computational mechanization of some of their aspects.


16.2 Models


The pair telling-reasoning can be completed to a triplet of philosophically essential concepts with that of modeling which, facilitated by abstraction, is one of the best ways people have of capturing the basics of the phenomena appearing in some reality, not only by taking some perspective and distance from them but, after observation, by recognizing their most basic traits. Thanks to models, not only can the terms or words employed in the linguistic description of a phenomenon be understood well enough by bounding their meaning in a formal frame, but it is only thanks to mathematical models that the safest type of reasoning, formal deduction, can be used for its study. In addition, models are currently often useful formalisms through which human reasoning can be handled by means of computers. This chapter essentially presents a new mathematical (algebraic) model for reasoning that is based not directly on truth but on contradiction, and that does not confuse reasoning with deducing, a confusion CWW should not fall into, since people mainly reason in nonformal and nondeductive ways.

If Experimental Science can be roughly described as an art of building plausible specific models of some reality with the aim of intellectually capturing it, in no case are these models (in particular, the algebraic ones) established forever; they change with time, which shows that they cannot always be confused with a time-cut of the reality being modeled. On the other hand, good models do guarantee, at least, that what already holds in them is preserved once new evidence on the corresponding reality is known.

Even if models are not to be confused with what they represent, just as an architect's mockup should not be confused with the building that follows from it, mathematical and computational modeling are among the greatest human acquisitions coming from abstraction and safe reasoning. Actually, a good deal of what characterizes the current civilization derives from models, and mathematical ones are usually considered at the very top of rationality since they have proven to allow for a good comprehension of several important realities previously recognized as actually existing.

Models are a good help for a better understanding of the reality they model, and mathematical ones show the so-called unreasonable effectiveness of mathematics [16.12] for the understanding of reality thanks, in good part, to formal deduction, the safest form of reasoning they allow us to use. It can metaphorically be said that if reality is in color, a model of this reality is a simplification of it in black and white; of course, and to some extent, modeling is an art whose simplifications should not leave out what is essential for the description of the corresponding reality.


16.3 Reasoning


To give a definition of reasoning is very difficult, if not impossible since, in the end, it is a family of natural processes generated in the brain. A first operational question that can be posed is, What is reasoning for? To this, in a first approach, it can just be said that reasoning is actually intermingled with the will of people to ask and to answer questions, to influence, to inform, to teach, to convince, or just to communicate with other people, something usually managed by means of reciprocal telling, dialogue, or conversation between people.

In a second and perhaps complementary approach, it should also be said that reasoning serves to satisfy the human will to search for new ideas that are not immediately seen in the evidence, and that can help in further exploring the reality to which the premises refer. The human will for communicating and influencing, for foreseeing and exploring, is made possible by reasoning which, from this perspective, seems to be a capability acquired thanks to the complexity of the brain once it is, through the senses, in contact with the world and can try to understand what is and what happens in it by means of the neuronal/synaptic representations reached in the brain thanks, in part, to the external receptors of the human nervous system.


To answer a more scientifically sensitive and less psychological question, What does reasoning appear to be? let us place ourselves in two different, although overlapping, points of view: that of the premises and that of the conclusions. From the first, reasoning can try to confirm, to explain, to enlarge, or to refute the information conveyed by the premises. From the second, and in confrontation with the context of the premises, conclusions can be classified as either necessary or contingent. In turn, the latter can be either explanations or speculations, that is, conclusions trying to foresee, backward or forward respectively, and perhaps by jumps from the premises without clear rules, something new and currently unknown that can eventually be suggested by the premises and, very often, reached thanks to some additional background knowledge. All this is what leads to the typically human capability called creativity, often attained through either hypotheses (backward case) or speculations (forward jumping case). It even helps in taking rational, pondered decisions, which are among the essential characteristics shown by the intelligence attributed to people.

As far as necessary conclusions are concerned, they are not usually for capturing something radically new since, in general, what is not included or hidden in the premises, but is external to them, is changing. From necessary premises there follow not only necessary conclusions under some rules of inference, but also contingent ones that, in addition, cannot always be deployed from the premises under well-known precise rules. Necessary conclusions are surely useful for arriving at a better understanding of what is strictly described by the premises, and for deploying what they contain, as happens in the formal sciences, and also for showing what cannot be the case if the given premises do not perfectly reflect reality. In this sense, the refutation of either a part of the evidence, or of hypotheses, etc., is actually important [16.13].


16.3.1 A Remark 
on the Mathematical Reasoning 


In the case of mathematics, where the only certified knowledge is furnished by the theorems proved by formal deduction, the former statement, necessary conclusions are not for capturing something radically new, could be actually surprising, especially if it were taken to mean that nothing new can be deductively deployed from some supposedly necessary premises. For instance, from Peano's axioms defining the set N of natural numbers, a large number of new and fertile concepts and theorems are deductively deployed and successfully applied to fields outside mathematics.

To quote a case: the abstract concept of the real number,
not easily captured by nonmathematicians, is constructed
after that of the rational number (whose set is Q). The
rational number comes from an equivalence between pairs of
integers (set Z), after the integer number has been defined
from another equivalence between pairs of natural numbers.
At the end, with a jump from pairs to some infinite
sequences of rational numbers, each equivalence class of
such sequences is made a real number (set R). Not only are
real numbers very useful in mathematics and outside it, but
many more concepts arise after the real number is
constructed in such a classificatory way, like, for
instance, the two classifications of R into
rational/irrational and into algebraic/transcendental, from
which a large amount of useful certified knowledge arises.
Some hints on how all that happened are the following.
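The first two classification steps (from N to Z and from Z to Q) can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the choice of canonical representatives is an assumption of the example, not something stated in the text. The last step, from Q to R, involves classes of infinite sequences and is not attempted here.

```python
from math import gcd

def same_integer(p, q):
    """Pairs of naturals (a, b) and (c, d), read as a - b and c - d,
    are equivalent when a + d == b + c."""
    return p[0] + q[1] == p[1] + q[0]

def int_class(a, b):
    """Canonical representative of the class of (a, b): one entry is 0."""
    m = min(a, b)
    return (a - m, b - m)

def same_rational(p, q):
    """Pairs of integers (n, d) and (m, e), with d, e != 0 and read as n/d
    and m/e, are equivalent when n * e == d * m."""
    return p[0] * q[1] == p[1] * q[0]

def rat_class(n, d):
    """Canonical representative of the class of (n, d): lowest terms, d > 0."""
    if d == 0:
        raise ValueError("the second entry of the pair must be nonzero")
    g = gcd(n, d)          # math.gcd is always non-negative
    if d < 0:
        g = -g             # force a positive denominator
    return (n // g, d // g)

print(int_class(3, 5), same_integer((3, 5), (10, 12)))
print(rat_class(2, -4), same_rational((1, 2), (3, 6)))
```

Each class is identified by a single representative, which is the sense in which an integer or a rational number "is" a class of pairs.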

The just described process of knowing through classifying
came along a long period of time in which the very concept
of number underwent changes. For instance, irrational
numbers were not seen as actual numbers by some ancient
Greek thinkers, for whom numbers were only the rational
ones. Also, the natural number concept came from the
counting of objects in the real world, the integer from
indexing units in scales above and below some point, and the
rational ones from systematically fractioning segments. For
a long time the number 0 was not even known, nor later on
considered an actual number, and the interest in irrational
numbers grew from the necessity of managing expressions
composed of roots of rational numbers, essentially in the
solution of polynomial equations, as well as from the
relevance of some rare numbers like π and e. Since along
this process mathematics was involved in many practical
problems, those mathematicians who finally constructed the
real numbers in purely mathematical terms and under the
deductive procedures characterizing mathematics, including
in them integers, rationals, and irrationals, were strongly
influenced by that long history. A hint of this influence is
shown, for instance, by the names assigned to some classes
of natural numbers: prime, square, cubic, amicable, etc.,
not to speak of the concept of a complex number coming from
the so-called imaginary numbers.

The above-mentioned equivalences allowing to pass from N to
Z, Z to Q, and Q to R are classifications in sets that, in
each case, are derived from the former one. Once formally
constructed, and hence of an increasingly abstract character
although named with more or less common names, all of them
are based, in the end, on Peano's definition of N. This
definition makes N a very intriguing set, whose curious and
often surprising certified properties generated one of the
more complex branches of mathematics, Number Theory, in
whose study many concepts of mathematical analysis and
probability theory are used. Let us just remember the
sophisticated proof that changed Fermat's old conjecture
into a theorem, that is, into certified mathematical
knowledge. Before being deductively proven, mathematical
conjectures, and in particular those in Number Theory, are
but speculations well based on many positive instances.

Even accepting that mathematics is freely created by
mathematicians from some accepted minimal number of
noncontradictory and independent axioms, and that the only
way of certifying its knowledge is by deductive proof, it
should not be forgotten that imagining what can be deduced
from the premises in each case, which is often done by
analogy with a previously solved, or at least considered,
case, is a sample of commonsense reasoning. In addition, the
beauty mathematicians attribute to their results not only
plays an important role in the development of mathematics,
but also shows how mathematicians reason in the same way as
cultured people do. A nice example




of the fact that mathematicians also reason through
commonsense reasoning is shown by the typical statement
How beautiful this (supposed) concept or theorem is, claimed
before a deductive proof fails and prevents accepting it as
mathematical knowledge.

Mathematicians, like philosophers, detectives, writers,
businessmen, scientists, physicians, etc., reason thanks to
their brains and with the experiential background knowledge
stored in them. They are moved by curiosity, supported by
imagination and conjecturing, and driven by the will to
reach new knowledge in their respective fields. In addition,
a high level of creativity seems to be a remarkable
characteristic of great mathematicians. All that gave rise
to Wigner's famous unreasonable effectiveness of mathematics
in the natural sciences [16.12].

A positive confirmation of a reasoning against reality can
be searched for through those conclusions showing a good
level of agreement with the reality to which the information
refers, and a negative one through the refutations, or
conclusions contradicting either the premises or what
necessarily follows from them. With respect to explanations,
or hypotheses, not only the premises should necessarily
follow from them, but also all that necessarily follows from
the premises. Of course, all that can be qualified as
necessary should not only be as safe as the premises could
be, but should also be obtained by means of precise rules
allowing anybody to totally reproduce the processes going
from the premises to the conclusions. In this sense, perhaps
it could be better said that necessary conclusions are safe
in the context of the premises, whereas contingent
conclusions are unsafe, and that the former should be
obtained in a way showing that they are just reproducibly
deployed from the premises. It seems clear enough that
confirmation with reality should be searched for, in
general, by means of some theoretical or experimental
testing of the conclusions against the reality to which they
refer.

With respect to the premises, and to ensure their safety,
their set should be bound to show some internal consistency,
such as containing neither contradictory pairs of premises
nor self-contradictory ones, since otherwise it does not
seem acceptable that the premises can jointly convey
information that could be taken as admissible in the model.
The same happens with the set of necessary conclusions
directly deployed from the premises, since in this case the
existence of contradictory conclusions would delete its
necessity. The case of contingent conclusions is different,
as everyday experience shows: the existence of contradictory
explanations or contradictory speculations is not surprising
at all, and it is sometimes the case that there are
contradictory hypotheses for, or speculations from, the same
phenomenon.

All that, if only expressed by words or linguistic terms,
cannot by itself facilitate a clear, distinct, and complete
comprehension of the subject of reasoning with fertility
enough to go further. It is for increasing the comprehension
of the machinery of reasoning that a modeling of it in black
and white [16.14] can be a good help toward a better
understanding of what it is, or what surrounds it, as
mathematical models have offered in the experimental
sciences from, at least, Newton's time. Of course, to
establish a mathematical model of reasoning it is necessary
not only to be acquainted both with what reasoning is and
with its different modalities, but also to have a suitable
frame of representation for all the involved linguistic
terms, including the concept of contradiction.

The concept of representation in a suitable formal 
frame is essential for establishing models and, jointly 
with the use of the deductive (safe) reasoning the for- 
mal frame makes possible, is what not only marks an 
important frontier between Science and Philosophy, but 
also helps us to show the unreasonable effectiveness of 
mathematics in Science and Technology. For instance, 
like the set of rational numbers is a good enough for- 
mal frame for the shop’s bill, the three-dimensional real 
space is a formal frame for 3D Euclidean geometry, the 
four-dimensional Riemann space is that for Relativity 
theory, and the infinite-dimensional Hilbert space is the 
frame for quantum physics. Mathematical models add 
to the study of reality the gift consisting in the possibil- 
ity of systematically applying to its analysis the safest 
form of reasoning, that is, formal deduction. 

When the linguistic terms translating the corre- 
sponding concepts are precise and, at least in principle, 
all the information that eventually can be needed is 
supposed to be available, like it is the case in the 
classical propositional calculus, the frame of Boolean 
algebras seems to be well suitable for representing these 
terms [16.3]. If the linguistic terms designating the 
involved basic concepts are precise but not all the infor- 
mation is always available, like it happens, for instance, 
in quantum physics, weaker structures than Boolean al- 
gebras, like orthomodular lattices are, could be taken 
into account. If there are involved essentially imprecise 
linguistic terms, like it happens in commonsense rea- 
soning, then the so-called algebras of fuzzy sets seem 
to be suitable once the linguistic terms are well enough 
designed by fuzzy sets [16.15, 16]. 




The problem of selecting a convenient frame of
representation for ordinary reasoning, defined by a minimal
number of axioms, is a crucial point that should be settled
in agreement both with the methodological principle stated
by Occam's Razor, Not introduce more entities than those
strictly necessary, and with Menger's addendum, Nor less
than those with which some interesting results can be
obtained [16.17]. Such methodological principle and addendum
are, of course, taken into account in what follows since,
otherwise, almost nothing could be added to what
philosophers said. Three important features that Menger's
exact thinking [16.6] offers through mathematical models are
that it is always clear in it under which presuppositions
the obtained results can hold, that deductive (safe)
reasoning can be extensively used through a mathematical
symbolism translating the basic traits of the subject, and
that what is not yet included in it has a possibility of, at
least, being clearly situated outside the model and,
perhaps, later on included in a new and larger model. This
is what, in the end, happened with the old Euclid's Geometry
and Linear Algebra.


16.3.2 A Remark on Medical Reasoning 


The field of medicine [16.18], which is full of imprecise
technical concepts, is one in which the ordinary reasoning
used deserves careful consideration. In particular, it is
important to know whether a medical concept can or cannot be
specified by a classical set since, in the negative case, it
is not possible to conduct the corresponding reasoning in
the classical Boolean frame. For instance, if two concepts D
and B are interpreted as fuzzy sets, the statement ((D and
B) or (D and not B)) is only equivalent to D in some
standard algebras of fuzzy sets in which no law of duality
holds [16.19].
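A small numerical check can make this statement tangible. The sketch below is an assumption-laden illustration, not the construction of [16.19]: it compares, on a grid of membership degrees, the pair (min, max) with the pair (product, bounded sum), both with the negation 1 - id. The second pair is chosen here only as one example of connectives that are not dual to each other with respect to that negation.

```python
def law_gap(conj, disj, neg, steps=20):
    """Largest value of |(D.B) + (D.B') - D| over a grid of degrees in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(
        abs(disj(conj(d, b), conj(d, neg(b))) - d)
        for d in grid
        for b in grid
    )

def neg(a):
    return 1.0 - a                      # negation 1 - id

def prod(a, b):
    return a * b                        # product t-norm

def bounded_sum(a, b):
    return min(1.0, a + b)              # Lukasiewicz t-conorm

gap_minmax = law_gap(min, max, neg)            # fails, e.g., at D = 1, B = 0.5
gap_prod_bsum = law_gap(prod, bounded_sum, neg)  # d*b + d*(1-b) = d

print(gap_minmax, gap_prod_bsum)
```

With min and max the law fails by as much as 0.5, whereas with product and bounded sum the gap vanishes up to rounding, since d·b + d·(1 − b) = d.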


In the typically clinical reasoning for diagnosing, there
are many technical concepts that cannot be considered
precise since they are clearly subject to degrees. Such is
the concept of observed diabetes, which can be submitted to
an (empirical) Sorites process [16.20] to conclude that it
cannot be represented by a classical set. For instance, a
patient with 100 mg/dl of glucose in blood does not suffer
from diabetes, nor with 101, 102, ..., and up to 120 mg/dl,
at which point the patient could be diagnosed with diabetes.
Nevertheless, since the crisp mark 120 is not reliable in
all cases, being a somewhat changing experimental threshold,
it is better to frame the diagnosis in the setting of fuzzy
logic, also taking into account the weight, the kind of job,
the age, as well as the usual diet of the patient. That is,
physicians reason on the basis of the complex imprecise
predicate that could be written as: (Diabetes/Patient) (p) ~
glucose in blood (p) & weight (p) & job (p) & age (p) &
alimentation (p), in its turn composed of elemental
predicates, some of which are also imprecise.

The physicians cannot reason by only taking into account the
amount of glucose in blood, and under the typical schemes of
classical reasoning. Once the current medical concepts are
translated into fuzzy terms, it is necessary to follow the
reasoning under those schemes allowed in a suitable algebra
of fuzzy sets [16.19], once it is designed in accordance
with the context in which the concepts are inscribed.

In addition, since what the processes leading to a diagnosis
try to find is a good enough hypothesis matching the
symptoms of the presumed illness, it is relevant for
researchers on medical reasoning to know about inductive
abduction and speculation [16.18], mainly with respect to
applying CWW. For all that, the laws holding in the
framework taken for representing the involved fuzzy terms
are relevant.


16.4 Reasoning and Logic


Let us stop for a while at the question Which is the
relationship between reasoning and logic? This requires
first stopping at the concept of logic, classically
understood as the formal study of the laws of reasoning and
that, in addition and in modern times, is basically
understood by restricting reasoning to deduction. Today's
logic is indeed the study of systems allowing the safest
type of reasoning under which, from a consistent set of
premises, a consistent set of necessary conclusions, or
logical consequences, is derived in a step-by-step ruled
process that can be fully reproduced by another person who
masters the use of the rules. Hence, nowadays logic consists
in the study of the so-called deductive systems, and it
mainly avoids other types of reasoning like those by
analogy, by abduction, and by induction. After Tarski
formalized deductive systems by means of consequence
operators [16.21], a logic is defined, in mathematical
terms, as a pair consisting of a set of statements and a
consequence operator that can be applied to some (selected)
subsets of these statements, once all that is represented in
a formal frame.

Nevertheless, it seems that in commonsense, everyday, or
ordinary reasoning, people only make deductions in, at most,
25% of the cases [16.22], and that many conclusions are just
reached either with the help of analogy with a precedent and
similar case, or just by speculating in each case according
to some rules of thumb. In addition, some properties and
schemes of reasoning that were classically considered laws
of reasoning cannot be seen today as universally valid, as
happens with the distributive law in the reasoning of
quantum physics, and with the several schemes of reasoning
with fuzzy sets studied in [16.4] in the line of analyzing
what is sometimes known as the preservation of the
Aristotelian form.

Consequently, the analysis of the more than 75% of reasoning
processes that are nondeductive should be considered of the
utmost importance for a more complete study of reasoning. In
some sense, such study began with the work of Peirce [16.23]
for understanding scientific thinking, and was later
continued with the studies on nonmonotonic reasoning in the
field of Artificial Intelligence for helping to mechanize
some ordinary ways of reasoning [16.24, 25]. Since Computing
with Words deals with ordinary reasoning in natural
language, it seems obvious that the nondeductive ways of
reasoning should not only be in the background of CWW, but
also be taken into account in it.

This chapter is mainly devoted to a formal, algebraic study
of that 75% and, essentially, can be seen as a trial for
enlarging logic from formal deductive reasoning to everyday
reasoning with, perhaps, a kind of return to the logic of
the Middle Ages, as can be gathered from Occam's saying
[16.26] that demonstration is the noblest part of logic,
reflecting that logic was seen at that time as more than the
study of deduction, even if deduction is considered a
crucial form of reasoning. In the model presented in this
chapter, based on conjectures (a term coming from [16.27]),
deduction, as a modality of reasoning, plays a central role
both in a weak and in a strong form, respectively
corresponding to the ordinary and to the formal ways of
reasoning.


16.5 A Possible Scheme for an Algebraic Model of Commonsense Reasoning


It does not seem that the term deduction can refer to the
same concept in formal and in ordinary reasoning, since in
the first it appears stricter than in the second where, for
instance, the conclusions (consequences) are not necessarily
admissible like the premises. Think, for instance, of what
philosophers consider to be a deduction, and of what
mathematicians refer to as a proof. Anyway, in both cases it
should try to reflect a safe enough kind of reasoning, in
the sense of attributing to the conclusions no less
confidence than that attributed to the items of initial
information. These items should also be admissible in the
sense of being as safe as possible knowledge on some
subject. The good quality of the initial information is
actually important in any process of reasoning.

Mathematics is considered the paradigm of deductive
reasoning, but this does not mean (as remarked in Sect.
16.3) that mathematicians only reason deductively, since
when they search for something new they reason like other
people do, often by jumping from the initial information to
unwarranted conclusions [16.22]. What is not at all accepted
in mathematics are contradictions and nondeductive proofs.
If for mathematicians the mathematical model, based on the
admissibility of its axioms, is their reality, for applied
scientists or engineers a model is just a representation of
some reality. For instance, no engineer confuses the actual
working of a machine with a dynamic model of it and, when
launching a rocket, it is well known that the so-called
nominal trajectory (computed from a mathematical model) does
not exactly coincide with the actual one, and the
performance of the rocket's propulsion system is measured by
taking into account the difference between these
trajectories. In commonsense reasoning (CR) the situation
still shows sensible differences with these cases.

Some characteristics separating commonsense from 
formal deductive reasoning are as follows: 


a) CR does not consist in a single type of reasoning, but in
   several. A reasoning in CR can be schematized by
   $P \vdash q$, where P is a set of admissible items of
   information, q is a conclusion under the considered type
   of reasoning, and the symbol $\vdash$ reflects the
   corresponding reasoning's process. Only in the case of a
   deductive reasoning are these processes done under a
   strict regulation.

   Alternatively, if AC(P) reflects the set of attainable
   conclusions, the scheme can be changed to $q \in AC(P)$.

b) Often the items of information from which CR starts are
   expressed in natural language, with precise and imprecise
   linguistic terms, numbers, functions, pictures, etc. In
   addition, and also often, such items of information are
   partial and/or partially reliable with respect to the
   reality they are concerned with. In what follows, it will
   be supposed that no ambiguous terms are contained in these
   items, and that they are expressed by linguistic
   statements.

c) Often CR lacks monotony. That is, when the number of
   initial information items increases, then either the
   number of conclusive items decreases (antimonotony), or
   there is no law for its variation (nonmonotony). Deduction
   is always monotonic, that is, no fewer conclusive items
   are obtained when the number of items of initial
   information grows.

d) It is typical of CR to jump from the initial information
   to some conclusions, that is, no
   step-by-step/element-after-element way can be followed.
   Jumping is never the case in formal deduction, where the
   conclusions should be deployed from the initial items of
   information in a strict step-by-step manner and under
   previously known rules, even if the current reasoner
   avoids some trivial steps that, nevertheless, can always
   be easily recovered.

e) In CR, people try to obtain either explanations, or
   refutations, or what is hidden in the given information,
   or new ideas lucubrated from what is supposedly known (the
   initial information). Only the third of these kinds of
   conclusions is typical of deduction.

f) A minimal limitation in CR is that of keeping some kind of
   consistency among the given items of initial information,
   such as not containing two contradictory items, and also
   between them and the conclusions.
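The contrast in item (c) between monotonic deduction and nonmonotonic commonsense inference can be illustrated with a deliberately tiny sketch. Everything in it (the rule format, the names `deduce` and `with_default`, and the bird/penguin default) is a hypothetical illustration chosen for this example, not part of the chapter's model.

```python
def deduce(facts, rules):
    """Forward chaining: a rule is (set_of_premises, conclusion).
    Adding facts can only add conclusions, so this operator is monotonic."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

def with_default(facts, rules):
    """Deduction plus one default jump: a bird is taken to fly
    unless it is known to be a penguin."""
    known = deduce(facts, rules)
    if "bird" in known and "penguin" not in known:
        known.add("flies")
    return known

RULES = [(frozenset({"penguin"}), "bird")]   # every penguin is a bird

small = {"bird"}
large = {"bird", "penguin"}                  # small is included in large

print(deduce(small, RULES) <= deduce(large, RULES))   # monotonic
print(with_default(small, RULES))                     # contains "flies"
print(with_default(large, RULES))                     # "flies" is withdrawn
```

With the default rule, enlarging the initial information withdraws a conclusion without reversing the inclusion either, so the toy operator is neither monotonic nor antimonotonic.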


Let us denote by P an accepted set of items of admissible
initial information (premises), and by AC(P) the attainable
conclusions under one of the last four types of CR in (e).
It will be supposed that P is in some designated family F of
sets able to consistently describe something, and that AC(P)
is in a larger family C such that $F \subset C$. Hence, AC
can be seen as a mapping $AC: F \to C$. The sets in F are
supposed to contain items of admissible information, but
this is not the case for those sets in $C - F$. Then:


a) To do an analysis of CR in mathematical terms, an
   algebraic frame for representing all that is involved in
   CR should be selected in a way of not introducing more
   objects and laws than those strictly necessary at each
   case like they are, for instance, a symbolic
   representation of the linguistic connectives and, or, not,
   and If/Then. The symbols that will be used in this chapter
   are $\cdot$, $+$, $'$, and $\leq$, of which the first two
   are binary operations, the third is a unary operation, and
   the fourth is a binary relation.

b) Basic in all kinds of reasoning is the concept of
   consistency, even if there is not a unique way of seeing
   it. Three possible definitions of consistency are as
   follows:

   • Consistency is identified with noncontradiction: If p,
     then it is never not-q, symbolically represented by
     $p \nleq q'$.

   • Consistency is identified with joint noncontradiction:
     If (p and q), then it is never not-(p and q),
     symbolically represented by
     $p \cdot q \nleq (p \cdot q)'$.

   • Consistency is identified with incompatibility: It is
     never (p and q), symbolically represented by
     $p \cdot q = 0$, provided there exists a symbol 0 like
     the empty set $\emptyset$ in set theory.

   At each case a suitable definition of consistency should
   be chosen in accordance with the corresponding context
   but, in what follows, just the first one is taken. Of
   course, no other and less formal ways of looking at
   consistency should be excluded and, in any case, the
   concept of consistency between pairs of elements should be
   extended to sets of premises and sets of conclusions to
   make them consist, respectively, of admissible premises
   and attainable conclusions.

   Notice that in a Boolean algebra, since it is
   $p \cdot q = 0 \Leftrightarrow p \leq q' \Leftrightarrow p \cdot q \leq (p \cdot q)'$,
   the former definitions are equivalent and, for this
   reason, in the reasoning with precise linguistic terms
   there is no discussion for what refers to the concept of
   consistency.

c) p is contradictory with q, provided If p, then not-q, and
   p is self-contradictory when If p, then not-p. Notice that
   in ortholattices the only self-contradictory element is
   p = 0, and that with fuzzy sets endowed with the negation
   $1 - \mathrm{id}$, the self-contradictory fuzzy sets are
   those such that
   $A \leq A' = 1 - A \Leftrightarrow A(x) \leq 1/2$, for all
   $x \in X$.




d) If $P \subset AC(P)$, for all P in F, it is said that AC
   is extensive, and AC(P) necessarily contains some items of
   admissible information.

e) If $P \subset Q$, both in F, then if
   $AC(P) \subset AC(Q)$, it is said that AC is monotonic; if
   $AC(Q) \subset AC(P)$, antimonotonic; and if AC is neither
   monotonic nor antimonotonic, it is said that AC is
   nonmonotonic, and there can exist cases in which, being
   $P \subset Q$, AC(P) and AC(Q) are not comparable under
   the set-inclusion $\subset$.

f) Provided q represents a statement, and $q'$ represents its
   negation not-q: if, whenever q is in AC(P), $q'$ is not in
   AC(P), that is, $q'$ is in $AC(P)^c$, it is said that AC
   is consistent in P. AC is just consistent if it is
   consistent in all P in F.

g) AC is said to be a closure if, for all $P \in F$, it is
   $AC(P) \in F$ and $AC(AC(P)) = AC^2(P) = AC(P)$.

h) $AC(P) \in C - F$ means that not all the elements in AC(P)
   show the characteristics that make admissible those items
   in F.
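Properties (d) to (g) can be checked mechanically on a small crisp model. The sketch below is only an illustration under stated assumptions: W is taken as the power set of a three-element universe, the relation "If p, then q" is read as set inclusion, the negation as the complement, F as the non-empty sets of premises whose intersection is not empty, and AC(P) as the set of all supersets of that intersection. This particular operator is an assumption of the example, not something fixed by the text.

```python
from itertools import combinations

X = frozenset({0, 1, 2})
# W: all crisp subsets of X, ordered by inclusion; negation is the complement.
W = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]

def meet(P):
    """Intersection (conjunction) of all the statements in P."""
    out = X
    for p in P:
        out = out & p
    return out

def AC(P):
    """Hypothetical operator: every q that follows from the conjunction of P."""
    m = meet(P)
    return frozenset(q for q in W if m <= q)

# F: non-empty sets of premises whose conjunction is not the empty set.
F = [frozenset(c) for r in range(1, len(W) + 1)
     for c in combinations(W, r) if meet(c)]
F_set = set(F)

extensive = all(P <= AC(P) for P in F)
monotonic = all(AC(P) <= AC(Q) for P in F for Q in F if P <= Q)
consistent = all((X - q) not in AC(P) for P in F for q in AC(P))
closure = all(AC(P) in F_set and AC(AC(P)) == AC(P) for P in F)

print(extensive, monotonic, consistent, closure)
```

On this toy family all four properties hold. Restricting F to premises with a non-empty conjunction is what makes the consistency check succeed.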


Main Definitions 

1) A mapping $AC: F \to C$ is said to be a weak-deduction
   operator [16.28] if it is monotonic, and consistent under
   a suitable definition.

2) A mapping $AC: F \to F$ is said to be a strong-deduction
   operator, or a Tarski's logical consequence operator
   [16.21], if it is a weak-deduction one that is also
   extensive, and is a closure.


If AC is a weak-deduction operator, the elements 
in AC(P) are called weak consequences of P. If AC 
is a Tarski’s operator, the elements in AC(P) are the 
strong or logical consequences of P. Since logicians 
universally consider that Tarski’s operators translate 
the characteristics of formal deductive systems, or for- 
mal deduction, it will be here considered that weak- 
deduction operators translate those of (some kind of) 
commonsense deduction. 


Remarks 16.1 


a) Notice that, in the model, F represents the family of 
those sets whose elements are accepted as items of 
admissible initial information, and that such admis- 
sibility, once translated into a suitable definition of 
consistency, should be defined at each case. Hence, 
each time F should be conveniently chosen. For in- 
stance, a possible definition is: F is the family of 
those P for which there are no p and q in it and such 
that If p, then not-p. 


b) At each case, it should be defined in which ground-set W
   the sets in F and C are included, and W should be endowed
   with operations able to represent all that is necessary
   for the formalization of CR. For instance, it will be
   supposed that there is a binary relation $\leq$ in W such
   that $p \leq q$ translates into W the linguistic statement
   If p, then q. Analogously, and in the same vein in which
   $'$ represents the linguistic not, there should be binary
   operations $\cdot$ and $+$ representing, respectively, the
   linguistic and, and or.

c) Basic in CR is the idea of conjecture, and that of
   refutation. Once a weak or strong consequence operator AC
   is adopted, the refutations of P could be defined as those
   elements $r \in W$ such that $r' \in AC(P)$, that is,
   those whose negation is deducible from P. In its turn, the
   conjectures from P can be defined as those elements
   $q \in W$ such that $q' \notin AC(P)$, that is, those
   whose negation is not deducible from P. In this sense, the
   conjectures are the elements that are not (deductive)
   refutations of P. Both concepts could be more precisely
   named AC-refutations, and AC-conjectures.
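The two definitions in (c) translate directly into code. The following is a minimal sketch under the same kind of assumptions as any toy model: W is the power set of a three-element universe, the negation is the complement, and AC(P) is taken, as a hypothetical choice for this example, to be the set of supersets of the intersection of the premises.

```python
from itertools import combinations

X = frozenset({0, 1, 2})
W = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]

def AC(P):
    """Hypothetical consequence operator: supersets of the conjunction of P."""
    m = X
    for p in P:
        m = m & p
    return {q for q in W if m <= q}

def refutations(P):
    """Elements r whose negation (complement) is deducible from P."""
    return {r for r in W if (X - r) in AC(P)}

def conjectures(P):
    """Elements q whose negation (complement) is not deducible from P."""
    return {q for q in W if (X - q) not in AC(P)}

P = [frozenset({0, 1}), X]        # the conjunction of the premises is {0, 1}
cons, refs, conj = AC(P), refutations(P), conjectures(P)

print(len(cons), len(refs), len(conj))
print(cons <= conj, refs.isdisjoint(conj), refs | conj == set(W))
```

In this example the consequences are themselves conjectures, and refutations and conjectures split W into two disjoint parts, in agreement with the remark that conjectures are exactly the elements that are not refutations.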


Since both precise and imprecise linguistic terms are
usually managed in CR, in what follows W will be the set of
all fuzzy sets in a universe of discourse X, that is,
$W = [0,1]^X$, the set of all functions $A: X \to [0,1]$.
This set will be endowed with the algebraic structure of a
Basic Fuzzy Algebra, where the restriction of its operations
to $\{0,1\}^X$ makes this set a Boolean algebra, isomorphic
to the power set $2^X$ endowed with the classical
set-operations of intersection, union, and complement of
subsets. Crisp sets allow us to represent precise linguistic
terms, as stated by the axiom of Specification in naive set
theory [16.29], an axiom that cannot be immediately extended
to imprecise predicates since, for instance, they are not
always represented by a single fuzzy set.


Definition 16.1 [16.1] 

If $\cdot$ and $+$ are binary operations and $'$ is a unary
one, then $([0,1]^X, \cdot, +, ')$ is a Basic Fuzzy Algebra
(BFA) provided it holds:

1. $A_0 \cdot A = A \cdot A_0 = A_0$,
   $A_1 \cdot A = A \cdot A_1 = A$,
   $A_0 + A = A + A_0 = A$,
   $A + A_1 = A_1 + A = A_1$.

2. If $A \leq B$, then $C \cdot A \leq C \cdot B$,
   $A \cdot C \leq B \cdot C$, $C + A \leq C + B$,
   $A + C \leq B + C$, and $B' \leq A'$.

3. $A_0' = A_1$, and $A_1' = A_0$.

4. If $A, B \in \{0,1\}^X$, then $A \cdot B = \min(A, B)$,
   $A + B = \max(A, B)$, and $A' = 1 - A$,

where $A_0$ is the function $A_0(x) = 0$, $A_1$ is the
function $A_1(x) = 1$, and $1 - A$ is
$(1 - A)(x) = 1 - A(x)$, for all $x \in X$. Obviously, in
$2^X$, $A_0$ represents the empty set $\emptyset$, $A_1$ the
ground set X, and $1 - A$ the complement $A^c$ of A.

Of course, it is $A \leq B$ if and only if
$A(x) \leq B(x)$, for all x in X, a partial order that with
crisp sets reduces to $A \subset B$.


Notice that:

a) The formal connectives $\cdot$, $+$, and $'$ in a BFA are
   neither presumed to be functionally expressible, nor
   associative, nor commutative, nor distributive, nor dual,
   etc.

b) Only if $\cdot = \min$ and $+ = \max$, the BFA is a
   lattice that, if the negation $'$ is a strong one
   ($A'' = A$, for all A), is a De Morgan-Kleene algebra.
   Hence, no BFA is a Boolean algebra, and not even an
   ortholattice.

c) It is not difficult to prove [16.28] that it is always
   $A \cdot B \leq \min(A, B) \leq \max(A, B) \leq A + B$. Of
   course, the standard algebras of fuzzy sets are particular
   BFAs.

d) It is also easy to prove that:

   d.1) In a BFA with $+ = \max$, it holds the first law of
        semiduality: $A' + B' \leq (A \cdot B)'$, regardless
        of which are $\cdot$ and $'$.

   d.2) In a BFA with $\cdot = \min$, it holds the second law
        of semiduality: $(A + B)' \leq A' \cdot B'$,
        regardless of which are $+$ and $'$.

   d.3) Regardless of $'$, in a BFA with $\cdot = \min$ and
        $+ = \max$, both semiduality laws hold.

e) Obviously, all standard algebras of fuzzy sets [16.30]
   (those in which $\cdot$ is decomposed by a continuous
   t-norm, $+$ by a continuous t-conorm, and $'$ by a strong
   negation function) are BFAs.
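The inequalities in (c) and the semiduality laws in (d) can be spot-checked numerically. The sketch below assumes functionally expressible connectives acting pointwise on membership degrees, which a general BFA does not require, and uses the product t-norm and the probabilistic sum as hypothetical test cases alongside min and max; the negation is 1 - id throughout.

```python
import itertools

def check_bfa_laws(conj, disj, neg, steps=10, tol=1e-12):
    """Return (bounds, semiduality_1, semiduality_2) checked on a grid of degrees:
       bounds:        a.b <= min(a, b)  and  max(a, b) <= a+b
       semiduality_1: a' + b' <= (a.b)'
       semiduality_2: (a+b)' <= a'.b'
    """
    grid = [i / steps for i in range(steps + 1)]
    bounds = semi1 = semi2 = True
    for a, b in itertools.product(grid, repeat=2):
        bounds &= conj(a, b) <= min(a, b) + tol and max(a, b) <= disj(a, b) + tol
        semi1 &= disj(neg(a), neg(b)) <= neg(conj(a, b)) + tol
        semi2 &= neg(disj(a, b)) <= conj(neg(a), neg(b)) + tol
    return bounds, semi1, semi2

def neg(a):
    return 1.0 - a

def prod(a, b):
    return a * b                     # product t-norm

def prob_sum(a, b):
    return a + b - a * b             # probabilistic sum t-conorm

print(check_bfa_laws(min, max, neg))        # both semiduality laws hold (d.3)
print(check_bfa_laws(prod, max, neg))       # + = max: first law holds (d.1)
print(check_bfa_laws(min, prob_sum, neg))   # . = min: second law holds (d.2)
```

In the two mixed cases the other semiduality law fails on the grid (for instance at a = b = 0.5), which shows that laws d.1 and d.2 are genuinely one-sided.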


Since BFAs are defined by just a few axioms, in principle
only allowing very simple calculations, what can be proven
in their framework has a very general validity that is not
modified by the addition of new independent axioms. One of
the weaknesses of Boolean and De Morgan algebras, as well as
of orthomodular lattices, for representing CR lies precisely
in the large number of laws they enjoy, which make them too
rigid to afford the flexibility that natural language and CR
show in front of any artificially constructed language and
of formal reasoning.


Remarks 16.2 
1. 


For what concerns the representation of a linguis- 
tic predicate L in a universe of discourse X by 
a fuzzy set, it is of an actual interest to reflect 
on what can mean the values Az (x), for x € X and 
Az : X — [0, 1], the membership function of a fuzzy 
set labeled L. Just the expression fuzzy set labeled 
L, forces that the membership function Az should 
translate something closely related to L, namely , to 
the meaning of L in X. It is not clear at all that all 
predicates can be represented by a function taking 
its values in the totally ordered unit interval of the 
real line: It should be added to the involved predi- 
cate the possibility of some numerical quantification 
of its meaning. 

To manage linguistically well a numerically quantifiable predicate L in X it should, at least, be recognized when it is, or it is not, the case that x is less L than y, a linguistic (empirical and perceptively captured) relationship that can be translated into a binary relation $x \le_L y$, with $\le_L \subseteq X \times X$. This relation reflects how the amount of L varies on X and, once the pair $(X, \le_L)$ is known, a measure of the extent up to which each $x \in X$ is L is a mapping $M_L : X \to [0, 1]$ such that $x \le_L y \Rightarrow M_L(x) \le M_L(y)$. Those elements x such that $M_L(x) = 1$, if existing, can be called the prototypes of L in X. Analogously, those y such that $M_L(y) = 0$ can be called the antiprototypes of L in X. Obviously, and in the same vein that there is not a single probability measuring a random event, there is not always a single measure $M_L$.

If the use of the predicate is precise in X, all its elements x should be prototypes or antiprototypes, that is, $M_L(x)$ is in $\{0, 1\}$. When for some x it is $0 < M_L(x) < 1$, it is said that the use of L is imprecise in X. Once a triplet $(X, \le_L, M_L)$ is known, it is a quantity that can be understood as reflecting the meaning of L in X [16.2]. Calling $M_L$ an ideal membership function of the fuzzy set labeled L, it can be said that it exists when the meaning of L in X is a quantity.

Notice that each measure $M_L$ defines a new binary relation given by $x \le_{M_L} y \Leftrightarrow M_L(x) \le M_L(y)$, obviously verifying $\le_L \subseteq \le_{M_L}$, that is, the new relation is larger than the former one, which is directly drawn from the perceived linguistic behavior of L in X.

It is said that $M_L$ perfectly reflects L whenever $\le_L = \le_{M_L}$ but, since the second is always a linear relation (for all x, y in X, it is either $M_L(x) \le M_L(y)$ or $M_L(x) \ge M_L(y)$) and $\le_L$ is not usually so, it cannot always be the case that $M_L$ perfectly reflects L. This is one of the reasons for which, $\le_L$ being often difficult to know completely (for instance, if X is not finite), the designer just arrives at a function $A_L$ (the membership function of the fuzzy set labeled L) that is not usually the ideal membership function $M_L$ but an approximation of it, obtained through the data on L that are available to the designer. Of course, a good design is reached when it can be supposed that the value $\sup\{|M_L(x) - A_L(x)| ; x \in X\}$ is minimized.


From all that comes the importance of carefully designing [16.15,16] the membership functions with which a fuzzy system is represented. Analogous comments can be made for what concerns the connectives $\cdot$, $+$, $'$, and the axioms they verify, to reach a choice of the BFA $([0, 1]^X, \cdot, +, ')$ well linked to the currently considered problem.
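The notions in this remark (a measure that is monotone with respect to the perceived relation, prototypes, antiprototypes, and the sup-distance between the ideal and the designed membership functions) can be sketched on a small finite universe. Everything concrete below is hypothetical: the universe of heights, the relation standing for "less tall than", and both tables of membership values are invented only to make the definitions executable.

```python
# Hypothetical finite universe: heights in cm.
X = [150, 160, 170, 180, 190]

def less_L(x, y):
    """Assumed perceived relation 'x is less tall than y'."""
    return x <= y

# Assumed ideal measure M_L and a designer's approximation A_L.
M_L = {150: 0.0, 160: 0.25, 170: 0.5, 180: 0.75, 190: 1.0}
A_L = {150: 0.0, 160: 0.125, 170: 0.5, 180: 0.875, 190: 1.0}

def is_measure(M):
    """x <=_L y must imply M(x) <= M(y)."""
    return all(M[x] <= M[y] for x in X for y in X if less_L(x, y))

def prototypes(M):
    return [x for x in X if M[x] == 1]

def antiprototypes(M):
    return [x for x in X if M[x] == 0]

def sup_error(M, A):
    """Sup of |M(x) - A(x)| over X (a maximum, since X is finite)."""
    return max(abs(M[x] - A[x]) for x in X)

print(is_measure(M_L), is_measure(A_L))
print(prototypes(M_L), antiprototypes(M_L))
print(sup_error(M_L, A_L))
```

In practice the ideal measure is not available, so the last quantity can only be estimated; the sketch merely fixes what is being minimized.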


2. The suitability of the wide structure of BFAs for representing CR comes from, for instance, the facts that the linguistic conjunction and is not always commutative, especially when time intervenes; that the laws of duality are not always valid when dealing with statements in natural language; that the connectives' decomposability (or functional expressibility) is not always guaranteed; that the distributive laws between $\cdot$ and $+$ do not always hold; etc.


3. When the linguistic terms are represented in a set W endowed with a partial order $\le$, and with operations $\cdot$ of conjunction, $+$ of disjunction, and $'$ of negation, respectively representing the linguistic connectives If/Then, and, or, not, if $(W, \cdot, +)$ is a lattice (for all that concerns lattices, see [16.31]) at least the following five points do hold:


I. It is $A \le B \Leftrightarrow A \cdot B = A \Leftrightarrow A + B = B$: The conditional statement If A, then B should be equivalent to the statements A and B coincides with A, and A or B coincides with B.

II. If and is represented by the lattice's conjunction $\cdot$, $A \cdot B$ is the greatest lower bound of both A and B: It should be known the set of all that is below A (C is below A means $C \le A$), the set of all that is below B, the intersection of these two sets, and that $A \cdot B$ is the greatest element in this last set (with respect to the partial order $\le$).

III. Analogously, for the case of or, if represented by the lattice's disjunction $+$, there should be known the sets of elements in W that are greater than A, those that are greater than B, their intersection, and that $A + B$ is the lowest element in this last set.

IV. $A \cdot B = B \cdot A$: The meaning of the statements A and B, and B and A cannot be different.

V. $A + B = B + A$: The meaning of the statements A or B, and B or A cannot be different.
All this shows that a lot of structural information on the reasoning's context should necessarily be known for establishing the model, contrary to what usually happens in both CR and the applications, where all the previous information is not only costly to search for, and in money, but almost impossible to collect completely. This is something typical in formal sciences but, in the case of CR and also in many applications, it produces some scepticism about the possibility of always taking $(W, \cdot, +)$ as a lattice.
To count with a representation's lattice for CR is something to be considered rare or, at least, limited to some cases, as can be that of representing a formal-like type of reasoning with precise linguistic terms where the former five points are usually accepted. These are, for instance, the cases in Boolean algebras with $A \to B = A' + B = 1 \Leftrightarrow A \le B$, and in orthomodular lattices with $A \to B = A' + A \cdot B = 1 \Leftrightarrow A \le B$, with the respective implication operators $\to$ translating the corresponding linguistic If/Then.
With the standard algebras of fuzzy sets, a lattice is only reached when the connectives are given by the greatest t-norm min, and the lowest t-conorm max [16.30]. In this case, the implication functions with which it is $A \to B = A_1 \Leftrightarrow A \le B$ are the T-residuated ones [16.30], functionally expressible through the numerical functions $J_T(a, b) = \sup\{r \in [0, 1] ; T(a, r) \le b\}$, where T is a left-continuous t-norm, and that generalize the Boolean material conditional $A \to B = A' + B$ since, in a complete Boolean algebra, it is $A' + B = \sup\{C ; A \cdot C \le B\}$. The fuzzy implications given by $(A \to B)(x, y) = J_T(A(x), B(y))$ enjoy many of the typical properties of Boolean algebras with the material conditional, and with them the standard algebra with min, max, and the strong negation $1 - \mathrm{id}$ enjoys, among these algebras, the largest number of Boolean laws, which makes it a very particular algebra for extensive use in CR.
Nevertheless, it should be remembered that, for what concerns CR, when time intervenes the meanings of the statements A and B, and B and A, cannot always coincide, as is the case with He sneezed and came to bed, and He came to bed and sneezed.
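The T-residuated implication $J_T(a, b) = \sup\{r ; T(a, r) \le b\}$ mentioned in this remark can be sketched numerically for $T = \min$. The sketch assumes a finite grid of membership values in place of $[0, 1]$, and compares the grid supremum with the well-known closed form of the residuum of min (1 if $a \le b$, else $b$); the function names are hypothetical.

```python
from itertools import product

# Finite grid standing in for [0, 1] (an assumption of this sketch).
GRID = [i / 4 for i in range(5)]

def residuum(t_norm, a, b, grid=GRID):
    """Grid version of J_T(a, b) = Sup{r; T(a, r) <= b}.
    r = 0 always qualifies, so the maximum is taken over a nonempty set."""
    return max(r for r in grid if t_norm(a, r) <= b)

def godel(a, b):
    """Closed form of the residuum of T = min."""
    return 1.0 if a <= b else b

# The grid supremum coincides with the closed form for T = min.
same = all(residuum(min, a, b) == godel(a, b)
           for a, b in product(GRID, repeat=2))

# J_T(a, b) = 1 exactly when a <= b.
reflects_order = all((godel(a, b) == 1) == (a <= b)
                     for a, b in product(GRID, repeat=2))

# On {0, 1} the residuum reduces to the material conditional a' + b.
boolean_case = all(godel(a, b) == max(1 - a, b)
                   for a, b in product([0.0, 1.0], repeat=2))

print(same, reflects_order, boolean_case)
```

For t-norms whose residuum takes values outside the grid (the product, for instance), the grid supremum would only approximate $J_T$ from below.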

4. The BFA's structure, based on $[0, 1]^X$, can be made yet more abstract. It simply requires considering, instead of $[0, 1]^X$ pointwise ordered by $A \le B \Leftrightarrow A(x) \le B(x)$ for all x in X, a poset $(L, \le)$ with minimum 0 and maximum 1, endowed with two binary operations $\cdot$ and $+$ and a unary one $'$, containing a subset $L_0$ ($\{0, 1\} \subseteq L_0 \subseteq L$) that, with the restrictions of the three operations $\cdot$, $+$, and $'$, is a Boolean algebra, and verifying laws analogous to the former 1 to 4. These algebraic structures are called [16.32] Formal Basic Flexible Algebras, and they are a shell comprising ortholattices, De Morgan algebras, BFAs and, of course, orthomodular lattices, Boolean algebras, and standard algebras of fuzzy sets, as particular cases. By taking $(W, \cdot, +, ')$ as a Formal Flexible Algebra, what follows can be generalized, with a few restrictions, to such an abstract and general shell.


16.6 Weak and Strong Deduction: Refutations and Conjectures in a BFA (with a Few Restrictions)


Let $([0, 1]^X, \cdot, +, ')$ be a BFA whose negation $'$ is a weak one, that is, restricted to verify the law $A \le A''$ for all $A \in [0, 1]^X$, and whose conjunction $\cdot$ is associative and commutative. No other properties are presumed and, hence, what follows largely covers the case of a standard algebra of fuzzy sets, and can obviously be restricted to the Boolean algebra of crisp sets in $\{0, 1\}^X$.

Let us consider as the former family F of admissible premises, the $F(\cdot)$ comprising the finite sets (of premises) $P = \{A_1, \ldots, A_n\}$, such that their conjunction $A_P = A_1 \cdot \ldots \cdot A_n$ is not self-contradictory, that is, $A_P \not\le A_P'$. Of course, $A_P \not\le A_P'$ implies $A_i \not\le A_j'$, for all $A_i, A_j$ in P, and, obviously, it should be $A_P \ne A_0$. Hence, sets $P \in F(\cdot)$ neither contain contradictory premises, nor the empty set $A_0$. Notice that associativity and commutativity of the conjunction are presumed just to warrant a nonambiguous definition of $A_P$, and the restriction on the negation $'$ is just to allow some step in a proof. Under these conditions, the operator defined [16.28] by

$C_{\cdot}(P) = \{B \in [0, 1]^X ; A_P \le B\}$,

translating into the BFA the statement If $A_1$ and $A_2$ and ... and $A_n$, then B, or B follows from $A_P$ in the order $\le$, verifies the following:

a) Usually, $C_{\cdot}(P)$ is not in $F(\cdot)$; for instance, it is not always finite. Hence, in general it makes no sense to reapply the operator $C_{\cdot}$ to $C_{\cdot}(P)$. That is, the operator $C_{\cdot}^2$ cannot usually be defined, and even less can $C_{\cdot}$ be made a closure.

b) Since $A_P \le A_i$, $1 \le i \le n$, it is $P \subseteq C_{\cdot}(P)$: $C_{\cdot}$ is extensive.


c) If $P \subseteq Q$, with $Q = P \cup \{A_{n+1}, \ldots, A_m\}$, and since $A_P \ge A_1 \cdot \ldots \cdot A_n \cdot A_{n+1} \cdot \ldots \cdot A_m = A_Q$, then it follows $C_{\cdot}(P) \subseteq C_{\cdot}(Q)$: $C_{\cdot}$ is monotonic.

d) If $B \in C_{\cdot}(P)$, it is not $B' \in C_{\cdot}(P)$: $A_P \le B$ and $A_P \le B'$ $\Rightarrow$ $B \le B'' \le A_P'$, and it follows the absurd $A_P \le A_P'$: $C_{\cdot}$ is consistent in all $P \in F(\cdot)$.

e) Obviously, $A_1 \in C_{\cdot}(P)$, but $A_0 \notin C_{\cdot}(P)$. Hence, $C_{\cdot}(P) \ne \emptyset$. Analogously, $C_{\cdot}(P)$ cannot coincide with the full set $[0, 1]^X$ since it would imply the absurd $A_P = A_0$.

Consequently, all operators $C_{\cdot}$ are consistent and extensive weak-deduction operators.


In addition,

f) If $C_{\cdot}(P)$ is not finite, then it is obviously $\mathrm{Inf}\, C_{\cdot}(P) = A_P$.

g) No contradictory elements are in $C_{\cdot}(P)$: If $B, C \in C_{\cdot}(P)$ and it were $B \le C'$, from $A_P \le B$, $A_P \le C$, it follows $B' \le A_P'$ and $C' \le A_P'$, and the absurd $A_P \le B \le C' \le A_P'$.

h) If $\cdot_1 \le \cdot_2$, that is, the operation $\cdot_1$ is weaker than the operation $\cdot_2$, it is obviously $A_1 \cdot_1 \ldots \cdot_1 A_n \le A_1 \cdot_2 \ldots \cdot_2 A_n$, and hence $C_{\cdot_2}(P) \subseteq C_{\cdot_1}(P)$. That is, the bigger the operation $\cdot$, the smaller the set $C_{\cdot}(P)$. Consequently, if the operation min can be selected, it is $C_{\min}(P) \subseteq C_{\cdot}(P)$, for all operations $\cdot$ and all $P \in F(\min)$: $C_{\min}$ is the smallest among the operators $C_{\cdot}$.

Notice that it is always $F(\min) \subseteq F(\cdot)$: the family with min is the smallest among those of admissible premises.

i) Provided $C_{\cdot}(P)$ is a finite set, and since $C_{\cdot}$ is extensive, it is $C_{\cdot}(P) = P \cup \{A_{n+1}, \ldots, A_m\}$, and then $A_{C_{\cdot}(P)} = A_1 \cdot \ldots \cdot A_n \cdot A_{n+1} \cdot \ldots \cdot A_m = A_P \cdot A_{n+1} \cdot \ldots \cdot A_m \le A_P$. Thus, if $\cdot = \min$, $A_{C_{\min}(P)} = \min(A_P, \min(A_{n+1}, \ldots, A_m)) = A_P$, which means $C_{\min}(P) \in F(\min)$, and it makes sense to reapply $C_{\min}$. Since $P \subseteq C_{\min}(P) \Rightarrow C_{\min}(P) \subseteq C_{\min}(C_{\min}(P))$, and, if $A \in C_{\min}(C_{\min}(P))$, then $A_{C_{\min}(P)} = A_P \le A$, it is $A \in C_{\min}(P)$, and from $C_{\min}(C_{\min}(P)) \subseteq C_{\min}(P)$, it finally follows $C_{\min}^2(P) = C_{\min}(P)$. In conclusion, provided all the involved sets $C_{\min}(P)$, with $P \in F(\min)$, were finite, it is $C_{\min} : F(\min) \to F(\min)$, and $C_{\min}$ will be a strong, or Tarski's, consequence operator.
j) Provided the family F of sets of premises is freed from containing only finite sets, since Inf always exists and, bounding P to verify $\mathrm{Inf}\, P \not\le (\mathrm{Inf}\, P)'$, for all $P \in F$, the operator

$C_{\infty}(P) = \{B \in [0, 1]^X ; \mathrm{Inf}\, P \le B\}$,

obviously verifying $\mathrm{Inf}\, C_{\infty}(P) = \mathrm{Inf}\, P$, is not only extensive, consistent, and monotonic, but it is also a closure: From $C_{\infty}(P) \in F$, and $P \subseteq C_{\infty}(P)$, it follows $C_{\infty}(P) \subseteq C_{\infty}(C_{\infty}(P))$, but if $B \in C_{\infty}(C_{\infty}(P))$, that is, $\mathrm{Inf}\, C_{\infty}(P) \le B$, from $\mathrm{Inf}\, C_{\infty}(P) = \mathrm{Inf}\, P$ it follows $B \in C_{\infty}(P)$. Finally, $C_{\infty}(C_{\infty}(P)) = C_{\infty}(P)$.

$C_{\infty}$, restricted to finite sets, is just $C_{\min}$ and, restricted to all the crisp sets in $\{0, 1\}^X$, is the consequence operator on which classical propositional calculus is developed [16.3].
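The properties listed above for the weak-deduction operator can be explored on a toy model. The following sketch assumes a two-point universe, membership values restricted to a five-point grid (so that every set of consequences is finite), the negation 1 - id, and the t-norms min and product; the premise sets P and Q and all function names are hypothetical.

```python
from functools import reduce
from itertools import product

GRID = [i / 4 for i in range(5)]              # membership values 0, 0.25, ..., 1
FUZZY_SETS = list(product(GRID, repeat=2))    # grid-valued fuzzy sets on a 2-point X

def leq(a, b):
    """Pointwise order A <= B."""
    return all(x <= y for x, y in zip(a, b))

def neg(a):
    """Strong negation 1 - id, applied pointwise."""
    return tuple(1 - x for x in a)

def lift(t):
    """Pointwise conjunction built from a t-norm t."""
    return lambda a, b: tuple(t(x, y) for x, y in zip(a, b))

MIN, PROD = lift(min), lift(lambda x, y: x * y)

def resume(P, op):
    """A_P: the conjunction of all premises."""
    return reduce(op, P)

def admissible(P, op):
    """P is admissible when A_P is not self-contradictory."""
    a_p = resume(P, op)
    return not leq(a_p, neg(a_p))

def consequences(P, op):
    """C.(P), restricted to the grid-valued fuzzy sets."""
    a_p = resume(P, op)
    return {b for b in FUZZY_SETS if leq(a_p, b)}

P = [(1.0, 0.5), (0.75, 0.75)]
Q = P + [(0.75, 0.25)]

C = consequences(P, PROD)
print(set(P) <= C)                                      # extensive
print(C <= consequences(Q, PROD))                       # monotonic
print(all(neg(b) not in C for b in C))                  # consistent
print(consequences(P, MIN) <= C)                        # C_min is the smallest
print(consequences(consequences(P, MIN), MIN)
      == consequences(P, MIN))                          # closure for min
```

Restricting to a grid is what makes reapplying the operator meaningful here; on the whole of $[0, 1]^X$ the set of consequences is generally infinite, as item a) points out.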


Definition 16.2 
Given a weak-deduction operator C. [16.28]: 


1. The set of $C_{\cdot}$-refutations of P is $\mathrm{Ref}_{\cdot}(P) = \{B \in [0, 1]^X ; B' \in C_{\cdot}(P)\}$.

2. The set of $C_{\cdot}$-conjectures of P is $\mathrm{Conj}_{\cdot}(P) = \{B \in [0, 1]^X ; B' \in C_{\cdot}(P)^{c}\}$.

Namely, refutations are those fuzzy sets whose negation is weakly deducible from the premises, and conjectures those whose negation is not weakly deducible from them. Obviously, $\mathrm{Conj}_{\cdot}(P) = \mathrm{Ref}_{\cdot}(P)^{c}$.

Notice that it immediately follows that all operators $\mathrm{Ref}_{\cdot}$ are consistent and monotonic, but not extensive, and that all operators $\mathrm{Conj}_{\cdot}$ are extensive, antimonotonic, and not consistent; consequently, it cannot be stated that $\mathrm{Conj}_{\cdot}(P)$ is always in $F(\cdot)$. It is $\mathrm{Conj}_{\cdot} : F(\cdot) \to [0, 1]^X$, and not all conjectures can be taken as items of admissible information.

It is also immediate that $\mathrm{Ref}_{\cdot}(P) \cup \mathrm{Conj}_{\cdot}(P) = [0, 1]^X$, and $\mathrm{Ref}_{\cdot}(P) \cap \mathrm{Conj}_{\cdot}(P) = \emptyset$, that is, both sets constitute a partition of the set of all fuzzy sets in X, and $\mathrm{Conj}_{\cdot}(P) = \mathrm{Ref}_{\cdot}(P)^{c}$. Hence, the conjectures are those fuzzy sets that are nonrefutable in front of the information furnished by P.

Since $C_{\min}(P) \subseteq C_{\cdot}(P)$, it follows that:

• $\mathrm{Ref}_{\min}(P) \subseteq \mathrm{Ref}_{\cdot}(P)$: $\mathrm{Ref}_{\min}$ is the smallest among refutation operators.

• $\mathrm{Conj}_{\cdot}(P) \subseteq \mathrm{Conj}_{\min}(P)$: $\mathrm{Conj}_{\min}$ is the biggest among conjecture operators. Namely,

$P \subseteq C_{\min}(P) \subseteq C_{\cdot}(P) \subseteq \mathrm{Conj}_{\cdot}(P) \subseteq \mathrm{Conj}_{\min}(P)$,

a chain of inclusions showing that both weak and strong consequences are but a particular type of conjectures. Consequently, in the model deducing is but one of the forms of conjecturing, as it is asked for in [16.27].
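The partition into refutations and conjectures, and the chain of inclusions just stated, can be reproduced on a toy model. The sketch assumes a two-point universe, grid-valued fuzzy sets, and the negation 1 - id; the two résumés are those of the hypothetical premise set $\{(1, 0.5), (0.75, 0.75)\}$ under min and under the product, and all names are illustrative.

```python
from itertools import product

GRID = [i / 4 for i in range(5)]
FUZZY_SETS = set(product(GRID, repeat=2))   # grid-valued fuzzy sets on a 2-point X

def leq(a, b):
    """Pointwise order A <= B."""
    return all(x <= y for x, y in zip(a, b))

def neg(a):
    """Strong negation 1 - id, applied pointwise."""
    return tuple(1 - x for x in a)

def consequences(a_p):
    """C.(P) = {B; A_P <= B}, given the resume A_P."""
    return {b for b in FUZZY_SETS if leq(a_p, b)}

def refutations(a_p):
    """Ref.(P) = {B; B' in C.(P)}."""
    c = consequences(a_p)
    return {b for b in FUZZY_SETS if neg(b) in c}

def conjectures(a_p):
    """Conj.(P) = complement of Ref.(P)."""
    return FUZZY_SETS - refutations(a_p)

PREMISES = {(1.0, 0.5), (0.75, 0.75)}
A_MIN = (0.75, 0.5)      # resume of PREMISES with . = min
A_PROD = (0.75, 0.375)   # resume of PREMISES with . = product

chain = (PREMISES <= consequences(A_MIN) <= consequences(A_PROD)
         <= conjectures(A_PROD) <= conjectures(A_MIN))
print(chain)
print(refutations(A_MIN) <= refutations(A_PROD))
```

The résumés are passed in directly to keep the sketch short; computing them from the premises only requires folding the chosen conjunction over the set.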


Remarks 16.3 
1. Only if it is $\cdot = \min$, it holds: $\mathrm{Ref}_{\min}(C_{\min}(P)) = \mathrm{Ref}_{\min}(P)$, and $\mathrm{Conj}_{\min}(C_{\min}(P)) = \mathrm{Conj}_{\min}(P)$, showing that strong consequences neither allow to obtain more refutations, nor more conjectures.

2. Since $A_0 \in \mathrm{Ref}_{\cdot}(P)$, it is $\mathrm{Ref}_{\cdot}(P) \ne \emptyset$. Nevertheless, $\mathrm{Ref}_{\cdot}(P)$ cannot coincide with the full set $[0, 1]^X$, since it would imply $A_P = A_0$.

3. The sets $\mathrm{Conj}_{\cdot}(P)$ cannot be empty since it would imply $C_{\cdot}(P) = [0, 1]^X$. On the other side, it is $\mathrm{Conj}_{\cdot}(P) = [0, 1]^X \Leftrightarrow C_{\cdot}(P) = \emptyset$. Hence, it is always

$\emptyset \ne \mathrm{Conj}_{\cdot}(P) \ne [0, 1]^X$.

In this model, the empty set $A_0$ cannot be taken for either conjecturing, or deducing, or refuting.

4. The particularization of the concept of conjecture to crisp sets, that is, to the fuzzy sets in $\{0, 1\}^X \subset [0, 1]^X$, reduces to taking as $F(\min)$ the set of those crisp sets that are nonempty, since with crisp sets it is $A \subseteq A^{c} \Leftrightarrow A = \emptyset$, and thus $\mathrm{Conj}_{\cdot}(P)$ is the set of those $B \subseteq X$ such that $A_P \cap B \ne \emptyset$, since with crisp sets it is $A \subseteq B^{c} \Leftrightarrow A \cap B = \emptyset$.

With classical sets, that is, in Boolean algebras, there is no distinction between contradiction and incompatibility.

5. After the classical definition: B is decidable $\Leftrightarrow$ either B is provable, or not B is provable, the set of $\cdot$-weakly decidable elements for P can be defined by $C_{\cdot}(P) \cup \mathrm{Ref}_{\cdot}(P)$, and the set of strongly decidable elements for P by $C_{\min}(P) \cup \mathrm{Ref}_{\min}(P)$. Obviously, and for all operations $\cdot$, strongly decidable elements are $\cdot$-weakly decidable ones, but not reciprocally [16.32].

The nonstrongly decidable elements are those fuzzy sets in the set

$(C_{\min}(P) \cup \mathrm{Ref}_{\min}(P))^{c} = C_{\min}(P)^{c} \cap \mathrm{Ref}_{\min}(P)^{c} = C_{\min}(P)^{c} \cap \mathrm{Conj}_{\min}(P)$,

that is, they are the conjectures that are not strong consequences. Analogously, given P, the $\cdot$-weakly nondecidable elements are those fuzzy sets that are $\cdot$-weak conjectures, but not $\cdot$-weak consequences. Consequently, to obtain a classification of the difference sets $\mathrm{Conj}_{\cdot}(P) - C_{\cdot}(P)$, and $\mathrm{Conj}_{\min}(P) - C_{\min}(P)$, is actually important.

6. Since $\mathrm{Ref}_{\cdot}$ is monotonic and consistent, it could be alternatively taken for defining conjectures [16.28] in the parallel form

$\mathrm{Conj}^{*}_{\cdot}(P) = \{B \in [0, 1]^X ; B' \notin \mathrm{Ref}_{\cdot}(P)\} = C_{\cdot}(P)^{c}$,

where conjectures appear as just those fuzzy sets that are not weak consequences instead of those that are not refutations. Under this new definition, it is $\mathrm{Ref}_{\cdot}(P) \subseteq \mathrm{Conj}^{*}_{\cdot}(P)$. Notice that refutations B can be defined without directly referring to $C_{\cdot}$, by

$\mathrm{Ref}_{\cdot}(P) = \{B \in [0, 1]^X ; A_P \le B'\}$.

7. With all that, a tentative formal definition for a model of (Commonsense) Reasoning (CR) could be essayed by posing à la Popper [16.13],

$\mathrm{CR}_{\cdot}(P) = \mathrm{Ref}_{\cdot}(P) \cup \mathrm{Conj}_{\cdot}(P)$,

once the operations $\cdot$ and $'$ are selected, and $C_{\cdot}(P)$ defined with, at least, $A_P \ne A_0$. Anyway, this definition gives nothing else than $\mathrm{CR}_{\cdot}(P) = \mathrm{Conj}_{\cdot}(P) \cup \mathrm{Conj}_{\cdot}(P)^{c} = \mathrm{Ref}_{\cdot}(P) \cup \mathrm{Ref}_{\cdot}(P)^{c}$, with which the formal model for CR appears as nothing else than either conjecturing, or refuting, once $\cdot$-weak consequences are taken as the basic concept toward formalizing CR.

Thus, a new concept of strong-CR can be introduced by

$\mathrm{CR}_{\min}(P) = \mathrm{Ref}_{\min}(P) \cup \mathrm{Conj}_{\min}(P)$,

and in both cases deduction (weak and strong, respectively) is a type of conjecturing [16.27].

8. The imposed commutativity and associativity of the conjunction can be avoided by previously fixing an algorithm to define $A_P$. For instance, if P contains four premises, then the algorithm's steps can be the following: 1) Select an order for the premises and call them $A_1, \ldots, A_4$. 2) Define $A_P$ by $A_1 \cdot (A_2 \cdot (A_3 \cdot A_4))$. Then, of course, all that has been formerly said depends on the way chosen to define $A_P$.

For what refers to the restriction on the negation $'$, notice that it is already verified in the cases (usual in fuzzy logic) where it is strong: $A'' = A$, for all $A \in [0, 1]^X$, as is the case in the standard algebras of fuzzy sets.
16.7 Toward a Classification of Conjectures 
Provided it is $\mathrm{Conj}_{\cdot}(P) - C_{\cdot}(P) \ne \emptyset$, it is clear that the left-hand difference set is equal to

$\{B \in \mathrm{Conj}_{\cdot}(P) : B < A_P\} \cup \{B \in \mathrm{Conj}_{\cdot}(P) : A_P \text{ is not } \le\text{-comparable with } B\}$.

That is,

$\mathrm{Conj}_{\cdot}(P) = C_{\cdot}(P) \cup \{B \in [0, 1]^X ; A_P \not\le B' \ \&\ A_0 < B < A_P\} \cup \{B \in [0, 1]^X ; A_P \not\le B' \ \&\ A_P \,\mathrm{nc}\, B\}$,

with the symbol nc shortening not $\le$-comparable with. Notice that the second set in this union contains the fuzzy sets B being neither empty, nor contradictory with $A_P$, and for which If B, then $A_P$. Consequently, it can be said that these fuzzy sets B explain $A_P$, or P, and, of course, they also explain any $\cdot$-weak consequence of P: If $A_P \le C$, it follows $B \le C$. Let us denote this set of conjectures by $\mathrm{Hyp}_{\cdot}(P)$, and call it the set of explicative conjectures or, for short, hypotheses for P. If $C_{\cdot}(P) \in F(\cdot)$, it is clear that $\mathrm{Hyp}_{\cdot}(C_{\cdot}(P)) = \mathrm{Hyp}_{\cdot}(P)$, as also happens with the strongest conjunction $\cdot = \min$.


For what concerns the third set in the last union, call it $\mathrm{Sp}_{\cdot}(P)$, it is decomposable in the disjoint union

$\{B \in [0, 1]^X ; B' < A_P \ \&\ A_P \,\mathrm{nc}\, B\} \cup \{B \in [0, 1]^X ; A_P \,\mathrm{nc}\, B' \ \&\ A_P \,\mathrm{nc}\, B\}$,

whose elements will be called speculative conjectures or, for short, speculations. Let us, respectively, denote by $\mathrm{Sp}_{\cdot 1}(P)$ and $\mathrm{Sp}_{\cdot 2}(P)$ the first and the second set in the decomposition. The elements in $\mathrm{Sp}_{\cdot i}(P)$ will be called type-i speculations (i = 1, 2).

It should be pointed out that the symbol nc shows the jumps cited before, and that these jumps affect both types of speculations but, especially, those of type-2. In the case in which $B \in \mathrm{Sp}_{\cdot 1}(P)$, since $B' < A_P$ is equivalent to $A_P' < B$, B could be captured by going forward from $A_P'$, but if $B \in \mathrm{Sp}_{\cdot 2}(P)$ a jump from either $A_P$, or from $A_P'$, is necessary to reach B.

It should also be pointed out that weak consequences are reached by moving forward from $A_P$, that hypotheses are reached by moving backward from $A_P$, but that for speculations a jump forward from either $A_P$, or $A_P'$, is required. It is clear, in addition, that $\mathrm{Conj}_{\cdot}(P) = \mathrm{Conj}_{\cdot}(\{A_P\})$, $C_{\cdot}(P) = C_{\cdot}(\{A_P\})$, $\mathrm{Hyp}_{\cdot}(P) = \mathrm{Hyp}_{\cdot}(\{A_P\})$, and $\mathrm{Sp}_{\cdot}(P) = \mathrm{Sp}_{\cdot}(\{A_P\})$, since $\{A_P\} \in F(\cdot)$, and in this sense, $A_P$ can be seen as the résumé of the information conveyed by P.

With all that, the set of conjectures from P is completely classified by the disjoint union

$\mathrm{Conj}_{\cdot}(P) = C_{\cdot}(P) \cup \mathrm{Hyp}_{\cdot}(P) \cup \mathrm{Sp}_{\cdot 1}(P) \cup \mathrm{Sp}_{\cdot 2}(P)$,

since all the intersections $C_{\cdot}(P) \cap \mathrm{Hyp}_{\cdot}(P), \ldots, \mathrm{Sp}_{\cdot 1}(P) \cap \mathrm{Sp}_{\cdot 2}(P)$ are empty. Hence, conjectures are either weak consequences, or hypotheses, or type-1 speculations, or type-2 speculations. Consequently, and once given $P \in F(\cdot)$, all the fuzzy sets in $[0, 1]^X$ are classified in refutations, $\cdot$-weak consequences, hypotheses, and speculations of type-1 and of type-2, with strong consequences being a part of the weak ones. It should be pointed out that, in some particular cases, the sets $\mathrm{Hyp}_{\cdot}(P)$, or $\mathrm{Sp}_{\cdot}(P)$, can be empty.
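The classification can be made concrete with a small sketch that labels every fuzzy set of a toy model relative to a given résumé $A_P$. The two-point universe, the grid of membership values, the negation 1 - id, the particular $A_P$, and the class labels are all assumptions of the example; type-1 speculations are taken here as those with $B' < A_P$.

```python
from collections import Counter
from itertools import product

GRID = [i / 4 for i in range(5)]
FUZZY_SETS = list(product(GRID, repeat=2))   # grid-valued fuzzy sets on a 2-point X

def leq(a, b):
    """Pointwise order A <= B."""
    return all(x <= y for x, y in zip(a, b))

def lt(a, b):
    """Strict pointwise order A < B."""
    return leq(a, b) and a != b

def neg(a):
    """Strong negation 1 - id, applied pointwise."""
    return tuple(1 - x for x in a)

def classify(b, a_p):
    """Label B relative to the resume A_P."""
    if leq(a_p, neg(b)):
        return "refutation"        # A_P <= B'
    if leq(a_p, b):
        return "consequence"       # A_P <= B
    if lt(b, a_p):
        return "hypothesis"        # A_0 < B < A_P (B = A_0 is a refutation)
    # From here on A_P and B are not comparable: B is a speculation.
    if lt(neg(b), a_p):
        return "speculation-1"     # B' < A_P
    return "speculation-2"         # B' not comparable with A_P

A_P = (0.75, 0.5)
counts = Counter(classify(b, A_P) for b in FUZZY_SETS)
print(counts)
```

Because the tests are applied in a fixed order and each fuzzy set receives exactly one label, the five classes form a partition of the toy model by construction.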


Note 

It can be said that the above algebraic model for CR 
contains strong and weak deduction, abduction (the 
search for hypotheses), and also speculative reasoning. 


What happens with hypotheses and speculations as regards monotony? Given two enchained sets of premises $P \subseteq Q$, both in $F(\cdot)$, the conjunctions of their premises, call them respectively $A_P$ and $A_Q$, obviously verify $A_Q \le A_P$, and so it is $\mathrm{Hyp}_{\cdot}(Q) \subseteq \mathrm{Hyp}_{\cdot}(P)$: the operator $\mathrm{Hyp}_{\cdot} : F(\cdot) \to [0, 1]^X$ is antimonotonic. Notice that $\mathrm{Hyp}_{\cdot}$ is not a consistent operator and, consequently, it cannot be supposed that $\mathrm{Hyp}_{\cdot}(P)$ always contains admissible information. It is risky to take a hypothesis as a new premise.

Analogously, $\mathrm{Sp}_{\cdot}$ is also a mapping $F(\cdot) \to [0, 1]^X$ that is nonconsistent and, as it is easy to see by means of simple examples with crisp sets, it is neither monotonic nor antimonotonic, nor can elements in $\mathrm{Sp}_{\cdot}(P)$ always be taken as admissible information. That is, since there is no law for the growing of $\mathrm{Sp}_{\cdot}$ with the growing of the premises, it can be said that speculations are peculiar among conjectures. Such peculiarity is somewhat clarified by what follows. If $S \in \mathrm{Sp}_{\cdot}(P)$:


a) If S is such that $A_P \cdot S \ne A_0$, it is $A_P \cdot S < A_P$ since, if it were $A_P \cdot S = A_P$, then $A_P \le S$ follows, or the absurd $S \in C_{\cdot}(P)$. From $A_0 < A_P \cdot S < A_P$, and provided $A_P \cdot S$ is a conjecture, it follows $A_P \cdot S \in \mathrm{Hyp}_{\cdot}(P)$.

b) Since $A_P \le A_P + S$, it is always $A_P + S \in C_{\cdot}(P)$.

c) Provided the law of semiduality $B' + C' \le (B \cdot C)'$ holds in the BFA $([0, 1]^X, \cdot, +, ')$, as it happens with $+ = \max$, and since $A_P \le A_P + S' \le (A_P' \cdot S)'$, it follows that $A_P' \cdot S$ is a refutation.


Hence, by means of speculations S, hypotheses $A_P \cdot S$ can be obtained provided $A_P \cdot S \ne A_0$ is a conjecture, $A_P + S$ is always a weak consequence, and with semiduality $A_P' \cdot S$ is a refutation. In this sense, speculations can serve as a tool for deducing, for abducing, and for refuting: They are auxiliary conjectures for either deploying what is hidden in the premises, or for refuting the premises, or for explaining them. For this reason, to speculate is an important type of nonruled reasoning, whose mastery should be encouraged and learned.
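With crisp sets the three uses of a speculation S can be followed step by step. The sketch below assumes a hypothetical six-element universe, represents crisp sets as Python frozensets (intersection as conjunction, union as disjunction, set complement as negation), and picks $A_P$ and S only for illustration.

```python
# Hypothetical universe of discourse.
X = frozenset(range(1, 7))

def neg(a):
    """Crisp negation: complement in X."""
    return X - a

def kind(b, a_p):
    """Label the crisp set B relative to the resume A_P."""
    if a_p <= neg(b):
        return "refutation"      # A_P and B are disjoint
    if a_p <= b:
        return "consequence"     # A_P is contained in B
    if b < a_p:
        return "hypothesis"      # nonempty B strictly inside A_P
    return "speculation"         # B overlaps A_P without being comparable

A_P = frozenset({1, 2, 3})
S = frozenset({3, 4})            # a speculation: overlaps A_P, not comparable

print(kind(S, A_P))              # the speculation itself
print(kind(A_P & S, A_P))        # A_P.S
print(kind(A_P | S, A_P))        # A_P + S
print(kind(neg(A_P) & S, A_P))   # A_P'.S
```

The same labels could be obtained for fuzzy sets, but then the outcome for $A_P \cdot S$ depends on the chosen conjunction and on whether it remains a conjecture, as item a) requires.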


A Remark on Heuristics 
Although the concept of heuristics is not yet formalized, from the last paragraphs, and under a few soft constraints, speculations can be seen as auxiliary conjectures that mediate in advancing reasoning. Since to reach a speculation S a jump from $A_P$ should be taken, because there are no direct and step-by-step links between $A_P$ and S, the process of arriving through S at either a consequence $A_P + S$, or a hypothesis $A_P \cdot S$, or a refutation $A_P' \cdot S$, is a typically heuristic one, perhaps obtained in each case by some nonstep-by-step rule of thumb. For instance, a hypothesis like $H = A_P \cdot S$ can be reached after some heuristic path leading to S by some jumping from $A_P$ that, additionally, can be done in several different steps.

In this respect, the formal characterization of those hypotheses that are reducible to the form $A_P \cdot C$ for some fuzzy set $C \in [0, 1]^X$ is yet an open problem. Since in the case of orthomodular lattices and, of course, of Boolean algebras, it was proven that all hypotheses are reducible [16.1], but that in the case of nonorthomodular ortholattices nonreducible ones should exist, given P it can analogously be supposed that nonreducible hypotheses exist in $[0, 1]^X$. Consequently, if existing, such hypotheses can be seen as isolated ones that cannot be reached through a speculation, but only directly by a particular heuristic backward track from $A_P$.

For what concerns hypotheses, and except in formal deductive reasoning where, in principle, they can be either safely accepted or refused through deduction, in CR a crucial point is to know how a hypothesis can be deductively or inductively refuted [16.28, 33]. As is well known, the idea of refuting a hypothesis is central in scientific research [16.13], and it can be formalized in the current model as follows [16.32].


Let us suppose that there is a doubt about which one of the two statements, $H \in \mathrm{Hyp}_{\cdot}(P)$ and $H \notin \mathrm{Hyp}_{\cdot}(P)$, is valid, but knowing that $H \not\le H'$, that is, that the presumed hypothesis is not self-contradictory:

a) Provided the first statement is valid, since $H < A_P$ and $A_P \le C$ imply $H \le C$, it is $C_{\cdot}(P) \subseteq C_{\cdot}(\{H\})$. Thus, if there is $D \in C_{\cdot}(P)$ such that $D \notin C_{\cdot}(\{H\})$, H cannot be a hypothesis for P: To weak-deductively refute H as a hypothesis for P, it suffices to find a weak consequence of P that is not a weak consequence of $\{H\}$.

Of course, classical (strong) deductive refutation corresponds to the case in which it is possible to take $\cdot = \min$.

b) In addition, it is $C_{\cdot}(P) \subseteq \mathrm{Conj}_{\cdot}(\{H\})$. Indeed, $A_P \le B$ and $H < A_P$ do not allow $B \le H'$ since, in this case, it would follow $H \le H'$, contrary to the assumption on H. Thus, it is $B \not\le H'$, or $B \in \mathrm{Conj}_{\cdot}(\{H\})$. Consequently, to weak-inductively refute H as a hypothesis for P it suffices to find a weak consequence of P that is not conjecturable from $\{H\}$.

All that can be, mutatis mutandis, repeated with the strongest conjunction $\cdot = \min$, and the concept of the strong-inductive refutation of H is reached.


16.8 Last Remarks


In former papers of the author [16.28, 32, 34], the concept of a conjecture was formalized in the settings of ortholattices, and De Morgan algebras. In the first, and since $A_P \le A_P'$ implies $A_P = 0$, it suffices to take $A_P \ne 0$. What lacked was the case of the standard algebras of fuzzy sets with a t-norm and a t-conorm different, respectively, from min and max, a case that is subsumed in what is presented in [16.28], and now is completed in this chapter.

With only crisp sets, the résumé $A_P$ of the information conveyed by the premises in P obviously verifies $A_P \le B' \Leftrightarrow A_P \cdot B = A_0 \Leftrightarrow A_P \cdot B \le (A_P \cdot B)'$, but this chain of equivalences fails to hold if $A_P$ or B are proper fuzzy sets. Consequently, and in addition to $C_{\cdot}(P) = \{B ; A_P \le B\}$, the two operators [16.32]

$C_{\cdot}^{1}(P) = \{B ; A_P \cdot B' = A_0\}$, and
$C_{\cdot}^{2}(P) = \{B ; A_P \cdot B' \le (A_P \cdot B')'\}$,

can also be considered, with which the corresponding conjecture operators could also be defined by

$\mathrm{Conj}_{\cdot}^{i}(P) = \{B \in [0, 1]^X ; B' \notin C_{\cdot}^{i}(P)\}$, i = 1, 2,

by taking consistency in the other two forms cited in Sect. 16.5.

Both operators $C_{\cdot}^{i}$ are not extensive, but monotonic; at least one of them is consistent if the negation is functionally expressible, and it is unknown whether the other is consistent; it is $C_{\cdot}^{1}(P) \subseteq C_{\cdot}^{2}(P)$, and it is not actually clear if in both cases it is, or it is not, $C_{\cdot}^{i}(P) \subseteq \mathrm{Conj}_{\cdot}^{i}(P)$, except when $\cdot = \min$ [16.28].

Consequently, the door is open to consider alternative definitions for the concept of a conjecture depending on the way of defining when an element $B \in [0, 1]^X$ is consistent with the résumé $A_P$. Anyway, what seems actually difficult is how to imagine a kind of nonconsistent deduction.
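The chain of equivalences stated for crisp sets in these last remarks, and its failure for proper fuzzy sets, can be checked on a toy model. The sketch assumes a two-point universe, $\cdot = \min$, and the negation 1 - id; the fuzzy counterexample and all names are chosen only for illustration.

```python
from itertools import product

def leq(a, b):
    """Pointwise order A <= B."""
    return all(x <= y for x, y in zip(a, b))

def neg(a):
    """Strong negation 1 - id, applied pointwise."""
    return tuple(1 - x for x in a)

def meet(a, b):
    """Conjunction with . = min, applied pointwise."""
    return tuple(min(x, y) for x, y in zip(a, b))

def conditions(a_p, b):
    """The three forms of 'A_P is inconsistent with B':
    A_P <= B',  A_P.B = A_0,  A_P.B <= (A_P.B)'."""
    ab = meet(a_p, b)
    zero = tuple(0 for _ in ab)
    return (leq(a_p, neg(b)), ab == zero, leq(ab, neg(ab)))

# All crisp sets on a 2-point universe, as 0/1 tuples.
CRISP = list(product([0, 1], repeat=2))

# With crisp sets the three conditions always agree.
crisp_agree = all(len(set(conditions(a, b))) == 1
                  for a in CRISP for b in CRISP)
print(crisp_agree)

# With proper fuzzy sets they can disagree.
print(conditions((0.5, 0.5), (0.5, 0.5)))
```

In the fuzzy case shown, the first and third conditions hold while the second fails, which is why the three forms lead to different candidate operators.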


It is well known that in CR conclusions are often obtained by some kind of analogy or similitude with a previously considered case. Without trying to completely formalize analogical reasoning, let us introduce some ideas that, eventually, could lead toward such formalization.

Define B is analogous to A, if there exists a family K of mappings

$\sigma : [0, 1]^X \to [0, 1]^X$,

such that $B = \sigma \circ A$, for $\sigma \in K$. Namely, B is K-analogous to A. In turn, the set of fuzzy sets that are analogous or similar to those in P can be defined by

$K(A_P) = \{B \in [0, 1]^X ; B = \sigma \circ A_P, \sigma \in K\}$.

Then:

1) If $\mathrm{id} \le \sigma \Rightarrow A_P \le \sigma \circ A_P \Rightarrow \sigma \circ A_P \in C_{\cdot}(P)$

2) If $\sigma < \mathrm{id} \Rightarrow \sigma \circ A_P < A_P \Rightarrow \sigma \circ A_P \in \mathrm{Hyp}_{\cdot}(P)$

3) If $\mathrm{id} \,\mathrm{nc}\, \sigma \Rightarrow \sigma \circ A_P \,\mathrm{nc}\, A_P \Rightarrow \sigma \circ A_P \in \mathrm{Sp}_{\cdot}(P)$

4) If $\mathrm{id} \le \sigma' \Rightarrow A_P \le \sigma' \circ A_P = (\sigma \circ A_P)' \Rightarrow \sigma \circ A_P \in \mathrm{Ref}_{\cdot}(P)$.

Depending on the possible ordering of the pairs $(\mathrm{id}, \sigma)$ and $(\mathrm{id}, \sigma')$, with id the identity mapping in $[0, 1]^X$, that is, $\mathrm{id}(A) = A$ for all $A \in [0, 1]^X$, the fuzzy sets analogous to $A_P$ are either consequences or hypotheses, speculations, or refutations. Notice that in point 4) it is $\mathrm{id} \le \sigma'$ equivalent to $\sigma \le \mathrm{id}'$, that is, $\sigma \circ A \le A'$ for all fuzzy sets A.

Of course, for a further study of analogical types of reasoning it remains to submit the transformations $\sigma$ to some restricting properties, surely depending on the concrete case under consideration.


16.9 Conclusions


The establishment of a general framework for the several types of reasoning comprised in Commonsense Reasoning seems to be of the utmost importance. At least, it should be so for fuzzy researchers in the new field of Computing with Words (CWW) that, with a calculus able to simulate reasoning, tries to deal with sentences and arguments in natural language more complex than those considered in today's current fuzzy logic.

On the way toward a full development of CWW covering as many scenarios as possible of people's ways of reasoning, it seems relevant not to forget the large amount of nondeductive reasoning people commonly do. This chapter just offers a wide framework to jointly consider the four modalities of deduction, abduction, speculation, and refutation, typical of both ordinary and specialized reasoning, where deduction and refutation are deductive modalities of reasoning, but where abduction and speculation can be considered its inductive modalities [16.35, 36].

There remain some open questions that concern the proposed model and, among them, the two following ones can be posed:

• The finding of rules in Mill's style [16.37], for obtaining hypotheses and speculations from the premises. These rules could lead to computer programs or algorithms able, in some cases, to find either hypotheses or speculations, and can also be useful for clarifying the concept of a heuristics.

• The study of what happens with the conjectures once new consistent information, supplied by new premises, is added to the initial set P [16.38]. It should not be forgotten that ordinary reasoning is rarely made from a static initial set of items of information, but that the information comes in a kind of flux under which conjectures can vary in number and in character.


Just as speculations facilitate heuristics for finding consequences, hypotheses, and refutations, analogical reasoning could constitute a good trick for obtaining conjectures and refutations, on the basis of some earlier solved similar problems. It should be pointed out that what is not yet clear enough is how to compute the degree of reliability an analogical conclusion could deserve.

What is not addressed in this chapter is the theoretically and practically important problem of which is the best hypothesis or speculation to be selected in each particular case. This question seems linked with the translation into the conclusions of some numerical weights of confidence previously attributed to the premises, and that depend on the case under consideration. Because in this chapter such weights are considered neither for premises nor for conclusions, the presented model can be qualified as a crisp one, but not yet as a fuzzy one, since it fails to take into account the level of reliability of the sentences represented in fuzzy terms. The semantics of the linguistic terms is here confined, through its most careful possible design [16.15, 16], to the contextual and purpose-driven meaning of the involved fuzzy sets and fuzzy connectives, as well as to the possible linguistic interpretation of the accepted conclusions, but what is not yet taken into account is their degree of reliability.

By viewing formal theories as abstract constructions that could help us to reach a better understanding of a subject inscribed in some reality, this chapter presents a formal theory of reasoning with precise and imprecise linguistic terms represented by fuzzy sets, already introduced in [16.28]. It consists of a way of formalizing conjectures and refutations in the wide mathematical setting of BFAs, whose axioms cover the particular instances of ortholattices, De Morgan algebras, standard algebras of fuzzy sets, and, of course, Boolean algebras. As is shown by the two forms of defining what weak deduction could be, which depend on the kind of consistency chosen and which, with crisp sets, both collapse into the classical case, this formalization cannot yet be considered definitive, but remains open to further study.

Nevertheless, this new theory should not be confused with actual human reasoning. Only through experimental work on CR will it be possible to clarify what degree of agreement with the reality of reasoning is shown by the selected way of defining weak deduction, or by a different one. Provided this kind of work can be done, the appearance of observed regularities in CR, or observed laws of actual ordinary reasoning, that can be predicted by some invariants in the model is a very important topic for future research.

Anyway, in order to advance toward a kind of Experimental Science of CR, there are today no answers to some crucial questions such as, for instance:

• Which regularities exist in natural language and in CR that, reflected by some invariants in the model, can be submitted to experimentation?

• How could observation and experiments on CR be systematically designed and programmed?

• Which quantities linked to such realities can exist in the model and could show numerical effects measurable through experiments?

One of the reasons for such ignorance could be both the small number of axioms of the BFA model and the excessive number of axioms of the standard algebras of fuzzy sets, to say nothing of the Boolean and De Morgan algebras. When the number of axioms is too small, it seems difficult to think of a way to find invariants, and when there are too many axioms, the large number of relationships among them seems to produce too many theoretical invariants, such as, for instance, the indistinguishability between contradiction and incompatibility in Boolean algebras.

That ignorance can also come from the lack of numerical quantities associated with some important properties of both natural language and CR, such as, for instance, a degree of associativity when the model is just a BFA, or a degree of liability of speculations that would allow reaching refutations for some reality described by a set P of premises. It can also be the lack of a standard degree of similarity (like the one studied in [16.11]) to reflect numerically how analogous a known statement is to a possible hypothesis for P, so that it can itself be taken as the hypothesis.

Without numerical quantities reflecting important characteristics of, for instance, words, it is difficult to think of how to design experiments for detecting the regularities or invariants without which a deep scientific study of CR seems very difficult. Possible instances of such quantities are the measures of both the specificity [16.39] and the fuzziness [16.40] of imprecise predicates, and an experiment to be designed could be, perhaps, an analysis of how their values jointly vary in some context.

Actually and currently, an Experimental Science of CR is just a dream. May this chapter also serve as a humble peal of a bell toward such a goal!


References

16.1 E. Trillas, A. Pradera, A. Alvarez: On the reducibility of hypotheses and consequences, Inf. Sci. 179(23), 3957-3963 (2009)
16.2 I. Garcia-Honrado, E. Trillas: An essay on the linguistic roots of fuzzy sets, Inf. Sci. 181, 4061-4074 (2011)
16.3 E. Trillas, I. Garcia-Honrado: Hacia un replanteamiento del cálculo proposicional clásico?, Agora 32(1), 7-25 (2013)
16.4 E. Trillas, C. Alsina, E. Renedo: On some classical schemes of reasoning in fuzzy logic, New Math. Nat. Comput. 7(3), 433-451 (2011)


An Algebraic Model of Reasoning to Support Zadeh's CWW 


16.5 A. Pradera, E. Trillas: A reflection on rationality, guessing, and measuring, IPMU, Annecy (2002) pp. 777-784
16.6 K. Menger: Morality, Decision and Social Organization (Reidel, Dordrecht 1974)
16.7 L.A. Zadeh: Computing with Words. Principal Concepts and Ideas (Springer, Berlin, Heidelberg 2012)
16.8 F.H. Rauscher, G.L. Shaw, K.N. Ky: Listening to Mozart enhances spatial-temporal reasoning, Neurosci. Lett. 185, 44-47 (1995)
16.9 J. Berger: The Success and Failure of Picasso (Pantheon, New York 1989)
16.10 E. Trillas, C. Moraga, S. Guadarrama, S. Cubillo, E. Castiñeira: Computing with antonyms, Stud. Fuzziness Soft Comput. 217, 133-153 (2007)
16.11 E. Castiñeira, S. Cubillo, E. Trillas: On a similarity ratio, Proc. EUSFLAT-ESTYLF (1999) pp. 239-242
16.12 E. Wigner: The unreasonable effectiveness of mathematics in the natural sciences, Commun. Pure Appl. Math. 13(1), 1-14 (1960)
16.13 K. Popper: Conjectures and Refutations (Harper Row, New York 1968)
16.14 E. Trillas: Reasoning: In black & white?, Proc. NAFIPS (2012)
16.15 E. Trillas, S. Guadarrama: Fuzzy representations need a careful design, Int. J. Gen. Syst. 39(3), 329-346 (2010)
16.16 E. Trillas, C. Moraga: Reasons for a careful design of fuzzy sets, Proc. EUSFLAT (2013), Forthcoming
16.17 K. Menger: A counterpart of Occam's Razor, Synthese 13(4), 331-349 (1961)
16.18 A.C. Masquelet: Le Raisonnement Médical (PUF/Clarendon, Oxford 2006), in French
16.19 E. Trillas, C. Alsina, E. Renedo: On some schemes of reasoning in fuzzy logic, New Math. Nat. Comput. 7(3), 433-451 (2011)
16.20 E. Trillas, L.A. Urtubey: Towards the dissolution of the Sorites paradox, Appl. Soft Comput. 11(2), 1506-1510 (2011)
16.21 A. Tarski: Logic, Semantics, Metamathematics (Hackett, Cambridge 1956)
16.22 J.F. Sowa: E-mail to P. Werbos, 2011, in BISC-GROUP
16.23 C.S. Peirce: Deduction, induction, and hypothesis, Popul. Sci. Mon. 13, 470-482 (1878)
16.24 J.F. Sowa: Knowledge Representation (Brooks/Cole, Farmington Hills 2000)
16.25 R. Reiter: Nonmonotonic reasoning, Annu. Rev. Comput. Sci. 2, 147-186 (1987)
16.26 W. Ockham: Summa Logica (Parker, London 2012), in Spanish and Latin
16.27 W. Whewell: Novum Organum Renovatum (Second Part of the Philosophy of Inductive Sciences) (Parker, London 1858)
16.28 E. Trillas: A model for Crisp Reasoning with fuzzy sets, Int. J. Intell. Syst. 27, 859-872 (2012)
16.29 P.R. Halmos: Naive Set Theory (Van Nostrand, Amsterdam 1960)
16.30 A. Pradera, E. Trillas, E. Renedo: An overview on the construction of fuzzy set theories, New Math. Nat. Comput. 1(3), 329-358 (2005)
16.31 G. Birkhoff: Lattice Theory (American Mathematical Society, New York 1967)
16.32 I. Garcia-Honrado, E. Trillas: On an attempt to formalize guessing. In: Soft Computing in Humanities and Social Sciences, ed. by R. Seising, V. Sanz (Springer, Berlin, Heidelberg 2012) pp. 237-255
16.33 E. Trillas, S. Cubillo, E. Castiñeira: On conjectures in orthocomplemented lattices, Artif. Intell. 117(2), 255-275 (2000)
16.34 E. Trillas, D. Sánchez: Conjectures in De Morgan algebras, Proc. NAFIPS (2012)
16.35 S. Baker: Induction and Hypotheses (Cornell Univ. Press, Ithaca 1957)
16.36 B. Bosanquet: Logic or the Morphology of Knowledge, Vol. I (Clarendon, Oxford 1911)
16.37 J.S. Mill: Sir William Hamilton's Philosophy and the Principal Philosophical Questions Discussed in His Writings (Longmans Green, London 1889)
16.38 I. García-Honrado, A.R. de Soto, E. Trillas: Some (Unended) queries on conjecturing, Proc. 1st World Conf. Soft Comput. (2011) pp. 152-157
16.39 L. Garmendia, R.R. Yager, E. Trillas, A. Salvador: Measures of fuzzy sets under T-indistinguishabilities, IEEE Trans. Fuzzy Syst. 14(4), 568-572 (2006)
16.40 A. De Luca, S. Termini: A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inf. Control 20, 301-312 (1972)




17. Fuzzy Control

Christian Moewes, Ralf Mikut, Rudolf Kruse


Fuzzy control is by far the most successful field of applied fuzzy logic. This chapter discusses human-inspired concepts of fuzzy control. After a short introduction to classical control engineering, three types of very well-known fuzzy control concepts are presented: Mamdani-Assilian, Takagi-Sugeno, and fuzzy logic-based controllers. Then three real-world fuzzy control applications are discussed. The chapter ends with a conclusion and a future perspective.


17.1 Knowledge-Driven Control
17.2 Classical Control Engineering
17.3 Using Fuzzy Rules for Control
     17.3.1 Mamdani-Assilian Control
     17.3.2 Takagi-Sugeno Control
     17.3.3 Fuzzy Logic-Based Controller
17.4 A Glance at Some Industrial Applications
     17.4.1 Engine Idle Speed Control
     17.4.2 Flowing Shift-Point Determination
17.5 Automatic Learning of Fuzzy Controllers
     17.5.1 Transfer Passenger Analysis Based on FCM
17.6 Conclusions
References


17.1 Knowledge-Driven Control


Without doubt, the biggest success of fuzzy logic in industrial and commercial applications has been achieved by fuzzy control. From its first practical use for a simple dynamic plant by Mamdani [17.1] to attention-getting applications such as the automatic train operation in Sendai, Japan [17.2], fuzzy control systems have become indispensable in industry today (to date, more than 60000 patents using the words fuzzy and control have been filed worldwide according to [17.3]). A wide range of real-world applications has also been described by Hirota [17.4], Terano and Sugeno [17.5], and Precup and Hellendoorn [17.6].

Simply speaking, fuzzy control is a way of defining a nonlinear table-based controller. Every entry in such a table can be seen as partial knowledge about the specified input-output behavior [17.7]. However, knowledge does not have to exist for every input-output combination. Thus, the transition function of a fuzzy controller is typically a nonlinear interpolation between defined regions of this knowledge. The knowledge is commonly stored as imprecise rules consisting of imprecise terms such as small, big, cold, or warm. Consequently, these rules lead to an imprecisely defined transition function that is eventually defuzzified if a crisp decision is needed.

This procedure is sometimes advantageous when compared to classical control systems, especially for control problems that are usually solved intuitively by human beings but not by computing machines, e.g., parking a car, riding a bike, or boiling an egg [17.8]. This might also be a reason why fuzzy control did not originate from control engineering researchers. Rather, it was inspired by Zadeh [17.9], who proposed rule-based systems for handling complex systems using fuzzy sets, a concept he had introduced 8 years earlier [17.10].

The focus of this chapter is a thorough discussion of such human-inspired concepts of fuzzy control. Other fuzzy control approaches based on a fuzzification of well-known methods of classical control theory (fuzzy PID control, fuzzy adaptive control, stability of fuzzy systems, fuzzy sliding mode, fuzzy observer, etc.) are only described briefly. For these topics, the reader is referred to other textbooks, e.g., Tanaka and Wang [17.11].

But before we formally present the concepts of fuzzy control, let us give a brief introduction to classical control engineering in Sect. 17.2. Then we cover fuzzy control in detail in Sect. 17.3, including the most well-known approaches of Mamdani and Assilian, Takagi and Sugeno, and truly fuzzy-logic-based controllers in Sects. 17.3.1, 17.3.2, and 17.3.3, respectively. We also talk about their advantages and limitations. We discuss some more recent industrial applications in Sect. 17.4 and automatic learning strategies in Sect. 17.5. Finally, we conclude our presentation of fuzzy control in Sect. 17.6.


17.2 Classical Control Engineering


To introduce the problem of controlling a process, let us consider a technical system for which we dictate a desired behavior [17.12]. Generally speaking, we wish to reach a desired set value for a time-dependent output variable of the process. This output is influenced by a variable that we can manipulate, i.e., the control variable. Last but not least, to deal with unexpected influences, let a time-dependent disturbance variable be given that affects the output, too.

Then the current control value is typically specified by mainly two components, i.e., the present measurement value of the output variable $\xi$ and the variation of the output $\Delta\xi = \frac{\mathrm{d}\xi}{\mathrm{d}t}$, and by further variables which we do not specify here. We refer to $n$ input variables $\xi_1 \in X_1, \dots, \xi_n \in X_n$ of the controller (e.g., computed from the measured output variable of the process and its desired values) and one control variable $\eta \in Y$. Formally, the solution of a control problem is a desired control function $\varphi : X_1 \times \dots \times X_n \to Y$ which sets a suitable control value $y = \varphi(\mathbf{x})$ for every input tuple $\mathbf{x} = (x^{(1)}, x^{(2)}, \dots, x^{(n)}) \in X_1 \times \dots \times X_n$. Controllers with multiple outputs are often handled as independent controllers with one output each.

In classical control, we can determine $\varphi$ using different techniques. The most popular one for practical applications is the use of simple standard controllers such as the so-called PID controller. This controller uses three parameters to compute the control variable as a weighted sum of proportional, integral, and derivative (PID) components of the error between the output variable and its desired values. In many relevant cases, a good control performance of the closed-loop feedback system with controller and process can be reached by simple tuning heuristics for these parameters. This strategy is successful for many processes that can be described by linear differential equations, either for the overall behavior or at least near all relevant setpoints. More advanced controllers are used in cases with nonlinear process behavior, time-variant process changes, or complicated process dynamics. They require both a mathematical process model based on a set of differential or difference equations and a fitness function to quantify the performance. Here, many different strategies exist, ranging from a setpoint-dependent adaptation of the PID parameters, additional feedforward components to react to setpoint changes or known disturbances, the estimation of internal process states by observers in state-space controllers, the online estimation of unknown process parameters in adaptive controllers, and the use of inverted process models as controllers to robust controllers that can handle bounded parameter changes. For all these controllers, many elaborate design and analysis techniques exist that guarantee an optimal behavior based on a known process model; see, e.g., Åström and Wittenmark [17.13] and Goodwin et al. [17.14].

However, it might be mathematically intractable or even impossible to define the exact differential equations for the process and the controller $\varphi$. In such cases, classical control theory cannot be applied at all. For instance, consider the decision process of human beings and compare it to formal mathematical equations. Many of us have the great ability to control diverse processes without knowing about higher mathematics at all; just think of a preschool child riding a bike or juggling a European football with its foot or even its head.
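The weighted sum of proportional, integral, and derivative components that this section attributes to the PID controller can be sketched in discrete time as follows. This is only an illustration under stated assumptions: the class `PID`, the gains, the step size, and the first-order plant $\dot{y} = -y + u$ used in `simulate` are hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete-time PID controller with three tuning parameters."""
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the control value for one sampling interval of length dt."""
        error = setpoint - measured
        self.integral += error * dt                     # rectangle rule
        derivative = (error - self.prev_error) / dt     # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 2000, dt: float = 0.01) -> float:
    """Drive the hypothetical plant dy/dt = -y + u toward the setpoint 1.0."""
    pid = PID(kp=2.0, ki=1.0, kd=0.0)
    y = 0.0
    for _ in range(steps):
        u = pid.step(1.0, y, dt)
        y += dt * (-y + u)  # explicit Euler step of the plant model
    return y


print(round(simulate(), 3))
```

With these hypothetical gains the integral term removes the steady-state error, so the simulated output settles close to the setpoint; other plants would need other gains, which is exactly the tuning step the text refers to.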




17.3 Using Fuzzy Rules for Control 


Probably the simplest way to obtain a human control 
behavior for a given process is to find out — by ask- 
ing direct questions for instance — how a person would 
react given a specific situation. One alternative is to ob- 
serve the process to be controlled, e.g., using sensors, 
and then discover substantive information in these sig- 
nals. Both approaches can be seen as knowledge-based 
analysis which eventually provides us with a set of lin- 
guistic rules. We assume that these if—then rules are 
able to properly control the given process. 

Let us briefly outline the operating principle of a fuzzy controller based on such if-then rules. Each rule consists of an antecedent and a consequent part. The former relates to an imprecise description of the crisp measured input, whereas the latter defines a suitable fuzzy output for the input. In order to enable a computing machine to use such linguistic rules, mathematical terms of the linguistic expressions used in the rules need to be properly defined. Once a control input is present, more than one rule might be (partly) fulfilled by it. Thus, there is a need for suitable accumulation methods for these instantiated rules to eventually compute one fuzzy output value. From this value, a crisp output value can be obtained if necessary.

This knowledge-based model of a fuzzy controller is conceptually shown in Fig. 17.1. The fuzzification interface operates on the current input value $x_0$. If necessary, $x_0$ is first mapped to a desired space; one may, for instance, want to normalize the input to the unit interval. The fuzzification interface eventually translates $x_0$ into a linguistic term described by a fuzzy set. The knowledge base is the head of the controller and serves as its database. Here, every essential piece of information is stored, i.e., the variable ranges, domain transformations, and the definition of the fuzzy sets with their corresponding linguistic terms. Furthermore, it comprises the rule base that is required to linguistically describe and control the process. The decision logic computes the fuzzy output value of the given measurement value by taking into account the knowledge base. Last but not least, the defuzzification interface computes a crisp output value from the fuzzy one.
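The fuzzification step described above (normalize the input, then look up its degree of membership in each linguistic term) can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the triangular shapes, the term names in `TERMS`, and the helper names are assumptions.

```python
from typing import Callable, Dict

Membership = Callable[[float], float]


def triangle(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def normalize(x0: float, lo: float, hi: float) -> float:
    """Domain transformation: map [lo, hi] to the unit interval, clamping outliers."""
    return min(1.0, max(0.0, (x0 - lo) / (hi - lo)))


# Hypothetical database of linguistic terms on the normalized domain [0, 1]
TERMS: Dict[str, Membership] = {
    "cold": triangle(-0.5, 0.0, 0.5),
    "warm": triangle(0.0, 0.5, 1.0),
    "hot": triangle(0.5, 1.0, 1.5),
}


def fuzzify(x0: float, lo: float, hi: float) -> Dict[str, float]:
    """Return the membership degree of the measured value in every linguistic term."""
    x = normalize(x0, lo, hi)
    return {term: mu(x) for term, mu in TERMS.items()}


print(fuzzify(30.0, 0.0, 100.0))
```

For a measurement of 30 on a range from 0 to 100, the value belongs partly to "cold" and partly to "warm", which is why several rules can fire at once and their results must be accumulated.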

Two well-known and similar approaches have led to the tremendous use and success of fuzzy control. The Mamdani-Assilian and the Takagi-Sugeno approaches are motivated intuitively in Sects. 17.3.1 and 17.3.2, respectively. They have in common that their interpretation of a linguistic rule diverges from mathematical implications. Both types of controllers rather associate an input, specified as the antecedent part, with an output, given as the consequent part. A mathematically formal approach to fuzzy control, as discussed in Sect. 17.3.3, leads to completely different computations. As it turns out in practice, controllers based on any kind of logical implication are usually too restrictive to suitably control a given process.


17.3.1 Mamdani-Assilian Control 


Just one year after the publication of Zadeh [17.9], Ebrahim Abe Mamdani and his student Sedrak Assilian were the first to successfully control a simple process using fuzzy rules [17.1, 15]. They developed a fuzzy algorithm to control a steam engine based on human expert knowledge in an application-driven way. That is why today we refer to their approach as Mamdani-Assilian control.

The expert knowledge of a Mamdani-Assilian controller needs to be expressed by linguistic rules. Therefore, for every set $X_i$ of given values for an input, we define suitable linguistic terms that summarize or partition this input by fuzzy sets. Let us consider the first set $X_1$, for which we define $p_1$ fuzzy sets $\mu_1^{(1)}, \dots, \mu_{p_1}^{(1)} \in \mathcal{F}(X_1)$. Each of these fuzzy sets is mapped to one preferable linguistic term. Thus, $X_1$ is partitioned by these fuzzy sets. To ensure a better interpretability of each fuzzy set, we recommend using only unimodal membership functions. In doing so, every fuzzy set can be seen as an imprecisely defined value or interval. We furthermore urge choosing disjoint fuzzy sets for every partition, i.e., the fuzzy sets shall satisfy

$$ i \ne j \;\Rightarrow\; \sup_{x \in X_1} \min\left\{\mu_i^{(1)}(x), \mu_j^{(1)}(x)\right\} \le 0.5 . $$

When $X_1$ is partitioned into $p_1$ fuzzy sets $\mu_1^{(1)}, \dots, \mu_{p_1}^{(1)}$, we can continue to partition the remaining sets $X_2, \dots, X_n$ and $Y$ in the same way. The eventual outcome of this procedure (i.e., the linguistic terms associated with the fuzzy sets for each variable) establishes the database in our knowledge base.

Fig. 17.1 Architecture of a fuzzy controller (blocks: knowledge base; fuzzification interface, decision logic, and defuzzification interface; the controlled system supplies the measured values and receives the controller output)

The rule base of any Mamdani-Assilian controller is specified by rules of the form

    if $\xi_1$ is $A^{(1)}$ and ... and $\xi_n$ is $A^{(n)}$ then $\eta$ is $B$,    (17.1)

where $A^{(1)}, \dots, A^{(n)}$ and $B$ symbolize linguistic terms which correspond to the fuzzy sets $\mu^{(1)}, \dots, \mu^{(n)}$ and $\mu$, respectively, according to the fuzzy partitions of $X_1 \times \dots \times X_n$ and $Y$. Thus, the rule base consists of $k$ control rules

    $R_r$: if $\xi_1$ is $A^{(1)}_{i_{1,r}}$ and ... and $\xi_n$ is $A^{(n)}_{i_{n,r}}$ then $\eta$ is $B_{i_r}$,    $r = 1, \dots, k$.

We again underline that these rules are not interpreted logically as mathematical implications. They rather specify the function $\eta = \varphi(\xi_1, \dots, \xi_n)$ piecewise using the existing associations between the known input-output tuples, i.e.,

$$ \eta \approx \begin{cases} B_{i_1} & \text{if } \xi_1 \approx A^{(1)}_{i_{1,1}} \text{ and } \dots \text{ and } \xi_n \approx A^{(n)}_{i_{n,1}}, \\ \;\vdots & \\ B_{i_k} & \text{if } \xi_1 \approx A^{(1)}_{i_{1,k}} \text{ and } \dots \text{ and } \xi_n \approx A^{(n)}_{i_{n,k}}. \end{cases} $$

The control function $\varphi$ is thus composed of partial knowledge that we connect in a disjunctive manner. Thus, Mamdani-Assilian control can be referred to as knowledge-based interpolation.

Now assume that we observe a measurement $\mathbf{x} \in X_1 \times \dots \times X_n$. Naturally, the decision logic applies each rule $R_r$ separately to the measured input. It then computes the degree to which the input $\mathbf{x}$ fulfills the antecedent of $R_r$, i.e., the degree of applicability

$$ \alpha_r = \min\left\{\mu^{(1)}_{i_{1,r}}(x^{(1)}), \dots, \mu^{(n)}_{i_{n,r}}(x^{(n)})\right\} . \qquad (17.2) $$

This degree of applicability literally cuts off the output fuzzy set $\mu_{i_r}$ of the rule $R_r$ at the level $\alpha_r$, which leads to the rule's output fuzzy set

$$ \mu^{\text{output}(R_r)}_{\mathbf{x}}(y) = \min\left\{\alpha_r, \mu_{i_r}(y)\right\} . \qquad (17.3) $$

When the decision logic has done that for all $\alpha_r$, $r = 1, \dots, k$, it unites all output fuzzy sets by a t-conorm. The standard Mamdani-Assilian controller uses the maximum as t-conorm. Thus, the ultimate output fuzzy set is

$$ \mu^{\text{output}}_{\mathbf{x}}(y) = \max_{r = 1, \dots, k} \min\left\{\alpha_r, \mu_{i_r}(y)\right\} . \qquad (17.4) $$

The whole process of evaluating Mamdani-Assilian rules is depicted in Fig. 17.2.

Of course, from a fuzzy set-theoretic and interpretational point of view, it suffices to keep $\mu^{\text{output}}_{\mathbf{x}}(y)$ as the final output value. However, fuzzy controllers are used in real control applications where a crisp control value is definitely needed, e.g., to increase the electric current of a hotplate when boiling an egg. That is why the fuzzy control output $\mu^{\text{output}}_{\mathbf{x}}$ is processed in the defuzzification interface. Depending on the implemented method of defuzzifying $\mu^{\text{output}}_{\mathbf{x}}$, a real value is ultimately obtained. Three methods are used extensively in the literature, i.e., the max criterion method, the mean of maxima (MOM) method, and the center of gravity (COG) method.

The max criterion method simply returns an arbitrary value $y \in Y$ for which $\mu^{\text{output}}_{\mathbf{x}}(y)$ attains a maximum membership degree. However, this arbitrary value picked at random typically results in a nondeterministic control behavior. That is usually undesired, as the interpretability and repeatability of already produced outcomes get lost. The MOM method returns the mean value of the set of elements $y \in Y$ that have maximal membership degrees in the output fuzzy set. Using this approach, it might happen that the defuzzified control value $\eta$ does not even belong to the points leading to maximal membership degrees. Just consider, for instance, a bimodal normal fuzzy set as a fuzzy output. That is why MOM may lead to control actions that one would not expect. Finally, the COG method returns the value at the center of gravity of the fuzzy output area $\mu^{\text{output}}_{\mathbf{x}}$, i.e.,

$$ \eta = \frac{\int_{y \in Y} \mu^{\text{output}}_{\mathbf{x}}(y) \, y \, \mathrm{d}y}{\int_{y \in Y} \mu^{\text{output}}_{\mathbf{x}}(y) \, \mathrm{d}y} . \qquad (17.5) $$

Usually, the COG method is taken to defuzzify the fuzzy output as it typically leads to a smooth control function. Nevertheless, it is possible to obtain an unreasonable output, too. We refer the reader to Kruse et al. [17.16] for a richer treatment of defuzzification methods.
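The evaluation steps (17.2) to (17.5) can be sketched on a discretized output domain. The code below is a hedged illustration only: the two rules, the triangular fuzzy sets `low`, `high`, `small`, and `large`, and the grid `YS` are hypothetical, and the integrals in (17.5) are approximated by sums over an equidistant grid.

```python
from typing import Callable, List, Sequence, Tuple

Membership = Callable[[float], float]
Rule = Tuple[Sequence[Membership], Membership]  # (antecedent sets, consequent set)


def triangle(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with support (a, c) and peak at b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def max_overlap(mu_i: Membership, mu_j: Membership, xs: Sequence[float]) -> float:
    """Grid approximation of sup min{mu_i, mu_j}; should not exceed 0.5."""
    return max(min(mu_i(x), mu_j(x)) for x in xs)


def applicability(rule: Rule, x: Sequence[float]) -> float:
    """Degree of applicability (17.2): minimum over the antecedent degrees."""
    antecedents, _ = rule
    return min(mu(xi) for mu, xi in zip(antecedents, x))


def infer(rules: Sequence[Rule], x: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Output fuzzy set (17.4): clip each consequent at alpha_r (17.3), unite by max."""
    alphas = [applicability(rule, x) for rule in rules]
    return [max(min(a, rule[1](y)) for a, rule in zip(alphas, rules)) for y in ys]


def cog(ys: Sequence[float], mus: Sequence[float]) -> float:
    """Center of gravity (17.5), with the integrals replaced by grid sums."""
    total = sum(mus)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(y * m for y, m in zip(ys, mus)) / total


def mom(ys: Sequence[float], mus: Sequence[float]) -> float:
    """Mean of maxima: average of all grid points with maximal membership."""
    top = max(mus)
    best = [y for y, m in zip(ys, mus) if m == top]
    return sum(best) / len(best)


# Hypothetical knowledge base: two inputs on [0, 10], one output on [0, 10]
low, high = triangle(-10, 0, 10), triangle(0, 10, 20)
small, large = triangle(0, 2.5, 5), triangle(5, 7.5, 10)
RULES: List[Rule] = [((low, low), small), ((high, high), large)]
YS = [i / 10 for i in range(101)]

mus = infer(RULES, (5.0, 5.0), YS)
print(cog(YS, mus), mom(YS, mus))
```

For the input (5, 5) both hypothetical rules fire to the same degree, so the output fuzzy set is bimodal. Both COG and MOM then return a value near the middle of the output domain, where the output membership is zero, which illustrates the caveats mentioned above for both methods.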

Coming back to the theoretical background of Mamdani-Assilian control, let us analyze its type of linguistic rules. Looking at (17.3) again, we see that the minimum serves as fuzzy implication. But the minimum does not reproduce all truth value combinations of the implication in propositional logic. To see this, let us consider $p \to q$ and assume that $p$ is false. Then $p \to q$ is true regardless of the truth value of $q$. The minimum of 0 and $q$, however, is 0. This logical flaw could be seen as an inconsistency of the standard Mamdani-Assilian controller. On the contrary, it actually turns it into a very powerful technique. Since it was created to solve a simple practical problem, we might speak of a heuristic method instead. When we do not see Mamdani-Assilian rules as logical implications but rather as associations [17.17], then the controller is even theoretically sound: every rule $R_r$ associates an output fuzzy set $B_{i_r}$ with $n$ input fuzzy sets $A^{(j)}_{i_{j,r}}$ for $j = 1, \dots, n$. Consequently, we must use a fuzzy conjunction, e.g., the minimum t-norm.
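The mismatch between the minimum and the propositional implication can be checked on the four crisp truth value combinations. The sketch below is purely illustrative; the function names are hypothetical, and the implication is written as $\max(1 - p, q)$, which agrees with the classical truth table on the values 0 and 1.

```python
from typing import List, Tuple


def implication(p: int, q: int) -> int:
    """Classical implication p -> q on crisp truth values 0 and 1."""
    return max(1 - p, q)


def compare_with_min() -> List[Tuple[int, int, int, int]]:
    """Rows (p, q, p -> q, min(p, q)) for all crisp truth value combinations."""
    return [(p, q, implication(p, q), min(p, q)) for p in (0, 1) for q in (0, 1)]


for p, q, imp, conj in compare_with_min():
    marker = "" if imp == conj else "  <- differs"
    print(f"p={p} q={q}  p->q={imp}  min={conj}{marker}")
```

The two columns differ exactly in the rows where $p$ is false, which is the case singled out in the text; on the rows where $p$ is true, the minimum behaves like a conjunction and happens to coincide with the implication.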

Mamdani and Assilian's heuristics can be obtained by the extension principle, too [17.18, 19]. If the fuzzy relation $R$ that relates the input $\mathbf{x}$ and the output $y$ satisfies a couple of extensionality properties, then Mamdani and Assilian's approach can also be obtained. Therefore, let $E$ and $E'$ be two similarity relations defined on the domains $X$ and $Y$ of $x$ and $y$, respectively. The extensionality of $R$ on $X \times Y$ means that

$$ \forall x \in X : \forall y, y' \in Y : \quad R(x, y) \otimes E'(y, y') \le R(x, y') , $$
$$ \forall x, x' \in X : \forall y \in Y : \quad R(x, y) \otimes E(x, x') \le R(x', y) . \qquad (17.6) $$

Thus, if $(x, y) \in R$, then $x$ is related to the neighborhood of $y$. The same holds for $y$ in relation to $x$. Then $A_r^{(i)}(x) = E_i(x, a_r^{(i)})$ and $B_r(y) = E'(y, b_r)$ can be regarded as fuzzy sets of values in the proximity of $a_r^{(i)}$ and $b_r$, respectively. Hence $\forall r = 1, \dots, k : R(a_r^{(1)}, \dots, a_r^{(n)}, b_r) = 1$. When applying this type of control definition to real-world problems, a practitioner must specify sensible similarity relations $E_i$ and $E'$ for each input $\xi_i$ and output $\eta$, respectively. Eventually, using the extension principle for $R$, we obtain

$$ R(x^{(1)}, \dots, x^{(n)}, y) \ge \max_{r = 1, \dots, k} \otimes\left(A_r^{(1)}(x^{(1)}), \dots, A_r^{(n)}(x^{(n)}), B_r(y)\right) . $$

In addition, if we use the minimum t-norm for $\otimes$, then we get exactly the approach of Mamdani and Assilian. Boixader and Jacas [17.20] and Klawonn and Castro [17.21] show that indistinguishability or similarity is the connection between the extensionality property and fuzzy equivalence relations.

Fig. 17.2 The Mamdani-Assilian rule evaluation. Here we assume that the control process is described by two input variables $\xi_1$, $\xi_2$ and one output $\eta$. Let the controller be specified by two Mamdani-Assilian rules. Here the input tuple (25, -4) leads to the fuzzy output shown on the lower right side. If a crisp output is needed, any defuzzification method can eventually be applied

In practical applications, different t-norms and t-conorms, such as the product instead of the minimum and the bounded sum instead of the maximum, play an important role. The reason is a stepwise multilinear interpolation behavior between the rules, which yields a smoother function $y = \varphi(\mathbf{x})$ compared to the minimum and maximum operators.
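The multilinear interpolation behavior can be made plausible with a strongly simplified sketch. It rests on assumptions that are not in the text: two inputs on the unit interval, complementary memberships "low" and "high", hypothetical singleton consequents in `CORNERS`, and a weighted average in place of the clipping and union steps. Under these assumptions the product firing strengths sum to one, so the output is exactly the bilinear interpolation of the four rule outputs, whereas the minimum does not have this property.

```python
from typing import Callable, Dict, Tuple

TNorm = Callable[[float, float], float]


def product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def bounded_sum(a: float, b: float) -> float:
    """Bounded sum t-conorm."""
    return min(1.0, a + b)


def firing_strengths(x1: float, x2: float, t_norm: TNorm) -> Dict[Tuple[str, str], float]:
    """Strengths of the four rules over complementary memberships on [0, 1]."""
    m1 = {"low": 1.0 - x1, "high": x1}
    m2 = {"low": 1.0 - x2, "high": x2}
    return {(a, b): t_norm(m1[a], m2[b]) for a in m1 for b in m2}


# Hypothetical singleton consequents, one per rule
CORNERS = {("low", "low"): 0.0, ("low", "high"): 1.0,
           ("high", "low"): 2.0, ("high", "high"): 4.0}


def output(x1: float, x2: float, t_norm: TNorm) -> float:
    """Average of the singleton consequents weighted by the firing strengths."""
    w = firing_strengths(x1, x2, t_norm)
    return sum(w[k] * CORNERS[k] for k in w) / sum(w.values())


print(sum(firing_strengths(0.3, 0.6, product).values()))  # product strengths
print(sum(firing_strengths(0.3, 0.6, min).values()))      # minimum strengths
print(output(0.3, 0.6, product), output(0.3, 0.6, min))
```

The bounded sum is included only to show the t-conorm named in the text; in this simplified setting the accumulation is done by the weighted average instead.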

Depending on the input and output variables, Mamdani-Assilian controllers can intentionally or unintentionally copy concepts from classical control theory. As an example, a controller whose inputs are the proportional, integral, and derivative errors between the output variable and its desired values, combined with a symmetrical rule base, behaves similarly to a PID controller. Such a reinvention of established concepts with a more complicated implementation should be avoided, though.


17.3.2 Takagi-Sugeno Control 


Partitioning both the input and output domains seems to be reasonable from an interpretational point of view. Nevertheless, one might face control applications where a sufficient approximation quality can only be achieved using many linguistic terms for each dimension. This, in turn, will increase the potential number of rules, which most probably worsens the ability to interpret the rule base of the fuzzy controller. For such control processes, we recommend abandoning the concept of partitioning the output domain and instead defining functions that locally approximate the control behavior.

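A controller that uses rules like this is called a Takagi-Sugeno controller [17.22]. The Takagi-Sugeno rules $R_r$ for $r = 1, \dots, k$ are typically defined as

    $R_r$: if $\xi_1$ is $A^{(1)}_{i_{1,r}}$ and ... and $\xi_n$ is $A^{(n)}_{i_{n,r}}$ then $\eta = f_r(\xi_1, \dots, \xi_n)$.

Most commonly, linear functions $f_r$ can be found in many controllers, i.e.,

$$ f_r(\mathbf{x}) = a_r^{(0)} + \sum_{i=1}^{n} a_r^{(i)} x^{(i)} . $$

Rules shown in Fig. 17.3 (the antecedent fuzzy sets are given there graphically as membership functions):

    $R_1$: if $\xi_1$ is (fuzzy set)                              then $\eta_1 = 1 \cdot \xi_1 + 0.5 \cdot \xi_2 + 1$
    $R_2$: if $\xi_1$ is (fuzzy set) and $\xi_2$ is (fuzzy set)   then $\eta_2 = -0.1 \cdot \xi_1 + 4 \cdot \xi_2 + 1.2$
    $R_3$: if $\xi_1$ is (fuzzy set) and $\xi_2$ is (fuzzy set)   then $\eta_3 = 0.9 \cdot \xi_1 + 0.7 \cdot \xi_2 + 9$
    $R_4$: if $\xi_1$ is (fuzzy set) and $\xi_2$ is (fuzzy set)   then $\eta_4 = 0.2 \cdot \xi_1 + 0.1 \cdot \xi_2 + 0.2$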


The rules of a Takagi-Sugeno controller share the same antecedent parts with the Mamdani-Assilian controller, so the decision logic computes the same degree of applicability $\alpha_r$, i.e., using (17.2). An example of such a controller for two inputs is shown in Fig. 17.3. Eventually, all degrees are used to compute the crisp control value

$$ \eta = \frac{\sum_{r=1}^{k} \alpha_r \cdot f_r(\mathbf{x})}{\sum_{r=1}^{k} \alpha_r} , $$

which is the sum of all rule outputs weighted by their normalized degrees of applicability. Obviously, a Takagi-Sugeno controller does not need any defuzzification method.
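The Takagi-Sugeno evaluation can be sketched with the four linear consequents listed for Fig. 17.3. Only the consequents are taken from the figure; the antecedent fuzzy sets below (`low1`, `mid1`, `high1`, `low2`, `high2`) and their breakpoints are hypothetical stand-ins for the graphically given membership functions, and a missing clause is modeled as `None`, i.e., membership 1.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Membership = Callable[[float], float]
# (antecedent sets, None for a missing clause; coefficients a0, a1, ..., an)
Rule = Tuple[Sequence[Optional[Membership]], Sequence[float]]


def down(a: float, b: float) -> Membership:
    """Shoulder: 1 up to a, falling linearly to 0 at b."""
    return lambda x: 1.0 if x <= a else 0.0 if x >= b else (b - x) / (b - a)


def up(a: float, b: float) -> Membership:
    """Shoulder: 0 up to a, rising linearly to 1 at b."""
    return lambda x: 1.0 - down(a, b)(x)


# Hypothetical antecedent fuzzy sets
low1, high1 = down(3, 9), up(11, 18)
mid1: Membership = lambda x: min(up(3, 9)(x), down(11, 18)(x))
low2, high2 = down(4, 13), up(4, 13)

RULES: List[Rule] = [
    ((low1, None), (1.0, 1.0, 0.5)),     # eta_1 =  1.0*x1 + 0.5*x2 + 1
    ((mid1, low2), (1.2, -0.1, 4.0)),    # eta_2 = -0.1*x1 + 4.0*x2 + 1.2
    ((mid1, high2), (9.0, 0.9, 0.7)),    # eta_3 =  0.9*x1 + 0.7*x2 + 9
    ((high1, high2), (0.2, 0.2, 0.1)),   # eta_4 =  0.2*x1 + 0.1*x2 + 0.2
]


def applicability(antecedents: Sequence[Optional[Membership]], x: Sequence[float]) -> float:
    """Degree of applicability (17.2); a missing clause contributes degree 1."""
    return min(1.0 if mu is None else mu(xi) for mu, xi in zip(antecedents, x))


def linear(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Linear consequent a0 + sum_i a_i * x_i."""
    return coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))


def takagi_sugeno(rules: Sequence[Rule], x: Sequence[float]) -> float:
    """Crisp control value: rule outputs weighted by normalized applicability."""
    alphas = [applicability(ante, x) for ante, _ in rules]
    total = sum(alphas)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(a * linear(c, x) for a, (_, c) in zip(alphas, rules)) / total


print(takagi_sugeno(RULES, (6.0, 4.0)))
```

For the input (6, 4) only the first two hypothetical rules fire, each with degree 0.5, so the result is the mean of their linear outputs; no defuzzification step is involved.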

Takagi-Sugeno controllers are not only popular for translating human strategies into formal descriptions, but they can also be combined with many well-known methods from classical control theory. For instance, many locally valid linear models of the process can be aggregated into one nonlinear model in the form of Takagi-Sugeno rules. In a next step, the desired behavior of the resulting closed-loop system is formulated. Stability can be defined in the strictest form as guaranteed convergence to the desired setpoint from any initial state, including robustness against bounded parameter uncertainties of the process model. Mathematical methods such as Lyapunov functions in the form of linear matrix equations are then applied to design the Takagi-Sugeno fuzzy controller. For details about these concepts, we refer to Feng [17.23]. The main advantage of this strategy is the guaranteed performance, but several disadvantages come into play, too. This approach requires a model and sophisticated mathematical methods. It also usually leads to a limited performance due to conservative design results, and to limited interpretability. That is also why many recent papers propose iterative improvements, e.g., ways to handle other types of uncertainties such as varying time delays [17.24], and ways to reduce the conservativeness of the solutions [17.25].


Fig. 17.3 A Takagi-Sugeno controller for two inputs and one output described by four rules. If a certain clause $\xi_j$ is $A^{(j)}_{i_{j,r}}$ is missing in any rule $R_r$, then the corresponding membership function is $\mu^{(j)}_{i_{j,r}}(x_j) \equiv 1$ for all linguistic values $i_{j,r}$. In this example, consider for instance $x_2$ in rule $R_1$. Thus, $\mu^{(2)}_{i_{2,1}}(x_2) = 1$ for all $i_{2,1}$




17.3.3 Fuzzy Logic-Based Controller 


Both controllers that have been introduced so far in- 
terpret every linguistic rule as an association of an 
n-dimensional fuzzy input point with a fuzzy output. 
Thus, we can interpret the set of fuzzy rules as set- 
points of the control system. Recall, however, that this 
has nothing to do with logic inference since not all rules 
need to be activated by a given input. 

When all rules are evaluated in a conjunctive manner, we can regard each fuzzy rule as a fuzzy constraint on a fuzzy input-output relation. The inference operation of such a controller is identical to approximate reasoning. Note that classical reasoning uses inference rules (so-called tautologies) to deductively infer crisp conclusions from crisp propositions. Approximate reasoning generalizes classical reasoning to fuzzy propositions. Zadeh [17.9] proposed the first approaches to handle fuzzy sets in approximate reasoning. For further details, the reader is referred to Zadeh [17.26,27]. The basic idea is to represent incomplete knowledge as possibility distributions.

Possibility theory [17.28] has been proposed to study and model imperfect descriptions of an existing element $x_0$ in a set $A \subseteq X$. It can be seen as a counterpart to probability theory. To formally define a possibility distribution $\Pi : 2^X \to [0, 1]$, we need the following axioms, which resemble the well-known Kolmogorov axioms:

$$\begin{aligned} &\Pi(\emptyset) = 0, \\ &\Pi(A) \le \Pi(B) \text{ if } A \subseteq B, \text{ and} \\ &\Pi(A \cup B) = \max\{\Pi(A), \Pi(B)\} \text{ for all } A, B \subseteq X . \end{aligned}$$

The expression $\Pi(A) = 1$ means that $x_0 \in A$ is unconditionally possible. If $\Pi(A) = 0$, then it is impossible that $x_0 \in A$. Zadeh [17.29] models uncertainty about $x_0$ by the possibility measure $\Pi : 2^X \to [0, 1]$, $\Pi(A) = \sup\{\mu(x) \mid x \in A\}$, when a fuzzy set $\mu : X \to [0, 1]$ is the only known description of $x_0$. Then the possibility measure is given by the possibility degrees of singletons, i.e., $\Pi(\{x\}) = \mu(x)$.
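On a finite universe, the supremum becomes a maximum, and the three axioms can be checked by brute force. The following sketch assumes a small hypothetical fuzzy set `mu` over three labels; the function name `possibility` is chosen for illustration only.

```python
from itertools import combinations


def possibility(mu, subset):
    """Pi(A) = sup{mu(x) | x in A}; the supremum over the empty set is taken as 0."""
    return max((mu[x] for x in subset), default=0.0)


# Hypothetical fuzzy set describing the unknown element x_0 (assumption).
mu = {"slow": 0.2, "medium": 1.0, "fast": 0.6}
universe = list(mu)
subsets = [set(s) for r in range(len(universe) + 1)
           for s in combinations(universe, r)]

# Check the three axioms on every pair of subsets.
assert possibility(mu, set()) == 0.0
for a in subsets:
    for b in subsets:
        if a <= b:
            assert possibility(mu, a) <= possibility(mu, b)
        assert possibility(mu, a | b) == max(possibility(mu, a), possibility(mu, b))

print(possibility(mu, {"slow", "fast"}))
```

Note that, unlike a probability, the possibility of a union is the maximum and not the sum of the parts, so the degrees of the singletons determine the whole measure.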

Now, consider only one-dimensional input and output spaces. Then we must specify a suitable two-dimensional possibility distribution. Let the rule

$$R:\ \text{if } \xi \text{ is } A \text{ then } \eta \text{ is } B$$

associate the input fuzzy set $\mu_A$ with the output fuzzy set $\mu_B$. We can express this rule by a possibility distribution

$$\pi_{X,Y}(x, y) = I(\mu_A(x), \mu_B(y)) ,$$

where $I$ is an implication of any multivalued logic. So, we can compute the output by the composition of the input and the rule base, i.e., $\mu_B = \mu_A \circ \pi_{X,Y}$, where the fuzzy rules are expressed by the fuzzy relation $\pi_{X,Y}$ defined on $X \times Y$. The composition of a fuzzy set $\mu$ with a fuzzy relation $\pi$ is defined by

$$\mu \circ \pi : Y \to [0, 1], \quad y \mapsto \sup_{x \in X} \min\{\mu(x), \pi(x, y)\} .$$


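We can easily see that this is a fuzzification of the standard composition $\circ$ of two crisp sets $M \subseteq X$ and $R \subseteq X \times Y$, i.e.,

$$M \circ R \overset{\text{def}}{=} \{y \in Y \mid \exists x \in X : (x \in M \wedge (x, y) \in R)\} \subseteq Y .$$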
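For finite domains, the sup-min composition is a maximum over minima. The sketch below assumes that a fuzzy set is given as a list of membership degrees over X and a relation as a nested list indexed by (x, y); the numbers are made up for illustration. The second call shows the crisp special case, where degrees are restricted to 0 and 1.

```python
def compose(mu, pi):
    """Sup-min composition: (mu o pi)(y) = max over x of min(mu(x), pi(x, y))."""
    n_y = len(pi[0])
    return [max(min(m, row[j]) for m, row in zip(mu, pi)) for j in range(n_y)]


# Fuzzy case with hypothetical degrees: three points in X, two points in Y.
mu = [0.2, 1.0, 0.5]
pi = [[1.0, 0.3],
      [0.4, 0.8],
      [0.9, 0.1]]
print(compose(mu, pi))  # [0.5, 0.8]

# Crisp case: M = {x1, x3}, R = {(x1, y2), (x2, y1), (x2, y2)}.
M = [1, 0, 1]
R = [[0, 1],
     [1, 1],
     [0, 0]]
print(compose(M, R))  # [0, 1], i.e., M o R = {y2}
```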


The challenge in fuzzy control applications using relational equations is to search for a fuzzy relation $\pi$ that satisfies all equations $\mu_{B_r} = \mu_{A_r} \circ \pi$ for every rule $R_r$ with $r = 1,\ldots,k$. If multiple inputs $X_1,\ldots,X_n$ are used, then $\mu_A$ is defined on the product space $X = X_1 \times \cdots \times X_n$ as in (17.2). The fuzzy relation $\pi$ can be found by determining the Gödel relation for every given relational equation, i.e.,

$$(x, y) \in \pi_{X,Y} \Leftrightarrow (x \in \mu_A \to y \in \mu_B) .$$

Here the implication arrow $\to$ represents the Gödel implication

$$a \to b = \begin{cases} 1 & \text{if } a \le b, \\ b & \text{if } a > b. \end{cases}$$

So, a linguistic rule actually expresses a gradual rule of the form the more $\mu_A$, the more $\mu_B$. Hence it constrains the fuzzy relation $\pi$ by the inequality

$$\min(\mu_A(x), \pi(x, y)) \le \mu_B(y)$$

for all $(x, y) \in X \times Y$. In theory, the Gödel implication is not the only way to represent $\pi$. Dubois and Prade [17.30,31], however, give a variety of good reasons for using the Gödel implication and no other.

If the system of relational equations $\mu_{B_r} = \mu_{A_r} \circ \pi$ for $r = 1,\ldots,k$ is solvable, then the intersection of all rules' Gödel relations

$$\pi^{G} = \bigcap_{r=1}^{k} I^{G}(\mu_{A_r}(x), \mu_{B_r}(y))$$




is a solution with $\cap$ being the minimum t-norm. Due to the mathematical properties of the Gödel implication, the Gödel relation $\pi^{G}$ is the greatest solution in terms of elementwise membership degrees.

To conclude this type of controller, we recall that the relation

$$\Pi(\{(x, y)\}) = \pi(x, y)$$

expresses the degree to which it is possible to assign the output value $y$ to the input tuple $x$. Moreover, the overall conjunctive nature of the rules softly constrains the control function $\varphi$. It might thus happen in practical applications that these constraints lead to contradictions if very narrow output fuzzy sets are assigned to overlapping input fuzzy sets. In such a case, the controller's output will be the empty fuzzy set, which corresponds to no solution. One way to overcome this problem is to specify both narrow input fuzzy sets and broader output fuzzy sets. This procedure, however, limits the expressiveness and thus the applicability of fuzzy logic-based controllers.
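Both effects can be reproduced on small finite domains. The sketch below is an illustration with made-up membership degrees: it builds the elementwise minimum of the rules' Gödel relations, checks that a solvable rule base is reproduced by sup-min composition, and shows that two rules with identical inputs but disjoint crisp outputs yield the empty fuzzy set. The function names are chosen for this example only.

```python
def goedel(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def goedel_relation(rules):
    """Intersection (minimum) of the Goedel relations of all rules (mu_A, mu_B)."""
    n_x, n_y = len(rules[0][0]), len(rules[0][1])
    return [[min(goedel(mu_a[i], mu_b[j]) for mu_a, mu_b in rules)
             for j in range(n_y)] for i in range(n_x)]


def compose(mu, pi):
    """Sup-min composition of a fuzzy set with a fuzzy relation."""
    return [max(min(m, row[j]) for m, row in zip(mu, pi))
            for j in range(len(pi[0]))]


# Solvable rule base (hypothetical degrees): every rule is reproduced exactly.
solvable = [([1.0, 0.5, 0.0], [1.0, 0.3]),
            ([0.0, 0.5, 1.0], [0.3, 1.0])]
pi_g = goedel_relation(solvable)
assert all(compose(mu_a, pi_g) == mu_b for mu_a, mu_b in solvable)

# Contradictory rule base: identical inputs, disjoint narrow outputs.
contradictory = [([1.0, 1.0], [1.0, 0.0]),
                 ([1.0, 1.0], [0.0, 1.0])]
pi_c = goedel_relation(contradictory)
print(compose([1.0, 1.0], pi_c))  # [0.0, 0.0], the empty fuzzy set
```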


17.4 A Glance at Some Industrial Applications 


Shortly after the big success stories of the 1980s, mainly in Japan [17.2], many real-world control problems were solved successfully all around the world using the Mamdani-Assilian approach. The research group of this chapter's third author likewise initiated the development of several automobile controllers.

We want to discuss two of these control processes that have been developed with Volkswagen AG (VW), i.e., the engine idle speed control [17.18] and the shift-point determination of an automatic transmission [17.32]. Both of these very successful Mamdani-Assilian controllers can still be found in VW automobiles today. The idle speed controller is based on similarity relations, which makes it possible to interpret the control function as the interpolation of a point-wise imprecisely known function. The shift-point determination continuously adapts the gearshift schedule between two extremes, i.e., economic and sporting. This controller determines a so-called sport factor and adapts the gearshift behavior individually to the driver.

Fig. 17.4 Principle of the engine idle speed control (labeled components: air flow sensor, throttle flap, banking-up flap, compensation flap to damp vibrations, air bypass to the throttle, auxiliary air regulator, DIGIFANT control device; air comes from the air cleaner and goes to the intake valves)

Fig. 17.5 Structure of the fuzzy controller (labeled blocks and signals: inputs dREV and gREV, fuzzy controller, meta controller, output dAARCUR, control range limitation, pilot value for air conditioning system)


17.4.1 Engine Idle Speed Control 


This controller adjusts the idle speed of a spark ignition engine. Usually, a volumetric control is used to control the spark ignition engine. The principle is shown in Fig. 17.4. Here an auxiliary air regulator varies the cross-section of a bypass to the throttle.

The controller’s task is to adjust the auxiliary air 
regulator’s pulse width. In the case of a rapid fall of 
the number of revolutions, the controller shall drive 
the auxiliary air regulator to broaden the bypass cross- 
section. This increase of the air flow rate is measured 
by an air flow sensor which serves as controller signal. 
Then a new amount for the fuel injection has to be deter- 
mined, and with a higher air flow rate the engine yields 
more torque. This, in turn, leads to a higher number of 
revolutions which could be decreased correspondingly 
by narrowing the bypass cross-section. 

The ultimate goal is to reduce both fuel consumption and pollutant emissions. The straightforward way to achieve this goal is to lower the idle speed. On the other hand, some automobile facilities, e.g., the air-conditioning system, are very often switched on and off, which forces the number of revolutions to drop. So a very flexible controller is needed to adjust this process properly. Schröder et al. [17.32] point out further problems of this control application.

As it turned out, the engineers who defined the similarity relations to model indistinguishability/similarity of two control states did not experience any big difficulties. Remember that the control expert must define a set of $k$ input-output tuples $((x_1^{(r)}, \ldots, x_n^{(r)}), y_r)$. So, for each $r = 1,\ldots,k$ the output value $y_r$ seems appropriate for the input $(x_1^{(r)}, \ldots, x_n^{(r)})$. In this way, the control expert specifies a partial control function $\varphi_0$. According to (17.6), we directly obtain a Mamdani-Assilian controller by determining the extensional hull of $\varphi_0$ given the similarity relations. We thus obtain the rules from the partial control function $\varphi_0$ as

$$R_r:\ \text{if } \xi_1 \text{ is approximately } x_1^{(r)} \text{ and } \ldots \text{ and } \xi_n \text{ is approximately } x_n^{(r)} \text{ then } \eta \text{ is approximately } y_r .$$

Klawonn et al. [17.18] explain the theory of this approach in more detail.
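To make the idea of interpolating a point-wise known function concrete, the following sketch evaluates such rules for a tiny, purely hypothetical partial control function. Several simplifying assumptions are made here and are not taken from the chapter: "approximately x" is modeled by a triangular fuzzy set whose width plays the role of a scaling factor of the similarity relation, the minimum is used as conjunction, and the crisp output is a weighted average of the values $y_r$ rather than the center of area of the output fuzzy sets.

```python
def approximately(center, width):
    """Fuzzy set 'approximately center': 1 at center, falling linearly to 0 at distance width."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / width)


# Hypothetical partial control function ((dREV, gREV), dAARCUR); illustrative values only.
PHI_0 = [((-30.0, 0.0), 5.0),
         ((0.0, 0.0), 0.0),
         ((30.0, 0.0), -5.0)]
WIDTHS = (30.0, 3.0)  # hypothetical widths per input variable


def control(x):
    """Interpolate between the specified points; None if no rule applies."""
    num = den = 0.0
    for point, y_r in PHI_0:
        # Degree to which the input is 'approximately' the rule's input point.
        alpha = min(approximately(p, w)(xi)
                    for p, w, xi in zip(point, WIDTHS, x))
        num += alpha * y_r
        den += alpha
    return num / den if den > 0.0 else None


print(control((-15.0, 0.0)))  # halfway between the first two points
```

An input that coincides with a specified point returns exactly the specified output, while inputs in between are interpolated smoothly.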

Eventually, only two input variables are needed for the engine idle speed controller, i.e.:

1. The deviation dREV [rpm] of the number of revolutions from the set value.
2. The gradient gREV [rpm] of the number of revolutions between two ignitions.

There is just one output variable, which is the change of current dAARCUR for the auxiliary air regulator. The final controller is shown in Fig. 17.5.

Fig. 17.6 Performance characteristics

Fig. 17.7 Deviation dREV of the number of revolutions

The control rules of the engine idle speed controller have been derived from idle speed experiments. The partial control function $\varphi_0 : X_{(\mathrm{dREV})} \times X_{(\mathrm{gREV})} \to Y_{(\mathrm{dAARCUR})}$ is given in the upper half of Tab. 17.1.

The fuzzy controller has been defined by a similarity relation and the partial control mapping $\varphi_0$. With the center of area (COA) method, it yields the control surface shown in Fig. 17.6. The function values here are evaluated on a grid of equally spaced input points. The respective Mamdani-Assilian controller has been found by relating each point of $\varphi_0$ to a linguistic term, e.g., negative big (nb), negative medium (nm), negative small (ns), and approximately zero (az). The resulting fuzzy partitions of dREV, gREV, and dAARCUR are displayed in Figs. 17.7, 17.8, 17.9, respectively. So, we obtain linguistic rules from $\varphi_0$ like

if dREV is A and gREV is B then dAARCUR is C.

The complete set of rules is given in the lower part of Tab. 17.1.

Klawonn et al. [17.18] and Schröder et al. [17.32] show that this Mamdani-Assilian controller leads to a very smooth and thus better control behavior compared to classical controllers. In addition, for this application, it has been much simpler to define a fuzzy controller than one based on higher mathematics. Moreover, they found that this fuzzy controller reaches the desired setpoint precisely and fast. Last but not least, even increasing the load slowly does not change the control behavior significantly. So it is nearly impossible to notice any vibration, even after drastic load changes.

Fig. 17.8 Gradient gREV of the number of revolutions

Fig. 17.9 Change of current dAARCUR for the auxiliary air regulator

Fig. 17.10 Flowing shift-point determination with fuzzy logic (block diagram: the accelerator pedal, the filtered speed of the accelerator pedal, the number of changes in pedal direction, and the sport factor (t-1) pass through fuzzification, inference machine, and defuzzification to classify the driver/driving situation by fuzzy logic; the resulting sport factor (t) enters the gearshift computation, which determines by interpolation the speed limits for shifting into a higher or lower gear depending on the sport factor, followed by gear selection)


17.4.2 Flowing Shift-Point Determination 


Conventional automatic transmissions select gears 
based on the so-called gearshift diagrams. Here, the 
gearshift simply depends on the accelerator position 


Table 17.1 The partial control mapping $\varphi_0$ (upper table) and its corresponding fuzzy rule base (lower table)


gREV 


0) a ON a LO 3 6 40 
=W 15 15 10 10 5 5 
—50 20 15 10 LO FS 5) 0 
=0 | 15 w ES 5 3 0 0 
dREV 0 5 5 0 0 0 =O] = 
300 E0 0 0 = [=> | 10] —5 
W | © =) |= | H10] =15 | =15 | —20 
70 =) | =S | =10] =| =15] =15 | =15 


ns pb pm ps ps az az 
dREV az ps ps az az az nm ns 
ps az az az ns ns nm nb 
pai faz ns ns nm nb nb nh 
pb ns ns nm nb nb nb nb 




and the velocity. A lag (hysteresis) between upshift and downshift avoids oscillating gearshifts when the velocity varies slightly, e.g., during stop-and-go traffic. For instance, if the driver steps on the gas with half throttle, the transmission will start in first gear. For a standardized behavior, a fixed diagram works well. Until 1994, the VW gear box had two different types of gearshift diagrams, i.e., economic (ECO) and sporting (SPORT). An economic gearshift diagram switches gears at a low number of revolutions to reduce the fuel consumption. A sporting one leads to gearshifts at a higher number of revolutions. Since 1991, it had been a research issue at VW to develop an individual adaptation of shift-points. No additional sensors should be used to observe the driver.

The idea was that the car observes the driver [17.32] and classifies him or her as calm, normal, sportive (assigning a sport factor in [0, 1]), or nervous (in order to calm down the driver). A test car from VW was operated by many different drivers. These drivers were classified by a human expert (a passenger). Simultaneously, 14 attributes were continuously measured during the test drives. Among them were variables such as the velocity of the car, the position of the accelerator pedal, the speed of the accelerator pedal, the kick down, and the steering wheel angle.

The final Mamdani controller was based on four input variables and one output. The basic structure of the controller is shown in Fig. 17.10. In total, seven rules could be identified, whose antecedents consist of up to four clauses. The program was highly optimized: it used 24 bytes of RAM and 702 bytes of ROM, i.e., less than 1 KB. The runtime was 80 ms, which means that a new sport factor was assigned about 12 times per second. The controller has been in series production since January 1995. It shows an excellent performance.




17.5 Automatic Learning of Fuzzy Controllers 


The automatic generation of linguistic rules plays an 
important role in many applications, e.g., classifica- 
tion [17.33-36], regression [17.37—39], and image pro- 
cessing [17.40,41]. Since fuzzy controllers are based 
on linguistic rules, automatic ways to tune and learn 
them have been developed for control applications as 
well [17.18, 19, 42—44]. 

How can a computer learn fuzzy rules from data to 
explain or support decisions like people do? We think 
that the fuzzy analysis of data can answer this ques- 
tion sufficiently [17.45]. The easiest and most common 
way is to use fuzzy clustering which automatically de- 
termines fuzzy sets from data. 

Before we talk about the generation of linguistic rules from fuzzy clustering, however, let us briefly list some of the very diverse methods of fuzzy data analysis. Grid-based approaches define fixed fuzzy partitions for every variable. Every cell in that multidimensional grid may correspond to one rule [17.39]. Best known are hybrid methods to induce fuzzy rules, in which a fuzzy system is combined with other computational intelligence techniques. For instance, evolutionary algorithms are used for a guided search of the space of possible rule bases [17.46] or for fuzzifying and thus summarizing a crisp set of rules [17.47]. Neuro-fuzzy systems use learning methods of artificial neural networks (e.g., backpropagation) to tune the parameters of a network that can be directly understood as a fuzzy system [17.48]. Standard rule generation methods have been fuzzified as well (e.g., separate-and-conquer rule learning [17.49], decision trees [17.50], and support vector machines [17.51,52]).

To learn fuzzy rules from data by fuzzy clustering, we only consider the standard fuzzy c-means algorithm (FCM) [17.53,54]. Consider the input space $X \subseteq \mathbb{R}^n$ and the output space $Y \subseteq \mathbb{R}$. We observe $m$ patterns $(\vec{x}_j, y_j) \in S \subseteq X \times Y$ where $j = 1,\ldots,m$. Running FCM on that dataset $S$ leads to $c$ cluster prototypes

$$\vec{c}_i = (c_i^{(1)}, \ldots, c_i^{(n)}, c_i^{(y)})$$

with $i = 1,\ldots,c$ that can be seen as the concatenation of both the input values $c_i^{(j)}$, $j = 1,\ldots,n$, and the output value $c_i^{(y)}$. Thus, every prototype represents one linguistic rule

$$R_i:\ \text{if } \vec{x} \text{ is close to } (c_i^{(1)}, \ldots, c_i^{(n)}) \text{ then } y \text{ is close to } c_i^{(y)} .$$
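The step from data to prototypes to rules can be sketched as follows. This is a minimal, standard-library implementation of the usual FCM alternating updates with fuzzifier m = 2 and Euclidean distance. The deterministic initialization (evenly spaced data points, which requires at least c patterns), the fixed iteration count, and the toy data are assumptions made for this example, not part of the chapter.

```python
import math


def fcm(data, c, m=2.0, iterations=50):
    """Minimal fuzzy c-means on a list of equal-length tuples (x_1, ..., x_n, y)."""
    n_dim = len(data[0])
    # Assumed initialization: c evenly spaced data points serve as first prototypes.
    step = len(data) // c
    prototypes = [list(data[i * step]) for i in range(c)]
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_l (d_ij / d_lj)^(2 / (m - 1)).
        memberships = []
        for x in data:
            d = [max(math.dist(x, p), 1e-12) for p in prototypes]
            memberships.append([
                1.0 / sum((d[i] / d[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                for i in range(c)])
        # Prototype update: mean of the data weighted by u_ij^m.
        for i in range(c):
            w = [u[i] ** m for u in memberships]
            prototypes[i] = [sum(wj * x[k] for wj, x in zip(w, data)) / sum(w)
                             for k in range(n_dim)]
    return prototypes, memberships


def rules_from_prototypes(prototypes):
    """Split each prototype into its input part and its output component."""
    return [(tuple(p[:-1]), p[-1]) for p in prototypes]


# Hypothetical patterns (x, y) forming two well-separated groups.
data = [(0.9, 10.1), (1.0, 9.9), (1.1, 10.0),
        (4.9, 2.1), (5.0, 1.9), (5.1, 2.0)]
prototypes, memberships = fcm(data, c=2)
for inputs, output in rules_from_prototypes(prototypes):
    print(f"if x is close to {inputs} then y is close to {output:.2f}")
```

The returned memberships are those computed in the last iteration, i.e., just before the final prototype update.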


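Using the membership degrees $U$, we can rewrite these rules as

$$R_i:\ \text{if } u_i^{(x)}(\vec{x}) \text{ then } u_i^{(y)}(y) . \qquad (17.7)$$

The only problem is that FCM returns the membership degrees $u_i(\vec{x}, y)$ on the product space $X \times Y$. To obtain rules like (17.7), we must project $u_i$ onto $u_i^{(x)}$ and $u_i^{(y)}$. If $\vec{x}$ and $y$ are restricted to $[\vec{x}_{\min}, \vec{x}_{\max}]$ and $[y_{\min}, y_{\max}]$, respectively, the projections are given by

$$u_i^{(x)}(\vec{x}) = \sup_{y \in [y_{\min}, y_{\max}]} u_i(\vec{x}, y), \qquad u_i^{(y)}(y) = \sup_{\vec{x} \in [\vec{x}_{\min}, \vec{x}_{\max}]} u_i(\vec{x}, y) .$$

We can also project $u_i^{(x)}$ onto each single input variable $X_1, \ldots, X_n$ by

$$u_{ik}^{(x)}(x^{(k)}) = \sup_{\vec{x}^{(\neg k)} \in [\vec{x}^{(\neg k)}_{\min}, \vec{x}^{(\neg k)}_{\max}]} u_i^{(x)}(\vec{x})$$

for $k = 1,\ldots,n$ where $\vec{x}^{(\neg k)} \overset{\text{def}}{=} (x^{(1)}, \ldots, x^{(k-1)}, x^{(k+1)}, \ldots, x^{(n)})$. We may thus write (17.7) in the form of a Mamdani-Assilian rule (17.1) as

$$R_i:\ \text{if } \bigwedge_{k=1}^{n} u_{ik}^{(x)}(x^{(k)}) \text{ then } u_i^{(y)}(y) . \qquad (17.8)$$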


For one rule, the output value of an unseen input $\vec{x} \in \mathbb{R}^n$ will be equivalent to (17.2) if the minimum t-norm is used as conjunction $\wedge$. The overall output of the complete rule base is given by a disjunction $\vee$ of all rule outputs (cf. (17.4) if $\vee$ is the t-conorm maximum).

A crisp output can then again be computed by defuzzification, e.g., using the COG method (17.5). Since this computation is rather costly, the output membership functions $u_i^{(y)}$ are commonly replaced by singletons, i.e.,

$$u_i^{(y)}(y) = \begin{cases} 1 & \text{if } y = c_i^{(y)}, \\ 0 & \text{otherwise.} \end{cases}$$


Since each rule consequently comprises only the component $c_i^{(y)}$ of the cluster prototype, we can rewrite (17.8) as the Sugeno-Yasukawa rule [17.55]

$$R_i:\ \text{if } \bigwedge_{k=1}^{n} u_{ik}^{(x)}(x^{(k)}) \text{ then } y = c_i^{(y)} .$$




Fig. 17.11 Fuzzy rules and induced imprecise areas 


These rules strongly resemble the neurons of a radial basis function (RBF) network. This becomes clear if every membership function is Gaussian with center $\vec{\mu}_i$ and width $\sigma_i$, i.e.,

$$u_i(\vec{x}) = \exp\left(-\frac{\|\vec{x} - \vec{\mu}_i\|^2}{\sigma_i^2}\right) ,$$

and if they are normalized, i.e.,

$$\sum_{i=1}^{c} u_i(\vec{x}) = 1 \quad \text{for all } \vec{x} \in \mathbb{R}^n .$$

Note that this link is used in neuro-fuzzy systems both for training fuzzy rules with backpropagation and for initializing RBF networks with fuzzy rules [17.34].
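The correspondence can be illustrated with a small sketch. Each rule contributes a Gaussian activation around its center, the activations are normalized to sum to one, and the crisp output is the activation-weighted sum of the singleton consequents, which is exactly how a normalized RBF network computes its output. The exact Gaussian form (without a factor of 2 in the denominator), the centers, the widths, and the consequents are assumptions chosen for illustration.

```python
import math


def rbf_rule_output(x, centers, widths, consequents):
    """Output of singleton-consequent rules with normalized Gaussian memberships."""
    activations = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, center)) / width ** 2)
        for center, width in zip(centers, widths)]
    total = sum(activations)
    # After normalization the memberships sum to 1 for every input.
    return sum(a / total * y for a, y in zip(activations, consequents))


# Two hypothetical rules: "if x is close to 1 then y = 10", "if x is close to 5 then y = 2".
centers = [(1.0,), (5.0,)]
widths = [1.0, 1.0]
consequents = [10.0, 2.0]

print(rbf_rule_output((3.0,), centers, widths, consequents))  # midpoint between both rules
print(rbf_rule_output((1.0,), centers, widths, consequents))  # dominated by the first rule
```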


17.5.1 Transfer Passenger Analysis 
Based on FCM 


In this section, we present another real-world control problem that deals with the control of passenger movements and flows in terminal areas on an airport's land side. Especially during mass events such as world championships or concerts, or in the peak season in tourist areas, the capacities of passenger airports reach their upper limits. A conflict-free allocation, e.g., using intelligent destination boards and signposts, can increase the safety and security of passengers and airport employees. Thus, it is reasonable to study intelligent control approaches to allocate passengers from their arrival terminal to their departure terminal.

To evaluate different controllers, the German 
Aerospace Center (DLR) implemented a macroscopic 


passenger flow model that simulates passenger move- 
ments. Here, probabilistic distributions are used to 
describe passenger movements in terminal areas. The 
approach of Keller and Kruse [17.56] constructs a fuzzy 
rule base using FCM to describe the transfer passen- 
ger amount between aircraft. These rules can be used as 
control feedback for the macroscopic simulation. 

The following passenger attributes are used for the analysis:

• The maximal number of passengers in a certain aircraft (depending on the type of the aircraft)
• The distance between the airport of departure and the airport of destination (in three categories: short-, medium-, and long-haul)
• The time of departure
• The percentage of transfer passengers in the aircraft.


The number of clusters is determined by validity 
measures [17.41,57] evaluating the whole partition of 
all data. The clustering algorithm is run for a varying 
number of clusters. The validity of the resulting parti- 
tions is then compared by different measures. 
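The text does not state which validity measures were used. As one classical example of such a measure, the sketch below computes Bezdek's partition coefficient from a membership matrix: it equals 1 for a crisp partition and 1/c for a completely undecided one, so larger values indicate a clearer partition. The two membership matrices are hypothetical.

```python
def partition_coefficient(memberships):
    """Mean of the squared membership degrees; memberships[j][i] = degree of pattern j in cluster i."""
    return sum(u ** 2 for row in memberships for u in row) / len(memberships)


# Hypothetical partitions of two patterns into two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0]]
undecided = [[0.5, 0.5], [0.5, 0.5]]

print(partition_coefficient(crisp))      # 1.0
print(partition_coefficient(undecided))  # 0.5
```

In a sweep over candidate cluster numbers, one would run the clustering for each candidate and compare the resulting values, keeping in mind that different measures may favor different numbers of clusters.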

An example of resulting fuzzy clusters is given in 
Fig. 17.11. Here, every fuzzy cluster corresponds to 
one fuzzy rule. The color intensity indicates the firing 
strength of a specific rule. The imprecise areas are the 
fuzzy clusters where the color intensity indicates the 
membership degree. The tips of the fuzzy partitions are 
obtained in every domain by projections of the mul- 
tidimensional cluster centers (as explained before in 
Sect. 17.5). 

The fuzzy rules obtained by FCM are simplified 
through several steps. First, similar fuzzy sets are com- 
bined to one fuzzy set. Fuzzy sets similar to the univer- 
sal fuzzy set are removed. Fuzzy rules with the same 
input clauses are either combined if they also share 
the same output clauses or else they are removed from 
the rule base. Eventually, FCM and the rule-simplifying 
process yield five rules. 

Among them are the following two rules. If an aircraft with a relatively small maximal number of passengers (80-200) has a short- or medium-haul destination and departs late at night, then this flight usually has a high share of transfer passengers (80-90%). If a flight with a medium-haul destination and a small aircraft (about 150 passengers) starts at about noon, then it carries a relatively high share of transfer passengers (ca. 70%). We refer the interested reader to Keller and Kruse [17.56] for further details about this real-world control application.




17.6 Conclusions 


In this chapter, we introduced fuzzy control, a human-inspired way to control a nonlinear process as an imprecisely defined function. We talked about classical control engineering and its limitations, which motivate the need for a human knowledge-based approach to control. This knowledge is typically represented as either Mamdani-Assilian rules or Takagi-Sugeno rules. We presented both types of fuzzy controllers and also discussed the shortcomings of logic-based controllers, although they are mathematically well defined. We presented two successful industrial applications of fuzzy control in detail. We also stressed the necessity for automatic learning and tuning algorithms. We mentioned the best-known approaches briefly and rule induction from fuzzy clustering in detail. We also showed a real-world control application where such rules are used in the feedback loop of a simulation.

What will the future of fuzzy control look like? Even in university lectures for control engineers, fuzzy control became part of the curriculum years ago. With the drastically growing number of control systems, which are becoming more and more complex, a new generation of well-educated engineers and scientists will strengthen the presence of fuzzy controllers in real-world applications. In the far future, the use of data analysis techniques and algorithms will most probably drive evolving fuzzy controllers that are even able to react in nonstationary environments.


References


17.1 E.H. Mamdani: Application of fuzzy algorithms for the control of a simple dynamic plant, Proc. IEEE 121(12), 1585-1588 (1974)

17.2 S. Yasunobu, S. Miyamoto: Automatic Train Operation System by Predictive Fuzzy Control (North-Holland, Amsterdam 1985) pp. 1-18

17.3 Google patents: http://patents.google.com/, last accessed on August 22, 2013

17.4 K. Hirota (Ed.): Industrial Applications of Fuzzy Technology (Springer, Tokyo 1993)

17.5 T. Terano, M. Sugeno: Applied Fuzzy Systems (Academic, Boston 1994)

17.6 R.-E. Precup, H. Hellendoorn: A survey on industrial applications of fuzzy control, Comput. Ind. 62(3), 213-226 (2011)

17.7 C. Moewes, R. Kruse: Fuzzy control for knowledge-based interpolation. In: Combining Experimentation and Theory: A Hommage to Abe Mamdani, Studies in Fuzziness and Soft Computing, Vol. 271, ed. by E. Trillas, P.P. Bonissone, L. Magdalena, J. Kacprzyk (Springer, Berlin, Heidelberg 2012) pp. 91-101

17.8 P. Podrzaj, M. Jenko: A fuzzy logic-controlled thermal process for simultaneous pasteurization and cooking of soft-boiled eggs, Chemom. Intell. Lab. Syst. 102(1), 1-7 (2010)

17.9 L.A. Zadeh: Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst. Man Cybern. 3(1), 28-44 (1973)

17.10 L.A. Zadeh: Fuzzy sets, Inf. Control 8(3), 338-353 (1965)

17.11 K. Tanaka, H.O. Wang: Fuzzy Control Systems Design and Analysis: A Linear Matrix Inequality Approach (Wiley, New York 2001)


17.12 K. Michels, F. Klawonn, R. Kruse, A. Nürnberger: 
Fuzzy Control: Fundamentals, Stability and Design 
of Fuzzy Controllers, Studies in Fuzziness and Soft 
Computing, Vol. 200 (Springer, Berlin, Heidelberg 
2006) 

17.13 K.J. Åström, B. Wittenmark: Adaptive Control 
(Courier Dover, Mineola 2008) 

17.14 G.C. Goodwin, S.F. Graebe, M.E. Salgado: Control 
System Design, Vol. 240 (Prentice Hall, Upper Saddle 
River 2001) 

17.15 E.H. Mamdani, S. Assilian: An experiment in lin- 
guistic synthesis with a fuzzy logic controller, Int. 
J. Man-Mach. Stud. 7(1), 1-13 (1975) 

17.16 R. Kruse, J. Gebhardt, F. Klawonn: Foundations of 
Fuzzy Systems (Wiley, Chichester 1994) 

17.17 O. Cordón, M.J. del Jesus, F. Herrera: A proposal on reasoning methods in fuzzy rule-based classification systems, Int. J. Approx. Reason. 20(1), 21-45 (1999)

17.18 F. Klawonn, J. Gebhardt, R. Kruse: Fuzzy control 
on the basis of equality relations with an exam- 
ple from idle speed control, IEEE Trans. Fuzzy Syst. 
3(3), 336-350 (1995) 

17.19 F. Klawonn, R. Kruse: Equality relations as a ba- 
sis for fuzzy control, Fuzzy Sets Syst. 54(2), 147-156 
(1993) 

17.20 D. Boixader, J. Jacas: Extensionality based approx- 
imate reasoning, Int. J. Approx. Reason. 19(3/4), 
221-230 (1998) 

17.21 F. Klawonn, J.L. Castro: Similarity in fuzzy 
reasoning, Mathw. Soft Comput. 2(3), 197-228 
(1995) 

17.22 T. Takagi, M. Sugeno: Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst. Man Cybern. 15(1), 116-132 (1985)

17.23 G. Feng: A survey on analysis and design of model-based fuzzy control systems, IEEE Trans. Fuzzy Syst. 14(5), 676-697 (2006)

17.24 L. Wu, X. Su, P. Shi, J. Qiu: A new approach to stability analysis and stabilization of discrete-time T-S fuzzy time-varying delay systems, IEEE Trans. Syst. Man Cybern. B: Cybern. 41(1), 273-286 (2011)

17.25 K. Tanaka, H. Yoshida, H. Ohtake, H.O. Wang: A sum-of-squares approach to modeling and control of nonlinear dynamical systems with polynomial fuzzy systems, IEEE Trans. Fuzzy Syst. 17(4), 911-922 (2009)

17.26 L.A. Zadeh: A theory of approximate reasoning, Proc. 9th Mach. Intell. Workshop, ed. by J.E. Hayes, D. Michie, L.I. Mikulich (Wiley, New York 1979) pp. 149-194

17.27 L.A. Zadeh: The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets Syst. 11(1/3), 197-198 (1983)

17.28 D. Dubois, H. Prade: Possibility Theory: An Approach to Computerized Processing of Uncertainty (Plenum Press, New York 1988)

17.29 L.A. Zadeh: Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst. 1(1), 3-28 (1978)

17.30 D. Dubois, H. Prade: The generalized modus ponens under sup-min composition: A theoretical study. In: Approximate Reasoning in Expert Systems, ed. by M.M. Gupta, A. Kandel, W. Bandler, J.B. Kiszka (North-Holland, Amsterdam 1985) pp. 217-232

17.31 D. Dubois, H. Prade: Possibility theory as a basis for preference propagation in automated reasoning, 1992 IEEE Int. Conf. Fuzzy Syst. (IEEE, New York 1992) pp. 821-832

17.32 M. Schröder, R. Petersen, F. Klawonn, R. Kruse: Two paradigms of automotive fuzzy logic applications. In: Applications of Fuzzy Logic: Towards High Machine Intelligence Quotient Systems, Environmental and Intelligent Manufacturing Systems Series, Vol. 9, ed. by M. Jamshidi, A. Titli, L. Zadeh, S. Boverie (Prentice Hall, Upper Saddle River 1997) pp. 153-174

17.33 L.I. Kuncheva: Fuzzy Classifier Design, Studies in Fuzziness and Soft Computing, Vol. 49 (Physica, Heidelberg, New York 2000)

17.34 D. Nauck, R. Kruse: A neuro-fuzzy method to learn fuzzy classification rules from data, Fuzzy Sets Syst. 89(3), 277-288 (1997)

17.35 R. Mikut, J. Jäkel, L. Gröll: Interpretability issues in data-based learning of fuzzy systems, Fuzzy Sets Syst. 150(2), 179-197 (2005)

17.36 R. Mikut, O. Burmeister, L. Gröll, M. Reischl: Takagi-Sugeno-Kang fuzzy classifiers for a special class of time-varying systems, IEEE Trans. Fuzzy Syst. 16(4), 1038-1049 (2008)

17.37 J.A. Dickerson, B. Kosko: Fuzzy function approximation with ellipsoidal rules, IEEE Trans. Syst. Man Cybern. B: Cybern. 26(4), 542-560 (1996)


17.38 D. Nauck, R. Kruse: Neuro-fuzzy systems for function approximation, Fuzzy Sets Syst. 101(2), 261-271 (1999)

17.39 L. Wang, J.M. Mendel: Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern. 22(6), 1414-1427 (1992)

17.40 J.C. Bezdek, J. Keller, R. Krishnapuram, N.R. Pal: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks of Fuzzy Sets, Vol. 4 (Kluwer, Norwell 1999)

17.41 F. Höppner, F. Klawonn, R. Kruse, T. Runkler: Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition (Wiley, New York 1999)

17.42 F. Klawonn, R. Kruse: Automatic generation of fuzzy controllers by fuzzy clustering, 1995 IEEE Int. Conf. Syst. Man Cybern.: Intell. Syst. 21st Century, Vol. 3 (IEEE, Vancouver 1995) pp. 2040-2045

17.43 F. Klawonn, R. Kruse: Constructing a fuzzy controller from data, Fuzzy Sets Syst. 85(2), 177-193 (1997)

17.44 Z.-W. Woo, H.-Y. Chung, J.-J. Lin: A PID type fuzzy controller with self-tuning scaling factors, Fuzzy Sets Syst. 115(2), 321-326 (2000)

17.45 R. Kruse, P. Held, C. Moewes: On fuzzy data analysis, Stud. Fuzzin. Soft Comput. 298, 351-356 (2013)

17.46 O. Cordón, F. Gomide, F. Herrera, F. Hoffmann, L. Magdalena: Ten years of genetic fuzzy systems: Current framework and new trends, Fuzzy Sets Syst. 141(1), 5-31 (2004)

17.47 C. Moewes, R. Kruse: Evolutionary fuzzy rules for ordinal binary classification with monotonicity constraints. In: Soft Computing: State of the Art Theory and Novel Applications, Studies in Fuzziness and Soft Computing, Vol. 291, ed. by R.R. Yager, A.M. Abbasov, M.Z. Reformat, S.N. Shahbazova (Springer, Berlin, Heidelberg 2013) pp. 105-112

17.48 D. Nauck, F. Klawonn, R. Kruse: Foundations of Neuro-Fuzzy Systems (Wiley, New York 1997)

17.49 J.C. Hühn, E. Hüllermeier: FR3: A fuzzy rule learner for inducing reliable classifiers, IEEE Trans. Fuzzy Syst. 17(1), 138-149 (2009)

17.50 C. Olaru, L. Wehenkel: A complete fuzzy decision tree technique, Fuzzy Sets Syst. 138(2), 221-254 (2003)

17.51 C. Moewes, R. Kruse: Unification of fuzzy SVMs and rule extraction methods through imprecise domain knowledge, Proc. Int. Conf. Inf. Process. Manag. Uncertain. Knowl.-Based Syst. (IPMU-08), ed. by J.L. Verdegay, L. Magdalena, M. Ojeda-Aciego (Torremolinos, Málaga 2008) pp. 1527-1534

17.52 C. Moewes, R. Kruse: On the usefulness of fuzzy
SVMs and the extraction of fuzzy rules from 
SVMs, Proc. 7th Conf. Eur. Soc. Fuzzy Logic Tech- 
nol. (EUSFLAT-2011) and LFA-2011, Vol. 17, ed. by 
S. Galichet, J. Montero, G. Mauris (Atlantis, Ams- 
terdam, Paris 2011) pp. 943-948 

J.C. Bezdek: Fuzzy Mathematics in Pattern Classifi- 
cation, Ph.D. Thesis (Cornell University, Itheca 1973) 


17.54 J.C. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms (Kluwer, Norwell 1981)

17.55 M. Sugeno, T. Yasukawa: A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Syst. 1(1), 7-31 (1993)

17.56 A. Keller, R. Kruse: Fuzzy rule generation for transfer passenger analysis, Proc. 1st Int. Conf. Fuzzy Syst. Knowl. Discovery (FSKD'02), ed. by L. Wang, S.K. Halgamuge, X. Yao (Orchid Country Club, Singapore 2002) pp. 667-671

17.57 R. Kruse, C. Döring, M. Lesot: Fundamentals of fuzzy clustering. In: Advances in Fuzzy Clustering and Its Applications, ed. by J.V. de Oliveira, W. Pedrycz (Wiley, Chichester 2007) pp. 3-30


18. Interval Type-2 Fuzzy PID Controllers 


Tufan Kumbasar, Hani Hagras 


The aim of this chapter is to present a general overview of interval type-2 fuzzy PID (proportional-integral-derivative) controller structures. We will focus on the standard double input direct action type fuzzy PID controller structures and their present design methods. It has been shown in various works that type-1 fuzzy PID controllers, using crisp type-1 fuzzy sets, might not be able to fully handle the high levels of uncertainty associated with control applications, while type-2 fuzzy PID controllers, using type-2 fuzzy sets, might be able to handle such uncertainties to produce a better control performance. Thus, we will classify and examine the handled fuzzy PID controllers within two groups with respect to the fuzzy sets they employ, namely type-1 and interval type-2 fuzzy sets. We will present and examine the controller structures of the direct action type-1 fuzzy PID and interval type-2 fuzzy PID controllers on a generic, symmetrical 3 × 3 rule base. We will present general information about the tuning parameters and design strategies of the type-1 fuzzy PID and interval type-2 fuzzy PID controllers. Finally, we will present a simulation study to evaluate the control performance of the type-1 fuzzy PID and interval type-2 fuzzy PID controllers on a first-order plus time-delay benchmark process.

18.1 Fuzzy Control Background
18.2 The General Fuzzy PID Controller Structure
  18.2.1 Type-1 Fuzzy PID Controllers
  18.2.2 Interval Type-2 Fuzzy PID Controllers
18.3 Simulation Studies
18.4 Conclusion
References


18.1 Fuzzy Control Background


Conventional PID controllers are the most popular controllers used in industry due to their simple structure and cost efficiency [18.1, 2]. However, the PID controller, being linear, is not suited for strongly nonlinear and uncertain systems. Thus, fuzzy logic controllers (FLCs) are extensively used as an alternative to PID control in processes where the system dynamics are either very complex or exhibit highly nonlinear characteristics. FLCs have achieved huge success in real-world control applications since they do not require a process model and the controller can be constructed based on the human operator's control expertise.

In the fuzzy control literature, fuzzy PID controllers (FPID) are often mentioned as an alternative to the conventional PID controllers since they are analogous to the conventional PID controllers from the input-output relationship point of view [18.3-6]. The FPID controllers can be classified into three major categories: direct action type, fuzzy gain scheduling type, and hybrid type fuzzy PID controllers [18.6]. The direct action type can be further classified into three categories according to the number of inputs: single input, double input, and triple input direct action FPID controllers [18.6]. In the literature, researchers have mainly focused on and analyzed double input direct action FPID controllers [18.6-10]. Numerous techniques have been developed in the literature for analyzing and designing a wide variety of FPID control systems. After the pioneering study by Qiao and Mizumoto [18.7], the main research was focused on type-1 fuzzy PID controllers (T1-FPID); however, a growing number of techniques have recently been developed for interval type-2 fuzzy PID (IT2-FPID) controllers.




It has been demonstrated that type-2 fuzzy logic systems are much more powerful tools than ordinary (type-1) fuzzy logic systems for representing highly nonlinear and/or uncertain systems. As a consequence, type-2 fuzzy logic systems have been applied in various areas, especially in control system design. The internal structure of the interval type-2 fuzzy logic controller (IT2-FLC) is similar to that of its type-1 counterpart. However, the major difference is that at least one of the input fuzzy sets (FSs) is an interval type-2 fuzzy set (IT2-FS) [18.11]. Thus, a type reducer is needed to convert type-2 sets into a type-1 fuzzy set before a defuzzification procedure can be performed [18.12]. Generally, interval type-2 fuzzy logic systems achieve better control performance because of the additional degree of freedom provided by the footprint of uncertainty (FOU) in their membership functions [18.13]. Consequently, IT2-FLCs have attracted much research interest, especially in control applications, since they are much more powerful in handling uncertainties and nonlinearities [18.14]. Thus, several applications have successfully employed IT2-FLCs, such as pH control [18.13], liquid-level process control [18.15], autonomous mobile robots [18.14, 16, 17], and bioreactor control [18.18].

In this chapter, we will focus on the most commonly used double input direct action type FPID controller structures. We will first present the general structure of the FPID controller and then classify the FPID controllers within two groups with respect to the fuzzy sets they employ, namely type-1 and interval type-2 fuzzy sets. Thus, we will present and examine the structures of the T1-FPID and IT2-FPID controllers on a generic, symmetrical 3 × 3 rule base. We will present detailed information about their internal structures and the design strategies presented in the literature. Finally, we will evaluate the control performance of the T1-FPID and IT2-FPID on a first-order plus time-delay benchmark process.


18.2 The General Fuzzy PID Controller Structure 


In this section, we present the general structure of the two-input direct action type FPID controllers formed using a fuzzy PD controller with an integrator and a summation unit at the output [18.7-10]. The standard FPID controller is constructed by choosing the inputs to be the error ($e$) and the derivative of the error ($\Delta e$), and the output is the control signal ($u$), as illustrated in Fig. 18.1. The output of the FPID is defined as

$$u = \alpha U + \beta \int U \, dt , \qquad (18.1)$$

where $U$ is the output of the fuzzy inference system.

Fig. 18.1 Illustration of the FPID controller structure

The design parameters of the FPID controller structure can be summarized within two groups, structural parameters and tuning parameters [18.6]. The structural parameters include the input/output variables of the fuzzy inference, the fuzzy linguistic sets, the type of membership functions, the fuzzy rules, and the inference mechanism, i.e., the fuzzy logic controller. In the handled FPID structure, the FLC is constructed as a set of heuristic control rules, and the control signal is directly deduced from the knowledge base and the fuzzy inference, as done in diagonal rule base generation approaches [18.7-10]. More detailed information about the internal structure of the FPID will be presented in the following subsections. Usually, the structural parameters of the FPID controller structure are determined during an off-line design.

The tuning parameters include input/output scaling factors (SFs) and parameters of membership functions (MFs). As can be seen from Fig. 18.1, the handled FPID controller structure has two input and two output scaling factors [18.7-10]. The input SFs $K_e$ (for the error $e$) and $K_d$ (for the change of error $\Delta e$) normalize the inputs to the common interval [-1, 1] in which the membership functions of the inputs are defined (thus, $e(t)$ and $\Delta e(t)$ are converted after normalization into $E$ and $\Delta E$), while the FLC output ($U$) is mapped onto the respective actual output ($u$) domain by the output SFs $\alpha$ and $\beta$. Usually, the tuning parameters can be calculated during the offline design process as well as adjusted online to enhance the process performance [18.9, 10].
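The signal flow of (18.1) together with the scaling factors can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class name `FuzzyPID`, the backward-difference approximation of the error derivative, the rectangular integration of U, and the clipping of the normalized inputs to [-1, 1] are choices made here for brevity and are not taken from the chapter. The argument `flc` stands for any fuzzy inference mapping (E, ΔE) to U.

```python
def _clip(x, lo=-1.0, hi=1.0):
    """Saturate a normalized input to the interval on which the MFs are defined."""
    return max(lo, min(hi, x))


class FuzzyPID:
    """Two-input direct action fuzzy PID: u = alpha*U + beta*integral(U dt)."""

    def __init__(self, flc, Ke, Kd, alpha, beta, dt):
        self.flc = flc            # callable (E, dE) -> U, e.g. a T1-FLC or IT2-FLC
        self.Ke, self.Kd = Ke, Kd          # input scaling factors
        self.alpha, self.beta = alpha, beta  # output scaling factors
        self.dt = dt              # sampling time (assumed constant)
        self.prev_error = 0.0
        self.integral_U = 0.0

    def step(self, error):
        # Assumption: derivative of the error via a backward difference.
        delta_e = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Input scaling factors normalize e and delta_e into E and dE.
        E = _clip(self.Ke * error)
        dE = _clip(self.Kd * delta_e)
        U = self.flc(E, dE)
        # Assumption: rectangular (forward Euler) integration of U.
        self.integral_U += U * self.dt
        return self.alpha * U + self.beta * self.integral_U
```

Any inference function can be plugged in as `flc`; a linear stand-in such as `lambda E, dE: E` is convenient for checking the scaling and integration logic before a real fuzzy controller is attached.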




In the following subsections, we will examine the structures of the T1-FPID and IT2-FPID controllers. We will present detailed information about the T1-FPID and IT2-FPID internal structures, design parameters, and tuning strategies. Finally, we will present comparative simulation results to show the superiority of the interval type-2 fuzzy PID controller compared to its type-1 counterpart.


18.2.1 Type-1 Fuzzy PID Controllers 


In this subsection, we will start by presenting the in- 
ternal structure of the handled T1-FPID controller and 
then we will present brief information about the design 
strategies for the T1-FPID controller structure in the lit- 
erature. 


The Internal Structure of the Type-1 Fuzzy PID Controller
In the handled T1-FPID structure, a symmetrical 3 × 3 rule base is used as shown in Table 18.1. The rule structure of the type-1 fuzzy logic controller (T1-FLC) is as follows

$$R_m: \text{If } E \text{ is } A_{1k} \text{ and } \Delta E \text{ is } A_{2l} \text{ then } U \text{ is } G_m , \qquad (18.2)$$

where $E$ (normalized error) and $\Delta E$ (normalized change of error) are the inputs, $U$ is the output of the FLC, $G_m$ is the consequent crisp set ($m = 1, \ldots, M = 9$), and $M$ is the number of rules. Here, $A_{1k}$ and $A_{2l}$ represent the type-1 membership functions (T1-MFs) ($k = 1, \ldots, K = 3$; $l = 1, \ldots, L = 3$), where $K$ and $L$ are the numbers of MFs that cover the universes of discourse of the inputs $E$ and $\Delta E$, respectively. In this chapter, we will employ three triangular T1-MFs for each input domain ($E$ and $\Delta E$) and denote them as N (negative), Z (zero), and P (positive). The T1-MFs of the T1-FLC are defined with the three parameters ($l_{ij}, c_{ij}, r_{ij}$; $i = 1, \ldots, I = 2$; $j = 1, \ldots, J = 3$), as shown in Fig. 18.2a. Here, $I$ is the total number of inputs ($I = 2$) and $J$ ($J = K = L = 3$) is the total number of MFs per input. The outputs of the FLC are defined with five crisp singleton consequents (negative (N) $= y_N$, negative medium (NM) $= y_{NM}$, zero (Z) $= y_Z$, positive medium (PM) $= y_{PM}$, positive (P) $= y_P$) as illustrated in Fig. 18.2b. The implemented T1-FLCs use the product implication and the center of sets defuzzification method. Thus, the output ($U$) of the T1-FLC is defined as

$$U = \frac{\sum_{m=1}^{M} f_m G_m}{\sum_{m=1}^{M} f_m} , \qquad (18.3)$$

where $f_m$ is the total firing strength of the $m$-th rule, defined as

$$f_m = \mu_{A_{1k}} \ast \mu_{A_{2l}} . \qquad (18.4)$$

Here, $\ast$ represents the product implication (the t-norm), and $\mu_{A_{1k}}$ and $\mu_{A_{2l}}$ are the membership grades of the $A_{1k}$ and $A_{2l}$ T1-MFs, respectively.

Table 18.1 The rule base of the FPID controller

E / ΔE |  N   |  Z   |  P
-------+------+------+------
N      |  N   |  NM  |  Z
Z      |  NM  |  Z   |  PM
P      |  Z   |  PM  |  P
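The inference defined by (18.2)-(18.4) and the rule base of Table 18.1 can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the MF parameters and the singleton consequents are assumed to take the values used later in the simulation study of Sect. 18.3, and the N and P sets are assumed to saturate at 1 outside their peaks (shoulder shape), which the chapter does not state explicitly.

```python
def mf_N(x, c=-1.0, r=0.0):
    """'Negative' MF with parameters (c, r); assumed to stay at 1 for x <= c."""
    if x <= c:
        return 1.0
    if x >= r:
        return 0.0
    return (r - x) / (r - c)


def mf_Z(x, l=-1.0, c=0.0, r=1.0):
    """'Zero' triangular MF with feet l, r and peak c."""
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (c - l) if x <= c else (r - x) / (r - c)


def mf_P(x, l=0.0, c=1.0):
    """'Positive' MF with parameters (l, c); assumed to stay at 1 for x >= c."""
    if x <= l:
        return 0.0
    if x >= c:
        return 1.0
    return (x - l) / (c - l)


LABELS = ("N", "Z", "P")
MFS = {"N": mf_N, "Z": mf_Z, "P": mf_P}

# Rule base of Table 18.1: RULES[label of E][label of dE] -> consequent label.
RULES = {
    "N": {"N": "N", "Z": "NM", "P": "Z"},
    "Z": {"N": "NM", "Z": "Z", "P": "PM"},
    "P": {"N": "Z", "Z": "PM", "P": "P"},
}

# Crisp singleton consequents (assumed values, as in the simulation study).
Y = {"N": -1.0, "NM": -0.75, "Z": 0.0, "PM": 0.75, "P": 1.0}


def t1_flc(E, dE):
    """Center of sets output (18.3) with product firing strengths (18.4)."""
    num = den = 0.0
    for a in LABELS:
        for b in LABELS:
            f = MFS[a](E) * MFS[b](dE)   # firing strength of the rule (a, b)
            num += f * Y[RULES[a][b]]
            den += f
    return num / den if den > 0.0 else 0.0
```

With these assumed parameters, the three MFs of each input sum to one on [-1, 1], so the denominator of (18.3) equals one for normalized inputs.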


Type-1 Fuzzy PID Design Strategies
In the design of the handled T1-FPID controller structure with a 3 × 3 rule base, the parameters to be determined are the scaling factors and the parameters of the antecedent and consequent membership functions. The antecedent MFs of the T1-FPID controller that are labeled as N and P are defined for each input with two parameters each, which are $c_{i1}, r_{i1}$ (for N) and $l_{i3}, c_{i3}$ (for P) ($i = 1, 2$), respectively, while the linguistic label Z is defined with three parameters, which are $l_{i2}, c_{i2}, r_{i2}$ ($i = 1, 2$). Hence, for two inputs, the total number of antecedent membership function parameters to be designed for the T1-FPID is 2 × 7 = 14. Moreover, five output consequent parameters ($y_N, y_{NM}, y_Z, y_{PM}, y_P$) have to be determined. Thus, in total 19 MF parameters have to be tuned. Besides, the input and output scaling factors ($K_e$, $K_d$, $\alpha$, and $\beta$) of the T1-FPID controller must also be determined. Thus, there are 19 MF and four SF parameters (19 + 4 = 23) that have to be tuned for the handled T1-FPID controller.

Fig. 18.2a,b Illustration of the (a) antecedent MFs and (b) consequent MFs of the T1-FLC

In the fuzzy control literature, one method for the T1-FPID controller design is to employ evolutionary algorithms [18.19, 20]. In addition, Ahn and Truong [18.21] used a robust extended Kalman filter to tune the membership functions of the fuzzy controller during system operation to improve the control performance in an online manner. Moreover, various heuristic and nonheuristic scaling factor tuning algorithms have been presented for systems subject to nonlinearities, parameter changes, modeling errors, and disturbances [18.7, 9, 10, 22-25].


18.2.2 Interval Type-2 Fuzzy PID Controllers


In this subsection, we will start by presenting the internal structure of the handled IT2-FPID controller and then we will present brief information about the design strategies for the IT2-FPID controller structure in the literature.


The Internal Structure of the Interval Type-2 Fuzzy PID Controller
In the handled IT2-FPID structures, the same 3 × 3 rule base presented for the T1-FPID controller in Table 18.1 is used. The internal structure of the IT2-FPID is similar to that of its type-1 counterpart. However, the major difference is that IT2-FLCs employ and process interval type-2 fuzzy sets (IT2-FSs) rather than type-1 fuzzy sets, and thus the IT2-FLC has an extra type-reduction process [18.12, 14].

Type-2 fuzzy sets are the generalized forms of type-1 fuzzy sets. A type-2 fuzzy set ($\tilde{A}$) is characterized by a type-2 membership function $\mu_{\tilde{A}}(x, u)$, i.e.,

$$\tilde{A} = \{ ((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \ \forall u \in J_x \subseteq [0, 1] \} , \qquad (18.5)$$

in which $0 \le \mu_{\tilde{A}}(x, u) \le 1$. For a continuous universe of discourse, $\tilde{A}$ can also be expressed as

$$\tilde{A} = \int_{x \in X} \int_{u \in J_x} \mu_{\tilde{A}}(x, u) / (x, u) , \quad J_x \subseteq [0, 1] , \qquad (18.6)$$

where $\int\!\int$ denotes union over all admissible $x$ and $u$ [18.12, 14]. $J_x$ is referred to as the primary membership of $x$, while $\mu_{\tilde{A}}(x, u)$ is a type-1 fuzzy set known as the secondary set. The uncertainty in the primary membership of a type-2 fuzzy set $\tilde{A}$ is defined by a region named the footprint of uncertainty (FOU). The FOU can be described in terms of an upper membership function ($\overline{\mu}_{\tilde{A}}$) and a lower membership function ($\underline{\mu}_{\tilde{A}}$). The primary membership is called $J_x$, and its associated possible secondary membership functions can be trapezoidal, interval, etc. When the interval secondary membership function is employed, an IT2-FS (such as the ones shown in Fig. 18.4a) is obtained [18.12, 14]. In other words, when $\mu_{\tilde{A}}(x, u) = 1$ for $\forall u \in J_x \subseteq [0, 1]$, an IT2-FS is constructed.

Fig. 18.3 Block diagram of the IT2-FLC (crisp input, fuzzifier, rules and inference, output processing with type reducer and defuzzifier, crisp output)

The internal structure of the IT2-FLC is given in Fig. 18.3. Similar to a T1-FLC, an IT2-FLC includes a fuzzifier, a rule base, and an inference engine, but it replaces the defuzzifier with an output processor comprising a type reducer and a defuzzifier. The IT2-FLC uses interval type-2 fuzzy sets (such as the ones shown in Fig. 18.4a) to represent the inputs and/or outputs of the FLC. In interval type-2 fuzzy sets, all the third-dimension values are equal to one. The use of interval type-2 fuzzy sets helps to simplify the computation (as opposed to the general type-2 FLC, where the third dimension of the type-2 fuzzy sets can take any shape).

The IT2-FLC works as follows: the crisp inputs are first fuzzified into input type-2 fuzzy sets; singleton fuzzification is usually used in IT2-FLC applications due to its simplicity and suitability for embedded processors and real-time applications. The input type-2 fuzzy sets then activate the inference engine and the rule base to produce output type-2 fuzzy sets. The IT2-FLC rule base remains the same as for the T1-FLC, but its MFs are represented by interval type-2 fuzzy sets instead of type-1 fuzzy sets. The inference engine combines the fired rules and gives a mapping from input type-2 fuzzy sets to output type-2 fuzzy sets. The type-2 fuzzy output sets of the inference engine are then processed by the type reducer, which combines the output sets and performs a centroid calculation that leads to type-1 fuzzy sets called the type-reduced sets. There are different types of type-reduction methods. In this chapter, we will use the center of sets type reduction, as its computational complexity is reasonable: it lies between the computationally expensive centroid type reduction and the simple height and modified height type reductions, which have problems when only one rule fires [18.12]. After the type-reduction process, the type-reduced sets are defuzzified (by taking the average of the end points of the type-reduced set) to obtain crisp outputs. More information about type-2 fuzzy logic systems and their benefits can be found in [18.11, 12, 14].

Fig. 18.4a,b Illustration of the (a) antecedent MFs and (b) consequent MFs of the IT2-FLC

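The rule structure of the interval type-2 fuzzy logic controller is as follows

$$R_m: \text{If } E \text{ is } \tilde{A}_{1k} \text{ and } \Delta E \text{ is } \tilde{A}_{2l} \text{ then } U \text{ is } G_m , \qquad (18.7)$$

where $E$ (normalized error) and $\Delta E$ (normalized change of error) are the inputs, $U$ is the output of the IT2-FLC, $G_m$ is the consequent interval set ($G_m = [\underline{g}_m, \overline{g}_m]$; $m = 1, \ldots, M = 9$), and $M$ is the number of rules. The antecedents of the IT2-FLC are defined by interval type-2 membership functions (IT2-MFs) ($\tilde{A}_{1k}$, $\tilde{A}_{2l}$) for the inputs $E$ and $\Delta E$, respectively, which can be simply obtained by extending/blurring the T1-MFs ($A_{1k}$, $A_{2l}$) of the T1-FLC. Here, the IT2-MF is defined with four parameters ($l_{ij}, c_{ij}, r_{ij}, \delta_{ij}$; $i = 1, 2$; $j = 1, 2, 3$), as shown in Fig. 18.4a. Since the input IT2-FS is described in terms of an upper membership function ($\overline{\mu}_{\tilde{A}}$) and a lower membership function ($\underline{\mu}_{\tilde{A}}$), the total firing strength for the $m$-th rule is

$$\tilde{f}_m = [\underline{f}_m, \overline{f}_m] , \qquad (18.8)$$

where $\tilde{f}_m$ is the total firing interval and is defined as

$$\underline{f}_m = \underline{\mu}_{\tilde{A}_{1k}} \ast \underline{\mu}_{\tilde{A}_{2l}} , \qquad (18.9)$$

$$\overline{f}_m = \overline{\mu}_{\tilde{A}_{1k}} \ast \overline{\mu}_{\tilde{A}_{2l}} . \qquad (18.10)$$

Here, $\ast$ represents the product implication (the t-norm), and $\underline{\mu}_{\tilde{A}_{1k}}, \underline{\mu}_{\tilde{A}_{2l}}$ and $\overline{\mu}_{\tilde{A}_{1k}}, \overline{\mu}_{\tilde{A}_{2l}}$ are the lower and upper membership grades of the $\tilde{A}_{1k}$ and $\tilde{A}_{2l}$ IT2-MFs, respectively.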
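The firing interval of (18.8)-(18.10) can be illustrated with a short Python sketch. It rests on an assumption that should be read with care: the chapter only states that the IT2-MFs are obtained by blurring the T1-MFs with an extra parameter δ, so the sketch assumes that the upper MF equals the original triangular T1-MF and that the lower MF is the same triangle scaled to height δ. The function names are hypothetical.

```python
def tri(x, l, c, r):
    """Triangular T1-MF with feet l, r and peak c."""
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (c - l) if x <= c else (r - x) / (r - c)


def it2_grade(x, l, c, r, delta):
    """Return (lower, upper) membership grades of an IT2-MF.

    Assumption: upper MF = T1 triangle, lower MF = the same triangle
    scaled to height delta (0 < delta <= 1).
    """
    upper = tri(x, l, c, r)
    return delta * upper, upper


def firing_interval(grade_E, grade_dE):
    """Product t-norm applied to lower and upper grades, as in (18.9), (18.10)."""
    return grade_E[0] * grade_dE[0], grade_E[1] * grade_dE[1]


# Example: rule "E is Z and dE is Z" with delta = 0.9 for both Z sets.
f_lower, f_upper = firing_interval(
    it2_grade(0.5, -1.0, 0.0, 1.0, 0.9),
    it2_grade(0.0, -1.0, 0.0, 1.0, 0.9),
)
```

Under this assumption, δ = 1 collapses the interval to a single value and the type-1 firing strength of (18.4) is recovered.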

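The consequent membership functions of the IT2-FLC are defined with five interval consequents, labeled as negative (N) $= [\underline{y}_N, \overline{y}_N]$, negative medium (NM) $= [\underline{y}_{NM}, \overline{y}_{NM}]$, zero (Z) $= [\underline{y}_Z, \overline{y}_Z]$, positive medium (PM) $= [\underline{y}_{PM}, \overline{y}_{PM}]$, and positive (P) $= [\underline{y}_P, \overline{y}_P]$, as shown in Fig. 18.4b.

The implemented IT2-FLC uses the center of sets type reduction method [18.12]. It has been demonstrated that the defuzzified output of an IT2-FLC can be calculated as

$$U = \frac{U_l + U_r}{2} , \qquad (18.11)$$

where $U_l$ and $U_r$ are the left and right end points, respectively, of the type-reduced set, defined as follows

$$U_l = \frac{\sum_{m=1}^{L} \overline{f}_m \underline{g}_m + \sum_{m=L+1}^{M} \underline{f}_m \underline{g}_m}{\sum_{m=1}^{L} \overline{f}_m + \sum_{m=L+1}^{M} \underline{f}_m} , \qquad (18.12)$$

$$U_r = \frac{\sum_{m=1}^{R} \underline{f}_m \overline{g}_m + \sum_{m=R+1}^{M} \overline{f}_m \overline{g}_m}{\sum_{m=1}^{R} \underline{f}_m + \sum_{m=R+1}^{M} \overline{f}_m} . \qquad (18.13)$$

The type-reduced set can be calculated by using the iterative Karnik-Mendel (KM) method, which is given in Table 18.2 [18.26].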


Interval Type-2 Fuzzy PID Design Strategies
In the interval type-2 fuzzy PID control design strategy, the scaling factors and the parameters of the antecedent and consequent membership functions of the IT2-FPIDs have to be determined. The antecedent IT2-FSs of the IT2-FPID controller that are labeled as N and P are defined for each input with three parameters each, which are ($c_{i1}, r_{i1}, \delta_{i1}$) and ($l_{i3}, c_{i3}, \delta_{i3}$) ($i = 1, 2$), respectively, while the IT2-FS labeled as Z is defined with four parameters ($l_{i2}, c_{i2}, r_{i2}, \delta_{i2}$; $i = 1, 2$). Consequently, for the two inputs, the total number of antecedent parameters to be designed is 2 × 10 = 20. Moreover, 10 output consequent parameters ($\underline{y}_N, \overline{y}_N, \underline{y}_{NM}, \overline{y}_{NM}, \underline{y}_Z, \overline{y}_Z, \underline{y}_{PM}, \overline{y}_{PM}, \underline{y}_P, \overline{y}_P$) have to be determined. Hence, in total 30 MF parameters have to be tuned. Besides, the input and output scaling factors ($K_e$, $K_d$, $\alpha$, and $\beta$) of the controller must also be determined. Thus, there are 30 MF and four SF parameters (30 + 4 = 34) that have to be tuned for the handled IT2-FPID controller. The IT2-FPID therefore has 11 more tuning parameters, i.e., extra degrees of freedom, than the T1-FPID controller structure (23 parameters).

The systematic design of type-2 fuzzy controllers is a challenging problem since the output cannot be presented in a closed form due to the KM type-reduction method. To overcome this bottleneck, alternative type-reduction algorithms, which are closed-form approximations to the original KM type-reduction algorithm, have been proposed and employed in controller design [18.27, 28]. However, the main difficulty is to tune the relatively large number of parameters of the IT2-FPID controller structure. Thus, several studies have employed various techniques for the design problem, including genetic algorithms [18.15, 29], particle swarm optimization [18.30], and ant colony optimization [18.31].

From a practical point of view, the IT2-FPID controller design problem can be simply solved by blurring/extending the MFs of an existing T1-FPID controller [18.16, 28, 32]. Moreover, each rule consequent can also be chosen as a crisp number ($\underline{g}_m = \overline{g}_m$) to reduce the number of parameters to be determined. It is also common to set the consequent parameters and the scaling factors to the same values as those of a predesigned T1-FPID controller. Thus, in the IT2-FPID design only the antecedent membership function parameters have to be tuned [18.28]. This design approach reduces the number of parameters to be tuned from 34 to 20. The design of the antecedent MFs can be carried out by extensive trial-and-error procedures or by employing evolutionary algorithms [18.33].

Table 18.2 Calculation of the two end points of the type-reduced set (Karnik-Mendel algorithm)

Step 1.
  For $U_l$: Sort $\underline{g}_m$ ($m = 1, \ldots, M$) in increasing order such that $\underline{g}_1 \le \underline{g}_2 \le \cdots \le \underline{g}_M$. Match the corresponding weights $\tilde{f}_m$ (with their index corresponding to the renumbered $\underline{g}_m$).
  For $U_r$: Sort $\overline{g}_m$ ($m = 1, \ldots, M$) in increasing order such that $\overline{g}_1 \le \overline{g}_2 \le \cdots \le \overline{g}_M$. Match the corresponding weights $\tilde{f}_m$ (with their index corresponding to the renumbered $\overline{g}_m$).

Step 2.
  For $U_l$: Initialize $f_m$ by setting $f_m = (\underline{f}_m + \overline{f}_m)/2$, $m = 1, \ldots, M$, and compute $U = \sum_{m=1}^{M} f_m \underline{g}_m / \sum_{m=1}^{M} f_m$.
  For $U_r$: Initialize $f_m$ by setting $f_m = (\underline{f}_m + \overline{f}_m)/2$, $m = 1, \ldots, M$, and compute $U = \sum_{m=1}^{M} f_m \overline{g}_m / \sum_{m=1}^{M} f_m$.

Step 3.
  For $U_l$: Find the switch point $L$ ($1 \le L \le M - 1$) such that $\underline{g}_L \le U \le \underline{g}_{L+1}$.
  For $U_r$: Find the switch point $R$ ($1 \le R \le M - 1$) such that $\overline{g}_R \le U \le \overline{g}_{R+1}$.

Step 4.
  For $U_l$: Set $f_m = \overline{f}_m$ for $m \le L$ and $f_m = \underline{f}_m$ for $m > L$, and compute $U' = \sum_{m=1}^{M} f_m \underline{g}_m / \sum_{m=1}^{M} f_m$.
  For $U_r$: Set $f_m = \underline{f}_m$ for $m \le R$ and $f_m = \overline{f}_m$ for $m > R$, and compute $U' = \sum_{m=1}^{M} f_m \overline{g}_m / \sum_{m=1}^{M} f_m$.

Step 5.
  For $U_l$: Check if $U = U'$. If not, set $U = U'$ and go to step 3. If yes, stop and set $U_l = U'$.
  For $U_r$: Check if $U = U'$. If not, set $U = U'$ and go to step 3. If yes, stop and set $U_r = U'$.
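The Karnik-Mendel iteration of Table 18.2 and the defuzzification of (18.11) can be sketched in Python as below. This is a minimal sketch rather than a reference implementation: the function names, the convergence tolerance, and the iteration cap are assumptions introduced here, and at least one rule is assumed to fire so that the initial weight sum is nonzero.

```python
def _km_endpoint(g, f_lo, f_up, left, max_iter=100, tol=1e-12):
    """One end point of the type-reduced set (steps 1-5 of Table 18.2)."""
    # Step 1: sort the consequent end points and reorder the weights.
    order = sorted(range(len(g)), key=lambda m: g[m])
    g = [g[m] for m in order]
    f_lo = [f_lo[m] for m in order]
    f_up = [f_up[m] for m in order]
    M = len(g)
    # Step 2: initialize with the midpoints of the firing intervals.
    f = [(a + b) / 2.0 for a, b in zip(f_lo, f_up)]
    U = sum(fm * gm for fm, gm in zip(f, g)) / sum(f)
    for _ in range(max_iter):
        # Step 3: switch point s (0-based), g[s] <= U <= g[s+1].
        s = 0
        while s < M - 2 and g[s + 1] <= U:
            s += 1
        # Step 4: upper/lower weights on either side of the switch point.
        if left:
            f = [f_up[m] if m <= s else f_lo[m] for m in range(M)]
        else:
            f = [f_lo[m] if m <= s else f_up[m] for m in range(M)]
        den = sum(f)
        if den == 0.0:          # guard added here, not part of Table 18.2
            return U
        U_new = sum(fm * gm for fm, gm in zip(f, g)) / den
        # Step 5: stop when the estimate no longer changes.
        if abs(U_new - U) <= tol:
            return U_new
        U = U_new
    return U


def km_type_reduce(g_lo, g_up, f_lo, f_up):
    """Return (U_l, U_r, U) with U = (U_l + U_r) / 2 as in (18.11)."""
    U_l = _km_endpoint(g_lo, f_lo, f_up, left=True)
    U_r = _km_endpoint(g_up, f_lo, f_up, left=False)
    return U_l, U_r, (U_l + U_r) / 2.0


# Example with crisp consequents (g_lo == g_up) and three fired rules.
U_l, U_r, U = km_type_reduce(
    [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0],
    [0.1, 0.4, 0.2], [0.3, 0.6, 0.5],
)
```

Because each end point is found by an iteration with a data-dependent switch point, the output has no closed form, which is the design difficulty noted above.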




18.3 Simulation Studies 


In this section, we will compare the performance of the IT2-FPID controller with that of the T1-FPID controller for the following first-order plus time-delay process

$$G(s) = \frac{K}{\tau s + 1} e^{-Ls} , \qquad (18.14)$$

where $K$ is the gain, $L$ is the time delay, and $\tau$ is the time constant of the process. The nominal system parameters are $K = 1$, $L = 1$, and $\tau = 1$.
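For readers who wish to reproduce the benchmark outside a block-diagram environment, the process (18.14) can be simulated with a few lines of Python. The sketch below is an assumption-laden illustration: it uses an exact zero-order-hold discretization of the first-order lag, assumes that the delay L is an integer multiple of the sampling time, and the function name `simulate_fopdt` is hypothetical.

```python
import math


def simulate_fopdt(u, K=1.0, tau=1.0, L=1.0, dt=0.1):
    """Open-loop response of G(s) = K exp(-L s) / (tau s + 1) to the samples u.

    Assumptions: zero initial conditions, zero-order hold on u, and a delay
    L that is (approximately) an integer number of sampling periods dt.
    """
    a = math.exp(-dt / tau)       # discrete pole of the first-order lag
    d = int(round(L / dt))        # delay expressed in samples
    y = [0.0]
    for k in range(len(u) - 1):
        u_delayed = u[k - d] if k >= d else 0.0
        y.append(a * y[-1] + K * (1.0 - a) * u_delayed)
    return y


# Unit step applied at t = 0 to the nominal process, simulated for 40 s.
y_nominal = simulate_fopdt([1.0] * 400)
```

The perturbed processes considered later in this section correspond to calling the same function with a different `K`, `tau`, or `L`.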

We will first design a T1-FPID controller and then extend the type-1 fuzzy controller to design an IT2-FPID controller structure, since type-2 fuzzy logic theory is a generalization and extension of type-1 fuzzy logic theory. We will characterize each input domain ($E$ and $\Delta E$) of the T1-FPID controllers with three uniformly distributed symmetrical triangular MFs. The parameters of the MFs are tabulated in Table 18.3. We will set the consequent parameters as $y_N = -1.0$ (negative), $y_{NM} = -0.75$ (negative medium), $y_Z = 0.0$ (zero), $y_{PM} = 0.75$ (positive medium), and $y_P = 1.0$ (positive) to obtain a standard diagonal rule base. Then, the scaling factors of the T1-FPID have been chosen so as to obtain a fast and satisfactory output response for a unit step input. The scaling factors are set as $K_e = 1$, $K_d = 0.1$, $\alpha = 0.1$, and $\beta = 0.5$.

Table 18.3 The antecedent MF parameters of the T1-FPID and IT2-FPID controllers

            |       T1-FPID        |           IT2-FPID
Input   MF  |   l      c      r    |   l      c      r      δ
------------+----------------------+----------------------------
E       N   |   -    -1.0    0.0   |   -    -1.0    0.0    0.2
E       Z   | -1.0    0.0    1.0   | -1.0    0.0    1.0    0.9
E       P   |  0.0    1.0     -    |  0.0    1.0     -     0.2
ΔE      N   |   -    -1.0    0.0   |   -    -1.0    0.0    0.2
ΔE      Z   | -1.0    0.0    1.0   | -1.0    0.0    1.0    0.9
ΔE      P   |  0.0    1.0     -    |  0.0    1.0     -     0.2

As stated in Sect. 18.2.2, the IT2-FPID controller design will be accomplished by only blurring/extending the antecedent MFs of the T1-FPID controller. Thus, we will set the consequent parameters and the scaling factors to the same values as those of its type-1 counterpart. The antecedent MF parameters of the IT2-FPID are presented in Table 18.3. This setting gives the opportunity to illustrate how the extra degrees of freedom provided by the FOUs affect the control system performance.

In the simulation studies, both FPID controllers are implemented as discrete-time versions obtained with the bilinear transform with the sampling time $t_s = 0.1$ s. The simulations were done on a personal computer with an Intel Pentium Dual Core T2370 1.73 GHz processor, 2.99 GB RAM, and the software package MATLAB/Simulink 7.4.0. Note that the simulation solver option is chosen as ode5 (Dormand-Prince) and the step size is fixed at a value of 0.1 s.

The unit step response performances of the type-1 and type-2 fuzzy PID control systems are investigated for the nominal parameter set $K = 1$, $\tau = 1$, $L = 1$ (nominal process) and for three perturbed parameter sets, which are $K = 1$, $\tau = 2$, $L = 1$ (perturbed process-1), $K = 1$, $\tau = 1$, $L = 2$ (perturbed process-2), and $K = 2$, $\tau = 1$, $L = 1$ (perturbed process-3), to examine their robustness against parameter variations. In this context, we will consider three performance measures, namely the settling time ($T_s$), the overshoot (OS%), and the integral time absolute error (ITAE).

Fig. 18.5a-d Illustration of the step responses over time (s) of the reference, the T1-FPID, and the IT2-FPID: (a) nominal process ($K = 1$, $\tau = 1$, $L = 1$), (b) perturbed process-1, (c) perturbed process-2, (d) perturbed process-3

Table 18.4 Control performance comparison of the FPID controllers

Process                                   | Controller | OS (%) | T_s (s) | ITAE
------------------------------------------+------------+--------+---------+-------------
Nominal process (K = 1, τ = 1, L = 1)     | T1-FPID    |   12   |   9.8   | 27.87
                                          | IT2-FPID   |    4   |   6.0   | 26.63
Perturbed process-1 (K = 1, τ = 2, L = 1) | T1-FPID    |   25   |  18.6   | 53.38
                                          | IT2-FPID   |   16   |  14.8   | 47.83
Perturbed process-2 (K = 1, τ = 1, L = 2) | T1-FPID    |   43   |  23.8   | 80.83
                                          | IT2-FPID   |   30   |  18.2   | [illegible]
Perturbed process-3 (K = 2, τ = 1, L = 1) | T1-FPID    |   47   |  15.5   | 43.74
                                          | IT2-FPID   |   36   |  11.9   | 32.17

The system performances of the nominal and perturbed systems are illustrated in Fig. 18.5 and the performance measures are given in Table 18.4. As can be clearly seen in Fig. 18.5, the IT2-FPID controller produces superior control performance in comparison to its type-1 counterpart. For instance, if we examine the results for the nominal process, as compared to the T1-FPID, the IT2-FPID control structure reduces the overshoot by about 66%; it also decreases the settling time by about 39% and the total ITAE value by about 8%. Moreover, if we examine the results of perturbed process-2 (the time delay $L$ has been increased by 100%), it can be clearly seen that the T1-FPID control system response is oscillating, while the IT2-FPID was able to reduce the overshoot by about 30%, the settling time by about 24%, and the total ITAE value by about 27%. Similar comments can be made for the other two perturbed system performances presented.

It can be concluded that the transient state performance of the IT2-FPID control structure is better than that of the T1-FPID controller, while it also appears to be more robust against parameter variations in comparison to its type-1 counterpart.


18.4 Conclusion


The aim of this chapter was to present a general overview of FPID controller structures in the literature, since fuzzy sets are recognized as a powerful tool to handle the uncertainties faced within control applications. We mainly focused on the double input direct action type fuzzy PID controller structures and their state-of-the-art design methods. We classified the fuzzy PID controllers in the literature within two groups, namely T1-FPID and IT2-FPID controllers. We examined the internal structures of the T1-FPID and IT2-FPID on a generic, diagonal 3 × 3 rule base. We presented detailed information about their internal structures, design parameters, and the tuning strategies presented in the literature. Finally, we evaluated the control performance of the T1-FPID and IT2-FPID on a first-order plus time-delay benchmark process. We illustrated that the T1-FPID controller using crisp type-1 fuzzy sets might not be able to fully handle the high levels of uncertainty, while the IT2-FPID using type-2 fuzzy sets might be able to handle such uncertainties to produce a better control performance.




19. Soft Computing in Database 
and Information Management 


Guy De Tré, Sławomir Zadrożny


Information is often imperfect. The sources of this
imperfection include imprecision, uncertainty,
incompleteness, and ambiguity. Soft computing techniques
allow for coping more efficiently with such kinds of
imperfection when handling data in information systems.
In this chapter, we give an overview of selected soft
computing techniques for database management. The chapter
is subdivided in two parts which deal with the soft
computing techniques, respectively, for information
modelling and querying. A considerable part of the chapter
is related to the issue of bipolarity of preferences and
data representation which is among the important recent
research trends.


19.1 Challenges for Modern Information Systems
19.2 Some Preliminaries
    19.2.1 Relational Databases
    19.2.2 Fuzzy Set Theory and Possibility Theory
19.3 Soft Computing in Information Modeling
    19.3.1 Modeling of Imperfect Information - Basic Approaches
    19.3.2 Modeling of Imperfect Information - Selected Advanced Approaches
19.4 Soft Computing in Querying
    19.4.1 Flexible Querying of Regular Databases
    19.4.2 Flexible Querying of Fuzzy Databases
19.5 Conclusions
References

19.1 Challenges for Modern Information Systems 


Database systems nowadays form a basic component of 
almost every information system and their role is getting 
more and more important. Almost every person or com- 
pany keeps track of a large amount of digital data and 
this amount is still growing every day. Many ICT managers
declare that big data is a point of attention for the
coming years. Despite the fact that the concept of big 
data is not clearly defined, one can generally agree that 
it refers to the increasing need for efficiently storing and 
handling large amounts of information. 

However, it is easy to observe that information is 
not always available in a perfect form. Just consider the 
fact that human beings communicate most of the time 
using vague terms, thereby reflecting the fact that they do
not know exact, precise values with certainty. In general,
imperfection of information may be due to imprecision,
uncertainty, incompleteness, or ambiguity.


Our life and society have changed in such a way that we 
simply cannot neglect or discard imperfect information 
anymore. To be competitive, companies need to cope 
with all information that is available. Efficiently storing 
and handling imperfect information without introduc- 
ing errors or causing data loss is therefore considered 
as one of the main challenges for information manage- 
ment in this century [19.1]. 

Soft computing offers formalisms and techniques 
for coping with imperfect data in a mathematically 
sound way [19.2]. The earliest research activities in this
area date back to the early 1980s.
In this chapter, we present an overview of selected
results of the research on soft computing techniques 
aimed at improving database modelling and database 
access in the presence of imperfect information. The 
scope of the chapter is further limited to database access
techniques that are based on querying and specifying 
and handling user preferences in query formulations. 
Other techniques, not dealt with in this chapter, include: 


• Self-query auto completion systems that help users
  in formulating queries by exploiting past queries, as
  used in recommendation systems [19.3].

• Navigational querying systems that allow intelligent
  navigation through the database [19.4].

• Cooperative querying systems that support indirect
  answers such as summaries, conditional answers,
  and contextual background information for (empty)
  results [19.5].

The remainder of the chapter is organized as follows.
In Sect. 19.2, we give some preliminaries on (relational)
databases, which are used as a basis for illustrating the
described techniques. The next two sections, Sects. 19.3
and 19.4, form the core of the chapter. In Sect. 19.3, an
overview of soft computing techniques for the modelling
and handling of imperfect data in databases is presented.
In Sect. 19.4, the main trends in soft computing
techniques for flexible database querying are discussed.
Both querying of regular databases and querying of
databases containing imperfect data are handled. The
conclusions of the chapter are stated in Sect. 19.5.


19.2 Some Preliminaries


In order to review and discuss the main contributions to
the research area of soft computing in database and
information management, we have to introduce the
terminology and notation related to the basics of database
management and fuzzy logic.


19.2.1 Relational Databases


The techniques presented in this work will be described
as generally as possible, so that they are in fact
applicable to multiple database models. However, due to
its popularity and mathematical foundations, the
relational database model has been used as the original
formal framework for many of these techniques. For that
reason, we opt for using the relational model as the
underlying database model throughout the chapter.

A relational database can in an abstract sense be seen as
a collection of relations or, informally, of tables which
represent them. Informally speaking, the columns of a
table represent its characteristics, whereas its rows
reflect its content [19.6]. From a formal point of view,
each relation $R$ is defined via its relation schema [19.7]

$$R(A_1 : D_1, \ldots, A_n : D_n) , \tag{19.1}$$

where $A_i : D_i$, $i = 1, \ldots, n$, are the attributes
(columns) of the relation. Each attribute $A_i : D_i$ is
characterized by its name $A_i$ and its associated data
type (domain) $D_i$, also denoted as $\mathrm{dom}_{A_i}$.
The data type $D_i$ determines the allowed values for the
attribute and the basic operators that can be applied to
them. Each relation (table) represents a set of real-world
entities, each of them being modeled by a tuple (row) of
the relation. Relation schemas are the basic components of
a database schema. In this way, a table contains data
describing a part of the real world being modeled by the
database schema.

The most interesting operation on a database, from 
this chapter’s perspective, is the retrieval of data satis- 
fying certain conditions. Usually, to retrieve data, a user 
forms a query specifying these conditions (criteria). The 
conditions then reflect the user’s preferences with re- 
spect to the information he or she is looking for. The 
retrieval process may be meant as the calculation of 
a matching degree for each tuple of relevant relation(s). 
Classically, a row either matches the query or not, i. e., 
the concept of matching is binary. In the context of soft 
computing, flexible criteria, soft aggregation, and soft 
ranking techniques can be used, so that tuple matching 
becomes a matter of a degree. 
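
The difference between binary and graded matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation `Painting`, its attributes, the sample tuples, and the breakpoints of the "cheap" criterion are hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Painting:
    """One tuple of a hypothetical relation Painting(title, year, price)."""
    title: str
    year: int
    price: float  # in millions; the unit is an assumption for this sketch


def crisp_match(t: Painting) -> bool:
    """Classical criterion 'price <= 5': a tuple matches or it does not."""
    return t.price <= 5.0


def cheap_degree(t: Painting) -> float:
    """Flexible criterion 'price is cheap' (illustrative breakpoints):
    degree 1 up to 4, decreasing linearly to 0 at 8."""
    if t.price <= 4.0:
        return 1.0
    if t.price >= 8.0:
        return 0.0
    return (8.0 - t.price) / 4.0


relation = [
    Painting("A", 1620, 3.0),
    Painting("B", 1635, 5.5),
    Painting("C", 1650, 9.0),
]

# Binary matching: a plain subset of the relation.
crisp_result = [t.title for t in relation if crisp_match(t)]

# Graded matching: every tuple gets a degree; the answer is ranked by it.
ranked = sorted(((cheap_degree(t), t.title) for t in relation), reverse=True)
fuzzy_result = [(title, degree) for degree, title in ranked if degree > 0]
```

With the crisp criterion, painting B is simply rejected; with the flexible criterion it is returned with a lower degree than A, so the answer becomes a ranked list rather than a set.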

Usually two general formal approaches to the 
querying are assumed: the relational algebra and the 
relational calculus. The former has a procedural char- 
acter: a query consists here of a sequence of operations 
on relations that finally yield the requested data. These
operations comprise five basic ones: union ($\cup$),
difference ($\setminus$), projection ($\pi$), selection
($\sigma$), and cross product ($\times$), which may be
combined to obtain derived operations such as, e.g.,
intersection ($\cap$), division ($\div$), and
join ($\bowtie$). The latter approach, known in two flavours
as the tuple relational calculus (TRC) or the domain 
relational calculus (DRC), is of a more declarative na- 
ture. Here a query just describes what kind of data is 
requested, but how it is to be retrieved from a database 
is left to the database management system. The exact
form of queries is not of utmost importance for our
considerations. However, some reported research in this 
area employs directly the de-facto standard querying 
language for relational databases, i.e. SQL (structured 
query language) [19.7,8]. Thus, we will also some- 
times refer to the SELECT-FROM-WHERE instruction 
of this language and more specifically consider its 
WHERE clause, where query conditions are specified. 


19.2.2 Fuzzy Set Theory 
and Possibility Theory 


We will use the following concepts and notation con- 
cerning fuzzy set theory [19.9]. A fuzzy set F in the 
universe U is characterized by a membership function 


$$\mu_F : U \to [0, 1] : u \mapsto \mu_F(u) . \tag{19.2}$$


For each element $u \in U$, $\mu_F(u)$ denotes the membership
grade or extent to which $u$ belongs to $F$. The origins
of membership functions may differ and, depending on that,
they have different semantics [19.10]. With
their traditional interpretation as degrees of similarity,
membership grades make it possible to appropriately represent
vague concepts, like tall man, expensive book, and large 
garden, taking into account the gradual characteris- 
tics of such a concept. Membership grades can also 
express degrees of preference, hereby expressing that 
several values apply to a different extent. For exam- 
ple, the languages one speaks can be expressed by 
a fuzzy set {(English, 1), (French, 0.7), (Spanish, 0.2)}, 
and then the membership degrees represent skill levels 
attained in a given language. A fuzzy set can also be 
interpreted as a possibility distribution, in which case 
its membership grades denote degrees of uncertainty. 
Then, it can be used to represent, e.g., the uncertainty 
about the actual value of a variable, like the height 
of a man, the price of a book and the size of a gar- 
den [19.11,12]. This interpretation is related to the 
concept of the disjunctive fuzzy set. 
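
As a minimal sketch of these notions, a fuzzy set over a continuous universe can be coded as a function into [0, 1], and a fuzzy set over a finite universe as a mapping from elements to grades. The breakpoints of 170 cm and 190 cm for tall man are assumptions made for the example; the language skills are the ones quoted above.

```python
def mu_tall(height_cm: float) -> float:
    """Membership grade of a height in the fuzzy set 'tall man'.
    The 170 cm and 190 cm breakpoints are illustrative assumptions."""
    if height_cm <= 170:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 170) / 20


# Finite fuzzy set: grades read here as skill levels (degrees of preference).
languages = {"English": 1.0, "French": 0.7, "Spanish": 0.2}


def membership(fuzzy_set: dict, u) -> float:
    """Grade of u in a finite fuzzy set; elements not listed have grade 0."""
    return fuzzy_set.get(u, 0.0)
```

The same data structure serves the possibilistic reading as well: only the interpretation of the grades changes, from similarity or preference to the possibility that an element is the actual value.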

Possibility distributions are denoted by $\pi$. The
notation $\pi_X$ is often used to indicate that the
distribution concerns the value of a variable $X$,

$$\pi_X : U \to [0, 1] : u \mapsto \pi_X(u) , \tag{19.3}$$

where $X$ takes its value from a universe $U$.

Possibility and necessity measures can provide for the
quantification of such an uncertainty. These measures are
denoted by $\Pi$ and $N$, respectively, i.e.,

$$\Pi : \tilde{\wp}(U) \to [0, 1] : A \mapsto \Pi(A) \quad \text{and} \quad N : \tilde{\wp}(U) \to [0, 1] : A \mapsto N(A) , \tag{19.4}$$

where the fuzzy power set $\tilde{\wp}(U)$ stands for the
family of fuzzy sets defined over $U$. Assuming that all
we know about the value of a variable $X$ is a possibility
distribution $\pi_X$, these measures, for a given fuzzy
set $F$, assess to what extent, respectively, this set is
consistent ($\Pi$) and its complement is inconsistent
($N$) with our knowledge on the value of $X$. More
precisely, if $\pi_X$ is the underlying possibility
distribution, then

$$\Pi_X(F) = \sup_{u \in U} \min(\pi_X(u), \mu_F(u)) , \tag{19.5}$$

$$N_X(F) = \inf_{u \in U} \max(1 - \pi_X(u), \mu_F(u)) . \tag{19.6}$$

Sometimes the interval $[N_X(F), \Pi_X(F)]$ is used as an
estimate of the consistency of $F$ with the actual value
of $X$. The possibility (necessity) that two variables $X$
and $Y$, whose values are given by possibility
distributions $\pi_X$ and $\pi_Y$, are in relation
$\theta$ (e.g., equality) is computed as follows. The
joint possibility distribution $\pi_{XY}$ of $X$ and $Y$
on $U \times U$ (assuming noninteractivity of the
variables) is given by

$$\pi_{XY}(u, w) = \min(\pi_X(u), \pi_Y(w)) . \tag{19.7}$$

The relation $\theta$ can be fuzzy and represented by a
fuzzy set $F \in \tilde{\wp}(U \times U)$ such that
$\mu_F(u, w)$ expresses to what extent $u$ is in relation
$\theta$ with $w$. The possibility (resp. necessity)
measure associated with $\pi_{XY}$ will be denoted by
$\Pi_{XY}$ (resp. $N_{XY}$). Then, we calculate the
measures of possibility and necessity that the values of
the variables are in relation $\theta$ as follows

$$\Pi(X \,\theta\, Y) = \Pi_{XY}(F) = \sup_{u, w \in U} \min(\pi_X(u), \pi_Y(w), \mu_F(u, w)) , \tag{19.8}$$

$$N(X \,\theta\, Y) = N_{XY}(F) = \inf_{u, w \in U} \max(1 - \pi_X(u), 1 - \pi_Y(w), \mu_F(u, w)) . \tag{19.9}$$

Knowing the possibility distributions of two variables
$X$ and $Y$, one may also be interested in how similar
these distributions are to each other. Obviously, (19.8)
and (19.9) provide some assessment of this similarity, but
other indices of similarity are also applicable. This
leads to a distinction between representation-based and
value-based comparisons of possibility distributions
[19.13]. We will discuss this later on in Sect. 19.4.2.

An important class of possibility distributions are
extended possibilistic truth values (EPTV) [19.14]. An
EPTV is defined as a possibility distribution (a
disjunctive fuzzy set) in the universe
$I^* = \{T, F, \bot\}$ that consists of the three truth
values $T$ (true), $F$ (false), and $\bot$ (undefined).
The set of all EPTVs is denoted as $\tilde{\wp}(I^*)$.
They are meant to represent uncertainty as to the truth
value of a proposition $p \in P$, where $P$ denotes the
universe of all propositions, in particular in the context
of database querying. Thus, a valuation $\tilde{t}^*$ is
assumed such that

$$\tilde{t}^* : P \to \tilde{\wp}(I^*) : p \mapsto \tilde{t}^*(p) . \tag{19.10}$$

In general, the EPTV $\tilde{t}^*(p)$ representing (the
knowledge of) the truth of a proposition $p \in P$ has the
following format

$$\tilde{t}^*(p) = \{(T, \mu_{\tilde{t}^*(p)}(T)), (F, \mu_{\tilde{t}^*(p)}(F)), (\bot, \mu_{\tilde{t}^*(p)}(\bot))\} . \tag{19.11}$$

Here, $\mu_{\tilde{t}^*(p)}(T)$, $\mu_{\tilde{t}^*(p)}(F)$,
and $\mu_{\tilde{t}^*(p)}(\bot)$ respectively denote the
possibility that $p$ is true, false, or undefined. The
latter value also covers cases where $p$ is not applicable
or not supplied. EPTVs extend the approach based on the
possibility distributions defined on just the set
$\{T, F\}$ with an explicit facility to deal with the
inapplicability of information, as can for example occur
in the evaluation of query conditions. Table 19.1 presents
some special cases of EPTVs.

Table 19.1 Special EPTVs

$\tilde{t}^*(p)$                        Interpretation
$\{(T, 1)\}$                            $p$ is true
$\{(F, 1)\}$                            $p$ is false
$\{(T, 1), (F, 1)\}$                    $p$ is unknown
$\{(\bot, 1)\}$                         $p$ is inapplicable
$\{(T, 1), (F, 1), (\bot, 1)\}$         Information about $p$ is not available

These cases can be read as follows:

• If it is completely possible that the proposition is
  true and no other truth values are possible, then it
  means that the proposition is known to be true.

• If it is completely possible that the proposition is
  false and no other truth values are possible, then it
  means that the proposition is known to be false.

• If it is completely possible that the proposition is
  true, it is completely possible that the proposition
  is false, and it is not possible that the proposition
  is inapplicable, then it means that the proposition
  is applicable, but its truth value is unknown. This
  EPTV will be called unknown for short.

• If it is completely possible that the proposition is
  inapplicable and no other truth values are possible,
  then it means that the proposition is inapplicable.

• If all truth values are completely possible, then
  this means that no information about the truth of
  the proposition or its applicability is available. The
  proposition might be inapplicable, but might also be
  true or false. This EPTV will be called unavailable
  for short.


19.3 Soft Computing in Information Modeling


Soft computing techniques make it possible to grasp
imperfect information about a modeled part of the world
and represent it directly in a database. If fuzzy set
theory [19.9] or possibility theory [19.12] is used to
model imperfect data in a database, the database is called
a fuzzy database. Other approaches include those that are
based on rough set theory [19.15] and on probability
theory [19.16]; the resulting databases are then called
rough databases and probabilistic databases, respectively.

In what follows, we give an overview of the most important
soft computing techniques for modeling imperfect
information which are based on fuzzy set theory and
possibility theory. Here, we distinguish between basic
techniques (Sect. 19.3.1) and more advanced techniques
(Sect. 19.3.2). As explained in the preliminaries, we use
the relational database model [19.6] as the framework for
our descriptions. This is also motivated by the fact that
initial research in this area has been done on the
relational database model and that this model is still the
standard for database modeling. Soft-computing-related
research on other database models, like the (E)ER model,
the object-relational model, the XML model, and
object-oriented database models, also exists. Overviews
can, among others, be found in [19.17-22].



19.3.1 Modeling of Imperfect Information - 
Basic Approaches 


In view of the correct handling of information, it is 
of utmost importance that the available information 
that has to be stored in a database is modeled as adequately
as possible so as to avoid information loss.
The most straightforward application of fuzzy logic 
to the classical relational data model is by assuming 
that the relations in a database themselves are also 
fuzzy [19.23]. Each tuple of a relation (table) is as- 
sociated with a membership degree. This approach is 
often neglected because the interpretation of the
membership degree is unclear. On the other hand, it is worth
noting that fuzzy queries, as will be discussed in
Sect. 19.4, in fact produce fuzzy relations, so we will
come back to this issue there.

Most of the research on modeling imperfect infor- 
mation in databases using soft computing techniques 
is devoted to a proper representation and processing of 
an attribute value. Such a value, in general, may not be 
known perfectly for many different reasons [19.24]:
due to imprecision, as when a painting is dated to the
beginning of the fourteenth century; due to unreliability,
as when the source of information is not fully reliable;
due to ambiguity, as when the provided value may have
different meanings; due to inconsistency, as when there
are multiple different values provided by different
sources; or due to incompleteness, as when the value is
completely missing or given as a set of possible
alternatives (e.g., the picture was painted by Rubens or
van Dyck). These various forms of information imperfection
are not unrelated and may occur
together. From the viewpoint of data representation they
may be primarily seen as yielding uncertainty as to the 
actual value of an attribute and as such may be properly 
accounted for by a possibility distribution. 

It is worth noting that the assignment of the value to an
attribute may then be identified with Zadeh's [19.25]
linguistic expression X is A, where X is a linguistic
variable corresponding to the attribute and A is a
(disjunctive) fuzzy set representing imperfect information
on its value. Various combined forms of information
imperfection may then be represented by appropriate
qualified linguistic expressions such as, e.g., X is A
with certainty at least $\alpha$ [19.26]. Such qualified
linguistic expressions may in turn be transformed into an
X is B expression, where the fuzzy set B is a function of
A and the other parameters of the qualified expression,
like $\alpha$ in the previous example. Thus, the basic
linguistic expression X is A indeed plays a fundamental
role in the representation of imperfect information.
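
To make the transformation from a qualified expression to a plain X is B expression concrete, the sketch below uses one common choice from possibility theory for certainty qualification, namely $\mu_B(u) = \max(\mu_A(u), 1 - \alpha)$. The chapter does not state this formula explicitly, so it should be read as an assumption of the example; the function name and the candidate painter "Jordaens" are hypothetical as well.

```python
def certainty_qualify(mu_a: dict, alpha: float) -> dict:
    """Turn 'X is A with certainty at least alpha' into 'X is B'.

    Assumed rule (a common one in possibility theory, not quoted from
    the chapter): mu_B(u) = max(mu_A(u), 1 - alpha).
    """
    return {u: max(grade, 1.0 - alpha) for u, grade in mu_a.items()}


# 'The painter is Rubens', stated with certainty at least 0.75.
painter_is_rubens = {"Rubens": 1.0, "van Dyck": 0.0, "Jordaens": 0.0}
b = certainty_qualify(painter_is_rubens, alpha=0.75)
```

With full certainty ($\alpha = 1$) the set B coincides with A, while with $\alpha = 0$ every domain value becomes completely possible, which corresponds to total ignorance.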

The work of Prade and Testemale [19.27] is the 
most representative for the approaches to imperfect in- 
formation modeling in a database based on the possibil- 
ity theory. Other works in this vein include [19.27-32]. 
On the other hand, Buckles and Petry [19.33] as well as 
Anvari and Rose [19.34] assume the representation of 
attributes’ values using sets of alternatives which may 
be treated as a simple binary possibility distribution. 
However, their motivation is different as they assume 
that domain elements are similar/indistinguishable to 
some extent and due to that it may be difficult to deter- 
mine a precise value of an attribute. We will first briefly 
describe the approach of Prade and Testemale and, then, 
the model of Buckles and Petry. 


Possibilistic Approach 
In the possibilistic approach, disjunctive fuzzy sets are
used to represent the imprecisely known value of an
attribute $A$. Hence, such a fuzzy set is interpreted as
a possibility distribution $\pi_A$ and is defined on the
domain $\mathrm{dom}_A$ of the attribute. The (degree of)
possibility that the actual value of $A$ is a particular
element $x$ of the domain of this attribute,
$x \in \mathrm{dom}_A$, equals $\pi_A(x)$. Every domain
value $x \in \mathrm{dom}_A$ with $\pi_A(x) \neq 0$ is thus
a candidate for being the actual value of $A$. Together,
all candidate values and their associated possibility
degrees reflect what is actually known about the attribute
value. Thus, if the value of an attribute is not known
precisely, then a set of values may be specified
(represented by the support of the fuzzy set used) and,
moreover, particular elements of this set may be indicated
as more or less plausible values of the attribute in
question.

A typical scenario in which such an imprecise value 
has to be stored in a database is when the value of an 
attribute is expressed using a linguistic term. For exam- 
ple, assume that the value of a painting is not known 
precisely, but the painting is known to be very valuable. 
Then, this information might be represented by the pos- 
sibility distribution 


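$$\pi_{\mathrm{value}}(x) = \mu_{\mathrm{very\_valuable}}(x) =
\begin{cases}
0 & \text{if } x < 10\mathrm{M} \\
\dfrac{x - 10\mathrm{M}}{10\mathrm{M}} & \text{if } 10\mathrm{M} \le x \le 20\mathrm{M} \\
1 & \text{if } x > 20\mathrm{M} .
\end{cases}$$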
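
The possibility distribution for very valuable can be written down directly as a function. The sketch assumes that prices are expressed in millions and that the rising part is linear between 10M and 20M; the sample prices are illustrative.

```python
def pi_value(x_millions: float) -> float:
    """Possibility that the painting's value is x (in millions),
    given only that the painting is 'very valuable'."""
    if x_millions < 10:
        return 0.0
    if x_millions > 20:
        return 1.0
    return (x_millions - 10) / 10


# Candidate values are those with a nonzero possibility degree
# (the support of the fuzzy set).
sample_prices = [5, 12, 18, 30]
candidates = [x for x in sample_prices if pi_value(x) > 0]
```

Storing such a distribution as the attribute value keeps both the set of candidate prices and their relative plausibility, instead of forcing a single crisp number.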


In information management, the term uncertainty is often
used to refer to situations where one has to cope with
several (distinct) candidate attribute values coming from
different information sources. For example, one
information source can specify that the phone number of a
person is X, while another source can specify that it is
Y. This kind of uncertainty can also be handled with the
possibilistic approach described above. In that case, a
possibility distribution $\pi_A$ over the domain of the
attribute is used to model the different options for the
attribute's actual value. However, in general, possibility
distributions are less informative than probability
distributions: they only inform the user about the
relative likelihood of the different options. Probability
distributions provide more information and have led to
the so-called probabilistic databases [19.35-39]; cf.
also [19.40].

Imprecision on the one hand and uncertainty on the other
hand are orthogonal concepts: they can occur at the same
time, as already mentioned earlier. For example, it might
be uncertain whether the value of a painting is 3M, around
2M, or much cheaper, where the latter two options are
imprecise descriptions. Using the regular possibilistic
modeling approach in such a case would yield a single
possibility distribution over the domain of values and
would result in a loss of information on how the original
three options were specified. Level-2 fuzzy sets, which
are fuzzy sets defined over a domain of fuzzy sets
[19.41, 42], can help to avoid this information loss
[19.43].

In traditional databases, missing information is mostly
handled by means of a pseudovalue, called a null value
[19.44, 45]. In fact, information may be missing for many
different reasons: the data may exist but be unknown
(e.g., the salary of an employee may be unknown), or the
data may not exist or not apply (e.g., an unemployed
person earns no salary) [19.46]. For the handling of
nonapplicability a special pseudovalue is still required,
but the case of unknown information can be adequately
handled by using possibility theory. Indeed, as studied in
[19.47], in the so-called extended possibilistic approach,
the domain $\mathrm{dom}_A$ of an attribute $A$ can be
extended with an extra value $\bot_A$ that is interpreted
as a regular value meaning not applicable. Missing
information can then be adequately modeled by considering
the following three special possibility distributions
$\pi_{\mathrm{UNK}}$, $\pi_{\mathrm{N/A}}$, and
$\pi_{\mathrm{UNA}}$:


• Unknown value

$$\pi_{\mathrm{UNK}}(x) =
\begin{cases}
1, & \text{if } x \in \mathrm{dom}_A \setminus \{\bot_A\} \\
0, & \text{if } x = \bot_A
\end{cases}$$

• Value not applicable

$$\pi_{\mathrm{N/A}}(x) =
\begin{cases}
0, & \text{if } x \in \mathrm{dom}_A \setminus \{\bot_A\} \\
1, & \text{if } x = \bot_A
\end{cases}$$

• No information available

$$\pi_{\mathrm{UNA}}(x) = 1, \quad \forall x \in \mathrm{dom}_A .$$
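
The three special distributions are straightforward to build over a finite, extended domain. In the sketch below, the constant `BOTTOM` is a hypothetical marker standing for the extra value $\bot_A$, and the small salary domain is invented for the example.

```python
BOTTOM = "not applicable"  # hypothetical marker for the extra domain value


def pi_unk(domain) -> dict:
    """Unknown value: every regular value is fully possible, BOTTOM is not."""
    return {x: (0.0 if x == BOTTOM else 1.0) for x in domain}


def pi_na(domain) -> dict:
    """Value not applicable: only BOTTOM is possible."""
    return {x: (1.0 if x == BOTTOM else 0.0) for x in domain}


def pi_una(domain) -> dict:
    """No information available: everything, including BOTTOM, is possible."""
    return {x: 1.0 for x in domain}


# Extended domain of a hypothetical 'salary' attribute.
salary_domain = [1000, 2000, 3000, BOTTOM]
unknown = pi_unk(salary_domain)
not_applicable = pi_na(salary_domain)
unavailable = pi_una(salary_domain)
```

Because $\bot_A$ is treated as a regular domain value, the three cases need no special handling: they are ordinary possibility distributions and can be processed by the same operators as any other attribute value.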


Similarity-Based Approach 
The basic idea behind this approach [19.33] is that 
while specifying the value of a database attribute one 
may consider similar values as also being applicable. 
Thus, in general, the value of an attribute $A$ is assumed
to be a subset of its domain $\mathrm{dom}_A$. Moreover, the domain
$\mathrm{dom}_A$ is associated with a similarity relation $S_A$
quantifying this similarity for each pair of elements
$x, y \in \mathrm{dom}_A$. The values $S_A(x, y)$ taken by $S_A$ are in the
unit interval [0, 1], where 0 corresponds to totally different
and 1 to totally similar. Hence, $S_A$ is a fuzzy binary
relation that associates a membership grade to each
pair of domain values. This relation is assumed to be
reflexive, symmetric, and to satisfy some form of transitivity.
These requirements have been found too restrictive,
and some approaches based on a weaker structure have
been proposed in [19.48], where a proximity relation
is used and all attractive properties of the original approach
are preserved. Among these properties, the most
important is the proper adaptation of the redundancy
concept and of the relational algebra operations.
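The difference between a similarity relation and a proximity relation can be illustrated on a finite domain. The sketch below assumes max-min transitivity as the form of transitivity (the text leaves the form open) and treats a proximity relation as one that is only reflexive and symmetric; the color domain and all degrees are made up.

```python
from itertools import product


def is_reflexive(S, domain):
    return all(S[(x, x)] == 1.0 for x in domain)


def is_symmetric(S, domain):
    return all(S[(x, y)] == S[(y, x)] for x, y in product(domain, repeat=2))


def is_max_min_transitive(S, domain):
    # S(x, z) >= max over y of min(S(x, y), S(y, z)) for all x, z
    return all(
        S[(x, z)] >= max(min(S[(x, y)], S[(y, z)]) for y in domain)
        for x, z in product(domain, repeat=2)
    )


def make_relation(domain, pairs):
    """Build a reflexive, symmetric relation from degrees given for unordered pairs."""
    S = {(x, x): 1.0 for x in domain}
    for (x, y), degree in pairs.items():
        S[(x, y)] = degree
        S[(y, x)] = degree
    return S


colors = ["red", "crimson", "blue"]

# Satisfies all three properties, so it qualifies as a similarity relation here.
similarity = make_relation(colors, {
    ("red", "crimson"): 0.9, ("red", "blue"): 0.1, ("crimson", "blue"): 0.1,
})

# Reflexive and symmetric only: red~crimson and crimson~blue do not force red~blue.
proximity = make_relation(colors, {
    ("red", "crimson"): 0.9, ("red", "blue"): 0.1, ("crimson", "blue"): 0.5,
})
```

Dropping the transitivity check is what makes the second relation easier for a user to specify, which is the motivation for the weaker structure mentioned above.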

It has been quickly recognized that the rough sets 
theory [19.15] offers effective tools to deal with and an- 
alyze the indistinguishability/equivalence relation and 
the similarity-based approaches evolved into the rough- 
sets-based database model [19.49, 50]. 

There are also a number of hybrid models proposed
in the literature. Takahashi [19.51] has proposed
a model for a fuzzy relational database assuming possibility
distributions as attribute values. Moreover, in
his model fuzzy sets are used as tuples' truth values.
For example, a tuple $t$, accompanied by such a truth
qualification, may express that it is quite true that the
painting's origin is the beginning of the fifteenth century.

Medina et al. [19.52] proposed a fuzzy database
model called GEFRED (generalized fuzzy relational
database) in an attempt to integrate both approaches:
the possibilistic and the similarity-based one. The data are
stored as generalized fuzzy relations that extend the relations
of the relational model by allowing imprecise
information and a compatibility degree associated with
each attribute value.


Soft Computing in Database and Information Management 


19.3.2 Modeling of Imperfect Information - 
Selected Advanced Approaches 


In this section, we will focus on some extensions to the
possibility-based approach described in Sect. 19.3.1. As
argued earlier, a disjunctive fuzzy set may very well
represent the situation when an attribute $A$ of a tuple
for sure takes exactly one value from the domain $\mathrm{dom}_A$
(as the classical relational model also requires)
but we do not know exactly which one. Complete
ignorance is then modeled by the set $\mathrm{dom}_A$. However,
very often we can distinguish the elements of $\mathrm{dom}_A$ with
respect to their plausibility as the actual value of the
attribute. This information, based on some evidence, is
represented by the membership function of a disjunctive
fuzzy set which is further identified with a possibility
distribution. However, the characteristics of the available
evidence may be difficult to express using regular fuzzy
sets. In the literature dealing with data representation
there are first attempts to cover such cases using some
extensions to the concept of the fuzzy set. We will now
briefly review them.


Imprecise Membership Degrees 
Prade and Testemale in their original approach [19.31] 
assume that the membership degrees of a mentioned 
disjunctive fuzzy set are known precisely. On the other 
hand, one can argue that they may be also known only 
in an imprecise way. It may be the case, in particular, 
when the value of an attribute is originally specified us- 
ing a linguistic term. For example, if a painting is dated 
to the beginning of the fifteenth century then assigning, 
e.g., the degree of 0.6 to the year 1440 may be chal- 
lenging for an expert who is to define the representation 
of this linguistic term. He or she may be much more
comfortable stating that it lies somewhere between, e.g.,
0.5 and 0.7. Some precision is lost and a second-level
uncertainty is then implied, but it may better reflect the
evidence actually available.

Type 2 fuzzy sets [19.25] make it possible to model
the data in the case described earlier. In particular, their
simplest form, the interval-valued fuzzy sets (IVFSs),
may be here of interest. In the case of interval-valued
fuzzy sets [19.25] a membership degree is represented
as an interval, as in the example given earlier. Thus an
interval-valued fuzzy set $X$ over a universe of discourse
$U$ is defined by two functions

\[
\mu_X^l, \mu_X^u : U \to [0, 1],
\]

such that

\[
0 \le \mu_X^l(x) \le \mu_X^u(x) \le 1, \quad \forall x \in U, \tag{19.12}
\]

and may be denoted by

\[
X = \{ \langle x, \mu_X^l(x), \mu_X^u(x) \rangle \mid (x \in U) \land
(0 \le \mu_X^l(x) \le \mu_X^u(x) \le 1) \} . \tag{19.13}
\]

Constraint (19.12) reflects that $\mu_X^l(x)$ and $\mu_X^u(x)$ are,
respectively, interpreted as a lower and an upper bound
on the actual degree of membership of $x$ in $X$.
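A minimal sketch of how such an interval-valued membership degree might be stored, with constraint (19.12) checked on construction. The class name `IntervalDegree` and the degrees listed for the linguistic term are hypothetical; only the interval [0.5, 0.7] for the year 1440 is taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalDegree:
    """Lower and upper bound on a membership degree, as in constraint (19.12)."""
    lower: float
    upper: float

    def __post_init__(self):
        if not (0.0 <= self.lower <= self.upper <= 1.0):
            raise ValueError("need 0 <= lower <= upper <= 1")


# Hypothetical interval-valued fuzzy set for "beginning of the fifteenth century".
beginning_of_15th_century = {
    1400: IntervalDegree(1.0, 1.0),
    1420: IntervalDegree(0.8, 0.9),
    1440: IntervalDegree(0.5, 0.7),
}
```

A regular fuzzy set is recovered as the special case in which the lower and the upper bound coincide for every element.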

Basically, the representation of information using 
(disjunctive) IVFS is conceptually identical with the 
original approach of Prade and Testemale while it pro- 
vides some more flexibility in defining the meaning of 
linguistic terms. Some preliminary discussion on their 
use may be found in [19.53]. 


Bipolarity of Information 
Bipolarity is related to the existence of positive
and negative information [19.54-58]. It manifests itself,
in particular, when people are making judgments
about some alternatives and take into account their 
positive and negative sides. From this point of view, 
bipolarity of information may play an important role 
in database querying and is discussed from this per- 
spective in Sect. 19.4.1. Here we will briefly discuss 
the role of bipolarity in data representation. We will 
mostly follow in this respect the work of Dubois and 
Prade [19.58]. 

The value of an attribute may be not known pre- 
cisely but some information on it may be available in 
the form of both positive and negative statements. In 
some situations positive information is provided, stat- 
ing what values are possible, satisfactory, permitted, 
desired, or considered as being acceptable. In other sit- 
uations, negative statements express what values are 
impossible, rejected or forbidden. 

Different types of bipolarity can be distin- 
guished [19.58, 59]: 


• Type I, symmetric univariate bipolarity: positive
and negative information are considered as being
exact complements of each other as in, e.g.,
probability theory; for instance, if the probability
that a given painting was painted in the eighteenth
century is stated to be 0.7 (positive information),
then the probability that this painting was not
painted in the eighteenth century equals 0.3 (negative
information); this simple form of bipolarity is
well supported by traditional information systems;
this bipolarity is quantified on a bipolar univariate
scale such as the intervals [0, 1] or [-1, 1];

• Type II, symmetric bivariate bipolarity: another,
more flexible approach is to consider positive and
negative information as being dual concepts, measured
along two different scales but based on the
same piece of evidence. The dependency between
them is modeled by means of some duality relation.
This kind of bipolarity is, among others, used in
Atanassov's intuitionistic fuzzy sets [19.60] where
each element of a set is assigned both a membership
and a nonmembership degree which do not have
to sum up to 1 but whose sum cannot exceed 1. For
example, it could be stated that Rubens is a good
candidate to be the author of a given painting to a degree
0.6 (due to some positive information) while at
the same time he is not a good candidate to a degree
0.2 (due to some negative information); this
bipolarity is quantified on a unipolar bivariate scale
composed of two unipolar scales such as, e.g., two
intervals [0, 1];

• Type III, asymmetric/heterogeneous bipolarity: in
the most general case, positive and negative information
is provided by two separate bodies of
evidence, which are to some extent independent of
each other and are of a different nature. A constraint
to guarantee that the information does not
contain contradictions can exist, but apart from that
both statements are independent of each other,
hence giving rise to the notion of heterogeneous
bipolarity; this bipolarity is quantified as in the case
of Type II bipolarity.


Type III bipolarity is of special interest from the 
point of view of data representation. Dubois and 
Prade [19.58] argue that in this context the heterogeneity
of bipolarity is related to the different nature of
two bodies of evidence available. Namely, the negative 
information corresponds to the knowledge which puts 
some general constraints on the feasible values of an 
attribute. On the other hand, the positive information 
corresponds to data, i.e., observed cases which justify 
plausibility of a given element as the candidate for the 
value of an attribute. 


This type of bipolarity is proposed to be represented
for a tuple $t$ by two separate possibility distributions,
$\delta_{A(t)}$ and $\pi_{A(t)}$, defined on the domain of an attribute,
$\mathrm{dom}_A$ [19.58] (and earlier works cited therein). A possibility
distribution $\pi_{A(t)}$, as previously, represents the
compatibility of particular elements $x \in \mathrm{dom}_A$ with the
available information on the value of the attribute $A$ for
a tuple $t$. This compatibility is quantified on a unipolar
negative scale identified with the interval [0, 1]. The extreme
values of $\pi_{A(t)}$, i.e., $\pi_{A(t)}(x) = 1$ and $\pi_{A(t)}(x) = 0$,
are meant to represent, respectively, that $x$ is potentially
fully possible to be the value of $A$ at $t$ (1 is a neutral
element on this unipolar scale) and that $x$ is totally
impossible to be the value of $A$ at $t$ (0 is an extreme negative
element on this unipolar scale). On the other hand,
a possibility distribution $\delta_{A(t)}$ expresses the degree of
support for an element $x \in \mathrm{dom}_A$ to be the value of $A$ at
$t$ provided by some evidence. In this case, $\delta_{A(t)}(x) = 0$
denotes the lack of such support but is meant as just
a neutral assessment, while $\delta_{A(t)}(x) = 1$ denotes full support
(1 is an extreme positive element on this scale).

For example, when the exact dating of a painting is
unknown, one can be convinced it has been painted in
some time range (e.g., related to the time period its author
lived in) and also there may be some evidence supporting
a particular period of time (e.g., due to the fact that
other very similar paintings of a given author are known
from this period). Thus, the former is negative information,
excluding some periods of time, while the latter is
positive information supporting a given period.

Thus, $\delta_{A(t)}$ and $\pi_{A(t)}$ are said to represent, respectively,
the set of guaranteed/actually possible and the
set of potentially possible values of an attribute $A$ for
the tuple $t$. These possibility distributions are related by
a consistency constraint: $\pi_{A(t)}(x) \ge \delta_{A(t)}(x)$, as $x$ has to
be first nonexcluded before it may be somehow supported
by the evidence.

For a given attribute $A$ and tuple $t$, $\pi_{A(t)}$ is based on
the set of nonimpossible values $N_{A(t)}$ while $\delta_{A(t)}$ relates
to the set of actually possible values $G_{A(t)}$. The querying
of such a bipolar database may be defined in terms of
these sets [19.58].


19.4 Soft Computing in Querying


The research on soft computing in querying already has
a long history. It has been inspired by the success
of fuzzy logic in modeling natural language propositions.
The use of such propositions in queries, in turn,
seems to be very natural for human users of any in- 
formation system, notably the database management 
system. Later on, the interest in fuzzy querying has 
been reinforced by the omnipresence of network-based 




applications, related to buzzwords of modern informa- 
tion technology, such as e-commerce, e-government, 
etc. These applications evidently call for a flexible 
querying capability when users are looking for some 
goods, hotel accommodations, etc., that may be best 
described using natural language terms such as cheap, 
large, close to the airport, etc. Another amplification 
of the interest in fuzzy querying comes from develop- 
ments in the area of data warehousing and data mining 
related applications. For example, a combination of 
fuzzy querying and data mining interfaces [19.61, 62] 
or fuzzy logic and the OLAP (online analytical pro- 
cessing) technology [19.63] may lead to new, effective 
and more efficient solutions in this area. More recently, 
big data challenges can be seen as driving forces for 
research in soft querying. Indeed, efficiently querying 
huge quantities of heterogeneous structured and un- 
structured data is one of the prerequisites for efficiently 
handling big data. 


19.4.1 Flexible Querying 
of Regular Databases 


As a starting point, we consider a simplified form of 
database queries on a classical crisp relational database. 
Hereby a query is assumed to consist of a combina- 
tion of conditions that are to be met by the data sought. 
Flexibility is introduced by specifying fuzzy preferences.
This can be done inside the query conditions
via flexible search criteria, which make it possible to express
that some values are more desirable than others in a gradual
way. Query conditions are allowed to contain natural
language terms. Another option is to specify fuzzy
preferences at the level of the aggregation. By assign- 
ing grades of importance to (groups of) conditions it 
can be indicated that the satisfaction of some query 
conditions is more desirable than the satisfaction of 
others. 


Basic Approaches 

One of the pioneering approaches in recognizing the
power of fuzzy set theory for information retrieval purposes
in general is [19.64]. The research on the application
of soft computing in database querying
proper dates back to an early work of Tahani [19.65],
proposing the modeling of linguistic terms in queries 
using elements of fuzzy logic. An important enhance- 
ment of this basic approach consisted in considering 
flexible aggregation operators [19.10, 66-68]. Another 
line of research focused on embedding fuzzy constructs 
in the syntax of the standard SQL [19.21, 69-74]. 


Fuzzy Preferences Inside Query Conditions.
Tahani [19.65] proposed to use imprecise terms typical
for natural language such as, e.g., high, young, etc.,
to form conditions of an SQL-like querying language
for relational databases. These imprecise linguistic
terms are modeled using fuzzy sets defined in the attributes'
domains. The binary satisfaction of a classical query is
replaced with the matching degree defined in a straightforward
way. Namely, a tuple $t$ matches a simple
(elementary) condition $A = l$, where $A$ is an attribute
(e.g., price) and $l$ is a linguistic term (e.g., high), to
a degree $\gamma(A = l, t)$ such that

\[
\gamma(A = l, t) = \mu_l(A(t)) , \tag{19.14}
\]

where $A(t)$ is the value of the attribute $A$ at the tuple $t$
and $\mu_l(\cdot)$ is the membership function of the fuzzy set
representing the linguistic term $l$. The matching degree
for compound conditions, e.g., price = high AND (date
= beginning-of-17-century OR origin = south-europe),
is obtained by a proper interpretation of the fuzzy logical
connectives. For example,

\[
\gamma((A_1 = l_1) \text{ AND } (A_2 = l_2), t)
= \min[\mu_{l_1}(A_1(t)), \mu_{l_2}(A_2(t))] . \tag{19.15}
\]
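Formulas (19.14) and (19.15) can be traced on a single tuple. In the sketch below the membership functions `high_price` and `south_europe`, their breakpoints, and the sample painting are assumptions made for illustration; only the use of the membership degree as the matching degree and of min/max for AND/OR follows the text.

```python
def high_price(price):
    """Hypothetical membership function of 'high': linear between 1000 and 5000."""
    return min(1.0, max(0.0, (price - 1000) / 4000))


def south_europe(origin):
    """Hypothetical membership function of the term 'south-europe'."""
    return {"Italy": 1.0, "Spain": 1.0, "France": 0.5}.get(origin, 0.0)


def match_elementary(row, attribute, term):
    """Matching degree of the condition `attribute = term`, as in (19.14)."""
    return term(row[attribute])


def match_and(*degrees):
    """Conjunction interpreted via the minimum, as in (19.15)."""
    return min(degrees)


def match_or(*degrees):
    """Disjunction interpreted via the maximum."""
    return max(degrees)


painting = {"price": 4000, "origin": "France"}
degree = match_and(
    match_elementary(painting, "price", high_price),      # 0.75
    match_elementary(painting, "origin", south_europe),   # 0.5
)
```

Here the tuple matches price = high to the degree 0.75 and origin = south-europe to the degree 0.5, so the conjunction is matched to the degree 0.5.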


The relational algebra was adapted very early
for the purposes of fuzzy flexible querying of regular
relational databases. The division operator attracted
special attention and many fuzzy variants of it have
been proposed, among others by Yager [19.75], Dubois
and Prade [19.76], Galindo et al. [19.77], and Bosc
et al. [19.78, 79]. Takahashi [19.80] was among the first
authors to propose a fuzzy version of the relational
calculus. His fuzzy query language (FQL) was meant
as a fuzzy extension of the domain relational calculus
(DRC).


Fuzzy Preferences Between Query Conditions. Often,
a query is composed of several conditions of varying
importance for the user. For example, a customer
of a real-estate agency may be looking for a cheap
apartment in a specific district of a city and located
not higher than a given floor. However, for him or her
the low price may be much more important than the
two other features. It may be difficult to express such
preferences in a traditional query language. On the
other hand, it is very natural for flexible fuzzy querying
approaches due to the assumed gradual character
of the matching degree as well as due to the existence
of sophisticated preference modeling techniques developed
by the fuzzy logic community. Thus, most approaches
make it possible to assign to a condition an importance
weight, usually represented by a number from the [0, 1]
interval.

The impact of a weight can be modeled by first 
matching the condition as if there is no weight and only 
then modifying the resulting matching degree in ac- 
cordance with the weight. A modification function that 
strengthens the match of more important conditions and 
weakens the match of less important conditions is used 
for this purpose. 

The evaluation of a whole query against a tuple may 
be seen as an aggregation of the matching degrees of el- 
ementary conditions comprising the query against this 
tuple. Thus, an aggregation operator is involved which, 
in the case of a simple conjunction or disjunction of 
the elementary conditions is usually assumed to take 
the form of the minimum and maximum operator, re- 
spectively. If weights are assigned to the elementary 
conditions connected using conjunction or disjunction 
then, first, the matching degrees of these conditions are 
modified using the weights [19.81] and then they are 
aggregated as usual. 
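The two-step scheme (modify each matching degree by its weight, then aggregate) may be sketched as follows. The particular modification functions, max(d, 1 - w) for a conjunction and min(d, w) for a disjunction, are one commonly used choice assumed here for illustration; they are not necessarily the functions used in [19.81].

```python
def weighted_and(degrees, weights):
    """Conjunction: a condition with weight w cannot pull the result below 1 - w."""
    return min(max(d, 1.0 - w) for d, w in zip(degrees, weights))


def weighted_or(degrees, weights):
    """Disjunction: a condition with weight w cannot push the result above w."""
    return max(min(d, w) for d, w in zip(degrees, weights))


# Matching degrees of two conditions, e.g., "cheap" and "low floor" (made-up values).
degrees = [0.9, 0.2]

unweighted = weighted_and(degrees, [1.0, 1.0])       # plain minimum
price_dominant = weighted_and(degrees, [1.0, 0.5])   # second condition matters less
```

With full weights the operators reduce to the usual minimum and maximum, and a condition with weight 0 has no influence on the result.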

On the other hand, some special aggregation operators
may be explicitly used in the query and then they
guide the aggregation process. Kacprzyk et al. [19.66,
67] were the first to propose the use in queries of an
aggregation operator in the form of a linguistic quantifier
[19.82]. Thus, the user may require, e.g., most
of the elementary conditions to be fulfilled instead
of all of them (which is required when the conjunction
of the conditions is used) or instead of just
one of them (which is required in the case of the
disjunction). For example, the user may define paintings
of his or her interest as those meeting most of the
following conditions: not expensive, painted in Italy,
painted not later than in the seventeenth century,
accompanied by an attractive insurance offer, etc. The
overall matching degree of a query involving a lin- 
guistic quantifier may be computed using any of the 
approaches used to model these quantifiers. In [19.66, 
67], Zadeh’s original approach is used [19.82] while 
in [19.83] Yager’s approach based on the OWA oper- 
ators is adopted [19.84]. Further studies on modeling 
sophisticated aggregation operators, notably linguistic 
quantifiers, in the flexible fuzzy queries include the pa- 
pers by Bosc et al. [19.85, 86], Galindo et al. [19.21] 
and Vila et al. [19.87]. 
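A minimal sketch of a query aggregated with the quantifier most, evaluated in the spirit of Zadeh's calculus of linguistically quantified propositions: the quantifier is applied to the average matching degree of the elementary conditions. The piecewise-linear shape of `most` and its breakpoints 0.3 and 0.8 are assumptions chosen for illustration.

```python
def most(proportion):
    """Hypothetical membership function of the relative quantifier 'most'."""
    if proportion >= 0.8:
        return 1.0
    if proportion <= 0.3:
        return 0.0
    return 2.0 * proportion - 0.6


def match_quantified(degrees, quantifier):
    """Apply the quantifier to the mean matching degree of the conditions."""
    return quantifier(sum(degrees) / len(degrees))


# Matching degrees for: not expensive, painted in Italy,
# not later than the seventeenth century, attractive insurance offer (made up).
painting_degrees = [1.0, 1.0, 0.5, 0.0]
overall = match_quantified(painting_degrees, most)
```

A painting failing one condition completely is still retrieved with a nonzero degree, whereas the plain conjunction (minimum) would give it the degree 0.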

A recent book by Bosc and Pivert [19.88] contains 
a comprehensive survey of the sophisticated flexible 
database querying techniques. 


Bipolar Queries
An important novel line of research concerning advanced
querying of databases addresses the issue of
the bipolar nature of users' preferences. Some psychological
studies (e.g., sources cited by [19.56]) show
that while expressing his or her preferences a human
being separately considers positive and negative
aspects of a given option. Thus, to account for this
phenomenon, a query should be seen as a combination
of two types of conditions: the satisfaction of one
of them makes a piece of data desired while the satisfaction
of the second causes it to be rejected. Such
a query will be referred to as a bipolar query and will
be denoted as a pair of conditions (C, P), where C, for
convenience, denotes the complement of the negative
condition and P denotes the positive condition. The relations
between these two types of conditions may be
analyzed from different viewpoints, and the conditions
themselves may be expressed in various ways. In Sect. 19.3.2,
we have already introduced the concept of bipolarity in
the context of data representation. Now, we will briefly
survey different approaches to modeling bipolarity
with a special emphasis on the context of database
querying.


Models of Bipolarity. Various scales may be used to
express bipolarity of preferences. Basically, two models
are usually considered, based on a bipolar univariate scale
and on a unipolar bivariate scale [19.89], respectively. The former
assumes one scale with three main levels of negative,
neutral, and positive preference degrees, respectively.
These degrees are gradually changing from one end of
the scale to another accounting for some intermediate
levels. In the second model, two scales are used which
separately account for the positive and negative preference
degrees. Often, the intervals [-1, 1] and [0, 1] are
used to represent the scales in the respective models of
bipolarity.

From the point of view of database querying, the 
first model may be seen as assuming that the user as- 
sesses both positive and negative aspects of a given 
piece of data (an attribute value or a tuple) and is in 
a position to come up with an overall scalar evaluation. 
This is convenient with respect to the ordering of the 
tuples in the answer to a query. 

The second model is more general and makes it
possible for the user to separately express his or her
evaluation of positive and negative aspects of a given
piece of data. This may be convenient if the user cannot,
or is not willing to, combine his or her evaluations
of positive and negative features of data. Obviously, it
requires some special means to order the query answer
dataset with respect to a pair of evaluations.


Levels at Which the Bipolarity May be Expressed. 
Bipolar evaluations may concern the domain of an at- 
tribute or the whole set of tuples. This is a distinction of 
a practical importance, in particular if the elicitation of 
user preferences is considered. 

In the former case, the user is supposed to be will- 
ing and in a position to partition the domains of selected 
attributes into (fuzzy) subsets of elements with posi- 
tive, negative, and neutral evaluations. For example, the 
domain of the price attribute, characterizing paintings 
offered during an auction at a gallery, may be in the 
context of a given query subjectively partitioned us- 
ing fuzzy sets representing the terms cheap (positive 
evaluation), expensive (negative evaluation), and some 
elements with a neutral evaluation. 

In the case of bipolar evaluations at the tuples level 
a similar partitioning is assumed but concerning the 
whole set of tuples (here, representing the paintings). 
Usually, this partition will be defined again by (fuzzy) 
sets, this time defined with reference to possibly many 
attributes, i.e., defined on the cross product of the 
domains of several attributes. For example, the user 
may identify as negative those paintings which satisfy
a compound condition expensive and modern. Thus, the 
evaluations in this case have a comprehensive charac- 
ter and concern the whole tuples, taking implicitly into 
account a possibly complex weighting scheme of par- 
ticular attributes and their interrelations. 

Referring to the models of bipolarity, it seems 
slightly more natural for the bipolar evaluations ex- 
pressed on the level of the domain of an attribute to 
use a bipolar univariate scale while the evaluations on 
the level of the whole set of tuples would rather adopt 
a unipolar bivariate scale. 


A General Interpretation of Bipolarity in the Con- 
text of Database Querying. In the most general in- 
terpretation, we do not assume anything more about 
the relation between positive and negative conditions. 
Thus, we have two conditions and each tuple is evalu- 
ated against them yielding a pair of matching degrees. 
Then an important question is how to order data in 
an answer to such a query. Basically, while doing that 
we should take into account the very nature of both 
matching degrees, i.e., the fact that they correspond 
to the positive and negative conditions. The situation
here is somewhat similar to that of decision making
under risk. Namely, in the latter context a decision
maker who is risk-averse may not accept actions leading
with some nonzero probability to a loss. On the
other hand, a risk-prone decision maker may ignore 
the risk of an even serious loss as long as there are 
prospects for a high gain. Similar considerations may 
apply in the case of bipolar queries. Some users may be 
more concerned about negative aspects and will reject 
a tuple with a nonzero matching degree of the negative 
condition. Some other users may be more oriented toward
the satisfaction of the positive conditions and may be
ready to accept the fact that a given piece of data satisfies
to some extent the negative conditions too. Thus,
the bipolar query should be evaluated in a database in 
a way strongly dependent on the attitude of the user. 
In the extreme cases, the above-mentioned risk-averse 
and risk-prone attitudes would be represented by lexi- 
cographic orders. In the former case, the lexicographic 
ordering would be first nondecreasing with respect to 
the negative condition matching degree and then nonin- 
creasing with respect to the positive condition matching 
degree. The less extreme attitudes of the users may be 
represented by various aggregation operators producing 
a scalar overall matching degree of a bipolar query. 
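The two extreme attitudes can be sketched as lexicographic sorts over pairs of matching degrees. The risk-averse ordering follows the description above; the risk-prone ordering is assumed here to be its mirror image (positive degree first), and the tuples and degrees are made up.

```python
# Each entry: (tuple id, positive-condition degree, negative-condition degree).
answers = [
    ("t1", 0.9, 0.4),
    ("t2", 0.6, 0.0),
    ("t3", 0.9, 0.0),
    ("t4", 0.3, 0.1),
]


def rank_risk_averse(items):
    """Nondecreasing in the negative degree, ties broken by higher positive degree."""
    return sorted(items, key=lambda r: (r[2], -r[1]))


def rank_risk_prone(items):
    """Assumed mirror case: higher positive degree first, ties by lower negative degree."""
    return sorted(items, key=lambda r: (-r[1], r[2]))


averse_order = [r[0] for r in rank_risk_averse(answers)]
prone_order = [r[0] for r in rank_risk_prone(answers)]
```

For this data the risk-averse user sees t3, t2, t4, t1, while the risk-prone user sees t3, t1, t2, t4: the tuple t1 moves from last to second place because its strong positive evaluation outweighs its negative one.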

An approach to a comprehensive treatment of so 
generally meant bipolar queries has been proposed 
by Matthé and De Tré [19.90], and further developed 
in [19.91]. In this approach, a pair of matching de- 
grees of the positive and negative conditions is referred 
to as a bipolar satisfaction degree (BSD). The respec- 
tive matching degrees are denoted as s and d, and 
called the satisfaction degree and the dissatisfaction de- 
gree, respectively. The ranking of data retrieved against 
a bipolar query in this approach may be obtained in various
ways. One of the options is based on the difference
$s - d$ of the two matching degrees. In this case, a risk-neutral
attitude of the user is modeled: he or she
favors neither the positive nor the negative evaluation.
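The ranking based on the difference of the two degrees is easy to sketch. The type name `BSD`, the helper `rank_by_difference`, and the sample degrees are hypothetical; other ranking options discussed in [19.90, 91] are not covered.

```python
from typing import NamedTuple


class BSD(NamedTuple):
    """Bipolar satisfaction degree: satisfaction s and dissatisfaction d."""
    s: float
    d: float


def rank_by_difference(results):
    """Order (tuple id, BSD) pairs by s - d, best first (risk-neutral attitude)."""
    return sorted(results, key=lambda item: item[1].s - item[1].d, reverse=True)


results = [
    ("t1", BSD(s=0.9, d=0.5)),
    ("t2", BSD(s=0.5, d=0.0)),
    ("t3", BSD(s=0.25, d=0.25)),
]
ranking = [tuple_id for tuple_id, _ in rank_by_difference(results)]
```

Here t2 is ranked above t1 although its satisfaction degree is lower, because t1 also satisfies the negative condition to a considerable degree.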


The Required/Desired Semantics. Most of the research
on bipolar queries has been so far focused on
a special interpretation of the positive and negative conditions.
Namely, the data items sought have to satisfy
the complement of the latter condition, i.e., the condition
denoted earlier as C, unconditionally, while the
former condition, i.e., the condition denoted as P, is of
somewhat secondary importance. For example, a painting
one is looking for should be from the seventeenth
century and, if possible, should be painted by one of
the famous Flemish painters. The C condition is here
painted in the seventeenth century (the original negative
condition is of course painted not in the seventeenth
century) while the positive condition P is painted by one
of the famous Flemish painters. Thus, the condition C is
a required condition while the condition P
may be referred to as a desired condition. Anyway, we
still have two matching degrees, of conditions C and
P, and the assumed relation between them determines
the way the tuples should be ordered in the answer to
a query.

The simplest approach is to use the desired con- 
dition’s matching degree just to order the data items 
which satisfy the required condition. However, if the 
required condition is fuzzy, i.e., may be satisfied to 
a degree, it is not obvious what it should mean that it is
satisfied. Some authors [19.56, 92] propose to adopt the 
risk-averse model of the user and use the correspond- 
ing lexicographic order with the primary account for 
the satisfaction of the condition C. This interpretation 
is predominant in the literature. 

Another approach consists in employing an aggre- 
gation operator, which combines the degrees of match- 
ing of conditions C and P in such a way so that the 
possibility of satisfying both conditions C and P is explicitly
taken into account, i.e., the focus is on a proper
interpretation of the following expression which is iden- 
tified with the bipolar query (C, P) 


C and possibly P . (19.16) 


Aggregation operators of this type have been studied 
in the literature under different names and in various
contexts. In the framework of database querying, such
an operator was first proposed by Lacroix and Lavency [19.93].
It has been proposed independently in the context
of default reasoning by Yager [19.94] and by Dubois
and Prade [19.95]. The concept of this operator was
also used by Bordogna and Pasi [19.96] in the con- 
text of textual information retrieval. Recently, a more 
general concept of a query with preferences and a corre- 
sponding new relational algebra operator, winnow, were 
introduced by Chomicki [19.97]. 

Zadrozny and Kacprzyk [19.98,99] proposed a di- 
rect fuzzification of the concept of the and possi- 
bly operator, implicit in the work of Lacroix and 
Lavency [19.93]. In their approach, the essence of the 
and possibly operator modeling consists in taking into 
account the whole database (set of tuples) while com- 
bining the required and desired conditions matching 
degrees. Namely: 


• If there is a tuple which satisfies both conditions
then and only then it is actually possible to satisfy
both of them and each tuple has to meet both of
them, i.e., the and possibly turns into a regular
conjunction $C \land P$,

• If there is no such tuple then it is not possible to
satisfy both conditions and the desired one can be
disregarded, i.e., the query reduces to C.


These are, however, two extreme cases and actually
it may be the case that the two conditions may be simultaneously
satisfied to a degree. Then, the (C, P) query
may be also matched to a degree which is identified
with the truth of the following formula

\[
C(t) \text{ and possibly } P(t)
\equiv C(t) \land \bigl( \exists s\, (C(s) \land P(s)) \Rightarrow P(t) \bigr) . \tag{19.17}
\]

This formula has been proposed by Lacroix and
Lavency [19.93] for the crisp case. Its fuzzy counterpart
[19.98-100] requires choosing a proper interpretation
of the logical connectives, and may take the
following form

\[
C(t) \text{ and possibly } P(t)
= \min\Bigl( C(t), \max\bigl( 1 - \max_{s \in T} \min(C(s), P(s)),\; P(t) \bigr) \Bigr) , \tag{19.18}
\]

where $T$ denotes the whole set of tuples being queried.

Formula (19.18) is derived from (19.17) using the
classical fuzzy interpretation of the logical connectives
via the max and min operators. Zadrozny and
Kacprzyk [19.100-102] studied the properties of the
counterparts of (19.18) obtained using a broader class
of the operators modeling logical connectives.
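Formula (19.18) can be evaluated in two passes over the queried tuples: one to find the degree to which both conditions can be satisfied at all, and one to compute the matching degree of each tuple. The sketch below assumes the matching degrees of C and P have already been computed; the function name and sample degrees are made up.

```python
def and_possibly(relation):
    """Evaluate (19.18); `relation` maps a tuple id to its pair (C degree, P degree)."""
    # Degree to which some tuple satisfies both C and P (the existential part).
    exists_both = max((min(c, p) for c, p in relation.values()), default=0.0)
    return {
        tuple_id: min(c, max(1.0 - exists_both, p))
        for tuple_id, (c, p) in relation.items()
    }


# No tuple satisfies P: the query reduces to C.
only_required = and_possibly({"t1": (1.0, 0.0), "t2": (0.5, 0.0)})

# Some tuple fully satisfies both: the query becomes the conjunction of C and P.
conjunction = and_possibly({"t1": (1.0, 0.0), "t2": (0.5, 0.0), "t3": (1.0, 1.0)})

# Intermediate case: both conditions are jointly satisfiable only to the degree 0.5.
partial = and_possibly({"t1": (1.0, 0.25), "t2": (0.75, 0.5)})
```

The first two calls reproduce the two extreme cases listed above, which illustrates that the result for a tuple depends on the content of the whole relation and not on that tuple alone.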

It is worth noting that if the required/desired semantics
is assumed and the bipolar evaluations are
expressed at the level of an attribute domain then it
is reasonable to impose some consistency conditions
on the form of both fuzzy sets representing conditions
C and P. Namely, it may be argued that a domain element
should be first acceptable, i.e., should satisfy
the required condition C, before it may be desired,
i.e., satisfy the condition P. Such consistency conditions
between fuzzy sets C and P may be conveniently
expressed using the concepts of twofold fuzzy sets
or Atanassov intuitionistic fuzzy sets/interval-valued
fuzzy sets, referred to earlier in Sect. 19.3.2. For an
in-depth discussion of such consistency conditions the
reader is referred to [19.56, 92, 103].

The growing interest in modeling bipolarity of user
preferences in queries resulted recently in some further
studies and interpretations of the and possibly operator
as well as in the concept of new similar operators such
as the or at least operator. For more details, the reader
is referred to [19.104, 105].


19.4.2 Flexible Querying of Fuzzy Databases 


Possibilistic Approach 

The possibilistic approach to data modeling rests on
the sound foundations of a well-developed theory,
namely possibility theory. Accordingly, the standard
relational algebra operations have their counterparts in
an algebra for retrieving information from a fuzzy
possibilistic relational database, proposed by Prade and
Testemale [19.31]. Let us consider the selection
operation $\sigma$. In classical relational algebra it is
a unary operation which, for a given relation $R$, returns
another relation $\sigma(R)$ comprising those tuples of $R$
which satisfy a condition $c$ (the condition acts as
a parameter of the selection operation). In the
possibilistic approach, the selection operation has to be
redefined to make it compatible with the assumed data
representation. To this end, two types of elementary
conditions are considered:


(i) $A\,\theta\,a$, where $A$ is an attribute, $\theta$ is a comparison
operator (fuzzy or not) and $a$ is a constant (fuzzy or
not);

(ii) $A_i\,\theta\,A_j$, where $A_i$ and $A_j$ are attributes.


In general, an exact value of an attribute is un- 
known and, thus, the matching degree is defined as 
the possibility and necessity of the match between this 
value and the constant (case (i) above) or the value of 
another attribute (case (ii) above). Hence, the formu- 
las (19.17)-(19.18) are used to compute the possibility 
and necessity in the following way. 

In case (i), the possibility distribution $\pi_{A(t)}(\cdot)$
representing the value of the attribute $A$ at a tuple $t$ is
used to compute the possibility measure of a set $F$, crisp
or fuzzy, of elements of $\mathrm{dom}_A$ being in the relation
$\theta$ with elements representing the constant $a$. The
membership function of the set $F$ is

$$\mu_F(x) = \sup_{y \in \mathrm{dom}_A} \min(\mu_\theta(x, y), \mu_a(y)), \quad x \in \mathrm{dom}_A, \tag{19.19}$$

where $\mu_a(\cdot)$ is the membership function of the constant
$a$ and $\mu_\theta(\cdot,\cdot)$ represents the fuzzy comparison
operator (fuzzy relation) $\theta$. Then, the pair
$(\Pi_{A(t)}(F), N_{A(t)}(F))$ represents the membership of
a tuple $t$ to the relation being the result of the
selection operator.
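
Case (i) can be sketched for a finite attribute domain as follows. The sketch assumes the standard possibility-theoretic definitions Π(F) = sup_x min(π(x), μ_F(x)) and N(F) = inf_x max(1 − π(x), μ_F(x)); the function names, the dictionary representation of the distributions, and the sample data are illustrative assumptions.

```python
def fuzzy_set_f(mu_theta, mu_a, domain):
    """Membership function of F per (19.19) over a finite domain."""
    return {
        x: max(min(mu_theta(x, y), mu_a.get(y, 0.0)) for y in domain)
        for x in domain
    }


def possibility(pi, mu_f):
    """Pi(F) = max over x of min(pi(x), mu_F(x))."""
    return max(min(pi.get(x, 0.0), m) for x, m in mu_f.items())


def necessity(pi, mu_f):
    """N(F) = min over x of max(1 - pi(x), mu_F(x))."""
    return min(max(1.0 - pi.get(x, 0.0), m) for x, m in mu_f.items())


def equals(x, y):
    """Crisp comparison operator theta (equality)."""
    return 1.0 if x == y else 0.0


# Hypothetical condition "Age = about 30" on a three-element domain.
domain = [20, 30, 40]
about_30 = {30: 1.0, 40: 0.5}   # fuzzy constant a
age_of_t = {30: 1.0, 40: 1.0}   # possibility distribution of A at tuple t

mu_f = fuzzy_set_f(equals, about_30, domain)
match = (possibility(age_of_t, mu_f), necessity(age_of_t, mu_f))
```

Here the tuple possibly matches to degree 1.0 but necessarily matches only to degree 0.5, because the value 40 is fully possible for the attribute yet belongs to F only to degree 0.5.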

In case (ii), the joint possibility distribution
$\pi_{(A_i(t),A_j(t))}(\cdot,\cdot)$ is used to compute the
possibility measure of a subset $F$ of the Cartesian product
of the domains of $A_i$ and $A_j$ comprising the pairs of
elements being in relation $\theta$. The membership function
of the set $F$ is defined as follows

$$\mu_F(x, y) = \mu_\theta(x, y), \quad x \in \mathrm{dom}_{A_i},\ y \in \mathrm{dom}_{A_j}. \tag{19.20}$$

Then, the pair $(\Pi_{(A_i(t),A_j(t))}(F), N_{(A_i(t),A_j(t))}(F))$
represents the membership degree of a tuple $t$ to the
relation being the result of the selection operator. If the
attributes $A_i$ and $A_j$ are noninteractive [19.27] then the
computing of the possibility measure is simplified.
Namely

$$\pi_{(A_i(t),A_j(t))}(x, y) = \min(\pi_{A_i(t)}(x), \pi_{A_j(t)}(y)). \tag{19.21}$$


It is worth noting that Prade and Testemale [19.27], in
fact, consider the answer to a query as composed of two
fuzzy sets of tuples:


• Those which necessarily match the query; the membership
degree for each tuple of this set is defined by
$N_{A(t)}(F)$;

• Those which possibly match the query; the membership
degree for each tuple of this set is defined by
$\Pi_{A(t)}(F)$.


Prade and Testemale [19.27] consider also the case 
when the selection operation is used with a compound 
condition $C$, i.e., $C = C_1 \wedge C_2$ or $C = C_1 \vee C_2$ or
$C = \neg C_1$. Due to the fact that, in general, a calculus of
uncertainty, exemplified by the possibility theory, cannot
be truth functional [19.106], it is not enough to compute 
the possibility and necessity measures for elementary 
conditions using possibility distributions representing 
values of particular attributes and then combine them 
using an appropriate operator. In order to secure effec- 
tive computing of the result of the selection operator 
it is thus assumed that the attributes referred to in ele- 
mentary conditions are noninteractive. In such a case, 
truth-functional combination of the obtained possibility 
and necessity measures is justified. 
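
Under the noninteractivity assumption, the truth-functional combination can be sketched as below. Each elementary condition is represented by a hypothetical pair (Π, N); the min/max rules and the duality N(¬C) = 1 − Π(C) are the standard possibilistic ones and are assumed here rather than quoted from [19.27].

```python
def conj(a, b):
    """(Pi, N) of C1 AND C2 for conditions on noninteractive attributes."""
    return (min(a[0], b[0]), min(a[1], b[1]))


def disj(a, b):
    """(Pi, N) of C1 OR C2 for conditions on noninteractive attributes."""
    return (max(a[0], b[0]), max(a[1], b[1]))


def neg(a):
    """(Pi, N) of NOT C, using Pi(not C) = 1 - N(C) and N(not C) = 1 - Pi(C)."""
    return (1.0 - a[1], 1.0 - a[0])


# Hypothetical (possibility, necessity) pairs of two elementary conditions.
c1 = (1.0, 0.5)
c2 = (0.75, 0.25)

both = conj(c1, c2)
either = disj(c1, c2)
not_c1 = neg(c1)
```

If the attributes were interactive, these pairs would have to be computed from the joint possibility distribution instead of being combined after the fact.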

Dubois and Prade [19.58] propose a technique of 
querying bipolar data using bipolar queries. A tuple 
may be classified to many categories with respect to an 
answer to such a query. 



In the extended possibilistic approach [19.107] the
matching degree of an elementary condition against
a tuple $t$ is expressed by an EPTV (cf. page 298). This
EPTV represents the extent to which it is (un)certain
that $t$ belongs to the result of a flexible query. Let us
consider a query condition of the form $A$ is $l$, where $A$
denotes an attribute and $l$ denotes a fuzzy set
representing a linguistic term used in a query, e.g., low
in the condition Price is low. Then, the EPTV representing
the matching degree will be computed as

$$\mu_{\tilde{t}^{*}(A \text{ is } l)}(T) = \sup_{x \in \mathrm{dom}_A} \min(\pi_A(x), \mu_l(x)), \tag{19.22}$$

$$\mu_{\tilde{t}^{*}(A \text{ is } l)}(F) = \sup_{x \in \mathrm{dom}_A - \{\bot\}} \min(\pi_A(x), 1 - \mu_l(x)), \tag{19.23}$$

$$\mu_{\tilde{t}^{*}(A \text{ is } l)}(\bot) = \min(\pi_A(\bot), 1 - \mu_l(\bot)), \tag{19.24}$$

where $\pi_A(\cdot)$ denotes the possibility distribution
representing the value of the attribute $A$ (to simplify the
notation we omit here a reference to a tuple $t$). In the
case of a compound query condition, the resulting EPTV can
be obtained by aggregating the EPTVs computed for the
elementary conditions. Hereby, generalizations of the
logical connectives of conjunction ($\wedge$), disjunction
($\vee$), negation ($\neg$), implication ($\rightarrow$), and
equivalence ($\leftrightarrow$) can be applied according
to [19.14, 108].
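
The computation in (19.22)-(19.24) can be sketched for a finite domain as follows. The sentinel `BOTTOM` standing for the undefined domain element, the dictionary representation, and the sample price data are assumptions made for illustration only.

```python
BOTTOM = object()  # hypothetical marker for the "undefined" domain element


def eptv(pi, mu_l, domain):
    """Return the degrees of T, F and 'undefined' for the condition 'A is l'."""
    regular = [x for x in domain if x is not BOTTOM]
    # (19.22): supremum over the whole domain.
    true_deg = max(min(pi.get(x, 0.0), mu_l.get(x, 0.0)) for x in domain)
    # (19.23): supremum over the domain without the undefined element.
    false_deg = max(min(pi.get(x, 0.0), 1.0 - mu_l.get(x, 0.0)) for x in regular)
    # (19.24): evaluated at the undefined element only.
    undef_deg = min(pi.get(BOTTOM, 0.0), 1.0 - mu_l.get(BOTTOM, 0.0))
    return {"T": true_deg, "F": false_deg, "bottom": undef_deg}


# Hypothetical data: possible prices of one tuple and the fuzzy term "low".
domain = [100, 200, 300, BOTTOM]
price = {100: 1.0, 200: 0.5, BOTTOM: 0.25}
low = {100: 1.0, 200: 0.5, 300: 0.0}

degrees = eptv(price, low, domain)
```

For this tuple the condition is fully possibly true, possibly false to degree 0.5, and possibly undefined to degree 0.25.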

Baldwin et al. [19.109] have implemented a system 
for querying a possibilistic relational database using 
semantic unification and the evidential support logic 
rule to combine matching degrees of the elementary 
conditions. The queries are composed of one or more 
conditions, the importance of each condition, a filter- 
ing function (similar to the notion of quantifier) and 
a threshold. The particularity of their work is the pro- 
cess, semantic unification, used for matching the fuzzy 
values of the criteria with the possibility distributions 
representing the values of the attributes. As a result, one 
obtains an interval [n, p], where, similarly to the previ- 
ous case, n (necessity) is the certain degree of matching 
and p (possibility) is the maximum possible degree of 
matching. However, this time the calculations are based 
on the mass assignments theory developed by Baldwin. 
In this approach, an interactive iterative process of the 
querying is postulated. 

Bosc and Pivert [19.110] proposed another type of 
a query against a possibilistic database. Namely, the 
user may be interested in finding tuples which have 
specific features of the possibility distribution representing
the value of an attribute. Thus, the condition of
such a query does not refer to the value of an attribute 


itself but to the characteristics of its possibility distribu- 
tion. This new type of queries may be illustrated with 
the following examples: 


I. Find tuples such that all the values $a_1, a_2, \ldots, a_n$ are
possible for an attribute $A$.

II. Find tuples such that more than $n$ values are possible
to a degree higher than $\lambda$ for an attribute $A$.

III. Find tuples where for attribute $A$ the value $a_1$ is
more possible than the value $a_2$.

IV. Find tuples where for attribute $A$ only one value is
completely possible.


The matching degree for such queries is computed
in a fairly straightforward way. For the query of type I it
may be computed as $\min(\pi_A(a_1), \pi_A(a_2), \ldots, \pi_A(a_n))$.
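
A minimal sketch of the type I matching degree, assuming each tuple stores the possibility distribution of the attribute as a dictionary; the function name and the sample colours are hypothetical.

```python
def all_values_possible(pi, values):
    """Type I query: degree to which every listed value is possible."""
    return min(pi.get(v, 0.0) for v in values)


# Hypothetical possibility distribution of the attribute Colour for one tuple.
colour_of_t = {"red": 1.0, "orange": 0.7, "yellow": 0.2}

degree = all_values_possible(colour_of_t, ["red", "orange"])
```

A value missing from the distribution is treated as impossible (degree 0), so asking for it drives the whole matching degree to 0.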

The reader is referred for more details to the fol- 
lowing sources on fuzzy querying in the possibilistic 
setting [19.28, 29, 32, 111]. 


Similarity-Based Approach 
The research on querying in similarity-based fuzzy 
databases is best presented in a series of papers by 
Petry et al. [19.112-114]. A complete set of operations
of the relational algebra has been defined for
the similarity relation-based model. These operations 
result from their classical counterparts by the replace- 
ment of the concept of equality of two domain values 
with the concept of their similarity. The conditions of 
queries are composed of crisp predicates as in a regular 
query language. Additionally, a set of level thresholds 
may be submitted as a part of the query. A threshold 
may be specified for each attribute appearing in the query's
condition. Such a threshold indicates what degree of
similarity of two values from the domain of a given
attribute justifies considering them equal. The concept of
the threshold level also plays a central role in the defini- 
tion of the redundancy in this database model and thus 
is important for the relational algebra operations as they 
are usually assumed to be followed by the reduction 
of redundant tuples. In this model, two tuples are re- 
dundant if the values of all corresponding attributes are 
similar (to a level higher than a selected degree) rather 
than equal, as is the case in the traditional relational
data model.
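
The threshold-based redundancy test can be sketched as follows. This is a simplified illustration with atomic attribute values, hypothetical similarity relations, and "at least the threshold" used as the comparison; it is not the exact definition from [19.112-114].

```python
# Hypothetical similarity degrees between distinct colour names.
COLOR_SIM = {("red", "crimson"): 0.9, ("red", "blue"): 0.1, ("crimson", "blue"): 0.1}


def color_sim(x, y):
    """Symmetric, reflexive similarity relation on the colour domain."""
    if x == y:
        return 1.0
    return COLOR_SIM.get((x, y), COLOR_SIM.get((y, x), 0.0))


def exact(x, y):
    """Ordinary equality expressed as a similarity relation."""
    return 1.0 if x == y else 0.0


def redundant(t1, t2, sim, thresholds):
    """Two tuples are redundant if every attribute pair is similar enough.

    Assumption: 'similar enough' means similarity >= the attribute threshold.
    """
    return all(sim[a](t1[a], t2[a]) >= thresholds[a] for a in thresholds)


sim = {"color": color_sim, "shape": exact}
t1 = {"color": "red", "shape": "cube"}
t2 = {"color": "crimson", "shape": "cube"}

merged_at_08 = redundant(t1, t2, sim, {"color": 0.8, "shape": 1.0})
merged_at_095 = redundant(t1, t2, sim, {"color": 0.95, "shape": 1.0})
```

Lowering a threshold makes more tuples redundant and hence yields a coarser result relation, which is how the user controls the granularity of the answer.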

Hybrid models, mentioned earlier, are usually ac- 
companied by their own querying schemes. For exam- 
ple, the GEFRED model is equipped with a generalized 
fuzzy relational algebra. Galindo et al. [19.115] have 
extended the GEFRED model with a fuzzy domain 
relational calculus (FDRC) and in [19.116] the fuzzy 
quantifiers have been included. 



19.5 Conclusions 


In this chapter, we have presented an overview of
selected contributions in the areas of data representation
and querying. We have focused on approaches rooted
in the relational data model. In the literature, many
approaches have been also proposed for, e.g., fuzzy
object-oriented models or fuzzy spatial information
modeling, which are not covered here due to the lack of
space. We have compiled an extensive list of references
which will hopefully help the reader to study particular
approaches in detail.


References


19.1 H.F. Korth, A. Silberschatz: Database research faces the information explosion, Communication ACM 40(2), 139-142 (1997)

19.2 L.A. Zadeh: Fuzzy logic, neural networks, and soft computing, Communication ACM 37(3), 77-84 (1994)

19.3 D. Kastrinakis, Y. Tzitzikas: Advancing search query autocompletion services with more and better suggestions, Lect. Notes Comput. Sci. 6189, 35-49 (2010)

19.4 R. Ozcan, I.S. Altingovde, Ö. Ulusoy: Exploiting navigational queries for result presentation and caching in Web search engines, J. Am. Soc. Inf. Sci. Technol. 62(4), 714-726 (2011)

19.5 T. Gaasterland, P. Godfrey, J. Minker: An overview of cooperative answering, J. Intell. Inf. Syst. 1, 123-157 (1992)

19.6 E.F. Codd: A relational model of data for large shared data banks, Communication ACM 13(6), 377-387 (1970)

19.7 R. Elmasri, S. Navathe: Fundamentals of Database Systems, 6th edn. (Addison Wesley, Boston 2011)

19.8 ISO/IEC 9075-1:2011: Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework) (2011)

19.9 L.A. Zadeh: Fuzzy sets, Inf. Control 8(3), 338-353 (1965)

19.10 D. Dubois, H. Prade: The three semantics of fuzzy sets, Fuzzy Sets Syst. 90(2), 141-150 (1997)

19.11 L.A. Zadeh: Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst. 1(1), 3-28 (1978)

19.12 D. Dubois, H. Prade: Possibility Theory (Plenum, New York 1988)

19.13 P. Bosc, L. Duval, O. Pivert: Value-based and representation-based querying of possibilistic databases. In: Recent Research Issues on Fuzzy Databases, ed. by G. Bordogna, G. Pasi (Physica, Heidelberg 2000) pp. 3-27

19.14 G. De Tré: Extended possibilistic truth values, Int. J. Intell. Syst. 17, 427-446 (2002)

19.15 Z. Pawlak: Rough sets, Int. J. Parallel Program. 11(5), 341-356 (1982)

19.16 A.N. Kolmogorov: Foundations of the Theory of Probability, 2nd edn. (Chelsea, New York 1956)


19.17 R. De Caluwe (Ed.): Fuzzy and Uncertain Object-oriented Databases: Concepts and Models (World Scientific, Singapore 1997)

19.18 A. Yazici, R. George: Fuzzy Database Modeling (Physica, Heidelberg 1999)

19.19 G. Bordogna, G. Pasi: Linguistic aggregation operators of selection criteria in fuzzy information retrieval, Int. J. Intell. Syst. 10(2), 233-248 (1995)

19.20 Z. Ma (Ed.): Advances in Fuzzy Object-Oriented Databases: Modeling and Applications (Idea Group, Hershey 2005)

19.21 J. Galindo, A. Urrutia, M. Piattini (Eds.): Fuzzy Databases: Modeling, Design and Implementation (Idea Group, Hershey 2006)

19.22 J. Galindo (Ed.): Handbook of Research on Fuzzy Information Processing in Databases (Idea Group, Hershey 2008)

19.23 K.V.S.V.N. Raju, A.K. Majumdar: The study of joins in fuzzy relational databases, Fuzzy Sets Syst. 21(1), 19-34 (1987)

19.24 P. Smets: Imperfect Information: Imprecision and uncertainty. In: Uncertainty Management in Information Systems: From Needs to Solution, ed. by A. Motro, P. Smets (Kluwer, Boston 1996) pp. 225-254

19.25 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Part I, Inf. Sci. 8(3), 43-80 (1975)

19.26 D. Dubois, H. Prade: Fuzzy sets in approximate reasoning, Part 1: Inference with possibility distributions, Fuzzy Sets Syst. 40, 143-202 (1991)

19.27 H. Prade, C. Testemale: Representation of soft constraints and fuzzy attribute values by means of possibility distributions in databases. In: Analysis of Fuzzy Information, ed. by J.C. Bezdek (Taylor Francis, Boca Raton 1987) pp. 213-229

19.28 M. Umano: FREEDOM-0: A fuzzy database system. In: Fuzzy Information and Decision Processes, ed. by M.M. Gupta, E. Sanchez (Elsevier, Amsterdam 1982) pp. 339-347

19.29 M. Zemankova-Leech, A. Kandel: Fuzzy relational Data Bases: A Key to Expert Systems (Verlag TÜV Rheinland, Cologne 1984)

19.30 M. Zemankova-Leech, A. Kandel: Implementing imprecision in information systems, Inf. Sci. 37(1-3), 107-141 (1985)

19.31 H. Prade, C. Testemale: Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries, Inf. Sci. 34, 115-143 (1984)

19.32 M. Umano, S. Fukami: Fuzzy relational algebra for possibility-distribution-fuzzy-relational model of fuzzy data, J. Intell. Inf. Syst. 3, 7-27 (1994)

19.33 B.P. Buckles, F.E. Petry: A fuzzy representation of data for relational databases, Fuzzy Sets Syst. 7, 213-226 (1982)

19.34 M. Anvari, G.F. Rose: Fuzzy relational databases. In: Analysis of Fuzzy Information, ed. by J.C. Bezdek (Taylor Francis, Boca Raton 1987) pp. 203-212

19.35 E. Wong: A statistical approach to incomplete information in database systems, ACM Trans. Database Syst. 7, 470-488 (1982)

19.36 M. Pittarelli: An algebra for probabilistic databases, IEEE Trans. Knowl. Data Eng. 6, 293-303 (1994)

19.37 O. Benjelloun, A. Das Sarma, C. Hayworth, J. Widom: An introduction to ULDBs and the trio system, IEEE Data Eng. Bull. 29(1), 5-16 (2006)

19.38 E. Michelakis, D.Z. Wang, M.N. Garofalakis, J.M. Hellerstein: Granularity conscious modeling for probabilistic databases, Proc. ICDM DUNE (2007) pp. 501-506

19.39 L. Antova, T. Jansen, C. Koch, D. Olteanu: Fast and simple relational processing of uncertain data, Proc. 24th Int. Conf. Data Eng. (2008) pp. 983-992

19.40 P. Bosc, O. Pivert: Modeling and querying uncertain relational databases: A survey of approaches based on the possible world semantics, Int. J. Uncertain. Fuzziness Knowl. Syst. 18(5), 565-603 (2010)

19.41 L.A. Zadeh: Quantitative fuzzy semantics, Inf. Sci. 3(2), 177-200 (1971)

19.42 S. Gottwald: Set theory for fuzzy sets of a higher level, Fuzzy Sets Syst. 2(2), 125-151 (1979)

19.43 G. De Tré, R. De Caluwe: Level-2 fuzzy sets and their usefulness in object-oriented database modelling, Fuzzy Sets Syst. 140, 29-49 (2003)

19.44 Y. Vassiliou: Null values in data base management: A denotational semantics approach, Proc. SIGMOD Conf. (1979) pp. 162-169

19.45 C. Zaniolo: Database relations with null values, J. Comput. Syst. Sci. 28(1), 142-166 (1984)

19.46 E.F. Codd: Missing information (applicable and inapplicable) in relational databases, ACM SIGMOD Rec. 15(4), 53-78 (1986)

19.47 G. De Tré, R. De Caluwe, H. Prade: Null values in fuzzy databases, J. Intell. Inf. Syst. 30, 93-114 (2008)

19.48 S. Shenoi, A. Melton: Proximity relations in the fuzzy relational database model, Fuzzy Sets Syst. 31(3), 285-296 (1989)

19.49 T. Beaubouef, F.E. Petry, B.P. Buckles: Extension of the relational database and its algebra with rough set techniques, Comput. Intell. 11, 233-245 (1995)

19.50 T. Beaubouef, F.E. Petry, G. Arora: Information-theoretic measures of uncertainty for rough sets and rough relational databases, Inf. Sci. 109, 185-195 (1998)

19.51 Y. Takahashi: Fuzzy database query languages and their relational completeness theorem, IEEE Trans. Knowl. Data Eng. 5, 122-125 (1993)

19.52 J.M. Medina, O. Pons, M.A. Vila: GEFRED. A generalized model of fuzzy relational databases, Inf. Sci. 76(1/2), 87-109 (1994)

19.53 S. Zadrożny, G. De Tré, J. Kacprzyk: On some approaches to possibilistic bipolar data modeling in databases. In: Advances in Fuzzy Sets, Intuitionistic Fuzzy Sets. Generalized Nets and Related Topics, Vol. II: Applications, ed. by K.T. Atanassov, O. Hryniewicz, J. Kacprzyk, M. Krawczak, Z. Nahorski, E. Szmidt, S. Zadrożny (EXIT, Warsaw 2008) pp. 197-220

19.54 S. Benferhat, D. Dubois, S. Kaci, H. Prade: Bipolar possibilistic representations, Proc. 18th Conf. Uncertain. Artif. Intell. (2002) pp. 45-52

19.55 D. Dubois, H. Fargier: Qualitative decision-making with bipolar information, Proc. 10th Int. Conf. Princ. Knowl. Represent. Reason. (2006) pp. 175-186

19.56 D. Dubois, H. Prade: Handling bipolar queries in fuzzy information processing. In: Handbook of Research on Fuzzy Information Processing in Databases, ed. by J. Galindo (Information Science Reference, Hershey 2008) pp. 97-114

19.57 D. Dubois, H. Prade (eds.): Int. J. Intell. Syst. 23(8/10) 863-1152 (2008), Special issues on bipolar representations of information and preference (Part 1A, 1B, 2)

19.58 D. Dubois, H. Prade: An overview of the asymmetric bipolar representation of positive and negative information in possibility theory, Fuzzy Sets Syst. 160(10), 1355-1366 (2009)

19.59 D. Dubois, H. Prade: An introduction to bipolar representations of information and preference, Int. J. Intell. Syst. 23(8), 866-877 (2008)

19.60 K.T. Atanassov: On Intuitionistic Fuzzy Sets Theory (Springer, Berlin, Heidelberg 2012)

19.61 J. Kacprzyk, S. Zadrożny: Linguistic database summaries and their protoforms: Towards natural language based knowledge discovery tools, Inf. Sci. 173(4), 281-304 (2005)

19.62 J. Kacprzyk, S. Zadrożny: Computing with words is an implementable paradigm: Fuzzy queries, linguistic data summaries, and natural-language generation, IEEE Trans. Fuzzy Syst. 18(3), 461-472 (2010)

19.63 A. Laurent: Querying fuzzy multidimensional databases: Unary operators and their properties, Int. J. Uncertain. Fuzziness Knowl. Syst. 11, 31-46 (2003)


19.64 T. Radecki: Mathematical model of information retrieval based on the theory of fuzzy sets, Inf. Process. Manag. 13(2), 109-116 (1977)

19.65 V. Tahani: A conceptual framework for fuzzy query processing - A step toward very intelligent database systems, Inf. Process. Manag. 13(5), 289-303 (1977)

19.66 J. Kacprzyk, A. Ziółkowski: Database queries with fuzzy linguistic quantifiers, IEEE Trans. Syst. Man Cybern. 16(3), 474-479 (1986)

19.67 J. Kacprzyk, S. Zadrożny, A. Ziółkowski: FQUERY III+: A human-consistent database querying system based on fuzzy logic with linguistic quantifiers, Inf. Syst. 14(6), 443-453 (1989)

19.68 P. Bosc, O. Pivert: An approach for a hierarchical aggregation of fuzzy predicates, Proc. 2nd IEEE Int. Conf. Fuzzy Syst. (1993) pp. 1231-1236

19.69 P. Bosc, O. Pivert: Some approaches for relational databases flexible querying, J. Intell. Inf. Syst. 1(3/4), 323-354 (1992)

19.70 P. Bosc, O. Pivert: Fuzzy querying in conventional databases. In: Fuzzy Logic for the Management of Uncertainty, ed. by L.A. Zadeh, J. Kacprzyk (Wiley, New York 1992) pp. 645-671

19.71 P. Bosc, O. Pivert: SQLf: A relational database language for fuzzy querying, IEEE Trans. Fuzzy Syst. 3, 1-17 (1995)

19.72 J. Kacprzyk, S. Zadrożny: FQUERY for Access: Fuzzy querying for windows-based DBMS. In: Fuzziness in Database Management Systems, ed. by P. Bosc, J. Kacprzyk (Physica, Heidelberg 1995) pp. 415-433

19.73 J. Galindo, J.M. Medina, O. Pons, J.C. Cubero: A server for Fuzzy SQL queries, Lect. Notes Artif. Intell. 1495, 164-174 (1998)

19.74 J. Kacprzyk, S. Zadrożny: Computing with words in intelligent database querying: Standalone and internet-based applications, Inf. Sci. 134(1-4), 71-109 (2001)

19.75 R.R. Yager: Fuzzy quotient operators for fuzzy relational databases, Proc. Int. Fuzzy Eng. Symp. (1991) pp. 289-296

19.76 D. Dubois, H. Prade: Semantics of quotient operators in fuzzy relational databases, Fuzzy Sets Syst. 78, 89-93 (1996)

19.77 J. Galindo, J.M. Medina, J.C. Cubero, M.T. Garcia: Relaxing the universal quantifier of the division in fuzzy relational databases, Int. J. Intell. Syst. 16(6), 713-742 (2001)

19.78 P. Bosc, O. Pivert, D. Rocacher: Tolerant division queries and possibilistic database querying, Fuzzy Sets Syst. 160(15), 2120-2140 (2009)

19.79 P. Bosc, O. Pivert: On diverse approaches to bipolar division operators, Int. J. Intell. Syst. 26(10), 911-929 (2011)

19.80 Y. Takahashi: A fuzzy query language for relational databases. In: Fuzziness in Database Management Systems, ed. by P. Bosc, J. Kacprzyk (Physica, Heidelberg 1995) pp. 365-384


19.81 D. Dubois, H. Prade: Using fuzzy sets in flexible querying: Why and how? In: Flexible Query Answering Systems, ed. by T. Andreasen, H. Christiansen, H.L. Larsen (Kluwer, Boston 1997) pp. 45-60

19.82 L.A. Zadeh: A computational approach to fuzzy quantifiers in natural languages, Comput. Math. Appl. 9, 149-184 (1983)

19.83 S. Zadrożny, J. Kacprzyk: Issues in the practical use of the OWA operators in fuzzy querying, J. Intell. Inf. Syst. 33(3), 307-325 (2009)

19.84 R.R. Yager: Interpreting linguistically quantified propositions, Int. J. Intell. Syst. 9, 541-569 (1994)

19.85 P. Bosc, O. Pivert, L. Lietard: Aggregate operators in database flexible querying, Proc. IEEE Int. Conf. Fuzzy Syst. (2001) pp. 1231-1234

19.86 P. Bosc, L. Lietard, O. Pivert: Sugeno fuzzy integral as a basis for the interpretation of flexible queries involving monotonic aggregates, Inf. Process. Manag. 39(2), 287-306 (2003)

19.87 M.A. Vila, J.-C. Cubero, J.-M. Medina, O. Pons: Using OWA operator in flexible query processing. In: The Ordered Weighted Averaging Operators: Theory and Applications, ed. by R.R. Yager, J. Kacprzyk (Kluwer, Boston 1997) pp. 258-274

19.88 O. Pivert, P. Bosc: Fuzzy Preference Queries to Relational Database (Imperial College, London 2012)

19.89 M. Grabisch, S. Greco, M. Pirlot: Bipolar and bivariate models in multicriteria decision analysis: Descriptive and constructive approaches, Int. J. Intell. Syst. 23, 930-969 (2008)

19.90 T. Matthé, G. De Tré: Bipolar query satisfaction using satisfaction and dissatisfaction degrees: Bipolar satisfaction degrees, Proc. ACM Symp. Appl. Comput. (2009) pp. 1699-1703

19.91 T. Matthé, G. De Tré, S. Zadrożny, J. Kacprzyk, A. Bronselaer: Bipolar database querying using bipolar satisfaction degrees, Int. J. Intell. Syst. 26(10), 890-910 (2011)

19.92 D. Dubois, H. Prade: Bipolarity in flexible querying, Proc. 5th Int. Conf. Flex. Query Answ. Syst. (2002) pp. 174-182

19.93 M. Lacroix, P. Lavency: Preferences: Putting more knowledge into queries, Proc. 13th Int. Conf. Very Large Databases (1987) pp. 217-225

19.94 R.R. Yager: Using approximate reasoning to represent default knowledge, Artif. Intell. 31(1), 99-112 (1987)

19.95 D. Dubois, H. Prade: Default reasoning and possibility theory, Artif. Intell. 35(2), 243-257 (1988)

19.96 G. Bordogna, G. Pasi: Linguistic aggregation operators of selection criteria in fuzzy information retrieval, Int. J. Intell. Syst. 10(2), 233-248 (1995)

19.97 J. Chomicki: Preference formulas in relational queries, ACM Trans. Database Syst. 28(4), 427-466 (2003)

19.98 S. Zadrożny: Bipolar queries revisited, Proc. Model. Decis. Artif. Intell. (2005) pp. 387-398


19.99 S. Zadrożny, J. Kacprzyk: Bipolar queries and queries with preferences, Proc. 17th Int. Conf. Database Expert Syst. Appl. (2006) pp. 415-419

19.100 S. Zadrożny, J. Kacprzyk: Bipolar queries: An aggregation operator focused perspective, Fuzzy Sets Syst. 196, 69-81 (2012)

19.101 S. Zadrożny, J. Kacprzyk: Bipolar queries using various interpretations of logical connectives, Proc. IFSA Congr. (2007) pp. 181-190

19.102 S. Zadrożny, J. Kacprzyk: Bipolar queries: An approach and its various interpretations, Proc. IFSA/EUSFLAT Conf. (2009) pp. 1288-1293

19.103 G. De Tré, S. Zadrożny, A. Bronselaer: Handling bipolarity in elementary queries to possibilistic databases, IEEE Trans. Fuzzy Syst. 18(3), 599-612 (2010)

19.104 D. Dubois, H. Prade: Modeling "and if possible" and "or at least": Different forms of bipolarity in flexible querying. In: Flexible Approaches in Data, Information and Knowledge Management, ed. by O. Pivert, S. Zadrożny (Springer, Berlin, Heidelberg 2014)

19.105 L. Liétard, N. Tamani, D. Rocacher: Fuzzy bipolar conditions of type "or else", Proc. FUZZ-IEEE 2011 (2011) pp. 2546-2551

19.106 D. Dubois, H. Prade: Gradualness, uncertainty and bipolarity: Making sense of fuzzy sets, Fuzzy Sets Syst. 192, 3-24 (2012)

19.107 G. De Tré, R. De Caluwe: Modelling uncertainty in multimedia database systems: An extended possibilistic approach, Int. J. Uncertain. Fuzziness Knowl. Syst. 11(1), 5-22 (2003)


19.108 G. De Tré, B. De Baets: Aggregating constraint satisfaction degrees expressed by possibilistic truth values, IEEE Trans. Fuzzy Syst. 11(3), 361-368 (2003)

19.109 J.F. Baldwin, M.R. Coyne, T.P. Martin: Querying a database with fuzzy attribute values by iterative updating of the selection criteria, Proc. IJCAI'93 Workshop Fuzzy Logic Artif. Intell. (1993) pp. 62-76

19.110 P. Bosc, O. Pivert: On representation-based querying of databases containing ill-known values, Proc. 10th Int. Symp. Found. Intell. Syst. (1997) pp. 477-486

19.111 P. Bosc, O. Pivert: Possibilistic databases and generalized yes/no queries, Proc. 15th Int. Conf. Database Expert Syst. Appl. (2004) pp. 912-916

19.112 B.P. Buckles, F.E. Petry: Query languages for fuzzy databases. In: Management Decision Support Systems Using Fuzzy Sets and Possibility Theory, ed. by J. Kacprzyk, R. Yager (Verlag TÜV Rheinland, Cologne 1985) pp. 241-251

19.113 B.P. Buckles, F.E. Petry, H.S. Sachar: A domain calculus for fuzzy relational databases, Fuzzy Sets Syst. 29, 327-340 (1989)

19.114 F.E. Petry: Fuzzy Databases: Principles and Applications (Kluwer, Boston 1996)

19.115 J. Galindo, J.M. Medina, M.C. Aranda: Querying fuzzy relational databases through fuzzy domain calculus, Int. J. Intell. Syst. 14, 375-411 (1999)

19.116 J. Galindo, J.M. Medina, J.C. Cubero, M.T. Garcia: Fuzzy quantifiers in fuzzy domain calculus, Proc. Int. Conf. Inf. Process. Manag. Uncertain. Knowl. Syst. (2000) pp. 1697-1704


20. Application of Fuzzy Techniques to Autonomous Robots


Ismael Rodriguez Fdez, Manuel Mucientes, Alberto Bugarin Diz 


The application of fuzzy techniques in robotics has
become widespread in the last years and in different
fields of robotics, such as behavior design,
coordination of behavior, perception, localization,
etc. The significance of the contributions was high
until the end of the 1990s, when the main aim in
robotics was the implementation of basic behaviors.
In the last years, the focus in robotics moved
to building robots that operate autonomously in
real environments; the actual impact of fuzzy
techniques in the robotics community is not as deep
as it was in the early stages of robotics or as it is in
other application areas (e.g., medicine, process
industry, ...). In spite of this, new emerging areas
in robotics such as human-robot interaction,
or well-established ones, such as perception, are
good examples of new potential realms of application
where (hybridized) fuzzy approaches will
surely be capable of exhibiting their capacity to
deal with such complex and dynamic scenarios.


20.1 Robotics and Fuzzy Logic
20.2 Wall-Following
20.3 Navigation
20.4 Trajectory Tracking
20.5 Moving Target Tracking
20.6 Perception
20.7 Planning
20.8 SLAM
20.9 Cooperation
20.10 Legged Robots
20.11 Exoskeletons and Rehabilitation Robots
20.12 Emotional Robots
20.13 Fuzzy Modeling
20.14 Comments and Conclusions
References


20.1 Robotics and Fuzzy Logic


Although many other classical definitions could be
stated, an autonomous robot may be defined as a machine
that collects data from the environment through
sensors, processes these data taking into account its
previous knowledge of the world, and acts according to
a goal. This definition is general and covers the different
types of robots available today: indoor wheeled robots,
autonomous cars, unmanned aerial vehicles (UAVs),
autonomous underwater vehicles, robotic arms,
humanoid robots, robotic heads, etc.

Between the early 1990s and today robotics has
evolved a lot. From our point of view, it is possible
to distinguish three stages in robotics research in the
last years. At the beginning, the focus was on endowing
robots with a number of simple behaviors to solve
basic tasks like wall-following, obstacle avoidance,
entering rooms, etc. Later, the objective in robotics
moved to building truly autonomous robots, which
required the mapping of the environment, the localization
of the robot, and navigation or motion planning.
Although these topics are still open, the focus of current
robotics is starting to move to a third stage where much
higher level and integrated capabilities, such as
advanced perception, learning of complex behaviors, or
human-robot interaction, are involved.

Within this context, fuzzy logic has been widely
used in robotics for several purposes. The main
advantage of using fuzzy logic in robotics is its ability
to manage the uncertainty due to sensors, actuators,
and also in the knowledge about the world. Until the
end of the 1990s, the contributions of fuzzy logic were
mainly in three fields [20.1]: behaviors, coordination
of behaviors, and perception. The design of behaviors
for solving specific and simple tasks in robotics has
been undoubtedly the most successful application of
fuzzy logic in robotics. As examples of these behaviors
we have wall-following, navigation, trajectory tracking,
moving objects tracking, etc. Also, the selection
and/or combination of these basic behaviors has been
solved with fuzzy logic [20.2-4]. Finally, fuzzy
techniques have contributed to perception in two lines: i)
for the preprocessing of sensor data, prior to their use
as input to the behavior and ii) for modeling the
uncertainty both in occupancy and feature-based maps.


Table 20.1 Distribution of the 98 publications considered
in the respective rankings: Thomson-Reuters Web of
Science (after [20.6]; JCR 2012-WoK) for journals and
Microsoft Academic Search (MAS 2013; after [20.5]) for
conferences

Quartile      Journals                Conferences
              No. papers   % papers   No. papers   % papers
Q1            68           81         10           12
Q2            10           12         1            7
Q3            2            2          1            7
Q4            1            1          0            0
No ranking    3            4          2            14
Total         84           100        14           100

In this chapter we describe and analyze the contributions
of fuzzy techniques to robotics in the period 2003-2013.
A total of 98 references related to fuzzy logic and
robotics were finally selected to be categorized and
described, with the aim of focusing on recent papers
describing uses of fuzzy-based or hybrid methods in
relevant tasks. Both basic behaviors and high-level tasks
of the robotics area, as it is understood nowadays, were
considered. Table 20.1 describes the relevance of the
papers considered in terms of their position in the
well-known rankings Thomson-Reuters Web of Knowledge for
journals and Microsoft Academic Search (MAS 2013) [20.5]
for conferences. We decided to consider the MAS 2013
ranking since other conference rankings such as CORE-ERA
(Computing Research and Education-Excellence in Research
for Australia) were not up to date at the time and do not
extensively cover the robotics area. It can be seen that
a vast majority of the papers are ranked in Q1 of the
respective lists (81% for journals and 72% for
conferences).

All the references were reviewed and classified according
to the robotics area they mainly addressed and were
assigned to one of the 12 categories described in
Sects. 20.2-20.13. The fuzzy technique they use (together
with its hybridization with other soft computing
techniques, if any) was also annotated in order to assess
which are the most active areas of soft computing in the
field of robotics and to evaluate their actual impact in
the field.

The results of this review, classification, and analysis
are presented in the sections that follow.


20.2 Wall-Following


When autonomous robots navigate within indoor
environments (e.g., industrial or civil buildings), they
have to be endowed with a number of basic capabilities
that allow the robot to perform specific tasks during
operation. These basic capabilities are usually referred
to in the robotics literature as behaviors. Some examples
of usual robot behaviors are the ability to move along
corridors, to follow walls at a given distance, to turn
corners, and to cross open areas in rooms. Wall-following
behavior is one of the most relevant ones and has been
very widely dealt with in the robotics literature, since
it is one of the basic behaviors to be executed when the
robot is exploring an unknown area, or when it is moving
between two points in a map.

The characteristic that makes a fuzzy controller useful
for the implementation of this and other behaviors is
the ability that fuzzy controllers have to cope with noisy
inputs. This noise appears when the sensors of the robot
detect the surrounding environment and is an inherent
feature of the whole field of robotic sensors.
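
The tolerance to noisy inputs described above can be
illustrated with a minimal sketch of a wall-following
controller. It is not taken from any of the cited works:
the function names, the 0.5 m target distance, the 0.3 m
width of the fuzzy sets, and the turn rates are all
assumptions chosen for illustration. Three overlapping
fuzzy sets on the distance error are combined by a
weighted average of singleton consequents.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 below lo and falls linearly to 0 at hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Membership that is 0 below lo and rises linearly to 1 at hi."""
    return 1.0 - ramp_down(x, lo, hi)


def triangle(x, left, peak, right):
    """Triangular membership with value 1 at peak and 0 outside (left, right)."""
    return min(ramp_up(x, left, peak), ramp_down(x, peak, right))


def wall_follow_turn(distance_m, target_m=0.5):
    """Turn rate in rad/s; positive values steer away from the wall.

    All numeric values are hypothetical and only serve the example.
    """
    error = distance_m - target_m          # > 0 means too far from the wall
    grades = (
        ramp_down(error, -0.3, 0.0),       # IF error is TOO_CLOSE
        triangle(error, -0.3, 0.0, 0.3),   # IF error is OK
        ramp_up(error, 0.0, 0.3),          # IF error is TOO_FAR
    )
    turn_rates = (0.6, 0.0, -0.6)          # THEN turn away / straight / toward
    # Weighted average of the singleton consequents (defuzzification).
    return sum(g * t for g, t in zip(grades, turn_rates)) / sum(grades)
```

Because neighboring sets overlap, a small perturbation of
the measured distance only shifts the weights slightly:
in this sketch a 2 cm reading error around the target
changes the commanded turn rate by about 0.04 rad/s
instead of flipping between discrete actions.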

The importance of wall-following behavior for 
a car-like mobile robot was pointed out in [20.7]. In this 
work, a fuzzy logic control system was used in order to 
implement human-like driving skills by an autonomous 
mobile robot. Four different sensor-based behaviors 
were merged in order to synthesize the concepts of 
the maneuvers needed. These behaviors were: wall- 
following, corner control, garage-parking, and parallel- 
parking. A description of the design and implementa- 
tion of a velocity controller for wall-following can be 
found in [20.8, 9]. In [20.8] fuzzy temporal rules were
used in order to filter the sensorial noise of a Nomad
200 mobile robot and to endow the rule base with high
expressiveness. The use of these types of rules
noticeably improved the robustness and reliability of the
system.

Wall-following behavior has also been used as 
a testing benchmark for automatic learning of fuzzy 
controllers. In [20.10] an evolutionary algorithm was 
used to automatically learn the fuzzy controller, taking 
into account the tradeoff between complexity and accu- 
racy. Continuing this work, in [20.11] the focus was on 
reducing the expert knowledge demanded for designing 
the controller. No restrictions are placed either on the 
number of linguistic labels or on the values that define 
the membership functions. Finally, in [20.12] a simple 
but effective learning methodology was presented. This 
methodology was proposed in order to not only gener- 
ate fuzzy controllers with good behavior in simulated 
experiments but also to use them directly in the real 
robot, with no further post-processing or implementing 
a tuning stage. 

More recent studies have addressed the use of type- 
2 fuzzy logic. These proposals are mostly motivated by 
the claim that type-1 fuzzy sets cannot handle the high 
levels of uncertainty that are usually present in real- 
world applications to the same extent that type-2 fuzzy 
sets can. 
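
To make the difference between the two kinds of sets
concrete, the following sketch shows one common
construction of an interval type-2 fuzzy set: a Gaussian
membership function whose width is itself uncertain. The
function names and the numeric values are illustrative
assumptions and do not come from the works cited in this
section. Each input is mapped to an interval of membership
grades rather than to a single grade.

```python
import math


def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def interval_type2_gaussian(x, mean, sigma_narrow, sigma_wide):
    """Return the (lower, upper) membership grades of x.

    The narrow Gaussian bounds the set from below and the wide one
    from above; the band between them is the footprint of uncertainty.
    Assumes sigma_narrow <= sigma_wide.
    """
    lower = gaussian(x, mean, sigma_narrow)
    upper = gaussian(x, mean, sigma_wide)
    return lower, upper


# Hypothetical "distance is about 0.5 m" set evaluated at 0.7 m.
lower, upper = interval_type2_gaussian(0.7, mean=0.5,
                                       sigma_narrow=0.1, sigma_wide=0.2)
```

A type-1 set would commit to a single grade for the 0.7 m
reading, whereas the interval set keeps the whole range
between the lower and upper grades available to the
inference stage.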

As with type-1 fuzzy logic controllers (FLCs), this
behavior has also been used for testing the automatic
learning of type-2 FLCs. In [20.13] a genetic algorithm
was used for tuning the type-2 fuzzy membership 
functions of a previously defined controller. A more 
complex learning scheme was presented in [20.14]. 
The antecedent is learned using a type-2 fuzzy cluster- 
ing based on examples and without expert knowledge. 
The actions in the consequent part of the rules are
selected from a set of defined control actions through
a hybrid method composed of a reinforcement learning
algorithm and ant colony optimization. Both works used
a real Pioneer robot to demonstrate the viability of the
controllers.

In order to show the advantages of using type-2 fuzzy
logic, a comparative analysis of type-1 and interval
type-2 fuzzy controllers was presented in [20.15].
A particle swarm optimization algorithm was used to
optimize a type-1 FLC. Next, the interval type-2 fuzzy
controller was constructed by blurring the membership
functions. The results obtained by a real mobile robot
showed that the interval type-2 fuzzy controller can
cope better with dynamic uncertainties in the sensory
inputs due to the softening and smoothing of the output
control surface.

However, these works only focus on interval type-2 fuzzy
logic. The high computational complexities associated
with general type-2 fuzzy logic systems (FLSs) have,
until recently, prevented their application to real-world
control problems. In [20.16] this problem was addressed
by introducing a complete representation framework,
which is referred to as zSlices-based general type-2
fuzzy systems. As a proof-of-concept application, this
framework was implemented for a mobile robot, which
operates in a real-world outdoor environment using the
wall-following behavior. In this case, the proposed
approach outperformed type-1 and interval type-2 fuzzy
controllers in terms of errors in the distance to the
wall. For this behavior, type-2 approaches exhibit
a better performance when compared to type-1 approaches.
Nevertheless, from a robotics point of view, type-2
proposals have not yet outperformed other fuzzy
approaches in general and have had a very limited impact
on this area.


20.3 Navigation


Robot navigation consists of a series of actions, which
are summarized in the ability of the robot to go from
a starting point to a goal without a planned route.
Navigation is one of the main issues that a mobile robot
must solve in order to operate.

In this behavior the ability to work in dynamic
environments whose structure is unknown, with great
uncertainty, and with moving objects or objects that may
change their position is of great importance. The
capacity to work under these conditions is the best
motivation to use fuzzy logic in this particular task.


Some work in this area has been done in simu- 
lated environments. In [20.17] a multi-sensor fusion 
technique was used to integrate all types of sensors 
and combine them to obtain information about the 
environment. In this way, the environment can be per- 
ceived comprehensively, and the ability of the FLC is 
improved. In [20.18, 19] the navigation behavior was 
studied for a set of robots that navigate in the same en- 
vironment. Both works used a Petri net to negotiate the 
priority of the robots. Also, in [20.19] different FLCs
were compared. Each controller used a different number
of labels for each variable as well as a different
shape of the fuzzy sets. It was concluded that utilizing
a Gaussian membership function is better for navigation
in environments with a high number of moving objects.

Moreover, there are several works where this behav- 
ior has been successfully implemented in a real robot. 
A hardware implementation of a FLC for navigation 
in mobile robots was presented in [20.20]. The design 
methodology allows a FLC to be transformed into a system
that is suitable for easy implementation on a digital
signal processor (DSP). This methodology was tested 
with good results in a ROMEO 4R car-like vehicle 
for a parking problem. Moreover, in [20.21] the de- 
sign of a new fuzzy logic-based navigation algorithm 
for autonomous robots was illustrated. It effectively 
achieves correct environment modeling, and processes 
noisy and uncertain sensory data on a low-cost Khepera 
robot. 

The ability to avoid dead-end paths was studied 
in [20.22,23]. In these works the minimum risk ap- 
proach was used to avoid local minima. A novel path- 
searching behavior was developed to recommend the 
local direction with minimum risk, where the risk was 
modeled using fuzzy logic. Another approach to
this problem was presented in [20.24]. While the
fuzzy logic body of the algorithm performs the main 
tasks of obstacle avoidance and target seeking, an ac- 
tual/virtual target switching strategy solves the problem 
of dead-ends on the way to the target. 

In [20.25] a new approach was proposed that em- 
ploys a fuzzy discrete event system to implement the 
behavior coordinator mechanism that selects relevant 
behaviors at a particular moment to produce an appro- 
priate system response. The possible transition from 
one state to another when an event occurs was modeled 
using fuzzy sets. 
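
The idea of a fuzzy state transition can be sketched as
follows. In one common formulation of fuzzy discrete event
systems the state is a vector of membership grades, each
event is a matrix of transition possibilities, and the
next state is obtained by max-min composition. Whether
[20.25] uses exactly this composition is an assumption;
the state names, the event, and all numbers below are
hypothetical.

```python
def max_min_compose(state, transition):
    """Next fuzzy state: next[j] = max over i of min(state[i], transition[i][j])."""
    n_next = len(transition[0])
    return [
        max(min(s_i, row[j]) for s_i, row in zip(state, transition))
        for j in range(n_next)
    ]


# Hypothetical states: (path clear, obstacle near, blocked).
current_state = [0.8, 0.2, 0.0]

# Hypothetical event "obstacle detected": entry [i][j] is the
# possibility of moving from state i to state j when it occurs.
obstacle_detected = [
    [0.1, 0.9, 0.3],
    [0.0, 0.4, 0.8],
    [0.0, 0.0, 1.0],
]

next_state = max_min_compose(current_state, obstacle_detected)
# next_state == [0.1, 0.8, 0.3]
```

A behavior coordinator could then weight behaviors such as
obstacle avoidance by the grades of the resulting state
instead of switching on a single crisp state.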

In addition to the use of conventional FLCs, dif- 
ferent approaches have been used in the last decade. 
In [20.26] a novel reactive type-2 fuzzy logic architec- 
ture was used. Type-2 fuzzy logic was used for both 
implementing the basic navigation behaviors and also 
the strategies for their coordination. The proposed ar- 
chitecture was implemented in a robot and successfully 
tested in indoor and outdoor environments. 

In the same way as several learning approaches 
have been used for wall-following behavior, different 
works focused on automatically learning navigation 
skills. In [20.27] a novel fuzzy Q-learning approach was 
presented, where the weights of the fuzzy rules of the 
controller were learned through a reinforcement algo- 
rithm. 
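
A minimal sketch of how rule weights can be learned by
reinforcement is given below. It follows a widely used
fuzzy Q-learning formulation in which every rule keeps
a q-value for each candidate action; it is not claimed to
be the algorithm of [20.27]. The class name, parameters,
and default values are assumptions, and the firing
strengths are assumed to be normalized so that they sum
to one.

```python
import random


class FuzzyQLearner:
    def __init__(self, n_rules, actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.actions = list(actions)
        # One q-value per (rule, candidate action) pair.
        self.q = [[0.0] * len(self.actions) for _ in range(n_rules)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def select(self, firing):
        """Pick one action per rule (epsilon-greedy) and blend them."""
        chosen = []
        for q_row in self.q:
            if self.rng.random() < self.epsilon:
                chosen.append(self.rng.randrange(len(self.actions)))
            else:
                chosen.append(max(range(len(self.actions)),
                                  key=q_row.__getitem__))
        action = sum(f * self.actions[k] for f, k in zip(firing, chosen))
        return chosen, action

    def update(self, firing, chosen, reward, next_firing):
        """Temporal-difference update shared among the rules that fired."""
        q_now = sum(f * self.q[i][k]
                    for i, (f, k) in enumerate(zip(firing, chosen)))
        v_next = sum(f * max(row) for f, row in zip(next_firing, self.q))
        td_error = reward + self.gamma * v_next - q_now
        for i, (f, k) in enumerate(zip(firing, chosen)):
            self.q[i][k] += self.alpha * td_error * f
        return td_error
```

In use, the robot would compute the firing strengths of
its rules from the sensor readings, call select, execute
the blended action, and then call update with the reward
and the firing strengths observed in the next state.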


In [20.28] a neuro-fuzzy network that is able to
add rules to the rule base was presented. The criterion
for adding rules was based on a performance evaluation
within a genetic algorithm that explores the new
situations to add. A comparison of three different
neuro-fuzzy approaches with classical fuzzy controllers
can be found in [20.29]. It was shown that neuro-fuzzy
approaches perform better and that the best results
were obtained with optimization by a genetic algorithm
for both Mamdani and Takagi-Sugeno approaches.

Although wheeled mobile robots are the most common
area of application for navigation, other types of
robots also use this behavior. Among the most impressive
robots that implement the navigation behavior are
unmanned aerial vehicles (UAVs). In [20.30] two fuzzy
controllers (one for altitude and the other for
latitude-longitude) were combined in order to control
the navigation of a small UAV. In [20.31] the design of
a Takagi-Sugeno controller for an unmanned helicopter
was presented. The proposed controller is a fuzzy
gain-scheduler used for stable and robust altitude,
roll, pitch, and yaw control. In both papers testing was
performed only in simulated environments; tests on real
UAVs were not reported.
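
The principle of a fuzzy gain-scheduler can be sketched in
a few lines. The example below is an illustration under
stated assumptions, not the controller of [20.31]: the
scheduling variable (airspeed), the two operating regimes,
and every gain value are hypothetical. Each Takagi-Sugeno
rule holds a pair of local gains, and the gains actually
applied are their membership-weighted blend.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 below lo and falls linearly to 0 at hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def scheduled_gains(airspeed):
    """Blend the local (kp, kd) gains of two hypothetical flight regimes."""
    slow = ramp_down(airspeed, 5.0, 15.0)   # IF airspeed is SLOW
    fast = 1.0 - slow                       # IF airspeed is FAST
    kp = slow * 2.0 + fast * 0.8            # THEN kp = 2.0 / kp = 0.8
    kd = slow * 0.5 + fast * 0.2            # THEN kd = 0.5 / kd = 0.2
    return kp, kd


def altitude_command(error, error_rate, airspeed):
    """PD control law whose gains vary smoothly with the airspeed."""
    kp, kd = scheduled_gains(airspeed)
    return kp * error + kd * error_rate
```

Between the two regimes the gains change continuously, so
the controller avoids the abrupt switching that a lookup
table with crisp thresholds would introduce.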

Robotic manipulators are another type of robot that
demands navigation capabilities. Simulated manipulators
were used in [20.32, 33]. The strategy followed
in [20.32] was to use a fuzzy inference process to tune
the gain of a sliding-mode control and the weights of
a neural network controller in the presence of
disturbances or large tracking errors. It was shown that
the combination of these controllers can guarantee
stability. In [20.33] a genetic algorithm was presented
in order to optimize the controllers of two robotic
manipulators working in the same environment.

Other examples of applications of navigation are: 
a robotic fish motion control algorithm [20.34] en- 
dowed with an orientation control system based on 
a FLC; the data transmission latency or data loss 
considered in [20.35] for internet-based teleopera- 
tion of robots (when data transmission fails, the 
robot automatically moves and protects itself using 
a fuzzy controller optimized using a co-evolutionary 
algorithm); and the stabilization of a unicycle mo- 
bile robot described in [20.36], where a type- 
2 FLC was used, and computer simulations con- 
firmed its good performance in different navigation 
problems. 




20.4 Trajectory Tracking 


Tracking refers to the ability of a robot to follow a pre- 
determined series of movements or a predefined path. 
Tracking is a relevant behavior in the industrial field, 
where robots usually have to perform a repeated pattern 
or trajectory with high precision. 

Developing accurate analytical models for such systems,
and hence reliable controllers based on such models, is
extremely difficult and in general infeasible even for
moderately complex trajectories. Applying fuzzy control
strategies to such systems seems appropriate, since
nonlinear system identification methodologies can then
be exploited with the help of inherent knowledge.
Furthermore, suitable stability conditions that
guarantee the global asymptotic stability of such
controllers can be determined.

In [20.37] a control structure that makes possible 
the integration of a kinematic controller and an adap- 
tive fuzzy controller was developed for mobile robots. 
A highly robust and flexible system that automati- 
cally follows a sequence of discrete way-points was 
presented in [20.38]. In [20.39] the combination of a ve- 
locity controller with a simple fuzzy system that limits 
on-line the advancing speed of the vehicle to allow it 
to follow an assigned path in compliance with the kine- 
matic constraints was presented. 

More recent studies in tracking for mobile robots 
have focused on more advanced and complex sys- 
tems. The work in [20.40] focused on the design of 
a dynamic Petri recurrent fuzzy neural network. This 
network structure was applied to the path-tracking con- 
trol of a non-holonomic mobile robot for verifying 
its validity. Also, in [20.41] the tracking control of 
a mobile robot with uncertainties in the robot kine- 
matics, the robot dynamics, and the wheel actuator 
dynamics was investigated. A robust adaptive con- 
troller was proposed for back-stepping a FLC. Fi- 
nally, [20.42] proposed a complete control law com- 
prising an evolutionary programming-based kinematic 
control and an adaptive fuzzy sliding-mode dynamic 
control. 

Although mobile robots are of great utility in industry,
robotic manipulators are the most widely used type of
robot in this sector. One main issue is to develop
controllers that deal with uncertainties. In [20.43]
trajectory tracking problems of robotic manipulators
were addressed using a fuzzy rule controller designed to
deal with uncertainty. In [20.44] two adaptive fuzzy
systems were employed to approximate the nonlinear and
uncertain terms occurring in the robot arm and the joint
motor dynamics. Other adaptive controllers for motion
control of multi-link robot manipulators can be found
in [20.45-47].

Other works focused on the ability to adapt the 
fuzzy control to different demands or requisites over 
time with different approaches. In [20.48] a fuzzy 
sliding mode controller was proposed for robotic ma- 
nipulators. The membership functions of the control 
gain are updated on-line and, therefore, the controller 
is not a conventional fuzzy controller but an adap- 
tive one. In [20.49] a design method that constructs 
the fuzzy rule base from a conventional proportional- 
integral-derivative (PID) controller in an incremental 
way using recursive feedback was proposed. A direct 
fuzzy control system for the regulation of robot ma- 
nipulators was presented in [20.50]. The bounds of the 
applied torques in this case are adjusted by means of the 
output membership functions parameters in such a way 
that the maximum torque demanded by the controller 
always ranges between the limits given by the manu- 
facturer. 
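
Why the output membership functions can bound the torque
is easy to see in a simplified setting. The sketch below
is an assumption-laden illustration rather than the
controller of [20.50]: it uses singleton output sets
placed at minus and plus the torque limit and a weighted
average for defuzzification. Since the output is a convex
combination of the singletons, it can never leave the
interval they span. The function names, the 0.5 rad width,
and the 12 N m limit used in the example are hypothetical.

```python
def error_grades(error, width):
    """Grades for (negative, zero, positive) error; they always sum to 1."""
    x = max(-1.0, min(1.0, error / width))   # saturate outside the universe
    negative = max(0.0, -x)
    positive = max(0.0, x)
    zero = 1.0 - negative - positive
    return negative, zero, positive


def fuzzy_torque(error, tau_max, width=0.5):
    """Joint torque that stays within [-tau_max, tau_max] by construction."""
    grades = error_grades(error, width)
    singletons = (-tau_max, 0.0, tau_max)    # output sets placed at the limits
    return sum(g * s for g, s in zip(grades, singletons)) / sum(grades)


# Hypothetical manufacturer limit of 12 N m.
torques = [fuzzy_torque(e, tau_max=12.0) for e in (-5.0, -0.3, 0.0, 0.25, 7.0)]
```

Changing the limit only requires moving the two outer
singletons, which mirrors the idea of adjusting the
torque bounds through the output membership function
parameters.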

A different perspective of robotic manipulators, 
but also a very common one, is the scheme of dif- 
ferent systems that need to communicate. In [20.51] 
a new observer-controller structure for robot manipu- 
lators with model uncertainty using only the position 
measurements is proposed. In this method, adaptive 
fuzzy logic is used to approximate the nonlinear and 
uncertain robot dynamics in both the observer and the 
controller. 

The effects of network-induced delay and data
packet dropout for a class of nonlinear networked
control systems for a flexible arm were investigated
in [20.52]. The nonlinear networked control systems
were approximated by linear networked Takagi-Sugeno 
fuzzy models. An iterative algorithm for constructing 
the fuzzy model was proposed. Also, in [20.53], the 
delay transmission of a signal through an internet and 
wireless module was studied. 

As in the case of mobile robots, recent research on
robotic manipulators has addressed both the automatic
design and the learning of different parts of the
controllers. For example,
in [20.54] a novel tracking control design for
robotic systems using fuzzy wavelet networks was pre- 
sented. Fuzzy wavelet networks were used to estimate 
unknown functions and, therefore, to solve the prob- 
lem of demanding prior knowledge of the controlled 
plant. 




Intelligent control approaches such as neural net- 
works for the approximation of nonlinear systems have 
also received considerable attention. They are very ef- 
fective in coping with structured parametric uncertainty 
and unstructured disturbance by using their powerful 
learning capability. Thus, neuro-fuzzy network con- 
trollers are the usual choice for robot manipulators. 

In [20.55] the position control of modular and re- 
configurable robots was addressed. A neuro-fuzzy con- 
trol architecture was used for tuning the gains inside 
the PID controller. An improvement was achieved with 
respect to classic controllers in terms of error of the 
trajectory that is tracked. Another neuro-fuzzy robust 
tracking control law was implemented in [20.56]. In this 
work, the controller guaranteed transient and asymp- 
totic performance. 

Other neuro-fuzzy control approaches have also been
developed. In [20.57] a stable
discrete-time adaptive tracking controller using
a neuro-fuzzy dynamic-inversion for a robotic manip- 
ulator was presented. The dynamics of the manipulator 
were approximated by a dynamic Takagi—Sugeno fuzzy 
model. With the aim of improving the robustness of 
the controller, in [20.58] a novel parameter adjustment 
scheme using a neuro-fuzzy inference system architec- 
ture was presented. 

Further applications of neuro-fuzzy network approaches
in real environments have also been discussed in the
literature [20.59]. In [20.60, 61] a robust neural-fuzzy-
network control was investigated for the joint posi- 
tion control of an n-link robot manipulator for peri- 
odic motion, with the aim of achieving high-precision 
position tracking. In [20.62] an approximate Takagi- 


20.5 Moving Target Tracking 


Service robots have to be endowed with the capacity 
of working in dynamic environments with high uncer- 
tainty. Typical environments with these characteristics 
are airports, hallways of buildings, corridors of hos- 
pitals, domestic environments, etc. One of the most 
important factors when working in real environments 
are the moving objects in the surrounding of the robot. 
The knowledge of the position, speed, and heading of 
the moving objects is fundamental for the execution of 
tasks like localization, route planning, interaction with 
humans, or obstacle avoidance. 

In [20.68] a module that allows the mobile robot to 
localize the target precisely in the environment was pre- 


Sugeno type neuro-fuzzy state-space model for a flex- 
ible robotic arm was presented. The model was trained 
using a particle swarm optimization technique. 

In the last years, more advanced learning algorithms 
have been used. In [20.63] an algorithm to learn the 
path-following behavior for multi-link mobile robots 
was presented. In this approach the learning complex- 
ity of the path-following behavior is reduced, as long 
paths are divided into a set of small motion primitives 
that can reach almost every point in the neighbor- 
hood. In [20.64], a robot manipulator was controlled 
by a FLC, where the parameters of the Gaussian mem- 
bership functions were optimized with particle swarm 
optimization. Also, in [20.65] the authors described 
the application of ant colony optimization and particle 
swarm optimization to the optimization of the member- 
ship function parameters of a FLC. The aim in this case 
was to find the optimal trajectory tracking controller for 
an autonomous wheeled mobile robot. 

Interval type-2 FLCs have been described in several 
case studies to handle uncertainties. However, one of 
the main issues in adopting such systems on a larger 
scale is the lack of a systematic design methodol- 
ogy. [20.66] presented a novel design methodology 
of interval type-2 Takagi-Sugeno—Kang (TSK) FLCs 
for modular and reconfigurable robot manipulators for 
tracking purposes with uncertain dynamic parameters. 
Moreover, [20.67] provided a problem-driven design 
methodology together with a systematic assessment 
of the performance quality and uncertainty robustness 
of interval type-2 FLCs. The method was evaluated 
on the problem of position control of a delta parallel 
robot. 


sented. Both direction and distance to the target were 
measured using infrared sensors. Also, a fuzzy target 
tracking control unit was proposed. This control unit 
comprises a behavior network for each action of the 
tracking control and a gate network for combining the 
information of the infrared sensors. It was shown that 
the proposed control scheme is, indeed, effective and 
feasible through some simulated and real examples of 
the behavior. 

In [20.69, 70] a pattern classifier system for the de- 
tection of moving objects using laser range finders data 
was presented. An evolutionary algorithm was used to 
learn the classifier system based on the quantified fuzzy 


Application of Fuzzy Techniques to Autonomous Robots 


temporal rules model. These quantified fuzzy temporal 
rules are able to analyze the persistence of the fulfill- 
ment of a condition in a temporal reference by using 
fuzzy quantifiers. Moreover, in [20.71] the authors pre- 
sented a deep experimental study on the performance of 


20.6 Perception 


Perception is an essential part of any robotic system, 
since it is the functionality through which the robot 
incorporates information from the environment. Sev- 
eral sensors have been typically used in mobile robots 
over the years. Ultrasound, laser range finders, acous- 
tic signals, or cameras are some examples of the most 
used sensors. The information obtained by these sensors 
can be used in order to solve various problems such as 
object recognition, collision avoidance, navigation, or 
some particular objects. 

The perception of landmarks is a very useful and 
meaningful strategy for helping in localization or nav- 
igation. Furthermore, it is a quite common strategy to 
detect known landmarks that are present in the envi- 
ronments instead of adding artificial landmarks (e.g., 
visual beacons, radio frequency identification (RFID) 
labels,...) in order to preserve environments with 
the less possible external intervention or manipula- 
tion. Within this context, one of the most commonly 
used landmarks in indoor environments are doors, 
since they indicate relevant points of interaction and 
also for their static nature. In [20.72] fuzzy temporal 


20.7 Planning 


For more complex behaviors in robotics, systems must 
be able to achieve certain higher-level goals. In order to 
do that, the robot needs to make choices that maximize 
the utility or value of the available alternatives. The pro- 
cess by which this objective is solved is called planning. 
When the robot is not the only actor (as is usual), it must 
check periodically if the environment matches with the 
predictions made and change its plan accordingly. 

Path planning is a typical task that is needed in most 
mobile robots that work in unknown environments. The 
problem consists in determining the path to be followed 
by the robot in order to reach a goal. In the case when 
not only the path, but also the movements of the robot 
at each instant are determined, planning is referred 


different evolutionary fuzzy systems for moving object 
following. Several environments with different degrees 
of complexity and a real environment were used in 
order to show the applicability of the methodologies 
presented. 


rules were used for detecting doors using the informa- 
tion obtained from ultrasound sensors. This paradigm 
was used to model the temporal variations of the 
sensor signals together with the model of the nec- 
essary knowledge for detection. A different approach 
for door detection using computer vision was pre- 
sented in [20.73]. Doors are found in gray-level images 
by detecting the borders and are distinguished from 
other similar shapes using a fuzzy system designed 
using expert knowledge. Also, a tuning mechanism 
based on a genetic algorithm was used to improve the 
performance of the system according to the particu- 
larities of the environment in which it is going to be 
employed. 

In robotic soccer games perception also plays 
a principal role. The work presented in [20.74] de- 
scribes a type-2 FLC to accurately track a mobile 
object, in this case a ball, from a robot agent. Both 
players and ball positions must be tracked using a low- 
computational cost image processing algorithm. The 
fuzzy controller aims to overcome the uncertainty 
added by this image processing. 


to as motion planning. In [20.75] a two-layered goal- 
oriented motion planning strategy using fuzzy logic was 
developed for a Koala mobile robot navigating in an un- 
known environment. The information about the global 
goal and the long-range sensorial data are used by the 
first layer of the planner to produce an intermediate goal 
in a favorable direction. The second layer of the planner 
takes this sub-goal and guides the robot to reach it while 
avoiding collisions using short-range sensorial data. 
Other path planning approaches that use fuzzy logic 
can be found in more recent years. A cooperative con- 
trol in a multi-agent architecture was applied in [20.76] 
in order to implement high cognitive capabilities like 
planning. The agents provided basic behaviors (such 


20.7 Planning 319 


2°02 | d Hed 


320 PartB 


Fuzzy Logic 


6°02 | 9 Hed 


as moving to a point) sharing the robot resources and 
negotiating when conflicts arose. A new proposal to 
solve path planning was also proposed in [20.77]. It 
was based in an ant colony optimization to find the 
best route with a fuzzy cost function. In [20.78] fuzzy 
logic was used in order to discretize the environment in 
relation to a soccer robot. Then, a multi-objective evo- 
lutionary algorithm was designed in order to optimize 
the actions needed in order to reach the ball. 

Some research has also been reported in the field 
of robot soccer, for high-level planning of team be- 


20.8 SLAM 


Simultaneous localization and mapping (SLAM) is 
a field of robotics that has as its main objective the 
construction of a map of the environment while at the 
same time keeping track of the current location of the 
robot inside the map that is being built. Mapping con- 
sists of integrating the information gathered with the 
robot’s sensors into a given representation. In contrast 
to this, localization is the problem of estimating where 
the robot is placed on a map. In practice, these two 
problems cannot be solved independently of each other. 
Before a robot can answer the question of how the envi- 
ronment looks like given a set of observations, it needs 
to know from which locations these observations have 
been made. At the same time, it is hard to estimate 
the current position of a robot (or any vehicle) without 
a map. 

havior. In [20.79] an extensive fuzzy behavior-based architecture was
proposed. The behavior-based architecture decomposes the complex
multi-robotic system into smaller modules of roles, behaviors, and
actions. Each individual behavior was implemented using an FLC. The same
approach was used for coordinating the various behaviors and selecting
the most appropriate role for each robot. Continuing this work, in
[20.80] an evolutionary algorithm approach was used to optimize each FLC
for each layer of the architecture.

Different fuzzy approaches have been used to help solve the SLAM
problem. In [20.81] a new neuro-fuzzy-based adaptive Kalman filtering
algorithm was proposed. The neuro-fuzzy-based supervision of the Kalman
filtering algorithm is carried out with the aim of reducing the mismatch
between the theoretical and the actual covariance of the innovation
sequences. To do that, it attempts to estimate the elements of the
covariance matrix at each sampling instant when a measurement update
step is carried out. Also, a fuzzy adaptive extended information
filtering scheme was used in [20.82] for ultrasonic localization and
pose tracking of an autonomous mobile robot. The scheme was presented in
order to improve the estimation accuracy and robustness of the proposed
localization system in the presence of missing information and noise.

A novel hybrid method for integrating fuzzy logic and genetic algorithms
(genetic fuzzy systems, GFSs) to solve the SLAM problem was presented in
[20.83]. The core of the proposed SLAM algorithm searches for the most
probable map such that the associated poses provide the robot with the
best localization information. Prior knowledge about the problem domain
was transferred to the genetic algorithm in order to speed up
convergence. Fuzzy logic is employed to serve this purpose and allows
the algorithm to conduct the search starting from a potential region of
the pose space. The underlying fuzzy mapping rules infer the uncertainty
in the location of the robot after executing a motion command and
generate a sample-based prediction of its current position. The
robustness of the proposed algorithm has been shown in different indoor
experiments using a Pioneer 3AT mobile robot.


20.9 Cooperation


A multi-agent system is a system composed of multiple interacting
intelligent agents within an environment. When the agents are robots,
these systems lead to a more challenging task because of their implicit
real-world environment, which is presumably difficult to model. There
are two fundamental needs for multi-agent approaches. On the one hand,
some tasks are inherently too complex, or impossible, for a single robot
to accomplish. On the other hand, there can be benefits to using several
simple robots because they are cheaper and more fault tolerant than a
single complex robot. Also, multi-agent systems can be helpful for
social and life science problems.

An illustrative example of this type of system is shown in [20.84],
where a mobile sensor network approach composed of cooperating robots
was presented. The objective of the sensor network was the localization
of hazardous contaminants in an unknown large-scale area. Each robot has
a swarm controller that governs its localization behavior; its actions
are based on a fuzzy logic control system that is identical for all
robots.

Cooperative control of robots has attracted great interest in recent
years. Fuzzy logic controllers have obtained good performance in some
simulation experiments. The idea of applying fuzzy controllers comes
from the fact that soft computing techniques have proved to be efficient
for poorly defined system optimization and multi-agent coordination. In
[20.85] a multi-agent control system was proposed, based on a fuzzy
inference system for a group of two wheeled mobile robots executing a
common task. An application of this control system is the control of
robotic formations moving on the plane, such as a group of guard robots
taking care of an area and dealing with potential intruders. The use of
fuzzy logic in this work allows easy expression of rules, and the
multi-agent structure supports separation of team and individual
knowledge. Another example can be found in [20.86]. In this work, a
collision-free target tracking problem of a multi-agent robot system was
presented. Game theory provides an effective tool to solve this problem.
To enhance robustness, a fuzzy controller tunes the cost function
weights directly for the game theoretic solution and helps to achieve a
prescribed value of cost function components.


20.10 Legged Robots


Wheeled robots have dominated the state of the art of mobile robots.
However, in the last decade, there has been interest in finding
alternatives for those environments in which wheeled robots are not able
to operate. When the terrain is variable and unprepared, adding legs to
robots might be a solution. Legged robots can navigate on and adapt to
different kinds of surfaces (such as rough, rocky, sandy, and steep
terrains) and step over obstacles. Moreover, legged robots help the
exploration of human and animal locomotion.

One of the main differences between legged and wheeled robots is that
legged robots require the system to generate an appropriate gait to
move, whereas wheels just need to roll. To clarify this, gait is the
movement pattern of limbs in animals and humans used for locomotion over
a variety of surfaces. The same concept is used to design the pattern of
movement of robots on different surfaces. In [20.87], the learning of a
biped gait was solved using reinforcement learning. The aim of this work
was to improve the learning rate through the incorporation of expert
knowledge using fuzzy logic. This fuzzy logic was incorporated in the
reinforcement system through neuro-fuzzy architectures. Moreover, fuzzy
rule-based feedback is incorporated instead of numerical reinforcement
signals. A different approach was taken in [20.88], where two different
genetic algorithm approaches were used in order to improve the
performance of an FLC designed to model the gait generation problem of a
biped robot. In both works, computer simulations were done in order to
compare the different approaches of the control systems in terms of
stability.

The work in [20.89] focused on the design of a leg for a quadrupedal
galloping machine. For that, two intelligent strategies, a fuzzy and a
heuristic controller, were developed for verification on a one-legged
system. The fuzzy controller consists of a fuzzy rule base with an
adaptation mechanism that modifies the rule output centers to correct
velocity errors. These techniques were successfully implemented for
operating one leg at the speeds necessary for a dynamic gallop. It was
shown that the fuzzy controller outperformed the heuristic controller
without relying on a model of the system.
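
The idea of a rule base whose output centers are adapted online can be
illustrated with a short sketch. It is not the controller of [20.89]:
the toy first-order plant, the three triangular sets on the velocity
error, the initial centers, and the update rule (each fired rule's
center is shifted in proportion to the error) are all assumptions made
only for this example.

```python
import numpy as np


def firing_strengths(error, peaks):
    """Membership of `error` in each set of a triangular fuzzy partition."""
    e = float(np.clip(error, peaks[0], peaks[-1]))  # saturate at the outer sets
    w = np.zeros(len(peaks))
    for i in range(len(peaks) - 1):
        lo, hi = peaks[i], peaks[i + 1]
        if lo <= e <= hi:
            w[i + 1] = (e - lo) / (hi - lo)
            w[i] = 1.0 - w[i + 1]
            break
    return w  # the strengths always sum to 1


def simulate(adapt_rate, steps=2000, dt=0.01, target=1.0):
    """Track a target velocity on a hypothetical plant dv/dt = -v + u."""
    peaks = np.array([-1.0, 0.0, 1.0])    # error is negative / zero / positive
    centers = np.array([-2.0, 0.0, 2.0])  # initial rule output centers
    v = 0.0
    for _ in range(steps):
        error = target - v
        w = firing_strengths(error, peaks)
        u = float(w @ centers)                  # weighted average of the centers
        centers += adapt_rate * error * w * dt  # shift centers of the fired rules
        v += dt * (-v + u)                      # explicit Euler step of the plant
    return target - v, centers
```

With `adapt_rate=0.0` the rule base behaves like a fixed proportional
law and leaves a steady velocity error on this plant; with a positive
rate the center of the "zero error" rule drifts until the error
vanishes, without the update rule using a plant model.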

Finally, regarding the gait problem, in [20.90] a fuzzy logic vertical
ground reaction force controller was developed for a robotic cadaveric
gait, which altered tendon forces in real time and iteratively adjusted
the robotic trajectory in order to track a target reaction force. This
controller was validated using a novel dynamic cadaveric gait simulator.
The fuzzy logic rule-based controller was able to track the target with
a very low tracking error, demonstrating its ability to accurately
control this type of robot.

Besides the gait problem, the biped robotic system contains a great deal
of uncertainty associated with the mechanism dynamics and environment
parameters. In [20.91] it was suggested that type-2 fuzzy logic control
systems could be a better way to deal with the uncertainty in a robotic
system. A novel type-2 fuzzy switching control system was proposed for
biped robots, which includes a type-2 fuzzy modeling algorithm. As in
the previous work, simulated experiments were used in order to compare
the performance of the proposed controller with other dynamical
intelligent control methods.
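
What a type-2 set adds over an ordinary (type-1) set can be shown with a
minimal sketch of an interval type-2 membership function. The Gaussian
shape, the uncertain width, and the midpoint-based defuzzification are
assumptions chosen for brevity; they are not taken from [20.91], and
iterative Karnik-Mendel type reduction is the more common choice in
practice.

```python
import math


def interval_membership(x, mean, sigma_low, sigma_high):
    """Lower and upper membership of x in a Gaussian set of uncertain width."""
    lower = math.exp(-0.5 * ((x - mean) / sigma_low) ** 2)
    upper = math.exp(-0.5 * ((x - mean) / sigma_high) ** 2)
    return lower, upper


def interval_type2_output(x, means, centers, sigma_low=0.3, sigma_high=0.5):
    """Crisp output using the midpoint of each rule's firing interval."""
    strengths = []
    for mean in means:
        lower, upper = interval_membership(x, mean, sigma_low, sigma_high)
        strengths.append(0.5 * (lower + upper))  # simple closed-form shortcut
    total = sum(strengths)
    return sum(s * c for s, c in zip(strengths, centers)) / total
```

The gap between the lower and upper membership is the footprint of
uncertainty; setting `sigma_low == sigma_high` collapses it and recovers
a type-1 set.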


In [20.92] a fuzzy controller, consisting of a fuzzy prefilter (designed
by a genetic algorithm) in the feedforward loop and a PID-like fuzzy
controller in the feedback loop, was proposed for foot trajectory
tracking control of a hydraulically actuated hexapod robot. A real
COMET-III robot was used in this work, and the experimental results show
that the proposed controller achieves better foot trajectory tracking
performance than an optimal classical controller such as the state
feedback linear-quadratic regulator (LQR).
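
The feedback part of such a scheme can be sketched as a PD-like fuzzy
controller. This covers only the general idea: the GA-designed prefilter
of [20.92] is not reproduced, and the 3x3 rule table, the scaling gains
`ke`, `kd`, `ku`, and the double-integrator "foot" plant are
hypothetical choices for illustration.

```python
import numpy as np

PEAKS = np.array([-1.0, 0.0, 1.0])  # fuzzy sets N, Z, P on each scaled input

# Rule output centers: rows index the error set, columns the error-rate set.
RULES = np.array([[-1.0, -0.5, 0.0],
                  [-0.5,  0.0, 0.5],
                  [ 0.0,  0.5, 1.0]])


def memberships(x):
    """Triangular memberships of x in N, Z, P (saturating outside [-1, 1])."""
    x = float(np.clip(x, PEAKS[0], PEAKS[-1]))
    mu = np.zeros(len(PEAKS))
    for i in range(len(PEAKS) - 1):
        lo, hi = PEAKS[i], PEAKS[i + 1]
        if lo <= x <= hi:
            mu[i + 1] = (x - lo) / (hi - lo)
            mu[i] = 1.0 - mu[i + 1]
            break
    return mu


def fuzzy_pd(e, de, ke=1.0, kd=0.5, ku=8.0):
    """Product inference over the rule table, weighted-average defuzzification."""
    firing = np.outer(memberships(ke * e), memberships(kd * de))
    return ku * float((firing * RULES).sum() / firing.sum())


def track_step(reference=1.0, steps=1000, dt=0.01):
    """Drive a hypothetical unit-mass foot (x'' = u) to a step reference."""
    x, v = 0.0, 0.0
    prev_e = reference - x
    for _ in range(steps):
        e = reference - x
        de = (e - prev_e) / dt
        u = fuzzy_pd(e, de)
        v += dt * u   # semi-implicit Euler integration
        x += dt * v
        prev_e = e
    return x
```

An integral term (for a full PID-like law) could be added by
accumulating the controller output, and a prefilter would reshape
`reference` before it reaches the loop.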


20.11 Exoskeletons and Rehabilitation Robots 


The latest advances in assistive robotics have had a great
impact in different fields. For instance, in military applications,
these technologies can allow soldiers to carry
a higher payload and walk further without requiring 
more effort or producing fatigue. However, the field 
where the impact of this type of robotics is the greatest 
is healthcare. The aging of the population will be one of 
the main problems in the near future, since more peo- 
ple are going to need some type of assistance on a daily 
basis. This dependency suffered by the elderly can be 
partially resolved with the use of robotic systems that 
help people with a lack of mobility or strength. Robotic 
systems for assistance can provide total or partial move- 
ment to these people. Moreover, rehabilitation using 
these systems can make regaining movement-related 
functions easier and faster. 

Several studies in recent years have applied fuzzy techniques to the
rehabilitation of upper-limb motion (shoulder joint motion and elbow
joint motion). The principal reason to use fuzzy logic is to deal with
complicated, ill-defined, and dynamic processes, which are intrinsically
difficult to model mathematically. Moreover, fuzzy logic control
incorporates human knowledge and experience directly without relying on
a detailed model of the control system.

In [20.93] an exoskeleton and its fuzzy control system to assist the
human upper-limb motion of physically weak persons were presented. The
proposed robot automatically assists human motion mainly based on
electromyogram signals on the skin surface. In a later work [20.94], the
authors introduced a hierarchical neuro-fuzzy controller for a robotic
exoskeleton where the angles of the elbow and shoulder are modeled using
fuzzy sets. Additionally, fuzzy sets are used in order to set a trigger
on the activity of the muscles. In order to solve the same problem, a
hybrid position/force fuzzy logic control system was presented in
[20.95]. The objective of this work was to assist the subject in
performing both passive and active movements along the designed
trajectories with specified loads.

More recent studies have been published under the paradigm of fuzzy
sliding mode control. In [20.96] a novel adaptive self-organizing fuzzy
sliding mode control scheme for a 3-degree-of-freedom (DOF)
rehabilitation robot was presented. An interesting characteristic of
this approach is the ability to establish and regulate the fuzzy rule
base dynamically. Going a step further along that same line, in [20.97]
an adaptive self-organizing fuzzy sliding mode controller was proposed
for a 2-DOF rehabilitation robot.
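
The basic mechanism behind fuzzy sliding mode control can be sketched
for a single joint. This is a plain, non-adaptive illustration and not
the self-organizing schemes of [20.96, 97]: the unit-inertia joint
model, the sinusoidal reference and disturbance, the three-rule base,
and the gains `lam`, `gain`, and `width` are all assumptions. The fuzzy
rules replace the discontinuous sign function of classical sliding mode
control with a smooth interpolation near the sliding surface.

```python
import math


def fuzzy_switch(s, width):
    """Three rules on s (N, Z, P) with output centers -1, 0, +1."""
    z = max(-1.0, min(1.0, s / width))
    mu_n, mu_p = max(0.0, -z), max(0.0, z)
    mu_z = 1.0 - mu_n - mu_p
    return (-1.0 * mu_n + 0.0 * mu_z + 1.0 * mu_p) / (mu_n + mu_z + mu_p)


def simulate(steps=6000, dt=0.001, lam=5.0, gain=5.0, width=0.1):
    """Track sin(t) with a hypothetical joint theta'' = u + disturbance."""
    theta, omega = 0.5, 0.0
    errors = []
    for k in range(steps):
        t = k * dt
        ref, ref_d, ref_dd = math.sin(t), math.cos(t), -math.sin(t)
        e, e_d = ref - theta, ref_d - omega
        s = e_d + lam * e                        # sliding variable
        u = ref_dd + lam * e_d + gain * fuzzy_switch(s, width)
        disturbance = 0.5 * math.sin(3.0 * t)    # unknown to the controller
        omega += dt * (u + disturbance)          # semi-implicit Euler step
        theta += dt * omega
        errors.append((t, e))
    return errors
```

Because `gain` exceeds the disturbance bound, `s` is driven into a thin
layer around zero and the tracking error then decays at the rate set by
`lam`; a self-organizing variant would additionally create and tune the
rules online.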

To examine the effectiveness of a proposed exoskeleton in motion
assistance, one common practice is to use human subjects who perform
different cooperative motions of the elbow and shoulder [20.93].
Different performance measures can be used to show the correctness of
the proposed systems. In [20.94] the angles obtained by the exoskeleton
were compared, while in [20.95] the results were shown in terms of force
and stability. Finally, in [20.96, 97], the performance of the robotic
rehabilitation system was measured in terms of the response of the
system to movements and tracking errors.


20.12 Emotional Robots 


Future robots need a transparent interface that regular people can
interpret, such as an emotional human-like face. Moreover, such robots
must exhibit behaviors that are perceived as believable and lifelike. In
general terms, the use of fuzzy techniques in this field has not had a
great impact. However, the fuzzy approach can not only simplify the
design task, but also enrich the interaction between humans and robots.

An application of fuzzy logic for this type of robot can be found in
[20.98]. This research proposed using fuzzy logic to build the whole
behavior system of facial emotion expression robots. It was shown how
these behaviors could be constructed by a fuzzy architecture that not
only seems more realistic but can also be easily implemented.

In [20.73] a fuzzy system was presented that establishes a level of
possibility about the degree of interest that the people around the
robot may have in interacting with it. First, a method to detect and
track persons using stereo vision was proposed. Then, the interest of
each person was computed using fuzzy logic by analyzing their position
and level of attention to the robot. The level of attention is estimated
by analyzing whether or not the person is looking at the robot.

A more recent work on video-based emotion recognition was presented in
[20.99]. In this work, a fuzzy rule-based approach was used for emotion
recognition from facial expressions. The fuzzy classification itself
analyzes the deformation of a face separately in each image. In contrast
to most existing approaches, blended emotions with varying intensities,
as proposed by psychologists, can also be handled. Other work that was
based on physiological measures was presented in [20.100]. In this work,
a fuzzy inference engine was developed to estimate human responses. The
authors demonstrated in a later work [20.101] that a hidden Markov model
is able to achieve better classification results than the previously
reported fuzzy inference engine.


20.13 Fuzzy Modeling


Some extensions to fuzzy logic have been developed in the last decade in
the field of mobile robotics. In [20.102] a probabilistic type-2 FLS was
proposed for modeling and control. This proposal aims to solve the lack
of capability of FLSs to handle various uncertainties identified by this
work in practical applications. Two examples were used to validate the
probabilistic fuzzy model: a function approximation and a robotic
application. The robotic application was successfully implemented for
the control of a simulated biped robot.

A novel representation of robot kinematics was proposed in [20.103, 104]
in order to merge qualitative and quantitative reasoning. Fuzzy
reasoning is good at communicating with sensing and control level
subsystems by means of fuzzification and defuzzification methods. It has
powerful reasoning strategies utilizing compiled knowledge through
conditional statements so as to easily handle mathematical and
engineering systems. Fuzzy reasoning also provides a means for handling
uncertainty in a natural way, making it robust in significantly noisy
environments. However, in this work a lack of ability in fuzzy reasoning
alone to deal with qualitative inference about complex systems was
pointed out.

It is argued that qualitative reasoning can compensate for this
drawback. Qualitative reasoning has the advantage of operating at the
conceptual modeling level, reasoning symbolically with models that
retain the mathematical structure of the problem rather than the
input/output representation of rule bases. Moreover, the computational
cause-effect relations contained in qualitative models facilitate
analyzing and explaining the behavior of a structural model. The
kinematics of a PUMA 560 robot was modelled for the trajectory tracking
task, thus demonstrating the ability of fuzzy reasoning. Simulation
results demonstrated that the proposed method effectively provides a
two-way connection for robot representations used for both numerical and
symbolic robotic tasks.




20.14 Comments and Conclusions 


The significance of the contributions of fuzzy logic to robotics is
quite different in the three stages that we have identified. In the
first stage, the objective was to endow the robot with a set of simple
behaviors to solve basic tasks like wall-following, obstacle avoidance,
moving object tracking, trajectory tracking, etc. Fuzzy logic
contributed significantly in this stage, not only with the design of
behaviors, but also with the coordination or fusion among them (more
than 75% of the papers considered in this chapter deal with these
topics). In the second stage, the focus moved to implementing autonomous
robots that are able to operate in real environments (museums,
hospitals, homes, etc.), which should, therefore, be able to generate a
map of the environment, localize in the map, and also navigate between
different positions (motion planning). The first two tasks were joined
under the SLAM field, which has been one of the most important topics in
robotics in recent years. Motion planning has been another relevant
field, which experienced a great improvement with the use of heuristic
search algorithms and the inclusion of kinematic constraints in the
planning. The contributions of fuzzy logic to this second wave have been
marginal; SLAM techniques are dominated by probabilistic and
optimization approaches, while the best motion planning proposals rely
on heuristic search processes and probabilistic approaches to manage the
uncertainty.

We have assessed the actual impact of the recently reported research on
fuzzy-based approaches on robotics research and applications from a
quantitative/qualitative point of view. To obtain an estimate, we
considered the two journals with the highest impact factor in the 2012
Thomson-Reuters Web of Knowledge (the International Journal of Robotics
Research, IJRR, and IEEE Transactions on Robotics, IEEE-TR) and looked
for papers that included the term fuzzy in the abstract in the period
considered (2003-2013). We found that only one paper fulfilled such
conditions in IJRR and only seven papers in IEEE-TR. These results
indicate that the actual impact and diffusion of fuzzy approaches in the
relevant robotics arena is very limited. A vast majority of research and
application results of fuzzy approaches in robotics are, therefore,
published and presented in journals and conferences related to soft
computing. In fact, only 22 out of the 98 papers considered in this
chapter (i.e., 22%) were published in robotics-related forums. Among
these, almost all papers (20 out of 22, 91%) described FLC of the
Mamdani type for different tasks, which suggests that FLC is without a
doubt the area with the highest impact in journals and conferences of
the most genuine robotics area (i.e., outside the soft computing related
publications).

Furthermore, FLC is the most active area of research and applications,
since 66% of the papers considered in this chapter describe
Mamdani-based fuzzy controllers for all the behaviors and high-level
tasks considered. Other hybrid methodologies such as neuro-fuzzy
networks, or fuzzy-based ones such as type-2 fuzzy sets and
Takagi-Sugeno rules, follow at a considerable distance (12%, 9%, and 7%,
respectively), but with almost no impact in robotics-centered
publications.

Although these topics are still open, nowadays 
the focus in robotics is moving to other higher-level 
fields (third stage), like perception, learning of complex 
behaviors, or human-robot interaction. Perception re- 
quirements are not just the construction of occupancy 
or feature-based maps, but the recognition of objects, 
the classification of objects, the identification of ac- 
tions, etc.; in summary, scene understanding. Moreover, 
perception has to combine different sources of infor- 
mation, with visual and volumetric data being the two 
main sources. Contributions from fuzzy techniques are 
still few in number, but from our point of view, they can 
contribute to this topic and will surely do so in those 
cases where high-level reasoning is required, and also 
in the description of scenes. 

From the point of view of human-robot interaction, 
there are two directions in which fuzzy logic may also 
contribute significantly. The first one is the interpre- 
tation of the emotional state of the people interacting 
with the robot. This interpretation uses several information
sources (visual, acoustic, etc.), requires expert
knowledge to build the classification rules, and involves a kind
of data uncertainty that could be adequately modeled
with fuzzy sets. The second direction is the expressiveness
of the robot, which will be fundamental for social
robotics. Again, and for the same reasons as for the 
previous topic, fuzzy logic approaches may generate 
significant contributions in the field. 



References 

20.1 A. Saffiotti: The uses of fuzzy logic in autonomous robot navigation, Soft Comput. 1(4), 180-197 (1997)
20.2 E.H. Ruspini: Fuzzy logic in the flakey robot, Proc. Int. Conf. Fuzzy Log. Neural Netw., Iizuka (1990) pp. 767-770
20.3 A. Saffiotti, E.H. Ruspini, K. Konolige: Blending reactivity and goal-directedness in a fuzzy controller, IEEE Int. Conf. Fuzzy Syst., San Francisco (1993) pp. 134-139
20.4 A. Saffiotti, K. Konolige, E.H. Ruspini: A multi-valued-logic approach to integrating planning and control, Artif. Intell. 76(1), 481-526 (1995)
20.5 Microsoft, Inc: http://academic.research.microsoft.com/
20.6 Thomson Reuters: http://thomsonreuters.com/thomson-reuters-web-of-science/
20.7 T.S. Li, S.-J. Chang, Y.-X. Chen: Implementation of human-like driving skills by autonomous fuzzy behavior control on an FPGA-based car-like mobile robot, IEEE Trans. Ind. Electron. 50(5), 867-880 (2003)
20.8 M. Mucientes, R. Iglesias, C.V. Regueiro, A. Bugarin, S. Barro: A fuzzy temporal rule-based velocity controller for mobile robotics, Fuzzy Sets Syst. 134(1), 83-99 (2003)
20.9 V.M. Peri, D. Simon: Fuzzy logic control for an autonomous robot, Proc. Annu. Meet. N. Am. Fuzzy Inf. Process. Soc. (2005) pp. 337-342
20.10 M. Mucientes, D.L. Moreno, A. Bugarín, S. Barro: Evolutionary learning of a fuzzy controller for wall-following behavior in mobile robotics, Soft Comput. 10(10), 881-889 (2006)
20.11 M. Mucientes, D.L. Moreno, A. Bugarín, S. Barro: Design of a fuzzy controller in mobile robotics using genetic algorithms, Appl. Soft Comput. 7(2), 540-546 (2007)
20.12 M. Mucientes, R. Alcala, J. Alcala-Fdez, J. Casillas: Learning weighted linguistic rules to control an autonomous robot, Int. J. Intell. Syst. 24, 226-251 (2009)
20.13 C. Wagner, H. Hagras: A genetic algorithm based architecture for evolving type-2 fuzzy logic controllers for real world autonomous mobile robots, Proc. IEEE Int. Conf. Fuzzy Syst. (2007) pp. 1-6
20.14 C.-F. Juang, C.-H. Hsu: Reinforcement ant optimized fuzzy controller for mobile-robot wall-following control, IEEE Trans. Ind. Electron. 56(10), 3931-3940 (2009)
20.15 O. Linda, M. Manic: Comparative analysis of type-1 and type-2 fuzzy control in context of learning behaviors for mobile robotics, Proc. 36th Annu. Conf. IEEE Ind. Electron. Soc. (2010) pp. 1092-1098
20.16 C. Wagner, H. Hagras: Toward general type-2 fuzzy logic systems based on ZSlices, IEEE Trans. Fuzzy Syst. 18(4), 637-660 (2010)
20.17 A. Zhu, S.X. Yang: A fuzzy logic approach to reactive navigation of behavior-based mobile robots, Proc. IEEE Int. Conf. Robot. Autom. (ICRA) '04, Vol. 5 (2004) pp. 5045-5050
20.18 D.R. Parhi: Navigation of mobile robots using a fuzzy logic controller, J. Int. Robot. Syst. 42(3), 253-273 (2005)
20.19 S.K. Pradhan, D.R. Parhi, A.K. Panda: Fuzzy logic techniques for navigation of several mobile robots, Appl. Soft Comput. 9(1), 290-304 (2009)
20.20 I. Baturone, F.J. Moreno-Velo, V. Blanco, J. Ferruz: Design of embedded DSP-based fuzzy controllers for autonomous mobile robots, IEEE Trans. Ind. Electron. 55(2), 928-936 (2008)
20.21 F. Cupertino, V. Giordano, D. Naso, L. Delfine: Fuzzy control of a mobile robot, Robot. Autom. Mag. IEEE 13(4), 74-81 (2006)
20.22 M. Wang, J.N.-K. Liu: Fuzzy logic based robot path planning in unknown environment, Proc. Int. Conf. Mach. Learn. Cybern., Vol. 2 (2005) pp. 813-818
20.23 M. Wang, J.N.K. Liu: Fuzzy logic-based real-time robot navigation in unknown environment with dead ends, Robot. Auton. Syst. 56(7), 625-643 (2008)
20.24 O.R.E. Motlagh, T.S. Hong, N. Ismail: Development of a new minimum avoidance system for a behavior-based mobile robot, Fuzzy Sets Syst. 160(13), 1929-1946 (2009)
20.25 R. Huq, G.K.I. Mann, R.G. Gosine: Behavior-modulation technique in mobile robotics using fuzzy discrete event system, IEEE Trans. Robotics 22(5), 903-916 (2006)
20.26 H.A. Hagras: A hierarchical type-2 fuzzy logic control architecture for autonomous mobile robots, IEEE Trans. Fuzzy Syst. 12(4), 524-539 (2004)
20.27 P. Ritthipravat, T. Maneewarn, D. Laowattana, J. Wyatt: A modified approach to fuzzy Q learning for mobile robots, Proc. IEEE Int. Conf. Syst., Man Cybern., Vol. 3 (2004) pp. 2350-2356
20.28 L.-H. Chen, C.-H. Chiang: New approach to intelligent control systems with self-exploring process, IEEE Trans. Syst. Man Cybern. B 33(1), 56-66 (2003)
20.29 N.B. Hui, V. Mahendar, D.K. Pratihar: Time-optimal, collision-free navigation of a car-like mobile robot using neuro-fuzzy approaches, Fuzzy Sets Syst. 157(16), 2171-2204 (2006)
20.30 L. Doitsidis, K.P. Valavanis, N.C. Tsourveloudis, M. Kontitsis: A framework for fuzzy logic based UAV navigation and control, Proc. IEEE Int. Conf. Robot. Autom. (ICRA) '04, Vol. 4 (2004) pp. 4041-4046
20.31 B. Kadmiry, D. Driankov: A fuzzy gain-scheduler for the attitude control of an unmanned helicopter, IEEE Trans. Fuzzy Syst. 12(4), 502-515 (2004)



20.32 H. Hu, P.-Y. Woo: Fuzzy supervisory sliding-mode and neural-network control for robotic manipulators, IEEE Trans. Ind. Electron. 53(3), 929-940 (2006)
20.33 E.A. Merchan-Cruz, A.S. Morris: Fuzzy-GA-based trajectory planner for robot manipulators sharing a common workspace, IEEE Trans. Robot. 22(4), 613-624 (2006)
20.34 J. Yu, M. Tan, S. Wang, E. Chen: Development of a biomimetic robotic fish and its control algorithm, IEEE Trans. Syst. Man Cybern. B 34(4), 1798-1810 (2004)
20.35 K.-B. Sim, K.-S. Byun, F. Harashima: Internet-based teleoperation of an intelligent robot with optimal two-layer fuzzy controller, IEEE Trans. Ind. Electron. 53(4), 1362-1372 (2006)
20.36 R. Martinez, O. Castillo, L.T. Aguilar: Optimization of interval type-2 fuzzy logic controllers for a perturbed autonomous wheeled mobile robot using genetic algorithms, Inf. Sci. 179(13), 2158-2174 (2009)
20.37 T. Das, I.N. Kar: Design and implementation of an adaptive fuzzy logic-based controller for wheeled mobile robots, IEEE Trans. Control Syst. Technol. 14(3), 501-510 (2006)
20.38 E. Maalouf, M. Saad, H. Saliah: A higher level path tracking controller for a four-wheel differentially steered mobile robot, Robot. Auton. Syst. 54(1), 23-33 (2006)
20.39 G. Antonelli, S. Chiaverini, G. Fusco: A fuzzy-logic-based approach for mobile robot path tracking, IEEE Trans. Fuzzy Syst. 15(2), 211-221 (2007)
20.40 R.-J. Wai, C.-M. Liu: Design of dynamic petri recurrent fuzzy neural network and its application to path-tracking control of nonholonomic mobile robot, IEEE Trans. Ind. Electron. 56(7), 2667-2683 (2009)
20.41 Z.-G. Hou, A.-M. Zou, L. Cheng, M. Tan: Adaptive control of an electrically driven nonholonomic mobile robot via backstepping and fuzzy approach, IEEE Trans. Control Syst. Technol. 17(4), 803-815 (2009)
20.42 C.-Y. Chen, T.-H.S. Li, Y.-C. Yeh: EP-based kinematic control and adaptive fuzzy sliding-mode dynamic control for wheeled mobile robots, Inf. Sci. 179(1), 180-195 (2009)
20.43 Z. Song, J. Yi, D. Zhao, X. Li: A computed torque controller for uncertain robotic manipulator systems: Fuzzy approach, Fuzzy Sets Syst. 154(2), 208-226 (2005)
20.44 J.P. Hwang, E. Kim: Robust tracking control of an electrically driven robot: Adaptive fuzzy logic approach, IEEE Trans. Fuzzy Syst. 14(2), 232-247 (2006)
20.45 S. Purwar, I.N. Kar, A.N. Jha: Adaptive control of robot manipulators using fuzzy logic systems under actuator constraints, Fuzzy Sets Syst. 152(3), 651-664 (2005)
20.46 C.-S. Chiu: Mixed feedforward/feedback based adaptive fuzzy control for a class of MIMO nonlinear systems, IEEE Trans. Fuzzy Syst. 14(6), 716-727 (2006)
20.47 N. Goléa, A. Goléa, K. Barra, T. Bouktir: Observer-based adaptive control of robot manipulators: Fuzzy systems approach, Appl. Soft Comput. 8(1), 778-787 (2008)
20.48 Y. Guo, P.-Y. Woo: An adaptive fuzzy sliding mode controller for robotic manipulators, IEEE Trans. Syst. Man Cybern. A 33(2), 149-159 (2003)
20.49 Y.L. Sun, M.J. Er: Hybrid fuzzy control of robotics systems, IEEE Trans. Fuzzy Syst. 12(6), 755-765 (2004)
20.50 V. Santibañez, R. Kelly, M.A. Llama: A novel global asymptotic stable set-point fuzzy controller with bounded torques for robot manipulators, IEEE Trans. Fuzzy Syst. 13(3), 362-372 (2005)
20.51 E. Kim: Output feedback tracking control of robot manipulators with model uncertainty via adaptive fuzzy logic, IEEE Trans. Fuzzy Syst. 12(3), 368-378 (2004)
20.52 X. Jiang, Q.-L. Han: On designing fuzzy controllers for a class of nonlinear networked control systems, IEEE Trans. Fuzzy Syst. 16(4), 1050-1060 (2008)
20.53 C.-L. Hwang, L.-J. Chang, Y.-S. Yu: Network-based fuzzy decentralized sliding-mode control for car-like mobile robots, IEEE Trans. Ind. Electron. 54(1), 574-585 (2007)
20.54 C.-K. Lin: Nonsingular terminal sliding mode control of robot manipulators using fuzzy wavelet networks, IEEE Trans. Fuzzy Syst. 14(6), 849-859 (2006)
20.55 W.W. Melek, A.A. Goldenberg: Neurofuzzy control of modular and reconfigurable robots, IEEE/ASME Trans. Mechatron. 8(3), 381-389 (2003)
20.56 Y.-C. Chang: Intelligent robust control for uncertain nonlinear time-varying systems and its application to robotic systems, IEEE Trans. Syst. Man Cybern. B 35(6), 1108-1119 (2005)
20.57 F. Sun, L. Li, H.-X. Li, H. Liu: Neuro-fuzzy dynamic-inversion-based adaptive control for robotic manipulators - discrete time case, IEEE Trans. Ind. Electron. 54(3), 1342-1351 (2007)
20.58 M.O. Efe: Fractional fuzzy adaptive sliding-mode control of a 2-DOF direct-drive robot arm, IEEE Trans. Syst. Man Cybern. B 38(6), 1561-1570 (2008)
20.59 C.-S. Chen: Dynamic structure neural-fuzzy networks for robust adaptive control of robot manipulators, IEEE Trans. Ind. Electron. 55(9), 3402-3414 (2008)
20.60 R.-J. Wai, P.-C. Chen: Robust neural-fuzzy-network control for robot manipulator including actuator dynamics, IEEE Trans. Ind. Electron. 53(4), 1328-1349 (2006)
20.61 R.-J. Wai, Z.-W. Yang: Adaptive fuzzy neural network control design via a T-S fuzzy model for a robot manipulator including actuator dynamics, IEEE Trans. Syst. Man Cybern. B 38(5), 1326-1346 (2008)
20.62 A. Chatterjee, R. Chatterjee, F. Matsuno, T. Endo: Augmented stable fuzzy control for flexible robotic arm using LMI approach and neuro-fuzzy state space modeling, IEEE Trans. Ind. Electron. 55(3), 1256-1270 (2008)
20.63 F.J. Marin, J. Casillas, M. Mucientes, A.A. Transeth, S.A. Fjerdingen, I. Schjølberg: Learning intelligent controllers for path-following skills on snake-like robots, LNAI. Intell. Robot. Appl. Proc. 4th Int. Conf., II ICIRA (2011), 525-535 (2011)
20.64 Z. Bingül, O. Karahan: A fuzzy logic controller tuned with PSO for 2 DOF robot trajectory control, Expert Syst. Appl. 38(1), 1017-1031 (2011)
20.65 O. Castillo, R. Martinez-Marroquin, P. Melin, F. Valdez, J. Soria: Comparative study of bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for an autonomous mobile robot, Inf. Sci. 192, 19-38 (2012)
20.66 M. Biglarbegian, W.W. Melek, J.M. Mendel: Design of novel interval type-2 fuzzy controllers for modular and reconfigurable robots: Theory and experiments, IEEE Trans. Ind. Electron. 58(4), 1371-1384 (2011)
20.67 O. Linda, M. Manic: Uncertainty-robust design of interval type-2 fuzzy logic controller for delta parallel robot, IEEE Trans. Ind. Inf. 7(4), 661-670 (2011)
20.68 T.S. Li, S.-J. Chang, W. Tong: Fuzzy target tracking control of autonomous mobile robots by using infrared sensors, IEEE Trans. Fuzzy Syst. 12(4), 491-501 (2004)
20.69 M. Mucientes, A. Bugarin: People detection with quantified fuzzy temporal rules, Proc. IEEE Int. Conf. Fuzzy Syst. (2007) pp. 149-1154
20.70 M. Mucientes, A. Bugarin: People detection through quantified fuzzy temporal rules, Pattern Recognit. 43(4), 1441-1453 (2010)
20.71 M. Mucientes, J. Alcala-Fdez, R. Alcala, J. Casillas: A case study for learning behaviors in mobile robotics by evolutionary fuzzy systems, Expert Syst. Appl. 37, 1471-1493 (2010)
20.72 P. Carinena, C.V. Regueiro, A. Otero, A.J. Bugarin, S. Barro: Landmark detection in mobile robotics using fuzzy temporal rules, IEEE Trans. Fuzzy Syst. 12(4), 423-435 (2004)
20.73 R.M. Muñoz-Salinas, E. Aguirre, M. García-Silvente, A. González: A fuzzy system for visual detection of interest in human-robot interaction, Proc. 2nd Int. Conf. Mach. Intell. (ACIDCA-ICMI2005) (2005) pp. 574-581
20.74 J. Figueroa, J. Posada, J. Soriano, M. Melgarejo, S. Rojas: A type-2 fuzzy controller for tracking mobile objects in the context of robotic soccer games, Proc. 14th IEEE Int. Conf. Fuzzy Syst. FUZZ '05 (2005) pp. 359-364
20.75 X. Yang, M. Moallem, R.V. Patel: A layered goal-oriented fuzzy motion planning strategy for mo-


20.76 


20.77 


20.78 


20.79 


20.80 


20.81 


20.82 


20.83 


20.84 


20.85 


20.86 


20.87 


20.88 


20.89 


bile robot navigation, IEEE Trans. Syst. Man Cy- 
bern. B 35(6), 1214-1224 (2005) 

B. Innocenti, B. Lopez, J. Salvi: A multi-agent 
architecture with cooperative fuzzy control for 
a mobile robot, Robot. Auton. Syst. 55(12), 881- 
891 (2007) 

M.A. Garcia, 0. Montiel, 0. Castillo, R. Sepulveda, 
P. Melin: Path planning for autonomous mobile 
robot navigation with ant colony optimization 
and fuzzy cost function evaluation, Appl. Soft 
Comput. 9(3), 1102-1110 (2009) 

J.-H. Kim, Y.-H. Kim, S.-H. Choi, |.-W. Park: Evo- 
lutionary multi-objective optimization in robot 
soccer system for education, IEEE Comput. Intell. 
Mag. 4(1), 31-41 (2009) 

P. Vadakkepat, 0.C. Miin, X. Peng, T.-H. Lee: Fuzzy 
behavior-based control of mobile robots, IEEE 
Trans. Fuzzy Syst. 12(4), 559-565 (2004) 

P. Vadakkepat, X. Peng, B.K. Quek, T.H. Lee: Evolu- 
tion of fuzzy behaviors for multi-robotic system, 
Robot. Auton. Syst. 55(2), 146-161 (2007) 

A. Chatterjee, F. Matsuno: A neuro-fuzzy as- 
sisted extended Kalman filter-based approach for 
simultaneous localization and mapping (SLAM) 
problems, IEEE Trans. Fuzzy Syst. 15(5), 984-997 
(2007) 

H.-H. Lin, C.-C. Tsai, J.-C. Hsu: Ultrasonic localiza- 
tion and pose tracking of an autonomous mobile 
robot via fuzzy adaptive extended information 
filtering, IEEE Trans. Instrum. Meas. 57(9), 2024- 
2034 (2008) 

M. Begum, G.K.I. Mann, R.G. Gosine: Integrated 
fuzzy logic and genetic algorithmic approach for 
simultaneous localization and mapping of mo- 
bile robots, Appl. Soft Comput. 8(1), 150-165 
(2008) 

X. Cui, T. Hardin, R.K. Ragade, A.S. Elmaghraby: 
A swarm-based fuzzy logic control mobile sensor 
network for hazardous contaminants localiza- 
tion, Proc. IEEE Int. Conf. Mob. Ad-hoc Sens. Syst. 
(2004) pp. 194-203 

D.H.V. Sincák: Multi-robot control system for 
pursuit-evasion problem, J. Electr. Eng. 60(3), 
143-148 (2009) 

|. Harmati, K. Skrzypczyk: Robot team coordina- 
tion for target tracking using fuzzy logic controller 
in game theoretic framework, Robot. Auton. Syst. 
57(1), 75-86 (2009) 

C. Zhou, Q. Meng: Dynamic balance of a biped 
robot using fuzzy reinforcement learning agents, 
Fuzzy Sets Syst. 134(1), 169-187 (2003) 

R.K. Jha, B. Singh, D.K. Pratihar: On-line sta- 
ble gait generation of a two-legged robot using 
a genetic—fuzzy system, Robot. Auton. Syst. 53(1), 
15-35 (2005) 

J.G. Nichol, S.P.N. Singh, K.J. Waldron, L.R. Pal- 
mer, D.E. Orin: System design of a quadruped- 
al galloping machine, Int. J. Robot. Res. 23(10/11), 
1013-1027 (2004) 


327 


oz | a Hed 


328 PartB 


Fuzzy Logic 


oz | d Hed 


20.90 


20.91 


20.92 


20.93 


20.94 


20.95 


20.96 


P.M. Aubin, E. Whittaker, W.R. Ledoux: A robotic 
cadaveric gait simulator with fuzzy logic vertical 
ground reaction force control, IEEE Trans. Robot. 
28(1), 246-255 (2012) 

Z. Liu, Y. Zhang, Y. Wang: A type-2 fuzzy switching 
control system for biped robots, IEEE Trans. Syst. 
Man Cybern. C 37(6), 1202-1213 (2007) 

R.K. Barai, K. Nonami: Optimal two-degree-of- 
freedom fuzzy control for locomotion control of 
a hydraulically actuated hexapod robot, Inf. Sci. 
177(8), 1892-1915 (2007) 

K. Kiguchi, T. Tanaka, K. Watanabe, T. Fukuda: 
Exoskeleton for human upper-limb motion sup- 
port, Proc. IEEE Int. Conf. Robot. Autom. (ICRA) '03, 
Vol. 2 (2003) pp. 2206-2211 

K. Kiguchi, T. Tanaka, T. Fukuda: Neuro-fuzzy 
control of a robotic exoskeleton with EMG signals, 
IEEE Trans. Fuzzy Syst. 12(4), 481-490 (2004) 
M.-S. Ju, C.-C.K. Lin, D.-H. Lin, I.-S. Hwang, 
S.-M. Chen: A rehabilitation robot with force- 
position hybrid fuzzy controller: Hybrid fuzzy 
control of rehabilitation robot, IEEE Trans. Neural 
Syst. Rehabil. Eng. 13(3), 349-358 (2005) 

M.-K. Chang, T.-H. Yuan: Experimental imple- 
mentations of adaptive self-organizing fuzzy 
sliding mode control to a 3-DOF rehabilitation 
robot, Int. J. Innov. Comput. Inf. Control 5(10), 
3391-3404 (2009) 


20.97 


20.98 


20.99 


20.100 


20.101 


20.102 


20.103 


20.104 


M.-K. Chang: An adaptive self-organizing fuzzy 
sliding mode controller for a 2-DOF rehabilitation 
robot actuated by pneumatic muscle actuators, 
Control Eng. Pract. 18(1), 13-22 (2010) 

H. Mobahi, S. Ansari: Fuzzy perception, emo- 
tion and expression for interactive robots, Proc. 
IEEE Int. Conf. Syst., Man Cybern., Vol. 4 (2003) 
pp. 3918-3923 

N. Esau, E. Wetzel, L. Kleinjohann, B. Kleinjo- 
hann: Real-time facial expression recognition us- 
ing a fuzzy emotion model, Proc. IEEE Int. Conf. 
Fuzzy Syst., FUZZ-IEEE (2007) pp. 1-6 

D. Kulic, E. Croft: Anxiety detection during 
human-robot interaction, Proc. IEEE/RSJ Int. Conf. 
Intell. Robot. Syst. (IROS) (2005) pp. 616-621 

D. Kulic, E.A. Croft: Affective state estimation 
for human-robot interaction, IEEE Trans. Robot. 
23(5), 991-1000 (2007) 

Z. Liu, H.-X. Li: A probabilistic fuzzy logic system 
for modeling and control, IEEE Trans. Fuzzy Syst. 
13(6), 848-859 (2005) 

H. Liu: A fuzzy qualitative framework for con- 
necting robot qualitative and quantitative repre- 
sentations, IEEE Trans. Fuzzy Syst. 16(6), 1522-1530 
(2008) 

H. Liu, D.J. Brown, G.M. Coghill: Fuzzy qualita- 
tive robot kinematics, IEEE Trans. Fuzzy Syst. 16(3), 
808-822 (2008) 


Part C Rough Sets

Ed. by Roman Słowiński, Yiyu Yao

21 Foundations of Rough Sets
Andrzej Skowron, Warsaw, Poland
Andrzej Jankowski, Warsaw, Poland
Roman W. Swiniarski, Heidelberg, Germany

22 Rough Set Methodology for Decision Aiding
Roman Słowiński, Poznań, Poland
Salvatore Greco, Catania, Italy
Benedetto Matarazzo, Catania, Italy

23 Rule Induction from Rough Approximations
Jerzy W. Grzymala-Busse, Lawrence, USA

24 Probabilistic Rough Sets
Yiyu Yao, Regina, Saskatchewan, Canada
Salvatore Greco, Catania, Italy
Roman Słowiński, Poznań, Poland

25 Generalized Rough Sets
JingTao Yao, Regina, Saskatchewan, Canada
Davide Ciucci, Milano, Italy
Yan Zhang, Regina, Saskatchewan, Canada

26 Fuzzy-Rough Hybridization
Masahiro Inuiguchi, Toyonaka, Osaka, Japan
Wei-Zhi Wu, Zhoushan, Zhejiang, China
Chris Cornelis, Ghent, Belgium
Nele Verbiest, Ghent, Belgium


21. Foundations of Rough Sets 




Andrzej Skowron, Andrzej Jankowski, Roman W. Swiniarski (deceased) 


The rough set (RS) approach was proposed by Pawlak as a tool to deal with imperfect knowledge. Over the years the approach has attracted the attention of many researchers and practitioners all over the world, who have contributed essentially to its development and applications. This chapter discusses the RS foundations from rudiments to challenges.


21.1 Rough Sets: Comments on Development
21.2 Vague Concepts
21.3 Rough Set Philosophy
21.4 Indiscernibility and Approximation
21.5 Decision Systems and Decision Rules
21.6 Dependencies
21.7 Reduction of Attributes
21.8 Rough Membership
21.9 Discernibility and Boolean Reasoning
21.10 Rough Sets and Induction
21.11 Rough Set-Based Generalizations
21.12 Rough Sets and Logic
21.13 Conclusions
References


21.1 Rough Sets: Comments on Development 


The rough set (RS) approach was proposed by Zdzisław Pawlak in 1982 [21.1,2] as a tool for dealing with imperfect knowledge, in particular with vague concepts. Many applications of methods based on rough set theory alone or in combination with other approaches have been developed. This chapter discusses the RS foundations from rudiments to challenges.

In the development of rough set theory and appli- 
cations, one can distinguish three main stages. While 
the first period was based on the assumption that ob- 
jects are perceived by means of partial information 
represented by attributes, in the second period it was 
assumed that information about the approximated con- 
cepts is also partial. Approximation spaces and search- 
ing strategies for relevant approximation spaces were 
recognized as the basic tools for rough sets. Impor- 
tant achievements both in theory and applications were 
obtained. Nowadays, a new period for rough sets is 
emerging, which is also briefly characterized in this 
chapter. 

The rough set approach seems to be of fundamental importance in artificial intelligence (AI) and cognitive sciences, especially in machine learning, data mining, knowledge discovery from databases, pattern recognition, decision support systems, expert systems, intelligent systems, multiagent systems, adaptive systems, autonomous systems, inductive reasoning, commonsense reasoning, adaptive judgment, and conflict analysis.

Rough sets have established relationships with many other approaches such as fuzzy set theory, granular computing (GC), evidence theory, formal concept analysis, (approximate) Boolean reasoning, multicriteria decision analysis, statistical methods, decision theory, and matroids. Despite the overlap with many other theories, rough set theory may be considered an independent discipline in its own right. There are reports on many hybrid methods obtained by combining rough sets with other approaches such as soft computing (fuzzy sets, neural networks, genetic algorithms), statistics, natural computing, mereology, principal component analysis, singular value decomposition, or support vector machines.

The main advantage of rough set theory in data analysis is that it does not necessarily need any preliminary or additional information about data, such as probability distributions in statistics, basic probability assignments in evidence theory, or a grade of membership or the value of possibility in fuzzy set theory.

One can observe the following advantages of the rough set approach:


i) Introduction of efficient algorithms for finding hidden patterns in data.

ii) Determination of optimal sets of data (data reduction); evaluation of the significance of data.

iii) Generation of sets of decision rules from data.

iv) Easy-to-understand formulation.

v) Straightforward interpretation of results obtained.

vi) Suitability of many of its algorithms for parallel processing.


Due to space limitations, many important research 
topics in rough set theory such as various logics related 
to rough sets and many advanced algebraic properties 
of rough sets are only mentioned briefly in this chapter. 

For the same reason, we restrict the references on rough sets to the basic papers by Zdzisław Pawlak (such as [21.1,2]), some survey papers [21.3-5], and some books including long lists of references to papers on rough sets. The basic ideas of rough set theory and its extensions, as well as many interesting applications, can be found in a number of books, issues of Transactions on Rough Sets, special issues of other journals, numerous proceedings of international conferences, and tutorials [21.3,6,7]. The reader is referred to the cited books and papers, the references therein, and the web pages [21.8,9].

The chapter is structured as follows. In Sect. 21.2 we discuss some basic issues related to vagueness and vague concepts. The rough set philosophy is outlined in Sect. 21.3. The basic concepts for rough sets, such as indiscernibility and approximation, are presented in Sect. 21.4. Decision systems and rules are covered in Sect. 21.5. The basic information about dependencies is included in Sect. 21.6. Attribute reduction, one of the basic problems of rough sets, is discussed in Sect. 21.7. The rough membership function as a tool for measuring degrees of inclusion of sets is presented in Sect. 21.8. The role of discernibility and Boolean reasoning for solving problems related to rough sets is briefly explained in Sect. 21.9. In Sect. 21.10 a short discussion on rough sets and induction is included. Several generalizations of the approach proposed by Pawlak are discussed in Sect. 21.11. In this section some emerging research directions related to rough sets are also outlined. In Sect. 21.12 some comments about logics based on rough sets are included. The role of adaptive judgment is emphasized.


21.2 Vague Concepts


Mathematics requires that all mathematical notions (including sets) must be exact, otherwise precise reasoning would be impossible. However, philosophers [21.10], and recently computer scientists as well as other researchers, have become interested in vague (imprecise) concepts. Moreover, in the twentieth century one can observe a drift of paradigms in modern science from dealing with precise concepts to dealing with vague concepts, especially in the case of complex systems (e.g., in economy, biology, psychology, sociology, and quantum mechanics).

In classical set theory, a set is uniquely determined by its elements. In other words, every element must be uniquely classified as belonging to the set or not. That is to say, the notion of a set is a crisp (precise) one. For example, the set of odd numbers is crisp because every integer is either odd or even.

In contrast to odd numbers, the notion of a beautiful painting is vague, because we are unable to classify uniquely all paintings into two classes: beautiful and not beautiful. With some paintings it cannot be decided whether they are beautiful or not and thus they remain in the doubtful area. Thus, beauty is not a precise but a vague concept.

Almost all concepts that we use in natural lan- 
guage are vague. Therefore, common sense reasoning 
based on natural language must be based on vague 
concepts and not on classical logic. An interesting dis- 
cussion of this issue can be found in [21.11]. The idea 
of vagueness can be traced back to the ancient Greek 
philosopher Eubulides of Megara (ca. 400 BC) who 
first formulated the so-called sorites (heap) and falakros 
(bald man) paradoxes [21.10]. There is a huge literature 
on issues related to vagueness and vague concepts in 
philosophy [21.10]. 

Vagueness is often associated with the boundary region approach (i.e., existence of objects which cannot be uniquely classified relative to a set or its complement), which was first formulated in 1893 by the father of modern logic, the German logician Gottlob Frege (1848-1925) [21.12]. According to Frege, the concept must have a sharp boundary. To the concept without a sharp boundary there would correspond an area that would not have any sharp boundary line all around. This means that mathematics must use crisp, not vague concepts, otherwise it would be impossible to reason precisely.

One should also note that vagueness also relates to insufficient specificity, as the result of a lack of feasible searching methods for sets of features adequately describing concepts. A discussion on vague (imprecise) concepts in philosophy includes the following characteristic features [21.10]: (i) the presence of borderline cases, (ii) boundary regions of vague concepts are not crisp, (iii) vague concepts are susceptible to sorites paradoxes. In the sequel we discuss the first two issues in the RS framework. The reader can find a discussion on the application of the RS approach to the third item in [21.11].


21.3 Rough Set Philosophy


Rough set philosophy is founded on the assumption that with every object of the universe of discourse we associate some information (data, knowledge). For example, if objects are patients suffering from a certain disease, symptoms of the disease form information about the patients. Objects characterized by the same information are indiscernible (similar) in view of the available information about them.

The indiscernibility relation generated in this way is the mathematical basis of rough set theory. This understanding of indiscernibility is related to the idea of Gottfried Wilhelm Leibniz that objects are indiscernible if and only if all available functionals take identical values on them (Leibniz's law of indiscernibility: the identity of indiscernibles) [21.13]. However, in the rough set approach indiscernibility is defined relative to a given set of functionals (attributes).

Any set of all indiscernible (similar) objects is called an elementary set and forms a basic granule (atom) of knowledge about the universe. Any union of some elementary sets is referred to as a crisp (precise) set. If a set is not crisp, then it is called rough (imprecise, vague). Consequently, each rough set has borderline cases (boundary-line), i.e., objects which cannot be classified with certainty as members of either the set or its complement. Obviously, crisp sets have no borderline elements at all. This means that borderline cases cannot be properly classified by employing available knowledge.

Thus, the assumption that objects can be seen only through the information available about them leads to the view that knowledge has granular structure. Due to the granularity of knowledge, some objects of interest cannot be discerned and appear as the same (or similar). As a consequence, vague concepts, in contrast to precise concepts, cannot be characterized in terms of information about their elements. Therefore, in the proposed approach, we assume that any vague concept is replaced by a pair of precise concepts, called the lower and the upper approximation of the vague concept. The lower approximation consists of all objects which definitely belong to the concept and the upper approximation contains all objects which possibly belong to the concept. The difference between the upper and the lower approximation constitutes the boundary region of the vague concept. Approximation operations are the basic operations in rough set theory. Properties of the boundary region (expressed, e.g., by the rough membership function) are important in the rough set methods.

Hence, rough set theory expresses vagueness not by 
means of membership, but by employing a boundary re- 
gion of a set. If the boundary region of a set is empty it 
means that the set is crisp, otherwise the set is rough 
(inexact). A nonempty boundary region of a set means 
that our knowledge about the set is not sufficient to de- 
fine the set precisely. 

Rough set theory is not an alternative to classical set theory but is embedded in it. Rough set theory can be viewed as a specific implementation of Frege's idea of vagueness, i.e., imprecision in this approach is expressed by a boundary region of a set.


21.4 Indiscernibility and Approximation 


The starting point of rough set theory is the indiscernibility relation, which is generated by information about objects of interest (Sect. 21.1). The indiscernibility relation expresses the fact that due to a lack of information (or knowledge) we are unable to discern some objects by employing available information (or knowledge). This means that, in general, we are unable to deal with each particular object but we have to consider granules (clusters) of indiscernible objects as a fundamental basis for our theory.

From a practical point of view, it is better to define 
basic concepts of this theory in terms of data. There- 
fore, we will start our considerations from a data set 
called an information system. An information system 
can be represented by a data table containing rows la- 
beled by objects of interest and columns labeled by 
attributes and entries of the table are attribute values. 
For example, a data table can describe a set of pa- 
tients in a hospital. The patients can be characterized 
by some attributes, like age, sex, blood pressure, body 
temperature, etc. With every attribute a set of its val- 
ues is associated, e.g., values of the attribute age can 
be young, middle, and old. Attribute values can also be 
numerical. In data analysis the basic problem that we 
are interested in is to find patterns in data, i.e., to find 
a relationship between some set of attributes, e.g., we 
might be interested whether blood pressure depends on 
age and sex. 

More formally, suppose we are given a pair $\mathcal{A} = (U, A)$ of nonempty, finite sets $U$ and $A$, where $U$ is the universe of objects and $A$ is a set consisting of attributes, i.e., functions $a: U \to V_a$, where $V_a$ is the set of values of attribute $a$, called the domain of $a$. The pair $\mathcal{A} = (U, A)$ is called an information system. Any information system can be represented by a data table with rows labeled by objects and columns labeled by attributes. Any pair $(x, a)$, where $x \in U$ and $a \in A$, defines the table entry consisting of the value $a(x)$. Note that in statistics or machine learning such a data table is called a sample [21.14].

Any subset $B$ of $A$ determines a binary relation $\mathrm{IND}(B)$ on $U$, called an indiscernibility relation, defined by

$$x\,\mathrm{IND}(B)\,y \quad \text{if and only if} \quad a(x) = a(y) \ \text{for every } a \in B, \tag{21.1}$$

where $a(x)$ denotes the value of attribute $a$ for the object $x$.

Obviously, $\mathrm{IND}(B)$ is an equivalence relation. The family of all equivalence classes of $\mathrm{IND}(B)$, i.e., the partition determined by $B$, will be denoted by $U/\mathrm{IND}(B)$, or simply $U/B$; the equivalence class of $\mathrm{IND}(B)$, i.e., the block of the partition $U/B$, containing $x$ will be denoted by $B(x)$ (other notation used: $[x]_B$ or $[x]_{\mathrm{IND}(B)}$). Thus in view of the data we are unable, in general, to observe individual objects but we are forced to reason only about the accessible granules of knowledge.

If $(x, y) \in \mathrm{IND}(B)$ we will say that $x$ and $y$ are B-indiscernible. Equivalence classes of the relation $\mathrm{IND}(B)$ (or blocks of the partition $U/B$) are referred to as B-elementary sets or B-elementary granules. In the rough set approach the elementary sets are the basic building blocks (concepts) of our knowledge about reality. The unions of B-elementary sets are called B-definable sets. Let us note that in applications we consider only some subsets of the family of definable sets, e.g., defined by conjunction of descriptors only. This is due to the computational complexity of the searching problem for relevant definable sets in the whole family of definable sets.

For $B \subseteq A$, we denote by $\mathrm{Inf}_B(x)$ the B-signature of $x \in U$, i.e., the set $\{(a, a(x)) : a \in B\}$. Let $\mathrm{INF}(B) = \{\mathrm{Inf}_B(x) : x \in U\}$. Then for any objects $x, y \in U$ the following equivalence holds: $x\,\mathrm{IND}(B)\,y$ if and only if $\mathrm{Inf}_B(x) = \mathrm{Inf}_B(y)$.
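As an illustration only, the partition $U/B$ can be computed by grouping objects with equal B-signatures. The following Python sketch assumes a hypothetical toy patient table; the names `TABLE`, `signature`, `partition`, and `indiscernible` are chosen for this example and do not come from the chapter.

```python
from collections import defaultdict

# Hypothetical toy information system: rows are objects, columns are attributes.
TABLE = {
    "p1": {"age": "young", "sex": "F", "bp": "normal"},
    "p2": {"age": "young", "sex": "F", "bp": "high"},
    "p3": {"age": "old", "sex": "M", "bp": "high"},
    "p4": {"age": "old", "sex": "M", "bp": "high"},
    "p5": {"age": "middle", "sex": "F", "bp": "normal"},
}


def signature(table, x, B):
    """Inf_B(x): the set of (attribute, value) pairs of x restricted to B."""
    return frozenset((a, table[x][a]) for a in B)


def partition(table, B):
    """U/B: group objects that share the same B-signature."""
    blocks = defaultdict(set)
    for x in table:
        blocks[signature(table, x, B)].add(x)
    return [frozenset(block) for block in blocks.values()]


def indiscernible(table, x, y, B):
    """x IND(B) y holds exactly when the B-signatures coincide."""
    return signature(table, x, B) == signature(table, y, B)


if __name__ == "__main__":
    for block in partition(TABLE, ["age", "sex"]):
        print(sorted(block))
```

With `B = ["age", "sex"]` the objects `p1` and `p2` fall into one block, and adding `"bp"` to `B` separates them, which shows how the granularity depends on the chosen attribute set.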

The indiscernibility relation is used to define the approximations of concepts. We define the following two operations on sets $X \subseteq U$

$$B_*(X) = \{x \in U : B(x) \subseteq X\}, \tag{21.2}$$

$$B^*(X) = \{x \in U : B(x) \cap X \neq \emptyset\}, \tag{21.3}$$

assigning to every subset $X$ of the universe $U$ two sets $B_*(X)$ and $B^*(X)$ called the B-lower and the B-upper approximation of $X$, respectively. The set

$$\mathrm{BN}_B(X) = B^*(X) - B_*(X) \tag{21.4}$$

will be referred to as the B-boundary region of $X$.

From the definition we obtain the following in- 
terpretation: (i) the lower approximation of a set X 
with respect to B is the set of all objects, which can 
for certain be classified to X using B (are certainly 
in X in view of B), (ii) the upper approximation of 
a set X with respect to B is the set of all objects which 
can possibly be classified to X using B (are possi- 
bly in X in view of B), (iii) the boundary region of 
a set X with respect to B is the set of all objects, 
which can be classified neither to X nor to not-X us- 
ing B. 

Due to the granularity of knowledge, rough sets 
cannot be characterized by using available knowledge. 
The definition of approximations is clearly depicted in 
Fig. 21.1. 
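The lower approximation, the upper approximation, and the boundary region can be sketched directly from their set-theoretic definitions. The Python code below is a minimal illustration under the assumption of a hypothetical patient table; `TABLE` and all function names are invented for this example.

```python
# Hypothetical toy information system (not taken from the chapter).
TABLE = {
    "p1": {"age": "young", "sex": "F", "bp": "normal"},
    "p2": {"age": "young", "sex": "F", "bp": "high"},
    "p3": {"age": "old", "sex": "M", "bp": "high"},
    "p4": {"age": "old", "sex": "M", "bp": "high"},
    "p5": {"age": "middle", "sex": "F", "bp": "normal"},
}


def block_of(table, x, B):
    """B(x): all objects that agree with x on every attribute in B."""
    return {y for y in table if all(table[y][a] == table[x][a] for a in B)}


def lower(table, B, X):
    """B-lower approximation: objects whose whole block lies inside X."""
    return {x for x in table if block_of(table, x, B) <= X}


def upper(table, B, X):
    """B-upper approximation: objects whose block intersects X."""
    return {x for x in table if block_of(table, x, B) & X}


def boundary(table, B, X):
    """B-boundary region: upper approximation minus lower approximation."""
    return upper(table, B, X) - lower(table, B, X)


if __name__ == "__main__":
    B = ["age", "sex"]
    # Concept to approximate: patients with high blood pressure.
    X = {x for x, row in TABLE.items() if row["bp"] == "high"}
    print(sorted(lower(TABLE, B, X)))     # ['p3', 'p4']
    print(sorted(upper(TABLE, B, X)))     # ['p1', 'p2', 'p3', 'p4']
    print(sorted(boundary(TABLE, B, X)))  # ['p1', 'p2']
```

Here the concept "high blood pressure" is rough with respect to age and sex, because `p1` and `p2` are indiscernible yet only `p2` belongs to the concept.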


The approximations have the following properties

$$
\begin{aligned}
& B_*(X) \subseteq X \subseteq B^*(X), \\
& B_*(\emptyset) = B^*(\emptyset) = \emptyset, \quad B_*(U) = B^*(U) = U, \\
& B^*(X \cup Y) = B^*(X) \cup B^*(Y), \\
& B_*(X \cap Y) = B_*(X) \cap B_*(Y), \\
& X \subseteq Y \ \text{implies} \ B_*(X) \subseteq B_*(Y) \ \text{and} \ B^*(X) \subseteq B^*(Y), \\
& B_*(X \cup Y) \supseteq B_*(X) \cup B_*(Y), \\
& B^*(X \cap Y) \subseteq B^*(X) \cap B^*(Y), \\
& B_*(-X) = -B^*(X), \\
& B^*(-X) = -B_*(X), \\
& B_*(B_*(X)) = B^*(B_*(X)) = B_*(X), \\
& B^*(B^*(X)) = B_*(B^*(X)) = B^*(X).
\end{aligned}
\tag{21.5}
$$


Let us note that the inclusions (for union and inter- 
section) in (21.5) cannot, in general, be substituted by 
the equalities. This has some important algorithmic and 
logical consequences. 
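A small counterexample makes the strictness of these two inclusions concrete. The sketch below works directly on an assumed partition with blocks {1, 2} and {3}; the partition and the sets `X` and `Y` are hypothetical choices for illustration.

```python
def lower(blocks, X):
    """Union of the blocks that are fully contained in X."""
    return {x for b in blocks if b <= X for x in b}


def upper(blocks, X):
    """Union of the blocks that intersect X."""
    return {x for b in blocks if b & X for x in b}


# Hypothetical partition U/B of the universe {1, 2, 3}.
blocks = [{1, 2}, {3}]
X, Y = {1}, {2}

# Lower approximation of a union can be strictly larger than the union of
# lower approximations: neither X nor Y contains the block {1, 2}, but X | Y does.
lower_of_union = lower(blocks, X | Y)
union_of_lowers = lower(blocks, X) | lower(blocks, Y)

# Upper approximation of an intersection can be strictly smaller than the
# intersection of upper approximations: X & Y is empty, yet both X and Y
# touch the block {1, 2}.
upper_of_intersection = upper(blocks, X & Y)
intersection_of_uppers = upper(blocks, X) & upper(blocks, Y)

if __name__ == "__main__":
    print(lower_of_union, union_of_lowers)
    print(upper_of_intersection, intersection_of_uppers)
```

In this instance the lower approximation of the union is {1, 2} while the union of the lower approximations is empty, and the upper approximation of the intersection is empty while the intersection of the upper approximations is {1, 2}.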

Now we are ready to give the definition of rough sets. If the boundary region of $X$ is the empty set, i.e., $\mathrm{BN}_B(X) = \emptyset$, then the set $X$ is crisp (exact) with respect to $B$; in the opposite case, i.e., if $\mathrm{BN}_B(X) \neq \emptyset$, the set $X$ is referred to as rough (inexact) with respect to $B$. Thus any rough set, in contrast to a crisp set, has a nonempty boundary region. This is the idea of vagueness proposed by Frege.


Fig. 21.1 A rough set (labels in the figure: granules of knowledge, the universe of objects, the set, the lower approximation, the upper approximation)


Let us observe that the definition of rough sets refers 
to data (knowledge), and is subjective, in contrast to the 
definition of classical sets, which is in some sense an 
objective one. 

A rough set can also be characterized numerically by the following coefficient

$$\alpha_B(X) = \frac{\mathrm{card}(B_*(X))}{\mathrm{card}(B^*(X))}, \tag{21.6}$$

called the accuracy of approximation, where $X \neq \emptyset$ and $\mathrm{card}(X)$ denotes the cardinality of $X$. Obviously $0 \le \alpha_B(X) \le 1$. If $\alpha_B(X) = 1$ then $X$ is crisp with respect to $B$ ($X$ is precise with respect to $B$), and otherwise, if $\alpha_B(X) < 1$ then $X$ is rough with respect to $B$ ($X$ is vague with respect to $B$). The accuracy of approximation can be used to measure the quality of approximation of decision classes on the universe $U$. One can use another measure of accuracy defined by $1 - \alpha_B(X)$ or by

$$1 - \frac{\mathrm{card}(\mathrm{BN}_B(X))}{\mathrm{card}(U)}.$$


Some other measures of approximation accuracy are 
also used, e.g., based on entropy or some more specific 
properties of boundary regions. The choice of a relevant 
accuracy of approximation depends on a particular data 
set. Observe that the accuracy of approximation of X 
can be tuned by B. Another approach to the accuracy of 
approximation can be based on the variable precision 
rough set model (VPRSM). 
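The accuracy coefficient is a simple ratio of cardinalities, as the following Python sketch shows. The sets used are hypothetical approximations of a concept over a five-object universe, and the helper `boundary_fraction` (the share of the universe lying in the boundary region) is a name introduced only for this example.

```python
from fractions import Fraction


def accuracy(lower_set, upper_set):
    """Accuracy of approximation: card(lower) / card(upper), for nonempty X."""
    if not upper_set:
        raise ValueError("the upper approximation of a nonempty set is nonempty")
    return Fraction(len(lower_set), len(upper_set))


def boundary_fraction(lower_set, upper_set, universe):
    """card(boundary region) / card(U), where boundary = upper - lower."""
    return Fraction(len(upper_set - lower_set), len(universe))


# Hypothetical approximations of some concept X in a universe of five objects.
U = {"p1", "p2", "p3", "p4", "p5"}
LOWER = {"p3", "p4"}
UPPER = {"p1", "p2", "p3", "p4"}

if __name__ == "__main__":
    alpha = accuracy(LOWER, UPPER)
    print(alpha)                                  # 1/2
    print(alpha == 1)                             # False: X is rough
    print(boundary_fraction(LOWER, UPPER, U))     # 2/5
```

Exact fractions are used instead of floats so that the test for crispness, `alpha == 1`, is not affected by rounding.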

In [21.10], it is stressed that boundaries of vague concepts are not crisp. In the definition presented in this chapter, the notion of boundary region is defined as a crisp set $\mathrm{BN}_B(X)$. However, let us observe that this definition is relative to the subjective knowledge expressed by attributes from $B$. Different sources of information may use different sets of attributes for concept approximation. Hence, the boundary region can change when we consider these different views. Another reason for boundary change may be related to incomplete information about concepts. They are known only on samples of objects. Hence, when new objects appear again the boundary region may change. From the discussion in the literature it follows that vague concepts cannot be approximated with satisfactory quality by static constructs such as induced membership inclusion functions, approximations, or models derived, e.g., from a sample. An understanding of vague concepts can be only realized in a process in which the induced models adaptively match the concepts in dynamically changing environments. This conclusion seems to have important consequences for the further development of rough set theory in combination with fuzzy sets and other soft computing paradigms for adaptive approximate reasoning.


21.5 Decision Systems and Decision Rules 


In this section, we discuss the decision rules (constructed over a selected set $B$ of features or a family of sets of features), which are used in inducing classification algorithms (classifiers), making it possible to classify unseen objects to decision classes. Parameters which are tuned in searching for a classifier with high quality are its description size (defined, e.g., by used decision rules) and its quality of classification (measured, e.g., by the number of misclassified objects on a given set of objects). By selecting a proper balance between the accuracy of classification and the description size one can search for a classifier with a high quality of classification also on testing objects. This approach is based on the minimum description length principle (MDL) [21.15].

In an information system $\mathcal{A} = (U, A)$ we sometimes distinguish a partition of $A$ into two disjoint classes $C, D \subseteq A$ of attributes, called condition and decision (action) attributes, respectively. The tuple $\mathcal{A} = (U, C, D)$ is called a decision system (or a decision table).

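Let $V = \bigcup\{V_a \mid a \in C\} \cup \bigcup\{V_d \mid d \in D\}$. Atomic formulae over $B \subseteq C \cup D$ and $V$ are expressions $a = v$ called descriptors (selectors) over $B$ and $V$, where $a \in B$ and $v \in V_a$. The set of formulae over $B$ and $V$, denoted by $F(B, V)$, is the least set containing all atomic formulae over $B$ and $V$ and closed under the propositional connectives $\wedge$ (conjunction), $\vee$ (disjunction) and $\neg$ (negation). By $\|\varphi\|_{\mathcal{A}}$ we denote the meaning of $\varphi \in F(B, V)$ in the decision system $\mathcal{A}$, which is the set of all objects in $U$ with the property $\varphi$. These sets are defined by

$$
\begin{aligned}
\|a = v\|_{\mathcal{A}} &= \{x \in U \mid a(x) = v\}, \\
\|\varphi \wedge \varphi'\|_{\mathcal{A}} &= \|\varphi\|_{\mathcal{A}} \cap \|\varphi'\|_{\mathcal{A}}, \\
\|\varphi \vee \varphi'\|_{\mathcal{A}} &= \|\varphi\|_{\mathcal{A}} \cup \|\varphi'\|_{\mathcal{A}}, \\
\|\neg\varphi\|_{\mathcal{A}} &= U - \|\varphi\|_{\mathcal{A}}.
\end{aligned}
$$

The formulae from $F(C, V)$, $F(D, V)$ are called condition formulae of $\mathcal{A}$ and decision formulae of $\mathcal{A}$, respectively.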
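The inductive definition of the meaning of a formula translates almost literally into a recursive evaluator. The Python sketch below is only an illustration: the nested-tuple encoding of formulae (`("atom", a, v)`, `("and", f, g)`, `("or", f, g)`, `("not", f)`) and the toy table are assumptions made for this example.

```python
# Hypothetical toy decision table (not taken from the chapter).
TABLE = {
    "p1": {"age": "young", "sex": "F", "bp": "normal"},
    "p2": {"age": "young", "sex": "F", "bp": "high"},
    "p3": {"age": "old", "sex": "M", "bp": "high"},
    "p4": {"age": "old", "sex": "M", "bp": "high"},
    "p5": {"age": "middle", "sex": "F", "bp": "normal"},
}


def meaning(table, formula):
    """Return the set of objects of the table that satisfy the formula."""
    op = formula[0]
    if op == "atom":                      # descriptor a = v
        _, a, v = formula
        return {x for x in table if table[x][a] == v}
    if op == "and":                       # conjunction -> intersection
        return meaning(table, formula[1]) & meaning(table, formula[2])
    if op == "or":                        # disjunction -> union
        return meaning(table, formula[1]) | meaning(table, formula[2])
    if op == "not":                       # negation -> complement in U
        return set(table) - meaning(table, formula[1])
    raise ValueError(f"unknown connective: {op!r}")


if __name__ == "__main__":
    young = ("atom", "age", "young")
    high = ("atom", "bp", "high")
    print(sorted(meaning(TABLE, ("and", young, ("not", high)))))  # ['p1']
```

Each connective maps to the corresponding set operation, so the evaluator mirrors the four defining equations one to one.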


Any object $x \in U$ belongs to the decision class
$\|\bigwedge_{d \in D} d = d(x)\|_{\mathcal{A}}$ of $\mathcal{A}$.
All decision classes of $\mathcal{A}$ create a partition $U/D$ of
the universe $U$.

A decision rule for $\mathcal{A}$ is any expression of the form
$\varphi \Rightarrow \psi$, where $\varphi \in F(C, V)$,
$\psi \in F(D, V)$, and $\|\varphi\|_{\mathcal{A}} \neq \emptyset$.
Formulae $\varphi$ and $\psi$ are referred to as the predecessor
and the successor of the decision rule $\varphi \Rightarrow \psi$.
Decision rules are often called IF ... THEN ... rules. Such rules
are used in machine learning.

Decision rule $\varphi \Rightarrow \psi$ is true in $\mathcal{A}$
if and only if $\|\varphi\|_{\mathcal{A}} \subseteq \|\psi\|_{\mathcal{A}}$.
Otherwise, one can measure its truth degree by introducing some
inclusion measure of $\|\varphi\|_{\mathcal{A}}$ in
$\|\psi\|_{\mathcal{A}}$. Let us denote by
$\mathrm{card}_{\mathcal{A}}(\varphi)$ (or $\mathrm{card}(\varphi)$,
if this does not lead to confusion) the number of objects from $U$
that satisfy formula $\varphi$, i.e., the cardinality of
$\|\varphi\|_{\mathcal{A}}$. According to Łukasiewicz [21.16], one
can assign to formula $\varphi$ the value

$$
\frac{\mathrm{card}(\varphi)}{\mathrm{card}(U)},
$$

and to the implication $\varphi \Rightarrow \psi$ the fractional
value

$$
\frac{\mathrm{card}(\varphi \wedge \psi)}{\mathrm{card}(\varphi)},
$$

under the assumption that $\|\varphi\|_{\mathcal{A}} \neq \emptyset$.
The fractional value proposed by Łukasiewicz was adopted much later
by the machine learning and data mining community, e.g., in the
definitions of the accuracy of decision rules or the confidence of
association rules.
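The semantics of descriptors and the two fractional values above can be illustrated with a minimal sketch. The table, the attribute names, and the helper names `meaning`, `support`, and `truth_degree` are hypothetical and chosen only for illustration; formulae are restricted here to conjunctions of descriptors, which is an assumption of the sketch and not of the text.

```python
from fractions import Fraction

# Hypothetical toy decision table (not from the chapter):
# condition attributes "headache" and "temp", decision attribute "flu".
TABLE = {
    "x1": {"headache": "yes", "temp": "high", "flu": "yes"},
    "x2": {"headache": "yes", "temp": "high", "flu": "yes"},
    "x3": {"headache": "no", "temp": "high", "flu": "no"},
    "x4": {"headache": "no", "temp": "normal", "flu": "no"},
    "x5": {"headache": "yes", "temp": "normal", "flu": "no"},
    "x6": {"headache": "no", "temp": "high", "flu": "yes"},
}


def meaning(formula, table):
    """Set of objects satisfying a conjunction of descriptors a = v.

    The formula is a dict {attribute: value}.
    """
    return {x for x, row in table.items()
            if all(row[a] == v for a, v in formula.items())}


def support(formula, table):
    """card(formula) / card(U)."""
    return Fraction(len(meaning(formula, table)), len(table))


def truth_degree(phi, psi, table):
    """card(phi and psi) / card(phi); the meaning of phi must be non-empty."""
    m_phi = meaning(phi, table)
    if not m_phi:
        raise ValueError("the predecessor matches no object")
    return Fraction(len(m_phi & meaning(psi, table)), len(m_phi))


certain = truth_degree({"headache": "yes", "temp": "high"}, {"flu": "yes"}, TABLE)
partial = truth_degree({"temp": "high"}, {"flu": "yes"}, TABLE)
print(certain, partial)  # 1 3/4
```

The first rule is true in the table (truth degree 1), whereas the second holds only to degree 3/4, which is what the data mining literature would call its confidence.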

For any decision system $\mathcal{A} = (U, C, D)$ one can consider
a generalized decision function
$\partial_{\mathcal{A}} : U \to POW(INF(D))$, where for any
$x \in U$, $\partial_{\mathcal{A}}(x)$ is the set of all
D-signatures of objects from $U$ which are C-indiscernible with
$x$, $A = C \cup D$ and $POW(INF(D))$ is the powerset of the set
$INF(D)$ of all possible decision signatures.

The decision system $\mathcal{A}$ is called consistent
(deterministic), if $\mathrm{card}(\partial_{\mathcal{A}}(x)) = 1$
for any $x \in U$. Otherwise $\mathcal{A}$ is said to be
inconsistent (nondeterministic). Hence, a decision system is
inconsistent if it contains some objects with different decisions
that are indiscernible with respect to condition attributes. Any
set consisting of all objects with the same generalized decision
value is called a generalized decision class.

Now, one can consider certain (possible) rules for decision
classes defined by the lower (upper) approximations of such
generalized decision classes of $\mathcal{A}$. This approach can be
extended by using the relationships of rough sets with the evidence
theory (Dempster-Shafer theory) by considering rules relative to
decision classes defined by the lower approximations of unions of
decision classes of $\mathcal{A}$.

Numerous methods have been developed for the generation of
different types of decision rules, and the reader is referred to
the literature on rough sets for details. Usually, one is searching
for decision rules that are (semi) optimal with respect to some
optimization criteria describing the quality of decision rules in
concept approximations.


21.6 Dependencies


Another important issue in data analysis is discovering
dependencies between attributes in a given decision system
$\mathcal{A} = (U, C, D)$. Intuitively, a set of attributes $D$
depends totally on a set of attributes $C$, denoted
$C \Rightarrow D$, if the values of attributes from $C$ uniquely
determine the values of attributes from $D$. In other words, $D$
depends totally on $C$, if there exists a functional dependency
between values of $C$ and $D$.

$D$ can depend partially on $C$. Formally such a dependency can be
defined in the following way. We will say that $D$ depends on $C$
to a degree $k$ ($0 \le k \le 1$), denoted by $C \Rightarrow_k D$,
if

$$
k = \gamma(C, D) = \frac{\mathrm{card}(POS_C(D))}{\mathrm{card}(U)}, \tag{21.7}
$$

where

$$
POS_C(D) = \bigcup_{X \in U/D} C_*(X), \tag{21.8}
$$

which is called a positive region of the partition $U/D$ with
respect to $C$, is the set of all elements of $U$ that can be
uniquely classified to blocks of the partition $U/D$ by means
of $C$.

If $k = 1$, we say that $D$ depends totally on $C$, and if $k < 1$,
we say that $D$ depends partially (to degree $k$) on $C$. If
$k = 0$, then the positive region of the partition $U/D$ with
respect to $C$ is empty.

The coefficient $k$ expresses the fraction of elements of the
universe that can be properly classified to blocks of the
partition $U/D$ employing attributes $C$; it is called the degree
of the dependency.

It can be easily seen that if $D$ depends totally on $C$, then
$IND(C) \subseteq IND(D)$. This means that the partition generated
by $C$ is finer than the partition generated by $D$.

Summing up: $D$ is totally (partially) dependent on $C$, if all
(some) elements of the universe $U$ can be uniquely classified to
blocks of the partition $U/D$, employing $C$. Observe that (21.7)
defines only one of the possible measures of dependency between
attributes. Note that one can consider dependencies between
arbitrary subsets of attributes in the same way. One also can
compare the dependency discussed in this section with dependencies
considered in databases.


21.7 Reduction of Attributes


We often face the question as to whether we can remove some data
from a data table and still preserve its basic properties, that
is, whether a table contains some superfluous data. Let us express
this idea more precisely.

Let $C, D \subseteq A$ be sets of condition and decision
attributes, respectively. We will say that $C' \subseteq C$ is
a D-reduct (reduct with respect to $D$) of $C$, if $C'$ is
a minimal subset of $C$ such that

$$
\gamma(C, D) = \gamma(C', D). \tag{21.9}
$$

The intersection of all D-reducts is called a D-core (core with
respect to $D$). Because the core is the intersection of all
reducts, it is included in every reduct, i.e., each element of the
core belongs to every reduct.




Thus, in a sense, the core is the most important subset of
attributes, since none of its elements can be removed without
affecting the classification power of attributes. Certainly, the
geometry of reducts can be more complex. For example, the core can
be empty but there can exist a partition of reducts into a few
sets with nonempty intersection.
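The dependency degree (21.7), the positive region (21.8), and the reduct condition (21.9) can be sketched as follows. This is a minimal illustration under stated assumptions: the four-row table and all function names are hypothetical, and reducts are found by exhaustive search over subsets, which is feasible only for a handful of attributes and is not one of the heuristics used in practice.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical decision table: condition attributes a, b, c; decision d.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
C = ("a", "b", "c")
D = ("d",)


def partition(rows, attrs):
    """Blocks of the partition induced by attrs, as sets of row indices."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(blocks.values())


def positive_region(rows, cond, dec):
    """Union of cond-blocks that are contained in a single dec-block."""
    dec_blocks = partition(rows, dec)
    pos = set()
    for block in partition(rows, cond):
        if any(block <= d for d in dec_blocks):
            pos |= block
    return pos


def gamma(rows, cond, dec):
    """Degree of dependency of dec on cond, as in (21.7)."""
    return Fraction(len(positive_region(rows, cond, dec)), len(rows))


def reducts(rows, cond, dec):
    """All minimal subsets of cond preserving gamma (exhaustive search)."""
    target = gamma(rows, cond, dec)
    found = []
    for size in range(len(cond) + 1):
        for subset in combinations(cond, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, hence not minimal
            if gamma(rows, subset, dec) == target:
                found.append(subset)
    return found


all_reducts = reducts(ROWS, C, D)
core = set.intersection(*(set(r) for r in all_reducts))
print(all_reducts, core)  # [('a', 'b'), ('a', 'c')] {'a'}
```

In this toy table attributes b and c carry the same information, so either one can be dropped, while a belongs to every reduct and therefore forms the core.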

Many other kinds of reducts and their approximations have been
discussed in the literature. They are defined relative to
different quality measures. For example, if one changes the
condition (21.9) to
$\partial_{\mathcal{A}}(x) = \partial_{\mathcal{B}}(x)$ (where
$A = C \cup D$ and $B = C' \cup D$), then the defined reducts
preserve the generalized decision. Other kinds of reducts
preserve, e.g., (i) the distance between attribute value vectors
for any two objects, if this distance is greater than a given
threshold, (ii) the distance between entropy distributions between
any two objects, if this distance exceeds a given threshold. Yet
another kind is given by the so-called reducts relative to
objects, used for the generation of decision rules.
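The generalized decision function of Sect. 21.5 and the reduct condition that preserves it can be sketched as below. The inconsistent toy table and the function names are hypothetical; the sketch only checks whether a given attribute subset preserves the generalized decision and makes no attempt to search for minimal subsets.

```python
# Hypothetical inconsistent decision table: rows 0 and 1 agree on the
# condition attributes a, b but carry different decisions.
ROWS = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
C = ("a", "b")
D = ("d",)


def generalized_decision(rows, cond, dec):
    """For each row, the set of dec-signatures of rows indiscernible on cond."""
    by_signature = {}
    for row in rows:
        key = tuple(row[a] for a in cond)
        by_signature.setdefault(key, set()).add(tuple(row[d] for d in dec))
    return [frozenset(by_signature[tuple(row[a] for a in cond)]) for row in rows]


def is_consistent(rows, cond, dec):
    """True if every generalized decision value is a singleton."""
    return all(len(s) == 1 for s in generalized_decision(rows, cond, dec))


def preserves_generalized_decision(rows, cond, subset, dec):
    """True if subset yields the same generalized decision as cond."""
    return (generalized_decision(rows, subset, dec)
            == generalized_decision(rows, cond, dec))


print(is_consistent(ROWS, C, D))                            # False
print(preserves_generalized_decision(ROWS, C, ("a",), D))   # True
print(preserves_generalized_decision(ROWS, C, ("b",), D))   # False
```

Here attribute a alone reproduces the generalized decision of the full table, whereas b alone merges row 2 with the conflicting rows and changes its generalized decision.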


Reducts are used for building data models. Choosing a particular
reduct or a set of reducts has an impact on the model size as well
as on its quality in describing a given data set. The model size
together with the model quality are two basic components tuned in
selecting relevant data models. This is known as the minimum
description length principle. Selection of relevant kinds of
reducts is an important step in building data models. It turns out
that the different kinds of reducts can be efficiently computed
using heuristics based, e.g., on the Boolean reasoning approach.

Let us note that analogously to the information flow [21.17] one
can consider different theories over information or decision
systems representing different views on knowledge encoded in the
systems. In particular, this approach was used for inducing
concurrent models from data tables. For more details the reader is
referred to the books cited at the beginning of the chapter.


21.8 Rough Membership


Let us observe that rough sets can be also defined employing the
rough membership function (21.10) instead of approximation. That
is, consider

$$
\mu_X^B : U \to [0, 1],
$$

defined by

$$
\mu_X^B(x) = \frac{\mathrm{card}(B(x) \cap X)}{\mathrm{card}(B(x))}, \tag{21.10}
$$

where $x \in U$ and $X \subseteq U$. The value $\mu_X^B(x)$ can be
interpreted as the degree that $x$ belongs to $X$ in view of
knowledge about $x$ expressed by $B$ or the degree to which the
elementary granule $B(x)$ is included in the set $X$. This means
that the definition reflects a subjective knowledge about elements
of the universe, in contrast to the classical definition of a set
related to objective knowledge.

Rough membership function can also be interpreted as the
conditional probability that $x$ belongs to $X$ given $B$. One may
refer to Bayes' theorem as the origin of this function. This
interpretation was used by several researchers in the rough set
community.

One can observe that the rough membership function has the
following properties:


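1) $\mu_X^B(x) = 1$ iff $x \in B_*(X)$,

2) $\mu_X^B(x) = 0$ iff $x \in U - B^*(X)$,

3) $0 < \mu_X^B(x) < 1$ iff $x \in BN_B(X)$,

4) $\mu_{U-X}^B(x) = 1 - \mu_X^B(x)$ for any $x \in U$,

5) $\mu_{X \cup Y}^B(x) \ge \max(\mu_X^B(x), \mu_Y^B(x))$ for any $x \in U$,

6) $\mu_{X \cap Y}^B(x) \le \min(\mu_X^B(x), \mu_Y^B(x))$ for any $x \in U$.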


From these properties it follows that the rough membership differs
essentially from the fuzzy membership [21.18]: properties 5) and
6) show that the membership for the union and intersection of sets
cannot, in general, be computed from their constituents'
memberships, as it can in the case of fuzzy sets. Thus formally
rough membership is different from fuzzy membership. Moreover, the
rough membership function depends on available knowledge
(represented by attributes from $B$). Besides, the rough
membership function, in contrast to the fuzzy membership function,
has a probabilistic flavor.
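A small numeric sketch may make the difference concrete. The four-object universe, the single attribute, and the function names are hypothetical; the example is chosen so that the inequalities in properties 5) and 6) are strict.

```python
from fractions import Fraction

# Hypothetical universe: objects 1, 2 and objects 3, 4 are pairwise
# indiscernible with respect to the single attribute "a".
TABLE = {1: {"a": 0}, 2: {"a": 0}, 3: {"a": 1}, 4: {"a": 1}}
B = ("a",)


def granule(x, table, attrs):
    """Elementary granule B(x): objects indiscernible from x on attrs."""
    signature = tuple(table[x][a] for a in attrs)
    return {y for y, row in table.items()
            if tuple(row[a] for a in attrs) == signature}


def rough_membership(x, X, table, attrs):
    """card(B(x) & X) / card(B(x)), as in (21.10)."""
    g = granule(x, table, attrs)
    return Fraction(len(g & X), len(g))


X, Y = {1}, {2}
mu_x = rough_membership(1, X, TABLE, B)           # 1/2
mu_y = rough_membership(1, Y, TABLE, B)           # 1/2
mu_union = rough_membership(1, X | Y, TABLE, B)   # 1
mu_inter = rough_membership(1, X & Y, TABLE, B)   # 0
print(mu_x, mu_y, mu_union, mu_inter)  # 1/2 1/2 1 0
```

Both constituent memberships equal 1/2, yet the membership in the union is 1 and the membership in the intersection is 0, so neither can be obtained by applying max or min to the constituents.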

Let us also mention that rough set theory, in contrast to fuzzy
set theory, clearly distinguishes two very important concepts,
vagueness and uncertainty, very often confused in the AI
literature. Vagueness is the property of concepts. Vague concepts
can be approximated using the rough set approach. Uncertainty is
the property of elements of a set or a set itself (e.g., only
examples and/or counterexamples of elements of a considered set
are given). Uncertainty of elements of a set can be expressed by
the rough membership function.

Fuzzy and rough set theory represent two different approaches to
vagueness. Fuzzy set theory addresses gradualness of knowledge,
expressed by the fuzzy membership, whereas rough set theory
addresses granularity of knowledge, expressed by the
indiscernibility relation. One can also cope with knowledge
gradualness using the rough membership. A nice illustration of
this difference was given by Didier Dubois and Henri Prade in
their example related to image processing, where fuzzy set theory
refers to gradualness of gray level, whereas rough set theory is
about the size of pixels.

Consequently, these theories do not compete with 
each other but are rather complementary. In particular, 
the rough set approach provides tools for approxi- 
mate construction of fuzzy membership functions. The 
rough-fuzzy hybridization approach has proved to be 
successful in many applications. An interesting discus- 
sion of fuzzy and rough set theory in the approach to 
vagueness can be found in [21.11]. 


21.9 Discernibility and Boolean Reasoning 


The discernibility relations are closely related to indis- 
cernibility relations and belong to the most important 
relations considered in rough set theory. Tools for dis- 
covering and classifying patterns are based on reason- 
ing schemes rooted in various paradigms. Such patterns 
can be extracted from data by means of methods based, 
e.g., on discernibility and Boolean reasoning. 

The ability to discern between perceived objects is important for
constructing many entities like reducts, decision rules, or
decision algorithms. In the standard approach the discernibility
relation $DIS(B) \subseteq U \times U$ is defined by
$x\,DIS(B)\,y$ if and only if non($x\,IND(B)\,y$), i.e.,
$B(x) \cap B(y) = \emptyset$. However, this is, in general, not
the case for generalized approximation spaces.

The idea of Boolean reasoning is based on the construction, for
a given problem $P$, of a corresponding Boolean function $f_P$
with the following property: the solutions for the problem $P$ can
be decoded from prime implicants of the Boolean function $f_P$
[21.19-21]. Let us mention that to solve real-life problems it is
necessary to deal with very large Boolean functions.

A successful methodology based on the discerni- 
bility of objects and Boolean reasoning has been de- 
veloped for computing many important ingredients for 
applications. These applications include generation of 
reducts and their approximations, decision rules, as- 
sociation rules, discretization of real-valued attributes, 
symbolic value grouping, searching for new features de- 
fined by oblique hyperplanes or higher-order surfaces, 
pattern extraction from data, as well as conflict resolu- 
tion or negotiation [21.4, 6]. 

Most of the problems related to the generation of the
above-mentioned entities are NP-complete or NP-hard. However, it
was possible to develop efficient heuristics returning suboptimal
solutions of the problems. The results of experiments on many data
sets are very promising. They show very good quality of solutions
generated by the heuristics in comparison with other methods
reported in the literature (e.g., with respect to the
classification quality of unseen objects). Moreover, they are very
efficient from the point of view of the time necessary to compute
the solution. Many of these methods are based on discernibility
matrices. However, it is possible to compute the necessary
information about these matrices without their explicit
construction (i.e., by sorting or hashing original data).
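As a hedged illustration of the Boolean reasoning scheme for one concrete problem, the sketch below encodes the search for decision-relative reducts. It assumes the commonly used construction in which each pair of objects with different decisions contributes a clause listing the condition attributes that discern them, and the discernibility function is the conjunction of these clauses; its prime implicants are then the minimal attribute sets that intersect every clause. The table, the names, and the exhaustive enumeration are hypothetical choices for a tiny example, and the table is assumed consistent.

```python
from itertools import combinations

# Hypothetical consistent decision table.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
C = ("a", "b", "c")


def discernibility_clauses(rows, cond, dec):
    """Non-empty discernibility matrix entries for pairs with different decisions."""
    clauses = set()
    for r1, r2 in combinations(rows, 2):
        if r1[dec] != r2[dec]:
            entry = frozenset(a for a in cond if r1[a] != r2[a])
            if entry:  # empty entries arise only in inconsistent tables
                clauses.add(entry)
    return clauses


def prime_implicants(clauses, cond):
    """Minimal attribute sets that intersect every clause (exhaustive search)."""
    found = []
    for size in range(len(cond) + 1):
        for subset in combinations(cond, size):
            candidate = set(subset)
            if any(p <= candidate for p in found):
                continue  # a smaller implicant is contained in the candidate
            if all(candidate & clause for clause in clauses):
                found.append(candidate)
    return found


clauses = discernibility_clauses(ROWS, C, "d")
implicants = prime_implicants(clauses, C)
print(implicants)
```

For this table the discernibility function is a AND (b OR c), and its two prime implicants, {a, b} and {a, c}, are exactly the reducts of the table.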

It is important to note that the methodology makes 
it possible to construct heuristics with a very impor- 
tant approximation property, which can be formulated 
as follows: expressions, called approximate implicants, 
generated by heuristics that are close to prime impli- 
cants define approximate solutions for the problem. 
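One simple heuristic of this kind can be sketched as a greedy procedure. It is offered only as an assumed example, not as the specific heuristics referred to in the text: at each step it picks the attribute occurring in the largest number of still-unsatisfied clauses. The result always intersects every clause, so it is an implicant, but it is not guaranteed to be a prime (minimal) one. The clause list and the function name are hypothetical.

```python
def greedy_implicant(clauses):
    """Greedy cover of the clauses; returns attributes in the order chosen.

    Empty clauses are ignored, since no attribute can satisfy them.
    """
    remaining = [set(c) for c in clauses if c]
    chosen = []
    while remaining:
        counts = {}
        for clause in remaining:
            for attribute in clause:
                counts[attribute] = counts.get(attribute, 0) + 1
        # Ties are broken alphabetically to keep the result deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [c for c in remaining if best not in c]
    return chosen


# Hypothetical clauses of a discernibility function.
CLAUSES = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "d"}]
result = greedy_implicant(CLAUSES)
print(result)  # ['a', 'b']
```

Each iteration needs only occurrence counts of attributes in the clauses, which is the kind of aggregate information that can often be obtained without materializing the full discernibility matrix.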

Mining large data sets is one of the biggest challenges in
knowledge discovery and data mining (KDD). In many practical
applications, there is a need for data mining algorithms running
on terminals of a client-server database system where the only
access to the database (located on the server) is enabled by
queries in structured query language (SQL).

Let us consider two illustrative examples of problems for large
data sets: (i) searching for short reducts, (ii) searching for
best partitions defined by cuts on continuous attributes. In both
cases, the traditional implementations of rough sets and Boolean
reasoning-based methods are characterized by a high computational
cost. The critical factor for the time complexity of algorithms
solving the discussed problems is the number of data access
operations. Fortunately some efficient modifications of the
original algorithms were proposed by relying on concurrent
retrieval of higher-level statistics, which are sufficient for the
heuristic search of reducts and partitions [21.4, 6]. The rough
set approach was also applied in the development of other scalable
big data processing techniques (e.g., [21.22]).


21.10 Rough Sets and Induction


The rough set approach is strongly related to inductive reasoning
(e.g., in rough set-based methods for inducing classifiers or
clusters [21.6]). The general idea for inducing classifiers is as
follows. From a given decision table a set of granules in the form
of decision rules is induced together with arguments for and
against each decision rule and decision class. For any new object
with known signature one can select rules matching this object.
Note that the left-hand sides of decision rules are described by
formulae that make it possible to check for new objects if they
satisfy them, assuming that the signatures of these objects are
known. In this way, one can consider two semantics of formulae: on
a sample of objects $U$ and on its extension $U^* \supseteq U$.
Definitely, one should consider a risk related to such
generalization, e.g., in the decision rule induction. Next,
a conflict resolution should be applied to resolve conflicts
between the rules matched by the new object that vote for
different decisions. In the rough set approach, the process of
inducing classifiers can be considered as the process of inducing
approximations of concepts over extensions of approximation spaces
(defined over samples of objects represented by decision systems).
The whole procedure can be generalized for the case of
approximation of more complex information granules. It is
worthwhile mentioning that approaches for inducing approximate
reasoning schemes have also been developed.
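The matching and conflict resolution steps described above can be sketched as follows. The rule set, the support weights, and the choice of support-weighted voting are all hypothetical; the text does not prescribe a particular conflict resolution strategy, and many others are used in practice.

```python
from collections import Counter

# Hypothetical induced rules: (predecessor descriptors, decision, support).
RULES = [
    ({"temp": "high", "headache": "yes"}, "flu", 2),
    ({"temp": "high"}, "flu", 3),
    ({"headache": "no"}, "healthy", 2),
    ({"temp": "normal"}, "healthy", 2),
]


def matches(predecessor, signature):
    """True if the object's signature satisfies every descriptor a = v."""
    return all(signature.get(a) == v for a, v in predecessor.items())


def classify(signature, rules):
    """Resolve conflicts between matching rules by support-weighted voting.

    Returns None when no rule matches the signature.
    """
    votes = Counter()
    for predecessor, decision, support in rules:
        if matches(predecessor, signature):
            votes[decision] += support
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(classify({"temp": "high", "headache": "no"}, RULES))    # flu
print(classify({"temp": "normal", "headache": "no"}, RULES))  # healthy
print(classify({"temp": "low"}, RULES))                       # None
```

The first object matches rules voting for both decisions (3 votes for flu against 2 for healthy), which is the kind of conflict that has to be resolved for objects outside the training sample.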

A typical approach in machine learning is based on inducing
classifiers from samples of objects. These classifiers are used
for predicting decisions on objects unseen so far, if only the
signatures of these objects are available. This approach can be
called global, i.e., leading to decision extension from a given
sample of objects to the whole universe of objects. This global
approach has some drawbacks (see the Epilog in [21.23]). Instead,
one can try to use transduction [21.23], semi-supervised learning,
induced local models relative to new objects, or adaptive learning
strategies. However, we are still far away from fully
understanding the discovery processes behind such generalization
strategies [21.24].


21.11 Rough Set-Based Generalizations 


The original approach by Pawlak was based on indis- 
cernibility defined by equivalence relations. Any such 
indiscernibility relation defines a partition of the uni- 
verse of objects. Over the years many generalizations 
of this approach were introduced, many of which are 
based on coverings rather than partitions. In particu- 
lar, one can consider the similarity (tolerance)-based 
rough set approach, binary relation based rough sets, 
neighborhood and covering rough sets, the dominance- 
based rough set approach, hybridization of rough sets 
and fuzzy sets, and many others. 

One should note that dealing with coverings requires solving
several new algorithmic problems such as the selection of a family
of definable sets or resolving problems with the selection of the
relevant definition of the approximation of sets among many
possible ones. One should also note that for a given problem
(e.g., classification problem) one should discover the relevant
covering for the target classification task. In the literature
there are numerous papers dedicated to theoretical aspects of the
covering rough set approach. However, much more work still needs
to be done on rather hard algorithmic issues, e.g., for the
relevant covering discovery.

Another issue to be solved is related to inclusion measures.
Parameters of such measures are tuned in inducing high-quality
approximations. Usually, this is done on the basis of the minimum
description length principle. In particular, approximation spaces
with rough inclusion measures have been investigated. This
approach was further extended to the rough mereological approach.
More general cases of approximation spaces with rough inclusion
have also been discussed in the literature, including
approximation spaces in granular computing (GC). Finally, it is
worthwhile mentioning the approach for ontology approximation used
in hierarchical learning of complex vague concepts [21.6].

In this section, we discuss in more detail some issues related to
the above-mentioned generalizations. Several generalizations of
the classical rough set approach based on approximation spaces
defined as pairs of the form $(U, R)$, where $R$ is an equivalence
relation (called the indiscernibility relation) on the set $U$,
have been reported in the literature. They are related to
different views on important components used in the definition of
rough sets. In the definition of rough sets different kinds of
structural sets that are examples of information granules are
used. From a mathematical point of view, one may treat them as
sets defined over the hierarchy of the powerset of objects. Among
them are the following ones:


• Elementary granules (neighborhoods) of objects (e.g.,
  similarity, tolerance, dominance neighborhoods, fuzzy
  neighborhoods, rough-fuzzy neighborhoods, fuzzy rough
  neighborhoods, families of neighborhoods).

• Granules defined by accessible information about objects (e.g.,
  only partial information on the signature of objects may be
  accessible).

• Methods for the definition of higher-order information granules
  (e.g., defined by the left-hand sides of induced decision rules
  or clusters of similar information granules).

• Inclusion measures making it possible to define the degrees of
  inclusion and/or closeness between information granules (e.g.,
  the degrees of inclusion of granules defined by accessible
  information about objects into elementary granules).

• Aggregation methods of inclusion and/or closeness degrees.

• Methods for the definition of approximation operations,
  including strategies for extension of approximations from
  samples of objects to larger sets of objects.

• Algebraic structures of approximation spaces.


Let us consider some examples of generalizations 
of the rough set approach proposed by Pawlak in 1982. 

A generalized approximation space [21.25] can be defined by
a tuple $AS = (U, I, \nu)$, where $I$ is the uncertainty function
defined on $U$ with values in the powerset $POW(U)$ of $U$
($I(x)$ is the neighborhood of $x$) and $\nu$ is the inclusion
function defined on the Cartesian product $POW(U) \times POW(U)$
with values in the interval $[0, 1]$ measuring the degree of
inclusion of sets. The lower and upper approximation operations
can be defined in $AS$ by

$$
LOW(AS, X) = \{x \in U : \nu(I(x), X) = 1\}, \tag{21.11}
$$

$$
UPP(AS, X) = \{x \in U : \nu(I(x), X) > 0\}. \tag{21.12}
$$

In the case considered by Pawlak [21.2], $I(x)$ is equal to the
equivalence class $B(x)$ of the indiscernibility relation
$IND(B)$; in the case of the tolerance (or similarity) relation
$T \subseteq U \times U$ we take
$I(x) = [x]_T = \{y \in U : x\,T\,y\}$, i.e., $I(x)$ is equal to
the tolerance class of $T$ defined by $x$.

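The standard rough inclusion relation $\nu_{SRI}$ is defined for
$X, Y \subseteq U$ by

$$
\nu_{SRI}(X, Y) =
\begin{cases}
\dfrac{\mathrm{card}(X \cap Y)}{\mathrm{card}(X)}, & \text{if } X \neq \emptyset, \\
1, & \text{otherwise}.
\end{cases}
\tag{21.13}
$$

For applications it is important to have some constructive
definitions of $I$ and $\nu$.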
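A minimal constructive sketch of (21.11)-(21.13) is given below. The universe and the tolerance relation (integers at distance at most 1) are hypothetical and serve only to produce overlapping neighborhoods; the function names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical universe and tolerance relation: x T y iff |x - y| <= 1.
U = set(range(1, 7))


def I(x):
    """Uncertainty function: the tolerance class of x."""
    return {y for y in U if abs(x - y) <= 1}


def nu_sri(X, Y):
    """Standard rough inclusion (21.13)."""
    return Fraction(len(X & Y), len(X)) if X else Fraction(1)


def low(universe, uncertainty, nu, X):
    """Lower approximation (21.11)."""
    return {x for x in universe if nu(uncertainty(x), X) == 1}


def upp(universe, uncertainty, nu, X):
    """Upper approximation (21.12)."""
    return {x for x in universe if nu(uncertainty(x), X) > 0}


X = {1, 2, 3}
print(low(U, I, nu_sri, X))  # {1, 2}
print(upp(U, I, nu_sri, X))  # {1, 2, 3, 4}
```

Object 3 belongs to X but not to its lower approximation, because its neighborhood {2, 3, 4} is only partly included in X; passing the uncertainty and inclusion functions as parameters mirrors the way the tuple (U, I, ν) parameterizes the approximation space.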

One can consider another way to define $I(x)$. Usually together
with $AS$ we consider some set $F$ of formulae describing sets of
objects in the universe $U$ of $AS$ defined by semantics
$\|\cdot\|_{AS}$, i.e., $\|\alpha\|_{AS} \subseteq U$ for any
$\alpha \in F$. If $AS = (U, A)$ then we will also write
$\|\alpha\|_U$ instead of $\|\alpha\|_{AS}$. Now, one can take the
set

$$
N_F(x) = \{\alpha \in F : x \in \|\alpha\|_{AS}\}, \tag{21.14}
$$

and $I(x) = \{\|\alpha\|_{AS} : \alpha \in N_F(x)\}$. Hence, more
general uncertainty functions with values in $POW(POW(U))$ can be
defined and in consequence different definitions of approximations
are considered. For example, one can consider the following
definitions of approximation operations in this approximation
space $AS$:

$$
LOW(AS_\circ, X) = \{x \in U : \nu(Y, X) = 1 \text{ for some } Y \in I(x)\}, \tag{21.15}
$$

$$
UPP(AS_\circ, X) = \{x \in U : \nu(Y, X) > 0 \text{ for any } Y \in I(x)\}. \tag{21.16}
$$
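The operations (21.15) and (21.16) can be sketched for a small, hypothetical set of formulae whose meanings overlap and therefore form a covering rather than a partition. The formula names and their meanings are invented for illustration, and the sketch assumes that every object satisfies at least one formula (otherwise the condition in (21.16) would hold vacuously).

```python
from fractions import Fraction

U = set(range(1, 7))

# Hypothetical formulae with their meanings in U.
MEANING = {
    "small": frozenset({1, 2, 3}),
    "even": frozenset({2, 4, 6}),
    "big": frozenset({4, 5, 6}),
}


def I(x):
    """Family of meanings of all formulae satisfied by x, cf. (21.14)."""
    return [m for m in MEANING.values() if x in m]


def nu_sri(X, Y):
    """Standard rough inclusion (21.13)."""
    return Fraction(len(X & Y), len(X)) if X else Fraction(1)


def low_some(universe, uncertainty, nu, X):
    """(21.15): some granule of x is fully included in X."""
    return {x for x in universe if any(nu(Y, X) == 1 for Y in uncertainty(x))}


def upp_any(universe, uncertainty, nu, X):
    """(21.16): every granule of x overlaps X."""
    return {x for x in universe if all(nu(Y, X) > 0 for Y in uncertainty(x))}


print(low_some(U, I, nu_sri, {1, 2, 3, 4}))  # {1, 2, 3}
print(upp_any(U, I, nu_sri, {1, 2}))         # {1, 2, 3}
```

Object 4 is not in the first lower approximation even though it belongs to the approximated set, since neither of its granules ("even", "big") is included in it.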


There are also different forms of rough inclusion functions. Let
us consider two examples. In the first example of a rough
inclusion function, a threshold $t \in (0, 0.5)$ is used to relax
the degree of inclusion of sets. The rough inclusion function
$\nu_t$ is defined by

$$
\nu_t(X, Y) =
\begin{cases}
1, & \text{if } \nu_{SRI}(X, Y) \ge 1 - t, \\
\dfrac{\nu_{SRI}(X, Y) - t}{1 - 2t}, & \text{if } t \le \nu_{SRI}(X, Y) < 1 - t, \\
0, & \text{if } \nu_{SRI}(X, Y) < t.
\end{cases}
\tag{21.17}
$$


One can obtain approximations considered in the variable precision
rough set approach (VPRSM) by substituting in (21.11) and (21.12)
the rough inclusion function $\nu_t$ defined by (21.17) instead of
$\nu$, assuming that $Y$ is a decision class and $I(x) = B(x)$ for
any object $x$, where $B$ is a given set of attributes. Another
example of application of the standard inclusion was developed by
using probabilistic decision functions. The rough inclusion
relation can be also used for function approximation and relation
approximation [21.25].
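The effect of the threshold can be sketched numerically. The two-block partition, the decision class, and the function names below are hypothetical, and the piecewise definition follows the three cases of (21.17) as a relaxed replacement for the standard rough inclusion.

```python
from fractions import Fraction

# Hypothetical partition of U = {1, ..., 8} into two elementary granules.
BLOCKS = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8})]
U = set().union(*BLOCKS)


def B(x):
    """Elementary granule containing x."""
    return next(block for block in BLOCKS if x in block)


def nu_sri(X, Y):
    """Standard rough inclusion."""
    return Fraction(len(X & Y), len(X)) if X else Fraction(1)


def nu_t(X, Y, t):
    """Thresholded rough inclusion with 0 < t < 1/2, cf. (21.17)."""
    v = nu_sri(X, Y)
    if v >= 1 - t:
        return Fraction(1)
    if v >= t:
        return (v - t) / (1 - 2 * t)
    return Fraction(0)


def low(X, nu):
    """Lower approximation: granule of x is included in X to degree 1."""
    return {x for x in U if nu(B(x), X) == 1}


def upp(X, nu):
    """Upper approximation: granule of x is included in X to a positive degree."""
    return {x for x in U if nu(B(x), X) > 0}


Y = {1, 2, 3, 5}                 # hypothetical decision class
t = Fraction(1, 4)
relaxed = lambda X, Z: nu_t(X, Z, t)

print(low(Y, nu_sri), upp(Y, nu_sri) == U)   # set() True
print(low(Y, relaxed) == {1, 2, 3, 4})        # True
print(upp(Y, relaxed) == {1, 2, 3, 4})        # True
```

With the standard inclusion the class has an empty lower approximation and the whole universe as its upper approximation; with t = 1/4 the first granule (3 of 4 objects in the class) is accepted into the lower approximation and the second (1 of 4) is dropped from the upper one.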

The approach based on inclusion functions has been generalized to
the rough mereological approach [21.26]. The inclusion relation
$x\,\mu_r\,y$ with the intended meaning "x is a part of y to
a degree at least r" has been taken as the basic notion of the
rough mereology being a generalization of the Leśniewski
mereology [21.27].

Usually families of approximation spaces labeled 
by some parameters are considered. By tuning such 
parameters according to chosen criteria (e.g., minimal 
description length) one can search for the optimal ap- 
proximation space for concept approximation. 

Our knowledge about the approximated concepts 
is often partial and uncertain. For example, con- 
cept approximation should be constructed from ex- 
amples and counterexamples of objects for the con- 
cepts [21.14]. Hence, concept approximations con- 
structed from a given sample of objects are extended, 
using inductive reasoning, on objects not yet observed. 
The rough set approach for dealing with concept ap- 
proximation under such partial knowledge is now well 
developed. 

Searching strategies for relevant approximation 
spaces are crucial for real-life applications. They in- 
clude the discovery of uncertainty functions, inclusion 
measures, as well as selection of methods for approxi- 
mations of decision classes and strategies for inductive 
extension of approximations from samples on larger 
sets of objects. 

Approximations of concepts should be constructed under dynamically
changing environments. This leads to a more complex situation
where the boundary regions are not crisp sets, which is consistent
with the postulate of the higher-order vagueness considered by
philosophers [21.10]. Different aspects of vagueness in the rough
set framework have been discussed.

It is worthwhile mentioning that a rough set approach to the
approximation of compound concepts has been developed. For such
concepts, one can hardly expect that they can be approximated with
high quality by the traditional methods [21.23, 28]. The approach
is based on hierarchical learning and ontology approximation.
Approximation methods of concepts in distributed environments have
been developed. The reader may find surveys of algorithmic methods
for concept approximation based on rough sets and Boolean
reasoning in the literature.

In several papers, the problem of ontology approx- 
imation was discussed together with possible applica- 
tions to approximation of compound concepts or to 
knowledge transfer. In any ontology [21.29] (vague) 
concepts and local dependencies between them are 
specified. Global dependencies can be derived from lo- 
cal dependencies. Such derivations can be used as hints 
in searching for relevant compound patterns (informa- 
tion granules) in approximation of more compound 
concepts from the ontology. The ontology approxi- 
mation problem is one of the fundamental problems 
related to approximate reasoning in distributed environ- 
ments. One should construct (in a given language that 
is different from the language in which the ontology 
is specified) not only approximations of concepts from 
ontology, but also vague dependencies specified in the 
ontology. It is worthwhile mentioning that an ontology 
approximation should be induced on the basis of in- 
complete information about concepts and dependencies 
specified in the ontology. Information granule calculi 
based on rough sets have been proposed as tools making 
it possible to solve this problem. Vague dependencies 
have vague concepts in premises and conclusions.

The approach to approximation of vague dependencies based only on
degrees of closeness of concepts from dependencies and their
approximations (classifiers) is not satisfactory for approximate
reasoning. Hence, a more advanced approach should be developed.
Approximation of any vague dependency is a method which for any
object allows us to compute the arguments for and against its
membership in the dependency conclusion on the basis of analogous
arguments relative to the dependency premises. Any argument is
a compound information granule (compound pattern). Arguments are
fused by local schemes (production rules) discovered from data.
Further fusions are possible through composition of local schemes,
called approximate reasoning schemes (AR) [21.30]. To estimate the
degree to which (at least) an object belongs to concepts from
ontology the arguments for and against those concepts are
collected and next a conflict resolution strategy is applied to
them to predict the degree.

Several generalizations of the rough set approach 
introduced by Pawlak in 1982 are discussed in this 
handbook in more detail: 


• The similarity (tolerance)-based rough set approach (Chap. 25)

• Binary relation based rough sets (Chap. 25)

• Neighborhood and covering rough sets (Chap. 25)

• The dominance-based rough set approach (Chap. 22)

• The probabilistic rough set approach and its probabilistic
  extension called the variable consistency dominance-based rough
  set approaches (Chap. 24)

• Parameterized rough sets based on Bayesian confirmation measures
  (Chap. 22)

• Stochastic rough set approaches (Chap. 22)

• Generalizations of rough set approximation operations (Chap. 25)

• Hybridization of rough sets and fuzzy sets (Chap. 26)

• Rough sets on abstract algebraic structures (e.g., lattices)
  (Chap. 25).


There are some other well-established or emerging 
domains not covered in the chapter where some gener- 
alizations of rough sets are proposed as the basic tools, 
often in combination with other existing approaches. 
Among them are rough sets based on [21.6]: 


i) Incomplete information and/or decision systems 

ii) Nondeterministic information and/or decision sys- 
tems 

iii) The rough set model on two universes 

iv) Dynamic information and/or decision systems 

v) Dynamic networks of information and/or decision 
systems. 


Moreover, rough sets play a crucial role in the development of
granular computing (GC) [21.31]. The extension to interactive
granular computing (IGC) [21.32] requires generalization of basic
concepts such as information and decision systems, as well as
methods of inducing hierarchical structures of information and
decision systems.

Let us note that making progress in understanding interactive
computations is one of the key problems in developing high quality
intelligent systems working in complex environments [21.33]. The
current research projects aim at developing foundations of IGC
based on the rough set approach in combination with other soft
computing approaches, in particular with fuzzy sets. The approach
is called interactive rough granular computing (IRGC). In IRGC
computations are based on interactions of complex granules
(c-granules, for short). Any c-granule consists of a physical part
and a mental part that are linked in a special way [21.32]. IRGC
is treated as the basis for (see [21.6] and references in this
book):


i) Wistech Technology, in particular for approximate reasoning,
called adaptive judgment about properties of interactive
computations

ii) Context induction

iii) Reasoning about changes

iv) Process mining (this research was inspired by [21.34])

v) Perception-based computing (PBC)

vi) Risk management in computational systems [21.32].


Interactive computations based on c-granules seem to create a good
background, e.g., for modeling computations in Active Media
Technology (AMT) and Wisdom Web of Things (W2T). We plan to
investigate their role for foundations of natural computing too.
Let us also mention that the interactive computations based on
c-granules are quite different in nature than Turing computations.
Hence, we plan to investigate relationships of interactive
computability based on c-granules and Turing computability.


21.12 Rough Sets and Logic


The father of contemporary logic was the German mathematician
Gottlob Frege (1848-1925). He thought that mathematics should not
be based on the notion of set but on the notions of logic. He
created the first axiomatized logical system but it was not
understood by the logicians of those days. During the first three
decades of the twentieth century there was a rapid development in
logic, bolstered to a great extent by Polish logicians, in
particular by Alfred Tarski, Stanisław Leśniewski, Jan
Łukasiewicz, and next by Andrzej Mostowski and Helena Rasiowa. The
development of computers and their applications stimulated logical
research and widened its scope.

When we speak about logic, we generally mean 
deductive logic. It gives us tools designed for de- 
riving true propositions from other true propositions. 
Deductive reasoning always leads to true conclusions 
from true premises. The theory of deduction has well- 
established, generally accepted theoretical foundations. 
Deductive reasoning is the main tool used in mathemat- 
ical reasoning. 

Rough set theory has contributed to some extent to various kinds of
deductive reasoning. In particular, various kinds of logics based on
the rough set approach have been investigated; rough set methodology
has contributed essentially to modal logics, many-valued logics,
intuitionistic logic, and others (see, e.g., references in the book
[21.6] and in articles [21.3, 4]). A summary of this research can be
found in [21.35, 36], and the interested reader is advised to consult
these volumes.

In natural sciences (e.g., in physics) inductive reasoning is of
primary importance. The characteristic feature of such reasoning is
that it does not begin from axioms (expressing general knowledge
about reality), as deductive logic does. Instead, its starting point
is partial knowledge (examples) about the universe of interest, which
is then generalized to constitute knowledge about a wider reality
than the initial one. In contrast to deductive reasoning, inductive
reasoning does not lead to true conclusions but only to probable
(possible) ones. Also, in contrast to the logic of deduction, the
logic of induction does not yet have uniform, generally accepted
theoretical foundations, although many important and interesting
results have been obtained, e.g., concerning statistical and
computational learning.

Verification of the validity of hypotheses in the logic 
of induction is based on experiment rather than the for- 
mal reasoning of the logic of deduction. Physics is the 
best illustration of this fact. Research on modern inductive logic
has a history several centuries long. It is worthwhile mentioning
here the outstanding English philosophers Francis Bacon (1561-1626)
and John Stuart Mill (1806-1873) [21.37].

The creation of computers and their innovative applications
contributed essentially to the rapid growth of interest in inductive
reasoning. This domain is developing very dynamically thanks to
computer science. Machine learning, knowledge discovery, reasoning
from data, expert systems, and others are examples of new directions
in inductive reasoning. Rough set theory is very well suited as
a theoretical basis for inductive reasoning. The basic concepts of
this theory are well suited to representing and analyzing knowledge
acquired from examples, which can then be used as a starting point
for generalization. Moreover, rough set theory has been successfully
applied in many domains to find patterns in data (data mining) and to
acquire knowledge from examples (learning from examples). Thus, rough
set theory seems to be another candidate for a mathematical
foundation of inductive reasoning.

From a computer science point of view, the most interesting kind of
reasoning is common sense reasoning. We use this kind of reasoning in
our everyday lives, and we encounter examples of it in newspapers, on
radio and TV, in politics and economics, and in debates and
discussions.

The starting point for such reasoning is the knowledge possessed by
a specific group of people (common knowledge) concerning some
subject, together with intuitive methods of deriving conclusions from
it. Here we cannot resolve a dispute by means of the methods of
deductive logic (reasoning) or inductive logic (experiment). So the
best known methods for solving the dilemma are voting, negotiations,
or even war. See, e.g., Gulliver’s Travels [21.38], where the hatred
between Tramecksan (High-Heels) and Slamecksan (Low-Heels) or
disputes between Big-Endians and Little-Endians could not be resolved
without a war. These methods do not reveal the truth or falsity of
the thesis under consideration at all. Of course, such methods are
not acceptable in mathematics or physics. Nobody is going to settle
the truth of Fermat’s theorem or Newton’s laws by voting,
negotiating, or declaring war.

Reasoning of this kind is the least studied from the theoretical
point of view, and its structure is not sufficiently understood, in
spite of much interesting theoretical research in this domain
[21.39]. Considering its scope and significance for some domains,
commonsense reasoning is of fundamental importance, and rough set
theory can also play an important role in it, but more fundamental
research must be done to this end. In particular, the rough truth
introduced and studied in [21.40] seems to be important for
investigating commonsense reasoning in the rough set framework.




Let us consider a simple example. In the decision system considered,
we assume that U = Birds is a set of birds described by some
condition attributes from a set A. The decision attribute is a binary
attribute Flies, with value yes if the given bird flies and no
otherwise. Then, we define the set of abnormal birds by
$Ab_A(Birds) = A_*(\{x \in Birds : Flies(x) = no\})$.
Hence, we have
$Ab_A(Birds) = Birds - A^*(\{x \in Birds : Flies(x) = yes\})$ and
$Birds - Ab_A(Birds) = A^*(\{x \in Birds : Flies(x) = yes\})$.
This means that for normal birds it is consistent with the knowledge
represented by A to assume that they can fly, i.e., it is possible
that they can fly. One can optimize $Ab_A(Birds)$ using A to obtain
a minimal boundary region in the approximation of
$\{x \in Birds : Flies(x) = no\}$.
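
The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five birds, their attribute values, and the helper names (`elementary_sets`, `lower`, `upper`) are hypothetical and do not come from the chapter. The sketch computes the abnormal birds as the lower approximation of the non-flying birds and lets one check the two identities stated in the example.

```python
from collections import defaultdict

def elementary_sets(universe, description):
    """Group objects with identical descriptions (indiscernibility classes)."""
    blocks = defaultdict(set)
    for x in universe:
        blocks[description[x]].add(x)
    return list(blocks.values())

def lower(blocks, target):
    """Lower approximation: union of elementary sets contained in target."""
    return set().union(*(b for b in blocks if b <= target))

def upper(blocks, target):
    """Upper approximation: union of elementary sets intersecting target."""
    return set().union(*(b for b in blocks if b & target))

# Hypothetical toy data: two condition attributes (wing type, habitat).
description = {
    "sparrow": ("wings", "land"),
    "robin":   ("wings", "land"),
    "penguin": ("wings", "sea"),
    "puffin":  ("wings", "sea"),
    "kiwi":    ("vestigial", "land"),
}
flies = {"sparrow": "yes", "robin": "yes", "penguin": "no",
         "puffin": "yes", "kiwi": "no"}

birds = set(description)
blocks = elementary_sets(birds, description)
non_flying = {x for x in birds if flies[x] == "no"}
flying = birds - non_flying

abnormal = lower(blocks, non_flying)             # Ab_A(Birds)
normal = birds - abnormal                        # Birds - Ab_A(Birds)
boundary = upper(blocks, non_flying) - abnormal  # undecidable using A alone
```

In this toy data the penguin is indiscernible from the puffin, so it falls into the boundary region rather than into the set of abnormal birds; adding a discriminating attribute to A would shrink the boundary, which is the kind of optimization mentioned in the text.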

It is worthwhile mentioning that in [21.41] an approach was presented
that combines rough sets with nonmonotonic reasoning. It
distinguishes basic concepts, which can be approximated on the basis
of sensor measurements, from more complex concepts, which are
approximated using so-called transducers defined by first-order
theories constructed over approximated concepts. Another approach to
commonsense reasoning was developed in a number of papers. The
approach is based on an ontological framework for approximation. In
this approach, approximations are constructed for concepts and
dependencies between the concepts represented in a given ontology,
expressed, e.g., in natural language. Still another approach
combining rough sets with logic programming has been developed. Let
us also note that Pawlak proposed a new approach to conflict analysis
[21.42]. The approach was next extended in the rough set framework.

To recapitulate, let us consider the following char- 
acteristics of the three above-mentioned kinds of rea- 
soning: 


a) Deductive
   i) Reasoning methods: axioms and rules of inference
   ii) Applications: mathematics
   iii) Theoretical foundations: complete theory
   iv) Conclusions: true conclusions from true premises
   v) Hypotheses verification: formal proof
b) Inductive
   i) Reasoning methods: generalization from examples
   ii) Applications: natural sciences (physics)
   iii) Theoretical foundations: lack of generally accepted theory
   iv) Conclusions: not true but probable (possible)
   v) Hypotheses verification: empirical experiment
c) Common sense
   i) Reasoning methods: reasoning method based on common sense
      knowledge with intuitive rules of inference expressed in
      natural language
   ii) Applications: everyday life, humanities
   iii) Theoretical foundations: lack of generally accepted theory
   iv) Conclusions: obtained by a mixture of deductive and inductive
      reasoning based on concepts expressed in natural language,
      e.g., with application of different inductive strategies for
      conflict resolution (such as voting, negotiations,
      cooperation, war) based on human behavioral patterns
   v) Hypotheses verification: human behavior.


There are numerous issues related to approximate 
reasoning under uncertainty. These issues are discussed 
in books on granular computing, rough mereology, and 
the computational complexity of algorithmic problems 
related to these issues. For more details, the reader is 
referred to the following books [21.26, 31, 43, 44]. 

Finally, we would like to stress that still much more 
work should be done to develop approximate reasoning 
about complex vague concepts to make progress in the 
development of intelligent systems. According to Leslie
Valiant [21.45] (recipient of the 2010 ACM Turing Award for his
fundamental contributions to the development of computational
learning theory and to the broader theory of computer science):


A fundamental question for artificial intelligence is 
to characterize the computational building blocks 
that are necessary for cognition. A specific chal- 
lenge is to build on the success of machine learning 
so as to cover broader issues in intelligence ... This 
requires, in particular a reconciliation between two 
contradictory characteristics — the apparent logi- 
cal nature of reasoning and the statistical nature of 
learning. 


It is worthwhile presenting two more views. The 
first one by Lotfi A. Zadeh, the founder of fuzzy sets 
and the computing with words (CW) paradigm [21.46]: 


Manipulation of perceptions plays a key role in human recognition,
decision and execution processes. As a methodology, computing with
words provides a foundation for a computational theory of
perceptions - a theory which may have an important bearing on how
humans make and machines might make - perception-based rational
decisions in an environment of imprecision, uncertainty and partial
truth. ... computing with words, or CW for short, is a methodology in
which the objects of computation are words and propositions drawn
from a natural language.


The other view is that of Judea Pearl [21.47] (the 
2011 winner of the ACM Turing Award, the high- 
est distinction in computer science, for fundamental 
contributions to artificial intelligence through the de- 
velopment of a calculus for probabilistic and causal 
reasoning): 


Traditional statistics is strong in devising ways of 
describing data and inferring distributional param- 
eters from sample. Causal inference requires two 
additional ingredients: a science-friendly language 
for articulating causal knowledge, and a mathe- 
matical machinery for processing that knowledge, 
combining it with data and drawing new causal 
conclusions about a phenomenon. 


The question arises about the logic relevant for the above-mentioned
tasks. First, let us observe that the satisfiability relations in the
IRGC framework can be treated as tools for constructing new granules.
In fact, for a given satisfiability relation one can define the
semantics of formulae related to this relation, and these semantics
are the candidates for the new relevant granules. We would like to
emphasize one very important feature. The relevant satisfiability
relation for the considered problems is not given but should be
induced (discovered) from partial information given by information or
decision systems. For real-life problems it is often necessary to
discover a hierarchy of satisfiability relations before we obtain the
relevant target one. Granules constructed on different levels of this
hierarchy finally lead to relevant ones for approximation of complex
vague concepts related to complex granules expressed using natural
language.

The reasoning making it possible to derive relevant c-granules for
solutions of the target tasks is called adaptive judgment. Intuitive
judgment and rational judgment are distinguished as different kinds
of judgment [21.48]. Deduction and induction as well as abduction or
analogy-based reasoning are involved in adaptive judgment. Among the
tasks for adaptive
judgment are the following ones, which support rea- 
soning under uncertainty toward: searching for relevant 
approximation spaces, discovery of new features, se- 
lection of relevant features, rule induction, discovery 
of inclusion measures, strategies for conflict resolu- 
tion, adaptation of measures based on the minimum 
description length principle, reasoning about changes, 
perception (action and sensory) attributes’ selection 
by agent control, adaptation of quality measures over 
computations relative to agents, adaptation of object 
structures, discovery of relevant contexts, strategies for 
knowledge representation and interaction with knowl- 
edge bases, ontology acquisition and approximation, 
learning in dialog of inclusion measures between gran- 
ules from different languages (e.g., the formal language 
of the system and the user’s natural language), strate- 
gies for adaptation of existing models, strategies for 
development and evolution of communication language 
among agents in distributed environments, strategies for 
risk management in distributed computational systems. 
Definitely, in the language used by agents for dealing with adaptive
judgment (i.e., intuitive and rational), some deductive systems known
from logic may be applied for reasoning about knowledge relative to
closed worlds. This may happen, e.g., if the agent languages are
based on classical mathematical logic. However, if
we move to interactions in open worlds, then new spe- 
cific rules or patterns relative to a given agent or group 
of agents in such worlds should be discovered. The pro- 
cess of inducing such rules or patterns is influenced 
by uncertainty because they are induced by agents un- 
der uncertain and/or imperfect knowledge about the 
environment. 

The concepts discussed, such as interactive com- 
putation and adaptive judgment, are among the basic 
concepts in Wisdom Technology (WisTech) [21.49, 50]. 
Let us mention here the WisTech meta-equation 


$$\mathrm{WISDOM} = \mathrm{INTERACTIONS} + \mathrm{ADAPTIVE\ JUDGMENT} + \mathrm{KNOWLEDGE}\,. \qquad (21.18)$$




21.13 Conclusions 


In this chapter, we have discussed some basic issues and methods
related to rough sets together with some generalizations, including
those related to relationships of rough sets with inductive
reasoning. We have also listed some current research directions based
on interactive rough granular computing. For more details, the reader
is referred to the literature cited at the beginning of this chapter
(see also [21.9]).


References

21.1  Z. Pawlak: Rough Sets: Theoretical Aspects of Reasoning about Data, Theory and Decision Library D, Vol. 9 (Kluwer, Dordrecht 1991)
21.2  Z. Pawlak: Rough sets, Int. J. Comp. Inform. Sci. 11, 341-356 (1982)
21.3  Z. Pawlak, A. Skowron: Rudiments of rough sets, Inform. Sci. 177(1), 3-27 (2007)
21.4  Z. Pawlak, A. Skowron: Rough Sets: Some Extensions, Inform. Sci. 177(1), 28-40 (2007)
21.5  Z. Pawlak, A. Skowron: Rough sets and Boolean reasoning, Inform. Sci. 177(1), 41-73 (2007)
21.6  A. Skowron, Z. Suraj (Eds.): Rough Sets and Intelligent Systems. Professor Zdzisław Pawlak in Memoriam, Intelligent Systems Reference Library, Vol. 42/43 (Springer, Berlin, Heidelberg 2013)
21.7  I. Chikalov, V. Lozin, I. Lozina, M. Moshkov, H.S. Nguyen, A. Skowron, B. Zielosko: Three Approaches to Data Analysis. Test Theory, Rough Sets and Logical Analysis of Data, Intelligent Systems Reference Library, Vol. 41 (Springer, Berlin, Heidelberg 2012)
21.8  Ch. Cornelis (Ed.): International Rough Sets Society, online available from: http://www.roughsets.org
21.9  Z. Suraj: Rough Set Database System, online available from: http://www.rsds.univ.rzeszow.pl
21.10 R. Keefe: Theories of Vagueness, Cambridge Studies in Philosophy (Cambridge Univ. Press, Cambridge 2000)
21.11 S. Read: Thinking about Logic: An Introduction to the Philosophy of Logic (Oxford Univ. Press, Oxford 1994)
21.12 G. Frege: Grundgesetze der Arithmetik, Vol. 2 (Verlag von Hermann Pohle, Jena 1903)
21.13 G.W. Leibniz: Discourse on Metaphysics. In: Philosophical Essays (1686), ed. by R. Ariew, D. Garber (Hackett, Indianapolis 1989) pp. 35-68
21.14 T. Hastie, R. Tibshirani, J.H. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, Berlin, Heidelberg 2001)
21.15 J. Rissanen: Modeling by shortest data description, Automatica 14, 465-471 (1978)
21.16 J. Łukasiewicz: Die logischen Grundlagen der Wahrscheinlichkeitsrechnung. In: Jan Łukasiewicz - Selected Works, ed. by L. Borkowski (North Holland/Polish Scientific Publishers, Amsterdam, Warsaw 1970) pp. 16-63
21.17 J. Barwise, J. Seligman: Information Flow: The Logic of Distributed Systems (Cambridge Univ. Press, Cambridge 1997)
21.18 L.A. Zadeh: Fuzzy sets, Inform. Control 8, 338-353 (1965)
21.19 G. Boole: The Mathematical Analysis of Logic (G. Bell, London 1847), Reprinted by Philosophical Library, New York 1948
21.20 G. Boole: An Investigation of the Laws of Thought (Walton, London 1854)
21.21 F.M. Brown: Boolean Reasoning (Kluwer, Dordrecht 1990)
21.22 D. Ślęzak: Infobright, online available from: http://www.infobright.com/
21.23 V. Vapnik: Statistical Learning Theory (Wiley, New York 1998)
21.24 R.S. Michalski: A theory and methodology of inductive learning, Artif. Intell. 20, 111-161 (1983)
21.25 J. Stepaniuk: Rough-Granular Computing in Knowledge Discovery and Data Mining (Springer, Berlin, Heidelberg 2008)
21.26 L. Polkowski: Approximate Reasoning by Parts. An Introduction to Rough Mereology, Intelligent Systems Reference Library, Vol. 20 (Springer, Berlin, Heidelberg 2011)
21.27 S. Leśniewski: Grundzüge eines neuen Systems der Grundlagen der Mathematik, Fundam. Math. 14, 1-81 (1929)
21.28 L. Breiman: Statistical modeling: The two cultures, Stat. Sci. 16(3), 199-231 (2001)
21.29 S. Staab, R. Studer: Handbook on Ontologies, International Handbooks on Information Systems (Springer, Berlin 2004)
21.30 S.K. Pal, L. Polkowski, A. Skowron: Rough-Neural Computing: Techniques for Computing with Words, Cognitive Technologies (Springer, Berlin, Heidelberg 2004)
21.31 W. Pedrycz, A. Skowron, V. Kreinovich (Eds.): Handbook of Granular Computing (Wiley, Hoboken 2008)
21.32 A. Jankowski: Practical Issues of Complex Systems Engineering: Wisdom Technology Approach (Springer, Berlin, Heidelberg 2015), in preparation
21.33 D. Goldin, S. Smolka, P. Wegner: Interactive Computation: The New Paradigm (Springer, Berlin, Heidelberg 2006)
21.34 Z. Pawlak: Concurrent versus sequential - the rough sets perspective, Bulletin EATCS 48, 178-190 (1992)
21.35 L. Polkowski: Rough Sets: Mathematical Foundations, Advances in Soft Computing (Physica, Berlin, Heidelberg 2002)
21.36 M. Chakraborty, P. Pagliani: A Geometry of Approximation: Rough Set Theory - Logic, Algebra and Topology of Conceptual Patterns (Springer, Berlin, Heidelberg 2008)
21.37 D.M. Gabbay, S. Hartmann, J. Woods (Eds.): Inductive Logic, Handbook of the History of Logic, Vol. 10 (Elsevier, Amsterdam 2011)
21.38 J. Swift: Gulliver's Travels into Several Remote Nations of the World (anonymous publisher, London 1726)
21.39 D.M. Gabbay, C.J. Hogger, J.A. Robinson: Nonmonotonic Reasoning and Uncertain Reasoning, Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3 (Clarendon, Oxford 1994)
21.40 Z. Pawlak: Rough logic, Bull. Pol. Ac.: Tech. 35(5/6), 253-258 (1987)
21.41 P. Doherty, W. Łukaszewicz, A. Skowron, A. Szałas: Knowledge Engineering: A Rough Set Approach, Studies in Fuzziness and Soft Computing, Vol. 202 (Springer, Berlin, Heidelberg 2006)
21.42 Z. Pawlak: An inquiry into anatomy of conflicts, Inform. Sci. 109, 65-78 (1998)
21.43 M.J. Moshkov, M. Piliszczuk, B. Zielosko: Partial Covers, Reducts and Decision Rules in Rough Sets - Theory and Applications, Studies in Computational Intelligence, Vol. 145 (Springer, Berlin, Heidelberg 2008)
21.44 P. Delimata, M.J. Moshkov, A. Skowron, Z. Suraj: Inhibitory Rules in Data Analysis: A Rough Set Approach, Studies in Computational Intelligence, Vol. 163 (Springer, Berlin, Heidelberg 2009)
21.45 Web page of Professor Leslie Valiant, online available from: http://people.seas.harvard.edu/~valiant/researchinterests.htm
21.46 L.A. Zadeh: From computing with numbers to computing with words - From manipulation of measurements to manipulation of perceptions, IEEE Trans. Circuits Syst. 45, 105-119 (1999)
21.47 J. Pearl: Causal inference in statistics: An overview, Stat. Surv. 3, 96-146 (2009)
21.48 D. Kahneman: Maps of Bounded Rationality: Psychology for behavioral economics, Am. Econ. Rev. 93, 1449-1475 (2002)
21.49 A. Jankowski, A. Skowron: A wistech paradigm for intelligent systems, Lect. Notes Comput. Sci. 4374, 94-132 (2007)
21.50 A. Jankowski, A. Skowron: Logic for artificial intelligence: The Rasiowa-Pawlak school perspective. In: Andrzej Mostowski and Foundational Studies, ed. by A. Ehrenfeucht, V. Marek, M. Srebrny (IOS, Amsterdam 2008) pp. 106-143


22. Rough Set Methodology for Decision Aiding 


Roman Słowiński, Salvatore Greco, Benedetto Matarazzo


Since its conception, the dominance-based rough 
set approach (DRSA) has been adapted to a large 
variety of decision problems. In this chapter we 
outline the rough set methodology designed for 
multi-attribute decision aiding. DRSA was pro- 
posed as an extension of the Pawlak concept of 
rough sets in order to deal with ordinal data. 
We focus on decision problems where all attributes 
describing objects of a decision problem have 
ordered value sets (scales). Such attributes are 
called criteria, and thus the problems are called 
multi-criteria decision problems. Criteria are real- 
valued functions of gain or cost type, depending 
on whether a greater value is better or worse, re- 
spectively. In these problems, we also assume the 
presence of a well defined decision maker (DM) 
(single or group DM) concerned by multi-criteria
classification, choice, and ranking. 


22.1 Data Inconsistency as a Reason for Using Rough Sets
  22.1.1 From Indiscernibility-Based Rough Sets to Dominance-Based Rough Sets
22.2 The Need for Replacing the Indiscernibility Relation by the Dominance Relation when Reasoning About Ordinal Data
22.3 The Dominance-based Rough Set Approach to Multi-Criteria Classification
  22.3.1 Granular Computing with Dominance Cones
  22.3.2 Variable Consistency Dominance-Based Rough Set Approach (VC-DRSA)
  22.3.3 Stochastic Dominance-based Rough Set Approach
  22.3.4 Induction of Decision Rules
  22.3.5 Rule-based Classification Algorithms
22.4 The Dominance-based Rough Set Approach to Multi-Criteria Choice and Ranking
  22.4.1 Differences with Respect to Multi-Criteria Classification
  22.4.2 The Pairwise Comparison Table as Input Preference Information
  22.4.3 Rough Approximation of Preference Relations
  22.4.4 Induction of Decision Rules from Rough Approximations of Preference Relations
  22.4.5 Application of Decision Rules to Multi-Criteria Choice and Ranking
22.5 Important Extensions of DRSA
22.6 DRSA to Operational Research Problems
22.7 Concluding Remarks on DRSA Applied to Multi-Criteria Decision Problems
References


Ordinal data are typically encountered in multi-attribute decision
problems, where a set of objects (also called actions, acts,
solutions, etc.) evaluated by a set of attributes (also called
criteria, variables, features, etc.) raises one of the following
questions: (i) how to assign the objects to some ordered classes
(ordinal classification), (ii) how to choose the best subset of
objects (choice or, as its particular case, optimization), or
(iii) how to rank the objects from the best to the worst (ranking).
The answer to all of these questions involves an aggregation of the
multi-attribute evaluation of objects, which takes into account a law
relating the evaluation with the classification, or choice, or
ranking decision. This law has to be discovered by inductive learning
from data describing the considered decision situation. In the case
of decision problems that correspond to some physical phenomena, this
law is a model of cause-effect relationships, and in the case of
human decision making, this law is a decision maker’s preference
model. In DRSA, these models have the form of a set of if..., then...
decision rules. In the case of multi-attribute classification the
syntax of rules is: if evaluation of object a is better (or worse)
than given values of some attributes, then a belongs to at
least (at most) a given class, and in the case of multi-attribute
choice or ranking: if object a is preferred to object b in at least
(at most) given degrees with respect to some attributes, then a is
preferred to b in at least (at most) a given degree. These models are
used to work out a recommendation concerning unseen objects in the
context of one of the three problem statements.


22.1 Data Inconsistency as a Reason for Using Rough Sets


The data describing a given decision situation include either
observations of the DM’s past decisions in the same decision context,
or examples of decisions consciously elicited by the DM on the demand
of an analyst. These data hide the value system of the DM, and thus
they are called preference information. This way of eliciting
preference information is called indirect, in opposition to direct
elicitation, in which the DM is supposed to provide information
leading directly to the definition of all preference model
parameters, like weights and discrimination thresholds of criteria,
trade-off rates, etc. [22.1].

Past decisions or decision examples may, however, be inconsistent
with the dominance principle commonly accepted for multi-criteria
decision problems. Decisions are inconsistent with the dominance
principle if:


• In the case of ordinal classification: object a has been assigned
  to a worse decision class than object b, although a is at least as
  good as b on all the considered criteria, i.e., a dominates b.

• In the case of choice and ranking: a pair of objects (a, b) has
  been assigned a degree of preference worse than pair (c, d),
  although the differences of evaluations between a and b on all the
  considered criteria are at least as strong as the respective
  differences of evaluations between c and d, i.e., pair (a, b)
  dominates pair (c, d).


Thus, in order to build a preference model from partly inconsistent
preference information, we had the idea to structure this data using
the concept of a rough set introduced by Pawlak [22.2, 3].
Originally, however, Pawlak’s understanding of inconsistency was
different from the above inconsistency with the dominance principle.
The original rough set philosophy (Chap. 21) is based on the
assumption that with every object of the universe U there is
associated a certain amount of information (data, knowledge). This
information can be expressed by means of a number of attributes that
describe the objects. Objects which have the same description are
said to be indiscernible (or similar) with respect to the available
information. The indiscernibility relation thus generated constitutes
the mathematical basis of rough set theory. It induces a partition of
the universe into blocks of indiscernible objects, called elementary
sets, which can be used to build knowledge about a real or abstract
world. The use of the indiscernibility relation results in
information granulation.

Any subset X of the universe may be expressed in 
terms of these blocks either precisely (as a union of ele- 
mentary sets) or approximately. In the latter case, the 
subset X may be characterized by two ordinary sets, 
called the lower and upper approximations. A rough 
set is defined by means of these two approximations, 
which coincide in the case of an ordinary set. The lower 
approximation of X is composed of all the elementary 
sets included in X (whose elements, therefore, certainly 
belong to X), while the upper approximation of X con- 
sists of all the elementary sets which have a non-empty 
intersection with X (whose elements, therefore, may be- 
long to X). The difference between the upper and lower 
approximations constitutes the boundary region of the 
rough set, whose elements cannot be characterized with 
certainty as belonging or not to X (by using the avail- 
able information). The information about objects from 
the boundary region is, therefore, inconsistent or am- 
biguous. The cardinality of the boundary region states, 
moreover, the extent to which it is possible to express X 
in exact terms, on the basis of the available information. 
For this reason, this cardinality may be used as a mea- 
sure of vagueness of the information about X. 
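
As a small illustration of these definitions, the following Python sketch computes the lower approximation, the upper approximation, and the boundary region of a subset X with respect to a given partition into elementary sets. The universe, the partition, and the function name `approximations` are hypothetical choices made only for this example.

```python
def approximations(elementary_sets, X):
    """Return (lower, upper, boundary) of X for a partition of the universe."""
    lower, upper = set(), set()
    for block in elementary_sets:
        if block <= X:      # every element of the block certainly belongs to X
            lower |= block
        if block & X:       # some element of the block belongs to X
            upper |= block
    return lower, upper, upper - lower

# Hypothetical partition of U = {1, ..., 8} into elementary sets.
partition = [{1, 2}, {3, 4, 5}, {6}, {7, 8}]

exact_X = {1, 2, 6}      # a union of elementary sets
rough_X = {1, 2, 3, 7}   # cuts across two elementary sets

lo_e, up_e, bnd_e = approximations(partition, exact_X)
lo_r, up_r, bnd_r = approximations(partition, rough_X)

# Cardinality of the boundary region as a simple measure of vagueness.
print(len(bnd_e), len(bnd_r))
```

For the first subset the two approximations coincide and the boundary is empty, so the set is ordinary (exact); for the second the boundary contains every object whose membership cannot be decided from the available description.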

Some important characteristics of the rough set ap- 
proach make it a particularly interesting tool in a variety 
of problems and concrete applications. For example, it 
is possible to deal with both quantitative and qualitative input
data, and inconsistencies need not be removed
prior to the analysis. In terms of the output informa- 
tion, it is possible to acquire a posteriori information 
regarding the relevance of particular attributes and their 
subsets to the quality of approximation considered 
within the problem at hand. Moreover, the lower and 
upper approximations of a partition of U into decision 
classes prepare the ground for inducing certain and pos- 
sible knowledge patterns in the form of if... then... 
decision rules. 

Several attempts have been made to employ 
rough set theory for decision aiding [22.4,5]. The 
Indiscernibility-based Rough Set Approach (IRSA) is 
not able, however, to handle inconsistencies with re- 
spect to the dominance principle. 
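
The limitation can be made concrete with a minimal Python sketch. The three objects, their evaluations on two gain-type criteria, and their class assignments are hypothetical, as are the function names. Object a is at least as good as b on both criteria but has been assigned to a worse class, so the pair violates the dominance principle; since no two objects share the same description, an indiscernibility-based check reports nothing.

```python
from itertools import permutations

# Hypothetical objects evaluated on two gain-type criteria (higher is better).
evaluations = {
    "a": (8, 7),
    "b": (6, 5),
    "c": (9, 9),
}
# Hypothetical ordinal decision classes (higher is better).
decision = {"a": 1, "b": 2, "c": 3}

def dominates(x, y):
    """x dominates y if x is at least as good as y on every criterion."""
    return all(vx >= vy for vx, vy in zip(evaluations[x], evaluations[y]))

def dominance_inconsistencies():
    """Pairs (x, y) where x dominates y but x is assigned to a worse class."""
    return [(x, y) for x, y in permutations(evaluations, 2)
            if dominates(x, y) and decision[x] < decision[y]]

def indiscernibility_inconsistencies():
    """Pairs with identical descriptions but different classes."""
    return [(x, y) for x, y in permutations(evaluations, 2)
            if evaluations[x] == evaluations[y] and decision[x] != decision[y]]

print(dominance_inconsistencies())         # pairs violating the dominance principle
print(indiscernibility_inconsistencies())  # pairs inconsistent in the classical sense
```

The sketch only detects the inconsistency; how such pairs are handled in the approximations is the subject of the dominance-based approach introduced next.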


22.1.1 From Indiscernibility-Based Rough 
Sets to Dominance-Based Rough Sets 


An extension of IRSA which deals with inconsisten- 
cies with respect to the dominance principle, which 
are typical for preference data, was proposed by Greco 
et al. in [22.6-8]. This extension is the dominance- 
based rough set approach (DRSA), which is mainly 
based on the substitution of the indiscernibility relation 
by a dominance relation in the rough approximation 
of decision classes. An important consequence of this 
fact is the possibility of inferring (from observations of 
past decisions or from exemplary decisions) the DM’s
preference model in terms of decision rules which are
logical statements of the type if..., then .... The sep- 
aration of certain and uncertain knowledge about the 
DM’s preferences is carried out by the distinction of dif- 
ferent kinds of decision rules, depending upon whether 
they are induced from lower approximations of de- 
cision classes or from the difference between upper 
and lower approximations (composed of inconsistent 
examples). Such a preference model is more general 
than the classical functional models considered within 
multi-attribute utility theory or the relational models 
considered, for example, in outranking methods [22.9-11].

This chapter is based on previous publications of 
the authors, in particular, on [22.12-14]. In the next 
section, we explain the need for replacing the indis- 
cernibility relation by the dominance relation in the 
definition of rough sets when reasoning about ordi- 
nal data. This leads us to Sect. 22.3, where DRSA is 
presented with respect to multi-criteria ordinal classifi- 
cation. This section also includes two special versions 
of DRSA: variable consistency DRSA (VC DRSA) and 
stochastic DRSA. Section 22.4 presents DRSA with re- 
spect to multi-criteria choice and ranking. Section 22.5 
characterizes some relevant extensions of DRSA, and 
Sect. 22.6 presents applications of DRSA to some op- 
erational research problems. Section 22.7 summarizes 
the features of DRSA applied to multi-criteria decision 
problems and concludes the chapter. 


22.2 The Need for Replacing the Indiscernibility Relation 
by the Dominance Relation when Reasoning About Ordinal Data 


When trying to apply the rough set concept based on 
indiscernibility to reasoning about preference ordered 
data, it has been noted that IRSA ignores not only 
the preference order in the value sets of attributes but 
also the monotonic relationship between evaluations of 
objects on such attributes (called criteria) and the pref- 
erence ordered value of decision (classification decision 
or degree of preference) [22.6, 15-17]. 

In order to explain the importance of the above 
monotonic relationship for data describing multi- 
criteria decision problems, let us consider the example 
of a data set concerning pupils’ achievements in a high 
school. Suppose that among the criteria used for eval- 
uation of the pupils there are results in Mathematics 
(Math) and Physics (Ph). There is also a General
Achievement (GA) result, which is considered as a classification
decision. The value sets of all three criteria
are composed of three values: bad, medium, and good. 
The preference order of these values is obvious: good 
is better than medium and bad, and medium is better 
than bad. The three values bad, medium, and good can 
be number-coded as 1, 2, and 3, respectively, making 
a gain-type criterion scale. One can also notice a seman- 
tic correlation between the two criteria and the classi- 
fication decision, which means that an improvement in 
one criterion should not worsen the classification de- 
cision, while the other criterion value is unchanged. 
Precisely, an improvement of a pupil’s score in Math 
or Ph, with other criterion value unchanged, should not 
worsen the pupil’s general achievement (GA), but rather
improve it. In general terms, this requirement is concordant
with the dominance principle defined in Sect. 22.1.

This semantic correlation is also called a mono- 
tonicity constraint, and thus, an alternative name of 
the classification problem with semantic correlation be- 
tween evaluation criteria and classification decision is 
ordinal classification with monotonicity constraints. 

Two questions naturally follow the consideration of 
this example: 


• What classification rules can be drawn from the
  pupils’ data set?

• How does the semantic correlation influence the
  classification rules?


The answer to the first question is: monotonic if 
..., then... decision rules. Each decision rule is char- 
acterized by a condition profile and a decision profile, 
corresponding to vectors of threshold values on evalua- 
tion criteria and on classification decision, respectively. 
The answer to the second question is that condition 
and decision profiles of a decision rule should observe 
the dominance principle (monotonicity constraint) if 
the rule has at least one pair of semantically correlated 
criteria spanned over the condition and decision part. 
We say that one profile dominates another if the values 
of criteria of the first profile are not worse than the val- 
ues of criteria of the second profile. 

Let us explain the dominance principle with respect 
to decision rules on the pupils’ example. Suppose that 
two rules induced from the pupils’ data set relate Math 
and Ph on the condition side, with GA on the decision 
side: 


• rule #1: if Math = medium and Ph = medium, then
  GA = good,

• rule #2: if Math = good and Ph = medium, then
  GA = medium.


The two rules do not observe the dominance princi- 
ple because the condition profile of rule #2 dominates 
the condition profile of rule #1, while the decision pro- 
file of rule #2 is dominated by the decision profile of 
rule #1. Thus, in the sense of the dominance principle, 
the two rules are inconsistent, i. e., they are wrong. 
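
The check described above can be carried out mechanically. The following is a minimal Python sketch, assuming the number coding 1, 2, 3 for bad, medium, and good introduced earlier; the dictionary layout and the function names `dominates` and `violates_dominance` are illustrative choices, not part of the original method.

```python
# Gain-type coding of the ordinal scale shared by Math, Ph and GA.
SCALE = {"bad": 1, "medium": 2, "good": 3}


def dominates(profile_a, profile_b):
    """True if profile_a is at least as good as profile_b on every criterion."""
    return all(SCALE[profile_a[q]] >= SCALE[profile_b[q]] for q in profile_b)


def violates_dominance(rule_a, rule_b):
    """True if one rule has a dominating condition profile but a strictly worse decision."""
    def worse(r1, r2):
        # r1 dominates r2 on the condition side, yet its decision is strictly worse
        return (dominates(r1["if"], r2["if"])
                and SCALE[r1["then"]["GA"]] < SCALE[r2["then"]["GA"]])
    return worse(rule_a, rule_b) or worse(rule_b, rule_a)


rule1 = {"if": {"Math": "medium", "Ph": "medium"}, "then": {"GA": "good"}}
rule2 = {"if": {"Math": "good", "Ph": "medium"}, "then": {"GA": "medium"}}

print(violates_dominance(rule1, rule2))  # True
```

Changing the decision of rule #2 to GA = good would make the pair consistent, because the dominating condition profile would then no longer lead to a worse decision.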

One could say that the above rules are true because 
they are supported by examples of pupils from the an- 
alyzed data set, but this would mean that the examples 
are also inconsistent. The inconsistency may come from 
many sources. Examples include: 


• Missing attributes (regular ones or criteria) in the
  description of objects. Maybe the data set does not
  include such attributes as the opinion of the pupil’s
  tutor expressed only verbally during an assessment
  of the pupil’s GA by a school assessment committee.

• Unstable preferences of decision makers. Maybe
  the members of the school assessment committee
  changed their view on the influence of Math on GA
  during the assessment.


Handling these inconsistencies is of crucial impor- 
tance for data structuring prior to induction of decision 
rules. They cannot be simply considered as noise or er- 
ror to be eliminated from data, or amalgamated with 
consistent data by some averaging operators. They 
should be identified and presented as uncertain rules. 

If the semantic correlation was ignored in prior 
knowledge, then the handling of the above-mentioned 
inconsistencies would be impossible. Indeed, there 
would be nothing wrong with rules #1 and #2. They 
would be supported by different examples discerned by 
the attributes considered. 

It has been acknowledged by many authors that 
rough set theory provides an excellent framework 
for dealing with inconsistencies in knowledge dis- 
covery [22.3, 18-24]. These authors show that the 
paradigm of rough set theory is that of granular com- 
puting, because the main concept of the theory (rough 
approximation of a set) is built up of blocks of ob- 
jects which are indiscernible by a given set of attributes, 
called granules of knowledge. In the space of regu- 
lar attributes, the indiscernibility granules are bounded 
sets. Decision rules induced from indiscernibility-based 
rough approximations are also built up of such granules. 

It appears, however, as demonstrated by the above 
pupils’ example, that rough sets and decision rules built 
up of indiscernibility granules are not able to handle 
inconsistency with respect to the dominance principle. 
For this reason, we have proposed an extension of the 
granular computing paradigm that enables us to take 
into account prior knowledge about multi-criteria eval- 
uation with monotonicity constraints. The combination 
of the new granules with the idea of rough approxima- 
tion is the DRSA approach [22.6, 8, 12-16, 25-27]. 

In the following, we present the concept of granules, 
which permit us to handle prior knowledge about multi- 
criteria evaluation with monotonicity constraints when 
inducing decision rules. 

Let $U$ be a finite set of objects (universe) and let $Q$
be a finite set of attributes divided into a set $C$ of
condition attributes and a set $D$ of decision attributes,
where $C \cap D = \emptyset$. Also, let $X_q$ be the set of possible
evaluations of considered objects with respect to
attribute $q \in Q$, so that

$$X_C = \prod_{q \in C} X_q \quad \text{and} \quad X_D = \prod_{q \in D} X_q$$

are attribute spaces corresponding to sets of condition
and decision attributes, respectively. The elements
of $X_C$ and $X_D$ can be interpreted as possible evaluations
of objects on attributes from set $C = \{1,\ldots,|C|\}$ and
from set $D = \{1,\ldots,|D|\}$, respectively. In the following,
with a slight abuse of notation, we shall denote the
value of object $x \in U$ on attribute $q \in Q$ by $x_q$.

Suppose, for simplicity, that all condition attributes 
in C and all decision attributes in D are criteria, and 
that C and D are semantically correlated. 

Let $\succeq_q$ be a weak preference relation on $U$, representing
a preference on the set of objects with respect
to criterion $q \in C \cup D$. Now, $x_q \succeq_q y_q$ means $x_q$ is at
least as good as $y_q$ with respect to criterion $q$. On the
one hand, we say that $x$ dominates $y$ with respect to
$P \subseteq C$ (shortly, $x$ P-dominates $y$) in the condition attribute
space $X_P$ (denoted by $x D_P y$) if $x_q \succeq_q y_q$ for all $q \in P$.
Assuming, without loss of generality, that the domains
of the criteria are number-coded (i.e., $X_q \subseteq \mathbb{R}$ for any
$q \in C$) and that they are ordered so that the preference
increases with the value (gain-type), we can say that
$x D_P y$ is equivalent to $x_q \geq y_q$ for all $q \in P$, $P \subseteq C$.
Observe that for each $x \in X_P$, $x D_P x$, i.e., P-dominance $D_P$
is reflexive. Moreover, for any $x, y, z \in X_P$, $x D_P y$ and
$y D_P z$ imply $x D_P z$, i.e., P-dominance $D_P$ is a transitive
relation. Being a reflexive and transitive relation,
P-dominance $D_P$ is a partial preorder. On the other hand,
the analogous definition holds in the decision attribute
space $X_R$, $R \subseteq D$, where $x_q \succeq_q y_q$ for all $q \in R$ will be
denoted by $x D_R y$.

The dominance relations $x D_P y$ and $x D_R y$ ($P \subseteq C$
and $R \subseteq D$) are directional statements where $x$ is a subject
and $y$ is a referent.

If $x \in X_P$ is the referent, then one can define a set of
objects $y \in X_P$ dominating $x$, called the P-dominating
set (denoted by $D_P^{+}(x)$) and defined as
$D_P^{+}(x) = \{y \in U : y D_P x\}$. If $x \in X_P$ is the subject, then one can
define a set of objects $y \in X_P$ dominated by $x$, called the
P-dominated set (denoted by $D_P^{-}(x)$) and defined as
$D_P^{-}(x) = \{y \in U : x D_P y\}$.

P-dominating sets $D_P^{+}(x)$ and P-dominated sets
$D_P^{-}(x)$ correspond to positive and negative dominance
cones in $X_P$, with the origin $x$.
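
As an illustration, the two dominance cones can be computed directly from their definitions. The sketch below is a hedged example: the four pupils and their evaluations are hypothetical data, and the function names are assumptions made for this illustration only.

```python
# Hypothetical pupils evaluated on Math and Ph, coded 1 (bad) to 3 (good).
pupils = {
    "a": {"Math": 3, "Ph": 3},
    "b": {"Math": 2, "Ph": 3},
    "c": {"Math": 2, "Ph": 1},
    "d": {"Math": 1, "Ph": 2},
}


def dominates(x, y, P):
    """x P-dominates y: x is at least as good as y on every criterion in P."""
    return all(pupils[x][q] >= pupils[y][q] for q in P)


def dominating_set(x, P):
    """P-dominating set of x: all objects y that P-dominate x (positive cone)."""
    return {y for y in pupils if dominates(y, x, P)}


def dominated_set(x, P):
    """P-dominated set of x: all objects y that x P-dominates (negative cone)."""
    return {y for y in pupils if dominates(x, y, P)}


P = ["Math", "Ph"]
print(sorted(dominating_set("c", P)))  # ['a', 'b', 'c']
print(sorted(dominated_set("b", P)))   # ['b', 'c', 'd']
```

Because P-dominance is reflexive, every object belongs to both of its own cones.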

With respect to the decision attribute space $X_R$
(where $R \subseteq D$), the R-dominance relation enables us to
define the following sets

$$Cl_R^{\geq x} = \{y \in U : y D_R x\}, \qquad Cl_R^{\leq x} = \{y \in U : x D_R y\} .$$

$Cl_{t_q} = \{x \in X_D : x_q = t_q\}$ is a decision class with respect
to $q \in D$. $Cl_R^{\geq x}$ is called the upward union of classes,
and $Cl_R^{\leq x}$ is the downward union of classes. If $y \in Cl_R^{\geq x}$,
then $y$ belongs to class $Cl_{t_q}$, $x_q = t_q$, or better, on each
decision attribute $q \in R$. On the other hand, if $y \in Cl_R^{\leq x}$,
then $y$ belongs to class $Cl_{t_q}$, $x_q = t_q$, or worse, on each
decision attribute $q \in R$. The upward and downward
unions of classes correspond to the positive and negative
dominance cones in $X_R$, respectively.

In this case, the granules of knowledge are open
sets in $X_P$ and $X_R$ defined by dominance cones $D_P^{+}(x)$,
$D_P^{-}(x)$ ($P \subseteq C$) and $Cl_R^{\geq x}$, $Cl_R^{\leq x}$ ($R \subseteq D$), respectively.
Then, classification rules to be induced from data are
functions representing granules $Cl_R^{\geq x}$, $Cl_R^{\leq x}$ by granules
$D_P^{+}(x)$, $D_P^{-}(x)$, respectively, in the condition attribute
space $X_P$, for any $P \subseteq C$ and $R \subseteq D$ and for any
$x \in X_P$.


22.3 The Dominance-based Rough Set Approach
to Multi-Criteria Classification


22.3.1 Granular Computing 
with Dominance Cones 


When inducing classification rules, a set $D$ of decision
attributes is, usually, a singleton, $D = \{d\}$. Let
us make this assumption for further presentation, although
it is not necessary for DRSA. The decision
attribute $d$ makes a partition of $U$ into a finite number
of classes, $Cl = \{Cl_t, t = 1,\ldots,n\}$. Each object $x \in U$
belongs to one and only one class, $Cl_t \in Cl$. The upward
and downward unions of classes boil down to, respectively,

$$Cl_t^{\geq} = \bigcup_{s \geq t} Cl_s, \qquad Cl_t^{\leq} = \bigcup_{s \leq t} Cl_s,$$

where $t = 1,\ldots,n$. Notice that for $t = 2,\ldots,n$ we have
$Cl_t^{\geq} = U - Cl_{t-1}^{\leq}$, i.e., all the objects not belonging to
class $Cl_t$ or better, belong to class $Cl_{t-1}$ or worse.

Let us explain how the rough set concept has been 
generalized in DRSA, so as to enable granular comput- 
ing with dominance cones. 

Given a set of criteria, $P \subseteq C$, the inclusion of an
object $x \in U$ to the upward union of classes $Cl_t^{\geq}$,
$t = 2,\ldots,n$, is inconsistent with the dominance principle if
one of the following conditions holds:


• $x$ belongs to class $Cl_t$ or better but it is P-dominated
  by an object $y$ belonging to a class worse than $Cl_t$,
  i.e., $x \in Cl_t^{\geq}$ but $D_P^{+}(x) \cap Cl_{t-1}^{\leq} \neq \emptyset$;

• $x$ belongs to a worse class than $Cl_t$ but it P-dominates
  an object $y$ belonging to class $Cl_t$ or better,
  i.e., $x \notin Cl_t^{\geq}$ but $D_P^{-}(x) \cap Cl_t^{\geq} \neq \emptyset$.


If, given a set of criteria $P \subseteq C$, the inclusion of
$x \in U$ to $Cl_t^{\geq}$, where $t = 2,\ldots,n$, is inconsistent with the
dominance principle, we say that $x$ belongs to $Cl_t^{\geq}$ with
some ambiguity. Thus, $x$ belongs to $Cl_t^{\geq}$ without any
ambiguity with respect to $P \subseteq C$, if $x \in Cl_t^{\geq}$ and there
is no inconsistency with the dominance principle. This
means that all objects P-dominating $x$ belong to $Cl_t^{\geq}$,
i.e., $D_P^{+}(x) \subseteq Cl_t^{\geq}$. Geometrically, this corresponds to
the inclusion of the complete set of objects contained
in the positive dominance cone originating in $x$, in the
positive dominance cone $Cl_t^{\geq}$ originating in $Cl_t$.

Furthermore, $x$ possibly belongs to $Cl_t^{\geq}$ with respect
to $P \subseteq C$ if one of the following conditions holds:


• According to decision attribute $d$, $x$ belongs to $Cl_t^{\geq}$.
• According to decision attribute $d$, $x$ does not belong
  to $Cl_t^{\geq}$, but it is inconsistent in the sense of the dominance
  principle with an object $y$ belonging to $Cl_t^{\geq}$.


In terms of ambiguity, $x$ possibly belongs to $Cl_t^{\geq}$
with respect to $P \subseteq C$, if $x$ belongs to $Cl_t^{\geq}$ with or
without any ambiguity. Due to the reflexivity of the
P-dominance relation $D_P$, the above conditions can be
summarized as follows: $x$ possibly belongs to class $Cl_t$
or better, with respect to $P \subseteq C$, if among the objects
P-dominated by $x$ there is an object $y$ belonging to
class $Cl_t$ or better, i.e.,

$$D_P^{-}(x) \cap Cl_t^{\geq} \neq \emptyset .$$

Geometrically, this corresponds to the non-empty intersection
of the set of objects contained in the negative
dominance cone originating in $x$, with the positive dominance
cone $Cl_t^{\geq}$ originating in $Cl_t$.


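For $P \subseteq C$, the set of all objects belonging to $Cl_t^{\geq}$
without any ambiguity constitutes the P-lower approximation
of $Cl_t^{\geq}$, denoted by $\underline{P}(Cl_t^{\geq})$, and the set of all
objects that possibly belong to $Cl_t^{\geq}$ constitutes the
P-upper approximation of $Cl_t^{\geq}$, denoted by $\overline{P}(Cl_t^{\geq})$. More
formally

$$\underline{P}(Cl_t^{\geq}) = \{x \in U : D_P^{+}(x) \subseteq Cl_t^{\geq}\},$$
$$\overline{P}(Cl_t^{\geq}) = \{x \in U : D_P^{-}(x) \cap Cl_t^{\geq} \neq \emptyset\},$$

where $t = 1,\ldots,n$. Analogously, one can define the
P-lower approximation and the P-upper approximation of
$Cl_t^{\leq}$

$$\underline{P}(Cl_t^{\leq}) = \{x \in U : D_P^{-}(x) \subseteq Cl_t^{\leq}\},$$
$$\overline{P}(Cl_t^{\leq}) = \{x \in U : D_P^{+}(x) \cap Cl_t^{\leq} \neq \emptyset\},$$

where $t = 1,\ldots,n$.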
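
The definitions of the P-lower and P-upper approximations of an upward union translate almost literally into set comprehensions. The following Python sketch uses a small hypothetical decision table in the spirit of the pupils' example (objects b and c reproduce the inconsistency between rules #1 and #2); all names and data are assumptions made for illustration.

```python
# Hypothetical decision table: (Math, Ph) evaluations and GA class,
# all coded 1 (bad), 2 (medium), 3 (good).
data = {
    "a": ((3, 3), 3),
    "b": ((2, 2), 3),
    "c": ((3, 2), 2),  # dominates b but has a worse class: inconsistent pair
    "d": ((1, 2), 1),
    "e": ((1, 1), 1),
}
U = set(data)


def dominates(x, y):
    return all(u >= v for u, v in zip(data[x][0], data[y][0]))


def d_plus(x):
    """P-dominating set of x."""
    return {y for y in U if dominates(y, x)}


def d_minus(x):
    """P-dominated set of x."""
    return {y for y in U if dominates(x, y)}


def upward(t):
    """Upward union: objects in class t or better."""
    return {x for x in U if data[x][1] >= t}


def lower_upward(t):
    """Objects whose whole dominating set lies in the upward union."""
    return {x for x in U if d_plus(x) <= upward(t)}


def upper_upward(t):
    """Objects whose dominated set intersects the upward union."""
    return {x for x in U if d_minus(x) & upward(t)}


print(sorted(lower_upward(3)))  # ['a']
print(sorted(upper_upward(3)))  # ['a', 'b', 'c']
```

In this table only object a belongs to the class "good or better" without ambiguity, while b and c fall between the lower and the upper approximation.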

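The P-lower and P-upper approximations of $Cl_t^{\geq}$,
$t = 1,\ldots,n$, can also be expressed in terms of unions
of positive dominance cones as follows

$$\underline{P}(Cl_t^{\geq}) = \bigcup_{D_P^{+}(x) \subseteq Cl_t^{\geq}} D_P^{+}(x),$$
$$\overline{P}(Cl_t^{\geq}) = \bigcup_{x \in Cl_t^{\geq}} D_P^{+}(x) .$$

Analogously, the P-lower and P-upper approximations
of $Cl_t^{\leq}$, $t = 1,\ldots,n$, can be expressed in terms of
unions of negative dominance cones as follows

$$\underline{P}(Cl_t^{\leq}) = \bigcup_{D_P^{-}(x) \subseteq Cl_t^{\leq}} D_P^{-}(x),$$
$$\overline{P}(Cl_t^{\leq}) = \bigcup_{x \in Cl_t^{\leq}} D_P^{-}(x) .$$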


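The P-lower and P-upper approximations so defined
satisfy the following inclusion properties for each
$t \in \{1,\ldots,n\}$ and for all $P \subseteq C$

$$\underline{P}(Cl_t^{\geq}) \subseteq Cl_t^{\geq} \subseteq \overline{P}(Cl_t^{\geq}),$$
$$\underline{P}(Cl_t^{\leq}) \subseteq Cl_t^{\leq} \subseteq \overline{P}(Cl_t^{\leq}) .$$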

All the objects belonging to $Cl_t^{\geq}$ and $Cl_t^{\leq}$ with some
ambiguity constitute the P-boundary of $Cl_t^{\geq}$ and $Cl_t^{\leq}$,
denoted by $Bn_P(Cl_t^{\geq})$ and $Bn_P(Cl_t^{\leq})$, respectively. They
can be represented, in terms of upper and lower approximations,
as follows

$$Bn_P(Cl_t^{\geq}) = \overline{P}(Cl_t^{\geq}) - \underline{P}(Cl_t^{\geq}),$$
$$Bn_P(Cl_t^{\leq}) = \overline{P}(Cl_t^{\leq}) - \underline{P}(Cl_t^{\leq}),$$

where $t = 1,\ldots,n$. The P-lower and P-upper approximations
of the unions of classes $Cl_t^{\geq}$ and $Cl_t^{\leq}$ have
an important complementarity property. It says that if
object $x$ belongs without any ambiguity to class $Cl_t$
or better, then it is impossible that it could belong to
class $Cl_{t-1}$ or worse, i.e.,

$$\underline{P}(Cl_t^{\geq}) = U - \overline{P}(Cl_{t-1}^{\leq}), \quad t = 2,\ldots,n .$$

Due to the complementarity property, $Bn_P(Cl_t^{\geq}) =
Bn_P(Cl_{t-1}^{\leq})$, for $t = 2,\ldots,n$, which means that if $x$ belongs
with ambiguity to class $Cl_t$ or better, then it also
belongs with ambiguity to class $Cl_{t-1}$ or worse.
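
The complementarity property and the resulting equality of boundaries can be checked numerically. The sketch below does this on a randomly generated decision table; the table size, the 1 to 4 scales, the random seed, and all function names are assumptions chosen only for this illustration.

```python
import random

random.seed(0)
N_CLASSES = 4
# Hypothetical table: 30 objects, two gain-type criteria on a 1..4 scale,
# each object assigned at random to one of four ordered classes.
objs = [((random.randint(1, 4), random.randint(1, 4)),
         random.randint(1, N_CLASSES)) for _ in range(30)]
U = set(range(len(objs)))


def dom(i, j):
    return all(a >= b for a, b in zip(objs[i][0], objs[j][0]))


def d_plus(i):
    return {j for j in U if dom(j, i)}


def d_minus(i):
    return {j for j in U if dom(i, j)}


def up(t):
    return {i for i in U if objs[i][1] >= t}


def down(t):
    return {i for i in U if objs[i][1] <= t}


def lower_up(t):
    return {i for i in U if d_plus(i) <= up(t)}


def upper_up(t):
    return {i for i in U if d_minus(i) & up(t)}


def lower_down(t):
    return {i for i in U if d_minus(i) <= down(t)}


def upper_down(t):
    return {i for i in U if d_plus(i) & down(t)}


def complementarity_holds():
    """Check both the complementarity property and the boundary equality."""
    for t in range(2, N_CLASSES + 1):
        if lower_up(t) != U - upper_down(t - 1):
            return False
        if upper_up(t) - lower_up(t) != upper_down(t - 1) - lower_down(t - 1):
            return False
    return True


print(complementarity_holds())  # True
```

The result does not depend on the seed: both equalities follow from the definitions for any decision table.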

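Considering application of the lower and the upper
approximations based on dominance $D_P$, $P \subseteq C$, to any
set $X \subseteq U$, instead of the unions of classes $Cl_t^{\geq}$ and
$Cl_t^{\leq}$, one obtains upward lower and upper approximations
$\underline{P}^{\geq}(X)$ and $\overline{P}^{\geq}(X)$, as well as downward lower
and upper approximations $\underline{P}^{\leq}(X)$ and $\overline{P}^{\leq}(X)$, as follows

$$\underline{P}^{\geq}(X) = \{x \in U : D_P^{+}(x) \subseteq X\},$$
$$\overline{P}^{\geq}(X) = \{x \in U : D_P^{-}(x) \cap X \neq \emptyset\},$$
$$\underline{P}^{\leq}(X) = \{x \in U : D_P^{-}(x) \subseteq X\},$$
$$\overline{P}^{\leq}(X) = \{x \in U : D_P^{+}(x) \cap X \neq \emptyset\} .$$

From the definition of rough approximations
$\underline{P}^{\geq}(X)$, $\overline{P}^{\geq}(X)$, $\underline{P}^{\leq}(X)$ and $\overline{P}^{\leq}(X)$, we can also obtain
the following properties of the P-lower and P-upper approximations
[22.28, 29]:


1. $\underline{P}^{\geq}(\emptyset) = \overline{P}^{\geq}(\emptyset) = \underline{P}^{\leq}(\emptyset) = \overline{P}^{\leq}(\emptyset) = \emptyset$,
   $\underline{P}^{\geq}(U) = \overline{P}^{\geq}(U) = \underline{P}^{\leq}(U) = \overline{P}^{\leq}(U) = U$,
2. $\overline{P}^{\geq}(X \cup Y) = \overline{P}^{\geq}(X) \cup \overline{P}^{\geq}(Y)$,
   $\overline{P}^{\leq}(X \cup Y) = \overline{P}^{\leq}(X) \cup \overline{P}^{\leq}(Y)$,
3. $\underline{P}^{\geq}(X \cap Y) = \underline{P}^{\geq}(X) \cap \underline{P}^{\geq}(Y)$,
   $\underline{P}^{\leq}(X \cap Y) = \underline{P}^{\leq}(X) \cap \underline{P}^{\leq}(Y)$,
4. $X \subseteq Y \Rightarrow \underline{P}^{\geq}(X) \subseteq \underline{P}^{\geq}(Y)$,
   $X \subseteq Y \Rightarrow \underline{P}^{\leq}(X) \subseteq \underline{P}^{\leq}(Y)$,
5. $X \subseteq Y \Rightarrow \overline{P}^{\geq}(X) \subseteq \overline{P}^{\geq}(Y)$,
   $X \subseteq Y \Rightarrow \overline{P}^{\leq}(X) \subseteq \overline{P}^{\leq}(Y)$,
6. $\underline{P}^{\geq}(X \cup Y) \supseteq \underline{P}^{\geq}(X) \cup \underline{P}^{\geq}(Y)$,
   $\underline{P}^{\leq}(X \cup Y) \supseteq \underline{P}^{\leq}(X) \cup \underline{P}^{\leq}(Y)$,
7. $\overline{P}^{\geq}(X \cap Y) \subseteq \overline{P}^{\geq}(X) \cap \overline{P}^{\geq}(Y)$,
   $\overline{P}^{\leq}(X \cap Y) \subseteq \overline{P}^{\leq}(X) \cap \overline{P}^{\leq}(Y)$,
8. $\underline{P}^{\geq}(\underline{P}^{\geq}(X)) = \overline{P}^{\geq}(\underline{P}^{\geq}(X)) = \underline{P}^{\geq}(X)$,
   $\underline{P}^{\leq}(\underline{P}^{\leq}(X)) = \overline{P}^{\leq}(\underline{P}^{\leq}(X)) = \underline{P}^{\leq}(X)$,
9. $\overline{P}^{\geq}(\overline{P}^{\geq}(X)) = \underline{P}^{\geq}(\overline{P}^{\geq}(X)) = \overline{P}^{\geq}(X)$,
   $\overline{P}^{\leq}(\overline{P}^{\leq}(X)) = \underline{P}^{\leq}(\overline{P}^{\leq}(X)) = \overline{P}^{\leq}(X)$.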

From the knowledge discovery point of view,
P-lower approximations of unions of classes represent
certain knowledge provided by criteria from
$P \subseteq C$, while P-upper approximations represent possible
knowledge and the P-boundaries contain doubtful
knowledge provided by the criteria from $P \subseteq C$.


22.3.2 Variable Consistency 
Dominance-Based Rough Set 
Approach (VC-DRSA) 


The above definitions of rough approximations are 
based on a strict application of the dominance princi- 
ple. However, when defining non-ambiguous objects, 
it is reasonable to accept a limited proportion of neg- 
ative examples, particularly for large data tables. This 
relaxed version of DRSA is called the variable con- 
sistency dominance-based rough set approach (VC- 
DRSA) model [22.30]. 

For any $P \subseteq C$, we say that $x \in U$ belongs to $Cl_t^{\geq}$
with no ambiguity at consistency level $l \in (0, 1]$, if
$x \in Cl_t^{\geq}$ and at least $l \times 100\%$ of all objects $y \in U$ dominating
$x$ with respect to $P$ also belong to $Cl_t^{\geq}$, i.e.,

$$\frac{|D_P^{+}(x) \cap Cl_t^{\geq}|}{|D_P^{+}(x)|} \geq l .$$

The term $|D_P^{+}(x) \cap Cl_t^{\geq}| / |D_P^{+}(x)|$ is called rough
membership and can be interpreted as conditional probability
$\Pr(y \in Cl_t^{\geq} \mid y \in D_P^{+}(x))$. The level $l$ is called the
consistency level because it controls the degree of consistency
between objects qualified as belonging to $Cl_t^{\geq}$
without any ambiguity. In other words, if $l < 1$, then at
most $(1 - l) \times 100\%$ of all objects $y \in U$ dominating $x$
with respect to $P$ do not belong to $Cl_t^{\geq}$ and thus contradict
the inclusion of $x$ in $Cl_t^{\geq}$.

Analogously, for any $P \subseteq C$ we say that $x \in U$ belongs
to $Cl_t^{\leq}$ with no ambiguity at consistency level
$l \in (0, 1]$, if $x \in Cl_t^{\leq}$ and at least $l \times 100\%$ of all the objects
$y \in U$ dominated by $x$ with respect to $P$ also belong
to $Cl_t^{\leq}$, i.e.,

$$\frac{|D_P^{-}(x) \cap Cl_t^{\leq}|}{|D_P^{-}(x)|} \geq l .$$

The rough membership $|D_P^{-}(x) \cap Cl_t^{\leq}| / |D_P^{-}(x)|$ can
be interpreted as conditional probability
$\Pr(y \in Cl_t^{\leq} \mid y \in D_P^{-}(x))$. Thus, for any $P \subseteq C$, each object $x \in U$
is either ambiguous or non-ambiguous at consistency
level $l$ with respect to the upward union $Cl_t^{\geq}$
($t = 2,\ldots,n$) or with respect to the downward union $Cl_t^{\leq}$
($t = 1,\ldots,n-1$).

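The concept of non-ambiguous objects at some
consistency level $l$ naturally leads to the definition of
P-lower approximations of the unions of classes $Cl_t^{\geq}$
and $Cl_t^{\leq}$, which can be formally presented as follows

$$\underline{P}^{l}(Cl_t^{\geq}) = \left\{ x \in Cl_t^{\geq} : \frac{|D_P^{+}(x) \cap Cl_t^{\geq}|}{|D_P^{+}(x)|} \geq l \right\},$$
$$\underline{P}^{l}(Cl_t^{\leq}) = \left\{ x \in Cl_t^{\leq} : \frac{|D_P^{-}(x) \cap Cl_t^{\leq}|}{|D_P^{-}(x)|} \geq l \right\} .$$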

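Given $P \subseteq C$ and consistency level $l$, we can define
the P-upper approximations of $Cl_t^{\geq}$ and $Cl_t^{\leq}$, denoted
by $\overline{P}^{l}(Cl_t^{\geq})$ and $\overline{P}^{l}(Cl_t^{\leq})$, respectively, by complementation
of $\underline{P}^{l}(Cl_{t-1}^{\leq})$ and $\underline{P}^{l}(Cl_{t+1}^{\geq})$ with respect to $U$ as
follows

$$\overline{P}^{l}(Cl_t^{\geq}) = U - \underline{P}^{l}(Cl_{t-1}^{\leq}), \quad t = 2,\ldots,n,$$
$$\overline{P}^{l}(Cl_t^{\leq}) = U - \underline{P}^{l}(Cl_{t+1}^{\geq}), \quad t = 1,\ldots,n-1 .$$

$\overline{P}^{l}(Cl_t^{\geq})$ can be interpreted as the set of all the objects
belonging to $Cl_t^{\geq}$, which are possibly ambiguous
at consistency level $l$. Analogously, $\overline{P}^{l}(Cl_t^{\leq})$ can be interpreted
as the set of all the objects belonging to $Cl_t^{\leq}$,
which are possibly ambiguous at consistency level $l$.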


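The P-boundaries (P-doubtful regions) of $Cl_t^{\geq}$ and $Cl_t^{\leq}$
are defined as

$$Bn_P^{l}(Cl_t^{\geq}) = \overline{P}^{l}(Cl_t^{\geq}) - \underline{P}^{l}(Cl_t^{\geq}),$$
$$Bn_P^{l}(Cl_t^{\leq}) = \overline{P}^{l}(Cl_t^{\leq}) - \underline{P}^{l}(Cl_t^{\leq}),$$

where $t = 1,\ldots,n$. The VC-DRSA model provides
some degree of flexibility in assigning objects to lower
and upper approximations of the unions of decision
classes. It can easily be demonstrated that for
$0 < l' < l \leq 1$ and $t = 2,\ldots,n$,

$$\underline{P}^{l}(Cl_t^{\geq}) \subseteq \underline{P}^{l'}(Cl_t^{\geq}) \quad \text{and} \quad \overline{P}^{l'}(Cl_t^{\geq}) \subseteq \overline{P}^{l}(Cl_t^{\geq}) .$$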


The VC-DRSA model was inspired by Ziarko’s
model of the variable precision rough set approach
[22.31]. However, there is a significant difference
in the definition of rough approximations because
$\underline{P}^{l}(Cl_t^{\geq})$ and $\overline{P}^{l}(Cl_t^{\geq})$ are composed of non-ambiguous
and ambiguous objects at the consistency level $l$, respectively,
while Ziarko’s $\underline{P}^{l}(Cl_t)$ and $\overline{P}^{l}(Cl_t)$ are composed
of P-indiscernibility sets such that at least
$l \times 100\%$ of these sets are included in $Cl_t$ or have a
non-empty intersection with $Cl_t$, respectively. If one
would like to use Ziarko’s definition of variable precision
rough approximations in the context of multiple-criteria
classification, then the P-indiscernibility sets
should be substituted by P-dominating sets $D_P^{+}(x)$.
However, then the notion of ambiguity that naturally
leads to the general definition of rough approximations
[22.21] loses its meaning. Moreover, a bad
side effect of the direct use of Ziarko’s definition
is that a lower approximation $\underline{P}^{l}(Cl_t^{\geq})$ may include
objects $y$ assigned to $Cl_h$, where $h$ is much less
than $t$, if $y$ belongs to $D_P^{+}(x)$, which was included
in $\underline{P}^{l}(Cl_t^{\geq})$. When the decision classes are preference
ordered, it is reasonable to expect that objects assigned
to far worse classes than the considered union
are not counted to the lower approximation of this
union.

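The VC-DRSA model presented above has been
generalized in [22.32, 33]. The generalized model applies
two types of consistency measures in the definition
of lower approximations:


• Gain-type consistency measures $f_{\geq t}^{P}(x)$, $f_{\leq t}^{P}(x)$

$$\underline{P}^{\alpha_{\geq t}}(Cl_t^{\geq}) = \{x \in Cl_t^{\geq} : f_{\geq t}^{P}(x) \geq \alpha_{\geq t}\},$$
$$\underline{P}^{\alpha_{\leq t}}(Cl_t^{\leq}) = \{x \in Cl_t^{\leq} : f_{\leq t}^{P}(x) \geq \alpha_{\leq t}\},$$

• Cost-type consistency measures $g_{\geq t}^{P}(x)$, $g_{\leq t}^{P}(x)$

$$\underline{P}^{\beta_{\geq t}}(Cl_t^{\geq}) = \{x \in Cl_t^{\geq} : g_{\geq t}^{P}(x) \leq \beta_{\geq t}\},$$
$$\underline{P}^{\beta_{\leq t}}(Cl_t^{\leq}) = \{x \in Cl_t^{\leq} : g_{\leq t}^{P}(x) \leq \beta_{\leq t}\},$$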
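where $\alpha_{\geq t}$, $\alpha_{\leq t}$, $\beta_{\geq t}$, $\beta_{\leq t}$ are threshold values on
the consistency measures that condition the inclusion
of object $x$ in the P-lower approximation of $Cl_t^{\geq}$ or
$Cl_t^{\leq}$. Here are the consistency measures considered
in [22.33]: for all $x \in U$ and $P \subseteq C$

$$\mu_{\geq t}^{P}(x) = \frac{|D_P^{+}(x) \cap Cl_t^{\geq}|}{|D_P^{+}(x)|}, \qquad
\mu_{\leq t}^{P}(x) = \frac{|D_P^{-}(x) \cap Cl_t^{\leq}|}{|D_P^{-}(x)|},$$

$$\overline{\mu}_{\geq t}^{P}(x) = \max_{R \subseteq P,\; z \in D_R^{-}(x) \cap Cl_t^{\geq}} \frac{|D_R^{+}(z) \cap Cl_t^{\geq}|}{|D_R^{+}(z)|}, \qquad
\overline{\mu}_{\leq t}^{P}(x) = \max_{R \subseteq P,\; z \in D_R^{+}(x) \cap Cl_t^{\leq}} \frac{|D_R^{-}(z) \cap Cl_t^{\leq}|}{|D_R^{-}(z)|},$$

$$B_{\geq t}^{P}(x) = \frac{|D_P^{+}(x) \cap Cl_t^{\geq}| \, |Cl_{t-1}^{\leq}|}{|D_P^{+}(x) \cap Cl_{t-1}^{\leq}| \, |Cl_t^{\geq}|}, \quad t = 2,\ldots,n,$$

$$B_{\leq t}^{P}(x) = \frac{|D_P^{-}(x) \cap Cl_t^{\leq}| \, |Cl_{t+1}^{\geq}|}{|D_P^{-}(x) \cap Cl_{t+1}^{\geq}| \, |Cl_t^{\leq}|}, \quad t = 1,\ldots,n-1,$$

$$\varepsilon_{\geq t}^{P}(x) = \frac{|D_P^{+}(x) \cap Cl_{t-1}^{\leq}|}{|Cl_{t-1}^{\leq}|}, \quad t = 2,\ldots,n,$$

$$\varepsilon_{\leq t}^{P}(x) = \frac{|D_P^{-}(x) \cap Cl_{t+1}^{\geq}|}{|Cl_{t+1}^{\geq}|}, \quad t = 1,\ldots,n-1,$$

$$\varepsilon_{\geq t}^{\prime P}(x) = \frac{|D_P^{+}(x) \cap Cl_{t-1}^{\leq}|}{|Cl_t^{\geq}|}, \quad t = 2,\ldots,n,$$

$$\varepsilon_{\leq t}^{\prime P}(x) = \frac{|D_P^{-}(x) \cap Cl_{t+1}^{\geq}|}{|Cl_t^{\leq}|}, \quad t = 1,\ldots,n-1,$$

$$\varepsilon_{\geq t}^{*P}(x) = \max_{r \leq t} \varepsilon_{\geq r}^{P}(x), \qquad
\varepsilon_{\leq t}^{*P}(x) = \max_{r \geq t} \varepsilon_{\leq r}^{P}(x),$$

with

$$\mu_{\geq t}^{P}(x),\; \mu_{\leq t}^{P}(x),\; \overline{\mu}_{\geq t}^{P}(x),\; \overline{\mu}_{\leq t}^{P}(x),\; B_{\geq t}^{P}(x),\; B_{\leq t}^{P}(x)$$

being gain-type consistency measures and

$$\varepsilon_{\geq t}^{P}(x),\; \varepsilon_{\leq t}^{P}(x),\; \varepsilon_{\geq t}^{*P}(x),\; \varepsilon_{\leq t}^{*P}(x),\; \varepsilon_{\geq t}^{\prime P}(x),\; \varepsilon_{\leq t}^{\prime P}(x)$$

being cost-type consistency measures.

To be concordant with the rough set philosophy,
consistency measures should enjoy some monotonicity
properties (Table 22.1). A consistency measure is
monotonic if it does not decrease (or does not increase)
when:


(m1) The set of attributes is growing.

(m2) The set of objects is growing.

(m3) The union of ordered classes is growing.

(m4) $x$ improves its evaluation, so that it dominates
more objects.


Table 22.1 Monotonicity properties of consistency measures (after [22.33])

Consistency measure                                                          (m1)  (m2)  (m3)  (m4)
$\mu_{\geq t}^{P}(x)$, $\mu_{\leq t}^{P}(x)$ (rough membership)              no    yes   yes   no
$\overline{\mu}_{\geq t}^{P}(x)$, $\overline{\mu}_{\leq t}^{P}(x)$           yes   yes   yes   yes
$B_{\geq t}^{P}(x)$, $B_{\leq t}^{P}(x)$ (Bayesian)                          no    no    no    no
$\varepsilon_{\geq t}^{P}(x)$, $\varepsilon_{\leq t}^{P}(x)$                 yes   yes   no    yes
$\varepsilon_{\geq t}^{*P}(x)$, $\varepsilon_{\leq t}^{*P}(x)$               yes   yes   yes   yes
$\varepsilon_{\geq t}^{\prime P}(x)$, $\varepsilon_{\leq t}^{\prime P}(x)$   yes   yes   yes   yes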


For every $P \subseteq C$, the objects being consistent in the
sense of the dominance principle with all upward and
downward unions of classes are called P-correctly classified.
For every $P \subseteq C$, the quality of approximation of
classification $Cl$ by the set of criteria $P$ is defined as the
ratio between the number of P-correctly classified objects
and the number of all the objects in the decision
table. Since the objects which are P-correctly classified
are those that do not belong to any P-boundary
of unions $Cl_t^{\geq}$ and $Cl_t^{\leq}$, $t = 1,\ldots,n$, the quality of
approximation of classification $Cl$ by the set of criteria $P$
can be written as

$$\gamma_P(Cl) = \frac{\left| U - \bigcup_{t \in \{1,\ldots,n\}} Bn_P(Cl_t^{\geq}) \right|}{|U|}
= \frac{\left| U - \bigcup_{t \in \{1,\ldots,n\}} Bn_P(Cl_t^{\leq}) \right|}{|U|} .$$

$\gamma_P(Cl)$ can be seen as a measure of the quality of
knowledge that can be extracted from the decision table,
where $P$ is the set of criteria and $Cl$ is the classification
considered.
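
The quality of approximation can be computed by collecting all objects that fall into some P-boundary. The sketch below is a hedged illustration on a hypothetical five-object table; it only scans the boundaries of the upward unions, which suffices because each of them coincides with the boundary of the complementary downward union. Function names are assumptions of this example.

```python
from fractions import Fraction

# Hypothetical decision table: criteria values and GA class, coded 1..3.
data = {
    "a": ({"Math": 3, "Ph": 3}, 3),
    "b": ({"Math": 2, "Ph": 2}, 3),
    "c": ({"Math": 3, "Ph": 2}, 2),
    "d": ({"Math": 1, "Ph": 2}, 1),
    "e": ({"Math": 1, "Ph": 1}, 1),
}


def boundary_objects(P):
    """Objects belonging to the P-boundary of at least one upward union."""
    def dom(x, y):
        return all(data[x][0][q] >= data[y][0][q] for q in P)

    classes = sorted({cls for _, cls in data.values()})
    ambiguous = set()
    for t in classes[1:]:  # the upward union of the worst class is U itself
        up = {x for x in data if data[x][1] >= t}
        lower = {x for x in data if all(y in up for y in data if dom(y, x))}
        upper = {x for x in data if any(dom(x, y) for y in up)}
        ambiguous |= upper - lower
    return ambiguous


def quality(P):
    """Share of P-correctly classified objects in the decision table."""
    return Fraction(len(data) - len(boundary_objects(P)), len(data))


print(sorted(boundary_objects(["Math", "Ph"])))  # ['b', 'c']
print(quality(["Math", "Ph"]))                    # 3/5
print(quality(["Math"]))                          # 2/5
```

Dropping Ph lowers the quality from 3/5 to 2/5 here, because with Math alone object a also becomes indistinguishable from c, which has a worse class.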

Each minimal subset $P \subseteq C$, such that $\gamma_P(Cl) =
\gamma_C(Cl)$, is called a reduct of $Cl$ and is denoted by
$RED_{Cl}$. Note that a decision table can have more than
one reduct. The intersection of all reducts is called the
core and is denoted by $CORE_{Cl}$. Criteria from $CORE_{Cl}$
cannot be removed from the decision table without deteriorating
the knowledge to be discovered. This means
that in set $C$ there are three categories of criteria:


• Indispensable criteria included in the core.

• Exchangeable criteria included in some reducts but
  not in the core.

• Redundant criteria that are neither indispensable
  nor exchangeable, thus not included in any reduct.
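
For small criteria sets, reducts and the core can be found by exhaustive search over subsets. The sketch below is a brute-force illustration, not an efficient algorithm; the table with the third criterion Lit and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

CRITERIA = ("Math", "Ph", "Lit")
# Hypothetical decision table: criteria values and class, coded 1..3.
data = {
    "a": ({"Math": 3, "Ph": 3, "Lit": 2}, 3),
    "b": ({"Math": 2, "Ph": 2, "Lit": 2}, 3),
    "c": ({"Math": 3, "Ph": 2, "Lit": 1}, 2),
    "d": ({"Math": 1, "Ph": 2, "Lit": 1}, 1),
    "e": ({"Math": 1, "Ph": 1, "Lit": 1}, 1),
}


def quality(P):
    """Quality of approximation of the classification by criteria P."""
    def dom(x, y):
        return all(data[x][0][q] >= data[y][0][q] for q in P)

    classes = sorted({cls for _, cls in data.values()})
    ambiguous = set()
    for t in classes[1:]:
        up = {x for x in data if data[x][1] >= t}
        lower = {x for x in data if all(y in up for y in data if dom(y, x))}
        upper = {x for x in data if any(dom(x, y) for y in up)}
        ambiguous |= upper - lower
    return Fraction(len(data) - len(ambiguous), len(data))


def reducts():
    """Minimal subsets of CRITERIA preserving the quality of the full set."""
    target = quality(CRITERIA)
    found = []
    for size in range(len(CRITERIA) + 1):  # smaller subsets first
        for P in combinations(CRITERIA, size):
            if quality(P) == target and not any(set(r) <= set(P) for r in found):
                found.append(P)
    return found


found = reducts()
core = set.intersection(*(set(r) for r in found))
print(found)         # [('Math', 'Lit')]
print(sorted(core))  # ['Lit', 'Math']
```

In this table Math and Lit are indispensable and Ph is redundant; a table with several reducts would additionally show exchangeable criteria.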


Note that reducts are minimal subsets of crite- 
ria conveying the relevant knowledge contained in the 
decision table. This knowledge is relevant for the ex- 
planation of patterns in a given decision table but not 
necessarily for prediction. 

It has been shown in [22.34] that the quality of 
classification satisfies properties of set functions called 
fuzzy measures. For this reason, we can use the quality 
of classification for the calculation of indices that mea- 
sure the relevance of particular attributes and/or criteria, 
in addition to the strength of interactions between them. 
The useful indices are: the value index and interaction 
indices of Shapley and Banzhaf; the interaction indices 
of Murofushi-Soneda and Roubens; and the Möbius
representation [22.15]. All these indices can help to as- 
sess the interaction between the criteria considered and 
can help us to choose the best reduct. 


22.3.3 Stochastic Dominance-based Rough 
Set Approach 


From a probabilistic point of view, the assignment of
object $x_i$ to at least class $t$ can be made with probability
$\Pr(y_i \geq t \mid x_i)$, where $y_i$ is the classification decision
for $x_i$, $t = 1,\ldots,n$. This probability is supposed to satisfy
the usual axioms of probability

$$\Pr(y_i \geq 1 \mid x_i) = 1,$$
$$\Pr(y_i \leq t \mid x_i) = 1 - \Pr(y_i \geq t+1 \mid x_i), \text{ and}$$
$$\Pr(y_i \geq t \mid x_i) \leq \Pr(y_i \geq t' \mid x_i) \text{ for } t \geq t' .$$

These probabilities are unknown but can be estimated
from data.

For each class $t = 2,\ldots,n$, we have a binary problem
of estimating the conditional probabilities
$\Pr(y_i \geq t \mid x_i) = 1 - \Pr(y_i < t \mid x_i)$. It can be solved by isotonic
regression [22.35]. Let $y_{it} = 1$ if $y_i \geq t$, otherwise $y_{it} = 0$.
Let also $p_{it}$ be the estimate of the probability
$\Pr(y_i \geq t \mid x_i)$. Then, choose estimates $p_{it}^{*}$ which minimize the
squared distance to the class assignment $y_{it}$, subject to
the monotonicity constraints

$$\text{Minimize } \sum_{i=1}^{|U|} (y_{it} - p_{it})^2$$
$$\text{subject to } p_{it} \geq p_{jt} \text{ for all } x_i, x_j \in U \text{ such that } x_i \succeq x_j,$$

where $x_i \succeq x_j$ means that $x_i$ dominates $x_j$.
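
For intuition, the isotonic regression problem has a simple solution in the special case of a single criterion with pairwise distinct evaluations, where dominance is a total order. The sketch below uses the classical pool-adjacent-violators procedure for that special case only; it is not the general partial-order solver referred to in [22.35], and the label sequence, the threshold value 3/5, and the function name are assumptions of this illustration.

```python
from fractions import Fraction


def isotonic_chain(labels):
    """Least-squares non-decreasing fit to labels (pool adjacent violators).

    labels[i] is the 0/1 indicator of "class at least t" for the i-th object,
    with objects sorted by increasing evaluation on the single criterion.
    """
    blocks = []  # each block holds [sum of labels, number of objects]
    for y in labels:
        blocks.append([Fraction(y), 1])
        # merge while the previous block mean exceeds the last block mean
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


y_t = [0, 1, 0, 0, 1, 1]          # hypothetical indicators, worst object first
p = isotonic_chain(y_t)
alpha = Fraction(3, 5)            # assumed threshold
at_least_t = [i for i, v in enumerate(p) if v >= alpha]
at_most_t_minus_1 = [i for i, v in enumerate(p) if v <= 1 - alpha]

print([str(v) for v in p])   # ['0', '1/3', '1/3', '1/3', '1', '1']
print(at_least_t)            # [4, 5]
print(at_most_t_minus_1)     # [0, 1, 2, 3]
```

The violating run 1, 0, 0 is pooled into three estimates of 1/3, so the positively labeled object at index 1 ends up on the "at most t - 1" side at this threshold.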


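Then, stochastic $\alpha$-lower approximations for classes
at least $t$ and at most $t - 1$ can be defined as

$$\underline{P}^{\alpha}(Cl_t^{\geq}) = \{x_i \in U : \Pr(y_i \geq t \mid x_i) \geq \alpha\},$$
$$\underline{P}^{\alpha}(Cl_{t-1}^{\leq}) = \{x_i \in U : \Pr(y_i < t \mid x_i) \geq \alpha\} .$$

Replacing the unknown probabilities $\Pr(y_i \geq t \mid x_i)$
and $\Pr(y_i < t \mid x_i)$ by their estimates $p_{it}^{*}$ and $1 - p_{it}^{*}$
obtained from isotonic regression, we obtain

$$\underline{P}^{\alpha}(Cl_t^{\geq}) = \{x_i \in U : p_{it}^{*} \geq \alpha\},$$
$$\underline{P}^{\alpha}(Cl_{t-1}^{\leq}) = \{x_i \in U : p_{it}^{*} \leq 1 - \alpha\},$$

where parameter $\alpha \in [0.5, 1]$ controls the allowed
amount of inconsistency.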

Solving isotonic regression requires $O(|U|^4)$ time,
but a good heuristic needs only $O(|U|^2)$. In fact, as
shown in [22.35], we do not really need to know the
probability estimates to obtain stochastic lower approximations.
We only need to know for which object $x_i$,
$p_{it}^{*} \geq \alpha$ and for which $x_i$, $p_{it}^{*} \leq 1 - \alpha$. This can be found
by solving a linear programming (reassignment) problem.

As before, $y_{it} = 1$ if $y_i \geq t$, otherwise $y_{it} = 0$. Let $d_{it}$
be the decision variable which determines a new class
assignment for object $x_i$. Then, reassign objects from
union of classes indicated by $y_{it}$ to the union of classes
indicated by $d_{it}^{*}$, such that the new class assignments
are consistent with the dominance principle, where $d_{it}^{*}$
results from solving the following linear programming
problem

$$\text{Minimize } \sum_{i=1}^{|U|} w_{y_{it}} \, |y_{it} - d_{it}|$$
$$\text{subject to } d_{it} \geq d_{jt} \text{ for all } x_i, x_j \in U \text{ such that } x_i \succeq x_j,$$

where $w_{y_{it}}$ are some positive weights and $x_i \succeq x_j$ means
that $x_i$ dominates $x_j$.

Due to unimodularity of the constraint matrix, the
optimal solution of this linear programming problem is
always integer, i.e., $d_{it}^{*} \in \{0, 1\}$. For all objects consistent
with the dominance principle, $d_{it}^{*} = y_{it}$. If we set
$w_0 = \alpha$ and $w_1 = 1 - \alpha$, then the optimal solution $d_{it}^{*}$
satisfies: $d_{it}^{*} = 1 \Leftrightarrow p_{it}^{*} \geq \alpha$. If we set $w_0 = 1 - \alpha$ and
$w_1 = \alpha$, then the optimal solution $d_{it}^{*}$ satisfies:
$d_{it}^{*} = 0 \Leftrightarrow p_{it}^{*} \leq 1 - \alpha$.

For each $t = 2,\ldots,n$, solving the reassignment
problem twice, we can obtain the lower approximations
$\underline{P}^{\alpha}(Cl_t^{\geq})$, $\underline{P}^{\alpha}(Cl_{t-1}^{\leq})$, without knowing the probability
estimates!
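
On a tiny table, the reassignment problem can be illustrated without a linear programming solver. Since the optimal solution is known to be binary, the sketch below simply enumerates all 0/1 assignments that respect the dominance constraints and keeps the cheapest one. This brute-force search is a stand-in chosen for readability, not the method of [22.35]; the data, the value 3/5 for the threshold, and the function name are assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical objects: two gain-type criteria and the indicator y_it
# (1 if the object is in class t or better, otherwise 0).
objs = {
    "a": ((3, 3), 1),
    "b": ((2, 2), 1),
    "c": ((3, 2), 0),  # dominates b but is labeled 0: inconsistent pair
    "d": ((1, 2), 0),
    "e": ((1, 1), 0),
}


def reassign(w0, w1):
    """Cheapest monotone 0/1 relabeling; w0 and w1 weight changes of 0- and 1-labels."""
    names = list(objs)

    def dom(i, j):
        return all(u >= v for u, v in zip(objs[i][0], objs[j][0]))

    best, best_cost = None, None
    for bits in product((0, 1), repeat=len(names)):
        d = dict(zip(names, bits))
        # monotonicity: a dominating object may not receive a smaller label
        if any(dom(i, j) and d[i] < d[j] for i in names for j in names):
            continue
        cost = sum((w1 if objs[i][1] else w0) * abs(objs[i][1] - d[i])
                   for i in names)
        if best_cost is None or cost < best_cost:
            best, best_cost = d, cost
    return best


alpha = Fraction(3, 5)
lower_up = sorted(k for k, v in reassign(alpha, 1 - alpha).items() if v == 1)
lower_down = sorted(k for k, v in reassign(1 - alpha, alpha).items() if v == 0)

print(lower_up)    # ['a']
print(lower_down)  # ['d', 'e']
```

The inconsistent objects b and c are relabeled 0 in the first run and 1 in the second, so they enter neither lower approximation, which agrees with an isotonic estimate of 1/2 for both.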




22.3.4 Induction of Decision Rules 


Using the terms of knowledge discovery, the
dominance-based rough approximations of upward and
downward unions of classes are applied on the data set
in the pre-processing stage. As a result of this stage, the
data are structured in a way that facilitates induction
of if ..., then ... decision rules with a guaranteed
consistency level. For a given upward or downward
union of classes, $Cl_t^{\geq}$ or $Cl_s^{\leq}$, the decision rules
induced under the hypothesis that objects belonging to
$\underline{P}(Cl_t^{\geq})$ or $\underline{P}(Cl_s^{\leq})$ are positive and all the others are
negative, suggest an assignment to class $Cl_t$ or better,
or to class $Cl_s$ or worse, respectively. On the other
hand, the decision rules induced under a hypothesis that
objects belonging to the intersection $\overline{P}(Cl_s^{\leq}) \cap \overline{P}(Cl_t^{\geq})$
are positive and all the others are negative, suggest an
assignment to some classes between $Cl_s$ and $Cl_t$ ($s < t$).

In the case of preference ordered data it is meaningful to consider the following five types of decision rules:

1. Certain $D_{\geq}$-decision rules. These provide lower profile descriptions for objects belonging to $Cl_t^{\geq}$ without ambiguity:
if $x_{q1} \succeq_{q1} r_{q1}$ and $x_{q2} \succeq_{q2} r_{q2}$ and ... $x_{qp} \succeq_{qp} r_{qp}$, then $x \in Cl_t^{\geq}$, where for each $w_q, z_q \in X_q$, $w_q \succeq_q z_q$ means $w_q$ is at least as good as $z_q$.

2. Possible $D_{\geq}$-decision rules. Such rules provide lower profile descriptions for objects belonging to $Cl_t^{\geq}$ with or without any ambiguity:
if $x_{q1} \succeq_{q1} r_{q1}$ and $x_{q2} \succeq_{q2} r_{q2}$ and ... $x_{qp} \succeq_{qp} r_{qp}$, then $x$ possibly belongs to $Cl_t^{\geq}$.

3. Certain $D_{\leq}$-decision rules. These give upper profile descriptions for objects belonging to $Cl_t^{\leq}$ without ambiguity:
if $x_{q1} \preceq_{q1} r_{q1}$ and $x_{q2} \preceq_{q2} r_{q2}$ and ... $x_{qp} \preceq_{qp} r_{qp}$, then $x \in Cl_t^{\leq}$, where for each $w_q, z_q \in X_q$, $w_q \preceq_q z_q$ means $w_q$ is at most as good as $z_q$.

4. Possible $D_{\leq}$-decision rules. These provide upper profile descriptions for objects belonging to $Cl_t^{\leq}$ with or without any ambiguity:
if $x_{q1} \preceq_{q1} r_{q1}$ and $x_{q2} \preceq_{q2} r_{q2}$ and ... $x_{qp} \preceq_{qp} r_{qp}$, then $x$ possibly belongs to $Cl_t^{\leq}$.

5. Approximate $D_{\geq\leq}$-decision rules. These represent simultaneously lower and upper profile descriptions for objects belonging to $Cl_s \cup Cl_{s+1} \cup \cdots \cup Cl_t$ without the possibility of discerning the actual class:
if $x_{q1} \succeq_{q1} r_{q1}$ and ... $x_{qk} \succeq_{qk} r_{qk}$ and $x_{q(k+1)} \preceq_{q(k+1)} r_{q(k+1)}$ and ... $x_{qp} \preceq_{qp} r_{qp}$, then $x \in Cl_s \cup Cl_{s+1} \cup \cdots \cup Cl_t$.


In the left-hand side of a $D_{\geq\leq}$-decision rule we can have $x_q \succeq_q r_q$ and $x_q \preceq_q r'_q$, where $r_q \leq r'_q$ for the same $q \in C$. Moreover, if $r_q = r'_q$, the two conditions boil down to $x_q \sim_q r_q$, where for each $w_q, z_q \in X_q$, $w_q \sim_q z_q$ means $w_q$ is indifferent to $z_q$.

A rule is minimal if there is no other rule with a left-hand side that has at least the same weakness (which means that it uses a subset of elementary conditions and/or weaker elementary conditions) and which has a right-hand side that has at least the same strength (which means a $D_{\geq}$- or a $D_{\leq}$-decision rule assigning objects to the same union or sub-union of classes, or a $D_{\geq\leq}$-decision rule assigning objects to the same or larger set of classes).

Rules of type 1) and 3) represent certain knowl- 
edge extracted from the decision table, while the rules 
of type 2) and 4) represent possible knowledge. Rules 
of type 5) represent doubtful knowledge. 

Rules of type 1) and 3) are exact if they do not cover 
negative examples; they are probabilistic, otherwise. In 
the latter case, each rule is characterized by a confi- 
dence ratio, representing the probability that an object 
matching the left-hand side of the rule also matches its 
right-hand side. 
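The confidence ratio of a certain $D_{\geq}$-rule can be computed directly from the decision table. The following is a minimal sketch under simplifying assumptions: all criteria are numerical and of gain type (larger is better), classes are coded as integers, and the names `matches_at_least` and `confidence` are illustrative only.

```python
def matches_at_least(obj, profile):
    """True if obj is at least as good as the rule profile on every listed criterion."""
    return all(obj[q] >= r for q, r in profile.items())

def confidence(objects, classes, profile, t):
    """Share of objects covered by 'if obj >= profile then class >= t'
    whose actual class is indeed t or better."""
    covered = [i for i, o in enumerate(objects) if matches_at_least(o, profile)]
    if not covered:
        return None  # the rule covers nothing, so confidence is undefined
    positive = sum(1 for i in covered if classes[i] >= t)
    return positive / len(covered)

# hypothetical decision table: four students, two criteria, classes 1..3
objects = [{"math": 3, "lit": 2}, {"math": 3, "lit": 3},
           {"math": 1, "lit": 3}, {"math": 2, "lit": 2}]
classes = [2, 3, 1, 1]
conf = confidence(objects, classes, {"math": 2, "lit": 2}, t=2)
```

Here the rule covers three objects, one of which is a negative example, so the rule is probabilistic rather than exact.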

A set of decision rules is complete if it is able to 
cover all objects from the decision table in such a way 
that consistent objects are re-classified to their original 
classes and inconsistent objects are classified to clus- 
ters of classes that refer to this inconsistency. Each set 
of decision rules that is complete and non-redundant is 
called minimal. Note that an exclusion of any rule from 
this set makes it non-complete. 

In the case of VC-DRSA, the decision rules are probabilistic because they are induced from the P-lower approximations whose composition is controlled by the user-specified consistency level $l$. Consequently, the value of confidence $\alpha$ for the rule should be constrained from the bottom. It is reasonable to require that the smallest accepted confidence level of the rule should not be lower than the currently used consistency level $l$. Indeed, in the worst case, some objects from the P-lower approximation may create a rule using all the criteria from $P$, thus giving a confidence $\alpha \geq l$.

Observe that the syntax of decision rules induced from dominance-based rough approximations uses the concept of dominance cones: each condition profile is a dominance cone in $X_C$, and each decision profile is a dominance cone in $X_D$. In both cases the cone is positive for $D_{\geq}$-rules and negative for $D_{\leq}$-rules.



Also note that dominance cones that correspond to condition profiles can originate in any point of $X_C$, without the risk of being too specific. Thus, in contrast to granular computing based on the indiscernibility (or similarity) relation, in the case of granular computing based on dominance, the condition attribute space $X_C$ need not be discretized [22.28, 36, 37].

Procedures for induction of rules from dominance- 
based rough approximations have been proposed 
in [22.38, 39]. A publicly available computer imple- 
mentation of one of these procedures is called JMAF 
(java multi-criteria and multi-attribute analysis frame- 
work) [22.40, 41]. 

The utility of decision rules is threefold: they explain (summarize) decisions made on objects from the dataset, they can be used to make decisions with respect to new (unseen) objects that match the conditions of some rules, and they make it possible to build a strategy of intervention [22.42]. The attractiveness of particular decision rules can be measured in many different ways; however, the most convincing measures are Bayesian confirmation measures enjoying a special monotonicity property, as reported in [22.43, 44].

In [22.45], a new methodology for the induction of 
monotonic decision trees from dominance-based rough 
approximations of preference ordered decision classes 
was proposed. 

Finally, it is worth noting that several algebraic models have been proposed for DRSA [22.29, 46, 47]; the algebraic structures are based on a bipolar disjoint representation (positive and negative) of the interior and the exterior of a concept. These algebraic models give elegant representations of the basic properties of dominance-based rough sets. Moreover, a topology for DRSA in a bitopological space was proposed in [22.48].


22.3.5 Rule-based Classification Algorithms 


We will now comment upon the application of decision rules to some objects described by criteria from $C$. When applying $D_{\geq}$-decision rules to an object $x$, it is possible that $x$ either matches the left-hand side of at least one decision rule or it does not. In the case of at least one such match, it is reasonable to conclude that $x$ belongs to class $Cl_t$, because it is the lowest class of the upward union $Cl_t^{\geq}$ which results from the intersection of all the right-hand sides of the rules covering $x$. More precisely, if $x$ matches the left-hand side of rules $\rho_1, \rho_2, \ldots, \rho_m$, having right-hand sides $x \in Cl_{t_1}^{\geq}, x \in Cl_{t_2}^{\geq}, \ldots, x \in Cl_{t_m}^{\geq}$, then $x$ is assigned to class $Cl_t$, where $t = \max\{t_1, t_2, \ldots, t_m\}$. In the case of no matching, we can conclude that $x$ belongs to $Cl_1$, i.e., to the worst class, since no rule with a right-hand side suggesting a better classification of $x$ covers this object.

Analogously, when applying $D_{\leq}$-decision rules to the object $x$, we can conclude that $x$ belongs either to class $Cl_t$ (because it is the highest class of the downward union $Cl_t^{\leq}$ resulting from the intersection of all the right-hand sides of the rules covering $x$), or to class $Cl_n$, i.e., to the best class, when $x$ is not covered by any rule. More precisely, if $x$ matches the left-hand side of rules $\rho_1, \rho_2, \ldots, \rho_m$, having right-hand sides $x \in Cl_{t_1}^{\leq}, x \in Cl_{t_2}^{\leq}, \ldots, x \in Cl_{t_m}^{\leq}$, then $x$ is assigned to class $Cl_t$, where $t = \min\{t_1, t_2, \ldots, t_m\}$. In the case of no matching, it is concluded that $x$ belongs to the best class $Cl_n$, because no rule with a right-hand side suggesting a worse classification of $x$ covers this object. Finally, when applying $D_{\geq\leq}$-decision rules to $x$, it is possible to conclude that $x$ belongs to the union of all the classes suggested in the right-hand side of the rules covering $x$.
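The max/min assignment scheme described above can be sketched in a few lines. The rule encoding (a profile of threshold values paired with a class index) and the function names are assumptions made for this illustration; criteria are taken to be numerical and of gain type.

```python
def classify_upward(obj, rules):
    """rules: list of (profile, t), read as
    'if obj >= profile on each listed criterion, then class >= t'."""
    ts = [t for profile, t in rules
          if all(obj[q] >= r for q, r in profile.items())]
    return max(ts) if ts else 1   # no matching rule: worst class Cl_1

def classify_downward(obj, rules, n):
    """rules: list of (profile, t), read as
    'if obj <= profile on each listed criterion, then class <= t'."""
    ts = [t for profile, t in rules
          if all(obj[q] <= r for q, r in profile.items())]
    return min(ts) if ts else n   # no matching rule: best class Cl_n

rules_up = [({"a": 2}, 2), ({"a": 3, "b": 3}, 3)]
rules_down = [({"a": 1}, 1), ({"a": 2}, 2)]
```

For instance, an object with `a = 3` and `b = 3` matches both upward rules and is assigned to class 3, while an object matching no upward rule falls back to class 1.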

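A new classification algorithm was proposed in [22.49]. Let $\varphi_1 \to \psi_1, \ldots, \varphi_k \to \psi_k$ be the rules matching object $x$. Then, $R_t(x) = \{j : Cl_t \in \psi_j, \; j = 1, \ldots, k\}$ denotes the set of rules matching $x$ which recommend assignment of object $x$ to a union including class $Cl_t$, and $R_{\neg t}(x) = \{j : Cl_t \notin \psi_j, \; j = 1, \ldots, k\}$ denotes the set of rules matching $x$ which do not recommend assignment of object $x$ to a union including class $Cl_t$. $\|\varphi_j\|$, $\|\psi_j\|$ are sets of objects with property $\varphi_j$ and $\psi_j$, respectively, $j = 1, \ldots, k$. For a classified object $x$, one has to calculate the score for each candidate class

$$\mathrm{score}(Cl_t, x) = \mathrm{score}^{+}(Cl_t, x) - \mathrm{score}^{-}(Cl_t, x),$$

where

$$\mathrm{score}^{+}(Cl_t, x) = \frac{\left| \bigcup_{j \in R_t(x)} \left( \|\varphi_j\| \cap Cl_t \right) \right|^2}{\left| \bigcup_{j \in R_t(x)} \|\varphi_j\| \right| \times |Cl_t|}$$

and

$$\mathrm{score}^{-}(Cl_t, x) = \frac{\left| \bigcup_{j \in R_{\neg t}(x)} \left( \|\varphi_j\| \cap \|\psi_j\| \right) \right|^2}{\left| \bigcup_{j \in R_{\neg t}(x)} \|\varphi_j\| \right| \times \left| \bigcup_{j \in R_{\neg t}(x)} \|\psi_j\| \right|}.$$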


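$\mathrm{score}^{+}(Cl_t, x)$ and $\mathrm{score}^{-}(Cl_t, x)$ can be interpreted in terms of conditional probability as a product of confidence and coverage of the matching rules

$$\mathrm{score}^{+}(Cl_t, x) = \Pr(\{\varphi_j : j \in R_t(x)\} \mid Cl_t) \times \Pr(Cl_t \mid \{\varphi_j : j \in R_t(x)\}),$$
$$\mathrm{score}^{-}(Cl_t, x) = \Pr(\{\varphi_j : j \in R_{\neg t}(x)\} \mid \neg Cl_t) \times \Pr(\neg Cl_t \mid \{\varphi_j : j \in R_{\neg t}(x)\}).$$

The recommendation of the univocal classification $x \to Cl_t$ is such that

$$Cl_t = \arg\max_{t \in \{1, \ldots, n\}} \mathrm{score}(Cl_t, x).$$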
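The score of a candidate class can be computed with plain set operations. The sketch below assumes that each rule matching $x$ is given as a dictionary holding the set of objects satisfying its condition part (`phi`), the set satisfying its decision part (`psi`), and the class indices contained in the recommended union (`classes`); this encoding and the toy data are hypothetical.

```python
def score(t, cl_t, rules):
    """score(Cl_t, x) = score+ - score- for the rules matching x."""
    pos = [r for r in rules if t in r["classes"]]       # R_t(x)
    neg = [r for r in rules if t not in r["classes"]]   # R_not-t(x)

    def ratio(num, den1, den2):
        # squared size of the numerator set over the product of the two sizes
        return len(num) ** 2 / (len(den1) * len(den2)) if den1 and den2 else 0.0

    phi_pos = set().union(*(r["phi"] for r in pos))
    num_pos = set().union(*(r["phi"] & cl_t for r in pos))
    phi_neg = set().union(*(r["phi"] for r in neg))
    psi_neg = set().union(*(r["psi"] for r in neg))
    num_neg = set().union(*(r["phi"] & r["psi"] for r in neg))
    return ratio(num_pos, phi_pos, cl_t) - ratio(num_neg, phi_neg, psi_neg)

# six objects in three classes; two rules match the object being classified
members = {1: {1, 2}, 2: {3, 4}, 3: {5, 6}}
rules = [
    {"phi": {3, 4, 5, 6}, "psi": {3, 4, 5, 6}, "classes": {2, 3}},  # class 2 or better
    {"phi": {1, 2, 3}, "psi": {1, 2, 3, 4}, "classes": {1, 2}},     # class 2 or worse
]
scores = {t: score(t, members[t], rules) for t in members}
best = max(scores, key=scores.get)
```

In this example class 2 is the only class supported by both matching rules, and it obtains the highest score.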


Examples illustrating the application of DRSA to multi-criteria classification in a didactic way can be found in [22.12–14, 50].


22.4 The Dominance-based Rough Set Approach 
to Multi-Criteria Choice and Ranking 


22.4.1 Differences with Respect 
to Multi-Criteria Classification 


One of the very first extensions of DRSA concerned preference ordered data representing pairwise comparisons (i.e., binary relations) between objects on both condition and decision attributes [22.7, 8, 25, 51]. Note
that while classification is based on the absolute eval- 
uation of objects, choice and ranking refer to pairwise 
comparisons of objects. In this case, the decision rules 
to be discovered from the data characterize a com- 
prehensive binary relation on the set of objects. If 
this relation is a preference relation and if, among the 
condition attributes, there are some criteria which are 
semantically correlated with the comprehensive prefer- 
ence relation, then the data set (serving as the learning 
sample) can be considered as preference information 
provided by a DM in a multi-criteria choice or ranking 
problem. In consequence, the comprehensive prefer- 
ence relation characterized by the decision rules discov- 
ered from this data set can be considered as a preference 
model of the DM. It may be used to explain the decision 
policy of the DM and to recommend an optimal choice 
or preference ranking with respect to new objects. 

Let us consider a finite set A of objects evaluated 
by a finite set C of criteria. The optimal choice (or the 
preference ranking) in set A is semantically correlated 
with the criteria from set $C$. The preference information concerning the multi-criteria choice or ranking problem is a data set in the form of a pairwise comparison table which includes pairs of some reference objects from a subset $B \subseteq A \times A$. This is described by
preference relations on particular criteria and a com- 
prehensive preference relation. One such example is 
a weak preference relation called the outranking rela- 
tion. By using DRSA for the analysis of the pairwise 


comparison table, we can obtain a rough approxima- 
tion of the outranking relation by a dominance relation. 
The decision rules induced from the rough approxima- 
tion are then applied to the complete set A of the objects 
associated with the choice or ranking. As a result, one 
obtains a four-valued outranking relation on this set. 
In order to obtain a recommendation, it is advisable 
to use an exploitation procedure based on the net flow 
score of the objects. We present this methodology in 
more detail below. 


22.4.2 The Pairwise Comparison Table 
as Input Preference Information 


Given a multi-criteria choice or ranking problem, a DM can express the preferences by pairwise comparisons of the reference objects. In the following, $xSy$ denotes the presence, while $xS^c y$ denotes the absence of the outranking relation for a pair of objects $(x, y) \in A \times A$. Relation $xSy$ reads: object $x$ is at least as good as object $y$.

For each pair of reference objects $(x, y) \in B \subseteq A \times A$, the DM can select one of the three following possibilities:

1. Object $x$ is as good as $y$, i.e., $xSy$.
2. Object $x$ is worse than $y$, i.e., $xS^c y$.
3. The two objects are incomparable at the present stage.


A pairwise comparison table, denoted by $S_{PCT}$, is then created on the basis of this information. The first $m$ columns correspond to the criteria from set $C$. The last, i.e., the $(m+1)$-th column, represents the comprehensive binary preference relation $S$ or $S^c$. The rows correspond to the pairs from $B$. For each pair in $S_{PCT}$, a difference between criterion values is put in the corresponding column. If the DM judges that two objects are incomparable, then the corresponding pair does not appear in $S_{PCT}$.
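The construction of the table can be sketched as follows. The data layout (nested dictionaries for evaluations, a dictionary of judged pairs) and the name `build_pct` are assumptions for this illustration; criterion values are taken to be numerical so that differences are meaningful.

```python
def build_pct(evaluations, comparisons):
    """evaluations: {object: {criterion: value}}
    comparisons: {(x, y): "S" or "Sc"}; incomparable pairs are simply absent.
    Returns one row per judged pair: criterion differences plus the decision d."""
    rows = []
    for (x, y), relation in comparisons.items():
        diffs = {q: evaluations[x][q] - evaluations[y][q] for q in evaluations[x]}
        rows.append({"pair": (x, y), **diffs, "d": relation})
    return rows

evaluations = {"a": {"price": 3, "comfort": 5},
               "b": {"price": 4, "comfort": 2}}
comparisons = {("a", "b"): "S", ("b", "a"): "Sc"}
pct = build_pct(evaluations, comparisons)
```

Each row then holds the $m$ criterion differences followed by the comprehensive relation, mirroring the column layout described above.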

We will define $S_{PCT}$ more formally. For any criterion $g_i \in C$, let $T_i$ be a finite set of binary relations defined on $A$ on the basis of the evaluations of objects from $A$ with respect to the considered criterion $g_i$, such that for every $(x, y) \in A \times A$ exactly one binary relation $t \in T_i$ is verified. More precisely, given the domain $V_i$ of $g_i \in C$, if $v_i', v_i'' \in V_i$ are the respective evaluations of $x, y \in A$ by means of $g_i$ and $(x, y) \in t$, with $t \in T_i$, then for each $w, z \in A$ having the same evaluations $v_i', v_i''$ by means of $g_i$, $(w, z) \in t$. Furthermore, let $T_d$ be a set of binary relations defined on set $A$ (comprehensive pairwise comparisons) such that at most one binary relation $t \in T_d$ is verified for every $(x, y) \in A \times A$.

The pairwise comparison table is defined as the data table $S_{PCT} = \langle B, C \cup \{d\}, T_G \cup T_d, f \rangle$, where $B \subseteq A \times A$ is a non-empty set of exemplary pairwise comparisons of reference objects, $T_G = \bigcup_{g_i \in C} T_i$, $d$ is a decision corresponding to the comprehensive pairwise comparison (comprehensive preference relation), and $f : B \times (C \cup \{d\}) \to T_G \cup T_d$ is a total function such that $f[(x, y), g_i] \in T_i$ for every $(x, y) \in A \times A$ and for each $g_i \in C$, and $f[(x, y), d] \in T_d$ for every $(x, y) \in B$. It follows that for any pair of reference objects $(x, y) \in B$ there is verified one and only one binary relation $t \in T_d$. Thus, $T_d$ induces a partition of $B$. In fact, the data table $S_{PCT}$ can be seen as a decision table, since the set $C$ of considered criteria and the decision $d$ are distinguished.

We consider a pairwise comparison table where the set $T_d$ is composed of two binary relations defined on $A$:

• $x$ outranks $y$ (denoted by $xSy$ or $(x, y) \in S$), where $(x, y) \in B$,
• $x$ does not outrank $y$ (denoted by $xS^c y$ or $(x, y) \in S^c$), where $(x, y) \in B$, and $S \cup S^c = B$.

Observe that the binary relation $S$ is reflexive, but not necessarily transitive or complete.


22.4.3 Rough Approximation 
of Preference Relations 


In the following, we will distinguish between two types of evaluation scales of criteria: cardinal and ordinal. Let $C^N$ be the set of criteria expressing preferences on a cardinal scale, and let $C^O$ be the set of criteria expressing preferences on an ordinal scale, such that $C^N \cup C^O = C$ and $C^N \cap C^O = \emptyset$. Moreover, for each $P \subseteq C$, we denote by $P^O$ the subset of $P$ composed of criteria expressing preferences on an ordinal scale, i.e., $P^O = P \cap C^O$, and by $P^N$ we denote the subset of $P$ composed of criteria expressing preferences on a cardinal scale, i.e., $P^N = P \cap C^N$. Of course, for each $P \subseteq C$, we have $P = P^N \cup P^O$ and $P^N \cap P^O = \emptyset$.

The meaning of the two scales is such that in the 
case of the cardinal scale we can specify the intensity 
of the preference for a given difference of evaluations, 
while in the case of the ordinal scale, this is not possible 
and we can only establish an order of evaluations. 


Multi-Graded Dominance 

We assume that the pairwise comparisons of reference objects on cardinal criteria from set $C^N$ can be represented in terms of graded preference relations (for example, very weak preference, weak preference, strict preference, strong preference, and very strong preference), denoted by $P_i^h$: for each $g_i \in C^N$ and for every $(x, y) \in A \times A$, $T_i = \{P_i^h, h \in H_i\}$, where $H_i$ is a particular subset of the relative integers and:

• $x P_i^h y$, $h > 0$, means that object $x$ is preferred to object $y$ by degree $h$ with respect to criterion $g_i$.
• $x P_i^h y$, $h < 0$, means that object $x$ is not preferred to object $y$ by degree $h$ with respect to criterion $g_i$.
• $x P_i^0 y$ means that object $x$ is similar (asymmetrically indifferent) to object $y$ with respect to criterion $g_i$.

Within the preference context, the similarity relation $P_i^0$, even if not symmetric, resembles the indifference relation. Thus, in this case, we call this similarity relation asymmetric indifference. Of course, for each $g_i \in C^N$ and for every $(x, y) \in A \times A$,

$$[x P_i^h y, \; h > 0] \Rightarrow [y P_i^k x, \; k \leq 0],$$
$$[x P_i^h y, \; h < 0] \Rightarrow [y P_i^k x, \; k \geq 0].$$


Let $P = P^N$ and $P^O = \emptyset$. Given $P \subseteq C$ ($P \neq \emptyset$), $(x, y), (w, z) \in A \times A$, the pair of objects $(x, y)$ is said to dominate $(w, z)$ with respect to criteria from $P$ (denoted by $(x, y) D_P (w, z)$), if $x$ is preferred to $y$ at least as strongly as $w$ is preferred to $z$ with respect to each $g_i \in P$. More precisely, at least as strongly as means by at least the same degree, i.e., $h \geq k$, where $h, k \in H_i$, $x P_i^h y$, and $w P_i^k z$, for each $g_i \in P$.

Let $D_{\{i\}}$ be the dominance relation confined to the single criterion $g_i \in P$. The binary relation $D_{\{i\}}$ is reflexive ($(x, y) D_{\{i\}} (x, y)$ for every $(x, y) \in A \times A$), transitive ($(x, y) D_{\{i\}} (w, z)$ and $(w, z) D_{\{i\}} (u, v)$ imply $(x, y) D_{\{i\}} (u, v)$ for every $(x, y), (w, z), (u, v) \in A \times A$), and complete ($(x, y) D_{\{i\}} (w, z)$ and/or $(w, z) D_{\{i\}} (x, y)$ for all $(x, y), (w, z) \in A \times A$). Therefore, $D_{\{i\}}$ is a complete preorder on $A \times A$. Since the intersection of complete preorders is a partial preorder, and $D_P = \bigcap_{g_i \in P} D_{\{i\}}$, $P \subseteq C$, the dominance relation $D_P$ is a partial preorder on $A \times A$.

Let $R \subseteq P \subseteq C$ and $(x, y), (u, v) \in A \times A$; then the following implication holds

$$(x, y) D_P (u, v) \Rightarrow (x, y) D_R (u, v).$$

Given $P \subseteq C$ and $(x, y) \in A \times A$, we define the following:

• A set of pairs of objects dominating $(x, y)$, called the P-dominating set, denoted by $D_P^+(x, y)$ and defined as $\{(w, z) \in A \times A : (w, z) D_P (x, y)\}$.
• A set of pairs of objects dominated by $(x, y)$, called the P-dominated set, denoted by $D_P^-(x, y)$ and defined as $\{(w, z) \in A \times A : (x, y) D_P (w, z)\}$.
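The two sets just defined are easy to compute once each pair is described by its vector of preference degrees. The sketch below assumes that the degrees $h$ are already available as integers per criterion; the dictionary layout and function names are illustrative only.

```python
def dominates(h_xy, h_wz):
    """(x, y) D_P (w, z): on every criterion, x is preferred to y
    by at least the degree to which w is preferred to z."""
    return all(h_xy[q] >= h_wz[q] for q in h_xy)

def dominating_set(pair, degrees):
    """P-dominating set: all pairs that dominate `pair`."""
    return {p for p in degrees if dominates(degrees[p], degrees[pair])}

def dominated_set(pair, degrees):
    """P-dominated set: all pairs dominated by `pair`."""
    return {p for p in degrees if dominates(degrees[pair], degrees[p])}

# hypothetical preference degrees on two cardinal criteria
degrees = {
    ("a", "b"): {"g1": 2, "g2": 1},
    ("b", "a"): {"g1": -2, "g2": -1},
    ("a", "c"): {"g1": 1, "g2": 1},
    ("c", "a"): {"g1": -1, "g2": 2},
}
```

Because the relation is reflexive, every pair belongs to both of its own sets; the pair `("c", "a")` is incomparable with `("a", "c")`, which shows that $D_P$ is only a partial preorder.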


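The P-dominating sets and the P-dominated sets defined on $B$ for all pairs of reference objects from $B$ are granules of knowledge that can be used to express P-lower and P-upper approximations of the comprehensive outranking relations $S$ and $S^c$, respectively,

$$\underline{P}(S) = \{(x, y) \in B : D_P^+(x, y) \subseteq S\},$$
$$\overline{P}(S) = \bigcup_{(x, y) \in S} D_P^+(x, y),$$
$$\underline{P}(S^c) = \{(x, y) \in B : D_P^-(x, y) \subseteq S^c\},$$
$$\overline{P}(S^c) = \bigcup_{(x, y) \in S^c} D_P^-(x, y).$$

It was proved in [22.7] that

$$\underline{P}(S) \subseteq S \subseteq \overline{P}(S), \quad \underline{P}(S^c) \subseteq S^c \subseteq \overline{P}(S^c).$$

Furthermore, the following complementarity properties hold

$$\underline{P}(S) = B - \overline{P}(S^c), \quad \overline{P}(S) = B - \underline{P}(S^c),$$
$$\underline{P}(S^c) = B - \overline{P}(S), \quad \overline{P}(S^c) = B - \underline{P}(S).$$

The P-boundaries (P-doubtful regions) of $S$ and $S^c$ are defined as

$$Bn_P(S) = \overline{P}(S) - \underline{P}(S),$$
$$Bn_P(S^c) = \overline{P}(S^c) - \underline{P}(S^c).$$

From the above it follows that $Bn_P(S) = Bn_P(S^c)$.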
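A small numerical check of these definitions may help. The sketch below restricts the dominance sets to $B$, represents each pair by a tuple of preference degrees, and treats every pair of $B$ not in $S$ as belonging to $S^c$; the function name and the toy data are assumptions.

```python
def approximations(degrees, S):
    """degrees: {pair: tuple of preference degrees}; S: pairs judged as outranking.
    Returns (lower_S, upper_S, lower_Sc, upper_Sc), all computed within B."""
    B = set(degrees)
    Sc = B - S

    def dom(p, q):  # p dominates q on every criterion
        return all(a >= b for a, b in zip(degrees[p], degrees[q]))

    d_plus = {p: {q for q in B if dom(q, p)} for p in B}    # P-dominating sets
    d_minus = {p: {q for q in B if dom(p, q)} for p in B}   # P-dominated sets
    lower_S = {p for p in B if d_plus[p] <= S}
    upper_S = set().union(*(d_plus[p] for p in S))
    lower_Sc = {p for p in B if d_minus[p] <= Sc}
    upper_Sc = set().union(*(d_minus[p] for p in Sc))
    return lower_S, upper_S, lower_Sc, upper_Sc

# p2 dominates p3, yet p3 is in S and p2 is not: an inconsistency
degrees = {"p1": (2, 1), "p2": (1, 1), "p3": (1, 0), "p4": (-1, -1)}
lower_S, upper_S, lower_Sc, upper_Sc = approximations(degrees, S={"p1", "p3"})
boundary = upper_S - lower_S
```

The inconsistent pairs `p2` and `p3` end up in the boundary, and the complementarity properties can be verified directly on the returned sets.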


The concepts of the quality of approximation, reducts, and core can also be extended to the approximation of the outranking relation by the multi-graded dominance relations.

In particular, the coefficient

$$\gamma_P = \frac{|\underline{P}(S) \cup \underline{P}(S^c)|}{|B|}$$

defines the quality of approximation of $S$ and $S^c$ by $P \subseteq C$. It expresses the ratio of all pairs of reference objects $(x, y) \in B$ correctly assigned to $S$ and $S^c$ by the set $P$ of criteria to all the pairs of objects contained in $B$. Each minimal subset $P \subseteq C$, such that $\gamma_P = \gamma_C$, is called a reduct of $C$ (denoted by $RED_{S_{PCT}}$). Note that $S_{PCT}$ can have more than one reduct. The intersection of all reducts is called the core (denoted by $CORE_{S_{PCT}}$).

It is also possible to use the variable consistency model on $S_{PCT}$ [22.52], if one is aware that some of the pairs in the positive or negative dominance sets belong to the opposite relation, while at least $l \times 100\%$ of pairs belong to the correct one. Then the definition of the lower approximations of $S$ and $S^c$ boils down to

$$\underline{P}(S) = \left\{ (x, y) \in B : \frac{|D_P^+(x, y) \cap S|}{|D_P^+(x, y)|} \geq l \right\},$$
$$\underline{P}(S^c) = \left\{ (x, y) \in B : \frac{|D_P^-(x, y) \cap S^c|}{|D_P^-(x, y)|} \geq l \right\}.$$
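The effect of the consistency level $l$ can be seen on a toy example. The sketch assumes the P-dominating sets have already been computed and are passed in as a dictionary; the name `vc_lower` and the data are hypothetical. The same function applies to $S^c$ when it is given the P-dominated sets instead.

```python
def vc_lower(cones, target, l):
    """Variable-consistency lower approximation: keep the pairs whose dominance
    cone lies in `target` for at least a fraction l of its members.
    Cones are never empty because dominance is reflexive."""
    return {p for p, cone in cones.items()
            if len(cone & target) / len(cone) >= l}

# P-dominating sets of four pairs; p1 and p3 are in S
d_plus = {
    "p1": {"p1"},
    "p2": {"p1", "p2"},
    "p3": {"p1", "p2", "p3"},
    "p4": {"p1", "p2", "p3", "p4"},
}
S = {"p1", "p3"}
strict = vc_lower(d_plus, S, l=1.0)
relaxed = vc_lower(d_plus, S, l=0.6)
```

With $l = 1$ the classical lower approximation is recovered; lowering $l$ to 0.6 also admits `p3`, whose cone is consistent with $S$ in two cases out of three.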


Dominance Without Degrees of Preference 
The degree of graded preference considered above is defined on a cardinal scale of the strength of preference. However, in many real world problems, the existence of such a quantitative scale is rather questionable. This is the case with ordinal scales of criteria. In this case, the dominance relation is defined directly on evaluations $g_i(x)$ for all objects $x \in A$. Let us explain this latter case in more detail.

Let $P = P^O$ and $P^N = \emptyset$; then, given $(x, y), (w, z) \in A \times A$, the pair $(x, y)$ is said to dominate the pair $(w, z)$ with respect to criteria from $P$ (denoted by $(x, y) D_P (w, z)$), if for each $g_i \in P$, $g_i(x) \geq g_i(w)$ and $g_i(z) \geq g_i(y)$.
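The ordinal definition compares evaluations directly rather than differences. A minimal sketch, assuming evaluations are stored as nested dictionaries and coded so that larger values are better (the function name and data are illustrative):

```python
def dominates_ordinal(g, pair, other, criteria):
    """(x, y) D_P (w, z) on ordinal criteria:
    g_i(x) >= g_i(w) and g_i(z) >= g_i(y) for every criterion in P."""
    (x, y), (w, z) = pair, other
    return all(g[x][q] >= g[w][q] and g[z][q] >= g[y][q] for q in criteria)

# hypothetical evaluations on a single ordinal criterion
g = {"a": {"q": 3}, "b": {"q": 3}, "c": {"q": 2}, "d": {"q": 1}, "e": {"q": 2}}
P = ["q"]
```

For example, `("a", "d")` dominates `("c", "e")` because `a` is at least as good as `c` and `d` is no better than `e`. By contrast, `("a", "b")` and `("c", "d")` do not dominate each other in either direction, which illustrates why the single-criterion relation is not complete in the ordinal case.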

Let $D_{\{i\}}$ be the dominance relation confined to the single criterion $g_i \in P^O$. The binary relation $D_{\{i\}}$ is reflexive, transitive, but non-complete (it is possible that not $(x, y) D_{\{i\}} (w, z)$ and not $(w, z) D_{\{i\}} (x, y)$ for some $(x, y), (w, z) \in A \times A$). Therefore, $D_{\{i\}}$ is a partial preorder. Since the intersection of partial preorders is also a partial preorder and $D_P = \bigcap_{g_i \in P} D_{\{i\}}$, $P = P^O$, the dominance relation $D_P$ is a partial preorder.

If some criteria from $P \subseteq C$ express preferences on a quantitative or a numerical non-quantitative scale and others on an ordinal scale, i.e., if $P^N \neq \emptyset$ and $P^O \neq \emptyset$, then, given $(x, y), (w, z) \in A \times A$, the pair $(x, y)$ is said to dominate the pair $(w, z)$ with respect to criteria from $P$, if $(x, y)$ dominates $(w, z)$ with respect to both $P^N$ and $P^O$. Since the dominance relation with respect to $P^N$ is a partial preorder on $A \times A$ (because it is a multi-graded dominance) and the dominance with respect to $P^O$ is also a partial preorder on $A \times A$ (as explained above), then the dominance $D_P$, being the intersection of these two dominance relations, is a partial preorder. In consequence, all the concepts introduced in the previous section can be restored using this specific definition of dominance.


22.4.4 Induction of Decision Rules 
from Rough Approximations 
of Preference Relations 


Using the rough approximations of preference relations $S$ and $S^c$ defined in Sect. 22.4.3, it is possible to induce a generalized description of the preference information contained in a given $S_{PCT}$ in terms of suitable decision rules. The syntax of these rules involves the concept of upward cumulated preferences (denoted by $P_i^{\geq h}$) and downward cumulated preferences (denoted by $P_i^{\leq h}$), with the following interpretation:

• $x P_i^{\geq h} y$ means $x$ is preferred to $y$ with respect to $g_i$ by at least degree $h$,
• $x P_i^{\leq h} y$ means $x$ is preferred to $y$ with respect to $g_i$ by at most degree $h$.

The exact definition of the cumulated preferences, for each $(x, y) \in A \times A$, $g_i \in C^N$, and $h \in H_i$, can be represented as follows:

• $x P_i^{\geq h} y$ if $x P_i^k y$, where $k \in H_i$ and $k \geq h$;
• $x P_i^{\leq h} y$ if $x P_i^k y$, where $k \in H_i$ and $k \leq h$.


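Let also $G_i = \{g_i(x), x \in A\}$, $g_i \in C^O$. The decision rules then have the following syntax:

1. $D_{\geq}$-decision rules:
If $x P_{i1}^{\geq h(i1)} y$ and ... $x P_{ie}^{\geq h(ie)} y$ and $g_{ie+1}(x) \geq r_{ie+1}$ and $g_{ie+1}(y) \leq s_{ie+1}$ and ... $g_{ip}(x) \geq r_{ip}$ and $g_{ip}(y) \leq s_{ip}$, then $xSy$,
where
$P = \{g_{i1}, \ldots, g_{ip}\} \subseteq C$,
$P^N = \{g_{i1}, \ldots, g_{ie}\}$,
$P^O = \{g_{ie+1}, \ldots, g_{ip}\}$,
$(h(i1), \ldots, h(ie)) \in H_{i1} \times \cdots \times H_{ie}$
and
$(r_{ie+1}, \ldots, r_{ip}), (s_{ie+1}, \ldots, s_{ip}) \in G_{ie+1} \times \cdots \times G_{ip}$.
These rules are supported by pairs of objects from the P-lower approximation of $S$ only.

2. $D_{\leq}$-decision rules:
If $x P_{i1}^{\leq h(i1)} y$ and ... $x P_{ie}^{\leq h(ie)} y$ and $g_{ie+1}(x) \leq r_{ie+1}$ and $g_{ie+1}(y) \geq s_{ie+1}$ and ... $g_{ip}(x) \leq r_{ip}$ and $g_{ip}(y) \geq s_{ip}$, then $xS^c y$,
where
$P = \{g_{i1}, \ldots, g_{ip}\} \subseteq C$,
$P^N = \{g_{i1}, \ldots, g_{ie}\}$,
$P^O = \{g_{ie+1}, \ldots, g_{ip}\}$,
$(h(i1), \ldots, h(ie)) \in H_{i1} \times \cdots \times H_{ie}$
and
$(r_{ie+1}, \ldots, r_{ip}), (s_{ie+1}, \ldots, s_{ip}) \in G_{ie+1} \times \cdots \times G_{ip}$.
These rules are supported by pairs of objects from the P-lower approximation of $S^c$ only.

3. $D_{\geq\leq}$-decision rules:
If $x P_{i1}^{\geq h(i1)} y$ and ... $x P_{ie}^{\geq h(ie)} y$ and $x P_{ie+1}^{\leq h(ie+1)} y$ and ... $x P_{if}^{\leq h(if)} y$ and $g_{if+1}(x) \geq r_{if+1}$ and $g_{if+1}(y) \leq s_{if+1}$ and ... $g_{ig}(x) \geq r_{ig}$ and $g_{ig}(y) \leq s_{ig}$ and $g_{ig+1}(x) \leq r_{ig+1}$ and $g_{ig+1}(y) \geq s_{ig+1}$ and ... $g_{ip}(x) \leq r_{ip}$ and $g_{ip}(y) \geq s_{ip}$, then $xSy$ or $xS^c y$,
where
$O' = \{g_{i1}, \ldots, g_{ie}\} \subseteq C$,
$O'' = \{g_{ie+1}, \ldots, g_{if}\} \subseteq C$,
$P^N = O' \cup O''$,
$O'$ and $O''$ are not necessarily disjoint,
$P^O = \{g_{if+1}, \ldots, g_{ip}\}$,
$(h(i1), \ldots, h(if)) \in H_{i1} \times \cdots \times H_{if}$,
$(r_{if+1}, \ldots, r_{ip}), (s_{if+1}, \ldots, s_{ip}) \in G_{if+1} \times \cdots \times G_{ip}$.
These rules are supported by pairs of objects from the P-boundary of $S$ and $S^c$ only.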
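To make the rule syntax concrete, the following sketch checks whether a pair of objects matches a $D_{\geq}$-decision rule that mixes cardinal conditions (cumulated preference degrees) with ordinal conditions (thresholds $r$ and $s$). The rule encoding and all names are assumptions for illustration; the $D_{\leq}$ case is obtained by reversing the inequalities.

```python
def matches_d_ge_rule(rule, degree, g, x, y):
    """rule["cardinal"]: {criterion: h}      -> requires x P^{>=h} y
    rule["ordinal"]:  {criterion: (r, s)} -> requires g(x) >= r and g(y) <= s
    degree[(x, y)][criterion] is the degree k with which x is preferred to y."""
    cardinal_ok = all(degree[(x, y)][q] >= h
                      for q, h in rule["cardinal"].items())
    ordinal_ok = all(g[x][q] >= r and g[y][q] <= s
                     for q, (r, s) in rule["ordinal"].items())
    return cardinal_ok and ordinal_ok

# "if x is preferred to y on price by at least degree 1,
#  and comfort(x) >= 3 and comfort(y) <= 2, then xSy"
rule = {"cardinal": {"price": 1}, "ordinal": {"comfort": (3, 2)}}
degree = {("a", "b"): {"price": 2}, ("b", "a"): {"price": -2}}
g = {"a": {"comfort": 4}, "b": {"comfort": 2}}
```

The pair `("a", "b")` satisfies every elementary condition of the rule, whereas the reversed pair fails already on the cardinal condition.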




22.4.5 Application of Decision Rules 
to Multi-Criteria Choice and Ranking 


The decision rules induced from a given $S_{PCT}$ describe the comprehensive preference relations $S$ and $S^c$ either exactly ($D_{\geq}$- and $D_{\leq}$-decision rules) or approximately ($D_{\geq\leq}$-decision rules). A set of these rules covering all pairs of $S_{PCT}$ represents a preference model of the DM who gave the preference information in terms of pairwise comparison of reference objects. The application of these decision rules on a new subset $M \subseteq A$ of objects induces a specific preference structure on $M$.

In fact, any pair of objects $(u, v) \in M \times M$ can match the decision rules in one of four ways:

• At least one $D_{\geq}$-decision rule and neither $D_{\leq}$- nor $D_{\geq\leq}$-decision rules.
• At least one $D_{\leq}$-decision rule and neither $D_{\geq}$- nor $D_{\geq\leq}$-decision rules.
• At least one $D_{\geq}$-decision rule and at least one $D_{\leq}$-decision rule, or at least one $D_{\geq\leq}$-decision rule, or at least one $D_{\geq\leq}$-decision rule and at least one $D_{\geq}$- and/or at least one $D_{\leq}$-decision rule.
• No decision rule.

These four ways correspond to the following four situations of outranking, respectively:

• $uSv$ and not $uS^c v$, i.e., true outranking (denoted by $uS^T v$).
• $uS^c v$ and not $uSv$, i.e., false outranking (denoted by $uS^F v$).
• $uSv$ and $uS^c v$, i.e., contradictory outranking (denoted by $uS^K v$).
• not $uSv$ and not $uS^c v$, i.e., unknown outranking (denoted by $uS^U v$).

The above four situations, which together constitute the so-called four-valued outranking [22.53], have been introduced to underline the presence and absence of positive and negative reasons for the outranking. Moreover, they make it possible to distinguish contradictory situations from unknown ones.
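The mapping from matched rules to the four outranking situations can be written as a small decision function. The sketch takes the numbers of matching rules of each type as input; the function name and the one-letter labels are assumptions made here for readability.

```python
def outranking_situation(n_ge, n_le, n_approx):
    """n_ge, n_le, n_approx: numbers of matching D>=, D<= and D>=<= rules.
    Returns "T" (true), "F" (false), "K" (contradictory) or "U" (unknown)."""
    affirms_S = n_ge > 0 or n_approx > 0     # some rule concludes uSv
    affirms_Sc = n_le > 0 or n_approx > 0    # some rule concludes uS^c v
    if affirms_S and affirms_Sc:
        return "K"
    if affirms_S:
        return "T"
    if affirms_Sc:
        return "F"
    return "U"
```

An approximate rule concludes "$uSv$ or $uS^c v$", so a single such match is enough to produce the contradictory situation, exactly as in the third case of the list above.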


A final recommendation (optimal choice or ranking) can be obtained upon suitable exploitation of this structure, i.e., of the presence and the absence of outranking $S$ and $S^c$ on $M$. A possible exploitation procedure consists of calculating a specific score, called the net flow score, for each object $x \in M$

$$S_{nf}(x) = S^{++}(x) - S^{+-}(x) + S^{-+}(x) - S^{--}(x),$$

where

• $S^{++}(x) = |\{y \in M :$ there is at least one decision rule which affirms $xSy\}|$;
• $S^{+-}(x) = |\{y \in M :$ there is at least one decision rule which affirms $ySx\}|$;
• $S^{-+}(x) = |\{y \in M :$ there is at least one decision rule which affirms $yS^c x\}|$;
• $S^{--}(x) = |\{y \in M :$ there is at least one decision rule which affirms $xS^c y\}|$.

The recommendation in ranking problems consists of the total preorder determined by $S_{nf}(x)$ on $M$. In choice problems, it consists of the object(s) $x^* \in M$ such that

$$S_{nf}(x^*) = \max_{x \in M} \{S_{nf}(x)\}.$$
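The net flow score is a simple counting procedure. In the sketch below, the outcome of rule application is assumed to be summarized by two sets of ordered pairs: those for which at least one rule affirms $uSv$ and those for which at least one rule affirms $uS^c v$. The function name and example data are hypothetical.

```python
def net_flow_scores(M, affirms_S, affirms_Sc):
    """affirms_S / affirms_Sc: sets of ordered pairs (u, v) for which at least
    one decision rule affirms uSv / uS^c v."""
    scores = {}
    for x in M:
        s_pp = sum((x, y) in affirms_S for y in M)    # x outranks y
        s_pm = sum((y, x) in affirms_S for y in M)    # y outranks x
        s_mp = sum((y, x) in affirms_Sc for y in M)   # y does not outrank x
        s_mm = sum((x, y) in affirms_Sc for y in M)   # x does not outrank y
        scores[x] = s_pp - s_pm + s_mp - s_mm
    return scores

M = ["a", "b", "c"]
affirms_S = {("a", "b"), ("a", "c"), ("b", "c")}
affirms_Sc = {("b", "a"), ("c", "a"), ("c", "b")}
scores = net_flow_scores(M, affirms_S, affirms_Sc)
ranking = sorted(M, key=scores.get, reverse=True)
```

Sorting by decreasing score gives the ranking recommendation; the object(s) with the maximum score form the choice recommendation.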


The above procedure has been characterized with reference to a number of desirable properties in [22.53, 54]. A computer implementation of the whole approach, called jRank (ranking generator using DRSA), is publicly available [22.55].

Recently, Fortemps et al. [22.56] extended DRSA to multi-criteria choice and ranking on multi-graded preference relations, instead of uni-graded relations $S$ and $S^c$.

It is also worth mentioning that there is a machine learning approach to multi-criteria choice and ranking using ensembles of decision rules. The approach presented by Dembczyński et al. [22.57] makes a bridge between stochastic methods of preference learning and DRSA for choice and ranking. Examples illustrating the application of DRSA to multi-criteria choice and ranking in a didactic way can be found in [22.12–14, 50, 54].




22.5 Important Extensions of DRSA 


The existing literature describes many extensions of DRSA that make it a useful tool for other practical applications. These extensions are:

• DRSA to decision under risk and uncertainty [22.58];
• DRSA to decision under uncertainty and time preference [22.59];
• DRSA handling missing data [22.60, 61];
• DRSA for imprecise object evaluations and assignments [22.62];
• Dominance-based approach to induction of association rules [22.63];
• Fuzzy-rough hybridization of DRSA [22.8, 64–67];
• DRSA as a way of operator-free fuzzy-rough hybridization [22.28, 67, 68];
• DRSA to granular computing [22.36, 37];
• DRSA to case-based reasoning [22.69, 70];
• DRSA for hierarchical structure of evaluation criteria [22.71];
• DRSA to decision involving multiple decision makers [22.72, 73];
• DRSA to interactive multi-objective optimization [22.74];
• DRSA to interactive evolutionary multi-objective optimization under risk and uncertainty [22.75, 76].


It is worth stressing that dealing with ordinal data and monotonicity constraints also makes sense in general classification problems, where the notion of preference has no meaning. Even when the ordering seems irrelevant, the presence or the absence of a property has an ordinal interpretation. If two properties are related, then either the presence or the absence of one property should make the presence of the other property more (or less) probable. A formal proof showing that the IRSA is a particular case of the DRSA was given in [22.28]. With this in mind, DRSA can be seen as a general framework for analysis of classification data. Although it was designed for ordinal classification problems with monotonicity constraints, DRSA can be used to solve a general classification problem where no additional information about ordering is taken into account.

The idea behind this claim is the following [22.77]. We assume, without loss of generality, that the value sets of all regular attributes are number-coded. While this is natural for numerical attributes, categorical attributes must get numerical codes for categories. In this way, the value sets of all regular attributes become ordered (as all sets of numbers are ordered). Now, to analyze a non-ordinal classification problem using DRSA, we transform the decision table such that each regular attribute is cloned (doubled). It is assumed that the value set of each original attribute is ordered with respect to increasing preference (gain-type criterion), and the value set of its clone is ordered with respect to decreasing preference (cost-type criterion). Using DRSA, for each $t \in \{1, \ldots, n\}$, we approximate two sets of objects from the decision table: class $Cl_t$ and its complement $\neg Cl_t$. Obviously, we can calculate dominance-based rough approximations of the two sets. Moreover, they can serve to induce if ..., then ... decision rules recommending assignment to class $Cl_t$ or to its complement $\neg Cl_t$. In this way, we reformulate the original non-ordinal classification problem as an ordinal classification problem with monotonicity constraints. Due to the cloning of attributes with opposite preference orders, we can have rules that cover a subspace in the condition space which is bounded from the top and from the bottom; this leads (without discretization) to more synthetic rules than those resulting from the IRSA.
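The cloning transformation can be sketched as follows. One implementation choice made here, not taken from the text, is to negate the values of the cost-type clone so that both copies can be handled as gain-type criteria by the same dominance test; the suffixes `_gain` and `_cost` and the function names are likewise assumptions.

```python
def clone_attributes(table):
    """Turn every number-coded regular attribute q into two criteria:
    q_gain (increasing preference) and q_cost (decreasing preference,
    stored negated so that 'larger is better' holds for both)."""
    return [
        {f"{q}{suffix}": (v if suffix == "_gain" else -v)
         for q, v in row.items() for suffix in ("_gain", "_cost")}
        for row in table
    ]

def at_least(obj, profile):
    """Condition part of a D>=-type rule on the cloned table."""
    return all(obj[q] >= r for q, r in profile.items())

table = [{"color": 2, "size": 5}, {"color": 3, "size": 5}]
cloned = clone_attributes(table)

# 'color_gain >= 2 and color_cost >= -2' is the same as 2 <= color <= 2
interval_rule = {"color_gain": 2, "color_cost": -2}
```

The example rule bounds `color` from below through the original attribute and from above through its clone, which is how interval-like conditions arise without discretization.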


22.6 DRSA to Operational Research Problems 


DRSA is also a useful instrument in the toolbox of oper- 
ational research (OR). DRSA has been adapted to solve 
the following OR problems: 


• Interactive multi-objective optimization [22.74] applied
  to OR problems, such as portfolio management,
  project scheduling, and production planning.

• Interactive evolutionary multi-objective optimization
  under risk and uncertainty [22.75, 76].

• Decision under uncertainty and time preference [22.59],
  which is useful for dealing with many OR problems where
  uncertainty of outcomes and their distribution over time
  play a fundamental role, such as portfolio selection [22.78],
  scheduling with time-resource interactions, and inventory
  management.

• Global investment risk analysis on partially missing
  data [22.79].

• Explanation of recommendations following from
  robust ordinal regression applied to multi-criteria
  ranking problems in terms of rules [22.80].


22.7 Concluding Remarks on DRSA Applied to Multi-Criteria Decision Problems


Let us point out the main features of the methodology 
described: 


• The input data set describing a given decision situation
  is the preference information elicited by the DM in terms
  of exemplary decisions (class assignments or pairwise
  comparisons of some objects).

• The rough set analysis of preference information using
  DRSA supplies some useful elements of knowledge about
  the decision situation. These are: the relevance of
  particular criteria, information about their interaction,
  minimal subsets of criteria (reducts) conveying important
  knowledge contained in the exemplary decisions, and the
  set of the non-reducible criteria (core).

• The methodology presented is based on elementary
  concepts and mathematical tools (sets and set operations,
  binary relations), without recourse to any complex
  algebraic or analytical structures; the main idea is very
  natural and the key concept of dominance is rational and
  objective.

• DRSA structures the input data prior to induction of
  decision rules. The structuring takes into account
  inconsistencies of the preference information with respect
  to the dominance principle. Due to the structuring, the
  induced decision rules are certain or possible, depending
  on whether they are induced from lower or upper
  approximations (of unions of classes or preference
  relations), respectively.

• The preference model induced from the rough
  approximations defined on the preference information is
  expressed in the natural and comprehensible language of
  if ..., then ... decision rules, fulfilling the postulate of
  transparency and interpretability of preference models in
  decision aiding; each decision rule can be clearly
  identified with those parts of the preference information
  (decision examples) which support the rule; the rules
  inform the DM about the relationships between conditions
  and decisions; in this way, the rules permit traceability of
  the decision aiding process and give understandable
  justifications for the decision to be made, so that the
  resulting preference model constituted for the DM is
  a glass box rather than a black box. Finally, the decision
  rule preference model is more general than all existing
  models of conjoint measurement due to its capacity of
  handling inconsistent preferences [22.9-11, 81].

• Apart from their clear meaning, the decision rules are
  characterized by some interestingness measures, among
  which Bayesian confirmation measures appear to be the
  most appropriate, as shown in the studies [22.43, 44].

• The decision rules do not convert ordinal information
  into numeric information but keep the ordinal character
  of input data due to the syntax proposed.

• Heterogeneous information (qualitative and quantitative,
  ordered and non-ordered) and scales of preference
  (ordinal, cardinal) can be processed within DRSA, while
  classical methods consider only quantitative ordered
  evaluations (with rare exceptions).

• No prior discretization of the quantitative domains of
  criteria is necessary.


References


22.1 J. Figueira, S. Greco, M. Ehrgott (Eds.): Multiple Criteria Decision Analysis: State of the Art Surveys (Springer, Berlin 2005)

22.2 Z. Pawlak: Rough sets, Int. J. Comput. Inf. Sci. 11, 341-356 (1982)

22.3 Z. Pawlak: Rough Sets: Theoretical Aspects of Reasoning about Data (Kluwer, Dordrecht 1991)

22.4 Z. Pawlak, R. Słowiński: Rough set approach to multi-attribute decision analysis, Eur. J. Oper. Res. 72, 443-459 (1994)



22.5 R. Słowiński: Rough set learning of preferential attitude in multi-criteria decision making, Lect. Notes Artif. Intell. 689, 642-651 (1993)

22.6 S. Greco, B. Matarazzo, R. Słowiński: A new rough set approach to evaluation of bankruptcy risk. In: Operational Tools in the Management of Financial Risk, ed. by C. Zopounidis (Kluwer, Dordrecht 1998) pp. 121-136

22.7 S. Greco, B. Matarazzo, R. Słowiński: Rough approximation of a preference relation by dominance relations, Eur. J. Oper. Res. 117, 63-83 (1999)

22.8 S. Greco, B. Matarazzo, R. Słowiński: The use of rough sets and fuzzy sets in MCDM. In: Multicriteria Decision Making, International Series in Operations Research & Management Science, Vol. 21, ed. by T. Gal, T. Stewart, T. Hanne (Kluwer, Dordrecht 1999) pp. 397-455

22.9 S. Greco, B. Matarazzo, R. Słowiński: Preference representation by means of conjoint measurement & decision rule model. In: Aiding Decisions with Multiple Criteria-Essays, International Series in Operations Research & Management Science, Vol. 44, ed. by D. Bouyssou, E. Jacquet-Lagrèze, P. Perny, R. Słowiński, D. Vanderpooten, P. Vincke (Kluwer, Dordrecht 2002) pp. 263-313

22.10 S. Greco, B. Matarazzo, R. Słowiński: Axiomatic characterization of a general utility function and its particular cases in terms of conjoint measurement and rough-set decision rules, Eur. J. Oper. Res. 158, 271-292 (2004)

22.11 R. Słowiński, S. Greco, B. Matarazzo: Axiomatization of utility, outranking and decision-rule preference models for multiple-criteria classification problems under partial inconsistency with the dominance principle, Control Cybern. 31, 1005-1035 (2002)

22.12 R. Słowiński, S. Greco, B. Matarazzo: Rough-set-based decision support. In: Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques, 2nd edn., ed. by E.K. Burke, G. Kendall (Springer, New York 2014) pp. 557-609

22.13 R. Słowiński, S. Greco, B. Matarazzo: Rough Sets in Decision Making. In: Encyclopedia of Complexity and Systems Science, ed. by R.A. Meyers (Springer, New York 2009) pp. 7753-7786

22.14 R. Słowiński, S. Greco, B. Matarazzo: Rough set and rule-based multicriteria decision aiding, Pesquisa Oper. 32(2), 213-269 (2012)

22.15 S. Greco, B. Matarazzo, R. Słowiński: Rough sets theory for multicriteria decision analysis, Eur. J. Oper. Res. 129, 1-47 (2001)

22.16 R. Słowiński, S. Greco, B. Matarazzo: Rough set analysis of preference-ordered data, Lect. Notes Artif. Intell. 2475, 44-59 (2002)

22.17 R. Słowiński, J. Stefanowski, S. Greco, B. Matarazzo: Rough sets based processing of inconsistent information in decision analysis, Control Cybern. 29, 379-404 (2000)

22.18 Z. Pawlak, J.W. Grzymala-Busse, R. Słowiński, W. Ziarko: Rough sets, Communications ACM 38, 89-95 (1995)

22.19 L. Polkowski: Rough Sets: Mathematical Foundations (Physica, Heidelberg 2002)

22.20 R. Słowiński (Ed.): Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory (Kluwer, Dordrecht 1992)

22.21 R. Słowiński, D. Vanderpooten: A generalised definition of rough approximations, IEEE Trans. Knowl. Data Eng. 12, 331-336 (2000)

22.22 R. Słowiński, C. Zopounidis: Application of the rough set approach to evaluation of bankruptcy risk, Intell. Syst. Account. Financ. Manag. 4, 27-41 (1995)

22.23 W. Ziarko: Rough sets as a methodology for data mining. In: Rough Sets in Knowledge Discovery (Physica, Heidelberg 1998) pp. 554-576

22.24 R. Słowiński: A generalization of the indiscernibility relation for rough set analysis of quantitative information, Riv. Mat. Sci. Econ. Soc. 15, 65-78 (1992)

22.25 S. Greco, B. Matarazzo, R. Słowiński: Extension of the rough set approach to multicriteria decision support, INFOR 38, 161-196 (2000)

22.26 S. Greco, B. Matarazzo, R. Słowiński: Rough sets methodology for sorting problems in presence of multiple attributes and criteria, Eur. J. Oper. Res. 138, 247-259 (2002)

22.27 S. Greco, B. Matarazzo, R. Słowiński: Determining task and methods Classification: Multicriteria classification. In: Handbook of Data Mining and Knowledge Discovery, ed. by W. Kloesgen, J. Zytkow (Oxford Univ. Press, Oxford 2002) pp. 318-328

22.28 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach as a proper way of handling graduality in rough set theory, Lect. Notes Comput. Sci. 4400, 36-52 (2007)

22.29 S. Greco, B. Matarazzo, R. Słowiński: The bipolar complemented de Morgan Brouwer-Zadeh distributive lattice as an algebraic structure for the dominance-based rough set approach, Fundam. Inf. 115, 25-56 (2012)

22.30 S. Greco, B. Matarazzo, R. Słowiński, J. Stefanowski: Variable consistency model of dominance-based rough set approach, Lect. Notes Artif. Intell. 2005, 170-181 (2001)

22.31 W. Ziarko: Variable precision rough sets model, J. Comput. Syst. Sci. 46, 39-59 (1993)

22.32 S. Greco, B. Matarazzo, R. Słowiński: Parameterized rough set model using rough membership and Bayesian confirmation measures, Int. J. Approx. Reason. 49, 285-300 (2008)

22.33 J. Błaszczyński, S. Greco, R. Słowiński, M. Szeląg: Monotonic variable consistency rough set approaches, Int. J. Approx. Reason. 50(7), 979-999 (2009)

22.34 S. Greco, B. Matarazzo, R. Słowiński: Assessment of a value of information using rough sets and fuzzy measures. In: Fuzzy Sets and Their Applications, ed. by J. Chocjan, J. Leski (Silesian Univ. Technol. Press, Gliwice 2001) pp. 185-193

22.35 W. Kotłowski, K. Dembczyński, S. Greco, R. Słowiński: Stochastic dominance-based rough set model for ordinal classification, Inf. Sci. 178(21), 4019-4037 (2008)

22.36 S. Greco, B. Matarazzo, R. Słowiński: Granular computing for reasoning about ordered data: The dominance-based rough set approach. In: Handbook of Granular Computing, ed. by W. Pedrycz, A. Skowron, V. Kreinovich (Wiley, Chichester 2008) pp. 347-373

22.37 S. Greco, B. Matarazzo, R. Słowiński: Granular computing and data mining for ordered data - The dominance-based rough set approach. In: Encyclopedia of Complexity and Systems Science, ed. by R.A. Meyers (Springer, New York 2009) pp. 4283-4305

22.38 S. Greco, B. Matarazzo, R. Słowiński, J. Stefanowski: An algorithm for induction of decision rules consistent with dominance principle, Lect. Notes Artif. Intell. 2005, 304-313 (2001)

22.39 J. Błaszczyński, R. Słowiński, M. Szeląg: Sequential covering rule induction algorithm for variable consistency rough set approaches, Inf. Sci. 181, 987-1002 (2011)

22.40 J. Błaszczyński, S. Greco, B. Matarazzo, R. Słowiński, M. Szeląg: jMAF - Dominance-based rough set data analysis framework. In: Rough Sets and Intelligent Systems, Intelligent Systems Reference Library, Vol. 42, ed. by A. Skowron, Z. Suraj (Springer, Berlin 2013) pp. 185-209

22.41 J. Błaszczyński, S. Greco, B. Matarazzo, R. Słowiński, M. Szeląg: jMAF (java multi-criteria and multi-attribute analysis framework), 2013, available at: http://www.cs.put.poznan.pl/jBlaszczynski/Site/jRS.html

22.42 S. Greco, B. Matarazzo, N. Pappalardo, R. Słowiński: Measuring expected effects of interventions based on decision rules, J. Exp. Theor. Artif. Intell. 17(1/2), 103-118 (2005)

22.43 S. Greco, Z. Pawlak, R. Słowiński: Can Bayesian confirmation measures be useful for rough set decision rules?, Eng. Appl. Artif. Intell. 17(4), 345-361 (2004)

22.44 S. Greco, R. Słowiński, I. Szczech: Properties of rule interestingness measures and alternative approaches to normalization of measures, Inf. Sci. 216, 1-16 (2012)

22.45 S. Giove, S. Greco, B. Matarazzo, R. Słowiński: Variable consistency monotonic decision trees, Lect. Notes Artif. Intell. 2475, 247-254 (2002)

22.46 S. Greco, B. Matarazzo, R. Słowiński: Algebra and topology for dominance-based rough set approach. In: Advances in Intelligent Information Systems, Studies in Computational Intelligence (Springer, Berlin 2010) pp. 43-78

22.47 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach to granular computing. In: Novel Developments in Granular Computing, ed. by J. Yao (Hershey, New York 2010) pp. 439-527

22.48 S. Greco, B. Matarazzo, R. Słowiński: On topological dominance-based rough set approach, Lect. Notes Comput. Sci. 6190, 21-45 (2010)

22.49 J. Błaszczyński, S. Greco, R. Słowiński: Multi-criteria classification - A new scheme for application of dominance-based decision rules, Eur. J. Oper. Res. 181(3), 1030-1044 (2007)

22.50 S. Greco, B. Matarazzo, R. Słowiński: Decision rule approach. In: Multiple Criteria Decision Analysis: State of the Art Surveys (Springer, New York 2005) pp. 507-562

22.51 S. Greco, B. Matarazzo, R. Słowiński: Rule-based decision support in multicriteria choice and ranking, Lect. Notes Artif. Intell. 2143, 29-47 (2001)

22.52 R. Słowiński, S. Greco, B. Matarazzo: Mining decision-rule preference model from rough approximation of preference relation, Proc. 26th IEEE Annu. Int. Conf. Comput. Softw. Appl., Oxford (2002) pp. 1129-1134

22.53 S. Greco, B. Matarazzo, R. Słowiński, A. Tsoukias: Exploitation of a rough approximation of the outranking relation in multicriteria choice and ranking. In: Trends in Multicriteria Decision Making, Lecture Notes in Economics and Mathematical Systems, Vol. 465, ed. by T.J. Stewart, R.C. van den Honert (Springer, Berlin 1998) pp. 45-60

22.54 M. Szeląg, S. Greco, R. Słowiński: Rule-based approach to multicriteria ranking. In: Multicriteria Decision Aid and Artificial Intelligence: Links, Theory and Applications, ed. by M. Doumpos, E. Grigoroudis (Wiley-Blackwell, London 2013) pp. 127-160

22.55 M. Szeląg, S. Greco, R. Słowiński: jRank (ranking generator using DRSA), 2013, available at: http://www.cs.put.poznan.pl/mszelag/Software/jRank/jRank.html

22.56 P. Fortemps, S. Greco, R. Słowiński: Multicriteria decision support using rules that represent rough-graded preference relations, Eur. J. Oper. Res. 188(1), 206-223 (2008)

22.57 K. Dembczyński, W. Kotłowski, R. Słowiński, M. Szeląg: Learning of rule ensembles for multiple attribute ranking problems. In: Preference Learning, ed. by J. Fürnkranz, E. Hüllermeier (Springer, Berlin 2010) pp. 217-247

22.58 S. Greco, B. Matarazzo, R. Słowiński: Rough set approach to decisions under risk, Lect. Notes Artif. Intell. 2005, 160-169 (2001)

22.59 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach to decision under uncertainty and time preference, Ann. Oper. Res. 176, 41-75 (2010)

22.60 S. Greco, B. Matarazzo, R. Słowiński: Handling missing values in rough set analysis of multi-attribute and multi-criteria decision problems, Lect. Notes Artif. Intell. 1711, 146-157 (1999)

22.61 S. Greco, B. Matarazzo, R. Słowiński: Dealing with missing data in rough set analysis of multi-attribute and multi-criteria decision problems. In: Decision Making: Recent Developments and Worldwide Applications, ed. by S.H. Zanakis, G. Doukidis, C. Zopounidis (Kluwer, Dordrecht 2000) pp. 295-316

22.62 K. Dembczyński, S. Greco, R. Słowiński: Rough set approach to multiple criteria classification with imprecise evaluations and assignments, Eur. J. Oper. Res. 198(2), 626-636 (2009)

22.63 S. Greco, B. Matarazzo, R. Słowiński, J. Stefanowski: Mining association rules in preference-ordered data, Lect. Notes Artif. Intell. 2366, 442-450 (2002)

22.64 S. Greco, B. Matarazzo, R. Słowiński: Rough set processing of vague information using fuzzy similarity relations. In: Finite Versus Infinite - Contributions to an Eternal Dilemma, ed. by C.S. Calude, G. Paun (Springer, Berlin 2000) pp. 149-173

22.65 S. Greco, B. Matarazzo, R. Słowiński: Fuzzy extension of the rough set approach to multicriteria and multiattribute sorting. In: Preferences and Decisions under Incomplete Knowledge, ed. by J. Fodor, B. De Baets, P. Perny (Physica, Heidelberg 2000) pp. 131-151

22.66 S. Greco, M. Inuiguchi, R. Słowiński: Dominance-based rough set approach using possibility and necessity measures, Lect. Notes Artif. Intell. 2475, 85-92 (2002)

22.67 S. Greco, M. Inuiguchi, R. Słowiński: A new proposal for fuzzy rough approximations and gradual decision rule representation, Lect. Notes Comput. Sci. 3135, 319-342 (2003)

22.68 S. Greco, M. Inuiguchi, R. Słowiński: Fuzzy rough sets and multiple-premise gradual decision rules, Int. J. Approx. Reason. 41, 179-211 (2005)

22.69 S. Greco, B. Matarazzo, R. Słowiński: Case-based reasoning using gradual rules induced from dominance-based rough approximations, Lect. Notes Artif. Intell. 5009, 268-275 (2008)

22.70 M. Szeląg, S. Greco, J. Błaszczyński, R. Słowiński: Case-based reasoning using dominance-based decision rules, Lect. Notes Artif. Intell. 6954, 404-413 (2011)

22.71 K. Dembczyński, S. Greco, R. Słowiński: Methodology of rough-set-based classification and sorting with hierarchical structure of attributes and criteria, Control Cybern. 31, 891-920 (2002)

22.72 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach to decision involving multiple decision makers, Lect. Notes Comput. Sci. 4259, 306-317 (2006)

22.73 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach on pairwise comparison tables to decision involving multiple decision makers, Lect. Notes Comput. Sci. 6954, 126-135 (2011)

22.74 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach to interactive multiobjective optimization, Lect. Notes Comput. Sci. 5252, 121-156 (2008)

22.75 S. Greco, B. Matarazzo, R. Słowiński: Dominance-based rough set approach to interactive evolutionary multiobjective optimization. In: Preferences and Decisions: Models and Applications, Studies in Fuzziness (Springer, Berlin 2010) pp. 225-260

22.76 S. Greco, B. Matarazzo, R. Słowiński: Interactive evolutionary multiobjective optimization using dominance-based rough set approach, Proc. IEEE World Congr. Comput. Intell. 2010 (WCCI 2010), Barcelona, Spain (2010) pp. 3026-3033

22.77 J. Błaszczyński, S. Greco, R. Słowiński: Inductive discovery of laws using monotonic rules, Eng. Appl. Artif. Intell. 25(2), 284-294 (2012)

22.78 S. Greco, B. Matarazzo, R. Słowiński: Beyond Markowitz with multiple criteria decision aiding, J. Bus. Econ. 83(1), 29-60 (2013)

22.79 S. Greco, B. Matarazzo, R. Słowiński, S. Zanakis: Global investing risk: A case study of knowledge assessment via rough sets, Ann. Oper. Res. 185, 105-138 (2011)

22.80 S. Greco, R. Słowiński, P. Zielniewicz: Putting dominance-based rough set approach and robust ordinal regression together, Decis. Support Syst. 54, 891-903 (2013)

22.81 S. Greco, B. Matarazzo, R. Słowiński: Conjoint measurement and rough set approach for multicriteria sorting problems in presence of ordinal criteria. In: A-MCD-A: Aide Multi-Critère à la Décision - Multiple Criteria Decision Aiding (European Commission, Ispra 2001) pp. 117-144


23. Rule Induction from Rough Approximations 


Jerzy W. Grzymala-Busse 


Rule induction is an important technique in data
mining and machine learning. Knowledge is frequently
expressed by rules in many areas of
artificial intelligence (AI), including rule-based
expert systems. In this chapter we discuss only
supervised learning, in which all cases of the input
data set are pre-classified by an expert.


23.1 Complete and Consistent Data
     23.1.1 Global Coverings
     23.1.2 Local Coverings
     23.1.3 Classification
23.2 Inconsistent Data
23.3 Decision Table with Numerical Attributes
23.4 Incomplete Data
     23.4.1 Singleton, Subset, and Concept Approximations
     23.4.2 Modified LEM2 Algorithm
     23.4.3 Probabilistic Approximations
23.5 Conclusions
References


23.1 Complete and Consistent Data


Our basic assumption is that the data sets are presented
as decision tables. An example of a decision
table is presented in Table 23.1. Rows of the decision
table represent cases and columns represent
variables. The set of all cases is denoted by U. In
Table 23.1, U = {1, 2, 3, 4, 5, 6, 7, 8}. Some variables are
called attributes while one selected variable is called
a decision and is denoted by d. The set of all attributes
will be denoted by A. In Table 23.1,
A = {Wind, Humidity, Temperature} and d = Trip. For an
attribute a and case x, a(x) denotes the value of the
attribute a for case x. For example, Wind(1) = low.

Let B be a subset of the set A of all attributes. Complete
data sets are characterized by the indiscernibility
relation IND(B) [23.1, 2] defined as follows: for any
x, y ∈ U,

(x, y) ∈ IND(B) if and only if a(x) = a(y)
for any a ∈ B . (23.1)

Obviously, IND(B) is an equivalence relation. The
equivalence class of IND(B) containing x ∈ U will be
denoted by [x]_B and called a B-elementary set.
A-elementary sets will be called elementary. Any union
of B-elementary sets will be called a B-definable set.
By analogy, an A-definable set will be called definable.
The set of all equivalence classes [x]_B,
where x ∈ U, is a partition on U denoted by B*. The
elementary sets of the partition {d}* are called concepts.
In Table 23.1, the concepts are {1, 2, 3}, {4, 5},
and {6, 7, 8}. For
Table 23.1, A* = {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}}.
All members of A* are elementary sets.
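The partition B* can be computed by grouping cases on their values for the attributes in B. The following Python sketch does this for the data of Table 23.1; the names ROWS, TABLE, and partition are hypothetical helpers introduced here for illustration only.

```python
# Table 23.1: case, Wind, Humidity, Temperature, Trip
ROWS = [
    (1, "low", "low", "medium", "yes"),
    (2, "low", "low", "low", "yes"),
    (3, "low", "medium", "medium", "yes"),
    (4, "low", "medium", "high", "maybe"),
    (5, "medium", "low", "medium", "maybe"),
    (6, "medium", "high", "low", "no"),
    (7, "high", "high", "high", "no"),
    (8, "medium", "high", "high", "no"),
]
TABLE = {
    c: {"Wind": w, "Humidity": h, "Temperature": t, "Trip": d}
    for c, w, h, t, d in ROWS
}


def partition(table, attrs):
    """Return B* as a sorted list of sorted blocks, for B = attrs."""
    blocks = {}
    for case, row in table.items():
        # Cases are indiscernible iff they agree on every attribute in attrs.
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(case)
    return sorted(sorted(b) for b in blocks.values())


print(partition(TABLE, ["Trip"]))
# [[1, 2, 3], [4, 5], [6, 7, 8]]   (the concepts)
print(partition(TABLE, ["Wind", "Humidity", "Temperature"]))
# [[1], [2], [3], [4], [5], [6], [7], [8]]   (A*)
```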

We will quote some definitions from [23.3]. A rule 
r is an expression of the following form 


(a_1, v_1) & (a_2, v_2) & ... & (a_k, v_k) → (d, w) , (23.2)


Table 23.1 A complete and consistent decision table

        Attributes                           Decision
Case    Wind      Humidity    Temperature    Trip
1       low       low         medium         yes
2       low       low         low            yes
3       low       medium      medium         yes
4       low       medium      high           maybe
5       medium    low         medium         maybe
6       medium    high        low            no
7       high      high        high           no
8       medium    high        high           no




where a_1, a_2, ..., a_k are distinct attributes, d is a decision,
v_1, v_2, ..., v_k are respective attribute values, and w
is a decision value.

A case x is covered by a rule r if and only if every
attribute-value pair of r is satisfied by the corresponding
value of x. For example, case 1 from Table 23.1 is
covered by the following rule r:

(Wind, low) & (Humidity, low) → (Trip, yes) .

The concept C defined by the right-hand side of rule r is
indicated by r. The above rule r indicates concept {1, 2, 3}.

A rule r is consistent with the data set if and only
if for any case x covered by r, x is a member of the
concept indicated by r. The above rule is consistent with
the data set represented by Table 23.1. A rule set R is
consistent with the data set if and only if for any r ∈ R,
r is consistent with the data set. The rule set containing
the above rule is consistent with the data set represented
by Table 23.1.

We say that a concept C is completely covered by
a rule set R if and only if for every case x from C there
exists a rule r from R such that r covers x. For example,
the single rule

(Wind, low) → (Trip, yes)

completely covers the concept {1, 2, 3}. On the other
hand, this rule is not consistent with the data set
represented by Table 23.1. A rule set R is complete for a data
set if and only if every concept from the data set is
completely covered by R.

In this chapter we will discuss how to induce rule 
sets that are complete and consistent with the data set. 
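The notions of covering, consistency, and complete covering can be checked mechanically. The Python sketch below encodes a rule as a pair (conditions, decision value); this encoding and the function names are assumptions made for illustration and are not part of the definitions quoted from [23.3].

```python
# Table 23.1: case, Wind, Humidity, Temperature, Trip
ROWS = [
    (1, "low", "low", "medium", "yes"),
    (2, "low", "low", "low", "yes"),
    (3, "low", "medium", "medium", "yes"),
    (4, "low", "medium", "high", "maybe"),
    (5, "medium", "low", "medium", "maybe"),
    (6, "medium", "high", "low", "no"),
    (7, "high", "high", "high", "no"),
    (8, "medium", "high", "high", "no"),
]
TABLE = {
    c: {"Wind": w, "Humidity": h, "Temperature": t, "Trip": d}
    for c, w, h, t, d in ROWS
}


def covers(rule, row):
    """A case is covered if it satisfies every attribute-value pair."""
    conditions, _ = rule
    return all(row[a] == v for a, v in conditions)


def covered_cases(rule, table):
    return {c for c, row in table.items() if covers(rule, row)}


def is_consistent(rule, table, decision="Trip"):
    """Every covered case must belong to the concept indicated by the rule."""
    _, value = rule
    return all(table[c][decision] == value for c in covered_cases(rule, table))


def completely_covers(rules, concept, table):
    """Every case of the concept must be covered by at least one rule."""
    return all(any(covers(r, table[c]) for r in rules) for c in concept)


r1 = ((("Wind", "low"), ("Humidity", "low")), "yes")
r2 = ((("Wind", "low"),), "yes")
print(sorted(covered_cases(r1, TABLE)), is_consistent(r1, TABLE))
# [1, 2] True
print(sorted(covered_cases(r2, TABLE)), is_consistent(r2, TABLE))
# [1, 2, 3, 4] False
print(completely_covers([r2], {1, 2, 3}, TABLE))
# True
```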


23.1.1 Global Coverings 


The simplest approach to rule induction is based on
finding the smallest subset B of the set A of all
attributes that is sufficient to be used in a rule set. Such
a reduction of the attribute set is one of the main and
frequently used techniques in rough set theory [23.1, 2, 4].
This approach is also called feature selection. In
Table 23.1 the attribute Humidity is redundant (irrelevant).
The remaining two attributes (Wind and Temperature)
are sufficient to determine the decision for all eight cases.
Let us make it more precise
using the fundamental definitions of rough set
theory [23.1, 2, 4].

For a decision d we say that {d} depends on B if
and only if B* ≤ {d}*, i.e., for any elementary set X in
B* there exists a concept C from {d}* such that X ⊆ C.
Note that for partitions π and τ on U, if for any X ∈ π
there exists Y ∈ τ such that X ⊆ Y, then we say that π
is smaller than or equal to τ and denote it by π ≤ τ.
A global covering (or relative reduct) of {d} is a subset
B of A such that {d} depends on B and B is minimal in
A. The algorithm to compute a single global covering is
presented below.


Algorithm 23.1 Algorithm to compute a single global covering
(input: the set A of all attributes,
        partition {d}* on U;
 output: a single global covering R);
begin
  compute partition A*;
  P := A;
  R := ∅;
  if A* ≤ {d}*
  then
    begin
      for each attribute a in A do
        begin
          Q := P − {a};
          compute partition Q*;
          if Q* ≤ {d}*
            then P := Q
        end {for}
      R := P
    end {then}
end {algorithm}.
Let us use this algorithm for Table 23.1. First,

A* = {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}}
   ≤ {Trip}* .

Initially,

P = A and Q = P − {Wind},
Q = {Humidity, Temperature} ,

and then we compute Q*, where

Q* = {{1, 5}, {2}, {3}, {4}, {6}, {7, 8}} .

We find that Q* ≰ {Trip}*, since the block {1, 5} is not
contained in any concept. Thus, P = A. Next,
we try to delete Humidity from P. We obtain
Q = {Wind, Temperature} and then we compute Q*,
where Q* = {{1, 3}, {2}, {4}, {5}, {6}, {7}, {8}}. This
time Q* ≤ {Trip}*, so P = {Wind, Temperature}.

We still need to check Q = P − {Temperature}, Q =
{Wind} and Q* = {{1, 2, 3, 4}, {5, 6, 8}, {7}}, and Q* ≰
{Trip}*. Thus R = {Wind, Temperature} is a global
covering.
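The steps of Algorithm 23.1 can be sketched in Python as follows. This is an illustrative transcription, not the LERS implementation; the helper names partition, finer_or_equal, and global_covering are hypothetical, and the attributes are tried in the order in which they are listed.

```python
# Table 23.1: case, Wind, Humidity, Temperature, Trip
ROWS = [
    (1, "low", "low", "medium", "yes"),
    (2, "low", "low", "low", "yes"),
    (3, "low", "medium", "medium", "yes"),
    (4, "low", "medium", "high", "maybe"),
    (5, "medium", "low", "medium", "maybe"),
    (6, "medium", "high", "low", "no"),
    (7, "high", "high", "high", "no"),
    (8, "medium", "high", "high", "no"),
]
TABLE = {
    c: {"Wind": w, "Humidity": h, "Temperature": t, "Trip": d}
    for c, w, h, t, d in ROWS
}


def partition(table, attrs):
    """Blocks of cases that agree on every attribute in attrs."""
    blocks = {}
    for case, row in table.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(case)
    return [frozenset(b) for b in blocks.values()]


def finer_or_equal(p, q):
    """p <= q: every block of p is contained in some block of q."""
    return all(any(x <= y for y in q) for x in p)


def global_covering(table, attributes, decision):
    d_star = partition(table, [decision])
    if not finer_or_equal(partition(table, attributes), d_star):
        return []  # R stays empty: {d} does not depend on A
    p = list(attributes)
    for a in attributes:
        q = [b for b in p if b != a]  # Q := P - {a}
        if finer_or_equal(partition(table, q), d_star):
            p = q  # attribute a is not needed
    return p


print(global_covering(TABLE, ["Wind", "Humidity", "Temperature"], "Trip"))
# ['Wind', 'Temperature']
```

Because attributes are removed greedily in a fixed order, a different listing order may yield a different global covering when more than one exists.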

For a given global covering, rules are induced by
examining cases of the data set. Initially, such a rule
contains all attributes from the global covering with
the corresponding attribute values; then a dropping
conditions technique is used: we try to drop one
condition (attribute-value pair) at a time, starting from the
leftmost condition, checking whether the rule is still
consistent with the data set, then we try to drop the next
condition, and so on. For example,

(Wind, low) & (Temperature, medium) → (Trip, yes)

is our first candidate for a rule. If we drop the first
condition, the above rule is reduced to

(Temperature, medium) → (Trip, yes) .

However, this rule covers case 5, so it is not
consistent with the data set represented by Table 23.1. By
dropping the second condition from the initial rule we
obtain

(Wind, low) → (Trip, yes) ,

but this rule is not consistent with the data represented
by Table 23.1 either, since it covers case 4, so we
conclude that the initial rule is the simplest possible. This
rule covers two cases: 1 and 3.

It is not difficult to check that the rule

(Wind, low) & (Temperature, low) → (Trip, yes)

is as simple as possible and that it covers only case 2.
Thus, the above two rules consistently and completely
cover the concept {1, 2, 3}.
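The dropping conditions technique can be sketched as follows. The function names are hypothetical, and the sketch makes one assumption not stated in the text: a rule is never reduced to an empty set of conditions.

```python
# Table 23.1: case, Wind, Humidity, Temperature, Trip
ROWS = [
    (1, "low", "low", "medium", "yes"),
    (2, "low", "low", "low", "yes"),
    (3, "low", "medium", "medium", "yes"),
    (4, "low", "medium", "high", "maybe"),
    (5, "medium", "low", "medium", "maybe"),
    (6, "medium", "high", "low", "no"),
    (7, "high", "high", "high", "no"),
    (8, "medium", "high", "high", "no"),
]
TABLE = {
    c: {"Wind": w, "Humidity": h, "Temperature": t, "Trip": d}
    for c, w, h, t, d in ROWS
}


def covered(conditions, table):
    return [c for c, row in table.items()
            if all(row[a] == v for a, v in conditions)]


def is_consistent(conditions, value, table, decision="Trip"):
    return all(table[c][decision] == value for c in covered(conditions, table))


def initial_rule(case, covering, table):
    """All attributes of the global covering with the values of the case."""
    return [(a, table[case][a]) for a in covering]


def drop_conditions(conditions, value, table, decision="Trip"):
    current = list(conditions)
    for cond in conditions:  # leftmost condition first
        trial = [c for c in current if c != cond]
        # Assumption: keep at least one condition in the rule.
        if trial and is_consistent(trial, value, table, decision):
            current = trial
    return current


covering = ["Wind", "Temperature"]
rule1 = drop_conditions(initial_rule(1, covering, TABLE), "yes", TABLE)
rule2 = drop_conditions(initial_rule(2, covering, TABLE), "yes", TABLE)
print(rule1, covered(rule1, TABLE))
# [('Wind', 'low'), ('Temperature', 'medium')] [1, 3]
print(rule2, covered(rule2, TABLE))
# [('Wind', 'low'), ('Temperature', 'low')] [2]
```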

The above algorithm is implemented as LEM1 
(Learning from Examples Module, version 1). It is 
a component of the data mining system LERS (Learn- 
ing from Examples Using Rough Sets). A similar sys- 
tem was described in [23.5]. 


23.1.2 Local Coverings

The LEM1 algorithm is based on calculus on partitions
on the entire universe U. Another approach to rule
induction, based on attribute-value pairs, is presented in
the LEM2 algorithm (Learning from Examples Module,
version 2), another component of LERS. We will quote
a few definitions from [23.6, 7].

For an attribute-value pair (a, v) = t, a block of t,
denoted by [t], is the set of all cases from U that have
value v for attribute a, i.e.,

[(a, v)] = {x | a(x) = v} . (23.3)

Let T be a set of attribute-value pairs. The block of
T, denoted by [T], is the following set

[T] = ⋂_{t ∈ T} [t] . (23.4)

Let B be a subset of U. Set B depends on a set T
of attribute-value pairs t = (a, v) if and only if [T] is
nonempty and

[T] ⊆ B . (23.5)

Set T is a minimal complex of B if and only if B
depends on T and no proper subset T′ of T exists such
that B depends on T′. Let 𝒯 be a nonempty collection of
nonempty sets of attribute-value pairs. Then 𝒯 is a local
covering of B if and only if the following conditions
are satisfied:

1. each member T of 𝒯 is a minimal complex of B,
2. ⋃_{T ∈ 𝒯} [T] = B, and 𝒯 is minimal, i.e., 𝒯 has the
   smallest possible number of members.
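The definitions of a block, of dependence, and of a minimal complex translate directly into set operations. The Python sketch below uses the data of Table 23.1; the function names are hypothetical, and the block of an empty set of pairs is taken to be the whole universe U, which is a convention assumed here rather than one stated in the text.

```python
from itertools import combinations

# Table 23.1: case, Wind, Humidity, Temperature, Trip
ROWS = [
    (1, "low", "low", "medium", "yes"),
    (2, "low", "low", "low", "yes"),
    (3, "low", "medium", "medium", "yes"),
    (4, "low", "medium", "high", "maybe"),
    (5, "medium", "low", "medium", "maybe"),
    (6, "medium", "high", "low", "no"),
    (7, "high", "high", "high", "no"),
    (8, "medium", "high", "high", "no"),
]
TABLE = {
    c: {"Wind": w, "Humidity": h, "Temperature": t, "Trip": d}
    for c, w, h, t, d in ROWS
}


def block(table, pair):
    """[(a, v)]: all cases with value v for attribute a, see (23.3)."""
    a, v = pair
    return {x for x, row in table.items() if row[a] == v}


def block_of_set(table, pairs):
    """[T]: intersection of the blocks of all pairs in T, see (23.4)."""
    result = set(table)  # assumed convention: [empty set] = U
    for pair in pairs:
        result &= block(table, pair)
    return result


def depends_on(table, B, pairs):
    """B depends on T iff [T] is nonempty and [T] is a subset of B, see (23.5)."""
    b = block_of_set(table, pairs)
    return bool(b) and b <= B


def is_minimal_complex(table, B, pairs):
    pairs = list(pairs)
    if not depends_on(table, B, pairs):
        return False
    # No proper subset of T may also make B depend on it.
    return not any(
        depends_on(table, B, sub)
        for k in range(len(pairs))
        for sub in combinations(pairs, k)
    )


B = {1, 2, 3}
T = [("Wind", "low"), ("Humidity", "low")]
print(sorted(block(TABLE, ("Wind", "low"))))  # [1, 2, 3, 4]
print(sorted(block_of_set(TABLE, T)))         # [1, 2]
print(is_minimal_complex(TABLE, B, T))        # True
print(is_minimal_complex(TABLE, B, T + [("Temperature", "medium")]))  # False
```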


An algorithm for finding a single local covering, 
called LEM2, is presented below. For a set X, |X| de- 
notes the cardinality of X. 


Algorithm 23.2 LEM2
(input: a set B,
 output: a single local covering 𝒯 of set B);
begin
  G := B;
  𝒯 := ∅;
  while G ≠ ∅
    begin
      T := ∅;
      T(G) := {t | [t] ∩ G ≠ ∅};
      while T = ∅ or [T] ⊈ B
        begin
          select a pair t ∈ T(G)
            such that |[t] ∩ G| is
            maximum; if a tie
            occurs, select a pair
            t ∈ T(G) with the
            smallest cardinality of [t];
            if another tie occurs,
            select first pair;
          T := T ∪ {t};
          G := [t] ∩ G;
          T(G) := {t | [t] ∩ G ≠ ∅};
          T(G) := T(G) − T;
        end {while}
      for each t ∈ T do
        if [T − {t}] ⊆ B
          then T := T − {t};
      𝒯 := 𝒯 ∪ {T};
      G := B − ⋃_{T ∈ 𝒯} [T];
    end {while};
  for each T ∈ 𝒯 do
    if ⋃_{S ∈ 𝒯 − {T}} [S] = B
      then 𝒯 := 𝒯 − {T};
end {procedure}.


We will trace the LEM2 algorithm applied to the
following input set {1, 2, 3} = [(Trip, yes)]. The tracing
of LEM2 is presented in Tables 23.2 and 23.3. The
corresponding comments are:

1. The set G = {1, 2, 3}. The best attribute-value pair
   t, with the largest cardinality of the intersection
   of [t] and G (presented in the third column of
   Table 23.2), is (Wind, low). The corresponding
   entry in the third column of Table 23.2 is bulleted.
   However, [(Wind, low)] = {1, 2, 3, 4} ⊈ {1, 2, 3} =
   B, hence we need to look for the next t.

2. The set G is the same, G = {1, 2, 3}. Among the
   remaining attribute-value pairs there are two with
   |[t] ∩ G| = 2, (Humidity, low) and (Temperature,
   medium). Both have the same cardinality of [t], so we
   select the first (top) pair, (Humidity, low). This
   time {1, 2, 3, 4} ∩ {1, 2, 5} = {1, 2} ⊆ {1, 2, 3}, so
   {(Wind, low), (Humidity, low)} is the first element T
   of 𝒯.

3. The new set G = B − [T] = {1, 2, 3} − {1, 2} =
   {3}. The pair (Humidity, medium) has the
   smallest cardinality of [t], so it is the best
   choice. However, [(Humidity, medium)] = {3, 4} ⊈
   {1, 2, 3}, hence we need to look for the next t.

4. The pair (Temperature, medium) is the best
   choice, and {3, 4} ∩ {1, 3, 5} = {3} ⊆ {1, 2, 3}, so
   {(Humidity, medium), (Temperature, medium)} is
   the second element T of 𝒯.


Table 23.2 Computing a local covering for the concept [(Trip, yes)], part I

(a, v) = t               [(a, v)]        {1, 2, 3}      {1, 2, 3}
(Wind, low)              {1, 2, 3, 4}    {1, 2, 3} •    -
(Wind, medium)           {5, 6, 8}       -              -
(Wind, high)             {7}             -              -
(Humidity, low)          {1, 2, 5}       {1, 2}         {1, 2} •
(Humidity, medium)       {3, 4}          {3}            {3}
(Humidity, high)         {6, 7, 8}       -              -
(Temperature, low)       {2, 6}          {2}            {2}
(Temperature, medium)    {1, 3, 5}       {1, 3}         {1, 3}
(Temperature, high)      {4, 7, 8}       -              -
Comments                                 1              2


Thus,

$$\mathcal{T} = \{\{(Wind, low), (Humidity, low)\}, \{(Humidity, medium), (Temperature, medium)\}\} .$$

Therefore, the LEM2 algorithm induces the following rule set:

(Wind, low) & (Humidity, low) → (Trip, yes),
(Humidity, medium) & (Temperature, medium) → (Trip, yes).
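The trace above can be reproduced with a short program. The following is a minimal Python sketch of Algorithm 23.2, not the LERS implementation: the function names `lem2` and `block_of`, the dictionary layout, and the use of dictionary insertion order to resolve the final tie ("select first pair") are assumptions made for illustration. The blocks are copied from the second column of Table 23.2. The sketch also assumes that $B$ is definable; otherwise the candidate list may run empty.

```python
def block_of(pairs, blocks, universe):
    """[T]: intersection of the blocks of all pairs in T (the universe for empty T)."""
    result = set(universe)
    for t in pairs:
        result &= blocks[t]
    return result

def lem2(B, blocks):
    """Return a local covering of B as a list of lists of attribute-value pairs."""
    B = set(B)
    universe = set().union(*blocks.values())
    order = list(blocks)  # insertion order resolves the last tie ("first pair")
    covering = []
    G = set(B)
    while G:
        T = []
        candidates = [t for t in order if blocks[t] & G]
        while not T or not block_of(T, blocks, universe) <= B:
            # maximise |[t] & G|; on a tie prefer the smaller block; then the first pair
            t = max(candidates, key=lambda p: (len(blocks[p] & G), -len(blocks[p])))
            T.append(t)
            G = blocks[t] & G
            candidates = [p for p in order if blocks[p] & G and p not in T]
        for t in list(T):  # drop pairs that are not needed
            rest = [p for p in T if p != t]
            if rest and block_of(rest, blocks, universe) <= B:
                T = rest
        covering.append(T)
        covered = set().union(*(block_of(S, blocks, universe) for S in covering))
        G = B - covered
    for T in list(covering):  # drop rules that are not needed
        others = [S for S in covering if S is not T]
        if others and set().union(*(block_of(S, blocks, universe) for S in others)) == B:
            covering = others
    return covering

blocks = {
    ("Wind", "low"): {1, 2, 3, 4},
    ("Wind", "medium"): {5, 6, 8},
    ("Wind", "high"): {7},
    ("Humidity", "low"): {1, 2, 5},
    ("Humidity", "medium"): {3, 4},
    ("Humidity", "high"): {6, 7, 8},
    ("Temperature", "low"): {2, 6},
    ("Temperature", "medium"): {1, 3, 5},
    ("Temperature", "high"): {4, 7, 8},
}

local_covering = lem2({1, 2, 3}, blocks)
for T in local_covering:
    print(" & ".join(f"({a}, {v})" for a, v in T), "-> (Trip, yes)")
```

Each inner list of `local_covering` corresponds to one rule; the two lists obtained for $B = \{1, 2, 3\}$ are the two elements of $\mathcal{T}$ found in the trace.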


Rules induced from local coverings differ from rules induced from global coverings. In many cases the former are simpler than the latter. For example, for Table 23.1 and the concept [(Trip, no)], the LEM2 algorithm would induce just one rule that covers all three cases:

(Humidity, high) → (Trip, no).


Table 23.3 Computing a local covering for the concept [(Trip, yes)], part II

(a, v) = t               [(a, v)]        {3}       {3}
(Wind, low)              {1, 2, 3, 4}    {3}       {3}
(Wind, medium)           {5, 6, 8}       -         -
(Wind, high)             {7}             -         -
(Humidity, low)          {1, 2, 5}       -         -
(Humidity, medium)       {3, 4}          {3} •     -
(Humidity, high)         {6, 7, 8}       -         -
(Temperature, low)       {2, 6}          -         -
(Temperature, medium)    {1, 3, 5}       {3}       {3} •
(Temperature, high)      {4, 7, 8}       -         -
Comments                                 3         4




On the other hand, the attribute Humidity is not included in the global covering. The rules induced from the global covering are

(Temperature, high) → (Trip, no),
(Wind, medium) & (Temperature, low) → (Trip, no).


23.1.3 Classification

Rule sets, induced from data sets, are used most frequently to classify new, unseen cases. A classification system has two inputs: a rule set and a data set containing new cases, and it classifies every case as being a member of some concept. A classification system used in LERS is a modification of the well-known bucket brigade algorithm [23.7-9].

The decision to which concept a case belongs is made on the basis of three factors: strength, specificity, and support. These factors are defined as follows. Strength is the total number of cases correctly classified by the rule during training. Specificity is the total number of attribute-value pairs on the left-hand side of the rule. The matching rules with a larger number of attribute-value pairs are considered more specific. The third factor, support, is defined as the sum of products of strength and specificity for all matching rules indicating the same concept. The concept $C$ for which the support, i.e., the following expression

$$\sum_{\text{matching rules } r \text{ describing } C} \mathrm{Strength}(r) \cdot \mathrm{Specificity}(r) \tag{23.6}$$

is the largest is the winner, and the case is classified as being a member of $C$.

In the classification system of LERS, if complete matching is impossible, all partially matching rules are identified. These are rules with at least one attribute-value pair matching the corresponding attribute-value pair of a case. For any partially matching rule $r$, an additional factor, called Matching_factor($r$), is computed. Matching_factor($r$) is defined as the ratio of the number of attribute-value pairs of $r$ matched by the case to the total number of attribute-value pairs of $r$. In partial matching, the concept $C$ for which the following expression

$$\sum_{\text{partially matching rules } r \text{ describing } C} \mathrm{Matching\_factor}(r) \cdot \mathrm{Strength}(r) \cdot \mathrm{Specificity}(r) \tag{23.7}$$

is the largest is the winner, and the case is classified as being a member of $C$.

Since the classification system is a part of the LERS data mining system, rules induced by any component of LERS, such as LEM1 or LEM2, are presented in the LERS format, in which every rule is associated with three numbers: the total number of attribute-value pairs on the left-hand side of the rule (i.e., specificity), the total number of cases correctly classified by the rule during training (i.e., strength), and the total number of training cases matching the left-hand side of the rule (i.e., the rule domain size).

23.2 Inconsistent Data

Frequently data sets contain conflicting cases, i.e., cases with the same attribute values but from different concepts. An example of such a data set is presented in Table 23.4. Cases 4 and 5 have the same values for all three attributes, yet their decision values are different (they belong to different concepts). Similarly, cases 7 and 8 also conflict. Rough set theory handles inconsistent data by introducing lower and upper approximations for every concept [23.1, 2].

There exists a very simple test for consistency: $A^* \leq \{d\}^*$, i.e., every block of the partition $A^*$ is contained in some block of the partition $\{d\}^*$. If this condition is false, the corresponding data set is not consistent. For Table 23.4, $A^* = \{\{1\}, \{2\}, \{3\}, \{4, 5\}, \{6, 7, 8\}, \{9\}, \{10\}\}$ and $\{d\}^* = \{\{1, 2, 3, 4\}, \{5, 6, 7\}, \{8, 9, 10\}\}$, so $A^* \not\leq \{d\}^*$.
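The consistency test $A^* \leq \{d\}^*$ for Table 23.4 can be checked mechanically. The following minimal Python sketch takes the two partitions as stated in the text; the helper name `is_finer` is hypothetical and chosen only for this illustration.

```python
def is_finer(p, q):
    """True if every block of partition p is contained in some block of partition q."""
    return all(any(block <= other for other in q) for block in p)

# Partitions for Table 23.4, as given in the text.
A_star = [{1}, {2}, {3}, {4, 5}, {6, 7, 8}, {9}, {10}]
d_star = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10}]

consistent = is_finer(A_star, d_star)
# Blocks of A* that straddle two concepts are exactly the groups of conflicting cases.
conflicts = [b for b in A_star if not any(b <= other for other in d_star)]
print(consistent, conflicts)
```

The blocks collected in `conflicts` are the sets of cases that share all attribute values but belong to different concepts.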


Let $B$ be a subset of the set $A$ of all attributes. For inconsistent data sets, in general, a concept $X$ is not a definable set. However, the set $X$ may be approximated by two $B$-definable sets. The first one is called a $B$-lower approximation of $X$, denoted by $\underline{B}X$ and defined as follows

$$\{x \in U \mid [x]_B \subseteq X\} . \tag{23.8}$$

The second set is called a $B$-upper approximation of $X$, denoted by $\overline{B}X$ and defined as follows

$$\{x \in U \mid [x]_B \cap X \neq \emptyset\} . \tag{23.9}$$

In (23.8) and (23.9) lower and upper approximations are constructed from singletons $x$; we say that we are using the so-called first method. The $B$-lower approximation of $X$ is the largest $B$-definable set contained in $X$. The $B$-upper approximation of $X$ is the smallest $B$-definable set containing $X$.

As was observed in [23.2], for complete decision tables we may use a second method to define the $B$-lower approximation of $X$, by the following formula

$$\bigcup \{[x]_B \mid x \in U, [x]_B \subseteq X\} , \tag{23.10}$$

while the $B$-upper approximation of $X$ may be defined, using the second method, by

$$\bigcup \{[x]_B \mid x \in U, [x]_B \cap X \neq \emptyset\} . \tag{23.11}$$

Obviously, both (23.8) and (23.10) define the same set. Similarly, (23.9) and (23.11) also define the same set. For Table 23.4,

$$\underline{A}\{1, 2, 3, 4\} = \{1, 2, 3\}$$

and

$$\overline{A}\{1, 2, 3, 4\} = \{1, 2, 3, 4, 5\} .$$
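The two approximations for Table 23.4 can be recomputed directly from the elementary sets. The sketch below is a minimal Python illustration of the second method, (23.10) and (23.11); the function names `elementary_sets`, `lower`, and `upper` and the tuple encoding of the table are assumptions made for this example.

```python
# Table 23.4: case -> (Wind, Humidity, Temperature); the decision is not needed here.
table = {
    1: ("low", "low", "medium"),
    2: ("low", "low", "low"),
    3: ("low", "medium", "medium"),
    4: ("low", "medium", "high"),
    5: ("low", "medium", "high"),
    6: ("medium", "low", "medium"),
    7: ("medium", "low", "medium"),
    8: ("medium", "low", "medium"),
    9: ("high", "high", "high"),
    10: ("medium", "high", "high"),
}

def elementary_sets(rows):
    """Group cases that agree on all attribute values (the partition A*)."""
    groups = {}
    for case, values in rows.items():
        groups.setdefault(values, set()).add(case)
    return list(groups.values())

def lower(X, partition):
    """Union of the elementary sets contained in X, as in (23.10)."""
    return set().union(*(b for b in partition if b <= X))

def upper(X, partition):
    """Union of the elementary sets that intersect X, as in (23.11)."""
    return set().union(*(b for b in partition if b & X))

A_star = elementary_sets(table)
X = {1, 2, 3, 4}  # the concept [(Trip, yes)]
print(lower(X, A_star), upper(X, A_star))
```

Cases in the lower approximation lead to certain rules; case 5 enters the upper approximation only because it is indiscernible from case 4.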
It is well known that for any $B \subseteq A$ and $X \subseteq U$,

$$\underline{B}X \subseteq X \subseteq \overline{B}X , \tag{23.12}$$

hence any case $x$ from $\underline{B}X$ is certainly a member of $X$, while any member $x$ of $\overline{B}X$ is possibly a member of $X$. This observation is used in the LERS data mining system. If an input data set is inconsistent, LERS computes lower and upper approximations for any concept and then induces certain rules from the lower approximation and possible rules from the upper approximation.
For example, if we want to induce certain and possible 


Table 23.4 An inconsistent decision table

        Attributes                           Decision
Case    Wind      Humidity    Temperature    Trip
1       low       low         medium         yes
2       low       low         low            yes
3       low       medium      medium         yes
4       low       medium      high           yes
5       low       medium      high           maybe
6       medium    low         medium         maybe
7       medium    low         medium         maybe
8       medium    low         medium         no
9       high      high        high           no
10      medium    high        high           no


rule sets for the concept [(Trip, yes)] from Table 23.4, 
we need to consider the following two data sets, pre- 
sented in Tables 23.5 and 23.6. 

Table 23.5 was obtained from Table 23.4 by assign- 
ing the value yes of the decision Trip to all cases from 
the lower approximation of [(Trip, yes)] and by replac- 
ing all remaining values of Trip by a special value, say 
SPECIAL. Similarly, Table 23.6 was obtained from Ta- 
ble 23.4 by assigning the value yes of the decision Trip 
to all cases from the upper approximation of [(Trip, 
yes)] and by replacing all remaining values of Trip 
by the value SPECIAL. Obviously, both tables 23.5 
and 23.6 are consistent. Therefore, we may use the 
LEM1 or LEM2 algorithms to induce rules from Ta- 
bles 23.5 and 23.6. The rule set induced by the LEM2 
algorithm from Table 23.5 is: 


2, 2, 2
(Wind, low) & (Humidity, low) → (Trip, yes),
2, 1, 1
(Humidity, medium) & (Temperature, medium) → (Trip, yes),
1, 4, 4
(Temperature, high) → (Trip, SPECIAL),
1, 4, 4
(Wind, medium) → (Trip, SPECIAL),

where all rules are presented in the LERS format, see Sect. 23.1.3.

Obviously, only rules with (Trip, yes) on the right-hand side are informative; the remaining rules, with (Trip, SPECIAL) on the right-hand side, should be ignored. The two informative rules are certain. The only informative rule induced by the LEM2 algorithm from Table 23.6 is:

1, 4, 5
(Wind, low) → (Trip, yes).

This rule is possible.

Table 23.5 A new data set for inducing certain rules for the concept [(Trip, yes)]

        Attributes                           Decision
Case    Wind      Humidity    Temperature    Trip
1       low       low         medium         yes
2       low       low         low            yes
3       low       medium      medium         yes
4       low       medium      high           SPECIAL
5       low       medium      high           SPECIAL
6       medium    low         medium         SPECIAL
7       medium    low         medium         SPECIAL
8       medium    low         medium         SPECIAL
9       high      high        high           SPECIAL
10      medium    high        high           SPECIAL

Table 23.6 A new data set for inducing possible rules for the concept [(Trip, yes)]

        Attributes                           Decision
Case    Wind      Humidity    Temperature    Trip
1       low       low         medium         yes
2       low       low         low            yes
3       low       medium      medium         yes
4       low       medium      high           yes
5       low       medium      high           yes
6       medium    low         medium         SPECIAL
7       medium    low         medium         SPECIAL
8       medium    low         medium         SPECIAL
9       high      high        high           SPECIAL
10      medium    high        high           SPECIAL
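The numbers attached to rules in the LERS format are what the classification scheme of Sect. 23.1.3 consumes. The following minimal Python sketch scores concepts by (23.6) and falls back to (23.7) when no rule matches completely. It is an illustration under stated assumptions, not the LERS classifier: the rule list mixes the two certain rules for (Trip, yes) with the single-condition rule (Humidity, high) → (Trip, no) quoted earlier (its strength 3 is taken from the remark that it covers three cases), and the dictionary layout and the function name `classify` are hypothetical.

```python
from collections import defaultdict

# Each rule: left-hand side pairs, concept, and strength; specificity is len(lhs).
rules = [
    {"lhs": [("Wind", "low"), ("Humidity", "low")], "concept": "yes", "strength": 2},
    {"lhs": [("Humidity", "medium"), ("Temperature", "medium")], "concept": "yes", "strength": 1},
    {"lhs": [("Humidity", "high")], "concept": "no", "strength": 3},
]

def classify(case, rules):
    """Return the concept with the largest support, or None if nothing matches."""
    scores = defaultdict(float)
    complete = [r for r in rules if all(case.get(a) == v for a, v in r["lhs"])]
    if complete:
        for r in complete:  # expression (23.6): strength * specificity
            scores[r["concept"]] += r["strength"] * len(r["lhs"])
    else:
        for r in rules:  # expression (23.7): matching factor * strength * specificity
            matched = sum(case.get(a) == v for a, v in r["lhs"])
            if matched:
                factor = matched / len(r["lhs"])
                scores[r["concept"]] += factor * r["strength"] * len(r["lhs"])
    return max(scores, key=scores.get) if scores else None

complete_case = {"Wind": "low", "Humidity": "low", "Temperature": "medium"}
partial_case = {"Wind": "high", "Humidity": "medium", "Temperature": "low"}
print(classify(complete_case, rules), classify(partial_case, rules))
```

The second case matches no rule completely, so only the rule sharing (Humidity, medium) contributes, with a matching factor of 1/2.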


23.3 Decision Table with Numerical Attributes 


An example of a data set with numerical attributes is 
presented in Table 23.7. 

In rule induction from numerical data a preliminary step called discretization [23.10-12] is usually conducted. During discretization the domain of a numerical attribute is divided into intervals defined by cut-points (left and right delimiters of intervals). Such an interval, delimited by two cut-points $c$ and $d$, will be denoted by $c..d$. In this chapter we discuss how to conduct both processes, rule induction and discretization, concurrently. First we need to check whether our data set is consistent. Note that numerical data are, in general, consistent, but inconsistent numerical data are possible. For inconsistent numerical data we need to compute lower and upper approximations and then induce certain and possible rule sets. In the data set from Table 23.7, $A^* = \{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\}, \{8\}\}$ and $\{d\}^* = \{\{1, 2, 3\}, \{4, 5\}, \{6, 7, 8\}\}$, so $A^* \leq \{d\}^*$, and the data set is consistent.
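Cut-points and the interval blocks they generate are easy to compute. The following minimal Python sketch assumes, as the MLEM2 approach discussed in this section does, that candidate cut-points are averages of consecutive distinct values of the sorted attribute domain; the function names `cutpoints` and `interval_blocks` are hypothetical, and the Wind values are those of Table 23.7.

```python
def cutpoints(values):
    """Averages of consecutive distinct values of the sorted domain."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def interval_blocks(column):
    """Map each interval (left, right) to the set of cases whose value falls in it."""
    lo, hi = min(column.values()), max(column.values())
    blocks = {}
    for c in cutpoints(column.values()):
        blocks[(lo, c)] = {x for x, v in column.items() if v < c}
        blocks[(c, hi)] = {x for x, v in column.items() if v > c}
    return blocks

# Wind column of Table 23.7: case -> value.
wind = {1: 4, 2: 8, 3: 4, 4: 8, 5: 12, 6: 16, 7: 30, 8: 12}

wind_blocks = interval_blocks(wind)
for (left, right), cases in wind_blocks.items():
    print(f"(Wind, {left:g}..{right:g})", sorted(cases))
```

Every cut-point yields two complementary blocks, so a numerical attribute with $k$ distinct values contributes $2(k - 1)$ attribute-value pairs to the search.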

A modified LEM2 algorithm for rule induction, 
called MLEM2 [23.13], does not need any preliminary 
discretization of numerical attributes. The domain of 


Table 23.7 A data set with numerical attributes

        Attributes                         Decision
Case    Wind    Humidity    Temperature    Trip
1       4       low         medium         yes
2       8       low         low            yes
3       4       medium      medium         yes
4       8       medium      high           maybe
5       12      low         medium         maybe
6       16      high        low            no
7       30      high        high           no
8       12      high        high           no


any numerical attribute is sorted first. Then potential 
cut-points are selected as averages of any two consec- 
utive values of the sorted list. For each cut-point c the 
MLEM2 algorithm creates two blocks, the first block 
contains all cases for which values of the numerical 
attribute are smaller than c, the second block contains 
the remaining cases (with values of the numerical at- 
tribute larger than c). Once all such blocks have been 
computed, rule induction in MLEM2 is conducted the 
same way as in LEM2. We will illustrate rule induction 


Table 23.8 Computing a local covering for the concept [(Trip, yes)], part I

(a, v) = t               [(a, v)]                 {1, 2, 3}      {1, 2, 3}
(Wind, 4..6)             {1, 3}                   {1, 3}         {1, 3} •
(Wind, 6..30)            {2, 4, 5, 6, 7, 8}       {2}            {2}
(Wind, 4..10)            {1, 2, 3, 4}             {1, 2, 3} •    -
(Wind, 10..30)           {5, 6, 7, 8}             -              -
(Wind, 4..14)            {1, 2, 3, 4, 5, 8}       {1, 2, 3}      -
(Wind, 14..30)           {6, 7}                   -              -
(Wind, 4..23)            {1, 2, 3, 4, 5, 6, 8}    {1, 2, 3}      -
(Wind, 23..30)           {7}                      -              -
(Humidity, low)          {1, 2, 5}                {1, 2}         {1, 2}
(Humidity, medium)       {3, 4}                   {3}            {3}
(Humidity, high)         {6, 7, 8}                -              -
(Temperature, low)       {2, 6}                   {2}            {2}
(Temperature, medium)    {1, 3, 5}                {1, 3}         {1, 3}
(Temperature, high)      {4, 7, 8}                -              -
Comments                                          1              2



Table 23.9 Computing a local covering for the concept [(Trip, yes)], part II

(a, v) = t               [(a, v)]                 {2}       {2}
(Wind, 4..6)             {1, 3}                   -         -
(Wind, 6..30)            {2, 4, 5, 6, 7, 8}       {2}       {2}
(Wind, 4..10)            {1, 2, 3, 4}             {2}       {2}
(Wind, 10..30)           {5, 6, 7, 8}             -         -
(Wind, 4..14)            {1, 2, 3, 4, 5, 8}       {2}       {2}
(Wind, 14..30)           {6, 7}                   -         -
(Wind, 4..23)            {1, 2, 3, 4, 5, 6, 8}    {2}       {2}
(Wind, 23..30)           {7}                      -         -
(Humidity, low)          {1, 2, 5}                {2}       {2} •
(Humidity, medium)       {3, 4}                   -         -
(Humidity, high)         {6, 7, 8}                -         -
(Temperature, low)       {2, 6}                   {2} •     -
(Temperature, medium)    {1, 3, 5}                -         -
(Temperature, high)      {4, 7, 8}                -         -
Comments                                          3         4


from Table 23.7 using the MLEM2 rule induction algorithm. The tracing of MLEM2 is shown in Tables 23.8 and 23.9. The corresponding comments are:

1. The set $G = \{1, 2, 3\}$. The best attribute-value pair $t$, with the largest cardinality of the intersection of $[t]$ and $G$ (presented in the third column of Table 23.8), is (Wind, 4..10). The corresponding entry in the third column of Table 23.8 is bulleted. However,

$$[(Wind, 4..10)] = \{1, 2, 3, 4\} \not\subseteq \{1, 2, 3\} = B ,$$

hence we need to look for the next $t$.

2. The set $G$ is the same, $G = \{1, 2, 3\}$. There are dashes for rows (Wind, 4..14) and (Wind, 4..23) since the corresponding intervals contain 4..10. There are three attribute-value pairs with $|[t] \cap G| = 2$: (Wind, 4..6), (Humidity, low), and (Temperature, medium). The best attribute-value pair, with the smallest cardinality of $[t]$, is (Wind, 4..6). This time

$$\{1, 2, 3, 4\} \cap \{1, 3\} = \{1, 3\} \subseteq \{1, 2, 3\} .$$

Obviously, the common part of both intervals is 4..6, so {(Wind, 4..6)} is the first element $T$ of $\mathcal{T}$.

3. The new set $G = B - [T] = \{1, 2, 3\} - \{1, 3\} = \{2\}$. The pair (Temperature, low) has the smallest cardinality of $[t]$, so it is the best choice. However, $[(Temperature, low)] = \{2, 6\} \not\subseteq \{1, 2, 3\}$, hence we need to look for the next $t$.

4. The pair (Humidity, low) is the best choice, and

$$\{2, 6\} \cap \{1, 2, 5\} = \{2\} \subseteq \{1, 2, 3\} ,$$

so {(Temperature, low), (Humidity, low)} is the second element $T$ of $\mathcal{T}$.

As a result,

$$\mathcal{T} = \{\{(Wind, 4..6)\}, \{(Temperature, low), (Humidity, low)\}\} .$$

In other words, the MLEM2 algorithm induces the following rule set for Table 23.7:

1, 2, 2
(Wind, 4..6) → (Trip, yes),
2, 1, 1
(Temperature, low) & (Humidity, low) → (Trip, yes).

23.4 Incomplete Data

Real-life data are frequently incomplete. In this section we will consider incompleteness in the form of missing attribute values. We will distinguish three types of missing attribute values:

• Lost values, denoted by ?, where the original values existed, but are currently unavailable, since these values have been, for example, erased or the operator forgot to input them. In rule induction we will induce rules from existing, specified attribute values.

• Do not care conditions, denoted by *, where the original values are unknown. For example, data were collected in the form of an interview, and some questions were considered to be irrelevant or were embarrassing. Let us say that in an interview associated with the diagnosis of a disease, there is a question about eye color. For some people such a question is irrelevant. In rule induction we are assuming that the attribute value is any value from the attribute domain.

• Attribute-concept values, denoted by -. This interpretation is a special case of the do not care condition: it is restricted to attribute values typical for the concept to which the case belongs. For example, typical values of temperature for patients sick with flu are high and very-high. If the temperature value is missing for a patient known to be sick with flu, then, using the attribute-concept interpretation, we will assume that the possible temperature values are high and very-high.


We will assume that for any case at least one at- 
tribute value is specified (i. e., is not missing) and that 
all decision values are specified. An example of a deci- 
sion table with missing attribute values is presented in 
Table 23.10. 

The definition of consistent data from Sect. 23.2 
cannot be applied to data with missing attribute val- 
ues, since for such data the standard definition of 
the indiscernibility relation must be extended. More- 
over, it is well known that the standard defini- 
tions of lower and upper approximations are not 
applicable to data with missing attribute values. In 
Sect. 23.4.1 we will discuss three generalizations of 
the standard approximations: singleton, subset, and 
concept. 


23.4.1 Singleton, Subset, 
and Concept Approximations 


For incomplete data the definition of a block of an attribute-value pair is modified [23.14]:

• If for an attribute $a$ there exists a case $x$ such that $a(x) = ?$, i.e., the corresponding value is lost, then the case $x$ should not be included in any blocks $[(a, v)]$ for all values $v$ of attribute $a$.

• If for an attribute $a$ there exists a case $x$ such that the corresponding value is a do not care condition, i.e., $a(x) = *$, then the case $x$ should be included in blocks $[(a, v)]$ for all specified values $v$ of attribute $a$.

• If for an attribute $a$ there exists a case $x$ such that the corresponding value is an attribute-concept value, i.e., $a(x) = -$, then the corresponding case $x$ should be included in blocks $[(a, v)]$ for all specified values $v \in V(x, a)$ of attribute $a$, where

$$V(x, a) = \{a(y) \mid a(y) \text{ is specified}, \ y \in U, \ d(y) = d(x)\} . \tag{23.13}$$

Table 23.10 An incomplete decision table

        Attributes                           Decision
Case    Wind      Humidity    Temperature    Trip
1       low       low         medium         yes
2       ?         low         *              yes
3       *         medium      medium         yes
4       low       ?           high           maybe
5       medium    -           medium         maybe
6       *         high        low            no
7       -         high        *              no
8       medium    high        high           no

For Table 23.10,

$$V(5, Humidity) = \emptyset \quad \text{and} \quad V(7, Wind) = \{medium\} ,$$

so the blocks of attribute-value pairs are

[(Wind, low)] = {1, 3, 4, 6},
[(Wind, medium)] = {3, 5, 6, 7, 8},
[(Humidity, low)] = {1, 2},
[(Humidity, medium)] = {3},
[(Humidity, high)] = {6, 7, 8},
[(Temperature, low)] = {2, 6, 7},
[(Temperature, medium)] = {1, 2, 3, 5, 7},
[(Temperature, high)] = {2, 4, 7, 8}.

For a case $x \in U$, the characteristic set $K_B(x)$ is defined as the intersection of the sets $K(x, a)$, for all $a \in B$, where the set $K(x, a)$ is defined in the following way:

• If $a(x)$ is specified, then $K(x, a)$ is the block $[(a, a(x))]$ of attribute $a$ and its value $a(x)$.

• If $a(x) = ?$ or $a(x) = *$, then the set $K(x, a) = U$.

• If $a(x) = -$, then the corresponding set $K(x, a)$ is equal to the union of all blocks of attribute-value pairs $(a, v)$, where $v \in V(x, a)$, if $V(x, a)$ is nonempty. If $V(x, a)$ is empty, $K(x, a) = U$.




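For Table 23.10 and $B = A$,

$$
\begin{aligned}
K_A(1) &= \{1, 3, 4, 6\} \cap \{1, 2\} \cap \{1, 2, 3, 5, 7\} = \{1\} , \\
K_A(2) &= U \cap \{1, 2\} \cap U = \{1, 2\} , \\
K_A(3) &= U \cap \{3\} \cap \{1, 2, 3, 5, 7\} = \{3\} , \\
K_A(4) &= \{1, 3, 4, 6\} \cap U \cap \{2, 4, 7, 8\} = \{4\} , \\
K_A(5) &= \{3, 5, 6, 7, 8\} \cap U \cap \{1, 2, 3, 5, 7\} = \{3, 5, 7\} , \\
K_A(6) &= U \cap \{6, 7, 8\} \cap \{2, 6, 7\} = \{6, 7\} , \\
K_A(7) &= \{3, 5, 6, 7, 8\} \cap \{6, 7, 8\} \cap U = \{6, 7, 8\} , \\
K_A(8) &= \{3, 5, 6, 7, 8\} \cap \{6, 7, 8\} \cap \{2, 4, 7, 8\} = \{7, 8\} .
\end{aligned}
$$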

The characteristic set $K_B(x)$ may be interpreted as the set of cases that are indistinguishable from $x$ using all attributes from $B$ and using a given interpretation of missing attribute values. For completely specified data sets (i.e., data sets with no missing attribute values), characteristic sets are reduced to elementary sets. The characteristic relation $R(B)$ is a relation on $U$ defined for $x, y \in U$ as follows

$$(x, y) \in R(B) \text{ if and only if } y \in K_B(x) . \tag{23.14}$$

The characteristic relation $R(B)$ is reflexive but, in general, does not need to be symmetric or transitive. Obviously, the characteristic relation $R(B)$ is known if we know the characteristic sets $K_B(x)$ for all $x \in U$ and vice versa. In our example, $R(A) = \{(1, 1), (2, 1), (2, 2), (3, 3), (4, 4), (5, 3), (5, 5), (5, 7), (6, 6), (6, 7), (7, 6), (7, 7), (7, 8), (8, 7), (8, 8)\}$. For a complete decision table, the characteristic relation $R(B)$ is reduced to the indiscernibility relation [23.2].

Definability for completely specified decision tables should be modified to fit incomplete decision tables. For incomplete decision tables, a union of some intersections of attribute-value pair blocks, where the attributes are members of $B$ and are distinct, will be called a $B$-locally definable set. A union of characteristic sets $K_B(x)$, where $x \in X \subseteq U$, will be called a $B$-globally definable set. Any set $X$ that is $B$-globally definable is $B$-locally definable; the converse is not true.

For example, the set $\{2\}$ is $A$-locally definable since $\{2\} = [(Humidity, low)] \cap [(Temperature, high)]$. However, the set $\{2\}$ is not $A$-globally definable. On the other hand, the set $\{5\}$ is not even locally definable since all blocks of attribute-value pairs containing case 5 also contain case 7. Obviously, if a set is not $B$-locally definable then it cannot be expressed by rule sets using attributes from $B$. Thus we should induce rules from sets that are at least $A$-locally definable.

For incomplete decision tables lower and upper approximations may be defined in a few different ways. We suggest three different definitions of lower and upper approximations for incomplete decision tables, following [23.14-16]. Let $X$ be a concept, a subset of $U$, let $B$ be a subset of the set $A$ of all attributes, and let $R(B)$ be the characteristic relation of the incomplete decision table. Our first definition uses an idea similar to the first method in Sect. 23.2, and is based on constructing both approximations from single elements of the set $U$. We will call these approximations singleton. A singleton $B$-lower approximation of $X$ is defined as follows

$$\underline{B}X = \{x \in U \mid K_B(x) \subseteq X\} . \tag{23.15}$$

A singleton $B$-upper approximation of $X$ is

$$\overline{B}X = \{x \in U \mid K_B(x) \cap X \neq \emptyset\} . \tag{23.16}$$

In our example of the decision table presented in Table 23.10, the singleton $A$-lower and $A$-upper approximations of the concept $\{1, 2, 3\}$ are:

$$\underline{A}\{1, 2, 3\} = \{1, 2, 3\} , \tag{23.17}$$
$$\overline{A}\{1, 2, 3\} = \{1, 2, 3, 5\} . \tag{23.18}$$

We may easily observe that the set $\{1, 2, 3, 5\} = \overline{A}\{1, 2, 3\}$ is not $A$-locally definable since in all blocks of attribute-value pairs cases 5 and 7 are inseparable. Thus, as was observed in, e.g., [23.14-16], singleton approximations should not be used, theoretically, for rule induction.

The second method of defining lower and upper approximations for complete decision tables uses another idea: lower and upper approximations are unions of elementary sets, subsets of $U$. Therefore, we may define lower and upper approximations for incomplete decision tables by analogy with the second method in Sect. 23.2, using characteristic sets instead of elementary sets. There are two ways to do this. Using the first way, a subset $B$-lower approximation of $X$ is defined as follows

$$\underline{B}X = \bigcup \{K_B(x) \mid x \in U, K_B(x) \subseteq X\} . \tag{23.19}$$

A subset $B$-upper approximation of $X$ is

$$\overline{B}X = \bigcup \{K_B(x) \mid x \in U, K_B(x) \cap X \neq \emptyset\} . \tag{23.20}$$

For any concept $X$, singleton $B$-lower and $B$-upper approximations of $X$ are subsets of the subset $B$-lower and $B$-upper approximations of $X$, respectively [23.16], because the characteristic relation $R(B)$ is reflexive. For the decision table presented in Table 23.10, the subset $A$-lower and $A$-upper approximations are

$$\underline{A}\{1, 2, 3\} = \{1, 2, 3\} ,$$
$$\overline{A}\{1, 2, 3\} = \{1, 2, 3, 5, 7\} .$$


The second possibility is to modify the subset definition of lower and upper approximations by replacing the universe $U$ from the subset definition by a concept $X$. A concept $B$-lower approximation of the concept $X$ is defined as follows

$$\underline{B}X = \bigcup \{K_B(x) \mid x \in X, K_B(x) \subseteq X\} . \tag{23.21}$$

Obviously, the subset $B$-lower approximation of $X$ is the same set as the concept $B$-lower approximation of $X$. A concept $B$-upper approximation of the concept $X$ is defined as follows

$$\overline{B}X = \bigcup \{K_B(x) \mid x \in X, K_B(x) \cap X \neq \emptyset\} = \bigcup \{K_B(x) \mid x \in X\} . \tag{23.22}$$

The concept upper approximations were defined in [23.17] and [23.18] as well. The concept $B$-upper approximation of $X$ is a subset of the subset $B$-upper approximation of $X$ [23.16]. For the decision table presented in Table 23.10, the concept $A$-upper approximation is

$$\overline{A}\{1, 2, 3\} = \{1, 2, 3\} .$$

Note that for complete decision tables, all three definitions of lower and upper approximations, singleton, subset, and concept, are reduced to the same standard definition of lower and upper approximations, respectively.
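The three pairs of approximations of this section differ only in what is collected (cases or whole characteristic sets) and in where $x$ ranges (over $U$ or over $X$). The following minimal Python sketch makes that explicit for the concept $\{1, 2, 3\}$ of Table 23.10; the characteristic sets are copied from the text, and the function names are hypothetical.

```python
# Characteristic sets K_A(x) for Table 23.10, as listed in the text.
K = {1: {1}, 2: {1, 2}, 3: {3}, 4: {4}, 5: {3, 5, 7}, 6: {6, 7}, 7: {6, 7, 8}, 8: {7, 8}}
U = set(K)

def union(sets):
    return set().union(*sets)

def singleton(X):
    """(23.15) and (23.16): collect the cases x themselves."""
    return {x for x in U if K[x] <= X}, {x for x in U if K[x] & X}

def subset(X):
    """(23.19) and (23.20): collect characteristic sets, x ranging over U."""
    return union(K[x] for x in U if K[x] <= X), union(K[x] for x in U if K[x] & X)

def concept(X):
    """(23.21) and (23.22): collect characteristic sets, x ranging over X only."""
    return union(K[x] for x in X if K[x] <= X), union(K[x] for x in X)

X = {1, 2, 3}
for name, approx in (("singleton", singleton), ("subset", subset), ("concept", concept)):
    lower, upper = approx(X)
    print(name, sorted(lower), sorted(upper))
```

For this concept only the upper approximations differ, which is why the choice of definition matters mainly for possible rules.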


23.4.2 Modified LEM2 Algorithm 


The same MLEM2 rule induction algorithm from Sect. 23.3 may be used for rule induction from incomplete data; the only difference is a different definition of blocks of attribute-value pairs. Let us apply the MLEM2 algorithm to the data set from Table 23.10. First, we need to decide what kind of approximations we are going to use: singleton, subset, or concept. In our example, we use concept approximations. Since for Table 23.10

$$\underline{A}\{1, 2, 3\} = \overline{A}\{1, 2, 3\} = \{1, 2, 3\} ,$$

we will trace the MLEM2 algorithm applied to the set $\{1, 2, 3\}$; this way our rule set for the concept [(Trip, yes)] is at the same time certain and possible. The tracing of MLEM2 is presented in Table 23.11. The corresponding comments are:

1. The set $G = \{1, 2, 3\}$. The best attribute-value pair $t$, with the largest cardinality of the intersection of $[t]$ and $G$ (presented in the third column of Table 23.11), is (Temperature, medium). The corresponding entry in the third column of Table 23.11 is bulleted. However,

$$[(Temperature, medium)] = \{1, 2, 3, 5, 7\} \not\subseteq \{1, 2, 3\} = B ,$$

hence we need to look for the next $t$.

2. The set $G$ is the same, $G = \{1, 2, 3\}$. There are two attribute-value pairs with $|[t] \cap G| = 2$. One of them, (Humidity, low), has the smallest cardinality of $[t]$, so we select it. This time

$$\{1, 2, 3, 5, 7\} \cap \{1, 2\} = \{1, 2\} \subseteq \{1, 2, 3\} .$$

However, (Temperature, medium) is redundant, since $[(Humidity, low)] \subseteq \{1, 2, 3\}$, hence {(Humidity, low)} is the first element $T$ of $\mathcal{T}$.

3. The new set $G = B - [T] = \{1, 2, 3\} - \{1, 2\} = \{3\}$. The pair (Humidity, medium) has the smallest cardinality of $[t]$, so it is the best choice. Additionally, $[(Humidity, medium)] = \{3\} \subseteq \{1, 2, 3\}$, hence we are done, and the set $T$ = {(Humidity, medium)}.


Table 23.11 Computing a rule set for the concept [(Trip, yes)], Table 23.10

(a, v) = t               [(a, v)]            {1, 2, 3}      {1, 2, 3}    {3}
(Wind, low)              {1, 3, 4, 6}        {1, 3}         {1, 3}       {3}
(Wind, medium)           {3, 5, 6, 7, 8}     {3}            {3}          {3}
(Humidity, low)          {1, 2}              {1, 2}         {1, 2} •     -
(Humidity, medium)       {3}                 {3}            {3}          {3} •
(Humidity, high)         {6, 7, 8}           -              -            -
(Temperature, low)       {2, 6, 7}           {2}            -            -
(Temperature, medium)    {1, 2, 3, 5, 7}     {1, 2, 3} •    -            {3}
(Temperature, high)      {2, 4, 7, 8}        {2}            -            -
Comments                                     1              2            3




Therefore, $\mathcal{T}$ = {{(Humidity, low)}, {(Humidity, medium)}}. The MLEM2 algorithm induces the following rule set for Table 23.10:

1, 2, 2
(Humidity, low) → (Trip, yes),
1, 1, 1
(Humidity, medium) → (Trip, yes).

23.4.3 Probabilistic Approximations 


In this section we are going to generalize singleton, subset, and concept approximations from Sect. 23.4.1 to corresponding approximations that are defined using an additional parameter (or threshold), denoted by $\alpha$, and interpreted as a probability. A generalization of standard approximations, called probabilistic approximations, has been studied in many papers [23.19-26].

Let $B$ be a subset of the attribute set $A$ and $X$ be a subset of $U$.

A $B$-singleton probabilistic approximation of $X$ with the threshold $\alpha$, $0 < \alpha \leq 1$, denoted by $appr^{singleton}_{\alpha,B}(X)$, is defined as follows

$$\{x \mid x \in U, \ \Pr(X \mid K_B(x)) \geq \alpha\} ,$$

where

$$\Pr(X \mid K_B(x)) = \frac{|X \cap K_B(x)|}{|K_B(x)|}$$

is the conditional probability of $X$ given $K_B(x)$ and $|Y|$ denotes the cardinality of set $Y$.

A $B$-subset probabilistic approximation of the set $X$ with the threshold $\alpha$, $0 < \alpha \leq 1$, denoted by $appr^{subset}_{\alpha,B}(X)$, is defined as follows

$$\bigcup \{K_B(x) \mid x \in U, \ \Pr(X \mid K_B(x)) \geq \alpha\} .$$

A $B$-concept probabilistic approximation of the set $X$ with the threshold $\alpha$, $0 < \alpha \leq 1$, denoted by $appr^{concept}_{\alpha,B}(X)$, is defined as follows

$$\bigcup \{K_B(x) \mid x \in X, \ \Pr(X \mid K_B(x)) \geq \alpha\} .$$

For simplicity, if $B = A$, the $A$-singleton, $A$-subset, and $A$-concept probabilistic approximations will be called singleton, subset, and concept probabilistic approximations, and will be denoted by $appr^{singleton}_{\alpha}(X)$, $appr^{subset}_{\alpha}(X)$, and $appr^{concept}_{\alpha}(X)$, respectively.

Obviously, for the concept $X$, the probabilistic approximation of a given type (singleton, subset, or concept) of $X$ computed for the threshold equal to the smallest positive conditional probability $\Pr(X \mid [x])$ is equal to the standard upper approximation of $X$ of the same type. Additionally, the probabilistic approximation of a given type of $X$ computed for the threshold equal to 1 is equal to the standard lower approximation of $X$ of the same type.

For the data set from Table 23.12, the set of blocks of attribute-value pairs is

[(Wind, low)] = {1, 3, 5},
[(Wind, high)] = {4, 6, 7, 8},
[(Humidity, low)] = {1, 2, 3, 5},
[(Humidity, high)] = {4, 5, 6, 8},
[(Temperature, low)] = {1, 2, 5, 6},
[(Temperature, high)] = {1, 4, 6, 7, 8}.

The corresponding characteristic sets are

$$
\begin{aligned}
K_A(1) &= K_A(3) = \{1, 3, 5\} , \\
K_A(2) &= \{1, 2, 5\} , \\
K_A(4) &= \{4, 6, 8\} , \\
K_A(5) &= \{1, 5\} , \\
K_A(6) &= K_A(8) = \{4, 6, 8\} , \\
K_A(7) &= \{4, 6, 7, 8\} .
\end{aligned}
$$

Conditional probabilities of the concept $\{1, 2, 3, 4\}$ given a characteristic set $K_A(x)$ are presented in Table 23.13.


Table 23.12 An incomplete decision table

        Attributes                         Decision
Case    Wind    Humidity    Temperature    Trip
1       low     low         *              yes
2       ?       low         low            yes
3       low     low         ?              yes
4       high    high        high           yes
5       low     *           low            no
6       high    high        *              no
7       high    ?           high           no
8       high    high        high           no


Table 23.13 Conditional probabilities

K_A(x)                       {1, 2, 5}    {1, 3, 5}    {1, 5}    {4, 6, 8}    {4, 6, 7, 8}
Pr({1, 2, 3, 4} | K_A(x))    0.667        0.667        0.5       0.333        0.25

For Table 23.12, all probabilistic approximations (singleton, subset, and concept) are

$$
\begin{aligned}
appr^{singleton}_{0.25}(\{1, 2, 3, 4\}) &= U , \\
appr^{singleton}_{0.333}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 4, 5, 6, 8\} , \\
appr^{singleton}_{0.5}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 5\} , \\
appr^{singleton}_{0.667}(\{1, 2, 3, 4\}) &= \{1, 2, 3\} , \\
appr^{singleton}_{1}(\{1, 2, 3, 4\}) &= \emptyset , \\
appr^{subset}_{0.25}(\{1, 2, 3, 4\}) &= U , \\
appr^{subset}_{0.333}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 4, 5, 6, 8\} , \\
appr^{subset}_{0.5}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 5\} , \\
appr^{subset}_{0.667}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 5\} , \\
appr^{subset}_{1}(\{1, 2, 3, 4\}) &= \emptyset , \\
appr^{concept}_{0.25}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 4, 5, 6, 8\} , \\
appr^{concept}_{0.333}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 4, 5, 6, 8\} , \\
appr^{concept}_{0.5}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 5\} , \\
appr^{concept}_{0.667}(\{1, 2, 3, 4\}) &= \{1, 2, 3, 5\} , \\
appr^{concept}_{1}(\{1, 2, 3, 4\}) &= \emptyset .
\end{aligned}
$$

For rule induction from probabilistic approximations of the given concept a technique similar to the one in Sect. 23.2 may be used. For any concept and the probabilistic approximation of the concept we will create a new decision table. Let us illustrate this idea by inducing a rule set for the concept [(Trip, yes)] from Table 23.12 using the concept probabilistic approximation with $\alpha = 0.5$. The corresponding modified decision table is presented in Table 23.14.

Table 23.14 A modified decision table

        Attributes                         Decision
Case    Wind    Humidity    Temperature    Trip
1       low     low         *              yes
2       ?       low         low            yes
3       low     low         ?              yes
4       high    high        high           SPECIAL
5       low     *           low            no
6       high    high        *              SPECIAL
7       high    ?           high           SPECIAL
8       high    high        high           SPECIAL

In the data set presented in Table 23.14, all values of Trip are copied from Table 23.12 for all cases from

$$appr^{concept}_{0.5}(\{1, 2, 3, 4\}) = \{1, 2, 3, 5\} ,$$

while for all remaining cases values of Trip are replaced by the SPECIAL value. The MLEM2 rule induction algorithm should then be used with the upper approximation of the corresponding type (singleton, subset, or concept). In our example, the MLEM2 rule induction algorithm, using the concept upper approximation, induces the following rule set from Table 23.14:

• 1, 3, 4
  (Humidity, low) → (Trip, yes),
• 1, 4, 4
  (Wind, high) → (Trip, SPECIAL),
• 2, 1, 2
  (Wind, low) & (Temperature, low) → (Trip, no).

The only useful rules are those with (Trip, yes) on the right-hand side. Thus, the only rule that survives is:

• 1, 3, 4
  (Humidity, low) → (Trip, yes).




23.5 Conclusions

Investigation of rule induction methods is subject to intensive research activity. New versions of rule induction algorithms based on probabilistic approximations have been explored [23.27, 28]. Novel rule induction algorithms in which computation of probabilistic approximations is done in parallel with rule induction were recently developed and experimentally tested [23.29]. The LEM2 algorithm was implemented in a bagged version [23.30], using ideas of ensemble learning.

References

23.1 Z. Pawlak: Rough sets, Int. J. Comput. Inf. Sci. 11, 341-356 (1982)
23.2 Z. Pawlak: Rough Sets. Theoretical Aspects of Reasoning about Data (Kluwer Academic, Boston 1991)
23.3 J.W. Grzymala-Busse: Rule induction. In: Data Mining and Knowledge Discovery Handbook, Second Edition, ed. by O. Maimon, L. Rokach (Springer, Berlin, Heidelberg 2010) pp. 249-265
23.4 Z. Pawlak, J.W. Grzymala-Busse, R. Słowiński, W. Ziarko: Rough sets, Commun. ACM 38, 89-95 (1995)
23.5 J.G. Bazan, M.S. Szczuka, A. Wojna, M. Wojnarski: On the evolution of rough set exploration system. In: Rough Sets and Current Trends in Computing, ed. by S. Tsumoto, R. Słowiński, J. Komorowski, J.W. Grzymala-Busse (Springer, Berlin, Heidelberg 2004) pp. 592-601
23.6 J.W. Grzymala-Busse: LERS - A system for learning from examples based on rough sets. In: Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, ed. by R. Słowiński (Kluwer Academic, Boston 1992) pp. 3-18
23.7 J. Stefanowski: Algorithms of Decision Rule Induction in Data Mining (Poznan University of Technology Press, Poznan 2001)
23.8 L.B. Booker, D.E. Goldberg, J.F. Holland: Classifier systems and genetic algorithms. In: Machine Learning. Paradigms and Methods, ed. by J.G. Carbonell (MIT Press, Cambridge 1990) pp. 235-282
23.9 J.H. Holland, K.J. Holyoak, R.E. Nisbett, P.R. Thagard: Induction. Processes of Inference, Learning, and Discovery (MIT Press, Cambridge 1986)
23.10 M.R. Chmielewski, J.W. Grzymala-Busse: Global discretization of continuous attributes as preprocessing for machine learning, Int. J. Approx. Reason. 15(4), 319-331 (1996)
23.11 J.W. Grzymala-Busse: Discretization of numerical attributes. In: Handbook of Data Mining and Knowledge Discovery, ed. by W. Kloesgen, J. Zytkow (Oxford Univ. Press, Oxford 2002) pp. 218-225
23.12 J.W. Grzymala-Busse: Mining numerical data - A rough set approach, Trans. Rough Sets 11, 1-13 (2010)
23.13 J.W. Grzymala-Busse: MLEM2: A new algorithm for rule induction from imperfect data, Proc. 9th Int. Conf. Inform. Proc. Manag. Uncertain. Knowl.-Based Syst. (2002) pp. 243-250
23.14 J.W. Grzymala-Busse: Data with missing attribute values: Generalization of indiscernibility relation and rule induction, Trans. Rough Sets 1, 78-95 (2004)
23.15 J.W. Grzymala-Busse: Rough set strategies to data with missing attribute values, Proc. Workshop Found. New Dir. Data Min. (2003) pp. 56-63
23.16 J.W. Grzymala-Busse: Characteristic relations for incomplete data: A generalization of the indiscernibility relation. In: Rough Sets and Current Trends in Computing, ed. by S. Tsumoto, R. Słowiński, J. Komorowski, J.W. Grzymala-Busse (Springer, Berlin, Heidelberg 2004) pp. 244-253
23.17 T.Y. Lin: Topological and fuzzy rough sets. In: Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory, ed. by R. Słowiński (Kluwer Academic, Boston 1992) pp. 287-304
23.18 R. Słowiński, D. Vanderpooten: A generalized definition of rough approximations based on similarity, IEEE Trans. Knowl. Data Eng. 12, 331-336 (2000)
23.19 J.W. Grzymala-Busse, W. Ziarko: Data mining based on rough sets. In: Data Mining: Opportunities and Challenges, ed. by J. Wang (Idea Group, Hershey 2003) pp. 142-173
23.20 J.W. Grzymala-Busse, Y. Yao: Probabilistic rule induction with the LERS data mining system, Int. J. Intell. Syst. 26, 518-539 (2011)
23.21 Z. Pawlak, A. Skowron: Rough sets: Some extensions, Inf. Sci. 177, 28-40 (2007)
23.22 Z. Pawlak, S.K.M. Wong, W. Ziarko: Rough sets: probabilistic versus deterministic approach, Int. J. Man-Mach. Stud. 29, 81-95 (1988)
23.23 Y.Y. Yao: Probabilistic rough set approximations, Int. J. Approx. Reason. 49, 255-271 (2008)
23.24 Y.Y. Yao, S.K.M. Wong: A decision theoretic framework for approximate concepts, Int. J. Man-Mach. Stud. 37, 793-809 (1992)
23.25 W. Ziarko: Variable precision rough set model, J. Comput. Syst. Sci. 46(1), 39-59 (1993)
23.26 W. Ziarko: Probabilistic approach to rough sets, Int. J. Approx. Reason. 49, 272-284 (2008)
23.27 P.G. Clark, J.W. Grzymala-Busse: Experiments on probabilistic approximations, IEEE Int. Conf. Granul. Comput. (2011) pp. 144-149
23.28 P.G. Clark, J.W. Grzymala-Busse, M. Kuehnhausen: Local probabilistic approximations for incomplete data, Lect. Notes Comput. Sci. 7661, 93-98 (2012)
23.29 J.W. Grzymala-Busse, W. Rzasa: A local version of the MLEM2 algorithm for rule induction, Fundam. Inform. 100, 99-116 (2010)
23.30 C. Cohagan, J.W. Grzymala-Busse, Z.S. Hippe: Experiments on mining inconsistent data with bagging and the MLEM2 rule induction algorithm, Int. J. Granul. Comput. Rough Sets Intell. Syst. 2, 257-271 (2012)


24. Probabilistic Rough Sets

Yiyu Yao, Salvatore Greco, Roman Słowiński

As quantitative generalizations of Pawlak rough sets, probabilistic rough sets consider degrees of overlap between equivalence classes and the set. An equivalence class is put into the lower approximation if the conditional probability of the set, given the equivalence class, is equal to or above one threshold; an equivalence class is put into the upper approximation if the conditional probability is above another threshold. We review a basic model of probabilistic rough sets (i.e., decision-theoretic rough set model) and variations. We present the main results of probabilistic rough sets by focusing on three issues: (a) interpretation and calculation of the required thresholds, (b) estimation of the required conditional probabilities, and (c) interpretation and applications of probabilistic rough set approximations.

24.1 Motivation for Studying Probabilistic Rough Sets
24.2 Pawlak Rough Sets
    24.2.1 Rough Set Approximations
    24.2.2 Construction of Rough Set Approximations
24.3 A Basic Model of Probabilistic Rough Sets
24.4 Variants of Probabilistic Rough Sets
    24.4.1 Variable Precision Rough Sets
    24.4.2 Parameterized Rough Sets
    24.4.3 Confirmation-Theoretic Rough Sets
    24.4.4 Bayesian Rough Sets
24.5 Three Fundamental Issues of Probabilistic Rough Sets
    24.5.1 Decision-Theoretic Rough Set Model: Determining the Thresholds
    24.5.2 Naive Bayesian Rough Set Model: Estimating the Conditional Probability
    24.5.3 Three-Way Decisions: Interpreting the Three Regions
24.6 Dominance-Based Rough Set Approaches
24.7 A Basic Model of Dominance-Based Probabilistic Rough Sets
24.8 Variants of Probabilistic Dominance-Based Rough Set Approach
    24.8.1 Variable Consistency Dominance-Based Rough Sets
    24.8.2 Parameterized Dominance-Based Rough Sets
    24.8.3 Confirmation-Theoretic Dominance-Based Rough Sets
    24.8.4 Bayesian Dominance-Based Rough Sets
24.9 Three Fundamental Issues of Probabilistic Dominance-Based Rough Sets
    24.9.1 Decision-Theoretic Dominance-Based Rough Set Model: Determining the Thresholds
    24.9.2 Stochastic Dominance-Based Rough Set Approach: Estimating the Conditional Probability
    24.9.3 Three-Way Decisions: Interpreting the Three Regions in the Case of Dominance-Based Rough Sets
24.10 Conclusions
References




24.1 Motivation for Studying Probabilistic Rough Sets

Rough set theory [24.1, 2] provides a simple and elegant method for analyzing data represented in a tabular form called an information table. The rows of the table represent a finite set of objects, the columns represent a finite set of attributes, and each cell represents the value of an object on the corresponding attribute. With a limited number of attributes, we may only be able to describe some subsets of objects precisely [24.3, 4]. Those subsets that can be precisely described are called definable sets, and all other subsets are called undefinable sets. A fundamental notion of rough set theory is the approximation of a subset of objects by a pair of definable sets from below and above, or equivalently, by three pairwise disjoint positive, negative, and boundary regions [24.4].

Pawlak rough set approximations are characterized by a zero tolerance of errors. That is, an object in the lower approximation certainly belongs to the set and an object in the complement of the upper approximation certainly does not belong to the set. This has motivated the introduction of many different generalizations of rough sets. By introducing certain levels of errors, probabilistic rough sets [24.5, 6] are quantitative generalizations of the qualitative Pawlak rough sets. Although several specific models of probabilistic rough sets had been considered by some authors [24.7-10], a more general model, called the decision-theoretic rough set (DTRS) model, was first proposed by Yao et al. [24.11, 12] based on the well-established Bayesian decision theory. Other probabilistic models include variable precision rough sets [24.13, 14], Bayesian rough sets [24.15-18], parameterized rough sets [24.19, 20], game-theoretic rough sets [24.21, 22], variable-consistency indiscernibility-based and dominance-based rough sets [24.23, 24], stochastic dominance-based rough sets [24.25], naive Bayesian rough sets [24.26], information-theoretic rough sets [24.27], confirmation-theoretic rough sets [24.28], and many different types of probabilistic rough set approximations [24.29, 30].

In this chapter, we present a basic model of probabilistic rough sets and a brief review of other probabilistic rough set models. We examine in particular three fundamental issues, namely, the interpretation and computation of the pair of thresholds, the estimation of probability, and an application of the three regions. We also show how a probabilistic approach can be applied when an order on attribute values, representing the extent to which an object has the property described by an attribute, has to be taken into account. This situation is handled by the well-known rough set extension called the dominance-based rough set approach [24.31-34]. A full understanding of these issues will greatly increase the chance of success when applying probabilistic rough sets in real-world applications.

24.2 Pawlak Rough Sets

We present a semantically meaningful definition of rough set approximations and a simple method for constructing rough set approximations.

24.2.1 Rough Set Approximations

In rough set theory, a finite set of objects is described by using a finite set of attributes in a tabular form, called an information table [24.2]. Formally, an information table can be expressed as

$$
S = (U, AT, \{V_a \mid a \in AT\}, \{I_a \mid a \in AT\}),
$$

where

$U$ is a finite nonempty set of objects called the universe,
$AT$ is a finite nonempty set of attributes,
$V_a$ is a nonempty set of values for $a \in AT$,
$I_a : U \to V_a$ is an information function.

The information table provides all available information about the set of objects, based on which we can perform tasks of analysis and inference.

In an information table, one can introduce a description language, as suggested by Marek and Pawlak [24.3], to formally describe objects. We consider a language DL that is recursively defined as follows:

(1) $(a = v) \in \mathrm{DL}$, where $a \in AT$, $v \in V_a$,
(2) if $p, q \in \mathrm{DL}$, then $\neg p,\ p \wedge q,\ p \vee q \in \mathrm{DL}$.

Formulas defined by (1) are called atomic formulas. The satisfiability of a formula $p$ by an object $x$, written $x \models p$, is defined as follows:

(i) $x \models (a = v)$ iff $I_a(x) = v$,
(ii) $x \models \neg p$ iff $\neg(x \models p)$,
(iii) $x \models p \wedge q$ iff $x \models p$ and $x \models q$,
(iv) $x \models p \vee q$ iff $x \models p$ or $x \models q$.


If $p$ is a formula, the set $m(p) \subseteq U$ defined by

$$
m(p) = \{x \in U \mid x \models p\} \tag{24.1}
$$

is called the meaning set of $p$. That is, the meaning set $m(p)$ consists of all those objects that satisfy the formula $p$.

With the introduction of a description language, we can formally describe an important characteristic of an information table, namely, that some subsets of objects are definable or describable while others are not. A subset of objects $X \subseteq U$ is called a definable set [24.3, 4] if there exists a formula $p$ such that

$$
X = m(p), \tag{24.2}
$$

otherwise, $X$ is called an undefinable set. The formula $p$ is called a description of $X$. Let $\mathrm{DEF}(U) \subseteq 2^U$ denote the family of all definable sets, where $2^U$ is the power set of $U$. By definition, $\mathrm{DEF}(U)$ contains the empty set $\emptyset$ and the entire universe $U$, and is closed under set complement, intersection, and union. In other words, $\mathrm{DEF}(U)$ is a sub-Boolean algebra of the power set $2^U$.

For any subset of objects $X \subseteq U$, which may be either definable or undefinable, we define the following pair of lower and upper approximations:

$$
\begin{aligned}
\underline{apr}(X) &= \text{the largest definable set contained in } X \\
&= \bigcup \{G \in \mathrm{DEF}(U) \mid G \subseteq X\}, \\
\overline{apr}(X) &= \text{the smallest definable set containing } X \\
&= \bigcap \{G \in \mathrm{DEF}(U) \mid X \subseteq G\}.
\end{aligned} \tag{24.3}
$$

By definition, it follows that $\underline{apr}(X) \subseteq X \subseteq \overline{apr}(X)$ for any $X \subseteq U$, and $\underline{apr}(X) = X = \overline{apr}(X)$ if and only if $X \in \mathrm{DEF}(U)$. The definition is semantically meaningful in the sense that it clearly explains the motivation for introducing rough set approximations and provides an interpretation of the approximations. However, one cannot use this definition to construct rough set approximations easily.


24.2.2 Construction 
of Rough Set Approximations 


A simple method for constructing rough set approximations is through an equivalence relation. For an attribute $a \in AT$, the information function $I_a$ maps an object in $U$ to a value of $V_a$, that is, $I_a(x) \in V_a$. For an attribute $a \in AT$, we can define an equivalence relation $E_a$ as follows: for $x, y \in U$,

$$
x E_a y \iff I_a(x) = I_a(y). \tag{24.4}
$$

The equivalence class containing $x$ is denoted by $[x]_a$. Similarly, for a subset of attributes $A \subseteq AT$, we define an equivalence relation $E_A$:

$$
x E_A y \iff \forall a \in A\, (I_a(x) = I_a(y)). \tag{24.5}
$$

The equivalence class containing $x$ is denoted by $[x]_A$. By definition, it follows that, for $a \in AT$ and $A \subseteq AT$,

$$
\begin{aligned}
E_{\{a\}} &= E_a, & [x]_{\{a\}} &= [x]_a, \\
E_A &= \bigcap_{a \in A} E_a, & [x]_A &= \bigcap_{a \in A} [x]_a.
\end{aligned} \tag{24.6}
$$

That is, we can construct the equivalence relation induced by a subset of attributes $A$ by using the equivalence relations induced by the individual attributes in $A$.

Consider the equivalence relation $E_A \subseteq U \times U$ induced by a subset of attributes $A \subseteq AT$. The equivalence relation $E_A$ induces a partition $U/E_A$ of $U$, i.e., a family of nonempty and pairwise disjoint subsets whose union is the universe. For an object $x \in U$, its equivalence class is given by

$$
[x]_{E_A} = \{y \in U \mid x E_A y\}. \tag{24.7}
$$

By taking the union of a family of equivalence classes, one can construct an atomic sub-Boolean algebra $B(U/E_A)$ of $2^U$ with $U/E_A$ as the set of atoms:

$$
B(U/E_A) = \left\{ \bigcup F \mid F \subseteq U/E_A \right\}. \tag{24.8}
$$

That is, $B(U/E_A)$ contains the empty set $\emptyset$ and the whole set $U$, and is closed with respect to set complement, intersection, and union. The three notions of the equivalence relation $E_A$, the partition $U/E_A$, and the atomic Boolean algebra $B(U/E_A)$ uniquely determine each other. We can therefore use $E_A$, $U/E_A$, and $B(U/E_A)$ interchangeably.
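The construction in (24.5)-(24.7) can be sketched in a few lines of Python. The information table below, its attribute names, and the helper names `partition` and `eq_class` are hypothetical and chosen only for illustration; objects are grouped by their tuple of values on the attributes in A, which is one straightforward way to obtain U/E_A.

```python
from collections import defaultdict

# Hypothetical information table: object -> {attribute: value}
table = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "large"},
    "o3": {"color": "blue", "size": "small"},
    "o4": {"color": "red", "size": "small"},
}

def partition(table, attrs):
    """Return U/E_A: objects with equal values on all attributes in attrs."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(x)
    return {frozenset(c) for c in classes.values()}

def eq_class(table, attrs, x):
    """Return the equivalence class [x]_A."""
    return next(c for c in partition(table, attrs) if x in c)

# [x]_A equals the intersection of the single-attribute classes [x]_a, as in (24.6).
both = eq_class(table, ["color", "size"], "o1")
meet = eq_class(table, ["color"], "o1") & eq_class(table, ["size"], "o1")
print(sorted(both), sorted(meet))
```

Grouping by value tuples avoids comparing every pair of objects, which a direct reading of (24.5) would suggest.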

The pair $apr = (U, E_A)$, equivalently, the pair $apr = (U, U/E_A)$ or the pair $apr = (U, B(U/E_A))$, is called an approximation space. Although the three representations are equivalent, each of them provides a different hint when we generalize rough sets. The pair $apr = (U, E_A)$ is useful for generalizing rough sets using a nonequivalence relation [24.35]. The partition $U/E_A$ may be viewed as a granulation of the universe $U$, and the pair $apr = (U, U/E_A)$ relates rough sets and granular computing [24.36]. The pair $apr = (U, B(U/E_A))$ leads to a subsystem-based formulation and generalizations [24.37].

For a subset of attributes $A \subseteq AT$, if we restrict the formulas of DL by using only attributes in $A$, we obtain a sublanguage $\mathrm{DL}(A) \subseteq \mathrm{DL}$. It can be proved that the family of all definable sets $\mathrm{DEF}_A(U)$ defined by $\mathrm{DL}(A)$ is exactly the sub-Boolean algebra $B(U/E_A)$. With respect to a subset of attributes $A \subseteq AT$, each object $x$ is described by a logic formula

$$
\bigwedge_{a \in A} a = I_a(x), \tag{24.9}
$$

where $I_a(x) \in V_a$ and the atomic formula $a = I_a(x)$ indicates that the value of an object on attribute $a$ is $I_a(x)$. The equivalence class containing $x$, namely, $[x]_{E_A}$, is the set of those objects that satisfy the formula $\bigwedge_{a \in A} a = I_a(x)$. The formula can be viewed as a description of objects that are equivalent to $x$ with respect to $A$, including $x$ itself.

Based on the equivalence of $\mathrm{DEF}_A(U)$ and $B(U/E_A)$, we can equivalently define rough set approximations by using the equivalence classes $[x]_A$. For simplicity, we also simply write $[x]$ when no confusion arises.


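For a subset of objects $X \subseteq U$, the pair of lower and upper approximations can be equivalently defined by

$$
\begin{aligned}
\underline{apr}(X) &= \{x \in U \mid [x] \subseteq X\}, \\
\overline{apr}(X) &= \{x \in U \mid [x] \cap X \neq \emptyset\}.
\end{aligned} \tag{24.10}
$$

Construction of rough set approximations by this definition is much easier. Alternatively, one can also define three pairwise disjoint positive, negative, and boundary regions [24.38]:

$$
\begin{aligned}
\mathrm{POS}(X) &= \{x \in U \mid [x] \subseteq X\}, \\
\mathrm{NEG}(X) &= \{x \in U \mid [x] \cap X = \emptyset\}, \\
\mathrm{BND}(X) &= \{x \in U \mid [x] \not\subseteq X \wedge [x] \cap X \neq \emptyset\}.
\end{aligned} \tag{24.11}
$$

The pair of approximations and the three regions determine each other as follows:

$$
\begin{aligned}
\mathrm{POS}(X) &= \underline{apr}(X), \\
\mathrm{NEG}(X) &= (\overline{apr}(X))^c, \\
\mathrm{BND}(X) &= \overline{apr}(X) - \underline{apr}(X);
\end{aligned} \tag{24.12}
$$

and

$$
\begin{aligned}
\underline{apr}(X) &= \mathrm{POS}(X), \\
\overline{apr}(X) &= \mathrm{POS}(X) \cup \mathrm{BND}(X),
\end{aligned} \tag{24.13}
$$

where $(\cdot)^c$ denotes the complement of a set. Each representation provides a distinctive interpretation of rough set approximations. We will use the three-region approximation in the rest of this chapter, due to its close connections to three-way decisions.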
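As an illustration of (24.10)-(24.13), the following minimal sketch computes the three regions from a given partition and then recovers the two approximations from them. The partition, the set X, and the function name `pawlak_regions` are hypothetical.

```python
def pawlak_regions(partition, X):
    """Return (POS, NEG, BND) of X for a partition given as an iterable of sets."""
    pos, neg, bnd = set(), set(), set()
    for block in partition:
        if block <= X:            # [x] is a subset of X
            pos |= block
        elif not (block & X):     # [x] and X are disjoint
            neg |= block
        else:                     # partial overlap
            bnd |= block
    return pos, neg, bnd

partition = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}

pos, neg, bnd = pawlak_regions(partition, X)
lower = pos              # (24.13): lower approximation = POS
upper = pos | bnd        # (24.13): upper approximation = POS united with BND
print(sorted(lower), sorted(upper), sorted(neg))
```

Working block by block rather than object by object reflects the fact that all objects of one equivalence class always fall into the same region.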


24.3 A Basic Model of Probabilistic Rough Sets 


The decision-theoretic rough set (DTRS) model proposed by Yao et al. [24.11, 12] gives rise to a general form of probabilistic rough set approximations by using a pair of thresholds on conditional probabilities. The results enable us to formulate a basic model of probabilistic rough sets. However, we introduce the model in a way that is different from DTRS. We first interpret Pawlak rough sets in terms of probability and the two extreme values of probability (i.e., 1 and 0) and then generalize 1 and 0 into a pair of thresholds $(\alpha, \beta)$ with $0 \le \beta < \alpha \le 1$.

Pawlak rough sets consider only a qualitative relationship between an equivalence class and a set, namely, whether an equivalence class is a subset of the set or has a nonempty intersection with the set. This qualitative nature becomes clearer with a probabilistic interpretation [24.6]. Suppose $Pr(X|[x])$ denotes the conditional probability that an object is in $X$ given that the object is in $[x]$. The conditions for defining the three rough set regions can be equivalently expressed as

$$
\begin{aligned}
[x] \subseteq X &\iff Pr(X|[x]) \ge 1, \\
[x] \cap X = \emptyset &\iff Pr(X|[x]) \le 0, \\
[x] \not\subseteq X \wedge [x] \cap X \neq \emptyset &\iff 0 < Pr(X|[x]) < 1.
\end{aligned} \tag{24.14}
$$

Although a probability can never be greater than 1 or less than 0, we purposely use the conditions $\ge 1$ and $\le 0$, whose intended meaning will become clearer later. By those conditions, the three Pawlak regions can be equivalently expressed as

$$
\begin{aligned}
\mathrm{POS}(X) &= \{x \in U \mid Pr(X|[x]) \ge 1\}, \\
\mathrm{NEG}(X) &= \{x \in U \mid Pr(X|[x]) \le 0\}, \\
\mathrm{BND}(X) &= \{x \in U \mid 0 < Pr(X|[x]) < 1\}.
\end{aligned} \tag{24.15}
$$

They show that Pawlak rough sets only use the two extreme values, i.e., 1 and 0, of probability.

It is natural to generalize Pawlak rough sets by replacing 1 and 0 with some other values in the unit interval $[0, 1]$. Given a pair of thresholds $(\alpha, \beta)$ with $0 \le \beta < \alpha \le 1$, the main results of probabilistic rough sets are the $(\alpha, \beta)$-probabilistic regions defined by

$$
\begin{aligned}
\mathrm{POS}_{(\alpha,\beta)}(X) &= \{x \in U \mid Pr(X|[x]) \ge \alpha\}, \\
\mathrm{NEG}_{(\alpha,\beta)}(X) &= \{x \in U \mid Pr(X|[x]) \le \beta\}, \\
\mathrm{BND}_{(\alpha,\beta)}(X) &= \{x \in U \mid \beta < Pr(X|[x]) < \alpha\}.
\end{aligned} \tag{24.16}
$$

The Pawlak rough set model is a special case in which $\alpha = 1$ and $\beta = 0$. In the case when $0 < \beta = \alpha < 1$, the three regions are given by

$$
\begin{aligned}
\mathrm{POS}_{(\alpha,\alpha)}(X) &= \{x \in U \mid Pr(X|[x]) > \alpha\}, \\
\mathrm{NEG}_{(\alpha,\alpha)}(X) &= \{x \in U \mid Pr(X|[x]) < \alpha\}, \\
\mathrm{BND}_{(\alpha,\alpha)}(X) &= \{x \in U \mid Pr(X|[x]) = \alpha\}.
\end{aligned} \tag{24.17}
$$

This special case is perhaps of more mathematical interest than practical use. We use this particular definition in order to establish a connection to existing studies. As will be shown in subsequent discussions, when $Pr(X|[x]) = \alpha = \beta$, the costs of assigning objects in $[x]$ to the positive, boundary, and negative regions, respectively, are the same. In fact, one may simply define two regions by assigning objects in the boundary region to either the positive or the negative region.
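The regions in (24.16) and (24.17) translate directly into code. The sketch below is illustrative only: it estimates Pr(X|[x]) by the ratio |[x] ∩ X| / |[x]|, which is an assumption of this example (estimation of the probability is a separate issue), and the partition, the set X, and the function name are hypothetical.

```python
from fractions import Fraction

def probabilistic_regions(partition, X, alpha, beta):
    """Return (POS, NEG, BND) for thresholds with 0 <= beta <= alpha <= 1."""
    assert 0 <= beta <= alpha <= 1
    pos, neg, bnd = set(), set(), set()
    for block in partition:
        p = Fraction(len(block & X), len(block))  # assumed estimate of Pr(X|[x])
        if alpha == beta:      # single-threshold case (24.17)
            target = pos if p > alpha else neg if p < alpha else bnd
        else:                  # two-threshold case (24.16)
            target = pos if p >= alpha else neg if p <= beta else bnd
        target.update(block)
    return pos, neg, bnd

partition = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10}]
X = {1, 2, 3, 5, 8, 9}   # block probabilities: 3/4, 1/3, 2/3

print(probabilistic_regions(partition, X, Fraction(7, 10), Fraction(2, 5)))
print(probabilistic_regions(partition, X, 1, 0))   # Pawlak special case
```

With thresholds (0.7, 0.4) the first block becomes positive and the second negative, whereas under the Pawlak thresholds (1, 0) every block stays in the boundary region, which shows how tolerating some error shrinks the boundary.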

The main results of the basic model of probabilistic rough sets were first proposed by Yao et al. [24.11, 12] in a DTRS model, based on Bayesian decision theory. The DTRS model covers all specific models introduced before it. The interpretation of Pawlak rough sets in terms of conditional probability, i.e., the model characterized by $\alpha = 1$ and $\beta = 0$, was first given by Wong and Ziarko [24.10]. A 0.5-model, characterized by $\alpha = \beta = 0.5$, was introduced by Wong and Ziarko [24.8] and Pawlak et al. [24.7], in which the positive region is defined by probability greater than 0.5, the negative by probability less than 0.5, and the boundary by probability equal to 0.5. A model characterized by $\alpha > 0.5$ and $\beta = 0.5$ was suggested by Wong and Ziarko [24.9]. Most recent developments on decision-theoretic rough sets can be found in a book edited by Li et al. [24.39] and papers [24.21, 40-50] in a journal special issue edited by Yao et al. [24.51].


24.4 Variants of Probabilistic Rough Sets 


Since the introduction of the decision-theoretic rough set model, several new models have been proposed and investigated. They offer related but different directions in generalizing Pawlak rough sets by incorporating probabilistic information.


24.4.1 Variable Precision Rough Sets 


The first version of variable precision rough sets was introduced by Ziarko [24.14], in which the standard set inclusion $[x] \subseteq X$ is generalized into a graded set inclusion $s([x], X)$ called a measure of the relative degree of misclassification of $[x]$ with respect to $X$. A particular measure suggested by Ziarko is given by

$$
s([x], X) = 1 - \frac{|[x] \cap X|}{|[x]|}, \tag{24.18}
$$

where $|\cdot|$ denotes the cardinality of a set. By introducing a threshold $0 \le z < 0.5$, one can define three regions as follows:

$$
\begin{aligned}
\mathrm{VPOS}_z(X) &= \{x \in U \mid s([x], X) \le z\}, \\
\mathrm{VNEG}_z(X) &= \{x \in U \mid s([x], X) \ge 1 - z\}, \\
\mathrm{VBND}_z(X) &= \{x \in U \mid z < s([x], X) < 1 - z\}.
\end{aligned} \tag{24.19}
$$


A more generalized version using a pair of thresholds was later introduced by Katzberg and Ziarko [24.13] as follows: for $0 \le l < u \le 1$,

$$
\begin{aligned}
\mathrm{VPOS}_{(l,u)}(X) &= \{x \in U \mid s([x], X) \le l\}, \\
\mathrm{VNEG}_{(l,u)}(X) &= \{x \in U \mid s([x], X) \ge u\}, \\
\mathrm{VBND}_{(l,u)}(X) &= \{x \in U \mid l < s([x], X) < u\}.
\end{aligned} \tag{24.20}
$$

The one-threshold model may be considered as a special case of the two-threshold model with $l = z$ and $u = 1 - z$.

One may interpret the ratio in (24.18) as an estimation of the conditional probability $Pr(X|[x])$, namely

$$
s([x], X) = 1 - \frac{|[x] \cap X|}{|[x]|} = 1 - Pr(X|[x]). \tag{24.21}
$$

By setting $\alpha = 1 - l$ and $\beta = 1 - u$, we immediately have

$$
\begin{aligned}
\mathrm{POS}_{(1-l,1-u)}(X) &= \{x \in U \mid Pr(X|[x]) \ge 1 - l\} \\
&= \{x \in U \mid s([x], X) \le l\} \\
&= \mathrm{VPOS}_{(l,u)}(X), \\
\mathrm{NEG}_{(1-l,1-u)}(X) &= \{x \in U \mid Pr(X|[x]) \le 1 - u\} \\
&= \{x \in U \mid s([x], X) \ge u\} \\
&= \mathrm{VNEG}_{(l,u)}(X), \\
\mathrm{BND}_{(1-l,1-u)}(X) &= \{x \in U \mid 1 - u < Pr(X|[x]) < 1 - l\} \\
&= \{x \in U \mid l < s([x], X) < u\} \\
&= \mathrm{VBND}_{(l,u)}(X).
\end{aligned} \tag{24.22}
$$

It follows that, when the particular set-inclusion measure defined by (24.18) is used, variable precision rough sets coincide with decision-theoretic rough sets.

Variable precision rough sets provide an alternative direction in generalizing Pawlak rough sets by considering a graded set-inclusion relation, which is not necessarily restricted to a probabilistic interpretation. If we use other set-inclusion measures, we will obtain other types of quantitative rough sets [24.38, 52]. Unfortunately, subsequent developments lose this crucial feature in an attempt to unify variable precision rough sets into probabilistic rough sets [24.53].
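The correspondence between the two-threshold variable precision regions and the probabilistic regions with α = 1 − l and β = 1 − u can be checked numerically. The following sketch uses a hypothetical partition and set X, and its function names are illustrative; it builds the regions once from the misclassification measure s and once from the conditional probability, and compares them.

```python
from fractions import Fraction

def pr(block, X):
    return Fraction(len(block & X), len(block))

def s(block, X):
    """Relative degree of misclassification, as in (24.18)."""
    return 1 - pr(block, X)

def union(blocks):
    return set().union(*blocks)

def vprs_regions(partition, X, l, u):
    pos = union(b for b in partition if s(b, X) <= l)
    neg = union(b for b in partition if s(b, X) >= u)
    bnd = union(b for b in partition if l < s(b, X) < u)
    return pos, neg, bnd

def probabilistic_regions(partition, X, alpha, beta):
    pos = union(b for b in partition if pr(b, X) >= alpha)
    neg = union(b for b in partition if pr(b, X) <= beta)
    bnd = union(b for b in partition if beta < pr(b, X) < alpha)
    return pos, neg, bnd

partition = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10}]
X = {1, 2, 3, 5, 8, 9}
l, u = Fraction(3, 10), Fraction(3, 5)

v = vprs_regions(partition, X, l, u)
p = probabilistic_regions(partition, X, 1 - l, 1 - u)
print(v == p, v)
```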


24.4.2 Parameterized Rough Sets 


Parameterized rough sets, proposed by Greco et al. [24.19, 20], generalize probabilistic rough sets by introducing a Bayesian confirmation measure and a pair of thresholds on the confirmation measure, in addition to a pair of thresholds on conditional probability. According to Fitelson [24.54], measures of confirmation quantify the degree to which a piece of evidence $E$ provides evidence or support for or against a hypothesis $H$.

A measure of confirmation of a piece of evidence $E$ with respect to a hypothesis $H$ is denoted by $c(E, H)$. A confirmation measure $c(E, H)$ is required to satisfy the following minimal property:

$$
c(E, H)
\begin{cases}
> 0 & \text{if } Pr(H|E) > Pr(H), \\
= 0 & \text{if } Pr(H|E) = Pr(H), \\
< 0 & \text{if } Pr(H|E) < Pr(H).
\end{cases}
$$

Two well-known Bayesian confirmation measures are [24.55]

$$
\begin{aligned}
c_d([x], X) &= Pr(X|[x]) - Pr(X), \\
c_r([x], X) &= \frac{Pr(X|[x])}{Pr(X)}.
\end{aligned} \tag{24.23}
$$

These measures have a probabilistic interpretation. The parameterized rough sets can therefore be viewed as a different formulation of probabilistic rough sets.

A first discussion of the relationships between confirmation measures and rough sets was proposed by Greco et al. [24.56]. Other contributions related to the properties of confirmation measures, with special attention to their application to rough sets, are given in [24.57].

Given a pair of thresholds $(s, t)$ with $t < s$, three $(\alpha, \beta, s, t)$-parameterized regions are defined in [24.19, 20]:

$$
\begin{aligned}
\mathrm{PPOS}_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid Pr(X|[x]) \ge \alpha \wedge c([x], X) \ge s\}, \\
\mathrm{PNEG}_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid Pr(X|[x]) \le \beta \wedge c([x], X) \le t\}, \\
\mathrm{PBND}_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid (Pr(X|[x]) > \beta \vee c([x], X) > t) \\
&\qquad\qquad\; \wedge\, (Pr(X|[x]) < \alpha \vee c([x], X) < s)\}.
\end{aligned} \tag{24.24}
$$

There exist many Bayesian confirmation measures, which makes the model of parameterized rough sets more flexible. On the other hand, due to the lack of a general agreement on a Bayesian confirmation measure, choosing an appropriate confirmation measure for a particular application may not be an easy task.
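To see how the additional thresholds on the confirmation measure act, the sketch below evaluates the parameterized regions of (24.24) with the difference measure c_d. It assumes 0 ≤ β < α ≤ 1, estimates probabilities by relative frequencies, and uses a hypothetical partition and set X; the function names are illustrative and not taken from [24.19, 20].

```python
from fractions import Fraction

def c_d(block, X, U):
    """Difference measure: Pr(X|[x]) - Pr(X), with frequency estimates."""
    return Fraction(len(block & X), len(block)) - Fraction(len(X), len(U))

def parameterized_regions(partition, X, alpha, beta, s, t, c=c_d):
    U = set().union(*partition)
    pos, neg, bnd = set(), set(), set()
    for b in partition:
        p = Fraction(len(b & X), len(b))
        conf = c(b, X, U)
        if p >= alpha and conf >= s:
            pos |= b
        elif p <= beta and conf <= t:
            neg |= b
        else:
            # By De Morgan, this is exactly the PBND condition in (24.24).
            bnd |= b
    return pos, neg, bnd

partition = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10}]
X = {1, 2, 3, 5, 8, 9}                 # Pr(X) = 3/5
alpha, beta = Fraction(7, 10), Fraction(2, 5)

loose = parameterized_regions(partition, X, alpha, beta, Fraction(1, 10), Fraction(-1, 10))
strict = parameterized_regions(partition, X, alpha, beta, Fraction(1, 5), Fraction(-1, 10))
print(loose)
print(strict)
```

In this example the first block has Pr(X|[x]) = 3/4 and c_d = 3/20. It is positive for s = 1/10 but moves to the boundary for s = 1/5, although its conditional probability still exceeds α.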


24.4.3 Confirmation-Theoretic Rough Sets 


Although many Bayesian confirmation measures are related to the conditional probability $Pr(X|[x])$, Zhou and Yao [24.26] argued that the conditional probability $Pr(X|[x])$ and a Bayesian confirmation measure have very different semantics and should be used for different purposes. For example, the conditional probability $Pr(X|[x])$ gives us an absolute degree of confidence in classifying objects from $[x]$ as belonging to $X$. On the other hand, a Bayesian measure, for example, $c_d$ or $c_r$, normally reflects a change of confidence in $X$ before and after knowing $[x]$. Thus, a Bayesian confirmation measure is useful to weigh the strength of evidence $[x]$ with respect to the hypothesis $X$. A mixture of conditional probability and confirmation measure in the parameterized rough sets may cause a semantic difficulty in interpreting the three regions.

To resolve this difficulty, Zhou and Yao [24.28] suggested a separation of the parameterized model into two models. One is the conventional probabilistic model and the other is a confirmation-theoretic model. For a Bayesian confirmation measure $c([x], X)$ and a pair of thresholds $(s, t)$ with $t < s$, three confirmation regions are defined by

$$
\begin{aligned}
\mathrm{CPOS}_{(s,t)}(X) &= \{[x] \in U/E_A \mid c([x], X) \ge s\}, \\
\mathrm{CNEG}_{(s,t)}(X) &= \{[x] \in U/E_A \mid c([x], X) \le t\}, \\
\mathrm{CBND}_{(s,t)}(X) &= \{[x] \in U/E_A \mid t < c([x], X) < s\}.
\end{aligned} \tag{24.25}
$$

For the case with $s = t$, we define

$$
\begin{aligned}
\mathrm{CPOS}_{(s,s)}(X) &= \{[x] \in U/E_A \mid c([x], X) > s\}, \\
\mathrm{CNEG}_{(s,s)}(X) &= \{[x] \in U/E_A \mid c([x], X) < s\}, \\
\mathrm{CBND}_{(s,s)}(X) &= \{[x] \in U/E_A \mid c([x], X) = s\}.
\end{aligned} \tag{24.26}
$$

In the definition, each equivalence class may be viewed as a piece of evidence. Thus, the partition $U/E_A$, instead of the universe, is divided into three regions. An equivalence class in the positive region supports $X$ to a degree at least $s$, an equivalence class in the negative region supports $X$ to a degree at most $t$ and may be viewed as against $X$, and an equivalence class in the boundary region is interpreted as neutral toward $X$.
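Because the confirmation regions classify equivalence classes rather than objects, a natural representation is a set of blocks. The sketch below illustrates (24.25) and (24.26) with the ratio measure c_r; the partition, the set X, the thresholds, and the function names are hypothetical, and probabilities are again estimated by relative frequencies.

```python
from fractions import Fraction

def c_r(block, X, U):
    """Ratio measure: Pr(X|[x]) / Pr(X); requires Pr(X) > 0."""
    return Fraction(len(block & X), len(block)) / Fraction(len(X), len(U))

def confirmation_regions(partition, X, s, t, c=c_r):
    """Return (CPOS, CNEG, CBND) as sets of equivalence classes, with t <= s."""
    U = set().union(*partition)
    blocks = [frozenset(b) for b in partition]
    if s == t:   # single-threshold case (24.26)
        cpos = {b for b in blocks if c(b, X, U) > s}
        cneg = {b for b in blocks if c(b, X, U) < s}
    else:        # two-threshold case (24.25)
        cpos = {b for b in blocks if c(b, X, U) >= s}
        cneg = {b for b in blocks if c(b, X, U) <= t}
    cbnd = set(blocks) - cpos - cneg
    return cpos, cneg, cbnd

partition = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10}]
X = {1, 2, 3, 5, 8, 9}   # c_r per block: 5/4, 5/9, 10/9

cpos, cneg, cbnd = confirmation_regions(partition, X, Fraction(6, 5), Fraction(4, 5))
print(cpos, cneg, cbnd)
```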


24.4.4 Bayesian Rough Sets

Bayesian rough sets were proposed by Slezak and Ziarko [24.15, 16] as a probabilistic model in which the required pair of thresholds is interpreted using the a priori probability $Pr(X)$. They introduced Bayesian rough sets and variable precision Bayesian rough sets.

For the Bayesian rough sets, the three regions are defined by

$$
\begin{aligned}
\mathrm{BPOS}(X) &= \{x \in U \mid Pr(X|[x]) > Pr(X)\}, \\
\mathrm{BNEG}(X) &= \{x \in U \mid Pr(X|[x]) < Pr(X)\}, \\
\mathrm{BBND}(X) &= \{x \in U \mid Pr(X|[x]) = Pr(X)\}.
\end{aligned} \tag{24.27}
$$

Bayesian rough sets can be viewed as a special case of the decision-theoretic rough sets when $\alpha = \beta = Pr(X)$. Semantically, they are very different, however. In contrast to decision-theoretic rough sets, for a set with a higher a priori probability $Pr(X)$, many equivalence classes may not be put into the positive region in the Bayesian rough set model, as the condition $Pr(X|[x]) > Pr(X)$ may not hold. For example, the positive region of the entire universe is always empty, namely, $\mathrm{BPOS}(U) = \emptyset$. This leads to a difficulty in interpreting the positive region as a lower approximation of a set.
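The observation that BPOS(U) is empty is easy to reproduce. The following minimal sketch of (24.27) uses a hypothetical partition and frequency estimates of the probabilities; the function name is illustrative.

```python
from fractions import Fraction

def bayesian_regions(partition, X):
    """Return (BPOS, BNEG, BBND) by comparing Pr(X|[x]) with the prior Pr(X)."""
    U = set().union(*partition)
    prior = Fraction(len(X), len(U))
    pos, neg, bnd = set(), set(), set()
    for b in partition:
        p = Fraction(len(b & X), len(b))
        target = pos if p > prior else neg if p < prior else bnd
        target.update(b)
    return pos, neg, bnd

partition = [{1, 2, 3, 4}, {5, 6, 7}, {8, 9, 10}]
U = set().union(*partition)

print(bayesian_regions(partition, {1, 2, 3, 5, 8, 9}))  # prior 3/5
print(bayesian_regions(partition, U))                   # BPOS(U) is empty
```

For X = U every block has Pr(X|[x]) = 1 = Pr(X), so all objects land in the boundary region, in line with the difficulty noted above.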

The difficulty with Bayesian rough sets stems from the fact that they are in fact a special model of confirmation-theoretic rough sets, which is suitable for classifying pieces of evidence (i.e., equivalence classes), but is inappropriate for approximating a set. Recall that one Bayesian confirmation measure is given by $c_d([x], X) = Pr(X|[x]) - Pr(X)$. Therefore, Bayesian rough sets can be expressed as confirmation-theoretic rough sets as follows:

$$
\begin{aligned}
\mathrm{BPOS}(X) &= \{x \in U \mid Pr(X|[x]) > Pr(X)\} \\
&= \{x \in U \mid c_d([x], X) > 0\} \\
&= \bigcup \mathrm{CPOS}_{(0,0)}(X), \\
\mathrm{BNEG}(X) &= \{x \in U \mid Pr(X|[x]) < Pr(X)\} \\
&= \{x \in U \mid c_d([x], X) < 0\} \\
&= \bigcup \mathrm{CNEG}_{(0,0)}(X), \\
\mathrm{BBND}(X) &= \{x \in U \mid Pr(X|[x]) = Pr(X)\} \\
&= \{x \in U \mid c_d([x], X) = 0\} \\
&= \bigcup \mathrm{CBND}_{(0,0)}(X).
\end{aligned} \tag{24.28}
$$

That is, the Bayesian rough sets are a model of confirmation-theoretic rough sets characterized by the Bayesian confirmation measure $c_d$ with a pair of thresholds $s = t = 0$. Slezak and Ziarko [24.17] showed that Bayesian rough sets can also be interpreted by using other Bayesian confirmation measures.

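The three regions of the variable precision Bayesian rough sets are defined as follows [24.16]: for $\varepsilon \in [0, 1)$,

\[
\begin{aligned}
\mathrm{VBPOS}_{\varepsilon}(X) &= \{x \in U \mid \Pr(X \mid [x]) \ge 1 - \varepsilon (1 - \Pr(X))\}, \\
\mathrm{VBNEG}_{\varepsilon}(X) &= \{x \in U \mid \Pr(X \mid [x]) \le \varepsilon \Pr(X)\}, \\
\mathrm{VBBND}_{\varepsilon}(X) &= \{x \in U \mid \varepsilon \Pr(X) < \Pr(X \mid [x]) < 1 - \varepsilon (1 - \Pr(X))\}.
\end{aligned}
\tag{24.29}
\]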


Consider the Bayesian confirmation measure $c_r([x], X) = \Pr(X \mid [x]) / \Pr(X)$.

For the condition of the positive region, when $\Pr(X^c) \neq 0$ we have

\[
\Pr(X \mid [x]) \ge 1 - \varepsilon (1 - \Pr(X)) \iff c_r([x], X^c) \le \varepsilon .
\tag{24.30}
\]

Similarly, for the condition defining the negative region, when $\Pr(X) \neq 0$ we have

\[
\Pr(X \mid [x]) \le \varepsilon \Pr(X) \iff c_r([x], X) \le \varepsilon .
\tag{24.31}
\]

That is, $[x]$ is put into the positive region if it confirms $X^c$ to a degree less than or equal to $\varepsilon$ and is put into the negative region if it confirms $X$ to a degree less than or equal to $\varepsilon$. In this way, we get a confirmation-theoretic interpretation of variable precision Bayesian rough sets.

Unlike the confirmation-theoretic model defined 
by (24.25), the positive region of variable precision 
Bayesian rough sets is defined based on the confirmation
of the complement of X, and the negative region is
defined based on the confirmation of X. This definition
is a bit awkward to interpret. Generally speaking, it may 
be more natural to define the positive region by those 
equivalence classes that confirm X to at least a certain 
degree. This suggests that one can redefine variable pre- 
cision Bayesian rough sets by using the framework of 
confirmation-theoretic rough sets. Moreover, one can 
use a pair of thresholds instead of one threshold. 


24.5 Three Fundamental Issues of Probabilistic Rough Sets 


For practical applications of probabilistic rough sets, 
one must consider at least the following three funda- 
mental issues [24.58, 59]: 


• Interpretation and determination of the required pair
of thresholds,

• Estimation of the required conditional probabilities,
and

• Interpretation and applications of three probabilistic
regions.


For each of the three issues, this section reviews one 
example of the possible methods. 


24.5.1 Decision-Theoretic Rough Set Model: 
Determining the Thresholds 


A decision-theoretic model formulates the construction of rough set approximations as a Bayesian decision problem with a set of two states and a set of three actions [24.11, 12]. The set of states is given by $\Omega = \{X, X^c\}$, indicating that an element is in X and not in X, respectively. For simplicity, we use the same symbol to denote both a subset X and the corresponding state. Corresponding to the three regions, the set of actions is given by $\mathcal{A} = \{a_P, a_B, a_N\}$, denoting the actions in classifying an object x, namely, deciding $x \in \mathrm{POS}(X)$, deciding $x \in \mathrm{BND}(X)$, and deciding $x \in \mathrm{NEG}(X)$, respectively. The losses regarding the actions for different states are given by the 3 × 2 matrix

\[
\begin{array}{c|cc}
 & X\ (P) & X^c\ (N) \\ \hline
a_P & \lambda_{PP} & \lambda_{PN} \\
a_B & \lambda_{BP} & \lambda_{BN} \\
a_N & \lambda_{NP} & \lambda_{NN}
\end{array}
\]

In the matrix, $\lambda_{PP}$, $\lambda_{BP}$, and $\lambda_{NP}$ denote the losses incurred for taking actions $a_P$, $a_B$, and $a_N$, respectively, when an object belongs to X, and $\lambda_{PN}$, $\lambda_{BN}$, and $\lambda_{NN}$ denote the losses incurred for taking the same actions when the object does not belong to X.

The expected losses associated with taking different actions for objects in $[x]$ can be expressed as

\[
\begin{aligned}
R(a_P \mid [x]) &= \lambda_{PP} \Pr(X \mid [x]) + \lambda_{PN} \Pr(X^c \mid [x]), \\
R(a_B \mid [x]) &= \lambda_{BP} \Pr(X \mid [x]) + \lambda_{BN} \Pr(X^c \mid [x]), \\
R(a_N \mid [x]) &= \lambda_{NP} \Pr(X \mid [x]) + \lambda_{NN} \Pr(X^c \mid [x]).
\end{aligned}
\tag{24.32}
\]

The Bayesian decision procedure suggests the following minimum-risk decision rules:

(P) If $R(a_P \mid [x]) \le R(a_B \mid [x])$ and $R(a_P \mid [x]) \le R(a_N \mid [x])$, decide $x \in \mathrm{POS}(X)$;
(B) If $R(a_B \mid [x]) \le R(a_P \mid [x])$ and $R(a_B \mid [x]) \le R(a_N \mid [x])$, decide $x \in \mathrm{BND}(X)$;
(N) If $R(a_N \mid [x]) \le R(a_P \mid [x])$ and $R(a_N \mid [x]) \le R(a_B \mid [x])$, decide $x \in \mathrm{NEG}(X)$.

In order to make sure that the three regions are mutually disjoint, tie-breaking criteria should be added when two or three actions have the same risk. We use the following ordering for breaking a tie: $a_P$, $a_N$, $a_B$.

Consider a special class of loss functions with

\[
\text{(c0)} \quad \lambda_{PP} \le \lambda_{BP} < \lambda_{NP}, \qquad \lambda_{NN} \le \lambda_{BN} < \lambda_{PN}.
\tag{24.33}
\]


That is, the loss of classifying an object x belonging 
to X into the positive region POS(X) is less than or 
equal to the loss of classifying x into the boundary 
region BND(X), and both of these losses are strictly 
less than the loss of classifying x into the negative 
region NEG(X). The reverse order of losses is used 
for classifying an object not in X. With the condition 
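(c0) and the equation $\Pr(X \mid [x]) + \Pr(X^c \mid [x]) = 1$, we can express the decision rules (P), (B), and (N) in the following simplified form (for a detailed derivation, see [24.58]):

(P) If $\Pr(X \mid [x]) \ge \alpha$ and $\Pr(X \mid [x]) \ge \gamma$, decide $x \in \mathrm{POS}(X)$;
(B) If $\Pr(X \mid [x]) \le \alpha$ and $\Pr(X \mid [x]) \ge \beta$, decide $x \in \mathrm{BND}(X)$;
(N) If $\Pr(X \mid [x]) \le \beta$ and $\Pr(X \mid [x]) \le \gamma$, decide $x \in \mathrm{NEG}(X)$,

where

\[
\begin{aligned}
\alpha &= \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}, \\
\beta &= \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})}, \\
\gamma &= \frac{\lambda_{PN} - \lambda_{NN}}{(\lambda_{PN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{PP})}.
\end{aligned}
\tag{24.34}
\]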


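Each rule is defined by two out of the three parameters. By setting $\alpha > \beta$, namely

\[
\frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}
>
\frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})},
\tag{24.35}
\]

we obtain the following condition on the loss function [24.58]:

\[
\text{(c1)} \quad \frac{\lambda_{NP} - \lambda_{BP}}{\lambda_{BN} - \lambda_{NN}} > \frac{\lambda_{BP} - \lambda_{PP}}{\lambda_{PN} - \lambda_{BN}}.
\tag{24.36}
\]

The condition (c1) implies that $1 \ge \alpha > \gamma > \beta \ge 0$. In this case, after tie-breaking, we have the simplified rules [24.58]:

(P) If $\Pr(X \mid [x]) \ge \alpha$, decide $x \in \mathrm{POS}(X)$;
(B) If $\beta < \Pr(X \mid [x]) < \alpha$, decide $x \in \mathrm{BND}(X)$;
(N) If $\Pr(X \mid [x]) \le \beta$, decide $x \in \mathrm{NEG}(X)$.

The parameter $\gamma$ is no longer needed. Each object can be put into one and only one region by using rules (P), (B), and (N). The $(\alpha, \beta)$-probabilistic positive, negative, and boundary regions are given, respectively, by

\[
\begin{aligned}
\mathrm{POS}_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid [x]) \ge \alpha\}, \\
\mathrm{BND}_{(\alpha,\beta)}(X) &= \{x \in U \mid \beta < \Pr(X \mid [x]) < \alpha\}, \\
\mathrm{NEG}_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid [x]) \le \beta\}.
\end{aligned}
\tag{24.37}
\]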


The formulation provides a solid theoretical basis and 
a practical interpretation of the probabilistic rough sets. 
The threshold parameters are systematically calculated 
from a loss function. 
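As a hedged illustration of how the thresholds follow from a loss function, the sketch below computes the three threshold values from six losses and checks that the threshold rules agree with direct minimization of the expected loss. The function names, the dictionary layout, and the numerical losses are hypothetical choices made for this example; the losses are picked so that conditions (c0) and (c1) hold.

```python
def dtrs_thresholds(loss):
    """Compute (alpha, beta, gamma) from a loss table.

    loss[action] = (loss if the object is in X, loss if it is not in X),
    with actions "POS", "BND", "NEG" standing for a_P, a_B, a_N.
    """
    l_pp, l_pn = loss["POS"]
    l_bp, l_bn = loss["BND"]
    l_np, l_nn = loss["NEG"]
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma

def threshold_rule(p, alpha, beta):
    """Simplified rules (P), (B), (N) for p = Pr(X | [x])."""
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"

def min_risk_action(p, loss):
    """Minimum expected loss, ties broken in the order a_P, a_N, a_B."""
    order = ("POS", "NEG", "BND")
    risk = {a: loss[a][0] * p + loss[a][1] * (1 - p) for a in order}
    return min(order, key=risk.get)   # min() keeps the first minimal action

# Hypothetical losses satisfying (c0) and (c1).
loss = {"POS": (0, 4), "BND": (1, 1), "NEG": (4, 0)}
alpha, beta, gamma = dtrs_thresholds(loss)
decisions = {p: threshold_rule(p, alpha, beta) for p in (0.1, 0.5, 0.8)}
```

With these losses an expert who penalizes a wrong acceptance or rejection four times as heavily as a deferred decision obtains a symmetric pair of thresholds around one half.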

In the development of decision-theoretic rough sets, 
we assume that a loss function is given by experts in 
a particular application. There are studies on other types 
of loss functions and their acquisition [24.60]. Several 
other proposals have also been made regarding the in- 
terpretation and computation of the thresholds, includ- 
ing game-theoretic rough sets [24.21, 22], information- 
theoretic rough sets [24.27], and an optimization-based 
framework [24.43, 61, 62]. 


24.5.2 Naive Bayesian Rough Set Model: 
Estimating the Conditional 
Probability 


The naive Bayesian rough set model was proposed by Yao and Zhou [24.59] as a practical method for estimating the conditional probability. First, we perform the logit transformation of the conditional probability

\[
\operatorname{logit}(\Pr(X \mid [x])) = \log \frac{\Pr(X \mid [x])}{1 - \Pr(X \mid [x])} = \log \frac{\Pr(X \mid [x])}{\Pr(X^c \mid [x])},
\tag{24.38}
\]

which is a monotonically increasing transformation of $\Pr(X \mid [x])$. Then, we apply Bayes' theorem

\[
\Pr(X \mid [x]) = \frac{\Pr([x] \mid X) \Pr(X)}{\Pr([x])}
\tag{24.39}
\]

to infer the a posteriori probability $\Pr(X \mid [x])$ from the likelihood $\Pr([x] \mid X)$ of $[x]$ with respect to X and the a priori probability $\Pr(X)$. Similarly, for $X^c$ we also have

\[
\Pr(X^c \mid [x]) = \frac{\Pr([x] \mid X^c) \Pr(X^c)}{\Pr([x])}.
\tag{24.40}
\]

By substituting the results of (24.39) and (24.40) into (24.38), we immediately have

\[
\begin{aligned}
\operatorname{logit}(\Pr(X \mid [x])) &= \log O(X \mid [x]) \\
 &= \log \frac{\Pr(X \mid [x])}{\Pr(X^c \mid [x])} \\
 &= \log \left( \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} \cdot \frac{\Pr(X)}{\Pr(X^c)} \right) \\
 &= \log \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} + \log O(X),
\end{aligned}
\tag{24.41}
\]


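where $O(X \mid [x])$ and $O(X)$ are the a posteriori and the a priori odds, respectively, and $\Pr([x] \mid X) / \Pr([x] \mid X^c)$ is the likelihood ratio.

A threshold value on the probability can be expressed as another threshold value on the logarithm of the likelihood ratio. For the positive region, we have

\[
\begin{aligned}
\Pr(X \mid [x]) \ge \alpha
 &\iff \log \frac{\Pr(X \mid [x])}{\Pr(X^c \mid [x])} \ge \log \frac{\alpha}{1 - \alpha} \\
 &\iff \log \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} + \log \frac{\Pr(X)}{\Pr(X^c)} \ge \log \frac{\alpha}{1 - \alpha} \\
 &\iff \log \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} \ge \log \frac{\Pr(X^c)}{\Pr(X)} + \log \frac{\alpha}{1 - \alpha} = \alpha' .
\end{aligned}
\tag{24.42}
\]

Similar expressions can be obtained for the negative and boundary regions. The three regions can now be written as

\[
\begin{aligned}
\mathrm{POS}_{(\alpha,\beta)}(X) &= \left\{ x \in U \;\middle|\; \log \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} \ge \alpha' \right\}, \\
\mathrm{BND}_{(\alpha,\beta)}(X) &= \left\{ x \in U \;\middle|\; \beta' < \log \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} < \alpha' \right\}, \\
\mathrm{NEG}_{(\alpha,\beta)}(X) &= \left\{ x \in U \;\middle|\; \log \frac{\Pr([x] \mid X)}{\Pr([x] \mid X^c)} \le \beta' \right\},
\end{aligned}
\tag{24.43}
\]

where

\[
\begin{aligned}
\alpha' &= \log \frac{\Pr(X^c)}{\Pr(X)} + \log \frac{\alpha}{1 - \alpha}, \\
\beta' &= \log \frac{\Pr(X^c)}{\Pr(X)} + \log \frac{\beta}{1 - \beta}.
\end{aligned}
\tag{24.44}
\]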


With the transformation, we need to estimate the likeli- 
hoods that are relatively easier to obtain. 

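Suppose that an equivalence relation $E_A$ is defined by using a subset of attributes $A \subseteq AT$. In the naive Bayesian rough set model, we estimate the likelihood ratio $\Pr([x]_A \mid X) / \Pr([x]_A \mid X^c)$ through the likelihoods $\Pr([x]_a \mid X)$ and $\Pr([x]_a \mid X^c)$ defined by individual attributes, as the latter can be estimated more accurately. For this purpose, based on the results in (24.6), we make the following naive conditional independence assumptions:

\[
\begin{aligned}
\Pr([x]_A \mid X) &= \Pr\Big( \bigcap_{a \in A} [x]_a \,\Big|\, X \Big) = \prod_{a \in A} \Pr([x]_a \mid X), \\
\Pr([x]_A \mid X^c) &= \Pr\Big( \bigcap_{a \in A} [x]_a \,\Big|\, X^c \Big) = \prod_{a \in A} \Pr([x]_a \mid X^c).
\end{aligned}
\tag{24.45}
\]

By inserting them into (24.42) and assuming that $[x]$ is defined by a subset of attributes $A \subseteq AT$, namely, $[x] = [x]_A$, we have

\[
\begin{aligned}
\log \frac{\Pr([x]_A \mid X)}{\Pr([x]_A \mid X^c)} \ge \alpha'
 &\iff \log \frac{\prod_{a \in A} \Pr([x]_a \mid X)}{\prod_{a \in A} \Pr([x]_a \mid X^c)} \ge \alpha' \\
 &\iff \sum_{a \in A} \log \frac{\Pr([x]_a \mid X)}{\Pr([x]_a \mid X^c)} \ge \alpha' .
\end{aligned}
\tag{24.46}
\]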


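Similar conditions can be derived for the negative and boundary regions. Finally, the three regions can be defined as

\[
\begin{aligned}
\mathrm{POS}_{(\alpha,\beta)}(X) &= \left\{ x \in U \;\middle|\; \sum_{a \in A} \log \frac{\Pr([x]_a \mid X)}{\Pr([x]_a \mid X^c)} \ge \alpha' \right\}, \\
\mathrm{BND}_{(\alpha,\beta)}(X) &= \left\{ x \in U \;\middle|\; \beta' < \sum_{a \in A} \log \frac{\Pr([x]_a \mid X)}{\Pr([x]_a \mid X^c)} < \alpha' \right\}, \\
\mathrm{NEG}_{(\alpha,\beta)}(X) &= \left\{ x \in U \;\middle|\; \sum_{a \in A} \log \frac{\Pr([x]_a \mid X)}{\Pr([x]_a \mid X^c)} \le \beta' \right\},
\end{aligned}
\tag{24.47}
\]

where

\[
\begin{aligned}
\alpha' &= \log \frac{\Pr(X^c)}{\Pr(X)} + \log \frac{\alpha}{1 - \alpha}, \\
\beta' &= \log \frac{\Pr(X^c)}{\Pr(X)} + \log \frac{\beta}{1 - \beta}.
\end{aligned}
\tag{24.48}
\]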


We obtain a model in which we only need to estimate 
likelihoods of equivalence classes induced by individ- 
ual attributes. 

The likelihoods $\Pr([x]_a \mid X)$ and $\Pr([x]_a \mid X^c)$ may be simply estimated based on the following frequencies:

\[
\begin{aligned}
\Pr([x]_a \mid X) &= \frac{|[x]_a \cap X|}{|X|}, \\
\Pr([x]_a \mid X^c) &= \frac{|[x]_a \cap X^c|}{|X^c|},
\end{aligned}
\]

where $[x]_a = \{y \in U \mid I_a(y) = I_a(x)\}$. An equivalence class defined by a single attribute is usually large in comparison with an equivalence class defined by a subset of attributes. Probability estimation based on the former may be more accurate than that based on the latter.

Naive Bayesian rough sets provide only one of the possible
ways to estimate the conditional probability. Other
estimation methods include logistic regression [24.46] and
the maximum likelihood estimators [24.63]. 
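The naive Bayesian procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function `naive_bayes_region`, the information table, and the target set are hypothetical, likelihoods are estimated by the frequencies given above, and no smoothing is applied, so the sketch fails if any single-attribute frequency is zero.

```python
import math

def naive_bayes_region(table, X, x, alpha, beta):
    """Assign x to "POS", "BND" or "NEG" using per-attribute likelihoods.

    table maps each object to a dict {attribute: value}; X is the target set.
    """
    universe = set(table)
    X = set(X)
    Xc = universe - X
    score = 0.0
    for a in table[x]:
        # [x]_a: objects sharing the value of x on the single attribute a
        block = {y for y in universe if table[y][a] == table[x][a]}
        p_x = len(block & X) / len(X)        # Pr([x]_a | X)
        p_xc = len(block & Xc) / len(Xc)     # Pr([x]_a | X^c)
        score += math.log(p_x / p_xc)        # undefined for zero frequencies
    prior_term = math.log(len(Xc) / len(X))  # log Pr(X^c) / Pr(X)
    alpha_prime = prior_term + math.log(alpha / (1 - alpha))
    beta_prime = prior_term + math.log(beta / (1 - beta))
    if score >= alpha_prime:
        return "POS"
    if score <= beta_prime:
        return "NEG"
    return "BND"

# Hypothetical information table with two binary attributes.
table = {
    0: {"a": 1, "b": 1}, 1: {"a": 1, "b": 1}, 2: {"a": 1, "b": 0},
    3: {"a": 0, "b": 1}, 4: {"a": 1, "b": 0}, 5: {"a": 0, "b": 1},
    6: {"a": 0, "b": 0}, 7: {"a": 0, "b": 0},
}
X = {0, 1, 2, 3}
regions = {x: naive_bayes_region(table, X, x, 0.75, 0.25) for x in (0, 2, 6)}
```

Object 0 has a likelihood ratio of 3 on each attribute, object 6 has 1/3 on each, and for object 2 the two ratios cancel, which places the three objects in the positive, negative, and boundary regions, respectively.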


24.5.3 Three-Way Decisions: Interpreting 
the Three Regions 


A theory of three-way decisions [24.64] is motivated by the need for interpreting the three regions [24.65-67] and moves beyond rough sets. The main results of three-way decisions can be found in two recent books edited by Jia et al. [24.68] and Liu et al. [24.69], respectively. We present an interpretation of the three rough set regions based on the framework of three-way decisions.

In an information table, with respect to a subset of attributes $A \subseteq AT$, an object x induces a logic formula

\[
\bigwedge_{a \in A} a = I_a(x),
\tag{24.49}
\]

where $I_a(x) \in V_a$ and the atomic formula $a = I_a(x)$ indicates that the value of an object on attribute a is $I_a(x)$. An object y satisfies the formula if $I_a(y) = I_a(x)$ for all $a \in A$, that is,

\[
y \models \bigwedge_{a \in A} a = I_a(x) \iff \forall a \in A \, \big( I_a(y) = I_a(x) \big).
\tag{24.50}
\]


With these notations, we are ready to interpret the three rough set regions.

From the three regions, we can construct three classes of rules for classifying an object, called the positive, negative, and boundary rules [24.58, 66, 67]. They are expressed in the following forms, for $y \in U$:

• Positive rule induced by an equivalence class $[x] \subseteq \mathrm{POS}_{(\alpha,\beta)}(X)$:

if $y \models \bigwedge_{a \in A} a = I_a(x)$, accept $y \in X$.

• Negative rule induced by an equivalence class $[x] \subseteq \mathrm{NEG}_{(\alpha,\beta)}(X)$:

if $y \models \bigwedge_{a \in A} a = I_a(x)$, reject $y \in X$.

• Boundary rule induced by an equivalence class $[x] \subseteq \mathrm{BND}_{(\alpha,\beta)}(X)$:

if $y \models \bigwedge_{a \in A} a = I_a(x)$, neither accept nor reject $y \in X$.


The three types of rules have very different semantic interpretations as defined by their respective decisions. A positive rule allows us to accept an object y as a member of X, because y has a higher probability of being in X due to the facts that $y \in [x]_A$ and $\Pr(X \mid [x]_A) \ge \alpha$. A negative rule enables us to reject an object y as a member of X, because y has a lower probability of being in X due to the facts that $y \in [x]_A$ and $\Pr(X \mid [x]_A) \le \beta$. When the probability of y being in X is neither high nor low, a boundary rule makes a noncommitment decision. Although we explicitly give the class of boundary rules for convenience and completeness, we do not really need this class once we have both classes of positive and negative rules. Whenever we can neither accept nor reject an object as a member of X, we choose a noncommitment decision.

Both actions of acceptance and rejection are associated with errors and costs. The error rate of a positive rule is given by $1 - \Pr(X \mid [x])$, which, by definition of the three regions, is at or below $1 - \alpha$. The error rate of a negative rule is given by $\Pr(X \mid [x])$ and is at or below $\beta$. It becomes clear that the introduction of a noncommitment decision is to ensure both a low level of acceptance error and a low level of rejection error. According to the 3 × 2 table in Sect. 24.5.1, the cost of a positive rule is $\lambda_{PP} \Pr(X \mid [x]_A) + \lambda_{PN} (1 - \Pr(X \mid [x]_A))$ and is bounded above by $\alpha \lambda_{PP} + (1 - \alpha) \lambda_{PN}$. The cost of a negative rule is $\lambda_{NP} \Pr(X \mid [x]_A) + \lambda_{NN} (1 - \Pr(X \mid [x]_A))$ and is bounded above by $\beta \lambda_{NP} + (1 - \beta) \lambda_{NN}$. From the viewpoint of cost, a noncommitment decision is preferred if its cost is less than that of an action of acceptance or rejection.
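The error and cost bounds stated above can be checked numerically. The following sketch is illustrative only: the function names and the loss values are hypothetical, and the bounds rely on the ordering of losses assumed in condition (c0) of Sect. 24.5.1 (accepting a member of X costs no more than accepting a non-member, and symmetrically for rejection).

```python
def acceptance_bounds(alpha, l_pp, l_pn):
    """Upper bounds on the error rate and the cost of a positive rule."""
    return 1 - alpha, alpha * l_pp + (1 - alpha) * l_pn

def rejection_bounds(beta, l_np, l_nn):
    """Upper bounds on the error rate and the cost of a negative rule."""
    return beta, beta * l_np + (1 - beta) * l_nn

def acceptance_cost(p, l_pp, l_pn):
    """Cost of accepting when Pr(X | [x]) = p."""
    return l_pp * p + l_pn * (1 - p)

def rejection_cost(p, l_np, l_nn):
    """Cost of rejecting when Pr(X | [x]) = p."""
    return l_np * p + l_nn * (1 - p)

# Hypothetical thresholds and losses.
acc_err, acc_cost = acceptance_bounds(0.75, l_pp=0, l_pn=4)
rej_err, rej_cost = rejection_bounds(0.25, l_np=4, l_nn=0)
```

For any class in the positive region the probability is at least 0.75, so its acceptance cost cannot exceed `acc_cost`; the bound is attained exactly when the probability equals the threshold.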


24.6 Dominance-Based Rough Set Approaches 


Very often value sets $V_a$ of some attributes $a \in AT$ are ordered in the sense that it is meaningful to consider a binary relation $\succeq_a$ on $V_a$ such that for $x, y \in U$, $I_a(x) \succeq_a I_a(y)$ means that x possesses some property related to attribute a at least as much as y. In this case, it is natural to consider $\succeq_a$ as a complete preorder on $V_a$, i.e., a transitive and strongly complete binary relation on $V_a$ (let us remember that strong completeness means that for all $v_a, u_a \in V_a$ we have $v_a \succeq_a u_a$ or $u_a \succeq_a v_a$, and that this implies the reflexivity of $\succeq_a$). Observe that the binary relation $\succeq_a^U$ on U, defined as $x \succeq_a^U y$ if $I_a(x) \succeq_a I_a(y)$ for all $x, y \in U$, is a complete preorder. The first type of properties considered in this perspective were preferences encountered in Multiple Criteria Decision Aiding (MCDA) (for a comprehensive collection of state-of-the-art surveys see [24.70]), where for $x, y \in U$, $I_a(x) \succeq_a I_a(y)$ means x is at least as good as y with respect to attribute a, which in this case is called a criterion. If there are attributes $a \in AT$ related to some complete preorder $\succeq_a$, then the indiscernibility relation is unable to produce granules in U taking into account the order generated by $\succeq_a$. To do so, the indiscernibility relation has to be substituted by a new binary relation on U that, using a term coming from MCDA, is called the dominance relation. Suppose, for simplicity, that all attributes a from AT are criteria related to corresponding complete preorders $\succeq_a$.

We say that x dominates y with respect to $A \subseteq AT$ (shortly, x A-dominates y), denoted by $x \succeq_A y$, if $I_a(x) \succeq_a I_a(y)$ for all $a \in A$. Since $\succeq_a^U$ is a complete preorder on U for each $a \in AT$, $\succeq_A$ is a partial preorder on U, i.e., $\succeq_A$ is a reflexive and transitive binary relation on U.


For any $x \in U$ and for each nonempty $A \subseteq AT$, we can define a positive and a negative cone of dominance, denoted by $D_A^+(x)$ and $D_A^-(x)$, respectively,

\[
\begin{aligned}
D_A^+(x) &= \{y \in U \mid y \succeq_A x\}, \\
D_A^-(x) &= \{y \in U \mid x \succeq_A y\}.
\end{aligned}
\tag{24.51}
\]

For simplicity, we also write $D^+(x)$ and $D^-(x)$
when no confusion arises. 
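The two cones of dominance can be computed directly from their definitions. The sketch below is a minimal illustration; it assumes gain-type criteria with numeric values (a larger value is at least as good as a smaller one), and the function names and the small table are hypothetical.

```python
def dominates(table, x, y, attrs):
    """True if x A-dominates y, i.e. x is at least as good as y on every criterion."""
    return all(table[x][a] >= table[y][a] for a in attrs)

def positive_cone(table, x, attrs):
    """D^+(x): the objects that dominate x."""
    return {y for y in table if dominates(table, y, x, attrs)}

def negative_cone(table, x, attrs):
    """D^-(x): the objects dominated by x."""
    return {y for y in table if dominates(table, x, y, attrs)}

# Hypothetical table with two gain-type criteria q1 and q2.
table = {
    "a": {"q1": 1, "q2": 1},
    "b": {"q1": 2, "q2": 1},
    "c": {"q1": 1, "q2": 2},
    "d": {"q1": 2, "q2": 2},
    "e": {"q1": 3, "q2": 3},
}
attrs = ("q1", "q2")
up_b = positive_cone(table, "b", attrs)
down_b = negative_cone(table, "b", attrs)
```

Objects b and c are incomparable, so c belongs to neither cone of b; this is what makes the dominance relation a partial rather than a complete preorder.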

Let us explain how the rough set concept has been generalized to the dominance-based rough set approach (DRSA) in order to enable granular computing with dominance cones (for more details, see Chap. 22 and [24.31-34, 71-74]).

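For any $X \subseteq U$ we define upward lower and upper approximations $\underline{\mathrm{apr}}^+(X)$ and $\overline{\mathrm{apr}}^+(X)$, as well as downward lower and upper approximations $\underline{\mathrm{apr}}^-(X)$ and $\overline{\mathrm{apr}}^-(X)$, as follows:

\[
\begin{aligned}
\underline{\mathrm{apr}}^+(X) &= \{x \in U \mid D^+(x) \subseteq X\}, \\
\overline{\mathrm{apr}}^+(X) &= \{x \in U \mid D^-(x) \cap X \neq \emptyset\}, \\
\underline{\mathrm{apr}}^-(X) &= \{x \in U \mid D^-(x) \subseteq X\}, \\
\overline{\mathrm{apr}}^-(X) &= \{x \in U \mid D^+(x) \cap X \neq \emptyset\}.
\end{aligned}
\tag{24.52}
\]

For any $X \subseteq U$, using the cones of dominance $D^+(x)$ and $D^-(x)$, we can define three upward pairwise disjoint positive, negative, and boundary regions

\[
\begin{aligned}
\mathrm{POS}^+(X) &= \{x \in U \mid D^+(x) \subseteq X\}, \\
\mathrm{NEG}^+(X) &= \{x \in U \mid D^-(x) \cap X = \emptyset\}, \\
\mathrm{BND}^+(X) &= \{x \in U \mid D^+(x) \not\subseteq X \text{ and } D^-(x) \cap X \neq \emptyset\}.
\end{aligned}
\tag{24.53}
\]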


Analogously, for any $X \subseteq U$, we can define three downward pairwise disjoint positive, negative, and boundary regions

\[
\begin{aligned}
\mathrm{POS}^-(X) &= \{x \in U \mid D^-(x) \subseteq X\}, \\
\mathrm{NEG}^-(X) &= \{x \in U \mid D^+(x) \cap X = \emptyset\}, \\
\mathrm{BND}^-(X) &= \{x \in U \mid D^-(x) \not\subseteq X \text{ and } D^+(x) \cap X \neq \emptyset\}.
\end{aligned}
\tag{24.54}
\]

Observe that the following complementarity properties hold: for all $X \subseteq U$,

\[
\begin{aligned}
\mathrm{POS}^+(X) &= \mathrm{NEG}^-(U - X), \\
\mathrm{POS}^-(X) &= \mathrm{NEG}^+(U - X), \\
\mathrm{BND}^+(X) &= \mathrm{BND}^-(U - X), \\
\mathrm{BND}^-(X) &= \mathrm{BND}^+(U - X).
\end{aligned}
\tag{24.55}
\]


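For all $X \subseteq U$, the pair of upward approximations and the three upward regions determine each other as follows:

\[
\begin{aligned}
\mathrm{POS}^+(X) &= \underline{\mathrm{apr}}^+(X), \\
\mathrm{NEG}^+(X) &= (\overline{\mathrm{apr}}^+(X))^c, \\
\mathrm{BND}^+(X) &= \overline{\mathrm{apr}}^+(X) - \underline{\mathrm{apr}}^+(X),
\end{aligned}
\tag{24.56}
\]

and

\[
\begin{aligned}
\underline{\mathrm{apr}}^+(X) &= \mathrm{POS}^+(X), \\
\overline{\mathrm{apr}}^+(X) &= \mathrm{POS}^+(X) \cup \mathrm{BND}^+(X).
\end{aligned}
\tag{24.57}
\]

Analogously, for all $X \subseteq U$, the pair of downward approximations and the three downward regions determine each other as follows:

\[
\begin{aligned}
\mathrm{POS}^-(X) &= \underline{\mathrm{apr}}^-(X), \\
\mathrm{NEG}^-(X) &= (\overline{\mathrm{apr}}^-(X))^c, \\
\mathrm{BND}^-(X) &= \overline{\mathrm{apr}}^-(X) - \underline{\mathrm{apr}}^-(X),
\end{aligned}
\tag{24.58}
\]

and

\[
\begin{aligned}
\underline{\mathrm{apr}}^-(X) &= \mathrm{POS}^-(X), \\
\overline{\mathrm{apr}}^-(X) &= \mathrm{POS}^-(X) \cup \mathrm{BND}^-(X).
\end{aligned}
\tag{24.59}
\]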


24.7 A Basic Model of Dominance-Based Probabilistic Rough Sets 


DRSA considers only a qualitative relationship between the positive and negative cones $D^+(x)$ and $D^-(x)$ and a set X; namely, a positive or negative cone is a subset of the set or has a nonempty intersection with the set. This qualitative nature becomes clearer with a probabilistic interpretation. Suppose $\Pr(X \mid D^+(x))$ denotes the conditional probability that an object is in X, given that the object is in $D^+(x)$, and $\Pr(X \mid D^-(x))$ denotes the conditional probability that an object is in X, given that the object is in $D^-(x)$. The conditions for defining the three upward rough set regions can be equivalently expressed as

\[
\begin{aligned}
D^+(x) \subseteq X &\iff \Pr(X \mid D^+(x)) \ge 1, \\
D^-(x) \cap X = \emptyset &\iff \Pr(X \mid D^-(x)) \le 0, \\
D^+(x) \not\subseteq X \wedge D^-(x) \cap X \neq \emptyset &\iff \Pr(X \mid D^+(x)) < 1 \wedge \Pr(X \mid D^-(x)) > 0.
\end{aligned}
\tag{24.60}
\]

Analogously, the conditions for defining the three downward rough set regions can be equivalently expressed as

\[
\begin{aligned}
D^-(x) \subseteq X &\iff \Pr(X \mid D^-(x)) \ge 1, \\
D^+(x) \cap X = \emptyset &\iff \Pr(X \mid D^+(x)) \le 0, \\
D^-(x) \not\subseteq X \wedge D^+(x) \cap X \neq \emptyset &\iff \Pr(X \mid D^-(x)) < 1 \wedge \Pr(X \mid D^+(x)) > 0.
\end{aligned}
\tag{24.61}
\]

By those conditions, the DRSA upward and downward three regions can be equivalently expressed as

\[
\begin{aligned}
\mathrm{POS}^+(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \ge 1\}, \\
\mathrm{NEG}^+(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le 0\}, \\
\mathrm{BND}^+(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) < 1 \wedge \Pr(X \mid D^-(x)) > 0\}, \\
\mathrm{POS}^-(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \ge 1\}, \\
\mathrm{NEG}^-(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le 0\}, \\
\mathrm{BND}^-(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) < 1 \wedge \Pr(X \mid D^+(x)) > 0\}.
\end{aligned}
\tag{24.62}
\]


Observe that DRSA approximations use only the two 
extreme values, i. e., 1 and 0, of probability. 
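The probabilistic reading of the upward regions can be illustrated as follows. The sketch is hedged: it assumes gain-type numeric criteria, estimates the conditional probabilities by relative frequencies within each cone, and uses hypothetical names and data. The two extreme values 1 and 0 are kept as the parameters `upper` and `lower` so that the comparison with probabilities is explicit.

```python
from fractions import Fraction

def cone(table, x, attrs, upward):
    """D^+(x) if upward is True, otherwise D^-(x)."""
    def dominates(u, v):
        return all(table[u][a] >= table[v][a] for a in attrs)
    if upward:
        return {y for y in table if dominates(y, x)}
    return {y for y in table if dominates(x, y)}

def cond_prob(X, cone_set):
    """Frequency estimate of Pr(X | cone); cones are nonempty by reflexivity."""
    return Fraction(len(cone_set & X), len(cone_set))

def upward_regions(table, attrs, X, upper=1, lower=0):
    """Upward positive, negative and boundary regions via probabilities."""
    pos, neg, bnd = set(), set(), set()
    for x in table:
        p_up = cond_prob(X, cone(table, x, attrs, True))     # Pr(X | D^+(x))
        p_down = cond_prob(X, cone(table, x, attrs, False))  # Pr(X | D^-(x))
        if p_up >= upper:
            pos.add(x)
        if p_down <= lower:
            neg.add(x)
        if p_up < upper and p_down > lower:
            bnd.add(x)
    return pos, neg, bnd

# Hypothetical table; object "a" is in X although it is dominated by "b" and "c".
table = {
    "z": {"q1": 0, "q2": 0}, "a": {"q1": 1, "q2": 1}, "b": {"q1": 2, "q2": 1},
    "c": {"q1": 1, "q2": 2}, "d": {"q1": 2, "q2": 2}, "e": {"q1": 3, "q2": 3},
}
X = {"a", "d", "e"}
pos, neg, bnd = upward_regions(table, ("q1", "q2"), X)
```

The inconsistency between a, b, and c pushes all three into the upward boundary region, while d and e are certain members and z is a certain non-member.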


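It is natural to generalize DRSA approximations by replacing 1 and 0 with some other values in the unit interval [0, 1]. Given a pair of thresholds $\alpha, \beta$ with $0 \le \beta < \alpha \le 1$, the main results of probabilistic DRSA are the $(\alpha, \beta)$-probabilistic regions defined by

\[
\begin{aligned}
\mathrm{POS}^+_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \ge \alpha\}, \\
\mathrm{NEG}^+_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le \beta\}, \\
\mathrm{BND}^+_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) < \alpha \wedge \Pr(X \mid D^-(x)) > \beta\}, \\
\mathrm{POS}^-_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \ge \alpha\}, \\
\mathrm{NEG}^-_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le \beta\}, \\
\mathrm{BND}^-_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) < \alpha \wedge \Pr(X \mid D^+(x)) > \beta\}.
\end{aligned}
\tag{24.63}
\]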


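The DRSA rough set model is a special case in which $\alpha = 1$ and $\beta = 0$. In the case when $0 < \beta = \alpha < 1$, the three regions are given by

\[
\begin{aligned}
\mathrm{POS}^+_{(\alpha,\alpha)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) > \alpha\}, \\
\mathrm{NEG}^+_{(\alpha,\alpha)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) < \alpha\}, \\
\mathrm{BND}^+_{(\alpha,\alpha)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le \alpha \wedge \Pr(X \mid D^-(x)) \ge \alpha\}, \\
\mathrm{POS}^-_{(\alpha,\alpha)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) > \alpha\}, \\
\mathrm{NEG}^-_{(\alpha,\alpha)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) < \alpha\}, \\
\mathrm{BND}^-_{(\alpha,\alpha)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le \alpha \wedge \Pr(X \mid D^+(x)) \ge \alpha\}.
\end{aligned}
\tag{24.64}
\]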


24.8 Variants of Probabilistic Dominance-Based Rough Set Approach 


Several models generalizing dominance-based rough 
sets by incorporating probabilistic information can be 
considered. 


24.8.1 Variable Consistency 
Dominance-Based Rough Sets 


In a first version of variable consistency dominance-based rough sets [24.23] (see also [24.24]), the standard set inclusions $D^+(x) \subseteq X$ and $D^-(x) \subseteq X$ can be generalized into graded set inclusions $s^+(D^+(x), X)$ and $s^-(D^-(x), X)$, called measures of the relative upward and downward degree of misclassification of $D^+(x)$ and $D^-(x)$ with respect to X, respectively. A particular upward and downward measure is given by

\[
\begin{aligned}
s^+(D^+(x), X) &= 1 - \frac{|D^+(x) \cap X|}{|D^+(x)|}, \\
s^-(D^-(x), X) &= 1 - \frac{|D^-(x) \cap X|}{|D^-(x)|}.
\end{aligned}
\tag{24.65}
\]

By introducing a threshold $0 \le z < 0.5$, one can define three upward and downward regions as follows:

\[
\begin{aligned}
\mathrm{VPOS}^+_z(X) &= \{x \in U \mid s^+(D^+(x), X) \le z\}, \\
\mathrm{VNEG}^+_z(X) &= \{x \in U \mid s^-(D^-(x), X) \ge 1 - z\}, \\
\mathrm{VBND}^+_z(X) &= \{x \in U \mid s^+(D^+(x), X) > z \wedge s^-(D^-(x), X) < 1 - z\}, \\
\mathrm{VPOS}^-_z(X) &= \{x \in U \mid s^-(D^-(x), X) \le z\}, \\
\mathrm{VNEG}^-_z(X) &= \{x \in U \mid s^+(D^+(x), X) \ge 1 - z\}, \\
\mathrm{VBND}^-_z(X) &= \{x \in U \mid s^-(D^-(x), X) > z \wedge s^+(D^+(x), X) < 1 - z\}.
\end{aligned}
\tag{24.66}
\]

A more generalized version using a pair of thresholds can be defined as follows: for $0 \le l < u \le 1$,

\[
\begin{aligned}
\mathrm{VPOS}^+_{(l,u)}(X) &= \{x \in U \mid s^+(D^+(x), X) \le l\}, \\
\mathrm{VNEG}^+_{(l,u)}(X) &= \{x \in U \mid s^-(D^-(x), X) \ge u\}, \\
\mathrm{VBND}^+_{(l,u)}(X) &= \{x \in U \mid s^+(D^+(x), X) > l \wedge s^-(D^-(x), X) < u\}, \\
\mathrm{VPOS}^-_{(l,u)}(X) &= \{x \in U \mid s^-(D^-(x), X) \le l\}, \\
\mathrm{VNEG}^-_{(l,u)}(X) &= \{x \in U \mid s^+(D^+(x), X) \ge u\}, \\
\mathrm{VBND}^-_{(l,u)}(X) &= \{x \in U \mid s^-(D^-(x), X) > l \wedge s^+(D^+(x), X) < u\}.
\end{aligned}
\tag{24.67}
\]


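The one-threshold model may be considered as a special case of the two-threshold model with $l = z$ and $u = 1 - z$.

One may interpret the ratios in (24.65) as estimations of the conditional probabilities $\Pr(X \mid D^+(x))$ and $\Pr(X \mid D^-(x))$, namely,

\[
\begin{aligned}
s^+(D^+(x), X) &= 1 - \frac{|D^+(x) \cap X|}{|D^+(x)|} = 1 - \Pr(X \mid D^+(x)), \\
s^-(D^-(x), X) &= 1 - \frac{|D^-(x) \cap X|}{|D^-(x)|} = 1 - \Pr(X \mid D^-(x)).
\end{aligned}
\tag{24.68}
\]

By setting $\alpha = 1 - l$ and $\beta = 1 - u$, we immediately get

\[
\begin{aligned}
\mathrm{POS}^+_{(1-l,1-u)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \ge 1 - l\} \\
 &= \{x \in U \mid s^+(D^+(x), X) \le l\} = \mathrm{VPOS}^+_{(l,u)}(X), \\
\mathrm{NEG}^+_{(1-l,1-u)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le 1 - u\} \\
 &= \{x \in U \mid s^-(D^-(x), X) \ge u\} = \mathrm{VNEG}^+_{(l,u)}(X), \\
\mathrm{BND}^+_{(1-l,1-u)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) < 1 - l \wedge \Pr(X \mid D^-(x)) > 1 - u\} \\
 &= \{x \in U \mid s^+(D^+(x), X) > l \wedge s^-(D^-(x), X) < u\} = \mathrm{VBND}^+_{(l,u)}(X), \\
\mathrm{POS}^-_{(1-l,1-u)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \ge 1 - l\} \\
 &= \{x \in U \mid s^-(D^-(x), X) \le l\} = \mathrm{VPOS}^-_{(l,u)}(X), \\
\mathrm{NEG}^-_{(1-l,1-u)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le 1 - u\} \\
 &= \{x \in U \mid s^+(D^+(x), X) \ge u\} = \mathrm{VNEG}^-_{(l,u)}(X), \\
\mathrm{BND}^-_{(1-l,1-u)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) < 1 - l \wedge \Pr(X \mid D^+(x)) > 1 - u\} \\
 &= \{x \in U \mid s^-(D^-(x), X) > l \wedge s^+(D^+(x), X) < u\} = \mathrm{VBND}^-_{(l,u)}(X).
\end{aligned}
\tag{24.69}
\]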
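The correspondence between the variable consistency regions and the probabilistic regions can be checked on a small example. The sketch below is an illustration under assumptions that are not part of the original text: gain-type numeric criteria, frequency estimates of the conditional probabilities, exact rational arithmetic, and hypothetical names and data. Only the upward regions are shown; the downward case is symmetric.

```python
from fractions import Fraction

def cone(table, x, attrs, upward):
    """D^+(x) if upward is True, otherwise D^-(x)."""
    def dom(u, v):  # u dominates v
        return all(table[u][a] >= table[v][a] for a in attrs)
    return {y for y in table if (dom(y, x) if upward else dom(x, y))}

def prob(table, x, attrs, X, upward):
    """Frequency estimate of Pr(X | D^+(x)) or Pr(X | D^-(x))."""
    c = cone(table, x, attrs, upward)
    return Fraction(len(c & X), len(c))

def vc_upward(table, attrs, X, l, u):
    """Upward variable consistency regions with thresholds (l, u)."""
    def s(x, up):  # misclassification degree s^+ or s^-
        return 1 - prob(table, x, attrs, X, up)
    pos = {x for x in table if s(x, True) <= l}
    neg = {x for x in table if s(x, False) >= u}
    bnd = {x for x in table if s(x, True) > l and s(x, False) < u}
    return pos, neg, bnd

def prob_upward(table, attrs, X, alpha, beta):
    """Upward (alpha, beta)-probabilistic regions."""
    def p(x, up):
        return prob(table, x, attrs, X, up)
    pos = {x for x in table if p(x, True) >= alpha}
    neg = {x for x in table if p(x, False) <= beta}
    bnd = {x for x in table if p(x, True) < alpha and p(x, False) > beta}
    return pos, neg, bnd

table = {
    "z": {"q1": 0, "q2": 0}, "a": {"q1": 1, "q2": 1}, "b": {"q1": 2, "q2": 1},
    "c": {"q1": 1, "q2": 2}, "d": {"q1": 2, "q2": 2}, "e": {"q1": 3, "q2": 3},
}
attrs, X = ("q1", "q2"), {"a", "d", "e"}
l, u = Fraction(1, 3), Fraction(2, 3)
vc = vc_upward(table, attrs, X, l, u)
pr = prob_upward(table, attrs, X, 1 - l, 1 - u)
```

Rational arithmetic is used so that the comparison at the thresholds is exact; with floating-point numbers, objects lying exactly on a threshold could be assigned differently by the two formulations.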


24.8.2 Parameterized Dominance-Based 
Rough Sets 


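Parameterized rough sets based on dominance [24.24] generalize variable consistency DRSA by introducing a Bayesian confirmation measure and a pair of thresholds on the confirmation measure, in addition to a pair of thresholds on conditional probability. Let $c^+(D^+(x), X)$ and $c^-(D^-(x), X)$ denote a Bayesian upward and downward confirmation measure, respectively, that indicate the degree to which the positive or negative cones $D^+(x)$ and $D^-(x)$ confirm the hypothesis X. The upward and downward Bayesian confirmation measures corresponding to those introduced in Sect. 24.4.3 are

\[
\begin{aligned}
c_d^+(D^+(x), X) &= \Pr(X \mid D^+(x)) - \Pr(X), \\
c_d^-(D^-(x), X) &= \Pr(X \mid D^-(x)) - \Pr(X), \\
c_r^+(D^+(x), X) &= \frac{\Pr(X \mid D^+(x))}{\Pr(X)}, \\
c_r^-(D^-(x), X) &= \frac{\Pr(X \mid D^-(x))}{\Pr(X)}.
\end{aligned}
\tag{24.70}
\]

Given a pair of thresholds $(s, t)$ with $t < s$, three $(\alpha, \beta, s, t)$-parameterized regions can be defined as follows:

\[
\begin{aligned}
\mathrm{PPOS}^+_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \ge \alpha \wedge c^+(D^+(x), X) \ge s\}, \\
\mathrm{PNEG}^+_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le \beta \wedge c^-(D^-(x), X) \le t\}, \\
\mathrm{PBND}^+_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid (\Pr(X \mid D^+(x)) < \alpha \vee c^+(D^+(x), X) < s) \\
 &\qquad\qquad \wedge (\Pr(X \mid D^-(x)) > \beta \vee c^-(D^-(x), X) > t)\}, \\
\mathrm{PPOS}^-_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \ge \alpha \wedge c^-(D^-(x), X) \ge s\}, \\
\mathrm{PNEG}^-_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le \beta \wedge c^+(D^+(x), X) \le t\}, \\
\mathrm{PBND}^-_{(\alpha,\beta,s,t)}(X) &= \{x \in U \mid (\Pr(X \mid D^-(x)) < \alpha \vee c^-(D^-(x), X) < s) \\
 &\qquad\qquad \wedge (\Pr(X \mid D^+(x)) > \beta \vee c^+(D^+(x), X) > t)\}.
\end{aligned}
\tag{24.71}
\]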


Let us remember that a family of consistency measures, called gain-type consistency measures, and inconsistency measures, called cost-type consistency measures, larger than confirmation measures, and the related dominance-based rough sets have been considered in [24.24]. For any $x \in U$ and $X \subseteq U$, for a consistency measure $m_c(x, X)$, x can be assigned to the positive region of X if $m_c(x, X) \ge \alpha$, with $\alpha$ being a proper threshold, while for an inconsistency measure $m_{ic}(x, X)$, x can be assigned to the positive region of X if $m_{ic}(x, X) \le \alpha$. A consistency measure $m_c(x, X)$ or an inconsistency measure $m_{ic}(x, X)$ is monotonic (Sect. 22.3.2 in Chap. 22) if it does not deteriorate when:


(m1) The set of attributes is growing, 

(m2) The set of objects is growing, 

(m3) x improves its evaluation, so that it dominates 
more objects. 


Among the considered consistency and inconsistency measures, one that can be considered very interesting, because it enjoys all the considered monotonicity properties (m1)-(m3) while maintaining a reasonably easy formulation, is the inconsistency measure $\varepsilon'$, which is expressed as follows:

• In the case of dominance-based upward approximation

\[
\varepsilon'^{+}(x, X) = \frac{|D^+(x) \cap (U - X)|}{|X|}
\]

• In the case of dominance-based downward approximation

\[
\varepsilon'^{-}(x, X) = \frac{|D^-(x) \cap (U - X)|}{|X|}
\]

Observe that, as explained in [24.24], consistency and inconsistency measures can be properly reformulated in order to be used in indiscernibility-based rough sets. For example, the inconsistency measure $\varepsilon'$ in the case of indiscernibility-based rough sets becomes

\[
\varepsilon'(x, X) = \frac{|[x] \cap (U - X)|}{|X|}
\]


24.8.3 Confirmation-Theoretic
Dominance-Based Rough Sets


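A separation of the parameterized model into two models within DRSA can be constructed as follows. One is the conventional probabilistic model and the other is a confirmation-theoretic model. For an upward and a downward Bayesian confirmation measure ($c^+(D^+(x), X)$ and $c^-(D^-(x), X)$), and a pair of thresholds $(s, t)$ with $t < s$, three confirmation regions are defined by

\[
\begin{aligned}
\mathrm{CPOS}^+_{(s,t)}(X) &= \{x \in U \mid c^+(D^+(x), X) \ge s\}, \\
\mathrm{CNEG}^+_{(s,t)}(X) &= \{x \in U \mid c^-(D^-(x), X) \le t\}, \\
\mathrm{CBND}^+_{(s,t)}(X) &= \{x \in U \mid c^+(D^+(x), X) < s \wedge c^-(D^-(x), X) > t\}, \\
\mathrm{CPOS}^-_{(s,t)}(X) &= \{x \in U \mid c^-(D^-(x), X) \ge s\}, \\
\mathrm{CNEG}^-_{(s,t)}(X) &= \{x \in U \mid c^+(D^+(x), X) \le t\}, \\
\mathrm{CBND}^-_{(s,t)}(X) &= \{x \in U \mid c^-(D^-(x), X) < s \wedge c^+(D^+(x), X) > t\}.
\end{aligned}
\tag{24.72}
\]

For the case with $s = t$, we define

\[
\begin{aligned}
\mathrm{CPOS}^+_{(s,s)}(X) &= \{x \in U \mid c^+(D^+(x), X) > s\}, \\
\mathrm{CNEG}^+_{(s,s)}(X) &= \{x \in U \mid c^-(D^-(x), X) < s\}, \\
\mathrm{CBND}^+_{(s,s)}(X) &= \{x \in U \mid c^+(D^+(x), X) \le s \wedge c^-(D^-(x), X) \ge s\}, \\
\mathrm{CPOS}^-_{(s,s)}(X) &= \{x \in U \mid c^-(D^-(x), X) > s\}, \\
\mathrm{CNEG}^-_{(s,s)}(X) &= \{x \in U \mid c^+(D^+(x), X) < s\}, \\
\mathrm{CBND}^-_{(s,s)}(X) &= \{x \in U \mid c^-(D^-(x), X) \le s \wedge c^+(D^+(x), X) \ge s\}.
\end{aligned}
\tag{24.73}
\]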


24.8.4 Bayesian Dominance-Based 
Rough Sets 


A Bayesian DRSA model, in which the required pair of thresholds is interpreted using the a priori probability Pr(X), can be defined as an extension of the Bayesian DRSA and variable consistency Bayesian DRSA, as explained below.

For the Bayesian DRSA, the three upward and downward regions are defined by

\[
\begin{aligned}
\mathrm{BPOS}^+(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) > \Pr(X)\}, \\
\mathrm{BNEG}^+(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) < \Pr(X)\}, \\
\mathrm{BBND}^+(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le \Pr(X) \wedge \Pr(X \mid D^-(x)) \ge \Pr(X)\}, \\
\mathrm{BPOS}^-(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) > \Pr(X)\}, \\
\mathrm{BNEG}^-(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) < \Pr(X)\}, \\
\mathrm{BBND}^-(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le \Pr(X) \wedge \Pr(X \mid D^+(x)) \ge \Pr(X)\}.
\end{aligned}
\tag{24.74}
\]


Bayesian dominance-based rough sets can be viewed as a special case of the decision-theoretic DRSA when $\alpha = \beta = \Pr(X)$.

Recalling the upward and downward Bayesian confirmation measures

\[
c_d^+(D^+(x), X) = \Pr(X \mid D^+(x)) - \Pr(X)
\]

and

\[
c_d^-(D^-(x), X) = \Pr(X \mid D^-(x)) - \Pr(X),
\]

Bayesian dominance-based rough sets can be expressed as confirmation-theoretic dominance-based rough sets as follows:

\[
\begin{aligned}
\mathrm{BPOS}^+(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) > \Pr(X)\} \\
 &= \{x \in U \mid c_d^+(D^+(x), X) > 0\}, \\
\mathrm{BNEG}^+(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) < \Pr(X)\} \\
 &= \{x \in U \mid c_d^-(D^-(x), X) < 0\}, \\
\mathrm{BBND}^+(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le \Pr(X) \wedge \Pr(X \mid D^-(x)) \ge \Pr(X)\} \\
 &= \{x \in U \mid c_d^+(D^+(x), X) \le 0 \wedge c_d^-(D^-(x), X) \ge 0\}, \\
\mathrm{BPOS}^-(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) > \Pr(X)\} \\
 &= \{x \in U \mid c_d^-(D^-(x), X) > 0\}, \\
\mathrm{BNEG}^-(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) < \Pr(X)\} \\
 &= \{x \in U \mid c_d^+(D^+(x), X) < 0\}, \\
\mathrm{BBND}^-(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le \Pr(X) \wedge \Pr(X \mid D^+(x)) \ge \Pr(X)\} \\
 &= \{x \in U \mid c_d^-(D^-(x), X) \le 0 \wedge c_d^+(D^+(x), X) \ge 0\}.
\end{aligned}
\tag{24.75}
\]


That is, the Bayesian rough sets are models of confirmation-theoretic rough sets characterized by the upward and downward Bayesian confirmation measures $c_d^+$ and $c_d^-$ with a pair of thresholds $s = t = 0$.

The three upward and downward regions of the variable precision Bayesian rough sets are defined as follows: for $\varepsilon \in [0, 1)$,

\[
\begin{aligned}
\mathrm{VBPOS}_{\varepsilon}(X) &= \{x \in U \mid \Pr(X \mid [x]) \ge 1 - \varepsilon (1 - \Pr(X))\}, \\
\mathrm{VBNEG}_{\varepsilon}(X) &= \{x \in U \mid \Pr(X \mid [x]) \le \varepsilon \Pr(X)\}, \\
\mathrm{VBBND}_{\varepsilon}(X) &= \{x \in U \mid \varepsilon \Pr(X) < \Pr(X \mid [x]) < 1 - \varepsilon (1 - \Pr(X))\}.
\end{aligned}
\tag{24.76}
\]


24.9 Three Fundamental Issues of Probabilistic Dominance-Based Rough Sets


Also for probabilistic dominance-based rough sets, one 
must consider the three fundamental issues of inter- 
pretation and determination of the required pair of 
thresholds, estimation of the required conditional prob- 
abilities, and interpretation and applications of three 
probabilistic regions. 

These three issues are considered in this section 
with respect to dominance-based rough sets. 


24.9.1 Decision-Theoretic 
Dominance-Based 
Rough Set Model: 
Determining the Thresholds 


Following [24.75], a decision-theoretic model formulates the construction of dominance-based rough set approximations as a Bayesian decision problem with a set of two states $\Omega = \{X, X^c\}$, indicating that an element is in X and not in X, respectively. In the case of upward dominance-based rough sets, we consider a set of three actions $\mathcal{A}^+ = \{a_P^+, a_B^+, a_N^+\}$, with $a_P^+$ deciding $x \in \mathrm{POS}^+(X)$, $a_B^+$ deciding $x \in \mathrm{BND}^+(X)$, and $a_N^+$ deciding $x \in \mathrm{NEG}^+(X)$, respectively. In the case of downward dominance-based rough sets, we consider a set of three actions $\mathcal{A}^- = \{a_P^-, a_B^-, a_N^-\}$, with $a_P^-$ deciding $x \in \mathrm{POS}^-(X)$, $a_B^-$ deciding $x \in \mathrm{BND}^-(X)$, and $a_N^-$ deciding $x \in \mathrm{NEG}^-(X)$, respectively. The losses regarding the actions for different states are given by the 6 × 2 matrix

\[
\begin{array}{c|cc}
 & X\ (P) & X^c\ (N) \\ \hline
a_P^+ & \lambda_{PP}^+ & \lambda_{PN}^+ \\
a_B^+ & \lambda_{BP}^+ & \lambda_{BN}^+ \\
a_N^+ & \lambda_{NP}^+ & \lambda_{NN}^+ \\
a_P^- & \lambda_{PP}^- & \lambda_{PN}^- \\
a_B^- & \lambda_{BP}^- & \lambda_{BN}^- \\
a_N^- & \lambda_{NP}^- & \lambda_{NN}^-
\end{array}
\]

In the matrix:

• In the case that upward dominance-based rough approximations are considered, $\lambda_{PP}^+$, $\lambda_{BP}^+$, and $\lambda_{NP}^+$ denote the losses incurred for taking actions $a_P^+$, $a_B^+$, and $a_N^+$, respectively, when an object belongs to X, and $\lambda_{PN}^+$, $\lambda_{BN}^+$, and $\lambda_{NN}^+$ denote the losses incurred for taking the same actions when the object does not belong to X,

• In the case that downward dominance-based rough approximations are considered, $\lambda_{PP}^-$, $\lambda_{BP}^-$, and $\lambda_{NP}^-$ denote the losses incurred for taking actions $a_P^-$, $a_B^-$, and $a_N^-$, respectively, when an object belongs to X, and $\lambda_{PN}^-$, $\lambda_{BN}^-$, and $\lambda_{NN}^-$ denote the losses incurred for taking the same actions when the object does not belong to X.


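In the case that upward dominance-based rough approximations are considered, the expected losses associated with taking different actions for objects in $D^+(x)$ can be expressed as

$$
\begin{aligned}
R(a_P^+ \mid D^+(x)) &= \lambda_{PP}^+ \Pr(X \mid D^+(x)) + \lambda_{PN}^+ \Pr(X^c \mid D^+(x)) , \\
R(a_B^+ \mid D^+(x)) &= \lambda_{BP}^+ \Pr(X \mid D^+(x)) + \lambda_{BN}^+ \Pr(X^c \mid D^+(x)) , \\
R(a_N^+ \mid D^+(x)) &= \lambda_{NP}^+ \Pr(X \mid D^+(x)) + \lambda_{NN}^+ \Pr(X^c \mid D^+(x)) .
\end{aligned}
\tag{24.77}
$$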


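In the case that downward dominance-based rough approximations are considered, the expected losses associated with taking different actions for objects in $D^-(x)$ can be expressed as

$$
\begin{aligned}
R(a_P^- \mid D^-(x)) &= \lambda_{PP}^- \Pr(X \mid D^-(x)) + \lambda_{PN}^- \Pr(X^c \mid D^-(x)) , \\
R(a_B^- \mid D^-(x)) &= \lambda_{BP}^- \Pr(X \mid D^-(x)) + \lambda_{BN}^- \Pr(X^c \mid D^-(x)) , \\
R(a_N^- \mid D^-(x)) &= \lambda_{NP}^- \Pr(X \mid D^-(x)) + \lambda_{NN}^- \Pr(X^c \mid D^-(x)) .
\end{aligned}
\tag{24.78}
$$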


In the case that upward dominance-based rough approximations are considered, the Bayesian decision procedure suggests the following minimum-risk decision rules

(P+) If $R(a_P^+ \mid D^+(x)) \le R(a_B^+ \mid D^+(x))$
     and $R(a_P^+ \mid D^+(x)) \le R(a_N^+ \mid D^+(x))$,
     decide $x \in \mathrm{POS}^+(X)$;

(B+) If $R(a_B^+ \mid D^+(x)) \le R(a_P^+ \mid D^+(x))$
     and $R(a_B^+ \mid D^+(x)) \le R(a_N^+ \mid D^+(x))$,
     decide $x \in \mathrm{BND}^+(X)$;

(N+) If $R(a_N^+ \mid D^+(x)) \le R(a_P^+ \mid D^+(x))$
     and $R(a_N^+ \mid D^+(x)) \le R(a_B^+ \mid D^+(x))$,
     decide $x \in \mathrm{NEG}^+(X)$.


In the case that downward dominance-based rough approximations are considered, the Bayesian decision procedure suggests the following minimum-risk decision rules

(P−) If $R(a_P^- \mid D^-(x)) \le R(a_B^- \mid D^-(x))$
     and $R(a_P^- \mid D^-(x)) \le R(a_N^- \mid D^-(x))$,
     decide $x \in \mathrm{POS}^-(X)$;

(B−) If $R(a_B^- \mid D^-(x)) \le R(a_P^- \mid D^-(x))$
     and $R(a_B^- \mid D^-(x)) \le R(a_N^- \mid D^-(x))$,
     decide $x \in \mathrm{BND}^-(X)$;

(N−) If $R(a_N^- \mid D^-(x)) \le R(a_P^- \mid D^-(x))$
     and $R(a_N^- \mid D^-(x)) \le R(a_B^- \mid D^-(x))$,
     decide $x \in \mathrm{NEG}^-(X)$.


Also in the case that dominance-based rough approximations are considered, when two or three actions have the same risk, one can break the tie with the same ordering used for indiscernibility-based rough approximations: $a_P^+, a_N^+, a_B^+$ in case upward rough approximations are considered, and $a_P^-, a_N^-, a_B^-$ in case downward rough approximations are considered.
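
As a minimal sketch of the minimum-risk rules (P+)-(N+) together with the tie-breaking order positive, negative, boundary, the following Python fragment evaluates the three expected losses for a given value of Pr(X|D+(x)). The function names and the numeric losses are illustrative assumptions and are not taken from [24.75].

```python
# Hypothetical losses for the upward case; each entry is
# (loss when the object is in X, loss when it is not in X).
LOSSES_UP = {
    "P": (0.0, 4.0),   # (lambda_PP^+, lambda_PN^+)
    "B": (1.0, 1.0),   # (lambda_BP^+, lambda_BN^+)
    "N": (6.0, 0.0),   # (lambda_NP^+, lambda_NN^+)
}

def expected_losses(p, losses):
    """Expected loss of each action, with p = Pr(X | D^+(x))."""
    return {a: l_in * p + l_out * (1.0 - p)
            for a, (l_in, l_out) in losses.items()}

def min_risk_action(p, losses):
    """Minimum-risk action; ties are broken in the order P, N, B."""
    risk = expected_losses(p, losses)
    # min() keeps the first minimal element, so the tuple order
    # below encodes the tie-breaking rule.
    return min(("P", "N", "B"), key=lambda a: risk[a])
```

With these assumed losses, `min_risk_action(0.9, LOSSES_UP)` selects the positive action, `min_risk_action(0.5, LOSSES_UP)` the boundary action, and `min_risk_action(0.1, LOSSES_UP)` the negative action. The same functions apply to the downward case when the losses with superscript − and Pr(X|D−(x)) are supplied.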

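Analogously to Sect. 24.5.1, let us consider the special class of loss functions with

$$
\begin{aligned}
(c0^+)\quad & \lambda_{PP}^+ \le \lambda_{BP}^+ < \lambda_{NP}^+ , \qquad \lambda_{NN}^+ \le \lambda_{BN}^+ < \lambda_{PN}^+ , \\
(c0^-)\quad & \lambda_{PP}^- \le \lambda_{BP}^- < \lambda_{NP}^- , \qquad \lambda_{NN}^- \le \lambda_{BN}^- < \lambda_{PN}^- .
\end{aligned}
\tag{24.79}
$$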


With the conditions $(c0^+)$ and $(c0^-)$, and the equations

$$\Pr(X \mid D^+(x)) + \Pr(X^c \mid D^+(x)) = 1$$

and

$$\Pr(X \mid D^-(x)) + \Pr(X^c \mid D^-(x)) = 1 ,$$

we can express the decision rules (P+)-(N+) and (P−)-(N−) in the following simplified form

(P+) If $\Pr(X \mid D^+(x)) \ge \alpha^+$
     and $\Pr(X \mid D^+(x)) \ge \gamma^+$,
     decide $x \in \mathrm{POS}^+(X)$;

(B+) If $\Pr(X \mid D^+(x)) \le \alpha^+$
     and $\Pr(X \mid D^+(x)) \ge \beta^+$,
     decide $x \in \mathrm{BND}^+(X)$;

(N+) If $\Pr(X \mid D^+(x)) \le \beta^+$
     and $\Pr(X \mid D^+(x)) \le \gamma^+$,
     decide $x \in \mathrm{NEG}^+(X)$;

(P−) If $\Pr(X \mid D^-(x)) \ge \alpha^-$
     and $\Pr(X \mid D^-(x)) \ge \gamma^-$,
     decide $x \in \mathrm{POS}^-(X)$;

(B−) If $\Pr(X \mid D^-(x)) \le \alpha^-$
     and $\Pr(X \mid D^-(x)) \ge \beta^-$,
     decide $x \in \mathrm{BND}^-(X)$;

(N−) If $\Pr(X \mid D^-(x)) \le \beta^-$
     and $\Pr(X \mid D^-(x)) \le \gamma^-$,
     decide $x \in \mathrm{NEG}^-(X)$,


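where

$$
\begin{aligned}
\alpha^+ &= \frac{\lambda_{PN}^+ - \lambda_{BN}^+}{(\lambda_{PN}^+ - \lambda_{BN}^+) + (\lambda_{BP}^+ - \lambda_{PP}^+)} , &
\alpha^- &= \frac{\lambda_{PN}^- - \lambda_{BN}^-}{(\lambda_{PN}^- - \lambda_{BN}^-) + (\lambda_{BP}^- - \lambda_{PP}^-)} , \\
\beta^+ &= \frac{\lambda_{BN}^+ - \lambda_{NN}^+}{(\lambda_{BN}^+ - \lambda_{NN}^+) + (\lambda_{NP}^+ - \lambda_{BP}^+)} , &
\beta^- &= \frac{\lambda_{BN}^- - \lambda_{NN}^-}{(\lambda_{BN}^- - \lambda_{NN}^-) + (\lambda_{NP}^- - \lambda_{BP}^-)} , \\
\gamma^+ &= \frac{\lambda_{PN}^+ - \lambda_{NN}^+}{(\lambda_{PN}^+ - \lambda_{NN}^+) + (\lambda_{NP}^+ - \lambda_{PP}^+)} , &
\gamma^- &= \frac{\lambda_{PN}^- - \lambda_{NN}^-}{(\lambda_{PN}^- - \lambda_{NN}^-) + (\lambda_{NP}^- - \lambda_{PP}^-)} .
\end{aligned}
\tag{24.80}
$$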
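
To see where a threshold such as $\alpha^+$ comes from, the first comparison in rule (P+) can be worked out explicitly. The shorthand $p = \Pr(X \mid D^+(x))$ is introduced here only for readability, and $\Pr(X^c \mid D^+(x)) = 1 - p$ is used throughout:

```latex
\begin{align*}
R(a_P^+ \mid D^+(x)) \le R(a_B^+ \mid D^+(x))
 &\iff \lambda_{PP}^+ p + \lambda_{PN}^+ (1-p) \le \lambda_{BP}^+ p + \lambda_{BN}^+ (1-p) \\
 &\iff (\lambda_{PN}^+ - \lambda_{BN}^+)(1-p) \le (\lambda_{BP}^+ - \lambda_{PP}^+)\, p \\
 &\iff p \ge \frac{\lambda_{PN}^+ - \lambda_{BN}^+}
                  {(\lambda_{PN}^+ - \lambda_{BN}^+) + (\lambda_{BP}^+ - \lambda_{PP}^+)} .
\end{align*}
```

The last step divides by $(\lambda_{PN}^+ - \lambda_{BN}^+) + (\lambda_{BP}^+ - \lambda_{PP}^+)$, which is strictly positive under the loss conditions assumed for this special class, because $\lambda_{BN}^+ < \lambda_{PN}^+$ and $\lambda_{PP}^+ \le \lambda_{BP}^+$. The remaining thresholds follow from the analogous pairwise comparisons of the other actions.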


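By setting $\alpha^+ > \beta^+$ and $\alpha^- > \beta^-$, we obtain that $1 \ge \alpha^+ > \gamma^+ > \beta^+ \ge 0$ and $1 \ge \alpha^- > \gamma^- > \beta^- \ge 0$, and, after tie breaking, we obtain the following simplified rules

(P+) If $\Pr(X \mid D^+(x)) \ge \alpha^+$,
     decide $x \in \mathrm{POS}^+(X)$;

(B+) If $\beta^+ < \Pr(X \mid D^+(x)) < \alpha^+$,
     decide $x \in \mathrm{BND}^+(X)$;

(N+) If $\Pr(X \mid D^+(x)) \le \beta^+$,
     decide $x \in \mathrm{NEG}^+(X)$;

(P−) If $\Pr(X \mid D^-(x)) \ge \alpha^-$,
     decide $x \in \mathrm{POS}^-(X)$;

(B−) If $\beta^- < \Pr(X \mid D^-(x)) < \alpha^-$,
     decide $x \in \mathrm{BND}^-(X)$;

(N−) If $\Pr(X \mid D^-(x)) \le \beta^-$,
     decide $x \in \mathrm{NEG}^-(X)$,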


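so that the parameters $\gamma^+$ and $\gamma^-$ are no longer needed. Each object can be put into one and only one upward region, and one and only one downward region, by using rules (P+), (B+), and (N+), and (P−), (B−), and (N−), respectively. The upward $(\alpha^+, \beta^+)$-probabilistic positive, boundary, and negative regions and the downward $(\alpha^-, \beta^-)$-probabilistic positive, boundary, and negative regions are given, respectively, by

$$
\begin{aligned}
\mathrm{POS}^+_{(\alpha^+,\beta^+)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \ge \alpha^+\} , \\
\mathrm{BND}^+_{(\alpha^+,\beta^+)}(X) &= \{x \in U \mid \beta^+ < \Pr(X \mid D^+(x)) < \alpha^+\} , \\
\mathrm{NEG}^+_{(\alpha^+,\beta^+)}(X) &= \{x \in U \mid \Pr(X \mid D^+(x)) \le \beta^+\} , \\
\mathrm{POS}^-_{(\alpha^-,\beta^-)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \ge \alpha^-\} , \\
\mathrm{BND}^-_{(\alpha^-,\beta^-)}(X) &= \{x \in U \mid \beta^- < \Pr(X \mid D^-(x)) < \alpha^-\} , \\
\mathrm{NEG}^-_{(\alpha^-,\beta^-)}(X) &= \{x \in U \mid \Pr(X \mid D^-(x)) \le \beta^-\} .
\end{aligned}
\tag{24.81}
$$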
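
The construction of the upward regions can be sketched in a few lines of Python. The sketch rests on two assumptions that are not stated in this section: the upward cone D+(x) is taken to be the set of objects that are at least as good as x on every attribute, and Pr(X|D+(x)) is estimated by the relative frequency |X ∩ D+(x)| / |D+(x)|. All names and data are hypothetical.

```python
def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """(alpha, beta) computed from the six losses of one direction."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

def upward_cone(x, table):
    """Assumed D^+(x): objects at least as good as x on every attribute."""
    return {y for y, vals in table.items()
            if all(v >= w for v, w in zip(vals, table[x]))}

def upward_regions(table, X, alpha, beta):
    """Upward probabilistic positive, boundary and negative regions."""
    pos, bnd, neg = set(), set(), set()
    for x in table:
        cone = upward_cone(x, table)
        p = len(cone & X) / len(cone)   # frequency estimate (assumption)
        if p >= alpha:
            pos.add(x)
        elif p <= beta:
            neg.add(x)
        else:
            bnd.add(x)
    return pos, bnd, neg

# Hypothetical information table with two gain-type attributes.
TABLE = {"a": (1, 1), "b": (2, 2), "c": (3, 3), "d": (1, 4), "e": (2, 5)}
```

For example, `thresholds(0, 1, 6, 4, 1, 0)` gives α+ = 0.75 and β+ = 1/6, and `upward_regions(TABLE, {"c"}, 0.75, 1/6)` then places `c` in the positive region, `a` and `b` in the boundary region, and `d` and `e` in the negative region. The downward regions can be obtained in the same way by reversing the comparison in the cone.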


An alternative decision-theoretic model for dominance-based rough sets can be defined as follows. Its cost function takes into account the conditional probabilities $\Pr(X \mid D^+(x))$ and $\Pr(X^c \mid D^-(x))$ for upward rough approximations, and the conditional probabilities $\Pr(X \mid D^-(x))$ and $\Pr(X^c \mid D^+(x))$ for downward rough approximations.

In the case that upward dominance-based rough approximations are considered, the expected losses associated with taking different actions for objects in $D^+(x)$ can be expressed as

$$
\begin{aligned}
R(a_P^+ \mid D^+(x), D^-(x)) &= \lambda_{PP}^+ \Pr(X \mid D^+(x)) + \lambda_{PN}^+ \Pr(X^c \mid D^-(x)) , \\
R(a_B^+ \mid D^+(x), D^-(x)) &= \lambda_{BP}^+ \Pr(X \mid D^+(x)) + \lambda_{BN}^+ \Pr(X^c \mid D^-(x)) , \\
R(a_N^+ \mid D^+(x), D^-(x)) &= \lambda_{NP}^+ \Pr(X \mid D^+(x)) + \lambda_{NN}^+ \Pr(X^c \mid D^-(x)) .
\end{aligned}
\tag{24.82}
$$


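In the case that downward dominance-based rough approximations are considered, the expected losses associated with taking different actions for objects in $D^-(x)$ can be expressed as

$$
\begin{aligned}
R(a_P^- \mid D^+(x), D^-(x)) &= \lambda_{PP}^- \Pr(X \mid D^-(x)) + \lambda_{PN}^- \Pr(X^c \mid D^+(x)) , \\
R(a_B^- \mid D^+(x), D^-(x)) &= \lambda_{BP}^- \Pr(X \mid D^-(x)) + \lambda_{BN}^- \Pr(X^c \mid D^+(x)) , \\
R(a_N^- \mid D^+(x), D^-(x)) &= \lambda_{NP}^- \Pr(X \mid D^-(x)) + \lambda_{NN}^- \Pr(X^c \mid D^+(x)) .
\end{aligned}
\tag{24.83}
$$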
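
A small sketch may help to contrast this alternative model with the basic one. The function below is hypothetical and only evaluates the expected losses from two given probabilities, one per cone. Since the two probabilities are conditioned on different cones, they need not sum to one, so the single-threshold simplification of the basic model may not carry over as is.

```python
def expected_losses_both_cones(p_up, p_down, losses, upward=True):
    """Expected losses when both cones enter the cost function.

    p_up   = Pr(X | D^+(x)), p_down = Pr(X | D^-(x)).
    losses maps each action to (loss if x in X, loss if x not in X).
    """
    if upward:
        # Pr(X | D^+(x)) paired with Pr(X^c | D^-(x))
        p_in, p_out = p_up, 1.0 - p_down
    else:
        # Pr(X | D^-(x)) paired with Pr(X^c | D^+(x))
        p_in, p_out = p_down, 1.0 - p_up
    return {a: l_in * p_in + l_out * p_out
            for a, (l_in, l_out) in losses.items()}
```

The minimum-risk action can then be chosen from the returned dictionary exactly as in the basic model, with the same tie-breaking order.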


24.9.2 Stochastic Dominance-Based Rough Set Approach: Estimating the Conditional Probability


The naive Bayesian rough set model presented for rough sets based on indiscernibility in Sect. 24.5.2 can be extended quite straightforwardly to rough sets based on dominance. Thus, in this section, we present a different approach to estimating probabilities for rough approximations: the stochastic rough set approach [24.25] (see also Sect. 22.3.3 in Chap. 22). It can also be applied to rough sets based on indiscernibility, but here we present it for rough sets based on dominance. In the following, we shall consider upward dominance-based approximations of a given $X \subseteq U$. However, the same approach can be used for downward dominance-based approximations. From a probabilistic point of view, the assignment of object $x$ to $X \subseteq U$ can be made with probability $\Pr(X \mid D^+(x))$ and $\Pr(X \mid D^-(x))$. This probability is supposed to satisfy the usual axioms of probability

$$
\begin{aligned}
\Pr(U \mid D^+(x)) &= 1 , & \Pr(U - X \mid D^+(x)) &= 1 - \Pr(X \mid D^+(x)) , \\
\Pr(U \mid D^-(x)) &= 1 , & \Pr(U - X \mid D^-(x)) &= 1 - \Pr(X \mid D^-(x)) .
\end{aligned}
$$


Moreover, this probability has to satisfy an axiom related to the choice of the rough upward approximation, i.e., the positive monotonic relationships one expects between membership in $X \subseteq U$ and possession of the properties related to attributes from $AT$, i.e., the dominance relation $\succeq$: for any $x, y \in U$ such that $x \succeq y$

(i)  $\Pr(X \mid D^+(x)) \ge \Pr(X \mid D^+(y))$ ,
(ii) $\Pr(U - X \mid D^-(x)) \le \Pr(U - X \mid D^-(y))$ .

Condition (i) says that if object $x$ possesses the properties related to attributes from $AT$ at least as much as object $y$, i.e., $x \succeq y$, then the probability that $x$ belongs to $X$ has to be not smaller than the probability that $y$ belongs to $X$. Analogously, Condition (ii) says that since $x \succeq y$, the probability that $x$ does not belong to $X$ should not be greater than the probability that $y$ does not belong to $X$. Observe that (ii) can also be written as

(ii') $\Pr(X \mid D^-(x)) \ge \Pr(X \mid D^-(y))$ .


These probabilities are unknown but can be estimated from data. For each $X \subseteq U$, we have a binary problem of estimating the conditional probabilities $\Pr(X \mid D^+(x)) = 1 - \Pr(U - X \mid D^+(x))$ and the conditional probabilities $\Pr(X \mid D^-(x)) = 1 - \Pr(U - X \mid D^-(x))$. It can be solved by isotonic regression [24.25]. For $X \subseteq U$ and for any $x \in U$, let $y(x, X) = 1$ if $x \in X$, otherwise $y(x, X) = 0$. Then one can choose as estimates $\Pr^*(X \mid D^+(x))$ and $\Pr^*(X \mid D^-(x))$ the values of $\Pr(X \mid D^+(x))$ and $\Pr(X \mid D^-(x))$ which minimize the squared distance to the class assignment $y(x, X)$, subject to the monotonicity constraints related to the dominance relation $\succeq$ on the attributes from $AT$ (see also Sect. 22.3.3 in Chap. 22)

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{x \in U} \Big[ \big(y(x, X) - \Pr(X \mid D^+(x))\big)^2 + \big(y(x, X) - \Pr(X \mid D^-(x))\big)^2 \Big] \\
\text{subject to}\quad & \Pr(X \mid D^+(x)) \ge \Pr(X \mid D^+(z)) \ \text{and} \\
& \Pr(X \mid D^-(x)) \ge \Pr(X \mid D^-(z)) \ \text{if } x \succeq z , \ \text{for all } x, z \in U .
\end{aligned}
$$




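Then, stochastic $\alpha$-lower approximations of $X \subseteq U$ can be defined as

$$
\begin{aligned}
\underline{P}^{\alpha}(X) &= \{x \in U : \Pr(X \mid D^+(x)) \ge \alpha\} , \\
\underline{P}^{\alpha}(U - X) &= \{x \in U : \Pr(U - X \mid D^-(x)) \ge \alpha\} .
\end{aligned}
$$

Replacing the unknown probabilities $\Pr(X \mid D^+(x))$ and $\Pr(U - X \mid D^-(x))$ by their estimates $\Pr^*(X \mid D^+(x))$ and $\Pr^*(U - X \mid D^-(x))$ obtained from isotonic regression, we get

$$
\begin{aligned}
\underline{P}^{\alpha}(X) &= \{x \in U : \Pr\nolimits^*(X \mid D^+(x)) \ge \alpha\} , \\
\underline{P}^{\alpha}(U - X) &= \{x \in U : \Pr\nolimits^*(U - X \mid D^-(x)) \ge \alpha\} ,
\end{aligned}
$$

where parameter $\alpha \in [0.5, 1]$ controls the allowed amount of inconsistency.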

Solving isotonic regression requires $O(|U|^4)$ time, but a good heuristic needs only $O(|U|^2)$.
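
For intuition, isotonic regression is easy to write down in the special case of a single criterion, where the dominance relation is a total order and the pool-adjacent-violators algorithm applies. The sketch below covers only this special case, which is an assumption made for brevity: the general partial order induced by several attributes needs a different solver. It computes only the upward estimate, and all function names are hypothetical.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit."""
    blocks = []  # each block is [mean, weight, number of pooled inputs]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool backwards while two neighbouring blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fit = []
    for mean, _, n in blocks:
        fit.extend([mean] * n)
    return fit

def isotonic_probabilities(scores, labels):
    """Estimate of Pr*(X | D^+(x)) for one gain-type criterion.

    scores[i] is the evaluation of object i and labels[i] is y(x_i, X).
    Objects with equal scores dominate each other and share one estimate.
    """
    levels = sorted(set(scores))
    sums = {s: 0 for s in levels}
    counts = {s: 0 for s in levels}
    for s, y in zip(scores, labels):
        sums[s] += y
        counts[s] += 1
    means = [sums[s] / counts[s] for s in levels]
    fit = pava(means, [counts[s] for s in levels])
    by_level = dict(zip(levels, fit))
    return [by_level[s] for s in scores]

def lower_approximation(scores, labels, alpha):
    """Indices of objects whose estimated probability is at least alpha."""
    probs = isotonic_probabilities(scores, labels)
    return {i for i, p in enumerate(probs) if p >= alpha}
```

For the scores `[1, 2, 3, 4]` with labels `[0, 1, 0, 1]`, the second and third objects are inconsistent with the dominance principle and are pooled to the estimate 0.5, so only the last object enters the lower approximation for α = 0.7.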

In fact, as shown in [24.25] and recalled in Sect. 22.3.3 in Chap. 22, we do not really need to know the probability estimates to obtain stochastic lower approximations. We only need to know for which objects $x \in U$, $\Pr^*(X \mid D^+(x)) \ge \alpha$ and for which $x \in U$, $\Pr^*(U - X \mid D^-(x)) \ge \alpha$ (i.e., $\Pr^*(X \mid D^-(x)) \le 1 - \alpha$). This can be found by solving a linear programming (reassignment) problem.

As before, $y(x, X) = 1$ if $x \in X$, otherwise $y(x, X) = 0$. Let $d(x, X)$ be the decision variable which determines a new class assignment for object $x$. Then, reassign objects to $X$ if $d^*(x, X) = 1$, and to $U - X$ if $d^*(x, X) = 0$, such that the new class assignments are consistent with the dominance principle, where $d^*(x, X)$ results from solving the following linear programming problem

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{x \in U} w_{y(x,X)} \, |y(x, X) - d(x, X)| \\
\text{subject to}\quad & d(x, X) \ge d(z, X) \ \text{if } x \succeq z , \ \text{for all } x, z \in U ,
\end{aligned}
$$

where $w_1$ and $w_0$ are arbitrary positive weights.


Due to unimodularity of the constraint matrix, the optimal solution of this linear programming problem is always integer, i.e., $d^*(x, X) \in \{0, 1\}$. For all objects consistent with the dominance principle, $d^*(x, X) = y(x, X)$. If we set $w_0 = \alpha$ and $w_1 = 1 - \alpha$, then the optimal solution $d^*(x, X)$ satisfies: $d^*(x, X) = 1 \Leftrightarrow \Pr^*(X \mid D^+(x)) \ge \alpha$. If we set $w_0 = 1 - \alpha$ and $w_1 = \alpha$, then the optimal solution $d^*(x, X)$ satisfies: $d^*(x, X) = 0 \Leftrightarrow \Pr^*(X \mid D^-(x)) \le 1 - \alpha$.

Solving the reassignment problem twice, we can obtain the lower approximations $\underline{P}^{\alpha}(X)$, $\underline{P}^{\alpha}(U - X)$, without knowing the probability estimates.
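
Because the optimum is always attained at a 0/1 assignment, a very small instance of the reassignment problem can be illustrated by plain enumeration instead of a linear programming solver. The sketch below is exponential in |U| and is meant only to make the objective and the constraints concrete. The function names and the data are hypothetical.

```python
from itertools import product

def dominates(u, v):
    """u dominates v when u is at least as good on every attribute."""
    return all(a >= b for a, b in zip(u, v))

def reassign(objects, labels, w0, w1):
    """Brute-force optimum of the reassignment problem (tiny |U| only).

    objects: list of attribute tuples; labels: list of y(x, X) in {0, 1}.
    w0 / w1 weigh the relabelling of objects with y = 0 / y = 1.
    """
    n = len(objects)
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and dominates(objects[i], objects[j])]
    best, best_cost = None, float("inf")
    for d in product((0, 1), repeat=n):
        if any(d[i] < d[j] for i, j in pairs):
            continue  # violates d(x) >= d(z) whenever x dominates z
        cost = sum((w1 if y else w0) * abs(y - di)
                   for y, di in zip(labels, d))
        if cost < best_cost:
            best, best_cost = d, cost
    return best
```

Take the objects `[(1, 1), (2, 2), (3, 3), (4, 4), (0, 5)]` with labels `[0, 1, 0, 1, 1]` and α = 0.7. The call `reassign(objects, labels, 0.7, 0.3)` keeps the value 1 only for the last two objects, which therefore form the lower approximation of X. The call `reassign(objects, labels, 0.3, 0.7)` assigns 0 only to the first object, which forms the lower approximation of U − X. The second and third objects violate the dominance principle and belong to neither.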


24.9.3 Three-Way Decisions: Interpreting the Three Regions in the Case of Dominance-Based Rough Sets


In this section, we present an interpretation of the three regions of dominance-based rough sets within the framework of three-way decisions.

In an information table, with respect to a subset of attributes $A \subseteq AT$, an object $x$ induces the logic formulae

$$\bigwedge_{a \in A} I_a(x) \succeq_a v_a , \tag{24.84}$$

$$\bigwedge_{a \in A} v_a \succeq_a I_a(x) , \tag{24.85}$$

where $I_a(x), v_a \in V_a$ and

• The atomic formula $v_a \succeq_a I_a(x)$ indicates that object $x$, taking value $I_a(x)$ on attribute $a$, possesses a property related to $a$ not more than any object $y$ taking value $I_a(y) = v_a$ on attribute $a$.

• The atomic formula $I_a(x) \succeq_a v_a$ indicates that object $x$, taking value $I_a(x)$ on attribute $a$, possesses a property related to $a$ not less than any object $y$ taking value $I_a(y) = v_a$ on attribute $a$.


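Thus, an object $y$ satisfies the formula $\bigwedge_{a \in A} I_a(x) \succeq_a v_a$ if $I_a(y) \succeq_a v_a$ for all $a \in A$, that is,

$$y \models \bigwedge_{a \in A} I_a(x) \succeq_a v_a \iff \forall a \in A ,\ \big(I_a(y) \succeq_a v_a\big) . \tag{24.86}$$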


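Analogously, an object $y$ satisfies the formula $\bigwedge_{a \in A} v_a \succeq_a I_a(x)$ if $v_a \succeq_a I_a(y)$ for all $a \in A$, that is,

$$y \models \bigwedge_{a \in A} v_a \succeq_a I_a(x) \iff \forall a \in A ,\ \big(v_a \succeq_a I_a(y)\big) . \tag{24.87}$$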
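
The two satisfaction conditions amount to attribute-wise comparisons. A minimal sketch follows, assuming numeric attribute values for which the preference on each attribute is the usual ordering of numbers. The function names are hypothetical.

```python
def satisfies_at_least(values_y, v):
    """y satisfies the upward formula: I_a(y) is at least v_a for every a."""
    return all(iy >= va for iy, va in zip(values_y, v))

def satisfies_at_most(values_y, v):
    """y satisfies the downward formula: v_a is at least I_a(y) for every a."""
    return all(va >= iy for iy, va in zip(values_y, v))
```

For instance, an object with values `(3, 2)` satisfies the upward formula for `v = (2, 2)`, whereas an object with values `(1, 3)` does not, because it falls below `v` on the first attribute.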


With these notations, we are ready to interpret upward 
and downward dominance-based rough set three re- 
gions. 

From the upward and downward three regions, we 
can construct three classes of rules for classifying an 
object, called the upward and downward positive, neg- 
ative, and boundary rules. 

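They are expressed in the following forms: for $y \in U$,

• Positive rule induced by an upward cone $D^+(x) \subseteq \mathrm{POS}^+_{(\alpha^+,\beta^+)}(X)$:

  if $y \models \bigwedge_{a \in A} I_a(x) \succeq_a v_a$, accept $y \in X$,

• Negative rule induced by the complement of an upward cone $U - D^+(x) \subseteq \mathrm{NEG}^+_{(\alpha^+,\beta^+)}(X)$:

  if $y \models \neg \bigwedge_{a \in A} I_a(x) \succeq_a v_a$, reject $y \in X$,

• Boundary rule induced by an upward cone $D^+(x)$ and its complement $U - D^+(x)$ such that $D^+(x) \not\subseteq \mathrm{POS}^+_{(\alpha^+,\beta^+)}(X)$ and $(U - D^+(x)) \not\subseteq \mathrm{NEG}^+_{(\alpha^+,\beta^+)}(X)$:

  if $y \models \bigwedge_{a \in A} I_a(x) \succeq_a v_a \wedge \neg \bigwedge_{a \in A} I_a(x) \succeq_a v_a$, neither accept nor reject $y \in X$,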


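• Positive rule induced by a downward cone $D^-(x) \subseteq \mathrm{POS}^-_{(\alpha^-,\beta^-)}(X)$:

  if $y \models \bigwedge_{a \in A} v_a \succeq_a I_a(x)$, accept $y \in X$,

• Negative rule induced by the complement of a downward cone $U - D^-(x) \subseteq \mathrm{NEG}^-_{(\alpha^-,\beta^-)}(X)$:

  if $y \models \neg \bigwedge_{a \in A} v_a \succeq_a I_a(x)$, reject $y \in X$,

• Boundary rule induced by a downward cone $D^-(x)$ and its complement $U - D^-(x)$ such that $D^-(x) \not\subseteq \mathrm{POS}^-_{(\alpha^-,\beta^-)}(X)$ and $(U - D^-(x)) \not\subseteq \mathrm{NEG}^-_{(\alpha^-,\beta^-)}(X)$:

  if $y \models \bigwedge_{a \in A} v_a \succeq_a I_a(x) \wedge \neg \bigwedge_{a \in A} v_a \succeq_a I_a(x)$, neither accept nor reject $y \in X$.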


The three types of rules have semantic interpretations analogous to those induced by probabilistic rough sets based on indiscernibility presented in Sect. 24.5.3. Let us consider the rules related to $\mathrm{POS}^+$ and $\mathrm{NEG}^+$. A positive rule allows us to accept an object $y$ as a member of $X$, because $y$ has a higher probability of being in $X$ due to the facts that $y \in D^+(x)$ and $\Pr(X \mid D^+(x)) \ge \alpha^+$. A negative rule enables us to reject an object $y$ as a member of $X$, because $y$ has a lower probability of being in $X$ due to the facts that $y \in D^+(x)$ and $\Pr(X \mid D^+(x)) \le \beta^+$. When the probability of $y$ being in $X$ is neither high nor low, a boundary rule makes a noncommitment decision.

The error rate of a positive rule is given by $1 - \Pr(X \mid D^+(x))$, which, by definition of the three regions, is at or below $1 - \alpha^+$. The error rate of a negative rule is given by $\Pr(X \mid D^+(x))$ and is at or below $\beta^+$. The cost of a positive rule is $\lambda_{PP}^+ \Pr(X \mid D^+(x)) + \lambda_{PN}^+ (1 - \Pr(X \mid D^+(x)))$ and is bounded above by $\alpha^+ \lambda_{PP}^+ + (1 - \alpha^+) \lambda_{PN}^+$. The cost of a negative rule is $\lambda_{NP}^+ \Pr(X \mid D^+(x)) + \lambda_{NN}^+ (1 - \Pr(X \mid D^+(x)))$ and is bounded above by $\beta^+ \lambda_{NP}^+ + (1 - \beta^+) \lambda_{NN}^+$.
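
The bound on the cost of a positive rule can be checked in one line. The shorthand $p = \Pr(X \mid D^+(x))$ is introduced here for readability, and the derivation additionally assumes $\lambda_{PP}^+ \le \lambda_{PN}^+$, i.e., that a correct acceptance costs no more than a wrong one. This assumption is made here explicitly for the sketch and is not spelled out above.

```latex
\begin{align*}
\lambda_{PP}^+ p + \lambda_{PN}^+ (1 - p)
  &= \lambda_{PN}^+ - (\lambda_{PN}^+ - \lambda_{PP}^+)\, p \\
  &\le \lambda_{PN}^+ - (\lambda_{PN}^+ - \lambda_{PP}^+)\, \alpha^+
     && \text{since } p \ge \alpha^+ \text{ and } \lambda_{PN}^+ - \lambda_{PP}^+ \ge 0 \\
  &= \alpha^+ \lambda_{PP}^+ + (1 - \alpha^+)\, \lambda_{PN}^+ .
\end{align*}
```

The bound for a negative rule follows in the same way from $p \le \beta^+$, under the analogous assumption $\lambda_{NN}^+ \le \lambda_{NP}^+$.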




24.10 Conclusions 


A basic probabilistic rough set model is formulated 
by using a pair of thresholds on conditional probabil- 
ities, which leads to flexibility and robustness when 
performing classification or decision-making tasks. 
Three theories are the supporting pillars of proba- 
bilistic rough sets. Bayesian decision theory enables 
us to determine and interpret the required thresh- 
olds by using more operable notions such as loss, 
cost, risk, etc. Bayesian inference enables us to estimate
the conditional probability accurately. A theory
of three-way decisions allows us to make a wise de- 
cision in the presence of incomplete or insufficient 
information. 


Other probabilistic rough set models have also been described. We have shown how a probabilistic approach can be applied when the information is related to some order. The order concerns the degrees to which an object has some properties related to the considered attributes. This kind of order can be handled by the well-known rough set extension called the dominance-based rough set approach.

One may expect a continuous growth of interest in probabilistic approaches to rough sets. An important task is to examine fully, in the light of the three fundamental issues concerning the basic model, the semantics of each model, in order to identify its limitations and appropriate areas of application.


References


24.1 Z. Pawlak: Rough set, Int. J. Inf. Comput. Sci. 11, 341-356 (1982)

24.2 Z. Pawlak: Rough Sets: Theoretical Aspects of Reasoning About Data (Kluwer, Dordrecht 1991)

24.3 W. Marek, Z. Pawlak: Information storage and retrieval systems: mathematical foundations, Theor. Comput. Sci. 1, 331-354 (1976)

24.4 Y.Y. Yao: A note on definability and approximations. In: Transactions on Rough Sets VII, Lecture Notes in Computer Science, Vol. 4400, ed. by J.F. Peters, A. Skowron, V.W. Marek, E. Orlowska, R. Słowiński, W. Ziarko (Springer, Heidelberg 2007) pp. 274-282

24.5 Y.Y. Yao: Probabilistic approaches to rough sets, Expert Syst. 20, 287-297 (2003)

24.6 Y.Y. Yao: Probabilistic rough set approximations, Int. J. Approx. Reason. 49, 255-271 (2008)

24.7 Z. Pawlak, S.K.M. Wong, W. Ziarko: Rough sets: Probabilistic versus deterministic approach, Int. J. Man-Mach. Stud. 29, 81-95 (1988)

24.8 S.K.M. Wong, W. Ziarko: A probabilistic model of approximate classification and decision rules with uncertainty in inductive learning, Technical Report CS-85-23 (Department of Computer Science, University of Regina 1985)

24.9 S.K.M. Wong, W. Ziarko: INFER - an adaptive decision support system based on the probabilistic approximate classifications, Proc. 6th Int. Workshop on Expert Syst. Their Appl., Vol. 1 (1986) pp. 713-726

24.10 S.K.M. Wong, W. Ziarko: Comparison of the probabilistic approximate classification and the fuzzy set model, Fuzzy Sets Syst. 21(3), 357-362 (1987)

24.11 Y.Y. Yao, S.K.M. Wong: A decision theoretic framework for approximating concepts, Int. J. Man-Mach. Stud. 37, 793-809 (1992)

24.12 Y.Y. Yao, S.K.M. Wong, P. Lingras: A decision-theoretic rough set model. In: Methodologies for Intelligent Systems, Vol. 5, ed. by Z.W. Ras, M. Zemankova, M.L. Emrich (North-Holland, New York 1990) pp. 17-24

24.13 J.D. Katzberg, W. Ziarko: Variable precision rough sets with asymmetric bounds. In: Rough Sets, Fuzzy Sets and Knowledge Discovery, ed. by W. Ziarko (Springer, Heidelberg 1994) pp. 167-177

24.14 W. Ziarko: Variable precision rough set model, J. Comput. Syst. Sci. 46, 39-59 (1993)

24.15 D. Ślęzak, W. Ziarko: Bayesian rough set model, Proc. Found. Data Min. (FDM 2002) (2002) pp. 131-135

24.16 D. Ślęzak, W. Ziarko: Variable precision Bayesian rough set model, Rough Sets, Fuzzy Sets, Data Mining and Granular Comput. (RSFGrC 2013), Lect. Notes Comput. Sci. (Lect. Notes Artif. Intel.), Vol. 2639, ed. by G.Y. Wang, Q. Liu, Y.Y. Yao, A. Skowron (Springer, Heidelberg 2003) pp. 312-315

24.17 D. Ślęzak, W. Ziarko: The investigation of the Bayesian rough set model, Int. J. Approx. Reason. 40, 81-91 (2005)

24.18 H.Y. Zhang, J. Zhou, D.Q. Miao, C. Gao: Bayesian rough set model: a further investigation, Int. J. Approx. Reason. 53, 541-557 (2012)

24.19 S. Greco, B. Matarazzo, R. Słowiński: Rough membership and Bayesian confirmation measures for parameterized rough sets. In: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, Lecture Notes in Computer Science, Vol. 3641, ed. by D. Ślęzak, G.Y. Wang, M. Szczuka, I. Duntsch, Y.Y. Yao (Springer, Heidelberg 2005) pp. 314-324

24.20 S. Greco, B. Matarazzo, R. Słowiński: Parameterized rough set model using rough membership and Bayesian confirmation measures, Int. J. Approx. Reason. 49, 285-300 (2008)

24.21 N. Azam, J.T. Yao: Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets, Int. J. Approx. Reason. 55, 142-155 (2014)


24.22 J.P. Herbert, J.T. Yao: Game-theoretic rough sets, Fundam. Inf. 108, 267-286 (2011)

24.23 S. Greco, B. Matarazzo, R. Słowiński, J. Stefanowski: Variable consistency model of dominance-based rough set approach. In: Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, Vol. 2005, ed. by W. Ziarko, Y.Y. Yao (Springer, Heidelberg 2001) pp. 170-181

24.24 J. Błaszczyński, S. Greco, R. Słowiński, M. Szelag: Monotonic variable consistency rough set approaches, Int. J. Approx. Reason. 50, 979-999 (2009)

24.25 W. Kotłowski, K. Dembczyński, S. Greco, R. Słowiński: Stochastic dominance-based rough set model for ordinal classification, Inf. Sci. 178, 4019-4037 (2008)

24.26 B. Zhou, Y.Y. Yao: Feature selection based on confirmation-theoretic rough sets. In: Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, Vol. 8536, ed. by C. Cornelis, M. Kryszkiewicz, D. Ślęzak, E.M. Ruiz, R. Bello, L. Shang (Springer, Heidelberg 2014) pp. 181-188

24.27 X.F. Deng, Y.Y. Yao: An information-theoretic interpretation of thresholds in probabilistic rough sets. In: Rough Sets and Knowledge Technology, Lecture Notes in Computer Science, Vol. 7414, ed. by T.R. Li, H.S. Nguyen, G.Y. Wang, J. Grzymala-Busse, R. Janicki (Springer, Heidelberg 2012) pp. 369-378

24.28 B. Zhou, Y.Y. Yao: Comparison of two models of probabilistic rough sets. In: Rough Sets and Knowledge Technology, Lecture Notes in Computer Science, Vol. 8171, ed. by P. Lingras, M. Wolski, C. Cornelis, S. Mitra, P. Wasilewski (Springer, Heidelberg 2013) pp. 121-132

24.29 J.W. Grzymala-Busse: Generalized parameterized approximations. In: Rough Sets and Knowledge Technology, Lecture Notes in Computer Science, Vol. 6954, ed. by J.T. Yao, S. Ramanna, G.Y. Wang, Z. Suraj (Springer, Heidelberg 2011) pp. 36-145

24.30 J.W. Grzymala-Busse: Generalized probabilistic approximations. In: Transactions on Rough Sets, Lecture Notes in Computer Science, Vol. 7736, ed. by J.F. Peters, A. Skowron, S. Ramanna, Z. Suraj, X. Wang (Springer, Heidelberg 2013) pp. 1-16

24.31 S. Greco, B. Matarazzo, R. Słowiński: Rough sets theory for multicriteria decision analysis, Eur. J. Oper. Res. 129, 1-47 (2001)

24.32 S. Greco, B. Matarazzo, R. Słowiński: Decision rule approach. In: Multiple Criteria Decision Analysis: State of the Art Surveys, ed. by J.R. Figueira, S. Greco, M. Ehrgott (Springer, Berlin 2005) pp. 507-562

24.33 R. Słowiński, S. Greco, B. Matarazzo: Rough sets in decision making. In: Encyclopedia of Complexity and Systems Science, ed. by R.A. Meyers (Springer, New York 2009) pp. 7753-7786

24.34 R. Słowiński, S. Greco, B. Matarazzo: Rough set and rule-based multicriteria decision aiding, Pesqui. Oper. 32, 213-269 (2012)


24.35 Y.Y. Yao: Relational interpretations of neighborhood operators and rough set approximation operators, Inf. Sci. 111, 239-259 (1998)

24.36 Y.Y. Yao: Information granulation and rough set approximation, Int. J. Intell. Syst. 16, 87-104 (2001)

24.37 Y.Y. Yao, Y.H. Chen: Subsystem based generalizations of rough set approximations. In: Foundations of Intelligent Systems, Lecture Notes in Computer Science, Vol. 3488, ed. by M.S. Hacid, N.V. Murray, Z.W. Ras, S. Tsumoto (Springer, Heidelberg 2005) pp. 210-218

24.38 Y.Y. Yao, X.F. Deng: Quantitative rough sets based on subsethood measures, Inf. Sci. 267, 702-715 (2014)

24.39 H.X. Li, X.Z. Zhou, T.R. Li, G.Y. Wang, D.Q. Miao, Y.Y. Yao: Decision-Theoretic Rough Set Theory and Recent Progress (Science Press, Beijing 2011)

24.40 H. Yu, G.Z. Liu, Y.G. Wang: An automatic method to determine the number of clusters using decision-theoretic rough set, Int. J. Approx. Reason. 55, 101-115 (2014)

24.41 F. Li, M. Ye, D.X. Chen: An extension to rough c-means clustering based on decision-theoretic rough sets model, Int. J. Approx. Reason. 55, 116-129 (2014)

24.42 J. Li, T.P.X. Yang: An axiomatic characterization of probabilistic rough sets, Int. J. Approx. Reason. 55, 130-141 (2014)

24.43 X.Y. Jia, Z.M. Tang, W.H. Liao, L. Shang: On an optimization representation of decision-theoretic rough set model, Int. J. Approx. Reason. 55, 156-166 (2014)

24.44 F. Min, Q.H. Hu, W. Zhu: Feature selection with test cost constraint, Int. J. Approx. Reason. 55, 167-179 (2014)

24.45 J.W. Grzymala-Busse, G.P. Clark, M. Kuehnhausen: Generalized probabilistic approximations of incomplete data, Int. J. Approx. Reason. 55, 180-196 (2014)

24.46 D. Liu, T.R. Li, D.C. Liang: Incorporating logistic regression to decision-theoretic rough sets for classifications, Int. J. Approx. Reason. 55, 197-210 (2014)

24.47 B. Zhou: Multi-class decision-theoretic rough sets, Int. J. Approx. Reason. 55, 211-224 (2014)

24.48 H.Y. Qian, H. Zhang, L.Y. Sang, Y.J. Liang: Multigranulation decision-theoretic rough sets, Int. J. Approx. Reason. 55, 225-237 (2014)

24.49 P. Lingras, M. Chen, Q.D. Miao: Qualitative and quantitative combinations of crisp and rough clustering schemes using dominance relations, Int. J. Approx. Reason. 55, 238-258 (2014)

24.50 W.M. Shao, Y. Leung, Z.W. Wu: Rule acquisition and complexity reduction in formal decision contexts, Int. J. Approx. Reason. 55, 259-274 (2014)

24.51 J.T. Yao, X.X. Li, G. Peters: Decision-theoretic rough sets and beyond, Int. J. Approx. Reason. 55, 9-100 (2014)

24.52 X.Y. Zhang, D.Q. Miao: Two basic double-quantitative rough set models of precision and grade and their investigation using granular computing, Int. J. Approx. Reason. 54, 1130-1148 (2013)

24.53 W. Ziarko: Probabilistic approach to rough sets, Int. J. Approx. Reason. 49, 272-284 (2008)

24.54 B. Fitelson: Studies in Bayesian Confirmation Theory, Ph.D. Thesis (University of Wisconsin, Madison 2001)

24.55 R. Festa: Bayesian confirmation. In: Experience, Reality, and Scientific Explanation, ed. by M. Galavotti, A. Pagnini (Kluwer, Dordrecht 1999) pp. 55-87

24.56 S. Greco, Z. Pawlak, R. Słowiński: Can Bayesian confirmation measures be useful for rough set decision rules?, Eng. Appl. Artif. Intell. 17, 345-361 (2004)

24.57 S. Greco, R. Słowiński, I. Szczech: Properties of rule interestingness measures and alternative approaches to normalization of measures, Inf. Sci. 216, 1-16 (2012)

24.58 Y.Y. Yao: Two semantic issues in a probabilistic rough set model, Fundam. Inf. 108, 249-265 (2011)

24.59 Y.Y. Yao, B. Zhou: Naive Bayesian rough sets. In: Rough Sets and Knowledge Technology, Lecture Notes in Computer Science, Vol. 6401, ed. by J. Yu, S. Greco, P. Lingras, G.Y. Wang, A. Skowron (Springer, Heidelberg 2010) pp. 719-726

24.60 D.C. Liang, D. Liu, W. Pedrycz, P. Hu: Triangular fuzzy decision-theoretic rough sets, Int. J. Approx. Reason. 54, 1087-1106 (2013)

24.61 H.X. Li, X.Z. Zhou: Risk decision making based on decision-theoretic rough set: a three-way view decision model, Int. J. Comput. Intell. Syst. 4, 1-11 (2011)

24.62 D. Liu, T.R. Li, D. Ruan: Probabilistic model criteria with decision-theoretic rough sets, Inf. Sci. 181, 3709-3722 (2011)

24.63 K. Dembczyński, S. Greco, W. Kotłowski, R. Słowiński: Statistical model for rough set approach to multicriteria classification. In: Knowledge Discovery in Databases, Lecture Notes in Computer Science, Vol. 4702, ed. by J.N. Kok, J. Koronacki, R. de Lopez Mantaras, S. Matwin, D. Mladenic, A. Skowron (Springer, Heidelberg 2007) pp. 164-175

24.64 Y.Y. Yao: An outline of a theory of three-way decisions. In: Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, Vol. 7413, ed. by J.T. Yao, Y. Yang, R. Słowiński, S. Greco, H.X. Li, S. Mitra, L. Polkowski (Springer, Heidelberg 2012) pp. 1-17

24.65 Y.Y. Yao: Three-way decision: an interpretation of rules in rough set theory. In: Rough Sets and Knowledge Technology, Lecture Notes in Computer Science, Vol. 5589, ed. by P. Wen, Y.F. Li, L. Polkowski, Y.Y. Yao, S. Tsumoto, G.Y. Wang (Springer, Heidelberg 2009) pp. 642-649

24.66 Y.Y. Yao: Three-way decisions with probabilistic rough sets, Inf. Sci. 180, 341-353 (2010)

24.67 Y.Y. Yao: The superiority of three-way decisions in probabilistic rough set models, Inf. Sci. 181, 1080-1096 (2011)

24.68 X.Y. Jia, L. Shang, X.Z. Zhou, J.Y. Liang, D.Q. Miao, G.Y. Wang, T.R. Li, Y.P. Zhang: Theory of Three-Way Decisions and Application (Nanjing Univ. Press, Nanjing 2012)

24.69 D. Liu, T.R. Li, D.Q. Miao, G.Y. Wang, J.Y. Liang: Three-Way Decisions and Granular Computing (Science Press, Beijing 2013)

24.70 J.R. Figueira, S. Greco, M. Ehrgott: Multiple Criteria Decision Analysis: State of the Art Surveys (Springer, Berlin 2005)

24.71 S. Greco, B. Matarazzo, R. Słowiński: A new rough set approach to evaluation of bankruptcy risk. In: Rough Fuzzy and Fuzzy Rough Sets, ed. by C. Zopounidis (Kluwer, Dordrecht 1998) pp. 121-136

24.72 S. Greco, B. Matarazzo, R. Słowiński: The use of rough sets and fuzzy sets in MCDM. In: Multicriteria Decision Making, Int. Ser. Oper. Res. Manage. Sci., Vol. 21, ed. by T. Gal, T. Stewart, T. Hanne (Kluwer, Dordrecht 1999) pp. 397-455

24.73 S. Greco, B. Matarazzo, R. Słowiński: Extension of the rough set approach to multicriteria decision support, INFOR 38, 161-196 (2000)

24.74 S. Greco, B. Matarazzo, R. Słowiński: Rough sets methodology for sorting problems in presence of multiple attributes and criteria, Eur. J. Oper. Res. 138, 247-259 (2002)

24.75 S. Greco, R. Słowiński, Y. Yao: Bayesian decision theory for dominance-based rough set approach. In: Rough Sets and Knowledge Technology, Lecture Notes in Computer Science, Vol. 4481, ed. by J.T. Yao, P. Lingras, W.Z. Wu, M. Szczuka, N. Cercone (Springer, Heidelberg 2007) pp. 134-141


25. Generalized Rough Sets

JingTao Yao, Davide Ciucci, Yan Zhang


This chapter reviews three formulations of rough set theory, i.e., element-based definition, granule-based definition, and subsystem-based definition. These formulations are adopted to generalize rough sets from three directions. The first direction is to use an arbitrary binary relation to generalize the equivalence relation in the element-based definition. The second is to use a covering to generalize the partition in the granule-based definition, and the third to use a subsystem to generalize the Boolean algebra in the subsystem-based definition. In addition, we provide some insights into the theoretical aspects of these generalizations, mainly with respect to relations with non-classical logic and topology theory.


25.1 Definition and Approximations of the Models
  25.1.1 A Framework for Generalizing Rough Sets
  25.1.2 Binary Relation-Based Rough Sets
  25.1.3 Covering-Based Rough Sets
  25.1.4 Subsystem-Based Rough Sets
25.2 Theoretical Approaches
  25.2.1 Logical Setting
  25.2.2 Topology
25.3 Conclusion
References


In the Pawlak rough set model, the relationships of objects are defined by equivalence relations [25.1, 2]. In addition, we may obtain two other equivalent structures: the partition, induced by the equivalence relations, and an atomic Boolean algebra, formed by the equivalence classes as its set of atoms [25.2, 3]. In other words, we have three equivalent formulations of rough sets, namely, the equivalence relation-based formulation, the partition-based formulation, and the Boolean algebra-based formulation [25.4]. The approximation operators $\underline{apr}$ and $\overline{apr}$ are defined by an equivalence relation $E$, a partition $U/E$, and Boolean algebra $B(U/E)$, respectively [25.3, 5]. Although mathematically equivalent, these three formulations give different insights into the theory. More interestingly, when rough sets are generalized, the three formulations are no longer equivalent and thus give new directions for the exploration of rough sets.

This chapter aims to explore these different gener- 
alizations. The discussion is organized in two parts. In 
the first part, we review and summarize relation-based, 
covering-based, and subsystem-based rough sets, based 
on several articles by Yao [25.4, 6, 7]. In the second part, 
we will give some insight into the theoretical aspects of 
these generalizations, mainly with respect to relations 
with nonclassical logic (modal and many-valued) and 
topology theory. Note that this second part
partially overlaps with the first one; however, the scopes
are different. Indeed, whereas the first part explains the
models and their genesis, the second one is only devoted 
to some theoretical aspects. As such, the second part 
can be skipped by readers who are not so interested in 
fine details but may still have a clear view of the whole 
landscape of these kinds of generalized rough sets. 




414 Part C | Rough Sets 




25.1 Definition and Approximations of the Models 


In this section we discuss three equivalent formulations, 
namely, the equivalence relation-based formulation, the 
partition-based formulation, and the Boolean algebra- 
based formulation. 


25.1.1 A Framework 
for Generalizing Rough Sets 


For a systematic study on the generalization of Pawlak 
rough sets, Yao provided a framework for classifying
commonly used definitions of rough set approxima- 
tions into three types: the element-based definition, the 
granule-based definition, and the subsystem-based def- 
inition [25.5]. He argued that these types offer three 
directions for generalizing rough set models. We adapt 
this framework in the following discussion. 

Suppose the universe $U$ is a finite and nonempty set
and let $E \subseteq U \times U$ be an equivalence relation on $U$. The
equivalence class containing $x$ is denoted as

$$[x]_E = \{y \mid y \in U,\ xEy\}.$$

The family of all equivalence classes is known as the quotient
set, denoted by

$$U/E = \{[x]_E \mid x \in U\}.$$

$U/E$ defines a partition of $U$. The family of all definable
sets forms $B(U/E)$; it can be obtained from $U/E$ by adding
the empty set $\emptyset$ and closing the result under set union.
The family of all definable sets is a subsystem of $2^U$, that is,
$B(U/E) \subseteq 2^U$ [25.1]. The standard rough set theory deals with the
approximation of any subset of $U$ in terms of definable
subsets in $B(U/E)$. From different representations of an
equivalence relation, three definitions of Pawlak rough
set approximations can be obtained as follows:


Fig. 25.1 Different formulations of approximation operators
[Figure: the Pawlak rough set model (left) and the generalized
rough set model (right). Element-based definition: the equivalence
relation $E$ generalizes to any binary relation $R \subseteq U \times U$.
Granule-based definition: the partition $U/E$ generalizes to a
covering $C$. Subsystem-based definition: the Boolean algebra
$B(U/E)$ generalizes to any subsystem.]


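• Element-based definitions [25.3, 5]

$$\begin{aligned}
\underline{apr}(A) &= \{x \mid x \in U,\ [x]_E \subseteq A\} \\
                   &= \{x \mid x \in U,\ \forall y \in U\,[xEy \Rightarrow y \in A]\}, \\
\overline{apr}(A)  &= \{x \mid x \in U,\ [x]_E \cap A \neq \emptyset\} \\
                   &= \{x \mid x \in U,\ \exists y \in U\,[xEy \wedge y \in A]\}.
\end{aligned} \tag{25.1}$$

• Granule-based definitions [25.3, 5, 8]

$$\begin{aligned}
\underline{apr}(A) &= \bigcup\{[x]_E \mid [x]_E \in U/E,\ [x]_E \subseteq A\} \\
                   &= \bigcup\{X \mid X \in U/E,\ X \subseteq A\}, \\
\overline{apr}(A)  &= \bigcup\{[x]_E \mid [x]_E \in U/E,\ [x]_E \cap A \neq \emptyset\} \\
                   &= \bigcup\{X \mid X \in U/E,\ X \cap A \neq \emptyset\}.
\end{aligned} \tag{25.2}$$

• Subsystem-based definition [25.3, 5, 8]

$$\begin{aligned}
\underline{apr}(A) &= \bigcup\{X \mid X \in B(U/E),\ X \subseteq A\}, \\
\overline{apr}(A)  &= \bigcap\{X \mid X \in B(U/E),\ A \subseteq X\}.
\end{aligned} \tag{25.3}$$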


The three equivalent definitions offer different
interpretations of rough set approximations [25.5].
According to the element-based definition, an element $x$
is in the lower approximation $\underline{apr}(A)$ of a set $A$ if all
of its equivalent elements are in $A$; the element is in
the upper approximation $\overline{apr}(A)$ if at least one of its
equivalent elements is in $A$ [25.5]. According to the
granule-based definition, $\underline{apr}(A)$ is the union of
equivalence classes that are subsets of $A$; $\overline{apr}(A)$ is the
union of equivalence classes that have a nonempty intersection
with $A$ [25.5]. According to the subsystem-based definition,
$\underline{apr}(A)$ is the largest definable set in the subsystem
$B(U/E)$ that is contained in $A$; $\overline{apr}(A)$ is the smallest
definable set in the subsystem $B(U/E)$ that contains
$A$ [25.5].
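
The equivalence of the three definitions can be checked on a small example. The following Python sketch is only an illustration: the universe, the partition, and all function names are hypothetical choices, not taken from the cited works. It computes the lower and upper approximations by the element-based, granule-based, and subsystem-based definitions.

```python
from itertools import combinations

# Hypothetical universe and partition U/E (illustration only).
U = {1, 2, 3, 4, 5, 6}
partition = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]


def eq_class(x):
    """Return [x]_E, the block of the partition that contains x."""
    return next(b for b in partition if x in b)


def element_based(A):
    """(25.1): test each element through its equivalence class."""
    lower = {x for x in U if eq_class(x) <= A}
    upper = {x for x in U if eq_class(x) & A}
    return lower, upper


def granule_based(A):
    """(25.2): take unions of whole equivalence classes."""
    lower = set().union(*[b for b in partition if b <= A])
    upper = set().union(*[b for b in partition if b & A])
    return lower, upper


def definable_sets():
    """B(U/E): the empty set plus every union of equivalence classes."""
    sets = []
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            sets.append(frozenset().union(*blocks))
    return sets


def subsystem_based(A):
    """(25.3): largest definable subset and smallest definable superset."""
    B = definable_sets()
    lower = set().union(*[X for X in B if X <= A])
    upper = set(U)
    for X in B:
        if A <= X:
            upper &= X
    return lower, upper


A = {1, 2, 3, 4}
print(element_based(A))
print(granule_based(A))
print(subsystem_based(A))
```

For this choice of $A$, all three functions return the same pair of sets, as the equivalence of (25.1), (25.2), and (25.3) predicts.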

Figure 25.1, adapted from Yao and Yao [25.4], 
shows three directions in generalized rough set mod- 
els. In the Pawlak model, the definitions of approx- 
imation operators based on the equivalence relation, 
partition and Boolean algebra B(U/E) are equivalent. 
The symbol $\Leftrightarrow$ is used to show a one-to-one two-way
construction process. However, the generalized defini- 
tions of approximation operators using arbitrary binary 
relations, coverings, and subsystems are not equivalent. 




In other words, the corresponding subsystem or cov- 
ering may not be found based on an arbitrary binary 
relation. With each formulation, various definitions of 
approximation operators can be examined. One may 
consider an arbitrary binary relation in generalizing 
the equivalence relation in the element-based defini- 
tion, a covering in generalizing the partition in the 
granule-based definition, and other subsystems in gen- 
eralizing the Boolean algebra in the subsystem-based 
definition [25.4, 5]. 


25.1.2 Binary Relation-Based Rough Sets 


In the development of the theory of rough sets, ap- 
proximation operators are typically defined by using 
equivalence relations which are reflexive, symmetric, 
and transitive [25.2]. The Pawlak rough set model can 
be extended by using any arbitrary binary relation to re- 
place the equivalence relation. Wybraniec-Skardowska 
introduced different rough set models based on vari- 
ous types of binary relations [25.9]. Pawlak pointed 
out that any type of relations may be assumed on 
the universe for the development of a rough set
theory [25.10]. Yao et al. extended conventional rough
set models by considering various types of relations,
drawing on results from modal logics [25.11]. Similarly
to defining different types of modal logic systems,
different rough set models were defined by using
classes of binary relations satisfying various sets
of properties, formed by serial, reflexive, symmetric,
transitive, and Euclidean relations, and their combinations.
Slowinski and Vanderpooten considered a special
case in which a reflexive (not necessarily symmetric
and transitive) similarity relation was used [25.12].
Greco et al. examined a fuzzy rough approximation
based on fuzzy similarity relations [25.13]. Guan and
Wang investigated the relationships among 12 differ- 
ent basic definitions of approximations and suggested 
the suitable generalized definitions of approximations 
for each class of generalized indiscernibility rela- 
tions [25.14]. 

A binary relation $R$ may be conveniently represented
by a mapping $n: U \to 2^U$, i.e., $n$ is a neighborhood
operator and $n(x)$ consists of all $R$-related
elements of $x$. In the element-based definition, the
equivalence class $[x]_E$ can be viewed as a neighborhood
of $x$ consisting of objects equivalent to $x$. In general,
one may consider any type of neighborhood of $x$,
consisting of objects related to $x$, to form more general
approximation operators. By extending (25.1), we can
define lower and upper approximation operators as
follows [25.15]

$$\begin{aligned}
\underline{apr}_n(A) &= \{x \mid x \in U,\ n(x) \subseteq A\} \\
                     &= \{x \mid x \in U,\ \forall y \in U\,(y \in n(x) \Rightarrow y \in A)\}, \\
\overline{apr}_n(A)  &= \{x \mid x \in U,\ n(x) \cap A \neq \emptyset\} \\
                     &= \{x \mid x \in U,\ \exists y \in U\,(y \in n(x) \wedge y \in A)\}.
\end{aligned} \tag{25.4}$$


The set $\underline{apr}_n(A)$ consists of elements whose $R$-related
elements are all in $A$, and $\overline{apr}_n(A)$ consists of elements
that have at least one $R$-related element in $A$. The pair
of lower and upper approximations
$(\underline{apr}_n(A), \overline{apr}_n(A))$ is a generalized rough set of $A$
induced by the binary relation $R$.

A neighborhood operator can be defined by using
a binary relation [25.6, 12, 16]. Suppose $R \subseteq U \times U$ is
a binary relation on the universe $U$. A successor
neighborhood operator $R_s : U \to 2^U$ can be defined as

$$xR_s = \{y \mid y \in U,\ xRy\}.$$

Conversely, a binary relation can be constructed from
its successor neighborhood as

$$xRy \iff y \in xR_s.$$


Generalized approximations by a neighborhood oper- 
ator can be equivalently formulated by using a binary 
relation [25.4]. This formulation connects generalized 
approximation operators with the necessity and possi- 
bility operators in modal logic [25.6]. There are many 
types of generalized approximation operators defined 
by neighborhood operators that are induced by a binary 
relation or a family of binary relations [25.3, 15-19]. 
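
As an illustration of (25.4) with the successor neighborhood as the neighborhood operator, the following Python sketch computes both approximations for an arbitrary binary relation. The universe, the relation, and the function names are hypothetical and chosen only to show that Pawlak-style inclusions can fail when the relation is not serial or reflexive.

```python
# Hypothetical universe and relation; R is neither serial nor reflexive.
U = {1, 2, 3, 4}
R = {(1, 2), (2, 2), (2, 3), (3, 4)}


def successors(x):
    """Successor neighborhood of x: all y with x R y."""
    return {y for (a, y) in R if a == x}


def lower(A):
    """Elements whose R-related elements are all in A."""
    return {x for x in U if successors(x) <= A}


def upper(A):
    """Elements with at least one R-related element in A."""
    return {x for x in U if successors(x) & A}


A = {2, 3}
print(lower(A))   # contains 4, which has no successors at all
print(upper(A))
# Duality between the two operators holds for any relation.
print(upper(A) == U - lower(U - A))
```

Here element 4 has an empty neighborhood, so it belongs to the lower approximation of every set but to no upper approximation; the lower approximation is therefore not contained in the upper one, while duality still holds.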

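For an arbitrary relation, generalized rough set
operators do not necessarily satisfy all the properties in
the Pawlak rough set model. Nevertheless, the following
properties hold in rough set models induced by any
binary relation [25.3, 6, 20]

(L1) $\underline{apr}(A) = (\overline{apr}(A^c))^c$,
(L2) $\underline{apr}(U) = U$,
(L3) $\underline{apr}(A \cap B) = \underline{apr}(A) \cap \underline{apr}(B)$,
(L4) $\underline{apr}(A \cup B) \supseteq \underline{apr}(A) \cup \underline{apr}(B)$,
(L5) $A \subseteq B \Rightarrow \underline{apr}(A) \subseteq \underline{apr}(B)$,
(K)  $\underline{apr}(A^c \cup B) \subseteq (\underline{apr}(A))^c \cup \underline{apr}(B)$,
(U1) $\overline{apr}(A) = (\underline{apr}(A^c))^c$,
(U2) $\overline{apr}(\emptyset) = \emptyset$,
(U3) $\overline{apr}(A \cup B) = \overline{apr}(A) \cup \overline{apr}(B)$,
(U4) $\overline{apr}(A \cap B) \subseteq \overline{apr}(A) \cap \overline{apr}(B)$,
(U5) $A \subseteq B \Rightarrow \overline{apr}(A) \subseteq \overline{apr}(B)$.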


A relation $R$ is a serial relation if for all $x \in U$ there
exists a $y \in U$ such that $xRy$; a relation is a reflexive
relation if for all $x \in U$ the relationship $xRx$ holds; a relation
is a symmetric relation if for all $x, y \in U$, $xRy$ implies $yRx$;
a relation is a transitive relation if for all
$x, y, z \in U$, $xRy$ and $yRz$ imply $xRz$; a relation
is Euclidean when for all $x, y, z \in U$, $xRy$ and $xRz$
imply $yRz$ [25.6, 15]. By using the mapping $n$, we can express
the conditions on a binary relation equivalently as
follows [25.6, 15, 21]:

Serial       for all $x \in U$: $n(x) \neq \emptyset$
Reflexive    for all $x \in U$: $x \in n(x)$
Symmetric    for all $x, y \in U$: $x \in n(y) \Rightarrow y \in n(x)$
Transitive   for all $x, y \in U$: $y \in n(x) \Rightarrow n(y) \subseteq n(x)$
Euclidean    for all $x, y \in U$: $y \in n(x) \Rightarrow n(x) \subseteq n(y)$.


Different binary relations have different properties. 
The five properties of a binary relation, namely, the 
serial, reflexive, symmetric, transitive, and Euclidean 
properties, induce five properties for the approximation 
operators [25.6, 20,21]. We use the same labeling sys- 
tem as in modal logic to label these properties [25.6]: 


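Serial       Property (D)  $\underline{apr}(A) \subseteq \overline{apr}(A)$ holds
Reflexive    Property (T)  $\underline{apr}(A) \subseteq A$ holds
Symmetric    Property (B)  $A \subseteq \underline{apr}(\overline{apr}(A))$ holds
Transitive   Property (4)  $\underline{apr}(A) \subseteq \underline{apr}(\underline{apr}(A))$ holds
Euclidean    Property (5)  $\overline{apr}(A) \subseteq \underline{apr}(\overline{apr}(A))$ holds.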
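
The correspondence between relation properties and operator properties can be explored by brute force on a small universe. The sketch below is an illustration under stated assumptions: the two relations and the helper names (`powerset`, `make_operators`, `satisfied_properties`) are hypothetical, and the check simply tests each of (D), (T), (B), (4), and (5) on every subset of the universe.

```python
from itertools import chain, combinations


def powerset(U):
    """All subsets of U as Python sets."""
    items = sorted(U)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def make_operators(U, R):
    """Lower and upper approximation induced by successor neighborhoods."""
    n = {x: {y for (a, y) in R if a == x} for x in U}
    lower = lambda A: {x for x in U if n[x] <= A}
    upper = lambda A: {x for x in U if n[x] & A}
    return lower, upper


def satisfied_properties(U, R):
    """Labels among D, T, B, 4, 5 that hold for every subset A of U."""
    lower, upper = make_operators(U, R)
    subsets = powerset(U)
    checks = {
        "D": lambda A: lower(A) <= upper(A),
        "T": lambda A: lower(A) <= A,
        "B": lambda A: A <= lower(upper(A)),
        "4": lambda A: lower(A) <= lower(lower(A)),
        "5": lambda A: upper(A) <= lower(upper(A)),
    }
    return {name for name, ok in checks.items()
            if all(ok(A) for A in subsets)}


U = {1, 2, 3}
identity = {(x, x) for x in U}
# Reflexive and symmetric, but not transitive (a compatibility relation).
compatibility = identity | {(1, 2), (2, 1), (2, 3), (3, 2)}
# Reflexive, symmetric, and transitive (an equivalence relation).
equivalence = identity | {(1, 2), (2, 1)}

print(sorted(satisfied_properties(U, compatibility)))
print(sorted(satisfied_properties(U, equivalence)))
```

For the compatibility relation only (D), (T), and (B) survive the check, whereas the equivalence relation satisfies all five properties.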


By combining these properties, one can construct
more rough set models [25.6, 20]. Apart from the
above-mentioned properties, property (K) holds for any
binary relation, i.e., no special property of the relation
is required. We use a series of property labels, i.e., (K),
(D), (T), (B), (4), (5), to represent the rough set models
built on relations with these properties. For example, the
KTB rough set model is built on a compatibility relation
$R$, i.e., a relation with the reflexive and symmetric
properties. In such a model, properties (K), (D), (T), and
(B) hold; however, properties (4) and (5) need not hold.
Property (D) does not explicitly appear in this label
because (D) can be obtained from (T). If $R$ is reflexive,
symmetric, and transitive, i.e., $R$ is an equivalence
relation, we obtain the Pawlak rough set model [25.6, 20].
The approximation operators then satisfy all properties
(D), (T), (B), (4), and (5).

Figure 25.2 summarizes the relationships between 
these models [25.6, 20, 21]. The label of the model in- 
dicates the characterization properties of that model. 
A line connecting two models indicates that the model 
on the upper level is also a model on the lower level. 
For example, a KT5 model is a KT4 model, as KT5
connects down to KT4. It should be noted that the lines that
can be derived by transitivity are not explicitly shown.
The model K may be considered as the basic model
because it does not require any special property on the
binary relation. All other models are built on top of the
model K and it can be regarded as the weakest model.
The model KT5, i.e., the Pawlak rough set model, is the
strongest model.

With the element-based definition, we can obtain bi- 
nary relation-based rough set models by generalizing 
the equivalence relation to binary relations. Different 
binary relations can induce different rough set mod- 
els with different properties, as was discussed above. 
This generalization not only deepens our understand- 
ing of rough sets, but also enriches the rough set 
theory. 


25.1.3 Covering-Based Rough Sets 


A covering of a universe is a family of subsets of the
universe such that their union is the universe. By
allowing nonempty overlap of two subsets, a covering
is a generalized mathematical structure of a
partition [25.22]. These subsets in a covering or a partition
can be considered as granules based on the concepts
in granular computing [25.23]. By generalizing the
partition to a covering in granule-based approximation
definitions, we form a more general definition and we
call this approach a granule-based definition. In this
section, we mainly investigate covering-based rough
sets.

Fig. 25.2 Rough set models (after [25.6])

Zakowski proposed the notion of covering-based 
rough set approximations in 1983 [25.24]. Lower 
and upper approximation operators are defined by 
a straightforward generalization of the rough set defi- 
nition proposed by Pawlak. However, the generalized 
approximation operators are not dual to each other with 
respect to set complements [25.15, 25]. Pomykala stud- 
ied two pairs of dual approximation operators [25.25]. 
The lower approximation operator in one pair is the 
same as the Zakowski lower approximation operator, 
and the upper approximation operator in the other pair 
is the same as the Zakowski upper approximation
operator [25.25]. Pomykala also suggested and examined
additional pairs of dual approximation operators that 
are induced by a covering. Furthermore, he considered 
coverings produced by tolerance relations in an incom- 
plete information table [25.26]. 

Instead of using duality, Wybraniec-Skardowska 
studied pairs of approximation operators linked to- 
gether by a different type of relations [25.9]. Given an 
upper approximation operator, the corresponding lower 
approximation operator is defined based on the up- 
per approximations of singleton subsets. Several such 
pairs of approximation operators were studied based on 
a covering and a tolerance relation defined by a cover- 
ing, including some of those used by Zakowski [25.24] 
and Pomykala [25.25]. Yao investigated dual approx- 
imation operators by using coverings induced by the 
predecessor and/or successor neighborhoods of serial 
or inverse serial binary relations. The two pairs of 
dual approximation operators introduced by Pomykala 
were examined and the conditions for their equiva- 
lence to those obtained from a binary relation were 
given [25.5, 15]. Couso and Dubois proposed a loose 
pair and a tight pair [25.27]. They presented an inter- 
esting investigation of the two pairs of approximation 
operators within the context of incomplete information. 
The two pairs of operators were shown to be related to 
the family of approximation operators produced by all 
partitions consistent with a covering induced by an ill- 
known attribute function in an incomplete information 
table. Restrepo et al. investigated different relationships 
between commonly used operators using concepts of 
duality and other properties [25.28]. They also showed
that a pair of a lower and an upper approximation
operator can be dual and adjoint at the same
time.

By using the minimum neighborhood of an object 
(i. e., the intersection of subsets in the minimal descrip- 
tion of the object), Wang et al. introduced a pair of 
dual approximation operators [25.29]. The same pair of 
approximation operators was also used and examined 
by Xu and Wang [25.30] and Xu and Zhang [25.31]. 
Zhu’s team systematically studied five types of approx- 
imation operators [25.19, 32-36]. The lower approxi- 
mation operator is the Zakowski lower approximation 
operator, and the upper approximation operators are 
different. They investigated properties of these
operators and their relationships and provided sets of
axioms for characterizing these operators. Liu examined
covering-based rough sets from constructive and ax- 
iomatic approaches [25.37]. The relationships among 
four types of covering-based rough sets and the topolo- 
gies induced by different covering approximations were 
discussed. Zhang and Luo investigated relationships 
between relation-based rough sets and covering-based 
rough sets [25.38]. They also presented some suffi- 
cient and necessary conditions for different types of 
covering-based rough sets to be equal. 

We will elaborate on how to obtain rough sets 
by generalizing a partition to a covering, as well as 
duality, loose pairs, and tight pairs of approximation 
operators. Let C be a covering of the universe U. By 
replacing a partition U/E with a covering C and equiv- 
alence classes with subsets in C in the granule-based 
definition, a pair of approximation operators can be ob- 
tained [25.24]. However, they are not a pair of dual 
operators [25.25]. To overcome this problem, Yao sug- 
gested that one can generalize one of them and define 
the other by duality [25.7, 15]. The granule-based defi- 
nition can be generalized in two ways, i. e., (1) the lower 
approximation operator is extended from partition to 
covering and the upper approximation operator is rede- 
fined by duality, (2) the upper approximation operator 
is extended from partition to covering and the lower ap- 
proximation operator is redefined by duality [25.4, 7]. 
The results are two pairs of dual approximation opera- 
tors [25.4] 


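$$\begin{aligned}
\underline{apr}'(A) &= \bigcup\{X \mid X \in C,\ X \subseteq A\} \\
                    &= \{x \mid x \in U,\ \exists X \in C\,[x \in X,\ X \subseteq A]\}, \\
\overline{apr}'(A)  &= (\underline{apr}'(A^c))^c \\
                    &= \{x \mid x \in U,\ \forall X \in C\,[x \in X \Rightarrow X \cap A \neq \emptyset]\},
\end{aligned} \tag{25.5}$$

and

$$\begin{aligned}
\underline{apr}''(A) &= (\overline{apr}''(A^c))^c \\
                     &= \{x \mid x \in U,\ \forall X \in C\,[x \in X \Rightarrow X \subseteq A]\}, \\
\overline{apr}''(A)  &= \bigcup\{X \mid X \in C,\ X \cap A \neq \emptyset\} \\
                     &= \{x \mid x \in U,\ \exists X \in C\,[x \in X,\ X \cap A \neq \emptyset]\}.
\end{aligned} \tag{25.6}$$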


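We may define two pairs of dual approximation operators
for each covering. Both pairs are consistent with
the Pawlak definition. The following relationships hold
for the above approximation operators [25.4, 25]

$$\underline{apr}''(A) \subseteq \underline{apr}'(A) \subseteq A \subseteq \overline{apr}'(A) \subseteq \overline{apr}''(A). \tag{25.7}$$

Therefore, the pair $(\underline{apr}'(A), \overline{apr}'(A))$ is called a pair of
tighter approximations and the pair
$(\underline{apr}''(A), \overline{apr}''(A))$ is called a pair of looser
approximations [25.25]. Furthermore, any approximation pair
$(\underline{apr}(A), \overline{apr}(A))$ produced by other authors is
bounded by $(\underline{apr}'(A), \overline{apr}'(A))$ and
$(\underline{apr}''(A), \overline{apr}''(A))$ if

$$\underline{apr}(A) \subseteq A \subseteq \overline{apr}(A).$$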
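
The tight and loose pairs can be computed directly from a covering. The following Python sketch assumes a small hypothetical covering and hypothetical function names; in each pair one operator is defined as a union of covering members and the other by duality, and the inclusion chain (25.7) is then tested on one set.

```python
# Hypothetical universe and covering (members may overlap).
U = {1, 2, 3, 4}
C = [{1, 2}, {2, 3}, {4}]


def comp(X):
    """Set complement with respect to U."""
    return U - X


def lower_tight(A):
    """Union of covering members contained in A."""
    return set().union(*[X for X in C if X <= A])


def upper_tight(A):
    """Dual of lower_tight."""
    return comp(lower_tight(comp(A)))


def upper_loose(A):
    """Union of covering members that intersect A."""
    return set().union(*[X for X in C if X & A])


def lower_loose(A):
    """Dual of upper_loose."""
    return comp(upper_loose(comp(A)))


A = {1, 2}
chain = [lower_loose(A), lower_tight(A), A, upper_tight(A), upper_loose(A)]
print(chain)
# Each set in the chain is contained in the next one, as in (25.7).
print(all(s <= t for s, t in zip(chain, chain[1:])))
```

For this covering the loose lower approximation of $A$ is strictly smaller than the tight one, because element 2 also belongs to a covering member that is not contained in $A$.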


In addition to this fundamental generalization of 
a rough set to a covering, more than 20 different ap- 
proximation pairs have been defined [25.4, 39]. Their 
properties were studied in [25.39], where the inclusion 
relationship occurring among two sets and their approx- 
imations was considered. All these approaches were 
categorized in a recent study [25.4]. 

We recall some notions that will be useful in 
Sect. 25.2.2 when dealing with the topological charac- 
terization of approximations. 


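Definition 25.1 [25.39]
Let $\mathcal{C}$ be a covering on a universe $U$ and $x \in U$. We
define:

• The neighborhood of $x$: $\gamma(x) = \bigcap\{C \in \mathcal{C} : x \in C\}$.
• The friends of $x$: $\delta(x) = \bigcup\{C \in \mathcal{C} : x \in C\}$.
• The partition generated by a covering:
  $\Pi_{\mathcal{C}}(x) = \{y \in U : \forall C \in \mathcal{C}\,(x \in C \Leftrightarrow y \in C)\}$.


Using the above operators, the following approximation
pairs are introduced

$$L_1(A) = \bigcup\{\delta(x) : \delta(x) \subseteq A\}, \qquad
  U_1(A) = L_1(A^c)^c, \tag{25.8}$$

$$L_2(A) = \bigcup\{C \in \mathcal{C} : C \subseteq A\}, \qquad
  U_2(A) = L_2(A^c)^c = \bigcap\{C^c : C \in \mathcal{C},\ C \cap A = \emptyset\}, \tag{25.9}$$

$$L_3(A) = L_2(A), \qquad
  U_3(A) = \bigcup\{C : C \cap A \neq \emptyset\} \setminus L_3(A^c), \tag{25.10}$$

$$L_4(A) = \bigcup\{\Pi_{\mathcal{C}}(x) : \Pi_{\mathcal{C}}(x) \subseteq A\}, \qquad
  U_4(A) = \bigcup\{\Pi_{\mathcal{C}}(x) : \Pi_{\mathcal{C}}(x) \cap A \neq \emptyset\}, \tag{25.11}$$

$$L_5(A) = \{x \in U : \gamma(x) \subseteq A\}, \qquad
  U_5(A) = \{x \in U : \gamma(x) \cap A \neq \emptyset\}, \tag{25.12}$$

$$L_6(A) = U_6(A^c)^c, \qquad
  U_6(A) = \bigcup\{\gamma(x) : x \in A\}. \tag{25.13}$$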


These approximation pairs have been introduced 
and studied in several papers: approximation pair 1 
(25.8) can be found in [25.40], approximation pair 2 
(25.9) in [25.5, 40—43], approximation pair 3 (25.10) 
in [25.42], approximation pair 4 (25.11) in many 
papers starting from [25.44], approximation pair 5 
(25.12) in [25.41, 45], and approximation pair 6 (25.13) 
in [25.45,46]. As we will discuss, they all show nice 
topological properties. 
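
The three operators of Definition 25.1 and two of the approximation pairs can be sketched as follows. This is a minimal illustration under stated assumptions: the covering is hypothetical, and the function names (`neighborhood`, `friends`, `block`, `L5`, `U5`, `L6`, `U6`) are introduced here only for readability.

```python
# Hypothetical universe and covering.
U = {1, 2, 3, 4}
C = [{1, 2}, {2, 3}, {4}]


def neighborhood(x):
    """Intersection of all covering members that contain x."""
    result = set(U)
    for K in C:
        if x in K:
            result &= K
    return result


def friends(x):
    """Union of all covering members that contain x."""
    return set().union(*[K for K in C if x in K])


def block(x):
    """Elements that belong to exactly the same covering members as x."""
    return {y for y in U if all((x in K) == (y in K) for K in C)}


def L5(A):
    return {x for x in U if neighborhood(x) <= A}


def U5(A):
    return {x for x in U if neighborhood(x) & A}


def U6(A):
    return set().union(*[neighborhood(x) for x in A])


def L6(A):
    return U - U6(U - A)


A = {1, 2}
print(neighborhood(2), friends(2), block(2))
print(L5(A), U5(A))
print(L6(A), U6(A))
```

On this covering the pairs of (25.12) and (25.13) differ: the fifth pair gives $(\{1, 2\}, \{1, 2, 3\})$ for $A = \{1, 2\}$, while the sixth gives $(\{1\}, \{1, 2\})$.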

By simply replacing a partition with a covering, 
a generalized mathematical structure of a partition, in 
the granule-based definition, we form new rough sets. 
The lower and upper approximation operators are not
necessarily dual. We may redefine one of them to obtain
the dual approximation operators. There are two types 
of approximation, a tight pair and a loose pair. The two 
pairs provide the boundary when new approximation 
operators are introduced. 


25.1.4 Subsystem-Based Rough Sets 


In the Pawlak rough set model, the same subsystem 
is used to define lower and upper approximation oper- 
ators. When generalizing the subsystem-based defini- 
tion, two subsystems may be used, one for the lower 
approximation, which is closed under union, and the 
other for the upper approximation, which is closed 
under intersection [25.4, 7,47]. To ensure duality of ap- 
proximation operators, the two subsystems should be 
dual systems with respect to set complement [25.4]. 
Given a closure system $\overline{S}$, its dual system $\underline{S}$ can be
constructed as $\underline{S} = \{\sim X \mid X \in \overline{S}\}$. The system $\overline{S}$ contains the
universe $U$ and is closed under set intersection. The
system $\underline{S}$ contains the empty set $\emptyset$ and is closed under set
union [25.4]. A pair of lower and upper approximation
operators with respect to $\overline{S}$ is defined as [25.4]

$$\begin{aligned}
\underline{apr}(A) &= \bigcup\{X \mid X \in \underline{S},\ X \subseteq A\}, \\
\overline{apr}(A)  &= \bigcap\{X \mid X \in \overline{S},\ A \subseteq X\}.
\end{aligned} \tag{25.14}$$

In the Pawlak rough set model, the two systems $\underline{S}$
and $\overline{S}$ are the same, namely, $\underline{S} = \overline{S}$, which is closed
under set complement, union, and intersection. That
is, it is a Boolean algebra. The subsystem-based
definition provides a way to approximate any set in $2^U$
by a pair of sets in $\underline{S}$ and $\overline{S}$, respectively [25.4]. The
subsystem-based definition can be generalized by using 
different mathematical structures, such as topological 
spaces [25.7,47,48], closure systems [25.7,47], lat- 
tices [25.7, 49], and posets [25.7, 50]. 

For an arbitrary topological space, the family of
open sets is different from the family of closed sets. Let
$(U, O(U))$ be a topological space, where $O(U) \subseteq 2^U$ is
a family of subsets of $U$ called open sets. The family
of (topological) open sets contains $\emptyset$ and $U$. The family
of open sets is closed under union and finite
intersection. The family of all (topological) closed sets
$C(U) = \{\neg X \mid X \in O(U)\}$ contains $\emptyset$ and $U$, and is closed
under intersection and finite union. A pair of generalized
approximation operators can be defined by replacing
$B(U/E)$ with $O(U)$ for the lower approximation operator,
and $B(U/E)$ with $C(U)$ for the upper approximation
operator [25.5]. The definitions of approximation
operators are [25.5, 7, 50]

$$\begin{aligned}
\underline{apr}(A) &= \bigcup\{X \mid X \in O(U),\ X \subseteq A\}, \\
\overline{apr}(A)  &= \bigcap\{X \mid X \in C(U),\ A \subseteq X\}.
\end{aligned} \tag{25.15}$$


The rough set model can be generalized by using
closure systems. A family of subsets of $U$, $C(U)$,
is called a closure system if it contains $U$ and is
closed under intersection. By collecting the complements
of members of $C(U)$, we can obtain another
system $O(U) = \{\neg X \mid X \in C(U)\}$, which contains the
empty set $\emptyset$ and is closed under union. In this case,
a pair of approximation operators in a closure system
can be defined by replacing $B(U/E)$ with $O(U)$ for the
lower approximation operator, and $B(U/E)$ with $C(U)$
for the upper approximation operator. The definitions of
approximation operators are [25.5, 7]

$$\begin{aligned}
\underline{apr}(A) &= \bigcup\{X \mid X \in O(U),\ X \subseteq A\}, \\
\overline{apr}(A)  &= \bigcap\{X \mid X \in C(U),\ A \subseteq X\}.
\end{aligned} \tag{25.16}$$
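
The closure-system construction can be illustrated with a short Python sketch. The particular closure system, the universe, and the helper names below are hypothetical; the dual system is obtained by taking complements, and the two operators follow the subsystem-based definitions given above.

```python
# Hypothetical universe and closure system C(U): it contains U and is
# closed under intersection.
U = frozenset({1, 2, 3})
closed = [frozenset({1, 2, 3}), frozenset({1, 2}),
          frozenset({2, 3}), frozenset({2})]
# Dual system O(U): complements of the closed sets.
open_sets = [U - X for X in closed]


def is_closure_system(family):
    """Check that the family contains U and all pairwise intersections."""
    members = set(family)
    return U in members and all(X & Y in members
                                for X in members for Y in members)


def lower(A):
    """Union of the members of O(U) contained in A."""
    return frozenset().union(*[X for X in open_sets if X <= A])


def upper(A):
    """Intersection of the members of C(U) containing A."""
    result = U
    for X in closed:
        if A <= X:
            result &= X
    return result


A = frozenset({1, 2})
print(is_closure_system(closed))
print(sorted(lower(A)), sorted(upper(A)))
# The two operators are dual with respect to set complement.
print(upper(A) == U - lower(U - A))
```

Because $O(U)$ is built from the complements of $C(U)$, the resulting operators are dual by construction, although the lower approximation of $\{1, 2\}$ is only $\{1\}$: the set $\{1, 2\}$ is closed but not open in this system.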

The power set of the universe is a special lattice.
Suppose $(B, \neg, \wedge, \vee, 0, 1)$ is a finite Boolean algebra
and $(B_0, \neg, \wedge, \vee, 0, 1)$ is a sub-Boolean algebra. One
may approximate an element of $B$ by using elements
of $B_0$ [25.7]

$$\begin{aligned}
\underline{apr}(x) &= \bigvee\{y \mid y \in B_0,\ y \le x\}, \\
\overline{apr}(x)  &= \bigwedge\{y \mid y \in B_0,\ x \le y\}.
\end{aligned} \tag{25.17}$$

We consider a more generalized definition in which 
the Boolean algebra B is replaced by a completely dis- 
tributive lattice [25.51], and one subsystem is used. 
A subsystem O(B) of B satisfies the following ax- 
ioms [25.7]: 


(O1) $0 \in O(B)$, $1 \in O(B)$;
(O2) for any subsystem $D \subseteq O(B)$, if there exists
     a least upper bound $\mathrm{LUB}(D) = \bigvee D$, it belongs
     to $O(B)$;
(O3) $O(B)$ is closed under finite meets.


Elements of $O(B)$ are referred to as inner definable
elements. The complement of an inner definable
element is called an outer definable element. The set
of outer definable elements $C(B) = \{\neg x \mid x \in O(B)\}$ is
characterized by the following axioms [25.7]:


(C1) $0 \in C(B)$, $1 \in C(B)$;
(C2) for any subsystem $D \subseteq C(B)$, if there exists
     a greatest lower bound $\mathrm{GLB}(D) = \bigwedge D$, it
     belongs to $C(B)$;
(C3) $C(B)$ is closed under finite joins.


From the sets of inner and outer definable elements,
we define the following approximation operators [25.7]

$$\begin{aligned}
\underline{apr}(x) &= \bigvee\{y \mid y \in O(B),\ y \le x\}, \\
\overline{apr}(x)  &= \bigwedge\{y \mid y \in C(B),\ x \le y\}.
\end{aligned} \tag{25.18}$$

Let $(L, \le, 0, 1)$ be a bounded lattice. Suppose $O(L)$
is a subset of $L$ such that it contains 0 and is closed
under joins, and $C(L)$ a subset of $L$ such that it contains 1
and is closed under meets. They are complete lattices,
although the meet of $O(L)$ and the join of $C(L)$ may be
different from those of $L$. Based on these two systems,
we can define two other approximation operators as
follows [25.7]

$$\begin{aligned}
\underline{apr}(x) &= \bigvee\{y \mid y \in O(L),\ y \le x\}, \\
\overline{apr}(x)  &= \bigwedge\{y \mid y \in C(L),\ x \le y\}.
\end{aligned} \tag{25.19}$$

The operator $\overline{apr}$ is a closure operator [25.7]. $C(L)$
corresponds to the closure system in the set-theoretic
framework. However, since a lattice may not be
complemented, we must explicitly consider both $O(L)$
and $C(L)$. That is, the system $(L, O(L), C(L))$,
or equivalently the system $(L, \underline{apr}, \overline{apr})$, is used
for the generalization of Pawlak approximation
operators.

The subsystem-based formulation provides an
important interpretation of rough set theory. It allows us
to study rough set theory in the contexts of many
algebraic systems [25.47]. This naturally leads to the
generalization of rough set approximations. With the
subsystem-based definition, we examine the generalized
approximation operators by using topological
spaces, closure systems, lattices, and posets in this
subsection.


25.2 Theoretical Approaches


In this section, we further develop the previously
outlined links with modal logics and topology.


25.2.1 Logical Setting


The minimal modal system K [25.52] is at the basis
of any modal logic. Its language $\mathcal{L}$ is the usual
one of propositional logic plus necessity $\Box$ and
possibility $\Diamond$. That is,
$\mathcal{L} = a \in V \mid \neg\alpha \mid \alpha \wedge \beta \mid \Box(\alpha)$, where
$V = \{a, b, c, \ldots\}$ is the set of propositional variables
and $\neg$, $\wedge$, $\Box$ are the negation, conjunction, and
necessity connectives. As usual, other connectives can be
derived: disjunction $\alpha \vee \beta$ stands for $\neg(\neg\alpha \wedge \neg\beta)$,
implication $\alpha \to \beta$ stands for $\neg\alpha \vee \beta$, and possibility is
$\Diamond\alpha = \neg\Box(\neg\alpha)$.

The axioms are those of Boolean logic plus the
axioms to characterize the modal connectives:


(B1) $\alpha \to (\beta \to \alpha)$
(B2) $(\alpha \to (\beta \to \gamma)) \to ((\alpha \to \beta) \to (\alpha \to \gamma))$
(B3) $(\neg\alpha \to \neg\beta) \to (\beta \to \alpha)$
(K)  $\Box(\alpha \to \beta) \to (\Box\alpha \to \Box\beta)$.


The rules are modus ponens: if $\vdash \alpha$ and $\vdash \alpha \to \beta$
then $\vdash \beta$; and necessitation: if $\vdash \alpha$ then $\vdash \Box\alpha$.

In our context, the semantics is given through
a model $M = (X, R, v)$, where $(X, R)$ is an approximation
space (that is, a universe with a binary relation)
and $v$ is the interpretation that given a variable returns
a subset of elements of the universe: $v(a) \subseteq X$. Using
the standard modal logic terminology, $X$ is the set of
possible worlds, $R$ the accessibility relation, and $v(a)$
represents the set of possible worlds where $a$ holds. The
interpretation $v$ can recursively be extended to any
formula $\alpha$ as

$$\begin{aligned}
v(\neg\alpha) &= v(\alpha)^c, \\
v(\alpha_1 \wedge \alpha_2) &= v(\alpha_1) \cap v(\alpha_2), \\
v(\alpha_1 \vee \alpha_2) &= v(\alpha_1) \cup v(\alpha_2),
\end{aligned}$$

and modal operators are mapped to lower and upper
approximations according to (25.4)

$$\begin{aligned}
v(\Box\alpha) &= \underline{apr}_n(v(\alpha)), \\
v(\Diamond\alpha) &= \overline{apr}_n(v(\alpha)).
\end{aligned}$$

It is well known from modal logic [25.52] that, once 
the basic axioms (B1)-(B3) and (K) are fixed, then 
a different modal axiom according to Table 25.1 cor- 
responds to any relation property. 

Clearly, these axioms reflect the properties on rough 
approximations given in Sect. 25.1.2 and can be used to 
generate all the logics given in Fig. 25.2. 
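
The semantics above can be sketched as a small recursive evaluator. In the following Python illustration, the tuple-based encoding of formulae, the relation, the valuation, and the helper names are all assumptions made for the example; necessity and possibility are evaluated as the lower and upper approximations induced by successor neighborhoods.

```python
# Hypothetical model M = (X, R, val): R is serial but not reflexive.
X = {1, 2, 3}
R = {(1, 2), (2, 3), (3, 3)}
val = {"a": {3}}


def succ(x):
    """Worlds accessible from x."""
    return {y for (w, y) in R if w == x}


def v(formula):
    """Extension of a formula: a variable name or a nested tuple."""
    if isinstance(formula, str):
        return val[formula]
    op = formula[0]
    if op == "not":
        return X - v(formula[1])
    if op == "and":
        return v(formula[1]) & v(formula[2])
    if op == "or":
        return v(formula[1]) | v(formula[2])
    if op == "box":   # necessity as lower approximation
        A = v(formula[1])
        return {x for x in X if succ(x) <= A}
    if op == "dia":   # possibility as upper approximation
        A = v(formula[1])
        return {x for x in X if succ(x) & A}
    raise ValueError(f"unknown connective: {op}")


def implies(p, q):
    """Implication encoded as (not p) or q."""
    return ("or", ("not", p), q)


axiom_D = implies(("box", "a"), ("dia", "a"))
axiom_T = implies(("box", "a"), "a")
print(v(("box", "a")))
print(v(axiom_D) == X)   # holds in every world: R is serial
print(v(axiom_T) == X)   # fails in world 2: R is not reflexive
```

In this model the instance of axiom D holds in every world, whereas the instance of axiom T fails in world 2, in line with the correspondence between axioms and relation properties.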

Table 25.1 Correspondence between modal axioms and
relation properties

Name   Axiom                                          Property
T      $\Box\alpha \to \alpha$                        Reflexive
4      $\Box\alpha \to \Box(\Box\alpha)$              Transitive
5      $\Diamond\alpha \to \Box(\Diamond\alpha)$      Euclidean
D      $\Box\alpha \to \Diamond\alpha$                Serial
B      $\alpha \to \Box\Diamond\alpha$                Symmetric


Other kinds of generalized rough set models have
been studied in the literature under the framework of
modal logic. In particular, nondeterministic information
logic (NIL) [25.53] is defined to capture those information
tables in which more than one value can correspond
to each pair (object, attribute). For instance, if we have
a feature color, then each object may be assigned more
than one color. Given this extended definition, several
new relations can be introduced; these were studied in
the seminal Orlowska-Pawlak paper [25.53] and some
subsequent studies [25.54-56]. Writing $f(x, a)$ for the set
of values that object $x$ takes on attribute $a$ and $A$ for the
set of attributes, some of these relations are:


• Similarity (connection)
  $x\,S\,y$ iff $f(x, a) \cap f(y, a) \neq \emptyset$ for all $a \in A$.
• Inclusion
  $x \le y$ iff $f(x, a) \subseteq f(y, a)$ for all $a \in A$.
• Indiscernibility
  $x\,\mathrm{Ind}\,y$ iff $f(x, a) = f(y, a)$ for all $a \in A$.
• Weak indiscernibility
  $x\,W_i\,y$ iff $f(x, a) = f(y, a)$ for some $a \in A$.
• Weak similarity
  $x\,W_s\,y$ iff $f(x, a) \cap f(y, a) \neq \emptyset$ for some $a \in A$.
• Complementarity
  $x\,\mathrm{Com}\,y$ iff $f(x, a) = \mathrm{VAL}_a \setminus f(y, a)$.
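
These relations are easy to evaluate on a concrete nondeterministic information table. The Python sketch below uses a hypothetical table with two attributes; the object names, attribute domains, and function names are illustrative only, and quantifying complementarity over all attributes is an assumption made here, since the definition above does not state a quantifier.

```python
# Hypothetical nondeterministic table: f[x][a] is a SET of values.
f = {
    "x1": {"color": {"red", "blue"}, "size": {"small"}},
    "x2": {"color": {"red"}, "size": {"small"}},
    "x3": {"color": {"green"}, "size": {"large"}},
}
A = ["color", "size"]
VAL = {"color": {"red", "blue", "green"}, "size": {"small", "large"}}


def similar(x, y):
    return all(f[x][a] & f[y][a] for a in A)


def included(x, y):
    return all(f[x][a] <= f[y][a] for a in A)


def indiscernible(x, y):
    return all(f[x][a] == f[y][a] for a in A)


def weakly_indiscernible(x, y):
    return any(f[x][a] == f[y][a] for a in A)


def weakly_similar(x, y):
    return any(f[x][a] & f[y][a] for a in A)


def complementary(x, y):
    # Assumption: the condition is required for every attribute.
    return all(f[x][a] == VAL[a] - f[y][a] for a in A)


print(similar("x1", "x2"), included("x2", "x1"), included("x1", "x2"))
print(indiscernible("x1", "x2"), weakly_indiscernible("x1", "x2"))
print(weakly_similar("x1", "x3"), complementary("x3", "x1"))
```

In this table x1 and x2 are similar and weakly indiscernible but not indiscernible, x2 is included in x1 but not conversely, and x3 is complementary to x1.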


We also mention the logic for data analysis 
(DAL) [25.57], which is meant to deal with approxi- 
mation spaces with more than one equivalence relation 
$(X, R_i)$.

Besides modal logic, in standard rough set theory 
based on one equivalence relation, several authors have 
dealt with a many-valued logic approach [25.58, 59], 
also with some criticism from the point of view of 
the interpretation of results [25.60]. On the other hand, 
there have been only a few attempts to link generalized 
rough sets and many-valued logic. One of the reasons 
is the intrinsic difficulty that arises when trying to de- 
fine intersection and union of rough sets (in an algebraic 
context defining a lattice and not only a poset) without 
imposing some restrictions (see, for example, [25.61,
62]).

A recent work [25.63] deals with many-valued logic
in coverings and in particular the $\underline{apr}''$, $\overline{apr}''$
approximations defined in (25.6). The novelty of the approach is
the introduction of a subordination relation among
objects

$$x \preceq y \quad\text{iff}\quad \forall C \in \mathcal{C}\,(y \in C \Rightarrow x \in C),$$

which is strictly linked to the notion of neighborhood.
Indeed,

$$x \preceq y \quad\text{iff}\quad x \in \gamma(y).$$


We also remark that a similar preorder relation defined 
by a topology is used in the bitopological approach to 
dominance-based rough sets [25.64]. This link could 
bring new insight into the many-valued approach to 
covering-based rough sets. 

The syntax of the logic in [25.63] consists of two
types of variables: object variables $x, y, \ldots$ and set
variables $A, B, \ldots$ Atomic formulae are $x \preceq y$ and $x \in A$
(where $A$ can be a set variable or a composition of set
variables) and compound formulae are obtained with
the usual logical connectives $\neg$, $\wedge$, $\vee$. The axioms are
given in the form of sequent calculus, and the
interpretation mapping $I$ is given with respect to a covering $\mathcal{C}$
in atomic formulae as

$$I(x \preceq y) = \begin{cases}
t & \text{if } v(x) \preceq v(y), \\
f & \text{otherwise},
\end{cases}$$

where $v$ maps each object variable to an object in the
universe $U$ and

$$I(x \in A) = \begin{cases}
t & \text{if } v(x) \in \underline{apr}''(w(A)), \\
f & \text{if } v(x) \in \underline{apr}''(w(A)^c), \\
u & \text{otherwise},
\end{cases}$$

where $w$ maps each set variable and set formula to
a subset of objects and $u$ is a third truth value
representing the unknown. The interpretation extends to
compound formulae by truth functional application of
Kleene three-valued logic. The logic is proven to be
sound but complete only with respect to the
sublanguage of atomic formulae.

We remark that this logic suffers from the usual problem
of using a three-valued logic to capture an epistemic notion,
as is the case of Kleene logic with respect to uncertainty.
For instance, even if we are not sure whether $x \in A$, we can
undoubtedly say that $(x \in A) \vee \neg(x \in A)$ (tertium non datur).
On the contrary, with the above interpretation we obtain
$I((x \in A) \vee \neg(x \in A)) = u$ whenever $x$ is in the boundary
of $A$, that is, $I(x \in A) = u$.
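
The failure of tertium non datur can be reproduced with the strong Kleene connectives. This is a minimal sketch: the numeric encoding of the truth values f, u, t as 0, 0.5, 1 and the function names are assumptions made here for illustration, not part of [25.63].

```python
# Truth values f < u < t, encoded as 0.0, 0.5, 1.0 (an encoding assumed for this sketch)
F, UNKNOWN, T = 0.0, 0.5, 1.0

def k_not(a):
    """Kleene negation: swaps t and f, leaves u fixed."""
    return 1.0 - a

def k_and(a, b):
    """Kleene conjunction: minimum of the two truth values."""
    return min(a, b)

def k_or(a, b):
    """Kleene disjunction: maximum of the two truth values."""
    return max(a, b)

# Evaluate "phi or not phi" for each possible truth value of phi
for name, value in (("t", T), ("u", UNKNOWN), ("f", F)):
    print(name, k_or(value, k_not(value)))
```

When the atomic formula is evaluated as u, the disjunction with its own negation is again u, although under an epistemic reading the disjunction is certainly true.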


25.2.2 Topology 


We saw in Sect. 25.1.4 that the subsystem approach can
be generalized with the help of topological notions. Here,
we further develop this topic and show which covering-based
approximations have a topological behavior.

Let us consider a lattice structure and define on it 
a notion of closure [25.65, 66]. 




Definition 25.2
Given a lattice $\mathcal{L}$, a map $c : \mathcal{L} \to \mathcal{L}$ is a closure operator
if for all $x, y \in \mathcal{L}$:


(C1op) $x \le c(x)$
(C2op) If $x \le y$ then $c(x) \le c(y)$
(C3op) $c(c(x)) = c(x)$


The map $c$ is a topological closure if, in addition to
(C1op)-(C3op), it satisfies


(C4op) $c(a) \vee c(b) = c(a \vee b)$.


The map $c$ is an Alexandroff closure if, in addition to
(C1op)-(C3op), it satisfies


(C5op) $\bigvee_j c(a_j) = c\left(\bigvee_j a_j\right)$.


Of course, any Alexandroff closure is a topological one 
and on a finite universe, the two notions coincide. 
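
On a finite power set lattice, the closure conditions can be verified by brute force. The sketch below is an illustration under stated assumptions: the universe, the covering, and the two maps are hypothetical examples chosen here. The first map sends a set to the union of the neighborhoods of its elements (each neighborhood being the intersection of the blocks containing the element); the second sends a set to the union of all blocks meeting it.

```python
from itertools import chain, combinations

def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def check_closure(c, universe):
    """Report which of (C1op)-(C4op) hold for c on the power set of universe."""
    subsets = powerset(universe)
    return {
        "C1op": all(a <= c(a) for a in subsets),
        "C2op": all(c(a) <= c(b) for a in subsets for b in subsets if a <= b),
        "C3op": all(c(c(a)) == c(a) for a in subsets),
        "C4op": all(c(a) | c(b) == c(a | b) for a in subsets for b in subsets),
    }

# Hypothetical universe and covering
universe = frozenset({1, 2, 3, 4})
covering = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]

def neighborhood_closure(a):
    """Union over x in a of the intersection of the blocks containing x."""
    out = set()
    for x in a:
        out |= frozenset.intersection(*[blk for blk in covering if x in blk])
    return frozenset(out)

def touching_blocks(a):
    """Union of all blocks of the covering that meet a."""
    out = set()
    for blk in covering:
        if blk & a:
            out |= blk
    return frozenset(out)

print(check_closure(neighborhood_closure, universe))
print(check_closure(touching_blocks, universe))
```

With this covering, the neighborhood-based map passes all four checks, whereas the second map fails idempotence (C3op): applied to {1} it gives {1, 2}, and applied again it gives {1, 2, 3}.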

On a complemented lattice, an interior operator is
defined by duality as $i(x) = c(x')'$ and properties dual
to (C1op)-(C5op) hold. On the other hand, if the lattice
is not complemented, an interior operator must be
explicitly defined, as discussed in Sect. 25.1.4.

To the above algebraic definition of a closure operator
there corresponds an equivalent one based on closed
sets, as we saw in Sect. 25.1.4. More precisely:


Definition 25.3

Let $\mathcal{L}$ be a lattice and $C \subseteq \mathcal{L}$ a subset of elements
which is closed under arbitrary intersections, that is,
axioms (C1)-(C3) are satisfied. Then, a closure operator
satisfying properties (C1op)-(C3op) is defined as
$c(a) = \bigwedge \{u \in C : a \le u\}$.

A topological closure is such that the union of a finite
family of closed elements is closed, i.e., $\left(\bigvee_{i \in I} c_i\right) \in C$
with $I$ a finite set of indexes, and we have an Alexandroff
topology if $C$ is closed under arbitrary unions.


Now, if the subsystem rough sets are naturally based
on a topological ground, covering rough sets can also be
classified with respect to topological properties. First of
all, let us consider the lower and upper approximations of $A$
defined in (25.5). They are an interior and a closure operator,
respectively. On the other hand, the upper approximation
of $A$ in (25.6) is not a closure since, in general, it
does not satisfy condition (C3).

Moreover, let us consider a covering $\mathcal{C}(X)$ of a universe
and the neighborhood of an element $x \in X$ with
respect to $\mathcal{C}(X)$ defined as $\gamma(x)$ in Definition 25.1.
It is well known that an Alexandroff closure operator
is induced as the map $c_\gamma : \mathcal{P}(X) \to \mathcal{P}(X)$ defined as
$c_\gamma(A) = \bigcup \{\gamma(a) : a \in A\}$, which corresponds to the
upper approximation $U_6$ in (25.13), and consequently the
dual operator $L_6$ is an interior operator.

More generally, all the upper approximations $U_1$-$U_6$
are closure operators. In particular, $U_4$-$U_6$ are also
topological closures, and since duality holds with respect
to all approximation pairs but $(L_3, U_3)$ and since $L_3 = L_y$,
then all lower approximations are interior operators.
This result can be easily established by checking that the
properties satisfied by the approximations include those
of Definition 25.2 (see Table 25.1 in [25.39]).


25.3 Conclusion


Three equivalent approaches to Pawlak rough sets can
be given based on an equivalence relation, a partition
of the universe, or a Boolean algebra. These different
views generate three different possible generalizations
of the classical model: binary relation, covering,
and subsystem-based rough sets. We have reviewed
these models and given the definitions of rough
approximations in the different contexts. It can be seen
that different models show interesting mathematical
properties. In particular, binary relation-based rough
sets have their roots in modal logic, whereas covering
and subsystem-based rough sets are linked to
topology.

New generalized rough set models are continually being
defined, and we can encounter, for instance, more
than 20 definitions of approximations based on coverings
[25.4]. There is, however, a lack of interpretation
in this collection. Efforts should be made to understand
the meaning and usefulness of the already defined
approximations. This should also be considered when
defining new approximations. Besides an intrinsic
theoretical interest, a logical approach could also be useful
in this direction. Indeed, while in the case of binary
relation-based rough sets we have a clear logical framework,
the same cannot be said about covering and subsystem-based
rough sets, where only a few results are known.


References

25.1 Z. Pawlak: Rough sets, Int. J. Parallel Program. 11(5), 341-356 (1982)
25.2 Z. Pawlak: Rough Sets: Theoretical Aspects of Reasoning About Data (Kluwer, Dordrecht 1991)
25.3 Y.Y. Yao: Two views of the theory of rough sets in finite universes, Int. J. Approx. Reason. 15(4), 291-317 (1996)
25.4 Y.Y. Yao, B.X. Yao: Covering based rough set approximations, Inf. Sci. 200, 91-107 (2012)
25.5 Y.Y. Yao: On generalizing rough set theory, Proc. Int. Conf. Rough Sets Fuzzy Sets Data Min. Granul. Comput. (Springer, Berlin Heidelberg 2003) pp. 44-51
25.6 Y.Y. Yao, T.Y. Lin: Generalization of rough sets using modal logic, Intell. Autom. Soft Comput. 2(2), 103-120 (1996)
25.7 Y.Y. Yao: On generalizing Pawlak approximation operators, Proc. Int. Conf. Rough Sets Curr. Trends Comput. (Springer, Berlin Heidelberg 1998) pp. 298-307
25.8 Y.Y. Yao, T. Wang: On rough relations: An alternative formulation, Proc. Int. Conf. New Dir. Rough Sets Data Min. and Granul.-Soft Comput. (Springer, Berlin Heidelberg 1999) pp. 82-90
25.9 U. Wybraniec-Skardowska: On a generalization of approximation space, Bull. Pol. Acad. Sci. Math. 37(1-6), 51-62 (1989)
25.10 Z. Pawlak: Hard and soft sets, Rough Sets, Proc. Int. Workshop Rough Sets Knowl. Discov. (Springer, London 1994) pp. 130-135
25.11 Y.Y. Yao, X. Li, T.Y. Lin, Q. Liu: Representation and classification of rough set models, Proc. Int. Workshop Rough Sets Soft Comput. (SCS: Society for Computer Simulation, San Diego 1995) pp. 44-47
25.12 R. Slowinski, D. Vanderpooten: A generalized definition of rough approximations based on similarity, IEEE Trans. Knowl. Data Eng. 12(2), 331-336 (2000)
25.13 S. Greco, B. Matarazzo, R. Slowinski: Fuzzy similarity relation as a basis for rough approximations, Proc. Int. Conf. Rough Sets Curr. Trends Comput. (Springer, Berlin Heidelberg 1998) pp. 283-289
25.14 L.H. Guan, G.Y. Wang: Generalized approximations defined by non-equivalence relations, Inf. Sci. 193, 163-179 (2012)
25.15 Y.Y. Yao: Relational interpretations of neighborhood operators and rough set approximation operators, Inf. Sci. 111(1), 239-259 (1998)
25.16 W.Z. Wu, W.X. Zhang: Neighborhood operator systems and approximations, Inf. Sci. 144(1), 201-217 (2002)
25.17 H.M. Abu-Donia: Comparison between different kinds of approximations by using a family of binary relations, Knowl.-Based Syst. 21(8), 911-919 (2008)
25.18 Y.Y. Yao: Generalized rough set models. In: Rough Sets in Knowledge Discovery, ed. by L. Polkowski, A. Skowron (Physica, Heidelberg 1998) pp. 286-318
25.19 W. Zhu: Relationship between generalized rough sets based on binary relation and covering, Inf. Sci. 179(3), 210-225 (2009)
25.20 Y.Y. Yao, S.K.M. Wang, T.Y. Lin: A review of rough set models. In: Rough Sets and Data Mining: Analysis for Imprecise Data, ed. by L. Polkowski, A. Skowron (Kluwer, Boston 1997) pp. 47-75
25.21 Y.Y. Yao: Constructive and algebraic methods of the theory of rough sets, J. Inf. Sci. 109(1), 21-47 (1998)
25.22 J.T. Yao, Y.Y. Yao: Induction of classification rules by granular computing, Proc. Int. Conf. Rough Sets Curr. Trends Comput. (Springer, Berlin Heidelberg 2002) pp. 331-338
25.23 J.T. Yao, A.V. Vasilakos, W. Pedrycz: Granular computing: Perspectives and challenges, IEEE Trans. Cybern. 43(6), 1977-1989 (2013)
25.24 W. Zakowski: Approximations in the space (U, Π), Demonstr. Math. 16(40), 761-769 (1983)
25.25 J.A. Pomykała: Approximation operations in approximation space, Bull. Pol. Acad. Sci. Math. 35, 653-662 (1987)
25.26 J.A. Pomykała: On definability in the nondeterministic information system, Bull. Pol. Acad. Sci. Math. 36(3/4), 193-210 (1988)
25.27 I. Couso, D. Dubois: Rough sets, coverings and incomplete information, Fundam. Inf. 108(3), 223-247 (2011)
25.28 M. Restrepo, C. Cornelis, J. Gómez: Duality, conjugacy and adjointness of approximation operators in covering-based rough sets, Int. J. Approx. Reason. 55(1), 469-485 (2014)
25.29 J. Wang, D. Dai, Z. Zhou: Fuzzy covering generalized rough sets, J. Zhoukou Teach. Coll. 21(2), 20-22 (2004), in Chinese
25.30 Z. Xu, Q. Wang: On the properties of covering rough sets model, J. Henan Norm. Univ. 33(1), 130-132 (2005), in Chinese
25.31 W.H. Xu, W.X. Zhang: Measuring roughness of generalized rough sets induced by a covering, Fuzzy Sets Syst. 158(22), 2443-2455 (2007)
25.32 W. Zhu, F.Y. Wang: Some results on covering generalized rough sets, Pattern Recogn. Artif. Intell. 15(1), 6-13 (2002)
25.33 W. Zhu, F.Y. Wang: Reduction and axiomization of covering generalized rough sets, Inf. Sci. 152, 217-230 (2003)
25.34 W. Zhu: Properties of the second type of covering-based rough sets, Proc. Int. Web Intell. Intell. Agent Technol. (IEEE, Piscataway 2006) pp. 494-497
25.35 W. Zhu, F.Y. Wang: A new type of covering rough set, Proc. Int. Conf. Intell. Syst. (IEEE, Piscataway 2006) pp. 444-449
25.36 W. Zhu, F.Y. Wang: On three types of covering-based rough sets, IEEE Trans. Knowl. Data Eng. 19(8), 1131-1144 (2007)




25.37 G.L. Liu: The relationship among different covering approximations, Inf. Sci. 250, 178-183 (2013)
25.38 Y.-L. Zhang, M.-K. Luo: Relationships between covering-based rough sets and relation-based rough sets, Inf. Sci. 225, 55-72 (2012)
25.39 P. Samanta, M.K. Chakraborty: Generalized rough sets and implication lattices, Trans. Rough Sets 14, 183-201 (2011)
25.40 J.A. Pomykała: Approximation, Similarity and Rough Construction, ILLC Prepublication Series for Computation and Complexity Theory, Vol. 93 (Univ. Amsterdam, Amsterdam 1993)
25.41 T.-J. Li: Rough approximation operators in covering approximation spaces, RSCTC2006 Proc. (Springer, Berlin Heidelberg 2006) pp. 174-182
25.42 D. Slezak, P. Wasilewski: Granular sets - Foundations and case study of tolerance spaces, RSFDGrC 2007 Proc. (Springer, Berlin Heidelberg 2007) pp. 435-442
25.43 G. Cattaneo, D. Ciucci: Lattices with interior and closure operators and abstract approximation spaces, Trans. Rough Sets 10, 67-116 (2009)
25.44 Z. Bonikowski: A certain conception of the calculus of rough sets, Notre Dame J. Formal Log. 33(3), 412-421 (1992)
25.45 K.Y. Qin, Y. Gao, Z. Pei: On covering rough sets, RSKT2007 Proc. (Springer, Berlin Heidelberg 2007) pp. 34-41
25.46 W. Zhu: Topological approaches to covering rough sets, Inf. Sci. 177(6), 1499-1508 (2007)
25.47 Y.Y. Yao, Y.H. Chen: Subsystem based generalizations of rough set approximations, Proc. Int. Conf. Found. Intell. Syst. (Springer, Berlin Heidelberg 2005) pp. 210-218
25.48 A. Wiweger: On topological rough sets, Bull. Pol. Acad. Sci. Math. 37, 89-93 (1989)
25.49 J. Jarvinen: On the structure of rough approximations, Fundam. Inf. 53(2), 135-153 (2002)
25.50 G. Cattaneo: Abstract approximation spaces for rough theories, Rough Sets Knowl. Discov. 1, 59-98 (1998)
25.51 M. Gehrke, E. Walker: On the structure of rough sets, Bull. Pol. Acad. Sci. Math. 40, 235-245 (1992)
25.52 B.F. Chellas: Modal Logic: An Introduction (Cambridge Univ. Press, Cambridge 1988)
25.53 E. Orlowska, Z. Pawlak: Representation of nondeterministic information, Theor. Comput. Sci. 29, 27-39 (1984)


25.54 D. Vakarelov: A modal logic for similarity relations in Pawlak knowledge representation systems, Fundam. Inf. 15(1), 61-79 (1991)
25.55 D. Vakarelov: Modal logics for knowledge representation systems, Theor. Comput. Sci. 90(2), 433-456 (1991)
25.56 P. Balbiani, D. Vakarelov: A modal logic for indiscernibility and complementarity in information systems, Fundam. Inf. 50(3/4), 243-263 (2002)
25.57 F. del Cerro, L.E. Orlowska: DAL - A logic for data analysis, Theor. Comput. Sci. 36, 251-264 (1985)
25.58 M. Banerjee, K. Chakraborty: Algebras from rough sets. In: Rough-Neural Computing: Techniques for Computing with Words, ed. by S.K. Pal, A. Skowron, L. Polkowski (Springer, Berlin Heidelberg 2004) pp. 157-188
25.59 M. Banerjee, M.A. Khan: Propositional logics from rough set theory, Trans. Rough Sets 6, 1-25 (2007)
25.60 D. Ciucci, D. Dubois: Truth-functionality, rough sets and three-valued logics, Proc. ISMVL (IEEE, Piscataway 2010) pp. 98-103
25.61 Z. Bonikowski, E. Bryniarski, U. Wybraniec-Skardowska: Extensions and intentions in the rough set theory, Inf. Sci. 107(1-4), 149-167 (1998)
25.62 G. Cattaneo, D. Ciucci: On the lattice structure of preclusive rough sets, IEEE Int. Conf. Fuzzy Syst., Piscataway (2004)
25.63 B. Konikowska: Three-valued logic for reasoning about covering-based rough sets. In: Rough Sets and Intelligent Systems - Professor Z. Pawlak in Memoriam, Intelligent Systems Reference Library, Vol. 42, ed. by A. Skowron, Z. Suraj (Springer, Berlin Heidelberg 2013) pp. 439-461
25.64 S. Greco, B. Matarazzo, R. Słowiński: Algebra and topology for dominance-based rough set approach. In: Advances in Intelligent Information Systems, Studies in Computational Intelligence, Vol. 265, ed. by Z.W. Ras, L.-S. Tsay (Springer, Berlin Heidelberg 2010) pp. 43-78
25.65 B.A. Davey, H.A. Priestley: Introduction to Lattices and Order (Cambridge Univ. Press, Cambridge 1990)
25.66 N. Caspard, B. Monjardet: The lattices of closure systems, closure operators, and implicational systems on a finite set: A survey, Discret. Appl. Math. 127(2), 241-269 (2003)




26. Fuzzy-Rough Hybridization 


Masahiro Inuiguchi, Wei-Zhi Wu, Chris Cornelis, Nele Verbiest 


Fuzzy sets and rough sets are known as uncertainty
models. They were proposed to treat different
aspects of uncertainty. Therefore, it is natural to
combine them to build more powerful mathematical
tools for treating problems under uncertainty.
In this chapter, we describe the state of the art in
the combinations of fuzzy and rough sets, divided
into three parts.

In the first part, we describe two kinds of
models of fuzzy rough sets: one is the classification-oriented
model and the other is the approximation-oriented
model. We describe the fundamental
properties and show the relations between those models.
Moreover, because those models use logical
connectives such as conjunction and implication
functions, the selection of logical connectives can
sometimes be an issue. We therefore propose a logical
connective-free model of fuzzy rough sets.

In the second part, we develop a generalized 
fuzzy rough set model. We first introduce general 
types of belief structures and their induced dual 
pairs of belief and plausibility functions in the 
fuzzy environment. We then build relationships 
between belief and plausibility functions in the 
Dempster-Shafer theory of evidence and the lower 
and upper approximations in rough set theory in 
various situations. We also provide the potential 
applications of the main results to intelligent in- 
formation systems. 

In the third part, we give an overview of the
practical applications of fuzzy rough sets. The main
focus will be on the machine-learning domain. In
particular, we review fuzzy-rough approaches for
attribute selection, instance selection, classification,
and prediction.


26.1 Introduction to Fuzzy-Rough Hybridization
26.2 Classification- Versus Approximation-Oriented Fuzzy Rough Set Models
26.2.1 Classification-Oriented Fuzzy Rough Sets
26.2.2 Approximation-Oriented Fuzzy Rough Sets
26.2.3 Relations Between Two Kinds of Fuzzy Rough Sets
26.2.4 The Other Approximation-Oriented Fuzzy Rough Sets
26.2.5 Remarks
26.3 Generalized Fuzzy Belief Structures with Application in Fuzzy Information Systems
26.3.1 Belief Structures and Belief Functions
26.3.2 Belief Structures of Rough Approximations
26.3.3 Conclusion of This Section
26.4 Applications of Fuzzy Rough Sets
26.4.1 Applications in Machine Learning
26.4.2 Other Applications
References


26.1 Introduction to Fuzzy-Rough Hybridization 


Rough set approaches [26.1,2] have been successfully
applied to various fields related to data analysis, knowledge
discovery, decision analysis, and so on. In order
to expand the application area and to develop its theory
further, rough sets have been generalized under various
settings. There are two different generalizations. One
relaxes the precision so that the sizes of lower and upper
approximations are controlled by a precision parameter.
This generalized rough set is called a variable precision
rough set. The other generalizes the approximation
space, i.e., the structure of background knowledge.
Many researchers generalized an equivalence relation,
which is often referred to as an indiscernibility relation,
to a general binary relation or a family. Many other
researchers [26.3-23] generalized an equivalence relation
to a fuzzy binary relation or a family of fuzzy sets.

In this chapter, we describe the generalizations 
of rough sets in the latter sense. More precisely, 
we concentrate on the fuzzy generalizations of rough 
set approaches called fuzzy rough hybridizations. 
Fuzzy rough sets were originally proposed by Naka- 
mura [26.3] and by Dubois and Prade [26.4,5]. 
The fundamental properties of fuzzy rough sets have 
been investigated by Dubois and Prade [26.4,5] and 
Radzikowska and Kerre [26.9]. In those studies, an 
equivalence relation of approximation space in the orig- 
inal rough sets is generalized to a fuzzy equivalence 
relation. Greco et al. [26.7] proposed fuzzy rough sets 
under a fuzzy dominance relation. Those fuzzy rough 
sets are based on possibility and necessity measures di- 
rectly. Moreover, this type of fuzzy rough sets is defined 
under more generalized settings [26.11, 15] and differ- 
ent types of fuzzy rough sets were proposed based on 
certainty qualifications by Inuiguchi and Tanino [26.10, 
12] and also based on modifier functions by Greco 
et al. [26.24, 25]. The fuzzy rough set model can be 
used to deal with attribute reduction in information sys- 
tems with fuzzy decision while the fuzzy rough set 
model can be employed in reasoning and knowledge 
acquisition with decision tables with real-valued condi- 
tional attributes or quantitative data (see, for example, 
[26.26-36]). 

In the first part of this chapter, we introduce three
models of fuzzy rough sets. Those fuzzy rough sets are
classified into two groups, i.e., classification-oriented fuzzy
rough set models and approximation-oriented fuzzy
rough set models, a distinction proposed by Inuiguchi [26.37]
originally in the crisp setting. In the classification-oriented
models, we are interested in a set to which objects
belong. We evaluate for each object whether its membership
to a set $X$ is consistent with all information we
have at hand or not. The positive region of $X$ is defined
by collecting all objects whose memberships to
$X$ are consistent with the whole information. The possible
region of $X$ is defined by collecting all objects
whose memberships to $X$ are conceivable from some
part of the information but not consistent with all
information. Then the fuzzy rough set of $X$ is defined by
a pair of the positive and possible regions of $X$. On
the contrary, in approximation-oriented models, we are
interested in the approximations of a set by using
elementary sets of a family. We approximate a set $X$ by
unions of the elementary sets and by intersections of the
complementary sets of the elementary sets. The lower
and upper approximations are defined by the inner and
outer approximations of $X$, respectively. A rough set
of $X$ is defined by a pair of the lower and upper
approximations. We show that one of the three models
belongs to the group of classification-oriented models
and the remaining two models belong to the group of
approximation-oriented models.

Another important method used to deal with un- 
certainty in intelligent systems is the Dempster-Shafer 
theory of evidence [26.38]. Shafer’s belief and plausi- 
bility functions are constructed under the assumption 
that the focal elements in the belief structure are all 
crisp. In some situations, it seems to be quite natural 
that the evidence mass may be assigned to a fuzzy sub- 
set of the universe of discourse. In fact, combining the 
Dempster-Shafer theory and fuzzy set theory has been 
suggested to be a way to deal with different kinds of un- 
certain information in intelligent systems in a number of 
studies. It has been demonstrated that the lower and upper
approximation operators in rough set theory have a strong
relationship with the belief and plausibility functions in
the Dempster-Shafer theory of evidence [26.21, 23, 39-44].
The Dempster-Shafer theory of evidence may be
used to analyze knowledge acquisition in information
systems (see, for example, [26.45-49]).

In the second part of this chapter, we will explore 
the relationships between belief and plausibility func- 
tions in the Dempster-Shafer theory of evidence and 
the lower and upper approximations in rough set theory 
with their potential applications to intelligent informa- 
tion systems. 

Both fuzzy set and rough set theories have fostered 
broad research communities and have been applied in 
a wide range of settings. More recently, this has also ex- 
tended to the hybrid fuzzy rough set models. The third 
part of this chapter tries to give a sample of those appli- 
cations, which are in particular numerous for machine 
learning but which also cover many other fields, like 
image processing, decision making, and information re- 
trieval. 

Note that we do not consider applications that 
simply involve a joint application of fuzzy sets and 
rough sets, like for instance a rough classifier that in- 
duces fuzzy rules. Rather, we focus on applications that 
specifically involve one of the fuzzy rough set models 
discussed in the previous sections. 




This chapter is organized as follows. In the next
section, three models of fuzzy rough sets are explained,
divided into two groups. In Sect. 26.3, we
introduce generalized fuzzy belief structures with
application in fuzzy information systems. In Sect. 26.4,
we give an overview of the practical applications of
fuzzy rough sets focusing on the machine-learning
domain.


26.2 Classification- Versus Approximation-Oriented Fuzzy Rough Set Models


In this section, we review three kinds of fuzzy rough 
sets from classification-oriented and approximation- 
oriented points of view. Focusing on the membership 
of an object to a set X under the indiscernibility re- 
lation, the classical rough set defined by a pair of 
lower and upper approximations of a set X can be seen 
as a classifier of objects into three disjoint regions: 
positive, negative, and boundary regions of a set X. 
Namely, the lower approximation defines the positive 
region, the complement of the upper approximation de- 
fines the negative region and the difference between 
upper and lower approximations defines the boundary 
region. On the other hand, focusing on the approx- 
imations of X by means of elementary sets of the 
partition, the rough set of X defines the inner and 
outer approximations of X. Namely, the lower approx- 
imation defines the inner approximation and the upper 
approximation defines the outer approximation. Those 
two different views of rough sets give different defi- 
nitions of rough sets in the generalized settings (see 
Inuiguchi [26.50]). In this section, we describe fuzzy 
rough sets in a generalized setting from those points of 
view and show the fundamental properties, differences, 
and similarities. 


26.2.1 Classification-Oriented Fuzzy Rough Sets


Definitions in Crisp Setting
In this subsection, we define fuzzy rough sets under
the interpretation of rough sets as classification of objects
into positive, negative, and boundary regions of
a set and describe their properties. As an introduction,
we first describe the definitions of positive and
possible regions of a set in the crisp setting. Let $U$ be
a set of all objects. Assume that we do not know the objects
which fit with a particular concept $C$ but we have
pieces of information that tell some objects fit with $C$
and that the other objects do not fit. Let $X \subseteq U$ be the
set of objects which are supposed to fit with $C$ in the
information and $U - X$ the set of objects which are supposed
not to fit with $C$ in the information. On the other
hand, there is knowledge about $C$ expressed by a binary
relation $P \subseteq U \times U$. Under the binary relation $P$,
we presume $y$ fits with $C$ from facts $(y, x) \in P$ and $x$ fits
with $C$.

Under this circumstance, we investigate credible
members of $X$ and plausible members of $X$. Objects
whose membership to $X$ is consistent with the knowledge
can be understood as credible members of $X$,
while objects whose membership to $X$ is presumable
from the information and the knowledge can be
understood as plausible members. For convenience, we
define $P(x) = \{y \in U \mid (y, x) \in P\}$, which is the set of objects
whose membership to $X$ is presumed from the fact
$x \in X$. Therefore, if $x \in X$ satisfies $\forall y \in P(x),\ y \in X$, or
simply $P(x) \subseteq X$, $x$ can be considered a credible member
of $X$. Thus, the set of credible members of $X$ is
defined by


$$P_*(X) = \{x \in X \mid P(x) \subseteq X\} = X \cap \{x \in U \mid P(x) \subseteq X\}. \tag{26.1}$$
On the other hand, we may presume $x \in X$ if $x \in X$ or
$\exists y \in X,\ x \in P(y)$ under the information and the knowledge.
Then the set of plausible members of $X$ can be
defined by


$$P^*(X) = X \cup \{x \in U \mid \exists y \in X,\ x \in P(y)\}. \tag{26.2}$$


$P_*(X)$ is called the positive region of $X$ and $P^*(X)$ is
called the possible region of $X$. Note that we do not assume
the reflexivity of $P$, i.e., $\forall x \in U,\ (x, x) \in P$. This
is why we take the intersection with $X$ in the definition
of $P_*(X)$ and the union with $X$ in the definition
of $P^*(X)$. This intersection and union can be dropped
when $P$ is reflexive.
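
The role of the intersection with $X$ in (26.1) can be seen on a small crisp example. The sketch below assumes a hypothetical universe, relation, and set chosen for illustration; the function names are not from the source.

```python
def image(P, x, U):
    """P(x) = {y in U | (y, x) in P}."""
    return {y for y in U if (y, x) in P}

def positive_region(P, X, U):
    """Credible members, following (26.1): {x in X | P(x) is a subset of X}."""
    return {x for x in X if image(P, x, U) <= X}

def possible_region(P, X, U):
    """Plausible members, following (26.2): X together with every P(y), y in X."""
    return set(X) | {x for x in U for y in X if x in image(P, y, U)}

# Hypothetical data; (y, x) in P reads "if x fits the concept, presume y fits too"
U = {1, 2, 3, 4, 5}
P = {(2, 1), (3, 2), (4, 4)}      # not reflexive
X = {1, 2, 5}

pos = positive_region(P, X, U)
poss = possible_region(P, X, U)
unguarded = {x for x in U if image(P, x, U) <= X}   # (26.1) without the intersection with X

print(pos, poss, unguarded)
```

Here the positive region is {1, 5} and the possible region is {1, 2, 3, 5}, so the positive region is contained in $X$, which is contained in the possible region. Dropping the intersection with $X$ would admit object 3, which is not in $X$ but has an empty image under this non-reflexive relation.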

Suppose now that the knowledge about $C$ is expressed by
a binary relation $Q \subseteq U \times U$ instead of $P$. Under the binary
relation $Q$, we presume $y$ does not fit with $C$ from facts
$(y, x) \in Q$ and $x$ does not fit with $C$. In this case, we
directly obtain positive and possible regions of $U - X$,
respectively, by


$$Q_*(U - X) = \{x \in U - X \mid Q(x) \subseteq U - X\} = (U - X) \cap \{x \in U \mid Q(x) \subseteq U - X\}, \tag{26.3}$$


$$Q^*(U - X) = (U - X) \cup \{x \in U \mid \exists y \in U - X,\ x \in Q(y)\}. \tag{26.4}$$

Because an object that is not a member of $Q_*(U - X)$
can be seen as a plausible member of $X$ and an object
which is not a member of $Q^*(U - X)$ can be seen as
a credible member of $X$, we may define positive and
possible regions of $X$ by


$$\bar{Q}_*(X) = U - Q^*(U - X), \qquad \bar{Q}^*(X) = U - Q_*(U - X). \tag{26.5}$$


Inuiguchi [26.50] investigated the properties of those 
positive and possible regions. 
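
The construction through the complement can be traced step by step on a small example. This is a sketch under assumptions: the universe, the set $X$, and the relation $Q$ are hypothetical, and the helper names are chosen here for illustration.

```python
def image(R, x, U):
    """R(x) = {y in U | (y, x) in R}."""
    return {y for y in U if (y, x) in R}

def positive_region(R, A, U):
    """{x in A | R(x) is a subset of A}, the pattern of (26.3)."""
    return {x for x in A if image(R, x, U) <= A}

def possible_region(R, A, U):
    """A together with every R(y), y in A, the pattern of (26.4)."""
    return set(A) | {x for x in U for y in A if x in image(R, y, U)}

# Hypothetical data; (y, x) in Q reads "if x does not fit the concept, presume y does not fit"
U = {1, 2, 3, 4, 5}
X = {1, 2, 5}
Q = {(1, 3), (4, 3)}

complement = U - X
q_pos = positive_region(Q, complement, U)    # positive region of U - X
q_poss = possible_region(Q, complement, U)   # possible region of U - X

# Regions of X obtained by complementation, following (26.5)
credible_X = U - q_poss
plausible_X = U - q_pos

print(q_pos, q_poss, credible_X, plausible_X)
```

With these data, the regions of $U - X$ are {4} and {1, 3, 4}, so the credible members of $X$ are {2, 5} and the plausible members are {1, 2, 3, 5}. Object 1 loses its credible status because it is presumed not to fit from object 3.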


Definitions in Fuzzy Setting and Their Properties
We now extend those definitions of positive and possible
regions into the fuzzy setting. First, we assume
a fuzzy set $X \subseteq U$ and a fuzzy binary relation
$P \subseteq U \times U$ are given. Their membership functions
$\mu_X(x)$ and $\mu_P(y, x)$ show the membership degree
of $x \in U$ to a fuzzy set $X$ and the degree to
what extent we presume that $y$ is a member of
$X$ from the fact $x$ is a member of a fuzzy set
$X$, where $\mu_X : U \to [0, 1]$ and $\mu_P : U \times U \to [0, 1]$. We
define $P(x)$ by its membership function
$\mu_{P(x)}(y) = \mu_P(y, x)$.

To define the positive region under this circumstance,
we should consider the consistency degree of
the information that $x$ is a member of $X$ to membership
degree $\mu_X(x)$ with the knowledge $P$. This can be
measured by the truth value of the statement "$y \in P(x)$
implies $y \in X$" under fuzzy sets $P(x)$ and $X$. The truth
value of this statement can be defined by a necessity
measure $\inf_{y \in U} I(\mu_{P(x)}(y), \mu_X(y))$ with an implication
function $I : [0, 1] \times [0, 1] \to [0, 1]$ such that $I(0, 0) =
I(0, 1) = I(1, 1) = 1$, $I(1, 0) = 0$, $I(\cdot, a)$ is decreasing
for any $a \in [0, 1]$ and $I(a, \cdot)$ is increasing for any
$a \in [0, 1]$. Therefore, in the analogy to (26.1), the membership
function of the positive region $P_*(X)$ of $X$ can be
defined by


$$\mu_{P_*(X)}(x) = \min\left(\mu_X(x), \inf_{y \in U} I(\mu_{P(x)}(y), \mu_X(y))\right) = \min\left(\mu_X(x), \inf_{y \in U} I(\mu_P(y, x), \mu_X(y))\right), \tag{26.6}$$

where we note the intersection $C \cap D$ of two fuzzy
sets $C, D \subseteq U$ is normally defined by $\mu_{C \cap D}(x) =
\min(\mu_C(x), \mu_D(x))$, $\forall x \in U$. $\mu_{C \cap D}$, $\mu_C$ and $\mu_D$ are
membership functions of $C \cap D$, $C$ and $D$. However,
some researchers use t-norms [26.51] instead
of the min operation. A t-norm $t$ is a conjunction
function $t : [0, 1] \times [0, 1] \to [0, 1]$ such that
(t1) $\forall a \in [0, 1]$, $t(a, 1) = t(1, a) = a$ (boundary condition),
(t2) $\forall a, b \in [0, 1]$, $t(a, b) = t(b, a)$ (commutativity) and
(t3) $\forall a, b, c \in [0, 1]$, $t(a, t(b, c)) = t(t(a, b), c)$ (associativity).

Now let us define the possible region when $X$
and $P$ are a fuzzy set and a fuzzy binary relation,
respectively. To do this, we should define the truth
value of the statement "there exists $y \in X$ such that
$x \in P(y)$" under fuzzy sets $X$ and $P(y)$. The truth value
of this statement can be obtained by a possibility
measure $\sup_{y \in U} T(\mu_{P(y)}(x), \mu_X(y))$ with a conjunction
function $T : [0, 1] \times [0, 1] \to [0, 1]$ such that $T(1, 1) = 1$,
$T(0, 0) = T(0, 1) = T(1, 0) = 0$ and $T$ is increasing in
both arguments. Therefore, in the analogy to (26.2), the
membership function of the possible region $P^*(X)$ of $X$
can be defined by


$$\mu_{P^*(X)}(x) = \max\left(\mu_X(x), \sup_{y \in U} T(\mu_{P(y)}(x), \mu_X(y))\right) = \max\left(\mu_X(x), \sup_{y \in U} T(\mu_P(x, y), \mu_X(y))\right), \tag{26.7}$$

where we note the union $C \cup D$ of two fuzzy
sets $C, D \subseteq U$ is normally defined by $\mu_{C \cup D}(x) =
\max(\mu_C(x), \mu_D(x))$, $\forall x \in U$. $\mu_{C \cup D}$ is the membership
function of $C \cup D$. However, some researchers use
t-conorms [26.51] instead of the max operation. A t-conorm
$s$ is a function $s : [0, 1] \times [0, 1] \to [0, 1]$ such that
(s1) $\forall a \in [0, 1]$, $s(a, 0) = s(0, a) = a$ (boundary condition),
(s2) $\forall a, b \in [0, 1]$, $s(a, b) = s(b, a)$ (commutativity),
(s3) $\forall a, b, c \in [0, 1]$, $s(a, s(b, c)) = s(s(a, b), c)$
(associativity), and (s4) $\forall a, b, c, d$ such that $a \ge c$ and
$b \ge d$, $s(a, b) \ge s(c, d)$ (monotonicity).
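
On a finite universe the infimum and supremum in (26.6) and (26.7) reduce to a minimum and a maximum, so both regions can be computed directly. The following sketch makes several assumptions for illustration: the Kleene-Dienes implication $\max(1 - a, b)$ is taken as $I$, the minimum as $T$, and the universe, membership degrees, and relation are hypothetical.

```python
def implication(a, b):
    """Kleene-Dienes implication max(1 - a, b); one admissible choice of I."""
    return max(1.0 - a, b)

def conjunction(a, b):
    """Minimum; one admissible choice of T."""
    return min(a, b)

def positive_region(mu_X, mu_P, U):
    """(26.6): min(mu_X(x), inf over y of I(mu_P(y, x), mu_X(y)))."""
    return {x: min(mu_X[x],
                   min(implication(mu_P.get((y, x), 0.0), mu_X[y]) for y in U))
            for x in U}

def possible_region(mu_X, mu_P, U):
    """(26.7): max(mu_X(x), sup over y of T(mu_P(x, y), mu_X(y)))."""
    return {x: max(mu_X[x],
                   max(conjunction(mu_P.get((x, y), 0.0), mu_X[y]) for y in U))
            for x in U}

# Hypothetical fuzzy set and sparse, non-reflexive fuzzy relation (missing pairs have degree 0)
U = ["a", "b", "c"]
mu_X = {"a": 1.0, "b": 0.5, "c": 0.0}
mu_P = {("b", "a"): 0.75, ("c", "b"): 0.5}

pos = positive_region(mu_X, mu_P, U)
poss = possible_region(mu_X, mu_P, U)
print(pos)
print(poss)
```

For these data the positive region has degrees 0.5, 0.5, 0 and the possible region has degrees 1, 0.75, 0.5 on a, b, c, so each positive degree is at most the membership degree in $X$, which in turn is at most the possible degree.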

Note that we do not assume the reflexivity of
$P$, i.e., $\mu_P(x, x) = 1$, $\forall x \in U$, so that we take the
minimum between $\mu_X(x)$ and $\inf_{y \in U} I(\mu_{P(x)}(y), \mu_X(y))$
in (26.6) and the maximum between $\mu_X(x)$ and
$\sup_{y \in U} T(\mu_{P(y)}(x), \mu_X(y))$ in (26.7). When $P$ is reflexive,
$I(1, a) \le a$ and $T(1, a) = a$ for all $a \in [0, 1]$, we
have


$$\mu_{P_*(X)}(x) = \inf_{y \in U} I(\mu_{P(x)}(y), \mu_X(y)), \qquad \mu_{P^*(X)}(x) = \sup_{y \in U} T(\mu_{P(y)}(x), \mu_X(y)). \tag{26.8}$$

These definitions of lower and upper approximations have been proposed by Dubois and Prade [26.4,5] and Radzikowska and Kerre [26.9]. They assumed the reflexivity of $P$ and $I(1, a) = T(1, a) = a$ for all $a \in [0, 1]$. Moreover, the definitions of (26.8) are used even when $P$ is not reflexive and neither $I$ nor $T$ satisfies the boundary conditions $I(1, a) = T(1, a) = a$ for all $a \in [0, 1]$ [26.15, 52]. In such a generalized situation, we may lose the inclusion of $P_*(X)$ in $X$ and that of $X$ in $P^*(X)$ for $P_*(X)$ and $P^*(X)$ defined by (26.8). The definitions of $P_*(X)$ and $P^*(X)$ by (26.6) and (26.7), obtained from the interpretations of positive and possible regions of $X$, satisfy the inclusion of $P_*(X)$ in $X$ and that of $X$ in $P^*(X)$ even in the generalized situation.
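The min/max forms of (26.6) and (26.7) can be sketched on a finite universe as follows. This is only an illustration under stated assumptions: the universe, the membership values, and the function names are hypothetical, the Łukasiewicz implication and the minimum conjunction are chosen only as examples, and the convention $\mu_{P(x)}(y) = \mu_P(y, x)$ is assumed.

```python
from typing import Callable, Dict, Tuple

Fuzzy = Dict[str, float]
Relation = Dict[Tuple[str, str], float]


def lukasiewicz_implication(a: float, b: float) -> float:
    """Example implication I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def t_min(a: float, b: float) -> float:
    """Example conjunction T(a, b) = min(a, b)."""
    return min(a, b)


def positive_region(mu_X: Fuzzy, mu_P: Relation,
                    I: Callable[[float, float], float]) -> Fuzzy:
    """min(mu_X(x), inf_y I(mu_P(y, x), mu_X(y))), in the spirit of (26.6)."""
    U = list(mu_X)
    return {x: min(mu_X[x], min(I(mu_P[(y, x)], mu_X[y]) for y in U))
            for x in U}


def possible_region(mu_X: Fuzzy, mu_P: Relation,
                    T: Callable[[float, float], float]) -> Fuzzy:
    """max(mu_X(x), sup_y T(mu_P(x, y), mu_X(y))), in the spirit of (26.7)."""
    U = list(mu_X)
    return {x: max(mu_X[x], max(T(mu_P[(x, y)], mu_X[y]) for y in U))
            for x in U}


# Hypothetical data: a non-reflexive fuzzy relation on U = {a, b, c}.
mu_X = {"a": 1.0, "b": 0.6, "c": 0.2}
mu_P = {(u, v): 0.0 for u in mu_X for v in mu_X}
mu_P.update({("a", "a"): 1.0, ("b", "b"): 0.8, ("c", "c"): 1.0,
             ("a", "b"): 0.7, ("b", "a"): 0.3, ("b", "c"): 0.5,
             ("c", "b"): 0.4, ("c", "a"): 0.9})

lower = positive_region(mu_X, mu_P, lukasiewicz_implication)
upper = possible_region(mu_X, mu_P, t_min)
```

Because of the outer min and max, the inclusions of the lower region in $X$ and of $X$ in the upper region hold here even though the relation is not reflexive.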

Using the positive region $P_*(X)$ and the possible region $P^*(X)$, we can define a fuzzy rough set of $X$ as a pair $(P_*(X), P^*(X))$. We call such fuzzy rough sets classification-oriented fuzzy rough sets under a positively extensive relation $P$ of $X$ (for short, CP-fuzzy rough sets). Note that the relation $P$ depends on the meaning of a set $X$. Thus, we cannot always define the CP-rough set of $U - X$ by the same relation $P$.

To define a CP-rough set of $U - X$, we should introduce another fuzzy relation $Q \subseteq U \times U$ such that $\mu_{Q(x)}(y) = \mu_Q(y, x)$ represents the degree to which we presume an object $y$ to be a member of $U - X$ from the fact that $x$ is a member of $U - X$, where $\mu_Q: U \times U \to [0, 1]$ is the membership function of the fuzzy relation $Q$. In the same way, we define positive and possible regions of $U - X$ under the fuzzy relation $Q$ by the following membership functions


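$$\mu_{Q_*(U-X)}(x) = \min\Bigl(n(\mu_X(x)), \inf_{y \in U} I\bigl(\mu_Q(y, x), n(\mu_X(y))\bigr)\Bigr), \tag{26.9}$$

$$\mu_{Q^*(U-X)}(x) = \max\Bigl(n(\mu_X(x)), \sup_{y \in U} T\bigl(\mu_Q(x, y), n(\mu_X(y))\bigr)\Bigr), \tag{26.10}$$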


where $U - X$ is defined by a membership function $n(\mu_X(\cdot))$ and $n: [0, 1] \to [0, 1]$ is a strong negation, i.e., a decreasing function such that $n(n(a)) = a$, $a \in [0, 1]$ (involutive). The involution implies the continuity of $n$.

Using $Q_*(X)$ and $Q^*(X)$, in analogy to (26.5), we can define the positive region $Q_*(X)$ and the possible region $Q^*(X)$ of $X$ by the following membership functions


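$$\mu_{Q_*(X)}(x) = \min\Bigl(\mu_X(x), \inf_{y \in U} n\bigl(T(\mu_Q(x, y), n(\mu_X(y)))\bigr)\Bigr), \tag{26.11}$$

$$\mu_{Q^*(X)}(x) = \max\Bigl(\mu_X(x), \sup_{y \in U} n\bigl(I(\mu_Q(y, x), n(\mu_X(y)))\bigr)\Bigr). \tag{26.12}$$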


We can define a fuzzy rough set of $X$ as a pair $(Q_*(X), Q^*(X))$ with the positive region $Q_*(X)$ and the possible region $Q^*(X)$. We call this type of rough sets classification-oriented fuzzy rough sets under a negatively extensive relation $Q$ of $X$ (for short, CN-fuzzy rough sets).

Let us discuss the properties of CP- and CN-fuzzy rough sets. By definition, we have


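$$P_*(X) \subseteq X \subseteq P^*(X), \qquad Q_*(X) \subseteq X \subseteq Q^*(X), \tag{26.13}$$

$$P_*(\emptyset) = P^*(\emptyset) = Q_*(\emptyset) = Q^*(\emptyset) = \emptyset, \tag{26.14}$$

$$P_*(U) = P^*(U) = Q_*(U) = Q^*(U) = U, \tag{26.15}$$

$$P_*(X \cap Y) = P_*(X) \cap P_*(Y), \qquad P^*(X \cup Y) = P^*(X) \cup P^*(Y), \tag{26.16}$$

$$Q_*(X \cap Y) = Q_*(X) \cap Q_*(Y), \qquad Q^*(X \cup Y) = Q^*(X) \cup Q^*(Y), \tag{26.17}$$

$$X \subseteq Y \text{ implies } P_*(X) \subseteq P_*(Y), \qquad X \subseteq Y \text{ implies } P^*(X) \subseteq P^*(Y), \tag{26.18}$$

$$X \subseteq Y \text{ implies } Q_*(X) \subseteq Q_*(Y), \qquad X \subseteq Y \text{ implies } Q^*(X) \subseteq Q^*(Y), \tag{26.19}$$

$$P_*(X \cup Y) \supseteq P_*(X) \cup P_*(Y), \qquad P^*(X \cap Y) \subseteq P^*(X) \cap P^*(Y), \tag{26.20}$$

$$Q_*(X \cup Y) \supseteq Q_*(X) \cup Q_*(Y), \qquad Q^*(X \cap Y) \subseteq Q^*(X) \cap Q^*(Y), \tag{26.21}$$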




where the inclusion relation $A \subseteq B$ between two fuzzy sets $A$ and $B$ is defined by $\mu_A(x) \leq \mu_B(x)$ for all $x \in U$.

The properties satisfied under some conditions are 
listed as follows (see Inuiguchi [26.37]): 


(1) When $I(a, b) = n(T(a, n(b)))$ for all $a, b \in [0, 1]$ and $Q$ is the converse of $P$, i.e., $\mu_Q(x, y) = \mu_P(y, x)$ for all $x, y \in U$, we have

$$P_*(X) = U - Q^*(U - X) = Q_*(X), \tag{26.22}$$

$$P^*(X) = U - Q_*(U - X) = Q^*(X). \tag{26.23}$$

(2) When $T(a, I(a, b)) \leq b$ holds for all $a, b \in [0, 1]$, we have

$$X \supseteq P^*(P_*(X)) \supseteq P_*(X) \supseteq P_*(P_*(X)), \tag{26.24}$$

$$X \subseteq Q_*(Q^*(X)) \subseteq Q^*(X) \subseteq Q^*(Q^*(X)). \tag{26.25}$$

(3) When $I(a, T(a, b)) \geq b$ holds for all $a, b \in [0, 1]$, we have

$$X \subseteq P_*(P^*(X)) \subseteq P^*(X) \subseteq P^*(P^*(X)), \tag{26.26}$$

$$X \supseteq Q^*(Q_*(X)) \supseteq Q_*(X) \supseteq Q_*(Q_*(X)). \tag{26.27}$$

(4) Let $P$ and $Q$ be $T'$-transitive. The following assertions are valid:
(a) When $I$ is upper semicontinuous and satisfies $I(a, I(b, c)) = I(T'(b, a), c)$ for all $a, b, c \in [0, 1]$, we have

$$P_*(P_*(X)) = P_*(X), \qquad Q^*(Q^*(X)) = Q^*(X). \tag{26.28}$$

(b) When $T = T'$ is lower semicontinuous and satisfies $T(a, T(b, c)) = T(T(a, b), c)$ for all $a, b, c \in [0, 1]$ (associativity), we have

$$P^*(P^*(X)) = P^*(X), \qquad Q_*(Q_*(X)) = Q_*(X). \tag{26.29}$$

(5) When $P$ and $Q$ are reflexive and $T$-transitive, the following assertions are valid:
(a) If $I(a, \cdot)$ is upper semicontinuous, $I(1, a) \leq a$, and $T = \xi[I]$ is associative, then we have

$$P^*(P_*(X)) = P_*(X), \qquad Q_*(Q^*(X)) = Q^*(X). \tag{26.30}$$

(b) If $I(a, b) = n(\xi[I](a, n(b)))$ and the conditions of (a) are satisfied, then we have

$$P_*(P^*(X)) = P^*(X), \qquad Q^*(Q_*(X)) = Q_*(X). \tag{26.31}$$

Here a fuzzy relation $P$ is said to be $T'$-transitive if and only if $P$ satisfies $\mu_P(x, z) \geq T'(\mu_P(x, y), \mu_P(y, z))$ for all $x, y, z \in U$ and for a conjunction function $T'$. We can generate a function $\xi[I]: [0, 1] \times [0, 1] \to [0, 1]$ by $\xi[I](a, b) = \inf\{s \in [0, 1] \mid I(a, s) \geq b\}$ when a function $I: [0, 1] \times [0, 1] \to [0, 1]$ is given. $\xi[I]$ is a conjunction function when $I$ satisfies $I(1, a) < 1$ for all $a \in [0, 1)$.

Concerning the assumption of (1), it is known that a function $I'$ defined by $I'(a, b) = n(T(a, n(b)))$ is an implication function and that a function $T'$ defined by $T'(a, b) = n(I(a, n(b)))$ is a conjunction function (see, for example, Inuiguchi and Sakawa [26.51,53]). The assumption of (2) corresponds to modus ponens, i.e., $A$ and $(A \to B)$ implies $B$. Therefore, it is a natural assumption. However, it does not hold for every pair of implication and conjunction functions. For example, consider the functions $T(a, b) = \min(a, b)$ and $I(a, b) = \max(1 - a, b)$, which are often used in possibility theory. Then $T(a, I(a, b)) \leq b$ does not always hold: for $a = 0.5$ and $b = 0$ we obtain $T(a, I(a, b)) = 0.5 > b$. On the other hand, the assumption holds for any $T$ and $I$ such that $T(a, b) \leq \min(a, b)$ for all $a, b \in [0, 1]$ and $I(a, b) \leq b$ for all $a, b \in [0, 1]$ satisfying $a > b$. Thus, a t-norm $T$ and a residual implication $I$ of a t-norm satisfy the assumption, where $I$ is defined by $I(a, b) = \sup\{s \in [0, 1] \mid t(a, s) \leq b\}$ for $a, b \in [0, 1]$. The assumption of (3) is dual with that of (2). Namely, for any implication function $I$, there exists a conjunction function $T'$ such that $I(a, b) = n(T'(a, n(b)))$, and for any conjunction function $T$, there exists an implication function $I'$ such that $T(a, b) = n(I'(a, n(b)))$. Using $T'$ and $I'$, the assumption $I(a, T(a, b)) \geq b$ is equivalent to $T'(a, I'(a, b)) \leq b$, which is the same as the assumption of (2).
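The modus ponens condition $T(a, I(a, b)) \leq b$ of (2) can be checked numerically on a grid. The sketch below is an illustration only: the function names, the grid resolution, and the tolerance are assumptions, and the Łukasiewicz pair is used as one example of a t-norm together with its residual implication.

```python
def t_min(a: float, b: float) -> float:
    """Minimum conjunction."""
    return min(a, b)


def i_dienes(a: float, b: float) -> float:
    """Implication max(1 - a, b) used in possibility theory."""
    return max(1.0 - a, b)


def t_luk(a: float, b: float) -> float:
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def i_luk(a: float, b: float) -> float:
    """Residual implication of the Lukasiewicz t-norm."""
    return min(1.0, 1.0 - a + b)


def violations(T, I, steps: int = 20, tol: float = 1e-12):
    """Grid points (a, b) at which T(a, I(a, b)) <= b fails."""
    grid = [k / steps for k in range(steps + 1)]
    return [(a, b) for a in grid for b in grid if T(a, I(a, b)) > b + tol]


bad_pairs = violations(t_min, i_dienes)   # non-empty: condition fails
ok_pairs = violations(t_luk, i_luk)       # empty on this grid
```

A grid check of this kind can only refute the condition, not prove it; for the Łukasiewicz pair the condition also follows analytically.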

The assumption of (3) is satisfied with $I$ and $T$ such that $I(a, b) \geq \max(n(a), b)$ for all $a, b \in [0, 1]$ and $T(a, b) \geq b$ for all $a, b \in [0, 1]$ satisfying $a > n(b)$. The assumption of (4)-(a) is satisfied with residual implication functions of lower semicontinuous t-norms $T'$ and S-implication functions with respect to lower semicontinuous t-norms $T'$, where an S-implication function $I$ with respect to the t-norm $T'$ is defined by $I(a, b) = n(T'(a, n(b)))$, $a, b \in [0, 1]$, with a strong negation $n$. The assumption of (4)-(b) is satisfied with lower semicontinuous t-norms $T$. These assumptions are satisfied by many well-known implication and conjunction functions.




26.2.2 Approximation-Oriented Fuzzy Rough Sets


Definitions in Crisp Setting 

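In this section, we define fuzzy rough sets under the interpretation of rough sets as approximations of sets and describe their properties. We first describe the definitions of lower and upper approximations in the crisp setting. We assume that a family of subsets of $U$, $\mathcal{F} = \{F_i \mid i = 1, 2, \ldots, p\}$, is given. Each elementary set $F_i$ is a meaningful set of objects, such as a set of objects satisfying some properties. The sets $F_i$ can be seen as information granules with which we would like to express a set of objects. Given a set $X \subseteq U$, an understated expression of $X$, or in other words, an inner approximation of $X$ by means of unions of the $F_i$, is obtained by

$$\mathcal{F}_*^{\cup}(X) = \bigcup \{F_i \in \mathcal{F} \mid F_i \subseteq X\}. \tag{26.32}$$

On the other hand, an overstated expression of $X$, or in other words, an outer approximation of $X$ by means of unions of the $F_i$, is obtained by

$$\mathcal{F}^*_{\cap}(X) = \bigcap \Bigl\{ \bigcup_{i \in J} F_i \Bigm| \bigcup_{i \in J} F_i \supseteq X,\ J \subseteq \{1, 2, \ldots, p, o\} \Bigr\}, \tag{26.33}$$

where we define $F_o = U$. We add $F_o$ considering cases where there is no $J \subseteq \{1, 2, \ldots, p\}$ such that $\bigcup_{i \in J} F_i \supseteq X$. In such cases, we obtain $\mathcal{F}^*_{\cap}(X) = U$ owing to the existence of $F_o = U$. $\mathcal{F}_*^{\cup}(X)$ and $\mathcal{F}^*_{\cap}(X)$ are called lower and upper approximations of $X$, respectively. Applying those approximations to $U - X$, we obtain $\mathcal{F}_*^{\cup}(U - X)$ and $\mathcal{F}^*_{\cap}(U - X)$. From those, we obtain

$$\mathcal{F}_*^{\cap}(X) = U - \mathcal{F}^*_{\cap}(U - X) = \bigcup \Bigl\{ U - \bigcup_{i \in J} F_i \Bigm| \bigcup_{i \in J} F_i \supseteq U - X,\ J \subseteq \{1, 2, \ldots, p, o\} \Bigr\}, \tag{26.34}$$

$$\mathcal{F}^*_{\cup}(X) = U - \mathcal{F}_*^{\cup}(U - X) = \bigcap \{ U - F_i \mid F_i \subseteq U - X,\ i \in \{1, 2, \ldots, p, e\} \}, \tag{26.35}$$

where we define $F_e = \emptyset$. We note that $\mathcal{F}_*^{\cap}(X)$ and $\mathcal{F}^*_{\cup}(X)$ are not always the same as $\mathcal{F}_*^{\cup}(X)$ and $\mathcal{F}^*_{\cap}(X)$, respectively. The properties of those lower and upper approximations are studied by Inuiguchi [26.50].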
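The four crisp approximations (26.32) to (26.35) can be sketched with Python sets. This is a minimal illustration, assuming a small hypothetical universe and family; the function names are not from the source, and the brute-force enumeration of index sets $J$ is only practical for small families.

```python
from itertools import combinations


def lower_union(family, X):
    """(26.32): union of all elementary sets contained in X."""
    out = set()
    for F in family:
        if F <= X:
            out |= F
    return out


def upper_intersection(family, X, U):
    """(26.33): intersection of all unions of elementary sets covering X.

    The whole universe U is appended to the family, playing the role of
    the extra set that guarantees at least one cover exists.
    """
    sets = list(family) + [set(U)]
    out = set(U)
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            union = set().union(*combo)
            if union >= X:
                out &= union
    return out


def lower_intersection(family, X, U):
    """(26.34): complement of the (26.33)-type upper approximation of U - X."""
    return set(U) - upper_intersection(family, set(U) - X, U)


def upper_union(family, X, U):
    """(26.35): complement of the (26.32)-type lower approximation of U - X."""
    return set(U) - lower_union(family, set(U) - X)


# Hypothetical example in which the two lower (and two upper) approximations differ.
U = set(range(1, 7))
family = [{1, 2}, {2, 3}, {4, 5}]
X = {1, 2, 3, 4}
```

With these data the union-based lower approximation is $\{1, 2, 3\}$ while the intersection-based one is empty, which illustrates the remark that the two pairs need not coincide.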


Definitions by Certainty Qualifications in Fuzzy Setting
We extend those lower and upper approximations to cases where $\mathcal{F}$ is a family of fuzzy sets in $U$ and $X$ is a fuzzy set in $U$. To do this, we extend the intersection, union, complement, and the inclusion relation into the fuzzy setting. The intersection, union, and complement are defined by the min operation, the max operation and a strong negation $n$, i.e., $C \cap D$, $C \cup D$ and $U - C$ for fuzzy sets $C$ and $D$ are defined by the membership functions $\mu_{C \cap D}(x) = \min(\mu_C(x), \mu_D(x))$, $\mu_{C \cup D}(x) = \max(\mu_C(x), \mu_D(x))$, and $\mu_{U - C}(x) = n(\mu_C(x))$, $\forall x \in U$, respectively. The inclusion relation $C \subseteq D$ is extended to an inclusion relation with degree $\mathrm{Inc}(C, D) = \inf_{x \in U} I(\mu_C(x), \mu_D(x))$, where $I$ is an implication function.

First let us define a lower approximation by extending (26.32). In (26.32), before applying the union, we collect the $F_i$ such that $F_i \subseteq X$. This procedure cannot be extended simply into the fuzzy setting, because the inclusion relation has a degree showing to what extent the inclusion holds in the fuzzy setting. Namely, each $F_i$ has a degree $q_i = \mathrm{Inc}(F_i, X)$. This means that $X$ includes $F_i$ to a degree $q_i$. Therefore, by using $F_i$, $X$ is expressed as a fuzzy set including $F_i$ to a degree $q_i$. In other words, $X$ is a fuzzy set $Y$ satisfying

$$\mathrm{Inc}(F_i, Y) = \inf_{x \in U} I\bigl(\mu_{F_i}(x), \mu_Y(x)\bigr) = q_i. \tag{26.36}$$

We note that there exists a solution satisfying (26.36) because $q_i$ is defined by $\mathrm{Inc}(F_i, X)$. There can be many solutions $Y$ satisfying (26.36), and the intersection and union of those solutions can be seen as inner and outer approximations of $X$ by $F_i$. Because we are now extending (26.32) and are interested in the lower approximation, we consider the intersection of fuzzy sets including $F_i$ to a degree $q_i$. Let us consider

$$\mathrm{Inc}(F_i, Y) = \inf_{x \in U} I\bigl(\mu_{F_i}(x), \mu_Y(x)\bigr) \geq q_i, \tag{26.37}$$


instead of (26.36). Equation (26.37) is called a converse-certainty qualification [26.10] (or possibility-qualification). Because $I(a, \cdot)$ is increasing for any $a \in [0, 1]$ for an implication function $I$ and (26.36) has a solution, the intersection of solutions of (26.36) is the same as the intersection of solutions of (26.37). Moreover, because $I$ is upper semicontinuous, we obtain the intersection of solutions of (26.37) as the smallest solution $\check{Y}$ of (26.37) defined by the following membership function

$$\mu_{\check{Y}}(x) = \inf\{s \in [0, 1] \mid I(\mu_{F_i}(x), s) \geq q_i\} = \xi[I](\mu_{F_i}(x), q_i). \tag{26.38}$$

We have $\check{Y} \subseteq X$. Because we have many $F_i \in \mathcal{F}$, the lower approximation $\mathcal{F}_*^{\xi}(X)$ of $X$ is defined by the following membership function

$$\mu_{\mathcal{F}_*^{\xi}(X)}(x) = \sup_{F \in \mathcal{F}} \xi[I]\Bigl(\mu_F(x), \inf_{y \in U} I\bigl(\mu_F(y), \mu_X(y)\bigr)\Bigr), \tag{26.39}$$


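where $\mathcal{F}$ can have infinitely many elementary fuzzy sets $F$.

Because (26.32) is extended to (26.39), (26.35) is extended to the following equation in the sense that $\mathcal{F}^*_{\xi}(X) = U - \mathcal{F}_*^{\xi}(U - X)$:

$$\mu_{\mathcal{F}^*_{\xi}(X)}(x) = \inf_{F \in \mathcal{F}} n\Bigl(\xi[I]\bigl(\mu_F(x), \inf_{y \in U} I(\mu_F(y), n(\mu_X(y)))\bigr)\Bigr), \tag{26.40}$$

where $\mu_{\mathcal{F}^*_{\xi}(X)}$ is the membership function of the upper approximation $\mathcal{F}^*_{\xi}(X)$ of $X$.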
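The lower and upper approximations (26.39) and (26.40) can be sketched on a finite universe. The following is an illustration under stated assumptions: the implication is taken to be the Łukasiewicz implication $I(a, b) = \min(1, 1 - a + b)$, for which $\xi[I](a, b) = \inf\{s \mid I(a, s) \geq b\}$ has the closed form $\max(0, a + b - 1)$; the negation is $n(a) = 1 - a$; the data and function names are hypothetical.

```python
def i_luk(a: float, b: float) -> float:
    """Lukasiewicz implication (an example choice of I)."""
    return min(1.0, 1.0 - a + b)


def xi_luk(a: float, b: float) -> float:
    """Closed form of xi[I](a, b) for the Lukasiewicz implication."""
    return max(0.0, a + b - 1.0)


def neg(a: float) -> float:
    """Standard strong negation n(a) = 1 - a."""
    return 1.0 - a


def inclusion_degree(mu_F, mu_X):
    """Inc(F, X) = inf_x I(mu_F(x), mu_X(x))."""
    return min(i_luk(mu_F[x], mu_X[x]) for x in mu_X)


def xi_lower(family, mu_X):
    """Lower approximation in the spirit of (26.39)."""
    degrees = [inclusion_degree(F, mu_X) for F in family]
    return {x: max(xi_luk(F[x], q) for F, q in zip(family, degrees))
            for x in mu_X}


def xi_upper(family, mu_X):
    """Upper approximation via (26.40): complement of the lower
    approximation of the complement."""
    complement = {x: neg(v) for x, v in mu_X.items()}
    return {x: neg(v) for x, v in xi_lower(family, complement).items()}


# Hypothetical fuzzy granules and target set on U = {u1, u2, u3}.
F1 = {"u1": 1.0, "u2": 0.5, "u3": 0.0}
F2 = {"u1": 0.0, "u2": 0.5, "u3": 1.0}
mu_X = {"u1": 0.9, "u2": 0.6, "u3": 0.3}

lower = xi_lower([F1, F2], mu_X)
upper = xi_upper([F1, F2], mu_X)
```

Only the inclusion degrees $q_i$ are needed per granule, so the approximation of $X$ is stored as one number per elementary fuzzy set.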

Now let us consider the extension of (26.33). In this case, before applying the intersection, we collect the unions $\bigcup_{i \in J} F_i$ such that $\bigcup_{i \in J} F_i \supseteq X$. In the fuzzy setting, each $\bigcup_{i \in J} F_i$ has a degree $r_J = \mathrm{Inc}(X, \bigcup_{i \in J} F_i)$. This means that $X$ is included in $\bigcup_{i \in J} F_i$ to a degree $r_J$. Therefore, by using $F_i$, $i \in J$, $X$ is expressed as a fuzzy set included in $\bigcup_{i \in J} F_i$ to a degree $r_J$. In other words, $X$ is a fuzzy set $Y$ satisfying

$$\mathrm{Inc}\Bigl(Y, \bigcup_{i \in J} F_i\Bigr) = \inf_{x \in U} I\Bigl(\mu_Y(x), \max_{i \in J} \mu_{F_i}(x)\Bigr) = r_J. \tag{26.41}$$

We note that there exists a solution satisfying (26.41) because $r_J$ is defined by $\mathrm{Inc}(X, \bigcup_{i \in J} F_i)$. There can be many solutions $Y$ satisfying (26.41), and the intersection and union of those solutions can be seen as inner and outer approximations of $X$ by $\bigcup_{i \in J} F_i$. Because we are now extending (26.33) and are interested in the upper approximation, we consider the union of fuzzy sets included in $\bigcup_{i \in J} F_i$ to a degree $r_J$. Let us consider

$$\mathrm{Inc}\Bigl(Y, \bigcup_{i \in J} F_i\Bigr) = \inf_{x \in U} I\Bigl(\mu_Y(x), \max_{i \in J} \mu_{F_i}(x)\Bigr) \geq r_J, \tag{26.42}$$


instead of (26.41). Equation (26.42) is called a certainty qualification [26.10,54]. Because $I(\cdot, a)$ is decreasing for any $a \in [0, 1]$ for an implication function $I$ and (26.41) has a solution, the union of solutions of (26.41) is the same as the union of solutions of (26.42). Moreover, because $I$ is upper semicontinuous, we obtain the union of solutions of (26.42) as the greatest solution $\hat{Y}$ of (26.42) defined by the following membership function

$$\mu_{\hat{Y}}(x) = \sup\Bigl\{s \in [0, 1] \Bigm| I\Bigl(s, \max_{i \in J} \mu_{F_i}(x)\Bigr) \geq r_J\Bigr\} = \sigma[I]\Bigl(r_J, \max_{i \in J} \mu_{F_i}(x)\Bigr), \tag{26.43}$$

where we define $\sigma[I](a, b) = \sup\{s \in [0, 1] \mid I(s, b) \geq a\}$ for $a, b \in [0, 1]$. We have $X \subseteq \hat{Y}$. Because we have many unions $\bigcup_{i \in J} F_i$, the upper approximation $\mathcal{F}^*_{\sigma}(X)$ of $X$ is defined by the following membership function

$$\mu_{\mathcal{F}^*_{\sigma}(X)}(x) = \inf_{\mathcal{T} \subseteq \mathcal{F}} \sigma[I]\Bigl(\inf_{y \in U} I\Bigl(\mu_X(y), \sup_{F \in \mathcal{T}} \mu_F(y)\Bigr), \sup_{F \in \mathcal{T}} \mu_F(x)\Bigr), \tag{26.44}$$


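where $\mathcal{F}$ can have infinitely many elementary fuzzy sets $F$. We note that $\sigma[I]$ becomes an implication function.

Because (26.33) is extended to (26.44), (26.34) is extended to the following equation in the sense that $\mathcal{F}_*^{\sigma}(X) = U - \mathcal{F}^*_{\sigma}(U - X)$:

$$\mu_{\mathcal{F}_*^{\sigma}(X)}(x) = \sup_{\mathcal{T} \subseteq \mathcal{F}} n\Bigl(\sigma[I]\Bigl(\inf_{y \in U} I\Bigl(n(\mu_X(y)), \sup_{F \in \mathcal{T}} \mu_F(y)\Bigr), \sup_{F \in \mathcal{T}} \mu_F(x)\Bigr)\Bigr), \tag{26.45}$$

where $\mu_{\mathcal{F}_*^{\sigma}(X)}$ is the membership function of the lower approximation $\mathcal{F}_*^{\sigma}(X)$ of $X$.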




These four approximations were originally proposed by Inuiguchi and Tanino [26.10]. They selected a pair $(\mathcal{F}_*^{\xi}(X), \mathcal{F}^*_{\sigma}(X))$ to define a rough set of $X$. However, in connection with the crisp case, Inuiguchi [26.37] selected the pairs $(\mathcal{F}_*^{\xi}(X), \mathcal{F}^*_{\xi}(X))$ and $(\mathcal{F}_*^{\sigma}(X), \mathcal{F}^*_{\sigma}(X))$ for the definitions of rough sets of $X$. In this chapter, a pair $(\mathcal{F}_*^{\xi}(X), \mathcal{F}^*_{\xi}(X))$ is called a $\xi$-fuzzy rough set and a pair $(\mathcal{F}_*^{\sigma}(X), \mathcal{F}^*_{\sigma}(X))$ a $\sigma$-fuzzy rough set.


Properties 
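First, we show properties about the representations of lower and upper approximations defined by (26.39), (26.40), (26.44), and (26.45). We have the following equalities (see Inuiguchi [26.37])

$$\mu_{\mathcal{F}_*^{\xi}(X)}(x) = \sup\bigl\{\xi[I](\mu_F(x), h) \bigm| F \in \mathcal{F},\ h \in [0, 1] \text{ such that } \xi[I](\mu_F(y), h) \leq \mu_X(y),\ \forall y \in U\bigr\}, \tag{26.46}$$

$$\mu_{\mathcal{F}_*^{\sigma}(X)}(x) = \sup\Bigl\{n\Bigl(\sigma[I]\Bigl(h, \sup_{F \in \mathcal{T}} \mu_F(x)\Bigr)\Bigr) \Bigm| \mathcal{T} \subseteq \mathcal{F},\ h \in [0, 1] \text{ such that } \sigma[I]\Bigl(h, \sup_{F \in \mathcal{T}} \mu_F(y)\Bigr) \geq n(\mu_X(y)),\ \forall y \in U\Bigr\}, \tag{26.47}$$

$$\mu_{\mathcal{F}^*_{\xi}(X)}(x) = \inf\bigl\{n\bigl(\xi[I](\mu_F(x), h)\bigr) \bigm| F \in \mathcal{F},\ h \in [0, 1] \text{ such that } \xi[I](\mu_F(y), h) \leq n(\mu_X(y)),\ \forall y \in U\bigr\}, \tag{26.48}$$

$$\mu_{\mathcal{F}^*_{\sigma}(X)}(x) = \inf\Bigl\{\sigma[I]\Bigl(h, \sup_{F \in \mathcal{T}} \mu_F(x)\Bigr) \Bigm| \mathcal{T} \subseteq \mathcal{F},\ h \in [0, 1] \text{ such that } \sigma[I]\Bigl(h, \sup_{F \in \mathcal{T}} \mu_F(y)\Bigr) \geq \mu_X(y),\ \forall y \in U\Bigr\}. \tag{26.49}$$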


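Using these equations, the following properties can be easily obtained:

$$\mathcal{F}_*^{\xi}(X) \subseteq X \subseteq \mathcal{F}^*_{\xi}(X), \qquad \mathcal{F}_*^{\sigma}(X) \subseteq X \subseteq \mathcal{F}^*_{\sigma}(X), \tag{26.50}$$

$$\mathcal{F}_*^{\xi}(\emptyset) = \mathcal{F}_*^{\sigma}(\emptyset) = \emptyset, \qquad \mathcal{F}^*_{\xi}(U) = \mathcal{F}^*_{\sigma}(U) = U, \tag{26.51}$$

$$X \subseteq Y \text{ implies } \mathcal{F}_*^{\xi}(X) \subseteq \mathcal{F}_*^{\xi}(Y), \qquad X \subseteq Y \text{ implies } \mathcal{F}_*^{\sigma}(X) \subseteq \mathcal{F}_*^{\sigma}(Y), \tag{26.52}$$

$$X \subseteq Y \text{ implies } \mathcal{F}^*_{\xi}(X) \subseteq \mathcal{F}^*_{\xi}(Y), \qquad X \subseteq Y \text{ implies } \mathcal{F}^*_{\sigma}(X) \subseteq \mathcal{F}^*_{\sigma}(Y), \tag{26.53}$$

$$\mathcal{F}_*^{\xi}(X \cup Y) \supseteq \mathcal{F}_*^{\xi}(X) \cup \mathcal{F}_*^{\xi}(Y), \qquad \mathcal{F}_*^{\sigma}(X \cup Y) \supseteq \mathcal{F}_*^{\sigma}(X) \cup \mathcal{F}_*^{\sigma}(Y), \tag{26.54}$$

$$\mathcal{F}^*_{\xi}(X \cap Y) \subseteq \mathcal{F}^*_{\xi}(X) \cap \mathcal{F}^*_{\xi}(Y), \qquad \mathcal{F}^*_{\sigma}(X \cap Y) \subseteq \mathcal{F}^*_{\sigma}(X) \cap \mathcal{F}^*_{\sigma}(Y), \tag{26.55}$$

$$\mathcal{F}_*^{\xi}(U - X) = U - \mathcal{F}^*_{\xi}(X), \qquad \mathcal{F}_*^{\sigma}(U - X) = U - \mathcal{F}^*_{\sigma}(X). \tag{26.56}$$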


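Furthermore, we can prove the following properties (see Inuiguchi [26.37]):

(7) The following assertions are valid:
(a) If $a > 0$, $b < 1$ imply $I(a, b) < 1$ and $\inf_{x \in U} \sup_{F \in \mathcal{F}} \mu_F(x) > 0$, then we have $\mathcal{F}_*^{\xi}(U) = U$ and $\mathcal{F}^*_{\xi}(\emptyset) = \emptyset$.
(b) If $b < 1$ implies $I(1, b) < 1$ and $\inf_{x \in U} \sup_{F \in \mathcal{F}} \mu_F(x) = 1$, then we have $\mathcal{F}_*^{\xi}(U) = U$ and $\mathcal{F}^*_{\xi}(\emptyset) = \emptyset$.
(c) If $a > 0$, $b < 1$ imply $I(a, b) < 1$ and $\forall x \in U$, $\exists F \in \mathcal{F}$ such that $\mu_F(x) < 1$, then we have $\mathcal{F}_*^{\sigma}(U) = U$ and $\mathcal{F}^*_{\sigma}(\emptyset) = \emptyset$.
(d) If $a > 0$ implies $I(a, 0) < 1$ and $\forall x \in U$, $\exists F \in \mathcal{F}$ such that $\mu_F(x) = 0$, then we have $\mathcal{F}_*^{\sigma}(U) = U$ and $\mathcal{F}^*_{\sigma}(\emptyset) = \emptyset$.

(8) We have

$$\mathcal{F}_*^{\sigma}(X \cap Y) = \mathcal{F}_*^{\sigma}(X) \cap \mathcal{F}_*^{\sigma}(Y), \qquad \mathcal{F}^*_{\sigma}(X \cup Y) = \mathcal{F}^*_{\sigma}(X) \cup \mathcal{F}^*_{\sigma}(Y). \tag{26.57}$$

Moreover, if $\forall a \in [0, 1]$, $I(a, a) = 1$ and $\forall F_i, F_j \in \mathcal{F}$, $F_i \neq F_j$, $F_i \cap F_j = \emptyset$, we have

$$\mathcal{F}_*^{\xi}(X \cap Y) = \mathcal{F}_*^{\xi}(X) \cap \mathcal{F}_*^{\xi}(Y), \qquad \mathcal{F}^*_{\xi}(X \cup Y) = \mathcal{F}^*_{\xi}(X) \cup \mathcal{F}^*_{\xi}(Y). \tag{26.58}$$

(9) We have

$$\mathcal{F}_*^{\xi}(\mathcal{F}_*^{\xi}(X)) = \mathcal{F}_*^{\xi}(X), \qquad \mathcal{F}_*^{\sigma}(\mathcal{F}_*^{\sigma}(X)) = \mathcal{F}_*^{\sigma}(X), \tag{26.59}$$

$$\mathcal{F}^*_{\xi}(\mathcal{F}^*_{\xi}(X)) = \mathcal{F}^*_{\xi}(X), \qquad \mathcal{F}^*_{\sigma}(\mathcal{F}^*_{\sigma}(X)) = \mathcal{F}^*_{\sigma}(X). \tag{26.60}$$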


Inuiguchi and Tanino [26.10] first proposed this type of fuzzy rough sets. They demonstrated the advantage in approximation when $P$ is reflexive and symmetric, $I$ is the Dienes implication, and $T$ is the minimum operation. Inuiguchi and Tanino [26.55] showed that by selecting a necessity measure capable of expressing various inclusion situations, the approximations become better, i.e., the differences between lower and upper approximations satisfying (26.58) become smaller. Moreover, Inuiguchi and Tanino [26.56] applied these fuzzy rough sets to function approximation.


26.2.3 Relations Between Two Kinds of Fuzzy Rough Sets


Under the given fuzzy relations $P$ and $Q$ described in Sect. 26.2.1, we discuss the relations between two kinds of fuzzy rough sets. Families of fuzzy sets are defined by $\mathcal{P} = \{P(x) \mid x \in U\}$ and $\mathcal{Q} = \{Q(x) \mid x \in U\}$. We have the following assertions:

(10) When $P$ and $Q$ are reflexive and $I(1, a) = a$, we have

$$P_*(X) \subseteq \mathcal{P}_*^{\xi}(X), \qquad \mathcal{Q}^*_{\xi}(X) \subseteq Q^*(X). \tag{26.61}$$

(11) When $P$ and $Q$ are reflexive, $X$ is a crisp set, $a \leq b$ if and only if $I(a, b) = 1$, and $T(a, 1) = a$ for all $a \in [0, 1]$, we have

$$\mathcal{P}^*_{\sigma}(X) \subseteq P^*(X), \qquad Q_*(X) \subseteq \mathcal{Q}_*^{\sigma}(X). \tag{26.62}$$

(12) When $P$ and $Q$ are $T$-transitive, the following assertions are valid:
(a) When $T = \xi[I]$ is associative, we have

$$\mathcal{P}_*^{\xi}(X) \subseteq P_*(X), \qquad Q^*(X) \subseteq \mathcal{Q}^*_{\xi}(X). \tag{26.63}$$

(b) When $T = \xi[\sigma[I]]$ and $\sigma[I](a, \sigma[I](b, c)) = \sigma[I](b, \sigma[I](a, c))$ for all $a, b, c \in [0, 1]$, we have

$$P^*(X) \subseteq \mathcal{P}^*_{\sigma}(X), \qquad \mathcal{Q}_*^{\sigma}(X) \subseteq Q_*(X). \tag{26.64}$$

Here we define

$$\zeta[T](a, b) = \sup\{s \in [0, 1] \mid T(a, s) \leq b\}.$$

This functional $\zeta$ can produce an implication function from a conjunction function $T$. Note that $\zeta[\xi[I]] = I$ and $\xi[\zeta[T]] = T$ for upper semicontinuous $I$ and lower semicontinuous $T$ (see Inuiguchi and Sakawa [26.53]).


26.2.4 The Other Approximation-Oriented Fuzzy Rough Sets


Greco et al. [26.24, 25] proposed fuzzy rough sets corresponding to a gradual rule [26.57], "the more an object is in $G$, the more it is in $X$", with fuzzy sets $G$ and $X$. Corresponding to this gradual rule, we may define the lower approximation $G_*^{+}(X)$ of $X$ and the upper approximation $G^*_{+}(X)$ of $X$, respectively, by the following membership functions

$$\mu_{G_*^{+}(X)}(x) = \inf\{\mu_X(z) \mid z \in U,\ \mu_G(z) \geq \mu_G(x)\}, \tag{26.65}$$

$$\mu_{G^*_{+}(X)}(x) = \sup\{\mu_X(z) \mid z \in U,\ \mu_G(z) \leq \mu_G(x)\}. \tag{26.66}$$

When we have a gradual rule, "the less an object is in $G$, the more it is in $X$", we define the lower approximation $G_*^{-}(X)$ of $X$ and the upper approximation $G^*_{-}(X)$ of $X$, respectively, by the following membership functions

$$\mu_{G_*^{-}(X)}(x) = \inf\{\mu_X(z) \mid z \in U,\ \mu_G(z) \leq \mu_G(x)\}, \tag{26.67}$$

$$\mu_{G^*_{-}(X)}(x) = \sup\{\mu_X(z) \mid z \in U,\ \mu_G(z) \geq \mu_G(x)\}. \tag{26.68}$$

Moreover, when a complex gradual rule, "the more an object is in $G^{+}$ and the less it is in $G^{-}$, the more it is in $X$", is given, the lower approximation $\mathcal{G}_*^{\pm}(X)$ and upper approximation $\mathcal{G}^*_{\pm}(X)$ are defined, respectively, by the following equations

$$\mu_{\mathcal{G}_*^{\pm}(X)}(x) = \inf\{\mu_X(z) \mid z \in U,\ \mu_{G^{+}}(z) \geq \mu_{G^{+}}(x),\ \mu_{G^{-}}(z) \leq \mu_{G^{-}}(x)\}, \tag{26.69}$$

$$\mu_{\mathcal{G}^*_{\pm}(X)}(x) = \sup\{\mu_X(z) \mid z \in U,\ \mu_{G^{+}}(z) \leq \mu_{G^{+}}(x),\ \mu_{G^{-}}(z) \geq \mu_{G^{-}}(x)\}, \tag{26.70}$$

where we define $\mathcal{G} = \{G^{+}, G^{-}\}$.
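The gradual-rule approximations of (26.65) to (26.68) need only comparisons of membership degrees, which the following sketch makes explicit on a finite universe. The data and the function names are hypothetical; fuzzy sets are represented as dictionaries from objects to degrees.

```python
def lower_plus(mu_G, mu_X):
    """(26.65): inf of mu_X over objects at least as much in G as x."""
    return {x: min(mu_X[z] for z in mu_X if mu_G[z] >= mu_G[x]) for x in mu_X}


def upper_plus(mu_G, mu_X):
    """(26.66): sup of mu_X over objects at most as much in G as x."""
    return {x: max(mu_X[z] for z in mu_X if mu_G[z] <= mu_G[x]) for x in mu_X}


def lower_minus(mu_G, mu_X):
    """(26.67): inf of mu_X over objects at most as much in G as x."""
    return {x: min(mu_X[z] for z in mu_X if mu_G[z] <= mu_G[x]) for x in mu_X}


def upper_minus(mu_G, mu_X):
    """(26.68): sup of mu_X over objects at least as much in G as x."""
    return {x: max(mu_X[z] for z in mu_X if mu_G[z] >= mu_G[x]) for x in mu_X}


# Hypothetical data: X is roughly, but not exactly, increasing in G.
mu_G = {"a": 0.2, "b": 0.5, "c": 0.7, "d": 1.0}
mu_X = {"a": 0.1, "b": 0.6, "c": 0.4, "d": 0.9}

low = lower_plus(mu_G, mu_X)
up = upper_plus(mu_G, mu_X)
```

In this example the objects b and c violate the rule "the more in $G$, the more in $X$", and it is exactly on these two objects that the lower and upper approximations differ from $X$.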

The fuzzy rough sets are defined by pairs of those lower and upper approximations. This approach is advantageous in that (i) no logical connectives such as implication functions, conjunction functions, etc., are used and (ii) the fuzzy rough sets correspond to gradual rules (see Greco et al. [26.24,25]). However, we need background knowledge about the monotone properties between $G$ (or $\mathcal{G}$) and $X$.

This approach can be seen from the viewpoint of modifier functions of fuzzy sets. A modifier function $\varphi$ is generally a function from $[0, 1]$ to $[0, 1]$ [26.58]. Functions defined by $\varphi_1(x) = x^2$, $\varphi_2(x) = \sqrt{x}$ and $\varphi_3(x) = 1 - x$ are known as modifier functions corresponding to the modifying words very, more or less, and not. Namely, given a fuzzy set $A$, we may define fuzzy sets very $A$, more or less $A$ and not-$A$ by the following membership functions


$$\mu_{\text{very } A}(x) = (\mu_A(x))^2, \qquad \mu_{\text{more or less } A}(x) = \sqrt{\mu_A(x)}, \qquad \mu_{\text{not-}A}(x) = 1 - \mu_A(x). \tag{26.71}$$


Such modifier functions are often used in approximate/fuzzy reasoning [26.59, 60], especially in the indirect method of fuzzy reasoning, which is also called the truth value space method.
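As a small illustration of the three modifier functions of (26.71), the sketch below applies them to a finite fuzzy set. The function names and the sample fuzzy set are hypothetical.

```python
import math


def very(mu_A):
    """Concentration: square each membership degree."""
    return {x: v ** 2 for x, v in mu_A.items()}


def more_or_less(mu_A):
    """Dilation: take the square root of each membership degree."""
    return {x: math.sqrt(v) for x, v in mu_A.items()}


def not_(mu_A):
    """Standard complement: 1 minus each membership degree."""
    return {x: 1.0 - v for x, v in mu_A.items()}


# Hypothetical fuzzy set A.
mu_A = {"x1": 0.25, "x2": 1.0, "x3": 0.0}
```

Since $a^2 \leq a \leq \sqrt{a}$ on $[0, 1]$, very $A$ is included in $A$, which in turn is included in more or less $A$; this is the kind of bracketing that modifier-based lower and upper approximations exploit.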

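Namely, we may define the lower approximation $\Phi_*(X)$ of $X$ and the upper approximation $\Phi^*(X)$ of $X$ by means of a fuzzy set $G$ by the following membership functions

$$\mu_{\Phi_*(X)}(x) = \varphi^{-}_{G \to X}(\mu_G(x)), \qquad \mu_{\Phi^*(X)}(x) = \varphi^{+}_{G \to X}(\mu_G(x)), \tag{26.72}$$

where $\Phi = \{\varphi^{-}_{G \to X}, \varphi^{+}_{G \to X}\}$ and the modifier functions $\varphi^{-}_{G \to X}$ and $\varphi^{+}_{G \to X}$ are selected to satisfy

$$\varphi^{-}_{G \to X}(\mu_G(x)) \leq \mu_X(x), \qquad \varphi^{+}_{G \to X}(\mu_G(x)) \geq \mu_X(x), \qquad \forall x \in U. \tag{26.73}$$

Indeed,

$$\xi[I]\Bigl(\cdot, \inf_{y \in U} I(\mu_G(y), \mu_X(y))\Bigr) \quad \text{and} \quad \sigma[I]\Bigl(\inf_{y \in U} I(\mu_X(y), \mu_G(y)), \cdot\Bigr)$$

are modifier functions satisfying (26.73), and these are used to define $\mathcal{F}_*^{\xi}(X)$ and $\mathcal{F}^*_{\sigma}(X)$ in (26.39) and (26.44), respectively. We note that we consider multiple fuzzy sets $G = F \in \mathcal{F}$ in (26.39) and apply the union because we have

$$\xi[I]\Bigl(\mu_G(x), \inf_{y \in U} I(\mu_G(y), \mu_X(y))\Bigr) \leq \mu_X(x), \qquad \forall x \in U \text{ for all } G \in \mathcal{F}.$$

Similarly, we consider multiple fuzzy sets $G$ defined by $\mu_G(x) = \sup_{F \in \mathcal{T}} \mu_F(x)$, $x \in U$, with $\mathcal{T} \subseteq \mathcal{F}$ in (26.44), and we apply the intersection because we have $\sigma[I](\inf_{y \in U} I(\mu_X(y), \mu_G(y)), \mu_G(x)) \geq \mu_X(x)$, $\forall x \in U$, for all those fuzzy sets $G$.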

In the definitions of (26.65) to (26.68), the following modifier functions are used, respectively

$$\varphi_*^{+}(\alpha) = \sup\{\psi_*^{+}(\beta) \mid \beta \in [0, \alpha]\}, \qquad \varphi^*_{+}(\alpha) = \inf\{\psi^*_{+}(\beta) \mid \beta \in [\alpha, 1]\}, \tag{26.74}$$

$$\varphi_*^{-}(\alpha) = \sup\{\psi_*^{-}(\beta) \mid \beta \in [\alpha, 1]\}, \qquad \varphi^*_{-}(\alpha) = \inf\{\psi^*_{-}(\beta) \mid \beta \in [0, \alpha]\}, \tag{26.75}$$

where we define

$$\psi_*^{+}(\alpha) = \inf\{\mu_X(z) \mid z \in U,\ \mu_G(z) \geq \alpha\}, \tag{26.76}$$

$$\psi^*_{+}(\alpha) = \sup\{\mu_X(z) \mid z \in U,\ \mu_G(z) \leq \alpha\}, \tag{26.77}$$

$$\psi_*^{-}(\alpha) = \inf\{\mu_X(z) \mid z \in U,\ \mu_G(z) \leq \alpha\}, \tag{26.78}$$

$$\psi^*_{-}(\alpha) = \sup\{\mu_X(z) \mid z \in U,\ \mu_G(z) \geq \alpha\}, \tag{26.79}$$

with $\inf \emptyset = 0$ and $\sup \emptyset = 1$. We note that $\varphi_*^{+}$ and $\varphi^*_{+}$ are monotonically increasing while $\varphi_*^{-}$ and $\varphi^*_{-}$ are monotonically decreasing. These monotonicities are imposed in order to fit the supposed gradual rules. However, such monotonicities do not hold for the functions $\psi_*^{+}$, $\psi^*_{+}$, $\psi_*^{-}$, and $\psi^*_{-}$. In the cases of (26.69) and (26.70), we should extend the modifier function to a generalized modifier function, which is a function from $[0, 1] \times [0, 1]$ to $[0, 1]$, because we have two fuzzy sets in the premise of the corresponding gradual rule. The generalized modifier functions associated with (26.69) and (26.70) are obtained as


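$$\varphi_*^{\pm}(\alpha_1, \alpha_2) = \sup\{\psi_*^{\pm}(\beta_1, \beta_2) \mid \beta_1 \in [0, \alpha_1],\ \beta_2 \in [\alpha_2, 1]\}, \tag{26.80}$$

$$\varphi^*_{\pm}(\alpha_1, \alpha_2) = \inf\{\psi^*_{\pm}(\beta_1, \beta_2) \mid \beta_1 \in [\alpha_1, 1],\ \beta_2 \in [0, \alpha_2]\}, \tag{26.81}$$

where we define

$$\psi_*^{\pm}(\beta_1, \beta_2) = \inf\{\mu_X(z) \mid z \in U,\ \mu_{G^{+}}(z) \geq \beta_1,\ \mu_{G^{-}}(z) \leq \beta_2\}, \tag{26.82}$$

$$\psi^*_{\pm}(\beta_1, \beta_2) = \sup\{\mu_X(z) \mid z \in U,\ \mu_{G^{+}}(z) \leq \beta_1,\ \mu_{G^{-}}(z) \geq \beta_2\}, \tag{26.83}$$

with $\inf \emptyset = 0$ and $\sup \emptyset = 1$. We note that $\varphi_*^{\pm}$ and $\varphi^*_{\pm}$ are monotonically increasing in the first argument and monotonically decreasing in the second argument.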

Moreover, when we do not have any background knowledge about the relation between $G$ and $X$ that could be expressed by a gradual rule, we may define the lower approximation $G_*^{=}(X)$ and the upper approximation $G^*_{=}(X)$ by the following membership functions

$$\mu_{G_*^{=}(X)}(x) = \inf\{\mu_X(z) \mid z \in U,\ \mu_G(z) = \mu_G(x)\}, \tag{26.84}$$

$$\mu_{G^*_{=}(X)}(x) = \sup\{\mu_X(z) \mid z \in U,\ \mu_G(z) = \mu_G(x)\}. \tag{26.85}$$

The modifier functions associated with these approximations are obtained as

$$\varphi_*^{=}(\alpha) = \inf\{\mu_X(z) \mid z \in U,\ \mu_G(z) = \alpha\}, \tag{26.86}$$

$$\varphi^*_{=}(\alpha) = \sup\{\mu_X(z) \mid z \in U,\ \mu_G(z) = \alpha\}, \tag{26.87}$$

where we define $\inf \emptyset = 0$ and $\sup \emptyset = 1$. Equation (26.87) is the same as the inverse truth qualification [26.59, 60] of $X$ based on $G$.

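We describe the properties of the approximations defined by (26.65) to (26.68). The other approximations defined by (26.69), (26.70), (26.84), and (26.85) have similar results. We have the following properties for the approximations defined by (26.65) to (26.68) (see Greco et al. [26.25] for a part of these properties):

$$G_*^{+}(X) \subseteq X \subseteq G^*_{+}(X), \qquad G_*^{-}(X) \subseteq X \subseteq G^*_{-}(X), \tag{26.88}$$

$$G_*^{+}(\emptyset) = G^*_{+}(\emptyset) = G_*^{-}(\emptyset) = G^*_{-}(\emptyset) = \emptyset, \tag{26.89}$$

$$G_*^{+}(U) = G^*_{+}(U) = G_*^{-}(U) = G^*_{-}(U) = U, \tag{26.90}$$

$$G_*^{+}(X \cap Y) = G_*^{+}(X) \cap G_*^{+}(Y), \qquad G_*^{-}(X \cap Y) = G_*^{-}(X) \cap G_*^{-}(Y), \tag{26.91}$$

$$G^*_{+}(X \cup Y) = G^*_{+}(X) \cup G^*_{+}(Y), \qquad G^*_{-}(X \cup Y) = G^*_{-}(X) \cup G^*_{-}(Y), \tag{26.92}$$

$$X \subseteq Y \text{ implies } G_*^{+}(X) \subseteq G_*^{+}(Y), \qquad X \subseteq Y \text{ implies } G^*_{+}(X) \subseteq G^*_{+}(Y), \tag{26.93}$$

$$X \subseteq Y \text{ implies } G_*^{-}(X) \subseteq G_*^{-}(Y), \qquad X \subseteq Y \text{ implies } G^*_{-}(X) \subseteq G^*_{-}(Y), \tag{26.94}$$

$$G_*^{+}(X \cup Y) \supseteq G_*^{+}(X) \cup G_*^{+}(Y), \qquad G^*_{+}(X \cap Y) \subseteq G^*_{+}(X) \cap G^*_{+}(Y), \tag{26.95}$$

$$G_*^{-}(X \cup Y) \supseteq G_*^{-}(X) \cup G_*^{-}(Y), \qquad G^*_{-}(X \cap Y) \subseteq G^*_{-}(X) \cap G^*_{-}(Y), \tag{26.96}$$

$$G_*^{+}(U \setminus X) = U \setminus G^*_{-}(X) = U \setminus (U \setminus G)^*_{+}(X) = (U \setminus G)_*^{-}(U \setminus X), \tag{26.97}$$

$$G_*^{-}(U \setminus X) = U \setminus G^*_{+}(X) = U \setminus (U \setminus G)^*_{-}(X) = (U \setminus G)_*^{+}(U \setminus X), \tag{26.98}$$

$$G_*^{+}(G_*^{+}(X)) = G^*_{+}(G_*^{+}(X)) = G_*^{+}(X), \qquad G^*_{+}(G^*_{+}(X)) = G_*^{+}(G^*_{+}(X)) = G^*_{+}(X), \tag{26.99}$$

$$G_*^{-}(G_*^{-}(X)) = G^*_{-}(G_*^{-}(X)) = G_*^{-}(X), \qquad G^*_{-}(G^*_{-}(X)) = G_*^{-}(G^*_{-}(X)) = G^*_{-}(X), \tag{26.100}$$

where $U \setminus X$ is a fuzzy set defined by its membership function $\mu_{U \setminus X}(x) = N(\mu_X(x))$, $\forall x \in U$, with a strictly decreasing function $N: [0, 1] \to [0, 1]$. We found that all fundamental properties [26.2] of the classical rough set are preserved.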


26.2.5 Remarks 


Three types of fuzzy rough set models have been described, divided into two groups: classification-oriented and approximation-oriented models. The classification-oriented fuzzy rough set models have been investigated far more extensively. However, the approximation-oriented fuzzy rough set models would be more important because they are associated with rules. While approximation-oriented fuzzy rough set models based on modifiers are strongly related to gradual rules, approximation-oriented fuzzy rough set models based on certainty qualification are related to uncertain generation rules (uncertain qualification rules: certainty rules and possibility rules) [26.54], i.e., rules such as "the more an object is in $A$, the more certain (possible) it is in $B$". While approximation-oriented fuzzy rough set models based on modifiers need a modifier function for each granule $G$, the approximation-oriented fuzzy rough set models based on certainty qualification need only a degree of inclusion for each granule $F$. Therefore, the latter may work well for data compression such as image compression, speech compression, and so on.




26.3 Generalized Fuzzy Belief Structures with Application in Fuzzy Information Systems


In rough set theory there exists a pair of approximation 
operators, the lower and upper approximations, whereas 
in the Dempster-Shafer theory of evidence there ex- 
ists a dual pair of uncertainty measures, the belief and 
plausibility functions. In this section, general types of 
belief structures and their induced dual pairs of belief 
and plausibility functions are first introduced. Relation- 
ships between belief and plausibility functions in the 
Dempster-Shafer theory of evidence and the lower and 
upper approximations in rough set theory are then es- 
tablished. It is shown that the probabilities of lower 
and upper approximations induced from an approxima- 
tion space yield a dual pair of belief and plausibility 
functions. And for any belief structure there must exist 
a probability approximation space such that the belief 
and plausibility functions defined by the given belief 
structure are, respectively, the lower and upper proba- 
bilities induced by the approximation space. The pair 
of lower and upper approximations of a set capture 
the non-numeric aspect of uncertainty of the set which 
can be interpreted as the qualitative representation of 
the set, whereas the pair of the belief and plausibil- 
ity measures of the set capture the numeric aspect of 
uncertainty of the set which can be treated as the quan- 
titative characterization of the set. Finally, the potential 
applications of the main results to intelligent informa- 
tion systems are explored. 


26.3.1 Belief Structures and Belief Functions 


In this section, we recall some basic notions related to 
belief structures with their induced belief and plausibil- 
ity functions. 


Belief and Plausibility Functions Derived from a Crisp Belief Structure
Throughout this section, U will be a nonempty set 
called the universe of discourse. The class of all sub- 
sets (respectively, fuzzy subsets) of U will be denoted 
by P(U) (respectively, by F(U)). For any A € F(U), 
the complement of A will be denoted by ~ A, i.e., 


$$(\sim A)(x) = 1 - A(x) \quad \text{for all } x \in U.$$


The basic representational structure in the 
Dempster-Shafer theory of evidence is a belief 
structure. 


Definition 26.1 

Let $U$ be a nonempty set which may be infinite. A set function $m: \mathcal{P}(U) \to [0, 1]$ is referred to as a crisp basic probability assignment if it satisfies axioms (M1) and (M2):

$$\text{(M1)}\ m(\emptyset) = 0, \qquad \text{(M2)}\ \sum_{X \subseteq U} m(X) = 1.$$


A set X € P(U) with nonzero basic probability assign- 
ment is referred to as a focal element of m. We denote 
by M the family of all focal elements of m. The pair 
(M, m) is called a crisp belief structure on U. 


Lemma 26.1 
Let (M, m) be a crisp belief structure on U. Then the 
focal elements of m constitute a countable set. 


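Proof: For any $n \in \{1, 2, \ldots\}$, let

$$H_n = \{A \in \mathcal{M} \mid m(A) > 1/n\}.$$

By axiom (M2) we can see that for each $n \in \{1, 2, \ldots\}$, $H_n$ is a finite set (it has fewer than $n$ elements, since otherwise the masses of its elements would sum to more than 1). Since $\mathcal{M} = \bigcup_{n=1}^{\infty} H_n$, we conclude that $\mathcal{M}$ is countable. $\square$

Associated with each belief structure, a pair of belief and plausibility functions can be defined.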


Definition 26.2 

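Let $(\mathcal{M}, m)$ be a crisp belief structure on $U$. A set function $\mathrm{Bel}: \mathcal{P}(U) \to [0, 1]$ is called a CC-belief function on $U$ if

$$\mathrm{Bel}(X) = \sum_{M \subseteq X} m(M), \qquad \forall X \in \mathcal{P}(U). \tag{26.101}$$

A set function $\mathrm{Pl}: \mathcal{P}(U) \to [0, 1]$ is called a CC-plausibility function on $U$ if

$$\mathrm{Pl}(X) = \sum_{M \cap X \neq \emptyset} m(M), \qquad \forall X \in \mathcal{P}(U). \tag{26.102}$$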


Remark 26.1 

Since $\mathcal{M}$ is a countable set and all terms are nonnegative, changing the order of summation does not change the values of the infinite (countable) sums in (26.101) and (26.102). Therefore, Definition 26.2 is reasonable.




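The CC-belief function and CC-plausibility function based on the same belief structure are connected by the dual property

$$\mathrm{Pl}(X) = 1 - \mathrm{Bel}(\sim X), \qquad \forall X \in \mathcal{P}(U), \tag{26.103}$$

and moreover,

$$\mathrm{Bel}(X) \leq \mathrm{Pl}(X), \qquad \forall X \in \mathcal{P}(U). \tag{26.104}$$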

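When $U$ is finite, a CC-belief function can be equivalently defined as a monotone Choquet capacity [26.61] on $U$ which satisfies the following properties [26.38]:

(MC1) $\mathrm{Bel}(\emptyset) = 0$,
(MC2) $\mathrm{Bel}(U) = 1$,
(MC3) for all $X_i \in \mathcal{P}(U)$, $i = 1, 2, \ldots, k$,

$$\mathrm{Bel}\Bigl(\bigcup_{i=1}^{k} X_i\Bigr) \geq \sum_{\emptyset \neq J \subseteq \{1, 2, \ldots, k\}} (-1)^{|J|+1}\, \mathrm{Bel}\Bigl(\bigcap_{i \in J} X_i\Bigr). \tag{26.105}$$

Similarly, a CC-plausibility function can be equivalently defined as an alternating Choquet capacity on $U$ which satisfies the following properties:

(AC1) $\mathrm{Pl}(\emptyset) = 0$,
(AC2) $\mathrm{Pl}(U) = 1$,
(AC3) for all $X_i \in \mathcal{P}(U)$, $i = 1, 2, \ldots, k$,

$$\mathrm{Pl}\Bigl(\bigcap_{i=1}^{k} X_i\Bigr) \leq \sum_{\emptyset \neq J \subseteq \{1, 2, \ldots, k\}} (-1)^{|J|+1}\, \mathrm{Pl}\Bigl(\bigcup_{i \in J} X_i\Bigr). \tag{26.106}$$

A monotone Choquet capacity is a belief function in which the basic probability assignment can be calculated by using the Möbius transform

$$m(X) = \sum_{Y \subseteq X} (-1)^{|X \setminus Y|}\, \mathrm{Bel}(Y), \qquad X \in \mathcal{P}(U). \tag{26.107}$$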
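The Möbius transform (26.107) can be sketched for a small finite universe by enumerating subsets. This is an illustration under stated assumptions: the mass assignment and the helper names are hypothetical, and the enumeration is exponential in the size of $U$, so it is only meant for toy examples.

```python
from itertools import chain, combinations


def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def bel(m, X):
    """Belief of X: total mass of focal elements contained in X."""
    return sum(mass for focal, mass in m.items() if focal <= X)


def mobius(bel_fn, U):
    """Recover m(X) = sum over Y subset of X of (-1)^{|X \\ Y|} Bel(Y)."""
    return {X: sum((-1) ** len(X - Y) * bel_fn(Y) for Y in subsets(X))
            for X in subsets(U)}


# Hypothetical belief structure used to generate Bel, then recovered from it.
U = frozenset({"a", "b", "c"})
m = {
    frozenset({"a"}): 0.5,
    frozenset({"b", "c"}): 0.25,
    frozenset({"a", "b", "c"}): 0.25,
}
recovered = mobius(lambda X: bel(m, X), U)
```

The recovered assignment agrees with the original masses on the focal elements and is zero on every other subset.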


A crisp belief structure can also induce a dual pair of fuzzy belief and plausibility functions.


Definition 26.3 
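Let $(\mathcal{M}, m)$ be a crisp belief structure on $U$. A fuzzy set function $\mathrm{Bel}: \mathcal{F}(U) \to [0, 1]$ is called a CF-belief function on $U$ if

$$\mathrm{Bel}(X) = \sum_{A \in \mathcal{M}} m(A) N_A(X), \qquad \forall X \in \mathcal{F}(U). \tag{26.108}$$

A fuzzy set function $\mathrm{Pl}: \mathcal{F}(U) \to [0, 1]$ is called a CF-plausibility function on $U$ if

$$\mathrm{Pl}(X) = \sum_{A \in \mathcal{M}} m(A) \Pi_A(X), \qquad \forall X \in \mathcal{F}(U), \tag{26.109}$$

where $N_A: \mathcal{F}(U) \to [0, 1]$ and $\Pi_A: \mathcal{F}(U) \to [0, 1]$ are, respectively, the necessity measure and the possibility measure determined by the crisp set $A$, defined as follows

$$N_A(X) = \bigwedge_{u \in A} X(u), \qquad X \in \mathcal{F}(U), \tag{26.110}$$

$$\Pi_A(X) = \bigvee_{u \in A} X(u), \qquad X \in \mathcal{F}(U). \tag{26.111}$$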

Belief and Plausibility Functions Derived 
from a Fuzzy Belief Structure 


Definition 26.4 

Let U be a nonempty set which may be infinite. A set 
function m: F(U) — [0, 1] is referred to as a fuzzy basic 
probability assignment, if it satisfies axioms (FM1) and 
(FM2) 


(FM1) $m(\emptyset) = 0$,
(FM2) $\sum_{X \in \mathcal{F}(U)} m(X) = 1$.


A fuzzy set X € F(U) with m(X) > 0 is referred to as 
a focal element of m. We denote by M the family of all 
focal elements of m. The pair (M, m) is called a fuzzy 
belief structure. 


Lemma 26.2 
[26.62] Let (M, m) be a fuzzy belief structure on W. 
Then the focal elements of m constitute a countable set. 


In the discussion to follow, all the focal elements are 
supposed to be normal, i. e., for any A € M, there exists 
an x € U such that A(x) = 1. Associated with the fuzzy 
belief structure (M,m), two pairs of fuzzy belief and 
plausibility functions can be derived. 


Fuzzy-Rough Hybridization | 26.3 Generalized Fuzzy Belief Structures with Application in Fuzzy Information Systems 


Definition 26.5 
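Let $U$ be a nonempty set which may be infinite, and $(\mathcal{M}, m)$ a fuzzy belief structure on $U$. A crisp set function $\mathrm{Bel}: \mathcal{P}(U) \to [0, 1]$ is referred to as an FC-belief function on $U$ if

$$\mathrm{Bel}(X) = \sum_{A \in \mathcal{M}} m(A)\, N_A(X), \quad \forall X \in \mathcal{P}(U). \tag{26.112}$$

A crisp set function $\mathrm{Pl}: \mathcal{P}(U) \to [0, 1]$ is called an FC-plausibility function on $U$ if

$$\mathrm{Pl}(X) = \sum_{A \in \mathcal{M}} m(A)\, \Pi_A(X), \quad \forall X \in \mathcal{P}(U), \tag{26.113}$$

where $N_A: \mathcal{P}(U) \to [0, 1]$ and $\Pi_A: \mathcal{P}(U) \to [0, 1]$ are, respectively, the necessity measure and the possibility measure determined by the fuzzy set $A$, defined as follows:

$$N_A(X) = \bigwedge_{u \notin X} (1 - A(u)), \quad X \in \mathcal{P}(U), \tag{26.114}$$

$$\Pi_A(X) = \bigvee_{u \in X} A(u), \quad X \in \mathcal{P}(U). \tag{26.115}$$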


Definition 26.6 
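Let $U$ be a nonempty set which may be infinite, and $(\mathcal{M}, m)$ a fuzzy belief structure on $U$. A fuzzy set function $\mathrm{Bel}: \mathcal{F}(U) \to [0, 1]$ is referred to as an FF-belief function on $U$ if

$$\mathrm{Bel}(X) = \sum_{A \in \mathcal{M}} m(A)\, N_A(X), \quad \forall X \in \mathcal{F}(U). \tag{26.116}$$

A fuzzy set function $\mathrm{Pl}: \mathcal{F}(U) \to [0, 1]$ is called an FF-plausibility function on $U$ if

$$\mathrm{Pl}(X) = \sum_{A \in \mathcal{M}} m(A)\, \Pi_A(X), \quad \forall X \in \mathcal{F}(U), \tag{26.117}$$

where $N_A: \mathcal{F}(U) \to [0, 1]$ and $\Pi_A: \mathcal{F}(U) \to [0, 1]$ are, respectively, the necessity measure and the possibility measure determined by the fuzzy set $A$, defined as follows:

$$N_A(X) = \bigwedge_{u \in U} \big(X(u) \vee (1 - A(u))\big), \quad X \in \mathcal{F}(U), \tag{26.118}$$

$$\Pi_A(X) = \bigvee_{u \in U} \big(X(u) \wedge A(u)\big), \quad X \in \mathcal{F}(U). \tag{26.119}$$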


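It can be proved that the belief and plausibility functions derived from the same fuzzy belief structure $(\mathcal{M}, m)$ are dual, that is,

$$\mathrm{Bel}(X) = 1 - \mathrm{Pl}(\sim X), \quad \forall X \in \mathcal{F}(U), \tag{26.120}$$

and

$$\mathrm{Bel}(X) \le \mathrm{Pl}(X), \quad \forall X \in \mathcal{F}(U). \tag{26.121}$$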
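
The FF-belief and FF-plausibility functions of Definition 26.6 can be computed directly on a finite universe. The sketch below is a minimal illustration under stated assumptions: fuzzy sets are represented as Python dicts from elements to membership degrees, the fuzzy belief structure is a hypothetical list of (focal element, mass) pairs with normal focal elements, and all names are made up for the example. For a crisp focal element the formulas (26.118) and (26.119) reduce to (26.110) and (26.111), and for a crisp argument $X$ they reduce to (26.114) and (26.115), so the same two helpers also cover the CF and FC cases.

```python
def necessity(A, X, universe):
    """N_A(X) as in (26.118): min over u of max(X(u), 1 - A(u))."""
    return min(max(X[u], 1 - A[u]) for u in universe)


def possibility(A, X, universe):
    """Pi_A(X) as in (26.119): max over u of min(X(u), A(u))."""
    return max(min(X[u], A[u]) for u in universe)


def ff_bel(structure, X, universe):
    """FF-belief (26.116): mass-weighted necessity."""
    return sum(w * necessity(A, X, universe) for A, w in structure)


def ff_pl(structure, X, universe):
    """FF-plausibility (26.117): mass-weighted possibility."""
    return sum(w * possibility(A, X, universe) for A, w in structure)


# Hypothetical fuzzy belief structure with two normal focal elements.
U = ["u1", "u2", "u3"]
A1 = {"u1": 1.0, "u2": 0.5, "u3": 0.0}
A2 = {"u1": 0.25, "u2": 1.0, "u3": 0.75}
structure = [(A1, 0.5), (A2, 0.5)]

X = {"u1": 0.75, "u2": 0.5, "u3": 0.25}
X_complement = {u: 1 - X[u] for u in U}

belief = ff_bel(structure, X, U)        # 0.375
plausibility = ff_pl(structure, X, U)   # 0.625
```

On this example the duality (26.120) holds exactly: `1 - ff_pl(structure, X_complement, U)` equals `belief`.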


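Moreover, Bel is a fuzzy monotone Choquet capacity of infinite order on $U$ which satisfies axioms (FMC1)–(FMC3):

(FMC1) $\mathrm{Bel}(\emptyset) = 0$,
(FMC2) $\mathrm{Bel}(U) = 1$,
(FMC3) for $X_i \in \mathcal{F}(U)$, $i = 1, 2, \ldots, n$, $n \in \mathbb{N}$,

$$\mathrm{Bel}\Big(\bigcup_{i=1}^{n} X_i\Big) \ge \sum_{\emptyset \neq J \subseteq \{1,2,\ldots,n\}} (-1)^{|J|+1}\, \mathrm{Bel}\Big(\bigcap_{j \in J} X_j\Big). \tag{26.122}$$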


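And Pl is a fuzzy alternating Choquet capacity of infinite order on $U$ which obeys axioms (FAC1)–(FAC3):

(FAC1) $\mathrm{Pl}(\emptyset) = 0$,
(FAC2) $\mathrm{Pl}(U) = 1$,
(FAC3) for $X_i \in \mathcal{F}(U)$, $i = 1, 2, \ldots, n$, $n \in \mathbb{N}$,

$$\mathrm{Pl}\Big(\bigcap_{i=1}^{n} X_i\Big) \le \sum_{\emptyset \neq J \subseteq \{1,2,\ldots,n\}} (-1)^{|J|+1}\, \mathrm{Pl}\Big(\bigcup_{j \in J} X_j\Big). \tag{26.123}$$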


26.3.2 Belief Structures 
of Rough Approximations 


In this section, we show relationships between various 
belief and plausibility functions in Dempster-Shafer 
theory of evidence and the lower and upper approxi- 
mations in rough set theory with potential applications. 


Belief Functions 
versus Crisp Rough Approximations 


Definition 26.7 

Let $U$ and $W$ be two nonempty universes of discourse. A subset $R \in \mathcal{P}(U \times W)$ is referred to as a binary relation from $U$ to $W$. The relation $R$ is referred to as serial if for any $x \in U$ there exists $y \in W$ such that $(x, y) \in R$. If $U = W$, $R \in \mathcal{P}(U \times U)$ is called a binary relation on $U$. A relation $R \in \mathcal{P}(U \times U)$ is referred to as reflexive if $(x, x) \in R$ for all $x \in U$; $R$ is referred to as symmetric if $(x, y) \in R$ implies $(y, x) \in R$ for all $x, y \in U$; $R$ is referred to as transitive if for any $x, y, z \in U$, $(x, y) \in R$ and $(y, z) \in R$ imply $(x, z) \in R$; $R$ is referred to as Euclidean if for any $x, y, z \in U$, $(x, y) \in R$ and $(x, z) \in R$ imply $(y, z) \in R$; $R$ is referred to as an equivalence relation if $R$ is reflexive, symmetric and transitive.


Assume that $R$ is an arbitrary binary relation from $U$ to $W$. One can define a set-valued mapping $R_s: U \to \mathcal{P}(W)$ by

$$R_s(x) = \{y \in W \mid (x, y) \in R\}, \quad x \in U. \tag{26.124}$$

$R_s(x)$ is called the successor neighborhood of $x$ with respect to $R$ [26.63]. Obviously, any set-valued mapping $F$ from $U$ to $W$ defines a binary relation from $U$ to $W$ by setting $R = \{(x, y) \in U \times W \mid y \in F(x)\}$. For $A \in \mathcal{P}(W)$, let $j(A) = R_s^{-1}(A)$ be the inverse image of $A$ under the set-valued mapping $R_s$, i.e.,

$$j(A) = \begin{cases} R_s^{-1}(A) = \{u \in U \mid R_s(u) = A\}, & \text{if } A \in \{R_s(x) \mid x \in U\}, \\ \emptyset, & \text{otherwise}. \end{cases} \tag{26.125}$$


Then it is well known that $j$ satisfies the properties (J1) and (J2):

(J1) $A \neq B \Longrightarrow j(A) \cap j(B) = \emptyset$,
(J2) $\bigcup_{A \in \mathcal{P}(W)} j(A) = U$.


Definition 26.8 

If $R$ is an arbitrary relation from $U$ to $W$, then the triple $(U, W, R)$ is referred to as a generalized approximation space. For any set $A \subseteq W$, a pair of lower and upper approximations, $\underline{R}(A)$ and $\overline{R}(A)$, are, respectively, defined by

$$\underline{R}(A) = \{x \in U \mid R_s(x) \subseteq A\}, \qquad \overline{R}(A) = \{x \in U \mid R_s(x) \cap A \neq \emptyset\}. \tag{26.126}$$

The pair $(\underline{R}(A), \overline{R}(A))$ is referred to as a generalized crisp rough set, and $\underline{R}$ and $\overline{R}: \mathcal{P}(W) \to \mathcal{P}(U)$ are called the lower and upper generalized approximation operators, respectively.
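
As a small illustration of (26.124) and Definition 26.8, the following Python sketch computes successor neighborhoods and the two approximations for a toy relation. The universes, the relation and the function names are assumptions made up for the example.

```python
def successor(R, x):
    """Successor neighborhood R_s(x) = {y | (x, y) in R}, see (26.124)."""
    return {y for (a, y) in R if a == x}


def lower(R, U, A):
    """Lower approximation (26.126): objects whose neighborhood lies inside A."""
    return {x for x in U if successor(R, x) <= A}


def upper(R, U, A):
    """Upper approximation (26.126): objects whose neighborhood meets A."""
    return {x for x in U if successor(R, x) & A}


# Hypothetical serial relation from U to W.
U = {1, 2, 3}
W = {"p", "q", "r"}
R = {(1, "p"), (2, "p"), (2, "q"), (3, "r")}

A = {"p"}
low, up = lower(R, U, A), upper(R, U, A)   # {1} and {1, 2}
```

Note that if the relation is not serial, an object with an empty successor neighborhood belongs to every lower approximation and to no upper approximation, which is one reason the theorems that follow assume seriality.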


If $U$ is a countable set, $P$ a normalized probability measure on $U$, i.e., $P(\{x\}) > 0$ for all $x \in U$, and $R$ an arbitrary relation from $U$ to $W$, then $((U, P), W, R)$ is referred to as a probability approximation space.


Theorem 26.1 [26.43] 
Assume that $((U, P), W, R)$ is a serial probability approximation space. For $X \in \mathcal{P}(W)$, define

$$m(X) = P(j(X)), \qquad \mathrm{Bel}(X) = P(\underline{R}(X)), \qquad \mathrm{Pl}(X) = P(\overline{R}(X)). \tag{26.127}$$

Then $m: \mathcal{P}(W) \to [0, 1]$ is a basic probability assignment on $W$, and $\mathrm{Bel}: \mathcal{P}(W) \to [0, 1]$ and $\mathrm{Pl}: \mathcal{P}(W) \to [0, 1]$ are, respectively, the CC-belief and CC-plausibility functions on $W$.

Conversely, let $(\mathcal{M}, m)$ be any crisp belief structure on $W$, which may be infinite. If $\mathrm{Bel}: \mathcal{P}(W) \to [0, 1]$ and $\mathrm{Pl}: \mathcal{P}(W) \to [0, 1]$ are, respectively, the CC-belief and CC-plausibility functions defined in Definition 26.2, then there exist a countable set $U$, a serial relation $R$ from $U$ to $W$, and a normalized probability measure $P$ on $U$ such that

$$\mathrm{Bel}(X) = P(\underline{R}(X)), \qquad \mathrm{Pl}(X) = P(\overline{R}(X)), \quad \forall X \in \mathcal{P}(W). \tag{26.128}$$
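
The first half of Theorem 26.1 can be illustrated numerically. The sketch below, built on a made-up serial probability approximation space with a uniform $P$, computes the mass $m(X) = P(j(X))$ and compares the rough-set values $P(\underline{R}(X))$ and $P(\overline{R}(X))$ with the belief and plausibility obtained from $m$ through (26.101) and (26.102). All data and names are assumptions for the example.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical serial probability approximation space ((U, P), W, R).
U = [1, 2, 3, 4]
W = ["p", "q", "r"]
R = {(1, "p"), (2, "p"), (2, "q"), (3, "p"), (3, "q"), (4, "r")}
P = {x: Fraction(1, 4) for x in U}


def successor(x):
    """Successor neighborhood R_s(x) as a frozenset."""
    return frozenset(y for (a, y) in R if a == x)


# m(X) = P(j(X)): each object contributes its probability to its neighborhood.
mass = {}
for x in U:
    mass[successor(x)] = mass.get(successor(x), Fraction(0)) + P[x]


def bel_rough(X):
    """P of the lower approximation of X."""
    return sum(P[x] for x in U if successor(x) <= X)


def pl_rough(X):
    """P of the upper approximation of X."""
    return sum(P[x] for x in U if successor(x) & X)


def bel_mass(X):
    """CC-belief (26.101) computed from the induced mass."""
    return sum(w for focal, w in mass.items() if focal <= X)


def pl_mass(X):
    """CC-plausibility (26.102) computed from the induced mass."""
    return sum(w for focal, w in mass.items() if focal & X)


all_subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(W, r) for r in range(len(W) + 1))]
agree = all(bel_rough(X) == bel_mass(X) and pl_rough(X) == pl_mass(X)
            for X in all_subsets)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the two computations can be compared with `==` rather than with a tolerance.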


The notion of information systems (sometimes 
called data tables, attribute-value systems, knowledge 
representation systems etc.) provides a convenient tool 
for the representation of objects in terms of their at- 
tribute values. 

An information system is a pair $(U, A)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty, finite set of objects called the universe of discourse and $A = \{a_1, a_2, \ldots, a_m\}$ is a nonempty, finite set of attributes, such that $a: U \to V_a$ for any $a \in A$, where $V_a$ is called the domain of $a$.

Each nonempty subset $B \subseteq A$ determines an indiscernibility relation as follows:

$$R_B = \{(x, y) \in U \times U \mid a(x) = a(y),\ \forall a \in B\}. \tag{26.129}$$


Since $R_B$ is an equivalence relation on $U$, it forms a partition $U/R_B = \{[x]_B \mid x \in U\}$ of $U$, where $[x]_B$ denotes the equivalence class determined by $x$ with respect to (w.r.t.) $B$, i.e., $[x]_B = \{y \in U \mid (x, y) \in R_B\}$.

Let $(U, A)$ be an information system and $B \subseteq A$. For any $X \subseteq U$, denote

$$\underline{R_B}(X) = \{x \in U \mid [x]_B \subseteq X\}, \qquad \overline{R_B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}, \tag{26.130}$$

where $\underline{R_B}(X)$ and $\overline{R_B}(X)$ are, respectively, referred to as the lower and upper approximations of $X$ w.r.t. $(U, R_B)$, the knowledge generated by $B$. Objects in $\underline{R_B}(X)$ can be classified with certainty as elements of $X$ on the basis of knowledge in $(U, R_B)$, whereas objects in $\overline{R_B}(X)$ can only be classified possibly as elements of $X$ on the basis of knowledge in $(U, R_B)$.

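For $B \subseteq A$ and $X \subseteq U$, denote $\mathrm{Bel}_B(X) = P(\underline{R_B}(X))$ and $\mathrm{Pl}_B(X) = P(\overline{R_B}(X))$, where $P(Y) = |Y|/|U|$ and $|Y|$ is the cardinality of a set $Y$. Then $\mathrm{Bel}_B$ and $\mathrm{Pl}_B$ are a CC-belief function and a CC-plausibility function on $U$, respectively, and the corresponding mass distribution is

$$m_B(Y) = \begin{cases} P(Y), & \text{if } Y \in U/R_B, \\ 0, & \text{otherwise}. \end{cases}$$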
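
For a concrete information system, $\mathrm{Bel}_B$ and $\mathrm{Pl}_B$ can be computed from the partition $U/R_B$. The sketch below uses a small made-up data table; the attribute names, the objects and the helper functions are assumptions for the illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical information system: objects described by two attributes.
table = {
    "x1": {"color": "red", "size": "S"},
    "x2": {"color": "red", "size": "S"},
    "x3": {"color": "blue", "size": "S"},
    "x4": {"color": "blue", "size": "L"},
}


def partition(table, B):
    """Equivalence classes of the indiscernibility relation R_B (26.129)."""
    blocks = defaultdict(set)
    for x, row in table.items():
        blocks[tuple(row[a] for a in B)].add(x)
    return [frozenset(block) for block in blocks.values()]


def bel_B(table, B, X):
    """Bel_B(X) = |lower approximation of X| / |U|, see (26.130)."""
    inside = sum(len(b) for b in partition(table, B) if b <= X)
    return Fraction(inside, len(table))


def pl_B(table, B, X):
    """Pl_B(X) = |upper approximation of X| / |U|, see (26.130)."""
    touching = sum(len(b) for b in partition(table, B) if b & X)
    return Fraction(touching, len(table))


X = {"x1", "x3"}
coarse = (bel_B(table, ["color"], X), pl_B(table, ["color"], X))
fine = (bel_B(table, ["color", "size"], X), pl_B(table, ["color", "size"], X))
```

With only `color` the interval is `(0, 1)`; adding `size` refines the partition and narrows it to `(1/4, 3/4)`, which shows how more attributes tighten the belief and plausibility bounds.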


A decision system (sometimes called decision table) is a pair $(U, C \cup \{d\})$ where $(U, C)$ is an information system and $d$ is a distinguished attribute called the decision; in this case $C$ is called the conditional attribute set, and $d$ is a map $d: U \to V_d$ of the universe $U$ into the value set $V_d$. We assume, without any loss of generality, that $V_d = \{1, 2, \ldots, r\}$. Define

$$R_d = \{(x, y) \in U \times U \mid d(x) = d(y)\}. \tag{26.131}$$

Then we obtain the partition $U/R_d = \{D_1, D_2, \ldots, D_r\}$ of $U$ into decision classes, where $D_j = \{x \in U \mid d(x) = j\}$, $j \le r$. If $R_C \subseteq R_d$, then the decision system $(U, C \cup \{d\})$ is consistent; otherwise it is inconsistent. One can acquire certain decision rules from consistent decision systems and uncertain decision rules from inconsistent decision systems.
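
The consistency condition $R_C \subseteq R_d$ says that objects which agree on all conditional attributes must also agree on the decision. A minimal Python check, with a hypothetical row-based table layout and made-up attribute names, could look as follows.

```python
def is_consistent(rows, C, d):
    """Return True if R_C is contained in R_d for the given decision table.

    rows: list of dicts mapping attribute names to values (assumed layout).
    C:    list of conditional attribute names.
    d:    name of the decision attribute.
    """
    decision_of = {}
    for row in rows:
        key = tuple(row[a] for a in C)
        # The first decision seen for a C-class must match all later ones.
        if decision_of.setdefault(key, row[d]) != row[d]:
            return False
    return True


# Hypothetical decision table with conditional attributes a, b and decision d.
rows = [
    {"a": 0, "b": 0, "d": 1},
    {"a": 0, "b": 1, "d": 2},
    {"a": 0, "b": 0, "d": 1},
]

with_both = is_consistent(rows, ["a", "b"], "d")   # True
with_a_only = is_consistent(rows, ["a"], "d")      # False
```

Dropping attribute `b` merges objects with different decisions into one class, so the reduced table becomes inconsistent.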


Belief Functions 
versus Rough Fuzzy Approximations 


Definition 26.9 

Let $(U, W, R)$ be a generalized approximation space. For a fuzzy set $A \in \mathcal{F}(W)$, the lower and upper approximations of $A$, $\underline{RF}(A)$ and $\overline{RF}(A)$, with respect to the approximation space $(U, W, R)$ are fuzzy sets of $U$ whose membership functions, for each $x \in U$, are defined by

$$\overline{RF}(A)(x) = \bigvee_{y \in R_s(x)} A(y), \quad x \in U, \tag{26.132}$$

$$\underline{RF}(A)(x) = \bigwedge_{y \in R_s(x)} A(y), \quad x \in U. \tag{26.133}$$

The pair $(\underline{RF}(A), \overline{RF}(A))$ is referred to as a generalized rough fuzzy set, and $\underline{RF}$ and $\overline{RF}: \mathcal{F}(W) \to \mathcal{F}(U)$ are referred to as lower and upper generalized rough fuzzy approximation operators, respectively.


In the discussion to follow, we always assume that $(U, \mathcal{A}, P)$ is a probability space, i.e., $U$ is a nonempty set, $\mathcal{A} \subseteq \mathcal{P}(U)$ a $\sigma$-algebra on $U$, and $P$ a probability measure on $U$.


Definition 26.10 

A fuzzy set $A \in \mathcal{F}(U)$ is said to be measurable w.r.t. $(U, \mathcal{A})$ if $A: U \to [0, 1]$ is a measurable function w.r.t. $\mathcal{A}$–$\mathcal{B}([0, 1])$, where $\mathcal{B}([0, 1])$ is the family of Borel sets on $[0, 1]$. We denote by $\mathcal{F}(U, \mathcal{A})$ the family of all measurable fuzzy sets of $U$ w.r.t. $\mathcal{A}$–$\mathcal{B}([0, 1])$.


For any measurable fuzzy set $A \in \mathcal{F}(U, \mathcal{A})$, since $A_\alpha \in \mathcal{A}$ for all $\alpha \in [0, 1]$, $A_\alpha$ is a measurable set on the probability space $(U, \mathcal{A}, P)$ and thus $P(A_\alpha) \in [0, 1]$. Since $f(\alpha) = P(A_\alpha)$ is monotone decreasing and left continuous, $f(\alpha)$ is integrable; we denote the integral by $\int_0^1 P(A_\alpha)\, d\alpha$.


Definition 26.11 
Let $A$ be a fuzzy set that is measurable w.r.t. $(U, \mathcal{A})$, and let $P$ be a probability measure on $(U, \mathcal{A})$. Denote

$$P(A) = \int_0^1 P(A_\alpha)\, d\alpha. \tag{26.134}$$

$P(A)$ is called the probability of $A$.


For a singleton set {x}, we will write P(x) instead of 
P({x}) for short. 


Proposition 26.1 
[26.21, 64] The fuzzy probability measure $P$ in Definition 26.11 satisfies the following properties:

(1) $P(A) \in [0, 1]$ and $P(A) + P(\sim A) = 1$, for all $A \in \mathcal{F}(U, \mathcal{A})$.

(2) $P$ is countably additive, i.e., for $A_i \in \mathcal{F}(U, \mathcal{A})$, $i = 1, 2, \ldots$, with $A_i \cap A_j = \emptyset$ for all $i \neq j$,

$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i). \tag{26.135}$$

(3) $A, B \in \mathcal{F}(U, \mathcal{A})$, $A \subseteq B \Longrightarrow P(A) \le P(B)$.

(4) If $U = \{u_i \mid i = 1, 2, \ldots\}$ is an infinite countable set and $\mathcal{A} = \mathcal{P}(U)$, then for all $A \in \mathcal{F}(U)$,

$$P(A) = \int_0^1 P(A_\alpha)\, d\alpha = \sum_{x \in U} A(x) P(x). \tag{26.136}$$

(5) If $U$ is a finite set with $|U| = n$, $\mathcal{A} = \mathcal{P}(U)$, and $P(u) = 1/n$, then $P(A) = \int_0^1 P(A_\alpha)\, d\alpha = |A|/n$ for all $A \in \mathcal{P}(U)$.
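
The identity between the integral of the cut probabilities and the weighted sum of memberships in (26.136) is stated there for countable universes; the Python sketch below simply checks the same identity on a three-element universe. It assumes the usual $\alpha$-cut $A_\alpha = \{x \mid A(x) \ge \alpha\}$, which is not spelled out in this passage, and all data and function names are made up for the example. Because $P(A_\alpha)$ is a step function of $\alpha$, the integral can be evaluated exactly by summing over the distinct membership levels.

```python
from fractions import Fraction


def prob_by_cuts(A, P):
    """Integral of P(A_alpha) over [0, 1], using A_alpha = {x | A(x) >= alpha}."""
    levels = sorted({v for v in A.values() if v > 0})
    total, previous = Fraction(0), Fraction(0)
    for level in levels:
        # P(A_alpha) is constant for alpha in (previous, level].
        cut_probability = sum(P[x] for x in A if A[x] >= level)
        total += (level - previous) * cut_probability
        previous = level
    return total


def prob_by_sum(A, P):
    """Right-hand side of (26.136): sum of A(x) P(x)."""
    return sum(A[x] * P[x] for x in A)


# Hypothetical fuzzy set and uniform probability on a three-element universe.
A = {"u1": Fraction(1, 2), "u2": Fraction(1), "u3": Fraction(1, 4)}
P = {u: Fraction(1, 3) for u in A}

via_cuts = prob_by_cuts(A, P)   # 7/12
via_sum = prob_by_sum(A, P)     # 7/12
```

Both routes give the same exact value, which here also equals the sum of the membership degrees divided by $|U|$.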


Theorem 26.2 
Assume that $((U, P), W, R)$ is a serial probability approximation space. For $X \in \mathcal{F}(W)$, define

$$m(X) = P(j(X)), \qquad \mathrm{Bel}(X) = P(\underline{RF}(X)), \qquad \mathrm{Pl}(X) = P(\overline{RF}(X)). \tag{26.137}$$

Then $m: \mathcal{P}(W) \to [0, 1]$ is a basic probability assignment on $W$, and $\mathrm{Bel}: \mathcal{F}(W) \to [0, 1]$ and $\mathrm{Pl}: \mathcal{F}(W) \to [0, 1]$ are, respectively, the CF-belief and CF-plausibility functions on $W$.

Conversely, let $(\mathcal{M}, m)$ be any crisp belief structure on $W$, which may be infinite. If $\mathrm{Bel}: \mathcal{F}(W) \to [0, 1]$ and $\mathrm{Pl}: \mathcal{F}(W) \to [0, 1]$ are, respectively, the CF-belief and CF-plausibility functions defined in Definition 26.3, then there exist a countable set $U$, a serial relation $R$ from $U$ to $W$, and a normalized probability measure $P$ on $U$ such that

$$\mathrm{Bel}(X) = P(\underline{RF}(X)), \qquad \mathrm{Pl}(X) = P(\overline{RF}(X)), \quad \forall X \in \mathcal{F}(W). \tag{26.138}$$
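
To make Definition 26.9 and the first half of Theorem 26.2 concrete, the sketch below computes the rough fuzzy approximations (26.132) and (26.133) of a fuzzy set on a toy serial approximation space, takes their probabilities as the weighted sums of (26.136) under a uniform $P$, and compares the belief value with the CF-belief (26.108) built from the induced mass. The neighborhoods, the fuzzy set and all names are assumptions for the illustration.

```python
# Hypothetical successor neighborhoods R_s(x) of a serial relation from U to W.
R_s = {
    "x1": ["y1", "y2"],
    "x2": ["y2"],
    "x3": ["y1", "y2", "y3"],
    "x4": ["y3"],
}
P = {x: 0.25 for x in R_s}                   # uniform probability on U
X = {"y1": 0.25, "y2": 0.75, "y3": 1.0}      # fuzzy subset of W


def rf_upper(x):
    """Upper rough fuzzy approximation (26.132): max of X over R_s(x)."""
    return max(X[y] for y in R_s[x])


def rf_lower(x):
    """Lower rough fuzzy approximation (26.133): min of X over R_s(x)."""
    return min(X[y] for y in R_s[x])


# Bel and Pl as probabilities of the fuzzy approximations, see (26.137).
bel = sum(P[x] * rf_lower(x) for x in R_s)   # 0.5625
pl = sum(P[x] * rf_upper(x) for x in R_s)    # 0.875

# The same belief via the induced mass m(A) = P(j(A)) and N_A from (26.110).
mass = {}
for x, neighborhood in R_s.items():
    key = frozenset(neighborhood)
    mass[key] = mass.get(key, 0.0) + P[x]
cf_bel = sum(w * min(X[u] for u in A) for A, w in mass.items())
```

The membership values are chosen so that all sums are exact in binary floating point; with arbitrary values a tolerance-based comparison would be more appropriate.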


For a decision table $(U, C \cup \{d\})$, where $V_d = \{d_1, d_2, \ldots, d_r\}$, $d$ is called a fuzzy decision if, for each $x \in U$, $d(x)$ is a fuzzy subset of $V_d$, i.e., $d: U \to \mathcal{F}(V_d)$. Without loss of generality, we represent $d$ as follows:

$$d(x_i) = d_{i1}/d_1 + d_{i2}/d_2 + \cdots + d_{ir}/d_r, \quad i = 1, 2, \ldots, n, \tag{26.139}$$

where $d_{ij} \in [0, 1]$. In this case, $(U, C \cup \{d\})$ is called an information system with fuzzy decision. For the fuzzy decision $d$, we define a fuzzy indiscernibility binary relation $R_d$ on $U$ as follows: for $i, k = 1, 2, \ldots, n$,

$$R_d(x_i, x_k) = \min\{1 - |d_{ij} - d_{kj}| : j = 1, 2, \ldots, r\}. \tag{26.140}$$


Then, we obtain a fuzzy similarity class $S_d(x)$ of $x \in U$ in the system $(U, C \cup \{d\})$ as follows:

$$S_d(x)(y) = R_d(x, y), \quad y \in U. \tag{26.141}$$

Since $S_d(x)(x) = R_d(x, x) = 1$, we see that $S_d(x): U \to [0, 1]$ is a normalized fuzzy set of $U$. Denote by $U/R_d$ the fuzzy similarity classes induced by the fuzzy decision $d$, i.e.,

$$U/R_d = \{S_d(x) \mid x \in U\}. \tag{26.142}$$
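
The fuzzy relation (26.140) is straightforward to compute once each object's fuzzy decision is stored as a vector of membership degrees. The sketch below uses a hypothetical table with three objects and two decision values; the data layout and function name are assumptions for the example.

```python
def fuzzy_decision_relation(d):
    """R_d(x_i, x_k) = min_j (1 - |d_ij - d_kj|), see (26.140).

    d is a list of rows; row i holds the degrees d_i1, ..., d_ir of object x_i.
    """
    n = len(d)
    return [[min(1 - abs(a - b) for a, b in zip(d[i], d[k]))
             for k in range(n)] for i in range(n)]


# Hypothetical fuzzy decisions of three objects over two decision values.
d = [
    [1.0, 0.0],
    [0.75, 0.25],
    [0.0, 1.0],
]
R_d = fuzzy_decision_relation(d)

# Row i of R_d is the fuzzy similarity class S_d(x_i) of (26.141).
S_d_x1 = R_d[0]   # [1.0, 0.75, 0.0]
```

The resulting matrix is reflexive (ones on the diagonal) and symmetric, so each similarity class is a normalized fuzzy set, as noted in the text.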


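For $B \subseteq C$ and $X \in \mathcal{F}(U)$, we define the lower and upper approximations of $X$ w.r.t. $(U, R_B)$ as follows:

$$\underline{RF_B}(X)(x) = \bigwedge_{y \in S_B(x)} X(y), \quad x \in U, \qquad \overline{RF_B}(X)(x) = \bigvee_{y \in S_B(x)} X(y), \quad x \in U, \tag{26.143}$$

where $S_B(x) = \{y \in U \mid (x, y) \in R_B\}$.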


Theorem 26.3 

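Let $(U, C \cup \{d\})$ be an information system with fuzzy decision. For $B \subseteq C$ and $X \in \mathcal{F}(U)$, if $\underline{RF_B}(X)$ and $\overline{RF_B}(X)$ are, respectively, the lower and upper approximations of $X$ w.r.t. $(U, R_B)$ defined by Definition 26.9, denote

$$\mathrm{Bel}_B(X) = P(\underline{RF_B}(X)), \qquad \mathrm{Pl}_B(X) = P(\overline{RF_B}(X)), \tag{26.144}$$

where $P(X) = \sum_{x \in U} X(x)/|U|$ for $X \in \mathcal{F}(U)$. Then $\mathrm{Bel}_B: \mathcal{F}(U) \to [0, 1]$ and $\mathrm{Pl}_B: \mathcal{F}(U) \to [0, 1]$ are, respectively, a CF-belief function and a CF-plausibility function on $U$, and the corresponding basic probability assignment $m_B$ is

$$m_B(Y) = \begin{cases} P(Y) = |Y|/|U|, & \text{if } Y \in U/R_B, \\ 0, & \text{otherwise}. \end{cases} \tag{26.145}$$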


Belief Functions 
versus Fuzzy Rough Approximations 


Definition 26.12 

Let $U$ and $W$ be two nonempty universes of discourse. A fuzzy subset $R \in \mathcal{F}(U \times W)$ is referred to as a fuzzy binary relation from $U$ to $W$; $R(x, y)$ is the degree of relation between $x$ and $y$, where $(x, y) \in U \times W$. The fuzzy relation $R$ is referred to as serial if for each $x \in U$, $\bigvee_{y \in W} R(x, y) = 1$. If $U = W$, $R \in \mathcal{F}(U \times U)$ is called a fuzzy binary relation on $U$; $R$ is referred to as a reflexive fuzzy relation if $R(x, x) = 1$ for all $x \in U$; $R$ is referred to as a symmetric fuzzy relation if $R(x, y) = R(y, x)$ for all $x, y \in U$; $R$ is referred to as a transitive fuzzy relation if $R(x, z) \ge \bigvee_{y \in U} (R(x, y) \wedge R(y, z))$ for all $x, z \in U$; $R$ is referred to as an equivalence fuzzy relation if it is reflexive, symmetric, and transitive.


Definition 26.13 

Let $U$ and $W$ be two nonempty universes of discourse and $R$ a fuzzy relation from $U$ to $W$. The triple $(U, W, R)$ is called a generalized fuzzy approximation space. For any set $A \in \mathcal{F}(W)$, the lower and upper approximations of $A$, $\underline{FR}(A)$ and $\overline{FR}(A)$, with respect to the approximation space $(U, W, R)$ are fuzzy sets of $U$ whose membership functions, for each $x \in U$, are defined by

$$\overline{FR}(A)(x) = \bigvee_{y \in W} \big[R(x, y) \wedge A(y)\big], \quad x \in U, \qquad \underline{FR}(A)(x) = \bigwedge_{y \in W} \big[(1 - R(x, y)) \vee A(y)\big], \quad x \in U. \tag{26.146}$$

The pair $(\underline{FR}(A), \overline{FR}(A))$ is referred to as a generalized fuzzy rough set, and $\underline{FR}$ and $\overline{FR}: \mathcal{F}(W) \to \mathcal{F}(U)$ are referred to as lower and upper generalized fuzzy rough approximation operators, respectively.


Theorem 26.4 

Let $(U, W, R)$ be a serial fuzzy approximation space in which $U$ is a countable set and $P$ a probability measure on $U$. If $\underline{FR}$ and $\overline{FR}$ are the fuzzy rough approximation operators defined in Definition 26.13, denote

$$\mathrm{Bel}(X) = P(\underline{FR}(X)), \qquad \mathrm{Pl}(X) = P(\overline{FR}(X)), \quad X \in \mathcal{F}(W). \tag{26.147}$$

Then $\mathrm{Bel}: \mathcal{F}(W) \to [0, 1]$ and $\mathrm{Pl}: \mathcal{F}(W) \to [0, 1]$ are, respectively, FF-fuzzy belief and FF-plausibility functions on $W$.

Conversely, if $(\mathcal{M}, m)$ is a fuzzy belief structure on $W$, and $\mathrm{Bel}: \mathcal{F}(W) \to [0, 1]$ and $\mathrm{Pl}: \mathcal{F}(W) \to [0, 1]$ are the pair of FF-fuzzy belief function and FF-plausibility function defined in Definition 26.6, then there exist a countable set $U$, a serial fuzzy relation $R$ from $U$ to $W$, and a probability measure $P$ on $U$ such that for all $X \in \mathcal{F}(W)$,

$$\mathrm{Bel}(X) = P(\underline{FR}(X)) = \sum_{x \in U} \underline{FR}(X)(x) P(x), \tag{26.148}$$

$$\mathrm{Pl}(X) = P(\overline{FR}(X)) = \sum_{x \in U} \overline{FR}(X)(x) P(x). \tag{26.149}$$
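
The fuzzy rough approximations of Definition 26.13 and the weighted sums appearing in (26.148) and (26.149) can be evaluated directly for finite universes. The following sketch uses a hypothetical serial fuzzy relation stored as a nested dict and a uniform probability on $U$; the data and names are assumptions for the illustration.

```python
# Hypothetical serial fuzzy relation R from U = {x1, x2} to W = {y1, y2}:
# each row attains the value 1 for at least one y.
R = {
    "x1": {"y1": 1.0, "y2": 0.5},
    "x2": {"y1": 0.25, "y2": 1.0},
}
P = {"x1": 0.5, "x2": 0.5}
X = {"y1": 0.75, "y2": 0.25}   # fuzzy subset of W


def fr_upper(x):
    """Upper fuzzy rough approximation in (26.146): max_y min(R(x, y), X(y))."""
    return max(min(R[x][y], X[y]) for y in X)


def fr_lower(x):
    """Lower fuzzy rough approximation in (26.146): min_y max(1 - R(x, y), X(y))."""
    return min(max(1 - R[x][y], X[y]) for y in X)


# Belief and plausibility as probability-weighted sums of the approximations.
bel = sum(fr_lower(x) * P[x] for x in R)   # 0.375
pl = sum(fr_upper(x) * P[x] for x in R)    # 0.5
```

When the relation is crisp (all degrees 0 or 1), `fr_lower` and `fr_upper` reduce to the rough fuzzy operators of (26.133) and (26.132).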


A pair $(U, A)$ is called a fuzzy information system if each $a \in A$ is a fuzzy attribute, i.e., for each $x \in U$, $a(x)$ is a fuzzy subset of $V_a$, that is, $a: U \to \mathcal{F}(V_a)$. Similar to (26.140), we can define a reflexive fuzzy binary relation $R_a$ on $U$, and consequently, for any attribute subset $B \subseteq A$ one can define a reflexive fuzzy relation $R_B$ as follows:

$$R_B = \bigcap_{a \in B} R_a. \tag{26.150}$$

For $X \in \mathcal{F}(U)$, denote

$$\mathrm{Bel}_B(X) = P(\underline{FR_B}(X)), \qquad \mathrm{Pl}_B(X) = P(\overline{FR_B}(X)), \tag{26.151}$$

where $P(X) = \sum_{x \in U} X(x)/|U|$ for $X \in \mathcal{F}(U)$. Then, according to Theorem 26.4, $\mathrm{Bel}_B: \mathcal{F}(U) \to [0, 1]$ and $\mathrm{Pl}_B: \mathcal{F}(U) \to [0, 1]$ are, respectively, an FF-fuzzy belief function and an FF-plausibility function on $U$. More specifically, if $X$ in (26.151) is a crisp subset of $U$, then $\mathrm{Bel}_B: \mathcal{P}(U) \to [0, 1]$ and $\mathrm{Pl}_B: \mathcal{P}(U) \to [0, 1]$ defined by (26.151) are, respectively, an FC-fuzzy belief function and an FC-plausibility function on $U$. Based on these observations, we believe that FF-fuzzy belief functions and FF-plausibility functions can be used to analyze uncertainty in fuzzy information systems with fuzzy decision, whereas FC-fuzzy belief functions and FC-plausibility functions can be employed to deal with knowledge discovery in fuzzy information systems with crisp decision.


26.3.3 Conclusion of This Section 


The lower and upper approximations of a set capture 
the non-numeric aspect of uncertainty of the set which 
can be interpreted as the qualitative representation of 
the set, whereas the pair of the belief and plausibil- 
ity measures of the set characterize the numeric aspect 
of uncertainty of the set which can be treated as the 
quantitative characterization of the set. In this section, we have introduced some generalized belief and plausibility functions in the Dempster-Shafer theory of evidence. We have shown that the fuzzy belief and plausibility functions can be interpreted in terms of the lower and upper approximations in rough set theory. That is, the belief and plausibility functions in the Dempster-Shafer theory of evidence can be represented as the probabilities of lower and upper approximations in rough set theory; thus, rough set theory may be regarded as the basis of the Dempster-Shafer theory of evidence. Also, the Dempster-Shafer theory of evidence in the fuzzy environment provides a potentially useful tool for reasoning and knowledge acquisition in fuzzy systems and fuzzy decision systems.


26.4 Applications of Fuzzy Rough Sets 


Both fuzzy set and rough set theories have fostered 
broad research communities and have been applied 
in a wide range of settings. More recently, this has 
also extended to the hybrid fuzzy rough set models. 
This section tries to give a sample of those applica- 
tions, which are in particular numerous for machine 
learning but which also cover many other fields, like 
image processing, decision making, and information re- 
trieval. 

Note that we do not consider applications that 
simply involve a joint application of fuzzy sets and 
rough sets, like for instance a rough classifier that in- 
duces fuzzy rules. Rather, we focus on applications that 
specifically involve one of the fuzzy rough set models 
discussed in the previous sections. 


26.4.1 Applications in Machine Learning 


Feature Selection 
The most prominent application of classical rough set theory is undoubtedly semantics-preserving data dimensionality reduction: the removal of attributes (features) from information systems without sacrificing the ability to discern between different objects. (An information system $(U, A)$ consists of a nonempty set $U$ of objects which are described by a set of attributes $A$.) A minimal attribute subset $B \subseteq A$ that maintains objects' discernibility is called a reduct. For classification tasks, it is sufficient to be able to discern between objects belonging to different classes, in which case a decision reduct, also called relative reduct, is sought.

The traditional rough set model sets forth a crisp 
notion of discernibility, where two objects are either 
discernible or not w.r.t. a set of attributes B based on 
their values for all attributes in B. To be able to handle 
numerical data, discretization is required. Fuzzy-rough 
feature selection avoids this external preprocessing step 
by incorporating graded indiscernibility between ob- 
jects directly into the data reduction process. On the 
other hand, by the use of fuzzy partitions, such that ob- 


jects can belong to different classes to varying degrees, 
a more flexible data representation is obtained. 

Chronologically, the oldest proposal to apply 
fuzzy rough sets to feature selection is due to 
Kuncheva [26.26] in 1992. However, rather than using 
Dubois and Prade’s definition, she proposed her own 
notion of a fuzzy rough set based on an inclusion mea- 
sure. Based on this, she defined a quality measure for 
evaluating attribute subsets w.r.t. their ability to approx- 
imate a predetermined fuzzy partition on the data, and 
illustrated its usefulness on a medical data set. 

Jensen and Shen [26.27, 29] were the first to propose a reduction method that generalizes the classical rough set positive region and dependency function. In particular, the dependency degree $\gamma_B$, with $B \subseteq A$, is used to guide a hill-climbing search in which, starting from $B = \emptyset$, in each step an attribute $a$ is added such that $\gamma_{B \cup \{a\}}$ is maximal. The search ends when there is no further increase in the measure. This is the Quick Reduct algorithm. In [26.65] they replaced this simple greedy search heuristic by a more complex one based on ant colony optimization.
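
The greedy search described above can be sketched in a few lines of Python. The sketch is not the authors' implementation: the dependency measure is passed in as a function, and for the demonstration it is instantiated with the classical crisp dependency degree (fraction of objects in the positive region) as a stand-in for the fuzzy-rough $\gamma_B$, which is not defined in this passage. The table layout, attribute names and the tie-breaking rule are likewise assumptions.

```python
from collections import defaultdict


def crisp_dependency(rows, B, d):
    """Stand-in measure: fraction of objects whose B-class has a unique decision."""
    decisions = defaultdict(list)
    for row in rows:
        decisions[tuple(row[a] for a in B)].append(row[d])
    positive = sum(len(v) for v in decisions.values() if len(set(v)) == 1)
    return positive / len(rows)


def quick_reduct(rows, attributes, d, gamma=crisp_dependency):
    """Hill-climbing search: add the attribute that maximizes gamma until no gain."""
    B, best = [], gamma(rows, [], d)
    while True:
        candidates = [a for a in attributes if a not in B]
        if not candidates:
            return B
        # Ties between equal scores are broken by attribute name (an arbitrary choice).
        score, chosen = max((gamma(rows, B + [a], d), a) for a in candidates)
        if score <= best:
            return B
        B.append(chosen)
        best = score


# Hypothetical decision table: d is 0 only when a = 0 and b = 0; c is constant.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 1},
]
selected = quick_reduct(rows, ["a", "b", "c"], "d")
```

On this table the search selects `a` and `b` and stops, because adding the uninformative attribute `c` does not increase the measure. As with any greedy heuristic, the result is not guaranteed to be a minimal subset in general.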

Hu et al. [26.66] formally defined the notions of 
reduct and decision reduct in the fuzzy-rough case, 
referring to the invariance of the fuzzy partition in- 
duced by the data, and of the fuzzy positive region, 
respectively. They also showed that minimal subsets 
that are invariant w.r.t. (conditional) entropy are (deci- 
sion) reducts. 

Tsang et al. [26.67] proposed a method based on the discernibility matrix and function to find all decision reducts where invariance of the fuzzy positive region defined using Dubois and Prade's definition is imposed, and proved its correctness. In [26.68], an extension of this method is defined that finds all decision reducts where the approximations are defined using a lower semicontinuous t-norm $T$ and its R-implicator. The particular case using Łukasiewicz connectives was studied in [26.69]. Later, Zhao and Tsang [26.31] studied relationships that exist between different kinds of decision reducts, defined using different types of fuzzy connectives.

In [26.70], Jensen and Shen introduced three differ- 
ent quality measures for evaluating attribute subsets: the 
first one is a revised version of their previously defined 
degree of dependency, the second one is based on the 
fuzzy boundary region, and the third one on the satisfac- 
tion of the clauses of the fuzzy discernibility function. 
On the other hand, in [26.71], Cornelis et al. proposed 
the definition of fuzzy M-decision reducts, where M is 
an increasing, [0, 1]-valued quality measure. They stud- 
ied two measures based on the fuzzy positive region 
and two more based on the fuzzy discernibility func- 
tion, and applied them to classification and regression 
problems. 

In [26.33], Chen and Zhao studied the concept of lo- 
cal reduction: instead of looking for a global reduction, 
where the whole positive region is considered as an in- 
variant, they focus on subsets of decision classes and 
identify the conditional attributes that provide minimal 
descriptions for them. 

Over the past few years, there has also been consid- 
erable interest in the application of noise-tolerant fuzzy 
rough set models to feature selection, where the aim is 
to make the reduction more robust in the presence of 
noisy or erroneous data. For instance, Hu et al. [26.72] 
defined fuzzy rough sets as an extension of variable 
precision rough sets, and used a corresponding notion 
of positive region to guide a greedy search algorithm. 
In [26.73], Cornelis and Jensen evaluated the vaguely 
quantified rough set (VQRS) approach to feature se- 
lection. They found that because the model does not 
satisfy monotonicity w.r.t. the fuzzy relation R, adding 
more attributes does not always lead to an expansion of 
the fuzzy positive region, and the hill-climbing search
sometimes runs into trouble. Furthermore, in [26.74]
Hu et al., inspired by the idea of soft margin support 
vector machines, introduced soft fuzzy rough sets and 
applied them to feature selection. 

He et al. [26.75] consider the problem of fuzzy-rough
feature selection for decision systems with fuzzy
decisions, that is, where the decision attribute is charac- 
terized by a fuzzy T-similarity relation instead of a crisp 
one. This is the case of regression problems. They give 
an algorithm for finding all decision reducts and another 
one for finding a single reduction. 

The relatively high complexity of fuzzy-rough feature selection algorithms somewhat limits their applicability to large datasets. In view of this, Chen et al. [26.76]
propose a fast algorithm to obtain one reduct, based 
on a procedure to find the minimal elements of the 


discernibility matrix of [26.67]. The algorithm is com- 
pared w.r.t. execution time with the proposals in [26.70] 
and [26.67], and turns out to be a lot faster. On the other 
hand, Qian et al. [26.77] implement an efficient version 
of feature selection using the model of Hu et al. [26.72]. 

The use of kernel functions as fuzzy similarity re- 
lations in feature selection algorithms has also sparked 
researchers’ interest. In particular, Du et al. [26.78] ap- 
ply fuzzy-rough feature selection with kernelized fuzzy 
rough sets to yawn detection, while Chen et al. [26.79] 
propose parameterized attribute reduction with Gaus- 
sian kernel-based fuzzy rough sets. He and Wu [26.80] 
develop a new method to compute membership for 
fuzzy support vector machines (FSVMs) by using 
a Gaussian kernel-based fuzzy rough set, and em- 
ploy a technique of attribute reduction using Gaussian 
kernel-based fuzzy rough sets to perform feature selec- 
tion for FSVMs. 

Finally, Derrac et al. [26.81] combine fuzzy-rough 
feature selection with evolutionary instance selection. 


Instance Selection 

Instance selection can be seen as the orthogonal task 
to feature selection: here the goal is to reduce an in- 
formation system (U,A) by removing objects from U. 
The first work on instance selection using fuzzy rough 
set theory was presented in [26.82]. The main idea is 
that instances for which the fuzzy rough lower approx- 
imation membership is lower than a certain threshold 
are removed. This idea was improved in [26.83], where 
the selection threshold is optimized. This method has 
been applied in combination with evolutionary feature 
selection in [26.84] and for imbalanced classification 
problems in [26.85, 86], in combination with resam- 
pling methods. 
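
The thresholding idea behind fuzzy-rough instance selection can be sketched as follows. This is only an illustrative reading of the description above, not the method of [26.82]: it assumes that the relevant quantity is the membership of each instance to the fuzzy rough lower approximation of its own (crisp) class, computed in the style of (26.146), and it uses a made-up one-dimensional similarity function and a hypothetical threshold `tau`.

```python
def lower_membership(i, features, labels, similarity):
    """Membership of instance i to the lower approximation of its own class.

    With a crisp class, min_y max(1 - R(x, y), class(y)) reduces to the
    minimum of 1 - R(x, y) over instances y of a different class.
    """
    others = [1 - similarity(features[i], features[k])
              for k in range(len(features)) if labels[k] != labels[i]]
    return min(others) if others else 1.0


def select_instances(features, labels, similarity, tau):
    """Keep the indices whose lower approximation membership is at least tau."""
    return [i for i in range(len(features))
            if lower_membership(i, features, labels, similarity) >= tau]


def sim(a, b):
    """Hypothetical similarity on one normalized feature."""
    return max(0.0, 1 - abs(a - b))


# Hypothetical data: one feature in [0, 1] and two classes.
features = [0.0, 0.25, 0.5, 1.0]
labels = ["n", "n", "p", "p"]
kept = select_instances(features, labels, sim, tau=0.5)   # [0, 3]
```

The two instances near the class boundary (indices 1 and 2) have low lower approximation membership and are removed, while the instances far from the other class are kept.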


Classification 
Fuzzy rough sets have been widely used for classifi- 
cation purposes, either by means of rule induction or 
by plugging them into existing classifiers like nearest 
neighbor classifiers, decision trees, and support vector 
machines (SVM). 

The earliest work on rule induction using fuzzy 
rough set theory can be found in [26.25]. In this pa- 
per, the authors propose a fuzzy rough framework 
to induce fuzzy decision rules that does not use any 
fuzzy logical connectives. Later, in [26.30], an approach 
that generates rules from data using fuzzy reducts was 
presented, with a fuzzy rough feature selection preprocessing step. In [26.87], the authors noticed that using feature selection as a preprocessing step often leads to too specific rules, and proposed an algorithm for simultaneous feature selection and rule induction. In [26.88,
89], a rule-based classifier is built using the so-called 
consistency degree as a critical value to keep the dis- 
cernibility information invariant in the rule-induction 
process. Another approach to fuzzy rough rule induc- 
tion can be found in [26.90], where rules are found from 
training data with hierarchical and quantitative attribute 
values. The most recent work can be found in [26.91], 
where fuzzy equivalence relations are used to model 
different types of attributes in order to obtain small rule 
sets from hybrid data, and in [26.92] where a harmony 
search algorithm is proposed to generate emerging rule 
sets. 

In [26.93], the K nearest neighbor method was 
improved using fuzzy set theory. So far, three differ- 
ent fuzzy-rough-based approaches have been used to 
improve this fuzzy nearest neighbor (FNN) classifier. 
In [26.94], the author introduces a fuzzy rough own- 
ership function and plugs it into the FNN algorithm. 
In [26.95–98], the extent to which the nearest neighbors belong to the fuzzy lower and upper approximations of a certain class is used to predict the class of the target instance; these techniques are applied in [26.99] for mammographic risk analysis. Finally, in [26.100], the FNN algorithm is improved using the fuzzy rough positive regions as weights for the nearest neighbors.

During the last decade, several authors have worked 
on fuzzy rough improvements of decision trees. The 
common idea of these methods is that during the 
construction phase of the decision tree, the feature 
significances are measured using fuzzy rough techniques [26.101–104]. In [26.105–107], the kernel functions of the SVM are redefined using fuzzy rough sets,
to take into account the inconsistency between condi- 
tional attributes and the decision class. In [26.80], this 
approach is combined with fuzzy rough feature selec- 
tion. In [26.108], SVMs are reformulated by plugging 
in the fuzzy rough memberships of all training samples 
into the constraints of the SVMs. 


Clustering 
Many authors have worked on clustering methods that 
use both fuzzy set theory and rough set theory, but to the 
best of our knowledge, only two approaches use fuzzy 
rough sets for clustering. In [26.109], fuzzy rough sets 
are used to measure the intracluster similarity, in order 
to estimate the optimal number of clusters. In [26.110], 
a fuzzy rough measure is used to measure the similarity 
between genes in microarray analysis, in order to gen- 
erate clusters such that genes within a cluster are highly 


correlated to the sample categories, while those in dif- 
ferent clusters are as dissimilar as possible. 


Neural Networks 

There are many approaches to incorporate fuzzy rough 
set theory in neural networks. One option is to use fuzzy 
rough set theory to reduce the problem that samples 
in the same input clusters can have different classes. 
The resulting fuzzy rough neural networks are designed 
such that they work as fuzzy rough membership functions [26.111–114]. A related approach is to use fuzzy rough set theory to find the importance of each subset of information sources of subnetworks [26.115]. Other approaches use fuzzy rough set theory to measure the importance of each feature in the input layer of the neural network [26.116–118].


26.4.2 Other Applications 


Image Processing 
Fuzzy rough sets have been used in several domains of 
image processing. They are especially suitable for these 
tasks because they can capture both indiscernibility and 
vagueness, which are two important aspects of image 
processing. 

In [26.119, 120], fuzzy-rough-based image segmentation
methods are proposed and applied in a traditional
Chinese medicine tongue image segmentation experiment.
Fuzzy rough attribute reduction methods are often
proposed for image processing problems, as in [26.121]
or in [26.122], where the methods are applied to face
recognition. In [26.123], a method for edge detection is
proposed that builds a hierarchy of rough-fuzzy sets to
exploit the uncertainty and vagueness at different image
resolutions. Another aspect of image processing is
texture segmentation; this problem is tackled in
[26.124] using rough-fuzzy sets. In [26.125], the authors
solve the image classification problem using a nearest
neighbor clustering algorithm based on fuzzy rough set
theory and apply their algorithm to hand gesture
recognition. In [26.126], neural network classification
systems are combined with a fuzzy rough set based feature
reduction method. In [26.127], fuzzy rough feature
reduction techniques are applied to a large-scale Mars
McMurdo panorama image.


Decision Making 
Fuzzy rough set theory has many applications in decision
making. In [26.128], the authors calculate the fuzzy
rough memberships of software components in previous
projects and decide, based on these values, which ones
to reuse in a new program. In [26.129, 130], a
multiobjective decision-making model based on fuzzy
rough set theory is used to solve the inventory problem.
In [26.131], variable precision fuzzy rough sets are used
to develop a decision-making model that is applied to IT
offshore outsourcing risk evaluation. Another approach
can be found in [26.132], where the selected decision is
the one associated with the instance that has the maximal
sum of lower and upper soft fuzzy rough approximations.
Recent work can be found in [26.133], where a fuzzy rough
set model over two universes is defined to develop a
general decision-making framework in an uncertain
environment for solving a medical diagnosis problem.


Information Retrieval, Data Mining, and the Web
Fuzzy rough sets have been used to model imprecision
and vagueness in databases. In [26.134], the authors
develop a fuzzy rough relational database, while in
[26.135], a fuzzy rough extension of a rough object
classifier for relational database mining is studied.
In [26.136], fuzzy rough set theory is used to mine from
incomplete datasets, while in [26.137], fuzzy rough sets
are incorporated in mining agents for predicting stock
prices. More recently, fuzzy rough sets have been applied
to identify imprecision in temporal database models
[26.138, 139].

In [26.140, 141], fuzzy rough set theory is used to
approximate document queries. In the context of the
semantic web, a lot of work has been done on fuzzy rough
description logics. The first paper on this topic is
[26.142], where a fuzzy rough ontology was proposed.
Later, in [26.143], the authors propose a fuzzy rough
extension of the description logic SHIN. A fuzzy rough
extension of the description logic ALC can be found in
[26.144]. In [26.145, 146], an improved and more general
approach is presented.


References

26.1 Z. Pawlak: Rough sets, Int. J. Comput. Inf. Sci. 11, 
341-356 (1982) 

26.2 Z. Pawlak: Rough Sets: Theoretical Aspects of Rea- 
soning About Data (Kluwer, Boston 1991) 

26.3 A. Nakamura: Fuzzy rough sets, Notes Mult.- 
Valued Log. Jpn. 9, 1-8 (1988) 

26.4 D. Dubois, H. Prade: Rough fuzzy sets and 
fuzzy rough sets, Int. J. Gen. Syst. 17, 191-209 
(1990) 

26.5 D. Dubois, H. Prade: Putting rough sets and fuzzy
sets together. In: Intelligent Decision Support, ed.
by R. Słowiński (Kluwer, Boston 1992) pp. 203-232

26.6 N.N. Morsi, M.M. Yakout: Axiomatics for fuzzy 
rough sets, Fuzzy Sets Syst. 100, 327-342 (1998) 

26.7 S. Greco, B. Matarazzo, R. Słowiński: The use of
rough sets and fuzzy sets in MCDM. In: Multicriteria
Decision Making, ed. by T. Gal, T.J. Stewart,
T. Hanne (Kluwer, Boston 1999) pp. 397-455

26.8 D. Boixader, J. Jacas, J. Recasens: Upper and lower 
approximations of fuzzy sets, Int. J. Gen. Syst. 29, 
555-568 (2000) 

26.9 A.M. Radzikowska, E.E. Kerre: A comparative study
of fuzzy rough sets, Fuzzy Sets Syst. 126, 137-155
(2002)

26.10 M. Inuiguchi, T. Tanino: New fuzzy rough sets
based on certainty qualification. In: Rough-Neural
Computing, ed. by S.K. Pal, L. Polkowski,
A. Skowron (Springer, Berlin, Heidelberg 2003)
pp. 278-296


26.11 W.-Z. Wu, J.-S. Mi, W.-X. Zhang: Generalized 
fuzzy rough sets, Inf. Sci. 151, 263-282 (2003) 

26.12 M. Inuiguchi: Generalization of rough sets: From 
crisp to fuzzy cases, Lect. Notes Artif. Intell. 3066, 
26-37 (2004) 

26.13 A.M. Radzikowska, E.E. Kerre: Fuzzy rough sets 
based on residuated lattices, Lect. Notes Comput. 
Sci. 3135, 278-296 (2004) 

26.14 W.-Z. Wu, W.-X. Zhang: Constructive and ax- 
iomatic approaches of fuzzy approximation op- 
erators, Inf. Sci. 159, 233-254 (2004) 

26.15 J.-S. Mi, W.-X. Zhang: An axiomatic characteriza- 
tion of a fuzzy generalization of rough sets, Inf. 
Sci. 160, 235-249 (2004) 

26.16 W.-Z. Wu, Y. Leung, J.-S. Mi: On characterizations
of (I,T)-fuzzy rough approximation operators,
Fuzzy Sets Syst. 154, 76-102 (2005)

26.17 D.S. Yeung, D.G. Chen, E.C.C. Tsang, J.W.T. Lee, 
X.Z. Wang: On the generalization of fuzzy 
rough sets, IEEE Trans. Fuzzy Syst. 13, 343-361 
(2005) 

26.18 M. De Cock, C. Cornelis, E.E. Kerre: Fuzzy rough
sets: The forgotten step, IEEE Trans. Fuzzy Syst. 15,
121-130 (2007)

26.19 T.J. Li, W.X. Zhang: Rough fuzzy approximations 
on two universes of discourse, Inf. Sci. 178, 892- 
906 (2008) 

26.20 J.-S. Mi, Y. Leung, H.-Y. Zhao, T. Feng: Generalized 
fuzzy rough sets determined by a triangular norm, 
Inf. Sci. 178, 3203-3213 (2008) 


26.21 W.-Z. Wu, Y. Leung, J.-S. Mi: On generalized fuzzy belief functions in infinite spaces, IEEE Trans. Fuzzy Syst. 17, 385-397 (2009)

26.22 X.D. Liu, W. Pedrycz, T.Y. Chai, M.L. Song: The development of fuzzy rough sets with the use of structures and algebras of axiomatic fuzzy sets, IEEE Trans. Knowl. Data Eng. 21, 443-462 (2009)

26.23 W.-Z. Wu: On some mathematical structures of T-fuzzy rough set algebras in infinite universes of discourse, Fundam. Inf. 108, 337-369 (2011)

26.24 S. Greco, M. Inuiguchi, R. Słowiński: Rough sets and gradual decision rules, Lect. Notes Artif. Intell. 2639, 156-164 (2003)

26.25 S. Greco, M. Inuiguchi, R. Słowiński: Fuzzy rough sets and multiple-premise gradual decision rules, Int. J. Approx. Reason. 41(2), 179-211 (2006)

26.26 L.I. Kuncheva: Fuzzy rough sets: Application to feature selection, Fuzzy Sets Syst. 51, 147-153 (1992)

26.27 R. Jensen, Q. Shen: Fuzzy-rough attributes reduction with application to web categorization, Fuzzy Sets Syst. 141, 469-485 (2004)

26.28 R. Jensen, Q. Shen: Semantics-preserving dimensionality reduction: Rough and fuzzy-rough based approaches, IEEE Trans. Knowl. Data Eng. 16, 1457-1471 (2004)

26.29 R. Jensen, Q. Shen: Fuzzy-rough sets assisted attribute selection, IEEE Trans. Fuzzy Syst. 15, 73-89 (2007)

26.30 X.Z. Wang, E.C.C. Tsang, S.Y. Zhao, D.G. Chen, D.S. Yeung: Learning fuzzy rules from fuzzy samples based on rough set technique, Fuzzy Sets Syst. 177, 4493-4514 (2007)

26.31 S.Y. Zhao, E.C.C. Tsang: On fuzzy approximation operators in attribute reduction with fuzzy rough sets, Inf. Sci. 178, 3163-3176 (2008)

26.32 S.Y. Zhao, E.C.C. Tsang, D.G. Chen: The model of fuzzy variable precision rough sets, IEEE Trans. Fuzzy Syst. 17, 451-467 (2009)

26.33 D.G. Chen, S.Y. Zhao: Local reduction of decision system with fuzzy rough sets, Fuzzy Sets Syst. 161, 1871-1883 (2010)

26.34 Q.H. Hu, L. Zhang, D.G. Chen, W. Pedrycz, D.R. Yu: Gaussian kernel based fuzzy rough sets: Model, uncertainty measures and applications, Int. J. Approx. Reason. 51, 453-471 (2010)

26.35 Q.H. Hu, D.R. Yu, W. Pedrycz, D.G. Chen: Kernelized fuzzy rough sets and their applications, IEEE Trans. Knowl. Data Eng. 23, 1649-1667 (2011)

26.36 Q.H. Hu, L. Zhang, S. An, D. Zhang, D.R. Yu: On robust fuzzy rough set models, IEEE Trans. Fuzzy Syst. 20, 636-651 (2012)

26.37 M. Inuiguchi: Classification- versus approximation-oriented fuzzy rough sets, Proc. Inf. Process. Manag. Uncertain. Knowl.-Based Syst. (2004), CD-ROM

26.38 G. Shafer: A Mathematical Theory of Evidence (Princeton Univ. Press, Princeton 1976)


26.39 A. Skowron: The relationship between rough set theory and evidence theory, Bull. Polish Acad. Sci. Math. 37, 87-90 (1989)

26.40 A. Skowron: The rough sets theory and evidence theory, Fundam. Inf. 13, 245-262 (1990)

26.41 A. Skowron, J. Grzymala-Busse: From rough set theory to evidence theory. In: Advances in the Dempster-Shafer Theory of Evidence, ed. by R.R. Yager, M. Fedrizzi, J. Kacprzyk (Wiley, New York 1994) pp. 193-236

26.42 W.-Z. Wu, Y. Leung, W.-X. Zhang: Connections between rough set theory and Dempster-Shafer theory of evidence, Int. J. Gen. Syst. 31, 405-430 (2002)

26.43 W.-Z. Wu, J.-S. Mi: Some mathematical structures of generalized rough sets in infinite universes of discourse, Lect. Notes Comput. Sci. 6499, 175-206 (2011)

26.44 Y.Y. Yao, P.J. Lingras: Interpretations of belief functions in the theory of rough sets, Inf. Sci. 104, 81-106 (1998)

26.45 P.J. Lingras, Y.Y. Yao: Data mining using extensions of the rough set model, J. Am. Soc. Inf. Sci. 49, 415-422 (1998)

26.46 W.-Z. Wu: Attribute reduction based on evidence theory in incomplete decision systems, Inf. Sci. 178, 1355-1371 (2008)

26.47 W.-Z. Wu: Knowledge reduction in random incomplete decision tables via evidence theory, Fundam. Inf. 115, 203-218 (2012)

26.48 W.-Z. Wu, M. Zhang, H.-Z. Li, J.-S. Mi: Knowledge reduction in random information systems via Dempster-Shafer theory of evidence, Inf. Sci. 174, 143-164 (2005)

26.49 M. Zhang, L.D. Xu, W.-X. Zhang, H.-Z. Li: A rough set approach to knowledge reduction based on inclusion degree and evidence reasoning theory, Expert Syst. 20, 298-304 (2003)

26.50 M. Inuiguchi: Generalization of rough sets and rule extraction, Lect. Notes Comput. Sci. 3100, 96-119 (2004)

26.51 E.P. Klement, R. Mesiar, E. Pap: Triangular Norms (Kluwer, Boston 2000)

26.52 W. Wu, J. Mi, W. Zhang: Generalized fuzzy rough sets, Inf. Sci. 151, 263-282 (2003)

26.53 M. Inuiguchi, M. Sakawa: On the closure of generation processes of implication functions from a conjunction function. In: Proc. 4th Int. Conf. Soft Comput. (1996) pp. 327-330

26.54 D. Dubois, H. Prade: Fuzzy sets in approximate reasoning, Part 1: Inference with possibility distributions, Fuzzy Sets Syst. 40, 143-202 (1991)

26.55 M. Inuiguchi, T. Tanino: A new class of necessity measures and fuzzy rough sets based on certainty qualifications, Lect. Notes Comput. Sci. 2005, 261-268 (2001)

26.56 M. Inuiguchi, T. Tanino: Function approximation by fuzzy rough sets. In: Intelligent Systems for Information Processing: From Representation to Applications, ed. by B. Bouchon-Meunier, L. Foulloy, R.R. Yager (Elsevier, Amsterdam 2003) pp. 93-104

26.57 D. Dubois, H. Prade: Gradual inference rules in approximate reasoning, Inf. Sci. 61, 103-122 (1992)

26.58 L.A. Zadeh: A fuzzy set-theoretic interpretation of linguistic hedges, J. Cybern. 2, 4-34 (1974)

26.59 J.F. Baldwin: A new approach to approximate reasoning using a fuzzy logic, Fuzzy Sets Syst. 2(4), 309-325 (1979)

26.60 Y. Tsukamoto: An approach to fuzzy reasoning method. In: Advances in Fuzzy Set Theory and Applications, ed. by M.M. Gupta, R.K. Ragade, R.R. Yager (North-Holland, New York 1979) pp. 137-149

26.61 G. Choquet: Theory of capacities, Ann. l'institut Fourier 5, 131-295 (1954)

26.62 L. Biacino: Fuzzy subsethood and belief functions of fuzzy events, Fuzzy Sets Syst. 158, 38-49 (2007)

26.63 Y.Y. Yao: Generalized rough set model. In: Rough Sets in Knowledge Discovery 1. Methodology and Applications, ed. by L. Polkowski, A. Skowron (Physica, Heidelberg 1998) pp. 286-318

26.64 D.G. Chen, W.X. Yang, F.C. Li: Measures of general fuzzy rough sets on a probabilistic space, Inf. Sci. 178, 3177-3187 (2006)

26.65 R. Jensen, Q. Shen: Fuzzy-rough data reduction with ant colony optimization, Fuzzy Sets Syst. 149(1), 5-20 (2005)

26.66 Q. Hu, D. Yu, Z. Xie: Information-preserving hybrid data reduction based on fuzzy-rough techniques, Pattern Recogn. Lett. 27(5), 414-423 (2006)

26.67 E.C.C. Tsang, D.G. Chen, D.S. Yeung, X.Z. Wang, J.W.T. Lee: Attributes reduction using fuzzy rough sets, IEEE Trans. Fuzzy Syst. 16(5), 1130-1141 (2008)

26.68 D. Chen, E. Tsang, S. Zhao: Attribute reduction based on fuzzy rough sets, Lect. Notes Comput. Sci. 4585, 73-89 (2007)

26.69 D. Chen, E. Tsang, S. Zhao: An approach of attributes reduction based on fuzzy TL-rough sets, Proc. IEEE Int. Conf. Syst. Man Cybern. (2007) pp. 486-491

26.70 R. Jensen, Q. Shen: New approaches to fuzzy-rough feature selection, IEEE Trans. Fuzzy Syst. 17(4), 824-838 (2009)

26.71 C. Cornelis, G.H. Martin, R. Jensen, D. Slezak: Feature selection with fuzzy decision reducts, Inf. Sci. 180(2), 209-224 (2010)

26.72 Q. Hu, X.Z. Xie, D.R. Yu: Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recogn. 40(12), 3509-3521 (2007)

26.73 C. Cornelis, R. Jensen: A noise-tolerant approach to fuzzy-rough feature selection, Proc. IEEE Int. Conf. Fuzzy Syst. (2008) pp. 1598-1605

26.74 Q. Hu, S.A. An, D.R. Yu: Soft fuzzy rough sets for robust feature evaluation and selection, Inf. Sci. 180(22), 4384-4440 (2010)


26.75 Q. He, C.X. Wu, D.G. Chen, S.Y. Zhao: Fuzzy rough set based attribute reduction for information systems with fuzzy decisions, Knowl.-Based Syst. 24(5), 689-696 (2011)

26.76 D.G. Chen, L. Zhang, S.Y. Zhao, Q.H. Hu, P.F. Zhu: A novel algorithm for finding reducts with fuzzy rough sets, IEEE Trans. Fuzzy Syst. 20(2), 385-389 (2012)

26.77 Y.H. Qian, C. Li, J.Y. Liang: An efficient fuzzy-rough attribute reduction approach, Lect. Notes Artif. Intell. 6954, 63-70 (2011)

26.78 Y. Du, Q. Hu, D.G. Chen, P.J. Ma: Kernelized fuzzy rough sets based yawn detection for driver fatigue monitoring, Fundam. Inf. 111(1), 65-79 (2011)

26.79 D.G. Chen, Q.H. Hu, Y.P. Yang: Parameterized attribute reduction with Gaussian kernel based fuzzy rough sets, Inf. Sci. 181(23), 5169-5179 (2011)

26.80 Q. He, C.X. Wu: Membership evaluation and feature selection for fuzzy support vector machine based on fuzzy rough sets, Soft Comput. 15(6), 1105-1114 (2011)

26.81 J. Derrac, C. Cornelis, S. Garcia, F. Herrera: Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection, Inf. Sci. 186(1), 73-92 (2012)

26.82 R. Jensen, C. Cornelis: Fuzzy-rough instance selection, Proc. IEEE Int. Conf. Fuzzy Syst. (2010) pp. 1-7

26.83 N. Verbiest, C. Cornelis, F. Herrera: Granularity-based instance selection, Proc. 20th Ann. Belg.-Dutch Conf. Mach. Learn. (2011) pp. 101-103

26.84 J. Derrac, N. Verbiest, S. Garcia, C. Cornelis, F. Herrera: On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection, Soft Comput. 17(2), 223-238 (2013)

26.85 E. Ramentol, N. Verbiest, R. Bello, Y. Caballero, C. Cornelis, F. Herrera: Smote-frst: A new resampling method using fuzzy rough set theory, Proc. 10th Int. FLINS Conf. Uncertain. Model. Knowl. Eng. Decis. Mak. (2012) pp. 800-805

26.86 N. Verbiest, E. Ramentol, C. Cornelis, F. Herrera: Improving smote with fuzzy rough prototype selection to detect noise in imbalanced classification data, Proc. 13th Ibero-Am. Conf. Artif. Intell. (2012) pp. 169-178

26.87 R. Jensen, C. Cornelis, Q. Shen: Hybrid fuzzy-rough rule induction and feature selection, Proc. IEEE Int. Conf. Fuzzy Syst. (2009) pp. 1151-1156

26.88 E. Tsang, S.Y. Zhao, J. Lee: Rule induction based on fuzzy rough sets, Proc. Int. Conf. Mach. Learn. Cybern. (2007) pp. 3028-3033

26.89 S. Zhao, E. Tsang, D. Chen, X. Wang: Building a rule-based classifier - a fuzzy-rough set approach, IEEE Trans. Knowl. Data Eng. 22, 624-638 (2010)


26.90 T.P. Hong, Y.L. Liou, S.L. Wang: Fuzzy rough sets with hierarchical quantitative attributes, Expert Syst. Appl. 36(3), 6790-6799 (2009)

26.91 Y. Liu, Q. Zhou, E. Rakus-Andersson, G. Bai: A fuzzy-rough sets based compact rule induction method for classifying hybrid data, Lect. Notes Comput. Sci. 7414, 63-70 (2012)

26.92 R. Diao, Q. Shen: A harmony search based approach to hybrid fuzzy-rough rule induction, Proc. 21st Int. Conf. Fuzzy Syst. (2012) pp. 1549-1556

26.93 J.M. Keller, M.R. Gray, J.R. Givens: A fuzzy k-nearest neighbor algorithm, IEEE Trans. Syst. Man Cybern. 15, 580-585 (1985)

26.94 M. Sarkar: Fuzzy-rough nearest neighbor algorithms in classification, Fuzzy Sets Syst. 158, 2134-2152 (2007)

26.95 R. Jensen, C. Cornelis: A new approach to fuzzy-rough nearest neighbour classification, Lect. Notes Comput. Sci. 5306, 310-319 (2008)

26.96 R. Jensen, C. Cornelis: Fuzzy-rough nearest neighbour classification and prediction, Theor. Comput. Sci. 412, 5871-5884 (2011)

26.97 Y. Qu, C. Shang, Q. Shen, N.M. Parthalain, W. Wu: Kernel-based fuzzy-rough nearest neighbour classification, IEEE Int. Conf. Fuzzy Syst. (2011) pp. 1523-1529

26.98 H. Bian, L. Mazlack: Fuzzy-rough nearest-neighbor classification approach, 22nd Int. Conf. North Am. Fuzzy Inf. Process. Soc. (2003) pp. 500-505

26.99 M.N. Parthalain, R. Jensen, Q. Shen, R. Zwiggelaar: Fuzzy-rough approaches for mammographic risk analysis, Intell. Data Anal. 13, 225-244 (2010)

26.100 N. Verbiest, C. Cornelis, R. Jensen: Fuzzy rough positive region-based nearest neighbour classification, Proc. 20th Int. Conf. Fuzzy Syst. (2012) pp. 1961-1967

26.101 R. Jensen, Q. Shen: Fuzzy-rough feature significance for decision trees, Proc. 2005 UK Workshop Comput. Intell. (2005) pp. 89-96

26.102 R. Bhatt, M. Gopal: FRCT: Fuzzy-rough classification trees, Pattern Anal. Appl. 11, 73-88 (2008)

26.103 M. Elashiri, H. Hefny, A.A. Elwahab: Induction of fuzzy decision trees based on fuzzy rough set techniques, Proc. Int. Conf. Comput. Eng. Syst. (2011) pp. 134-139

26.104 J. Zhai: Fuzzy decision tree based on fuzzy-rough technique, Soft Comput. 15, 1087-1096 (2011)

26.105 D. Chen, Q. He, X. Wang: Frsvms: Fuzzy rough set based support vector machines, Fuzzy Sets Syst. 161, 596-607 (2010)

26.106 Z. Zhang, D. Chen, Q. He, H. Wang: Least squares support vector machines based on fuzzy rough set, IEEE Int. Conf. Syst. Man Cybern. (2010) pp. 3834-3838

26.107 Z. Xue, W. Liu: A fuzzy rough support vector regression machine, 9th Int. Conf. Fuzzy Syst. Knowl. Discov. (2012) pp. 840-844


26.108 D. Chen, S. Kwong, Q. He, H. Wang: Geometrical interpretation and applications of membership functions with fuzzy rough sets, Fuzzy Sets Syst. 193, 122-135 (2012)

26.109 F. Li, F. Min, Q. Liu: Intra-cluster similarity index based on fuzzy rough sets for fuzzy c-means algorithm, Lect. Notes Comput. Sci. 5009, 316-323 (2008)

26.110 P. Maji: Fuzzy rough supervised attribute clustering algorithm and classification of microarray data, IEEE Trans. Syst. Man Cybern., Part B: Cybern. 41, 222-233 (2011)

26.111 M. Sarkar, B. Yegnanarayana: Fuzzy-rough neural networks for vowel classification, IEEE Int. Conf. Syst. Man Cybern., Vol. 5 (1998) pp. 4160-4165

26.112 J.Y. Zhao, Z. Zhang: Fuzzy rough neural network and its application to feature selection, Fourth Int. Workshop Adv. Comput. Intell. (2011) pp. 684-687

26.113 D. Zhang, Y. Wang: Fuzzy-rough neural network and its application to vowel recognition, 45th IEEE Conf. Control Decis. (2006) pp. 221-224

26.114 M. JianXu, L. Caiping, W. Yaonan: Remote sensing images classification using fuzzy-rough neural network, IEEE Fifth Int. Conf. Bio-Inspir. Comput. Theor. Appl. (2010) pp. 761-765

26.115 M. Sarkar, B. Yegnanarayana: Application of fuzzy-rough sets in modular neural networks, IEEE Joint World Congr. Comput. Intell. Neural Netw. (1998) pp. 741-746

26.116 A. Ganivada, P. Sankar: A novel fuzzy rough granular neural network for classification, Int. J. Comput. Intell. Syst. 4, 1042-1051 (2011)

26.117 M. Sarkar, B. Yegnanarayana: Rough-fuzzy set theoretic approach to evaluate the importance of input features in classification, Int. Conf. Neural Netw. (1997) pp. 1590-1595

26.118 A. Ganivada, S.S. Ray, S.K. Pal: Fuzzy rough granular self-organizing map and fuzzy rough entropy, Theor. Comput. Sci. 466, 37-63 (2012)

26.119 L. Jiangping, P. Baochang, W. Yuke: Tongue image segmentation based on fuzzy rough sets, Proc. Int. Conf. Environ. Sci. Inf. Appl. Technol. (2009) pp. 367-369

26.120 L. Jiangping, W. Yuke: A shortest path algorithm of image segmentation based on fuzzy-rough grid, Proc. Int. Conf. Comput. Intell. Softw. Eng. (2009) pp. 1-4

26.121 A. Petrosino, A. Ferone: Rough fuzzy set-based image compression, Fuzzy Sets Syst. 160, 1485-1506 (2009)

26.122 L. Zhou, W. Li, Y. Wu: Face recognition based on fuzzy rough set reduction, Proc. Int. Conf. Hybrid Inf. Technol. (2006) pp. 642-646

26.123 A. Petrosino, G. Salvi: Rough fuzzy set based scale space transforms and their use in image analysis, Int. J. Approx. Reason. 41, 212-228 (2006)

26.124 A. Petrosino, M. Ceccarelli: Unsupervised texture discrimination based on rough fuzzy sets and parallel hierarchical clustering, Proc. IEEE Int. Conf. Pattern Recogn. (2000) pp. 1100-1103

26.125 X. Wang, J. Yang, X. Teng, N. Peng: Fuzzy-rough set based nearest neighbor clustering classification algorithm, Proc. 2nd Int. Conf. Fuzzy Syst. Knowl. Discov. (2005) pp. 370-373

26.126 C. Shang, Q. Shen: Aiding neural network based image classification with fuzzy-rough feature selection, Proc. IEEE Int. Conf. Fuzzy Syst. (2008) pp. 976-982

26.127 S. Changjing, D. Barnes, S. Qiang: Effective feature selection for Mars McMurdo terrain image classification, Proc. Int. Conf. Intell. Syst., Des. Appl. (2009) pp. 1419-1424

26.128 D.V. Rao, V.V.S. Sarma: A rough-fuzzy approach for retrieval of candidate components for software reuse, Pattern Recogn. Lett. 24, 875-886 (2003)

26.129 G. Cong, J. Zhang, T. Huazhong, K. Lai: A variable precision fuzzy rough group decision-making model for IT offshore outsourcing risk evaluation, J. Glob. Inf. Manag. 16, 18-34 (2008)

26.130 J. Xu, L. Zhao: A multi-objective decision-making model with fuzzy rough coefficients and its application to the inventory problem, Inf. Sci. 180, 679-696 (2010)

26.131 J. Xu, L. Zhao: A class of fuzzy rough expected value multi-objective decision making model and its application to inventory problems, Comput. Math. Appl. 56(8), 2107-2119 (2008)

26.132 B. Sun, W. Ma: Soft fuzzy rough sets and its application in decision making, Artif. Intell. Rev. 41(1), 67-80 (2014)

26.133 B. Sun, W. Ma, Q. Liu: An approach to decision making based on intuitionistic fuzzy rough sets over two universes, J. Oper. Res. Soc. 64(7), 1079-1089 (2012)

26.134 T. Beaubouef, F. Petry: Fuzzy rough set techniques for uncertainty processing in a relational database, Int. J. Intell. Syst. 15(5), 389-424 (2000)


26.135 R.R. Hashemi, F.F. Choobineh: A fuzzy rough sets classifier for database mining, Int. J. Smart Eng. Syst. Des. 4, 107-114 (2002)

26.136 T.P. Hong, L.H. Tseng, B.C. Chien: Mining from incomplete quantitative data by fuzzy rough sets, Expert Syst. Appl. 37, 2644-2653 (2010)

26.137 Y.F. Wang: Mining stock price using fuzzy rough set system, Expert Syst. Appl. 24, 13-23 (2003)

26.138 A. Burney, N. Mahmood, Z. Abbas: Advances in fuzzy rough set theory for temporal databases, Proc. 11th WSEAS Int. Conf. Artif. Intell. Knowl. Eng. Data Bases (2012) pp. 237-242

26.139 A. Burney, Z. Abbas, N. Mahmood, Q. Arifeen: Application of fuzzy rough temporal approach in patient data management (frt-pdm), Int. J. Comput. 6, 149-157 (2012)

26.140 P. Srinivasan, M. Ruiz, D.H. Kraft, J. Chen: Vocabulary mining for information retrieval: Rough sets and fuzzy sets, Inf. Process. Manag. 37, 15-38 (2001)

26.141 M. De Cock, C. Cornelis: Fuzzy rough set based web query expansion, Proc. Rough Sets Soft Comput. Intell. Agent Web Technol., Int. Workshop (2005) pp. 9-16

26.142 L. Dey, M. Abulaish, R. Goyal, K. Shubham: A rough-fuzzy ontology generation framework and its application to bio-medical text processing, Proc. 5th Atl. Web Intell. Conf. (2007) pp. 74-79

26.143 Y. Jiang, J. Wang, P. Deng, S. Tang: Reasoning within expressive fuzzy rough description logics, Fuzzy Sets Syst. 160, 3403-3424 (2009)

26.144 F. Bobillo, U. Straccia: Generalized fuzzy rough description logics, Inf. Sci. 189, 43-62 (2012)

26.145 Y. Jiang, Y. Tang, J. Wang, S. Tang: Reasoning within intuitionistic fuzzy rough description logics, Inf. Sci. 179, 2362-2378 (2009)

26.146 F. Bobillo, U. Straccia: Supporting fuzzy rough sets in fuzzy description logics, Lect. Notes Comput. Sci. 5590, 676-687 (2009)



Part D Neural Networks

Ed. by Cesare Alippi, Marios Polycarpou


27 Artificial Neural Network Models
Peter Tino, Birmingham, UK
Lubica Benuskova, Dunedin, New Zealand
Alessandro Sperduti, Padova, Italy

28 Deep and Modular Neural Networks
Ke Chen, Manchester, UK

29 Machine Learning
James T. Kwok, Hong Kong, Hong Kong
Zhi-Hua Zhou, Nanjing, China
Lei Xu, Hong Kong, Hong Kong

30 Theoretical Methods in Machine Learning
Badong Chen, Xi'an, China
Weifeng Liu, Chicago, USA
José C. Principe, Gainesville, USA

31 Probabilistic Modeling in Machine Learning
Davide Bacciu, Pisa, Italy
Paulo J.G. Lisboa, Liverpool, UK
Alessandro Sperduti, Padova, Italy
Thomas Villmann, Mittweida, Germany

32 Kernel Methods
Marco Signoretto, Leuven, Belgium
Johan A. K. Suykens, Leuven, Belgium

33 Neurodynamics
Robert Kozma, Memphis, USA
Jun Wang, Hong Kong, Hong Kong
Zhigang Zeng, Wuhan, China

34 Computational Neuroscience - Biophysical Modeling of Neural Systems
Harrison Stratton, Phoenix, USA
Jennie Si, Tempe, USA

35 Computational Models of Cognitive and Motor Control
Ali A. Minai, Cincinnati, USA

36 Cognitive Architectures and Agents
Sebastien Hélie, West Lafayette, USA
Ron Sun, Troy, USA

37 Embodied Intelligence
Angelo Cangelosi, Plymouth, UK
Josh Bongard, Burlington, USA
Martin H. Fischer, Potsdam OT Golm, Germany
Stefano Nolfi, Roma, Italy

38 Neuromorphic Engineering
Giacomo Indiveri, Zurich, Switzerland

39 Neuroengineering
Damien Coyle, Derry, Northern Ireland, UK
Ronen Sosnik, Holon, Israel

40 Evolving Connectionist Systems: From Neuro-Fuzzy-, to Spiking- and Neuro-Genetic
Nikola Kasabov, Auckland, New Zealand

41 Machine Learning Applications
Piero P. Bonissone, San Diego, USA


27. Artificial Neural Network Models 


Peter Tino, Lubica Benuskova, Alessandro Sperduti 


We outline the main models and developments in the broad
field of artificial neural networks (ANN). A brief
introduction to biological neurons motivates the initial
formal neuron model, the perceptron. We then study how
such formal neurons can be generalized and connected in
network structures. Starting with the biologically
motivated layered structure of ANN (feed-forward ANN),
the networks are then generalized to include feedback
loops (recurrent ANN) and even more abstract generalized
forms of feedback connections (recursive neuronal
networks) enabling processing of structured data, such
as sequences, trees, and graphs. We also introduce ANN
models capable of forming topographic lower-dimensional
maps of data (self-organizing maps). For each ANN type
we outline the basic principles of training the
corresponding ANN models on an appropriate data
collection.


27.1 Biological Neurons
27.2 Perceptron
27.3 Multilayered Feed-Forward ANN Models
27.4 Recurrent ANN Models
27.5 Radial Basis Function ANN Models
27.6 Self-Organizing Maps
27.7 Recursive Neural Networks
27.8 Conclusions
References


The human brain is arguably one of the most exciting
products of evolution on Earth. It is also the most
powerful information processing tool so far. Learning
based on examples and parallel signal processing lead
to emergent macro-scale behavior of neural networks in
the brain, which cannot be easily linked to the behavior
of individual micro-scale components (neurons). In this
chapter, we will introduce artificial neural network
(ANN) models motivated by the brain that can learn in
the presence of a teacher. During the course of learning
the teacher specifies what the right responses to input
examples should be. In addition, we will also mention
ANNs that can learn without a teacher, based on
principles of self-organization.

To set the context, we will begin by introducing basic
neurobiology. We will then describe the perceptron
model, which, even though rather old and simple, is an
important building block of more complex feed-forward
ANN models. Such models can be used to approximate
complex non-linear functions or to learn a variety of
association tasks. The feed-forward models are capable
of processing patterns without temporal association. In
the presence of temporal dependencies, e.g., when
learning to predict future elements of a time series
(with a certain prediction horizon), the feed-forward
ANN needs to be extended with a memory mechanism to
account for temporal structure in the data. This will
naturally lead us to recurrent neural network models
(RNN), which besides feed-forward connections also
contain feedback loops to preserve, in the form of the
information processing state, information about the
past. RNN can be further extended to recursive ANNs
(RecNN), which can process structured data such as trees
and acyclic graphs.


27.1 Biological Neurons


It is estimated that there are about 10^{12} neural cells
(neurons) in the human brain. Two-thirds of the neurons
form a 4-6 mm thick cortex that is assumed to be the
center of cognitive processes. Within each neuron
complex biological processes take place, ensuring that it can
process signals from other neurons, as well as send its 
own signals to them. The signals are of electro-chemical 
nature. In a simplified way, signals between the neurons 
can be represented by real numbers quantifying the in- 
tensity of the incoming or outgoing signals. The point 
of signal transmission from one neuron to the other is 
called the synapse. Within synapse the incoming sig- 
nal can be reinforced or damped. This is represented by 
the weight of the synapse. A single neuron can have up 
to 10°—10° such points of entry (synapses). The input 
to the neuron is organized along dendrites and the soma 
(Fig. 27.1). Thousands of dendrites form a rich tree-like 
structure on which most synapses reside. 

Signals from other neurons can be either excitatory 
(positive) or inhibitory (negative), relayed via exci- 
tatory or inhibitory synapses. When the sum of the 
positive and negative contributions (signals) from other 
neurons, weighted by the synaptic weights, becomes 
greater than a certain excitation threshold, the neuron 
will generate an electric spike that will be transmitted 
over the output channel called the axon. At the end of the
axon, there are thousands of output branches whose terminals
form synapses on other neurons in the network.
Typically, as a result of input excitation, the neuron can
generate a series of spikes of some average frequency,
about 1 to $10^2$ Hz. The frequency is proportional to the
overall stimulation of the neuron.

The first principle of information coding and representation
in the brain is redundancy. It means that
each piece of information is processed by a redundant
set of neurons, so that in the case of partial
brain damage the information is not lost completely. As
a result, and crucially in contrast to conventional computer
architectures, gradually increasing damage to the
computing substrate (neurons plus their interconnection
structure) will only result in gradually decreasing
processing capabilities (graceful degradation). Furthermore,
it is important what set of neurons participate
in coding a particular piece of information (distributed
representation). Each neuron can participate in coding
of many pieces of information, in conjunction with
other neurons. The information is thus associated with
patterns of distributed activity on sets of neurons.

Fig. 27.1 Schematic illustration of the basic information processing structure of the biological neuron


27.2 Perceptron


The perceptron is a simple neuron model that takes input
signals (patterns) coded as (real) input vectors
$\mathbf{x} = (x_1, x_2, \ldots, x_{n+1})$ through the associated (real) vector
of synaptic weights $\mathbf{w} = (w_1, w_2, \ldots, w_{n+1})$. The output
o is determined by

$$
o = f(\mathrm{net}) = f(\mathbf{w} \cdot \mathbf{x})
  = f\Big(\sum_{j=1}^{n+1} w_j x_j\Big)
  = f\Big(\sum_{j=1}^{n} w_j x_j - \theta\Big) , \tag{27.1}
$$

where net denotes the weighted sum of inputs (i.e., dot
product of weight and input vectors), and f is the activation
function. By convention, if there are n inputs to
the perceptron, the input (n + 1) will be fixed to $-1$ and
the associated weight to $w_{n+1} = \theta$, which is the value
of the excitation threshold.




Fig. 27.2 Schematic illustration of the perceptron model 




In 1958 Rosenblatt [27.1] introduced a discrete 
perceptron model with a bipolar activation function 
(Fig. 27.2) 


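$$
f(\mathrm{net}) = \mathrm{sign}(\mathrm{net}) =
\begin{cases}
+1 & \text{if } \mathrm{net} \ge 0 \iff \sum_{j=1}^{n} w_j x_j \ge \theta , \\
-1 & \text{if } \mathrm{net} < 0 \iff \sum_{j=1}^{n} w_j x_j < \theta .
\end{cases}
\tag{27.2}
$$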


The boundary equation 


$$
\sum_{j=1}^{n} w_j x_j - \theta = 0 , \tag{27.3}
$$


parameterizes a hyperplane in n-dimensional space with 
normal vector w. 

The perceptron can classify input patterns into two 
classes, if the classes can indeed be separated by an 
(n— 1)-dimensional hyperplane (27.3). In other words, 
the perceptron can deal with linearly-separable prob- 
lems only, such as logical functions AND or OR. XOR, 
on the other hand, is not linearly separable (Fig. 27.3). 
Rosenblatt showed that there is a simple training rule 
that will find the separating hyperplane, provided that 
the patterns are linearly separable. 

As we shall see, a general rule for training many 
ANN models (not only the perceptron) can be for- 
mulated as follows: the weight vector w is changed 
proportionally to the product of the input vector and 
a learning signal s. The learning signal s is a function 
of w, x, and possibly a teacher feedback d 


$$
s = s(\mathbf{w}, \mathbf{x}, d) \quad \text{or} \quad s = s(\mathbf{w}, \mathbf{x}) . \tag{27.4}
$$


In the former case, we talk about supervised learning 
(with direct guidance from a teacher); the latter case is 
known as unsupervised learning. The update of the j-th 
weight can be written as 


$$
w_j(t+1) = w_j(t) + \Delta w_j(t) = w_j(t) + \alpha\, s(t)\, x_j(t) . \tag{27.5}
$$


Fig. 27.3 Linearly separable and non-separable problems 


The positive constant $0 < \alpha < 1$ is called the learning
rate.

In the case of the perceptron, the learning signal is
the disproportion (difference) between the desired (target)
and the actual (produced by the model) response,
$s = d - o = \delta$. The update rule is known as the $\delta$ (delta)
rule

$$
\Delta w_j = \alpha (d - o) x_j . \tag{27.6}
$$

The same rule can, of course, be used to update the
activation threshold $w_{n+1} = \theta$.
Consider a training set 


$$
A_{\mathrm{train}} = \{(\mathbf{x}^1, d^1), (\mathbf{x}^2, d^2), \ldots, (\mathbf{x}^p, d^p), \ldots, (\mathbf{x}^P, d^P)\}
$$


consisting of P (input,target) couples. The perceptron 
training algorithm can be formally written as: 


• Step 1: Set $\alpha \in (0, 1)$. Initialize the weights randomly
  from $(-1, 1)$. Set the counters to k = 1, p = 1
  (k indexes sweeps through $A_{\mathrm{train}}$, p indexes individual
  training patterns).

• Step 2: Consider input $\mathbf{x}^p$, calculate the output
  $o^p = \mathrm{sign}\big(\sum_{j=1}^{n+1} w_j x_j^p\big)$.

• Step 3: Weight update: $w_j \leftarrow w_j + \alpha (d^p - o^p) x_j^p$, for
  $j = 1, \ldots, n+1$.

• Step 4: If p < P, set $p \leftarrow p + 1$, go to step 2. Otherwise
  go to step 5.

• Step 5: Fix the weights and calculate the cumulative
  error E on $A_{\mathrm{train}}$.

• Step 6: If E = 0, finish training. Otherwise, set p = 1,
  k = k + 1 and go to step 2. A new training epoch
  starts.
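
The six steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `train_perceptron` and `output`, the epoch cap `max_epochs`, the fixed random seed, and the choice of the number of misclassified patterns as the cumulative error E are all assumptions made for the example.

```python
import random


def sign(net):
    # Bipolar activation (27.2): +1 for net >= 0, otherwise -1.
    return 1 if net >= 0 else -1


def output(w, x):
    # Augment the input with x_{n+1} = -1 so that w[-1] plays the role of theta.
    xa = list(x) + [-1.0]
    return sign(sum(wj * xj for wj, xj in zip(w, xa)))


def train_perceptron(patterns, alpha=0.1, max_epochs=1000, seed=0):
    """patterns: list of (x, d) pairs, x a list of n reals, d in {-1, +1}.

    Returns (weights, epoch in which E reached 0, or None if the cap was hit).
    """
    rng = random.Random(seed)
    n = len(patterns[0][0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(n + 1)]      # Step 1
    for k in range(1, max_epochs + 1):
        for x, d in patterns:                                # Steps 2-4
            o = output(w, x)
            xa = list(x) + [-1.0]
            for j in range(n + 1):
                w[j] += alpha * (d - o) * xa[j]              # delta rule (27.6)
        # Step 5: cumulative error, taken here as the number of misclassified
        # patterns (an assumption; the text leaves the form of E open).
        E = sum(1 for x, d in patterns if output(w, x) != d)
        if E == 0:                                           # Step 6
            return w, k
    return w, None


AND = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
XOR = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], -1)]

w_and, epochs_and = train_perceptron(AND)
w_xor, epochs_xor = train_perceptron(XOR)
print("AND separated after", epochs_and, "epochs")
print("XOR separated:", epochs_xor is not None)
```

On the linearly separable AND problem the loop terminates with E = 0, whereas on XOR it can only stop at the epoch cap, since no separating hyperplane exists.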




27.3 Multilayered Feed-Forward ANN Models 


A breakthrough in our ability to construct and train 
more complex multilayered ANNs came in 1986, when 
Rumelhart et al. [27.2] introduced the error
back-propagation method. It is based on making the transfer
functions differentiable (hence the error functional to be 
minimized is differentiable as well) and finding a local 
minimum of the error functional by the gradient-based 
steepest descent method. 

We will show derivation of the back-propagation 
algorithm for two-layer feed-forward ANN as demon- 
strated, e.g., in [27.3]. Of course, the same principles 
can be applied to a feed-forward ANN architecture with 
any (finite) number of layers. In feed-forward ANNs 
neurons are organized in layers. There are no connec- 
tions among neurons within the same layer; connections 
only exist between successive layers. Each neuron from 
layer / has connections to each neuron in layer l + 1. 

As has already been mentioned, the activation functions
need to be differentiable and are usually of the
sigmoid shape. The most common activation functions
are

• Unipolar sigmoid:

$$
f(\mathrm{net}) = \frac{1}{1 + \exp(-\lambda\, \mathrm{net})} \tag{27.7}
$$

• Bipolar sigmoid (hyperbolic tangent):

$$
f(\mathrm{net}) = \frac{2}{1 + \exp(-\lambda\, \mathrm{net})} - 1 \tag{27.8}
$$

The constant $\lambda > 0$ determines the steepness of the sigmoid
curve and it is commonly set to 1. In the limit
$\lambda \to \infty$ the bipolar sigmoid tends to the sign function
(used in the perceptron) and the unipolar sigmoid tends
to the step function.
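
A small numerical sketch may make the two activation functions concrete. The function names `unipolar` and `bipolar` and the sample arguments are illustrative assumptions; the formulas are (27.7) and (27.8).

```python
import math


def unipolar(net, lam=1.0):
    # Unipolar sigmoid (27.7), values in (0, 1).
    return 1.0 / (1.0 + math.exp(-lam * net))


def bipolar(net, lam=1.0):
    # Bipolar sigmoid (27.8), values in (-1, 1).
    return 2.0 / (1.0 + math.exp(-lam * net)) - 1.0


# The bipolar sigmoid coincides with tanh(lam * net / 2).
for net in (-1.5, 0.0, 0.7):
    assert abs(bipolar(net) - math.tanh(net / 2.0)) < 1e-12

# Increasing lam sharpens both curves towards the step and sign functions.
for lam in (1.0, 5.0, 50.0):
    print(lam, round(unipolar(0.2, lam), 4), round(bipolar(-0.2, lam), 4))
```

For large steepness the printed values approach 1 and -1, which is the limiting behavior described above.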

Consider the single-layer ANN in Fig. 27.4. The
input and output vectors are $\mathbf{y} = (y_1, \ldots, y_j, \ldots, y_J)$
and $\mathbf{o} = (o_1, \ldots, o_k, \ldots, o_K)$, respectively, where
$o_k = f(\mathrm{net}_k)$ and

$$
\mathrm{net}_k = \sum_{j=1}^{J} w_{kj} y_j . \tag{27.9}
$$

Set $y_J = -1$ and $w_{kJ} = \theta_k$, a threshold for $k = 1, \ldots, K$
output neurons. The desired output is $\mathbf{d} = (d_1, \ldots, d_k, \ldots, d_K)$.

After training, we would like, for all training patterns
$p = 1, \ldots, P$ from $A_{\mathrm{train}}$, the model output to
closely resemble the desired values (target). The training
problem is transformed to an optimization one by
defining the error function

$$
E_p = \frac{1}{2} \sum_{k=1}^{K} (d_{pk} - o_{pk})^2 , \tag{27.10}
$$

where p is the training point index. $E_p$ is the sum of
squares of errors on the output neurons. During learning
we seek to find the weight setting that minimizes
$E_p$. This will be done using the gradient-based steepest
descent on $E_p$,

$$
\Delta w_{kj} = -\alpha \frac{\partial E_p}{\partial w_{kj}}
 = -\alpha \frac{\partial E_p}{\partial(\mathrm{net}_k)} \frac{\partial(\mathrm{net}_k)}{\partial w_{kj}}
 = \alpha\, \delta_{ok}\, y_j , \tag{27.11}
$$

where $\alpha$ is a positive learning rate. Note that
$-\partial E_p / \partial(\mathrm{net}_k) = \delta_{ok}$, which is the generalized training
signal on the k-th output neuron. The partial derivative
$\partial(\mathrm{net}_k) / \partial w_{kj}$ is equal to $y_j$ (27.9). Furthermore,

$$
\delta_{ok} = -\frac{\partial E_p}{\partial(\mathrm{net}_k)}
 = -\frac{\partial E_p}{\partial o_{pk}} \frac{\partial o_{pk}}{\partial(\mathrm{net}_k)}
 = (d_{pk} - o_{pk}) f'_k , \tag{27.12}
$$

where $f'_k$ denotes the derivative of the activation function
with respect to $\mathrm{net}_k$. For the unipolar sigmoid
(27.7), we have $f'_k = o_k (1 - o_k)$. For the bipolar sigmoid
(27.8), $f'_k = (1/2)(1 - o_k^2)$. The rule for updating the j-th
weight of the k-th output neuron reads as

$$
\Delta w_{kj} = \alpha (d_{pk} - o_{pk}) f'_k y_j , \tag{27.13}
$$

where $(d_{pk} - o_{pk}) f'_k = \delta_{ok}$ is the generalized error signal
flowing back through all connections ending in the k-th
output neuron. Note that if we put $f'_k = 1$, we would
obtain the perceptron learning rule (27.6).
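
As a hedged illustration of the update rule (27.13), the sketch below performs one steepest-descent step for a single-layer network with unipolar sigmoid units and steepness 1, so that the derivative is $o_k(1 - o_k)$. The helper names `outputs`, `error` and `delta_rule_step`, as well as the sample weights, input and target, are assumptions chosen for the example.

```python
import math


def unipolar(net):
    return 1.0 / (1.0 + math.exp(-net))


def outputs(W, y):
    # o_k = f(net_k) with net_k = sum_j w_kj y_j, see (27.9).
    return [unipolar(sum(wkj * yj for wkj, yj in zip(row, y))) for row in W]


def error(W, y, d):
    # E_p from (27.10) for a single pattern.
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, outputs(W, y)))


def delta_rule_step(W, y, d, alpha=0.5):
    """Return the weights after one update (27.13); W is a K x J list of rows."""
    o = outputs(W, y)
    delta = [(dk - ok) * ok * (1.0 - ok) for dk, ok in zip(d, o)]   # (27.12)
    return [[wkj + alpha * dlt * yj for wkj, yj in zip(row, y)]
            for row, dlt in zip(W, delta)]


W = [[0.1, -0.2, 0.3]]        # K = 1 output neuron, J = 3 inputs
y = [0.5, -0.3, -1.0]         # last input fixed to -1 (threshold input)
d = [1.0]

W_new = delta_rule_step(W, y, d)
print(error(W, y, d), "->", error(W_new, y, d))
```

With these particular numbers the pattern error decreases after the step, as expected for a sufficiently small learning rate.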


Fig. 27.4 A single-layer ANN


We will now extend the network with another layer, 
called the hidden layer (Fig. 27.5). 

Input to the network is identical with the input
vector $\mathbf{x} = (x_1, \ldots, x_i, \ldots, x_I)$ for the hidden layer.
The output neurons process as inputs the outputs
$\mathbf{y} = (y_1, \ldots, y_j, \ldots, y_J)$, $y_j = f(\mathrm{net}_j)$, from the hidden layer.
Hence,

$$
\mathrm{net}_j = \sum_{i=1}^{I} v_{ji} x_i . \tag{27.14}
$$

As before, the last (in this case the I-th) input is fixed to
$-1$. Recall that the same holds for the output of the J-th
hidden neuron. Activation thresholds for hidden neurons
are $v_{jI} = \theta_j$ for $j = 1, \ldots, J$.

Equations (27.11)—(27.13) describe modification of 
weights from the hidden to the output layer. We will 
now show how to modify weights from the input to the 
hidden layer. We would still like to minimize E, (27.10) 
through the steepest descent. 

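The hidden weight $v_{ji}$ will be modified as follows

$$
\Delta v_{ji} = -\alpha \frac{\partial E_p}{\partial v_{ji}}
 = -\alpha \frac{\partial E_p}{\partial(\mathrm{net}_j)} \frac{\partial(\mathrm{net}_j)}{\partial v_{ji}}
 = \alpha\, \delta_{yj}\, x_i . \tag{27.15}
$$

Again, $-\partial E_p / \partial(\mathrm{net}_j) = \delta_{yj}$ is the generalized training
signal on the j-th hidden neuron that should flow on the
input weights. As before, $\partial(\mathrm{net}_j) / \partial v_{ji} = x_i$ (27.14).
Furthermore,

$$
\delta_{yj} = -\frac{\partial E_p}{\partial(\mathrm{net}_j)}
 = -\frac{\partial E_p}{\partial y_j} \frac{\partial y_j}{\partial(\mathrm{net}_j)}
 = -\frac{\partial E_p}{\partial y_j} f'_j , \tag{27.16}
$$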


Fig. 27.5 A two-layer feed-forward ANN 


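where $f'_j$ is the derivative of the activation function in
the hidden layer with respect to $\mathrm{net}_j$,

$$
\frac{\partial E_p}{\partial y_j}
 = -\sum_{k=1}^{K} (d_{pk} - o_{pk}) \frac{\partial f(\mathrm{net}_k)}{\partial y_j}
 = -\sum_{k=1}^{K} (d_{pk} - o_{pk}) \frac{\partial f(\mathrm{net}_k)}{\partial(\mathrm{net}_k)} \frac{\partial(\mathrm{net}_k)}{\partial y_j} . \tag{27.17}
$$

Since $f'_k$ is the derivative of the output neuron sigmoid
with respect to $\mathrm{net}_k$ and $\partial(\mathrm{net}_k) / \partial y_j = w_{kj}$ (27.9), we
have

$$
\frac{\partial E_p}{\partial y_j}
 = -\sum_{k=1}^{K} (d_{pk} - o_{pk}) f'_k w_{kj}
 = -\sum_{k=1}^{K} \delta_{ok} w_{kj} . \tag{27.18}
$$

Plugging this to (27.16) we obtain

$$
\delta_{yj} = \Big(\sum_{k=1}^{K} \delta_{ok} w_{kj}\Big) f'_j . \tag{27.19}
$$

Finally, the weights from the input to the hidden layer
are modified as follows

$$
\Delta v_{ji} = \alpha \Big(\sum_{k=1}^{K} \delta_{ok} w_{kj}\Big) f'_j x_i . \tag{27.20}
$$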


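Consider now the general case of m hidden layers. For
the n-th hidden layer we have

$$
\Delta v^{n}_{ji} = \alpha\, \delta^{n}_{yj}\, x^{n-1}_{i} , \tag{27.21}
$$

where

$$
\delta^{n}_{yj} = \Big(\sum_{k=1}^{K} \delta^{n+1}_{ok} w^{n+1}_{kj}\Big) (f^{n}_{j})' , \tag{27.22}
$$

and $(f^{n}_{j})'$ is the derivative of the activation function of
the n-th layer with respect to $\mathrm{net}^{n}_{j}$.

Often, the learning speed can be improved by using
the so-called momentum term

$$
\Delta w_{kj}(t) \leftarrow \Delta w_{kj}(t) + \mu\, \Delta w_{kj}(t-1) , \qquad
\Delta v_{ji}(t) \leftarrow \Delta v_{ji}(t) + \mu\, \Delta v_{ji}(t-1) , \tag{27.23}
$$

where $\mu \in (0, 1)$ is the momentum rate.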

Consider a training set 

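$$
A_{\mathrm{train}} = \{(\mathbf{x}^1, \mathbf{d}^1), (\mathbf{x}^2, \mathbf{d}^2), \ldots, (\mathbf{x}^p, \mathbf{d}^p), \ldots, (\mathbf{x}^P, \mathbf{d}^P)\} .
$$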


The back-propagation algorithm for training feed- 
forward ANNs can be summarized as follows: 




• Step 1: Set $\alpha \in (0, 1)$. Randomly initialize weights
  to small values, e.g., in the interval $(-0.5, 0.5)$.
  Counters and the error are initialized as follows:
  k = 1, p = 1, E = 0. E denotes the accumulated error
  across training patterns

$$
E = \sum_{p=1}^{P} E_p , \tag{27.24}
$$

  where $E_p$ is given in (27.10). Set a tolerance threshold
  $\varepsilon$ for the error. The threshold will be used to stop
  the training process.

• Step 2: Apply input $\mathbf{x}^p$ and compute the corresponding
  $\mathbf{y}^p$ and $\mathbf{o}^p$.

• Step 3: For every output neuron, calculate $\delta_{ok}$
  (27.12), for every hidden neuron determine $\delta_{yj}$ (27.19).

• Step 4: Modify the weights $w_{kj} \leftarrow w_{kj} + \alpha \delta_{ok} y_j$ and
  $v_{ji} \leftarrow v_{ji} + \alpha \delta_{yj} x_i$.

• Step 5: If p < P, set p = p + 1 and go to step 2. Otherwise
  go to step 6.

• Step 6: Fixing the weights, calculate E. If $E < \varepsilon$,
  stop training, otherwise permute elements of $A_{\mathrm{train}}$,
  set E = 0, p = 1, k = k + 1, and go to step 2.
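
The algorithm can be sketched for a two-layer network with unipolar sigmoid units (steepness 1). This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the default of three hidden units, the epoch cap `max_epochs`, the fixed seed, and the XOR data with 0/1 targets are all choices made for the example, and no momentum term is used.

```python
import math
import random


def f(net):
    # Unipolar sigmoid (27.7) with lambda = 1.
    return 1.0 / (1.0 + math.exp(-net))


def forward(V, W, x):
    xa = list(x) + [-1.0]                                   # threshold input x_I = -1
    y = [f(sum(v * xi for v, xi in zip(row, xa))) for row in V] + [-1.0]  # y_J = -1
    o = [f(sum(w * yj for w, yj in zip(row, y))) for row in W]
    return xa, y, o


def pattern_error(V, W, x, d):
    # E_p from (27.10).
    o = forward(V, W, x)[2]
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))


def deltas(V, W, x, d):
    xa, y, o = forward(V, W, x)
    d_o = [(dk - ok) * ok * (1.0 - ok) for dk, ok in zip(d, o)]           # (27.12)
    d_y = [sum(d_o[k] * W[k][j] for k in range(len(W))) * y[j] * (1.0 - y[j])
           for j in range(len(V))]                                        # (27.19)
    return xa, y, d_o, d_y


def train(data, n_hidden=3, alpha=0.5, eps=0.01, max_epochs=20000, seed=1):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    data = list(data)
    E = float("inf")
    for epoch in range(1, max_epochs + 1):
        for x, d in data:                                   # steps 2-5
            xa, y, d_o, d_y = deltas(V, W, x, d)
            for k, row in enumerate(W):                     # step 4, output layer
                for j in range(len(row)):
                    row[j] += alpha * d_o[k] * y[j]
            for j, row in enumerate(V):                     # step 4, hidden layer
                for i in range(len(row)):
                    row[i] += alpha * d_y[j] * xa[i]
        E = sum(pattern_error(V, W, x, d) for x, d in data)  # step 6
        if E < eps:
            return V, W, epoch, E
        rng.shuffle(data)                                   # permute A_train
    return V, W, None, E


XOR = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
V, W, epochs, E = train(XOR, max_epochs=2000)
print("stopped at epoch", epochs, "with accumulated error", E)
```

Because steepest descent only finds a local minimum of the error, a run may hit the epoch cap without reaching the tolerance; restarting from different initial weights is the usual remedy.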


Consider a feed-forward ANN with fixed weights
and single output unit. It can be considered a real-valued
function G on I-dimensional vectorial inputs,

$$
G(\mathbf{x}) = f\Big(\sum_{j=1}^{J} w_j\, f\Big(\sum_{i=1}^{I} v_{ji} x_i\Big)\Big) .
$$

There has been a series of results showing that such
a parameterized function class is sufficiently rich in the
space of reasonable functions (see, e.g., [27.4]). For
example, for any smooth function F over a compact domain
and a precision threshold $\varepsilon$, for sufficiently large
number J of hidden units there is a weight setting so
that G is not further away from F than $\varepsilon$ (in $L_2$ norm).

When training a feed-forward ANN a key decision 
must be made about how complex the model should be. 
In other words, how many hidden units J one should 
use. If J is too small, the model will be too rigid (high
bias) and it will not be able to sufficiently adapt to the
data. However, under different samples from the same
data generating process, the resulting trained models
will vary relatively little (low variance). On the other
hand, if J is too high, the model will be too complex,
modeling even such irrelevant features of the data such
as output noise. The particular data will be interpolated
exactly (low bias), but the variability of fitted models
under different training samples from the same process
will be immense. It is, therefore, important to set J to an
optimal value, reflecting the complexity of the data generating
process. This is usually achieved by splitting the
data into three disjoint sets: training, validation, and
test sets. Models with different numbers of hidden units
are trained on the training set, their performance is then
checked on a held-out validation set. The optimal number
of hidden units is selected based on the (smallest)
validation error. Finally, the test set is used for independent
comparison of selected models from different
model classes.

If the data set is not large enough, one can perform
such a model selection using k-fold cross-validation.
The data for model construction (this data would be
considered training and validation sets in the scenario
above) is split into k disjoint folds. One fold is selected
as the validation fold, the other k - 1 will be used for
training. This is repeated k times, yielding k estimates
of the validation error. The validation error is then calculated
as the mean of those k estimates.

We have described data-based methods for model
selection. Other alternatives are available. For example,
by turning an ANN into a probabilistic model (e.g.,
by including an appropriate output noise model), under
some prior assumptions on weights (e.g., a-priori
small weights are preferred), one can perform Bayesian
model selection (through model evidence) [27.5].

There are several seminal books on feed-forward
ANNs with well-documented theoretical foundations
and practical applications, e.g., [27.3,6,7]. We refer
the interested reader to those books as good starting
points as the breadth of theory and applications of feed-forward
ANNs is truly immense.


27.4 Recurrent ANN Models


Consider a situation where the associations in the training
set we would like to learn are of the following
(abstract) form: $a \to \alpha$, $b \to \beta$, $b \to \alpha$, $b \to \gamma$, $c \to \alpha$,
$c \to \gamma$, $d \to \alpha$, etc., where the Latin and Greek letters
stand for input and output vectors, respectively. It is
clear that now for one input item there can be different
output associations, depending on the temporal context
in which the training items are presented. In other
words, the model output is determined not only by the
input, but also by the history of presented items so far.
Obviously, the feed-forward ANN model described in
the previous section cannot be used in such cases and
the model must be further extended so that the temporal
context is properly represented.

The architecturally simplest solution is provided 
by the so-called time delay neural network (TDNN) 
(Fig. 27.6). The input window into the past has a finite 
length D. If the output is an estimate of the next item 
of the input time series, such a network realizes a non- 
linear autoregressive model of order D. 
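
To illustrate how a time series is presented to a TDNN, the sketch below builds (input window, target) pairs for one-step-ahead prediction. The function name `delay_windows` is hypothetical, and the convention that the window holds the current item together with the D preceding ones is an assumption of this example.

```python
def delay_windows(series, D):
    """Build TDNN training pairs from a scalar time series.

    Each input is [s(t), s(t-1), ..., s(t-D)] and the target is s(t+1).
    """
    pairs = []
    for t in range(D, len(series) - 1):
        window = [series[t - k] for k in range(D + 1)]
        pairs.append((window, series[t + 1]))
    return pairs


series = [1, 2, 3, 4, 5]
for window, target in delay_windows(series, D=2):
    print(window, "->", target)
```

The resulting pairs can be fed to any feed-forward network, for example one trained by back-propagation; only their order within the series has to be respected when the windows are built.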

If we are lucky, even such a simple solution can be 
sufficient to capture the temporal structure hidden in the 
data. An advantage of the TDNN architecture is that 
some training methods developed for feed-forward net- 
works can be readily used. A disadvantage of TDNN 
networks is that fixing a finite order D may not be ade- 
quate for modeling the temporal structure of the data 
generating source. TDNN enables the feed-forward 
ANN to see, besides the current input at time t, the other 
inputs from the past up to time $t - D$. Of course, during
the training, it is now imperative to preserve the order of 
training items in the training set. TDNN has been suc- 
cessfully applied in many fields where spatial-temporal 
structures are naturally present, such as robotics, speech 
recognition, etc. [27.8, 9]. 

In order to extend the ANN architecture so that the 
variable (even unbounded) length of input window can 
be flexibly considered, we need a different way of cap- 
turing the temporal context. This is achieved through 
the so-called state space formulation. In this case, we 
will need to change our outlook on training. The new 
architectures of this type are known as recurrent neural 
networks (RNN). 

Fig. 27.6 TDNN of order D

As in feed-forward ANNs, there are connections between
the successive layers. In addition, and in contrast
to feed-forward ANNs, connections between neurons of
the same layer are allowed, but subject to a time delay.
It also may be possible to have connections from
a higher-level layer to a lower layer, again subject to 
a time delay. In many cases it is, however, more conve- 
nient to introduce an additional fictional context layer 
that contains delayed activations of neurons from the 
selected layer(s) and represent the resulting RNN archi- 
tecture as a feed-forward architecture with some fixed 
one-to-one delayed connections. As an example, con- 
sider the so-called simple recurrent network (SRN) of 
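Elman [27.10] shown in Fig. 27.7. The output of SRN
at time t is given by

$$
o^{(t)}_k = f\Big(\sum_{j=1}^{J} m_{kj}\, y^{(t)}_j\Big) , \qquad
y^{(t)}_j = f\Big(\sum_{i=1}^{J} w_{ji}\, y^{(t-1)}_i + \sum_{i=1}^{I} v_{ji}\, x^{(t)}_i\Big) . \tag{27.25}
$$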


Fig. 27.7 Schematic depiction of the SRN architecture

Fig. 27.8 Schematic depiction of the Jordan's RNN architecture

The hidden layer constitutes the state of the input-driven
dynamical system whose role it is to represent
the relevant (with respect to the output) information
about the input history seen so far. The state (as in
generic state space model) is updated recursively.
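
The recursive state update can be sketched as follows. This is an illustrative sketch only: it assumes tanh units, omits thresholds, and uses made-up weight matrices `V` (input to hidden), `W` (context to hidden) and `M` (hidden to output); the function names are hypothetical.

```python
import math


def srn_step(x, y_prev, V, W, M):
    """One step of the SRN dynamics (27.25) with tanh units (an assumption)."""
    y = [math.tanh(sum(W[j][i] * y_prev[i] for i in range(len(y_prev)))
                   + sum(V[j][i] * x[i] for i in range(len(x))))
         for j in range(len(W))]
    o = [math.tanh(sum(M[k][j] * y[j] for j in range(len(y))))
         for k in range(len(M))]
    return y, o


def srn_run(xs, V, W, M):
    """Process a sequence, starting from a zero context; return all outputs."""
    y = [0.0] * len(W)
    outs = []
    for x in xs:
        y, o = srn_step(x, y, V, W, M)
        outs.append(o)
    return outs


V = [[1.0], [-0.5]]                 # I = 1 input, J = 2 hidden units
W = [[0.5, -0.3], [0.2, 0.4]]
M = [[1.0, -1.0]]                   # K = 1 output

out_a = srn_run([[1.0], [0.0]], V, W, M)[-1][0]
out_b = srn_run([[-1.0], [0.0]], V, W, M)[-1][0]
print(out_a, out_b)
```

Both sequences end with the same input item, yet the final outputs differ because the hidden state carries information about the earlier input.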

Many variations on such architectures with time- 
delayed feedback loops exist. For example, Jor- 
dan [27.11] suggested to feed back the outputs as 
the relevant temporal context, or Bengio et al. [27.12] 
mixed the temporal context representations of SRN and 
the Jordan network into a single architecture. Schematic 
representations of these architectures are shown in 
Figs. 27.8 and 27.9. 

Training in such architectures is more complex
than training of feed-forward ANNs. The principal
problem is that changes in weights propagate in time
and this needs to be explicitly represented in the update
rules. We will briefly mention two approaches
to training RNNs, namely back-propagation through
time (BPTT) [27.13] and real-time recurrent learning
(RTRL) [27.14]. We will demonstrate BPTT on a classification
task, where the label of the input sequence
is known only after T time steps (i.e., after T input
items have been processed). The RNN is unfolded in
time to form a feed-forward network with T hidden layers.
Figure 27.10 shows a simple two-neuron RNN and
Fig. 27.11 represents its unfolded form for T = 2 time
steps.

Fig. 27.9 Schematic depiction of the Bengio's RNN architecture

Fig. 27.10 A two-neuron SRN

The first input comes at time t = 1 and the last at 
t= T. Activities of context units are initialized at the 
beginning of each sequence to some fixed numbers. 
The unfolded network is then trained as a feed-forward 
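network with T hidden layers. At the end of the sequence,
the model output is determined as

$$
o^{(T)}_k = f\Big(\sum_{j=1}^{J} m_{kj}\, y^{(T)}_j\Big) , \qquad
y^{(T)}_j = f\Big(\sum_{i=1}^{J} w_{ji}\, y^{(T-1)}_i + \sum_{i=1}^{I} v_{ji}\, x^{(T)}_i\Big) . \tag{27.26}
$$

Having the model output enables us to compute the error

$$
E(T) = \frac{1}{2} \sum_{k=1}^{K} \big(d^{(T)}_k - o^{(T)}_k\big)^2 . \tag{27.27}
$$

The hidden-to-output weights are modified according to

$$
\Delta m^{(T)}_{kj} = -\alpha \frac{\partial E(T)}{\partial m_{kj}} = \alpha\, \delta^{(T)}_k\, y^{(T)}_j , \tag{27.28}
$$

where

$$
\delta^{(T)}_k = \big(d^{(T)}_k - o^{(T)}_k\big) f'\big(\mathrm{net}^{(T)}_k\big) . \tag{27.29}
$$

The other weight updates are calculated as follows

$$
\Delta w^{(T)}_{hj} = -\alpha \frac{\partial E(T)}{\partial w_{hj}} = \alpha\, \delta^{(T)}_h\, y^{(T-1)}_j , \qquad
\delta^{(T)}_h = \Big(\sum_{k=1}^{K} \delta^{(T)}_k m_{kh}\Big) f'\big(\mathrm{net}^{(T)}_h\big) , \tag{27.30}
$$

$$
\Delta v^{(T)}_{ji} = -\alpha \frac{\partial E(T)}{\partial v_{ji}} = \alpha\, \delta^{(T)}_j\, x^{(T)}_i , \qquad
\delta^{(T)}_j = \Big(\sum_{k=1}^{K} \delta^{(T)}_k m_{kj}\Big) f'\big(\mathrm{net}^{(T)}_j\big) , \tag{27.31}
$$

$$
\Delta w^{(T-1)}_{hj} = \alpha\, \delta^{(T-1)}_h\, y^{(T-2)}_j , \qquad
\delta^{(T-1)}_h = \Big(\sum_{j=1}^{J} \delta^{(T)}_j w_{jh}\Big) f'\big(\mathrm{net}^{(T-1)}_h\big) , \tag{27.32}
$$

$$
\Delta v^{(T-1)}_{ji} = \alpha\, \delta^{(T-1)}_j\, x^{(T-1)}_i , \qquad
\delta^{(T-1)}_j = \Big(\sum_{h=1}^{J} \delta^{(T)}_h w_{hj}\Big) f'\big(\mathrm{net}^{(T-1)}_j\big) , \tag{27.33}
$$

etc. The final weight updates are the averages of the
T partial weight update suggestions calculated on the
unfolded network

$$
\Delta w_{hj} = \frac{1}{T} \sum_{t=1}^{T} \Delta w^{(t)}_{hj} \quad \text{and} \quad
\Delta v_{ji} = \frac{1}{T} \sum_{t=1}^{T} \Delta v^{(t)}_{ji} . \tag{27.34}
$$

Fig. 27.11 Two-neuron SRN unfolded in time for T = 2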


For every new training sequence (of possibly different 
length T) the network is unfolded to the desired length 
and the weight update process is repeated. In some 
cases (e.g., continual prediction on time series), it is 
necessary to set the maximum unfolding length L that 
will be used in every update step. Of course, in such 
cases we can lose vital information from the past. This 
problem is eliminated in the RTRL methodology. 
Consider again the SRN architecture in Fig. 27.7.
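In RTRL the weights are updated on-line, i.e., at every
time step t

$$
\Delta w^{(t)}_{ji} = -\alpha \frac{\partial E(t)}{\partial w_{ji}} , \qquad
\Delta v^{(t)}_{ji} = -\alpha \frac{\partial E(t)}{\partial v_{ji}} , \qquad
\Delta m^{(t)}_{kj} = -\alpha \frac{\partial E(t)}{\partial m_{kj}} . \tag{27.35}
$$

The updates of hidden-to-output weights are straightforward

$$
\Delta m^{(t)}_{kj} = \alpha\, \delta^{(t)}_k\, y^{(t)}_j
 = \alpha \big(d^{(t)}_k - o^{(t)}_k\big) f'\big(\mathrm{net}^{(t)}_k\big)\, y^{(t)}_j . \tag{27.36}
$$

For the other weights we have

$$
\Delta v^{(t)}_{ji} = \alpha \sum_{k=1}^{K} \Big(\delta^{(t)}_k \sum_{h=1}^{J} m_{kh} \frac{\partial y^{(t)}_h}{\partial v_{ji}}\Big) , \qquad
\Delta w^{(t)}_{ji} = \alpha \sum_{k=1}^{K} \Big(\delta^{(t)}_k \sum_{h=1}^{J} m_{kh} \frac{\partial y^{(t)}_h}{\partial w_{ji}}\Big) , \tag{27.37}
$$

where

$$
\frac{\partial y^{(t)}_h}{\partial v_{ji}} = f'\big(\mathrm{net}^{(t)}_h\big)
 \Big(x^{(t)}_i\, \delta^{\mathrm{kron}}_{jh} + \sum_{l=1}^{J} w_{hl} \frac{\partial y^{(t-1)}_l}{\partial v_{ji}}\Big) , \tag{27.38}
$$

$$
\frac{\partial y^{(t)}_h}{\partial w_{ji}} = f'\big(\mathrm{net}^{(t)}_h\big)
 \Big(y^{(t-1)}_i\, \delta^{\mathrm{kron}}_{jh} + \sum_{l=1}^{J} w_{hl} \frac{\partial y^{(t-1)}_l}{\partial w_{ji}}\Big) , \tag{27.39}
$$

and $\delta^{\mathrm{kron}}_{jh}$ is the Kronecker delta ($\delta^{\mathrm{kron}}_{jh} = 1$ if j = h;
$\delta^{\mathrm{kron}}_{jh} = 0$ otherwise). The partial derivatives required
for the weight updates can be recursively updated using
(27.37)-(27.39). To initialize training, the partial
derivatives at t = 0 are usually set to 0.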

There is a well-known problem associated with 
gradient-based parameter fitting in recurrent networks 
(and, in fact, in any parameterized state space models 
of similar form) [27.15]. In order to latch an important 
piece of past information for future use, the state- 
transition dynamics (27.25) should have an attractive 
set. 

However, in the neighborhood of such an attractive 
set, the derivatives of the dynamic state-transition map 




vanish. Vanishingly small derivatives cannot be reliably 
propagated back through time in order to form a useful 
latching set. This is known as the information latching 
problem. Several suggestions for dealing with informa- 
tion latching problem have been made, e.g., [27.16]. 
The most prominent include long short term memory 
(LSTM) RNN [27.17] and reservoir computation mod- 
els [27.18]. 
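
The effect can be seen in a deliberately tiny example. The sketch below assumes a one-dimensional autonomous state map $y(t) = \tanh(w\, y(t-1))$ with w = 2, which is not a model from the text but has two attractive fixed points and so can latch the sign of its initial state; the function name `state_sensitivity` is hypothetical.

```python
import math


def state_sensitivity(w, y0, T):
    """Iterate y(t) = tanh(w * y(t-1)) and accumulate dy(T)/dy(0).

    By the chain rule the sensitivity is the product of the local
    derivatives w * (1 - y(t)^2) along the trajectory.
    """
    y, grad = y0, 1.0
    for _ in range(T):
        y = math.tanh(w * y)
        grad *= w * (1.0 - y * y)
    return y, grad


for y0 in (0.5, -0.5):
    y_final, grad = state_sensitivity(2.0, y0, 30)
    print(y0, "->", round(y_final, 4), "sensitivity", grad)
```

The state settles near one of the two attractive fixed points, so the sign of the initial state is remembered, but the derivative of the final state with respect to the initial one shrinks geometrically, which is what makes gradient-based credit assignment over long time spans unreliable.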

LSTM models operate with a specially designed 
formal neuron model that contains so-called gate units. 
The gates determine when the input is significant (in 
terms of the task given) to be remembered, whether 
the neuron should continue to remember the value, and 
when the value should be output. The LSTM architec- 
ture is especially suitable for situations where there are 
long time intervals of unknown size between impor- 
tant events. LSTM models have been shown to provide 
superior results over traditional RNNs in a variety of 
applications (e.g., [27.19, 20]). 

Reservoir computation models try to avoid the 
information latching problem by fixing the state- 
transition part of the RNN. Only linear readout from 
the state activations (hidden recurrent layer) producing 
the output is fit to the data. The state space with the
associated dynamic state transition structure is called the
reservoir. The main idea is that the reservoir should be 
sufficiently complex so as to capture a large number of 
potentially useful features of the input stream that can 
be then exploited by the simple readout. 

The reservoir computing models differ in how the 
fixed reservoir is constructed and what form the readout 
takes. For example, echo state networks (ESN) [27.21] 
have fixed RNN dynamics (27.25), but with a lin- 
ear hidden-to-output layer map. Liquid state machines 
(LSM) [27.22] also have (mostly) linear readout, but 
the reservoirs are realized through the dynamics of a set 
of coupled spiking neuron models. Fractal prediction 
machines (FPM) [27.23] are reservoir RNN models for 
processing discrete sequences. The reservoir dynamics 
is driven by an affine iterative function system and the 
readout is constructed as a collection of multinomial 
distributions. Reservoir models have been successfully 
applied in many practical applications with competitive 
results, e.g., [27.21, 24, 25].

Several books that are solely dedicated to RNNs 
have appeared, e.g., [27.26-28] and they contain
a much deeper elaboration on theory and practice of 
RNNs than we were able to provide here. 


27.5 Radial Basis Function ANN Models 


In this section we will introduce another implementation
of the idea of feed-forward ANN. The activations
of hidden neurons are again determined by the
closeness of inputs $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ to weights
$\mathbf{c} = (c_1, c_2, \ldots, c_n)$. Whereas in the feed-forward ANN in
Sect. 27.3, the closeness is determined by the dot-product
of x and c, followed by the sigmoid activation
function, in radial basis function (RBF) networks the
closeness is determined by the squared Euclidean distance
of x and c, transferred through the inverse exponential.
The output of the j-th hidden unit with input
weight vector $\mathbf{c}_j$ is given by

$$
\varphi_j(\mathbf{x}) = \exp\Big(-\frac{\|\mathbf{x} - \mathbf{c}_j\|^2}{\sigma_j^2}\Big) , \tag{27.40}
$$

where $\sigma_j$ is the activation strength parameter of the j-th
hidden unit and determines the width of the spherical
(un-normalized) Gaussian. The output neurons are usually
linear (for regression tasks)

$$
o_k(\mathbf{x}) = \sum_{j=1}^{J} w_{kj}\, \varphi_j(\mathbf{x}) . \tag{27.41}
$$


The RBF network in this form can be simply viewed
as a form of kernel regression. The J functions $\varphi_j$
form a set of J linearly independent basis functions
(e.g., if all the centers $\mathbf{c}_j$ are different) whose span
(the set of all their linear combinations) forms a linear
subspace of functions that are realizable by the
given RBF architecture (with given centers $\mathbf{c}_j$ and kernel
widths $\sigma_j$).
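
A forward pass through such a network can be sketched as below. The function names and the sample centers, widths and output weights are illustrative assumptions; in particular, the kernel is written with $\sigma_j^2$ in the denominator, and a different scaling convention (such as $2\sigma_j^2$) would only rescale the widths.

```python
import math


def rbf_hidden(x, centers, widths):
    # Hidden activations: Gaussian of the squared Euclidean distance to each center.
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / s ** 2)
            for c, s in zip(centers, widths)]


def rbf_output(x, centers, widths, W):
    # Linear output neurons: weighted sums of the hidden activations.
    phi = rbf_hidden(x, centers, widths)
    return [sum(wkj * pj for wkj, pj in zip(row, phi)) for row in W]


centers = [[0.0, 0.0], [3.0, 4.0]]
widths = [1.0, 5.0]
W = [[2.0, 1.0]]                      # K = 1 output, J = 2 hidden units

print(rbf_hidden([0.0, 0.0], centers, widths))
print(rbf_output([0.0, 0.0], centers, widths, W))
```

An input placed exactly on a center activates that hidden unit fully, while the contribution of the other unit decays with its distance measured in units of its own width.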

For the training of RBF networks, it is important that
the basis functions $\varphi_j(\mathbf{x})$ cover the structure of the input
space faithfully. Given a set of training inputs $\mathbf{x}^p$
from $A_{\mathrm{train}} = \{(\mathbf{x}^1, \mathbf{d}^1), (\mathbf{x}^2, \mathbf{d}^2), \ldots, (\mathbf{x}^p, \mathbf{d}^p), \ldots, (\mathbf{x}^P, \mathbf{d}^P)\}$,
many RBF-ANN training algorithms determine the centers
$\mathbf{c}_j$ and widths $\sigma_j$ based on the inputs $\{\mathbf{x}^1, \ldots, \mathbf{x}^P\}$
only. One can employ different clustering algorithms,
e.g., k-means [27.29], which attempts to
position the centers among the training inputs so
that the overall sum of squared (Euclidean) distances
between the centers and the inputs they represent (i.e.,
the inputs falling in their respective Voronoi compartments,
the set of inputs for which the current
center is the closest among all the centers) is
minimized:




• Step 1: Set J, the number of hidden units. The optimum
  value of J can be obtained through a model
  selection method, e.g., cross-validation.

• Step 2: Randomly select J training inputs that will
  form the initial positions of the J centers $\mathbf{c}_j$.

• Step 3: At time step t:

  a) Pick a training input x(t) and find the center c(t)
     closest to it.
  b) Shift the center c(t) towards x(t)

$$
\mathbf{c}(t) \leftarrow \mathbf{c}(t) + \rho(t) \big(\mathbf{x}(t) - \mathbf{c}(t)\big) , \quad \text{where } 0 < \rho(t) < 1 . \tag{27.42}
$$

The learning rate $\rho(t)$ usually decreases in time
towards zero. The training is stopped once the centers
settle in their positions and move only slightly
(some norm of weight updates is below a certain
threshold). Since k-means is guaranteed to find only locally
optimal solutions, it is worth re-initializing the
centers and re-running the algorithm several times,
keeping the solution with the lowest quantization
error.
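
The on-line procedure can be sketched as follows. This is a minimal sketch under assumptions: the function names, the fixed number of steps used instead of a convergence test, the particular decay schedule for the learning rate, and the toy data are all choices made for the example.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def online_kmeans(inputs, J, steps=2000, seed=0):
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(inputs, J)]        # Step 2
    for t in range(1, steps + 1):
        x = rng.choice(inputs)                                # Step 3a
        j = min(range(J), key=lambda idx: sq_dist(x, centers[idx]))
        rho = 1.0 / (1.0 + 0.01 * t)      # assumed schedule, 0 < rho < 1, decaying
        centers[j] = [ci + rho * (xi - ci)                    # Step 3b, (27.42)
                      for ci, xi in zip(centers[j], x)]
    return centers


def quantization_error(inputs, centers):
    # Sum of squared distances of the inputs to their closest centers.
    return sum(min(sq_dist(x, c) for c in centers) for x in inputs)


inputs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers = online_kmeans(inputs, J=2)
print(centers, quantization_error(inputs, centers))
```

Re-running with several seeds and keeping the centers with the smallest `quantization_error` corresponds to the restart strategy mentioned above.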
Once the centers are in their positions, it is easy to 
determine the RBF widths, and once this is done, the 


27.6 Self-Organizing Maps 


In this section we will introduce ANN models that 
learn without any signal from a teacher, i.e., learning 
is based solely on training inputs — there are no out- 
put targets. The ANN architecture designed to operate 
in this setting was introduced by Kohonen under the 
name self-organizing map (SOM) [27.32]. This model 
is motivated by organization of neuron sensitivities in 
the brain cortex. 

In Fig. 27.12a we show schematic illustration of 
one of the principal organizations of biological neural 
networks. In the bottom layer (grid) there are recep- 
tors representing the inputs. Every element of the inputs 
(each receptor) has forward connections to all neurons 
in the upper layer representing the cortex. The neurons 
are organized spatially on a grid. Outputs of the neurons 
represent activation of the SOM network. The neurons, 
besides receiving connections from the input recep- 
tors, have a lateral interconnection structure among 
themselves, with connections that can be excitatory, or 
inhibitory. In Fig. 27.12b we show a formal SOM archi- 
tecture — neurons spatially organized on a grid receive 
inputs (elements of input vectors) through connections 
with synaptic weights. 


output weights can be solved using methods of linear 
regression. 

Of course, it is more optimal to position the centers 
with respect to both the inputs and target outputs in the 
training set. This can be formulated, e.g., as a gradient 
descent optimization. Furthermore, covering of the in- 
put space with spherical Gaussian kernels may not be 
optimal, and algorithms have been developed for learn- 
ing of general covariance structures. A comprehensive 
review of RBF networks can be found, e.g., in [27.30]. 

Recently, it was shown that if enough hidden 
units are used, their centers can be set randomly 
at very little cost, and determination of the only 
remaining free parameters — output weights — can 
be done cheaply and in a closed form through lin- 
ear regression. Such architectures, known as extreme 
learning machines [27.31] have shown surprisingly 
high performance levels. The idea of extreme learn- 
ing machines can be considered as being analo- 
gous to the idea of reservoir computation, but in 
the static setting. Of course, extreme learning ma- 
chines can be built using other implementations of 
feed-forward ANNs, such as the sigmoid networks of 
Sect. 27.3. 
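
The idea of random centers followed by closed-form output weights can be sketched in a few lines. The following is a minimal illustration under our own assumptions, not the method of [27.31]: the function names, the choice of drawing centers uniformly from the bounding box of the inputs, the fixed kernel width, and the small ridge term added for numerical stability are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_features(X, centers, width):
    """Gaussian RBF activations, one column per hidden unit."""
    # Squared Euclidean distance between every input and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_random_rbf(X, y, n_hidden=40, width=1.0, ridge=1e-6, seed=0):
    """Random centers, then output weights by (ridge) linear regression."""
    rng = np.random.default_rng(seed)
    # Assumption: centers are drawn uniformly from the bounding box of X.
    centers = rng.uniform(X.min(axis=0), X.max(axis=0),
                          size=(n_hidden, X.shape[1]))
    H = rbf_features(X, centers, width)
    # Output weights solve (H^T H + ridge * I) w = H^T y in closed form.
    w = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return centers, w


def predict(X, centers, w, width=1.0):
    """Network output for inputs X (width must match the one used in fitting)."""
    return rbf_features(X, centers, width) @ w
```

Only the linear system for the output weights is solved; the hidden layer is never adapted, which is what makes the procedure cheap.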


A particular feature of the SOM is that it can map
the training set on the neuron grid in a manner that
preserves the training set's topology: two input patterns
that are close in the input space will most strongly
activate neurons that are close on the SOM grid. Such
topological mapping of inputs (feature mapping) has been
observed in biological neural networks [27.32] (e.g.,
visual maps, orientation maps of visual contrasts, or
auditory maps, frequency maps of acoustic stimuli).

Fig. 27.12a,b Schematic representation of the SOM ANN architectures

Teuvo Kohonen presented one possible realization
of the Hebb rule that is used to train SOM. Input
weights of the neurons are initialized as small random
numbers. Consider a training set of inputs,
$A_{train} = \{x^p\}_{p=1}^{P}$, and linear neurons

$$ o_i = \sum_{j=1}^{m} w_{ij} x_j = w_i^T x , \tag{27.43} $$

where m is the input dimension and i = 1, ..., n. Training
inputs are presented in random order. At each training
step, we find the (winner) neuron with the weight
vector most similar to the current input x. The measure
of similarity can be based on the dot product, i.e.,
the index of the winner neuron is $i^* = \arg\max_i (w_i^T x)$,
or on the (Euclidean) distance, $i^* = \arg\min_i \|x - w_i\|$.
After identifying the winner, the learning continues by
adapting the winner's weights along with the weights
of all its neighbors on the neuron grid. This will ensure
that nearby neurons on the grid will eventually represent
similar inputs in the input space. This is moderated
by a neighborhood function $h(i^*, i)$ that, given a winner
neuron index $i^*$, quantifies how many other neurons on
the grid should be adapted


$$ w_i(t+1) = w_i(t) + \alpha(t) \, h(i^*, i) \, (x(t) - w_i(t)) . \tag{27.44} $$


The learning rate $\alpha(t) \in (0, 1)$ decays in time as 1/t, or
$\exp(-kt)$, where k is a positive time scale constant. This
ensures convergence of the training process. The simplest
form of the neighborhood function operates with
rectangular neighborhoods,


$$ h(i^*, i) = \begin{cases} 1, & \text{if } d_M(i^*, i) \le \lambda(t) \\ 0, & \text{otherwise,} \end{cases} \tag{27.45} $$


where $d_M(i^*, i)$ represents the (Manhattan) distance
between neurons $i^*$ and i on the map grid. The
neighborhood size $\lambda(t)$ should decrease in time, e.g.,
through an exponential decay such as $\exp(-qt)$, with time
scale q > 0. Another often used neighborhood function
is the Gaussian kernel


$$ h(i^*, i) = \exp\left( - \frac{d_E^2(i^*, i)}{\lambda^2(t)} \right) , $$

where $d_E(i^*, i)$ is the Euclidean distance between $i^*$ and
i on the grid, i.e., $d_E(i^*, i) = \|r_{i^*} - r_i\|$, where $r_i$ is the
co-ordinate vector of the i-th neuron on the SOM grid.


Training of SOM networks can be summarized as fol- 
lows: 


• Step 1: Set $\alpha_0$, $\lambda_0$ and $t_{max}$ (maximum number
  of iterations). Randomly (e.g., with uniform distribution)
  generate the synaptic weights (e.g., from
  (−0.5, 0.5)). Initialize the counters: t = 0, p = 1; t
  indexes time steps (iterations) and p is the input
  pattern index.

• Step 2: Take input $x^p$ and find the corresponding
  winner neuron.

• Step 3: Update the weights of the winner and its
  topological neighbors on the grid (as determined by
  the neighborhood function). Increment t.

• Step 4: Update $\alpha$ and $\lambda$.

• Step 5: If p < P, set p ← p + 1 and go to step 2 (we
  can also use randomized selection), otherwise go to
  step 6.

• Step 6: If t = $t_{max}$, finish the training process.
  Otherwise set p = 1 and go to step 2. A new training
  epoch begins.
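
The steps above can be sketched as a short NumPy routine. This is only an illustrative sketch under stated assumptions: the function names `train_som` and `quantization_error`, the default parameter values, the exponential schedules for both the learning rate and the neighborhood size, the Euclidean winner selection, and the rectangular (Manhattan) neighborhood of (27.45) are choices made for this example; patterns are visited in a random order within each epoch (the randomized selection mentioned in step 5).

```python
import numpy as np


def train_som(X, grid_shape=(5, 5), t_max=2000, alpha0=0.5, lambda0=5.0,
              k=1e-3, q=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates r_i of the neurons (row, column).
    R = np.array([(r, c) for r in range(rows) for c in range(cols)])
    # Step 1: random initial weights from (-0.5, 0.5); counter and schedules.
    W = rng.uniform(-0.5, 0.5, size=(rows * cols, X.shape[1]))
    t, alpha, lam = 0, alpha0, lambda0
    while t < t_max:                        # each pass of this loop is an epoch
        for p in rng.permutation(len(X)):   # randomized pattern selection
            # Step 2: winner = neuron whose weight vector is closest to x^p.
            winner = np.argmin(np.linalg.norm(W - X[p], axis=1))
            # Step 3: rectangular neighborhood (27.45), Manhattan distance.
            d_manhattan = np.abs(R - R[winner]).sum(axis=1)
            h = (d_manhattan <= lam).astype(float)
            W += alpha * h[:, None] * (X[p] - W)   # update rule (27.44)
            t += 1
            # Step 4: decay the learning rate and the neighborhood size.
            alpha = alpha0 * np.exp(-k * t)
            lam = lambda0 * np.exp(-q * t)
            # Step 6: stop as soon as t_max iterations have been made.
            if t >= t_max:
                break
    return W, R


def quantization_error(X, W):
    """Mean distance from each input to the weight vector of its winner."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

The returned array `R` holds the grid position of each neuron, so plotting the rows of `W` connected according to `R` gives the usual picture of the map unfolding in the data space.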


The SOM network can be used as a tool for non- 
linear data visualization (grid dimensions 1, 2, or 3). 
In general, SOM implements constrained vector quanti- 
zation, where the codebook vectors (vector quantization 
centers) cannot move freely in the data space dur- 
ing adaptation, but are constrained to lie on a lower 
dimensional manifold W in the data space. The dimen- 
sionality of W is equal to the dimensionality of the 
neural grid. The neural grid can be viewed as a dis- 
cretized version of the local co-ordinate system Y (e.g., 
computer screen) and the weight vectors in the data 
space (connected by the neighborhood structure on the 
neuron grid) as its image in the data space. In this in- 
terpretation, the neuron positions on the grid represent 
co-ordinate functions (in the sense of differential ge- 
ometry) mapping elements of the manifold W to the 
coordinate system Y. Hence, the SOM algorithm can 
also be viewed as one particular implementation of 
manifold learning. 

There have been numerous successful applications
of SOM in a wide variety of areas, e.g., in image
processing, computer vision, robotics, bioinformatics,
process analysis, and telecommunications. A good survey
of SOM applications can be found, e.g., in [27.33].
SOMs have also been extended to temporal domains,
mostly by the introduction of additional feedback
connections, e.g., [27.34-37]. Such models can be used for
topographic mapping or constrained clustering of time
series data.




27.7 Recursive Neural Networks 


In many application domains, data are naturally or- 
ganized in structured form, where each data item is 
composed of several components related to each other 
in a non-trivial way, and the specific nature of the task to 
be performed is strictly related not only to the informa- 
tion stored at each component, but also to the structure 
connecting the components. Examples of structured 
data are parse trees obtained by parsing sentences in 
natural language, and the molecular graph describing 
a chemical compound. 

Recursive neural networks (RecNN) [27.38, 39] are 
neural network models that are able to directly pro- 
cess structured data, such as trees and graphs. For the 
sake of presentation, here we focus on positional trees. 
Positional trees are trees for which each child has an as- 
sociated index, its position, with respect to the siblings. 
Let us understand how a RecNN is able to process a tree
by analogy with what happens when unfolding an RNN
that processes a sequence, which can be understood
as a special case of a tree where each node v possesses
a single child.

Fig. 27.13a,b Generation of the encoding network (a) for a sequence and (b) a tree. Initial states are represented by squared nodes

Fig. 27.14 The causality style of computation induced by the use of recursive networks is made explicit by using nested boxes to represent the recursive dependencies of the hidden variable associated to the root of the tree

Fig. 27.15a,b Schematic illustration of a principal organization of biological self-organizing neural network (a) and its formal counterpart SOM ANN architecture (b)

In Fig. 27.13 (top) we show the unfolding in time
of a sequence when considering a graphical model
(recursive network) representing, for a generic node v, the
functional dependencies among the input information
$x_v$, the state variable (hidden node) $y_v$, and the output
variable $o_v$. The operator $q^{-1}$ represents the shift
operator in time (unit time delay), i.e., $q^{-1} y_t = y_{t-1}$, which
applied to node v in our framework returns the child of
node v. At the bottom of Fig. 27.13 we have reported
the unfolding of a binary tree, where the recursive
network uses a generalization of the shift operator, which
given an index i and a variable associated to a vertex
v returns the variable associated to the i-th child of v,
i.e., $q_i^{-1} y_v = y_{ch_i[v]}$. So, while in RNN the network is
unfolded in time, in RecNN the network is unfolded on
the structure. The result of unfolding, in both cases, is
the encoding network. The encoding network for the
sequence specifies how the components implementing the
different parts of the recurrent network (e.g., each node
of the recurrent network could be instantiated by a layer
of neurons or by a full feed-forward neural network
with hidden units) need to be interconnected. In the case
of the tree, the encoding network has the same semantics:
this time, however, a set of parameters (weights)
for each child should be considered, leading to a network
that, given a node v, can be described by the
equations


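$$ o_v = f\left( \sum_{j=1}^{J} m_j \, y_v^{j} \right) , $$

$$ y_v^{j} = f\left( \sum_{s=1}^{d} \sum_{i=1}^{J} w_{ji}^{s} \, y_{ch_s[v]}^{i} + \sum_{i=1}^{I} u_{ji} \, x_v^{i} \right) , $$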


where d is the maximum number of children an input
node can have, and the weights $w_{ji}^{s}$ are indexed by the
s-th child. Note that it is not difficult to generalize all the
learning algorithms devised for RNN to these extended
equations.
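
The bottom-up computation on a positional tree can be illustrated with a short recursive sketch. The names below (`Node`, `hidden_state`, `output`) and the parameter layout are hypothetical and chosen only for this example: `W[s]` holds the state-to-state weights for the s-th child position, `U` holds the weights applied to the label $x_v$, `m` holds the output weights, and `y0` is the frontier (initial) state used wherever a child is missing; tanh stands in for the transfer function f.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class Node:
    x: np.ndarray                                   # label x_v of the vertex
    children: List[Optional["Node"]] = field(default_factory=list)


def hidden_state(node, W, U, y0, f=np.tanh):
    """Hidden state y_v, computed bottom-up from the children of v."""
    if node is None:
        return y0                                   # frontier (initial) state
    pre = U @ node.x                                # contribution of the label
    for s in range(len(W)):                         # one weight matrix per position
        child = node.children[s] if s < len(node.children) else None
        pre = pre + W[s] @ hidden_state(child, W, U, y0, f)
    return f(pre)


def output(node, W, U, m, y0, f=np.tanh):
    """Output o_v = f(m . y_v) for the vertex v."""
    return f(m @ hidden_state(node, W, U, y0, f))
```

With d = 1 the recursion reduces to the unfolding of a recurrent network over a sequence; with d = 2 and distinct matrices `W[0]` and `W[1]`, swapping the left and right subtrees generally changes the result, which is what makes the model sensitive to child positions.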

It should be remarked that recursive networks
clearly introduce a causal style of computation, i.e.,
the computation of the hidden and output variables for
a vertex v only depends on the information attached to v
and the hidden variables of the children of v. This
dependence is satisfied recursively by all of v's descendants
and is clearly shown in Fig. 27.14. In the figure, nested
boxes are used to make explicit the recursive
dependencies among hidden variables that contribute to the
determination of the hidden variable associated to the
root of the tree.

Although an encoding network can be generated 
for a directed acyclic graph (DAG), this style of com- 
putation limits the discriminative ability of RecNN 
to the class of trees. In fact, the hidden state is not 
able to encode information about the parents of nodes. 
The introduction of contextual processing, however, al- 
lows us to discriminate, with some specific exceptions, 
among DAGs [27.40]. Recently, Micheli [27.41] also 
showed how contextual processing can be used to ex- 
tend RecNN to the treatment of cyclic graphs. 

The same idea described above for supervised neu- 
ral networks can be adapted to unsupervised mod- 
els, where the output value of a neuron typically 
represents the similarity of the weight vector associ- 
ated to the neuron with the input vector. Specifically, 
in [27.37] SOMs were extended to the processing of 
structured data (SOM-SD). Moreover, a general framework
for self-organized processing of structured data
was proposed in [27.42]. The key concepts introduced
are:


i) The explicit definition of a representation space R
   equipped with a similarity measure $d_R(\cdot,\cdot)$ to
   evaluate the similarity between two hidden states.

ii) The introduction of a general representation function,
   denoted rep(·), which transforms the activation
   of the map for a given input into a hidden state
   representation.


In these models, each node v of the input structure
is represented by a tuple $[x_v, r_{v_1}, \ldots, r_{v_d}]$, where
$x_v$ is a real-valued vectorial encoding of the information
attached to vertex v, and $r_{v_i}$ are real-valued
vectorial representations of hidden states returned by
the rep(·) function when processing the activation
of the map for the i-th neighbor of v. Each neuron
$n_j$ in the map is associated to a weight vector
$[w_j, c_j^1, \ldots, c_j^d]$. The computation of the winner
neuron is based on the joint contribution of the similarity
measures $d_x(\cdot,\cdot)$ for the input information, and $d_R(\cdot,\cdot)$
for the hidden states, i.e., the internal representations.
Some parts of a SOM-SD map trained on DAGs
representing visual patterns are shown in Fig. 27.15.
Even in this case the style of computation is causal,
ruling out the treatment of undirected and/or cyclic
graphs. In order to cope with general graphs, recently
a new model, named GraphSOM [27.43], was proposed.


27.8 Conclusion


The field of artificial neural networks (ANN) has
grown enormously in the past 60 years. There are
many journals and international conferences specifically
devoted to neural computation and neural network
related models and learning machines. The field
has come a long way from its beginning in the form
of simple threshold units existing in isolation (e.g.,
the perceptron, Sect. 27.2) or connected in circuits.
Since then we have learnt how to generalize such
networks as parameterized differentiable models of various
sorts that can be fit to data (trained), usually
by transforming the learning task into an optimization
one.

ANN models have found numerous successful
practical applications in many diverse areas of science
and engineering, such as astronomy, biology,
finance, geology, etc. In fact, even though basic
feed-forward ANN architectures were introduced a long time
ago, they continue to surprise us with successful
applications, most recently in the form of deep
networks [27.44]. For example, a form of deep ANN
recently achieved the best performance on a well-known
benchmark problem, the recognition of handwritten
digits [27.45]. This is quite remarkable, since
such a simple ANN architecture trained in a purely
data-driven fashion was able to outperform the current
state-of-the-art techniques, formulated in more
sophisticated frameworks and possibly incorporating domain
knowledge.

ANN models have been formulated to operate in 
supervised (e.g., feed-forward ANN, Sect. 27.3; RBF 
networks, Sect. 27.5), unsupervised (e.g., SOM models, 
Sect. 27.6), semi-supervised, and reinforcement learn- 
ing scenarios and have been generalized to process 
inputs that are much more general than simple vector 
data of fixed dimensionality (e.g., the recurrent and re- 
cursive networks discussed in Sects. 27.4 and 27.7). Of 
course, we were not able to cover all important de- 
velopments in the field of ANNs. We can only hope 
that we have sufficiently motivated the interested reader 
with the variety of modeling possibilities based on the 
idea of interconnected networks of formal neurons, 
so that he/she will further consult some of the many 
(much more comprehensive) monographs on the topic, 
e.g., [27.3, 6, 7]. 

We believe that ANN models will continue to 
play an important role in modern computational in- 
telligence. Especially the inclusion of ANN-like mod- 
els in the field of probabilistic modeling can provide 
techniques that incorporate both explanatory model- 
based and data-driven approaches, while preserving 
a much fuller modeling capability through operat- 
ing with full distributions, instead of simple point 
estimates. 



References

27.1  F. Rosenblatt: The perceptron, a probabilistic model for information storage and organization in the brain, Psychol. Rev. 62, 386-408 (1958)
27.2  D.E. Rumelhart, G.E. Hinton, R.J. Williams: Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1 Foundations, ed. by D.E. Rumelhart, J.L. McClelland (MIT Press/Bradford Books, Cambridge 1986) pp. 318-363
27.3  J. Zurada: Introduction to Artificial Neural Systems (West Publ., St. Paul 1992)
27.4  K. Hornik, M. Stinchocombe, H. White: Multilayer feedforward networks are universal approximators, Neural Netw. 2, 359-366 (1989)
27.5  D.J.C. MacKay: Bayesian interpolation, Neural Comput. 4(3), 415-447 (1992)
27.6  S. Haykin: Neural Networks and Learning Machines (Prentice Hall, Upper Saddle River 2009)
27.7  C. Bishop: Neural Networks for Pattern Recognition (Oxford Univ. Press, Oxford 1995)
27.8  T. Sejnowski, C. Rosenberg: Parallel networks that learn to pronounce English text, Complex Syst. 1, 145-168 (1987)
27.9  A. Weibel: Modular construction of time-delay neural networks for speech recognition, Neural Comput. 1, 39-46 (1989)
27.10 J.L. Elman: Finding structure in time, Cogn. Sci. 14, 179-211 (1990)
27.11 M.I. Jordan: Serial order: A parallel distributed processing approach. In: Advances in Connectionist Theory, ed. by J.L. Elman, D.E. Rumelhart (Erlbaum, Hillsdale 1989)
27.12 Y. Bengio, R. Cardin, R. DeMori: Speaker independent speech recognition with neural networks and speech knowledge. In: Advances in Neural Information Processing Systems II, ed. by D.S. Touretzky (Morgan Kaufmann, San Mateo 1990) pp. 218-225
27.13 P.J. Werbos: Generalization of backpropagation with application to a recurrent gas market model, Neural Netw. 1(4), 339-356 (1988)
27.14 R.J. Williams, D. Zipser: A learning algorithm for continually running fully recurrent neural networks, Neural Comput. 1(2), 270-280 (1989)
27.15 Y. Bengio, P. Simard, P. Frasconi: Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw. 5(2), 157-166 (1994)
27.16 T. Lin, B.G. Horne, P. Tino, C.L. Giles: Learning long-term dependencies with NARX recurrent neural networks, IEEE Trans. Neural Netw. 7(6), 1329-1338 (1996)
27.17 S. Hochreiter, J. Schmidhuber: Long short-term memory, Neural Comput. 9(8), 1735-1780 (1997)
27.18 M. Lukosevicius, H. Jaeger: Overview of Reservoir Recipes, Technical Report, Vol. 11 (School of Engineering and Science, Jacobs University, Bremen 2007)
27.19 A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber: A novel connectionist system for improved unconstrained handwriting recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31, 5 (2009)
27.20 S. Hochreiter, M. Heusel, K. Obermayer: Fast model-based protein homology detection without alignment, Bioinformatics 23(14), 1728-1736 (2007)
27.21 H. Jaeger, H. Hass: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless telecommunication, Science 304, 78-80 (2004)
27.22 W. Maass, T. Natschlager, H. Markram: Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Comput. 14(11), 2531-2560 (2002)
27.23 P. Tino, G. Dorffner: Predicting the future of discrete sequences from fractal representations of the past, Mach. Learn. 45(2), 187-218 (2001)
27.24 M.H. Tong, A. Bicket, E. Christiansen, G. Cottrell: Learning grammatical structure with echo state network, Neural Netw. 20, 424-432 (2007)
27.25 K. Ishii, T. van der Zant, V. Becanovic, P. Ploger: Identification of motion with echo state network, Proc. OCEANS 2004 MTS/IEEE-TECHNO-OCEAN Conf., Vol. 3 (2004) pp. 1205-1210
27.26 L. Medsker, L.C. Jain: Recurrent Neural Networks: Design and Applications (CRC, Boca Raton 1999)
27.27 J. Kolen, S.C. Kremer: A Field Guide to Dynamical Recurrent Networks (IEEE, New York 2001)
27.28 D. Mandic, J. Chambers: Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability (Wiley, New York 2001)
27.29 J.B. MacQueen: Some models for classification and analysis of multivariate observations, Proc. 5th Berkeley Symp. Math. Stat. Probab. (Univ. California Press, Oakland 1967) pp. 281-297
27.30 M.D. Buhmann: Radial Basis Functions: Theory and Implementations (Cambridge Univ. Press, Cambridge 2003)
27.31 G.-B. Huang, Q.-Y. Zhu, C.-K. Siew: Extreme learning machine: theory and applications, Neurocomputing 70, 489-501 (2006)
27.32 T. Kohonen: Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30 (Springer, Berlin, Heidelberg 2001)
27.33 T. Kohonen, E. Oja, O. Simula, A. Visa, J. Kangas: Engineering applications of the self-organizing map, Proc. IEEE 84(10), 1358-1384 (1996)
27.34 T. Koskela, M. Varsta, J. Heikkonen, K. Kaski: Recurrent SOM with local linear models in time series prediction, 6th Eur. Symp. Artif. Neural Netw. (1998) pp. 167-172
27.35 T. Voegtlin: Recursive self-organizing maps, Neural Netw. 15(8/9), 979-992 (2002)
27.36 M. Strickert, B. Hammer: Merge som for temporal data, Neurocomputing 64, 39-72 (2005)


27.37 M. Hagenbuchner, A. Sperduti, A. Tsoi: Self-organizing map for adaptive processing of structured data, IEEE Trans. Neural Netw. 14(3), 491-505 (2003)
27.38 A. Sperduti, A. Starita: Supervised neural networks for the classification of structures, IEEE Trans. Neural Netw. 8(3), 714-735 (1997)
27.39 P. Frasconi, M. Gori, A. Sperduti: A general framework for adaptive processing of data structures, IEEE Trans. Neural Netw. 9(5), 768-786 (1998)
27.40 B. Hammer, A. Micheli, A. Sperduti: Universal approximation capability of cascade correlation for structures, Neural Comput. 17(5), 1109-1159 (2005)
27.41 A. Micheli: Neural network for graphs: A contextual constructive approach, IEEE Trans. Neural Netw. 20(3), 498-511 (2009)
27.42 B. Hammer, A. Micheli, A. Sperduti, M. Strickert: A general framework for unsupervised processing of structured data, Neurocomputing 57, 3-35 (2004)
27.43 M. Hagenbuchner, A. Sperduti, A.-C. Tsoi: Graph self-organizing maps for cyclic and unbounded graphs, Neurocomputing 72(7-9), 1419-1430 (2009)
27.44 Y. Bengio, Y. Lecun: Greedy Layer-Wise Training of Deep Network. In: Advances in Neural Information Processing Systems 19, ed. by B. Schölkopf, J. Platt, T. Hofmann (MIT Press, Cambridge 2006) pp. 153-160
27.45 D.C. Ciresan, U. Meier, L.M. Gambardella, J. Schmidhuber: Deep big simple neural nets for handwritten digit recognition, Neural Comput. 22(12), 3207-3220 (2010)


28. Deep and Modular Neural Networks 


Ke Chen 


In this chapter, we focus on two important areas
in neural computation, i.e., deep and modular
neural networks, given the fact that both deep
and modular neural networks are among the most
powerful machine learning and pattern recognition
techniques for complex AI problem solving.
We begin by providing a general overview of deep
and modular neural networks to describe the general
motivation behind such neural architectures
and fundamental requirements imposed by complex
AI problems. Next, we describe background
and motivation, methodologies, major building
blocks, and the state-of-the-art hybrid learning
strategy in the context of deep neural architectures.
Then, we describe background and motivation,
taxonomy, and learning algorithms pertaining to
various typical modular neural networks in a wide
context. Furthermore, we also examine relevant
issues and discuss open problems in deep and
modular neural network research areas.


28.1 Overview
28.2 Deep Neural Networks
   28.2.1 Background and Motivation
   28.2.2 Building Blocks and Learning Algorithms
   28.2.3 Hybrid Learning Strategy
   28.2.4 Relevant Issues
28.3 Modular Neural Networks
   28.3.1 Background and Motivation
   28.3.2 Tightly Coupled Models
   28.3.3 Loosely Coupled Models
   28.3.4 Relevant Issues
28.4 Concluding Remarks
References


28.1 Overview


The human brain is a general, effective, and efficient
system that solves complex and difficult problems and
gives rise to intelligence and creativity. Neural
computation has been inspired by brain-related research
in different disciplines, e.g., biology and neuroscience,
on various levels ranging from a single neuron
to complex neuronal structure and organization [28.1].
Among many discoveries in brain-related sciences, two
of the most important properties are modularity and
hierarchy of neuronal organization in the human brain.
Neuroscientific research has revealed that the central
nervous system (CNS) in the human brain is
a distributed, massively parallel, and self-organizing
modular system [28.1-3]. The CNS is composed of several
regions such as the spinal cord, medulla oblongata,
pons, midbrain, diencephalon, cerebellum, and the two
cerebral hemispheres. Each such region forms a functional
module and all regions are interconnected with


other parts of the brain [28.1]. In particular, the cerebral 
cortex consists of several regions attributed to main per- 
ceptual and cognitive tasks, where modularity emerges 
in two different aspects: i.e., structural and functional 
modularity. Structural modularity is observable from 
the fact that there are sparse connections between dif- 
ferent neuronal groups but neurons are often densely 
connected within a neuronal group, while functional 
modularity is evident from different response patterns 
produced by neural modules for different perceptual 
and cognitive tasks. Modularity evidence in the human 
brain strongly suggests that domain-specific modules 
are required by specific tasks and different modules 
can cooperate for high level, complex tasks, which pri- 
marily motivates the modular neural network (MNN) 
development in neural computation (NC) [28.4, 5]. 
Apart from modularity, the human brain also exhibits
a functional and structural hierarchy given the fact
that information processing in the human brain is done
in a hierarchical way. Previous studies [28.6, 7] suggested
that there are different cortical visual areas that
lead to hierarchical information representations to carry
out highly complicated visual tasks, e.g., object
recognition. In general, hierarchical information processing
enables the human brain to accomplish complex perceptual
and cognitive tasks in an effective and extremely
efficient way, which mainly inspires the study of deep
neural networks (DNNs) of multiple layers in NC.

In general, both DNNs and MNNs can be categorized
into biologically plausible [28.8] and artificial
models [28.9] in NC. The main difference between
biologically plausible and artificial models lies in their
methodologies: a biologically plausible model often
takes both structural and functional resemblance
to its biological counterpart into account, while an
artificial model simply works towards modeling the
functionality of a biological system without considering
those bio-mimetic factors. Due to limited space,
in this chapter we merely focus on artificial DNNs
and MNNs. Readers interested in biologically plausible
models are referred to the literature, e.g., [28.4], for
useful information.


28.2 Deep Neural Networks


In this section, we overview main deep neural network
(DNN) techniques with an emphasis on the latest
progress. We first review background and motivation
for DNN development. Then we describe major building
blocks and relevant learning algorithms for constructing
different DNNs. Next, we present a hybrid
learning strategy in the context of NC. Finally, we
examine relevant issues related to DNNs.


28.2.1 Background and Motivation


The study of NC dates back to the 1940s when McCulloch
and Pitts modeled a neuron mathematically.
After that NC was an active area in AI studies until
Minsky and Papert published their influential book,
Perceptrons [28.10], in 1969. In the book, they formally
proved the limited capacities of the single-layer perceptron
and further concluded that there was only a slim chance
of expanding its capacities with its multi-layer version,
which significantly slowed down NC research until the
back-propagation (BP) algorithm was invented (or
reinvented) to solve the learning problem in a multi-layer
perceptron (MLP) [28.11].

In theory, the BP algorithm enables one to train an
MLP of many hidden layers to form a powerful DNN.
Such an attractive technique aroused tremendous
enthusiasm in applying DNNs in different fields [28.9].
Apart from a few exceptions, e.g., [28.12], researchers
soon found that an MLP of more than two hidden layers
often failed [28.13] due to the well-known fact
that MLP learning involves an extremely difficult
non-convex optimization problem, and the gradient-based
local search used in the BP algorithm easily gets stuck
in an unwanted local minimum. As a result, most
researchers gradually gave up deep architectures and
devoted their attention instead to shallow learning
architectures with theoretical justification, e.g., the formal
but non-constructive proof that an MLP with a single hidden
layer may be a universal function approximator [28.14], and
the support vector machine (SVM) [28.15]. It has
been shown that shallow architectures often work well
with the support of effective feature extraction techniques
(but these are often handcrafted). However, recent
theoretical results suggest that learning models of
insufficient depth have a fundamental weakness as they
cannot efficiently represent the very complicated functions
often required in complex AI tasks [28.16, 17].

To solve the complex non-convex optimization
problem encountered in DNN learning, Hinton and his
colleagues made a breakthrough by coming up with 
a hybrid learning strategy in 2006 [28.18]. The novel 
learning strategy combines unsupervised and super- 
vised learning paradigms where a layer-wise greedy 
unsupervised learning is first used to construct an ini- 
tial DNN with chosen building blocks (such an initial 
DNN alone can also be used for different purposes, 
e.g., unsupervised feature learning [28.19]), and super- 
vised learning is then fulfilled based on the pre-trained 
DNN. Their seminal work led to an emerging machine 
learning (ML) area, deep learning. As a result, differ- 
ent building blocks and learning algorithms have been 
developed to construct various DNNs. Both theoretical 
justification and empirical evidence suggest that the hy- 
brid learning strategy [28.18] greatly facilitates learning 
of DNNs [28.17]. 

Since 2006, DNNs trained with the hybrid learning
strategy have been successfully applied in different
and complex AI tasks, such as pattern recognition
[28.20-23], various computer vision tasks [28.24-26],
audio classification and speech information
processing [28.27-31], information retrieval [28.32-34],
natural language processing [28.35-37], and
robotics [28.38]. Thus, DNNs have become one of the
most promising ML and NC techniques to tackle
challenging AI problems [28.39].


28.2.2 Building Blocks and Learning Algorithms


In general, a building block is composed of two parametric
models, an encoder and a decoder, as illustrated in
Fig. 28.1. An encoder transforms a raw input or a low-level
representation x into a high-level and abstract
representation h(x), while a decoder generates an output
$\hat{x}$, a reconstructed version of x, from h(x).
Learning a building block is a self-supervised learning
task that minimizes an elaborate reconstruction cost
function to find appropriate parameters in the encoder and
the decoder. Thus, the distinction between two building
blocks of different types lies in their encoder and decoder
mechanisms and reconstruction cost functions (as
well as optimization algorithms used for parameter
estimation). Below we describe different building blocks
and their learning algorithms in terms of the generic
architecture shown in Fig. 28.1.


Fig. 28.1 Schematic diagram of a generic building block architecture


Auto-Encoders
The auto-encoder [28.40] and its variants are simple
building blocks used to build an MLP of many
layers. An auto-encoder is implemented by an MLP with
one hidden layer. As depicted in Fig. 28.2, the input and the
hidden layers constitute an encoder to generate an
M-dimensional representation $h(x) = (h_1(x), \ldots, h_M(x))^T$
(hereinafter, we use the notation $h(x) = (h_m(x))_{m=1}^{M}$ to
indicate a vector-element relationship for simplifying
the presentation) for a given input $x = (x_n)_{n=1}^{N}$ in
N-dimensional space

$$ h(x) = f(W x + b_h) , $$


where $W$ is the connection weight matrix between the input and
the hidden layers, $b_h$ is the bias vector for all hidden
neurons, and $f(\cdot)$ is a transfer function, e.g., the sigmoid
function [28.9]. Let $f(u) = (f(u_k))_{k=1}^{K}$ be a collective
notation for the output of all $K$ neurons in a layer.
Accordingly, the hidden and the output layers form a decoder that
yields a reconstructed version $\hat{x} = (\hat{x}_n)_{n=1}^{N}$

$$\hat{x} = f(W^T h(x) + b_o),$$

where $W^T$ is the transpose of the weight matrix $W$ and $b_o$ is
the bias vector for all output neurons. Note that the auto-encoder
can be viewed as a special case of the auto-associator in which
the same (tied) weights are used in the connections between
different layers, as will be seen clearly in the learning
algorithm later on. Tying the weights avoids an unwanted solution
when an over-complete representation, i.e., $M > N$, is required
[28.22].
Fig. 28.2 Auto-encoder architecture


Further studies [28.41] suggest that the auto-encoder is unlikely
to lead to the discovery of a more useful representation than the
input, even though a representation must encode much of the
information conveyed in the input whenever the auto-encoder
produces a good reconstruction of its input. As a result,
a variant named the denoising auto-encoder (DAE) was proposed to
capture stable structures underlying the distribution of its
observed input. The basic idea is as follows: instead of learning
the auto-encoder from the intact input, the DAE is trained to
recover the original input from a partially destroyed (distorted)
version of it [28.41]. As illustrated in Fig. 28.3, the DAE leads
to a more useful representation $h(\tilde{x})$ by restoring the
corrupted input $\tilde{x}$ to a reconstructed version $\hat{x}$
that is as close to the clean input $x$ as possible. Thus, the
encoder yields a representation

$$h(\tilde{x}) = f(W\tilde{x} + b_h),$$

and the decoder produces a restored version $\hat{x}$ via the
representation $h(\tilde{x})$

$$\hat{x} = f(W^T h(\tilde{x}) + b_o).$$


To produce a corrupted input, we distort a clean input by
corrupting it with appropriate noise. Depending on the nature of
the input attributes, three kinds of noise are used in the
corruption process: isotropic Gaussian noise,
$\mathcal{N}(0, \sigma^2 I)$, masking noise (setting some randomly
chosen elements of $x$ to zero), and salt-and-pepper noise
(flipping the values of some randomly chosen elements of $x$ to
the maximum or the minimum of a given range). Normally, Gaussian
noise is used for input of real or continuous values, while
masking and salt-and-pepper noise are applied to input of discrete
values, e.g., pixel intensities of gray images. It is worth
stating that the variance $\sigma^2$ in Gaussian noise and the
number of randomly chosen elements in masking and salt-and-pepper
noise are hyper-parameters that affect DAE learning. By corrupting
a clean input with the chosen noise, we obtain an example,
$(\tilde{x}, x)$, for self-supervised learning.
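The three corruption processes can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `corrupt`, the parameter `frac` (fraction of elements destroyed), and the range limits `lo` and `hi` are hypothetical names chosen here, not part of the original text.

```python
import numpy as np

def corrupt(x, kind, rng, sigma=0.1, frac=0.25, lo=0.0, hi=1.0):
    """Return a corrupted copy of the 1-D input x; x itself is left unchanged."""
    x_tilde = x.astype(float).copy()
    if kind == "gaussian":
        # Additive isotropic Gaussian noise N(0, sigma^2 I).
        return x_tilde + rng.normal(0.0, sigma, size=x.shape)
    # Masking and salt-and-pepper noise destroy a fixed number of elements.
    n_destroy = int(round(frac * x.size))
    idx = rng.choice(x.size, size=n_destroy, replace=False)
    if kind == "masking":
        x_tilde[idx] = 0.0                      # set chosen elements to zero
    elif kind == "salt_and_pepper":
        x_tilde[idx] = rng.choice([lo, hi], size=n_destroy)  # min or max of range
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return x_tilde

rng = np.random.default_rng(0)
x = np.linspace(0.2, 0.8, 8)                    # a clean input
# Each entry is a self-supervised training example (corrupted input, clean target).
pairs = {k: (corrupt(x, k, rng), x)
         for k in ("gaussian", "masking", "salt_and_pepper")}
```

Here `sigma` and `frac` play the role of the hyper-parameters mentioned above (noise variance and number of destroyed elements).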

Given a training set of $T$ examples $\{(x_t, x_t)\}_{t=1}^{T}$
(auto-encoder) or $\{(\tilde{x}_t, x_t)\}_{t=1}^{T}$ (DAE), two
reconstruction cost functions are commonly used for learning
auto-encoders as follows

$$L(W, b_h, b_o) = \frac{1}{2T} \sum_{t=1}^{T} \sum_{n=1}^{N} (x_{tn} - \hat{x}_{tn})^2, \tag{28.1a}$$

$$L(W, b_h, b_o) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{n=1}^{N} \big( x_{tn} \log \hat{x}_{tn} + (1 - x_{tn}) \log (1 - \hat{x}_{tn}) \big). \tag{28.1b}$$

The cost function in (28.1a) is used for input of real or discrete
values, while the cost function in (28.1b) is employed especially
for input of binary values.
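As a hedged illustration, the two reconstruction costs can be written as NumPy functions over a data matrix `X` (one example per row) and its reconstruction `X_hat`. The function names, the row-wise layout, and the clipping constant `eps` are assumptions made for this sketch; the squared-error cost assumes the $1/(2T)$ normalization.

```python
import numpy as np

def squared_error_cost(X, X_hat):
    # Squared-error cost: 1/(2T) times the sum of squared differences
    # over all T examples (rows) and N dimensions (columns).
    return np.sum((X - X_hat) ** 2) / (2.0 * X.shape[0])

def cross_entropy_cost(X, X_hat, eps=1e-12):
    # Cross-entropy cost for binary inputs; reconstructions are clipped
    # away from 0 and 1 so that the logarithms stay finite.
    X_hat = np.clip(X_hat, eps, 1.0 - eps)
    return -np.sum(X * np.log(X_hat)
                   + (1.0 - X) * np.log(1.0 - X_hat)) / X.shape[0]
```

For binary inputs and a sigmoid output layer the cross-entropy form is the usual choice, because its gradient with respect to the output pre-activation does not vanish when the sigmoid saturates.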


Fig. 28.3 Denoising auto-encoder architecture 


To minimize the reconstruction cost functions in (28.1a) and
(28.1b), application of the stochastic gradient descent
algorithm [28.12] leads to a generic learning algorithm for
training the auto-encoder and its variant, summarized as
follows:

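Auto-Encoder Learning Algorithm. Given a training set of $T$
examples, $\{(z_t, x_t)\}_{t=1}^{T}$, where $z_t = x_t$ for the
auto-encoder or $z_t = \tilde{x}_t$ for the DAE, and a transfer
function, $f(\cdot)$, randomly initialize all parameters, $W$,
$b_h$ and $b_o$, in the auto-encoder and pre-set a learning rate
$\epsilon$. Furthermore, the training set is randomly divided into
several batches of $T_B$ examples, $\{(z_t, x_t)\}_{t=1}^{T_B}$,
and then parameters are updated based on each batch:

• Forward computation
  For the input $z_t$ ($t = 1, \ldots, T_B$), the output of the
  hidden layer is

  $$h(z_t) = f(u_h(z_t)), \qquad u_h(z_t) = W z_t + b_h,$$

  and the output of the output layer is

  $$\hat{x}_t = f(u_o(z_t)), \qquad u_o(z_t) = W^T h(z_t) + b_o.$$

• Backward gradient computation
  For the cost function in (28.1a),

  $$\frac{\partial L(W, b_h, b_o)}{\partial u_o(z_t)} = \big( (\hat{x}_{tn} - x_{tn}) f'(u_{o,n}(z_t)) \big)_{n=1}^{N},$$

  where $f'(\cdot)$ is the first-order derivative function of
  $f(\cdot)$. For the cost function in (28.1b), when $f(\cdot)$ is
  the sigmoid function,

  $$\frac{\partial L(W, b_h, b_o)}{\partial u_o(z_t)} = \hat{x}_t - x_t.$$

  Then, the gradient with respect to $h(z_t)$ is

  $$\frac{\partial L(W, b_h, b_o)}{\partial h(z_t)} = W \frac{\partial L(W, b_h, b_o)}{\partial u_o(z_t)}.$$

  Applying the chain rule gives the gradient with respect to
  $u_h(z_t)$ as

  $$\frac{\partial L(W, b_h, b_o)}{\partial u_h(z_t)} = \Big( f'(u_{h,m}(z_t)) \frac{\partial L(W, b_h, b_o)}{\partial h_m(z_t)} \Big)_{m=1}^{M}.$$

  Gradients with respect to biases are

  $$\frac{\partial L(W, b_h, b_o)}{\partial b_o} = \frac{\partial L(W, b_h, b_o)}{\partial u_o(z_t)},$$

  and

  $$\frac{\partial L(W, b_h, b_o)}{\partial b_h} = \frac{\partial L(W, b_h, b_o)}{\partial u_h(z_t)}.$$

• Parameter update
  Applying the gradient descent method and tied weights leads to
  the update rules

  $$W \leftarrow W - \frac{\epsilon}{T_B} \sum_{t=1}^{T_B} \Big[ \frac{\partial L(W, b_h, b_o)}{\partial u_h(z_t)} z_t^T + h(z_t) \Big( \frac{\partial L(W, b_h, b_o)}{\partial u_o(z_t)} \Big)^T \Big],$$

  $$b_o \leftarrow b_o - \frac{\epsilon}{T_B} \sum_{t=1}^{T_B} \frac{\partial L(W, b_h, b_o)}{\partial u_o(z_t)},$$

  and

  $$b_h \leftarrow b_h - \frac{\epsilon}{T_B} \sum_{t=1}^{T_B} \frac{\partial L(W, b_h, b_o)}{\partial u_h(z_t)}.$$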

The above three steps repeat for all batches, which 
leads to a training epoch. The learning algorithm runs 
iteratively until a termination condition is met (typically 
based on a cross-validation procedure [28.12]). 
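The three steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sigmoid transfer function, the squared-error cost, examples stored as rows of `X`, and Gaussian corruption for the DAE case (`noise_sigma > 0`); all function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def reconstruct(X, W, b_h, b_o):
    H = sigmoid(X @ W.T + b_h)           # encoder, one example per row
    return sigmoid(H @ W + b_o)          # decoder with tied weights W^T

def cost(X, W, b_h, b_o):
    return np.sum((X - reconstruct(X, W, b_h, b_o)) ** 2) / (2.0 * X.shape[0])

def train_epoch(X, W, b_h, b_o, lr, batch_size, rng, noise_sigma=0.0):
    """One epoch of mini-batch gradient descent; parameters are updated in place."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        Xb = X[order[start:start + batch_size]]              # clean targets x_t
        Zb = Xb + rng.normal(0.0, noise_sigma, size=Xb.shape)  # z_t (= x_t if sigma is 0)
        # Forward computation
        H = sigmoid(Zb @ W.T + b_h)
        X_hat = sigmoid(H @ W + b_o)
        # Backward gradient computation (sigmoid: f'(u) = f(u)(1 - f(u)))
        d_o = (X_hat - Xb) * X_hat * (1.0 - X_hat)           # dL/du_o
        d_h = (d_o @ W.T) * H * (1.0 - H)                    # dL/du_h
        # Parameter update; both uses of the tied W contribute to its gradient
        Tb = Xb.shape[0]
        W -= lr / Tb * (d_h.T @ Zb + H.T @ d_o)
        b_o -= lr / Tb * d_o.sum(axis=0)
        b_h -= lr / Tb * d_h.sum(axis=0)

rng = np.random.default_rng(0)
X = np.eye(4)                                                # four one-hot inputs
W, b_h, b_o = rng.normal(0.0, 0.1, size=(2, 4)), np.zeros(2), np.zeros(4)
cost_before = cost(X, W, b_h, b_o)
for _ in range(300):
    train_epoch(X, W, b_h, b_o, lr=0.5, batch_size=4, rng=rng)
cost_after = cost(X, W, b_h, b_o)
```

In practice the fixed number of epochs used here would be replaced by a validation-based termination condition.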


The Restricted Boltzmann Machine 
Strictly speaking, the restricted Boltzmann machine (RBM) [28.42]
is an energy-based generative model, a simplified version of the
generic Boltzmann machine. As illustrated in Fig. 28.4, an RBM can
be viewed as a probabilistic NN of two layers, i.e., visible and
hidden layers, with bi-directional connections. Unlike the
Boltzmann machine, there are no lateral connections among neurons
in the same layer in an RBM. With the bottom-up connections from
the visible to the hidden layer, an RBM forms an encoder that
yields a probabilistic representation $h = (h_m)_{m=1}^{M}$ for
input data $v = (v_n)_{n=1}^{N}$

$$P(h|v) = \prod_{m=1}^{M} P(h_m|v), \qquad P(h_m|v) = \phi\Big( \sum_{n=1}^{N} W_{mn} v_n + b_{h,m} \Big), \tag{28.2}$$

where $W_{mn}$ is the connection weight between the visible neuron
$n$ and the hidden neuron $m$, and $b_{h,m}$ is the bias of the
hidden neuron $m$. $\phi(u) = 1/(1 + e^{-u})$ is the sigmoid
transfer function. As $h_m$ is assumed to take a binary value,
i.e., $h_m \in \{0, 1\}$, $P(h_m|v)$ is interpreted as the
probability of $h_m = 1$. Accordingly, an RBM acts as
a probabilistic decoder via the top-down connections from the
hidden to the visible layer to reconstruct an input with the
probability

$$P(v|h) = \prod_{n=1}^{N} P(v_n|h), \qquad P(v_n|h) = \phi\Big( \sum_{m=1}^{M} W_{nm} h_m + b_{v,n} \Big), \tag{28.3}$$

where $W_{nm}$ is the connection weight between the hidden neuron
$m$ and the visible neuron $n$, and $b_{v,n}$ is the bias of
visible neuron $n$. Like connection weights in auto-encoders,
bi-directional connection weights are tied, i.e.,
$W_{mn} = W_{nm}$, as shown in Fig. 28.4. By learning a parametric
model of the data distribution $P(v)$ derived from the joint
probability $P(v, h)$ for a given data set, an RBM yields
a probabilistic representation that tends to reconstruct any data
subject to $P(v)$.

Fig. 28.4 Restricted Boltzmann machine (RBM) architecture




The joint probability $P(v, h)$ is defined based on the following
energy function for $v_n \in \{0, 1\}$

$$E(v, h) = -\sum_{m=1}^{M} \sum_{n=1}^{N} W_{mn} h_m v_n - \sum_{m=1}^{M} h_m b_{h,m} - \sum_{n=1}^{N} v_n b_{v,n}. \tag{28.4}$$

As a result, the joint probability is subject to the Boltzmann
distribution

$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v} \sum_{h} e^{-E(v, h)}}. \tag{28.5}$$

Thus, we obtain the data probability by marginalizing the joint
probability as follows

$$P(v) = \sum_{h} P(v, h) = \sum_{h} P(v|h) P(h).$$

In order to achieve the most likely reconstruction, we need to
maximize the log-likelihood of $P(v)$. Therefore, the
reconstruction cost function of an RBM is its negative
log-likelihood function

$$L(W, b_h, b_v) = -\log P(v) = -\log \sum_{h} P(v|h) P(h). \tag{28.6}$$


From (28.5) and (28.6), it is observed that the direct use of
a gradient descent method for optimal parameters often leads to
intractable computation, because an exponential number of possible
hidden-layer configurations needs to be summed over in (28.5) and
then used in (28.6). Fortunately, an approximation algorithm named
contrastive divergence (CD) has been proposed to solve this
problem [28.42]. The key ideas behind the CD algorithm are
(i) using Gibbs sampling based on the conditional distributions in
(28.2) and (28.3), and (ii) running only a few iterations of Gibbs
sampling by treating the data $x$ input to an RBM as the initial
state, i.e., $v^0 = x$, of the Markov chain at the visible layer.
Many studies have suggested that using only one iteration of the
Markov chain in the CD algorithm works well for building up a deep
belief network (DBN) in practice [28.17, 18, 22], and hence the
algorithm is dubbed CD-1 in this situation, a special case of the
CD-k algorithm that executes k iterations of the Markov chain in
the Gibbs sampling process.
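To make the intractability concrete, the following sketch evaluates the negative log-likelihood of a tiny binary RBM exactly by enumerating every visible and hidden configuration. It is an illustration only; the function names are hypothetical, and the enumeration visits $2^{M+N}$ states, which is feasible only for very small $M$ and $N$.

```python
import itertools
import numpy as np

def energy(v, h, W, b_h, b_v):
    # E(v, h) = -h^T W v - b_h^T h - b_v^T v, with W of shape (M, N).
    return -(h @ W @ v) - b_h @ h - b_v @ v

def exact_neg_log_likelihood(v, W, b_h, b_v):
    M, N = W.shape
    hs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=M)]
    vs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=N)]
    # Partition function: a sum over all 2^(M+N) joint configurations.
    Z = sum(np.exp(-energy(vv, hh, W, b_h, b_v)) for vv in vs for hh in hs)
    # Marginal P(v): a sum over all 2^M hidden configurations.
    p_v = sum(np.exp(-energy(v, hh, W, b_h, b_v)) for hh in hs) / Z
    return -np.log(p_v)
```

Doubling the number of neurons squares the number of terms, which is why sampling-based approximations such as CD are used instead of this exact computation.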


Fig. 28.5 Gibbs sampling process in the CD-1 algorithm
(k = 0: data; k = 1: reconstruction)


Figure 28.5 illustrates the Gibbs sampling process used in the
CD-1 algorithm as follows:

i) Estimating probabilities $P(h_m^0|v^0)$, for
   $m = 1, \ldots, M$, with the encoder defined in (28.2) and then
   forming a realization of $h^0$ by sampling with these
   probabilities.

ii) Applying the decoder defined in (28.3) to estimate
   probabilities $P(v_n^1|h^0)$, for $n = 1, \ldots, N$, and then
   producing a reconstruction $v^1$ via sampling.

iii) With the reconstruction, estimating probabilities
   $P(h_m^1|v^1)$, for $m = 1, \ldots, M$, with the encoder.

With the above Gibbs sampling procedure, the CD-1 algorithm is
summarized as follows:


Algorithm 28.1 RBM CD-1 Learning Algorithm
Given a training set of $T$ instances, $\{x_t\}_{t=1}^{T}$,
randomly initialize all parameters, $W$, $b_h$ and $b_v$, in an
RBM and pre-set a learning rate $\epsilon$:

• Positive phase
  - Present an instance to the visible layer, i.e., $v^0 = x_t$.
  - Estimate probabilities with the encoder:
    $P(h^0|v^0) = (P(h_m^0|v^0))_{m=1}^{M}$ by using (28.2).

• Negative phase
  - Form a realization of $h^0$ by sampling with probabilities
    $P(h^0|v^0)$.
  - With the realization of $h^0$, apply the decoder to estimate
    probabilities: $P(v^1|h^0) = (P(v_n^1|h^0))_{n=1}^{N}$ by using
    (28.3), and then produce a reconstruction $v^1$ via sampling
    based on $P(v^1|h^0)$.
  - With the encoder and the reconstruction, estimate
    probabilities: $P(h^1|v^1) = (P(h_m^1|v^1))_{m=1}^{M}$ by
    using (28.2).

• Parameter update
  Based on Gibbs sampling results in the positive and the negative
  phases, parameters are updated as follows:

  $$W \leftarrow W + \epsilon \big( P(h^0|v^0) (v^0)^T - P(h^1|v^1) (v^1)^T \big),$$

  $$b_h \leftarrow b_h + \epsilon \big( P(h^0|v^0) - P(h^1|v^1) \big),$$

  and

  $$b_v \leftarrow b_v + \epsilon (v^0 - v^1).$$

The above three steps repeat for all instances in the given
training set, which leads to a training epoch. The learning
algorithm runs iteratively until it converges.
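A single CD-1 update can be sketched as below. This is an illustrative reading of the positive phase, negative phase and parameter update for one binary instance; the function name `cd1_update`, the weight shape `(M, N)`, and the in-place updates are assumptions of this sketch.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def cd1_update(x, W, b_h, b_v, lr, rng):
    """One CD-1 step for a binary instance x; W, b_h, b_v are updated in place."""
    # Positive phase: clamp the data and estimate P(h^0 | v^0).
    v0 = x
    p_h0 = sigmoid(W @ v0 + b_h)
    # Negative phase: sample h^0, reconstruct v^1, then estimate P(h^1 | v^1).
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(W.T @ h0 + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(W @ v1 + b_h)
    # Parameter update: difference between data and reconstruction statistics.
    W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, v1))
    b_h += lr * (p_h0 - p_h1)
    b_v += lr * (v0 - v1)
    return v1

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0, 0.0, 0.0])
W, b_h, b_v = rng.normal(0.0, 0.01, size=(3, 4)), np.zeros(3), np.zeros(4)
for _ in range(200):
    v1 = cd1_update(x, W, b_h, b_v, lr=0.1, rng=rng)
```

Because the visible bias only ever moves by $\epsilon (v^0 - v^1)$, repeated presentation of the same pattern pushes each visible bias toward the sign that favors reproducing that pattern.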


Predictive Sparse Decomposition 
Predictive sparse decomposition (PSD) [28.43] is a building block
obtained by combining sparse coding [28.44] and auto-encoder
ideas. In a PSD building block, the encoder is specified by

$$h(x_t) = G \tanh(W_E x_t + b_h), \tag{28.7}$$

where $W_E$ is the $M \times N$ connection matrix between input
and hidden neurons in the encoder, $G = \mathrm{diag}(g_{mm})$ is
an $M \times M$ learnable diagonal gain matrix for an
$M$-dimensional representation of an $N$-dimensional input $x_t$,
$b_h$ is the bias vector of the hidden neurons, and
$\tanh(\cdot)$ is the hyperbolic tangent transfer function [28.9].
Accordingly, the decoder is implemented by the linear mapping used
in sparse coding [28.44]

$$\hat{x}_t = W_D h(x_t), \tag{28.8}$$

where $W_D$ is an $N \times M$ connection matrix between hidden
and output neurons in the decoder, and each column of $W_D$ always
needs to be normalized to a unit vector to avoid trivial
solutions [28.44].

Given a training set of $T$ instances, $\{x_t\}_{t=1}^{T}$, the
PSD cost function is defined as

$$L_{PSD}(G, W_E, W_D, b_h; h^*(x_t)) = \sum_{t=1}^{T} \Big[ \|W_D h^*(x_t) - x_t\|_2^2 + \alpha \|h^*(x_t)\|_1 + \beta \|h^*(x_t) - h(x_t)\|_2^2 \Big], \tag{28.9}$$

where $h^*(x_t)$ is the optimal sparse hidden representation of
$x_t$, while $h(x_t)$ is the output of the encoder in (28.7) based
on the current parameter values. In (28.9), $\alpha$ and $\beta$
are two hyper-parameters that control regularization strengths,
and $\|\cdot\|_1$ and $\|\cdot\|_2$ are the $\ell_1$ and $\ell_2$
norms, respectively. Intuitively, in the multi-objective cost
function defined in (28.9), the first term specifies
reconstruction errors, the second term penalizes the magnitude of
non-sparse representations, and the last term drives the encoder
towards yielding the optimal representation.

For learning a PSD building block, the cost function 
in (28.9) needs to be optimized simultaneously with 
respect to the hidden representation and all the param- 
eters. As a result, a learning algorithm of two alternate 
steps has been proposed to solve this problem [28.43] 
as follows: 


Algorithm 28.2 PSD Learning Algorithm

Given a training set of $T$ instances $\{x_t\}_{t=1}^{T}$,
randomly initialize all the parameters, $W_E$, $W_D$, $G$, $b_h$,
and the optimal sparse representation $\{h^*(x_t)\}_{t=1}^{T}$ in
a PSD building block and pre-set hyper-parameters $\alpha$ and
$\beta$ as well as learning rates $\epsilon_i$
($i = 1, \ldots, 4$):

• Optimal representation update
  In this step, the gradient descent method is applied to find the
  optimal sparse representation based on the current parameter
  values of the encoder and the decoder, which leads to the
  following update rule

  $$h^*(x_t) \leftarrow h^*(x_t) - \epsilon_1 \big[ \alpha\, \mathrm{sign}(h^*(x_t)) + \beta (h^*(x_t) - h(x_t)) + (W_D)^T (W_D h^*(x_t) - x_t) \big],$$

  where $\mathrm{sign}(\cdot)$ is the sign function;
  $\mathrm{sign}(u) = +1$ if $u > 0$, $\mathrm{sign}(u) = -1$ if
  $u < 0$, and $\mathrm{sign}(u) = 0$ if $u = 0$.

• Parameter update
  In this step, $h^*(x_t)$ obtained in the above step is fixed.
  Then the gradient descent method is applied to the cost function
  (28.9) with respect to all encoder and decoder parameters, which
  results in the following update rules

  $$W_E \leftarrow W_E - \epsilon_2\, g(x_t) (x_t)^T,$$

  $$b_h \leftarrow b_h - \epsilon_3\, g(x_t).$$

  Here $g(x_t)$ is obtained by

  $$g(x_t) = \Big[ g_{mm}^{-1} \big( h_m(x_t) - h_m^*(x_t) \big) \big( g_{mm}^2 - h_m^2(x_t) \big) \Big]_{m=1}^{M},$$

  $$G \leftarrow G - \epsilon_3\, \mathrm{diag}\Big[ \big( h_m(x_t) - h_m^*(x_t) \big) \tanh\Big( \sum_{n=1}^{N} [W_E]_{mn} x_{tn} + b_{h,m} \Big) \Big],$$

  and

  $$W_D \leftarrow W_D - \epsilon_4 \big[ W_D h^*(x_t) - x_t \big] \big[ h^*(x_t) \big]^T.$$

  Normalize each column of $W_D$ such that
  $\|[W_D]_m\|_2 = 1$ for $m = 1, \ldots, M$.

The above two steps repeat for all the instances in the given
training set, which leads to a training epoch. The learning
algorithm runs iteratively until it converges.
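One pass of the two alternating steps for a single instance can be sketched as follows. This is a hedged illustration rather than the published procedure: the gradient with respect to the encoder pre-activation is written in its equivalent form $g_{mm}(1 - \tanh^2(\cdot))$, the factor $\beta$ is kept explicit in the encoder gradients, constant factors of 2 are absorbed into the learning rates, and the assignment of learning rates to the individual updates (in particular for the gains) is an assumption. All names are hypothetical.

```python
import numpy as np

def psd_step(x, h_star, W_E, W_D, g, b_h, alpha, beta, lrs):
    """One alternating PSD update for a single instance x.
    g holds the diagonal of the gain matrix G; lrs = (e1, e2, e3, e4)."""
    e1, e2, e3, e4 = lrs
    t = np.tanh(W_E @ x + b_h)
    h = g * t                                   # encoder output, as in (28.7)
    # Step 1: optimal representation update (encoder and decoder fixed).
    h_star = h_star - e1 * (alpha * np.sign(h_star)
                            + beta * (h_star - h)
                            + W_D.T @ (W_D @ h_star - x))
    # Step 2: parameter update (h_star fixed).
    err = beta * (h - h_star)                   # gradient w.r.t. encoder output
    g_vec = err * g * (1.0 - t ** 2)            # gradient w.r.t. pre-activation
    W_E = W_E - e2 * np.outer(g_vec, x)
    b_h = b_h - e3 * g_vec
    g = g - e3 * err * t                        # gain update (assumed rate e3)
    W_D = W_D - e4 * np.outer(W_D @ h_star - x, h_star)
    # Renormalize every decoder column to unit length.
    norms = np.maximum(np.linalg.norm(W_D, axis=0, keepdims=True), 1e-12)
    W_D = W_D / norms
    return h_star, W_E, W_D, g, b_h

rng = np.random.default_rng(0)
N, M = 6, 4
x = rng.normal(size=N)
W_E = rng.normal(0.0, 0.1, size=(M, N))
W_D = rng.normal(0.0, 0.1, size=(N, M))
W_D /= np.linalg.norm(W_D, axis=0, keepdims=True)
g, b_h, h_star = np.ones(M), np.zeros(M), np.zeros(M)
for _ in range(50):
    h_star, W_E, W_D, g, b_h = psd_step(x, h_star, W_E, W_D, g, b_h,
                                        alpha=0.1, beta=1.0,
                                        lrs=(0.05, 0.05, 0.05, 0.05))
```

The column normalization at the end of every step is what prevents the trivial solution in which the decoder weights grow while the sparse code shrinks toward zero.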


Other Building Blocks 
While the auto-encoders and the RBM are building 
blocks widely used to construct DNNs, there are other 
building blocks that are either derived from existing 
building blocks for performance improvement or are 
developed with an alternative principle. Such build- 
ing blocks include regularized auto-encoders and RBM 
variants. Due to the limited space, we briefly overview 
them below. 

Recently, a number of auto-encoder variants have been developed by
adding a regularization term to the standard reconstruction cost
function in (28.1) and hence are dubbed regularized auto-encoders.
The contractive auto-encoder (CAE) is a typical regularized
version of the auto-encoder that introduces the norm of the
Jacobian matrix of the encoder, evaluated at each training example
$x_t$, into the standard reconstruction cost function [28.45]

$$L_{CAE}(W, b_h, b_o) = L(W, b_h, b_o) + \alpha \sum_{t=1}^{T} \|J(x_t)\|_F^2, \tag{28.10}$$

where $\alpha$ is a trade-off parameter to control the
regularization strength and $\|J(x_t)\|_F^2$ is the squared
Frobenius norm of the Jacobian matrix of the encoder, calculated
as follows

$$\|J(x_t)\|_F^2 = \sum_{m=1}^{M} \sum_{n=1}^{N} \big( f'(u_{h,m}(x_t))\, W_{mn} \big)^2.$$

Here, $f'(\cdot)$ is the first-order derivative of the transfer
function $f(\cdot)$, and
$f'[u_{h,m}(x_t)] = h_m(x_t)[1 - h_m(x_t)]$ when $f(\cdot)$ is the
sigmoid function [28.9]. It is straightforward to apply the
stochastic gradient method [28.12] to the CAE cost function in
(28.10) to derive a learning algorithm for training a CAE.
Furthermore, an improved version of the CAE was also proposed by
penalizing additional higher order derivatives [28.46]. The sparse
auto-encoder (SAE) is another class of regularized auto-encoders.
The basic idea underlying SAEs is the introduction of a sparse
regularization term, working on either hidden neuron biases,
e.g., [28.47], or their outputs, e.g., [28.48], into the standard
reconstruction cost function. Different forms of sparsity
penalties, e.g., the $\ell_1$ norm and Student-t, are employed for
regularization, and the learning algorithm is derived by applying
the coordinate descent optimization procedure to the new
reconstruction cost function [28.47, 48].
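For a sigmoid encoder, the Jacobian penalty of the CAE has a closed form that avoids building the Jacobian explicitly, since $J = \mathrm{diag}(h(1-h))\,W$. The sketch below illustrates this for a single example; the function names are hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def encoder(x, W, b_h):
    return sigmoid(W @ x + b_h)

def contractive_penalty(x, W, b_h):
    """Squared Frobenius norm of the encoder Jacobian at x (sigmoid encoder)."""
    h = encoder(x, W, b_h)
    # ||J||_F^2 = sum_m (h_m (1 - h_m))^2 * sum_n W_mn^2
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))
```

The penalty is small wherever the hidden units saturate or the weights are small, which is how it encourages representations that are insensitive to small input perturbations.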

The RBM described above works only for an input of binary values.
When an input has real values, a variant named the Gaussian RBM
(GRBM) [28.49] has been proposed with the following energy
function

$$E(v, h) = -\sum_{m=1}^{M} \sum_{n=1}^{N} W_{mn} h_m \frac{v_n}{\sigma_n} - \sum_{m=1}^{M} h_m b_{h,m} + \sum_{n=1}^{N} \frac{(v_n - b_{v,n})^2}{2\sigma_n^2}, \tag{28.11}$$

where $\sigma_n$ is the standard deviation of the Gaussian noise
for the visible neuron $n$. In the CD learning algorithm, the
update rule for the hidden neurons remains the same except that
each $v_n$ is substituted by $v_n/\sigma_n$, and the update rule
for all visible neurons needs to use reconstructions $v_n$
produced by sampling from a Gaussian distribution with mean
$\sigma_n \sum_{m=1}^{M} W_{nm} h_m + b_{v,n}$ and variance
$\sigma_n^2$ for $n = 1, \ldots, N$. In addition, an improved GRBM
was also proposed by introducing an alternative parameterization
of the energy function in (28.11) and incorporating it into the CD
algorithm [28.50]. Other RBM variants will be discussed later on,
as they often play a role different from that of constructing
a DNN.
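The two changes that the GRBM makes to the Gibbs sampling step can be sketched as follows. This is a minimal illustration under the description above (scaled inputs for the hidden units, Gaussian reconstructions for the visible units); the function names and the weight shape `(M, N)` are assumptions.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grbm_hidden_prob(v, W, b_h, sigma):
    # Same form as the binary RBM encoder, with each v_n replaced by v_n / sigma_n.
    return sigmoid(W @ (v / sigma) + b_h)

def grbm_sample_visible(h, W, b_v, sigma, rng):
    # Gaussian reconstruction with mean sigma_n * sum_m W_mn h_m + b_{v,n}
    # and standard deviation sigma_n.
    mean = sigma * (W.T @ h) + b_v
    return rng.normal(mean, sigma)
```

In practice the inputs are often standardized so that every $\sigma_n$ can be fixed to 1, which reduces these functions to the binary case apart from the Gaussian sampling.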


28.2.3 Hybrid Learning Strategy 


Based on the building blocks described in Sect. 28.2.2, we
describe a systematic approach to establishing a feed-forward DNN
for supervised and semi-supervised learning. This approach employs
a hybrid learning strategy that combines unsupervised and
supervised learning paradigms to overcome the optimization
difficulty in training DNNs. The hybrid learning strategy
[28.18, 40] first applies layer-wise greedy unsupervised learning
to set up a DNN and initialize its parameters with input data
only, and then uses a global supervised learning algorithm with
the teacher's (label) information to train all the parameters in
the initialized DNN for a given task.


Layer-Wise Unsupervised Learning 
In the hybrid learning strategy, unsupervised learning 
is a layer-wise greedy learning process that constructs 
a DNN with a chosen building block and initializes pa- 
rameters in a layer-by-layer way. 

Suppose we want to establish a DNN of $K$ ($K > 1$) hidden layers
and denote the output of hidden layer $k$ as $h_k(x)$
($k = 1, \ldots, K$) for a given input $x$ and the output of the
output layer as $o(x)$. To facilitate the presentation, we
stipulate $h_0(x) = x$. Then, the generic layer-wise greedy
learning procedure can be summarized as follows:


Algorithm 28.3 Layer-wise greedy learning procedure

Given a training set of $T$ instances $\{x_t\}_{t=1}^{T}$,
randomly initialize the parameters of a chosen building block and
pre-set all hyper-parameters required for learning such a building
block:

• Train a building block for hidden layer $k$
  - Set the number of neurons required by hidden layer $k$ to be
    the dimension of the hidden representation in the chosen
    building block.
  - Use the training data set $\{h_{k-1}(x_t)\}_{t=1}^{T}$ to
    train the building block to obtain its optimal parameters.

• Construct a DNN up to hidden layer $k$
  With the trained building block in the above step, discard its
  decoder part, including all associated parameters, and stack its
  hidden layer on the existing DNN with the connection weights of
  the encoder and the biases of hidden neurons obtained in the
  above step (the input layer $h_0(x) = x$ is viewed as the
  starting architecture of a DNN).

The above steps are repeated for $k = 1, \ldots, K$. Then, the
output layer $o(x)$ is stacked onto hidden layer $K$ with randomly
initialized connection weights so as to finalize the initial DNN
construction and its parameter initialization.
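The procedure can be sketched as a loop that trains one building block per hidden layer and keeps only its encoder. In this illustration the building block is a tied-weight sigmoid auto-encoder fitted by full-batch gradient descent on the squared-error cost; the function names `train_block` and `greedy_pretrain`, the fixed number of epochs, and the initialization scale are all assumptions of the sketch.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_block(H_prev, M, rng, epochs=50, lr=0.5):
    """Fit a tied-weight auto-encoder on H_prev (rows are examples);
    return only the encoder parameters (W, b_h)."""
    T, N = H_prev.shape
    W, b_h, b_o = rng.normal(0.0, 0.1, size=(M, N)), np.zeros(M), np.zeros(N)
    for _ in range(epochs):
        H = sigmoid(H_prev @ W.T + b_h)
        R = sigmoid(H @ W + b_o)
        d_o = (R - H_prev) * R * (1.0 - R)
        d_h = (d_o @ W.T) * H * (1.0 - H)
        W -= lr / T * (d_h.T @ H_prev + H.T @ d_o)
        b_h -= lr / T * d_h.sum(axis=0)
        b_o -= lr / T * d_o.sum(axis=0)
    return W, b_h                       # the decoder bias b_o is discarded

def greedy_pretrain(X, hidden_sizes, n_outputs, seed=0):
    rng = np.random.default_rng(seed)
    layers, H = [], X                   # h_0(x) = x
    for M in hidden_sizes:              # k = 1, ..., K
        W, b = train_block(H, M, rng)
        layers.append((W, b))           # stack the encoder on the existing DNN
        H = sigmoid(H @ W.T + b)        # training data for the next block
    # Output layer with randomly initialized connection weights.
    layers.append((rng.normal(0.0, 0.1, size=(n_outputs, H.shape[1])),
                   np.zeros(n_outputs)))
    return layers
```

The returned list of `(W, b)` pairs is the initialized DNN that a subsequent supervised stage would fine-tune as a whole.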


Figure 28.6 illustrates two typical instances of constructing an
initial DNN via the layer-wise greedy learning procedure described
above. Figure 28.6a shows a schematic diagram of the layer-wise
greedy learning process with the auto-encoder or its variants; to
construct hidden layer $k$, the output layer of the trained
building block and its associated parameters $W_k^T$ and $b_o$ are
removed and the remaining part is stacked onto hidden layer
$k-1$, and $W_o$ is a randomly initialized weight matrix for the
connection between hidden layer $K$ and the output layer. When
a DNN is constructed with the RBM or its variants, all backward
connection weights in the decoder are abandoned after training and
only the hidden layer with the forward connection weights and the
biases of hidden neurons is used to construct the DNN, as depicted
in Fig. 28.6b.

Fig. 28.6a,b Construction of a DNN with a building block via
layer-wise greedy learning. (a) Auto-encoder or its variants.
(b) RBM or its variants




Global Supervised Learning 
Once a DNN is constructed and initialized based on the layer-wise
greedy learning procedure, it is ready to be further trained in
a supervised fashion for a classification or regression task.
There are a variety of optimization methods for supervised
learning, e.g., stochastic gradient descent and the second-order
Levenberg-Marquardt methods [28.9, 12]. There are also cost
functions of different forms used for various supervised learning
tasks and for regularization towards improving the generalization
of a DNN. Due to the limited space, we only review the stochastic
gradient descent algorithm with a generic cost function for global
supervised learning.

For a generic cost function $L(\Theta, D)$, where $\Theta$ is
a collective notation of all parameters in a DNN (Fig. 28.6) and
$D$ is a training data set for a given supervised learning task,
applying the stochastic gradient descent method [28.9, 12] to
$L(\Theta, D)$ leads to the following learning algorithm for
fine-tuning parameters.


Algorithm 28.4 Global supervised learning algorithm

Given a training set of $T$ examples
$D = \{(x_t, y_t)\}_{t=1}^{T}$, pre-set a learning rate
$\epsilon$ (and other hyper-parameters if required). Furthermore,
the training set is randomly divided into many mini-batches of
$T_B$ examples $\{(x_t, y_t)\}_{t=1}^{T_B}$, and then parameters
are updated based on each mini-batch.
$\Theta = (\{W_k\}_{k=1}^{K+1}, \{b_k\}_{k=1}^{K+1})$ are all
parameters in a DNN, where $W_k$ is the weight matrix for the
connection between the hidden layers $k$ and $k-1$, and $b_k$ is
the bias vector of neurons in layer $k$ (Fig. 28.6). Here, input
and output layers are stipulated as layers 0 and $K+1$,
respectively, i.e., $h_0(x_t) = x_t$, $W_o = W_{K+1}$,
$b_o = b_{K+1}$ and $o(x_t) = h_{K+1}(x_t)$:

• Forward computation
  Given the input $x_t$, for $k = 1, \ldots, K+1$, the output of
  layer $k$ is

  $$h_k(x_t) = f(u_k(x_t)), \qquad u_k(x_t) = W_k h_{k-1}(x_t) + b_k.$$

• Backward gradient computation
  Given a cost function on each mini-batch, $L_B(\Theta, D)$,
  calculate gradients at the output layer, i.e.,

  $$\frac{\partial L_B(\Theta, D)}{\partial h_{K+1}(x_t)} = \frac{\partial L_B(\Theta, D)}{\partial o(x_t)},$$

  $$\frac{\partial L_B(\Theta, D)}{\partial u_{K+1}(x_t)} = \Big( \frac{\partial L_B(\Theta, D)}{\partial o_j(x_t)} f'(u_{(K+1)j}(x_t)) \Big)_{j=1}^{|o|},$$

  where $f'(\cdot)$ is the first-order derivative of the transfer
  function $f(\cdot)$.
  For all hidden layers, i.e., $k = K, \ldots, 1$, applying the
  chain rule leads to

  $$\frac{\partial L_B(\Theta, D)}{\partial u_k(x_t)} = \Big( \frac{\partial L_B(\Theta, D)}{\partial h_{kj}(x_t)} f'(u_{kj}(x_t)) \Big)_{j=1}^{|h_k|},$$

  and

  $$\frac{\partial L_B(\Theta, D)}{\partial h_k(x_t)} = (W_{k+1})^T \frac{\partial L_B(\Theta, D)}{\partial u_{k+1}(x_t)}.$$

  For $k = K+1, \ldots, 1$, gradients with respect to the biases
  of layer $k$ are

  $$\frac{\partial L_B(\Theta, D)}{\partial b_k} = \frac{\partial L_B(\Theta, D)}{\partial u_k(x_t)}.$$

• Parameter update
  Applying the gradient descent method results in the following
  update rules: for $k = K+1, \ldots, 1$,

  $$W_k \leftarrow W_k - \frac{\epsilon}{T_B} \sum_{t=1}^{T_B} \frac{\partial L_B(\Theta, D)}{\partial u_k(x_t)} \big( h_{k-1}(x_t) \big)^T,$$

  $$b_k \leftarrow b_k - \frac{\epsilon}{T_B} \sum_{t=1}^{T_B} \frac{\partial L_B(\Theta, D)}{\partial u_k(x_t)}.$$

The above three steps repeat for all mini-batches, which leads to
a training epoch. The learning algorithm runs iteratively until
a termination condition is met (typically based on
a cross-validation procedure [28.12]).
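The forward, backward and update steps can be sketched for sigmoid layers and a squared-error cost as follows. This is a minimal illustration, not a definitive implementation: the per-example gradient is taken with respect to $\frac{1}{2}\|o(x_t) - y_t\|_2^2$ and averaged over the mini-batch in the update, examples are stored as rows, and all names are hypothetical. In the hybrid strategy, `layers` would come from layer-wise pre-training rather than from the random initialization used here.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(X, layers):
    hs = [X]                                   # h_0(x) = x
    for W, b in layers:                        # k = 1, ..., K+1
        hs.append(sigmoid(hs[-1] @ W.T + b))
    return hs

def mse(X, Y, layers):
    return np.sum((forward(X, layers)[-1] - Y) ** 2) / (2.0 * X.shape[0])

def sgd_step(Xb, Yb, layers, lr):
    """One mini-batch update; the arrays in `layers` are modified in place."""
    hs = forward(Xb, layers)
    Tb = Xb.shape[0]
    O = hs[-1]
    delta = (O - Yb) * O * (1.0 - O)           # gradient w.r.t. u_{K+1}
    for k in range(len(layers) - 1, -1, -1):   # from the output layer down
        W, b = layers[k]
        grad_W = delta.T @ hs[k]               # sum_t dL/du * h_{k-1}^T
        grad_b = delta.sum(axis=0)
        if k > 0:
            # Chain rule to the layer below, using W before it is updated.
            delta = (delta @ W) * hs[k] * (1.0 - hs[k])
        W -= lr / Tb * grad_W
        b -= lr / Tb * grad_b

rng = np.random.default_rng(0)
X = rng.random((20, 3))
Y = (X[:, :1] > 0.5).astype(float)             # toy binary target
layers = [(rng.normal(0.0, 0.5, size=(4, 3)), np.zeros(4)),
          (rng.normal(0.0, 0.5, size=(1, 4)), np.zeros(1))]
loss_before = mse(X, Y, layers)
for _ in range(200):                           # training epochs
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], 5):      # mini-batches of T_B = 5
        idx = order[start:start + 5]
        sgd_step(X[idx], Y[idx], layers, lr=0.5)
loss_after = mse(X, Y, layers)
```

A validation-based stopping rule would normally replace the fixed epoch count.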


For the above learning algorithm, the BP algorithm [28.11] is
a special case when the transfer function is the sigmoid function,
i.e., $f(u) = \phi(u)$, and the cost function is the mean square
error (MSE) function, i.e., for each mini-batch

$$L_B(\Theta, D) = \frac{1}{2T_B} \sum_{t=1}^{T_B} \|o(x_t) - y_t\|_2^2.$$

Thus, we have

$$\phi'(u) = \phi(u)\big(1 - \phi(u)\big),$$

$$\frac{\partial L_B(\Theta, D)}{\partial o(x_t)} = \frac{1}{T_B} \big( o(x_t) - y_t \big),$$

and

$$\frac{\partial L_B(\Theta, D)}{\partial u_{K+1}(x_t)} = \frac{1}{T_B} \Big( \big( o_j(x_t) - y_{tj} \big)\, o_j(x_t) \big( 1 - o_j(x_t) \big) \Big)_{j=1}^{|o|}.$$


28.2.4 Relevant Issues 


In the literature, the hybrid learning strategy described 
in Sect. 28.2.3 is often called the semi-supervised learn- 
ing strategy [28.17, 39]. Nevertheless, semi-supervised 
learning implies the situation that there are few labeled 
examples but many unlabeled instances in a training set. 
Indeed, such a strategy works well in a situation where 
both unlabeled and labeled data in a training set are 
used for layer-wise greedy learning, and only labeled 
data are used for fine-tuning in global supervised learn- 
ing. However, other studies, e.g., [28.28, 29,51], also 
show that this strategy can considerably improve the 
generalization of a DNN even though there are abun- 
dant labeled examples in a training data set. Hence we 
would rather name it hybrid learning. On the other hand, our
review focuses only on primary supervised learning tasks in the
context of NC. In a wider context, the unsupervised learning
process itself constitutes a novel approach to automatic feature
discovery/extraction via learning, which is an emerging ML area
named representation learning [28.39]. In such a context, some
DNNs can act as generative models. For instance, the DBN [28.18]
is an RBM-based DNN that retains both forward and backward
connections during layer-wise greedy learning. To be a generative
model, the DBN needs an alternative learning algorithm, e.g., the
wake-sleep algorithm [28.18], for global unsupervised learning. In
general, global unsupervised learning for a generative DNN is
still a challenging problem.
While the hybrid learning strategy has been success- 
fully applied to many complex AI tasks, in general, it is 
still not entirely clear why such a strategy works well 
empirically. A recent study attempted to provide some 
justification of the role played by layer-wise greedy 
learning for supervised learning [28.51]. The findings 
can be summarized as follows: such an unsupervised 
learning process brings about a regularization effect that 
initializes DNN parameters towards the basin of attrac- 
tion corresponding to a good local minimum, which 
facilitates global supervised learning in terms of gener- 
alization [28.51]. In general, a deeper understanding of 
such a learning strategy will be required in the future. 


On the other hand, a success story was recently reported [28.52]
where no unsupervised pre-training was used in DNN learning for
a non-trivial task, which poses another open problem as to when
and where such a learning strategy must be employed for training
a DNN to yield a satisfactory generalization performance.

Recent studies also suggest that the use of artificially distorted
or deformed training data and unsupervised front-ends can
considerably improve the performance of DNNs regardless of the
hybrid learning strategy. As DNN learning is data-driven in
nature, augmenting training data with known input deformations
amounts to using more representative examples that convey the
intrinsic variations underlying a class of data in learning. For
example, speech corrupted by
some known channel noise and deformed images by 
using affine transformation and adding noise have sig- 
nificantly improved the DNN performance in various 
speech and visual information processing tasks [28.22, 
28, 29,51,52]. On the other hand, the generic build- 
ing blocks reviewed in Sect. 28.2.2 can be extended 
to be specialist front-ends by exploiting intrinsic data 
structures. For instance, the RBM has several vari- 
ants, e.g., [28.53-55], to capture covariance and other 
statistical information underlying an image. After un- 
supervised learning, such front-ends generate power- 
ful representations that greatly facilitate further DNN 
learning in visual information processing. 

While our review focuses on only fully connected 
feed-forward DNNs, there are alternative and more 
effective DNN architectures for specific tasks. Convo- 
lutional DNNs [28.12] make use of topological locality 
constraints underlying images to form more effective 
locally connected DNN architecture. Furthermore, var- 
ious pooling techniques [28.56] used in convolutional 
DNNs facilitate learning invariant and robust features. 
With appropriate building blocks, e.g., the PSD re- 
viewed in Sect. 28.2.2, convolutional DNNs work very 
well with the hybrid learning strategy [28.43,57]. In 
addition, novel DNN architectures need to be devel- 
oped by exploring the nature of a specific problem, e.g., 
a regularized Siamese DNN was recently developed for 
generic speaker-specific information extraction [28.28, 
29]. As a result, novel DNN architecture development 
and model selection are among important DNN re- 
search topics. 

Finally, theoretical justification of deep learning and the hybrid
learning strategy, along with other recently developed techniques,
e.g., parallel graphics processing unit (GPU) computing, enable
researchers to develop large-scale DNNs to tackle very complex
real-world problems. While some theoretical justification has been
provided in the literature, e.g., [28.16, 17, 39], to show the
strengths of DNNs in terms of their potential capacity and
efficient representational schemes, more and more successful
applications of DNNs, particularly those working with the hybrid
learning strategy, lend evidence to support the argument that DNNs
are one of the most promising learning systems for dealing with
complex and large-scale real-world problems. For example, such
evidence can be found in one of the latest developments in a DNN
application to computer vision, where it is demonstrated that
applying a DNN of nine layers constructed with the SAE building
block via layer-wise greedy learning results in favorable
performance in object recognition of over 22 000
categories [28.58].


28.3 Modular Neural Networks


In this section, we review the main modular neural networks
(MNNs) and their learning algorithms with our own taxonomy. We
first review the background and motivation for MNN research and
present our MNN taxonomy. Then we describe major MNN architectures
and relevant learning algorithms. Finally, we examine relevant
issues related to MNNs in a broader context.


28.3.1 Background and Motivation


Soon after neural network (NN) research resurged in the middle of
the 1980s, MNN studies emerged; they have become an important area
in NC since then. There are a variety of motivations that inspire
MNN research, e.g., biological, psychological, computational, and
implementation motivations [28.4, 5, 9]. Here, we only describe
the background and motivation of MNN research from learning and
computational perspectives.

From the learning perspective, MNNs have several advantages over
monolithic NNs. First of all, MNNs adopt an alternative
methodology for learning, so that complex problems can be solved
based on an ensemble of simple NNs, which might avoid or alleviate
the complex optimization problems encountered in monolithic NN
learning without decreasing the learning capacity. Next,
modularity enables MNNs to use a priori knowledge flexibly and
facilitates knowledge integration and update in learning. To
a great extent, MNNs are immune to temporal and spatial
cross-talk, a problem faced by monolithic NNs during
learning [28.9]. Finally, theoretical justification and abundant
empirical evidence show that an MNN often yields a better
generalization than its component networks [28.5, 59]. From the
computational perspective, modularization in MNNs leads to more
efficient and robust computation, given the fact that MNNs often
do not suffer from the high coupling burden of a monolithic NN and
hence tend to have a lower overall structural complexity in
tackling the same problem [28.5]. This main computational merit
makes MNNs scalable and extensible to large-scale MNN
implementation.

There are two highly influential principles that are
often used in artificial MNN development, i.e.,
divide-and-conquer and diversity-promotion. The
divide-and-conquer principle refers to a generic methodology that
tackles a complex and difficult problem by dividing it
into several relatively simple and easy subproblems,
whose solutions can be combined seamlessly to yield
a final solution. On the other hand, theoretical
justification [28.60, 61] and abundant empirical studies [28.62]
suggest that, apart from the condition that component
networks need to reach a certain accuracy, the success
of MNNs is largely attributed to diversity among
them. Hence, the promotion of diversity in MNNs becomes
critical in their design and development. To
understand motivations and ideas underlying different
MNNs, we believe that it is crucial to examine how these two
principles are applied in their development.

There are different taxonomies of MNNs [28.4, 5, 
9]. In this chapter, we present an alternative taxonomy 
that highlights the interaction among component net- 
works in an MNN during learning. As a result, there 
is a dichotomy between tightly and loosely coupled 
models in MNNs. In a tightly coupled MNN, all com- 
ponent networks are jointly trained in a dependent way 
by taking their interaction into account during a single 
learning stage, and hence all parameters of different net- 
works (and combination mechanisms if there are any) 
need to be updated simultaneously by minimizing a cost 
function defined at the global level. In contrast, training 
of a loosely coupled MNN often undergoes multi- 
ple stages in a hierarchical or sequential way where 
learning undertaken in different stages may be either 
correlated or uncorrelated via different strategies. We 
believe that such a taxonomy facilitates not only understanding
different MNNs especially from a learning
perspective but also relating MNNs to generic ensemble 
learning in a broader context. 


28.3.2 Tightly Coupled Models 


There are two typical tightly coupled MNNs: the mix- 
ture of experts (MoE) [28.63, 64] and MNNs trained via 
negative correlation learning (NCL) [28.65]. 


Mixture of Experts 
The MoE [28.63, 64] refers to a class of MNNs that 
dynamically partition input space to facilitate learn- 
ing in a complex and non-stationary environment. 
By applying the divide-and-conquer principle, a soft- 
competition idea was proposed to develop the MoE 
architecture. That is, at every input data point, multiple 
expert networks compete to take on a given supervised 
learning task. Instead of winner-take-all, all expert net- 
works may work together but the winner expert plays 
a more important role than the losers. 

Fig. 28.7 Architecture of the mixture of experts (expert networks 1 to N and a gating network all receive the input x; the gating outputs weight the expert outputs $o_1(x), \cdots, o_N(x)$ to form $o(x)$)

The MoE architecture is composed of N expert networks
and a gating network, as illustrated in Fig. 28.7.
The n-th expert network produces an output vector,
$o_n(x)$, for an input, x. The gating network receives the
vector x as input and produces N scalar outputs that
form a partition of the input space at each point x. For
the input x, the gating network outputs N linear combination
coefficients used to weigh the importance of all
expert networks for a given supervised learning task.
The final output of the MoE is a convex weighted sum of
all the outputs yielded by the N expert networks. Although
NNs of different types can be used as expert networks,
a class of generalized linear NNs is often employed,
where such an NN is linear with a single output
nonlinearity [28.64]. As a result, the output of the n-th expert
network is a generalized linear function of the input x

$$ o_n(x) = f(W_n x) , $$


where $W_n$ is a parameter matrix, a collective notation
for both connection weights and biases, and $f(\cdot)$ is
a nonlinear transfer function. The gating network is also
a generalized linear model, and its n-th output $g(x, v_n)$
is the softmax function of $v_n^T x$

$$ g(x, v_n) = \frac{\exp(v_n^T x)}{\sum_{k=1}^{N} \exp(v_k^T x)} , $$

where $v_n$ is the n-th column of the parameter matrix V
in the gating network and is responsible for the linear
combination coefficient regarding the expert network n.
The overall output of the MoE is the weighted sum
resulting from the soft-competition at the point x

$$ o(x) = \sum_{n=1}^{N} g(x, v_n)\, o_n(x) . $$
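The three formulas above can be traced in a few lines of code. The following is only a minimal sketch under stated assumptions: the experts have scalar outputs, tanh is chosen arbitrarily as the transfer function f, a constant 1 is appended to x so that one weight vector holds both connection weights and bias, and the function names and numeric values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moe_forward(x, expert_weights, gate_weights, f=math.tanh):
    """Return (o(x), gating coefficients, expert outputs).

    expert_weights[n] plays the role of W_n and gate_weights[n] the role
    of v_n; both act on x extended by a constant 1 (bias input).
    """
    xb = list(x) + [1.0]
    expert_out = [f(dot(w, xb)) for w in expert_weights]   # o_n(x) = f(W_n x)
    g = softmax([dot(v, xb) for v in gate_weights])        # g(x, v_n)
    o = sum(gn * on for gn, on in zip(g, expert_out))      # o(x) = sum_n g_n o_n
    return o, g, expert_out

# Hypothetical parameters for N = 3 experts and a two-dimensional input
x = [0.5, -1.0]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.1]]
V = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
o, g, expert_out = moe_forward(x, W, V)
```

Because the gating coefficients are positive and sum to one, the overall output always lies between the smallest and the largest expert output, which is the sense in which the combination is convex.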


There is a natural probabilistic interpretation of the
MoE [28.64]. For a training example (x, y), the values
of $g(x, V) = (g(x, v_n))_{n=1}^{N}$ are interpreted as the
multinomial probabilities associated with the decision that
terminates in a regressive process that maps x to y.
Once a decision has been made that leads to a choice
of regressive process n, the output y is chosen from
a probability distribution $P(y|x, W_n)$. Hence, the overall
probability of generating y from x is the mixture of
the probabilities of generating y from each component
distribution and the mixing proportions are subject to
a multinomial distribution

$$ P(y|x, \Theta) = \sum_{n=1}^{N} g(x, v_n)\, P(y|x, W_n) , \tag{28.12} $$


where $\Theta$ is a collective notation of all the parameters in
the MoE, including both expert and gating network
parameters. For different learning tasks, specific component
distribution models are required. For example, the
probabilistic component model should be a Gaussian
distribution for a regression task, while a Bernoulli
distribution and multinomial distributions are required for
binary and multi-class classification tasks, respectively.
In general, the MoE is viewed as a conditional mixture
model for supervised learning, a non-trivial extension
of the finite mixture model for unsupervised learning.




By means of the above probabilistic interpretation, 
learning in the MoE is treated as a maximum likeli- 
hood problem defined based on the model in (28.12). 
An expectation-maximization (EM) algorithm was pro- 
posed to update parameters in the MoE [28.64]. It is 
summarized as follows: 


Algorithm 28.5 EM algorithm for MoE learning
Given a training set of T examples $D = \{(x_t, y_t)\}_{t=1}^{T}$,
pre-set the number of expert networks, N, and randomly
initialize all the parameters $\Theta = \{V, (W_n)_{n=1}^{N}\}$ in the
MoE:

• E-step
For each example, $(x_t, y_t)$ in D, estimate posterior
probabilities, $h_t = (h_{nt})_{n=1}^{N}$, with the current
parameter values, V and $\{W_n\}_{n=1}^{N}$

$$ h_{nt} = \frac{g(x_t, v_n)\, P(y_t|x_t, W_n)}{\sum_{k=1}^{N} g(x_t, v_k)\, P(y_t|x_t, W_k)} . $$

• M-step
- For expert network n $(n = 1, \cdots, N)$, solve the
maximization problems

$$ W_n = \arg\max_{W_n} \sum_{t=1}^{T} h_{nt} \log P(y_t|x_t, W_n) , $$

with all examples in D and posterior probabilities
$\{h_t\}_{t=1}^{T}$ achieved in the E-step.
- For the gating network, solve the maximization
problem

$$ V = \arg\max_{V} \sum_{t=1}^{T} \sum_{n=1}^{N} h_{nt} \log g(x_t, v_n) , $$

with training examples, $\{(x_t, h_t)\}_{t=1}^{T}$, derived
from posterior probabilities $\{h_t\}_{t=1}^{T}$.

Repeat the E-step and the M-step alternately until
the EM algorithm converges.
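To make the alternation in Algorithm 28.5 concrete, a small sketch for one-dimensional regression follows. It rests on several assumptions that are not part of the algorithm as stated: each expert is a line with Gaussian noise of fixed standard deviation, so that its M-step reduces to weighted least squares; the gating M-step is only partially solved by a few gradient ascent steps instead of IRLS; and the data, the deterministic initial values, and all function names are hypothetical.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def gauss(y, mean, sigma):
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gate(x, V):
    # V[n] = (slope, bias) stands in for v_n; returns g(x, v_n) for all n
    return softmax([a * x + b for a, b in V])

def log_likelihood(data, W, V, sigma):
    # Log of the mixture density (28.12), summed over the training set
    return sum(math.log(sum(g * gauss(y, a * x + b, sigma)
                            for g, (a, b) in zip(gate(x, V), W)))
               for x, y in data)

def e_step(data, W, V, sigma):
    H = []
    for x, y in data:
        joint = [g * gauss(y, a * x + b, sigma) for g, (a, b) in zip(gate(x, V), W)]
        z = sum(joint)
        H.append([j / z for j in joint])   # posteriors h_nt, n = 1..N
    return H

def m_step_expert(data, h):
    # Weighted least squares maximizes sum_t h_nt log P(y_t | x_t, W_n)
    # for a linear expert with Gaussian noise of fixed variance.
    s = sum(h)
    mx = sum(w * x for w, (x, _) in zip(h, data)) / s
    my = sum(w * y for w, (_, y) in zip(h, data)) / s
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(h, data))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(h, data))
    a = sxy / sxx if sxx > 1e-12 else 0.0
    return (a, my - a * mx)

def m_step_gate(data, H, V, lr=0.5, steps=20):
    # Partial M-step: gradient ascent on sum_t sum_n h_nt log g(x_t, v_n)
    V = [list(v) for v in V]
    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in V]
        for (x, _), h in zip(data, H):
            g = gate(x, V)
            for n in range(len(V)):
                grad[n][0] += (h[n] - g[n]) * x
                grad[n][1] += h[n] - g[n]
        for n in range(len(V)):
            V[n][0] += lr * grad[n][0] / len(data)
            V[n][1] += lr * grad[n][1] / len(data)
    return [tuple(v) for v in V]

def em(data, W, V, sigma=0.3, iterations=15):
    for _ in range(iterations):
        H = e_step(data, W, V, sigma)
        W = [m_step_expert(data, [h[n] for h in H]) for n in range(len(W))]
        V = m_step_gate(data, H, V)
    return W, V

# Hypothetical piecewise-linear data with two regimes
random.seed(0)
data = []
for _ in range(80):
    x = random.uniform(-1.0, 1.0)
    target = 1.5 * x + 1.0 if x < 0 else -2.0 * x + 1.0
    data.append((x, target + random.gauss(0.0, 0.1)))

W0 = [(0.5, 0.0), (-0.5, 0.0)]
V0 = [(-1.0, 0.0), (1.0, 0.0)]
ll_before = log_likelihood(data, W0, V0, 0.3)
W1, V1 = em(data, W0, V0)
ll_after = log_likelihood(data, W1, V1, 0.3)
```

Comparing `ll_before` with `ll_after` gives a simple check that the alternation increases the likelihood defined by (28.12). A partial gating M-step makes this a generalized EM procedure rather than the exact algorithm above.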


To solve optimization problems in the M-step, the
iteratively re-weighted least squares (IRLS) algorithm
was proposed [28.64]. Although the IRLS algorithm
has the strength to solve maximum likelihood problems
arising from MoE learning, it might result in
unstable performance due to its incorrect assumption
on multi-class classification [28.66]. As learning
in the gating network is a multi-class classification
task in essence, the problem always exists if the IRLS
algorithm is used in the EM algorithm. Fortunately,
improved algorithms were proposed to remedy this
problem in the EM learning [28.66]. In summary, numerous
MoE variants and extensions have been developed
in the past 20 years [28.59], and the MoE
architecture turns out to be one of the most successful
MNNs.


Negative Correlation Learning 
The NCL [28.65] is a learning algorithm to establish 
an MNN consisting of diverse neural networks (NNs) 
by promoting the diversity among component networks 
during learning. The NCL development was clearly in- 
spired by the bias-variance analysis of generalization 
errors [28.60, 61]. As a result, the NCL encourages co- 
operation among component networks via interaction 
during learning to manage the bias-variance trade-off. 

In the NCL, an unsupervised penalty term is introduced
to the MSE cost function for each component
NN so that the error diversity among component
networks is explicitly managed via training
towards negative correlation. Suppose that an NN
ensemble $F(x, \Theta)$ is established by simply taking the
average of N neural networks $f(x, W_n)$ $(n = 1, \cdots, N)$,
where $W_n$ denotes all the parameters in the n-th
component network and $\Theta = \{W_n\}_{n=1}^{N}$. Given a training
set $D = \{(x_t, y_t)\}_{t=1}^{T}$, the NCL cost function for the
n-th component network [28.65] is defined as follows

$$ L(D, W_n) = \frac{1}{2T} \sum_{t=1}^{T} \| f(x_t, W_n) - y_t \|_2^2 - \frac{\lambda}{2T} \sum_{t=1}^{T} \| f(x_t, W_n) - F(x_t, \Theta) \|_2^2 , \tag{28.13} $$

where $F(x_t, \Theta) = \frac{1}{N} \sum_{n=1}^{N} f(x_t, W_n)$ and $\lambda$ is a trade-off
hyper-parameter. In (28.13), the first term is the
MSE cost for network n and the second term refers
to the negative correlation cost. By taking all component
networks into account, minimizing the second term
leads to maximum negative correlation among them.
Therefore, $\lambda$ needs to be set properly to control the
penalty strength [28.65].
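A short derivation may help to see why a squared deviation from the ensemble output acts as a negative correlation cost. It uses only the definition of F as the simple average; the shorthand $f_n$, $F$ and the penalty symbol $p_n$ are introduced here purely for illustration.

```latex
% Shorthand: f_n = f(x_t, W_n), F = F(x_t, \Theta) = \frac{1}{N}\sum_{m=1}^{N} f_m.
% Deviations from the average sum to zero:
\sum_{m=1}^{N} (f_m - F) = 0
\quad\Longrightarrow\quad
\sum_{m \neq n} (f_m - F) = -(f_n - F).
% Hence the correlation of network n with all other networks is
p_n = (f_n - F)^{T} \sum_{m \neq n} (f_m - F) = -\,\| f_n - F \|_2^2 .
% Gradient of the squared deviation with respect to W_n,
% keeping the dependence of F on W_n:
\frac{\partial}{\partial W_n}\,\frac{1}{2}\,\| f_n - F \|_2^2
  = \frac{N-1}{N}\,(f_n - F)^{T}\,\frac{\partial f_n}{\partial W_n}.
```

The second term of (28.13) is therefore a scaled sum of the correlation penalties $p_n$ over the training set, and the factor $(N-1)/N$ appears once the dependence of the ensemble output on $W_n$ is taken into account.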

For the NCL, all N cost functions specified in
(28.13) need to be optimized together for parameter estimation.
Based on the gradient descent method, the
generic NCL algorithm is summarized as follows:




Algorithm 28.6 Negative correlation learning algorithm
Given a training set of T examples $D = \{(x_t, y_t)\}_{t=1}^{T}$,
pre-set the number of component networks, N, and
learning rate, $\epsilon$, as well as randomly initialize all the
parameters $\Theta = \{W_n\}_{n=1}^{N}$ in component networks:

• Output computation
For each example, $(x_t, y_t)$ in D, calculate the output of
each component network $f(x_t, W_n)$ and that of the
NN ensemble by

$$ F(x_t, \Theta) = \frac{1}{N} \sum_{n=1}^{N} f(x_t, W_n) . $$

• Gradient computation
For component network n $(n = 1, \cdots, N)$, calculate
the gradient of the NCL cost function in (28.13)
with respect to the parameters based on all training
examples in D

$$ \frac{\partial L(D, W_n)}{\partial W_n} = \frac{1}{T} \sum_{t=1}^{T} \Big[ \big( f(x_t, W_n) - y_t \big) - \frac{\lambda (N-1)}{N} \big( f(x_t, W_n) - F(x_t, \Theta) \big) \Big]^{T} \frac{\partial f(x_t, W_n)}{\partial W_n} . $$

• Parameter update
For component network n $(n = 1, \cdots, N)$, update
the parameters

$$ W_n \leftarrow W_n - \epsilon\, \frac{\partial L(D, W_n)}{\partial W_n} . $$

Repeat the above three steps until a pre-specified
termination condition is met.
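The three steps of Algorithm 28.6 can be sketched as follows. This is an illustration under simplifying assumptions rather than the implementation of [28.65]: each component network is replaced by a scalar linear unit so that the gradient can be written by hand, the data are a noise-free line, and the parameter values and function names are hypothetical.

```python
def predict(w, x):
    a, b = w            # a linear unit stands in for a component NN
    return a * x + b

def ensemble(W, x):
    # F(x, Theta): simple average of all component outputs
    return sum(predict(w, x) for w in W) / len(W)

def ncl_cost(data, W, n, lam):
    """Cost (28.13) of component n for scalar outputs."""
    T = len(data)
    mse = sum((predict(W[n], x) - y) ** 2 for x, y in data) / (2 * T)
    pen = sum((predict(W[n], x) - ensemble(W, x)) ** 2 for x, _ in data) / (2 * T)
    return mse - lam * pen

def ncl_grad(data, W, n, lam):
    """Gradient of (28.13) with respect to (a_n, b_n)."""
    T, N = len(data), len(W)
    ga = gb = 0.0
    for x, y in data:
        fn = predict(W[n], x)
        r = (fn - y) - lam * (N - 1) / N * (fn - ensemble(W, x))
        ga += r * x     # df/da = x
        gb += r         # df/db = 1
    return ga / T, gb / T

def ncl_train(data, W, lam=0.5, eps=0.1, epochs=500):
    W = [tuple(w) for w in W]
    for _ in range(epochs):
        # All gradients are computed from the same outputs before any update
        grads = [ncl_grad(data, W, n, lam) for n in range(len(W))]
        W = [(a - eps * ga, b - eps * gb) for (a, b), (ga, gb) in zip(W, grads)]
    return W

def ensemble_mse(data, W):
    return sum((ensemble(W, x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: y = 2x + 1 on a regular grid in [-1, 1]
data = [(i / 10.0 - 1.0, 2.0 * (i / 10.0 - 1.0) + 1.0) for i in range(21)]
W0 = [(0.0, 0.0), (1.0, -1.0), (-1.0, 2.0)]
W1 = ncl_train(data, W0)
mse_before = ensemble_mse(data, W0)
mse_after = ensemble_mse(data, W1)
```

Since `ncl_cost` and `ncl_grad` are written independently from (28.13) and from the gradient formula, a finite-difference comparison of the two is a convenient consistency check.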


While the NCL was originally proposed based on 
the MSE cost function, the NCL idea can be extended to 
other cost functions without difficulty. Hence, applying 
appropriate optimization techniques on alternative cost 
functions leads to NCL algorithms of different forms 
accordingly. 


28.3.3 Loosely Coupled Models 


In a loosely coupled model, component networks are
trained independently or there is no direct interaction
among them during learning. There are a variety of
MNNs that can be classified as loosely coupled models.
Below we review several typical loosely coupled
MNNs.


Neural Network Ensemble 
A neural network ensemble here refers to a committee
machine where a number of NNs are trained independently
but their outputs are somehow combined to reach
a consensus as a final solution. The development of
NN ensembles is explicitly motivated by the
diversity-promotion principle [28.67, 68].

Intuitively, errors made by component NNs can be
corrected by taking their diversity or mutual complement
into account. For example, three NNs, $NN_i$ (i = 1, 2, 3),
trained on the same data set have different yet
imperfect performance on test data. Given three test
points, $x_t$ (t = 1, 2, 3), $NN_1$ yields the correct output for
$x_2$ and $x_3$ but not for $x_1$, $NN_2$ yields the correct output
for $x_1$ and $x_3$ but not for $x_2$, and $NN_3$ yields the
correct output for $x_1$ and $x_2$ but not for $x_3$.
In such circumstances, an error made by one NN
can be corrected by the other two NNs with a majority vote
so that the ensemble can outperform any component
NN. Formally, there is a variety of theoretical justification
[28.60, 61] for NN ensembles. For example, it has
been proven for regression that the NN ensemble performance
is never inferior to the average performance
of all component NNs [28.60]. Furthermore, a theoretical
bias-variance analysis [28.61] suggests that the
promotion of diversity can improve the performance of
NN ensembles provided that there is an adequate trade-off
between bias and variance. In general, there are
two non-trivial issues in constructing NN ensembles,
i.e., creating diverse component NNs and ensembling
strategies.
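The three-network example above can be replayed in a few lines. The class labels, the regression values, and the function names below are hypothetical and chosen only to reproduce the error pattern described in the text; the second part checks numerically, for one made-up case, that the squared error of an averaged output does not exceed the average squared error of the components.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most component networks."""
    return Counter(labels).most_common(1)[0][0]

truth = ["a", "b", "a"]              # hypothetical targets for x_1, x_2, x_3
outputs = {
    "NN1": ["b", "b", "a"],          # wrong on x_1 only
    "NN2": ["a", "a", "a"],          # wrong on x_2 only
    "NN3": ["a", "b", "b"],          # wrong on x_3 only
}

def accuracy(pred):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Vote separately at each test point across the three networks
voted = [majority_vote(column) for column in zip(*outputs.values())]
component_acc = {name: accuracy(pred) for name, pred in outputs.items()}
ensemble_acc = accuracy(voted)

# Regression counterpart with made-up component outputs for one target
target = 2.0
preds = [1.0, 2.5, 4.0]
avg_pred = sum(preds) / len(preds)
ensemble_err = (avg_pred - target) ** 2
mean_component_err = sum((p - target) ** 2 for p in preds) / len(preds)
```

Each component is right on two of the three points, whereas the vote is right on all three, because no two networks fail on the same point.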

Depending on the nature of a given problem [28.5, 
62], there are several methodologies for creating diverse 
component NNs. First of all, an NN learning process
itself can be exploited. For instance, learning in a mono- 
lithic NN often needs to solve a complex non-convex 
optimization problem [28.9]. Hence, a local-search- 
based learning algorithm, e.g., BP [28.11], may end up 
with various solutions corresponding to local minima 
due to random initialization. In addition, model selec- 
tion is required to find out an appropriate NN structure 
for a given problem. Such properties can be exploited 
to create component networks in a homogeneous NN 
ensemble [28.67]. Next, NNs of different types trained 
on the same data may also yield different performance 
and hence are candidates in a heterogeneous NN en- 
semble [28.5]. Finally, exploration/exploitation of input 
space and different representations is an alternative 


487 


€°82 | d Hed 


488 PartD 


Neural Networks 


€°8z2 | d Hed 


methodology for creating different component NNs. In- 
stead of training an NN on the input space, NNs can be 
trained on different input subspaces achieved by a par- 
titioning method, e.g., random partitioning [28.69], 
which results in a subspace NN ensemble. Whenever 
raw data can be characterized by different represen- 
tations, NNs trained on different feature sets would 
constitute a multi-view NN ensemble [28.70]. 

Ensembling strategies are required for different 
tasks. For regression, some optimal fusion rules have 
been developed for NN ensembles, e.g., [28.68], which 
are supported by theoretical justification, e.g., [28.60, 
61]. For classification, ensembling strategies are more 
complicated but have been well-studied in a wider con- 
text, named combination of multiple classifiers. As is 
shown in Fig. 28.8, ensembling strategies are gener- 
ally divided into two categories: learnable and non- 
learnable. Learnable strategies use a parametric model 
to learn an optimal fusion rule, while non-learnable
strategies fulfil the combination by directly using the
statistics of all component network outputs along with
simple measures. As depicted in Fig. 28.8, there are
six main non-learnable fusion rules: sum, product, min, 
max, median, and majority vote; details of such non- 
learnable rules can be found in [28.71]. Below, we focus 
on the main learnable ensembling strategies in terms of 
classification. 
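For illustration, the six non-learnable rules can be written compactly when each component NN outputs a vector of class scores. This sketch assumes such score vectors are available and comparable across components; the function name `fuse`, the rule keys, and the example scores are hypothetical, and [28.71] should be consulted for the precise definitions.

```python
import math
from collections import Counter
from statistics import median

def fuse(outputs, rule):
    """Combine per-class score vectors of several component NNs.

    outputs[n][c] is the score that component n assigns to class c.
    Returns the index of the winning class under the given rule.
    """
    n_classes = len(outputs[0])
    if rule == "vote":
        # Each component votes for its own top-scoring class
        votes = Counter(max(range(n_classes), key=o.__getitem__) for o in outputs)
        return votes.most_common(1)[0][0]
    combine = {"sum": sum, "prod": math.prod, "min": min,
               "max": max, "med": median}[rule]
    scores = [combine([o[c] for o in outputs]) for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)

# Hypothetical outputs of three component NNs for a three-class problem
outputs = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.50, 0.45],
]
decisions = {rule: fuse(outputs, rule)
             for rule in ("sum", "prod", "min", "max", "med", "vote")}
```

On these made-up scores the sum, product, and min rules select class 1, whereas the max, median, and majority vote rules select class 0, which shows that the choice of rule can change the ensemble decision.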

In general, learnable ensembling strategies are 
viewed as an application of the stacked generalization 
principle [28.72]. In light of stacked generalization, 
all component NNs serve as level 0 generalizers, and 
a learnable ensembling strategy carried out by a combi- 
nation mechanism would perform a level 1 generalizer 
working on the output space of all component NNs to
improve the overall generalization. In this sense, such
a combination mechanism is trained on a validation 
data set that is different from the training data set used 
in component NN learning. As is shown in Fig. 28.8, 
combination mechanisms have been developed from 
different perspectives, i. e., input-dependent and input- 
independent. 

An input-dependent mechanism combines component
NNs based on test data; i.e., given two inputs,
$x_1$ and $x_2$, there is the property: $c(x_1|\Theta) \neq c(x_2|\Theta)$
if $x_1 \neq x_2$, where $c(x|\Theta) = (c_n(x|\Theta))_{n=1}^{N}$ is an
input-dependent mechanism used to combine an ensemble of
N component NNs and $\Theta$ collectively denotes all learnable
parameters in this parametric model. As a result,
the output of an NN ensemble with the input-dependent
ensembling strategy is of the following form

$$ o(x) = \Omega\big( o_1(x), \cdots, o_N(x) \,|\, c(x|\Theta) \big) , $$

where $o_n(x)$ is the output of the n-th component NN for
$n = 1, \cdots, N$ and $\Omega$ indicates a method on how to apply
$c(x|\Theta)$ to component NNs. For example, $\Omega$ may be
a linear combination scheme such that

$$ o(x) = \sum_{n=1}^{N} c_n(x|\Theta)\, o_n(x) . \tag{28.14} $$


Fig. 28.8 A taxonomy of ensembling strategies (learnable: input-dependent soft competition and associative switch, input-independent Bayesian fusion, evidence reasoning, and linear combination; non-learnable: sum, product, min, max, median, and majority vote)

As listed in Fig. 28.8, soft-competition and associative
switch are two commonly used input-dependent combination
mechanisms. The soft-competition mechanism
can be regarded as a special case of the MoE described
earlier when all expert networks were trained on a data
set independently in advance. In this case, the gating
network plays the role of the combination mechanism
by deciding the importance of component NNs via
soft-competition. Although various learning models may be
used as such a gating network, an RBF-like (radial basis
function) parametric model [28.73] trained with the
EM algorithm has been widely used for this purpose.
Unlike a soft-competition mechanism that produces the
continuous-value weight vector c(x) used in (28.14),
the associative switch [28.74] adopts a winner-take-all
strategy, i.e., $\sum_{n=1}^{N} c_n(x|\Theta) = 1$ and $c_n(x|\Theta) \in \{0, 1\}$.
Thus, an associative switch yields a specific code for
a given input so that the output of the best-performing
component NN can be selected as the final output of
the NN ensemble according to (28.14). The associative
switch learning is a multi-class classification problem,
and an MLP is often used to carry it out [28.74].
Although an input-dependent ensembling strategy is applicable
to most NN ensembles, it is difficult to apply it
to multi-view NN ensembles, since different representations
need to be considered simultaneously in training
a combination mechanism. Fortunately, such issues
have been explored in a wider context on how to use different
representations simultaneously for ML [28.70,
75-79] so that both soft-competition and associative
switch mechanisms can be extended to multi-view NN
ensembles.
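The difference between the two input-dependent mechanisms is only in how the coefficient vector used in (28.14) is formed. In the sketch below, `scores` stands for the raw outputs of a trained combination mechanism at one input x; these values, the component outputs, and the function names are hypothetical, and no particular gating model or switch network is implied.

```python
import math

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def combine(outputs, c):
    """Linear combination (28.14): o(x) = sum_n c_n(x) o_n(x)."""
    dim = len(outputs[0])
    return [sum(cn * o[k] for cn, o in zip(c, outputs)) for k in range(dim)]

def soft_competition(scores):
    # Continuous-valued coefficients that sum to one
    return softmax(scores)

def associative_switch(scores):
    # Winner-take-all code: exactly one coefficient is 1, the rest are 0
    winner = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if n == winner else 0.0 for n in range(len(scores))]

outputs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]   # o_n(x) of three component NNs
scores = [2.0, 0.0, 1.0]                          # mechanism outputs at x

c_soft = soft_competition(scores)
c_hard = associative_switch(scores)
o_soft = combine(outputs, c_soft)
o_hard = combine(outputs, c_hard)
```

With the switch, the ensemble output equals the output of the selected component NN, while soft-competition blends all three outputs according to their coefficients.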

In contrast, an input-independent mechanism combines
component NNs based on the dependence of their
outputs without considering input data directly. Given
two inputs $x_1$ and $x_2$ with $x_1 \neq x_2$, the same $c(\Theta)$
may be applied to outputs of component NNs, where
$c(\Theta) = (c_n(\Theta))_{n=1}^{N}$ is an input-independent combination
mechanism used to combine an ensemble of N
component NNs. Several input-independent mechanisms
have been developed [28.62], which often fall
into one of three categories, i.e., Bayesian fusion, evidence
reasoning, and a linear combination scheme, as
shown in Fig. 28.8. Bayesian fusion [28.80] refers to
a class of combination schemes that use the information
collected from errors made by component NNs
on a validation set in order to find out the optimal
output of the maximum a posteriori probability,
$C^* = \arg\max_{1 \le l \le L} P(C_l | o_1(x), \cdots, o_N(x), \Theta)$, via Bayesian
reasoning, where $C_l$ is the label for the l-th class
in a classification task of L classes, and $\Theta$ here encodes
the information gathered, e.g., a confusion matrix
achieved during learning [28.80]. Similarly, evidence
reasoning mechanisms make use of alternative reasoning
theories [28.80], e.g., the Dempster-Shafer theory,
to yield the best output for NN ensembles via an evidence
reasoning process that works on all outputs of
component NNs in an ensemble. Finally, linear combination
schemes of different forms are also popular
as input-independent combination mechanisms [28.62].
For instance, the work presented in [28.68] exemplifies
how to achieve optimal linear combination weights in
a linear combination scheme.
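One plausible reading of Bayesian fusion with confusion matrices is sketched below. It is an assumption-laden illustration and not necessarily the exact scheme of [28.80]: component decisions are treated as independent given the true class, each component reports only its predicted class, and a smoothing constant (a hypothetical choice) avoids zero counts. The confusion matrices are made up.

```python
def bayesian_fusion(confusions, decisions, smooth=1.0):
    """Fuse hard decisions of N classifiers using validation confusion matrices.

    confusions[n][i][j]: number of validation examples of true class i that
    classifier n assigned to class j. decisions[n]: class chosen by
    classifier n for the current input. Returns (best class, posterior).
    """
    n_classes = len(confusions[0])
    belief = [1.0] * n_classes
    for conf, j in zip(confusions, decisions):
        # Estimate P(true class = i | classifier n says j) from column j
        column = [conf[i][j] + smooth for i in range(n_classes)]
        z = sum(column)
        for i in range(n_classes):
            belief[i] *= column[i] / z
    z = sum(belief)
    posterior = [b / z for b in belief]
    best = max(range(n_classes), key=posterior.__getitem__)
    return best, posterior

# Hypothetical two-class confusion matrices: one reliable and two weak classifiers
confusions = [
    [[45, 5], [10, 40]],
    [[30, 20], [25, 25]],
    [[28, 22], [24, 26]],
]
decisions = [0, 1, 1]          # the reliable classifier is outvoted two to one
label, posterior = bayesian_fusion(confusions, decisions)
```

In this made-up case a majority vote would pick class 1, whereas the fused posterior favors class 0 because the two dissenting classifiers were barely better than chance on the validation set.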


Constructive Modularization Learning 
Efforts have also been made towards constructive mod- 
ularization learning for a given supervised learning 
task. In such work, the divide-and-conquer principle 
is explicitly applied in order to develop a constructive 
learning strategy for modularization. The basic idea be- 
hind such methods is to divide a difficult and complex 
problem into a number of subproblems that are eas- 
ily solvable by NNs of proper capacities, matching the 
requirements of the subproblems, and then the solu- 
tions to subproblems are combined seamlessly to form 
a solution to the original problem. On the other hand, 
constructive modularization learning may alleviate the 
model selection problem encountered by a monolithic 
NN. As NNs of simple and even different architectures 
may be used to solve subproblems, empirical stud- 
ies suggest that an MNN generated via constructive 
modularization learning is often insensitive to compo- 
nent NN architectures and hence is less likely to suffer 
from overall overfitting or underfitting [28.81]. Below 
we describe two constructive modularization learning
strategies [28.81-83] for exemplification.

The partitioning-based strategy [28.81, 82] performs
the constructive modularization learning by applying
the divide-and-conquer principle explicitly. For
a given supervised learning task, the strategy consists
of two learning stages: dividing and conquering. In
the dividing stage, it first recursively partitions the
input space into overlapping subspaces, which facilitates
dealing with various uncertainties, by taking
supervision information into account until the nature
of each subproblem defined in generated subspaces
matches the capacity of one pre-selected NN. In the
conquering stage, an NN works on a given input subspace
to complete the corresponding learning subtask.
As a result, a tree-structured MNN is self-generated,
where a learnable partitioning mechanism P is situated
at intermediate levels and NNs work at the leaves
of the tree, as illustrated in Fig. 28.9. To enable the
partitioning-based constructive modularization learning,
two generic algorithms have been proposed, i.e., growing
and credit-assignment algorithms [28.81, 82], as
summarized below.

Fig. 28.9 A self-generated tree-structured MNN


Algorithm 28.7 Growing algorithm
Given a training set D, set $X \leftarrow D$. Randomly initialize
parameters in all component NNs in a given repository
and pre-set hyper-parameters in a learnable partitioning
mechanism and compatibility criteria, respectively:

• Compatibility test
For a training (sub)set X, apply the compatibility
criteria to X to examine whether the learning task
defined on X matches the capacity of a component
NN in the repository.
• Partitioning space
If no NN in the repository can solve the problem defined
on X, then train the partitioning mechanism
on the current X to partition it into two overlapping
subsets $X_l$ and $X_r$. Set $X \leftarrow X_l$, then go to the compatibility
test step. Set $X \leftarrow X_r$, then go to the compatibility
test step.
Otherwise, go to the subproblem solving step.
• Subproblem solving
Train this NN on X with an appropriate learning
algorithm. The trained NN resides at the current leaf
node.

The growing algorithm expands a tree-structured
MNN until learning problems defined on all partitioned
subspaces are solvable with NNs in the repository.
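The recursion in Algorithm 28.7 can be sketched for a one-dimensional regression task. Every concrete choice below is a hypothetical stand-in and not the method of [28.81, 82]: a least-squares line replaces the component NN, the compatibility test is a simple training-error threshold, and the partitioning mechanism is a median split with a fixed overlap margin.

```python
def fit_line(points):
    """Least-squares line; a stand-in for training a component NN."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 1e-12 else 0.0
    return (a, my - a * mx)

def mse(model, points):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

def grow(points, tol=5e-3, overlap=0.12, min_size=4):
    """Return a nested dict: leaves hold a model, inner nodes hold a split."""
    model = fit_line(points)
    # Compatibility test (assumed criterion): low training error,
    # or too few points to justify further partitioning
    if len(points) <= min_size or mse(model, points) <= tol:
        return {"leaf": model}
    # Partitioning (assumed mechanism): median split with overlapping margins
    xs = sorted(x for x, _ in points)
    split = xs[len(xs) // 2]
    left = [p for p in points if p[0] <= split + overlap]
    right = [p for p in points if p[0] >= split - overlap]
    if len(left) == len(points) or len(right) == len(points):
        return {"leaf": model}      # the split failed to shrink the problem
    return {"split": split, "overlap": overlap,
            "left": grow(left, tol, overlap, min_size),
            "right": grow(right, tol, overlap, min_size)}

# Hypothetical task: y = |x|, which a single line cannot fit
points = [(i / 20.0 - 1.0, abs(i / 20.0 - 1.0)) for i in range(41)]
tree = grow(points)
```

On this toy task the root fails the compatibility test, a single split near x = 0 is made, and each of the two overlapping subsets is then handled by one line.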


For a given test data point, output of such a tree- 
structured MNN may depend on several component 
NNs at the leaves of the tree since the input space 
is partitioned into overlapping subspaces. A credit- 
assignment algorithm [28.81, 82] has been developed to 
weight the importance of component NNs contributing
to the overall output, which is summarized as follows:


Algorithm 28.8 Credit-assignment algorithm
P(x) is a trained partitioning mechanism that resides
at a nonterminal node and partitions the current input
(sub)space into two subspaces with an overlap defined
by $-\tau < P(x) < \tau$ $(\tau > 0)$. $C_L(\cdot)$ and $C_R(\cdot)$ are
two credit assignment functions for the two subspaces,
respectively. For a test data point x:

• Initialization
Set $\alpha(x) \leftarrow 1$ and Pointer $\leftarrow$ Root.
• Credit assignment
As a recursive credit propagation process to assign
credits to all the component NNs at leaf nodes that
x can reach, CR[$\alpha(x)$, Pointer] consists of three
steps:
- If Pointer points to a leaf node, then output
$\alpha(x)$ and stop.
- If $P(x) < \tau$, $\alpha(x) \leftarrow \alpha(x) \times C_L(P(x))$ and invoke
CR[$\alpha(x)$, Pointer.Leftchild].
- If $P(x) > -\tau$, $\alpha(x) \leftarrow \alpha(x) \times C_R(P(x))$ and invoke
CR[$\alpha(x)$, Pointer.Rightchild].

Thus, the output of a self-generated MNN is

$$ o(x) = \sum_{n \in \mathcal{N}} \alpha_n(x) \times o_n(x) , $$

where $\mathcal{N}$ denotes all the component NNs that x can
reach, and $\alpha_n(x)$ and $o_n(x)$ are the credit assigned and
the output of the n-th component NN in $\mathcal{N}$ for x, respectively.
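A recursive sketch of the credit propagation is given below. The tree layout, the piece-wise linear credit functions, and the leaf models are hypothetical; the only properties taken from the text are the overlap region bounded by plus and minus tau, the two branch conditions, and the multiplication of credits along the path. Each branch here receives the credit of its parent multiplied by its own credit function, which is one reading of the recursive steps.

```python
def clip01(v):
    return max(0.0, min(1.0, v))

def credit_left(p, tau):
    # Assumed piece-wise linear credit: 1 far left, 0 far right, linear in between
    return clip01((tau - p) / (2.0 * tau))

def credit_right(p, tau):
    return 1.0 - credit_left(p, tau)     # the two credits sum to one

def assign_credit(node, x, alpha=1.0):
    """Return (credit, leaf model) pairs for all leaves that x can reach."""
    if "leaf" in node:
        return [(alpha, node["leaf"])]
    p, tau = node["P"](x), node["tau"]
    reached = []
    if p < tau:
        reached += assign_credit(node["left"], x, alpha * credit_left(p, tau))
    if p > -tau:
        reached += assign_credit(node["right"], x, alpha * credit_right(p, tau))
    return reached

def mnn_output(tree, x):
    # o(x) = sum over reachable leaves of credit times leaf output
    return sum(alpha * leaf(x) for alpha, leaf in assign_credit(tree, x))

# Hypothetical tree: one partition at x = 0 with overlap |x| < 0.2,
# a left leaf modelling y = -x and a right leaf modelling y = x
tree = {
    "P": lambda x: x,
    "tau": 0.2,
    "left": {"leaf": lambda x: -x},
    "right": {"leaf": lambda x: x},
}
credits_inside = [alpha for alpha, _ in assign_credit(tree, 0.1)]
out_inside = mnn_output(tree, 0.1)
out_outside = mnn_output(tree, 0.5)
```

A point outside the overlap region reaches a single leaf with full credit, while a point inside it reaches both leaves and receives a blended output.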


To implement such a strategy, hyper-planes placed
with heuristics [28.81] or linear classifiers trained with
the Fisher discriminant analysis [28.82] were first
used as the partitioning mechanism, and NNs such as
MLP or RBF can be employed to solve subproblems.
Accordingly, two piece-wise linear credit assignment
functions [28.81, 82] were designed for the hyper-plane
partitioning mechanism, so that $C_L(x) + C_R(x) = 1$.
Heuristic compatibility criteria were developed by
considering learning errors and efficiency [28.81, 82].
By using the same constructive learning algorithms
described above, an alternative implementation was
also proposed by using the self-organizing map as
a partitioning mechanism, with SVMs used for subproblem
solving [28.84]. Empirical studies suggest that
the partitioning-based strategy leads to favorable results
in various supervised learning tasks despite different
implementations [28.81, 82, 84].

By applying the divide-and-conquer principle, task 
decomposition [28.83] is yet another constructive mod- 
ularization learning strategy for classification. Unlike 
the partitioning-based strategy, the task decomposition 
strategy converts a multi-class classification task into
a number of binary classification subtasks in a brute-force
way, and each binary classification subtask is
expected to be fulfilled by a simple NN. If a subtask 
is too difficult to carry out by a given NN, the subtask is 
allowed to be further decomposed into simpler binary 
classification subtasks. For a multi-class classification
task of M categories, the task decomposition strategy
first exhaustively decomposes it into $\frac{1}{2}M(M-1)$ different
primary binary subtasks, where each subtask merely
concerns classification between two different classes
without taking the remaining $M-2$ classes into account,
which differs from the commonly used one-against-rest
decomposition method. In general, the original
multi-class classification task may be decomposed into more
binary subtasks if some primary subtasks are too dif- 
ficult. Once the decomposition is completed, all the 
subtasks are undertaken by pre-selected simple NNs, 
e.g., MLP of one hidden layer, in parallel. For a final 
solution to the original problem, three non-learnable op- 
erations, min, max, and inv, were proposed to combine 
individual binary classification results achieved by all 
the component NNs. By applying three operations prop- 
erly, all the component NNs are integrated together to 
form a min-max MNN [28.83]. 
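The primary decomposition step can be illustrated as follows. This sketch covers only the exhaustive pairwise split of the training data; the further decomposition of difficult subtasks and the min, max, and inv combination of [28.83] are not reproduced. The function name, the data layout, and the toy samples are hypothetical.

```python
from itertools import combinations

def pairwise_subtasks(samples):
    """Split a multi-class training set into all two-class subtasks.

    samples: list of (input, label) pairs.
    Returns {(class_i, class_j): [(input, target), ...]} where the target is
    1 for class_i and 0 for class_j; other classes are left out of the subtask.
    """
    classes = sorted({label for _, label in samples})
    tasks = {}
    for ci, cj in combinations(classes, 2):
        tasks[(ci, cj)] = [(x, 1 if label == ci else 0)
                           for x, label in samples if label in (ci, cj)]
    return tasks

# Hypothetical four-class training set
samples = [((0.0,), "a"), ((1.0,), "b"), ((2.0,), "c"), ((3.0,), "d"), ((0.5,), "a")]
tasks = pairwise_subtasks(samples)
n_tasks = len(tasks)      # M(M - 1)/2 subtasks for M classes
```

For M = 4 this yields six subtasks, each of which sees only the samples of its own two classes and can therefore be assigned to a simple NN and trained in parallel with the others.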


28.3.4 Relevant Issues 


In general, studies of MNNs closely relate to several 
areas in different disciplines, e.g., ML and statistics. We 
here examine several important issues related to MNNs 
in a wider context. 

As described above, a tightly coupled MNN leads 
to an optimal solution to a given supervised learn- 
ing problem. The MoE is rooted in the finite mixture 
model (FMM) studied in probability and statistics and 
becomes a non-trivial extension to conditional models 
where each expert is a parametric conditional prob- 
abilistic model and the mixture coefficients also de- 
pend on input [28.64]. While the MoE has been well 
studied for 20 years [28.59] in different disciplines, 
there still exist some open problems in general, e.g., 
model selection, global optimal solution, and conver- 
gence of its learning algorithms for arbitrary component 
models. Different from the FMM, the product of experts
(PoE) [28.42] was also proposed to combine
a number of experts (parametric probabilistic models)
by taking their product and normalizing the result.
The PoE has been argued to have
some advantages over the MoE [28.42] but has so
far merely been developed in the context of unsupervised
learning. As a result, extending the PoE to
conditional models for supervised learning would be
a non-trivial topic in tightly coupled MNN studies.
On the other hand, the NCL [28.65] directly applies
the bias-variance analysis [28.60, 61] to the construction
of an MNN. This implies that MNNs could also be
built up via alternative loss functions that properly
promote diversity among component NNs during
learning.

Almost all existing NN ensemble methods are now 
included in ensemble learning [28.85], which is an 
important area in ML, or the multiple classifier sys- 
tem [28.62] in the context of pattern recognition. In 
statistical ensemble learning, generic frameworks, e.g., 
boosting [28.86] and bootstrapping [28.87], were devel- 
oped to construct ensemble learners where any learning 
models including NNs may be used as component 
learners. Hence, most of the common issues raised for
ensemble learning are applicable to NN ensembles.
Nevertheless, ensemble learning research suggests that
the behavior of component learners may considerably affect
the stability and overall performance of ensemble
learning. As exemplified in [28.88], properties of dif- 
ferent NN ensembles are worth investigating from both 
theoretical and application perspectives. 
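
As a rough illustration of how a bootstrap-based framework treats the component learner as a black box, the sketch below trains several learners on bootstrap resamples and combines them by majority vote. The component learner here is a hypothetical one-dimensional threshold classifier chosen only to keep the example short; an NN could be substituted. All names and the toy data are assumptions.

```python
import random

def train_stump(samples):
    """Fit a 1-D threshold classifier: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x > threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging(samples, n_learners, rng):
    """Train each component learner on a bootstrap resample of the data."""
    stumps = []
    for _ in range(n_learners):
        resample = [rng.choice(samples) for _ in samples]  # sample with replacement
        stumps.append(train_stump(resample))
    return stumps

def predict(stumps, x):
    """Majority vote over the component learners."""
    votes = sum(x > t for t in stumps)
    return int(2 * votes > len(stumps))

rng = random.Random(0)
data = [(x / 10.0, int(x >= 10)) for x in range(20)]  # toy data: label 1 when x >= 1.0
ensemble = bagging(data, n_learners=11, rng=rng)
```

Using an odd number of learners avoids ties in the vote.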

While constructive modularization learning provides
an alternative way of model selection, it is generally
a less developed area in MNNs, and existing methods
are limited by a lack of theoretical
justification and underpinning techniques. For example,
a critical issue in the partitioning-based strategy [28.81,
82] is how to measure the nature of a subproblem to
decide whether any further partitioning is required, and how
to judge the appropriateness of a pre-selected NN for a subproblem
in terms of its capacity. In previous studies [28.81, 82],
a number of simple heuristic criteria were proposed
based on learning errors and efficiency. Although such
heuristic criteria work in practice, they have no theoretical
justification. As a result, more sophisticated compatibility
criteria need to be developed for such a constructive
learning strategy based on the latest ML developments,
e.g., manifold and adaptive kernel learning. Fortunately,
the partitioning-based strategy has inspired the latest
developments in ML [28.89]. In general, constructive
modularization learning is still a non-trivial topic in
MNN research.

Finally, it is worth stating that our MNN review here 
only focuses on supervised learning due to the limited 
space. Most MNNs described above may be extended 
to other learning paradigms, e.g., semi-supervised and 
unsupervised learning. More details on such topics are 
available in the literature, e.g., [28.90, 91]. 




28.4 Concluding Remarks 


In this chapter, we have reviewed two important ar- 
eas, DNNs and MNNs, in NC. While we have pre- 
sented several sophisticated techniques that are ready 
for applications, we have discussed several challeng- 
ing problems in both deep and modular neural net- 
work research as well. Apart from other non-trivial 
issues discussed in the chapter, it is worth empha- 
sizing that it is still an open problem to develop 
large-scale DNNs and MNNs and integrate them for
modeling highly intelligent behaviors, although some
progress has been made recently [28.58]. In a wider
context, DNNs and MNNs are closely related to two
active areas, deep learning and ensemble learning, in
ML. We anticipate that motivation and methodologies
from different perspectives will mutually benefit
each other and lead to effective solutions to
common challenging problems in the NC and ML
communities.


References


28.1 E.R. Kandel, J.H. Schwartz, T.M. Jessell: Principles of
Neural Science, 4th edn. (McGraw-Hill, New York 
2000) 

28.2 G.M. Edelman: Neural Darwinism: Theory of Neural 
Group Selection (Basic Books, New York 1987) 

28.3 J.A. Fodor: The Modularity of Mind (MIT Press, Cam- 
bridge 1983) 

28.4 F. Azam: Biologically inspired modular neural net- 
works, Ph.D. Thesis (School of Electrical and Com- 
puter Engineering, Virginia Polytechnic Institute 
and State University, Blacksburg 2000) 

28.5 G. Auda, M. Kamel: Modular neural networks: 
A survey, Int. J. Neural Syst. 9(2), 129-151 (1999) 

28.6 D.C. Van Essen, C.H. Anderson, D.J. Felleman: Information
processing in the primate visual system,
Science 255, 419-423 (1992) 

28.7 J.H. Kaas: Why does the brain have so many visual 
areas?, J. Cogn. Neurosci. 1(2), 121-135 (1989) 

28.8 G. Bugmann: Biologically plausible neural compu- 
tation, Biosystems 40(1), 11-19 (1997) 

28.9 S. Haykin: Neural Networks and Learning Machines, 
3rd edn. (Prentice Hall, New York 2009) 

28.10 M. Minsky, S. Papert: Perceptrons (MIT Press, Cam- 
bridge 1969) 

28.11 D.E. Rumelhart, G.E. Hinton, R.J. Williams: Learning
internal representations by error propagation,
Nature 323, 533-536 (1986) 

28.12 Y. LeCun, L. Bottou, Y. Bengio, P. Haffner: Gradient 
based learning applied to document recognition, 
Proc. IEEE 86(9), 2278-2324 (1998) 

28.13 G. Tesauro: Practical issues in temporal difference 
learning, Mach. Learn. 8(2), 257-277 (1992) 

28.14 G. Cybenko: Approximations by superpositions of 
sigmoidal functions, Math. Control Signals Syst. 
2(4), 302-314 (1989) 

28.15 N. Cristianini, J. Shawe-Taylor: An Introduction to 
Support Vector Machines and Other Kernel-Based 
Learning Methods (Cambridge University Press, 
Cambridge 2000) 

28.16 Y. Bengio, Y. LeCun: Scaling learning algorithms towards
AI. In: Large-Scale Kernel Machines, ed. by
L. Bottou, O. Chapelle, D. DeCoste, J. Weston (MIT
Press, Cambridge 2006), Chap. 14

28.17 Y. Bengio: Learning deep architectures for AI,
Found. Trends Mach. Learn. 2(1), 1-127 (2009) 

28.18 G.E. Hinton, S. Osindero, Y. Teh: A fast learning al- 
gorithm for deep belief nets, Neural Comput. 18(9), 
1527-1554 (2006) 

28.19 Y. Bengio: Deep learning of representations for un- 
supervised and transfer learning, JMLR: Workshop 
Conf. Proc., Vol. 7 (2011) pp. 1-20 

28.20 H. Larochelle, D. Erhan, A. Courville, J. Bergstra, 
Y. Bengio: An empirical evaluation of deep archi- 
tectures on problems with many factors of vari- 
ation, Proc. Int. Conf. Mach. Learn. (ICML) (2007) 
pp. 473-480 

28.21 R. Salakhutdinov, G.E. Hinton: Learning a nonlin- 
ear embedding by preserving class neighbourhood 
structure, Proc. Int. Conf. Artif. Intell. Stat. (AISTATS) 
(2007) 

28.22 H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin: 
Exploring strategies for training deep neural net- 
works, J. Mach. Learn. Res. 10(1), 1-40 (2009) 

28.23 W.K. Wong, M. Sun: Deep learning regularized 
Fisher mappings, IEEE Trans. Neural Netw. 22(10), 
1668-1675 (2011) 

28.24 S. Osindero, G.E. Hinton: Modeling image patches 
with a directed hierarchy of Markov random field, 
Adv. Neural Inf. Process. Syst. (NIPS) (2007) pp. 1121- 
1128 

28.25 I. Levner: Data driven object segmentation, Ph.D.
Thesis (Department of Computer Science, University 
of Alberta, Edmonton 2008) 

28.26 H. Mobahi, R. Collobert, J. Weston: Deep learning 
from temporal coherence in video, Proc. Int. Conf. 
Mach. Learn. (ICML) (2009) pp. 737-744 

28.27 H. Lee, Y. Largman, P. Pham, A. Ng: Unsupervised 
feature learning for audio classification using con- 
volutional deep belief networks, Adv. Neural Inf. 
Process. Syst. (NIPS) (2009) 

28.28 K. Chen, A. Salman: Learning speaker-specific
characteristics with a deep neural architecture,
IEEE Trans. Neural Netw. 22(11), 1744-1756 (2011)

28.29 K. Chen, A. Salman: Extracting speaker-specific
information with a regularized Siamese deep network,
Adv. Neural Inf. Process. Syst. (NIPS) (2011)

28.30 A. Mohamed, G.E. Dahl, G.E. Hinton: Acoustic modeling
using deep belief networks, IEEE Trans. Audio
Speech Lang. Process. 20(1), 14-22 (2012)

28.31 G.E. Dahl, D. Yu, L. Deng, A. Acero: Context-dependent
pre-trained deep neural networks for
large-vocabulary speech recognition, IEEE Trans.
Audio Speech Lang. Process. 20(1), 30-42 (2012)

28.32 R. Salakhutdinov, G.E. Hinton: Semantic hashing,
Proc. SIGIR Workshop Inf. Retr. Appl. Graph. Model.
(2007)

28.33 M. Ranzato, M. Szummer: Semi-supervised learning
of compact document representations with
deep networks, Proc. Int. Conf. Mach. Learn. (ICML)
(2008)

28.34 A. Torralba, R. Fergus, Y. Weiss: Small codes and
large databases for recognition, Proc. Int. Conf.
Comput. Vis. Pattern Recogn. (CVPR) (2008) pp. 1-8

28.35 R. Collobert, J. Weston: A unified architecture for
natural language processing: Deep neural networks
with multitask learning, Proc. Int. Conf. Mach.
Learn. (ICML) (2008)

28.36 A. Mnih, G.E. Hinton: A scalable hierarchical
distributed language model, Adv. Neural Inf. Process.
Syst. (NIPS) (2008)

28.37 J. Weston, F. Ratle, R. Collobert: Deep learning via
semi-supervised embedding, Proc. Int. Conf. Mach.
Learn. (ICML) (2008)

28.38 R. Hadsell, A. Erkan, P. Sermanet, M. Scoffier,
U. Muller, Y. LeCun: Deep belief net learning in
a long-range vision system for autonomous
off-road driving, Proc. Intell. Robots Syst. (IROS) (2008)
pp. 628-633

28.39 Y. Bengio, A. Courville, P. Vincent: Representation
learning: A review and new perspectives, IEEE
Trans. Pattern Anal. Mach. Intell. 35(8), 1798-1827
(2013)

28.40 Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle:
Greedy layer-wise training of deep networks, Adv.
Neural Inf. Process. Syst. (NIPS) (2006)

28.41 P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio,
P.A. Manzagol: Stacked denoising autoencoders:
Learning useful representations in a deep network
with a local denoising criterion, J. Mach. Learn.
Res. 11, 3371-3408 (2010)

28.42 G.E. Hinton: Training products of experts by
minimizing contrastive divergence, Neural Comput.
14(10), 1771-1800 (2002)

28.43 K. Kavukcuoglu, M. Ranzato, Y. LeCun: Fast inference
in sparse coding algorithms with applications
to object recognition. CoRR, arXiv:1010.3467 (2010)

28.44 B.A. Olshausen, D.J. Field: Sparse coding with an
overcomplete basis set: A strategy employed by V1?,
Vis. Res. 37, 3311-3325 (1997)


28.45 S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio:
Contractive auto-encoders: Explicit invariance
during feature extraction, Proc. Int. Conf. Mach.
Learn. (ICML) (2011)

28.46 S. Rifai, G. Mesnil, P. Vincent, X. Muller, Y. Bengio,
Y. Dauphin, X. Glorot: Higher order contractive
auto-encoder, Proc. Eur. Conf. Mach. Learn. (ECML)
(2011)

28.47 M. Ranzato, C. Poultney, S. Chopra, Y. LeCun: Efficient
learning of sparse representations with an
energy based model, Adv. Neural Inf. Process. Syst.
(NIPS) (2006)

28.48 M. Ranzato, Y. Boureau, Y. LeCun: Sparse feature
learning for deep belief networks, Adv. Neural Inf.
Process. Syst. (NIPS) (2007)

28.49 G.E. Hinton, R. Salakhutdinov: Reducing the
dimensionality of data with neural networks, Science
313, 504-507 (2006)

28.50 K. Cho, A. Ilin, T. Raiko: Improved learning
of Gaussian-Bernoulli restricted Boltzmann machines,
Proc. Int. Conf. Artif. Neural Netw. (ICANN)
(2011)

28.51 D. Erhan, Y. Bengio, A. Courville, P.A. Manzagol,
P. Vincent, S. Bengio: Why does unsupervised
pre-training help deep learning?, J. Mach. Learn. Res.
11, 625-660 (2010)

28.52 D.C. Ciresan, U. Meier, L.M. Gambardella, J. Schmidhuber:
Deep big simple neural nets for handwritten
digit recognition, Neural Comput. 22(1), 1-14
(2010)

28.53 M. Ranzato, A. Krizhevsky, G.E. Hinton: Factored
3-way restricted Boltzmann machines for modeling
natural images, Proc. Int. Conf. Artif. Intell. Stat.
(AISTATS) (2010) pp. 621-628

28.54 M. Ranzato, V. Mnih, G.E. Hinton: Generating more
realistic images using gated MRF's, Adv. Neural Inf.
Process. Syst. (NIPS) (2010)

28.55 A. Courville, J. Bergstra, Y. Bengio: Unsupervised
models of images by spike-and-slab RBMs, Proc.
Int. Conf. Mach. Learn. (ICML) (2011)

28.56 H. Lee, R. Grosse, R. Ranganath, A.Y. Ng: Unsupervised
learning of hierarchical representations with
convolutional deep belief networks, Commun. ACM
54(10), 95-103 (2011)

28.57 D. Hau, K. Chen: Exploring hierarchical speech
representations with a deep convolutional neural network,
Proc. U.K. Workshop Comput. Intell. (UKCI)
(2011)

28.58 Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen,
G.S. Corrado, J. Dean, A.Y. Ng: Building high-level
features using large scale unsupervised
learning, Proc. Int. Conf. Mach. Learn. (ICML)
(2012)

28.59 S.E. Yuksel, J.N. Wilson, P.D. Gader: Twenty years of
mixture of experts, IEEE Trans. Neural Netw. Learn.
Syst. 23(8), 1177-1193 (2012)

28.60 A. Krogh, J. Vedelsby: Neural network ensembles,
cross validation, and active learning, Adv. Neural
Inf. Process. Syst. (NIPS) (1995)


28.61 N. Ueda, R. Nakano: Generalization error of ensemble
estimators, Proc. Int. Conf. Neural Netw. (ICNN)
(1996) pp. 90-95

28.62 L.I. Kuncheva: Combining Pattern Classifiers
(Wiley-Interscience, Hoboken 2004)

28.63 R.A. Jacobs, M.I. Jordan, S. Nowlan, G.E. Hinton:
Adaptive mixture of local experts, Neural Comput.
3(1), 79-87 (1991)

28.64 M.I. Jordan, R.A. Jacobs: Hierarchical mixture of
experts and the EM algorithm, Neural Comput. 6(2),
181-214 (1994)

28.65 Y. Liu, X. Yao: Simultaneous training of negatively
correlated neural networks in an ensemble, IEEE
Trans. Syst. Man Cybern. B 29(6), 716-725 (1999)

28.66 K. Chen, L. Xu, H.S. Chi: Improved learning algorithms
for mixture of experts in multi-class classification,
Neural Netw. 12(9), 1229-1252 (1999)

28.67 L.K. Hansen, P. Salamon: Neural network ensembles,
IEEE Trans. Pattern Anal. Mach. Intell. 12(10),
993-1001 (1990)

28.68 M.P. Perrone, L.N. Cooper: Ensemble methods for
hybrid neural networks. In: Artificial Neural Networks
for Speech and Vision, ed. by R.J. Mammone
(Chapman-Hall, New York 1993) pp. 126-142

28.69 T.K. Ho: The random subspace method for
constructing decision forests, IEEE Trans. Pattern Anal.
Mach. Intell. 20(8), 823-844 (1998)

28.70 K. Chen, L. Wang, H.S. Chi: Methods of combining
multiple classifiers with different feature
sets and their applications to text-independent
speaker identification, Int. J. Pattern Recogn. Artif.
Intell. 11(3), 417-445 (1997)

28.71 J. Kittler, M. Hatef, R.P.W. Duin, J. Matas: On
combining classifiers, IEEE Trans. Pattern Anal. Mach.
Intell. 20(3), 226-239 (1998)

28.72 D.H. Wolpert: Stacked generalization, Neural Netw.
2(3), 241-259 (1992)

28.73 L. Xu, M.I. Jordan, G.E. Hinton: An alternative
model for mixtures of experts, Adv. Neural Inf. Process.
Syst. (NIPS) (1995)

28.74 L. Xu, A. Krzyzak, C.Y. Suen: Associative switch
for combining multiple classifiers, J. Artif. Neural
Netw. 1(1), 77-100 (1994)

28.75 K. Chen: A connectionist method for pattern
classification with diverse feature sets, Pattern Recogn.
Lett. 19(7), 545-558 (1998)

28.76 K. Chen, H.S. Chi: A method of combining multiple
probabilistic classifiers through soft competition
on different feature sets, Neurocomputing 20(1-3),
227-252 (1998)

28.77 K. Chen: On the use of different representations for
speaker modeling, IEEE Trans. Syst. Man Cybern. C
35(3), 328-346 (2005)

28.78 Y. Yang, K. Chen: Temporal data clustering via
weighted clustering ensemble with different
representations, IEEE Trans. Knowl. Data Eng. 23(2),
307-320 (2011)

28.79 Y. Yang, K. Chen: Time series clustering via RPCL
ensemble networks with different representations,
IEEE Trans. Syst. Man Cybern. C 41(2), 190-199 (2011)

28.80 L. Xu, A. Krzyzak, C.Y. Suen: Methods of combining
multiple classifiers and their applications to
handwriting recognition, IEEE Trans. Syst. Man Cybern.
22(3), 418-435 (1992)

28.81 K. Chen, X. Yu, H.S. Chi: Combining linear discriminant
functions with neural networks for supervised
learning, Neural Comput. Appl. 6(1), 19-41 (1997)

28.82 K. Chen, L.P. Yang, X. Yu, H.S. Chi: A self-generating
modular neural network architecture for supervised
learning, Neurocomputing 16(1), 33-48 (1997)

28.83 B.L. Lu, M. Ito: Task decomposition and module
combination based on class relations: A modular
neural network for pattern classification, IEEE
Trans. Neural Netw. Learn. Syst. 10(5), 1244-1256
(1999)

28.84 L. Cao: Support vector machines experts for time
series forecasting, Neurocomputing 51(3), 321-339
(2003)

28.85 T.G. Dietterich: Ensemble learning. In: Handbook of
Brain Theory and Neural Networks, ed. by M.A. Arbib
(MIT Press, Cambridge 2002) pp. 405-408

28.86 Y. Freund, R.E. Schapire: Experiments with a new
boosting algorithm, Proc. Int. Conf. Mach. Learn.
(ICML) (1996) pp. 148-156

28.87 L. Breiman: Bagging predictors, Mach. Learn. 24(2),
123-140 (1996)

28.88 H. Schwenk, Y. Bengio: Boosting neural networks,
Neural Comput. 12(8), 1869-1887 (2000)

28.89 J. Wang, V. Saligrama: Local supervised learning
through space partitioning, Adv. Neural Inf. Process.
Syst. (NIPS) (2012)

28.90 X.J. Zhu: Semi-supervised learning literature survey,
Technical Report, School of Computer Science
(University of Wisconsin, Madison 2008)

28.91 J. Ghosh, A. Acharya: Cluster ensembles, WIREs Data
Min. Knowl. Discov. 1(2), 305-315 (2011)


29. Machine Learning


James T. Kwok, Zhi-Hua Zhou, Lei Xu


This tutorial provides a brief overview of a number
of important tools that form the crux of the
modern machine learning toolbox. These tools
can be used for supervised learning, unsupervised
learning, reinforcement learning and their numerous
variants developed over the years. Because
of the lack of space, this survey is not intended
to be comprehensive. Interested readers are referred
to conference proceedings such as Neural
Information Processing Systems (NIPS) and the
International Conference on Machine Learning (ICML)
for the most recent advances.


29.1 Overview
29.2 Supervised Learning
  29.2.1 Classification and Regression
  29.2.2 Other Variants of Supervised Learning
29.3 Unsupervised Learning
  29.3.1 Principal Subspaces and Independent Factor Analysis
  29.3.2 Multi-Model-Based Learning: Data Clustering, Object Detection, and Local Regression
  29.3.3 Matrix-Based Learning: Similarity Graph, Nonnegative Matrix Factorization, and Manifold Learning
  29.3.4 Tree-Based Learning: Temporal Ladder, Hierarchical Mixture, and Causal Tree
29.4 Reinforcement Learning
  29.4.1 Markov Decision Processes
  29.4.2 TD Learning and Q-Learning
  29.4.3 Improving Q-Learning by Unsupervised Methods
29.5 Semi-Supervised Learning
29.6 Ensemble Methods
  29.6.1 Basic Concepts
  29.6.2 Boosting
  29.6.3 Bagging
  29.6.4 Stacking
  29.6.5 Diversity
29.7 Feature Selection and Extraction
References


29.1 Overview


Machine learning represents one of the most prolific
developments in modern artificial intelligence. It provides
a new generation of computational techniques and
tools that support understanding and extraction of useful
knowledge from complicated data sets. So what is
machine learning? Simon [29.1] defined machine learning
as:


changes in the system that are adaptive in the sense
that they enable the system to do the same task or
tasks drawn from the same population more effectively
the next time.


Hence, fundamentally, the emphasis of machine learn- 
ing is on the system’s ability to adapt or change. 
Typically, this is in response to some form of experience 
provided to the system. After learning or adaptation, the 
system is expected to have better future performance on 
the same or a related task. 

Over the past decades, machine learning has grown
from a few toy applications to being almost everywhere.
It is now applied to numerous real-world
problems, for example, the control of autonomous
robots that can navigate on their own, the filtering of
spam from mailboxes, the recognition of characters
from handwriting, the recognition of speech on mobile
devices, the detection of faces in digital cameras,
and so on. Indeed, one can find applications of machine
learning ranging from everyday consumer products to
advanced information systems in corporations. Studies
of machine learning may be overviewed from either
the perspective of learning intelligent systems or that
of a machine learning toolbox. From the former perspective,
learning is considered as a process by which an intelligent
system coordinately solves three levels of inverse problems,
namely problem solving for performing pattern recognition
and various other tasks, parameter learning for estimating
unknown parameters in the system, and model
selection for shaping the system configuration with an
appropriate scale or complexity to describe the regularities
underlying a finite set of samples. Different learning
approaches are distinguished by differences in one or more
of three ingredients, namely a learner that has an
appropriate system configuration, a theory that guides
learning, and an algorithm or dynamic procedure that
implements learning. Studies from this
perspective were recently overviewed in [29.2,3] and will
not be further addressed in this chapter. Instead, this
chapter aims at a tutorial on studies of machine learning
from the second perspective, that is, on the important
tools that have been collected in the machine learning
toolbox over the decades. The current prosperity of machine
learning comes not only from further developments of
classical statistical modeling and neural network
learning, but also from emerging achievements of machine
learning and data mining in recent decades. Owing to limited
space, the focus of this tutorial is placed
on the advances made in the last two
decades or so.

Classically, there are three basic learning
paradigms, namely, supervised learning (Sect. 29.2),
unsupervised learning (Sect. 29.3), and reinforcement
learning (Sect. 29.4). In supervised learning, the
learner is provided with a set of inputs together with
the corresponding desired outputs. This is similar
to the familiar human learning process for pattern
recognition, in which a teacher provides examples to
teach children to recognize different objects (say,
animals). Such a pattern recognition task is
characterized by data in which each input sample is associated
with a label, namely labeled data. In the current literature
on machine learning, the term labeled data is also
used more generally to refer to data in which each input is
associated with an output that goes beyond a simple label; this
usage is adopted in this chapter. Section 29.2 provides not only
a tutorial on basic issues of supervised learning but
also an overview of a number of interesting topics
developed in recent years. The coverage of this section
is not complete, e.g., it does not cover the supervised
learning studies in the literature on neural networks.
Interested readers are referred to a number of survey
papers, especially those on multi-layer perceptrons
and radial basis functions [29.4, 5].

Unlike supervised learning, the tasks of unsupervised
learning are characterized by data that consist of only
inputs, that is, the data are unlabeled and there is no
longer a teacher. Unsupervised learning
aims at finding a certain dependence structure underlying
the data by optimizing a learning principle. Considering
different types of structures, studies include not only
the classic topics of data clustering, subspaces, and topological
maps, but also the emerging topics of learning latent
factor models, hidden state-space models, and hierarchical
structures. Section 29.3 also consists of two
parts. The first part provides a tutorial on the three classic
topics, while the second part gives an overview
of the emerging topics. Extensive studies have been made
on unsupervised learning for many decades. Instead of
seeking complete coverage, this section focuses on
a tutorial on fundamentals and an overview of interesting
developments of recent years, mainly based on
a more systematic overview [29.6]. Further, readers are
referred to several recent survey papers, e.g., [29.7] for
an overview of 50 years of studies beyond k-means
for data clustering, [29.8, 9] for subspace and manifold
learning, and [29.10] for topological maps.

The third paradigm is reinforcement learning. Upon
observing the current environment and obtaining some
input (if any), the learner takes an action and moves
to a new environment, receiving an evaluation (reward
or punishment) value for the action. A learning process
takes a series of actions so as to maximize the total
reward received. Unlike unsupervised learning,
the learner gets guidance from an external evaluation.
Also, unlike supervised learning, in which the teacher
clearly specifies the output that corresponds to an input,
in reinforcement learning the learner is only provided
with an evaluative value for the action taken.
Section 29.4 starts with a tutorial on basic issues of
reinforcement learning, especially temporal difference
(TD) learning and Q-learning, followed by improvements on
Q-learning with the help of some unsupervised learning
methods.
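
The interaction loop described above can be sketched with a small tabular Q-learning example. This is only an illustration under stated assumptions: the four-state chain environment, the reward of 1 at the goal, the purely random behaviour policy, and the values of `alpha` and `gamma` are all hypothetical choices made here, not taken from Sect. 29.4.

```python
import random

N_STATES, GOAL = 4, 3          # states 0..3; state 3 is terminal
ACTIONS = (-1, +1)             # move left or right along the chain

def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(2)                     # purely exploratory behaviour policy
            nxt, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[nxt])    # q[GOAL] stays 0, so nothing is added at the end
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = q_learning()
# Greedy action index per non-terminal state (1 means "move right").
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)]
```

The learner never sees the desired action, only the evaluative reward, yet the greedy policy read off the learned table moves toward the goal.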

Besides these three basic learning paradigms, many
more variants have been developed in recent years because
of the advances in machine learning. Some of
these will also be described in this tutorial. They are often
a hybrid of the previous learning paradigms. A very
popular variant is semi-supervised learning (Sect. 29.5),
which uses both labeled data (as in supervised learning)
and unlabeled data (as in unsupervised learning) for
training. This is advantageous as labeled data typically
are expensive and involve tedious human effort, while
a large amount of unlabeled data can often be obtained
in an inexpensive manner (e.g., simply downloaded
from the web). Another hybrid of supervised learning
and unsupervised learning is discriminative clustering.
Here, one adopts a cost function originally used for
supervised learning as a clustering criterion. A well-known
example in this category is maximum
margin clustering [29.11-13], which tries to maximize
the margin (used as a criterion in constructing the highly
successful supervised learning model, the support vector
machine) between clusters.

Moreover, instead of just constructing one learner
from training data, one can construct a set of learners
and combine them to solve the problem. This approach,
known as ensemble learning [29.14], has become very
popular in recent years and will be discussed in more
detail in Sect. 29.6. Finally, before learning can proceed,
the data need to be appropriately represented by
a set of features. In many real-world data sets, there
are often a large number of features, many of which
are redundant or irrelevant. Feature selection and extraction
aim at automatically extracting the good features
and removing the bad ones, and this will be covered in
Sect. 29.7.


29.2 Supervised Learning


A supervised learner is provided with some labeled data
(often called training data). This consists of a set of
training samples, each of which is an input together
with the corresponding desired output. Hence, the first
step in machine learning is to collect these training
samples. Moreover, as each training sample needs to
be represented in a form amenable to the computer
algorithm, one has to define a set of features. As an
example, consider the task of recognizing handwritten
characters on an envelope. To construct the training
samples, one first has to collect a number
of envelopes with handwritten characters on them. Then, the
characters on each envelope have to be separated from
each other. This can be performed either manually or
automatically by some image segmentation algorithm.
Afterwards, each character is a block of pixels (typically
rectangular). A simple feature representation is
to use the intensities of these raw pixels. Each input
is represented as a vector of feature values, and this
vector is called the feature vector. Obviously, it is important
to have a good set of features to work with.
The presence of bad features may confuse the learning
algorithm and make learning more difficult. For
example, in the context of character recognition, the
color of the ink is not relevant to the identity of the
character and so can be considered as a bad feature.
Depending on the domain knowledge, more sophisticated
features can be manually defined. It is desirable that
good features can be automatically extracted and bad
features automatically removed. More details on these
feature selection/extraction algorithms will be covered
in Sect. 29.7. Finally, each character on the envelope
has to be manually labeled.
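
The raw-pixel representation mentioned above can be sketched in a few lines. The function name is hypothetical, and the scaling by 255 assumes 8-bit grayscale intensities, which the text does not specify.

```python
def pixel_features(block):
    """Flatten a rectangular block of pixel intensities into a feature vector."""
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("expected a rectangular block of pixels")
    # Assumption: 8-bit intensities, scaled to the range [0, 1].
    return [float(value) / 255.0 for row in block for value in row]

# A toy 2 x 3 character block with a vertical stroke in the middle column.
block = [[0, 255, 0],
         [0, 255, 0]]
x = pixel_features(block)
```

Each character block of the same size yields a feature vector of the same length, which is what most learning algorithms expect.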

In practice, as real-world data are often dirty,
a significant amount of time may have to be spent on
data pre-processing in order to create the training data.
There are many forms of dirty data. For example, the data can
be incomplete in that certain attribute values (e.g., occupation)
may be missing; they can contain outliers or errors
(e.g., the salary is negative); parts of them may be inconsistent
(e.g., the customer’s age is 42 but his/her birthday is
03/07/2012); they may also be redundant in that there are
duplicate records or unnecessary attributes. All these
problems may be due to faulty or careless data collection,
human/hardware/software problems, or errors in
data transmission, or may arise because the data have come from
a number of different data sources. In all cases, data pre-processing
can have a significant impact on the resultant
machine learning system, as no quality data implies no
quality learning results!
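
The kinds of dirty data listed above can be checked mechanically. The following is a minimal sketch under stated assumptions: the record fields, the reference date, and the reading of 03/07/2012 as 7 March 2012 are all hypothetical choices made for illustration.

```python
from datetime import date

def find_problems(record, today):
    """Return a list of data-quality problems found in one customer record."""
    problems = []
    if record.get("occupation") in (None, ""):
        problems.append("missing occupation")               # incomplete data
    if record.get("salary") is not None and record["salary"] < 0:
        problems.append("negative salary")                  # error or outlier
    birthday, age = record.get("birthday"), record.get("age")
    if birthday is not None and age is not None:
        had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
        expected_age = today.year - birthday.year - (0 if had_birthday else 1)
        if expected_age != age:
            problems.append("age inconsistent with birthday")
    return problems

def drop_duplicates(records):
    """Keep the first occurrence of each identical record (redundant data)."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

today = date(2015, 1, 1)  # hypothetical reference date
records = [
    {"name": "A", "occupation": "", "salary": -100.0, "age": 42, "birthday": date(2012, 3, 7)},
    {"name": "B", "occupation": "clerk", "salary": 3000.0, "age": 30, "birthday": date(1984, 6, 1)},
    {"name": "B", "occupation": "clerk", "salary": 3000.0, "age": 30, "birthday": date(1984, 6, 1)},
]
clean = drop_duplicates(records)
report = [find_problems(r, today) for r in clean]
```

What to do with a flagged record (repair, impute, or discard) is a separate decision that depends on the application.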


29.2.1 Classification and Regression 


The two main goals in supervised learning are (i) classification,
which aims at assigning the input pattern
to one of several categories (also called classes or labels);
and (ii) regression, which aims at predicting a real
value or vector associated with the input. The basic
idea and the training/testing procedures in regression
are similar to those in classification. Hence, we
will mainly focus on the classification problem in the
sequel.




The simplest case for classification is binary classi- 
fication, in which there are only two classes. Examples 
include classifying an email as spam or non-spam; and 
classifying an image as face or non-face. For each sam- 
ple, the supervised learner examines the feature values 
in that sample and predicts the class that the sample be- 
longs to. Essentially, the supervised learner partitions 
the whole feature space (the space of all possible fea- 
ture value combinations) into two regions, one for each 
class. The boundary is called the decision boundary. 
A wide variety of models can be used to construct 
this decision boundary. A simple example is the linear 
classifier, which creates a linear boundary. Depending 
on the task, the linear classifier may be too simple 
to differentiate the two classes. Then, one can also 
use a more complicated decision boundary, such as 
a quadratic surface, leading to a quadratic classifier. 
In machine learning, a large number of various mod- 
els that are capable of producing nonlinear decision 
boundaries have been proposed. The most popular ones 
include the decision tree classifier, nearest neighbor 
classifiers, neural network classifiers, Bayesian clas- 
sifiers, and support vector machines. Each of these 
models has some parameters that have to be adapted 
to the particular data set. For example, the parameters 
of a linear classifier include the weight on each fea- 
ture (which controls the slope of the linear boundary) 
and a bias (which controls the offset). To estimate or 
train these parameters, one has to provide a training set, 
where the i-th training pattern $(x_i, y_i)$ consists of an input
$x_i$ and the corresponding target output label $y_i$ (for
regression problems, this $y_i$ is a real value or vector).
The greater the amount of training data, intuitively the 
more accurate the learned model. However, since the 
training data in supervised learning are labeled, obtaining
these output labels typically involves expensive and
tedious human effort. Hence, recent machine learning
algorithms also try to utilize data that are unlabeled,
leading to the development of semi-supervised learning
algorithms (Sect. 29.5).

Given the model, different strategies can be used 
to learn the model parameters so that it fits the train- 
ing set (i. e., train the model). Parameter estimation and 
feature selection (Sect. 29.7) can sometimes be per- 
formed together. However, note that there is the danger 
of overfitting, which occurs when the model performs 
better than other models on the training data, but worse 
on the entire data distribution as it has captured the 
trends of the noise underlying the data. Often this happens
when the model is excessively complex, such as when it has
many more parameters than can be reliably estimated from the
limited number of training patterns. To combat overfitting,
one can constrain the model's freedom during training by
adding a regularizer, or by placing a Bayesian prior on the
parameters or the model beforehand.
Alternatively, one can stop the learning procedure be- 
fore convergence (early stopping) or remove part of the 
model when training is complete (pruning). If there are 
noisy training samples that significantly deviate from 
the underlying input-output trend, one can also per- 
form outlier detection to first remove these outlying 
samples. 

There are two general approaches to train the model 
parameters. The first approach treats the model as a gen- 
erative model that defines how the data are generated 
(typically by using a probabilistic model). One can then 
maximize the likelihood by varying the parameters, or 
to maximize the posterior probability of the parame- 
ters given the training data. Alternatively, one can take 
a discriminative approach that directly considers how 
the output is related to the input. The parameters can be 
obtained by empirical risk minimization, which seeks 
the parameters which best fit the training data. The risk 
is dependent on the loss function, which measures the 
difference between the prediction and the target output.
Let $y_i$ be the target output for sample $i$, and $\hat{y}_i$
be the predicted output from the supervised learner.
For classification problems, commonly used loss functions
include the logistic loss $\ln(1 + \exp(-y_i \hat{y}_i))$ and
the hinge loss $\max(0, 1 - y_i \hat{y}_i)$; and for regression
problems, the most common loss function is the square loss
$(y_i - \hat{y}_i)^2$. However, in order to combat overfitting, it is
better to perform regularized risk minimization instead 
of empirical risk minimization. Regularized risk con- 
sists of two components. The first component is the loss 
as in empirical risk minimization. The second compo- 
nent is a regularizer, which helps to control the model 
complexity and prevents overfitting. Various regularizers
have been proposed. Let $w = [w_1, w_2, \dots, w_d]^T$ be
the vector of parameters. A popular regularizer is the
(squared) $\ell_2$-norm of $w$, i.e.,

\[
\|w\|_2^2 = \sum_{i=1}^{d} w_i^2 .
\]


This leads to ridge regression when the linear model is 
used, and is commonly called weight decay in the neural 
networks literature. Instead of using the $\ell_2$-norm, one
can use the $\ell_0$-norm $\|w\|_0$, which counts the number
of nonzero $w_i$ in the model. However, this is nonconvex
and the associated optimization is more difficult.
A common way to alleviate this problem is by using the
$\ell_1$-norm

\[
\|w\|_1 = \sum_{i=1}^{d} |w_i| ,
\]

which is still convex (as for the $\ell_2$-norm) but can still
lead to a sparse parameter solution (as for the $\ell_0$-norm).
When used with the square loss on the linear model, this
leads to the well-known lasso model.
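
The loss functions and regularizers discussed above can be written down in a few lines. The following is a minimal NumPy sketch, not a reference implementation: the function names, the trade-off parameter `lam`, and the choice of averaging the loss over the training set are illustrative assumptions, and labels are assumed to be in {-1, +1} for the classification losses.

```python
import numpy as np

def logistic_loss(y, y_hat):
    """ln(1 + exp(-y * y_hat)), for labels y in {-1, +1}."""
    return np.log1p(np.exp(-y * y_hat))

def hinge_loss(y, y_hat):
    """max(0, 1 - y * y_hat), for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * y_hat)

def square_loss(y, y_hat):
    """(y - y_hat)^2, the usual regression loss."""
    return (y - y_hat) ** 2

def l0_norm(w):
    """Number of nonzero parameters."""
    return int(np.count_nonzero(w))

def l1_norm(w):
    """Sum of absolute parameter values."""
    return float(np.sum(np.abs(w)))

def l2_norm_squared(w):
    """Sum of squared parameter values."""
    return float(np.sum(np.asarray(w) ** 2))

def regularized_risk(w, X, y, loss, regularizer, lam):
    """Average loss of the linear model X @ w plus lam * regularizer(w)."""
    return float(np.mean(loss(y, X @ w)) + lam * regularizer(w))

def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||X w - y||^2 + lam * ||w||_2^2 (ridge regression)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

Ridge regression has the closed-form solution used in `ridge_fit`; the lasso has no such closed form in general and is usually solved iteratively, which is why it is not sketched here.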

Once trained, the classifier can be used to predict 
the label of an unseen test sample. The underlying as- 
sumption is that this test sample comes from the same 
distribution as that of the training samples. In this case, 
we expect the trained classifier to be able to generalize 
well to this new sample. This can also be formally de- 
scribed by generalization error bounds in computational 
learning theory. 

There are multiple ways to measure the perfor- 
mance of a trained classifier. An obvious performance 
evaluation criterion is classification accuracy, which is 
the fraction of test samples that are correctly classi- 
fied (by comparing the prediction obtained from the 
classifier and the true class output of the test sample). 
As mentioned above, because of the issue of overfit- 
ting, it can be misleading to simply gauge classification 
accuracy on the training set. Instead, one can mea- 
sure classification accuracy on a separate validation set 
(which is used as a proxy for the underlying data distri- 
bution), or use cross-validation. Moreover, sometimes, 
when the sample sizes of the two classes differ sig- 
nificantly, this accuracy may again be misleading, as 
one may attain an apparently high accuracy by sim- 
ply predicting the test sample to belong to the majority 
class. In these cases, other measures such as precision, 
recall, and F-measure may be more useful. Moreover, 
while the classifier's accuracy is often an important criterion,
other aspects may also be important, such as the
computational complexity of training and testing (in
both time and space) and user-friendliness (e.g., whether
the trained model is a black box or can be
easily conveyed and explained to the users).
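
To make the point about imbalanced classes concrete, here is a small sketch in plain Python. The function name `binary_metrics`, the convention of returning 0 when a ratio is undefined, and the use of the balanced F-measure (F1) are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one designated positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Undefined ratios (no predicted or no actual positives) are reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# 95 negatives, 5 positives; the classifier always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
metrics = binary_metrics(y_true, y_pred)
```

In this example the accuracy is 0.95 although not a single positive sample is found, so recall and F1 are both 0.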

While binary classification assumes the presence of 
only two output classes, many real-world applications 
have more than two (say, K), leading to a multi-class 
classification problem. There are two common ap- 
proaches to reduce a multi-class classification problem 
to binary classification problems, namely, the one-vs- 
rest (also called one-vs-all) approach and the one-vs- 
one approach. In the one-vs-rest approach, K binary 
classifiers are constructed, each one separating the sam- 
ples belonging to the i-th class from those that do not. 


On prediction, the test sample is sent to all the K classi- 
fiers, and its label corresponds to the classifier with the 
highest output. In the one-vs-one approach, one binary
classifier is built for each pair of outputs (e.g., outputs $i$
and $j$), and each classifier tries to discriminate samples
belonging to the $i$-th class from those belonging to the $j$-th
class. Thus, there are a total of $\tfrac{1}{2}K(K-1)$ classifiers.
On prediction, the test sample is again sent to all the
binary classifiers, and the class that receives the largest
number of votes is output.
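
The two prediction rules can be sketched as follows. This is an illustrative outline only: the binary classifiers are assumed to be already trained and are represented by their outputs (`scores`) or by a hypothetical callable `pairwise_winner`, and the tie-breaking rule is an arbitrary choice.

```python
from collections import Counter
from itertools import combinations

def one_vs_rest_predict(scores):
    """scores[k] is the real-valued output of the 'class k vs. rest' classifier."""
    return max(range(len(scores)), key=lambda k: scores[k])

def num_one_vs_one_classifiers(num_classes):
    """K(K-1)/2 pairwise classifiers are needed for K classes."""
    return num_classes * (num_classes - 1) // 2

def one_vs_one_predict(pairwise_winner, num_classes):
    """pairwise_winner(i, j) returns i or j, the class chosen by the (i, j) classifier."""
    votes = Counter(pairwise_winner(i, j)
                    for i, j in combinations(range(num_classes), 2))
    # Most votes wins; ties go to the smallest class index (arbitrary choice).
    return max(range(num_classes), key=lambda k: (votes[k], -k))
```

For K = 4 classes, one-vs-rest needs 4 classifiers while one-vs-one needs 6, but each pairwise classifier is trained on the samples of only two classes.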


29.2.2 Other Variants 
of Supervised Learning 


Multi-Label Classification 
While an instance belongs to one and only one class
in multi-class classification, an instance in multi-label
classification can belong to multiple classes. Many
real-world applications involve multi-label classifica- 
tion. For example, in text categorization, a document 
can belong to more than one category, such as gov- 
ernment and health; in bioinformatics, a gene may 
be associated with more than one function, such as 
metabolism, transcription, and protein synthesis; and 
in image classification, an image may belong to multi- 
ple semantic categories, such as beach and urban. Note 
that the number of labels associated with an unseen 
instance is unknown and can also vary from instance 
to instance. Hence, this makes the multi-label classifi- 
cation problem more complicated than the multi-class 
classification problem. In the special case where the 
number of labels associated with each instance is al- 
ways equal to one, obviously multi-label classification 
reduces to multi-class classification. 

In general, multi-label classification algorithms can 
be divided into two categories: problem transformation 
and algorithm adaptation [29.15]. Problem transfor- 
mation methods transform a multi-label classification 
problem into one or more single-label classification 
problems. The basic approach (called binary relevance) 
simply decomposes a multi-label problem with K la- 
bels into K binary classification problems, one for 
each label. In other words, the i-th classifier is a bi- 
nary classifier that tries to decide whether the sample 
belongs to the i-th class. However, since this consid- 
ers the labels independently, any possible correlations 
among labels will be ignored, leading to inferior per- 
formance in problems with highly correlated labels. 
More refined variants thus take the label correlation 
into account during training, a similar idea that is also 
exploited in multi-task learning (Sect. 29.2.2). On the 




other hand, algorithm adaptation methods extend a spe- 
cific learning algorithm for multi-label classification. 
The specific extension is thus tailor-made for each in- 
dividual learning algorithm and less general. Example 
learning algorithms that have been extended in this 
way include boosting, decision trees, ensemble meth- 
ods, neural networks, support vector machines, genetic 
algorithms, and the nearest-neighbor classifier. Recent 
surveys on the progress of multi-label classification and 
its use in different applications can be found in [29.15, 
16]. 
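
As an illustration of the binary relevance transformation described above, the following sketch splits a multi-label problem into K independent binary problems. The helper names and the representation of each trained binary classifier as a plain callable returning True/False are assumptions; any binary learner could be plugged in.

```python
import numpy as np

def binary_relevance_targets(Y):
    """Split an (n_samples, K) 0/1 label matrix into K binary target vectors."""
    Y = np.asarray(Y)
    return [Y[:, k] for k in range(Y.shape[1])]

def binary_relevance_predict(binary_classifiers, x):
    """Apply K independent binary classifiers; return the set of predicted labels."""
    return {k for k, clf in enumerate(binary_classifiers) if clf(x)}
```

Because each label is decided separately, the number of predicted labels can vary from instance to instance, and any correlation among labels is ignored, as noted in the text.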

In many applications, the labels are often organized 
in a hierarchy, either in the form of a tree (such as 
documents in Wikipedia) or as a directed acyclic graph 
(such as gene ontology). An instance is associated 
with a label only if it is also associated with the label’s 
parent(s) in the hierarchy. Recently, progress has 
also been made in multi-label classification in these 
structured label hierarchies [29.17–19].


Multi-Instance Learning 
In multi-instance learning (MIL), the training set is 
composed of many bags each containing multiple in- 
stances, where a positive bag contains at least one 
positive instance, whereas a negative bag contains 
only negative instances; labels of the training bags are 
known, but labels of the instances are unknown. The 
task is to make predictions for labels of unseen bags. 
The multi-instance learning framework is illustrated in 
Fig. 29.1. Notice that the instances are described by the 
same feature set, rather than different feature sets. 
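
The relation between instance labels and bag labels can be stated in code. This is a minimal sketch under the standard assumption given above; `instance_scorer` and `threshold` are hypothetical names, and taking the maximum instance score is only one simple way to move from instances to bags.

```python
def bag_label(instance_labels):
    """A bag is positive iff it contains at least one positive instance."""
    return any(instance_labels)

def predict_bag(instance_scorer, bag, threshold=0.5):
    """Call a bag positive if its highest-scoring instance exceeds the threshold."""
    return max(instance_scorer(x) for x in bag) > threshold
```

During training only the output of `bag_label` is observed for each bag; the individual instance labels that produced it remain hidden.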

The MIL learning framework originated from the 
study of drug activity [29.20], where a molecule with 
multiple low-energy shapes is known to be useful to 
make a drug, whereas it is unknown which shape is crucial.
Later, many real tasks were found to be natural multi-instance
learning problems. For example, in image retrieval, if we
regard each image patch as an instance, then the fact that
a user is interested (or not interested) in an image implies
that there is at least one patch (or no patch) that contains
an object of interest to the user.


Fig. 29.1 Illustration of multi-instance learning


Most MIL methods attempt to adapt single-instance 
supervised learning algorithms to the multi-instance 
representation by shifting their focus from discrimi- 
nation on instances to discrimination on bags; there 
are also methods that try to adapt the multi-instance 
representation to single-instance algorithms by repre- 
sentation transformation [29.21]. Recently, it has been 
recognized that the instances in the bags should not be 
treated independently [29.22]; otherwise MIL is a spe- 
cial case of semi-supervised learning [29.23]. 

In addition to classification, multi-instance regres- 
sion and clustering have been studied, and different 
versions of generalized multi-instance learning have 
been defined [29.24, 25]. To deal with complicated data 
objects that are associated with multiple labels simulta- 
neously, a new framework, multi-instance, multi-label 
learning (MIML) [29.26], was developed recently. 

Notice that the original MIL assumption implies 
that there exists a key instance in a positive bag; 
later, some other assumptions were introduced [29.27]. 
For example, some methods assumed that there is no 
key instance and every instance contributes to the bag 
label. 


Multi-View Learning 
In many real tasks there is more than one feature set. 
For example, a video film can be described by au- 
dio features, image features, etc.; a web page can be 
described by features characterizing its own content, 
features characterizing its linked pages, etc. A classical 
routine is to take these features together and represent 
each instance using a concatenated feature vector. The 
different feature sets, however, usually convey informa- 
tion from different channels, and therefore, it may be 
better to consider the difference explicitly. This mo- 
tivates multi-view learning, where each feature set is 
called a view. 

Each instance in multi-view learning is represented 
by multiple feature vectors each in a different, usually 
non-overlapping feature set. Multi-view learning methods
in the supervised learning setting are closely related
to studies of information fusion, combining classifiers
[29.28–30], and ensemble methods [29.14]. A popular
approach is to construct a model from each
view, and then combine their predictions using voting or
averaging. The models are often assigned different
weights, reflecting their different strength, reliability,
and/or importance.
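
Weighted averaging of per-view predictions might look as follows. This is a sketch assuming each view's model outputs a vector of class probabilities; the function name `combine_views` and the normalization of the weights are illustrative choices.

```python
import numpy as np

def combine_views(view_probs, weights=None):
    """Weighted average of per-view class-probability vectors.

    view_probs: array-like of shape (n_views, n_classes).
    Returns the predicted class index and the averaged probabilities.
    """
    P = np.asarray(view_probs, dtype=float)
    if weights is None:
        weights = np.ones(P.shape[0])      # equal trust in every view
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize weights to sum to one
    avg = w @ P
    return int(np.argmax(avg)), avg
```

With two views predicting `[0.6, 0.4]` and `[0.2, 0.8]`, equal weights favor the second class, whereas trusting the first view four times as much flips the decision to the first class.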

Multi-views make great sense when unlabeled data 
are considered. For example, it has been proved that 
when there are sufficient and redundant views (that is, 




each view contains sufficient information for construct- 
ing a good model, and the two views are conditionally 
independent given the class label), co-training is able 
to boost the performance of any initial weak learner to 
an arbitrary performance using unlabeled data [29.31]. 
Later, it was found that such a process is beneficial 
even when the two views satisfy weaker assumptions, 
such as weak dependence, expansion, or large diver- 
sity [29.32-34], and when there are really sufficient and 
redundant views, even semi-supervised learning with 
a single labeled example is possible [29.35]. Moreover, 
in active learning where the learner actively selects 
some unlabeled instances to query their labels from an 
oracle (such as a human expert), it has been proved 
that multi-view learning enables exponential improve- 
ment of sample complexity in a setting close to real 
tasks [29.36], whereas previously it was believed that 
only polynomial improvement is possible. 


Multi-Task Learning 
Many real-world problems involve the learning of 
a number of similar tasks. Consider the simple example
of learning to recognize the numeric digits 0–9. One
can build ten separate classifiers, one for each digit. 
However, apparently these ten classifiers share some 
common features, e.g., many of the digits consist of 
loops and strokes. Hence, the ability to detect these 
higher latent features is of common interest to all these 
classifiers, and learning all these tasks together will 
allow them to borrow strength from each other. Moreover,
when the number of training examples is small for
each task, most single-task learning methods may fail.
By learning them together, better generalization per- 
formance can be obtained by harnessing the intrinsic 
task relationships. Consequently, this leads to the
development of multi-task learning (MTL) [29.37]. These
different tasks have different output spaces and can also
have different input features. Quite often, however, the
tasks share the same set of input features. In this case,
the problem is similar to multi-label classification
(Sect. 29.2.2).

A popular MTL approach is regularized multi-task 
learning (RMTL) [29.38, 39]. It assumes that the tasks 
are highly related, and encourages the parameters of all 
the tasks to be close. More specifically, let there be $T$
tasks and denote the parameter associated with the $t$-th
task by $w_t$. RMTL assumes that all the $w_t$'s are close
to some shared vector $w$, and that the $w_t$'s differ from each
other only by a term $\Delta w_t$, as $w_t = w + \Delta w_t$. Hence,
$w$ represents the component that is shared by all the
tasks, and thus can benefit from learning all the tasks
together; while $\Delta w_t$ is the component that is specific to
each individual task, and can be used to capture the
individual variations. Alternatively, other MTL methods,
such as multi-task feature learning (MTFL) [29.40], assume
that all the tasks lie in a shared low-dimensional
space.
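
One way to turn the decomposition $w_t = w + \Delta w_t$ into a learning problem is to penalize the shared and the task-specific parts separately. The objective below is a sketch of such a formulation, not necessarily the exact one used in the cited works: the loss $\ell$, the number $n_t$ of training pairs $(x_{ti}, y_{ti})$ of task $t$, and the trade-off parameters $\lambda_1, \lambda_2$ are notation assumed here for illustration.

```latex
\min_{w,\ \Delta w_1, \dots, \Delta w_T} \quad
  \sum_{t=1}^{T} \sum_{i=1}^{n_t}
    \ell\bigl( y_{ti},\ (w + \Delta w_t)^T x_{ti} \bigr)
  \;+\; \frac{\lambda_1}{T} \sum_{t=1}^{T} \| \Delta w_t \|_2^2
  \;+\; \lambda_2 \, \| w \|_2^2
```

A large $\lambda_1$ relative to $\lambda_2$ forces the $\Delta w_t$ towards zero, so that all tasks share nearly the same parameters; a small $\lambda_1$ lets each task deviate freely, approaching independent single-task learning.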

Moreover, tasks may form several clusters
rather than a single group. If such a task
clustering structure is known, then a simple remedy is 
to constrain task sharing to be just within the same clus- 
ter [29.39, 41]. More generally, all the tasks are related 
in different degrees, which can be represented by a net- 
work of task relationships [29.42]. In this case, MTL 
can also be performed. In practice, however, such an 
explicit knowledge of task clusters/network may not be 
readily available. 

A number of efforts have been made towards identifying
task relationships simultaneously during parameter
learning, e.g., learning a low-dimensional subspace
shared by most of the tasks [29.43], finding the correlations
between tasks [29.44], and inferring the clustering
structure [29.45, 46], as well as integrating low-rank
and group-sparse structures for robust multi-task learning
[29.47].


Transfer Learning 
As discussed in Sect. 29.1, traditionally, machine learn- 
ing is defined as: 


changes in the system that are adaptive in the sense 
that they enable the system to do the same task or 
tasks drawn from the same population more effec- 
tively the next time. 


However, recently, there has been increasing interest in 
adapting a classifier/regressor trained in one task for 
use in another. This so-called transfer learning is par- 
ticularly crucial when the target application is in short 
supply of labeled data. For example, it is very expensive 
to calibrate a WiFi localization model in a large-scale 
environment. To reduce re-calibration effort, we might 
want to adapt the localization model trained in one 
time period (source domain) for a new time period (tar- 
get domain), or to adapt the localization model trained 
on a mobile device (source domain) for a new mo- 
bile device (target domain). However, the WiFi signal 
strength is a function of time, device, and other dynamic 
factors. Thus, transfer learning is used to adapt the dis- 
tributions of WiFi data collected over time or across 
devices. 

In general, transfer learning addresses the problem 
of how to utilize plentiful labeled data in a source domain
to solve related but different problems in a target
domain, even when the training and testing problems
have different distributions or features. The success of
transferring learning from one context to another
depends on how similar the learning task is to the transferred
task. There are two main approaches to transfer
learning. The first approach tries to learn a common set 
of features from both domains, which can then be used 
for knowledge transfer [29.48-50]. Intuitively, a good 
feature representation should be able to reduce the dif- 
ference in distributions between domains as much as 
possible, while at the same time preserving important 
(geometric or statistical) properties of the original data. 
With a good feature representation, we can apply stan- 
dard machine learning methods to train classifiers or 
regression models in the source domain for use in the 
target domain. The second approach to transfer learning 
is based on instances [29.51–53]. It tries to learn different
weights on the source examples for better adaptation
in the target domain. For example, in the kernel mean 
matching algorithm [29.52], instances in a reproducing 
kernel Hilbert space are re-weighted based on the the- 
ory of maximum mean discrepancy. 
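
To illustrate the quantity that instance re-weighting tries to reduce, the sketch below computes a (biased) empirical estimate of the squared maximum mean discrepancy between weighted source samples and target samples. It only evaluates the discrepancy for given weights; it is not the kernel mean matching algorithm itself, which additionally solves a constrained optimization problem for the weights. The RBF kernel, the parameter `gamma`, and the function names are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd_squared(X_src, X_tgt, gamma=1.0, weights=None):
    """Squared MMD between re-weighted source samples and target samples."""
    n, m = X_src.shape[0], X_tgt.shape[0]
    beta = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    beta = beta / beta.sum()               # source weights, normalized
    alpha = np.full(m, 1.0 / m)            # target samples weighted uniformly
    return float(beta @ rbf_kernel(X_src, X_src, gamma) @ beta
                 + alpha @ rbf_kernel(X_tgt, X_tgt, gamma) @ alpha
                 - 2.0 * beta @ rbf_kernel(X_src, X_tgt, gamma) @ alpha)

# Source over-represents the point 0; target is balanced between 0 and 5.
X_src = np.array([[0.0], [0.0], [5.0]])
X_tgt = np.array([[0.0], [5.0]])
mmd_uniform = mmd_squared(X_src, X_tgt)
mmd_reweighted = mmd_squared(X_src, X_tgt, weights=[1.0, 1.0, 2.0])
```

In this toy example, doubling the weight of the under-represented source point makes the weighted source distribution match the target, so the discrepancy drops to (numerically) zero.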


Cost-Sensitive Learning 
In many real tasks, the costs of making different 
types of mistakes are usually unequal. In such sit- 
uations, maximizing the accuracy (or equivalently, 
minimizing the number of mistakes) may not pro- 
vide the optimal decision. For example, two instances 
that each cost 10 dollars are less important than one 
instance that costs 50 dollars. Cost-sensitive learning
methods attempt to minimize the total cost by
reducing serious mistakes, even at the expense of making
more minor mistakes.

There are two types of misclassification costs, i.e.,
example-dependent and class-dependent costs. The former
assumes that every example has its own misclassification
cost, whereas the latter assumes that every class
has its own misclassification cost. Obtaining
example-dependent costs is usually much more difficult in real
practice, and therefore most studies focus on class-
dependent cost. 
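
With class-dependent costs in a binary problem, the effect of unequal costs can be seen in a short sketch. It assumes that the classifier outputs an estimate of the probability of the positive class and that correct predictions cost nothing; the names `cost_fp` (cost of a false positive) and `cost_fn` (cost of a false negative) are introduced here for illustration.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' has the lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation, predicting
    negative costs p * cost_fn, so positive is cheaper when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Sum of misclassification costs over all samples (labels are 0 or 1)."""
    cost = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 1:
            cost += cost_fp
        elif t == 1 and p == 0:
            cost += cost_fn
    return cost

# Missing a positive costs 50, a false alarm costs 10.
probs = [0.1, 0.3, 0.6, 0.2]
y_true = [0, 1, 1, 0]
thr = cost_sensitive_threshold(cost_fp=10, cost_fn=50)
pred_plain = [int(p >= 0.5) for p in probs]
pred_cost = [int(p >= thr) for p in probs]
cost_plain = total_cost(y_true, pred_plain, 10, 50)
cost_aware = total_cost(y_true, pred_cost, 10, 50)
```

Both decision rules make exactly one mistake on this toy data, but the default threshold of 0.5 makes the expensive one (total cost 50) while the cost-aware threshold of 1/6 makes the cheap one (total cost 10).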


The essence of most cost-sensitive learning methods
is rescaling (or rebalancing), which tries to rebalance
the classes such that the influence of each class in the
learning process is in proportion to its cost. Suppose
the cost of misclassifying the $i$-th class to the $j$-th class
is $\mathrm{cost}_{ij}$. For binary classification, it can be derived from
the Bayes risk theory that the optimal rescaling ratio of
the $i$-th class against the $j$-th class is
$\tau_{ij} = \mathrm{cost}_{ij} / \mathrm{cost}_{ji}$ [29.54].
For multi-class problems, however, there is no direct
solution to obtain the optimal rescaling ratios [29.55], and
one may want to decompose a multi-class problem into
a series of binary problems to solve.

Rescaling can be implemented in different ways,
e.g., re-weighting or re-sampling the training examples
of different classes, or even moving the decision
threshold directly towards the cheaper class. It can be
easily incorporated into existing supervised learning
algorithms. For example, for support vector machines,
the corresponding optimization problem can be written
as

\[
\begin{aligned}
\min_{w, b, \xi} \quad & \frac{1}{2} \|w\|_{\mathcal{H}}^2 + C \sum_{i=1}^{m} \mathrm{cost}(x_i)\, \xi_i \\
\text{s.t.} \quad & y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i , \\
& \xi_i \ge 0 , \quad i = 1, \dots, m ,
\end{aligned} \tag{29.1}
\]

where $\phi$ is the feature map induced from a kernel
function and $\mathrm{cost}(x_i)$ is the cost for misclassifying $x_i$.
The only difference from the
classical support vector machine is the insertion of
$\mathrm{cost}(x_i)$.

It is often difficult to know precise costs in real
practice, and some recent studies have tried to address
this issue [29.56]. Notice that a learning process
may involve various costs such as the test cost,
teacher cost, intervention cost, etc. [29.57], and these
costs can also be considered in cost-sensitive learning.
Last but not least, it should be noted that the
variants introduced in this section already go beyond
the classic paradigm of supervised learning. Many of
them are integrated with unsupervised learning. Some
further issues will also be addressed in the following
sections.


29.3 Unsupervised Learning


Given a set $X_N = \{x_t\}_{t=1}^{N}$ of unlabeled data samples,
unsupervised learning aims at finding a certain dependence
structure underlying the data $X_N$ with the help of
a learning principle. The simplest one is the structure
of merely a point $\mu$ in a vector space as illustrated in
Fig. 29.2a(A(2)). It represents each sample $x_t$ with an
error measure $\varepsilon_t = \|x_t - \mu\|^2$. The best $\mu$ may be
obtained under a learning principle, e.g., minimizing the
following error

\[
E = \sum_{t=1}^{N} \varepsilon_t , \tag{29.2}
\]

which results in that $\mu$ is simply the mean of the samples.
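
That the minimizer is the sample mean follows from setting the gradient of the total squared error to zero. The short derivation below writes the error as the sum of squared distances $\|x_t - \mu\|^2$ over the $N$ samples; a constant factor such as $1/N$ in front of the sum would not change the result.

```latex
\frac{\partial E}{\partial \mu}
  = \frac{\partial}{\partial \mu} \sum_{t=1}^{N} \| x_t - \mu \|^2
  = -2 \sum_{t=1}^{N} ( x_t - \mu ) = 0
\quad \Longrightarrow \quad
\mu = \frac{1}{N} \sum_{t=1}^{N} x_t .
```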

Research efforts have gone far beyond a simple point
structure. As illustrated in Fig. 29.2, these efforts are
roughly grouped into several closely related streams. 
One consists of those listed in Fig. 29.2a(A), fea- 
tured by increasing the dimensionality of the modeling 
structure from a single point to a line, plane, and sub- 
space. The second stream consists of those listed in 
Fig. 29.2a(B), with multiple structures replacing its 
counterparts listed in Fig. 29.2a(A). The third stream 
consists of those listed in Fig. 29.2b(C), based on 
matrix/graph representation of underlying dependence 
structures. Moreover, another stream is featured with 
underlying dependencies in tree structures, such as tem- 
poral modeling, hierarchical learning, and causal tree 
structuring, as illustrated in Fig. 29.2b(D). This section 
will provide a tutorial on the basic structures listed in 
Fig. 29.2. Also, an overview will be made of a number 
of emerging topics, mainly coming from a recent sys- 
tematic overview [29.6]. 

Additionally, there is also a stream of studies 
that not only consist of unsupervised learning as 
a major ingredient but also include features of su- 
pervised learning and reinforcement learning, some 
of which are referred to under the term semi- 
supervised learning, while others are referred to un- 
der the names of semi-unsupervised learning, hy- 
brid learning, mixture of experts, etc. Among them, 
semi-supervised learning has become a well-adopted 
name in the literature of machine learning and will 
be further introduced in Sect. 29.5. Moreover, read- 
ers are further referred to Sect. 4.3 of [29.58] 
and [29.59] for a general formulation called semi-blind 
learning. 


29.3.1 Principal Subspaces 
and Independent Factor Analysis 


When a point $\mu$ is replaced with a line structure as illustrated
in Fig. 29.2a(A(3)), e.g., represented by a vector
$w_\mu = w - \mu$ of a unit length, we consider the error $\varepsilon_t$
as the shortest distance from $x_t$ to the line. Then, minimizing
$E$ by (29.2) results in that $w_\mu$ is the principal
component direction of the sample set $X_N$, that is, we
have

\[
S w_\mu = \lambda w_\mu , \quad
S = \frac{1}{N} \sum_{t=1}^{N} (x_t - \mu)(x_t - \mu)^T , \tag{29.3}
\]

where $\lambda$ is the largest eigenvalue of the sample covariance
matrix $S$, and $w_\mu$ is the corresponding eigenvector.

Fig. 29.2a,b Four streams of unsupervised learning studies
featured by types of underlying dependence structures

Moreover, considering a plane or subspace as illustrated
in Fig. 29.2a(A(4)) results in a principal
plane or subspace, i.e., a subspace spanned by the
$m$ eigenvectors that correspond to the $m$ largest
eigenvalues of $S$. Usually, the tasks in Fig. 29.2a(A(3))
and Fig. 29.2a(A(4)) are called principal component
analysis (PCA) and principal subspace analysis
(PSA), respectively.
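
The batch computation of a principal subspace can be sketched with NumPy as follows. The function name `principal_subspace` is an assumption; the sample covariance uses the $1/N$ normalization of (29.3), and `numpy.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def principal_subspace(X, m):
    """Return the sample mean, the m largest eigenvalues of the sample
    covariance, and the corresponding eigenvectors (as columns)."""
    mu = X.mean(axis=0)
    Xc = X - mu                              # center the samples
    S = Xc.T @ Xc / X.shape[0]               # sample covariance, 1/N normalization
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest
    return mu, eigvals[order], eigvecs[:, order]

# Four points spread more widely along the first axis than the second.
X = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
mu, lam, W = principal_subspace(X, m=1)
```

For this toy data the covariance is diagonal with entries 2 and 0.5, so the principal direction is the first coordinate axis (up to sign). Taking the smallest eigenvalues instead of the largest would give the minor components.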

Considering a subspace spanned by the column vectors
of $A$, each sample $x_t$ is represented by $\hat{x}_t = A y_t$ from
a vector $y_t = [y_t^{(1)}, \dots, y_t^{(m)}]^T$ in the subspace with the
mutually independent elements of $y_t$ being the coordinates
along these column vectors, subject to an error
$e_t = x_t - \hat{x}_t$ that is uncorrelated to or independent of $y_t$.
Thus, as illustrated in Fig. 29.2a(A(5)), $x_t$ comes from
an underlying subspace as follows

\[
x_t = \hat{x}_t + e_t = A y_t + e_t , \quad
E\, e_t y_t^T = 0 \ \text{ or } \ p(e_t \mid y_t) = p(e_t) , \tag{29.4}
\]

featured with the following independence

\[
p(y_t) = p(y_t^{(1)}) \cdots p(y_t^{(m)}) . \tag{29.5}
\]

Particularly, it is called factor analysis (FA) if we consider

\[
p(y_t) = G(y_t \mid 0, I) , \quad
p(e_t) = G(e_t \mid 0, D) \ \text{ for a diagonal } D , \tag{29.6}
\]

where $G(x \mid \mu, \Sigma)$ denotes a Gaussian density with the
mean vector $\mu$ and the covariance matrix $\Sigma$.

In general, the matrix $A$ and other unknowns
in (29.4) are estimated by the maximum likelihood
learning on

\[
q(x \mid \theta) = G(x \mid 0, A A^T + D) , \tag{29.7}
\]

with help of the expectation maximization (EM) algorithm.
With the special case $D = \sigma^2 I$, the space
spanned by the column vectors of $A$ is the same subspace
spanned by the $m$ eigenvectors that correspond to the
$m$ largest eigenvalues of $S$ in (29.3), which has been
a well-known fact since Anderson's work in 1959. In
the last two decades, this has attracted renewed interest
in the machine learning literature under the new name
of probabilistic PCA.
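
The covariance appearing in (29.7) follows directly from the model (29.4) with the FA assumptions (29.6). A short derivation, using only that the factors have zero mean and identity covariance, that the error has zero mean and covariance $D$, and that the error is uncorrelated with the factors:

```latex
\mathrm{Cov}(x_t)
  = E\bigl[ (A y_t + e_t)(A y_t + e_t)^T \bigr]
  = A \, E[ y_t y_t^T ] \, A^T
    + A \, E[ y_t e_t^T ]
    + E[ e_t y_t^T ] \, A^T
    + E[ e_t e_t^T ]
  = A A^T + D .
```

The two cross terms vanish by the uncorrelatedness condition in (29.4), and a linear combination of independent Gaussians is again Gaussian, which gives the marginal density of the observations.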

Classically, the principal subspace is obtained via 
computing the eigenvectors of the sample covariance
matrix $S$. However, $S$ is usually poor in accuracy when
the sample size $N$ is small while the dimensionality of $x_t$
is high. Alternatively, Oja's rule and variants thereof
have been proposed to learn the eigenvectors adaptively
per sample without directly computing $S$. Also, extensions
have been made on adaptive robust PCA learning
on data with outliers and on the adaptive principal curve
as the line in Fig. 29.2a(A(3)) extended to a curve.
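
A minimal sketch of Oja's rule for the first principal direction is given below. It assumes zero-mean samples and a small constant learning rate `eta`; the function name, the default initialization, and the single pass over the data are illustrative choices rather than a prescribed procedure.

```python
import numpy as np

def oja_first_component(X, eta=0.005, w0=None):
    """One pass of Oja's rule over zero-mean samples X (one sample per row).

    Each step uses only the current sample, so the sample covariance
    matrix is never formed explicitly.
    """
    d = X.shape[1]
    w = np.ones(d) / np.sqrt(d) if w0 is None else np.asarray(w0, dtype=float)
    for x in X:
        y = w @ x                      # projection of the sample onto w
        w = w + eta * y * (x - y * w)  # Hebbian term with built-in normalization
    return w / np.linalg.norm(w)
```

The term `- eta * y * y * w` keeps the norm of `w` close to one, which is what distinguishes Oja's rule from plain Hebbian learning. On data whose variance is clearly largest along one axis, the returned vector aligns with that axis up to sign.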

A hyperplane has two dual representations. One is
spanned by several one-dimensional unit vectors, while
the other is represented by a unit-length normal vector
$w$ that is orthogonal to this subspace. In the latter
case, minimizing $E$ results in that $w_\mu$ is still a solution
of (29.3) but with $\lambda$ becoming the smallest eigenvalue
instead of the largest one. Accordingly, the problem
is called minor component analysis (MCA). In general,
an $m$-dimensional subspace in $\mathbb{R}^d$ may also be
represented by the spanning vectors of a $(d-m)$-dimensional
complementary subspace, for which minimizing $E$ results
in the $d-m$ eigenvectors that correspond to the $d-m$
smallest eigenvalues of $S$. The problem is called minor
subspace analysis (MSA). When $m > d/2$, the minor
subspace needs fewer free parameters than the principal
subspace does. In a dual subspace pattern classification,
each class is represented by either a principal
subspace or a minor subspace. Unlike PCA and PSA,
MCA and MSA are more sensitive to noise. For further
details about these topics the interested reader is referred to
Sect. 3.2.1 of [29.58] and a recent overview [29.60]. In the
following, we add brief summaries on three typical methods.


PCA versus ICA 
Independent component analysis (ICA) has been widely
studied in the past two decades. The key point is to
seek a linear mapping $y_t = W x_t$ such that the components
$y_t^{(1)}, \dots, y_t^{(m)}$ of $y_t$ become mutually independent,
as shown in (29.5), as an extension of PCA
by which the components $[y_t^{(1)}, \dots, y_t^{(m)}]$ of $y_t = W x_t$
become mutually de-correlated when the rows of $W$
consist of the eigenvectors of the $m$ largest eigenvalues
of $S$ in (29.3). Strictly speaking, this is inexact
since the counterpart of ICA should be called
de-correlated component analysis (DCA), with independence
among $y_t^{(1)}, \dots, y_t^{(m)}$ in the second order
of statistics. PCA is one extreme case of DCA that
chooses those de-correlated components with the
$m$ largest variances/eigenvalues, while MCA is another
extreme case that chooses those with the
$m$ smallest variances/eigenvalues. Correspondingly, the
extended counterpart of PCA/MCA should be the
principal/minor ICA that chooses the independent components
with the $m$ largest/smallest variances. Readers
are referred to Sect. 2.4 of [29.61] for further details.
Several adaptive learning algorithms have been developed
for implementing ICA, but their implementation
cannot be guaranteed to reach (29.5). Theoretically, theorems
have been proved that such a guarantee can be reached
as long as one bit of information is provided to each
component $y_t^{(j)}$ [29.62].


FA-a, FA-b, and Model Selection

In the literature of statistics and machine learning, the
model (29.4) with (29.6) is conventionally referred to as
FA. Actually, we have $A y = \tilde{A} \tilde{y}$ with $\tilde{A} = A \phi^{-1}$, $\tilde{y} = \phi y$
for any unknown nonsingular matrix $\phi$. Among the different
choices to handle this indeterminacy, the standard
one, shortly denoted by FA-a, imposes (29.6) on $y$,
which reduces the indeterminacy from a general nonsingular
$\phi$ to an orthonormal matrix. One other choice is
given by (29.4) with

\[
A^T A = I , \quad p(y_t) = G(y_t \mid 0, \Lambda) \ \text{ for a diagonal } \Lambda , \tag{29.8}
\]

shortly denoted by FA-b. We have
$\hat{x}_t = A y = A \Lambda \Lambda^{-1} y = A \Lambda \phi^T \phi \Lambda^{-1} y = \tilde{A} \tilde{y}$
with $\tilde{A} = A \Lambda \phi^T$, $\tilde{y} = \phi \Lambda^{-1} y$, and $\phi^T \phi = I$,
i.e., its $\hat{x}_t$ is equivalent
to the one by FA-a for a given $m$ with an invertible
$\Lambda$. In other words, FA-a and FA-b are equivalent for
a learning principle based on $e_t = x_t - \hat{x}_t$, e.g., minimizing
$E$ by (29.2) or maximizing the likelihood on
$x_t$. Moreover, FA-a and FA-b are still equivalent when
model selection is used for determining an appropriate
value $m$ by one of the classic model selection criteria
to be introduced later in (29.14). However, FA-a and
FA-b differ considerably under the Bayesian
Yin-Yang (BYY) harmony learning in Sect. 3.2.1
of [29.58] and also under automatic model selection in
general. Empirically, experiments show that not only
BYY harmony learning but also the variational Bayes
method perform considerably better on FA-b than on
FA-a [29.63].


Non-Gaussian FA 
Both FA-a and FA-b still suffer an indeterminacy of
an $m \times m$ orthonormal matrix $\phi$, which can be further
removed when at most one of the components
$y_t^{(1)}, \dots, y_t^{(m)}$ is Gaussian. Accordingly, (29.4) with
non-Gaussian components $p(y_t^{(j)})$ in (29.5) is called
non-Gaussian FA (NFA). It is also referred to as independent
FA (IFA), although NFA is the more precise name, since
the concept of IFA covers not only NFA but also FA-a
and FA-b. One useful special case of NFA is called
binary FA (BFA), in which $y_t$ is a binary vector. Moreover, in the
degenerate case $e_t = 0$, obtaining $A$ of $x_t = Ay_t$ subject
to (29.5) is equivalent to getting $W = A^{-1}$ such
that $Wx_t = y_t$ satisfies (29.5). For this reason, NFA
with $e_t \neq 0$ is also sometimes referred to as noisy ICA.
Strictly speaking, the map $x_t \to y_t$ towards (29.5), being
an inverse of NFA, should be nonlinear instead
of a linear $y_t = Wx_t$. Maximum likelihood learning is
implemented with the help of the EM algorithm, which
was developed for BFA and NFA in the middle and at the end
of the 1990s, respectively. Also, learning algorithms have
been proposed for implementing BYY harmony learning
with automatic model selection on $m$. Recently, both
BFA and NFA were used for transcription regulatory
networks in gene analysis; for further details the reader
is referred to the overview [29.60] and especially its
Roadmap.


29.3.2 Multi-Model-Based Learning: 
Data Clustering, Object Detection, 
and Local Regression 


The task of data clustering is to partition a set of
samples into several clusters such that samples within
the same cluster are similar while samples from different
clusters are as different as possible. An
indicator matrix $P = [p_{\ell,t}]$ with $PP^{T} = I$ is used to
represent one possible partition of a sample set
$X_N = \{x_t\}_{t=1}^{N}$ into $\ell = 1, \dots, k$ clusters, i.e., $p_{\ell,t} = 1$
if $x_t$ belongs to the $\ell$-th cluster, otherwise $p_{\ell,t} = 0$. For
multi-model-based clustering, each cluster is modeled
by one structure, with $p_{\ell,t}$ obtained by a competition
of using the structure of each cluster to represent
a sample $x_t$. The structure of each cluster could
be one of the ones listed on the left-hand side of
Fig. 29.2; multiple clusters are thus represented by multiple
structures listed on the right-hand side of Fig. 29.2,
which feature the basic topics of the second stream of
studies.

We still start from the simplest point structure illustrated
in Fig. 29.2a(A(1)), extending to the structure of
multi-points illustrated in Fig. 29.2a(B(1)). With data
already divided into $k$ clusters, it is easy to obtain the
mean $\mu_j$ of each cluster. Given $\{\mu_j\}$ fixed, it is also
easy to divide $X_N$ into $k$ clusters by

$$p_{\ell,t} = \begin{cases} 1, & \ell = \arg\min_j \varepsilon_{j,t}, \\ 0, & \text{otherwise}, \end{cases} \tag{29.9}$$

where $\varepsilon_{j,t} = \|x_t - \mu_j\|^2$, i.e., $x_t$ is assigned to the $\ell$-th
cluster if $p_{\ell,t} = 1$. The key idea of the k-means algorithm
is to alternate between getting $p_{\ell,t}$ and computing $\mu_j$,
starting from an initialization. Although it aims at minimizing
$E_2$ by (29.2) with $\varepsilon_t = \sum_{\ell} p_{\ell,t}\,\varepsilon_{\ell,t}$, k-means typically
results in a local minimum of $E_2$, depending on the
initialization.
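
The alternation between the assignment rule (29.9) and the mean update can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `kmeans`, the initialization from `k` distinct samples, the stopping test, and the toy data are assumptions introduced here, not part of the original text.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Alternate winner-take-all assignment and mean update (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialization (assumed here): k distinct samples serve as the initial means.
    means = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample goes to the mean with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, means[j])))
            for x in points
        ]
        if new_labels == labels:  # assignments no longer change: a local minimum is reached
            break
        labels = new_labels
        # Update step: each mean becomes the average of the samples assigned to it.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(c) / len(members) for c in zip(*members)]
    return means, labels


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
means, labels = kmeans(points, k=2)
```

On less clearly separated data the result depends on the seed, which reflects the dependence on initialization noted above.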

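Merely using the mean $\mu_j$ is not good for describing
a cluster beyond a ball shape. Instead, it is extended to
considering the Gaussian illustrated in Fig. 29.2a(A(2))
and thus its counterpart in Fig. 29.2a(B(2)), i.e., the following
Gaussian mixture

$$q(x \mid \theta) = \sum_{j=1}^{k} \alpha_j\, G(x \mid \mu_j, \Sigma_j). \tag{29.10}$$

K-means can be extended to getting $p_{\ell,t}$ by (29.9) with

$$\varepsilon_{j,t} = -\ln[\alpha_j\, G(x_t \mid \mu_j, \Sigma_j)] \tag{29.11}$$

and computing each Gaussian by

$$\alpha_\ell^{*} = \frac{\sum_t p_{\ell,t}}{N}, \quad
\mu_\ell^{*} = \frac{1}{N\alpha_\ell^{*}} \sum_t p_{\ell,t}\, x_t, \quad
\Sigma_\ell^{*} = \frac{1}{N\alpha_\ell^{*}} \sum_t p_{\ell,t}\,(x_t - \mu_\ell^{*})(x_t - \mu_\ell^{*})^{T}, \tag{29.12}$$

which actually performs a type of elliptic clustering.
Instead of getting $p_{\ell,t}$ by (29.9), we may compute

$$p_{\ell,t} = q(\ell \mid x_t, \theta), \quad
q(\ell \mid x_t, \theta) = \frac{e^{-\varepsilon_{\ell,t}}}{\sum_{j=1}^{k} e^{-\varepsilon_{j,t}}}. \tag{29.13}$$

Actually, alternately iterating (29.13) and (29.12) is
the well-known EM algorithm for carrying out maximum
likelihood learning on the Gaussian mixture.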
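
As a hedged illustration of iterating (29.13) and (29.12), the following sketch runs EM on a one-dimensional Gaussian mixture in plain Python. The function name `em_gmm_1d`, the evenly spread initial means, the small variance floor `1e-6`, and the toy data are assumptions of this sketch and are not taken from the original text.

```python
import math


def em_gmm_1d(xs, k, n_iter=50):
    """EM for a 1-D Gaussian mixture: posterior step, then re-estimation (sketch)."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Initialization (assumed here): equal weights, means spread over the data range.
    alpha = [1.0 / k] * k
    mu = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    var = [((hi - lo) / k) ** 2 + 1e-6] * k
    for _ in range(n_iter):
        # Posterior step, cf. (29.13): p[t][j] is proportional to alpha_j * G(x_t | mu_j, var_j).
        p = []
        for x in xs:
            w = [
                alpha[j] * math.exp(-0.5 * (x - mu[j]) ** 2 / var[j])
                / math.sqrt(2.0 * math.pi * var[j])
                for j in range(k)
            ]
            total = sum(w)
            p.append([wj / total for wj in w])
        # Re-estimation step, cf. (29.12): weighted proportion, mean, and variance.
        for j in range(k):
            nj = sum(p[t][j] for t in range(n))
            alpha[j] = nj / n
            mu[j] = sum(p[t][j] * xs[t] for t in range(n)) / nj
            var[j] = sum(p[t][j] * (xs[t] - mu[j]) ** 2 for t in range(n)) / nj + 1e-6
    return alpha, mu, var


# Toy data (hypothetical): two groups centred near 0.05 and 5.05.
xs = [-0.2, 0.0, 0.1, 0.3, 4.8, 5.0, 5.1, 5.3]
alpha, mu, var = em_gmm_1d(xs, k=2)
```

Replacing the soft posterior by a hard 0/1 assignment to the largest entry recovers the k-means-style variant mentioned above.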

Another important topic is to determine an appropriate
$k$ (model selection), i.e., how many clusters are
needed. Classic model selection seeks a best
$k^{*} = \arg\min_k J(k)$ with a criterion $J(k)$ in a format as follows

$$J(k) = -L(k, \theta^{*}) + \omega(k, N), \quad \theta^{*} = \arg\max_{\theta} L(k, \theta), \tag{29.14}$$

where $L(k, \theta)$ is the likelihood function of $q(x \mid \theta)$, and
$\omega(k, N) > 0$ increases with $k$ and decreases with $N$. One
typical example is called the Bayesian information criterion
(BIC) or minimum description length (MDL). To
obtain $k^{*}$ one needs to enumerate a set of $k$ values and
estimate $\theta^{*}$ for each $k$ value, which incurs extensive
computation and is thus difficult to scale up to a large
number of clusters.

Alternatively, automatic model selection aims at
obtaining $k^{*}$ during learning $\theta^{*}$ by a mechanism or
principle that is different from the maximum likelihood.
This learning drives away an extra cluster via
a certain indicator $\rho_j \to 0$, e.g., $\rho_j = \alpha_j$ or $\rho_j = \alpha_j \mathrm{Tr}[\Sigma_j]$.
One early effort is rival penalized competitive learning
(RPCL). RPCL learning implements (29.12) not with
$p_{\ell,t}$ from (29.9) or (29.13), but with $p_{\ell,t}$ given as follows

$$p_{\ell,t} = \begin{cases}
1, & \ell = \ell^{*},\ \ell^{*} = \arg\min_j \varepsilon_{j,t}, \\
-\gamma, & \ell = \arg\min_{j \neq \ell^{*}} \varepsilon_{j,t}, \\
0, & \text{otherwise},
\end{cases} \tag{29.15}$$

by which learning is made on a cluster when $p_{\ell,t} = 1$,
and penalizing or de-learning is made on a cluster
when $p_{\ell,t} = -\gamma$, with a heuristic penalizing strength of
roughly $\gamma \approx 0.005 \sim 0.05$.
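
The winner/rival weighting of (29.15) can be illustrated by a single online update step. The sketch below is a simplification under stated assumptions: it applies the weights to an online mean update with a step size `eta` (a hypothetical parameter introduced here), uses the squared Euclidean distance as the competition measure, and leaves out any further details of RPCL as described in the literature.

```python
def rpcl_step(x, centers, eta=0.05, gamma=0.05):
    """One rival-penalized update for sample x; modifies centers in place (sketch)."""
    dist = [sum((a - c) ** 2 for a, c in zip(x, center)) for center in centers]
    order = sorted(range(len(centers)), key=dist.__getitem__)
    winner, rival = order[0], order[1]
    # Weight 1 for the winner: learning, i.e., the winner moves toward x.
    centers[winner] = [c + eta * (a - c) for a, c in zip(x, centers[winner])]
    # Weight -gamma for the rival: de-learning, i.e., the rival is pushed away from x.
    centers[rival] = [c - eta * gamma * (a - c) for a, c in zip(x, centers[rival])]
    # All other centers have weight 0 and stay unchanged.
    return winner, rival


# Hypothetical configuration: three centers, one sample close to the first one.
centers = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
winner, rival = rpcl_step((0.2, 0.0), centers)
```

Repeating this step over many samples tends to push a redundant center away from the data, which is the mechanism behind driving away an extra cluster.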

The BYY harmony learning gets rid of the difficulty
of finding an appropriate penalizing strength, with both
parameter learning and model selection made under
the Yin-Yang best harmony principle. The algorithm
obtained still implements (29.12) but replaces the $p_{\ell,t}$
in (29.12) with

$$p_{\ell,t} = q(\ell \mid x_t, \theta)\,(1 + \Delta\pi_{\ell,t}), \quad
\Delta\pi_{\ell,t} = \sum_{j} q(j \mid x_t, \theta)\,\varepsilon_{j,t} - \varepsilon_{\ell,t}, \tag{29.16}$$

where $\Delta\pi_{\ell,t} > 0$ means that the $\ell$-th component is
better than the average of all the components for describing
the sample $x_t$. In this case we further update the $\ell$-th
component in (29.12) to enhance the description. If
$0 > \Delta\pi_{\ell,t} > -1$, i.e., the fitness of the $\ell$-th component
to $x_t$ is below the average but still not too far
away, updating of the $\ell$-th component keeps the same
trend as in (29.12) but with reduced strength. Moreover,
when $-1 > \Delta\pi_{\ell,t}$, the updating on the $\ell$-th component
reverses direction to become de-learning, similar to
updating the rival in RPCL learning.

RPCL learning, which was proposed in 1992, and
BYY harmony learning, which was developed in 1995,
are similar in nature to the popular sparse learning
method, which was developed in 1995 [29.64,
65], and to prior-based automatic model selection
approaches [29.66—68]; that is, extra parts in a model
are removed as some parameters are pushed towards
zero. Without any priors on the parameters, these
prior-based approaches degenerate to maximum likelihood
learning, while RPCL learning and BYY harmony learning
can be further improved by incorporating appropriate priors.

For further details about automatic model selection, 
prior-aided learning, and model selection criteria the 
reader is referred to Sect. 2.2 of [29.58, 69] and [29.70] 
for recent overviews. Also, readers are referred to 
Sect. 2.1 and Table 1 of [29.58] for a tutorial on several 
algorithms for learning Gaussian mixture, including the 
ones introduced above. In the following, only three typ- 
ical ones are briefly summarized. 


Local Subspaces and Local Factor Analysis 

As illustrated in Fig. 29.2a(B(3)-(5)), the structure of
multi-points illustrated in Fig. 29.2a(B(1)) can be extended
into multiple subspaces and FA models. Still,
we can obtain $p_{\ell,t}$ by (29.9), (29.15), and (29.13)
with $\varepsilon_{j,t}$ given by either (29.11) with $\Sigma_j = A_jA_j^{T} + D_j$
or simply the shortest square distance from $x_t$ to the
$j$-th subspace. Given data divided into $k$ clusters, we
may estimate the subspace or FA of each cluster as
introduced in Sect. 29.3.1, which leads to extensions
of the k-means algorithm, the EM algorithm, and the
BYY harmony learning algorithm for learning FAs or
subspaces that are located at different $\mu_j$. Moreover,
readers are referred to [29.71] and [29.2] for learning
local FAs with both the number $k$ and the dimensions
$\{m_j\}$ determined automatically during BYY harmony
learning.


Object Detection and Pattern-Based Clustering 
The structures in Fig. 29.3a(B(3),(4)) are applicable to
the tasks of detecting lines and subspaces among image
data, which are topics that are widely studied in
the literature of pattern recognition and handled by the
well-known Hough transform (HT) and randomized HT
(RHT) [29.70]. Extensions can be made to detect multiple
objects such as circles, ellipses, lines and other
shapes, as well as so-called pattern-based clustering,
still obtaining $p_{\ell,t}$ by (29.9), (29.15), and (29.13) but
with $\varepsilon_{j,t}$ being the shortest square distance from $x_t$
to each shape. However, it is no longer possible to
use (29.12) for updating the parameters $\theta_j$ of each shape.
Instead, learning is done by

$$\theta_\ell^{\text{new}} = \theta_\ell^{\text{old}} - \eta\, p_{\ell,t}\, \nabla_{\theta_\ell}\, \varepsilon_{\ell,t}, \tag{29.17}$$

where $\eta > 0$ is a learning step size, so that a positive $p_{\ell,t}$
reduces the distance $\varepsilon_{\ell,t}$; for further details
the reader is referred to [29.69, 70] and [29.8].


Mixture of Experts, RBF Networks, SBF Functions
Letting each Gaussian be associated with a function
$f(x \mid \phi_j)$ for a mapping $x \to z$, we consider the task of
learning

$$q(z_t \mid x_t, \psi) = \sum_{j=1}^{k} q(j \mid x_t, \theta)\, G(z_t \mid f(x_t, \phi_j), \Gamma_j) \tag{29.18}$$

from a set $D_N = \{x_t, z_t\}_{t=1}^{N}$ of labeled data. The above
$q(z_t \mid x_t, \psi)$ is actually the alternative mixture of experts
[29.72], featured by a combination of unsupervised
learning for the Gaussian mixture by (29.10)
and supervised learning for every $f(x \mid \phi_j)$. For a regression
task, typically we consider $f(x \mid \phi_j) = w_j^{T}x + c_j$ with
$E[z \mid x] = \sum_{j=1}^{k} q(j \mid x, \theta)\, f(x, \phi_j)$ implementing a type of
piecewise linear regression. In implementation, we still
obtain $p_{\ell,t}$ by (29.9), (29.15), and (29.13) but with the
following $\varepsilon_{j,t}$,

$$\varepsilon_{j,t} = -\ln[\alpha_j\, G(x_t \mid \mu_j, \Sigma_j)\, G(z_t \mid f(x_t, \phi_j), \Gamma_j)], \tag{29.19}$$

and then compute each Gaussian by (29.12), as well as
update $G(z_t \mid f(x_t, \phi_j), \Gamma_j)$. When $\alpha_j = |\Sigma_j|^{0.5} / \sum_i |\Sigma_i|^{0.5}$,
it becomes equivalent to an extended normalized radial
basis function (RBF) network, and to a normalized
RBF network simply with $w_j = 0$. Moreover, letting
each subspace be associated with $f(x \mid \phi_j)$ will lead to
subspace-based functions (SBFs). For further details
readers are referred to [29.5] and Sect. 7 of [29.69].


29.3.3 Matrix-Based Learning: 
Similarity Graph, Nonnegative 
Matrix Factorization, 
and Manifold Learning 


We proceed to the third stream, featured with
graph/matrix structures. We start with the sample similarity
graph, with each node for a sample and each edge
attached with a similarity measure between two samples,
as illustrated in Fig. 29.2b(C(1)). Such a graph
is also equivalently represented by a symmetric matrix
$W = [w_{ij}]$.

One similarity measure is simply the inner product
$w_{ij} = x_i^{T}x_j$ of two samples. Given a data matrix
$X = [x_1, \dots, x_N]$, we simply have $W = X^{T}X$. We seek
an indicator matrix $P = [p_{\ell,t}]$ that divides $X_N = \{x_t\}_{t=1}^{N}$
into $k$ clusters, with the help of the following maximization

$$\max_{HH^{T} = I,\ H \ge 0} \mathrm{Tr}[H^{T}WH], \quad
H = \mathrm{diag}[n_1^{-0.5}, \dots, n_k^{-0.5}]\,P, \tag{29.20}$$

where $H \ge 0$ is a nonnegative matrix with each element
$h_{ij} \ge 0$, and $n_\ell$ is the number of samples in the $\ell$-th
cluster. It can be shown that this problem is equivalent
to minimizing $E_2$ by (29.2) with
$\varepsilon_t = \sum_{\ell=1}^{k} p_{\ell,t}\,\|x_t - \mu_\ell\|^2$,
i.e., the same target that k-means aims at.

Computationally, (29.20) is a typical intractable
binary quadratic programming problem, for which various
approximate methods have been proposed. The simplest
one is dropping the constraint $H \ge 0$ and doing a PCA
analysis of the matrix $W$. That is, the columns of $H$
consist of the $k$ eigenvectors of $W$ that correspond to
the first $k$ largest eigenvalues. Then, each element of
the matrix $\mathrm{diag}[n_1^{0.5}, \dots, n_k^{0.5}]\,H$ is chopped into 1 or 0
by a rule of thumb.

Another similarity measure is $w_{ij} = \exp(-\|x_i - x_j\|^2)$,
based on which we consider dividing the nodes
of a graph into two balanced sets $A, B$ such that the total
sum of $w_{ij}$ associated with edges connecting the
two sets becomes as small as possible. Using a vector
$f = [f_1, \dots, f_N]^{T}$ with $f_t = 1$ if $x_t \in A$ and $f_t = -1$ if
$x_t \in B$, the problem is formulated as follows

$$\min_{f} f^{T}Lf, \quad \text{s.t. } f \in \{1, -1\}^{N},\ [1, \dots, 1]f = 0, \quad
L = D - W,\ D = \mathrm{diag}[w_1, \dots, w_N], \tag{29.21}$$

where $w_i = \sum_j w_{ij}$ and $L$ is the graph Laplacian. Again, it is an
intractable combinatorial problem and needs
some approximation. A typical one is given as follows

$$\min_{f} \frac{f^{T}Lf}{f^{T}f}, \quad \text{s.t. } [1, \dots, 1]f = 0. \tag{29.22}$$

Its solution $f$ is the eigenvector of $L$ that corresponds to
the second smallest eigenvalue. Moreover, this idea has
been extended to cutting a graph into multiple clusters,
which leads to approximately finding $H$ with its
columns being the eigenvectors of $\tilde{W} = D^{-0.5}WD^{-0.5}$
that correspond to the first $k$ largest eigenvalues.

Moreover, the above studies are closely related
to nonnegative matrix factorization (NMF) problems
[29.73]. For example, the above problem can be
equivalently expressed as the factorization $W \approx H^{T}H$ by

$$\min_{HH^{T} = I,\ H \ge 0} \|W - H^{T}H\|^2. \tag{29.23}$$

More generally, the NMF problem considers that

$$X \approx FH, \quad X \ge 0,\ F \ge 0,\ H \ge 0, \tag{29.24}$$

as illustrated in Fig. 29.2b(C(1)). One typical method is
to iterate the following multiplicative update rule

$$H_{ij}^{\text{new}} = H_{ij}^{\text{old}}\, \frac{(F^{T}X)_{ij}}{(F^{T}FH)_{ij}}, \quad
F_{ij}^{\text{new}} = F_{ij}^{\text{old}}\, \frac{(XH^{T})_{ij}}{(FHH^{T})_{ij}}, \tag{29.25}$$

which guarantees nonnegativity and is supposed to converge
to a local solution of the following minimization

$$\min_{HH^{T} = I,\ H \ge 0,\ F \ge 0} \|X - FH\|^2. \tag{29.26}$$

Particularly, if we also impose the constraint $F^{T}F = I$,
the resulting $H$ divides the columns of $X$ into $k$ clusters,
while the resulting $F$ also divides the rows of $X$ into $k$
clusters, and is thus called bi-clustering [29.74].
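
The multiplicative update rule can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: it handles only the nonnegativity constraints of (29.24) and does not enforce the orthogonality constraints that appear in (29.26); the helper names (`matmul`, `transpose`, `frob_err`, `nmf`), the random initialization, and the small constant `eps` that guards against division by zero are introduced here for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def frob_err(x, f, h):
    """Squared Frobenius norm of X - FH."""
    fh = matmul(f, h)
    return sum((u - v) ** 2 for xr, yr in zip(x, fh) for u, v in zip(xr, yr))


def nmf(x, k, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ FH with F, H >= 0 (unconstrained sketch)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    f = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frob_err(x, f, h)]
    for _ in range(n_iter):
        ft = transpose(f)
        num, den = matmul(ft, x), matmul(matmul(ft, f), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(f, matmul(h, ht))
        f = [[f[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        errors.append(frob_err(x, f, h))
    return f, h, errors


# Toy nonnegative matrix (hypothetical) with an obvious two-block structure.
x = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 2.0],
     [0.0, 0.0, 2.0, 2.0]]
f, h, errors = nmf(x, k=2)
```

Because every factor in the update is nonnegative, `f` and `h` stay nonnegative throughout, which is the property noted above.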

Several NMF learning algorithms have been de- 
veloped in the literature. In [29.75], a binary matrix 
factorization (BMF) algorithm was developed under 
BYY harmony learning for clustering proteins that 
share similar interactions, featured with the nature of 
automatically determining the cluster number, while 
this number has to be pre-given for most existing BMF 
algorithms. 

In the past decade, the similarity graph and espe- 
cially the graph Laplacian L have also taken important 
roles in another popular topic called manifold learn- 
ing [29.76, 77]. Considering a mapping Y ~ WX, a lo- 
cality preserving projection is made to minimize the 
sum of each distance between two mapped points on the 
graph, subject to a unit $L_2$ norm of this projection $WX$.

Alternatively, we may also regard $X$ as generated
via $X = AY + E$ such that the topological dependence
among $Y$ is preserved by considering

$$q(Y) \propto \exp\{-\tfrac{1}{2}\mathrm{Tr}[YLY^{T}\Lambda^{-1}]\}, \tag{29.27}$$

where $\Lambda$ is a positive diagonal matrix. Learning is implemented
by BYY harmony learning, during which
automatic model selection is made via updating $q(Y)$
to drive some diagonal elements of $\Lambda$ towards zero.
For further details readers are referred to the end part of
Sect. 5 in [29.58].


29.3.4 Tree-Based Learning: Temporal 
Ladder, Hierarchical Mixture, 
and Causal Tree 


Unsupervised learning also includes learning temporal
and hierarchical underlying dependence structures,
as illustrated in Fig. 29.2b(D). Instead of directly
modeling temporal dependence underlying data
$X = [x_1, \dots, x_N]$, its structure is typically represented in
a hidden space, while non-temporal or spatial dependence
is represented by a relation from the hidden space
to the space where $X$ is observed, in the ladder structure
illustrated in Fig. 29.2b(D(1)).

One typical example is the classic hidden Markov
model (HMM). Its hidden space is featured by a discrete
variable that jumps between a set of discrete values or
states $\{s_j\}$, with temporal dependence described by the
jumping probabilities between the states, typically considering
$p(s_j \mid s_i)$ of jumping from one state $s_i$ to another
$s_j$. The relation from the hidden space to the space of
$X$ is described by $p(x_t \mid s_i)$ for the probability that the
value of $x_t$ is emitted from the state $s_i$. Classically,
the values of $x_t$ are also a set of labels. The task is
learning from $X = [x_1, \dots, x_N]$ two probability matrices
$Q = [p(s_j \mid s_i)]$ and $E = [p(x_t \mid s_i)]$. Given the number of
states, learning is typically implemented to maximize
the likelihood $p(X \mid Q, E)$ by the well-known Baum-Welch
algorithm.
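
The likelihood $p(X \mid Q, E)$ that is being maximized can be evaluated by the standard forward recursion. The sketch below only evaluates this likelihood and is not the Baum-Welch algorithm itself; the function name `hmm_likelihood`, the initial state distribution `init` (which the text above does not specify), and the toy matrices are assumptions made for illustration.

```python
def hmm_likelihood(obs, init, Q, E):
    """Forward recursion for p(x_1, ..., x_N | Q, E) with discrete observations.

    Q[i][j] = p(s_j | s_i), E[i][x] = p(x | s_i), init[i] = p(first state is s_i).
    """
    n = len(init)
    # forward[i] = p(x_1, ..., x_t, state at time t is s_i)
    forward = [init[i] * E[i][obs[0]] for i in range(n)]
    for x in obs[1:]:
        forward = [
            sum(forward[i] * Q[i][j] for i in range(n)) * E[j][x]
            for j in range(n)
        ]
    return sum(forward)


# Hypothetical two-state model with two observation labels (0 and 1).
Q = [[0.9, 0.1], [0.2, 0.8]]
E = [[0.5, 0.5], [0.1, 0.9]]
init = [0.5, 0.5]
likelihood = hmm_likelihood([0, 1, 1], init, Q, E)
```

A useful sanity check is that the likelihoods of all observation sequences of a fixed length sum to one.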

Another example is the classic state-space model
(SSM), which has been widely studied in the literature
of control theory and signal processing since the 1960s;
it has also been called a linear dynamical system and
has been considered with renewed interest since the beginning of
the 2000s. As illustrated in Fig. 29.2b(D(2)), its hidden
space is featured by an $m$-dimensional subspace
and temporal dependence is described by a first-order
vector autoregressive model as follows

$$y_t = By_{t-1} + \varepsilon_t, \quad E[y_{t-1}\varepsilon_t^{T}] = 0, \quad
\varepsilon_t \sim G(\varepsilon_t \mid 0, \Lambda),\ \Lambda \text{ is diagonal}, \tag{29.28}$$

while the spatial dependence is represented by a relation
between the coordinates of the state space and the
coordinates of the space of $X$, e.g., typically by (29.4).
Though the EM algorithm has also been suggested for
learning the SSM parameters, the performance is usually
unsatisfactory because an SSM is generally not
identifiable due to an indeterminacy of any unknown
nonsingular matrix, similar to what was discussed previously
with respect to the FA in (29.5). Favorably, it
has been shown that the indeterminacy of not only any
unknown nonsingular matrix but also an unknown orthonormal
matrix is usually removed by additionally
requiring a diagonal matrix $B$, which leads to temporal
factor analysis (TFA).

TFA is an extension of the FA by (29.4) with (29.6)
replaced by (29.28). As introduced in Sect. 29.3.1, the
FA is generalized into NFA when (29.6) becomes (29.5)
with each $p(y_t^{(j)})$ being non-Gaussian. The NFA with
a real vector $y_t$ can be further extended into a temporal
NFA when (29.5) is also extended by (29.28) with

$$p(\varepsilon_t) = p(\varepsilon_t^{(1)}) \cdots p(\varepsilon_t^{(m)}).$$

Moreover, the BFA (i.e., NFA with a binary vector $y_t$)
can be extended into a temporal BFA. Also, TFA has
been extended into an integration of several TFA models
coordinated by an HMM. For further details readers
are referred to Sect. 5.2 of [29.58] for a recent overview
on TFA and its extensions.

A ladder is merely a special type of tree structure. 
Hierarchical modeling is one other type of tree struc- 
ture, as illustrated in Fig. 29.2b(D(3)). Again, the EM 
algorithm has been extended to implement learning on 
a hierarchical or tree mixture of Gaussians [29.78]. 
Also, a learning algorithm is available for implement- 
ing BYY harmony learning with tree configuration 
determined during learning. A learning algorithm for 
a three-level hierarchical mixture of Gaussians is shown 
in Fig. 12 of [29.3], featured by a hierarchical learn- 
ing flow circling from bottom up as one step and then 
top down as the other step. Similar to (29.16), where
there is a $\Delta$ term featuring the difference of BYY
harmony learning from EM learning, there is also such
a $\Delta$ term on each level of the hierarchy. If these $\Delta$ terms
are set to zero, the algorithm degenerates back to
the EM algorithm. For further details readers are referred
to Sect. 5.1 in [29.3] and especially equation (55)
therein.

Many applications consider several sets of samples. 
Each set is known to come from one model or pat- 
tern class. Typically, one does unsupervised learning on 
each set of samples by a hierarchical mixture of Gaus- 
sians, and then integrates individual hierarchical models 
in a supervised way to form a classifier. 




Alternatively, we may put together all the individ- 
ual hierarchical mixtures with each as a branch of one 
higher level root of a tree, and then do learning as shown 
in Fig. 12 of [29.3]. The BYY harmony learning algo- 
rithm (including the EM algorithm as its degenerated 
case) for learning a two-level hierarchical mixture of 
Gaussians is shown in Sect. 5.3 of [29.58], and es- 
pecially Fig. 11 therein. This type of learning can be 
regarded as semi-supervised learning in the sense that 
each sample has two teaching labels. One is known,
indicating which individual hierarchy $x_t$ comes from,
while the other is unknown and to be determined, indicating
which Gaussian component $x_t$ comes from. More
generally, this type of learning provides a general formulation
that involves the multi-label classification of
Sect. 29.2.2 (especially labels with a hierarchy).

There are also real applications that consider a com- 
bination of ladder structures and hierarchical structures. 
For example, what is widely used in speech processing 
is an HMM model with each hidden state associated 
with a two-level hierarchical Gaussian mixture as illus- 
trated in Fig. 11 of [29.58]. Also, extensions are made 
with each Gaussian mixture replaced by a mixture of lo- 
cal subspaces or FA or NFA models. For further details 
readers are referred to Sect. 5.3 and Fig. 14 of [29.3]. 
Another example is considering a two-level hierarchi- 
cal model with both HMM for modeling nonstationary 
temporal dependence and TFA for modeling stationary 
temporal dependence. For further details readers are re- 
ferred to Sect. 5.2.2 of [29.58]. 


Another typical tree structure, as illustrated in
Fig. 29.2b(D(4)), is learning a probabilistic tree, i.e.,
a joint distribution of a set of variables on a tree with
one node per variable. The most well-known study is
structuring such tree models for a given set of bi-valued
variables, as done by Pearl in 1986 [29.79]. Following
this, one study in 1987 [29.80] extends this to construct
tree representations of continuous variables. It
has been proved that the tree can be structured from the
correlations observed between pairs of variables if the
visible variables are governed by a tree-decomposable
joint normal distribution. Moreover, the conditions for
a tree-decomposable normal distribution are less restrictive
than those for bi-valued variables.

Many advances have since been made along
this line. Some of the basic results, e.g., (29.15)
and (29.17) in [29.80], have become a widely used technique
in network construction for detecting whether an
edge describes a direct link or a duplicated indirect link.
For example, considering the association between two
nodes $i, j$ linked to a third node $w$ with the correlation
coefficients $\rho_{iw}$ and $\rho_{wj}$, we can remove the link $i, j$ if its
correlation coefficient $\rho_{ij}$ fails to satisfy

$$\rho_{ij} > \rho_{iw}\,\rho_{wj}. \tag{29.29}$$

Otherwise we may either choose to keep the link $i, j$,
or let the three nodes be linked to a newly added node
and then remove all the original links among the three
nodes.


29.4 Reinforcement Learning


Differently from unsupervised learning, reinforcement
learning gets guidance from external evaluation. Also,
unlike supervised learning in which the teacher clearly
specifies the output that corresponds to an input, reinforcement
learning is only provided with an evaluative
value about the action made. Furthermore, reinforcement
learning is featured by a dynamic process in
discrete time steps. At each step, upon observing the
current environment and getting some input (if any), the
learner makes an action and moves to a new state, receiving
a reward or punishment value about the action. The
aim is to maximize the total reward received.

This section provides a brief tutorial on the basic issues
of reinforcement learning, especially TD learning
and Q-learning. Then, improvements on Q-learning are
proposed by replacing its built-in winner-take-all competition
mechanism with some unsupervised learning
methods. For further reading readers are referred to tutorials
and reviews in [29.6, 81, 82].


29.4.1 Markov Decision Processes 


Reinforcement learning is closely related to Markov decision
processes (MDPs), which consist of a series of
states $s_0, s_1, \dots, s_t, s_{t+1}, \dots$. At a state $s_t$, an action
$a_t = \pi(s_t)$ is selected from the set $A$ of actions according
to a policy $\pi$, which makes the environment move
to a new state $s_{t+1}$, and the reward $r_{t+1}$ associated with
the transition $(s_t, a_t, s_{t+1})$ is received. The goal is to
collect as much reward as possible, that is, to maximize
the total reward or return

$$R = \sum_{t=0}^{N-1} r_{t+1},$$

where $N$ denotes the random time when a terminal state
is reached. In the case of nonepisodic problems the
return $R = \sum_{t=0}^{\infty} \gamma^{t} r_{t+1}$ is considered with a discount
factor $0 < \gamma < 1$.

Given an initial distribution based on which the
initial state is sampled at random, we can assign the
expected return $E[R \mid \pi]$ to a policy $\pi$. Since the actions
are selected according to $\pi$, the task is to specify
an algorithm that can be used to find a policy $\pi$ that
maximizes $E[R \mid \pi]$. Suppose we know the state transition
probability $p_a(s' \mid s) = P(s_{t+1} = s' \mid s_t = s, a_t = a)$ and the
corresponding reward $r_{t+1} = R_a(s' \mid s)$. The standard family
of algorithms to calculate this optimal policy is featured
by iterating the following two steps

$$\begin{aligned}
&(1)\ \ \pi(s) = \arg\max_{\pi} V^{\pi}(s), \\
&(2)\ \ V^{\pi}(s) = \sum_{s'} p_{\pi(s)}(s' \mid s)\,[R_{\pi(s)}(s' \mid s) + \gamma V^{\pi}(s')],
\end{aligned} \tag{29.30}$$

with $V^{\pi}(s)$ estimating $E[R \mid s, \pi]$. The iteration can be
made in one of several variants as follows:

• Doing step (1) once and then repeating step (2) several
  times or until it converges. Then step (1) is done
  once again, and so on.

• Doing step (2) by solving a set of linear equations.

• Substituting the calculation of $\pi(s)$ into the calculation
  of $V^{*}(s) = \max_{\pi} V^{\pi}(s)$, resulting in a combined step

  $$V^{*}(s) = \max_{a} \sum_{s'} p_a(s' \mid s)\,[R_a(s' \mid s) + \gamma V^{*}(s')], \tag{29.31}$$

  which is called backward induction and is iterated
  for all states until it converges to a solution of what is called the
  Bellman equation.

• Preferentially applying the steps to states that are in
  some way of importance.


Under some mild regularity conditions, all the implementations
will reach a policy that achieves the
optimal values $V^{*}(s) = \max_{\pi} V^{\pi}(s)$ and thus also
maximizes the expected return $E[V^{\pi}(s)]$, where $s$ is
a state that is randomly sampled from the underlying
distribution.
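
The combined step (29.31) can be sketched as follows for a small finite MDP. The function name `value_iteration`, the nested-list layout `P[a][s][s2]` and `R[a][s][s2]`, the stopping tolerance, and the two-state toy problem are assumptions made for this illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Iterate the combined step until the values stop changing (sketch).

    P[a][s][s2] is the transition probability p_a(s2 | s),
    R[a][s][s2] is the reward received on that transition.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states

    def backup(a, s):
        # Expected one-step reward plus discounted value of the next state.
        return sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2]) for s2 in range(n_states))

    while True:
        new_V = [max(backup(a, s) for a in range(n_actions)) for s in range(n_states)]
        change = max(abs(u - v) for u, v in zip(new_V, V))
        V = new_V
        if change < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(range(n_actions), key=lambda a: backup(a, s)) for s in range(n_states)]
    return V, policy


# Hypothetical two-state problem: action 0 stays, action 1 switches state.
# Staying in state 1 pays 2, switching from state 0 to state 1 pays 1, all else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[[0.0, 0.0], [0.0, 2.0]],
     [[0.0, 1.0], [0.0, 0.0]]]
V, policy = value_iteration(P, R)
```

In this toy problem the best behaviour is to move to state 1 and stay there, so the values approach $2/(1-\gamma) = 20$ for state 1 and $1 + 0.9 \cdot 20 = 19$ for state 0.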

In the implementation of MDPs we need to know
the probability $p_a(s' \mid s)$ per action $a$. Reinforcement
learning avoids obtaining this $p_a(s' \mid s)$ with the help
of stochastic approximation. The two most popular
examples are temporal difference (TD) learning and
Q-learning. The name TD derives from its
use of differences in predictions over successive time
steps to drive learning, while the name Q comes from
its use of a function that calculates the quality of a
state-action combination.


29.4.2 TD Learning and Q-Learning 


TD learning aims at predicting a measure of the total
amount of reward expected over the future. At time
$t$, we seek an estimate $\pi_t$ of $R_t = \sum_{i=0}^{\infty} \gamma^{i} r_{t+i+1}$ with
$0 < \gamma < 1$. Each estimate is a prediction because it involves
future values of $r$. We can write $\pi_t = \Pi_t(s_t)$,
where $\Pi_t$ is a prediction function. The prediction at any
given time step is updated to bring it closer to the prediction
of the same quantity at the next step, based on
the error correction $\delta_{t+1} = R_t - \Pi_t(s_t)$. To obtain $R_t$ exactly
requires waiting for the arrival of all the future
values of $r$. Instead, we use $R_t = r_{t+1} + \gamma R_{t+1}$ with
$\Pi_t(s_{t+1})$ as an estimate of $R_{t+1}$ available at step $t$, that
is, we have

$$\delta_{t+1} = r_{t+1} + \gamma\,\Pi_t(s_{t+1}) - \Pi_t(s_t), \tag{29.32}$$

which is termed the temporal difference error (or TD
error).

The simplest TD algorithm updates the prediction
function $\Pi_t$ at step $t$ into a new prediction function
$\Pi_{t+1}$ as follows

$$\Pi_{t+1}(x) = \begin{cases}
\Pi_t(x) + \eta\,\delta_{t+1}, & \text{if } x = s_t, \\
\Pi_t(x), & \text{otherwise},
\end{cases} \tag{29.33}$$

where $\eta$ is a learning step size and $x$ denotes any possible
input signal. The simplest format is a prediction
function implemented as a lookup table. Suppose that
$s_t$ takes only a finite number of values and that there
is an entry in a lookup table to store a prediction for
each of these values. At step $t$, the state $s_t$ moves to the
next $s_{t+1}$ based on the current status of the table, e.g.,
the table entry for $s_{t+1}$ is the largest across the table, or
$s_{t+1}$ is selected according to a fixed policy. When $r_{t+1}$
is observed, only the table entry for $s_t$ changes from its
current value of $\pi_t = \Pi_t(s_t)$ to $\Pi_t(s_t) + \eta\,\delta_{t+1}$.

The algorithm uses a prediction of a later quantity
$\Pi_t(s_{t+1})$ to update a prediction of an earlier quantity
$\Pi_t(s_t)$. As learning proceeds, later predictions tend to
become accurate sooner than earlier ones, resulting in
an overall error reduction. This depends on whether an
input sequence has sufficient regularity to make predicting
possible. When $s_t$ comes from the states of
a Markov chain, on which the $r$ values are given by
a function of these states, a prediction function may
exist that accurately gives the expected value of the
quantity $R_t$ for each $t$.

Another view of the TD algorithm is that it operates
to maintain the following consistency condition

$$\Pi(s_t) = r_{t+1} + \gamma\,\Pi(s_{t+1}),$$

which must be satisfied by correct predictions. By the
theory of Markov decision processes, any function that
satisfies $R_t = r_{t+1} + \gamma R_{t+1}$ for all $t$ must actually give
the correct predictions. The TD error indicates how
far the current prediction function deviates from this
condition, and the algorithm acts to reduce this error towards
this condition. Actually,
$\Pi_t(s_t) + \eta\,\delta_{t+1} = (1 - \eta)\,\Pi_t(s_t) + \eta\,[r_{t+1} + \gamma\,\Pi_t(s_{t+1})]$
is a type of stochastic
approximation to the value function in (29.30), without
directly requiring knowledge of the probability $p_a(s' \mid s)$.
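
A lookup-table version of the TD error (29.32) and the update (29.33) can be sketched as follows. The function name `td0`, the representation of an episode as a list of `(state, reward, next_state)` triples, and the deterministic three-state chain are assumptions made for illustration.

```python
def td0(episodes, n_states, eta=0.1, gamma=0.9):
    """Tabular TD prediction: only the entry of the visited state is changed (sketch)."""
    table = [0.0] * n_states  # one prediction per state
    for episode in episodes:
        for s, r_next, s_next in episode:
            # TD error: observed reward plus discounted next prediction minus current prediction.
            delta = r_next + gamma * table[s_next] - table[s]
            # Equivalent to (1 - eta) * table[s] + eta * (r_next + gamma * table[s_next]).
            table[s] += eta * delta
    return table


# Hypothetical chain 0 -> 1 -> 2 (terminal), with reward 0 and then reward 1.
episode = [(0, 0.0, 1), (1, 1.0, 2)]
table = td0([episode] * 500, n_states=3)
```

For this chain the correct predictions are 1 for state 1 and $\gamma \cdot 1 = 0.9$ for state 0, and the table approaches these values without using any transition probabilities.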

Alternatively, Q-learning calculates the quality of
a state-action combination, i.e., estimating $Q(s_t, a_t)$ of
$R_t$ conditionally on the action $a_t$ at $s_t$. The implementation
of Q-learning consists of

$$a_t = \arg\max_{a \in A} Q_t(s_t, a), \tag{29.34}$$

$$\begin{aligned}
Q_{t+1}(x, a) &= \begin{cases}
Q_t(x, a) + \eta\,\delta_{t+1}(a), & \text{if } x = s_t,\ a = a_t, \\
Q_t(x, a), & \text{otherwise},
\end{cases} \\
\delta_{t+1}(a) &= r(s_t, a) + \gamma \max_{c \in A} Q_t(s_{t+1}, c) - Q_t(s_t, a).
\end{aligned} \tag{29.35}$$

At $s_t$, an action $a_t \in A$ is obtained by an easy computation,
and the learner then moves to a new state $s_{t+1}$.
Receiving the reward $r_{t+1} = r(s_t, a_t)$ associated with
the transition $(s_t, a_t, s_{t+1})$, only the table entry for $s_t$
and $a_t$ is updated.

The format of $\delta_{t+1}$ is similar to the one in (29.32)
with the prediction $\Pi_t(s_t)$ replaced by $Q_t(s_t, a_t)$ and
$\Pi_t(s_{t+1})$ replaced by $\max_a Q_t(s_{t+1}, a)$. Alternatively,
we may select $a_{t+1}$ by a fixed policy and
then obtain $\delta_{t+1}$ with $\max_a Q_t(s_{t+1}, a)$ replaced
by $Q_t(s_{t+1}, a_{t+1})$, which leads to a variant of the
Q-learning rule called state-action-reward-state-action
(SARSA). Under some mild regularity conditions, similarly
to TD learning, both Q-learning and SARSA
converge to prediction functions that make optimal action
choices.
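
The table update of Q-learning can be sketched as below. Several choices here are assumptions of the sketch rather than part of the text: the function names `q_learning` and `step`, the toy two-state environment, and in particular the use of uniformly random actions for exploration in place of the greedy selection (29.34), which keeps the example short and ensures that every table entry is visited.

```python
import random


def q_learning(step, n_states, n_actions, n_steps=20000, eta=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on an environment given by step(s, a) -> (reward, next_state)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(n_actions)  # exploratory action choice (assumption of this sketch)
        r, s_next = step(s, a)
        # Error term: reward plus discounted best next value minus current entry.
        delta = r + gamma * max(Q[s_next]) - Q[s][a]
        Q[s][a] += eta * delta        # only the entry for (s, a) changes
        s = s_next
    return Q


def step(s, a):
    """Hypothetical environment: action 0 stays, action 1 switches between states 0 and 1."""
    if a == 0:
        return (2.0 if s == 1 else 0.0), s
    return (1.0 if s == 0 else 0.0), 1 - s


Q = q_learning(step, n_states=2, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
```

Replacing `max(Q[s_next])` by the table entry of the action actually taken at the next step would give the SARSA variant mentioned above.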

Both TD learning and Q-learning have variants and 
extensions. In the following, we briefly summarize two 
typical streams: 


• In (29.33) and (29.35), only the table entry for $s_t$
  is modified, though $r_{t+1}$ provides useful information
  for learning earlier predictions as well. Under
  the name of eligibility traces, an exponentially decaying
  memory trace is kept over a number of
  previous input signals so that each new observation
  can update the parameters related to these
  signals.

• In addition to a lookup table, the prediction function
  can be replaced by a more advanced prediction
  function. It could be a linear or nonlinear
  regression function $F_t(\phi_t, \theta)$ with input
  signals $\phi_t = \{x_\tau^{(i)}\}$. Each $x_\tau^{(i)}$ could be
  either a state or an action or even one additional
  feature around a state in one eligibility
  trace, where $\tau$ can be different from $t$. Then,
  learning adjusts $\theta$ to reduce the error $\delta_{t+1}$ or
  $\delta_{t+1}(a_t)$.
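The eligibility-trace idea in the first item can be sketched for a lookup table as follows. This is a minimal illustration assuming accumulating traces with decay factor `gamma * lam`; the function name and all parameter values are hypothetical and not taken from the chapter.

```python
def td_lambda_step(V, trace, s, r, s_next, eta=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step with accumulating eligibility traces (tabular)."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)   # TD error
    for x in trace:                       # exponentially decay old traces
        trace[x] *= gamma * lam
    trace[s] = trace.get(s, 0.0) + 1.0    # mark the state just visited
    for x, e in trace.items():            # update every recently visited state
        V[x] = V.get(x, 0.0) + eta * delta * e
    return delta

V, trace = {}, {}
td_lambda_step(V, trace, "A", 0.0, "B")   # no reward yet, nothing changes
td_lambda_step(V, trace, "B", 1.0, "C")   # reward also credits the earlier state A
```

After the second step, the earlier state A receives a share of the reward in proportion to its decayed trace, which is the effect the text describes.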


29.4.3 Improving Q-Learning by Unsupervised Methods


Examining Q-learning by (29.35), we observe that
it shares some common features with the multi-model-based
learning introduced in Sect. 29.3. For a set $A$ of
finitely many actions, we use the index $\ell = 1,\ldots,k$ to
denote each action. Obtaining $a_t$ in (29.35) is equivalent
to obtaining $p_{\ell,t}$ in (29.9) with $\varepsilon_{j,t} = -Q_t(s_t,j)$,
that is, a selection is made by winner-take-all (WTA)
competition. Then, updating $Q_t(s_t,a)$ in (29.35) can be
rewritten as follows

$$Q_{t+1}(s_t,\ell) = Q_t(s_t,\ell) + \eta\, p_{\ell,t}\, \delta_{t+1}(\ell) ,$$

$$Q_{t+1}(s,\ell) = Q_t(s,\ell) , \quad \text{for } s \ne s_t , \tag{29.36}$$

which is similar to the general updating rule by (29.17),
with $p_{\ell,t}$ selecting which column of the Q table to update.
This suggests the following improvements on
Q-learning, motivated by the multi-model-based learning
methods in Sect. 29.3.2.

First, the WTA selection of the above $p_{\ell,t}$ can be
replaced by an estimation of the posteriori probabilities
as follows

$$p_{\ell,t} = q(\ell|s_t) , \quad q(\ell|s) = \frac{e^{Q_t(s,\ell)}}{\sum_{j=1}^{k} e^{Q_t(s,j)}} . \tag{29.37}$$

Putting this into (29.36), we improve the weak points
incurred from a WTA competition by updating all the
columns of the Q table with the weights by $p_{\ell,t}$ as
a counterpart of (29.12) of the well-known EM algorithm
for learning a finite mixture.

Second, $\delta_{t+1}(a)$ in (29.35) uses $\max_a Q_t(s_{t+1},a)$
as a prediction of the Q-value at $s_{t+1}$, also by a WTA
competition that gives an optimistic choice. Alternatively,
we can use the following more reliable one

$$\delta_{t+1}(s_t,a_t) = r(s_t,a_t) + \gamma \Delta\pi_t(s_{t+1}) ,$$

$$\Delta\pi_t(s) = \sum_j q(j|s)\, Q_t(s,j) - Q_t(s_t,a_t) , \tag{29.38}$$

where $q(j|s_{t+1})$ is given by (29.37) with $s = s_{t+1}$ instead
of $s = s_t$, to obtain $\Delta\pi_t(s_{t+1})$ with $q(\ell|s)$.
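The first two improvements can be sketched together: all columns of the Q table at the current state are updated with softmax weights, and the next-state prediction is a softmax-weighted average rather than a maximum. This is only an illustrative sketch under stated assumptions: the reward `reward_of(s, l)` is assumed to be available for every action, the same observed next state is used for all actions, and the error takes the conventional TD form $r + \gamma \sum_j q(j|s')Q(s',j) - Q(s,\ell)$. The function names are hypothetical.

```python
import math

def softmax_weights(q_row):
    """q(l|s) proportional to exp(Q(s, l)); max is subtracted for stability."""
    m = max(q_row)
    exps = [math.exp(v - m) for v in q_row]
    z = sum(exps)
    return [e / z for e in exps]

def soft_q_update(Q, s, reward_of, s_next, eta=0.1, gamma=0.9):
    """Update every column of row s, weighted by the softmax posterior."""
    p = softmax_weights(Q[s])                      # weights p_{l,t}
    q_next = softmax_weights(Q[s_next])
    v_next = sum(w * v for w, v in zip(q_next, Q[s_next]))  # soft prediction
    for l in range(len(Q[s])):
        delta = reward_of(s, l) + gamma * v_next - Q[s][l]
        Q[s][l] += eta * p[l] * delta
    return p

# Hypothetical two-action example; the reward equals the action index.
Q = {"s0": [0.0, 0.0], "s1": [1.0, 1.0]}
p = soft_q_update(Q, "s0", lambda s, l: float(l), "s1")
```

Because both actions at `s0` start with equal values, each column receives half of its own error, instead of only the winning column being updated.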

Third, instead of $p_{\ell,t}$ given by (29.37), we may also
use a counterpart of (29.16) to implement Q-learning
with the help of BYY harmony learning. That is, we consider

$$p_{\ell,t} = q(\ell|s_t)\,[1 + \Delta\pi_{\ell,t}(s_t)] , \tag{29.39}$$

by which an action is encouraged when its value is
higher than the average of all actions, while an action
is discouraged when its value is below the average but
still not too far away, and then is repelled when its value
is far below this average.

Moreover, we may simplify the above $p_{\ell,t}$ by focusing
on a few major actions, e.g., the winning action
$a_t = \arg\max_{a \in A} Q_t(s_t,a)$ and its rival action, similar
to rival penalized competitive learning (RPCL), with $p_{\ell,t}$
given as follows

$$p_{\ell,t} = \begin{cases} 1 , & \ell = \ell^* = \arg\max_j Q_t(s_t,j) , \\ -\gamma , & \ell = \arg\max_{j \ne \ell^*} Q_t(s_t,j) , \\ 0 , & \text{otherwise} , \end{cases} \tag{29.40}$$

i.e., the winning action is encouraged while its rival is
repelled.

BYY harmony learning and RPCL learning lead to
discriminative Q-learning by which actions at each state
become more discriminative and thus easier to select.
As a result, confusing branches in a search
tree will be pruned away. Moreover, we may discard
one extra action if we observe that its corresponding

$$\alpha_\ell^{\mathrm{new}} = (1-\eta)\,\alpha_\ell^{\mathrm{old}} + \eta\, q(\ell|s_t) , \tag{29.41}$$

is pushed to zero. Actually, this is the nature of automatic
model selection, which controls the complexity
of function $Q(s,j)$.


29.5 Semi-Supervised Learning


In many real tasks it is easy to obtain a large amount
of unlabeled training data but labeling them is expensive
because it requires great human effort
and expertise or high execution cost. Semi-supervised
learning [29.83-86] attempts to exploit unlabeled data
to help improve the learning performance without assuming
human intervention. In situations where the
unlabeled data are exactly the test data, it is also called
transductive learning [29.87].

Figure 29.3 illustrates why unlabeled data (gray
points) can be helpful. It can be seen that although both
classification boundaries are consistent with labeled
data, the boundary obtained by considering unlabeled
data is better in generalization. One reason is that the
unlabeled data can disclose some information about
data distribution which is helpful for model construction.

Fig. 29.3a,b Illustration of the usefulness of unlabeled data: (a) without unlabeled data, (b) with unlabeled data

There are two popular assumptions connecting the
distribution information disclosed by unlabeled data
with label information. The cluster assumption assumes
that data with similar inputs have similar class labels;
the manifold assumption assumes that data live
in a low-dimensional manifold, whereas unlabeled data
can help to identify that manifold. The latter can be regarded
as a generalization of the former because it is
usually assumed that the cluster structure of the data
will be more easily found in the lower-dimensional
manifold. These assumptions are closely related to low-density
separation, which specifies that the boundary
should not go across high-density regions in the instance
space.

Many semi-supervised learning methods have been
developed. Roughly speaking, they fall into four
categories. In generative methods, both
labeled and unlabeled data are assumed to be generated
by the same model, and thus, the unlabeled
data can be exploited to model the label estimation
or parameter estimation process. For example, if we
assume the data come from a mixture model with $T$
components, i.e.,

$$f(x|\theta) = \sum_{t=1}^{T} \alpha_t f(x|\theta_t) , \tag{29.42}$$

where $\alpha_t$ is the mixing coefficient and $\theta = \{\theta_t\}$ are the
model parameters, then the label $c_i$ can be determined by
the mixture component $m_i$ and the instance $x_i$ according
to the maximum a posteriori criterion

$$\arg\max_k \sum_j P(c_i = k \mid m_i = j, x_i)\, P(m_i = j \mid x_i) , \tag{29.43}$$

where estimating $P(c_i = k|m_i = j, x_i)$ requires label information,
but unlabeled data can be used to help
estimate $P(m_i = j|x_i)$, and hence improve the learning
performance. Actually, the posteriori probability is
equivalently given by a mixture of experts that will be
further addressed next in Sect. 29.6.1.
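The decision rule (29.43) is a weighted sum followed by an arg max, which the following sketch makes explicit. The probability tables are hypothetical numbers chosen for illustration; in practice the first would be estimated from labeled data and the second from labeled and unlabeled data together.

```python
def map_label(p_class_given_comp, p_comp_given_x):
    """Return arg max_k sum_j P(c=k | m=j, x) P(m=j | x) and the class scores."""
    n_classes = len(p_class_given_comp[0])
    scores = [
        sum(p_comp_given_x[j] * p_class_given_comp[j][k]
            for j in range(len(p_comp_given_x)))
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=scores.__getitem__)
    return best, scores

# Hypothetical values: 3 mixture components, 2 classes.
p_class_given_comp = [[0.9, 0.1],   # component 0 mostly emits class 0
                      [0.2, 0.8],   # component 1 mostly emits class 1
                      [0.5, 0.5]]   # component 2 is uninformative
p_comp_given_x = [0.2, 0.7, 0.1]    # P(m = j | x) for one instance x
label, scores = map_label(p_class_given_comp, p_comp_given_x)
```

Here the instance most likely belongs to component 1, so class 1 wins even though component 0 strongly favors class 0.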

In semi-supervised support vector machines
(S3VM), unlabeled data are used directly to help
adjust the decision boundary, as illustrated in Fig. 29.3.
Given $l$ labeled examples and $u$ unlabeled instances,
the goal is usually accomplished by minimizing an
objective

$$\frac{1}{2}\|w\|_{\mathcal{H}}^2 + C_1 \sum_{i=1}^{l} \ell(f(x_i), y_i) + C_2 \sum_{j=1}^{u} \ell(f(\hat{x}_j), \hat{y}_j) , \tag{29.44}$$

where the first term is structural risk, the second
term is empirical risk on the labeled data $(x_i, y_i)$, the
third term is empirical risk on the unlabeled instances
$\hat{x}_j$ $(j = 1,\ldots,u)$ and the estimated outputs $\hat{y}_j$, whereas
$C_1$/$C_2$ balance the contribution of labeled/unlabeled
data.

Graph-based methods construct a graph whose
nodes are the training instances (both labeled and unlabeled),
and the edges between nodes reflect a certain
relation, such as similarity, between the corresponding
examples. Then, the learning process is accomplished
by propagating label information on the
graph.

Disagreement-based methods generate multiple
learners and exploit the disagreements among the learners,
where unlabeled data serve as a kind of platform
for information exchange; if one learner is much more
confident on a disagreed unlabeled instance than other
learner(s), then it will teach other(s) by assigning a predicted
pseudo-label to the instance. A representative of
this category is co-training [29.31], which constructs
two learners from two different views, and thus is
closely related to multi-view learning.

In addition to classification, semi-supervised regression,
dimension reduction, clustering, etc., have also
been well studied. It is worth mentioning that exploiting
unlabeled data does not always improve the performance,
and sometimes the performance may be even
worse than using only the labeled data. Some recent
studies have tried to address this issue under the name
of safe semi-supervised learning [29.88].


29.6 Ensemble Methods
29.6.1 Basic Concepts


Ordinary learning methods try to construct one learner
from training data, whereas ensemble methods [29.14]
try to construct a set of learners and combine them
to solve the problem. Such learning methods
are also called committee-based learning, meta-learning,
or multiple classifier systems, although ensemble
methods have also been found to be helpful in
clustering [29.14, 89, 90] and various tasks other than
classification.

Figure 29.4 shows a common ensemble architecture.
An ensemble contains a number of base learners,
also called individual learners, component learners, or weak
learners, the last because the main purpose of ensemble methods
is to generate strong learners by combining learners
whose generalization performance is not strong.
Base learners can be generated by a base learning
algorithm, such as a decision tree algorithm, a neural
network algorithm, etc., and such ensembles are
called homogeneous ensembles because they contain
homogeneous base learners. An ensemble can also be
heterogeneous if multiple types of base learners are
included.

The generalization ability of an ensemble is often
much stronger than that of base learners. Roughly, there
are three threads of studies that lead to the state of
the art of ensemble methods. The combining classifiers
thread was mostly studied in the pattern recognition
community, where researchers usually focused on the
design of powerful combining rules to obtain a strong
combined classifier [29.28, 29]. The mixture of experts
thread generally considered a divide-and-conquer strategy,
trying to learn a mixture of parametric models
jointly [29.91]. Equation (29.43) is actually a mixture of
experts for classification, with $P(c_i = k|m_i = j, x_i)$ being
the individual expert and $P(m_i = j|x_i)$ the gating net,
especially for the one given in [29.72] where the gating
net is given by the posteriori of $\alpha_t f(x|\theta_t)$ in (29.42). The
ensembles of weak learners thread often works on weak
learners and tries to design powerful algorithms to boost
performance from weak to strong. Readers are referred
to [29.30] for a recent survey on combining classifiers
and mixture-of-experts as well as their relations, and to
Sect. 5 of [29.86] for a brief overview on all the three
threads.

Generally, an ensemble is built in two steps; that 
is, generating the base learners and then combining 
them. It is worth noting that the computational cost 
of constructing an ensemble is often not much larger 
than creating a single learner. This is because when 
using a single learner, one usually has to generate 
multiple versions of the learner for model selection 
or parameter tuning; this is comparable to generat- 
ing multiple base learners in ensembles, whereas the 
computational cost for combining base learners is 
often small because most combining rules are sim- 
ple. 

Fig. 29.4 A common ensemble architecture


29.6.2 Boosting


The term boosting refers to a family of algorithms
originated in [29.92], with AdaBoost [29.93] as its representative.
This kind of algorithm is usually provably
able to convert weak learners that are just slightly better
than random guess to strong learners that have nearly
perfect performance.

Algorithm 29.1 shows the pseudo-code of
AdaBoost. Roughly speaking, the basic idea of
boosting is to let later learners try to correct the mistakes
made by earlier learners, and this is accomplished
by deriving in each round a new data distribution which
makes the earlier mistakes more evident. The base
learners should be able to learn with specific distributions;
this is usually accomplished by re-weighting or
re-sampling the training examples according to the data
distribution in each round. Such a learning process is
very similar to residual minimization, and it has a close
relation to additive models, inspiring an interpretation
that AdaBoost is a stagewise estimation procedure for
fitting an additive logistic regression model with an
exponential loss [29.94]. Notice that AdaBoost was
designed for binary classification, but it has many
variants for multi-class problems [29.93, 95, 96].

It has been proved [29.93] that the generalization
error of AdaBoost is upper bounded by

$$\epsilon \le \epsilon_D + \tilde{O}\left(\sqrt{\frac{dT}{m}}\right) , \tag{29.45}$$

with probability at least $1-\delta$, where $\epsilon_D$ is the error on
the training sample $D$, $d$ is the VC-dimension of base
learners, $m$ is the number of training samples, and $\tilde{O}(\cdot)$
is used instead of $O(\cdot)$ to hide logarithmic terms and
constant factors. This generalization bound implies that
the complexity $d$ of base learners and the number $T$ of
learning rounds need to be constrained; otherwise AdaBoost
will overfit. Empirical studies, however, show
that AdaBoost often seems resistant to overfitting; that
is, the test error often tends to decrease even after the
training error reaches zero.


Algorithm 29.1 The AdaBoost Algorithm
Input: Data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;
       Base learning algorithm $\mathfrak{L}$;
       Number of learning rounds $T$.
Process:
1: $\mathcal{D}_1(x) = 1/m$. % initialize the weight distribution
2: for $t = 1, \ldots, T$:
3:   $h_t = \mathfrak{L}(D, \mathcal{D}_t)$; % train a classifier $h_t$ from $D$ under distribution $\mathcal{D}_t$
4:   $\epsilon_t = P_{x \sim \mathcal{D}_t}(h_t(x) \ne f(x))$; % evaluate the error of $h_t$
5:   if $\epsilon_t > 0.5$ then break
6:   $\alpha_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$; % determine the weight of $h_t$
7:   $\mathcal{D}_{t+1}(x) = \frac{\mathcal{D}_t(x)\exp(-\alpha_t f(x) h_t(x))}{Z_t}$; % update the distribution, where $Z_t$ is
     % a normalization factor which enables $\mathcal{D}_{t+1}$ to be a distribution
8: end
Output: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
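As an illustration of Algorithm 29.1, the following sketch runs AdaBoost with decision stumps on a one-dimensional toy problem. The stump learner, the data, and the small guard against a zero error are assumptions added for the example; they are not part of the algorithm as stated in the chapter.

```python
import math

def stump_predict(thr, sign, x):
    """A decision stump: predicts `sign` if x >= thr, else -sign."""
    return sign if x >= thr else -sign

def train_stump(xs, ys, w):
    """Pick the stump with the smallest weighted error under weights w."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(thr, sign, xi) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, T):
    m = len(xs)
    w = [1.0 / m] * m                      # step 1: uniform distribution
    ensemble = []
    for _ in range(T):
        err, thr, sign = train_stump(xs, ys, w)   # steps 3-4
        if err > 0.5:                      # step 5
            break
        err = max(err, 1e-12)              # guard against log of 1/0 (assumption)
        alpha = 0.5 * math.log((1 - err) / err)   # step 6
        ensemble.append((alpha, thr, sign))
        w = [wi * math.exp(-alpha * yi * stump_predict(thr, sign, xi))
             for xi, yi, wi in zip(xs, ys, w)]    # step 7, unnormalized
        z = sum(w)                         # normalization factor Z_t
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(t, s, x) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can classify perfectly.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
model = adaboost(xs, ys, T=3)
train_acc = sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)
```

Each individual stump misclassifies at least three of the ten points, yet the weighted vote of three stumps fits the training set, because each round concentrates weight on the points the previous stumps got wrong.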


For binary classification, formally, $f(x) \in \{-1,+1\}$, the
margin of the classifier $h$ on the instance $x$ is defined
as $f(x)h(x)$, and similarly, the margin of the ensemble
$H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$ is $f(x)H(x) = \sum_{t=1}^{T} \alpha_t f(x) h_t(x)$,
whereas the normalized margin is

$$f(x)H(x) = \frac{\sum_{t=1}^{T} \alpha_t f(x) h_t(x)}{\sum_{t=1}^{T} \alpha_t} , \tag{29.46}$$

where $\alpha_t$ are the weights of base learners. Given any
threshold $\theta > 0$ of margin over the training sample $D$,
it was proved in [29.97] that the generalization error of
the ensemble is bounded with probability at least $1-\delta$
by

$$\epsilon \le 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t^{1-\theta}(1-\epsilon_t)^{1+\theta}} + \tilde{O}\left(\sqrt{\frac{d}{m\theta^2} + \ln\frac{1}{\delta}}\right) , \tag{29.47}$$

where $\epsilon_t$ is the training error of the base learner $h_t$.
This bound implies that when other variables are fixed,
the larger the margin over the training set, the smaller
the generalization error. Thus, [29.97] argued that AdaBoost
tends to be resistant to overfitting because it can
increase the ensemble margin even after the training error
reaches zero.

This margin-based explanation seems reasonable;
however, it was later questioned [29.98] by the fact
that (29.47) depends heavily on the minimum margin,
whereas there are counterexamples where an algorithm
is able to produce uniformly larger minimum margins
than AdaBoost, but the generalization performance
drastically decreases. From then on, there was much debate
about whether the margin-based explanation holds;
more details can be found in [29.14].


One drawback of AdaBoost lies in the fact that it 
is very sensitive to noise. Great efforts have been de- 
voted to address this issue [29.99, 100]. For example, 
RobustBoost [29.101] tries to improve the noise toler- 
ance ability by boosting the normalized classification 
margin, which was believed to be closely related to the 
generalization error. 


29.6.3 Bagging 


In contrast to sequential ensemble methods such as 
boosting where the base learners are generated in a se- 
quential style to exploit the dependence between the 
base learners, bagging [29.99] is a kind of parallel en- 
semble method where the base learners are generated 
in parallel, attempting to exploit the independence be- 
tween the base learners. 

The name bagging came from the abbreviation of 
Bootstrap AGGregatING. Algorithm 29.2 shows its 
pseudo-code, where I is the indicator function. Bagging 
applies bootstrap sampling [29.102] to obtain multi- 
ple different data subsets for training the base learners. 
Given $m$ training examples, a set of $m$ training examples
is generated by sampling with replacement; some original
examples appear more than once, whereas some do
not appear at all. By applying the process $T$ times, $T$ such
sets are obtained, and then each set is used for training 
a base learner. Bagging can deal with binary as well as 
multi-class problems by using majority voting to com- 
bine base learners; it can also be applied to regression 
by using averaging for combination. 


Algorithm 29.2 The Bagging Algorithm
Input: Data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;
       Base learning algorithm $\mathfrak{L}$;
       Number of base learners $T$.
Process:
1: for $t = 1, \ldots, T$:
2:   $h_t = \mathfrak{L}(D, \mathcal{D}_{bs})$ % $\mathcal{D}_{bs}$ is the bootstrap distribution
3: end
Output: $H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}(h_t(x) = y)$
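A minimal sketch of Algorithm 29.2 follows. The threshold learner is a hypothetical toy base learner that assumes class "a" lies to the left of class "b" on the real line; the function names, the data, and the fixed random seed are likewise illustrative assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]

def bagging(data, learn, T, seed=0):
    """Train T base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [learn(bootstrap_sample(data, rng)) for _ in range(T)]

def vote(learners, x):
    """Majority voting: arg max over y of sum_t I(h_t(x) = y)."""
    return Counter(h(x) for h in learners).most_common(1)[0][0]

def learn_threshold(sample):
    """Toy base learner: threshold halfway between the two classes."""
    a = [x for x, y in sample if y == "a"]
    b = [x for x, y in sample if y == "b"]
    if not a or not b:                   # degenerate sample with one class only
        only = sample[0][1]
        return lambda x: only
    thr = (max(a) + min(b)) / 2
    return lambda x: "a" if x < thr else "b"

data = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (2.0, "b"), (2.2, "b"), (2.4, "b")]
ensemble = bagging(data, learn_threshold, T=25)
```

Calling `vote(ensemble, 0.1)` then aggregates the 25 individual predictions; for regression, the vote would be replaced by an average.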


Theoretical analysis [29.99, 103, 104] shows that 
Bagging is particularly effective with unstable base 
learners (whose performance will change significantly 
with even slight variation of the training sample), such 
as decision trees, because it has a tremendous vari- 
ance reduction effect, whereas it is not wise to apply 
Bagging to stable learners, such as nearest neighbor 
classifiers. 




A prominent extension of Bagging is the random 
forest method [29.105], which has been successfully 
deployed in many real tasks. Random forest incorpo- 
rates a randomized feature selection process in con- 
structing the individual decision trees. For each indi- 
vidual decision tree, at each step of split selection, it 
randomly selects a feature subset, and then executes 
conventional split selection within the feature subset. 
The recommended size of feature subsets is the loga- 
rithm of the number of all features [29.105]. 


29.6.4 Stacking 


Stacking [29.106-108] trains a meta-learner (or
second-level learner) to combine the individual learners
(or first-level learners). First-level learners are often
generated by different learning algorithms, and therefore,
stacked ensembles are often heterogeneous, although
it is also possible to construct homogeneous
stacked ensembles. A similar approach was proposed
at IJCNN 1991 [29.109], in which the meta-learner is
called an associative switch
and is learned from examples for combining multiple
classifiers.

Stacking can be viewed as a generalized framework 
of many ensemble methods, and can also be regarded 
as a specific combination method, i.e., combining by 
learning. It uses the original training examples to con- 
struct the first-level learners, and then generates a new 
data set to train the meta-learner, where the first-level 
learners’ outputs are used as input features whereas the 
original labels are used as labels. Notice that there will 
be a high risk of overfitting if the exact data that are 
used to train the first-level learners are also used to gen- 
erate the new data set for the meta-learner. Hence, it is 
recommended to exclude the training examples for the 
first-level learners from the data that are used for the 
meta-learner, and a cross-validation procedure is usu- 
ally used. 
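The cross-validation procedure for building the meta-learner's data can be sketched as follows: every example is predicted only by first-level models that were trained without it. The fold assignment by index, the two toy first-level learners, and all names are assumptions for illustration.

```python
def out_of_fold_features(X, y, learners, k=3):
    """Build meta-features: row i holds the first-level predictions for X[i],
    each made by a model trained on the folds that exclude example i."""
    n = len(X)
    meta = [[None] * len(learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for j, learn in enumerate(learners):
            h = learn(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = h(X[i])
    return meta

# Two hypothetical first-level learners for 1-D regression.
def learn_mean(X_tr, y_tr):
    mean = sum(y_tr) / len(y_tr)
    return lambda x: mean

def learn_1nn(X_tr, y_tr):
    return lambda x: min(zip(X_tr, y_tr), key=lambda p: abs(p[0] - x))[1]

X = [0, 1, 2, 3, 4, 5]
y = [0, 1, 2, 3, 4, 5]
meta = out_of_fold_features(X, y, [learn_mean, learn_1nn])
```

The rows of `meta`, paired with the original labels `y`, form the new data set on which the meta-learner is trained.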

It is crucial to consider the types of features for 
the new training data, and the types of learning al- 
gorithms for the meta-learner [29.106]. It has been 
suggested [29.110] to use class probabilities instead 
of crisp class labels as features for the new data, and 
to use multi-response linear regression (MLR) for the 
meta-learner. It has also been suggested [29.111] to use 
different sets of features for the linear regression prob- 
lems in MLR. 

If stacking (and many other ensemble methods) is
simply viewed as assigning weights to combine different
models, then it is closely related to Bayesian model
averaging (BMA), which assigns weights to models
based on posterior probabilities. In theory, if the correct
data generation model is under consideration and if
the noise level is low, BMA is never worse and often
better than stacking. In practice, however, BMA
rarely performs better than stacking, because the correct
data generation model is usually unknown, whereas
BMA is quite sensitive to model approximation error
[29.112].


29.6.5 Diversity 


If the base learners are independent, an amazing combination
effect will occur. Taking binary classification as
an example, suppose each base learner has an independent
generalization error $\epsilon < 0.5$ and $T$ learners are combined
via majority voting. Then, the ensemble makes an error
only when at least half of its base learners make errors.
Thus, by Hoeffding inequality, the generalization error
of the ensemble is

$$\sum_{k=0}^{\lfloor T/2 \rfloor} \binom{T}{k} (1-\epsilon)^k \epsilon^{T-k} \le \exp\left(-\frac{1}{2} T (2\epsilon - 1)^2\right) , \tag{29.48}$$

which implies that the generalization error decreases
exponentially with the ensemble size $T$, and ultimately approaches
zero as $T$ approaches infinity.
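The effect of (29.48) can be checked numerically. The sketch below evaluates the binomial sum and the Hoeffding bound for an individual error of 0.3; the function names and the chosen ensemble sizes are illustrative assumptions.

```python
from math import comb, exp, floor

def majority_vote_error(eps, T):
    """Probability that at most floor(T/2) of T independent learners are correct."""
    return sum(comb(T, k) * (1 - eps) ** k * eps ** (T - k)
               for k in range(floor(T / 2) + 1))

def hoeffding_bound(eps, T):
    """Right-hand side of (29.48)."""
    return exp(-0.5 * T * (2 * eps - 1) ** 2)

errors = {T: (majority_vote_error(0.3, T), hoeffding_bound(0.3, T))
          for T in (1, 11, 51)}
```

For a single learner the sum reduces to the individual error 0.3, and both the exact error and the bound shrink as more independent learners are added.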

It is practically impossible to obtain really indepen- 
dent base learners, but it is generally accepted that to 
construct a good ensemble, the base learners should 
be as accurate as possible, and as diverse as possi- 
ble. This has also been confirmed by error-ambiguity 
decomposition and bias-variance-covariance decompo- 
sition [29.113-115]. Generating diverse base learners, 
however, is not easy, because these learners are gener- 
ated from the same training data for the same learning 
problem, and thus they are usually highly correlated. 
Actually, we need to require that the base learners must 
not be very poor; otherwise their combination may even 
worsen the performance. 

Combining only accurate learners is often
worse than combining some accurate ones together
with some relatively weak ones, because
complementarity is more important than pure accuracy. Notice
that it is possible to do some selection to construct
a smaller but stronger ensemble after obtaining all base
learners [29.116], possibly because this makes it
easier to trade off between individual performance and
diversity.


Unfortunately, there is not yet a clear understanding
about diversity although it is crucial for ensemble
methods. Many efforts have been devoted to designing
diversity measures; however, none of them is well-accepted
[29.14, 117]. In practice, heuristics are usually
employed to generate diversity, and popular strategies include
manipulating data samples, input features, learning
parameters, and output representations [29.14].


29.7 Feature Selection and Extraction


Real-world data are often high-dimensional and contain
many spurious features. For example, in face recognition,
an image of size $m \times n$ is often represented as
a vector in $\mathbb{R}^{mn}$, which can be very high-dimensional
for typical values of $m$ and $n$. Similarly, biological
databases such as microarray data can have thousands
or even tens of thousands of genes as features. Such
a large number of features can easily lead to the curse
of dimensionality and severe overfitting. A simple approach
is to manually remove irrelevant features from
the data. However, this may not be feasible in practice.
Hence, automatic dimensionality reduction techniques,
in the form of either feature selection or feature extraction,
play a fundamental role in many machine learning
problems.

Feature selection selects only a relevant subset of
features for use with the model. In feature selection,
the features may be scored either individually or as
a subset. Not only can feature selection improve the
generalization performance of the resultant classifier,
the use of fewer features is also less computationally
expensive and thus implies faster testing. Moreover, it
can eliminate the need to collect a large number of irrelevant
and redundant features, and thus reduces cost.
The discovery of a small set of highly predictive variables
also enhances our understanding of the underlying
physical, biological, or natural processes, beyond just
the building of accurate black-box predictors.

Feature selection and extraction has been a classic
topic in the literature of pattern recognition for several
decades; many results obtained before the 1980s
are systematically summarized in [29.118]. Reviews of
further studies in the recent three decades can be found
in [29.119-121]. Roughly, feature selection methods
can be classified into three main paradigms: filters,
wrappers, and the embedded approach [29.120]. Filters
score the usefulness of the feature subset obtained as
a pre-processing step. Commonly used scores include
mutual information and the inter/intra class distance.
This filtering step is performed independently of the
classifier and is typically the least computationally expensive
among the three paradigms. Wrappers, on the other
hand, score the feature subsets according to their prediction
performance when used with the classifier. In
other words, the classifier is trained on each of the candidate
feature subsets, and the one with the best score
is then selected. However, as the number of candidate
feature subsets can be very large, this approach is computationally
expensive, though it is also expected to
perform better than filters. Both filters and wrappers
rely on search strategies to guide the search for the
best feature subset. While a large number of search
strategies can be used, one is often limited to the computationally
simple greedy strategies: (i) forward, in
which features are added to the candidate set one by
one; or (ii) backward, in which one starts with the full
feature set and deletes features one by one. Finally,
embedded methods combine feature selection with the
classifier to create a sparse model. For example, one can
use the $\ell_1$ regularizer which shrinks the coefficients of
the useless features to zero, essentially removing them
from the model. Another popular algorithm is called
recursive feature elimination [29.122] for use with support
vector machines. It repeatedly constructs a model
and then removes those features with low weights. Empirically,
embedded methods are often more efficient
than filters and wrappers [29.120].
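The greedy forward strategy mentioned above can be sketched independently of how subsets are scored. In the sketch below, `score` stands in for either a filter criterion or a wrapper's validation performance; the toy scoring function, the feature names, and the stopping rule (stop when no candidate improves the score) are assumptions for illustration.

```python
def forward_selection(features, score, max_features):
    """Greedy forward search: repeatedly add the feature that helps most."""
    selected = []
    best = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        top = max(candidates, key=lambda f: score(selected + [f]))
        top_score = score(selected + [top])
        if top_score <= best:            # no candidate improves the score
            break
        selected.append(top)
        best = top_score
    return selected, best

# Hypothetical score: individual gains, a redundancy penalty, a size penalty.
gains = {"x1": 0.30, "x2": 0.25, "x3": 0.01}

def toy_score(subset):
    s = sum(gains[f] for f in subset)
    if "x1" in subset and "x2" in subset:
        s -= 0.20                        # x1 and x2 are largely redundant
    return s - 0.02 * len(subset)        # small cost per selected feature

subset, value = forward_selection(list(gains), toy_score, max_features=3)
```

Backward elimination follows the same pattern, starting from the full feature set and removing the feature whose deletion hurts the score least.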

While most feature selection methods are super- 
vised, there are also recent works on feature selection 
in the unsupervised learning setting. However, unsuper- 
vised feature selection is much more difficult due to the 
lack of label information to guide the search for relevant 
features. Most unsupervised feature selection methods 
are based on the filter approach [29.123-125], though 
there are also some studies on wrappers [29.126] and 
embedded approaches [29.124, 127-129]. 

Recently, feature selection in multi-task learning (MTL)
has been receiving increasing attention. Recall that
the $\ell_1$ regularizer is commonly used to induce feature
selection in single-task learning; this is extended
to the mixed norms in MTL. Specifically, let $W = [w_1, w_2, \ldots, w_T]$,
where $w_t \in \mathbb{R}^d$ is the parameter associated
with the $t$-th task. To enforce joint sparsity across
the $T$ tasks, the $\ell_{\infty,1}$ norm of $W$ is used as the regularizer,
i.e., $\|W\|_{\infty,1} = \sum_i \max_{1 \le t \le T} |W_{it}|$ [29.130]. In
other words, one uses an $\ell_\infty$ norm on the rows of $W$
to combine the contributions of each row (feature) from
all the tasks, and then combines the features by using
the $\ell_1$ norm, which, because of its sparsity-encouraging
property, leads to only a few nonzero rows of $W$.
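The mixed norm is simple to compute, as the following sketch shows. The matrix is a hypothetical example with rows as features and columns as tasks; the function name is an assumption.

```python
def linf_1_norm(W):
    """Sum over rows (features) of the largest absolute entry across tasks."""
    return sum(max(abs(v) for v in row) for row in W)

# Hypothetical parameter matrix: 3 features (rows) by 3 tasks (columns).
W = [[0.5, -2.0, 1.0],
     [0.0,  0.0, 0.0],    # a feature discarded by every task
     [-0.3, 0.1, 0.2]]
value = linf_1_norm(W)
active_rows = [i for i, row in enumerate(W) if any(row)]
```

A row contributes to the norm only through its largest entry, so the penalty is reduced most by zeroing an entire row, which removes the feature from all tasks at once.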

Instead of only selecting a subset from the existing
set of features, feature extraction aims at extracting
a set of new features from the original features. This can
be viewed as performing dimensionality reduction that
maps the original features to a new lower-dimensional
feature space, while ensuring that the overall structure
of the data points remains intact. The unsupervised
methods previously introduced in Sect. 29.3.1 can all
be used for feature extraction. The classic ones consist
of principal component analysis (PCA) and principal
subspace analysis (PSA), and their complementary
counterparts minor component analysis (MCA)
and minor subspace analysis (MSA), as well as the
closely related factor analysis (FA), while independent
component analysis (ICA) and non-Gaussian factor
analysis (NFA) are further developments of PCA
and FA, respectively. Another popular further development
of PCA is kernel principal component analysis
(KPCA) [29.131].

Moreover, feature extraction and unsupervised
learning are coordinately conducted in many learning
tasks, such as local factor analysis (LFA) in
Sect. 29.3.2, nonnegative matrix factorization (NMF)
and manifold learning in Sect. 29.3.3, temporal and
hierarchical learning in Sect. 29.3.4, as well as other
latent factor featured methods. Furthermore, the use
of supervised information can lead to even better discriminative
features for classification problems. Linear
discriminant analysis (LDA) is the most classic example,
which results in the Bayes optimal transform
direction in the special case that the two classes are
normally distributed with the same covariance. Learning
multiple layer perceptron or neural networks can be
regarded as nonlinear extensions of LDA, with hidden
units extracting optimal features for supervised classification
and regression.


References


29.1 H. Simon: Why should machines learn? In: Machine
Learning. An Artificial Intelligence Approach,
ed. by I.R. Anderson, R.S. Michalski,
J.G. Carbonell, T.M. Mitchell (Tioga Publ., Palo Alto
1983)

29.2 L. Xu: Bayesian Ying Yang learning, Scholarpedia
2(3), 1809 (2007)

29.3 L. Xu: Bayesian Ying-Yang system, best harmony
learning, and five action circling, Front. Electr.
Electr. Eng. China 5(3), 281-328 (2010)

29.4 L. Xu, S. Klasa, A. Yuille: Recent advances on techniques
static feed-forward networks with supervised
learning, Int. J. Neural Syst. 3(3), 253-290
(1992)

29.5 L. Xu: Learning algorithms for RBF functions and
subspace based functions. In: Handbook of Research
on Machine Learning, Applications and
Trends: Algorithms, Methods and Techniques,
ed. by E. Olivas, J.D.M. Guerrero, M.M. Sober,
J.R.M. Benedito, A.J.S. Lopez (Inform. Sci. Ref.,
Hershey 2009) pp. 60-94

29.6 L. Xu: Several streams of progresses on unsupervised
learning: A tutorial overview, Appl. Inf. 1
(2013)

29.7 A. Jain: Data clustering: 50 years beyond k-means,
Pattern Recognit. Lett. 31, 651-666
(2010)

29.8 H. Kriegel, P. Kroger, A. Zimek: Clustering high-dimensional
data: A survey on subspace clustering,
pattern-based clustering, and correlation
clustering, ACM Trans. Knowl. Discov. Data 3(1), 1
(2009)

29.9 H. Yin: Advances in adaptive nonlinear mani- 
folds and dimensionality reduction, Front. Electr. 
Electr. Eng. China 6(1), 72-85 (2011) 

29.10 T. Kohonen, T. Honkela: Kohonen network, Scholarpedia
2(1), 1568 (2007)

29.11 L. Xu, J. Neufeld, B. Larson, D. Schuurmans: Maximum
margin clustering, Adv. Neural Inf. Process.
Syst. (2004) pp. 1537-1544

29.12 K. Zhang, I. Tsang, J. Kwok: Maximum margin
clustering made practical, IEEE Trans. Neural
Netw. 20(4), 583-596 (2009)

29.13 Y.-F. Li, I. Tsang, J. Kwok, Z.-H. Zhou: Tighter and
convex maximum margin clustering, Proc. 12th
Int. Conf. Artif. Intell. Stat. (2009)

29.14 Z.-H. Zhou: Ensemble Methods: Foundations and
Algorithms (Taylor Francis, Boca Raton 2012)

29.15 G. Tsoumakas, I. Katakis, I. Vlahavas: Mining
multi-label data. In: Data Mining and Knowledge
Discovery Handbook, 2nd edn., ed. by O. Maimon,
L. Rokach (Springer, Berlin, Heidelberg 2010)

29.16 C. Silla, A. Freitas: A survey of hierarchical classifi- 
cation across different application domains, Data 
Min. Knowl. Discov. 22(1/2), 31-72 (2010) 

29.17 W. Bi, J. Kwok: Multi-label classification on tree- 
and DAG-structured hierarchies, Proc. 28th Int. 
Conf. Mach. Learn. (2011) 


29.18 W. Bi, J. Kwok: Hierarchical multilabel classifica-
tion with minimum Bayes risk, Proc. Int. Conf.
Data Min. (2012)

29.19 W. Bi, J. Kwok: Mandatory leaf node prediction in
hierarchical multilabel classification, Adv. Neural
Inf. Process. Syst. (2012)

29.20 T.G. Dietterich, R.H. Lathrop, T. Lozano-Pérez:
Solving the multiple-instance problem with axis-
parallel rectangles, Artif. Intell. 89(1-2), 31-71
(1997)

29.21 Z.-H. Zhou, M.-L. Zhang: Solving multi-instance
problems with classifier ensemble based on con-
structive clustering, Knowl. Inf. Syst. 11(2), 155-170
(2007)

29.22 Z.-H. Zhou, Y.-Y. Sun, Y.-F. Li: Multi-instance
learning by treating instances as non-i.i.d. sam-
ples, Proc. 26th Int. Conf. Mach. Learn. (2009)
pp. 1249-1256

29.23 Z.-H. Zhou, J.-M. Xu: On the relation between
multi-instance learning and semi-supervised
learning, Proc. 24th Int. Conf. Mach. Learn. (2007)
pp. 1167-1174

29.24 N. Weidmann, E. Frank, B. Pfahringer: A two-level
learning method for generalized multi-instance
problem, Proc. 14th Eur. Conf. Mach. Learn. (2003)
pp. 468-479

29.25 S.D. Scott, J. Zhang, J. Brown: On generalized
multiple-instance learning, Int. J. Comput. Intell.
Appl. 5(1), 21-35 (2005)

29.26 Z.-H. Zhou, M.-L. Zhang, S.-J. Huang, Y.-F. Li:
Multi-instance multi-label learning, Artif. Intell.
176(1), 2291-2320 (2012)

29.27 J. Foulds, E. Frank: A review of multi-instance
learning assumptions, Knowl. Eng. Rev. 25(1), 1-
25 (2010)

29.28 L. Xu, A. Krzyzak, C. Suen: Several methods for
combining multiple classifiers and their applica-
tions in handwritten character recognition, IEEE
Trans. Syst. Man Cybern. SMC 22(3), 418-435 (1992)

29.29 J. Kittler, M. Hatef, R. Duin, J. Matas: On com-
bining classifiers, IEEE Trans. Pattern Anal. Mach.
Intell. 20(3), 226-239 (1998)

29.30 L. Xu, S.I. Amari: Combining classifiers and learn-
ing mixture-of-experts. In: Encyclopedia of Ar-
tificial Intelligence, ed. by J. Dopico, J. Dorado,
A. Pazos (Inform. Sci. Ref., Hershey 2008) pp. 318-
326

29.31 A. Blum, T. Mitchell: Combining labeled and un-
labeled data with co-training, Proc. 11th Annu.
Conf. Comput. Learn. Theory (1998) pp. 92-
100

29.32 S. Abney: Bootstrapping, Proc. 40th Annu. Meet.
Assoc. Comput. Linguist. (2002) pp. 360-367

29.33 M.-F. Balcan, A. Blum, K. Yang: Co-training and
expansion: Towards bridging theory and practice,
Adv. Neural Inf. Process. Syst. (2005) pp. 89-96

29.34 W. Wang, Z.-H. Zhou: A new analysis of co-
training, Proc. 27th Int. Conf. Mach. Learn. (2010)
pp. 1135-1142


29.35 Z.-H. Zhou, D.-C. Zhan, Q. Yang: Semi-supervised
learning with very few labeled training examples,
Proc. 22nd AAAI Conf. Artif. Intell. (2007) pp. 675-
680

29.36 W. Wang, Z.-H. Zhou: Multi-view active learning
in the non-realizable case, Adv. Neural Inf. Pro-
cess. Syst. (2010) pp. 2388-2396

29.37 R. Caruana: Multitask learning, Mach. Learn. 28(1),
41-75 (1997)

29.38 T. Evgeniou, M. Pontil: Regularized multi-task
learning, Proc. 10th Int. Conf. Know. Discov. Data
Min. (2004) pp. 109-117

29.39 T. Evgeniou, C.A. Micchelli, M. Pontil: Learning
multiple tasks with kernel methods, J. Mach.
Learn. Res. 6, 615-637 (2005)

29.40 A. Argyriou, T. Evgeniou, M. Pontil: Multi-task fea-
ture learning, Adv. Neural Inf. Process. Syst. (2007)
pp. 41-48

29.41 A. Argyriou, T. Evgeniou, M. Pontil: Convex multi-
task feature learning, Mach. Learn. 73(3), 243-272
(2008)

29.42 T. Kato, H. Kashima, M. Sugiyama, K. Asai: Multi-
task learning via conic programming, Adv. Neural
Inf. Process. Syst. (2007) pp. 737-744

29.43 R. Ando, T. Zhang: A framework for learning pre-
dictive structures from multiple tasks and un-
labeled data, J. Mach. Learn. Res. 6, 1817-1853
(2005)

29.44 Y. Zhang, D.-Y. Yeung: A convex formulation for
learning task relationships in multi-task learn-
ing, Proc. 24th Conf. Uncertain. Artif. Intell. (2010)
pp. 733-742

29.45 L. Jacob, F. Bach, J. Vert: Clustered multi-task
learning: A convex formulation, Adv. Neural Inf.
Process. Syst. (2008) pp. 745-752

29.46 L. Zhong, J. Kwok: Convex multitask learning with
flexible task clusters, Proc. 29th Int. Conf. Mach.
Learn. (2012)

29.47 J. Chen, J. Zhou, J. Ye: Integrating low-rank
and group-sparse structures for robust multi-task
learning, Proc. 17th Int. Conf. Knowl. Discov. Data
Min. (2011) pp. 42-50

29.48 S. Pan, J. Kwok, Q. Yang, J. Pan: Adaptive local-
ization in a dynamic WiFi environment through
multi-view learning, Proc. 22nd AAAI Conf. Artif.
Intell. (2007) pp. 1108-1113

29.49 S. Pan, J. Kwok, Q. Yang: Transfer learning via
dimensionality reduction, Proc. 23rd AAAI Conf.
Artif. Intell. (2008)

29.50 S. Pan, I. Tsang, J. Kwok, Q. Yang: Domain adapta-
tion via transfer component analysis, IEEE Trans.
Neural Netw. 22(2), 199-210 (2011)

29.51 W. Dai, Q. Yang, G. Xue, Y. Yu: Boosting for transfer
learning, Proc. 24th Int. Conf. Mach. Learn. (2007)
pp. 193-200

29.52 J. Huang, A. Smola, A. Gretton, K. Borgwardt,
B. Schölkopf: Correcting sample selection bias by
unlabeled data, Adv. Neural Inf. Process. Syst.
(2007) pp. 601-608


29.53 M. Sugiyama, S. Nakajima, H. Kashima, P.V. Bue-
nau, M. Kawanabe: Direct importance estimation
with model selection and its application to co-
variate shift adaptation, Adv. Neural Inf. Process.
Syst. (2008)

29.54 C. Elkan: The foundations of cost-sensitive learn-
ing, Proc. 17th Int. Jt. Conf. Artif. Intell. (2001)
pp. 973-978

29.55 Z.-H. Zhou, X.-Y. Liu: On multi-class cost-
sensitive learning, Proc. 21st Natl. Conf. Artif. In-
tell. (2006) pp. 567-572

29.56 X.-Y. Liu, Z.-H. Zhou: Learning with cost inter-
vals, Proc. 16th Int. Conf. Knowl. Discov. Data Min.
(2010) pp. 403-412

29.57 P.D. Turney: Types of cost in inductive concept
learning, Proc. 17th Int. Conf. Mach. Learn. (2000)
pp. 15-21

29.58 L. Xu: On essential topics of BYY harmony learn-
ing: Current status, challenging issues, and gene
analysis applications, Front. Electr. Electr. Eng.
China 7(1), 147-196 (2012)

29.59 L. Xu: Semi-blind bilinear matrix system, BYY har-
mony learning, and gene analysis applications,
Proc. 6th Int. Conf. New Trends Inf. Sci. Serv. Sci.
Data Min. (2012) pp. 661-666

29.60 L. Xu: Independent subspaces. In: Encyclopedia of
Artificial Intelligence, ed. by J. Dopico, J. Dorado,
A. Pazos (Inform. Sci. Ref., Hershey 2008) pp. 903-
912

29.61 L. Xu: Independent component analysis and ex-
tensions with noise and time: A Bayesian Ying-
Yang learning perspective, Neural Inf. Process.
Lett. Rev. 1(1), 1-52 (2003)

29.62 L. Xu: One-bit-matching ICA theorem, convex-
concave programming, and distribution approx-
imation for combinatorics, Neural Comput. 19,
546-569 (2007)

29.63 S. Tu, L. Xu: Parameterizations make different
model selections: Empirical findings from factor
analysis, Front. Electr. Electr. Eng. China 6(2), 256-
274 (2011)

29.64 P. Williams: Bayesian regularization and pruning
using a Laplace prior, Neural Comput. 7(1), 117-143
(1995)

29.65 R. Tibshirani: Regression shrinkage and selection
via the lasso, J. R. Stat. Soc. Ser. B: Methodol.
58(1), 267-288 (1996)

29.66 M. Figueiredo, A. Jain: Unsupervised learning of
finite mixture models, IEEE Trans. Pattern Anal.
Mach. Intell. 24(3), 381-396 (2002)

29.67 C. McGrory, D. Titterington: Variational approx-
imations in Bayesian model selection for finite
mixture distributions, Comput. Stat. Data Anal.
51(11), 5352-5367 (2007)

29.68 A. Corduneanu, C. Bishop: Variational Bayesian
model selection for mixture distributions, Proc.
8th Int. Conf. Artif. Intell. Stat. (2001) pp. 27-34

29.69 L. Xu: Rival penalized competitive learning,
Scholarpedia 2(8), 1810 (2007)


29.70 L. Xu: A unified perspective and new results
on RHT computing, mixture based learning, and
multi-learner based problem solving, Pattern
Recognit. 40(8), 2129-2153 (2007)

29.71 L. Xu: BYY harmony learning, structural RPCL, and
topological self-organizing on mixture models,
Neural Netw. 8-9, 1125-1151 (2002)

29.72 L. Xu, M. Jordan, G. Hinton: An alternative model
for mixtures of experts, Adv. Neural Inf. Process.
Syst. (1995) pp. 633-640

29.73 D. Lee, H. Seung: Learning the parts of ob-
jects by non-negative matrix factorization, Na-
ture 401(6755), 788-791 (1999)

29.74 S. Madeira, A. Oliveira: Biclustering algorithms
for biological data analysis: A survey, IEEE Trans.
Comput. Biol. Bioinform. 1(1), 25-45 (2004)

29.75 S. Tu, R. Chen, L. Xu: A binary matrix factorization
algorithm for protein complex prediction, Pro-
teome Sci. 9(Suppl 1), S18 (2011)

29.76 X. He, P. Niyogi: Locality preserving projections,
Adv. Neural Inf. Process. Syst. (2003) pp. 152-
160

29.77 X. He, B. Lin: Tangent space learning and general-
ization, Front. Electr. Electr. Eng. China 6(1), 27-42
(2011)

29.78 M. Meila, M. Jordan: Learning with mixtures of
trees, J. Mach. Learn. Res. 1, 1-48 (2000)

29.79 J. Pearl: Fusion, propagation and structuring
in belief networks, Artif. Intell. 29(3), 241-288
(1986), Sep.

29.80 L. Xu, J. Pearl: Structuring causal tree models with
continuous variables, Proc. 3rd Annu. Conf. Un-
certain. Artif. Intell. (1987) pp. 170-179

29.81 A. Barto: Temporal difference learning, Scholar-
pedia 2(11), 1604 (2007)

29.82 F. Woergoetter, B. Porr: Reinforcement learning,
Scholarpedia 3(3), 1448 (2008)

29.83 O. Chapelle, B. Schölkopf, A. Zien: Semi-Super-
vised Learning (MIT, Cambridge 2006)

29.84 X. Zhu: Semi-supervised learning literature survey
(Univ. of Wisconsin, Madison 2008)

29.85 Z.-H. Zhou, M. Li: Semi-supervised learning by
disagreement, Knowl. Inform. Syst. 24(3), 415-439
(2010)

29.86 Z.-H. Zhou: When semi-supervised learning
meets ensemble learning, Front. Electr. Electr.
Eng. China 6(1), 6-16 (2011)

29.87 V.N. Vapnik: Statistical Learning Theory (Wiley,
New York 1998)

29.88 Y.-F. Li, Z.-H. Zhou: Towards making unlabeled
data never hurt, Proc. 28th Int. Conf. Mach. Learn.
(2011) pp. 1081-1088

29.89 A. Fred, A.K. Jain: Data clustering using evidence
accumulation, Proc. 16th Int. Conf. Pattern Recog-
nit. (2002) pp. 276-280

29.90 A. Strehl, J. Ghosh: Cluster ensembles – A knowl-
edge reuse framework for combining multi-
ple partitions, J. Mach. Learn. Res. 3, 583-617
(2002)


29.91 R. Jacobs, M. Jordan, S. Nowlan, G. Hinton: Adap-
tive mixtures of local experts, Neural Comput. 3,
79-87 (1991)

29.92 R.E. Schapire: The strength of weak learnability,
Mach. Learn. 5(2), 197-227 (1990)

29.93 Y. Freund, R.E. Schapire: A decision-theoretic
generalization of on-line learning and an ap-
plication to boosting, J. Comput. Syst. Sci. 55(1),
119-139 (1997)

29.94 J. Friedman, T. Hastie, R. Tibshirani: Additive
logistic regression: A statistical view of boost-
ing (with discussions), Ann. Stat. 28(2), 337-407
(2000)

29.95 R.E. Schapire, Y. Singer: Improved boosting algo-
rithms using confidence-rated predictions, Mach.
Learn. 37(3), 297-336 (1999)

29.96 J. Zhu, S. Rosset, H. Zou, T. Hastie: Multi-class
AdaBoost, Stat. Interface 2, 349-360 (2009)

29.97 R.E. Schapire, Y. Freund, P. Bartlett, W.S. Lee:
Boosting the margin: A new explanation for the
effectiveness of voting methods, Ann. Stat. 26(5),
1651-1686 (1998)

29.98 L. Breiman: Prediction games and arcing algo-
rithms, Neural Comput. 11(7), 1493-1517 (1999)

29.99 L. Breiman: Bagging predictors, Mach. Learn.
24(2), 123-140 (1996)

29.100 C. Domingo, O. Watanabe: Madaboost: A modifi-
cation of AdaBoost, Proc. 13th Annu. Conf. Com-
put. Learn. Theory (2000) pp. 180-189

29.101 Y. Freund: An adaptive version of the boost by
majority algorithm, Mach. Learn. 43(3), 293-318
(2001)

29.102 B. Efron, R. Tibshirani: An Introduction to the
Bootstrap (Chapman & Hall, New York 1993)

29.103 A. Buja, W. Stuetzle: Observations on bagging,
Stat. Sin. 16(2), 323-351 (2006)

29.104 J.H. Friedman, P. Hall: On bagging and nonlinear
estimation, J. Stat. Plan. Inference 137(3), 669-
683 (2007)

29.105 L. Breiman: Random forests, Mach. Learn. 45(1),
5-32 (2001)

29.106 D.H. Wolpert: Stacked generalization, Neural
Netw. 5(2), 241-260 (1992)

29.107 L. Breiman: Stacked regressions, Mach. Learn.
24(1), 49-64 (1996)

29.108 P. Smyth, D. Wolpert: Stacked density estimation,
Adv. Neural Inf. Process. Syst. (1998) pp. 668-674

29.109 L. Xu, A. Krzyzak, C. Sun: Associative switch for
combining multiple classifiers, Int. Jt. Conf. Neu-
ral Netw. (1991) pp. 43-48

29.110 K.M. Ting, I.H. Witten: Issues in stacked general-
ization, J. Artif. Intell. Res. 10, 271-289 (1999)

29.111 A.K. Seewald: How to make stacking better and
faster while also taking care of an unknown
weakness, Proc. 19th Int. Conf. Mach. Learn. (2002)
pp. 554-561

29.112 B. Clarke: Comparing Bayes model averaging and
stacking when model approximation error cannot
be ignored, J. Mach. Learn. Res. 4, 683-712 (2003)


29.113 A. Krogh, J. Vedelsby: Neural network ensembles,
cross validation, and active learning, Adv. Neural
Inf. Process. Syst. (1995) pp. 231-238

29.114 N. Ueda, R. Nakano: Generalization error of en-
semble estimators, Proc. IEEE Int. Conf. Neural
Netw. (1996) pp. 90-95

29.115 G. Brown, J.L. Wyatt, P. Tino: Managing diversity
in regression ensembles, J. Mach. Learn. Res. 6,
1621-1650 (2005)

29.116 Z.-H. Zhou, J. Wu, W. Tang: Ensembling neural
networks: Many could be better than all, Artif. In-
tell. 137(1-2), 239-263 (2002)

29.117 L.I. Kuncheva, C.J. Whitaker: Measures of diver-
sity in classifier ensembles and their relationship
with the ensemble accuracy, Mach. Learn. 51(2),
181-207 (2003)

29.118 P. Devijver, J. Kittler: Pattern Recognition: A Sta-
tistical Approach (Prentice Hall, New York 1982)

29.119 Y. Saeys, I. Inza, P. Larrañaga: A review of feature
selection techniques in bioinformatics, Bioinfor-
matics 19(23), 2507-2517 (2007)

29.120 I. Guyon, A. Elisseeff: An introduction to variable
and feature selection, J. Mach. Learn. Res. 3, 1157-
1182 (2003)

29.121 A. Jain, R. Duin, J. Mao: Statistical pattern recog-
nition: A review, IEEE Trans. Pattern Anal. Mach.
Intell. 22, 1 (2000)

29.122 I. Guyon, J. Weston, S. Barnhill, V. Vapnik: Gene
selection for cancer classification using support
vector machines, Mach. Learn. 46(1-3), 389-422
(2002)

29.123 M. Dash, K. Choi, P. Scheuermann, H. Liu: Feature
selection for clustering – A filter solution, Proc.
2nd Int. Conf. Data Min. (2002) pp. 115-122, Dec.

29.124 M. Law, M. Figueiredo, A. Jain: Simultaneous fea-
ture selection and clustering using mixture mod-
els, IEEE Trans. Pattern Anal. Mach. Intell. 26(9),
1154-1166 (2004)

29.125 P. Mitra, C. Murthy, S.K. Pal: Unsupervised feature
selection using feature similarity, IEEE Trans. Pat-
tern Anal. Mach. Intell. 24(3), 301-312 (2002)

29.126 V. Roth: The generalized LASSO, IEEE Trans. Neural
Netw. 15(1), 16-28 (2004)

29.127 C. Constantinopoulos, M. Titsias, A. Likas:
Bayesian feature and model selection for Gaus-
sian mixture models, IEEE Trans. Pattern Anal.
Mach. Intell. 28(6), 1013-1018 (2006)

29.128 J. Dy, C. Brodley: Feature selection for unsuper-
vised learning, J. Mach. Learn. Res. 5, 845-889
(2004)

29.129 B. Zhao, J. Kwok, F. Wang, C. Zhang: Unsupervised
maximum margin feature selection with manifold
regularization, Proc. Int. Conf. Comput. Vis. Pat-
tern Recognit. (2009)

29.130 B. Turlach, W. Venables, S. Wright: Simultane-
ous variable selection, Technometrics 27, 349-363
(2005)

29.131 B. Schölkopf, A. Smola: Learning with Kernels (MIT
Press, Cambridge 2002)


30. Theoretical Methods in Machine Learning 


Badong Chen, Weifeng Liu, José C. Principe 


The problem of optimization in machine learning is well
established, but it entails several approximations. The theory of
Hilbert spaces, which is principled and well established, helps
solve the representation problem in machine learning by providing
a rich (universal) class of functions where the optimization can
be conducted. Working with functions is cumbersome, but for the
class of reproducing kernel Hilbert spaces (RKHSs) it is still
manageable provided the algorithm is restricted to inner products.
The best example is the support vector machine (SVM), which is
a batch mode algorithm that uses a very efficient (supralinear)
optimization procedure. However, the problem of SVMs is that they
display large memory and computational complexity. For the
large-scale data limit, SVMs are restrictive because for fast
operation the Gram matrix, which increases with the square of the
number of samples, must fit in computer memory. The computation in
this best-case scenario is also proportional to the square of the
number of samples. This is not specific to the SVM algorithm and
is shared by kernel regression. There are also other relevant data
processing scenarios such as streaming data (also called a time
series) where the size of the data is unbounded and potentially
nonstationary; therefore batch mode is not directly applicable and
brings added difficulties.

Online learning in kernel space is more efficient in many
practical large-scale data applications. As the training data are
sequentially presented to the learning system, online kernel
learning, in general, requires much less memory and computational
bandwidth. The drawback is that online algorithms only converge
weakly (in mean square) to the optimal solution, i.e., they only
have guaranteed convergence within a ball of radius ε around the
optimum (ε is controlled by the user). But because the theoretical
optimal ML solution has many approximations, this is one more
approximation that is worth exploring practically. The most
important recent advance in this field is the development of the
kernel adaptive filters (KAFs). The KAF algorithms are developed
in reproducing kernel Hilbert space (RKHS), by using the linear
structure of this space to implement well-established linear
adaptive algorithms (e.g., LMS, RLS, APA, etc.) and to obtain
nonlinear filters in the original input space. The main goal of
this chapter is to bring these new online learning techniques
closer to readers from both the machine learning and signal
processing communities. In this chapter, we focus mainly on the
kernel least mean square (KLMS), kernel recursive least squares
(KRLS), and the kernel affine projection algorithms (KAPAs). The
derivation of the algorithms and some key aspects, such as the
mean-square convergence and the sparsification of the solutions,
are discussed. Several illustration examples are also presented to
demonstrate the learning performance.

30.1 Background Overview
30.2 Reproducing Kernel Hilbert Spaces
30.3 Online Learning with Kernel Adaptive Filters
30.3.1 Kernel Least Mean Square (KLMS) Algorithm
30.3.2 Kernel Recursive Least Squares (KRLS) Algorithm
30.3.3 Kernel Affine Projection Algorithms (KAPA)
30.4 Illustration Examples
30.4.1 ... Prediction
30.4.2 ...
30.4.3 ...
30.4.4 Nonlinear Channel Equalization
30.5 Conclusion
References




30.1 Background Overview 


The general goal of machine learning is to build a model 
from data with the goal of extracting useful structure 
contained in the data. More specifically, machine learn- 
ing can be defined as a process by which the topology 
and the free parameters of a neural network (i. e., the 
learning machine) are adapted through a process of 
stimulation by the environment in which the network 
is embedded [30.1]. There is a wide variety of machine 
learning algorithms. Based on the desired response of 
the algorithm or the type of input available during 
training, machine learning algorithms can be divided 
into several categories: supervised learning, unsuper- 
vised learning, semi-supervised learning, reinforcement 
learning, and so on [30.2]. In this chapter, however, we 
focus mainly on supervised learning, and in particular, 
on the regression tasks. The goal of supervised learn- 
ing is, in general, to infer a function that maps inputs 
to desired outputs or labels that should have the gen- 
eralization property, that is, it should perform well on 
unseen data instances. 

In supervised machine learning problems, we assume that the data
pairs $\{x_i, z_i\}$ collected from real-world experiments are
stochastic and drawn independently from an unknown probability
distribution $P(X, Z)$ that represents the underlying phenomenon
we wish to model. The optimization problem is normally formulated
in terms of the expected risk $R(f)$ defined as
$R(f) = \int L(f(x), z)\, dP(x, z)$, where a loss function
$L(f(x), z)$ translates the goal of the analysis, and $f$ belongs
to a functional space. The optimization problem is to find the
minimal expected risk $R(f)$ among all possible functions, i.e.,
$f^* = \arg\min_f R(f)$. Unfortunately, we cannot work with
arbitrary functions in our model, so we restrict $f$ to a mapper
class $F$, and very likely $f^* \notin F$. For instance, if our
mapper is linear, then the functional class is the linear set,
which is small, albeit important, and so we will approximate $f^*$
by the closest linear function, sometimes committing a large
error. But even if the mapper is a multilayer perceptron with
fixed topology, the same problem exists, although the error will
likely be smaller. The best solution is therefore
$f_F^* = \arg\min_{f \in F} R(f)$, and it represents the first
source of implementation error experimenters face. But this is not
the only problem. We also normally do not know $P(X, Z)$ in
advance (indeed, in machine learning this is normally the goal of
the analysis). Therefore, we resort to the law of large numbers
and approximate the expected risk by
$R_N(f) = \frac{1}{N} \sum_i L(f(x_i), z_i)$, which we call the
empirical risk. Therefore, our optimization goal becomes
$f_N = \arg\min_{f \in F} R_N(f)$, which is normally achieved by
optimization algorithms. The difference between the optimal
solution $R(f^*)$ and the solution $R_N(f_N)$ achieved with the
finite number of samples and the chosen mapper can be written as

$$R_N(f_N) - R(f^*) = \left[ R(f_F^*) - R(f^*) \right] + \left[ R_N(f_N) - R(f_F^*) \right],$$


where the first term is the approximation error while the second
term is the estimation error. The optimization itself is also
subject to constraints, as one can imagine. The major compromise
is how to treat the two terms. Statisticians favor algorithms that
decrease the second term (estimation error) as fast as possible,
while optimization experts concentrate on supralinear algorithms
to minimize the first term (approximation error). But in
large-scale data problems, one major consideration is the
optimization time under these optimal assumptions, which can
become prohibitively large. This paper, among others [30.3],
defines a third error $\rho$ called the optimization error, which
arises when the practical optimal solution $R_N(f_N)$ is
approximated by $R_N(\tilde{f}_N)$, provided one can find
$R_N(\tilde{f}_N)$ simply with algorithms that are $O(N)$ in time
and memory usage. Basically, the final solution
$R_N(\tilde{f}_N)$ will lie in a neighborhood of radius $\rho$
around the optimal solution.
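The role of the mapper class can be illustrated numerically. The following is a minimal sketch, not taken from the chapter: it assumes a squared loss, a hypothetical phenomenon z = x^2 plus small Gaussian noise, and a linear mapper class fitted by ordinary least squares. The names `empirical_risk` and `fit_linear` are illustrative only.

```python
import random


def empirical_risk(f, data):
    """R_N(f) = (1/N) * sum_i L(f(x_i), z_i), here with squared loss."""
    return sum((f(x) - z) ** 2 for x, z in data) / len(data)


def fit_linear(data):
    """Least-squares fit of f(x) = a*x + b, i.e. the best f in a linear class F."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_z = sum(z for _, z in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in data)
    a = sxz / sxx
    b = mean_z - a * mean_x
    return lambda x: a * x + b


random.seed(0)
# Assumed data-generating process (hypothetical): z = x^2 + noise.
data = []
for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    data.append((x, x * x + random.gauss(0.0, 0.05)))

f_linear = fit_linear(data)          # minimizer of R_N over the linear class
f_true = lambda x: x * x             # the noise-free target, outside the linear class

risk_linear = empirical_risk(f_linear, data)
risk_true = empirical_risk(f_true, data)
print(risk_linear, risk_true)
```

In this setup the linear class cannot represent the quadratic target, so the empirical risk of the best linear fit stays well above the noise floor reached by the true function; that gap is dominated by the approximation error discussed above.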

Let us now explain how the powerful mathematical 
tool called the RKHS has been widely utilized in the 
areas of machine learning [30.4,5]. It is well known 
that the probability of linearly shattering data tends 
to one with the increase in dimensionality of the data 
space. However, the main bottleneck of this technique 
was the large number of free parameters of the high- 
dimensional classifiers, which results in two difficult 
issues: expensive computation and the need to regu- 
larize the solutions. The RKHS (also kernel space or 
feature space) provides a nice way to simplify the com- 
putation. The dimension of an RKHS can be very high 
(even infinite), but by the kernel trick the calculation in 
RKHS can still be done efficiently in the original in- 
put space if the algorithms can be expressed in terms of 
the inner products. Vapnik proposed a robust regularizer 
in support vector machine (SVM), which promoted the 
application of RKHS in pattern recognition [30.4, 5]. 

Kernel-based learning algorithms have been successfully applied in
batch settings (e.g., SVM). Batch kernel learning algorithms,
however, usually impose a significant memory and computational
burden because they must retain all the training data and
calculate a large Gram matrix. In many practical situations,
online kernel learning (OKL) is more efficient. Since the training
data are presented to the learning system sequentially (one by
one), OKL in general requires much less memory and computational
cost. Another key advantage of OKL algorithms is that they can
easily deal with nonstationary (time-varying) environments (i.e.,
where the data statistics change over time).

Traditional linear adaptive filtering algorithms like the least
mean square (LMS) and recursive least squares (RLS) are the best
known and simplest online learning algorithms, especially in the
signal processing community [30.6–8]. In recent years, many
researchers have used the RKHS to design nonlinear adaptive
filters, namely, the kernel adaptive filters (KAF) [30.9]. The KAF
algorithms are developed in RKHS, by using the linear structure
(inner product) of this space to implement the well-established
linear adaptive algorithms and to obtain (by the kernel trick)
nonlinear filters in the original input space. Up to now, many KAF
algorithms have been proposed. Typical examples include the KLMS
[30.10], kernel affine projection algorithms (KAPA) [30.11],
kernel recursive least squares (KRLS) [30.12], and the extended
kernel recursive least squares (EX-KRLS) [30.13]. If the kernel is
Gaussian, these nonlinear filters build a radial basis function
(RBF) network with a growing structure, where centers are placed
at the projected samples and the weights are directly related to
the errors at each sample.
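The growing RBF structure can be sketched in a few lines. The code below is an illustrative sketch of a KLMS-style update with a Gaussian kernel, assuming the common form in which each new sample becomes a center whose weight is the step size times the prediction error at that sample. The class name `KLMS`, the parameter values, and the target function sin(3x) are assumptions made for the example, not taken from the chapter.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for tuples x and y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class KLMS:
    """Sketch of a kernel LMS filter: an RBF network that grows by one center per sample."""

    def __init__(self, step_size=0.5, sigma=0.3):
        self.step_size = step_size
        self.sigma = sigma
        self.centers = []   # stored input samples
        self.weights = []   # step_size * prediction error at each stored sample

    def predict(self, x):
        return sum(w * gaussian_kernel(c, x, self.sigma)
                   for c, w in zip(self.centers, self.weights))

    def update(self, x, z):
        """Predict, then add x as a new center weighted by the scaled error."""
        error = z - self.predict(x)
        self.centers.append(x)
        self.weights.append(self.step_size * error)
        return error


random.seed(0)
filt = KLMS(step_size=0.5, sigma=0.3)
errors = []
for _ in range(500):
    x = (random.uniform(-1.0, 1.0),)
    z = math.sin(3.0 * x[0])        # hypothetical noise-free target
    errors.append(filt.update(x, z))

early_mse = sum(e * e for e in errors[:50]) / 50
late_mse = sum(e * e for e in errors[-50:]) / 50
print(len(filt.centers), early_mse, late_mse)
```

Note that the network holds one center per processed sample, so both memory and the cost of each prediction grow linearly with the number of samples seen.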

The main bottleneck of KAF algorithms (and many other OKL
algorithms) is their growing structure. This drawback results in
increasing memory and computational requirements, especially in
continuous adaptation, where the number of centers grows without
bound. In order to make the KAF algorithms practically useful, it
is crucial to find a way to curb the network growth and to obtain
a compact representation. Some sparsification rules can be applied
to address this issue [30.9]. According to these sparsification
rules, the new input is accepted as a new center (i.e., inserted
into the center dictionary) only if it is judged to be an
important input under a certain criterion. Popular sparsification
criteria include the novelty criterion (NC) [30.14], coherence
criterion (CC) [30.15], approximate linear dependency (ALD)
criterion [30.12], surprise criterion (SC) [30.16], and so on. In
addition, the quantization approach can also be used to sparsify
the solution and produce a compact network with desirable accuracy
[30.17].
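As an illustration of such a rule, the novelty criterion is commonly stated as a two-stage test: a new input is rejected if it lies too close to an existing center, and otherwise it is accepted only if the prediction error it produces is large enough. The sketch below assumes that form. The function name `novelty_criterion`, the thresholds `delta1` and `delta2`, and the choice to always accept the first input are illustrative assumptions.

```python
import math


def novelty_criterion(x, error, centers, delta1, delta2):
    """Return True if input x should be added to the center dictionary.

    x       : new input (tuple of floats)
    error   : prediction error of the current filter at x
    centers : list of stored centers (tuples of floats)
    delta1  : minimum distance to the dictionary (assumed threshold)
    delta2  : minimum absolute prediction error (assumed threshold)
    """
    if not centers:
        return True  # assumption: an empty dictionary accepts the first input
    nearest = min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c))) for c in centers
    )
    if nearest < delta1:
        return False  # too close to an existing center: not novel
    return abs(error) > delta2  # far enough; accept only if poorly predicted


centers = [(0.0, 0.0), (1.0, 0.0)]
near_known = novelty_criterion((0.05, 0.0), 1.0, centers, delta1=0.1, delta2=0.05)
far_large_error = novelty_criterion((0.5, 0.5), 1.0, centers, delta1=0.1, delta2=0.05)
far_small_error = novelty_criterion((0.5, 0.5), 0.01, centers, delta1=0.1, delta2=0.05)
print(near_known, far_large_error, far_small_error)
```

In an online filter this test would be evaluated before appending a new center, so that redundant or already well-predicted inputs do not enlarge the network.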

Besides the RKHS, fundamental concepts and prin- 
ciples from information theory can also be applied 
in the areas of signal processing and machine learn- 
ing. For example, information theoretic descriptors like 
entropy and divergence have been widely used as sim- 
ilarity metrics and optimization criteria in information 
theoretic learning (ITL) [30.18]. These descriptors are 
particularly useful for nonlinear and non-Gaussian sit- 
uations since they capture the information content and 
higher order statistics of signals rather than simply their 
energy (i.e., second-order statistics like variance and 
correlation). Recent studies show that the ITL is closely 
related to RKHS. The quantity of correntropy in ITL 
is in essence a correlation measure in RKHS [30.19]. 
Many ITL costs can also be formulated in an RKHS 
induced by the Mercer kernel function defined as the 
cross information potential (CIP) [30.20]. The popular 
quadratic information potential (QIP) can be expressed 
as a squared norm in this RKHS. The estimators of in- 
formation theoretic quantities can also be reinterpreted 
in RKHS. For example, the nonparametric kernel esti- 
mator of the QIP can be expressed as a squared norm of 
the mean vector of the data in kernel space [30.20]. 

The focus of the present chapter is mainly on a large 
family of online kernel learning algorithms, the kernel 
adaptive filters. Several basic learning algorithms are 
introduced. Some key aspects about these algorithms 
are discussed, and several illustration examples are pre- 
sented. Although our focus is on the kernel adaptive 
filtering, the basic ideas will be applicable to many 
other online learning methods. 


30.2 Reproducing Kernel Hilbert Spaces 


A Hilbert space is a linear, complete, and normed space equipped
with an inner product. A reproducing kernel Hilbert space is
a special Hilbert space associated with a kernel $k$ such that it
reproduces (via an inner product) each function $f$ in the space.
Let $X$ be a set (usually a compact subset of $\mathbb{R}^d$) and
$k(x, y)$ be a real-valued bivariate function on $X \times X$.
Then the function $k(x, y)$ is said to be nonnegative definite if
for any finite point set $\{x_i \in X\}_{i=1}^N$ and for any real
number set $\{\alpha_i \in \mathbb{R}\}_{i=1}^N$,

$$\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j k(x_i, x_j) \ge 0. \qquad (30.1)$$

If the above inequality is strict for all nonzero vectors
$\alpha = [\alpha_1, \ldots, \alpha_N]$, the function $k(x, y)$ is
said to be strictly positive definite (SPD). The following theorem
shows that any symmetric and nonnegative definite bivariate
function $k(x, y)$ is a reproducing kernel.
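The quadratic form in (30.1) can be evaluated directly. The sketch below is illustrative only: it samples random coefficient vectors for a Gaussian kernel on scalar inputs and, for contrast, evaluates the same form for the distance function |x - y|, which is symmetric but not nonnegative definite. The function names and parameter values are assumptions.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def quadratic_form(kernel, points, alphas):
    """sum_i sum_j alpha_i * alpha_j * k(x_i, x_j), the left-hand side of (30.1)."""
    return sum(
        ai * aj * kernel(xi, xj)
        for ai, xi in zip(alphas, points)
        for aj, xj in zip(alphas, points)
    )


random.seed(1)
points = [random.uniform(-2.0, 2.0) for _ in range(8)]

# Random spot checks: this is evidence, not a proof, of nonnegative definiteness.
values = []
for _ in range(100):
    alphas = [random.uniform(-1.0, 1.0) for _ in range(8)]
    values.append(quadratic_form(gaussian_kernel, points, alphas))
min_value = min(values)

# Counterexample: |x - y| is symmetric but violates (30.1).
distance = lambda x, y: abs(x - y)
counter_value = quadratic_form(distance, [0.0, 1.0], [1.0, -1.0])
print(min_value, counter_value)
```

With points 0 and 1 and coefficients (1, -1), the distance function gives a quadratic form of -2, so it cannot serve as a reproducing kernel.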


Theorem 30.1 (Moore-Aronszajn [30.21, 22])

Any symmetric, nonnegative definite function $k(x, y)$ defines
implicitly a Hilbert space $\mathcal{H}_k$ that consists of
functions on $X$ such that:

1) $\forall x \in X$, $k(\cdot, x) \in \mathcal{H}_k$,
2) $\forall x \in X$, $\forall f \in \mathcal{H}_k$, $f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}_k}$,

where $\langle \cdot, \cdot \rangle_{\mathcal{H}_k}$ denotes the
inner product in $\mathcal{H}_k$.

Property 2) is the so-called reproducing property. In this case,
we call $k(x, y)$ a reproducing kernel, and $\mathcal{H}_k$ an
RKHS defined by the reproducing kernel $k(x, y)$. Usually, the
space $X$ is also called the input space. Property 1) indicates
that each point in the input space $X$ can be mapped onto
a function in a potentially much higher dimensional RKHS
$\mathcal{H}_k$. The nonlinear mapping from the input space to the
RKHS is defined as $\Phi(x) = k(\cdot, x)$. In particular, we have

$$\langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}_k} = \langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}_k} = k(x, y). \qquad (30.2)$$


Thus the inner products in high-dimensional RKHS can 
be simply calculated via kernel evaluation. This is nor- 
mally called the kernel trick. Note that the RKHS is 
defined by the selected kernel function, and the simi- 
larity between functions in the RKHS is also defined 
by the kernel since it defines the inner product of 
functions. 
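
The kernel trick can be checked numerically on a small case. The sketch below is only an illustration: it assumes the homogeneous polynomial kernel of degree 2 (Table 30.1 with $p = 2$ and, as a simplification, $c = 0$), for which an explicit feature map is known, and the function names are hypothetical.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: (x^T y)^2 (assumes c = 0)."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map: all pairwise products x_a * x_b (d**2 entries)."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)

# Inner product computed in the 9-dimensional feature space ...
inner_in_feature_space = float(np.dot(poly2_features(x), poly2_features(y)))
# ... and the same number obtained from one kernel evaluation in input space.
kernel_value = poly2_kernel(x, y)
```

Both quantities agree up to rounding, although the second never forms the feature vectors.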

The next theorem guarantees the existence of a non- 
linear mapping between the input space and a high- 
dimensional feature space (a vector space in which the 
training data are embedded). 


Theorem 30.2 (Mercer's [30.23])
Let $\kappa \in L_\infty(X \times X)$ be a symmetric bivariate kernel
function. If $\kappa$ is the kernel of a positive integral operator
in $L_2(X)$, and $X$ is a compact subset of $\mathbb{R}^d$, then

$$\forall \psi \in L_2(X): \int_{X \times X} \kappa(x, y) \psi(x) \psi(y) \, dx \, dy \ge 0 . \tag{30.3}$$

Let $\phi_i \in L_2(X)$ be the normalized orthogonal eigenfunctions
of the integral operator, and $\lambda_i$ the corresponding
positive eigenvalues. Then

$$\kappa(x, y) = \sum_{i=1}^{M} \lambda_i \phi_i(x) \phi_i(y) , \tag{30.4}$$

where $M \le \infty$. Since the eigenvalues are positive, one
can readily construct a nonlinear mapping $\varphi$ from the
input space $X$ to a feature space $F$

$$\varphi: X \to F , \qquad \varphi(x) = \left[ \sqrt{\lambda_1} \phi_1(x), \sqrt{\lambda_2} \phi_2(x), \ldots \right]^T . \tag{30.5}$$

The dimension of $F$ is $M$, i.e., the number of the positive
eigenvalues (which can be infinite in the strictly
positive definite kernel case).


Fig. 30.1 Nonlinear map $\varphi(\cdot)$ between the input space and
the feature space


Table 30.1 Some well-known kernels defined over $X \times X$,
$X \subset \mathbb{R}^d$ ($c > 0$, $\sigma > 0$, $p \in \mathbb{N}$)

Kernels                  Expressions
Polynomial               $\kappa(x, y) = (c + x^T y)^p$
Exponential              $\kappa(x, y) = \exp(x^T y / 2\sigma^2)$
Sigmoid                  $\kappa(x, y) = \tanh(x^T y / \sigma + c)$
Gaussian                 $\kappa(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$
Laplacian                $\kappa(x, y) = \exp(-\|x - y\| / 2\sigma^2)$
Cosine                   $\kappa(x, y) = \exp(\angle(x, y))$
Multiquadratic           $\kappa(x, y) = \sqrt{\|x - y\|^2 + c}$
Inverse multiquadratic   $\kappa(x, y) = 1 / \sqrt{\|x - y\|^2 + c}$




The feature space $F$ is isometric-isomorphic to the
RKHS $\mathcal{H}_\kappa$ induced by the kernel. This can be easily
recognized by identifying $\varphi(x) = \Phi(x) = \kappa(\cdot, x)$. In
general, we do not distinguish these two spaces if no
confusion arises.

Now the basic idea of kernel-based learning algorithms
can be simply described as follows: Via
a nonlinear mapping $\varphi: X \to F$ (or $\Phi: X \to \mathcal{H}_\kappa$),
the data $\{x_i \in X\}_{i=1}^{N}$ are mapped into a high-dimensional
(usually $M \gg d$) feature space $F$ with a linear
structure (Fig. 30.1). Then a learning problem in
$X$ is solved in $F$ instead, by working with
$\{\varphi(x_i) \in F\}_{i=1}^{N}$. As long as an algorithm can be formulated in
terms of the inner products in $F$, all the operations
can be done in the input space (via kernel evaluations).
Because $F$ is high dimensional, a simple linear
learning algorithm (preferably one expressed solely
in terms of inner products) in $F$ can solve arbitrarily
nonlinear problems in the input space, provided
that $F$ is rich enough to represent the mapping (the
feature space can be universal if it is infinite dimensional).

The kernel function $\kappa$ is a crucial factor in all kernel
methods because it defines the similarity between
data points. Some well-known kernels are listed in
Table 30.1. Among these kernels, the Gaussian kernel is
the most popular and is, in general, the default choice due to
its universal approximating capability (the Gaussian kernel
is strictly positive definite), desirable smoothness, and
numerical stability.


30.3 Online Learning with Kernel Adaptive Filters 


In this section, we discuss several important online
kernel learning algorithms, i.e., the kernel adaptive
filtering algorithms. Suppose our goal is to learn a continuous
input-output mapping $f: U \to D$ based on a sequence
of input-output examples (the so-called training
data) $\{u(i), d(i)\}$, $i = 1, 2, \ldots$, where $U \subset \mathbb{R}^m$ is the
input domain and $D \subset \mathbb{R}$ is the desired output space. This
supervised learning problem can be solved online (sequentially)
using an adaptive filter. Figure 30.2 shows
a general scheme of an adaptive filter. Usually, an adaptive
filter consists of three elements:


1) The input—output training data. 

2) The structure (or topology) of the filter, with a set 
of unknown parameters (or weights) w. 

3) An optimization criterion J (or cost function). 


An adaptive filtering algorithm will adjust the filter
parameters so as to minimize the disparity (measured by
the cost function) between the filter output and the desired
output. The filter topology can be a simple linear structure
(e.g., the FIR filter) or any nonlinear network structure
(e.g., MLPs, RBF networks, etc.). The cost function is, in
general, the mean square error (MSE) or the least-squares
(LS) cost. The adaptive filtering algorithm is usually
a gradient-based algorithm.

Fig. 30.2 General configuration of an adaptive filter (input $u(i)$,
adaptive filter $W(i)$, desired output $d(i)$, cost function $J = E[e^2(i)]$)

The great appeal of developing adaptive filters in 
RKHS is to utilize the linear structure of this space to 
implement well-established linear adaptive filtering al- 
gorithms and to achieve nonlinear filters in the input 
space. Compared with other nonlinear adaptive filters, 
the KAFs have several desirable features: 


1) With a universal kernel (e.g., the Gaussian kernel),
they are universal approximators.

2) Under the MSE criterion, the performance surface is
still quadratic, so gradient descent learning does not
suffer from local minima.

3) With the redundant features pruned, they have moderate
complexity in terms of computation and
memory.


Table 30.2 gives the comparison of different adap- 
tive filters [30.9]. 


30.3.1 Kernel Least Mean Square (KLMS) Algorithm


Among the family of KAFs, the KLMS is the simplest; it is
derived by directly mapping the linear least mean square
(LMS) algorithm into RKHS [30.10]. Before proceeding, we
briefly review the well-known LMS algorithm.


Table 30.2 Comparison of different adaptive filters

Adaptive filters              Modeling capacity      Convexity   Complexity
Linear adaptive filters       Linear only            Yes         Very simple
Hammerstein, Wiener models    Limited nonlinearity   No          Simple
Volterra, Wiener series       Universal              Yes         Very high
Time-lagged neural networks   Universal              No          Modest
Recurrent neural networks     Universal              No          High
Kernel adaptive filters       Universal              Yes         Modest
Recursive Bayesian filters    Universal              No          Very high


LMS Algorithm
Usually, the LMS algorithm assumes a linear finite impulse
response (FIR) filter (or transversal filter), whose
output at iteration $i$ is simply a linear combination of
the input

$$y(i) = w(i-1)^T u(i) , \tag{30.6}$$

where $w(i-1)$ denotes the estimated weight vector at
iteration $i-1$. With the above linear model, the LMS
algorithm can be given as follows

$$\begin{aligned}
w(0) &= 0 , \\
e(i) &= d(i) - w(i-1)^T u(i) , \\
w(i) &= w(i-1) + \eta e(i) u(i) ,
\end{aligned} \tag{30.7}$$

where $e(i) = d(i) - y(i)$ is the prediction error, and $\eta > 0$
is the step size. The LMS algorithm is in essence
a stochastic gradient-based algorithm under the instantaneous
MSE cost $J(i) = e^2(i)/2$. In fact, the weight
update equation of the LMS can be simply derived as

$$\begin{aligned}
w(i) &= w(i-1) - \eta \frac{\partial}{\partial w(i-1)} \left( \frac{1}{2} e^2(i) \right) \\
&= w(i-1) - \eta e(i) \frac{\partial}{\partial w(i-1)} \left( d(i) - w(i-1)^T u(i) \right) \\
&= w(i-1) + \eta e(i) u(i) .
\end{aligned} \tag{30.8}$$

The LMS algorithm has been widely applied in adaptive
signal processing due to its simplicity and efficiency
[30.6-8]. The robustness of the LMS has been
proven in [30.24], and it has been shown that a single
realization of the LMS is optimal in the $H_\infty$ sense.
The step size $\eta$ is a crucial parameter and has significant
influence on the learning performance. It controls
the compromise between convergence speed and misadjustment.
In practice, the selection of step size should
guarantee the stability and convergence rate of the algorithm.
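
The LMS recursion (30.7) can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the three-tap system `w_true`, the noise level, and the step size are hypothetical values chosen only to make the example run.

```python
import numpy as np

def lms(U, d, eta):
    """Run the LMS recursion (30.7) over the rows of U.

    Returns the final weight vector and the sequence of prediction errors.
    """
    w = np.zeros(U.shape[1])            # w(0) = 0
    errors = np.empty(len(d))
    for i in range(len(d)):
        e = d[i] - w @ U[i]             # e(i) = d(i) - w(i-1)^T u(i)
        w = w + eta * e * U[i]          # w(i) = w(i-1) + eta e(i) u(i)
        errors[i] = e
    return w, errors

rng = np.random.default_rng(1)
w_true = np.array([0.5, -1.0, 2.0])     # hypothetical linear system to identify
U = rng.standard_normal((2000, 3))      # white input vectors
d = U @ w_true + 0.01 * rng.standard_normal(2000)
w_hat, errors = lms(U, d, eta=0.05)
```

With a small step size the weights settle close to `w_true`; a larger step size converges faster but leaves a larger misadjustment.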

The LMS algorithm is sensitive to the input power.
In order to guarantee the stability and improve the performance,
one often uses the normalized LMS (NLMS)
algorithm, a variant of the LMS algorithm in which
the step-size parameter is normalized by the input
power, that is

$$w(i) = w(i-1) + \frac{\eta}{\| u(i) \|^2} e(i) u(i) . \tag{30.9}$$


KLMS Algorithm 
The LMS algorithm assumes a linear FIR filter, and 
hence the performance will become very poor if the un- 
known mapping is highly nonlinear. To overcome this 
limitation, we are motivated to formulate a similar algo- 
rithm in a high-dimensional feature space (or equivalent 
RKHS), which is capable of learning arbitrary nonlin- 
ear mapping. This is the motivation of the development 
of the kernel adaptive filtering algorithms. 

Let us come back to the previous nonlinear learning
problem, i.e., learning a continuous arbitrary input-output
mapping $f$ based on a sequence of input-output
examples $\{u(i), d(i)\}$, $i = 1, 2, \ldots$ Online learning finds
sequentially an estimate of $f$ such that $f_i$ (the estimate at
iteration $i$) is updated based on the last estimate $f_{i-1}$ and
the current example $\{u(i), d(i)\}$. This recursive process
can be done in the feature space. First, we transform
the input $u(i)$ into a high-dimensional feature space $F$
by a kernel-induced nonlinear mapping $\varphi(\cdot)$. Second,
we assume a linear model in the feature space, which is
in a form similar to the linear model in (30.6)

$$y(i) = \Omega(i-1)^T \varphi(i) , \tag{30.10}$$

where $\varphi(i) = \varphi(u(i))$ is the mapped feature vector from
the input space to the feature space, and $\Omega(i-1)$ denotes
a high-dimensional weight vector in feature space.
Third, we develop a linear adaptive filtering algorithm
based on the model (30.10) and the transformed training
data $\{\varphi(i), d(i)\}$, $i = 1, 2, \ldots$ If we can formulate this
linear adaptive algorithm in terms of the inner products,
we will obtain a nonlinear adaptive algorithm in input
space, namely the kernel adaptive filtering algorithm.

Performing the LMS algorithm on the model
(30.10) with the new example sequence $\{\varphi(i), d(i)\}$ yields
the KLMS algorithm [30.10]

$$\begin{aligned}
\Omega(0) &= 0 , \\
e(i) &= d(i) - \Omega(i-1)^T \varphi(i) , \\
\Omega(i) &= \Omega(i-1) + \eta e(i) \varphi(i) .
\end{aligned} \tag{30.11}$$

The KLMS is very similar to the LMS algorithm,
except for the dimensionality (or richness) of the projection
space. By identifying $\varphi(u) = \kappa(u, \cdot)$, one can
easily obtain the learning rule in the input space

$$\begin{aligned}
f_0 &= 0 , \\
e(i) &= d(i) - f_{i-1}(u(i)) , \\
f_i &= f_{i-1} + \eta e(i) \kappa(u(i), \cdot) .
\end{aligned} \tag{30.12}$$

The KLMS can be viewed as the solution of the following
regularized least squares problem

$$\min_{f \in \mathcal{H}_\kappa} \left( d(i) - f(u(i)) \right)^2 + \gamma \| f - f_{i-1} \|_{\mathcal{H}_\kappa}^2 . \tag{30.13}$$

The above formula can be rewritten as

$$\min_{\Delta f_i \in \mathcal{H}_\kappa} \left( e(i) - \Delta f_i(u(i)) \right)^2 + \gamma \| \Delta f_i \|_{\mathcal{H}_\kappa}^2 , \tag{30.14}$$

where $\Delta f_i = f_i - f_{i-1}$. From (30.14), we observe:


1) The learning of KLMS at iteration $i$ is equivalent
to solving a regularized least squares problem, in
which the previous estimate $f_{i-1}$ is frozen, and only
the adjustment term $\Delta f_i$ is solved.

2) In this least squares problem, there is only one training
example involved, i.e., $\{u(i), e(i)\}$.

3) The regularization factor is directly related to the
step size via $\gamma = (1 - \eta)/\eta$.


It has been proven in [30.10] that the KLMS has
a self-regularization property, i.e., the step size plays
a role similar to that of the regularization parameter.

Given an input $u$, the output of the KLMS filter at
iteration $i$ will be

$$f_i(u) = \eta \sum_{j=1}^{i} e(j) \kappa(u(j), u) . \tag{30.15}$$

If the kernel is a radial kernel (e.g., Gaussian kernel),
the KLMS creates a growing RBF network by allocating
a new kernel unit for every new example with
input $u(i)$ as the center and $\eta e(i)$ as the coefficient.
The network topology of the KLMS filter is shown in
Fig. 30.3. The procedure of KLMS is summarized in
Algorithm 30.1.

It is also straightforward to derive the normalized
KLMS algorithm. The weight update equation of normalized
KLMS will be

$$\begin{aligned}
\Omega(i) &= \Omega(i-1) + \frac{\eta}{\| \varphi(i) \|^2} e(i) \varphi(i) \\
&= \Omega(i-1) + \frac{\eta}{\kappa(u(i), u(i))} e(i) \varphi(i) .
\end{aligned} \tag{30.16}$$

If the kernel function is the Gaussian kernel (Table 30.1),
we have $\kappa(u(i), u(i)) = 1$. In this case, the
KLMS is automatically normalized.


Fig. 30.3 The network topology of the KLMS filter


Algorithm 30.1 Kernel Least Mean Square Algorithm


Initialization:

   Choose kernel $\kappa$ and step size $\eta$

   $a_1 = \eta d(1)$, $C(1) = \{u(1)\}$, $f_1 = a_1 \kappa(u(1), \cdot)$

Computation:

   while $\{u(i), d(i)\}$ ($i > 1$) available do

   1) Compute the filter output: $f_{i-1}(u(i)) = \sum_{j=1}^{i-1} a_j \kappa(u(j), u(i))$

   2) Compute the error: $e(i) = d(i) - f_{i-1}(u(i))$

   3) Store the new center: $C(i) = \{C(i-1), u(i)\}$

   4) Compute and store the new coefficient: $a_i = \eta e(i)$

   end while

(where $a$ denotes the coefficient vector and $C(i)$ denotes
the dictionary at iteration $i$)
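
The steps of Algorithm 30.1 can be sketched as follows. This is an illustrative sketch with a Gaussian kernel; the function names, the sine target, and the parameter values are assumptions made for the example, not part of the algorithm.

```python
import numpy as np

def gaussian_kernel(centers, u, sigma):
    """Gaussian kernel between each row of `centers` and the vector `u`."""
    sq_dist = np.sum((centers - u) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def klms(U, d, eta, sigma):
    """KLMS: every sample becomes a center with coefficient eta * e(i)."""
    centers = [U[0]]                    # C(1) = {u(1)}
    coeffs = [eta * d[0]]               # a_1 = eta * d(1)
    errors = [d[0]]                     # f_0 = 0, hence e(1) = d(1)
    for i in range(1, len(d)):
        k = gaussian_kernel(np.asarray(centers), U[i], sigma)
        y = float(np.dot(coeffs, k))    # 1) filter output f_{i-1}(u(i))
        e = d[i] - y                    # 2) prediction error
        centers.append(U[i])            # 3) store the new center
        coeffs.append(eta * e)          # 4) store the new coefficient
        errors.append(e)
    return np.asarray(centers), np.asarray(coeffs), np.asarray(errors)

def predict(centers, coeffs, u, sigma):
    """Evaluate the learned mapping at a new input u."""
    return float(coeffs @ gaussian_kernel(centers, u, sigma))

rng = np.random.default_rng(0)
U = rng.uniform(-3, 3, size=(1000, 1))
d = np.sin(U[:, 0]) + 0.05 * rng.standard_normal(1000)   # hypothetical target
centers, coeffs, errors = klms(U, d, eta=0.5, sigma=0.5)
```

Note that the dictionary holds one center per processed sample, which is why the cost per iteration grows with $i$.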


The KLMS is a simple kernel learning algorithm,
which requires $O(i)$ operations per iteration. The role
of the step size $\eta$ in KLMS remains, in principle, the
same as that of the step size in traditional LMS. Specifically,
it controls the compromise between convergence speed
and misadjustment. The step-size parameter in KLMS
is also directly related to the optimization error introduced
in [30.3].

In KLMS, the kernel is usually chosen to be a Gaussian
kernel. The kernel size (or kernel bandwidth) $\sigma$
in the Gaussian kernel is a crucial parameter that
controls the degree of smoothing and consequently
has significant influence on the learning performance.
In practice, the kernel size can be set manually,
estimated by rule-based methods (e.g., Silverman's
rule [30.25]), or determined automatically using
cross-validation.


Mean Square Convergence Performance 
The mean square convergence analysis is very important
for adaptive filters. For linear adaptive filters, much
research has been done in this area and significant results
have been achieved. For nonlinear adaptive filters,
the mean square convergence analysis is, in general,
rather complicated and little studied. The mean square
convergence analysis of the KLMS is, however, relatively
tractable since it is a simple linear algorithm in
a high-dimensional feature space, and hence its convergence
analysis is very similar to that of the classical
linear adaptive filters [30.26]. In the following, we discuss
the mean square convergence performance of the
KLMS.

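Let us consider the case of nonlinear system identification
where the output data $\{d(i)\}$ are related to the
input vectors $\{u(i)\}$ via

$$d(i) = f^*(u(i)) + v(i) , \tag{30.17}$$

where $f^*(\cdot)$ denotes the unknown nonlinear mapping
that needs to be estimated, and $v(i)$ stands for the measurement
noise. Suppose the selected kernel is a universal
kernel (i.e., strictly positive definite kernel).
Then, according to the universal approximation property
[30.27], there is a weight vector $\Omega^* \in F$ such that

$$d(i) = \Omega^{*T} \varphi(i) + v(i) . \tag{30.18}$$

The prediction error $e(i)$ can thus be expressed as

$$e(i) = \tilde{\Omega}(i-1)^T \varphi(i) + v(i) = e_a(i) + v(i) , \tag{30.19}$$

where $\tilde{\Omega}(i-1) = \Omega^* - \Omega(i-1)$ is the weight error
vector in $F$, and $e_a(i) \triangleq \tilde{\Omega}(i-1)^T \varphi(i)$ is the a priori error
at iteration $i$.

Subtracting $\Omega^*$ from both sides of the weight update
equation $\Omega(i) = \Omega(i-1) + \eta e(i) \varphi(i)$, we get

$$\tilde{\Omega}(i) = \tilde{\Omega}(i-1) - \eta e(i) \varphi(i) . \tag{30.20}$$

Define the a posteriori error $e_p(i) \triangleq \tilde{\Omega}(i)^T \varphi(i)$. Then
we have

$$e_p(i) = e_a(i) + \left( \tilde{\Omega}(i) - \tilde{\Omega}(i-1) \right)^T \varphi(i) . \tag{30.21}$$

By incorporating (30.20),

$$\begin{aligned}
e_p(i) &= e_a(i) - \eta e(i) \varphi(i)^T \varphi(i) \\
&= e_a(i) - \eta e(i) \kappa(u(i), u(i)) .
\end{aligned} \tag{30.22}$$

Combining (30.20) and (30.22), and eliminating the
prediction error $e(i)$, yields

$$\tilde{\Omega}(i) = \tilde{\Omega}(i-1) + \frac{\left( e_p(i) - e_a(i) \right) \varphi(i)}{\kappa(u(i), u(i))} . \tag{30.23}$$

Squaring both sides of (30.23), and after some
straightforward manipulations, we obtain

$$\| \tilde{\Omega}(i) \|^2 + \frac{e_a^2(i)}{\kappa(u(i), u(i))} = \| \tilde{\Omega}(i-1) \|^2 + \frac{e_p^2(i)}{\kappa(u(i), u(i))} , \tag{30.24}$$

where $\| \tilde{\Omega}(i) \|^2 = \tilde{\Omega}(i)^T \tilde{\Omega}(i)$ is the weight error power
(WEP) in feature space $F$. Further, taking expectations
of both sides of (30.24) yields

$$E\left[ \| \tilde{\Omega}(i) \|^2 \right] + E\left[ \frac{e_a^2(i)}{\kappa(u(i), u(i))} \right] = E\left[ \| \tilde{\Omega}(i-1) \|^2 \right] + E\left[ \frac{e_p^2(i)}{\kappa(u(i), u(i))} \right] . \tag{30.25}$$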




The above equation is referred to as the energy conservation
relation in feature space [30.26], which shows
how the WEP in feature space evolves in time. The
expression of this fundamental relation is in a form
similar to that of the energy conservation relation
for classical linear adaptive filters [30.28-30]. In fact,
this is not surprising, since the KLMS is a linear
(but high-dimensional) adaptive algorithm in feature
space.

Substituting $e_p(i) = e_a(i) - \eta e(i) \kappa(u(i), u(i))$ into
the energy conservation relation (30.25) yields

$$E\left[ \| \tilde{\Omega}(i) \|^2 \right] = E\left[ \| \tilde{\Omega}(i-1) \|^2 \right] - 2\eta E\left[ e_a(i) e(i) \right] + \eta^2 E\left[ e^2(i) \kappa(u(i), u(i)) \right] . \tag{30.26}$$

When choosing the Gaussian kernel, we have $\kappa(u(i), u(i)) = 1$,
and hence

$$E\left[ \| \tilde{\Omega}(i) \|^2 \right] = E\left[ \| \tilde{\Omega}(i-1) \|^2 \right] - 2\eta E\left[ e_a(i) e(i) \right] + \eta^2 E\left[ e^2(i) \right] . \tag{30.27}$$

The Gaussian kernel is a normalized and shift-invariant kernel,
which makes the analysis much simpler. Since
the Gaussian kernel is also the default kernel in KLMS, in
the following, we will focus on the Gaussian kernel and
use (30.27) to analyze the mean square convergence behavior
of the KLMS. It is straightforward to generalize
the discussion to arbitrary shift-invariant kernels. In the
following, we give an assumption that will be used in
the analysis.


Assumption 30.1 A1

The noise $v(i)$ is zero-mean, independent, identically
distributed (i.i.d.), and independent of the a priori
estimation error $e_a(i)$.


The above assumption is commonly used in convergence
analysis for classical linear adaptive filtering
algorithms [30.8]. A sufficient condition for the independence
between $v(i)$ and $e_a(i)$ is the independence
between $v(i)$ and the input sequence $\{u(i)\}$.

Combining (30.27) and assumption A1, we have

$$E\left[ \| \tilde{\Omega}(i) \|^2 \right] = E\left[ \| \tilde{\Omega}(i-1) \|^2 \right] - 2\eta E\left[ e_a^2(i) \right] + \eta^2 \left( E\left[ e_a^2(i) \right] + \xi_v^2 \right) , \tag{30.28}$$

where $\xi_v^2$ denotes the noise power (variance). It is
worth noting that (30.28) depends on the noise $v(i)$
through $\xi_v^2$ only.


A Sufficient Condition for Mean Square Convergence.
From (30.28), one can easily derive

$$\begin{aligned}
& E\left[ \| \tilde{\Omega}(i) \|^2 \right] \le E\left[ \| \tilde{\Omega}(i-1) \|^2 \right] \\
\Leftrightarrow\ & -2\eta E\left[ e_a^2(i) \right] + \eta^2 \left( E\left[ e_a^2(i) \right] + \xi_v^2 \right) \le 0 \\
\Leftrightarrow\ & \eta \le \frac{2 E\left[ e_a^2(i) \right]}{E\left[ e_a^2(i) \right] + \xi_v^2} .
\end{aligned} \tag{30.29}$$

Thus, if we choose the step size such that $\forall i$,
$\eta \le 2E[e_a^2(i)] / (E[e_a^2(i)] + \xi_v^2)$, the WEP in feature space
will be monotonically decreasing (and hence convergent).
This sufficient condition for the mean square
convergence is, interestingly, identical to that of the normalized
LMS algorithm. The essential reason for this is
that the Gaussian kernel is a shift-invariant and normalized
kernel ($\kappa(u, u) = 1$). From (30.29), one can also
observe that, when the noise power $\xi_v^2$ is very small, the
upper bound on step size will be approximately equal
to 2.0.


Steady-State Mean Square Performance. Take the
limit of (30.28) as $i \to \infty$,

$$\lim_{i \to \infty} E\left[ \| \tilde{\Omega}(i) \|^2 \right] = \lim_{i \to \infty} E\left[ \| \tilde{\Omega}(i-1) \|^2 \right] - 2\eta \lim_{i \to \infty} E\left[ e_a^2(i) \right] + \eta^2 \left( \lim_{i \to \infty} E\left[ e_a^2(i) \right] + \xi_v^2 \right) . \tag{30.30}$$

If the WEP in feature space reaches a steady-state
value, i.e., $\lim_{i \to \infty} E[\| \tilde{\Omega}(i) \|^2] = \lim_{i \to \infty} E[\| \tilde{\Omega}(i-1) \|^2]$,
we have

$$-2\eta \lim_{i \to \infty} E\left[ e_a^2(i) \right] + \eta^2 \left( \lim_{i \to \infty} E\left[ e_a^2(i) \right] + \xi_v^2 \right) = 0 . \tag{30.31}$$

It follows that

$$\lim_{i \to \infty} E\left[ e_a^2(i) \right] = \frac{\eta \xi_v^2}{2 - \eta} . \tag{30.32}$$

Fig. 30.4 Simulated and theoretical EMSE versus step
size

The a priori error power $E[e_a^2(i)]$ is also referred to
as the excess mean square error (EMSE) in the adaptive
filtering community. From (30.32), we see that the
steady-state EMSE of KLMS depends only on the step
size and noise variance, and is NOT related to the kernel
size and the unknown nonlinear mapping. We should
point out here that, although the kernel size does not
affect the KLMS steady-state accuracy, it has crucial
influence on the convergence rate. In most practical situations,
the training data are finite and the algorithm can
never reach the steady state. In these cases, the kernel
size also has significant influence on the final accuracy
(not the steady-state accuracy).
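
As a small worked example, the steady-state EMSE (30.32) and the step-size bound (30.29) can be evaluated directly. The helper names below are hypothetical, and the numbers (step size 0.5, noise variance 0.01) are chosen only for illustration.

```python
def klms_steady_state_emse(eta, noise_var):
    """Steady-state EMSE of KLMS from (30.32): eta * noise_var / (2 - eta)."""
    if not 0.0 < eta < 2.0:
        raise ValueError("(30.32) is only meaningful for 0 < eta < 2")
    return eta * noise_var / (2.0 - eta)

def step_size_bound(apriori_error_power, noise_var):
    """Upper bound on the step size from the sufficient condition (30.29)."""
    return 2.0 * apriori_error_power / (apriori_error_power + noise_var)

# Example values: step size 0.5, noise variance 0.01.
emse = klms_steady_state_emse(eta=0.5, noise_var=0.01)   # 0.005 / 1.5
```

Neither function takes the kernel size as an argument, which mirrors the observation that the steady-state EMSE does not depend on it.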

We present here a simple simulation example to ver- 
ify the obtained theoretical results. Suppose that the 
training data are generated by the following nonlinear 
system [30.26] 


d(i) = sin(u(i)) + 0.5u(i— 1) —0.1u°(i— 2) + vò . 
(30.33) 


The input sequence $\{u(i)\}$ is assumed to be a white
Gaussian process with variance 1.0, and $\{v(i)\}$ is a zero-mean
white noise that is independent of $\{u(i)\}$. In the
simulation, unless mentioned otherwise, the step size is
set at $\eta = 0.5$, the noise variance is $\xi_v^2 = 0.01$, and the
kernel size is $\sigma = 1.0$. For different values of the step
size, noise variance, and kernel size, the simulated and
theoretical EMSE are illustrated in Figs. 30.4-30.6. Evidently,
the experimental and theoretical results agree
very well.


Network Size Control 
The KLMS filter network grows rapidly with each new
sample following a nonparametric approach. Due to finite
resources, one must cut the growing structure of
the filter and constrain the network size (number of
centers). Some sparsification methods can be applied to
cope with this issue. According to these methods, new
samples are inserted into the dictionary only if they satisfy
a certain sparsification criterion. In the following,
we briefly discuss several useful sparsification criteria.

Fig. 30.5 Simulated and theoretical EMSE versus noise
variance

Fig. 30.6 Simulated and theoretical EMSE versus kernel
size

Suppose that at iteration $i$ the dictionary is
$C(i) = \{c_1, c_2, \ldots, c_{m_i}\}$ and the coefficient vector is
$a(i) = [a_1(i), a_2(i), \ldots, a_{m_i}(i)]^T$, where $c_j$ is the $j$th center,
$a_j(i)$ is the $j$th coefficient, and $m_i$ is the dictionary size
(or network size) at iteration $i$. In this case, the learned mapping is

$$f_i(\cdot) = \sum_{j=1}^{m_i} a_j(i) \kappa(c_j, \cdot) . \tag{30.34}$$




When a new example {u(i + 1), d(i + 1)} is presented, 
the learning system needs to decide whether u(i + 1) 
should be inserted into the dictionary. This decision 
procedure is, in general, based on some sparsification 
criterion. 


Novelty Criterion. Platt's novelty criterion (NC) [30.14] first computes
the distance of $u(i+1)$ to the present dictionary

$$\mathrm{dis}_1 = \min_{c_j \in C(i)} \| u(i+1) - c_j \| . \tag{30.35}$$

If $\mathrm{dis}_1$ is smaller than some preset threshold $\delta_1$ ($\delta_1 > 0$),
$u(i+1)$ will not be added into the dictionary. Otherwise,
it computes the prediction error $e(i+1) = d(i+1) - f_i(u(i+1))$.
Only if the magnitude of the prediction
error is larger than another preset threshold $\delta_2$ ($\delta_2 > 0$),
$u(i+1)$ will be accepted as a new center. If the input
domain $U$ is a compact subset, the NC criterion always
produces a dictionary with finite elements.
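
The two-stage test of the novelty criterion can be sketched as a small decision function. The function name and the example dictionary are hypothetical; the prediction error is assumed to have been computed by the current filter beforehand.

```python
import numpy as np

def novelty_criterion(dictionary, u_new, pred_error, delta1, delta2):
    """Return True if u_new should be added as a new center under the NC."""
    # Distance of the new input to the dictionary, as in (30.35).
    dist = np.min(np.linalg.norm(dictionary - u_new, axis=1))
    if dist < delta1:
        return False                    # too close to an existing center
    # Far enough from all centers: accept only if the error is large.
    return bool(abs(pred_error) > delta2)

dictionary = np.array([[0.0, 0.0], [1.0, 1.0]])   # hypothetical centers
near = novelty_criterion(dictionary, np.array([0.05, 0.0]), 1.0, 0.1, 0.2)
far_big_err = novelty_criterion(dictionary, np.array([3.0, 3.0]), 1.0, 0.1, 0.2)
far_small_err = novelty_criterion(dictionary, np.array([3.0, 3.0]), 0.05, 0.1, 0.2)
```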


Coherence Criterion. According to the coherence criterion (CC) [30.15],
the input $u(i+1)$ will be inserted into the dictionary if
its coherence remains below a given threshold $\mu_0$, that
is

$$\max_{c_j \in C(i)} | \kappa(u(i+1), c_j) | \le \mu_0 . \tag{30.36}$$
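
A minimal sketch of the coherence test, assuming a Gaussian kernel (the function names, the kernel choice, and the threshold value are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(centers, u, sigma):
    """Gaussian kernel between each row of `centers` and the vector `u`."""
    return np.exp(-np.sum((centers - u) ** 2, axis=1) / (2.0 * sigma ** 2))

def coherence_criterion(dictionary, u_new, mu0, sigma=1.0):
    """True if max_j |kappa(u_new, c_j)| <= mu0, i.e., u_new is inserted."""
    coherence = np.max(np.abs(gaussian_kernel(dictionary, u_new, sigma)))
    return bool(coherence <= mu0)

dictionary = np.array([[0.0, 0.0]])               # hypothetical dictionary
insert_close = coherence_criterion(dictionary, np.array([0.1, 0.0]), mu0=0.5)
insert_far = coherence_criterion(dictionary, np.array([3.0, 0.0]), mu0=0.5)
```

For a normalized kernel such as the Gaussian, the coherence lies in [0, 1], so the threshold directly bounds how similar any two dictionary elements may be.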


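ALD Criterion. The approximate linear dependency (ALD) criterion uses
the distance of the new input to the linear span of the
current dictionary in feature space, that is [30.12]

$$\mathrm{dis}_2 = \min_{\forall b} \left\| \varphi(u(i+1)) - \sum_{c_j \in C(i)} b_j \varphi(c_j) \right\| . \tag{30.37}$$

ALD is computationally expensive, especially when the
dictionary size $m_i$ is very large. In order to simplify
the computation, one can use the following approximate
distance

$$\mathrm{dis}_3 = \min_{\forall b, \forall c_j \in C(i)} \left\| \varphi(u(i+1)) - b \varphi(c_j) \right\| . \tag{30.38}$$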
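
The minimization in (30.37) has a closed form in terms of kernel evaluations: with Gram matrix $K$ of the dictionary and $h_j = \kappa(c_j, u(i+1))$, the squared distance equals $\kappa(u(i+1), u(i+1)) - h^T K^{-1} h$. The sketch below assumes a Gaussian kernel; the small `jitter` added to the diagonal is a numerical safeguard introduced here, not part of the criterion.

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def ald_distance_sq(dictionary, u_new, sigma=1.0, jitter=1e-10):
    """Squared distance from phi(u_new) to span{phi(c_j)} in feature space."""
    K = gaussian_gram(dictionary, dictionary, sigma)
    h = gaussian_gram(dictionary, u_new[None, :], sigma)[:, 0]
    b = np.linalg.solve(K + jitter * np.eye(len(K)), h)   # optimal coefficients
    return float(1.0 - h @ b)       # kappa(u, u) = 1 for the Gaussian kernel

dictionary = np.array([[0.0, 0.0], [1.0, 0.0]])   # hypothetical centers
d_same = ald_distance_sq(dictionary, np.array([0.0, 0.0]))
d_far = ald_distance_sq(dictionary, np.array([10.0, 10.0]))
```

Solving the linear system costs on the order of $m_i^3$ operations if done from scratch, which illustrates why the criterion becomes expensive for large dictionaries.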


Surprise Criterion. Surprise is a subjective information
measure of an example $\{u, d\}$ with respect to
a learning system $\mathcal{L}$, which is defined as the negative
log likelihood of the example given the learning system's
estimate on the data distribution [30.16]

$$S_{\mathcal{L}}(u, d) = -\ln p(u, d \mid \mathcal{L}) , \tag{30.39}$$

where $p(u, d \mid \mathcal{L})$ is the subjective probability of $(u, d)$
hypothesized by $\mathcal{L}$. The surprise $S_{\mathcal{L}}(u, d)$ measures how
surprising the exemplar is to the learning system. The
surprise of the new example $\{u(i+1), d(i+1)\}$ is

$$S_{\mathcal{L}(i)}(u(i+1), d(i+1)) = -\ln p(u(i+1), d(i+1) \mid \mathcal{L}(i)) , \tag{30.40}$$

where $\mathcal{L}(i)$ denotes the present learning system. To simplify
notation, one can write $S_{\mathcal{L}(i)}(u(i+1), d(i+1))$ as
$S(i+1)$.

By the definition, if the surprise $S(i+1)$ is large, the
new example $\{u(i+1), d(i+1)\}$ contains something
new for the system to learn or it is suspicious. Otherwise,
if the surprise $S(i+1)$ is very small, the new datum
is well expected by the learning system $\mathcal{L}(i)$ and thus
contains little information to be learned. Usually one
can classify the new example into three categories

$$\begin{aligned}
\text{abnormal:} &\quad S(i+1) > T_1 , \\
\text{learnable:} &\quad T_1 \ge S(i+1) \ge T_2 , \\
\text{redundant:} &\quad S(i+1) < T_2 ,
\end{aligned} \tag{30.41}$$

where $T_1$ and $T_2$ are threshold parameters. The choice
of the thresholds and learning strategies defines the
characteristics of the learning system. In general, a new
center will be added only if the example is learnable,
i.e., $T_1 \ge S(i+1) \ge T_2$.

Besides the aforementioned sparsification methods,
there is another technique, called the quantization approach,
to reduce the network size of KLMS. In the
quantization approach, the input space is quantized: if
the quantization of the new input has already been assigned
a center, no new center is added, and
the coefficient of that center is updated instead. This new
algorithm is called the quantized KLMS (QKLMS) algorithm
[30.17]. The mapping update equation of the
QKLMS can be simply expressed as

$$\begin{aligned}
f_0 &= 0 , \\
e(i) &= d(i) - f_{i-1}(u(i)) , \\
f_i &= f_{i-1} + \eta e(i) \kappa(Q[u(i)], \cdot) ,
\end{aligned} \tag{30.42}$$

where $Q[\cdot]$ is a quantization operator over input space.
A simple online vector quantization (VQ) method has
also been proposed in [30.17]. The QKLMS algorithm
(with simple online VQ) is described in Algorithm 30.2.




Algorithm 30.2 Quantized Kernel Least Mean Square Algorithm


Initialization:
   Choose kernel $\kappa$, step size $\eta$, quantization size $\varepsilon$

   $a_1 = \eta d(1)$, $C(1) = \{u(1)\}$, $f_1 = a_1 \kappa(u(1), \cdot)$

Computation:
   while $\{u(i), d(i)\}$ ($i > 1$) available do

   1) Compute the prediction error

      $e(i) = d(i) - \sum_{j=1}^{\mathrm{size}(C(i-1))} a_j(i-1) \kappa(C_j(i-1), u(i))$

   2) Compute the distance between $u(i)$ and $C(i-1)$

      $\mathrm{dis}(u(i), C(i-1)) = \| u(i) - C_{j^*}(i-1) \|$

      where $j^* = \arg\min_{1 \le j \le \mathrm{size}(C(i-1))} \| u(i) - C_j(i-1) \|$

   3) if $\mathrm{dis}(u(i), C(i-1)) \le \varepsilon$, then

         $C(i) = C(i-1)$, $a_{j^*}(i) = a_{j^*}(i-1) + \eta e(i)$

      else

         $C(i) = \{C(i-1), u(i)\}$, $a(i) = [a(i-1), \eta e(i)]$

      end if

   end while

(where $C_j(i-1)$ denotes the $j$th element of the dictionary
$C(i-1)$).
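
Algorithm 30.2 can be sketched in the same style as KLMS. Again this is only an illustration with a Gaussian kernel; the function name, the sine target, and the parameter values are assumptions made for the example.

```python
import numpy as np

def qklms(U, d, eta, sigma, eps):
    """QKLMS with the simple online VQ: merge inputs within eps of a center."""
    centers = [U[0]]                    # C(1) = {u(1)}
    coeffs = [eta * d[0]]               # a_1 = eta * d(1)
    for i in range(1, len(d)):
        C = np.asarray(centers)
        sq_dist = np.sum((C - U[i]) ** 2, axis=1)
        k = np.exp(-sq_dist / (2.0 * sigma ** 2))
        e = d[i] - float(np.dot(coeffs, k))       # 1) prediction error
        j_star = int(np.argmin(sq_dist))          # 2) closest center
        if np.sqrt(sq_dist[j_star]) <= eps:       # 3) quantize or grow
            coeffs[j_star] += eta * e             #    update existing coefficient
        else:
            centers.append(U[i])                  #    add a new center
            coeffs.append(eta * e)
    return np.asarray(centers), np.asarray(coeffs)

rng = np.random.default_rng(0)
U = rng.uniform(-3, 3, size=(1000, 1))
d = np.sin(U[:, 0]) + 0.05 * rng.standard_normal(1000)   # hypothetical target
centers, coeffs = qklms(U, d, eta=0.5, sigma=0.5, eps=0.2)
```

Because every stored center is farther than `eps` from all earlier ones, the dictionary stays bounded on a compact input domain, in contrast to the plain KLMS, which stores every sample.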


Kernel Maximum Correntropy (KMC) Algorithm 
Like most conventional adaptive filtering algorithms,
the KLMS adopts the MSE as the optimality cost
function. The MSE is mathematically tractable, computationally
simple, and optimal for linear Gaussian
systems. However, MSE may be a poor cost for nonlinear
and/or non-Gaussian (e.g., heavy-tailed distributions)
situations, since it constrains only the second-order
statistics. To cope with this problem, one may use
a non-MSE cost, such as a higher order statistic
or an information theoretic criterion (entropy, correntropy,
divergence, etc.). In particular, the kernel maximum
correntropy (KMC) algorithm has been developed
in [30.31], which is derived by applying the maximum
correntropy criterion (MCC) to KLMS.

The correntropy defines a new correlation function
between two random variables. Let $X$ and $Y$ be two
random variables with the same dimensions; the correntropy
is defined by [30.19]

$$V(X, Y) = E_{XY}\left[ \kappa_{\mathrm{corr}}(X, Y) \right] = \int \kappa_{\mathrm{corr}}(x, y) \, dF_{XY}(x, y) , \tag{30.43}$$

where $\kappa_{\mathrm{corr}}(\cdot, \cdot)$ is a Mercer kernel (usually the Gaussian
kernel), and $F_{XY}(x, y)$ denotes the joint distribution
function of $X$, $Y$. Since any Mercer kernel induces
a nonlinear mapping $\varphi(\cdot)$ from the input space to a high-dimensional
(possibly infinite) feature space, and the
inner product of two points $\varphi(X)$ and $\varphi(Y)$ in feature
space can be implicitly computed by using the Mercer
kernel, the correntropy (30.43) can alternatively be
expressed as

$$V(X, Y) = E\left[ \langle \varphi(X), \varphi(Y) \rangle_F \right] , \tag{30.44}$$

where $\langle \cdot, \cdot \rangle_F$ denotes the inner product in the feature
space induced by $\kappa_{\mathrm{corr}}(\cdot, \cdot)$. Clearly, correntropy is
a generalized correlation function and it is also positive
definite, i.e., it defines a new RKHS for inference.
By a simple Taylor series expansion on the kernel, one
can see that correntropy provides a number that is the
sum of all the statistical moments expressed by the kernel.
In many applications, this sum may be sufficient to
quantify the relationships of interest better than correlation,
and it is much simpler to estimate than the higher
order statistical moments. Therefore, it can be considered
a new type of statistical descriptor and a new cost
function for adaptive system training.
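
In practice the expectation in (30.43) is replaced by a sample mean. The sketch below assumes scalar variables and a Gaussian kernel in the unnormalized form of Table 30.1; the data and the outlier magnitude are invented for illustration.

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample-mean estimate of V(X, Y) with a Gaussian kernel on x - y."""
    e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2))))

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
clean = x + 0.1 * rng.standard_normal(500)
outliers = clean.copy()
outliers[:10] += 100.0                  # a few gross outliers

v_clean = correntropy(x, clean)
v_out = correntropy(x, outliers)
mse_clean = float(np.mean((x - clean) ** 2))
mse_out = float(np.mean((x - outliers) ** 2))
```

The ten outliers change the correntropy estimate by at most 10/500, whereas they dominate the MSE; this bounded influence is what makes correntropy attractive for heavy-tailed noise.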

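Under the MCC criterion, the learning cost function is
$V(d(i), y(i)) = E[\kappa_{\mathrm{corr}}(d(i), y(i))]$. Dropping the expectation
operator, one obtains the instantaneous cost function
$V(d(i), y(i)) = \kappa_{\mathrm{corr}}(d(i), y(i))$. Thus, a stochastic
gradient algorithm in RKHS $\mathcal{H}_\kappa$ (which is induced by
$\kappa$, NOT by $\kappa_{\mathrm{corr}}$) can be readily derived as follows

$$\begin{aligned}
f_i &= f_{i-1} + \eta \frac{\partial}{\partial f_{i-1}} V(d(i), y(i)) \\
&= f_{i-1} + \eta \frac{\partial}{\partial f_{i-1}} \kappa_{\mathrm{corr}}(d(i), y(i)) \\
&= f_{i-1} + \eta \frac{\partial \kappa_{\mathrm{corr}}(d(i), y(i))}{\partial y(i)} \frac{\partial y(i)}{\partial f_{i-1}} \\
&= f_{i-1} + \eta \frac{\partial \kappa_{\mathrm{corr}}(d(i), y(i))}{\partial y(i)} \frac{\partial}{\partial f_{i-1}} \langle f_{i-1}, \kappa(u(i), \cdot) \rangle_{\mathcal{H}_\kappa} \\
&= f_{i-1} + \eta \frac{\partial \kappa_{\mathrm{corr}}(d(i), y(i))}{\partial y(i)} \kappa(u(i), \cdot) ,
\end{aligned} \tag{30.45}$$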



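where $\partial / \partial f_{i-1}$ denotes Fréchet's differential.
This algorithm is called the KMC algorithm. If
$\kappa_{\mathrm{corr}}(\cdot, \cdot)$ is a Gaussian kernel, i.e., $\kappa_{\mathrm{corr}}(d(i), y(i)) = \exp(-e^2(i) / 2\sigma^2)$,
then KMC (30.45) becomes

$$\begin{aligned}
f_i &= f_{i-1} + \eta \frac{\partial}{\partial y(i)} \exp\left( -\frac{e^2(i)}{2\sigma^2} \right) \kappa(u(i), \cdot) \\
&= f_{i-1} + \frac{\eta}{\sigma^2} \exp\left( -\frac{e^2(i)}{2\sigma^2} \right) e(i) \kappa(u(i), \cdot) .
\end{aligned} \tag{30.46}$$

The algorithm of (30.46) is, in fact, a KLMS algorithm
with the error-dependent step size $(\eta / \sigma^2) \exp(-e^2(i) / 2\sigma^2)$.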
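
The update (30.46) can be sketched as a KLMS loop whose step size shrinks for large errors. The function name and the two kernel-size arguments (`sigma_kernel` for the filter RKHS, `sigma_corr` for the correntropy cost) are hypothetical names, and the impulsive outliers are injected only for illustration.

```python
import numpy as np

def kmc(U, d, eta, sigma_kernel, sigma_corr):
    """KMC as KLMS with step size (eta / s^2) * exp(-e^2 / (2 s^2)), s = sigma_corr."""
    centers, coeffs = [], []
    for i in range(len(d)):
        if centers:
            sq = np.sum((np.asarray(centers) - U[i]) ** 2, axis=1)
            y = float(np.dot(coeffs, np.exp(-sq / (2.0 * sigma_kernel ** 2))))
        else:
            y = 0.0                                   # f_0 = 0
        e = d[i] - y
        step = (eta / sigma_corr ** 2) * np.exp(-e ** 2 / (2.0 * sigma_corr ** 2))
        centers.append(U[i])
        coeffs.append(step * e)                       # update (30.46)
    return np.asarray(centers), np.asarray(coeffs)

rng = np.random.default_rng(0)
U = rng.uniform(-3, 3, size=(600, 1))
d = np.sin(U[:, 0]) + 0.05 * rng.standard_normal(600)
outlier_idx = np.array([100, 300, 500])
d[outlier_idx] += 50.0                                # impulsive noise samples
centers, coeffs = kmc(U, d, eta=0.5, sigma_kernel=0.5, sigma_corr=1.0)
```

Samples with a very large error receive a coefficient close to zero, so isolated outliers barely affect the learned mapping.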

To achieve a better performance, one should select 
a suitable kernel size for correntropy. Note that there 
may be two kernel sizes in KMC: the kernel size for the 
RKHS of filter and the kernel size for the cost function. 
Here we talk about the latter. A kernel size update rule 
has been proposed in [30.32], which is 


o(i+ 1) =ao(i)+ (1-a) fies ; 


where $\sigma(i)$ denotes the kernel size of correntropy at
iteration $i$, $0 < \alpha < 1$ is a forgetting factor, $\beta_G$ is the kurtosis
of the Gaussian distribution (i.e., $\beta_G = 3$), and $\beta_e$
and $\sigma_e^2$ are, respectively, the kurtosis and variance of the
prediction error.


(30.47) 


30.3.2 Kernel Recursive Least Squares (KRLS) Algorithm


The recursive least squares (RLS) is another popu- 
lar algorithm in the traditional linear adaptive filtering 
literature, which recursively updates the estimated au- 
tocorrelation matrix of the input signal vector and the 
cross-correlation vector between the input vector and 
the desired response. The convergence rate of RLS is, 
in general, much faster than the LMS algorithm. This 
improvement in performance, however, is achieved at 
the expense of an increase in computational complex- 
ity. Similar to the LMS algorithm, the RLS algorithm 
can also be kernelized. Next, we will discuss the KRLS 
algorithm [30.12]. The derivation of KRLS is based on 
a least squares formulation in the feature space. 

Based on a sequence of available examples (up to 
and including time i— 1) {u (j), d(j) Zi the regularized 
least squares regression in H% can be formulated as 


=] 


min J A-SUN + VI > 


j=l 


(30.48) 


where y > 0 is the regularization factor that controls the 
smoothness of the solution (to avoid overfitting). Note 
that in KLMS, the step size performs a similar role as 
the regularization factor (self-regularization property), 
and hence there is no need to add explicitly a regular- 
ization factor in KLMS. 

By the representer theorem [30.33], the function $f$
in $\mathcal{H}_k$ minimizing (30.48) can be expressed as a linear
combination of the kernels in terms of the available data

$$f(\cdot) = \sum_{j=1}^{i-1} \alpha_j \kappa(\mathbf{u}(j), \cdot) .$$
(30.49)


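The learning problem can also be defined as finding $\boldsymbol{\alpha} \in \mathbb{R}^{i-1}$
that minimizes

$$\min_{\boldsymbol{\alpha}(i-1) \in \mathbb{R}^{i-1}} \|\mathbf{d}(i-1) - \mathbf{K}(i-1)\boldsymbol{\alpha}(i-1)\|^2 + \gamma\, \boldsymbol{\alpha}(i-1)^T \mathbf{K}(i-1) \boldsymbol{\alpha}(i-1) ,$$
(30.50)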


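where $\boldsymbol{\alpha}(i-1) = [\alpha_1, \ldots, \alpha_{i-1}]^T$, $\mathbf{d}(i-1) = [d(1), \ldots, d(i-1)]^T$,
and $\mathbf{K}(i-1) \in \mathbb{R}^{(i-1) \times (i-1)}$ is the Gram matrix
with elements $K_{jk} = \kappa(\mathbf{u}(j), \mathbf{u}(k))$, $j, k = 1, 2, \ldots, i-1$.
The solution of (30.50) will be

$$\boldsymbol{\alpha}^* = (\gamma \mathbf{I} + \mathbf{K}(i-1))^{-1} \mathbf{d}(i-1) ,$$
(30.51)

where $\mathbf{I}$ denotes an identity matrix with appropriate dimension.
Of course, the above least squares problem
can alternatively be formulated in feature space $\mathbb{F}$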


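$$\min_{\boldsymbol{\Omega} \in \mathbb{F}} \sum_{j=1}^{i-1} |d(j) - \boldsymbol{\Omega}^T \boldsymbol{\varphi}(j)|^2 + \gamma \|\boldsymbol{\Omega}\|_{\mathbb{F}}^2 .$$
(30.52)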


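The solution of (30.52) can be derived as [30.9]

$$\boldsymbol{\Omega}^* = \boldsymbol{\Phi}(i-1)\boldsymbol{\alpha}^* = \boldsymbol{\Phi}(i-1)(\gamma \mathbf{I} + \mathbf{K}(i-1))^{-1} \mathbf{d}(i-1) ,$$
(30.53)

where $\boldsymbol{\Phi}(i-1) = [\boldsymbol{\varphi}(1), \ldots, \boldsymbol{\varphi}(i-1)]$ (hence the Gram
matrix $\mathbf{K}$ can also be expressed as $\mathbf{K} = \boldsymbol{\Phi}^T \boldsymbol{\Phi}$). The
KRLS algorithm will update this solution recursively
as new data $(\mathbf{u}(i), d(i))$ become available.

When the new data $(\mathbf{u}(i), d(i))$ are available, the optimal
solution of (30.53) becomes

$$\boldsymbol{\Omega}^* = \boldsymbol{\Phi}(i)(\gamma \mathbf{I} + \mathbf{K}(i))^{-1} \mathbf{d}(i) .$$
(30.54)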




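Denote

$$\mathbf{Q}(i) = (\gamma \mathbf{I} + \mathbf{K}(i))^{-1} = (\gamma \mathbf{I} + \boldsymbol{\Phi}(i)^T \boldsymbol{\Phi}(i))^{-1} .$$
(30.55)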


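It is easy to see

$$\mathbf{Q}(i)^{-1} = \begin{bmatrix} \mathbf{Q}(i-1)^{-1} & \mathbf{h}(i) \\ \mathbf{h}(i)^T & \gamma + \boldsymbol{\varphi}(i)^T \boldsymbol{\varphi}(i) \end{bmatrix} ,$$
(30.56)

where $\mathbf{h}(i) = \boldsymbol{\Phi}(i-1)^T \boldsymbol{\varphi}(i)$. Using the block matrix inversion
identity [30.9], one can derive

$$\mathbf{Q}(i) = r(i)^{-1} \begin{bmatrix} \mathbf{Q}(i-1) r(i) + \mathbf{z}(i)\mathbf{z}(i)^T & -\mathbf{z}(i) \\ -\mathbf{z}(i)^T & 1 \end{bmatrix} ,$$
(30.57)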


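where $\mathbf{z}(i) = \mathbf{Q}(i-1)\mathbf{h}(i)$ and $r(i) = \gamma + \kappa(\mathbf{u}(i), \mathbf{u}(i)) - \mathbf{z}(i)^T \mathbf{h}(i)$.
Then the coefficient vector can be updated as

$$\boldsymbol{\alpha}^*(i) = \mathbf{Q}(i)\mathbf{d}(i) = r(i)^{-1} \begin{bmatrix} \mathbf{Q}(i-1) r(i) + \mathbf{z}(i)\mathbf{z}(i)^T & -\mathbf{z}(i) \\ -\mathbf{z}(i)^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{d}(i-1) \\ d(i) \end{bmatrix} = \begin{bmatrix} \boldsymbol{\alpha}^*(i-1) - \mathbf{z}(i) r(i)^{-1} e(i) \\ r(i)^{-1} e(i) \end{bmatrix} ,$$
(30.58)

where $e(i) = d(i) - \mathbf{h}(i)^T \boldsymbol{\alpha}^*(i-1)$ is the prediction
error.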

Now we have obtained a recursive algorithm to
solve the kernel least squares problem, namely, the
KRLS algorithm (see Algorithm 30.3). The computational
cost of KRLS is $O(i^2)$ per iteration, since each update
involves the $(i-1) \times (i-1)$ matrix $\mathbf{Q}(i-1)$. The
KRLS also produces a network with linear growth.
All the previously mentioned sparsification or quan- 
tization approaches can still be applied to curb the 
network growth. Notice that the algorithm presented 
here is just the basic KRLS algorithm. There are 
many variants or extensions of KRLS, including the 
exponentially weighted KRLS (EW-KRLS) [30.9], 
sliding window KRLS (SW-KRLS) [30.34], fixed- 
budget KRLS (FB-KRLS) [30.35], extended KRLS 
(EX-KRLS) [30.13], and so on. 


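Algorithm 30.3 Kernel Recursive Least Squares Algorithm
Initialization:
  Set the regularization parameter $\gamma > 0$
  $C(1) = \{\mathbf{u}(1)\}$, $\mathbf{Q}(1) = [\kappa(\mathbf{u}(1), \mathbf{u}(1)) + \gamma]^{-1}$, $\boldsymbol{\alpha}^*(1) = \mathbf{Q}(1) d(1)$
Computation:
  while $\{\mathbf{u}(i), d(i)\}$ $(i > 1)$ available do
    $\mathbf{h}(i) = [\kappa(\mathbf{u}(i), \mathbf{u}(1)), \ldots, \kappa(\mathbf{u}(i), \mathbf{u}(i-1))]^T$
    $\mathbf{z}(i) = \mathbf{Q}(i-1)\mathbf{h}(i)$
    $r(i) = \gamma + \kappa(\mathbf{u}(i), \mathbf{u}(i)) - \mathbf{z}(i)^T \mathbf{h}(i)$
    $\mathbf{Q}(i) = r(i)^{-1} \begin{bmatrix} \mathbf{Q}(i-1) r(i) + \mathbf{z}(i)\mathbf{z}(i)^T & -\mathbf{z}(i) \\ -\mathbf{z}(i)^T & 1 \end{bmatrix}$
    $e(i) = d(i) - \mathbf{h}(i)^T \boldsymbol{\alpha}^*(i-1)$
    $\boldsymbol{\alpha}^*(i) = \begin{bmatrix} \boldsymbol{\alpha}^*(i-1) - \mathbf{z}(i) r(i)^{-1} e(i) \\ r(i)^{-1} e(i) \end{bmatrix}$
  end while
(where $C_j(i-1)$ denotes the $j$th element of the dictionary
$C(i-1)$).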
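
The recursion of Algorithm 30.3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `gaussian_kernel`, `krls_train` and `krls_predict` are hypothetical, a Gaussian kernel $\exp(-a\|x-y\|^2)$ is assumed, matrices are stored as nested lists, and no sparsification is applied.

```python
import math

def gaussian_kernel(x, y, a=1.0):
    """Gaussian kernel exp(-a * ||x - y||^2) for equal-length sequences."""
    return math.exp(-a * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def krls_train(inputs, targets, gamma=0.1, a=1.0):
    """Basic KRLS without sparsification; returns (centers, alpha)."""
    centers = [inputs[0]]
    Q = [[1.0 / (gaussian_kernel(inputs[0], inputs[0], a) + gamma)]]
    alpha = [Q[0][0] * targets[0]]
    for u, d in zip(inputs[1:], targets[1:]):
        n = len(centers)
        h = [gaussian_kernel(u, c, a) for c in centers]
        z = [sum(Q[j][k] * h[k] for k in range(n)) for j in range(n)]
        r = gamma + gaussian_kernel(u, u, a) - sum(zj * hj for zj, hj in zip(z, h))
        e = d - sum(hj * aj for hj, aj in zip(h, alpha))
        # Q(i) = r^{-1} [[Q(i-1) r + z z^T, -z], [-z^T, 1]], cf. (30.57)
        Q = [[Q[j][k] + z[j] * z[k] / r for k in range(n)] + [-z[j] / r]
             for j in range(n)]
        Q.append([-zk / r for zk in z] + [1.0 / r])
        # alpha(i) = [alpha(i-1) - z e / r ; e / r], cf. (30.58)
        alpha = [aj - zj * e / r for aj, zj in zip(alpha, z)] + [e / r]
        centers.append(u)
    return centers, alpha

def krls_predict(centers, alpha, u, a=1.0):
    """Evaluate the kernel expansion (30.49) at a new input u."""
    return sum(aj * gaussian_kernel(u, c, a) for aj, c in zip(alpha, centers))
```

After processing all samples, the coefficients should satisfy $(\gamma \mathbf{I} + \mathbf{K})\boldsymbol{\alpha} = \mathbf{d}$ up to rounding, which gives a simple consistency check against the batch solution (30.51). The per-iteration cost of the nested loops is quadratic in the number of stored centers.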


30.3.3 Kernel Affine Projection Algorithms 
(KAPA) 


The KAPA algorithms are nonlinear extensions of the 
conventional affine projection algorithms (APAs) in 
kernel space, which include the KLMS and KRLS as 
special cases [30.9]. Before presenting the KAPA al- 
gorithms, we give a brief introduction of the APA 
algorithms. 

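Let $d$ be a zero-mean scalar-valued random variable,
and $\mathbf{u}$ be a zero-mean $m \times 1$ random variable with
a positive-definite covariance matrix $\mathbf{R}_u = E[\mathbf{u}\mathbf{u}^T]$. Denote
by $\mathbf{r}_{du}$ the cross-covariance vector $\mathbf{r}_{du} = E[d\mathbf{u}]$. Then
the weight vector $\mathbf{w}$ that solves

$$\min_{\mathbf{w} \in \mathbb{R}^m} E|d - \mathbf{w}^T \mathbf{u}|^2 + \lambda \|\mathbf{w}\|^2$$
(30.59)

is given by

$$\mathbf{w}^* = (\lambda \mathbf{I} + \mathbf{R}_u)^{-1} \mathbf{r}_{du} .$$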


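The solution $\mathbf{w}^*$ of (30.59) can also be recursively
solved using a common gradient-based method

$$\begin{aligned} \mathbf{w}(i) &= \mathbf{w}(i-1) + \eta[\mathbf{r}_{du} - (\lambda \mathbf{I} + \mathbf{R}_u)\mathbf{w}(i-1)] \\ &= (1 - \eta\lambda)\mathbf{w}(i-1) + \eta[\mathbf{r}_{du} - \mathbf{R}_u \mathbf{w}(i-1)] , \end{aligned}$$
(30.60)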




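Table 30.3 Weight update equations of four KAPA algorithms

Algorithm   Update equation
KAPA-1      $\boldsymbol{\Omega}(i) = \boldsymbol{\Omega}(i-1) + \eta \boldsymbol{\Phi}(i)[\mathbf{d}(i) - \boldsymbol{\Phi}(i)^T \boldsymbol{\Omega}(i-1)]$
KAPA-2      $\boldsymbol{\Omega}(i) = \boldsymbol{\Omega}(i-1) + \eta \boldsymbol{\Phi}(i)(\varepsilon \mathbf{I} + \boldsymbol{\Phi}(i)^T \boldsymbol{\Phi}(i))^{-1}[\mathbf{d}(i) - \boldsymbol{\Phi}(i)^T \boldsymbol{\Omega}(i-1)]$
KAPA-3      $\boldsymbol{\Omega}(i) = (1 - \eta\lambda)\boldsymbol{\Omega}(i-1) + \eta \boldsymbol{\Phi}(i)[\mathbf{d}(i) - \boldsymbol{\Phi}(i)^T \boldsymbol{\Omega}(i-1)]$
KAPA-4      $\boldsymbol{\Omega}(i) = (1 - \eta)\boldsymbol{\Omega}(i-1) + \eta \boldsymbol{\Phi}(i)(\lambda \mathbf{I} + \boldsymbol{\Phi}(i)^T \boldsymbol{\Phi}(i))^{-1} \mathbf{d}(i)$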


Table 30.4 Several kernel learning algorithms related to KAPA

Algorithm                Relation to KAPA
KLMS [30.10]             KAPA-1 ($L = 1$)
NKLMS [30.9]             KAPA-2 ($L = 1$)
NORMA [30.36]            KAPA-3 ($L = 1$)
Kernel Adaline [30.37]   KAPA-1 ($L = N$)
RA-RBF [30.38]           KAPA-3 ($\eta\lambda = 1$, $L = N$)
SW-KRLS [30.34]          KAPA-4 ($\eta = 1$)
RegNet [30.39]           KAPA-4 ($\eta = 1$, $L = N$)


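or Newton's recursion (for the case $\lambda \neq 0$)

$$\begin{aligned} \mathbf{w}(i) &= \mathbf{w}(i-1) + \eta(\lambda \mathbf{I} + \mathbf{R}_u)^{-1}[\mathbf{r}_{du} - (\lambda \mathbf{I} + \mathbf{R}_u)\mathbf{w}(i-1)] \\ &= (1 - \eta)\mathbf{w}(i-1) + \eta(\lambda \mathbf{I} + \mathbf{R}_u)^{-1} \mathbf{r}_{du} . \end{aligned}$$
(30.61)

If the regularization factor $\lambda = 0$, Newton's recursion
should be

$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta(\varepsilon \mathbf{I} + \mathbf{R}_u)^{-1}[\mathbf{r}_{du} - \mathbf{R}_u \mathbf{w}(i-1)] ,$$
(30.62)

where $\varepsilon$ is a small positive number to avoid numerical
instability.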

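Suppose we have access to the observations (training
examples) of $\mathbf{u}$ and $d$: $\{\mathbf{u}(i), d(i)\}$, $i = 1, 2, \ldots$
Then the APA algorithms can be easily derived by approximating
$\mathbf{R}_u$ and $\mathbf{r}_{du}$ in algorithms (30.60)-(30.62).
Based on the $L$ most recent observations, the covariance
matrix $\mathbf{R}_u$ and the cross-covariance vector $\mathbf{r}_{du}$ can
be simply approximated by

$$\hat{\mathbf{R}}_u = \frac{1}{L}\mathbf{U}(i)\mathbf{U}(i)^T , \qquad \hat{\mathbf{r}}_{du} = \frac{1}{L}\mathbf{U}(i)\mathbf{d}(i) ,$$
(30.63)

where

$$\mathbf{U}(i) = [\mathbf{u}(i-L+1), \ldots, \mathbf{u}(i)]_{m \times L} , \qquad \mathbf{d}(i) = [d(i-L+1), \ldots, d(i)]^T .$$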
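Combining (30.60) and (30.63) yields (with the factor $1/L$ absorbed into the step size $\eta$)

$$\mathbf{w}(i) = (1 - \eta\lambda)\mathbf{w}(i-1) + \eta \mathbf{U}(i)[\mathbf{d}(i) - \mathbf{U}(i)^T \mathbf{w}(i-1)] .$$
(30.64)

When $\lambda = 0$, (30.64) becomes

$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta \mathbf{U}(i)[\mathbf{d}(i) - \mathbf{U}(i)^T \mathbf{w}(i-1)] .$$
(30.65)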
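Similarly, combining (30.61) and (30.63), we have

$$\begin{aligned} \mathbf{w}(i) &= (1 - \eta)\mathbf{w}(i-1) + \eta(\lambda \mathbf{I} + \mathbf{U}(i)\mathbf{U}(i)^T)^{-1} \mathbf{U}(i)\mathbf{d}(i) \\ &= (1 - \eta)\mathbf{w}(i-1) + \eta \mathbf{U}(i)(\lambda \mathbf{I} + \mathbf{U}(i)^T \mathbf{U}(i))^{-1} \mathbf{d}(i) , \end{aligned}$$
(30.66)

where the second equality comes from the matrix inversion
lemma.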

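Further, using the approximations of (30.63), algorithm
(30.62) becomes

$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta(\varepsilon \mathbf{I} + \mathbf{U}(i)\mathbf{U}(i)^T)^{-1} \mathbf{U}(i)[\mathbf{d}(i) - \mathbf{U}(i)^T \mathbf{w}(i-1)] .$$
(30.67)

Algorithm (30.67) is equivalent to (by the matrix inversion
lemma)

$$\mathbf{w}(i) = \mathbf{w}(i-1) + \eta \mathbf{U}(i)(\varepsilon \mathbf{I} + \mathbf{U}(i)^T \mathbf{U}(i))^{-1}[\mathbf{d}(i) - \mathbf{U}(i)^T \mathbf{w}(i-1)] .$$
(30.68)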


Algorithms (30.65), (30.68), (30.64), and (30.66) are, 
respectively, referred to as the APA-1, APA-2, APA-3, 
and APA-4 algorithms. 
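
As an illustration of the simplest members of this family, the following Python sketch implements the APA-3 update (30.64), which reduces to APA-1 (30.65) when the regularization factor is zero. The function names `apa3_step` and `run_apa`, the zero initial weight vector, and the handling of the first few iterations (a shorter window until $L$ samples are available) are assumptions made for the example.

```python
import math

def apa3_step(w, window_u, window_d, eta=0.2, lam=0.0):
    """One APA-3 update; lam = 0.0 gives APA-1.

    window_u holds the (up to) L most recent input vectors, i.e. the
    columns of U(i); window_d holds the matching desired values d(i).
    """
    # Error vector d(i) - U(i)^T w(i-1), one entry per window sample.
    errs = [d - sum(wk * uk for wk, uk in zip(w, u))
            for u, d in zip(window_u, window_d)]
    # w(i) = (1 - eta*lam) w(i-1) + eta * U(i) * errs
    return [(1.0 - eta * lam) * wk
            + eta * sum(e * u[k] for e, u in zip(errs, window_u))
            for k, wk in enumerate(w)]

def run_apa(inputs, targets, L=2, eta=0.2, lam=0.0):
    """Run the update over a data stream, starting from w = 0."""
    w = [0.0] * len(inputs[0])
    for i in range(len(inputs)):
        lo = max(0, i - L + 1)
        w = apa3_step(w, inputs[lo:i + 1], targets[lo:i + 1], eta, lam)
    return w
```

APA-2 and APA-4 additionally require solving an $L \times L$ linear system per step, which is omitted here to keep the sketch dependency-free.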

Reformulating the above APA algorithms in fea- 
ture space yields the KAPA algorithms [30.11], whose 
weight update equations are summarized in Table 30.3. 

The KAPA algorithms are directly related to many 
other OKL algorithms [30.9]. Typical examples are pre- 
sented in Table 30.4. 




30.4 Illustration Examples 


30.4.1 Chaotic Time Series Prediction 


Mackey-Glass (MG) Time Series 
First, we consider the Mackey-Glass time series. The
time sequence is generated (with a sampling period $T = 6$ s)
from the following time-delay differential equation [30.9]

$$\frac{dx(t)}{dt} = -b x(t) + \frac{a x(t-\tau)}{1 + x(t-\tau)^{10}} ,$$
(30.69)


where $b = 0.1$, $a = 0.2$, and $\tau = 30$. The 10 most recent
values ($\mathbf{u}(i) = [x(i-10), \ldots, x(i-1)]^T$) in the past are
used as the input to predict the present value x(i). A seg- 
ment of 500 samples is used as the training data and 
another 100 as the testing data. The data are corrupted 
by additive Gaussian noise with zero mean and variance 
0.0016. Figure 30.7 shows the learning curves of LMS 
and KLMS algorithms. Evidently, the KLMS converges 


Fig. 30.7 Learning curves of LMS and KLMS in MG time
series prediction (adopted from [30.9]). Axes: MSE versus iteration.


Table 30.5 Performance comparison among LMS, KLMS, and RN

Algorithm              Training MSE       Testing MSE
LMS                    0.021 ± 0.002      0.026 ± 0.007
KLMS ($\eta = 0.1$)    0.0074 ± 0.0003    0.0069 ± 0.0008
KLMS ($\eta = 0.2$)    0.0054 ± 0.0004    0.0056 ± 0.0008
KLMS ($\eta = 0.6$)    0.0062 ± 0.0012    0.0058 ± 0.0017
RN ($\lambda = 0$)     0 ± 0              0.012 ± 0.004
RN ($\lambda = 1$)     0.0038 ± 0.0002    0.0039 ± 0.0008
RN ($\lambda = 10$)    0.011 ± 0.0001     0.010 ± 0.0003


to a much smaller value of the testing MSE. This is an 
expected result as the MG time series is a nonlinear sys- 
tem. In the simulation, the Gaussian kernel is used, and 
the kernel parameter ($a = 1/(2\sigma^2)$) is set at $a = 1.0$.
The step sizes of the LMS and KLMS are both set at 
0.2. Table 30.5 presents the performance comparison 
among LMS, KLMS with different step sizes, and reg- 
ularization network (RN) with different regularization 
parameters. The performance of KLMS is much better 
than LMS and is comparable to RN with the best regu- 
larization. This is indeed surprising since RN is a batch 
mode kernel regression method while KLMS is a sim- 
ple stochastic gradient algorithm in RKHS. 


Lorenz Time Series 
Next, we consider the Lorenz chaotic time series, generated
from a nonlinear, three-dimensional dynamic
system [30.17]

$$\begin{aligned} \frac{dx}{dt} &= -\beta x + yz , \\ \frac{dy}{dt} &= \delta(z - y) , \\ \frac{dz}{dt} &= -xy + \rho y - z , \end{aligned}$$
(30.70)

where the parameters are $\beta = 8/3$, $\delta = 10$, and $\rho = 28$.
The sample data are obtained using the first-order approximation
with step size 0.01. The state $x$ is picked for the
short-term prediction task. The signal is preprocessed to
be zero mean and unit variance (Fig. 30.8).
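
The data generation described above can be sketched as follows, reading "first-order approximation" as a forward Euler step. The function names `lorenz_series` and `standardize`, the initial state, and the series length used in any call are assumptions made for the example; the text does not specify them.

```python
def lorenz_series(n, dt=0.01, beta=8.0 / 3.0, delta=10.0, rho=28.0,
                  state=(1.05, 0.0, 1.0)):
    """Forward-Euler samples of the state x of system (30.70).

    The initial state is an arbitrary (assumed) starting point.
    """
    x, y, z = state
    xs = []
    for _ in range(n):
        dx = -beta * x + y * z
        dy = delta * (z - y)
        dz = -x * y + rho * y - z
        # All three states are advanced simultaneously from the old values.
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs.append(x)
    return xs

def standardize(xs):
    """Shift and scale a sequence to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    std = (sum((v - mean) ** 2 for v in xs) / len(xs)) ** 0.5
    return [(v - mean) / std for v in xs]
```

A call such as `standardize(lorenz_series(2000))` then yields a preprocessed segment from which input vectors of five consecutive samples can be formed.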


Fig. 30.8 A segment of the processed Lorenz time series (horizontal axis: sample index)




We use the previous five consecutive samples
$\mathbf{u}(i) = [x(i-5), \ldots, x(i-1)]^T$ to predict the current
sample $x(i)$. The performances of QKLMS, KLMS-NC,
KLMS-SC, and the standard KLMS are compared.
Here, KLMS-NC and KLMS-SC denote the sparsified
KLMS with, respectively, the novelty and surprise criterion.
The Gaussian kernel with the kernel parameter
$a = 1.0$ is used. The step sizes are all set at $\eta = 0.1$,
and the other parameters are tuned such that all the
algorithms except KLMS yield almost the same final
network size (Fig. 30.9). Figure 30.10 shows the average
learning curves over 100 simulation runs with
different segments of the signal, where the testing MSE


Fig. 30.9 Network sizes of QKLMS, KLMS-NC, and
KLMS-SC in Lorenz time series prediction. Axes: network size versus iteration.


Fig. 30.10 Learning curves of QKLMS, KLMS-NC,
KLMS-SC, and KLMS in Lorenz time series prediction. Axes: testing MSE (logarithmic scale) versus iteration.


is calculated based on 200 test data (the filter is fixed
in the testing phase). Simulation results clearly indicate
that the QKLMS exhibits much better performance,
achieving almost the same testing MSE as the KLMS
but with a small network size.


30.4.2 Frequency Doubling 


In frequency doubling, both the input and desired data
for the learning system are sine waves with frequencies
$f_0$ and $2f_0$, respectively (Fig. 30.11). In this example,
1500 samples are used as the training data and
another 200 as the testing data. The data are corrupted
by an impulsive mixture Gaussian noise, with
probability density function $p(x) \sim 0.9\,\mathcal{N}(0, 0.01) + 0.1\,\mathcal{N}(2, 0.01)$ [30.31].
Let the dimension of the input
vector be 2. Figure 30.12 shows the average learning
curves of KLMS, KMC, and MCC (adaptive FIR filter
under MCC criterion). It is clear that the KMC algorithm
outperforms both KLMS and MCC algorithms.
Simulation results suggest that the KMC algorithm performs
well in impulsive noise environments.

Fig. 30.11 Simulation data in frequency doubling (adopted
from [30.31])

Fig. 30.12 Learning curves of KLMS, KMC, and MCC in
frequency doubling (adopted from [30.31])


Fig. 30.13 Basic structure of the noise cancellation system

Fig. 30.14 Average learning curves of NLMS, KLMS-NC,
and KAPA-2-NC in noise cancellation (adopted
from [30.9]). Axes: MSE versus iteration.

Fig. 30.15 Basic structure of the nonlinear channel


30.4.3 Noise Cancellation 


Noise cancellation is very important in signal process- 
ing where an unknown interference has to be removed 
based on some reference measurement. Figure 30.13 
shows the basic structure of a noise cancellation system. 
The goal of the noise cancellation is to use the refer- 
ence measurement u(i) as the input to the adaptive filter 
and to obtain the filter output y(i) as an estimate of the 
unknown noise source n(i), such that the noise can be 
subtracted from the noisy measurement $d(i)$ to improve
the signal-to-noise ratio (SNR). 

In this example, the noise source is assumed to be
white and uniformly distributed over $[-0.5, 0.5]$. Further,
the nonlinear interference distortion function is

$$u(i) = n(i) - 0.2u(i-1) - u(i-1)n(i-1) + 0.1n(i-1) + 0.4u(i-2) .$$

During the training phase the primary signal is assumed
to be $s(i) = 0$, that is, the system simply tries to reconstruct
the noise source from the reference measurement.
We use the NLMS, KLMS-NC, and KAPA-2-NC ($L = 10$)
algorithms. The average learning curves over 200
Monte Carlo simulations are illustrated in Fig. 30.14.
In the simulation, the step sizes for NLMS, KLMS-NC,
and KAPA-2-NC are 0.2, 0.5, and 0.2, respectively.
The Gaussian kernel is used for both KLMS-NC and
KAPA-2-NC with kernel parameter $a = 1.0$. The tolerance
parameters for KLMS-NC and KAPA-2-NC are
$\delta_1 = 0.15$ and $\delta_2 = 0.01$. The noise reduction (NR) factor,
defined as

$$10 \log_{10} \frac{E[n(i)^2]}{E[(n(i) - y(i))^2]} ,$$
(30.71)

and the corresponding final network sizes are listed in
Table 30.6. The performance improvement of KAPA
over KLMS is obvious.
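
For concreteness, the NR factor (30.71) can be estimated from sample sequences as in the sketch below. The function name `noise_reduction_db` is hypothetical, and replacing the expectations by sample averages over a finite test segment is an assumption of the example.

```python
import math

def noise_reduction_db(noise, estimate):
    """Sample estimate of 10*log10(E[n^2] / E[(n - y)^2]) in dB."""
    num = sum(n * n for n in noise) / len(noise)
    den = sum((n - y) ** 2 for n, y in zip(noise, estimate)) / len(noise)
    return 10.0 * math.log10(num / den)
```

For example, an estimate that recovers 90 % of the amplitude of a unit-power noise sequence leaves a residual power of about 0.01 and therefore gives an NR of about 20 dB.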


30.4.4 Nonlinear Channel Equalization 


The final example is on nonlinear channel equalization, 
where the nonlinear channel consists of a serial con- 
nection of a linear filter H(z) and a static nonlinearity 
(Fig. 30.15). The problem setting is as follows: 


Table 30.6 Performance comparison of NLMS, KLMS,
and KAPA-2 in noise cancellation

Algorithm     Network size   NR (dB)
NLMS          N/A            9.09 ± 0.45
KLMS-NC       407 ± 14       15.58 ± 0.48
KAPA-2-NC     370 ± 14       21.99 ± 0.80




Fig. 30.16 Learning curves of LMS, APA-1, KLMS-NC,
KAPA-1-NC, and KAPA-2-NC in nonlinear channel
equalization (adopted from [30.9]). Axes: MSE versus iteration.


A binary signal $\{s(1), s(2), \ldots, s(N)\}$ is fed into
a nonlinear channel. At the receiver end of the channel,
the signal is further corrupted by additive i.i.d. Gaussian
noise, and then is observed as $\{r(1), r(2), \ldots, r(N)\}$.
The objective of channel equalization is to learn an inverse
filter that recovers the original signal with as low
an error rate as possible. This problem can be formulated
as a regression problem with input-output training
data $\{(r(t+D), r(t+D-1), \ldots, r(t+D-l+1)), s(t)\}$,
where $l$ is the time embedding length, and $D$ is the
equalization time lag. In this example, the nonlinear
channel is defined by $x(t) = s(t) + 0.5s(t-1)$, $r(t) = x(t) - 0.9x(t)^2 + n(t)$,
where $n(t)$ is a white Gaussian
noise with variance $\sigma^2$.
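
The construction of the training pairs can be sketched as below. Several details are assumptions of the example rather than statements of the source: the function name `make_equalization_data`, the use of $\pm 1$ symbols for the binary signal, a zero value for the symbol preceding the first one, zero-based time indices, and the fixed random seed.

```python
import random

def make_equalization_data(n, l=3, D=2, noise_var=0.01, seed=0):
    """Simulate the nonlinear channel and build (input vector, target) pairs."""
    rng = random.Random(seed)
    s = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    r = []
    for t in range(n):
        # Linear part x(t) = s(t) + 0.5 s(t-1), with s(-1) taken as 0.
        x = s[t] + 0.5 * (s[t - 1] if t > 0 else 0.0)
        # Static nonlinearity plus white Gaussian noise of variance noise_var.
        r.append(x - 0.9 * x * x + rng.gauss(0.0, noise_var ** 0.5))
    first = max(0, l - 1 - D)  # earliest t with all l received samples available
    # Input (r(t+D), r(t+D-1), ..., r(t+D-l+1)) paired with target s(t).
    return [(tuple(r[t + D - k] for k in range(l)), s[t])
            for t in range(first, n - D)]
```

Each returned pair can be fed directly to a regression-type learner such as LMS or a kernel adaptive filter; a hard decision on the filter output then gives the recovered symbol.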

We compare the performance of LMS, APA-1,
KLMS-NC, KAPA-1-NC ($L = 10$) and KAPA-2-NC
($L = 10$). The noise variance is assumed to be 0.01,
with $l = 3$ and $D = 2$ in the equalizer. For KLMS-NC,
KAPA-1-NC and KAPA-2-NC, the Gaussian kernel
with kernel parameter 1.0 is used, and the NC is employed
with $\delta_1 = 0.26$, $\delta_2 = 0.08$. Figure 30.16 shows
the average learning curves over 50 Monte Carlo simulations,
where the MSE is calculated between the
continuous output (i.e., before taking the hard decision)
and the desired signal. Figure 30.17 plots the
dynamic changes of the network sizes during the training.
In addition, different noise variances are set. To


Fig. 30.17 Network sizes of KLMS-NC, KAPA-1-NC, and
KAPA-2-NC over training in nonlinear channel equalization
(adopted from [30.9]). Axes: network size versus iteration.


Fig. 30.18 Performance comparison of LMS, APA-1,
KLMS-NC, KAPA-1-NC, and KAPA-2-NC with different
SNR in nonlinear channel equalization (adopted
from [30.9]). Axes: BER (logarithmic scale) versus normalized SNR (dB).


make the comparison fair, we tune the NC parameters
($\delta_1$ and $\delta_2$) to make the network size almost the
same (around 100) in each scenario. The simulation results
in terms of bit error rate (BER) are presented in
Fig. 30.18, where the normalized SNR is defined as
$10 \log_{10}(1/\sigma^2)$.




30.5 Conclusion 


Online learning has found its place in a wide range of
applications, especially in situations where the number
of training data is extremely large or the data statistics
change fast over time. Recent studies suggest that many
online learning algorithms can be efficiently extended
to kernel space, provided that these algorithms can be
expressed in terms of inner products, since the inner
products in high-dimensional kernel space can be simply
calculated using the kernel function in input space.
At present, most of the well-known linear adaptive filtering
algorithms, such as the LMS, RLS, and APA, have
been kernelized. These new algorithms, namely the kernel
adaptive filtering algorithms, can solve incrementally
arbitrary nonlinear problems in the input space, if
the kernel space is rich (high-dimensional) enough to
represent the mapping. In general, KAFs naturally create
a growing RBF network, learning the network topology
and adapting the free parameters directly from data
at the same time. However, by sparsifying the solution
one can achieve a compact model with small network
size even in continuous adaptation situations.

Illustration examples, including chaotic time series
prediction, frequency doubling, noise cancellation, and
nonlinear channel equalization, have been presented to
demonstrate the performance and usefulness of the kernel
adaptive filtering algorithms.

The use of the kernel trick and online sparsification
techniques to develop other online learning algorithms
(supervised, unsupervised, and reinforcement) is
a promising direction for future research.


References


30.1 S. Haykin: Neural Networks and Learning Machines,
3rd edn. (Prentice Hall, Upper Saddle River 2009)

30.2 E. Alpaydin: Introduction to Machine Learning (MIT
Press, Cambridge 2004)

30.3 L. Bottou, O. Bousquet: The tradeoffs of large-scale
learning. In: Optimization for Machine Learning,
ed. by S. Sra, S. Nowozin, S.J. Wright (MIT Press,
Cambridge 2011) pp. 351-368

30.4 V. Vapnik: The Nature of Statistical Learning Theory
(Springer, New York 1995)

30.5 B. Schölkopf, A.J. Smola: Learning with Kernels,
Support Vector Machines, Regularization, Optimization
and Beyond (MIT Press, Cambridge 2002)

30.6 B. Widrow, S.D. Stearns: Adaptive Signal Processing
(Prentice-Hall, Englewood Cliffs 1985)

30.7 S. Haykin: Adaptive Filtering Theory, 3rd edn.
(Prentice Hall, Upper Saddle River 1996)

30.8 A.H. Sayed: Fundamentals of Adaptive Filtering
(Wiley, Hoboken 2003)

30.9 W. Liu, J.C. Principe, S. Haykin: Kernel Adaptive Filtering:
A Comprehensive Introduction (Wiley, Hoboken 2010)

30.10 W. Liu, P. Pokharel, J. Principe: The kernel least
mean square algorithm, IEEE Trans. Signal Process.
56, 543-554 (2008)

30.11 W. Liu, J. Principe: Kernel affine projection algorithm,
EURASIP J. Adv. Signal Process. 2008, 784292
(2008)

30.12 Y. Engel, S. Mannor, R. Meir: The kernel recursive
least-squares algorithm, IEEE Trans. Signal Process.
52, 2275-2285 (2004)

30.13 W. Liu, I. Park, Y. Wang, J.C. Principe: Extended kernel
recursive least squares algorithm, IEEE Trans.
Signal Process. 57, 3801-3814 (2009)


30.14 J. Platt: A resource-allocating network for function 
interpolation, Neural Comput. 3, 213-225 (1991) 

30.15 C. Richard, J.C.M. Bermudez, P. Honeine: Online 
prediction of time series data with kernels, IEEE 
Trans. Signal Process. 57, 1058-1066 (2009) 

30.16 W. Liu, I. Park, J.C. Principe: An information theoretic
approach of designing sparse kernel adaptive
filters, IEEE Trans. Neural Netw. 20, 1950-1961 (2009)

30.17 B. Chen, S. Zhao, P. Zhu, J.C. Principe: Quantized 
kernel least mean square algorithm, IEEE Trans. 
Neural Netw. Learn. Syst. 23(1), 22-32 (2012) 

30.18 J.C. Principe: Information Theoretic Learning: 
Renyi's Entropy and Kernel Perspectives (Springer, 
New York 2010) 

30.19 W. Liu, P.P. Pokharel, J.C. Principe: Correntropy: 
properties and applications in non-Gaussian sig- 
nal processing, IEEE Trans. Signal Process. 55(11), 
5286-5298 (2007) 

30.20 J.-W. Xu, A. Paiva, I. Park, J.C. Principe: A reproducing
kernel Hilbert space framework for
information-theoretic learning, IEEE Trans. Signal
Process. 56(12), 5891-5902 (2008)

30.21 E. Moore: On properly positive Hermitian matrices, 
Bull. Am. Math. Soc. 23(59), 66-67 (1916) 

30.22 N. Aronszajn: The theory of reproducing kernels 
and their applications, Cambr. Philos. Soc. Proc. 39, 
133-153 (1943) 

30.23 J. Mercer: Functions of positive and negative type, 
and their connection with the theory of integral 
equations, Philos. Trans. R. Soc. Lond. 209, 415- 
446 (1909) 

30.24 B. Hassibi, A.H. Sayed, T. Kailath: The $H^\infty$ optimality
of the LMS algorithm, IEEE Trans. Signal Process.
44, 267-280 (1996)




30.25 B.W. Silverman: Density Estimation for Statistics
and Data Analysis (Chapman Hall/CRC, London
1986)

30.26 B. Chen, S. Zhao, P. Zhu, J.C. Principe: Mean square
convergence analysis of the kernel least mean
square algorithm, Signal Process. 92, 2624-2632
(2012)

30.27 I. Steinwart: On the influence of the kernel on the
consistency of support vector machines, J. Mach.
Learn. Res. 2, 67-93 (2001)

30.28 N.R. Yousef, A.H. Sayed: A unified approach to
the steady-state and tracking analysis of adaptive
filters, IEEE Trans. Signal Process. 49, 314-324
(2001)

30.29 T.Y. Al-Naffouri, A.H. Sayed: Adaptive filters with
error nonlinearities: mean-square analysis and
optimum design, EURASIP J. Appl. Signal Process.
4, 192-205 (2001)

30.30 T.Y. Al-Naffouri, A.H. Sayed: Transient analysis
of adaptive filters with error nonlinearities, IEEE
Trans. Signal Process. 51, 653-663 (2003)

30.31 S. Zhao, B. Chen, J.C. Principe: Kernel adaptive filtering
with maximum correntropy criterion, Proc.
Int. Joint Conf. Neural Netw. (IJCNN) (2011) pp. 2012-2017

30.32 S. Zhao, B. Chen, J.C. Principe: An adaptive kernel
width update for correntropy, Proc. Int. Joint
Conf. Neural Netw. (IJCNN) (2012) pp. 1-5


30.33 C.J.C. Burges: A tutorial on support vector machines
for pattern recognition, Data Min. Knowl. Discov. 2,
121-167 (1998)

30.34 S. Van Vaerenbergh, J. Via, I. Santamaria: A sliding
window kernel RLS algorithm and its application
to nonlinear channel identification, IEEE Int. Conf.
Acoust., Speech, Signal Process. (ICASSP), Toulouse
(2006)

30.35 S. Van Vaerenbergh, I. Santamaria, W. Liu,
J.C. Principe: Fixed-budget kernel recursive least-squares,
2010 IEEE Int. Conf. Acoust., Speech, Signal
Process. (ICASSP), Dallas (2010) pp. 1882-1885

30.36 J. Kivinen, A. Smola, R.C. Williamson: Online learning
with kernels, IEEE Trans. Signal Process. 52(8),
2165-2176 (2004)

30.37 T.-T. Frieß, R.F. Harrison: A kernel-based ADALINE,
Proc. Eur. Symp. Artif. Neural Netw. 1999 (1999)
pp. 245-250

30.38 W. Liu, P.P. Pokharel, J.C. Principe: Recursively
adapted radial basis function networks and its
relationship to resource allocating networks and
online kernel learning, 2007 IEEE Workshop Mach.
Learn. Signal Process., Thessaloniki (2007) pp. 300-305

30.39 F. Girosi, M. Jones, T. Poggio: Regularization theory
and neural networks architectures, Neural Comput.
7, 219-269 (1995)




31. Probabilistic Modeling in Machine Learning 


Davide Bacciu, Paulo J.G. Lisboa, Alessandro Sperduti, Thomas Villmann 


Probabilistic methods are the heart of machine
learning. This chapter shows links between core
principles of information theory and probabilistic
methods, with a short overview of historical and
current examples of unsupervised and inferential
models. Probabilistic models are introduced as
a powerful idiom to describe the world, using random
variables as building blocks held together by
probabilistic relationships. The chapter discusses
how such probabilistic interactions can be mapped
to directed and undirected graph structures,
which are the Bayesian and Markov networks. We
show how these networks are subsumed by the
broader class of the probabilistic graphical models,
a general framework that provides concepts
and methodological tools to encode, manipulate
and process probabilistic knowledge in a computationally
efficient way. The chapter then introduces,
in more detail, two topical methodologies that
are central to probabilistic modeling in machine
learning. First, it discusses latent variable models,
a probabilistic approach to capture complex
relationships between a large number of observable
and measurable events (data, in general),
under the assumption that these are generated
by an unknown, nonobservable process. We show
how the parameters of a probabilistic model involving
such nonobservable information can be
efficiently estimated using the concepts underlying
the expectation-maximization algorithms.
Second, the chapter introduces a notable example
of latent variable model, that is of particular
relevance for representing the time evolution of
sequence data, that is the hidden Markov model.
The chapter ends with a discussion on advanced
approaches for modeling complex data-generating
processes comprising nontrivial probabilistic interactions
between latent variables and observed
information.


31.1 Probabilistic and Information-Theoretic Methods
  31.1.1 Information-Theoretic Methods
  31.1.2 Probabilistic Models
31.2 Graphical Models
  31.2.1 Bayesian Networks
  31.2.2 Markov Networks
  31.2.3 Inference
31.3 Latent Variable Models
  31.3.1 Latent Space Representation
  31.3.2 Learning with Latent Variables: The Expectation-Maximization Algorithm
  31.3.3 Linear Factor Analysis
  31.3.4 Mixture Models
31.4 Markov Models
  31.4.1 Markov Chains
  31.4.2 Hidden Markov Models
  31.4.3 Related Models
31.5 Conclusion and Further Reading
References


31.1 Probabilistic and Information-Theoretic Methods 


Information theory is closely connected to probability
theory and statistics. In particular, the standard definition
of information contained in a random variable
$X$ with a probability density function $P(X)$ is well
known to be $I(X) = -\log(P(X))$, with the corresponding
Shannon entropy, in differential form, given by the
average information

$$H(P) = -\int P(X) \log(P(X))\, dx .$$
(31.1)




One of the fundamental theorems of information theory,
the second Gibbs theorem, states that the normal
distribution achieves maximum entropy, hence maximal
average information, among all distributions with known
variance. To show this in the univariate case, consider
the normal distribution in the standard form

$$P(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right) .$$

It is straightforward to show that for the natural logarithm

$$-\int P(X) \log(P(X))\, dx = \frac{1}{2} + \log\left(\sqrt{2\pi\sigma^2}\right) = -\int G(X) \log(P(X))\, dx ,$$

where $G(X)$ is any arbitrary density function with variance
$\int G(X)(X-\mu)^2 dx = \sigma^2$. Therefore, the difference
in average information between the two density functions
necessarily observes the following

$$\begin{aligned} & -\int P(X) \log(P(X))\, dx + \int G(X) \log(G(X))\, dx \\ &= -\int G(X) \log(P(X))\, dx + \int G(X) \log(G(X))\, dx \\ &= -\int G(X) \log\left(\frac{P(X)}{G(X)}\right) dx \;\geq\; -\int G(X) \left(\frac{P(X)}{G(X)} - 1\right) dx = 0 , \end{aligned}$$

using Jensen's inequality $\log(x) \leq x - 1$ and the normalization
property $\int P(X)\,dx = \int G(X)\,dx = 1$. This is a particular
instance of Gibbs inequality and proves that the
asymptotic distribution of the central limit theorem also
maximizes entropy.

This led, in probability theory, to the definition
of natural measures of dissimilarity closely related
to the expectation of information difference, e.g., the
Kullback-Leibler (KL) divergence [31.1]

$$D_{KL}(P\|Q) = \int P(X) \log\left(\frac{P(X)}{Q(X)}\right) dx ,$$
(31.2)

as generalized distances between probability distributions
$P$ and $Q$.
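
The statements above can be checked numerically with a short Python sketch. The closed-form entropies of the uniform and Laplace densities are standard results and are used here only for comparison; the function names are hypothetical, and `kl_divergence` implements the discrete (sum) analogue of (31.2) rather than the integral form, which is an assumption of the example.

```python
import math

def gaussian_entropy(sigma2):
    """Differential entropy 1/2 + log(sqrt(2 pi sigma^2)) of a normal density."""
    return 0.5 + math.log(math.sqrt(2.0 * math.pi * sigma2))

def uniform_entropy(sigma2):
    """Entropy log(w) of a uniform density of width w, where sigma^2 = w^2 / 12."""
    return math.log(math.sqrt(12.0 * sigma2))

def laplace_entropy(sigma2):
    """Entropy 1 + log(2b) of a Laplace density with scale b, where sigma^2 = 2 b^2."""
    return 1.0 + math.log(2.0 * math.sqrt(sigma2 / 2.0))

def kl_divergence(p, q):
    """Discrete KL divergence sum_i p_i log(p_i / q_i); terms with p_i = 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

For equal variance, the Gaussian entropy exceeds both alternatives, in line with the maximum entropy property, and the KL divergence is zero for identical distributions and positive otherwise.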

The KL divergence occurs frequently in machine 
learning, where the development of learning strategies 
links information theory with statistical and biologi- 
cally motivated concepts. For instance, the perceptron 


model was established as a simple but mathematically 
tractable model of a biological neuron as the smallest 
information processing unit in brains [31.2]. Recog- 
nition that gradient descent provided a pragmatic but 
effective solution to the credit assignment problem, 
namely which values the hidden nodes should have, 
led to the multilayer perceptron as powerful compu- 
tational tools for classification and regression. Initially 
maximum likelihood optimization was used for param- 
eter estimation, following the tried and tested statisti- 
cal concepts of normally distributed errors leading to 
a sum-of-squares loss function in regression and, for 
classification, the Bernoulli distribution for binary data 
and the so-called cross-entropy (31.2) for multinomial 
class assignments, the latter two likelihood functions 
measuring information divergence averaged over the 
true distribution given by the empirical class labels. 

Information theoretic aspects (e.g., mutual informa- 
tion) were also considered in neural models in order 
to avoid overtraining [31.3], for instance in Boltzmann 
networks which directly mirror information princi- 
ples in statistical mechanics [31.4]. Related approaches 
are used currently for deep learning models, where 
information principles drive the feature representa- 
tions [31.5]. 

The correspondence between maximum entropy 
and maximum likelihood outlined above is just one 
aspect of the application of information-theoretic con- 
cepts in machine learning. The next section outlines fur- 
ther developments linked first to source identification 
through blind signal separation and matrix factoriza- 
tion methods. These concepts from signal processing 
identify important degrees of freedom that may be used 
as hidden variables in probabilistic models, discussed 
later in the chapter. Furthermore, the application of 
information-theoretic methods extends also to the au- 
tomatic identification of prototypes for use in compact 
data representations that include dictionaries defined by 
methods such as vector quantization, typically with un- 
supervised approaches. 

Supervised methods are introduced as probabilis- 
tic models, focusing first on discriminative methods. 
This indicates that the maximum likelihood approach 
is limited in its predictive power in generalization to 
out-of-sample data, because it allows models to be 
generated with very little bias but with considerable
variance; for a more detailed discussion of this point
refer to [31.6]. What this means in practice is that flexible
models such as neural networks are prone to overfitting 
unless the complexity of the model is controlled along 
with the extent to which the model fits the data. The 


Probabilistic Modeling in Machine Learning 


31.1 Probabilistic and Information-Theoretic Methods 


latter is described by the likelihood, but the model com- 
plexity can be controlled in a number of different ways. 
In probabilistic models an efficient framework to max- 
imize the generality of probabilistic inference models 
is to apply the maximum a posteriori (MAP) frame- 
work which optimizes the posterior probability of the 
model parameters given the data but also given prior 
distributions for the parameters, typically limiting their 
size by assuming a zero-centred normal distribution as 
the prior. This is the basis of the method of automatic 
relevance determination, explained in Sect. 31.2. 
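
The effect of a zero-centred normal prior can be illustrated with a minimal sketch for a one-parameter model y ≈ w·x, assuming unit-variance Gaussian noise. The data values and the prior precision `alpha` are invented for illustration only; the function name `map_slope` is likewise hypothetical.

```python
# Minimal sketch: MAP estimate of a single slope w for y ~ w * x,
# assuming Gaussian noise and a zero-centred Gaussian prior on w.
# The data and the value of alpha are made up for illustration.

def map_slope(xs, ys, alpha):
    """Minimise 0.5 * sum((y - w*x)^2) + 0.5 * alpha * w^2 in closed form.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + alpha).
    With alpha = 0 this is the maximum likelihood (least squares) estimate.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]

w_ml = map_slope(xs, ys, alpha=0.0)   # no prior: maximum likelihood
w_map = map_slope(xs, ys, alpha=5.0)  # the prior shrinks w toward zero
```

The quadratic penalty is exactly the negative logarithm of the Gaussian prior, up to a constant, which is why the MAP solution is pulled toward zero relative to the maximum likelihood one.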

While discriminative models are efficient approxi- 
mators for nonlinear response functions, both in regres- 
sion and in the estimation of class conditional density 
functions, they are difficult to interpret and can gener- 
ally be considered as black boxes, meaning that they 
are not readily interpreted to give insights about the 
data. A topical and widely used alternative approach 
is to model the joint distribution of the data directly. 
This is ideally done by factorization into subgraphs into 
which the multivariate structure of the data is broken-up 
using strict conditional independence requirements, as 
discussed in Sect. 31.2. Inference can then proceed us- 
ing Bayes theorem introduced in (31.6). 

An alternative approach to modeling the joint distri- 
bution of the covariates is to use the mutual correlation 
in the data to identify important degrees of freedom that 
may be hidden in the sense that they are not directly ob- 
served. This generates latent variable representations 
that naturally fit into the framework of probabilistic 
modeling. However, the introduction of additional vari- 
ables also introduces complexity into the optimization 
process for estimating their values. This leads naturally 
to the introduction of expectation maximization (EM), 
a general approach of particular value for estimating 
mixture models, discussed in Sect. 31.3. 

So far the modeling methodologies focus on snap- 
shots of the data, without taking into consideration the 
time evolution of the covariates. To do this requires 
explicit parametrization, for which arguably the most 
widely used probabilistic approach is hidden Markov 
models (HMM). These models are built on the concepts
of conditional independence, latent variables, and ex-
pectation maximization to model the time evolution of
sequences of covariate measurements, and are covered in
Sect. 31.4, the last substantive section.


31.1.1 Information-Theoretic Methods 


While the statistical properties of perceptrons are 
widely investigated [31.6], the more difficult prob- 


lem of establishing statistical independence is becom- 
ing increasingly important and novel algorithms have 
been presented during the last decade [31.7]. Their 
applicability is enormous, ranging from variable selec- 
tion, to blind source separation (BSS) and statistical 
causality. Frequently, the difficult question of statistical 
dependence in data is replaced by the easier consid- 
eration of estimation and application of data correla- 
tions for learning strategies. A recent approach tries 
to determine independence by generalized correlation 
functions [31.8]. In this context of decorrelation and 
independence, BSS and nonnegative matrix factoriza- 
tions [31.9] of data channels are based on statistical 
deconvolution. A comprehensive overview for BSS, 
independent component analysis (ICA) and nonneg- 
ative matrix and tensor factorization (NMF) can be 
found in [31.10-12], respectively. Different aspects can 
be investigated, like ICA and BSS maximizing con- 
ditional probabilities [31.11]. A relevant connection 
exists between NMF and probabilistic graphical models 
comprising hidden variables [31.13], which is briefly 
discussed in Sect. 31.3.4. 

Other recent approaches in this field incorporate in- 
formation theoretic principles directly: Pham [31.14] 
investigated BSS based on mutual information, 
whereas [31.15] applied /-divergences. The infomax 
principle for ICA was considered in depth [31.16], as 
was the problem of learning overcomplete data rep- 
resentations and performing overcomplete noisy blind 
source separation, e.g., the sparse coding neural gas 
(SCNG) [31.17]. Recent results including modern di- 
vergences (generalized w--divergences) were recently 
published [31.18]. Obviously, information theoretic di- 
vergence measures like Rényi-divergences (belonging 
to the family of w-divergences) capture directly the sta- 
tistical information contained in the data, as expressed 
by the probability density function [31.19, 20]. This 
property can be used for unsupervised model estimation 
for instance in vector quantization, when divergences 
are used as dissimilarity measure [31.21]. 

Information optimum vector quantization by proto- 
types is a widely investigated topic in clustering and 
data compression, based on the optimization of the 
γ-reconstruction error

$$E_{VQ} = \int \lVert v - w(v) \rVert_E^{\gamma} \, P(V = v) \, dv ,$$

where $P(V = v)$ is the data density of the vector data $v$
and $\lVert v - w(v) \rVert_E$ is the Euclidean distance between the data
vector and the prototype $w(v)$ representing it. One of
the key results concerning information theoretic prin- 
ciples for vector quantization is Zador’s magnification 
law [31.22]: if the data vectors $v$ are given in $q$-
dimensional Euclidean space, then the magnification
law $\rho \sim P^{\alpha}$ holds. Here, $\rho(w)$ is the prototype density
with the magnification factor

$$\alpha = \frac{q}{q + \gamma} .$$
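
The two quantities just introduced can be sketched for one-dimensional data as follows. This is only an illustration under simple assumptions: the reconstruction error is estimated as a sample mean rather than an integral over the data density, and the data and prototype positions are invented.

```python
def reconstruction_error(data, prototypes, gamma=2.0):
    """Empirical gamma-reconstruction error for 1-D data: the mean of
    |v - w(v)|**gamma, where w(v) is the prototype closest to v."""
    total = 0.0
    for v in data:
        nearest = min(prototypes, key=lambda w: abs(v - w))
        total += abs(v - nearest) ** gamma
    return total / len(data)


def magnification_exponent(q, gamma=2.0):
    """Magnification exponent alpha = q / (q + gamma) for q-dimensional data."""
    return q / (q + gamma)


# Invented data with two clusters, around 0.1 and around 1.0.
data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
err_one = reconstruction_error(data, [0.55])      # a single prototype
err_two = reconstruction_error(data, [0.1, 1.0])  # one prototype per cluster
alpha_1d = magnification_exponent(q=1)            # squared error in 1-D
```

With the squared Euclidean error (γ = 2) the exponent stays below 1 for every finite dimension q, so the prototype density follows the data density only in a compressed form.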


This is the basic principle of vector quantization based
on Euclidean distances. For other schemes, such as
self-organizing maps or neural gas variants, slightly
different magnification factors are obtained, depending
on the choice of neighborhood cooperation scheme
applied during prototype adaptation [31.23-25].
Information optimum magnification, $\alpha = 1$, is equivalent
to maximum mutual information [31.22]. Yet, it is
possible to control the magnification for most of these
algorithms by different strategies like localized or
frequency sensitive competitive learning. For an overview,
we refer to [31.23]. If the Euclidean distance is
replaced by divergence measures, optimum magnification
$\alpha = 1$ can also be achieved by maximum entropy
learning [31.26], or by the utilization of correntropy [31.27].
Vector quantization algorithms directly derived from
information theoretic principles based on Rényi entropies
are intensively studied in [31.28], which also highlights their
connection to graph clustering and Mercer kernel-based
learning [31.29].

Other information theoretic vector quantizers opti- 
mize the mutual information between data and proto- 
types, or the respective KL divergence, instead of min- 
imizing a reconstruction error [31.30]. Based on this 
principle, several data embedding, or dimensionality re- 
duction techniques, have been developed as alternatives 
to multidimensional scaling. These approaches are fre- 
quently used to visualize data. Prominent examples are 
stochastic neighborhood embedding (SNE) [31.31] or 
variants thereof: for instance, t-SNE uses outlier-robust
Student-t distributions for data characterization instead
of Gaussians [31.32]. The generalization to divergences
other than KL can be found in [31.33].

Another role for information theory in machine 
learning is in feature selection. Removing irrelevant or 
redundant features not only leads to a simplification of 
the model and a reduced requirement for data acquisi- 
tion, but it is also central for maximizing the generality 
of the model when it is applied to future data. Most 
feature selection approaches are supervised schemes, 
hence using class information or expected regression 


values. Strategies to achieve this goal can be classi- 
cal Bayesian inference schemes of which automatic 
relevance determination (ARD) is a good example (de- 
scribed further in Sect. 31.2), or statistical approaches 
based on mutual correlation or covariances [31.34, 35]. 
An alternative approach to feature selection is to use 
mutual information 


$$I(X, Y) = D_{KL}\bigl( J(X, Y) \,\Vert\, P(X)\, Q(Y) \bigr)$$


between random variables X and Y with probabil- 
ity densities P and Q, respectively, and joint density 
J [31.36]. Here, the features are treated as random vari- 
ables to be compared and mutual information measures 
the information loss resulting from removal of variables 
from the model. Learning classification together with 
feature weighting in vector quantization is known as 
relevance learning [31.37]. Recent developments to in- 
troduce sparseness according to information theoretic 
constraints are discussed in [31.38, 39]. 
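
For discrete variables, the mutual information used in feature selection can be sketched directly from its definition as a divergence between the joint distribution and the product of the marginals. The two joint tables below are invented examples, and the result is expressed in nats.

```python
import math


def mutual_information(joint):
    """I(X;Y) in nats for a joint probability table joint[x][y], computed
    as the KL divergence between the joint and the product of marginals."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # terms with zero probability contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


# A feature that is independent of the label carries no information ...
independent = [[0.25, 0.25], [0.25, 0.25]]
# ... while a feature that copies a balanced binary label carries log 2 nats.
copy_of_label = [[0.5, 0.0], [0.0, 0.5]]

mi_indep = mutual_information(independent)
mi_copy = mutual_information(copy_of_label)
```

A feature ranking based on this quantity would discard the first feature and keep the second; in practice the joint table has to be estimated from data, which is the estimation problem discussed next.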

Information-theoretic measures, such as mu-
tual information, can be explicitly estimated from
data [31.40]. This is used in the context of vectorial
data analysis to obtain consistent and reliable estima- 
tors with topographic maps or kernels [31.41]. Further 
applications of information theoretic learning also use 
Rényi entropy

$$H_{\alpha}(p) = \frac{1}{1 - \alpha} \log \left( \int p(x)^{\alpha} \, dx \right)$$

as a cost function instead of the mean squared error,
resorting, for computational efficiency, to Parzen
estimators [31.42] or nearest neighbor entropy estimation
models. For effective computation of an approximation
of the mutual information $I(X, Y)$, the quadratic Rényi
entropy $H_2(p)$ or the closely related information energy
are common choices [31.43]. Parzen window-based
estimators for some information theoretic cost functions
have also been shown to be cost functions in a
corresponding Mercer kernel space [31.44]. In particular,
a classification rule based on an information theoretic
criterion has been shown to correspond to a linear
classifier in the kernel space. This leads to the formulation
of the support vector machine (SVM) from information
theory principles.
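
The Rényi entropy and its Parzen-window estimate for α = 2 can be sketched as follows. The discrete version replaces the integral by a sum, and the Parzen version assumes one-dimensional samples with Gaussian kernels of a common, hand-chosen width `sigma`; both choices are illustrative assumptions rather than the specific estimators of the cited works.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def renyi_entropy(p, alpha):
    """Discrete Renyi entropy log(sum_i p_i**alpha) / (1 - alpha), alpha != 1."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use shannon_entropy")
    return math.log(sum(pi ** alpha for pi in p if pi > 0.0)) / (1.0 - alpha)


def quadratic_renyi_parzen(samples, sigma):
    """Parzen estimate of H_2 for 1-D samples with Gaussian kernels.

    The integral of the squared Parzen density is a double sum of Gaussians
    of variance 2 * sigma**2 evaluated at the pairwise sample differences.
    """
    n = len(samples)
    var = 2.0 * sigma ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    potential = sum(norm * math.exp(-(a - b) ** 2 / (2.0 * var))
                    for a in samples for b in samples) / n ** 2
    return -math.log(potential)


p = [0.5, 0.25, 0.25]
h2 = renyi_entropy(p, 2.0)
h_near_one = renyi_entropy(p, 0.999)  # close to the Shannon entropy of p
```

The double sum in the Parzen estimate involves only pairwise kernel evaluations, which is what makes the quadratic case computationally convenient.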


31.1.2 Probabilistic Models 


Kernel models are known for having excellent discrim-
ination performance, but they are typically not well
calibrated. This is because they are designed to be 
efficient binary class allocation models rather than es- 
timators of the posterior probability for membership of 
each class C. As an example, SVMs allocate inputs to
classes on the basis of a binary-valued indicator variable
that generally does not have a link function to
a probability density estimate. Models of this type are
known as discriminative models, a well-known variant
being Fisher's linear discriminant. As the name implies,
the central model is linear in the covariates,

$$y = \mathbf{w}^T \mathbf{x} ,$$

optimizing, for binary classification, a discriminant
function derived from the mean $m_i$ and variance $s_i^2$ of
the projected data in each class (i.e., $i = 1, 2$), namely

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} .$$

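In general, given the two data cohorts, the covariance
matrix of the data $\mathbf{S}$ has a strict decomposition into
within- and between-class covariance matrices as
$\mathbf{S} = \mathbf{S}_W + \mathbf{S}_B$. For an overall data mean vector $\mathbf{m}$, class
mean vectors $\mathbf{m}_j$, and a total of $N_j$ data points in each
class, these matrices are given by

$$\mathbf{S} = \sum_{i=1}^{N} (\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^T ,$$

$$\mathbf{S}_W = \sum_{j=1}^{2} \sum_{i=1}^{N_j} (\mathbf{x}_i - \mathbf{m}_j)(\mathbf{x}_i - \mathbf{m}_j)^T ,$$

where the inner sum runs over the $N_j$ data points of class $j$.
The solution to the optimization of $J(\mathbf{w})$ is

$$\mathbf{w} \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1) ,$$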


where the inverse of the within-class covariance matrix
$\mathbf{S}_W$ positions the discriminant hyperplane so as to
minimize the overlap between the projections of the data
points in each class onto the direction of the weight
vector $\mathbf{w}$. This illustrates the observation that, in general,
this projection will not be calibrated with a probabilistic
estimate such as the logit

$$\operatorname{logit}(P(C|X)) = \log \left( \frac{P(C|X)}{1 - P(C|X)} \right) .$$
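
Fisher's solution can be sketched for two-dimensional data without any external library. The two small classes below are invented, the scatter matrices are left unnormalized, and the helper names are hypothetical; the sketch only illustrates that the direction obtained from the inverse within-class scatter yields a larger criterion value than the plain difference of the class means.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, centre):
    """Sum of outer products (x - centre)(x - centre)^T as a 2x2 list."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - centre[0], p[1] - centre[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s


def fisher_direction(class1, class2):
    """Weight vector proportional to S_W^{-1} (m2 - m1)."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[r][c] + s2[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def fisher_criterion(w, class1, class2):
    """J(w): squared gap between projected means over the summed
    within-class scatter of the projections y = w^T x."""
    y1 = [w[0] * p[0] + w[1] * p[1] for p in class1]
    y2 = [w[0] * p[0] + w[1] * p[1] for p in class2]
    mu1, mu2 = sum(y1) / len(y1), sum(y2) / len(y2)
    s1 = sum((y - mu1) ** 2 for y in y1)
    s2 = sum((y - mu2) ** 2 for y in y2)
    return (mu2 - mu1) ** 2 / (s1 + s2)


# Invented data: both classes are elongated along the first axis.
class1 = [(0.0, 0.0), (1.0, 0.2), (2.0, -0.1), (3.0, 0.1)]
class2 = [(0.5, 1.0), (1.5, 1.2), (2.5, 0.9), (3.5, 1.1)]

w = fisher_direction(class1, class2)
mean_gap = [0.5, 1.0]  # difference of the class means for this data
```

The resulting projection is a score, not a probability, which is the calibration issue the logit above refers to.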


The correct calibration is found in a class of generalized 
linear models of the form 


$$y(\mathbf{x}) = f(\mathbf{w}^T \mathbf{x} + w_0) ,$$


where f(-) is known as the activation function in ma- 
chine learning and its inverse is called a link function 
by statisticians [31.6]. Perhaps the best-known choice 
of activation is the sigmoid function, where the proba- 
bilistic model becomes logistic regression and the linear 
index $\mathbf{w}^T \mathbf{x}$ represents exactly the logit of $P(C|X)$. This is
very widely used and a generally well-calibrated model, 
even when severe class imbalance is present. 

It is often quoted that generalized linear mod- 
els are limited by the discriminant forms determined 
by the linear scores, which must therefore be hyper- 
planes. However, this ignores the observation that, in 
most practical applications, suitable attribute represen- 
tations are defined using domain knowledge, typically 
by binning variables into discrete states. This turns 
the probabilistic estimators into linear-in-the-parameter 
models with significant discrimination potential for 
nonlinearly separable data. In effect, if the link func- 
tion is properly tuned to the noise structure of the data 
and in particular when there are larger numbers of in- 
dependent covariates, well-designed generalized linear 
models are competitive with flexible machine learning 
models, the more so as the limitation of using a linear- 
in-the-parameters scoring index now works as a form 
of regularization limiting the complexity of the model. 
Moreover, the linear index provides a strong element 
of interpretability whose importance to application do- 
main experts cannot be overestimated. Notwithstanding 
the power of machine learning, generalized linear mod- 
els should always be used as benchmarks to set against 
nonlinear models. 

An alternative to probabilistic linear models is the 
wide range of flexible direct estimators of P(C|X) 
among which arguably the most widely used model re- 
mains the multilayer perceptron (MLP). Similarly to 
linear statistical models, however, it is important to 
note that the estimation of class conditional probabil- 
ities with an MLP is contingent on using a correct 
activation function at the output node together with 
a suitable choice of loss function, which must be one 
of the entropy functions outlined in the previous sec- 
tion. So, in binary classification, the log-likelihood 
function with a Bernoulli distribution should be used 
in conjunction with a sigmoid activation function. In 
the multinomial case, we would need an extension of 
the sigmoid function, the softmax activation, together
with the cross-entropy as the loss function, since this 
is the correct measure of the divergence between the 
estimated and observed probability density functions. 
Similarly, for nonlinear regression, the activation func- 
tion should be linear with the usual sum-of-squares 
error function, provided the inherent noise in the data 
can be assumed to be normally distributed with zero 
mean, since this is where the loss function is derived 
from. In the event where the noise variance, for in- 
stance, is dependent on the covariates, heteroscedastic 
noise models must be used to derive appropriate loss 
functions [31.6]. 
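
The pairing of output activation and loss function described above can be sketched as follows; the function names and the numerical inputs are illustrative assumptions. The sketch also checks numerically that the softmax over the two activations (a, 0) reduces to the sigmoid of a, which is the sense in which softmax extends the sigmoid.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(activations):
    shift = max(activations)  # subtracting the maximum avoids overflow
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


def bernoulli_nll(target, prob):
    """Negative log-likelihood of one binary target (pairs with sigmoid)."""
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def cross_entropy(targets, probs):
    """Cross-entropy for a 1-of-K coded target (pairs with softmax)."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)


def sum_of_squares(targets, outputs):
    """Gaussian-noise loss for regression (pairs with a linear output)."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(targets, outputs))


two_class = softmax([1.3, 0.0])
three_class = softmax([2.0, -1.0, 0.5])
```

For a target in the first class, the two-class cross-entropy coincides with the Bernoulli negative log-likelihood evaluated at the sigmoid output.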

While the strength of neural networks is their uni- 
versal approximation capability, in the sense of fitting 
any multivariate surface to an arbitrarily small error, 
this flexibility also makes them prone to overfitting, po- 
tentially resulting in data models with little bias but 
large variance, in direct contrast to generalized linear 
models. In both cases, it is necessary to control the com- 
plexity of the model and this is best done by adding 
a penalty term to enforce the principle of parsimony, 
colloquially known as Occam’s razor (lex parsimo- 
niae). Arguably, the most commonly used and effective 
scheme is to apply Bayes’ theory at the level of fitting 
the model parameters, then to the regularization hyper- 
parameters, and finally to model selection itself. 

As we saw previously, the output of the MLP rep- 
resents a direct estimate of the posterior probability of 
class membership P(C|X). This approach can be gener- 
alized for the analysis of longitudinal data where each 
individual subject is followed up over a period of time
starting with a defined recruitment point and ending ei- 
ther at the end of a defined observation period or when 
an event of interest is observed, whichever occurs first. 
This is often called survival modeling and is typically 
used to estimate event rates in the presence of censor- 
ship, e.g., where the outcome of interest, for instance 
recovery from an illness, is observed in some subjects 
for only part of the allowed period of follow-up due 
to other events taking over, such as another condition 
setting-in, which prevent the observer from ever know- 
ing whether or not the subject would have recovered 
from the original illness, which is the event of interest. 
For discrete time, these models can be estimated using 
the standard MLP with an additional input node coding 
the time intervals. The output of the MLP again represents
a conditional probability, but now the probability
that the event occurs for the subject within a given time
interval, given that the subject survived until the start of
that interval. This defines the hazard function $h_l(\mathbf{x}_i)$ for
a subject with covariate vector $\mathbf{x}_i$ and predictions over
the $l$th discrete time interval, which is given by

$$h_l(\mathbf{x}_i) = P(T \le t_l \mid T > t_{l-1}, \mathbf{x}_i) .$$


For a single event of interest, i. e., a single risk factor, 
the log-likelihood function exactly mirrors that used in 
binary classification, treating as independent the proba- 
bility estimates for each of the N subjects and over the 
discrete time intervals where the subject was observed, 
i. e., up to the end of the follow-up period or until cen- 
sorship. This leads to the following loss function 


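$$L = \prod_{i=1}^{N} \prod_{l=1}^{t_i} h_l(\mathbf{x}_i)^{d_{il}} \left[ 1 - h_l(\mathbf{x}_i) \right]^{1 - d_{il}} , \tag{31.3}$$

where the inner product runs over the $t_i$ time intervals
during which subject $i$ was observed, and the binary
indicator variable $d_{il} = 0$ if the event
of interest was not observed for the subject during the
specific time interval, and is 1 otherwise. This loss function
is known as a partial likelihood, since it is measured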
only over time periods where the outcomes for each 
subject are observed, an approach that has been ex- 
tended to the multinomial case to provide a rigorous 
treatment of censorship with flexible models in the con- 
text known as competing risks [31.45]. 
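
The single-risk partial likelihood can be sketched for a toy cohort as follows. The hazard values stand in for MLP outputs and are invented, as is the data layout (one list of intervals per subject, covering only the intervals in which that subject was observed).

```python
import math


def log_partial_likelihood(hazards, events):
    """Logarithm of the discrete-time partial likelihood.

    hazards[i][l] is the predicted hazard for subject i in interval l and
    events[i][l] is 1 if the event occurred in that interval, else 0.
    Only intervals during which the subject was observed are listed.
    """
    total = 0.0
    for h_i, d_i in zip(hazards, events):
        for h, d in zip(h_i, d_i):
            total += d * math.log(h) + (1 - d) * math.log(1.0 - h)
    return total


# Subject 0 experiences the event in the third interval;
# subject 1 is censored after two intervals without an event.
hazards = [[0.1, 0.2, 0.4], [0.1, 0.2]]
events = [[0, 0, 1], [0, 0]]

log_l = log_partial_likelihood(hazards, events)
```

The censored subject contributes only survival factors (1 - h), so no assumption is made about what would have happened after the subject left the study.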

Application of the Bayesian regularization frame- 
work consists in maximizing the posterior probability 
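for the model parameters $\mathbf{w}$, given the data set $D$, the
regularization hyperparameters $\alpha$ and the choice of the
model structure, e.g., selected covariates $H$, namely

$$P(\mathbf{w} \mid D, \alpha, H) = \frac{P(D \mid \mathbf{w}, \alpha, H)\, P(\mathbf{w} \mid \alpha, H)}{P(D \mid \alpha, H)} . \tag{31.4}$$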


The first term on the right-hand side of Eq. (31.4)
denotes the probability of the model fitting the data,
represented by the exponential of the entropy term
discussed in the introduction, i.e., of the log-likelihood, with
the likelihood $L$ defined for longitudinal data by (31.3), hence

$$P(D \mid \mathbf{w}, \alpha, H) = e^{\log L} .$$


The second term in (31.4) represents a prior distribu- 
tion of the model parameters typically with a quadratic 
loss term corresponding to independent zero-mean uni- 
variate Gaussian distributions, sometimes called weight 
decay terms. A particularly efficient implementation of 
Bayesian regularization is to assign a separate weight 
decay term $\alpha_m$ to each covariate, indexing the covariates by
$m$, of which there are $N_\alpha$, with the $N_m$ hidden nodes
indexed by $n$. This allows each covariate to be separately
turned on or off depending on how informative it is
for fitting the observations about the outcome variable, 
a process known as automatic relevance determination 
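(ARD) [31.4]. Expressed in full, this gives

$$P(\mathbf{w} \mid \alpha, H) = \frac{e^{-G(\mathbf{w}, \alpha)}}{Z_w(\alpha)} , \quad \text{where} \quad G(\mathbf{w}, \alpha) = \frac{1}{2} \sum_{m=1}^{N_\alpha} \alpha_m \sum_{n=1}^{N_m} w_{mn}^2 , \quad Z_w(\alpha) = \prod_{m=1}^{N_\alpha} \left( \frac{2\pi}{\alpha_m} \right)^{N_m/2} .$$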
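
The ARD prior can be sketched as a function of the weights grouped by covariate. The nested-list layout `weights[m][n]` and the numerical values are assumptions made for illustration; the normalization is that of independent zero-mean Gaussians with precision `alphas[m]`.

```python
import math


def ard_log_prior(weights, alphas):
    """log P(w | alpha) for an ARD prior.

    weights[m] lists the weights attached to covariate m, all of which
    share the decay rate alphas[m].
    """
    g = 0.5 * sum(a * sum(w * w for w in ws)
                  for ws, a in zip(weights, alphas))
    log_z = sum(0.5 * len(ws) * math.log(2.0 * math.pi / a)
                for ws, a in zip(weights, alphas))
    return -g - log_z


# Two covariates, each connected to two hidden nodes (invented values).
weights = [[0.8, -1.1], [0.01, 0.02]]
log_p_loose = ard_log_prior(weights, [1.0, 1.0])
# A large decay rate on the second covariate suits its near-zero weights.
log_p_ard = ard_log_prior(weights, [1.0, 100.0])
```

In this example the second covariate has weights close to zero, so assigning it a large decay rate raises the prior probability; this is the mechanism by which an uninformative covariate is effectively switched off.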


In principle, the best values for the regularization
hyperparameters, i.e., the weight decay parameters $\alpha$, are
those which maximize their posterior probability

$$P(\alpha \mid D, H) = \frac{P(D \mid \alpha, H)\, P(\alpha \mid H)}{P(D \mid H)} .$$


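However, the denominator of (31.4) cannot be obtained
in closed form, so a Laplace approximation is typically
applied around a stationary point of the loss function as
a function of the weights. This amounts to a local Taylor
expansion of the exponent in

$$P(D \mid \alpha, H) = \int P(D \mid \mathbf{w}, \alpha, H)\, P(\mathbf{w} \mid \alpha, H)\, d\mathbf{w} = \int \frac{e^{-S(\mathbf{w}, \alpha)}}{Z_w(\alpha)}\, d\mathbf{w} ,$$

where $S(\mathbf{w}, \alpha)$ is the regularized loss, combining the negative
log-likelihood and the weight decay term $G(\mathbf{w}, \alpha)$, and
where the linear term in the weights vanishes because
of stationarity, leading to

$$S^*(\mathbf{w}, \alpha) \approx S(\mathbf{w}^{MP}, \alpha) + \frac{1}{2} (\mathbf{w} - \mathbf{w}^{MP})^T \mathbf{A}\, (\mathbf{w} - \mathbf{w}^{MP}) ,$$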


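from which the posterior probability for the
hyperparameters results

$$P(\alpha \mid D, H) \propto \frac{e^{-S(\mathbf{w}^{MP}, \alpha)}}{Z_w(\alpha)}\, (2\pi)^{N_w/2} \det(\mathbf{A})^{-1/2} ,$$

with $N_w$ the number of weights.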
In practice, what this means is that the log-odds ratio,
given by the activation $a$ of the output node of the MLP,
can be assumed to have a univariate normal distribution
whose variance $s^2$ is determined by the Hessian $\mathbf{A}$ of
$S$ with respect to the weights and by the gradient $\mathbf{g}$ of the
activation $a$ with respect to the weights, namely

$$P(a \mid X, D) = \frac{1}{(2\pi s^2)^{1/2}} \exp \left( -\frac{(a - a^{MP})^2}{2 s^2} \right) ,$$

with $a^{MP}$ denoting the most probable value of the
activation, i.e., the direct output of the MLP
without marginalization, and

$$s^2 = \mathbf{g}^T \mathbf{A}^{-1} \mathbf{g} .$$


The so-called marginalized estimate of the MLP output
is now the posterior distribution integrated over the
activation $a$. In the above expression, $\mathbf{g}$ is the gradient
of the activation with respect to the network weights
and $\mathbf{A}$ is the corresponding Hessian, i.e., the matrix of
second partial derivatives. For binary classification and
single-risks modeling, this is given by a neat analytical
expression

$$h(\mathbf{x}_i, t_l) = \int g(a)\, P(a \mid X, D)\, da \approx g \left( \frac{a^{MP}(\mathbf{x}_i)}{\sqrt{1 + (\pi/8)\, \mathbf{g}^T \mathbf{A}^{-1} \mathbf{g}}} \right) , \tag{31.5}$$

with $g(\cdot)$ denoting the sigmoid function. This adjustment
to the original MLP output, i.e., $a^{MP}$, shows the
regularization process in operation: stationary points
where the weights are well defined have small variance
$s^2$, and therefore their value remains almost unchanged.
Conversely, flat valleys in the loss function,
where stationary points for the weights have broad
Gaussian distributions, are penalized by reducing the
value of the argument of the sigmoid function in
(31.5) toward zero, reflecting an increase in uncertainty
by shifting the MLP output toward the don't know
threshold.
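
The moderating effect of the marginalization can be checked numerically. The sketch below compares the closed-form expression with a plain trapezoidal integration of the sigmoid against a Gaussian; the values of the most probable activation and of the variance are invented, and the integration limits and step count are arbitrary choices.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def moderated_output(a_mp, s2):
    """Closed-form marginalized output: sigmoid of the shrunken activation."""
    return sigmoid(a_mp / math.sqrt(1.0 + math.pi * s2 / 8.0))


def numerical_marginal(a_mp, s2, steps=4001, width=8.0):
    """Trapezoidal estimate of the integral of sigmoid(a) N(a | a_mp, s2)."""
    s = math.sqrt(s2)
    lo, hi = a_mp - width * s, a_mp + width * s
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for k in range(steps):
        a = lo + k * h
        density = (math.exp(-(a - a_mp) ** 2 / (2.0 * s2))
                   / math.sqrt(2.0 * math.pi * s2))
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * sigmoid(a) * density * h
    return total


a_mp, s2 = 2.0, 4.0                      # invented activation and variance
raw = sigmoid(a_mp)                      # unmarginalized MLP output
moderated = moderated_output(a_mp, s2)   # closed-form expression
reference = numerical_marginal(a_mp, s2)
```

With a broad posterior (large variance) the moderated output moves from the raw value toward 0.5, while for a variance close to zero it stays essentially at the raw value.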

A probabilistic alternative to discriminative ap- 
proaches consists of generative models, where Bayes’ 
theorem is once again put into practice to estimate 
the posterior probability of class membership $P(C_k|X)$
from the class conditional density functions $P(X|C_k)$
and prior probabilities for the classes $P(C_k)$, that
is

$$P(C_k \mid X) = \frac{P(X \mid C_k)\, P(C_k)}{\sum_j P(X \mid C_j)\, P(C_j)} , \tag{31.6}$$


where classes are indexed by k and the sum-rule has 
been used to expand the denominator. Suitable mod- 
els for the probability density functions (pdf) of the 
data given each class will depend on the nature of the 
data. However, it is straightforward to show for two 
classes that if the pdfs are normal distributions with 
equal variance, then the posterior probability will have 
exactly the functional form of the logistic regression 
model. This can be taken as an explanation in proba- 
bilistic terms of the potential limitations of this linear 
model, since different classes in practice tend to have 
distinct variances, even when the data sets for each
class are approximately normally distributed. A natural
extension of this approach is to use a mixture of Gaus- 
sian distributions. This is a very flexible model that can 
parameterize also multimodal density functions. In the 
interest of space, we refer the interested reader to a stan- 
dard textbook [31.6]. 
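
The equivalence between equal-variance Gaussian class densities and logistic regression can be verified with a short one-dimensional sketch. The means, variance and prior below are invented; the slope and offset of the logistic form follow from expanding the log-odds of the two Gaussians.

```python
import math


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_bayes(x, mu1, mu2, var, prior1):
    """P(C_1 | x) from Bayes' theorem with Gaussian class densities."""
    num = gaussian_pdf(x, mu1, var) * prior1
    den = num + gaussian_pdf(x, mu2, var) * (1.0 - prior1)
    return num / den


def posterior_logistic(x, mu1, mu2, var, prior1):
    """The same posterior written as a sigmoid of a linear index w*x + w0."""
    w = (mu1 - mu2) / var
    w0 = (mu2 ** 2 - mu1 ** 2) / (2.0 * var) + math.log(prior1 / (1.0 - prior1))
    return 1.0 / (1.0 + math.exp(-(w * x + w0)))


mu1, mu2, var, prior1 = 1.0, -0.5, 0.8, 0.3   # invented parameters
grid = [-3.0, -1.0, 0.0, 0.4, 2.0, 5.0]
gaps = [abs(posterior_bayes(x, mu1, mu2, var, prior1)
            - posterior_logistic(x, mu1, mu2, var, prior1)) for x in grid]
```

If the two variances differ, the quadratic terms in x no longer cancel in the log-odds and the decision function ceases to be linear, which is the limitation referred to above.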

The two approaches of discriminative and generative
models may be combined by using generative
models to build kernels. These kernels define similarity
between two covariate vectors $x$ and $x'$ by correlation
between the respective pdfs, with the values of the kernel
function given by $k(x, x') = P(X = x) \cdot P(X' = x')$
for suitable choices of the probability functions. A kernel
so designed will naturally form a Gram matrix.
Such kernels lead naturally to the use of latent variables

$$k(x, x') = \sum_i P(X = x \mid Z = i)\, P(X' = x' \mid Z = i)\, P(Z = i) ,$$

with weighting coefficients $P(Z)$ reflecting the strength
of the latent variable $Z$ indexed by $i$. An example of this
approach in practice will be seen in the HMMs later in
this chapter (see Sect. 31.4.2).


31.2 Graphical Models


In this section, we give a basic introduction to graphical
models, a general framework for dealing with uncertainty
in a computationally efficient way. Probabilistic
models that we treat in the next sections belong to this
framework. Here, we introduce the two main classes
of graphical models, Bayesian and Markov networks,
discussing different methods for performing probabilistic
inference. Specific instances of learning within this
framework are given in the next two sections. For the
sake of simplicity, here we limit our presentation to
discrete random variables; however, graphical models
can be defined on continuous variables or mixed variables.
The material covered in this section is based
on [31.6, 46, 47].

A graphical model allows us to represent a family
of joint probability distributions in terms of a directed
or undirected graph, where nodes are associated
with random variables, and edges represent some
form of direct probabilistic interaction between variables.
Being able to compactly represent the joint
probability distribution of a set of random variables
$\mathbf{X} = \{X_1, \ldots, X_n\}$ is very important: any probabilistic
query involving the variables $X_1, \ldots, X_n$ can be answered
by knowing their joint probability distribution
$P(X_1, \ldots, X_n)$. For example, assume variables to be
discrete, and suppose we want to know the posterior
probability of $X_1$ and $X_2$ given all the other variables,
i.e., $P(X_1, X_2 \mid X_3, \ldots, X_n)$. We can easily answer this
query by computing

$$P(X_1, X_2 \mid X_3, \ldots, X_n) = \frac{P(X_1, \ldots, X_n)}{\sum_{x_1 \in \mathrm{dom}(X_1),\, x_2 \in \mathrm{dom}(X_2)} P(X_1 = x_1, X_2 = x_2, X_3, \ldots, X_n)} .$$


Unfortunately, storing the joint probability values associated
with all the different assignments $x_1, \ldots, x_n$ is not
feasible: if $d_i$ is the size of $\mathrm{dom}(X_i)$, the number of
different assignments is $\prod_{i=1}^{n} d_i$, i.e., an exponential number of
entries. This situation, however, constitutes the worst
case. In fact, in many application domains, indepen- 
dence properties allow us to factorize the joint distribu- 
tion into compact parts which can be stored efficiently. 
Graphical models provide the language to compactly 
represent these factors, enabling in many cases infer- 
ence and learning over a compact parameterization of 
the joint distribution as graphical manipulations. 
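
Both points, answering a query from the full joint table and the exponential growth of that table, can be sketched with three Boolean variables. The joint probabilities below are invented and the helper names are hypothetical.

```python
from itertools import product


def conditional_query(joint, evidence_x3):
    """P(X1, X2 | X3 = evidence_x3) from a full joint table keyed by
    (x1, x2, x3), obtained by normalising over the query variables."""
    norm = sum(p for (x1, x2, x3), p in joint.items() if x3 == evidence_x3)
    return {(x1, x2): p / norm
            for (x1, x2, x3), p in joint.items() if x3 == evidence_x3}


def table_size(domain_sizes):
    """Number of entries of a full joint table: the product of domain sizes."""
    size = 1
    for d in domain_sizes:
        size *= d
    return size


# Hypothetical joint over three Boolean variables (the values sum to 1).
values = [0.02, 0.08, 0.10, 0.10, 0.05, 0.15, 0.20, 0.30]
joint = dict(zip(product([False, True], repeat=3), values))

posterior = conditional_query(joint, evidence_x3=True)
entries_for_30_booleans = table_size([2] * 30)
```

Thirty Boolean variables already require more than a billion table entries, which motivates the factorized representations introduced next.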

Graphical models can be characterized accord- 
ing to the type of probabilistic interaction between 
variables they model. Directed graphs (Bayesian net- 
works) are used to express causal relationships between 
random variables (i.e., cause-effect relationships),
while undirected graphs (Markov networks) are better
suited to express probabilistic constraints among subsets
of variables to which it is difficult to ascribe a direction- 
ality (graphical models containing both directed and 
undirected edges are possible; however, they will not 
be covered here). In both cases, the joint distribution is 
factorized according to the notion of conditional inde- 
pendence. 


Definition 31.1 Conditional Independence 

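Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ be sets of random variables with $X_i \in \mathbf{X}$,
$Y_i \in \mathbf{Y}$, $Z_i \in \mathbf{Z}$. $\mathbf{X}$ is conditionally independent of $\mathbf{Y}$
given $\mathbf{Z}$ (denoted as $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$) in a distribution $P$ if, for
all values $x_i \in \mathrm{dom}(X_i)$, $y_i \in \mathrm{dom}(Y_i)$, $z_i \in \mathrm{dom}(Z_i)$,

$$P(\mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z}) = P(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z})\, P(\mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z}) ,$$

where $\mathbf{X} = \mathbf{x}$ denotes $X_1 = x_1, \ldots, X_{n_X} = x_{n_X}$, $\mathbf{Y} = \mathbf{y}$
denotes $Y_1 = y_1, \ldots, Y_{n_Y} = y_{n_Y}$, $\mathbf{Z} = \mathbf{z}$ denotes
$Z_1 = z_1, \ldots, Z_{n_Z} = z_{n_Z}$, and $n_X = |\mathbf{X}|$, $n_Y = |\mathbf{Y}|$, $n_Z = |\mathbf{Z}|$.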


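It is not difficult to see that if $\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z}$, then it is
also true that $P(\mathbf{X} \mid \mathbf{Y}, \mathbf{Z}) = P(\mathbf{X} \mid \mathbf{Z})$. In fact, using the
product rule for probabilities, we have $P(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z}) =
P(\mathbf{X} \mid \mathbf{Y}, \mathbf{Z})\, P(\mathbf{Y} \mid \mathbf{Z})$, and comparing this with the
definition gives the result wherever $P(\mathbf{Y} \mid \mathbf{Z}) > 0$.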
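
Definition 31.1 can be tested mechanically on a small joint table. The sketch below assumes single variables X, Y, Z rather than sets, a table keyed by value triples, and a numerical tolerance `tol`; the two example distributions are invented.

```python
from itertools import product


def is_conditionally_independent(joint, tol=1e-12):
    """Return True if X is independent of Y given Z in the table
    joint[(x, y, z)] = P(X = x, Y = y, Z = z)."""
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})
    for z in zs:
        p_z = sum(joint.get((x, y, z), 0.0) for x, y in product(xs, ys))
        if p_z == 0.0:
            continue  # conditioning on an impossible value is undefined
        for x, y in product(xs, ys):
            p_xy = joint.get((x, y, z), 0.0) / p_z
            p_x = sum(joint.get((x, yy, z), 0.0) for yy in ys) / p_z
            p_y = sum(joint.get((xx, y, z), 0.0) for xx in xs) / p_z
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True


# A joint built as P(Z) P(X|Z) P(Y|Z) satisfies the definition by design.
p_z = {0: 0.4, 1: 0.6}
p_x1 = {0: 0.2, 1: 0.7}   # P(X = 1 | Z = z)
p_y1 = {0: 0.5, 1: 0.1}   # P(Y = 1 | Z = z)
factorized = {}
for x, y, z in product([0, 1], repeat=3):
    px = p_x1[z] if x else 1.0 - p_x1[z]
    py = p_y1[z] if y else 1.0 - p_y1[z]
    factorized[(x, y, z)] = p_z[z] * px * py

# Here X always equals Y, whatever Z is, so the definition fails.
coupled = {(0, 0, 0): 0.25, (1, 1, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 1): 0.25}
```

The check enumerates every value combination, so it is practical only for small tables; graphical models avoid this by reading independencies off the graph structure.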

In the following, we will discuss how conditional 
independence is used within Bayesian and Markov net- 
works to factorize the joint distribution. Inference and 
learning will be discussed as well. 


31.2.1 Bayesian Networks 


Bayesian networks are directed acyclic graphs used to 
model causal relationships between random variables: 
an edge $X_1 \to X_2$ is used to express the fact that vari-
able $X_1$ (cause) influences variable $X_2$ (effect). The
combination of this interpretation in conjunction with 
the exploitation of conditional independence, where 
applicable, allows the efficient probabilistic modeling 
of many relevant application domains. In general, the 
product rule can be used to factorize the joint distribution
of variables $X_1, X_2, X_3, \ldots, X_n$ as

$$P(X_1, X_2, X_3, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_{i-1}, \ldots, X_1) . \tag{31.7}$$


The conditional independence relationships can be used 
to simplify the form of each factor in (31.7), i.e., 
by eliminating variables from the conditioning part, 
thus drastically reducing the number of probability 
values that need to be specified to define the factor. 
For example, if we assume that all the variables are 
Boolean, then the number of entries needed to define
$P(X_n \mid X_1, X_2, \ldots, X_{n-1})$ would be $2^{n-1}$. If we consider
a simple scenario in which the variable $X_n$ is dependent
only on $X_{n-1}$, the corresponding simplified factor
becomes $P(X_n \mid X_1, X_2, \ldots, X_{n-1}) = P(X_n \mid X_{n-1})$, which
only requires two entries.

The naive Bayes model used in classification tasks 
can be understood as a Bayesian network, where the 
variable associated with the class label C is the cause 
and the variables $X_1, \ldots, X_n$ used to describe the
attributes of the current input are the effects. The
underlying conditional independence assumption is fairly
simplistic, but allows a very parsimonious factorization
of the joint distribution. By assuming that the class
label does not depend on the attributes, and that the
attributes are conditionally independent with respect to
each other given the class label, i.e., $\forall i \neq j$:
$P(X_i, X_j \mid C) = P(X_i \mid C)\, P(X_j \mid C)$, naive Bayes
factorizes the joint distribution as

$$P(C, X_1, X_2, X_3, \ldots, X_n) = P(C) \prod_{i=1}^{n} P(X_i \mid C) .$$


The details of this model are not discussed in this chap- 
ter, but a good didactic reference is [31.6]. 
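
As a brief illustration of this factorization, the following sketch computes the class posterior for Boolean attributes. The class names, priors and conditional probabilities are invented, and the parameter counts assume a Boolean class label.

```python
def naive_bayes_posterior(prior, likelihoods, observation):
    """P(C | x_1..x_n) proportional to P(C) * prod_i P(x_i | C).

    likelihoods[c][i] is P(X_i = True | C = c) for Boolean attributes.
    """
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for p_true, x in zip(likelihoods[c], observation):
            score *= p_true if x else 1.0 - p_true
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def full_joint_parameters(n):
    """Free parameters of an unrestricted joint over C and n Boolean attributes."""
    return 2 ** (n + 1) - 1


def naive_bayes_parameters(n):
    """One parameter for P(C) plus P(X_i = True | C) for both classes."""
    return 1 + 2 * n


prior = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.3], "ham": [0.1, 0.4]}
posterior = naive_bayes_posterior(prior, likelihoods, (True, False))
```

For ten Boolean attributes the unrestricted joint needs 2047 free parameters, whereas the naive Bayes factorization needs 21.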

In general, after simplification via conditional
independence, factors are in the form $P(X_i \mid X_{j_1}, \ldots, X_{j_k})$,
where $X_{j_1}, \ldots, X_{j_k}$ are denoted as parents of $X_i$, and
the notation $\mathrm{pa}(X_i)$ is used with the following meaning:
$\mathrm{pa}(X_i) = \{X_{j_1}, \ldots, X_{j_k}\}$. The factor associated with
variable $X_i$ can thus be rewritten as $P(X_i \mid \mathrm{pa}(X_i))$ and the
joint distribution as

$$P(X_1, X_2, X_3, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i)) . \tag{31.8}$$
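
The factorization over parents can be sketched with a small hypothetical network. The three-node chain A → B → C below, its variable names and its table entries are invented for illustration and are not the network of Fig. 31.1.

```python
from itertools import product

# Hypothetical three-node chain A -> B -> C with Boolean variables.
parents = {"A": (), "B": ("A",), "C": ("B",)}

# cpd[node][parent values] = P(node = True | parents)
cpd = {
    "A": {(): 0.3},
    "B": {(False,): 0.2, (True,): 0.9},
    "C": {(False,): 0.5, (True,): 0.1},
}


def joint_probability(assignment):
    """P(assignment) as the product of one factor per node given its parents."""
    prob = 1.0
    for node, pa in parents.items():
        p_true = cpd[node][tuple(assignment[p] for p in pa)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


names = list(parents)
total = sum(joint_probability(dict(zip(names, vals)))
            for vals in product([False, True], repeat=len(names)))
free_parameters = sum(len(table) for table in cpd.values())
p_example = joint_probability({"A": True, "B": True, "C": False})
```

The chain needs five table entries instead of the seven free parameters of an unrestricted joint over three Boolean variables, and the saving grows quickly with the number of variables when each node has few parents.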


Fig. 31.1 An example of a Bayesian network. Conditional probability
tables are shown only for variables $X_5$ and $X_7$. Different types of
probabilistic influence among variables are highlighted (direct causal
effect, indirect causal effect, common cause, common effect)

The graphical representation of a Bayesian network is
shown in Fig. 31.1. The graphical model includes one
node for each involved variable. Moreover, a variable
that is conditioned (effect) with respect to a parent
one (cause) receives a directed edge from that variable.
For example, in the Bayesian network represented in
Fig. 31.1, we have $\mathrm{pa}(X_7) = \{X_2, X_3\}$, i.e., the set
constituted by the two nodes from which $X_7$ receives an
edge. This means that the factor associated with $X_7$ is
$P(X_7 \mid X_2, X_3)$. In Fig. 31.1, we have reported one popular
way to specify the parameters of $P(X_7 \mid X_2, X_3)$ when
the involved variables are discrete, i.e., the conditional
probability distribution table (CPD table). The CPD of
$X_7$ in Fig. 31.1, for instance, reports the probability of
$X_7 = t$, given each possible assignment of values to its
parents. The CPD table associated with $X_5$ is reported
as well. By using the CPD tables associated with all nodes,
the joint distribution can be rewritten as

P(X, ...,X7) = P(X1)P(X3) P(X2|X1) P(X7|X2, X3) 


x P(Xs |X7)P(X6 |X7)P(X4 |X) i% 


Note that different distributions can be obtained by 
using different values for the entries of the CPD ta- 
bles. Thus, a Bayesian network is actually representing 
a family of distributions: all the distributions that are 
consistent with the conditional independence assump- 
tions used to simplify the factors. In fact, up to now, we 
have discussed how starting from a universal decompo- 
sition of the joint distribution via the product rule (note 
that such decomposition is not unique as it depends on 
the presentation order assigned to the variables), a set 
of conditional independence assumptions can be used 
to simplify the factors, leading to the corresponding 
graphical representation given by the Bayesian net- 
work. An important question, however, is whether the 
topological structure of a Bayesian network allows for 
the direct identification of other (conditional) inde- 
pendence relationships, i.e., whether there exist other 
(conditional) independence relationships that must hold 
for any joint distribution P that is compatible with the 
structure of a specific Bayesian network (note that additional
relationships may hold only for some specific
distributions, i.e., some specific assignment of values to
the entries of the CPD tables). As we will see later, the
answer to this question is important for devising general-purpose
inference algorithms on Bayesian networks.
A general procedure, called d-separation (directed sep- 
aration), can answer the question. It is based on the 
observation that two variables are not independent if 
one can influence the other via one or more paths in the 
graph. Let us exemplify this concept on the Bayesian 
network reported in Fig. 31.1, where we have high- 
lighted four different basic cases: 


1. Indirect causal effect: X1 can influence X7 via X2 if
and only if X2 is not observed (a variable is said to
be observed if the value assigned to that variable is
known).

2. Indirect evidential effect: X4 can influence X7 via X6
if and only if X6 is not observed.

3. Common cause: X5 can influence X6 (and vice versa)
via X7 if and only if X7 is not observed.

4. Common effect: X2 can influence X3 (and vice versa)
via X7 if and only if either X7 or one of X7's descendants
(in this case, X5, X6, X4) is observed.


The topological structure encountered in the com- 
mon effect is called v-structure and it plays a relevant 
role in the d-separation procedure. In general, it is clear 
from above that probabilistic influence does not fol- 
low edge direction. Thus, when considering a longer 
trail, e.g., the path from X1 to X4, we have to consider
whether each part of the trail allows probabilistic influ- 
ence to flow or not (according to the four basic cases 
described above). 


Definition 31.2 Active Trail

Let $X_1, \ldots, X_n$ be a trail in a Bayesian network G,
and $\mathcal{E}$ be a subset of observed variables in G. The trail
$X_1, \ldots, X_n$ is active given $\mathcal{E}$ if:


• Whenever a v-structure $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$ occurs,
$X_i$ or one of its descendants belongs to $\mathcal{E}$;
• No other node along the trail belongs to $\mathcal{E}$.


Of course, by definition, if $X_1 \in \mathcal{E}$ or $X_n \in \mathcal{E}$ the trail is
not active. Examples of active/not active trails from the
Bayesian network represented in Fig. 31.1 are: the trail
$X_1, X_2, X_7, X_6, X_4$ is active given the set $\mathcal{E} = \{X_3, X_5\}$,
while it is not active whenever either $X_2$ or $X_7$ or $X_6$ belongs
to $\mathcal{E}$; on the other hand, the trail $X_1, X_2, X_7, X_3$ is
active if $X_2 \notin \mathcal{E}$ and either $X_7$ or $X_5$ or $X_6$ or $X_4$ belongs
to $\mathcal{E}$.

The Bayesian network represented in Fig. 31.1
does not allow more than one trail between any pair
of nodes. In general, however, two nodes may
have several trails connecting them and one node
can influence the other one as long as there exists
at least one active trail between them. This intuition
is captured by the definition of the concept of d-separation.


Definition 31.3 d-Separation 

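Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ be nonintersecting sets of nodes of
a Bayesian network. $\mathbf{X}$ and $\mathbf{Y}$ are d-separated given $\mathbf{Z}$
if there is no active trail between any node $X \in \mathbf{X}$ and
$Y \in \mathbf{Y}$ given $\mathbf{Z}$.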


The d-separation test can be used to precisely char- 
acterize the independence relationships which hold for 
probabilistic distributions that factorize according to the 
given Bayesian network. 
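Definitions 31.2 and 31.3 can be turned into a small executable check. The following is only a sketch under our own assumptions: the edge list is read off the factorization of the network in Fig. 31.1, the function names are hypothetical, and trails are enumerated by brute force, which is adequate only for small graphs.

```python
# Directed edges (parent, child) of the Bayesian network in Fig. 31.1.
EDGES = [("X1", "X2"), ("X2", "X7"), ("X3", "X7"),
         ("X7", "X5"), ("X7", "X6"), ("X6", "X4")]
NODES = sorted({n for e in EDGES for n in e})
CHILDREN = {n: {c for p, c in EDGES if p == n} for n in NODES}
PARENTS = {n: {p for p, c in EDGES if c == n} for n in NODES}


def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_active(trail, evidence):
    """Definition 31.2: is the given trail active given the evidence set?"""
    evidence = set(evidence)
    if trail[0] in evidence or trail[-1] in evidence:
        return False
    for prev, node, nxt in zip(trail, trail[1:], trail[2:]):
        if prev in PARENTS[node] and nxt in PARENTS[node]:  # v-structure
            if not ({node} | descendants(node)) & evidence:
                return False
        elif node in evidence:
            return False
    return True


def trails(start, end, path=None):
    """Enumerate simple trails between two nodes, ignoring edge direction."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for neighbor in CHILDREN[start] | PARENTS[start]:
        if neighbor not in path:
            yield from trails(neighbor, end, path)


def d_separated(xs, ys, zs):
    """Definition 31.3: no active trail between any x in xs and y in ys."""
    return not any(is_active(t, zs)
                   for x in xs for y in ys for t in trails(x, y))
```

For instance, `is_active(["X1", "X2", "X7", "X6", "X4"], {"X3", "X5"})` reproduces the first example given after Definition 31.2, and `d_separated({"X1"}, {"X3"}, set())` reflects the unobserved common effect X7.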




In the following, we introduce another class of 
graphical models, i. e., Markov networks, which are de- 
scribed by undirected graphs. 


31.2.2 Markov Networks 


Directed edges in Bayesian networks are suited to de- 
scribe causal relationships between random variables. 
In many cases, however, the probabilistic interaction 
between two variables is not directional. In these cases, 
it is natural to consider undirected graphs, i. e., Markov 
networks. An undirected edge between variables X 
and Y represents a probabilistic constraint between the 
two variables. On the other hand, if X and Y are not 
connected, then we can state a conditional indepen- 
dence assertion involving them if and only if there are 
no active trails connecting them in the graph. Note 
that, since edges are now undirected, a trail is not ac- 
tive if and only if any of the variables in the trail 
is observed. This leads us to discuss which kind of 
joint distribution factorization a Markov network does 
represent. 

If we go back to the concept of active trail, it is clear 
that if we consider a subset S of fully connected nodes 
in the undirected graph, i.e., nodes in S are connected 
to each other, then any $X, Y \in S$ will be connected by so
many trails involving nodes in $S \setminus \{X, Y\}$ that it is wise
to consider a single factor $\phi_S$ involving all nodes in S.
Technically, S is called a clique, and we are actually in- 
terested in maximal cliques, i.e., cliques which cannot 
be extended in size by considering another node of the 
graph. For example, the maximal cliques of the Markov 
network given in Fig. 31.2 are 


$$c_1 = \{X_1, X_3, X_5\}, \quad c_2 = \{X_1, X_2\}, \quad c_3 = \{X_2, X_4\}, \quad c_4 = \{X_3, X_4\}.$$


Note that, while {X1, X5} is a clique, it is not maximal
since we can add X3 obtaining a larger clique.

A different factor can be associated with each maximal
clique $c_i$. By using a global normalization constant
for the joint distribution factorization, a factor associated
with a clique $c_i$ can be modeled by a potential function
$\phi_{c_i}(\cdot)$, i.e., any nonnegative function (see Fig. 31.2
for an example involving Boolean variables). Thus, the factorization
of the joint distribution for the example in Fig. 31.2
is

$$P(X_1, X_2, X_3, X_4, X_5) = \frac{1}{Z}\, \phi_{c_1}(X_1, X_3, X_5)\, \phi_{c_2}(X_1, X_2)\, \phi_{c_3}(X_2, X_4)\, \phi_{c_4}(X_3, X_4),$$

where the normalization constant

$$Z = \sum_{X_1, \ldots, X_5} \phi_{c_1}(X_1, X_3, X_5)\, \phi_{c_2}(X_1, X_2)\, \phi_{c_3}(X_2, X_4)\, \phi_{c_4}(X_3, X_4)$$

is called the partition function. If with $x$ we denote
an assignment of values to the variables $X_1, \ldots, X_n$
and with $x_{c_i}$ the corresponding assignments associated
with variables in the clique $c_i$, the general formulas for
a Markov network are

$$P(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{\forall i,\, c_i} \phi_{c_i}(x_{c_i}),$$

where

$$Z = \sum_{x} \prod_{\forall i,\, c_i} \phi_{c_i}(x_{c_i}).$$

If the potential functions are restricted to be strictly pos- 
itive, then it is possible to find a precise correspondence 
between factorization and conditional independence. 
In fact, consider the set of all possible distributions
defined over the variables of a given Markov network.
The subset of these distributions that are consistent with
the conditional independence statements derived from the
adapted concept of active trails and d-separation coincides
with the subset of distributions that can be expressed as
a factorization of the form given above with respect to the
maximal cliques of the network (Hammersley-Clifford theorem).
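The link between factorization and conditional independence can be checked numerically on the network of Fig. 31.2. The sketch below is illustrative only: the potential values are hypothetical (the source gives none), the helper names are ours, and the joint distribution is built by exhaustive enumeration, which is feasible only for a handful of Boolean variables.

```python
from itertools import product

# Maximal cliques of the Markov network in Fig. 31.2.
CLIQUES = [("X1", "X3", "X5"), ("X1", "X2"), ("X2", "X4"), ("X3", "X4")]
VARS = ["X1", "X2", "X3", "X4", "X5"]


def potential(clique, values):
    """Hypothetical strictly positive potential (illustrative numbers only)."""
    return 1.0 + len(clique) * sum(values) + 0.5 * values[0] * values[-1]


def unnormalized(x):
    """Product of the clique potentials for a full assignment x (var -> 0/1)."""
    result = 1.0
    for clique in CLIQUES:
        result *= potential(clique, [x[v] for v in clique])
    return result


ASSIGNMENTS = [dict(zip(VARS, vals)) for vals in product([0, 1], repeat=5)]
Z = sum(unnormalized(x) for x in ASSIGNMENTS)  # partition function
JOINT = {tuple(x.values()): unnormalized(x) / Z for x in ASSIGNMENTS}


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(X1=1, X5=0)."""
    return sum(p for vals, p in JOINT.items()
               if all(vals[VARS.index(v)] == val for v, val in fixed.items()))
```

Since X5 is connected only to X1 and X3, every trail from X5 to X2 is blocked once X1 and X3 are observed. Accordingly, `prob(X1=a, X2=b, X3=c, X5=d) * prob(X1=a, X3=c)` equals `prob(X1=a, X3=c, X5=d) * prob(X1=a, X2=b, X3=c)` for all Boolean values, up to rounding.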


Fig. 31.2 An example of Markov network involving five variables.
Maximal cliques and corresponding potential functions are
highlighted. An example of potential function is given for clique
{X2, X4}, where we have assumed that X2 and X4 are Boolean variables

For practical reasons, it is convenient to express
a strictly positive potential function as a Boltzmann distribution,
i.e.,

$$\phi_{c_i}(x_{c_i}) = e^{-E(x_{c_i})},$$

where $E(x_{c_i})$ is called an energy function. Since the
joint distribution is the product of potentials, the total 
energy is obtained by adding the energy functions of 
each of the maximal cliques. Energy functions are very 
useful since, in the absence of a specific probabilistic in- 
terpretation for the potential functions, assignments of 
values that have high probability can be given low en- 
ergies, while less probable assignments will correspond 
to high energies. 

Let us give an example of application of Markov
networks: image de-noising. The task is to remove noise
from a binary image Y where the pixels $Y_i$ are $-1$ or
$+1$. Each observed pixel $Y_i$ is obtained from a noise-free
image X with pixels $X_i$ where, with some small probability,
the sign of the pixel is flipped. Since neighboring
pixels in the noise-free image are strongly correlated,
as are the two variables $Y_i$ and $X_i$, due to the small
flipping probability, we can use a Markov network like
the one depicted in Fig. 31.3 to capture this knowledge.
The total energy function encoding such prior knowledge
would be

$$E(X, Y) = -\beta \sum_{X_i, X_j \in X} X_i X_j - \eta \sum_{X_i \in X,\, Y_i \in Y} X_i Y_i,$$

where all the maximal cliques are considered and pairs
of pixels with the same sign get lower energy
values. Since we are interested in removing noise from
the observed pixels $Y_i$, we add a bias toward pixel values
that have one particular sign, by adding a term
$h X_i$ to the energy function for each pixel in the noise-free
image

$$E(X, Y) = h \sum_{X_i \in X} X_i - \beta \sum_{X_i, X_j \in X} X_i X_j - \eta \sum_{X_i \in X,\, Y_i \in Y} X_i Y_i.$$

Note that this operation is legitimate since it corresponds to
multiplying the potential functions, which are arbitrary
nonnegative functions, by a nonnegative function.
The factorized joint distribution over Y and X is
then defined as

$$P(X, Y) = \frac{1}{Z}\, e^{-E(X, Y)}.$$

Fig. 31.3 A Markov network for image de-noising. $Y_i$ is
the binary variable representing the state of pixel i in the
noisy observed image, while $X_i$ refers to the noise-free image


Probabilistic inference can now be performed by clamping
the value of Y to the observed image, which implicitly
corresponds to a conditional distribution P(X|Y)
over noise-free images, and by computing the assignment
to X that minimizes the total energy of the Markov
model, i.e., the assignment of values to pixels of X
with highest probability given the observed image Y.
The resulting assignment of values to X will return the
(presumed) noise-free version of Y.
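The text does not prescribe a specific minimization procedure. As one simple possibility, the sketch below uses iterated conditional modes (ICM), a greedy coordinate-wise scheme that only reaches a local minimum of the energy. It assumes a 4-neighborhood on the pixel grid, and the default values of h, beta and eta are illustrative choices of ours, as are all function names.

```python
def local_energy(x, y, i, j, value, h, beta, eta):
    """Energy terms that involve pixel (i, j) when it takes `value`."""
    rows, cols = len(x), len(x[0])
    neighbors = sum(x[r][c]
                    for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= r < rows and 0 <= c < cols)
    return h * value - beta * value * neighbors - eta * value * y[i][j]


def total_energy(x, y, h, beta, eta):
    """E(X, Y) with neighboring pixel pairs as the X-X cliques."""
    rows, cols = len(x), len(x[0])
    energy = 0.0
    for i in range(rows):
        for j in range(cols):
            energy += h * x[i][j] - eta * x[i][j] * y[i][j]
            if i + 1 < rows:
                energy -= beta * x[i][j] * x[i + 1][j]
            if j + 1 < cols:
                energy -= beta * x[i][j] * x[i][j + 1]
    return energy


def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
    """Greedy coordinate-wise energy minimization, starting from X = Y."""
    x = [row[:] for row in y]
    for _ in range(sweeps):
        changed = False
        for i in range(len(x)):
            for j in range(len(x[0])):
                # Pick the sign of pixel (i, j) with the lower local energy.
                best = min((-1, 1),
                           key=lambda v: local_energy(x, y, i, j, v,
                                                      h, beta, eta))
                if best != x[i][j]:
                    x[i][j], changed = best, True
        if not changed:
            break
    return x
```

Each update can only lower (or keep) the total energy, so the procedure terminates, but the result depends on the starting point and on the visiting order; global minimizers require more sophisticated methods.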

In the following, we briefly present different ap- 
proaches to perform probabilistic inference in Bayesian 
and Markov networks. 


31.2.3 Inference 


Performing probabilistic inference in a graphical model 
over a set of random variables X means being able 
to answer any probabilistic query involving X. Since 
a graphical model, either a Bayesian or a Markov net- 
work, describes a factorization of the joint distribution, 
any probabilistic query can be answered, so the problem
reduces to finding efficient procedures to perform inference.
In the following, we report some of the most
typical forms of queries:


• Conditional: In this case, we are interested in computing
$P(Y \mid \mathcal{E} = e)$, where $Y, \mathcal{E} \subseteq X$, with $Y \cap \mathcal{E} = \emptyset$,
where Y are the query variables and $\mathcal{E} = \{E_1, \ldots, E_k\}$
are the evidence variables for which
specific values $e = \{e_1, \ldots, e_k\}$ have been observed.
• Most probable assignment: Given evidence $\mathcal{E} = e$,
we are interested in computing the most likely assignment
$y^*$ to $Y \subseteq X \setminus \mathcal{E}$. There are two main
variants for this kind of query: most probable explanation
(MPE) and maximum a posteriori (MAP).
An MPE query must solve the problem

$$y^* = \arg\max_{y} P(Y = y, \mathcal{E} = e),$$

where $Y = X \setminus \mathcal{E}$, while a MAP query must solve
the problem

$$y^* = \arg\max_{y} \sum_{z} P(Y = y, Z = z \mid \mathcal{E} = e),$$

where $Z = X \setminus \mathcal{E} \setminus Y$.
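The difference between the two variants can be seen on a toy table. In the following sketch the numbers are hypothetical and the evidence is assumed to be already clamped, so the table directly lists P(Y = y, Z = z | E = e) for two Boolean variables Y and Z. The MPE query maximizes over all unobserved variables jointly, while the MAP query first sums out Z.

```python
from collections import defaultdict

# Hypothetical values of P(Y = y, Z = z | E = e), keyed by (y, z).
POSTERIOR = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}


def mpe(table):
    """Most probable joint assignment to all unobserved variables."""
    return max(table, key=table.get)


def map_query(table):
    """Most probable value of Y after summing out Z."""
    marginal = defaultdict(float)
    for (y, _z), p in table.items():
        marginal[y] += p
    return max(marginal, key=marginal.get)
```

With these numbers the MPE assignment sets Y = 0, whereas the MAP query returns Y = 1, so a MAP answer cannot in general be read off the MPE assignment.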


From the point of view of inference, both directed 
and undirected networks can be treated in the same way. 
In fact, directed networks can be converted to undi- 
rected networks. This is done by observing that factors 
in directed networks can be understood as factors cor- 
responding to cliques in an undirected graph obtained 
by mutually connecting all the parents of each node by 
new undirected edges and by dropping direction from 
the original directed edges. This procedure is known as 
moralization and the resulting undirected graph is the 
moral graph. By this means, all the variables involved 
in factors of the directed graph (e.g., CPD tables) will be
contained in corresponding cliques of the moral graph. 
Thus, we can focus on undirected graphs. 
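Moralization is easy to sketch in code. The example below applies it to the Bayesian network of Fig. 31.1; the edge list follows the factorization of that network, while the function name and the representation of undirected edges as frozensets are our own choices.

```python
from itertools import combinations

# Directed edges (parent, child) of the Bayesian network in Fig. 31.1.
EDGES = [("X1", "X2"), ("X2", "X7"), ("X3", "X7"),
         ("X7", "X5"), ("X7", "X6"), ("X6", "X4")]


def moralize(edges):
    """Return the undirected edges (as frozensets) of the moral graph."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    moral = {frozenset(edge) for edge in edges}  # drop edge directions
    for pars in parents.values():                # connect co-parents
        moral |= {frozenset(pair) for pair in combinations(sorted(pars), 2)}
    return moral
```

Here the only node with more than one parent is X7, so the moral graph contains the six original edges plus the new edge between X2 and X3, and the factor P(X7|X2, X3) fits inside the clique {X2, X3, X7}.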

From a computational point of view, in the worst 
case, probabilistic inference is difficult: every type of 
probabilistic inference in graphical models is NP-hard 
or harder. Specifically, the complexity of inference is re- 
lated to a topological property of the graphical network 
called treewidth. Approximate inference methods have 
been devised to deal with such computational complexity.
Unfortunately, approximate inference also turns out to
be hard in the worst case. Nevertheless, if the treewidth
of the graphical network is not too large (e.g., in poly- 
trees), exact inference can be performed in a reasonable 
amount of time. Moreover, in many practical cases, ap- 
proximate inference is efficient and adequate. 

There are three major approaches to perform in- 
ference: exact algorithms, sampling algorithms, and 
variational algorithms. The first approach tries to compute the
exact probabilities while avoiding repeated computa- 
tions. The second approach aims to efficiently approx- 
imate probabilities by sampling, in a smart way, the 
universe of events. Finally, the third approach allows 
us to treat both exact and approximate inference within 
the same conceptual framework. In the following, we 
briefly sketch the main ideas underpinning these ap- 
proaches. 


Exact Algorithms 

Let us illustrate one of the basic ideas of exact algo- 
rithms, i. e., variable elimination, by using the Markov 
network shown in Fig. 31.2, where we assume all variables
to be Boolean. Suppose we are interested in
computing the marginal probability $P(X_1)$. We can get
it by summing the factorized joint distribution over the
remaining variables

$$P(X_1) = \sum_{X_2, X_3, X_4, X_5} \frac{1}{Z}\, \phi(X_1, X_3, X_5)\, \phi(X_1, X_2)\, \phi(X_2, X_4)\, \phi(X_3, X_4).$$

Naïve computation of the above equation would require
$O(2^5)$ operations, since each summand involves five
Boolean variables. However, we can rearrange the summands
in a smarter way

$$\begin{aligned}
P(X_1) &= \frac{1}{Z} \sum_{X_2} \phi(X_1, X_2) \sum_{X_4} \phi(X_2, X_4) \sum_{X_3} \phi(X_3, X_4) \sum_{X_5} \phi(X_1, X_3, X_5) \\
&= \frac{1}{Z} \sum_{X_2} \phi(X_1, X_2) \sum_{X_4} \phi(X_2, X_4) \sum_{X_3} \phi(X_3, X_4)\, m_5(X_1, X_3) \\
&= \frac{1}{Z} \sum_{X_2} \phi(X_1, X_2) \sum_{X_4} \phi(X_2, X_4)\, m_3(X_1, X_4) \\
&= \frac{1}{Z} \sum_{X_2} \phi(X_1, X_2)\, m_4(X_1, X_2) \\
&= \frac{1}{Z}\, m_2(X_1),
\end{aligned}$$

where the $m_i$ terms are the intermediate factors obtained
by summation on variable $X_i$. Note that Z can be computed
by summing $m_2(X_1)$ over variable $X_1$. Moreover, the total
computational complexity reduces to $O(2^3)$ since no
more than three variables occur together in any summand.
In general, the maximal number of variables that
occur in any summand is determined by the elimina- 
tion order. Since many different elimination orders may 
be used, the lowest complexity is obtained by the order 
that minimizes this maximal number, which is related 
to the treewidth of the graph. Unfortunately, finding the 
optimal elimination order is NP-hard. 
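The elimination steps above can be reproduced with two generic operations on tabulated factors: pointwise product and summing out a variable. The following sketch is an assumption-laden illustration, not a reference implementation: the potential values are hypothetical, a factor is represented as a pair (scope, table), and the helper names are ours.

```python
from itertools import product


def make_factor(scope, fn):
    """Tabulate fn over all Boolean assignments to the variables in scope."""
    return scope, {vals: fn(*vals)
                   for vals in product((0, 1), repeat=len(scope))}


def multiply(f, g):
    """Pointwise product of two factors over the union of their scopes."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        x = dict(zip(scope, vals))
        table[vals] = ft[tuple(x[v] for v in fs)] * gt[tuple(x[v] for v in gs)]
    return scope, table


def sum_out(f, var):
    """Eliminate var from factor f by summation."""
    fs, ft = f
    keep = tuple(v for v in fs if v != var)
    table = {}
    for vals, p in ft.items():
        key = tuple(val for v, val in zip(fs, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table


# Hypothetical strictly positive potentials for the cliques of Fig. 31.2.
phi_135 = make_factor(("X1", "X3", "X5"), lambda a, b, c: 1.0 + a + 2 * b * c)
phi_12 = make_factor(("X1", "X2"), lambda a, b: 2.0 if a == b else 0.5)
phi_24 = make_factor(("X2", "X4"), lambda a, b: 1.0 + 3 * a * b)
phi_34 = make_factor(("X3", "X4"), lambda a, b: 1.5 + a - 0.5 * b)

# Elimination order X5, X3, X4, X2, as in the derivation of P(X1).
m5 = sum_out(phi_135, "X5")               # m5(X1, X3)
m3 = sum_out(multiply(phi_34, m5), "X3")  # m3(X1, X4)
m4 = sum_out(multiply(phi_24, m3), "X4")  # m4(X1, X2)
m2 = sum_out(multiply(phi_12, m4), "X2")  # m2(X1)
Z = sum(m2[1].values())                   # partition function
P_X1 = {vals[0]: p / Z for vals, p in m2[1].items()}
```

No intermediate table ever involves more than three variables, which is the source of the saving with respect to enumerating all 32 joint assignments.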

One positive aspect of the elimination approach
is that it also works for continuous variables since
it is only based on the topology of the graph. However,
the elimination procedure returns only a single
marginal probability, while it is often of interest to
compute more than one marginal probability. Luckily,
we can generalize the idea to efficiently compute all
the single marginals. Here we give some hints on
how to do it. Consider the sequence of intermediate
factors generated in the example above. They can be
indexed by the variables in their scope, i.e.,
$\psi_{1,3,5} = \phi(X_1, X_3, X_5)$, $\psi_{1,3,4} = \phi(X_3, X_4)\, m_5(X_1, X_3)$,
$\psi_{1,2,4} = \phi(X_2, X_4)\, m_3(X_1, X_4)$, $\psi_{1,2} = \phi(X_1, X_2)\, m_4(X_1, X_2)$.
Graphically, we can represent them via a cluster
graph, where each node is associated with a subset of
variables (i.e., the scope of intermediate factors) and
the undirected edges support the flow of computation
of the elimination process. In our example, the cluster
graph is shown in Fig. 31.4, where we have shown the
direction of the flow of computation under each edge,
and the scope of the computed factor transmitted to the
other node after variable elimination over each edge.
The variable X1 in the rightmost node is underlined to
remark that it is the target of the flow of computation.

Fig. 31.4 Example of cluster graph, where the direction of the flow of computation is shown under each edge, while the
scope of the computed factor transmitted to the other node after variable elimination is shown over each edge

In general, since each edge is associated with a variable
elimination, it is not difficult to realize that the cluster
graph is in fact a tree (called clique tree or junction
tree). This structure can also be used for computing
other marginals. In order to see that, we have to observe
that the scope of the rightmost node is a subset of the
scope of the node at its left, so it can be merged with
this last node; moreover, each initial potential must be
associated with a node with consistent scope, e.g.,

$$\underbrace{\{X_1, X_3, X_5\}}_{\phi(X_1, X_3, X_5)} \;-\; \underbrace{\{X_1, X_3, X_4\}}_{\phi(X_3, X_4)} \;-\; \underbrace{\{X_1, X_2, X_4\}}_{\phi(X_1, X_2)\,\phi(X_2, X_4)}$$


Now, suppose we want to compute $P(X_3)$ by eliminating
all the other variables. We have to select a node
which contains $X_3$ in its scope, e.g., the middle node.
The flow of computation should now converge toward
that node, as shown in

$$\{X_1, X_3, X_5\} \xrightarrow{\; m_5(X_1, X_3) \;} \{X_1, X_3, X_4\} \xleftarrow{\; m_2(X_1, X_4) \;} \{X_1, X_2, X_4\}$$


Any elimination order consistent with the above flow
will do the work, e.g., we first consider the leftmost
node and eliminate $X_5$ by transmitting the message

$$m_5(X_1, X_3) = \sum_{X_5} \phi(X_1, X_3, X_5)$$

to the middle node. Then, we do the same for the
rightmost node, by eliminating $X_2$ and transmitting the
message

$$m_2(X_1, X_4) = \sum_{X_2} \phi(X_1, X_2)\, \phi(X_2, X_4).$$

Finally, the middle node can merge the two received
messages with the local potential obtaining

$$\phi(X_3, X_4)\, m_5(X_1, X_3)\, m_2(X_1, X_4),$$

which is an unnormalized version of the joint distribution
$P(X_1, X_3, X_4)$. Marginal $P(X_3)$ can then be
computed by summing out $X_1$ and $X_4$ and normalizing
the result. Note that the same flow can be used to compute
$P(X_1)$ and $P(X_4)$: in the first case, the final stage
will sum out $X_3$ and $X_4$, while in the second case it will
sum out $X_1$ and $X_3$.
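The two messages and the final merge at the middle node can be written out directly. In the sketch below the potentials are hypothetical Boolean functions chosen by us for illustration, and only the inward pass toward the middle node is shown, not the full two-pass scheme.

```python
from itertools import product

B = (0, 1)  # Boolean domain


# Hypothetical strictly positive potentials for the cliques of Fig. 31.2.
def phi_135(x1, x3, x5):
    return 1.0 + x1 + 2 * x3 * x5


def phi_12(x1, x2):
    return 2.0 if x1 == x2 else 0.5


def phi_24(x2, x4):
    return 1.0 + 3 * x2 * x4


def phi_34(x3, x4):
    return 1.5 + x3 - 0.5 * x4


# Message from the leftmost node {X1, X3, X5}: eliminate X5.
m5 = {(x1, x3): sum(phi_135(x1, x3, x5) for x5 in B)
      for x1, x3 in product(B, B)}

# Message from the rightmost node {X1, X2, X4}: eliminate X2.
m2 = {(x1, x4): sum(phi_12(x1, x2) * phi_24(x2, x4) for x2 in B)
      for x1, x4 in product(B, B)}

# The middle node merges both messages with its local potential.
belief = {(x1, x3, x4): phi_34(x3, x4) * m5[x1, x3] * m2[x1, x4]
          for x1, x3, x4 in product(B, repeat=3)}
Z = sum(belief.values())


def marginal(position):
    """Normalize the belief and sum out all but one of (X1, X3, X4)."""
    return {v: sum(p for key, p in belief.items() if key[position] == v) / Z
            for v in B}


P_X1, P_X3, P_X4 = marginal(0), marginal(1), marginal(2)
```

The same `belief` table yields all three marginals, which is the reuse of computation that motivates message passing on the clique tree.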

In general, all the factors needed by all the nodes to 
compute the marginals of the variables in their scope, 
can be computed by a sum-product message passing 
scheme where, having selected an arbitrary node as 
root, messages are transmitted from the leaves up to the 
root and then back from the root to the leaves. If ev- 
idence is present, restricted potentials (i.e., potentials 
where evidence variables are bound to the observed values)
are used. MPE and MAP queries can be answered
by using a max-sum algorithm, which is a variation
of the sum-product algorithm exploiting a trellis over
all the values the variables can take. The message
passing scheme sketched above can also be implemented
using division, giving rise to the Belief Update
algorithm.


Sampling Algorithms 
The strategy adopted by sampling algorithms to per- 
form (approximate) inference is to approximate the 
joint distribution via estimates computed on a set of 


Probabilistic Modeling in Machine Learning | 31.2 Graphical Models 


representative instantiations of all, or some of, the vari- 
ables of the graphical model. Unlike exact inference, 
some techniques are specialized for directed networks. 
For example, a simple approach to estimate the joint 
probability in a Bayesian network is Forward Sam- 
pling. It starts by considering any topological ordering 
of the variables, e.g., for the network in Fig. 31.1 the 
order X1, X3, X2, X7, X5, X6, X4 will do the job. Then
random samples are generated by following the order 
and by picking a value for each variable according to 
its distribution. Note that variables with conditional dis- 
tributions will be considered only when specific values 
for their parents have already been generated, so that the 
conditional probability for those variables is fully spec- 
ified. Once M full samples are generated in this way, the 
probability of a specific event $P(\mathcal{E} = e)$ is estimated as
the fraction of samples where variables in $\mathcal{E}$ take values
$e$. If the query is of the form $P(Y \mid \mathcal{E} = e)$, samples
which are not consistent with the evidence are rejected
(rejection sampling) and the remaining samples are used
to estimate the conditional distribution on variables Y.
With this approach, however, a large number of generated
samples may be discarded.
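Forward sampling and rejection sampling can be sketched in a few lines for Boolean variables. The network structure below is that of Fig. 31.1, listed in the topological order mentioned above; all CPD entries are hypothetical values chosen for illustration, and the function names are ours.

```python
import random

# (variable, parents, P(variable = True | parent values)), topologically
# ordered. The probabilities are illustrative assumptions.
NETWORK = [
    ("X1", (), lambda: 0.6),
    ("X3", (), lambda: 0.3),
    ("X2", ("X1",), lambda x1: 0.8 if x1 else 0.1),
    ("X7", ("X2", "X3"), lambda x2, x3: 0.9 if (x2 or x3) else 0.2),
    ("X5", ("X7",), lambda x7: 0.9 if x7 else 0.3),
    ("X6", ("X7",), lambda x7: 0.7 if x7 else 0.2),
    ("X4", ("X6",), lambda x6: 0.6 if x6 else 0.1),
]


def forward_sample(rng):
    """Draw one full sample by visiting the variables in topological order."""
    sample = {}
    for var, parents, cpd in NETWORK:
        p_true = cpd(*(sample[p] for p in parents))
        sample[var] = rng.random() < p_true
    return sample


def estimate(query, evidence, num_samples, seed=0):
    """Rejection-sampling estimate of P(query | evidence); both are dicts.

    Returns the estimate together with the number of accepted samples.
    """
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(num_samples):
        s = forward_sample(rng)
        if all(s[v] == val for v, val in evidence.items()):
            kept += 1
            hits += all(s[v] == val for v, val in query.items())
    return (hits / kept if kept else float("nan")), kept
```

A call such as `estimate({"X1": True}, {"X2": True}, 20000)` also reports how many samples survive rejection, which makes the waste of samples under unlikely evidence directly visible.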

An improvement on this aspect is given by the 
likelihood weighting algorithm, which is based on the 
observation that evidence variables can be forced to as- 
sume only the observed values in a sample as long as the 
sample is weighted by the likelihood of the evidence. 
This means that a weight is associated with each sample
and the weight is given by the product of all the conditional
probabilities corresponding to the observed values
for the evidence variables, i.e.,

$$w_{\text{sample}} = \prod_{E_i \in \mathcal{E}} P(E_i = e_i \mid \mathrm{pa}(E_i)).$$


Estimates are then computed considering weighted 
samples. Likelihood weighting turns out to be a spe- 
cial case of a more general approach called importance 
sampling which aims at estimating the expectation of 
a function relative to some distribution. 
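A minimal sketch of likelihood weighting follows, again on the structure of Fig. 31.1 with hypothetical CPD entries and function names of our own. Evidence variables are clamped instead of sampled, and each sample carries the weight defined above.

```python
import random

# (variable, parents, P(variable = True | parent values)), topologically
# ordered. The probabilities are illustrative assumptions.
NETWORK = [
    ("X1", (), lambda: 0.6),
    ("X3", (), lambda: 0.3),
    ("X2", ("X1",), lambda x1: 0.8 if x1 else 0.1),
    ("X7", ("X2", "X3"), lambda x2, x3: 0.9 if (x2 or x3) else 0.2),
    ("X5", ("X7",), lambda x7: 0.9 if x7 else 0.3),
    ("X6", ("X7",), lambda x7: 0.7 if x7 else 0.2),
    ("X4", ("X6",), lambda x6: 0.6 if x6 else 0.1),
]


def weighted_sample(evidence, rng):
    """Sample non-evidence variables; weight by the evidence likelihood."""
    sample, weight = {}, 1.0
    for var, parents, cpd in NETWORK:
        p_true = cpd(*(sample[p] for p in parents))
        if var in evidence:
            sample[var] = evidence[var]  # clamp to the observed value
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            sample[var] = rng.random() < p_true
    return sample, weight


def lw_estimate(query, evidence, num_samples, seed=0):
    """Weighted estimate of P(query | evidence); both arguments are dicts."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(num_samples):
        s, w = weighted_sample(evidence, rng)
        den += w
        if all(s[v] == val for v, val in query.items()):
            num += w
    return num / den
```

Unlike rejection sampling, every generated sample contributes to the estimate, although samples with very small weights contribute little.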

Improved sampling methods, which can also be ap- 
plied to Markov networks, are given by Markov chain 
Monte Carlo methods. Unlike the methods described 
so far, these methods generate a sequence of samples, 
in such a way that later samples are generated by dis- 
tributions that provably approximate with increasing 
precision the target posterior probability (i. e., the query 
P(Y|E = e)). 

The simplest such method is Gibbs sampling: an initial
assignment of values for the unobserved variables


is generated from an initial distribution; subsequently, 
in turn, each unobserved variable is sampled using 
the posterior probability given the current sample for 
all other variables. This distribution can be computed 
efficiently by using only factors associated with the 
Markov blanket, i.e., the neighbors of the variable to 
be resampled in the Markov network (in Bayesian net- 
works, the Markov blanket of a node is given by the set 
of its parents, its children and the parents of its chil- 
dren). Using the theory of Markov chains (discussed in 
Sect. 31.4.1), it is possible to show that, under some 
assumptions, the sequence of generated distributions 
converges to a stationary distribution, where the frac- 
tion of time in which a specific assignment of values 
to variables (sample) does occur in the sequence is ex- 
actly proportional to the posterior probability of that 
assignment. 
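The following sketch applies Gibbs sampling to the Markov network of Fig. 31.2 with Boolean variables. The potentials are hypothetical, the burn-in length is an arbitrary choice of ours, and the conditional distribution of each variable is computed only from the factors that contain it, i.e., from its Markov blanket.

```python
import random

# Hypothetical strictly positive potentials for the cliques of Fig. 31.2.
FACTORS = [
    (("X1", "X3", "X5"), lambda a, b, c: 1.0 + a + 2 * b * c),
    (("X1", "X2"), lambda a, b: 2.0 if a == b else 0.5),
    (("X2", "X4"), lambda a, b: 1.0 + 3 * a * b),
    (("X3", "X4"), lambda a, b: 1.5 + a - 0.5 * b),
]
VARS = ["X1", "X2", "X3", "X4", "X5"]


def local_weight(var, state):
    """Product of the factors whose scope contains var."""
    w = 1.0
    for scope, fn in FACTORS:
        if var in scope:
            w *= fn(*(state[v] for v in scope))
    return w


def gibbs(evidence, num_samples, burn_in=500, seed=0):
    """Return samples of all variables, with evidence variables clamped."""
    rng = random.Random(seed)
    state = {v: evidence.get(v, rng.randint(0, 1)) for v in VARS}
    free = [v for v in VARS if v not in evidence]
    samples = []
    for step in range(burn_in + num_samples):
        for var in free:
            weights = []
            for value in (0, 1):
                state[var] = value
                weights.append(local_weight(var, state))
            # Resample var from its conditional given all other variables.
            p_one = weights[1] / (weights[0] + weights[1])
            state[var] = int(rng.random() < p_one)
        if step >= burn_in:
            samples.append(dict(state))
    return samples
```

For example, the fraction of samples returned by `gibbs({"X4": 1}, 20000)` in which X3 equals 1 approximates P(X3 = 1 | X4 = 1). Consecutive samples are correlated, so the effective sample size is smaller than the number of samples.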

A drawback of Gibbs sampling is that it uses only 
local moves (i. e., resampling of a single variable), lead- 
ing to very slow convergence for assignments with 
low probability. More effective methods, based on the
Metropolis-Hastings approach, allow for a broader
range of moves. Further, more advanced approaches allow
us to consider partial assignments in conjunction
with a closed-form distribution for unassigned vari- 
ables. Others use deterministic methods to explicitly 
search for high-probability assignments to approximate 
the joint distribution. 


Variational Algorithms 
Probabilistic inference can be formulated as a constrained
optimization problem. This makes it possible both to
rediscover exact inference algorithms, such as the ones
we have briefly discussed above, and to design approximate
inference algorithms, by simplifying
the objective function to optimize and/or the admissible
region for optimization. The possibility of devising
theoretically founded approximation algorithms is particularly
appealing in cases where the joint distribution
is characterized by a factorization with large
treewidth. Research in this area has recently been very
active, yielding several interesting results. Here we
do not have the space for a proper technical treatment,
so we try to give only a brief introduction to the main
ideas.

Variational approaches are based on the idea of 
approximating an intractable probabilistic distribu- 
tion with a simpler one, which allows for inference. 
This simpler distribution is selected from a family of 
tractable distributions, as the distribution that is the best 
approximation to the desired one. Can we define a measure
of the quality of the approximation that can be
used for the minimization process? A good measure is
the KL-divergence introduced in (31.2). Let us denote
a distribution that factorizes according to the graphical
model G as


$$P_G(X) = \frac{1}{Z} \prod_{\forall i,\, c_i} \phi_{c_i}(x_{c_i}) \qquad (31.9)$$

and let Q(X) be a member of the tractable distributions
we use to approximate $P_G(X)$. Then, a nice feature of
KL-divergence is that it allows us to efficiently solve
the optimization problem

$$\arg\min_{Q} D_{KL}(Q(X) \,\|\, P_G(X))$$


without having to perform inference in $P_G(X)$. In
fact, using the factorization of $P_G(X)$ in (31.9), it is not
difficult to show that

$$D_{KL}(Q(X) \,\|\, P_G(X)) = \log Z - \sum_{\forall i,\, c_i} E_Q[\log \phi_{c_i}] + E_Q[\log Q(X)], \qquad (31.10)$$

and, since log Z does not depend on Q(X), minimizing
$D_{KL}(Q(X) \,\|\, P_G(X))$ is equivalent to maximizing the energy
functional

$$\sum_{\forall i,\, c_i} E_Q[\log \phi_{c_i}] - E_Q[\log Q(X)].$$

Following from the definition in (31.1), $H_Q(X) = -E_Q[\log Q(X)]$
is the entropy of Q, while the first term of the energy functional
(the sum appearing in (31.10)) is referred to as the energy term.
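The decomposition in (31.10) can be verified numerically on a toy model. The sketch below assumes a chain of three Boolean variables with two hypothetical pairwise potentials, and takes Q from the fully factorized family, which is only one possible choice of tractable family; the names are ours.

```python
import math
from itertools import product


# Hypothetical strictly positive potentials of a chain X1 - X2 - X3.
def phi_a(x1, x2):
    return 2.0 if x1 == x2 else 0.5


def phi_b(x2, x3):
    return 1.0 + 3 * x2 * x3


STATES = list(product((0, 1), repeat=3))


def p_tilde(x):
    """Unnormalized probability: the product of the potentials."""
    return phi_a(x[0], x[1]) * phi_b(x[1], x[2])


Z = sum(p_tilde(x) for x in STATES)  # partition function


def q_prob(x, q):
    """Fully factorized Q(X) = prod_i Q(X_i), with q[i] = Q(X_i = 1)."""
    return math.prod(q[i] if xi else 1.0 - q[i] for i, xi in enumerate(x))


def kl(q):
    """KL divergence between Q and P_G, by explicit enumeration."""
    return sum(q_prob(x, q) * math.log(q_prob(x, q) / (p_tilde(x) / Z))
               for x in STATES)


def energy_functional(q):
    """Energy term plus entropy of Q; it never requires Z."""
    energy = sum(q_prob(x, q) * (math.log(phi_a(x[0], x[1]))
                                 + math.log(phi_b(x[1], x[2])))
                 for x in STATES)
    entropy = -sum(q_prob(x, q) * math.log(q_prob(x, q)) for x in STATES)
    return energy + entropy
```

For any q with entries strictly between 0 and 1, `kl(q)` agrees with `math.log(Z) - energy_functional(q)` up to rounding, and the energy functional never exceeds log Z because the KL divergence is nonnegative.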

Different variational methods correspond to different
strategies for optimizing the energy functional. The
name variational is used since all of them adopt the
general strategy of reformulating the optimization problem
by introducing new variational parameters to be
used for optimization. In particular, each specific choice
of values for the variational parameters expresses one
member, i.e., Q(X), of the family of tractable distributions
we want to use. The optimization procedure
searches the space of variational parameters to find the
$Q^*(X)$ that best approximates $P_G(X)$. It is important to
understand that the family of tractable distributions
actually corresponds to a set of constraints, involving
the variational parameters, that must be satisfied while
maximizing the energy functional. By using Lagrange
multipliers these constraints can be merged
with the energy functional, giving rise to a Lagrangian
function that must be maximized. By taking the partial
derivatives with respect to the variational parameters
and the Lagrange multipliers, the solution to the optimization
problem can be characterized by a set of
fixed-point equations. These equations can then be used
to straightforwardly devise an iterative solution.

Different variational methods work with different
types of approximations. There are two main sources
of approximation, which can be used singly or in
conjunction. One source is the energy functional, which
can be substituted by a functional that is easier to manipulate
while preserving a good degree of approximation. Another
source of approximation is the set of constraints, i.e.,
the definition of the family of tractable distributions,
which may not be fully consistent with the factorization
represented by the graphical model (in this case,
they are denoted as pseudo-distributions).

We do not have space here to give more details;
however, it is worth mentioning that while convergence
proofs of several variational methods are available, it
is not so common to find theoretical guarantees on the
approximation error made by the specific method.


31.3 Latent Variable Models


Knowledge hidden in the complex relation between
a large number of observable variables can be surfaced
under the assumption that a simpler and unobservable
process exists, which is responsible for generating the
complex behavior of manifest data. Such an unobservable
generative process can be modeled through the use
of latent variables, as opposed to observable variables,
that are not directly measurable, but can be inferred
from observations and can explain the relation between
manifest data. Intuitively, latent variables can be understood
as an attempt to model the unknown physical
process generating the observations or as an abstraction
providing a simplified representation of the manifest
data, e.g., clusters.

Probabilistic models that attempt to explain obser- 
vations in terms of latent variables are called latent 
variable models. In probabilistic terms, the simplifi- 
cation introduced by latent variables results in condi- 
tional independence assumptions, such that (subsets of) 
observable variables can be considered conditionally
independent when their hidden explanation, i. e., the la-
tent variable assignment, is given. Similarly to observed 
variables, latent variables can be discrete or continu- 
ous: their nature, together with that of the observations, 
determines different types of probabilistic models. Nev- 
ertheless, parameter estimation in the different latent 
variable models can be achieved through a general iterative
principle, known as expectation-maximization.


31.3.1 Latent Space Representation 


To understand the intuition at the basis of latent space
representation, consider a joint distribution
$P(X) = P(X_1, \dots, X_N)$ defined over $N$ observed random
variables $X_i$. As discussed in Sect. 31.2.1, without any
simplifying assumption, the number of free parameters of this
simple model grows as $O(2^N - 1)$ for Boolean variables, which
quickly becomes unmanageable for large $N$. One way to control the
number of free parameters of a model, without making overly
simplistic assumptions (e.g., the $X_i$ being i.i.d.), is to
introduce a collection of latent, or hidden, variables
$Z = \{Z_1, \dots, Z_K\}$. The latent variables are unobserved but
can be used to factorize the joint distribution $P(X)$ while still
capturing (some of) the correlations between the observed variables
$X = \{X_1, \dots, X_N\}$. More formally, latent variables are such
that


$$P(X) = \int_{z} P(X \mid Z = z)\, P(Z = z)\, dz, \tag{31.11}$$


that is the general formulation for the likelihood of a la- 
tent variable model. The details of the latent variable 
model, and the tractability of the integral in (31.11), are 
determined by the form of the conditional distribution 
P(X|Z) and by the marginal probability P(Z). A com- 
mon approach in latent variable models is to assume 
that observed variables become conditionally indepen- 
dent given the latent variables, that is 


$$P(X) = \int_{z} \prod_{i=1}^{N} P(X_i \mid Z = z)\, P(Z = z)\, dz. \tag{31.12}$$


A basic assumption for this latent model to be effective is that
the conditional and marginal distributions should be more tractable
than the joint distribution $P(X)$. For instance, in a simple
scenario with discrete observations and latent variables, this
entails that $K \ll N$. Not surprisingly, the same intuition is
applied, in a deterministic context, for dimensionality reduction
(cf. the number of projection directions in PCA) and clustering.
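As a rough illustration of the saving, the sketch below counts free parameters for Boolean observations under an unrestricted joint table and under the factorization in (31.12). It assumes, purely for the example, a single discrete latent variable with a handful of states; the function names are hypothetical and not part of the original text.

```python
def full_joint_params(n_vars: int) -> int:
    """Free parameters of an unrestricted joint over n_vars Boolean variables."""
    return 2 ** n_vars - 1


def latent_model_params(n_vars: int, n_states: int) -> int:
    """Free parameters when Boolean X_i are independent given one latent Z.

    P(Z) needs n_states - 1 values; each P(X_i | Z = z) needs one value.
    """
    return (n_states - 1) + n_vars * n_states


for n in (5, 10, 20):
    # Columns: N, full joint table, latent model with 3 hidden states.
    print(n, full_joint_params(n), latent_model_params(n, n_states=3))
```

The latent model grows linearly in the number of observed variables, whereas the full table doubles with every added variable.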


Different types of latent variable models are defined 
based on the nature of the latent and observed variables, 
as well as depending on the form of the conditional 
and marginal probabilities. In the following, we discuss 
two general classes of latent variable models with con- 
tinuous and discrete hidden variables, which are factor 
analysis and mixture models, respectively. 


31.3.2 Learning with Latent Variables: 
The Expectation-Maximization 
Algorithm 


Learning, in a probabilistic setting, entails working with the
model likelihood. In latent variable models, the likelihood in
(31.11) might be difficult to treat due to the marginalization
inside the logarithm, which can potentially couple all the model
parameters. Despite the diversity of the models that can be
designed based on the general expression in (31.11), there exists
a general principle to estimate their parameters.

The expectation-maximization (EM) algorithm [31.48] is a general
iterative method for the maximization of the likelihood under
latent variables. The key intuition of the EM algorithm is to
define an alternative objective function where the parameter
coupling introduced by the marginalization of the hidden variables
is removed. The EM algorithm maximizes the marginal data likelihood
$P(X \mid \theta)$, where $\theta$ are the model parameters, through
a tractable lower bound defined by introducing a function of the
latent variables, i.e., $Q(Z)$, into the data likelihood through
marginalization. For notational simplicity, consider the case of
discrete latent variables. For any nonzero distribution $Q(Z)$, it
holds that


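$$
\begin{aligned}
\mathcal{L}(\theta) &= \log P(X \mid \theta) = \log \sum_{z} P(X, Z = z \mid \theta) \\
&= \log \sum_{z} Q(z)\, \frac{P(X, Z = z \mid \theta)}{Q(z)} \\
&\ge \sum_{z} Q(z) \log P(X, Z = z \mid \theta) - \sum_{z} Q(z) \log Q(z) = \mathcal{L}(Q, \theta),
\end{aligned}
\tag{31.13}
$$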


where the lower bound $\mathcal{L}(Q, \theta) \le \mathcal{L}(\theta)$
is obtained by the application of the Jensen inequality to the
concave log function. The joint distribution $P(X, Z \mid \theta)$ is
known as the complete data likelihood, where the term


562 PartD | Neural Networks 




complete refers to the fact that the marginal data likelihood
$P(X \mid \theta)$ is completed with the observations $z$ for the
latent variables.

The expectation-maximization algorithm defines an alternating
optimization process where the bound $\mathcal{L}(Q, \theta)$ is
maximized with respect to $Q(\cdot)$ and $\theta$. In general, this
is performed by two independent maximization steps that are
repeated until convergence:


• Expectation (E) step: For $\theta^{(t)}$ fixed, find the
  distribution $Q^{(t+1)}(z)$ that maximizes the bound
  $\mathcal{L}(Q, \theta^{(t)})$;

• Maximization (M) step: Given the distribution $Q^{(t+1)}(z)$,
  estimate the model parameters $\theta^{(t+1)}$ that maximize the
  bound $\mathcal{L}(Q^{(t+1)}, \theta)$;


where the superscript denotes the estimate at time t. 
Clearly, the optimal solution for the E-step is attained 
when 


$$Q^{(t+1)}(z) = P(Z = z \mid X, \theta^{(t)}), \tag{31.14}$$


that is, when the lower bound in (31.13) becomes an equality. In
practice, to explicitly evaluate the complete likelihood in
$\mathcal{L}(Q, \theta^{(t)})$, we would need to observe the $z$
assignments. These are unknown, since latent variables are
unobservable. However, given the marginalization of $z$ in (31.13),
we can substitute the unavailable $z$ observations with their
expected values, by treating them as another random variable. To
this end, it suffices that the E-step computes the expected value
of the complete log-likelihood $\log P(X, Z \mid \theta)$ with
respect to $Z$. These observations provide the final form of the
classical EM algorithm:


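• E-step: Given the current estimate of the model parameters
  $\theta^{(t)}$, compute

$$Q^{(t+1)}(\theta \mid \theta^{(t)}) = E_{Z \mid X, \theta^{(t)}}\left[\log P(X, Z \mid \theta)\right]; \tag{31.15}$$

• M-step: Find the new estimate of the model parameters

$$\theta^{(t+1)} = \arg\max_{\theta}\, Q^{(t+1)}(\theta \mid \theta^{(t)}). \tag{31.16}$$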


In other words, the E-step estimates the value of the otherwise
unobserved latent variables, while the M-step finds the parameters
that maximize the current estimate of the log-likelihood. In
practice, the E-step often reduces to estimating the expectation of
$Z$ as its posterior $P(Z \mid X, \theta)$, while the M-step uses
these values as sufficient statistics to update the model
parameters $\theta^{(t+1)}$. This alternating optimization is
typically iterated until the log-likelihood does not change much
between consecutive estimates, or until a maximum number of
iterations is reached. Note that the two-step EM optimization
process is prone to local optima; in addition, its convergence can
be slow and its solutions often depend on the initialization.
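The E-step/M-step alternation can be sketched on a deliberately small model. The example below assumes a one-dimensional mixture of Gaussians with unit variances, so that only the mixing weights and the means are estimated; this model choice, the function names, and the synthetic data are assumptions made for illustration and are not taken from the text.

```python
import math
import random


def log_likelihood(data, weights, means):
    """Marginal log-likelihood log P(X | theta) of a unit-variance Gaussian mixture."""
    total = 0.0
    for x in data:
        px = sum(w * math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)
                 for w, m in zip(weights, means))
        total += math.log(px)
    return total


def em_step(data, weights, means):
    """One EM iteration: posterior of Z (E-step), then closed-form updates (M-step)."""
    k = len(weights)
    # E-step: posterior P(Z = c | x, theta) for every sample.
    resp = []
    for x in data:
        joint = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
        norm = sum(joint)
        resp.append([j / norm for j in joint])
    # M-step: the posteriors act as soft counts (expected sufficient statistics).
    counts = [sum(r[c] for r in resp) for c in range(k)]
    new_weights = [n_c / len(data) for n_c in counts]
    new_means = [sum(r[c] * x for r, x in zip(resp, data)) / counts[c]
                 for c in range(k)]
    return new_weights, new_means


def fit_em(data, weights, means, max_iter=200, tol=1e-8):
    """Iterate EM until the log-likelihood gain drops below tol."""
    history = [log_likelihood(data, weights, means)]
    for _ in range(max_iter):
        weights, means = em_step(data, weights, means)
        history.append(log_likelihood(data, weights, means))
        if history[-1] - history[-2] < tol:
            break
    return weights, means, history


rng = random.Random(0)
data = ([rng.gauss(-2.0, 1.0) for _ in range(300)]
        + [rng.gauss(3.0, 1.0) for _ in range(700)])
weights, means, history = fit_em(data, [0.5, 0.5], [-1.0, 1.0])
print(weights, means, history[-1])
```

The `history` list records the marginal log-likelihood after each iteration, which makes the stopping rule described above explicit. A different starting point may lead to a different local optimum.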

The EM algorithm assumes that we can calculate the expected value
of the complete log-likelihood. However, there are cases in which
the required summation is not computationally feasible (e.g., with
infinite summations, or where the integral has no closed-form
solution): in these cases, the approximate inference methods
described in Sect. 31.2.3 can be used to define nonexact EM
algorithms. For instance, stochastic versions of the EM algorithm
are obtained by approximating the infeasible summation using
(e.g., Gibbs) sampling from the posterior distribution
$P(Z \mid X, \theta)$. The classical EM algorithm is an ML method
providing point estimates of the model parameters $\theta$. The
variational Bayes (VB) [31.6] method has been introduced to obtain
a fully Bayesian solution that returns a posterior distribution of
the parameters $P(\theta)$, instead of their point estimate. VB is
based on an analytical approximation of the joint posterior of the
latent variables and model parameters that yields a generalization
of the EM alternating optimization, where the maximization at the
M-step is taken over possible distributions $Q(\theta)$, instead of
over $\theta$ itself.


31.3.3 Linear Factor Analysis 


Factor analysis (FA) is an example of a latent variable model for
continuous hidden and manifest variables. In its simplest linear
form, it is a classical statistical model widely used for
generative dimensionality reduction. Similarly to its deterministic
counterparts, e.g., PCA, it forms a low-dimensional embedding of
a set of observations $\mathcal{D} = (x_1, \dots, x_N)$, where each
observation $x$ is a $D$-dimensional vector of reals. FA finds
a lower dimensional probabilistic representation of $\mathcal{D}$,
by assuming that the features of each $x$ are independently
generated by $K$ real-valued latent variables
$Z = \{Z_1, \dots, Z_K\}$, with $K < D$ (see the associated
graphical model in Fig. 31.5).

The FA model assumes that observations are linked to the latent
vectors through a linear model


$$x = Fz + b + \epsilon, \tag{31.17}$$




Fig. 31.5 Linear factor analysis: the observed $D$-dimensional
variable $X$ is related to the $K$ latent variables
$Z = \{Z_1, \dots, Z_K\}$ through a linear mapping


where $\epsilon \sim \mathcal{N}(\epsilon \mid 0, \Psi)$ is Gaussian
distributed noise with zero mean and covariance $\Psi$, $b$ is
a bias vector, and $F$ is the factor loading matrix. The latent
variables are the factors and are generally assumed to be
distributed as $Z \sim \mathcal{N}(z \mid 0, I_K) = P(Z)$, where
$I_K$ is the $K$-dimensional identity matrix. Under such Gaussian
assumptions, and given the linear model in (31.17), the conditional
distribution of the observations is

$$P(X = x \mid Z = z) = \mathcal{N}(x \mid Fz + b, \Psi), \tag{31.18}$$

which, inserted in (31.11), provides the distribution for the FA
marginal likelihood

$$P(X) = \int P(X \mid Z)\, P(Z)\, dz = \mathcal{N}(x \mid b, FF^{T} + \Psi). \tag{31.19}$$


The form of the noise covariance $\Psi$ determines the type of FA
model: in general, this is chosen as a diagonal matrix with
a vector of $(\psi_1, \dots, \psi_D)$ values on the main diagonal.
When the diagonal elements are all equal to a single value
$\sigma^2 \in \mathbb{R}$, the FA reduces to the special case of the
Probabilistic PCA [31.49].
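The generative reading of (31.17) and the marginal covariance in (31.19) can be checked numerically. The following sketch samples from a small FA model and compares the empirical covariance with $FF^{T} + \Psi$; the loading matrix, bias, noise variances, and helper names are arbitrary values chosen for this example, not taken from the text.

```python
import random


def sample_fa(F, b, psi_diag, n_samples, rng):
    """Draw x = F z + b + eps with z ~ N(0, I_K) and eps ~ N(0, diag(psi_diag))."""
    n_dims, n_factors = len(F), len(F[0])
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        x = [sum(F[r][c] * z[c] for c in range(n_factors)) + b[r]
             + rng.gauss(0.0, psi_diag[r] ** 0.5)
             for r in range(n_dims)]
        samples.append(x)
    return samples


def model_covariance(F, psi_diag):
    """Covariance F F^T + Psi of the Gaussian marginal."""
    n_dims, n_factors = len(F), len(F[0])
    return [[sum(F[r][k] * F[c][k] for k in range(n_factors))
             + (psi_diag[r] if r == c else 0.0)
             for c in range(n_dims)] for r in range(n_dims)]


def sample_covariance(samples):
    """Empirical mean and covariance of a list of vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[r] for x in samples) / n for r in range(d)]
    cov = [[sum((x[r] - mean[r]) * (x[c] - mean[c]) for x in samples) / (n - 1)
            for c in range(d)] for r in range(d)]
    return mean, cov


F = [[2.0], [1.0]]        # D = 2 observed dimensions, K = 1 factor
b = [1.0, -1.0]
psi_diag = [0.5, 0.25]    # diagonal of the noise covariance
rng = random.Random(0)
samples = sample_fa(F, b, psi_diag, 20000, rng)
mean, cov = sample_covariance(samples)
expected = model_covariance(F, psi_diag)   # [[4.5, 2.0], [2.0, 1.25]]
print(mean, cov, expected)
```

With a single factor, the off-diagonal covariance comes entirely from the shared latent variable, while the diagonal noise only inflates the variances.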

Learning of the FA parameters $\theta = (\Psi, F)$ ($b$ is usually
set a priori to the mean of the data) is obtained by maximum
likelihood estimation. The most popular approach to obtain such
estimates is based on solving an eigen-decomposition problem. Given
the nature of FA as a latent variable model, its $\theta$
parameters can also be estimated by applying EM to the logarithm of
the likelihood in (31.19). The latter approach is, however, less
used in general, given its slower convergence.


31.3.4 Mixture Models 


The term mixture models identifies a large family 
of latent variable models comprising discrete hidden 


variables and generic manifest variables. A mixture 
model assumes that each observation is generated by 
a weighted contribution of a number of simple distri- 
butions, selected by the hidden variables. The simplest 
form of mixture model assumes that an observation 
is independently generated by a single mixture com- 
ponent. Widely popular elements of this family are 
the Gaussian mixture model for continuous observa- 
tions and the mixture of unigrams for multinomial data. 
In the following, we discuss an example of more ar- 
ticulated generative processes comprising observations 
with mixed component memberships. 


Probabilistic Latent Semantic Analysis 
Probabilistic latent semantic analysis (pLSA) [31.50] has been
introduced to model mixed membership observations, where a manifest
sample is allowed to be generated by multiple latent variables. Its
primary application is document analysis, where latent variables
are interpreted as topics to be identified in a collection of
documents. Intuitively, in the mixture of unigrams, each 
document is assigned to a unique topic and, as a conse- 
quence, all the words in a document are constrained to 
belong to a single topic. The pLSA model relaxes this 
assumption by allowing words in a document to belong 
to different topics, obtaining a multitopic representation 
for the documents in the collection. 

The typical pLSA setting includes a dataset of multinomial samples,
which are the documents $\mathcal{D} = \{d_1, \dots, d_N\}$. Each
document is an $L$-dimensional vector of word counts, with $L$
equal to the size of the reference dictionary. In other words, the
$i$th observed sample is a vector $d_i = (w_1^i, \dots, w_L^i)$,
where $w_j^i$ is the number of occurrences of the $j$th word of the
vocabulary in the $i$th document. These data are typically
summarized in a rectangular $L \times N$ integer matrix $n$, such
that each column $n(\cdot, d_i)$ contains the word counts for
document $d_i$. The variables identifying words and documents,
i.e., $W_j$ and $D_i$, are observed, in contrast with the set of
topics $Z = \{Z_1, \dots, Z_K\}$, which are the latent variables.
In pLSA, every observation $n(w_j, d_i)$ is associated with
a latent topic $z$ by means of the hidden variable $Z_k$.

The fundamental probabilities associated with this model are
$P(D = d_i)$, that is, the document probability;
$P(W = w_j \mid Z = z_k)$, that is, the probability of word $w_j$
conditioned on topic $z_k$; and $P(Z = z_k \mid D = d_i)$, that is,
the conditional probability of topic $z_k$ given document $d_i$.
Given the nature of the manifest and hidden variables, all
probabilities involved in pLSA are multinomials. The pLSA defines
a (quasi) generative model for the word/document co-occurrences
whose generative process is described by Fig. 31.6, using plate
notation. This is a concise representation for graphical models
involving replications: rectangular plates denote replication of
their content for a number of times given by the term on the bottom
right (e.g., $N$ and $L_d$ for the outer and inner plates in
Fig. 31.6, respectively); each shaded circular item denotes an
observed variable, while empty circles identify latent variables.

The conditional independence relationships in Fig. 31.6 allow us to
factorize the joint word-document distribution: using the parent
decomposition rule introduced in (31.8) yields

$$P(W_j, D_i) = P(D_i)\, P(W_j \mid D_i) = P(D_i) \sum_{k=1}^{K} P(Z_k \mid D_i)\, P(W_j \mid Z_k), \tag{31.20}$$


which is the specific pLSA form of the general latent variable
factorization in (31.12). The second equality in (31.20) is given
by the marginalization of the latent topics $Z$ and by the
conditional independence assumption of the pLSA model, stating that
word $w_j$ and document $d_i$ can be considered independent given
the state of the latent variable $Z$. In other words, the word
distribution of a document is modeled as a convex combination of
$K$ topic-specific distributions $P(W_j \mid Z_k)$. Such
a decomposition has a well-known characterization in terms of
nonnegative matrix factorization [31.13].

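Estimation of the pLSA parameters
$\theta = \{P(W_j \mid Z_k), P(Z_k \mid D_i)\}$ is obtained by
maximization of the log-likelihood

$$
\begin{aligned}
\mathcal{L}(\theta) &= \log \prod_{i=1}^{N} \prod_{j=1}^{L} P(W_j, D_i)^{n(w_j, d_i)} \\
&= \sum_{i=1}^{N} \sum_{j=1}^{L} n(w_j, d_i) \log \left[ P(D_i) \sum_{k=1}^{K} P(Z_k \mid D_i)\, P(W_j \mid Z_k) \right],
\end{aligned}
\tag{31.21}
$$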


Fig. 31.6 Graphical model for the probabilistic latent semantic
analysis: indices for the random variables $D$, $Z$, and $W$ are
omitted in the plate notation. The term $L_d$ denotes replication
for the $L_d$ words present in the $d$th document


where $P(W_j, D_i)$ has been expanded using the formulation in
(31.20). As with other latent topic models, this maximization
problem can be solved through the iterative EM algorithm discussed
in Sect. 31.3.2. Following (31.15), the E-step computes the
expectation of the complete likelihood $P(Z, W, D)$ with respect to
the pLSA latent topics, assuming observed documents and words. It
is easy to show that the resulting E-step computes


$$P(Z_k \mid W_j, D_i) = \frac{P(Z_k \mid D_i)^{(t)}\, P(W_j \mid Z_k)^{(t)}}{\sum_{k'=1}^{K} P(Z_{k'} \mid D_i)^{(t)}\, P(W_j \mid Z_{k'})^{(t)}}, \tag{31.22}$$


that is, the probability of the topic $Z_k$ given word $W_j$ in
document $D_i$, estimated using the current values (at time $t$) of
the model parameters
$\theta^{(t)} = \{P(W_j \mid Z_k)^{(t)}, P(Z_k \mid D_i)^{(t)}\}$.
Note that the decomposition on the right-hand side of (31.22) has
been obtained by factorization of the posterior
$P(Z_k \mid W_j, D_i)$ using the Bayes theorem.

The M-step equations (31.16) are obtained by differentiating the
pLSA log-likelihood, extended with appropriate Lagrange multipliers
for normalization, with respect to the $P(Z_k \mid D_i)$ and
$P(W_j \mid Z_k)$ parameters. The resulting update equations are

$$P(Z_k \mid D_i)^{(t+1)} = \frac{\sum_{j=1}^{L} n(w_j, d_i)\, P(Z_k \mid W_j, D_i)}{\sum_{j=1}^{L} n(w_j, d_i)}, \tag{31.23}$$

$$P(W_j \mid Z_k)^{(t+1)} = \frac{\sum_{i=1}^{N} n(w_j, d_i)\, P(Z_k \mid W_j, D_i)}{\sum_{j'=1}^{L} \sum_{i=1}^{N} n(w_{j'}, d_i)\, P(Z_k \mid W_{j'}, D_i)}. \tag{31.24}$$


The two-step optimization is iterated until a likelihood
convergence criterion is met: often a validation set, or a tempered
version of EM, is used in order to avoid model overfitting [31.50].
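The E-step posterior (31.22) and the M-step updates (31.23) and (31.24) translate into a short loop over the count matrix. The sketch below is a minimal, unoptimized illustration: the function name `plsa_em`, the random initialization, the fixed number of iterations, and the toy counts are assumptions of this example, and no tempering or validation-based stopping is included. Counts are stored one document per row, i.e., as the transpose of the $L \times N$ matrix described above, and every document is assumed to be non-empty.

```python
import math
import random


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM; counts[i][j] holds n(w_j, d_i).

    Returns P(Z|D) as p_z_d[i][k], P(W|Z) as p_w_z[k][j], and the history of
    the parameter-dependent part of the log-likelihood (the P(D_i) term is
    constant in these parameters and is left out).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        loglik = 0.0
        for i in range(n_docs):
            for j in range(n_words):
                if counts[i][j] == 0:
                    continue
                # E-step: posterior P(Z_k | W_j, D_i) under current parameters.
                joint = [p_z_d[i][k] * p_w_z[k][j] for k in range(n_topics)]
                norm = sum(joint)
                loglik += counts[i][j] * math.log(norm)
                for k in range(n_topics):
                    weighted = counts[i][j] * joint[k] / norm
                    new_z_d[i][k] += weighted   # numerator of the P(Z|D) update
                    new_w_z[k][j] += weighted   # numerator of the P(W|Z) update
        history.append(loglik)
        # M-step: normalizing the accumulators applies the update denominators.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z, history


counts = [
    [4, 3, 5, 0, 0, 0],
    [5, 4, 3, 0, 1, 0],
    [0, 0, 1, 4, 5, 3],
    [0, 1, 0, 5, 4, 4],
]
p_z_d, p_w_z, history = plsa_em(counts, n_topics=2)
print(p_z_d)
```

Each entry of `history` is evaluated with the parameters held at the start of that iteration, so the sequence should be non-decreasing.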


Advanced Topic Models 
The pLSA was the first mixed membership model 
allowing a single observed sample to be generated 
by multiple latent topics at the same time. How- 
ever, pLSA cannot be considered a fully generative 
model. In fact, the document-specific mixing weights for the topics
are not sampled from a distribution; rather, they are selected from
$P(z_k \mid d_i)$ based on the index of document $d_i$. Hence, pLSA
indexes only those



documents that are in the training set D and can- 
not directly model the generative process of unseen 
test documents. In other words, the pLSA is basi- 
cally assigning null probabilities to all inputs that are 
not in the training set. The folding-in heuristic has 
been proposed to opportunistically solve this limitation, 
by assigning latent variables in the test-data to their 
MAP values before computing the test-set perplexity. 
However, the folding-in approach has been shown to 
lead to overly optimistic estimates of the test-set log- 
likelihood [31.51]. 

The latent Dirichlet allocation (LDA) [31.52] has been proposed as
a Bayesian approach to address this modeling limitation of pLSA. It
extends pLSA by treating the multinomial weights $P(Z \mid D)$ as
additional latent random variables, sampled from a Dirichlet
distribution, which is the conjugate prior of the multinomial
distribution. Using a conjugate prior eases inference, as it
ensures that the posterior distribution has the same form as the
prior. The latent variable decomposition of the LDA likelihood is

$$P(W = w \mid d, \alpha, \beta) = \int \sum_{z} P(W = w \mid Z = z, \phi)\, P(Z = z \mid \theta)\, P(\theta \mid \alpha)\, P(\phi \mid \beta)\, d\theta, \tag{31.25}$$

where $P(W \mid Z, \phi)$ is the multinomial word-topic distribution
with parameters $\phi$ sampled from the Dirichlet distribution
$P(\phi \mid \beta)$. The term $P(Z \mid \theta)$ is the topic
distribution, whose document-specific multinomial parameter
$\theta$ is sampled from the Dirichlet $P(\theta \mid \alpha)$.


The terms $\alpha$ and $\beta$ are the hyperparameters of the
Dirichlet distributions; see Fig. 31.7 for the model in plate
notation. Direct EM inference is impossible for LDA, since the
integral in (31.25) is intractable due to the couplings between the
parameters within the topic marginalization. Again, approximate and
stochastic Bayesian inference methods, such as those in
Sect. 31.2.3, are used to fit the LDA parameters, including VB
[31.52], expectation propagation [31.53], and Gibbs sampling
[31.54].

Fig. 31.7 Graphical model for the latent Dirichlet allocation

The principles underlying pLSA and LDA have inspired the
development of latent topic models that account for more
articulated assumptions on the form of the hidden generative
process. For instance, hierarchical LDA [31.55] proposes
a generative process where observations are generated by a topic
tree instead of being drawn from a flat topic collection. Further,
specialized latent variable models have been developed for specific
applications, such as author-topic analysis in scientific
literature [31.56] and image understanding [31.57].


31.4 Markov Models


Time series and, more generally, sequences are a form of structured
data that represents a list of observations for which a complete
order can be defined, e.g., time in a temporal sequence. Let
a sequence of length $T$ be $\mathbf{y} = y_1, \dots, y_T$, where
the bold notation is used to denote the fact that $\mathbf{y}$ is
a compound object (in practice, however, this can be treated as
a set of random variables). The term $y_t$ is used to denote the
$t$th observation with respect to the total order. Position $t$ is
often referred to as time when dealing with time-series data.

Two sequences are generally the results of independent trials,
hence they can be considered i.i.d. samples. However, the elements
composing a sequence fail to meet such i.i.d. property. Therefore,
in principle, a probabilistic model for $\mathbf{y}$ would be
required to specify the joint distribution $P(Y_1, \dots, Y_T)$.
For discrete valued observations $y_t$, the joint distribution
grows exponentially with the size of the observation domain.
Clearly, this would make the use of the probabilistic model fairly
impractical due to the exponential size of the parameter space. To
reduce such parameterization, Markov chains make the simplifying
assumption that an observation occurring at some position $t$ of
the sequence only depends on a limited number of its predecessors
with respect to the complete order. In a time series, this entails
that an observation at the present time only depends on the history
of a limited number of past observations. Markov chains allow us to
model such



history dependence and are at the heart of the hidden Markov model
(HMM), which is the most popular approach to model the generative
process of sequential data.

The HMM is a notable example of a latent variable model: in the
following, we provide an overview of the associated learning and
inference problems. For simplicity, the presentation focuses on
sequences of finite length $T$ and discrete time $t$. Sequence
elements $y_t$ can be either discrete valued or defined over reals,
without major impact on the model. The section also discusses how
the HMM causation assumption can be modified to give rise to
alternative approaches, with interesting applications that go
beyond simple sequence modeling.


31.4.1 Markov Chains 


A Markov chain is a simple stochastic process for sequences. It
assumes that an observation $y_t$ at time (position) $t$ only
depends on a finite set of $L \ge 1$ predecessors in the sequence.
The number of predecessors $L$ influencing the new observation is
the order of the Markov chain.


Definition 31.4 Markov Chain
An $L$-order Markov chain is a sequence of random variables
$\mathbf{Y} = Y_1, \dots, Y_T$ such that for every
$t \in \{1, \dots, T\}$, it holds

$$P(Y_t = y_t \mid Y_1, \dots, Y_{t-1}) = P(Y_t = y_t \mid Y_{t-L}, \dots, Y_{t-1}). \tag{31.26}$$


Following from the discussions in Sect. 31.2.1, (31.26) states that
the $L$ predecessors of $Y_t$ define the set of its Bayesian
parents $pa(Y_t) = \{Y_{t-1}, \dots, Y_{t-L}\}$. For a first-order
Markov chain, i.e., $L = 1$, (31.26) reduces to
$P(Y_t = y_t \mid Y_{t-1} = y_{t-1})$. Such conditional independence
assumption formally encodes the intuition that the current
observation can be predicted from the sole knowledge of the
preceding sample. The graphical model of a first-order Markov chain
is shown in Fig. 31.8, whose joint distribution decomposes as

$$
\begin{aligned}
P(Y_1, \dots, Y_T) &= P(Y_1)\, P(Y_2 \mid Y_1)\, P(Y_3 \mid Y_2) \cdots P(Y_T \mid Y_{T-1}) \\
&= P(Y_1) \prod_{t=2}^{T} P(Y_t \mid Y_{t-1}).
\end{aligned}
\tag{31.27}
$$


The first element $Y_1$ has an empty conditioning part given that
it has no predecessor. Its probability $P(Y_1)$ is referred to as
marginal or prior probability, while the term
$P(Y_t \mid Y_{t-1})$ is the transition probability.

Fig. 31.8 Graphical model for a first-order Markov chain of length
$T$, where $pa(Y_t) = \{Y_{t-1}\}$

A Markov chain is stationary or homogeneous if the transition
probability does not depend on the time (position) $t$. In other
words, the parameterization of the Markov chain is such that

$$P(Y_t = y \mid Y_{t-1} = y') = f(y', y),$$

where the transition distribution is a function $f(y', y)$ of the
sole observations $y$, $y'$. An interesting stationary first-order
Markov chain is that whose random variables take values from
a finite alphabet of discrete symbols $i, j \in \{1, \dots, M\}$.
In these chains, the transition probability

$$A_{ij} = P(Y_t = i \mid Y_{t-1} = j) \tag{31.28}$$

denotes the probability of occurrence of the $i$th symbol preceded
by symbol $j$. For convenience, such probability is represented by
the element $A_{ij}$ of the $M \times M$ transition matrix
$A = [A_{ij}]_{i,j=1}^{M}$. Similarly, the marginal distribution
defines the elements

$$\pi_i = P(Y_1 = i) \tag{31.29}$$

of the $M \times 1$ initial state vector
$\pi = [\pi_i]_{i=1}^{M}$. These Markov chains can be
straightforwardly interpreted as state-transition systems, where
each symbol $i$ of the alphabet is a state and a state-transition
arrow exists between states $i$ and $j$ having a nonzero $A_{ij}$
entry in the transition matrix.

The Markov chains described by (31.28) and (31.29), despite their
simplicity, have found wide application, e.g., in modeling of
physical phenomena, economic time series, and information
retrieval. Learning Markov chains requires fitting the $M^2$
parameters of the transition matrix plus an $M$-dimensional prior,
where $M$ is the size of the observation alphabet. Efficient
methods exist to fit stationary first-order Markov chains by
maximum likelihood (ML). By using the decomposition in (31.27) and
substituting the definitions in (31.28) and (31.29), the Markov
chain log-likelihood for a generic sequence $\mathbf{y}$ can be
written as


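$$\mathcal{L}(\theta) = \log P(\mathbf{Y} = \mathbf{y} \mid \theta) = \log \left[ \prod_{i'=1}^{M} \pi_{i'}^{\delta(y_1 = i')} \times \prod_{t=2}^{T} \prod_{i,j=1}^{M} A_{ij}^{\delta(y_t = i,\, y_{t-1} = j)} \right], \tag{31.30}$$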


where $\theta = (A, \pi)$ are the model parameters and
$\delta(y_t = i, y_{t-1} = j)$ is the indicator function: it equals
1 if a transition from $y_{t-1} = j$ to $y_t = i$ is observed in the
sequence and 0 otherwise. Similarly, $\delta(y_1 = i') = 1$ if and
only if the first symbol of the sequence is $i'$. The final
expression of the log-likelihood is obtained by taking the log into
the products and adding appropriate Lagrange multipliers for
normalization. The ML estimate is obtained by differentiating this
final expression with respect to the parameters $A_{ij}$ and
$\pi_i$, yielding


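$$A_{ij} = \frac{\sum_{t=2}^{T} \delta(y_t = i,\, y_{t-1} = j)}{\sum_{i'=1}^{M} \sum_{t=2}^{T} \delta(y_t = i',\, y_{t-1} = j)}, \tag{31.31}$$

$$\pi_i = \frac{\delta(y_1 = i)}{\sum_{i'=1}^{M} \delta(y_1 = i')}. \tag{31.32}$$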

Intuitively, the ML estimate corresponds to counting the number of
transitions from symbol $j$ to $i$ across time (similarly for the
initial state). Generalization to a set of $N$ sample sequences
$\mathbf{y}^n$ is straightforward: it suffices to count transitions
both in time and across samples, and similarly for the initial
symbols $y_1^n$.
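The counting interpretation can be written down directly. The sketch below estimates $A$ and $\pi$ from a set of sequences following (31.31) and (31.32); symbols are zero-based integers here, and the function name, the toy sequences, and the uniform fallback for symbols that never appear as a predecessor are assumptions of this example rather than part of the ML derivation.

```python
def fit_markov_chain(sequences, n_symbols):
    """ML estimate of (A, pi) for a stationary first-order chain over {0..M-1}.

    A[i][j] estimates P(Y_t = i | Y_{t-1} = j), so every column of A sums to 1.
    """
    trans = [[0] * n_symbols for _ in range(n_symbols)]
    init = [0] * n_symbols
    for seq in sequences:
        init[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[cur][prev] += 1
    pi = [c / len(sequences) for c in init]
    A = [[0.0] * n_symbols for _ in range(n_symbols)]
    for j in range(n_symbols):
        col_total = sum(trans[i][j] for i in range(n_symbols))
        for i in range(n_symbols):
            # Unseen predecessor j: fall back to uniform (an assumption, not ML).
            A[i][j] = trans[i][j] / col_total if col_total else 1.0 / n_symbols
    return A, pi


sequences = [[0, 1, 1, 0, 1], [1, 1, 0, 0, 1], [0, 0, 1, 1, 1]]
A, pi = fit_markov_chain(sequences, n_symbols=2)
print(A)   # columns: P(. | previous = 0), P(. | previous = 1)
print(pi)
```

In the toy data, symbol 0 is followed by 0 twice and by 1 four times, so the first column of `A` is (1/3, 2/3).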


31.4.2 Hidden Markov Models 


Markov chains model sequential data assuming that sequence elements
are generated by a fully observable stochastic process. In the
discrete-state Markov chain, this requires each state of the
process to correspond to an observable element of the sequence,
i.e., an event. On the other hand, most real-world systems generate
observable events that are correlated, but not coincident, with the
state of the generating process. More importantly, the only
available information can be the outcome of the stochastic process
at each time, i.e., event $y_t$, while the state of the system
remains unobservable, i.e., hidden. The HMM allows modeling more
general stochastic processes where the state transition dynamics is
disentangled from the observable information generated by the
process. The state-transition dynamics is assumed to be
nonobservable and is modeled by a Markov chain of discrete and
finite latent variables, i.e., the hidden states. The observable
information is then generated by such hidden states similarly to
how latent variables generate observations in mixture models (see
Sect. 31.3.4).

Fig. 31.9 A first-order HMM with hidden states $S_t$ chosen on the
discrete domain $\{1, \dots, C\}$, for $t = 1, \dots, T$

The graphical model of an HMM is exemplified in Fig. 31.9: the
hidden states are latent variables $S_t$, while the sequence
elements $Y_t$ are observed.

The conditional dependence expressed by the arrow $S_t \to Y_t$
indicates that the observed element of the sequence at time $t$ is
generated by the corresponding hidden state $S_t$ through the
emission distribution
$b_{s_t}(y_t) = P(Y_t = y_t \mid S_t = s_t)$. The unknown
state-transition dynamics is modeled by the first-order Markov
chain of discrete and finite hidden states $S_t$. By applying the
Markovian decomposition in (31.27) to the hidden states chain, the
joint distribution of the observed sequence
$\mathbf{y} = y_1, \dots, y_T$ and associated hidden states
$\mathbf{s} = s_1, \dots, s_T$ can be written as

$$P(\mathbf{Y} = \mathbf{y}, \mathbf{S} = \mathbf{s}) = P(S_1)\, P(Y_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1})\, P(Y_t \mid S_t). \tag{31.33}$$


The actual parameterization of the probabilities in (31.33) depends
on the form of the observation and hidden state variables. From
Sect. 31.4.1, a stationary hidden states chain is known to be
regulated by the $C \times C$ matrix of state transitions
$A_{ij} = P(S_t = i \mid S_{t-1} = j)$ and by the $C$-dimensional
vector of initial state probabilities $\pi_i = P(S_1 = i)$, where
$i, j$ are drawn from $\{1, \dots, C\}$. For discrete sequence
observations $y_t \in \{1, \dots, M\}$, the emission distribution
is an $M \times C$ emission matrix $B$ such that its elements are

$$b_i(k) = B_{ki} = P(Y_t = k \mid S_t = i). \tag{31.34}$$

For continuous observations $y_t$, the state assignment $S_t = i$
selects the $i$th emission distribution
$b_i(y_t) = P(Y_t \mid S_t = i)$ from a mixture of $C$ candidates.


An HMM is a latent variable model defined by the
$\theta = (\pi, A, B)$ parameters and, implicitly, by the (unknown)
number of hidden states $C$. In [31.58], three notable inference
problems are identified for an HMM.

Definition 31.5 Evaluation Problem
Given a model $\theta$ and an observed sequence $\mathbf{y}$,
determine the likelihood $P(\mathbf{Y} = \mathbf{y} \mid \theta)$ of
the sequence being generated by the model.

Definition 31.6 Learning Problem
Given a dataset of $N$ observed sequences
$\mathcal{D} = \{\mathbf{y}^1, \dots, \mathbf{y}^N\}$ and the number
of hidden states $C$, find the parameters $\pi$, $A$, and $B$ that
maximize the probability of model $\theta = \{\pi, A, B\}$ having
generated the sequences in $\mathcal{D}$.

Definition 31.7 Optimal States Problem
Given a model $\theta$ and an observed sequence $\mathbf{y}$, find
the optimal state assignment
$\mathbf{s}^* = s_1^*, \dots, s_T^*$ for the underlying hidden
Markov chain.

These classical inference problems are addressed using efficient
and numerically stable recursive algorithms that exploit message
passing on the HMM junction tree (Sect. 31.2.3) to factorize the
otherwise hardly tractable joint maximization problems. The
underlying intuition is a recursive computation of intermediate
probabilities (messages) that are passed forward and backward along
the sequence (the junction tree, in practice) to accumulate
evidence for solving the joint problem. A discussion of the key
aspects of these solutions is provided in the following.


Evaluation

The evaluation problem refers to measuring how well a given HMM
matches an observed sequence. Let the model be
$\theta = (\pi, A, B)$ and the observed sequence be
$\mathbf{y} = y_1, \dots, y_T$; the objective is to find
$P(\mathbf{Y} = \mathbf{y} \mid \theta)$. To compute this
probability under the HMM assumption, it is necessary to introduce
the hidden states assignment corresponding to the observed sequence
$\mathbf{y}$. Following the general approach for latent variable
models in (31.11), these are introduced through marginalization on
the joint assignment $\mathbf{s} = s_1, \dots, s_T$:

$$
\begin{aligned}
P(\mathbf{Y} \mid \theta) &= \sum_{\mathbf{s}} P(\mathbf{Y}, \mathbf{S} = \mathbf{s} \mid \theta) \\
&= \sum_{s_1, \dots, s_T} P(S_1)\, P(Y_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1})\, P(Y_t \mid S_t),
\end{aligned}
\tag{31.35}
$$

where the joint probability $P(\mathbf{Y}, \mathbf{S} \mid \theta)$
has been factorized according to the HMM assumption in (31.33).

Direct computation of (31.35) is generally infeasible, as it would
require $O(T C^T)$ operations. This probability can be efficiently
computed, with $O(T C^2)$ operations, through accumulation of
a recursive term that is computed by scanning the sequence from
left to right. The procedure is known as the forward algorithm: let
$Y_{1:t}$ be the observed subsequence from position 1 to $t$, and
define the forward probability as

$$\alpha_t(i) = P(Y_{1:t} = y_{1:t}, S_t = i \mid \theta), \tag{31.36}$$

that is, the probability of observing the partial sequence up to
position $t$ and the underlying hidden process being in state $i$
at time $t$. A recursive formulation of $\alpha_t(i)$ is obtained
by introducing the hidden state $S_{t-1}$ by marginalization,
yielding

$$
\begin{aligned}
\alpha_t(i) &= \sum_{j=1}^{C} P(Y_{1:t} = y_{1:t}, S_t = i, S_{t-1} = j \mid \theta) \\
&= \sum_{j=1}^{C} P(Y_t = y_t \mid S_t = i, \theta)\, P(S_t = i \mid S_{t-1} = j, \theta)\, P(Y_{1:t-1} = y_{1:t-1}, S_{t-1} = j \mid \theta) \\
&= b_i(y_t) \sum_{j=1}^{C} A_{ij}\, \alpha_{t-1}(j),
\end{aligned}
\tag{31.37}
$$


where the second equality follows from the conditional independence
assumptions of the model. Since $pa(S_t) = \{S_{t-1}\}$, the
distribution of the chain element $S_t$ is completely determined by
the hidden state at the previous time, $S_{t-1}$; similarly, the
emission $Y_t$ is conditionally independent of the rest, given the
hidden state $S_t$.

The forward recursion scans the observed sequence from left to
right and recursively computes the $\alpha_t(i)$ values in each
position $t = 1, \dots, T$ using (31.37). At each observed position
$t$, the $\alpha_t(i)$ values are computed for each
$i \in \{1, \dots, C\}$, since the hidden states are not observed.
The basis of the recursion is at $t = 1$, where (31.37) reduces to
$\alpha_1(i) = b_i(y_1)\, \pi_i$, such that $y_1$ is the first
element of the observed sequence. The likelihood of the full
sequence $\mathbf{y} = y_{1:T}$ is computed at the end of the
forward recursion as


C Cc 
P(¥|9) = X` PY i:r, Sr = 19) = Dari). 


i=l i=1 


(31.38) 
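The forward recursion in (31.37) and the likelihood in (31.38) can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `forward`, the array layout, and the toy two-state model are assumptions made here. The matrix convention follows (31.37), i.e., `A[i, j]` stands for $P(S_t = i \mid S_{t-1} = j)$, so each column of `A` sums to one.

```python
import numpy as np

def forward(pi, A, B, y):
    """Unscaled forward probabilities alpha[t, i] = P(y_1..y_t, S_t = i).

    pi[i]   : P(S_1 = i)
    A[i, j] : P(S_t = i | S_{t-1} = j), columns sum to one
    B[i, k] : b_i(k) = P(Y_t = k | S_t = i), discrete alphabet (assumed here)
    y       : sequence of symbol indices
    """
    T, C = len(y), len(pi)
    alpha = np.zeros((T, C))
    alpha[0] = B[:, y[0]] * pi                      # basis: alpha_1(i) = b_i(y_1) pi_i
    for t in range(1, T):
        alpha[t] = B[:, y[t]] * (A @ alpha[t - 1])  # recursion (31.37)
    return alpha

# Toy two-state model; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.2],
              [0.3, 0.8]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y = [0, 1, 1]

alpha = forward(pi, A, B, y)
likelihood = alpha[-1].sum()                        # sum over final states, (31.38)
```

The cost is $O(TC^2)$, against $O(C^T)$ for summing over every joint state assignment explicitly.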


Learning 

Learning of an HMM $\theta = (\pi, A, B)$ amounts to finding
the values of the parameters $\pi$, $A$, and $B$ that are
most likely to have generated a dataset of observed
i.i.d. sequences $D = \{y^1, \ldots, y^N\}$. From the evaluation
problem, we know how to measure the quality of the
match between a sequence $y$ and a model $\theta$ using the
likelihood $P(Y \mid \theta)$. The HMM learning problem can be
solved through ML estimation of the parameters $\theta$,
treating the hidden states as latent variables. As discussed
in Sect. 31.3.2, this problem can be solved by applying
the EM algorithm, whose HMM version is referred to as
the Baum-Welch algorithm [31.59], which is a form of the
sum-product inference algorithm introduced
in Sect. 31.2.3. Marginalization of the hidden states as
in (31.35) yields the HMM log-likelihood on the
dataset $D$

$$
\mathcal{L}(\theta) = \log \prod_{n=1}^{N} P(y^n \mid \theta)
= \sum_{n=1}^{N} \log \Bigg\{ \sum_{s_1^n, \ldots, s_{T_n}^n}
P(S_1^n)\, P(Y_1^n \mid S_1^n)
\prod_{t=2}^{T_n} P(S_t^n \mid S_{t-1}^n)\, P(Y_t^n \mid S_t^n) \Bigg\} ,
\tag{31.39}
$$


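where superscript $n$ refers to the $n$th sequence $y^n$ and $T_n$
is the corresponding length. The likelihood in (31.39)
is intractable due to the nonobservable state assignment
that introduces the marginalization term. Following the
principles of the EM algorithm, we assume to know the
unobserved state assignment, as in (31.30). This can be
achieved by introducing indicator variables $z_{ti}^n$ for the
unknown assignment, such that $z_{ti}^n = 1$ if the chain is
in state $i$ at position $t$ of the $n$th sequence, and it is 0
otherwise. Given this (assumed) knowledge about the
hidden state assignments, it is possible to write the
corresponding completed likelihood

$$
\begin{aligned}
\mathcal{L}_c(\theta) &= \log \prod_{n=1}^{N} \Bigg\{ \prod_{i=1}^{C}
\big[ P(S_1 = i)\, P(Y_1^n \mid S_1 = i) \big]^{z_{1i}^n} \\
&\qquad \times \prod_{t=2}^{T_n} \prod_{i,j=1}^{C}
P(S_t = i \mid S_{t-1} = j)^{z_{ti}^n z_{(t-1)j}^n}
\prod_{t=2}^{T_n} \prod_{i=1}^{C} P(Y_t^n \mid S_t = i)^{z_{ti}^n} \Bigg\} \\
&= \sum_{n=1}^{N} \Bigg\{ \sum_{i=1}^{C} z_{1i}^n \log \pi_i
+ \sum_{t=2}^{T_n} \sum_{i,j=1}^{C} z_{ti}^n z_{(t-1)j}^n \log A_{ij}
+ \sum_{t=1}^{T_n} \sum_{i=1}^{C} z_{ti}^n \log b_i(y_t^n) \Bigg\} ,
\end{aligned}
\tag{31.40}
$$

where the latter equality introduces the parameters $\theta$ in
place of the corresponding probabilities and brings the
logarithms into the products.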

The EM procedure is applied to the complete log-likelihood
in (31.40). Following (31.15), the E-step
computes the expected value of $\mathcal{L}_c(\theta)$ with respect
to the distribution of the indicator variables $Z = \{z_{ti}^n\}$,
conditional on the observed sequences $D$ and the current
estimate of the parameters $\theta^{(k)}$. Given $\mathcal{L}_c(\theta)$ as in
(31.40), taking its conditional expectation with respect
to the hidden variables $Z$ yields the following posterior
probability:

$$
E_{Z \mid Y, \theta^{(k)}} [z_{ti}] = P(S_t = i \mid y) ,
\tag{31.41}
$$

where superscript $n$ is omitted for notational simplicity.
The estimation of this posterior is known as the
smoothing problem. In the Baum-Welch algorithm, this
is efficiently solved by a double recursion that exploits
the following decomposition of the joint probability

$$
\begin{aligned}
P(S_t = i, y) &= P(S_t = i, Y_{1:t}, Y_{t+1:T}) \\
&= P(S_t = i, Y_{1:t})\, P(Y_{t+1:T} \mid S_t = i) = \alpha_t(i)\, \beta_t(i) ,
\end{aligned}
\tag{31.42}
$$


where the observed contribution from the predecessors
of $t$ (i.e., $Y_{1:t}$) is separated from that of its successors
(i.e., $Y_{t+1:T}$). The cancellations in (31.42) follow
from the fact that $S_t$ d-separates (see definition in
Sect. 31.2.1) the elements of the two subsequences, i.e.,
$Y_{1:t}$ and $Y_{t+1:T}$.


Part D | Neural Networks


The first term in (31.42) is the $\alpha_t(i)$ probability
defined in (31.36), which can be computed through
the forward algorithm. The $\beta_t(i)$ term can also be
computed through a recursive procedure known as the
backward algorithm, due to the inverted direction with
respect to the forward recursion. Consider the following
recursive decomposition

$$
\begin{aligned}
\beta_{t-1}(j) &= P(Y_{t:T} \mid S_{t-1} = j) \\
&= \sum_{i=1}^{C} P(Y_{t:T}, S_t = i \mid S_{t-1} = j) \\
&= \sum_{i=1}^{C} P(Y_t \mid S_t = i)\, P(Y_{t+1:T} \mid S_t = i)\, P(S_t = i \mid S_{t-1} = j) \\
&= \sum_{i=1}^{C} b_i(y_t)\, \beta_t(i)\, A_{ij} ;
\end{aligned}
\tag{31.43}
$$

it can be computed for $2 \le t \le T$ by scanning the sequence
backward, assuming $\beta_T(j) = 1$ for each
$j \in \{1, \ldots, C\}$.
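The backward recursion in (31.43) can be sketched in the same style as the forward pass. The function name `backward` and the toy model below are illustrative assumptions; `A[i, j]` again denotes $P(S_t = i \mid S_{t-1} = j)$ and array index `t` is zero-based.

```python
import numpy as np

def backward(A, B, y):
    """beta[t, j] = P(y_{t+1}..y_T | S_t = j), with beta = 1 at the last position."""
    T, C = len(y), A.shape[0]
    beta = np.ones((T, C))
    for t in range(T - 1, 0, -1):
        # (31.43): beta_{t-1}(j) = sum_i b_i(y_t) beta_t(i) A_ij
        beta[t - 1] = A.T @ (B[:, y[t]] * beta[t])
    return beta

# Toy two-state model; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.2],
              [0.3, 0.8]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y = [0, 1, 1]

beta = backward(A, B, y)
# The sequence likelihood can also be read off at the first position.
likelihood = np.sum(pi * B[:, y[0]] * beta[0])
```

Reading the likelihood from the first position gives the same value as summing the final forward probabilities, which is a convenient consistency check between the two recursions.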

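The final expression of the smoothed posterior in
(31.41) is given by the joint $\alpha$-$\beta$ recursions, known as
the forward-backward algorithm, that is

$$
\gamma_t(i) = P(S_t = i \mid Y) = \frac{P(S_t = i, Y)}{P(Y)}
= \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{C} \alpha_t(j)\, \beta_t(j)} .
\tag{31.44}
$$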


Note that the forward and backward recursions can be
run in parallel, since the values of $\alpha$ and $\beta$ do not
depend on each other. To complete the derivation of the
sufficient statistics for the M-step, it is also necessary to
estimate the joint posterior

$$
E_{Z \mid Y, \theta^{(k)}} [z_{ti}\, z_{(t-1)j}] = P(S_t = i, S_{t-1} = j \mid Y) ,
\tag{31.45}
$$


which can be straightforwardly factorized into known
probabilities along the lines of (31.42). It turns out that
this joint posterior can be estimated using the $\alpha$-$\beta$
probabilities computed by the forward-backward
algorithm, that is

$$
\gamma_{t,t-1}(i,j) = P(S_t = i, S_{t-1} = j \mid Y)
= \frac{\alpha_{t-1}(j)\, A_{ij}\, b_i(y_t)\, \beta_t(i)}
       {\sum_{m,l=1}^{C} \alpha_{t-1}(m)\, A_{lm}\, b_l(y_t)\, \beta_t(l)} .
\tag{31.46}
$$
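As an illustration of (31.44) and (31.46), the sketch below combines the two recursions to obtain the smoothed posteriors. It is a minimal, unscaled version suitable only for short sequences; the function names, the array `xi` used for $\gamma_{t,t-1}(i,j)$, and the toy model are assumptions made for this example.

```python
import numpy as np

def forward_backward(pi, A, B, y):
    """Unscaled alpha and beta tables; A[i, j] = P(S_t = i | S_{t-1} = j)."""
    T, C = len(y), len(pi)
    alpha, beta = np.zeros((T, C)), np.ones((T, C))
    alpha[0] = B[:, y[0]] * pi
    for t in range(1, T):
        alpha[t] = B[:, y[t]] * (A @ alpha[t - 1])
    for t in range(T - 1, 0, -1):
        beta[t - 1] = A.T @ (B[:, y[t]] * beta[t])
    return alpha, beta

def posteriors(pi, A, B, y):
    alpha, beta = forward_backward(pi, A, B, y)
    evidence = alpha[-1].sum()                       # P(y), as in (31.38)
    gamma = alpha * beta / evidence                  # (31.44)
    # xi[t-1, i, j] = P(S_t = i, S_{t-1} = j | y) for t = 2..T, as in (31.46)
    xi = np.array([
        (B[:, y[t]] * beta[t])[:, None] * A * alpha[t - 1][None, :]
        for t in range(1, len(y))
    ]) / evidence
    return gamma, xi

# Toy two-state model; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.2],
              [0.3, 0.8]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y = [0, 1, 1]

gamma, xi = posteriors(pi, A, B, y)
```

Summing `xi` over either state index recovers the single-state posteriors at the corresponding positions, which is a useful sanity check on an implementation.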


Parameters $\theta = (\pi, A, B)$ are re-estimated at the M-step,
with update equations that follow straightforwardly
from the maximization problem in (31.16). It
suffices to differentiate (31.40), extended with appropriate
Lagrange multipliers to account for the sum-to-one
constraints. Intuitively, the update equations can be
written directly from the ML estimates for
observable Markov chains in (31.31) and (31.32): it
suffices to replace the observed state counts, obtained
through the indicator function $\delta(\cdot)$, with the virtual
counts $\gamma(\cdot)$ estimated by (31.44) and (31.46) at the E-step.
For the hidden state transition and initial state
distributions this yields

$$
A_{ij} = \frac{\sum_{n=1}^{N} \sum_{t=2}^{T_n} \gamma_{t,t-1}^{n}(i,j)}
              {\sum_{n=1}^{N} \sum_{t=2}^{T_n} \gamma_{t-1}^{n}(j)}
\quad \text{and} \quad
\pi_i = \frac{1}{N} \sum_{n=1}^{N} \gamma_1^{n}(i) .
\tag{31.47}
$$


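The estimate of the parameters $B$ depends on the form
of the emission distribution: if the observed sequences
take values $k$ from a finite alphabet $\{1, \ldots, M\}$, the
corresponding multinomial emission in (31.34) is updated
by

$$
b_i(k) = \frac{\sum_{n=1}^{N} \sum_{t=1}^{T_n} \gamma_t^{n}(i)\, \delta(y_t^n = k)}
              {\sum_{n=1}^{N} \sum_{t=1}^{T_n} \gamma_t^{n}(i)} ,
\tag{31.48}
$$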
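The M-step updates (31.47) and (31.48) amount to normalizing expected counts. The sketch below assumes the posteriors have already been computed at the E-step; the function name `m_step` and the argument layout are hypothetical. To keep the example self-contained, it is exercised with hard (one-hot) posteriors built from a known state sequence, in which case the updates reduce to the plain counting estimates for an observable Markov chain.

```python
import numpy as np

def m_step(gammas, xis, ys, M):
    """Re-estimate (pi, A, B) from per-sequence posteriors.

    gammas[n][t, i]   : gamma_t(i) for sequence n
    xis[n][t-1, i, j] : gamma_{t,t-1}(i, j) for sequence n, t = 2..T_n
    ys[n]             : symbol indices of sequence n, alphabet size M
    Assumes every state has nonzero expected occupancy (no division by zero).
    """
    C = gammas[0].shape[1]
    pi = sum(g[0] for g in gammas) / len(gammas)
    num_A = sum(x.sum(axis=0) for x in xis)           # expected transitions j -> i
    den_A = sum(g[:-1].sum(axis=0) for g in gammas)   # expected visits to j before a move
    A = num_A / den_A[None, :]                        # columns sum to one
    num_B = np.zeros((C, M))
    for g, y in zip(gammas, ys):
        for t, k in enumerate(y):
            num_B[:, k] += g[t]                       # expected emissions of k from each state
    B = num_B / num_B.sum(axis=1, keepdims=True)
    return pi, A, B

# Hard assignments: with one-hot posteriors the updates become simple counts.
states = [0, 0, 1, 1]
y = [0, 0, 1, 0]
gamma = np.eye(2)[states]
xi = np.array([np.outer(gamma[t], gamma[t - 1]) for t in range(1, len(states))])

pi_new, A_new, B_new = m_step([gamma], [xi], [y], M=2)
```

In a full Baum-Welch loop, `gammas` and `xis` would instead come from the forward-backward pass under the current parameters, and the E-step and M-step would alternate until the log-likelihood stops improving.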


where $\delta(\cdot)$ is the indicator function counting the
occurrences of the symbol $k$ in the observed sequences.
Real-valued sequences are usually modeled through
Gaussian emissions, whose parameters are fit as usual
through maximization of the complete log-likelihood.

Particular care must be taken to avoid numerical
problems when implementing the forward-backward
algorithm. Both recursions multiply many small
numbers; hence, the values of $\alpha$ and $\beta$ can
underflow for long sequences. To avoid this, it is advisable
to perform the recursions in log-space or to work with scaled
versions of the $\alpha$ and $\beta$ probabilities [31.60]. A sequential
version of the smoothing algorithm exists [31.61]
that directly computes the smoothed posterior $\gamma_t(i) =
P(S_t = i \mid Y)$ through a $\gamma$-recursion that uses the $\alpha$
values generated by the forward algorithm.
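One common way to realize the scaling mentioned above is to normalize the forward vector at every position and accumulate the logarithms of the normalizers. The sketch below is an illustration of that idea under assumed names (`forward_scaled`, `alpha_hat`), not the specific procedure of [31.60]; the toy model and the random sequence are made up.

```python
import numpy as np

def forward_scaled(pi, A, B, y):
    """Scaled forward pass.

    Returns alpha_hat[t, i] = P(S_t = i | y_1..y_t) and log P(y).
    A[i, j] = P(S_t = i | S_{t-1} = j); B[i, k] = P(Y_t = k | S_t = i).
    """
    T, C = len(y), len(pi)
    alpha_hat = np.zeros((T, C))
    log_lik = 0.0
    for t in range(T):
        predicted = pi if t == 0 else A @ alpha_hat[t - 1]
        unnorm = B[:, y[t]] * predicted
        c = unnorm.sum()              # c_t = P(y_t | y_1..y_{t-1})
        alpha_hat[t] = unnorm / c     # each row sums to one, so no underflow
        log_lik += np.log(c)          # log P(y) = sum_t log c_t
    return alpha_hat, log_lik

# Toy two-state model; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.2],
              [0.3, 0.8]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

rng = np.random.default_rng(0)
y_long = rng.integers(0, 2, size=10_000)
alpha_hat, log_lik = forward_scaled(pi, A, B, y_long)
```

For a sequence of this length the unscaled likelihood is far below the smallest positive double, so it would evaluate to zero, whereas `log_lik` remains a finite number.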


Optimal State 
Once a model $\theta$ has been trained, it can be interesting
to determine the most likely hidden state assignment
$s^*$ that has generated an observed sequence $y$. This
inference problem, also known as decoding, has different
solutions, since several optimal assignments exist,
depending on the interpretation of what an optimal
assignment is. For instance, the optimal hidden sequence
can be the one maximizing the expected count of correct
states. On the other hand, an optimal assignment might
be the sequence of hidden states $s^*$ with the maximum
joint probability $P(Y = y, S = s^*)$.

The former optimality condition is solved by selecting,
at each position $t$, the most likely state given the
sequence, i.e.,

$$
s_t^* = \arg\max_{i} P(S_t = i \mid Y) .
\tag{31.49}
$$

Clearly, this amounts to selecting the most likely state for
each position independently, using the posterior computed
by the Baum-Welch algorithm. Conversely, the
latter optimality condition estimates the joint hidden
state assignment

$$
s^* = \arg\max_{s} P(Y, S = s) .
\tag{31.50}
$$


This is a complex inference problem that can be
efficiently solved through a dynamic programming
approach, known as the Viterbi algorithm. Note that the
two optimality definitions generally lead to different
solutions. For instance, the Viterbi solution is constrained
to provide only state transitions allowed by the generating
distribution, while this is not the case for the
Baum-Welch solution, given that hidden states are selected
independently.

The Viterbi algorithm is based on a backward recursion
that exploits a factorization of the maximization
problem in (31.50). Consider the restricted problem of
determining the hidden state of the tail element $T$

$$
\begin{aligned}
\max_{s_T} P(Y, S_T = s_T) &= \max_{s_T} \prod_{t=1}^{T} P(Y_t \mid S_t)\, P(S_t \mid S_{t-1}) \\
&= \prod_{t=1}^{T-1} P(Y_t \mid S_t)\, P(S_t \mid S_{t-1})\;
   \max_{s_T} P(Y_T \mid S_T)\, P(S_T \mid S_{T-1}) ,
\end{aligned}
\tag{31.51}
$$


where the joint probability factorizes according to the
Markov chain assumption. We can isolate the maximization
problem in the rightmost term

$$
\epsilon_{T-1}(s_{T-1}) = \max_{s_T} P(Y_T \mid S_T = s_T)\,
P(S_T = s_T \mid S_{T-1} = s_{T-1}) ,
\tag{31.52}
$$


that is a message conveying information on the maximization
of the tail element to the penultimate position.
Substituting the definition of $\epsilon_{T-1}(s_{T-1})$ back
in (31.51) and adding the maximization with respect
to $s_{T-1}$ suggests the recursive formulation of $\epsilon_t(\cdot)$ for
a generic position $t - 1$, i.e.,

$$
\epsilon_{t-1}(s_{t-1}) = \max_{s_t} P(Y_t \mid S_t = s_t)\,
P(S_t = s_t \mid S_{t-1} = s_{t-1})\, \epsilon_t(s_t) ,
\tag{31.53}
$$


for $2 \le t \le T$, where $\epsilon_T(s_T) = 1$ is the basis of the
recursion. At each step $t$ of the backward recursion, the
Viterbi algorithm computes the $\epsilon$-message for each possible
assignment of the hidden state of $t$ and propagates
it to the predecessor $t - 1$. The recursion ends at the initial
element of the sequence, where the initial optimal
state is obtained as

$$
s_1^* = \arg\max_{s} P(Y_1 \mid S_1 = s)\, P(S_1 = s)\, \epsilon_1(s) .
\tag{31.54}
$$


The assignment of the remaining hidden states is obtained
by backtracking through the forward recursion

$$
s_t^* = \arg\max_{s} P(Y_t \mid S_t = s)\,
P(S_t = s \mid S_{t-1} = s_{t-1}^*)\, \epsilon_t(s) .
\tag{31.55}
$$

Note that the Viterbi algorithm is a special case
of a max-sum inference algorithm introduced in
Sect. 31.2.3.
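The backward message passing of (31.53) followed by the forward assignment of (31.54) and (31.55) can be sketched as follows. This is a minimal probability-domain version for short sequences (a practical implementation would typically work with logarithms to avoid underflow); the function name `viterbi` and the toy model are assumptions made here, with `A[i, j]` standing for $P(S_t = i \mid S_{t-1} = j)$.

```python
import numpy as np

def viterbi(pi, A, B, y):
    """Most likely joint state assignment via backward max-messages."""
    T, C = len(y), len(pi)
    eps = np.ones((T, C))                        # basis: eps_T(s) = 1
    for t in range(T - 1, 0, -1):
        # (31.53): eps_{t-1}(j) = max_i b_i(y_t) A_ij eps_t(i)
        eps[t - 1] = np.max((B[:, y[t]] * eps[t])[:, None] * A, axis=0)
    s = np.zeros(T, dtype=int)
    s[0] = np.argmax(B[:, y[0]] * pi * eps[0])   # (31.54)
    for t in range(1, T):
        # (31.55): condition on the state already chosen at t-1
        s[t] = np.argmax(B[:, y[t]] * A[:, s[t - 1]] * eps[t])
    return s

# Toy two-state model; all numbers are made up for illustration.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.2],
              [0.3, 0.8]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y = [0, 1, 1]

best_path = viterbi(pi, A, B, y)
```

Because each state is chosen conditionally on its predecessor, the returned path never uses a transition with zero probability, in contrast to position-wise decoding with (31.49).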


31.4.3 Related Models 


Higher Order Markov Models 

Hidden Markov models serve as a starting point for the
design of more complex Markov generative processes,
besides the obvious extension to higher order hidden 
chains [31.62]. Factorial HMMs [31.63] generalize the 
original model by defining super states that are collec- 
tions of K discrete hidden states, each being part of an 
independent Markov chain (see Fig. 31.10). This facto- 
rial model results in K hidden Markov chains running 
in parallel: at each time step, the emission depends on 
the K-dimensional super state, but each state variable is 
decoupled from those of the other chains and evolves 
according to its own dynamics. By this means, it is 
possible to efficiently encode the state dynamics of K 
objects evolving independently that interact to jointly 
determine the observation (e.g., K cars moving in the 
traffic and jointly determining traffic jams). 




Fig. 31.10 Factorial HMM with K = 3 independent hidden 
Markov chains 


Fig. 31.11 A bottom-up hidden tree Markov model for 
a simple structure with five nodes: the generative process 
follows the direction of the arrows, i.e., from the leaves to 
the root (t = 1) 


Nonhomogeneous HMMs
Relaxation of the homogeneity assumption led to the
input/output hidden Markov model (IO-HMM) [31.64],
which allows modeling the causal dependence of the hidden
generative process on an additional input sequence x.
Basically, the IO-HMM enables nonhomogeneous
transition and emission distributions that are explicitly
dependent on (i.e., parameterized by) the currently
observed label of the input sequence. An IO-HMM
implements a mapping, referred to as transduction, from
an observed input sequence x into an output (target)
sequence y, realized by the input-conditional hidden
process P(Y|X). Interesting applications of IO-HMMs
are in learning transformations between modalities in
multimedia data [31.65], exploratory analysis of financial
time series [31.66], and gene data analysis [31.67].


HMMs for Structured Data

Hidden tree Markov models represent the generative
process of more complex, tree-structured information
(see Fig. 31.11). Differently from the sequential domain,
the direction of the generative process leads
to different representational capabilities when dealing
with trees. Top-down approaches [31.68] model all possible
paths from the root to the leaves of the tree.
Bottom-up models [31.69] propose a generative process
from the leaves to the root, where complex structures
are generated by composition of simpler substructures.
Recently, an extension of the IO-HMM has been proposed
to learn transductions between trees [31.70].


Bayesian and Nonparametric Extensions
HMMs have been extended to allow a countably infinite
number of hidden states through a Bayesian approach
where state distributions are modeled by Dirichlet
processes [31.71]. Abstracting from the direction of the
arrows in Fig. 31.9 leads to a discriminative probabilistic
model known as linear-chain conditional random
fields [31.72], whose capability to model long-term
dependences is widely used in natural language parsing
and computer vision.


31.5 Conclusion and Further Reading


Graphical models have been discussed as an excellent
framework for probabilistic modeling of articulated
processes that can be described by a static set of random
variables tied together by probabilistic relationships.
Such relationships need not be known a priori.
Several approaches exist to infer them from
data, i.e., to determine the presence of a corresponding
edge in the graphical model. However, the same
approaches tend to fix the structure of the graphical
model once it has been determined from the data. In other
words, these graphical models represent a static picture
of the process, where the set of random variables and
associated relationships is held fixed from a point onward.
The nature of sequence data calls for the ability to
model more dynamic phenomena. Processing of video
information requires Markov networks that can unfold
their structure across the video sequence. Even classic
text analysis needs to account for novel generative dynamics,
where texts are produced as dynamic streams
instead of being static collections of words; consider, e.g.,
blog posts and associated comments, or the stream
of social network status updates. Therefore, the horizon
of current research is pushing graphical models toward
more dynamic formulations where, on the one hand, the
structure is allowed to change over time and, on the
other hand, the model is allowed to dynamically self-tune
the number of parameters that is most adequate
to represent the process at each time. Following the
intuitions underlying the HMM approach, dynamical
graphical models are being proposed that are capable
of unfolding their structure across time, to better model
the dynamics of complex time-varying processes. At
the same time, concepts from nonparametric Bayesian
statistics are being used to develop models where latent
variables can be dynamically adjusted to sample from
a virtually infinite set of events and where the very
structure of the latent space is adapted across time,
i.e., through variable addition and pruning. Such a new
class of dynamic graphical models introduces novel
computational challenges associated with inference and
representation of dynamic knowledge. The answers to
these challenges can be partly found in this chapter, in
the approximate inference methods discussed for static
models and in the principles underlying the unfolding
of Markov chains. Finally, it is worth noting that
deep learning, described in Chap. 2, is an instance of
graphical model where both nonlinearity and dynamic
representations play an important role.


References


31.1 S. Kullback, R.A. Leibler: On information and sufficiency,
Ann. Math. Stat. 22, 79-86 (1951)

31.2 F. Rosenblatt: The perceptron: A probabilistic
model for information storage and organization in
the brain, Psychol. Rev. 65, 386-408 (1958)

31.3 G. Deco, W. Finnoff, H.G. Zimmermann: Unsupervised
mutual information criterion for elimination
of overtraining in supervised multilayer networks,
Neural Comput. 7, 86-107 (1995)

31.4 D.J.C. Mackay: Information Theory, Inference and
Learning Algorithms (Cambridge Univ. Press, Cambridge 2003)

31.5 R. Salakhutdinov, G. Hinton: Using deep belief nets
to learn covariance kernels for Gaussian processes,
Adv. Neural Inf. Process. Syst. 20, 1249-1256 (2008)

31.6 C.M. Bishop: Pattern Recognition and Machine
Learning (Springer, New York 2006)

31.7 S. Seth, J.C. Principe: Variable selection: A statistical
dependence perspective, Proc. Int. Conf. Mach.
Learn. Appl. (ICMLA) (2010)

31.8 M. Rao, S. Seth, J. Xu, Y. Chen, H. Tagare, J.C. Principe:
A test of independence based on a generalized
correlation function, Signal Process. 91, 15-27 (2011)

31.9 D.D. Lee, H.S. Seung: Learning the parts of objects
by non-negative matrix factorization, Nature
401(6755), 788-791 (1999)

31.10 P. Comon, C. Jutten: Handbook of Blind Source
Separation (Academic, Oxford 2010)

31.11 A. Hyvärinen, J. Karhunen, E. Oja: Independent
Component Analysis (Wiley, New York 2001)

31.12 A. Cichocki, R. Zdunek, A.H. Phan, S.-I. Amari:
Nonnegative Matrix Tensor Factorizations (Wiley,
Chichester 2009)

31.13 E. Gaussier, C. Goutte: Relation between plsa and
nmf and implications, Proc. 28th Int. ACM Conf.
Res. Dev. Inf. Retr. (SIGIR'05) (ACM, New York 2005)
pp. 601-602

31.14 D.T. Pham: Mutual information approach to blind
separation of stationary sources, IEEE Trans. Inf.
Theory 48, 1935-1946 (2002)


31.15 M. Minami, S. Eguchi: Robust blind source separation
by beta divergence, Neural Comput. 14,
1859-1886 (2002)

31.16 T.-W. Lee, M. Girolami, T.J. Sejnowski: Independent
component analysis using an extended infomax
algorithm for mixed sub-Gaussian and super-Gaussian
sources, Neural Comput. 11(2), 417-441 (1999)

31.17 K. Labusch, E. Barth, T. Martinetz: Sparse coding
neural gas: Learning of overcomplete data representations,
Neurocomputing 72(7-9), 1547-1555 (2009)

31.18 A. Cichocki, S. Cruces, S.-I. Amari: Generalized
alpha-beta divergences and their application to
robust nonnegative matrix factorization, Entropy
13, 134-170 (2011)

31.19 I. Csiszár: Axiomatic characterization of information
measures, Entropy 10, 261-273 (2008)

31.20 F. Liese, I. Vajda: On divergences and informations
in statistics and information theory, IEEE Trans. Inf.
Theory 52(10), 4394-4412 (2006)

31.21 T. Villmann, S. Haase: Divergence based vector
quantization, Neural Comput. 23(5), 1343-1392 (2011)

31.22 P.L. Zador: Asymptotic quantization error of continuous
signals and the quantization dimension, IEEE
Trans. Inf. Theory 28, 149-159 (1982)

31.23 T. Villmann, J.-C. Claussen: Magnification control in
self-organizing maps and neural gas, Neural Comput.
18(2), 446-469 (2006)

31.24 B. Hammer, A. Hasenfuss, T. Villmann: Magnification
control for batch neural gas, Neurocomputing
70(7-9), 1225-1234 (2007)

31.25 E. Merényi, A. Jain, T. Villmann: Explicit magnification
control of self-organizing maps for "forbidden"
data, IEEE Trans. Neural Netw. 18(3), 786-797 (2007)

31.26 T. Villmann, S. Haase: Magnification in divergence
based neural maps, Proc. Int. Jt. Conf. Artif. Neural
Netw. (IJCNN 2011), ed. by R. Mikkulainen (IEEE, Los
Alamitos 2011) pp. 437-441


31.27 R. Chalasani, J.C. Principe: Self organizing maps
with the correntropy induced metric, Proc. Int.
Jt. Conf. Artif. Neural Netw. (IJCNN 2010) (IEEE,
Barcelona 2010) pp. 1-6

31.28 T. Lehn-Schiøler, A. Hegde, D. Erdogmus, J.C. Principe:
Vector quantization using information theoretic
concepts, Nat. Comput. 4(1), 39-51 (2005)

31.29 R. Jenssen, D. Erdogmus, J.C. Principe, T. Eltoft: The
Laplacian PDF distance: A cost function for clustering
in a kernel feature space, Adv. Neural Inf.
Process. Syst., Vol. 17 (MIT Press, Cambridge 2005)
pp. 625-632

31.30 A. Hegde, D. Erdogmus, T. Lehn-Schiøler, Y.N. Rao,
J.C. Principe: Vector quantization by density
matching in the minimum Kullback-Leibler-divergence
sense, Proc. Int. Jt. Conf. Artif. Neural
Netw. (IJCNN), Budapest (IEEE, New York 2004)
pp. 105-109

31.31 G.E. Hinton, S.T. Roweis: Stochastic neighbor embedding,
Adv. Neural Inf. Process. Syst., Vol. 15 (MIT
Press, Cambridge 2002) pp. 833-840

31.32 L. van der Maaten, G. Hinton: Visualizing data using
t-SNE, J. Mach. Learn. Res. 9, 2579-2605 (2008)

31.33 K. Bunte, S. Haase, M. Biehl, T. Villmann: Stochastic
neighbor embedding (SNE) for dimension reduction
and visualization using arbitrary divergences,
Neurocomputing 90(9), 23-45 (2012)

31.34 M. Strickert, F.-M. Schleif, U. Seiffert, T. Villmann:
Derivatives of pearson correlation for gradient-based
analysis of biomedical data, Intel. Artif. Rev.
an Iberoam. Intel. Artif. 37, 37-44 (2008)

31.35 M. Strickert, B. Labitzke, A. Kolb, T. Villmann: Multispectral
image characterization by partial generalized
covariance, Proc. Eur. Symp. Artif. Neural Netw.
(ESANN'2011), Louvain-La-Neuve, ed. by M. Verleysen
(2011) pp. 105-110

31.36 V. Gómez-Verdejo, M. Verleysen, J. Fleury: Information-theoretic
feature selection for functional data
classification, Neurocomputing 72(16-18), 3580-3589 (2009)

31.37 B. Hammer, T. Villmann: Generalized relevance
learning vector quantization, Neural Netw. 15(8/9),
1059-1068 (2002)

31.38 T. Villmann, M. Kästner: Sparse functional relevance
learning in generalized learning vector
quantization, Lect. Notes Comput. Sci. 6731, 79-89 (2011)

31.39 M. Kästner, B. Hammer, M. Biehl, T. Villmann: Functional
relevance learning in generalized learning
vector quantization, Neurocomputing 90(9), 85-95 (2012)

31.40 A. Kraskov, H. Stogbauer, P. Grassberger: Estimating
mutual information, Phys. Rev. E 69(6), 66-138 (2004)

31.41 Y.-I. Moon, B. Rajagopalan, U. Lall: Estimating
mutual information by kernel density estimators,
Phys. Rev. E 52, 2318-2321 (1995)

31.42 J.C. Principe: Information Theoretic Learning
(Springer, Heidelberg, 2010)

31.43 R. Andonie, A. Cataron: An information energy LVQ
approach for feature ranking, Eur. Symp. Artif. Neural
Netw. 2004, ed. by M. Verleysen (d-side, Evere
2004) pp. 471-476

31.44 R. Jenssen, D. Erdogmus, J.C. Principe, T. Eltoft:
Some equivalences between kernel methods and
information theoretic methods, J. VLSI Signal Process.
45, 49-65 (2006)

31.45 P.J.G. Lisboa, T.A. Etchells, I.H. Jarman, C.T.C. Arsene,
M.S.H. Aung, A. Eleuteri, A.F.G. Taktak,
F. Ambrogi, P. Boracchi, E. Biganzoli: Partial logistic
artificial neural network for competing risks
regularized with automatic relevance determination,
IEEE Trans. Neural Netw. 20(9), 1403-1416 (2009)

31.46 M.I. Jordan: Graphical models, Stat. Sci. 19, 140-155 (2004)

31.47 D. Koller, N. Friedman: Probabilistic Graphical
Models: Principles and Techniques - Adaptive
Computation and Machine Learning (MIT Press,
Cambridge 2009)

31.48 A.P. Dempster, N.M. Laird, D.B. Rubin: Maximum
likelihood from incomplete data via the EM algorithm,
J. R. Stat. Soc. Ser. B 39(1), 1-38 (1977)

31.49 M.E. Tipping, C.M. Bishop: Probabilistic principal
component analysis, J. R. Stat. Soc. Ser. B 61(3),
611-622 (1999)

31.50 T. Hofmann: Unsupervised learning by probabilistic
latent semantic analysis, Mach. Learn. 42(1/2),
177-196 (2001)

31.51 M. Welling, C. Chemudugunta, N. Sutter: Deterministic
latent variable models and their pitfalls, SIAM
Int. Conf. Data Min. (2008)

31.52 D.M. Blei, A.Y. Ng, M.I. Jordan: Latent Dirichlet
allocation, J. Mach. Learn. Res. 3, 993-1022 (2003)

31.53 T. Minka, J. Lafferty: Expectation propagation for
the generative aspect model, Proc. Conf. Uncertain.
AI (2002)

31.54 T. Griffiths, M. Steyvers: Finding scientific topics,
Proc. Natl. Acad. Sci. USA 101, 5228-5235 (2004)

31.55 M. Blei, D. Blei, T. Griffiths, J. Tenenbaum: Hierarchical
topic models and the nested Chinese
restaurant process, Adv. Neural Inf. Process. Syst.,
Vol. 16 (MIT Press, Cambridge 2004) p. 17

31.56 M. Rosen-Zvi, T. Griffiths, M. Steyvers, P. Smyth: The
author-topic model for authors and documents,
Proc. 20th Conf. Uncertain. Artif. Intell., UAI '04
(AUAI, Corvallis 2004) pp. 487-494

31.57 L.-J. Li, L. Fei-Fei: What, where and who? classifying
events by scene and object recognition, IEEE
11th Int. Conf. Comput. Vis. (ICCV) 2007 (2007), pp. 1-8

31.58 L.R. Rabiner: A tutorial on hidden markov models
and selected applications in speech recognition,
Proc. IEEE 77(2), 257-286 (1989)

31.59 L.E. Baum, T. Petrie: Statistical inference for probabilistic
functions of finite state Markov chains, Ann.
Math. Stat. 37(6), 1554-1563 (1966)


31.60 S.E. Levinson, L.R. Rabiner, M.M. Sondhi: An introduction
to the application of the theory of probabilistic
functions of a Markov process to automatic
speech recognition, Bell Syst. Tech. J. 62(4),
1035-1074 (1983)

31.61 P.A. Devijver: Baum's forward-backward algorithm
revisited, Pattern Recogn. Lett. 3(6), 369-373 (1985)

31.62 M. Brand, N. Oliver, A. Pentland: Coupled hidden
Markov models for complex action recognition,
Computer Vision and Pattern Recognition, Proc.,
1997 IEEE (1997) pp. 994-999

31.63 Z. Ghahramani, M.I. Jordan: Factorial hidden
Markov models, Mach. Learn. 29(2), 245-273 (1997)

31.64 Y. Bengio, P. Frasconi: Input-output HMMs for sequence
processing, IEEE Trans. Neural Netw. 7(5),
1231-1249 (1996)

31.65 Y. Li, H.Y. Shum: Learning dynamic audio-visual
mapping with input-output hidden Markov models,
IEEE Trans. Multimed. 8(3), 542-549 (2006)

31.66 B. Knab, A. Schliep, B. Steckemetz, B. Wichern:
Model-based clustering with hidden Markov models
and its application to financial time-series
data, Proc. GfKI 2002 Data Sci. Appl. Data Anal.
(Springer, Berlin, Heidelberg 2003) pp. 561-569

31.67 M. Seifert, M. Strickert, A. Schliep, I. Grosse: Exploiting
prior knowledge and gene distances in
the analysis of tumor expression profiles with
extended hidden Markov models, Bioinformatics
27(12), 1645-1652 (2011)

31.68 M. Diligenti, P. Frasconi, M. Gori: Hidden tree
markov models for document image classification,
IEEE Trans. Pattern Anal. Mach. Intell. 25(4),
519-523 (2003)

31.69 D. Bacciu, A. Micheli, A. Sperduti: Compositional
generative mapping for tree-structured data -
Part I: Bottom-up probabilistic modeling of trees,
IEEE Trans. Neural Netw. Learn. Syst. 23(12),
1987-2002 (2012)

31.70 D. Bacciu, A. Micheli, A. Sperduti: An input-output
hidden Markov model for tree transductions,
Neurocomputing 112, 34-46 (2013)

31.71 M.J. Beal, Z. Ghahramani, C.E. Rasmussen: The infinite
hidden Markov model, Adv. Neural Inf. Process.
Syst. 14, 577-584 (2002)

31.72 C. Sutton, A. McCallum: An introduction to conditional
random fields for relational learning. In:
Introduction to Statistical Relational Learning, ed.
by L. Getoor, B. Taskar (MIT Press, Cambridge 2006)
pp. 93-128


32. Kernel Methods

Marco Signoretto, Johan A. K. Suykens


This chapter addresses the study of kernel methods,
a class of techniques that play a major role
in machine learning and nonparametric statistics.
Among others, these methods include support vector
machines (SVMs) and least squares SVMs, kernel
principal component analysis, kernel Fisher discriminant
analysis, and Gaussian processes. The
use of kernel methods is systematic and properly
motivated by statistical principles. In practical
applications, kernel methods lead to flexible predictive
models that often outperform competing
approaches in terms of generalization performance.
The core idea consists of mapping data
into a high-dimensional space by means of a feature
map. Since the feature map is normally chosen
to be nonlinear, a linear model in the feature space
corresponds to a nonlinear rule in the original domain.
This fact suits many real-world data analysis
problems that often require nonlinear models to
describe their structure.

In Sect. 32.1 we present historical notes and
summarize the main ingredients of kernel methods.
In Sect. 32.2 we present the core ideas of
statistical learning and show how regularization
can be employed to devise practical learning algorithms.
In Sect. 32.3 we show a selection of
techniques that are representative of a large class
of kernel methods; these techniques, termed
primal-dual methods, use Lagrange duality
as the main mathematical tool. Section 32.4
discusses Gaussian processes, a class of kernel
methods that uses a Bayesian approach to perform
inference and learning. Section 32.5 recalls
different approaches for the tuning of parameters.
In Sect. 32.6 we review the mathematical
properties of different yet equivalent notions of
kernels and recall a number of specialized kernels
for learning problems involving structured data.
We conclude the chapter by presenting applications
in Sect. 32.7.


32.1 Background
32.1.1 Summary of the Chapter
32.1.2 Historical Background
32.1.3 The Main Ingredients

32.2 Foundations of Statistical Learning
32.2.1 Supervised and Unsupervised Inductive Learning
32.2.2 Semi-Supervised and Transductive Learning
32.2.3 Bounds on Generalization Error
32.2.4 Structural Risk Minimization and Regularization
32.2.5 Types of Regularization

32.3
32.3.1
32.3.2 SVMs for Function Estimation
32.3.3 Main Features of SVMs
32.3.4 The Class of Least-Squares SVMs
32.3.5 Kernel Principal Component Analysis

32.4 Gaussian Processes
32.4.1 Definition
32.4.2 GPs for Regression
32.4.3 Bayesian Decision Theory

32.5 Model Selection
32.5.1 Cross-Validation
32.5.2 Bayesian Inference of Hyperparameters

32.6
32.6.1
32.6.2 Reproducing Kernels
32.6.3 Equivalence Between the Two Notions
32.6.4 Kernels for Structured Data

32.7 Applications
32.7.1 Text Categorization
32.7.2 Time-Series Analysis
32.7.3 Bioinformatics and Biomedical Applications

References


32.1 Background 


This chapter addresses the study of kernel methods, 
a class of techniques that play a major role in machine 
learning and nonparametric statistics. 

The development of kernel-based techniques [32.1, 
2] has been an important activity within machine 
learning in the last two decades. In this period, 
a number of powerful kernel-based learning algo- 
rithms were proposed. Among others, these methods 
include support vector machines (SVMs) and least 
squares SVMs, kernel principal component analysis, 
kernel Fisher discriminant analysis, and Gaussian pro- 
cesses. The use of kernel methods is systematic and 
properly motivated by statistical principles. In practi- 
cal applications, kernel methods lead to flexible pre- 
dictive models that often outperform competing ap- 
proaches in terms of generalization performance. The 
core idea consists of mapping data into a high-di- 
mensional space by means of a feature map. Since 
the feature map is normally chosen to be nonlin- 
ear, a linear model in the feature space corresponds 
to a nonlinear rule in the original domain. This 
fact suits many real world data analysis problems 
that often require nonlinear models to describe their 
structure. 


32.1.1 Summary of the Chapter 


In the rest of this section we present historical notes 
and summarize the main ingredients of kernel meth- 
ods. In Sect. 32.2 we present the core ideas of sta- 
tistical learning and show how regularization can be 
employed to devise practical learning algorithms. In 
Sect. 32.3 we show a selection of techniques that 
are representative of a large class of kernel methods;
these techniques, termed primal-dual methods,
use Lagrange duality as the main mathematical tool.
Section 32.4 discusses Gaussian processes, a class 
of kernel methods that uses a Bayesian approach to 
perform inference and learning. Section 32.5 recalls 
different approaches for the tuning of parameters. In 
Sect. 32.6 we review the mathematical properties of 
different yet equivalent notions of kernels and recall 
a number of specialized kernels for learning prob- 
lems involving structured data. We conclude the chapter 
by presenting applications in Sect. 32.7. Additional 
information can be found in a number of existing
tutorials on SVMs and kernel methods, including
[32.3-7].


32.1.2 Historical Background 


The study of the mathematical foundation of kernels 
can be traced back at least to the beginning of the
twentieth century in connection with a general theory of
integral equations [32.8,9]. According to [32.10] the 
theory of reproducing kernel Hilbert spaces (RKHS) 
was first applied to detection and estimation prob- 
lems by Parzen [32.11]. Properties of (reproducing) 
kernels are thoroughly presented in [32.12]. A first 
systematic treatment in the domain of nonparametric 
statistics can be found in [32.13]. Modern mathematical
reviews include [32.14, 15]. The first use of kernels
in the context of machine learning is generally attributed
to [32.16]. The linear support vector algorithm,
which undoubtedly had a prominent role in the his- 
tory of kernel methods, made its first appearance in 
Russia in the 1960s [32.17, 18], in the framework of 
the statistical learning theory developed by Vapnik and 
Chervonenkis [32.19, 20]. Later, the idea was developed 
in connection with kernels by Vapnik and co-workers at
AT&T labs [32.21-24]. The novel approach was rooted
in a solid theoretical foundation. Additionally, studies
began to report state-of-the-art performance in a number
of applications, which further stimulated research
on kernel-based techniques.


32.1.3 The Main Ingredients 


Before delving into the details we now present the 
general setting for statistical learning problems and 
then briefly review the main ingredients of a sub- 
stantial part of kernel methods used in machine 
learning. 


Setting for Statistical Learning 
The setting of learning from examples comprises three 
components [32.25]: 


1. A generator of input data. We shall assume that data
can be represented as vectors of $\mathbb{R}^D$. These vectors
are independent and identically distributed (i.i.d.)
according to a fixed but unknown probability
distribution $p(x)$.

2. A supervisor that, given input data x, returns an out- 
put value y according to a conditional distribution 
p(y|x) also fixed and unknown. Note that the super- 
visor might or might not be present. 


Kernel Methods | 32.1 Background 


3. A learning machine (or learning algorithm) able to
choose a hypothesis


$$ f(x; \theta) . \tag{32.1} $$


Note that the hypothesis $f$ is a function of $x$ and depends
upon a parameter vector $\theta$ belonging to a set
$\Theta$. The corresponding hypothesis space is then


$$ S = \{ f(x; \theta) : \theta \in \Theta \} , \tag{32.2} $$
which is one-to-one with the parameter space $\Theta$.


When the supervisor is present the learning problem
is called supervised. The goal is to find the hypothesis
that best mimics the supervisor response. When the
supervisor is not present, the learning problem is called
unsupervised. In this case, the aim is to find a hypothesis
that provides the best concise representation of
the data produced by the generator.

In both cases we might be interested either in the 
whole domain or we might be concerned only with 
a specific subset of points. 


Feature Mapping and Kernel Trick 
Kernel methods are a special class of learning
algorithms. Their main idea consists of mapping input
points, generally represented as elements of $\mathbb{R}^D$, into
a high-dimensional inner product space $\mathcal{F}$, called the
feature space. The mapping is performed by means of
a feature map $\phi$


$$ \phi : \mathbb{R}^D \to \mathcal{F} , \qquad x \mapsto \phi(x) . \tag{32.3} $$


One then approaches the learning task of interest by
finding a linear model in the feature space according
to training points $\phi(x_1), \ldots, \phi(x_N) \in \mathcal{F}$. Since the
feature map is normally chosen to be nonlinear, a linear
model in the feature space corresponds to a nonlinear
rule in $\mathbb{R}^D$. Alternative kernel methods differ in the way
the linear model in the feature space is found. Nonethe- 
less, a common feature across different techniques is 
the following. If the algorithm can be expressed solely 
in terms of inner products, one can restate the problem 
in terms of evaluations of a kernel function 


$$ k : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R} , \qquad (x, y) \mapsto k(x, y) , \tag{32.4} $$


by letting


$$ k(x, y) = \phi(x)^\top \phi(y) . \tag{32.5} $$


This fact, usually referred to as the kernel trick, is of 
particular interest for the cases where the feature space 
is infinite dimensional, which prevents direct computation
in the feature space. In practice, one often starts
by designing a positive definite kernel, which guarantees
the existence of a feature map $\phi$ satisfying (32.5).
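
The kernel trick can be checked numerically on a small case. The following minimal sketch assumes the homogeneous quadratic kernel $k(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$, for which an explicit three-dimensional feature map is known; the kernel choice and the names `phi`, `dot`, and `k` are illustrative assumptions, not taken from the chapter.

```python
import math

def phi(x):
    """Explicit feature map of the quadratic kernel on R^2 (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Standard inner product of two equally long tuples."""
    return sum(a * b for a, b in zip(u, v))

def k(x, y):
    """Kernel evaluation: the feature vectors are never formed."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = dot(phi(x), phi(y))   # inner product computed in the feature space
rhs = k(x, y)               # same quantity computed in the input space
print(lhs, rhs)
```

Both numbers agree up to rounding, while the kernel evaluation only needs an inner product in the input space. For kernels whose feature space is infinite dimensional, only the second route is available.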


Primal-Dual Estimation Techniques
As we shall see, an important class of machine learning
methods consists of primal-dual learning techniques
[32.1, 2, 26]. In this case, one starts from a primal
model representation of the type


$$ f(x; w, b) = w^\top \phi(x) + b = \sum_i w_i \phi_i(x) + b . \tag{32.6} $$


With reference to (32.1) note that here we have the
tuple $\theta = (w, b)$. The primal problem is then a
mathematical optimization problem aimed at finding optimal
$w \in \mathcal{F}$ and $b \in \mathbb{R}$. Notably, the right-hand side of (32.6)
is affine in $\phi(x)$; however, since $\phi$ is in general a
nonlinear mapping, $f$ is a nonlinear function of $x$.

A first approach consists of solving the primal 
problem. The information content of training data is 
absorbed into the primal model’s parameters during 
the procedure to find optimal parameters; the evaluation
of the model (32.6) on new patterns (out-of-sample
extension) no longer requires the use of
training data, which can therefore be discarded after
training.

A second approach relies on Lagrangian duality
arguments. In this case, the solution is represented in
terms of dual variables $\alpha_1, \alpha_2, \ldots, \alpha_N$ and the problem
is solved in $\alpha \in \mathbb{R}^N$ and $b \in \mathbb{R}$. The dual model
representation is then


$$ f(x; \alpha, b) = \sum_{n=1}^{N} \alpha_n k(x_n, x) + b \tag{32.7} $$


and depends upon the training patterns $x_1, x_2, \ldots, x_N \in \mathbb{R}^D$.
The representation in (32.6) is usually called parametric,
while (32.7) is the nonparametric representation
[32.26].
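
To make the role of the training patterns in the dual representation concrete, the sketch below evaluates a model of the form (32.7). The Gaussian RBF kernel, its bandwidth `gamma`, and the numerical values of the dual variables are assumptions made here for illustration; in practice the dual variables come from solving the dual problem.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel; gamma is an assumed bandwidth parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def dual_model(x, train_x, alpha, b, kernel=rbf_kernel):
    """Evaluate f(x) = sum_n alpha_n k(x_n, x) + b, as in (32.7)."""
    return sum(a_n * kernel(x_n, x) for a_n, x_n in zip(alpha, train_x)) + b

# Hypothetical training patterns and dual variables (not fitted to any data).
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alpha = [1.0, -0.5, 0.25]
b = 0.1

value = dual_model((0.0, 0.0), train_x, alpha, b)
print(value)
```

Unlike the primal model (32.6), every prediction loops over the stored training patterns, so they cannot be discarded after training.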




32.2 Foundations of Statistical Learning 


In this section, we briefly recall the main nomenclature
and give a basic introduction to statistical learning
theory. Historically, statistical learning theory constituted
the theoretical foundation upon which the main meth- 
ods of support vector machines were grounded. The 
theory is similar in spirit to a number of alternative 
complexity criteria and bias-variance trade-off curves. 
Nowadays, it remains a powerful framework for the de- 
sign of learning algorithms. 


32.2.1 Supervised and Unsupervised 
Inductive Learning 


We have already introduced the distinction between
supervised and unsupervised learning. Three important learning
tasks are found within this categorization: regression,
classification, and density estimation. In regression the
supervisor's response takes values in the real numbers.
In classification the supervisor's output takes values in
a discrete finite set of possible labels $\mathcal{Y}$. In particular,
in the binary classification problem $\mathcal{Y}$ consists of two
elements, e.g., $\mathcal{Y} = \{-1, 1\}$. Density estimation is an
instance of unsupervised learning: there is no supervisor.
The functional relation to be learned from examples is
the probability density $p(x)$ (the generator). Supervised
and unsupervised learning are concerned with estimating
a function (an optimal hypothesis) over the whole
input domain $\mathbb{R}^D$ based upon a finite set of training
points. Therefore, they are inductive approaches aiming
at the general picture.


32.2.2 Semi-Supervised and Transductive 
Learning 


Semi-Supervised Inductive Learning 
In supervised learning the $N$ training data are i.i.d. pairs


$$ \{ (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N) \} \subset \mathbb{R}^D \times \mathcal{Y} , \tag{32.8} $$


each of which is assumed to be drawn according to


$$ p(x, y) = p(y|x)\, p(x) . \tag{32.9} $$


There is yet another inductive approach, namely semi- 
supervised learning. In semi-supervised learning one 
has a set of labeled pairs (32.8), as in supervised learn- 
ing, as well as a set of unlabeled data 


$$ \{ x_{N+1}, x_{N+2}, \ldots, x_{N+T} \} \subset \mathbb{R}^D \tag{32.10} $$


drawn i.i.d. from the generator $p(x)$, as in unsupervised
learning. The purpose is the same as in supervised learning:
to find an approximation of the supervisor response. 
However, this goal is achieved by a learning algorithm
that takes into account the additional information com- 
ing from the unlabeled data. According to [32.27], 
semi-supervised learning was popularized for the first 
time in the mid-1970s although similar ideas appeared 
earlier. Alternative semi-supervised learning algorithms 
differ in the way they exploit the information from the 
unlabeled set. One popular idea is to assume that the 
(possibly high-dimensional) input data lie (roughly) on 
a low-dimensional manifold [32.28-31]. 


Transductive Learning 

In induction one seeks the general picture with
the purpose of making out-of-sample predictions.
This is an ambitious goal that might be unmotivated
in certain settings. What if all the (unlabeled) data
are given in advance? Suppose that one is only interested
in prediction at given (finitely many) points. It is
expected that this less ambitious task results in simpler
inference problems. These ideas are reflected in
the approach found in transductive learning formulations.
As in semi-supervised learning, in transductive
learning one has training pairs (32.8) as well as test
(unlabeled) data (32.10). However, unlike in
semi-supervised learning, one is only interested in making
predictions for the test data (32.10).


32.2.3 Bounds on Generalization Error 


Transductive and inductive inference share the common 
goal of achieving the lowest possible error on test data. 
In contrast with induction, transduction assumes that in- 
put test data are given in advance and consist of a finite 
discrete set of patterns drawn from the same distribution 
as the training set. From this perspective, it is clear that 
both transductive and inductive learning are concerned 
with generalization. In turn, a powerful framework to 
study the problem of generalization is the structural risk 
minimization (SRM) principle. 


Expected and Empirical Risk 
The starting point is the definition of a loss $L(y, f(x; \theta))$,
or discrepancy, between the response $y$ of the supervisor
to a given input $x$ and the response $f(x; \theta)$ of the
learning algorithm (that can be transductive or inductive).
Formally, the generalization error can be defined
as the expected risk


$$ R(\theta) = \int L(y, f(x; \theta))\, p(x, y)\, dx\, dy . \tag{32.11} $$


From a mathematical perspective the goal of learning 
is the minimization of this quantity. However, p(x, y) is 
unknown and one can rely only on the sample version 
of (32.11), namely the empirical risk 


$$ R^{N}_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{n=1}^{N} L(y_n, f(x_n; \theta)) . \tag{32.12} $$


A possible learning approach is based on empirical risk 
minimization (ERM) and encompasses maximum like- 
lihood (ML) inference [32.25]. It consists of finding 


$$ \hat{\theta}_N := \arg\min_{\theta \in \Theta} R^{N}_{\mathrm{emp}}(\theta) . \tag{32.13} $$
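
As a small worked instance of ERM, the sketch below searches a finite parameter set for the minimizer of the empirical risk. The threshold classifiers, the 0-1 loss, the averaging over the sample, and the toy data are all assumptions chosen for illustration.

```python
def f(x, theta):
    """Threshold hypothesis f(x; theta) with labels in {-1, +1}."""
    return 1 if x > theta else -1

def zero_one_loss(y, y_hat):
    return 0.0 if y == y_hat else 1.0

def empirical_risk(theta, data):
    """Average loss over the training pairs (sample version of the expected risk)."""
    return sum(zero_one_loss(y, f(x, theta)) for x, y in data) / len(data)

def erm(thetas, data):
    """Return the candidate parameter with the smallest empirical risk."""
    return min(thetas, key=lambda theta: empirical_risk(theta, data))

# Toy training set (made up): the pair (0.6, -1) is labeled against the trend.
data = [(0.1, -1), (0.3, -1), (0.4, 1), (0.6, -1), (0.7, 1), (0.9, 1)]
thetas = [0.0, 0.2, 0.35, 0.5, 0.65, 0.8]   # finite parameter set Theta
theta_hat = erm(thetas, data)
print(theta_hat, empirical_risk(theta_hat, data))
```

Two thresholds attain the same minimal empirical risk here; `min` returns the first one. Such ties are one reason why the empirical risk alone does not single out a hypothesis.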


Consistency 


Definition 32.1
The ERM approach is said to be consistent if


$$ R^{N}_{\mathrm{emp}}(\hat{\theta}_N) \xrightarrow{N} \inf_{\theta \in \Theta} R(\theta) , \qquad R(\hat{\theta}_N) \xrightarrow{N} \inf_{\theta \in \Theta} R(\theta) , $$


where $\xrightarrow{N}$ denotes convergence in probability for
$N \to \infty$.


In words: the ERM is consistent if, as the number of
training patterns $N$ increases, both the expected risk
$R(\hat{\theta}_N)$ and the empirical risk $R^{N}_{\mathrm{emp}}(\hat{\theta}_N)$ converge to the
minimal possible risk $\min_{\theta \in \Theta} R(\theta)$, see Fig. 32.1 for
an illustration.


Fig. 32.1 Consistency of ERM 


It was shown in [32.32] that the necessary and suf- 
ficient condition for consistency is that 


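$$ P\Big\{ \sup_{\theta \in \Theta} \big| R(\theta) - R^{N}_{\mathrm{emp}}(\theta) \big| > \epsilon \Big\} \xrightarrow{N} 0 , \quad \forall \epsilon > 0 . \tag{32.14} $$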


In turn, the necessary and sufficient conditions 
for (32.14) to hold true were established in 1968 by 
Vapnik [32.33, 34] and are based on capacity factors. 


Capacity Factors 

Consistency is one of the main theoretical questions in 
statistics. From a learning perspective, however, it does 
not address the most important aspect. The aspect that 
one should be mostly concerned with is how to con- 
trol the generalization of a certain learning algorithm. 
Whereas consistency is an asymptotic result, we want to 
minimize the expected risk given that we have available 
only finitely many observations to train the learning 
algorithm. It turns out, however, that consistency is central
to addressing this aspect as well [32.25]. Additionally,
a crucial role for answering this question is played by
capacity factors that, roughly speaking, are all measures
of how well the set of functions $\{ f(x; \theta) : \theta \in \Theta \}$ can
separate data. A more detailed description is given in
the following (precise definitions and formulas can be 
found in [32.25, Chap. 2]). In general, the theory states 
that without restricting the set of admissible functions, 
the ERM is not consistent. The interested reader is re- 
ferred to [32.25, 35]. 


VC Entropy. The first capacity factor (here and below,
VC is used as an abbreviation for Vapnik-Chervonenkis)
relates to the expected number of equivalence
classes according to which the training patterns divide
the set of functions $\{ f(x; \theta) : \theta \in \Theta \}$ (an equivalence
class is a subset of $\{ f(x; \theta) : \theta \in \Theta \}$ consisting of
functions that attribute the same labels to the input patterns in
the training set). We denote the VC entropy by $\mathrm{En}(p, N)$,
where the symbols emphasize the dependence of the VC
entropy on the underlying joint probability $p$ and the
number of training patterns $N$. The condition


$$ \lim_{N \to \infty} \frac{\mathrm{En}(p, N)}{N} = 0 $$


forms the necessary and sufficient condition for (32.14)
to hold true with respect to the fixed probability
density $p$.


Growth Function. It corresponds to the maximal
number of equivalence classes with respect to all the
possible training samples of cardinality $N$. As such, it
is a distribution-independent version of the VC entropy
obtained via a worst-case approach. We denote it by
$\mathrm{Gr}(N)$. The condition


$$ \lim_{N \to \infty} \frac{\ln \mathrm{Gr}(N)}{N} = 0 $$


forms the necessary and sufficient condition for (32.14)
to hold true for all the probability densities $p$.


VC Dimension. This is the cardinality of the largest
set of points that the algorithm can shatter; we denote
it by $\dim_{\mathrm{VC}}$. Note that $\dim_{\mathrm{VC}}$ is a property of
$\{ f(x; \theta) : \theta \in \Theta \}$, which depends neither on $N$ nor on $p$.
Roughly speaking it tells how flexible the set of functions
is. A finite value of $\dim_{\mathrm{VC}}$ forms the necessary
and sufficient condition for (32.14) to hold true for all
the probability densities $p$.

The three capacities are related by the chain of
inequalities [32.33, 34]


$$ \mathrm{En}(p, N) \le \ln \mathrm{Gr}(N) \le \dim_{\mathrm{VC}} \left( \ln \frac{N}{\dim_{\mathrm{VC}}} + 1 \right) . \tag{32.15} $$
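
The growth function and the VC dimension can be computed by brute force for very simple hypothesis classes. The sketch below assumes the class of indicator functions of closed intervals on the real line (an illustrative choice, not discussed in the chapter), counts the labelings it induces on $N = 10$ points, searches for the largest shattered subset among subsets of at most three points, and compares $\ln \mathrm{Gr}(N)$ with the right-hand side of (32.15).

```python
import math
from itertools import combinations

def interval_labelings(points):
    """All distinct labelings of `points` by indicators of closed intervals [a, b]."""
    n = len(points)                                 # labels refer to sorted order
    labelings = {tuple(-1 for _ in range(n))}       # empty interval
    for i in range(n):
        for j in range(i, n):
            labelings.add(tuple(1 if i <= m <= j else -1 for m in range(n)))
    return labelings

def is_shattered(points):
    """True if every one of the 2^m labelings of the m points is realized."""
    return len(interval_labelings(points)) == 2 ** len(points)

points = [0.1 * n for n in range(1, 11)]            # N = 10 distinct points
N = len(points)
growth = len(interval_labelings(points))            # Gr(N) for this class
# The search is restricted to subsets of size at most 3 (assumed to be enough here).
dim_vc = max(m for m in range(1, 4)
             if any(is_shattered(c) for c in combinations(points, m)))
bound = dim_vc * (math.log(N / dim_vc) + 1.0)       # right-hand side of (32.15)
print(growth, dim_vc, math.log(growth), bound)
```

For intervals, any two points can be shattered but no three points can, since the labeling (+1, -1, +1) is not realizable. The number of labelings therefore grows only polynomially in $N$.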


Finite-Sample Bounds 
One of the key results of the theory developed by Vapnik
and Chervonenkis is the following probabilistic bound.
With probability $1 - \eta$, simultaneously for all $\theta \in \Theta$, it
holds that [32.25]


$$ R(\theta) \le R^{N}_{\mathrm{emp}}(\theta) + \sqrt{ \frac{\mathrm{En}(p, 2N) - \ln \eta}{N} } . \tag{32.16} $$

Note that the latter depends on $p$. The result says that,
for a fixed set of functions $\{ f(x; \theta) : \theta \in \Theta \}$, one can
pick the $\theta \in \Theta$ that minimizes $R^{N}_{\mathrm{emp}}(\theta)$ and in this way
obtain the best guarantee on $R(\theta)$. Now, taking into
account (32.15) one can formulate the following bound
based on the growth function


$$ R(\theta) \le R^{N}_{\mathrm{emp}}(\theta) + \sqrt{ \frac{\ln \mathrm{Gr}(2N) - \ln \eta}{N} } . \tag{32.17} $$


In the same way one has


$$ R(\theta) \le R^{N}_{\mathrm{emp}}(\theta) + \sqrt{ \frac{\dim_{\mathrm{VC}} \left( \ln \frac{2N}{\dim_{\mathrm{VC}}} + 1 \right) - \ln \eta}{N} } . \tag{32.18} $$


Figure 32.2 illustrates the main idea. 
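
A numerical reading of the distribution-independent bound helps to see how the guarantee scales. The sketch below assumes that the capacity term has the form $\sqrt{(\dim_{\mathrm{VC}} (\ln(2N/\dim_{\mathrm{VC}}) + 1) - \ln \eta)/N}$; the function names and the chosen values of the empirical risk, the VC dimension, and $\eta$ are illustrative assumptions.

```python
import math

def vc_confidence_term(dim_vc, n, eta):
    """Capacity term of a VC-dimension bound (assumed form, see the lead-in)."""
    return math.sqrt((dim_vc * (math.log(2.0 * n / dim_vc) + 1.0) - math.log(eta)) / n)

def vc_bound(emp_risk, dim_vc, n, eta=0.05):
    """Guaranteed risk: empirical risk plus the capacity term."""
    return emp_risk + vc_confidence_term(dim_vc, n, eta)

# The guarantee tightens as N grows and loosens as the capacity grows.
for n in (100, 1000, 10000, 100000):
    print(n, round(vc_bound(0.05, dim_vc=10, n=n), 3))
```

For small samples the capacity term dominates the empirical risk, which is consistent with the remark that such bounds are usually too loose for direct model selection.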


Note that both (32.17) and (32.18) are
distribution-independent. Additionally, (32.18) only depends upon
the VC dimension (which, unlike $\mathrm{Gr}$, is
independent of $N$). Unfortunately there is no free
lunch: (32.17) is less tight than (32.16) and (32.18) is
less tight than (32.17).

So far we gave a flavor of the theoretical framework 
in which the support vector algorithms were originally 
conceived. Recent research reinterpreted and signifi- 
cantly improved the error bounds using mathematical 
tools from approximation and learning theory, func- 
tional analysis, and statistics. The interested reader is 
referred to [32.36] and [32.37]. Although tighter bounds 
exist, the study of sharper bounds remains a challenge 
for future research. In fact, existing bounds are nor- 
mally too loose to lead to practical model selection 
techniques, i.e., strategies for tuning the parameters 
that control the capacity of the model’s class. Nonethe- 
less, the theory provides important guidelines for the 
derivation of algorithms. 


The Role of Transduction 
It turns out that a key step in obtaining the
bound (32.16) is based upon the symmetrization lemma


$$ P\Big\{ \sup_{\theta \in \Theta} \big| R(\theta) - R^{N}_{\mathrm{emp}}(\theta) \big| > \epsilon \Big\} \le 2 P\Big\{ \sup_{\theta \in \Theta} \big| R^{N}_{\mathrm{emp}}(\theta) - \tilde{R}^{N}_{\mathrm{emp}}(\theta) \big| > \frac{\epsilon}{2} \Big\} , \tag{32.19} $$


where $R^{N}_{\mathrm{emp}}$ and $\tilde{R}^{N}_{\mathrm{emp}}$ are constructed upon two
different i.i.d. samples, precisely as in transduction. More
specifically, (32.16) comes from upper-bounding the
right-hand side of (32.19) [32.38]. More generally it is
apparent that to obtain all bounds of this type the key
element remains the symmetrization lemma [32.25].


Fig. 32.2 Illustration of the generalization bound depending
on capacity factors (training error and bound on test
error plotted against the VC dimension)


Notably, starting from the latter one can derive bounds
explicitly designed for the transductive case where one 
of the two samples plays the role of the training set 
and the other of the test set. In light of this, Vapnik ar- 
gues that transductive inference is a fundamental step 
in machine learning. Additionally, since the bounds for 
transduction are tighter than those for induction, the 
theory suggests that, whenever possible, transductive 
inference should be preferred over inductive inference. 
Practical algorithms can take advantage of this fact by 
implementing the adaptive version of the structural risk 
minimization (SRM) principle that we discuss next. 


32.2.4 Structural Risk Minimization 
and Regularization 


The Structural Risk Minimization Principle 
The structure of the bounds above suggests that one
should minimize the empirical risk while controlling
some measure of complexity. The idea behind the SRM,
introduced in the 1970s, is to construct nested subsets of
functions


$$ S_1 \subset S_2 \subset \cdots \subset S_L = S = \{ f(x; \theta) : \theta \in \Theta \} , \tag{32.20} $$


where each subset $S_l$ has capacity $h_l$ (VC entropy,
growth function, or VC dimension) with $h_1 < h_2 <
\cdots < h_L$ and $S$ is the entire hypothesis space. Then one
chooses an element of the nested sequence so that the second
term in the right-hand side of the bounds is kept
under control; within that subset one then picks the
specific function that minimizes the empirical risk. As
Vapnik points out in [32.38]:


[...] to find a good solution using a finite (limited) 
number of training examples one has to construct 
a (smart) structure which reflects prior knowledge 
about the problem of interest. 


In practice one can use the information coming from the 
unlabeled data to define a smart structure to improve the 
learning. In other words, the side information coming 
from unlabeled data can serve the purpose of devising 
a data-dependent set of functions. On top of this, one 
should use additional side information about the structure
of the problem, whenever available. Indeed, using
informative representations for the input data is also 
a way to construct a smart set of functions. In fact, rep- 
resenting the data in a suitable form implies a mapping 
from the input space to a more convenient set of fea- 


tures. We will discuss this aspect more extensively in 
Sect. 32.6. 


Learning Through Regularization 

So far we have addressed the theory but we have not 
talked about how to practically implement it. It is under- 
stood that the essential idea of SRM is to find the best 
trade-off between the empirical risk and some measure 
of complexity (the capacity) of the hypothesis space. 
This keeps the left-hand side of the VC bounds (the
expected risk, which is what matters for
generalization) under control. In practice there are different
ways to define the sets in the sequence (32.20).
The generic set $S_l$ could be the set of polynomials of
degree $l$ or a set of splines with $l$ nodes. However, it is
in connection to regularization theory that practical im- 
plementations of the SRM principle find their natural 
domain. 
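
The SRM principle can be sketched on a toy classification problem. The example below assumes a nested structure in which $S_l$ contains the classifiers that are piecewise constant on $2^l$ equal bins of $[0, 1)$, so that the VC dimension of $S_l$ is $2^l$; within each $S_l$ the empirical risk minimizer is a majority vote per bin. The data, the confidence level, and the capacity term (of the VC-dimension type discussed above) are assumptions made for illustration.

```python
import math

def fit_histogram_classifier(data, level):
    """ERM inside S_level: majority vote in each of 2**level equal bins of [0, 1)."""
    bins = 2 ** level
    votes = [0] * bins
    for x, y in data:
        votes[min(int(x * bins), bins - 1)] += y
    return [1 if v >= 0 else -1 for v in votes]

def empirical_risk(labels, data):
    """Fraction of training pairs misclassified by the piecewise-constant rule."""
    bins = len(labels)
    errors = sum(1 for x, y in data if labels[min(int(x * bins), bins - 1)] != y)
    return errors / len(data)

def guaranteed_risk(emp_risk, dim_vc, n, eta=0.05):
    """Empirical risk plus an assumed VC-dimension capacity term."""
    return emp_risk + math.sqrt(
        (dim_vc * (math.log(2.0 * n / dim_vc) + 1.0) - math.log(eta)) / n)

# Synthetic data (an assumption): label +1 when x >= 0.5, every 10th label flipped.
n = 200
data = []
for i in range(n):
    x = (i + 0.5) / n
    y = 1 if x >= 0.5 else -1
    data.append((x, -y if i % 10 == 0 else y))

levels = range(0, 6)
emp = [empirical_risk(fit_histogram_classifier(data, l), data) for l in levels]
bounds = [guaranteed_risk(e, 2 ** l, n) for e, l in zip(emp, levels)]
best_level = min(levels, key=lambda l: bounds[l])
print(emp, best_level)
```

The empirical risk stops improving once two bins are used, while the capacity term keeps growing with the number of bins. The guaranteed risk therefore selects the smallest subset that explains the data.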


Tikhonov Theory 
Regularization theory was introduced by Andrey
Tikhonov [32.39-41] as a way to solve ill-posed problems.
Ill-posed problems are problems that are not well
posed in the sense of Hadamard [32.42]. Consider solving
in $f$ a linear operator equation of the type

$$ A f = b . \tag{32.21} $$
In the general case, $f$ is an element of a Hilbert space,
$A$ is a compact operator, and $b$ is an element of its range.
Even if a solution exists, it is often observed that a slight
perturbation of the right-hand side $b$ causes large
deviations in the solution $f$. Tikhonov proposed to solve this
problem by minimizing a functional of the type


$$ \| A f - b \|^2 + \lambda \Gamma(f) , $$


where $\| \cdot \|$ is a suitable norm on the range of $A$, $\lambda$ is
some hyperparameter, and $\Gamma$ is a regularization
functional (sometimes called stabilizer). The theory of such
an approach was developed by Tikhonov and Ivanov;
in particular it was shown that there exists a strategy to
choose $\lambda$ depending on the accuracy of $b$ that
asymptotically leads to the desired solution $f^\star$. This was shown
under the assumption that there exists $c^\star$ such that
$\Gamma(f^\star) \le c^\star$.
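
The instability of an ill-posed problem and the stabilizing effect of a penalty can be observed on a two-dimensional toy system. The sketch below assumes the stabilizer $\Gamma(f) = \|f\|^2$, for which the minimizer of $\|Af - b\|^2 + \lambda \|f\|^2$ solves $(A^\top A + \lambda I) f = A^\top b$; the matrix, the right-hand sides, and the value of $\lambda$ are made up for illustration.

```python
def solve_2x2(m, c):
    """Solve the 2x2 linear system m f = c by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(c[0] * m[1][1] - m[0][1] * c[1]) / det,
            (m[0][0] * c[1] - m[1][0] * c[0]) / det]

def tikhonov_2x2(a, b, lam):
    """Minimize ||A f - b||^2 + lam * ||f||^2 via (A^T A + lam I) f = A^T b."""
    ata = [[sum(a[k][i] * a[k][j] for k in range(2)) + (lam if i == j else 0.0)
            for j in range(2)] for i in range(2)]
    atb = [sum(a[k][i] * b[k] for k in range(2)) for i in range(2)]
    return solve_2x2(ata, atb)

a = [[1.0, 1.0], [1.0, 1.0001]]      # nearly singular operator (toy example)
b_clean = [2.0, 2.0001]              # exact solution is f = (1, 1)
b_noisy = [2.0, 2.0002]              # perturbation of size 1e-4 in b

plain_shift = [abs(u - v) for u, v in
               zip(solve_2x2(a, b_clean), solve_2x2(a, b_noisy))]
reg_shift = [abs(u - v) for u, v in
             zip(tikhonov_2x2(a, b_clean, 1e-4), tikhonov_2x2(a, b_noisy, 1e-4))]
print(plain_shift, reg_shift)
```

A perturbation of order $10^{-4}$ in $b$ moves the unregularized solution by about one unit per component, whereas the regularized solutions stay close to each other.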

According to Vapnik [32.20], the theory of
Tikhonov regularization differs from statistical learning
theory in a number of ways. To begin with, Tikhonov
regularization considers specific structures in the nested
sequence (32.20) (depending on the way $\Gamma$ is defined);
secondly, it requires the solution to be in the hypothesis
space; finally, the theory developed by Tikhonov and
Ivanov was not concerned with guarantees for a finite
number of observations.

When f is an element of a reproducing kernel 
Hilbert space (RKHS) (Sect. 32.6), the theory is best 
known through the work of Wahba [32.13, 43]. 


SRM and Regularization in RKHSs 
SVMs and, more generally, primal-dual learning
algorithms represent an important class of kernel methods.
The primal-dual approach emphasizes the geometrical
aspects of the problem and it is particularly insightful
when (32.7) is used to define a discriminative rule
arising in a classification problem. We will consider
this class of learning algorithms in later sections.
Alternatively, the setting of RKHSs provides a convenient
way to define the sequence (32.20). When the hypothesis
space $S$ coincides with an RKHS of functions $\mathcal{H}$,
a nested sequence can be constructed by bounding the
norm in $\mathcal{H}$, used as a proxy for the complexity of
models

$$ S_l = \{ f \in \mathcal{H} : \| f \| \le a_l \} . \tag{32.22} $$
It turns out that there is a measure of capacity of $S_l$,
which is an increasing function of $a_l$ [32.44]. This
capacity measure can be used to derive probabilistic
bounds in line with (32.16), (32.17), and (32.18). In
practice, instead of solving the constrained problem


$$ \min_{f \in \mathcal{H}} R^{N}_{\mathrm{emp}}(f) \quad \text{subject to } \| f \| \le a_l \tag{32.23} $$
for any $l$, one normally solves the provably equivalent
penalized problem


$$ \min_{f \in \mathcal{H}} R^{N}_{\mathrm{emp}}(f) + \lambda_l \| f \|^2 \tag{32.24} $$


and picks $\lambda_l$ appropriately. Note that
in (32.23) and (32.24) we wrote $R^{N}_{\mathrm{emp}}(f)$ instead of
$R^{N}_{\mathrm{emp}}(\theta)$, as before. In fact, in this case, the solution of
the learning problem is found by formulating a convex
variational problem where the function $f$ itself plays the
role of the optimization variable $\theta$. In practice, however,
the representer theorem [32.13, 43, 45, 46] shows
that a representation of the optimal $f$ only depends upon
an expansion of kernel functions centered at the training
patterns. This result leads to a representation for $f$ in
line with (32.7). More specifically it holds that
$f(x) = \sum_{n=1}^{N} \alpha_n k(x_n, x)$, where $\alpha \in \mathbb{R}^N$ is found by solving a finite
dimensional optimization problem. The latter is convex
[32.47] provided that $L$ in the empirical risk (32.12)
is a convex loss function.
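
For the squared loss the finite dimensional problem has a closed-form solution, which gives a compact illustration of the representer theorem. The sketch below assumes the objective $\frac{1}{N} \sum_n (y_n - f(x_n))^2 + \lambda \|f\|^2$, whose coefficients solve $(K + N \lambda I) \alpha = y$ with $K$ the kernel matrix; the Gaussian kernel, the helper `solve`, and the toy data are illustrative assumptions.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian RBF kernel on scalars; gamma is an assumed bandwidth."""
    return math.exp(-gamma * (x - z) ** 2)

def solve(m, c):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(c)
    aug = [row[:] + [c_i] for row, c_i in zip(m, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def fit_kernel_ridge(xs, ys, lam, kernel=rbf):
    """Squared loss: alpha solves (K + N * lam * I) alpha = y."""
    n = len(xs)
    gram = [[kernel(xi, xj) + (n * lam if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    return solve(gram, ys)

def predict(x, xs, alpha, kernel=rbf):
    """Evaluate f(x) = sum_n alpha_n k(x_n, x)."""
    return sum(a * kernel(x_n, x) for a, x_n in zip(alpha, xs))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]               # toy targets
alpha = fit_kernel_ridge(xs, ys, lam=1e-3)
print(predict(0.75, xs, alpha))
```

Although the optimization variable is a function in a possibly infinite dimensional space, only $N$ coefficients have to be computed.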


Abstract Penalized Empirical Risk Minimization Problems
The penalized empirical risk minimization problem was
introduced in (32.24) in the setting of RKHSs of functions.
However, it should be noted that it is a very general
idea. Ultimately this can be related to the generality
of (32.21). The latter can either refer to infinite
dimensional problems or to a finite system of linear
equations involving objects living in some finite dimensional
space. Therefore, for the sake of generality, one
can consider in place of (32.24) the problem


$$ \min_{\theta \in \Theta} R^{N}_{\mathrm{emp}}(\theta) + \lambda \Gamma(\theta) , \tag{32.25} $$


where $\Theta$, which is one-to-one with the hypothesis
space, either coincides with some abstract vector
space or is a subset of it; $R^{N}_{\mathrm{emp}}$ is the empirical risk
and $\Gamma : \Theta \to \mathbb{R}$ is a suitable penalty function. This, in
particular, includes the situations where $\theta$ is a vector,
a matrix, or a higher-order tensor (i.e., a higher-order
array generalizing the notion of matrices).


32.2.5 Types of Regularization 


A penalty frequently used in practice is $\Gamma(\theta) = \|\theta\|^2$,
where $\|\theta\|$ is the Hilbertian norm defined upon the
space's inner product

$$ \|\theta\|^2 = \langle \theta, \theta \rangle . \tag{32.26} $$
This choice leads to ridge regression [32.48, 49].
Note that, in this case, $\|\theta\| = 0$ if and only if $\theta$ is
the zero vector of the space. A more general class
of quadratic penalties is represented by seminorms. 
A seminorm is allowed to assign zero length to some 
nonzero vectors (in addition to the zero vector). They 
are commonly used in smoothing splines [32.13, 50], 
where the unknown is decomposed into an unpenalized 
parametric component and a penalized nonparametric 
part. 


LASSO and Non-Hilbertian Norms 
The methods that we present in the next sections are 
all instances of the problem class in (32.25); although 
this is not necessarily emphasized in the presentation, 
they all employ a simple quadratic penalty. This is
central to the use of Lagrange duality theory [32.47,
51], which, in turn, constitutes the main technical tool
for the derivation of a large class of kernel methods. 
However, it is important to mention that in the last 
decade much research effort has been expended on the 
design of alternative penalties (correspondingly, there 
has been increased interest in other notions of du- 
ality, such as Fenchel duality). This arises from the 
realization that using a certain penalty is also a way 
to convey prior knowledge. This fact is best under- 
stood within a Bayesian setting, in light of a maxi- 
mum a posteriori (MAP) interpretation of (32.25), see, 
e.g., [32.44]. Penalty terms based on the space's
inner product have been replaced with various types
of non-Hilbertian norms. These are norms that, contrary
to (32.26), do not arise from inner products.
LASSO (least absolute shrinkage and selection operator,
[32.52]) is perhaps the most prominent example
of such cases. In LASSO one considers linear functions


$$ f(x; \theta) = \langle \theta, x \rangle = \sum_{d=1}^{D} \theta_d x_d $$


and uses the $l_1$ norm


$$ \|\theta\|_1 = \sum_{d=1}^{D} |\theta_d| \tag{32.27} $$


to promote the sparsity of the parameter vector $\theta$. Note
that this corresponds to defining the structure (32.20)
according to


$$ S_l = \{ f(x; \theta) : \|\theta\|_1 \le a_l \} . \tag{32.28} $$


Like ridge regression, LASSO is a continuous shrink- 
age method that achieves good prediction performance 
via a bias-variance trade-off. Since usually the esti- 
mated coefficient vector has many entries equal to zero, 
the approach has the further advantage over ridge re- 
gression of giving rise to interpretable models. 
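
The different behavior of the two penalties is easiest to see in the scalar case. The sketch below assumes an orthonormal design, so that each coefficient can be treated separately: the $l_1$ penalty yields the soft-thresholding rule, while the quadratic penalty rescales the coefficient. The objective functions stated in the docstrings and the numerical values are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - t)**2 + lam * abs(t): the LASSO shrinkage rule."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_shrink(z, lam):
    """Minimizer of 0.5 * (z - t)**2 + lam * t**2: proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)

z = [3.0, -0.4, 0.2, -2.5, 0.05]     # hypothetical least-squares coefficients
lam = 0.5
theta_lasso = [soft_threshold(v, lam) for v in z]
theta_ridge = [ridge_shrink(v, lam) for v in z]
print(theta_lasso)
print(theta_ridge)
```

Soft-thresholding sets the three small coefficients exactly to zero, whereas the quadratic penalty shrinks all coefficients but leaves every one of them nonzero.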

More recently, different structure-inducing penalties
have been proposed as a promising alternative
[32.53-55]. The general idea is to convey structural
assumptions about the problem, such as grouping or
hierarchies over the set of input variables, by suitably crafting
the penalty. In this way, users can customize
the regularization approach according to their
subjective knowledge of the task. Correspondingly, as
in (32.28), one (implicitly) forms a smart structure of
nested subsets of functions, in agreement with the SRM
principle.

These ideas have been generalized to the case where
$\Theta$ is infinite dimensional, in particular in the framework
of the multiple kernel learning (MKL) problem.
This was investigated both from a functional viewpoint
[32.56, 57] and from the more pragmatic point
of view of optimization [32.58-60].


Spectral Regularization 
Yet another generalization of (32.25) arises in the
context of multitask learning [32.61-63]. In this setting
one approaches simultaneously different learning
tasks under some common constraint(s). The general
idea, sometimes also known as collaborative filtering,
is that one can take advantage of shared features
across tasks. In practical applications it was shown that
one can significantly gain in terms of generalization
performance from exploiting such prior knowledge.
From the point of view of learning through regularization,
a sensible approach is given in [32.64, 65].
Suppose one has $T$ datasets, one for each task; the $t$-th
dataset has $N_t$ observations. Note that, in general,
$N_1 \neq N_2 \neq \cdots \neq N_T$. In this setting one has to learn
vectors $\theta_t$, $t = 1, 2, \ldots, T$, one per task; the parameter
space is, therefore, a space of matrices, $\Theta = \mathbb{R}^{F \times T}$,
where $F$ is possibly infinite. The idea translates into
penalized empirical risk minimization problems of the
type


$$ \min_{\theta \in \Theta} \sum_{t=1}^{T} R^{N_t}_{\mathrm{emp}}(\theta_t) + \lambda \|\theta\|_{*} , \tag{32.29} $$
where $\|\theta\|_{*}$ is the nuclear norm

$$ \|\theta\|_{*} = \sum_{r=1}^{R} \sigma_r(\theta) \tag{32.30} $$


and $\sigma_1(\theta), \sigma_2(\theta), \ldots, \sigma_R(\theta)$ are the $R \le \min(T, F)$
nonzero singular values of the $F \times T$ matrix $\theta$. Note
that (32.30) corresponds to the $l_1$ norm of the vector
of singular values. The definition also remains valid
in the infinite dimensional case under some regularity
assumptions [32.66]. The nuclear norm is the convex
envelope of the rank function on the spectral-norm unit
ball [32.67]; roughly speaking, it represents the best
convex relaxation of the rank function.
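
The link between the nuclear norm and the rank can be checked by hand for $2 \times 2$ matrices, where the singular values follow in closed form from the eigenvalues of $\theta^\top \theta$. The sketch below is restricted to this special case; the function names and the two example matrices, whose columns play the role of the parameter vectors of two tasks, are illustrative assumptions.

```python
import math

def singular_values_2x2(m):
    """Singular values of a real 2x2 matrix from the eigenvalues of M^T M."""
    (a, b), (c, d) = m
    frob_sq = a * a + b * b + c * c + d * d          # sigma_1^2 + sigma_2^2
    det = a * d - b * c                              # |det| = sigma_1 * sigma_2
    gap = math.sqrt(max(frob_sq * frob_sq - 4.0 * det * det, 0.0))
    return (math.sqrt((frob_sq + gap) / 2.0),
            math.sqrt(max((frob_sq - gap) / 2.0, 0.0)))

def nuclear_norm(m):
    """Sum of the singular values, i.e., the l1 norm of the spectrum."""
    return sum(singular_values_2x2(m))

def rank(m, tol=1e-12):
    """Number of singular values above a small tolerance."""
    return sum(1 for s in singular_values_2x2(m) if s > tol)

related = [[1.0, 2.0], [2.0, 4.0]]      # columns (tasks) are linearly dependent
unrelated = [[1.0, 0.0], [0.0, 2.0]]    # columns are orthogonal
print(nuclear_norm(related), rank(related))
print(nuclear_norm(unrelated), rank(unrelated))
```

Penalizing the sum of singular values acts on the spectrum in the same way as the $l_1$ norm acts on a coefficient vector: it favors solutions in which few singular values are nonzero, that is, matrices of low rank.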

The use of the nuclear norm in (32.29) is motivated
by the assumption that the parameter vectors of
related tasks should be approximately linearly dependent.
This assumption is meaningful for a number of
cases of interest. Other uses of the nuclear norm exist;
ultimately, this is due to the fact that notions of rank
are ubiquitous in the mathematical formulations stemming
from real-life problems. As a consequence, the
nuclear norm is a very versatile mathematical tool to
impose structure on (seemingly) very diverse settings.
This includes the identification of linear time-invariant
systems [32.68, 69] and the analysis of nonstationary
cointegrated systems [32.70]. Finally we mention
that, in place of (32.30), one can consider spectral
penalties [32.71, 72] that include the nuclear norm as
a special case.


32.3 Primal-Dual Methods


The purpose of this section is to introduce the methods
that have served as the archetypal approaches for a large
class of kernel methods. In the process, we detail the
Lagrange duality argument underlying general primal-dual
techniques. We begin by giving a short overview of
the formulations of SVMs introduced by Vapnik;
subsequently, we discuss a number of modifications and
extensions.


32.3.1 SVMs for Classification


Margin
The problem of pattern recognition amounts to finding
the label $y \in \{-1, 1\}$ that corresponds to a generic input
point $x \in \mathbb{R}^D$. This task can be approached by assigning
a label $\hat{y}$ according to the model

$$ \hat{y} = \mathrm{sign}\left[ w^\top \phi(x) + b \right] , \tag{32.31} $$


where $w^\top \phi(x) + b = 0$ defines a hyperplane, found by a learning
algorithm based on training data


$$ \{ (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N) \} \subset \mathbb{R}^D \times \{-1, 1\} . \tag{32.32} $$


Fig. 32.3 Several possible separating hyperplanes exist


The concept of feature map ¢ was presented in short 
in Sect. 32.1.3. Later we will discuss the role of ¢ in 
more detail. For now, it suffices to say that ġ is expected 
to capture features that are important for the discrimi- 
nation of points. In the simplest case, @ is the identity 
map, i.e., 6 : x x. Note that w! $(x) +b is a primal 
model of the type (32.6), with w € F and b € R. 

In general, one can see that there are several possi- 
ble separating hyperplanes, see Fig. 32.3. 

The solution picked by the support vector classifi- 
cation (SVC) algorithm is the one that separates the 
data with the maximal margin. More precisely, Vap- 
nik considered a rescaling of the problem so that points 
closest to the separating hyperplane satisfy the normal- 
izing condition 


$$\left| w^\top \phi(x) + b \right| = 1 . \qquad (32.33)$$


The two hyperplanes $w^\top \phi(x) + b = 1$ and $w^\top \phi(x) + b = -1$ are called canonical hyperplanes, and the
distance between them is called the margin, see 
Fig. 32.4. 


Fig. 32.4 SVC finds the solution that maximizes the margin


Assuming that the classification problem is separable, i.e., that there exists at least one hyperplane separating the training data (32.32), one obtains a canonical representation $(w, b)$ satisfying

$$y_n \left( w^\top \phi(x_n) + b \right) \geq 1 , \quad n = 1, \ldots, N . \qquad (32.34)$$


Let us assume, without loss of generality, that $y_1 = 1$ and $y_2 = -1$. If the corresponding patterns $x_1$ and $x_2$ are among the closest points to the separating hyperplane, the scaling imposed by Vapnik implies

$$w^\top \phi(x_1) + b = 1 , \qquad w^\top \phi(x_2) + b = -1 , \qquad (32.35)$$

which, in turn, leads to

$$w^\top \left( \phi(x_1) - \phi(x_2) \right) = 2 . \qquad (32.36)$$


Now the unit normal vector to the separating hyperplane $w^\top \phi(x) + b = 0$ is $(1/\|w\|) w$. The margin is equal to the component of the vector $\phi(x_1) - \phi(x_2)$ along $(1/\|w\|) w$, i.e., the projection

$$(1/\|w\|) \, w^\top \left( \phi(x_1) - \phi(x_2) \right) .$$

Using (32.36) one obtains that the margin is equal to $2/\|w\|$; correspondingly, the distance between the points satisfying (32.33) and the separating hyperplane is $1/\|w\|$. By minimizing $\|w\|$, subject to the set of constraints (32.34), one obtains a maximal margin classifier
that maximizes the margin between the two classes. 
This hyperplane, in turn, can be naturally envisioned as 
the simplest solution given the observed data. 
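The relation between $\|w\|$ and the margin can be checked on a small example. This is a minimal sketch, assuming the identity feature map and a hand-picked canonical pair $(w, b)$; the points `x_pos` and `x_neg` are hypothetical and were chosen to lie on the two canonical hyperplanes.

```python
import numpy as np

def margin(w):
    """Distance between the two canonical hyperplanes, 2 / ||w||."""
    return 2.0 / np.linalg.norm(w)

def distance_to_hyperplane(w, b, x):
    """Euclidean distance from x to the hyperplane w^T x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

# Hypothetical canonical representation (identity feature map assumed).
w = np.array([2.0, 0.0])
b = 0.0
x_pos = np.array([0.5, 1.0])    # satisfies w^T x + b = +1
x_neg = np.array([-0.5, -3.0])  # satisfies w^T x + b = -1

# Projection of x_pos - x_neg on the unit normal, as in the text.
projection = (w / np.linalg.norm(w)) @ (x_pos - x_neg)
m = margin(w)
d_pos = distance_to_hyperplane(w, b, x_pos)
```

Here the projection equals the margin $2/\|w\| = 1$, and each of the two points is at distance $1/\|w\| = 0.5$ from the separating hyperplane.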


Primal Problem 
In practice, for computational reasons it is more convenient to minimize $\frac{1}{2}\|w\|^2 = \frac{1}{2} w^\top w$ rather than $\|w\|$. Additionally, it is in general unrealistic to assume that the classification problem is separable. In practical applications, one should try to find a set of features (in fact, a feature mapping from the input domain to a more convenient representation) that allows one to separate the two classes as much as possible. Nonetheless, there might be no boundary that can perfectly separate the data; therefore one should tolerate misclassifications. Taking this requirement into account leads to the primal problem for the SVC algorithm [32.24]. This is the quadratic programming (QP) problem

$$\begin{aligned}
\min_{w, b, \xi} \quad & J_P(w, \xi) = \frac{1}{2} w^\top w + c \sum_{n=1}^{N} \xi_n \\
\text{subject to} \quad & y_n \left( w^\top \phi(x_n) + b \right) \geq 1 - \xi_n , \quad n = 1, \ldots, N , \\
& \xi_n \geq 0 , \quad n = 1, \ldots, N ,
\end{aligned} \qquad (32.37)$$
where $c > 0$ is a user-defined parameter. In this problem, one accounts for misclassifications by replacing the set of constraints (32.34) with the set of constraints

$$y_n \left( w^\top \phi(x_n) + b \right) \geq 1 - \xi_n , \quad n = 1, \ldots, N , \qquad (32.38)$$

where $\xi_1, \xi_2, \ldots, \xi_N$ are nonnegative slack variables. It is clear that for higher values of $c$ one penalizes more the violations of the conditions in (32.34).


Dual Problem 
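The Lagrangian corresponding to (32.37) is

$$\mathcal{L}(w, b, \xi; \alpha, \nu) = J_P(w, \xi) - \sum_{n=1}^{N} \alpha_n \left( y_n \left( w^\top \phi(x_n) + b \right) - 1 + \xi_n \right) - \sum_{n=1}^{N} \nu_n \xi_n , \qquad (32.39)$$

with Lagrange multipliers $\alpha_n \geq 0$, $\nu_n \geq 0$ for $n = 1, \ldots, N$. The solution is given by the saddle point of the Lagrangian

$$\max_{\alpha, \nu} \min_{w, b, \xi} \mathcal{L}(w, b, \xi; \alpha, \nu) . \qquad (32.40)$$

One obtains

$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial w} = 0 \;&\rightarrow\; w = \sum_{n=1}^{N} \alpha_n y_n \phi(x_n) , \\
\frac{\partial \mathcal{L}}{\partial b} = 0 \;&\rightarrow\; \sum_{n=1}^{N} \alpha_n y_n = 0 , \\
\frac{\partial \mathcal{L}}{\partial \xi_n} = 0 \;&\rightarrow\; 0 \leq \alpha_n \leq c , \quad n = 1, \ldots, N .
\end{aligned} \qquad (32.41)$$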
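The dual problem is then the QP problem

$$\begin{aligned}
\max_{\alpha} \quad & J_D(\alpha) \\
\text{subject to} \quad & \sum_{n=1}^{N} \alpha_n y_n = 0 , \\
& 0 \leq \alpha_n \leq c , \quad n = 1, \ldots, N ,
\end{aligned} \qquad (32.42)$$

where

$$J_D(\alpha) = -\frac{1}{2} \sum_{m,n=1}^{N} y_m y_n k(x_m, x_n) \alpha_m \alpha_n + \sum_{n=1}^{N} \alpha_n , \qquad (32.43)$$

and we used the kernel trick

$$k(x_m, x_n) = \phi(x_m)^\top \phi(x_n) , \quad m, n = 1, \ldots, N . \qquad (32.44)$$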


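The classifier based on the dual model representation is

$$\mathrm{sign}\left[ \sum_{n=1}^{N} \alpha_n y_n k(x, x_n) + b \right] , \qquad (32.45)$$

where $\alpha_n$ are nonnegative real numbers obtained by solving (32.42) and $b$ is obtained based upon the Karush-Kuhn-Tucker (KKT) optimality conditions, i.e., the set of conditions that must be satisfied at the optimum of a constrained optimization problem. These are

$$\begin{aligned}
& \frac{\partial \mathcal{L}}{\partial w} = 0 \;\rightarrow\; w = \sum_{n=1}^{N} \alpha_n y_n \phi(x_n) , \\
& \frac{\partial \mathcal{L}}{\partial b} = 0 \;\rightarrow\; \sum_{n=1}^{N} \alpha_n y_n = 0 , \\
& \frac{\partial \mathcal{L}}{\partial \xi_n} = 0 \;\rightarrow\; c - \alpha_n - \nu_n = 0 , \quad n = 1, \ldots, N , \\
& \alpha_n \left( y_n \left( w^\top \phi(x_n) + b \right) - 1 + \xi_n \right) = 0 , \quad n = 1, \ldots, N , \\
& \nu_n \xi_n = 0 , \quad n = 1, \ldots, N , \\
& \alpha_n \geq 0 , \quad n = 1, \ldots, N , \\
& \nu_n \geq 0 , \quad n = 1, \ldots, N .
\end{aligned} \qquad (32.46)$$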

From these equations it can be seen that, at optimum, 
we have 


$$y_n \left( w^\top \phi(x_n) + b \right) - 1 = 0 \quad \text{if } 0 < \alpha_n < c , \qquad (32.47)$$


from which one can compute b. 
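The dual classifier (32.45) and the recovery of $b$ from (32.47) can be sketched on a problem small enough to solve by hand. The sketch below assumes a linear kernel and two one-dimensional training points, $x_1 = 1$ with $y_1 = 1$ and $x_2 = -1$ with $y_2 = -1$. For this toy problem the equality constraint forces $\alpha_1 = \alpha_2 = \alpha$, the dual objective (32.43) reduces to $-2\alpha^2 + 2\alpha$, and its maximizer is $\alpha = 1/2$ (feasible for any $c > 1/2$). The function names are hypothetical and no QP solver is involved.

```python
import numpy as np

def linear_kernel(a, b):
    return float(np.dot(a, b))

def bias_from_kkt(alpha, X, y, c, kernel, tol=1e-9):
    """Average b over the points with 0 < alpha_n < c, using (32.47).

    Since y_n^2 = 1, (32.47) gives b = y_n - sum_m alpha_m y_m k(x_m, x_n).
    """
    values = []
    for n in range(len(y)):
        if tol < alpha[n] < c - tol:
            s = sum(alpha[m] * y[m] * kernel(X[m], X[n]) for m in range(len(y)))
            values.append(y[n] - s)
    return float(np.mean(values))

def predict(x, alpha, X, y, b, kernel):
    """Dual model (32.45)."""
    s = sum(alpha[n] * y[n] * kernel(x, X[n]) for n in range(len(y)))
    return int(np.sign(s + b))

# Toy problem; alpha was obtained by hand as described above.
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
c = 10.0

b = bias_from_kkt(alpha, X, y, c, linear_kernel)
label_right = predict(np.array([2.0]), alpha, X, y, b, linear_kernel)
label_left = predict(np.array([-3.0]), alpha, X, y, b, linear_kernel)
```

Averaging over all points with $0 < \alpha_n < c$, rather than using a single one, is a common choice for numerical robustness; it is an assumption of this sketch, not something prescribed by the text.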


SVC as a Penalized Empirical Risk Minimization 
Problem 
The derivation so far followed the classical approach 
due to Vapnik; the main argument comes from geo- 
metrical insights on the pattern recognition problem. 


Whenever the feature space is finite dimensional, one 
can approach learning either by solving the primal prob- 
lem or by solving the dual one; when this is not the case, 
one can still use the dual problem and rely on the dual 
representation obtained. 

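Before proceeding, we highlight a different, yet equivalent problem formulation. For the primal problem this reads

$$\min_{w, b} \sum_{n=1}^{N} \left[ 1 - y_n \left( w^\top \phi(x_n) + b \right) \right]_+ + \lambda w^\top w , \qquad (32.48)$$

where we let $\lambda = 1/(2c)$ and we define $[\cdot]_+$ by

$$[a]_+ = \begin{cases} a , & \text{if } a > 0 \\ 0 , & \text{otherwise.} \end{cases} \qquad (32.49)$$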

Problem (32.37) is an instance of (32.25) obtained by letting $\Theta = \mathcal{F} \times \mathbb{R}$ and taking as penalty the seminorm

$$r : (w, b) \mapsto w^\top w . \qquad (32.50)$$

(Note that $\mathcal{F} \times \mathbb{R}$ is naturally equipped with the inner product $\langle (w_1, b_1), (w_2, b_2) \rangle = w_1^\top w_2 + b_1 b_2$, and it is a (finite-dimensional) Hilbert space (HS).)


This shows that (32.37) is essentially a regularized em- 
pirical risk minimization problem that can be analyzed 
in the framework of the SRM principle presented in 
Sect. 32.2. 
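The equivalence between (32.37) and (32.48) can be verified numerically: for fixed $(w, b)$ the smallest feasible slacks are $\xi_n = [1 - y_n(w^\top \phi(x_n) + b)]_+$, and with $\lambda = 1/(2c)$ the two objectives differ only by the factor $c$. The following is a minimal sketch, assuming the identity feature map and randomly generated (hypothetical) data.

```python
import numpy as np

def positive_part(a):
    """[a]_+ as in (32.49), applied element-wise."""
    return np.maximum(a, 0.0)

def primal_objective(w, b, X, y, c):
    """J_P of (32.37), evaluated at the smallest feasible slack variables."""
    xi = positive_part(1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + c * xi.sum()

def penalized_risk(w, b, X, y, lam):
    """Objective of (32.48): hinge loss plus lam * w^T w."""
    return positive_part(1.0 - y * (X @ w + b)).sum() + lam * (w @ w)

# Hypothetical data and parameters, only to compare the two objectives.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
y = np.sign(rng.standard_normal(30))
w = rng.standard_normal(3)
b = 0.2
c = 2.5

j_primal = primal_objective(w, b, X, y, c)
j_penalized = penalized_risk(w, b, X, y, lam=1.0 / (2.0 * c))
```

Since the two objectives are proportional for every $(w, b)$, they share the same minimizers.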


VC Bounds for Classification 

In Sect. 32.2.3 we already discussed bounds on the generalization error in terms of capacity factors. In particular, (32.18) states a bound for the case of VC dimension. The larger this VC dimension, the smaller the training error (empirical risk) can become, but the confidence term (second term on the right-hand side of (32.18)) will grow. The minimum of the sum of these two terms is then a good compromise solution. For SVM classifiers, Vapnik has shown that hyperplanes satisfying $\|w\| \leq a$ have a VC dimension $h$ that is upper-bounded by

$$h \leq \min([r^2 a^2], N) + 1 , \qquad (32.51)$$

where $[\cdot]$ represents the integer part and $r$ is the radius of the smallest ball containing the points $\phi(x_1), \phi(x_2), \ldots, \phi(x_N)$ in the feature space $\mathcal{F}$.

Note that for each value of $a$ there exists a corresponding value of $\lambda$ in (32.48) and, correspondingly, a value of $c$ in (32.37) or (32.42). Additionally, the radius $r$ can also be computed by solving a QP problem, see, e.g.,
[32.26]. It follows that one could compute solutions 
corresponding to multiple values of the hyperparame- 
ters, find the corresponding empirical risk and radius 
and then pick the model corresponding to the least value 
of the right-hand side of the bound (32.18). As we have 
already remarked, however, the bound (32.18) is often 
too conservative. Sharper bounds and frameworks alter- 
native to VC theory have been derived, see, e.g. [32.73]. 
In practice, however, the choice of parameters is of- 
ten guided by data-driven model selection criteria, see 
Sect. 32.5. 


Relative Margin and Data-Dependent Regularization
Although maximum margin classifiers have proved to be very effective, alternative notions of data separation have been proposed.

The authors of [32.74, 75], for instance, argue that maximum margin classifiers might be misled by directions of large variation. They propose a way to correct
this drawback by measuring the margin not in an abso- 
lute sense but rather relative to the spread of data in any 
projection direction. Note that this can be seen as a way 
to conveniently craft the hypothesis space, an important 
aspect that we discussed in Sect. 32.2. 


32.3.2 SVMs for Function Estimation 


In addition to classification, the support vector method- 
ology has also been introduced for linear and nonlinear 
function estimation problems [32.25]. For the general 
nonlinear case, output values are assigned according to 
the primal model 


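$$\hat{y} = w^\top \phi(x) + b . \qquad (32.52)$$


In order to estimate the model's parameters $w$ and $b$ from training data consisting of $N$ input-output pairs, Vapnik proposed to evaluate the empirical risk according to

$$R_{\mathrm{emp}} = \frac{1}{N} \sum_{n=1}^{N} \left| y_n - w^\top \phi(x_n) - b \right|_{\epsilon} , \qquad (32.53)$$

with the so-called Vapnik's $\epsilon$-insensitive loss function defined as

$$\left| y - f(x) \right|_{\epsilon} = \begin{cases} 0 , & \text{if } |y - f(x)| \leq \epsilon \\ |y - f(x)| - \epsilon , & \text{otherwise.} \end{cases} \qquad (32.54)$$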


The idea is illustrated in Fig. 32.5. 
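The loss (32.54) and the empirical risk (32.53) translate directly into code. This is a minimal sketch with hypothetical target and prediction values; the function names are illustrative only.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """Vapnik's loss (32.54): zero inside the tube of half-width eps, linear outside."""
    return np.maximum(np.abs(y - f) - eps, 0.0)

def empirical_risk(y, f, eps):
    """Empirical risk (32.53): average loss over the sample."""
    return eps_insensitive_loss(y, f, eps).mean()

# Hypothetical targets and model outputs.
y = np.array([1.0, 1.0, 1.0])
f = np.array([1.05, 1.5, 0.2])
losses = eps_insensitive_loss(y, f, eps=0.1)
risk = empirical_risk(y, f, eps=0.1)
```

The first prediction deviates by 0.05 and falls inside the tube, so it incurs no loss; the other two deviate by 0.5 and 0.8 and are penalized by the amount exceeding $\epsilon = 0.1$.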


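The corresponding primal optimization problem is the QP problem

$$\begin{aligned}
\min_{w, b, \xi, \xi^*} \quad & J_P(w, \xi, \xi^*) = \frac{1}{2} w^\top w + c \sum_{n=1}^{N} (\xi_n + \xi_n^*) \\
\text{subject to} \quad & y_n - w^\top \phi(x_n) - b \leq \epsilon + \xi_n , \quad n = 1, \ldots, N , \\
& w^\top \phi(x_n) + b - y_n \leq \epsilon + \xi_n^* , \quad n = 1, \ldots, N , \\
& \xi_n, \xi_n^* \geq 0 , \quad n = 1, \ldots, N ,
\end{aligned} \qquad (32.55)$$

where $c > 0$ is a user-defined parameter that determines the amount up to which deviations from the desired accuracy $\epsilon$ are tolerated. Following the same approach considered above for (32.37), one obtains the dual QP problem

$$\begin{aligned}
\max_{\alpha, \alpha^*} \quad & J_D(\alpha, \alpha^*) \\
\text{subject to} \quad & \sum_{n=1}^{N} (\alpha_n - \alpha_n^*) = 0 , \\
& 0 \leq \alpha_n \leq c , \quad n = 1, \ldots, N , \\
& 0 \leq \alpha_n^* \leq c , \quad n = 1, \ldots, N ,
\end{aligned} \qquad (32.56)$$

where

$$J_D(\alpha, \alpha^*) = -\frac{1}{2} \sum_{m,n=1}^{N} (\alpha_m - \alpha_m^*)(\alpha_n - \alpha_n^*) k(x_m, x_n) - \epsilon \sum_{n=1}^{N} (\alpha_n + \alpha_n^*) + \sum_{n=1}^{N} y_n (\alpha_n - \alpha_n^*) . \qquad (32.57)$$

Note that, whereas (32.37) and (32.42) have tuning parameter $c$, in (32.55) and (32.56) one has the additional parameter $\epsilon$.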


Fig. 32.5 $\epsilon$-insensitive loss




Before continuing, we note that a number of inter- 
esting modifications of the original SVR primal—dual 
formulations exist. In particular, [32.76] proposed the $\nu$-tube support vector regression. In this method, the objective $J_P(w, \xi, \xi^*)$ in (32.55) is replaced by

$$J_P(w, \xi, \xi^*, \epsilon) = \frac{1}{2} w^\top w + c \left( \nu \epsilon + \frac{1}{N} \sum_{n=1}^{N} (\xi_n + \xi_n^*) \right) . \qquad (32.58)$$

In the latter, $\epsilon$ is an optimization variable rather than a hyperparameter, as in (32.55); $\nu$, on the other hand, is fixed by the user and controls the fraction of support vectors that is allowed outside the tube.


32.3.3 Main Features of SVMs 


Here we briefly highlight the main features of support 
vector algorithms, making a direct comparison with 
classical neural networks. 


Choice of Kernel 
A number of possible kernels, such as the Gaussian 
radial basis function (RBF) kernel, can be chosen in 
(32.44). Some examples are included in Table 32.1. 

In general, it is clear from (32.44) that a valid kernel function must preserve the fundamental properties of the inner product. That is, for the equality to hold, the bivariate function $k : \mathbb{R}^D \times \mathbb{R}^D \to \mathbb{R}$ is required to be symmetric and positive definite. Note that this, in particular, imposes a restriction on $t$ in the polynomial kernel. A more in-depth discussion on kernels is postponed to Sect. 32.6.
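The requirement that a kernel be symmetric and positive definite can be probed on a finite sample by inspecting the Gram matrix. The sketch below assumes the three kernels of Table 32.1 (with the polynomial kernel written as $(t + x^\top y)^d$) and hypothetical random inputs; a numerical check on one sample can expose a violation but does not prove validity in general.

```python
import numpy as np

def linear_kernel(X, Y):
    return X @ Y.T

def polynomial_kernel(X, Y, d=2, t=1.0):
    return (t + X @ Y.T) ** d

def gaussian_rbf_kernel(X, Y, sigma=1.0):
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)

def is_symmetric_psd(K, tol=1e-8):
    """True if the Gram matrix is symmetric with no eigenvalue below -tol."""
    symmetric = np.allclose(K, K.T)
    return bool(symmetric and np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))   # hypothetical sample of six points in R^2

ok_linear = is_symmetric_psd(linear_kernel(X, X))
ok_poly = is_symmetric_psd(polynomial_kernel(X, X, d=2, t=1.0))
ok_rbf = is_symmetric_psd(gaussian_rbf_kernel(X, X, sigma=1.0))

# With t < 0 the polynomial form can fail: two copies of the origin and
# d = 1, t = -1 give a Gram matrix with all entries equal to -1.
origin = np.zeros((2, 1))
ok_negative_t = is_symmetric_psd(polynomial_kernel(origin, origin, d=1, t=-1.0))
```

The last case illustrates the restriction on $t$ mentioned above: the resulting Gram matrix has eigenvalue $-2$ and therefore cannot arise from an inner product.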


Global Solution 
(32.37) and its dual (32.42) are convex problems. This 
means that any local minimum must also be global. 
Therefore, even though SVCs share similarities with 
neural network schemes (see below), they do not suf- 
fer from the well-known issue of local minima. 


Sparseness 
The dual model is parsimonious: typically, many $\alpha$'s are zero at the solution, with the nonzero ones located in the proximity of the decision boundary. This is also desirable in all those settings where one requires fast on-line out-of-sample evaluations of models.


Table 32.1 Some examples of kernel functions

Kernel name                    k(x, y)
Linear                         $x^\top y$
Polynomial of degree d > 0     $(t + x^\top y)^d$, for $t \geq 0$
Gaussian RBF                   $\exp(-\|x - y\|^2 / \sigma^2)$


Neural Network Interpretation 

Both primal (parametric) and dual (nonparametric) 
problems admit neural network representations [32.26], 
see Fig. 32.6. Note that in the dual problem the size of the QP problem is not influenced by the dimension $D$ of the input space, nor is it influenced by the dimension of the feature space. Notably, in classical multilayer perceptrons one has to fix the number of hidden units in
advance; in contrast, in SVMs the number of hidden 
units follows from the QP problem and corresponds to 
the number of support vectors. 


SVM Solvers 

The primal and dual formulations presented above 
are all QP problems. This means that one can rely 
on general purpose QP solvers for the training of 
models. Additionally, a number of specialized decomposition methods have been developed, including the sequential minimal optimization (SMO) algorithm [32.77]. Publicly available software packages
such as libSVM [32.78, 79] and SVMlight [32.80] in- 
clude implementations of efficient solvers. 


32.3.4 The Class of Least-Squares SVMs 


We discuss here the class of least-squares SVMs (LS- 
SVMs) obtained by simple modifications to the SVM
formulations. The arising problems relate to a number 
of existing methods and entail certain advantages with 
respect to the original SVM formulations. 


Fig. 32.6a,b Primal-dual network interpretations of SVMs [32.26]. (a) Primal problem: the number $F$ of hidden units in the primal weight space corresponds to the dimensionality of the feature space. (b) Dual problem: the number #SV of hidden units in the dual weight space corresponds to the number of nonzero $\alpha$'s


LS-SVMs for Classification 
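We illustrate the idea with respect to the formulation for classification (LS-SVC). The approach, originally proposed in [32.81], considers the primal problem

$$\begin{aligned}
\min_{w, b, \epsilon} \quad & J_P(w, \epsilon) = \frac{1}{2} w^\top w + \frac{\gamma}{2} \sum_{n=1}^{N} \epsilon_n^2 \\
\text{subject to} \quad & y_n \left( w^\top \phi(x_n) + b \right) = 1 - \epsilon_n , \quad n = 1, \ldots, N .
\end{aligned} \qquad (32.59)$$

This formulation simplifies the primal problem (32.37) in two ways. First, the inequality constraints are replaced by equality constraints; the 1's on the right-hand side in the constraints are regarded as target values rather than being treated as a threshold. An error $\epsilon_n$ is allowed so that misclassifications are tolerated in the case of overlapping distributions. Secondly, a squared loss function is taken for these error variables. The Lagrangian for (32.59) is

$$\mathcal{L}(w, b, \epsilon; \alpha) = J_P(w, \epsilon) - \sum_{n=1}^{N} \alpha_n \left( y_n \left( w^\top \phi(x_n) + b \right) - 1 + \epsilon_n \right) , \qquad (32.60)$$

where $\alpha$'s are Lagrange multipliers that can be positive or negative since the problem now only entails equality constraints. The KKT conditions for optimality yield

$$\begin{aligned}
& \frac{\partial \mathcal{L}}{\partial w} = 0 \;\rightarrow\; w = \sum_{n=1}^{N} \alpha_n y_n \phi(x_n) , \\
& \frac{\partial \mathcal{L}}{\partial b} = 0 \;\rightarrow\; \sum_{n=1}^{N} \alpha_n y_n = 0 , \\
& \frac{\partial \mathcal{L}}{\partial \epsilon_n} = 0 \;\rightarrow\; \gamma \epsilon_n = \alpha_n , \quad n = 1, \ldots, N , \\
& \frac{\partial \mathcal{L}}{\partial \alpha_n} = 0 \;\rightarrow\; y_n \left( w^\top \phi(x_n) + b \right) - 1 + \epsilon_n = 0 , \quad n = 1, \ldots, N .
\end{aligned} \qquad (32.61)$$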


By eliminating the primal variables $w$ and $\epsilon$, one obtains the KKT system [32.82]

$$\begin{bmatrix} 0 & y^\top \\ y & \Omega + I_N/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_N \end{bmatrix} , \qquad (32.62)$$

where

$$y = [y_1, y_2, \ldots, y_N]^\top$$

and

$$1_N = [1, 1, \ldots, 1]^\top$$

are N-dimensional vectors and $\Omega$ is defined entry-wise by

$$(\Omega)_{mn} = y_m y_n \phi(x_m)^\top \phi(x_n) = y_m y_n k(x_m, x_n) . \qquad (32.63)$$

In the latter, we used the kernel trick introduced before. The dual model obtained corresponds to (32.45), where $\alpha$ and $b$ are now obtained by solving the linear system (32.62), rather than a more complex QP problem, as in (32.42). Notably, for LS-SVMs (and related methods) one can exploit a number of computational shortcuts related to spectral properties of the kernel matrix; for instance, one can compute solutions for different values of $\gamma$ at the price of computing the solution of a single problem, which cannot be done for QPs [32.83-85].
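Training an LS-SVC therefore amounts to assembling and solving one $(N+1) \times (N+1)$ linear system. The following is a minimal sketch of (32.62) and (32.63), assuming a Gaussian RBF kernel, a dense solver, and a small hypothetical dataset of two well-separated clusters; the function names and the values of $\gamma$ and $\sigma$ are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)

def lssvc_fit(X, y, gamma, sigma):
    """Assemble and solve the KKT system (32.62); returns (b, alpha, Omega)."""
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)      # (32.63)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:], Omega

def lssvc_predict(X_new, X, y, alpha, b, sigma):
    """Dual classifier (32.45) with the LS-SVC multipliers."""
    return np.sign(rbf_kernel(X_new, X, sigma) @ (alpha * y) + b)

# Hypothetical training set: two clusters with labels -1 and +1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
gamma, sigma = 10.0, 1.0

b, alpha, Omega = lssvc_fit(X, y, gamma, sigma)
train_labels = lssvc_predict(X, X, y, alpha, b, sigma)
residual = Omega @ alpha + y * b + alpha / gamma - 1.0   # second block row of (32.62)
```

Unlike in SVC, all entries of $\alpha$ are in general nonzero, since each multiplier is proportional to the corresponding error variable, $\alpha_n = \gamma \epsilon_n$.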

The LS-SVC is easily extended to handle multiclass 
problems [32.26]. Extensive comparisons with alternative techniques (including SVC) for binary and multiclass classification are considered in [32.26, 86]. The results show that, in general, LS-SVC either outperforms or performs comparably to the alternative techniques. Interestingly, it is clear from the primal problem (32.59) that LS-SVC maximizes the margin while minimizing the within-class scattering from targets $\{+1, -1\}$. As
such, LS-SVC is naturally related to Fisher discriminant 
analysis in the feature space [32.26]; see also [32.2, 87, 
88]. 


Alternative Formulations 
Besides classification, a primal—dual approach simi- 
lar to the one introduced above has been considered 
for function estimation [32.26]; in this case too the 
dual model representation is obtained by solving a lin- 
ear system of equations rather than a QP problem, 
as in SVR. This approach to function estimation is 
similar to a number of techniques, including smooth- 
ing splines [32.13], regularization networks [32.44, 89], 
kernel ridge regression [32.90], and Kriging [32.91]. 
LS-SVM solutions also share similarities with Gaussian 
processes [32.92, 93], which we discuss in more detail 
in the next section. 

Other formulations have been considered within 
a primal—dual framework. These include principal com- 
ponent analysis [32.94], which we discuss next, spec- 
tral clustering [32.95], canonical correlation analy- 
sis [32.96], dimensionality reduction and data visual- 
ization [32.97], recurrent networks [32.98], and optimal 




control [32.99]; see also [32.100]. In all these cases the 
estimation problem of interest is conceived at the primal 
level as an optimization problem with equality con- 
straints, rather than inequality constraints as in SVMs. 
The constraints relate to the model which is expressed 
in terms of the feature map. From the KKT optimality 
conditions one jointly finds the optimal model repre- 
sentation and the model estimate. As for the case of 
classification the dual model representation is expressed 
in terms of kernel functions. 


Sparsity and Robustness 
An important difference with SVMs is that the dual
model found via LS-SVMs depends upon all the train- 
ing data. Reduction and pruning techniques have been 
used to achieve the sparse representation in a second 
stage [32.26, 101]. A different approach, which makes 
use of the primal-dual setting, leads to fixed-size tech- 
niques [32.26], which relate to the Nyström method 
proposed in [32.102] but lead to estimation in the pri- 
mal setting. Optimized versions of fixed-size LS-SVMs 
are currently applicable to large data sets with millions 
of data points for training and tuning on a personal com- 
puter [32.103]. 

In LS-SVM the estimation of the support values is 
only optimal in the case of a Gaussian distribution of the 
error variables [32.26]; [32.101] shows how to obtain 
robust estimates for regression by applying a weighted 
version of LS-SVM. The approach is suitable in the 
case of outliers or non-Gaussian error distributions with 
heavy tails. 


32.3.5 Kernel Principal Component Analysis 


Principal component analysis (PCA) is one of the 
most important techniques in the class of unsupervised 
learning algorithms. PCA linearly transforms a num- 
ber of possibly correlated variables into uncorrelated 
features called principal components. The transforma- 
tion is performed to find directions of maximal vari- 
ation. Often, few principal components can account 
for most of the structure in the original dataset. PCA 
is not suitable for discovering nonlinear relationships 
among the original variables. To overcome this limita- 
tion [32.104] originally proposed the idea of perform- 
ing PCA in a feature space rather than in the input 
space. 

Regardless of the space where the transformation is performed, there are a number of different ways to characterize the derivation of the PCA problem [32.105]. Ultimately, PCA is readily performed by solving an eigenvalue problem. Here we consider a primal-dual formulation similar to the one introduced above for LS-SVC [32.106]. In this way the eigenvalue problem is seen to arise from optimality conditions. Notably the approach emphasizes the underlying model, which is important for finding the projection of out-of-sample points along the direction of maximal variation. The analysis assumes the knowledge of N training data points

$$\{x_1, x_2, \ldots, x_N\} \subset \mathbb{R}^D \qquad (32.64)$$

drawn i.i.d. according to the generator $p(x)$. The starting point is to define the generic score variable $z$ as

$$z(x) = w^\top \left( \phi(x) - \hat{\mu}_\phi \right) . \qquad (32.65)$$

The latter represents one projection of $\phi(x) - \hat{\mu}_\phi$ into the target space. Note that we considered data centered in the feature space with

$$\hat{\mu}_\phi = \frac{1}{N} \sum_{n=1}^{N} \phi(x_n) \qquad (32.66)$$

corresponding to the center of the empirical distribution. The primal problem consists of the following constrained formulation [32.94]

$$\begin{aligned}
\max_{w, z} \quad & -\frac{1}{2} w^\top w + \frac{\gamma}{2} \sum_{n=1}^{N} z_n^2 \\
\text{subject to} \quad & z_n = w^\top \left( \phi(x_n) - \hat{\mu}_\phi \right) , \quad n = 1, \ldots, N ,
\end{aligned} \qquad (32.67)$$

where $\gamma > 0$. The latter maximizes the empirical variance of $z$ while keeping the norm of the corresponding parameter vector $w$ small by the regularization term. One can also include a bias term, see [32.26] for a derivation.

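The Lagrangian corresponding to (32.67) is

$$\mathcal{L}(w, z; \alpha) = -\frac{1}{2} w^\top w + \frac{\gamma}{2} \sum_{n=1}^{N} z_n^2 - \sum_{n=1}^{N} \alpha_n \left( z_n - w^\top \left( \phi(x_n) - \hat{\mu}_\phi \right) \right) , \qquad (32.68)$$

with conditions for optimality given by

$$\begin{aligned}
& \frac{\partial \mathcal{L}}{\partial w} = 0 \;\rightarrow\; w = \sum_{n=1}^{N} \alpha_n \left( \phi(x_n) - \hat{\mu}_\phi \right) , \\
& \frac{\partial \mathcal{L}}{\partial z_n} = 0 \;\rightarrow\; \alpha_n = \gamma z_n , \quad n = 1, \ldots, N , \\
& \frac{\partial \mathcal{L}}{\partial \alpha_n} = 0 \;\rightarrow\; z_n = w^\top \left( \phi(x_n) - \hat{\mu}_\phi \right) , \quad n = 1, \ldots, N .
\end{aligned} \qquad (32.69)$$

By eliminating the primal variables $w$ and $z$, one obtains for $n = 1, \ldots, N$

$$\frac{1}{\gamma} \alpha_n - \sum_{m=1}^{N} \alpha_m \left( \phi(x_n) - \hat{\mu}_\phi \right)^\top \left( \phi(x_m) - \hat{\mu}_\phi \right) = 0 . \qquad (32.70)$$

The latter is an eigenvalue decomposition that can be stated in matrix notation as

$$\Omega_c \alpha = \lambda \alpha , \qquad (32.71)$$

where $\lambda = 1/\gamma$ and $\Omega_c$ is the centered Gram matrix defined entry-wise by

$$[\Omega_c]_{nm} = k(x_n, x_m) - \frac{1}{N} \sum_{l=1}^{N} k(x_m, x_l) - \frac{1}{N} \sum_{l=1}^{N} k(x_n, x_l) + \frac{1}{N^2} \sum_{l=1}^{N} \sum_{i=1}^{N} k(x_l, x_i) . \qquad (32.72)$$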
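The eigenvalue problem (32.71) with the centered Gram matrix can be sketched in a few lines. The code below assumes a Gaussian RBF kernel, hypothetical random inputs, and eigenvectors normalized to unit length (the scaling returned by `numpy.linalg.eigh`; the formulation itself leaves the scale of $\alpha$ free). Training scores are obtained from the dual expression $z_n = \sum_m \alpha_m [\Omega_c]_{nm}$, which follows from the first and third conditions in (32.69).

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)

def centered_gram(K):
    """Centered Gram matrix: subtract row and column means, add the grand mean."""
    row_mean = K.mean(axis=1, keepdims=True)
    col_mean = K.mean(axis=0, keepdims=True)
    return K - row_mean - col_mean + K.mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))          # hypothetical training inputs
K = rbf_kernel(X, X, sigma=1.5)
Omega_c = centered_gram(K)

# Solve the symmetric eigenvalue problem and sort by decreasing eigenvalue.
lam, alpha = np.linalg.eigh(Omega_c)
order = np.argsort(lam)[::-1]
lam, alpha = lam[order], alpha[:, order]

# Column m holds the m-th score variable evaluated on the training points.
scores = Omega_c @ alpha
```

Because each column of `alpha` is an eigenvector, each column of `scores` equals the eigenvector scaled by its eigenvalue, in agreement with $z_n = \alpha_n / \gamma = \lambda \alpha_n$.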


As before, one may choose any positive definite kernel; a typical choice corresponds to the Gaussian RBF kernel. By solving the eigenvalue problem (32.71) one finds N pairs of eigenvalues and eigenvectors

$$(\lambda_m, \alpha^m) , \quad m = 1, 2, \ldots, N .$$

Correspondingly, one finds N score variables with dual representation

$$z_m(x) = \sum_{n=1}^{N} \alpha_n^m \left( k(x_n, x) - \frac{1}{N} \sum_{i=1}^{N} k(x, x_i) - \frac{1}{N} \sum_{i=1}^{N} k(x_n, x_i) + \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} k(x_j, x_i) \right) , \qquad (32.73)$$

in which $\alpha^m$ is the eigenvector associated to the eigenvalue $\lambda_m$. Note that all eigenvalues are real and nonnegative because $\Omega_c$ is symmetric and positive semidefinite; the eigenvectors are mutually orthogonal, i.e., $(\alpha^l)^\top \alpha^m = 0$, for $l \neq m$. Note that when the feature map is nonlinear, the number of score variables associated to nonzero eigenvalues might exceed the dimensionality $D$ of the input space. Typically, one then selects the minimal number of score variables that preserve a certain reconstruction accuracy, see [32.26, 104, 105].

Finally, observe that by the second optimality condition in (32.69), one has $z^l = \lambda_l \alpha^l$ for $l = 1, 2, \ldots, N$, where $z^l$ collects the values of the $l$-th score variable on the training points. From this, we obtain that the score variables are empirically uncorrelated. Indeed, we have for $l \neq m$

$$\sum_{n=1}^{N} z_l(x_n) z_m(x_n) = \lambda_l \lambda_m (\alpha^l)^\top \alpha^m = 0 . \qquad (32.74)$$


32.4 Gaussian Processes


So far we have dealt with primal-dual kernel methods; regularization was motivated by the SRM principle, which achieves generalization by trading off empirical risk with the complexity of the model class. This is representative of a large number of procedures. However, it leaves out an important class of kernel-based probabilistic techniques that goes under the name of Gaussian processes. In Gaussian processes one uses a Bayesian approach to perform inference and learning. The main idea goes back at least to the work of Wiener [32.107] and Kolmogorov [32.108] on time-series analysis.

As a first step, one poses a probabilistic model which serves as a prior. This prior is updated in the light of training data so as to obtain a predictive distribution. The latter represents a spectrum of possible answers. In contrast, in the standard SVM/LS-SVM framework one obtains only point-wise estimates. The approach, however, is analytically tractable only for a limited number of cases of interest. In the following, we summarize the main ideas in the context of regression where tractability is ensured by Gaussian posteriors; the interested reader is referred to [32.109] for an in-depth review.


32.4.1 Definition 


A real-valued stochastic process $f$ is a Gaussian process (GP) if for every finite set of indices $x_1, x_2, \ldots, x_N$ in an index set $\mathcal{X}$, the tuple

$$f_X = \left( f(x_1), f(x_2), \ldots, f(x_N) \right) \qquad (32.75)$$

is a multivariate Gaussian random variable taking values in $\mathbb{R}^N$. Note that the index set $\mathcal{X}$ represents the set of all possible inputs. This might be a countable set, such as $\mathbb{N}$ (e.g., a discrete time index) or, more commonly in machine learning, the Euclidean space $\mathbb{R}^D$. A GP $f$ is fully specified by a mean function $m : \mathcal{X} \to \mathbb{R}$ and a covariance function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ defined by

$$m(x) = \mathbb{E}[f(x)] , \qquad (32.76)$$
$$k(x, x') = \mathbb{E}\left[ (f(x) - m(x)) (f(x') - m(x')) \right] . \qquad (32.77)$$

In light of this, one writes

$$f \sim \mathcal{GP}(m, k) . \qquad (32.78)$$

Usually, for notational simplicity one takes the mean function to be zero, which we consider here; however, this need not be the case.

Note that the specification of the covariance function implies a distribution over any finite collection of random variables obtained by sampling the process $f$ at given locations. Specifically, we can write for (32.75)

$$f_X \sim \mathcal{N}(0, K) , \qquad (32.79)$$

which means that $f_X$ follows a multivariate zero-mean Gaussian distribution with $N \times N$ covariance matrix $K$ defined entry-wise by

$$[K]_{nm} = k(x_n, x_m) .$$


The typical use of a GP is in a regression context, which 
we consider next. 


32.4.2 GPs for Regression 


In regression one observes a dataset of input-output pairs $(x_n, y_n)$, $n = 1, 2, \ldots, N$ and wants to make a prediction at one or more test points. In the following, we call $y$ the vector obtained by stacking the target observations and denote by $X$ the collection of input training patterns (32.64). In order to carry out the Bayesian inference, one needs a model for the generating process. It is generally assumed that function values are observed in noise, that is,

$$y_n = f(x_n) + \epsilon_n , \quad n = 1, 2, \ldots, N . \qquad (32.80)$$

One further assumes that $\epsilon_n$ are i.i.d. zero-mean Gaussian random variables independent of the process $f$ and with variance $\sigma^2$. Under these circumstances, the noisy observations are Gaussian with mean zero and covariance function

$$c(x_m, x_n) = \mathbb{E}[y_m y_n] = k(x_m, x_n) + \sigma^2 \delta_{nm} , \qquad (32.81)$$

where the Kronecker delta function $\delta_{nm}$ is 1 if $n = m$ and 0 otherwise.

Suppose now that we are interested in the value $f_*$ of the process at a single test point $x_*$ (the approach that we discuss below extends to multiple test points in a straightforward manner). By relying on properties of Gaussian probabilities, we can readily write the joint distribution of the test function value and the noisy training observations. This reads

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left( 0 , \begin{bmatrix} K + \sigma^2 I_N & k_* \\ k_*^\top & k_{**} \end{bmatrix} \right) , \qquad (32.82)$$

where $I_N$ is the $N \times N$ identity matrix, $k_{**} = k(x_*, x_*)$, and finally

$$k_* = [k(x_1, x_*), k(x_2, x_*), \ldots, k(x_N, x_*)]^\top . \qquad (32.83)$$

Prediction with Noisy Observations
Using the conditioning rule for multivariate Gaussian distributions, namely

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} a \\ b \end{bmatrix} , \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix} \right) \;\Rightarrow\; y \mid x \sim \mathcal{N}\left( b + C^\top A^{-1} (x - a) , \; B - C^\top A^{-1} C \right) ,$$

one arrives at the key predictive equation for GP regression

$$f_* \mid y, X, x_* \sim \mathcal{N}(m_*, \sigma_*^2) , \qquad (32.84)$$

where

$$m_* = k_*^\top (K + \sigma^2 I_N)^{-1} y , \qquad (32.85)$$
$$\sigma_*^2 = k_{**} - k_*^\top (K + \sigma^2 I_N)^{-1} k_* . \qquad (32.86)$$

Note that, by letting $\alpha = (K + \sigma^2 I_N)^{-1} y$, one obtains for the mean value

$$m_* = \sum_{n=1}^{N} \alpha_n k(x_n, x_*) . \qquad (32.87)$$

Up to the bias term $b$, the latter coincides with the typical dual model representation (32.7), in which $x_*$ plays the role of a test point. Therefore, one can see that, in the framework of GPs, the covariance function plays the same role as the kernel function. The variance $\sigma_*^2$, on the other hand, is seen to be obtained from the prior covariance by subtracting a positive term which accounts for the information about the process conveyed by training data.
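The predictive equations (32.85) and (32.86) can be sketched as follows. The code assumes a Gaussian RBF covariance function, a Cholesky factorization in place of the explicit matrix inverse (a numerical choice made here, not prescribed by the text), and hypothetical noisy samples of a sine function; all names are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)

def gp_predict(X, y, X_star, noise_std, kernel):
    """Predictive mean (32.85) and variance (32.86) at each row of X_star."""
    K = kernel(X, X)
    k_star = kernel(X, X_star)                       # N x M, one column per test point
    k_ss = np.diag(kernel(X_star, X_star))           # prior variances at the test points
    C = K + noise_std**2 * np.eye(len(y))
    L = np.linalg.cholesky(C)                        # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = C^{-1} y
    mean = k_star.T @ alpha                          # equivalently (32.87)
    v = np.linalg.solve(L, k_star)
    var = k_ss - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical data: noisy observations of sin(x).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(8)
X_star = np.array([[1.3], [2.7], [9.0]])
noise_std = 0.1

kernel = lambda A, B: rbf_kernel(A, B, sigma=1.0)
mean, var = gp_predict(X, y, X_star, noise_std, kernel)
```

Far from the training inputs (here at $x_* = 9$) the vector $k_*$ is close to zero, so the predictive mean returns to the zero prior mean and the predictive variance approaches the prior variance $k_{**}$.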


Weight-Space View 

We have presented GPs through the so-called function 
space view [32.109], which ultimately captures the dis- 
tinctive nature of this class of methodologies. Here we 
illustrate a different view, which allows one to achieve 
three objectives: 1) it is seen that Bayesian linear models are a special instance of GPs; 2) the role of Bayes' rule is highlighted; 3) one obtains additional insight on the relationship with the feature map and kernel function used before within primal-dual techniques.


Bayesian Regression. The starting point is to characterize $f$ as a parametric model involving a set of basis functions $\psi_1, \psi_2, \ldots, \psi_F$

$$f(x) = \sum_{i=1}^{F} w_i \psi_i(x) = w^\top \psi(x) . \qquad (32.88)$$

Note that $F$ might be infinity. For the special case where $\psi$ is the identity mapping, one recognizes in (32.80) the standard modeling assumptions for Bayesian linear regression analysis. Inference is based on the posterior distribution over the weights, computed by Bayes' rule

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}} , \qquad p(w \mid y, X) = \frac{p(y \mid X, w) \, p(w)}{p(y \mid X)} , \qquad (32.89)$$

where the marginal likelihood (a.k.a. normalizing constant) is independent of the weights

$$p(y \mid X) = \int p(y \mid X, w) \, p(w) \, dw . \qquad (32.90)$$


Explicit Feature Space Formulation. To make a pre- 
diction for a test pattern x, we average over all possible 
parameter values with weights corresponding to the 
posterior probability 


P(faly. Xa) = | plwy Dw. 
(32.91) 


One can see that computing the posterior p(wly, X) 
based upon the prior 


p(w) = N (0, Xp) (32.92) 


gives the predictive model 
Faly, X, xa ~ N (Wore) Sway, 


WT Eya) — Wea) T WA WT Eevee), 
(32.93) 


where A = (W | X,Y +0°Iy)! and we denoted by 


Y = [y a), ya), VOn)] 


the feature representation of the training patterns. It is
not difficult to see that (32.93) is (32.84) in disguise. In
particular, one has

$$k(x, y) = \psi(x)^\top \Sigma_p \psi(y) . \qquad (32.94)$$

The positive definiteness of $\Sigma_p$ ensures the existence
and uniqueness of the square root $\Sigma_p^{1/2}$. If we now define

$$\phi(x) = \Sigma_p^{1/2} \psi(x) , \qquad (32.95)$$

we retrieve the relationship in (32.44). We conclude that
the kernel function considered in the previous sections
can be interpreted as the covariance function of a GP.
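The equivalence between the feature-space predictive model (32.93) and its kernel form can be checked numerically. The following is a minimal sketch under stated assumptions: the basis functions (monomials up to degree 2), the prior covariance, the noise level, and the toy data are all hypothetical choices made for illustration, and the kernel-form expressions are the standard GP regression formulas with $k(x, y) = \psi(x)^\top \Sigma_p \psi(y)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(x):
    """Hypothetical basis functions 1, x, x^2 (F = 3); returns an F x n array."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vstack([np.ones_like(x), x, x**2])

# Toy data, illustrative only
X = rng.uniform(-1.0, 1.0, size=8)
y = np.sin(3.0 * X) + 0.1 * rng.standard_normal(8)
x_star = 0.3
sigma2 = 0.1**2                        # noise variance sigma^2 (assumed)
Sigma_p = np.diag([1.0, 0.5, 0.25])    # prior covariance of the weights (assumed)
N = len(X)

Psi = psi(X)                 # F x N feature representation of the training patterns
psi_s = psi(x_star)[:, 0]    # feature vector of the test pattern

# Feature-space form, as in (32.93)
A = np.linalg.inv(Psi.T @ Sigma_p @ Psi + sigma2 * np.eye(N))
mean_fs = psi_s @ Sigma_p @ Psi @ A @ y
var_fs = (psi_s @ Sigma_p @ psi_s
          - psi_s @ Sigma_p @ Psi @ A @ Psi.T @ Sigma_p @ psi_s)

# Kernel form, using k(x, y) = psi(x)^T Sigma_p psi(y) as in (32.94)
def k(a, b):
    return psi(a).T @ Sigma_p @ psi(b)

K = k(X, X)
k_s = k(X, x_star)[:, 0]
mean_k = k_s @ np.linalg.solve(K + sigma2 * np.eye(N), y)
var_k = k(x_star, x_star)[0, 0] - k_s @ np.linalg.solve(K + sigma2 * np.eye(N), k_s)

# Posterior mean of the weights obtained from Bayes' rule, then prediction
A_w = Psi @ Psi.T / sigma2 + np.linalg.inv(Sigma_p)
w_mean = np.linalg.solve(A_w, Psi @ y) / sigma2
mean_w = psi_s @ w_mean
```

The three routes (explicit features, kernel, and posterior over the weights) should agree up to round-off; only the kernel form remains usable when F is infinite.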




32.4.3 Bayesian Decision Theory 


Bayesian inference is particularly appealing when prediction
is intended for supporting decisions. In this
case, one requires a loss function $L(f_{\text{true}}, f_{\text{guess}})$ which
specifies the penalty obtained by guessing $f_{\text{guess}}$ when
the true value is $f_{\text{true}}$. Note that the predictive distribution
(32.84) or, equivalently, (32.93) was derived
without reference to the loss function. This is a major
difference with respect to the techniques developed
within the framework of statistical learning. Indeed,
in the non-Bayesian framework of penalized empirical
risk minimization, prediction and loss are entangled;
one tackles learning in a somewhat more direct way.
In contrast, in the Bayesian setting there is a clear distinction
between 1) the model that generated the data
and 2) capturing the consequences of making guesses.
In light of this, [32.109] advises one to beware of arguments
like a Gaussian likelihood implies a squared
error loss. In order to find the point prediction that incurs
the minimal expected loss, one can define the merit
function

$$R(f_{\text{guess}} \mid y, X, x_*) = \int L(f_*, f_{\text{guess}})\, p(f_* \mid y, X, x_*)\, \mathrm{d}f_* . \qquad (32.96)$$

Note that, since the true value $f_{\text{true}}$ is unknown, the
latter averages with respect to the model's opinion
$p(f_* \mid y, X, x_*)$ on what the truth might be. The corresponding
best guess is

$$f_{\text{opt}} = \arg\min_{f_{\text{guess}}} R(f_{\text{guess}} \mid y, X, x_*) . \qquad (32.97)$$

Since $p(f_* \mid y, X, x_*)$ is Gaussian and hence symmetric,
$f_{\text{opt}}$ always coincides with the mean m whenever
the loss is also symmetric. However, in many practical
problems such as in critical safety applications, the loss
can be asymmetrical. In these cases, one must solve the
optimization problem in (32.97). Similar considerations
hold for classification, see [32.109]. For an account on
decision theory see [32.110].


32.5 Model Selection


Kernel-based models depend upon a number of parameters
which are determined during training by
numerical procedures. Still, one or more hyperparameters
usually need to be tuned by the user. In
SVC, for instance, one has to fix the value of c.
The choice of the kernel function, and of the corresponding
parameters, also needs to be properly
addressed.

In general, performance measures used for model
selection include k-fold cross-validation, leave-one-out
(LOO) cross-validation, generalized approximate
cross-validation (GACV), approximate span bounds,
VC bounds, and radius-margin bounds. For discussions
and comparisons see [32.111, 112]. Another approach
found in the literature is kernel-target alignment
[32.113].


32.5.1 Cross-Validation


In practice, model selection based on cross-validation
is usually preferred over generalization error bounds.
Criticism for cross-validation approaches is related
to the high computational load involved; [32.114]
presents an efficient methodology for hyperparameter
tuning and model building using LS-SVMs. The
approach is based on the closed form LOO cross-validation
computation for LS-SVMs, only requiring
the same computational cost of one single LS-SVM
training. Leave-one-out cross-validation-based
estimates of performance, however, generally exhibit
a relatively high variance and are, therefore, prone to
over-fitting. To amend this, [32.115] proposed the use
of Bayesian regularization at the second level of inference.
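To illustrate why a closed-form LOO computation costs about as much as a single training run, here is a minimal sketch for kernel ridge regression without a bias term. This simplified model is an assumption made for illustration and is not the LS-SVM procedure of [32.114]. For a linear smoother $\hat{y} = Hy$ with $H = K(K + \lambda I)^{-1}$, the LOO residual of pattern i equals $(y_i - \hat{y}_i)/(1 - H_{ii})$. The function names, the RBF kernel, and the data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def loo_residuals_closed_form(K, y, lam):
    """LOO residuals from a single fit: (y_i - yhat_i) / (1 - H_ii)."""
    N = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(N))   # hat (smoother) matrix
    return (y - H @ y) / (1.0 - np.diag(H))

def loo_residuals_brute_force(K, y, lam):
    """LOO residuals obtained by refitting N times, each time without pattern i."""
    N = len(y)
    res = np.empty(N)
    for i in range(N):
        keep = np.arange(N) != i
        alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(N - 1), y[keep])
        res[i] = y[i] - K[i, keep] @ alpha
    return res

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(30, 1))
y = np.sinc(X[:, 0]) + 0.1 * rng.standard_normal(30)
K = rbf_kernel(X, X, gamma=0.5)

fast = loo_residuals_closed_form(K, y, lam=0.1)
slow = loo_residuals_brute_force(K, y, lam=0.1)
loo_mse = np.mean(fast**2)   # a LOO score that could be minimized over lam and gamma
```

Both routines should return the same residuals up to round-off; the closed form needs one matrix inversion instead of N linear solves.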


32.5.2 Bayesian Inference of Hyperparameters


Many authors have proposed a full Bayesian framework
for kernel-based algorithms in the spirit of the methods
developed by MacKay for classical MLPs [32.116-118].
In particular, [32.26] discusses the case of LS-SVMs.
It is shown that, besides leading to tuning
strategies, the approach allows us to take probabilistic
interpretations of the outputs; [32.109] discusses
the Bayesian model selection for GPs. In general, the
Bayesian framework consists of multiple levels of inference.
The parameters (i.e., with reference to the primal
model, w and b) are inferred at level 1. Contrary to
MLPs, this usually entails the solution of a convex optimization
problem or even solving a linear system, as
in LS-SVMs and GPs. The regularization parameter(s)
and the kernel parameter(s) are inferred at higher levels.
The method progressively integrates out the parameters
by using the evidence at a certain level of inference as
the likelihood at the successive level.


32.6 More on Kernels


We have already seen that a kernel arising from an inner
product can be interpreted as the covariance function
of a Gaussian process. In this section we further study
the mathematical properties of different yet equivalent
notions of kernels. In particular, we will discuss that
positive definite kernels are reproducing, in a sense that
we are about to clarify. We will then review a number
of specialized kernels for learning problems involving
structured data.


32.6.1 Positive Definite Kernels


Denote by X a nonempty index set. A symmetric function
$k : X \times X \to \mathbb{R}$ is a positive definite kernel if for
any $N \in \mathbb{N}$ and for any tuple $(x_1, x_2, \ldots, x_N) \in X^N$, the
Gram matrix K defined entry-wise by $K_{nm} = k(x_n, x_m)$
satisfies (note that, by definition, a positive definite kernel
satisfies $k(x, x) \geq 0$ for any $x \in X$)

$$\alpha^\top K \alpha = \sum_{n=1}^{N} \sum_{m=1}^{N} K_{nm}\, \alpha_n \alpha_m \geq 0 \quad \forall \alpha \in \mathbb{R}^N .$$

In particular, suppose F is some Hilbert space (HS)
(for an elementary introduction see [32.66]) with inner
product $\langle \cdot, \cdot \rangle$. Then for any function $\phi : X \to F$ one has

$$\sum_{n=1}^{N} \sum_{m=1}^{N} \langle \phi(x_n), \phi(x_m) \rangle\, \alpha_n \alpha_m = \sum_{n=1}^{N} \sum_{m=1}^{N} \langle \alpha_n \phi(x_n), \alpha_m \phi(x_m) \rangle = \Big\| \sum_{n=1}^{N} \alpha_n \phi(x_n) \Big\|^2 \geq 0 . \qquad (32.98)$$

From the first line one can then see that

$$k : (x, y) \mapsto \langle \phi(x), \phi(y) \rangle \qquad (32.99)$$

is a positive definite kernel in the sense specified above.
A continuous positive definite kernel k is often called
a Mercer kernel [32.36].


Note that in Sect. 32.1.3 we denoted $\langle \phi(x), \phi(y) \rangle$ by
$\phi(x)^\top \phi(y)$, implicitly making the assumption that the
feature space F is a finite dimensional Euclidean space.
However, one can show that the feature space associated
to certain positive definite kernels (such as the Gaussian
RBF [32.119]) is an infinite dimensional HS; in turn,
the inner product in such a space is commonly denoted
as $\langle \cdot, \cdot \rangle$.

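The definition can be probed numerically on a finite sample. The sketch below, in which the Gaussian RBF bandwidth, the random data, and all names are assumptions made for illustration, checks that a Gram matrix is symmetric with nonnegative eigenvalues, and verifies the identity (32.98) for the linear kernel, whose feature map is the identity.

```python
import numpy as np

def gaussian_rbf(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((25, 3))     # 25 patterns in R^3 (illustrative)
alpha = rng.standard_normal(25)

# Gram matrix of the Gaussian RBF kernel: symmetric, eigenvalues >= 0
K = gaussian_rbf(X, X)
min_eig = np.linalg.eigvalsh(K).min()
quad = alpha @ K @ alpha             # the quadratic form alpha^T K alpha

# Linear kernel k(x, y) = x^T y, i.e. phi(x) = x: the quadratic form equals
# the squared norm of sum_n alpha_n phi(x_n), as in (32.98)
Phi = X
quad_lin = alpha @ (Phi @ Phi.T) @ alpha
sq_norm = np.linalg.norm(Phi.T @ alpha) ** 2
```

A finite-sample check of this kind can refute positive definiteness of a candidate kernel (one negative eigenvalue suffices), but it cannot prove it.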
32.6.2 Reproducing Kernels 


Evaluation Functional
Let $(H, \langle \cdot, \cdot \rangle)$ be a HS of real-valued functions (the
theory of RKHSs generally deals with complex-valued
functions [32.12, 14]; here we stick to the real setting
for simplicity) on X equipped with the norm $\|f\| = \sqrt{\langle f, f \rangle}$.
For $x \in X$ we denote by $L_x$ the evaluation functional

$$L_x : H \to \mathbb{R} , \quad f \mapsto f(x) . \qquad (32.100)$$

$L_x$ is said to be bounded if there exists $c > 0$ such that
$|L_x f| = |f(x)| \leq c \|f\|$ for all $f \in H$. By the Riesz representation
theorem [32.120] if $L_x$ is bounded then there
exists a unique $\eta_x \in H$ such that for any $f \in H$

$$L_x f = \langle f, \eta_x \rangle . \qquad (32.101)$$

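Reproducing Kernel
A function

$$k : X \times X \to \mathbb{R} , \quad (x, y) \mapsto k(x, y) \qquad (32.102)$$

is said to be a reproducing kernel of H if and only if

$$\forall x \in X , \quad k(\cdot, x) \in H , \qquad (32.103)$$
$$\forall x \in X ,\ \forall f \in H , \quad \langle f, k(\cdot, x) \rangle = f(x) . \qquad (32.104)$$

Note that by $k(\cdot, x)$ we mean the function $k(\cdot, x) : t \mapsto k(t, x)$.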




The definition of reproducing kernel (r.k.) implies
that $k(\cdot, x) = \eta_x$, i.e., $k(\cdot, x)$ is the representer of the
evaluation functional $L_x$; (32.104) goes under the name
of reproducing property. From (32.103) and (32.104) it
is clear that

$$k(x, y) = \langle k(\cdot, x), k(\cdot, y) \rangle , \quad \forall x, y \in X ; \qquad (32.105)$$

since $\langle \cdot, \cdot \rangle$ is symmetric, it follows that $k(x, y) = k(y, x)$.
A HS of functions that possesses a reproducing
kernel is called a reproducing kernel Hilbert space
(RKHS).

Finally, notice that the reproducing kernel (r.k.) of
a HS of functions corresponds in a one-to-one
manner with the definition of the inner product $\langle \cdot, \cdot \rangle$;
changing the inner product implies a change in the reproducing
kernel.


Basic Properties of RKHSs
Let $(G, \langle \cdot, \cdot \rangle)$ be a HS of functions. If, for any x, the
evaluation functional $L_x$ is bounded, then it is clear that
G is a RKHS with reproducing kernel

$$k(x, y) = \langle \eta_x, \eta_y \rangle . \qquad (32.106)$$

Vice versa, if G admits a reproducing kernel k, then all
evaluation functionals are bounded. Indeed, we have

$$|f(x)| = |\langle f, k(\cdot, x) \rangle| \leq \|k(\cdot, x)\|\, \|f\| = \sqrt{k(x, x)}\, \|f\| , \qquad (32.107)$$

where we simply relied on the Cauchy-Schwarz inequality.
Boundedness of evaluation functionals means
that all the functions in the space are well defined for
all x. Note that this is not the case, for instance, of the
space of square-integrable functions.
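The bound (32.107) can be observed on functions of the form $f = \sum_i \alpha_i k(\cdot, x_i)$, for which the reproducing property gives $f(t) = \sum_i \alpha_i k(x_i, t)$ and $\|f\|^2 = \alpha^\top K \alpha$. The following is a minimal sketch; the Gaussian RBF kernel, the centers, and the coefficients are hypothetical choices.

```python
import numpy as np

def gaussian_rbf(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
centers = rng.uniform(-3.0, 3.0, size=(10, 1))   # the points x_i (illustrative)
alpha = rng.standard_normal(10)                  # expansion coefficients

# RKHS norm of f = sum_i alpha_i k(., x_i): ||f||^2 = alpha^T K alpha
K = gaussian_rbf(centers, centers)
f_norm = np.sqrt(alpha @ K @ alpha)

# Evaluate f on a grid and compare with sqrt(k(t, t)) * ||f||
T = np.linspace(-5.0, 5.0, 201)[:, None]
f_vals = gaussian_rbf(T, centers) @ alpha
k_tt = np.ones(len(T))                           # k(t, t) = 1 for the Gaussian RBF
bound = np.sqrt(k_tt) * f_norm
```

For the Gaussian RBF kernel the bound is uniform in t, so every function in the RKHS is bounded in absolute value by its norm.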

It is not difficult to prove that, in a RKHS H, the
representation of a bounded linear functional A is simply
$A k(\cdot, x)$, i.e., it is obtained by applying A to the
representer of $L_x$. As an example, take the functional
evaluating the derivative of f at x

$$D_x : f \mapsto f'(x) .$$

If $D_x$ is bounded on H, then the property implies that

$$f'(x) = \langle f, k'(\cdot, x) \rangle , \quad \forall f \in H ,$$

where $k'(\cdot, x)$ is the derivative of the function $k(\cdot, x)$.


32.6.3 Equivalence Between the Two Notions


Moore-Aronszajn Theorem
If we let

$$\phi : X \to H , \quad x \mapsto k(x, \cdot) , \qquad (32.108)$$

one can see that, in light of (32.98), the reproducing
kernel k is also a positive definite kernel. The converse
result, stating that a positive definite kernel is the reproducing
kernel of a HS of functions $(H, \langle \cdot, \cdot \rangle)$, is found
in the Moore-Aronszajn theorem [32.12]. This completes
the equivalence between positive definite kernels
and reproducing kernels.


Feature Maps and the Mercer Theorem

Note that (32.108) is a first feature map associated to the
kernel function k. Correspondingly, this shows that the
RKHS H is a possible instance of the feature space.
A different feature map can be given in view of the
Mercer theorem [32.8, 121], which historically played
a major role in the development of kernel methods. The
theorem states that every positive definite kernel can be
written as

$$k(x, y) = \sum_{i=1}^{\infty} \mu_i\, e_i(x)\, e_i(y) , \qquad (32.109)$$

where the series in the right-hand side of (32.109) converges
absolutely and uniformly, $(e_i)_i$ is an orthonormal
sequence of eigenfunctions, and $(\mu_i)_i$ is the corresponding
sequence of nonnegative eigenvalues such that for
some measure $\nu$

$$\int k(x, y)\, e_i(y)\, \mathrm{d}\nu(y) = \mu_i e_i(x) \quad \forall x \in X , \qquad (32.110)$$

$$\int e_i(y)\, e_j(y)\, \mathrm{d}\nu(y) = \delta_{ij} . \qquad (32.111)$$


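The eigenfunctions $(e_i)_i$ belong to the RKHS $(H, \langle \cdot, \cdot \rangle)$
associated to k. In fact, by (32.110) one has

$$e_i(x) = \frac{1}{\mu_i} \int k(x, y)\, e_i(y)\, \mathrm{d}\nu(y) , \qquad (32.112)$$

and therefore $e_i$ can be approximated by elements
in the span of $(k_x)_{x \in X}$ [32.36]. One can further see
that $(\sqrt{\mu_i}\, e_i)_i$ is an orthonormal basis for H; indeed one
has

$$\langle \sqrt{\mu_i}\, e_i, \sqrt{\mu_j}\, e_j \rangle \overset{\text{by (32.110)}}{=} \Big\langle \frac{1}{\sqrt{\mu_i}} \int k(\cdot, y)\, e_i(y)\, \mathrm{d}\nu(y) ,\ \sqrt{\mu_j}\, e_j \Big\rangle = \frac{\sqrt{\mu_j}}{\sqrt{\mu_i}} \int \langle k(\cdot, y), e_j \rangle\, e_i(y)\, \mathrm{d}\nu(y) = \frac{\sqrt{\mu_j}}{\sqrt{\mu_i}} \int e_j(y)\, e_i(y)\, \mathrm{d}\nu(y) \overset{\text{by (32.111)}}{=} \frac{\sqrt{\mu_j}}{\sqrt{\mu_i}}\, \delta_{ij} = \delta_{ij} . \qquad (32.113)$$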


Note that we considered the general case in which 
the expansion (32.109) involves infinitely many terms. 
However, there are positive definite kernels (e.g., the 
polynomial kernel) for which only finitely many eigen- 
values are nonzero. 
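A finite-sample analogue of this remark can be observed on Gram matrices. The sketch below is an illustration under assumptions and not a computation of the Mercer eigenvalues themselves. It uses the homogeneous polynomial kernel of degree 2 on $\mathbb{R}^2$, whose feature space is spanned by the three monomials $x_1^2$, $x_1 x_2$, $x_2^2$, so its Gram matrix has at most three nonzero eigenvalues regardless of the number of patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))      # 20 patterns in R^2 (illustrative)

# Homogeneous polynomial kernel of degree 2: k(x, y) = (x^T y)^2
K = (X @ X.T) ** 2

# Eigenvalues of the 20 x 20 Gram matrix, sorted in decreasing order
eigvals = np.linalg.eigvalsh(K)[::-1]

# Count eigenvalues that are nonzero up to a relative tolerance (assumed 1e-10)
n_nonzero = int(np.sum(eigvals > 1e-10 * eigvals[0]))   # at most 3
```

In contrast, the Gram matrix of a Gaussian RBF kernel on distinct points has full rank, in line with its infinite dimensional feature space.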

In light of the Mercer theorem one can see that a different
feature map is given by

$$\phi : X \to F , \quad x \mapsto \big( \sqrt{\mu_i}\, e_i(x) \big)_i . \qquad (32.114)$$

Note that $\phi$ maps x into an infinite dimensional vector
with i-th entry $\phi_i(x) = \sqrt{\mu_i}\, e_i(x)$.
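On a finite sample, a feature map of the same form as (32.114) can be obtained from the eigendecomposition of the Gram matrix. The following sketch is a discrete analogue under stated assumptions (Gaussian RBF kernel, random data, hypothetical names), with eigenvectors of K playing the role of the eigenfunctions evaluated at the sample points.

```python
import numpy as np

def gaussian_rbf(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.standard_normal((15, 2))
K = gaussian_rbf(X, X)

# K = E diag(mu) E^T with orthonormal columns in E
mu, E = np.linalg.eigh(K)
mu = np.clip(mu, 0.0, None)      # remove tiny negative values caused by round-off

# Row n of Phi is the feature vector of pattern n: Phi[n, i] = sqrt(mu_i) * E[n, i]
Phi = E * np.sqrt(mu)

# Inner products of the feature vectors reproduce the kernel values
K_rec = Phi @ Phi.T
```

The reconstruction `K_rec` should match `K` up to round-off, mirroring the expansion (32.109) restricted to the sample.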

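Connecting Functional and Parametric View
One can see now that $\sum_i w_i \phi_i(x)$ in the primal model
(32.6) corresponds to the evaluation in x of a function
f in a RKHS. To see this, we start from decomposing f
according to the orthonormal basis $(\sqrt{\mu_i}\, e_i)_i$

$$f = \sum_i \langle f, \sqrt{\mu_i}\, e_i \rangle \sqrt{\mu_i}\, e_i = \sum_i w_i \sqrt{\mu_i}\, e_i , \qquad (32.115)$$

where we let $w_i = \langle f, \sqrt{\mu_i}\, e_i \rangle$. Now one has

$$L_x f = \langle f, k(\cdot, x) \rangle = \Big\langle \sum_i w_i \sqrt{\mu_i}\, e_i ,\ k(\cdot, x) \Big\rangle = \sum_i w_i \sqrt{\mu_i}\, \langle e_i, k(\cdot, x) \rangle = \sum_i w_i \sqrt{\mu_i}\, e_i(x) = \sum_i w_i \phi_i(x) , \qquad (32.116)$$

where we applied the reproducing property on $e_i$ and
used the definition of feature map (32.114). Additionally,
notice that one has

$$\|f\|^2 = \langle f, f \rangle = \Big\langle \sum_i w_i \sqrt{\mu_i}\, e_i ,\ \sum_j w_j \sqrt{\mu_j}\, e_j \Big\rangle = \sum_i \sum_j w_i w_j \langle \sqrt{\mu_i}\, e_i, \sqrt{\mu_j}\, e_j \rangle \overset{\text{by (32.113)}}{=} \sum_i w_i^2 . \qquad (32.117)$$

This shows that the penalty $w^\top w$, used within the
primal problems of Sect. 32.3, can be connected to
the squared norm of a function. The interested reader
is referred to [32.14, 36] for additional properties of
kernels.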
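The correspondence between the parametric and the functional view can be verified when an explicit finite dimensional feature map is available. The sketch below assumes the inhomogeneous polynomial kernel $k(x, y) = (1 + xy)^2$ on $\mathbb{R}$ with feature map $\phi(x) = (1, \sqrt{2}\,x, x^2)$; this is a convenient feature map for the check and not the Mercer map (32.114), and the data and names are hypothetical. For $f = \sum_n \alpha_n k(\cdot, x_n)$ one has $w = \sum_n \alpha_n \phi(x_n)$, $f(t) = w^\top \phi(t)$, and $\|f\|^2 = \alpha^\top K \alpha = w^\top w$.

```python
import numpy as np

def phi(x):
    """Explicit feature map of k(x, y) = (1 + x y)^2; returns an n x 3 array."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), np.sqrt(2.0) * x, x**2], axis=-1)

def k(a, b):
    """Kernel matrix of the inhomogeneous polynomial kernel of degree 2."""
    return (1.0 + np.outer(a, b)) ** 2

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=6)     # expansion points x_n (illustrative)
alpha = rng.standard_normal(6)          # dual coefficients

# Primal weights induced by the dual expansion
w = phi(xs).T @ alpha

# Function values: dual (kernel) form versus primal (parametric) form
t = np.linspace(-1.0, 1.0, 11)
f_kernel = k(t, xs) @ alpha
f_primal = phi(t) @ w

# Squared RKHS norm versus the penalty w^T w
norm_sq_kernel = alpha @ k(xs, xs) @ alpha
norm_sq_primal = w @ w
```

Both pairs should coincide up to round-off, which is the finite dimensional counterpart of (32.116) and (32.117).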


32.6.4 Kernels for Structured Data 


In applications where data are well represented by vec- 
tors in a Euclidean space one usually uses the Gaussian 
RBF kernel, which is universal [32.122]. Nonetheless, 
there exists an entire set of rules according to which 
one can design new kernels from elementary positive 
definite functions [32.1]. Although the idea of kernels 
has been around for a long time, it was only in the 
1990s that the machine learning community started 
to realize that the index set X does not need to be 
(a subset of) some Euclidean space. This significantly 
improved the applicability of kernel-based algorithms 
to a broad range of data types, including sequence 
data, graphs and trees, and XML and HTML docu- 
ments [32.123]. 


Probabilistic Kernels 

One powerful approach consists of applying a kernel 
that brings generative models into a (possibly discrimi- 
native) kernel-based method [32.124, 125]. Generative 
models can deal naturally with missing data and in 
the case of hidden Markov models can handle se- 
quences of varying length. A popular probabilistic 
similarity measure is the Fisher kernel [32.126, 127].
The key intuition behind this approach is that similarly 
structured objects should induce similar log-likelihood 
gradients in the parameters of a predefined class of 
generative models [32.126]. Different instances exist, 
depending on the generative model of interest, see 
also [32.1]. 
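As an illustration of the log-likelihood gradient idea, the sketch below builds a Fisher kernel for binary sequences of varying length. Everything specific here is an assumption made for the example: the generative model is an i.i.d. Bernoulli model with a single parameter theta, the kernel takes the common form $k(x, y) = U_x\, I^{-1}\, U_y$ with $U_x$ the gradient of $\log p(x \mid \theta)$, and the Fisher information I is replaced by the empirical second moment of the scores.

```python
import numpy as np

def score(seq, theta):
    """Gradient of the Bernoulli log-likelihood of a binary sequence w.r.t. theta."""
    seq = np.asarray(seq, dtype=float)
    n1 = seq.sum()             # number of ones
    n0 = len(seq) - n1         # number of zeros
    return n1 / theta - n0 / (1.0 - theta)

def fisher_kernel_matrix(seqs, theta):
    """Kernel matrix k(x, y) = U_x * U_y / I with an empirical estimate of I."""
    U = np.array([score(s, theta) for s in seqs])
    info = np.mean(U**2)       # empirical Fisher information (an approximation)
    return np.outer(U, U) / info

# Sequences of different lengths (illustrative data)
seqs = [[1, 1, 0, 1], [1, 0, 1, 1, 1, 0], [0, 0, 0, 1, 0]]

# Maximum likelihood estimate of theta over all observed symbols
theta = np.mean(np.concatenate(seqs))

K = fisher_kernel_matrix(seqs, theta)
```

Sequences dominated by the same symbol pull theta in the same direction and obtain a positive kernel value, while sequences pulling in opposite directions obtain a negative one. With richer models such as hidden Markov models the score becomes a vector with one entry per parameter.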




Graph Kernels and Dynamical Systems 

Graphs can very naturally represent entities, their at- 
tributes, and their relationships to other entities; this 
makes them one of the most widely used tools for 
modeling structured data. Various types of kernels for
graphs have been proposed, see [32.128, 129] and references
therein. The approach can be extended to carry out
recognition and decision tasks involving dynamical
systems; in fact, kernels on dynamical systems are
related to graph kernels through the dynamics of random
walks [32.128, 130].


Tensors and Kernels 
Tensors are multidimensional arrays that represent
higher-order generalizations of vectors and matrices.
Tensor-based methods are often particularly effective
at low signal-to-noise ratios and when the number
of observations is small in comparison with the dimensionality
of the data. They are used in domains
ranging from neuroscience to vision and chemometrics,
where tensors best capture the multiway nature
of the data [32.131]. The authors of [32.132] proposed
a family of kernels that exploit the algebraic
structure of data tensors. The approach is related to
a generalization of the singular value decomposition
(SVD) to higher-order tensors [32.133]. The essential
idea is to measure similarity based upon a Grassmannian
distance of the subspaces spanned by matrix
unfolding of data tensors. It can be shown that
the approach leads to perfect separation of tensors
generated by different sets of rank-1 atoms [32.132].
Within this framework, [32.134] proposed a kernel
function for multichannel signals; the idea exploits
the spectral information of tensors of fourth-order
cross-cumulants associated to each multichannel
signal.


32.7 Applications


Kernel methods have been shown to be successful in
many different applications. In this section we mention
only a few examples.


32.7.1 Text Categorization


Recognition of objects and handwritten digits is studied
in [32.135-137]; natural language text categorization is
discussed in [32.138, 139]. The task consists of classifying
documents based on their content. Attribute value
representation of text is used to adequately represent the
document text; typically, each distinct word in a document
represents a feature with values corresponding to
the number of occurrences.


32.7.2 Time-Series Analysis


The use of kernel methods for time-series prediction has
been discussed in a number of papers [32.140-144],
with applications ranging from electric load forecasting
[32.145] to financial time series prediction [32.146].
Nonlinear system identification by LS-SVMs is discussed
in [32.26] and references therein; [32.134] studies
the problem of training a discriminative classifier
given a set of labeled multivariate time series. Applications
include brain decoding tasks based on magnetoencephalography
(MEG) recordings.


32.7.3 Bioinformatics and Biomedical 
Applications 


Gene expression analysis performed by SVMs is dis- 
cussed in [32.147]. Applications in metabolomics, ge- 
netics, and proteomics are presented in the tutorial 
paper [32.148]; [32.149] discussed different techniques 
for the integration of side information in models based 
on gene expression data to improve the accuracy of 
diagnosis and prognosis in cancer; [32.150] provides 
an introduction to general data fusion problems using 
SVMs with application to computational biology prob- 
lems. Detection of remote protein homologies by SVMs 
is discussed in [32.151], which combines discrimi- 
native methods with generative models. Bioengineer- 
ing and bioinformatics applications can also be found 
in [32.152-154]. Survival analysis based on primal-dual
techniques is discussed in [32.155, 156].


References


32.1 J. Shawe-Taylor, N. Cristianini: Kernel Methods for Pattern Analysis (Cambridge Univ. Press, Cambridge 2004)
32.2 B. Schölkopf, A.J. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, Beyond (MIT Press, Cambridge 2002)
32.3 A.J. Smola, B. Schölkopf: A tutorial on support vector regression, Stat. Comput. 14(3), 199-222 (2004)
32.4 T. Hofmann, B. Schölkopf, A.J. Smola: Kernel methods in machine learning, Ann. Stat. 36(3), 1171-1220 (2008)
32.5 K.R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf: An introduction to kernel-based learning algorithms, IEEE Trans. Neural Netw. 12(2), 181-201 (2001)
32.6 F. Jäkel, B. Schölkopf, F.A. Wichmann: A tutorial on kernel methods for categorization, J. Math. Psychol. 51(6), 343-358 (2007)
32.7 C. Campbell: Kernel methods: A survey of current techniques, Neurocomputing 48(1), 63-84 (2002)
32.8 J. Mercer: Functions of positive and negative type, and their connection with the theory of integral equations, Philos. Trans. R. Soc. A 209, 415-446 (1909)
32.9 E.H. Moore: On properly positive Hermitian matrices, Bull. Am. Math. Soc. 23(59), 66-67 (1916)
32.10 T. Kailath: RKHS approach to detection and estimation problems - I: Deterministic signals in Gaussian noise, IEEE Trans. Inf. Theory 17(5), 530-549 (1971)
32.11 E. Parzen: An approach to time series analysis, Ann. Math. Stat. 32, 951-989 (1961)
32.12 N. Aronszajn: Theory of reproducing kernels, Trans. Am. Math. Soc. 68, 337-404 (1950)
32.13 G. Wahba: Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 59 (SIAM, Philadelphia 1990)
32.14 A. Berlinet, C. Thomas-Agnan: Reproducing Kernel Hilbert Spaces in Probability and Statistics (Springer, New York 2004)
32.15 S. Saitoh: Integral Transforms, Reproducing Kernels and Their Applications, Chapman Hall/CRC Research Notes in Mathematics, Vol. 369 (Longman, Harlow 1997)
32.16 M. Aizerman, E.M. Braverman, L.I. Rozonoer: Theoretical foundations of the potential function method in pattern recognition learning, Autom. Remote Control 25, 821-837 (1964)
32.17 V. Vapnik: Pattern recognition using generalized portrait method, Autom. Remote Control 24, 774-780 (1963)
32.18 V. Vapnik, A. Chervonenkis: A note on one class of perceptrons, Autom. Remote Control 25(1), 112-120 (1964)


32.19 V. Vapnik, A. Chervonenkis: Theory of Pattern Recognition (Nauka, Moscow 1974), in Russian, German Translation: W. Wapnik, A. Tscherwonenkis, Theorie der Zeichenerkennung (Akademie-Verlag, Berlin 1979)
32.20 V. Vapnik: Estimation of Dependences Based on Empirical Data (Springer, New York 1982)
32.21 B.E. Boser, I.M. Guyon, V.N. Vapnik: A training algorithm for optimal margin classifiers, Proc. 5th Ann. ACM Workshop Comput. Learn. Theory, ed. by D. Haussler (1992) pp. 144-152
32.22 I. Guyon, B. Boser, V. Vapnik: Automatic capacity tuning of very large VC-dimension classifiers, Adv. Neural Inf. Process. Syst. 5, 147-155 (1993)
32.23 I. Guyon, V. Vapnik, B. Boser, L. Bottou, S.A. Solla: Structural risk minimization for character recognition, Adv. Neural Inf. Process. Syst. 4, 471-479 (1992)
32.24 C. Cortes, V. Vapnik: Support vector networks, Mach. Learn. 20, 273-297 (1995)
32.25 V. Vapnik: The Nature of Statistical Learning Theory (Springer, New York 1995)
32.26 J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle: Least squares support vector machines (World Scientific, Singapore 2002)
32.27 O. Chapelle, B. Schölkopf, A. Zien: Semi-Supervised Learning (MIT Press, Cambridge 2006)
32.28 M. Belkin, P. Niyogi: Semi-supervised learning on Riemannian manifolds, Mach. Learn. 56(1), 209-239 (2004)
32.29 M. Belkin, P. Niyogi, V. Sindhwani: Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res. 7, 2399-2434 (2006)
32.30 M. Belkin, P. Niyogi: Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15(6), 1373-1396 (2003)
32.31 V. Sindhwani, P. Niyogi, M. Belkin: Beyond the point cloud: From transductive to semi-supervised learning, Int. Conf. Mach. Learn. (ICML), Vol. 22 (2005) pp. 824-831
32.32 V. Vapnik, A. Chervonenkis: The necessary and sufficient conditions for consistency in the empirical risk minimization method, Pattern Recognit. Image Anal. 1(3), 283-305 (1991)
32.33 V. Vapnik, A. Chervonenkis: Uniform convergence of frequencies of occurrence of events to their probabilities, Dokl. Akad. Nauk SSSR 181, 915-918 (1968)
32.34 V. Vapnik, A. Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl. 16(2), 264-280 (1971)
32.35 O. Bousquet, S. Boucheron, G. Lugosi: Introduction to statistical learning theory, Lect. Notes Comput. Sci. 3176, 169-207 (2004)




32.36 F. Cucker, D.X. Zhou: Learning Theory: An Approximation Theory Viewpoint, Cambridge Monographs on Applied and Computational Mathematics (Cambridge Univ. Press, New York 2007)
32.37 I. Steinwart, A. Christmann: Support Vector Machines, Information Science and Statistics (Springer, New York 2008)
32.38 V. Vapnik: Transductive inference and semi-supervised learning. In: Semi-Supervised Learning, ed. by O. Chapelle, B. Schölkopf, A. Zien (MIT Press, Cambridge 2006) pp. 453-472
32.39 A.N. Tikhonov: On the stability of inverse problems, Dokl. Akad. Nauk SSSR 39, 195-198 (1943)
32.40 A.N. Tikhonov: Solution of incorrectly formulated problems and the regularization method, Sov. Math. Dokl. 5, 1035 (1963)
32.41 A.N. Tikhonov, V.Y. Arsenin: Solutions of Ill-posed Problems (W.H. Winston, Washington 1977)
32.42 J. Hadamard: Sur les problèmes aux dérivées partielles et leur signification physique, Princet. Univ. Bull. 13, 49-52 (1902)
32.43 G. Kimeldorf, G. Wahba: Some results on Tchebycheffian spline functions, J. Math. Anal. Appl. 33, 82-95 (1971)
32.44 T. Evgeniou, M. Pontil, T. Poggio: Regularization networks and support vector machines, Adv. Comput. Math. 13(1), 1-50 (2000)
32.45 B. Schölkopf, R. Herbrich, A.J. Smola: A generalized representer theorem, Proc. Ann. Conf. Comput. Learn. Theory (COLT) (2001) pp. 416-426
32.46 F. Dinuzzo, B. Schölkopf: The representer theorem for Hilbert spaces: A necessary and sufficient condition, Adv. Neural Inf. Process. Syst. 25, 189-196 (2012)
32.47 S.P. Boyd, L. Vandenberghe: Convex Optimization (Cambridge Univ. Press, Cambridge 2004)
32.48 A.E. Hoerl, R.W. Kennard: Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12(1), 55-67 (1970)
32.49 D.W. Marquardt: Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation, Technometrics 12(3), 591-612 (1970)
32.50 C. Gu: Smoothing Spline ANOVA Models (Springer, New York 2002)
32.51 D.P. Bertsekas: Nonlinear Programming (Athena Scientific, Belmont 1995)
32.52 R. Tibshirani: Regression shrinkage and selection via the LASSO, J. R. Stat. Soc. Ser. B 58(1), 267-288 (1996)
32.53 P. Zhao, G. Rocha, B. Yu: The composite absolute penalties family for grouped and hierarchical variable selection, Ann. Stat. 37, 3468-3497 (2009)
32.54 R. Jenatton, J.Y. Audibert, F. Bach: Structured variable selection with sparsity-inducing norms, J. Mach. Learn. Res. 12, 2777-2824 (2011)
32.55 M. Yuan, Y. Lin: Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B 68(1), 49-67 (2006)


32.56 C.A. Micchelli, M. Pontil: Learning the Kernel Function via Regularization, J. Mach. Learn. Res. 6, 1099-1125 (2005)
32.57 C.A. Micchelli, M. Pontil: Feature space perspectives for learning the kernel, Mach. Learn. 66(2), 297-319 (2007)
32.58 F.R. Bach, G.R.G. Lanckriet, M.I. Jordan: Multiple kernel learning, conic duality, and the SMO algorithm, Proc. 21st Int. Conf. Mach. Learn. (ICML) (ACM, New York 2004)
32.59 G.R.G. Lanckriet, T. De Bie, N. Cristianini, M.I. Jordan, W.S. Noble: A statistical framework for genomic data fusion, Bioinformatics 20(16), 2626-2635 (2004)
32.60 F.R. Bach, R. Thibaux, M.I. Jordan: Computing regularization paths for learning multiple kernels, Adv. Neural Inf. Process. Syst. 17, 41-48 (2004)
32.61 J. Baxter: Theoretical models of learning to learn. In: Learning to Learn, ed. by L. Pratt, S. Thrun (Springer, New York 1997) pp. 71-94
32.62 R. Caruana: Multitask learning. In: Learning to Learn, ed. by S. Thrun, L. Pratt (Springer, New York 1998) pp. 95-133
32.63 S. Thrun: Life-long learning algorithms. In: Learning to Learn, ed. by S. Thrun, L. Pratt (Springer, New York 1998) pp. 181-209
32.64 A. Argyriou, T. Evgeniou, M. Pontil: Multi-task feature learning, Adv. Neural Inf. Process. Syst. 19, 41-48 (2007)
32.65 A. Argyriou, T. Evgeniou, M. Pontil: Convex multi-task feature learning, Mach. Learn. 73(3), 243-272 (2008)
32.66 L. Debnath, P. Mikusinski: Hilbert Spaces with Application (Elsevier, San Diego 2005)
32.67 M. Fazel: Matrix Rank Minimization with Application, Ph.D. Thesis (Stanford University, Stanford 2002)
32.68 Z. Liu, L. Vandenberghe: Semidefinite programming methods for system realization and identification, Proc. 48th IEEE Conf. Decis. Control (CDC) (2009) pp. 4676-4681
32.69 Z. Liu, L. Vandenberghe: Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl. 31(3), 1235-1256 (2009)
32.70 M. Signoretto, J.A.K. Suykens: Convex estimation of cointegrated var models by a nuclear norm penalty, Proc. 16th IFAC Symp. Syst. Identif. (SYSID) (2012)
32.71 A. Argyriou, C.A. Micchelli, M. Pontil: On spectral learning, J. Mach. Learn. Res. 11, 935-953 (2010)
32.72 J. Abernethy, F. Bach, T. Evgeniou, J.P. Vert: A new approach to collaborative filtering: Operator estimation with spectral regularization, J. Mach. Learn. Res. 10, 803-826 (2009)
32.73 P.L. Bartlett, S. Mendelson: Rademacher and Gaussian complexities: Risk bounds and structural results, J. Mach. Learn. Res. 3, 463-482 (2003)




32.74 P.K. Shivaswamy, T. Jebara: Maximum relative margin and data-dependent regularization, J. Mach. Learn. Res. 11, 747-788 (2010)
32.75 P.K. Shivaswamy, T. Jebara: Relative margin machines, Adv. Neural Inf. Process. Syst. 21(1-8), 7 (2008)
32.76 B. Schölkopf, A.J. Smola, R.C. Williamson, P.L. Bartlett: New support vector algorithms, Neural Comput. 12(5), 1207-1245 (2000)
32.77 J. Platt: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods - Support Vector Learning, ed. by B. Schölkopf, C.J.C. Burges, A.J. Smola (MIT Press, Cambridge 1999) pp. 185-208
32.78 C.C. Chang, C.J. Lin: LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011)
32.79 R.E. Fan, P.H. Chen, C.J. Lin: Working set selection using second order information for training support vector machines, J. Mach. Learn. Res. 6, 1889-1918 (2005)
32.80 T. Joachims: Making large-scale SVM learning practical. In: Advances in Kernel Methods - Support Vector Learning, ed. by B. Schölkopf, C.J.C. Burges, A.J. Smola (MIT Press, Cambridge 1999) pp. 169-184
32.81 J.A.K. Suykens, J. Vandewalle: Least squares support vector machine classifiers, Neural Process. Lett. 9(3), 293-300 (1999)
32.82 J. Nocedal, S.J. Wright: Numerical Optimization (Springer, New York 1999)
32.83 K. Pelckmans, J. De Brabanter, J.A.K. Suykens, B. De Moor: The differogram: Non-parametric noise variance estimation and its use for model selection, Neurocomputing 69(1), 100-122 (2005)
32.84 K. Saadi, G.C. Cawley, N.L.C. Talbot: Fast exact leave-one-out cross-validation of least-square support vector machines, Eur. Symp. Artif. Neural Netw. (ESANN-2002) (2002)
32.85 R.M. Rifkin, R.A. Lippert: Notes on regularized least squares, Tech. Rep. MIT-CSAIL-TR-2007-025, CBCL-268 (2007)
32.86 T. Van Gestel, J.A.K. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. De Moor, J. Vandewalle: Benchmarking least squares support vector machine classifiers, Mach. Learn. 54(1), 5-32 (2004)
32.87 G. Baudat, F. Anouar: Generalized discriminant analysis using a kernel approach, Neural Comput. 12(10), 2385-2404 (2000)
32.88 S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.R. Müller: Fisher discriminant analysis with kernels, Proc. 1999 IEEE Signal Process. Soc. Workshop (1999) pp. 41-48
32.89 T. Poggio, F. Girosi: Networks for approximation and learning, Proc. IEEE 78(9), 1481-1497 (1990)
32.90 C. Saunders, A. Gammerman, V. Vovk: Ridge regression learning algorithm in dual variables, Int. Conf. Mach. Learn. (ICML) (1998) pp. 515-521
32.91 N. Cressie: The origins of kriging, Math. Geol. 22(3), 239-252 (1990)
32.92 D.J.C. MacKay: Introduction to Gaussian processes, NATO ASI Ser. F Comput. Syst. Sci. 168, 133-166 (1998)
32.93 C.K.I. Williams, C.E. Rasmussen: Gaussian processes for regression, Advances in Neural Information Processing Systems, Vol. 8 (MIT Press, Cambridge 1996) pp. 514-520
32.94 J.A.K. Suykens, T. Van Gestel, J. Vandewalle, B. De Moor: A support vector machine formulation to pca analysis and its kernel version, IEEE Trans. Neural Netw. 14(2), 447-450 (2003)
32.95 C. Alzate, J.A.K. Suykens: Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA, IEEE Trans. Pattern Anal. Mach. Intell. 32(2), 335-347 (2010)
32.96 T. Van Gestel, J.A.K. Suykens, J. De Brabanter, B. De Moor, J. Vandewalle: Kernel canonical correlation analysis and least squares support vector machines, Lect. Notes Comput. Sci. 2130, 384-389 (2001)
32.97 J.A.K. Suykens: Data visualization and dimensionality reduction using kernel maps with a reference point, IEEE Trans. Neural Netw. 19(9), 1501-1517 (2008)
32.98 J.A.K. Suykens, J. Vandewalle: Recurrent least squares support vector machines, IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 47(7), 1109-1114 (2000)
32.99 J.A.K. Suykens, J. Vandewalle, B. De Moor: Optimal control by least squares support vector machines, Neural Netw. 14(1), 23-35 (2001)
32.100 J.A.K. Suykens, C. Alzate, K. Pelckmans: Primal and dual model representations in kernel-based learning, Stat. Surv. 4, 148-183 (2010)
32.101 J.A.K. Suykens, J. De Brabanter, L. Lukas, J. Vandewalle: Weighted least squares support vector machines: Robustness and sparse approximation, Neurocomputing 48(1), 85-105 (2002)
32.102 C.K.I. Williams, M. Seeger: Using the Nyström method to speed up kernel machines, Adv. Neural Inf. Process. Syst. 15, 682-688 (2001)
32.103 K. De Brabanter, J. De Brabanter, J.A.K. Suykens, B. De Moor: Optimized fixed-size kernel models for large data sets, Comput. Stat. Data Anal. 54(6), 1484-1504 (2010)
32.104 B. Schölkopf, A. Smola, K.-R. Müller: Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10, 1299-1319 (1998)
32.105 I. Jolliffe: Principal Component Analysis. In: Encyclopedia of Statistics in Behavioral Science (Wiley, Chichester 2005)
32.106 J.A.K. Suykens, T. Van Gestel, J. Vandewalle, B. De Moor: A support vector machine formulation to PCA analysis and its kernel version, IEEE Trans. Neural Netw. 14(2), 447-450 (2003)


ZE | d Hed 


604 PartD 


Neural Networks 


ZE | d Hed 


32.107 


32.108 


32.109 


32.110 


32.111 


32.112 


32.113 


32.114 


32.115 


32.116 


32.117 


32.118 


32.119 


32.120 


32.121 


32.122 


32.123 


32.124 


32.125 


32.126 


N. Weiner: Extrapolation, Interpolation, Smooth- 
ing of Stationary Time Series with Engineering 
Applications (MIT Press, Cambridge 1949) 

A.N. Kolmogorov: Sur I'interpolation et extrapo- 
lation des suites stationnaires, CR Acad. Sci. 208, 
2043-2045 (1939) 

C.E. Rasmussen, C.K.I. Williams: Gaussian Pro- 
cesses for Machine Learning, Vol. 1 (MIT Press, 
Cambridge 2006) 

J.0. Berger: Statistical Decision Theory and 
Bayesian Analysis (Springer, New York 1985) 

K. Duan, S.S. Keerthi, A.N. Poo: Evaluation of 
simple performance measures for tuning svm hy- 
perparameters, Neurocomputing 51, 41-59 (2003) 
P.L. Bartlett, S. Boucheron, G. Lugosi: Model se- 
lection and error estimation, Mach. Learn. 48(1), 
85-113 (2002) 

N. Shawe-Taylor, A. Kandola: On kernel target 
alignment, Adv. Neural Inf. Process. Syst. 14(1), 
367-373 (2002) 

G.C. Cawley: Leave-one-out cross-validation 
based model selection criteria for weighted 
LS-SVMS, Int. Joint Conf. Neural Netw. (CNN) 
(2006) pp. 1661-1668 

G.C. Cawley, N.L.C. Talbot: Preventing over-fitting 
during model selection via Bayesian regularisa- 
tion of the hyper-parameters, J. Mach. Learn. Res. 
8, 841-861 (2007) 

D.J.C. MacKay: Bayesian interpolation, Neural 
Comput. &, 415-447 (1992) 

D.J.C. MacKay: The evidence framework applied 
to classification networks, Neural Comput. 4&(5), 
720-736 (1992) 

D.J.C. MacKay: Probable networks and plausi- 
ble predictions — A review of practical Bayesian 
methods for supervised neural networks, Netw. 
Comput. Neural Syst. 6(3), 469-505 (1995) 

l. Steinwart, D. Hush, C. Scovel: An explicit de- 
scription of the reproducing kernel Hilbert spaces 
of Gaussian RBF kernels, IEEE Trans. Inform. The- 
ory 52, 4635-4643 (2006) 

J.B. Conway: A Course in Functional Analysis 
(Springer, New York 1990) 

F. Riesz, B.S. Nagy: Functional Analysis (Frederick 
Ungar, New York 1955) 

|. Steinwart: On the influence of the kernel on the 
consistency of support vector machines, J. Mach. 
Learn. Res. 2, 67-93 (2002) 

T. Gärtner: Kernels for Structured Data, Ma- 
chine Perception and Artificial Intelligence, Vol. 72 
(World Scientific, Singapore 2008) 

D. Haussler: Convolution kernels on discrete struc- 
tures, Tech. Rep. (UC Santa Cruz, Santa Cruz 1999) 
T. Jebara, R. Kondor, A. Howard: Probability prod- 
uct kernels, J. Mach. Learn. Res. 5, 819-844 
(2004) 

T.S. Jaakkola, D. Haussler: Exploiting generative 
models in discriminative classifiers, Adv. Neural 
Inf. Process. Syst. 11, 487-493 (1999) 


32.127 


32.128 


32.129 


32.130 


32.131 


32.132 


32.133 


32.134 


32.135 


32.136 


32.137 


32.138 


32.139 


32.140 


32.141 


32.142 


K. Tsuda, S. Akaho, M. Kawanabe, K.R. Miller: 
Asymptotic properties of the Fisher kernel, Neural 
Comput. 16(1), 115-137 (2004) 

S.V.N. Vishwanathan, N.N. Schraudolph, R. Kon- 
dor, K.M. Borgwardt: Graph kernels, J. Mach. 
Learn. Res. 11, 1201-1242 (2010) 

T. Gärtner, P. Flach, S. Wrobel: On graph kernels: 
Hardness results and efficient alternatives, Lect. 
Notes Comput. Sci. 2777, 129-143 (2003) 

S.V.N. Vishwanathan, A.J. Smola, R. Vidal: Binet- 
Cauchy kernels on dynamical systems and its 
application to the analysis of dynamic scenes, Int. 
J. Comput. Vis. 73(1), 95-119 (2007) 

P.M. Kroonenberg: Applied Multiway Data Analy- 
sis (Wiley, Hoboken 2008) 

M. Signoretto, L. De Lathauwer, J.A.K. Suykens: 
A kernel-based framework to tensorial data anal- 
ysis, Neural Netw. 24(8), 861-874 (2011) 

L. De Lathauwer, B. De Moor, J. Vandewalle: 
A multilinear singular value decomposition, SIAM 
J. Matrix Anal. Appl. 21(4), 1253-1278 (2000) 

M. Signoretto, E. Olivetti, L. De Lathauwer, 
J.A.K. Suykens: Classification of multichannel sig- 
nals with cumulant-based kernels, IEEE Trans. 
Signal Process. 60(5), 2304-2314 (2012) 

Y. LeCun, L.D. Jackel, L. Bottou, A. Brunot, 
C. Cortes, J.S. Denker, H. Drucker, |. Guyon, 
U.A. Muller, E. Sackinger, P. Simard, V. Vapnik: 
Comparison of learning algorithms for handwrit- 
ten digit recognition, Int. Conf. Artif. Neural Netw. 
(ICANN) 2 (1995) pp. 53-60 

D. Decoste, B. Schdlkopf: Training invariant sup- 
port vector machines, Mach. Learn. 46(1), 161-190 
(2002) 

V. Blanz, B. Schdlkopf, H. Bulthoff, C. Burges, 
V. Vapnik, T. Vetter: Comparison of view-based 
object recognition algorithms using realistic 3D 
models, Lect. Notes Comput. Sci. 1112, 251-256 
(1996) 

T. Joachims: Text categorization with support vec- 
tor machines: Learning with many relevant fea- 
tures, Lect. Notes Comput. Sci. 1398, 137-142 (1998) 
S. Dumais, J. Platt, D. Heckerman, M. Sahami: In- 
ductive learning algorithms and representations 
for text categorization, Proc. 7th Int. Conf. Inf. 
Knowl. Manag. (1998) pp. 148-155 

S. Mukherjee, E. Osuna, F. Girosi: Nonlinear pre- 
diction of chaotic time series using support vector 
machines, 1997 IEEE Workshop Neural Netw. Sig- 
nal Process. VII (1997) pp. 511-520 

D. Mattera, S. Haykin: Support vector machines for 
dynamic reconstruction of a chaotic system. In: 
Advances in Kernel Methods, ed. by B. Schölkopf, 
C.J.C. Burges, A.J. Smola (MIT Press, Cambridge 
1999) pp. 211-241 

K.R. Müller, A. Smola, G. Rätsch, B. Schölkopf, 
J. Kohlmorgen, V. Vapnik: Predicting time series 
with support vector machines, Lect. Notes Com- 
put. Sci. 1327, 999-1004 (1997) 


Kernel Methods 


References 


32.143 


32.144 


32.145 


32.146 


32.147 


32.148 


32.149 


M. Espinoza, J.A.K. Suykens, B. De Moor: Short 
term chaotic time series prediction using sym- 
metric Is-svm regression, Proc. 2005 Int. Symp. 
Nonlinear Theory Appl. (NOLTA) (2005) pp. 606- 
609 

M. Espinoza, T. Falck, J.A.K. Suykens, B. De Moor: 
Time series prediction using Is-svms, Eur. Symp. 
Time Ser. Prediction (ESTSP), Vol. 8 (2008) pp. 159- 
168 

M. Espinoza, J.A.K. Suykens, R. Belmans, B. De 
Moor: Electric load forecasting, IEEE Control Syst. 
27(5), 43-57 (2007) 

T. Van Gestel, J.A.K. Suykens, D.E. Baestaens, 
A. Lambrechts, G. Lanckriet, B. Vandaele, B. De 
Moor, J. Vandewalle: Financial time series predic- 
tion using least squares support vector machines 
within the evidence framework, IEEE Trans. Neu- 
ral Netw. 12(4), 809-821 (2001) 

M.P.S. Brown, W.N. Grundy, D. Lin, N. Cristian- 
ini, C.W. Sugnet, T.S. Furey, M. Ares, D. Haussler: 
Knowledge-based analysis of microarray gene 
expression data by using support vector ma- 
chines, Proc. Natl. Acad. Sci. USA 97(1), 262-267 
(2000) 

J. Luts, F. Ojeda, R. Van de Plas, B. De Moor, S. Van 
Huffel, J.A.K. Suykens: A tutorial on support vector 
machine-based methods for classification prob- 
lems in chemometrics, Anal. Chim. Acta 665(2), 
129 (2010) 

A. Daemen, M. Signoretto, 0. Gevaert, J.A.K. Suy- 
kens, B. De Moor: Improved microarray-based 
decision support with graph encoded interac- 
tome data, PLoS ONE 5(4), 1-16 (2010) 


32.150 


32.151 


32.152 


32.153 


32.154 


32.155 


32.156 


S. Yu, L.C. Tranchevent, B. Moor, Y. Moreau: 
Kernel-based Data Fusion for Machine Learning, 
Studies in Computational Intelligence, Vol. 345 
(Springer, Berlin 2011) 

T. Jaakkola, M. Diekhans, D. Haussler: A dis- 
criminative framework for detecting remote pro- 
tein homologies, J. Comput. Biol. 7(1/2), 95-114 
(2000) 

C. Lu, T. Van Gestel, J.A.K. Suykens, S. Van Huffel, 
D. Timmerman, |. Vergote: Classification of ovar- 
ian tumors using Bayesian least squares support 
vector machines, Lect. Notes Artif. Intell. 2780, 
219-228 (2003) 

F. Ojeda, M. Signoretto, R. Van de Plas, 
E. Waelkens, B. De Moor, J.A.K. Suykens: Semi- 
supervised learning of sparse linear models 
in mass spectral imaging, Pattern Recognit. 
Bioinform. (PRIB) (Nijgmegen) (2010) pp. 325-334 
D. Widjaja, C. Varon, A.C. Dorado, J.A.K. Suykens, 
S. Van Huffel: Application of kernel principal com- 
ponent analysis for single lead ECG-derived res- 
piration, IEEE Trans. Biomed. Eng. 59(4), 1169-1176 
(2012) 

V. Van Belle, K. Pelckmans, S. Van Huffel, 
J.A.K. Suykens: Support vector methods for sur- 
vival analysis: A comparison between ranking 
and regression approaches, Artif. Intell. Med. 
53(2), 107-118 (2011) 

V. Van Belle, K. Pelckmans, S. Van Huffel, 
J.A.K. Suykens: Improved performance on 
high-dimensional survival data by application 
of survival-SVM, Bioinformatics 27(1), 87-94 
(2011) 


605 


ZE | d Hed 


33. Neurodynamics

Robert Kozma, Jun Wang, Zhigang Zeng

This chapter introduces basic concepts, phenomena, and properties of neurodynamic systems. It consists of four sections, with the first two on various behaviors of general neurodynamics and the last two on two specific types of neurodynamic systems. The neurodynamic behaviors discussed in the first two sections include attractivity, oscillation, synchronization, and chaos. The two specific neurodynamic systems are memristive neurodynamic systems and neurodynamic optimization systems.

33.1 Dynamics of Attractor and Analog Networks
  33.1.1 Phase Space and Attractors
  33.1.2 Single Attractors of Dynamical Systems
  33.1.3 Multiple Attractors of Dynamical Systems
  33.1.4 Conclusion
33.2 Synchrony, Oscillations, and Chaos in Neural Networks
  33.2.1 Synchronization
  33.2.2 Oscillations in Neural Networks
  33.2.3 Chaotic Neural Networks
33.3 Memristive Neurodynamics
  33.3.1 Memristor-Based Synapses
  33.3.2 Memristor-Based Neural Networks
  33.3.3 Conclusion
33.4 Neurodynamic Optimization
  33.4.1 Neurodynamic Models
  33.4.2 Design Methods
  33.4.3 Selected Applications
  33.4.4 Concluding Remarks
References


33.1 Dynamics of Attractor and Analog Networks 


An attractor is a well-known mathematical object that is central to the theory of nonlinear dynamical systems (NDS), one of the indispensable conceptual underpinnings of complexity science. An attractor is a set towards which the state of a nonlinear dynamical system evolves over time, such that points that get close enough to the attractor remain close even if they are slightly disturbed. To appreciate what an attractor is, some related NDS notions are needed, such as phase or state space, phase portraits, basins of attraction, initial conditions, transients, bifurcations, chaos, and strange attractors; these help to tame some of the unruliness of complex systems.

Most of us have at least some inkling of what nonlinear means. It can be illustrated by the best-known and most vivid example, the butterfly effect of a chaotic (and hence nonlinear) system. This effect has prompted the image of tiny air currents produced by a butterfly flapping its wings in Brazil, which are then amplified to the extent that they may influence the buildup of a thunderhead in Kansas. Although no one can actually claim that there is such a linkage between Brazilian lepidopterological dynamics and climatology in the Midwest of the USA, the image does serve to portray nonlinearity in the extreme.

Owing to both the nonlinearity and the capacity to pass through different regimes of stability and instability, the outcomes of a nonlinear dynamical system can be unpredictable. These different regimes of a dynamical system are understood as different phases governed by different attractors, which means that the dynamics of each phase are constrained within the circumscribed range allowed by that phase's attractors.




33.1.1 Phase Space and Attractors 


To better grasp the idea of phase space, both time series and phase portraits are used to represent the data points. A time series chart displays the values of the variables on the y-axis (or the z-axis) against time on the x-axis. The phase portrait, in contrast, plots the variables against each other and leaves time as an implicit dimension that is not explicitly plotted. Attractors can be displayed in phase portraits as the long-term stable sets of points of the dynamical system, that is, the locations in the phase portrait towards which the system's dynamics are attracted after transient phenomena have died down. Two examples are employed to illustrate phase space and attractors.



Imagine a child on a swing and a parent who pulls the swing back and gives it a good push, so that the child moves forward and then backward on the swing, as shown in Fig. 33.1. If it is not pushed again, the swing will come to rest, as shown in the time series chart and in phase space. The time series shows an oscillation of the speed of the swing, which slows down and eventually stops, i.e., the curve flattens into a line. In phase space, the swing's speed is plotted against the distance of the swing from the central point. This central point is called a fixed point attractor since it attracts the system's dynamics in the long run. The fixed point attractor in the center of Fig. 33.2 is equivalent to the flat line in Fig. 33.3. The fixed point attractor is another way to see and say that an unpushed swing will come to a state of rest in the long term. The curved lines with arrows spiraling down to the center point in Fig. 33.2 display what is called the basin of attraction for the unpushed swing. The basin of attraction represents the various initial conditions of the unpushed swing, such as starting heights and initial velocities.
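The fixed point attractor of the unpushed swing can be reproduced numerically. The following is a minimal sketch, assuming the swing is modeled as a damped pendulum with normalized units, theta'' = -sin(theta) - damping * theta'; the model, the damping value, and the function name `simulate_unpushed_swing` are illustrative assumptions and are not taken from the chapter.

```python
import math

def simulate_unpushed_swing(theta0, omega0, damping=0.5, dt=0.001, t_end=60.0):
    """Integrate theta'' = -sin(theta) - damping * theta' (assumed swing model).

    Uses a semi-implicit Euler scheme and returns the phase-space
    trajectory as a list of (angle, angular velocity) pairs.
    """
    theta, omega = theta0, omega0
    trajectory = [(theta, omega)]
    for _ in range(int(t_end / dt)):
        omega += (-math.sin(theta) - damping * omega) * dt  # update velocity first
        theta += omega * dt                                 # then update the angle
        trajectory.append((theta, omega))
    return trajectory

# Different starting heights and initial velocities (all below the energy
# needed to swing over the top, so they lie in the basin of the rest state).
initial_conditions = [(1.0, 0.0), (-0.5, 0.8), (0.2, -1.0)]
final_states = [simulate_unpushed_swing(th, om)[-1] for th, om in initial_conditions]

for (th0, om0), (th, om) in zip(initial_conditions, final_states):
    print(f"start=({th0:+.1f}, {om0:+.1f}) -> end=({th:+.5f}, {om:+.5f})")
```

Under these assumptions, every trajectory spirals into the point (0, 0) of the phase plane, which plays the role of the fixed point attractor; plotting the stored pairs against each other would give a phase portrait of the kind described for Fig. 33.2.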


Fig. 33.1 Schematics of an unpushed swing (labels: bob's trajectory, massive bob, equilibrium position)

Fig. 33.2 Phase portrait and fixed point attractor of an unpushed swing

Fig. 33.3 Time series of the unpushed swing

Fig. 33.4 Time series chart of the pushed swing




Now consider a similar dynamical system in which the swing is pushed each time it comes back to where the parent is standing. The time series chart of the pushed swing is shown in Fig. 33.4 as a continuing oscillation. This oscillation is around a zero value for y; it is positive when the swing is going in one direction and negative when it is going in the other direction. The phase space diagram, which plots the state variables against each other, is shown in Fig. 33.5. The unbroken oval in Fig. 33.5 is a different kind of attractor from the fixed point in Fig. 33.2. This attractor is known as the limit cycle or periodic attractor of the pushed swing. It is called a limit cycle because it represents the cyclical behavior of the oscillations of the pushed swing as a limit to which the dynamical system adheres under the sway of this attractor. It is periodic because the system oscillates around the same values: the swing keeps going up and down, reaching the same heights above its lowest point. Such a dynamical system is called periodic because it has a repeating cycle or pattern.
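A limit cycle attractor can be illustrated numerically as well. The sketch below does not model the pushed swing itself; as a labeled stand-in it uses the van der Pol oscillator, x'' = mu * (1 - x^2) * x' - x, a standard textbook system with a single stable limit cycle. The helper names and all parameter values are assumptions made for illustration only.

```python
def van_der_pol(state, mu=1.0):
    """Right-hand side of the van der Pol oscillator (stand-in model)."""
    x, v = state
    return (v, mu * (1.0 - x * x) * v - x)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def late_amplitude(x0, v0, dt=0.01, t_transient=50.0, t_measure=20.0):
    """Discard the transient, then return the largest |x| seen afterwards."""
    state = (x0, v0)
    for _ in range(int(t_transient / dt)):
        state = rk4_step(van_der_pol, state, dt)
    peak = 0.0
    for _ in range(int(t_measure / dt)):
        state = rk4_step(van_der_pol, state, dt)
        peak = max(peak, abs(state[0]))
    return peak

# One start inside the cycle and one start far outside it
amp_small = late_amplitude(0.1, 0.0)
amp_large = late_amplitude(4.0, 0.0)
print(amp_small, amp_large)
```

Both initial conditions end up oscillating with the same amplitude (close to 2 for mu = 1), which is the sense in which the closed orbit acts as a limit for trajectories started on either side of it.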

By now, what we have learned about attractors can be summarized as follows. They are displayed spatially in phase portraits of a dynamical system as it changes over the course of time, and thus they represent the long-term dynamics of the system: whatever the initial conditions, represented as data points, as long as their trajectories in phase space fall within the basin of attraction, they are attracted to the attractor. In spite of its wide usage in mathematics and science, as Robinson points out, there is still no precise definition of an attractor, although many have been offered [33.2]. He therefore suggests thinking about an attractor as a phase portrait that attracts a large set of initial conditions and has some sort of minimality property, i.e., it is the smallest such portrait in the phase space of the system. The attractor has the property of attracting the initial conditions after any initial transient behavior has died down. The minimality requirement implies the invariance or stability of the attractor. As a minimal object, the attractor cannot be split up into smaller subsets, and it retains its role as what dominates a dynamical system during a particular phase of its evolution.

Fig. 33.5 Phase portrait and limit cycle attractor of a pushed swing (after [33.1]); axes: v1 (horizontal), v2 (vertical)


33.1.2 Single Attractors of Dynamical Systems


Standard methods for studying the stability of dynamical systems with a unique attractor include the Lyapunov method, LaSalle's invariance principle, and combinations thereof. Usually, given the properties of a (unique) attractor, we can realize a dynamical system with such an attractor.

Since the creation of the fundamental theorems of Lyapunov stability, many researchers have gone further and proved that most of these theorems have converses. In theory, this shows that the theorems are efficacious; i.e., a corresponding Lyapunov function necessarily exists if the solution has some kind of stability. However, constructing an appropriate V function for determining stability remains a topic of active research interest: the gap between the existence of a Lyapunov function and its construction is large, and there is no general rule for the construction. Different researchers use different construction methods based on their experience and technique, and those who can construct a good Lyapunov function obtain more useful information to demonstrate the effectiveness of their theories. Many successful Lyapunov functions have a practical background. For example, some equations derived from a physical model have a clear physical meaning, as in a conservative mechanical system, in which the sum of the kinetic energy and the potential energy is an appropriate V function. The linear approximation method can also be used: for a nonlinear differential equation, first find a positive definite quadratic-form V function for the corresponding linear differential equation, and then take the nonlinear terms into account to construct a similar V function.
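As a worked illustration of the energy-based choice of V, consider a pendulum with viscous damping in normalized units. This example system, the damping coefficient c, and the normalization are assumptions introduced here for illustration; they do not appear in the chapter.

```latex
% Assumed example system: damped pendulum, normalized units, c > 0
\ddot{\theta} + c\,\dot{\theta} + \sin\theta = 0 .

% Candidate V function: kinetic plus potential energy,
% positive definite for |\theta| < 2\pi with V(0,0) = 0
V(\theta,\dot{\theta}) = \tfrac{1}{2}\,\dot{\theta}^{2} + (1 - \cos\theta) .

% Derivative of V along trajectories of the system
\begin{aligned}
\dot{V} &= \dot{\theta}\,\ddot{\theta} + \dot{\theta}\,\sin\theta \\
        &= \dot{\theta}\,\bigl(-c\,\dot{\theta} - \sin\theta\bigr) + \dot{\theta}\,\sin\theta \\
        &= -c\,\dot{\theta}^{2} \;\le\; 0 .
\end{aligned}
```

Since the derivative is only negative semidefinite (it vanishes whenever the angular velocity is zero), the Lyapunov method alone gives stability of the rest state; combining it with LaSalle's invariance principle, as mentioned above, gives asymptotic stability, because the only trajectory that stays in the set where the angular velocity is zero near the origin is the rest state itself.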

Grossberg proposed and studied additive neural networks, so called because they add nonlinear contributions to the neuron activity. The additive neural network has been used for many applications since the 1960s [33.3, 4], including the introduction of self-organizing maps. In the past decades, neural networks, as a special kind of nonlinear system, have received considerable attention. The study of recurrent neural networks and their various generalizations has been an active research area [33.5-17]. The stability of recurrent neural networks is a prerequisite for almost all neural network applications. Stability analysis is primarily concerned with the existence and uniqueness of equilibrium points and with the global asymptotic stability, global exponential stability, and global robust stability of neural networks at equilibria. In recent years, the stability analysis of recurrent neural networks with time delays has received much attention [33.18, 19]. Single attractors of dynamical systems are shown in Fig. 33.6.


33.1.3 Multiple Attractors of Dynamical Systems


Multistable systems have attracted extensive interest in both modeling studies and neurobiological research in recent years owing to their ability to emulate and explain biological behavior [33.20-34]. Mathematically, multistability allows the system to have multiple fixed points and periodic orbits. As noted in [33.35], more than 25 years of experimental and theoretical work has indicated that the onset of oscillations in neurons and in neuron populations is characterized by multistability.

Fig. 33.7 Two limit cycle attractors of dynamical systems

Multistability analysis is different from monostability analysis. In monostability analysis, the objective is to derive conditions that guarantee that each nonlinear system contains only one equilibrium point and that all the trajectories of the neural network converge to it. In multistability analysis, by contrast, nonlinear systems are allowed to have multiple equilibrium points. Stable and unstable equilibrium points, and even periodic trajectories, may coexist in a multistable system.

The methods for studying the stability of dynamical systems with a unique attractor include the Lyapunov method, LaSalle's invariance principle, and the combination of the two. One unique attractor can be realized by one dynamical system, but it is much more complicated for multiple attractors to be realized by one dynamical system or by dynamical multisystems because of the compatibility, agreement, and behavior optimization required among the systems. Generally, the usual global stability conditions are not adequately applicable to multistable systems. The latest results on multistability of neural networks can be found in [33.36-52]. It is shown in [33.45, 46] that n-neuron recurrent neural networks with a one-step piecewise linear activation function can have $2^n$ locally exponentially stable equilibrium points located in saturation regions, by partitioning the state space into $2^n$ subspaces. In [33.47], multistability of almost periodic solutions of recurrently connected neural networks with delays is investigated. In [33.48], by constructing a Lyapunov functional and using matrix inequality techniques, a delay-dependent multistability criterion for recurrent neural networks is derived. In [33.49], neural networks with a class of nondecreasing piecewise linear activation functions with 2r corner points are considered. It is proved that, under some conditions, the n-neuron dynamical systems have exactly $(2r+1)^n$ equilibria, of which $(r+1)^n$ are locally exponentially stable and the others are unstable. In [33.50], some multistability properties for a class of bidirectional associative memory recurrent neural networks with unsaturating piecewise linear transfer functions are studied based on local inhibition. In [33.51], for two classes of general activation functions, multistability of competitive neural networks with time-varying and distributed delays is investigated by formulating parameter conditions and using inequality techniques. In [33.52], the existence of $2^n$ stable stationary solutions for general n-dimensional delayed neural networks with several classes of activation functions is presented through formulating parameter conditions motivated by a geometrical observation. Two limit cycle attractors and $2^4$ equilibrium point attractors of dynamical systems are shown in Figs. 33.7 and 33.8, respectively.

Fig. 33.8 $2^4$ equilibrium point attractors of dynamical systems
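The coexistence of $2^n$ stable equilibria in saturation regions can be seen in a very small case. The following is a minimal sketch with n = 2, assuming an additive network dx_i/dt = -x_i + sum_j w_ij f(x_j) with the saturating piecewise linear activation f(x) = max(-1, min(1, x)). The model form, the weight values, and the function names are illustrative assumptions, not the specific systems analyzed in [33.45, 46].

```python
def saturating_linear(x):
    """Piecewise linear activation: identity on [-1, 1], saturated outside."""
    return max(-1.0, min(1.0, x))

def simulate_network(x0, weights, dt=0.01, t_end=30.0):
    """Euler integration of dx_i/dt = -x_i + sum_j w_ij * f(x_j)."""
    x = list(x0)
    n = len(x)
    for _ in range(int(t_end / dt)):
        fx = [saturating_linear(v) for v in x]
        dx = [-x[i] + sum(weights[i][j] * fx[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return tuple(round(v, 3) for v in x)

# Assumed weights: strong self-excitation, weak mutual coupling
weights = [[2.0, 0.0 + 0.1], [0.1, 2.0]]

# One initial condition in each quadrant of the state space
starts = [(0.3, 0.2), (-0.3, 0.2), (0.3, -0.2), (-0.3, -0.2)]
equilibria = {simulate_network(s, weights) for s in starts}
print(sorted(equilibria))
```

With these assumed weights, each quadrant leads to its own equilibrium, and every component of every equilibrium has magnitude greater than 1, i.e., it lies in a saturation region of the activation function. Inside a saturation region the activation is locally constant, so the linearized dynamics reduce to dx/dt = -x plus a constant, which is why each of these four equilibria is locally exponentially stable.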


33.1.4 Conclusion 


In this section, we briefly introduced the concept of attractors together with phase space. Furthermore, dynamical systems with a single attractor and with multiple attractors were discussed.


33.2 Synchrony, Oscillations, and Chaos in Neural Networks 


33.2.1 Synchronization 


Biological Significance of Synchronization 
Neurodynamics deals with dynamic changes of neural properties and behaviors in time and space at different levels of hierarchy in neural systems. The characteristic spiking dynamics of individual neurons is of fundamental importance. In large-scale systems, such as biological neural networks and brains with billions of neurons, the interaction among the connected neural components is crucial in determining collective properties. In particular, synchronization plays a critical role in higher cognition and conscious experience [33.53-57]. Large-scale synchronization of neuronal activity, arising from intrinsic asynchronous oscillations in local electrical circuitries of neurons, is at the root of cognition. Synchronization at the level of neural populations is characterized next.

There are various dynamic behaviors of potential interest for neural systems. In the simplest case, the system behavior converges to a fixed point, at which all major variables remain unchanged. A more interesting dynamic behavior emerges when the system behavior repeats itself periodically with period T; this case will be described first. Such periodic oscillations are common in neural networks and are often caused by the presence of inhibitory neurons and inhibitory neural populations. Another behavior emerges when the system neither converges to a fixed point nor exhibits periodic oscillations, but rather maintains highly complex, chaotic dynamics. Chaos can be a microscopic effect at the cellular level, or it can arise in the mesoscopic dynamics of neural populations or cortical regions. At the highest level of hierarchy, chaos can emerge as the result of large-scale, macroscopic effects across cortical areas in the brain.

Considering the temporal dynamics of a system of interacting neural units, limit cycle oscillations and chaotic dynamics are of importance. Synchronization in limit cycle oscillations is considered first, as it illustrates the basic principles of synchronization. The extension to more complex (chaotic) dynamics is described in Sect. 33.2.3. Limit cycle dynamics is a cyclic repetition of the system's behavior with a given time period T. The cyclic repetition covers all characteristics of the system, e.g., microscopic currents, potentials, and dynamic variables; see, e.g., the Hodgkin-Huxley model of neurons [33.58]. Limit cycle oscillations can be described as a closed loop of the system trajectory in the space of all variables. The state of the system at any given time instant is a point on this trajectory. As time evolves, this point traverses the trajectory. Due to the periodic nature of the movement, the points describing the system at times $t$ and $t + T$ coincide fully. We can define a convenient reference system by selecting a center point of the trajectory and describing the motion by the vector pointing from the center to the actual state on the trajectory. This vector has an amplitude and a phase in a suitable coordinate system, denoted by $\xi(t)$ and $\Phi(t)$, respectively. The evolution of the phase in an isolated oscillator with frequency $\omega_0$ is given by

$$\frac{d\Phi(t)}{dt} = \omega_0 . \quad (33.1)$$


Several types of synchronization can be defined. The strongest synchronization takes place when two (or multiple) units have identical behaviors. Considering limit cycle dynamics, strong synchronization means that the oscillation amplitude and phase are the same for all units. This is complete synchrony. An example of two periodic oscillators is given by the clocks shown in Fig. 33.9a,c [33.59]. Strong synchronization means that the two pendulums are connected by a rigid object forcing them to move together. The lack of connection between the two pendulums means the absence of synchronization, i.e., they move completely independently. An intermediate level of synchrony may arise with weak coupling between the pendulums, such as a spring or a flexible band. Phase synchrony takes place when the amplitudes are not the same, but the phases of the oscillations still coincide. Figure 33.9b,d depicts the case of out-of-phase synchrony, when the phases of the two oscillators are exactly opposite.
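The difference between uncoupled and weakly coupled oscillators can be sketched with two phase oscillators. Each isolated phase obeys (33.1); the sinusoidal coupling term used below is a Kuramoto-type coupling, which is an assumption introduced here for illustration and is not a model given in the chapter. The frequencies, coupling strengths, and function name are likewise illustrative.

```python
import math

def phase_difference(coupling, omega1=1.0, omega2=1.2, dt=0.01, t_end=50.0):
    """Integrate two phase oscillators with assumed Kuramoto-type coupling.

    dphi1/dt = omega1 + coupling * sin(phi2 - phi1)
    dphi2/dt = omega2 + coupling * sin(phi1 - phi2)
    Returns the final phase difference phi2 - phi1.
    """
    phi1, phi2 = 0.0, 1.0
    for _ in range(int(t_end / dt)):
        d1 = omega1 + coupling * math.sin(phi2 - phi1)
        d2 = omega2 + coupling * math.sin(phi1 - phi2)
        phi1 += d1 * dt
        phi2 += d2 * dt
    return phi2 - phi1

uncoupled = phase_difference(0.0)   # phases drift apart at rate omega2 - omega1
coupled = phase_difference(0.5)     # phases lock at a constant offset
print(uncoupled, coupled, math.asin(0.2))
```

Without coupling the phase difference grows linearly in time, which corresponds to the independently moving pendulums. With the assumed coupling strength the difference settles at a constant value satisfying sin(difference) = (omega2 - omega1) / (2 * coupling), so the two oscillators share a common frequency while keeping a fixed phase offset.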


Amplitude Measures of Synchrony 
Denote by $a_j(t)$ the time signal produced by the individual units (neurons), $j = 1, \dots, N$. The overall signal $A(t)$ of the interacting units is determined as

$$A(t) = \frac{1}{N} \sum_{j=1}^{N} a_j(t) . \quad (33.2)$$

The variance of the time series $A(t)$ is given as follows

$$\sigma_A^2 = \langle A^2(t) \rangle - \langle A(t) \rangle^2 . \quad (33.3)$$

Here $\langle f(t) \rangle$ denotes time averaging over a given time window. After determining the variances $\sigma_i^2$ of the individual channels analogously to (33.3), the synchrony $\chi_N$ in the system with $N$ components is defined as follows

$$\chi_N^2 = \frac{\sigma_A^2}{\frac{1}{N} \sum_{i=1}^{N} \sigma_i^2} . \quad (33.4)$$

Fig. 33.9a-d Synchronization in pendulums, in phase and out of phase (after [33.59]). Bottom plots: Illustration of periodic trajectories, case of in-phase (a,c) and out-of-phase oscillations (b,d)




This synchrony measure has a nonzero value, $0 < \chi_N \le 1$, in synchronized and partially synchronized systems, while $\chi_N = 0$ means the complete absence of synchrony in neural networks [33.60].
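The amplitude measure of (33.2)-(33.4) is straightforward to compute. The following is a minimal sketch that reads (33.4) as the variance of the population average divided by the mean variance of the individual channels, with the synchrony taken as the square root of that ratio; the function names and the sinusoidal test signals are assumptions chosen for illustration.

```python
import math

def variance(series):
    """Time-averaged variance <f^2> - <f>^2 over the whole window, as in (33.3)."""
    mean = sum(series) / len(series)
    return sum(v * v for v in series) / len(series) - mean * mean

def synchrony(signals):
    """Amplitude synchrony measure following the reading of (33.4) stated above.

    signals: list of N equally long time series a_j(t).
    """
    n = len(signals)
    population = [sum(column) / n for column in zip(*signals)]   # A(t), (33.2)
    mean_individual_var = sum(variance(s) for s in signals) / n
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(variance(population), 0.0) / mean_individual_var)

times = [2 * math.pi * k / 200 for k in range(200)]

# Four identical units: complete synchrony
in_phase = [[math.sin(t) for t in times] for _ in range(4)]

# Two pairs of units oscillating exactly out of phase: the average cancels
anti_phase = [[math.sin(t) for t in times],
              [math.sin(t + math.pi) for t in times]] * 2

sync_in = synchrony(in_phase)
sync_anti = synchrony(anti_phase)
print(sync_in, sync_anti)
```

Identical signals give a value of 1, whereas signals whose fluctuations cancel in the population average give a value near 0. The second case also shows a limitation of a purely amplitude-based measure: units locked exactly out of phase are strongly related, yet the measure reports no synchrony.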

Fourier transform-based signal processing methods are very useful for the characterization of synchrony in time series, and they are widely used in neural network analysis. The Fourier transform makes important assumptions about the analyzed time series, including stationary or slowly changing statistical characteristics and ergodicity. In many applications these approximations are appropriate. In analyzing large-scale synchrony in brain signals, however, alternative methods are also justified. Relevant approaches include the Hilbert transform for rapidly changing brain signals [33.61, 62]. Here both Fourier and Hilbert-based methods are outlined, and avenues for their applications in neural networks are indicated. Define the cross correlation function (CCF) between discretely sampled time series $x_i(t)$ and $x_j(t)$, $t = 1, \dots, T$, as follows

$$\mathrm{CCF}_{ij}(\tau) = \frac{1}{T-\tau} \sum_{t=1}^{T-\tau} \left[ x_i(t+\tau) - \langle x_i \rangle \right] \left[ x_j(t) - \langle x_j \rangle \right] . \quad (33.5)$$

Here (x;) is the mean of the signal over period T, 
and it is assumed that x;(t) is normalized to unit vari- 
ance. For completely correlated pairs of signals, the 
maximum of the cross correlation is 1, for uncorrelated 
signals it equals 0. The cross power spectral density 
CPSDj(@), cross spectrum for short, is defined as the 
Fourier transform of the cross correlation as follows: 
CPSDj;(@) = F (CCF,(t)). If i = j, i. e., the two chan- 
nels coincide, then we talk about autocorrelation and 
auto power spectral density APSDj;;(@); for details of 
Fourier analysis, see [33.63]. Coherence y? is defined 
by normalizing the cross spectrum by the autospectra 


|CPSD;(@)|? 
|APSD;i(@)||APSD;(@)| ` 


The coherence satisfies 0 < yw) < 1 and it con- 
tains useful information on the frequency content of 
the synchronization between signals. If coherence is 
close to unity at some frequencies, it means that the 
two signals are closely related or synchronized; a co- 
herence near zero means the absence of synchrony at 
those frequencies. Coherence functions provide useful 
information on synchrony in brain signals at various 
frequency bands [33.64]. For other information-theo- 
retical characterizations, including mutual information 
and entropy measures. 


(33.6) 


yj(@) = 
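
A minimal numerical sketch of the coherence in (33.6) is given below, assuming NumPy. The spectra are estimated by averaging windowed FFTs over consecutive segments, which is an estimation choice made here rather than a method prescribed by the text; the function name `coherence`, the segment length, and the test signals (a shared 10 Hz rhythm buried in independent noise) are likewise illustrative.

```python
import numpy as np

def coherence(x, y, nseg=256):
    """Magnitude-squared coherence, estimated by averaging over segments."""
    nblocks = len(x) // nseg
    win = np.hanning(nseg)
    sxy = np.zeros(nseg // 2 + 1, dtype=complex)   # cross spectrum
    sxx = np.zeros(nseg // 2 + 1)                  # autospectrum of x
    syy = np.zeros(nseg // 2 + 1)                  # autospectrum of y
    for b in range(nblocks):
        xs = x[b * nseg:(b + 1) * nseg]
        ys = y[b * nseg:(b + 1) * nseg]
        X = np.fft.rfft(win * (xs - xs.mean()))
        Y = np.fft.rfft(win * (ys - ys.mean()))
        sxy += X * np.conj(Y)
        sxx += np.abs(X) ** 2
        syy += np.abs(Y) ** 2
    return np.abs(sxy) ** 2 / (sxx * syy)

fs = 256.0                                  # sampling rate in Hz (assumed)
t = np.arange(32 * 256) / fs                # 32 s of data
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 10 * t + 0.7) + rng.standard_normal(t.size)

f = np.fft.rfftfreq(256, d=1.0 / fs)        # 1 Hz resolution, so f[10] = 10 Hz
coh = coherence(x, y)
coh_shared = coh[10]                        # high: both channels carry the rhythm
coh_noise = coh[60]                         # low: only independent noise at 60 Hz
```

Averaging over segments is essential: with a single segment the ratio in (33.6) is identically 1 at every frequency. In practice a library routine such as `scipy.signal.coherence` would normally be preferred.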


Phase Synchronization 
If the components of the neural network are weakly
interacting, the synchrony evaluated using the amplitude
measure $\chi_N$ in (33.4) may be low. There can still
be a meaningful synchronization effect in the system, 
based on phase measures. Phase synchronization is de- 
fined as the global entrainment of the phases [33.65], 
which means a collective adjustment of their rhythms 
due to their weak interaction. At the same time, in sys- 
tems with phase synchronization the amplitudes need 
not be synchronized. Phase synchronization is often 
observed in complex chaotic systems and it has been 
identified in biological neural networks [33.61, 65]. 

In complex systems, the trajectory of the system
in the phase space is often very convoluted. The
approach described in (33.1), i.e., choosing a center point
for the oscillating cycle in the phase space with natural
frequency $\omega_0$, can be nontrivial in chaotic systems. In
such cases, the Hilbert transform-based approach can
provide a useful tool for the characterization of phase
synchrony. Hilbert analysis determines the analytic
signal and its instantaneous frequency, which can be used
to describe phase synchronization effects. Considering
time series $s(t)$, its analytic signal $z(t)$ is defined as
follows [33.62]

\[
z(t) = s(t) + i \hat{s}(t) = A(t) e^{i \Phi(t)} . \tag{33.7}
\]

Here $A(t)$ is the analytic amplitude, $\Phi(t)$ is the
analytic phase, while $\hat{s}(t)$ is the Hilbert transform of $s(t)$,
given by

\[
\hat{s}(t) = \frac{1}{\pi} \, \mathrm{PV} \int_{-\infty}^{+\infty}
  \frac{s(\tau)}{t - \tau} \, d\tau , \tag{33.8}
\]

where PV stands for the Cauchy principal value of the
integral. The analytic signal and its instantaneous phase
can be determined for an arbitrary broadband signal.
However, the analytic signal has a clear meaning only in a
narrow frequency band; therefore, a bandpass filter should
precede the evaluation of the analytic signal in data with
broad frequency content.
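
For sampled data, the analytic signal of (33.7) is commonly obtained in the frequency domain by suppressing the negative-frequency half of the spectrum. The sketch below assumes NumPy and follows that standard discrete construction; the function name `analytic_signal` and the 5 Hz test signal are illustrative, and no bandpass filter is included because the test signal is already narrowband.

```python
import numpy as np

def analytic_signal(s):
    """Discrete analytic signal z = s + i*H[s], built in the frequency domain."""
    n = len(s)
    spec = np.fft.fft(s)
    h = np.zeros(n)
    h[0] = 1.0                       # keep the DC component once
    if n % 2 == 0:
        h[n // 2] = 1.0              # keep the Nyquist component once
        h[1:n // 2] = 2.0            # double the positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spec * h)     # negative frequencies are zeroed

fs = 200.0                           # sampling rate in Hz (assumed)
t = np.arange(800) / fs              # 4 s, an integer number of 5 Hz cycles
s = np.cos(2 * np.pi * 5.0 * t)

z = analytic_signal(s)
amplitude = np.abs(z)                            # analytic amplitude A(t)
phase = np.unwrap(np.angle(z))                   # analytic phase Phi(t)
inst_freq = np.diff(phase) * fs / (2 * np.pi)    # instantaneous frequency in Hz
```

The real part of `z` reproduces the input and the imaginary part is its Hilbert transform, so `amplitude` and `phase` correspond to $A(t)$ and $\Phi(t)$ in (33.7).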

The Hilbert method of analytic signals is illustrated
using actual local field potentials measured in
rabbits with an array of chronically implanted intracranial
electrodes [33.67]. The signals have been filtered
in the theta band (3–7 Hz). An example of time
series $s(t)$ is shown in Fig. 33.10a. The Hilbert
transform $\hat{s}(t)$ is depicted in Fig. 33.10b in red, while blue
shows $s(t)$. Figure 33.10c shows the analytic
phase $\Phi(t)$, and Fig. 33.10d depicts the analytic signal $z(t)$
in the complex plane. Figure 33.11 shows the
unwrapped instantaneous phase with bifurcating phase
curves indicating desynchronization at specific time
instances −1.3 s, −0.4 s, and 1 s. The plot on the
right-hand side of Fig. 33.11 depicts the evolution of the
instantaneous frequency in time. The frequency is around
5 Hz most of the time, indicating phase synchronization.
However, it has very large dispersion at a few
specific instances (desynchronization).

Fig. 33.10a-d Demonstration of the Hilbert analytic signal approach on electroencephalogram (EEG) signals (after [33.66]); (a) signal $s(t)$; (b) Hilbert transform $\hat{s}(t)$ (red) of signal $s(t)$ (blue); (c) instantaneous phase $\Phi(t)$; and (d) analytic signal $z(t)$ in the complex plane

Synchronization between channels $x$ and $y$ can be
measured using the phase lock value (PLV) defined as
follows [33.61]

\[
\mathrm{PLV}_{xy}(t) = \left| \frac{1}{T} \int_{t-T/2}^{t+T/2}
  e^{i [\Phi_x(\tau) - \Phi_y(\tau)]} \, d\tau \right| . \tag{33.9}
\]

PLV ranges from 0 to 1, where 1 indicates complete
phase locking. PLV defined in (33.9) is an average
over a time window of length $T$. Note that
PLV is a function of $t$ through the sliding window.
PLV is also a function of frequency, which
is selected by the bandpass filter during the
preprocessing phase. By changing the frequency band and
time, synchronization can be monitored under various
conditions. This method has been applied productively
in cognitive experiments [33.68].
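
A discretized version of (33.9) replaces the integral with a mean over the samples in the window. The following sketch assumes NumPy and that the instantaneous phases are already available (for example from a Hilbert analysis of bandpass-filtered signals); the function names `plv` and `sliding_plv`, the window length, and the synthetic phase series are illustrative assumptions.

```python
import numpy as np

def plv(phi_x, phi_y):
    """Phase lock value over the whole record (discrete form of (33.9))."""
    return float(np.abs(np.mean(np.exp(1j * (phi_x - phi_y)))))

def sliding_plv(phi_x, phi_y, win):
    """PLV as a function of time, using a centered window of about win samples."""
    half = win // 2
    d = np.exp(1j * (phi_x - phi_y))
    return np.array([np.abs(d[k - half:k + half + 1].mean())
                     for k in range(half, len(d) - half)])

fs = 200.0                                   # sampling rate in Hz (assumed)
t = np.arange(2000) / fs                     # 10 s of data
rng = np.random.default_rng(0)

phi_x = 2 * np.pi * 5.0 * t                                        # 5 Hz reference
phi_locked = phi_x - 0.8 + 0.1 * rng.standard_normal(t.size)       # fixed lag, small jitter
phi_drift = 2 * np.pi * 6.3 * t                                    # different frequency

plv_locked = plv(phi_x, phi_locked)      # close to 1: constant phase difference
plv_drift = plv(phi_x, phi_drift)        # close to 0: phase difference keeps rotating
plv_t = sliding_plv(phi_x, phi_locked, win=200)
```

Note that a constant nonzero phase lag still yields a PLV near 1; the measure captures the stability of the phase difference, not its value.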


Fig. 33.11a,b Illustration of instantaneous phases; (a) unwrapped phase with bifurcating phase curves indicating desynchronization at specific time instances −1.3 s, −0.4 s, and 1 s; (b) evolution of instantaneous frequency in time


Synchronization-Desynchronization Transitions
Transitions between neurodynamic regimes with and
without synchronization have been observed and
exploited for cognitive monitoring. The Haken–Kelso–Bunz
(HKB) model is one of the prominent and elegant
approaches providing a theoretical framework for
synchrony switching, based on observations related
to bimanual coordination [33.69]. The HKB model
invokes the concepts of metastability and multistability
as fundamental properties of cognition. In the
experiment, the subjects were instructed to follow the
rhythm of a metronome with their index fingers in an
anti-phase manner. It was observed that, as the
metronome frequency increased, the subjects spontaneously
switched their anti-phase movement to in-phase at a
certain oscillation frequency and maintained it thereafter,
even if the metronome frequency was decreased again
below the given threshold.

The following simple equation is introduced to
describe the dynamics observed:
$d\Delta\Phi/dt = -\sin(\Delta\Phi) - 2\varepsilon \sin(2\Delta\Phi)$.
Here $\Delta\Phi = \phi_1 - \phi_2$ is the phase difference
between the two finger movements, and the control parameter
$\varepsilon$ is related to the inverse of the introduced oscillatory
frequency. The system dynamics is illustrated in Fig. 33.12
by the potential surface $V$, where stable fixed points
correspond to local minima. For low oscillatory
frequencies (high $\varepsilon$), there are stable equilibria at
anti-phase conditions. As the oscillatory frequency increases
(low $\varepsilon$), the dynamics transits to a state where only the
in-phase equilibrium is stable.
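
The loss of the anti-phase state can be checked directly on the equation quoted above. Linearizing its right-hand side at $\Delta\Phi = \pi$ gives the rate $1 - 4\varepsilon$, so under this form of the equation the anti-phase state is stable only for $\varepsilon > 1/4$, while the in-phase state $\Delta\Phi = 0$ (rate $-1 - 4\varepsilon$) is always stable. The short Python sketch below illustrates this with simple Euler integration; the step size, the two values of $\varepsilon$, and the function names are illustrative assumptions.

```python
import math

def hkb_rhs(dphi, eps):
    """Right-hand side of the HKB phase equation quoted in the text."""
    return -math.sin(dphi) - 2.0 * eps * math.sin(2.0 * dphi)

def relax(dphi0, eps, dt=0.01, steps=5000):
    """Euler integration of the phase difference from dphi0."""
    dphi = dphi0
    for _ in range(steps):
        dphi += dt * hkb_rhs(dphi, eps)
    return dphi

start = math.pi - 0.1                 # slightly perturbed anti-phase movement

slow = relax(start, eps=0.5)          # low frequency (high eps): returns to pi
fast = relax(start, eps=0.1)          # high frequency (low eps): switches to 0
```

With $\varepsilon = 0.5$ the perturbation decays and the movement stays anti-phase; with $\varepsilon = 0.1$ the same initial condition drifts to the in-phase state, mirroring the spontaneous switch described in the experiment.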

Another practical example of synchrony-desynchrony
transition in neural networks is given by image
processing. An important basic task of neural networks
is image segmentation, which is difficult to accomplish
with excitatory nodes only. There is evidence that
biological neural networks use inhibitory connections
for completing basic pattern separation and integration
tasks [33.70]. Synchrony between interacting neurons
may indicate the recognition of an input. A typical
neural network architecture implementing such a switch
between synchronous and nonsynchronous states using
local excitation and global inhibition is shown in
Fig. 33.13. This system uses amplitude difference to
measure synchronization between neighboring neurons.
Phase synchronization measures have been proposed as
well to accomplish the segmentation and recognition
tasks [33.71]. Phase synchronization provides a very
useful tool for learning and control of the oscillations
in weakly interacting neighborhoods.

Fig. 33.12 Illustration of the potential surface ($V$) of the HKB system as a function of the phase difference in radians $\Delta\Phi$ and inverse frequency $\varepsilon$. The transition from anti-phase to in-phase behavior is seen as the oscillation frequency increases ($\varepsilon$ decreases)

Fig. 33.13 Neural network with local excitation and a global inhibition node (black; after [33.70])


33.2.2 Oscillations in Neural Networks 


Oscillations in Brains 
The interaction between opposing tendencies in phys- 
ical and biological systems can lead to the onset of 
oscillations. Negative feedback between the system’s 
components plays an important role in generating os- 
cillations in electrical systems. Brains as large-scale 
bioelectrical networks consist of components oscillat- 
ing at various frequencies. The competition between 
inhibitory and excitatory neurons is a basic ingredi- 
ent of cortical oscillations. The intricate interaction 
between oscillators produces the amazingly rich oscil- 
lations that we experimentally observe as brain rhythms 
at multiple time scales [33.72, 73]. 

Oscillations occur in the brain at different time
scales, from several milliseconds (high frequencies)
to several seconds (low frequencies). One
can distinguish between oscillatory components based
on their frequency contents, including delta (1–4 Hz),
theta (4–7 Hz), alpha (7–12 Hz), beta (12–30 Hz), and
gamma (30–80 Hz) bands. This separation of
brain wave frequencies is somewhat arbitrary; however,
it can be used as a guideline to focus on various
activities. For example, higher cognitive functions are
broadly assumed to be manifested in oscillations in the
higher beta and gamma bands.

Brain oscillations take place in time and space.
A large part of cognitive activity happens in the cortex,
which is a convoluted surface of the six-layer cortical
sheet of gyri and sulci. The spatial activity is organized
on multiple scales as well, starting from the neuronal
level (μm), to granules (mm), cortical activities (several
cm), and the hemisphere-wide level (20 cm). The temporal
and spatial scales are not independent; rather, they
delicately interact and modulate each other during
cognition. Modern brain monitoring tools provide insight
into these complex space-time processes [33.74].


Characterization of Oscillatory Networks 
Oscillations in neural networks are synchronized
activities of populations of neurons at certain well-defined
frequencies. Neural systems are often modeled as the
interaction of components which oscillate at specific,
well-defined frequencies. Oscillatory dynamics can
correspond to microscopic neurons, to mesoscopic
populations of tens of thousands of neurons, or to
macroscopic neural populations including billions of neurons.
Oscillations at the microscopic level have been
thoroughly studied using spiking neuron models, such as
the Hodgkin–Huxley (HH) equation. Here we focus on
populations of neurons, which have some natural
oscillation frequencies. It is meaningful to assume that
the natural frequencies are not identical due to the
diverse properties of populations in the cortex.
Interestingly, the diversity of oscillations at the microscopic
and mesoscopic levels can give rise to large-scale
synchronous dynamics at higher levels. Such emergent
oscillatory dynamics is the primary subject of this
section.

Consider $N$ coupled oscillators with natural
frequencies $\omega_j$, $j = 1, \dots, N$. A measure of the
synchronization in such systems is given by parameter $R$,
which is often called the order parameter. This
terminology was introduced by Haken [33.75] to
describe the emergence of macroscopic order from
disorder. The time-varying order parameter $R(t)$ is defined
as [33.76]

\[
R(t) = \left| \frac{1}{N} \sum_{j=1}^{N} e^{i \theta_j(t)} \right| , \tag{33.10}
\]

where $\theta_j(t)$ is the phase of the $j$-th oscillator.
Order parameter $R$ provides a useful synchronization
measure for coupled oscillatory systems. A common
approach is to consider a globally coupled system,
in which all the components interact with each other.
This is the broadest possible level of interaction. The
local coupling model represents the other extreme,
i.e., each node interacts with just a few others,
which are called its direct neighbors. In a one-dimensional
array, a node has two neighbors, one on its left and
one on its right (assuming periodic boundary conditions).
In a two-dimensional lattice, a node has four
direct neighbors, and so on. The size of the neighborhood
can be expanded, so that the connectivity in the
network becomes denser. Of special interest are
networks that have a mostly regular neighborhood
with some further neighbors added by a selection rule
from the whole network. The addition of remote or
nonlocal connections is called rewiring, and the networks
with rewiring are small world networks. They have been
extensively studied in network theory [33.76–78].
Figure 33.14 illustrates local (top left) and global coupling
(bottom right), as well as intermediate coupling, with
the bottom left plot giving an example of a network with
random rewiring.




Fig. 33.14a-d Network architectures with various connectivity structures: (a) local, (b) and (c) intermediate, and (d) global (mean-field) connectivity


The Kuramoto Model
The Kuramoto model [33.79] is a popular approach
to describe oscillatory neural systems. It implements
mean-field (global) coupling. The synchronization in
this model allows an analytical solution, which helps to
interpret the underlying dynamics in clear mathematical
terms [33.76]. Let $\theta_j$ and $\omega_j$ denote the phase and the
inherent frequency of the $j$-th oscillator. The oscillators
are coupled by a nonlinear interaction term depending
on their pair-wise phase differences. In the Kuramoto
model, the following sinusoidal coupling term has been
used to model neural systems

\[
\frac{d\theta_j}{dt} = \omega_j + \frac{K}{N} \sum_{k=1}^{N} \sin(\theta_k - \theta_j),
  \quad j = 1, \dots, N . \tag{33.11}
\]

Here $K$ denotes the coupling strength and $K = 0$
means no coupling. The system in (33.11) and its
generalizations have been studied extensively since
its first introduction by Kuramoto [33.79]. Kuramoto
used a Lorentzian distribution of the natural frequencies
$\omega$, defined as
$L(\omega) = \gamma / \{\pi [\gamma^2 + (\omega - \omega_0)^2]\}$. This leads to the
asymptotic solution ($N \to \infty$ and $t \to \infty$) for order
parameter $R$ in simple analytic terms

\[
R = \sqrt{1 - K_c / K} \ \text{ if } K > K_c, \quad R = 0 \ \text{ otherwise} . \tag{33.12}
\]

Here $K_c$ denotes the critical coupling strength given
by $K_c = 2\gamma$. There is no synchronization between the
oscillators if $K < K_c$, and the synchronization becomes
stronger as $K$ increases at supercritical conditions
$K > K_c$, see Fig. 33.15. Inputs can be used to control
synchronization, i.e., a highly synchronized system can
be (partially) desynchronized by input stimuli [33.80,
81]. Alternatively, input stimuli can induce large-scale
synchrony in a system with a low level of synchrony, as
evidenced by cortical observations [33.82].
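
The transition at the critical coupling can be reproduced with a small simulation of (33.11). The sketch below assumes NumPy and uses simple Euler integration; the population size, step size, random seed, and the helper name `kuramoto_order` are illustrative assumptions. It uses the identity $(1/N)\sum_k \sin(\theta_k - \theta_j) = R \sin(\psi - \theta_j)$, where $R e^{i\psi}$ is the complex mean field of (33.10), so each step costs only $O(N)$ operations.

```python
import numpy as np

def kuramoto_order(K, omega, theta0, dt=0.02, steps=3000):
    """Euler integration of (33.11); returns R averaged over the second half."""
    theta = theta0.copy()
    r_hist = []
    for _ in range(steps):
        z = np.exp(1j * theta).mean()            # complex mean field R*exp(i*psi)
        r, psi = np.abs(z), np.angle(z)
        # mean-field form of the coupling term in (33.11)
        theta += dt * (omega + K * r * np.sin(psi - theta))
        r_hist.append(r)
    return float(np.mean(r_hist[steps // 2:]))

rng = np.random.default_rng(1)
N, gamma = 2000, 0.5
omega = gamma * rng.standard_cauchy(N)           # Lorentzian natural frequencies
theta0 = rng.uniform(0.0, 2.0 * np.pi, N)        # random initial phases

K_c = 2.0 * gamma                                # critical coupling from the text
R_weak = kuramoto_order(0.5 * K_c, omega, theta0)     # subcritical: R stays small
R_strong = kuramoto_order(4.0 * K_c, omega, theta0)   # supercritical: partial synchrony
R_theory = np.sqrt(1.0 - K_c / (4.0 * K_c))           # (33.12) for K = 4*K_c
```

For a finite population the subcritical order parameter does not vanish exactly but fluctuates at a level of order $1/\sqrt{N}$, and the supercritical value only approximates (33.12).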


Neural Networks as Dynamical Systems 
A dynamical system is defined by its equation of
motion, which describes the location of the system as
a function of time $t$

\[
\frac{dX(t, \lambda)}{dt} = F(X), \quad X \in \mathbb{R}^n . \tag{33.13}
\]

Here $X$ is the state vector describing the state of
the system in the $n$-dimensional Euclidean space,
$X = X(x_1, \dots, x_n) \in \mathbb{R}^n$, and $\lambda$ is the vector of system
parameters. Proper initial conditions must be specified and it
is assumed that $F(X)$ is a sufficiently smooth nonlinear
function. In neural dynamics it is often assumed that
the state space is a smooth manifold, and the goal is to
study the evolution of the trajectory of $X(t)$ in the state
space as time varies along the interval $[t_0, T]$.

Fig. 33.15 Kuramoto model in the mean-field case. Dependence of order parameter $R$ on the coupling strength $K$. Below a critical value $K_c$, the order parameter is 0, indicating the absence of synchrony; synchrony emerges for $K$ above the critical value

The Cohen–Grossberg (CG) equation is a general
formulation of the motion of a neural network as a
dynamical system with distributed time delays in the
presence of inputs. The CG model has been studied
thoroughly in the past decades and it served as a starting
point for various other approaches. The general form of
the CG model is [33.83]


\[
\frac{dx_i(t)}{dt} = -a_i(x_i(t)) \Big[ b_i(x_i(t))
  - \sum_{j=1}^{N} a_{ij} f_j(x_j(t))
  - \sum_{j=1}^{N} b_{ij} f_j(x_j(t - \tau_{ij})) + u_i \Big] . \tag{33.14}
\]

Here $X(t) = [x_1(t), x_2(t), \dots, x_N(t)]^T$ is the state
vector describing a neural network with $N$ neurons.
Function $a_i$ describes the amplification, $b_i$ denotes
a properly behaved function to guarantee that the
solution remains bounded, $f_i(x)$ is the activation function, $u_i$
denotes external input, $a_{ij}$ and $b_{ij}$ are components of the
connection weight matrix and the delayed connection
weight matrix, respectively, and $\tau_{ij}$ describes the time
delays between neurons, $i, j = 1, \dots, N$. The solution
of (33.14) can be determined after specifying suitable
initial conditions.

There are various approaches to guarantee the sta- 
bility of the CG equation as it approaches its equilibria 
under specific constraints. Global convergence assum- 
ing symmetry of the connectivity matrix has been 
shown [33.83]. The symmetric version of a simpli- 
fied CG model has become popular as the Hopfield 
or Hopfield–Tank model [33.84]. Dynamical properties
of the CG equation have been studied extensively,
including asymptotic stability, exponential stability, ro- 
bust stability, and stability of periodic bifurcations and 
chaos. Symmetry requirements for the connectivity ma- 
trix have been relaxed, still guaranteeing asymptotic 
stability [33.85]. CG equations can be employed to 
find the optimum solutions of a nonlinear optimization 
problem when global asymptotic stability guarantees 
the stability of the solution [33.86]. Global asymptotic 
stability of the CG neural network with time delay is 
studied using linear matrix inequalities (LMI). LMI is 
a fruitful approach for global exponential stability by 
constructing Lyapunov functions for broad classes of 
neural networks. 


Bifurcations in Neural Network Dynamics 
Bifurcation theory studies the behavior of dynamical
systems in the neighborhood of bifurcation points, i.e.,
at points where the topology of the state space abruptly
changes with continuous variation of a system parameter.
An example of the state space is given by the folded
surface in Fig. 33.16, which illustrates a cusp bifurcation
point. Here $\lambda = [a, b]$ is a two-dimensional parameter
vector, $X \in \mathbb{R}^1$ [33.87]. As parameter $b$ increases,
the initially unfolded manifold undergoes a bifurcation
through a cusp folding with three possible values of
state vector $X$. This is an example of pitchfork
bifurcation, in which a stable equilibrium point bifurcates into
one unstable and two stable equilibria. The projection to
the $a$-$b$ plane shows the cusp bifurcation folding with
multiple equilibria. The presence of multiple equilibria
provides the conditions for the onset of oscillatory
states in neural networks. The transition from fixed
point to limit cycle dynamics can be described by
bifurcation theory.
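
The equilibrium structure described above can be illustrated numerically. The text does not give an explicit equation for the folded surface, so the sketch below assumes the standard cusp normal form $dX/dt = a + bX - X^3$ purely for illustration; it requires NumPy, and the function name `equilibria` is hypothetical. An equilibrium is stable when the derivative $b - 3X^2$ of the right-hand side is negative.

```python
import numpy as np

def equilibria(a, b):
    """Real equilibria of dX/dt = a + b*X - X**3 with their stability.

    The cubic normal form is an assumption made for this illustration.
    Returns a sorted list of (X, is_stable) pairs.
    """
    roots = np.roots([-1.0, 0.0, b, a])
    real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    return [(float(x), bool(b - 3.0 * x ** 2 < 0.0)) for x in real]

before = equilibria(a=0.0, b=-1.0)   # single stable equilibrium at X = 0
after = equilibria(a=0.0, b=1.0)     # X = -1, 0, +1: stable, unstable, stable
```

With $a = 0$, increasing $b$ through zero turns the single stable equilibrium into one unstable and two stable equilibria, which is the pitchfork scenario described in the text; a nonzero $a$ breaks the symmetry and produces the fold lines of the cusp.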


Fig. 33.16 Folded surface in the state space illustrating a cusp bifurcation (after [33.87]). By increasing parameter $b$, the stable equilibrium bifurcates to two stable and one unstable equilibria


Neural Networks with Inhibitory Feedback
Oscillations in neural networks are typically due to
delayed, negative feedback between neural populations.
Mean-field models are described first, starting with
Wilson–Cowan (WC) oscillators, which are capable of
producing limit cycle oscillations. Next, a class of more
general networks with excitatory–inhibitory feedback
is described, which can generate unstable limit cycle
oscillations.

The Wilson–Cowan model is based on statistical
analysis of neural populations in the mean-field limit,
i.e., assuming that all components of the system fully
interact [33.88, 89]. In the brain it may describe a
single cortical column in one of the sensory cortices, which
in turn interacts with other columns to generate
synchronous or asynchronous oscillations, depending on
the cognitive state. In its simplest manifestation, the
WC model has one excitatory and one inhibitory
component, with interaction weights denoted as $w_{EE}$, $w_{EI}$,
$w_{IE}$, and $w_{II}$. Nonlinear function $f$ stands for the
standard sigmoid with rate constant $a$

\[
\frac{dX_E}{dt} = -X_E + f(w_{EE} X_E + w_{EI} X_I + P_E) , \tag{33.15}
\]
\[
\frac{dX_I}{dt} = -X_I + f(w_{IE} X_E + w_{II} X_I + P_I) , \tag{33.16}
\]
\[
f(x) = \frac{1}{1 + e^{-a x}} . \tag{33.17}
\]

$P_E$ and $P_I$ describe the effect of input stimuli
through the excitatory and inhibitory nodes, respectively.
The inhibitory weights are negative, while the
excitatory ones are positive. The WC system has been
extensively studied with dynamical behaviors including
fixed point and oscillatory regimes. In particular,
for fixed weight values, it has been shown that the
WC system undergoes a pitchfork bifurcation by changing
the $P_E$ or $P_I$ input levels. Figure 33.17 shows the
schematics of the two-node system, as well as the
illustration of the oscillatory states following the bifurcation
with parameters $w_{EE} = 11.5$, $w_{II} = -2$,
$w_{EI} = -w_{IE} = -10$, and input values $P_E = 0$ and
$P_I = -4$, with rate constant $a = 1$. Stochastic versions
of the Wilson–Cowan oscillators have been extensively
developed as well [33.90]. Coupled Wilson–Cowan
oscillators have been used in learning models and have
demonstrated applicability in a number of fields, including
visual processing and pattern classification [33.91–93].

Fig. 33.17 Schematic diagram of the Wilson–Cowan oscillator with excitatory (E) and inhibitory (I) populations; solid lines show excitatory, dashed show inhibitory connections. The right panels show the trajectory in the phase space of $X_E$-$X_I$ and the time series of the oscillatory signals (after [33.90])
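
The oscillatory regime can be explored with a direct simulation of the two-node system. The sketch below integrates (33.15)-(33.17) with Euler steps using the parameter values quoted in the text; it assumes the index convention in which $w_{EI}$ is the (negative) weight from the inhibitory to the excitatory node and $w_{IE}$ the (positive) weight in the opposite direction. The step size, initial state, and function names are illustrative assumptions.

```python
import math

def sigmoid(x, a=1.0):
    """Standard sigmoid with rate constant a, cf. (33.17)."""
    return 1.0 / (1.0 + math.exp(-a * x))

def simulate_wc(w_ee=11.5, w_ei=-10.0, w_ie=10.0, w_ii=-2.0,
                p_e=0.0, p_i=-4.0, a=1.0, dt=0.01, steps=20000):
    """Euler integration of the two-node Wilson-Cowan system."""
    x_e, x_i = 0.1, 0.1                 # arbitrary initial state (assumed)
    trace = []
    for _ in range(steps):
        dx_e = -x_e + sigmoid(w_ee * x_e + w_ei * x_i + p_e, a)
        dx_i = -x_i + sigmoid(w_ie * x_e + w_ii * x_i + p_i, a)
        x_e += dt * dx_e
        x_i += dt * dx_i
        trace.append((x_e, x_i))
    return trace

trace = simulate_wc()
late = [x_e for x_e, _ in trace[len(trace) // 2:]]   # discard the transient
swing = max(late) - min(late)                        # peak-to-peak excursion of X_E
```

A persistent nonzero `swing` after the transient indicates sustained oscillation rather than convergence to a fixed point; plotting the pairs in `trace` gives the phase-space trajectory.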

Oscillatory neural networks with interacting
excitatory–inhibitory units have been developed in
Freeman K sets [33.94]. That model uses an asymmetric
sigmoid function $f(x)$ modeled on neurophysiological
activations and given as follows

\[
f(x) = q \left\{ 1 - \exp\left[ -\frac{1}{q} (e^{x} - 1) \right] \right\} . \tag{33.18}
\]

Here $q$ is a parameter specifying the slope and
maximal asymptote of the sigmoid curve. The sigmoid
has unit gain at zero, and has maximum gain at
positive $x$ values due to its asymmetry, see (33.18). This
property provides the opportunity for self-sustained
oscillations without input at a wide range of parameters.
Two versions of the basic oscillatory units have been
studied, either one excitatory and one inhibitory unit,
or two excitatory and two inhibitory units. This is
illustrated in Fig. 33.18. Stability conditions of the fixed
point and limit cycle oscillations have been
identified [33.95, 96]. The system with two E and two I units
has the advantage that it avoids self-feedback, which is
uncharacteristic of biological neural populations.
Interestingly, the extended system has an operating regime
with an unstable equilibrium without stable equilibria.
This condition leads to an inherent instability in
a dynamical regime when the system oscillates
without input. Oscillations in the unstable region have been
characterized and conditions for sustained unstable
oscillations derived [33.96]. Simulations in the region
confirmed the existence of limit cycles in the unstable
regime with highly irregular oscillatory shapes of the
cycle, see Fig. 33.18, upper plot. Regions with regular
limit cycle oscillations and fixed point dynamics have
been identified as well, see Fig. 33.18, middle and
bottom [33.97].
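
The stated properties of the asymmetric sigmoid follow from its derivative, $f'(x) = \exp[x - (e^x - 1)/q]$, which equals 1 at $x = 0$ and is maximal at $x = \ln q$ (positive whenever $q > 1$). The short Python check below illustrates this; the value $q = 5$ and the function names are illustrative assumptions, not values taken from the text.

```python
import math

def freeman_sigmoid(x, q=5.0):
    """Asymmetric sigmoid of (33.18); q = 5 is an illustrative choice."""
    return q * (1.0 - math.exp(-(math.exp(x) - 1.0) / q))

def gain(x, q=5.0):
    """Analytic derivative of the sigmoid with respect to x."""
    return math.exp(x - (math.exp(x) - 1.0) / q)

q = 5.0
gain_at_zero = gain(0.0, q)                        # unit gain at the origin
grid = [k * 0.001 for k in range(-3000, 5001)]     # x from -3 to 5
x_max_gain = max(grid, key=lambda x: gain(x, q))   # location of the steepest slope
upper = freeman_sigmoid(20.0, q)                   # upper asymptote, equal to q
lower = freeman_sigmoid(-20.0, q)                  # lower asymptote, q*(1 - e^(1/q))
```

The unequal asymptotes and the shift of the maximum gain to positive $x$ are what distinguish this function from the symmetric sigmoid in (33.17).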


Fig. 33.18a,b Illustration of excitatory–inhibitory models. (a) Left: simplified model with one excitatory (E) and one inhibitory (I) node. Right: extended model with two E and two I nodes. (b) Simulations with the extended model with two E and two I nodes; $y_1$ to $y_4$ show the activations of the nodes; b1: limit cycle oscillations in the unstable regime; b2: oscillations in the stable limit cycle regime; b3: fixed point regime (after [33.97])


Spatiotemporal Oscillations in Heterogeneous NNs
Neural networks describe the collective behavior of
populations of neurons. It is of special interest to study
populations with a large number of components having
complex, nonlinear interactions. Homogeneous populations
of neurons allow mathematical modeling in the mean-field
approximation, leading to oscillatory models such
as the Wilson–Cowan oscillators and Freeman KII sets.
Field models with heterogeneous structure and dynamic
variables are of great interest as well, as they are the
prerequisite of associative memory functions of neural
networks.

A general mathematical formulation views the
neuropil, the interconnected neural tissue of the cortex,
as a dynamical system evolving in the phase space,
see (33.13). Consider a population of spiking neurons,
each of which is modeled by a Hodgkin–Huxley
equation. The state of a neuron at any time instant is
determined by its depolarization potential, microscopic
current, and spike timing. Each neuron is represented
by a point in the state space given by the above
coordinates comprising vector $X(t) \in \mathbb{R}^n$, and the evolution
of a neuron is given by its trajectory in the state space.
Neuropils can include millions and billions of neurons;
thus the phase space of the neurons contains a myriad
of trajectories. Using the ensemble density approach of
population modeling, the distribution of neurons in the
state space at a given time $t$ is described by a
probability density function $p(X, t)$. The ensemble density
approach models the evolution of the probability
density in the state space [33.98]. One popular approach
uses the Langevin formalism given next.


Field Theories of Neural Networks 
Consider the stochastic process $X(t)$, which is described
by the Langevin equation [33.99]

\[
dX(t) = \mu(X(t)) \, dt + \sigma(X(t)) \, dW(t) . \tag{33.19}
\]

Here $\mu$ and $\sigma$ denote the drift and variance,
respectively, and $W(t)$ is a Wiener process (Brownian motion)
with normally distributed increments $dW(t)$. The probability
density $p(X, t)$ of the Langevin equation (33.19) satisfies
the following form of the Fokker–Planck equation, after
omitting higher-order terms

\[
\frac{\partial p(X, t)}{\partial t} =
  -\sum_{i=1}^{n} \frac{\partial}{\partial x_i} [\mu_i(X) \, p(X, t)]
  + \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2}{\partial x_i \partial x_j}
    [D_{ij}(X, t) \, p(X, t)] . \tag{33.20}
\]

The Fokker–Planck equation has two components.
The first one is a flow term containing drift
vector $\mu_i(X)$, while the other term describes diffusion
with diffusion coefficient matrix $D_{ij}(X, t)$. The
Fokker–Planck equation is a partial differential equation (PDE)
that provides a deterministic description of macroscopic
events resulting from random microscopic events. The
mean-field approximation describes time-dependent,
ensemble average population properties, instead of
keeping track of the behavior of individual neurons.
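
The link between individual random trajectories and a deterministic density can be illustrated by simulating many realizations of (33.19) with the Euler-Maruyama scheme. The sketch below assumes NumPy and, purely as an example, a scalar linear drift $\mu(X) = -\theta X$ with constant noise amplitude (an Ornstein-Uhlenbeck process), for which the stationary density is Gaussian with variance $\sigma^2 / (2\theta)$. The process, parameter values, and function name are assumptions not taken from the text.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, dt, steps, n_paths, rng):
    """Simulate n_paths realizations of dX = mu(X) dt + sigma(X) dW."""
    x = np.full(n_paths, float(x0))
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)   # Wiener increments
        x = x + mu(x) * dt + sigma(x) * dW
    return x

theta, s = 1.0, 1.0                                  # illustrative parameters
rng = np.random.default_rng(2)
x_T = euler_maruyama(mu=lambda x: -theta * x,
                     sigma=lambda x: s * np.ones_like(x),
                     x0=2.0, dt=0.01, steps=1000, n_paths=5000, rng=rng)

ensemble_mean = x_T.mean()       # relaxes toward 0
ensemble_var = x_T.var()         # approaches s**2 / (2 * theta) = 0.5
```

Each path is random, but the ensemble statistics evolve smoothly and settle to the stationary values predicted by the corresponding Fokker-Planck equation, which is the sense in which (33.20) is a deterministic description.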

Mean-field models can be extended to describe the
evolution of neural populations distributed in physical
space. Considering the cortical sheet as a de facto
continuum of the highly convoluted neural tissue (the
neuropil), field theories of brains are developed using
partial differential equations in space and time. The
corresponding PDEs are wave equations. Consider a
simple one-dimensional model to describe the dynamics of
the current density $\Phi(x, t)$ as a macroscopic variable. In
the simple case of translational invariance of the
connectivity function between two arbitrary points of the
domain with exponential decay, the following form of
the wave equation is obtained [33.100]

\[
\frac{\partial^2 \Phi}{\partial t^2} + (\omega_0^2 - v^2 \Delta) \Phi
  + 2 \omega_0 \frac{\partial \Phi}{\partial t}
  = \left( \omega_0^2 + \omega_0 \frac{\partial}{\partial t} \right)
    S[\Phi(x, t) + P(x, t)] . \tag{33.21}
\]

Here $\Delta = \partial^2 / \partial x^2$ is the Laplacian in one
dimension, $S(\cdot)$ is a sigmoid transfer function for firing
rates, and $P(x, t)$ describes the effect of inputs; $\omega_0 = v / \sigma$,
where $v$ is the propagation velocity along lateral axons,
and $\sigma$ is the spatial relaxation constant of the applied
exponential decay function [33.100]. The model can
be extended to excitatory–inhibitory components. An
example of simulations with a one-dimensional neural
field model incorporating excitatory and inhibitory
neurons is given in Fig. 33.19 [33.101]. The figure
shows the propagation of two traveling pulses and the
emergence of transient complex behavior ultimately
leading to an elevated firing rate across the whole
tissue [33.101]. For recent developments in brain field
models, see [33.90, 102].


Coupled Map Lattices for NNs 
Spatiotemporal dynamics in complex systems has been 
modeled using coupled map lattices (CML) [33.103]. 
CMLs use continuous state space and discrete time and 
space coordinates. In other words, CMLs are defined on 
(finite or infinite lattices) using discrete time iterations. 
Using periodic boundary conditions, the array can be 


250 


Fig. 33.19 Numerical simulations of a one-dimensional 
neural field model showing the interaction of two travel- 
ing pulses (after [33.101]) 


folded into a circle in one dimension, or into a torus 
for lattices of dimension 2 or higher. CML dynamics is 
described as follows 


1 K/2 

mi = -Af D)e J fal), 
k=—K/2 

(33.22) 


where x, (i) is the value of node i at iteration step n, i = 
1,..., N; N is the size of the lattice. Note that in (33.22) 
a periodic boundary condition applies. f(.) is a nonlin- 
ear mapping function used in the iterations and € is the 
coupling strength, 0 < £ < 1. e = 0 means no coupling, 
while £ = 1 is maximum coupling. The CML rule de- 
fined in (33.22) has two terms. The first term on the 
right-hand side is an iterative update of the i-th state, 
while the second term describes coupling between the 
units. Parameter K has a special role in coupled map 
lattices; it defines the size of the neighborhoods. K = N 
describes mean-field coupling, while smaller K values 
belong to smaller neighborhoods. The geometry of the 
system is similar to the ones given in Fig. 33.14. The 
case of local neighborhood is the upper left diagram in 
Fig. 33.14, while mean-field coupling is the lower right 
diagram. Similar rules have been defined for higher-di- 
mensional lattices. 
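
The update rule (33.22) can be sketched in a few lines of NumPy. The sketch below is illustrative only and rests on stated assumptions: the rule is taken to be x_{n+1}(i) = (1 - ε) f(x_n(i)) + (ε/K) Σ_k f(x_n(i+k)), the k = 0 term is excluded from the neighborhood sum so that the weights (1 - ε) and ε add up to one, K is even and smaller than N, and the cubic map with a = 3.5 is an arbitrary choice of f. The function names are hypothetical.

```python
import numpy as np

def cubic_map(x, a):
    """Cubic map f(x, a) = a*x**3 - a*x + x."""
    return a * x**3 - a * x + x

def cml_step(x, eps, K, f):
    """One CML iteration on a ring of N nodes (periodic boundary).

    ASSUMPTION: the coupling term averages f over the K neighbors
    i-K/2, ..., i+K/2 with k = 0 excluded, so the weights sum to one.
    K is assumed even and smaller than N.
    """
    fx = f(x)
    coupling = np.zeros_like(fx)
    for k in range(1, K // 2 + 1):
        # np.roll wraps around the array ends: periodic boundary condition
        coupling += np.roll(fx, k) + np.roll(fx, -k)
    return (1.0 - eps) * fx + (eps / K) * coupling

def run_cml(x0, eps, K, f, steps):
    """Iterate the lattice for a number of steps and return the final state."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = cml_step(x, eps, K, f)
    return x

rng = np.random.default_rng(0)
N = 64
x0 = rng.uniform(-0.8, 0.8, size=N)      # random initial lattice state
f = lambda x: cubic_map(x, 3.5)          # illustrative parameter choice
x_final = run_cml(x0, eps=0.3, K=4, f=f, steps=200)
print(x_final.min(), x_final.max())
```

With eps = 0 the nodes evolve as independent maps, and a spatially uniform state remains uniform for any eps; both properties follow directly from the normalization assumed above.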

CMLs exhibit very rich dynamic behavior, including
fixed points, limit cycles, and chaos, depending
on the choice of control parameters, ε, K, and function
f(.) [33.103, 104]. An example of the cubic sigmoid
function


$$ f(x, a) = a x^{3} - a x + x $$


is shown in Fig. 33.20, together with the bifurcation di- 
agram with respect to parameter a. By increasing the 
value of parameter a, the map exhibits bifurcations from 
fixed point to limit cycle, and ultimately to the chaotic 
regime. 
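
The sequence fixed point, limit cycle, chaos can be checked numerically for the cubic map f(x, a) = ax^3 - ax + x. A short calculation shows that the origin is a stable fixed point for 0 < a < 2 (since f'(0) = 1 - a), that the symmetric 2-cycle ±sqrt((a - 2)/a) is stable for 2 < a < 3, and that for a = 4 the map is the Chebyshev polynomial 4x^3 - 3x, whose Lyapunov exponent is ln 3. The sketch below is a minimal illustration; the initial condition, transient length, and sample size are arbitrary choices.

```python
import numpy as np

def cubic_map(x, a):
    """Cubic map f(x, a) = a*x**3 - a*x + x."""
    return a * x**3 - a * x + x

def orbit(a, x0=0.3, transient=1000, keep=10000):
    """Iterate the map, discard a transient, and return the following samples."""
    x = x0
    for _ in range(transient):
        x = cubic_map(x, a)
    out = np.empty(keep)
    for n in range(keep):
        x = cubic_map(x, a)
        out[n] = x
    return out

def lyapunov(a, **kw):
    """Orbit average of ln|f'(x)|, with f'(x) = 3*a*x**2 - a + 1."""
    xs = orbit(a, **kw)
    return np.mean(np.log(np.abs(3.0 * a * xs**2 - a + 1.0)))

print("a = 1.5:", orbit(1.5, keep=4))     # settles on the fixed point x = 0
print("a = 2.5:", orbit(2.5, keep=4))     # alternates between two values
print("a = 4.0: Lyapunov estimate", lyapunov(4.0), "vs ln 3 =", np.log(3.0))
```

A positive orbit average of ln|f'(x)| signals exponential divergence of nearby orbits; a full bifurcation diagram is obtained by repeating the orbit computation over a grid of a values.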

Complex CML dynamics has been used to design 
dynamic associative memory systems. In CML, each 
memory is represented as a spatially coherent oscil- 
lation and is learnt by a correlational learning rule 
operating in limit cycle or chaotic regimes. In such
systems, both the memory capacity and the basin volume
for each memory are larger than in the Hopfield
model employing the same learning rule [33.105]. CML
chaotic memories reduce the problem of spurious mem- 
ories, but they are not immune to it. Spurious memories 
prevent the system from exploiting its memory capacity 
to the fullest extent. 




Part D 


Neural Networks 


Fig. 33.20a,b Transfer function for CML: (a) shape of the cubic transfer function f(x, a) = ax^3 - ax + x; (b) bifurcation diagram over parameter a


Stochastic Resonance 
Field models of brain networks develop deterministic
PDEs (Fokker-Planck equation) for macroscopic
properties based on a statistical description of the
underlying stochastic dynamics of microscopic neurons.
In other words, they are deterministic systems
at the macroscopic level. Stochastic resonance
(SR) deals with conditions when a bistable or multi- 
stable system exhibits strong oscillations under weak 
periodic perturbations in the presence of random 
noise [33.106]. In a typical SR situation, the weak 
periodic carrier wave is insufficient to cross the po- 
tential barrier between the equilibria of a multistable 
system. Additive noise enables the system to sur- 
mount the barrier and exhibit oscillations as it transits 
between the equilibria. SR is an example of pro- 
cesses when properly tuned random noise improves 
the performance of a nonlinear system and it is 
highly relevant to neural signal processing [33.107, 
108]. 
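
The barrier-crossing mechanism described above can be illustrated with the textbook overdamped double-well model dx = (x - x^3 + A cos(ωt)) dt + σ dW. This is a generic sketch and not the excitatory-inhibitory network of [33.109]; the model, the Euler-Maruyama integrator, and all parameter values (A = 0.2, ω = 0.05, σ, step size) are illustrative assumptions. The forcing amplitude is chosen below the static threshold 2/(3√3) ≈ 0.385, so without noise the state cannot leave its well.

```python
import math
import numpy as np

def simulate_double_well(noise_sigma, amp=0.2, omega=0.05, dt=0.01,
                         t_end=1000.0, x0=1.0, seed=0):
    """Euler-Maruyama integration of dx = (x - x^3 + amp*cos(omega*t)) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    n = int(t_end / dt)
    kicks = noise_sigma * math.sqrt(dt) * rng.standard_normal(n)
    x = np.empty(n)
    xi = x0
    for j in range(n):
        drift = xi - xi**3 + amp * math.cos(omega * j * dt)
        xi = xi + drift * dt + kicks[j]
        x[j] = xi
    return x

def count_well_switches(x, threshold=0.5):
    """Count transitions between the wells near x = -1 and x = +1.

    A switch is counted only after the state passes the opposite threshold,
    which avoids counting small fluctuations around x = 0.
    """
    state = 1 if x[0] > 0 else -1
    switches = 0
    for v in x:
        if state == 1 and v < -threshold:
            state, switches = -1, switches + 1
        elif state == -1 and v > threshold:
            state, switches = 1, switches + 1
    return switches

for sigma in (0.0, 0.3, 0.7):
    n_sw = count_well_switches(simulate_double_well(sigma))
    print(f"sigma = {sigma}: {n_sw} well-to-well transitions")
```

Scanning σ and measuring the spectral power of x(t) at the forcing frequency would expose the resonance-like dependence on noise level; that step is omitted here for brevity.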

A prominent example of SR in a neural net- 
work with excitatory and inhibitory units is described 
in [33.109]. In the model developed, the activation
rates of excitatory and inhibitory neurons are described
by μ_e and μ_i, respectively. The ratio α = μ_e/μ_i is an
important parameter of the system. The investigated
neural populations exhibit a range of dynamic behaviors,
including convergence to fixed point, damped
oscillations, and persistent oscillations. Figure 33.21
summarizes the main findings in the form of a phase
diagram in the space of parameters α and noise level.
The diagram contains three regions. Region I is at low
noise levels and it corresponds to oscillations decaying
to a fixed point at an exponential rate. Region II
corresponds to high noise, when the neural activity
exhibits damped oscillations as it approaches the steady
state. Region III, however, demonstrates sustained
oscillations for an intermediate level of noise. If α is
above a critical value (see the tip of Region III),


Fig. 33.21 Stochastic resonance in excitatory-inhibitory
neural networks (horizontal axis: noise level; vertical axis: α);
α describes the relative strength of inhibition.
Region I: fixed point dynamics. Region II: damped
oscillatory regime. Region III: sustained periodic oscillations
illustrating stochastic resonance (after [33.109])


Neurodynamics | 33.2 Synchrony, Oscillations, and Chaos in Neural Networks 623 


Fig. 33.22a,b Lorenz attractor in the chaotic regime; (a) time series of the variables X, Y, and Z; (b) butterfly-winged
chaotic Lorenz attractor in the phase space spanned by variables X, Y, and Z


the activities in the steady state undergo a first-order 
phase transition at a critical noise level. The inten- 
sive oscillations in Region III at an intermediate noise 
level show that the output of the system (oscilla- 
tions) can be enhanced by an optimally selected noise 
level. 

The observed phase transitions may be triggered by 
neuronal avalanches, when the neural system is close 
to a critical state and the activation of a small number 
of neurons can generate an avalanche process of activa- 
tion [33.110]. Neural avalanches have been described 
using self-organized criticality (SOC), which has been 
identified in neural systems [33.111]. There is much 
empirical evidence of the cortex conforming to the self- 
stabilized, scale-free dynamics with avalanches during 
the existence of some quasi-stable states [33.112, 113]. 
These avalanches maintain a metastable background 
state of activity. 

Phase transitions have been studied in models with 
extended layers of excitatory and inhibitory neuron 
populations, respectively. A specific model uses ran- 
dom cellular neural networks to describe conditions 
with sustained oscillations [33.114]. The role of var- 
ious control parameters has been studied, including 
noise level, inhibition, and rewiring. Rewiring describes 
long axonal connections to produce neural network ar- 
chitectures resembling connectivity patterns with short 
and long-range axons in the neuropil. By properly tun- 
ing the parameters, the system can reside in a fixed 
point regime in isolation, but it will switch to per- 
sistent oscillations under the influence of learnt input 
patterns [33.115]. 


33.2.3 Chaotic Neural Networks 


Emergence of Chaos in Neural Systems 
Neural networks as dynamical systems are described 
by the state vector X(t) which obeys the equation of 
motion (33.13). Dynamical systems can exhibit fixed 
point, periodic, and chaotic behaviors. Fixed points and
periodic oscillations, and transitions from one to the
other through bifurcation dynamics, have been described
in Sect. 33.2.2. The trajectory of a chaotic system does
not converge to a fixed point or limit cycle; rather, it
converges to a chaotic attractor. Chaotic attractors, or
strange attractors, define a fractal set in the state space.
Moreover, chaotic trajectories that are close to each other
at some point diverge from each other exponentially fast
as time evolves [33.116, 117].

An example of the chaotic Lorenz attractor is shown 
in Fig. 33.22. The Lorenz attractor is defined by a sys- 
tem of three ordinary differential equations (ODEs) 
with nonlinear coupling, originally derived for the de- 
scription of the motion of viscous flows [33.118]. The 
time series belonging to variables X, Y, Z are shown in 
Fig. 33.22a for parameters in the chaotic region, while 
the strange attractor is illustrated by the trajectory in the 
phase space, see Fig. 33.22b. 


Chaotic Neuron Model 
In chaotic neural networks the individual components 
exhibit chaotic behavior, and the goal is to study the or- 
der emerging from their interaction. Nerve membranes 
produce propagating action potentials in a highly
nonlinear process which can generate oscillations and
bifurcations to chaos. Chaos has been observed in the
giant axons of squid and it has been used to study chaotic
behavior in neurons. The Hodgkin-Huxley equations
can model nonlinear dynamics in the squid giant axon
with high accuracy [33.58]. The chaotic neuron model
of Aihara et al. is an approximation of the Hodgkin-Huxley
equation and it reproduces chaotic oscillations
observed in the squid giant axon [33.119, 120]. The
model uses the following simple iterative map


$$ x(t+1) = k\,x(t) - \alpha\, f\bigl(y(t)\bigr) + a , \qquad (33.23) $$


where x(t) is the state of the chaotic neuron at time t,
k is a decay parameter, α characterizes refractoriness,
a is a combined bias term, and f(y(t)) is a nonlinear
transfer function. In the chaotic neuron model,
the log sigmoid transfer function is used, see (33.17).
Equation (33.23) combined with the sigmoid produces
a piecewise monotone map, which generates chaos.
Chaotic neural networks composed of chaotic neurons
generate spatio-temporal chaos and are able to
retrieve previously learnt patterns as the chaotic trajectory
traverses the state space. Chaotic neural networks
are used in various information processing systems
with abilities of parallel distributed processing [33.121-123].
Note that CMLs also consist of chaotic oscillators
produced by a nonlinear local iterative map, like in
chaotic neural networks. CMLs define a spatial relationship
among their nodes to describe spatio-temporal
fluctuations. A class of cellular neural networks
combines the explicit spatial relationships similar to
CMLs with detailed temporal dynamics using the
Cohen-Grossberg model [33.83] and it has been used
successfully in neural network applications [33.124, 125].
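
The iterative map (33.23) can be explored numerically. The sketch below is a minimal illustration under explicit assumptions: the argument of the transfer function is taken to be the neuron's own state, so the map reads x(t+1) = k x(t) - α f(x(t)) + a; the log sigmoid is taken as f(y) = 1/(1 + exp(-y/s)) with a hypothetical steepness s; and the parameter values are arbitrary choices that are not taken from the text. Whether a given parameter set is chaotic has to be checked case by case, for example with a Lyapunov exponent estimate.

```python
import numpy as np

def log_sigmoid(y, steepness=0.02):
    """Logistic function 1/(1 + exp(-y/steepness)); steepness is an assumed parameter."""
    z = np.clip(-y / steepness, -700.0, 700.0)   # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(z))

def chaotic_neuron(x0, k, alpha, a, steps, steepness=0.02):
    """Iterate x(t+1) = k*x(t) - alpha*f(x(t)) + a (ASSUMED form, y(t) = x(t))."""
    x = np.empty(steps + 1)
    x[0] = x0
    for t in range(steps):
        x[t + 1] = k * x[t] - alpha * log_sigmoid(x[t], steepness) + a
    return x

# Illustrative parameter values (assumptions, not from the text)
k, alpha, a = 0.7, 1.0, 0.5
trajectory = chaotic_neuron(x0=0.1, k=k, alpha=alpha, a=a, steps=2000)
print(trajectory[-10:])
```

Because 0 < f < 1, the state stays inside the interval [(a - α)/(1 - k), a/(1 - k)] for 0 < k < 1 and α > 0 whenever it starts there, which gives a quick sanity check on any implementation.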


Collective Chaos in Neural Networks 
Chaos in neural networks can be an emergent macro- 
scopic property stemming from the interaction of non- 
linear neurons, which are not necessarily chaotic in iso- 
lation. Starting from the microscopic neural level up to 
the macroscopic level of cognition and consciousness, 
chaos plays an important role in neurodynamics [33.82, 
126-129]. There are various routes to chaos in neu- 
ral systems, including period-doubling bifurcations to 
chaos, chaotic intermittency, and collapse of a two-di- 
mensional torus to chaos [33.130, 131]. 

Chaotic itinerancy is a special form of chaos, 
which is between ordered dynamics and fully devel- 
oped chaos. Chaotic itinerancy describes the trajectory 
through high-dimensional state space of neural activ- 
ity [33.132]. In chaotic itinerancy the chaotic system 


is destabilized to some degree but some traces of the 
trajectories remain. This describes an itinerant behavior 
between the states of the system containing destabilized 
attractors or attractor ruins, which can be fixed point, 
limit cycle, torus, or strange attractor with unstable di- 
rections. Dynamical orbits are attracted to a certain 
attractor ruin, but they leave via an unstable mani- 
fold after a (short or long) stay around it and move 
toward another attractor ruin. This successive chaotic 
transition continues unless a strong input is received. 
A schematic diagram is shown in Fig. 33.23, where the 
trajectory of a chaotic itinerant system is shown visit- 
ing attractor ruins. Chaotic itinerancy is associated with 
perceptions and memories, the chaos between the at- 
tractor ruins is related to searches, and the itinerancy 
is associated with sequences in thinking, speaking, and 
writing. 

Frustrated chaos is a dynamical system in a neu- 
ral network with a global attractor structure when local 
connectivity patterns responsible for stable oscilla- 
tory behaviors become intertwined, leading to mutually 
competing attractors and unpredictable itinerancy be- 
tween brief appearances of these attractors [33.133]. 
Similarly to chaotic itinerancy, frustrated chaos is re- 
lated to destabilization of the dynamics and it generates 
itinerant, wavering oscillations between the orbits of 
the network, the trajectories of which have been stable 
with the original connectivity pattern. Frustrated chaos 
is shown to belong to the family of intermittency type 
of chaos [33.134, 135]. 

To characterize chaotic dynamics, tools of statistical 
time series analysis are useful. The studies may involve 
time and frequency domains. Time domain analysis 


Fig. 33.23 Schematic illustration of itinerant chaos with
a trajectory visiting attractor ruins (after [33.132])


includes attractor reconstruction, i.e., the attractor is 
depicted in the state space. Chaotic attractors have fractal
dimensions, which can be evaluated using one of the
available methods [33.136-138]. In the case of
low-dimensional chaotic systems, the reconstruction can be
illustrated using two or three-dimensional plots. An ex- 
ample of attractor reconstruction is given in Fig. 33.22 
for the Lorenz system with three variables. Attractor re- 
construction of a time series can be conducted using 
time-delay coordinates [33.139]. 
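
Time-delay reconstruction maps a scalar series s(t) to vectors [s(t), s(t + τ), ..., s(t + (m - 1)τ)]. A minimal sketch follows; the function name `delay_embed` and the choice of embedding dimension m and delay τ (in samples) are illustrative assumptions, and selecting suitable values for real data is a separate problem not addressed here.

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Return an array whose rows are [s(t), s(t + tau), ..., s(t + (dim-1)*tau)]."""
    s = np.asarray(series, dtype=float)
    n = len(s) - (dim - 1) * tau          # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([s[j * tau: j * tau + n] for j in range(dim)])

# Example: a sine wave embedded in two dimensions traces a closed curve
t = np.arange(1000)
points = delay_embed(np.sin(0.1 * t), dim=2, tau=16)
print(points.shape)
```

Plotting the columns of `points` against each other gives the reconstructed trajectory; for a three-variable system such as the Lorenz model, dim = 3 applied to a single observed variable is the usual starting point.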

Lyapunov spectrum analysis is a key tool in identifying
and describing chaotic systems. Lyapunov exponents
measure the instability of orbits in different
directions in the state space. They describe the rate of
exponential divergence of trajectories that were once close
to each other. The set of corresponding Lyapunov exponents
constitutes the Lyapunov spectrum. The maximum
Lyapunov exponent λ* is of crucial importance,
as a positive leading Lyapunov exponent λ* > 0 is the
hallmark of chaos. X(t) describes the trajectory of the
system in the phase space starting from X(0) at time t = 0.
Denote by X_{Δx}(t) the perturbed trajectory starting
from X(0) + Δx_0. The leading Lyapunov exponent can
be determined using the following relationship [33.140]


$$ \lambda^{*} = \lim_{t \to \infty} \lim_{\Delta x_0 \to 0} \frac{1}{t} \ln \frac{|X_{\Delta x}(t) - X(t)|}{|\Delta x_0|} , \qquad (33.24) $$


where λ* < 0 corresponds to convergent behavior,
λ* = 0 indicates periodic orbits, and λ* > 0 signifies
chaos. For example, the Lorenz attractor has λ* =
0.906, indicating strong chaos (Fig. 33.22). Equation
(33.24) measures the divergence for infinitesimal
perturbations in the limit of infinite time series. In
practical situations, especially for short time series, it is
often difficult to distinguish weak chaos from random
perturbations. One must be careful with conclusions
about the presence of chaos when λ* has a value
close to zero. Lyapunov exponents are widely used in
brain monitoring using electroencephalogram (EEG)
analysis, and various methods are available for
characterization of normal and pathological brain conditions
based on Lyapunov spectra [33.141, 142].
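
The definition of the leading Lyapunov exponent translates into a simple two-trajectory procedure: follow a reference and a slightly perturbed trajectory, accumulate the logarithm of the growth of their separation, and rescale the separation back to its initial size after every step. The sketch below applies this to the Lorenz system. It is an illustration only and assumes the classical parameter values σ = 10, ρ = 28, β = 8/3 (the text does not list them), a fixed-step fourth-order Runge-Kutta integrator, and finite integration times, so the result is only an estimate of the infinite-time limit.

```python
import numpy as np

def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classical parameters assumed)."""
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, v, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(v)
    k2 = f(v + 0.5 * dt * k1)
    k3 = f(v + 0.5 * dt * k2)
    k4 = f(v + dt * k3)
    return v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def largest_lyapunov(f, v0, dt=0.01, t_transient=20.0, t_total=200.0, d0=1e-8):
    """Two-trajectory estimate of the leading exponent with renormalization."""
    v = np.asarray(v0, dtype=float)
    for _ in range(int(t_transient / dt)):      # relax onto the attractor
        v = rk4_step(f, v, dt)
    w = v + np.array([d0, 0.0, 0.0])            # perturbed initial condition
    log_sum = 0.0
    n = int(t_total / dt)
    for _ in range(n):
        v = rk4_step(f, v, dt)
        w = rk4_step(f, w, dt)
        d = np.linalg.norm(w - v)
        log_sum += np.log(d / d0)
        w = v + (w - v) * (d0 / d)              # rescale separation back to d0
    return log_sum / (n * dt)

lam = largest_lyapunov(lorenz, [1.0, 1.0, 1.0])
print("estimated leading Lyapunov exponent:", lam)
```

For short or noisy experimental series the two-trajectory approach is not available, and the exponent has to be estimated from a reconstructed attractor, where the caveats about values close to zero apply with full force.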

Fourier analysis conducts data processing in the 
frequency domain, see (33.5) and (33.6). For chaotic 
signals, the shape of the power spectra is of special 
interest. Power spectra often show 1/f^α power-law
behavior in log-log coordinates, which indicates a
scale-free system and possibly chaos. Power-law
scaling in systems at SOC is suggested by a linear decrease


in log power with increasing log frequency [33.143]. 
Scaling properties of criticality facilitate the coexis- 
tence of spatially coherent cortical activity patterns for 
a duration ranging from a few milliseconds to a few 
seconds. Scale-free behavior characterizes chaotic brain 
activity both in time and frequency domains. For com- 
pleteness, we mention the Hilbert space analysis as an 
alternative to Fourier methods. The analytic signal ap- 
proach based on Hilbert analysis is widely used in brain 
monitoring. 
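
The exponent of a 1/f^α spectrum can be estimated as the negative slope of a straight-line fit to the periodogram in log-log coordinates. The sketch below is illustrative; the function name, the restriction of the fit to the lower part of the frequency range, and the two synthetic test signals are assumptions. White noise (α ≈ 0) and integrated white noise (α ≈ 2) serve as reference cases with known exponents.

```python
import numpy as np

def spectral_slope(signal, fs=1.0, f_max_fraction=0.1):
    """Least-squares slope of log10(power) versus log10(frequency).

    Only frequencies 0 < f <= f_max_fraction * fs enter the fit.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs > 0) & (freqs <= f_max_fraction * fs)
    slope, _ = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return slope

rng = np.random.default_rng(1)
white = rng.standard_normal(2**14)
brown = np.cumsum(white)                 # integrated white noise

slope_white = spectral_slope(white)
slope_brown = spectral_slope(brown)
print("white noise slope:", slope_white)
print("integrated noise slope:", slope_brown)
```

A linear trend in such a plot is only suggestive: it indicates scale-free behavior over the fitted range, and further evidence is needed before attributing it to criticality or chaos.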


Emergent Macroscopic Chaos in Neural Networks
Freeman’s K model describes spatio-temporal brain
chaos using a hierarchical approach. Low-level K sets
were introduced in the 1970s, named in honor of
Aharon Katchalsky, an early pioneer of neural dynamics
[33.82, 94]. K sets are multiscale models, describing
an increasing complexity of structure and dynamics.
K sets are mesoscopic models and represent an
intermediate level between microscopic neurons and
macroscopic brain structures. K sets are topological
specifications of the hierarchy of connectivity in neural
populations in brains. K sets describe the spatial
patterns of phase and amplitude of the oscillations
generated by neural populations. They model observable
fields of neural activity comprising electroencephalograms
(EEGs), local field potentials (LFPs), and
magnetoencephalograms (MEGs) [33.144]. K sets form
a hierarchy for cell assemblies with components starting
from K0 to KIV [33.145, 146].

K0 sets represent noninteractive collections of neurons
forming cortical microcolumns; a K0 set models
a neuron population of ~10^3-10^4 neurons. K0 models
dendritic integration in average neurons and an asymmetric
sigmoid static nonlinearity for axon transmission.
The K0 set is governed by a point attractor with
zero output and stays at equilibrium except when perturbed.
In the original K-set models, K0s are described
by a state-dependent, linear second-order ordinary
differential equation (ODE) [33.94]


$$ ab\,\frac{d^{2}X(t)}{dt^{2}} + (a+b)\,\frac{dX(t)}{dt} + X(t) = U(t) . \qquad (33.25) $$


Here a and b are biologically determined time con- 
stants. X(t) denotes the activation of the node as a func- 
tion of time. U(t) includes an asymmetric sigmoid 
function Q(x), see (33.18), acting on the weighted sum 
of activation from neighboring nodes and any external 
input. 
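
The second-order K0 dynamics can be integrated numerically by rewriting it as two first-order equations. The sketch below is a minimal illustration that assumes the equation reads ab X'' + (a + b) X' + X = U(t); it treats U(t) as a prescribed input sequence rather than the sigmoid of weighted neighbor activity, and it uses hypothetical time constants a = 4.5 and b = 1.4 (in arbitrary time units) with a fixed-step Runge-Kutta scheme. For a unit step input the exact solution is X(t) = 1 - (a e^{-t/a} - b e^{-t/b})/(a - b), which is used for comparison.

```python
import numpy as np

def k0_response(u, a, b, dt):
    """Integrate a*b*X'' + (a + b)*X' + X = U(t) from rest.

    u is the input sampled every dt and held constant over each step.
    Returns X at the sample times.
    """
    u = np.asarray(u, dtype=float)
    state = np.zeros(2)                       # [X, dX/dt]
    x = np.empty(len(u))

    def deriv(s, u_n):
        return np.array([s[1], (u_n - s[0] - (a + b) * s[1]) / (a * b)])

    for n, u_n in enumerate(u):
        x[n] = state[0]
        k1 = deriv(state, u_n)
        k2 = deriv(state + 0.5 * dt * k1, u_n)
        k3 = deriv(state + 0.5 * dt * k2, u_n)
        k4 = deriv(state + dt * k3, u_n)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

a, b, dt = 4.5, 1.4, 0.01                     # hypothetical time constants
t = np.arange(6000) * dt
x_step = k0_response(np.ones_like(t), a, b, dt)
x_exact = 1.0 - (a * np.exp(-t / a) - b * np.exp(-t / b)) / (a - b)
print("max deviation from exact step response:", np.max(np.abs(x_step - x_exact)))
```

The response is overdamped, with no oscillation, which is consistent with the description of K0 as a point attractor that returns to equilibrium after a perturbation.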


Fig. 33.24a-c KIII diagram and behaviors; (a) 3 double layer hierarchy of KIII and time series over each layer, exhibiting
intermittent chaotic oscillations, (b) phase space reconstruction using delayed time coordinates


KI sets are made of interacting KO sets, either exci- 
tatory or inhibitory with positive feedback. The dynam- 
ics of KI is described as convergence to a nonzero fixed 
point. If KI has sufficient functional connection density, 
then it is able to maintain a nonzero state of back- 
ground activity by mutual excitation (or inhibition). 


KI typically operates far from thermodynamic equi- 
librium. Neural interaction by stable mutual excitation 
(or mutual inhibition) is fundamental to understanding 
brain dynamics. KII sets consist of interacting excitatory
and inhibitory KI sets with negative feedback.
KII sets are responsible for the emergence of limit
cycle oscillation due to the negative feedback between the
neural populations. Transitions from point attractor to 
limit cycle attractor can be achieved through a suit- 
able level of feedback gain or by input stimuli, see 
Fig. 33.18. 

KIII sets are made up of multiple interacting KII sets.
Examples include the sensory cortices. KIII sets generate
broadband, chaotic oscillations as background
activity by combined negative and positive feedback
among several KII populations with incommensurate
frequencies. The increase in nonlinear feedback gain
that is driven by input results in the destabilization of
the background activity and leads to the emergence of
a spatial amplitude modulation (AM) pattern in KIII.
KIII sets are responsible for the embodiment of meaning
in AM patterns of neural activity shaped by synaptic
interactions that have been modified through learning in
KIII layers. The KIII model is illustrated in Fig. 33.24
with three layers of excitatory-inhibitory nodes. In
Fig. 33.24a the temporal dynamics is illustrated in each
layer, while Fig. 33.24b shows the phase space reconstruction
of the attractor. This is a chaotic behavior
resembling the dynamics of the Lorenz attractor in
Fig. 33.22. KIV sets are made up of interacting KIII
units to model intentional neurodynamics of the limbic
system. KIV exhibits global phase transitions, which
are the manifestations of hemisphere-wide cooperation
through intermittent large-scale synchronization.
KIV is the domain of Gestalt formation and preafference
through the convergence of external and internal
sensory signals leading to intentional action [33.144,
146].


Properties of Collective Chaotic Neural 
Networks 
KIII is an associative memory, encoding input data
in spatio-temporal AM patterns [33.147, 148]. KIII
chaotic memories have several advantages as compared
to convergent recurrent networks:


1. They produce robust memories based on relatively
few learning examples, even in noisy environments.

2. The encoding capacity of a network with a given
number of nodes is exponentially larger than that of
its convergent counterparts.

3. They can recall the stored data very quickly, just as
humans and animals can recognize a learnt pattern
within a fraction of a second.


The recurrent Hopfield neural network can store
an estimated 0.15N input patterns in stable attractors,
where N is the number of neurons [33.84]. Exact analysis
by McEliece et al. [33.149] shows that the memory
capacity of the Hopfield network is N/(4 log N). Various
generalizations provide improvements over the initial
memory gain [33.150, 151]. It is of interest to evaluate
the memory capacity of the KIII memory. The
memory capacity of chaotic networks which encode
input into chaotic attractors is, in principle, exponentially
increased with the number of nodes. However,
the efficient recall of the stored memories is a serious
challenge. The memory capacity of KIII as a chaotic
associative memory device has been evaluated with noisy
input patterns. The results are shown in Fig. 33.25,
where the performance of Hopfield and KIII memories
is compared; the top two plots are for Hopfield
nets, while the lower two figures describe KIII results
[33.152]. The light color shows recognition rate
close to 100%, while the dark color means poor recognition
approaching 0. The right-hand column has higher
noise levels. The Hopfield network shows the well-known
linear gain curve ~ 0.15. The KIII model, on the
other hand, has a drastically better performance. The
boundary separating the correct and incorrect classification
domains is superlinear; it has been fitted with
a fifth-order polynomial.
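
The two Hopfield capacity estimates quoted above differ considerably for realistic network sizes, as a short calculation shows. The sketch below simply evaluates 0.15N and N/(4 log N); it assumes that "log" denotes the natural logarithm, and the function names are hypothetical.

```python
import math

def hopfield_capacity_linear(n, ratio=0.15):
    """Empirical estimate: about 0.15 N patterns for N neurons."""
    return ratio * n

def hopfield_capacity_mceliece(n):
    """Estimate N / (4 log N); ASSUMPTION: natural logarithm."""
    return n / (4.0 * math.log(n))

for n in (50, 100, 1000):
    print(n, hopfield_capacity_linear(n), round(hopfield_capacity_mceliece(n), 2))
```

The logarithmic correction makes the second estimate grow more slowly than linearly, so the gap between the two widens with N.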


Cognitive Implications 
of Intermittent Brain Chaos 
Developments in brain monitoring techniques provide 
increasingly detailed insights into spatio-temporal neu- 
rodynamics and neural correlates of large-scale cog- 
nitive processing [33.74, 153-155]. Brains as large- 
scale dynamical systems have a basal state, which is 
a high-dimensional chaotic attractor with a dynamic 
trajectory wandering broadly over the attractor land- 
scape [33.82, 126]. Under the influence of external 
stimuli, cortical dynamics is destabilized and condenses 
intermittently to a lower-dimensional, more organized 
subspace. This is the act of perception when the subject 
identifies the stimulus with a meaning in the context 
of its previous experience. The system stays intermit- 
tently in the condensed, more coherent state, which 
gives rise to a spatio-temporal AM activity pattern cor- 
responding to the stimulus in the given context. The 
AM pattern is meta-stable and it disintegrates as the
system returns to the high-dimensional chaotic basal
state (less synchrony). Brain dynamics is described
as a sequence of phase transitions with intermittent 
synchronization-desynchronization effects. The rapid 
emergence of synchronization can be initiated by (Heb- 
bian) neural assemblies that lock into synchronization 


Fig. 33.25a,b Comparison of the memory capacity of (a) Hopfield and (b) KIII neural networks (horizontal axis: size of the network; vertical axis: size of the training set); the noise level is 40%
(left); 50% (right); the lighter the color the higher the recall accuracy. Observe the linear gain for Hopfield networks and
the superlinear (fifth-order) separation for KIII (after [33.152])


across widespread cortical and subcortical areas [33.82, 
156, 157]. 

Intermittent oscillations in spatio-temporal neural 
dynamics are modeled by a neuropercolation approach. 
Neuropercolation is a family of probabilistic models 
based on the theory of probabilistic cellular automata 
on lattices and random graphs and it is motivated by 
structural and dynamical properties of neural populations.
Neuropercolation constructs the hierarchy of
interactive populations in networks as developed in
Freeman K models [33.94, 144], but replaces differential
equations with probability distributions from the
observed random networks that evolve in time [33.158].
Neuropercolation considers populations of cortical neu- 
rons which sustain their background state by mutual 
excitation, and their stability is guaranteed by the neural 
refractory periods. Neural populations transmit and re- 


ceive signals from other populations by virtue of small- 
world effects [33.77, 159]. Tools of statistical physics 
and finite-size scaling theory are applied to describe 
critical behavior of the neuropil. Neuropercolation the- 
ory provides a mathematical approach to describe phase 
transitions and critical phenomena in large-scale, in- 
teractive cortical networks. The existence of phase 
transitions is proven in specific probabilistic cellular au- 
tomata models [33.160, 161]. 
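
The flavor of a probabilistic cellular automaton can be conveyed by a noisy local majority rule on a two-dimensional torus. The sketch below is a strongly simplified, hypothetical illustration and not the specific models of [33.158-162]: it uses a single population, a five-site neighborhood (the site and its four nearest neighbors), synchronous updates, and a noise parameter eps giving the probability that a site adopts the minority instead of the majority state. Inhibitory populations and long-range rewiring are omitted.

```python
import numpy as np

def majority_step(grid, eps, rng):
    """One synchronous update of a noisy majority rule on a torus."""
    votes = (grid
             + np.roll(grid, 1, axis=0) + np.roll(grid, -1, axis=0)
             + np.roll(grid, 1, axis=1) + np.roll(grid, -1, axis=1))
    majority = (votes >= 3).astype(np.int8)      # five voters, so no ties
    flip = rng.random(grid.shape) < eps          # sites that take the minority state
    return np.where(flip, 1 - majority, majority)

def run(n=64, eps=0.1, steps=200, seed=0):
    """Return the fraction of active sites after each update."""
    rng = np.random.default_rng(seed)
    grid = rng.integers(0, 2, size=(n, n), dtype=np.int8)
    density = np.empty(steps)
    for t in range(steps):
        grid = majority_step(grid, eps, rng)
        density[t] = grid.mean()
    return density

for eps in (0.05, 0.15, 0.5):
    d = run(eps=eps)
    print(f"eps = {eps}: final density of active sites = {d[-1]:.3f}")
```

Tracking the density of active sites over long runs and over a range of eps values is the kind of experiment in which a noise-driven transition between ordered and disordered regimes can be looked for.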

Simulations by neuropercolation models demonstrate
the onset of large-scale synchronization-desynchronization
behavior [33.162]. Figure 33.26 illustrates
results of intermittent phase desynchronization for
neuropercolation with excitatory and inhibitory populations.
Three main regimes can be distinguished, separated
by critical noise values ε_1 > ε_0. In Regime I,
ε > ε_1 (Fig. 33.26a), the channels are not synchronous and
the phase values are distributed broadly. In Regime II,
ε_1 > ε > ε_0 (Fig. 33.26b), the phase lags are drastically
reduced, indicating significant synchrony over extended
time periods. Regime III is observed for
ε_0 > ε, when the channels demonstrate highly
synchronized, frozen dynamics, see Fig. 33.26c. Similar
transitions can be induced by the relative strength
of inhibition, as well as by the fraction of rewiring
across the network [33.114, 115, 163]. The probabilistic
model of neural populations reproduces important
properties of the spatio-temporal dynamics of cortices and is
a promising approach for large-scale cognitive models.


Fig. 33.26a-c Phase synchronization-desynchronization with
excitatory-inhibitory connections in neuropercolation with
256 granule nodes; the z-axis shows the pairwise phase between
the units. (a) No synchrony; (b) intermittent synchrony;
(c) highly synchronized, frozen phase regime (after [33.162])


33.3 Memristive Neurodynamics


Sequential processing of fetch, decode, and execution
of instructions through the classical von Neumann
digital computers has resulted in less efficient
machines as their ecosystems have grown to be
increasingly complex [33.164]. Though modern digital
computers are fast and complex enough to emulate the
brain functionality of animals like spiders, mice, and
cats [33.165, 166], the associated energy dissipation in
the system grows exponentially along the hierarchy
of animal intelligence. For example, to perform certain
cortical simulations at the cat scale, even at an
83 times slower firing rate, the IBM team had to employ
Blue Gene/P (BG/P), a supercomputer equipped
with 147456 CPUs and 144 TB of main memory. On
the other hand, the human brain contains more than
100 billion neurons and each neuron has more than
20000 synapses [33.167]. Efficient circuit implementation
of synapses, therefore, is very important to build
a brain-like machine. One active branch of this research
area is cellular neural networks (CNNs) [33.168,
169], where many multiplication circuits are utilized in
a complementary metal-oxide-semiconductor (CMOS)
chip. However, since shrinking the current transistor
size is very difficult, introducing a more efficient approach
is essential for further development of neural
network implementations.

The memristor was first postulated by Chua as
the fourth basic circuit element in electrical circuits in
1971 [33.170]. It is based on the nonlinear characteristics
of charge and flux. By supplying a voltage or
current to the memristor, its resistance can be altered.
In this way, the memristor remembers information. In
that seminal work, Chua demonstrated that the memristance
M(q) relates the charge q and the flux φ in such
a way that the resistance of the device will change with
the applied electric field and time


$$ M(q) = \frac{d\varphi(q)}{dq} . \qquad (33.26) $$


The parameter M denotes the memristance of a
charge-controlled memristor, measured in ohms. Thus, the
memristance M can be controlled by applying a voltage
or current signal across the memristor. In other words,
the memristor behaves like an ordinary resistor at any
given instant of time, where its resistance depends on
the complete history of the device [33.170].

Although the device was proposed nearly four
decades ago, it was not until 2008 that researchers from
HP Labs showed that the devices they had fabricated
were indeed two-terminal memristors [33.171].
Figure 33.27 shows the I-V characteristics of a generic
memristor, where memristance behavior is observed
for TiO_2-based devices. A TiO_{2-x} layer with oxygen
vacancies is placed on a perfect TiO_2 layer, and
these layers are sandwiched between platinum electrodes.
In metal oxide materials, the switching from R_off
to R_on and vice versa occurs as a result of ion migration,
due to the enormous electric fields applied across


the nanoscale structures. These memristors have been 
fabricated using nanoimprint lithography and were suc- 
cessfully integrated on a CMOS substrate in [33.172]. 
Apart from these metal-oxide memristors, memristance 
has also been demonstrated using magnetic materials 
based on their magnetic domain wall motion and spin- 
torque induced magnetization switching in [33.173]. 


Fig. 33.27 Typical I-V characteristic of memristor (after
[33.171]); horizontal axis: voltage (V), vertical axis: current (A).
The pinched hysteresis loop is due to the
nonlinear relationship between the memristance current
and voltage. The parameters of the memristor are R_on =
100 Ω, R_off = 16 kΩ, R_init = 11 kΩ, D = 10 nm, μ_v =
10^{-10} cm^2 s^{-1} V^{-1}, p = 10, and V_in = sin(2πt). The
memristor exhibits the feature of pinched hysteresis, which
means that a lag occurs between the application and the
removal of a field and its subsequent effect, just like the
feature of neurons in the human brain


Fig. 33.28 Window function for different integer p


Furthermore, several different types of nonlinear memristor
models have been investigated [33.174, 175].
One of them is the window model in which the state
equation is multiplied by window function F_p(ω),
namely


$$ \frac{d\omega}{dt} = \mu_v \frac{R_{\mathrm{on}}}{D}\, i(t)\, F_p(\omega) , \qquad (33.27) $$


where p is an integer parameter and F_p(ω) is defined by


$$ F_p(\omega) = 1 - (2\omega - 1)^{2p} , \qquad (33.28) $$


which is shown in Fig. 33.28. 
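
As a rough illustration of how a window function shapes the memristor dynamics, the following Python sketch integrates a windowed state equation with the forward Euler method and records the resulting I-V trajectory. Several points are assumptions made for this sketch and are not taken from the text: the linear-drift memristance M = Ron·x + Roff·(1 − x) with a normalized state x = w/D (so that the drift coefficient becomes μv·Ron/D²), parameter values similar to those in the caption of Fig. 33.27 expressed in SI units, and the hypothetical function names `window` and `simulate_memristor`.

```python
import math

def window(x, p):
    """Window function F_p(x) = 1 - (2x - 1)**(2p) on the normalized state x in [0, 1]."""
    return 1.0 - (2.0 * x - 1.0) ** (2 * p)

def simulate_memristor(r_on=100.0, r_off=16e3, r_init=11e3, d=10e-9,
                       mu_v=1e-14, p=10, t_end=1.0, dt=1e-4):
    """Forward-Euler simulation driven by v(t) = sin(2*pi*t).

    Returns lists of time, voltage, current, and normalized state.
    All default values are illustrative assumptions (SI units).
    """
    # Initial state chosen so that the memristance equals r_init.
    x = (r_off - r_init) / (r_off - r_on)
    k = mu_v * r_on / d ** 2  # drift coefficient for the normalized state
    ts, vs, cs, xs = [], [], [], []
    for n in range(int(round(t_end / dt)) + 1):
        t = n * dt
        v = math.sin(2.0 * math.pi * t)
        m = r_on * x + r_off * (1.0 - x)  # memristance for the current state
        i = v / m
        ts.append(t)
        vs.append(v)
        cs.append(i)
        xs.append(x)
        x += dt * k * i * window(x, p)    # windowed state equation
        x = min(max(x, 0.0), 1.0)         # keep the state inside [0, 1]
    return ts, vs, cs, xs

ts, vs, cs, xs = simulate_memristor()
# The same voltage on the rising (t = 0.125) and falling (t = 0.375) branch
# yields different currents, which is the hysteresis; at v = 0 the current is 0.
print(vs[1250], cs[1250], cs[3750], cs[0])
```

Because the current is computed as v/M, the loop always passes through the origin, which is the pinch of the hysteresis loop; the window only slows the state change as x approaches 0 or 1.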


33.3.1 Memristor-Based Synapses 


The design of simple weighting circuits for synaptic multiplication
between arbitrary input signals and weights is extremely important in
artificial neural systems. Some efforts have been made to build
neuron-like analog neural networks [33.178-180]. However, this research
has had limited success so far because of the difficulty of
implementing the synapses efficiently. Based on the memristor, a novel
weighting circuit was proposed by Kim et al. [33.176, 181, 182], as
shown in Fig. 33.29. The memristors provide a bridge-like switching for
achieving either positive or negative weighting. Though several
memristors are employed to emulate a synapse, the total area of the
memristors is less than that of a single transistor.


Fig. 33.29 Memristor bridge circuit. The synaptic weight
is programmable by varying the input voltage. The weighting
of the input signal is also performed in this circuit
(after [33.176])


Fig. 33.30 Neuromorphic memristive computer equipped
with STDP (after [33.177])


To compensate for the spatial nonuniformity and nonideal response of
the memristor bridge synapse, a modified chip-in-the-loop learning
scheme suitable for the proposed neural network architecture is
investigated [33.176]. In the proposed method, the initial learning is
conducted in software, and the behavior of the software-trained network
is then learned by the hardware network by training each of its
single-layer neurons independently. The forward calculation of
single-layer neuron learning is implemented in circuit hardware and is
followed by a weight-updating phase assisted by a host computer. Unlike
conventional chip-in-the-loop learning, the need to read out the
synaptic weights for calculating weight updates in each epoch is
eliminated by virtue of the memristor bridge synapse and the proposed
learning scheme.
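
The positive-or-negative weighting provided by a bridge of memristors can be sketched with two voltage dividers. The short Python example below assumes a generic four-element bridge whose output is the difference between the midpoints of the dividers (m1, m2) and (m3, m4); the element names and the function names are hypothetical and chosen only for illustration, and the sketch treats the memristances as fixed resistances during weighting.

```python
def bridge_weight(m1, m2, m3, m4):
    """Effective weight v_out / v_in of a four-element bridge.

    The output is taken between the midpoints of the voltage dividers
    (m1, m2) and (m3, m4); all memristances are in ohms.
    """
    return m2 / (m1 + m2) - m4 / (m3 + m4)

def bridge_output(v_in, m1, m2, m3, m4):
    """Weighted output voltage for an input voltage v_in."""
    return bridge_weight(m1, m2, m3, m4) * v_in

print(bridge_weight(1e3, 1e3, 1e3, 1e3))  # balanced bridge: 0.0
print(bridge_weight(1e3, 3e3, 3e3, 1e3))  # positive weight: 0.5
print(bridge_weight(3e3, 1e3, 1e3, 3e3))  # negative weight: -0.5
```

Under this assumption the weight lies strictly between -1 and 1, and its sign is set by which divider has the larger midpoint voltage.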

Fig. 33.31 Memristor-based cellular
neural networks cell (after [33.183])


Fig. 33.32 Simple realization of MNN based on fuzzy concepts (after [33.184])


On the other hand, spike-timing-dependent plasticity (STDP), which is
a powerful learning paradigm for spiking neural systems because of its
massive parallelism, potential scalability, and inherent defect, fault,
and failure tolerance, can be implemented by using a crossbar
memristive array combined with neurons that asynchronously generate
spikes of a given shape [33.177, 185]. Such spikes need to be sent back
through the neurons to the input terminal, as in Fig. 33.30. The shape
of the spikes turns out to be very similar to the neural spikes
observed in biological neurons. The STDP learning function obtained by
combining such neurons with memristors matches that obtained from
neurophysiological experiments on real synapses. Such nanoscale
synapses can be combined with CMOS neurons, which makes it possible to
create neuromorphic hardware several orders of magnitude denser than
conventional CMOS. This method offers better control over power
dissipation; fewer constraints on the design of memristive materials
used for nanoscale synapses; greater freedom in learning algorithms
than the traditional design of synapses, since the synaptic learning
dynamics can be dynamically turned on or off; greater control over the
precise form and timing of the STDP equations; the ability to implement
a variety of other learning laws besides STDP; and better circuit
diversity, since the approach allows different learning laws to be
implemented in different areas of a single chip using the same
memristive material for all synapses. Furthermore, an analog CMOS
neuromorphic design utilizing STDP and memristor synapses is
investigated for use in building a multipurpose analog neuromorphic
chip [33.186]. In order to obtain a multipurpose chip, a suitable
architecture is established. Based on the IBM 90 nm CMOS9RF technology,
neurons are designed to interface with Verilog-A memristor synapse
models to perform the XOR operation and an edge detection function.
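
To make the notion of an STDP learning function concrete, the sketch below uses the generic pair-based exponential window that is common in the computational neuroscience literature. It is not the memristor-derived function of [33.177]; the amplitudes, time constants, conductance bounds, and function names are all assumptions chosen for illustration.

```python
import math

def stdp_delta_w(delta_t, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (ms).

    Pre-before-post spiking (delta_t > 0) potentiates the synapse,
    post-before-pre spiking (delta_t < 0) depresses it.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0

def update_conductance(g, delta_t, g_min=0.0, g_max=1.0):
    """Apply one STDP update and clip to the (assumed) device range."""
    return min(max(g + stdp_delta_w(delta_t), g_min), g_max)

for dt_spike in (-40.0, -10.0, 10.0, 40.0):
    print(dt_spike, stdp_delta_w(dt_spike))
```

The clipping step stands in for the bounded conductance range of a physical device; a real memristive synapse would typically show a state-dependent update rather than a hard limit.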

To make the neurons compatible with such new synapses, some novel
training methods have been proposed. For instance, Manem et al.
proposed a variation-tolerant training method to efficiently
reconfigure memristive synapses in a trainable threshold gate array
(TTGA) system [33.187]. The training process is inspired by the
gradient descent machine learning algorithm commonly used to train
artificial threshold neural networks known as perceptrons. The proposed
training method is robust to the unpredictability of CMOS and
nanocircuits with decreasing technology sizes, but it also introduces
its own randomness into the training.
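
The perceptron-style training mentioned above can be sketched as follows. This is the textbook perceptron rule applied to a single threshold gate, not the method of [33.187]; the random scaling of each update is only a crude, assumed stand-in for device variation, and all names and parameter values are hypothetical.

```python
import random

def gate_output(weights, inputs):
    """Threshold gate: fires (1) when the weighted sum plus bias is positive."""
    x = list(inputs) + [1.0]  # constant input for the bias weight
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def train_threshold_gate(samples, lr=0.1, variation=0.2,
                         max_epochs=500, seed=0):
    """Perceptron rule with a randomly perturbed step size per update."""
    rng = random.Random(seed)
    weights = [0.0] * (len(samples[0][0]) + 1)  # last entry is the bias
    for epoch in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            err = target - gate_output(weights, inputs)
            if err != 0:
                errors += 1
                # Each programming step lands within +/- variation of lr.
                step = lr * (1.0 + rng.uniform(-variation, variation))
                x = list(inputs) + [1.0]
                weights = [w + step * err * xi for w, xi in zip(weights, x)]
        if errors == 0:
            return weights, epoch + 1
    return weights, max_epochs

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epochs = train_threshold_gate(AND_SAMPLES)
print([gate_output(weights, x) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

Because the perturbed step size stays positive and bounded, the usual perceptron convergence argument still applies to linearly separable functions such as AND; independent per-weight variation would need a separate analysis.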


33.3.2 Memristor-Based Neural Networks 


Employing memristor-based synapses, several results have been obtained
on memristor-based neural networks [33.183, 184, 188]. As the template
weights in memristor-based neural networks (MNNs) are usually known and
need to be updated between each template in a sequence of templates,
there should be a way to change the weights rapidly. Meanwhile, the MNN
cells need to be modified, as the programmable couplings are
implemented by memristors, which require programming circuits that
isolate them from each other. Lehtonen and Laiho proposed a new cell of
memristor-based cellular neural network that can be used to program the
templates. For this purpose, a global voltage is input into the cell.
This voltage is used to convey the weight of one connection into the
cells [33.183]. The level of the virtual ground and the switches are
controlled so that the memristor connected to a particular neighbor is
biased above the programming threshold until it reaches the desired
resistance value.

Merrikh-Bayat et al. presented a new way to explain the relationships
between logical circuits and artificial neural networks, logical
circuits and fuzzy logic, and artificial neural networks and fuzzy
inference systems, and proposed a new neuro-fuzzy computing system,
which can effectively be implemented via the memristor-crossbar
structure [33.184]. A simple realization of MNNs is shown in
Figs. 33.32-33.34. Figure 33.32 shows that it is possible to interpret
the working procedure of a conventional artificial neural network (ANN)
without changing its structure. In this figure, each row of the
structure implements a simple fuzzy rule or minterm. Figure 33.33 shows
how the activation function of neurons can be implemented when the
activation function is modeled by a t-norm operator. Vector-matrix
multiplication is performed by the circuit in Fig. 33.34. This circuit
consists of a simple memristor crossbar in which each row is connected
to the virtually grounded terminal of an operational amplifier that
plays the role of a neuron with an identity activation function. The
advantages of the proposed system are twofold: first, its hardware can
be trained directly using the Hebbian learning rule, without the need
to perform any optimization; second, the system can deal with a huge
number of input-output training data without facing problems such as
overtraining.
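
The multiplication carried out by such a crossbar can be sketched numerically. The example below assumes ideal components: input voltages drive the columns, every row ends at the virtual ground of an inverting amplifier with feedback resistance `r_feedback`, and each memristor acts as a fixed conductance 1/M during the read. The function name and the numerical values are hypothetical.

```python
def crossbar_outputs(memristances, v_in, r_feedback):
    """Row outputs of an ideal memristor crossbar with inverting amplifiers.

    memristances[i][j] connects input column j to output row i (ohms).
    Row i sums the currents v_in[j] / memristances[i][j] at its virtual
    ground, and the amplifier converts the sum to -r_feedback * current.
    """
    outputs = []
    for row in memristances:
        current = sum(v / m for v, m in zip(v_in, row))
        outputs.append(-r_feedback * current)
    return outputs

M = [[1000.0, 2000.0],
     [500.0, 1000.0]]
print(crossbar_outputs(M, [1.0, 2.0], r_feedback=1000.0))
```

In this idealized picture the effective weight matrix is -r_feedback / M[i][j], so programming a memristance directly sets one matrix entry.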

Howard et al. proposed a spiking neuro-evolutionary system which
implements memristors as plastic connections [33.188]. These memristors
provide a learning architecture that may be beneficial to an
evolutionary design process that exploits parameter self-adaptation and
variable topologies, allowing the number of neurons, the connection
weights, and the interneural connectivity pattern to emerge. This
approach allows networks of appropriate complexity to evolve whilst
exploiting the memristive properties of the connections to reduce
learning time.

To investigate the dynamic behaviors of memristor-based neural
networks, Zeng et al. proposed the memristor-based recurrent neural
networks (MRNNs) [33.189, 190] shown in Fig. 33.35, where x_i(·) is the
state of the i-th subsystem, f_i(·) is the amplifier, M_{fij} is the
connection memristor between the amplifier f_i(·) and state x_j(·),
R_i and C_i are the resistor and capacitor, I_i is the external input,
a_i, b_i are the outputs, and i, j = 1, 2, ..., n. The parameters in
this neural network change according to the state of the system, so
this network is a state-dependent switching system. The dynamic
behavior of this neural network with time-varying delays was
investigated based on the Filippov theory and the Lyapunov method.


Fig. 33.33 Implementation of the activation function of neurons
(after [33.184])


Fig. 33.34 Memristor crossbar-based circuit (after [33.184])


Fig. 33.35 Circuit of a memristor-based recurrent network (after [33.189])


33.3.3 Conclusion


Memristor-based synapses and neural networks have been investigated by
many scientists for their possible applications in analog and digital
information processing, memory, and logic. However, the problem of how
to take advantage of the nonvolatile memory, nanoscale size, and
low-power dissipation of memristors to design a method that processes
and stores information requiring learning and memory in the synapses of
memristor-based neural networks, in the dynamical mapping space and by
a more rational space-partitioning method, is still an open issue.
Further investigation is needed to close this gap.


33.4 Neurodynamic Optimization 


Optimization is omnipresent in nature and society, and it is an
important tool for problem-solving in science, engineering, and
commerce. Optimization problems arise in a wide variety of applications
such as the design, planning, control, operation, and management of
engineering systems. In many applications (e.g., online pattern
recognition and on-chip signal processing in mobile devices), real-time
optimization is necessary or desirable. For such applications,
conventional optimization techniques may not be competent due to
stringent requirements on computational time. It is computationally
challenging when optimization procedures are to be performed in real
time to optimize the performance of dynamical systems.

The brain is a profound dynamic system and its 
neurons are always active from birth to death. When 
a decision is to be made in the brain, many of its neu- 
rons are highly activated to gather information, search 
memory, compare differences, and make inferences 
and decisions. Recurrent neural networks are brain-like 
nonlinear dynamic system models and can be prop- 
erly designed to imitate biological counterparts and 
serve as goal-seeking parallel computational models 
for solving optimization problems in a variety of settings.
Neurodynamic optimization can be physically realized in designated
hardware such as application-specific integrated circuits (ASICs),
where optimization is carried out in a parallel and distributed manner
and the convergence rate of the optimization process is independent of
the problem dimensionality. Because of
the inherent nature of parallel and distributed informa- 
tion processing, neurodynamic optimization can handle 
large-scale problems. In addition, neurodynamic opti- 
mization may be used for optimizing dynamic systems 
in multiple time scales with parameter-controlled con- 
vergence rates. These salient features are particularly 
desirable for dynamic optimization in decentralized 
decision-making scenarios [33.191—194]. While pop- 
ulation-based evolutionary approaches to optimization 
have emerged as prevailing heuristic and stochastic 
methods in recent years, neurodynamic optimization 
deserves great attention in its own right due to its close 
ties with optimization and dynamical systems theories, 
as well as its biological plausibility and circuit imple- 
mentability with very large scale integration (VLSI) or 
optical technologies. 


33.4.1 Neurodynamic Models 


The past three decades witnessed the birth and growth of neurodynamic
optimization. Although a couple of circuit-based optimization methods
were developed earlier [33.195-197], it was perhaps Hopfield and Tank
who spearheaded neurodynamic optimization research in the context of
neural computation with their seminal work in the mid 1980s
[33.198-200]. Since then, numerous neurodynamic optimization models in
various forms of recurrent neural networks have been developed and
analyzed; see [33.201-256] and the references therein. For example,
Tank and Hopfield extended the continuous-time Hopfield network for
linear programming and showed their experimental results with a circuit
of operational amplifiers and other discrete components on a breadboard
[33.200]. Kennedy and Chua developed a circuit-based recurrent neural
network for nonlinear programming [33.201]. It was proven that the
state of the neurodynamics is globally convergent to an equilibrium
corresponding to an approximate optimal solution of the given
optimization problem.

Over the years, neurodynamic optimization re- 
search has made significant progress with models with 
improved features for solving various optimization 
problems. Substantial improvements of neurodynamic 
optimization theory and models have been made in the 
following dimensions: 


i) Solution quality: designed based on smooth penalty
methods with a finite penalty parameter, the earliest
neurodynamic optimization models can converge to
approximate solutions only [33.200, 201]. Later on,
better models designed based on other design principles
can guarantee state or output convergence
to exact optimal solutions of solvable convex and
pseudoconvex optimization problems with or without
conditions [33.204, 205, 208, 210], etc.

ii) Solvability scope: the solvability scope of neuro- 
dynamic optimization has been expanded from lin- 
ear programming problems [33.200, 202, 208, 211, 
212, 214-219, 223, 242, 244, 251], to quadratic pro- 
gramming problems [33.202—206, 210, 214, 217, 
218, 220, 225, 226, 229, 233, 240-243, 247], to 
smooth convex programming problems with various 
constraints [33.201, 204, 205, 210, 214, 222, 224, 
228, 230, 232, 234, 237, 245, 246, 257], to nons- 
mooth convex optimization problems [33.235, 248, 
250-256], and recently to nonsmooth optimization 
with some nonconvex objective functions or con- 
straints [33.239, 249, 254-256]. 

iii) Convergence property: the convergence property of 
neurodynamic optimization models has been ex- 
tended from near-optimum convergence [33.200, 
201], to conditional exact-optimum global conver- 
gence [33.205, 208, 210], to guaranteed global con- 
vergence [33.204, 205, 214-216, 218, 219, 222, 
226-228, 230, 232, 234, 240, 243, 245, 247, 250, 
253, 256, 257], to faster global exponential con- 
vergence [33.206, 224, 225, 228, 233, 237, 239, 241, 
246, 254], to even more desirable finite-time con- 
vergence [33.235, 248, 249, 251,252,255], with in- 
creasing convergence rate. 

iv) Model complexity: the neurodynamic optimization
models for constrained optimization are essentially
multilayer due to the introduction of instrumental
variables for constraint handling (e.g., Lagrange
multipliers or dual variables). The architectures of
later neurodynamic optimization models for solving
linearly constrained optimization problems have
been reduced from multilayer structures to single-layer
ones with decreasing model complexity to
facilitate their implementation [33.243, 244, 251,
252, 254, 255].

Activation functions are a signature component 
of neural network models for quantifying the 
firing state activities of neurons. The activation 
functions in existing neurodynamic optimization 
models include smooth ones (e.g., sigmoid), as 
shown in Fig. 33.36a,b [33.200, 208-210], nons- 
mooth ones (e.g., piecewise-linear) as shown in 
Fig. 33.36c,d [33.203, 206], and even discontinuous 
ones as shown in Fig. 33.36e,f [33.243, 244, 251, 
252, 254, 255]. 


33.4.2 Design Methods 


Fig. 33.36a-f Three classes of activation functions in
neurodynamic optimization models: smooth in (a) and (b),
nonsmooth in (c) and (d), and discontinuous in (e) and (f)


The crux of neurodynamic optimization model design lies in the
derivation of a convergent neurodynamic equation that prescribes the
states of the neurodynamics. A properly derived neurodynamic equation
can ensure that the states of the neurodynamics reach an equilibrium
that satisfies the constraints and optimizes the objective function.
Although the existing neurodynamic optimization models are highly
diversified with many different features, the design methods or
principles for determining their neurodynamic equations can be
categorized as follows:


i) Penalty methods
ii) Lagrange methods
iii) Duality methods
iv) Optimality methods.


Penalty Methods 
Consider the general constrained optimization problem

\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & g(x) \le 0 , \\ & h(x) = 0 , \end{aligned} \]

where $x \in \mathbb{R}^n$ is the vector of decision variables, $f(x)$
is an objective function, $g(x) = [g_1(x), \dots, g_m(x)]^T$ is
a vector-valued function, and $h(x) = [h_1(x), \dots, h_p(x)]^T$ is
a vector-valued function.

A penalty method starts with the formulation of a smooth or nonsmooth
energy function based on a given objective function f(x) and
constraints g(x) and h(x). The energy function plays an important role
in neurodynamic optimization. Ideally, the minimum of a formulated
energy function corresponds to the optimal solution of the original
optimization problem. For constrained optimization, the minimum of the
energy function has to satisfy a set of constraints. Most early
approaches formulate an energy function by incorporating the objective
function and constraints through functional transformation and
numerical weighting [33.198-201]. Functional transformation is usually
used to convert constraints to a penalty function that penalizes the
violation of constraints; e.g., a smooth penalty function is as
follows

\[ p(x) = \frac{1}{2} \left( \sum_{i=1}^{m} \left( [g_i(x)]^+ \right)^2 + \sum_{j=1}^{p} h_j(x)^2 \right) , \]

where $[y]^+ = \max\{0, y\}$. Numerical weighting is often
used to balance constraint satisfaction and objective optimization,
e.g.,

\[ E(x) = f(x) + w\, p(x) , \]

where w is a positive weight.




In smooth penalty methods, neurodynamic equations are usually derived
as the negative gradient flow of the energy function in the form of
a differential equation

\[ \frac{dx(t)}{dt} \propto -\nabla E(x(t)) . \]

If the energy function is bounded below, the stability of the
neurodynamics can be ensured. Nevertheless, the major limitation is
that the neurodynamics designed using a smooth penalty method with any
fixed finite penalty parameter can converge to an approximate optimal
solution only, as a compromise between constraint satisfaction and
objective optimization. One way to remedy this approximation limitation
of smooth penalty design methods is to introduce a variable penalty
parameter. For example, a time-varying decaying penalty parameter
(called temperature) is used in deterministic annealing networks to
achieve exact optimality with a slow cooling schedule [33.208, 210].

If the objective function or penalty function is nons- 
mooth, the gradient has to be replaced by a generalized 
gradient and the neurodynamics can be modeled us- 
ing a differential inclusion [33.235, 248, 249, 251, 252, 
255]. Two advantages of nonsmooth penalty methods 
over smooth ones are possible constraint satisfaction 
and objective optimization with some finite penalty pa- 
rameters and finite-time convergence of the resulting 
neurodynamics. Needless to say, nonsmooth neurody- 
namics are much more difficult to analyze to guarantee 
their stability. 
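
The difference between smooth and nonsmooth penalties can be seen on a one-dimensional toy problem, minimize (x − 2)² subject to x ≤ 1, whose exact solution is x = 1. The Python sketch below is an illustration under stated assumptions rather than any of the cited models: the continuous-time flow is approximated by forward Euler steps, the nonsmooth case uses one element of the generalized gradient in place of a differential inclusion, and the problem, weights, and function names are hypothetical.

```python
def gradient_flow(grad, x0, dt, steps):
    """Forward-Euler approximation of dx/dt = -grad(x)."""
    x = x0
    for _ in range(steps):
        x -= dt * grad(x)
    return x

def smooth_penalty_solution(w, steps=20000):
    """Minimize E(x) = (x - 2)^2 + (w / 2) * max(0, x - 1)^2."""
    grad = lambda x: 2.0 * (x - 2.0) + w * max(0.0, x - 1.0)
    dt = 0.5 / (2.0 + w)  # step below the stability limit 2 / (2 + w)
    return gradient_flow(grad, 0.0, dt, steps)

def nonsmooth_penalty_solution(w=5.0, dt=1e-3, steps=20000):
    """Minimize E(x) = (x - 2)^2 + w * max(0, x - 1) with w > 2."""
    grad = lambda x: 2.0 * (x - 2.0) + (w if x > 1.0 else 0.0)
    return gradient_flow(grad, 0.0, dt, steps)

x_w10 = smooth_penalty_solution(10.0)
x_w1000 = smooth_penalty_solution(1000.0)
x_exact = nonsmooth_penalty_solution()
print(x_w10, x_w1000, x_exact)
```

For the smooth penalty the equilibrium is (4 + w)/(2 + w), which violates the constraint slightly for every finite w and approaches 1 only as w grows. With the nonsmooth penalty and any w > 2 the minimizer is exactly x = 1; the discretized trajectory reaches it in finite time and then chatters within a band of width proportional to the step size.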


Lagrange Methods 
A Lagrange method for designing a neurodynamic optimization model
begins with the formulation of a Lagrange function (Lagrangian) instead
of an energy function [33.204, 205]. A typical Lagrangian is defined
as

\[ L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \mu_j h_j(x) , \]

where $\lambda = (\lambda_1, \dots, \lambda_m)^T$ and
$\mu = (\mu_1, \dots, \mu_p)^T$ are Lagrange multipliers for inequality
constraints g(x) and equality constraints h(x), respectively.

According to the saddle-point theorem, the optimal solution can be
determined by minimizing the Lagrangian with respect to x and
maximizing it with respect to $\lambda$ and $\mu$. Therefore,
neurodynamic equations can be derived in an augmented space

\[ \begin{aligned}
\epsilon \frac{dx(t)}{dt} &= -\nabla_x L(x(t), \lambda(t), \mu(t)) , \\
\epsilon \frac{d\lambda(t)}{dt} &= \nabla_\lambda L(x(t), \lambda(t), \mu(t)) , \\
\epsilon \frac{d\mu(t)}{dt} &= \nabla_\mu L(x(t), \lambda(t), \mu(t)) ,
\end{aligned} \]

where $\epsilon$ is a positive time constant. The equilibria
of the Lagrangian neurodynamics satisfy the Lagrange
necessary optimality conditions.
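
A minimal sketch of such Lagrangian neurodynamics is given below for an assumed toy problem with a single equality constraint, minimize ½(x₁² + x₂²) subject to x₁ + x₂ = 1, so that no inequality multipliers are involved. The forward Euler discretization, the step size, and the function name are choices made for this illustration only.

```python
def lagrangian_flow(steps=6000, dt=0.01, eps=1.0):
    """Primal descent and dual ascent on
    L(x, mu) = 0.5 * (x1^2 + x2^2) + mu * (x1 + x2 - 1)."""
    x1, x2, mu = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Gradients are evaluated first so that all states update together.
        g_x1 = x1 + mu
        g_x2 = x2 + mu
        g_mu = x1 + x2 - 1.0
        x1 -= dt / eps * g_x1   # descend in the primal variables
        x2 -= dt / eps * g_x2
        mu += dt / eps * g_mu   # ascend in the multiplier
    return x1, x2, mu

print(lagrangian_flow())
```

The equilibrium satisfies x + μ = 0 and x₁ + x₂ = 1, that is x₁ = x₂ = 0.5 and μ = −0.5, which are the Lagrange conditions for this problem. The objective here is strongly convex, which is what makes the saddle-point flow converge; for merely convex or nonconvex objectives the same flow can oscillate.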


Duality Methods 
For convex optimization, the objective functions of primal and dual
problems reach the same value at their optima. In view of this duality
property, the duality methods for designing neurodynamic optimization
models begin with the formulation of an energy function consisting of
a duality gap between the primal and dual problems and
a constraint-based penalty function, e.g.,

\[ E(x, y) = \frac{1}{2} \left( f(x) - f_d(y) \right)^2 + p(x) + p_d(y) , \]

where y is a vector of dual variables, $f_d(y)$ is the dual objective
function to be maximized, and $p(x)$ and $p_d(y)$ are, respectively,
smooth penalty functions that penalize the violations of constraints of
the primal (original) and dual problems. The corresponding neurodynamic
equation can be derived with guaranteed global stability as the
negative gradient flow of the energy function, as in the aforementioned
smooth penalty methods [33.216, 218, 222, 226, 258, 259]. Neurodynamic
optimization models designed by using duality design methods can
guarantee global convergence to the exact optimal solutions of convex
optimization problems without any parametric condition.

In addition, using duality methods, dual networks and their
simplified/improved versions can be designed for quadratic programming
with reduced model complexity by mapping their globally convergent
optimal dual state variables to optimal primal solutions via linear or
piecewise-linear output functions [33.240, 247, 260-263].
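
As a worked instance of a duality-gap energy function, consider a linear program in standard form. The choice of problem class, the unit penalty weights, and the specific quadratic penalties below are assumptions made for illustration and are not taken from the cited models.

```latex
% Primal and dual linear programs (standard form, assumed for illustration)
\begin{aligned}
\text{primal:}\quad & \min_x\; c^T x \quad \text{s.t. } Ax = b,\; x \ge 0, \\
\text{dual:}\quad   & \max_y\; b^T y \quad \text{s.t. } A^T y \le c .
\end{aligned}

% Energy function: squared duality gap plus smooth constraint penalties,
% with p(x) = 1/2 ||Ax - b||^2 + 1/2 ||[-x]^+||^2 and
%      p_d(y) = 1/2 ||[A^T y - c]^+||^2
E(x, y) = \tfrac{1}{2}\,(c^T x - b^T y)^2
        + \tfrac{1}{2}\,\|Ax - b\|^2 + \tfrac{1}{2}\,\|[-x]^+\|^2
        + \tfrac{1}{2}\,\|[A^T y - c]^+\|^2 .

% Negative gradient flow of E
\frac{dx}{dt} = -\big( (c^T x - b^T y)\, c + A^T (Ax - b) - [-x]^+ \big), \qquad
\frac{dy}{dt} = -\big( -(c^T x - b^T y)\, b + A\,[A^T y - c]^+ \big) .
```

Here E(x, y) ≥ 0, and E(x, y) = 0 holds exactly when x is primal feasible, y is dual feasible, and the duality gap vanishes, which by strong duality means that both are optimal. Since every term is a convex function of (x, y), no penalty weight has to be driven to infinity to reach exact optimality.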


Optimality Methods 
The neurodynamic equations of some recent models are derived based on
optimality conditions (e.g., the Karush-Kuhn-Tucker condition) and
projection methods. Basically, the methods map the equilibria of the
designed neurodynamic optimization models to the equivalent equalities
given by optimality conditions and projection equations (i.e., all
equilibria essentially satisfy the optimality conditions) [33.225, 227,
228]. For several types of common geometric constraints (such as
nonnegativity constraints, bound constraints, and spherical
constraints), some projection operators map the neuron state variables
onto the convex feasible regions by using their activation functions
and avoid the use of excessive dual variables as in the dual networks,
and thus lower the model complexity. For neurodynamic optimization
models designed using optimality methods, explicit stability analysis
is needed to ensure that the resulting neurodynamics are stable.
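
A projection-based design can be sketched for a box-constrained quadratic program. The example below assumes the commonly used form dx/dt = −x + P(x − α∇f(x)), where P clips each component to the box and thus acts as a piecewise-linear activation function; the problem data, the gain α, the Euler step, and the function names are hypothetical.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n (a piecewise-linear activation)."""
    return [min(max(xi, lo), hi) for xi in x]

def projection_network(q, c, x0, alpha=0.25, dt=0.1, steps=2000):
    """Euler simulation of dx/dt = -x + P(x - alpha * (Q x + c))."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        grad = [sum(q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        target = project_box([x[i] - alpha * grad[i] for i in range(n)])
        x = [x[i] + dt * (target[i] - x[i]) for i in range(n)]
    return x

# minimize 0.5 * x^T Q x + c^T x  subject to  0 <= x <= 1
Q = [[2.0, 1.0], [1.0, 2.0]]
c = [-4.0, 1.0]
x_star = projection_network(Q, c, [0.5, 0.5])
print(x_star)
```

An equilibrium satisfies x = P(x − α∇f(x)), which is equivalent to the optimality conditions of the box-constrained problem. For these data the gradient at (1, 0) is (−2, 2), so the first component is held at its upper bound and the second at its lower bound, and no dual variables are needed.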

Once a neurodynamic equation has been derived 
and its stability is proven, the next step is to deter- 
mine the architecture of the neural network in terms of 
the neurons and connections based on the derived neu- 
rodynamic equation. The last step is usually devoted 
to simulation or emulation to test the performance of 
the neural network numerically or physically. The sim- 
ulation/emulation results may reveal additional prop- 
erties or characteristics for further analysis or model 
redesign. 


33.4.3 Selected Applications 


Over the last few decades, neurodynamic optimization has been widely
applied in many fields of science, engineering, and commerce, as
highlighted in the following nine selected areas.


Scientific Computing 
Neurodynamic optimization models have been developed
for solving linear equations and inequalities and
computing inverse or pseudoinverse matrices [33.240,
264-268].
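
As a simple illustration of how a linear system can be solved by neurodynamics, the sketch below follows the gradient flow dx/dt = −Aᵀ(Ax − b) of the least-squares energy ½‖Ax − b‖². This is a generic construction assumed for illustration and not one of the cited models; the example system, the Euler step, and the function name are hypothetical.

```python
def solve_linear_flow(a, b, dt=0.05, steps=2000):
    """Euler simulation of dx/dt = -A^T (A x - b), starting from x = 0."""
    m, n = len(a), len(a[0])
    x = [0.0] * n
    for _ in range(steps):
        residual = [sum(a[i][j] * x[j] for j in range(n)) - b[i]
                    for i in range(m)]
        grad = [sum(a[i][j] * residual[i] for i in range(m))
                for j in range(n)]
        x = [x[j] - dt * grad[j] for j in range(n)]
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
x_lin = solve_linear_flow(A, b)
print(x_lin)  # approaches the solution (0.8, 1.4)
```

The step size must stay below 2 divided by the largest eigenvalue of AᵀA for the discretized flow to remain stable; in an analog circuit realization this restriction would be replaced by the time constant of the circuit.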


Network Routing 
Neurodynamic optimization models have been devel- 
oped or applied for shortest-path routing in networks 
modeled by using weighted directed graphs [33.258, 
269-271]. 


Machine Learning 
Neurodynamic optimization has been applied to support
vector machine learning to take advantage of
its parallel computational power [33.272-274].


Data Processing 
The data processing applications of neurodynamic 
optimization include, but are not limited to, sort- 
ing [33.275—277], winners-take-all selection [33.240, 
277,278], data fusion [33.279], and data reconcilia- 
tion [33.254]. 


Signal/Image Processing 
The applications of neurodynamic optimization for sig- 
nal and image processing include, but are not limited 
to, recursive least-squares adaptive filtering, overcom- 
plete signal representations, time delay estimation, and 
image restoration and reconstruction [33.191, 203, 204, 
280-283]. 


Communication Systems 
The telecommunication applications of neurodynamic
optimization include beamforming [33.284, 285] and
simulations of DS-CDMA mobile communication systems
[33.229].


Control Systems 
Intelligent control applications of neurodynamic op- 
timization include pole assignment for synthesizing 
linear control systems [33.286—289] and model predic- 
tive control for linear/nonlinear systems [33.290—292]. 


Robotic Systems 
The applications of neurodynamic optimization in in- 
telligent robotic systems include real-time motion plan- 
ning and control of kinematically redundant robot ma- 
nipulators with torque minimization or obstacle avoid- 
ance [33.259-263, 267, 293-298] and grasping force 
optimization for multifingered robotic hands [33.299]. 


Financial Engineering 
Recently, neurodynamic optimization was also applied
to real-time portfolio selection based on an equivalent
probability measure to optimize the asset distribution in
financial investments [33.255, 300].


33.4.4 Concluding Remarks 


Neurodynamic optimization provides a parallel distributed
computational model for solving many optimization problems. For convex
and convex-like optimization, neurodynamic optimization models are
available with guaranteed optimality, expanded applicability, improved
convergence properties, and reduced model complexity. Neurodynamic
optimization approaches have been demonstrated to be effective and
efficient for many applications, especially those with real-time
solution requirements.

The existing results can still be further improved to expand their
solvability scope, increase their convergence rate, or reduce their
model complexity. With the view that neurodynamic approaches to global
optimization and discrete optimization are much more interesting and
challenging, it is necessary to develop neurodynamic models for
nonconvex optimization and combinatorial optimization. In addition,
neurodynamic optimization approaches could be more widely applied in
many other application areas in conjunction with conventional and
evolutionary optimization approaches.


References


33.1 R. Abraham: Dynamics: The Geometry of Behavior 
(Aerial, Santa Cruz 1982) 

33.2 J. Robinson: Attractor. In: Encyclopedia of Nonlin- 
ear Science, ed. by A. Scott (Routledge, New York 
2005) pp. 26-28 

33.3 S. Grossberg: Nonlinear difference-differential
equations in prediction and learning theory, Proc. 
Natl. Acad. Sci. 58, 1329-1334 (1967) 

33.4 S. Grossberg: Global ratio limit theorems for 
some nonlinear functional differential equations 
I, Bull. Am. Math. Soc. 74, 93-100 (1968)

33.5 H. Zhang, Z. Wang, D. Liu: Robust exponential sta- 
bility of recurrent neural networks with multiple 
time-varying delays, IEEE Trans. Circuits Syst. II:
Express Br. 54, 730-734 (2007) 

33.6 A.N. Michel, K. Wang, D. Liu, H. Ye: Qualitative 
limitations incurred in implementations of recur- 
rent neural networks, IEEE Cont. Syst. Mag. 15(3), 
52-65 (1995) 

33.7 H. Zhang, Z. Wang, D. Liu: Global asymptotic 
stability of recurrent neural networks with multi- 
ple time varying delays, IEEE Trans. Neural Netw. 
19(5), 855-873 (2008) 

33.8 S. Hu, D. Liu: On the global output convergence of 
a class of recurrent neural networks with time- 
varying inputs, Neural Netw. 18(2), 171-178 (2005) 

33.9 D. Liu, S. Hu, J. Wang: Global output conver- 
gence of a class of continuous-time recurrent 
neural networks with time-varying thresholds, 
IEEE Trans. Circuits Syst. II: Express Br. 51(4),
161-167 (2004)

33.10 H. Zhang, Z. Wang, D. Liu: Robust stability analy- 
sis for interval Cohen-Grossberg neural networks 
with unknown time varying delays, IEEE Trans. 
Neural Netw. 19(11), 1942-1955 (2008) 

33.11 M. Han, J. Fan, J. Wang: A dynamic feedforward 
neural network based on Gaussian particle swarm 
optimization and its application for predictive 
control, IEEE Trans. Neural Netw. 22(9), 1457-1468 
(2011) 

33.12 S. Mehraeen, S. Jagannathan, M.L. Crow: Decen- 
tralized dynamic surface control of large-scale 
interconnected systems in strict-feedback form 
using neural networks with asymptotic stabiliza- 
tion, IEEE Trans. Neural Netw. 22(11), 1709-1722 
(2011) 


33.13 Y. Zhang, T. Chai, H. Wang: A nonlinear control 
method based on anfis and multiple models for 
a class of SISO nonlinear systems and its appli- 
cation, IEEE Trans. Neural Netw. 22(11), 1783-1795 
(2011) 

33.14 Y. Chen, W.X. Zheng: Stability and L perfor- 
mance analysis of stochastic delayed neural net- 
works, IEEE Trans. Neural Netw. 22(10), 1662-1668 
(2011) 

33.15 M. Di Marco, M. Grazzini, L. Pancioni: Global 
robust stability criteria for interval delayed full- 
range cellular neural networks, IEEE Trans. Neural 
Netw. 22(4), 666-671 (2011) 

33.16 W.-H. Chen, W.X. Zheng: A new method for com- 
plete stability analysis of cellular neural networks 
with time delay, IEEE Trans. Neural Netw. 21(7), 
1126-1139 (2010) 

33.17 H. Zhang, Z. Wang, D. Liu: Global asymptotic 
stability and robust stability of a general class 
of Cohen-Grossberg neural networks with mixed 
delays, IEEE Trans. Circuits Syst. I: Regul. Pap. 
56(3), 616-629 (2009) 

33.18 X.X. Liao, J. Wang: Algebraic criteria for global 
exponential stability of cellular neural networks 
with multiple time delays, IEEE Trans. Circuits 
Syst. I 50, 268-275 (2003)

33.19 Z.G. Zeng, J. Wang, X.X. Liao: Global exponen- 
tial stability of a general class of recurrent neural 
networks with time-varying delays, IEEE Trans. 
Circuits Syst. I 50(10), 1353-1358 (2003)

33.20 D. Angeli: Multistability in systems with counter- 
clockwise input-output dynamics, IEEE Trans. Au- 
tom. Control 52(4), 596-609 (2007) 

33.21 D. Angeli: Systems with counterclockwise input- 
output dynamics, IEEE Trans. Autom. Control 51(7), 
1130-1143 (2006) 

33.22 D. Angeli: Convergence in networks with coun- 
terclockwise neural dynamics, IEEE Trans. Neural 
Netw. 20(5), 794-804 (2009) 

33.23 J. Saez-Rodriguez, A. Hammerle-Fickinger, 
O. Dalal, S. Klamt, E.D. Gilles, C. Conradi: Multi-
stability of signal transduction motifs, IET Syst.
Biol. 2(2), 80-93 (2008) 

33.24 L. Chandrasekaran, V. Matveev, A. Bose: Multista- 
bility of clustered states in a globally inhibitory 
network, Phys. D 238(3), 253-263 (2009) 




33.25 B.K. Goswami: Control of multistate hopping in-
termittency, Phys. Rev. E 78(6), 066208 (2008)

33.26 A. Rahman, M.K. Sanyal: The tunable bistable and
multistable memory effect in polymer nanowires,
Nanotechnology 19(39), 395203 (2008)

33.27 K.C. Tan, H.J. Tang, W.N. Zhang: Qualitative anal-
ysis for recurrent neural networks with linear
threshold transfer functions, IEEE Trans. Circuits
Syst. I: Regul. Pap. 52(5), 1003-1012 (2005)

33.28 H.J. Tang, K.C. Tan, E.J. Teoh: Dynamics analysis
and analog associative memory of networks with
LT neurons, IEEE Trans. Neural Netw. 17(2), 409-418
(2006)

33.29 L. Zou, H.J. Tang, K.C. Tan, W.N. Zhang: Nontriv-
ial global attractors in 2-D multistable attractor
neural networks, IEEE Trans. Neural Netw. 20(11),
1842-1851 (2009)

33.30 D. Liu, A.N. Michel: Sparsely interconnected neu-
ral networks for associative memories with ap-
plications to cellular neural networks, IEEE Trans.
Circuits Syst. II: Analog Digit. Signal Process. 41(4),
295-307 (1994)

33.31 M. Brucoli, L. Carnimeo, G. Grassi: Discrete-time
cellular neural networks for associative memo-
ries with learning and forgetting capabilities, IEEE
Trans. Circuits Syst. I: Fundam. Theory Appl. 42(7),
396-399 (1995)

33.32 R. Perfetti: Dual-mode space-varying self-
designing cellular neural networks for associative
memory, IEEE Trans. Circuits Syst. I: Fundam.
Theory Appl. 46(10), 1281-1285 (1999)

33.33 G. Grassi: On discrete-time cellular neural net-
works for associative memories, IEEE Trans. Cir-
cuits Syst. I: Fundam. Theory Appl. 48(1), 107-111
(2001)

33.34 L. Wang, X. Zou: Capacity of stable periodic so-
lutions in discrete-time bidirectional associative
memory neural networks, IEEE Trans. Circuits Syst.
II: Express Br. 51(6), 315-319 (2004)

33.35 J. Milton: Epilepsy: Multistability in a dynamic
disease. In: Self-Organized Biological Dynamics
and Nonlinear Control: Toward Understanding Com-
plexity, Chaos, and Emergent Function in Living
Systems, ed. by J. Walleczek (Cambridge Univ.
Press, Cambridge 2000) pp. 374-386

33.36 U. Feudel: Complex dynamics in multistable sys-
tems, Int. J. Bifurc. Chaos 18(6), 1607-1626 (2008)

33.37 J. Hizanidis, R. Aust, E. Scholl: Delay-induced
multistability near a global bifurcation, Int. J. Bi-
furc. Chaos 18(6), 1759-1765 (2008)

33.38 G.G. Wells, C.V. Brown: Multistable liquid crystal
waveplate, Appl. Phys. Lett. 91(22), 223506 (2007)

33.39 G. Deco, D. Marti: Deterministic analysis of
stochastic bifurcations in multi-stable neurody-
namical systems, Biol. Cybern. 96(5), 487-496
(2007)

33.40 J.D. Cao, G. Feng, Y.Y. Wang: Multistability
and multiperiodicity of delayed Cohen-Grossberg
neural networks with a general class of ac-
tivation functions, Phys. D 237(13), 1734-1749
(2008)

33.41 C.Y. Cheng, K.H. Lin, C.W. Shih: Multistability in
recurrent neural networks, SIAM J. Appl. Math.
66(4), 1301-1320 (2006)

33.42 Z. Yi, K.K. Tan: Multistability of discrete-time re-
current neural networks with unsaturating piece-
wise linear activation functions, IEEE Trans. Neural
Netw. 15(2), 329-336 (2004)

33.43 Z. Yi, K.K. Tan, T.H. Lee: Multistability analysis
for recurrent neural networks with unsaturating
piecewise linear transfer functions, Neural Com-
put. 15(3), 639-662 (2003)

33.44 Z.G. Zeng, T.W. Huang, W.X. Zheng: Multistability
of recurrent neural networks with time-varying
delays and the piecewise linear activation func-
tion, IEEE Trans. Neural Netw. 21(8), 1371-1377
(2010)

33.45 Z.G. Zeng, J. Wang, X.X. Liao: Stability analysis of
delayed cellular neural networks described us-
ing cloning templates, IEEE Trans. Circuits Syst. I:
Regul. Pap. 51(11), 2313-2324 (2004)

33.46 Z.G. Zeng, J. Wang: Multiperiodicity and expo-
nential attractivity evoked by periodic external
inputs in delayed cellular neural networks, Neu-
ral Comput. 18(4), 848-870 (2006)

33.47 L.L. Wang, W.L. Lu, T.P. Chen: Multistability and
new attraction basins of almost-periodic solu-
tions of delayed neural networks, IEEE Trans. Neu-
ral Netw. 20(10), 1581-1593 (2009)

33.48 G. Huang, J.D. Cao: Delay-dependent multista-
bility in recurrent neural networks, Neural Netw.
23(2), 201-209 (2010)

33.49 L.L. Wang, W.L. Lu, T.P. Chen: Coexistence and
local stability of multiple equilibria in neural net-
works with piecewise linear nondecreasing ac-
tivation functions, Neural Netw. 23(2), 189-200
(2010)

33.50 L. Zhang, Z. Yi, J.L. Yu, P.A. Heng: Some mul-
tistability properties of bidirectional associative
memory recurrent neural networks with unsat-
urating piecewise linear transfer functions, Neu-
rocomputing 72(16-18), 3809-3817 (2009)

33.51 X.B. Nie, J.D. Cao: Multistability of competitive
neural networks with time-varying and dis-
tributed delays, Nonlinear Anal.: Real World Appl.
10(2), 928-942 (2009)

33.52 C.Y. Cheng, K.H. Lin, C.W. Shih: Multistability and
convergence in delayed neural networks, Phys. D
225(1), 61-74 (2007)

33.53 T.J. Sejnowski, C. Koch, P.S. Churchland: Com-
putational neuroscience, Science 241(4871), 1299
(1988)

33.54 G. Edelman: Remembered Present: A Biological
Theory of Consciousness (Basic Books, New York
1989)

33.55 W.J. Freeman: Societies of Brains: A Study in the
Neuroscience of Love and Hate (Lawrence Erl-
baum, New York 1995)




33.56 R. Llinas, U. Ribary, D. Contreras, C. Pedroarena:
The neuronal basis for consciousness, Philos.
Trans. R. Soc. B 353(1377), 1841 (1998)

33.57 F. Crick, C. Koch: A framework for consciousness,
Nat. Neurosci. 6(2), 119-126 (2003)

33.58 A.L. Hodgkin, A.F. Huxley: A quantitative descrip-
tion of membrane current and its application to
conduction and excitation in nerve, J. Physiol.
117(4), 500 (1952)

33.59 A. Pikovsky, M. Rosenblum: Synchronization,
Scholarpedia 2(12), 1459 (2007)

33.60 D. Golomb, A. Shedmi, R. Curtu, G.B. Ermentrout:
Persistent synchronized bursting activity in cor-
tical tissues with low magnesium concentration:
A modeling study, J. Neurophysiol. 95(2), 1049-
1067 (2006)

33.61 M.L.V. Quyen, J. Foucher, J.-P. Lachaux, E. Ro-
driguez, A. Lutz, J. Martinerie, F.J. Varela: Compar-
ison of Hilbert transform and wavelet methods for
the analysis of neuronal synchrony, J. Neurosci.
Methods 111(2), 83-98 (2001)

33.62 W.J. Freeman, L.J. Rogers: Fine temporal reso-
lution of analytic phase reveals episodic syn-
chronization by state transitions in gamma EEGs,
J. Neurophysiol. 87(2), 937-945 (2002)

33.63 G.E.P. Box, G.M. Jenkins, G.C. Reinsel: Time Series
Analysis: Forecasting and Control, Ser. Probab.
Stat., Vol. 734 (Wiley, Hoboken 2008)

33.64 R.W. Thatcher, D.M. North, C.J. Biver: Develop-
ment of cortical connections as measured by EEG
coherence and phase delays, Hum. Brain Mapp.
29(12), 1400-1415 (2007)

33.65 A. Pikovsky, M. Rosenblum, J. Kurths: Synchro-
nization: A Universal Concept in Nonlinear Sci-
ences, Vol. 12 (Cambridge Univ. Press, Cambridge
2003)

33.66 J. Rodriguez, R. Kozma: Phase synchronization in
mesoscopic electroencephalogram arrays. In: In-
telligent Engineering Systems Through Artificial
Neural Networks Series, ed. by C. Dagli (ASME, New
York 2007) pp. 9-14

33.67 J.M. Barrie, W.J. Freeman, M.D. Lenhart: Spa-
tiotemporal analysis of prepyriform, visual,
auditory, and somesthetic surface EEGs in
trained rabbits, J. Neurophysiol. 76(1), 520-539
(1996)

33.68 G. Dumas, M. Chavez, J. Nadel, J. Martinerie:
Anatomical connectivity influences both intra-
and inter-brain synchronizations, PLoS ONE 7(5),
e36414 (2012)

33.69 J.A.S. Kelso: Dynamic Patterns: The Self-
Organization of Brain and Behavior (MIT Press,
Cambridge 1995)

33.70 S. Campbell, D. Wang: Synchronization and
desynchronization in a network of locally cou-
pled Wilson-Cowan oscillators, IEEE Trans. Neural
Netw. 7(3), 541-554 (1996)

33.71 H. Kurokawa, C.Y. Ho: A learning rule of the oscil-
latory neural networks for in-phase oscillation,
IEICE Trans. Fundam. Electron. Commun. Comput.
Sci. 80(9), 1585-1594 (1997)

33.72 G. Buzsaki: Rhythms of the Brain (Oxford Univ.
Press, New York 2009)

33.73 A.K. Engel, P. Fries, W. Singer: Dynamic predic-
tions: Oscillations and synchrony in top-down
processing, Nat. Rev. Neurosci. 2(10), 704-716
(2001)

33.74 W.J. Freeman, R.Q. Quiroga: Imaging Brain Func-
tion with EEG: Advanced Temporal and Spa-
tial Analysis of Electroencephalographic Signals
(Springer, New York 2013)

33.75 H. Haken: Cooperative phenomena in systems far
from thermal equilibrium and in nonphysical sys-
tems, Rev. Mod. Phys. 47(1), 67 (1975)

33.76 S.H. Strogatz: Exploring complex networks, Nature
410(6825), 268-276 (2001)

33.77 O. Sporns, D.R. Chialvo, M. Kaiser, C.C. Hilge-
tag: Organization, development and function of
complex brain networks, Trends Cogn. Sci. 8(9),
418-425 (2004)

33.78 B. Bollobas, R. Kozma, D. Miklos (Eds.): Hand-
book of Large-Scale Random Networks, Bolyai
Soc. Math. Stud., Vol. 18 (Springer, Berlin, Heidel-
berg 2009)

33.79 Y. Kuramoto: Cooperative dynamics of oscillator
community, Prog. Theor. Phys. Suppl. 79, 223-240
(1984)

33.80 M.G. Rosenblum, A.S. Pikovsky: Controlling syn-
chronization in an ensemble of globally coupled
oscillators, Phys. Rev. Lett. 92(11), 114102 (2004)

33.81 O.V. Popovych, P.A. Tass: Synchronization con-
trol of interacting oscillatory ensembles by mixed
nonlinear delayed feedback, Phys. Rev. E 82(2),
026204 (2010)

33.82 W.J. Freeman: The physiology of perception, Sci.
Am. 264, 78-85 (1991)

33.83 M.A. Cohen, S. Grossberg: Absolute stability of
global pattern formation and parallel memory
storage by competitive neural networks, IEEE
Trans. Syst. Man Cybern. 13(5), 815-826 (1983)

33.84 J.J. Hopfield, D.W. Tank: Computing with neu-
ral circuits - A model, Science 233(4764), 625-633
(1986)

33.85 C.M. Marcus, R.M. Westervelt: Dynamics of
iterated-map neural networks, Phys. Rev. A 40(1),
501 (1989)

33.86 W. Yu, J. Cao, J. Wang: An LMI approach to
global asymptotic stability of the delayed Cohen-
Grossberg neural network via nonsmooth analy-
sis, Neural Netw. 20(7), 810-818 (2007)

33.87 F.C. Hoppensteadt, E.M. Izhikevich: Weakly Con-
nected Neural Networks, Applied Mathematical
Sciences, Vol. 126 (Springer, New York 1997)

33.88 H.R. Wilson, J.D. Cowan: Excitatory and inhibitory
interactions in localized populations of model
neurons, Biophys. J. 12(1), 1-24 (1972)

33.89 H.R. Wilson, J.D. Cowan: A mathematical theory
of the functional dynamics of cortical and tha-
lamic nervous tissue, Biol. Cybern. 13(2), 55-80
(1973)

33.90 P.C. Bressloff: Spatiotemporal dynamics of contin-
uum neural fields, J. Phys. A: Math. Theor. 45(3),
033001 (2011)

33.91 D. Wang: Object selection based on oscillatory
correlation, Neural Netw. 12(4), 579-592 (1999)

33.92 A. Renart, R. Moreno-Bote, X.-J. Wang, N. Parga:
Mean-driven and fluctuation-driven persistent
activity in recurrent networks, Neural Comput.
19(1), 1-46 (2007)

33.93 M. Ursino, E. Magosso, C. Cuppini: Recognition
of abstract objects via neural oscillators: interac-
tion among topological organization, associative
memory and gamma band synchronization, IEEE
Trans. Neural Netw. 20(2), 316-335 (2009)

33.94 W.J. Freeman: Mass Action in the Nervous System
(Academic, New York 1975)

33.95 D. Xu, J. Principe: Dynamical analysis of neu-
ral oscillators in an olfactory cortex model, IEEE
Trans. Neural Netw. 15(5), 1053-1062 (2004)

33.96 R. Ilin, R. Kozma: Stability of coupled excitatory-
inhibitory neural populations and application to
control of multi-stable systems, Phys. Lett. A
360(1), 66-83 (2006)

33.97 R. Ilin, R. Kozma: Control of multi-stable chaotic
neural networks using input constraints, Int. Jt.
Conf. Neural Netw. (IJCNN 2007), Orlando
(2007) pp. 2194-2199

33.98 G. Deco, V.K. Jirsa, P.A. Robinson, M. Breaks-
pear, K. Friston: The dynamic brain: From spiking
neurons to neural masses and cortical fields, PLoS
Comput. Biol. 4(8), e1000092 (2008)

33.99 L. Ingber: Generic mesoscopic neural networks
based on statistical mechanics of neocorti-
cal interactions, Phys. Rev. A 45(4), 2183-2186
(1992)

33.100 V.K. Jirsa, K.J. Jantzen, A. Fuchs, J.A. Scott Kelso:
Spatiotemporal forward solution of the EEG and
MEG using network modeling, IEEE Trans. Med.
Imaging 21(5), 493-504 (2002)

33.101 S. Coombes, C. Laing: Delays in activity-based
neural networks, Philos. Trans. R. Soc. A 367(1891),
1117-1129 (2009)

33.102 V.K. Jirsa: Neural field dynamics with local and
global connectivity and time delay, Philos. Trans.
R. Soc. A 367(1891), 1131-1143 (2009)

33.103 K. Kaneko: Clustering, coding, switching, hier-
archical ordering, and control in a network of
chaotic elements, Phys. D 41(2), 137-172 (1990)

33.104 R. Kozma: Intermediate-range coupling gener-
ates low-dimensional attractors deeply in the
chaotic region of one-dimensional lattices, Phys.
Lett. A 244(1), 85-91 (1998)

33.105 S. Ishii, M.-A. Sato: Associative memory based on
parametrically coupled chaotic elements, Phys. D
121(3), 344-366 (1998)

33.106 F. Moss, A. Bulsara, M.F. Schlesinger (Eds.):
The proceedings of the NATO Advanced Research
Workshop: Stochastic Resonance in Physics and
Biology (Plenum Press, New York 1993)

33.107 S.N. Dorogovtsev, A.V. Goltsev, J.F.F. Mendes: Crit-
ical phenomena in complex networks, Rev. Mod.
Phys. 80(4), 1275 (2008)

33.108 M.D. McDonnell, L.M. Ward: The benefits of noise
in neural systems: bridging theory and experi-
ment, Nat. Rev. Neurosci. 12(7), 415-426 (2011)

33.109 A.V. Goltsev, M.A. Lopes, K.-E. Lee, J.F.F. Mendes:
Critical and resonance phenomena in neural net-
works, arXiv preprint arXiv:1211.5686 (2012)

33.110 P. Bak: How Nature Works: The Science of Self-
Organized Criticality (Copernicus, New York 1996)

33.111 J.M. Beggs, D. Plenz: Neuronal avalanches in
neocortical circuits, J. Neurosci. 23(35), 11167-11177
(2003)

33.112 J.M. Beggs: The criticality hypothesis: How lo-
cal cortical networks might optimize information
processing, Philos. Trans. R. Soc. A 366(1864),
329-343 (2008)

33.113 T. Petermann, T.C. Thiagarajan, M.A. Lebedev,
M.A.L. Nicolelis, D.R. Chialvo, D. Plenz: Sponta-
neous cortical activity in awake monkeys com-
posed of neuronal avalanches, Proc. Natl. Acad.
Sci. 106(37), 15921-15926 (2009)

33.114 M. Puljic, R. Kozma: Narrow-band oscillations in
probabilistic cellular automata, Phys. Rev. E 78(2),
026214 (2008)

33.115 R. Kozma, M. Puljic, W.J. Freeman: Thermody-
namic model of criticality in the cortex based on
EEG/ECoG data. In: Criticality in Neural Systems,
ed. by D. Plenz, E. Niebur (Wiley, Hoboken 2014)
pp. 153-176

33.116 J.-P. Eckmann, D. Ruelle: Ergodic theory of chaos
and strange attractors, Rev. Mod. Phys. 57(3), 617
(1985)

33.117 E. Ott, C. Grebogi, J.A. Yorke: Controlling chaos,
Phys. Rev. Lett. 64(11), 1196-1199 (1990)

33.118 E.N. Lorenz: Deterministic nonperiodic flow, J. At-
mos. Sci. 20(2), 130-141 (1963)

33.119 K. Aihara, T. Takabe, M. Toyoda: Chaotic neural
networks, Phys. Lett. A 144(6), 333-340 (1990)

33.120 K. Aihara, H. Suzuki: Theory of hybrid dynam-
ical systems and its applications to biological
and medical systems, Philos. Trans. R. Soc. A
368(1930), 4893-4914 (2010)

33.121 G. Matsumoto, K. Aihara, Y. Hanyu, N. Takahashi,
S. Yoshizawa, J.-I. Nagumo: Chaos and phase
locking in normal squid axons, Phys. Lett. A 123(4),
162-166 (1987)

33.122 K. Aihara: Chaos engineering and its appli-
cation to parallel distributed processing with
chaotic neural networks, Proc. IEEE 90(5), 919-
930 (2002)

33.123 L. Wang, S. Li, F. Tian, X. Fu: A noisy chaotic
neural network for solving combinatorial opti-
mization problems: Stochastic chaotic simulated
annealing, IEEE Trans. Syst. Man Cybern. B 34(5),
2119-2125 (2004)




33.124 Z. Zeng, J. Wang: Improved conditions for global
exponential stability of recurrent neural net-
works with time-varying delays, IEEE Trans. Neu-
ral Netw. 17(3), 623-635 (2006)

33.125 M. Di Marco, M. Grazzini, L. Pancioni: Global ro-
bust stability criteria for interval delayed full-
range cellular neural networks, IEEE Trans. Neural
Netw. 22(4), 666-671 (2011)

33.126 C.A. Skarda, W.J. Freeman: How brains make
chaos in order to make sense of the world, Be-
hav. Brain Sci. 10(2), 161-195 (1987)

33.127 H.D.I. Abarbanel, M.I. Rabinovich, A. Selver-
ston, M.V. Bazhenov, R. Huerta, M.M. Sushchik,
L.L. Rubchinskii: Synchronisation in neural net-
works, Phys.-Usp. 39(4), 337-362 (1996)

33.128 H. Korn, P. Faure: Is there chaos in the brain? II.
Experimental evidence and related models, C. R.
Biol. 326(9), 787-840 (2003)

33.129 R. Kozma, W.J. Freeman: Intermittent spatio-
temporal desynchronization and sequenced syn-
chrony in ECoG signals, Chaos Interdiscip. J. Non-
linear Sci. 18(3), 037131 (2008)

33.130 K. Kaneko: Collapse of Tori and Genesis of Chaos
in Dissipative Systems (World Scientific Publ., Sin-
gapore 1986)

33.131 K. Aihara: Chaos in neural networks. In: The Im-
pact of Chaos on Science and Society, ed. by
C. Grebogi, J.A. Yorke (United Nations Publ., New
York 1997) pp. 110-126

33.132 I. Tsuda: Toward an interpretation of dynamic
neural activity in terms of chaotic dynamical sys-
tems, Behav. Brain Sci. 24(5), 793-809 (2001)

33.133 H. Bersini, P. Sener: The connections between
the frustrated chaos and the intermittency chaos
in small Hopfield networks, Neural Netw. 15(10),
1197-1204 (2002)

33.134 P. Berge, Y. Pomeau, C. Vidal: Order in Chaos (Her-
man, Paris and Wiley, New York 1984)

33.135 Y. Pomeau, P. Manneville: Intermittent transition
to turbulence in dissipative dynamical systems,
Commun. Math. Phys. 74(2), 189-197 (1980)

33.136 T. Higuchi: Relationship between the fractal di-
mension and the power law index for a time
series: A numerical investigation, Phys. D 46(2),
254-264 (1990)

33.137 B. Mandelbrot: Fractals and Chaos: The Mandel-
brot Set and Beyond, Vol. 3 (Springer, New York
2004)

33.138 K. Falconer: Fractal Geometry: Mathematical
Foundations and Applications (Wiley, Hoboken
2003)

33.139 T. Sauer, J.A. Yorke, M. Casdagli: Embedology,
J. Stat. Phys. 65(3), 579-616 (1991)

33.140 A. Wolf, J.B. Swift, H.L. Swinney, J.A. Vastano: De-
termining Lyapunov exponents from a time series,
Phys. D 16(3), 285-317 (1985)

33.141 L.D. Iasemidis, J.C. Sackellares, H.P. Zaveri,
W.J. Williams: Phase space topography and
the Lyapunov exponent of electrocorticograms
in partial seizures, Brain Topogr. 2(3), 187-201
(1990)

33.142 S. Micheloyannis, N. Flitzanis, E. Papanikolaou,
M. Bourkas, D. Terzakis, S. Arvanitis, C.J. Stam:
Usefulness of non-linear EEG analysis, Acta Neu-
rol. Scand. 97(1), 13-19 (2009)

33.143 W.J. Freeman: A field-theoretic approach to
understanding scale-free neocortical dynamics,
Biol. Cybern. 92(6), 350-359 (2005)

33.144 W.J. Freeman, H. Erwin: Freeman K-set, Scholar-
pedia 3(2), 3238 (2008)

33.145 R. Kozma, W. Freeman: Basic principles of the
KIV model and its application to the navi-
gation problem, Integr. Neurosci. 2(1), 125-145
(2003)

33.146 R. Kozma, W.J. Freeman: The KIV model of in-
tentional dynamics and decision making, Neural
Netw. 22(3), 277-285 (2009)

33.147 H.-J. Chang, W.J. Freeman, B.C. Burke: Biolog-
ically modeled noise stabilizing neurodynamics
for pattern recognition, Int. J. Bifurc. Chaos 8(2),
321-345 (1998)

33.148 R. Kozma, W.J. Freeman: Chaotic resonance -
methods and applications for robust classifica-
tion of noisy and variable patterns, Int. J. Bifurc.
Chaos 11(6), 1607-1629 (2001)

33.149 R.J. McEliece, E.C. Posner, E. Rodemich,
S. Venkatesh: The capacity of the Hopfield
associative memory, IEEE Trans. Inf. Theory 33(4),
461-482 (1987)

33.150 J. Ma: The asymptotic memory capacity of the
generalized Hopfield network, Neural Netw. 12(9),
1207-1212 (1999)

33.151 V. Gripon, C. Berrou: Sparse neural networks with
large learning diversity, IEEE Trans. Neural Netw.
22(7), 1087-1096 (2011)

33.152 I. Beliaev, R. Kozma: Studies on the memory ca-
pacity and robustness of chaotic dynamic neural
networks, Int. Jt. Conf. Neural Netw., IEEE (2006)
pp. 3991-3998

33.153 D.A. Leopold, N.K. Logothetis: Multistable phe-
nomena: Changing views in perception, Trends
Cogn. Sci. 3(7), 254-264 (1999)

33.154 E.D. Lumer, K.J. Friston, G. Rees: Neural correlates
of perceptual rivalry in the human brain, Science
280(5371), 1930-1934 (1998)

33.155 G. Werner: Metastability, criticality and phase
transitions in brain and its models, Biosystems
90(2), 496-508 (2007)

33.156 W.J. Freeman: Understanding perception through
neural codes, IEEE Trans. Biomed. Eng. 58(7),
1884-1890 (2011)

33.157 R. Kozma, J.J. Davis, W.J. Freeman: Synchronized
minima in ECoG power at frequencies between
beta-gamma oscillations disclose cortical singu-
larities in cognition, J. Neurosci. Neuroeng. 1(1),
13-23 (2012)

33.158 R. Kozma: Neuropercolation, Scholarpedia 2(8),
1360 (2007)




33.159 E. Bullmore, O. Sporns: Complex brain networks:
Graph theoretical analysis of structural and func-
tional systems, Nat. Rev. Neurosci. 10(3), 186-198
(2009)

33.160 R. Kozma, M. Puljic, P. Balister, B. Bollobas,
W. Freeman: Neuropercolation: A random cellular
automata approach to spatio-temporal neurody-
namics, Lect. Notes Comput. Sci. 3305, 435-443
(2004)

33.161 P. Balister, B. Bollobas, R. Kozma: Large deviations
for mean field models of probabilistic cellular au-
tomata, Random Struct. Algorithm. 29(3), 399-415
(2006)

33.162 R. Kozma, M. Puljic, L. Perlovsky: Modeling
goal-oriented decision making through cognitive
phase transitions, New Math. Nat. Comput. 5(1),
143-157 (2009)

33.163 M. Puljic, R. Kozma: Broad-band oscillations by
probabilistic cellular automata, J. Cell. Autom.
5(6), 491-507 (2010)

33.164 S. Jo, T. Chang, I. Ebong, B. Bhadviya,
P. Mazumder, W. Lu: Nanoscale memristor
device as synapse in neuromorphic systems,
Nano Lett. 10, 1297-1301 (2010)

33.165 L. Smith: Handbook of Nature-Inspired and In-
novative Computing: Integrating Classical Models
with Emerging Technologies (Springer, New York
2006) pp. 433-475

33.166 G. Indiveri, E. Chicca, R. Douglas: A VLSI ar-
ray of low-power spiking neurons and bistable
synapses with spike-timing dependent plasticity,
IEEE Trans. Neural Netw. 17, 211-221 (2006)

33.167 Editors of Scientific American: The Scientific Amer-
ican Book of the Brain (Scientific American, New
York 1999)

33.168 L. Chua, L. Yang: Cellular neural networks: Theory,
IEEE Trans. Circuits Syst. CAS-35, 1257-1272 (1988)

33.169 C. Zheng, H. Zhang, Z. Wang: Improved robust sta-
bility criteria for delayed cellular neural networks
via the LMI approach, IEEE Trans. Circuits Syst. II:
Express Br. 57, 41-45 (2010)

33.170 L. Chua: Memristor - The missing circuit element,
IEEE Trans. Circuits Theory CT-18, 507-519 (1971)

33.171 D. Strukov, G. Snider, D. Stewart, R. Williams:
The missing memristor found, Nature 453, 80-83
(2008)

33.172 Q. Xia, W. Robinett, M. Cumbie, N. Banerjee,
T. Cardinali, J. Yang, W. Wu, X. Li, W. Tong,
D. Strukov, G. Snider, G. Medeiros-Ribeiro,
R. Williams: Memristor - CMOS hybrid integrated
circuits for reconfigurable logic, Nano Lett. 9,
3640-3645 (2009)

33.173 X. Wang, Y. Chen, H. Xi, H. Li, D. Dimitrov: Spin-
tronic memristor through spin-torque-induced
magnetization motion, IEEE Electron Device Lett.
30, 294-297 (2009)

33.174 Y. Joglekar, S. Wolf: The elusive memristor: Prop-
erties of basic electrical circuits, Eur. J. Phys. 30,
661-675 (2009)


33.175 M. Pickett, D. Strukov, J. Borghetti, J. Yang,
G. Snider, D. Stewart, R. Williams: Switching dy-
namics in titanium dioxide memristive devices,
J. Appl. Phys. 106(6), 074508 (2009)

33.176 S. Adhikari, C. Yang, H. Kim, L. Chua: Memris-
tor bridge synapse-based neural network and
its learning, IEEE Trans. Neural Netw. Learn. Syst.
23(9), 1426-1435 (2012)

33.177 B. Linares-Barranco, T. Serrano-Gotarredona: Ex-
ploiting memristive in adaptive spiking neuro-
morphic nanotechnology systems, 9th IEEE Conf.
Nanotechnol., Genoa (2009) pp. 601-604

33.178 M. Holler, S. Tam, H. Castro, R. Benson: An
electrically trainable artificial neural network
(ETANN) with 10240 floating gate synapses, Int.
Jt. Conf. Neural Netw., Washington (1989) pp. 191-
196

33.179 H. Withagen: Implementing backpropagation
with analog hardware, Proc. IEEE ICNN-94, Or-
lando (1994) pp. 2015-2017

33.180 S. Lindsey, T. Lindblad: Survey of neural net-
work hardware invited paper, Proc. SPIE Appl. Sci.
Artif. Neural Netw. Conf., Orlando (1995) pp. 1194-
1205

33.181 H. Kim, M. Pd Sah, C. Yang, T. Roska, L. Chua:
Neural synaptic weighting with a pulse-based
memristor circuit, IEEE Trans. Circuits Syst. I 59(1),
148-158 (2012)

33.182 H. Kim, M. Pd Sah, C. Yang, T. Roska, L. Chua:
Memristor bridge synapses, Proc. IEEE 100(6),
2061-2070 (2012)

33.183 E. Lehtonen, M. Laiho: CNN using memristors
for neighborhood connections, 12th Int. Work-
shop Cell. Nanoscale Netw. Appl. (CNNA), Berkeley
(2010)

33.184 F. Merrikh-Bayat, F. Merrikh-Bayat, S. Shouraki:
The neuro-fuzzy computing system with the ca-
pacity of implementation on a memristor crossbar
and optimization-free hardware training, IEEE
Trans. Fuzzy Syst. 22(5), 1272-1287 (2014)

33.185 G. Snider: Spike-timing-dependent learning
in memristive nanodevices, IEEE Int. Symp.
Nanoscale Archit., Anaheim (2008) pp. 85-92

33.186 I. Ebong, D. Deshpande, Y. Yilmaz, P. Mazumder:
Multi-purpose neuro-architecture with memris-
tors, 11th IEEE Conf. Nanotechnol., Portland, Ore-
gon (2011) pp. 1194-1205

33.187 H. Manem, J. Rajendran, G. Rose: Stochastic gra-
dient descent inspired training technique for
a CMOS/Nano memristive trainable threshold gate
array, IEEE Trans. Circuits Syst. I 59(5), 1051-1060
(2012)

33.188 G. Howard, E. Gale, L. Bull, B. Costello,
A. Adamatzky: Evolution of plastic learning
in spiking networks via memristive connec-
tions, IEEE Trans. Evol. Comput. 16(5), 711-729
(2012)

33.189 S. Wen, Z. Zeng, T. Huang: Exponential stabil-
ity analysis of memristor-based recurrent neural
networks with time-varying delays, Neurocom-
puting 97(15), 233-240 (2012)

33.190 A. Wu, Z. Zeng: Dynamics behaviors of memristor-
based recurrent neural networks with time-
varying delays, Neural Netw. 36, 1-10 (2012)

33.191 A. Cichocki, R. Unbehauen: Neural Networks for
Optimization and Signal Processing (Wiley, New
York 1993)

33.192 J. Wang: Recurrent neural networks for optimiza-
tion. In: Fuzzy Logic and Neural Network Hand-
book, ed. by C.H. Chen (McGraw-Hill, New York
1996), pp. 4.1-4.35

33.193 Y. Xia, J. Wang: Recurrent neural networks for
optimization: The state of the art. In: Recurrent
Neural Networks: Design and Applications, ed. by
L.R. Medsker, L.C. Jain (CRC, Boca Raton 1999), 13-
45

33.194 Q. Liu, J. Wang: Recurrent neural networks with
discontinuous activation functions for convex op-
timization. In: Integration of Swarm Intelligence
and Artificial Neural Network, ed. by S. Dehuri,
S. Ghosh, S.B. Cho (World Scientific, Singapore
2011), 95-119

33.195 I.B. Pyne: Linear programming on an electronic
analogue computer, Trans. Am. Inst. Elect. Eng.
75(2), 139-143 (1956)

33.196 L.O. Chua, G. Lin: Nonlinear programming without
computation, IEEE Trans. Circuits Syst. 31(2), 182-
189 (1984)

33.197 G. Wilson: Quadratic programming analogs, IEEE
Trans. Circuits Syst. 33(9), 907-911 (1986)

33.198 J.J. Hopfield, D.W. Tank: Neural computation of
decisions in optimization problems, Biol. Cybern.
52(3), 141-152 (1985)

33.199 J.J. Hopfield, D.W. Tank: Computing with neu-
ral circuits - a model, Science 233(4764), 625-633
(1986)

33.200 D.W. Tank, J.J. Hopfield: Simple neural optimiza-
tion networks: an A/D converter, signal decision
circuit, and a linear programming circuit, IEEE
Trans. Circuits Syst. 33(5), 533-541 (1986)

33.201 M.P. Kennedy, L.O. Chua: Neural networks for
nonlinear programming, IEEE Trans. Circuits Syst.
35(5), 554-562 (1988)

33.202 A. Rodriguez-Vazquez, R. Dominguez-Castro,
A. Rueda, J.L. Huertas, E. Sanchez-Sinencio:
Nonlinear switch-capacitor neural networks for
optimization problems, IEEE Trans. Circuits Syst.
37(3), 384-398 (1990)

33.203 S. Sudharsanan, M. Sundareshan: Exponential
stability and a systematic synthesis of a neural
network for quadratic minimization, Neural Netw.
4, 599-613 (1991)

33.204 S. Zhang, A.G. Constantinides: Lagrange program-
ming neural network, IEEE Trans. Circuits Syst.
39(7), 441-452 (1992)

33.205 S. Zhang, X. Zhu, L. Zou: Second-order neural nets
for constrained optimization, IEEE Trans. Neural
Netw. 3(6), 1021-1024 (1992)


33.206 A. Bouzerdoum, T.R. Pattison: Neural network
for quadratic optimization with bound con-
straints, IEEE Trans. Neural Netw. 4(2), 293-304
(1993)

33.207 M. Ohlsson, C. Peterson, B. Soderberg: Neural net-
works for optimization problems with inequality
constraints: The knapsack problem, Neural Com-
put. 5, 331-339 (1993)

33.208 J. Wang: Analysis and design of a recurrent neural
network for linear programming, IEEE Trans. Cir-
cuits Syst. I 40(9), 613-618 (1993)

33.209 W.E. Lillo, M.H. Loh, S. Hui, S.H. Zak: On solving
constrained optimization problems with neural
networks: A penalty method approach, IEEE Trans.
Neural Netw. 4(6), 931-940 (1993)

33.210 J. Wang: A deterministic annealing neural net-
work for convex programming, Neural Netw. 7(4),
629-641 (1994)

33.211 S.H. Zak, V. Upatising, S. Hui: Solving linear
programming problems with neural networks:
A comparative study, IEEE Trans. Neural Netw. 6,
94-104 (1995)

33.212 Y. Xia, J. Wang: Neural network for solving lin-
ear programming problems with bounded vari-
ables, IEEE Trans. Neural Netw. 6(2), 515-519
(1995)

33.213 M. Vidyasagar: Minimum-seeking properties of
analog neural networks with multilinear objec-
tive functions, IEEE Trans. Autom. Control 40(8),
1359-1375 (1995)

33.214 M. Forti, A. Tesi: New conditions for global stabil-
ity of neural networks with application to linear
and quadratic programming problems, IEEE Trans.
Circuits Syst. I 42(7), 354-366 (1995)

33.215 A. Cichocki, R. Unbehauen, K. Weinzierl, R. Holzel:
A new neural network for solving linear program-
ming problems, Eur. J. Oper. Res. 93, 244-256
(1996)

33.216 Y. Xia: A new neural network for solving linear
programming problems and its application, IEEE
Trans. Neural Netw. 7(2), 525-529 (1996)

33.217 X. Wu, Y. Xia, J. Li, W.K. Chen: A high-performance
neural network for solving linear and quadratic
programming problems, IEEE Trans. Neural Netw.
7(3), 1996 (1996)

33.218 Y. Xia: A new neural network for solving linear
and quadratic programming problems, IEEE Trans.
Neural Netw. 7(6), 1544-1547 (1996)

33.219 Y. Xia: Neural network for solving extended linear
programming problems, IEEE Trans. Neural Netw.
8(3), 803-806 (1997)

33.220 M.J. Perez-Ilzarbe: Convergence analysis of
a discrete-time recurrent neural network to
perform quadratic real optimization with bound
constraints, IEEE Trans. Neural Netw. 9(6),
1344-1351 (1998)

33.221 M.C.M. Teixeira, S.H. Zak: Analog neural non-
derivative optimizers, IEEE Trans. Neural Netw.
9(4), 629-638 (1998)


645 


EE | d Hed 


646 Part D 


Neural Networks 


EE | d Hed 


33.222 


33.223 


33.224 


33.225 


33.226 


33.227 


33.228 


33.229 


33.230 


33.231 


33.232 


33.233 


33.234 


33.235 


33.236 


Y. Xia, J. Wang: A general methodology for de- 
signing globally convergent optimization neural 
networks, IEEE Trans. Neural Netw. 9(6), 1331-1343 
(1998) 

E. Chong, S. Hui, H. Zak: An analysis of a class of 
neural networks for solving linear programming 
problems, IEEE Trans. Autom. Control 44(11), 1995- 
2006 (1999) 

Y. Xia, J. Wang: Global exponential stability of re- 
current neural networks for solving optimization 
and related problems, IEEE Trans. Neural Netw. 
11(4), 1017-1022 (2000) 

X. Liang, J. Wang: A recurrent neural network 
for nonlinear optimization with a continuously 
differentiable objective function and bound con- 
straints, IEEE Trans. Neural Netw. 11(6), 1251-1262 
(2000) 

Y. Leung, K. Chen, Y. Jiao, X. Gao, K. Leung: A new 
gradient-based neural network for solving linear 
and quadratic programming problems, IEEE Trans. 
Neural Netw. 12(5), 1074-1083 (2001) 

X. Liang: A recurrent neural network for nonlin- 
ear continuously differentiable optimization over 
a compact convex subset, IEEE Trans. Neural Netw. 
12(6), 1487-1490 (2001) 

Y. Xia, H. Leung, J. Wang: A projection neural 
network and its application to constrained op- 
timization problems, IEEE Trans. Circuits Syst. | 
49(4), 447-458 (2002) 

R. Fantacci, M. Forti, M. Marini, D. Tarchi, G. Van- 
nuccini: A neural network for constrained op- 
timization with application to CDMA communi- 
cation systems, IEEE Trans. Circuits Syst. II 50(8), 
484-487 (2003) 

Y. Leung, K. Chen, X. Gao: A high-performance 
feedback neural network for solving convex non- 
linear programming problems, IEEE Trans. Neural 
Netw. 14(6), 1469-1477 (2003) 

Y. Xia, J. Wang: A general projection neural 
network for solving optimization and related 
problems, IEEE Trans. Neural Netw. 15, 318-328 
(2004) 

X. Gao: A novel neural network for nonlinear con- 
vex programming, IEEE Trans. Neural Netw. 15(3), 
613-621 (2004) 

X. Gao, L. Liao, W. Xue: A neural network for a class 
of convex quadratic minimax problems with con- 
straints, IEEE Trans. Neural Netw. 15(3), 622-628 
(2004) 

Y. Xia, J. Wang: A recurrent neural network for 
nonlinear convex optimization subject to non- 
linear inequality constraints, IEEE Trans. Circuits 
Syst. | 51(7), 1385-1394 (2004) 

M. Forti, P. Nistri, M. Quincampoix: Generalized 
neural network for nonsmooth nonlinear pro- 
gramming problems, IEEE Trans. Circuits Syst. | 
51(9), 1741-1754 (2004) 

Y. Xia, G. Feng, J. Wang: A recurrent neural net- 
work with exponential convergence for solving 


33.237 


33.238 


33,239 


33.240 


33.241 


33.242 


33.243 


33.244 


33.245 


33.246 


33.247 


33.248 


33.249 


33.250 


convex quadratic program and linear piecewise 
equations, Neural Netw. 17(7), 1003-1015 (2004) 
Y. Xia, J. Wang: Recurrent neural networks for 
solving nonlinear convex programs with linear 
constraints, IEEE Trans. Neural Netw. 16(2), 379- 
386 (2005) 

Q. Liu, J. Cao, Y. Xia: A delayed neural network for 
solving linear projection equations and its appli- 
cations, IEEE Trans. Neural Netw. 16(4), 834-843 
(2005) 

X. Hu, J. Wang: Solving pseudomonotone varia- 
tional inequalities and pseudoconvex optimiza- 
tion problems using the projection neural net- 
work, IEEE Trans. Neural Netw. 17(6), 1487-1499 
(2006) 

S. Liu, J. Wang: A simplified dual neural network 
for quadratic programming with its KWTA appli- 
cation, IEEE Trans. Neural Netw. 17(6), 1500-1510 
(2006) 

Y. Yang, J. Cao: Solving quadratic programming 
problems by delayed projection neural network, 
IEEE Trans. Neural Netw. 17(6), 1630-1634 (2006) 
X. Hu, J. Wang: Design of general projection neu- 
ral network for solving monotone linear vari- 
ational inequalities and linear and quadratic 
optimization problems, IEEE Trans. Syst. Man Cy- 
bern. B 37(5), 1414-1421 (2007) 

Q. Liu, J. Wang: A one-layer recurrent neural 
network with a discontinuous hard-limiting ac- 
tivation function for quadratic programming, IEEE 
Trans. Neural Netw. 19(4), 558-570 (2008) 

Q. Liu, J. Wang: A one-layer recurrent neural net- 
work with a discontinuous activation function for 
linear programming, Neural Comput. 20(5), 1366- 
1383 (2008) 

Y. Xia, G. Feng, J. Wang: A novel neural network 
for solving nonlinear optimization problems with 
inequality constraints, IEEE Trans. Neural Netw. 
19(8), 1340-1353 (2008) 

M.P. Barbarosou, N.G. Maratos: A nonfeasible 
gradient projection recurrent neural network 
for equality-constrained optimization problems, 
IEEE Trans. Neural Netw. 19(10), 1665-1677 (2008) 
X. Hu, J. Wang: An improved dual neural net- 
work for solving a class of quadratic programming 
problems and its k-winners-take-all application, 
IEEE Trans. Neural Netw. 19(12), 2022-2031 (2008) 
X. Xue, W. Bian: Subgradient-based neural net- 
works for nonsmooth convex optimization prob- 
lems, IEEE Trans. Circuits Syst. | 55(8), 2378-2391 
(2008) 

W. Bian, X. Xue: Subgradient-based neural net- 
works for nonsmooth nonconvex optimization 
problems, IEEE Trans. Neural Netw. 20(6), 1024- 
1038 (2009) 

X. Hu, C. Sun, B. Zhang: Design of recurrent neural 
networks for solving constrained least absolute 
deviation problems, IEEE Trans. Neural Netw. 21(7), 
1073-1086 (2010) 


Neurodynamics 


References 


33. 


33. 


33. 


33. 


33. 


33; 


33. 


33. 


33. 


33. 


33% 


33. 


33. 


33. 


33. 


251 


252 


253 


254 


255 


256 


257 


258 


259 


260 


261 


262 


263 


264 


265 


Q. Liu, J. Wang: Finite-time convergent recur- 
rent neural network with a hard-limiting activa- 
tion function for constrained optimization with 
piecewise-linear objective functions, IEEE Trans. 
Neural Netw. 22(4), 601-613 (2011) 

Q. Liu, J. Wang: A one-layer recurrent neural net- 
work for constrained nonsmooth optimization, 
IEEE Trans. Syst. Man Cybern. 40(5), 1323-1333 (2011) 
L. Cheng, Z. Hou, Y. Lin, M. Tan, W.C. Zhang, F. Wu: 
Recurrent neural network for nonsmooth convex 
optimization problems with applications to the 
identification of genetic regulatory networks, IEEE 
Trans. Neural Netw. 22(5), 714-726 (2011) 

Z. Guo, Q. Liu, J. Wang: A one-layer recurrent 
neural network for pseudoconvex optimization 
subject to linear equality constraints, IEEE Trans. 
Neural Netw. 22(12), 1892-1900 (2011) 

Q. Liu, Z. Guo, J. Wang: A one-layer recurrent neu- 
ral network for constrained pseudoconvex opti- 
mization and its application for dynamic portfolio 
optimization, Neural Netw. 26(1), 99-109 (2012) 
W. Bian, X. Chen: Smoothing neural network for 
constrained non-Lipschitz optimization with ap- 
plications, IEEE Trans. Neural Netw. Learn. Syst. 
23(3), 399-411 (2012) 

Y. Xia: An extended projection neural network for 
constrained optimization, Neural Comput. 16(4), 
863-883 (2004) 

J. Wang, Y. Xia: Analysis and design of primal- 
dual assignment networks, IEEE Trans. Neural 
Netw. 9(1), 183-194 (1998) 

Y. Xia, G. Feng, J. Wang: A primal-dual neural net- 
work for online resolving constrained kinematic 
redundancy in robot motion control, IEEE Trans. 
Syst. Man Cybern. B 35(1), 54-64 (2005) 

Y. Xia, J. Wang: A dual neural network for kine- 
matic control of redundant robot manipulators, 
IEEE Trans. Syst. Man Cybern. B 31(1), 147-154 (2001) 
Y. Zhang, J. Wang: A dual neural network for con- 
strained joint torque optimization of kinemat- 
ically redundant manipulators, IEEE Trans. Syst. 
Man Cybern. B 32(5), 654-662 (2002) 

Y. Zhang, J. Wang, Y. Xu: A dual neural network for 
bi-criteria kinematic control redundant manipu- 
lators, IEEE Trans. Robot. Autom. 18(6), 923-931 
(2002) 

Y. Zhang, J. Wang, Y. Xia: A dual neural network 
for redundancy resolution of kinematically re- 
dundant manipulators subject to joint limits and 
joint velocity limits, IEEE Trans. Neural Netw. 14(3), 
658-667 (2003) 

A. Cichocki, R. Unbehauen: Neural networks for 
solving systems of linear equations and related 
problems, IEEE Trans. Circuits Syst. | 39(2), 124-138 
(1992) 

A. Cichocki, R. Unbehauen: Neural networks for 
solving systems of linear equations — part Il: 
Minimax and least absolute value problems, IEEE 
Trans. Circuits Syst. II 39(9), 619-633 (1992) 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


33. 


266 


267 


268 


269 


270 


271 


272 


273 


274 


275 


276 


277 


278 


279 


280 


281 


282 


283 


J. Wang: Recurrent neural networks for computing 
pseudoinverse of rank-deficient matrices, SIAM 
J. Sci. Comput. 18(5), 1479-1493 (1997) 

G.G. Lendaris, K. Mathia, R. Saeks: Linear Hop- 
field networks and constrained optimization, IEEE 
Trans. Syst. Man Cybern. B 29(1), 14-118 (1999) 

Y. Xia, J. Wang, D.L. Hung: Recurrent neural net- 
works for solving linear inequalities and equa- 
tions, IEEE Trans. Circuits Syst. | 46(4), 452-462 
(1999) 

J. Wang: A recurrent neural network for solv- 
ing the shortest path problem, IEEE Trans. Circuits 
Syst. | 43(6), 482-486 (1996) 

J. Wang: Primal and dual neural networks for 
shortest-path routing, IEEE Trans. Syst. Man Cy- 
bern. A 28(6), 864-869 (1998) 

Y. Xia, J. Wang: A discrete-time recurrent neural 
network for shortest-path routing, IEEE Trans. Au- 
tom. Control 45(11), 2129-2134 (2000) 

D. Anguita, A. Boni: Improved neural network for 
SVM learning, IEEE Trans. Neural Netw. 13(5), 1243- 
1244 (2002) 

Y. Xia, J. Wang: A one-layer recurrent neural net- 
work for support vector machine learning, IEEE 
Trans. Syst. Man Cybern. B 34(2), 1261-1269 (2004) 
L.V. Ferreira, E. Kaszkurewicz, A. Bhaya: Support 
vector classifiers via gradient systems with dis- 
continuous right-hand sides, Neural Netw. 19(10), 
1612-1623 (2006) 

J. Wang: Analysis and design of an analog sort- 
ing network, IEEE Trans. Neural Netw. 6, 962-971 
(1995) 

B. Apolloni, |. Zoppis: Subsymbolically manag- 
ing pieces of symbolical functions for sorting, IEEE 
Trans. Neural Netw. 10(5), 1099-1122 (1999) 

J. Wang: Analysis and design of k-winners-take- 
all model with a single state variable and Heav- 
iside step activation function, IEEE Trans. Neural 
Netw. 21(9), 1496-1506 (2010) 

Q. Liu, J. Wang: Two k-winners-take-all networks 
with discontinuous activation functions, Neural 
Netw. 21, 406-413 (2008) 

Y. Xia, M. S. Kamel: Cooperative learning algo- 
rithms for data fusion using novel L1 estimation, 
IEEE Trans. Signal Process. 56(3), 1083—1095 (2008) 
B. Baykal, A.G. Constantinides: A neural approach 
to the underdetermined-order recursive least- 
squares adaptive filtering, Neural Netw. 10(8), 
1523-1531 (1997) 

Y. Sun: Hopfield neural network based algorithms 
for image restoration and reconstruction — Part 
I: Algorithms and simulations, IEEE Trans. Signal 
Process. 49(7), 2105-2118 (2000) 

X.Z. Wang, J.Y. Cheung, Y.S. Xia, J.D.Z. Chen: Min- 
imum fuel neural networks and their applica- 
tions to overcomplete signal representations, IEEE 
Trans. Circuits Syst. | 47(8), 1146-1159 (2000) 

X.Z. Wang, J.Y. Cheung, Y.S. Xia, J.D.Z. Chen: Neu- 
ral implementation of unconstrained minimum 


647 


EE | d Hed 


648 PartD 


Neural Networks 


EE | d Hed 


33.284 


33.285 


33.286 


33.287 


33.288 


33.289 


33.290 


33.291 


33.292 


Li-norm optimization—least absolute deviation 
model and its application to time delay estima- 
tion, IEEE Trans. Circuits Syst. Il 47(11), 1214-1226 
(2000) 

P.-R. Chang, W.-H. Yang, K.-K. Chan: A neural 
network approach to MVDR beamforming prob- 
lem, IEEE Trans. Antennas Propag. 40(3), 313-322 
(1992) 

Y. Xia, G.G. Feng: A neural network for robust 
LCMP beamforming, Signal Process. 86(3), 2901- 
2912 (2006) 

J. Wang, G. Wu: A multilayer recurrent neural 
network for on-line synthesis of minimum-norm 
linear feedback control systems via pole assign- 
ment, Automatica 32(3), 435-442 (1996) 

Y. Zhang, J. Wang: Global exponential stabil- 
ity of recurrent neural networks for synthesizing 
linear feedback control systems via pole assign- 
ment, IEEE Trans. Neural Netw. 13(3), 633-644 
(2002) 

Y. Zhang, J. Wang: Recurrent neural networks for 
nonlinear output regulation, Automatica 37(8), 
1161-1173 (2001) 

S. Hu, J. Wang: Multilayer recurrent neural net- 
works for online robust pole assignment, IEEE 
Trans. Circuits Syst. | 50(11), 1488-1494 (2003) 

Y. Pan, J. Wang: Model predictive control of un- 
known nonlinear dynamical systems based on 
recurrent neural networks, IEEE Trans. Ind. Elec- 
tron. 59(8), 3089-3101 (2012) 

Z. Yan, J. Wang: Model predictive control of non- 
linear systems with unmodeled dynamics based 
on feedforward and recurrent neural networks, 
IEEE Trans. Ind. Inf. 8(4), 746-756 (2012) 

Z. Yan, J. Wang: Model predictive control of track- 
ing of underactuated vessels based on recurrent 


33.293 


33.294 


33.295 


33.296 


33.297 


33.298 


33.299 


33.300 


neural networks, IEEE J. Ocean. Eng. 37(4), 717-726 
(2012) 

J. Wang, Q. Hu, D. Jiang: A Lagrangian network 
for kinematic control of redundant robot manip- 
ulators, IEEE Trans. Neural Netw. 10(5), 1123-1132 
(1999) 

H. Ding, S.K. Tso: A fully neural-network-based 
planning scheme for torque minimization of re- 
dundant manipulators, IEEE Trans. Ind. Electron. 
46(1), 199-206 (1999) 

H. Ding, J. Wang: Recurrent neural networks for 
minimum infinity-norm kinematic control of re- 
dundant manipulators, IEEE Trans. Syst. Man Cy- 
bern. A 29(3), 269-276 (1999) 

W.S. Tang, J. Wang: Two recurrent neural networks 
for local joint torque optimization of kinemat- 
ically redundant manipulators, IEEE Trans. Syst. 
Man Cybern. B 30(1), 120-128 (2000) 

W.S. Tang, J. Wang: A recurrent neural network 
for minimum infinity-norm kinematic control of 
redundant manipulators with an improved prob- 
lem formulation and reduced architectural com- 
plexity, IEEE Trans. Syst. Man Cybern. B 31(1), 98- 
105 (2001) 

Y. Zhang, J. Wang: Obstacle avoidance for kine- 
matically redundant manipulators using a dual 
neural network, IEEE Trans. Syst. Man Cybern. B 
4(1), 752-759 (2004) 

Y. Xia, J. Wang, L.-M. Fok: Grasping force opti- 
mization of multi-fingered robotic hands using 
a recurrent neural network, IEEE Trans. Robot. Au- 
tom. 20(3), 549-554 (2004) 

Q. Liu, C. Dang, T. Huang: A one-layer recurrent 
neural network for real-time portfolio optimiza- 
tion with probability criterion, IEEE Trans. Cybern. 
43(1), 14-23 (2013) 


649 


34. Computational Neuroscience — 
Biophysical Modeling of Neural Systems 


Harrison Stratton, Jennie Si 


Only within the past few decades have we had the tools capable of
probing the brain to search for the fundamental components of
cognition. Modern numerical techniques coupled with the fabrication
of precise electronics have allowed us to identify the very
substrates of our own minds. The pioneering work of Hodgkin and
Huxley provided us with the first biologically validated
mathematical model describing the flow of ions across the membranes
of giant squid axon. This model demonstrated the fundamental
principles underlying how the electrochemical potential difference,
maintained across the neuronal membrane, can serve as a medium for
signal transmission. This early model has been expanded and improved
to include elements not originally described through collaboration
between biologists, computer scientists, physicists and
mathematicians. Multi-disciplinary efforts are required to
understand this system that spans multiple orders of magnitude and
involves diverse cellular signaling cascades. The massive amount of
data published concerning specific functionality within neural
networks is currently one of the major challenges faced in
neuroscience. The diverse and sometimes disparate data collected
across many laboratories must be collated into the same framework
before we can transition to a general theory explaining the brain.
Since this broad field would typically be the subject of its own
textbook, here we will focus on the fundamental physical
relationships that can be used to understand biological processes in
the brain.

34.1 Anatomy and Physiology of the Nervous System
  34.1.1 Introduction to the Anatomy of the Nervous System
  34.1.2 Sensation - Environmental Signal Detection
  34.1.3 Associations - The Foundation of Cognition
34.2 Cells and Signaling Among Cells
  34.2.1 Neurons - Electrically Excitable Cells
  34.2.2 Glial Cells - Supporting Neural Networks
  34.2.3 Transduction Proteins - Cellular Signaling Mechanisms
  34.2.4 Electrochemical Potential Difference - Signaling Medium
34.3 Modeling Biophysically Realistic Neurons
  34.3.1 Electrical Properties of Neurons
  34.3.2 Early Empirical Models - Hodgkin and Huxley
  34.3.3 Compartmental Modeling - Anatomical Reduction
  34.3.4 Cable Equations - Connecting Compartments
34.4 Reducing Computational Complexity for Large Network Simulations
  34.4.1 Reducing Computational Complexity - Large Scale Models
  34.4.2 Firing Rate-Based Dynamic Models
  34.4.3 Spike Response Model
34.5 Conclusions
References


34.1 Anatomy and Physiology of the Nervous System 


The animal brain is undoubtedly a unique organ that
has evolved from humble beginnings starting with small
groups of specialized cells in organisms long ago. The
vast complexity found within the cortices of the mammalian
brain is the result of selection across countless
generations. The purpose of these early cells is in prin- 
ciple consistent with the function of our entire brain. 
Both serve to assess environmental variables in or- 
der to produce output that is situationally relevant. 
This production of appropriate behavioral responses 
is essential for an organism to successfully obtain re- 
sources in complex and often hostile surroundings. The 
morphology of the brain is species dependent. Its struc- 
ture is functionally correlated to the necessary output 
a specific animal requires to survive in a particular en- 
vironment. For example, the commonly used laboratory 
mouse has a brain structure that is coarsely similar to
that of a human. However, the mouse possesses particularly
enlarged olfactory bulbs situated in the front of the 
skull. This is in great contrast to the human olfactory 
centers that are considerably smaller, but perform the 
same function. This dramatic difference in the size of 
the olfactory bulb relates to differences in environmen- 
tal variables that exist between lab mice and humans. 
In contrast to humans, mice live in environments where 
scent is a highway of information. Odorant molecules 
can provide crucial signals regarding changes to the en- 
vironment that indicate such things as the approach of 
a predator or the presence of food. In contrast, humans 
have evolved a diverse set of methods for gathering 
food and avoiding predators that are highly depen- 
dent upon visual stimulation. As a consequence of this 
developmental variable, our brains have evolved to ef- 
ficiently process visual stimuli with incredible speed 
and acuity [34.1]. This paradigm of form fits function 
exists throughout nature and allows experimentalists 
to take advantage of shared anatomical and physio- 
logical characteristics. By adjusting for differences in 
evolutionary history, we can confidently perform experiments
on neurons from other animals. These data
can then be extended and translated to gather information
about the properties of our own nervous system.
From this point on, unless otherwise specified, any reference
to the nervous system refers to that of higher
mammalian species, including rodents, non-human primates,
and humans.


34.1.1 Introduction to the Anatomy 
of the Nervous System 


Cells of the nervous system arise from ectodermal
embryonic tissue and generally develop into two distinct
groups: the cells of the peripheral nervous system
(PNS) and those of the central nervous system (CNS).
The brain and the spinal cord together constitute the
CNS, with the nerves of the body (peripheral nerves)
and autonomic ganglia forming the PNS. The PNS
refers to neurons and sensory organs located outside
the protection of the meninges, a three-layer covering
of the brain and spinal cord, and of the blood-brain
barrier (BBB), a dynamic, selective barrier that
isolates the CNS from the circulatory system. Additionally,
the primary immune system does not extend
into the brain, leaving it particularly vulnerable.
Exploration of the PNS has formed the foundation of
neuroscience research because these neurons tend to
be large and easy to locate and remove for experimental
examination. While many of the foundational
principles of neuroscience were discovered within the
PNS, the CNS has been the primary target of most
recent neuroscience research. This is primarily due
to the emergence of the frontal cortices as the substrate
for conscious thought and action. The CNS
contains many sub-regions that can be broadly separated
into the spinal cord, the brainstem, the cerebellar
cortex, and the cerebral cortex. Increasing complexity can
be observed moving from the spinal cord to the frontal cortex,
which is consistent with the most forward structures
being the most recently evolved. These recently evolved
frontal cortex structures are of high interest to neuroscience
researchers and are the subject of many
computational investigations attempting to elucidate their
function [34.2].

The human brain is divided into four major re- 
gions (Fig. 34.1): the cortex, which includes the four 
lobes of the brain, the midbrain, the brainstem, and 
the cerebellum. The brainstem, midbrain, and cerebellum,
also known as subcortical regions, were the first
to evolve and play various roles in the regulation of 
basic physiological function and relay of information 
to the cortex. The brainstem continues caudally as the 
spinal cord and contains numerous nuclei for the pro- 
cessing of information generated by spinal neurons. 
The midbrain is of particular importance concerning 
the integration and transmission of information from 
the spinal cord and brainstem to the cortex. The thalamus,
grouped here with the midbrain although anatomically part
of the diencephalon, is often called the gateway
to the cerebral cortex as it is located centrally and
sends projections to all parts of the cortex, and thus
plays a pivotal role in the transmission of subcortical 
information to various association areas of the cor- 
tex. The cortex is divided into four lobes including 
frontal, parietal, temporal, and occipital. Each lobe con- 
tains areas of specialized function as well as areas 
of association. Incoming sensory information requires
context, hence information first travels to the areas of
specialized function and subsequently traverses various
association areas for integration before reaching
its target of motor output [34.3]. An example of this
flow of information within the cortex is depicted in 
Fig. 34.1, which details the progression of information 
from the primary auditory cortex for deciphering sound 
to Broca’s area, which is involved in speech production. 
Figure 34.1 also demonstrates the general anatomy of 
the human brain, including the brainstem, cerebellum, 
and cortex. 

Reviewed simply, the flow of information through- 
out this comprehensive system begins in sensory cells 
and ends in motor output via peripheral nerves. Sensory 
cells, such as mechanoreceptors for tactile sensation, 
transduce and forward signals to the spinal cord for first 
level processing. Sensory fibers run specifically along 
the dorsal surface of the spinal cord and ascend the en- 
tirety of the cord. Some synapse locally on the spinal 
cord and others project fully to the cortical surface of 
the brain. The information from sensory cells is first 
received in the proximal cortex, associations between 
stimuli are made, and this information is again relayed 
to the front of the brain where it is used to form a plan 
of action based on the specific input pattern. The front 
of the brain then begins a processing cascade that flows 
back toward the central sulcus of the brain and output 
motor neurons are triggered. These output signals de- 
scend the brain as a large bundle that go on to form 
the ventral surface of the spinal cord, where they ei- 
ther end on a local spinal neuron or are routed distally 
to a muscle. For a more complete discussion of neu- 
ral anatomy and information processing, please refer to 
Kandel et al. [34.1]. 


Fig. 34.1 A cartoon representation of the human brain, with the
rostral direction marked and the major lobes colored according to
the key on the left. The blue arrow highlights, as an example,
a specific pathway that heavily contributes to the output of
language by bringing together auditory and visual information in
the parietal lobe. This information, which contains associations
between stimuli, is then forwarded to the frontal cortex, where it
is integrated with the rest of the body's sensory information and
guides the output of speech sounds. Color key: frontal lobe,
parietal lobe, temporal lobe, occipital lobe, cerebellum, pons,
medulla oblongata. Labeled areas: primary auditory cortex,
Wernicke's area, primary visual cortex, angular gyrus, and Broca's
area. The arrow indicates the flow of information from secondary
association cortices to Broca's area along the arcuate fasciculus


34.1.2 Sensation — Environmental Signal 
Detection 


Transduction of physical environmental stimuli into 
electrical and chemical signals is the common func- 
tion shared by all sensory systems. This transduction 
provides baseline input to the brain from an array of 
sensors placed throughout the body in the skin, eyes, 
ears, mouth, and nose. These highly specialized cells 
respond only to the application of very precise ex- 
ternal stimuli. While each receptor cell is specialized 
for a specific signal, there is much variation within 
each sensory system, allowing the most pertinent in- 
formation to be extracted from the environment. An 
example of this variation exists within the eye, where 
there are two primary detector cell types: rods and 
cones [34.4]. The former are involved in the sensation
of light and dark and the latter in the perception of colors.
Even within the collection of cone cells there are
further specializations that allow for the detection of
different wavelengths of light, most notably those perceived
as red, green, and blue. A similar pattern is observed in cells of the
inner ear, where hair cells are housed in osseous cavities 
lined with membrane. This makes these cells special- 
ized for the detection of vibrations in the air occurring 
at different frequencies. Sensory specialization is fur- 
ther mapped onto the brain where distinct regions of 
the cortex correspond to specific environmental stim- 
uli or motor output patterns. Sensation is a vital part 
of cognition as the representation of the world we each 
possess is built upon our own unique sensory expe- 
riences. This literally means that we have shaped the 
surfaces of our cortices based on our experiences as 
individuals. 




34.1.3 Associations — The Foundation 
of Cognition 


The brain is an integrative structure capable of trans- 
forming sensory stimuli while forming associations 
between stimuli based on temporal and local param- 
eters. Sensory integration allows organisms to relate 
pertinent information about environmental variables in 
real time based on successful behavioral patterns of the 
past. The mechanism driving these associative prop- 
erties of the brain has been the subject of countless 
scientific endeavors yielding a basic understanding. Al- 
though progress has been made in understanding how 
the nervous system adapts to the environment, there 
are countless questions that emerge as new discoveries 
are made that require a constant revision of param- 
eters. Kandel [34.5] was among the first to develop 
experiments capable of demonstrating the molecular 
and cellular mechanisms underlying learning within
a biological system. 
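
As a rough illustration of how an association between two stimuli
might strengthen when they coincide in time, the following sketch
uses a simple Hebbian-style rule. This rule is not taken from the
chapter or from the experiments cited above; the function names, the
learning rate, and the decay constant are all hypothetical choices
made only for illustration.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1, decay=0.01):
    """Return the new connection strength after one time step.

    pre and post are activity levels (0.0 to 1.0) of the two cells.
    The weight grows when both are active together and otherwise
    decays slowly toward zero. All constants are illustrative.
    """
    return weight + learning_rate * pre * post - decay * weight


def train(pairs, weight=0.0, **kwargs):
    """Apply hebbian_update over a sequence of (pre, post) pairs."""
    for pre, post in pairs:
        weight = hebbian_update(weight, pre, post, **kwargs)
    return weight


# Stimuli that repeatedly occur together become associated,
# while stimuli that never coincide in time do not.
paired = train([(1.0, 1.0)] * 20)
unpaired = train([(1.0, 0.0), (0.0, 1.0)] * 10)
print(paired > unpaired)
```

In this toy setting only temporal coincidence drives the change in
connection strength; real synapses depend on many further factors
that such a rule leaves out.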

Developing associations across a variety of sensory 
pathways is a major constituent of learning within ani- 
mals. These associations are formed by altering cellular 
physiology based on input experience for a certain neu- 
ron or population of connected neurons. This cellular 
learning is the foundation for consciousness and there 
are many processes at the cellular level that contribute 


34.2 Cells and Signaling Among Cells 


The brain is composed of two major cell types: neurons 
and glial cells. Both of these cell types are essential for 
the brain to function properly. Standard models place 
electrically excitable neurons in a signaling role with 
glial cells serving an indispensable support role al- 
though emerging evidence suggests glial cells could be 
more heavily involved with signaling than previously 
thought. The majority of research in computational neu- 
roscience focuses on modeling neurons and their role 
in generating conscious behavior. This is largely due 
to the fact that their maintenance of a potential differ- 
ence across the membrane serves as an efficient and 
robust communication pathway that can be modeled us- 
ing established electrical dynamics. Unlike other cells 
of the body the neuron is a non-differentiating cell, 
which means it does not continuously undergo cellu- 
lar mitosis or meiosis. This simple difference in the life 
cycle of these cells affords them an indispensable role 
in the animal nervous system — memory. Each cell in 


to learning in different ways. Some association mech- 
anisms directly alter the number of synapses between 
neurons based on a history of communication between 
the cells, where other processes will affect the cell’s 
DNA to accommodate a certain input pattern [34.6]. All 
of the mechanisms that influence learning in the brain 
have yet to be defined, which leaves a large opportu- 
nity for conjecture as to what constitutes learning and 
what does not. Modeling maintains a distinct advan- 
tage, as models of cellular communication and learning 
can be prototyped in silico to account for the large num- 
ber of modifications occurring over time. The output 
of these models can then logically guide our search 
for learning mechanisms within the brain by outlin- 
ing a possible path where learning mechanisms could 
be discovered. While we understand that associations 
between neurons and glial cells likely form the basis 
of cognition, we have not yet been able to recreate this 
phenomenon to completely explain its nature. This is 
partly due to the fact that the mammalian brain is so 
large and contains so many networks that detection and 
characterization of all changes occurring within the sys- 
tem simultaneously is extremely difficult. To resolve 
this issue, many neuroscientists have turned to using 
model organisms that provide a reduced set of neurons 
with which to experiment and demonstrate fundamental 
theories. 


the body has an innate type of memory that begins by 
receiving messages from its environment at the membrane. 
These signals can sometimes propagate into the 
nucleus where alterations in DNA can occur, which will 
ultimately affect the function of the cell. Neurons have 
this type of DNA memory but also form connections 
with their neighboring cells or even cells that are located 
in other brain regions [34.7]. Because neurons do 
not divide, these connections are not reset and can last 
for periods of time that are much longer than the life cy- 
cle of a typical cell. Connections between neurons are 
called synapses and form the fundamental communica- 
tion element in the nervous system. 


34.2.1 Neurons — Electrically Excitable Cells 
Neurons are composed of a cell body referred to as 


the soma, which contains membrane-bound organelles 
that are found in most cells, in addition to one or more 


Computational Neuroscience — Biophysical Modeling of Neural Systems 




protoplasmic projections: an axon and dendrites. The 
soma contains the nucleus and is the central portion of 
the neuron. Among neurons, the most common path- 
way for signal transmission is from the axon terminal 
of one neuron to another’s dendrites, which then re- 
lays the signal to the neuron’s soma and on to the 
axon to be transmitted to another neuron. In particular, 
dendrites receive and conduct electrochemical signals 
from other neurons to the soma and play an integral 
role in determining the extent to which action poten- 
tials are generated. Dendrites are composed of many 
branches called dendritic trees and can create exten- 
sive and unique branched networks between neurons. 
The axon is the anatomical structure through which 
action potentials are transmitted away from the soma 
to other neurons or other types of cells such as mus- 
cles, ganglia, or glands [34.8]. Axons vary in length 
tremendously and bundle together to form large periph- 
eral nerves that course from the spinal cord to the toe. 
They vary in composition depending upon their loca- 
tion in the PNS or the CNS and may be myelinated 
or unmyelinated. Myelin is a fatty, dielectric insulat- 
ing layer that speeds signal conduction along the axon 
by forming discrete regions of high membrane resistance, low capacitance, and high 
conduction velocity. Between the discrete myelinated 
portions of axon, nodes of Ranvier regenerate 
the signal along the next segment of axon. Axons nor- 
mally maintain an equal radius throughout their course 
and terminate at a synapse, where the electrochemical 
signal will be transmitted from the neuron to the target 
cell, which may be another neuron or another type of 
cell. A synapse is formed by the end(s) of one neuron’s 
axon, called the axon terminal and the dendrites, axon, 
or soma of the receiving neuron. The synapse is fun- 
damentally a transducer that converts electrical signals 
from a membrane potential wave to a chemical signal 
that modifies the state of a downstream cell [34.9]. 
Neurons are classified by the branching pattern and 
location of their dendrites and axons, their physiolog- 
ical function and location within the nervous system. 
Structural and functional classification, and the type of 
neurotransmitter released are relevant to modeling and 
will be reviewed. Structurally, neurons are classified 
as unipolar, pseudo-unipolar, bipolar, or multi-polar. 
Unipolar neurons contain one protoplasmic projection 
that divides distally into sensory and transmitting por- 
tions of an axon. Bipolar neurons retain two projections 
from the soma, one from which dendrites extend and 
the other from which the axon extends. The major- 
ity of neurons are multipolar neurons, which normally 
contain one long axon and many dendritic projections. 


Figure 34.2a exemplifies bipolar, hippocampal neurons 
and Fig. 34.2b demonstrates the anatomy of a corti- 
cal, glutamatergic neuron. Functionally, neurons can be 
classified according to their electrophysiology and be 
described as tonic, phasic, or fast-spiking. However, it 
is more effective to describe the different firing pat- 
terns experienced by neurons, as most neurons exhibit 
variable firing patterns. Tonic firing involves continu- 
ous responses to stimuli and recurrent generation of 




Fig. 34.2 (a) Fluorescently labeled hippocampal slice 
showing neuronal cell bodies stained in blue with neu- 
ronal nuclei antibody (NeuN). The GFP (green fluorescent 
protein) and RFP (red fluorescent protein) labeled neurons 
extending their processes are of dentate granule cell ori- 
gin and have matured over the course of the experiment 
as this slice was imaged at postnatal day 53. (b) Here 
we see a neuron that has had its cell body stained with 
a red fluorescent marker and presynaptic glutamate recep- 
tors indicated in green. The sheer volume of presynaptic 
targets can be seen here along with two segments that have 
been selected and magnified to give a better view of how 
synapses cover the dendrite surface 



654 PartD 


Neural Networks 




action potentials. Tonic firing patterns are observed in 
large excitatory neurons during basal levels of activity 
to provide constant communication between elements 
of the network. Phasic firing patterns consist of bursts 
of action potentials, often in quick succession, that have 
dramatic downstream effects. Bursting and phasic firing 
are highly studied phenomena that often signal a shift 
in steady state activity levels in the circuits where they 
are observed. Along with the branching patterns of den- 
drites and axons, neurons may be classified by the type 
of neurotransmitter released at synapses. The two dom- 
inant neurotransmitters in the brain are glutamate and 
gamma-aminobutyric acid (GABA), which generally 
mediate excitatory and inhibitory neurotransmission re- 
spectively [34.10]. 


34.2.2 Glial Cells - Supporting Neural 
Networks 


Glial cells are fundamentally different from neurons in 
that they do not form synapses with other cells and 
generally do not maintain a membrane potential. Al- 
though they are not directly involved in signaling be- 
tween neurons through synaptic means, these cells play 
a large role in the maintenance of synapses as well as 
signal integrity. There are four major types of glial cells 
in the brain: oligodendrocytes, astrocytes, ependymal 
cells, and micro-glial cells [34.11]. Oligodendrocytes 
secrete myelin, the insulating dielectric material cov- 
ering axons, which facilitates signal transmission over 
longer distances. Ependymal cells line the ventricles of 
the brain and secrete the cerebrospinal fluid that bathes 
the brain and provides a route for expulsion of waste and 
the intake of nutrients. Micro-glial cells act as a type of 
immune system for the brain by digesting dead cells and 
collecting material that should not be present or could be 
damaging to cells of the brain. Astrocytes are abundant, 
star-shaped cells that generally surround neurons and 
provide nutrition and oxygen, and remove, via the cerebrospinal 
fluid, waste that could be toxic to a neuron if left to 
accumulate. This astrocyte-driven waste-removal 
system is essential for normal physiological function 
and structurally resembles the lymphatic system found 
in the rest of the body’s tissues. Remarkably, new ev- 
idence suggests astrocytes may actually participate in 
synaptic communication, and that communication may 
be bi-directional [34.12]. Communication between as- 
trocytes and neurons has yet to be fully characterized 
and provides an opportune target for new ventures into 
neural network modeling. This new evidence implicating 
astrocytes will shift our understanding of the 


brain as a communications structure and will open new 
questions that can be addressed using computational 
models. The current models of non-linear neural sys- 
tems are already complex and must be altered to account 
for the evidence present in this new paradigm. 


34.2.3 Transduction Proteins — 
Cellular Signaling Mechanisms 


All cells of the body contain functional proteins em- 
bedded in their lipid bi-layer that are responsible for 
transducing environmental signals across the cellular 
membrane. The incredible variety of proteins present 
in the nervous system serve as mediators of cellular 
communication. Each protein will have its own 
unique structure depending on its function, and there 
are large groups of proteins that share common properties, 
such as the G-protein-coupled receptors (GPCRs), 
ligand-gated ion channels, passive ion channels, and 
a plethora of others. All of these will not be defined 
here as the classification and physiology of membrane 
proteins can be considered the subject of a whole field 
and are reviewed in detail by Grillner in [34.12]. The 
most important protein varieties for our consideration 
are those that control the flow of ions across the lipid 
bi-layer, which can be achieved in a number of ways. 
Some directly pass ions through a small pore in their 
center that is selective to certain ion species based on 
their electron distribution. Others use stored chemical 
energy to move one subset of ions out of the cell 
while bringing others in, establishing a gradient 
by pushing ions against their electrochemical 
potential gradient. 


34.2.4 Electrochemical 
Potential Difference - 
Signaling Medium 


Neurons carry information to their targets in the form 
of fluctuations of their membrane potential. Changes in 
the value of the membrane potential can trigger a vari- 
ety of cellular signals including the opening of voltage 
gated proteins or the release of chemical neurotransmit- 
ters at a synapse [34.13]. As mentioned earlier, neurons 
generate their membrane potential by selectively trans- 
porting ions across the cellular membrane using energy, 
normally in the form of adenosine triphosphate (ATP). 
With the right combination of proteins neurons are able 
to shift their membrane potential in response to com- 
munication from other cells in a discrete fashion. This 
discrete wave along the membrane is known as an ac- 




tion potential and is only generated by a neuron once 
a certain level of activation has been attained. A sin- 
gle neuron in a network must receive communication 
from other cells, normally at its dendrites, which will 
cause the accumulation of positive ions inside the cell. 
Once the positive ions have accumulated to a certain 
critical level the cell will generate an action potential 
that flows down the axon to the cell's targets. This action 
potential is the fundamental signaling unit within 
the nervous system and triggers the release of neuro- 
transmitters when it arrives at a synapse. Once an action 
potential has been generated there is a period where the 
cell cannot create another wave, and this time is known 
as the absolute refractory period. 

The potential energy across the membrane is a com- 
plex value that results from the transport of ions against 
their concentration gradient (Fig. 34.3). If these ions 
were left to freely diffuse the membrane potential 
would eventually deteriorate as each ion moved toward 
its own value of equilibrium potential. This equilibrium 


[Figure 34.3: schematic of the lipid bi-layer separating the extracellular and intracellular spaces; legend: Na+, K+, anions]


Fig. 34.3 The lipid bi-layer is composed of phosphate 
groups on the extra and intracellular faces of the mem- 
brane with a variety of lipids attached that essentially will 
self-assemble when dissolved in water at the appropriate 
concentration. The extra cellular surface has a net posi- 
tive charge relative to the cytoplasmic side that is relatively 
negative. The membrane acts as a semi-permeable barrier 
that prevents the passage of molecules based on their attraction 
to water. Here we see that potassium ions have 
accumulated within the cell along with negatively charged anions, while 
sodium and chloride are found mainly outside the cell due to 
the action of specific transporters embedded in the membrane. 
The asymmetric distribution of charge across the 
membrane causes a potential difference to exist in terms 
of both the electrical potential of the ions and their chemical 
tendency to diffuse down their concentration gradient 


potential, also known as the reversal potential or the 
Nernst potential, can be calculated using the following 
relationship for any ionic species x 


$$E_x = \frac{RT}{zF} \ln \frac{[X]_o}{[X]_i}. \tag{34.1}$$


In the above equation, known as the Nernst equation, 
$E_x$ represents the equilibrium potential for a certain ion 
species, with $[X]_o$ and $[X]_i$ representing the external and 
internal concentrations of the ion, respectively [34.14]. 
Additionally, $R$ is the universal gas constant, $T$ is the 
temperature in kelvin, $z$ is the valence (signed charge number) of the ion, 
and $F$ is the Faraday constant; this prefactor multiplies the natural 
logarithm of the concentration ratio. At the reversal 
potential for an ionic species the electrical and chemical forces 
on that ion balance, so there is no net movement of the ion 
across the membrane. From this relationship we can see that the 
energy driving the fluctuations in membrane potential 
originates from both electrical and chemical sources. 
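
As a quick check of the Nernst relationship, the sketch below evaluates (34.1) for the concentrations listed in Table 34.1. The temperature of 293.15 K (20 °C) is an assumption made for illustration, since the text does not state the temperature of the preparation, and the function name `nernst_mV` is hypothetical.

```python
import math

R = 8.314462618   # universal gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_mV(conc_out, conc_in, z, temp_K=293.15):
    """Equilibrium potential (mV) of an ion with valence z, per (34.1).

    temp_K defaults to 20 degrees C, an assumed value.
    """
    return 1000.0 * (R * temp_K) / (z * F) * math.log(conc_out / conc_in)

# (extracellular, cytoplasmic, valence); concentrations taken from Table 34.1
ions = {
    "K+":  (20.0, 400.0, +1),
    "Na+": (440.0, 50.0, +1),
    "Cl-": (560.0, 52.0, -1),
}

potentials = {name: nernst_mV(out, inside, z) for name, (out, inside, z) in ions.items()}
for name, e_mV in potentials.items():
    print(f"E_{name} = {e_mV:+.1f} mV")
```

Under this temperature assumption the computed values fall within about 1 mV of the equilibrium potentials tabulated in Table 34.1. Note how the negative valence of chloride flips the sign of the result even though chloride, like sodium, is more concentrated outside the cell.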

While the Nernst equation is capable of determining 
the reversal potential for a single ionic species, physi- 
ological systems often have many additional ions that 
participate in cellular signaling. The Nernst equation 
was expanded to form the Goldman-Hodgkin-Katz 
equation that yields the membrane potential in a resting 
system composed of multiple ionic species as shown 
below 


$$u = \frac{RT}{F} \ln \frac{P_K [K^+]_o + P_{Na} [Na^+]_o + P_{Cl} [Cl^-]_i}{P_K [K^+]_i + P_{Na} [Na^+]_i + P_{Cl} [Cl^-]_o}. \tag{34.2}$$


Table 34.1 Here we see the concentration distribution of 
different salt species as they are found within a voltage 
clamp experiment using a segment of giant squid axon. 
These values will be different in each experimental prepa- 
ration and should be considered carefully as small vari- 
ations can have large implications for the firing patterns 
observed. Take note that here there are no positive anions 
present as the center of the giant squid axon is devoid of 
organelles unlike the inside of a mammalian neuron of the 
central nervous system that would likely contain many 


Ion              Cytoplasmic           Extracellular         Equilibrium
                 concentration (mM)    concentration (mM)    potential (mV)
K+               400                   20                    −75
Na+              50                    440                   +55
Cl−              52                    560                   −60
Organic anions   385                   None                  None




Here u is the resting membrane potential of the cellular 
system under consideration and the constants are the 
same as shown above in the Nernst equation [34.15]. 
However, instead of relying solely upon the concentration 
of the particular ion inside and outside the cell, 
here the membrane's permeability to each ion, 
represented by the variable $P$, is also taken into account. 
$P$ reflects the number of open channels for that particular 
ionic species and has units of cm/s. This equation is used to 
calculate the membrane potential of a particular physiological 
system with multiple internal and external salt 
species such as potassium, sodium, and chloride. Notice 
that the internal and external chloride concentrations 
are swapped relative to those of the cations because of 
chloride's negative valence. 
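
The Goldman-Hodgkin-Katz relationship (34.2) can be evaluated in the same way. The sketch below is illustrative only: the concentrations come from Table 34.1, while the relative permeabilities (1 : 0.04 : 0.45 for K+, Na+, and Cl− at rest), the elevated sodium permeability, and the temperature of 293.15 K are assumptions chosen for the example rather than values given in the text. The name `ghk_mV` is hypothetical.

```python
import math

R, F = 8.314462618, 96485.33212   # J/(mol*K), C/mol

def ghk_mV(P, out, inside, temp_K=293.15):
    """Membrane potential (mV) from (34.2); chloride terms are swapped."""
    num = P["K"] * out["K"] + P["Na"] * out["Na"] + P["Cl"] * inside["Cl"]
    den = P["K"] * inside["K"] + P["Na"] * inside["Na"] + P["Cl"] * out["Cl"]
    return 1000.0 * R * temp_K / F * math.log(num / den)

out = {"K": 20.0, "Na": 440.0, "Cl": 560.0}      # Table 34.1, extracellular
inside = {"K": 400.0, "Na": 50.0, "Cl": 52.0}    # Table 34.1, cytoplasmic

P_rest = {"K": 1.0, "Na": 0.04, "Cl": 0.45}      # assumed relative permeabilities
P_high_na = {"K": 1.0, "Na": 20.0, "Cl": 0.45}   # assumed: sodium channels open

u_rest = ghk_mV(P_rest, out, inside)
u_high_na = ghk_mV(P_high_na, out, inside)
print(f"resting potential    ~ {u_rest:.1f} mV")
print(f"high Na permeability ~ {u_high_na:.1f} mV")
```

Only the ratios of the permeabilities matter, because a common scale factor cancels between numerator and denominator. Raising the sodium permeability pulls the membrane potential toward the sodium equilibrium potential, which is the qualitative behavior exploited during an action potential.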

The simplest description for the maintenance of 
the membrane potential derives from the asymmetric 
distribution of ions across the cell’s semi-permeable 
membrane. The concentrations displayed above were 
the first calculated for any neuron and were measured 


from the axon of a giant squid. By carefully measur- 
ing the potential difference between the inside of the 
axon and the external solution Hodgkin and Huxley 
were able to determine the amount of current flow- 
ing across the axon’s membrane under varying con- 
ditions [34.16]. The potential difference between the 
inside and outside of the cell is generated by manip- 
ulating the concentration of each ion with respect to 
its charge. Through careful observation of Table 34.1 
above one can note that this resting condition will lead 
to a state where the cell is relatively negative on the 
inner membrane and positive along the outer wall of 
the membrane. It is this relative potential that varies 
along the surface of the neural membrane and it is 
what is responsible for carrying information along the 
length of the cell. These concentrations are considered 
bulk values and hardly ever change 
unless a period of sustained firing has 
occurred, during which the cell can deplete this potential 
difference. 


34.3 Modeling Biophysically Realistic Neurons 


34.3.1 Electrical Properties of Neurons 


When modeling a system one must first consider its 
physical dimensions and elements in order to con- 
struct a model that is true to reality and mathematically 
sound. By investigating the most fundamental structure 
of a neuron it is simple to see how this system can be 
easily related to that of an electronic capacitor. We have 
a system composed of two electrically conductive medi- 
ums, the extracellular fluid and the cytoplasm, which 
are separated by a dielectric layer that is also the phos- 
pholipid bi-layer. Therefore, from this description of 
a neuron we can generate the following relationship for 
the membrane potential 


$$u = \frac{Q}{C}, \tag{34.3}$$

where u represents the potential across the membrane 
and it is equal to the quotient of the charge along the 
membrane surface Q and the capacitance of the mem- 
brane itself C [34.17]. From this relationship it is clear 
that the membrane potential relies on the species and 
number of charges distributed across the membrane sur- 
face as well as the lipid constituents of the bi-layer. 
The capacitance of the membrane in a neuron is generally 
around 1 µF/cm² but this can fluctuate depending 


on the local ionic and lipid composition. Four major 
ionic currents are most often found in the cellular mem- 
brane and considered in a biologically realistic model 
(Fig. 34.4). 

Similarly to electrical capacitors, neurons are reliant 
upon dynamic currents that flow through the membrane 




Fig. 34.4 A schematic equivalent circuit diagram of the 
four major ionic currents most often found in the cellular 
membrane. The resistors represent the varying conductiv- 
ity of membrane channels for each ion and the batteries are 
each ion’s respective concentration gradient. On the right 
we can see the membrane has an inherent capacitive cur- 
rent that acts to slow the spread of membrane currents and 
is manifested by the physical structure of the cell 




[Figure 34.5, panels (a)-(d): (a) membrane potential v(.5) versus time (ms); (b) current magnitude (mA/cm²) of soma.ina_HH(.5) and soma.ik_HH(.5) versus time (ms); (c) sodium and potassium conductance traces versus time (ms); (d) gating variables versus time (ms)]


Fig. 34.5 (a) The membrane potential measured at the middle of a spherical single compartment model cell, denoted 
by v(.5). The cell has Hodgkin-Huxley type current dynamics. The y axis shows membrane potential at the center of 
the cell, v(.5), and the x axis represents time in ms. (b) The blue trace shows the total magnitude, in mA/cm², of the 
outward potassium current within the Hodgkin-Huxley (HH) model. The red trace shows the total magnitude of the 
inward HH sodium current in the same units of mA/cm². (c) The red trace is the variable $g_{Na^+} m^3 h$ as shown in (34.5). 
The blue trace shows the potassium gating term $g_{K^+} n^4$, also from (34.5). (d) The state of each of the gating variables 
m, n, and h, where the value 1 represents fully open and 0 represents fully closed or inactivated channels. This simulation 
was conducted within the NEURON simulation environment with a single compartment of area 29 000 µm², initialized at 
−50 mV, with physiological concentrations of calcium, chloride, potassium, and sodium ions 


into and out of the cell. One of the most important ions 
is potassium (K+), which conducts current based on 
the following relationship, $i_K = (\gamma_K \times u) - (\gamma_K \times E_K) = \gamma_K \times (u - E_K)$, 
where $\gamma_K$ is the conductance of a single potassium channel, $u$ is 
the membrane potential, and $E_K$ is the potassium reversal potential. 
The final term on the right-hand side of the 
equation, $(u - E_K)$, is known as the electromotive (driving) force and can 
be calculated independently for each ionic current. This 
relationship shows how a neuron with both a membrane 
potential and a potassium concentration gradient produces 
a net potassium current. This potassium current 
can be generalized across the entire membrane surface using the 
relationship $g_K = N_K \times \gamma_K$, where $g_K$ is the total potassium 
conductance, $N_K$ is the number of channels open at rest, and 
$\gamma_K$ is the conductance of an individual potassium 
channel. Determination of individual channel conductance 
is performed in a lab setting using the patch 
clamp technique to isolate single ion channels. Once 
isolated, these channels can be tested using pharma- 
cological techniques to determine their single channel 
conductivity [34.18]. These experiments are very sensi- 
tive and must be performed for each channel of interest 


within the model as they cannot be represented without 
accurate biological data. 
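
The two relationships above, the single-channel current and its scaling to the whole membrane, can be sketched numerically. The single-channel conductance of 20 pS and the count of 1000 open channels are hypothetical values chosen only to make the arithmetic concrete; the potassium reversal potential is the value from Table 34.1, and the function name `channel_current` is an assumption of this example.

```python
def channel_current(gamma_S, u_mV, E_mV):
    """Ohmic current i = gamma * (u - E) in amperes (gamma in siemens, u and E in mV)."""
    return gamma_S * (u_mV - E_mV) * 1e-3

gamma_K = 20e-12   # S, hypothetical single-channel conductance (20 pS)
N_K = 1000         # hypothetical number of potassium channels open at rest
E_K = -75.0        # mV, potassium equilibrium potential from Table 34.1
u = -65.0          # mV, example membrane potential

i_single = channel_current(gamma_K, u, E_K)   # current through one channel
g_K = N_K * gamma_K                           # total conductance, g = N * gamma
I_K = channel_current(g_K, u, E_K)            # whole-membrane potassium current

print(f"driving force  = {u - E_K:.1f} mV")
print(f"single channel = {i_single:.2e} A")
print(f"whole membrane = {I_K:.2e} A")
```

With these sign conventions a positive value is an outward current. The current vanishes when the membrane potential equals the reversal potential, and it reverses direction when the membrane potential falls below it.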

Describing a neuron as a capacitor yields interest- 
ing properties that we can infer about neurons from 
the large established set of knowledge regarding capac- 
itance. The innate capacitive nature of the membrane 
actually affects the passage of current and, therefore, is 
relevant to our discussion here. When current is injected 
into our capacitive system, the resulting change in potential is inherently slowed, depending 
upon the time course of the current injection and the 
capacitance of the system, as represented by the relation 
$\Delta u = I \Delta t / C$, where $I$ is the injected current and $\Delta t$ is its duration. Therefore, the magnitude of the change 
in potential across the capacitor depends on the duration 
of the current, presenting a natural latency in 
signal transmission that must be accounted for when 
modeling. 
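
To give a sense of scale for the capacitor description, the following sketch applies $u = Q/C$ and the charging relation for a constant injected current to a single compartment. The specific capacitance of 1 µF/cm² and the area of 29 000 µm² are taken from the text and the caption of Fig. 34.5; the resting potential of −65 mV, the injected current of 0.1 nA, and its 1 ms duration are assumptions made for illustration. Ionic currents are ignored, so this describes a purely capacitive membrane.

```python
E_CHARGE = 1.602176634e-19        # elementary charge, C

c_specific = 1e-6                 # F/cm^2 (about 1 uF/cm^2, as quoted in the text)
area_cm2 = 29_000 * 1e-8          # 29 000 um^2 expressed in cm^2 (1 um^2 = 1e-8 cm^2)
c_total = c_specific * area_cm2   # total membrane capacitance, F

# Charge separated across the membrane at an assumed potential of -65 mV: Q = C * u
u_rest = -65e-3                   # V
q_rest = c_total * u_rest         # C
n_charges = abs(q_rest) / E_CHARGE

# Voltage change produced by a constant current on a purely capacitive membrane
i_inj = 0.1e-9                    # A, hypothetical injected current
duration = 1e-3                   # s, hypothetical duration
delta_u = i_inj * duration / c_total

print(f"total capacitance = {c_total * 1e12:.0f} pF")
print(f"charge at rest    = {q_rest:.3e} C ({n_charges:.2e} elementary charges)")
print(f"delta u           = {delta_u * 1e3:.3f} mV")
```

Doubling either the current or its duration doubles the voltage change, while a larger capacitance reduces it, which is the latency effect described above.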


34.3.2 Early Empirical Models — 
Hodgkin and Huxley 


Investigation into the structure of the brain began with 
improvements to the light microscope in the early 


twentieth century. As characterization of cellular types 
progressed rapidly, understanding of cellular physiology 
lagged quite far behind. This is largely due to 
the fact that the technology to manipulate individual 
cells had yet to be invented. To go around this problem 
Hodgkin and Huxley used an axon harvested from 
a giant squid as their experimental system. This allowed 
them to easily observe the behavior of the experimental 
preparation while functionally examining changes 
that occurred at the microscopic level. In general, this 
model also describes the segment of neuron as a capacitor 
with x ionic currents and applied current as shown 
here 

$$C \frac{du}{dt} = -\sum_x I_x(t) + I(t). \tag{34.4}$$

This model uses three ionic current components including 
potassium, sodium, and a non-specific leak current 
that are described below 

$$\sum_x I_x = g_{Na^+} m^3 h \,(u - E_{Na^+}) + g_{K^+} n^4 (u - E_{K^+}) + g_L (u - E_L). \tag{34.5}$$

Each of the terms above represents a specific ion with 
a gating variable that has been experimentally fitted 
to display the characteristics of each specific channel. 
A simulation of these parameters in a model cell expressing 
the standard Hodgkin-Huxley type channels 
is displayed in Fig. 34.5. In this case, the sodium current 
is determined using two different gating variables 
because it has two distinct phases, activation and inactivation. 
During the inactivation phase the conduction 
pore of the channel is blocked by a string of intracellular 
positive amino acids that literally plug the channel 
closed. This is distinct from potassium or leak channels 
that do not have an inactivation mechanism built into 
the protein. The differential form of each gating variable 
for the Hodgkin-Huxley model is shown below 
(Tables 34.2 and 34.3) in terms of the experimentally 
determined parameters $\alpha$ and $\beta$ 

$$\frac{dm}{dt} = \alpha_m(u)(1 - m) - \beta_m(u)\,m, \quad \frac{dn}{dt} = \alpha_n(u)(1 - n) - \beta_n(u)\,n, \quad \frac{dh}{dt} = \alpha_h(u)(1 - h) - \beta_h(u)\,h. \tag{34.6}$$

These values were obtained during initial laboratory 
experimentation by Hodgkin and Huxley using the giant 
squid axon in vitro. These results have been adjusted 
to set the resting membrane potential at a value of 0 mV 
instead of the typical −65 mV as is seen in many modern 
interpretations. 

This model has been expanded and interpreted for 
many systems outside the giant squid axon, and a general 
form for determining gating variables is shown 
here 

$$\frac{d\theta}{dt} = -\frac{1}{\tau_\theta(u)} \left[\theta - \theta_0(u)\right]. \tag{34.7}$$

In this differential form, $\theta$ represents a particular gating 
variable of interest. When the membrane voltage $u$ is 
fixed at a certain value then $\theta$ approaches $\theta_0(u)$ with 
a time constant represented by $\tau_\theta(u)$. These values can 
be calculated using the transformation equations shown 
below 

$$\theta_0(u) = \frac{\alpha_\theta(u)}{\alpha_\theta(u) + \beta_\theta(u)}, \qquad \tau_\theta(u) = \frac{1}{\alpha_\theta(u) + \beta_\theta(u)}. \tag{34.8}$$

Table 34.2 Parameters associated with dynamics of ions 
within biological membranes, specifically of the giant 
squid. The first column shows the gating variables associated 
with each ionic species. The center column is the 
equilibrium potential that can be found for each ion within 
the system and is shown in millivolts. On the right the overall 
conductance of each ionic species within the model of 
action potential generation and membrane potential maintenance 
is shown 

Ion species and gating variables (θ)   Reversal potential E_θ (mV)   Conductance g_θ (mS/cm²)
Na+ (θ = m, h)                         115                           120
K+ (θ = n)                             −12                           36
L (no gating)                          10.6                          0.3

Table 34.3 Gating variables with specific relationships 
that describe the particular function of each of the gating 
variables 

Gating variable θ   α_θ(u)                                 β_θ(u)
n                   (0.1 − 0.01u) / [exp(1 − 0.1u) − 1]    0.125 exp(−u/80)
m                   (2.5 − 0.1u) / [exp(2.5 − 0.1u) − 1]   4 exp(−u/18)
h                   0.07 exp(−u/20)                        1 / [exp(3 − 0.1u) + 1]
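
The Hodgkin-Huxley equations (34.4) to (34.8) can be integrated numerically with very little code. The sketch below is a minimal illustration, not the NEURON implementation used for Fig. 34.5: it uses a forward-Euler scheme with an assumed step of 0.01 ms, an assumed membrane capacitance of 1 µF/cm², and an assumed constant applied current of 10 µA/cm². The reversal potentials, conductances, and rate functions are those of Tables 34.2 and 34.3, with rest at 0 mV. All function names are hypothetical.

```python
import math

C_M = 1.0                                   # uF/cm^2 (assumed)
G = {"Na": 120.0, "K": 36.0, "L": 0.3}      # mS/cm^2, Table 34.2
E = {"Na": 115.0, "K": -12.0, "L": 10.6}    # mV, Table 34.2 (rest at 0 mV)

def _x_over_expm1(x):
    """x / (exp(x) - 1), using the limit value 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else x / math.expm1(x)

# (alpha, beta) rate functions in 1/ms from Table 34.3; u in mV relative to rest
RATES = {
    "n": (lambda u: 0.1 * _x_over_expm1(1.0 - 0.1 * u),
          lambda u: 0.125 * math.exp(-u / 80.0)),
    "m": (lambda u: _x_over_expm1(2.5 - 0.1 * u),
          lambda u: 4.0 * math.exp(-u / 18.0)),
    "h": (lambda u: 0.07 * math.exp(-u / 20.0),
          lambda u: 1.0 / (math.exp(3.0 - 0.1 * u) + 1.0)),
}

def steady_state(name, u):
    """Return (theta_0(u), tau_theta(u)) as defined in (34.8)."""
    a, b = RATES[name][0](u), RATES[name][1](u)
    return a / (a + b), 1.0 / (a + b)

def simulate(i_applied, t_stop=50.0, dt=0.01):
    """Forward-Euler integration of (34.4)-(34.6); returns u(t) in mV."""
    u = 0.0
    gate = {k: steady_state(k, u)[0] for k in RATES}   # start gates at rest
    trace = []
    for _ in range(int(round(t_stop / dt))):
        m, n, h = gate["m"], gate["n"], gate["h"]
        i_ion = (G["Na"] * m**3 * h * (u - E["Na"])
                 + G["K"] * n**4 * (u - E["K"])
                 + G["L"] * (u - E["L"]))
        du = (-i_ion + i_applied) / C_M
        for k, (alpha, beta) in RATES.items():
            gate[k] += dt * (alpha(u) * (1.0 - gate[k]) - beta(u) * gate[k])
        u += dt * du
        trace.append(u)
    return trace

def count_spikes(trace, threshold=50.0):
    """Count upward crossings of the threshold (mV)."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < threshold <= b)

rest = simulate(0.0)
driven = simulate(10.0)
print("peak |u| without input:", max(abs(v) for v in rest))
print("spikes with input     :", count_spikes(driven))
```

Forward Euler is chosen here only for brevity. Production simulators typically use implicit or exponential integration schemes for the gating variables, which tolerate larger time steps.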




34.3.3 Compartmental Modeling - 
Anatomical Reduction 


Consider a simple case of a neuron with a spherical cell 
body. The time course of the potential change across the 
membrane can be described using the following 


$$\Delta u(t) = I_m R_m \left(1 - e^{-t/\tau}\right). \tag{34.9}$$

In the above relationship $I_m$ and $R_m$ are the membrane 
current and resistance, respectively. The rightmost term 
contains $e$ raised to the negative of time divided by the 
membrane time constant $\tau$. 

Spatially, the current in our spherical cell body will 
decay along the length of the membrane according to 
the relationship below 


$$\Delta u(x) = \Delta u_0 \, e^{-x/\omega}. \tag{34.10}$$

Above we can see that the spatial decay of potential relies 
upon the potential where the current was initially 
injected, $\Delta u_0$; $x$ is the distance from the site of the current 
injection, and $\omega$ is the membrane length constant. 
This constant is defined as $\omega = \sqrt{r_m / r_a}$, where $r_m$ is 
the membrane resistance and $r_a$ is the axial resistance 
along the length of the compartment. The length con- 
stant is also dependent upon the radius of the segment 
with large axons conducting current more easily than 
smaller axons. In addition, each cell has individualized 
values of membrane and axial resistance that are based 
on the particular distribution of protein and cellular or- 
ganelles in each experimental scenario. 
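
The temporal and spatial relationships (34.9) and (34.10) can be explored with a few lines of code. All numerical values below (membrane current, resistances, and time constant) are hypothetical and chosen only so that the results are easy to verify by hand; the function names are likewise assumptions of this sketch.

```python
import math

def charging(t_ms, i_m, r_m, tau_ms):
    """Potential change after t_ms of constant current, per (34.9)."""
    return i_m * r_m * (1.0 - math.exp(-t_ms / tau_ms))

def length_constant(r_m, r_a):
    """Membrane length constant omega = sqrt(r_m / r_a)."""
    return math.sqrt(r_m / r_a)

def spatial_decay(x, du0, omega):
    """Potential change at distance x from the injection site, per (34.10)."""
    return du0 * math.exp(-x / omega)

# Hypothetical values: 0.1 nA into 100 MOhm gives 10 mV at steady state
i_m, r_m_total, tau = 0.1e-9, 100e6, 10.0          # A, Ohm, ms
for t in (1.0, 10.0, 50.0):
    print(f"t = {t:4.0f} ms: du = {charging(t, i_m, r_m_total, tau) * 1e3:.2f} mV")

# Hypothetical per-unit-length resistances: r_m in Ohm*cm, r_a in Ohm/cm
omega = length_constant(1e5, 1e7)                  # cm
for x in (0.0, omega, 3 * omega):
    print(f"x = {x:.2f} cm: du = {spatial_decay(x, 10.0, omega):.2f} mV")
```

After one time constant the potential has reached about 63% of its steady-state value, and one length constant away from the injection site it has fallen to about 37% of its initial value. A larger membrane resistance or a smaller axial resistance lengthens the distance over which a signal spreads passively.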


34.3.4 Cable Equations - 
Connecting Compartments 


One of the most convenient ways of modeling a neu- 
ron is by simplifying its neural structure. This can be 
done by approximating the shape of parts of the cell 
surface as cylinders (Fig. 34.6), since this is very close 
to the real shape of a neuronal process. By starting with 
a simple cylinder, we can consider this as the fundamen- 
tal unit for computation where quantities regulating the 
system will be derived and repeated along the length 
of the model neuron [34.19]. The one-dimensional ca- 
ble equation can effectively describe the propagation of 
current along a length of cylinder that does not branch 
as shown below 


$$\frac{\partial V}{\partial T} + F = \frac{\partial^2 V}{\partial X^2}. \tag{34.11}$$

In the above, $V$ and $F$ are both functions 
of time and space, represented by $T$ and $X$, respectively. 
Since most neurons are branched we can consider 
a neuron to be composed of many one-dimensional 
elements that can be arranged to form the branching 
structure. The boundary conditions of each cylinder are 
used to calculate specific values within each region of 
membrane. To derive this relationship one must first 
consider conservation of charge as shown here, 


$$\sum i_a - \int_A i_m \, dA = 0,$$

where the leftmost term is the sum of axial currents 
entering the section and $i_m$ represents the 
transmembrane current density, integrated over the entire segment area $A$. 
Expanding this description to include an electrode 
current source $i_s$ we obtain the following relationship, 

$$\sum i_a - \int_A i_m \, dA + \int_A i_s \, dA = 0.$$


The electrode current can be simplified to a point source 
of current due to the fact that most electrode current 
sources are much smaller than the cell itself. 

To simplify our model we can split the neuron up 
into compartments, where compartment $j$ has membrane 
area $A_j$ and membrane current density $i_{m_j}$. All the 
properties of the compartment can be represented as 
the average at the middle of the compartment, as shown 
here, $i_{m_j} A_j = \sum_k i_{a_{kj}}$. The current between compartments 
can be defined using Ohm’s law, where the 
voltage drop between compartments is divided by 
the axial resistance between compartmental centers, 
$i_{a_{kj}} = (v_k - v_j)/r_{jk}$. Therefore, our compartment can 
be described using the following relationship, 
$i_{m_j} A_j = \sum_k (v_k - v_j)/r_{jk}$. The total membrane current can then 
be found using this expression, 

$$i_{m_j} A_j = c_j \frac{dv_j}{dt} + i_{ion_j}(v_j, t),$$

where $c_j$ is the compartmental capacitance and $i_{ion_j}(v_j, t)$ 
is a function that captures the varying values of ion 
channel conductance in the membrane. 

A set of branched cables can be constructed from 
individual segments to yield a set of differential equations 
that follow the form shown in (34.12) below 

$$c_j \frac{dv_j}{dt} + i_{ion_j}(v_j, t) = \sum_k \frac{v_k - v_j}{r_{jk}}. \tag{34.12}$$




[Figure 34.6 labels: $\sum i_a - \int_A i_m \, dA = 0$; membrane; $c_j = C_m \pi d \Delta x$; $r_j = R_a \Delta x / (\pi (d/2)^2)$]


Fig. 34.6 Generalized diagram of a single compartment 
representing a segment of a neuron in a biophysical model 
using cable equations. Below, three compartments are 
shown to indicate axial and membrane current components 
that are represented by their summed values at the center 
of each compartment [34.6] 
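
The compartmental balance (34.12) can be integrated numerically. The following is a minimal Python sketch, not the chapter's own implementation. It assumes a passive leak current g_j (v_j - E) as a stand-in for the ionic current, forward-Euler time stepping, an added electrode current term, and arbitrary consistent units. The function name and all parameter values are illustrative.

```python
def simulate_compartments(c, r, g_leak, e_leak, i_inj, v0, dt, steps):
    """Forward-Euler integration of
        c_j dv_j/dt + i_ion_j = sum_k (v_k - v_j) / r_jk + i_inj_j.

    c      : compartment capacitances c_j
    r      : dict {(j, k): r_jk} of axial resistances between connected compartments
    g_leak : leak conductances; i_ion_j = g_leak[j] * (v_j - e_leak) (assumed passive)
    i_inj  : electrode current injected into each compartment
    """
    n = len(c)
    neighbors = [[] for _ in range(n)]
    for (j, k), r_jk in r.items():
        neighbors[j].append((k, r_jk))
        neighbors[k].append((j, r_jk))
    v = list(v0)
    for _ in range(steps):
        dv = []
        for j in range(n):
            # Axial current entering compartment j from all connected neighbors.
            axial = sum((v[k] - v[j]) / r_jk for k, r_jk in neighbors[j])
            i_ion = g_leak[j] * (v[j] - e_leak)
            dv.append((axial - i_ion + i_inj[j]) / c[j])
        v = [v_j + dt * dv_j for v_j, dv_j in zip(v, dv)]
    return v


# Unbranched chain of three compartments, current injected into compartment 0.
v_final = simulate_compartments(
    c=[1.0, 1.0, 1.0],
    r={(0, 1): 1.0, (1, 2): 1.0},
    g_leak=[0.1, 0.1, 0.1],
    e_leak=0.0,
    i_inj=[0.341, 0.0, 0.0],
    v0=[0.0, 0.0, 0.0],
    dt=0.01,
    steps=20000,
)
```

With these illustrative values the steady-state potentials fall off with distance from the injection site (about 1.31, 1.10, and 1.00), which is the qualitative behavior expected of a leaky cable.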


In order for the equations above to be completely valid 
we must assume that the axial current flow between 
various compartments can be closely approximated by 
calculating the value of the current at the center of each 
compartment. This implies that the current can vary lin- 
early across compartments with the compartment size 
chosen specifically to account for any spatial variations 
that may exist within the experimental system, 
$$c_j \frac{dv_j}{dt} + i_{\mathrm{ion},j}(v_j, t) = \frac{v_{j-1} - v_j}{r_{j,j-1}} + \frac{v_{j+1} - v_j}{r_{j,j+1}} . \tag{34.13}$$


The above is a special case of (34.12), which was
discussed in [34.5], where specific attention is paid to the
axial current in adjacent compartments when a uniformly
distributed current passes through an initial neuronal
segment with constant diameter [34.20]. Compartments
have length $\Delta x$ and diameter $d$. The capacitance can
be written as $C_m \pi d \Delta x$ and the axial resistance of each
compartment is defined as $R_a \Delta x / (\pi (d/2)^2)$, where
$C_m$ is the specific membrane capacitance and $R_a$ is the
specific cytoplasmic resistivity. Manipulation of (34.13)
using the above consideration then yields the following


$$C_m \frac{dv_j}{dt} + i_j(v_j, t) = \frac{d}{4 R_a} \, \frac{v_{j+1} - 2 v_j + v_{j-1}}{\Delta x^2} . \tag{34.14}$$

Here the total ionic current specified above has been
replaced with $i_j(v_j, t)$, a term that expands our
consideration of injected ions by using a current density
function. Now consider the case where the compartment
size becomes infinitesimally small; the right-hand side
then reduces to the second partial derivative of membrane
potential with respect to the distance from the
compartment of interest j. This reduction yields (34.15) below


$$C_m \frac{\partial v}{\partial t} + i(v, t) = \frac{d}{4 R_a} \, \frac{\partial^2 v}{\partial x^2} . \tag{34.15}$$


After multiplying both sides by the membrane resistance
and with a simple application of Ohm's law we
can see that $i R_m = v$ and, therefore, we find


$$C_m R_m \frac{\partial v}{\partial t} + v = \left( \frac{d R_m}{4 R_a} \right) \frac{\partial^2 v}{\partial x^2} . \tag{34.16}$$


This relationship can be scaled using the constants for
time, $\tau_m = R_m C_m$, and space, $\lambda = \sqrt{r_m / r_a}$,
respectively, where $r_m$ and $r_a$ are the membrane and axial
resistances per unit length, so that $\lambda^2 = d R_m / (4 R_a)$.
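
As a worked illustration of the quantities above, the sketch below computes the compartment capacitance, the axial resistance, the time constant, and the space constant implied by the coefficient of the spatial term in (34.16). It is a hedged example: the function name and the numerical values are assumptions chosen only to be of a plausible order of magnitude, not values from the chapter.

```python
import math


def cable_constants(C_m, R_m, R_a, d, dx):
    """Per-compartment and cable-level constants for a uniform cylinder.

    C_m : specific membrane capacitance (F/cm^2)
    R_m : specific membrane resistance (ohm cm^2)
    R_a : specific cytoplasmic resistivity (ohm cm)
    d   : diameter (cm);  dx : compartment length (cm)
    """
    c_j = C_m * math.pi * d * dx                  # compartment capacitance (F)
    r_j = R_a * dx / (math.pi * (d / 2) ** 2)     # axial resistance (ohm)
    tau_m = R_m * C_m                             # time constant (s)
    lam = math.sqrt(d * R_m / (4.0 * R_a))        # space constant (cm)
    return c_j, r_j, tau_m, lam


# Illustrative values only (assumed, not from the chapter).
c_j, r_j, tau_m, lam = cable_constants(C_m=1e-6, R_m=1e4, R_a=100.0, d=2e-4, dx=1e-3)

# Equivalent per-unit-length form of the space constant: sqrt(r_m / r_a).
r_m_per_len = 1e4 / (math.pi * 2e-4)                  # ohm cm
r_a_per_len = 100.0 / (math.pi * (2e-4 / 2) ** 2)     # ohm / cm
lam_alt = math.sqrt(r_m_per_len / r_a_per_len)
```

For these assumed values the time constant is 10 ms and the space constant is roughly 0.07 cm. A common rule of thumb is to choose the compartment length well below the space constant.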


34.4 Reducing Computational Complexity for Large Network Simulations 


34.4.1 Reducing Computational 
Complexity — Large Scale Models 


The Hodgkin-Huxley model [34.21] sets the foundation
for mathematically modeling the detailed temporal
dynamics of how action potentials in neurons are
initiated and propagated. The set of nonlinear ordinary
differential equations of the form (34.4)-(34.6) was
developed to describe the electrical characteristics of
the squid giant axon. Given the many different membrane
currents that may be involved in the firing of
an action potential, the Hodgkin-Huxley model is
the simplest possible representation of neuronal
dynamics that still realistically captures the biophysical
relationship between the voltage and time dependence of
cell membrane currents. Even so, it is a daunting task
to study a large neural network based on interconnected
neurons, each of which is modeled by Hodgkin-Huxley
equations. The effort made in [34.22, 23] is a good
illustration of the inherent challenges. Even at the
single-neuron level, recent studies have shown that it is
easier to tune parameters in a less biophysically
realistic model under the general scope of
integrate-and-fire, or threshold, models, which approximate the
pulse-like electrical activity as a threshold process. In
other words, such models are less sensitive to their
parameters and thus provide more robust and accurate
model fitting results given profiles of injected
current waveforms [34.24-26]. These threshold models
are easy to work with, but they are phenomenological
approaches to modeling true neural behavior. They
cannot be used to study the membrane voltage profile
over a precise time course, nor to assess how
environmental parameters such as temperature change,
chemical environment change, and pharmacological
manipulations of the ion channels affect the membrane
dynamics.
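
To make the notion of a threshold model concrete, the following is a minimal sketch of a leaky integrate-and-fire neuron. It is a generic textbook form, not a model taken from [34.24-26]; the function name, parameter values, and units are assumptions for illustration.

```python
def simulate_lif(current, dt=0.1, tau_m=10.0, r_m=10.0,
                 v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0):
    """Leaky integrate-and-fire neuron integrated with forward Euler.

    current : input current for each time step (nA)
    dt, tau_m in ms; r_m in megaohm; voltages in mV.
    Returns the list of spike times in ms.
    """
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(current):
        v += (dt / tau_m) * (-(v - v_rest) + r_m * i_in)
        if v >= v_thresh:            # a threshold crossing is treated as a spike
            spike_times.append(step * dt)
            v = v_reset              # the membrane potential is reset afterwards
    return spike_times


weak = simulate_lif([1.0] * 5000)    # steady state stays below threshold
strong = simulate_lif([3.0] * 5000)  # drives regular firing
```

The model has only a handful of parameters, which is what makes fitting easy. The spike itself is reduced to a reset, so the voltage trajectory during an action potential is not represented.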

Given the many challenges of mathematical mod- 
eling of realistic neurons and neuronal networks under 
fine spatial and temporal resolution, great efforts have 
been made on several fronts to advance the study 
of neural network dynamic behaviors. Common to 
all approaches, the role of time in neuronal activities 
is emphasized and, thus, the models are usually de- 
scribed by nonlinear ordinary differential equations. 
In the following, we will examine some of these dy- 
namic models that are built on different premises and 
considerations. 


34.4.2 Firing Rate-Based Dynamic Models 


The neural firing rate-based encoding scheme assumes 
that information about environmental stimulus is con- 
tained in the firing rates of the neurons. Thus, the 
specific spike times are under-represented. Sufficient 
evidence points out that in most sensory systems, the 
firing rate increases, generally non-linearly, with in- 
creasing stimulus intensity, and measurement of firing 
rates has become a standard tool for describing the 
properties of all types of sensory or cortical neurons, 
partly due to the relative ease of measuring rates ex- 
perimentally. However, this approach neglects all the 
information possibly contained in the exact timing of 
the spikes [34.1]. Rate coding may be inefficient, but it
is robust and easy to measure and thus has been
used as a standard or basic tool for studying sensory or
cortical neuron characteristics in association with
external stimuli or behaviors. The class of firing rate-based
dynamic models mainly takes into account two
considerations. First, these models account for a population of
neurons in the model. As such, these models aim at
simulating large-scale neural network behaviors. Second,
these models were motivated by associative memory 
processes where the time reflects the memory recall 
process. 

Consider a population of neurons, and let $r_i(t)$ denote
the mean firing rate of a target neuron i, and $r_j(t)$,
$j \in N_i = \{ j \mid j \text{ is presynaptic to } i \}$, the mean
firing rates of all neurons presynaptic to neuron i. Let
$h_i(t)$ be the input to target neuron i, which is


$$h_i(t) = \sum_{j \in N_i} w_{ij} \, r_j(t) . \tag{34.17}$$

Equation (34.17) takes into account all presynaptic
neurons' contributions weighted by synaptic efficacy $w_{ij}$.
A representative class of the firing rate model, studied
extensively in the artificial neural networks community
and first popularized by Hopfield [34.27], can
then be described at the fixed point of an associative
memory process as


$$r_i(t) = \Theta \Big[ \sum_{j \in N_i} w_{ij} \, r_j(t) \Big] . \tag{34.18}$$

In (34.18), $\Theta(\cdot)$ is considered a gain function.
Consequently, the firing rate dynamics associated with this
associative memory process can be defined by introducing
a time constant $\tau$ in the associative network as


$$\tau \frac{dr_i}{dt} = -r_i(t) + \Theta \Big[ \sum_{j \in N_i} w_{ij} \, r_j(t) \Big] . \tag{34.19}$$


The firing rate model (34.19) also has another
interesting interpretation where the mean firing rate $r_i(t)$ is
considered to be the spatially averaged neural potential
$F_i(t)$ of neuron i due to contributions from a local
population of neurons
$j \in N_i = \{ j \mid j \text{ is presynaptic to } i \}$. As
such, (34.19) becomes


$$\tau \frac{\partial F_i(t)}{\partial t} = -F_i(t) + \Theta \Big[ \sum_{j \in N_i} w_{ij} \, F_j(t) \Big] . \tag{34.20}$$


The model described by (34.20) was used for the 
analysis of a large neural network where slow neural 
dynamics were assumed in order to describe spatially 
homogeneous motoneurons [34.28]. 
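
The rate dynamics (34.19) relax toward the fixed point (34.18). The sketch below integrates them with forward Euler. The logistic gain function, the weight matrix, and the function names are assumptions made for illustration; the chapter does not fix a particular gain function.

```python
import math


def logistic(h):
    """Example gain function Theta (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-h))


def simulate_rates(w, r0, tau, dt, steps, gain=logistic):
    """Forward-Euler integration of tau dr_i/dt = -r_i + gain(sum_j w[i][j] r_j)."""
    r = list(r0)
    n = len(r)
    for _ in range(steps):
        h = [sum(w[i][j] * r[j] for j in range(n)) for i in range(n)]
        r = [r[i] + (dt / tau) * (-r[i] + gain(h[i])) for i in range(n)]
    return r


# Two mutually excitatory units (weights are illustrative).
w = [[0.0, 2.0],
     [2.0, 0.0]]
r_final = simulate_rates(w, r0=[0.1, 0.1], tau=1.0, dt=0.01, steps=5000)
```

After the transient, each rate satisfies the fixed-point condition: the rate equals the gain function applied to the weighted presynaptic input.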




34.4.3 Spike Response Model 


The spike response model [34.29] uses response kernels
to account for the integral effect of presynaptic
action potentials. With two linear kernels and under
a simple renewal assumption, it can be shown that the
spike response model (SRM) is a generalization of the
integrate-and-fire neuron. The spike response model
describes the membrane potential $u_i(t)$ of neuron i as


$$u_i(t) = \eta(t - \hat{t}_i) + \int_0^{\infty} \kappa(t - \hat{t}_i, s) \, I^{\mathrm{ext}}(t - s) \, ds , \tag{34.21}$$


where $\eta$ represents the typical form of an action
potential, which includes both depolarization and
repolarization, as well as the process of settling down to the
resting potential; $\hat{t}_i$ is the time at which neuron i
fired an action potential. Also in the equation, the kernel
$\kappa(t - \hat{t}_i, s)$ is a linear impulse response function of the
membrane potential to a unit input current. It can be
pictured as the time course of an additive contribution to
the membrane potential of neuron i after its action
potential fired at $\hat{t}_i$. The term $I^{\mathrm{ext}}(t - s)$ accounts
for all external driving currents.

According to the spike response model, an action
potential is fired if the membrane potential $u_i(t)$
crosses the threshold $\theta(t - \hat{t}_i)$ from below, where it
is noted that this threshold value is a function of
$(t - \hat{t}_i)$. The use of a dynamic threshold in a
phenomenological neuron model proved to be an important
contributor to the success of SRM in spike-time
prediction under random conductance injection [34.30].
Indeed, SRM significantly outperformed a standard leaky
integrate-and-fire model when tested on the same
experimental data. In [34.14], the authors performed an
analytical reduction from the full conductance-based model
to the spike response model for fast-spiking neurons.
After estimating the three parameters in the SRM,
namely the kernels $\eta(t - \hat{t}_i)$ and $\kappa(t - \hat{t}_i, s)$ and the
dynamic threshold $\theta(t - \hat{t}_i)$, the authors show that the
full conductance-based model of a fast-spiking neuron
is well approximated by a single-variable SRM.


34.5 Conclusions


In this chapter we have provided an introduction to
some established modeling approaches to studying
biological neural systems. Motivations behind these
models are twofold. First and foremost, biological realism
is considered to be of the utmost importance. Given
the complex nature of a biological neuron, reduced-order
neuronal models that can be or have been validated
by the Hodgkin-Huxley model or biological data
have been developed. As discussed, these models only
scratch the surface of providing an accurate and realistic
account of a real neural system, not even a specific
brain area or something capable of explaining a
behavioral parameter completely and thoroughly.
Nonetheless, this decade has probably seen the most progress in
terms of computational modeling approaches to
understanding the brain. The International Neuroinformatics
Coordinating Facility (INCF) was established about 10
years ago with a focus on coordinating and promoting
neuroinformatics research activities. Its activities
are based on maintaining database and computational
infrastructure to support neuroscience research
and applications. The grand-scale and ambitious neural
modeling project of simulating a human brain in the
next 10 years, led by Henry Markram, is another example
of the urgency and timeliness of studying the brain
by using advanced computing machinery. The wonders
of new technology have certainly provided us with the
tools critical to pondering some of the most challenging
questions about the brain. An enormous amount
of work has yet to be done to fill the huge gap between
a single neuron and its model proposed by Hodgkin and
Huxley and the true understanding of the human brain.


References


34.1 E.R. Kandel, J.H. Schwartz, T.M. Jessell, S.A. Siegelbaum,
A.J. Hudspeth: Principles of Neural Science,
5th edn. (McGraw-Hill, New York 2013)
34.2 M.L. Hines, N.T. Carnevale: Neuron: A tool for
neuroscientists, Neuroscientist 7(2), 123-135 (2001)


34.3 N. Baumann, D. Pham-Dinh: Biology of oligodendrocyte
and myelin in the mammalian central
nervous system, Physiol. Rev. 81(2), 871-927 (2001)
34.4 R. Brette, M. Rudolphy, T. Carnevale, M. Hines,
D. Beeman, J.M. Bower, M. Diesmann, A. Morrison,
P.H. Goodman, F.C. Harris Jr., M. Zirpe,
T. Natschlager, D. Pecevski, A.B. Ermentrout,
M. Djurfeldt, A. Cansner, O. Rochel, T. Vieville,
E. Mulles, A.P. Davison, S. El Boustani, A. Destexhe,
J. Harris, C. Frederick, B. Ermentrout: Simulation
of networks of spiking neurons: A review of tools
and strategies, J. Comput. Neurosci. 23(3), 349-398
(2007)
34.5 E.R. Kandel, J.H. Schwartz: Molecular biology of
learning: Modulation of transmitter release,
Science 218(4571), 433-443 (1982)
34.6 N.T. Carnevale, M.L. Hines: The NEURON Book
(Cambridge Univ. Press, Cambridge 2006)
34.7 J. Chen: A simulation study investigating the impact
of dendritic morphology and synaptic topology
on neuronal firing patterns, Neural Comput.
22(4), 1086-1111 (2010)
34.8 J. Crank, A.B. Crowley: On an implicit scheme for the
isotherm migration method along orthogonal flow
lines in two dimensions, Int. J. Heat Mass Transf.
22(10), 1331-1337 (1979)
34.9 R.J. Douglas, K.A.C. Martin: Recurrent neuronal
circuits in the neocortex, Curr. Biol. 17(13), R496-500
(2007)
34.10 W. Gerstner: Time structure of the activity in neural
network models, Phys. Rev. E 51(1), 738-758 (1995)
34.11 W. Gerstner, R. Naud: How good are neuron models?,
Science 326(5951), 379-380 (2009)
34.12 S. Grillner: The motor infrastructure: From ion
channels to neuronal networks, Nat. Rev. Neurosci.
4(7), 573-586 (2003)
34.13 A.L. Hodgkin, A.F. Huxley: Propagation of electrical
signals along giant nerve fibres, Proc. R. Soc. B
140(899), 177-183 (1952)
34.14 R. Jolivet, T.J. Lewis, W. Gerstner: Generalized
integrate-and-fire models of neuronal activity
approximate spike trains of a detailed model to a high
degree of accuracy, J. Neurophysiol. 92(2), 959-976
(2004)
34.15 K.M. Stiefel, J.T. Sejnowski: Mapping function onto
neuronal morphology, J. Neurophysiol. 98(1),
513-526 (2007)
34.16 C.L. Kutscher: Chemical transmission in the mammalian
nervous system, Neurosci. Biobehav. Rev.
2(2), 123-124 (1978)
34.17 L.F. Abbott: Modulation of function and gated
learning in a network memory, Proc. Natl. Acad. Sci.
USA 87(23), 9241-9245 (1990)
34.18 S.B. Laughlin, T.J. Sejnowski: Communication in
neuronal networks, Science 301(5641), 1870-1874
(2003)
34.19 P. Lledo, G. Gheusi, J. Vincent: Information processing
in the mammalian olfactory system,
Physiol. Rev. 85(1), 281-317 (2005)
34.20 M.L. Hines, N.T. Carnevale: The NEURON simulation
environment, Neural Comput. 9(6), 1179-1209
(1997)
34.21 A.L. Hodgkin, A.F. Huxley: A quantitative description
of membrane current and its application to
conduction and excitation in nerve, Bull. Math.
Biol. 52(1), 25-71 (1990)
34.22 W.W. Lytton: Adapting a feedforward heteroassociative
network to Hodgkin-Huxley dynamics,
J. Comput. Neurosci. 5(4), 353-364 (1998)
34.23 W.W. Lytton: Optimizing synaptic conductance
calculation for network simulations, Neural Comput.
8(3), 501-509 (1996)
34.24 M. Migliore, C. Cannia, W.W. Lytton, H. Markram,
M.L. Hines: Parallel network simulations with NEURON,
J. Comput. Neurosci. 21(2), 119-129 (2006)
34.25 Y. Sun, D. Zhou, A.V. Rangan, D. Cai: Library-based
numerical reduction of the Hodgkin-Huxley neuron
for network simulation, J. Comput. Neurosci.
27(3), 369-390 (2009)
34.26 X. Wang: Decision making in recurrent neuronal
circuits, Neuron 60(2), 215-234 (2008)
34.27 J.J. Hopfield: Neural networks and physical systems
with emergent collective computational abilities,
Proc. Nat. Acad. Sci. USA 79(8), 2554-2558 (1982)
34.28 J.L. Feldman, J.D. Cowan: Large-scale activity
in neural nets I: Theory with applications
to motoneuron pool, Biol. Cybern. 17(1), 29-38
(1975)
34.29 W. Gerstner, W. Kistler: Spiking Neuron Models
(Cambridge Univ. Press, Cambridge 2002)
34.30 R. Jolivet, R. Kobayashi, A. Rauch, R. Naud,
S. Shinomoto, W. Gerstner: A benchmark test for a
quantitative assessment of simple neuron models,
J. Neurosci. Methods 169, 417-424 (2008)




35. Computational Models of Cognitive 
and Motor Control 


Ali A. Minai 


Most of the earliest work in both experimental and
theoretical/computational system neuroscience
focused on sensory systems and the peripheral
(spinal) control of movement. However, over the
last three decades, attention has turned increasingly
toward higher functions related to cognition,
decision making and voluntary behavior. Experimental
studies have shown that specific brain
structures (the prefrontal cortex, the premotor
and motor cortices, and the basal ganglia)
play a central role in these functions, as does the
dopamine system that signals reward during
reinforcement learning. Because of the complexity
of the issues involved and the difficulty of direct
observation in deep brain structures, computational
modeling has been crucial in elucidating the
neural basis of cognitive control, decision making,
reinforcement learning, working memory, and motor
control. The resulting computational models are
also very useful in engineering domains such as
robotics, intelligent agents, and adaptive control.
While it is impossible to encompass the totality
of such modeling work, this chapter provides an
overview of significant efforts in the last 20 years.
It also outlines many of the theoretical issues
underlying this work, and discusses significant
experimental results that motivated the computational
models.


35.1 Overview
35.2 Motor Control
35.2.1 Cortical Representation of Movement
35.2.2 Synergy-based Representations
35.2.3 Computational Models of Motor Control
35.3 Cognitive Control and Working Memory
35.3.1 Action Selection and Reinforcement Learning
35.3.2 Working Memory
35.3.3 Computational Models of Cognitive Control
and Working Memory
35.4 Conclusion
References


35.1 Overview


Mental function is usually divided into three parts:
perception, cognition, and action, the so-called
sense-think-act cycle. Though this view is no longer held
dogmatically, it is useful as a structuring framework for
discussing mental processes. Several decades of theory
and experiment have elucidated an intricate,
multiconnected functional architecture for the brain [35.1, 2],
a simplified version of which is shown in Fig. 35.1.
While all regions and functions shown, and many not
shown, are important, this figure provides a summary
of the main brain regions involved in perception,
cognition, and action. The highlighted blocks in Fig. 35.1 are
discussed in this chapter, which focuses mainly on the
higher level mechanisms for the control of behavior.

The control of action (or behavior) is, in a real 
sense, the primary function of the nervous system. 
While such actions may be voluntary or involuntary, 
most of the interest in modeling has understandably fo- 
cused on voluntary action. This chapter will follow this 
precedent. 

[Figure 35.1: block diagram. Block labels: Cognitive
control & working memory; Prefrontal cortex; Other
frontal cortex areas; Medial temporal lobe; Motor
control; Premotor cortex; SMA; Motor cortex; Sensory
systems (visual, auditory, olfactory, somatosensory,
multimodal); Parietal cortex; Basal ganglia; Cerebellum;
Brainstem; Spinal circuits; Musculoskeletal system]

Fig. 35.1 A general schematic of
primary signal flow in the nervous
system. Many modulatory regions
and connections, as well as several
known connections, are not shown.
The shaded areas indicate the
components covered in this chapter


It is conventional to divide the neural substrates
of behavior into higher and lower levels. The latter
involves the musculoskeletal apparatus of action
(muscles, joints, etc.) and the neural networks of the spinal
cord and brainstem. These systems are seen as
representing the actuation component of the action system,
which is controlled by the higher level system
comprising cortical and subcortical structures. This division
between a controller (the brain) and the plant (the
body and spinal networks), which parallels the models
used in robotics, has been criticized as arbitrary
and unhelpful [35.3, 4], and there has recently been
a shift of interest toward more embodied views of
cognition [35.5, 6]. However, the conventional division is
useful for organizing material covered in this chapter,
which focuses primarily on the higher level systems,
i.e., those above the spinal cord and the brainstem.
The higher level system can be divided further into 
a cognitive control component involving action selec- 
tion, configuration of complex actions, and the learning 
of appropriate behaviors through experience, and a mo- 
tor control component that generates the control signals 
for the lower level system to execute the selected action. 
The latter is usually identified with the motor cortex 
(M1), premotor cortex (PMC), and the supplementary 
motor area (SMA), while the former is seen as involving
the prefrontal cortex (PFC), basal ganglia (BG), the
anterior cingulate cortex (ACC) and other cortical and 
subcortical regions [35.7]. With regard to the genera- 
tion of actions per se, an influential viewpoint for the 
higher level system is summarized by Doya [35.8]. It 
proposes that higher level control of action has three 
major loci: the cortex, the cerebellum, and the BG. Of 
these, the cortex — primarily the M1 — provides a self- 
organized repertoire of possible actions that, when trig- 
gered, generate movement by activating muscles via 
spinal networks, the cerebellum implements fine mo- 
tor control configured through error-based supervised 
learning [35.9], and the BG provide the mechanisms for 
selecting among actions and learning appropriate ones 
through reinforcement learning [35.10-13]. The motor 
cortex and cerebellum can be seen primarily as mo- 
tor control (though see [35.14]), whereas the BG falls 
into the domain of cognitive control and working mem- 
ory (WM). The PFC is usually regarded as the locus 
for higher order choice representations, plans, goals, 
etc. [35.15-18], while the ACC is thought to be in- 
volved in conflict monitoring [35.19-21]. 




35.2 Motor Control 


Given its experimental accessibility and direct rele- 
vance to robotics, motor control has been a primary 
area of interest for computational modeling [35.22-24].
Mathematical, albeit non-neural, theories of motor
control were developed initially within the framework 
of dynamical systems. One of these directions led to 
models of action as an emergent phenomenon [35.3, 
25-33] arising from interactions among preferred coor- 
dination modes [35.34]. This approach has continued to 
yield insights [35.29] and has been extended to multiac- 
tor situations as well [35.33, 35-37]. Another approach 
within the same framework is the equilibrium point 
hypothesis [35.38,39], which explains motor control 
through the change in the equilibrium points of the mus- 
culoskeletal system in response to neural commands. 
Both these dynamical approaches have paid relatively 
less attention to the neural basis of motor control and 
focused more on the phenomenology of action in its 
context. Nevertheless, insights from these models are 
fundamental to the emerging synthesis of action as an 
embodied cognitive function [35.5, 6]. 

A closely related investigative tradition has been
developed from the early studies of gaits and other
rhythmic movements in cats, fish, and other animals
[35.40-45], leading to computational models for central
pattern generators (CPGs), which are neural networks
that generate characteristic periodic activity patterns
autonomously or in response to control signals [35.46].
It has been found that rhythmic movements can be
explained well in terms of CPGs, located mainly in the
spinal cord, acting upon the coordination modes
inherent in the musculoskeletal system. The key insight
to emerge from this work is that a wide range of useful
movements can be generated by modulation of these
CPGs by rather simple motor control signals from the
brain, and feedback from sensory receptors can shape
these movements further [35.43]. This idea was
demonstrated in recent work by Ijspeert et al. [35.47] showing
how the same simple CPG network could produce both
swimming and walking movements in a robotic
salamander model using a simple scalar control signal.
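
The idea that a scalar drive can modulate a CPG can be sketched with two coupled phase oscillators. This is a generic illustration, not the salamander network of [35.47]: the oscillator equations, the coupling strength, the phase lag, and the function name are all assumptions.

```python
import math


def simulate_cpg(drive, phase_lag=math.pi, coupling=2.0, dt=0.001, steps=5000):
    """Two coupled phase oscillators; `drive` (Hz) sets their common frequency.

    d(theta_1)/dt = 2*pi*drive + coupling * sin(theta_2 - theta_1 - phase_lag)
    d(theta_2)/dt = 2*pi*drive + coupling * sin(theta_1 - theta_2 + phase_lag)
    """
    theta_1, theta_2 = 0.0, 0.5   # arbitrary initial phases
    omega = 2.0 * math.pi * drive
    for _ in range(steps):
        d1 = omega + coupling * math.sin(theta_2 - theta_1 - phase_lag)
        d2 = omega + coupling * math.sin(theta_1 - theta_2 + phase_lag)
        theta_1 += dt * d1
        theta_2 += dt * d2
    return theta_1, theta_2


slow = simulate_cpg(drive=1.0)
fast = simulate_cpg(drive=2.0)
# Rhythmic outputs for two antagonistic units at the end of the slow run.
outputs = (math.cos(slow[0]), math.cos(slow[1]))
```

In this sketch the coupling fixes the phase relationship between the two units (here antiphase), while the scalar drive changes only the rhythm's frequency.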

While rhythmic movements are obviously important,
computational models of motor control are often
motivated by the desire to build humanoid or biomorphic
robots, and thus need to address a broader range of
actions, especially aperiodic and/or voluntary movements.
Most experimental work on aperiodic movement
has focused on the paradigm of manual reaching [35.30,
48-64]. However, seminal work has also been done
with complex reflexes in frogs and cats [35.65-72],
isometric tasks [35.73, 74], ball-catching [35.75], drawing
and writing [35.60, 76-81], and postural control [35.71,
72, 82, 83].

A central issue in understanding motor control is the 
degrees of freedom problem [35.84] which arises from 
the immense redundancy of the system — especially in 
the context of multijoint control. For any desired move- 
ment — such as reaching for an object — there are an 
infinite number of control signal combinations from 
the brain to the muscles that will accomplish the task 
(see [35.85] for an excellent discussion). From a con- 
trol viewpoint, this has usually been seen as a problem 
because it precludes the clear specification of an objec- 
tive function for the controller. To the extent that they 
consider the generation of specific control signals for 
each action, most computational models of motor con- 
trol can be seen as direct or indirect ways to address the 
degrees of freedom problem. 


35.2.1 Cortical Representation 
of Movement 


It has been known since the seminal work by Penfield 
and Boldrey [35.86] that the stimulation of specific
locations in the M1 elicits motor responses in particular
locations on the body. This has led to the notion of
a motor homunculus, a map of the body on the M1.
However, the issue of exactly what aspect of movement
is encoded in the responses of individual neurons is
the discovery of population coding by Georgopoulos 
et al. [35.49]. It was found that the activity of spe- 
cific neurons in the hand area of the M1 corresponded 
to reaching movements in particular directions. While 
the tuning of individual cells was found to be rather 
broad (and had a sinusoidal profile), the joint activity of 
many such cells with different tuning directions coded 
the direction of movement with great precision, and 
could be decoded through neurally plausible estima- 
tion mechanisms. Since the initial discovery, population 
codes have been found in other regions of the cor- 
tex that are involved in movement [35.49, 53, 54, 60, 
77-80, 87]. Population coding is now regarded as the 
primary basis of directional coding in the brain, and 
is the basis of most brain—machine interfaces (BMI) 
and brain-controlled prosthetics [35.88, 89]. Neural network
models for population coding have been developed
by several researchers [35.90-93], and population
coding has come to be seen as a general neural
representational strategy with application far beyond 
motor control [35.94]. Excellent reviews are provided 
in [35.95, 96]. Mathematical and computational models 
for Bayesian inference with population codes are dis- 
cussed in [35.97, 98]. 
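
Decoding direction from broadly tuned cells can be illustrated with one common scheme, the population vector. The sketch below is a noise-free toy under stated assumptions: cosine tuning curves, evenly spaced preferred directions, and illustrative baseline and modulation depth values; the function names are hypothetical.

```python
import math


def cosine_tuned_rates(direction, preferred, baseline=10.0, depth=8.0):
    """Firing rate of each cell: baseline + depth * cos(direction - preferred_i)."""
    return [baseline + depth * math.cos(direction - p) for p in preferred]


def population_vector(rates, preferred, baseline=10.0):
    """Decode direction as the angle of the rate-weighted sum of preferred directions."""
    x = sum((r - baseline) * math.cos(p) for r, p in zip(rates, preferred))
    y = sum((r - baseline) * math.sin(p) for r, p in zip(rates, preferred))
    return math.atan2(y, x)


# Eight cells with evenly spaced preferred directions (an illustrative choice).
preferred = [2.0 * math.pi * i / 8 for i in range(8)]
true_direction = 0.7  # radians
rates = cosine_tuned_rates(true_direction, preferred)
decoded = population_vector(rates, preferred)
```

Each cell responds over a wide range of directions, yet the weighted sum recovers the movement direction. With noisy rates the estimate degrades gracefully as more cells are pooled, which is one reason the scheme is attractive for brain-machine interfaces.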

An active research issue in the cortical coding of 
movement is whether it occurs at the level of kine- 
matic variables, such as direction and velocity, or in 
terms of kinetic variables, such as muscle forces and 
joint torques. From a cognitive viewpoint, a kinematic 
representation is obviously more useful, and popula- 
tion codes suggest that such representations are indeed 
present in the motor cortex [35.48, 53,54, 60, 77-80, 
99,100] and PFC [35.15, 101]. However, movement 
must ultimately be constructed from the appropriate 
kinetic variables, i.e., by controlling the forces gener- 
ated by specific muscles and the resulting joint torques. 
Studies have indicated that some neurons in the M1 are 
indeed tuned to muscle forces and joint torques [35.58, 
59, 73, 99, 100, 102, 103]. This apparent multiplicity of 
cortical representations has generated significant debate 
among researchers [35.74]. One way to resolve this 
issue is to consider the kinetic and kinematic represen- 
tations as dual representations related through the con- 
straints of the musculoskeletal system. However, Shah 
et al. [35.104] have used a simple computational model 
to show that neural populations tuned to kinetic or kine- 
matic variables can act jointly in motor control without 
the need for explicit coordinate transformations. 

Graziano et al. [35.105] studied movements elicited 
by the sustained electrode stimulation of specific sites 
in the motor cortex of monkeys. They found that dif- 
ferent sites led to specific complex, multijoint move- 
ments such as bringing the hand to the mouth or lifting 
the hand above the head regardless of the initial posi- 
tion. This raises the intriguing possibility that individual 
cells or groups of cells in the M1 encode goal-directed 
movements that can be triggered as units. The study 
also indicated that this encoding is not open-loop, but 
can compensate — at least to some degree — for varia- 
tion or extraneous perturbations. The M1 and other re- 
lated regions (e.g., the supplementary motor area and 
the PMC) appear to encode spatially organized maps of 
a few canonical complex movements that can be used 
as basis functions to construct other actions [35.105— 
107]. A neurocomputational model using self-organized 
feature maps has been proposed in [35.108] for the rep- 
resentation of such canonical movements. 

In addition to rhythmic and reaching movements,
there has also been significant work on the neural
basis of sequential movements, with the finding that
such neural codes for movement sequences exist in 
the supplementary motor area [35.109-111], cerebel- 
lum [35.112, 113], BG [35.112], and the PFC [35.101]. 
Coding for multiple goals in sequential reaching has 
been observed in the parietal cortex [35.114]. 


35.2.2 Synergy-based Representations 


A rather different approach to studying the construction 
of movement uses the notion of motor primitives, of- 
ten termed synergies [35.63, 115, 116]. Typically, these 
synergies are manifested in coordinated patterns of spa- 
tiotemporal activation over groups of muscles, implying 
a force field over posture space [35.117, 118]. Studies in 
frogs, cats, and humans have shown that a wide range 
of complex movements in an individual subject can be 
explained as the modulated superposition of a few syn- 
ergies [35.63, 65-72, 115,119, 120]. Given a set of n 
muscles, the n-dimensional time-varying vector of ac- 
tivities for the muscles during an action can be written 
as 


$$m^{q}(t) = \sum_{k=1}^{N} c_{k}^{q}\, g_{k}\left(t - t_{k}^{q}\right), \tag{35.1}$$

where $g_{k}(t)$ is a time-varying synergy function that
takes only nonnegative values, $c_{k}^{q}$ is the gain of the
kth synergy used for action q, and $t_{k}^{q}$ is the temporal
offset with which the kth synergy is triggered for
action q [35.69]. The key point is that a broad range
of actions can be constructed by choosing different 
gains and offsets over the same set of synergies, which 
represent a set of hard-coded basis functions for the 
construction of movements [35.120, 121]. Even more 
interestingly, it appears that the synergies found em- 
pirically across different subjects of the same species 
are rather consistent [35.67, 72], possibly reflecting the 
inherent constraints of musculoskeletal anatomy. Var- 
ious neural loci have been suggested for synergies, 
including the spinal cord [35.67, 107, 122], the motor 
cortex [35.123], and combinations of regions [35.85, 
124]. 
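
The superposition of synergies described above can be illustrated with a minimal Python sketch. It builds two different muscle activation patterns from the same pair of synergies by changing only the gains and temporal offsets. The Gaussian burst shapes, the three-muscle setup, and all names (`burst`, `SYNERGIES`, `muscle_activity`) are assumptions chosen for illustration; they are not taken from [35.69].

```python
import math

def burst(center, width):
    """Return a nonnegative time course peaking at `center` (illustrative shape only)."""
    return lambda t: math.exp(-((t - center) / width) ** 2)

# Two hypothetical synergies over n = 3 muscles. Each synergy holds one
# nonnegative time course per muscle, so it is a vector-valued function g_k(t).
SYNERGIES = [
    [burst(0.10, 0.05), burst(0.15, 0.05), burst(0.30, 0.08)],
    [burst(0.20, 0.06), burst(0.05, 0.03), burst(0.10, 0.04)],
]

def muscle_activity(t, gains, offsets, synergies=SYNERGIES):
    """Evaluate m(t) = sum_k c_k * g_k(t - t_k) for every muscle."""
    n_muscles = len(synergies[0])
    m = [0.0] * n_muscles
    for c, t0, g in zip(gains, offsets, synergies):
        for i in range(n_muscles):
            m[i] += c * g[i](t - t0)
    return m

# Two different "actions" built from the same synergies: only the
# gains c_k and the offsets t_k differ.
action_a = muscle_activity(0.25, gains=[1.0, 0.2], offsets=[0.00, 0.10])
action_b = muscle_activity(0.25, gains=[0.3, 1.5], offsets=[0.05, 0.00])
```

Because the gains and the synergy functions are nonnegative, the resulting activations are nonnegative as well, and scaling all gains scales the whole activation pattern by the same factor.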

Though synergies are found consistently in the anal- 
ysis of experimental data, their actual existence in the 
neural substrate remains a topic for debate [35.125, 
126]. However, the idea of constructing complex move- 
ments from motor primitives has found ready appli- 
cation in robotics [35.127—132], as discussed later in 
this chapter. A hierarchical neurocomputational model 
of motor synergies based on attractor networks has re- 
cently been proposed in [35.133, 134]. 




35.2.3 Computational Models 
of Motor Control 


Motor control has been modeled computationally at 
many levels and in many ways, ranging from explicitly 
control-theoretic models through reinforcement-based 
models to models based on emergent dynamical pat- 
terns. This section provides a brief overview of these 
models. 

As discussed above, the M1, the premotor cortex (PMC),
and the supplementary motor area (SMA) are seen as
providing self-organized codes for specific actions, in- 
cluding information on direction, velocity, force, low- 
level sequencing, etc., while the PFC provides higher 
level codes needed to construct more complex actions. 
These codes, comprising a repertoire of actions [35.10, 
106], arise through self-organized learning of activity 
patterns in these cortical systems. The BG system is 
seen as the primary locus of selection among the actions 
in the cortical repertoire. The architecture of the sys- 
tem involving the cortex, BG, and the thalamus, and in 
particular the internal architecture of the BG [35.135], 
makes this system ideally suited to selectively disin- 
hibiting specific cortical regions, presumably activating 
codes for specific actions [35.10, 136, 137]. The BG 
system also provides an ideal substrate for learning 
appropriate actions through a dopamine-mediated rein- 
forcement learning mechanism [35.138—141]. 

Many of the influential early models of motor con- 
trol were based on control-theoretic principles [35.142— 
144], using forward and inverse kinematic and dy- 
namic models to generate control signals [35.55, 57, 
145-150] — see [35.146] for an excellent introduction. 
These models have led to more sophisticated ones, such 
as MOSAIC (modular selection and identification for 
control) [35.151] and AVITEWRITE (adaptive vector 
integration to endpoint handwriting) [35.81]. The MO- 
SAIC model is a mixture of experts, consisting of many 
parallel modules, each comprising three subsystems. 
These are a forward model relating motor commands
to predicted position, a responsibility predictor that es- 
timates the applicability of the current module, and an 
inverse model that learns to generate control signals 
for desired movements. The system generates motor 
commands by combining the recommendations of the 
inverse models of all modules weighted by their appli- 
cability. Learning in the model is based on a variant 
of the EM algorithm. The model in [35.57] is a com- 
prehensive neural model with both cortical and spinal 
components, and builds upon the earlier VITE model 
in [35.55]. The AVITEWRITE model [35.81], which is
a further extension of the VITE model, can generate the
complex movement trajectories needed for writing by 
using a combination of pre-specified phenomenological 
motor primitives (synergies). A cerebellar model for the 
control of timing during reaches has been presented by 
Barto et al. [35.152]. 
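
The responsibility-weighted combination of inverse models in a MOSAIC-like architecture can be sketched as follows. This is a simplified illustration under stated assumptions, not the model of [35.151]: the scalar plant, the `Module` class, and the Gaussian likelihood used to turn forward-model prediction errors into responsibilities are all hypothetical, and the separate responsibility predictor and the EM-based learning are omitted.

```python
import math

class Module:
    """One hypothetical module tuned to a plant of the form x_next = x + gain * u."""

    def __init__(self, gain):
        self.gain = gain

    def forward(self, x, u):
        # Forward model: predict the next position from a motor command.
        return x + self.gain * u

    def inverse(self, x, x_desired):
        # Inverse model: command that would reach x_desired under this module's plant.
        return (x_desired - x) / self.gain

def responsibilities(predictions, observed, sigma=0.1):
    """Normalize Gaussian likelihoods of each module's prediction error (assumed form)."""
    likelihoods = [math.exp(-((p - observed) ** 2) / (2.0 * sigma ** 2)) for p in predictions]
    total = sum(likelihoods)
    if total == 0.0:  # all modules predict very badly: fall back to equal weights
        return [1.0 / len(likelihoods)] * len(likelihoods)
    return [l / total for l in likelihoods]

def combined_command(commands, resp):
    """Blend the inverse-model recommendations, weighted by responsibility."""
    return sum(r * u for r, u in zip(resp, commands))

modules = [Module(0.5), Module(1.0), Module(2.0)]
x, u_prev, true_gain = 0.0, 1.0, 2.0
x_next = x + true_gain * u_prev  # what the (hypothetical) plant actually did
resp = responsibilities([m.forward(x, u_prev) for m in modules], x_next)
u = combined_command([m.inverse(x_next, 3.0) for m in modules], resp)
```

In this example the third module predicts the observed outcome exactly, so it receives almost all of the responsibility and the blended command is essentially its own recommendation.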

The use of neural maps in models of motor con- 
trol was pioneered in [35.153,154]. These models 
used self-organized feature maps (SOFMs) [35.155] to 
learn visuomotor coordination. Baraduc et al. [35.156] 
presented a more detailed model that used multiple 
maps to first integrate posture and desired movement 
direction and then to transform this internal repre- 
sentation into a motor command. The maps in this 
and most subsequent models were based on earlier 
work by [35.90-93]. An excellent review of this ap- 
proach is given in [35.94]. A more recent and com- 
prehensive example of the map-based approach is the 
SURE-REACH (sensorimotor, unsupervised, redun- 
dancy-resolving control architecture) model in [35.157] 
which focuses on exploiting the redundancy inher- 
ent in motor control [35.84]. Unlike many of the 
other models, which use neurally implausible error-
based learning, SURE-REACH relies only on unsu- 
pervised and reinforcement learning. Maps are also 
the central feature of a general cognitive architec- 
ture called ERA (epigenetic robotics architecture) by 
Morse et al. [35.158]. 

Another successful approach to motor control mod- 
els is based on the use of motor primitives, which are 
used as basis functions in the construction of diverse ac- 
tions. This approach is inspired by the experimental ob- 
servation of motor synergies as described above. How- 
ever, most models based on primitives implement them 
nonneurally, as in the case of AVITEWRITE [35.81]. 
The most systematic model of motor primitives has 
been developed by Schaal et al. [35.129-132]. In this 
model, motor primitives are specified using differen- 
tial equations, and are combined after weighting to 
produce different movements. Recently, Matsubara et 
al. [35.159] have shown how the primitives in this 
model can be learned systematically from demonstra- 
tions. Drew et al. [35.123] proposed a conceptual model 
for the construction of locomotion using motor prim- 
itives (synergies) and identified the characteristics of 
such primitives experimentally. A neural model of 
motor primitives based on hierarchical attractor net- 
works has been proposed recently in [35.133, 134, 160], 
while Neilson and Neilson [35.85, 124] have proposed 
a model based on coordination among adaptive neural 
filters. 
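
A motor primitive specified by differential equations can be sketched in a few lines of Python. The particular form used here (a critically damped point attractor driven by a phase-gated forcing term built from weighted Gaussian basis functions) and all parameter values are assumptions made for illustration; the sketch is in the spirit of the approach described above and should not be read as the exact formulation of [35.129-132].

```python
import math

def rollout_primitive(x0, goal, weights, duration=1.0, dt=0.001,
                      stiffness=100.0, phase_decay=4.0, basis_width=50.0):
    """Euler-integrate one primitive and return the position trajectory.

    Assumed dynamics:
        dv/dt = K (goal - x) - D v + f(s),   dx/dt = v,   ds/dt = -a s
    where f(s) is a normalized, weighted mixture of Gaussian basis functions
    of the phase s, multiplied by s so that it vanishes as s decays to zero.
    """
    damping = 2.0 * math.sqrt(stiffness)       # critical damping
    n = len(weights)
    centers = [(i + 1) / n for i in range(n)]  # basis centers in phase space
    x, v, s = x0, 0.0, 1.0
    trajectory = [x]
    for _ in range(int(3 * duration / dt)):    # integrate long enough to settle
        basis = [math.exp(-basis_width * (s - c) ** 2) for c in centers]
        mix = sum(w * b for w, b in zip(weights, basis)) / sum(basis)
        forcing = s * (goal - x0) * mix
        accel = stiffness * (goal - x) - damping * v + forcing
        v += accel * dt
        x += v * dt
        s += -phase_decay * s * dt
        trajectory.append(x)
    return trajectory

# Same start and goal, different weights: different paths, same endpoint.
straight = rollout_primitive(0.0, 1.0, weights=[0.0] * 5)
shaped = rollout_primitive(0.0, 1.0, weights=[0.0, 50.0, -30.0, 0.0, 0.0])
```

The controller in such a scheme only chooses the weights (and the goal); the attractor dynamics guarantee convergence to the goal because the forcing term fades with the phase variable.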




Motor control models based on primitives can be 
simpler than those based on trajectory tracking be- 
cause the controller typically needs to choose only the 
weights (and possibly delays) for the primitives rather 
than specifying details of the trajectory (or forces). 
Among other things, this promises a potential solu- 
tion to the degrees of freedom problem [35.84] since 
the coordination inherent in the definition of motor 
primitives reduces the effective degrees of freedom 
in the system. Another way to address the degrees 
of freedom problem is to use an optimal control ap- 
proach with a specific objective function. Researchers 
have proposed objective functions such as minimum 
jerk [35.161], minimum torque [35.162], minimum ac- 
celeration [35.163], or minimum energy [35.85], but 
an especially interesting idea is to optimize the dis- 
tribution of variability across the degrees of freedom 
in a task-dependent way [35.144, 164-167]. From this 
perspective, motor control trades off variability in task- 
irrelevant dimensions for greater accuracy in task-rele- 
vant ones. Thus, rather than specifying a trajectory, the 
controller focuses only on correcting consequential er- 
rors. This also explains the experimental observation 
that motor tasks achieve their goals with remarkable 
accuracy while using highly variable trajectories to 
achieve the same goal. Trainin et al. [35.168] have 
shown that the optimal control principle can be used 
to explain the observed neural coding of movements in
the cortex. Biess et al. [35.169] have proposed a detailed
computational model for controlling an arm in
three-dimensional space by separating the spatial and 
temporal components of control. This model is based 
on optimizing energy usage and jerk [35.161], but is 
not implemented at the neural level. 
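
To make the idea of an objective function concrete, the sketch below evaluates the fifth-order polynomial that minimizes integrated squared jerk for a one-dimensional point-to-point movement starting and ending at rest. The function names and the example values are assumptions for illustration only.

```python
def minimum_jerk_position(x0, xf, duration, t):
    """Position at time t on the rest-to-rest minimum-jerk path from x0 to xf."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

def minimum_jerk_velocity(x0, xf, duration, t):
    """Time derivative of the position profile above (bell-shaped speed)."""
    tau = min(max(t / duration, 0.0), 1.0)
    return (xf - x0) / duration * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)

# Hypothetical 0.3 m reach lasting 0.5 s, sampled at 11 points.
samples = [minimum_jerk_position(0.0, 0.3, 0.5, 0.05 * i) for i in range(11)]
peak_speed = minimum_jerk_velocity(0.0, 0.3, 0.5, 0.25)
```

The speed profile is symmetric and bell-shaped, with a peak at mid-movement equal to 1.875 times the average speed.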

An alternative to these prescriptive and construc- 
tivist approaches to motor control is provided by mod- 
els based on dynamical systems [35.3, 25-27, 29,31- 
33]. The most important way in which these models 
diverge from the others is in their use of emergence 
as the central organizational principle of control. In 
this formulation, control programs, structures, primitives,
etc., are not preconfigured in the brain-body
system, but emerge under the influence of task and 
environmental constraints on the affordances of the 
system [35.33]. Thus, the dynamical systems view of 
motor control is fundamentally ecological [35.170], 
and like most ecological models, is specified in terms 
of low-dimensional state dynamics rather than high- 
dimensional neural processes. Interestingly, a corre- 
spondence can be made between the dynamical and 
optimal control models through the so-called uncon- 
trolled manifold concept [35.31, 33, 39,171]. In both 
models, the dimensions to be controlled and those 
that are left uncontrolled are decided by external con- 
straints rather than internal prescription, as in classical 
models. 


35.3 Cognitive Control and Working Memory 


A lot of behavior — even in primates — is automatic, or 
almost so. This corresponds to actions (or internal be- 
haviors) so thoroughly embedded in the sensorimotor 
substrate that they emerge effortlessly from it. In contrast,
some tasks require significant cognitive effort for
one or more reasons, including:


1. An automatic behavior must be suppressed to allow 
the correct response to emerge, e.g., in the Stroop 
task [35.172]. 

2. Conflicts between incoming information and/or re- 
called behaviors must be resolved [35.19, 20]. 

3. More contextual information — e.g., social context — 
must be taken into account before acting. 

4. Intermediate pieces of information need to be stored 
and recalled during the performance of the task, 
e.g., in sequential problem solving. 


5. The timing of subtasks within the overall task is 
complex, e.g., in delayed-response tasks or other se- 
quential tasks [35.173]. 


Roughly speaking, the first three fall under the
heading of cognitive control, and the latter two under
that of working memory. However, because the functions
are intimately linked, the terms are often subsumed into
each other.


35.3.1 Action Selection 
and Reinforcement Learning 


Action selection is arguably the central component of 
the cognitive control process. As the name implies, it in- 
volves selectively triggering an action from a repertoire 
of available ones. While action selection is a complex
process involving many brain regions, a consensus has
emerged that the BG system plays a central role in 
its mechanism [35.10, 12,14]. The architecture of the 
BG system and the organization of its projections to 
and from the cortex [35.135, 174, 175] make it ideally 
suited to function as a state-dependent gating system 
for specific functional networks in the cortex. As shown 
in Fig. 35.2, the hypothesis is that the striatal layer of 
the BG system, receiving input from the cortex, acts 
as a pattern recognizer for the current cognitive state. 
Its activity inhibits specific parts of the globus pal- 
lidus (GPi), leading to disinhibition of specific neural 
assemblies in the cortex — presumably allowing the 
behavior/action encoded by those assemblies to pro- 
ceed [35.10]. The associations between cortical activity 
patterns and behaviors are key to the functioning of the 
BG as an action selection system, and the configura- 
tion and modulation of these associations are thought 
to lie at the core of cognitive control. The neurotrans- 
mitter dopamine (DA) plays a key role here by serving 
as a reward signal [35.138—140] and modulating rein- 
forcement learning [35.176, 177] in both the BG and the 
cortex [35.141, 178-180]. 
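
The gating hypothesis can be caricatured in a few lines of Python. The sketch below is a toy under stated assumptions, not a model from the literature cited above: each action has one striatal unit, the GPi channel for an action is tonically active and is inhibited by the striatal response to the current cortical state, and an action is released when its GPi channel falls below a threshold. Dopamine is reduced to a scalar reward prediction error that scales a Hebbian-style weight change. The re-inhibition pathway through the STN is omitted, and all names and numbers are hypothetical.

```python
def striatal_drive(weights, cortical_state):
    """Striatal response: match between each unit's weights and the cortical pattern."""
    return [sum(w * x for w, x in zip(row, cortical_state)) for row in weights]

def gate_actions(weights, cortical_state, tonic_gpi=1.0, threshold=0.5):
    """Return the indices of actions that are disinhibited in the current state."""
    released = []
    for action, drive in enumerate(striatal_drive(weights, cortical_state)):
        gpi = max(tonic_gpi - max(drive, 0.0), 0.0)  # striatum inhibits GPi
        if gpi < threshold:                          # weak GPi output releases the action
            released.append(action)
    return released

def dopamine_update(weights, cortical_state, action, reward, expected, lr=0.1):
    """Strengthen or weaken the state-action association by the prediction error."""
    delta = reward - expected                        # dopamine-like teaching signal
    weights[action] = [w + lr * delta * x
                       for w, x in zip(weights[action], cortical_state)]
    return delta

# Two contexts, two actions. Action 0 is already associated with context [1, 0].
weights = [[0.9, 0.0], [0.0, 0.2]]
before = gate_actions(weights, [0.0, 1.0])           # nothing is released yet
for _ in range(5):                                   # repeated rewarded trials
    dopamine_update(weights, [0.0, 1.0], action=1, reward=1.0, expected=0.0)
after = gate_actions(weights, [0.0, 1.0])            # action 1 is now gated in
```

The point of the sketch is only that selection is implemented as release from tonic inhibition, and that reinforcement changes which cortical states trigger that release.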


35.3.2 Working Memory 


All nontrivial behaviors require task-specific informa- 
tion, including relevant domain knowledge and the 
relative timing of subtasks. These are usually grouped 
under the function of WM. An influential model of 
WM in [35.181] identifies three components in WM: 
(1) a central executive, responsible for attention, de- 
cision making, and timing; (2) a phonological loop, 
responsible for processing incoming auditory infor- 
mation, maintaining it in short-term memory, and re- 
hearsing utterances; and (3) a visuospatial sketchpad, 
responsible for processing and remembering visual in- 
formation, keeping track of what and where informa- 
tion, etc. An episodic buffer to manage relationships 
between the other three components is sometimes in- 
cluded [35.182]. Though already rather abstract, this 
model needs even more generalized interpretation in 
the context of many cognitive tasks that do not directly 
involve visual or auditory data. Working memory func- 
tion is most closely identified with the PFC [35.183- 
185]. 

Almost all studies of WM consider only short- 
term memory, typically on the scale of a few sec- 
onds [35.186]. Indeed, one of the most significant — 
though lately controversial — results in WM research 
is the finding that only a small number of items can
be kept in mind at any one time [35.187, 188]. However,
most cognitive tasks require context-dependent
repertoires of knowledge and behaviors to be enabled 
collectively over longer periods. For example, a player 
must continually think of chess moves and strategies 
over the course of a match lasting several hours. The 
configuration of context-dependent repertoires for ex- 
tended periods has been termed long-term working 
memory [35.189]. 


35.3.3 Computational Models 
of Cognitive Control 
and Working Memory 


Several computational models have been proposed for 
cognitive control, and most of them share common 
features. The issues addressed by the models include 
action selection, reinforcement learning of appropriate 
actions, decision making in choice tasks, task sequenc- 
ing and timing, persistence and capacity in WM, task 
switching, sequence learning, and the configuration of 
context-appropriate workspaces. Most of the models 
discussed below are neural with a range of biological 
plausibility. A few important nonneural models are also 
mentioned. 

A comprehensive model using spiking neurons and 
incorporating many biological features of the BG sys- 
tem has been presented in [35.13, 193]. This model 
focuses only on the BG and explicitly on the dy- 
namics of dopamine modulation. A more abstract but 
broader model of cognitive control is the agents of the 
mind model in [35.14], which incorporates the cere- 
bellum as well as the BG. In this model, the BG 
provide the action selection function while the cerebel- 
lum acts to refine and amplify the choices. A series of 
interrelated models have been developed by O’Reilly, 
Frank et al. [35.17, 179, 194-199]. All these models
use the adaptive gating function of the BG in combi- 
nation with the WM function of the prefrontal cortex 
to explain how executive function can arise without 
explicit top-down control — the so-called homuncu- 
lus [35.196, 197]. A comprehensive review of these and 
other models of cognitive control is given in [35.200]. 
Models of goal-directed action mediated by the PFC 
have also been presented in [35.201] and [35.202]. 
Reynolds and O'Reilly et al. [35.203] have proposed 
a model for configuring hierarchically organized rep- 
resentations in the PFC via reinforcement learning. 
Computational models of cognitive control and working
memory have also been used to explain mental pathologies
such as schizophrenia [35.204].




[Figure 35.2: diagram. Recoverable labels: sensory and association cortex; prefrontal cortex; premotor cortex/SMA; striatum; limbic system; GPi; thalamus; brainstem/spinal cord; sensory information (input); action (output).]

Fig. 35.2 The action selection and reinforcement learning substrate in the BG. Wide filled arrows indicate excita- 
tory projections while wide unfilled arrows represent inhibitory projections. Linear arrows indicate generic excitatory 
and inhibitory connectivity between regions. The inverted D-shaped contacts indicate modulatory dopamine connec- 
tions that are crucial to reinforcement learning. Abbreviations: SMA = supplementary motor area; SNc = substantia 
nigra pars compacta; VTA = ventral tegmental area; OFC = orbitofrontal cortex; GPe = globus pallidus (external 
nuclei); GPi = globus pallidus (internal nuclei); STN = subthalamic nucleus; D1 = excitatory dopamine receptors;
D2 = inhibitory dopamine receptors. The primary neurons of GPi are inhibitory and active by default, thus keeping
all motor plans in the motor and premotor cortices in check. The neurons of the striatum are also inhibitory but usually 
in an inactive down state (after [35.190]). Particular subgroups of striatal neurons are activated by specific patterns of 
cortical activity (after [35.136]), leading first to disinhibition of specific actions via the direct input from striatum to 
GPi, and then to re-inhibition via the input through STN. Thus the system gates the triggering of actions appropriate to
current cognitive contexts in the cortex. The dopamine input from SNc projects a reward signal based on limbic system 
state, allowing desirable context-action pairs to be reinforced (after [35.191, 192]) — though other hypotheses also exist 
(after [35.14]). The dopamine input to PFC from the VTA also signals reward and other task-related contingencies 


An important aspect of cognitive control is switching
between tasks at various time-scales [35.205, 206].
Imamizu et al. [35.207] compared two computational
models of task switching, a mixture-of-experts (MoE)
model and MOSAIC, using brain imaging. They concluded
that task switching in the PFC was more consistent
with the MoE model and that in the parietal cortex
and cerebellum with the MOSAIC model.

An influential abstract model of cognitive control
is the interactive activation model in [35.208,
209]. In this model, learned behavioral schemata contend
for activation based on task context and cognitive
state. While this model captures many phenomenological
aspects of behavior, it is not explicitly neural.
Botvinick and Plaut [35.173] present an alternative
neural model that relies on distributed neural representations
and the dynamics of recurrent neural networks
rather than explicit schemata and contention.
Dayan et al. [35.210, 211] have proposed a neural
model for implementing complex rule-based decision
making where decisions are based on sequentially unfolding
contexts. A partially neural model of behavior
based on the CLARION cognitive model has been developed
in [35.212].




Recently, Grossberg and Pearson [35.213] have pre- 
sented a comprehensive model of WM called LIST 
PARSE. In this model, the term working memory 
is applied narrowly to the storage of temporally or- 
dered items, i.e., lists, rather than more broadly to 
all short-term memory. Experimentally observed ef- 
fects such as recency (better recall of late items in 
the list) and primacy (better recall of early items in 
the list) are explained by this model, which uses the 
concept of competitive queuing for sequences. This 
is based on the observation [35.101, 214] that multi- 
ple elements of a behavioral sequence are represented 
in the PFC as simultaneously active codes with acti- 
vation levels representing the temporal order. Unlike 
the WM models discussed in the previous paragraph, 
the WM in LIST PARSE is embedded within a full 
cognitive control model with action selection, trajec- 
tory generation, etc. Many other neural models for 
chains of actions have also been proposed [35.214— 
224]. 
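
The core of competitive queuing can be stated very compactly: all items of a planned sequence are active at once, the most active item is produced, and it is then suppressed so that the next most active item can win. The Python sketch below illustrates only this select-and-suppress loop. It is a deliberately simplified assumption and not an implementation of LIST PARSE; the item labels and activation values are hypothetical.

```python
def competitive_queue(activations):
    """Produce items in order of decreasing activation, suppressing each after output."""
    remaining = dict(activations)   # copy, so the stored plan is left intact
    produced = []
    while remaining:
        winner = max(remaining, key=remaining.get)  # competition: strongest item wins
        produced.append(winner)
        del remaining[winner]                       # self-inhibition after production
    return produced

# A primacy gradient: earlier items are stored with higher activation.
plan = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3}
recalled = competitive_queue(plan)
```

In this scheme the temporal order is carried entirely by the activation gradient, so perturbing the relative activations of neighboring items (for example, with noise) changes the order in which they are produced.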

Higher level cognitive control is characterized by 
the need to fuse information from multiple sensory 
modalities and memory to make complex decisions. 
This has led to the idea of a cognitive workspace. In the 
global workspace theory (GWT) developed in [35.225], 
information from various sensory, episodic, semantic, 
and motivational sources comes together in a global 
workspace that forms brief, task-specific integrated rep- 
resentations that are broadcast to all subsystems for 
use in WM. This model has been implemented com- 
putationally in the intelligent distribution agent (IDA) 
model by Franklin et al. [35.226, 227]. A neurally im- 
plemented workspace model has been developed by 
Dehaene et al. [35.172, 228, 229] to explain human subjects’
performance on effortful cognitive tasks (i.e.,
tasks that require suppression of automatic responses), 
and the basis of consciousness. The construction of cog- 
nitive workspaces is closely related to the idea of long- 
term working memory [35.189]. Unlike short-term 
working memory, there are few computational models 
for long-term working memory. Neural models seldom 
cover long periods, and implicitly assume that a chain- 
ing process through recurrent networks (e.g., [35.173]) 
can maintain internal attention. Iyer et al. [35.230, 231]
have proposed an explicitly neurodynamical model of 
this function, where a stable but modulatable pat- 
tern of activity called a graded attractor is used to 
selectively bias parts of the cortex in a context-
dependent fashion. An earlier model was proposed
in [35.232] to serve a similar function in the hippocam- 
pal system. 


Another class of models focuses primarily on single
decisions within a task, and assumes an underlying
stochastic process [35.186, 233-235]. Typically, these 
models address two-choice short-term decisions made 
over a second or two [35.186]. The decision process be- 
gins with a starting point and accumulates information 
over time resulting in a diffusive (random walk) pro- 
cess. When the diffusion reaches one of two boundaries 
on either side of the starting point, the corresponding 
decision is made. This elegant approach can model such 
concrete issues as decision accuracy, decision time, and 
the distribution of decisions without any reference to 
the underlying neural mechanisms, which is both its 
chief strength and its primary weakness. Several con- 
nectionist models have also been developed based on 
paradigms similar to the diffusion approach [35.236— 
238]. The neural basis of such models has been dis- 
cussed in detail in [35.239]. A population-coding neural 
model that makes Bayesian decisions based on cumula- 
tive evidence has been described by Beck et al. [35.98]. 
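
The diffusion account of two-choice decisions is easy to simulate. The following Python sketch accumulates noisy evidence from a starting point of zero until it crosses one of two symmetric boundaries. The Euler discretization, the parameter values, and the function name `diffusion_decision` are assumptions for illustration and do not correspond to any specific model cited above.

```python
import random

def diffusion_decision(drift, bound=1.0, noise=1.0, dt=0.001, max_time=10.0, rng=None):
    """Simulate one trial; return (choice, decision_time).

    choice is "upper" or "lower" depending on which boundary is reached,
    or None if neither boundary is reached within max_time.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    step_sd = noise * dt ** 0.5   # standard deviation of the noise per time step
    while abs(x) < bound and t < max_time:
        x += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    if x >= bound:
        return "upper", t
    if x <= -bound:
        return "lower", t
    return None, t

# Choice proportions and decision times over many simulated trials.
rng = random.Random(0)
trials = [diffusion_decision(drift=2.0, rng=rng) for _ in range(200)]
accuracy = sum(1 for choice, _ in trials if choice == "upper") / len(trials)
times = [t for _, t in trials]
```

Raising the drift (stronger evidence) makes decisions faster and more accurate, while moving the boundaries apart trades speed for accuracy; both effects follow from the accumulation process alone, without reference to a neural mechanism.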

Reinforcement learning [35.176] is widely used 
in many engineering applications, but several mod- 
els go beyond purely computational use and include 
details of the underlying brain regions and neurophysi- 
ology [35.141, 240]. Excellent reviews of such models 
are provided in [35.241—243]. Recently, models have 
also been proposed to show how dopamine-mediated 
learning could work with spiking neurons [35.244] and 
population codes [35.245]. 

Computational models that focus on working mem- 
ory per se (i.e., not on the entire problem of cognitive 
control) have mainly considered how the requirement 
of selective temporal persistence can be met by bio- 
logically plausible neural networks [35.246, 247]. Since 
working memories must bridge over temporal dura- 
tions (e.g., in remembering a cue over a delay period), 
there must be some neural mechanism to allow ac- 
tivity patterns to persist selectively in time. A natural 
candidate for this is attractor dynamics in recurrent 
neural networks [35.248, 249], where the recurrence 
allows some activity patterns to be stabilized by re- 
verberation [35.250]. The neurophysiological basis of 
such persistent activity has been studied in [35.251]. 
A central feature in many models of WM is the role 
of dopamine in the PFC [35.252—254]. In particular, 
it is believed that dopamine sharpens the response of 
PFC neurons involved in WM [35.255] and allows for 
reliable storage of timing information in the presence 
of distractors [35.246]. The model in [35.246, 252] in- 
cludes several biophysical details such as the effect of 
dopamine on different ion channels and its differential
modulation of various receptors. More abstract neural
models for WM have been proposed in [35.256]
and [35.257]. 
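
Persistence through attractor dynamics can be illustrated with a very small recurrent network. In the Python sketch below, a cue pattern is stored in symmetric Hebbian weights; when the network is started from a corrupted version of the cue and then left without external input, recurrence restores the pattern and holds it over an arbitrary delay. The binary units, the synchronous update, and the single stored pattern are simplifying assumptions and are not meant to reproduce any of the biophysical models cited above.

```python
def hebbian_weights(patterns):
    """Symmetric Hebbian weights (no self-connections) storing +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def run_delay(weights, state, steps):
    """Let the network evolve with no external input for `steps` synchronous updates."""
    state = list(state)
    n = len(state)
    for _ in range(steps):
        state = [1 if sum(weights[i][j] * state[j] for j in range(n)) >= 0 else -1
                 for i in range(n)]
    return state

cue = [1, -1, 1, -1, 1, 1, -1, -1]
weights = hebbian_weights([cue])

noisy = list(cue)
noisy[0], noisy[3] = -noisy[0], -noisy[3]   # corrupt two units of the cue

restored = run_delay(weights, noisy, steps=1)   # recurrence cleans up the pattern
held = run_delay(weights, cue, steps=50)        # the pattern persists over the delay
```

The stored pattern is a fixed point of the dynamics, which is the sense in which reverberation through recurrent connections can bridge a delay period.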

An especially interesting type of attractor network
uses the so-called bump attractors, spatially localized
patterns of activity stabilized by local network
connectivity and global competition [35.258]. Such
a network has been used in a biologically plausible
model of WM in the PFC in [35.259], which demonstrates
that the memory is robust against distracting
stimuli. A similar conclusion is drawn in [35.180] based
on another bump attractor model of working memory. It
shows that dopamine in the PFC can provide robustness
against distractors, but robustness against internal noise
is achieved only when dopamine in the BG locks the
state of the striatum. Recently, Mongillo et al. [35.260]
have proposed the novel hypothesis that the persistence
of neural activity in WM may be due to calcium-
mediated facilitation rather than reverberation through
recurrent connectivity.


35.4 Conclusion


This chapter has attempted to provide an overview of
neurocomputational models for cognitive control, WM,
and motor control. Given the vast body of both experimental
and computational research in these areas, the
review is necessarily incomplete, though every attempt
has been made to highlight the major issues, and to
provide the reader with a rich array of references covering
the breadth of each area.

The models described in this chapter relate to
several other mental functions including sensorimotor
integration, memory, semantic cognition, etc., as
well as to areas of engineering such as robotics and
agent systems. However, these links are largely excluded
from the chapter, in part for brevity, but mainly
because most of them are covered elsewhere in this
Handbook.


References 

35.1 J. Fuster: The cognit: A network model of cortical
representation, Int. J. Psychophysiol. 60, 125-132 (2006)

35.2 J. Fuster: The Prefrontal Cortex (Academic, London 2008)

35.3 M.T. Turvey: Coordination, Am. Psychol. 45, 938-953 (1990)

35.4 D. Sternad, M.T. Turvey: Control parameters, equilibria,
and coordination dynamics, Behav. Brain Sci. 18, 780-783 (1996)

35.5 R. Pfeifer, M. Lungarella, F. Iida: Self-organization,
embodiment, and biologically inspired robotics, Science 318,
1088-1093 (2007)

35.6 A. Chemero: Radical Embodied Cognitive Science
(MIT Press, Cambridge 2011)

35.7 J.C. Houk, S.P. Wise: Distributed modular architectures
linking basal ganglia, cerebellum, and cerebral cortex: Their
role in planning and controlling action, Cereb. Cortex 5,
95-110 (2005)

35.8 K. Doya: What are the computations of the cerebellum,
the basal ganglia and the cerebral cortex?, Neural Netw. 12,
961-974 (1999)

35.9 M. Kawato, H. Gomi: A computational model of four
regions of the cerebellum based on feedback-error learning,
Biol. Cybern. 68, 95-103 (1992)

35.10 A.M. Graybiel: Building action repertoires: Memory
and learning functions of the basal ganglia, Curr. Opin.
Neurobiol. 5, 733-741 (1995)

35.11 A.M. Graybiel: The basal ganglia and cognitive
pattern generators, Schizophr. Bull. 23, 459-469 (1997)

35.12 A.M. Graybiel: The basal ganglia: Learning new
tricks and loving it, Curr. Opin. Neurobiol. 15, 638-644 (2005)

35.13 M.D. Humphries, R.D. Stewart, K.N. Gurney:
A physiologically plausible model of action selection and
oscillatory activity in the basal ganglia, J. Neurosci. 26,
12921-12942 (2006)

35.14 J.C. Houk: Agents of the mind, Biol. Cybern. 92,
427-437 (2005)

35.15 E. Hoshi, K. Shima, J. Tanji: Neuronal activity in
the primate prefrontal cortex in the process of motor
selection based on two behavioral rules, J. Neurophysiol. 83,
2355-2373 (2000)

35.16 E.K. Miller, J.D. Cohen: An integrative theory of
prefrontal cortex function, Annu. Rev. Neurosci. 4,
167-202 (2001)

35.17 N.P. Rougier, D.C. Noelle, T.S. Braver, J.D. Cohen,
R.C. O'Reilly: Prefrontal cortex and flexible cognitive
control: Rules without symbols, PNAS 102, 7338-7343 (2005)

35.18 J. Tanji, E. Hoshi: Role of the lateral prefrontal
cortex in executive behavioral control, Physiol. Rev. 88,
37-57 (2008)

35.19 M.M. Botvinick, J.D. Cohen, C.S. Carter: Conflict
monitoring and anterior cingulate cortex: An update,
Trends Cogn. Sci. 8, 539-546 (2004)

35.20 M.M. Botvinick: Conflict monitoring and decision
making: Reconciling two perspectives on anterior cingulate
function, Cogn. Affect Behav. Neurosci. 7, 356-366 (2008)

35.21 J.W. Brown, T.S. Braver: Learned predictions of
error likelihood in the anterior cingulate cortex,
Science 307, 1118-1121 (2005)

35.22 J.S. Albus: New approach to manipulator control:
The cerebellar model articulation controller (CMAC),
J. Dyn. Sys. Meas. Control 97, 220-227 (1975)

35.23 D. Marr: A theory of cerebellar cortex, J. Physiol.
202, 437-470 (1969)

35.24 M.H. Dickinson, C.T. Farley, R.J. Full, M.A.R. Koehl,
R. Kram, S. Lehman: How animals move: An integrative view,
Science 288, 100-106 (2000)

35.25 H. Haken, J.A.S. Kelso, H. Bunz: A theoretical
model of phase transitions in human hand movements,
Biol. Cybern. 51, 347-356 (1985)

35.26 E. Saltzman, J.A.S. Kelso: Skilled actions: A task
dynamic approach, Psychol. Rev. 82, 225-260 (1987)

35.27 P.N. Kugler, M.T. Turvey: Information, Natural Law,
and the Self-Assembly of Rhythmic Movement
(Lawrence Erlbaum, Hillsdale 1987)

35.28 G. Schöner: A dynamic theory of coordination
of discrete movement, Biol. Cybern. 63, 257-270 (1990)

35.29 J.A.S. Kelso: Dynamic Patterns: The Self-Organization
of Brain and Behavior (MIT Press, Cambridge 1995)

35.30 P. Morasso, V. Sanguineti, G. Spada: A computational
theory of targeting movements based on force fields and
topology representing networks, Neurocomputing 15,
411-434 (1997)

35.31 J.P. Scholz, G. Schöner: The uncontrolled manifold
concept: Identifying control variables for a functional
task, Exp. Brain Res. 126, 289-306 (1999)

35.32 M.A. Riley, M.T. Turvey: Variability and determinism
in motor behavior, J. Mot. Behav. 34, 99-125 (2002)

35.33 M.A. Riley, N. Kuznetsov, S. Bonnette: State-,
parameter-, and graph-dynamics: Constraints and the
distillation of postural control systems, Sci. Mot. 74,
5-18 (2011)

35.34 E.C. Goldfield: Emergent Forms: Origins and Early
Development of Human Action and Perception
(Oxford Univ. Press, Oxford 1995)

35.36 V.C. Ramenzoni, M.A. Riley, K. Shockley, A.A. Baker:
Interpersonal and intrapersonal coordinative modes for
joint and individual task performance, Human Mov. Sci. 31,
1253-1267 (2012)

35.37 M.A. Riley, M.C. Richardson, K. Shockley,
V.C. Ramenzoni: Interpersonal synergies, Front. Psychol.
2(38), DOI 10.3389/fpsyg.2011.00038 (2011)

35.38 A.G. Feldman, M.F. Levin: The equilibrium-point
hypothesis - past, present and future, Adv. Exp. Med. Biol.
629, 699-726 (2009)

35.39 M.L. Latash: Motor synergies and the equilibrium-point
hypothesis, Mot. Control 14, 294-322 (2010)

35.40 C.E. Sherrington: Integrative Actions of the Nervous
System (Yale Univ. Press, New Haven 1906)

35.41 C.E. Sherrington: Remarks on the reflex mechanism
of the step, Brain 33, 1-25 (1910)

35.42 C.E. Sherrington: Flexor-reflex of the limb,
crossed extension reflex, and reflex stepping and
standing (cat and dog), J. Physiol. 40, 28-121 (1910)

35.43 S. Grillner, T. Deliagina, Ö. Ekeberg, A. El Manira,
R.H. Hill, A. Lansner, G.N. Orlovsky, P. Wallén:
Neural networks that co-ordinate locomotion and body
orientation in lamprey, Trends Neurosci. 18, 270-279 (1995)

35.44 P.J. Whelan: Control of locomotion in the decerebrate
cat, Prog. Neurobiol. 49, 481-515 (1996)

35.45 S. Grillner: The motor infrastructure: From ion
channels to neuronal networks, Nat. Rev. Neurosci. 4,
673-686 (2003)

35.46 S. Grillner: Biological pattern generation: The
cellular and computational logic of networks in motion,
Neuron 52, 751-766 (2006)

35.47 A.J. Ijspeert, A. Crespi, D. Ryczko, J.M. Cabelguen:
From swimming to walking with a salamander robot driven
by a spinal cord model, Science 315, 1416-1420 (2007)

35.48 A.P. Georgopoulos, J.F. Kalaska, R. Caminiti,
J.T. Massey: On the relations between the direction of
two-dimensional arm movements and cell discharge in
primate motor cortex, J. Neurosci. 2, 1527-1537 (1982)

35.49 A.P. Georgopoulos, R. Caminiti, J.F. Kalaska,
J.T. Massey: Spatial coding of movement: A hypothesis
concerning the coding of movement direction by motor
cortical populations, Exp. Brain Res. Suppl. 7,
327-336 (1983)

35.50 A.P. Georgopoulos, R. Caminiti, J.F. Kalaska: Static
spatial effects in motor cortex and area 5: Quantitative
relations in a two-dimensional space, Exp. Brain Res. 54,
446-454 (1984)

35.51 A.P. Georgopoulos, R.E. Kettner, A.B. Schwartz:
Primate motor cortex and free arm movements
35.35 J.A.S. Kelso, G.C. de Guzman, C. Reveley, E. Tog- to visual targets in three-dimensional space. 


noli: Virtual partner interaction (VPI): Exploring 
novel behaviors via coordination dynamics, PLoS 
ONE 4, e5749 (2009) 


ll: Coding of the direction of movement by 
a neuronal population, J. Neurosci. 8, 2928-2937 
(1988) 



35.52 A.P. Georgopoulos, J. Ashe, N. Smyrnis, M. Taira: The motor cortex and the coding of force, Science 256, 1692-1695 (1992)

35.53 J. Ashe, A.P. Georgopoulos: Movement parameters and neural activity in motor cortex and area 5, Cereb. Cortex 4(6), 590-600 (1994)

35.54 A.B. Schwartz, R.E. Kettner, A.P. Georgopoulos: Primate motor cortex and free arm movements to visual targets in 3-D space. I. Relations between single cell discharge and direction of movement, J. Neurosci. 8, 2913-2927 (1988)

35.55 D. Bullock, S. Grossberg: Neural dynamics of planned arm movements: emergent invariants and speed-accuracy properties during trajectory formation, Psychol. Rev. 95, 49-90 (1988)

35.56 D. Bullock, S. Grossberg, F.H. Guenther: A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm, J. Cogn. Neurosci. 5, 408-435 (1993)

35.57 D. Bullock, P. Cisek, S. Grossberg: Cortical networks for control of voluntary arm movements under variable force conditions, Cereb. Cortex 8, 48-62 (1998)

35.58 S.H. Scott, J.F. Kalaska: Changes in motor cortex activity during reaching movements with similar hand paths but different arm postures, J. Neurophysiol. 73, 2563-2567 (1995)

35.59 S.H. Scott, J.F. Kalaska: Reaching movements with similar hand paths but different arm orientations. I. Activity of individual cells in motor cortex, J. Neurophysiol. 77, 826-852 (1997)

35.60 D.W. Moran, A.B. Schwartz: Motor cortical representation of speed and direction during reaching, J. Neurophysiol. 82, 2676-2692 (1999)

35.61 R. Shadmehr, S.P. Wise: The Computational Neurobiology of Reaching and Pointing: A Foundation for Motor Learning (MIT Press, Cambridge 2005)

35.62 A. d'Avella, A. Portone, L. Fernandez, F. Lacquaniti: Control of fast-reaching movements by muscle synergy combinations, J. Neurosci. 26, 7791-7810 (2006)

35.63 E. Bizzi, V.C. Cheung, A. d'Avella, P. Saltiel, M. Tresch: Combining modules for movement, Brain Res. Rev. 7, 125-133 (2008)

35.64 S. Muceli, A.T. Boye, A. d'Avella, D. Farina: Identifying representative synergy matrices for describing muscular activation patterns during multidirectional reaching in the horizontal plane, J. Neurophysiol. 103, 1532-1542 (2010)

35.65 S.F. Giszter, F.A. Mussa-Ivaldi, E. Bizzi: Convergent force fields organized in the frog's spinal cord, J. Neurosci. 13, 467-491 (1993)

35.66 F.A. Mussa-Ivaldi, S.F. Giszter: Vector field approximation: A computational paradigm for motor control and learning, Biol. Cybern. 67, 491-500 (1992)

35.67 M.C. Tresch, P. Saltiel, E. Bizzi: The construction of movement by the spinal cord, Nat. Neurosci. 2, 162-167 (1999)

35.68 W.J. Kargo, S.F. Giszter: Rapid correction of aimed movements by summation of force-field primitives, J. Neurosci. 20, 409-426 (2000)

35.69 A. d'Avella, P. Saltiel, E. Bizzi: Combinations of muscle synergies in the construction of a natural motor behavior, Nat. Neurosci. 6, 300-308 (2003)

35.70 A. d'Avella, E. Bizzi: Shared and specific muscle synergies in natural motor behaviors, Proc. Natl. Acad. Sci. USA 102, 3076-3081 (2005)

35.71 L.H. Ting, J.M. Macpherson: A limited set of muscle synergies for force control during a postural task, J. Neurophysiol. 93, 609-613 (2005)

35.72 G. Torres-Oviedo, J.M. Macpherson, L.H. Ting: Muscle synergy organization is robust across a variety of postural perturbations, J. Neurophysiol. 96, 1530-1546 (2006)

35.73 L.E. Sergio, J.F. Kalaska: Systematic changes in motor cortex cell activity with arm posture during directional isometric force generation, J. Neurophysiol. 89, 212-228 (2003)

35.74 R. Ajemian, A. Green, D. Bullock, L. Sergio, J. Kalaska, S. Grossberg: Assessing the function of motor cortex: Single-neuron models of how neural response is modulated by limb biomechanics, Neuron 58, 414-428 (2008)

35.75 B. Cesqui, A. d'Avella, A. Portone, F. Lacquaniti: Catching a ball at the right time and place: Individual factors matter, PLoS ONE 7, e31770 (2012)

35.76 P. Morasso, F.A. Mussa-Ivaldi: Trajectory formation and handwriting: A computational model, Biol. Cybern. 45, 131-142 (1982)

35.77 A.B. Schwartz: Motor cortical activity during drawing movements: Single unit activity during sinusoid tracing, J. Neurophysiol. 68, 528-541 (1992)

35.78 A.B. Schwartz: Motor cortical activity during drawing movements: Population representation during sinusoid tracing, J. Neurophysiol. 70, 28-36 (1993)

35.79 A.B. Schwartz: Direct cortical representation of drawing, Science 265, 540-542 (1994)

35.80 D.W. Moran, A.B. Schwartz: Motor cortical activity during drawing movements: Population representation during spiral tracing, J. Neurophysiol. 82, 2693-2704 (1999)

35.81 R.W. Paine, S. Grossberg, A.W.A. Van Gemmert: A quantitative evaluation of the AVITEWRITE model of handwriting learning, Human Mov. Sci. 23, 837-860 (2004)

35.82 G. Torres-Oviedo, L.H. Ting: Muscle synergies characterizing human postural responses, J. Neurophysiol. 98, 2144-2156 (2007)

35.83 L.H. Ting, J.L. McKay: Neuromechanics of muscle synergies for posture and movement, Curr. Opin. Neurobiol. 17, 622-628 (2007)




35.84 N. Bernstein: The Coordination and Regulation of Movements (Pergamon, Oxford 1967)

35.85 P.D. Neilson, M.D. Neilson: On theory of motor synergies, Human Mov. Sci. 29, 655-683 (2010)

35.86 W. Penfield, E. Boldrey: Somatic motor and sensory representation in the motor cortex of man as studied by electrical stimulation, Brain 60, 389-443 (1937)

35.87 T.D. Sanger: Theoretical considerations for the analysis of population coding in motor cortex, Neural Comput. 6, 29-37 (1994)

35.88 J.K. Chapin, R.A. Markowitz, K.A. Moxon, M.A.L. Nicolelis: Direct real-time control of a robot arm using signals derived from neuronal population recordings in motor cortex, Nat. Neurosci. 2, 664-670 (1999)

35.89 A.A. Lebedev, M.A.L. Nicolelis: Brain-machine interfaces: Past, present, and future, Trends Neurosci. 29, 536-546 (2006)

35.90 E. Salinas, L. Abbott: Transfer of coded information from sensory to motor networks, J. Neurosci. 15, 6461-6474 (1995)

35.91 E. Salinas, L. Abbott: A model of multiplicative neural responses in parietal cortex, Proc. Natl. Acad. Sci. USA 93, 11956-11961 (1996)

35.92 A. Pouget, T. Sejnowski: A neural model of the cortical representation of egocentric distance, Cereb. Cortex 4, 314-329 (1994)

35.93 A. Pouget, T. Sejnowski: Spatial transformations in the parietal cortex using basis functions, J. Cogn. Neurosci. 9, 222-237 (1997)

35.94 A. Pouget, L.H. Snyder: Computational approaches to sensorimotor transformations, Nat. Neurosci. Supp. 3, 1192-1198 (2000)

35.95 A. Pouget, P. Dayan, R.S. Zemel: Information processing with population codes, Nat. Rev. Neurosci. 1, 125-132 (2000)

35.96 A. Pouget, P. Dayan, R.S. Zemel: Inference and computation with population codes, Annu. Rev. Neurosci. 26, 381-410 (2003)

35.97 W.J. Ma, J.M. Beck, P.E. Latham, A. Pouget: Bayesian inference with probabilistic population codes, Nat. Neurosci. 9, 1432-1438 (2006)

35.98 J.M. Beck, W.J. Ma, R. Kiani, T. Hanks, A.K. Churchland, J. Roitman, M.N. Shadlen, P.E. Latham, A. Pouget: Probabilistic population codes for Bayesian decision making, Neuron 60, 1142-1152 (2008)

35.99 R. Ajemian, D. Bullock, S. Grossberg: Kinematic coordinates in which motor cortical cells encode movement direction, J. Neurophysiol. 84, 2191-2203 (2000)

35.100 R. Ajemian, D. Bullock, S. Grossberg: A model of movement coordinates in the motor cortex: Posture-dependent changes in the gain and direction of single cell tuning curves, Cereb. Cortex 11, 1124-1135 (2001)

35.101 B.B. Averbeck, M.V. Chafee, D.A. Crowe, A.P. Georgopoulos: Parallel processing of serial movements in prefrontal cortex, PNAS 99, 13172-13177 (2002)

35.102 R. Caminiti, P.B. Johnson, A. Urbano: Making arm movements within different parts of space: Dynamic aspects in the primate motor cortex, J. Neurosci. 10, 2039-2058 (1990)

35.103 K.M. Graham, K.D. Moore, D.W. Cabel, P.L. Gribble, P. Cisek, S.H. Scott: Kinematics and kinetics of multijoint reaching in nonhuman primates, J. Neurophysiol. 89, 2667-2677 (2003)

35.104 A. Shah, A.H. Fagg, A.G. Barto: Cortical involvement in the recruitment of wrist muscles, J. Neurophysiol. 91, 2445-2456 (2004)

35.105 M.S.A. Graziano, T. Aflalo, D.F. Cooke: Arm movements evoked by electrical stimulation in the motor cortex of monkeys, J. Neurophysiol. 94, 4209-4223 (2005)

35.106 M.S.A. Graziano: The organization of behavioral repertoire in motor cortex, Annu. Rev. Neurosci. 29, 105-134 (2006)

35.107 M.S.A. Graziano: The Intelligent Movement Machine (Oxford Univ. Press, Oxford 2008)

35.108 T.N. Aflalo, M.S.A. Graziano: Possible origins of the complex topographic organization of motor cortex: Reduction of a multidimensional space onto a two-dimensional array, J. Neurosci. 26, 6288-6297 (2006)

35.109 K. Shima, J. Tanji: Both supplementary and presupplementary motor areas are crucial for the temporal organization of multiple movements, J. Neurophysiol. 80, 3247-3260 (1998)

35.110 J.-W. Sohn, D. Lee: Order-dependent modulation of directional signals in the supplementary and presupplementary motor areas, J. Neurosci. 27, 13655-13666 (2007)

35.111 H. Mushiake, M. Inase, J. Tanji: Neuronal activity in the primate premotor, supplementary, and precentral motor cortex during visually guided and internally determined sequential movements, J. Neurophysiol. 66, 705-718 (1991)

35.112 H. Mushiake, P.L. Strick: Pallidal neuron activity during sequential arm movements, J. Neurophysiol. 74, 2754-2758 (1995)

35.113 H. Mushiake, P.L. Strick: Preferential activity of dentate neurons during limb movements guided by vision, J. Neurophysiol. 70, 2660-2664 (1993)

35.114 D. Baldauf, H. Cui, R.A. Andersen: The posterior parietal cortex encodes in parallel both goals for double-reach sequences, J. Neurosci. 28, 10081-10089 (2008)

35.115 T. Flash, B. Hochner: Motor primitives in vertebrates and invertebrates, Curr. Opin. Neurobiol. 15, 660-666 (2005)

35.116 J.A.S. Kelso: Synergies: Atoms of brain and behavior. In: Progress in Motor Control, ed. by D. Sternad (Springer, Berlin, Heidelberg 2009) pp. 83-91

35.117 F.A. Mussa-Ivaldi: Do neurons in the motor cortex encode movement direction? An alternate hypothesis, Neurosci. Lett. 91, 106-111 (1988)




35.118 F.A. Mussa-Ivaldi: From basis functions to basis fields: vector field approximation from sparse data, Biol. Cybern. 67, 479-489 (1992)

35.119 A. d'Avella, D.K. Pai: Modularity for sensorimotor control: Evidence and a new prediction, J. Mot. Behav. 42, 361-369 (2010)

35.120 A. d'Avella, L. Fernandez, A. Portone, F. Lacquaniti: Modulation of phasic and tonic muscle synergies with reaching direction and speed, J. Neurophysiol. 100, 1433-1454 (2008)

35.121 G. Torres-Oviedo, L.H. Ting: Subject-specific muscle synergies in human balance control are consistent across different biomechanical contexts, J. Neurophysiol. 103, 3084-3098 (2010)

35.122 C.B. Hart: A neural basis for motor primitives in the spinal cord, J. Neurosci. 30, 1322-1336 (2010)

35.123 T. Drew, J. Kalaska, N. Krouchev: Muscle synergies during locomotion in the cat: A model for motor cortex control, J. Physiol. 586(5), 1239-1245 (2008)

35.124 P.D. Neilson, M.D. Neilson: Motor maps and synergies, Human Mov. Sci. 24, 774-797 (2005)

35.125 J.J. Kutch, A.D. Kuo, A.M. Bloch, W.Z. Rymer: Endpoint force fluctuations reveal flexible rather than synergistic patterns of muscle cooperation, J. Neurophysiol. 100, 2455-2471 (2008)

35.126 M.C. Tresch, A. Jarc: The case for and against muscle synergies, Curr. Opin. Neurobiol. 19, 601-607 (2009)

35.127 A. Ijspeert, J. Nakanishi, S. Schaal: Learning rhythmic movements by demonstration using nonlinear oscillators, IEEE Int. Conf. Intell. Rob. Syst. (IROS 2002), Lausanne (2002) pp. 958-963

35.128 A. Ijspeert, J. Nakanishi, S. Schaal: Movement imitation with nonlinear dynamical systems in humanoid robots, Int. Conf. Robotics Autom. (ICRA 2002), Washington (2002) pp. 1398-1403

35.129 A. Ijspeert, J. Nakanishi, S. Schaal: Trajectory formation for imitation with nonlinear dynamical systems, IEEE Int. Conf. Intell. Rob. Syst. (IROS 2001), Maui (2001) pp. 752-757

35.130 A. Ijspeert, J. Nakanishi, S. Schaal: Learning attractor landscapes for learning motor primitives. In: Advances in Neural Information Processing Systems 15, ed. by S. Becker, S. Thrun, K. Obermayer (MIT Press, Cambridge 2003) pp. 1547-1554

35.131 S. Schaal, J. Peters, J. Nakanishi, A. Ijspeert: Control, planning, learning, and imitation with dynamic movement primitives, Proc. Workshop Bilater. Paradig. Humans Humanoids. IEEE Int. Conf. Intell. Rob. Syst. (IROS 2003), Las Vegas (2003)

35.132 S. Schaal, P. Mohajerian, A. Ijspeert: Dynamics systems vs. optimal control – a unifying view. In: Computational Neuroscience: Theoretical Insights into Brain Function, Progress in Brain Research, Vol. 165, ed. by P. Cisek, T. Drew, J.F. Kalaska (Elsevier, Amsterdam 2007) pp. 425-445

35.133 K.V. Byadarhaly, M. Perdoor, A.A. Minai: A neural model of motor synergies, Proc. Int. Conf. Neural Netw., San Jose (2011) pp. 2961-2968

35.134 K.V. Byadarhaly, M.C. Perdoor, A.A. Minai: A modular neural model of motor synergies, Neural Netw. 32, 96-108 (2012)

35.135 G.E. Alexander, M.R. DeLong, P.L. Strick: Parallel organization of functionally segregated circuits linking basal ganglia and cortex, Annu. Rev. Neurosci. 9, 357-381 (1986)

35.136 A.W. Flaherty, A.M. Graybiel: Input-output organization of the sensorimotor striatum in the squirrel monkey, J. Neurosci. 14, 599-610 (1994)

35.137 S. Grillner, J. Hellgren, A. Ménard, K. Saitoh, M.A. Wikström: Mechanisms for selection of basic motor programs – roles for the striatum and pallidum, Trends Neurosci. 28, 364-370 (2005)

35.138 W. Schultz, P. Dayan, P.R. Montague: A neural substrate of prediction and reward, Science 275, 1593-1599 (1997)

35.139 W. Schultz, A. Dickinson: Neuronal coding of prediction errors, Annu. Rev. Neurosci. 23, 473-500 (2000)

35.140 W. Schultz: Multiple reward signals in the brain, Nat. Rev. Neurosci. 1, 199-207 (2000)

35.141 P.R. Montague, S.E. Hyman, J.D. Cohen: Computational roles for dopamine in behavioural control, Nature 431, 760-767 (2004)

35.142 D.M. Wolpert, M. Kawato: Multiple paired forward and inverse models for motor control, Neural Netw. 11, 1317-1329 (1998)

35.143 M. Kawato: Internal models for motor control and trajectory planning, Curr. Opin. Neurobiol. 9, 718-727 (1999)

35.144 D.M. Wolpert, Z. Ghahramani: Computational principles of movement neuroscience, Nat. Neurosci. Supp. 3, 1212-1217 (2000)

35.145 M. Kawato, K. Furukawa, R. Suzuki: A hierarchical neural network model for control and learning of voluntary movement, Biol. Cybern. 57, 169-185 (1987)

35.146 R. Shadmehr, F.A. Mussa-Ivaldi: Adaptive representation of dynamics during learning of a motor task, J. Neurosci. 14, 3208-3224 (1994)

35.147 D.M. Wolpert, Z. Ghahramani, M.I. Jordan: An internal model for sensorimotor integration, Science 269, 1880-1882 (1995)

35.148 A. Karniel, G.F. Inbar: A model for learning human reaching movements, Biol. Cybern. 77, 173-183 (1997)

35.149 A. Karniel, G.F. Inbar: Human motor control: Learning to control a time-varying, nonlinear, many-to-one system, IEEE Trans. Syst. Man Cybern. Part C 30, 1-11 (2000)

35.150 Y. Burnod, P. Baraduc, A. Battaglia-Mayer, E. Guigon, E. Koechlin, S. Ferraina, F. Lacquaniti, R. Caminiti: Parieto-frontal coding of reaching: An integrated framework, Exp. Brain Res. 129, 325-346 (1999)

35.151 M. Haruno, D.M. Wolpert, M. Kawato: MOSAIC model for sensorimotor learning and control, Neural Comput. 13, 2201-2220 (2001)




35.152 A.G. Barto, A.H. Fagg, N. Sitkoff, J.C. Houk: A cerebellar model of timing and prediction in the control of reaching, Neural Comput. 11, 565-594 (1999)

35.153 H. Ritter, T. Martinetz, K. Schulten: Topology-conserving maps for learning visuo-motor-coordination, Neural Netw. 2, 159-168 (1989)

35.154 T. Martinetz, H. Ritter, K. Schulten: Three-dimensional neural net for learning visuo-motor coordination of a robot arm, IEEE Trans. Neural Netw. 1, 131-136 (1990)

35.155 T. Kohonen: Self-organized formation of topologically correct feature maps, Biol. Cybern. 43, 59-69 (1982)

35.156 P. Baraduc, E. Guigon, Y. Burnod: Recoding arm position to learn visuomotor transformations, Cereb. Cortex 11, 906-917 (2001)

35.157 M.V. Butz, O. Herbort, J. Hoffmann: Exploiting redundancy for flexible behavior: Unsupervised learning in a modular sensorimotor control architecture, Psychol. Rev. 114, 1015-1046 (2007)

35.158 A.F. Morse, J. de Greeff, T. Belpaeme, A. Cangelosi: Epigenetic Robotics Architecture (ERA), IEEE Trans. Auton. Ment. Develop. 2, 325-339 (2010)

35.159 T. Matsubara, S.-H. Hyon, J. Morimoto: Learning parametric dynamic movement primitives from multiple demonstrations, Neural Netw. 24, 493-500 (2011)

35.160 K.V. Byadarhaly, A.A. Minai: A Hierarchical Model of Synergistic Motor Control, Proc. Int. Joint Conf. Neural Netw., Dallas (2013)

35.161 T. Flash, N. Hogan: The coordination of arm movements: An experimentally confirmed mathematical model, J. Neurosci. 5, 1688-1703 (1985)

35.162 Y. Uno, M. Kawato, R. Suzuki: Formation and control of optimal trajectories in human multijoint arm movements: Minimum torque-change model, Biol. Cybern. 61, 89-101 (1989)

35.163 S. Ben-Itzhak, A. Karniel: Minimum acceleration criterion with constraints implies bang-bang control as an underlying principle for optimal trajectories of arm reaching movements, Neural Comput. 20, 779-812 (2008)

35.164 C.M. Harris, D.M. Wolpert: Signal-dependent noise determines motor planning, Nature 394, 780-784 (1998)

35.165 E. Todorov, M.I. Jordan: Optimal feedback control as a theory of motor coordination, Nat. Neurosci. 5, 1226-1235 (2002)

35.166 E. Todorov: Optimality principles in sensorimotor control, Nat. Neurosci. 7, 907-915 (2004)

35.167 F.J. Valero-Cuevas, M. Venkadesan, E. Todorov: Structured variability of muscle activations supports the minimal intervention principle of motor control, J. Neurophysiol. 102, 59-68 (2009)

35.168 E. Trainin, R. Meir, A. Karniel: Explaining patterns of neural activity in the primary motor cortex using spinal cord and limb biomechanics models, J. Neurophysiol. 97, 3736-3750 (2007)

35.169 A. Biess, D.G. Liebermann, T. Flash: A computational model for redundant arm three-dimensional pointing movements: Integration of independent spatial and temporal motor plans simplifies movement dynamics, J. Neurosci. 27, 13045-13064 (2007)

35.170 J.J. Gibson: The Theory of Affordances. In: Perceiving, Acting, and Knowing: Toward an Ecological Psychology, ed. by R. Shaw, J. Bransford (Lawrence Erlbaum, Hillsdale 1977) pp. 67-82

35.171 M.L. Latash, J.P. Scholz, G. Schöner: Toward a new theory of motor synergies, Mot. Control 11, 276-308 (2007)

35.172 S. Dehaene, M. Kerszberg, J.-P. Changeux: A neuronal model of a global workspace in effortful cognitive tasks, Proc. Natl. Acad. Sci. USA 95, 14529-14534 (1998)

35.173 M. Botvinick, D.C. Plaut: Doing without schema hierarchies: A recurrent connectionist approach to normal and impaired routine sequential action, Psychol. Rev. 111, 395-429 (2004)

35.174 F.A. Middleton, P.L. Strick: Basal ganglia output and cognition: Evidence from anatomical, behavioral, and clinical studies, Brain Cogn. 42, 183-200 (2000)

35.175 F.A. Middleton, P.L. Strick: Basal ganglia ‘projections’ to the prefrontal cortex of the primate, Cereb. Cortex 12, 926-945 (2002)

35.176 R.S. Sutton, A.G. Barto: Reinforcement Learning (MIT Press, Cambridge 1998)

35.177 R.S. Sutton: Learning to predict by the methods of temporal difference, Mach. Learn. 3, 9-44 (1988)

35.178 N.D. Daw, Y. Niv, P. Dayan: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, Nat. Neurosci. 8, 1704-1711 (2005)

35.179 M.J. Frank, R.C. O'Reilly: A mechanistic account of striatal dopamine function in human cognition: Psychopharmacological studies with cabergoline and haloperidol, Behav. Neurosci. 120, 497-517 (2006)

35.180 A.J. Gruber, P. Dayan, B.S. Gutkin, S.A. Solla: Dopamine modulation in the basal ganglia locks the gate to working memory, J. Comput. Neurosci. 20, 153-166 (2006)

35.181 A. Baddeley: Human Memory (Lawrence Erlbaum, Hove, UK 1990)

35.182 A. Baddeley: The episodic buffer: A new component of working memory?, Trends Cogn. Sci. 4, 417-423 (2000)

35.183 P.S. Goldman-Rakic: Cellular basis of working memory, Neuron 14, 477-485 (1995)

35.184 P.S. Goldman-Rakic, A.R. Cools, K. Srivastava: The prefrontal landscape: implications of functional architecture for understanding human mentation and the central executive, Philos. Trans.: Biol. Sci. 351, 1445-1453 (1996)




35.185 J. Duncan: An adaptive coding model of neural function in prefrontal cortex, Nat. Rev. Neurosci. 2, 820-829 (2001)

35.186 R. Ratcliff, G. McKoon: The diffusion decision model: Theory and data for two-choice decision tasks, Neural Comput. 20, 873-922 (2008)

35.187 G. Miller: The magical number seven, plus or minus two: Some limits of our capacity for processing information, Psychol. Rev. 63, 81-97 (1956)

35.188 J.E. Lisman, A.P. Idiart: Storage of 7 ± 2 short-term memories in oscillatory subcycles, Science 267, 1512-1515 (1995)

35.189 K. Ericsson, W. Kintsch: Long-term working memory, Psychol. Rev. 102, 211-245 (1995)

35.190 C.J. Wilson: The contribution of cortical neurons to the firing pattern of striatal spiny neurons. In: Models of Information Processing in the Basal Ganglia, ed. by J.C. Houk, J.L. Davis, D.G. Beiser (MIT Press, Cambridge 1995) pp. 29-50

35.191 A.M. Graybiel, T. Aosaki, A.W. Flaherty, M. Kimura: The basal ganglia and adaptive motor control, Science 265, 1826-1831 (1994)

35.192 A.M. Graybiel: The basal ganglia and chunking of action repertoires, Neurobiol. Learn. Mem. 70, 119-136 (1998)

35.193 M.D. Humphries, K. Gurney: The role of intrathalamic and thalamocortical circuits in action selection, Network 13, 131-156 (2002)

35.194 R.C. O'Reilly, Y. Munakata: Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain (MIT Press, Cambridge 2000)

35.195 M.J. Frank, B. Loughry, R.C. O'Reilly: Interactions between frontal cortex and basal ganglia in working memory: A computational model, Cogn. Affect Behav. Neurosci. 1, 137-160 (2001)

35.196 T.E. Hazy, M.J. Frank, R.C. O'Reilly: Banishing the homunculus: Making working memory work, Neuroscience 139, 105-118 (2006)

35.197 R.C. O'Reilly, M.J. Frank: Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia, Neural Comput. 18, 283-328 (2006)

35.198 M.J. Frank, E.D. Claus: Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision making and reversal, Psychol. Rev. 113, 300-326 (2006)

35.199 R.C. O'Reilly: Biologically based computational models of high level cognition, Science 314, 91-94 (2006)

35.200 R.C. O'Reilly, S.A. Herd, W.M. Pauli: Computational models of cognitive control, Curr. Opin. Neurobiol. 20, 257-261 (2010)

35.201 M.E. Hasselmo: A model of prefrontal cortical mechanisms for goal-directed behavior, J. Cogn. Neurosci. 17, 1-14 (2005)

35.202 M.E. Hasselmo, C.E. Stern: Mechanisms underlying working memory for novel information, Trends Cogn. Sci. 10, 487-493 (2006)

35.203 J.R. Reynolds, R.C. O'Reilly: Developing PFC representations using reinforcement learning, Cognition 113, 281-292 (2009)

35.204 T.S. Braver, D.M. Barch, J.D. Cohen: Cognition and control in schizophrenia: A computational model of dopamine and prefrontal function, Biol. Psychiatry 46, 312-328 (1999)

35.205 S. Monsell: Task switching, Trends Cog. Sci. 7, 134-140 (2003)

35.206 T.S. Braver, J.R. Reynolds, D.I. Donaldson: Neural mechanisms of transient and sustained cognitive control during task switching, Neuron 39, 713-726 (2003)

35.207 H. Imamizu, T. Kuroda, T. Yoshioka, M. Kawato: Functional magnetic resonance imaging examination of two modular architectures for switching multiple internal models, J. Neurosci. 24, 1173-1181 (2004)

35.208 R.P. Cooper, T. Shallice: Contention scheduling and the control of routine activities, Cogn. Neuropsychol. 17, 297-338 (2000)

35.209 R.P. Cooper, T. Shallice: Hierarchical schemas and goals in the control of sequential behavior, Psychol. Rev. 113, 887-916 (2006)

35.210 P. Dayan: Images, frames, and connectionist hierarchies, Neural Comput. 18, 2293-2319 (2006)

35.211 P. Dayan: Simple substrates for complex cognition, Front. Neurosci. 2, 255-263 (2008)

35.212 S. Helie, R. Sun: Incubation, insight, and creative problem solving: A unified theory and a connectionist model, Psychol. Rev. 117, 994-1024 (2010)

35.213 S. Grossberg, L.R. Pearson: Laminar cortical dynamics of cognitive and motor working memory, sequence learning and performance: Toward a unified theory of how the cerebral cortex works, Psychol. Rev. 115, 677-732 (2008)

35.214 B.J. Rhodes, D. Bullock, W.B. Verwey, B.B. Averbeck, M.P.A. Page: Learning and production of movement sequences: Behavioral, neurophysiological, and modeling perspectives, Human Mov. Sci. 23, 683-730 (2004)

35.215 B. Ans, Y. Coiton, J.-C. Gilhodes, J.-L. Velay: A neural network model for temporal sequence learning and motor programming, Neural Netw. 7, 1461-1476 (1994)

35.216 R.S. Bapi, D.S. Levine: Modeling the role of frontal lobes in sequential task performance. I: Basic structure and primacy effects, Neural Netw. 7, 1167-1180 (1994)

35.217 J.G. Taylor, N.R. Taylor: Analysis of recurrent cortico-basal ganglia-thalamic loops for working memory, Biol. Cybern. 82, 415-432 (2000)

35.218 R.P. Cooper: Mechanisms for the generation and regulation of sequential behaviour, Philos. Psychol. 16, 389-416 (2003)

35.219 R. Nishimoto, J. Tani: Learning to generate combinatorial action sequences utilizing the initial sensitivity of deterministic dynamical systems, Neural Netw. 17, 925-933 (2004)




35.220 P.F. Dominey: From sensorimotor sequence to grammatical construction: evidence from simulation and neurophysiology, Adapt. Behav. 13, 347-361 (2005)

35.221 E. Salinas: Rank-order-selective neurons form a temporal basis set for the generation of motor sequences, J. Neurosci. 29, 4369-4380 (2009)

35.222 S. Vasa, T. Ma, K.V. Byadarhaly, M. Perdoor, A.A. Minai: A Spiking Neural Model for the Spatial Coding of Cognitive Response Sequences, Proc. IEEE Int. Conf. Develop. Learn., Ann Arbor (2010) pp. 140-146

35.223 F. Chersi, P.F. Ferrari, L. Fogassi: Neuronal chains for actions in the parietal lobe: A computational model, PLoS ONE 6, e27652 (2011)

35.224 M.R. Silver, S. Grossberg, D. Bullock, M.H. Histed, E.K. Miller: A neural model of sequential movement planning and control of eye movements: Item-order-rank working memory and saccade selection by the supplementary eye fields, Neural Netw. 26, 29-58 (2011)

35.225 B.J. Baars: A Cognitive Theory of Consciousness (Cambridge Univ. Press, Cambridge 1988)

35.226 B.J. Baars, S. Franklin: How conscious experience and working memory interact, Trends Cog. Sci. 7, 166-172 (2003)

35.227 S. Franklin, F.G.J. Patterson: The LIDA Architecture: Adding New Modes of Learning to an Intelligent, Autonomous, Software Agent, IDPT-2006 Proc. (Integrated Design and Process Technology) (Society for Design and Process Science, San Diego 2006)

35.228 S. Dehaene, J.-P. Changeux: The Wisconsin card sorting test: Theoretical analysis and modeling in a neuronal network, Cereb. Cortex 1, 62-79 (1991)

35.229 S. Dehaene, L. Naccache: Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework, Cognition 79, 1-37 (2001)

35.230 L.R. Iyer, S. Doboli, A.A. Minai, V.R. Brown, D.S. Levine, P.B. Paulus: Neural dynamics of idea generation and the effects of priming, Neural Netw. 22, 674-686 (2009)

35.231 L.R. Iyer, V. Venkatesan, A.A. Minai: Neurocognitive spotlights: Configuring domains for ideation, Proc. Int. Conf. Neural Netw. (2011) pp. 2961-2968

35.232 S. Doboli, A.A. Minai, P.J. Best: Latent attractors: A model for context-dependent place representations in the hippocampus, Neural Comput. 12, 1003-1037 (2000)

35.233 R. Ratcliff: A theory of memory retrieval, Psychol. Rev. 85, 59-108 (1978)

35.234 F.G. Ashby: A biased random-walk model for two choice reaction times, J. Math. Psychol. 27, 277-297 (1983)

35.235 J.R. Busemeyer, J.T. Townsend: Decision field theory, Psychol. Rev. 100, 432-459 (1993)

35.236 J.L. McClelland, D.E. Rumelhart: An interactive activation model of context effects in letter perception. Part 1: An account of basic findings, Psychol. Rev. 88, 375-407 (1981)

35.237 D.E. Rumelhart, J.L. McClelland: An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model, Psychol. Rev. 89, 60-94 (1982)

35.238 M. Usher, J.L. McClelland: The time course of perceptual choice: The leaky, competing accumulator model, Psychol. Rev. 108, 550-592 (2001)

35.239 J.I. Gold, M.N. Shadlen: The neural basis of decision making, Annu. Rev. Neurosci. 30, 535-574 (2007)

35.240 M. Khamassi, L. Lachèze, B. Girard, A. Berthoz, A. Guillot: Actor-critic models of reinforcement learning in the basal ganglia: From natural to artificial rats, Adapt. Behav. 13, 131-148 (2005)

N.D. Daw, K. Doya: The computational neurobiol- 
ogy of learning and reward, Curr. Opin. Neurobiol. 
16, 199-204 (2006) 

P. Dayan, Y. Niv: Reinforcement learning: The 
Good, The Bad and The Ugly, Curr. Opin. Neuro- 
biol. 18, 185-196 (2008) 

K. Doya: Modulators of decision making, Nat. 
Neurosci. 11, 410-416 (2008) 

E.M. Izhikevich: Solving the distal reward problem 
through linkage of STDP and dopamine signaling, 
Cereb. Cortex 17, 2443-2452 (2007) 

R. Urbanczik, W. Senn: Reinforcement learning in 
populations of spiking neurons, Nat. Neurosci. 12, 
250-252 (2009) 

D. Durstewitz, J.K. Seamans, T.J. Sejnowski: 
Dopamine mediated stabilization of delay-period 
activity in a network model of prefrontal cortex, 
J. Neurophysiol. 83, 1733-1750 (2000) 

D. Durstewitz, J.K. Seamans: The computational 
role of dopamine D1 receptors in working mem- 
ory, Neural Netw. 15, 561-572 (2002) 

J.J. Hopfield: Neural networks and physical sys- 
tems with emergent collective computational 
abilities, Proc. Natl. Acad. Sci. USA 79, 2554-2558 
(1982) 

D.J. Amit, N. Brunel: Learning internal repre- 
sentations in an attractor neural network with 
analogue neurons, Netw. Comput. Neural Syst. 6, 
359-388 (1995) 

D.J. Amit, N. Brunel: Model of global spontaneous 
activity and local structured activity during de- 
lay periods in the cerebral cortex, Cereb. Cortex 7, 
237-252 (1997) 

X.J. Wang: Synaptic basis of cortical persistent ac- 
tivity: The importance of NMDA receptors to work- 
ing memory, J. Neurosci. 19, 9587-9603 (1999) 

D. Durstewitz, M. Kelc, 0. Gunturkun: A neuro- 
computational theory of the dopaminergic mod- 
ulation of working memory functions, J. Neurosci. 
19, 2807-2822 (1999) 

N. Brunel, X.-J. Wang: Effects of neuromodula- 
tion in a cortical network model of object work- 


681 


SE | d Hed 


682 PartD 


Neural Networks 


SE | d Hed 


35.254 


35.255 


35.256 


35.257 


ing memory dominated by recurrent inhibition, 
J. Comput. Neurosci. 11, 63-85 (2001) 

J.D. Cohen, T.S. Braver, J.W. Brown: Computational 
perspectives on dopamine function in prefrontal 
cortex, Curr. Opin. Neurobiol. 12, 223-229 (2002) 
D. Servan-Schreiber, H. Printz, J.D. Cohen: A net- 
work model of catecholamine effects: Gain, 
signal-to-noise ratio, and behavior, Science 249, 
892-895 (1990) 

S. Hochreiter, J. Schmidhuber: Long short-term 
memory, Neural Comput. 9, 1735-1780 (1997) 

S.L. Moody, S.P. Wise, G. di Pellegrino, D. Zipser: 
A model that accounts for activity in primate 


35.258 


35.259 


35.260 


frontal cortex during a delayedmatch-to-sample 
task, J. Neurosci. 18, 399-410 (1998) 

R. Hahnloser, R.J. Douglas, M. Mahowald, 
K. Hepp: Feedback interactions between 
neuronal pointers and maps for attentional 
processing, Nat. Neurosci. 2, 746-752 (1999) 

A. Compte, N. Brunel, P.S. Goldman-Rakic, X.- 
J. Wang: Synaptic mechanisms and network dy- 
namics underlying spatial working memory in 
a cortical network model, Cereb. Cortex 10, 910- 
923 (2000) 

G. Mongillo, 0. Barak, M. Tsodyks: Synaptic theory 
of working memory, Science 319, 1543-1546 (2008) 


36. Cognitive Architectures and Agents 


Sebastien Hélie, Ron Sun 


A cognitive architecture is the essential structures 
and processes of a domain-generic computational 
cognitive model used for a broad, multiple-level, 
multiple domain analysis of cognition and behav- 
ior. This chapter reviews some of the most popular 
psychologically-oriented cognitive architectures, 
namely adaptive control of thought-rational 
(ACT-R), Soar, and CLARION. For each cognitive ar- 
chitecture, an overview of the model, some key 
equations, and a detailed simulation example 
are presented. The example simulation with ACT-R
is the initial learning of the past tense of
irregular verbs in English (developmental psy- 
chology), the example simulation with Soar is the 
well-known missionaries and cannibals problem 
(problem solving), and the example simulation 
with CLARION is a complex mine field navigation 
task (autonomous learning). This presentation is 
followed by a discussion of how cognitive ar- 
chitectures can be used in multi-agent social 
simulations. A detailed cognitive social simula- 
tion with CLARION is presented to reproduce results 
from organizational decision-making. The chapter 
concludes with a discussion of the impact of neural 
network modeling on cognitive architectures and 
a comparison of the different models. 


36.1 Background
  36.1.1 Outline
36.2 Adaptive Control of Thought-Rational (ACT-R)
  36.2.1 The Perceptual-Motor Modules
  36.2.2 The Goal Module
  36.2.3 The Declarative Module
  36.2.4 The Procedural Memory
  36.2.5 Simulation Example: Learning Past Tenses of Verbs
36.3 Soar
  36.3.1 Architectural Representation
  36.3.2 The Soar Decision Cycle
  36.3.3 Impasses
  36.3.4 Extensions
  36.3.5 Simulation Example: Learning in Problem Solving
36.4 CLARION
  36.4.1 The Action-Centered Subsystem
  36.4.2 The Non-Action-Centered Subsystem
  36.4.3 The Motivational and Meta-Cognitive Subsystems
  36.4.4 Simulation Example: Minefield Navigation
36.5 Cognitive Architectures as Models of Multi-Agent Interaction
36.6 General Discussion
References


36.1 Background


Cognitive theories are often underdetermined by data [36.1]. As such,
different theories, with very little in common, can sometimes be used
to account for the very same phenomena [36.2]. This problem can be
resolved by adding constraints to cognitive theories. The most
intuitive approach to adding constraints to any scientific theory is
to collect more data. While experimental psychologists have come
a long way toward this goal in over a century of psychology research,
the gap between empirical and theoretical progress is still
significant.

Another tactic that can be adopted toward constraining psychological
theories is unification [36.1]. Newell argued that more data could be
used to constrain a theory if the theory was designed to explain
a wider range of phenomena. In particular, these unified (i.e.,
integrative) cognitive theories could be put to the test against
well-known (stable) regularities that have been observed in
psychology. So far, these integrative theories have taken the form of
cognitive architectures, and some of them have been very successful
in explaining a wide range of data.

A cognitive architecture is the essential struc- 
tures and processes of a domain-generic computa- 
tional cognitive model used for a broad, multiple- 
level, multiple-domain analysis of cognition and be- 
havior [36.3]. Specifically, cognitive architectures deal 
with componential processes of cognition in a struc- 
turally and mechanistically well-defined way. Their 
function is to provide an essential framework to fa- 
cilitate more detailed exploration and understanding 
of various components and processes of the mind. In 
this way, a cognitive architecture serves as an initial 
set of assumptions to be used for further development. 
These assumptions may be based on available empirical 
data (e.g., psychological or biological), philosophical 
thoughts and arguments, or computationally inspired 
hypotheses concerning psychological processes. A cog- 
nitive architecture is useful and important precisely 
because it provides a comprehensive initial frame- 
work for further modeling and simulation in many task 
domains. 

While there are all kinds of cognitive architectures 
in existence, in this chapter we are concerned specif- 
ically with psychologically-oriented cognitive archi- 
tectures (as opposed to software engineering-oriented 
cognitive architectures, e.g., LIDA [36.4], or neurally- 
oriented cognitive architectures, e.g., ART [36.5]). 
Psychologically-oriented cognitive architectures are 
particularly important because they shed new light on 
human cognition and, therefore, they are useful tools 
for advancing the understanding of cognition. In un- 
derstanding cognitive phenomena, the use of computa- 
tional simulation on the basis of cognitive architectures 
forces one to think in terms of processes, and in terms 
of details. Instead of using vague, purely conceptual 
theories, cognitive architectures force theoreticians to 
think clearly. They are, therefore, critical tools in the 
study of the mind. Cognitive psychologists who use 
cognitive architectures must specify a cognitive mecha- 
nism in sufficient detail to allow the resulting models 
to be implemented on computers and run as simula- 
tions. This approach requires that important elements 
of the models be spelled out explicitly, thus aiding in 
developing better, conceptually clearer theories. It is 
certainly true that more specialized, narrowly-scoped
models may also serve this purpose, but they are not
as generic and as comprehensive and thus may not be
as useful to the goal of producing general intelligence.

It is also worth noting that psychologically-oriented
cognitive architectures are the antithesis of expert systems.
Instead of focusing on capturing performance
in narrow domains, they aim to provide broad
coverage of a wide variety of domains in a way that
mimics human performance [36.6]. While they may
not always perform as well as expert systems, busi- 
ness and industrial applications of intelligent systems 
increasingly require broadly-scoped systems that are 
capable of a wide range of intelligent behaviors, not 
just isolated systems of narrow functionalities. For ex- 
ample, one application may require the inclusion of 
capabilities for raw image processing, pattern recog- 
nition, categorization, reasoning, decision-making, and 
natural language communications. It may even require 
planning, control of robotic devices, and interactions 
with other systems and devices. Such requirements ac- 
centuate the importance of research on broadly-scoped 
cognitive architectures that perform a wide range of 
cognitive functionalities across a variety of task do- 
mains (as opposed to more specialized systems). 

In order to achieve general computational intel- 
ligence in a psychologically-realistic way, cognitive 
architectures should include only minimal initial struc- 
tures and independently learn from their own experi- 
ences. Autonomous learning is an important way of 
developing additional structure, bootstrapping all the 
way to a full-fledged cognitive model. In so doing, 
it is important to be careful to devise only minimal 
initial learning capabilities that are capable of boot- 
strapping, in accordance with whatever phenomenon 
is modeled [36.7]. This can be accomplished through 
environmental cues, structures, and regularities. The 
avoidance of overly complicated initial structures, and 
thus the inevitable use of autonomous learning, may of- 
ten help to avoid overly representational models that are 
designed specifically for the task to be achieved [36.3]. 
Autonomous learning is thus essential in achieving gen- 
erality in a psychologically-realistic way. 


36.1.1 Outline 


The remainder of this chapter is organized as follows. The next three
sections review some of the most popular cognitive architectures that
are used in psychology and cognitive science. Specifically, Sect. 36.2
reviews ACT-R, Sect. 36.3 reviews Soar, and Sect. 36.4 reviews
CLARION. Each of these sections includes an overview of the
architecture, some key equations, and a detailed simulation example.
Following this presentation, Sect. 36.5 discusses how cognitive
architectures can be used in multi-agent cognitive social simulations
and presents a detailed example with CLARION. Finally, Sect. 36.6
presents a general discussion and compares the models reviewed.


36.2 Adaptive Control of Thought-Rational (ACT-R) 


ACT-R is one of the oldest and most successful cog- 
nitive architectures. It has been used to simulate and 
implement many cognitive tasks and applications, such 
as the Tower of Hanoi, game playing, aircraft control,
and human-computer interaction [36.8]. ACT-R
is based on three key ideas [36.9]:


a) Rational analysis 

b) The distinction between procedural and declarative 
memories 

c) A modular structure linked with communication 
buffers (Fig. 36.1). 


According to the rational analysis of cognition (first key idea;
[36.10]), the cognitive architecture is optimally tuned to its
environment (within its computational limits). Hence, the functioning
of the architecture can be understood by investigating how optimal
behavior in a particular environment would be implemented. According
to Anderson, such optimal adaptation is achieved through evolution
[36.10].

The second key idea, the distinction between declar- 
ative and procedural memories, is implemented by 
having different modules in ACT-R, each with its own 
representational format and learning rule. Briefly, pro- 
cedural memory is represented by production rules 
(similar to a production system) that can act on the en- 
vironment through the initiation of motor actions. In 
contrast, declarative memory is passive and uses chunks 
to represent world knowledge that can be accessed by 
the procedural memory but does not interact directly 
with the environment through motor actions. 

The last key idea in ACT-R is modularity. As seen in Fig. 36.1,
procedural memory (i.e., the production system) cannot directly
access information from the other modules: the information has to go
through dedicated buffers. Each buffer can hold a single chunk at any
time. Hence, buffers serve as information processing bottlenecks in
ACT-R. This restricts the amount of information available to the
production system, which in turn limits the processing that can be
done by this module at any given time. Processing within each module
is encapsulated. Hence, all the modules can operate in parallel
without much interference. The following subsections describe the
different ACT-R modules in more detail.

Fig. 36.1 General architecture of ACT-R. Components shown (with
associated brain regions): intentional module (not identified),
declarative module (temporal/hippocampus), goal buffer (DLPFC),
retrieval buffer (VLPFC), productions (basal ganglia) with matching
(striatum), selection (pallidum), and execution (thalamus), visual
buffer (DLPFC), manual buffer (VLPFC), visual module (occipital/etc),
and manual module (motor/cerebellum). DLPFC = dorsolateral prefrontal
cortex; VLPFC = ventrolateral prefrontal cortex (after [36.11], by
courtesy of the American Psychological Association)


36.2.1 The Perceptual-Motor Modules 


The perceptual-motor modules in ACT-R include a de- 
tailed representation of the output of perceptual sys- 
tems, and the input of motor systems [36.8]. The visual 
module is divided into the well-established ventral 
(what) and dorsal (where) visual streams in the primate 
brain. The main function of the dorsal stream is to find 
the location of features in a display (e.g., red-colored 
items, curved-shaped objects) without identifying the 
objects. The output from this module is a location 
chunk, which can be sent back to the central production 
system. The function of the ventral stream is to iden- 
tify the object at a particular location. For instance, the 
central production system could send a request to the 
dorsal stream to find a red object in a display. The dor- 
sal stream would search the display and return a chunk 
representing the location of a red object. If the central 
production system needs to know the identity of that 
object, the location chunk would be sent to the ven- 
tral stream. A chunk containing the object identity (e.g., 
a fire engine) would then be returned to the production 
system. 


36.2.2 The Goal Module 


The goal module serves as the context for keeping track
of cognitive operations and supplements environmental
stimulation [36.8]. For instance, one can do many different
operations with a pen picked up from a desk (e.g.,
write a note, store it in a drawer, etc.). What operation 
is selected depends primarily on the goal that needs to 
be achieved. If the current goal is to clean the desk, 
the appropriate action is to store the pen in a drawer. 
If the current goal is to write a note, putting the pen in 
a drawer is not a useful action. 

In addition to providing a mental context to select appropriate
production rules, the goal module can be used in more complex
problem-solving tasks that need subgoaling [36.8]. For instance, if
the goal is to play a game of tennis, one first needs to find an
opponent. The goal module must create this subgoal, which needs to be
achieved before moving back to the original goal (i.e., playing
a game of tennis). Note that goals are centralized in a unique module
in ACT-R and that production rules only have access to the goal
buffer. The current goal to be achieved is the one in the buffer,
while later goals stored in the goal module are not accessible to the
production rules. Hence, the "play a game of tennis" goal is not
accessible to the production rules while the "find an opponent"
subgoal is being pursued.
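The subgoaling behavior described above can be sketched in a few lines of Python. This is only an illustration: the class name `GoalModule`, its methods, and the choice of a stack as the storage structure are assumptions made for the example and are not taken from the ACT-R software.

```python
class GoalModule:
    """Hypothetical goal store: only the most recent goal is visible."""

    def __init__(self, goal):
        self._goals = [goal]          # pending goals, most recent last

    @property
    def buffer(self):
        """The single goal that production rules can see (None if empty)."""
        return self._goals[-1] if self._goals else None

    def push_subgoal(self, goal):
        """Create a subgoal; it hides the goal that spawned it."""
        self._goals.append(goal)

    def achieve(self):
        """Mark the current goal as achieved and return to the previous one."""
        return self._goals.pop()


goals = GoalModule("play a game of tennis")
goals.push_subgoal("find an opponent")
visible_during_subgoal = goals.buffer   # only the subgoal is accessible
goals.achieve()
visible_afterwards = goals.buffer       # the original goal is back in the buffer
```

In this sketch the original goal is stored but invisible while the subgoal occupies the buffer, which mirrors the access restriction described in the text.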


36.2.3 The Declarative Module 


The declarative memory module contains knowledge 
about the world in the form of chunks [36.8]. Each 
chunk represents a piece of knowledge or a concept 
(e.g., fireman, bank, etc.). Chunks can be accessed 
effortfully by the central production system, and the 
probability of retrieving a chunk depends on the chunk 
activation 


$$P_i = \left[ 1 + e^{-\left( B_i + \sum_j W_j S_{ji} - \tau \right)/\epsilon} \right]^{-1} , \tag{36.1}$$

where $P_i$ is the probability of retrieving chunk $i$, $B_i$ is the
base-level activation of chunk $i$, $S_{ji}$ is the association
strength between chunks $j$ and $i$, $W_j$ is the amount of attention
devoted to chunk $j$, $\tau$ is the activation threshold, and
$\epsilon$ is a noise parameter. It is important to note that the
knowledge chunks in the declarative module are passive and do not do
anything on their own. The function of this module is to store
information so that it can be retrieved by the central production
system (which corresponds to procedural memory). Only in the central
production system can the knowledge be used for further reasoning or
to produce actions.
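As a rough numerical illustration of (36.1), the following Python sketch computes a retrieval probability, assuming the logistic form in which the activation of a chunk is its base-level term plus the attention-weighted association strengths. The function name, argument names, and numerical values are hypothetical and chosen only for the example.

```python
import math


def retrieval_probability(base_level, weights, strengths, threshold, noise):
    """Probability of retrieving a chunk, following the form of (36.1).

    base_level -- B_i, base-level activation of the chunk
    weights    -- W_j, attention devoted to each source chunk j
    strengths  -- S_ji, association strength from each chunk j to chunk i
    threshold  -- tau, the activation threshold
    noise      -- epsilon, the noise parameter (must be positive)
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    activation = base_level + sum(w * s for w, s in zip(weights, strengths))
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise))


# Hypothetical values: an activation exactly at threshold gives 0.5,
# and a higher base-level activation raises the probability.
p_at_threshold = retrieval_probability(1.0, [0.5, 0.5], [1.0, 1.0], 2.0, 0.4)
p_familiar = retrieval_probability(2.0, [0.5, 0.5], [1.0, 1.0], 2.0, 0.4)
```

A smaller noise parameter makes the transition around the threshold sharper, so retrieval becomes closer to all-or-none.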


36.2.4 The Procedural Memory 


The procedural memory is captured by the central pro- 
duction rule module and fills the role of a central 
executive processor. It contains a set of rules in which 
the conditions can be matched by the chunks in all the 
peripheral buffers and the output is a chunk that can 
be placed in one of the buffers. The production rules 
are chained serially, and each rule application takes 
a fixed amount of psychological time. The serial nature 
of central processing constitutes another information 
processing bottleneck in ACT-R. 

Because only one production rule can be fired at any given time, its
selection is crucial. In ACT-R, each production rule has a utility
value that depends on: a) its probability of achieving the current
goal, b) the value (importance) of the goal, and c) the cost of using
the production rule [36.8]. Specifically,

$$U_i = P_i G - C_i , \tag{36.2}$$

where $U_i$ is the utility of production rule $i$, $P_i$ is the
(estimated) probability that selecting rule $i$ will achieve the
goal, $G$ is the value of the current goal, and $C_i$ is the
(estimated) cost of rule $i$. The rule with the highest utility is
selected in every processing cycle, but the utility values can be
noisy, which can result in the selection of suboptimal rules [36.12].
Rule utilities are learned online by counting the number of times
that applying a rule has achieved the goal. Anderson [36.10] has
shown that the selection according to these counts is optimal in the
Bayesian sense. Also, production rules can be made more efficient by
using a process called production compilation [36.12, 13]. Briefly,
if two production rules are often fired in succession and the result
is positive, a new production rule is created which directly links
the conditions from the first production rule to the action following
the application of the second production rule. Hence, the processing
time is cut in half by applying only one rule instead of two.
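The utility-based selection in (36.2) can be sketched as follows. This is a minimal illustration, not ACT-R code: the function names, the rule names, the numerical values, and in particular the use of Gaussian noise are assumptions made for the example.

```python
import random


def utility(p_success, goal_value, cost):
    """Utility of a production rule, U_i = P_i * G - C_i, as in (36.2)."""
    return p_success * goal_value - cost


def select_rule(rules, goal_value, noise_sd=0.0, rng=None):
    """Return the name of the rule with the highest (noisy) utility.

    rules maps a rule name to (estimated success probability, estimated cost).
    Gaussian noise with standard deviation noise_sd is an assumption of
    this sketch; with noise_sd = 0 the best rule always wins.
    """
    if rng is None:
        rng = random.Random()

    def noisy_utility(name):
        p_success, cost = rules[name]
        return utility(p_success, goal_value, cost) + rng.gauss(0.0, noise_sd)

    return max(rules, key=noisy_utility)


# Hypothetical rules: "retrieve" has utility 8.0 and "guess" has utility 4.8.
rules = {"retrieve": (0.9, 1.0), "guess": (0.5, 0.2)}
best = select_rule(rules, goal_value=10.0)
```

Calling `select_rule` with a positive `noise_sd` occasionally returns the lower-utility rule, which is the mechanism by which suboptimal rules can be selected.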


36.2.5 Simulation Example: 
Learning Past Tenses of Verbs 


General computational intelligence requires the ab- 
straction of regularities to form new rules (e.g., rule 
induction). A well-known example in psychology is 
children learning English past tenses of verbs [36.13]. 
This classical result shows that children’s accuracy in 
producing the past tense of irregular verbs follows a U- 
shaped curve [36.14]. Early in learning, children have 
a separate memory representation for the past tense 
of each verb (and no conjugation rule). Hence, the 
past tenses of irregular verbs are used mostly correctly 
(Phase 1). After moderate training, children notice that 
most verbs can be converted to their past tense by 
adding the suffix -ed. This leads to the formulation of 
a default rule (e.g., to find the past tense of a verb, add 
the suffix -ed to the verb stem). This rule is a useful 
heuristic and works for all regular verbs in English. 
However, children tend to overgeneralize and (incor- 
rectly) apply the rule to irregular verbs. This leads 
to errors and the low point of the U-shaped curve 
(Phase 2). Finally, children learn that there are excep- 
tions to the default rule and memorize the past tense of 
irregular verbs. Performance improves again (Phase 3). 


In ACT-R, the early phase of training uses instance-based
retrieval (i.e., retrieving the chunk representing
each verb’s past tense using (36.1)). The focus of the
presentation is on the induction of the default rule, 
which is overgeneralized in Phase 2 and correctly ap- 
plied in Phase 3. This is accomplished by joining two 
production rules. First, consider the following memory 
retrieval rule used in Phase 1 [36.12]: 


1. Retrieve-past-tense: 
IF the goal is to find the past tense of a word w: 
THEN issue a request to declarative memory for the 
past tense of w. 


If a perfect match is retrieved from declarative 
memory, a second rule is used to produce the (probably 
correct) response. However, if Rule 1 fails to retrieve 
a perfect match, the verb past tense is unknown and an 
analogy rule is used instead [36.12]: 


2. Analogy-find-pattern:
IF the goal is to find the past tense of word w1;
AND the retrieval buffer contains past tense w2-suffix of w2:
THEN set the answer to w1-(w2-suffix).


This rule produces a form of generalization using 
an analogy. Because Rule 2 always follows Rule 1, they 
can be combined using production compilation [36.12]. 
Also, w2 is likely to be a regular verb, so w2-suffix 
is likely to be -ed. Hence, combining Rules 1 and 2 
yields [36.12]: 


3. Learned-rule: 
IF the goal is to find the past tense of word w: 
THEN set the answer to w-ed 


which is the default rule that can be used to accurately find the
past tense of regular verbs. The U-shaped curve representing the
performance of children learning irregular verbs can thus be
explained with ACT-R as follows [36.13]: In Phase 1, Rule 3 does not
exist and Rule 1 is applied to correctly conjugate irregular verbs.
In Phase 2, Rule 3 is learned and has proven useful with regular
verbs (thus increasing $P_i$ in (36.2)). Hence, it is often selected,
which leads to incorrect conjugation of irregular verbs. In Phase 3,
the irregular verbs become more familiar as more instances have been
encountered. This increases their base-level activation in
declarative memory ($B_i$ in (36.1)), which facilitates retrieval and
increases the likelihood that Rule 1 is selected to correctly
conjugate the irregular verbs. More details about this simulation can
be found in [36.13].
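The three rules of this example can be mimicked with a toy Python sketch. It is deliberately simplified and is not the simulation reported in [36.13]: the function names and the small memory dictionary are hypothetical, retrieval is treated as an exact lookup rather than the probabilistic process of (36.1), and suffixes are attached by plain string concatenation.

```python
def retrieve_past_tense(memory, verb):
    """Rule 1: request the stored past tense of verb; None if retrieval fails."""
    return memory.get(verb)


def analogy(verb, example_stem, example_past):
    """Rule 2: copy the suffix of a retrieved example onto the new verb.

    Assumes the example is regular, i.e., example_past starts with example_stem.
    """
    suffix = example_past[len(example_stem):]
    return verb + suffix


def learned_rule(verb):
    """Rule 3: the compiled default rule, correct only for regular verbs."""
    return verb + "ed"


# Hypothetical declarative memory with one regular and one irregular verb.
memory = {"walk": "walked", "go": "went"}

phase1 = retrieve_past_tense(memory, "go")      # retrieval succeeds: "went"
by_analogy = analogy("jump", "walk", "walked")  # unknown verb: "jumped"
phase2 = learned_rule("go")                     # overgeneralization: "goed"
```

Which of the rules fires in a given phase would be decided by the utility and activation mechanisms of (36.2) and (36.1); the sketch only shows what each rule produces once it has been selected.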




36.3 Soar 


Soar was the original unified theory of cognition pro- 
posed by Newell [36.1]. Soar has been used success- 
fully in many problem solving tasks such as Eight 
Puzzle, the Tower of Hanoi, Fifteen Puzzle, Think- 
a-Dot, and Rubik’s Cube. In addition, Soar has been 
used for many military applications such as train- 
ing models for human pilots and mission rehearsal 
exercises. According to the Soar theory of intelli- 
gence [36.15], human intelligence is an approxima- 
tion of a knowledge system [36.9]. Hence, the most 
important aspect of intelligence (natural or artifi- 
cial) is the use of all available knowledge [36.16], 
and failures of intelligence are failures of knowl- 
edge [36.17]. 

All intelligent behaviors can be understood in terms 
of problem solving in Soar [36.9]. As such, Soar is 
implemented as a set of problem-space computational 
models (PSCMs) that partition the knowledge in goal-relevant
ways [36.16]. Each PSCM implicitly contains
the representation of a problem space defined by a set 
of states and a set of operators that can be visual- 
ized using a decision tree [36.17]. In a decision tree 
representation, the nodes represent the states, and one 
moves around from state to state using operators (the 
branches/connections in the decision tree). The objec- 
tive of a Soar agent is to move from an initial state to 
one of the goal states, and the best operator is always 
selected at every time step [36.16]. If the knowledge 
in the model is insufficient to select a single best op- 
erator at a particular time step, an impasse is reached, 
and a new goal is created to resolve the impasse. This 
new goal defines its own problem space and set of 
operators. 


36.3.1 Architectural Representation 


The general architecture of Soar is shown in Fig. 36.2. 
The main structures are a working memory and a long- 
term memory. Working memory is a blackboard where 
all the relevant information for the current decision cy- 
cle is stored [36.17]. It contains a goal representation, 
perceptual information, and relevant knowledge that 
can be used as conditions to fire rules. The outcome 
of rule firing can also be added to the working mem- 
ory to cause more rules to fire. The long-term memory 
contains associative rules representing the knowledge 
in the system (in the form of IF → THEN rules). The
rules in long-term memory can be grouped/organized to 
form operators. 


36.3.2 The Soar Decision Cycle 


In every time step, Soar goes through a six-step deci- 
sion cycle [36.17]. The first step in Soar is to receive an 
input from the environment. This input is inserted into 
working memory. The second step is called the elabo- 
ration phase. During this phase, all the rules matching 
the content of working memory fire in parallel, and the 
result is put into working memory. This, in turn, can 
create a new round of parallel rule firing. The elabora- 
tion phase ends when the content of working memory is 
stable, and no new knowledge can be added in working 
memory by firing rules. 

The third step is the proposal of operators that are 
applicable to the content in working memory. If no op- 
erator is applicable to the content of working memory, 
an impasse is reached. Otherwise, the potential opera- 
tors are evaluated and ordered according to a symbolic 
preference metric. The fourth step is the selection of 
a single operator. If the knowledge does not allow for 
the selection of a single operator, an impasse is reached. 
The fifth step is to apply the operator. If the operator
does not result in a change of state, an impasse
is reached. Finally, the sixth step is the output of the
model, which can be an external (e.g., motor) or an in- 
ternal (e.g., more reasoning) action. 
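The six steps above can be sketched as a single Python function. This is a simplified illustration and not the Soar implementation: working memory is reduced to a set of strings, a numerical preference stands in for the symbolic preference metric, an impasse is signaled by an exception instead of creating a subgoal, and all names and example facts are hypothetical.

```python
class Impasse(Exception):
    """Raised when the cycle cannot propose, select, or apply an operator."""


def decision_cycle(wm, percept, elaborations, operators):
    """Run one simplified decision cycle; return (new working memory, output).

    wm           -- frozenset of facts currently in working memory
    percept      -- iterable of facts received from the environment
    elaborations -- functions mapping working memory to a set of new facts
    operators    -- (name, applicable(wm), preference, apply(wm)) tuples
    """
    wm = set(wm) | set(percept)                        # 1. input
    while True:                                        # 2. elaboration
        new = set().union(*(rule(wm) for rule in elaborations)) - wm
        if not new:
            break                                      # working memory is stable
        wm |= new
    proposed = [op for op in operators if op[1](wm)]   # 3. proposal
    if not proposed:
        raise Impasse("no operator is applicable")
    top = max(op[2] for op in proposed)                # 4. selection
    winners = [op for op in proposed if op[2] == top]
    if len(winners) != 1:
        raise Impasse("no single operator is preferred")
    name, _, _, apply_op = winners[0]
    result = set(apply_op(wm))                         # 5. application
    if result == wm:
        raise Impasse("operator did not change the state")
    return frozenset(result), name                     # 6. output


elaborations = [lambda wm: {"thirsty"} if "dry-mouth" in wm else set()]
operators = [
    ("drink", lambda wm: "thirsty" in wm and "has-water" in wm, 2,
     lambda wm: (wm - {"thirsty", "dry-mouth"}) | {"quenched"}),
    ("wait", lambda wm: True, 1, lambda wm: wm | {"waited"}),
]
state, action = decision_cycle(
    frozenset({"has-water"}), {"dry-mouth"}, elaborations, operators)
```

In this example the elaboration phase adds the fact "thirsty", both operators are proposed, and "drink" is selected because it has the higher preference.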


Fig. 36.2 The general architecture of Soar. Components shown:
long-term memory (procedural memory, semantic memory, episodic
memory), learning, decision process, working memory, and
perception/action. The subdivision of long-term memory is a new
addition in Soar 9


36.3.3 Impasses


When the immediate knowledge is insufficient to reach a goal, an
impasse is reached and a new goal is created to resolve the impasse.
Note that this subgoal produces its own problem space with its own
set of states and operators. If the subgoal reaches an impasse,
another subgoal is recursively created to resolve the second impasse,
and so on. There are four main types of impasses in Soar [36.17]:


1. No operator is available in the current state. 
   New goal: Find an operator that is applicable in the 
   current state. 
2. An operator is applicable in the current state, but its 
   application does not change the current state. 
   New goal: Modify the operator so that its application 
   changes the current state. Alternatively, the operator 
   can be modified so that it is no longer deemed 
   applicable in the current state. 
3. Two or more operators are applicable in the current 
   state, but none of them is preferred according to the 
   symbolic metric. 
   New goal: Further evaluate the options and make 
   one of the operators preferred to the others. 
4. More than one operator is applicable, and there is 
   knowledge in working memory favoring two or more 
   operators in the current state. 
   New goal: Resolve the conflict by removing from 
   working memory one of the contradictory preferences. 
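
The four impasse types listed above can be illustrated with a small classifier. This is only a sketch under stated assumptions, not Soar's actual decision procedure: the function name `classify_impasse`, the impasse labels, and the encoding of symbolic preferences as `(better, worse)` pairs are all hypothetical choices made for this example.

```python
def classify_impasse(state, operators, better_than, apply_op):
    """Return (impasse_label, operator); impasse_label is None if no impasse occurs.

    operators   -- operators proposed for the current state
    better_than -- set of (better, worse) pairs, a hypothetical preference encoding
    apply_op    -- function (state, operator) -> new state
    """
    candidates = list(operators)
    if not candidates:
        return "no_operator", None                 # impasse type 1

    # Contradictory preferences: a is preferred to b and b is preferred to a.
    for a, b in better_than:
        if a in candidates and b in candidates and (b, a) in better_than:
            return "conflict", None                # impasse type 4

    # Operators that no preference ranks below another candidate.
    dominated = {worse for (_, worse) in better_than}
    undominated = [op for op in candidates if op not in dominated]
    if len(undominated) != 1:
        return "tie", None                         # impasse type 3

    chosen = undominated[0]
    if apply_op(state, chosen) == state:
        return "no_change", chosen                 # impasse type 2
    return None, chosen


def toy_apply(state, op):
    # Toy operators on an integer state: "inc" changes it, "noop" does not.
    return state + 1 if op == "inc" else state
```

In this toy encoding, each returned label would correspond to the new goal that the architecture sets up to resolve the impasse.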


Regardless of which type of impasse is reached, re- 
solving an impasse is an opportunity for learning in 
Soar [36.17]. Each time a new result is produced while 
achieving a subgoal, a new rule associating the current 
state with the new result is added in the long-term mem- 
ory to ensure that the same impasse will not be reached 
again in the future. This new rule is called a chunk to 
distinguish it from rules that were precoded by the mod- 
eler (and learning is called chunking). 
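
In its simplest reading, chunking resembles caching the outcome of a subgoal so that the same impasse does not recur. The sketch below is a deliberately crude illustration of that idea: the class `ChunkingAgent`, its attributes, and the use of whole states as rule conditions are assumptions for this example (the description above does not say how the conditions of a chunk are determined).

```python
class ChunkingAgent:
    """Toy agent that stores the result of each resolved impasse as a new rule."""

    def __init__(self, solve_subgoal):
        self.rules = {}                  # long-term memory: state -> result
        self.solve_subgoal = solve_subgoal
        self.impasses = 0                # number of impasses encountered so far

    def respond(self, state):
        if state in self.rules:          # a precoded or learned rule applies
            return self.rules[state]
        self.impasses += 1               # no immediate knowledge: impasse
        result = self.solve_subgoal(state)
        self.rules[state] = result       # add a chunk linking state and result
        return result


agent = ChunkingAgent(solve_subgoal=lambda s: s * 2)
first = agent.respond(3)    # reaches an impasse and learns a chunk
second = agent.respond(3)   # answered directly by the learned chunk
```

After the second call the impasse counter is still 1, which is the behavior the paragraph above describes: the same impasse is not reached again.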


36.3.4 Extensions 


Unlike ACT-R (and CLARION, as described next), 
Soar was originally designed as an artificial intelligence 
model [36.16]. Hence, initially, more attention was paid 
to functionality and performance than to psychological 
realism. However, Soar has been used in psychology 
and Soar 9 has been extended to increase its psycho- 
logical realism and functionality [36.18]. This version 
of the architecture is illustrated in Fig. 36.2. First, the 
long-term memory has been further subdivided in correspondence 
with psychological theory [36.19]. The associative 
rules are now part of the procedural memory. 
In addition to the procedural memory, the long-term 
memory now also includes a semantic and an episodic 
memory. The semantic memory contains knowledge 
structures representing factual knowledge about the 
world (e.g., the earth is round), while the episodic 
memory contains a snapshot of the working memory 
representing an episode (e.g., Fido the dog is now sit- 
ting in front of me). 

At the subsymbolic level, Soar 9 includes activa- 
tions in working memory to capture recency/usefulness 
(as in ACT-R). In addition, Soar 9 uses non-symbolic 
(numerical) values to model operator preferences. 
These are akin to utility functions and are used when 
symbolic preferences are insufficient to select a single 
operator [36.20]. When numerical preferences are used, 
an operator is selected using a Boltzmann distribution 

$$ P(O_i) = \frac{e^{S(O_i)/\tau}}{\sum_j e^{S(O_j)/\tau}} \qquad (36.3) $$

where $P(O_i)$ is the probability of selecting operator i, 
$S(O_i)$ is the summed support (preference) for operator 
i, and $\tau$ is a randomness parameter. Numerical operator 
preferences can be learned using reinforcement learn- 
ing. 
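
The selection rule in (36.3) can be sketched in a few lines. This is a minimal illustration rather than Soar's implementation; the function names, the dictionary of operator supports, and the parameter name `temperature` (standing for the randomness parameter) are assumptions for this example.

```python
import math
import random


def boltzmann_probabilities(support, temperature):
    """Map each operator to its selection probability under (36.3).

    support     -- dict: operator -> summed numerical support S(O)
    temperature -- randomness parameter; larger values give more random choices
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the maximum support leaves the probabilities unchanged
    # but avoids overflow in exp() for large supports.
    top = max(support.values())
    weights = {op: math.exp((s - top) / temperature) for op, s in support.items()}
    total = sum(weights.values())
    return {op: w / total for op, w in weights.items()}


def select_operator(support, temperature, rng=random):
    """Draw one operator at random according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(support, temperature)
    ops = list(probs)
    return rng.choices(ops, weights=[probs[op] for op in ops], k=1)[0]


probs = boltzmann_probabilities({"a": 1.0, "b": 0.0}, temperature=1.0)
```

With supports 1 and 0 and a randomness parameter of 1, operator `a` is selected with probability e/(1 + e), about 0.73. Lowering the parameter makes the choice approach a deterministic maximum; raising it makes the choice approach uniform.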

Finally, recent work has been initiated to add a clus- 
tering algorithm that would allow for the creation of 
new symbolic structures and a visual imagery module 
to facilitate symbolic spatial reasoning (not shown in 
Fig. 36.2). Also, the inclusion of emotions is now being 
considered (via appraisal theory [36.21]). 


36.3.5 Simulation Example: 
Learning in Problem Solving 


Nason and Laird [36.20] proposed a variation of Soar 
that includes a reinforcement learning algorithm (fol- 
lowing the precedents of CLARION and ACT-R) to 
learn numerical preference values for the operators. In 
this implementation (called Soar-RL), the preferences 
are replaced by Q-values [36.22] that are learned using 
environmental feedback. After the Q-value of each relevant 
operator has been calculated (i.e., all the operators 
available in working memory), an operator is stochasti- 
cally selected (as in (36.3)). 
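
The text does not state which update rule Soar-RL uses to learn the Q-values. As a hedged illustration, the sketch below assumes a standard one-step temporal-difference update of the Q-learning kind; the function name `q_update`, the table layout, and the default learning rate and discount factor are assumptions for this example, not values taken from [36.20].

```python
def q_update(q, state, op, reward, next_state, next_ops, alpha=0.1, gamma=0.9):
    """Update and return the Q-value of applying `op` in `state`.

    q        -- dict: (state, operator) -> Q-value; missing entries count as 0.0
    next_ops -- operators available in `next_state` (empty if terminal)
    alpha    -- learning rate (assumed value)
    gamma    -- discount factor (assumed value)
    """
    # Value of the best operator available after the transition.
    best_next = max((q.get((next_state, o), 0.0) for o in next_ops), default=0.0)
    old = q.get((state, op), 0.0)
    q[(state, op)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, op)]


q_table = {}
after_reward = q_update(q_table, "s", "a", reward=1.0, next_state="t", next_ops=[])
```

The updated values would then play the role of the supports S(O) when an operator is drawn stochastically as in (36.3).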

Soar-RL has been used to simulate the missionaries 
and cannibals problem. The goal in this problem is to 
transport three missionaries and three cannibals across 
a river using a boat that can carry at most two persons 
at a time. Several trips are required, but the cannibals 
must never outnumber the missionaries on either riverbank. 
This problem has been used as a benchmark in 
problem-solving research because, if the desirability of 
a move is evaluated in terms of the number of people 
that have crossed the river (which is a common assumption), 
a step backward must be taken midway in solving 
the problem (i.e., a move that reduces the number of 
people who have crossed the river must be selected). 

In the Soar-RL simulation, the states were defined 
by the number of missionaries and cannibals on each 
side of the river and the location of the boat. The operators 
were boat trips transporting people, and the 
Q-values of the operators were randomly initialized. 
Also, to emphasize the role of reinforcement learning 
in solving this problem, chunking was disengaged. 
Hence, the only form of adaptation was the adjustment 
of the operator Q-values. Success states (i.e., all people 
crossed the river) were rewarded, failure states (i.e., 
cannibals outnumbering missionaries on a riverbank) 
were punished, and all other states received neutral 
reinforcement. 

Using this simulation methodology, Soar-RL generally 
learned to solve the missionaries and cannibals 
problem. Most errors resulted from the stochastic decision 
process [36.20]. Nason and Laird also showed 
that the model performance can be improved fivefold 
by adding a symbolic preference preventing an operator 
at time t from undoing the result of the application 
of an operator at time t − 1. More details on this simulation 
can be found in [36.20]. 


36.4 CLARION 


CLARION is an integrative cognitive architecture 
consisting of a number of distinct subsystems with 
a dual representational structure in each subsystem (implicit 
versus explicit representations). CLARION is the 
newest of the reviewed architectures, but it has already 
been successfully applied to several tasks such 
as navigation in mazes and minefields, human reasoning, 
creative problem solving, and cognitive social 
simulations. CLARION is based on the following basic 
principles [36.24]. First, humans can learn with or 
without much a priori specific knowledge to begin with, 
and humans learn continuously from on-going experience 
in the world. Second, there are different types 
of knowledge involved in human learning (e.g., procedural 
vs. declarative, implicit vs. explicit; [36.24]), 
and different types of learning processes are involved 
in acquiring different types of knowledge. Third, motivational 
processes as well as meta-cognitive processes 
are important and should be incorporated in a psychologically 
realistic cognitive architecture. According to 
CLARION, all three principles are required to achieve 
general computational intelligence. An overview of the 
architecture is shown in Fig. 36.3. 

Fig. 36.3 The CLARION architecture (components shown: input and output; ACS with action-centered explicit and implicit representations; NACS with non-action-centered explicit and implicit representations; MS with drives and goal structure; MCS with reinforcement, goal setting, filtering, selection, and regulation). ACS stands for the action-centered 
subsystem, NACS the non-action-centered subsystem, MS 
the motivational subsystem, and MCS the meta-cognitive 
subsystem (after [36.23]) 

The CLARION subsystems include the action-cen- 
tered subsystem (ACS), the non-action-centered subsys- 
tem (NACS), the motivational subsystem (MS), and the 
meta-cognitive subsystem (MCS). The role of the ACS 
is to control actions, regardless of whether the actions 
are for external physical movements or for internal men- 
tal operations. The role of the NACS is to maintain gen- 
eral knowledge. The role of the MS is to provide under- 
lying motivations for perception, action, and cognition, 
in terms of providing impetus and feedback (e.g., indi- 
cating whether or not outcomes are satisfactory). The 
role of the MCS is to dynamically monitor, direct, and 
modify the operations of the other subsystems. 




Each of these interacting subsystems consists of 
two levels of representations. In each subsystem, the 
top level encodes explicit (e.g., verbalizable) knowl- 
edge and the bottom level encodes implicit (e.g., non- 
verbalizable) knowledge. The two levels interact, for 
example, by cooperating through a combination of the 
action recommendations from the two levels, respec- 
tively, as well as by cooperating in learning through 
bottom-up and top-down processes. Essentially, it is 
a dual-process theory of mind [36.24]. 


36.4.1 The Action-Centered Subsystem 


The ACS is composed of a top and a bottom level. 
The bottom level of the ACS is modular. A number 
of neural networks co-exist, each of which is adapted 
to a specific modality, task, or group of input stimuli. 
These modules can be developed through interaction 
with the world (computationally, through various decomposition 
methods [36.25]). However, some of them 
are formed evolutionarily, reflecting hardwired instincts 
and propensities [36.26]. Because of these networks, 
CLARION is able to handle very complex situations 
that are not amenable to simple rules. 

In the top level of the ACS, explicit symbolic con- 
ceptual knowledge is captured in the form of explicit 
symbolic rules (for details, see [36.24]). There are many 
ways in which explicit knowledge may be learned, in- 
cluding independent hypothesis testing and bottom-up 
learning. The basic process of bottom-up learning is as 
follows: if an action implicitly decided by the bottom 
level is successful, then the model extracts an explicit 
rule that corresponds to the action selected by the bot- 
tom level and adds the rule to the top level. Then, 
in subsequent interactions with the world, the model 
verifies the extracted rule by considering the outcome 
of applying the rule. If the outcome is not successful, 
then the rule should be made more specific; if the out- 
come is successful, the agent may try to generalize the 
rule to make it more universal [36.27]. After explicit 
rules have been learned, a variety of explicit reason- 
ing methods may be used. Learning explicit conceptual 
representations at the top level can also be useful in 
enhancing learning of implicit reactive routines at the 
bottom level [36.7]. The action-decision cycle in the 
ACS can be described by the following steps: 


1. Observe the current state of the environment 
2. Compute the value of each possible action in the 
   current state in the bottom level 
3. Compute the value of each possible action in the 
   current state in the top level 
4. Choose an appropriate action by stochastically selecting 
   or combining the values in the top and 
   bottom levels 
5. Perform the selected action 
6. Update the top and bottom levels according to the 
   received feedback (if any) 
7. Go back to Step 1. 
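
The action-decision cycle above, together with bottom-up rule extraction, can be sketched as a toy loop. This is an illustration under strong simplifying assumptions and not CLARION's implementation: a lookup table stands in for the bottom-level networks, rules are plain state-to-action entries, a failed rule is simply dropped (a crude stand-in for rule specialization), and every name (`ToyACS`, `p_top`, `toy_environment`, and so on) is hypothetical.

```python
import math
import random


class ToyACS:
    """Toy two-level action-centered subsystem (illustrative only)."""

    def __init__(self, actions, p_top=0.5, temperature=0.2, alpha=0.5, seed=0):
        self.actions = list(actions)
        self.p_top = p_top              # chance of following a matching explicit rule
        self.temperature = temperature  # randomness of bottom-level selection
        self.alpha = alpha              # bottom-level learning rate
        self.q = {}                     # bottom level: (state, action) -> value
        self.rules = {}                 # top level: state -> action
        self.rng = random.Random(seed)

    def choose(self, state):
        # Steps 3-4: use the top-level rule with probability p_top if one matches.
        rule_action = self.rules.get(state)
        if rule_action is not None and self.rng.random() < self.p_top:
            return rule_action, "top"
        # Steps 2 and 4: otherwise draw from a Boltzmann distribution over values.
        values = [self.q.get((state, a), 0.0) for a in self.actions]
        best = max(values)
        weights = [math.exp((v - best) / self.temperature) for v in values]
        action = self.rng.choices(self.actions, weights=weights, k=1)[0]
        return action, "bottom"

    def update(self, state, action, source, reward):
        # Step 6: move the bottom-level value toward the received feedback.
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward - old)
        if source == "bottom" and reward > 0:
            self.rules[state] = action      # bottom-up rule extraction
        elif source == "top" and reward <= 0:
            self.rules.pop(state, None)     # drop a rule that failed


def run(acs, environment, state, steps):
    for _ in range(steps):                                # Step 7: repeat
        action, source = acs.choose(state)                # Steps 1-4
        reward, next_state = environment(state, action)   # Step 5
        acs.update(state, action, source, reward)         # Step 6
        state = next_state
    return acs


def toy_environment(state, action):
    # Single-state task in which only "right" is rewarded.
    return (1.0 if action == "right" else 0.0), state


acs = run(ToyACS(["left", "right"]), toy_environment, "s", steps=200)
```

In this toy task the explicit rule for state `s` is extracted from a successful bottom-level choice, mirroring the bottom-up learning process described earlier.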


36.4.2 The Non-Action-Centered Subsystem 


The NACS may be used for representing general knowl- 
edge about the world (i.e., the semantic memory and 
the episodic memory), and for performing various kinds 
of memory retrievals and inferences. The NACS is also 
composed of two levels (a top and a bottom level) and 
is under the control of the ACS (through its actions). 

At the bottom level, associative memory networks 
encode non-action-centered implicit knowledge. Asso- 
ciations are formed by mapping an input to an output 
(such as mapping 2 + 3 to 5). Backpropagation [36.7, 
28] or Hebbian [36.29] learning algorithms can be used 
to establish such associations between pairs of inputs 
and outputs. At the top level of the NACS, a general 
knowledge store encodes explicit non-action-centered 
knowledge [36.29, 30]. In this network, chunks (passive 
knowledge structures, similar to ACT-R) are specified 
through dimensional values (features). A node is set up 
in the top level to represent a chunk. The chunk node 
connects to its corresponding features represented as in- 
dividual nodes in the bottom level of the NACS [36.29, 
30]. Additionally, links between chunk nodes encode 
explicit associations between pairs of chunks, known 
as associative rules. Explicit associative rules may be 
learned in a variety of ways [36.24]. 

During reasoning, in addition to applying associa- 
tive rules, similarity-based reasoning may be employed 
in the NACS. Specifically, a known (given or inferred) 
chunk may be automatically compared with another 
chunk. If the similarity between them is sufficiently 
high, then the latter chunk is inferred. The similarity 
between chunks i and j is computed by using 


$$ S_{c_i \sim c_j} = \frac{n_{c_i \cap c_j}}{f(n_{c_j})} \qquad (36.4) $$

where $S_{c_i \sim c_j}$ is the similarity from chunk i to chunk j, $n_{c_i \cap c_j}$ 
counts the number of features shared by chunks i and j 
(i.e., the feature overlap), $n_{c_j}$ counts the total number 
of features in chunk j, and f(x) is a slightly superlinear, 
monotonically increasing, positive function (by 
default, a power of x with an exponent slightly greater 
than 1). Thus, similarity-based reasoning 
in CLARION is naturally accomplished using (1) top- 
down activation by chunk nodes of their corresponding 
bottom-level feature-based representations, (2) calcu- 
lation of feature overlap between any two chunks (as 
in (36.4)), and (3) bottom-up activation of the top- 
level chunk nodes. This kind of similarity calculation 
is naturally accomplished in a multi-level cognitive ar- 
chitecture and represents a form of synergy between the 
explicit and implicit modules. Each round of reasoning 
in the NACS can be described by the following steps: 


1. Propagate the activation of the activated features in 
   the bottom level 
2. Concurrently, fire all applicable associative rules in 
   the top level 
3. Integrate the outcomes of top and bottom-level processing 
4. Update the activations in the top and bottom levels 
   (e.g., similarity-based reasoning) 
5. Go back to Step 1 (if another round of reasoning is 
   requested by the ACS). 
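
The similarity measure in (36.4) is easy to compute when chunks are represented as sets of features. The sketch below is only an illustration: the function name, the set representation, and in particular the exponent 1.1 are assumptions for this example (the text only requires f to be slightly superlinear).

```python
def similarity(chunk_i, chunk_j, exponent=1.1):
    """Similarity from chunk i to chunk j as in (36.4).

    chunk_i, chunk_j -- iterables of hashable features
    exponent         -- exponent of the assumed power function f(x) = x ** exponent
    """
    features_i, features_j = set(chunk_i), set(chunk_j)
    if not features_j:
        raise ValueError("chunk j must have at least one feature")
    overlap = len(features_i & features_j)          # feature overlap
    return overlap / (len(features_j) ** exponent)  # divided by f(number of features in j)


robin = {"has_wings", "flies", "small"}
bird = {"has_wings", "flies"}
robin_to_bird = similarity(robin, bird)
bird_to_robin = similarity(bird, robin)
```

Because the denominator depends only on the target chunk, the measure is asymmetric: here the similarity from `robin` to `bird` is higher than from `bird` to `robin`, since `bird` has fewer features and all of them are shared.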


36.4.3 The Motivational 
and Meta-Cognitive Subsystems 


The motivational subsystem (MS) is concerned with 
drives and their interactions [36.31], which lead to ac- 
tions. It is concerned with why an agent does what 
it does. Simply saying that an agent chooses actions 
to maximize gains, rewards, reinforcements, or pay- 
offs leaves open the question of what determines these 
things. The relevance of the MS to the ACS lies pri- 
marily in the fact that it provides the context in which 
the goal and the reinforcement of the ACS are set. It 
thereby influences the working of the ACS, and by ex- 
tension, the working of the NACS. 

Dual motivational representations are in place in 
CLARION. The explicit goals (such as finding food) 
of an agent may be generated based on internal 
drives (for example, being hungry; see [36.32] for 
details). Beyond low-level drives (concerning physio- 
logical needs), there are also higher-level drives. Some 
of them are primary, in the sense of being hardwired, 
while others are secondary (derived) drives acquired 
mostly in the process of satisfying primary drives. 

The meta-cognitive subsystem (MCS) is closely 
tied to the MS. The MCS monitors, controls, and regulates 
cognitive processes for the sake of improving 
cognitive performance [36.33, 34]. Control and regulation 
may take the form of setting goals for the ACS, 
setting essential parameters of the ACS and the NACS, 
interrupting and changing on-going processes in the 
ACS and the NACS, and so on. Control and regulation 
can also be carried out through setting reinforcement 
functions for the ACS. All of the above can be done on 
the basis of drive activations in the MS. The MCS is 
also made up of two levels: the top level (explicit) and 
the bottom level (implicit). 


36.4.4 Simulation Example: 
Minefield Navigation 


Sun et al. [36.7] empirically tested and simulated 
a complex minefield navigation task. In the empirical 
task, the subjects were seated in front of a computer 
monitor that displayed an instrument panel containing 
several gauges that provided current information on the 
status/location of a vehicle. The subjects used a joy- 
stick to control the direction and speed of the vehicle. 
In each trial, a random mine layout was generated, 
and the subjects had limited time to reach a target location 
without hitting a mine. Control subjects were 
trained for several consecutive days in this task. Sun 
and colleagues also tested three experimental conditions 
with the same amount of training that emphasized 
verbalization, over-verbalization, and dual-tasking, 
respectively. The human results showed that learning was 
slower in the dual-task condition than in the single-task 
condition, and that a moderate amount of verbalization 
sped up learning. However, the effect of verbalization 
was reversed in the over-verbalization condition: 
over-verbalization interfered with (slowed down) learning. 

In the CLARION simulation, simplified (explicit) 
rules were represented in the form state → action in the 
top level of the ACS. In the bottom level of the ACS, 
a backpropagation network was used to (implicitly) 
learn the input-output function using reinforcement 
learning. Reinforcement was received at the end of ev- 
ery trial. The bottom-level information was used to 
create and refine top-level rules (with bottom-up learn- 
ing). The model started out with no specific a priori 
knowledge about the task (the same as a typical sub- 
ject). The bottom level contained randomly initialized 
weights. The top level started empty and contained no 
a priori knowledge about the task (either in the form of 
instructions or rules). The interaction of the two levels 
was not determined a priori either: there was no fixed 
weight in combining outcomes from the two levels. The 
weights were automatically set based on relative performance 
of the two levels on a periodic basis. The effects 
of the dual task and the various verbalization condi- 
tions were modeled using rule-learning thresholds so 
that more/less activities could occur at the top level. 
The CLARION simulation results closely matched the 
human results [36.7]. In addition, the human and simulated 
data were input into a common ANOVA and 
no statistically significant difference between human 
and simulated data was found in any of the conditions. 
Hence, CLARION did a good job of simulating detailed 
human data in the minefield navigation task. More de- 
tails about this simulation can be found in [36.7]. 


36.5 Cognitive Architectures as Models of Multi-Agent Interaction 


Most of the work in social simulation assumes rudimen- 
tary cognition on the part of agents. Agent models have 
frequently been custom-tailored to the task at hand, of- 
ten with a restricted set of highly domain-specific rules. 
Although this approach may be adequate for achieving 
the limited objectives of some social simulations, it is 
overall unsatisfactory. For instance, it limits the realism, 
and hence the applicability, of social simulation and, more 
importantly, it precludes any possibility of resolving 
the theoretical question of the micro-macro link. 

Cognitive models, especially cognitive architec- 
tures, may provide better grounding for understanding 
multi-agent interaction. This can be achieved by incor- 
porating realistic constraints, capabilities, and tenden- 
cies of individual agents in terms of their psychological 
processes (and maybe even in terms of their physical 
embodiment) and their interactions with their environ- 
ments (which include both physical and social envi- 
ronments). Cognitive architectures make it possible to 
investigate the interaction of cognition/motivation on 
the one hand and social institutions and processes on 
the other, through psychologically realistic agents. The 
results of the simulation may demonstrate significant 
interactions between cognitive-motivational factors and 
social-environmental factors. Thus, when trying to un- 
derstand social processes and phenomena, it may be 
important to take the psychology of individuals into 
consideration given that detailed computational mod- 
els of cognitive agents that incorporate a wide range 
of psychological functionalities have been developed in 
cognitive science. 

For example, Sun and Naveh simulated an organi- 
zational classification decision-making task using the 
CLARION cognitive architecture [36.35]. In a classifi- 
cation decision-making task, agents gather information 
about problems, classify them, and then make further 
decisions based on the classification. In this case, the 
task is to determine whether a blip on a screen is a hos- 
tile aircraft, a flock of geese, or a civilian aircraft. In 
each case, there is a single object in the airspace. The 
object has nine different attributes, each of which can 
take on one of three possible values (e.g., its speed 
can be low, medium, or high). An organization must 
determine the status of an observed object: whether 
it is friendly, neutral, or hostile. There are a total of 
19683 possible objects, and 100 problems are chosen 
randomly from this set. 
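
The count of 19683 follows directly from the task description: nine attributes with three possible values each give 3^9 = 19683 distinct objects. The short sketch below checks this and draws 100 problems; the attribute values, the variable names, and the choice of sampling without replacement are assumptions for this example (the text does not say whether a problem may be drawn twice).

```python
import itertools
import random

N_ATTRIBUTES = 9
VALUES = ("low", "medium", "high")   # three possible values per attribute

# Every object is one combination of values across the nine attributes.
all_objects = list(itertools.product(VALUES, repeat=N_ATTRIBUTES))
n_objects = len(all_objects)         # 3 ** 9

# Draw 100 distinct problems (sampling without replacement is an assumption).
rng = random.Random(0)
problems = rng.sample(all_objects, 100)
```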

Critically, no one single agent has access to all the 
information necessary to make a choice. Decisions are 
made by integrating separate decisions made by differ- 
ent agents, each of which is based on a different subset 
of information. In terms of organizational structures, 
there are two major types of interest: teams and hier- 
archies. In teams, decision-makers act autonomously, 
individual decisions are treated as votes, and the organi- 
zation decision is the majority decision. In hierarchies, 
agents are organized in a chain of command, such that 
information is passed from subordinates to superiors, 
and the decision of a superior is based solely on the 
recommendations of his/her subordinates. In this task, 
only a two-level hierarchy with nine subordinates and 
one supervisor is considered. 

In addition, organizations are distinguished by the 
structure of information accessible to each agent. There 
are two types of information access: distributed access, 
in which each agent sees a different subset of three at- 
tributes (no two agents see the same subset of three 
attributes), and blocked access, in which three agents 
see exactly the same subset of attributes. In both cases, 
each attribute is accessible to three agents. 
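
The two organizational structures and the two information-access schemes can be made concrete with a small sketch. It is an illustration only: the particular distributed assignment (each agent sees three consecutive attributes, wrapping around) is one of many that satisfy the constraints stated above, the tie-breaking rule in the vote is arbitrary, and all function names are hypothetical.

```python
from collections import Counter


def distributed_access(n_agents=9, n_attributes=9, k=3):
    """Each agent sees a different subset of k attributes (one possible scheme)."""
    return [tuple(sorted((a + d) % n_attributes for d in range(k)))
            for a in range(n_agents)]


def blocked_access(n_agents=9, k=3):
    """Agents are grouped in blocks of k that all see the same k attributes."""
    return [tuple(range((a // k) * k, (a // k) * k + k)) for a in range(n_agents)]


def team_decision(votes):
    """Team: the organizational decision is the majority (plurality) vote.

    Ties go to the option encountered first, an arbitrary choice for this sketch.
    """
    return Counter(votes).most_common(1)[0][0]


def hierarchy_decision(recommendations, supervisor_policy):
    """Hierarchy: the supervisor decides solely from subordinate recommendations."""
    return supervisor_policy(tuple(recommendations))


def coverage(assignment):
    """Count how many agents can see each attribute."""
    return Counter(attr for subset in assignment for attr in subset)


distributed = distributed_access()
blocked = blocked_access()
votes = ["hostile"] * 5 + ["friendly"] * 3 + ["neutral"]
```

In both schemes every attribute is visible to exactly three agents; they differ only in whether agents share identical subsets. In the simulations described here the supervisor is itself a learning agent, so `supervisor_policy` stands in for whatever mapping that agent has learned.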

The human experiments by Carley et al. [36.36] 
were done in a 2 × 2 fashion (organization × information 
access). The data showed that humans generally per- 
formed better in team situations, especially when dis- 
tributed information access was in place. Moreover, 
distributed information access was generally better 
than blocked information access. The worst perfor- 
mance occurred when hierarchical organizational struc- 
ture and blocked information access were used in 
conjunction. 




The results of the CLARION simulations closely 
matched the patterns of the human data, with teams outperforming 
hierarchical structures, and distributed access 
being superior to blocked access. As in the human data, 
the effects of organization and information access were 
present, but more importantly the interaction of these 
two factors with length of training was reproduced. 
These interactions reflected the following trends: (1) the 
superiority of team and distributed information access 
at the start of the learning process and (2) either the disappearance 
or reversal of these trends towards the end. 
These trends persisted robustly across a wide variety of 
settings of cognitive parameters, and did not critically 
depend on any one setting of these parameters. Also, 
as in humans, performance was not grossly skewed to- 
wards one condition or the other. 


36.5.1 Extension 


One advantage of using a more cognitive agent in so- 
cial simulations is that we can address the question of 
what happens when cognitive parameters are varied. 
Because CLARION captures a wide range of cognitive 
processes, its parameters are generic (rather than task 
specific). Thus, one has the opportunity of studying so- 
cial and organizational issues in the context of a general 
theory of cognition. Below we present some of the re- 
sults observed (details can be found in [36.35]). 
Varying the parameter controlling the probability of 
selecting implicit versus explicit processing in CLAR- 
ION interacted with the length of training. Explicit rule 
learning was far more useful at the early stages of 
learning, when increased reliance on rules tended to 
boost performance (compared with performance toward 
the end of the learning process). This is because ex- 
plicit rules are crisp guidelines that are based on past 
success, and as such, they provide a useful anchor at 
the uncertain early stages of learning. However, by the 
end of the learning process, they become no more reliable 
than highly trained networks. This corresponds 
to findings in human cognition, where there are indications 
that rule-based learning is more widely used in 
the early stages of learning, but is later increasingly 
supplanted by similarity-based processes and skilled 
performance [36.37, 38]. Such trends may partially explain 
why hierarchies did not perform well initially; 
because a hierarchy’s supervisor was burdened with 
a higher input dimensionality, it took a longer time to 
encode rules (which were, nevertheless, essential at the 
early stages of learning). 

Another interesting result was the effect of varying 
the generalization threshold. The generalization 
threshold determines how readily an agent generalizes 
a successful rule. It was better to have a higher rule 
generalization threshold than a lower one (up to a point). 
That is, if one restricts the generalization of rules to 
those rules that have proven to be relatively successful, 
the result is a higher-quality rule set, which leads to 
better performance in the long run. 

This CLARION simulation showed that some cognitive 
parameters (e.g., learning rate) had a monolithic, 
across-the-board effect under all conditions, while in 
other cases, complex interactions of factors were at 
work (see [36.35] for full details of the analysis). This 
illustrates the importance of limiting one’s social simulation 
conclusions to the specific cognitive context in 
which human data were obtained (in contrast to the 
practice of some existing social simulations). By using 
CLARION, Sun and Naveh [36.35] were able to accurately 
capture organizational performance data and, 
moreover, to formulate deeper explanations for the results 
observed. In cognitive architectures, one can vary 
parameters and options that correspond to cognitive 
processes and test their effects on collective performance. 
In this way, cognitive architectures may be used 
to predict human performance in social/organizational 
settings and, furthermore, to help to improve collective 
performance by prescribing optimal or near-optimal 
cognitive abilities for individuals for specific collective 
tasks and/or organizational structures. 


36.6 General Discussion 


This chapter reviewed the most popular psychologically-oriented 
cognitive architectures with some example 
applications in human developmental learning, 
problem-solving, navigation, and cognitive social simulations. 
ACT* (ACT-R’s early version; [36.39]) and 
Soar [36.15] were some of the first cognitive architectures 
available and have been around since the early 
1980s, while CLARION was first proposed in the 
mid-1990s [36.30]. This chronology is crucial when ex- 
ploring their learning capacity. ACT* and Soar were 
developed before the connectionist revolution [36.40], 
and were, therefore, implemented using knowledge-rich 
production systems [36.41]. In contrast, CLARION 
was proposed after the connectionist revolution and was 
implemented using neural networks. While some at- 
tempts have been made to implement ACT-R [36.42] 
and Soar [36.43] with neural networks, these archi- 
tectures remain mostly knowledge-rich production sys- 
tems grounded in the artificial intelligence tradition. 
One of the most important impacts of the connectionist 
revolution has been data-driven learning rules (e.g., 
backpropagation) that allow for autonomous learning. 
CLARION was created within this tradition, and every 
component in CLARION has been implemented us- 
ing neural networks. For instance, explicit knowledge 
may be implemented using linear, two-layer neural net- 
works [36.7, 23, 28, 29], while implicit knowledge has 
been implemented using nonlinear multilayer back- 
propagation networks in the ACS [36.7, 29] and recur- 
rent associative memory networks in the NACS [36.23, 
29]. This general philosophy has also been applied to 
modeling the MS and the MCS using linear (explicit) 
and nonlinear (implicit) neural networks [36.44]. As 
such, CLARION requires less pre-coded knowledge 
to achieve its goals, and can be considered more au- 
tonomous. 

While the different cognitive architectures were 
motivated by different problems and took different im- 
plementation approaches, they share some theoretical 
similarities. For instance, Soar is somewhat similar to 
the top levels of CLARION. It contains production rules 
that fire in parallel and cycle until a goal is reached. In 
CLARION, top-level rules in the NACS fire in parallel 
in cycles (under the control of the ACS). However, 
CLARION includes a distinction between action-centered 
and non-action-centered knowledge. While this 
distinction has been added in Soar 9 [36.18], the additional 
distinction between explicit and implicit knowledge 
(one of the main assumptions in CLARION) was 
not. The inclusion of implicit knowledge in the bottom 
level of CLARION allows for an automatic representation 
of similarity-based reasoning, which is absent in 
Soar. While Soar can certainly account for similarity-based 
reasoning, adding an explicit (and ad hoc) representation 
of similarity can become cumbersome when 
a large number of items are involved. 

ACT-R initially took a different approach. Work on 
the ACT-R cognitive architecture has clearly focused on 
psychological modeling from the very beginning and, 
as such, it includes more than one long-term memory 
store, distinguishing between procedural and declarative 
memories (similar to CLARION). In addition, 
ACT-R has a rudimentary representation of explicit and 
implicit memories: explicit memory is represented by 
symbolic structures (i.e., chunks and production rules), 
while implicit memory is represented by the activation 
of the structures. In contrast, the distinction between explicit 
and implicit memories in CLARION is one of the 
main focuses of the architecture, and a more detailed 
representation of implicit knowledge has allowed for 
a natural representation of similarity-based reasoning as 
well as natural simulations of many psychological data 
sets [36.7, 28, 29]. Yet, ACT-R memory structures have 
been adequate for simulating many data sets in over 
30 years of research. Future work should be devoted to 
a detailed comparison of ACT-R, Soar, and CLARION 
using a common set of tasks to more accurately compare 
their modeling paradigms, capacities, and limits. 


References 


36.1 A. Newell: Unified Theories of Cognition (Harvard 
Univ. Press, Cambridge 1990) 

36.2 S. Roberts, H. Pashler: How persuasive is a good 
fit? A comment on theory testing, Psychol. Rev. 107, 
358-367 (2000) 

36.3 R. Sun: Desiderata for cognitive architectures, Phi- 
los. Psychol. 17, 341-373 (2004) 

36.4 S. Franklin, F.G. Patterson Jr.: The Lida architecture: 
Adding new modes of learning to an intelligent, 
autonomous, software agent, Integr. Design Pro- 
cess Technol. IDPT-2006, San Diego (Society for 
Design and Process Science, San Diego 2006) p. 8 

36.5 G.A. Carpenter, S. Grossberg: A massively parallel 
architecture for a self-organizing neural pattern 
recognition machine, Comput. Vis. Graph. Image 
Process. 37, 54-115 (1987) 


36.6 P. Langley, J.E. Laird, S. Rogers: Cognitive architec- 
tures: Research issues and challenges, Cogn. Syst. 
Res. 10, 141-160 (2009) 

36.7 R. Sun, E. Merrill, T. Peterson: From implicit skills 
to explicit knowledge: A bottom-up model of skill 
learning, Cogn. Sci. 25, 203-244 (2001) 

36.8 J.R. Anderson, D. Bothell, M.D. Byrne, S. Douglass, 
C. Lebiere, Y. Qin: An integrated theory of the mind, 
Psychol. Rev. 111, 1036-1060 (2004) 

36.9 N.A. Taatgen, J.R. Anderson: Constraints in cog- 
nitive architectures. In: The Cambridge Hand- 
book of Computational Psychology, ed. by R. Sun 
(Cambridge Univ. Press, New York 2008) pp. 170- 
185 

36.10 J.R. Anderson: The Adaptive Character of Thought 
(Erlbaum, Hillsdale 1990) 



36.11 J.R. Anderson, D. Bothell, M.D. Byrne, S. Douglass,
C. Lebiere, Y. Qin: An integrated theory of the mind,
Psychol. Rev. 111, 1037 (2004)

36.12 N.A. Taatgen, C. Lebiere, J.R. Anderson: Modeling
paradigms in ACT-R. In: Cognition and Multi-Agent
Interaction: From Cognitive Modeling to Social
Simulation, ed. by R. Sun (Cambridge Univ. Press,
New York 2006) pp. 29-52

36.13 N.A. Taatgen, J.R. Anderson: Why do children learn
to say "broke"? A model of learning the past tense
without feedback, Cognition 86, 123-155 (2002)

36.14 G.F. Marcus, S. Pinker, M. Ullman, M. Hollander,
T.J. Rosen, F. Xu: Overregularization in language
acquisition, Monogr. Soc. Res. Child Dev. 57, 1-182
(1992)

36.15 J.E. Laird, A. Newell, P.S. Rosenbloom: Soar: An
architecture for general intelligence, Artif. Intell. 33,
1-64 (1987)

36.16 J.F. Lehman, J. Laird, P. Rosenbloom: A Gentle
Introduction to Soar, an Architecture for Human
Cognition (University of Michigan, Ann Arbor 2006)

36.17 R.E. Wray, R.M. Jones: Considering Soar as an agent
architecture. In: Cognition and Multi-Agent
Interaction: From Cognitive Modeling to Social
Simulation, ed. by R. Sun (Cambridge Univ. Press,
New York 2006) pp. 53-78

36.18 J.E. Laird: Extending the Soar cognitive
architecture, Proc. 1st Conf. Artif. General Intell.
(IOS Press, Amsterdam 2008) pp. 224-235

36.19 D.L. Schacter, A.D. Wagner, R.L. Buckner: Memory
systems of 1999. In: The Oxford Handbook of Memory,
ed. by E. Tulving, F.I.M. Craik (Oxford Univ.
Press, New York 2000) pp. 627-643

36.20 S. Nason, J.E. Laird: Soar-RL: Integrating
reinforcement learning with Soar, Cogn. Syst. Res. 6,
51-59 (2005)

36.21 K.R. Scherer: Appraisal considered as a process of
multi-level sequential checking. In: Appraisal
Processes in Emotion: Theory, Methods, Research, ed.
by K.R. Scherer, A. Schor, T. Johnstone (Oxford Univ.
Press, New York 2001) pp. 92-120

36.22 C. Watkins: Learning from Delayed Rewards
(Cambridge Univ., Cambridge 1990)

36.23 R. Sun, S. Hélie: Psychologically realistic cognitive
agents: Taking human cognition seriously, J. Exp.
Theor. Artif. Intell. 25(1), 65-92 (2013)

36.24 R. Sun: Duality of the Mind: A Bottom-up Approach
Toward Cognition (Lawrence Erlbaum Associates,
Mahwah 2002)

36.25 R. Sun, T. Peterson: Multi-agent reinforcement
learning: Weighting and partitioning, Neural Netw.
12, 127-153 (1999)

36.26 L. Hirschfield, S. Gelman (Eds.): Mapping the Mind:
Domain Specificity in Cognition and Culture
(Cambridge Univ. Press, Cambridge 1994)

36.27 R. Michalski: A theory and methodology of
inductive learning, Artif. Intell. 20, 111-161 (1983)


36.28 R. Sun, P. Slusarz, C. Terry: The interaction of
the explicit and the implicit in skill learning:
A dual-process approach, Psychol. Rev. 112, 159-192
(2005)

36.29 S. Hélie, R. Sun: Incubation, insight, and creative
problem solving: A unified theory and a
connectionist model, Psychol. Rev. 117, 994-1024
(2010)

36.30 R. Sun: Integrating Rules and Connectionism for
Robust Commonsense Reasoning (Wiley, New York
1994)

36.31 F. Toates: Motivational Systems (Cambridge Univ.
Press, Cambridge 1986)

36.32 R. Sun: Motivational representations within a
computational cognitive architecture, Cogn. Comput. 1,
91-103 (2009)

36.33 T. Nelson (Ed.): Metacognition: Core Readings (Allyn
and Bacon, Boston 1993)

36.34 J.D. Smith, W.E. Shields, D.A. Washburn: The
comparative psychology of uncertainty monitoring
and metacognition, Behav. Brain Sci. 26, 317-373
(2003)

36.35 R. Sun, I. Naveh: Simulating organizational
decision-making using a cognitively realistic agent
model, J. Artif. Soc. Soc. Simul. 7(3) (2004)

36.36 K.M. Carley, M.J. Prietula, Z. Lin: Design versus
cognition: The interaction of agent cognition
and organizational design on organizational
performance, J. Artif. Soc. Soc. Simul. 1 (1998)

36.37 S. Hélie, J.G. Waldschmidt, F.G. Ashby: Automaticity
in rule-based and information-integration
categorization, Atten. Percept. Psychophys. 72,
1013-1031 (2010)

36.38 S. Hélie, J.L. Roeder, F.G. Ashby: Evidence for
cortical automaticity in rule-based categorization,
J. Neurosci. 30, 14225-14234 (2010)

36.39 J.R. Anderson: The Architecture of Cognition
(Harvard Univ. Press, Cambridge 1983)

36.40 D. Rumelhart, J. McClelland, The PDP Research
Group (Eds.): Parallel Distributed Processing:
Explorations in the Microstructure of Cognition. Vol.
1: Foundations (MIT Press, Cambridge 1986)

36.41 S. Russell, P. Norvig: Artificial Intelligence: A
Modern Approach (Prentice Hall, Upper Saddle River
1995)

36.42 C. Lebiere, J.R. Anderson: A connectionist
implementation of the ACT-R production system, Proc.
15th Annu. Conf. Cogn. Sci. Soc. (Lawrence Erlbaum
Associates, Hillsdale 1993) pp. 635-640

36.43 B. Cho, P.S. Rosenbloom, C.P. Dolan: Neuro-Soar:
A neural-network architecture for goal-oriented
behavior, Proc. 13th Annu. Conf. Cogn. Sci.
Soc. (Lawrence Erlbaum Associates, Hillsdale 1991)
pp. 673-677

36.44 N. Wilson, R. Sun, R. Mathews: A motivationally-based
simulation of performance degradation under
pressure, Neural Netw. 22, 502-508 (2009)


37. Embodied Intelligence 


Angelo Cangelosi, Josh Bongard, Martin H. Fischer, Stefano Nolfi 


Embodied intelligence is the computational ap- 
proach to the design and understanding of 
intelligent behavior in embodied and situated 
agents through the consideration of the strict 
coupling between the agent and its environment 
(situatedness), mediated by the constraints of the 
agent's own body, perceptual and motor system, 
and brain (embodiment). The emergence of the 
field of embodied intelligence is closely linked 
to parallel developments in computational in- 
telligence and robotics, where the focus is on 
morphological computation and sensory-motor 
coordination in evolutionary robotics models, and 
in neuroscience and cognitive sciences where 
the focus is on embodied cognition and devel- 
opmental robotics models of embodied symbol 
learning. This chapter provides a theoretical 
and technical overview of some principles of 
embodied intelligence, namely morphological 
computation, sensory-motor coordination, and 
developmental embodied cognition. It will also
discuss some tutorial examples on the modeling
of body/brain/environment adaptation for the
evolution of morphological computational agents,
evolutionary robotics model of navigation and object
discrimination, and developmental robotics
models of language and numerical cognition in
humanoid robots.


37.1 Introduction to Embodied Intelligence
37.2 Morphological Computation for Body-Behavior Coadaptation
37.2.1 The Counterintuitive Nature of Morphological Computation
37.2.2 Evolution and Morphological Computation
37.3 Sensory-Motor Coordination in Evolving Robots
37.3.1 Enabling the Discovery of Simple Solutions
37.3.2 Accessing and Generating Information Through Action
37.3.3 Channeling the Course of the Learning Process
37.4 Developmental Robotics for Higher Order Embodied Cognitive Capabilities
37.4.1 Embodied Cognition and Developmental Robots
37.4.2 Embodied Language Learning
37.4.3 Number and Space
37.5 Conclusion
References


37.1 Introduction to Embodied Intelligence 


Organisms are not isolated entities which develop their 
sensory-motor and cognitive skills in isolation from 
their social and physical environment, and indepen- 
dently from their motor and sensory systems. On the 
contrary, behavioral and cognitive skills are dynamical 
properties that unfold in time and arise from a large 
number of interactions between the agents’ nervous
system, body, and environment [37.1-7]. Embodied
intelligence is the computational approach to the design
and understanding of intelligent behavior in embodied
and situated agents through the consideration of
the strict coupling between the agent and its environment
(situatedness), mediated by the constraints of the
agent’s own body, perceptual and motor system, and
brain (embodiment).

Historically, the field of embodied intelligence
originated from the development and use of bio-inspired
computational intelligence methodologies in computer
science and robotics, and from efforts to overcome the
limitations of the symbolic approaches typical of classical
artificial intelligence. As argued in Brooks’ [37.2]
seminal paper Elephants don’t play chess, the study
of apparently simple behaviors, such as locomotion
and motor control, permits an understanding of the
embodied nature of intelligence without the requirement
to start from higher order abstract skills such as those
involved in chess playing algorithms. Moreover, the
emergence of the field of embodied intelligence is
closely linked to parallel developments in robotics, with
the focus on morphological computation and sensory-motor
coordination in evolutionary and developmental
robotics models, and in neuroscience and cognitive
sciences, with the focus on embodied cognition
(EC).

The phenomenon of morphological computation 
concerns the observation that a robot’s (or animal’s) 
body plan may perform computations: A body plan 
that allows the robot (or animal) to passively exploit 
interactions with its environment may perform compu- 
tations that lead to successful behavior; in another body 
plan less well suited to the task at hand, those com- 
putations would have to be performed by the control 
policy [37.8-10]. If both the body plans and control 
policies of robots are evolved, evolutionary search may 
find robots that exhibit more morphological computa- 
tion than an equally successful robot designed by hand 
(see more details in Sect. 37.2). 

The principle of sensory-motor coordination,
which concerns the relation between the characteristics
of the agents’ control policy and the behaviors
emerging from agent/environmental interactions, has
been demonstrated in numerous evolutionary robotics
models [37.6]. Experiments have shown how adaptive
agents can acquire an ability to coordinate their
sensory and motor activity so as to self-select their
forthcoming sensory experiences. This sensory-motor
coordination can play several key functions such as
enabling the agent to access the information necessary to
make the appropriate behavioral decision, elaborating
sensory information, and reducing the complexity of the
agents’ task to a manageable level. These two themes
will be exemplified through the illustration of evolutionary
robotics experiments in Sect. 37.3 in which the
fine-grained characteristics of the agents’ neural control
system and body are subjected to variations (e.g., gene
mutation) and in which variations are retained or discarded
on the basis of their effects at the level of the
overall behavior exhibited by the agent in interaction
with the environment.

In cognitive and neural sciences, the term em- 
bodied cognition (EC) [37.11, 12] is used to refer to 
systematic relationships between an organism’s cogni- 
tive processes and its perceptual and response reper- 
toire. Notwithstanding the many interpretations of this 
term [37.13], the broadest consensus of the proponents 
of EC is that our knowledge representations encom- 
pass the bodily activations that were present when 
we initially acquired this knowledge (for differentiations,
see [37.14]). This view helps us to understand the
many findings of modality-specific biases induced by
cognitive computations. Examples of EC in psychology
and cognitive science can be sensory-motor (e.g., a systematic
increase in comparison time with angular disparity
between two views of the same object [37.15]), or
conceptual (e.g., better recall of events that were experi- 
enced in the currently adopted body posture [37.16]), or 
emotional in nature (e.g., interpersonal warmth induced 
by a warm handheld object [37.17]). Such findings were 
hard to accommodate under the more traditional views 
where knowledge was presumed symbolic, amodal and 
abstract and thus dissociated from sensory input and 
motor output processes. 

Embodied cognition experiments in psychology 
have inspired the design of developmental robotics 
models [37.18] which exploit the ontogenetic inter- 
action between the developing (baby) robot and its 
social and physical environment to acquire both simple
sensory-motor control strategies and higher order
capabilities such as language and number learning
(Sect. 37.4).

To provide the reader with both a theoretical and
technical understanding of the principles of morphological
computation, sensory-motor coordination, and
developmental EC, the following three sections will
review the progress in these fields, and analyze in detail
some key studies as examples. The presentation of
studies on the modeling of both sensory-motor tasks
(such as locomotion, navigation, and object discrimination)
and of higher order cognitive capabilities (such
as linguistic and numerical cognition) demonstrates the
impact of embodied intelligence in the design of a variety
of perceptual, motor, and cognitive skills.


37.2 Morphological Computation for Body-Behavior Coadaptation 


Embodied intelligence dictates that there are certain
body plans and control policies that, when combined,
will produce some desired behavior. For example,
imagine that the desired task is active categorical
perception (ACP) [37.19, 20]. ACP requires a learner to
actively interact with objects in its environment to clas- 
sify those objects. This stands in contrast to passive 
categorization whereby an agent observes objects from 
a distance — perhaps it is fed images of objects or views 
them through a camera — and labels the objects accord- 
ing to their perceived class. In order for an animal or 
robot to perform ACP, it must not only possess a con- 
trol policy that produces as output the correct class for 
the object being manipulated, but also some manipula- 
tor with which to physically affect (and be affected by) 
the object. 

One consequence of embodied intelligence is that 
certain pairings of body and brain produce the desired 
behavior, and others do not. Returning to the example 
of ACP, if a robot’s arm is too short to reach the objects 
then it obviously will not be able to categorize them. 
Imagine now a second robot that possesses an arm of 
the requisite length but can only bring the back of its 
hand into contact with the objects. Even if this robot’s 
control policy distinguishes between round and edged 
objects based on the patterned firing of touch sensors 
embedded in its palm, this robot will also not be able to 
perform ACP. 

A further consequence of embodied intelligence is 
that some body plans may require a complex control 
policy to produce successful behavior, while another 
body plan may require a simpler control policy. This has 
been referred to as the morphology and control tradeoff 
in the literature [37.7]. Continuing the ACP example, 
consider a third robot that can bring its palm and fingers 
into contact with the objects, but only possesses a single 
binary touch sensor in its palm. In order to distinguish 
between round and edged objects, this robot will require 
a control policy that performs some complex signal pro- 
cessing on the time series data produced by this single 
sensor during manipulation. A fourth robot, however,
equipped with multiple tactile sensors embedded in its 
palm and fingers, may be able to categorize objects im- 
mediately after grasping them: Perhaps round objects 
produce characteristic static patterns of tactile signals 
that are markedly different from those patterns pro- 
duced when grasping edged objects. 
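
The contrast between the third and fourth robots can be made concrete with a minimal sketch. The sensor encodings, thresholds, and function names below are hypothetical and chosen only for illustration: the single-sensor robot has to process a time series (here, by counting contact changes), whereas the multi-sensor robot can read the class off a single static snapshot.

```python
from typing import Sequence


def classify_single_sensor(series: Sequence[int], max_transitions: int = 2) -> str:
    """Robot with one binary palm sensor: the control policy must process time.

    Toy assumption: while an edged object is turned in the hand, contact
    flickers on and off as corners pass the sensor, so many transitions
    in the time series indicate an edged object.
    """
    transitions = sum(1 for a, b in zip(series, series[1:]) if a != b)
    return "edged" if transitions > max_transitions else "round"


def classify_sensor_array(snapshot: Sequence[int], min_active: int = 4) -> str:
    """Robot with many tactile sensors: one static grasp pattern suffices.

    Toy assumption: a round object touches most sensors of a closed hand,
    while an edged object touches only a few of them.
    """
    return "round" if sum(snapshot) >= min_active else "edged"


if __name__ == "__main__":
    print(classify_single_sensor([1, 0, 1, 0, 1, 0]))  # flickering contact
    print(classify_sensor_array([1, 1, 1, 1, 1, 0]))   # broad static contact
```

In this toy setting the richer body (more sensors) lets the policy drop the temporal bookkeeping entirely, which is the sense in which morphology and control are traded against each other.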

The morphology and control tradeoff, however,
raises the question as to what is being traded. It has
been argued that what is being traded is computation
[37.7, 8]. If two robots succeed at a given task,
and each robot is equipped with the simplest control 
policy that will allow that robot to succeed, but one 
control policy performs fewer computations than the 
other control policy, then the body plan of the robot 


equipped with the simpler control policy must perform 
the missing computations required to succeed at the 
task. 

This phenomenon of a robot’s (or animal’s) body 
plan performing computation has been termed morphological
computation [37.8-10]. Paul [37.8] outlined
a theoretical robot that uses its body to compute the 
XOR function. In another study [37.9] it was shown 
how the body of a vacuum cleaning robot could literally 
replace a portion of its artificial neural network con- 
troller, thus subsuming the computation normally per- 
formed by that part of the control policy into the robot’s 
body. Pfeifer and Gomez [37.21] describe a number of 
other robots that exhibit the phenomenon of morpho- 
logical computation. 
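
To illustrate how a body could compute XOR, the following sketch shows one possible arrangement; it is an assumption made for illustration and is not taken from [37.8]. Two single-layer perceptrons, neither of which can compute XOR on its own, drive two motors: one turns a wheel (logical OR of the inputs) and the other lifts that wheel off the ground (logical AND). The hypothetical body moves only when the wheel turns while touching the ground, so the observable behavior is the XOR of the inputs.

```python
def perceptron(weights, bias, inputs):
    """Single linear threshold unit: outputs 1 if the weighted sum exceeds 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


def robot_moves(a: int, b: int) -> int:
    """Return 1 if the hypothetical robot moves for binary inputs a and b."""
    drive_motor = perceptron((1, 1), -0.5, (a, b))  # computes OR
    lift_motor = perceptron((1, 1), -1.5, (a, b))   # computes AND
    # The "morphological" step: a lifted wheel cannot push the robot,
    # so motion occurs only if the wheel is driven AND still on the ground.
    wheel_on_ground = lift_motor == 0
    return int(drive_motor == 1 and wheel_on_ground)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, robot_moves(a, b))
```

The nonlinearity that a controller would need a hidden layer for is supplied here by the physical interaction between wheel and ground.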


37.2.1 The Counterintuitive Nature 
of Morphological Computation 


All of the robots outlined by Pfeifer and Gomez [37.21] 
were designed manually; in some cases the control policies
were automatically optimized. If for each task there
is a spectrum of robot body plan/control policy pairings
that achieve the task, one might ask where along
this spectrum the human-designed robots fall. That is, 
what mixtures of morphological computation and con- 
trol computation do human designers tend to favor? 
The bulk of the artificial intelligence literature, since 
the field’s beginnings in the 1950s, seems to indicate 
that humans exhibit a cognitive chauvinism: we tend 
to favor control complexity over morphological com- 
plexity. Classical artificial intelligence dispensed with 
the body altogether: it was not until the 1980s that the 
role of morphology in intelligent behavior was explic- 
itly stated [37.2]. As a more specific example, object 
manipulation was first addressed by creating rigid, ar- 
ticulated robot arms that required complex control poli- 
cies to succeed [37.22]. Later, it was realized that soft 
manipulators could simplify the amount of control re- 
quired for successful manipulation (e.g., [37.23]). Most 
recently, a class of robot manipulators known as jamming
grippers was introduced [37.24]. In a jamming
gripper, a robot arm is tipped with a bag of granular 
material such that when air is removed from the bag 
the grains undergo a phase transition into a jammed, 
solid-like state. The control policies for jamming grip- 
pers are much simpler than those required for rigid or 
even soft multifingered dexterous manipulators: at the 
limit, the controller must switch the manipulator be- 
tween just two states (grip or release), regardless of the 
object. 




Despite the fact that the technology for creating 
jamming grippers has existed for decades, it took a long 
time for this class of manipulators to be discovered. 
In other branches of robotics, one can discern a sim- 
ilar historical pattern: new classes of robot body plan 
were successively proposed that required less and less 
explicit control. In the field of legged locomotion for 
example, robots with whegs (wheel-leg hybrids) were 
shown to require less explicit control than robots with 
legs to enable travel over rough terrain [37.25]. 

These observations suggest that robots with more 
morphological computation are less intuitive for hu- 
mans to formulate and then design than robots with 
less morphological computation. However, there may 
be a benefit to creating robots that exhibit significant 
amounts of morphological computation. For example, 
hybrid dynamic walkers require very little control and 
are much more energy efficient compared to fully ac- 
tuated legged robots [37.26]. It has been argued that 
tensegrity robots also require relatively little control 
compared to robots composed of serially linked rigid 
components, and this class of robot has several desir- 
able properties such as the ability to absorb and recover 
from external perturbations [37.9]. 

So, if robots that exhibit morphological compu- 
tation are desirable, yet it is difficult for humans to 
navigate in this part of the space of possible robots, can 
an automated search method be used to discover such 
robots? 


37.2.2 Evolution and Morphological 
Computation 


One of the advantages of using evolutionary algorithms 
to design robots, compared to machine learning meth- 
ods, is that both the body plan and the control policy can 
be placed under evolutionary control [37.27]. Typically, 
machine learning methods optimize some of the param- 
eters of a control policy with a fixed topology. However, 
if the body plans and control policies of robots are 
evolved, and there is sufficient variation within the 
population of evolving robots, search may discover 
multiple successful robots that exhibit varying degrees 
of morphological computation. Or, alternatively, if mor- 
phological computation confers a survival advantage 
within certain contexts, a phylogeny of robots may 
evolve that exhibit increasing amounts of morpholog- 
ical computation. 
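
A minimal sketch of placing both body and control under evolutionary control is given below. Everything in it is an assumption made for illustration: the genome holds one hypothetical body parameter (`leg_length`) and one hypothetical control parameter (`gain`), and the fitness function is a toy stand-in for a physics simulation. Because body and control are mutated together, search can settle on any pairing along the tradeoff that scores well.

```python
import random


def fitness(genome):
    """Toy fitness: reward a target stride (body x control) and a moderate leg."""
    leg_length, gain = genome
    stride = leg_length * gain
    return -((stride - 0.5) ** 2) - 0.1 * (leg_length - 1.0) ** 2


def mutate(genome, rng, sigma=0.1):
    """Gaussian mutation applied to the body gene and the control gene alike."""
    return tuple(g + rng.gauss(0.0, sigma) for g in genome)


def evolve(generations=50, pop_size=20, seed=0):
    """Truncation selection with elitism: the best half survives unchanged."""
    rng = random.Random(seed)
    population = [(rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print("best genome (leg_length, gain):", best)
    print("best fitness per generation is non-decreasing:",
          all(b >= a for a, b in zip(history, history[1:])))
```

A method that optimized only `gain` for a fixed `leg_length` would explore a single slice of this space, which is the limitation the paragraph above attributes to optimizing a control policy for a fixed body.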

A recent pair of experiments illustrates how morphological
computation may be explored. An evolutionary
algorithm was employed to evolve the body plans
and control policies of robots that must move in one of
two environments. The first environment included nothing
other than a flat, high-friction ground plane
(Fig. 37.1a). The second environment was composed
of a number of low-friction bars that sit atop the
high-friction ground plane (Fig. 37.1b-d). These bars can
be thought of as ice distributed across a flat landscape.
In order for robots to move across the icy terrain,
they must evolve appendages that are able to reach
down between the icy blocks, come into contact with
the high-friction ground, and push or pull themselves
forward.

Fig. 37.1a-d A sample of four evolved robots with differing
amounts of morphological complexity. (a) A simple-shaped
robot that evolved to locomote over flat ground.
(b-d) Three sample robots, more morphologically complex
than the robot in (a), that evolved in icy environments
(after Auerbach and Bongard [37.28]). To view videos of
these robots see [37.29]

It was found that robots evolved to travel over the 
ice had more complex shapes than those evolved to 
travel over flat ground (compare the robot in Fig. 37.1a
to those in Fig. 37.1b-d) [37.28]. However, it was
also found that the robots that travel over ice had 
fewer mechanical degrees of freedom (DOFs) than the 
robots evolved to travel over flat ground [37.30]. If 
a robot possesses fewer mechanical DOFs, one can 
conclude that it has a simpler control policy, because 
there are fewer motors to control. It seems that the 
robots evolved to travel over ice do so in the follow- 
ing manner: the complex shapes of their appendages 
cause the appendages to reach down into the crevices 
between the ice without explicit control; the simple control
policy then simply sweeps the appendages back and
forth, horizontally, to in effect skate along the tops of
the ice. In contrast, robots evolved to travel over flat 
ground must somehow push back, reach up, and pull 
forward — using several mechanical DOFs — to move 
forward. 

One could conclude from these experiments that the 
robots evolved to travel over ice perform more mor- 
phological computation than those evolved to travel 


over flat ground: the former robots have more com- 
plex bodies but simpler control policies than the latter 
robots, yet both successfully move in their environ- 
ments. Much more work is required to generalize this 
result to different robots, behaviors, and environments, 
but this initial work suggests that evolutionary robotics 
may be a unique tool for studying the phenomenon of 
morphological computation. 


37.3 Sensory-Motor Coordination in Evolving Robots 


The actions performed by embodied and situated agents
inevitably modify the agent-environment relation
and/or the environment. The type of stimuli that an
agent will sense at the next time step t+1 crucially
depends, for example, on whether the agent turns left
or right at the current time t. Similarly, the stimuli
that an agent will experience at time t+1 when
standing next to an object depend on the effort with
which it pushes the object at time t. This implies
that actions might play direct and indirect adaptive
roles. Actions playing a direct role are, for example,
foraging or predator-escaping behaviors that directly
impact the agent’s own survival chances. Actions
playing indirect roles consist, for example, of wandering
through the environment to spot interesting sensory
information (e.g., the perception of a food area that
might eventually afford foraging actions) or playing
a fighting game with a conspecific that might enable
the agent to acquire capacities that might later
be exploited to deal with aggressive individuals. The
possibility of self-selecting useful sensory stimuli through
action is referred to as sensory-motor coordination.

Together with morphological computation, sensory-motor
coordination constitutes a fundamental property
of embodied and situated agents and one of the most
important characteristics that can be used to differentiate
these systems from alternative forms of intelligence.
In the following sections, we illustrate three
of the key roles that can be played by sensory-motor
coordination:


i) The discovery of parsimonious behavioral strategies 

ii) The access and generation of useful sensory infor- 
mation through action and active perception 

iii) The constraining and channeling of the learning 
process during evolution and development. 


37.3.1 Enabling the Discovery 
of Simple Solutions 


Sensory-motor coordination can be exploited to find
solutions relying on more parsimonious control policies
than alternative solutions not relying, or relying less, on
this principle. An example is a set of experiments
in which a Khepera robot [37.31], endowed
with infrared and speed sensors, was evolved for
the ability to remain close to large cylindrical objects
(food) while avoiding small cylindrical objects (dangers).
From a passive perspective, which does not take into
account the possibility of exploiting sensory-motor
coordination, the ability to discriminate between sensory
stimuli experienced near small and large cylindrical objects
requires a relatively complex control policy since
the two classes of stimuli strongly overlap in the robot’s
perceptual space [37.32]. On the other hand, robots
evolved for the ability to perform this task tend to con- 
verge on a solution relying on a rather simple control 
policy: the robots begin to turn around objects as soon 
as they approach them and then discriminate the size of 
the object on the basis of the sensed differential speed 
of the left and right wheels during the execution of the 
object-circling behavior [37.33]. In other words, the ex- 
ecution of the object-circling behavior allows the robots 
to experience sensory stimuli on the wheel sensors that 
are well differentiated for small and large objects. This, 
in turn, allows them to solve the object discrimina- 
tion problem with a rather simple but reliable control 
policy. 
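
Why the wheel speeds become informative during circling can be seen from standard differential-drive kinematics. The sketch below is not the evolved neural controller from [37.33]; it is an idealized illustration, and the wheel base, standoff gap, and size threshold are assumed values. A robot circling an object at a fixed gap follows a path whose radius grows with the object, and the inner and outer wheel speeds encode that radius.

```python
def wheel_speeds(object_radius, gap, wheel_base, angular_speed=1.0):
    """Inner and outer wheel speeds of a robot circling an object at a fixed gap."""
    path_radius = object_radius + gap  # distance from object center to robot center
    v_inner = angular_speed * (path_radius - wheel_base / 2)
    v_outer = angular_speed * (path_radius + wheel_base / 2)
    return v_inner, v_outer


def estimate_object_radius(v_inner, v_outer, gap, wheel_base):
    """Invert the kinematics: recover the object radius from the two wheel speeds."""
    path_radius = (wheel_base / 2) * (v_outer + v_inner) / (v_outer - v_inner)
    return path_radius - gap


def classify(v_inner, v_outer, gap=0.02, wheel_base=0.053, threshold=0.05):
    """Label the circled object; the 0.05 m threshold is an arbitrary assumption."""
    radius = estimate_object_radius(v_inner, v_outer, gap, wheel_base)
    return "large" if radius > threshold else "small"


if __name__ == "__main__":
    for r in (0.02, 0.10):
        v_in, v_out = wheel_speeds(r, gap=0.02, wheel_base=0.053)
        print(r, classify(v_in, v_out))
```

The infrared readings near the two object classes may overlap, but once the robot is circling, a simple function of the wheel speeds separates them.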

Another related experiment, in which a Khepera
robot provided solely with infrared sensors was adapted
for finding and remaining close to a cylindrical object
while avoiding walls, demonstrates how sensory-motor
coordination can be exploited to solve tasks that require
the display of differentiated behavior in different
environmental circumstances, without discriminating
the contexts requiring different responses [37.32, 34].
Indeed, evolved robots manage to avoid walls, find
a cylindrical object, and remain near it simply by moving
backward or forward when their frontal infrared
sensors are activated or not, respectively, and by turning
left or right when their right or left infrared sensors
are activated, respectively (provided that the turning
speed and the forward speed are appropriately regulated
on the basis of the sensors’ activation). The
execution of this simple control rule, combined with
the effects of the robot’s actions, leads to the exhibition
of a move-forward behavior far from obstacles, an
obstacle avoidance behavior near walls, and an oscillatory
behavior near cylindrical objects (in which the
robot remains near the object by alternating forward and
backward and/or turn-left and turn-right movements).
The differentiation of the behavior observed during the
robot/wall and robot/cylinder interactions can be explained
by considering that the execution of the same
action produces different sensory effects in interaction
with different objects. In particular, the execution of
a turn-left action at time t, elicited by the fact that
the right infrared sensors are more activated than the
left sensors near an object, leads to the perception of:
(i) near a wall, a similar sensory stimulus eliciting a similar
action at time t+1, ultimately producing an object avoidance
behavior; (ii) near the cylinder, a different sensory
stimulus (in which the left infrared sensors can become
more activated than the right infrared sensors) eliciting
a turn-right action at time t+1, ultimately producing an
oscillatory behavior.
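
The control rule described above can be written down in a few lines. The sketch below is a simplified, binary version made for illustration: the threshold, speeds, and function name are assumptions, and the evolved controllers regulate speeds continuously rather than switching between fixed values. The point is that the rule contains no test for "wall" versus "cylinder"; the different behaviors arise from how each object changes the sensor readings after each action.

```python
def control_rule(front, left, right, threshold=0.5, speed=1.0, turn=0.5):
    """Map infrared activations in [0, 1] to (left_wheel, right_wheel) speeds.

    Move backward if the frontal sensors are active, forward otherwise;
    turn left if the right sensors are the more active ones, and turn right
    if the left sensors are.
    """
    forward = -speed if front > threshold else speed
    if right > threshold and right > left:
        rotation = turn    # turn left: right wheel faster than left wheel
    elif left > threshold and left > right:
        rotation = -turn   # turn right: left wheel faster than right wheel
    else:
        rotation = 0.0
    return forward - rotation, forward + rotation


if __name__ == "__main__":
    print(control_rule(front=0.0, left=0.0, right=0.0))  # free space: go straight
    print(control_rule(front=0.9, left=0.0, right=0.0))  # obstacle ahead: back up
    print(control_rule(front=0.0, left=0.1, right=0.8))  # obstacle on right: turn left
```

Near a wall, a left turn tends to leave the right sensors active, so the same command repeats and the robot veers away; near a small cylinder, the same turn can shift activation to the left sensors, so the command flips and the robot oscillates in place.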

Examples of clever use of sensory-motor coordination
abound in natural and artificial evolution.
A paradigmatic example of the use of sensory-motor
coordination in natural organisms is the navigation
capability of flying insects, which is based on the optic
flow, i.e., the apparent motion of contrasting objects
in the visual field caused by the relative motion of the
agent [37.35]. Houseflies, for example, use this solution
to navigate at up to 700 body lengths per second
in unknown 3D environments while using quite modest
processing resources, i.e., about 0.001% of the number
of neurons present in the human brain [37.36]. Examples
in the evolutionary robotics literature include
wheeled robots performing navigation tasks ([37.32],
see below), artificial fingers and humanoid robotic arms
evolved for the ability to discriminate between objects
varying in shape [37.20, 37], and wheeled robots
able to navigate visually by using a pan-tilt camera
[37.38].


37.3.2 Accessing and Generating 
Information Through Action 


A second fundamental role of sensory-motor coordination
consists in accessing and/or generating useful
sensory information through action. In contrast to
experimental settings in which stimuli are brought to the
passive agent by the experimenter, in ecological conditions
agents need to access relevant information through
action. For example, infants access the visual information
necessary to recognize the 3D structure of an object
by rotating it in the hand and by keeping it at close
distance so as to minimize visual occlusions [37.39]. The use
of sensory-motor coordination for this purpose is usually
named active perception [37.37, 40, 41].

Interestingly, action can be exploited not only to 
access sensory information but also to generate it. To 
understand this aspect, we should consider that through 
their action agents can elaborate the information they 
access through their sensory system over time and store 
the result of the elaboration in their body state and/or 
in their posture or location. A well-known example of 
this phenomenon is constituted by depth perception as 
a result of convergence, i.e., the simultaneous inward 
movement of both eyes toward each other, to maintain 
a single binocular percept of a selected object. The exe- 
cution of this behavior produces a kinesthetic sensation 
in the eye muscles that reliably correlates with the ob- 
ject’s depth. 

The careful reader might have recognized that the 
robot’s behavioral discrimination strategies to perceive 
larger and smaller cylindrical objects, described in the 
previous section, exploit the same active perception 
mechanism. For a robot provided with infrared and 
wheel-speed sensors, the perception of object size nec- 
essarily requires a capacity to integrate the information 
provided by several stimuli. The elaboration of this in- 
formation however is not realized internally, within the 
robot’s nervous system, but rather externally through 
the exhibition of the object-circling behavior. It is this 
behavior that generates the corresponding kinesthetic 
sensation on the wheel sensors that is then used by the 
robot to decide to remain or leave, depending on the 
circumstances. 

Examples of clever strategies able to elaborate the 
required information through action and active percep- 
tion abound in evolutionary robotics experiments. By 
carrying out an experiment in which a robot needed to 
reach two foraging areas located in the northeast and 
southwest side of a rectangular environment surrounded 
by walls, Nolfi [37.34] observed that the evolved robots 
developed a clever strategy that allows them to compute 
the relative length of the two sides of the environment 
and to navigate toward the two right corners on the 
basis of a simple control policy. The strategy consists 
in leaving the first encountered corner with an angle 
of about 45° with respect to the two sides, moving 
straight, and then eventually following the left side of 
the next encountered wall (see [37.34] for details). Another
clever exploitation of sensory-motor coordination was
observed in an experiment involving two cooperating
robots that helped each other to navigate toward circular
target areas [37.42]. The evolved robots discovered and
displayed a behavioral solution that allowed them to inform
each other of the relative location of the center of
their target navigation area despite their sensory systems
being unable to detect their relative position within the
area [37.42].


37.3.3 Channeling the Course 
of the Learning Process 


A third fundamental role of sensory-motor coordina- 
tion consists in channeling the course of the forthcom- 
ing adaptive process. 

The sensory states experienced during learning crucially
determine the course and the outcome of the
learning process [37.43]. This implies that the actions
displayed by an agent, which co-determine the agent’s
forthcoming sensory states, ultimately affect how the
agent changes ontogenetically. In other words, the
behavior exhibited by an agent at a certain stage of its
development constrains and channels the course of the
agent’s developmental process.

Indeed, evolutionary robotics experiments indicate
that the evolution of plastic agents (agents that vary
their characteristics while they interact with the
environment [37.44]) leads to qualitatively different results
with respect to the evolution of nonplastic individuals.
The traits evolved in the case of nonplastic individuals
are selected directly for enabling the agent to display
the required capabilities. The traits evolved in the case
of plastic individuals, instead, are selected primarily for
enabling the agents to acquire the required capabilities
through an ontogenetic adaptation process. This implies
that, in this case, the selected traits do not enable the
agents to master their adaptive task directly (agents tend
to display rather poor performance at the beginning of their
lifetime) but rather to acquire the required capacities
through ontogenetic adaptation.

More generally, the behavioral strategies adopted by 
agents at a certain stage of their developmental process 
can crucially constrain the course of the adaptive pro- 
cess. For example, agents learning to reach and grasp 
objects might temporarily reduce the complexity of the 
task to be mastered by freezing (i.e., locking) selected
DOFs and by then unfreezing them when their capacity 
reaches a level that allows them to master the task in 
its full complexity [37.45, 46]. This type of process can 
enable exploratory learning by encompassing variation 
and selection of either the general strategy displayed by 
the agent or the specific way in which the currently se- 
lected strategy is realized. 


37.4 Developmental Robotics for Higher Order Embodied Cognitive
Capabilities


37.4.1 Embodied Cognition 
and Developmental Robots 


The previous sections have demonstrated the fundamental
role of embodiment and of the agent-environment
coupling in the design of adaptive agents
and robots capable of performing sensory-motor tasks
such as navigation and object discrimination. However,
embodiment also plays an important role in higher
order cognitive capabilities [37.12], such as object
categorization and representation, language learning and
processing, and even the acquisition of abstract concepts
such as numbers. In this section, we will consider
some of the key psychological and neuroscience evidence
of EC and its contribution to the design of
linguistic and numerical skills in cognitive robots.

Intelligent behavior has traditionally been mod- 
eled as a result of activation patterns across distributed 
knowledge representations, such as hierarchical net- 
works of interrelated propositional (symbolic) nodes 
that represent objects in the world and their attributes 
as abstract, amodal (nonembodied) entities [37.47]. For 
example, the response bird to a flying object with feath- 
ers and wings would result from perceiving its features 
and retrieving its name from memory on the basis of 
a matching process. Such traditional views were at- 
tractive for a number of reasons: They followed the 
predominant philosophical tradition of logical concep- 


703 


le | d Hed 


704 PartD 


Neural Networks 


le | d Hed 


tual knowledge organization, according to which all 
objects are members of categories and category mem- 
bership can be determined in an all-or-none fashion via 
defining features. Also, such hierarchical knowledge 
networks were consistent with cognitive performance 
in simple tasks such as speeded property verification, 
which were thought to tap into the retrieval of knowl- 
edge. For example, verifying the statement a bird has 
feathers was thought to be easier than verifying the 
statement a bird is alive because the feature feathers 
was presumably stored in memory as defining the cate- 
gory bird, while the feature alive applies to all animals 
and was therefore represented at a superordinate level 
of knowledge, hence requiring more time to retrieve af- 
ter having just processed bird [37.47]. Finally, it was 
convenient to computationally model such networks by
likening the human mind to an information processing device
with systematic input, storage, retrieval, and output
mechanisms. Thus, knowledge was considered as an ab- 
stract commodity independent of the physical device 
within which it was implemented. 

More recent work called into question several of 
these assumptions about the workings of the human 
mind. For example, graded category memberships and 
prototypicality effects in categorization tasks pointed to 
disparities between the normative logical knowledge or- 
ganization and the psychological reality of knowledge 
retrieval [37.48]. Computational modeling of cognitive 
processes has revealed alternative, distributed represen- 
tational networks for computing intelligent responses 
in perceptual, conceptual, and motor tasks that avoid 
the neurophysiologically implausible assumption of lo- 
calized storage of specific knowledge [37.49]. Most 
importantly, though, traditional propositional knowl- 
edge networks were limited to explaining the meaning 
of any given concept in terms of an activation pattern 
across other conceptual nodes, thus effectively defin- 
ing the meaning of one symbol in terms of arbitrary 
other symbols. This process never referred to a concrete 
experience or event and essentially made the process 
of connecting internal and external referents arbitrary. 
In other words, traditional knowledge representations
never make contact with specific sensory and motor
modalities, a contact that is essential to imbue the
activation pattern in a network with meaning. This limitation
is known as the grounding problem [37.50] and points to
a fundamental flaw in traditional attempts to model human
knowledge representations.

A second reason for abandoning traditional amodal
models of knowledge representation is the fact that
these models cannot account for patterns of sensory
and motor excitation that occur whenever we activate
our knowledge. Already at the time when symbol
manipulation approaches to intelligent behavior had their
heyday, there was powerful evidence for a mandatory
link between intelligent thought and sensory-motor
experience: When matching two images of the same object,
the time we need to recognize that it is the same
object is linearly related to the angular disparity
between the two views [37.15]. This result suggests that
the mental comparison process simulates the physical
object rotation we would perform if the two images
were manipulable in our hands. In recent years, there
has been both more behavioral and more neuroscientific
evidence of an involvement of sensory-motor processes
in intelligent thought, leading to the influential notion of
action simulation as an obligatory component of
intelligent thought (for a review, see [37.51]).

To summarize, the idea that sensory and motor pro- 
cesses are an integral part of our knowledge is driven 
by both theoretical and empirical considerations. On the 
theoretical side, the EC stance addresses the grounding 
problem, a fundamental limitation of classical views of 
knowledge representation. Empirically, it is difficult for
traditional amodal conceptualizations of knowledge to
account for the systematic patterns of sensory and motor
biases that accompany knowledge activation.

Among the latest developments in robotics and
computational intelligence, the field of developmental
robotics has specifically focused on the essential
role of EC in the ontogenetic development of cognitive
capabilities. Developmental robotics (also known
as epigenetic robotics and as the field of autonomous
mental development) is the interdisciplinary approach
to the autonomous design of behavioral and cognitive
capabilities in artificial agents (robots) that takes
direct inspiration from the developmental principles
and mechanisms observed in natural cognitive systems
(children) [37.18, 52-54]. In particular, the key principle
of developmental robotics is that the robot, using
a set of intrinsic developmental principles regulating the
real-time interaction between its body, brain, and
environment, can autonomously acquire an increasingly
complex set of sensorimotor and mental capabilities.
Existing models in developmental robotics have covered
the full range of sensory-motor and cognitive
capabilities, from intrinsic motivation and motor control
to social learning, language, and reasoning with
abstract knowledge (see [37.18] for a full overview).

To demonstrate the benefits of combining EC with
developmental robotics in the modeling of embodied
intelligence, two domains have been chosen: the action
bases of language and the relationship between space and
numerical cognition. In Sect. 37.4.2, we
will look at seminal examples of the embodied bases
of language in psycholinguistics, neuroscience, and
developmental psychology, and the corresponding
developmental robotics models. Section 37.4.3 will consider
EC evidence on the link between spatial and numerical
cognition, and a developmental robotics model of
embodied number learning.


37.4.2 Embodied Language Learning 


In experimental psychology and psycholinguistics, an
influential demonstration of action simulation as part of
language comprehension was first carried out by
Glenberg and Kaschak [37.55]. They asked healthy adults to
move their right index finger from a button in their
midsagittal plane either away from or toward their body
to indicate whether a visually presented statement was
meaningful or not. Sentences like Open the drawer led
to faster initiation of movements toward than away from
the body, while sentences like Close the drawer led to
faster initiation of movements away from than toward
the body. Thus, there was a congruency effect between
the implied spatial direction of the linguistic description
and the movement direction of the reader’s motor
response. This motor congruency effect in language
comprehension has been replicated and extended (for
a review, see [37.56]). It suggests that higher level cognitive
feats (such as language comprehension) ultimately
make use of lower level (sensory-motor) capacities
of the agent, as predicted by an embodied account of
intelligence.

In parallel, growing cognitive neuroscience evidence
has shown that the cortical areas of the brain
specialized for motor processing are also involved in
language processing tasks, thus supporting the EC view
that action and language are strictly integrated [37.57,
58]. For example, Hauk et al. [37.59] carried out brain
imaging experiments where participants read words
referring to face, arm, or leg actions (e.g., lick, pick, kick).
Results support the embodied view of language, as the
linguistic task of reading a word differentially activated
parts of the premotor area that were directly adjacent to, or
overlapped with, regions activated by actual movement
of the tongue, the fingers, or the feet, respectively.

The embodied nature of language has also been
shown in developmental psychology studies, as in
Tomasello’s [37.60] constructivist theory of language
acquisition and in Smith and Samuelson’s [37.61] study
on embodiment biases in early word learning. For
example, Smith and Samuelson [37.61] investigated the
role of embodiment factors such as posture and spatial
representations during the learning of first words.
They demonstrated the importance of the changes in
posture involved in the interaction with objects located
in different parts (left and right) of the child’s
peripersonal space. Experimental data with 18-month-old
children show that infants can learn new names
even in the absence of the referent objects, when the
new label is said whilst the child looks at the same
left/right location where the object has previously
appeared. This specific study was the inspiration for
a developmental robotics study on the role of posture in
the acquisition of object names with the iCub baby
robot [37.62].

The iCub is an open source robotic platform developed
as a benchmark experimental tool for cognitive
and developmental robotics research [37.63]. It has a total
of 53 DOF, with a high number of DOF (32) in
the arms and hands to study object manipulation and
the role of fine motor skills in cognitive development.
This facilitates the replication of the experimental setup
of Smith and Samuelson’s study [37.61]. In the iCub
experiments, a human tutor shows two novel objects, one
in the left and one in the right location of a table placed in
front of the robot. Initially the robot moves to look at
each object and learns to categorize it according to its
visual features, such as shape and color. Subsequently
the tutor hides both objects, directs the robot’s attention
toward the right side where the first object was
shown, and says a new word: Modi. In the test phase
both objects are presented simultaneously in the centre
of the table, and the robot is asked Find the modi. The
robot must then look and point at the object that was
presented in the right location. Four different experiments
were carried out, as in Smith and Samuelson’s
child study. Two experiments differ with regard to
the frequency of the left/right locations used to show
each object: the Default Condition, in which each object
always appears in the same location, and the Switch
Condition, in which the position of the two objects is varied
to weaken the object/location spatial association.
In the other two experimental conditions, the object
is named whilst in sight, so as to compare the relative
weighting of the embodiment spatial constraints and the
time constraint.

The robot’s behavior is controlled by a modular
neural network consisting of a series of pretrained
Kohonen self-organizing maps (SOMs), connected
through Hebbian learning weights that are trained online
during the experiment [37.64]. The first SOM is
a color map, used to categorize objects according
to their color (average RGB (red-green-blue) color
of the foveal area). The second map, the auditory map,
is used to represent the words heard by the robot, as
the result of the automatic speech recognition system.
The third SOM is the body-hub map, the
key component of the robot’s neural system that implements
the role of embodiment. The body-hub SOM has
four inputs, each being the angle of a single joint. In the
experiments detailed here, only 2 degrees of freedom from
the head (up/down and left/right motors) and 2 degrees of
freedom from the eyes (up/down and left/right motors) are
used. Embodiment is operationalized here as the posture of
the eyes and head when the robot has to look to the left and
to the right of the scene.

During each experiment, the connection weights
linking the color map and the auditory map to the body-hub
map are adjusted in real time using a Hebbian
learning rule. These Hebbian associative connections
are modified only from the currently active body posture
node. As the maps are linked together in real time,
strong connections build up between objects that are
typically encountered in particular spatial locations, and
hence in similar body postures.
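
The association mechanism just described can be illustrated with a minimal sketch. It is not the controller of [37.64]: the SOMs are replaced by hand-named winner nodes, activations are taken as 1, and the class name BodyHub, the node labels, and the learning rate are assumptions introduced only for this example. What it shows is the general idea that a word heard in a given posture can be bound to the color usually seen in that posture, even when no object is in view.

```python
from collections import defaultdict


class BodyHub:
    """Toy associative hub (hypothetical): Hebbian links from posture nodes
    to color-map and auditory-map nodes."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        # weights[(posture_node, map_name, node)] -> association strength
        self.weights = defaultdict(float)

    def observe(self, posture_node, color_node=None, word_node=None):
        """Strengthen links from the currently active posture node only."""
        for map_name, node in (("color", color_node), ("word", word_node)):
            if node is not None:
                # Hebbian update with unit activations: dw = rate * pre * post
                self.weights[(posture_node, map_name, node)] += self.learning_rate

    def color_for_word(self, word_node):
        """Spread activation word -> posture -> color; return the strongest color."""
        scores = defaultdict(float)
        for (posture, map_name, node), w_word in self.weights.items():
            if map_name != "word" or node != word_node:
                continue
            for (posture2, map2, color), w_color in self.weights.items():
                if posture2 == posture and map2 == "color":
                    scores[color] += w_word * w_color
        return max(scores, key=scores.get) if scores else None


hub = BodyHub()
for _ in range(4):
    hub.observe("look_left", color_node="red")    # first object, always on the left
    hub.observe("look_right", color_node="blue")  # second object, always on the right
# Objects hidden: only the posture and the new word are active.
hub.observe("look_right", word_node="modi")
print(hub.color_for_word("modi"))  # blue
```

Because the word is never paired with a color directly, the only path from "modi" to "blue" runs through the shared posture node, which is the sense in which the body acts as a hub in this toy version.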

To replicate the four experimental conditions of 
Smith and Samuelson [37.61], 20 different robots were 
used in each condition, with new random weights for 
the SOM and Hebbian connections. Results from the 
four conditions show a very high match between the 
robot’s data and the child experiment results, closely 
replicating the variations in the four conditions. For 
example, in the Default Condition 83% of the trials
resulted in the robots selecting the spatially linked
objects, whilst in the Switch Condition, where the
space/object association was weakened, the robots’
choices were practically at chance level (55%). Smith
and Samuelson [37.61] reported that 71% of children
selected the spatially linked object, versus 45% in the
Switch Condition.

This model demonstrates that it is possible to build
an embodied cognitive system that develops linguistic
and sensorimotor capabilities through interaction
with the world, closely resembling the embodiment
strategies observed in children’s early word
learning. Other cognitive robotics models have also
been developed which exploit the principle of embodiment
in robots’ language learning, as in models of
compositionality in action and language [37.65-68], in
models of the cultural evolution of construction grammar
[37.69, 70], and in the modeling of the grounding of
abstract words [37.71].


37.4.3 Number and Space 


Number concepts have long been considered as pro- 
totypical examples of abstract and amodal concepts 
because their acquisition would require generalizing 
across a large range of instances to discover the in- 
variant cardinality meaning of words such as two and 
four [37.72]. Mental arithmetic would therefore appro- 
priately be modeled as abstract symbol manipulation, 
such as incrementing a counter or retrieving factual 
knowledge [37.73]. But evidence for an inescapable
reference back from abstract number concepts to the
sensory-motor experiences during concept acquisition
has been present for a long time. Specifically, Moyer
and Landauer [37.74] showed that the speed of deciding
which of two visually presented digits represents
the larger number depends on their numerical distance,
with faster decisions for larger distances. Thus, even in
the presence of abstract symbols we seem to refer to
analog representations, as if comparing sensory impressions
of small and large collections of objects.

More recent studies provided further evidence that
sensory-motor experiences have a strong impact on the
availability of number knowledge. This embodiment
signature can be documented by measuring the speed
of classifying single digits as odd or even with lateralized
response buttons. The typical finding is that small
numbers (1, 2) are classified faster with left responses
and large numbers (8, 9) are classified faster with right
responses [37.76]. This spatial-numerical association
of response codes, or SNARC effect, has been replicated
across several tasks and extended to other effectors (for
a review, see [37.77]), including even attention shifts to the
left or right side induced by small or large numbers,
respectively [37.78].

Importantly, SNARC depends on one’s sensory-motor
experiences, such as directional scanning and
finger counting habits, as well as current task demands.
For example, the initial acquisition of number
concepts in childhood occurs almost universally
through finger counting, and this learning process leaves
a residue in the number knowledge of adults. Those
who start counting on their left hand, thereby associating
small numbers with left space, have a stronger
SNARC than those who start counting on their right
hand [37.79]. Similarly, reading direction modulates
the strength of SNARC. In the original report by
Dehaene et al. [37.76], it was noticed that adults from
a right-to-left reading culture presented with weaker
or even reversed SNARC. The notion of a spill-over
of directional reading habits into the domain of number
knowledge was further supported by developmental
studies showing that it takes around 3 years of schooling
before the SNARC emerges [37.80]. However, more recent
work has found SNARC even in preschoolers (for
a review, see [37.81]), thus lending credibility to the role of
embodied practices such as finger counting in the
formation of SNARC.

In a recent series of experiments with Russian-Hebrew
bilinguals, Shaki et al. [37.82-84] (for a review, see
[37.85]) documented that both one’s habitual reading
direction and the most recent, task-specific scanning
direction determine the strength of one’s SNARC. These
findings make clear that SNARC is a compound effect
where embodied and situated (task-specific) factors add
different weights to the overall SNARC.

SNARC and other biases extend into more complex 
numerical tasks such as mental arithmetic. For exam- 
ple, the association of larger numbers with right space 
is also present during addition (the operational momen- 
tum or OM effect). Regardless of whether symbolic 
digits or nonsymbolic dot patterns are added together, 
participants tend to over-estimate the sum, and this 
bias also influences spatial behavior [37.86]. More gen- 
erally, intelligent behavior such as mental arithmetic 
seems to reflect component processes (distance effect, 
SNARC effect, OM effect) that are grounded in senso- 
rimotor experiences. 

The strong link between spatial cognition and number
knowledge permits the modeling of the embodiment
processes in the acquisition of number in robots. This
has been the case with the recent developmental model
developed by Rucinski et al. [37.75, 87] to model the
SNARC effect and the contribution of pointing gestures
in number acquisition. In the first study [37.75],
a simulation model of the iCub is used. The robot is
first trained to develop a body schema of the upper
body through motor babbling of its arms. The iCub is
subsequently trained to learn to recognize numbers by
associating quantities of objects with numerical symbols
such as 7 and 2. In the SNARC test case, the robot has
to perform a psychology-like experiment and press
a left or right button to make number comparison and
parity judgments (Fig. 37.2b).

Fig. 37.2a,b (a) iCub simulation model of the SNARC
(spatial-numerical association of response codes) effect;
(b) SNARC effect results, with the difference in reaction
times (right minus left hand) plotted against number
magnitude (after [37.75])

The robot’s cognitive architecture is based on
a modular neural network controller with two main
components, following inspiration from a connectionist
model of numerical cognition [37.88] and the TRoPICALS
cognitive architecture of Caligiore et al. [37.89,
90]. The two main components of the neural control
system are: (i) a ventral pathway network, responsible for
the processing of the identity of objects as well as
task-dependent decision making and language processing;
and (ii) a dorsal pathway network, involved in the
processing of spatial information about the locations and
shapes of objects and in processing for the robot’s action.

The ventral pathway is modeled, following Chen
and Verguts [37.88], with a symbolic input which encodes
the symbols of the numbers
from 1 to 15, a mental number line encoding the number
meaning (quantity), a decision layer for the number
comparison and parity judgment tasks, and a response
layer, with two neurons for left/right hand response
selection. The dorsal pathway is composed of a number
of SOMs which code for spatial locations of objects
in the robot’s peripersonal space. One map is associated
with gaze direction, and one map with
each of the robot’s left and right arms. The input to the
gaze map arrives from the 3-dimensional proprioceptive
vector representing the robot’s gaze direction (azimuth,
elevation, and vergence). The input to each arm position
map consists of a 7-dimensional proprioceptive vector
representing the position of the relevant arm joints. This
dorsal pathway constitutes the core component of the
model, where the embodied properties of the model are
directly implemented as the robot’s own sensorimotor
maps.

To model the developmental learning processes involved
in number knowledge acquisition, a series of
training phases is implemented. For the embodiment
part, the robot is first trained to perform a process
equivalent to motor babbling, to develop the gaze and arm
space maps. With motor babbling the robot builds its
internal visual and motor space representations (SOMs)
by performing random reaching movements to touch
a toy in its peripersonal space, whilst following its
hand’s position. Transformations between the visual
spatial map for gaze and the maps of reachable left
and right spaces are implemented as connections between
the maps, which are learned using the classical
Hebbian rule. At each trial of motor babbling, the gaze
and the appropriate arm are directed toward the same
point, and the resulting co-activations in the already
developed spatial maps are used to establish links between
them.

The next developmental training establishes the 
links between number words (modeled as activations 
in the ventral input layer) and the number meaning 
(activations in the mental number line hidden layer). 
Subsequently the robot is taught to count. This stage 
models the cultural biases that result in the internal as- 
sociation of small numbers with the left side of space 
and large numbers with the right side. As an example 
of these biases, we considered a tendency of children 
to count objects from left to right, which is related to 
the fact that European culture is characterized by left- 
to-right reading direction [37.91]. In order to model the 
process of learning to count, the robot was exposed to 
an appropriate sequence of number words (fed to the 
ventral input layer of the model network), while at the 
same time the robot’s gaze was directed toward a spe- 
cific location in space (via the input to the gaze visual 
map). These spatial locations were generated in such 
a way that their horizontal coordinates correlated with 
number magnitude (small numbers presented on the 
left, large numbers on the right) with a certain amount 
of Gaussian noise. During this stage, Hebbian learning
established links between number words and stimulus
locations in the visual field.
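
The counting stage can be sketched in a few lines, under strong simplifications. This is not the model of [37.75]: the gaze SOM is replaced by a small number of discrete horizontal locations, each presentation simply increments one weight, and the function names and all parameter values (number of locations, noise level, learning rate) are assumptions made for illustration.

```python
import random


def train_counting(numbers, n_bins=5, epochs=100, noise_sd=0.1, rate=0.01, seed=0):
    """Hebbian links between number words and discretized horizontal gaze locations."""
    rng = random.Random(seed)
    lowest, highest = min(numbers), max(numbers)
    weights = {n: [0.0] * n_bins for n in numbers}
    for _ in range(epochs):
        for n in numbers:
            # Horizontal coordinate in [0, 1] (0 = far left, 1 = far right),
            # correlated with number magnitude plus Gaussian noise.
            x = (n - lowest) / (highest - lowest) + rng.gauss(0.0, noise_sd)
            x = min(max(x, 0.0), 1.0)
            location = min(int(x * n_bins), n_bins - 1)
            # Co-activation of the word node and the location node.
            weights[n][location] += rate
    return weights


def preferred_position(row):
    """Weight-averaged location of one number word, normalized to [0, 1]."""
    return sum(i * w for i, w in enumerate(row)) / (sum(row) * (len(row) - 1))


weights = train_counting(list(range(1, 16)))
for n in (1, 8, 15):
    print(n, round(preferred_position(weights[n]), 2))
```

After training, the preferred position of each number word increases with its magnitude, which is the left-to-right association that the later SNARC test relies on.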

Finally, the model is trained to perform number rea- 
soning tasks, such as number comparison and parity 
judgment, which corresponds to establishing appropri- 
ate links between the mental number line hidden layer 
and neurons in the decision layer. Specifically, one
experiment focuses on modeling the SNARC effect.
The robot's reaction times (RTs; i.e., the amount of
activity needed to exceed a response threshold in one of
the two response nodes) in parity judgment and number
comparison tasks were recorded to calculate the
difference between right-hand and left-hand RTs for the
same number. When the difference values are plotted
against number magnitude, the SNARC effect manifests
itself as a negative slope, as in Fig. 37.2. As the
connections between visual and motor maps form
a gradient from left to right, the links to the left arm
map become weaker, while those to the right become
stronger. Thus, when a small number is presented,
internal connections lead to stronger automatic
activation of the representations linked with the left
arm than of those linked with the right arm, causing the
SNARC effect.
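
As a hedged illustration of how the slope is obtained, the sketch below computes the right-minus-left RT difference for each number and fits an ordinary least-squares line against magnitude. The reaction-time values are synthetic placeholders invented for the example, not data from the robot experiments, and the function name is an assumption.

```python
def snarc_slope(magnitudes, rt_left, rt_right):
    """Least-squares slope of (RT_right - RT_left) regressed on number magnitude."""
    diffs = [right - left for left, right in zip(rt_left, rt_right)]
    n = len(magnitudes)
    mean_x = sum(magnitudes) / n
    mean_y = sum(diffs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(magnitudes, diffs))
    sxx = sum((x - mean_x) ** 2 for x in magnitudes)
    return sxy / sxx


# Synthetic placeholder RTs (arbitrary units): the left response is faster for
# small numbers and the right response is faster for large numbers.
magnitudes = list(range(1, 10))
rt_left = [500 + 5 * m for m in magnitudes]
rt_right = [540 - 5 * m for m in magnitudes]

slope = snarc_slope(magnitudes, rt_left, rt_right)
print(slope)  # a negative value indicates a SNARC-like pattern
```

A negative slope means the right-hand advantage grows with magnitude, which is the signature described in the text.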

This model of space and number knowledge was 
also extended to include a more active interaction with 
the environment during the number learning process. 
This is linked to the fact that gestures such as pointing
at the object being counted, or actually touching
the objects enumerated, have been shown to improve
the overall counting performance in children [37.92]. In
the subsequent model by Rucinski et al. [37.87], a sim- 
pler neural control architecture was used based on the 
Elman recurrent network to allow sequential number 
counting and the representation of gestures as propri- 
oceptive states for the pointing gestures. The robot has 
to learn to produce a sequence of number words (from 
1 to 10) with the length of the sequence equivalent to 
the number of objects present in the scene. Visual in- 
put to the model is a one-dimensional saliency map, 
which can be considered a simple model of a retina. 
An additional proprioceptive input signal, obtained
from a pointing gesture performed by the iCub
humanoid robot, implements the gestural input to the
model in the pointing condition. The output nodes
encode the phonetic representation of the 10 number
words.
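
The architecture just described can be sketched as a forward pass of an Elman (simple recurrent) network. This is only a structural illustration under stated assumptions: the weights are random and untrained, the layer sizes are arbitrary, the pointing signal is simplified to a one-hot vector over map positions, and one output unit per number word stands in for the phonetic encoding used in the original model.

```python
import math
import random


class ElmanNetwork:
    """Simple recurrent network: the hidden state is fed back as context."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)        # input   -> hidden
        self.w_ctx = matrix(n_hidden, n_hidden)   # context -> hidden
        self.w_out = matrix(n_out, n_hidden)      # hidden  -> output
        self.context = [0.0] * n_hidden

    def step(self, x):
        """Process one time step and return a softmax over the output units."""
        hidden = [
            math.tanh(sum(w * v for w, v in zip(w_in_row, x))
                      + sum(w * c for w, c in zip(w_ctx_row, self.context)))
            for w_in_row, w_ctx_row in zip(self.w_in, self.w_ctx)
        ]
        self.context = hidden                     # copy hidden state into context
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


MAP_SIZE = 7                                      # assumed saliency-map resolution
object_positions = [1, 3, 5]                      # three objects in the scene
saliency = [1.0 if i in object_positions else 0.0 for i in range(MAP_SIZE)]

net = ElmanNetwork(n_in=2 * MAP_SIZE, n_hidden=8, n_out=10)
outputs = []
for pos in object_positions:                      # one pointing gesture per object
    pointing = [1.0 if i == pos else 0.0 for i in range(MAP_SIZE)]
    outputs.append(net.step(saliency + pointing))  # visual + proprioceptive input
```

Dropping the `pointing` part of the input (or setting it to zeros) would correspond to the condition without the gesture signal; training, which is omitted here, would be needed before the outputs carry any meaning.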

During the experiment, the robot is first trained 
to recite a sequence of number words. Then, in order 
to assess the impact of the proprioceptive informa- 
tion connected with the pointing gesture, the training 
is divided into two conditions: (i) training to count 
the number of objects shown to the visual input in 
the absence of the proprioceptive gesture signal, and 
(ii) counting through pointing, via the activation of the
gesture proprioceptive input. Results show that such 
a simple recurrent architecture benefits from the input 
of the proprioceptive gesturing signal, with improved 
counting accuracy. In particular, the model reproduces
the quantitative effects of gestures on the counted
set size, replicating child psychology data reported
in [37.92].

Overall, such a developmental robotics model
clearly shows that the modeling of embodiment
phenomena, such as the use of spatial representation in
number judgments, and of the pointing gestures for
number learning, can allow us to understand the
acquisition of abstract concepts in humans as well as
artificial agents and robots. This further demonstrates
the benefit of the embodied intelligence approach to
model a range of behavioral and cognitive phenomena
from simple sensory-motor tasks to higher order
linguistic and abstract cognition tasks.


37.5 Conclusion


This chapter has provided an overview of the three key
principles of embodied intelligence, namely
morphological computation, sensory-motor coordination,
and EC, and of the experimental approaches and models
from evolutionary robotics and developmental robotics.
The wide range of behavioral and cognitive capabilities
modeled through evolutionary and developmental
experiments (e.g., locomotion in different environments,
navigation and object discrimination, posture in
early word learning and space and number integration)
demonstrates the impact of embodied intelligence in the
design of a variety of perceptual, motor and cognitive
skills, including the potential to model the embodied
basis of abstract knowledge as in numerical cognition.

The current progress of both evolutionary and
developmental models of embodied intelligence, although
showing significant scientific and technological
advances in the design of embodied and situated agents,
still has a series of open challenges and issues. These
issues are informing ongoing work in the various fields
of embodied intelligence.

One open challenge in morphological computation
concerns how best to automatically design the body
plans of robots so that they can best exploit this
phenomenon. In parallel to this, much work remains to
be done to understand what advantages morphological
computation confers on a robot. For one, it is likely
that a robot with a simpler control policy will be more
robust to unanticipated situations: for example, the
jamming gripper is able to grasp multiple objects with
the same control strategy; a rigid hand requires
different control strategies for different objects.
Secondly, a robot that performs more morphological
computation may be more easily transferred from the
simulation in which it was evolved to a physical machine:
with a simpler control policy there is less that can go
wrong when experiencing the different sensor signals and
motor feedback generated by operation in the physical
world.


Evolving robots provides a unique opportunity for 
developing rigorous methods for measuring whether 
and how much morphological computation a robot per- 
forms. For instance, if evolutionary algorithms can be 
designed that produce robots with similar abilities yet 
different levels of control and morphological complex- 
ity, and it is found that in most cases reduced control 
complexity implies greater morphological complexity, 
this would provide evidence for the evolution of mor- 
phological computation. 
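
One hypothetical way to make the phrase "in most cases" operational is to count, over all pairs of evolved robots with similar abilities, how often the robot with the simpler controller has the more complex body. The sketch below assumes that scalar complexity scores are already available for control and morphology; how such scores should be defined is exactly the open question raised in the text, and the numbers used here are invented placeholders.

```python
from itertools import combinations


def tradeoff_fraction(robots):
    """Fraction of robot pairs in which lower control complexity goes with
    higher morphological complexity.

    `robots` is a list of (control_complexity, morphological_complexity)
    tuples; pairs tied on either score are skipped.
    """
    discordant = 0
    comparable = 0
    for (c1, m1), (c2, m2) in combinations(robots, 2):
        if c1 == c2 or m1 == m2:
            continue
        comparable += 1
        if (c1 - c2) * (m1 - m2) < 0:   # simpler controller, more complex body
            discordant += 1
    return discordant / comparable if comparable else float("nan")


# Placeholder scores for five equally capable evolved robots (illustrative only).
evolved = [(12, 3), (9, 5), (7, 6), (4, 9), (10, 8)]
fraction = tradeoff_fraction(evolved)
print(fraction)
```

A fraction well above 0.5 would be consistent with a trade-off between control and morphological complexity; on its own it would not establish that morphological computation is the cause.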

The emerging field of soft robotics [37.93] provides 
much opportunity for exploring the various aspects 
of morphological computation because the space of 
all possible soft robot body plans — with large vari- 
ations in shape and continuous admixtures of hard 
and soft materials — is much larger than the space 
of rigid linkages traditionally employed in classical 
robots. 

The design issue, i.e., the question of how systems
able to exploit coordinated action and perception
processes can be designed, represents an open challenge
for sensory-motor coordination as well. As illustrated
above, adaptive techniques in which the fine-grained
characteristics that determine how agents react to
current and previous sensory states are varied randomly,
and in which variations are retained or discarded on
the basis of their effects at the level of the overall
behavior exhibited by the agent(s) interacting with their
environment, constitute an effective method. However,
this method might not scale up well with the number of
parameters to be adapted. The question of how
sensory-motor coordination capabilities can be acquired
through other learning techniques that rely on
shorter-term feedback remains an open issue. An
interesting research direction, in that respect, consists
in the hypothesis that the development of sensory-motor
coordination can be induced through the use of
task-independent criteria such as information-theoretic
measures [37.94, 95].
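
The vary-and-select scheme summarized above can be sketched as a (1+1) evolutionary hill climber, one of the simplest members of this family and not necessarily the algorithm used in the experiments reviewed earlier. The fitness function is a toy stand-in for evaluating an agent's overall behavior in its environment; its target vector, the mutation strength, and all names are assumptions made for the example.

```python
import random


def evolve(fitness, n_params, generations=500, sigma=0.1, seed=0):
    """(1+1) scheme: mutate all parameters randomly, keep the variant only if
    the behavior-level score does not get worse."""
    rng = random.Random(seed)
    parent = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    parent_fit = fitness(parent)
    for _ in range(generations):
        child = [p + rng.gauss(0.0, sigma) for p in parent]   # random variation
        child_fit = fitness(child)
        if child_fit >= parent_fit:                           # retain or discard
            parent, parent_fit = child, child_fit
    return parent, parent_fit


def toy_fitness(params):
    """Placeholder for a behavioral evaluation: higher is better, maximum is 0."""
    target = [0.5, -0.3, 0.8]                                 # hypothetical optimum
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_fit = evolve(toy_fitness, n_params=3)
print(best_params, best_fit)
```

Because every generation requires a full behavioral evaluation and the feedback is a single score, the number of evaluations needed tends to grow with the number of parameters, which is the scaling concern raised in the text.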




Another important research direction concerns the
theoretical elaboration of the different roles that
morphological computation and sensory-motor
coordination can play and the clarification of the
relationship between processes occurring as a result of
agent/environment interactions and processes occurring
inside the agents' nervous systems.

In developmental robotics models of EC, the issues
of open-ended, cumulative learning and of the
scaling up of the sensory-motor and cognitive
repertoires still require significant efforts and novel
methodological and theoretical approaches. Another issue,
which combines both evolutionary and developmen- 
tal approaches, is the interaction of phylogenetic and 
ontogenetic phenomena in the body/environment/brain 
adaptation. 

Human development is characterized by cumula- 
tive, open-ended learning. This refers to the fact that 
learning and development do not start and stop at 
specific stages, but rather this is a life-long learning 
experience. Moreover, the skills acquired in various de- 
velopmental stages are accumulated and integrated to 
support the further acquisition of more complex capa- 
bilities. One consequence of cumulative, open-ended 
learning is cognitive bootstrapping. For example,
language development exhibits the vocabulary spurt, in
which the knowledge and experience gained from the
slow learning of the first 50-100 words cause
a redefinition of the word learning strategy, as well as
syntactic bootstrapping, where children rely on syntactic
cues and word context in verb learning to determine the
meaning of new verbs [37.96]. Although some
computational intelligence models of the vocabulary spurt
exist [37.97], robotic experiments on language learning
have been restricted to smaller lexicons, not reaching 
the critical threshold to allow extensive modeling of 
the bootstrapping of the agent’s lexicon and grammar 
knowledge. These current limitations are also linked 
to the general issue of the scaling up of the robot’s 
motor and cognitive capabilities and of cross-modal 
learning. Most of the current cognitive robotics models 
typically focus on the separate acquisition of only one 
task or modality (perception, or phonetics, or semantics 
etc.), often with limited repertoires rarely reaching 10 
or slightly more learned actions or words. Thus a truly 
online, cross-modal, cumulative, open-ended develop- 
mental robotics model remains a fundamental challenge 
to the field. 


Another key challenge for future research is the 
modeling of the interaction of the different timescales 
of adaptation in embodied intelligence, that is between 
phylogenetic (evolutionary) factors and ontogenetic 
(development, learning, maturation) phenomena. For 
example, maturation refers to changes in the anatomy 
and physiology of both the child’s brain and the body, 
especially during the first years of life. Maturational 
phenomena related to the brain include the decrease of 
brain plasticity during early development, whilst matu- 
ration in the body is more evident due to the significant 
morphological growth changes a child goes through 
from birth to adulthood (see Thelen and Smith’s analy- 
sis of crawling and walking [37.98]). The ontogenetic 
changes due to maturation and learning have impor- 
tant implications for the interaction of development 
with phylogenetic changes due to evolution. Body mor- 
phology and brain plasticity variations can be in fact 
explained as evolutionary adaptations of the species to 
changing environmental context as with heterochronic 
changes [37.99]. For example, Elman et al. [37.43]
discuss how genetic and heterochronic mechanisms
provide an alternative explanation of the nature/nurture
debate, where genetic phenomena produce architectural
constraints of the organism's brain and body, which
subsequently control and affect the results of learning
interaction. Following this, Cangelosi [37.100] has
tested the effects of heterochronic changes in the evo- 
lution of neural network architectures for simulated 
robotic agents. 

The interaction between ontogenetic and phylogenetic
factors has been investigated through evolutionary
robotics models. For example, Hinton and
Nowlan [37.101] and Nolfi et al. [37.102] have developed
evolutionary computational models explaining
the effects of learning in evolution. The modeling of
the evolution of varying body and brain morphologies
in response to phylogenetic and ontogenetic
requirements is also the goal of the evo-devo field of
computational intelligence [37.7, 103-105]. These
evolutionary/ontogenetic interaction models have,
however, mostly focused on simple sensory-motor tasks
such as navigation and foraging. Future work combining
evolutionary and developmental robotics models
can better provide theoretical and technological
understanding of the contribution of different
adaptation time scales and mechanisms in embodied
intelligence.


References

37.1 R.D. Beer: A dynamical systems perspective on agent-environment interaction, Artif. Intell. 72, 173-215 (1995)
37.2 R.A. Brooks: Elephants don't play chess, Robot. Auton. Syst. 6(1), 3-15 (1990)
37.3 A. Cangelosi: Grounding language in action and perception: From cognitive agents to humanoid robots, Phys. Life Rev. 7(2), 139-151 (2010)
37.4 H.J. Chiel, R.D. Beer: The brain has a body: Adaptive behavior emerges from interactions of nervous system, body and environment, Trends Neurosci. 20, 553-557 (1997)
37.5 F. Keijzer: Representation and Behavior (MIT Press, London 2001)
37.6 S. Nolfi, D. Floreano: Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines (MIT/Bradford Books, Cambridge 2000)
37.7 R. Pfeifer, J.C. Bongard: How the Body Shapes the Way We Think: A New View of Intelligence (MIT Press, Cambridge 2006)
37.8 C. Paul: Morphology and computation, Proc. Int. Conf. Simul. Adapt. Behav. (2004) pp. 33-38
37.9 C. Paul: Morphological computation: A basis for the analysis of morphology and control requirements, Robot. Auton. Syst. 54(8), 619-630 (2006)
37.10 R. Pfeifer, F. Iida: Morphological computation: Connecting body, brain and environment, Jpn. Sci. Mon. 58(2), 48-54 (2005)
37.11 G. Pezzulo, L.W. Barsalou, A. Cangelosi, M.H. Fischer, K. McRae, M.J. Spivey: The mechanics of embodiment: A dialog on embodiment and computational modeling, Front. Psychol. 2(5), 1-21 (2011)
37.12 D. Pecher, R.A. Zwaan (Eds.): Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thinking (Cambridge Univ. Press, Cambridge 2005)
37.13 M. Wilson: Six views of embodied cognition, Psychon. Bull. Rev. 9, 625-636 (2002)
37.14 L. Meteyard, S.R. Cuadrado, B. Bahrami, G. Vigliocco: Coming of age: A review of embodiment and the neuroscience of semantics, Cortex 48(7), 788-804 (2012)
37.15 R. Shepard, J. Metzler: Mental rotation of three dimensional objects, Science 171(972), 701-703 (1972)
37.16 K. Dijkstra, M.P. Kaschak, R.A. Zwaan: Body posture facilitates the retrieval of autobiographical memories, Cognition 102, 139-149 (2007)
37.17 L.E. Williams, J.A. Bargh: Keeping one's distance: The influence of spatial distance cues on affect and evaluation, Psychol. Sci. 19, 302-308 (2008)
37.18 A. Cangelosi, M. Schlesinger: Developmental Robotics: From Babies to Robots (MIT Press, Cambridge 2012)
37.19 J. Bongard: The utility of evolving simulated robot morphology increases with task complexity for object manipulation, Artif. Life 16(3), 201-223 (2010)
37.20 E. Tuci, G. Massera, S. Nolfi: Active categorical perception of object shapes in a simulated anthropomorphic robotic arm, IEEE Trans. Evol. Comput. 14(6), 885-899 (2010)
37.21 R. Pfeifer, G. Gomez: Morphological computation - Connecting brain, body, and environment, Lect. Notes Comput. Sci. 5436, 66-83 (2009)
37.22 V. Pavlov, A. Timofeyev: Construction and stabilization of programmed movements of a mobile robot-manipulator, Eng. Cybern. 14(6), 70-79 (1976)
37.23 S. Hirose, Y. Umetani: The development of soft gripper for the versatile robot hand, Mech. Mach. Theor. 13(3), 351-359 (1978)
37.24 E. Brown, N. Rodenberg, J. Amend, A. Mozeika, E. Steltz, M.R. Zakin, H. Lipson, H.M. Jaeger: Universal robotic gripper based on the jamming of granular material, Proc. Natl. Acad. Sci. USA 107(44), 18809-18814 (2010)
37.25 T.J. Allen, R.D. Quinn, R.J. Bachmann, R.E. Ritzmann: Abstracted biological principles applied with reduced actuation improve mobility of legged vehicles, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst. 2 (2003) pp. 1370-1375
37.26 M. Wisse, G. Feliksdal, J. Van Frankkenhuyzen, B. Moyer: Passive-based walking robot, IEEE Robot. Autom. Mag. 14(2), 52-62 (2007)
37.27 K. Sims: Evolving 3d morphology and behavior by competition, Artif. Life 1(4), 353-372 (1994)
37.28 J.E. Auerbach, J.C. Bongard: On the relationship between environmental and morphological complexity in evolved robots, Proc. 14th Int. Conf. Genet. Evol. Comput. (2012) pp. 521-528
37.29 GECCO 2012 Robot Videos: https://www.youtube.com/playlist?list=PLD5943A95ABC2COB3
37.30 J.E. Auerbach, J.C. Bongard: On the relationship between environmental and mechanical complexity in evolved robots, Proc. 13th Int. Conf. Simul. Synth. Living Syst. (2012) pp. 309-316
37.31 F. Mondada, E. Franzi, P. Ienne: Mobile robot miniaturisation: A tool for investigation in control algorithms, Proc. 3rd Int. Symp. Exp. Robot. (Kyoto, Japan 1993)
37.32 S. Nolfi: Power and limits of reactive agents, Neurocomputing 49, 119-145 (2002)
37.33 C. Scheier, R. Pfeifer, Y. Kuniyoshi: Embedded neural networks: Exploiting constraints, Neural Netw. 11, 1551-1596 (1998)




37.34 S. Nolfi: Categories formation in self-organizing embodied agents. In: Handbook of Categorization in Cognitive Science, ed. by H. Cohen, C. Lefebvre (Elsevier, Amsterdam 2005) pp. 869-889
37.35 J.J. Gibson: The Perception of the Visual World (Houghton Mifflin, Boston 1950)
37.36 N. Franceschini, F. Ruffier, J. Serres, S. Viollet: Optic flow based visual guidance: From flying insects to miniature aerial vehicles. In: Aerial Vehicles, ed. by T.M. Lam (InTech, Rijeka 2009)
37.37 S. Nolfi, D. Marocco: Active perception: A sensorimotor account of object categorization. In: From Animals to Animats 7 (MIT Press, Cambridge 2002) pp. 266-271
37.38 D. Floreano, T. Kato, D. Marocco, S. Sauser: Coevolution of active vision and feature selection, Biol. Cybern. 90(3), 218-228 (2004)
37.39 H.A. Ruff: Infants' manipulative exploration of objects: Effect of age and object characteristics, Dev. Psychol. 20, 9-20 (1984)
37.40 R. Bajcsy: Active perception, Proc. IEEE 76(8), 996-1005 (1988)
37.41 D.H. Ballard: Animate vision, Artif. Intell. 48, 57-86 (1991)
37.42 J. De Greef, S. Nolfi: Evolution of implicit and explicit communication in a group of mobile robots. In: Evolution of Communication and Language in Embodied Agents, ed. by S. Nolfi, M. Mirolli (Springer, Berlin 2010)
37.43 J.L. Elman, E.A. Bates, M. Johnson, A. Karmiloff-Smith, D. Parisi, K. Plunkett: Rethinking Innateness: A Connectionist Perspective on Development (MIT Press, Cambridge 1996)
37.44 S. Nolfi, D. Floreano: Learning and evolution, Auton. Robots 7(1), 89-113 (1999)
37.45 N. Bernstein: The Coordination and Regulation of Movements (Pergamon, Oxford 1967)
37.46 P. Savastano, S. Nolfi: Incremental learning in a 14 DOF simulated iCub robot: Modelling infant reach/grasp development, Lect. Notes Comput. Sci. 7375, 369-370 (2012)
37.47 A.M. Collins, M.R. Quillian: Retrieval time from semantic memory, J. Verb. Learn. Verb. Behav. 8, 240-247 (1969)
37.48 E. Rosch: Cognitive representations of semantic categories, J. Exp. Psychol. Gen. 104, 192-233 (1975)
37.49 D.E. Rumelhart, J.L. McClelland, P.D.P. Group: Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, Cambridge 1986)
37.50 S. Harnad: The symbol grounding problem, Physica D 42, 335-346 (1990)
37.51 L.W. Barsalou: Grounded cognition, Annu. Rev. Psychol. 59, 617-645 (2008)
37.52 M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui, Y. Yoshikawa, M. Ogino, C. Yoshida: Cognitive developmental robotics: A survey, IEEE Trans. Auton. Mental Dev. 1, 12-34 (2009)
37.53 M. Lungarella, G. Metta, R. Pfeifer, G. Sandini: Developmental robotics: A survey, Connect. Sci. 15(4), 151-190 (2003)
37.54 P.Y. Oudeyer: Developmental robotics. In: Encyclopedia of the Sciences of Learning, Springer References Series, ed. by N.M. Seel (Springer, New York 2012) p. 329
37.55 A. Glenberg, K. Kaschak: Grounding language in action, Psychon. Bull. Rev. 9(3), 558-565 (2002)
37.56 M.H. Fischer, R.A. Zwaan: Embodied language - A review of the role of the motor system in language comprehension, Q. J. Exp. Psychol. 61(6), 825-850 (2008)
37.57 F. Pulvermüller: The Neuroscience of Language (Cambridge Univ. Press, Cambridge 2003)
37.58 S.F. Cappa, D. Perani: The neural correlates of noun and verb processing, J. Neurolinguist. 16(2/3), 183-189 (2003)
37.59 O. Hauk, I. Johnsrude, F. Pulvermüller: Somatotopic representation of action words in human motor and premotor cortex, Neuron 41(2), 301-330 (2004)
37.60 M. Tomasello: Constructing a Language (Harvard Univ. Press, Cambridge 2003)
37.61 L.B. Smith, L. Samuelson: Objects in space and mind: From reaching to words. In: Thinking Through Space: Spatial Foundations of Language and Cognition, ed. by K. Mix, L.B. Smith, M. Gasser (Oxford Univ. Press, Oxford 2010)
37.62 A.F. Morse, T. Belpaeme, A. Cangelosi, L.B. Smith: Thinking with your body: Modelling spatial biases in categorization using a real humanoid robot, 2010 Annu. Meet. Cogn. Sci. Soc. (2010) pp. 33-38
37.63 G. Metta, L. Natale, F. Nori, G. Sandini, D. Vernon, L. Fadiga, C. von Hofsten, J. Santos-Victor, A. Bernardino, L. Montesano: The iCub humanoid robot: An open-systems platform for research in cognitive development, Neural Netw. 23, 1125-1134 (2010)
37.64 A.F. Morse, J. de Greeff, T. Belpaeme, A. Cangelosi: Epigenetic robotics architecture (ERA), IEEE Trans. Auton. Mental Dev. 2(4), 325-339 (2010)
37.65 Y. Sugita, J. Tani: Learning semantic combinatoriality from the interaction between linguistic and behavioral processes, Adapt. Behav. 13(1), 33-52 (2005)
37.66 V. Tikhanoff, A. Cangelosi, G. Metta: Language understanding in humanoid robots: iCub simulation experiments, IEEE Trans. Auton. Mental Dev. 3(1), 17-29 (2011)
37.67 E. Tuci, T. Ferrauto, A. Zeschel, G. Massera, S. Nolfi: An experiment on behavior generalization and the emergence of linguistic compositionality in evolving robots, IEEE Trans. Auton. Mental Dev. 3(2), 176-189 (2011)




37.68 Y. Yamashita, J. Tani: Emergence of functional hierarchy in a multiple timescale neural network model: A humanoid robot experiment, PLoS Comput. Biol. 4(11), e1000220 (2008)
37.69 L. Steels: Modeling the cultural evolution of language, Phys. Life Rev. 8(4), 339-356 (2011)
37.70 L. Steels: Experiments in Cultural Language Evolution, Advances in Interaction Studies, Vol. 3 (John Benjamins, Amsterdam 2012)
37.71 F. Stramandinoli, D. Marocco, A. Cangelosi: The grounding of higher order concepts in action and language: A cognitive robotics model, Neural Netw. 32, 165-173 (2012)
37.72 J. Piaget: The Origins of Intelligence in Children (International Univ. Press, New York 1952)
37.73 G.J. Groen, J.M. Parkman: A chronometric analysis of simple addition, Psychol. Rev. 79(4), 329-343 (1972)
37.74 R.S. Moyer, T.K. Landauer: Time required for judgements of numerical inequality, Nature 215, 1519-1520 (1967)
37.75 M. Rucinski, A. Cangelosi, T. Belpaeme: An embodied developmental robotic model of interactions between numbers and space, Expanding the Space of Cognitive Science, 23rd Annu. Meet. Cogn. Sci. Soc., ed. by L. Carlson, C. Hoelscher, T.F. Shipley (Cognitive Science Society, Austin 2011) pp. 237-242
37.76 S. Dehaene, S. Bossini, P. Giraux: The mental representation of parity and number magnitude, J. Exp. Psychol. Gen. 122, 371-396 (1993)
37.77 G. Wood, H.C. Nuerk, K. Willmes, M.H. Fischer: On the cognitive link between space and number: A meta-analysis of the SNARC effect, Psychol. Sci. Q. 50(4), 489-525 (2008)
37.78 M.H. Fischer, A.D. Castel, M.D. Dodd, J. Pratt: Perceiving numbers causes spatial shifts of attention, Nat. Neurosci. 6(6), 555-556 (2003)
37.79 D.B. Berch, E.J. Foley, R. Hill, R.P. McDonough: Extracting parity and magnitude from Arabic numerals: Developmental changes in number processing and mental representation, J. Exp. Child Psychol. 74, 286-308 (1999)
37.80 M.H. Fischer: Finger counting habits modulate spatial-numerical associations, Cortex 44, 386-392 (2008)
37.81 S.M. Göbel, S. Shaki, M.H. Fischer: The cultural number line: A review of cultural and linguistic influences on the development of number processing, J. Cross-Cult. Psychol. 42, 543-565 (2011)
37.82 S. Shaki, M.H. Fischer: Reading space into numbers - A cross-linguistic comparison of the SNARC effect, Cognition 108, 590-599 (2008)
37.83 S. Shaki, M.H. Fischer, W.M. Petrusic: Reading habits for both words and numbers contribute to the SNARC effect, Psychon. Bull. Rev. 16(2), 328-331 (2009)


37.84 M.H. Fischer, R. Mills, S. Shaki: How to cook a SNARC: Number placement in text rapidly changes spatial-numerical associations, Brain Cogn. 72, 333-336 (2010)
37.85 M.H. Fischer, P. Brugger: When digits help digits: Spatial-numerical associations point to finger counting as prime example of embodied cognition, Front. Psychol. 2, 260 (2011)
37.86 M. Pinhas, M.H. Fischer: Mental movements without magnitude? A study of spatial biases in symbolic arithmetic, Cognition 109, 408-415 (2008)
37.87 M. Rucinski, A. Cangelosi, T. Belpaeme: Robotic model of the contribution of gesture to learning to count, Proc. IEEE ICDL-EpiRob Conf. Dev. (2012)
37.88 Q. Chen, T. Verguts: Beyond the mental number line: A neural network model of number-space interactions, Cogn. Psychol. 60(3), 218-240 (2010)
37.89 D. Caligiore, A.M. Borghi, D. Parisi, G. Baldassarre: TRoPICALS: A computational embodied neuroscience model of compatibility effects, Psychol. Rev. 117, 1188-1228 (2010)
37.90 D. Caligiore, A.M. Borghi, R. Ellis, A. Cangelosi, G. Baldassarre: How affordances associated with a distractor object can cause compatibility effects: A study with the computational model TRoPICALS, Psychol. Res. 77(1), 7-19 (2013)
37.91 O. Lindemann, A. Alipour, M.H. Fischer: Finger counting habits in Middle-Eastern and Western individuals: An online survey, J. Cross-Cult. Psychol. 42, 566-578 (2011)
37.92 M.W. Alibali, A.A. DiRusso: The function of gesture in learning to count: More than keeping track, Cogn. Dev. 14(1), 37-56 (1999)
37.93 R. Pfeifer, M. Lungarella, F. Iida: The challenges ahead for bio-inspired 'soft' robotics, Commun. ACM 55(11), 76-87 (2012)
37.94 M. Lungarella, O. Sporns: Information self-structuring: Key principle for learning and development, Proc. 4th Int. Conf. Dev. Learn. (2005)
37.95 P. Capdepuy, D. Polani, C. Nehaniv: Maximization of potential information flow as a universal utility for collective behaviour, Proc. 2007 IEEE Symp. Artif. Life (CI-ALife 2007) (2007) pp. 207-213
37.96 L. Gleitman: The structural sources of verb meanings, Lang. Acquis. 1, 135-176 (1990)
37.97 J. Mayor, K. Plunkett: Vocabulary explosion: Are infants full of Zipf?, Proc. 32nd Annu. Meet. Cogn. Sci. Soc., ed. by S. Ohlsson, R. Catrambone (Cognitive Science Society, Austin 2010)
37.98 E. Thelen, L.B. Smith: A Dynamic Systems Approach to the Development of Cognition and Action (MIT Press, Cambridge 1994)
37.99 M.L. McKinney, K.J. McNamara: Heterochrony, the Evolution of Ontogeny (Plenum, New York 1991)
37.100 A. Cangelosi: Heterochrony and adaptation in developing neural networks, Proc. GECCO99 Genet. Evol. Comput. Conf., ed. by W. Banzhaf (Morgan Kaufmann, San Francisco 1999) pp. 1241-1248
37.101 G.E. Hinton, S.J. Nowlan: How learning can guide evolution, Complex Syst. 1, 495-502 (1987)
37.102 S. Nolfi, J.L. Elman, D. Parisi: Learning and evolution in neural networks, Adapt. Behav. 3(1), 5-28 (1994)
37.103 J. Bongard: Morphological change in machines accelerates the evolution of robust behavior, Proc. Natl. Acad. Sci. USA 108(4), 1234-1239 (2011)
37.104 S. Kumar, P. Bentley (Eds.): On Growth, Form, and Computers (Academic, London 2003)
37.105 K.O. Stanley, R. Miikkulainen: A taxonomy for artificial embryogeny, Artif. Life 9, 93-130 (2003)


38. Neuromorphic Engineering 


Giacomo Indiveri 


Neuromorphic engineering is a relatively young 
field that attempts to build physical realizations 
of biologically realistic models of neural sys- 
tems using electronic circuits implemented in very 
large scale integration technology. While originally 
focusing on models of the sensory periphery im- 
plemented using mainly analog circuits, the field 
has grown and expanded to include the modeling 
of neural processing systems that incorporate the 
computational role of the body, that model learn- 
ing and cognitive processes, and that implement 
large distributed spiking neural networks using 
a variety of design techniques and technologies. 
This emerging field is characterized by its multi- 
disciplinary nature and its focus on the physics 
of computation, driving innovations in theoretical 
neuroscience, device physics, electrical engineer- 
ing, and computer science. 


38.1 The Origins
38.2 Neural and Neuromorphic Computing
38.3 The Importance of Fundamental Neuroscience
38.4 Temporal Dynamics in Neuromorphic Architectures
38.5 Synapse and Neuron Circuits
     38.5.1 Spike-Based Learning Circuits
38.6 Spike-Based Multichip Neuromorphic Systems
38.7 State-Dependent Computation in Neuromorphic Systems
38.8 Conclusions
References


38.1 The Origins


Models of neural information processing systems that
link the type of information processing that takes place
in the brain with theories of computation and computer
science date back to the origins of computer
science itself [38.1, 2]. The theory of computation based
on abstract neural network models was already
developed in the 1950s [38.3, 4], and the development of
artificial neural networks implemented on digital
computers was very popular throughout the 1980s and the
early 1990s [38.5-8]. Similarly, the history of
implementing electronic models of neural circuits extends
back to the construction of perceptrons in the late
1950s [38.3] and retinas in the early 1970s [38.9].
However, the modern wave of research utilizing very large
scale integration technology and emphasizing the
nonlinear current characteristics of the transistor to
study and implement neural computation began only in the
mid-1980s, with the collaboration that sprang up
between scientists such as Max Delbrück, John Hopfield,
Carver Mead, and Richard Feynman [38.10]. Inspired 
by graded synaptic transmission in the retina, Mead 
sought to use the graded (analog) properties of tran- 
sistors, rather than simply operating them as on-off 
(digital) switches, to build circuits that emulate biologi- 
cal neural systems. He developed neuromorphic circuits 
that shared many common physical properties with pro- 
teic channels in neurons, and that consequently required 
far fewer transistors than digital approaches to emulat- 
ing neural systems [38.11]. Neuromorphic engineering 
is the research field that was born out of this activity 
and which carries on that legacy: it takes inspiration 
from biology, physics, mathematics, computer science, 
and engineering to design artificial neural systems for 
carrying out robust and efficient computation using low 
power, massively parallel analog very large scale in- 
tegration (VLSI) circuits, that operate with the same 




physics of computation present in the brain [38.12]. Indeed,
this young research field was born out of both
the Physics of Computation course taught at Caltech
by Carver Mead, John Hopfield, and Richard Feynman
and Mead’s textbook Analog Very Large Scale Integration
and Neural Systems [38.11]. Prominent in the
early expansion of the field were scientists and engi- 
neers such as Christof Koch, Terry Sejnowski, Rodney 
Douglas, Andreas Andreou, Paul Mueller, Jan van der 
Spiegel, and Eric Vittoz, training a generation of cross- 
disciplinary students. Examples of successes in neuro- 
morphic engineering range from the first biologically 
realistic silicon neuron [38.13], or realistic silicon mod- 
els of the mammalian retina [38.14], to more recent 
silicon cochlea devices potentially useful for cochlear 
implants [38.15], or complex distributed multichip ar- 
chitectures for implementing event-driven autonomous 
behaving systems [38.16]. 

Neuromorphic engineering is now a well-established field [38.17], with
two flagship workshops (the Telluride Neuromorphic
Engineering [38.18] and Capo Caccia Cognitive Neuromorphic
Engineering [38.19] workshops) that are
still held every year. Neuromorphic circuits
are now being investigated by many academic and in- 
dustrial research groups worldwide to develop a new 
generation of computing technologies that use the same 
organizing principles of the biological nervous sys- 
tem [38.15,20,21]. Research in this field represents 
frontier research as it opens new technological and 
scientific horizons: in addition to basic science ques- 
tions on the fundamental principles of computation 
used by the cortical circuits, neuromorphic engineering 
addresses issues in computer science and electrical engineering
that go well beyond established frontiers
of knowledge. A major effort is now being invested
in understanding how these neuromorphic computational
principles can be implemented using massively
parallel arrays of basic computing elements (or cores), 
and how they can be exploited to create a new gener- 
ation of computing technologies that takes advantage 
of future (nano)technologies and scaled VLSI pro- 
cesses, while coping with the problems of low-power 
dissipation, device unreliability, inhomogeneity, fault 
tolerance, etc. 


38.2 Neural and Neuromorphic Computing 


Neural computing (or neurocomputing) is concerned 
with the implementation of artificial neural networks 
for solving practical problems. Similarly, hardware im- 
plementations of artificial neural networks (neurocom- 
puters) adopt mainly statistics and signal processing 
methods to solve the problem they are designed to 
tackle. These algorithms and systems are not neces- 
sarily tied to detailed models of neural or cortical 
processing. Neuromorphic computing on the other hand 
aims to reproduce the principles of neural computa- 
tion by emulating as faithfully as possible the detailed 
biophysics of the nervous system in hardware. In this 
respect, one major characteristic of these systems is 
their use of spikes for representing and processing sig- 
nals. This is not an end in itself: spiking neural networks 
represent a promising computational paradigm for solv- 
ing complex pattern recognition and sensory processing 
tasks that are difficult to tackle using standard ma- 
chine vision and machine learning techniques [38.22, 
23]. Much research has been dedicated to software 
simulations of spiking neural networks [38.24], and 
a wide range of solutions have been proposed for solv- 
ing real-world and engineering problems [38.25, 26]. 
Similarly, there are projects that focus on software 


simulations of large-scale spiking neural networks for 
exploring the computational properties of models of 
cortical circuits [38.27,28]. Recently, several research 
projects have been established worldwide to develop 
large-scale hardware implementations of spiking neural 
systems using VLSI technologies, mainly for allow- 
ing neuroscientists to carry out simulations and virtual 
experiments in real time or even faster than real-time 
scales [38.29-31]. Although dealing with hardware im- 
plementations of neural systems, either with custom 
VLSI devices or with dedicated computer architectures, 
these projects represent the conventional neurocomput- 
ing approaches, rather than neuromorphic-computing 
ones. Indeed, these systems are mainly concerned with 
fast and large simulations of spiking neural networks. 
They are optimized for speed and precision, at the cost 
of size and power consumption (which ranges from 
megawatts to kilowatts, depending on which approach 
is followed). An example of an alternative large-scale 
spiking neural network implementation that follows the 
original neuromorphic engineering principles (i. e., that 
exploits the characteristics of VLSI technology to di- 
rectly emulate the biophysics and the connectivity of 
cortical circuits) is represented by the Neurogrid sys- 




tem [38.32]. This system comprises an array of 16 VLSI 
chips, each integrating mixed analog neuromorphic 
neuron and synapse circuits with digital asynchronous 
event routing logic. The chips are assembled on a 16.5 ×
19 cm² printed circuit board, and the whole system can
model over one million neurons connected by billions
of synapses in real time, using only about 3 W of
power [38.32]. 

Irrespective of the approach followed, these projects 
have two common goals: on the one hand, they aim to
advance our understanding of neural processing in 


the brain by developing models and physically build- 
ing them using electronic circuits, and on the other 
they aim to exploit this understanding for develop- 
ing a new generation of radically different non-von 
Neumann computing technologies that are inspired by 
neural and cortical circuits. In this interdisciplinary 
journey neuroscience findings will influence theoreti- 
cal developments, and these will determine specifica- 
tions and constraints for developing new neuromor- 
phic circuits and systems that can implement them 
optimally. 


38.3 The Importance of Fundamental Neuroscience 


The neocortex is a remarkable computational de- 
vice [38.33]. It is the neuronal structure in the brain that 
most expresses biology’s ability to implement percep- 
tion and cognition. Anatomical and neurophysiological 
studies have shown that the mammalian cortex with its 
laminar organization and regular microscopic structure 
has a surprisingly uniform architecture [38.34]. Since 
the original work of Gilbert and Wiesel [38.35] on the 
neural circuits of visual cortex it has been argued that
this basic architecture and its underlying computational
principles can be understood
in terms of the laminar distribution of relatively few
classes of excitatory and inhibitory neurons [38.34].
Based on these slow, unreliable and inhomogeneous
computing elements, the cortex easily outperforms today’s
most powerful computers in a wide variety of
computational tasks such as vision, audition, or mo- 
tor control. Indeed, despite the remarkable progress 
in information and communication technology and the 
vast amount of resources dedicated to information and 
communication technology research and development, 
today’s fastest and largest computers are still not
able to match neural systems, when it comes to carry- 
ing out robust computations in real-world tasks. The 
reasons for this performance gap are not yet fully un- 
derstood, but it is clear that one fundamental difference 
between the two types of computing systems lies in the 
style of computation. Rather than using Boolean logic, 
precise digital representations, and clocked operations, 
nervous systems carry out robust and reliable computa- 
tion using hybrid analog/digital unreliable components; 
they emphasize distributed, event-driven, collective,
and massively parallel mechanisms, and make exten- 
sive use of adaptation, self-organization and learning. 
Specifically, the patchy organization of the neurons in 


the cortex suggests a computational machine where 
populations of neurons perform collective computa- 
tion in individual clusters, transmit the results of this 
computation to neighboring clusters, and set the local 
context of the cluster by means of feedback connec- 
tions from/to other relevant cortical areas. This overall 
graphical architecture resembles graphical processing 
models that perform Bayesian inference [38.36, 37]. 
However, the theoretical knowledge for designing and 
analyzing these models is limited mainly to graphs 
without loops, while the cortex is characterized by 
massive recurrent (loopy) connectivity schemes. Recent
studies exploring loopy graphical models related to
cortical architectures have started to emerge [38.33, 38], but
issues of convergence and accuracy remain unresolved, and
hardware implementations in cortical architectures
composed of spiking neurons have not been addressed
yet.

Understanding the fundamental computational prin- 
ciples used by the cortex, how they are exploited for 
processing, and how to implement them in hardware, 
will allow us to develop radically novel computing 
paradigms and to construct a new generation of information
and communication technology that combines
the strengths of silicon technology with the performance
of brains. Indeed, fundamental research in neuroscience
has already made substantial progress in uncovering
these principles, and information and communication
technologies have advanced to a point where
it is possible to integrate almost as many transistors in
a VLSI system as neurons in a brain. From a theoretical
standpoint, it has been demonstrated that
any Turing machine, and hence any conceivable dig- 
ital computation, can be implemented by a noise-free 
network of spiking neurons [38.39]. It has also been 




shown that networks of spiking neurons can carry out 
a wide variety of complex state-dependent computations,
even in the presence of noise [38.40-44]. However,
apart from isolated results, a general insight into
which computations can be carried out in a robust man- 
ner by networks of unreliable spiking elements is still 
missing. Current proposals in state-of-the-art computa- 
tional and theoretical neuroscience research represent 
mainly approximate functional models and are imple- 
mented as abstract artificial neural networks [38.45, 
46]. It is less clear how these functions are realized 
by the actual networks of neocortex [38.34], how these 
networks are interconnected locally, and how percep- 


tual and cognitive computations can be supported by 
them. 

Both additional neurophysiological studies on neu- 
ron types and quantitative descriptions of local and 
inter-areal connectivity patterns are required to deter- 
mine the specifications for developing the neuromor- 
phic VLSI analogs of the cortical circuits studied, and 
additional computational neuroscience and neuromor- 
phic engineering studies are required to understand 
what level of detail to use in implementing spiking 
neural networks, and what formal methodology to use 
for synthesizing and programming these non-von Neu- 
mann computational architectures. 


38.4 Temporal Dynamics in Neuromorphic Architectures 


Neuromorphic spiking neural network architectures 
typically comprise massively parallel arrays of sim- 
ple processing elements with memory and computation 
co-localized (Fig. 38.1). Given their architectural con- 
straints, these neural processing systems cannot process 
signals using the same strategies used by the conven- 
tional von Neumann computing architectures, such as 
digital signal processors or central processing units, that
time-domain multiplex small numbers of highly com- 
plex processors at high clock rates and operate by 
transferring the partial results of the computation from 
and to external memory banks. The synapses and neu- 
rons in these architectures have to process input spikes 
and produce output responses as the input signals ar- 
rive, in real time, at the rate of the incoming data. It is 
not possible to virtualize time and transfer partial results
to memory banks outside the architecture core
at higher rates. Rather, it is necessary to employ resources
that compute with time constants that are well
matched to those of the signals they are designed to pro- 
cess. Therefore, to interact with the environment and 
process signals with biological timescales efficiently, 
hardware neuromorphic systems need to be able to 
compute using biologically realistic time constants. In 
this way, they are well matched to the signals they 
process, and are inherently synchronized with real-world
events.

This constraint is not easy to satisfy using analog 
VLSI technology. Standard analog circuit design tech- 
niques either lead to bulky and silicon-area expensive 
solutions [38.47] or fail to meet this condition, resorting 
to modeling neural dynamics at accelerated unrealistic 
timescales [38.48-50].


One elegant solution to this problem is to use cur- 
rent-mode design techniques [38.51] and log-domain 
circuits operated in the weak-inversion regime [38.52]. 
When metal oxide semiconductor field effect transis- 
tors are operated in this regime, the main mechanism 
of carrier transport is that of diffusion, as it is for ions 
flowing through proteic channels across neuron mem- 
branes. In general, neuromorphic VLSI circuits operate 
in this domain (also known as the subthreshold regime), 
and this is why they share many common physical prop- 
erties with proteic channels in neurons [38.52]. For 
example, metal oxide semiconductor field effect transistors
have an exponential relationship between gate-to-source
voltage and drain current, and produce currents
that range from femtoamperes to nanoamperes.
In this domain, it is therefore possible to integrate rela- 
tively small capacitors in VLSI circuits, to implement 
temporal filters that are both compact and have bio- 
logically realistic time constants, ranging from tens to 
hundreds of milliseconds. 
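
As a rough numerical illustration of the exponential relationship mentioned above, the following Python sketch evaluates the textbook weak-inversion expression I_ds = I_0 exp(κ V_gs / U_T) for a saturated transistor. The expression, the function names, and the values chosen for I_0, κ, and U_T are illustrative assumptions and are not taken from this chapter.

```python
import math

# Illustrative values (assumptions, not taken from the chapter)
U_T = 0.025    # thermal voltage in volts, roughly 25 mV at room temperature
KAPPA = 0.7    # subthreshold slope factor (dimensionless)
I_0 = 1e-15    # current scaling factor in amperes


def subthreshold_current(v_gs):
    """Drain current (A) of a saturated transistor in weak inversion."""
    return I_0 * math.exp(KAPPA * v_gs / U_T)


def volts_per_decade():
    """Gate voltage step (V) that multiplies the drain current by ten."""
    return U_T * math.log(10.0) / KAPPA


if __name__ == "__main__":
    for v_gs in (0.1, 0.2, 0.3, 0.4):
        print(f"V_gs = {v_gs:.1f} V -> I_ds = {subthreshold_current(v_gs):.3e} A")
    print(f"about {1e3 * volts_per_decade():.0f} mV per decade of current")
```

With these assumed values, a gate voltage swing of a few hundred millivolts moves the current across more than three orders of magnitude, which is why very small bias currents are easy to reach in this regime.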

A very compact subthreshold log-domain circuit 
that can reproduce biologically plausible temporal dy- 
namics is the differential pair integrator circuit [38.53], 
shown in Fig. 38.2. It can be shown, by log-domain cir- 
cuit analysis techniques [38.54, 55] that the response of 
this circuit is governed by the following first-order dif- 
ferential equation 


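$$\tau \left( \frac{I_{th}}{I_{out}} + 1 \right) \frac{d}{dt} I_{out} + I_{out} = \frac{I_{th} I_{in}}{I_\tau} - I_{th} , \tag{38.1}$$


where the time constant $\tau \triangleq C U_T / (\kappa I_\tau)$, the term $U_T$ represents
the thermal voltage and $\kappa$ the subthreshold slope
factor [38.52].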
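
To give a feel for the magnitudes involved, the sketch below computes the time constant τ = C U_T / (κ I_τ) and integrates Eq. (38.1) numerically with a forward-Euler scheme. It is a minimal illustration under assumed values (a 1 pF capacitor, picoampere-range currents, κ = 0.7, U_T = 25 mV); the function names and all numbers are hypothetical and do not describe a particular chip.

```python
def dpi_time_constant(c, u_t, kappa, i_tau):
    """Time constant tau = C * U_T / (kappa * I_tau), in seconds."""
    return c * u_t / (kappa * i_tau)


def simulate_dpi(i_in, i_th, i_tau, tau, i_out0=1e-15, dt=1e-4, t_stop=0.5):
    """Forward-Euler integration of Eq. (38.1); returns the I_out trace (A)."""
    i_out = i_out0                      # must be > 0: Eq. (38.1) divides by I_out
    trace = [i_out]
    drive = i_th * i_in / i_tau - i_th  # right-hand side of Eq. (38.1)
    for _ in range(int(round(t_stop / dt))):
        d_i_out = (drive - i_out) / (tau * (1.0 + i_th / i_out))
        i_out += dt * d_i_out
        trace.append(i_out)
    return trace


# Illustrative values (assumptions): 1 pF capacitor, picoampere-range currents
TAU = dpi_time_constant(c=1e-12, u_t=0.025, kappa=0.7, i_tau=1e-12)
TRACE = simulate_dpi(i_in=100e-12, i_th=1e-12, i_tau=1e-12, tau=TAU)

if __name__ == "__main__":
    print(f"tau = {1e3 * TAU:.1f} ms")
    print(f"I_out after 0.5 s = {1e12 * TRACE[-1]:.1f} pA")
```

Under these assumptions a picofarad-scale capacitor already yields a time constant of tens of milliseconds, and the output settles near the steady-state value I_th I_in / I_τ − I_th obtained by setting the derivative in Eq. (38.1) to zero.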


Although this first-order nonlinear differential
equation cannot be solved analytically, for sufficiently
large input currents ($I_{in} \gg I_\tau$) the term $-I_{th}$ on the
right-hand side of (38.1) becomes negligible, and eventually,
when the condition $I_{out} \gg I_{th}$ is met, the equation
can be well approximated by


$$\tau \frac{d}{dt} I_{out} + I_{out} = \frac{I_{th} I_{in}}{I_\tau} . \tag{38.2}$$


Under the reasonable assumption of nonnegligible
input currents, this circuit therefore implements
a compact linear integrator with time constants that
can be set to range from microseconds to hundreds of
milliseconds. It is a circuit that can be used to build
neuromorphic sensory systems that interact with the
environment [38.56], and most importantly, it is a circuit
that faithfully reproduces the dynamics of synaptic
transmission observed in biological synapses [38.57].


Fig. 38.1 Neuromorphic spiking neural network architectures: detailed biophysical models of cortical circuits are derived from
neuroscience experiments; neural network models are designed, with realistic spiking neurons and dynamic synapses; these are
mapped into analog circuits, and integrated in large numbers on VLSI chips. Input spikes are integrated by synaptic circuits,
which drive their target postsynaptic neurons, which in turn integrate all synaptic inputs and generate action potentials. Spikes
of multiple neurons are transmitted off chip using asynchronous digital circuits, to eventually control autonomous
behaving systems in real time


Fig. 38.2 Schematic diagram of neuromorphic integrator
circuit. Input currents are integrated in time to produce output
currents with large time constants, and with a tunable
gain factor


38.5 Synapse and Neuron Circuits


Synapses are fundamental elements for computation
and information transfer in both real and artificial neural
systems. They play a crucial role in neural coding
and learning algorithms, as well as in neuromorphic
neural network architectures. While modeling the nonlinear
properties and the dynamics of real synapses
can be extremely onerous for software simulations in
terms of computational power, memory requirements,
and simulation time, neuromorphic synapse circuits can
faithfully reproduce synaptic dynamics using integrators
such as the differential pair integrator shown in
Fig. 38.2. The same differential pair integrator circuit
can be used to model the passive leak and conductance
behavior in silicon neurons. An example of
a silicon neuron circuit that incorporates the differential
pair integrator is shown in Fig. 38.3. This circuit
implements an adaptive exponential integrate-and-fire
neuron model [38.58]. In addition to the conductance-based
behavior, it implements a spike-frequency adaptation
mechanism, a positive feedback mechanism that
models the effect of sodium activation and inactivation 
channels, and a reset mechanism with a free parameter 
that can be used to set the neuron’s reset potential. The 
neuron’s input differential pair integrator integrates the 
input current until it approaches the neuron’s threshold 
voltage. As the positive feedback circuit gets activated, 
it induces an exponential rise in the variable that rep- 
resents the model neuron membrane potential, which 
in the circuit of Fig. 38.3 is the current $I_{mem}$. This
quickly causes the neuron to produce an action potential
and make a request for transmitting a spike (i.e., the
REQ signal of Fig. 38.3 is activated). Once the digital
request signal is acknowledged, the membrane capacitance
$C_{mem}$ is reset to the neuron’s tunable reset potential
$V_{rst}$. These types of neuron circuits have been shown
to be extremely low power, consuming about 7 pJ per 


spike [38.59]. In addition, the circuit is extremely com- 
pact compared to alternative designs [38.58], while still 
being able to reproduce realistic dynamics. 
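
The behavior described above can be explored in software with a minimal sketch of the adaptive exponential integrate-and-fire model in its standard textbook form. This is a forward-Euler simulation of the abstract model, not of the silicon circuit; the parameter values, the peak-detection threshold `v_peak`, and the function name are illustrative assumptions.

```python
import math


def simulate_adex(i_in, t_stop=0.5, dt=1e-4):
    """Forward-Euler simulation of an adaptive exponential integrate-and-fire
    neuron driven by a constant current i_in (A). Returns spike times (s)."""
    # Illustrative parameter values in SI units (assumptions)
    c_m = 281e-12      # membrane capacitance (F)
    g_l = 30e-9        # leak conductance (S)
    e_l = -70.6e-3     # leak reversal potential (V)
    v_t = -50.4e-3     # exponential threshold (V)
    delta_t = 2e-3     # slope factor of the exponential term (V)
    tau_w = 144e-3     # adaptation time constant (s)
    a = 4e-9           # subthreshold adaptation conductance (S)
    b = 80.5e-12       # spike-triggered adaptation increment (A)
    v_reset = -70.6e-3 # reset potential (V)
    v_peak = 0.0       # voltage at which a spike is registered (V)

    v, w = e_l, 0.0
    spike_times = []
    for step in range(int(round(t_stop / dt))):
        # Leak, exponential positive feedback, adaptation current, and input
        dv = (-g_l * (v - e_l)
              + g_l * delta_t * math.exp((v - v_t) / delta_t)
              - w + i_in) / c_m
        dw = (a * (v - e_l) - w) / tau_w
        v += dt * dv
        w += dt * dw
        if v >= v_peak:              # spike: reset voltage, increase adaptation
            spike_times.append((step + 1) * dt)
            v = v_reset
            w += b
    return spike_times


if __name__ == "__main__":
    spikes = simulate_adex(1e-9)
    intervals = [t2 - t1 for t1, t2 in zip(spikes, spikes[1:])]
    print(f"{len(spikes)} spikes, first interval {1e3 * intervals[0]:.1f} ms, "
          f"last interval {1e3 * intervals[-1]:.1f} ms")
```

In this sketch the exponential term plays the role of the positive feedback, the variable `w` plays the role of spike-frequency adaptation (inter-spike intervals lengthen under constant input), and `v_reset` corresponds to the tunable reset potential.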

As synapse and neuron circuits integrate their cor- 
responding input signals in parallel, the neural network 
emulation time does not depend on the number of 
elements involved, and the network response always 
happens in real time. These circuits can therefore be used
to develop low-power large-scale hardware neural architectures
for signal processing and general-purpose
computing [38.58].


38.5.1 Spike-Based Learning Circuits 


As large-scale VLSI networks
of spiking neurons are becoming realizable, the
development of robust spike-based learning methods, 
algorithms, and circuits has become crucial. Spike- 
based learning mechanisms enable the hardware neural 
systems they are embedded in to adapt to the statis- 
tics of their input signals, to learn and classify complex 
sequences of spatiotemporal patterns, and eventually 
to implement general purpose state-dependent com- 
puting paradigms. Biologically plausible spike-driven 


Fig. 38.3 Schematic diagram of a conductance-based integrate-and-fire neuron. An input differential pair integrator
low-pass filter ($M_{L1-3}$) implements the neuron leak conductance. A noninverting amplifier with current-mode positive
feedback ($M_{A1-6}$) produces address events at extremely low-power operation. A reset block ($M_{R1-6}$) resets the neuron
to the reset voltage $V_{rst}$ and keeps it reset for a refractory period, set by the $V_{ref}$ bias voltage. An additional differential
pair integrator low-pass filter ($M_{G1-6}$) integrates the output events in a negative feedback loop, to implement a spike-frequency
adaptation mechanism




synaptic plasticity mechanisms have been thoroughly 
investigated in recent years. It has been shown, for ex- 
ample, how spike-timing dependent plasticity (STDP) 
can be used to learn to encode temporal patterns 
of spikes [38.42,60,61]. In spike-timing dependent 
plasticity, the relative timing of pre- and postsynaptic
spikes determines how to update the efficacy of
a synapse. Plasticity mechanisms based on the timing 
of the spikes map very effectively onto silicon neuro- 
morphic devices, and so a wide range of spike-timing 
dependent plasticity models have been implemented 
in VLSI [38.62-67]. It is therefore possible to build 


large-scale neural systems that can carry out signal 
processing and neural computation, and include adaptation
and learning. These types of systems are, by
their very nature, modular and scalable. It is possible
to develop very large scale systems by designing
basic neural processing cores, and by interconnecting
them [38.68]. However, to interconnect multiple
neural network chips with each other, or to
provide sensory inputs to them, or to interface them to 
conventional computers or robotic platforms, it is nec- 
essary to develop efficient spike-based communication 
protocols and interfaces. 
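
The spike-timing dependent plasticity rule described above, in which the relative timing of pre- and postsynaptic spikes sets the weight change, can be sketched as follows. This is a common pair-based formulation with exponential windows and hard weight bounds, offered as an assumption-laden illustration and not as the rule implemented by any of the cited circuits; the amplitudes, time constants, and function names are hypothetical.

```python
import math


def stdp_delta_w(dt_spike, a_plus=0.01, a_minus=0.012,
                 tau_plus=0.020, tau_minus=0.020):
    """Weight change for one spike pair, with dt_spike = t_post - t_pre (s).

    Pre-before-post (dt_spike > 0) potentiates; post-before-pre depresses.
    """
    if dt_spike > 0:
        return a_plus * math.exp(-dt_spike / tau_plus)
    if dt_spike < 0:
        return -a_minus * math.exp(dt_spike / tau_minus)
    return 0.0


def update_weight(w, dt_spike, w_min=0.0, w_max=1.0):
    """Apply the pair-based update and clip the weight to [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_delta_w(dt_spike)))


if __name__ == "__main__":
    for dt_ms in (-40, -10, 10, 40):
        print(f"t_post - t_pre = {dt_ms:+d} ms -> dw = {stdp_delta_w(dt_ms * 1e-3):+.5f}")
```

The clipping step reflects the fact that weights stored in hardware are bounded; the specific bounds used here are arbitrary.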


38.6 Spike-Based Multichip Neuromorphic Systems 


In addition to using spikes for efficient signal processing
and computation, neuromorphic systems can also use
spiking representations for efficient communication.
The use of asynchronous spike- or event-based
representations in electronic systems can be energy effi- 
cient and fault tolerant, making them ideal for building 
modular systems and creating complex hierarchies of 
computation. In recent years, a new class of neuromorphic
multichip systems has started to emerge [38.69-71].
These systems typically comprise one or more neuro- 
morphic sensors, interfaced to general-purpose neural 
network chips comprising spiking silicon neurons and 
dynamic synapses. The strategy used to transmit sig- 
nals across chip boundaries in these types of systems 
is based on asynchronous address-events: output events 
are represented by the addresses of the neurons that 
spiked, and transmitted in real time on a digital bus 




(Fig. 38.4). The communication protocol used by these 
systems is commonly referred to as address event representation
(AER) [38.72, 73]. The analog nature of the
AER signals being transmitted
is encoded in the mean frequency of the neurons’ spikes
(spike rates) and in their precise timing. Both types of 
representations are still an active topic of research in 
neuroscience, and can be investigated in real time with 
these hardware systems. Once on a digital bus, the ad- 
dress events can be translated, converted or remapped 
to multiple destinations using the conventional logic 
and memory elements. Digital address event representa- 
tion infrastructures allow us to construct large multichip 
networks with arbitrary connectivity, and to seamlessly 
reconfigure the network topology. Although digital, the 
asynchronous real-time nature of the AER protocol 
poses significant technological challenges that are still 


Fig. 38.4 Asynchronous communication scheme between two chips:
when a neuron on the source chip
generates an action potential, its address
is placed on a common digital
bus. The receiving chip decodes the
address events and routes them to the
appropriate synapses




being actively investigated by the electrical engineering 
community [38.74]. But by using analog processing in 
the neuromorphic cores and asynchronous digital com- 
munication outside them, neuromorphic systems can 
exploit the best of both worlds, and implement compact 


low-power brain-inspired neural processing systems
that can interact with the environment in real time, and
represent an alternative (complementary) computing
technology to the more common, conventional
VLSI computing architectures. 
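
The address-event scheme described in this section can be mimicked in software with a few lines of Python. The sketch below is only a functional illustration under simplifying assumptions: events carry an explicit timestamp (on a real asynchronous bus, time represents itself), the routing table is a plain dictionary, and all names, addresses, and chip labels are hypothetical.

```python
from collections import defaultdict


def encode_address_events(spike_trains):
    """Merge per-neuron spike times into one time-ordered stream of
    (time, source_address) events."""
    events = [(t, address)
              for address, times in spike_trains.items()
              for t in times]
    return sorted(events)


def route_address_events(events, routing_table):
    """Deliver each event to every destination listed for its source address."""
    delivered = defaultdict(list)
    for t, address in events:
        for destination in routing_table.get(address, ()):
            delivered[destination].append(t)
    return dict(delivered)


def mean_rate(events, address, duration):
    """Mean firing rate (Hz) of one source address over the given duration (s)."""
    return sum(1 for _, a in events if a == address) / duration


# Hypothetical example: three source neurons fanned out to two destination chips
SPIKES = {0: [0.010, 0.030, 0.050], 1: [0.020], 2: [0.015, 0.045]}
ROUTES = {0: [("chip_b", 7)], 1: [("chip_b", 7), ("chip_c", 2)], 2: [("chip_c", 2)]}
EVENTS = encode_address_events(SPIKES)
DELIVERED = route_address_events(EVENTS, ROUTES)

if __name__ == "__main__":
    print(EVENTS)
    print(DELIVERED)
    print(mean_rate(EVENTS, 0, 0.1), "Hz")
```

Changing the dictionary `ROUTES` is the software counterpart of remapping address events to reconfigure the network topology: the source and destination arrays stay the same, and only the lookup table changes.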


38.7 State-Dependent Computation in Neuromorphic Systems 


General-purpose cortical-like computing architectures 
can be interfaced to real-time autonomous behaving 
systems to process sensory signals and carry out event- 
driven state-dependent computation in real time. How- 
ever, while the circuit design techniques and technolo- 
gies for implementing these neuromorphic systems are 
becoming well established, formal methodologies for 
programming them, to execute specific procedures and 
solve user-defined tasks, do not exist yet. A first step
toward this goal is the definition of methods and pro- 
cedures for implementing state-dependent computation 
in networks of spiking neurons. In general, state-depen- 
dent computation in autonomous behaving systems has 
been a challenging research field since the advent of 
digital computers. Recent theoretical findings and tech- 
nological developments show promising results in this 
domain [38.16, 43, 44, 75, 76]. But the computational
tasks that these systems are currently able to perform re- 
main rather simple, compared to what can be achieved 
by humans, mammals, and many other animal species. 
We know, for instance, that nervous systems can ex- 
hibit context-dependent behavior, can execute programs 
consisting of series of flexible iterations, and can condi- 
tionally branch to alternative behaviors. A general un- 
derstanding of how to configure artificial neural systems 
to achieve this sophistication of processing, including 
also adaptation, autonomous learning, interpretation of 
ambiguous input signals, symbolic manipulation, in- 
ference, and other characteristics that we could regard 
as effective cognition is still missing. But progress is 
being made in this direction by studying the computational
properties of spiking neural networks configured
as attractors or winner-take-all networks [38.33, 44,
77]. When properly configured, these architectures produce
persistent activities, which can be regarded as
computational states. Both software and VLSI event-driven
soft-winner-take-all architectures are being developed
to couple spike-based computational models
with each other, using the asynchronous communication
infrastructure, and to use them to investigate their
computational properties as neural finite-state machines
in autonomous behaving robotic platforms [38.44, 78].

The interdisciplinary theoretical, modeling, and VLSI design
activities are carried out with tight interactions,
in an effort to understand:


1. How to use the analog, unreliable, and low-precision
silicon neurons and synapse circuits operated
in the weak-inversion regime [38.52] to carry out
reliable and robust signal processing and pattern
recognition tasks;

2. How to compose networks of such elements and
how to embody them in real-time behaving systems
for implementing sets of prespecified desired functionalities
and behaviors; and

3. How to formalize these theories and techniques to
develop a systematic methodology for configuring
these networks and systems to achieve arbitrary
state-dependent computations, similar to what is
currently done using high-level programming languages
such as Java or C++ for conventional
digital architectures.


38.8 Conclusions


In this chapter, we presented an overview of the neuromorphic
engineering field, focusing on very large scale
integration implementations of spiking neural networks
and on multineuron chips that comprise synapses and
neurons with biophysically realistic dynamics, nonlinear
properties, and spike-based plasticity mechanisms.
We argued that the multineuron chips built using these
silicon neurons and synaptic circuits can be used to
implement an alternative brain-inspired computational
paradigm that is complementary to the conventional
ones based on von Neumann architectures.

Indeed, the field of neuromorphic engineering has
been very successful in developing a new generation of
computing technologies implemented with design principles
based on those of the nervous systems, and which
exploit the physics of computation used in biological 
neural systems. It is now possible to design and im- 
plement complex large-scale artificial neural systems 
with elaborate computational properties, such as spike- 
based plasticity and soft winner-take-all behavior, or 
even complete artificial sensory-motor systems, able to 
robustly process signals in real time using neuromor- 
phic VLSI technology. 

Within this context, neuromorphic VLSI technology 
can be extremely useful for exploring neural processing 
strategies in real time. While there are clear advantages 
of this technology, for example, in terms of power bud- 
get and size requirements, there are also restrictions 
and limitations imposed by the hardware implementations that limit their possible range of applications. These constraints include, for example, limited resolution in the state variables or bounded parameters (e.g., bounded synaptic weights that cannot grow indefinitely or become negative). The presence of noise and inhomogeneities in all circuit components also places severe limitations on the precision and reliability of the computations performed. However, most, if not all, of the limitations that neuromorphic hardware implementations face (e.g., in maintaining stability, in achieving robust computation using unreliable components, etc.) are often the same ones faced by real neural systems. These limitations are therefore useful for reducing the space of possible artificial neural models that explain or reproduce the properties of real cortical circuits. While in principle these features could also be simulated in software (e.g., by adding a noise term to each state variable, or restricting the resolution of variables to 3, 4, or 6 bits instead of using floating-point representations), they are seldom taken into account. So, in addition to representing a technology useful for implementing hardware neural processing systems and solving practical applications, neuromorphic circuits can be used as an additional tool for studying and understanding basic neuroscience.
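
The software emulation of hardware constraints mentioned above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the leaky-integrator update, the constants `leak` and `drive`, the weight value, and the function names `quantize` and `simulate` are hypothetical choices made for this example and do not describe any particular neuromorphic chip. It shows the three constraints named in the text: a bounded weight, a noise term added to the state variable, and a state restricted to a few bits of resolution.

```python
import random


def quantize(x, n_bits, lo=0.0, hi=1.0):
    """Clip x to [lo, hi] and round it to one of 2**n_bits evenly spaced levels."""
    steps = 2 ** n_bits - 1                      # number of intervals between levels
    x = min(max(x, lo), hi)                      # bounded state variable
    return lo + round((x - lo) / (hi - lo) * steps) / steps * (hi - lo)


def simulate(n_bits=4, noise_std=0.02, n_steps=200, seed=0):
    """Leaky integrator with a bounded weight and a noisy, coarsely quantized state."""
    rng = random.Random(seed)
    w = min(max(1.7, 0.0), 1.0)                  # weight clipped to [0, 1] (assumed bounds)
    leak, drive = 0.1, 0.05                      # hypothetical constants
    v = 0.0
    trace = []
    for _ in range(n_steps):
        v += -leak * v + w * drive               # ideal (noise-free) update
        v += rng.gauss(0.0, noise_std)           # noise term on the state variable
        v = quantize(v, n_bits)                  # limited resolution (e.g., 3, 4, or 6 bits)
        trace.append(v)
    return trace


if __name__ == "__main__":
    for bits in (3, 4, 6):
        tail = simulate(n_bits=bits)[-50:]
        print(bits, "bits: mean of last 50 samples =", sum(tail) / len(tail))
```

Comparing runs with different values of `n_bits` and `noise_std` gives a rough impression of how strongly a model's behavior depends on precision that the hardware would not provide.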

As VLSI technology is widespread and readily accessible, it is possible to easily learn (and train new generations of students) to design neuromorphic VLSI neural networks for building hardware models of neural systems and sensory-motor systems. Understanding how to build real-time behaving neuromorphic systems that can work in real-world scenarios will allow us both to gain a better understanding of the principles of neural coding in the nervous system and to develop a new generation of computing technologies that extend and complement current digital computing devices, circuits, and architectures.




39. Neuroengineering

Damien Coyle, Ronen Sosnik

Neuroengineering of sensorimotor rhythm-based brain-computer interface (BCI) systems is the process of using engineering techniques to understand, repair, replace, enhance, or otherwise exploit the properties of neural systems engaged in the representation, planning, and execution of volitional movements, for the restoration and augmentation of human function via direct interactions between the nervous system and devices.

This chapter reviews information that is fundamental for the complete and comprehensive understanding of this complex interdisciplinary research field, namely an overview of the motor system, an overview of recent findings in neuroimaging and electrophysiology studies of the motor cortical anatomy and networks, and the engineering approaches used to analyze motor cortical signals and translate them into control signals that computer programs and devices can interpret.

Specifically, the anatomy and physiology of the human motor system, focusing on the brain areas and spinal elements involved in the generation of volitional movements, are reviewed. The stage is then set for introducing human prototypical motion attributes, sensorimotor learning, and several computational models suggested to explain psychophysical motor phenomena based on the current knowledge in the field of neurophysiology.

An introduction to invasive and non-invasive neural recording techniques, including functional and structural magnetic resonance imaging (fMRI and sMRI), electrocorticography (ECoG), electroencephalography (EEG), intracortical single unit activity (SU) and multiple unit extracellular recordings, and magnetoencephalography (MEG), is integrated with coverage aimed at elucidating what is known about sensorimotor oscillations and brain anatomy, which are used to generate control signals for brain-actuated devices and alternative communication in BCI. Emphasis is on the latest findings in these topics and on highlighting what information is accessible at each of the different scales and the levels of activity that are discernible or utilizable for the effective control of devices using intentional activation of sensorimotor neurons and/or modulation of sensorimotor rhythms and oscillations.

The nature, advantages, and drawbacks of various approaches and their suggested functions as the neural correlates of various spatiotemporal motion attributes are reviewed. Sections dealing with signal analysis techniques, translation algorithms, and adaptation to the brain's non-stationary dynamics present the reader with a wide-ranging review of the mathematical and statistical techniques commonly used to extract and classify the bulk of neural information recorded by the various recording techniques, and of the challenges posed in deploying BCI systems for their intended uses, be it alternative communication and control, assistive technologies, neurorehabilitation, neurorestoration or replacement, or recreation and entertainment, among other applications. Lastly, a discussion is presented on the future of the field, highlighting newly emerging research directions and their potential to enhance our understanding of the human brain, and specifically the human motor system, and ultimately how that knowledge may lead to more advanced and intelligent computational systems.


39.1 Overview - Neuroengineering in General
     39.1.1 The Human Motor System
39.2 Human Motor Control
     39.2.1 Motion Planning and Execution in Humans
     39.2.2 Coordinate Systems Used to Acquire a New Internal Model
     39.2.3 Spatial Accuracy and Reproducibility
39.3 Modeling the Motor System - Internal Motor Models
     39.3.1 Forward Models, Inverse Models, and Combined Models
     39.3.2 Adaptive Control Theory
     39.3.3 Optimization Principles
     39.3.4 Kinematic Features of Human Hand Movements and the Minimum Jerk Hypothesis
     39.3.5 The Minimum Jerk Model, The Target Switching Paradigm, and Writing-like Sequence Movements
39.4 Sensorimotor Learning
     39.4.1 Explicit Versus Implicit Motor Learning
     39.4.2 Time Phases in Motor Learning
     39.4.3 Effector Dependency
     39.4.4 Coarticulation
     39.4.5 Movement Cuing
39.5 MRI and the Motor System - Structure and Function
39.6 Electrocorticographic Motor Cortical Surface Potentials
39.7 MEG and EEG - Extra Cerebral Magnetic and Electric Fields of the Motor System
     39.7.1 Sensorimotor Rhythms and Other Surrounding Oscillations
     39.7.2 Movement-Related Potentials
     39.7.3 Decoding Hand Movements from EEG
39.8 Extracellular Recording - Decoding Hand Movements from Spikes and Local Field Potential
     39.8.1 Neural Coding Schemes
     39.8.2 Single Unit Activity Correlates of Hand Motion Attributes
     39.8.3 Local Field Potential Correlates of Hand Motion Attributes
39.9 Translating Brainwaves into Control Signals - BCIs
     39.9.1 Pre-Processing and Feature Extraction/Selection
     39.9.2 Classification
     39.9.3 Unsupervised Adaptation in Sensorimotor Rhythms BCIs
     39.9.4 BCI Outlook
39.10 Conclusion
References


39.1 Overview — Neuroengineering in General 


Neuroengineering is defined as the interdisciplinary 
field of engineering and computational approaches to 
problems in basic and clinical neurosciences. Thus, ed- 
ucation and research in neuroengineering encompasses 
the fields of engineering, mathematics, and computer 
science on the one hand, and molecular, cellular, and 
systems neurosciences on the other. Prominent goals in 
the field include restoration and augmentation of human 
functions via direct interactions between the nervous 
system and artificial devices. Much current research is 
focused on understanding the coding and processing of 
information in the sensory and motor systems, quanti- 
fying how this processing is altered in the pathological 
state, and how it can be manipulated through interac- 
tions with artificial devices, including brain-computer 
interfaces (BCIs) and neuroprostheses. 

Although many topics can be covered under the neuroengineering umbrella, this chapter does not aim to cover them all. The focus is on providing a comprehensive overview of the state of the art in technologies and knowledge surrounding the human motor system.
The motor system is extremely complex in terms of 
the functions it performs and the structure underlying 
the control it provides; however, there is an ever- 
increasing body of knowledge on how it works and 
is controlled. This has been facilitated by studies of 
animal models, computational models, electrophysiol- 
ogy, and neuroimaging of humans. Another important 
development that is extending the boundaries of our 
knowledge about the motor system is the development 
of brain—computer interface (BCI) technologies that in- 
volve intentional modulation of sensorimotor activity 
through executed as well as imagined movement (motor 
imagery). BCI research not only opens up a framework 
for non-muscular communication between humans and 
computers/machines but offers experimental paradigms 
for understanding the neuroscience of motor control, 
testing hypotheses, and gaining detailed insight into
motor control from the activity of a single neuron,
a small population of neurons, networks of neurons, 
and the spatial and spectral relationship across multiple 
brain regions and networks. This knowledge will un- 
doubtedly lead to better diagnostics for motor-related 
pathologies, better BCIs for assistance and alterna- 
tive non-muscular communication applications for the 
physically impaired, better rehabilitation for those ca- 
pable of regaining lost motor function, and a better 
understanding of the brain as a whole. 

Relevant to the context and scope of this handbook, BCI research will contribute to better insight into information processing in the brain, resulting in better, more intelligent computational approaches to developing truly intelligent systems: systems that perceive, reason, and act autonomously.

The motor system is often considered to be at the 
heart of human intelligence. From the motor chauvin- 
ist’s point of view the entire purpose of the brain is to 
produce movement [39.1]. This assertion is based on the 
following observations about movement: 


1. Interaction with the world is only achieved through 
movement. 

2. All communication is mediated via the motor sys- 
tem including speech, sign language, gesture, and 
writing. 

3. All sensory and cognitive processes can be consid- 
ered inputs that determine future motor outputs. 


Neuroscientists and researchers focusing on other areas and functions of the brain may refute this suggestion, given that many regions related to general intelligence are located throughout the brain and that a single intelligence center is unlikely. No single neuroanatomical structure determines general intelligence, and different types of brain designs can produce equivalent intellectual performance [39.3]. Nevertheless, there is no doubt that the motor system is critical to the advancement of human-level intelligence and, therefore, in the context of computational intelligence, this chapter focuses on reviewing studies and methodologies that elucidate some of what we know about sensorimotor systems and how these can be studied. Although the aim of the chapter is not to provide an exhaustive review of the extensive literature available, it does aim to provide insights into key findings using some of the state-of-the-art experimental and methodological approaches deployed in neuroscience and neuroengineering, while at the same time reviewing methodology that may lead to the development of practical BCIs. BCIs have revealed new ways of studying how the brain learns and adapts, which in turn have helped improve BCI designs and led to better computational intelligence for adapting the signal processing to the adaptation regime of the brain. One of the key findings in BCI research is that BCI use can trigger plastic changes in different brain areas, suggesting that the brain has even greater flexibility than previously thought [39.4]. These findings can only serve to improve our understanding of how the brain, the most sophisticated and complex organ in the known universe, functions, undoubtedly leading to better computational systems.


Fig. 39.1 Major components of the motor system (after [39.2], courtesy of Shadmehr). Labeled components: posterior parietal cortex (PPC; transforming visual cues into plans for voluntary movements), motor cortex (initiating and directing voluntary movements), brainstem centers (postural control), thalamus, basal ganglia (learning movements, motivation of movements, initiating movements), cerebellum (learning movements and coordination), spinal cord (reflex coordination), motor neurons, and skeletal muscles.


39.1.1 The Human Motor System 


The human motor system produces action. It controls goal-directed movement by selecting the targets of action, generating a motor plan, and coordinating the generation of forces needed to achieve those objectives. Genes encode a great deal of the information required by the motor system, especially for actions involving locomotion, orientation, exploration, ingestion, defence, aggression, and reproduction, but every individual must learn and remember a great deal of motor information during his or her lifetime. Some of that information rises to conscious awareness, but much of it does not. Here we will focus on the motor system of humans, drawing on information from primates and other mammals as necessary.

Fig. 39.2 Divisions of the spinal cord (after [39.5], courtesy of Shadmehr and McDonald). The figure marks the nerves and vertebrae numbers of the cervical, thoracic, lumbar, and sacral divisions and annotates them, from rostral to caudal, with: breathing and head and neck movement; heart rate and shoulder movement; wrist and elbow movement; hand and finger movement; sympathetic tone (including temperature regulation and trunk stability); ejaculation and hip motion; knee extension; foot motion and knee flexion; penile erection and bowel and bladder activity.


Major Components of the Motor System 

The central nervous system that vertebrates have 
evolved comprises six major components: the spinal 
cord, medulla, pons, midbrain, diencephalon and telen- 
cephalon, the last five of which compose the brain. In 
a different grouping, the hindbrain (medulla and pons), 
the midbrain, and the forebrain (telencephalon plus di- 
encephalon) constitute the brain. Taken together, the 
midbrain and hindbrain make up the brainstem. All levels of the central nervous system participate in motor control. The simple act of reaching to pick up a cup of coffee illustrates the function of the various components of the motor system (Fig. 39.1):


• The parietal cortex: Processes visual and proprioceptive information to compute the location of the cup with respect to the hand, and sends this information to the motor cortex.

• The motor cortex: Using the information regarding the location of the cup with respect to the hand, it computes the forces that are necessary to move the arm. This computation results in commands that are sent to the brainstem and the spinal cord.

• The brainstem motor center: Sends commands to the spinal cord that will maintain the body's balance during the reaching movement.

• The spinal cord: Motor neurons send the commands received from the motor cortex and the brainstem to the muscles. During the movement, sensory information from the limb is acquired and transmitted back to the cortex. Reflex pathways ensure stability of the limb.

• The cerebellum: This center is important for coordination of multi-joint movements, learning of movements, and maintenance of postural stability.

• The basal ganglia: This center is important for learning of movements, stability of movements, initiation of movements, and the emotional and motivational aspects of movements.

• The thalamus: May be thought of as a kind of switchboard of information. It acts as a relay between a variety of subcortical areas and the cerebral cortex, although recent studies suggest that thalamic function is more selective. The neuronal information processes necessary for motor control are proposed as a network involving the thalamus as a subcortical motor center. The nature of the connections from the cerebellum to the multiple motor cortices suggests that the thalamus fulfills a key function in providing the specific channels from the basal ganglia and cerebellum to the cortical motor areas.

Fig. 39.3 A spinal segment (after [39.5], courtesy of Shadmehr and McDonald). Labels include: to brain, from brain, descending tract of axons, ascending tract of axons, gray matter, dorsal root ganglion, dorsal surface, and neuronal cell body.


The spinal cord comprises four major divisions. 
From rostral to caudal, these are called: cervical, tho- 
racic, lumbar, and sacral (Fig. 39.2). Cervix is the Latin 
word for neck. The cervical spinal segments intervene 
between the pectoral (or shoulder) girdle and the skull. 
Thorax means chest (or breast plate). Lumbar refers to 
the loins. Sacral, the most intriguing name of all, refers
to some sort of sacred bone.

In mammals, the cervical spinal cord has 8 segments; the thoracic spinal cord has 12, and the lumbar and sacral cords both have 5. The parts of the spinal cord that receive inputs from and control the muscles of the arms (more generally, forelimbs) and legs (more generally, hind limbs) show enlargements associated with an increasing number and size of neurons and fibers: the cervical enlargement for the arms and the lumbar enlargement for the legs. Each segment is labeled and numbered according to its order, from rostral to caudal, within each general region of spinal cord. Thus, the first cervical segment is abbreviated C1 and together the eight cervical segments can be designated as C1-C8.
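
As a small illustration of this labeling convention, the following Python sketch enumerates the segment labels from the counts given above. The prefixes T, L, and S for the thoracic, lumbar, and sacral regions follow common anatomical usage and are an assumption here, since the text only spells out the cervical case; the names `SEGMENT_COUNTS`, `segment_labels`, and `region_range` are hypothetical.

```python
# Segment counts per region as stated in the text, listed from rostral to caudal.
# The single-letter prefixes other than "C" are assumed from common anatomical usage.
SEGMENT_COUNTS = [
    ("cervical", "C", 8),
    ("thoracic", "T", 12),
    ("lumbar", "L", 5),
    ("sacral", "S", 5),
]


def segment_labels():
    """Return all segment labels in rostral-to-caudal order, e.g. C1, C2, ..., S5."""
    labels = []
    for _region, prefix, count in SEGMENT_COUNTS:
        labels.extend(f"{prefix}{i}" for i in range(1, count + 1))
    return labels


def region_range(prefix):
    """Return the compact designation of a whole region, e.g. 'C1-C8'."""
    for _region, p, count in SEGMENT_COUNTS:
        if p == prefix:
            return f"{p}1-{p}{count}"
    raise KeyError(prefix)


if __name__ == "__main__":
    print(segment_labels())
    print(region_range("C"))
```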

In each spinal segment, one finds a ring of white 
matter (WM) surrounding a central core of gray matter 
(GM) (Fig. 39.3). White matter is so called because the 
high concentration of myelin in the fiber pathways gives 
it a lighter, shiny appearance relative to regions with 
many cell bodies. The spinal gray matter bulges at the 
dorsal and ventral surfaces to form the dorsal horn and 
ventral horn, respectively. 

The cord has two major systems of neurons: de- 
scending and ascending. In the descending group, the 
neurons control both smooth muscles of the inter- 
nal organs and the striated muscles attached to our 
bones. The descending pathway begins in the brain, 
which sends electrical signals to specific segments in 
the cord. Motor neurons in those segments then con- 
vey the impulses towards their destinations outside the 
cord. 

The ascending group comprises the sensory pathways, which send sensory signals received from the limbs, muscles, and organs to specific segments of the cord and to the brain. These signals originate with specialized transducer cells, such as cells in the skin that detect pressure. The cell bodies of the neurons are in a gray, butterfly-shaped region of the cord (gray matter). The ascending and descending axon fibers


39.2 Human Motor Control 


Motor control is a complex process that involves the 
brain, muscles, limbs, and often external objects. It 
underlies motion, balance, stability, coordination, and 
our interaction with others and technology. The general 
mission of the human motor control research field is to 
understand the physiology of normal human voluntary 
movement and the pathophysiology of different move- 
ment disorders. Some of the opening questions include: 
how do we select our actions of the many actions possi- 
ble? How are these behaviors sequenced for appropriate 
order and timing between them? How does perception 
integrate with motor control? And how are perceptual- 
motor skills acquired? In the following section the basic 
aspects of motor control — motor planning and motor 
execution are presented. 


39.2.1 Motion Planning 
and Execution in Humans 


Human goal-directed arm movements are fast, accurate, 
and can compensate for various dynamic loads exerted 
by the environment. These movements exhibit remark- 
able invariant properties, although a motor goal can 
be achieved using different combinations of elementary 
movements. 

Models of goal-directed human arm movements 
can be divided into two major groups: feedback and 
feed-forward. Feedback schemes for motion planning 
assume that the motion is generated through a feedback 
control law, whereas feed-forward schemes of trajec- 
tory formation propose that the movement is planned 
in advance and then executed. While a comprehensive 
model of human arm movements should include feed- 
forward as well as feedback control mechanisms, pure 
feedback control mechanisms cannot account for the 
fast and smooth movements performed by adult hu- 
mans. Although none of the existing models are able 
to account for all the characteristics of human mo- 
tion, there is compelling evidence that mechanisms for 
feed-forward motion planning exist within the central 
nervous system (CNS). A further supporting argument 
for the existence of a pre-planned trajectory is that visu- 


travel in a surrounding area known as the white mat- 
ter. It is called white matter because the axons are 
wrapped in myelin, a white electrically insulating 
material. 


ally directed movements are characterized by relatively 
long reaction times (RT) of 200—500 ms [39.6], which 
are supposed to reflect the time needed to plan an ad- 
equate movement. A partial knowledge of either the 
amplitude or the direction of the upcoming movement 
can significantly reduce the RT. 


39.2.2 Coordinate Systems Used to Acquire 
a New Internal Model 


In reaching for objects around us, neural processing 
transforms visuospatial information about target loca- 
tion into motor commands to specify muscle forces 
and joint motions that are involved in moving the 
hand to the desired location [39.7]. In planar reaching 
movements, extent and direction have different vari- 
able errors, suggesting that the CNS plans the move- 
ment amplitude and direction independently and that 
the hand paths are initially planned in vectorial co- 
ordinates without taking into account joint motions. 
In this framework, the movement vector is specified 
as an extent and direction from the initial hand posi- 
tion. Kinematic accuracy depends on learning a scaling 
factor from errors in extent and reference axes from 
errors in direction, and the learning of new reference 
axes shows limited generalization [39.8]. Altogether, 
these findings suggest that motor planning takes place 
in extrinsic, hand-centered, visually perceived coordinates.
Ultimately, however, vectorial information needs
to be converted into muscle forces for the desired
movement to be produced. This transformation needs
to take into account the biomechanical properties of
the moving arm, notably the interaction torques produced
at all the joints by the motion of all limb
segments. For multi-joint arms, there are significant
inertial dynamic interactions between the moving skeletal
segments, and several muscles pull across more
than one joint. These complexities raise difficult
control problems, since the CNS must solve, or find a way
around, the inverse dynamics problem. The capacity
to anticipate the dynamic effects is understood
to depend on learning internal models of musculoskeletal
dynamics and other forces acting on the limb.

The equilibrium trajectory hypothesis for multi- 
joint arm motions [39.9] circumvented the complex 
dynamic problem mentioned above by using the spring- 
like properties of muscles and stating that multi-joint 
arm movements are generated by gradually shifting 
the hand equilibrium positions defined by the neuro- 
muscular activity. The magnitude of the force exerted 
on the arm, at any time, depends on the difference 
between the actual and equilibrium hand positions 
and the stiffness and viscosity about the equilibrium 
position. 
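
The dependence of the restoring force on position error, stiffness, and viscosity can be illustrated with a small numerical sketch. It assumes a linear spring-damper form, F = K (x_eq - x) - B v, which is a common simplification and not necessarily the exact formulation of [39.9]; the function name and the stiffness and viscosity values are hypothetical.

```python
import numpy as np

def restoring_force(x, v, x_eq, K, B):
    """Spring-damper force pulling the hand toward the equilibrium position.

    Assumed linear form: F = K (x_eq - x) - B v.
    """
    x, v, x_eq = (np.asarray(a, dtype=float) for a in (x, v, x_eq))
    return K @ (x_eq - x) - B @ v

# Hypothetical hand stiffness (N/m) and viscosity (N*s/m), for illustration only.
K = np.array([[300.0, 0.0], [0.0, 200.0]])
B = np.array([[20.0, 0.0], [0.0, 15.0]])

# Hand at rest, 2 cm short of the equilibrium position along x.
force = restoring_force(x=[0.10, 0.30], v=[0.0, 0.0], x_eq=[0.12, 0.30], K=K, B=B)
```

Shifting `x_eq` gradually along a path while the hand lags behind it produces a force that always points toward the moving equilibrium, which is the essence of the hypothesis described above.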

Neuropsychological studies indicate that, for the primary
motor (M1) region, the representations that mediate
motor behavior are distributed, often in a graded
manner, across extensive, overlapping cortical regions,
so that different memory systems can underlie different
coordinate systems, which are used at different hierarchical
levels.


39.2.3 Spatial Accuracy and Reproducibility 


Our ability to generate accurate and appropriate mo- 
tor behavior relies on tailoring our motor commands 
to the prevailing movement context. This context em- 
bodies parameters of both our own motor system, such 
as the level of muscle fatigue, and the outside world, 
such as the weight of a bottle to be lifted. As the con- 
sequence of a given motor command depends on the 
current context, the CNS has to estimate this context 
so that the motor commands can be appropriately adjusted
to attain accurate control. The current context can
be estimated by integrating two sources of information:
sensory feedback and knowledge about how the context
is likely to have changed from the previous estimate. In
the absence of sensory feedback about the context, the 
CNS is able to extrapolate the likely evolution of the 
context without requiring awareness that the context is 
changing [39.10]. 
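
One way to make this integration of prediction and sensory feedback concrete is a scalar Kalman-style update. This is only a sketch under assumed Gaussian uncertainty; the text does not state that the CNS uses this particular estimator, and the function name, parameters, and numbers below are hypothetical.

```python
def update_context(estimate, variance, expected_change, process_var,
                   observation=None, obs_var=None):
    """Return an updated (estimate, variance) of a scalar context variable."""
    # Prediction: extrapolate how the context is likely to have changed.
    predicted = estimate + expected_change
    predicted_var = variance + process_var
    if observation is None:
        # No sensory feedback: rely on the extrapolation alone.
        return predicted, predicted_var
    # Correction: weight the sensory feedback by its relative reliability.
    gain = predicted_var / (predicted_var + obs_var)
    corrected = predicted + gain * (observation - predicted)
    return corrected, (1.0 - gain) * predicted_var

# Hypothetical example: estimated weight (kg) of a bottle that is being emptied.
no_feedback = update_context(1.0, 0.01, expected_change=-0.1, process_var=0.01)
with_feedback = update_context(1.0, 0.01, expected_change=-0.1, process_var=0.01,
                               observation=0.8, obs_var=0.02)
```

Without an observation the estimate still changes, but its variance grows; with an observation the estimate moves toward the sensed value and its variance shrinks.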

Although the CNS tries to maximize our motion 
accuracy, systematic directional errors are still found. 
These errors may result from a number of sources.
One cause of inaccuracy is visual distortion,
which could be the outcome of eye fatigue
or inherent optical distortion. A second cause
could be imperfect control processes due to noise
in the neuromuscular system or blood flow pulsations,
which cause our movements to be jerky. A third cause
could be that each movement we make, consciously
or unconsciously, may involve a different motor plan,
resulting in slightly different trajectories and end-point
accuracies.

In a simple aiming movement, the task is to minimize
the final error, as measured by the variance about
the target. The endpoint variability has an ellipsoid
shape with two main axes perpendicular to one another.
This finding led to the vectorial planning hypothesis,
which states that planning of visually guided reaches is
accomplished by independent specification of extent and
direction [39.8]. It was later suggested that the aim of
the optimal control strategy is to minimize the volume
of the ellipsoid, thereby being as accurate as possible.
Non-smooth movements require increased motor
commands, which generate increased noise; smoothness
thereby leads to increased end-point accuracy but
is not a goal in its own right. Although the end-point-error
cost function specifies the optimal movement, how one
approaches this optimum for novel, unrehearsed movements
is an open question.


39.3 Modeling the Motor System - Internal Motor Models 


An internal model is a postulated neural process that 
simulates the response of the motor system in order to 
estimate the outcome of a motor command. The inter- 
nal model theory of motor control argues that the motor 
system is controlled by the constant interactions of the 
plant and the controller. The plant is the body part being 
controlled, while the internal model itself is considered 
part of the controller. Information from the controller, 
such as information from the CNS, feedback informa- 
tion, and the efference copy, is sent to the plant which 
moves accordingly. 


Internal models can be controlled through either 
feed-forward or feedback control. Feed-forward control 
computes its input into a system using only the cur- 
rent state and its model of the system. It does not use 
feedback, so it cannot correct for errors in its control. 
In feedback control, some of the output of the system 
can be fed back into the system’s input, and the sys- 
tem is then able to make adjustments or compensate 
for deviations from its desired output. Two primary types
of internal models have been proposed: forward models
and inverse models. In simulations, models can be
combined to solve more complex movement
tasks.
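
The difference between the two control schemes can be seen in a toy simulation. The sketch below assumes a first-order plant, x' = u + d, with an unmodeled constant disturbance d; the plant, the proportional gain, and all names are illustrative assumptions rather than a model taken from the chapter.

```python
TARGET = 1.0      # desired final position
DURATION = 1.0    # planned movement time in seconds

def simulate(controller, disturbance, dt=0.01, steps=200):
    """Integrate the assumed plant x' = u + disturbance, starting from x = 0."""
    x = 0.0
    for i in range(steps):
        u = controller(i * dt, x)
        x += dt * (u + disturbance)
    return x

def feedforward(t, x):
    # Command planned in advance from the disturbance-free model; x is ignored.
    return TARGET / DURATION if t < DURATION else 0.0

def feedback(t, x, gain=10.0):
    # Command computed from the currently observed error.
    return gain * (TARGET - x)

d = 0.2  # disturbance unknown to both controllers
ff_error = simulate(feedforward, d) - TARGET
fb_error = simulate(feedback, d) - TARGET
```

With no disturbance the feed-forward plan reaches the target; with the disturbance its error accumulates uncorrected, whereas the feedback controller keeps the error small.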

The following sections elaborate on the two types of internal
model, introduce optimization principles and their use
in modeling human motor behavior, and
present a well-established motor control model for
2-D volitional hand movement.


39.3.1 Forward Models, Inverse Models, 
and Combined Models 


In their simplest form, forward models take the input 
of a motor command to the plant and output a pre- 
dicted position of the body. The motor command input 
to the forward model can be an efference copy. The out- 
put from that forward model, the predicted position of 
the body, is then compared with the actual position of 
the body. The actual and predicted position of the body 
may differ due to noise introduced into the system by 
either internal (e.g., body sensors are not perfect, sen- 
sory noise) or external (e.g., unpredictable forces from 
outside the body) sources. If the actual and predicted 
body positions differ, the difference can be fed back as 
an input into the entire system again so that an adjusted 
set of motor commands can be formed to create a more 
accurate movement. 

Inverse models use the desired and actual position 
of the body as inputs to estimate the necessary motor 
commands that would transform the current position 
into the desired one. For example, in an arm reaching 
task, the desired position (or a trajectory of consecutive 
positions) of the arm is input into the postulated inverse 
model, and the inverse model generates the motor com- 
mands needed to control the arm and bring it into this 
desired configuration. 

Theoretical work has shown that in models of motor 
control, when inverse models are used in combination 
with a forward model, the efference copy of the motor 
command output from the inverse model can be used 
as an input to a forward model for further predictions. 
For example if, in addition to reaching with the arm, the 
hand must be controlled to grab an object, an efference 
copy of the arm motor command can be input into a for- 
ward model to estimate the arm’s predicted trajectory. 
With this information, the controller can then generate 
the appropriate motor command telling the hand to grab 
the object. It has been proposed that if they exist, this 
combination of inverse and forward models would al- 
low the CNS to take a desired action (reach with the 
arm), accurately control the reach, and then accurately 
control the hand to grip an object. 
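
The interplay of the two model types can be sketched for a one-dimensional reach. The example assumes a deliberately simple plant in which a command shifts the position by `gain * command`, and an internal model whose gain (1.0) is slightly wrong relative to the true plant (0.8); the class names, the plant, and the numbers are hypothetical.

```python
class ForwardModel:
    """Predicts the next position from the current position and a command."""
    def __init__(self, gain):
        self.gain = gain

    def predict(self, position, command):
        return position + self.gain * command

class InverseModel:
    """Estimates the command that moves the current position to the desired one."""
    def __init__(self, gain):
        self.gain = gain

    def command(self, position, desired):
        return (desired - position) / self.gain

def plant(position, command, true_gain=0.8):
    # The real limb responds less strongly than the internal model assumes.
    return position + true_gain * command

def reach(start, goal, model_gain=1.0, steps=5):
    forward, inverse = ForwardModel(model_gain), InverseModel(model_gain)
    position, prediction_errors = start, []
    for _ in range(steps):
        u = inverse.command(position, goal)       # motor command
        predicted = forward.predict(position, u)  # prediction from the efference copy
        position = plant(position, u)             # actual outcome
        prediction_errors.append(position - predicted)
        # The next command is computed from the actual position, so the
        # mismatch is fed back into the loop.
    return position, prediction_errors

final_position, errors = reach(start=0.0, goal=1.0)
```

Each pass shrinks the remaining distance, and the prediction error shrinks with it, even though the internal model is never exactly right.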


39.3.2 Adaptive Control Theory 


With the assumption that new models can be acquired 
and pre-existing models can be updated, the efference 
copy is important for the adaptive control of a move- 
ment task. Throughout the duration of a motor task, an 
efference copy is fed into a forward model known as 
a dynamics predictor whose output allows prediction of 
the motor output. When applying adaptive control the- 
ory techniques to motor control, the efference copy is 
used in indirect control schemes as the input to the ref- 
erence model. 


39.3.3 Optimization Principles 


Optimization theory is a valuable integrative and predictive
tool for studying the interaction between the
many complex factors that result in the generation of
goal-directed motor behavior. It provides a convenient 
way to formulate a model of the underlying neural com- 
putations without requiring specific details on the way 
those computations are carried out. The components of 
optimization problems are: a task goal (defined mathe- 
matically by a performance criterion or a cost function), 
a system to be controlled (a set of system variables that 
are available for modulation), and an algorithm capa- 
ble of finding an analytical or a numerical solution. By 
rephrasing the learning problem within the framework 
of an optimization problem, one is forced to make ex- 
plicit, quantitative hypotheses about the goals of motor 
actions and to articulate how these goals relate to ob- 
servable behavior. 

As indicated in the last section, goal-directed arm 
movements exhibit remarkable invariant properties de- 
spite the fact that a given point in space can be reached 
through an infinite number of spatial, articular, and 
muscle combinations. In order to account for this ob- 
servation it is necessary to postulate the existence of 
a regulator, i.e., a functional constraint, to reduce the 
number of degrees of freedom available to perform the 
task. Most of the regulators proposed during the last
decade refer to a general hypothesis that the nervous
system tries to minimize some cost related to the movement
performance. Nelson [39.11] first formulated this
idea in an operative way by proposing to use mathematical
cost functions to estimate the energy or other costs
consumed during a movement. This approach was further
developed by several investigators who proposed
different criteria such as, for instance, minimum muscular
energy, minimum effort, minimum torque, minimum
work, or minimum variance. A model that is indisputably
one of the most mentioned in the literature and
that has proven to be very powerful in describing multi-joint
movements is the minimum jerk model described
in the next section.


39.3.4 Kinematic Features 
of Human Hand Movements 
and the Minimum Jerk Hypothesis 


Human point-to-point arm movements that are restricted
to the horizontal plane tend to be straight,
smooth, with single-peaked, bell-shaped velocity pro- 
files and are invariant with respect to rotation, transla- 
tion, and spatial or temporal scaling. Motor adaptation 
studies in which unexpected static loads or velocity- 
dependent force fields were applied during horizontal 
reaching movements further supported the hypothesis 
that arm trajectories follow a kinematic plan formu- 
lated in an extrinsic Cartesian task-space. The mor- 
phological invariance of the movement in Cartesian 
space supported the hypothesis that the hand trajec- 
tory in task-space is the primary variable computed 
during movement planning. It is assumed that follow- 
ing the planning process, the CNS performs non-linear 
inverse kinematics computations, which convert time 
sequences of hand position into time sequences of joint 
positions. 


The kinematic features of one-joint goal-directed
movements were successfully modeled by the minimum
jerk hypothesis [39.13], which was later extended
to planar hand motion [39.12]. The minimum jerk
model states that the time integral of the squared magnitude
of the first derivative of the Cartesian hand
acceleration (jerk) is minimized,

$$ \int_0^T \left\| \frac{\mathrm{d}^3 \mathbf{r}}{\mathrm{d}t^3} \right\|^2 \mathrm{d}t \,, \tag{39.1} $$

where r(t) = (x(t), y(t)) are the Cartesian coordinates
of the hand and T is the movement duration. The solution
of this variational problem, assuming zero velocity
and zero acceleration at the initial and final hand locations
r_i, r_f, is given by

$$ \mathbf{r}(t) = \mathbf{r}_i + \left( 10 \left(\frac{t}{T}\right)^3 - 15 \left(\frac{t}{T}\right)^4 + 6 \left(\frac{t}{T}\right)^5 \right) (\mathbf{r}_f - \mathbf{r}_i) \,. \tag{39.2} $$

The experimental setup and the comparison between
experimental data and the minimum-jerk model prediction
for hand paths, tangential velocities, and acceleration
components between different targets are depicted
in Fig. 39.4.

Fig. 39.4a,b Overlapped predicted (solid lines) and measured (dashed lines) hand paths (a1, b1), speeds (a2, b2), and acceleration
components along the y-axis (a3, b3) and along the x-axis (d) for two unconstrained point-to-point movements. (a) A movement
between targets 3 and 6. (b) A movement between targets 1 and 4 (after [39.12], courtesy of Flash). Horizontal axes of the speed and acceleration plots: Time (x 10 ms).
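
The closed-form minimum-jerk trajectory of (39.2) is straightforward to evaluate numerically. The following sketch samples the position and velocity for a single point-to-point movement; the function name, start and end points, and duration are illustrative assumptions.

```python
import numpy as np

def minimum_jerk(r_i, r_f, T, n=101):
    """Sample the minimum-jerk position and velocity between r_i and r_f."""
    r_i = np.asarray(r_i, dtype=float)
    r_f = np.asarray(r_f, dtype=float)
    t = np.linspace(0.0, T, n)
    tau = t / T                                              # normalized time
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5               # position profile
    ds = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T       # its time derivative
    position = r_i + s[:, None] * (r_f - r_i)
    velocity = ds[:, None] * (r_f - r_i)
    return t, position, velocity

# Hypothetical 0.8 s reach in the horizontal plane (coordinates in meters).
t, position, velocity = minimum_jerk([0.0, 0.0], [0.2, 0.1], T=0.8)
speed = np.linalg.norm(velocity, axis=1)
```

The resulting path is a straight line and the speed profile is a single symmetric bell that peaks at mid-movement with value 1.875 D / T, where D is the distance between the two points.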


39.3.5 The Minimum Jerk Model, 
The Target Switching Paradigm, 
and Writing-like Sequence 
Movements 


The stereotyped kinematic patterns of planar reaching 
movements are not the expression of a pre-wired or 
inborn motor pattern, but the result of learning during 
ontogenesis. When infants start to reach, their reaching 
is characterized by multiple accelerations and decelera- 
tions of the hand, while experienced infants reach with 
much straighter hand paths and with a single smooth 
acceleration and deceleration of the hand. It is possi- 
ble to decompose a large proportion of infant reaches 
into an underlying sequence of sub-movements that re- 
semble simple movements of adults. It is now believed 
that the CNS uses small, smooth sub-movements, com- 
monly known as motion primitives, which are smoothly 
concatenated in time and space, in order to construct 
more complicated trajectories. Motor primitives can be 
considered neural control modules that can be flexibly 
combined to generate a large repertoire of behaviors. 
A primitive may represent the temporal profile of a par- 
ticular muscle activity (low level, dynamic intrinsic 
primitive) or a geometrical shape in visually perceived 
Cartesian coordinates (high level, kinematic extrinsic 
primitive [39.14, 15]). The overall motor output will 
be the sum of all primitives, weighted by the level of 
activations of each module. A behavior for which the 
motor system has many primitives will be easy to learn, 
whereas a behavior that cannot be approximated by any 
set of primitives would be impossible to learn [39.16]. 
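
The idea of a motor output built as a weighted sum of primitives can be sketched with bell-shaped sub-movements that overlap in time. The example assumes that each kinematic primitive has a minimum-jerk speed profile and that the activations are scalar weights; both are illustrative choices, and all names and numbers are hypothetical.

```python
import numpy as np

def bell_speed(t, onset, duration, amplitude):
    """Minimum-jerk speed profile of one sub-movement (zero outside its span)."""
    tau = np.clip((t - onset) / duration, 0.0, 1.0)
    return amplitude * (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration

def combine(t, primitives):
    """Sum the primitives, each weighted by its activation level."""
    return sum(w * bell_speed(t, onset, dur, amp)
               for w, onset, dur, amp in primitives)

t = np.linspace(0.0, 1.5, 1501)
dt = t[1] - t[0]

# (activation weight, onset in s, duration in s, amplitude in m)
primitives = [(1.0, 0.0, 0.6, 0.10),
              (0.5, 0.4, 0.8, 0.08)]
speed = combine(t, primitives)
distance = float(np.sum(speed) * dt)   # total distance covered
```

Because the second sub-movement starts before the first has ended, the summed profile is smooth rather than a sequence of separate stops, and the distance covered equals the weighted sum of the primitive amplitudes.
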
The biological plausibility of the primitive modules
model was demonstrated in studies on spinalized
frogs and rats that showed that the pre-motor circuits
within the spinal cord are organized into a set of discrete
modules [39.17]. Each module, when activated, induces
a specific force field, and the simultaneous activation
of multiple modules leads to the vectorial combination
of the corresponding fields. Other evidence for the
existence of primitive sub-movements came from work
on hemiplegic stroke patients, which showed that the
patients' first movements were clearly segmented and
exhibited a remarkably invariant speed vs. time profile.

The concept of superposition was further elaborated
and modeled for target switching experiments [39.18].
It was found that arm trajectory modification in a double
target displacement paradigm involves the vectorial
summation of two independent plans, each coding for
a maximally smooth point-to-point trajectory. The first
plan is the initial unmodified plan for moving between
the initial hand position and the first target location.
The second plan is a time-shifted trajectory plan that
starts and ends at rest and has the same amplitude and
kinematic form as a simple point-to-point movement
between the first and second target locations.

The minimum jerk model is also a powerful model
for predicting the generated trajectory when subjects
are instructed to generate continuous movements from
one target to another through an intermediate target.
It was also shown that, using the minimum jerk
model, human handwriting properties can be faithfully
reconstructed while specifying the velocities and the
positions at via-points, taken at maximum curvature
locations.

Understanding primitives may only be achieved
by investigating the neural correlates of sensorimotor
learning and control. We already know a lot about
the neural correlates of motor imagery and execution,
as highlighted in Sects. 39.6 and 39.7, which may
provide a good starting point to investigate motion
primitives, but we will have to go beyond basic correlates
to understand the time-dependent, non-linear
relationship among various neural correlates of motor
learning and control. This will involve new experimental
paradigms and computational methods. The
following section overviews investigations into sensorimotor
learning.


39.4 Sensorimotor Learning


Motion planning strategies may also change with learning.
If a task is performed for the first time, the only
strategy the CNS might follow is to develop a plan
that allows the execution of the task without taking
into account the computational cost. Repeated performance
might result in a change in the coding of the
movement and produce a more optimal behavior at
a lower computational cost. Thus practice, the track
to perfection, allows the performance of many tasks to
improve, throughout life, with repetition.

Even in adulthood, simple tasks such as reaching towards
a target or rapidly and accurately tapping a short
sequence of finger movements, which appear when
mastered to be effortlessly performed, often require extensive
training before skilled performance develops.
Performance gains reach an asymptote after a long training
period and are usually retained for years to come.
Many studies have focused on different aspects of
motor learning: time scale in motor learning and development,
task and effector specificity, the effect of attention
and intention, and explicit vs. implicit motor learning.
These topics are discussed in the following sections in the
context of motor sequence learning.


39.4.1 Explicit Versus Implicit 
Motor Learning 


When considering sequence learning one needs to dis- 
tinguish between explicit and implicit learning. Ex- 
plicit learning is frequently assumed to be similar to 
the processes which operate during conscious problem 
solving, and includes: conscious attempts to construct 
a representation of the task; directed search of mem- 
ory for similar or analogous task relevant information; 
and conscious attempts to derive and test hypotheses 
related to the structure of the task. This type of learn- 
ing has been distinguished from alternative models of 
learning, termed implicit learning. The term implicit 
learning denotes learning phenomena in which more 
or less complex structures are reflected in behavior, al- 
though the learners are unable to verbally describe these 
structures. Numerous studies have examined implicit
learning of serial-order information using the serial
reaction time (SRT) task. In this task, learning is revealed
as a decrease in reaction times for stimuli that follow
a repeating sequence relative to those presented
in random order.
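
As a minimal sketch of how such a learning effect might be quantified, the snippet below takes the difference between mean reaction times in random and in sequence blocks. The scoring function and the reaction times are hypothetical and serve only to illustrate the comparison; published SRT studies use a variety of measures.

```python
from statistics import mean

def srt_learning_score(sequence_rts, random_rts):
    """Mean RT in random blocks minus mean RT in sequence blocks (ms).

    A positive score means responses were faster when the stimuli
    followed the repeating sequence.
    """
    return mean(random_rts) - mean(sequence_rts)

# Hypothetical reaction times in milliseconds, for illustration only.
sequence_rts = [420, 395, 380, 370]
random_rts = [450, 445, 455, 450]
score = srt_learning_score(sequence_rts, random_rts)
```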

There is a vast literature debating what is really
learned in the SRT task. Describing a given sequence
structure is, from a theoretical point of view, not
trivial because a given structure typically has several
different structural components. Implicit learning may
depend on each of these structural components. In sequence
learning tasks these components may pertain to
frequency-based statistical structures (i.e., redundancy),
relational structures, and temporal and spatial patterns.
A literature review shows that all of these components
influence the rate at which a sequence is learned.

Neuropsychological research suggests that implicit 
sequence learning in the SRT task is spared in pa- 
tients with organic amnesia, so implicit SRT learning 
does not appear to depend on the medial temporal 
and diencephalic brain regions that are critical for explicit
memory. Conversely, patients with Huntington's or
Parkinson's disease have consistently shown SRT impairments,
so the basal ganglia seem to be critically
involved in SRT learning. Recent studies indicate that
the anterior striatum affects learning of new sequences
while the posterior striatum is engaged in recalling
a well-learned sequence. In the following sections the
discussion is restricted to explicit motor learning.


39.4.2 Time Phases in Motor Learning 


It is reasonable to assume that a gain in motor performance
reflects a change in brain processing that
is triggered by practice. The fact that many skills,
when acquired, are retained over long time intervals
suggests that training can induce long-lasting neural
changes. Previous results from neuroimaging studies in
which performance was modified over time have shown
that different learning stages can be defined by altered
brain activation patterns. As an effect of repetition or
practice, several studies report that specific brain ar- 
eas showed an increase in the magnitude or extent of 
activation. Motor skill learning (e.g., sequential finger 
opposition tasks) requires prolonged training times and 
has two distinct phases, analogous to those subserving 
perceptual skill learning: an initial, fast improvement 
phase (fast learning) in which the extent of activation 
in the M1 area decreases (habituation-like response) 
and a slowly evolving, post training incremental per- 
formance gain (slow learning), in which the activation 
in M1 increases compared to control conditions [39.19]. 


39.4.3 Effector Dependency 


Another fascinating enigma in the realm of motor learn- 
ing is whether the representation of procedural memory 
in the brain changes throughout training and whether 
different neural correlates underlie the different learning
stages. A study conducted on monkeys, in which
a sequence of ten button presses was learned by trial and
error, showed that the time course of improvement
differed for two performance measures: key press errors and
reaction time (RT) [39.20]. The key press
errors reached an asymptote within a shorter period of
training than the RTs, which continued to decrease
over a longer time period. This finding
suggested that the acquisition of sequence knowledge
(as measured by key press errors) may take place
quickly but long-term motor sequence learning (as measured
by RT) may take longer to be established; thus
different aspects of the task are learned on different time
scales. Further studies on monkeys and humans demonstrated
that although effector-dependent and effector-independent
learning occur simultaneously, effector-dependent
representation might take longer to establish than
effector-independent representation.


39.4.4 Coarticulation 


After a motor sequence is extensively trained, most
subjects undergo implicit or explicit anticipation,
which results in coarticulation: the spatial and temporal
overlap of adjacent articulatory activities. It is
well known that as we learn to speak, our speech be- 
comes smooth and fluent. Coarticulation in speech pro- 
duction is a phenomenon in which the articulator move- 
ments for a given speech sound vary systematically 
with the surrounding sounds and their associated move- 
ments. Several models have tried to predict the move- 
ments of fleshy tissue points on the tongue, lips, and 
jaw during speech production. Coarticulation has also 
been studied in the hand motor sequence. It was shown 
that pianists could anticipate a couple of notes before 
playing, which resulted in hand and finger kinematic 
divergence (assuming an anticipatory position) prior to 
the depression of the last common note. Such a diver- 
gence implies an anticipatory modification of sequen- 
tial movements of the hand, akin to the phenomenon 
of coarticulation in speech. Moreover, studies on fluent
finger spelling have shown that, rather than simply an
interaction whereby a preceding movement affects the
one following, the anticipated movement in a sequence
can systematically affect the one preceding it.


39.4.5 Movement Cuing 


Another important aspect of motor sequence performance
is the type of movement cuing, external or internal.
As internally cued movements are initiated at the subject's
will, they have, by definition, predictable timing. Externally
triggered movements are performed in response to
go signals; hence, they have unpredictable timing (unless
the timing of the go signal is not random and follows
some temporal pattern that can be learned implicitly or
explicitly). Studies on movement cuing in animals and
patients with movement disorders have shown that the
basal ganglia are presumably internal-cue generators and
that they are preferentially connected with the supplementary
motor area (SMA), an area that is concerned
more with internal than with external motor initiation.
In normal subjects the type of movement cuing influences
movement execution and performance. It has also
been shown that teaching Parkinson's disease (PD) patients,
who are impaired with respect to tasks involving the
spontaneous generation of appropriate strategies, to initiate
movements concurrently with an external cue improved
their motor performance.

The preceding sections have provided a brief 
overview of the extensive literature available on un- 
derstanding the motor system from an experimental 
psychophysics and model-based perspective. A focus
on general high-level modeling is critical to understanding
motor control; however, the problem is also being
tackled from other perspectives, namely understanding
the details of the neurophysiology and electrophysiology of brain
regions and neural pathways involved in controlling
motor function. In the context of developing brain-computer
interfaces there have been significant efforts
focused on understanding small network populations
and structural, functional, and electrophysiological correlates
of motor functions using epidural and subdural
recordings, as well as non-invasively recorded electroencephalography
(EEG), magnetoencephalography
(MEG), and magnetic resonance imaging-based (MRI)
technologies. Understanding the differences between
imagined movement and motor execution, as well as
the effects of movement feedback and no feedback, has
shed light on motor functioning. The following sections
provide a snapshot of some recent findings. 


39.5 MRI and the Motor System - Structure and Function 


A new key phase of research is beginning to investigate 
how functional networks relate to structural networks, 
with emphasis on how distributed brain areas commu- 
nicate with each other [39.21]. Structural methods have 
been powerful in indicating when and where changes 
occur in both gray and white matter with learning and 
recovery [39.22] and disease [39.23]. Here we review 
some of the findings in sensorimotor systems with an
emphasis on elucidating regions engaged in motor execution
and motor imagery (imagined movement), and
motor sequence learning.

Even with identical practice, no two individuals are 
able to reach the same level of performance on a motor 
skill — nor do they follow the same trajectory of im- 
provement as they learn [39.24]. These differences are 
related to brain structure and function, but individual 




differences in structure have rarely been explored. Stud- 
ies have shown individual differences in white matter 
(WM) supporting visuospatial attention, motor cortical 
connectivity through the Corpus callosum, and connec- 
tivity between the motor regions of the cerebellum and 
motor cortex. Steele et al. [39.24] studied the structural 
characteristics of the brain regions that are functionally 
engaged in motor sequence performance along with the 
fiber pathways involved. Using diffusion tensor imaging
(DTI), probabilistic tractography, and voxel-based
morphometry they aimed to determine the structural
correlates of skilled motor performance. DTI is used to
assess white matter integrity and perform probabilistic
tractography.

Fractional anisotropy (FA) is affected by WM 
properties, including axon myelination, diameter, and 
packing density. Differences in these properties may 
lead to individual differences in performance through 
pre-existing differences or training-induced changes in 
axon conduction velocity and synaptic synchronization, 
or density of innervation [39.24, 25]. Greater fiber in- 
tegrity along the superior longitudinal fasciculus (SLF) 
would be consistent with the idea that greater myelina- 
tion observed in relation to performance may underlie 
enhancements in synchronized activity between task- 
relevant regions. 
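
As a minimal sketch of how FA summarizes the diffusion tensor, the snippet below uses the standard textbook definition of FA in terms of the three tensor eigenvalues. This formula is not given in the text above, and the function name and the example eigenvalues are purely illustrative assumptions.

```python
from math import sqrt


def fractional_anisotropy(l1, l2, l3):
    """FA from the three diffusion-tensor eigenvalues (standard definition)."""
    denom = l1 * l1 + l2 * l2 + l3 * l3
    if denom == 0.0:
        return 0.0  # no diffusion at all: define FA as 0
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    return sqrt(0.5 * num / denom)


# Illustrative (made-up) eigenvalues in mm^2/s.
isotropic = fractional_anisotropy(1.0e-3, 1.0e-3, 1.0e-3)   # equal diffusion in all directions
fibre_like = fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3)  # one dominant direction
```

Values near 0 indicate isotropic diffusion and values near 1 indicate strongly directional diffusion, which is why FA is commonly read as an index of fiber integrity.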

Voxel-based morphometry is used to assess gray 
matter (GM) volume. Individual differences in GM vol- 
ume may be influenced by multiple factors such as 
neuronal and glial cell density, synaptic density, vascu- 
lar architecture, and cortical thickness [39.26]. 

The majority of structural studies of individual 
differences find that better performance is associated 
with higher FA or greater GM volume. Individual 
differences in structural measures reflect differences in 
the microstructural organization of tissue related to task 
performance. A greater FA, an index of fiber integrity, 
may represent a greater ability for neurons in connected 
regions to communicate. Steele et al. [39.24] found 
enhanced synchronization performance on a temporal 
motor sequence task related to greater fiber integrity 
of the SLF, where the rate of improvement on synchronization
was positively correlated with GM volume
in cerebellar lobules HVI and V, regions that showed
training-related decreases in activity in the same
sample. The synchronization performance on the task
was negatively correlated with FA in WM underlying 
the bilateral sensorimotor cortex, in particular within 
the bilateral corticospinal tract (CST), such that partic- 
ipants with greater final synchronization performance 
on the tasks had lower FA in these clusters. 


The results provide clear evidence of the impor- 
tance of structure in learning skilled tasks and that 
a larger corticospinal tract does not necessarily mean 
better performance. Enhanced fiber integrity in the SLF 
may result in reduced FA in regions where it crosses 
the CST and, therefore, there is a trade-off between 
the two in the region of the CST-SLF fiber crossing, 
which enables better performance for some motor im- 
agery and BCI participants — and is consistent with the 
idea of enhanced communication/synchronization be- 
tween regions that are functionally important for this 
task. The causes of inter-individual variability in brain 
structure are not fully understood, but are likely to 
include pre-existing genetic contributions and contribu- 
tions from learning and the environment [39.24]. Ullén 
et al. [39.27] attempted to address this by investigating 
whether millisecond variability in a simple, automatic 
timing task, isochronous finger tapping, correlates with 
intellectual performance and, using voxel-based mor- 
phometry, whether these two tasks share neuroanatom- 
ical substrates. Volumes of overlapping right prefrontal 
WM regions were found to be correlated with both sta- 
bility of tapping and intelligence. These results suggest 
a bottom-up explanation where extensive pre-frontal 
connectivity underlies individual differences in both 
variables as opposed to top-down mechanisms such as 
attentional control and cognitive strategies. 

Sensorimotor rhythm modulation is the most pop- 
ular BCI control strategy, yet little is known about the 
structural and functional differences that separate motor 
areas related to motor output from higher-order motor 
control areas or about the functional neural correlates 
of high-order control areas during voluntary motor con- 
trol. EEG and fMRI studies have shown the extent 
of motor regions that are active along with the tem- 
poral sequence of activations across different motor 
areas during a motor task and across different sub- 
jects [39.28, 29]. Ball et al. [39.28] have shown that all 
subjects in an EEG/fMRI study involving finger flex- 
ion had highly activated primary motor cortex areas 
along with activation of the frontal medial wall mo- 
tor areas. They also showed that some subjects had 
anterior type activations as opposed to posterior acti- 
vation for others, with some showing activity starting 
in the anterior cingulate motor area (CMA) and then 
shifting to the intermediate supplementary motor ar- 
eas. The time sequence of these activations was noted 
where it was shown that ~120 ms before movement
onset there was a drop in source strength in conjunction 
with an immediate increase of source strength in the 
M1 area. Those who showed more posterior activations 




Fig. 39.5a-c Brain activation during the motor imagery task (a), the motor observation task (b), and the motor execution task (c), showing mean
activation of all participants (A), of high aptitude users and low aptitude users individually (B), and the contrast between high
aptitude users and low aptitude users (C). The figure illustrates the maximum contrast between low aptitude and high aptitude
BCI users (after [39.29], courtesy of Halder)


were restricted to the posterior SMA. Some subjects 
showed activation of the inferior parietal lobe (IPL) 
during early movement onset. In all subjects showing 
activation of higher-order motor areas (anterior CMA, 
intermediate SMA, IPL), these areas became active be- 
fore the executive motor areas (M1 and posterior SMA). 
A number of these areas are related to attentional pro- 
cessing, others to triggering and others to executing. 
Understanding the sequence of these events for each 
individual in the context of rehabilitation and more ad- 
vanced brain and neural computer interfacing will be 
important. 

The neural mechanisms of brain-computer interface
control were investigated in an fMRI
study [39.29]. It was shown that up to 30 different motor
sites are significantly activated during motor execu- 
tion, motor observation, and motor imagery and that 
the number of activated voxels during motor observa- 
tion was significantly correlated with accuracy in an 
EEG sensorimotor rhythm-based (SMR) BCI task (see 
Sect. 39.7.1 for further details on SMR). Significantly 
higher activations of the supplementary motor areas for 
motor imagery and motor observation tasks were ob- 
served for high aptitude BCI users (see Fig. 39.5 for 


an illustration [39.29]). The results demonstrate that 
acquisition of the sensorimotor program reflected in 
SMR-BCI control (Sect. 39.7.1) is tightly related to the 
recall of such sensorimotor programs during observa- 
tion of movements and unrelated to the actual execution 
of these movement sequences. 

Using such knowledge about sensorimotor control 
will be critical in understanding and developing suc- 
cessful learning and control models for robotic devices 
and BCIs, and fully closing the sensorimotor learn- 
ing loop to enable finer manipulation abilities using 
BCIs and for retraining or enabling better relearn- 
ing of motor actions after cortical damage. Under- 
standing the neuroanatomy involved in motor execu- 
tion/imagery/observation may also provide a means of 
enhancing our knowledge of motion primitives and 
their neural correlates as discussed in Sect. 39.4. MRI
and fMRI, however, only provide part of the picture, at
the level of large networks of neurons and on relatively
large time scales. Invasive electrophysiology, by contrast,
can target specific neuronal networks at millisecond 
time resolution. The following section highlights some 
of the most recent findings from motor cortical surface 
potentials investigations. 




39.6 Electrocorticographic Motor Cortical Surface Potentials 


The electroencephalogram (EEG) is derived from the 
action potentials of millions of neurons firing electri- 
cal pulses simultaneously. The human brain has more 
than 100 000 000 000 (10^11) neurons and thousands of
spikes (electrical pulses) are emitted each millisecond. 
EEG reflects the aggregate activity of millions of in- 
dividual neurons recorded with electrodes positioned 
in a standardized pattern on the scalp. Brainwaves 
are categorized into a number of different frequency 
bands including delta (1–4 Hz), theta (5–8 Hz), alpha
(8–12 Hz), mu (8–12 Hz), beta (13–30 Hz), and
gamma (> 30 Hz). Each of these brain rhythms can
be associated with various brain processes and behavioral
states; however, knowledge of exactly where
brainwaves are generated in the brain, and if/how they
communicate information, is very limited. By studying
brain rhythms and oscillations we attempt to answer 
these questions and have realized that brain rhythms 
underpin almost every aspect of information process- 
ing in the brain, including memory, attention, and even 
our intelligence. We also observe that abnormal brain 
oscillations may underlie the problems experienced in 
diseases such as epilepsy or Alzheimer’s disease and 
we know that certain changes in brain rhythms and 
oscillations are good indicators of brain pathology as- 
sociated with these diseases. If we know more about the 
function of brainwaves we may be able to develop bet- 
ter diagnosis and treatments of these diseases. It may 
also lead to better computational tools and better bio- 
inspired processing tools to develop artificial cognitive 
systems. 
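
To make the band definitions above concrete, the following sketch estimates the power in each band from a single-channel recording using a windowed periodogram. It is an illustration under stated assumptions rather than a prescribed method: the function name, the Hann window, the 100 Hz upper limit on gamma, and the synthetic test signal are all choices made here. The mu band is omitted because it shares the 8–12 Hz range with alpha and is distinguished by recording site, not by frequency.

```python
import numpy as np

# Band edges in Hz as listed in the text; the gamma upper edge is an assumption.
BANDS_HZ = {
    "delta": (1.0, 4.0),
    "theta": (5.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def band_powers(x, fs):
    """Return the power per band, from a Hann-windowed periodogram of x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                           # remove the DC offset
    window = np.hanning(x.size)
    spectrum = np.fft.rfft(x * window)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    df = freqs[1] - freqs[0]                   # frequency resolution
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
        for name, (lo, hi) in BANDS_HZ.items()
    }


# Synthetic example: a 10 Hz oscillation in weak noise, sampled at 250 Hz.
fs = 250.0
t = np.arange(0.0, 4.0, 1.0 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)
powers = band_powers(signal, fs)
dominant_band = max(powers, key=powers.get)
```

For this synthetic signal the alpha band carries most of the power, as expected for a 10 Hz oscillation.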

Brain rhythmic activity can be recorded non- 
invasively from the scalp as EEG or intracranially from 
the surface of the cortex as cortical EEG or the electro- 
corticogram (EEG is described in Sect. 39.7). 

Electrocorticography (ECoG), involving the clinical 
placement of electrode arrays on the brain surface (usu- 
ally above the dura) enables the recording of, similar 
to EEG, large-scale field potentials that are primar- 
ily derived from the aggregate synaptic potential from 
large neuronal populations, whereby synaptic current 
produces a change in the local electric field. ECoG 
can characterize local cortical potentials with high spatiotemporal
precision (0.5 cm² in ECoG compared to
1 cm² in EEG) and high amplitudes (10–200 µV in
ECoG compared to 10–100 µV in EEG). Furthermore,
the ECoG spectral content can reach 300 Hz (compared
to 60 Hz in EEG) due to the closer vicinity of the
electrodes to the electric source (the non-homogenous, 


anisotropic brain volume and tissues act as a low-pass 
filter). Independent individual finger movement dynam- 
ics can be resolved at the 20 ms time scale, which 
has been shown not to be possible with EEG (but has 
recently been demonstrated using MEG [39.30] as de- 
scribed in Sect. 39.7). Here we review some of the latest 
findings of ECoG studies involving human sensorimo- 
tor systems. 

The power spectral density (PSD) of the cortical
potential can reveal properties within neuronal populations.
Peaks in the PSD indicate activity that is
synchronized across a neuronal population. For example,
movement decreases the lateral frontoparietal alpha
(8–12 Hz) and beta rhythm (12–30 Hz) amplitudes
with limited spatial specificity, whereas high gamma
changes, which are spatially more focused, are also
observable during motor control. Miller et al. [39.31],
however, observed through a range of studies investigating
local gamma band-specific cortical processing
a lack of distinct peaks in the cortical potential
PSD beyond 60 Hz and hypothesized the existence of
broadband changes across all frequencies that were obscured
at low frequencies by covariate fluctuations in θ
(4–7 Hz), α (8–12 Hz), and β (13–30 Hz) band oscillations.
They demonstrated that there is a phenomenon
that obeys a broadband, power-law form extending
across the entire frequency range. Even with local brain
activity in the gamma band there is an increase in power
across all frequencies, and the power-law shape is conserved.
This suggests that there are phenomena with
no special timescale where the neuronal population beneath
does not oscillate synchronously but may simply
reflect a change in the population mean firing rate.
Miller et al. [39.31] postulated that the power-law scaling
during high γ activity is a reflection of changes in
asynchronous activity and not necessarily synchronous,
rhythmic, action potential activity changes, as is often
hypothesized.
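
One way to check whether a broadband change conserves the power-law shape is to fit a straight line to the PSD in log-log coordinates and compare the exponent and the amplitude between conditions. The sketch below assumes a PSD of the form P(f) ≈ A f^(-χ); the symbols A and χ, the function name, the 80–500 Hz fitting range, and the synthetic spectra are assumptions made for illustration and are not taken from [39.31].

```python
import numpy as np


def fit_power_law(freqs, psd, f_lo, f_hi):
    """Least-squares fit of log10(psd) against log10(freqs) on [f_lo, f_hi].

    Returns (amplitude A, exponent chi) for the model psd = A * freqs**(-chi).
    """
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    slope, intercept = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
    return 10.0 ** intercept, -slope


# Synthetic spectra: "active" is a uniform broadband increase over "rest".
# The exponent 2.0 is arbitrary and chosen only for this example.
freqs = np.linspace(1.0, 500.0, 2000)
rest = 3.0 * freqs ** -2.0
active = 2.0 * rest

amp_rest, chi_rest = fit_power_law(freqs, rest, 80.0, 500.0)
amp_active, chi_active = fit_power_law(freqs, active, 80.0, 500.0)
```

In this idealized case the fitted exponent is the same in both conditions and only the amplitude changes. Restricting the fit to frequencies above the classical rhythms avoids contamination from narrow-band peaks.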

These findings suggest a fundamentally different 
approach to the way we consider the cortical poten- 
tial spectrum: power-law scaling reflects asynchronous, 
averaged input to the local neural population, whereas 
changes in characteristic brain rhythms reflect synchro- 
nized populations that coherently oscillate across large 
cortical regions. Miller et al. [39.31] also augment the 
findings by demonstrating power-law scaling in sim- 
ulated cortical potentials using small-scale, simplified 
integrate and fire neuronal models, an example of which 
is shown in Fig. 39.6. 
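
The link between mean firing rate and broadband power can be illustrated with a much simpler toy model than the integrate-and-fire simulation of [39.31]. The sketch below is a simplification introduced here, not the published model: it sums independent Poisson spike trains, filters them with a single exponential synaptic kernel, and compares the broadband power at two mean input rates. All names and parameter values (number of inputs, the 3 ms time constant, the 10–200 Hz comparison band) are assumptions.

```python
import numpy as np


def simulate_synaptic_current(rate_hz, n_inputs=200, duration_s=20.0,
                              fs=1000.0, tau_s=0.003, seed=0):
    """Summed Poisson input spikes filtered by an exponential synaptic kernel."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    # Spike counts per time bin, pooled over independent presynaptic inputs.
    counts = rng.poisson(rate_hz * n_inputs / fs, size=n).astype(float)
    t = np.arange(0.0, 10 * tau_s, 1.0 / fs)
    kernel = np.exp(-t / tau_s)
    current = np.convolve(counts, kernel, mode="full")[:n]
    return current - current.mean()


def periodogram(x, fs):
    """Plain (unwindowed) periodogram of a real signal."""
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    return freqs, psd


fs = 1000.0
freqs, psd_low = periodogram(simulate_synaptic_current(15.0, seed=1), fs)
_, psd_high = periodogram(simulate_synaptic_current(60.0, seed=2), fs)

band = (freqs >= 10.0) & (freqs <= 200.0)
broadband_ratio = psd_high[band].mean() / psd_low[band].mean()
```

Because the input is asynchronous, raising the mean rate from 15 to 60 spike/s scales the spectrum by roughly a factor of four at all frequencies without introducing a spectral peak, which is the qualitative behavior described above.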




Fig. 39.6a-d An illustration of how the power-law phenomena in the cortical potential might be generated, based on a simulation
study (see [39.31] for details). The panels depict presynaptic spike arrival times, the postsynaptic current, the ohmic current
through the membrane, and the time dependence of the signal at the ECoG electrode. Panel (d) shows the PSD of this signal
(power in arbitrary units against frequency in Hz, for mean input rates of 15 and 60 spike/s) and has a knee at about 70 Hz,
with a power law of P ~ 1/f^4, which would normally be observed in ECoG PSD. The change in the spectra with increasing
mean spike rate of synaptic input strongly resembles the change observed experimentally over motor cortex during activity,
as demonstrated in [39.31] (after [39.31] with permission from Miller)


Knowledge of this power-law scaling in the brain 
surface electric potential was subsequently exploited in 
a number of further studies investigating differences 
in motor cortical processing during imagined and ex- 
ecuted movements [39.32] and the role of rhythms and 
oscillations in sensorimotor activations [39.33]. 

As outlined, motor imagery to produce volitional 
neural signals to control external devices and for re- 
habilitation is one of the most popular approaches 
employed in brain-computer interfaces. As highlighted 
in the previous section neuroimaging using hemody- 
namic markers (positron emission tomography (PET) 
and fMRI) and extra cerebral magnetic and electric 
field studies (MEG and EEG) have shown that mo- 
tor imagery activates many of the same neocortical 
areas as those involved in planning and execution of 
movements. Miller et al. [39.32] studied the execution- 
imagery similarities with electrocorticographic cortical 
surface potentials in eight human subjects during overt 
action and kinaesthetic imagery of the same movement 
to determine what and where are the neuronal sub- 
strates that underlie motor imagery-based learning and 
the congruence of cortical electrophysiologic change 
associated with motor movement and motor imagery. 
The results show that the spatial distribution of acti- 
vation significantly overlaps between hand and tongue 
movement in the lower frequency bands (LFB) but not 
in the higher frequency bands (HFB), whereas during 


kinaesthetic imagination of the same movement task the
magnitude of spectral changes was smaller (26% less,
calculated across the electrode array) even though the
spatially broad decrease in power in the LFB and the focal
increase in HFB power were similar for movement
and imagery.

Fig. 39.7a-e An illustration of modes of neural activity
with cortical beta rhythm states. (a) Modulation of broadband
amplitude by underlying rhythm can be thought of as
population-averaged spike-field interaction. (b) Released
cortex demonstrates a small amount of broadband power
coupling to underlying rhythm phase, and the underlying
spiking from pyramidal neurons is high in rate and only
weakly coupled to the underlying rhythm phase. (c) Suppressed
cortex demonstrates less broadband power but with
higher modulation by the underlying rhythm, while underlying
single unit spiking is low in rate but tightly coupled
to the rhythm phase. (d) A simplified heuristic for how
rhythms might influence cortical computation: During active
computation, pyramidal neurons (PN) engage in asynchronous
activity, where mutual excitation has a sophisticated
spatio-temporal pattern. Averaged across the population,
the ECoG signal shows broadband increase, with negligible
beta. (e) During resting state, cortical neurons, via
synchronized interneuron (IN) input, are entrained with the
beta rhythm, which also involves extracortical circuits symbolized
by the input from a synchronizing neuron in the thalamus
(TN). The modulation of local activity with rhythms
is revealed in the ECoG by significant broadband modulation
with the phase of low frequency rhythms (after [39.33],
courtesy of Miller; see [39.33] for further details)

During an imagery-based learning task involving
real-time feedback of the magnitude of cortical activation
of a particular electrode, in the form of cursor
movement on screen, the spatial distribution of HFB activity
was quantitatively conserved in each case, but the
magnitude of the imagery associated spectral change in- 
creased significantly and, in many cases, exceeded that 
observed during actual movement. The spatially broad 
desynchronization in LFB is consistent with EEG-based
imagery, which uses α/β desynchronization as
a means of cursor control in BCIs [39.19]; however,
the results demonstrate that this phenomenon reflects
an aspect of cortical processing that is fundamentally
non-specific. LFB desynchronization may reflect
altered feedback between cortical and subcortical struc- 
ture with a timescale of interaction that corresponds 
to the peak frequency in the PSD as opposed to local, 
somatotopically distinct, population-scale computation. 
Miller et al. [39.32] speculate that the significant LFB 
power difference during movement and imagery might 
be a correlate of a partial release of cortex by subcor- 
tical structures (partial decoherence of a synchronized 
corticothalamic circuit) as opposed to a complete re- 
lease during actual movement or after motor imagery 
feedback. 

The HFB change is reflective of a broadband PSD
increase that is obscured at lower frequencies by the
motor-associated α/β rhythms but which has been
specifically correlated with local population firing rate
and is observed in a number of spatially overlapping
areas, including primary motor cortical areas, for both
movement and imagery. These findings have prompted
much speculation about the neural substrates and
electrophysiology of movement control. The results
clearly demonstrate the congruence in large-scale activation
between motor imagery, overt movement,
and imagery-based feedback; the overlapping activation
in distributed circuits during movement and imagery;
the clear role of the primary motor cortex during
motor imagery; and the role of feedback in the augmentation
of widespread neuronal activity during motor
imagery. The study also provided electrocorticographic
evidence of the role of primary motor areas during motor
imagery, complementing the EEG and neuroimaging
evidence of primary motor activation during imagery and
movement outlined in the previous section. The dramatic
augmentation given by feedback, particularly in primary
motor cortex, is significant in the context of BCI training
because it demonstrates a dynamic restructuring of neuronal
dynamics across a whole population in the motor
cortex on very short time scales (< 10 min) [39.32].
This augmentation and restructuring can, indeed, result
in improved motor imagery performance over time but
leads to the necessity to co-adapt the BCI signal processing
to cope with associated non-stationary drifts in
the resulting oscillations of cortical potentials (a topic
to which we return in Sect. 39.9).

Human motor behaviors such as reaching, reading, 
and speaking are executed and controlled by somato- 
motor regions of the cerebral cortex, which are located 
immediately anterior and posterior to the central sulcus
[39.33]. Electrical oscillations in the lower beta
band (12–20 Hz) have been shown to have an inverse
relationship to motor production and imagery,
decreasing during movement initiation and production 
and rebounding (synchronization) following movement 
cessation and during imagery continuation in the peri- 
central somatomotor and somatosensory cortex. 

Investigations have been conducted to determine
whether beta rhythms play an active role in the computations
taking place in somatomotor cortex or whether they
are an epiphenomenon of cortical state changes influenced
by other cortical or subcortical processes [39.33].
There is strong correlation between the firing time of 
individual neurons in the primary motor cortex and 
the phase of the beta rhythms in the local field poten- 
tial [39.34]. Miller et al. [39.33] have acquired ECoG 
evidence of the role of beta rhythms in the organi- 
zation of the somatomotor function by analyzing the 
broadband spectral power on fast time scales (tens 
of milliseconds) during rest (visual fixation) and fin- 
ger flexion. The results show that cortical activity 
has a robust entrainment on the phase of the beta 
rhythms, which is predominant in peri-central motor ar- 
eas whereby broadband spectral changes vary with the 
phase of underlying motor beta rhythm. This relation- 
ship between beta rhythms and local neuronal activity 
is a property of the idling brain (present during resting 
and selectively diminished during movement). Specifi- 
cally, Miller et al. [39.33] propose that the predominant 
pattern for the beta range shows a tendency for brain 
activity, as measured by broadband power, to increase 
just prior to the surface negative phase and decrease just 
prior to the surface positive phase of the beta rhythm, 
which they refer to as rhythmic entrainment. The predominant
phase couplings for the θ/α/β ranges are found
to be different and have different spatial localizations.
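
The dependence of broadband power on rhythm phase described above can be quantified by binning the broadband amplitude envelope according to the instantaneous phase of the band-limited rhythm. The following sketch is one simple way to do this and is not the analysis pipeline of [39.33]: the FFT-mask filter, the 80–200 Hz broadband range, the number of phase bins, and the synthetic signals are assumptions; only the 12–20 Hz rhythm band comes from the text.

```python
import numpy as np


def analytic_bandpass(x, fs, lo, hi):
    """Band-limited analytic signal: keep only positive frequencies in [lo, hi]."""
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(x.size, d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.fft.ifft(2.0 * spectrum * mask)


def phase_binned_amplitude(x, fs, rhythm_band=(12.0, 20.0),
                           broadband=(80.0, 200.0), n_bins=12):
    """Mean broadband envelope per rhythm-phase bin, normalized to mean 1."""
    phase = np.angle(analytic_bandpass(x, fs, *rhythm_band))
    envelope = np.abs(analytic_bandpass(x, fs, *broadband))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    mean_amp = np.array([envelope[idx == k].mean() for k in range(n_bins)])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, mean_amp / mean_amp.mean()


# Synthetic test signals: a 16 Hz rhythm plus noise whose amplitude either
# follows the rhythm ("entrained") or is independent of it ("released").
fs = 1000.0
t = np.arange(0.0, 60.0, 1.0 / fs)
rng = np.random.default_rng(0)
rhythm = np.cos(2 * np.pi * 16.0 * t)
noise = 0.2 * rng.standard_normal(t.size)
entrained = rhythm + (1.0 + 0.8 * rhythm) * noise
released = rhythm + noise

centers, amp_entrained = phase_binned_amplitude(entrained, fs)
_, amp_released = phase_binned_amplitude(released, fs)
depth_entrained = amp_entrained.max() - amp_entrained.min()
depth_released = amp_released.max() - amp_released.min()
preferred_phase = centers[np.argmax(amp_entrained)]
```

A large modulation depth indicates that broadband activity is entrained to the rhythm, whereas a flat profile indicates that it is not. In real recordings the preferred phase and the modulation depth would be estimated separately for each electrode and behavioral state.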

Miller et al. [39.33] proposed a suppression through 
synchronization hypothesis, whereby diffuse cortical 
inputs originating from subcortical areas might func- 
tionally suppress large regions of the cortex, the ad- 
vantage of which is to enable selective engagement 
of task-relevant and task-irrelevant brain circuits and 
for dynamical reallocation of metabolic resources. This 
shifting entrainment suggests that the β-rhythm is not
simply a background process that is suppressed during
movement, but rather that the beta rhythm plays an active
and important role in motor processing. In recent
years, there has been a growing focus on coupling between
neuronal firing and rhythmic brain activity, and
this study provides substantial evidence and methodology
to support the important role of brain rhythms in
neuronal functioning.


39.7 MEG and EEG - Extra Cerebral Magnetic and Electric Fields of the Motor System


The previous section highlighted a number of the most 
recent examples of ECoG-based studies that are shed- 
ding more light on the way in which the motor system 
processes information and is activated during imagery 
and movement. As electrocorticography is a highly in- 
vasive procedure involving surgery, a key question that 
has been addressed is whether the spectral findings 
and spatial specificity of ECoG will ever be possible 
using non-invasive extracerebrally acquired EEG, or 
whether ECoG findings can be used to develop better 
EEG-based processing methodologies to extract ECoG 
information. At the International BCI meeting in 2010 
a workshop addressed the critical questions around 
the state-of-the-art BCI signal processing, in particu- 
lar, should future BCI research emphasize a shift from 
scalp-recorded EEG to ECoG, and how are the signals 
from the two modalities related? [39.35]. 

There is still much debate around the future of EEG 
for BCI due to its limited spatial resolution and vari- 
ous noise-related issues, whereas ECoG shows much 
promise in addressing both of these issues. However, 
ECoG requires surgical implantation and the long-term 
effectiveness remains to be verified in humans. A step 
toward answering this question is to better understand 
the relationship between EEG and ECoG. In a work- 
shop summary the question was addressed by com- 
paring and contrasting the contribution of population 
synchronized (rhythmic) and asynchronous changes in 
the EEG and ECoG potential measurements [39.35]. 
The beta rhythm is robust in extracerebral EEG record- 
ings, spatially synchronous across the pre and postcen- 
tral gyri, so this coherent rhythm is augmented with 
respect to background spatial averaging. The different 
states of the surface rhythms may represent switching 
between the stable modes observed in on-going surface 
oscillations. 

In contrast, the broadband spectral change that ac- 
companies movement is asynchronous at the local level 
and unrelated across cortical regions, so it is dis- 
torted or diminished by spatial averaging. Krusienski 
et al. [39.35] compared the contribution of popula- 


tion synchronized (rhythmic) and asynchronous (broad- 
band, 1/f) changes in the EEG and ECoG poten- 
tial measurements using a number of simplifications 
and approximations. These approximations suggest that 
synchronized cortical oscillations may be differently re- 
flected at the EEG scale than the ECoG scale. Krusien- 
ski et al. [39.35] show that to have the same contribution 
to EEG that a single cortical column has on ECoG, the 
spatial extent of cortical activity would have to span 
nearly the full width of a gyrus, and nearly a centime- 
ter longitudinally. Based upon ECoG measurements of 
the 1/f change in the visual cortex, the findings con- 
firm the possibility of detecting 1/f change in EEG 
during visual input directly over the occipital pole as an 
event-related potential. In the pre-central motor cortex, 
the movement of several digits in concert can produce 
a widespread change, which is dramatic enough to be 
measured in the EEG; however, based on these find- 
ings, the detection of single finger digit movement in 
EEG is not possible. This finding has been supported in 
other recent studies, a number of which involved mag- 
netoencephalography (MEG). As with EEG, the mag- 
netoencephalogram (MEG) is recorded non-invasively; 
however MEG is a record of magnetic fields, measured 
outside the head, produced by electrical activity within 
the brain, whereas EEG is a measure of the electrical 
potentials. Synchronized neuronal currents, produced 
primarily by the intracellular electrical currents within 
the dendrites of pyramidal cells in the cortex, induce 
weak magnetic fields. Neuronal networks of 50 000 or
more active neurons are needed to produce a detectable
signal using MEG. MEG has a number of advantages
over EEG, most notably its spatial specificity as the 
magnetic flux detected at the surface of the head with 
MEG penetrates the skull and tissues without signifi- 
cant distortion, unlike the secondary volume currents 
detected outside the head with EEG [39.36]. MEG, 
however, is also less practical, requiring significant 
shielding from environmental electromagnetic interference
and is not a wearable or mobile technology like
EEG and, therefore, cannot be used for bedside recordings
in a clinical setting or mobile BCI applications.
MEG has been used in a range of clinical applica- 
tions (cf. [39.36] for a review) and for research. Below 
we describe a number of studies with focus on motor
cortical investigations in the context of developing
brain-computer interfaces.

Quandt et al. [39.30] have investigated single trial 
brain activity in MEG and EEG recordings elicited 
by finger movement on one hand. The muscle mass 
involved in finger movement is smaller than in limb 
or hand movement, and neuronal discharges of mo- 
tor cortex neurons are correspondingly smaller in fin- 
ger movement than in arm or wrist movements. This 
makes detection of finger movement more difficult 
from non-invasive recordings. Using MEG, Kauhanen
et al. [39.37, 38] showed that left and right-hand index 
finger movement can be discriminated and that single 
trial brain activity recorded non-invasively can be used 
to decode finger movement; however, there are sig- 
nificant obstacles in non-invasive recordings in terms 
of the substantial overlapping activations in M1 when 
decoding individual finger movements on the same 
hand. Miller et al. [39.39] and Wang et al. [39.40] have 
shown that real-time representation of individual fin- 
ger movements is possible using ECoG; however, fin- 
ger movement discrimination from extracerebral neural 
recordings has only recently been shown to be possi- 
ble. Quandt et al. [39.30] found using simultaneously 
recorded EEG and MEG that finger discrimination on 
the same hand is possible with MEG but EEG is not 
sufficient for robust classification. The lower spatial res- 
olution of scalp signal EEG is due to the spatial blurring 
at the interface of tissues with different conductance. 
The issue cannot be overcome by increasing the den- 
sity of EEG electrodes. It is speculated that the strong 
curvature of the cortical sheet in the finger knob (an 
omega-shaped knob of the central sulcus) contributes 
to the high decoding accuracy of MEG, whereby ori- 
entation change in the active tissue may change spatial 
patterns of magnetic flux measured in sensor space, but 
potentials caused by the same processes are not de- 
tectable at the scalp. Using different approaches four 
fingers on the same hand could be decoded with circa 
57% accuracy using MEG and across all cases MEG 
performs better than EEG (p < 0.005), whilst EEG of- 
ten only produced accuracies slightly above the upper 
confidence interval for guessing. 
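
To make the notion of an upper confidence limit for guessing concrete,
the following minimal sketch computes the highest accuracy that is still
consistent with chance for a four-class problem. It assumes independent
trials, equiprobable classes, and a binomial model; the trial counts are
hypothetical and are not taken from [39.30].

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def chance_upper_limit(n_trials, n_classes, alpha=0.05):
    """Smallest accuracy k/n with P(X <= k) >= 1 - alpha under pure guessing."""
    p = 1.0 / n_classes
    for k in range(n_trials + 1):
        if binomial_cdf(k, n_trials, p) >= 1.0 - alpha:
            return k / n_trials
    return 1.0

# Hypothetical experiment sizes: four fingers, 200 or 40 test trials
limit_200 = chance_upper_limit(n_trials=200, n_classes=4)
limit_40 = chance_upper_limit(n_trials=40, n_classes=4)
```

An accuracy has to exceed this limit, not merely the nominal chance level
of 25%, before it can be regarded as better than guessing; the limit moves
closer to 25% as the number of test trials grows.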

Analysis of the MEG oscillations corresponds to findings
from ECoG studies: the power of the lower-frequency
oscillations (< 60 Hz) decreases around the movement,
whereas power in the high gamma band increases,
and the effects in the high gamma band are
more spatially focused than in the lower frequency
bands [39.30]. Interestingly, the discrimination accuracy
from the band power of the most informative
frequency band, between 6 and 11 Hz, was clearly inferior
to the accuracy derived from the time-series data.
This indicates that slow movement-related modulations of
neural activation are most informative about which finger
of a hand moves; the inferior accuracy given by
the band power is likely due to the lack of phase
information in band-power features [39.30]
(time embedding, temporal sequence information, and
exploiting phase information in discriminating motor
signals are revisited in Sect. 39.9).

The above are just a few examples of phenomena that
have been shown not to be sufficiently characterizable
with EEG. The following section provides an overview
of the known sensorimotor phenomena detectable from
EEG and some of the most recent advances in decoding
hand/arm movements non-invasively.


39.7.1 Sensorimotor Rhythms and Other 
Surrounding Oscillations 


There are a number of rhythms and potentials that have 
been strongly linked with motor control, many of which 
have been exploited in EEG-based non-invasive BCI 
devices. As outlined previously, the sensorimotor area 
(SMA) generates a variety of rhythms that have spe- 
cific functional and topographic properties. To reiterate, 
distinct rhythms are generated by hand movements 
over the postcentral somatosensory cortex. The μ
(8–12 Hz) and β (13–30 Hz) bands are altered during
sensorimotor processing [39.41–43]. Attenuation of
the spectral power in these bands indicates an
event-related desynchronization (ERD), whilst an increase in
power indicates event-related synchronization (ERS).
ERD of the μ and β bands is commonly associated
with activated sensorimotor areas and ERS in the μ
band is associated with idle or resting sensorimotor 
areas. ERD/ERS has been studied widely in cognitive
research and provides very distinctive lateralized
EEG pattern differences, which form the basis of left
hand vs. right hand or foot MI-based BCIs [39.44, 45].
However, as outlined above, later studies have shown that
the actual rhythmic activity generated by the sensorimotor
system can be much more detailed. The α or μ
component of the SMR also has a phase-coupled second
peak in the beta band. Both the alpha and beta
peaks can become independent at the offset of a movement,
after which the beta band rebounds faster and with higher
amplitude than the alpha band. Desynchronization of the
beta band during a motor task can occur
in different frequency bands than the subsequent resyn- 
chronization (rebound) after the motor task [39.41]. As 
previously outlined, many studies have shown that neu- 
ral networks similar to those of executed movement are 
activated during imagery and observation of movement 
and thus similar sensorimotor rhythmic activity can be 
observed during motor imagery and execution. 
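
ERD and ERS are commonly quantified as the percentage change of band
power in an activity interval relative to a reference interval. The
following minimal sketch illustrates that convention on a synthetic
signal. The percentage formula, the function names, the sampling rate,
and the FFT-based power estimate are assumptions made for illustration;
they are not prescribed by the text above.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean spectral power of `signal` between f_lo and f_hi (Hz)."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[mask].mean()

def erd_ers_percent(reference, activity, fs, band=(8.0, 12.0)):
    """Relative band-power change: negative values = ERD, positive = ERS."""
    r = band_power(reference, fs, *band)
    a = band_power(activity, fs, *band)
    return 100.0 * (a - r) / r

# Synthetic example: a 10 Hz rhythm whose amplitude halves during movement
fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(int(2 * fs)) / fs              # 2 s of samples
rest = np.sin(2 * np.pi * 10 * t)
move = 0.5 * np.sin(2 * np.pi * 10 * t)
change = erd_ers_percent(rest, move, fs)     # -75 %: power scales with amplitude squared
```

Halving the amplitude quarters the power, so the example yields an ERD
of 75% in the 8–12 Hz band. In practice the power would be averaged over
many trials before the percentage is taken.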

Gamma oscillations of the electromagnetic field of
the brain are known to be involved in a variety of
cognitive processes and are believed to be fundamental
for information processing within the brain. Gamma
oscillations have been shown to be correlated with other
brain rhythms at different frequencies, and a recent
study using magnetoencephalography has shown causal
influences of gamma oscillations on sensorimotor
rhythms (SMR) in healthy subjects [39.46]. The
modulation of sensorimotor rhythms is positively
correlated with the power of frontal and occipital gamma
oscillations and negatively correlated with the power of
centro-parietal gamma oscillations, and this correlation
pattern can be explained by a simple causal structure in
which gamma oscillations influence the SMR. The
behavioral correlate of the topographic alterations
of gamma power, a shift of gamma power from
centro-parietal to frontal and occipital regions, remains
elusive, although increased gamma power over frontal
areas has been associated with selective attention in
auditory paradigms. Grosse-Wentrup et al. [39.46]
postulated that neurofeedback of gamma activity may be
used to enhance BCI performance and to help low-aptitude
BCI users, i.e., those who appear incapable of BCI
control using SMR.


39.7.2 Movement-Related Potentials 


Signals observed before and during the onset of movement
signify motor planning and preparation. For example,
the Bereitschaftspotential or BP (German for
readiness potential), also called the pre-motor potential
or readiness potential (RP), is a measure of activity in
the motor cortex of the brain leading up to voluntary
muscle movement [39.47, 48]. The BP is a manifestation
of the cortical contribution to the pre-motor planning
of volitional movement. Krauledat et al. [39.49, 50]
report on experiments carried out using the lateralized
readiness potential (LRP) (i.e., Bereitschaftspotential)
for brain-computer interfaces. Before a motor task is
executed, a negative readiness potential reflecting
the preparation can be observed. They showed that
it is possible to distinguish the pre-movement potentials
in finger tapping experiments even before the onset of
the movement, thus potentially improving accuracy and
reducing latency in the BCI system. The BP is ten to
a hundred times smaller than the α-rhythm of the EEG and
can only be identified by averaging across trials. It has
two components: an early component referred to as BP1
(sometimes NS1), lasting from about −1.2 to −0.5 s
relative to movement onset (negative slope (NS) of early
BP), and a late component (BP2 or NS2) from −0.5 s to
shortly before 0 s (steeper negative slope of late BP)
[39.48, 51, 52]. A pre-movement positivity can be observed
along with a motor potential which starts about 50 to
60 ms before the onset of movement and has its maximum
over the contralateral precentral hand area.
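
The need to average across trials can be illustrated with a small
simulation. The sketch below buries a hypothetical slow negative ramp in
background activity ten times larger and compares the error of a single
trial with that of the trial average. The ramp shape, the amplitudes, the
white-noise model of the background EEG, and the number of trials are all
assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100                                   # assumed sampling rate in Hz
t = np.arange(-2 * fs, 0) / fs             # 2 s preceding movement onset

# Hypothetical readiness potential: linear ramp from 0 to about -5 (arbitrary µV)
bp = -5.0 * (t + 2.0) / 2.0

n_trials = 400
# Background EEG modelled as white noise, ten times larger than the BP peak
epochs = bp + 50.0 * rng.standard_normal((n_trials, t.size))

def rms_error(estimate, truth):
    """Root-mean-square deviation of an estimate from the true waveform."""
    return np.sqrt(np.mean((estimate - truth) ** 2))

single_trial_error = rms_error(epochs[0], bp)
average_error = rms_error(epochs.mean(axis=0), bp)
```

For uncorrelated noise the residual error of the average shrinks roughly
with the square root of the number of trials, which is why a potential
far smaller than the ongoing rhythm only becomes visible after many
repetitions. Real EEG noise is temporally correlated, so the gain in
practice is usually smaller than in this idealized example.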


39.7.3 Decoding Hand Movements from EEG 


Movement-related cortical potentials (MRCP) have 
been used as control signals for BCIs [39.53]. MRCP 
and SMR have distinct changes during execution or 
imagination of voluntary movements. MRCP is considered
a slow cortical potential in which the surface negativity
that develops 2 s before movement onset is the
Bereitschaftspotential referred to above. Gu
et al. [39.53] studied MRCP and SMRs in the con- 
text of discriminating the movement type and speed 
from the same limb based on the hypothesis that if 
the imagined movements are related to the same limb, 
the control could be more natural than associating 
commands to movements of different limbs for BCIs. 
They focused on fast and slow wrist rotation and extension
and found that average MRCPs rebounded
more strongly when fast-speed movements were imag- 
ined compared with slow-speed movements; however, 
the rebound rate of MRCP was not substantially different
between movement types. The peak negativity was
more pronounced in the frontal (Fz) and central (C1)
regions than in the parietal region (Pz). The rebound rate
of MRCP was greater in the central region (C1) than
in the parietal region (Pz). MRCP and SMR
are independent of each other as they originate from 
different brain sources and they occupy different fre- 
quency bands [39.52—54]. This renders them useful for 
multi-dimensional control in BCIs. 

In accordance with the analysis of averaged 
MRCPs, the single-trial classification rate between two 
movements performed at the same speed was lower 
than when combining movements at different speeds. 
Gu et al. [39.53] suggest that selecting different speeds
rather than different movements when these are executed
at the same joint may be best for BCI applications.
However, the task pair that was optimal in terms
of classification accuracy is subject-dependent, and thus
a subject-specific evaluation of the task pair should be
conducted. The study by Gu et al. [39.53] is important
as it is one of a limited number of studies that focus on
discriminating different movements of the same limb from
EEG, as opposed to movements of different limbs, which is
much more common practice in BCI designs.
Lakany and Conway [39.55] investigated the difference
between imagined and executed wrist movements in 20
different directions using machine learning and found
that the accuracy of discriminating imagined wrist
movements is much lower than for actual movements.
They later found [39.56] that time-frequency (t-f) EEG
features modulated by force direction in isometric arm
exertions in four different directions in the horizontal
plane can give better directional discrimination, and
that t-f features from the planning and
execution phases may be most appropriate.

Although a limited number of works demonstrating
EEG-based 2-D and 3-D continuous control of a cursor
through biofeedback have been reported [39.57, 58],
along with the few studies of classification of the
direction/speed of 2-D hand/wrist movements outlined
above, very few studies have demonstrated
continuous decoding of hand kinematics from
EEG. Classification of different motor imagery tasks
on a single-trial basis is more commonly reported. The
signal-to-noise ratio, the bandwidth, and the information
content of electroencephalography are generally
thought to be insufficient to extract detailed
information about natural, multi-joint movements of the
upper limb. However, evidence from a study by
Bradberry et al. [39.59], investigating whether the
kinematics of natural hand movements are decodable from
EEG, challenges this assumption. They continuously
extracted hand velocity from signals collected during a
three-dimensional (3-D) center-out reaching task and found
that a linear EEG decoding model could reconstruct 3-D
hand-velocity profiles reasonably well and that sensor
CP3, which lies roughly above the primary sensorimotor
cortex contralateral to the reaching hand, made
the greatest contribution. Using a time-lagged approach
they found that EEG data from 60 ms in the past supplied
the most information, with 16.0% of the total
contribution, suggesting that a linear decoding method
such as the one used in [39.59] relies on a sub-second
history of neural data to reconstruct hand kinematics.
Using a source localization technique they found that the
primary sensorimotor cortex (pre-central gyrus and
post-central gyrus) was indeed a major contributor along
with the inferior parietal lobule (IPL), all of which
have been found to be activated during motor execution
and imagery in other investigations [39.12, 13, 16].
Bradberry et al. [39.59] also found that movement
variability is negatively correlated with decoding
accuracy and suggested two reasons: 1) increased movement
variability could degrade decoding accuracy because
pairs of EEG-kinematic exemplars are less similar, i.e.,
less movement variability results in reduced intra-class
variability for training, and 2) subjects differ in their
ability to perform the task without practice (motor
learning is important for improving predictions of
movement). Hence, the strengths of a priori neural
representations of the required movements vary until
learned or practiced, and these differences could directly
relate to the accuracy with which the representations can
be extracted. This study provides important evidence that
decodable information about detailed, complex hand
movements can be derived from the brain non-invasively
using EEG; however, it remains to be determined whether
these findings hold when the same
methodology is used in an imagined 3-D center-out task.
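
A time-lagged linear decoder of the general kind described above can be
sketched as follows: each velocity component at time t is modelled as
a weighted sum of every sensor's value at the current sample and at a few
preceding samples, and the weights are found by ordinary least squares.
This is a minimal illustration on synthetic data, not the pipeline of
[39.59]; the function names, the number of sensors and lags, and the data
are assumptions.

```python
import numpy as np

def lagged_design(eeg, n_lags):
    """Stack eeg[t], eeg[t-1], ..., eeg[t-n_lags] into one row per time step.

    eeg has shape (n_samples, n_sensors). The first n_lags samples are
    dropped because their history is incomplete.
    """
    n_samples = eeg.shape[0]
    blocks = [eeg[n_lags - k : n_samples - k] for k in range(n_lags + 1)]
    X = np.hstack(blocks)
    return np.hstack([np.ones((X.shape[0], 1)), X])   # leading intercept column

def fit_decoder(eeg, velocity, n_lags):
    """Least-squares weights mapping lagged EEG to velocity (x, y, z)."""
    X = lagged_design(eeg, n_lags)
    weights, *_ = np.linalg.lstsq(X, velocity[n_lags:], rcond=None)
    return weights

def decode(eeg, weights, n_lags):
    """Reconstruct velocity for samples n_lags .. n_samples-1."""
    return lagged_design(eeg, n_lags) @ weights

# Synthetic data: 4 hypothetical sensors, 3 lags, 3-D velocity
rng = np.random.default_rng(1)
n_samples, n_sensors, n_lags = 2000, 4, 3
eeg = rng.standard_normal((n_samples, n_sensors))
true_w = rng.standard_normal((1 + n_sensors * (n_lags + 1), 3))
velocity = np.zeros((n_samples, 3))
velocity[n_lags:] = (lagged_design(eeg, n_lags) @ true_w
                     + 0.1 * rng.standard_normal((n_samples - n_lags, 3)))

w = fit_decoder(eeg, velocity, n_lags)
pred = decode(eeg, w, n_lags)
r = [np.corrcoef(pred[:, i], velocity[n_lags:, i])[0, 1] for i in range(3)]
```

The correlation between reconstructed and measured velocity serves here
as a simple score. With real recordings the decoder would be fitted and
scored on separate data (for example by cross-validation), and the
relative size of the weights at each lag would indicate how much each
point in the recent history contributes.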

Although we know a lot about the brain structures
associated with sensorimotor activity, as well as the
rhythms and potentials surrounding this activity, we have
not yet systematically linked these neural correlates
to specific motion primitives or motor control models.
Using biologically plausible neural models to capture the
findings on motor cortical structure, function,
and dynamics, and linking them to the underlying motor
psychophysics and advanced signal processing in
BCI, may help advance our knowledge of motion
primitives, sensorimotor learning, and control.


39.8 Extracellular Recording — Decoding Hand Movements 
from Spikes and Local Field Potential 


Although fMRI, MEG, and EEG offer low-risk, non-surgical
recording procedures, they have inherent limitations
which many expect can be overcome with invasive
approaches such as ECoG (described in Sect. 39.6) and
by implanting electrodes to record the electrical activity
of single neurons extracellularly (single unit
recordings). Here we focus on some recent studies aimed
at testing this scale for use in sensorimotor-related
BCIs.

Extracellular recording has many advantages, including
high signal amplitude (up to 500 µV), low
susceptibility to external noise and artefacts (eye
movements, cardiac activity, muscle activity) leading to
a high signal-to-noise ratio, high spatial resolution
(50 µm²), high temporal resolution (~1 ms), and high
spectral content (up to 2 kHz) due to the close vicinity to
the electric source. As a consequence, there is a high
correlation between the neural signals recorded and 
the generated/imagined hand movements, resulting in 
a short learning duration when employed in a motor
BCI system. The disadvantages of the invasive
recording technique include a complex and expensive
medical procedure, susceptibility to infections (possibly
leading to meningitis or epilepsy), pain, prolonged
hospitalization, direct damage to the neural tissue (e.g.,
a flat 15 µm × 60 µm electrode penetrating 2 mm deep
hits, on average, 5 neurons and 40 000 synapses),
indirect damage to the neural tissue (small blood vessels
are hit by the electrode, causing ischemia for distant
neurons and synapses and the development of an
inflammatory response), and the formation of scar tissue,
which electrically isolates the electrodes from their
surroundings and renders the system non-responsive after
being implanted for extended durations. Furthermore,
the electrode material itself, however biocompatible,
sooner or later causes an inflammatory reaction and the
formation of scar tissue.

Theoretically, however, extracellular recordings offer
more accurate information that may enable us to
devise realistic BCI systems that allow for additional
degrees of freedom and natural control of prosthetic
devices, such as hand and arm prostheses. To this
end, substantial efforts have been put into devising
novel biocompatible electrodes (e.g., platinum, iridium
oxide, carbonic polymers) that will delay immune
system stimulation; multi-functional microelectrodes
that allow for recording/stimulating while
injecting anti-inflammatory agents to suppress the
inflammatory response; and hybrid microelectrodes that
include a pre-amplifier and multiplexer on the electrode
chip to allow wireless transmission of the data, thus
avoiding the need for a scalp drill hole for the flat
cable carrying the neural data, which is prone to causing
infections.


Extracellular recording, being the most invasive
recording technique (compared to non-invasive EEG
and MEG recording and partially invasive ECoG
recording), allows recording of both the high-frequency
neural output activity, i.e., spikes, and the
low-frequency neural input activity, denoted the local
field potential (LFP), which is the voltage caused
by electrical current flowing from all nearby dendritic
synaptic activity across the resistance of the local
extracellular space. In the following section, neural
coding schemes in general and the cortical correlates
of kinematic and dynamic motion attributes in particular
are presented, along with their suggested use for
current and future BCI systems.


39.8.1 Neural Coding Schemes 


A sequence, or train, of spikes may contain information 
based on different coding schemes. In motor neurons, 
for example, the strength at which an innervated muscle 
is flexed depends solely on the firing rate, the average 
number of spikes per unit time (a rate code). At the 
other end, a complex temporal code is based on the pre- 
cise timing of single spikes. They may be locked to an 
external stimulus such as in the auditory system or be 
generated intrinsically by the neural circuitry. Whether 
neurons use rate coding or temporal coding is a topic 
of intense debate within the neuroscience community, 
even though there is no clear definition of what these 
terms mean. Neural coding schemes include rate coding,
spike-count rate, time-dependent firing rate, temporal
coding, and population coding.


Rate coding 
Rate coding is a traditional coding scheme, assuming 
that most, if not all, information about the stimulus is 
contained in the firing rate of the neuron. The concept 
of firing rates has been successfully applied during the 
last 80 years. It dates back to the pioneering work of 
Adrian and Zotterman who showed that the firing rate 
of stretch receptor neurons in the muscles is related to 
the force applied to the muscle [39.60]. In the following 
decades, measurement of firing rates became a standard 
tool for describing the properties of all types of sensory 
or cortical neurons, partly due to the relative ease of 
measuring rates experimentally. 

Because the sequence of action potentials gener- 
ated by a given stimulus varies from trial to trial, 
neuronal responses are typically treated statistically or 
probabilistically. They may be characterized by firing 
rates, rather than as specific spike sequences. In most
sensory systems, the firing rate increases, generally
non-linearly, with increasing stimulus intensity. Any in- 
formation possibly encoded in the temporal structure of 
the spike train is ignored. Consequently, rate coding is 
inefficient but highly robust with respect to inter-spike
interval (ISI) noise. In recent years, a growing body of
experimental evidence has suggested that
a straightforward firing rate concept based on temporal
averaging may be too simplistic to describe brain
activity [39.61]. In rate coding, learning is based on
activity-dependent synaptic weight modifications. 


Spike-count rate 
The spike-count rate, also referred to as the temporal average,
is obtained by counting the number of spikes that ap- 
pear during a trial and dividing by the duration of the 
trial. The length T of the time window is set by the ex- 
perimenter and depends on the type of neuron recorded 
from and the stimulus. In practice, to obtain sensible 
averages, several spikes should occur within the time 
window. Typical values are T = 100 ms or T = 500 ms, 
but the duration may also be longer or shorter. 

The spike-count rate can be determined from a sin- 
gle trial, but at the expense of losing all temporal 
resolution about variations in neural response during 
the course of the trial. Temporal averaging can work 
well in cases where the stimulus is constant or slowly 
varying and does not require a fast reaction of the or- 
ganism — and this is the situation usually encountered 
in experimental protocols. Real-world input, however, 
is hardly ever stationary and often changes on a fast time
scale. For example, even when viewing a static im- 
age, humans perform saccades, rapid changes of the 
direction of gaze. The image projected onto the retinal 
photoreceptors changes, therefore, every few hundred 
milliseconds. Despite its shortcomings, the concept of 
a spike-count rate code is widely used not only in exper- 
iments, but also in models of neural networks. It has led 
to the idea that a neuron transforms information about 
a single input variable (the stimulus strength) into a sin- 
gle continuous output variable (the firing rate). 
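
The spike-count rate defined above amounts to a single division. The
following minimal sketch computes it for two window lengths T; the spike
times and the function name are hypothetical and serve only to illustrate
the definition.

```python
import numpy as np

def spike_count_rate(spike_times, t_start, t_stop):
    """Number of spikes in [t_start, t_stop) divided by the window length T."""
    spike_times = np.asarray(spike_times, dtype=float)
    in_window = (spike_times >= t_start) & (spike_times < t_stop)
    return np.count_nonzero(in_window) / (t_stop - t_start)

# Hypothetical spike times (in seconds) from a single trial
spikes = [0.012, 0.048, 0.101, 0.130, 0.275, 0.310, 0.402, 0.455]

rate_500ms = spike_count_rate(spikes, 0.0, 0.5)   # 8 spikes / 0.5 s = 16 spikes/s
rate_100ms = spike_count_rate(spikes, 0.0, 0.1)   # 2 spikes / 0.1 s = 20 spikes/s
```

The two windows give different rates for the same spike train, which
illustrates why T has to be chosen with the neuron and the stimulus in
mind, and why a single number per trial discards all information about
how the response evolves within the window.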


Time-dependent firing rate 
Time-dependent firing rate is defined as the average
number of spikes (averaged over trials) appearing during
a short interval between times $t$ and $t + \Delta t$,
divided by the duration of the interval. It works for
stationary as well as for time-dependent stimuli. To
measure the time-dependent firing rate experimentally,
the experimenter records from a neuron while stimulating
with some input sequence. The same stimulation sequence is
repeated several times and the neuronal response is
reported in a peri-stimulus-time histogram (PSTH). The
time $t$ is measured with respect to the start of the
stimulation sequence. The interval $\Delta t$ must be
large enough (typically in the range of one or a few
milliseconds) that there is a sufficient number of spikes
within the interval to obtain a reliable estimate of the
average. The number of occurrences of spikes
$n_K(t; t + \Delta t)$, summed over all repetitions of the
experiment and divided by the number $K$ of repetitions,
is a measure of the typical activity of the neuron
between times $t$ and $t + \Delta t$. A further division
by the interval length $\Delta t$ yields the
time-dependent firing rate $r(t)$ of the neuron, which is
equivalent to the spike density of the PSTH.
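
The PSTH procedure can be sketched in a few lines: spikes from K
repetitions are counted in bins of width Δt, and the summed counts are
divided by K and by Δt. The simulated neuron below, with a rate that
steps from 20 to 50 spikes/s, the bin width of 10 ms, and all names are
assumptions for illustration only.

```python
import numpy as np

def psth_rate(trials, t_start, t_stop, dt):
    """Time-dependent firing rate r(t) = n_K(t; t + dt) / (K * dt).

    trials: list of K arrays of spike times (s), aligned to stimulus onset.
    Returns the left bin edges and the rate in spikes/s for each bin.
    """
    n_bins = int(round((t_stop - t_start) / dt))
    edges = t_start + dt * np.arange(n_bins + 1)
    counts = np.zeros(n_bins)
    for spikes in trials:
        counts += np.histogram(spikes, bins=edges)[0]   # sum over repetitions
    return edges[:-1], counts / (len(trials) * dt)

def true_rate(t):
    """Hypothetical underlying rate: 20 spikes/s, stepping to 50 spikes/s at 0.5 s."""
    return 20.0 + 30.0 * (t >= 0.5)

# Simulate K repetitions with at most one spike per 1 ms step
rng = np.random.default_rng(2)
K, dt, step = 200, 0.01, 0.001
t_steps = np.arange(1000) * step
trials = []
for _ in range(K):
    fired = rng.random(t_steps.size) < true_rate(t_steps) * step
    trials.append(t_steps[fired])

times, rate = psth_rate(trials, 0.0, 1.0, dt)
early = rate[times < 0.45].mean()    # estimate of the rate before the step
late = rate[times >= 0.55].mean()    # estimate of the rate after the step
```

Unlike the spike-count rate of a single trial, the estimate follows the
change in rate within the trial, at the price of requiring many
repetitions of the same stimulus.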

For sufficiently small $\Delta t$, $r(t)\,\Delta t$ is the
average number of spikes occurring between times $t$ and
$t + \Delta t$ over multiple trials. If $\Delta t$ is
small, there will never be more than one spike within the
interval between $t$ and $t + \Delta t$ on any given
trial. This means that $r(t)\,\Delta t$ is also the
fraction of trials on which a spike occurred
between those times. Equivalently, $r(t)\,\Delta t$ is
the probability that a spike occurs during this time
interval. As an experimental procedure, the
time-dependent firing rate measure is a useful method to
evaluate neuronal activity, in particular in the case of
time-dependent stimuli. The obvious problem with this
approach is that it cannot be the coding scheme used by
neurons in the brain: neurons cannot wait for a stimulus
to be presented repeatedly in exactly the same manner
before generating a response. Nevertheless, the
experimental time-dependent firing rate measure makes
sense if there are large populations of independent
neurons that receive the same stimulus. Instead of
recording from a population of N neurons in a single run,
it is experimentally easier to record from a single
neuron and average over N repeated runs. Thus,
time-dependent firing rate coding relies on the implicit
assumption that there are always populations of neurons.


Temporal coding 
When precise spike timing or high-frequency firing-rate 
fluctuations are found to carry information, the neural 
code is often identified as a temporal code. A number 
of studies have found that the temporal resolution of the 
neural code is on a millisecond time scale, indicating 
that precise spike timing is a significant element in neu- 
ral coding [39.62]. Temporal codes employ those fea- 
tures of the spiking activity that cannot be described by 
the firing rate. For example, the time to first spike after 
the stimulus onset, characteristics based on the second 
and higher statistical moments of the ISI probability
distribution, spike randomness, or precisely timed groups
of spikes (temporal patterns) are candidates for temporal 
codes. As there is no absolute time reference in the ner- 
vous system, the information is carried either in terms of 
the relative timing of spikes in a population of neurons 
or with respect to an ongoing brain oscillation. 

The temporal structure of a spike train or firing rate 
evoked by a stimulus is determined both by the dy- 
namics of the stimulus and by the nature of the neural 
encoding process. Stimuli that change rapidly tend to 
generate precisely timed spikes and rapidly changing 
firing rates no matter what neural coding strategy is 
being used. Temporal coding refers to temporal preci- 
sion in the response that does not arise solely from the 
dynamics of the stimulus, but that nevertheless relates 
to properties of the stimulus. The interplay between 
stimulus and encoding dynamics makes the identifica- 
tion of a temporal code difficult. The issue of temporal 
coding is distinct and independent from the issue of 
independent-spike coding. If each spike is independent 
of all the other spikes in the train, the temporal charac- 
ter of the neural code is determined by the behavior of 
the time-dependent firing rate r(t). If r(t) varies slowly 
with time, the code is typically called a rate code, and 
if it varies rapidly, the code is called temporal. 
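
Two of the candidate temporal-code features mentioned above, the time to
first spike and statistics derived from the second moment of the ISI
distribution, can be computed as in the following minimal sketch. The
two spike trains have a similar mean rate but differ in regularity, so
they are hard to tell apart by rate alone yet easy to separate by the
coefficient of variation (CV) of their ISIs. The function names and the
synthetic trains are assumptions for illustration.

```python
import numpy as np

def first_spike_latency(spike_times, stimulus_onset):
    """Time from stimulus onset to the first subsequent spike (NaN if none)."""
    spike_times = np.asarray(spike_times, dtype=float)
    later = spike_times[spike_times >= stimulus_onset]
    return later.min() - stimulus_onset if later.size else float("nan")

def isi_statistics(spike_times):
    """Mean, standard deviation, and coefficient of variation of the ISIs."""
    isi = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    mean, std = isi.mean(), isi.std()
    return mean, std, std / mean

# Perfectly periodic train: one spike every 50 ms
regular = 0.05 * np.arange(1, 201)

# Irregular (Poisson-like) train with the same mean ISI of 50 ms
rng = np.random.default_rng(3)
irregular = np.cumsum(rng.exponential(0.05, 200))

cv_regular = isi_statistics(regular)[2]       # close to 0
cv_irregular = isi_statistics(irregular)[2]   # close to 1 for a Poisson process
latency = first_spike_latency([0.20, 0.35, 0.50], stimulus_onset=0.30)
```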


Population coding 
Population coding is a method to represent stimuli by 
using the joint activities of a number of neurons. In 
population coding, each neuron has a distribution of 
responses over some set of inputs, and the responses 
of many neurons may be combined to determine some 
value about the inputs. 

Currently, BCI and BMI (brain-machine interface)
systems rely mostly on population coding. One of the
most famous population codes, the motor population
vector, is described in Sect. 39.8.2 along with its use
in current and future BCI and BMI systems.


39.8.2 Single Unit Activity Correlates 
of Hand Motion Attributes 


In 1982, Georgopoulos et al. [39.63] found that the ac- 
tivity of single cells in the motor cortex of monkeys, 
who were making arm movements in eight directions 
(at 45° intervals) in a two-dimensional apparatus, var- 
ied in an orderly fashion with the direction of the move- 
ment. Discharge was most intense with movements in 
a preferred direction and was reduced gradually when 
movements were made in directions farther and farther 


away from the preferred movement. This resulted in 
a bell-shaped directional tuning curve. These relations 
were observed for cell discharge during the reaction 
time, the movement time, and the period that preceded 
the earliest changes in the electromyographic activity
(approximately 80 ms before movement onset);
electromyography (EMG) is a technique for evaluating and
recording the electrical activity produced by skeletal
muscles. In about 75% of the 241 directionally tuned
cells, the frequency of discharge D was a sinusoidal 
function of the direction of movement $\theta$

$$D = b_0 + b_1 \sin\theta + b_2 \cos\theta , \tag{39.3}$$

or, in terms of the preferred direction $\theta_0$,

$$D = b_0 + c_1 \cos(\theta - \theta_0) , \tag{39.4}$$

where $b_0$, $b_1$, $b_2$, and $c_1$ are regression coefficients.
Preferred directions differed for different cells so that the
tuning curves partially overlapped. The orderly varia- 
tion of cell discharge with the direction of movement 
and the fact that cells related to only one of the eight 
directions of movement tested were rarely observed, 
indicated that movements in a particular direction are 
not subserved by motor cortical cells uniquely re- 
lated to that movement. It was suggested, instead, that 
a movement trajectory in a desired direction might be 
generated by the cooperation of cells with overlapping 
tuning curves. The orderly variation in the frequency of 
discharge of a motor cortical cell with the direction of 
movement is shown in Fig. 39.8. 
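
The sinusoidal regression (39.3) and its conversion to the
preferred-direction form (39.4) can be sketched as follows. Expanding
(39.4) gives b1 = c1 sin θ0 and b2 = c1 cos θ0, so that c1 is the length
of the vector (b2, b1) and θ0 its angle. The example regenerates
noise-free rates from the coefficients quoted in the caption of
Fig. 39.8; the function name and the use of ordinary least squares are
assumptions made for illustration.

```python
import numpy as np

def fit_cosine_tuning(directions_deg, rates):
    """Least-squares fit of D = b0 + b1*sin(theta) + b2*cos(theta)."""
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    X = np.column_stack([np.ones_like(theta), np.sin(theta), np.cos(theta)])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(rates, dtype=float), rcond=None)
    b0, b1, b2 = coeffs
    c1 = np.hypot(b1, b2)                            # tuning depth
    theta0 = np.rad2deg(np.arctan2(b1, b2)) % 360    # preferred direction in degrees
    return b0, b1, b2, c1, theta0

# Eight movement directions at 45 degree intervals
directions = np.arange(0, 360, 45)
th = np.deg2rad(directions)

# Noise-free rates built from the coefficients given with Fig. 39.8
rates = 32.37 + 7.281 * np.sin(th) - 21.343 * np.cos(th)

b0, b1, b2, c1, theta0 = fit_cosine_tuning(directions, rates)
```

The fit recovers the three regression coefficients and yields a tuning
depth of about 22.5 impulses/s and a preferred direction of about 161°,
in agreement with the equivalent form quoted for the same cell.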

Later on, Amirikian et al. systematically examined 
the variation in the shape of the directional tuning pro- 
files among a population of cells recorded from the arm 
area of the motor cortex of monkeys using movements 
in 20 directions, every 18° [39.64]. This allowed the 
investigation of tuning functions with extra parameters 
to capture additional features of the tuning curve (i.e.,
tuning breadth, symmetry, and modality) and determine 
an optimal tuning function. It was concluded that mo- 
tor cortical cells are more sharply tuned than previously 
thought. 

Paninski et al. [39.65] using a pursuit-tracking task 
(PTT) in which a monkey had to continuously track 
a randomly moving visual stimulus (thus providing 
a broad sample of velocity and position space) with 
invasive recordings from the M1 region showed that 
there is heterogeneity of position and velocity coding in 
that region, with markedly different temporal dynamics 
for each — velocity-tuned neurons were approximately 
sinusoidally tuned for direction, with linear speed scal- 
ing; other cells showed sinusoidal tuning for position, 


8°6€ | d Hed 


752 


8°6€ | d Hed 


Part D 


Neural Networks 


Fig. 39.8 Orderly variation in the frequency of 
discharge of a motor cortical cell with the di- 
rection of movement. Upper half: rasters are 
oriented to the movement onset M and show 
impulse activity during five repetitions of move- 
ments made in each of the eight directions indi- 
cated by the center diagram. Notice the orderly 


variation in cell’s activity during the RT (re- 
action time), MOT (movement time) and TET 


Impulse/s 
60 4 


40 


20 


0° 45° 135° 225 SSi 


with linear scaling by distance. Velocity encoding led 
behavior by about 100 ms for most cells, whereas posi- 
tion tuning was more broadly distributed, with leads and 
lags suggestive of both feed-forward and feedback cod- 
ing. Linear regression methods confirmed that random, 
2-D hand trajectories can be reconstructed from the fir- 
ing of small ensembles of randomly selected neurons 
(3—19 cells) within the M1 arm area. These findings 
demonstrate that M1 carries information about evolving 
hand trajectory during visually guided pursuit tracking, 
including information about arm position both during 
and after its specification. 

Georgopoulos et al. formulated a population vector 
hypothesis to explain how populations of motor cor- 
tex neurons encode movement direction [39.66]. In the 
population vector model, individual neurons vote for 
their preferred directions using their firing rate. The 
final vote is calculated by vectorial summation of indi- 
vidual preferred directions weighted by neuronal rates. 
This model proved to be successful in description of 


(total experiment time; TET = RT + MOT). 
Lower half: directional tuning curve of the same 
cell. The discharge frequency is for TET. The 
data points are mean + SEM. The regression 
equation for the fitted sinusoidal curve is D = 
32.37 + 7.281 sin O — 21.343 cos ©, where D 

is the frequency of discharge and © is the direc- 
tion of movement, or, equivalently, D = 32.37 + 
22.5 cos (© — Oo), where po is the preferred di- 
rection (Op = 161°) (after [39.63], courtesy of 
A.P. Georgopoulos) 


motor-cortex encoding of 2-D and 3-D reach direc- 
tions, and was also capable of predicting new effects, 
e.g., accurately describing mental rotations made by 
the monkeys that were trained to translate locations of 
visual stimuli into spatially shifted locations of reach 
targets [39.67, 68]. 
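
The voting scheme described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in [39.66]: it assumes idealized, noise-free cosine-tuned cells with evenly spaced preferred directions, and it weights each vote by the rate change from baseline, a common variant of weighting by rate. All names and parameter values (`cosine_tuning`, `population_vector`, 64 cells, 30 impulses/s baseline) are hypothetical.

```python
import numpy as np

def cosine_tuning(theta, baseline, depth, preferred):
    """Mean firing rate of a cosine-tuned cell for movement direction theta (radians)."""
    return baseline + depth * np.cos(theta - preferred)

def population_vector(rates, baselines, preferred):
    """Sum unit vectors along each cell's preferred direction, weighted by the
    rate change from baseline, and return the angle of the sum (radians)."""
    weights = rates - baselines
    x = np.sum(weights * np.cos(preferred))
    y = np.sum(weights * np.sin(preferred))
    return np.arctan2(y, x) % (2 * np.pi)

# Hypothetical population: 64 cells with evenly spaced preferred directions.
n_cells = 64
preferred = np.linspace(0.0, 2 * np.pi, n_cells, endpoint=False)
baselines = np.full(n_cells, 30.0)   # impulses/s, assumed
depths = np.full(n_cells, 20.0)      # modulation depth, assumed

true_direction = np.deg2rad(135.0)
rates = cosine_tuning(true_direction, baselines, depths, preferred)
decoded = population_vector(rates, baselines, preferred)
print(np.rad2deg(decoded))
```

With uniformly distributed preferred directions the vector sum points along the true direction; with real, noisy and non-uniformly tuned cells the estimate is only approximate.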

The population vector study actually divided the 
field of motor physiologists between Evarts’ upper mo- 
tor neuron group, which followed the hypothesis that 
motor cortex neurons contributed to control of single 
muscles [39.69] and the Georgopoulos group studying 
the representation of movement directions in the cortex. 
From the theoretical point of view, population coding
is one of a few mathematically well-formulated problems
in neuroscience. It captures the essential features of
neural coding and yet is simple enough for theoretical
analysis. Experimental studies have revealed that this
coding paradigm is widely used in the sensory and motor
areas of the brain. For example, in the middle temporal
(MT) visual area, neurons are tuned to the movement




direction. In response to an object moving in a particu- 
lar direction, many neurons in MT fire, with a noise- 
corrupted and bell-shaped activity pattern across the 
population. The direction of motion of the object is then
read out from the population activity, which makes the
estimate robust to fluctuations in any single neuron's signal.

Population coding has a number of advantages, 
including reduction of uncertainty due to neuronal 
variability and the ability to represent a number of 
different stimulus attributes simultaneously. Population 
coding is also much faster than rate coding and can 
reflect changes in the stimulus conditions nearly in- 
stantaneously. Individual neurons in such a population 
typically have different but overlapping selectivities, 
so that many neurons, but not necessarily all, respond 
to a given stimulus. The Georgopoulos vector coding 
is an example of simple averaging. A more sophis- 
ticated mathematical technique for performing such 
a reconstruction is the method of maximum likelihood 
based on a multi-variate distribution of the neuronal 
responses. These models can assume independence,
second-order correlations [39.70], or even more
detailed dependencies such as higher-order maximum
entropy models [39.71].
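
As a rough illustration of maximum likelihood reconstruction under the simplest (independence) assumption, the sketch below treats each cell as an independent Poisson process with a known cosine tuning curve and searches a grid of candidate directions for the one that maximizes the log-likelihood of the observed spike counts. The Poisson model, the tuning parameters, and the function names are assumptions made for this example, not details taken from [39.70] or [39.71].

```python
import numpy as np

def tuning(theta, baseline, depth, preferred):
    """Assumed cosine tuning curve (impulses/s); requires baseline > depth."""
    return baseline + depth * np.cos(theta - preferred)

def ml_decode(counts, baseline, depth, preferred, duration, n_grid=360):
    """Grid-search maximum likelihood direction for independent Poisson cells."""
    grid = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    # Expected spike counts for every candidate direction: (n_grid, n_cells).
    lam = duration * tuning(grid[:, None], baseline, depth, preferred[None, :])
    # Poisson log-likelihood up to a term that does not depend on direction.
    loglik = np.sum(counts * np.log(lam) - lam, axis=1)
    return grid[np.argmax(loglik)]

# Hypothetical population and a single 0.5 s observation window.
rng = np.random.default_rng(0)
n_cells = 40
preferred = rng.uniform(0.0, 2 * np.pi, n_cells)
baseline, depth, duration = 30.0, 20.0, 0.5

true_direction = np.deg2rad(90.0)
expected = duration * tuning(true_direction, baseline, depth, preferred)
counts = rng.poisson(expected)
estimate = ml_decode(counts, baseline, depth, preferred, duration)
print(np.rad2deg(estimate))
```

Models with second-order or higher-order dependencies replace the per-cell sum in `loglik` with a joint likelihood, which is considerably more expensive to fit and evaluate.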

The finding that arm movement is well represented 
in populations of neurons recorded from the motor cor- 
tex has resulted in a rapid advancement in extracellular 
recording-based BCI in non-human primates and in 
a limited number of human studies. Several groups have 
been able to capture complex brain motor cortex signals 
by recording from neural ensembles (groups of neu- 
rons) and using these to control external devices. First, 
cortical activity patterns have been used in BCIs to
show how cursors on computer displays can be moved
in two- and three-dimensional space. It was later
realized that the ability to move a cursor can be useful in
its own right and that this technology could be applied 
to restore arm and hand function for amputees and the 
physically impaired. 

Miguel Nicolelis has been a prominent proponent 
of using multiple electrodes spread over a greater area 
of the brain to obtain neuronal signals to drive a BCI. 
Such neural ensembles are said to reduce the variability 
in output produced by single electrodes, which could 
make it difficult to operate a BCI. After conducting 
initial studies in rats during the 1990s, Nicolelis et al. 
succeeded in building a BCI that reproduced owl mon- 
key movements while the monkey operated a joystick 
or reached for food [39.72]. The BCI operated in real 
time and could also control a separate robot remotely 
over internet protocol. However, the monkeys could not 


see the arm moving and did not receive any feedback, 
a so-called open-loop BCI. 

Other laboratories which have developed BCIs and 
algorithms that decode neuron signals include those run 
by John Donoghue, Andrew Schwartz, and Richard An- 
dersen. These researchers have been able to produce 
working BCIs, even using recorded signals from far 
fewer neurons than Nicolelis used (15–30 neurons versus
50–200 neurons). Donoghue et al. reported training
rhesus monkeys to use a BCI to track visual targets on 
a computer screen (closed-loop BCI) with or without 
the assistance of a joystick [39.73]. 

Later experiments by Nicolelis using rhesus mon- 
keys succeeded in closing the feedback loop and re- 
produced monkey reaching and grasping movements 
in a robot arm. With their deeply cleft and furrowed 
brains, rhesus monkeys are considered to be better mod- 
els for human neurophysiology than owl monkeys. The 
monkeys were trained to reach and grasp objects on 
a computer screen by manipulating a joystick, while 
corresponding movements by a robot arm were hid- 
den [39.74, 75]. The monkeys were later shown the 
robot directly and learned to control it by viewing its 
movements. The BCI used velocity predictions to con- 
trol reaching movements and simultaneously predicted 
hand gripping force. 

The use of cortical signals to control a multi- 
jointed prosthetic device for direct real-time interaction 
with the physical environment (embodiment) was first 
demonstrated by Schwartz et al. [39.76]. Schwartz et al. 
implanted 96 intracortical microelectrodes in the proxi- 
mal arm region of the primary motor cortex of monkeys 
(Macaca mulatta) and used their motor cortical activ- 
ity to control a mechanized arm replica and control 
a gripper on the end of the arm. The monkey could feed 
itself pieces of fruit and marshmallows using a robotic 
arm controlled by the animal’s own brain signals. Ow- 
ing to the physical interaction between the monkey, the 
robotic arm, and the objects in the workspace, this new 
task presented a higher level of difficulty than previous 
virtual (cursor control) experiments. 

In 2012 Schwartz et al. [39.68] showed that a 52- 
year-old individual with tetraplegia who was implanted 
with two 96-channel intracortical microelectrodes in the 
motor cortex could rapidly achieve neurological control 
of an anthropomorphic prosthetic limb with seven de- 
grees of freedom (three-dimensional translation, three- 
dimensional orientation, one-dimensional grasping). 
The participant was able to move the prosthetic limb 
freely in the three-dimensional workspace on the sec- 
ond day of training. After 13 weeks, robust seven- 




dimensional movements were performed routinely. The 
participant was also able to use the prosthetic limb to 
do skillful and coordinated reach and grasp movements 
that resulted in clinically significant gains in tests of up- 
per limb function. No adverse events were reported. 

In addition to predicting kinematic and kinetic pa- 
rameters of limb movements, BCIs that predict elec- 
tromyographic or electrical activity of the muscles of 
primates are being developed [39.77]. Such BCIs may 
be used to restore mobility in paralyzed limbs by 
electrically stimulating muscles. Miguel Nicolelis and 
colleagues demonstrated that the activity of large neu- 
ral ensembles can predict arm position. This work made 
possible the creation of BCIs that read arm movement 
intentions and translate them into movements of artifi- 
cial actuators. Carmena et al. [39.74] programmed the 
neural coding in a BCI that allowed a monkey to con- 
trol reaching and grasping movements by a robotic arm. 
Lebedev et al. [39.75] argued that brain networks re- 
organize to create a new representation of the robotic 
appendage in addition to the representation of the ani- 
mal’s own limbs. 


The biggest impediment to BCI technology at 
present is the lack of a sensor modality that provides 
safe, accurate, and robust access to brain signals. It is 
conceivable or even likely, however, that such a sen- 
sor will be developed within the next 20 years. The 
use of such a sensor should greatly expand the range 
of communication functions that can be provided using 
a BCI.

To conclude, this demonstration of multi-degree-of- 
freedom embodied prosthetic control paves the way to- 
wards the development of dexterous prosthetic devices 
that could ultimately achieve arm and hand function at 
a near-natural level. 


39.8.3 Local Field Potential Correlates 
of Hand Motion Attributes 


Local field potentials can be recorded with extracellular 
recordings, and a number of studies have shown their 
application; however, as ECoG and EEG (covered in 
Sects. 39.5 and 39.6) are indirect measures of LFPs we 
do not cover LFPs here again for brevity. 


39.9 Translating Brainwaves into Control Signals — BCIs 


Heretofore the chapter has focused on the characteristics
of the neural correlates of motor control and how these
might be deployed in SMR-based EEG, ECoG, and MEG BCI
designs, providing evidence of activations at various
scales of the brain and a brief outline of individual
methodologies for attaining this evidence.

[Figure 39.9 diagram; legible block labels: hardware; patient/client/gamer; amplify-digitize-transmit-record; data acquisition; parameter selection; offline; online unsupervised adaption; post-processing; application devices with intelligence; device; optional feedback; computer monitor]


Fig. 39.9 Illustration of the vari- 
ous components of a BCI involving 
a closed-loop learning system as 
well as offline and online parameter 
optimization and system adaptation 




Brain-computer interfaces, however, require a number 
of stages of signal processing and components to be 
effective and robust. Figure 39.9 shows common com- 
ponents of a complete BCI system. Although not all 
components shown are deployed together in every sys- 
tem there is increasing evidence that combining the best 
approaches deployed for each component and process 
in a multi-stage framework as well as ensemble meth- 
ods or multi-classifier approaches can lead to significant 
performance gains when discriminating sensorimotor 
rhythms and translating brain oscillations into stable 
and accurate control signals. Performance here can be 
considered from various perspectives, including sys- 
tem accuracy in producing the correct response, the 
speed at which a response is detected (or the number 
of correct detections possible in a given period), the 
adaptability to each individual and the inherent non- 
stationary dynamics of the mutual interaction between 
the brain and the translating algorithm, the length of 
training required to reach an acceptable performance, 
the number of sensors required to derive a useful con- 
trol signal, and the amount of engagement needed by 
the participant, to name but a few. The following sections
highlight some of the methods which have been tried and
tested in sensorimotor rhythm BCIs; however, the coverage
is by no means exhaustive. The main emphasis is on
EEG-based BCI designs, as EEG-based BCI has been the
driving force behind much of the novel signal processing
research conducted in the field over the last 20 years.
The more invasive approaches are considered less usable
in the short term and high risk for experimentation and
deployment in humans, and they have attracted less
funding and offer less data availability.

EEG poses the most challenges for engineers,
mathematicians, and computer scientists. It is the least
informative modality, spectrally and spatially, about the
underlying brain processes; it is subject to deterioration
and spatial diffusion caused by the physical properties of
the cerebrospinal fluid, skull, and skin; and it is highly
susceptible to contamination from other sources such as
muscle and eye movements. Researchers in these, among many
other disciplines, are eager to solve a problem which has
long dogged the field, namely, creating an EEG-based BCI
which is accurate and robust across time for individual
subjects and can be deployed across multiple subjects
easily to offer a communication channel which at least
matches other basic, tried and tested computer peripheral
input devices and/or basic assistive communication
technologies. Signal processing, as shown in Fig. 39.10,
is only one piece of the puzzle, with a range of other
components being equally important, including electrode
technologies and hardware, which are critical to data
quality, usability, and acceptability of the system.
Additionally, the technologies and devices under the control
of the BCI are another aspect, which is not dealt with
here but requires investigation to determine how
applications can be adapted to cope with the, as yet,
inevitable inconsistencies in the communication and control
signals derived from the BCI. Here our intention is not to
deal with these elements of brain-computer interfaces but
only to provide the reader with an indicative overview of
key signal processing and discrimination topics under
consideration in the area, including topics that have
perhaps not received the attention they deserve but show
promise. Interested readers are referred to [39.78-87] for
comprehensive surveys of BCI-control strategies and
signal-processing strategies.


39.9.1 Pre-Processing 
and Feature Extraction/Selection 


Oscillatory and rhythmic activity in various frequency
bands is a predominant feature in sensorimotor
rhythm-based BCIs, as outlined in Sect. 39.7.1. Whilst
the amplitude of power in subject-specific sub-bands has
proven to be a reliable feature to enable discrimination
of the lateralized brain activity associated with
gross arm movement imagination from EEG, there is
a general consensus that much more information must be
extracted about spatial and temporal relationships by
correlating the synchronicity, amplitude, phase, and
coherence of oscillatory activity across distributed brain
regions. To that end, spectral filtering is often
accompanied by spatial pattern estimation techniques and
channel selection techniques, along with other
preprocessing techniques to detect signal sources and
remove noise. These include principal component analysis
(PCA) and independent component analysis (ICA), among
others, whilst the most commonly used is the common
spatial patterns (CSP) approach [39.88-91].

[Figure: four panels (left vs. rest, right vs. rest, foot vs. rest, tongue vs. rest), each showing relative variance levels from low to high for the four classes]

Fig. 39.10 Hypothetical relative variance levels of the CSP transformed surrogate data

Many of these methods involve linear transformations
where a set of possibly correlated observations is
transformed into a set of uncorrelated variables and can
be used for feature dimensionality reduction, artifact
removal, and channel selection. CSP is by far the most
commonly deployed of all these filters in sensorimotor
rhythm-based BCIs.

CSP maximizes the ratio of class-conditional variances
of EEG sources [39.88, 89]. To utilize CSP, $\Sigma_1$
and $\Sigma_2$ are the pooled estimates of the covariance
matrices for two classes, as follows

$$\Sigma_c = \frac{1}{I_c} \sum_{i=1}^{I_c} X_i X_i^{t}, \quad c \in \{1, 2\}, \tag{39.5}$$

where $I_c$ is the number of trials for class $c$ and $X_i$ is the
$M \times N$ matrix containing the windowed segment
of trial $i$; $N$ is the window length and $M$ is the number
of EEG channels. The two covariance matrices, $\Sigma_1$
and $\Sigma_2$, are simultaneously diagonalized such that the
eigenvalues sum to 1. This is achieved by calculating
the generalized eigenvectors $W$

$$\Sigma_1 W = (\Sigma_1 + \Sigma_2) W D, \tag{39.6}$$

where the diagonal matrix $D$ contains the eigenvalues
of $\Sigma_1$ and the column vectors of $W$ are the filters for
the CSP projections. With this projection matrix the
decomposition mapping of the windowed trials $X$ is given
as

$$E = W X. \tag{39.7}$$


To generalize CSP to three or more classes (the multi-class
paradigm), spatial filters can be produced for
each class vs. the remaining classes (one vs. rest
approach). If $q$ is the number of filters used and $C$ is
the number of classes, then there are $q \times C$ surrogate
channels from which to extract features. To illustrate how
CSP enhances separability among four classes, the
hypothetical relative variance levels of the data in each
of the four classes are shown in Fig. 39.10.
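
A minimal two-class sketch of the CSP computation in (39.5) to (39.7) is given below. It solves the generalized eigenvalue problem by whitening the composite covariance and then diagonalizing the whitened class-1 covariance. The synthetic data, the function names, and the log-variance feature normalization are assumptions made for illustration; the filters are stacked as rows before projection so that the mapping takes the form $E = WX$.

```python
import numpy as np

def pooled_covariance(trials):
    """Average of X_i X_i^t over trials; trials has shape (I_c, M, N)."""
    return np.mean([X @ X.T for X in trials], axis=0)

def csp_filters(trials_1, trials_2):
    """Return W (filters as columns) and eigenvalues d, sorted descending,
    such that S1 W = (S1 + S2) W diag(d)."""
    S1 = pooled_covariance(trials_1)
    S2 = pooled_covariance(trials_2)
    # Whitening transform P with P (S1 + S2) P^t = I.
    evals, evecs = np.linalg.eigh(S1 + S2)
    P = (evecs / np.sqrt(evals)).T
    # Diagonalize the whitened class-1 covariance.
    d, B = np.linalg.eigh(P @ S1 @ P.T)
    order = np.argsort(d)[::-1]
    d, B = d[order], B[:, order]
    return P.T @ B, d

def log_variance_features(X, W, q=1):
    """Project one trial with the q first and q last filters (stacked as rows)
    and return normalized log-variances of the surrogate channels."""
    filters = np.hstack([W[:, :q], W[:, -q:]]).T
    E = filters @ X
    var = E.var(axis=1)
    return np.log(var / var.sum())

# Synthetic data (assumed): class 1 is strong on channel 0, class 2 on channel 1.
rng = np.random.default_rng(0)
M, N, n_trials = 4, 200, 30
scale_1 = np.array([3.0, 1.0, 1.0, 1.0])
scale_2 = np.array([1.0, 3.0, 1.0, 1.0])
trials_1 = rng.standard_normal((n_trials, M, N)) * scale_1[:, None]
trials_2 = rng.standard_normal((n_trials, M, N)) * scale_2[:, None]

W, d = csp_filters(trials_1, trials_2)
f1 = log_variance_features(trials_1[0], W)
f2 = log_variance_features(trials_2[0], W)
print(d, f1, f2)
```

Because the whitened covariances of the two classes sum to the identity, a filter with eigenvalue close to 1 yields high variance for class 1 and low variance for class 2, and vice versa for eigenvalues close to 0. A one vs. rest extension would call `csp_filters` once per class with the remaining classes pooled as the second argument.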

CSP has been modified and improved substan- 
tially using numerous techniques and deployed and 
tested in BCIs [39.88-92]. CSP is commonly ap- 
plied with spectral filters. One of the more successful 
approaches to spectral filtering combined with CSP 
is the filter bank CSP approach [39.93, 94]. Another
promising technique for the analysis of multi-modal,
multi-channel, multi-task, multi-subject, and
multi-dimensional data is multi-way (array) tensor
factorization/decomposition [39.95]. The technique has been
shown to have the ability to discriminate between dif- 
ferent conditions, such as right hand motor imagery, 
left hand motor imagery, or both hands motor imagery, 
based on the spatiotemporal features of the different 
EEG tensor factorization components observed. 

Due to the short sequences of events during motor 
control it is likely that assessment of activity at a fine 
granularity such as the optimal embedding parameters 
for prediction as well as the predictability of EEG over 
short and long time spans and across channels will also 
provide clues about the temporal sequences of motor 
planning and activations and the motion primitives in- 
volved in different hand movement trajectories. Work 
has shown that subject, channel, and class-specific op- 
timal time embedding parameter selection using partial 
mutual information improves the performance of a pre- 
dictive framework for EEG classification in SMR-based 
BCIs [39.92, 96-102]. Many other time series modeling,
embedding, and prediction approaches based on traditional
and computational intelligence techniques, such as fuzzy
and recurrent neural networks (FNN and RNN), have
been promoted for EEG preprocessing and feature
extraction to maximize signal separability [39.19, 66-69,
103].
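
To make the notion of time embedding parameters concrete, the following sketch builds a delay-embedded matrix from a single channel and fits a one-step-ahead linear predictor by least squares. It only illustrates what the embedding dimension and delay parameterize; it does not implement the partial mutual information selection or the neural network predictors cited above, and the function name `delay_embed` and the test signal are assumptions.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Rows are [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]] for every
    t at which the full history is available."""
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * tau
    if start >= x.size:
        raise ValueError("series too short for this embedding")
    return np.column_stack(
        [x[start - k * tau : x.size - k * tau] for k in range(dim)]
    )

# Assumed test signal: a 10 Hz sinusoid sampled at 250 Hz.
fs, f0 = 250.0, 10.0
omega = 2 * np.pi * f0 / fs
x = np.sin(omega * np.arange(500))

dim, tau = 2, 1
embedded = delay_embed(x, dim, tau)
inputs = embedded[:-1]                 # history available at time t
targets = x[(dim - 1) * tau + 1 :]     # value at time t + 1
coef, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
prediction_error = targets - inputs @ coef
print(coef, np.abs(prediction_error).max())
```

For a pure sinusoid a two-tap predictor is exact, so the prediction error carries no information; for EEG, the residual and its dependence on `dim` and `tau` per subject, channel, and class is what a predictive framework would exploit.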

The above preprocessing or filtering frameworks 
have been used extensively, yet rarely independently, 
but in conjunction with a stream of other signal pro- 
cessing methodologies to extract reliable information 
from neural data. It is well known that the amplitude 
and the phase of neural oscillations are spatially and 
temporally modulated during sensorimotor processing 
(see Sects. 39.6 and 39.7 for further details). Spectral
information and band power extraction have been
commonly used as features (see [39.82, 83] for reviews);
however, phase and cross-frequency coupling less so,
even though a number of non-invasive and intracortical 




studies have emphasized the importance of phase infor- 
mation [39.33, 35, 104, 105]. Furthermore, amplitude- 
phase cross-frequency coupling has been suggested 
to play an important role in neural coding [39.106]. 
While neural representations of movement kinematics
and movement imagination by amplitude information
in sensorimotor cortex have been extensively reported
using different oscillatory signals (LFP, ECoG, MEG,
EEG) [39.33, 35, 39, 107-111] and used extensively in
non-invasive motor imagery-based BCI designs, phase
information has not been given as much attention as
possibly deserved [39.35]. As reported in [39.35], there
have been some recent developments describing
synchronized activity between M1 and hand speed [39.81,
82], corticomuscular coupling [39.112], and the LFP
beta oscillations phase locked to target cue onset in 
an instructed-delay reaching task [39.113], in addition 
to the studies covered in Sects. 39.6 and 39.7, among 
others. The role of phase coding in the sensorimotor 
cortex should be further explored to fully exploit the 
complementary information encoded by amplitude and 
phase [39.35]. 
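
As a hedged illustration of how amplitude and phase can be obtained as complementary features, the sketch below computes the analytic signal with an FFT-based Hilbert transform and derives the instantaneous amplitude, the instantaneous phase, and a phase locking value between two channels. It assumes the signals have already been band-pass filtered to a narrow band; the function names and the synthetic 10 Hz signals are not taken from the cited studies.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: zero the negative frequencies and
    double the positive ones."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def amplitude_and_phase(x):
    """Instantaneous amplitude and phase (radians) of a narrow-band signal."""
    z = analytic_signal(x)
    return np.abs(z), np.angle(z)

def phase_locking_value(x, y):
    """Magnitude of the mean phase-difference vector; 1 means perfectly locked."""
    _, phase_x = amplitude_and_phase(x)
    _, phase_y = amplitude_and_phase(y)
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Assumed example: two 10 Hz oscillations with a fixed phase lag of pi/4.
fs = 250.0
t = np.arange(250) / fs
x = np.cos(2 * np.pi * 10.0 * t)
y = 0.5 * np.cos(2 * np.pi * 10.0 * t - np.pi / 4)

amp_y, _ = amplitude_and_phase(y)
plv = phase_locking_value(x, y)
print(amp_y.mean(), plv)
```

The two channels differ in amplitude yet remain perfectly phase locked, which is the kind of relationship that a band-power feature alone would not capture.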

Parameter optimization can be made more efficient
through global searches of the parameter space
using evolutionary computation-based approaches such
as particle swarm optimization (PSO) and genetic
algorithms (GAs). The importance of features can be
assessed and ranked for different tasks using feature
selection techniques based on information theoretic
approaches such as partial mutual information-based
(PMI) input variable selection [39.98, 114]. Parameter
optimization and feature selection approaches such
as these enable coverage of a large parameter space
when additional features are identified to enhance
performance. Heuristic-based approaches can be used to
determine the relative increase in classification accuracy
associated with each variable, along with other more
advanced methods for feature selection such as Fisher's
criterion and partial mutual information to estimate the
level of redundancy among features. Verifying the feature
landscape using global heuristic searches is important
initially, whereas automated intelligent approaches enable
efficient system optimization during application at a later
time and easy application to a large sample of participant
data, i.e., they remove the necessity
to conduct global parameter searches.
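
One of the simpler ranking criteria mentioned above, Fisher's criterion, can be sketched as follows for two classes. The score used here, squared difference of class means divided by the sum of class variances, is one common form of the criterion and is an assumption of this example, as are the synthetic features and the function name `fisher_score`. It scores each feature independently and therefore does not estimate redundancy among features the way partial mutual information does.

```python
import numpy as np

def fisher_score(features, labels):
    """Per-feature two-class Fisher criterion: (m0 - m1)^2 / (v0 + v1).
    features: (n_trials, n_features); labels: array of 0/1."""
    a = features[labels == 0]
    b = features[labels == 1]
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0))

# Synthetic features (assumed): only feature 1 separates the two classes.
rng = np.random.default_rng(0)
n = 100
labels = np.repeat([0, 1], n)
features = rng.standard_normal((2 * n, 3))
features[labels == 1, 1] += 2.0

scores = fisher_score(features, labels)
ranking = np.argsort(scores)[::-1]   # most discriminative feature first
print(scores, ranking)
```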


39.9.2 Classification 


Various classifier techniques can be applied to the sam- 
pled data to determine classification/prediction perfor- 


mance, including standard linear methods such as linear 
discriminant analysis (LDA), support vector machines 
(SVM), and probabilistic-based approaches [39.115], 
as well as non-linear approaches such as backprop- 
agation neural networks (NN) and self-organizing 
fuzzy neural networks (SOFNN) [39.116]. Other adap- 
tive methods and approaches to classifier combination 
have been investigated [39.87, 88, 117, 118] along with 
Type-2 fuzzy logic approaches to deal with
uncertainty [39.119, 120]. Recent evidence has shown that
probabilistic classifier vector machines (PCVM) have
significant potential to outperform other tried and tested
classifiers [39.121, 122]. These are just a few of the
available approaches (see [39.82] for a more detailed re- 
view). Here we focus on one of the latest trends in BCI 
translation algorithms, i.e., automated adaptation to 
non-stationary changes in the EEG dynamics over time. 
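
For orientation, a static two-class LDA of the kind listed above can be written compactly. This is a minimal sketch assuming equal class priors and a shared (pooled) covariance; the feature data are synthetic and the function names are hypothetical. A classifier trained this way is fixed after calibration, which is exactly the situation in which the non-stationarity discussed next degrades performance.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA with pooled within-class covariance and equal priors.
    Returns weight vector w and bias b; class 1 is predicted when X @ w + b > 0."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

def predict_lda(X, w, b):
    """Hard class decisions from the signed distance to the hyperplane."""
    return (X @ w + b > 0).astype(int)

# Synthetic two-dimensional features (assumed), e.g., two log-variance features.
rng = np.random.default_rng(0)
n = 100
y = np.repeat([0, 1], n)
X = rng.standard_normal((2 * n, 2))
X[y == 1] += 3.0

w, b = fit_lda(X, y)
accuracy = np.mean(predict_lda(X, w, b) == y)
print(accuracy)
```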


39.9.3 Unsupervised Adaptation 
in Sensorimotor Rhythms BCIs 


EEG signals deployed in BCI are inherently non-stationary
and change substantially over time, both within a single
session and between sessions, which poses significant
challenges in maintaining BCI system robustness. There
are various sources of non-stationarity: short-term changes
related to modifications of the strategy that users apply
to motor imagery to enhance performance, drifts in
attention, attention to different stimuli or processing of
other thoughts or stimuli/feedback, slow cortical potential
drifts, and less specific long-term changes related to
fatigue and small day-to-day differences in the placement
of electrodes, among others. However, one important
potential source of change over time is user adaptation
through motor learning to improve BCI performance,
sometimes referred to as the effects of feedback
training [39.123, 124] and sensorimotor learning.

The effects of feedback on the user’s ability to pro- 
duce consistent EEG, as he/she begins to become more 
confident and learns to develop more specific com- 
munication and control signals, can have a negative 
effect on the BCI’s feature extraction procedure and 
classifier. During sensorimotor learning the temporal 
and spatial activity of the brain continually adapts and 
the features which were initially suited to maximizing 
the discrimination accuracy may not remain stable as 
time evolves, thus degradation in communication oc- 
curs. For this reason, the BCI must have the ability to 
adapt and interact with the adaptations that the brain 
makes in response to the feedback. According to Wolpaw
et al. [39.125] the BCI operation depends on the
interaction of two adaptive controllers: the user's brain,
which produces the signals measured by the BCI, and
the BCI itself, which translates these signals into
specific commands.

With feedback, even though classification accuracy
is expected to improve with an increasing number of
experiments, the performance has been shown to
decrease with time if the classifier is not updated [39.43].
This has been referred to as the man-machine learning
dilemma (MMLD), meaning that the two systems
involved (man and machine) are strongly interdependent
but cannot be controlled or adapted in parallel [39.43].
Experiments in many studies show that feedback results
in changing EEG patterns, and thus adaptation of the
pattern recognition methods is required. It is, therefore,
paramount to adapt a BCI periodically or, if possible,
continuously. Autonomous adaptive system design is
required but remains a challenge. The recognition
and productive engagement of adaptation will be important
for successful BCI operation. According to Wolpaw
et al. [39.125] there are three levels of adaptation which
are not always accounted for but have great importance
for future adoption of BCI systems:


1. When a new user first accesses the BCI, the algo- 
rithm adapts to the user’s signal features. 

• No two people are the same physiologically
or psychologically, therefore brain topography 
differs among individuals, and the electrophysi- 
ological signals that are produced from different 
individuals are unique to each individual, even 
though they may be measured from the same lo- 
cation on the scalp whilst performing the same 
mental tasks at the same time. For each new 
user the BCI has to adapt specifically to the 
characteristics of each particular person’s EEG. 
This adaptation may be to find subject-specific 
frequency bands which contain frequency com- 
ponents that enable maximal discrimination ac- 
curacy between two mental tasks or train a static 
classifier on a set of extracted features. 

2. The second level of adaptation requires that the 
BCI system components be periodically adjusted or 
adapted online to reduce the impact of spontaneous 
variations in the EEG. 

• Any BCI system which only possesses the first
level of adaptation will continue to be effective 
only if the user’s performance is very stable. 
Most electrophysiological signals display short 
and long-term variations due to the complexity 


of the physiological functioning of the underly- 
ing processes in the brain, among other sources 
of change as outlined above. The BCI sys- 
tem should have the ability to accommodate 
these variations by adapting to the signal fea- 
ture values which maximally express the user’s 
intended communication. 
3. The third level of adaptation accommodates and engages
the adaptive capacities of the brain.

• The BCI depends on the interaction of two
adaptive controllers, the BCI and the user’s 
brain [39.125]: 


When an electrophysiological signal feature 
that is normally merely a reflection of brain 
function becomes the end product of that func- 
tion, that is, when it becomes an output that 
carries the user intent to the outside world, it 
engages the adaptive capacities of the brain. 


This means that, as the user develops the skill 
of controlling their EEG, the brain has learned 
a new function and, hopefully, the brain’s newly 
learned function will modify the EEG so as to 
improve BCI operation. The third level of adap- 
tation should accommodate and encourage the 
user to develop and maintain the highest pos- 
sible level of correlation between the intended 
communication and the extracted signal features 
that the BCI employs to decipher the intended 
communication. Due to the nature of this adap- 
tation (the continuous interaction between the 
user and the BCI) it can only be assessed on- 
line and its design is among the most difficult 
problems confronting BCI research. 


McFarland et al. [39.126] further categorize adaptation
into system adaptation, user adaptation, and system
and user co-adaptation, asking the question: is it
necessary to continuously adapt the parameters of the BCI
translation algorithm? Their findings show that it is for
sensorimotor rhythm BCIs, whereas it is perhaps not
for other stimulus-based BCIs.

A review of adaptation methods is provided by
Hasan [39.127], focusing on the questions of what, how, and
when to adapt and how to evaluate adaptation
success. A range of studies has been aimed at addressing
the adaptation requirements [39.123, 124, 128-137].
Krusienski et al. [39.35] define the various types of
possible adaptation as follows:


• Covariate shift adaptation/minimization: Covariate
shift refers to the case where the training features and
test features follow different distributions, while the
conditional distribution of the output values (of the
classifier) given the features is unchanged [39.138]. The
shift in feature distribution from session to session can
be significant and can result in substantive biasing
effects. Without some form of adaptation of the features
and/or classifier, the classifier trained on a past session
would perform poorly in a more recent session.
Satti et al. [39.139] proposed a method for covariate
shift minimization (CSM), where features can
be adapted so that the feature distribution is always
consistent with the distribution of the features that
were used to train the classifier in the first session.
This can be achieved in an unsupervised manner
by estimating the shift in distribution using a least
squares fitting polynomial for each feature and
removing the shift by adding the common mean of
the training feature distribution so that the feature
space distribution remains constant over time, as
described in [39.139]. Mohammadi et al. [39.140]
applied CSM in a self-paced BCI, updating features to
account for short-term (within-trial) drifts in signal
dynamics. In [39.138] an importance-weighted
cross-validation for accommodating covariate shift
under a number of assumptions is described, but it
is not adaptively updated online in an unsupervised
manner, whereas other offline approaches have
been investigated to enable feature extraction methods
to accommodate non-stationarity and covariate
shifts [39.90, 91].

• Feature adaptation/regression: Involves adapting
the parameters of the feature extraction methods
to account for subject learning. For example, modifying
the subject-specific frequency bands can be easily
achieved in a supervised manner, but this is not
necessarily easily achieved online and unsupervised. An
approach to adaptively weight features based on mu
and beta rhythm amplitudes and their interactions
using regression [39.4] resulted in significant
performance improvements and may be adapted for
unsupervised feature adaptation.

Covariate shift adaptation/minimization can be considered
an anti-biasing method because it prevents classifier
biasing, whereas feature adaptation/regression is likely
to result in the need to adapt the classifier to suit the
new feature distributions. Both methods help to improve
the performance over time, but it is uncertain whether
feature adaptation followed by covariate shift
minimization (to shift features towards the earlier
distribution) would limit the need for classifier
adaptation and/or provide stable performance, or negate
the benefits of feature regression. An interesting
discussion on the interplay between feature regression and
adapting bias and gain terms in the classifier is
presented in [39.4].

Classifier adaptation: Unsupervised classifier adaptation
has received more attention than feature
adaptation, with a number of methods having been
proposed [39.124, 128]. Classifier adaptation is required
when learning (or relearning)-induced plasticity
in the brain significantly alters
the brain dynamics, resulting in a shift in feature
distribution as well as significant changes in the
conditional distribution between features and classifier
output, as opposed to cases where only covariate
shift has occurred. In such cases, classifier adaptation
can be referred to as neither anti-biasing nor
de-biasing.

Post-processing adaptation: De-biasing the clas- 
sifier output, in its simplest form, can be per- 
formed in an unsupervised manner by removing 
the mean calculated from a window of recent clas- 
sifier outputs from the instantaneous value of the 
classifier [39.141], also referred to as normalization 
in [39.142], where the data from recent trials are 
used to predict the mean and standard deviation of 
the next trial and the data of the next trial is then 
normalized by these estimates to produce a control 
signal which is assumed to be stationary. De-biasing 
is suitable when covariate shift has not been ac- 
counted for and can improve the online feedback 
response but may only provide a slight performance 
improvement. 

EEG data space adaptation (EEG-DSA): Acts on
the raw data space and is a new approach that linearly
transforms the EEG data from the target space
(evaluation/testing session) such that the distribution
difference to the source space (training session)
is minimized [39.143]. The Kullback-Leibler (KL)
divergence criterion is the main method deployed
in this approach, and it can be applied in a supervised
or unsupervised manner, either periodically or
continuously. Other adaptations (feature space or
classifier) can be applied in tandem, but accurate
minimization of feature space adaptation should
negate the need for further anti-biasing and/or
de-biasing adaptations.
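
As a rough illustration of the covariate shift minimization idea described in the list above, the following Python sketch fits a least-squares trend to a single feature over a later session, removes it, and re-centres the feature on the training mean. It is a minimal sketch under stated assumptions, not the method of [39.139]: the first-order polynomial, the batch (whole-session) fit, the synthetic data, and the names fit_line, csm_adapt, and gaussian_kl are all illustrative choices. The Gaussian KL divergence is used here only as a diagnostic of how far the later-session feature distribution lies from the training distribution; it is not the EEG-DSA optimization of [39.143].

```python
import math
import random
from statistics import fmean, pvariance


def fit_line(ys):
    """Least-squares fit of y = a + b*t over t = 0..n-1 (a first-order polynomial)."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = fmean(ys)
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = s_ty / s_tt
    return y_mean - slope * t_mean, slope


def csm_adapt(session, train_mean):
    """Remove the fitted drift from one feature and re-centre it on train_mean.

    No class labels are used, so the step is unsupervised.
    """
    a, b = fit_line(session)
    return [y - (a + b * t) + train_mean for t, y in enumerate(session)]


def gaussian_kl(p, q):
    """KL(p || q) for univariate Gaussians fitted to the samples p and q."""
    mp, mq = fmean(p), fmean(q)
    vp, vq = pvariance(p), pvariance(q)
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
train = [rng.gauss(1.0, 0.5) for _ in range(200)]
# Later session: same noise level, but the feature is offset and drifts upwards.
test = [rng.gauss(1.0, 0.5) + 0.5 + 0.01 * t for t in range(200)]

adapted = csm_adapt(test, fmean(train))
kl_before = gaussian_kl(test, train)
kl_after = gaussian_kl(adapted, train)
print(f"KL before: {kl_before:.3f}, after: {kl_after:.3f}")
```

In an online setting the fit would presumably be made over a sliding window of recent trials rather than a whole session; that choice, like the polynomial order, is left open here.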


Classifier adaptation (anti-biasing) negates the need
for post-processing (de-biasing) if the classifier is updated
continuously, but this is a challenging task to
undertake in an unsupervised manner (with no class
labels) and may result in maladaptation, whereas
de-biasing can be conducted easily, unsupervised, regardless
of the classifier used. Because post-processing-based
de-biasing only removes bias from the feedback signal
(shifts it to zero mean) and does not necessarily
change the dynamics of the feedback signal, feature
adaptation or classifier adaptation is necessary during
subject learning and adaptation, as the conditional
distribution between features and classifier output evolves
as outlined above.
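
The simplest form of post-processing de-biasing described above can be sketched as follows: the mean (and optionally the standard deviation) of a window of recent classifier outputs is used to normalize the next output. This is a minimal sketch, not the implementation of [39.141] or [39.142]; the class name OutputDebiaser, the window length, and the synthetic classifier outputs are assumptions made for illustration.

```python
from collections import deque
from statistics import fmean, pstdev


class OutputDebiaser:
    """Remove a running mean (and optionally scale) from classifier outputs."""

    def __init__(self, window=50, scale=False):
        self.recent = deque(maxlen=window)  # most recent raw outputs
        self.scale = scale

    def __call__(self, value):
        # Estimates come from past outputs only, then are applied to the new one.
        if self.recent:
            mean = fmean(self.recent)
            std = pstdev(self.recent) if self.scale and len(self.recent) > 1 else 1.0
        else:
            mean, std = 0.0, 1.0
        self.recent.append(value)
        return (value - mean) / (std or 1.0)


# Synthetic classifier output alternating between two classes with a bias of +2.
raw = [2.0 + (1.0 if i % 2 else -1.0) for i in range(200)]
debias = OutputDebiaser(window=50)
control = [debias(v) for v in raw]
print(fmean(raw[100:]), fmean(control[100:]))
```

No class labels enter the computation, which is why this step is easy to run unsupervised with any classifier. It shifts the output but leaves the separation between the classes unchanged, which matches the remark above that de-biasing does not change the dynamics of the feedback signal.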

All of the above methods are heavily dependent
upon the context in which the BCI is used. For example,
for a BCI applied in alternative communication the
objective is to maximize the probability of interpreting
the user’s intent correctly, and the adaptation
is performed with that objective. If instead the BCI
is aimed at inducing neuroplastic changes in specific
cortical areas, e.g., a BCI that supports
stroke survivors in performing motor imagery as a means
of enhancing the speed or level of rehabilitation post
stroke, the objective is not only to provide accurate
feedback but also to encourage the user to activate regions
of cortex which do not necessarily provide optimal
control signals [39.35]. The latter may require
electrode/channel adaptation strategies but not necessarily
in a fast online unsupervised manner. Abrupt changes
to classifier performance may also lead to negative
learning where the user cannot cope with the rate at
which the feedback dynamics change; in such cases
consistent feedback, even though less accurate, may
be appropriate. As outlined in [39.126], there is still
debate around whether mutual adaptation of a system
and user is a necessary feature of successful BCI
operation or whether fast adaptation of parameters during
training is unnecessary. A recent study in animal
models suggests that there is no negative correlation
between decoding performance and the time between
model generation and model testing, which suggests
that the neural representations that encode kinematic
parameters of reaching movements are stable across
the months of study [39.147, 148]. This further suggests
that little adaptation is needed for ECoG decoding in
animal models, but this may not necessarily translate
to humans and non-invasive BCIs involving motor imagery.
Much more research is needed on which
adaptation methods to apply and at what rate adaptation
is necessary. Another important factor to consider is
a person’s level of ability to control a BCI: those
persons close to chance levels may actually benefit from
an incorrect belief about their performance level [39.149].
This would imply adapting the classifier output based
on knowledge of the targets in a supervised manner,
such that the user thinks they are performing better,
a method which may help in the initial training phases
to improve BCI performance [39.149]. Most of the
techniques outlined above have been tested offline and,
therefore, there is a need to assess how the techniques
improve performance as the user and BCI are mutually
adapted. Table 39.1 provides a summary of the
categories of adaptation and their interrelationships and
requirements.

[Figure 39.11: bar chart of communication speed (bit/s, axis 0 to 45) for the
communication methods Reading, Spoken English (a), Gestural: ASL,
Spoken English (b), Braille, Eye tracker, Fingerspelling, Foot pedal, Hand,
Mouse, Joystick, Morse code (c), Trackball, Morse code (d), Touchpad, and
Brain-computer interface]

Fig. 39.11 Comparison of communication rates between humans
and the external world: (a) speech received auditorily; (b) speech
received visually using lip reading and supplemented by cues;
(c) Morse code received auditorily; (d) Morse code received
through vibrotactile stimulation (figure adapted from [39.144] with
permission and other sources [39.145, 146])


39.9.4 BCI Outlook 


Translating brain signals into control signals is a complex
task. The communication bandwidth offered by BCI
still lags behind that of most other methods of
communication between humans and the external world:
the maximum BCI communication rate is ~0.41 bit/s
(~25 bit/min) [39.144]. Fig. 39.11 illustrates
the gap in communication bandwidth between BCI and
other communication methods, as well as the relatively
low communication bandwidth across all human-human and
human-computer interaction methods.

Nevertheless, with the many developments and studies
highlighted throughout this chapter (a selected few
among many) there has been progress, yet there is
still debate around whether invasive recordings are
more appropriate for BCI, with findings showing that
performance to date is not necessarily better, or
communication faster, with invasive or extracellular recordings
compared to EEG (see Fig. 39.12 for an illustration
[39.58]). As shown, performance is far less consistent
than a joystick for 2-D center-out tasks using
both methods; however, the performance is remarkably
similar even though the extracellular recordings
are high resolution and EEG is low resolution. Training
rates/durations with invasive BCI are probably less
onerous on the BCI user compared to EEG-based approaches,
which often require longer durations; however,
only a select few are willing to undergo surgery
for BCI implants due to the high risk associated with
the surgery required, at least with the currently available
technology. This is likely to change in the future, and
information transfer between humans and machines is
likely to increase to overcome the communication bottleneck
in human-human and human-computer interaction
by directly interfacing brain and machine [39.144].
One limitation that dogs many movement or
motor-related BCI studies is that control relies in large
part on a signal from only a single cortical
area [39.58]. Exploiting multiple cortical areas may offer
much more, and this may be achieved more easily
and successfully by exploiting information acquired at
different scales using both invasive and non-invasive
technologies (many of the studies reported in this chapter
have shown advantages that are unique at the various
scales of recording). Carmena [39.151] recommends
that non-invasive BCIs should not be pitted against
invasive ones, as both have pros and cons and the field has
gone beyond pitching resolution as an argument to use
one type or the other. In the future, BCI systems may
very well become a hybrid of different kinds of neural
signals, able to benefit from local, high-resolution information
(for generating motor commands) and more
global information (arousal, level of attention, and other
cognitive states) [39.151].

Table 39.1 BCI components that can be adapted and the way in which they can be adapted. The table also gives the
interrelationships between components, i.e., when one component is adapted, which other components or stages of the
signal processing pipeline it might be necessary to adapt (whether a calibration session is needed for offline setup, or
whether a certain number of trials is needed before adaptation begins, is not specified in the criteria but is another
consideration)

[Table 39.1 body: the rows are the adaptation types Channel adaptation, Data space adaptation, Feature regression,
Covariate shift minimization, Covariate shift adaptation, Classifier adaptation, and Gain/bias; the columns are
Anti-biasing, De-biasing, Subject relearning, Feature updates, Classifier adaptation, Online, Supervised, Unsupervised,
and Performance improvement likely. The individual cell marks are not legible in this copy; the one readable entry is
"Slight" under Performance improvement likely in the Gain/bias row.]

[Figure 39.12: distributions of movement time (s) for Joystick, EEG, and Cortical neurons]

Fig. 39.12 Distributions of target-acquisition times (i.e.,
time from target appearance to target hit) on a 2-D
center-out cursor-movement task for joystick control,
EEG-based BCI control, and cortical neuron-based BCI
control. The EEG-based and neuron-based BCIs perform
similarly and both are slower and much less
consistent than the joystick. For both BCIs, in a substantial
number of trials the target is not reached
even in the 7 s allowed. Such inconsistent performance
is typical of movement control by present-day
BCIs, regardless of what brain signals they use. (Joystick
data and neuron-based BCI data from Hochberg
et al. [39.150]; EEG-based BCI data from Wolpaw and
McFarland [39.57]; figure after [39.58], courtesy of
McFarland et al.)

In summary, BCI technology is developing through
a better understanding of the motor system and sensorimotor
control, better recording technologies, better
signal processing, more extensive trials with users,
long-term studies, and more multi-disciplinary interactions,
among many other factors. According to a report
by Berger et al. [39.152], the magnitude of BCI
research throughout the world will grow substantially,
if not dramatically, in future years, with multiple driving
forces:


• Continued advances in underlying science and technology

• Increasing demand for solutions to repair the nervous
system

• Increase in the aging population worldwide; a need
for solutions to age-related, neurodegenerative disorders,
and for assistive BCI technologies

• Commercial demand for non-medical BCIs.


BCI has the potential to meet many of these chal- 
lenges in healthcare and is already growing in popular- 
ity for non-medical applications. BCI is considered by 
many as a revolutionary technology. 

An analysis of the history of technology shows
that technological change is exponential and, according
to the law of accelerating returns, as the technology
performance increases, more and more user groups
begin to adopt the technology and prices begin to
fall [39.153]. In terms of BCI there has been significant
progress over recent years, and these trends
are being observed with increasing technology diffusion
[39.144]. In terms of research there has been an
exponential growth in the number of peer-reviewed
publications since 2000 [39.84].

Many studies over the past two decades have
demonstrated that non-muscular communication, based
on brain-computer interfaces (BCIs), is possible and,
despite the nascent nature of BCIs, there is already
a range of products, including alternative communication
and control for the disabled, stroke rehabilitation,
electrophysiologically interactive computer
systems, neurofeedback therapy, and BCI-controlled
robotics/wheelchairs. A range of case studies have also
shown that head trauma victims diagnosed as being
in a persistent vegetative state (PVS) or a minimally
conscious state and patients suffering locked-in syndrome
as a result of motor neuron disease or brainstem
stroke can specifically benefit from current BCI systems,
although, as BCIs improve and surpass existing
assistive technologies, they will be beneficial to those
with less severe disabilities. In addition, the possibility
of enriching computer game play through BCI also
has immense potential, and computer games as well
as other forms of interactive multimedia are currently
engaging interface techniques for therapeutic neurofeedback
and for improving BCI performance and training
paradigms. Brain-computer games interaction provides
motivation and challenge during training, which is used
as a stepping stone towards applications that offer
enablement and assistance. Based on these projections and
the ever-increasing knowledge of the brain, the future
looks bright for BCIs.


39.10 Conclusion


The scientific approaches described throughout this
chapter often overlook the underpinning processes and
rely on correlations between a minimal number of factors
only. As a result, current sensorimotor rhythm
BCIs are of limited functionality and allow basic motor
functions (limited two degrees-of-freedom (DOF)
control of a wheelchair/mouse cursor/robotic arm)
and limited communication abilities (word dictation).
It is assumed that BCI systems could greatly benefit
from the inclusion of multi-modal data and multi-dimensional
signal processing techniques, which would
allow the introduction of additional data sources and
data from multiple brain scales, and enable detection
of more subtle features embedded in the signal. Furthermore,
using knowledge about sensorimotor control
will be critical in understanding and developing successful
learning and control models for robotic devices
and BCI, fully closing the sensorimotor learning loop
to enable finer manipulation abilities using BCIs and
for retraining or enabling better relearning of motor actions
after cortical damage. As demonstrated throughout
the chapter, many remarkable studies have been
conducted with truly inspirational engineering and scientific
methodologies resulting in many very useful and
interesting findings.

There are many potential advantages of understanding
motor circuitry, not to mention the many clinical and
quality-of-life benefits that a greater understanding of
the motor systems may provide. Such knowledge may
offer better insights into treating motor pathologies that
occur as a result of injury or diseases such as spinal
cord injury, stroke, Parkinson’s disease, Guillain-Barré
syndrome, motor diseases, and Alzheimer’s disease, to
mention just a few. Understanding sensorimotor systems
can provide significant gains in developing more
intelligent systems that can provide multiple benefits for
humanity in general. However, there are still lacunae in
our biological account of how the motor system works.

Animals have superb innate abilities to choose and
execute simple and extended courses of action and the
ability to adapt their actions to a changing environment.
We are still a long way from understanding how that
is achieved and from exploiting this to tackle the issues
outlined above comprehensively. There are a number of
key questions that need to be addressed [39.154]:


• What are the roles of the cortex, the basal ganglia,
and the cerebellum, the three major neural control
structures involved in movement planning and generation?

• How do these structures in the brain interact to deliver
seamless adaptive control?

• How do we specify how hierarchical control structures
can be learned?

• What is the relationship between reflexes, habits,
and goal-directed actions?

• Is there anything to be gained for robotic control
by thinking about how interactions are organized in
sensorimotor regions?

• Is it essential to replicate this lateralized structure in
sensorimotor areas to produce better motor control
in an artificial cognitive system?

• How can we create more accurate models of how
the motor cortex works? Can such models be implemented
to provide human-like motor control in
an artificial system?

• How can we decode motor activity to undertake
tasks that require accurate and robust three-dimensional
control under multiple different scenarios?


Wolpert et al. [39.16] elaborate on some of these
questions, in particular one which has not been addressed
in this chapter, namely modeling sensorimotor
systems. Although substantial progress has been made
in computational sensorimotor control, the field has
been less successful in linking computational models
to neurobiological models of control. Sensorimotor
control has traditionally been considered from a control
theory perspective, without relation to neurobiology
[39.155]. Although neglected in this chapter,
computational motor cortical circuit modeling will be
a critical aspect of research into understanding sensorimotor
control and learning, and is likely to fill
parts of the lacunae in our understanding that are
not accessible with current imaging, electrophysiology,
and experimental methodology. Likewise, understanding
the computations undertaken in many of the sensorimotor
areas will depend heavily on computational
modeling. Doya [39.156] noted that the classical notion
that the cerebellum and the basal ganglia are dedicated
solely to motor control is now under dispute,
given increasing evidence of their involvement in non-motor
functions. However, there is enough anatomical,
physiological, and theoretical evidence to support the
hypotheses that the cerebellum is a specialized structure
that may support supervised learning, the basal
ganglia may perform a reinforcement learning role, and
the cerebral cortex may perform unsupervised learning.
Alternative theories that enable us to comprehend
the way the cortex, cerebellum, and the basal ganglia
participate in motor, sensory, or cognitive tasks are
required [39.156].

Additionally, as has been illustrated throughout this 
work, investigating brain oscillations is key to under- 
standing brain coordination. Understanding the coor- 
dination of multiple parts of an extremely complex 
system such as the brain is a significant challenge. 
Models of cortical coordination dynamics can show 
how brain areas may cooperate (integration) and at 
the same time retain their functional specificity (seg- 
regation). Such models can exhibit properties that the 
brain is known to exhibit, including self-organization, 
multi-functionality, meta-stability, and switching. Cor- 
tical coordination can be assessed by investigating the 
collective phase relationships among brain oscillations 
and rhythms in neurophysiological data. Imaging and 
electrophysiology can be used to tackle the challenge 
of understanding how different brain areas interact and 
cooperate. 
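
One way to quantify a phase relationship between two oscillatory signals is the phase-locking value (PLV), the magnitude of the average unit phasor of their phase difference. The text above does not name a specific measure, so the PLV is an assumed choice here, and the sketch takes instantaneous phases as given (in practice they would first have to be extracted from band-limited recordings, which is not shown). The function name and the synthetic phase series are illustrative only.

```python
import cmath
import math
import random


def phase_locking_value(phases_a, phases_b):
    """Magnitude of the mean unit phasor of the phase difference (between 0 and 1)."""
    phasors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(phasors) / len(phasors))


rng = random.Random(1)
n = 1000
# Phase of a 10 Hz oscillation sampled at 250 Hz (synthetic, for illustration).
base = [2 * math.pi * 10 * t / 250 for t in range(n)]
# A second signal with a constant phase lag and small phase jitter.
locked = [p + 0.8 + rng.gauss(0, 0.1) for p in base]
# A third signal whose phase is unrelated to the first.
unlocked = [rng.uniform(0, 2 * math.pi) for _ in range(n)]

plv_locked = phase_locking_value(base, locked)
plv_unlocked = phase_locking_value(base, unlocked)
print(f"locked: {plv_locked:.2f}, unlocked: {plv_unlocked:.2f}")
```

A value near 1 indicates a stable phase difference between the two signals, whereas a value near 0 indicates no consistent phase relationship.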

Ultimately, better knowledge of the motor system
through neuroengineering sensorimotor-computer interfaces
may lead to better methods of understanding
brain dysfunction and pathology, better brain-computer
interfaces, biologically plausible neural circuit models,
and inevitably more intelligent systems and machines
that can perceive, reason, and act autonomously. It
is too early to know the overarching control mechanisms
and exact neural processes involved in the
motor system but through the many innovations of
scientists around the world, as highlighted in this chapter,
pieces of the puzzle are being understood and
slowly assembled to reach this target and go beyond
it.


References


39.1 D.M. Wolpert, Z. Ghahramani, J.R. Flanagan: Perspectives and
problems in motor learning, Trends Cogn. Sci. 5, 487-494 (2001)

39.2 Laboratory for Computational Motor Control, Johns Hopkins
University, Baltimore, USA:
http://www.shadmehrlab.org/Courses/medschoollectures.html

39.3 R.E. Jung, R.J. Haier: The Parieto-Frontal Integration Theory
(P-FIT) of intelligence: Converging neuroimaging evidence, Behav.
Brain Sci. 30, 135-154 (2007)

39.4 D.J. McFarland, J.R. Wolpaw: Sensorimotor rhythm-based
brain-computer interface (BCI): Feature selection by regression
improves performance, IEEE Trans. Neural Syst. Rehabil. Eng. 13,
372-379 (2005)

39.5 J.W. McDonald: Repairing the damaged spinal cord, Sci. Am. 281,
64-73 (1999)

39.6 A.P. Georgopoulos, J. Ashe, N. Smyrnis, M. Taira: The motor
cortex and the coding of force, Science 256, 1692-1695 (1992)

39.7 M. Desmurget, D. Pélisson, Y. Rossetti, C. Prablanc: From eye to
hand: Planning goal-directed movements, Neurosci. Biobehav. Rev. 22,
761-788 (1998)

39.8 C. Ghez, J.W. Krakauer, R.L. Sainburg, M.-F. Ghilardi: Spatial
representations and internal models of limb dynamics in motor
learning. In: The New Cognitive Neurosciences, ed. by M.S. Gazzaniga
(MIT Press, Cambridge 2000) pp. 501-514

39.9 N. Hogan: The mechanics of multi-joint posture and movement
control, Biol. Cybern. 52, 315-331 (1985)

39.10 P. Vetter, D.M. Wolpert: Context estimation for sensorimotor
control, J. Neurophysiol. 84, 1026-1034 (2000)

39.11 W.L. Nelson: Physical principles for economies of skilled
movements, Biol. Cybern. 46, 135-147 (1983)

39.12 T. Flash, N. Hogan: The coordination of arm movements: An
experimentally confirmed mathematical model, J. Neurosci. 5,
1688-1703 (1985)

39.13 N. Hogan: An organizing principle for a class of voluntary
movements, J. Neurosci. 4, 2745-2754 (1984)

39.14 R. Sosnik, T. Flash, B. Hauptmann, A. Karni: The acquisition and
implementation of the smoothness maximization motion strategy is
dependent on spatial accuracy demands, Exp. Brain Res. 176, 311-331
(2007)

39.15 R. Sosnik, M. Shemesh, M. Abeles: The point of no return in
planar hand movements: An indication of the existence of high level
motion primitives, Cogn. Neurodynam. 1, 341-358 (2007)

39.16 D.M. Wolpert, J. Diedrichsen, J.R. Flanagan: Principles of
sensorimotor learning, Nat. Rev. Neurosci. 12, 739-751 (2011)

39.17 F.A. Mussa-Ivaldi, E. Bizzi: Motor learning through the
combination of primitives, Philos. Trans. R. Soc. B 355, 1755-1769
(2000)

39.18 E.A. Henis, T. Flash: Mechanisms underlying the generation of
averaged modified trajectories, Biol. Cybern. 72, 407-419 (1995)

39.19 A. Karni, G. Meyer, C. Rey-Hipolito, P. Jezzard, M.M. Adams,
R. Turner, L.G. Ungerleider: The acquisition of skilled motor
performance: Fast and slow experience-driven changes in primary motor
cortex, Proc. Natl. Acad. Sci. USA 95, 861-868 (1998)

39.20 O. Hikosaka, M.K. Rand, S. Miyachi, K. Miyashita: Learning of
sequential movements in the monkey: Process of learning and retention
of memory, J. Neurophysiol. 74, 1652-1661 (1995)

39.21 R. Colom, S. Karama, R.E. Jung, R.J. Haier: Human intelligence
and brain networks, Dialogues Clin. Neurosci. 12, 489-501 (2010)

39.22 H. Johansen-Berg: The future of functionally-related structural
change assessment, NeuroImage 62, 1293-1298 (2012)

39.23 X. Li, D. Coyle, L. Maguire, D.R. Watson, T.M. McGinnity: Gray
matter concentration and effective connectivity changes in
Alzheimer's disease: A longitudinal structural MRI study,
Neuroradiology 53, 733-748 (2011)

39.24 C.J. Steele, J. Scholz, G. Douaud, H. Johansen-Berg,
V.B. Penhune: Structural correlates of skilled performance on a motor
sequence task, Front. Human Neurosci. 6, 289 (2012)

39.25 R.D. Fields: Imaging learning: the search for a memory trace,
Neuroscientist 17, 185-196 (2011)

39.26 S.A. Huettel, A.W. Song, G. McCarthy: Functional Magnetic
Resonance Imaging, 2nd edn. (Sinauer Associates, Sunderland 2009)

39.27 F. Ullén, L. Forsman, O. Blom, A. Karabanov, G. Madison:
Intelligence and variability in a simple timing task share neural
substrates in the prefrontal white matter, J. Neurosci. 28, 4238-4243
(2008)

39.28 T. Ball, A. Schreiber, B. Feige, M. Wagner, C.H. Lücking,
R. Kristeva-Feige: The role of higher-order motor areas in voluntary
movement as revealed by high-resolution EEG and fMRI, NeuroImage 10,
682-694 (1999)

39.29 S. Halder, D. Agorastos, R. Veit, E.M. Hammer, S. Lee,
B. Varkuti, M. Bogdan, W. Rosenstiel, N. Birbaumer, A. Kübler: Neural
mechanisms of brain-computer interface control, NeuroImage 55,
1779-1790 (2011)

39.30 F. Quandt, C. Reichert, H. Hinrichs, H.J. Heinze, R.T. Knight,
J.W. Rieger: Single trial discrimination of individual finger
movements on one hand: A combined MEG and EEG study, NeuroImage 59,
3316-3324 (2012)

39.31 K.J. Miller, L.B. Sorensen, J.G. Ojemann, M. den Nijs: Power-law
scaling in the brain surface electric potential, PLoS Comput. Biol. 5,
e1000609 (2009)

39.32 K.J. Miller, G. Schalk, E.E. Fetz, M. den Nijs, J.G. Ojemann,
R.P.N. Rao: Cortical activity during motor execution, motor imagery,
and imagery-based online feedback, Proc. Natl. Acad. Sci. USA 107,
4430-4435 (2010)

39.33 K.J. Miller, D. Hermes, C.J. Honey, A.O. Hebb, N.F. Ramsey,
R.T. Knight, J.G. Ojemann, E.E. Fetz: Human motor cortical activity is
selectively phase-entrained on underlying rhythms, PLoS Comput.
Biol. 8, e1002655 (2012)

39.34 V.N. Murthy, E.E. Fetz: Synchronization of neurons during local
field potential oscillations in sensorimotor cortex of awake monkeys,
J. Neurophysiol. 76, 3968-3982 (1996)

39.35 D.J. Krusienski, M. Grosse-Wentrup, F. Galán, D. Coyle,
K.J. Miller, E. Forney, C.W. Anderson: Critical issues in
state-of-the-art brain-computer interface signal processing, J. Neural
Eng. 8, 025002 (2011)

39.36 A.C. Papanicolaou, E.M. Castillo, R. Billingsley-Marshall,
E. Pataraia, P.G. Simos: A review of clinical applications of
magnetoencephalography, Int. Rev. Neurobiol. 68, 223-247 (2005)

39.37 L. Kauhanen, T. Nykopp, J. Lehtonen, P. Jylänki, J. Heikkonen,
P. Rantanen, H. Alaranta, M. Sams: EEG and MEG brain-computer
interface for tetraplegic patients, IEEE Trans. Neural Syst. Rehabil.
Eng. 14, 190-193 (2006)

39.38 L. Kauhanen, T. Nykopp, M. Sams: Classification of single MEG
trials related to left and right index finger movements, Clin.
Neurophysiol. 117, 430-439 (2006)

39.39 K.J. Miller, S. Zanos, E.E. Fetz, M. den Nijs, J.G. Ojemann:
Decoupling the cortical power spectrum reveals real-time
representation of individual finger movements in humans,
J. Neurosci. 29, 3132-3137 (2009)

39.40 Z. Wang, Q. Ji, K.J. Miller, G. Schalk: Prior knowledge improves
decoding of finger flexion from electrocorticographic signals, Front.
Neurosci. 5, 127 (2011)

39.41 G. Pfurtscheller, C. Neuper, C. Brunner, F. da Lopes Silva: Beta
rebound after different types of motor imagery in man, Neurosci.
Lett. 378, 156-159 (2005)

39.42 G. Pfurtscheller, C. Brunner, A. Schlögl, F.H. da Lopes Silva:
Mu rhythm (de)synchronization and EEG single-trial classification of
different motor imagery tasks, NeuroImage 31, 153-159 (2006)

39.43 G. Pfurtscheller, C. Neuper, A. Schlögl, K. Lugger: Separability
of EEG signals recorded during right and left motor imagery using
adaptive autoregressive parameters, IEEE Trans. Rehabil. Eng. 6,
316-325 (1998)

39.44 D. Coyle, G. Prasad, T.M. McGinnity: A time-frequency approach
to feature extraction for a brain-computer interface with
a comparative analysis of performance measures, EURASIP J. Appl.
Signal Process. 2005, 3141-3151 (2005)

39.45 G. Pfurtscheller, C. Guger, G. Müller, G. Krausz, C. Neuper:
Brain oscillations control hand orthosis in a tetraplegic, Neurosci.
Lett. 292, 211-214 (2000)

39.46 M. Grosse-Wentrup, B. Schölkopf, J. Hill: Causal influence of
gamma oscillations on the sensorimotor rhythm, NeuroImage 56, 837-842
(2011)

39.47 H.H. Kornhuber, L. Deecke: Hirnpotentialänderungen bei
Willkürbewegungen und passiven Bewegungen des Menschen:
Bereitschaftspotential und reafferente Potentiale, Pflüg. Arch. 284,
1-17 (1965)

39.48 L. Deecke, B. Grözinger, H.H. Kornhuber: Voluntary finger
movement in man: Cerebral potentials and theory, Biol. Cybern. 23,
99-119 (1976)

39.49 M. Krauledat, G. Dornhege, B. Blankertz, G. Curio,
K.-R. Müller: The Berlin brain-computer interface for rapid response,
Proc. 2nd Int. Brain-Comp. Interface Workshop Train. Course Biomed.
Tech. (2004) pp. 61-62

39.50 M. Krauledat, G. Dornhege, B. Blankertz, G. Curio, F. Losch,
K.-R. Müller: Improving speed and accuracy of brain-computer
interfaces using readiness potential, Proc. 26th Int. IEEE Eng. Med.
Biol. Conf. (2004) pp. 4512-4515

39.51 J.P.R. Dick, J.C. Rothwell, B.L. Day, R. Cantello, O. Buruma,
M. Gioux, R. Benecke, A. Berardelli, P.D. Thompson, C.D. Marsden: The
Bereitschaftspotential is abnormal in Parkinson's disease,
Brain 112(1), 233-244 (1989)

39.52 H. Shibasaki, M. Hallett: What is the Bereitschaftspotential?,
Clin. Neurophysiol. 117, 2341-2356 (2006)

39.53 Y. Gu, K. Dremstrup, D. Farina: Single-trial discrimination of
type and speed of wrist movements from EEG recordings, Clin.
Neurophysiol. 120, 1596-1600 (2009)

39.54 D.J. McFarland, L.A. Miner, T.M. Vaughan, J.R. Wolpaw: Mu and
beta rhythm topographies during motor imagery and actual movements,
Brain Topogr. 12, 177-186 (2000)

39.55 H. Lakany, B.A. Conway: Comparing EEG patterns of actual and
imaginary wrist movements - A machine learning approach, Proc. 1st
Int. Conf. Artif. Intell. Mach. Learn. AIML (2005) pp. 124-127

39.56 B. Nasseroleslami, H. Lakany, B.A. Conway: Identification of
time-frequency EEG features modulated by force direction in arm
isometric exertions, Proc. 5th Int. IEEE EMBS Conf. Neural Eng. (2011)
pp. 422-425

39.57 J.R. Wolpaw, D.J. McFarland: Control of a two-dimensional
movement signal by a noninvasive brain-computer interface in humans,
Proc. Natl. Acad. Sci. USA 101, 17849-17854 (2004)

39.58 D.J. McFarland, W.A. Sarnacki, J.R. Wolpaw:
Electroencephalographic (EEG) control of three-dimensional movement,
J. Neural Eng. 7, 036007 (2010)

39.59 T.J. Bradberry, R.J. Gentili, J.L. Contreras-Vidal:
Reconstructing three-dimensional hand movements from noninvasive
electroencephalographic signals, J. Neurosci. 30, 3432-3437 (2010)

39.60 E.D. Adrian, Y. Zotterman: The impulses produced by sensory
nerve-endings: Part II. The response of a Single End-Organ,
J. Physiol. 61, 151-171 (1926)

39.61 R.B. Stein, E.R. Gossen, K.E. Jones: Neuronal variability: Noise
or part of the signal?, Nat. Rev. Neurosci. 6, 389-397 (2005)

39.62 D.A. Butts, C. Weng, J. Jin, C.-I. Yeh, N.A. Lesica,
J.-M. Alonso, G.B. Stanley: Temporal precision in the neural code and
the timescales of natural vision, Nature 449, 92-95 (2007)

39.63 A.P. Georgopoulos, J.F. Kalaska, R. Caminiti, J.T. Massey: On
the relations between the direction of two-dimensional arm movements
and cell discharge in primate motor cortex, J. Neurosci. 2, 1527-1537
(1982)

39.64 B. Amirikian, A.P. Georgopoulos: Directional tuning profiles of
motor cortical cells, Neurosci. Res. 36, 73-79 (2000)

39.65 L. Paninski, M.R. Fellows, N.G. Hatsopoulos, J.P. Donoghue:
Spatiotemporal tuning of motor cortical neurons for hand position and
velocity, J. Neurophysiol. 91, 515-532 (2004)

39.66 A.P. Georgopoulos, A.B. Schwartz, R.E. Kettner: Neuronal
population coding of movement direction, Science 233, 1416-1419 (1986)

39.67 H. Tanaka, T.J. Sejnowski, J.W. Krakauer: Adaptation to
visuomotor rotation through interaction between posterior parietal and
motor cortical areas, J. Neurophysiol. 102, 2921-2932 (2009)

39.68 S.M. Chase, R.E. Kass, A.B. Schwartz: Behavioral
and neural correlates of visuomotor adaptation 
observed through a brain-computer interface in 


39.69 


39.70 


39.71 


39.72 


39.73 


39.74 


39.75 


39.76 


39.77 


39.78 


39.79 


39.80 


39.81 


primary motor cortex, J. Neurophysiol. 108, 624- 
644 (2012) 

E.V. Evarts: Relation of pyramidal tract activ- 
ity to force exerted during voluntary movement, 
J. Neurophysiol. 31, 14-27 (1968) 

E. Schneidman, M.J. Berry, R. Segev, W. Bialek: 
Weak pairwise correlations imply strongly cor- 
related network states in a neural population, 
Nature 440, 1007-1012 (2006) 

S.-I. Amari: Information geometry on hierarchy 
of probability distributions, IEEE Trans. Inform. 
Theor. 47, 1701-1711 (2001) 

J. Wessberg, C.R. Stambaugh, J.D. Kralik, P.D. Beck, 
M. Laubach, J.K. Chapin, J. Kim, S.J. Biggs, 
M.A. Srinivasan, M.A. Nicolelis: Real-time pre- 
diction of hand trajectory by ensembles of cor- 
tical neurons in primates, Nature 408, 361-365 
(2000) 

M.D. Serruya, N.G. Hatsopoulos, L. Paninski, 
M.R. Fellows, J.P. Donoghue: Instant neural con- 
trol of a movement signal, Nature 416, 141-142 


(2002) 
J.M. Carmena, M.A. Lebedev, R.E. Crist, 
J.E. O'Doherty, D.M. Santucci, D.F. Dimitrov, 


P.G. Patil, C.S. Henriquez, M.A.L. Nicolelis: Learn- 
ing to control a brain-machine interface for 
reaching and grasping by primates, PLoS Biol. 1, 
E42 (2003) 

M.A. Lebedev, J.M. Carmena, J.E. O'Doherty, 
M. Zacksenhouse, C.S. Henriquez, J.C. Principe, 
M.A.L. Nicolelis: Cortical ensemble adaptation to 
represent velocity of an artificial actuator con- 
trolled by a brain-machine interface, J. Neurosci. 
25, 4681-4693 (2005) 

M. Velliste, S. Perel, M.C. Spalding, A.S. Whitford, 
A.B. Schwartz: Cortical control of a prosthetic arm 
for self-feeding, Nature 453, 1098-1101 (2008) 
D.M. Santucci, J.D. Kralik, M.A. Lebedev, 
M.A.L. Nicolelis: Frontal and parietal cortical 
ensembles predict single-trial muscle activity 
during reaching movements in primates, Eur. 
J. Neurosci. 22, 1529-1540 (2005) 

J.R. Wolpaw, N. Birbaumer, D.J. McFarland, 
G. Pfurtscheller, T.M. Vaughan: Brain-computer 
interfaces for communication and control, Clin. 
Neurophysiol. 113, 767-791 (2002) 

A. Bashashati, S.G. Mason, J.F. Borisoff, R.K. Ward, 
G.E. Birch: A comparative study on generat- 
ing training-data for self-paced brain interfaces, 
IEEE Trans. Neural Syst. Rehabil. Eng. 15, 59-66 
(2007) 

S.G. Mason, A. Bashashati, M. Fatourechi, 
K.F. Navarro, G.E. Birch: A comprehensive survey 
of brain interface technology designs, Ann. 
Biomed. Eng. 35(2), 137-169 (2007) 

N. Brodu, F. Lotte, A. Lécuyer: Comparative study 
of band-power extraction techniques for Motor 
Imagery classification, IEEE Symp. Comput. Intell. 
Cogn. Algorithm, Mind, Brain (2011) pp. 1-6 


Neuroengineering 


References 


39.82 


39.83 


39.84 


39.85 


39.86 


39.87 


39.88 


39.89 


39.90 


39.91 


39.92 


39.93 


39.94 


39.95 


39.96 


F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, 
B. Arnaldi: A review of classification algorithms 
for EEG-based brain-computer interfaces, J. Neu- 
ral Eng. 4, RI-R13 (2007) 

P. Herman, G. Prasad, T.M. McGinnity, D. Coyle: 
Comparative analysis of spectral approaches to 
feature extraction for EEG-based motor imagery 
classification, IEEE Trans. Neural Syst. Rehabil. 
Eng. 16, 317-326 (2008) 

J. Wolpaw, E.W. Wolpaw: Brain-Computer Inter- 
faces: Principles and Practice (Oxford Univ. Press, 
Oxford 2012) 

S. Sun: Ensemble learning methods for classifying 
EEG sign, Lect. Notes Comput. Sci. 4472, 113-120 
(2007) 

L.F. Nicolas-Alonso, J. Gomez-Gil: Brain com- 
puter interfaces — A review., Sensors 12, 1211-1279 
(2012) 

A. Soria-Frisch: A critical review on the usage of 
ensembles for BCI. In: Towards Practical Brain- 
Computer Interfaces, ed. by B.Z. Allison, S. Dunne, 
R. Leeb, J.R. Del Millan, A. Nijholt (Springer, Berlin, 
Heidelberg 2013) pp. 41-65 

H. Ramoser, J. Muller-Gerking, G. Pfurtscheller: 
Optimal spatial filtering of single trial EEG during 
imagined hand movement, IEEE Trans. Rehabil. 
Eng. 8, 441-446 (2000) 

B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, 
K.-R. Muller: Optimizing Spatial filters for Robust 
EEG Single-Trial Analysis, IEEE Signal Process. Mag. 
25, 41-56 (2008) 

B. Blankertz, M. Kawanabe, R. Tomioka, F.U. Hoh- 
lefeld, V. Nikulin, K.-R. Miller: Invariant Common 
Spatial Patterns: Alleviating Nonstationarities in 
Brain-Computer Interfacing, Adv. Neural Inf. Pro- 
cess. 20, 1-8 (2008) 

F. Lotte, C. Guan: Regularizing common spatial 
patterns to improve BCI designs: Unified theory 
and new algorithms, IEEE Trans. Neural Syst. Re- 
habil. Eng. 58, 355-362 (2011) 

D. Coyle: Neural network based auto association 
and time-series prediction for biosignal process- 
ing in brain-computer interfaces, IEEE Comput. 
Intell. Mag. 4(4), 47-59 (2009) 

H. Zhang, Z.Y. Chin, K.K. Ang, C. Guan, C. Wang: 
Optimum spatio-spectral filtering network for 
brain-computer interface, IEEE Trans. Neural 
Netw. 22, 52-63 (2011) 

K.K. Ang, Z.Y. Chin, C. Wang, C. Guan, H. Zhang: 
Filter bank common spatial pattern algorithm on 
BCI competition IV datasets 2a and 2b, Front. Neu- 
rosci. 6, 39 (2012) 

A. Cichocki, Y. Washizawa, T. Rutkowski, H. Ba- 
kardjian, A.H. Phan, S. Choi, H. Lee, Q. Zhao, 
L. Zhang, Y. Li: Noninvasive BCIs: Multiway signal- 
processing array decompositions, Computer 41, 
34-42 (2008) 

D.H. Coyle, G. Prasad, T.M. McGinnity: Improv- 
ing information transfer rates of BCI by self- 


39.97 


39.98 


39.99 


39.100 


39.101 


39.102 


39.103 


39.104 


39.105 


39.106 


39.107 


39.108 


39.109 


organising fuzzy neural network-based multi- 
step-ahead time series prediction, Proc. 3rd IEEE 
Syst. Man Cybern. Conf. (2004) 

D. Coyle, G. Prasad, T.M. McGinnity: A time- 
series prediction approach for feature extraction 
in a brain-computer interface, IEEE Trans. Neural 
Syst. Rehabil. Eng. 13, 461-467 (2005) 

D. Coyle: Channel and class dependent time- 
series embedding using partial mutual infor- 
mation improves sensorimotor rhythm based 
brain-computer interfaces. In: Time Series Anal- 
ysis, Modeling and Applications - A Computa- 
tional Intelligence Perspective, ed. by W. Pedrycz 
(Springer, Berlin, Heidelberg 2013) pp. 249-278 

A. Schlogl, D. Flotzinger, G. Pfurtscheller: Adap- 
tive autoregressive modeling used for single- 
trial EEG classification - Verwendung eines 
Adaptiven Autoregressiven Modells fiir die Klas- 
sifikation von Einzeltrial-EEG-Daten, Biomed. 
Tech./Biomed. Eng. 42, 162-167 (1997) 

E. Haselsteiner, G. Pfurtscheller: Using time- 
dependent neural networks for EEG classification, 
IEEE Trans. Rehabil. Eng. 8, 457-463 (2000) 

E.M. Forney, C.W. Anderson: Classification of EEG 
during imagined mental tasks by forecasting with 
Elman Recurrent Neural Networks, Int. Joint Conf. 
Neural Netw. (2011) pp. 2749-2755 

C. Anderson, E. Forney, D. Hains, A. Natarajan: Re- 
liable identification of mental tasks using time- 
embedded EEG and sequential evidence accumu- 
lation, J. Neural Eng. 8, 025023 (2011) 

H.K. Kimelberg: Functions of mature mammalian 
astrocytes: A current view, Neuroscientist 16, 79- 
106 (2010) 

N.A. Busch, J. Dubois, R. VanRullen: The phase of 
ongoing EEG oscillations predicts visual percep- 
tion, J. Neurosci. 29, 7869-7876 (2009) 

W.J. Freeman: Origin, structure, and role of back- 
ground EEG activity. Part 1. Analytic amplitude, 
Clin. Neurophysiol. 115, 2077-2088 (2004) 

R.T. Canolty, E. Edwards, S.S. Dalal, M. Soltani, 
S.S. Nagarajan, H.E. Kirsch, M.S. Berger, N.M. Bar- 
baro, R.T. Knight: High gamma power is phase- 
locked to theta oscillations in human neocortex, 
Science 313, 1626-1628 (2006) 

C. Mehring, J. Rickert, E. Vaadia, S. De Cardosa 
Oliveira, A. Aertsen, S. Rotter: Inference of hand 
movements from local field potentials in mon- 
key motor cortex, Nat. Neurosci. 6, 1253-1254 
(2003) 

K.J. Miller, E.C. Leuthardt, G. Schalk, R.P.N. Rao, 
N.R. Anderson, D.W. Moran, J.W. Miller, J.G. Oje- 
mann: Spectral changes in cortical surface po- 
tentials during motor movement, J. Neurosci. 27, 
2424—2432 (2007) 

S. Waldert, H. Preissl, E. Demandt, C. Braun, 
N. Birbaumer, A. Aertsen, C. Mehring: Hand move- 
ment direction decoded from MEG and EEG, 
J. Neurosci. 28, 1000-1008 (2008) 


767 


6€ | d Hed 


768 PartD 


Neural Networks 


6€ | d Hed 


39.110 


39.111 


39.112 


39.113 


39.114 


39.115 


39.116 


39.117 


39.118 


39.119 


39.120 


39.121 


39.122 


39.123 


39.124 


K. Jerbi, J.-P. Lachaux, K. N'Diaye, D. Pantazis, 
R.M. Leahy, L. Garnero, S. Baillet: Coherent neural 
representation of hand speed in humans revealed 
by MEG imaging, Proc. Natl. Acad. Sci. USA 104, 
7676-7681 (2007) 

K. Jerbi, 0. Bertrand: Cross-frequency coupling in 
parieto-frontal oscillatory networks during motor 
imagery revealed by magnetoencephalography, 
Front. Neurosci. 3, 3-4 (2009) 

S.N. Baker: Oscillatory interactions between sen- 
sorimotor cortex and the periphery, Curr. Opin. 
Neurobiol. 17, 649-655 (2007) 

D. Rubino, K.A. Robbins, N.G. Hatsopoulos: Prop- 
agating waves mediate information transfer in 
the motor cortex, Nat. Neurosci. 9, 1549-1557 
(2006) 

R.J. May, H.R. Maier, G.C. Dandy, T.M.K. Gayani 
Fernando: Non-linear variable selection for ar- 
tificial neural networks using partial mutual in- 
formation, Environ. Model. Softw. 23, 1312-1326 
(2008) 

S. Lemm, C. Schäfer, G. Curio: BCI competition 
2003-data set Ill: Probabilistic modeling of sen- 
sorimotor u rhythms for classification of imagi- 
nary hand movements, IEEE Trans. Biomed. Eng. 
51, 1077-1080 (2004) 

D. Coyle, G. Prasad, T.M. McGinnity: Faster self- 
organizing fuzzy neural network training and 
a hyperparameter analysis for a brain-computer 
interface, IEEE Trans. Syst. Man Cybern. 39, 1458- 
1471 (2009) 

C. Sannelli, C. Vidaurre, K.-R. Müller, B. Blankertz: 
CSP patches: An ensemble of optimized spatial fil- 
ters. An evaluation study, J. Neural Eng. 8, 025012 
(2011) 

C. Vidaurre, M. Kawanabe, P. von Bünau, B. Blan- 
kertz, K.R. Müller: Toward unsupervised adapta- 
tion of LDA for brain-computer interfaces, IEEE 
Trans. Biomed. Eng. 58, 587-597 (2011) 

P. Herman, G. Prasad, T.M. McGinnity: Compu- 
tational intelligence approaches to brain sig- 
nal pattern recognition. In: Pattern Recognition 
Techniques, Technology and Applications, ed. by 
B. Verma (InTech, Rijeka 2008) pp. 91-120 

J.M. Mendel: Type-2 fuzzy sets and systems: An 
overview, IEEE Comput. Intell. Mag. 2(1), 20-29 
(2007) 

R. Mohammadi, A. Mahloojifar, H. Chen, D. Coyle: 
EEG based foot movement onset detection 
with the probabilistic classification vector ma- 
chine, Lect. Notes Comput. Sci. 7666, 356-363 
(2012) 

H. Chen, P. Tino, X. Yao: Probabilistic classifica- 
tion vector machines, IEEE Trans. Neural Netw. 20, 
901-914 (2009) 

C. Vidaurre, B. Blankertz: Towards a cure for BCI 
illiteracy, Brain Topogr. 23, 194—198 (2010) 

A. Schlögl, C. Vidaurre, K.-R. Müller: Adaptive 
methods in BCI research: An introductory tutorial. 


39.125 


39.126 


39.127 


39.128 


39.129 


39.130 


39.131 


39.132 


39.133 


39.134 


39.135 


39.136 


39.137 


39.138 


39.139 


In: Brain Computer Interfaces — Revolutionizing 
Human-Computer Interfaces, ed. by B. Graimann, 
B. Allison, G. Pfurtscheller (Springer, Berlin, Hei- 
delberg 2010) pp. 331-355 

J.R. Wolpaw, N. Birbaumer, W.J. Heetderks, 
D.J. McFarland, P.H. Peckham, G. Schalk, E. Don- 
chin, L.A. Quatrano, C.J. Robinson, T.M. Vaughan: 
Brain-computer interface technology: A review of 
the first international meeting, IEEE Trans. Reha- 
bil. Eng. 8, 164-173 (2000) 

D.J. McFarland, W.A. Sarnacki, J.R. Wolpaw: 
Should the parameters of a BCI translation al- 
gorithm be continually adapted?, J. Neurosci. 
Methods 199, 103-107 (2011) 

B.A.S. Hasan: Adaptive Methods Exploiting the 
Time Structure in EEG for Interfaces (University of 
Essex, Colchester 2010) 

C. Vidaurre, C. Sannelli, K.-R. Müller, B. Blankertz: 
Co-adaptive calibration to improve BCI efficiency, 
J. Neural Eng. 8, 025009 (2011) 

J.W. Yoon, S.J. Roberts, M. Dyson, J.Q. Gan: Adap- 
tive classification for brain computer interface 
systems using sequential Monte Carlo sampling, 
Neural Netw. 22, 1286-1294 (2009) 

J.Q. Gan: Self-adapting BCI based on unsuper- 
vised learning, 3rd Int. Workshop Brain Comput. 
Interfaces (2006) pp. 50-51 

S. Lu, C. Guan, H. Zhang: Unsupervised brain com- 
puter interface based on intersubject information 
and online adaptation, IEEE Trans. Neural Syst. 
Rehabil. Eng. 17, 135-145 (2009) 

S.E. Eren, M. Grosse-Wentrup, M. Buss: Unsu- 
pervised classification for non-invasive brain- 
computer-interfaces, Proc. Autom. Workshop (VDI 
Verlag, 2007) pp. 65-66 

B.A.S. Hasan, J.Q. Gan: Unsupervised adaptive 
GMM for BCI, 4th Int. IEEE EMBS Conf. Neural Eng. 
(2009) pp. 295-298 

J. Blumberg, J. Rickert, S. Waldert, A. Schulze- 
Bonhage, A. Aertsen, C. Mehring: Adaptive classi- 
fication for brain computer interfaces, Annu. Int. 
Conf. IEEE Eng. Med. Biol. Soc. (2007) pp. 2536- 
2539 

G. Liu, G. Huang, J. Meng, D. Zhang, X. Zhu: Unsu- 
pervised adaptation based on fuzzy c-means for 
brain-computer interface, 1st Int. Conf. Inform. 
Sci. Eng. (2009) pp. 4122-4125 

P. Shenoy, M. Krauledat, B. Blankertz, R.P.N. Rao, 
K.-R. Miller: Towards adaptive classification for 
BCI, J. Neural Eng. 3, R13-23 (2006) 

T. Gürel, C. Mehring: Unsupervised adaptation of 
brain-machine interface decoders, Front. Neu- 
rosci. 6, 164 (2012) 

M. Sugiyama, M. Krauledat, K.-R. Müller: Co- 
variate shift adaptation by importance weighted 
cross validation, J. Mach. Learn. Res. 8, 985-1005 
(2007) 

A. Satti, C. Guan, D. Coyle, G. Prasad: A covari- 
ate shift minimisation method to alleviate non- 


Neuroengineering | References 769 


39.140 


39.141 


39.142 


39.143 


39.144 


39.145 


39.146 


39.147 


stationarity effects for an adaptive brain-com- 
puter interface, 20th Int. Conf. Pattern Recogn. 
(2010) pp. 105-108 

R. Mohammadi, A. Mahloojifar, D. Coyle: Unsu- 
pervised short-term covariate shift minimization 
for self-paced BCI, IEEE Symp. Comput. Intell. 
Cogn. Algorithm. Mind, Brain (2013) pp. 101-106 
A. Satti, D. Coyle, G. Prasad: Continuous EEG clas- 
sification for a self-paced BCI, 4th Int. IEEE/EMBS 
Conf. Neural Eng. (2009) pp. 315-318 

G.E. Fabiani, D.J. McFarland, J.R. Wolpaw, 
G. Pfurtscheller: Conversion of EEG activity into 
cursor movement by a brain-computer interface 
(BCI), IEEE Trans. Neural Syst. Rehabil. Eng. 12, 
331-338 (2004) 

M. Arvaneh, C. Guan, K.K. Ang, C. Quek: EEG data 
space adaptation to reduce intersession non- 
stationarity in brain-computer interface, Neural 
Comput. 25(8), 2146-2171 (2013) 

G. Schalk: Brain-computer symbiosis, J. Neural 
Eng. 5, PI-P15 (2008) 

C.M. Reed, N.I. Durlach: Note on informa- 
tion transfer rates in human communication, 
Presence: Teleoper. Virtual Environ. 7, 509-518 
(1998) 

I.S. MacKenzie: Fitts' Law as a research and design 
tool in human-computer interaction, Human- 
Comput. Interact. 7, 91-139 (1992) 

Z.C. Chao, Y. Nagasaka, N. Fujii: Long-term asyn- 
chronous decoding of arm motion using electro- 
corticographic signals in monkeys, Front. Neuro- 
eng. 3, 3-13 (2010) 


39.148 


39.149 


39.150 


39.151 


39.152 


39.153 


39.154 


39.155 


39.156 


G. Schalk: Can electrocorticography (ECoG) support 
robust and powerful brain-computer interfaces?, 
Front. Neuroeng. 3, 9 (2010) 

A. Barbero, M. Grosse-Wentrup: Biased feedback 
in brain-computer interfaces, J. Neuroeng. Reha- 
bil. 7, 34 (2010) 

L.R. Hochberg, M.D. Serruya, G.M. Friehs, 
J.A. Mukand, M. Saleh, A.H. Caplan, A. Branner, 
D. Chen, R.D. Penn, J.P. Donoghue: Neuronal en- 
semble control of prosthetic devices by a human 
with tetraplegia, Nature 442, 164—171 (2006) 

J.M. Carmena: Becoming bionic, IEEE Spect. 49, 
24-29 (2012) 

T. W. Berger, J. K. Chapin, G. A. Gerhardt, D. J. Mc- 
Farland, D. M. Taylor, P. A. Tresco: WTEC Panel 
Report on International Assessment of Research 
and Development in Brain-Computer Interfaces 
(2007) 

R. Kurzweil: The Age of Spiritual Machines: When 
Computers Exceed Human Intelligence (Penguin, 
New York 2000) 

P. Barnard, P. Dayan, P. Redgrave: Action. In: 
Cognitive Systems: Information Processing Meets 
Brain Science, ed. by R. Morris, L. Tarassenki, 
M. Kenward (Elsevier Academic, London 2006) 
G.L. Chadderdon, S.A. Neymotin, C.C. Kerr, 
W.W. Lytton: Reinforcement learning of targeted 
movement in a spiking neuronal model of motor 
cortex, PloS ONE 7, e47251 (2012) 

K. Doya: What are the computations of the cere- 
bellum, the basal ganglia and the cerebral cor- 
tex?, Neural Netw. 12, 961-974 (1999) 


6€ | d Hed 


771 


40. Evolving Connectionist Systems: 
From Neuro-Fuzzy-, to Spiking- and Neuro-Genetic 


Nikola Kasabov 


This chapter follows the development of a class of
neural networks (NN) called evolving connectionist
systems (ECOS). The term evolving is used here in its
meaning of unfolding, developing, changing, revealing
(according to the Oxford dictionary) rather than
evolutionary. The latter represents processes related
to populations and generations of them. An ECOS is
a neural network-based model that evolves its structure
and functionality through incremental, adaptive learning
and self-organization during its lifetime. In principle,
it could be a simple NN or a hybrid connectionist system.
The latter is a system based on neural networks that also
integrates other computational principles, such as
linguistically meaningful explanation features of fuzzy
rules, optimization techniques for structure and
parameter optimization, quantum-inspired methods, and
gene regulatory networks. The chapter includes
definitions and examples of ECOS such as: evolving
neuro-fuzzy and hybrid systems; evolving spiking neural
networks, neurogenetic systems, quantum-inspired systems,
which are all discussed from the point of view of the
structural and functional development of a
connectionist-based model and the knowledge that it
represents. Applications for knowledge engineering across
domain areas, such as in bioinformatics, brain study,
and intelligent machines are presented.


40.1 Principles of Evolving Connectionist Systems (ECOS)
40.2 Hybrid Systems and Evolving Neuro-Fuzzy Systems
     40.2.1 Hybrid Systems
     40.2.2 Evolving Neuro-Fuzzy Systems
     40.2.3 From Local to Transductive (Individualized)
            Learning and Modeling
     40.2.4 Applications
40.3 Evolving Spiking Neural Networks (eSNN)
     40.3.1 Spiking Neuron Models
     40.3.2 Evolving Spiking Neural Networks
     40.3.3 Extracting Fuzzy Rules from eSNN
     40.3.4 eSNN Applications
40.4 Computational Neuro-Genetic Modeling (CNGM)
     40.4.1 ...
     40.4.2 ...
     40.4.3 Quantum-Inspired Optimization of eSNN and CNGM
     40.4.4 Applications of CNGM
40.5 Conclusions and Further Directions
References


40.1 Principles of Evolving Connectionist Systems (ECOS) 


Everything in Nature evolves, develops, unfolds,
reveals, and changes in time. The brain is probably
the ultimate evolving system, which develops during
a lifetime, based on genetic information (nature) and
learning from the environment (nurture). Inspired by
information principles of the developing brain, ECOS
are adaptive, incremental learning and knowledge
representation systems that evolve their structure and
functionality from incoming data through interaction with
the environment. At the core of such a system is
a connectionist architecture that consists of neurons
(information processing units) and connections between
them [40.1]. An ECOS is a system based on neural
networks, possibly combined with other techniques of
computational intelligence (CI), which operates
continuously in time and adapts its structure and
functionality through continuous interaction with the
environment and with other systems. The adaptation is
defined through:


1. A set of evolving rules. 

2. A set of parameters (genes) that are subject to 
change during the system operation. 

3. An incoming continuous flow of information, pos- 
sibly with unknown distribution. 

4. Goal (rationale) criteria (also subject to modifica- 
tion) that are applied to optimize the performance 
of the system over time. 


ECOS learning algorithms are inspired by brain-like 
information processing principles, e.g.: 


1. They evolve in an open space, where the dimensions 
of the space can change. 

2. They learn via incremental learning, possibly in an 
on-line mode. 


3. They learn continuously in a lifelong learning 
mode. 

4. They learn both as individual systems and as an evo- 
lutionary population of such systems. 

5. They use constructive learning and have evolving 
structures. 

6. They learn and partition the problem space locally, 
thus allowing for a fast adaptation and tracing the 
evolving processes over time. 

7. They evolve different types of knowledge represen- 
tation from data, mostly a combination of memory- 
based and symbolic knowledge. 


Many methods, algorithms, and computational
intelligence systems have been developed since the
conception of ECOS, along with many applications across
disciplines. This chapter will review only the fundamental
aspects of some of these methods and will highlight
some principal applications.


40.2 Hybrid Systems and Evolving Neuro-Fuzzy Systems 


40.2.1 Hybrid Systems 


A hybrid computational intelligent system integrates 
several principles of computational intelligence to en- 
hance different aspects of the performance of the sys- 
tem. Here we will discuss only hybrid connectionist 
systems that integrate artificial neural networks (NN) 
with other techniques utilizing the adaptive learning 
features of the NN. 

Early hybrid connectionist systems combined NN
with rule-based systems such as production rules [40.3]
or predicate logic [40.4]. They utilized NN modules
for a lower level of information processing and
rule-based systems for reasoning and explanation at
a higher level.

Fig. 40.1 A hybrid NN-fuzzy rule-based expert system for financial
decision support (after [40.2]). Inputs: current price and
yesterday's price (crisp values), political situation and economic
situation (fuzzy values). Modules: neural network (price
prediction), rules extraction module with a neural network on
fuzzified data, fuzzy rule-based decision. Outputs: predicted
price (crisp value), trading rules (fuzzy), decision
(buy/sell/hold; fuzzy and crisp values)

The above principle is applied when fuzzy rules 
are used for higher-level information processing and 
for approximate reasoning [40.5-7]. These are expert 
systems that combine the learning ability of NN with 
the explanation power of linguistically plausible fuzzy 
rules [40.8-11]. A block diagram of an exemplar sys- 
tem is shown in Fig. 40.1, where at a lower level 
a neural network (NN) module predicts the level of 
a stock index and at a higher level a fuzzy reason- 
ing module combines the predicted values with some 
macro-economic variables representing the political 
and the economic situations using the following types 
of fuzzy rules [40.2] 


IF <the stock value predicted by the NN module
for the future is high> AND
<the economic situation is good> AND
<the political situation is stable>
THEN <buy stock> . (40.1)
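
To make the form of rule (40.1) concrete, the following minimal sketch evaluates its firing strength. It is an illustration only: the membership function `high`, its breakpoints, and the use of the minimum as the fuzzy AND are assumptions made here, not details taken from [40.2].

```python
def high(predicted_change_pct):
    """Hypothetical membership function for 'predicted stock value is high'.

    Ramps linearly from 0 at a 0% predicted change to 1 at +5%
    (illustrative breakpoints, not from the source).
    """
    return min(1.0, max(0.0, predicted_change_pct / 5.0))


def rule_buy(predicted_change_pct, economy_good, politics_stable):
    """Firing strength of a rule shaped like (40.1).

    economy_good and politics_stable are membership degrees in [0, 1].
    The fuzzy AND is taken as the minimum (an assumed, common choice).
    """
    return min(high(predicted_change_pct), economy_good, politics_stable)


strength = rule_buy(predicted_change_pct=4.0, economy_good=0.7, politics_stable=0.9)
print(f"buy-rule firing strength: {strength:.2f}")
```

The weakest condition limits how strongly the rule fires, which is the usual behavior of a min-based conjunction.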


Along with the integration of NN and fuzzy rules
for better decision support, the system in Fig. 40.1
includes an NN module for extracting recent rules from
data. Experts can use these rules to analyze the
dynamics of the stock and possibly to update the trading
fuzzy rules in the fuzzy rule-based module. This NN
module uses a fuzzy neural network (FNN) for the rule
extraction.

Fig. 40.2 A simple, feedforward EFuNN structure. The
rule nodes evolve from data to capture cluster centers in the
input space, while the output nodes evolve local models to
learn and approximate the data in each of these clusters

Fuzzy neural networks (FNN) integrate NN and
fuzzy rules into a single neuronal model, tightly
coupling learning and fuzzy reasoning rules into
a connectionist structure. One of the first FNN models was
initiated by Yamakawa and other Japanese scientists
and promoted at a series of IIZUKA conferences in
Japan [40.12, 13]. Many models of FNNs were
developed based on these principles [40.2, 14, 15].


40.2.2 Evolving Neuro-Fuzzy Systems 

The evolving neuro-fuzzy systems further extended the
principles of hybrid neuro-fuzzy systems and the FNN:
instead of training a fixed connectionist structure,
the structure and its functionality evolve from
incoming data, often in an on-line, one-pass learning mode.
This is the case with evolving connectionist systems
(ECOS) [40.1, 16-19].

ECOS are modular connectionist-based systems 
that evolve their structure and functionality in a contin- 
uous, self-organized, on-line, adaptive, and interactive 
way from incoming information [40.17]. They can pro- 
cess both data and knowledge in a supervised and/or 
unsupervised way. ECOS learn local models from data 
through clustering of the data and associating a local 
output function for each cluster represented in a connec- 
tionist structure. They can learn incrementally single 
data items or chunks of data and also incrementally 
change their input features [40.18]. 

Elements of ECOS have been proposed as part of
the early, classical NN models, such as Kohonen's
self-organizing maps (SOM) [40.20], radial basis
function (RBF) networks [40.21], Fuzzy ARTMAP [40.22] by
Carpenter et al., Fritzke's growing neural gas [40.23], and
Platt's resource allocation networks (RAN) [40.24].

Some principles of ECOS are: 


• Neurons are created (evolved) and allocated as
  centers of (fuzzy) data clusters. Fuzzy clustering, as
  a means to create local knowledge-based systems,
  was stimulated by the pioneering work of Bezdek,
  Yager and Filev [40.27-30].

• Local models are evolved and updated in these
  clusters.

Here we will briefly illustrate the concepts of
ECOS on two implementations: evolving fuzzy neural
networks (EFuNN) [40.16] and dynamic neuro-fuzzy
inference systems (DENFIS) [40.25]. Examples
of EFuNN are shown in Figs. 40.2 and 40.3 and of
DENFIS in Figs. 40.4 and 40.5.

Fig. 40.3 An EFuNN structure with feedback connections
(after [40.16]). Layers shown: fuzzy input layer, rule (case)
layer, fuzzy outputs

Fig. 40.4a,b Learning in DENFIS uses the evolving clustering
method, illustrated on a simple example of 2 inputs and 1 output
with 11 data clusters evolved. The recall of the DENFIS for two
new input vectors x1 and x2 is illustrated with the use of the
3 closest clusters to each new input vector (after [40.25]).
(a) Fuzzy rule group 1 for a DENFIS. (b) Fuzzy rule group 2 for
a DENFIS

Fig. 40.5 An example of the DENFIS model (after [40.26]) for
medical renal function evaluation (screenshot of GFR-ECOS:
Evolving Medical Decision Support System)

In ECOS, clusters of data are created (evolved) based on
similarity between data samples (input vectors) either in
the input space (this is the case in some of the ECOS
models, e.g., DENFIS), or in both the input and output
space (this is the case, e.g., in the EFuNN models).
Samples that have a distance to an existing node (cluster
center, rule node, neuron) less than a certain threshold
are allocated to the same cluster. Samples that do not fit
into existing clusters form (generate, evolve) new
clusters. Cluster centers are continuously adjusted
according to new data samples, and new clusters are
created incrementally. ECOS learn from data and
automatically create or update a local (fuzzy)
model/function in each cluster, e.g.,


IF <data is in a (fuzzy) cluster C_i>
THEN <the model is F_i> , (40.2)


where F_i can be a fuzzy value, a linear or logistic
regression function (Fig. 40.5), or an NN model [40.25].

ECOS utilize evolving clustering methods. There
is no fixed number of clusters specified a priori;
clusters are created and updated incrementally.
Other ECOS that use this principle are: evolving
self-organized maps (ESOM) [40.17], evolving
classification function [40.18, 26], and evolving spiking
neural networks (Sect. 40.3).
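
The threshold-based allocation described above can be sketched as a one-pass procedure. This is a simplified illustration and not the evolving clustering method of [40.25]: the Euclidean distance, the running-mean center update, and the name `evolve_clusters` are assumptions made for the example.

```python
import math


def evolve_clusters(samples, threshold):
    """One-pass clustering sketch: returns a list of (center, count) pairs.

    A sample closer than `threshold` to its nearest center joins that
    cluster and shifts the center; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: [center as list of floats, sample count]
    for x in samples:
        # Find the nearest existing cluster center, if any.
        best, best_dist = None, math.inf
        for c in clusters:
            d = math.dist(x, c[0])
            if d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist < threshold:
            # Allocate the sample and move the center toward it
            # (running mean of the cluster members; an assumed update rule).
            best[1] += 1
            best[0] = [m + (xi - m) / best[1] for m, xi in zip(best[0], x)]
        else:
            # The sample fits no existing cluster: evolve a new one.
            clusters.append([list(x), 1])
    return [(c[0], c[1]) for c in clusters]


data = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1), (0.0, 0.1)]
for center, count in evolve_clusters(data, threshold=0.5):
    print(center, count)
```

The number of clusters is not fixed in advance; it depends only on the data stream and the chosen threshold.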

As an example, the following are the major steps for 
the training and recall of a DENFIS model: 


Training:
1. Create or update a cluster from incoming data.
2. Create or update a Takagi-Sugeno fuzzy rule for
   each cluster:
   IF x is in cluster C_j THEN y_j = f_j(x),
   where $y_j = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$.


The function coefficients are incrementally updated
with every new input vector or after a chunk of data.

Recall (fuzzy inference for a new input vector):


1. For a new input vector x = [x1, x2, ..., xq] DEN- 
FIS chooses m fuzzy rules from the whole fuzzy 
rule set for forming a current inference system. 

2. The inference result is 

Di=1,m [wi fi(xl, x2,...,xq)] 


= = ‘ (40.3) 
y Xi=1,m Ol 


where i is the index of one of the m closets to 
the new input vector x clusters, wi = 1 — di is the 
weighted distance between this vector the cluster 
center, fi(x) is the calculated output for x according 
to the local model fi for cluster i. 
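The recall step of (40.3) can be illustrated with a short sketch. It assumes that each rule is stored as a pair (cluster center, coefficient list), that distances are Euclidean and already normalized so that every $d_i$ is below 1, and that the function name `denfis_recall` is purely illustrative; none of these details are specified in the text above.

```python
import math


def denfis_recall(x, rules, m):
    """Weighted first-order Takagi-Sugeno inference, following (40.3).

    rules: list of (center, betas) pairs with betas = [b0, b1, ..., bq]
           (an assumed storage layout).
    m:     number of closest rules used for the current inference system.
    """
    # Choose the m rules whose cluster centers are closest to x.
    nearest = sorted(rules, key=lambda rule: math.dist(x, rule[0]))[:m]
    numerator = 0.0
    denominator = 0.0
    for center, betas in nearest:
        d = math.dist(x, center)  # assumed to be normalized to [0, 1)
        w = 1.0 - d               # w_i = 1 - d_i
        # Local linear model f_i(x) = b0 + b1*x1 + ... + bq*xq.
        f = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
        numerator += w * f
        denominator += w
    return numerator / denominator
```

With m = 1 the output is simply that of the closest local model; larger m blends neighboring models and smooths the transition between clusters.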




40.2.3 From Local to Transductive (Individualized) Learning and Modeling


A special direction of ECOS is transductive reasoning and personalized modeling. Instead of building a set of local models fi (e.g., prototypes) to cover the whole problem space and then using these models to classify/predict any new input vector, in transductive modeling a new model fx is created for every new input vector x, based on selected nearest neighbor vectors from the available data. Examples of such ECOS models are neuro-fuzzy inference systems (NFI) [40.31] and the transductive weighted neuro-fuzzy inference system (TWNFI) [40.32]. In TWNFI, for every new input vector the neighborhood of the closest data vectors is optimized using both the distance between the new vector and the neighboring ones and the weighted importance of the input variables, so that the error of the model is minimized in the neighborhood area [40.33]. TWNFI is a further development of the weighted-weighted k-nearest neighbor method (WWKNN) proposed in [40.34]. In WWKNN, the output for a new input vector is calculated based on the outputs of the k-nearest neighbors, where the weighting is based on both distance and a priori calculated importance for each variable using a ranking method such as signal-to-noise ratio or the t-test.
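The double weighting in WWKNN can be sketched as follows. The variable importances are assumed to be given (e.g., from a signal-to-noise ranking computed beforehand), and the specific neighbor weighting used here, which decreases linearly from the closest to the farthest of the k neighbors, is one plausible choice made for illustration rather than a formula quoted from the text. The name `wwknn_predict` is hypothetical.

```python
import math


def wwknn_predict(x, data, targets, importance, k):
    """Predict the output for x from its k nearest neighbors (sketch).

    importance: one non-negative weight per input variable (assumed to be
                precomputed, e.g., by a signal-to-noise ranking).
    """
    def weighted_dist(a, b):
        # First weighting: variables contribute according to importance.
        return math.sqrt(sum(c * (ai - bi) ** 2
                             for c, ai, bi in zip(importance, a, b)))

    pairs = sorted(((weighted_dist(x, s), t) for s, t in zip(data, targets)),
                   key=lambda p: p[0])[:k]
    d_min, d_max = pairs[0][0], pairs[-1][0]
    if d_max == 0.0:
        # All selected neighbors coincide with x: weight them equally.
        weights = [1.0] * len(pairs)
    else:
        # Second weighting (assumed form): closer neighbors count more.
        weights = [(d_max - (d - d_min)) / d_max for d, _ in pairs]
    return sum(w * t for w, (_, t) in zip(weights, pairs)) / sum(weights)
```

Setting an importance to zero removes that variable from the distance altogether, which is how an a priori ranking can change which samples count as neighbors.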

Other ECOS have been developed as improvements
of EFuNN, DENFIS, or other early ECOS models by
Ozawa et al. and Watts [40.35-37], including ensembles
of ECOS [40.38]. A similar approach to ECOS
was used by Angelov in the development of the evolving
Takagi-Sugeno (ETS) models [40.39].


40.2.4 Applications 


ECOS have been applied to problems across many domain
areas. Local incremental learning and transductive
learning have been reported to be superior to global
learning models in terms of accuracy and new knowledge
obtained. A review of ECOS applications can be found
in [40.26]. The applications include:


• Medical decision support systems (Fig. 40.5)
• Bioinformatics, e.g., [40.40]
• Neuroinformatics and brain study, e.g., [40.41]
• Evolvable robots, e.g., [40.42]
• Financial and economic decision support systems, e.g., [40.43]
• Environmental and ecological modeling, e.g., [40.44]
• Signal processing, speech, image, and multimodal systems, e.g., [40.45]
• Cybersecurity, e.g., [40.46]
• Multiple time series prediction, e.g., [40.47].


While classical ECOS use a simple McCulloch 
and Pitts model of a neuron and the Hebbian learning 
rule [40.48], evolving spiking neural network (eSNN) 
architectures use a spiking neuron model, applying the 
same or similar ECOS principles. 


40.3 Evolving Spiking Neural Networks (eSNN) 


40.3.1 Spiking Neuron Models 


A single biological neuron with its associated synapses is a complex information processing machine that involves short-term information processing, long-term information storage, and evolutionary information stored as genes in the nucleus of the neuron. A spiking neuron model assumes input information represented as trains of spikes over time. When sufficient input spikes are accumulated in the membrane of the neuron, the neuron's post-synaptic potential exceeds a threshold and the neuron emits a spike at its axon (Fig. 40.6a,b). State-of-the-art models of spiking neurons include early models by Hodgkin and Huxley [40.49] and Hopfield [40.50], and more recent models by Maass, Gerstner, Kistler, Izhikevich, Thorpe and van Ruller [40.51-54]. Such models include spike response models (SRMs), the leaky integrate-and-fire model (LIFM) (Fig. 40.6), Izhikevich models, adaptive LIFM, and probabilistic IFM [40.55].
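The behavior of a leaky integrate-and-fire neuron can be sketched in discrete time. The parameter names and default values below (`weight`, `leak`, `threshold`, `refractory`, `u_reset`) are illustrative assumptions, and the linear leak is a simplification chosen to keep the arithmetic easy to follow.

```python
def simulate_lif(input_spikes, weight=0.3, leak=0.05, threshold=1.0,
                 refractory=2, u_reset=0.0):
    """Discrete-time LIF sketch: returns (potentials, output spike times).

    input_spikes: sequence of 0/1 values, one per time step.
    """
    u = u_reset          # membrane potential
    rest = 0             # remaining refractory steps
    potentials = []
    output_spikes = []
    for t, s in enumerate(input_spikes):
        if rest > 0:
            # Inputs are ignored during the refractory period.
            rest -= 1
        else:
            # Leak toward the reset value, then integrate the input spike.
            u = max(u_reset, u - leak) + weight * s
            if u >= threshold:
                output_spikes.append(t)
                u = u_reset
                rest = refractory
        potentials.append(u)
    return potentials, output_spikes
```

With a constant input spike train and the defaults above, the potential climbs in steps until it crosses the threshold, the neuron fires, and the cycle restarts after the refractory period.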


40.3.2 Evolving Spiking Neural Networks 
(eSNN) 


Based on the ECOS principles, an evolving spiking neural network architecture (eSNN) was proposed in [40.26], which was initially designed as a visual pattern recognition system. The first eSNNs were based on Thorpe's neural model [40.54], in which the importance of early spikes (after the onset of a certain stimulus) is boosted, called rank-order coding and learning. Synaptic plasticity is employed by a fast supervised one-pass learning algorithm. An exemplar eSNN for multimodal auditory-visual information processing on the case study problem of speaker authentication is shown in Fig. 40.7.

Fig. 40.6 (a) LIFM of a spiking neuron. (b) The LIFM increases its membrane potential u(t) with every incoming spike at time t until the potential reaches a threshold, after which the neuron emits an output spike and its potential is reset to an initial value

Fig. 40.7 An exemplar eSNN for multimodal auditory-visual information processing in the case study problem of speaker authentication (after [40.56])

Different eSNN models use different architectures. Figure 40.8 shows a reservoir-based eSNN for spatio-temporal pattern recognition where the reservoir [40.57] uses the spike-time-dependent plasticity (STDP) learning rule [40.58], and the output classifier that classifies spatio-temporal activities of the reservoir uses the rank-order learning rule [40.54].

Fig. 40.8 A reservoir-based eSNN for spatio-temporal pattern classification (after [40.55])


40.3.3 Extracting Fuzzy Rules from eSNN


Extracting fuzzy rules from an eSNN would make eSNN not only efficient learning models, but also knowledge-based models. A method was proposed in [40.59] and is illustrated in Fig. 40.9a,b. Fuzzy rules are extracted based on the connection weights w between the receptive field layer L1 and the class output neuron layer L2.


40.3.4 eSNN Applications


Different eSNN models and systems have been developed for different applications, such as:


• eSNN for spatio- and spectro-temporal pattern recognition (http://ncs.ethz.ch/projects/evospike)
• Dynamic eSNN (deSNN) for moving object recognition [40.60]
• Spike pattern association neuron (SPAN) for generation of precise time spike sequences as a response to recognized input spiking patterns [40.61]
• Environmental and ecological modeling [40.44]
• EEG data modeling [40.62]
• Neuromorphic SNN hardware [40.63, 64]
• Neurogenetic models (Sect. 40.4).

Fig. 40.9 (a) A simple structure of an eSNN for 2- 
class classification based on one input variable using six 
receptive fields to convert the input values into spike trains. 
(b) The connection weights of the connections to class Ci 
and Cj output neurons, respectively, are interpreted as 
fuzzy rules. IF (input variable v is SMALL) THEN class
Ci; IF (v is LARGE) THEN class Cj


A review of eSNN methods, systems and their ap- 
plications can be found in [40.65]. 
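The one-pass, rank-order learning mentioned in Sect. 40.3.2 can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the parameter names `mod`, `c`, and `sim`, the use of Euclidean distance between weight vectors as the similarity test, and the averaging merge are choices made for the example.

```python
import math


def train_esnn(samples, labels, mod=0.8, c=0.7, sim=0.1):
    """One-pass rank-order learning sketch for an evolving output layer.

    samples: each sample lists input-neuron indices ordered by firing time
             (earliest spike first); this encoding is assumed to be given.
    mod:     modulation factor in (0, 1); earlier spikes get larger weights.
    c:       fraction of the maximal potential used as firing threshold.
    sim:     weight-distance below which two same-class neurons are merged.
    """
    n_inputs = 1 + max(i for order in samples for i in order)
    repository = []  # evolved output neurons
    for order, label in zip(samples, labels):
        w = [0.0] * n_inputs
        for rank, i in enumerate(order):
            w[i] = mod ** rank
        psp_max = sum(w[i] * mod ** rank for rank, i in enumerate(order))
        new = {"w": w, "theta": c * psp_max, "label": label, "n": 1}
        for neuron in repository:
            if neuron["label"] == label and math.dist(neuron["w"], w) < sim:
                # Merge: running average of weights and thresholds.
                n = neuron["n"]
                neuron["w"] = [(a * n + b) / (n + 1)
                               for a, b in zip(neuron["w"], w)]
                neuron["theta"] = (neuron["theta"] * n + new["theta"]) / (n + 1)
                neuron["n"] = n + 1
                break
        else:
            # No similar neuron of this class exists: evolve a new one.
            repository.append(new)
    return repository
```

Each training sample is seen once, and the repository grows only when a sample is not already represented, which is what makes the output layer evolving.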




40.4 Computational Neuro-Genetic Modeling (CNGM) 


40.4.1 Principles 


A neuro-genetic model of a neuron was proposed 
in [40.41, 66]. It utilizes information about how some 
proteins and genes affect the spiking activities of a neu- 
ron such as fast excitation, fast inhibition, slow exci- 
tation, and slow inhibition. An important part of the 
model is a dynamic gene/protein regulatory network 
(GRN) model of the dynamic interactions between 
genes/proteins over time that affect the spiking activity
of the neuron (Fig. 40.10).

A CNGM is a dynamical model that has two dy- 
namical sub-models: 


• GRN, which models dynamical interaction between genes/proteins at a time scale T1
• eSNN, which models dynamical interaction between spiking neurons at a time scale T2.


The two sub-models interact over time. 


40.4.2 The NeuroCube Framework 


A further development of the eSNN and the CNGM was achieved with the introduction of the NeuroCube framework [40.67]. The main idea is to support the creation of multi-modular integrated systems, where different modules, consisting of different neuronal types and genetic parameters, correspond in a way to different parts of the brain and different functions (e.g., vision, sensory information processing, sound recognition, motor-control), and the whole system works in an integrated mode for brain signal pattern recognition. A concrete model built with the use of the NeuroCube would have a specific structure and a set of algorithms depending on the problem and the application conditions, e.g., classification of EEG, recognition of functional magnetic resonance imaging (fMRI) data, brain-computer interfaces, emotional cognitive robotics, and modeling Alzheimer's disease.

Fig. 40.10 A schematic diagram of a computational neuro-genetic modeling (CNGM) framework consisting of a gene/protein regulatory network (GRN) as part of an eSNN (after [40.41])

A block diagram of the NeuroCube framework is 
shown in Fig. 40.11. It consists of the following mod- 
ules: 


• An input information encoding module
• A NeuroCube module
• An output module
• A gene regulatory network (GRN) module.


The main principles of the NeuroCube framework are:


1. NeuroCube is a framework to model brain data (and not a brain model or a brain map).

2. NeuroCube is a selective, approximate mapping of the brain regions relevant to the brain data, along with relevant genetic information, into a 3-D spiking neuronal structure.

3. An initial NeuroCube structure can include known connections between different areas of the brain.

4. Two types of data are used both to train a particular NeuroCube and to recall it on new data: (a) data measuring the activity of the brain when certain stimuli are presented, e.g., EEG or fMRI; (b) direct stimuli data, e.g., sound, spoken language, video data, tactile data, odor data, etc.

5. A NeuroCube architecture consists of a NeuroCube module, GRNs at the lowest level, and a higher-level evaluation (classification) module.

6. Different types of neurons and learning rules can be used in different areas of the architecture.

7. Memory of the system is represented as a combination of: (a) short-term memory, represented as changes of the neuronal membranes and temporary changes of synaptic efficacy; (b) long-term memory, represented as a stable establishment of synaptic efficacy; (c) genetic memory, represented as a change in the genetic code and the gene/protein expression level as a result of the above short-term and long-term memory changes and evolutionary processes.

8. Parameters in the NeuroCube are defined by genes/proteins that form dynamic GRN models.

9. NeuroCube can potentially capture in its internal representation both spatial and temporal characteristics from multimodal brain data.

10. The structure and the functionality of a NeuroCube architecture evolve in time from incoming data.


Fig. 40.11 The NeuroCube framework (after [40.67])


40.4.3 Quantum-Inspired Optimization of eSNN and CNGM


A CNGM has a large number of parameters that need to be optimized for efficient performance. Quantum-inspired optimization methods are suitable for this purpose, as they can deal with a large number of variables and have been reported to converge faster than other optimization algorithms [40.68]. Quantum-inspired eSNN (QeSNN) use the principle of superposition of states to represent and optimize features (input variables) and parameters of the eSNN, including genes in a GRN [40.44]. They are optimized through a quantum-inspired genetic algorithm [40.44] or a quantum-inspired particle swarm optimization algorithm [40.69]. Features are represented as qubits in a superposition of 1 (selected), with a probability α, and 0 (not selected), with a probability β. When the model has to be calculated, the quantum bits collapse to 1 or 0.
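The qubit representation of features can be illustrated with a small probabilistic sketch. It keeps one selection probability per feature (playing the role of α, with β = 1 − α), collapses the vector into binary masks, and shifts the probabilities toward the best mask found so far. The update rule, the clipping bounds, and the names `collapse` and `qi_feature_selection` are assumptions made for illustration and do not reproduce the algorithms of [40.44] or [40.69].

```python
import random


def collapse(probs, rng):
    """Collapse each qubit to 1 (feature selected) with its probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def qi_feature_selection(fitness, n_features, generations=50, pop=10,
                         step=0.05, seed=0):
    """Quantum-inspired feature selection sketch (hypothetical helper).

    fitness: callable that scores a binary feature mask (higher is better).
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # equal superposition of 0 and 1
    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        for _ in range(pop):
            mask = collapse(probs, rng)
            fit = fitness(mask)
            if fit > best_fit:
                best_mask, best_fit = mask, fit
        # Assumed update: nudge each probability toward the best mask,
        # clipped so that no feature becomes permanently fixed.
        probs = [min(0.95, max(0.05, p + (step if bit else -step)))
                 for p, bit in zip(probs, best_mask)]
    return best_mask, best_fit
```

In a QeSNN setting the fitness would be the accuracy of an eSNN trained on the selected features; here it can be any callable that scores a mask.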


40.4.4 Applications of CNGM 


Various applications of CNGM have been developed 
such as: 


• Modeling brain diseases [40.41, 70]
• EEG and fMRI spatio-temporal pattern recognition [40.67].


40.5 Conclusions and Further Directions 


This chapter presented a brief overview of the main principles of a class of neural networks called evolving connectionist systems (ECOS) along with their applications for computational intelligence. ECOS facilitate fast and accurate learning from data and new knowledge discovery across application areas. They integrate principles from neural networks, fuzzy systems, evolutionary computation, and quantum computing. The future directions and applications of ECOS are foreseen as a further integration of principles from information science, bioinformatics, and neuroinformatics [40.71].




References

40.1 N. Kasabov: Evolving fuzzy neural networks - Algorithms, applications and biological motivation. In: Methodologies for the Conception, Design and Application of Soft Computing, ed. by T. Yamakawa, G. Matsumoto (World Scientific, Singapore 1998) pp. 271-274
40.2 N. Kasabov: Foundations of Neural Networks, Fuzzy Systems and Knowledge Engineering (MIT, Cambridge 1996) p. 550
40.3 N. Kasabov, S. Shishkov: A connectionist production system with partial match and its use for approximate reasoning, Connect. Sci. 5(3/4), 275-305 (1993)
40.4 N. Kasabov: Hybrid connectionist production system, J. Syst. Eng. 3(1), 15-21 (1993)
40.5 L.A. Zadeh: Fuzzy sets, Inf. Control 8, 338-353 (1965)
40.6 L.A. Zadeh: Fuzzy logic, IEEE Computer 21, 83-93 (1988)
40.7 L.A. Zadeh: A theory of approximate reasoning. In: Machine Intelligence, Vol. 9, ed. by J.E. Hayes, D. Michie, L.J. Mikulich (Ellis Horwood, Chichester 1979) pp. 149-194
40.8 N. Kasabov: Incorporating neural networks into production systems and a practical approach towards realisation of fuzzy expert systems, Comput. Sci. Inf. 21(2), 26-34 (1991)
40.9 N. Kasabov: Hybrid connectionist fuzzy production systems - Towards building comprehensive AI, Intell. Autom. Soft Comput. 1(4), 351-360 (1995)
40.10 N. Kasabov: Connectionist fuzzy production systems, Lect. Notes Artif. Intell. 847, 114-128 (1994)
40.11 N. Kasabov: Hybrid connectionist fuzzy systems for speech recognition and the use of connectionist production systems, Lect. Notes Artif. Intell. 1011, 19-33 (1995)
40.12 T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi: A neo fuzzy neuron and its application to system identification and prediction of the system behaviour, Proc. 2nd Int. Conf. Fuzzy Log. Neural Netw. (Iizuka, Japan 1992) pp. 477-483
40.13 T. Yamakawa, S. Tomoda: A fuzzy neuron and its application to pattern recognition, Proc. 3rd IFSA Congr., ed. by J. Bezdek (Seattle, Washington 1989) pp. 1-9
40.14 T. Furuhashi, T. Hasegawa, S. Horikawa, Y. Uchikawa: An adaptive fuzzy controller using fuzzy neural networks, Proc. 5th IFSA World Congr. Seoul (1993) pp. 769-772
40.15 N. Kasabov, J.S. Kim, M. Watts, A. Gray: FuNN/2 - A fuzzy neural network architecture for adaptive learning and knowledge acquisition, Inf. Sci. 101(3/4), 155-175 (1997)
40.16 N. Kasabov: Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning, IEEE Trans. Syst. Man Cybern. B 31(6), 902-918 (2001)
40.17 D. Deng, N. Kasabov: On-line pattern analysis by evolving self-organising maps, Neurocomputing 51, 87-103 (2003)
40.18 N. Kasabov: Evolving Connectionist Systems: Methods and Applications in Bioinformatics, Brain Study and Intelligent Machines, Perspectives in Neural Computing (Springer, Berlin, Heidelberg 2003)
40.19 M. Watts: A decade of Kasabov's evolving connectionist systems: A review, IEEE Trans. Syst. Man Cybern. C 39(3), 253-269 (2009)
40.20 T. Kohonen: Self-Organizing Maps, 2nd edn. (Springer, Berlin, Heidelberg 1997)
40.21 F. Girosi: Regularization theory, radial basis functions and networks. In: From Statistics to Neural Networks, ed. by V. Cherkassky, J.H. Friedman, H. Wechsler (Springer, Heidelberg 1994) pp. 166-187
40.22 G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, D.B. Rosen: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analogue multidimensional maps, IEEE Trans. Neural Netw. 3(5), 698-713 (1991)
40.23 B. Fritzke: A growing neural gas network learns topologies, Adv. Neural Inf. Process. Syst. 7, 625-632 (1995)
40.24 J. Platt: A resource allocating network for function interpolation, Neural Comput. 3, 213-225 (1991)
40.25 N. Kasabov, Q. Song: DENFIS: Dynamic, evolving neural-fuzzy inference systems and its application for time-series prediction, IEEE Trans. Fuzzy Syst. 10, 144-154 (2002)
40.26 N. Kasabov: Evolving Connectionist Systems: The Knowledge Engineering Approach (Springer, Berlin, Heidelberg 2007)
40.27 J. Bezdek: A review of probabilistic, fuzzy, and neural models for pattern recognition, J. Intell. Fuzzy Syst. 1, 1-25 (1993)
40.28 J. Bezdek (Ed.): Analysis of Fuzzy Information (CRC, Boca Raton 1987)
40.29 J. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum, New York 1981)
40.30 R.R. Yager, D. Filev: Generation of fuzzy rules by mountain clustering, J. Intell. Fuzzy Syst. 2, 209-219 (1994)
40.31 Q. Song, N. Kasabov: NFI: A neuro-fuzzy inference method for transductive reasoning, IEEE Trans. Fuzzy Syst. 13(6), 799-808 (2005)
40.32 Q. Song, N. Kasabov: TWNFI - A transductive neuro-fuzzy inference system with weighted data normalisation for personalised modelling, Neural Netw. 19(10), 1591-1596 (2006)
40.33 N. Kasabov, Y. Hu: Integrated optimisation method for personalised modelling and case studies for medical decision support, Int. J. Funct. Inf. Pers. Med. 3(3), 236-256 (2010)
40.34 N. Kasabov: Global, local and personalised modelling and profile discovery in bioinformatics: An integrated approach, Pattern Recognit. Lett. 28(6), 673-685 (2007)
40.35 S. Ozawa, S. Pang, N. Kasabov: On-line feature selection for adaptive evolving connectionist systems, Int. J. Innov. Comput. Inf. Control 2(1), 181-192 (2006)
40.36 S. Ozawa, S. Pang, N. Kasabov: Incremental learning of feature space and classifier for online pattern recognition, Int. J. Knowl. Intell. Eng. Syst. 10, 57-65 (2006)
40.37 M. Watts: Evolving Connectionist Systems: Characterisation, Simplification, Formalisation, Explanation and Optimisation, Ph.D. Thesis (University of Otago, Dunedin 2004)
40.38 N.L. Mineu, A.J. da Silva, T.B. Ludermir: Evolving neural networks using differential evolution with neighborhood-based mutation and simple subpopulation scheme, Proc. Braz. Symp. Neural Netw. SBRN (2012) pp. 190-195
40.39 P. Angelov: Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems (Springer, Berlin, Heidelberg 2002)
40.40 N. Kasabov: Adaptive modelling and discovery in bioinformatics: The evolving connectionist approach, Int. J. Intell. Syst. 23, 545-555 (2008)
40.41 L. Benuskova, N. Kasabov: Computational Neuro-Genetic Modelling (Springer, Berlin, Heidelberg 2007)
40.42 L. Huang, Q. Song, N. Kasabov: Evolving connectionist system based role allocation for robotic soccer, Int. J. Adv. Robot. Syst. 5(1), 59-62 (2008)
40.43 N. Kasabov: Adaptation and interaction in dynamical systems: Modelling and rule discovery through evolving connectionist systems, Appl. Soft Comput. 6(3), 307-322 (2006)
40.44 S. Schliebs, M. Defoin-Platel, S.P. Worner, N. Kasabov: Integrated feature and parameter optimization for evolving spiking neural networks: Exploring heterogeneous probabilistic models, Neural Netw. 22, 623-632 (2009)
40.45 N. Kasabov, E. Postma, J. van den Herik: AVIS: A connectionist-based framework for integrated auditory and visual information processing, Inf. Sci. 123, 127-148 (2000)
40.46 S. Pang, T. Ban, Y. Kadobayashi, N. Kasabov: LDA merging and splitting with applications to multi-agent cooperative learning and system alteration, IEEE Trans. Syst. Man Cybern. B 42(2), 552-564 (2012)
40.47 H. Widiputra, R. Pears, N. Kasabov: Multiple time-series prediction through multiple time-series relationships profiling and clustered recurring trends, Lect. Notes Artif. Intell. 6635, 161-172 (2011)
40.48 D. Hebb: The Organization of Behavior (Wiley, New York 1949)
40.49 A.L. Hodgkin, A.F. Huxley: A quantitative description of membrane current and its application to conduction and excitation in nerve, J. Physiol. 117, 500-544 (1952)


40.50 J. Hopfield: Pattern recognition computation using action potential timing for stimulus representation, Nature 376, 33-36 (1995)
40.51 W. Maass: Computing with spiking neurons. In: Pulsed Neural Networks, ed. by W. Maass, C.M. Bishop (MIT, Cambridge 1998) pp. 55-81
40.52 W. Gerstner: Time structure of the activity of neural network models, Phys. Rev. E 51, 738-758 (1995)
40.53 E.M. Izhikevich: Which model to use for cortical spiking neurons?, IEEE Trans. Neural Netw. 15(5), 1063-1070 (2004)
40.54 S. Thorpe, A. Delorme, R. van Ruller: Spike-based strategies for rapid processing, Neural Netw. 14(6/7), 715-725 (2001)
40.55 N. Kasabov: To spike or not to spike: A probabilistic spiking neuron model, Neural Netw. 23(1), 16-19 (2010)
40.56 S. Wysoski, L. Benuskova, N. Kasabov: Evolving spiking neural networks for audiovisual information processing, Neural Netw. 23(7), 819-836 (2010)
40.57 D. Verstraeten, B. Schrauwen, M. d'Haene, D. Stroobandt: An experimental unification of reservoir computing methods, Neural Netw. 20(3), 391-403 (2007)
40.58 S. Song, K. Miller, L. Abbott: Competitive Hebbian learning through spike-timing-dependent synaptic plasticity, Nat. Neurosci. 3, 919-926 (2000)
40.59 S. Soltic, N. Kasabov: Knowledge extraction from evolving spiking neural networks with rank order population coding, Int. J. Neural Syst. 20(6), 437-445 (2010)
40.60 N. Kasabov, K. Dhoble, N. Nuntalid, G. Indiveri: Dynamic evolving spiking neural networks for on-line spatio- and spectro-temporal pattern recognition, Neural Netw. 41, 188-201 (2013)
40.61 A. Mohemmed, S. Schliebs, S. Matsuda, N. Kasabov: SPAN: Spike pattern association neuron for learning spatio-temporal spike patterns, Int. J. Neural Syst. 22(4), 1250012 (2012)
40.62 N. Nuntalid, K. Dhoble, N. Kasabov: EEG classification with BSA spike encoding algorithm and evolving probabilistic spiking neural network, Lect. Notes Comput. Sci. 7062, 451-460 (2011)
40.63 G. Indiveri, B. Linares-Barranco, T.J. Hamilton, A. van Schaik, R. Etienne-Cummings, T. Delbruck, S.-C. Liu, P. Dudek, P. Häfliger, S. Renaud, J. Schemmel, G. Cauwenberghs, J. Arthur, K. Hynna, F. Folowosele, S. Saighi, T. Serrano-Gotarredona, J. Wijekoon, Y. Wang, K. Boahen: Neuromorphic silicon neuron circuits, Front. Neurosci. 5, 5 (2011)
40.64 G. Indiveri, E. Chicca, R.J. Douglas: Artificial cognitive systems: From VLSI networks of spiking neurons to neuromorphic cognition, Cogn. Comput. 1(2), 119-127 (2009)
40.65 S. Schliebs, N. Kasabov: Evolving spiking neural networks - a survey, Evol. Syst. 4(2), 87-98 (2013)
40.66 N. Kasabov, L. Benuskova, S. Wysoski: A computational neurogenetic model of a spiking neuron, Neural Netw. IJCNN'05. Proc. (2005) pp. 446-451


40.67 N. Kasabov: NeuCube EvoSpike architecture for spatio-temporal modelling and pattern recognition of brain signals, Lect. Notes Comput. Sci. 7477, 225-243 (2012)
40.68 M. Defoin-Platel, S. Schliebs, N. Kasabov: Quantum-inspired evolutionary algorithm: A multi-model EDA, IEEE Trans. Evol. Comput. 13(6), 1218-1232 (2009)
40.69 H. Nuzly, A. Hamed, S.M. Shamsuddin: Probabilistic evolving spiking neural network optimization using dynamic quantum inspired particle swarm optimization, Aust. J. Intell. Inf. Process. Syst. 11(1), 5-15 (2010)
40.70 N. Kasabov, R. Schliebs, H. Kojima: Probabilistic computational neurogenetic framework: From modelling cognitive systems to Alzheimer's disease, IEEE Trans. Auton. Ment. Dev. 3(4), 300-311 (2011)
40.71 N. Kasabov (Ed.): Springer Handbook of Bio/Neuro-informatics (Springer, Berlin, Heidelberg 2014)


783 


41. Machine Learning Applications 


Piero P. Bonissone 


We describe the process of building computational intelligence (CI) models for machine learning (ML) applications. We use offline metaheuristics to design the models' run-time architectures and online metaheuristics to control/aggregate the object-level models (base models) in these architectures. CI techniques complement more traditional statistical techniques, which are the core of ML for unsupervised and supervised learning. We analyze CI/ML industrial applications in the area of prognostics and health management (PHM) for industrial assets, and describe two PHM case studies. In the first case, we address anomaly detection for aircraft engines; in the second one, we rank locomotives in a fleet according to their expected remaining useful life. Then, we illustrate similar CI-enabled capabilities as they are applied to risk management for commercial and financial assets. In this context, we describe three case studies in insurance underwriting, mortgage collateral valuation, and portfolio optimization. We explain the current trend favoring the use of model ensemble and fusion over individual models, and emphasize the need for injecting diversity during the model generation phase. We present a model-agnostic fusion mechanism, which can be used with commoditized models obtained from crowdsourcing, cloud-based evolution, and other sources. Finally, we explore research trends, and future challenges/opportunities for ML techniques in the emerging context of big data and cloud computing.


41.1 Motivation
  41.1.1 Building Computational Intelligence Object- and Meta-Models
  41.1.2 Model Lifecycle
41.2 Machine Learning (ML) Functions
41.3 CI/ML Applications in Industrial Domains: Prognostics and Health Management (PHM)
  41.3.1 Health Assessment and Anomaly Detection: An Unsupervised Learning Problem
  41.3.2 [...] Diagnostics: [...]
  41.3.3 [...]: A Regression Problem
  41.3.4 Health Management - Fault Accommodation and Optimization
41.4 CI/ML Applications in Financial Domains: Risk Management
  41.4.1 Automation [...]
  41.4.2 Mortgage Collateral Valuation: A Regression Problem
  41.4.3 Portfolio Rebalancing: An Optimization Problem
41.5 Model Ensembles and Fusion
  41.5.1 Motivations for Model Ensembles
  41.5.2 Construction [...]
  41.5.3 [...] in the Model Ensembles
  41.5.4 Lazy Meta-Learning: A Model-Agnostic Fusion Mechanism
41.6 Summary and Future Research Challenges
  41.6.1 Future Research Challenges
References




41.1 Motivation 


Based on the definition provided by the IEEE Compu- 
tational Intelligence Society, computational intelligence 
(CI) covers biologically and linguistically motivated 
computational paradigms. Its scope broadly overlaps 
with that of soft computing (SC), a similar concept also 
conceived in the 1990s. The original definition of soft 
computing was [41.1]: 


An association of computing methodologies that 
includes as its principal members fuzzy logic 
(FL), neuro-computing (NC), evolutionary comput- 
ing (EC) and probabilistic computing (PC). 


Thus, in its original scope, CI excluded probabilistic
reasoning systems, while including other nature-inspired
methodologies, such as swarm computing, ant
colony optimization, etc. More recently, however, CI
has extended its scope to include statistically inspired
machine-learning techniques. Throughout this review,
we will adopt this less restrictive definition of CI
techniques, increasing its overlap with SC even
more [41.2,3]. Readers interested in the historical
origins of the CI concept should consult [41.4-6].

In addressing real-world problems, we usually deal 
with physical systems that are difficult to model and 
possess large solution spaces. In these situations, we 
leverage two types of resources: domain knowledge 
of the process or product and field data that charac- 
terize the system’s behavior. The relevant engineering 
knowledge tends to be a combination of first princi- 
ples and empirical knowledge. Usually, it is captured in 
physics-based models, which tend to be more precise 
than data-driven models, but more difficult to con- 
struct and maintain. The available data are typically 
a collection of input-output measurements, represent- 
ing instances of the system’s behavior. Usually, data 
tend to be incomplete and noisy. Therefore, we often 
augment knowledge-driven models by integrating them 
with approximate solutions derived from CI methodolo- 
gies, which are robust to this type of imperfect data. CI 
is a flexible framework that offers a broad spectrum of 
design choices to perform such integration. 

Domain knowledge can be integrated within CI 
models in a variety of ways. Arguably, the simplest in- 
tegration is the use of physics-based models (derived 
from domain knowledge) to predict expected values of 
variables of interest. By contrasting the expected val- 
ues with the actual measured values, we compute the 
residuals for the same variables and use CI based mod- 


els to explain the differences. Domain knowledge can 
also be used to design CI-based models: it can influence 
the selection of the features (functions of raw data) that 
are the inputs to the CI models; it can suggest certain 
topologies for graphical models (e.g., NN architectures) 
to approximate known functional dependences; it can 
be represented by linguistic fuzzy terms and
relationships to provide coarse approximations; it can be used
to define data structures of individuals in the
population of an evolutionary algorithm (EA); and it can be used
explicitly in metaheuristics (MHs) that leverage such
knowledge to focus their search in a more efficient way.
For a more detailed discussion of the use of domain 
knowledge in EAs, see [41.7]. 

Computational intelligence started in the 1990s with 
three pillars: neural networks (NNs), to create
functional approximations from input-output training sets;
fuzzy systems, to represent imprecise knowledge and
perform approximate deductions with it; and
evolutionary systems, to create efficient global search methods
based on optimization through adaptation. Over the
last decade, the individual developments of these pil- 
lars have become intertwined, leading to successful 
hybridizations. 


41.1.1 Building Computational Intelligence 
Object- and Meta-Models 


Recently, as described in [41.21], this hybridization has 
been structured as a three-layer approach, in which each 
layer has a specific purpose: 


• Layer 1: Offline MHs. They are used in batch mode,
during the model creation phase, to design, tune,
and optimize run-time model architectures for
deployment. Then they are used to adapt them and
maintain them over time. Examples of offline MHs
are global search methods, such as EAs, scatter
search, tabu search, swarm optimization, etc.

• Layer 2: Online MHs. They are part of the
run-time model architecture, and they are designed
by offline MHs. The online MHs are used to
integrate/interpolate among multiple (local)
object-models, manage their complexity, and improve their
overall performance. Examples of online MHs are
fuzzy supervisory systems, fusion modules, etc.

• Layer 3: Object-level Models. They are also part
of the run-time architecture, and they are designed
by offline MHs to solve object-level problems.
For simpler cases, we use single object-level
models that provide an individual SC functionality
(functional approximation, optimization,
or reasoning with imperfect data). For complex
cases, we use multiple object-level models
in parallel configuration (ensemble) or sequential
configuration (cascade, loop), to integrate
functional approximation with optimization and
reasoning with imperfect data (imprecise and
uncertain).
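The three layers listed above can be sketched in a few lines of Python. This is only an illustrative toy, not the architecture of any application in this chapter: the two local predictors, the ramp-shaped fusion weight, and the use of plain random search as the offline MH (standing in for an EA or similar global search) are all assumptions made for the example.

```python
import random

# Layer 3: object-level models (hypothetical local predictors of y from x).
def model_low(x):
    return 2.0 * x           # assumed accurate for small x

def model_high(x):
    return 3.0 * x - 1.0     # assumed accurate for large x

# Layer 2: online MH, a fusion module that blends the local models at run time.
def fuse(x, crossover):
    # Weight of the "high" model ramps linearly from 0 to 1 around the crossover.
    w_high = min(1.0, max(0.0, x - crossover + 0.5))
    return (1.0 - w_high) * model_low(x) + w_high * model_high(x)

def error(crossover, data):
    """Mean squared error of the fused output on (x, y) pairs."""
    return sum((fuse(x, crossover) - y) ** 2 for x, y in data) / len(data)

# Layer 1: offline MH, here a plain random search over the fusion parameter.
def design_offline(data, trials=200, seed=0):
    rng = random.Random(seed)
    candidates = [0.0] + [rng.uniform(0.0, 2.0) for _ in range(trials)]
    return min(candidates, key=lambda c: error(c, data))

# Hypothetical system used to generate design data.
def true_system(x):
    return 2.0 * x if x < 1.0 else 3.0 * x - 1.0

data = [(0.05 * i, true_system(0.05 * i)) for i in range(41)]
best = design_offline(data)
print("selected crossover:", best, "error:", error(best, data))
```

The offline search runs once, in batch mode, and only its result (the crossover value) is carried into the run-time model; rerunning it on fresh data is one simple way to picture the model update step.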


The underlying idea is to reduce or eliminate man- 
ual intervention in any of these layers, while leveraging 
CI capabilities at every level. We can manage complex- 
ity by finding the best model architecture to support 
problem decomposition, create high-performance lo- 
cal models with limited competence regions, allow 
for smooth interpolations among them, and promote 
robustness to imperfect data by aggregating diverse
models. Let us examine some case studies that further
illustrate this concept.


Examples of Offline MH, Online MH, and Object Models
In Table 41.1, we observe a variety of CI applications in 
which we followed the separation between object- and 
meta-level described earlier. In most of these applica- 
tions, the object-level models were based on different 
technologies such as machine learning (support vector
machines, random forest), statistics (multivariate adaptive
regression splines (MARS), Hotelling’s T²), neural
networks (feedforward, self-organizing maps), fuzzy
systems, EAs, and case-based reasoning. The online metaheuristics
were mostly based on fuzzy aggregation (interpolation) 
of complementary local models or fusion of compet- 
ing models. The offline MHs were mostly implemented 
by evolutionary search in the model design space.
Descriptions of these applications can be found in the
references listed in the last column of Table 41.1. The
five case studies covered in this review are indicated in
the first column of Table 41.1.

Table 41.1 Examples of CI applications at meta-level and object-level

Case study | Problem instance | Problem type | Model design (offline MHs) | Model controller (online MHs) | Object-level models | References
- | Anomaly detection (system) | Classification | Model T-norm tuning | Fuzzy aggregation | Multiple Models: SVM, NN, Case-Based, MARS | [41.8]
- | Anomaly detection (system) | Classification | Manual design | Fusion | Multiple Models: Kolmogorov complexity, SOM, random forest, Hotelling T², AANN | [41.9]
#1 | Anomaly detection (model) | Classification and regression | EA-based tuning of fuzzy supervisory termset | Fuzzy supervisory | Multiple Models: Ensemble of AANNs | [41.10]
#2 | Best units selection | Ranking | EA-based tuning of similarity function | None | Single Model: Fuzzy instance-based models (Lazy Learning) | [41.11, 12]
#3 | Insurance underwriting: Risk management | Classification | EA | Fusion | Multiple Models: NN, Fuzzy, MARS | [41.13, 14]
#4 | Mortgage collateral valuation | Regression | Manual design | Fusion | Multiple Models: ANFIS, Fuzzy CBR, RBF | [41.15]
#5 | Portfolio rebalancing | Multiobjective optimization | Seq. LP | None | Single Model: MOEA (SPEA) | [41.16]
- | Load, HR, NOx forecast | Regression | Multiple CART trees | Fusion | Multiple Models: Ensemble of NNs | [41.17]
- | Aircraft engine fault recovery | Control/Fault accommodation | EA tuning of linear control gains | Crisp supervisory | Multiple Models (Loop): SVM + linear control | [41.18]
- | Power plant optimization | Optimization | Manual design | Fusion | Multiple Models (Loop): MOEA + NNs | [41.19]
- | Flexible manufacturing optimization | Optimization | Manual design | Fuzzy supervisory | Single Model: Genetic Algorithms | [41.20]


41.1.2 Model Lifecycle 


In real-world applications, before using a model in 
a production environment we must address the model’s 
complete life cycle, from its design and implementa- 
tion to its validation, tuning, production testing, use, 
monitoring, and maintenance. By maintenance, we re- 
fer to all the steps required to keep the model vital (e.g., 
nonobsolete) and to adapt it to changes in the
environment in which it is deployed. Many reasons justify
this focus on model maintenance. Over the model’s life
cycle, maintenance costs are by far the most
expensive ones (just as software maintenance costs are the most
expensive ones in the life of a software system).
Furthermore, when dealing with mission-critical software
we need to guarantee continuous operation or at least 
fast recovery from system failures or model obsoles- 
cence to avoid lost revenues and other business costs. 
The use of MHs in the design stage allows us to cre- 
ate a process for automating the model building phase 
and subsequent model updates. This is a critical step to 
quickly deploy and maintain CI models in the field, and 
it will be further described in the case studies. Addi- 
tional information on this topic can be found in [41.22]. 


41.2 Machine Learning (ML) Functions 


Machine learning techniques can be roughly subdivided
into supervised, semisupervised, reinforcement,
and unsupervised learning. The distinction among these
categories depends on whether ground truth (i.e., the
correct output for each input vector) is completely
available, partially available, or absent during the
training phase.

Unsupervised learning techniques are used when 
no ground truth is available. Their goal is to iden- 
tify structures in the input space that could be used 
to decompose the problem and facilitate local model 
building. Typical examples of unsupervised learning are 
cluster analysis, self-organizing maps (SOMs) [41.23], 
and dimension reduction techniques, such as principal 
components analysis (PCA), independent components 
analysis (ICA), multidimensional scaling (MDS), etc. 

Reinforcement learning (RL) does not rely on 
ground truth. It assumes that an agent operates in an 
environment and after performing one or more actions 
it receives a reward that is a consequence of its actions, 
rather than an explicit expression of ground truth. Sut- 
ton and Barto [41.24] were among the first proponents 
of this technique, which is quite promising to model 
adversarial situations, but it has not generated many 
industrial or commercial applications. A succinct de- 
scription of RL can be found in [41.25]. 

Semisupervised and supervised learning techniques 
are used when partial or complete ground truth is avail- 
able, such as labels for classification problems and 
real values for regression problems. There are many
traditional linear models for classification and
regression. For instance, we have linear discriminant analysis
(LDA) and logistic regression (LR) for classification,
and least-squares techniques, combined with feature
subset selection or shrinkage methods (e.g., ridge, least
absolute shrinkage and selection operator (LASSO)),
for regression. CI techniques usually generate
nonlinear solutions to these problems. We can group most
of the commonly used nonlinear techniques as follows:


• Directed graphical models, such as neural
networks, neural fuzzy systems, Bayesian belief
networks, Bayesian neural networks, etc.

• Tree based: Classification and regression
trees (CART) [41.26], ID3/C4.5 [41.27], etc.

• Grammar based: Genetic programming,
evolutionary programming, etc.

• Similarity and metric learning: Lazy learning
(or instance-based learning) [41.28, 29], case-based
reasoning, (fuzzy) k-means, etc.

• Undirected graphical models: Markov graphs,
restricted Boltzmann machines [41.30], etc.


The reader can find a comprehensive treatment of 
these techniques in [41.31]. Some of these models are
used as part of an ensemble, rather than individually.
Such is the case of random forest [41.32], which is 
based on a collection of CART trees [41.26]. Similarly, 
a fuzzy extension of random forest using fuzzy deci- 
sion trees [41.33] can be found in [41.34]. This trend 
toward the use of ensembles is covered in Sect. 41.5. 
We will now focus on a subset of the CI/ML applica- 
tions. Specifically, we will analyze two case studies in 
industrial applications (Sect. 41.3) and three in financial 
applications (Sect. 41.4). 




41.3 CI/ML Applications in Industrial Domains: 
Prognostics and Health Management (PHM) 


To provide a coherent theme for the ML indus- 
trial application, we will focus on prognostics and 
health management (PHM). The main goal of PHM 
for assets such as locomotives, medical scanners, and 
aircraft engines is to maintain these assets’ opera- 
tional performance over time, improving their utiliza- 
tion while minimizing their maintenance cost. This 
tradeoff is critical for the proper execution of
contractual service agreements (CSAs) offered by
original equipment manufacturers (OEMs) to their valued
customers.

PHM is a multidisciplinary field, as it includes aspects
of electrical engineering (reliability, design, service),
computer and decision sciences (artificial intelligence,
CI, ML, statistics, operations research (OR)),
mechanical engineering (geometric models for fault
propagation), material sciences, etc. Within this chapter, we will
focus on the role that CI plays in PHM functionalities.
PHM can be divided into two main components:

• Health assessment: the evaluation and interpretation
of the asset’s current and future health state.

• Health management: the control, operation, and
logistic plans to be implemented in response to such
assessment.

PHM functional architecture is illustrated in
Fig. 41.1, adapted from [41.35].

Fig. 41.1 PHM functional architecture. Left block, health assessment: (1) remote monitoring, (2) data preprocessing, (3) anomaly detection, (4) anomaly identification, (5) diagnostics, (6) prognostics. Right block, health management: (7) fault accommodation (on-board tactical control), (8) logistics decision engine (off-board strategic planning)
The first two tasks:

(1) Remote monitoring, and
(2) Input data preprocessing,

are platform dependent, as they need domain knowledge
to identify and select the most informative inputs,
perform data curation (de-noising, imputation, and
normalization), aggregate them, and prepare them to be
suitable inputs for the models.

The remaining decisional tasks could be considered
platform independent (to the extent that their
functions could be accomplished by data-driven models
alone). These tasks are:

(3) Anomaly detection and identification

(4) Anomaly resolution

(5) Diagnostics

(6) Prognostics

(7) Fault accommodation

(8) Logistics decisions.


Health assessment, which is based on descriptive 
and predictive analytics, is contained in the left block 
of Fig. 41.1 (annotated with P). Health management 
(HM), which is based on prescriptive analytics, is con- 
tained in the right block of Fig. 41.1 (annotated with 
HM). In the remainder of this section, we will cover
two case studies related to anomaly identification and 
prognostics. 


41.3.1 Health Assessment 
and Anomaly Detection: 
An Unsupervised Learning Problem 


Anomaly Detection (AD) 

Using platform-deployed sensors, we collect data re- 
motely. We preprocess it, via segmentation, filtering, 
imputation, validation, and we summarize it by ex- 
tracting feature subsets that provide a more succinct, 
robust representation of its information content. These 
features, which could contain a combination of cat- 
egorical and numerical values, are analyzed by an 
anomaly detection model to assess the degree of abnor- 
mal behavior of each asset in the fleet. If the degree 
of abnormality exceeds a given threshold, the model 
will identify the asset, determine the time when the 
anomaly was first noticed and suggest possible causes 
of the anomaly (usually a coarse identification at the 
systems/subsystem level). Anomaly detection usually 
leverages unsupervised learning techniques, such as 
clustering. Its goal is to extract the underlying structural 
information from the data, define normal structures and 
regions, and identify departures from such regions. 
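As a minimal sketch of the idea in the paragraph above, the degree of abnormality can be taken as the distance from a feature vector to the nearest normal-region centroid. The feature values, the assumption that the normal operating regimes have already been separated (for instance by a clustering step), and the threshold rule with its safety factor of 1.5 are all illustrative choices, not the method used in the case studies.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equally sized feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def abnormality(x, centroids):
    """Degree of abnormality: distance to the nearest normal-region centroid."""
    return min(math.dist(x, c) for c in centroids)

# Hypothetical feature vectors from two normal operating regimes.
regime_a = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
regime_b = [[5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
centroids = [centroid(regime_a), centroid(regime_b)]

# Assumed threshold: largest distance seen in normal data, times a safety factor.
normal = regime_a + regime_b
threshold = 1.5 * max(abnormality(p, centroids) for p in normal)

def is_anomalous(x):
    return abnormality(x, centroids) > threshold

print(is_anomalous([1.1, 1.0]), is_anomalous([3.0, 3.0]))
```

In a fleet setting, the same score would be computed per asset and per time step, so that the first time the threshold is exceeded can be reported along with the asset identity.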


Anomaly Identification (AI)
After detecting an abnormal change, e.g., a departure
from a normal region of the state space, we need to
identify its cause. There are many factors that could
cause such a change:


(a) A system fault, which could eventually lead to a fail- 
ure. 

(b) A sensor fault, which is creating incorrect measure- 
ments. 

(c) An inadequate anomaly detection model, which is 
falsely reporting anomalies due to poor design, in- 
adequate model update, execution outside its region 
of competence, etc. 

(d) A sudden, unexpected operational transient, which
stresses the system by creating an abrupt load
change. This transient could originate from an
operator error, in which the operator requests such a sudden change;
from an incorrect reference vector (in the case of
automated operation), which also requests such an
abrupt change; or from a poorly designed controller,
which either over- or under-compensates for
a perceived state change.


The first factor (system fault) represents a correct 
anomaly classification and should trigger the rest of 
the workflow (diagnostics, prognostics, fault accom- 
modation, and maintenance optimization), while the 
other three factors generate false alarms (false
positives). In the next case study, we will focus on how to
improve the accuracy of an anomaly detection model
(third factor in the list) and decrease the probability of
causing false positives. This increase in model fidelity
will also create a sharper distinction between system 
faults and sensor faults (first and second factors in the 
list). 


Anomaly Detection for Aircraft Engines 
Problem Definition. As noted in Sect. 41.1, one
of the best ways to leverage domain knowledge is
to create expected values using highly tuned
physics-based simulators, compare them with actual values, and
analyze the differences (residuals) using data-driven
models.


Physics-Based Simulator. In this case study, we fo- 
cused on the detection of anomalies in a simulated 
aircraft engine. A component level model (CLM), 
a thermodynamic model that has been widely used to 
simulate the performance of an aircraft engine,
provided the physics-based model. Flight conditions, such
as altitude, Mach number, ambient temperature, and
engine fan speed, and a large variety of model parameters,
such as module efficiency and flow capacity, are inputs
to the CLM. The outputs of the CLM are the values for
pressures, core speed, and temperatures at various
locations of the engine, which simulate sensor measurements.
Realistic values of sensor noise can be added after the 
CLM calculation. In this study, we used a steady state 
CLM model for a commercial, high-bypass, twin spool, 
turbofan engine. 


Actual Values. We used engine data collected under 
cruise conditions to monitor engine health changes. 


Data-Driven Model. We realized that a single, global 
model — regardless of the technology used to implement 
it — would be inadequate for large operating spaces of 
the simulated engine. Global models are designed to 
achieve a compromise among completeness (for cover- 
age), high fidelity (for accuracy), and transparency (for 
maintainability). As a result, we usually end up with 
models that in order to maintain small biases exhibit 
large variability. This variability might be too large to 
distinguish between model error and anomalous system 
behavior and can be a significant factor in the genera- 
tion of false alarms. 


CI-Based Approach. To solve the model fidelity
problem, we decomposed the engine’s operating space into
several, partially overlapping regions and developed 
a set of local models, trained on each region. This 
schema required a supervisory model (or meta model) 
to determine the competence region of each local model 
and select the appropriate one. In control problems, 
this supervisory module typically selects one controller 
(out of a collection of low-level controllers) to close 
the loop with the dynamic system. In many fuzzy
controller applications [41.36, 37], a fuzzy supervisory
module determines the applicability degree of each
low-level controller and interpolates their outputs. Usually,
this is done with a weighted, convex sum of the con- 
trollers’ outputs. The weights used in the convex sum 
are the applicability degrees of the low-level controllers 
in the part of the state space that contains the input. The 
transition from mode selection to mode melting [41.38] 
generates a smoother response surface by avoiding dis- 
continuities. 
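The weighted, convex sum described above can be written out directly. In the sketch below, the triangular membership functions, the one-dimensional operating space, and the constant local outputs (10 and 20) are assumptions chosen only to make the interpolation easy to follow.

```python
def triangular(x, left, peak, right):
    """Triangular membership function on [left, right] with maximum 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def interpolate(x, local_models):
    """Blend local outputs with normalized applicability degrees (a convex sum)."""
    degrees = [mu(x) for mu, _ in local_models]
    total = sum(degrees)
    if total == 0.0:
        raise ValueError("input lies outside every competence region")
    return sum(d / total * model(x) for d, (_, model) in zip(degrees, local_models))

# Hypothetical one-dimensional operating space with two overlapping regions.
local_models = [
    (lambda x: triangular(x, -1.0, 0.0, 1.0), lambda x: 10.0),
    (lambda x: triangular(x, 0.0, 1.0, 2.0), lambda x: 20.0),
]

for x in (0.0, 0.25, 0.5, 0.75):
    print(x, interpolate(x, local_models))
```

Because the weights are normalized to sum to one, the blended output always lies between the local outputs, and it moves continuously from one to the other as the input crosses the overlap; a crisp selector would instead jump at the region boundary.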

We applied the same concept to the problem of im- 
proving the fidelity of data-driven models for anomaly 
detection. First, we decided to use auto-associative NNs 
(AANN) to implement the local models. Then, we de- 
veloped a fuzzy supervisory controller, defining the 
applicability of each AANN as a fuzzy region in the 
engine’s three-dimensional operating space, defined by
altitude, Mach number, and ambient temperature.
Finally, we used an evolutionary algorithm to tune the
term set of the fuzzy supervisory controller and find the best fuzzy
boundaries to interpolate between AANNs with
overlapping applicability.


Local Models. Auto-Associative Neural Networks 
(AANN’s) are feedforward neural networks with struc- 
ture satisfying requirements for performing restricted 
auto-association. The inputs to the AANN go through 
a dimensionality reduction, as their information is com- 
bined and compressed in intermediate layers. For ex- 
ample, in Fig. 41.2 the nine nodes in the input layer are 
reduced to five and then three, in the second layer (en- 
coding) and third layer (bottleneck), respectively. Then, 
the nodes in the 3rd layer are used to recreate the origi- 
nal inputs, by going through a dimensionality expansion 
(fourth layer, decoding, and fifth layer, outputs). In the 
ideal case, the AANN outputs should be identical to the 
inputs. Their difference (residuals) and their gradient 
information are used to train the AANN to minimize 
such difference. 
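A small, self-contained sketch of such a network is given below. It assumes tanh hidden units, a linear output layer, plain stochastic gradient descent, and synthetic nine-sensor data generated from three hidden factors; none of these choices is taken from the case study, and the code is meant only to show how the reconstruction residuals drive the training.

```python
import math
import random

def init_layers(sizes, rng):
    """One (weights, biases) pair per layer; weights[j][i] links input i to unit j."""
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Return the activations of every layer (input included); the output is linear."""
    acts = [x]
    for k, (w, b) in enumerate(layers):
        z = [sum(wi * ai for wi, ai in zip(row, acts[-1])) + bj for row, bj in zip(w, b)]
        last = k == len(layers) - 1
        acts.append(z if last else [math.tanh(v) for v in z])
    return acts

def train_step(layers, x, lr):
    """One stochastic gradient step on the squared reconstruction error."""
    acts = forward(layers, x)
    delta = [o - t for o, t in zip(acts[-1], x)]  # residuals drive the gradient
    for k in range(len(layers) - 1, -1, -1):
        w, b = layers[k]
        a_in = acts[k]
        # Back-propagate through this layer before its weights are changed.
        prev = [sum(w[j][i] * delta[j] for j in range(len(w))) for i in range(len(a_in))]
        for j, row in enumerate(w):
            for i in range(len(row)):
                row[i] -= lr * delta[j] * a_in[i]
            b[j] -= lr * delta[j]
        if k > 0:  # tanh derivative of the hidden layer below
            delta = [p * (1.0 - a * a) for p, a in zip(prev, a_in)]

def mean_sq_residual(layers, data):
    """Average over samples of the summed squared input-output differences."""
    return sum(sum((o - t) ** 2 for o, t in zip(forward(layers, x)[-1], x))
               for x in data) / len(data)

# Synthetic data: nine correlated "sensors" driven by three hidden factors.
rng = random.Random(0)
data = []
for _ in range(40):
    latent = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    data.append([math.tanh(latent[i % 3] + 0.5 * latent[(i + 1) % 3]) for i in range(9)])

layers = init_layers([9, 5, 3, 5, 9], rng)   # the 9-5-3-5-9 structure
before = mean_sq_residual(layers, data)
for _ in range(200):
    for x in data:
        train_step(layers, x, lr=0.02)
after = mean_sq_residual(layers, data)
print("mean squared residual before/after training:", before, after)
```

Once trained on fault-free data, the per-sensor residuals of a new sample (output minus input) are the quantities that a residual analysis step would inspect.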

This network computes the largest nonlinear princi- 
pal components (NLPCA’s) — the nodes in the interme- 
diate layer — to identify and remove correlations among 
variables. Besides the generation of residuals this type 
of network can also be used in dimensionality reduc- 
tion, visualization, and exploratory data analysis. As 
noted in [41.39]: 


While (principal component analysis) PCA identifies
only linear correlations between variables,
NLPCA uncover both linear and nonlinear
correlations, without restriction on the character of the
nonlinearities present in the data.

Fig. 41.2 Architecture of a 9-5-3-5-9 auto associative neural network (input, encoding, bottleneck, decoding, and output layers)

NLPCA operates by training a feedforward neural
network to perform the identity mapping, where the
network inputs are reproduced at the output layer. The
network contains an internal bottleneck layer
(containing fewer nodes than input or output layers), which
forces the network to develop a compact
representation of the input data, and two additional hidden layers.
Additional information about AANNs can be found
in [41.39-41].

The complete CI approach is illustrated in Fig. 41.3,
adapted from [41.10]. The left part of the figure shows
the run-time anomaly detection (AD) model. The
center part of Fig. 41.3 shows an instance of the term
set used by the fuzzy supervisory system (the scale
of the operational state variables was normalized as
a percentage of the range of values to preserve
proprietary information). In the right part of Fig. 41.3, we
can see the evolutionary algorithm (EA) in a wrapper
configuration, used to tune the membership functions
(term sets). Each individual in the EA population is
a set of parameters that represents an implementable
term set configuration. For Altitude, we varied two
parameters. For each of ambient temperature and Mach
number, we varied four parameters, with a total of ten
search parameters. Each individual was a set of ten
parameters that created a corresponding set of
membership functions that controlled residuals behavior
of the fuzzy supervisory model. The fitness of each
individual was computed based on the aggregate of
the nine sensor residuals, with a goal toward
maximizing fitness or minimizing overall residuals. The
EA used was based on the genetic algorithm
optimization toolbox (GAOT) toolkit. The population size
was set at 500, and the generation count was set at
1000. The EA execution was very efficient taking only
about 2 h of execution time on a standard desktop
machine.

Results. As a result of this experiment, we were
able to drastically reduce the residuals generated
under steady state, no-fault assumption, and we improved
the fidelity of the local model ensemble by more than
a factor of four with respect to a reference global
data-driven model. This fidelity allowed us to
create a sharper baseline used to identify true engine
anomalies, distinguishing them from sensor
anomalies. For a more detailed description of these results
see [41.10].

Fig. 41.3 Evolutionary algorithms tune the term sets of the fuzzy supervisory system to interpolate the outputs of an ensemble of local auto associative NNs (panels: run-time anomaly detection model, fuzzy supervisory rule set, fuzzy supervisory term set, evolutionary algorithm tuning the term set in a wrapper approach)
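The wrapper-style tuning used in this case study can be illustrated with a much smaller toy problem. The sketch below is not the GAOT-based setup reported in [41.10]: it assumes a single operating variable normalized to 0-100, two hypothetical local models, an individual with only two term-set parameters (a, b), and a simple elitist EA with Gaussian mutation and truncation selection. The fitness is the aggregate squared residual between the reference sensor values and the interpolated local outputs.

```python
import random

def low_membership(u, a, b):
    """Degree to which u is 'Low': 1 below a, 0 above b, linear in between."""
    if u <= a:
        return 1.0
    if u >= b:
        return 0.0
    return (b - u) / (b - a)

def local_low(u):       # hypothetical local model for the low region
    return 0.5 * u

def local_high(u):      # hypothetical local model for the high region
    return 2.0 * u - 75.0

def blend(u, a, b):
    """Fuzzy interpolation of the two local models for term set (a, b)."""
    w = low_membership(u, a, b)
    return w * local_low(u) + (1.0 - w) * local_high(u)

def aggregate_residual(individual, samples):
    """Sum of squared residuals; the EA minimizes this value."""
    a, b = individual
    return sum((target - blend(u, a, b)) ** 2 for u, target in samples)

def mutate(individual, rng, sigma=5.0):
    genes = sorted(min(100.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in individual)
    return (genes[0], genes[1])

def evolve(samples, pop_size=20, generations=40, seed=1):
    rng = random.Random(seed)
    population = [tuple(sorted((rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0))))
                  for _ in range(pop_size)]
    history = []  # best aggregate residual at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda ind: aggregate_residual(ind, samples))
        history.append(aggregate_residual(population[0], samples))
        parents = population[: pop_size // 2]   # truncation selection, elitist
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = min(population, key=lambda ind: aggregate_residual(ind, samples))
    return best, history

# Reference data generated with an assumed "true" term set (40, 60).
samples = [(float(u), blend(float(u), 40.0, 60.0)) for u in range(0, 101, 5)]
best, history = evolve(samples)
print("best term set:", best, "residual:", aggregate_residual(best, samples))
```

Because the parents survive unchanged into the next generation, the best residual can never get worse from one generation to the next; the population size and generation count here are far smaller than the values of 500 and 1000 reported in the text.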


41.3.2 Health Assessment — Diagnostics: 
A Semisupervised 
and Supervised Learning Problem 


The information generated by the anomaly identifica- 
tion model allows a diagnostic module to focus on 
a given unit’s subsystem, analyzing key variables as- 
sociated with the subsystem, and trying to match their 
patterns with a library of signatures associated with 
faults or incipient failure modes. The result is a ranked 
list of possible faults. Therefore diagnostics is a classi- 
fication problem, mapping a feature space into a labeled 
fault space. 
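The matching step described above, from observed patterns to a ranked list of faults, can be sketched as a similarity search over a signature library. The fault names, the three-variable signatures, and the use of cosine similarity as the matching score are assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors (0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_faults(observed, signature_library):
    """Return (fault, score) pairs sorted from most to least similar."""
    scores = [(fault, cosine_similarity(observed, signature))
              for fault, signature in signature_library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical signatures: deviations of three key variables from their baselines.
signature_library = {
    "fault_a": [1.0, 0.0, -0.5],
    "fault_b": [0.0, 1.0, 0.5],
    "fault_c": [-1.0, 0.5, 0.0],
}

ranking = rank_faults([0.9, 0.1, -0.4], signature_library)
for fault, score in ranking:
    print(fault, round(score, 3))
```

A trained classifier that outputs one score per fault class would produce the same kind of ranked list; the similarity search is simply the most direct reading of the signature-matching description.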

Usually, data-driven diagnostics leverages super- 
vised learning techniques to extract potential signatures 
from the historical data and use them to recognize
different failure modes automatically. A large variety of
statistical and AI-based techniques can be used for
automatic fault diagnostics, including neural networks,
decision trees, random forests, Bayesian belief networks,
case-based reasoning, hidden Markov models, support
vector machines, fuzzy logic, etc. These data-driven
diagnostics methods are able to learn the fault signatures
or patterns from the training data and associate them
with different failure modes when new data arrive.

Data-driven approaches have many benefits. First, 
they can be designed to be independent of domain 
knowledge related to a particular system. We could use 
this approach with data recorded for almost any compo- 
nent/system, as long as the recorded data is relevant to 
the health condition of the component of interest. This
reduces the effort involved with eliciting and incorpo- 
rating domain specific knowledge. A second benefit is 
the use of fusion techniques to take advantage of diverse 
information from multiple data sources/models to boost 
diagnostics performance. The third benefit is the
robustness to noise exhibited by most data-driven methods,
such as fuzzy logic and neural networks. However,
all of these data-driven techniques require the availability of
labeled historical data, so this data collection step must
precede the application of these methods.

Domain knowledge, when available, can still be 
leveraged to initialize the structures of the data-driven 
models (feature selection, network topology, etc.) and 
provide better initial conditions for optimization and 
tuning techniques applied to the data-driven diagnostics 
models. 

Supervised learning is a very mature topic in ML. 
As a result, there are many diagnostics applications of
CI techniques to medical [41.42-45], industrial [41.46,
47], automotive [41.48], and other domains. Given its
widespread use, we will not provide additional case
studies for diagnostics.


41.3.3 Health Assessment — Prognostics:
A Regression Problem 


Prognostics is the prediction of remaining useful life 
(RUL), when the anomaly detection and diagnostics 
modules can identify and isolate an incipient fail- 
ure through its preceding faults. This incipient failure 
changes the graph of RUL versus time from a linear, 
normal-wear trajectory to an exponentially decaying 
one. The fault time and incipient failure mode deter- 
mine the inflection point in such curve and the dete- 
rioration steepness, respectively. These estimates are 
usually in units of time or utilization cycles, and have 
an associated uncertainty, e.g., a probability density 
curve around the actual estimate. Typically, this uncer- 
tainty (e.g., RUL confidence interval) increases as the 
prediction horizon is extended. Operators can choose 
a confidence level that allows them to incorporate a risk 
level into their decision making. They can change oper- 
ational characteristics, such as load, which may prolong 
the life of components at risk. They can also account 
for upcoming maintenance and set in motion a logis- 
tics process to support a smooth transition from faulted 
equipment to fully functioning. 

Predicting RUL is not trivial, because RUL depends 
on current deterioration state and future usage, such 
as unit load and speed, among others. Prognostics is 
closely linked with diagnostics. In the absence of any 
evidence of damage or faulted condition, prognostics 
reverts to statistical estimation of fleet-wide life. It is
common to employ prognostics in the presence of an 
indication of abnormal wear, faults, or other abnormal 
situation. Therefore, it is critical to provide accurate and 
quick diagnostics to allow prognostics to operate. At the 
heart of prognostics is the ability to properly model the 
accumulation and propagation of damage. A common 
approach to prognostics is to employ a model of
damage propagation contingent on future use. Such models
are often based on detailed materials knowledge
and make use of finite element modeling. This requires
an in depth understanding of the local conditions the 
particular component is exposed to. 

For example, for spall propagation in bearings, we 
need to know the local load, speed, and temperature 
conditions at the site of the damage, e.g., at the outer 
race (or ball or cage). In addition, we need to know 
the geometry and local material properties at the sus- 
pected damage site. This information is used to derive 


E'LH | d Hed 


792 


E'LH | d Hed 


Part D 


Neural Networks 


the stresses that components are expected to experience, 
typically using a finite element approach. The potential 
benefit of this process is the promise of accurate predic- 
tion of when the bearing will fail. For a different fault 
mode, the process needs to be repeated. Because of the 
cost and effort involved, this method is reserved for
components whose faults, if left undetected and without
remaining life information, might lead to catastrophic
failure that propagates through the entire system.
However, there is a large set of components
that will not benefit from this approach, either
because a physics-based damage model is not
achievable or because it is too costly to develop. Therefore, it is desirable
to increase coverage of prognostics for a range of fault 
modes. To this end, the techniques would ideally utilize 
existing models and sensor data. 

ML provides us with an alternative approach, which is based on analyzing time series data where the equipment behavior has been monitored via sensor measurements during normal operation until equipment failure. When a reasonable set of these observations exists, ML algorithms can be employed to recognize these trends and predict remaining life (albeit often under the assumption of near-constant future load conditions). Usually, specific faults have preferred directions in the health-related feature space. By extrapolating the propagation in this parameter space and by mapping the extrapolation into the time domain, we can derive RUL information.
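The extrapolation idea can be illustrated with a minimal sketch. It assumes a single scalar health index that grows roughly linearly and a known failure threshold; the function names, the linear trend, and the threshold value are illustrative assumptions, not part of the original study.

```python
def fit_line(times, values):
    """Ordinary least-squares fit: values ~ slope * t + intercept."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def estimate_rul(times, values, failure_threshold):
    """Extrapolate the fitted trend to the threshold and return the
    time remaining after the last observation."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return float("inf")  # no trend toward the threshold
    t_fail = (failure_threshold - intercept) / slope
    return max(0.0, t_fail - times[-1])


# Hypothetical health index rising by 0.5 per hour; failure assumed at 10.0.
hours = [0, 1, 2, 3, 4]
index = [1.0, 1.5, 2.0, 2.5, 3.0]
rul_hours = estimate_rul(hours, index, failure_threshold=10.0)
```

As in the text, the estimate implicitly assumes that future load conditions stay close to those seen during monitoring.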

A prerequisite to leverage RUL estimation is to 
have a narrow confidence interval, so that this informa- 
tion is actionable and can be used in the asset health 
management part of PHM as a time horizon to opti- 
mize the logistics/maintenance scheduling plan. In most 
cases, however, we do not have run-to-failure data in 
the time series. Usually, when a failure is identified it is 
corrected promptly, causing the time series to be statis- 
tically censored on the right. The lack of run-to-failure 
data further compounds the technical difficulty of pre- 
dicting RUL with a small variance. 

We consider two options to address this problem. The first option is to use an ensemble of diverse predictive models (see Sect. 41.5.3 for a definition of diversity) such that the fusion of the ensemble will reduce the variance and make the output more actionable; Sect. 41.5 is devoted to this topic. The second option is to relax the problem formulation by coarsening the granularity of the models' output. This granularity is determined by the actions that we will perform with such information. For example, we could formulate prognostics as:


(1) A partial ordering over RUL. This formulation 
could be used to estimate the risk of claims in 
term life policies. Insurance underwriters estimate 
the applicants’ expected mortality at a coarse level 
by classifying each applicant into a given rate-class 
from a set of sorted classes that define decreasing 
RUL. Applicants inside each class are indistinguish- 
able in terms of risk and are charged the same 
premium (gender and age being equal). This will be 
further described in the case study of Sect. 41.4.1. 
In a PHM context, this formulation could be used 
to price the contractual service agreement renewals 
for different units within a fleet, in a fashion similar 
to the risk-based pricing of insurance underwriting. 

(2) An ordinal ordering over RUL (ranking). This for- 
mulation could be used to select the most reliable 
units of a fleet for mission-critical assignments. 
This will be further described in the case study of 
Sect. 41.3.3, where we illustrate how a train dis- 
patcher could select the best locomotives to create 
a hot train, e.g., a freight train with a guaranteed 
arrival time. 

(3) A cardinal ordering over RUL (rating). This formulation could be used to understand the relative level of readiness of units in a fleet, to prioritize the need for instrument calibration, power management assessment/verification, etc.

(4) A binary classification of whether a given event 
(causing the end of RUL) will happen within a given 
time window. This formulation could be used to 
generate a time-dependent risk assessment to opti- 
mize fleet scheduling and unit allocation. 

(5) A regression on RUL, including the confidence in- 
terval of the prediction. This formulation provides 
the finest granularity. If we were able to reduce 
the confidence intervals of these predictions, we 
could refine and optimize the condition-based main- 
tenance of the assets in the fleet. 
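To make the difference between these formulations concrete, the following sketch derives three of the coarser outputs (rate classes, a ranking, and a binary within-window flag) from the same set of RUL estimates. The unit names, RUL values, class boundaries, and window are hypothetical.

```python
def rank_units(rul_by_unit):
    """Ranking: unit ids sorted from longest to shortest estimated RUL."""
    return sorted(rul_by_unit, key=rul_by_unit.get, reverse=True)


def fails_within(rul_by_unit, window):
    """Binary formulation: True if the estimated RUL ends inside the window."""
    return {unit: rul <= window for unit, rul in rul_by_unit.items()}


def rate_class(rul, boundaries):
    """Partial ordering: index of the first class whose lower RUL bound is met.

    `boundaries` is sorted in decreasing order, so class 0 has the longest RUL.
    """
    for k, lower_bound in enumerate(boundaries):
        if rul >= lower_bound:
            return k
    return len(boundaries)


estimates = {"unit_a": 120.0, "unit_b": 35.0, "unit_c": 80.0}  # hypothetical days
ranking = rank_units(estimates)
at_risk = fails_within(estimates, window=40.0)
classes = {u: rate_class(r, boundaries=[100.0, 50.0]) for u, r in estimates.items()}
```

Units that fall in the same rate class are indistinguishable under the partial ordering, whereas the ranking still separates them.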


The following case study will illustrate the second 
problem reformulation (ranking). In this case, genetic 
algorithms are used to evolve fuzzy instance-based 
models that will generate a ranking of the most reliable 
locomotives within a fleet. 


Case Study 2: RUL-Driven Ranking of Locomotives in a Fleet
Problem Definition. The problem of selecting the 
best units from a fleet of equipment occurs in many 
military and commercial applications. Given a specific 
mission profile, a commander may have to decide which 




five armored vehicles to deploy in order to minimize the 
chance of a breakdown. In the commercial world, rail 
operators need to make decisions on which locomotives 
to use in a train traveling from coast to coast with time 
sensitive shipments. 

The behavior of these complex electromechanical 
assets varies considerably across different phases of 
their life cycle. Assets that are identical at the time 
of manufacture will evolve into somewhat individual 
systems with unique characteristics based on their us- 
age and maintenance history. Utilizing these assets 
efficiently requires a) being able to create a model char- 
acterizing their expected performance, and b) keeping 
this model updated as the behavior of the underlying 
asset changes. 

In this problem formulation, RUL prediction for 
each individual unit is computed by aggregating its own 
track record with that of a number of peer units — units 
with similarities along three key dimensions: system de- 
sign, patterns of utilization, and maintenance history. 
The notion of a peer is close to that of a neighbor 
in CBR, except the states of the peers are constantly 
changing. Odometer-type variables like mileage and 
age increase, and discrete events like major mainte- 
nance or upgrades occur. Thus, it is reasonable to 
assume that after every significant mission, the peers 
of a target unit may change based upon changes in both 
the unit itself, and the fleet at large. Our results suggest 
that estimating unit performance from peers is a prac- 
tical, robust, and promising approach. We conducted 
two experiments — one for retrospective estimation and 
one for prospective estimation. In the first experiment, 
we explored how well the median RUL of any unit 
could be estimated from the medians of its peers. In 
the second experiment, for a given instant in time, we 
predicted the time to the next failure for each unit us- 
ing the history of the peers. In these experiments, the 
retrospective (or prospective) RUL estimates were used 
to induce a ranking over the units. The selection of 
the best N units was based on this ranking. The preci- 
sion of the selection was the percentage of the correctly 
selected units among the N units (based on ground 
truth). 
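The selection metric described above can be sketched as follows. The unit ids and RUL values are made up for illustration; only the definition (share of the N selected units that are truly among the best N) comes from the text.

```python
def top_n(scores, n):
    """Return the set of the n unit ids with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def selection_precision(predicted, actual, n):
    """Fraction of the n selected units that are truly among the best n."""
    selected = top_n(predicted, n)
    truly_best = top_n(actual, n)
    return len(selected & truly_best) / n


predicted_rul = {"u1": 90, "u2": 70, "u3": 50, "u4": 30, "u5": 10}
observed_rul = {"u1": 85, "u2": 20, "u3": 60, "u4": 40, "u5": 5}
precision = selection_precision(predicted_rul, observed_rul, n=2)
```

In this toy case one of the two selected units (`u1`) is truly among the best two, so the precision is one half.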


CI-Based Approach. Our approach was based on a fuzzy instance-based model (FIM), which is described in [41.11]. We addressed the definition of similarity among peers by evolving the design of a similarity function in conjunction with the design of the attribute space in which the similarity was evaluated. Specifically, we used the following four steps:


(1) Retrieval of similar instances from the database 
(DB). 

(2) Evaluation of similarity measures between the 
probe and the retrieved instances. 

(3) Creation of local models based on the most similar 
instances. 

(4) Aggregation of local models outputs (weighted by 
their similarity measures). 


(1) Retrieval. We looked for all units in the fleet DB whose behavior was similar to the probe. These instances are the potential peers of the probe. The peers and probe can be seen as points in an n-dimensional feature space. For instance, let us assume that a probe $Q$ is characterized by an n-dimensional vector of features $X_Q$, and $O(Q) = [D_{1,Q}, D_{2,Q}, \ldots, D_{k(Q),Q}]$ the history of its operational availability durations

$$Q = [X_Q; O(Q)] = [x_{1,Q}, \ldots, x_{n,Q}; D_{1,Q}, \ldots, D_{k(Q),Q}] . \tag{41.1}$$


Any other unit $u_j$ in the fleet has a similar characterization

$$u_j = [X_j; O(u_j)] = [x_{1,j}, x_{2,j}, \ldots, x_{n,j}; D_{1,j}, D_{2,j}, \ldots, D_{k(j),j}] . \tag{41.2}$$


For each dimension $i$ we defined a truncated generalized Bell function, $\mathrm{TGBF}_i(x_i; a_i, b_i, c_i)$, centered at the value of the probe $c_i$, which represents the degree of similarity along that dimension. Specifically

$$\mathrm{TGBF}_i(x_i; a_i, b_i, c_i) = \begin{cases} \left[1 + \left|\dfrac{x_i - c_i}{a_i}\right|^{2b_i}\right]^{-1} & \text{if } \left[1 + \left|\dfrac{x_i - c_i}{a_i}\right|^{2b_i}\right]^{-1} > \varepsilon \\ 0 & \text{otherwise} \end{cases} \tag{41.3}$$

where $\varepsilon$ is the truncation parameter, e.g., $\varepsilon = 10^{-7}$. Since the parameters $c_i$ in each $\mathrm{TGBF}_i$ are determined by the values of the probe, each $\mathrm{TGBF}_i$ has only two free parameters, $a_i$ and $b_i$, to control its spread and curvature. In a coarse retrieval step, we extracted an instance in the DB if all of its features are



within the support of the TGBFs. Then we formalized the retrieval step. $P(Q)$, the set of potential peers of $Q$, is composed of all units within a range from the value of $Q$: $P(Q) = \{u_j, j = 1, \ldots, m \mid u_j \in N(X_Q)\}$, where $N(X_Q)$ is the neighborhood of $Q$ in the state space $X$, defined by the constraint $\|x_{i,Q} - x_{i,j}\| < R_i$ for all potential attributes $i$ for which the corresponding weight is nonzero. $R_i$ is half of the support of the $\mathrm{TGBF}_i$, centered on the probe's coordinate $x_{i,Q}$.
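A minimal sketch of the retrieval step is given below, under stated assumptions: features are plain lists of numbers, the half-support $R_i$ is obtained by solving $\mathrm{TGBF}_i = \varepsilon$ for the distance from the center, and the truncation value used in the example (0.2) is chosen only to keep the arithmetic readable. The function names are hypothetical.

```python
def tgbf(x, a, b, c, eps):
    """Truncated generalized bell function centered at c, as in Eq. (41.3)."""
    value = 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))
    return value if value > eps else 0.0


def half_support(a, b, eps):
    """Distance R from the center at which the untruncated bell equals eps:
    R = a * (1/eps - 1) ** (1 / (2 * b))."""
    return a * (1.0 / eps - 1.0) ** (1.0 / (2 * b))


def retrieve_peers(probe, fleet, params, weights, eps):
    """Keep units whose every nonzero-weight feature lies within R_i of the probe."""
    peers = []
    for unit_id, features in fleet.items():
        inside = all(
            abs(features[i] - probe[i]) < half_support(a, b, eps)
            for i, (a, b) in enumerate(params)
            if weights[i] > 0
        )
        if inside:
            peers.append(unit_id)
    return peers


probe = [10.0, 5.0]
fleet = {"u1": [12.0, 50.0], "u2": [15.0, 5.0]}
params = [(2.0, 1), (1.0, 2)]   # (a_i, b_i) per attribute, illustrative values
weights = [1.0, 0.0]            # second attribute is switched off
peers = retrieve_peers(probe, fleet, params, weights, eps=0.2)
```

Here `u1` is retrieved even though its second feature is far from the probe, because that attribute has zero weight; `u2` falls outside the support along the first attribute.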


(2) Similarity Evaluation. Each $\mathrm{TGBF}_i$ is a membership function representing the partial degree of satisfaction of constraint $A_i(x_i)$. Thus, it represents the closeness of the instance around the probe value for that particular attribute. For a given peer $P_j$, we evaluated the function $S_{j,i} = \mathrm{TGBF}_i(x_{i,j}; a_i, b_i, x_{i,Q})$ along each potential attribute $i$. The values $(a_i, b_i)$ are design choices that are manually initialized and later refined by the EAs. Since we wanted the most similar instances to be the closest to the probe along all $n$ attributes, we used a similarity measure defined as the intersection of the constraint-satisfaction values. Furthermore, to represent the different relevance that each criterion should have in the evaluation of similarity, we attached a weight $w_i$ to each attribute $A_i$. Therefore, we extended the notion of a similarity measure between $P_j$ and the probe $Q$ as a weighted minimum operator

$$S_j = \min_{i=1}^{n} \{\max[(1 - w_i), S_{j,i}]\} = \min_{i=1}^{n} \{\max[(1 - w_i), \mathrm{TGBF}_i(x_{i,j}; a_i, b_i, x_{i,Q})]\} , \tag{41.4}$$

where $w_i \in [0, 1]$. The set of values for the weights $\{w_i\}$ and parameters $\{(a_i, b_i)\}$ are critical design choices that impact the proper selection of peers.
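The effect of the weights in the weighted minimum of (41.4) can be seen in a short sketch. The partial similarity values and weights are invented for illustration.

```python
def weighted_min_similarity(partial, weights):
    """S_j = min_i max(1 - w_i, S_{j,i}), following Eq. (41.4)."""
    return min(max(1.0 - w, s) for s, w in zip(partial, weights))


partial_similarities = [0.9, 0.2, 0.6]  # hypothetical per-attribute TGBF values
full_weights = [1.0, 1.0, 1.0]
relaxed_weights = [1.0, 0.0, 0.5]
s_plain = weighted_min_similarity(partial_similarities, full_weights)
s_relaxed = weighted_min_similarity(partial_similarities, relaxed_weights)
```

With all weights equal to 1 the operator reduces to the plain minimum (0.2 here). Setting a weight to 0 removes that attribute from the comparison, and an intermediate weight places a floor of $1 - w_i$ under its contribution, which raises the result to 0.6 in this example.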


(3) Local Models. The idea of creating a local model on demand can be traced back to memory-based approaches [41.28, 29] and lazy learning [41.49]. Within this case study, we focused on the creation of local predictive models used to forecast each unit's remaining life. First, we used each local model to generate an estimated value of the predicted variable. Then, we used an aggregation mechanism based on the similarities of the peers to determine the final output.

The generation of local models can vary in complexity, depending on the task difficulty. In the first experiment, we used the Median operator as the local model, hence we did not need to define any parameter

$$y_j = \mathrm{Median}[D_{1,j}, D_{2,j}, \ldots, D_{k(j),j}] . \tag{41.5}$$

In the second experiment we used an exponential average, requiring the definition of a forgetting factor $\alpha$

$$y_j = D_{k(j)+1,j} = \bar{D}_{k(j),j} = \alpha \times D_{k(j),j} + (1 - \alpha) \times \bar{D}_{k(j)-1,j} \quad [\text{where } \bar{D}_{1,j} = D_{1,j}] . \tag{41.6}$$
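Both local models are simple enough to sketch directly. The duration history and the value of the forgetting factor below are illustrative assumptions.

```python
from statistics import median


def median_model(durations):
    """Experiment I local model (Eq. 41.5): median of the peer's history."""
    return median(durations)


def exponential_average(durations, alpha):
    """Experiment II local model (Eq. 41.6): recursive exponential average,
    seeded with the first observed duration."""
    smoothed = durations[0]
    for d in durations[1:]:
        smoothed = alpha * d + (1.0 - alpha) * smoothed
    return smoothed


history = [10.0, 20.0, 40.0]          # hypothetical availability durations
y_median = median_model(history)
y_next = exponential_average(history, alpha=0.5)
```

A larger `alpha` gives more weight to the most recent duration, so the forecast reacts faster to changes in the unit's behavior.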


(4) Aggregation. We needed to combine the individual outputs $y_j$ of the peers $P_j(Q)$ to generate the estimated output $y_Q$ for the probe $Q$: the median (for experiment I) or the prediction of the next availability duration, $D_{\mathrm{Next},Q}$ (for experiment II). To this end, we computed the weighted average of the peers' individual outputs using their normalized similarity to the probe as a weight, namely

$$y_Q = \mathrm{Median}_Q = \frac{\sum_{j=1}^{m} S_j \times y_j}{\sum_{j=1}^{m} S_j} , \quad \text{where } y_j = \mathrm{Median}[D_{1,j}, D_{2,j}, \ldots, D_{k(j),j}] \text{ for Exp. I}$$

$$y_Q = D_{\mathrm{Next},Q} = \frac{\sum_{j=1}^{m} S_j \times y_j}{\sum_{j=1}^{m} S_j} , \quad \text{where } y_j = D_{k(j)+1,j} \text{ for Exp. II.} \tag{41.7}$$

The entire process is summarized in Fig. 41.4, adapted from [41.11].
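The aggregation in (41.7) is a similarity-weighted average, sketched below with invented peer similarities and outputs. The explicit check for a zero denominator is an added safeguard, not something stated in the source.

```python
def aggregate(similarities, outputs):
    """Similarity-weighted average of the peers' outputs, as in Eq. (41.7)."""
    total = sum(similarities)
    if total == 0:
        raise ValueError("no peer has nonzero similarity to the probe")
    return sum(s * y for s, y in zip(similarities, outputs)) / total


peer_similarities = [0.5, 0.25, 0.25]   # S_j for three hypothetical peers
peer_outputs = [100.0, 60.0, 20.0]      # y_j from their local models
y_probe = aggregate(peer_similarities, peer_outputs)
```

Because the weights are normalized by their sum, the estimate always lies between the smallest and largest peer output (a convex sum).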


Structural and Parametric Tuning
Given the critical design roles of the weights $\{w_i\}$, the parameters $\{(a_i, b_i)\}$, and the forgetting factor $\alpha$, it was necessary to create a methodology to generate their best values according to our metric, i.e., classification precision. After testing several manually created peer-based models, we decided to use evolutionary search to develop and maintain the fuzzy instance-based classifier, following a wrapper methodology detailed in [41.13]. In this application, however, we extended the evolutionary search to include structural search, via attribute selection and weighting [41.50], in addition to the parametric tuning.

The EAs were composed of a population of individuals, each of which contained a vector of elements that represented distinct tunable parameters within the FIM configuration. Examples of tunable parameters included the range of each parameter used to retrieve neighbor instances and the relative parameter weights used for similarity calculation. The EAs used two types of mutation operators (Gaussian and uniform), and no crossover. The population (with 100 individuals) was evolved over 200 generations.
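A mutation-only evolutionary search of this kind can be sketched as follows. The source specifies only the two mutation types, the absence of crossover, and the population and generation counts; the truncation selection, the elitism, the mutation step size, the unit-interval gene encoding, and the toy fitness function are assumptions made for illustration.

```python
import random


def mutate(genes, rng, sigma=0.1, p_uniform=0.5):
    """Mutate one gene: uniform resampling or a clipped Gaussian step."""
    child = list(genes)
    i = rng.randrange(len(child))
    if rng.random() < p_uniform:
        child[i] = rng.random()                      # uniform mutation
    else:
        step = rng.gauss(0.0, sigma)                 # Gaussian mutation
        child[i] = min(1.0, max(0.0, child[i] + step))
    return child


def evolve(fitness, n_genes, pop_size=100, generations=200, seed=0):
    """Mutation-only search with elitism; genes are kept in [0, 1]."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]            # truncation selection (assumed)
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - 1)]
        population = [ranked[0]] + children          # elitism: keep the best unchanged
    return max(population, key=fitness)


def toy_fitness(genes):
    """Toy objective with a known optimum at 0.7 for every gene."""
    return -sum((g - 0.7) ** 2 for g in genes)


best = evolve(toy_fitness, n_genes=3, pop_size=20, generations=50)
```

In the case study the fitness would be the selection precision of the FIM instantiated from each chromosome rather than a closed-form function.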


Fig. 41.4 Description of fuzzy instance-based models (FIM) aggregated by convex sum (stages shown: retrieval, feature similarity, weighted similarity, normalized similarity, local models, aggregation)


Each chromosome defined an instance of the attribute space used by the associated classifier by specifying a vector of weights $[w_1, w_2, \ldots, w_n]$. If $w_i \in \{0, 1\}$, we perform attribute selection, i.e., we select a crisp subset from the universe of potential attributes. If $w_i \in [0, 1]$, we perform attribute weighting, i.e., we define a fuzzy subset from the universe of potential attributes

$$[w_1, w_2, \ldots, w_n]\,[(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)]\,[\alpha] , \tag{41.8}$$

where

• $w_i \in [0, 1]$ for attribute weighting and $w_i \in \{0, 1\}$ for attribute selection
• $n$ = cardinality of the universe $U$, $|U| = n$
• $d = \sum_i w_i$ = (fuzzy) cardinality of selected features
• $(a_i, b_i)$ = parameters for $\mathrm{GBF}_i$
• $\alpha$ = parameter for exponential average.


The first part of the chromosome, containing the weights vector $[w_1, w_2, \ldots, w_n]$, defines the attribute space (the FIM structure) and the relevance of each attribute in evaluating similarity. The second part of the chromosome, containing the vector of pairs $[(a_1, b_1), \ldots, (a_i, b_i), \ldots, (a_n, b_n)]$, defines the parameters for retrieval and similarity evaluation. The last part of the chromosome, containing the parameter $\alpha$, defines the forgetting factor for the local models. The fitness function is computed using a wrapper approach [41.50]. For each chromosome, represented by (41.8), we instantiated its corresponding FIM. Following a leave-one-out approach, we used the FIM to predict the expected life of the probe unit following the four steps described in the previous subsection. We repeated this process for all units in the fleet and ranked them in decreasing order, using their predicted duration $D_{\mathrm{Next},Q}$. We then selected the top 20%. The fitness function of the chromosome was the precision of the classification, TP/(TP + FP), where TP is the count of True Positives and FP is the count of False Positives. This is illustrated in Fig. 41.5.
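The leave-one-out wrapper fitness can be sketched as below. The fleet layout (one feature, the last observed duration, and the true next duration per unit) and the nearest-peer predictor stand in for the full FIM and are assumptions made to keep the example short.

```python
def loo_fitness(fleet, predict, fraction=0.2):
    """Leave-one-out precision TP / (TP + FP) of selecting the top fraction.

    fleet: unit_id -> (feature, last_duration, true_next_duration)
    predict(feature, peers): predicted next duration from the other units.
    """
    predicted, actual = {}, {}
    for unit_id, (feature, _last, true_next) in fleet.items():
        peers = {k: v for k, v in fleet.items() if k != unit_id}  # hold the probe out
        predicted[unit_id] = predict(feature, peers)
        actual[unit_id] = true_next
    n = max(1, round(fraction * len(fleet)))
    selected = set(sorted(predicted, key=predicted.get, reverse=True)[:n])
    best = set(sorted(actual, key=actual.get, reverse=True)[:n])
    tp = len(selected & best)
    fp = len(selected - best)
    return tp / (tp + fp)


def nearest_peer(feature, peers):
    """Stand-in predictor: last duration of the peer closest in feature space."""
    closest = min(peers.values(), key=lambda unit: abs(unit[0] - feature))
    return closest[1]


toy_fleet = {
    "u1": (1.0, 50.0, 55.0),
    "u2": (1.1, 52.0, 48.0),
    "u3": (5.0, 20.0, 22.0),
    "u4": (5.2, 18.0, 25.0),
    "u5": (9.0, 5.0, 4.0),
}
fitness = loo_fitness(toy_fleet, nearest_peer)
```

In an evolutionary run, `predict` would be the FIM decoded from the chromosome under evaluation, so each fitness call exercises retrieval, similarity, local models, and aggregation.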


Results. We used 18 months' worth of data and performed the experiments at three different times, after 6, 12, and 18 months, respectively. We wanted to test the adaptability of the learning techniques to environmental, operational, or maintenance changes. We also wanted to determine if their performance would improve over time with incremental data acquisition. For each start-up time, we used EAs to generate an optimized weighted subset of attributes to define the peers of each unit.

Fig. 41.5 FRC optimization using EA (blocks shown: evolutionary algorithm with chromosome decoder, uniform and Gaussian mutation, and elitist selection; leave-one-out testing of the fuzzy IBM engine on the maintenance and utilization case base; prediction-based selection of best units; fitness f = TP/(TP + FP))


Experiment 1: Retrospective Selection. The goal was to select the current best 20% units of the fleet based on their peers' past performance. In this case, a random selection, which could be used as a baseline, would yield 20%. However, the size of the fleet at each start-up time was different, ranging from 262 (after 6 months) to 634 (after 12 months), to 845 (after 18 months). We decided to keep the number of selected units constant (i.e., 52 units) over the three start-up times. Thus the baseline random selection for each start-up time was [20%, 8%, 6%], i.e., 52/262 = 20%; 52/634 = 8%; 52/845 = 6%.

Fig. 41.6 Dynamic models first experiment (selection performance in % versus time slices, 52 units selected out of 262, 634, and 845; series: evolved peers, non-peer heuristics, random)
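The random-selection baselines quoted for Experiment 1 follow directly from the fixed selection size and the growing fleet, as this short check shows. The dictionary layout is an illustrative choice.

```python
selected_units = 52
fleet_sizes = {6: 262, 12: 634, 18: 845}  # months of data -> fleet size

# Expected precision (in %) of picking 52 units at random at each start-up time.
baselines = {
    months: round(100 * selected_units / size)
    for months, size in fleet_sizes.items()
}
```

Because the number of selected units is held constant while the fleet grows, the random baseline shrinks, which makes any fixed precision level harder to reach by chance at later time slices.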


Experiment 2: Prospective Selection. We wanted to select the future best 20% units for the next-pulse duration. In this case, a random selection would yield 20%.

The peers designed by the EAs provided the best accuracy overall:

• Experiment 1 (Retrospective selection): Precision = 63.5%, which was more than 10× better than random selection, and 1.7× better than existing heuristics.
• Experiment 2 (Prospective selection): Precision = 55.0%, which was more than 2.5× better than random selection, and 1.5× better than existing heuristics.

Figures 41.6 and 41.7 illustrate the results of these two experiments.

As mentioned in Sect. 41.1.2, successfully deployed 
intelligent systems must remain valid and accurate over 
time, while compensating for drifts and accounting for 
contextual changes that might otherwise render their 
design stale or obsolete. In this case study, we repeated 
the last set of experiments using dynamic and static 




models. The dynamic models were fresh models, redeveloped at each time slice by using the methodology described. The static models were developed at time slice 1 and applied, unchanged, at time slices 2 and 3. In this experiment, the original models showed significant deterioration over time: 43% → 29% → 25%. In contrast, the dynamic models exhibited robust, improved precision: 43% → 43% → 55%. This is illustrated in Fig. 41.8.

This comparison shows the benefit of automated model updating. By using offline metaheuristics such as EAs, we can automate model development and model re-tuning. This allows us to maintain model performance over time, through frequent updates, and avoid the obsolescence-driven model deterioration, which in this example occurred 1 year after the first deployment. A more detailed description of this case study can be found in [41.11, 12].


41.3.4 Health Management: Fault Accommodation and Optimization


All the functions described in Sects. 41.3.1-41.3.3 could be described as descriptive and predictive analytics, as they provide assessments and projections of the system's health state. These assessments lead to prescriptive analytics, as they determine the on-board control actions and the off-board logistics, repair, and planning actions. On-board control actions are usually focused on maintaining performance or safety margins, and are performed in real time. Off-board maintenance/repair actions cover more offline decisions. They require a decision support system (DSS) performing multiobjective optimizations, exploring Pareto frontiers of corrective actions, and combining them with preference aggregations to generate the best decision tradeoffs. The underlying techniques are intelligent control for fault accommodation [41.18] and multiobjective optimization techniques [41.51-54], aimed at minimizing the impact that maintenance and repair events could have on the profitable operation of the assets. For the sake of brevity, we will not provide a case study of optimization in the PHM domain, but we will present one in the financial domain (Sect. 41.4.3).

Fig. 41.7 Dynamic models second experiment (selection performance in % versus time slices; series: evolved peers, non-peer heuristics, random, own (time series), own (median))

Fig. 41.8 Dynamic models versus static models in second experiment (selection performance in % versus time slices; series: evolved dynamic peers, evolved static peers, random)


41.4 CI/ML Applications in Financial Domains: Risk Management


Prognostics and health management of industrial assets bears a strong analogy with risk management of financial and commercial assets. We have shown how unsupervised learning can be used to identify abnormal behaviors, i.e., deviations from normal states/structures. In PHM, units in a fleet that stray away from normal performance baselines are usually anomalies leading to incipient failure modes. In financial domains, nonconforming user behaviors could be precursors to fraudulent transactions and could be identified using similar techniques. Similarly, supervised learning could be used to classify the root cause of an anomaly (diagnostics) or to classify the risk class of an applicant for a financial/insurance product (risk classification). Regressions could be used to forecast the remaining useful life of an asset under future load assumptions, or to forecast the residual value of assets after their lease period (or to create an instant valuation for an asset, such as in mortgage collateral valuation). Multiobjective optimization techniques could be used to balance production values with life erosion cost (or combustion efficiency with emissions) or to balance an investment portfolio using multiple metrics of returns and risk. We will illustrate this analogy with the following three case studies, in which we will describe the use of CI techniques in risk classification for insurance underwriting, residential property valuation, and portfolio rebalancing optimization.


41.4.1 Automation of Insurance Underwriting: A Classification Problem


Problem Definition 
In many transaction-oriented processes, human deci- 
sion makers evaluate new applications for a given 
service (mortgages, loans, credits, insurances, etc.) and 
assess their associated risk and price. The automation of these business processes is likely to increase throughput and reliability while reducing risk. The success of these ventures depends on the availability of generalized decision-making systems that are not just able to reliably replicate the human decision-making process, but can do so in an explainable, transparent fashion. Insurance underwriting is one such
high-volume application domain where intelligent au- 
tomation can be highly beneficial, and reliability and 
transparency of decision-making are critical. Tradition- 
ally, highly trained individuals perform insurance un- 
derwriting. A given insurance application is compared 
against several standards put forward by the insurance 
company and classified into one of the risk categories 
(rate classes) available for the type of insurance re- 
quested. The risk categories then affect the premium 
paid by the applicant — the higher the risk category, the 
higher the premium. The accept/reject decision is also 
part of this risk classification, since risks above a cer- 
tain tolerance level set by the company will simply be 
rejected. 

There can be a large amount of variability in the underwriting process when performed by human underwriters. Typically the underwriting standards cannot cover all possible cases, and sometimes they might be ambiguous. The subjective judgment of the underwriter will almost always play a role in the process. Variation in factors such as underwriter training and experience, and a multitude of other effects, can cause different underwriters to issue inconsistent decisions. Sometimes these decisions fall in a gray area not explicitly covered by the standards. In these cases, the underwriter uses his/her own experience to determine whether the standards should be adjusted. Different underwriters could apply different assumptions regarding the applicability of the adjustments, as they might use stricter or more liberal interpretations of the standards.


CI-Based Approach 
To address these problems, we developed a system to automate the application placement process for cases of low or medium complexity. For more complex cases, the system provided the underwriter with an assist based on partial analysis and conclusions.

We used a fuzzy rule-based classifier (FRC) to capture the underwriting standards derived from the actuarial guidance. Then we tuned the FRC with an evolutionary algorithm to determine the best FRC parameters to maximize precision and recall, while minimizing the cost of misclassification. The remainder of this section will summarize this solution.


Fuzzy Rule-Based Classifier (FRC). The fuzzy rule-based classifier (FRC), which is briefly described in [41.13,14], uses rule sets to encode underwriting standards. Each rule set represents a set of fuzzy constraints defining the boundaries between rate classes. These constraints were first determined from the underwriting guidelines. They were then refined using knowledge engineering sessions with expert underwriters to identify factors such as blood pressure levels and cholesterol levels, which are critical in defining the applicant's risk and corresponding premium. The goal of the classifier is to assign an applicant to the most competitive rate class, provided that the applicant's vital data meet all of the constraints of that particular rate class to a minimum degree of satisfaction. The constraints for each rate class $r$ are represented by $n$ fuzzy sets: $A_i^r(x_i)$, $i = 1, \ldots, n$. Each constraint $A_i^r(x_i)$ can be interpreted as the degree of preference induced by value $x_i$ for satisfying constraint $A_i^r(x_i)$. After evaluating all constraints, we compute two measures for each rate class $r$. The first one is the degree of intersection of all the constraints and measures the weakest constraint satisfaction

$$I(r) = \bigcap_{i=1}^{n} A_i^r(x_i) = \min_{i=1}^{n} A_i^r(x_i) . \tag{41.9}$$




This expression implies that each criterion has equal weight. If we want to attach a weight $w_i$ to each criterion $A_i$ we could use the weighted minimum operator:

$$I'(r) = \bigcap_{i=1}^{n} W_i A_i^r(x_i) = \min_{i=1}^{n} \left(\max\left((1 - w_i), A_i^r(x_i)\right)\right) ,$$

where $w_i \in [0, 1]$. The second one is a cumulative measure of missing points (the complement of the average satisfaction of all constraints), and measures the overall tolerance allowed to each applicant, i.e.,

$$\mathrm{MP}(r) = \sum_{i=1}^{n} \left(1 - A_i^r(x_i)\right) = n \left(1 - \frac{1}{n} \sum_{i=1}^{n} A_i^r(x_i)\right) = n \left(1 - \bar{A}^r\right) . \tag{41.10}$$


The final classification is obtained by comparing the two measures, $I(r)$ and $\mathrm{MP}(r)$, against two lower bounds defined by thresholds $\tau_1$ and $\tau_2$. The parametric definition of each fuzzy constraint $A_i^r(x_i)$ and the values of $\tau_1$ and $\tau_2$ are design parameters that were initialized with knowledge engineering sessions.

Fig. 41.9 Example of three fuzzy constraints for rate class Z (constraints shown: systolic blood pressure, cholesterol, weight/height; aggregation of partial satisfaction of constraints gives degree of placement I(Z) = 0.8 and MP(Z) = 0.3)

Figure 41.9 — adapted from [41.13] — illustrates an 
example of three constraints (trapezoidal membership 
functions) associated with rate class Z, the input data 
corresponding to an application, and the evaluation of 
the first measure, indicating the weakest degree of sat- 
isfaction of all constraints. 
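The two measures can be computed with a few lines of code. In the sketch below the one-sided trapezoidal shape, the breakpoints, the applicant's values, and the thresholds are illustrative assumptions chosen so that the result matches the values quoted for Fig. 41.9; the comparison directions ($I(r) \ge \tau_1$ and $\mathrm{MP}(r) \le \tau_2$) are also an assumed reading of how the thresholds are applied.

```python
def soft_upper_limit(x, full, zero):
    """Membership 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def intersection(memberships):
    """I(r): weakest constraint satisfaction, Eq. (41.9)."""
    return min(memberships)


def missing_points(memberships):
    """MP(r): cumulative shortfall over all constraints, Eq. (41.10)."""
    return sum(1.0 - m for m in memberships)


def qualifies(memberships, tau1, tau2):
    """Assumed decision rule: strong weakest link and limited total shortfall."""
    return intersection(memberships) >= tau1 and missing_points(memberships) <= tau2


# Hypothetical applicant evaluated against three constraints of one rate class.
memberships = [
    soft_upper_limit(154.0, full=150.0, zero=170.0),   # systolic blood pressure
    soft_upper_limit(253.0, full=250.0, zero=280.0),   # cholesterol
    soft_upper_limit(5.0, full=6.0, zero=10.0),        # weight/height deviation (%)
]
i_r = intersection(memberships)
mp_r = missing_points(memberships)
placed = qualifies(memberships, tau1=0.7, tau2=0.5)
```

A classifier built this way would test the rate classes in order of competitiveness and place the applicant in the first class for which `qualifies` holds.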


Optimization of Design Parameters of the FRC Classifier. The FRC design parameters were tuned, monitored, and maintained to assure the classifier's optimal performance. To this end, we used EAs, composed of a population of chromosomes. Each chromosome contained a vector of elements that represent distinct tunable parameters to configure the FRC classifier, i.e., the parametric definition of the fuzzy constraints $A_i^r(x_i)$ and thresholds $\tau_1$ and $\tau_2$.

A chromosome, the genotypic representation of a model, defines the complete parametric configuration of the classifier. Thus, an instance of such a classifier can be created for each chromosome, as shown in Fig. 41.10. Each chromosome $c_i$ of population $P(t)$ (left-hand side of Fig. 41.10) goes through a decoding process that allows it to create the classifier on the right. Each classifier is then tested on all the cases in the case base, assigning a rate class to each case. We can determine the quality of the configuration encoded by the chromosome, i.e., the fitness of the chromosome, by analyzing the results of the test. Our EA uses two types of mutations (uniform and Gaussian) to produce new individuals in the population by randomly varying parameters of a single chromosome. The more fit chromosomes in generation $t$ will be more likely to be selected for this and pass their genetic material to the next generation $t + 1$. Analogously, the less fit solutions will be culled from the population. At the conclusion of the EA's execution, the best chromosome of the last generation determines the classifier's configuration. Note the similarity between Figs. 41.10 and 41.5, which underscores the similar role that EAs play as the offline MHs to design the best fuzzy classifier (in this case study), or the best FIM (in the case of the second case study).


Standard Reference Dataset (SRD). To test and tune 
the classifiers, we needed to establish a benchmark. 
Therefore, we generated a standard reference dataset 
(SRD) of approximately 3000 cases taken from a strat- 
ified random sample of the historical case population. 


Fig. 41.10 FRC optimization using EA (blocks shown: evolutionary algorithm with chromosome decoder, uniform and Gaussian mutation, and elitist selection; KB parameters; standard reference data set; comparison matrix M (T×T); fitness $f(\mathrm{Ind}_k) = \sum_{i=1}^{T} \sum_{j=1, j \ne i}^{T} P(i,j)\, M(i,j)$)


Each of these cases received a rate class decision when 
it was originally underwritten. To reduce variability in 
these decisions, a team of experienced underwriters per- 
formed a blind review of selected cases to determine 
the standard reference decisions. These cases were then 
used to create and optimize the FRC model. 


Fitness Function. In classification problems such as this one, we can use two matrices to construct the fitness function that we want to optimize. The first matrix is a T × T confusion matrix M that contains the frequencies of correct and incorrect classifications for all possible combinations of the standard reference decisions (SRDs) and the classifier decisions. The SRDs represent ground-truth rate class decisions, reached by consensus among senior expert underwriters for a set of insurance applications. The frequencies of correct classifications can be found on the main diagonal of matrix M. The first (T − 1) columns represent the rate classes available to the classifier. Column T represents the classifier's choice of not assigning any rate class and sending the case to a human underwriter. The same ordering is used to sort the rows for the SRD. The second matrix is a T × T penalty matrix P that contains the value loss due to misclassification. The entries in the penalty matrix P are zero or negative values. They were computed from actuarial data showing the net present value (NPV) for each entry (j, k). The penalty value P(j, k) was the difference between the NPV of the entry (j, k) and the highest NPV, corresponding to the correct entry (j, j), located on the main diagonal. The fitness function f combined the values of M, resulting from a test run of the classifier configured with chromosome c_i, with the penalty matrix P to produce a single value


$$ f(c_i) = \sum_{j=1}^{T} \sum_{k=1}^{T} M(j,k) \, P(j,k) . \qquad (41.11) $$


Function f represents the expected value loss for that 
chromosome computed over the SRD and is the fitness 
function used to drive the evolutionary search. 
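
As an illustration of (41.11), the following minimal sketch computes the fitness value from a confusion matrix and a penalty matrix. The 3 × 3 matrices are hypothetical (two rate classes plus the "refer to underwriter" column) and are not taken from the case study; the function name is likewise an assumption.

```python
def fitness(confusion, penalty):
    """Sum of M(j, k) * P(j, k) over all cells, following (41.11)."""
    T = len(confusion)
    return sum(confusion[j][k] * penalty[j][k]
               for j in range(T) for k in range(T))

# Hypothetical data: rows are standard reference decisions,
# columns are classifier decisions; the last column means "referred".
M = [[50, 3, 2],
     [4, 30, 1],
     [0, 2, 8]]
# Penalties are zero on the diagonal and negative elsewhere.
P = [[0, -10, -2],
     [-20, 0, -2],
     [-5, -5, 0]]

print(fitness(M, P))
```

Because the penalties are zero or negative, the result is never positive, and a configuration that loses less value yields a result closer to zero.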


Results 
Testing and Validation of FRC. We defined Coverage as the percentage of cases for which the classifier assigned a rate class (i.e., cases not referred to a human underwriter), as a fraction of the total number of input cases; Relative accuracy as the percentage of correct decisions on those cases that were not referred to the human underwriter; and Global accuracy as the percentage of correct decisions, including both correct rate class decisions and correct decisions to refer cases to human underwriters, as a fraction of the total number of input cases. Then we performed a comparison against the SRD. The results, reported in [41.13], show a remarkable improvement in all measures.


Table 41.2 Typical performance of the un-tuned and tuned rule-based decision system (FRC)

Metrics             Initial parameters based      Best knowledge-engineered   Optimized
                    on written guidelines (%)     parameters (%)              parameters (%)
Coverage            94.01                         90.38                       91.71
Relative accuracy   75.92                         92.99                       95.52
Global accuracy     74.75                         90.07                       93.63


Table 41.3 Average FRC performance over five tuning case sets compared to five disjoint test sets

Metrics             Average performance on        Average performance on
                    training sets (%)             disjoint test sets (%)
Coverage            91.81                         91.80
Relative accuracy   94.52                         93.60
Global accuracy     92.74                         91.60


Specifically, we obtained the following results:

Using the initial parameters (first column of Table 41.2), we observe a large Coverage (≈94%) associated with a low relative accuracy (≈76%) and a lower global accuracy (≈75%). These performance values are the result of applying a strict interpretation of the underwriter (UW) guidelines, without allowing for any tolerance. Had we implemented such crisp rules with a traditional rule-based system, we would have obtained these results. This strictness would prevent the insurer from being price competitive, and would not represent the typical modus operandi of human underwriters. However, by allowing each underwriter to use his/her own interpretation of such guidelines, we could introduce large variability among underwriters. One of our main goals was to provide a uniform interpretation, while still allowing for some tolerance. This goal is addressed in the second column of Table 41.2, which shows the results of performing knowledge engineering and encoding the desired tradeoff between risk and price competitiveness as fuzzy constraints with preference semantics. This intermediate stage shows a different tradeoff, since both global and relative accuracy have improved. Coverage decreases slightly (to ≈90%) in exchange for a considerable gain in relative accuracy (≈93%). Although we obtained this initial parameter set by interviewing the experts, we had no guarantee that such parameters were optimal. Therefore, we used EAs to tune them. We allowed the parameters to move within a predefined range centered on their initial values and, using the SRD and the fitness function described above, we obtained an optimized parameter set, whose results are described in the third column of Table 41.2. The results of the optimization show that the point corresponding to the final parameter set dominates the point of the second set (in a Pareto sense), since both coverage and relative accuracy were improved. Finally, we can observe that the final metric, global accuracy (last row in Table 41.2), improves monotonically as we move from the strict interpretation of the guidelines (≈75%), through the knowledge-engineered parameters (≈90%), to the optimized parameters (≈94%).

While the reported performance of the optimized 
parameters, shown in Table 41.2, is typical of the 
performance achieved through the optimization, a five- 
fold cross-validation on the optimization was also per- 
formed to identify stable parameters in the design space 
and stable metrics in the performance space. This is 
shown in Table 41.3. 
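
The three metrics used in this section can be read directly off the confusion matrix M described earlier. The sketch below assumes that the last column of M holds the cases referred to a human underwriter and that coverage counts the cases that were not referred; the function name and the sample matrix are hypothetical.

```python
def frc_metrics(confusion):
    """Coverage, relative accuracy, and global accuracy, in percent.

    Rows: standard reference decisions. Columns: classifier decisions.
    The last column is assumed to mean "referred to a human underwriter".
    """
    T = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Cases for which the classifier assigned a rate class.
    decided = sum(confusion[j][k] for j in range(T) for k in range(T - 1))
    # Correct rate class decisions (diagonal, excluding the referral cell).
    correct_decided = sum(confusion[j][j] for j in range(T - 1))
    # Correct decisions including correct referrals.
    correct_all = sum(confusion[j][j] for j in range(T))
    return {
        "coverage": 100.0 * decided / total,
        "relative_accuracy": 100.0 * correct_decided / decided,
        "global_accuracy": 100.0 * correct_all / total,
    }

# Hypothetical confusion matrix with 100 cases.
M = [[50, 3, 2],
     [4, 30, 1],
     [0, 2, 8]]
print(frc_metrics(M))
```

Note that global accuracy can exceed the product of coverage and relative accuracy, because correct referrals also count as correct decisions.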

With this kind of automation, the variability of the risk category decision was greatly reduced. This also eliminated a source of risk exposure for the company, allowing it to operate more competitively and profitably. The intelligent automation process, capable of determining its applicability for each new case, increased underwriting capacity by enabling underwriters to handle a larger volume of applications. Additional information on this approach can be found in [41.13, 14].


41.4.2 Mortgage Collateral Valuation: 
A Regression Problem 


Problem Definition 

Residential property valuation is the process of deter- 
mining a dollar estimate of the property value for given 
market conditions. Within this case study, we will re- 
strict ourselves to a single-family residence designed or 
intended for owner occupancy. The value of a property 
changes with market conditions, so any estimate of its 
value must be periodically updated to reflect those mar- 
ket changes. Any valuation must also be supported by 
current evidence of market conditions, e.g., recent real 
estate transactions. 




Fig. 41.11 Locational Value method (LOCVAL) (panel "Done before use": sales price, living area, $/ft², address, geocoding, latitude, longitude; panel "When used for estimating": subject property, address, geocoding, latitude, longitude, living area, $/ft², deviation, $ value, reliability)


The current manual process for estimating the value of properties usually requires an on-site visit by a human appraiser. This process is slow and expensive for batch applications such as those used by banks for updating their loan and insurance portfolios, verifying risk profiles of servicing rights, or evaluating default risks for securitized packages of mortgages. The appraisal process for these batch applications is currently approximated, to a lesser degree of accuracy, by sampling techniques. Secondary buyers and mortgage insurers may also require verification of property value on individual transactions. Some of the applications also require that the output be qualified by a reliability measure and some justification, so that questionable data and unusual circumstances can be flagged for the human who uses the output of the system. Thus, the automation of residential property valuation was motivated by a broad spectrum of application areas.

The most common and credible method used by appraisers is the sales comparables approach. This method consists of finding comparable cases, i.e., recent sales that are comparable to the subject property (using sales records); contrasting the subject property with the comparables; adjusting the comparables' sales prices to reflect their differences from the subject property (using heuristics and personal experience); and reconciling the comparables' adjusted sales prices to derive an estimate for the subject property (using any reasonable averaging method). This process assumes that the item's market value can be derived from the prices commanded by similar items in the same market.


CI-Based Approach: LOCVAL, AIGEN, AICOMP, Fusion
To automate the valuation process, we developed a program that combined the results of three independent estimators. The first one, locational value (LOCVAL), was a coarse estimator based on the locational value of the property. The second one, generative AI model (AIGEN), was a generative estimator based on neuro-fuzzy networks that used only five features from our training set. The third one, comparable-based AI model (AICOMP), was a fuzzy case-based reasoner that followed the comparable-based approach of the appraisers. Finally, we fused the outputs of the estimators into a single estimate and reliability value.


Locational value model (LOCVAL). The first model 
was based solely on two features of the property: 
its location, expressed by a valid, geocoded address, 
and its living area, as shown in Fig. 41.11 (adapted
from [41.15]).

A dollar per square foot measure was constructed for each point in the county by suitably averaging the observed, filtered historical market values in the vicinity of that point. This locational value estimator (LOCVAL) produced two output values: Locational_Value (a $/ft² estimate) and Deviation_from_prevailing_value. The local averaging was done by an exponentially decreasing radial basis function with a space constant of 0.15–0.2 miles. It could be described as the weighted sum of radial basis functions (all of the same width), each situated at the site of a sale within the past year and having amplitude equal to the sales price. Deviation_from_prevailing_value was the standard deviation for houses within the area covered and was derived using a similar approach. The output of LOCVAL was a coarse estimate of the property value, which was used as an input for the generative approach (AIGEN).
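
The local averaging can be sketched as a kernel-weighted mean. The code below is only an illustration under stated assumptions: coordinates are planar and expressed in miles, the kernel is exp(-d / space_constant), the weights are normalized, and each sale contributes its price per square foot. The filtering of historical sales and the exact kernel used in the original system are not reproduced; the function name and data are hypothetical.

```python
import math

def locval(point, sales, space_constant=0.15):
    """Return (locational $/ft^2 value, deviation) at `point`.

    point: (x, y) in miles. sales: list of (x, y, price, living_area_sqft)
    for recent sales. All names and the kernel form are assumptions.
    """
    weights, values = [], []
    for x, y, price, area in sales:
        distance = math.hypot(x - point[0], y - point[1])
        weights.append(math.exp(-distance / space_constant))
        values.append(price / area)
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(variance)

# Two hypothetical sales, equally distant from the subject location.
sales = [(0.1, 0.0, 200_000, 2_000), (-0.1, 0.0, 300_000, 1_500)]
value_per_sqft, deviation = locval((0.0, 0.0), sales)
coarse_estimate = value_per_sqft * 1_800  # subject living area in ft^2
```

Multiplying the locational value by the subject's living area gives the coarse dollar estimate that the text describes as the LOCVAL output.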


AIGEN: Fuzzy-Neural Network. The generative AI model (AIGEN) relied on a fuzzy-neural net that, after a training phase, provided an estimate of the subject's value. The specific model was an extension of ANFIS [41.55], which implemented a fuzzy system as a five-layer neural network so that the structure of the net could be interpreted in terms of high-level rules. The extension allowed the output to be linear functions of variables that did not necessarily occur in the input. In this fashion, we achieved more fidelity with the local models (linear functions) without incurring the computational complexity caused by a large number of inputs. AIGEN inputs were five property features (total_rooms, num_bedrooms, num_baths, living_area, and lot_size) and the output of LOCVAL (locational_value).
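
To make the idea of rule consequents that depend on variables absent from the rule antecedents concrete, here is a minimal first-order Sugeno-style inference sketch. It is not the AIGEN network: the Gaussian membership functions, the product conjunction, the rule parameters, and all names are assumptions chosen for illustration, and no training step is shown.

```python
import math

def gaussian(x, center, width):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def sugeno_estimate(features, rules):
    """Weighted average of linear rule outputs.

    rules: list of (antecedents, coefficients, bias), where antecedents maps
    a feature name to (center, width) and coefficients maps a feature name
    to its linear weight. The two sets of names may differ.
    """
    numerator = denominator = 0.0
    for antecedents, coefficients, bias in rules:
        strength = 1.0
        for name, (center, width) in antecedents.items():
            strength *= gaussian(features[name], center, width)
        output = bias + sum(c * features[name] for name, c in coefficients.items())
        numerator += strength * output
        denominator += strength
    return numerator / denominator

# Hypothetical rules: the antecedent looks only at locational_value,
# while the consequent is linear in living_area.
rules = [
    ({"locational_value": (100.0, 50.0)}, {"living_area": 100.0}, 0.0),
    ({"locational_value": (200.0, 50.0)}, {"living_area": 200.0}, 10_000.0),
]
estimate = sugeno_estimate({"locational_value": 150.0, "living_area": 1_000.0}, rules)
```

Keeping the antecedents small limits the number of rules, while the linear consequents can still draw on every available feature.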


AICOMP: Fuzzy Case-Based Reasoner (CBR). AICOMP was a fuzzy CBR system that used fuzzy predicates and fuzzy-logic-based similarity measures to estimate the value of residential property. This process consisted of selecting relevant cases (which would be nearby house sales), adapting them, and aggregating those adapted cases into a single estimate of the property value. AICOMP followed a process similar to the sales comparison approach used by certified appraisers to estimate a residential property's value. This approach, which is further described in [41.56], consisted of:


(1) Retrieving recent sales from a case base. Upon 
entering the subject property attributes, AICOMP 
retrieves potentially similar comparables from the 
case-base. This initial selection uses six attributes: 
address, date of sale, living area, lot area, number 
of bathrooms, and bedrooms. 

(2) Comparing the subject property with the retrieved 
cases. The comparables are rated and ranked on 
a similarity scale to identify the most similar ones 
to the subject property. This rating is obtained 
from a weighted aggregation of the decision maker 
preferences, expressed as fuzzy membership distri- 
butions and relations. 

(3) Adjusting the sales price of the retrieved cases.
Each property's sales price is adjusted to reflect
its differences from the subject property. These
adjustments are performed by a rule set that uses
additional property attributes, such as construction
quality, condition, pools, fireplaces, etc.

(4) Aggregating the adjusted sales prices of the re- 
trieved cases. The best four to eight comparables 
are selected. The adjusted sales price and similarity 
of the selected properties are combined to produce 
an estimate of the subject value with an associated 
reliability value. 
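
Steps (2) and (4) above can be sketched as follows. This is a simplified illustration, not the AICOMP implementation: the triangular preference function, the attribute weights and tolerances, and the similarity-weighted mean used for aggregation are all assumptions, and the rule-based price adjustment of step (3) is taken as already done.

```python
def preference(difference, tolerance):
    """Fuzzy preference: 1 at zero difference, 0 at or beyond the tolerance."""
    return max(0.0, 1.0 - abs(difference) / tolerance)

def similarity(subject, comparable, tolerances, weights):
    """Weighted aggregation of per-attribute preference degrees, in [0, 1]."""
    total_weight = sum(weights.values())
    score = sum(
        weights[name] * preference(comparable[name] - subject[name], tolerances[name])
        for name in weights
    )
    return score / total_weight

def aggregate(scored_comparables, max_comparables=8):
    """Similarity-weighted mean of adjusted prices over the best comparables.

    scored_comparables: list of (similarity, adjusted_price).
    """
    best = sorted(scored_comparables, key=lambda c: c[0], reverse=True)[:max_comparables]
    total = sum(sim for sim, _ in best)
    return sum(sim * price for sim, price in best) / total

# Hypothetical subject and comparable.
subject = {"living_area": 2_000, "lot_area": 8_000}
comparable = {"living_area": 2_200, "lot_area": 8_000}
tolerances = {"living_area": 400, "lot_area": 4_000}
weights = {"living_area": 2.0, "lot_area": 1.0}
sim = similarity(subject, comparable, tolerances, weights)
estimate = aggregate([(1.0, 300_000), (0.5, 330_000)])
```

A reliability value could be derived from the similarities of the selected comparables, but the text does not specify how, so none is computed here.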


Fusion. Each model produced a property value and an associated reliability value. The latter was a function of the typicality of the subject property based on its physical characteristics (such as lot size, living area, and total rooms). These typical values were represented by possibilistic distributions (fuzzy sets). We computed the degree to which each property satisfied each criterion. The overall property value reliability was obtained by considering the conjunction of these constraint satisfactions (i.e., the minimum of the individual reliability values).
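
The conjunction of constraint satisfactions can be sketched with trapezoidal fuzzy sets and the minimum operator. The trapezoidal shape and every breakpoint below are hypothetical; only the use of the minimum follows the text.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def reliability(prop, typical):
    """Minimum degree to which the property satisfies each typicality criterion."""
    return min(trapezoid(prop[name], *params) for name, params in typical.items())

# Hypothetical typicality distributions (breakpoints in ft^2).
typical = {
    "lot_size": (2_000, 5_000, 15_000, 30_000),
    "living_area": (500, 1_000, 3_000, 5_000),
}
score = reliability({"lot_size": 10_000, "living_area": 4_000}, typical)
```

In this example the lot size is fully typical, while the living area is only partly typical, so the living area determines the overall reliability.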

The computation times, required inputs, errors, and reliability values for these three methods are shown in Fig. 41.12. The locational value (LOCVAL) model required the least time and information, but produced the largest error. The CBR approach (AICOMP) required the most time and the largest number of inputs, but produced the lowest error.

The fusion of the three estimators exhibited several 
advantages: 


• The fusion process provided an indication of the reliability in the final estimate:
  – If reliability was high, the fused estimate was more accurate than any of the individual ones
  – If reliability was limited, the system generated an explanation in human terms
• The fused estimates were more robust.


These characteristics allowed the user to determine 
the suitability of the estimate within the given busi- 
ness application context. Knowledge-based rules were 
used for constructing this fusion at a supervisory level, 
and the few parameters were determined manually, by 
inspection and experimentation. A more detailed description of this process can be found in [41.15]. This was the earliest of our case studies to use model ensembles and fusion. At that time, however, the use of metaheuristics to guide the design phase was not as well understood as in the more recent case studies.




[Figure content: for each estimator (LOCVAL, AIGEN, AICOMP) and for their fusion into a fused $ estimate, the chart compares speed at run time, required inputs, relative error, and reliability. Inputs listed: Address, Living_area, Bedrooms, Baths, Total_rooms, Lot_size. AICOMP can use the following optional property attributes if available: age, eff_age, quality, condition, fireplaces, pool, air_cond, and heating.]


Results
The reliability values generated by the fusion were divided into three classes, labeled good, fair, and poor. From a test sample of 7293 properties, 63% were classified as good, with a median absolute error of 5.4% (an error that was satisfactory for the intended application). Of the remaining subjects, 24% were classified as fair, and 13% as poor. The fair set had a median error of 7.7%, and the poor set had a median error of 11.8%.

The reliability computation and the fusion increased 
the robustness and usefulness of the system, which 
achieved good accuracy and was scalable for thou- 
sands of automated transactions. This approach made it 
a transparent, interpretable, fast, and inexpensive choice 
for bulk estimates of residential property value for a va- 
riety of financial applications. 


41.4.3 Portfolio Rebalancing:
An Optimization Problem


Problem Definition 
The goal of portfolio optimization is to manage risk 
through diversification and obtain an optimal risk- 
return tradeoff. In this case study, we address portfolio 
optimization within the context of an asset-liability 
management (ALM) application. The goal was to find 
the optimal allocation of available financial resources 
to a diversified portfolio of 1500+ long and short-term 
financial assets, in accordance with risk, liability, and 
regulatory constraints. 

Fig. 41.12 Data comparison of multiple approaches


To characterize the investor's risk objectives and capture the potential risk-return tradeoffs, we used various measures to quantify different aspects of portfolio risk. For ALM applications, a typical measure of risk is surplus variance. We computed portfolio variance using an analytical method based on a multifactor risk framework. In this framework, the value of a security can be characterized as a function of multiple underlying risk factors. The change in the value of a security can be approximated by the changes in the risk factor values and risk sensitivities to these risk factors. The portfolio variance equation can be derived analytically from the underlying value change function.

In ALM applications, the portfolios have assets and 
liabilities that are affected by the changes in com- 
mon risk factors. Since a majority of the assets are 
fixed-income securities, the dominant risk factors are 
interest rates. In ALM applications, in addition to 
maximizing return or minimizing risk, portfolio man- 
agers are constrained to match the characteristics of 
asset portfolios with those of the corresponding lia- 
bilities to preserve portfolio surplus due to interest 
rate changes. Therefore, the ALM portfolio optimiza- 
tion problem formulation has additional linear con- 
straints that match the asset-liability characteristics 
when compared with the traditional Markowitz model. 
We use the following ALM portfolio optimization 
formulation 


Maximize   Portfolio expected return
Minimize   Surplus variance
Minimize   Portfolio value at risk                (41.12)

Subject to:

Duration mismatch ≤ target_1,
Convexity mismatch ≤ target_2, and
Linear portfolio investment constraints.


To measure the three objectives in (41.12), namely 
portfolio expected return, surplus variance, and port- 
folio value at risk, we used book yield, portfo- 
lio variance, and simplified value at risk (SVaR), 
respectively. These metrics are defined as fol- 
lows: 


• Portfolio book yield represents its accounting yield to maturity and is defined as

$$ \text{BookYield}_P = \frac{\sum_i \text{BookValue}_i \times \text{BookYield}_i}{\sum_i \text{BookValue}_i} \qquad (41.13) $$

• Portfolio variance is a measure of its variability and is defined as the second central moment of its value change ΔV

$$ \sigma^2 = E\left[(\Delta V)^2\right] - \left(E[\Delta V]\right)^2 \qquad (41.14) $$

• Portfolio simplified value at risk is a complex measure of the portfolio's catastrophic risk and is described in detail in [41.57].
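
The first two metrics can be illustrated numerically. In the sketch below, the book yield follows (41.13) directly; the variance function applies (41.14) to a sample of value changes, which is an assumption made for illustration only, since the case study derived the variance analytically from the multifactor framework. Holdings and scenario values are hypothetical.

```python
def portfolio_book_yield(holdings):
    """Book-value-weighted average yield, as in (41.13).

    holdings: list of (book_value, book_yield) pairs.
    """
    total_value = sum(value for value, _ in holdings)
    return sum(value * yld for value, yld in holdings) / total_value

def value_change_variance(changes):
    """E[(dV)^2] - (E[dV])^2 over equally likely value changes, as in (41.14)."""
    n = len(changes)
    mean = sum(changes) / n
    mean_square = sum(c * c for c in changes) / n
    return mean_square - mean ** 2

holdings = [(100.0, 0.05), (300.0, 0.03)]
print(portfolio_book_yield(holdings))
print(value_change_variance([1.0, -1.0, 1.0, -1.0]))
```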


These metrics define the 3D optimization space. 
Now, let us analyze its constraints. The change in the 
value AV of a security can be approximated by a sec- 
ond order Taylor series expansion given by 


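$$ \Delta V \approx \sum_{i=1}^{m} \frac{\partial V}{\partial F_i} \, \Delta F_i + \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \frac{\partial^2 V}{\partial F_i \, \partial F_j} \right) \Delta F_i \, \Delta F_j . \qquad (41.15) $$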

The first- and second-order partial derivatives in (41.15)
are the risk sensitivities, i.e., the change in the
security value with respect to the change in the risk
factors F_i. These two terms are typically called delta
and gamma, respectively [41.58]. For fixed-income
securities, these measures are duration and convexity.
The duration and convexity mismatches, which con- 
strain our optimization space in (41.12), are the absolute 


values of the differences between the effective dura- 
tions and convexities of the assets and liabilities in the 
portfolio, respectively. Though they are nonlinear (be- 
cause of the absolute value function), the constraints 
can easily be made linear by replacing each of them 
with two new constraints that each ensure that the 
actual value of the mismatch is less than the target 
mismatch and greater than the negative of the target 
mismatch, respectively. The other portfolio investment 
constraints include asset-sourcing constraints that im- 
pose a maximum limit on each asset class or secu- 
rity, overall portfolio credit quality, and other linear 
constraints. 
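
The linearization of the mismatch constraints can be written out explicitly. The symbols D_A and D_L (effective durations of assets and liabilities) and t_D (the duration mismatch target) are notation introduced here for illustration; the convexity constraint is treated in the same way.

$$ \left| D_A - D_L \right| \le t_D \quad \Longleftrightarrow \quad D_A - D_L \le t_D \quad \text{and} \quad D_A - D_L \ge -t_D $$

Each of the two inequalities on the right involves no absolute value, so whenever D_A − D_L is a linear function of the decision variables, both inequalities are linear constraints.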


CI-Based Approach
Given the explicit need for customization and hybridization in methods for portfolio optimization, we could not find an existing multiobjective optimization algorithm that could be applied without extensive modifications. Specifically, the requirement to optimize while satisfying a large number of linear constraints excluded the ready application of prior evolutionary multiobjective optimization approaches. This aspect was a principal motivation to develop novel hybrid techniques.

Figure 41.13 illustrates the process used to drive 
the search for the efficient frontier. The process con- 
sisted of three steps, corresponding to the three boxes 
in Fig. 41.13. The first step (box 1 in Fig. 41.13) was 
the generation of the Pareto front. It consisted of: 


(a) Initializing the population of candidate portfolios 
using a randomized linear programming (RLP) 

(b) Generating an interim Pareto front with a Pareto 
sorting evolutionary algorithm (PSEA) 

(c) Completing gaps in the Pareto front with a target 
objective genetic algorithm (TOGA) and 

(d) Storing the results in a repository. After many runs, 
we filtered the repository with an efficient domi- 
nance filter and generated the first efficient frontier. 


The second step (box 2 in Fig. 41.13) was the interactive densification of the Pareto front. This richly sampled Pareto front was analyzed for possible gaps, and augmented with a last run of TOGA, leading to the generation of the second efficient frontier. Each point in this front represented a nondominated solution, i.e., a viable portfolio. The third step (box 3 in Fig. 41.13) was the portfolio selection. We needed to incorporate the decision maker's preferences in the return-risk tradeoff. Our goal was to reduce the large number of viable solutions into a much smaller subset that could then be further analyzed for a final portfolio selection.


Fig. 41.13 Portfolio optimization process/workflow (box 1, generation of the Pareto front: randomized linear programming for identification of the search space boundary and generation of solutions/population, multiple runs of PSEA (Pareto sorting evolutionary algorithm) and TOGA (target objectives genetic algorithm) producing interim efficient frontiers, a solutions/population archive, and dominance filtering to obtain the efficient frontier; box 2, interactive densification of the Pareto front: visualization and targeting, TOGA run, efficient frontier; box 3, portfolio selection: interactive portfolio down-selection, selected portfolios)

We will briefly describe the major components of 
this process. 


Randomized Linear Programming (RLP). The key challenge in solving the portfolio optimization problem was presented by the large number of linear allocation constraints. The feasible space defined by these constraints is a high-dimensional real-valued space (1500+ dimensions), and is a highly compact convex polytope, making for an enormously challenging constraint satisfaction problem. We leveraged our knowledge of the geometrical nature of the feasible space by designing a randomized linear programming algorithm that robustly sampled the boundary vertices of the convex feasible space. These extremity samples were seeded in the initial population of the PSEA and were exclusively used by the evolutionary multiobjective algorithm to generate interior points (via interpolative convex crossover) that were always geometrically feasible. This was similar in principle to the preprocessing phase proposed by Kubalik and Lazansky [41.59].
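
The reason interpolative convex crossover preserves feasibility is that a convex polytope contains every convex combination of its points. The sketch below checks this on a hypothetical two-dimensional polytope (x ≥ 0, y ≥ 0, x + y ≤ 1) written as A x ≤ b; the function names and the tiny constraint set are assumptions, and the randomized linear programming step that produces the vertices is not shown.

```python
import random

def convex_crossover(parent_a, parent_b, rng=random):
    """Child = lam * a + (1 - lam) * b with lam drawn uniformly from [0, 1)."""
    lam = rng.random()
    return [lam * a + (1.0 - lam) * b for a, b in zip(parent_a, parent_b)]

def is_feasible(x, A, b, tol=1e-9):
    """True if every linear constraint row . x <= b_i holds (within tol)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )

# Hypothetical polytope: x >= 0, y >= 0, x + y <= 1.
A = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
vertex_1, vertex_2 = [1.0, 0.0], [0.0, 1.0]

rng = random.Random(0)
children = [convex_crossover(vertex_1, vertex_2, rng) for _ in range(100)]
print(all(is_feasible(child, A, b) for child in children))
```

No repair step is needed after crossover, which is what makes seeding the population with boundary vertices attractive when the constraint set is large.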


Pareto Sorting Evolutionary Algorithm (PSEA). We developed a Pareto sorting evolutionary algorithm (PSEA) that was able to robustly identify the Pareto front of optimal portfolios defined over a space of returns and risks. The algorithm used a secondary storage and maintained the diversity of the population by using a convex crossover operator, incorporating new random solutions in each generation of the search, and using a noncrowding filter. Given the reliance of the PSEA on the continuous identification of nondominated points, we developed a fast dominance filter to implement this function very efficiently.
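
The notion of dominance that drives the PSEA can be sketched with a naive filter. This is a quadratic-time illustration, not the fast dominance filter mentioned in the text, whose design is not described here. All objectives are assumed to be expressed as quantities to maximize, so the minimized objectives (variance and value at risk) are negated; the sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q: no worse in every objective, strictly better in one.

    All objectives are treated as maximized.
    """
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def dominance_filter(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical portfolios as (return, -variance, -value_at_risk).
portfolios = [
    (0.05, -1.0, -2.0),
    (0.04, -1.0, -2.0),  # dominated: lower return, same risks
    (0.06, -3.0, -2.0),  # kept: higher return at higher variance
    (0.05, -1.0, -3.0),  # dominated: same return, higher value at risk
]
print(dominance_filter(portfolios))
```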


Target Objectives Genetic Algorithm (TOGA). We further enhanced the quality of the Pareto front by using a target objectives genetic algorithm (TOGA), a non-Pareto, nonaggregating function approach to multiobjective optimization. Unlike the PSEA, which was driven by the concept of dominance, the TOGA found solutions that were as close as possible to a predefined target for one or more criteria [41.60]. We used this to fill potential gaps in the Pareto front.
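
The idea of steering the search toward a target can be illustrated with a simple distance-to-target score. The squared-distance measure, the function names, and the candidate values below are assumptions for illustration; the actual TOGA formulation is given in [41.60] and may differ.

```python
def target_distance(objectives, targets):
    """Squared distance to the targets over the targeted objectives only.

    targets: dict mapping an objective index to its target value.
    """
    return sum((objectives[i] - t) ** 2 for i, t in targets.items())

def closest_to_target(candidates, targets):
    """Candidate whose targeted objectives lie nearest to the targets."""
    return min(candidates, key=lambda c: target_distance(c, targets))

# Hypothetical candidates as (return, variance); target a variance of 2.1
# to fill a gap in the front between variance 1.0 and variance 3.0.
candidates = [(0.050, 1.0), (0.060, 3.0), (0.055, 2.0)]
print(closest_to_target(candidates, {1: 2.1}))
```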


Decision Maker Preferences. We incorporated the decision maker's preferences in the return-risk tradeoff to perform our selection. The goal was to reduce thousands of nondominated solutions into a much smaller subset (of ≈10 points), which could be further analyzed for a final portfolio selection. After obtaining a 3D Pareto front, we augmented this space with three additional metrics, to reflect additional constraints for use in the tradeoff process. This augmented 6D space was used for the down-selection problem. To incorporate progressive ordinal preferences, we used a graphical tool to visualize 2D projections of the Pareto front. After applying a set of constraints to further refine the best region, we used an ordinal preference, defined by the order in which we visited and executed limited, local tradeoffs in each of the available 2D projections of the Pareto front. In this approach, the decision maker could understand the available space of options and the costs/benefits of the available tradeoffs. The use of progressive preference elicitation provided a natural mechanism to identify a small number of good solutions.


Results. The optimization process was successfully tested on large portfolios of fixed-income base securities, each portfolio involving over fifteen hundred financial assets and investment decisions of several billion dollars. For a more complete description of this application, refer to [41.16].


41.5 Model Ensembles and Fusion


Over the last decade, we have witnessed an emerging trend favoring the use of model ensembles over individual models. The elements of these ensembles are object-level models, the fusion mechanism is an online MH, and their overall design is guided by offline MHs, as discussed in Sect. 41.1.


41.5.1 Motivations for Model Ensembles


This trend is driven by the improved performance obtained by ensembles. By fusing the outputs of an ensemble of diverse predictive models, we boost the overall prediction accuracy while reducing the variance. Fumera and Roli [41.61] confirmed theoretically the claims of Dietterich [41.62]. They proved that averaging the outputs of classifiers guarantees a better test set performance than that of the worst classifier of the ensemble. Moreover, under specific hypotheses, such as linear combiners of individual classifiers with unbiased and uncorrelated errors, the fusion of multiple classifiers can improve on the performance of the best individual classifier. Under ideal circumstances (e.g., with an infinite number of classifiers) the fusion can provide the optimal Bayes classifier [41.63]. All this is possible if individual classifiers make different errors (diversity), as we will discuss in Sect. 41.5.3.

There is also a computational motivation for using model ensembles. Many learning algorithms are based on local search and suffer from the problem of local minima, which is usually resolved by multiple independent initializations. In other cases, generating the optimal training might be computationally hard even with enough training data. The fusion of multiple classifiers trained from different starting points or training sets can better approximate the optimal classifier at a fraction of the computational cost.
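
The role of diversity can be seen in a small constructed example. The sketch below uses majority voting rather than the output averaging or linear combiners analyzed in the cited work, simply because voting is easy to verify by hand; the labels and predictions are hypothetical and chosen so that the three classifiers err on different cases.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-model label lists by taking the most common label per case."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(predicted, truth):
    """Fraction of cases where the predicted label matches the truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

truth = [1, 0, 1, 1, 0, 0]
model_a = [0, 1, 1, 1, 0, 0]  # wrong on cases 0 and 1
model_b = [1, 0, 0, 0, 0, 0]  # wrong on cases 2 and 3
model_c = [1, 0, 1, 1, 1, 1]  # wrong on cases 4 and 5

fused = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(fused, truth))
```

Each model is right on four of the six cases, yet the fused decision is right on all six, because no two models make the same mistake. If the three models shared their errors, voting would bring no gain.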


41.5.2 Construction of Model Ensembles 


The ensemble construction requires the creation of base 
models, an ensemble topology, and a fusion mecha- 
nism. Let us briefly review these concepts. 


Base Models. Base models are the elements to be 
fused — they are the object-level models discussed in 
Sect. 41.1.1. They need to be diverse, e.g., they need to 
have low error correlations. They could differ in their 
parameters, and/or in their structure, and/or in the ML 
techniques used to create them. The process for inject- 
ing diversity in their design is described in Sect. 41.5.3. 


Topology. The ensemble can be constructed by following a parallel or serial topology (or in some cases, a hybrid one). The most common topology is the parallel one, in which multiple models are fed the same inputs and their outputs are merged by the fusion mechanism. In the serial topology, the models are applied sequentially (as in the case when we first use a primary model and, if it fails to accept a pattern, a secondary model is used to attempt a classification).


Fusion Mechanism. We divide the fusion mechanisms based on two criteria: (a) the type of aggregation that they perform; (b) the dependency on their inputs. The former is concerned with the regions of competence of the base models to be aggregated and divides the fusion mechanisms into: selection, interpolation, and integration. The latter is concerned with the dependency of the meta-model (fusion) on the inputs to the ensemble and divides the fusion mechanisms into static and dynamic ones.

Based on the first criterion, we have the following 
types of fusion mechanisms: 


@ Selection — used to fuse disjoint, complementary 
models. In this case, the base models were trained 
on disjoint regions of the feature space and, for each 
pattern, just one model is responsible for the final 
decision. Selection determines a binary relevance 
weight of each complementary model (where all but 
one of the weights are zero). This mechanism is typ- 
ically used in hierarchical control systems, in which 
a supervisory controller (meta controller) selects the 
most appropriate low-level controller for any given 
state. Another example of this mechanism is the use 
of decision trees [41.26], in which the leaf node 
reached by the input/state determines the selected 
model. 

• Interpolation – used to fuse overlapping complementary
models. In this case, the base models were
trained on different but overlapping regions of the 
feature space and, for each state, a subset of models 
is responsible for the final decision. Interpolation 
determines a fuzzy relevance weight of each com- 
plementary model (where the weights are in the 
[0, 1] interval and they are usually normalized to 
add up to 1). By interpolating rather than switch- 
ing between models, we introduce smoothness in 
the response surface induced by the ensemble. This 
interpolation mechanism is typical of hierarchical 
fuzzy systems, usually found in fuzzy control appli- 
cations [41.36, 55]. 

• Integration – used to fuse competitive models.
In this case, all base models were trained on 
the same feature space and, for each input, all 
models contribute to the final decision accord- 
ing to their relevance weight. Integration deter- 
mines the relevance weight of each competitive 
model. 
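The three aggregation types differ mainly in how the relevance weights are formed. The following sketch assumes scalar model outputs and a hypothetical vector of memberships (the degree to which the input belongs to each model's region of competence); the uniform weights used for integration are also an assumption, and all function names are illustrative.

```python
from typing import List, Sequence


def fuse(outputs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the base-model outputs."""
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, outputs)) / total


def selection_weights(memberships: Sequence[float]) -> List[float]:
    """Binary weights: 1 for the most relevant model, 0 for all the others."""
    best = max(range(len(memberships)), key=lambda i: memberships[i])
    return [1.0 if i == best else 0.0 for i in range(len(memberships))]


def interpolation_weights(memberships: Sequence[float]) -> List[float]:
    """Fuzzy weights in [0, 1], normalized to add up to 1."""
    total = sum(memberships)
    return [m / total for m in memberships]


def integration_weights(n_models: int) -> List[float]:
    """Competitive models: every model contributes (uniform relevance assumed here)."""
    return [1.0 / n_models] * n_models


outputs = [10.0, 20.0, 40.0]      # outputs of three hypothetical base models
memberships = [0.2, 0.6, 0.2]     # hypothetical relevance of each model for this input

selected = fuse(outputs, selection_weights(memberships))
interpolated = fuse(outputs, interpolation_weights(memberships))
integrated = fuse(outputs, integration_weights(len(outputs)))
```

Selection returns the output of a single model, whereas interpolation blends neighboring models and therefore changes smoothly as the memberships change.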


Based on the second criterion (input dependency of 
the meta model), we have the following fusion mecha- 
nisms: 


• Static fusion. The relevance weights are determined
in batch mode by a static fusion meta-model (offline
MHs). The mechanism is applied uniformly
to all inputs. This is the typical case of alge- 
braic expressions used to compute the relevance 
weights [41.64]. 


• Dynamic fusion. The relevance weights are determined
at run time, by a dynamic fusion meta-model
(online MHs). The weights vary according to the inputs.
This is the typical case of dynamic systems
used to compute the relevance weights [41.36–38].
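The difference between static and dynamic fusion can be sketched as follows. This is an illustration under stated assumptions: the static weights are taken to be inversely proportional to batch validation errors, and the dynamic weights are taken to be a Gaussian function of the distance between the input and each model's (hypothetical) region center. Neither choice is prescribed by the chapter.

```python
import math
from typing import Callable, List, Sequence


def static_weights(validation_errors: Sequence[float]) -> List[float]:
    """Weights computed once, in batch mode; they are the same for every input."""
    inverse = [1.0 / e for e in validation_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def dynamic_weights(x: float, centers: Sequence[float], width: float = 1.0) -> List[float]:
    """Weights recomputed for each input from its proximity to each model's region."""
    raw = [math.exp(-(((x - c) / width) ** 2)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def fuse(outputs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of outputs; the weights are assumed to add up to 1."""
    return sum(w * y for w, y in zip(weights, outputs))


# Two hypothetical local models, competent near x = 0 and x = 10, respectively.
models: List[Callable[[float], float]] = [lambda x: 2.0 * x, lambda x: 50.0 - x]
centers = [0.0, 10.0]
w_static = static_weights([1.0, 3.0])


def predict_static(x: float) -> float:
    return fuse([m(x) for m in models], w_static)


def predict_dynamic(x: float) -> float:
    return fuse([m(x) for m in models], dynamic_weights(x, centers, width=3.0))
```

With these definitions, predict_static applies the same weights everywhere, while predict_dynamic relies almost entirely on the first model near x = 0 and on the second model near x = 10.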


As noted by Roli et al. [41.65], the design of a suc- 
cessful fusion system consists of three parts: design of 
the individual object-level models, selection of a set of 
diverse models, and design of the fusion mechanism. 
The operative word is diverse, where model diversity
is defined by low correlation among the object-level
model errors. In other words, these models should
be as accurate as possible while avoiding coincident
errors. This concept is described by Kuncheva and
Whitaker [41.66], who propose four pairwise
and six nonpairwise diversity measures to quantify
the differences among models. A more complete treatment
of this topic can be found in [41.67].
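As an illustration of pairwise diversity, the sketch below computes three measures as they are usually defined in the classifier-ensemble literature (disagreement, double fault, and the Q statistic) from two vectors that record whether each classifier was correct on each validation pattern. The function names and the toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def contingency(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n11, n10, n01, n00) for two 0/1 correctness vectors of equal length."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)          # both correct
    n10 = sum(1 for x, y in zip(a, b) if x and not y)      # only the first correct
    n01 = sum(1 for x, y in zip(a, b) if not x and y)      # only the second correct
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)  # both wrong
    return n11, n10, n01, n00


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of patterns on which exactly one of the two classifiers is correct."""
    n11, n10, n01, n00 = contingency(a, b)
    return (n10 + n01) / (n11 + n10 + n01 + n00)


def double_fault(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of patterns misclassified by both classifiers (coincident errors)."""
    n11, n10, n01, n00 = contingency(a, b)
    return n00 / (n11 + n10 + n01 + n00)


def q_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Yule's Q: near 0 for independent errors, positive when errors coincide."""
    n11, n10, n01, n00 = contingency(a, b)
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q statistic is undefined for this pair")
    return (n11 * n00 - n01 * n10) / denominator


# 1 = correct, 0 = wrong, on six hypothetical validation patterns.
model_a = [1, 1, 0, 0, 1, 0]
model_b = [1, 0, 1, 0, 1, 1]
```

A high disagreement and a low double-fault value both indicate that the two models rarely make coincident errors, which is the property sought when assembling an ensemble.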


41.5.3 Creating Diversity 
in the Model Ensembles 


Let us consider a model as a mapping from an n- 
dimensional feature space F to a k-dimensional output 
space Y. The model training dataset could be represented
as a flat file, in which each row is a point in the
cross-product F × Y and each column represents a coordinate
dimension for such points (either in the feature
space F or in the output space Y).

Among the many approaches for injecting diversity 
in the creation of an ensemble of models, we find bag- 
ging, boosting, random subspace, randomization, and 
random forest. Some of these approaches subsample the 
rows of the training set (points or examples), some other 
ones subsample the columns (features), and a few do 
both. Let us review some of these approaches in chrono- 
logical order. 

Bootstrap [41.68–70] or bagging [41.71] is arguably
one of the oldest techniques for creating an ensemble
of models. In this approach, diversity is obtained by
building each model with a different set of examples, 
which are obtained from the original training dataset 
by resampling the rows with replacement (using a uni- 
form probability distribution over the rows). Bagging 
combines the decisions of the classifiers using uniform- 
weighted voting. For each new training dataset, we must 
maintain the same number of rows as in the original 
training dataset, by sampling it that many times.
Sampling with replacement leads to ≈63.2% unique rows.
Sampling with replacement creates a series of independent
Bernoulli trials, so the number of times a row
is sampled in k trials out of N rows follows the
binomial distribution B(k, 1/N). For large values of N,
this binomial distribution can be approximated by a
Poisson distribution with mean k/N. Therefore, the
proportion of rows not sampled will be approximately
$e^{-k/N}$. In bootstrap, the number of samples is equal
to the number of rows, i.e., k = N, so the Poisson
approximation has a mean of 1 and the proportion of
sampled data is $1 - e^{-k/N} = 1 - e^{-1} \approx 63.2\%$.
As a result, we can reduce storage by not duplicating
repeated rows and instead attaching a count to each
sampled record to indicate the number of times it was
selected. An interesting variation
of this concept is the Bag of Little Bootstraps
(BLBs) [41.72], which modifies the bootstrap approach 
to be usable with much larger data sets (where 63.2% 
of the original data would still be prohibitively large). 
Their proposed BLB approach performs a more drastic 
subsampling while maintaining the unbiased estimation 
and convergence rate of the original bootstrap method. 
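The 63.2% figure can be checked empirically with a short simulation. The sketch below draws one bootstrap sample of N row indices with replacement and stores it as a table of counts, i.e., each distinct row together with the number of times it was selected, rather than as duplicated rows. The function name and the choice N = 100000 are illustrative.

```python
import math
import random
from collections import Counter


def bootstrap_counts(n_rows: int, rng: random.Random) -> Counter:
    """Draw n_rows indices uniformly with replacement; return index -> times selected."""
    return Counter(rng.randrange(n_rows) for _ in range(n_rows))


rng = random.Random(0)               # fixed seed for repeatability
n = 100_000
counts = bootstrap_counts(n, rng)

unique_fraction = len(counts) / n    # proportion of distinct rows in the sample
expected = 1.0 - math.exp(-1.0)      # Poisson approximation with k = N

print(f"observed {unique_fraction:.4f}, expected {expected:.4f}")
```

The Counter holds only the distinct rows, yet its counts still add up to N, so it carries the same information as the full resampled dataset.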

An alternative to bagging is boosting, which is 
rooted in the probably approximately correct (PAC) 
learning model [41.73–75]. Instead of training all classifiers
in parallel (as in the case of bagging), we construct
the ensemble in a serial fashion, by adding one
model at a time. The model added to the ensemble at 
step j is trained on a dataset sampled selectively from 
the original dataset. The sampling distribution starts 
from a uniform distribution (as in bagging) and pro- 
gresses toward increasing likelihood of misclassified 
examples in the new dataset. Thus, the distribution is 
modified at each step, increasing the likelihood of the 
examples misclassified by the classifier at step (j − 1)
being in the training dataset at step j. Like Bagging, 
Boosting combines the decisions of the classifiers us- 
ing uniform-weighted voting. 

AdaBoost (or adaptive boosting) [41.74] extends
boosting from binary to multiclass classification problems
and regression models. It adapts the probability
distribution over the rows in the training set to increase
the difficulty of the training points by including more
instances misclassified or wrongly predicted by previous
models. AdaBoost combines the decisions of the
classifiers using weighted voting. For regression, it
aggregates all the normalized confidences for the output.
For multiclass classification, it selects the class with
the highest votes, calculated from the normalized classification
errors of each class.
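The adaptation of the distribution over the rows can be sketched for the simplest setting. The code below shows one round of the commonly used binary (discrete) AdaBoost update; it is given only as an illustration, and the multiclass and regression variants mentioned above differ in their details. The function name and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(weights: Sequence[float], correct: Sequence[bool]) -> Tuple[float, List[float]]:
    """One binary AdaBoost round.

    weights: current distribution over the training rows (adds up to 1).
    correct: whether the newly trained classifier got each row right.
    Returns the classifier's voting weight and the updated distribution.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < error < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified rows are up-weighted; correctly classified rows are down-weighted.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


start = [0.2] * 5                                  # uniform start, as in bagging
outcome = [True, True, True, True, False]          # the last row is misclassified
alpha, new_weights = adaboost_round(start, outcome)
```

After the update, the rows misclassified by the current model carry half of the total weight, so the next model is pushed toward the examples that the current one gets wrong.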

A different approach to inject diversity is to limit 
the number of columns (features), rather than the num- 
ber of rows (points). Ho’s random subspaces technique
[41.76] selects random subsets of the available
features to be used in training the individual classifiers 
in the ensemble. 

Dietterich [41.77] introduced an approach called 
randomization. In this approach, at each node of each 
tree of the ensemble, the 20 best attributes to split the 
node are determined and one of them is randomly se- 
lected for use at that node. 

Breiman [41.32] presented random forest ensem- 
bles, where bagging is used in combination with ran- 
dom feature subspace selection. At each node of each 
tree of the forest, a subset of m attributes (out of n 
available ones) is randomly selected, and the best split 
available based on the m attributes is selected for that 
node. Clearly, if m were too small, the tree performance 
would be severely affected, while if m were too close to 
the value of n, each tree performance would be higher, 
but diversity would suffer. In the case of random forest, 
a tradeoff between individual performance and overall 
diversity is achieved by using a value of m around
$\lfloor \sqrt{n} \rfloor$ for classification problems, and around $\lfloor n/3 \rfloor$ for
regression problems.
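The feature subsampling performed at each node can be sketched as follows, using the rule-of-thumb values of m quoted above. The function names are illustrative, and the split search over the selected candidates is omitted.

```python
import math
import random
from typing import List


def n_candidate_features(n: int, task: str) -> int:
    """Rule-of-thumb m: floor(sqrt(n)) for classification, floor(n/3) for regression."""
    if task == "classification":
        return max(1, math.isqrt(n))
    if task == "regression":
        return max(1, n // 3)
    raise ValueError("task must be 'classification' or 'regression'")


def sample_candidate_features(n: int, task: str, rng: random.Random) -> List[int]:
    """Indices of the m features (out of n) considered for the split at one node."""
    return sorted(rng.sample(range(n), n_candidate_features(n, task)))


rng = random.Random(42)
node_features = sample_candidate_features(100, "classification", rng)
```

A fresh subset is drawn at every node of every tree, which is what distinguishes this scheme from the random subspaces technique, where one subset is drawn per classifier.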

Other approaches to increase diversity rely on the 
use of a high-level model to combine object-level mod- 
els derived from different machine-learning techniques, 
e.g., stacked generalization [41.78]. Alternatively, we 
can inject structural diversity in the design of the ob- 
ject models by using different topologies/architectures 
in graphical models (e.g., neural networks) or different 
function sets/grammars in genetic programming algo- 
rithms to construct models [41.79]. 

The above approaches allow us to extract different 
types of information from the data, which should lead to 
lower error correlations among the models. With these 
approaches we can generate a space of diverse models, 
which can then be searched by offline MHs to tune and 
optimize the model ensemble, according to tradeoffs of 
performance and diversity. 


41.5.4 Lazy Meta-Learning: 
A Model-Agnostic Fusion Mechanism 


A second trend in the development of analytics models 
is the inevitable commoditization of object-level mod- 
els. Multiple sources for model creation are now avail- 
able, ranging from crowdsourcing analytics by competition
(e.g., [41.80]) or by collaboration (e.g., [41.81]) to
cloud-based model automation tools, such as evolving 
model populations using genetic programming [41.79]. 
This situation creates different requirements for the fu- 
sion mechanism, which now should be agnostic with 


809 


Sly | d Hed 


810 PartD 


Neural Networks 


Sly | d Hed 


respect to the genesis of the object-level models in the 
ensemble. 

We should note that all the previous approaches to 
fusion described in Sect. 41.5.3 use static fusion mecha- 
nisms, as they focus primarily on the creation of diverse 
base models (or object-level models). If we want to be
agnostic with respect to these models, we need to have
a smarter fusion mechanism, i.e., a meta-model that
can reason about the performance and applicability of
the available object-level models.

This issue is partially addressed by Lazy Meta- 
Learning, proposed in [41.82]. In this approach, for 
each query we instantiate a customized fusion mechanism.
Such a mechanism is a meta-model, i.e., a model
that operates on the object-level models whose predic- 
tions we want to fuse. Specifically, for a given query we 
dynamically (i.e., based on the query) create a model 
ensemble, followed by a customized fusion. The dy- 
namic model ensemble consists of: 


(1) Finding the most relevant object-level models 
from a DB of models, by matching their meta- 
information with the query. 

(2) Identifying the relevant models with higher perfor- 
mance. 

(3) Selecting a subset of models with highly uncorre- 
lated errors to create the ensemble. 


The customized fusion uses the meta-information of 
the models in the ensemble for dynamic bias compensa- 
tion and relevance weighting. The output is a weighted 
interpolation or extrapolation of the outputs of the 
model ensemble. 

More specifically the Lazy Meta-Learning process 
is divided into three stages: 


• Model creation, an offline stage in which we create
the initial building blocks for the assembly (or we
collect them/acquire them from other sources) and
we compile their meta-information.

• Dynamic model assembly, an online stage in which,
for a given query, we select the best subset of models.

• Dynamic model fusion, an online stage in which we
evaluate the selected models and dynamically fuse
them to solve the query.


Model Creation: The Building Blocks 
We assume the availability of an initial training set that
samples an underlying mapping from a feature space X
to an output y. In the case of supervised learning, we
also know the ground-truth value t for each record in
the training set. We create a database DB of m diverse,
local or global models developed by any source. If we
have control over the model creation, we can increase
model diversity by any of the techniques described in
Sect. 41.5.3. Every time we add a model to the DB,
we need to capture its associated meta-information, i.e.,
information about the model itself, its training set, and
its local/global performance. Such meta-information is
used to create indices in the DB that will make its search
more efficient. For each model M_i, we use a compiled
summary of its performance, represented by a CART
tree T_i of depth d_i, trained on the model error vector
obtained during the validation of the model. To avoid
overfitting, each tree is pruned to allow at least 25 points
in each leaf node.


Dynamic Model Ensemble: 
Query-Driven Model Selection and Ensemble 
This stage is divided into three steps: 


• Model Filtering, in which we retrieve from the
DB the applicable models for the given query. For 
a query q, the process starts with a set of constraints 
to define its related feasibility set. In this case the 
constraints are: 


(a) Model soundness and competency in its region
of applicability (i.e., there must be sufficient
points in the training set to develop a reliable
model).

(b) Model vitality (i. e., the model is up-to-date, not 
obsolete). 

(c) Model applicability to the query (i. e., the query 
is in the model’s competence region). 


The intersection of the sets of models satisfying
each constraint gives us the set of retrieved models
for the query q.
Let us denote the cardinality of this set as r.

• Model Preselection, in which we reduce the number
of models, based on their local performance characteristics,
such as bias, variability, and distance from
the query. For a query q, having retrieved its r
feasible models in the previous step, we classify
the query using the CART tree T_i associated
with each model M_i, and reach leaf node L_i(q). Each leaf
node is defined by its path to the root of the tree
and contains d_i constraints over (at most) d_i features.
Leaf L_i(q) provides performance estimates of
model M_i in the region of the query. These estimates
are used to retrieve the set of Pareto-best models to
be used in the next step. Let us denote the cardinality
of this set as p.


Fig. 41.14 Dynamic model ensemble on demand (filtering, selection).
Stages shown: model creation and meta-information (DB of all local
models, m = 100–10000); model filtering by applicability (r = 30–100
models, retrieved through the DB indices in log m); model preselection
by local performance dominance (Pareto filter over r points in a 3-D
performance space, p = 10–20 models); model final selection by
entropy/correlation (greedy search for the most uncorrelated subset,
k = 3–10 models)


• Model Final Selection, in which we define the final
model subset. We need to use an ensemble whose 
elements have the most uncorrelated errors. We use 
the Entropy Measure E, proposed by Kuncheva and 
Whitaker, as the way to find the k most diverse 
models to form the ensemble. To avoid the intrin- 
sic combinatorial complexity, we approximate our 
search via a greedy algorithm (further described 
in [41.82]). 


The process is described in Fig. 41.14. 
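The preselection and final selection steps can be sketched as follows. This is a simplified illustration, not the procedure of [41.82]: each model is summarized by a hypothetical performance point (absolute bias, variability, distance to the query) taken from its CART leaf, and the diversity criterion is replaced by the mean absolute correlation between validation error vectors instead of the Entropy Measure E. All names and numbers are illustrative.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# (absolute bias, variability, distance to the query); lower is better on each axis.
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_filter(points: Dict[str, Point]) -> List[str]:
    """Preselection: keep the models whose performance point is not dominated."""
    return [m for m, p in points.items()
            if not any(dominates(q, p) for n, q in points.items() if n != m)]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two non-constant vectors of equal length."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = sqrt(sum((x - mu) ** 2 for x in u))
    sv = sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)


def greedy_uncorrelated(errors: Dict[str, Sequence[float]], seed: str, k: int) -> List[str]:
    """Final selection: grow the ensemble by adding the least correlated model each time."""
    selected = [seed]
    remaining = [m for m in errors if m != seed]
    while len(selected) < k and remaining:
        def mean_abs_corr(m: str) -> float:
            return sum(abs(pearson(errors[m], errors[s])) for s in selected) / len(selected)
        best = min(remaining, key=mean_abs_corr)
        selected.append(best)
        remaining.remove(best)
    return selected


points = {"A": (0.10, 0.2, 0.3), "B": (0.20, 0.3, 0.4),
          "C": (0.05, 0.5, 0.3), "D": (0.30, 0.1, 0.9)}
errors = {"A": [1.0, 2.0, 3.0, 4.0], "C": [2.0, 4.0, 6.0, 8.5], "D": [1.0, -1.0, -1.0, 1.0]}

preselected = pareto_filter(points)                      # B is dominated by A
ensemble = greedy_uncorrelated({m: errors[m] for m in preselected}, seed="A", k=2)
```

The greedy search evaluates only a linear number of candidates per step, which avoids enumerating all k-subsets of the p preselected models.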


Dynamic Model Fusion: Generating the Answer 
Finally, we evaluate the selected k models, compensate 
for their biases, and aggregate their outputs using a rel- 
evance weight that is a function of their proximity to 
the query to generate the solution to the query. This is 
illustrated in Fig. 41.15. 
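A minimal sketch of this fusion step is given below. It assumes that, for each selected model, the CART leaf reached by the query supplies a local bias (mean of prediction minus target) and a local error estimate, and that the relevance weight is the inverse of that local error. The inverse-error weighting and all names are assumptions made for illustration.

```python
from typing import Sequence


def dynamic_fusion(predictions: Sequence[float],
                   leaf_bias: Sequence[float],
                   leaf_error: Sequence[float],
                   eps: float = 1e-9) -> float:
    """Bias-compensated, relevance-weighted average of the selected models' outputs."""
    # Remove each model's local bias, as estimated in the leaf reached by the query.
    corrected = [y - b for y, b in zip(predictions, leaf_bias)]
    # Models with a smaller local error receive a larger relevance weight.
    weights = [1.0 / (e + eps) for e in leaf_error]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, corrected)) / total


# Two hypothetical models that agree once their local biases are removed.
agree = dynamic_fusion([10.5, 9.0], leaf_bias=[0.5, -1.0], leaf_error=[0.2, 0.4])

# Two unbiased models; the first is three times more accurate near the query.
weighted = dynamic_fusion([12.0, 8.0], leaf_bias=[0.0, 0.0], leaf_error=[1.0, 3.0])
```

Because both the biases and the weights are read from the leaves reached by the query, the fusion changes from query to query, which is what makes it dynamic.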

This approach was successfully tested on a regression
problem for a coal-fired power plant optimization.
The optimization problem, described in [41.83],
required adjusting 20+ set-point values for the power
plant multivariable controller to match the generated
power with the required load, while minimizing emissions
(NOx) and heat rate (inverse of efficiency). The
optimization was predicated on having a reliable fitness
function, i.e., a high-fidelity mapping between the
20+ input vector and the three outputs (load, NOx, heat
rate).

Fig. 41.15 Dynamic model fusion on demand. The query Q is
classified by the k CART trees of the k selected models; each
model output is dynamically compensated for the bias found in
its CART leaf node, and the dynamic output fusion F(Q) weights
each model by its prediction error in that leaf node, so that
weights and biases are functions of regional performance
(based on Q)

In [41.82], we focused on generating this mapping
via dynamic fusion. We used a database of approximately
75 models: 30 neural networks (NNs) trained
using bootstrapping and about 45 symbolic regression
(SR) models evolved on the MIT Cloud using the
same training set of 5000+ records. We applied the dynamic
fusion approach, described in this section, and
evaluated it on a disjoint validation set made of 2200+
records. The results of the mean of the absolute error
(MAE) computed over this validation set are summarized
in Table 41.4.

Table 41.4 Static and dynamic fusion for: (a) 30 NNs; (b) 45 SR models; (c) 75 combined models

                    NOx                            Heat rate                      Load
                    (a) NN   (b) SR   (c) NN+SR    (a) NN  (b) SR  (c) NN+SR     (a) NN  (b) SR  (c) NN+SR
Baseline (average)  0.02279  0.03267  0.02877      91.79   109.15  101.14        1.0598  1.5275  1.3149
Dynamic fusion      0.01627  0.01651  0.01541      70.95   73.90   70.90         0.8474  0.8243  0.8209
Percentage gain     28.6%    49.5%    46.4%        22.7%   32.3%   29.9%         20.0%   46.0%   37.6%

The first conclusion is that dynamic fusion consistently
outperformed static fusion, as shown by the
percentage gain (last row in Table 41.4), which was
computed as the difference between baseline and dy- 
namic fusion, as a percentage of the baseline. In the 
cases of NOx and load, the baseline (average) for the
45 SR models was ~50% worse than that of the 30 
NNs. In creating the SR, we sacrificed individual per- 
formance to boost diversity, as described in Sect. 41.5.3. 
On the other hand, the NNs were trained for perfor- 
mance, while diversity was only partially addressed by 
bootstrap (but the NNs were trained with the same fea- 
ture set and the same topology). After dynamic fusion 
the performance of the SR was roughly comparable 
with the NNs (within 2.7%), and the overall performance
of the combined models was 3–5% better than
that of the NNs alone. This experiment verified the im- 
portance of diversity during model creation. A more 
complete treatment of Lazy Meta Learning, including 
results of these experiments can be found in [41.82]. 
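The percentage gains reported in Table 41.4 can be recomputed directly from the baseline and dynamic-fusion MAE values. In the sketch below, the numbers are transcribed from the table; the baseline for the SR models on load is read as 1.5275, which is the value consistent with the reported 46.0% gain. The dictionary layout and the function name are illustrative.

```python
def percentage_gain(baseline: float, fused: float) -> float:
    """Difference between baseline and dynamic fusion, as a percentage of the baseline."""
    return 100.0 * (baseline - fused) / baseline


# (output, model set) -> (baseline MAE, dynamic-fusion MAE), from Table 41.4
mae = {
    ("NOx", "NN"): (0.02279, 0.01627),
    ("NOx", "SR"): (0.03267, 0.01651),
    ("NOx", "NN+SR"): (0.02877, 0.01541),
    ("Heat rate", "NN"): (91.79, 70.95),
    ("Heat rate", "SR"): (109.15, 73.90),
    ("Heat rate", "NN+SR"): (101.14, 70.90),
    ("Load", "NN"): (1.0598, 0.8474),
    ("Load", "SR"): (1.5275, 0.8243),
    ("Load", "NN+SR"): (1.3149, 0.8209),
}

gains = {key: round(percentage_gain(base, fused), 1) for key, (base, fused) in mae.items()}

for (output, models), gain in gains.items():
    print(f"{output:9s} {models:6s} {gain:5.1f}%")
```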


41.6 Summary and Future Research Challenges 


We illustrated the use of CI techniques in ML ap- 
plications. We explained how to leverage CI to build 
meta models (for offline design and online control/aggregation)
and object-level models (for solving
the problem at hand). We described the most
typical ML functions: unsupervised learning (clustering),
supervised learning (classification and regression),
and optimization. To structure the case studies
described in this review, we presented two similar
paradigms: PHM for industrial assets, and risk man- 
agement for financial and commercial assets. We ana- 
lyzed five case studies to show the use of CI models 
in: 


(1) Unsupervised learning for anomaly detection 
(based on neural networks, fuzzy systems, and 
EAs). 


(2) Supervised learning (classification) for assessing 
and pricing risk in insurance products (based on 
fuzzy systems and EAs). 

(3) Supervised learning (regression) for valuating mort- 
gage collaterals (based on radial basis functions, 
fuzzy systems, neural fuzzy systems, and fusion). 

(4) Supervised learning (regression-induced ranking) 
for selecting the best units in a fleet (based on fuzzy 
systems and EAs). 

(5) Multiobjective optimization for rebalancing a port- 
folio of investments (based on multiobjective EAs). 


In the last section we covered model ensembles
and fusion, and emphasized the need for injecting diversity
during the model creation stage. We proposed
a model-agnostic fusion mechanism that could be used
to fuse commoditized models (such as the ones obtained
by crowdsourcing). We will conclude this review with
a prospective view of research challenges for ML.


41.6.1 Future Research Challenges

All applications described in the five case studies were
developed before the advent of cloud computing and
big data. Since then, we have encountered situations in
which we need to analyze very large data sets, enabled
by the Internet of things (IoT) [41.84], machine-to-machine
(M2M) connectivity, and social media. In this
new environment, we need to scale up the CI/ML capabilities
and address the underlying three v’s in big
data: volume, velocity, and variability. Large data volumes
pose new challenges to data storage, organization,
and query; data feed velocity requires novel streaming
capabilities; data variability requires the collection and
analysis of structured, unstructured, and semistructured
(e.g., locational) data and the ability to learn across multiple
modalities. The use of cloud computing will also
result in the commoditization of analytics. In this context,
ML applications will also include the delivery of
Analytics-as-a-Service (AaaS) [41.85].

We will conclude this section with a view of five
research challenges entailed by this new environment,
as illustrated in Fig. 41.16:

(1) Data-driven Model Automation and Scalability

a. Computation at the edge (velocity). When the
cost of moving large data sets becomes significant,
we need to perform analysis while the data
are still in memory, via in-situ analytics and in-transit
data transformation [41.86]. A variety of
solutions have been proposed, ranging from SW
frameworks [41.87] to active Flash [41.88].

b. Technology stack for Big Data (volume).
We need to address scalability issues along
many dimensions, such as data size, number
of computational nodes over which to distribute
the algorithms, number of models to
be trained/deployed, etc. Among the research
groups addressing this issue, we found the UC
Berkeley AMP Lab [41.89] to be among the
leaders in this area. The AMP researchers have
developed the Berkeley data analytics stack
(BDAS) [41.90], a technology stack composed
of Shark (to run structured query language
(SQL) and complex analytics on large clusters)
[41.91, 92], Spark (to reuse working sets of
data across multiple parallel operations, typical
of ML algorithms) [41.93], and Mesos (to share
commodity clusters between multiple diverse
cluster computing frameworks, such as Hadoop
and message passing interface (MPI)) [41.94].

c. Parallelization of ML Algorithms (volume). We
need to design ML algorithms so that their computation
can be distributed. Some algorithms are
easy to parallelize, like population-based EAs
(using an island [41.95, 96] or a diffusion grid
model to distribute the subpopulations to many
computational nodes), or Random Forest (growing
subsets of trees on different computational
nodes) [41.97]. Other algorithms will need to be
redesigned for parallelization.

d. Multimodal learning (variability). As the size of
the information grows, its content will become
more heterogeneous. For instance, by navigating
through web pages, we encounter information
represented as text, images, audio, tables,
video, applets, etc. There are preliminary efforts
for representing and learning across multiple
modalities using graphical approaches [41.98],
and kernels [41.99, 100]. However, a comprehensive
approach to this issue remains an open
problem.

Fig. 41.16 Research challenges for CI/ML analytics: (1) data-driven
model automation/scalability, (2) ML/human interaction, (3) decision
making/uncertainty, (4) fusion, and (5) special ML topics


(2) ML/Human Interactions 


a. Upgrade the human role. We need to remove the
human modeler from the most time-consuming, 
iterative tasks, by automating data scrubbing 
(outliers removal, de-noising, imputation, or 
elimination of missing data) and data prepara- 
tion (multiple sources integration, feature selec- 
tion, feature generation). As noted in [41.86], 
while: 


All high performance computing (HPC) com- 
ponents — power, memory, storage, bandwidth, 
concurrence, etc. — will improve performance by 
a factor of 3 to 4444 by 2018... human cogni- 
tive capability will certainly remain constant. 


We are obviously the bottleneck in any kind of 
automation process and we need to upgrade our 
role, by interacting with the process at higher 
levels of model design. For example, in active 
learning we maximize the information value of 
each additional question to be answered by a hu- 
man expert [41.101, 102]. In interactive multi- 
criteria decision making we can use progressive 
preference articulation. This allows the expert 
to guide the automated search in the design 
space, by interactively simplifying the problem, 
e.g., by transforming an objective into a con- 
straint once the values of most solutions fall 
within certain ranges for that objective [41.103]. 
b. Non-expert ML users. For routine modeling
tasks, we need to enable nonexpert users to define
analytics in a declarative rather than procedural 
fashion. In [41.104] we can see an example of 
this concept, based on the analogy between the 
MLbase language for ML and traditional SQL 
languages for DBs. 

c. Integration of crowdsourcing with analytics
engines. Crowdsourcing is an emerging
trend that is increasing human capacity in 
a manner similar to the way cloud com- 
puting is increasing computational capacity. 


In this analogy, we could think of Amazon
Mechanical Turk [41.105] as the dual of
Amazon Elastic Compute Cloud [41.106].
Originating from the concept of the Wisdom of
Crowds [41.107], crowdsourcing has shown
tremendous growth [41.108]. According to
Malone et al. [41.109], the crowd’s contribution
to the solution of a problem/task can be made via
collection, competition, or collaboration.

i. Collection is used when the task can be 
decomposed into independent micro-tasks 
that are then executed by a large crowd to 
generate, edit, or augment information. An 
example of this case is the labeling of videos 
or images to create a training set for super- 
vised learning [41.110]. The annotation of 
galaxy morphologies or lunar craters done 
at Zooniverse [41.111] is another example 
of this case. 

ii. Competition is used when a single individ- 
ual in the crowd can provide the complete 
solution to the task. An example of this 
case is the creation of data-driven models 
via contests hosted by sites such as Kaggle
[41.80] or CrowdANALYTICS [41.81].

iii. Collaboration is used when single individu- 
als in the crowd cannot provide a complete 
solution and the task cannot be decomposed 
into independent subtasks. This situation re- 
quires individuals to collaborate toward the 
generation of the solution. Examples of this 
case are usually large projects, such as the 
development of Linux or Wikipedia. 

Additional crowdsourcing systems are surveyed
in [41.112]. We are interested in the intersection
of crowdsourcing with machine learning
and big data, which is also the focus of UCB
AMP Lab [41.89]. Among the research trends in
this area, we find the impact of crowdsourcing on
DB queries, such as changing the closed-world
assumption of DB queries [41.113], monitoring
query progress when crowdsourced inputs are
expected [41.114], and using the crowd to interpret
queries [41.115]. Furthermore, we can also
apply ML techniques to improve the quality of
the crowd-generated outputs, reducing variability
in annotation tasks [41.116], and performing
bias removal from the outputs [41.117]. In the
future, we expect many more opportunities for
CI/ML to leverage crowdsourcing and enhance
the quality of its outputs.


d. New generations of Data Scientists. There is
a severe skill gap, which we will need to overcome
if we want to accelerate the applicability
of ML to a broader set of problems. We
need to train a new generation of data scientists,
with skills in data flow (collection, storage,
access, and mobility), data curation (preservation,
publication, security, description, and
cleaning), and basic analytics (applied
statistics, ML/CI). Several universities are creating
customized curricula to address this
need. The National Consortium for Data Science
is an illustrative example of this emerging
trend [41.118].

e. Extreme scale visual analytics. We can cope
with increases in data volume/velocity, and
analysis complexity because we have benefited
from similar increases in computing capacity.
Unfortunately, as aptly noted in [41.86],
“... there is no Moore’s law for human cognitive
abilities.” So, we face many challenges,
described in the same reference, when we want
to present/visualize the results of the complex
analytics to the user (as in the case of multilevel
hierarchies, time-dependent data, etc.). There are
situations in which the user can select data with 
certain characteristics to be used in the analysis 
and steer data summarization and triage per- 
formed by the ML algorithms [41.86]. This will 
also overlap with the previous category of up- 
grading the human role. 


(3) Decision Making and Uncertainty 


a. Quality/Computation tradeoff. When faced with
massively large data sets, we need to distribute
data and models over multiple nodes. Often we
also need to subsample the data sets while training
the models. This might introduce biases and
increase variances in the models’ results. The use
of the Bag of Little Bootstraps [41.72] allows
us to extend bootstrap (Sect. 41.5.3) to large
data sets. However, not all queries or functions
will work with bootstrap. BlinkDB is a useful
tool to address this problem, as it allows us to
understand if additional resources will actually
improve the quality of the answer. BlinkDB is
a [41.119]:


... massively parallel, sampling-based approxi- 
mate query engine for running ad-hoc, interac- 
tive SQL queries on large volumes of data. It 
allows to trade-off query accuracy for response
time, enabling interactive queries over massive
data by running queries on data samples and 
presenting results annotated with meaningful 
error bars... . 
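To make the Bag of Little Bootstraps idea from point (3.a) concrete, the following is a minimal sketch in Python, assuming a scalar estimator that accepts integer weights. The function names, the subset exponent `gamma`, and the default counts are illustrative assumptions, not values taken from [41.72]. Each small subset is resampled to the full data size by drawing multinomial counts, so the estimator only ever touches `b` distinct points.

```python
import random
import statistics
from collections import Counter

def blb_standard_error(data, estimator, gamma=0.6, n_subsets=5,
                       n_resamples=30, seed=0):
    """Sketch of a Bag of Little Bootstraps standard-error estimate.

    `estimator(values, weights)` is a hypothetical callable that takes a
    list of values and a list of integer weights of the same length.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))            # size of each small subset
    per_subset = []
    for _ in range(n_subsets):
        subset = rng.sample(data, b)       # subsample without replacement
        estimates = []
        for _ in range(n_resamples):
            # A resample of size n is stored as b multinomial counts.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, weights))
        per_subset.append(statistics.stdev(estimates))
    # Average the per-subset quality assessments.
    return statistics.fmean(per_subset)

def weighted_mean(values, weights):
    """Mean of `values` where each value is repeated `weights[i]` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For a sample of 2000 standard normal values, `blb_standard_error(data, weighted_mean)` should land near the analytical standard error of the mean, roughly 1/sqrt(2000), although the exact value depends on the random seed.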


b. Anytime ML algorithms. Anytime algorithms are especially needed for online ML applications in which models need to produce results within given real-time constraints [41.120]. In [41.121] the authors propose a method for determining when to terminate the learning phase of an algorithm (for a sequence of iid tasks) to optimize the expected average reward per unit time. In simpler situations, we want to be able to interrupt the algorithm and use its most recent cached answer. This idea is related to the quality/computation tradeoff of point (3.a), as we expect the quality of the answer to increase if we allow more computational resources. For example, in EAs, under convergence assumptions, we expect further generations to have a better fitness than the current one. At any time we can stop the EA and fetch the answer for the current population.
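The interruptible behavior described in point (3.b) can be sketched as follows. This is a toy illustration rather than a method from [41.120] or [41.121]: a small elitist EA over bit strings that keeps a cached best answer and returns it whenever a caller-supplied `should_stop` predicate fires. The function name, the tournament selection, and the mutation rate are assumptions made for the example.

```python
import random

def anytime_ea(fitness, n_bits, should_stop, pop_size=20, seed=0):
    """Run a simple elitist EA until `should_stop(generation)` is true.

    Returns the cached best individual and the number of generations run.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)           # cached answer, valid at any time
    generation = 0
    while not should_stop(generation):
        children = []
        for _ in range(pop_size):
            # Binary tournament selection followed by bit-flip mutation.
            a, b = rng.sample(pop, 2)
            parent = a if fitness(a) >= fitness(b) else b
            child = [bit ^ (rng.random() < 1.0 / n_bits) for bit in parent]
            children.append(child)
        # Elitism: carrying the cached best forward keeps quality monotone.
        pop = children[:-1] + [best]
        best = max(pop, key=fitness)
        generation += 1
    return best, generation
```

A generation budget such as `lambda g: g >= 50` gives a reproducible stopping rule; a wall-clock rule such as `lambda g: time.monotonic() > deadline` (with `deadline` chosen by the caller) matches the real-time setting more closely. Because of elitism, a longer run with the same seed never returns a worse answer than a shorter one.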

c. Evidence and uncertainty representation. This is one of the extreme-scale visual analytics challenges covered in [41.86]. In the two previous points, we noted that as the data size increases we would need to perform data subsampling to meet real-time constraints. This subsampling will introduce even greater uncertainty in the process. We need better ways to quantify and visualize such uncertainty and to provide the end user with intuitive views of the information and its underlying risk.


(4) Model Ensemble/Fusion

a. Integration of structured and unstructured data. The simplest case is the integration of time-dependent text (e.g., news, reports, logs) with time series data, e.g., text as a sensor. This topic is related to the multimodal learning discussion illustrated in (1.d). Alternatively, instead of learning across multiple data formats, we can use an ensemble of modality-specific learners and fuse their outputs. An example of this approach can be found in [41.122].

b. Integration of physics-based and data-driven models. There are at least two ways to perform such integration: using the models in parallel or serially. The simplest integration is based on a parallel architecture, in which both models are used simultaneously, yet separately. This case covers the use of data-driven models applied to the residuals between expected values (generated by physics-based models) and actual values (measured by sensors). This was illustrated in the first case study (Sect. 41.3.1). Another example of parallel integration is the use of an ensemble of physics-based and data-driven models, followed by an agnostic fusion mechanism (Sect. 41.5.4). A different type is serial integration, in which one model is used to initialize the other. An example of such integration is the use of data-driven models to generate estimates of the parameters and initial conditions of physics-based models. For instance, we could use data-driven RUL predictive models to estimate the current degree of degradation of key components in an electro-mechanical system. Then, we could run a physics-based model of the system, using these estimates as initial conditions, to determine the impact of future load scenarios on the RUL predictions. Another example of serial integration is the use of physics-based models to generate (offline) a large data set, usually following a Design of Experiments methodology. This data set becomes the training set for a data-driven model that will functionally approximate the physics-based model. Typically, a second data-driven model, frequently retrained with real-time data feeds, will be used to correct the outputs of the static approximation [41.123].
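The parallel, residual-based integration described above can be illustrated with a small sketch. Everything here is hypothetical: `physics_model` stands in for a first-principles model, and a least-squares line plays the role of the data-driven model fitted to the residuals between measured and expected values. A real application would use the models of Sect. 41.3.1 rather than these toy functions.

```python
def physics_model(load):
    """Hypothetical first-principles estimate of a sensor reading."""
    return 2.0 * load + 1.0

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_residual_model(loads, measured):
    """Fit the data-driven model to (actual - expected) residuals."""
    residuals = [y - physics_model(x) for x, y in zip(loads, measured)]
    return fit_line(loads, residuals)

def hybrid_predict(load, residual_params):
    """Physics-based expectation plus the learned residual correction."""
    slope, intercept = residual_params
    return physics_model(load) + slope * load + intercept
```

If the measured values systematically deviate from the physics-based expectation, the residual model captures that deviation, and `hybrid_predict` returns the corrected estimate while the physics model itself is left unchanged.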

c. Model-agnostic fusion. In Sect. 41.5.4 we covered the concept of model-agnostic fusion, to be deployed when predictive models are created by a variety of sources (such as crowdsourcing via competition or cloud-based genetic programming). We showed that this type of fusion is a meta-model that leverages each predictive model's meta-data, defining its region of applicability and relative level of performance [41.82]. Additional research is needed to prevent over-fitting of the meta-models and to extend this concept to classification problems.


d. Model diversity by design. By leveraging the almost infinite computational capacity of the cloud, we should be able to construct model ensembles that are diverse by design. There are many techniques used to inject diversity in the model design, as described in Sect. 41.5.3. One of the most promising techniques consists of evolving a large population of symbolic regression models by distributing genetic programming algorithms using an island approach [41.95, 96]. Random feature subsets can be assigned to each island, which also differs from the other islands through the use of distinct grammars, fitness functions, and function sets. Additional information on this approach can be found in [41.79, 124, 125].
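As a rough illustration of how diversity could be injected at configuration time, the sketch below assigns each island a random feature subset, a function set, and a fitness measure. It only builds the per-island settings and does not run genetic programming; the option lists and dictionary keys are invented for the example and are not taken from [41.79, 124, 125].

```python
import random

# Hypothetical options; a real system would draw these from its own grammar.
FUNCTION_SETS = [("+", "-", "*"), ("+", "*", "sin"), ("+", "-", "*", "/", "exp")]
FITNESS_NAMES = ["mse", "mae", "correlation"]

def make_islands(feature_names, n_islands, subset_size, seed=0):
    """Return one configuration dictionary per island."""
    rng = random.Random(seed)
    islands = []
    for i in range(n_islands):
        islands.append({
            "island": i,
            # Random feature subset: each island sees a different view of the data.
            "features": sorted(rng.sample(feature_names, subset_size)),
            # Cycle through function sets and fitness measures across islands.
            "functions": FUNCTION_SETS[i % len(FUNCTION_SETS)],
            "fitness": FITNESS_NAMES[i % len(FITNESS_NAMES)],
        })
    return islands
```

Each dictionary would then be handed to a separate GP process, so that the models evolved on different islands differ in inputs, building blocks, and objective.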


(5) Special Topics in Machine Learning

a. Deep learning. Originally proposed by Fukushima [41.126], deep learning (DL) gained acceptance when Hinton showed that DL training was decomposable [41.127, 128]. He showed that each of the layers in the neural network could be pretrained one at a time, as an unsupervised Restricted Boltzmann Machine, and then fine-tuned using supervised backpropagation. This discovery allows us to use the large (mostly unlabeled) data sets available from Big Data applications to train DL networks.

b. Learning with graphs. There are many applications of recommender systems in social networks and targeted advertising. Typically these systems select information using collaborative filtering (CF), a technique based on collaboration among multiple agents, viewpoints, and data sources. Researchers have proposed various solutions to overcome some of the intrinsic challenges caused by data sparsity and network scalability. Among the most notable approaches we have:

(1) Pregel [41.129]: a synchronous message-passing abstraction in which all vertex-programs run simultaneously in a sequence of super steps.

(2) GraphLab [41.130]: an asynchronous, distributed, shared-memory abstraction, designed to leverage attributes typical of ML algorithms, such as sparse data with local dependences, iterative algorithms, and potentially asynchronous execution.

(3) PowerGraph [41.131] and its Spark implementation GraphX [41.132]: an abstraction combining the best features of Pregel and GraphLab, better suited for natural-graph applications with large neighborhoods and high-degree vertices.
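The synchronous super-step model of item (1) can be sketched in a few lines of single-machine Python. This is a simplified illustration of the message-passing pattern, not the Pregel API: every vertex program reads the messages delivered in the previous super step, updates its value, and sends new messages only if its value changed. The function name and the dictionary-based graph representation are assumptions for the example, which propagates the maximum vertex value through the graph.

```python
def pregel_max(graph, values):
    """Propagate the maximum value using synchronous super steps.

    graph:  dict mapping each vertex to a list of out-neighbours
    values: dict mapping each vertex to its initial number
    """
    values = dict(values)
    # Super step 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for u in neighbours:
            inbox[u].append(values[v])
    supersteps = 1
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            # Vertex program: adopt a larger incoming value and forward it.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for u in graph[v]:
                    outbox[u].append(values[v])
            # Otherwise the vertex stays silent (it votes to halt).
        inbox = outbox     # barrier: messages are delivered in the next step
        supersteps += 1
    return values, supersteps
```

The computation stops when no messages are in flight, which mirrors the halting condition of the synchronous model; an asynchronous abstraction in the style of item (2) would instead let vertices read their neighbours' latest values without waiting for a barrier.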


The above examples are just a sample of specialized ML algorithms and architectures for niche opportunities, exploiting the characteristics of their respective problems.

In conclusion, we need to shape CI research to address these new challenges. To remain a vital discipline during the next decade, we need to leverage the almost infinite capacity provided by cloud computing and crowdsourcing, understand the tradeoffs between solution quality and computational resource allocation, design fusion mechanisms for model ensembles derived from heterogeneous data, and create specialized architectures and algorithms to exploit problem characteristics.


References


41.1 L. Zadeh: Fuzzy logic and soft computing: Issues, contentions and perspectives, Proc. IIZUKA'94: 3rd Int. Conf. Fuzzy Logic Neural Nets Soft Comput. (1994) pp. 1-2
41.2 P. Bonissone: Soft computing: The convergence of emerging reasoning technologies, J. Res. Soft Comput. 1(1), 6-18 (1997)
41.3 J.L. Verdegay, R. Yager, P. Bonissone: On heuristics as a fundamental constituent of soft computing, Fuzzy Sets Syst. 159(7), 846-855 (2008)
41.4 J. Bezdek: On the relationship between neural networks, pattern recognition, and intelligence, Int. J. Approx. Reason. 6, 85-107 (1992)
41.5 J. Bezdek: What is computational intelligence? In: Computational Intelligence Imitating Life, ed. by J. Zurada, R.J. Marks II, C. Robinson (IEEE, New York 1994)
41.6 R.J. Marks II: Computational versus artificial intelligence, IEEE Trans. Neural Netw. 4(5), 737-739 (1993)
41.7 P. Bonissone, R. Subbu, N. Eklund, T. Kiehl: Evolutionary algorithms + domain knowledge = real-world evolutionary computation, IEEE Trans. Evol. Comput. 10(3), 256-280 (2006)
41.8 P. Bonissone, K. Goebel, W. Yan: Classifier fusion using triangular norms, Proc. Multiple Classif. Syst. (MCS) (Cagliari, Italy 2004) pp. 154-163
41.9 P. Bonissone, N. Iyer: Soft computing applications to prognostics and health management (PHM): Leveraging field data and domain knowledge, 9th Int. Work-Conf. Artif. Neural Netw. (IWANN) (2007) pp. 928-939
41.10 P. Bonissone, X. Hu, R. Subbu: A systematic PHM approach for anomaly resolution: A hybrid neural fuzzy system for model construction, Proc. PHM 2009, San Diego (2009)
41.11 P. Bonissone, A. Varma: Predicting the best units within a fleet: Prognostic capabilities enabled by peer learning, fuzzy similarity, and evolutionary design process, Proc. FUZZ-IEEE 2005, Reno (2005)
41.12 P. Bonissone, A. Varma, K. Aggour, F. Xue: Design of local fuzzy models using evolutionary algorithms, Comput. Stat. Data Anal. 51, 398-416 (2006)
41.13 P. Bonissone, R. Subbu, K. Aggour: Evolutionary optimization of fuzzy decision systems for automated insurance underwriting, Proc. FUZZ-IEEE 2002, Honolulu (2002) pp. 1003-1008
41.14 K. Aggour, P. Bonissone, W. Cheetham, R. Messmer: Automating the underwriting of insurance applications, AI Mag. 27(3), 36-50 (2006)
41.15 P. Bonissone, W. Cheetham, D. Golibersuch, P. Khedkar: Automated residential property valuation: An accurate and reliable approach based on soft computing. In: Soft Computing in Financial Engineering, ed. by R. Ribeiro, H. Zimmermann, R.R. Yager, J. Kacprzyk (Springer, Heidelberg 1998)
41.16 R. Subbu, P. Bonissone, N. Eklund, S. Bollapragada, K. Chalermkraivuth: Multiobjective financial portfolio design: A hybrid evolutionary approach, Proc. IEEE Congr. Evol. Comput. (CEC 2005), Edinburgh (2005) pp. 1722-1729
41.17 P. Bonissone, F. Xue, R. Subbu: Fast meta-models for local fusion of multiple predictive models, Appl. Soft Comput. J. 11(2), 1529-1539 (2011)
41.18 K. Goebel, R. Subbu, P. Bonissone: Controller Adaptation to Compensate Deterioration Effects, General Electric Global Research Technical Report, 2006GRC298 (2006)
41.19 R. Subbu, P. Bonissone, N. Eklund, W. Yan, N. Iyer, F. Xue, R. Shah: Management of complex dynamic systems based on model-predictive multi-objective optimization, Proc. IEEE CIMSA (La Coruña, Spain 2006) pp. 64-69
41.20 R. Subbu, P. Bonissone: A retrospective view of fuzzy control of evolutionary algorithm resources, Proc. FUZZ-IEEE 2003, St. Louis (2003) pp. 143-148
41.21 P. Bonissone: Soft computing: A continuously evolving concept, Int. J. Comput. Intell. Syst. 3(2), 237-248 (2010)
41.22 A. Patterson, P. Bonissone, M. Pavese: Six sigma quality applied throughout the lifecycle of an automated decision system, J. Qual. Reliab. Eng. Int. 21(3), 275-292 (2005)
41.23 T. Kohonen: Self-organized formation of topologically correct feature maps, Biol. Cybern. 43, 59-69 (1982)
41.24 R.S. Sutton, A.G. Barto: Reinforcement Learning (MIT, Cambridge 1998)
41.25 F. Woergoetter, B. Porr: Reinforcement learning, Scholarpedia 3(3), 1448 (2008)


41.26 L. Breiman, J. Friedman, R.A. Olshen, C.J. Stone: Classification and Regression Trees (Wadsworth, Belmont 1984)
41.27 J.R. Quinlan: C4.5: Programs for Machine Learning (Morgan Kaufmann, San Francisco 1993)
41.28 C.G. Atkeson: Memory-based approaches to approximating continuous functions. In: Nonlinear Modeling and Forecasting, ed. by M. Casdagli, S. Eubank (Addison-Wesley, Redwood City 1992) pp. 503-521
41.29 C.G. Atkeson, A. Moore, S. Schaal: Locally weighted learning, Artif. Intell. Rev. 11(1-5), 11-73 (1997)
41.30 B.D. Ripley: Pattern Recognition and Neural Networks (Cambridge Univ. Press, Cambridge 1996)
41.31 T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, Berlin, Heidelberg 2010)
41.32 L. Breiman: Random forests, Mach. Learn. 45(1), 5-32 (2001)
41.33 C.Z. Janikow: Fuzzy decision trees: Issues and methods, IEEE Trans. Syst. Man Cybern. Part B 28(1), 1-15 (1998)
41.34 P. Bonissone, J.M. Cadenas, M.C. Garrido, R.A. Diaz: A fuzzy random forest, Int. J. Approx. Reason. 51(7), 729-747 (2010)
41.35 P. Bonissone: Soft computing applications in PHM. In: Computational Intelligence in Decision and Control, ed. by D. Ruan, J. Montero, J. Lu, L. Martinez, P.D. D'hondt, E. Kerre (World Scientific, Singapore 2008) pp. 751-756
41.36 P. Bonissone, V. Badami, K.H. Chiang, P.S. Khedkar, K.W. Marcelle, M.J. Schutten: Industrial applications of fuzzy logic at General Electric, Proc. IEEE 83(3), 450-465 (1995)
41.37 D. Filev: Gain scheduling based control of a class of TSK systems. In: Fuzzy Control Synthesis and Analysis, ed. by S. Farinwata, D. Filev, R. Langari (Wiley, New York 2000) pp. 321-334
41.38 P. Bonissone, K. Chiang: Fuzzy logic hierarchical controller for a recuperative turboshaft engine: From mode selection to mode melding, Industrial Applications of Fuzzy Control and Intelligent Systems, ed. by J. Yen, R. Langari, L. Zadeh (1995) pp. 131-156
41.39 M.A. Kramer: Autoassociative neural networks, Comput. Chem. Eng. 16(4), 313-328 (1992)
41.40 J.W. Hines, I.E. Uhrig: Use of autoassociative neural networks for signal validation, J. Intell. Robot. Syst. 21(2), 143-154 (1998)
41.41 B. Lerner, H. Guterman, M. Aladjem, I. Dinstein: A comparative study of neural network based feature extraction paradigms, Pattern Recognit. Lett. 20, 7-14 (1999)
41.42 R.C. Eberhart, S. Yuhui: Computational Intelligence: Concepts to Implementations (Morgan Kaufmann, San Francisco 2007)
41.43 D. Poole, A. Mackworth, R. Goebel: Computational Intelligence (Oxford Univ. Press, Oxford 1998)
41.44 R. Tadeusiewicz: Modern Computational Intelligence Methods for the Interpretation of Medical Images, Vol. 84 (Springer, Berlin, Heidelberg 2008)
41.45 F. Steinman, K.-P. Adlassnig: Fuzzy medical diagnosis. In: Handbook of Fuzzy Computation, ed. by E.H. Ruspini, P.P. Bonissone, W. Pedrycz (IOP Publ., Bristol 1998) pp. 1-14
41.46 W. Yan, F. Xue: Jet engine gas path fault diagnosis using dynamic fusion of multiple classifiers, Neural Netw. 2008, IJCNN, Hong Kong (2008) pp. 1585-1591
41.47 D. Cayrac, D. Dubois, H. Prade: Possibilistic handling of uncertainty in fault diagnosis. In: Handbook of Fuzzy Computation, ed. by E.H. Ruspini, P.P. Bonissone, W. Pedrycz (Institute of Physics Publishing, Bristol 1998) pp. 1-7
41.48 D. Prokhorov (Ed.): Computational Intelligence in Automotive Applications (Springer, Berlin 2008)
41.49 H. Bersini, G. Bontempi, M. Birattari: Is readability compatible with accuracy? From neuro-fuzzy to lazy learning, Proc. Artif. Intell., Berlin, Vol. 7, ed. by C. Freksa (1998) pp. 10-25
41.50 A. Freitas: Data Mining and Knowledge Discovery with Evolutionary Algorithms (Springer, Berlin 2002)
41.51 K. Deb: Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, Chichester 2001)
41.52 C.A.C. Coello, G.B. Lamont, D.A. Van Veldhuizen: Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer, New York 2002)
41.53 P. Bonissone, Y.-T. Chen, K. Goebel, P. Khedkar: Hybrid soft computing systems: Industrial and commercial applications, Proc. IEEE 87(9), 1641-1667 (1999)
41.54 P. Bonissone, R. Subbu, J. Lizzi: Multi-criteria decision making (MCDM): A framework for research and applications, IEEE Comput. Intell. Mag. 4(3), 48-61 (2009)
41.55 J.S.R. Jang: ANFIS: Adaptive-network-based-fuzzy-inference-system, IEEE Trans. Syst. Man Cybern. 23(3), 665-685 (1993)
41.56 P. Bonissone, W. Cheetham: Fuzzy case-based reasoning for residential property valuation. In: Handbook of Fuzzy Computing, ed. by E.H. Ruspini, P.P. Bonissone, W. Pedrycz (Institute of Physics, Bristol 1998), Section G14.1
41.57 K.C. Chalermkraivuth: Analytical Approaches for Multifactor Risk Measurements, GE Tech. Inform. 2004GRC184 (GE Global Research, Niskayuna 2004)
41.58 J.C. Hull: Options, Futures and Other Derivatives (Prentice Hall, Upper Saddle River 2000)
41.59 J. Kubalik, J. Lazansky: Genetic algorithms and their tuning. In: Computing Anticipatory Systems, ed. by D.M. Dubois (American Institute Physics, Liege 1999) pp. 217-229
41.60 N. Eklund: Multiobjective Visible Spectrum Optimization: A Genetic Algorithm Approach (Rensselaer, Troy 2002)




41.61 G. Fumera, F. Roli: A theoretical and experimental analysis of linear combiners for multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell. 27(6), 942-956 (2005)
41.62 T.G. Dietterich: Ensemble methods in machine learning, 1st Int. Workshop Mult. Classif. Syst. (2000) pp. 1-15
41.63 K. Tumer, J. Ghosh: Error correlation and error reduction in ensemble classifiers, Connect. Sci. 8, 385-404 (1996)
41.64 R. Polikar: Ensemble learning, Scholarpedia 4(1), 2776 (2009)
41.65 F. Roli, G. Giacinto, G. Vernazza: Methods for designing multiple classifier systems, Lect. Notes Comput. Sci. 2096, 78-87 (2001)
41.66 L. Kuncheva, C. Whitaker: Ten measures of diversity in classifier ensembles: Limits for two classifiers, Proc. IEE Workshop Intell. Sens. Process., Birmingham (2001), p. 10/1-6
41.67 L.I. Kuncheva: Combining Pattern Classifiers: Methods and Algorithms (Wiley, New York 2004)
41.68 B. Efron: Bootstrap methods: Another look at the jackknife, Ann. Stat. 7(1), 1-26 (1979)
41.69 B. Efron: More efficient bootstrap computations, J. Amer. Stat. Assoc. 85(409), 79-89 (1988)
41.70 B. Efron, R. Tibshirani: An Introduction to the Bootstrap (CRC, Boca Raton 1993)
41.71 L. Breiman: Bagging predictors, Mach. Learn. 24(2), 123-140 (1996)
41.72 A. Kleiner, A. Talwalkar, P. Sarkar, M.I. Jordan: A scalable bootstrap for massive data, J. Roy. Stat. Soc. 76(4), 795-816 (2014)
41.73 M.J. Kearns, U.V. Vazirani: An Introduction to Computational Learning Theory (MIT, Cambridge 1994)
41.74 Y. Freund, R.E. Schapire: A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55(1), 119-139 (1997)
41.75 R.E. Schapire: Theoretical views of boosting, Proc. 4th Eur. Conf. Comput. Learn. Theory (1999) pp. 1-10
41.76 T.K. Ho: The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832-844 (1998)
41.77 T.G. Dietterich: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, Mach. Learn. 40(2), 139-157 (2000)
41.78 D.H. Wolpert: Stacked generalization, Neural Netw. 5, 241-251 (1992)
41.79 D. Sherry, K. Veeramachaneni, J. McDermott, U.M. O'Reilly: FlexGP: Genetic programming on the cloud, Lect. Notes Comput. Sci. 7248, 477-486 (2012)
41.80 A. Vance: Kaggle Contests: Crunching numbers for fame and glory, Businessweek, January 4, 2012
41.81 G. Stolovitsky, S. Friend: Dream Challenges: http://dreamchallanges.org


41.82 P. Bonissone: Lazy meta-learning: Creating customized model ensembles on demand, Lect. Notes Comput. Sci. 7311, 1-23 (2012)
41.83 F. Xue, R. Subbu, P. Bonissone: Locally weighted fusion of multiple predictive models, IEEE Int. Jt. Conf. Neural Netw. (IJCNN'06), Vancouver (2006) pp. 2137-2143
41.84 M.A. Feki, F. Kawsar, M. Boussard, L. Trappeniers: The Internet of things: The next technological revolution, Computer 46(2), 24-25 (2013)
41.85 Q. Chen, M. Hsu, H. Zeller: Experience in continuous analytics as a service (CaaaS), Proc. 14th Int. Conf. Extending Database Technol., EDBT/ICDT '11 (2011) pp. 509-514
41.86 P.C. Wong, H.-W. Shen, C.R. Johnson, C. Chen, R.B. Ross: The Top 10 challenges in extreme-scale visual analytics, IEEE Comput. Graph. Appl. 32(4), 64-67 (2012)
41.87 M. Parashar: Addressing the petascale data challenge using in-situ analytics, Proc. 2nd Int. Workshop on Petascale Data Anal., PDAC'11 (2011) pp. 35-36
41.88 D. Tiwari, S. Boboila, S. Vazhkudai, Y. Kim, X. Ma, P. Desnoyers, Y. Solihin: Active flash: Towards energy-efficient, in-situ data analytics on extreme-scale machines, file and storage technologies (FAST), 11th USENIX Conf. File Storage Technol., FAST'13, San Jose (2013)
41.89 U. C. Berkeley: AMP Lab, https://amplab.cs.berkeley.edu/about/
41.90 Berkeley Data Analytics Stack (BDAS), https://amplab.cs.berkeley.edu/software/
41.91 C. Engle, A. Lupher, R. Xin, M. Zaharia, M. Franklin, S. Shenker, I. Stoica: Shark: Fast data analysis using coarse-grained distributed memory, ACM SIGMOD Conf. (2012)
41.92 R. Xin, J. Rosen, M. Zaharia, M. Franklin, S. Shenker, I. Stoica: Shark: SQL and rich analytics at scale, ACM SIGMOD Conf. (2013)
41.93 M. Zaharia, M. Chowdhury, M. Franklin, S. Shenker, I. Stoica: Spark: Cluster Computing with Working Sets (HotCloud, Boston 2010)
41.94 B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. Joseph, R. Katz, S. Shenker, I. Stoica: Mesos: A platform for fine-grained resource sharing in the data center, Proc. Netw. Syst. Des. Implement., NSDI (2011)
41.95 D. Whitley, S. Rana, R.B. Heckendorn: Island model genetic algorithms and linearly separable problems, Lect. Notes Comput. Sci. 1305, 109-125 (1997)
41.96 D. Whitley, S. Rana, R.B. Heckendorn: The Island model genetic algorithm: On separability, population size and convergence, J. Comput. Inf. Technol. 7, 33-48 (1999)
41.97 L. Mitchell, T.M. Sloan, M. Mewissen, P. Ghazal, T. Forster, M. Piotrowski, A.S. Trew: A parallel random forest classifier for R, Proc. 2nd Int. Workshop Emerg. Comput. Methods Life Sci., ECMLS '11, San Jose (2011) pp. 1-6
41.98 H. Tong, J. He, M. Li, C. Zhang, W.-Y. Ma: Graph based multi-modality learning, Proc. 13th ACM Int. Conf. Multimed., MULTIMEDIA '05 (2005) pp. 862-871
41.99 L. Barrington, D. Turnbull, G.R.G. Lanckriet: Game-powered machine learning, Proc. Natl. Acad. Sci. (2012) pp. 6411-6416
41.100 B. McFee, G.R.G. Lanckriet: Learning content similarity for music recommendation, IEEE Trans. Audio Speech Lang. Process. 20, 2207-2218 (2012)
41.101 B. Settles: Active Learning Literature Survey, Comput. Sci. Tech. Rep. Vol. 1648 (University of Wisconsin, Madison 2009)
41.102 N. Rubens, D. Kaplan, M. Sugiyama: Active learning in recommender systems. In: Recommender Systems Handbook, ed. by F. Ricci, P.B. Kantor, L. Rokach, B. Shapira (Springer, Berlin, Heidelberg 2011) pp. 735-767
41.103 S. Adra, I. Griffin, P.J. Fleming: A comparative study of progressive preference articulation techniques for multiobjective optimisation, Lect. Notes Comput. Sci. 4403, 908-921 (2007)
41.104 T. Kraska, A. Talwalkar, J. Duchi, R. Griffith, M. Franklin, M. Jordan: MLbase: A distributed machine-learning system, Conf. Innov. Data Syst. Res., CIDR, Asilomar (2013)
41.105 Amazon Mechanical Turk: https://www.mturk.com/mturk/welcome
41.106 Amazon Elastic Cloud Computing: http://aws.amazon.com/ec2/
41.107 J. Surowiecki: The Wisdom of Crowds (Random House, New York 2004)
41.108 J. Howe: Crowdsourcing (Random House, New York 2008)
41.109 T. Malone, R. Laubacher, C. Dellarocas: The collective intelligence genome, MIT Sloan Manag. Rev. 51(3), 21-31 (2010)
41.110 L. Zhao, G. Sukthankar, R. Sukthankar: Robust active learning using crowdsourced annotations for activity recognition, AAAI Workshop Hum. Comput. (2011) pp. 74-79
41.111 Zooniverse: https://www.zooniverse.org, 2007
41.112 M.-C. Yuen, I. King, K.-S. Leung: A survey of crowdsourcing systems, 2011 IEEE Int. Conf. Priv. Secur. Risk Trust IEEE Int. Conf. Soc. Comput. (2011) pp. 766-773
41.113 A. Feng, M. Franklin, D. Kossmann, T. Kraska, S. Madden, S. Ramesh, A. Wang, R. Xin: CrowdDB: Query processing with the VLDB crowd, Proc. VLDB (2011)
41.114 B. Trushkowsky, T. Kraska, M. Franklin, P. Sarkar: Crowdsourced enumeration queries, Int. Conf. Data Eng., ICDE (2013)
41.115 G. Demartini, B. Trushkowsky, T. Kraska, M. Franklin: CrowdQ: Crowdsourced query understanding, Int. Conf. Data Eng. (ICDE) (2013)


41.116 Q. Liu, J. Peng, A. Ihler: Variational inference for crowdsourcing, Adv. Neural Inf. Process. Syst., NIPS 25, La Jolla, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (2012) pp. 701-709
41.117 F. Wauthier, M. Jordan: Bayesian bias mitigation for crowdsourcing, Adv. Neural Inf. Process. Syst., NIPS 25, La Jolla, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (2012) pp. 1800-1808
41.118 The National Consortium for Data Science: http://data2discovery.org
41.119 S. Agarwal, A. Panda, B. Mozafari, S. Madden, I. Stoica: BlinkDB: Queries with bounded errors and bounded response times on very large data, Proc. 8th ACM Eur. Conf. Comput. Syst. (2013) pp. 29-42
41.120 G.I. Webb: Anytime learning and classification for online applications, Proc. 2006 Conf. Adv. Intell. IT, Active Media Technol. 2006, ed. by Y. Li, M. Looi, N. Zhong (IOS, Amsterdam 2006)
41.121 B. Póczos, Y. Abbasi-Yadkori, C. Szepesvari, R. Greiner, N. Sturtevant: Learning when to stop thinking and do something!, Proc. 26th Int. Conf. Mach. Learn., ICML'09 (2009) pp. 825-832
41.122 D.D. Palmer, M.B. Reichman, N. White: Multimedia information extraction in a live multilingual news monitoring system. In: Multimedia Information Extraction: Advances in Video, Audio, and Imagery Analysis for Search, Data Mining, Surveillance, and Authoring, ed. by M.T. Maybury (Wiley, New York 2012) pp. 145-157
41.123 R.V. Subbu, L.M. Fujita, W. Yan, N.D. Ouellet, R.J. Mitchell, P.P. Bonissone, R.F. Hoskin: Method and System to Predict Power Plant Performance, US 20120083933 A1 (2012)
41.124 D. Wilson, K. Veeramachaneni, U.M. O'Reilly: Large Scale Island Model CMA-ES for High Dimensional Problems, EVOPAR In EvoApplications 2013, Vienna (Springer, Berlin, Heidelberg 2013)
41.125 O. Derby, K. Veeramachaneni, E. Hemberg, U.M. O'Reilly: Cloud Driven Island Model Genetic Programming, EVOPAR In EvoApplications 2013, Vienna (Springer, Berlin, Heidelberg 2013)
41.126 K. Fukushima: Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biol. Cybern. 36(4), 193-202 (1980)
41.127 G. Hinton: A fast learning algorithm for deep belief nets, Neural Comput. 18(7), 1527-1554 (2006)
41.128 G. Hinton: Learning multiple layers of representation, Trends Cogn. Sci. 11, 10 (2007)
41.129 G. Malewicz, M.H. Austern, A.J.C. Bik, J.C. Dehnert, I. Horn, N. Leiser, G. Czajkowski: Pregel: A system for large-scale graph processing, SIGMOD'10, Proc. 2010 ACM SIGMOD Int. Conf. Manag. Data (2010) pp. 135-146
41.130 Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J.M. Hellerstein: Distributed GraphLab: A framework for parallel machine learning and data mining in the cloud, Proc. VLDB Endow. (2012) pp. 716-727
41.131 J. Gonzalez, Y. Low, H. Gu, D. Bickson, C. Guestrin: PowerGraph: Distributed graph-parallel computation on natural graphs, OSDI'12 (2012), Vol. 12, No. 1, p. 2
41.132 R. Xin, J. Gonzalez, M. Franklin, I. Stoica: GraphX: A Resilient Distributed Graph System on Spark, Proc. 1st Int. Workshop Graph Data Manag. Exp. Syst., GRADES 2013 (2013)




Part E 


Part E Evolutionary Computation 


Ed. by Frank Neumann, Carsten Witt, Peter Merz, Carlos A. Coello Coello, 
Thomas Bartz-Beielstein, Oliver Schütze, Jörn Mehnen, Günther Raidl 


42 Genetic Algorithms 
Jonathan E. Rowe, Birmingham, UK 


43 Genetic Programming 
James McDermott, Dublin 4, Ireland 
Una-May O'Reilly, Cambridge, USA 


44 Evolution Strategies 
Nikolaus Hansen, Orsay Cedex, France 
Dirk V. Arnold, Halifax, Nova Scotia, 
Canada 
Anne Auger, Orsay Cedex, France 


45 Estimation of Distribution Algorithms 
Martin Pelikan, Sunnyvale, USA 
Mark W. Hauschild, St. Louis, USA 
Fernando G. Lobo, Faro, Portugal 


46 Parallel Evolutionary Algorithms 
Dirk Sudholt, Sheffield, UK 


47 Learning Classifier Systems 
Martin V. Butz, Tubingen, Germany 


48 Indicator-Based Selection 
Lothar Thiele, Zurich, Switzerland 


49 Multi-Objective Evolutionary 
Algorithms 
Kalyanmoy Deb, East Lansing, USA 


50 Parallel Multiobjective Evolutionary 
Algorithms 
Francisco Luna, Mérida, Spain 
Enrique Alba, Malaga, Spain 


51 Many-Objective Problems: 
Challenges and Methods 
Antonio López Jaimes, México, Mexico 
Carlos A. Coello Coello, México, Mexico 


52 Memetic and Hybrid Evolutionary 
Algorithms 
Jhon Edgar Amaya, San Cristóbal, 
Venezuela 
Carlos Cotta Porras, Malaga, Spain 
Antonio J. Fernandez Leiva, Malaga, Spain 


53 Design of Representations and Search
Operators
Franz Rothlauf, Mainz, Germany


54 Stochastic Local Search Algorithms: 
An Overview 
Holger H. Hoos, Vancouver, Canada 
Thomas Stützle, Brussels, Belgium


55 Parallel Evolutionary Combinatorial 
Optimization 
El-Ghazali Talbi, Villeneuve d'Ascq, 
France 


56 How to Create Generalizable Results 
Thomas Bartz-Beielstein, Gummersbach, 
Germany 


57 Computational Intelligence 
in Industrial Applications 
Ekaterina Vladislavieva, Turnhout, 
Belgium 
Guido Smits, NM Hoek, The Netherlands 
Mark Kotanchek, Midland, USA 


58 Solving Phase Equilibrium Problems
by Means of Avoidance-Based
Multiobjectivization
Mike Preuss, Münster, Germany
Simon Wessing, Dortmund, Germany
Günter Rudolph, Dortmund, Germany
Gabriele Sadowski, Dortmund, Germany


59 Modeling and Optimization
of Machining Problems
Dirk Biermann, Dortmund, Germany
Petra Kersting, Dortmund, Germany
Tobias Wagner, Dortmund, Germany
Andreas Zabel, Dortmund, Germany


60 Aerodynamic Design
with Physics-Based Surrogates
Emiliano Iuliano, Capua (CE), Italy
Domenico Quagliarella, Capua (CE), Italy


61 Knowledge Discovery in Bioinformatics
Julie Hamon, Boisseuil, France
Julie Jacques, Seclin, France
Laetitia Jourdan, Lille, France
Clarisse Dhaenens, Villeneuve d'Ascq
Cedex, France


62 Integration of Metaheuristics
and Constraint Programming
Luca Di Gaspero, Udine, Italy


63 Graph Coloring and Recombination
Rhyd Lewis, Cardiff, UK


64 Metaheuristic Algorithms and Tree
Decomposition
Thomas Hammerl, Vienna, Austria
Nysret Musliu, Vienna, Austria
Werner Schafhauser, Vienna, Austria


65 Evolutionary Computation
and Constraint Satisfaction
Jano I. van Hemert, Dunfermline, UK


Jonathan E. Rowe 


While much has been discovered about the properties of simple evolutionary algorithms (EAs), based on evolving a single individual, there is very little work touching on the properties of population-based systems. We highlight some of the work that does exist in this chapter.


42.1 Algorithmic Framework
42.2 Selection Methods
42.2.1 Random Selection
42.2.2 Proportional Selection
42.2.3 Stochastic Universal Sampling
42.2.4 Scaling Methods
42.2.5 Rank Selection
42.2.6 Tournament Selection
42.2.7 Truncation Selection


Genetic algorithms (GA) are a particular class of evolu- 
tionary computation method often used for optimiza- 
tion problems. They were originally introduced by 
Holland [42.1] at around the same time when other 
evolutionary methods were being developed, and pop- 
ularized by Goldberg’s much-cited book [42.2]. They 
are characterized by the maintenance of a population of 
search points, rather than a single point, and the evolu- 
tion of the system involves comparisons and interaction 
between the points in the population. They are usually 
used for combinatorial optimization problems, that is 
where the search space is a finite set (typically with 
some structure). The most common examples use fixed- 
length binary strings to represent possible solutions, 
though this is by no means always the case — the search 
space representation should be chosen to suit the par- 
ticular problem class of interest. Members of the search 
space are then evaluated via a fitness function, which 
determines how well they solve the particular problem 
instance. This is, of course, by analogy with natural 
selection in which the fittest survives and evolves to 


42. Genetic Algorithms


42.3 Replacement Methods
42.3.1 Random Replacement
42.3.2 Inverse Selection
42.3.3 Replace Worst
42.3.4 Replace Parents
42.4 Mutation Methods
42.5 Selection-Mutation Balance
42.6 Crossover Methods
42.7 Population Diversity
42.8 Parallel Genetic Algorithms
42.9 Populations as Solutions
42.10 Conclusions
References


produce better and better solutions. Notice that the ef- 
ficiency of a genetic algorithm is therefore measured in 
terms of the number of evaluations of the fitness func- 
tion required to solve the problem (rather than a more 
direct measure of the computational complexity). For 
any well-defined problem class, the maximum num- 
ber of function evaluations required by an algorithm to 
solve a problem instance is called the black box com- 
plexity of the algorithm for that problem class. The 
black box complexity of a problem class is defined to be 
the maximum number of function evaluations required 
by the best possible black box algorithm. This is a re- 
search topic in its own right [42.3]. 

As an example, consider the subset sum problem, 
in which a set of n integers is given, along with a tar- 
get integer T, and we have to find a subset whose sum 
is as close to T as possible. We can represent subsets 
as binary strings of length n, in which a 1 indicates
that an element is in the subset and a 0 that it is
excluded. The binary string forms the analog of the DNA
(deoxyribonucleic acid) of the corresponding individual
solution. Its fitness would then be given by the distance
between the corresponding sum and T, which we are
trying to minimize.

For a different example, consider the traveling 
salesman problem, in which there are a number of 
cities to be visited. We have to plan a route to visit 
each city once and return home, while minimizing the 
distance traveled. A potential solution here could be ex- 
pressed as a permutation of the list of cities (acting as 
the DNA), with the fitness given by the corresponding 
distance. 
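
As a rough illustration of the two encodings just described, the following Python sketch evaluates a binary string for the subset sum problem and a permutation for the traveling salesman problem. The function names are hypothetical, and taking the absolute difference from T as the subset sum fitness is an assumption made for this example.

```python
from typing import Sequence


def subset_sum_fitness(bits: Sequence[int], values: Sequence[int], target: int) -> int:
    """Distance between the chosen subset's sum and the target (lower is better)."""
    total = sum(v for b, v in zip(bits, values) if b == 1)
    return abs(total - target)


def tour_length(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
```

In both cases a smaller value is better, so a selection method written for maximization would need the sign reversed.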

A genetic algorithm maintains a population of such
solutions and their corresponding fitnesses. By focusing
on the better members of the population and introducing
small variations (or mutations), we hope that the
population will evolve good, or even optimal, solutions
in a reasonable time.

A popular general introduction to the field can be
found in Mitchell’s book [42.4]. Much of what is known
about the theory of genetic algorithms was developed
by Vose and colleagues [42.5], and centers around
a description of the changing population as a dynamical
system. A gentle overview of this theory can be found
in [42.6]. More recently, there has been a stronger
emphasis on understanding the algorithmic aspects, with
a particular focus on run-time analysis for optimization
problems [42.7-9].


42.1 Algorithmic Framework


The genetic algorithm works by updating the population
in discrete iterations, called generations. We begin
with an initial, randomly generated, population. This
acts as a set of parents from which a number of offspring
are produced, from which the next generation is
created. There are two basic schemes for doing this: the
generational method and the steady-state method.

The generational approach is to repeatedly produce
offspring from the parent population, until there are
enough to fill up a whole new population. One generation,
in this case, corresponds to the creation of all of
these offspring. The steady-state approach, by contrast,
produces a single offspring from the current parents,
and then inserts it into the population, replacing some
individual. Here, a generation consists of creating one
new individual solution.

In either case, the population size stays fixed at
its initial size, which is a parameter of the algorithm.
There exists some theoretical work investigating a good
choice of population size in different situations, but
there are few general principles [42.10]. The correct
size will depend on the problem to be optimized, and
the particular details of the rest of the algorithm. It
should be noted, however, that in a number of cases it
can be shown that smaller population sizes are preferable
and, indeed, in some cases a population of size one
is sufficient.

The overall structure of the generational genetic
algorithm is as follows:


1. Initialize population of size μ randomly with points
from the search space.

2. Repeat until stopping criterion is satisfied:
a) Repeat μ times:
   i. Choose a point from the population.
   ii. Modify the point with mutation and
       crossover.
   iii. Place resulting offspring in the new
        population.

3. Stop.


The critical points are therefore the selection 
method, used to choose points from the current popu- 
lation, and the mutation and crossover methods used to 
modify the chosen points. The idea is that selection will 
favor better solutions (in the sense that they provide bet- 
ter solutions to the optimization problem at hand), that 
mutation will introduce slight variations in the current 
chosen solutions, and that crossover will combine to- 
gether parts of different good solutions to, hopefully, 
form a better combination. We will look at different 
schemes for selection, mutation, and crossover in detail 
later, in Sects. 42.2, 42.4, and 42.6. 
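
The generational scheme described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: genomes are fixed-length bit lists, selection is a binary tournament, variation is bitwise mutation only (crossover is omitted for brevity), and a fixed number of generations stands in for the stopping criterion. All function names are hypothetical.

```python
import random


def binary_tournament(population, fitness, rng):
    """Selection: return the fitter of two members drawn uniformly at random."""
    a, b = rng.choice(population), rng.choice(population)
    return a if fitness(a) >= fitness(b) else b


def bit_flip(parent, rng):
    """Mutation: flip each bit independently with probability 1/n."""
    rate = 1.0 / len(parent)
    return [1 - g if rng.random() < rate else g for g in parent]


def generational_ga(fitness, mu, n_bits, generations, rng):
    # Step 1: random initial population of size mu.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(mu)]
    # Step 2: a fixed generation count stands in for the stopping criterion.
    for _ in range(generations):
        new_population = []
        for _ in range(mu):                                        # step 2a
            parent = binary_tournament(population, fitness, rng)  # i. choose
            child = bit_flip(parent, rng)                          # ii. modify
            new_population.append(child)                           # iii. place
        population = new_population   # offspring become the next generation
    return population
```

A call such as `generational_ga(sum, mu=20, n_bits=16, generations=50, rng=random.Random(0))` runs the loop on OneMax; no elitism is applied, so the best individual of one generation may be lost in the next.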

The overall structure of the steady-state genetic al- 
gorithm is as follows: 


1. Initialize population of size μ randomly with points
from the search space.

2. Repeat until stopping criterion is satisfied: 
a) Choose a point from the population. 
b) Modify the point with mutation and crossover. 
c) Choose an existing member of the population. 
d) Replace that member with the new offspring. 

3. Stop. 
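
As a sketch of the steady-state loop just listed, the following Python function uses uniform random selection, bitwise mutation, and replacement of the worst member. These three choices, the function name, and the use of a fixed step count as the stopping criterion are assumptions made for illustration; crossover is omitted.

```python
import random


def steady_state_ga(population, fitness, steps, rng):
    """Steady-state loop: random selection, bit-flip mutation, replace worst."""
    population = [list(g) for g in population]       # work on a copy
    for _ in range(steps):                           # stand-in stopping criterion
        parent = rng.choice(population)              # a) choose a point
        rate = 1.0 / len(parent)
        child = [1 - g if rng.random() < rate else g
                 for g in parent]                    # b) modify by mutation
        victim = min(range(len(population)),
                     key=lambda i: fitness(population[i]))  # c) choose a member
        population[victim] = child                   # d) replace it
    return population
```

With a population of at least two members, replacing the worst member never removes the last copy of the best fitness value, so the best fitness in the population cannot decrease.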


It can be seen that, in addition to selection, mutation 
and crossover, the steady-state genetic algorithm also 
requires us to specify a means for choosing an individual
to be replaced. Suitable replacement strategies will
be discussed later in Sect. 42.3, but for now we make
a few general observations. 

It is clear that in the generational genetic algo- 
rithm, progress is driven by the selection method. In 
this step of the algorithm, we choose to keep those so- 
lutions which we prefer, by dint of the degree to which 
they optimize the problem we are trying to solve. For 
the steady-state genetic algorithm, progress can also 
be maintained by the replacement strategy, if this is 
designed so as to effect the replacement of poorly
performing individuals. In fact, it is possible to put the
whole burden of evolution on the replacement step, for 
example, by always replacing the worst member of the 
population, and allowing the selection step to choose 
any individual uniformly at random. Conversely, one 
may use a stronger selection method and replace indi- 
viduals randomly (or, of course, some combination of 
the two approaches). The steady-state algorithm there- 
fore allows the user finer tuning of the strength of 
selective pressure. 

In addition, the steady-state approach allows the 
user to guarantee that good individuals are never lost, 
by choosing a replacement strategy that protects such 
individuals. For example, replacing the worst individ- 
ual each time ensures that copies of the best individual 
always remain. Any EA that has the property of pre- 
serving the best individual is called elitist. This would 
seem to be a desirable property as otherwise progress 
toward the optimum can be lost due to mutation and 
selection (Sect. 42.5). The generational framework, as 
it stands, offers no such guarantee. Indeed, depending 
on the method chosen, the best individual may not even
be selected, let alone preserved! It is quite a common 
strategy, therefore, to add elitism to the generational 
framework, for example, by making sure at least one 
copy of the best individual is copied across to the next 
generation each time. 

The generational approach, without elitism explicitly
added, is referred to as a comma strategy. In particular,
it is a (μ, μ) EA. This means that the population
has size μ, and a further μ offspring are created, which
become the next generation population. More generally,
we could have (μ, λ) algorithms, where λ > μ, in
which λ offspring are created and the best μ are taken
to be the next generation. If λ is sufficiently large with
respect to μ, then the probability of not selecting the
best individual gets rather small. If, in addition, there is
a reasonable chance that mutation and crossover do not
make any changes, we get an approximation of elitism.

The steady-state algorithm, in which one replaces
the worst individual, is referred to as a plus strategy.
In particular, it is a (μ + 1) EA. This means that one
offspring is created from a population of size μ, and
then the best μ are kept from the pool of parents plus
offspring. More generally, we could have (μ + λ)
algorithms, in which λ offspring are created and the best
μ from the collection of parents and offspring are kept
to be the next generation. Plus strategies are, of course,
elitist.
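
The difference between comma and plus strategies lies only in the pool from which the next generation is taken. A minimal Python sketch, assuming fitness is to be maximized and using hypothetical function names:

```python
def comma_survivors(offspring, fitness, mu):
    """Comma strategy: keep the best mu offspring; parents are discarded."""
    return sorted(offspring, key=fitness, reverse=True)[:mu]


def plus_survivors(parents, offspring, fitness, mu):
    """Plus strategy: keep the best mu from parents and offspring together."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]
```

Because the plus variant always ranks the parents alongside the offspring, the best individual found so far survives, which is why plus strategies are elitist.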

Another advantage of the steady-state genetic algo- 
rithm is that progress can be immediately exploited. As 
soon as an improving individual is found, it is inserted 
into the population, and may be selected for further evo- 
lution. The generational algorithm, in contrast, has to 
produce a full set of offspring before any good discov- 
eries can be built on. This overhead can be minimized 
by taking the population to be as small as possible, al- 
though one then risks losing good individuals in the 
selection process, unless some form of elitism is explic- 
itly implemented. 

A further extension often made to either scheme is 
to implement a mechanism for maintaining a level of 
diversity in the population. Clearly, a potential advan- 
tage of having a population is that it can cover a broad 
area of the search space and allow for more effec- 
tive searching than if a single individual were used. 
Any such advantage would be lost, however, if the 
members of the population end up identical or very 
similar. In particular, the effectiveness of crossover is 
reduced, or even eliminated, if the population mem- 
bers are too similar. Some of the methods employed 
for maintaining diversity will be discussed in detail in 
Sect. 42.7. 

Before we move on to details, it is worth asking 
whether or not maintaining a population is an effec- 
tive approach to optimization. Indeed, such a question 
should be asked seriously whenever one attempts to use 
EAs for such problems. For certain classes of prob- 
lem, it may well be the case that a simple local search 
strategy (that is, a (1 + 1) evolutionary algorithm (EA)) 
may be as effective or better. For example, for some 
problems (such as OneMax, which simply totals the 
bit values in a string), the (1 + 1)-EA is provably op- 
timal amongst evolutionary algorithms that only use 
standard bitwise mutation [42.11]. There is only a small
amount of theoretical work on this issue
to guide us [42.10]. In the first place, if the search
space comprises islands of good solutions with small 
gaps containing poor solutions, then a population might 
provide an effective way to jump across the gaps. This 
is because the poorer offspring are not so readily
rejected (especially with a generational approach), and
may persist long enough for a lucky mutation to move
into a neighboring good region. One might therefore 
also expect a population-based approach to be effective 
generally on highly multimodal problems but, unfor- 
tunately, there is very little analysis on this situation. 
If crossover is thought to be effective for your prob- 
lem (Sect. 42.6), then a population is required, as we 
need to choose pairs of parents — although again, it 
might be the case that very small populations are suf- 
ficient. It certainly seems to be the case that for any 
such advantage, it is necessary to maintain a reason- 
able level of diversity, or the point of the population is 
lost [42.12]. 
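
For comparison with population-based schemes, the (1 + 1) EA on OneMax mentioned above can be sketched in a few lines of Python. The function name, the mutation rate of 1/n, and the evaluation budget are assumptions made for this illustration.

```python
import random


def one_plus_one_ea(n_bits, max_evals, rng):
    """(1 + 1) EA on OneMax: keep the offspring if it is no worse."""
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    fx = sum(x)                       # OneMax totals the bit values
    evals = 1
    while fx < n_bits and evals < max_evals:
        # Standard bitwise mutation: flip each bit with probability 1/n.
        y = [1 - b if rng.random() < 1.0 / n_bits else b for b in x]
        fy = sum(y)
        evals += 1
        if fy >= fx:                  # plus selection with a single parent
            x, fx = y, fy
    return x, evals
```

The returned evaluation count is the quantity by which such algorithms are usually compared, in line with the black box view of efficiency.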


One situation in particular, where a genetic algorithm
may be helpful, is if the potential solutions to
a problem are represented by the population as a whole.
That is, one tries to find an optimal set of things, and
we can use the population to represent that set. This is
the case, for example, in multiobjective optimization,
where one tries to determine the Pareto set of nondominated
solutions [42.13]. For single objective problems, it may
be possible to represent solutions as a set of objects,
each of which can be evaluated according to its
contribution to the overall solution. There has been some
recent progress on problems of this type [42.14] and
we will consider this case in Sect. 42.9.


42.2 Selection Methods


The selection method is the primary means a genetic
algorithm has of directing the search process toward
better solutions. It is usually defined in terms of a fitness
function which assigns a positive score to each point
in the search space, with the optimal solution having
the maximum (or minimum) fitness. Often the fitness
function is, in fact, just the objective function of the
problem to be optimized; however, there are times when
it is modified. This typically happens when the
problem involves some constraints, and so account has
to be taken as to what extent the constraints are
satisfied. There are a number of different ways to approach
this situation:


1. Simply discard any illegal solution and try again.
That is, if mutating (say) a solution produces an
illegal solution, discard it and try mutating the original
again, until a legal solution is obtained.

2. Repair the solution. This involves creating a special
purpose heuristic which, given any illegal solution,
modifies it until it becomes legal.

3. More generally, one can construct modified
operators which are guaranteed to produce legal
solutions. The above two methods are specific ways one
might achieve this.

4. Adapt the fitness function by adding penalty terms.


It is this fourth approach which concerns us here, as
it allows illegal solutions to be tolerated, but puts the
onus on the selection method to drive the population
away from illegal and toward legal solutions. The idea
is to create a fitness function which is a weighted sum
of the original objective function, and a measure of the
extent to which constraints have been broken. That is,
if h : X → ℝ is the objective function, we might have
a fitness function

$$ F(x) = w_0 h(x) - \sum_{j=1}^{k} w_j c_j(x), $$

where c_j is a measure as to how far constraint j has been
broken. A difficulty with this approach is in specifying
the weights, since one would like to allow illegal
solutions to be tolerated (at least in the early stages
of the search), but does not want good, but illegal,
solutions to wipe out any legal ones. It is common, then,
to try to fix the weights at least so that legal solutions
are preferred to illegal ones (however good). This, then,
suggests a further approach which works particularly
well with tournament selection (see below), in which
it is only necessary to say which of two solutions is to
be preferred. We stipulate that legal solutions are to be
preferred to illegal ones, that two legal solutions should
be compared with the objective function, and that two
illegal solutions should be compared by the extent of
constraint violation.
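
Both ways of handling constraints discussed above can be sketched in Python. The sketch assumes maximization, that each constraint measure returns 0 when the constraint is satisfied and a positive number otherwise, and that the overall extent of violation is the sum of the individual measures; these conventions and the function names are assumptions of the example, not part of the original text.

```python
def penalized_fitness(x, objective, violations, w0, weights):
    """Weighted objective minus weighted constraint violations."""
    return w0 * objective(x) - sum(w * c(x) for w, c in zip(weights, violations))


def prefer(x, y, objective, violations):
    """Feasibility-first comparison for use in a tournament."""
    vx = sum(c(x) for c in violations)
    vy = sum(c(y) for c in violations)
    if vx == 0 and vy == 0:            # both legal: compare by objective
        return x if objective(x) >= objective(y) else y
    if vx == 0 or vy == 0:             # exactly one legal: it is preferred
        return x if vx == 0 else y
    return x if vx <= vy else y        # both illegal: less violation wins
```

The comparison function needs no weights at all, which is what makes it convenient in combination with tournament selection.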

The degree to which poor, or illegal, solutions are 
tolerated by a genetic algorithm is determined by the 
strength of the selection method chosen. A weak se- 
lection method will allow poor (that is, low fitness) 
solutions to be selected with high probability com- 
pared to a strong scheme, which will typically select 
better solutions. It is usual to insist that a selection 
scheme should have the property that a better solution 
should have a higher probability of being selected than 
a weaker one. A number of selection methods have been
proposed, ranging from very weak to very strong, and
we will consider them in this order. 


42.2.1 Random Selection 


The weakest selection method is simply to pick a mem- 
ber of the population uniformly at random. Of course, 
this has no selection strength at all and will not, by 
itself, guide the search process. It must therefore be 
combined with some other mechanism to achieve this. 
Typically, this would be used in a steady-state genetic 
algorithm, in which the replacement strategy imposes 
the selection pressure (Sect. 42.3). Another possibility 
is in a parallel genetic algorithm where offspring may 
replace parents if they are better, but the selection of 
partners for crossover is random (Sect. 42.8). 


42.2.2 Proportional Selection 


The fitness proportional selection method comes from 
taking the analogy of the role of fitness in natural evolu- 
tion seriously. In biological evolution, fitness is literally 
a measure of how many offspring an individual expects 
to have. Within the fixed-sized population of a ge- 
netic algorithm, then, we model this by saying that the 
probability of an individual being selected should be 
proportional to its fitness within the population. That 
is, the probability of selecting item x is given by

$$ \frac{f(x)}{\sum_{y} f(y)}, $$

where the sum ranges over all members of the population.
This selection method is often implemented by the
roulette wheel algorithm as follows:


1. Let T be the total fitness of the population.
2. Let R be a random number in the range 0 < R ≤ T.
3. Let c = 0. Let i = 0.
4. While c < R do
   a) Let c = c + f(i).
   b) Let i = i + 1.
5. Return i − 1.


Here f(i) is the fitness of the item with index i in
the population, with items indexed from 0.
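
A Python sketch of roulette wheel selection is given below. It assumes nonnegative fitness values with a positive total and returns the index of the selected item; the function name and the final guard against floating-point rounding are additions made for this illustration.

```python
import random


def roulette_wheel(fitnesses, rng):
    """Return an index chosen with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = rng.random() * total          # uniform in [0, total)
    c = 0.0
    for i, f in enumerate(fitnesses):
        c += f
        if r < c:                     # the running total has passed r
            return i
    return len(fitnesses) - 1         # guard against rounding error
```

Each call scans the population once, so a single selection takes time linear in the population size.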


42.2.3 Stochastic Universal Sampling


In a generational genetic algorithm, one needs to select
μ individuals from the population in order to complete
one generation. Using proportional selection to
do this therefore requires O(μ²) time, which can be
a significant burden on the running time. An alternative
selection algorithm, which still ensures that the
expected number of times an individual is selected is
proportional to its fitness, is the stochastic universal
sampling algorithm [42.15]. If T is again the total
fitness of the population, then let

$$ E[i] = \frac{\mu f(i)}{T}, $$

which is the expected number of copies of item i.
The selection algorithm guarantees that either ⌊E[i]⌋ or
⌈E[i]⌉ copies of i are selected, for each item i in the
population:


1. Let r be a random number in the range 0 ≤ r < 1.
2. Let c = 0.
3. For i = 0 to μ − 1 do
   a) Let c = c + E[i].
   b) While r < c do: r = r + 1; Select(i).


By the time the algorithm terminates, μ items will
have been selected, in O(μ) time (for a good introduction
to asymptotic notation, see [42.16]).
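
Stochastic universal sampling can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes nonnegative fitness values with a positive total.

```python
import random


def stochastic_universal_sampling(fitnesses, rng):
    """Select mu indices in one pass using equally spaced pointers."""
    mu = len(fitnesses)
    total = sum(fitnesses)
    r = rng.random()                  # single random offset in [0, 1)
    c = 0.0
    selected = []
    for i, f in enumerate(fitnesses):
        c += mu * f / total           # add the expected copies of item i
        while r < c:                  # every pointer below c selects item i
            selected.append(i)
            r += 1.0
    return selected
```

Only one random number is drawn for the whole generation, which is why each item receives either the floor or the ceiling of its expected number of copies. With floating-point arithmetic the running total may fall marginally short of μ, so a production version might want a guard for the last pointer.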


42.2.4 Scaling Methods 


In the early stages of a run of a genetic algorithm, there 
is usually considerable diversity in the population, and 
the fitness of the best individuals may be considerably 
greater than the others. When using fitness proportional 
selection, this can lead to strong selection of the better 
individuals. Later on when the algorithm is nearing the 
optimum, the population is less diverse, and fitnesses 
may be more or less constant. In this situation, propor- 
tional selection is very weak, and does not discriminate. 

One idea to combat this problem is to scale the 
fitness function somehow, so as to adjust the selec- 
tion strength during the run. Two proposals along these 
lines are sigma scaling and Boltzmann selection. Sigma 
scaling (invented by Stephanie Forrest and described 
in [42.2] and [42.4]) explicitly takes the diversity of the 
population into account via the standard deviation σ of
the fitness in the population. Given the fitness function
f : X → ℝ, the new scaled fitness is

$$ h(x) = 1 + \frac{f(x) - \bar{f}}{2\sigma}, $$

where $\bar{f}$ is the average fitness of the population.
A negative value of h(x) might be clamped at zero or some
small value. The idea behind sigma scaling is that now
when there is a lot of diversity, σ is large, and so the
best individuals will not dominate the selection process
so much. However, when diversity is low, so is σ and the
scaled fitness function can still discriminate effectively.

The second proposal, Boltzmann selection, makes 
use of the idea that the diversity (and therefore strength 
of selection) is lost over time [42.17]. We therefore seek 
to scale the fitness using the time (or generation num- 
ber) as a parameter. This is usually done in the same 
way as simulated annealing, by controlling a tempera- 
ture parameter T, which is initially large, but decreases 
over time. We have a scaled fitness of 


$$ h(x) = \exp\left(\frac{f(x)}{T}\right). $$

Of course, a difficulty with this approach is to select
the appropriate cooling schedule by which T should
decrease over time.
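
The two scaling methods can be sketched in Python as below. The function names are hypothetical; using the population (rather than sample) standard deviation and returning a constant score of 1 when all fitnesses are equal are assumptions of this example.

```python
import math
import statistics


def sigma_scaled(fitnesses, floor=0.0):
    """Sigma scaling: 1 + (f - mean) / (2 * sigma), clamped below at floor."""
    mean = statistics.fmean(fitnesses)
    sigma = statistics.pstdev(fitnesses)
    if sigma == 0.0:                  # no diversity: treat all items equally
        return [1.0] * len(fitnesses)
    return [max(floor, 1.0 + (f - mean) / (2.0 * sigma)) for f in fitnesses]


def boltzmann_scaled(fitnesses, temperature):
    """Boltzmann scaling: exp(f / T) for a positive temperature T."""
    return [math.exp(f / temperature) for f in fitnesses]
```

The scaled values would then be passed to a proportional method such as the roulette wheel. For large fitness values and small temperatures the exponential can overflow, so in practice one might subtract the maximum fitness before exponentiating.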


42.2.5 Rank Selection 


We have seen that one of the major drawbacks of the 
proportional selection method, and stochastic universal 
sampling, is that the probability of choosing individu- 
als is very sensitive to the relative scale of the fitness 
function. For example, an item with fitness 2 is twice as 
likely to be chosen as an item of fitness 1. But an item 
of fitness 101 is almost as likely to be chosen as one 
with fitness 100. It is therefore suggested that selection 
should depend only on the relative strength of the indi- 
vidual within the population. One way to achieve this 
is to choose an individual depending on its rank. That 
is, we sort the population using the fitness function, but 
then ascribe a rank to each member, with the best
individual getting score μ down to 1, for the worst [42.18].
The simplest thing to do then is to choose individuals
proportional to rank, using the roulette wheel or
stochastic universal sampling algorithms. However, this
is then sensitive to the population size. So a common
alternative is to linearly scale the rank to achieve a score
between two numbers a and b. We thus get a function

$$ h(i) = \frac{(b - a)\, r(i) + \mu a - b}{\mu - 1}, $$

where r(i) is the rank of item i in the population. We
then seek to select items in proportion to their h-value.
This can be done with the roulette wheel or stochastic
universal sampling method. Notice that since the sum
of ranks is known, so is the sum of h-values, and so the
probability of selecting item i is

$$ \frac{2\left((b - a)\, r(i) + \mu a - b\right)}{\mu(\mu - 1)(b + a)}. $$
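
The linear scaling of ranks can be checked with a short Python sketch. The function names are hypothetical, ties in fitness are broken by position in the list (an assumption), and the population must contain at least two items.

```python
def linear_rank_scores(fitnesses, a, b):
    """Score a for the worst item up to b for the best, linear in rank."""
    mu = len(fitnesses)
    order = sorted(range(mu), key=lambda i: fitnesses[i])   # worst first
    scores = [0.0] * mu
    for rank, i in enumerate(order, start=1):               # ranks 1..mu
        scores[i] = ((b - a) * rank + mu * a - b) / (mu - 1)
    return scores


def rank_probabilities(fitnesses, a, b):
    """Selection probabilities; the scores always sum to mu * (a + b) / 2."""
    total = len(fitnesses) * (a + b) / 2
    return [s / total for s in linear_rank_scores(fitnesses, a, b)]
```

For example, fitnesses 10, 100 and 101 with a = 1 and b = 2 receive scores 1.0, 1.5 and 2.0, regardless of how close or far apart the raw fitness values are.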


42.2.6 Tournament Selection 


A much simpler way to achieve a similar end as rank 
selection is to use tournament selection. We fix a
parameter k < μ. To select an item from the population
we simply do the following: 


1. Choose k items from the population, uniformly at 
random. 
2. Return the best item from those chosen; 


where best refers, of course, to assessment by the fitness 
function. Perhaps the most common version is binary 
tournament selection, where k = 2. In this case, it is not 
strictly necessary to have a fitness function assign a nu- 
merical value to points in the search space. All that is 
required is a means to compare two points and return 
the preferred one. 

It is straightforward to show that, for binary tournament
selection with the two items chosen with replacement,
the probability of choosing item i from the population is

$$ \frac{2 r(i) - 1}{\mu^2}, $$

where r(i) is the rank from 1 (the worst) to μ (the best).
If one chooses the two items to be compared without
replacement, then the probability of choosing item i
becomes

$$ \frac{2 r(i) - 2}{\mu(\mu - 1)}, $$

which is equivalent to rank selection, linearly scaled
with a = 0 and b = 1.

At the other extreme, if one were to pick the tournament
size to be very large (close to μ) then it becomes
more and more likely that only the best individuals will
be selected. Increasing the tournament size in this way
is a good method for controlling the selection strength.
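
Tournament selection and the two binary tournament probabilities can be sketched in Python as follows. The function names and the `with_replacement` flag are hypothetical, and fitness is assumed to be maximized with no ties in rank.

```python
import random


def tournament_select(population, fitness, k, rng, with_replacement=True):
    """Draw k entrants uniformly at random and return the best of them."""
    if with_replacement:
        entrants = [rng.choice(population) for _ in range(k)]
    else:
        entrants = rng.sample(population, k)
    return max(entrants, key=fitness)


def binary_tournament_probability(rank, mu, with_replacement=True):
    """Probability that the item of the given rank (1 = worst) wins when k = 2."""
    if with_replacement:
        return (2 * rank - 1) / mu ** 2
    return (2 * rank - 2) / (mu * (mu - 1))
```

Summing either probability over the ranks 1 to μ gives 1, which is a quick check on the formulas.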


42.2.7 Truncation Selection 


The strongest selection method of all would be to
only select the best individual. Slightly more forgiving
is truncation selection, where only individuals within
a given fraction of top performers are selected. In a
generational genetic algorithm, these must be repeatedly
selected at random. In a steady-state algorithm one
simply picks one of them at random. Truncation selection,
therefore, introduces a new parameter, which is the
fraction of the population available for selection. This may
vary from, say, a half (rather weak) to a tenth (rather
strong). Usually, it must be set experimentally, as
there is little theoretical analysis of this form of
selection for combinatorial problems.


42.3 Replacement Methods


The steady-state genetic algorithm requires a method
by which a new offspring solution can be placed into
the population, replacing one of the existing members.
As with selection, there are different approaches, which
have different strengths in terms of the extent to which
they drive the population to retain better solutions. Indeed,
most of the methods for replacement are based on those
already described for selection (Sect. 42.2).


42.3.1 Random Replacement


A simple method commonly found in steady-state
genetic algorithms is for the new solution to replace an
existing member chosen uniformly at random. If this
is done, then the replacement phase does not push
the search process in any particular direction and the
onus for evolving toward better solutions is on the
selection method chosen. Because of this, it is commonly
supposed that a steady-state genetic algorithm
with random replacement is more or less equivalent to
a generational genetic algorithm, using the same
selection method. This is not quite true; however, it can be
shown theoretically that the long-term behavior of both
algorithms will be the same [42.19]. What will not
necessarily be the same is the short term, transient behavior
and, in particular, the speed with which the algorithms
will arrive at their long-term equilibrium may well be
different.


42.3.2 Inverse Selection


Several replacement methods are based directly on
selection methods, but changed so that the poorer
performing solutions are more likely to be replaced than
the better ones. For example, one can construct an
inverse fitness proportional replacement method, where
the probability of being replaced is determined by the
fitness. In order to ensure that lower fitness means
a higher chance of replacement, the reciprocal of the
fitness might be used to determine the probability of
replacement. Alternatively, the fitness can be subtracted
from that of the global optimum (if known). This would
have the advantage that the optimum, if found, would
never be replaced. In this case, the probability that item
i in the population is selected for replacement will be

$$ \frac{f^* - f(i)}{\mu f^* - \sum_{j} f(j)}, $$

where f* is the optimum fitness value.

Replacement determined by fitness has the same 
drawbacks as selection done in this way. For example, 
toward the end of the search all the population members 
will have similar fitness values. Fitness proportional re- 
placement will then be almost as likely to replace the 
best individual as the worst. Consequently, an alterna- 
tive method using scaling or rank might be preferred. 
The simplest of these ideas is to use a tournament, but 
this time pick the worst of the sample: 


1. Choose k items from the population, uniformly at 
random without replacement. 
2. Return the worst item from those chosen. 


This has the advantage that the best item in the pop- 
ulation cannot be replaced. 
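
The two inverse replacement methods of this section can be sketched in Python. The function names are hypothetical; the first function assumes that the optimum fitness value is known and that not every member of the population has already reached it (otherwise the denominator is zero).

```python
import random


def inverse_fitness_probabilities(fitnesses, f_star):
    """Replacement probabilities proportional to the gap to the optimum f_star."""
    mu = len(fitnesses)
    denom = mu * f_star - sum(fitnesses)
    return [(f_star - f) / denom for f in fitnesses]


def inverse_tournament_index(fitnesses, k, rng):
    """Index of the worst of k distinct items drawn uniformly at random."""
    entrants = rng.sample(range(len(fitnesses)), k)
    return min(entrants, key=lambda i: fitnesses[i])
```

With k of at least 2 and a unique best item, the inverse tournament can never return the index of that item, since it always competes against something worse.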


42.3.3 Replace Worst 


Perhaps the most common choice of replacement strat- 
egy is to simply replace the worst member of the 
population with the new offspring. This is a relatively 
strong approach, as it preserves all the better members 
of the population. Indeed, this strategy can well be com- 
bined with the random selection method, and using only 
replacement as the means of driving the evolution to- 
ward better individuals. 

An even stronger variant would be to replace the 
worst member of the population only if the new off- 
spring is better or equal in value. This has the property 
that the minimum fitness of the population can never 
decrease, and so we are guaranteed progress through- 
out the evolution. In some cases, this may lead to much 
faster evolution. However, it may also happen that for 
a long time, no new individuals are added to the
population. This will be increasingly the case toward the
end of a run, when it is increasingly hard to create bet- 
ter individuals. Replacing the worst member, regardless 
of how good or bad the new offspring, at least keeps 
adding new search points and creates new possibilities 
for further exploration, while retaining copies of the 
better solutions in the population. 
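
Both variants of the replace-worst strategy fit in one small Python function. The function name and the `only_if_not_worse` flag are hypothetical; the population is modified in place and fitness is assumed to be maximized.

```python
def replace_worst(population, fitness, child, only_if_not_worse=False):
    """Insert child in place of the worst member; return True if inserted."""
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    if only_if_not_worse and fitness(child) < fitness(population[worst]):
        return False                  # stronger variant: reject a worse child
    population[worst] = child
    return True
```

The return value makes it easy to count how often the stronger variant rejects an offspring, which is one way to notice the stagnation described above.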


42.3.4 Replace Parents 


A different idea for choosing which element to replace 
is for the offspring to replace the parent (or parents). 
This would have an advantage in maintaining some 
level of diversity (Sect. 42.7) since the offspring is 
likely to be similar to the parent. The simplest way to 
do this, in the case when mutation but not crossover is 
used is for the offspring to replace the parent if it is bet- 
ter. If the selection process is purely random, then this 
amounts to running a number of local search algorithms 
in parallel, as there is no interaction between the indi- 
viduals in the population. 

A variation on this is when there is crossover, which 
requires the selection of a second parent (Sect. 42.6). 
In this case, it makes sense for the offspring to replace 
the worst of the two parents, guaranteeing that the best 
individual is never replaced. This is the idea behind the 
so-called microbial genetic algorithm [42.20], which is 
a steady-state genetic algorithm, with random selection, 
standard crossover, and mutation (Sects. 42.4 and 42.6), 
with the offspring replacing the worst parent: 


1. Generate random population.
2. Repeat until stopping criterion satisfied:
   a) Select two individuals from the population uniformly
      at random.
   b) Perform crossover and mutation.
   c) Let the new offspring replace the worst of the
      two parents.
3. Stop.


A further variation possible if crossover is used is
to create two offspring and for them to replace both
parents under some suitable conditions. Possibilities
include:


• If at least one offspring is better than both parents.
• If both offspring are no worse than the worst parent.
• If one offspring is better than both parents and the
  other is better than one of them.


This is the idea behind the gene invariant genetic
algorithm [42.21]. It is designed for use on search spaces
given by fixed length binary strings. We arrange for the
initial population to be constructed such that for every
random string generated, we also include its bitwise
complement. This ensures an equal number of ones
and zeros at each bit position in the population. When
crossover takes place between two parents, if we keep
both possible offspring, then we maintain the number
of ones and zeros. This arrangement naturally maintains
a lot of diversity in the population, without even
the necessity to include mutation. Early empirical studies
suggested that this approach would be very good
at avoiding certain kinds of traps in the search space.
There seems to have been very little work following up
this suggestion, however (although see [42.22] for the
analysis of a (1 + 1) version of the algorithm).


42.4 Mutation Methods


Whereas selection and replacement focus the genetic
algorithm on a subset of its population, the mutation
and crossover operators enable it to sample new points
in the search space. The idea behind mutation is that,
having selected a good member of the population, we
should try to create a variant with the hope that it is
even better. To do this, it is often possible to make use
of some natural or well-established local search operators
for the problem class concerned. For example, for
the traveling salesman problem, the well-known 2-opt
operator works by reversing a random segment of the
selected tour [42.23, 24]. This immediately provides us
with a way of mutating solutions for this problem class.
Similarly, when solving the Knapsack problem, it is
often helpful to exchange items, and this too gives us
a good idea how to mutate.

Generally speaking, mutations are defined by 
choosing a representation for the points of the search 
space, and then defining a set of operators to act on that 
representation. For example, many combinatorial opti- 
mization problems concern choosing an optimal subset 
of some set. We can represent the search space, the col- 
lection of all subsets, using binary strings with length 
equal to the size of the set. Each position corresponds to 
a set element, and we use 0 and 1 to distinguish whether 
or not an element is included in a particular subset. We
then define a collection of operators with which to act
upon the representation. For example, the action of re- 
moving or adding an item to the subset is very natural, 
and is given by simply flipping the bit at the appropri- 
ate position. If there are n bits in the representation, then 
this would give us n corresponding operators. In order 
to mutate a bitstring, one chooses an operator at random 
and applies it. 

In order to exchange an item in a subset with one 
that is not, we must simultaneously flip a 1 to a 0 and 
a 0 to a 1. If the subset has k elements, then there are
k(n − k) ways to do this, giving us another possible set
of operators.
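
The two families of operators just described (add or remove one item, and exchange one item) can be sketched as follows. This is a minimal sketch, assuming subsets are stored as Python lists of 0/1 values; the function names `flip_one_bit` and `exchange` are illustrative only.

```python
import random
from typing import List


def flip_one_bit(x: List[int], rng: random.Random) -> List[int]:
    """Add or remove one item: flip a single uniformly chosen bit."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] = 1 - y[i]
    return y


def exchange(x: List[int], rng: random.Random) -> List[int]:
    """Swap one included item for one excluded item (flip a 1 and a 0).

    Choosing the 1 and the 0 independently and uniformly picks each of
    the k(n - k) possible exchanges with equal probability.
    """
    ones = [i for i, b in enumerate(x) if b == 1]
    zeros = [i for i, b in enumerate(x) if b == 0]
    if not ones or not zeros:
        return list(x)  # no exchange is possible
    y = list(x)
    y[rng.choice(ones)] = 0
    y[rng.choice(zeros)] = 1
    return y
```

An exchange keeps the size of the subset fixed, which a single bit flip cannot do.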

Perhaps the most common choice of mutation for
binary strings is to randomly flip each bit independently
with a fixed probability u called the mutation
rate. This corresponds to flipping a given subset of bits of
size k with probability $u^k (1-u)^{n-k}$. Very often, the
mutation rate is set to u = 1/n. This is popular because,
while it favors single bit mutations, there is a significant
probability of flipping two or more bits, enabling
exchanges to take place. Notice, however, that there is
a probability of $(1 - 1/n)^n \approx 1/e$ that nothing happens
at all. While this clearly slows down evolution by an
almost constant factor, it is not necessarily a bad thing
to have a significant probability of doing nothing: it
can sometimes prevent evolution rushing off down the
wrong path (Sect. 42.5).
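
As an illustration, bitwise mutation and the probability that it changes nothing can be sketched as follows. This is a minimal sketch, assuming strings are lists of 0/1 values; the names `bitwise_mutation` and `prob_no_change` are illustrative only.

```python
import random
from typing import List


def bitwise_mutation(x: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in x]


def prob_no_change(n: int) -> float:
    """Probability that mutation rate 1/n flips no bit at all: (1 - 1/n)^n."""
    return (1.0 - 1.0 / n) ** n
```

For n = 100 the value of `prob_no_change(100)` is roughly 0.366, already close to the limit 1/e ≈ 0.368.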

While the mutation rate u = 1/n is the most com- 
monly recommended, the best value to choose de- 
pends on the details of the problem class, and the 
rest of the genetic algorithm being used [42.25]. For 
example, the simple (1+ 1) EA maintains a single 
point of the search space which it repeatedly mu- 
tates — replacing it only if a better offspring occurs. 
For the class of linear functions, the choice of u = 1/n is
provably optimal [42.26].
However, it can be shown that for the so-called lead- 
ing ones problem, in which the fitness of a string is 
simply the position of the first zero, the optimal mu- 
tation rate for the (1 + 1) algorithm is in fact close to 
1.59/n [42.27]. 
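
The (1 + 1) EA mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the offspring replaces the current point only if it is strictly better (as described in the text), the optimum is assumed to have fitness n, and the function names are illustrative.

```python
import random
from typing import Callable, List, Tuple


def one_max(x: List[int]) -> int:
    """Number of ones in the string."""
    return sum(x)


def leading_ones(x: List[int]) -> int:
    """Number of ones before the first zero."""
    count = 0
    for b in x:
        if b == 0:
            break
        count += 1
    return count


def one_plus_one_ea(fitness: Callable[[List[int]], int], n: int, rate: float,
                    max_iters: int, rng: random.Random) -> Tuple[List[int], int]:
    """Run a (1 + 1) EA; return the final point and the iterations used."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    for t in range(max_iters):
        if fx == n:  # assumed optimum value for OneMax and LeadingOnes
            return x, t
        y = [1 - b if rng.random() < rate else b for b in x]
        fy = fitness(y)
        if fy > fx:  # replace only if the offspring is better
            x, fx = y, fy
    return x, max_iters
```

Calling it with `rate=1 / n` on `one_max` and with `rate=1.59 / n` on `leading_ones` gives the two settings discussed above; averaging the returned iteration counts over many runs is one way to compare mutation rates empirically.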

For a more general approach to defining a mutation 
operator, one could assign a probability to each subset 
of bits, and flip such a subset with the given probability. 
This corresponds to defining a probability distribution π
over the set of binary strings. To mutate a string x, we
choose another string y with probability π(y) and return
the result x ⊕ y, where the ⊕ symbol represents bitwise
exclusive-or [42.28]. This general method has the property
that it is invariant with respect to the labels 0 and
1 used to represent whether or not an item is in a given
subset. One might also wish mutation to be invariant with
respect to the ordering of the n elements of the underlying
set, so that the performance of the genetic algorithm
is not sensitive to how this order is chosen. To do this
requires a much more restricted mutation operator. One
must specify a probability distribution over the numbers
0, 1, ..., n and then, having chosen a number according
to this distribution, flip a uniformly random subset of bits
of this size. Such a mutation is said to be unbiased with
respect to bit labels (that is, the choice of 0 or 1) and bit
ordering [42.29, 30].

Mutating each bit with a fixed rate is an example of
an unbiased mutation operator, with the probability of
flipping k bits being

$$\binom{n}{k} u^k (1-u)^{n-k} .$$

The following is an efficient algorithm to choose k
according to this binomial distribution [42.31]:


1. Let x = y = 0.
2. Let c = log(1 − u).
3. Let r be a random number from 0 to 1.
4. Let y = y + 1 + ⌊log(r)/c⌋.
5. If y ≤ n then let x = x + 1 and go to 3. Else return x.


We can use this algorithm to perform mutation by 
first selecting the number of bits to be flipped and 
then choosing which particular subset of that size will 
be mutated. The above algorithm has expected run- 
ning time of O(un). If u is relatively small, one can 
then sample the bit positions from {1,...,n} repeat- 
ing if the same index is selected twice (use a hash 
table to detect the unlikely event of a repeated sam- 
ple). For the choice u=1/n the random selection 
algorithm runs in constant time, and, for large n, the 
probability of having to do a repeat sample tends to 
zero. Consequently, performing mutation in this way 
is extremely efficient. An alternative method is to 
randomly choose the position of the next bit to be 
flipped [42.32]. 
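
The sampling procedure and the resulting mutation can be sketched as follows. This is a minimal sketch, assuming 0 < u < 1 in the main loop, bit positions counted from 1 (so a flip is counted whenever the running position y does not exceed n), and a Python `set` standing in for the hash table; the function names are illustrative.

```python
import math
import random
from typing import List


def sample_num_flips(n: int, u: float, rng: random.Random) -> int:
    """Sample k ~ Binomial(n, u) by jumping between successive flipped positions."""
    if u <= 0.0:
        return 0
    if u >= 1.0:
        return n
    c = math.log(1.0 - u)
    x = 0  # number of flips counted so far
    y = 0  # position of the most recent flip (1-based)
    while True:
        r = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        y += 1 + math.floor(math.log(r) / c)  # geometric jump to the next flip
        if y > n:
            return x
        x += 1


def mutate_k_bits(bits: List[int], u: float, rng: random.Random) -> List[int]:
    """Choose the number of flips first, then which positions to flip."""
    n = len(bits)
    k = sample_num_flips(n, u, rng)
    chosen = set()
    while len(chosen) < k:  # a repeated index is simply sampled again
        chosen.add(rng.randrange(n))
    out = list(bits)
    for i in chosen:
        out[i] = 1 - out[i]
    return out
```

The loop in `sample_num_flips` runs once per flipped bit plus one final overshoot, which is why the expected cost is proportional to un rather than n.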


42.5 Selection–Mutation Balance


We are now in a position to put together a simple ge-
netic algorithm involving selection, replacement, and
mutation. Selection and replacement will focus the
search on good solutions, whereas mutation will ex-
plore the search space by generating new individuals. 
We will find that there is a balance between these two 
forces, but the exact nature of the balance depends on 
the details of the algorithm, as well as the problem 
to be solved. To simplify things a little, we will con- 
sider the well-known toy problem, OneMax, in which 
the fitness of a binary string is the sum of its bits. For 
this problem, there are a number of theoretical results 
concerning the balance between selection and mutation,
for example [42.33, 34]. For a thorough analysis
of the selection–mutation balance on a LeadingOnes
type problem, see [42.35]. Here, we will illustrate the
effects with some empirical data.

The first algorithm we will look at is the steady state
genetic algorithm, using binary tournament selection,
and random replacement. As described above, the long-
term behavior will be the same as for a generational ge- 
netic algorithm, with the same selection and mutation 
methods. Specifically, let us consider a population of 
size 10, fix the string length to 100 bits, and consider
the effect of varying the mutation rate. In Fig. 42.1, we 
show four typical runs at mutation rates 0.03, 0.02, 0.01 
and 0.001 respectively. We plot the fitness of the best 
member of the population at each generation. 
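
An experiment of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: one "generation" is taken to be one steady-state step, the tournament draws its two contestants with replacement, and the function names are illustrative; it is not the code used to produce Fig. 42.1.

```python
import random
from typing import List


def one_max(x: List[int]) -> int:
    return sum(x)


def tournament(pop: List[List[int]], rng: random.Random) -> List[int]:
    """Binary tournament: the better of two randomly drawn members."""
    a, b = rng.choice(pop), rng.choice(pop)
    return a if one_max(a) >= one_max(b) else b


def steady_state_random_replacement(n: int = 100, pop_size: int = 10,
                                    rate: float = 0.01, steps: int = 4000,
                                    seed: int = 0) -> List[int]:
    """Return the best fitness in the population after every step."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_trace = []
    for _ in range(steps):
        parent = tournament(pop, rng)
        child = [1 - b if rng.random() < rate else b for b in parent]
        pop[rng.randrange(pop_size)] = child  # random replacement: not elitist
        best_trace.append(max(one_max(x) for x in pop))
    return best_trace
```

Plotting the returned list for rates 0.03, 0.02, 0.01 and 0.001 gives trajectories of the kind described; because replacement is random, the trace is allowed to go down as well as up.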

Recall that this algorithm is not elitist. This means
that it is possible to lose the best individual, by replacing
it with a mutant of the selected individual.
With a population of size 10 and random replacement, the
best individual is chosen for replacement with probability 0.1.
Whether it is replaced by something better or worse de- 
pends on the mutation rate. A higher mutation rate will 
tend to be more destructive. We see from the plotted tra- 
jectories, that for higher mutation rates, the algorithm 
converges to a steady state more quickly (around gen- 
eration 200 in the case of mutation rate of 0.03) but to
a poor quality solution (average fitness around 74). As
the mutation rate is reduced, the algorithm takes longer
to find the steady state, but it is of higher quality. For the
smallest mutation rate shown, 0.001, the run illustrated
takes 3500 generations to converge, but this includes the
optimal solution.

Fig. 42.1a-d Trajectories of best of population for steady-state GA with random replacement on OneMax problem (100
bit). (a) Mutation rate = 0.03; (b) mutation rate = 0.02; (c) mutation rate = 0.01; (d) mutation rate = 0.001

Fig. 42.2 Average time taken to optimize OneMax (100
bit) for steady-state GA with worst replacement, with varying
mutation rates

Since the OneMax problem is simply a matter of 
hill-climbing, we can improve matters by using an eli- 
tist algorithm. So we now consider the steady-state 
genetic algorithm, again with binary tournament selec- 
tion, but this time replacing the worst individual in the 
population. In this setup, we cannot lose the current best 
solution in the population. We quickly find experimen- 
tally that for reasonable mutation rates, we can always 
find the optimum solution in reasonable time. However, 
again there is a balance to be struck between selec- 
tion strength and mutation. If mutation is too high, it 
is again destructive, which slows down progress. If it is 
too low, we wait for a long time for progress to be made. 
There is now an optimal mutation rate to be sought. Fig- 
ure 42.2 shows the average time to find the optimum 
for the same four different mutation rates (0.03, 0.02, 
0.01, 0.001). The average is taken over 20 runs. We can 
clearly see that the best tradeoff is obtained with the 
mutation rate of 0.01 in this case, with an average of 
around 1400 generations required to find the optimum. 
This rate equals one divided by the string length, and is 
a common choice for EAs. 

It is not strictly necessary to implement an eli- 
tist strategy to obtain reasonable optimisation perfor- 
mance. Consider instead a generational genetic algo- 
rithm in which our selection is the strongest pos- 
sible — we always pick the best in the population. 
This is not technically elitist, as we apply mutation
to the selected individual, which means there is
a chance it is lost. However, if there is a reasonable
population size, and the mutation rate is not too
large, then there is a good chance that a copy of
the best individual will be placed in the next generation.
This in effect simulates elitism [42.36]. Yet
again there is a balance between selection and mutation,
this time depending on the population size.
If the mutation rate is high, then we will need
a large population to have a good chance of preserving
the best individual. Smaller mutation rates will
allow smaller populations, but will slow down the
evolution.

Fig. 42.3 Average time taken (generations) to optimize
OneMax (100 bit) for generational GA with best selection,
population size 10, and various mutation rates

Fig. 42.4 Average time taken (generations) to optimize
OneMax (100 bit) for generational GA with best selection,
mutation rate 0.01, and various population sizes

Consider first a population of size 10, and a range of 
mutation rates: 0.005, 0.01, 0.015 and 0.02. Figure 42.3 
shows the average time to find the optimum for our 
generational genetic algorithm. We see that it is very 


835 


S°@ |3 Hed 


836 PartE 


Evolutionary Computation 


9°74 |3 Hed 


efficient for sufficiently low mutation rates (only 125 
generation when the mutation rate is 0.01). However, 
there is a transition to much longer run times when the 
mutation rate gets higher (2500 generations when the 
mutation rate is 0.02). 

Conversely, we can consider fixing the mutation rate 
at 0.01 and varying the population size. Figure 42.4 
shows the optimisation time in generations for popu- 
lations of size 5, 10, 15 and 20. Again we see evidence 
of a transition from long times (when the population 
is too small) to very efficient times with larger popula- 
tions (as low as 80 generations when the population size 
is 20). However, we have to bear in mind that a gen- 
erational genetic algorithm has more fitness function 
evaluations per generation than a steady-state genetic 
algorithm. Thus we cannot keep increasing the popu- 
lation size indefinitely without cost. Figure 42.5 shows 
the number of fitness function evaluations required to 
find the optimum. We see that, of the examples shown, 
a population of size 10 is best (requiring an average of 
1374 evaluations). 
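
The generational algorithm with best selection used in these experiments can be sketched as follows. This is a minimal sketch, assuming every member of the next generation is an independent mutant of the current best and that each generation costs `pop_size` fitness evaluations; the function name is illustrative, and this is not the code behind Figs. 42.3 to 42.5.

```python
import random


def generational_best_selection(n: int = 100, pop_size: int = 10,
                                rate: float = 0.01, max_gens: int = 10000,
                                seed: int = 0) -> int:
    """Return the number of generations needed to find the OneMax optimum
    (or max_gens if it was not found)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for gen in range(max_gens):
        best = max(pop, key=sum)  # strongest possible selection
        if sum(best) == n:
            return gen
        # The best individual itself is not copied over, so it can be lost.
        pop = [[1 - b if rng.random() < rate else b for b in best]
               for _ in range(pop_size)]
    return max_gens
```

Multiplying the returned generation count by `pop_size` gives the cost in fitness function evaluations, which is the quantity that makes very large populations unattractive.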

The exact tradeoff between mutation rate and population
size can be calculated theoretically. If the number
of bits is n, and the mutation rate is 1/n, it can be
shown that there is a transition between exponential and
polynomial run time for the OneMax problem when the
population size is approximately $5 \log_{10} n$. Indeed it can
be shown that this is a lower bound on the required population
size to efficiently optimize any fitness function
with unique global optimum [42.37].

Fig. 42.5 Average time taken (fitness function evaluations)
to optimise OneMax (100 bit) for generational GA
with best selection, mutation rate 0.01, and various population
sizes


42.6 Crossover Methods


Crossover (or recombination) is a method for combining
together parts of two different solutions to make a third.
The hope is that good parts of each parent solution will
combine to make an even better offspring. Of course, we
might also be recombining the bad parts of each parent
and come up with a worse solution, but then selection
and replacement methods will filter these out.

Several methods exist for performing crossover, depending
on the representation used. For binary strings,
there are three common choices. One-point crossover
chooses a bit position at random and combines all the
bit values below this position from one parent, with all
the remaining bit values from the other. Thus, given the
parents


01001101,
11100111


if we choose the fifth position for our cut point, we obtain
the offspring solution


01000111.


Similarly, for two-point crossover, we choose two bit po- 
sitions at random. The bit values between these two cut 
points come from one parent, and the remaining values 
come from the other. Thus, with the same two parents, 
choosing cut positions 2 and 6 produces the offspring 


01100101.


Both one- and two-point crossovers have the property 
of being biased with respect to the ordering of bits. If 
the problem representation has been chosen so that this 
order matters, then such a crossover may confer an ad- 
vantage as they tend to preserve values that are next 
to each other. For example, if the problem relates to 
finding an optimal subset, and the elements of the set 
have been preordered according to some heuristic (e.g. 
a greedy algorithm), then one- or two-point crossover 
may be appropriate. If, however, the order of the bits 
is arbitrary, then one should choose a method which is 
unbiased with respect to ordering. The common choice 
is called uniform crossover, and involves choosing bit 
values from either parent at random. One way of im- 
plementing this would be to generate a random bit
string and let the values in this string (or mask) deter-
mine which parents the bit values should come from. 
For example, using again the two parents above, if we 
generate the random mask 01010101, we get the off- 
spring 


01001101. 
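
The three crossover operators can be sketched as follows, using the worked examples from the text as checks. This is a minimal sketch, assuming strings of '0'/'1' characters, 1-based cut positions, and index conventions chosen so that the examples above are reproduced; the function names are illustrative.

```python
import random


def one_point(p1: str, p2: str, cut: int) -> str:
    """Bits below position `cut` (1-based) come from p1, the rest from p2."""
    return p1[:cut - 1] + p2[cut - 1:]


def two_point(p1: str, p2: str, cut1: int, cut2: int) -> str:
    """Bits strictly between the two cut positions come from p2, the rest from p1."""
    return p1[:cut1] + p2[cut1:cut2 - 1] + p1[cut2 - 1:]


def mask_crossover(p1: str, p2: str, mask: str) -> str:
    """A mask bit of 0 takes the bit from p1, a mask bit of 1 takes it from p2."""
    return "".join(a if m == "0" else b for a, b, m in zip(p1, p2, mask))


def uniform_crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Uniform crossover: every mask is equally likely."""
    mask = "".join(rng.choice("01") for _ in p1)
    return mask_crossover(p1, p2, mask)
```

With the parents 01001101 and 11100111, `one_point(p1, p2, 5)` gives 01000111, `two_point(p1, p2, 2, 6)` gives 01100101, and `mask_crossover(p1, p2, "01010101")` gives 01001101, matching the examples above.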


This leads to a more general view of crossover: We 
specify a probability distribution z over the set of bi- 
nary strings and, to perform crossover, we select a string 
according to this distribution and use it as a mask [42.5]. 
Uniform crossover corresponds to a uniform distribu- 
tion. One-point crossover corresponds to selecting only 
masks of the form 0...01...1. It can be shown that
crossover by masks in general is always unbiased with
respect to changes in labels of bit values (that is, exchanging
ones for zeros). If we also require crossover
to be unbiased with respect to bit ordering, then it is
necessary and sufficient that masks containing the same
number of ones are selected with equal probability.

Any crossover by masks also has the nice property 
that if both parents agree on a bit position, then the off- 
spring will also share the same value at that position. 
Such crossovers are called respectful, and emphasize 
the idea that one is trying to preserve structure found in 
the parents [42.38, 39]. It can be shown that such prop- 
erties can be understood geometrically [42.40]. 

Our understanding of when crossover can be helpful
is rather limited and there are many open questions.
For example, continuing to look at the OneMax prob- 
lem, we can examine experimentally whether or not 
crossover helps. Let us keep to the steady-state al- 
gorithm with tournament selection and replacement 
of the worst individual. We modify the algorithm 
by allowing, with a given probability, crossover to 
take place between the selected individual and a ran- 
domly chosen one. Our algorithm therefore is as fol- 
lows: 


1. Initialize population of size u randomly with points 
from the search space. 
2. Repeat until stopping criterion is satisfied: 

a) Choose a member of the population using bi- 
nary tournament selection. 

b) With probability p, crossover selected individ- 
ual with one chosen randomly from the popula- 
tion. 

c) Modify the result with mutation using rate 1/n. 

d) Replace worst member of population with the 
new offspring. 

3. Stop. 
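
The algorithm above can be sketched for OneMax as follows. This is a minimal sketch under stated assumptions: the crossover is uniform, the partner may coincide with the selected individual, the run stops as soon as the optimum is in the population, and the name `run_ga` and its parameters are illustrative.

```python
import random


def run_ga(n: int = 100, pop_size: int = 10, p_cross: float = 1.0,
           max_steps: int = 200000, seed: int = 0) -> int:
    """Steady-state GA on OneMax; returns the number of steps to the optimum."""
    rng = random.Random(seed)
    rate = 1.0 / n
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for step in range(max_steps):
        if max(sum(x) for x in pop) == n:
            return step
        # a) binary tournament selection
        a, b = rng.choice(pop), rng.choice(pop)
        child = a if sum(a) >= sum(b) else b
        # b) with probability p_cross, uniform crossover with a random member
        if rng.random() < p_cross:
            mate = rng.choice(pop)
            child = [c if rng.random() < 0.5 else m for c, m in zip(child, mate)]
        # c) mutation with rate 1/n (always builds a new list)
        child = [1 - bit if rng.random() < rate else bit for bit in child]
        # d) replace the worst member with the new offspring
        worst = min(range(pop_size), key=lambda i: sum(pop[i]))
        pop[worst] = child
    return max_steps
```

Averaging `run_ga(p_cross=p, seed=s)` over several seeds for a range of values of p is one way to repeat the kind of comparison reported below.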


Fig. 42.6 Average time taken to optimize OneMax (100
bits) for generational GA with tournament selection, mutation
rate 0.01, population size 10, and varying crossover
probabilities

Fig. 42.7 Average time taken to optimize OneMax for various
string lengths for generational GA with tournament
selection, mutation rate 1/n, population size 10. Solid line
is with no crossover. Dotted line is with uniform crossover
with probability 1


We first consider an experiment on OneMax with
n = 100. We use a population of size 10, and vary
the crossover probability between 0 and 1. The results
are shown in Fig. 42.6, which shows the average
time to find the optimum (averages over 20 runs). It
can be seen that there is some improvement as the
crossover rate increases, although the results are rather
noisy (error bars represent one standard deviation).
Examining this case further, we compare the steady-state
genetic algorithm with no crossover (p = 0) with uniform
crossover (p = 1) for different string lengths. The
results are shown in Fig. 42.7. Here it is clear that
there is a significant improvement, which appears to be
increasing as the string length grows. To date, there
is no theoretical analysis of why this should be the
case.

The first example of a problem class for which
crossover can provably help is the following [42.12,
41]. We take the OneMax function, and then create
a trap just before the optimum

$$\mathrm{jump}(x) = \begin{cases} 0 & \text{if } \theta \le \|x\| < n, \\ \|x\| & \text{otherwise}, \end{cases}$$

where $\|x\|$ is the number of ones in x (see Fig. 42.8 for an
illustration). The idea is that the population first climbs
the hill to the threshold θ just before the trap. It is then
rather unlikely that a single mutation event will create 
a string that crosses the gap and finds the global op- 
timum. However, crossing over two strings with just 
a few zeros in each may have a better chance of jumping 
the gap, especially if the zeros occur in different places 
in the two parents. To achieve this, some level of diver- 
sity must be maintained in the population — a subject 
discussed further in Sect. 42.7. 
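
The jump function can be sketched in code as follows. This is a minimal sketch, assuming the trap covers strings with at least θ ones (other than the optimum itself), which is consistent with the local optimum having θ − 1 ones as described in Sect. 42.7; the parameter name `theta` is illustrative.

```python
from typing import Sequence


def jump(x: Sequence[int], theta: int) -> int:
    """OneMax with a trap of zero fitness just before the all-ones optimum."""
    n = len(x)
    ones = sum(x)
    return 0 if theta <= ones < n else ones
```

For n = 100 and `theta = 98`, strings with 97 ones score 97, strings with 98 or 99 ones score 0, and the all-ones string scores 100.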

For problem classes in which solutions are not nat- 
urally represented by binary strings, one has to think 
carefully about the best way to design a crossover 
operator. For example, take the case of the traveling 
salesman problem, in which a solution is given by a per- 
mutation of the cities, indicating the order in which they 
are to be visited. Over the years, a number of different 


42.7 Population Diversity 


For most forms of crossover, if one crosses an indi- 
vidual with itself, the resulting offspring will again be 
the same. Crossovers with this property are said to be 
pure [42.39]. This is certainly a property of one-point, 
two-point and uniform crossovers for binary strings. 
There is therefore no point in performing crossover if 
the population largely comprises copies of the same 
item. In fact, the whole idea of the population is rather 
wasted if this is the case. Rather, the hope is to gain ad- 
vantage by having different members of the population 
search different parts of the search space. It seems im- 
portant, therefore, to maintain a level of diversity in the 
population. 

The importance of this can be seen in solving the 
jump(x) problem described earlier in Sect. 42.6. Once 
the population has arrived at the local optimum, the 
individuals will typically contain 6—1 ones, and the 
rest zeros. If the zeros tend to fall in the same bit po- 


Fig. 42.8 The jump(x) function for string length n = 100, 
with threshold 6 = 98 


crossover methods have been proposed. For the most 
part, these were designed so as to be respectful of the 
positions of the cities along the route. That is, if a partic- 
ular city was visited first by both parents, then it would 
also be visited first by the offspring. However, a much 
more effective approach is to try to preserve the edges 
between adjacent cities in a route. That is, if city A is 
followed immediately by city B by both parents, irre- 
spective of where this takes place in the route, then we 
should try to ensure that this also happens in the case 
of offspring. This kind of crossover is called an edge 
recombination operator [42.42]. 


sitions for each member of the population, then it will 
be impossible for crossover to jump across the gap. For 
example, crossing 


1111111000 
and 
1111110100 


cannot produce the optimum. If, however, we can ensure
diversity in the population, then two members
at the local optimum might be


1111111000
and
0110110111,


which have a reasonable chance of jumping the gap.
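
The two examples can be checked mechanically. The sketch below assumes crossover by masks (so each offspring bit is copied from one of the two parents) and, for the probability, uniform crossover without mutation; the function names are illustrative.

```python
def can_reach_all_ones(p1: str, p2: str) -> bool:
    """True if some mask crossover of p1 and p2 yields the all-ones string."""
    return all(a == "1" or b == "1" for a, b in zip(p1, p2))


def uniform_success_probability(p1: str, p2: str) -> float:
    """Probability that a single uniform crossover produces all ones."""
    if not can_reach_all_ones(p1, p2):
        return 0.0
    differing = sum(a != b for a, b in zip(p1, p2))
    return 0.5 ** differing  # each differing position must pick the parent with a 1
```

For the first pair the result is impossible because both parents have zeros in the last two positions; for the second pair the parents differ in six positions, so a single uniform crossover succeeds with probability 1/64.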




Several different mechanisms have been proposed 
to ensure some diversity is maintained in a population. 
The simplest way is to enforce it directly by not allow- 
ing duplicate individuals in the population [42.43]. So 
if an offspring is produced which is the same as some- 
thing already in the population, we just discard it. 

A second method is to adjust the replacement 
method in a steady-state genetic algorithm, by making 
sure the new offspring replaces something similar to it- 
self. For example, one could have a replacement rule 
that makes the offspring replace the population mem- 
ber most similar to it (as measured by the Hamming 
distance). This method, called crowding, will of course
destroy the elitism property [42.44]. This can be salvaged
by only doing the replacement if the offspring is
at least as good as what it replaces.
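
A crowding replacement rule of this kind can be sketched as follows. This is a minimal sketch, assuming bit strings stored as lists and a fitness function to be maximized; the names `crowding_replace` and `keep_elitism` are illustrative.

```python
from typing import Callable, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two strings differ."""
    return sum(x != y for x, y in zip(a, b))


def crowding_replace(population: List[Sequence[int]],
                     offspring: Sequence[int],
                     fitness: Callable[[Sequence[int]], float],
                     keep_elitism: bool = True) -> bool:
    """Replace the population member most similar to the offspring.

    With keep_elitism=True the replacement is made only if the offspring
    is at least as good as the member it would replace.
    """
    nearest = min(range(len(population)),
                  key=lambda i: hamming(population[i], offspring))
    if keep_elitism and fitness(offspring) < fitness(population[nearest]):
        return False
    population[nearest] = offspring
    return True
```

Because the offspring only ever displaces its nearest neighbor, distant regions of the search space keep their representatives in the population.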

A third approach to diversity is to explicitly modify 
the fitness function in such a way that individuals re- 
ceive a penalty for being too similar to other population 
members [42.45]. This idea is called fitness sharing. We 
think of the fitness function as specifying a resource 
available to individuals. Similar individuals are compet- 
ing for the resource, which has to be shared out between 
them. 

A fourth approach is to limit the choice of the part- 
ner for crossover. So far, our algorithms have chosen 
the crossover partner uniformly at random. One could
instead try to explicitly choose the most different
individual in the population [42.46].
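
One simple reading of this partner choice is sketched below, assuming that "most different" means largest Hamming distance; other distance measures are possible, and the function names are illustrative.

```python
from typing import List, Sequence


def hamming(a: Sequence, b: Sequence) -> int:
    """Number of positions in which two strings differ."""
    return sum(x != y for x, y in zip(a, b))


def most_different_partner(individual: Sequence, population: List[Sequence]) -> Sequence:
    """Return the population member farthest from `individual` in Hamming distance."""
    return max(population, key=lambda other: hamming(individual, other))
```

For the population 1111111000, 1111110100, 0110110111, the partner chosen for the first string is the third, which is exactly the pairing that can jump the gap.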

We test these various methods using the jump(x) 
problem of the previous section. Recall that for the 
population to jump across the gap requires a crossover 
between two diverse individuals. We work with a string 
length n = 100 and a threshold θ = 98. As a baseline,
consider the steady-state genetic algorithm, with binary
tournament selection, replacement of the worst, uniform
crossover (probability 1) and mutation rate 0.01.
In 20 trials on the jump(x) function, only once did
this algorithm succeed in finding the optimum within
10000 generations. On that one successful run, it required
8860 evaluations to complete. We compare this
result with the same algorithm, modified in each of the
four diversity-preserving methods above. In the case
of fitness sharing, we simply penalize any individual
which has multiple copies in the population by subtracting
the number of copies from the fitness. (It should be
noted that there are different ways of doing this, and the
original method is far more complicated.) The results
of the experiments are summarized in Table 42.1. The
best result is given by the crowding mechanism, but notice
that it is essential to preserve elitism here, otherwise
the algorithm never solves the problem within 10000
generations.

Table 42.1 Success of various diversity methods in solving
the jump(x) problem. Each is tested for 20 trials, to
a maximum of 10000 fitness function evaluations. Mean
and standard deviation refer just to successful runs

Method            Successful runs (%)   Mean evaluations   Standard deviation
None                        5                 8860                 -
No duplicates              80                 3808               2728
Crowding                  100                 1320                389
Sharing                    65                 3452               2652
Partner choice             10                  688                113

A different approach altogether is to structure the
population in some way, so as to prevent certain individuals
from interacting. We take up this idea, in a more general
context, in the following section.


42.8 Parallel Genetic Algorithms


There have been a number of studies of different ways
to parallelize genetic algorithms. There are two basic
methods. The first is the island model, in which we
have several populations evolving in parallel, which occasionally
exchange members [42.47]. To specify such
an algorithm, one needs to decide on the topology of
the network of populations (that is, which populations
can exchange members) and the frequency with which
migrations can take place. We also need a method to
decide on which individuals should be passed, and how
they should be incorporated into the new population.
That is, we need a form of selection and replacement
for the migration stage.

As an example, consider having several steady-state 
genetic algorithms operating in parallel. After a certain
number of generations, we choose a member of each
population to migrate — for example, the best one in 
each population. We copy this individual to the neigh- 
boring populations, according to the chosen topology. 
To keep things simple, we choose the complete topol- 
ogy, in which every pair of populations is connected. 
Thus, each population receives a copy of the best from
all the other populations. These now have to be incor-
porated into the home population somehow. An easy 
method is to take the best of all the incoming indi- 
viduals, and use it to replace the worst in the current 
population. Such an algorithm will look like this: 


1. Create m populations, each of size u. 

2. Update each population for c generations. 

3. For each population, replace the worst individual by 
the best of the remaining populations. 

4. Go to 2.


To take an extreme case, if the population size is 
u = 1 and we migrate every generation (c = 1), then 
this is rather similar to a (1, m) EA. 
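
The island algorithm above can be sketched as follows, run sequentially for simplicity. This is a minimal sketch under stated assumptions: the fitness is OneMax, each island is a steady-state GA with binary tournament selection, mutation rate 1/n and replacement of the worst, one "generation" is one steady-state step, the topology is complete, and at least two islands are used; all names are illustrative.

```python
import random
from typing import List


def steady_state_step(pop: List[List[int]], n: int, rng: random.Random) -> None:
    """One update of an island: tournament, mutation at rate 1/n, replace worst."""
    a, b = rng.choice(pop), rng.choice(pop)
    parent = a if sum(a) >= sum(b) else b
    child = [1 - bit if rng.random() < 1.0 / n else bit for bit in parent]
    worst = min(range(len(pop)), key=lambda i: sum(pop[i]))
    pop[worst] = child


def island_model(n: int = 50, m: int = 4, pop_size: int = 5, c: int = 20,
                 rounds: int = 200, seed: int = 0) -> int:
    """Return the best OneMax value found by m islands (m >= 2)."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
               for _ in range(m)]
    for _ in range(rounds):
        for pop in islands:                      # step 2: c generations each
            for _ in range(c):
                steady_state_step(pop, n, rng)
        # Emigrants are chosen before any island is modified.
        bests = [max(pop, key=sum) for pop in islands]
        for i, pop in enumerate(islands):        # step 3: migration
            incoming = max((b for j, b in enumerate(bests) if j != i), key=sum)
            worst = min(range(pop_size), key=lambda k: sum(pop[k]))
            pop[worst] = list(incoming)
    return max(sum(x) for pop in islands for x in pop)
```

On a parallel machine the inner loop over islands would run concurrently, with communication needed only at the migration step.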

There are two possible advantages of the island 
model. Firstly, it is straightforward to distribute it on 
a genuinely parallel processing architecture, leading to 
performance gains. Secondly, there may be some prob- 
lem classes for which the use of different populations 
can help. The idea is that different populations may 
explore different parts of the search space, or develop 
different partial solutions. The parameter c is chosen 
large enough to allow some progress to be made. The 
migration stage then allows efforts in different direc- 
tions to be shared, and workable partial solutions to be 
combined. Some recent theoretical progress has been 
made in analyzing this situation for certain problem 
classes [42.48]. 

A particular case of such a model is found in co- 
evolutionary algorithms. These come in two flavors: 
competitive and co-operative. In a competitive algo- 
rithm, we typically have two parallel populations. One 
represents solutions to a problem, and the other repre- 
sents problem instances. The idea is that, as the former 
population is finding better solutions, so the latter is 
finding harder instances to test these solutions. An early 
example evolved sorting networks for sorting lists of 
integers [42.49]. While one population contained dif- 
ferent networks, the other contained different lists to be 
sorted. The fitness of a network was judged by its ability 
to sort the problem instances. The fitness of a problem 
instance was its ability to cause trouble for the net- 
works. As the instances get harder, the sorting networks 
become more sophisticated. 

In a co-operative co-evolutionary algorithm, the 
different populations work together to solve a single 
problem [42.50]. This is done by dividing the problem 
into pieces, and letting each population evolve a solu- 
tion for each piece. The fitness of a piece is judged by 
combining it with pieces from other populations and 
evaluating the success of the whole. Theoretical analysis
shows that certain types of problems can benefit
from this approach by allowing greater levels of explo- 
ration than in a single population [42.51]. 
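
As a rough illustration of the co-operative scheme, the sketch below splits a bit string into equal pieces, evolves one small population per piece, and judges a piece by inserting it into a full solution built from the current representatives of the other populations. The update rule, the choice of representative (the best piece so far) and all names (`cooperative_coevolution`, `assembled_fitness`, `reps`) are assumptions made for this example; they are not taken from [42.50] or [42.51].

```python
import random


def cooperative_coevolution(fitness, n, pieces, pop_size=5,
                            generations=200, seed=0):
    rng = random.Random(seed)
    size = n // pieces  # assumes n is divisible by the number of pieces
    # One population per piece of the problem.
    pops = [[[rng.randint(0, 1) for _ in range(size)] for _ in range(pop_size)]
            for _ in range(pieces)]
    # Current representative of each population (initially an arbitrary member).
    reps = [pop[0] for pop in pops]

    def assembled_fitness(i, piece):
        """Fitness of `piece` when combined with the other representatives."""
        parts = reps[:i] + [piece] + reps[i + 1:]
        return fitness([bit for part in parts for bit in part])

    for _ in range(generations):
        for i, pop in enumerate(pops):
            parent = rng.choice(pop)
            child = [b ^ 1 if rng.random() < 1.0 / size else b for b in parent]
            worst = min(range(pop_size),
                        key=lambda k: assembled_fitness(i, pop[k]))
            # The child replaces the worst piece if the whole is no worse.
            if assembled_fitness(i, child) >= assembled_fitness(i, pop[worst]):
                pop[worst] = child
            reps[i] = max(pop, key=lambda piece: assembled_fitness(i, piece))
    # The solution is the concatenation of the representatives.
    return [bit for part in reps for bit in part]
```

For instance, `cooperative_coevolution(sum, n=12, pieces=3)` evolves three populations of 4-bit pieces against OneMax, which is a separable problem and therefore an easy case for this decomposition.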

The second parallel model for genetic algorithms 
is the fine grained model, in which there is a sin- 
gle population, but with the members of the popu- 
lation distributed spatially, typically on a rectangular 
grid [42.52]. At each time step, each individual is 
crossed over with a neighbor. The resulting offspring 
replaces the original if it is better, according to the fit- 
ness function. The algorithm looks like this: 


1. Create an initial random population of size m.
2. In parallel, for each individual x:
   a) Choose a random neighbor y of x.
   b) Cross over x and y to form z.
   c) Mutate z to form the offspring.
   d) Replace x with the offspring if it is better.
3. Go to 2.


Notice that such an algorithm is generational, but 
also elitist. There has been very little analysis of this 
kind of architecture, although there is some empirical 
evidence that it can be effective, especially for problems 
with multiple objectives, in which different tradeoffs 
can emerge in different parts of the population [42.53]. 
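
The fine-grained algorithm can be sketched as below. The sketch assumes a ring neighbourhood (the neighbours of individual k are individuals k - 1 and k + 1, wrapping at the ends), OneMax as the fitness function, uniform crossover and a mutation rate of 1/n. The parallel step is simulated by building the next population from a snapshot of the current one, and the names `ring_ga` and `max_generations` are hypothetical.

```python
import random


def onemax(bits):
    """Fitness: the number of ones in the string."""
    return sum(bits)


def uniform_crossover(x, y, rng):
    """Take each bit from x or y with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(x, y)]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def ring_ga(n, m=10, max_generations=10_000, seed=0):
    rng = random.Random(seed)
    # Step 1: initial random population of size m.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    evaluations = m  # one evaluation per initial individual
    for _ in range(max_generations):
        if max(map(onemax, pop)) == n:
            break
        # Step 2: every individual is updated from the same snapshot `pop`,
        # which imitates a synchronous parallel update.
        new_pop = []
        for k, x in enumerate(pop):
            y = pop[(k + rng.choice((-1, 1))) % m]       # a) random ring neighbor
            z = uniform_crossover(x, y, rng)             # b) crossover
            child = mutate(z, 1.0 / n, rng)              # c) mutation
            evaluations += 1  # counts offspring only; parent fitness assumed cached
            new_pop.append(child if onemax(child) > onemax(x) else x)  # d)
        pop = new_pop  # Step 3: go to step 2.
    return max(pop, key=onemax), evaluations
```

Because an individual is only ever replaced by a strictly better offspring, the best fitness in the population never decreases, which is the elitist property noted above.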

To illustrate the fine-grained parallel genetic algorithm,
consider a ring topology, in which the kth member of the
population has as neighbors the (k − 1)th and (k + 1)th
members (wrapping round at the ends). We try it on OneMax,
with a population of size 10, using uniform crossover and,
as usual, a mutation rate of 1/n. The results, for a variety
of string lengths, are shown in Fig. 42.9. We can see that
the parallel algorithm is competitive with the steady-state
genetic algorithm. Note that fitness function evaluations
are plotted, and not generations.

Fig. 42.9 The average time (in fitness function evaluations)
for the ring-topology parallel genetic algorithm to find the
optimum for OneMax for a variety of string lengths.
Population size 10, mutation rate = 1/n, uniform crossover

The distributed nature of the population should help to
maintain a level of diversity. Hence, we would expect a
parallel genetic algorithm (with crossover) to perform
reasonably well on the jump(x) function of Sect. 42.6.
Over 20 trials of our ring-based fine-grained algorithm, on
the jump(x) function with n = 100 and 6 = 98, we find that
it solves the problem on all trials, requiring an average of
2924 function evaluations (standard deviation is 1483). It
is therefore competitive with the simple diversity
enforcement method on this problem (compare with results in
Table 42.1).


42.9 Populations as Solutions


We finish this chapter by considering a genetic algorithm
for which the population as a whole represents the solution
to the problem, rather than being a collection of individual
solutions. It is therefore an example of co-operative
co-evolution taking place within a single population.
Moreover, it is one of the very few examples involving a
population-based genetic algorithm using crossover for which
a serious theoretical analysis exists. It is one of the
highlights of the theory of genetic algorithms to date
[42.14, 54, 55].

The problem we are addressing is the classical All-Pairs
Shortest Path problem. We are given a graph with vertex set
V (containing n vertices) and edge set E with positive
weighted edges. The goal is to determine the shortest path
between every pair of vertices in the graph, where length is
given by summing the weights along the path.

To clarify what is meant exactly by a path, first consider
a sequence of vertices $v_1, \ldots, v_m$ such that
$(v_k, v_{k+1}) \in E$ for all $k = 1, \ldots, m-1$. Such a
sequence is called a walk. A path is then a walk with no
repeated vertices. Since for any walk between two vertices
there is a path that is no longer (obtained by omitting any
loops), we can equivalently consider the problem of finding
the shortest walks between any two vertices.

The population will represent a solution to the problem for
a given graph, by having each individual represent a walk
between two vertices. The problem is solved when the
population contains exactly the shortest paths for all of
the n(n − 1) pairs of vertices.

The algorithm will be a steady-state genetic algorithm, with
random selection and a replacement method which enforces
diversity. For any pair of vertices we will allow at most
one walk between them to exist in the population. The
outline of the algorithm is as follows:


1. Initialize the population to be E.

2. Select a population member uniformly at random.

3. With probability p do crossover, else do mutation.

4. If a walk with the same start and end is in the
population, replace it with the offspring if the offspring's
length is no worse.

5. If a walk with the same start and end is not in the
population, add the offspring walk to the population.

6. Go to 2.


We can see from line 5 that another unusual feature 
of the algorithm is that the population size can grow. 
Indeed, it starts with just the edge set from the graph 
and has to grow to get the paths between all pairs of 
vertices. 

In line 3, we see there is a choice between muta- 
tion and crossover, governed by a parameter p. Given 
that our individuals are walks in a graph (rather than bi- 
nary strings) it is clear that we need to specify some 
special purpose operators. We define mutation to be 
a random lengthening or shrinking of a walk as follows. 


Suppose that the selected walk is
$v_1, v_2, \ldots, v_{m-1}, v_m$. We randomly select a
vertex from the neighbors of $v_1$ and $v_m$. If this is
neither $v_2$ nor $v_{m-1}$ then we append it to the walk at
the corresponding end. If it is one of $v_2$ or $v_{m-1}$
then we truncate the path at that end. This process is
repeated a number of times, given by choosing an integer s
according to a Poisson distribution with parameter λ = 1.
We then perform s + 1 mutations to generate the offspring
walk.

As an example, consider the graph illustrated in Fig. 42.10
(notice that the edge weights are not shown). Suppose we
have selected the walk (3, 4, 5, 6) from the population to
mutate. We choose our random Poisson variable s; let us say
it is one. So we have two mutations to apply. We gather the
sets of vertices connected to the two end points, that is,
{1, 4, 8} and {5, 7, 10, 11}, and we pick one of these
vertices at random. Let us suppose that 8 is selected. We
therefore extend our walk to become (8, 3, 4, 5, 6). For the
second mutation, we again collect together the vertices
attached to the end points: {3, 9} and {5, 7, 10, 11}.
Choosing one at random we select, say, 5. Since this is the
vertex in the walk just prior to an end point, we truncate
to produce the final offspring (8, 3, 4, 5). If there is
already a walk from 8 to 5 in the population, we replace it
with the new one only if the new one is in fact shorter
(that is, the sum of the weights is less). If there is no
such walk, then we add the new one to the population.

Fig. 42.10 An example graph for the All Pairs Shortest Path
problem (edge weights not shown)

To perform crossover, we have to be careful to en- 
sure that we end up with a valid walk. Suppose that 
the individual we have selected is a walk from u to v. 
We consider all the members of the population that start 
from v, but exclude the one that goes back to u. We then 
choose a member of this set, uniformly at random, and 
concatenate it to our original walk. This guarantees that 
the offspring is again a valid walk. 
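
The outline and the two operators can be put together in a short Python sketch. It is an illustration under assumptions rather than the implementation analyzed in [42.14, 54, 55]: the graph is taken to be undirected and given as a dictionary of neighbor-to-weight dictionaries, offspring whose start and end coincide are simply discarded, the run stops after a fixed number of iterations, and the names `apsp_ga`, `walk_length`, `poisson` and `iterations` are hypothetical.

```python
import math
import random


def walk_length(walk, graph):
    """Sum of the edge weights along the walk."""
    return sum(graph[a][b] for a, b in zip(walk, walk[1:]))


def poisson(lam, rng):
    """Sample a Poisson(lam) integer (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mutate(walk, graph, rng):
    """Apply s + 1 random lengthening/shrinking moves, s ~ Poisson(1)."""
    walk = list(walk)
    for _ in range(poisson(1.0, rng) + 1):
        moves = ([("start", v) for v in graph[walk[0]]] +
                 [("end", v) for v in graph[walk[-1]]])
        side, v = rng.choice(moves)
        if side == "start":
            # Truncate if v is the second vertex, otherwise prepend v.
            walk = walk[1:] if len(walk) >= 2 and v == walk[1] else [v] + walk
        else:
            # Truncate if v is the second-to-last vertex, otherwise append v.
            walk = walk[:-1] if len(walk) >= 2 and v == walk[-2] else walk + [v]
    return tuple(walk)


def crossover(walk, population, rng):
    """Concatenate a random walk that starts where `walk` ends."""
    u, v = walk[0], walk[-1]
    partners = [w for (a, b), w in population.items() if a == v and b != u]
    if not partners:
        return None
    return walk + rng.choice(partners)[1:]


def apsp_ga(graph, p=0.5, iterations=20_000, seed=0):
    rng = random.Random(seed)
    # Step 1: the initial population is the edge set (both directions).
    population = {(u, v): (u, v) for u in graph for v in graph[u]}
    for _ in range(iterations):
        parent = rng.choice(list(population.values()))          # step 2
        if rng.random() < p:                                     # step 3
            child = crossover(parent, population, rng)
        else:
            child = mutate(parent, graph, rng)
        # Assumption: discard degenerate offspring (no distinct end points).
        if child is None or len(child) < 2 or child[0] == child[-1]:
            continue
        key = (child[0], child[-1])
        # Steps 4 and 5: replace if no worse, or add if the pair is new.
        if (key not in population or
                walk_length(child, graph) <= walk_length(population[key], graph)):
            population[key] = child
    return population
```

On a small weighted graph such as `{1: {2: 1, 3: 4}, 2: {1: 1, 3: 1, 4: 5}, 3: {1: 4, 2: 1, 4: 1}, 4: {2: 5, 3: 1}}`, the population grows from the 10 directed edges to one walk for each of the 12 ordered vertex pairs. Keying the population by its (start, end) pair is one simple way to enforce the rule of at most one walk per pair.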

For example, consider again the graph in Fig. 42.10 and the
selected walk (3, 4, 5, 6). We need to first collect together
all the walks in the population that start from vertex 6,
excluding the one (if it exists) that goes from 6 to 3.
Imagine that we find the following:


(6, 10, 5, 2)
(6, 5, 2, 7)
(6, 11, 10)


and we pick one at random, say, the second one.
Concatenating this to the original walk produces the
offspring (3, 4, 5, 6, 5, 2, 7). Notice that this is a walk,
rather than a path, since vertex 5 is repeated. If this is
better than any existing walk from vertex 3 to vertex 7,
then it replaces it. If there is no such walk in the
population, then the new one gets added.

A considerable amount of theoretical analysis has been done
for this genetic algorithm on the class of All Pairs
Shortest Path problems. If we run the algorithm with no
crossover (that is, we set p = 0) then it can be shown that
it requires $\Theta(n^4)$ generations for the population to
converge to the optimal set of paths. Adding crossover by
choosing 0 < p < 1 improves the performance to
$O(n^3 \log n)$. Note that the classical approach to solving
this problem (the Floyd–Warshall algorithm) requires
$O(n^3)$ time, which is faster and includes all computations
that have to be performed (i.e., this is not just a count of
the black box function evaluations). Of course, the
classical algorithm has full details of the problem instance
on which it is working, whereas the genetic algorithm is
operating blind. Despite this great disadvantage, the
genetic algorithm only pays a cost of a factor of log n over
the classical approach (in addition to any implementation
overhead).

This example is one of the few cases where we have a proof
that crossover helps for a naturally defined problem class.
It is an important open problem to find others.


42.10 Conclusions


We have seen that the defining feature of a genetic
algorithm is its maintenance of a population, which is used
to search for an optimal (or at least sufficiently good)
solution to the problem class at hand. For problems where
solutions are naturally represented as binary strings, there
is an obvious analogy with an evolving population of
individuals in nature, with the strings providing the DNA.
Analogs of mutation and crossover (recombination) are then
readily definable and can be used as search operators. The
theory that describes the trajectory of a population under
such operators, in general terms, is well developed [42.5].

What is much less clear is the question of when all this is
worth doing. That is, if our primary interest is in solving
problems efficiently, in what circumstances is a genetic
algorithm a good choice? To begin to answer this question
requires an in-depth theoretical analysis of algorithms and
problem classes. Work in this area has only just begun: most
known results relate to the so-called (1+1) EA (that is, a
population of size 1), and very little work exists on the
role of crossover. The All Pairs Shortest Path example
represents the current state of the art in this respect.

Having said that, it is clear (at least empirically and
anecdotally) that genetic algorithms can be very effective
for complex problems, where problem instance information is
limited. Indeed, there are many successful applications of
genetic algorithms presented every year at conferences, and
they are a well-known tool for optimization in industry.
There is, therefore, a desperate need for further
theoretical work and understanding in this area.


References


42.1 J.H. Holland: Adaptation in Natural and Artificial 
Systems (MIT, Cambridge 1992) 

42.2 D.E. Goldberg: Genetic Algorithms in Search, Opti- 
mization and Machine Learning (Addison Wesley, 
Indianapolis 1989) 

42.3 S. Droste, T. Jansen, K. Tinnefeld, I. Wegener:
A new framework for the valuation of algorithms 
for black-box optimization. In: Foundations of Ge- 
netic Algorithms, ed. by K.A. De Jong, R. Poli, 
J.E. Rowe (Morgan Kaufmann, Torremolinos 2002) 
pp. 253-270 

42.4 M. Mitchell: An Introduction to Genetic Algorithms 
(MIT, Cambridge 1998) 

42.5 M.D. Vose: The Simple Genetic Algorithm (MIT, Cam- 
bridge 1999) 

42.6 C.R. Reeves, J.E. Rowe: Genetic Algorithms: Princi- 
ples and Perspectives (Kluwer, Dordrecht 2003) 


42.7 I. Wegener: Theoretical aspects of evolutionary
algorithms, Lect. Notes Comput. Sci. 2076, 64-78
(2001)


42.8 F. Neumann, C. Witt: Bioinspired Computation 
in Combinatorial Optimization — Algorithms and 
Their Computational Complexity (Springer, Berlin, 
Heidelberg 2010) 

42.9 A. Auger, B. Doerr: Theory of Randomized Search 
Heuristics (World Scientific, Singapore 2011) 

42.10 T. Jansen, I. Wegener: On the utility of populations
in evolutionary algorithms, Proc. Genet. Evol.
Comput. Conf. (GECCO) 2001 (Morgan Kaufmann, San
Francisco 2001) pp. 1034-1041

42.11 D. Sudholt: General lower bounds for the running 
time of evolutionary algorithms, Lect. Notes Com- 
put. Sci. 6238, 124-133 (2010) 

42.12 T. Jansen, I. Wegener: On the analysis of evolutionary
algorithms – A proof that crossover really can
help, Algorithmica 34(1), 47-66 (2002)

42.13 K. Deb: Multi-Objective Optimization Using Evolu- 
tionary Algorithms (Wiley, New York 2009) 

42.14 B. Doerr, E. Happ, C. Klein: Crossover can prov- 
ably be useful in evolutionary computation, Proc. 
Genet. Evol. Comput. Conf. (GECCO) 2008 (Morgan 
Kaufmann, Atlanta 2008) pp. 539-546 

42.15 J.E. Baker: Reducing bias and inefficiency in the 
selection algorithm, Proc. 2nd Int. Conf. Genet. Al- 
gorithms (ICGA) 1987 (Lawrence Erlbaum, Hillsdale 
1987) pp. 14-21 


42.16 G. Rawlins: Compared to What? (Freeman, New 
York 1991) 

42.17 A. Prügel-Bennett, J.L. Shapiro: An analysis of genetic
algorithms using statistical mechanics, Phys.
Rev. Lett. 72(9), 1305-1309 (1994)

42.18 L.D. Whitley: The GENITOR algorithm and selective 
pressure: Why rank-based allocation of reproduc- 
tive trials is best, Proc. 3rd Int. Conf. Genet. Al- 
gorithms (ICGA) 1989 (Morgan Kaufmann, Atlante 
1989) pp. 116-121 

42.19 A.H. Wright, J.E. Rowe: Continuous dynamical sys- 
tem models of steady-state genetic algorithms, 
Foundations of Genetic Algorithms (Morgan Kauf- 
mann, Charlottesville 2002) pp. 209-225 

42.20 I. Harvey: The microbial genetic algorithm, Proc.
10th Eur. Conf. Adv. Artif. Life (Springer, Berlin,
Heidelberg 2011) pp. 126-133

42.21 J. Culberson: Genetic Invariance: A New Paradigm 
for Genetic Algorithm Design, Univ. Alberta Tech. 
Rep. R92-02 (1992) 

42.22 M. Dietzfelbinger, B. Naudts, C. Van Hoyweghen,
I. Wegener: The analysis of a recombinative
hill-climber on H-IFF, IEEE Trans. Evol. Comput. 7(5),
417-423 (2003)

42.23 S. Lin: Computer Solutions to the travelling sales- 
man problem, Bell Syst. Tech. J. 44(10), 2245-2269 
(1965) 

42.24 C.H. Papadimitriou, K. Steiglitz: Combinatorial
Optimization (Dover Publications, New York 1998)

42.25 T. Jansen, I. Wegener: On the choice of the mutation
probability for the (1+1) EA, Lect. Notes Comput.
Sci. 1917, 89-98 (2000)

42.26 C. Witt: Optimizing linear functions with random- 
ized search heuristics, 29th Int. Symp. Theor. Asp. 
Comp. Sci. (STACS 2012), Leibniz-Zentrum fuer Infor- 
matik (2012) pp. 420-431 

42.27 S. Böttcher, B. Doerr, F. Neumann: Optimal fixed 
and adaptive mutation rates for the LeadingOnes 
problem, Lect. Notes Comput. Sci. 6238, 1-10 
(2010) 

42.28 J.E. Rowe, M.D. Vose, A.H. Wright: Representation 
invariant genetic operators, Evol. Comput. 18(4), 
635-660 (2010) 

42.29 J.E. Rowe, M.D. Vose: Unbiased black box algo- 
rithms, Genet. Evol. Comput. Conf. (GECCO 2011) 
(ACM, Dublin 2011) pp. 2035-2042 


42.30 P.K. Lehre, C. Witt: Black box search by unbiased
variation, Genet. Evol. Comput. Conf. (GECCO 2010)
(ACM, Portland 2010) pp. 1441-1449

42.31 V. Kachitvichyanukul, B.W. Schmeiser: Binomial
random variate generation, Communications ACM
31(2), 216-222 (1988)

42.32 T. Jansen, C. Zarges: Analysis of evolutionary
algorithms: From computational complexity analysis to
algorithm engineering, Found. Genet. Algorithms
(FOGA) (2011) pp. 1-14

42.33 J.E. Rowe: Population fixed-points for functions
of unitation, Foundations of Genetic Algorithms,
Vol. 5 (Morgan Kaufmann, San Francisco 1998)
pp. 69-84

42.34 F. Neumann, P. Oliveto, C. Witt: Theoretical analysis
of fitness-proportional selection: Landscapes and
efficiency, Genet. Evol. Comput. Conf. (GECCO 2009)
(ACM, Portland 2009) pp. 835-842

42.35 P.K. Lehre, X. Yao: On the impact of the
mutation-selection balance on the runtime of evolutionary
algorithms, Found. Genet. Algorithms (FOGA 2009)
(ACM, Portland 2009)

42.36 J. Jägersküpper, T. Storch: When the plus strategy
outperforms the comma strategy and when not,
Proc. IEEE Symp. Found. Comput. Intell. (FOCI 2007)
(IEEE, Bellingham 2007) pp. 25-32

42.37 J.E. Rowe, D. Sudholt: The choice of the offspring
population size in the (1,λ) EA, Genet. Evol. Comput.
Conf. (GECCO 2012) (ACM, Philadelphia 2012)

42.38 N.J. Radcliffe: Forma analysis and random
respectful recombination, Fourth Int. Conf. Genet.
Algorithms (Morgan Kaufmann, San Francisco 1991)
pp. 31-38

42.39 J.E. Rowe, M.D. Vose, A.H. Wright: Group properties
of crossover and mutation, Evol. Comput. 10(2),
151-184 (2002)

42.40 A. Moraglio: Towards a Geometric Unification of
Evolutionary Algorithms, Ph.D. Thesis (University of
Essex, Colchester 2007)

42.41 T. Kötzing, D. Sudholt, M. Theile: How crossover
helps in pseudo-Boolean optimization, Genet.
Evol. Comput. Conf. (GECCO 2011) (ACM, Dublin 2011)
pp. 989-996

42.42 D. Whitley, T. Starkweather, D. Shaner: The traveling
salesman and sequence scheduling: Quality solutions
using genetic edge recombination. In: The
Handbook of Genetic Algorithms, ed. by L. Davis
(Van Nostrand Reinhold, Amsterdam 1991)
pp. 350-372

42.43 S. Ronald: Duplicate genotypes in a genetic
algorithm, Proc. 1998 IEEE World Congr. Comput. Intell.
(1998) pp. 793-798

42.44 K. De Jong: An Analysis of the Behaviour of a Class
of Genetic Adaptive System, Ph.D. Thesis (University
of Michigan, Ann Arbor 1975)

42.45 D.E. Goldberg, J. Richardson: Genetic algorithms
with sharing for multimodal function optimization,
Proc. 2nd Int. Conf. Genet. Algorithms (ICGA)
1987 (Lawrence Erlbaum Associates, Hillsdale 1987)
pp. 41-49

42.46 L.J. Eshelman, J.D. Shaffer: Preventing premature
convergence in genetic algorithms by preventing
incest, Proc. 4th Int. Conf. Genet. Algorithms (ICGA)
1991 (Morgan Kaufmann, San Diego 1991)
pp. 115-122

42.47 M. Tomassini: Spatially Structured Evolutionary
Algorithms: Artificial Evolution in Space and Time
(Springer, Berlin, Heidelberg 2005)

42.48 J. Lässig, D. Sudholt: Analysis of speedups in
parallel evolutionary algorithms for combinatorial
optimization, Proc. 22nd Int. Symp. Algorithms Comput.
(ISAAC 2011, Yokohama 2011)

42.49 W.D. Hillis: Co-evolving parasites improve
simulated evolution as an optimization procedure,
Physica D 42, 228-234 (1990)

42.50 M.A. Potter, K.A. De Jong: Cooperative coevolution:
An architecture for evolving coadapted subcomponents,
Evol. Comput. 8(1), 1-29 (2000)

42.51 T. Jansen, R.P. Wiegand: The cooperative
coevolutionary (1+1) EA, Evol. Comput. 12(4), 405-434
(2004)

42.52 H. Mühlenbein: Parallel genetic algorithms
population genetics and combinatorial optimization,
Proc. 3rd Int. Conf. Genet. Algorithms (ICGA)
1989 (Morgan Kaufmann, Fairfax 1989)
pp. 416-421

42.53 J.E. Rowe, K. Vinsen, N. Marvin: Parallel GAs for
multiobjective functions, Proc. 2nd Nordic Workshop
Genet. Algorithms, ed. by J. Alander (Univ.
Vaasa Press, Vaasa 1996) pp. 61-70

42.54 B. Doerr, M. Theile: Improved analysis methods for
crossover-based algorithms, Genet. Evol. Comput.
Conf. (GECCO 2009) (ACM, Portland 2009)
pp. 247-254

42.55 B. Doerr, T. Kötzing, F. Neumann, M. Theile:
More effective crossover operators for the all-pairs
shortest path problem, Lect. Notes Comput. Sci.
6238, 184-193 (2010)


43. Genetic Programming


James McDermott, Una-May O'Reilly


Genetic programming (GP) is the subset of evolutionary
computation in which the aim is to create executable
programs. It is an exciting field with many applications,
some immediate and practical, others long-term and
visionary. In this chapter, we provide a brief history of
the ideas of genetic programming. We give a taxonomy of
approaches and place genetic programming in a broader
taxonomy of artificial intelligence. We outline some current
research topics and point to successful use cases. We
conclude with some practical GP-related resources including
software packages and venues for GP publications.


43.1 Evolutionary Search for Executable Programs
43.2 History
43.3 Taxonomy of AI and GP
  43.3.1 Placing GP in an AI Context
  43.3.2 Taxonomy of GP
  43.3.3 Representations
  43.3.4 Population Models
43.4 Uses of GP
  43.4.1 Symbolic Regression
  43.4.2 Machine Learning
  43.4.3 Software Engineering
  43.4.4 Design
43.5 Research Topics
  43.5.1 Bloat
  43.5.2 GP Theory
  43.5.3 Modularity
  43.5.4 Open-Ended Evolution and GP
43.6 Practicalities
  43.6.1 Conferences and Journals
  43.6.2 Software
  43.6.3 Resources and Further Reading
References


43.1 Evolutionary Search for Executable Programs 


There have been many attempts to artificially emulate 
human intelligence, from symbolic artificial intelli- 
gence (AI) [43.1] to connectionism [43.2, 3], to subcog- 
nitive approaches like behavioral AI [43.4] and statisti- 
cal machine learning (ML) [43.5], and domain-specific 
achievements like web search [43.6] and self-driving 
cars [43.7]. Darwinian evolution [43.8] has a type of 
distributed intelligence distinct from all of these. It has 
created lifeforms and ecosystems of amazing diversity, 
complexity, beauty, facility, and efficiency. It has even 
created forms of intelligence very different from itself, 
including our own. 

The principles of evolution — fitness biased selec- 
tion and inheritance with variation — serve as inspiration 
for the field of evolutionary computation (EC) [43.9], 
an adaptive learning and search approach which is
general-purpose, applicable even with black-box performance
feedback, and highly parallel. EC is a trial-and-error
method: individual solutions are evaluated
for fitness, good ones are selected as parents, and 
new ones are created by inheritance with variation 
(Fig. 43.1). 
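
The loop just described can be written down in a few lines of Python. The sketch below is only one possible instance: bit-string individuals, binary tournament selection, one-point crossover, a mutation rate of 1/n and the retention of a single elite individual are all assumptions made for the example, and the name `evolve` is hypothetical.

```python
import random


def evolve(fitness, n_bits, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    # Random initialization of the population (n_bits must be at least 2).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def select():
        """Fitness-biased selection: the better of two random individuals."""
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Keep the current best unchanged (elitism, an assumption of this sketch).
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            mum, dad = select(), select()
            # Inheritance: one-point crossover of the two parents.
            cut = rng.randrange(1, n_bits)
            child = mum[:cut] + dad[cut:]
            # Variation: flip each bit with probability 1/n_bits.
            child = [b ^ 1 if rng.random() < 1.0 / n_bits else b for b in child]
            children.append(child)
        # Replacement: the children become the next population.
        population = children
    return max(population, key=fitness)
```

For example, `evolve(sum, n_bits=16)` searches for the 16-bit string with the most ones. Only the call `fitness(...)` touches the problem itself, which is what makes the method usable with black-box performance feedback.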

GP is the subset of EC in which the aim is to create 
executable programs. The search space is a set of pro- 
grams, such as the space of all possible Lisp programs 
within a subset of built-in functions and functions com- 
posed by a programmer or the space of numerical C 
functions. The program representation is an encoding of 
such a search space, for example an abstract syntax tree 
or a list of instructions. A program’s fitness is evaluated 
by executing it to see what it does. New programs are 
created by inheritance and variation of material from
parent programs, with constraints to ensure syntactic
correctness.

Fig. 43.1 The fundamental loop of EC (diagram stages: empty
population, random initialization, population, fitness
evaluation and selection, parents, crossover and mutation,
children, replacement)

We define a program as a data structure capable 
of being executed directly by a computer, or of being 
compiled to a directly executable form by a compiler, 
or of interpretation, leading to execution of low-level 
code, by an interpreter. A key feature of some pro- 
gramming languages, such as Lisp, is homoiconicity: 
program code can be viewed as data. This is essential in 
GP, since when the algorithm operates on existing pro- 
grams to make new ones, it is regarding them as data; 
but when they are being executed in order to determine 
what they do, they are being regarded as the program 
code. This double meaning echoes that of DNA (de- 
oxyribonucleic acid), which is both data and code in 
the same sense. 
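
The dual role of a program as code and as data can be illustrated with a small Python sketch. Python is not homoiconic in the way Lisp is, so the program is represented here as a nested tuple in prefix form; the example expression (x * y) - (x + 2) and the helper names `evaluate`, `size` and `swap_operator` are assumptions made for illustration.

```python
import operator

# The functions available to programs in this toy language.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env):
    """Treat the structure as code: execute it in the environment `env`."""
    if isinstance(expr, tuple):
        op, *args = expr
        return FUNCTIONS[op](*(evaluate(a, env) for a in args))
    if isinstance(expr, str):
        return env[expr]   # a variable bound in the environment
    return expr            # a numeric constant


def size(expr):
    """Treat the structure as data: count its nodes."""
    if isinstance(expr, tuple):
        return 1 + sum(size(a) for a in expr[1:])
    return 1


def swap_operator(expr, old, new):
    """Treat the structure as data: build a new program in which every
    use of the function `old` is replaced by `new`."""
    if isinstance(expr, tuple):
        op, *args = expr
        return (new if op == old else op,
                *(swap_operator(a, old, new) for a in args))
    return expr


# (x * y) - (x + 2) in prefix form.
program = ("-", ("*", "x", "y"), ("+", "x", 2))
variant = swap_operator(program, "+", "*")   # (x * y) - (x * 2)
```

With `env = {"x": 3, "y": 4}`, `evaluate(program, env)` gives 7 and `evaluate(variant, env)` gives 6, while `size(program)` is 7. The same tuple is inspected and rewritten as data by `size` and `swap_operator`, and run as code by `evaluate`.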

GP exists in many different forms which differ 
(among other ways) in their executable representation. 
As in programming by hand, GP usually considers and 
composes programs of varying length. Programs are also
generally hierarchical in some sense, with nesting of
statements or control. These representation properties
(variable length and hierarchical structure) raise a very
different set of technical challenges for GP compared to
typical EC.

GP is very promising, because programs are so general. A
program can define and operate on any data structure,
including numbers, strings, lists, dictionaries, sets,
permutations, trees, and graphs [43.10-12]. Via Turing
completeness, a program can emulate any model of
computation, including Turing machines, cellular automata,
neural networks, grammars, and finite-state machines
[43.13-18].

A program can be a data regression model [43.19] or a
probability distribution. It can express the growth process
of a plant [43.20], the gait of a horse [43.21], or the
attack strategy of a group of lions [43.22]; it can model
behavior in the Prisoner's Dilemma [43.23] or play chess
[43.24], Pacman [43.25], or a car-racing game [43.26]. A
program can generate designs for physical objects, like a
space-going antenna [43.27], or plans for the organization
of objects, like the layout of a manufacturing facility
[43.28]. A program can implement a rule-based expert system
for medicine [43.29], a scheduling strategy for a factory
[43.30], or an exam timetable for a university [43.31]. A
program can recognize speech [43.32], filter a digital
signal [43.33], or process the raw output of a
brain-computer interface [43.34]. It can generate a piece of
abstract art [43.35], a 3-D (three-dimensional)
architectural model [43.36], or a piece of piano music
[43.37].

A program can interface with natural or man-made sensors
and actuators in the real world, so it can both act and
react [43.38]. It can interact with a user or with remote
sites over the network [43.39]. It can also introspect and
copy or modify itself [43.40]. A program can be
nondeterministic [43.41]. If true AI is possible, then a
program can be intelligent [43.42].


43.2 History


GP has a surprisingly long history, dating back to very
shortly after von Neumann's 1945 description of the
stored-program architecture [43.43] and the 1946 creation of
ENIAC [43.44], sometimes regarded as the first
general-purpose computer. In 1948, Turing stated the aim of
machine intelligence and recognized that evolution might
have something to teach us in this regard [43.45]:


Further research into intelligence of machinery will
probably be very greatly concerned with searches.
[...] There is the genetical or evolutionary search
by which a combination of genes is looked for, the
criterion being survival value. The remarkable success
of this search confirms to some extent the idea
that intellectual activity consists mainly of various
kinds of search.




However, Turing also went a step further. In 1950, he 
more explicitly stated the aim of automatic program- 
ming (AP) and a mapping between biological evolution 
and program search [43.46]: 


We have [...] divided our problem [automatic programming]
into two parts. The child-program [Turing machine] and the
education process. These two remain very closely connected.
We cannot expect to find a good child-machine at the first
attempt. One must experiment with teaching one such machine
and see how well it learns. One can then try another
and see if it is better or worse. There is an obvious
connection between this process and evolution, by
the identifications:


• Structure of the child machine = Hereditary material

• Changes = Mutations

• Natural selection = Judgment of the experimenter.


This is an unmistakeable, if abstract, description of GP 
(though a computational fitness function is not envis- 
aged). 

Several other authors expanded on the aims and vision of
AP and machine intelligence. In 1959, Samuel wrote that the
aim was to be able to "Tell the computer what to do, not
how to do it" [43.47]. An important early attempt at
implementation of AP was the 1958 learning machine of
Friedberg [43.48].

In 1963, McCarthy summarized [43.1] several rep- 
resentations with which machine intelligence might 
be attempted: neural networks, Turing machines, and 
calculator programs. With the latter, McCarthy was re- 
ferring to Friedberg’s work. McCarthy was prescient 
in identifying important issues such as representations,
operator behavior, density of good programs in the
search space, sufficiency of the search space, appro- 
priate fitness evaluation, and self-organized modularity. 
Many of these remain open issues in GP [43.49]. 

Fogel et al.’s 1960s evolutionary programming may 
be the first successful implementation of GP [43.50]. 
It used a finite-state machine representation for pro- 
grams, with specialized operators to ensure syntactic 
correctness of offspring. A detailed history is available 
in Fogel’s 2006 book [43.51]. 

In the 1980s, inspired by the success of genetic al- 
gorithms (GAs) and learning classifier systems (LCSs), 
several authors experimented with hierarchically struc- 
tured and program-like representations. Smith [43.52] 
proposed a representation of a variable-length list of 
rules which could be used for program-like behavior 
such as maze navigation and poker. Cramer [43.53] was 
the first to use a tree-structured representation and ap- 
propriate operators. With a simple proof of concept, it 
successfully evolved a multiplication function in a sim- 
ple custom language. Schmidhuber [43.54] describes 
a GP system with the possibility of Turing complete- 
ness, though the focus is on meta-learning aspects. 
Fujiki and Dickinson [43.55] generated Lisp code for
the prisoner's dilemma, and Bickel and Bickel [43.56] used
a GA to create variable-length lists of rules, each of
which had a tree structure. An artificial life approach
using machine-code genomes was used by Ray [43.57]. 
All of these would likely be regarded as on-topic in 
a modern GP conference. 

Fig. 43.2a-c The StdGP representation is an abstract syntax
tree. The expression that will be evaluated in the second
tree from left is, in inorder notation, (x * y) - (x + 2).
In preorder, or the notation of Lisp-style S-expressions,
it is (- (* x y) (+ x 2)). GP presumes that the variables x
and y will already be bound to some value in the execution
environment when the expression is evaluated. It also
presumes that the operations * and -, etc. are defined.
Note that all interior tree nodes are effectively operators
in some computational language. In standard GP parlance,
these operators are called functions, and the leaf tree
nodes, which accept no arguments and typically represent
variables bound to data values from the problem domain, are
referred to as terminals

However, the founding of the modern field of GP,
and the invention of what is now called standard GP,
are credited to Koza [43.19]. In addition to the abstract
syntax tree notation (Sect. 43.3.3), the key innovations
were subtree crossover (Sect. 43.3.3) and the description
and set-up of many test problems. In this and
later research [43.10, 58, 59] symbolic regression of
synthetic data and real-world time series, Boolean prob- 
lems, and simple robot control problems such as the 
lawnmower problem and the artificial ant with Santa 
Fe trail were introduced as benchmarks and solved 
successfully for the first time, demonstrating that GP 
was a potentially powerful and general-purpose method
capable of solving machine learning-style problems,
albeit conventional academic versions of them. Mutation
was minimized in order to make it clear that
GP was different from random search. GP took on 
its modern form in the years following Koza’s 1992 
book: many researchers took up work in the field, new 
types of GP were developed (Sect. 43.3), successful
applications appeared (Sect. 43.4), key research topics
were identified (Sect. 43.5), further books were
written, and conferences and journals were established
(Sect. 43.6).

Another important milestone in the history of GP
was the 2004 establishment of the Humies, the awards
for human-competitive results produced by EC methods.
The entries are judged for matching or exceeding
human-produced solutions to the same or similar problems,
and for criteria such as patentability and publishability.
The impressive list of human-competitive
results [43.60] again helps to demonstrate to researchers
and clients outside the field of GP that it is powerful and
general purpose.


43.3 Taxonomy of AI and GP


In this section, we present a taxonomy which firstly
places GP in the context of the broader fields of EC,
ML, and artificial intelligence (AI). It then classifies GP
techniques according to their representations and their
population models (Fig. 43.3).


43.3.1 Placing GP in an AI Context


GP is a type of EC, which is a type of ML,
which is itself a subset of the broader field of AI
(Fig. 43.3). Carbonell et al. [43.61] classify ML techniques
according to the underlying learning strategy,
which may be rote learning, learning from instruction,
learning by analogy, learning from examples, and
learning from observation and discovery. In this classification,
EC and GP fit in the learning from examples
category, in that an (individual, fitness) pair is an example
drawn from the search space together with its
evaluation.

It is also useful to see GP as a subset of another 
field, AP. The term automatic programming seems to 
have had different meanings at different times, from 
automated card punching, to compilation, to template-driven
source generation, then generation techniques
such as the Unified Modeling Language (UML), to the ambitious
aim of creating software directly from a natural-language
English specification [43.62]. We interpret AP
to mean creating software by specifying what to do 
rather than how to do it [43.47]. GP clearly fits into this 
category. Other nonevolutionary techniques also do so, 
for example inductive programming (IP). The main dif- 
ference between GP and IP is that typically IP works 
only with programs which are known to be correct, 
achieving this using inductive methods over the specifications
[43.63]. In contrast, GP is concerned mostly
with programs which are syntactically correct, but behaviorally
suboptimal.

Fig. 43.3 A taxonomy of AI, EC, and GP


43.3.2 Taxonomy of GP 


It is traditional to divide EC into four main subfields: 
evolution strategies (ES) [43.64, 65], evolutionary pro- 
gramming (EP) [43.50], GAs [43.66], and GP. In this 
view, ES is chiefly characterized by real-valued opti- 
mization and self-adaptation of algorithm parameters; 
EP by a finite-state machine representation (later gen- 
eralized) and the absence of crossover; GA by the 
bitstring representation; and GP by the abstract syn- 
tax tree representation. While historically useful, this 
classification is not exhaustive: in particular it does 
not provide a home for the many alternative GP rep- 
resentations which now exist. It also separates EP and 
GP, though they are both concerned with evolving pro- 
grams. We prefer to use the term GP in a general sense 
to refer to all types of EC which evolve programs. We 
use the term standard GP (StdGP) to mean Koza-style 
GP with a tree representation. With this view, StdGP 
and EP are types of GP, as are several others discussed 
below. In the following, we classify GP algorithms ac- 
cording to their representation and according to their 
population model. 


43.3.3 Representations 


Throughout EC, it is useful to contrast direct and indi- 
rect representations. Standard GP is direct, in that the 
genome (the object created and modified by the genetic 
operators) serves directly as an executable program. 
Some other GP representations are indirect, meaning 
that the genome must be decoded or translated in 
some way to give an executable program. An example 
is grammatical evolution (GE, see below), where the 
genome is an integer array which is used to generate 
a program. Indirect representations have the advantage 
that they may allow an easier definition of the genetic 
operators, since they may allow the genome to exist 
in a rather simpler space than that of executable pro- 
grams. Indirect representations also imitate somewhat 
more closely the mechanism found in nature, a mapping 
from DNA (deoxyribonucleic acid) to RNA (ribonu- 
cleic acid) to mRNA (messenger RNA) to codons to 
proteins and finally to cells. The choice between direct 
and indirect representations also affects the structure 
of the fitness landscape (Sect. 43.5.2). In the following,
we present a nonexhaustive selection of the main
representations used in GP, in each case describing
initialization and the two key operators: mutation and
crossover.


Standard GP 
In Standard GP (StdGP), the representation is an ab- 
stract syntax tree, or can be seen as a Lisp-style 
S-expression. All nodes are functions and all arguments 
are the same type. A function accepts zero or more 
arguments and returns a single value. Trees can be ini- 
tialized by recursive random growth starting from a null 
node. StdGP uses parameterized initialization methods 
that diversify the size and structure of initial trees. Fig- 
ure 43.2a shows a tree in the process of initialization. 
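
The tree representation and its recursive random initialization can be sketched in a few lines. This is a minimal illustration, not Koza's implementation: the nested-tuple encoding, the function and terminal sets, and the names `grow`, `depth`, `evaluate` and `p_terminal` are all assumptions made for this example.

```python
import operator
import random

# Illustrative function set (name -> (callable, arity)) and terminal set.
FUNCTIONS = {"+": (operator.add, 2), "-": (operator.sub, 2), "*": (operator.mul, 2)}
TERMINALS = ["x", "y", 2.0]


def grow(rng, max_depth, p_terminal=0.3):
    """Grow a random tree: a terminal, or a tuple (function name, child, ...)."""
    if max_depth == 0 or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][1]
    return (name,) + tuple(grow(rng, max_depth - 1, p_terminal) for _ in range(arity))


def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def evaluate(tree, env):
    """Evaluate a tree with its variables bound to the values in env."""
    if isinstance(tree, tuple):
        func = FUNCTIONS[tree[0]][0]
        return func(*(evaluate(child, env) for child in tree[1:]))
    if isinstance(tree, str):
        return env[tree]      # variable terminal
    return tree               # constant terminal


# The expression (- (* x y) (+ x 2)) written as nested tuples.
example = ("-", ("*", "x", "y"), ("+", "x", 2.0))
value = evaluate(example, {"x": 3.0, "y": 4.0})

# A randomly grown tree whose depth is bounded by max_depth.
random_tree = grow(random.Random(0), max_depth=4)
```

The `max_depth` and `p_terminal` parameters stand in for the parameterized initialization methods mentioned above; varying them diversifies the size and shape of the initial trees.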

Trees can be crossed over by cutting and swap- 
ping the subtrees rooted at randomly chosen nodes, as 
shown in Fig. 43.2b. They can be mutated by cutting 
and regrowing from the subtrees of randomly cho- 
sen nodes, as shown in Fig. 43.2c. Another mutation 
operator, HVL-Prime, is shown later in Fig. 43.11. 
Note that crossover or mutation creates an offspring 
of potentially different size and structure, but the off- 
spring remains syntactically valid for evaluation. With 
these variations, a tree could theoretically grow to
infinite size or height. To circumvent this, as a practicality,
a hard parameterized threshold for size or
height or some other threshold is used. Offspring that violate
the threshold are typically rejected. Bias may also
be applied in the randomized selection of crossed-over
subtree roots. A common variation of StdGP
is strongly typed GP (STGP) [43.67, 68], which supports
functions accepting arguments and returning values
of specific types by means of specialized mutation
and crossover operations that respect these types.
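
Subtree crossover, subtree mutation and the depth threshold can be sketched as follows. This is a simplified illustration under stated assumptions: trees are nested tuples, crossover returns a single offspring rather than a swapped pair, and an offspring that exceeds the threshold is rejected by returning the unchanged parent. All names (`paths`, `get`, `put`, `crossover`, `mutate`) are hypothetical.

```python
import random

ARITY = {"+": 2, "-": 2, "*": 2}   # illustrative function set
TERMINALS = ["x", "y", 2.0]        # illustrative terminal set


def grow(rng, max_depth):
    """Recursive random growth of a tree stored as nested tuples."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(ARITY))
    return (name,) + tuple(grow(rng, max_depth - 1) for _ in range(ARITY[name]))


def depth(tree):
    return 1 + max(depth(c) for c in tree[1:]) if isinstance(tree, tuple) else 0


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from paths(tree[i], prefix + (i,))


def get(tree, path):
    """Return the subtree rooted at the given path."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], subtree),) + tree[i + 1:]


def crossover(rng, parent_a, parent_b, max_depth):
    """Graft a random subtree of parent_b onto a random node of parent_a."""
    cut_a = rng.choice(list(paths(parent_a)))
    cut_b = rng.choice(list(paths(parent_b)))
    child = put(parent_a, cut_a, get(parent_b, cut_b))
    return child if depth(child) <= max_depth else parent_a   # reject violations


def mutate(rng, parent, max_depth, regrow_depth=2):
    """Replace a random subtree with a freshly grown one."""
    cut = rng.choice(list(paths(parent)))
    child = put(parent, cut, grow(rng, regrow_depth))
    return child if depth(child) <= max_depth else parent     # reject violations
```

Because every node has the same type, any subtree can replace any other, so offspring are always syntactically valid. Choosing cut points uniformly over all nodes is the simplest policy; biasing the choice toward interior nodes is one way to apply the bias mentioned above.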


Executable Graph Representations 
A natural generalization of the executable tree rep- 
resentation of StdGP is the executable graph. Neural 
networks can be seen as executable graphs in which 
each node calculates a weighted sum of its inputs and 
outputs the result after a fixed shaping function such 
as tanh(). Parallel and distributed GP (PDGP) [43.69] 
is more closely akin to StdGP in that nodes calculate 
different functions, depending on their labels, and do 
not perform a weighted sum. It also allows the topol- 
ogy of the graph to vary, unlike the typical neural 
network. Cartesian GP (CGP) [43.70] uses an integer- 
array genome and a mapping process to produce the 
graph. Each block of three integer genes codes for 
a single node in the graph, specifying the indices of
its inputs and the function to be executed by the node
(Fig. 43.4). 

Neuro-evolution of augmenting topologies 
(NEAT) [43.71] again allows the topology to vary, 
and allows nodes to be labelled by the functions they 
perform, but in this case each node does perform 
a weighted sum of its inputs. Each of these represen- 
tations uses different operators. For example, CGP 
uses simple array-oriented (GA-style) initialization, 
crossover, and mutation operators (subject to some 
customizations). 
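
The CGP genome-to-graph mapping can be illustrated with a short decoder. The sketch below uses the genome and function codes shown in Fig. 43.4 (0 for +, 1 for -, 2 for *, 3 for /). The operand order for the noncommutative functions, the absence of division protection, and the name `decode_and_run` are assumptions made for this example.

```python
import operator

# Function codes as they appear in Fig. 43.4.
FUNCS = {0: operator.add, 1: operator.sub, 2: operator.mul, 3: operator.truediv}


def decode_and_run(genome, inputs):
    """Evaluate every node; returns a list indexed by node number.

    Nodes 0..len(inputs)-1 hold the input variables. Each block of three
    genes (in1, in2, function) then adds one node that may refer to any
    earlier node by its index.
    """
    values = list(inputs)
    for k in range(0, len(genome), 3):
        in1, in2, func = genome[k:k + 3]
        values.append(FUNCS[func](values[in1], values[in2]))
    return values


# Genome from Fig. 43.4, one block of three integers per node.
GENOME = [0, 1, 2,  2, 1, 0,  1, 0, 2,  2, 3, 1,  0, 4, 0,  3, 5, 3]

# Inputs x, y, z are nodes 0, 1, 2; the genome blocks become nodes 3 to 8.
values = decode_and_run(GENOME, [2.0, 3.0, 5.0])
```

Because the genome is a flat integer array, GA-style point mutation and array crossover apply directly; the only customization needed is to keep each input gene pointing at an earlier node.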


Finite-State Machine Representations 

Some GP representations use graphs in a different way: 
the model of computation is the finite-state machine 
rather than the executable functional graph (Fig. 43.5). 
The original incarnation of evolutionary programming 
(EP) [43.72] is an example. In a typical implementa- 
tion [43.72], five types of mutation are used: adding 
and deleting states, changing the initial state, changing 
the output symbol attached to edges, and changing the 
edges themselves. In this implementation, crossover is 
not used. 
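
The five mutation types can be sketched on a small Mealy-style machine. This is an illustrative sketch only, not the implementation of [43.72]: the dictionary encoding, the binary input and output alphabets, the rerouting policy after a state is deleted, and the names `random_fsm`, `run` and `mutate` are assumptions.

```python
import random

INPUTS = [0, 1]    # illustrative input alphabet
OUTPUTS = [0, 1]   # illustrative output alphabet


def random_fsm(rng, n_states):
    """trans[state][symbol] = (next state, output symbol)."""
    return {
        "initial": 0,
        "trans": [{s: (rng.randrange(n_states), rng.choice(OUTPUTS)) for s in INPUTS}
                  for _ in range(n_states)],
    }


def run(fsm, symbols):
    """Feed an input sequence to the machine and collect its outputs."""
    state, outputs = fsm["initial"], []
    for s in symbols:
        state, out = fsm["trans"][state][s]
        outputs.append(out)
    return outputs


def mutate(rng, fsm):
    """Apply one of the five mutation types and return a new machine."""
    trans = [dict(row) for row in fsm["trans"]]
    initial, n = fsm["initial"], len(fsm["trans"])
    kind = rng.choice(["add", "delete", "initial", "output", "edge"])
    if kind == "add":
        trans.append({s: (rng.randrange(n + 1), rng.choice(OUTPUTS)) for s in INPUTS})
    elif kind == "delete" and n > 1:
        victim = rng.randrange(n)
        del trans[victim]

        def fix(target):
            # Reroute edges into the deleted state; renumber the later states.
            if target == victim:
                return rng.randrange(n - 1)
            return target - 1 if target > victim else target

        trans = [{s: (fix(t), o) for s, (t, o) in row.items()} for row in trans]
        initial = fix(initial)
    elif kind == "initial":
        initial = rng.randrange(n)
    elif kind == "output":
        q, s = rng.randrange(n), rng.choice(INPUTS)
        trans[q][s] = (trans[q][s][0], rng.choice(OUTPUTS))
    elif kind == "edge":
        q, s = rng.randrange(n), rng.choice(INPUTS)
        trans[q][s] = (rng.randrange(n), trans[q][s][1])
    return {"initial": initial, "trans": trans}
```

Every mutation leaves a machine in which each state has exactly one outgoing edge per input symbol, so offspring are always executable and no repair step is needed.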


Genome: [012 210 102 231 040 353]
With function symbols: [01* 21+ 10* 23- 04+ 35/]

Fig. 43.4 Cartesian GP. An integer-array genome is divided
into blocks: in each block the last integer specifies
a function (top-left). Then one node is created for each input
variable (x, y, z) and for each genome block. Nodes
are arranged in a grid and outputs are indexed sequentially
(bottom-left). The first elements in each block specify
the indices of the incoming links. The final graph is created
by connecting each node input to the node output
with the same integer label (right). Dataflow in the graph
is bottom to top. Multiple outputs can be read from the
topmost layer of nodes. In this example node 6 outputs
xy - z + y, node 7 outputs x + z + y, and node 8 outputs
xy/xy


Grammatical GP 
In grammatical GP [43.73], the context-free grammar 
(CFG) is the defining component of the representation. 
In the most common approach, search takes place in 
the space defined by a fixed nondeterministic CFG. The 
aim is to find a good program in that space. Often the 
CFG defines a useful subset of a programming language 
such as Lisp, C, or Python. Programs derived from the 
CFG can then be compiled or interpreted using either 
standard or special-purpose software. There are sev- 
eral advantages to using a CFG. It allows convenient 
definition of multiple data-types which are automati- 
cally respected by the crossover and mutation operators. 
It can introduce domain knowledge into the problem 
representation. For example, if it is known that good 
programs will consist of a conditional statement in- 
side a loop, it is easy to express this knowledge using 
a grammar. The grammar can restrict the ways in which 
program expressions are combined, for example mak- 
ing the system aware of physical units in dimensionally 
aware GP [43.74,75]. A grammatical GP system can 
conveniently be applied to new domains, or can incor- 
porate new domain knowledge, through updates to the 
grammar rather than large-scale reprogramming. 

In one early system [43.76], the derivation tree is 
used as the genome: initial individuals’ genomes are 
randomly generated according to the rules of the gram- 
mar. Mutation works by randomly generating a new 
subtree starting from a randomly chosen internal node 
in the derivation tree. Crossover is constrained to ex- 
change subtrees whose roots are identical. In this way, 
new individuals are guaranteed to be valid derivation 
trees. The executable program is then created from the 
genome by reading the leaves left to right. A later system,
grammatical evolution (GE) [43.77], instead uses
an integer-array genome. Initialization, mutation and
crossover are defined as simple GA-style array operations.
The genome is mapped to an output program by
using the successive integers of the genome to choose
among the applicable production choices at each step
of the derivation process. Figure 43.6 shows a simple
grammar, integer genome, derivation process, and
derivation tree. At each step of the derivation process,
the left-most nonterminal in the derivation is rewritten.
The next integer gene is used to determine, using the
mod rule, which of the possible productions is chosen.
The output program is the final step of the derivation
tree.

Fig. 43.5 EP representation: finite-state machine. In this
example, a mutation changes a state transition
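
The GE mapping with the mod rule can be sketched as follows. The grammar and genome here are hypothetical examples chosen for illustration, and the wrapping limit `max_wraps` (reusing the genome from its start when the genes run out) is an assumption of this sketch rather than something stated above.

```python
# Hypothetical grammar: each nonterminal maps to a list of productions.
GRAMMAR = {
    "<e>": [["<o>", "<e>", "<e>"], ["<v>"]],
    "<o>": [["+"], ["*"]],
    "<v>": [["x"], ["y"]],
}


def ge_map(genome, grammar, start="<e>", max_wraps=2):
    """Map an integer genome to a program string, or None if mapping fails."""
    derivation = [start]
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while True:
        # Find the left-most nonterminal still present in the derivation.
        idx = next((i for i, sym in enumerate(derivation) if sym in grammar), None)
        if idx is None:
            return " ".join(derivation)       # only terminals remain
        if used >= limit:
            return None                       # ran out of genes
        choices = grammar[derivation[idx]]
        gene = genome[used % len(genome)]     # wrap around the genome
        used += 1
        # Mod rule: the gene selects among the applicable productions.
        derivation[idx:idx + 1] = choices[gene % len(choices)]


program = ge_map([4, 1, 17, 61, 11, 8], GRAMMAR)
```

With this grammar and genome the derivation proceeds `<e>`, `<o><e><e>`, `*<e><e>`, `*<v><e>`, `*y<e>`, `*y<v>`, `*yx`. A genome such as `[0]` never terminates the recursion and is reported as a failed mapping.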

Although successful and widely used, GE has also 
been criticized for the disruptive effects of its operators 
with respect to preserving the modular functionality 
of parents. Another system, tree adjoining grammar- 
guided genetic programming (TAG3P) has also been 
used successfully [43.78]. Instead of a string-rewriting 
CFG, TAG3P uses the tree-rewriting tree adjoining 
grammars. The representation has the advantage, rel- 
ative to GE, that individuals are valid programs at 
every step of the derivation process. TAGs also have 
some context-sensitive properties [43.78]. However, it 
is a more complex representation. 

Another common alternative approach, surveyed 
by Shan et al. [43.79], uses probabilistic models over 
grammar-defined spaces, rather than direct evolutionary 
search. 


Linear GP 
In Linear GP (LGP), the program is a list of instructions
to be interpreted sequentially. In order to achieve complex
functionality, a set of registers acting as state or
memory is used. Instructions can read from or write to
the registers. Several registers, which may be read-only,
are initialized with the values of the input variables. One
register is designated as the output: its value at the end
of the program is taken as the result of the program.
Since a register can be read multiple times after writing,
an LGP program can be seen as having a graph
structure. A typical implementation is that of [43.80].
It uses instructions of three registers each, which typically
calculate a new value as an arithmetic function of
some registers and/or constants, and assign it to a register
(Fig. 43.7).

Fig. 43.6 GE representation. The grammar (top-left)
consists of several rules. The genome (center-left) is
a variable-length list of integers. At each step of the derivation
process (bottom-left), the left-most nonterminal is
rewritten as specified by a gene. The resulting derivation
tree is shown on the right: reading just the leaves gives the
derived program

It also allows conditional statements and looping. It 
explicitly recognizes the possibility of nonfunctioning 
code, or introns. Since there are no syntactic constraints 
on how multiple instructions may be composed to- 
gether, initialization can be as simple as the random 
generation of a list of valid instructions. Mutation can 
change a single instruction to a newly generated instruc- 
tion, or change just a single element of an instruction. 
Crossover can be performed over the two parents’ list 
structures, respecting instruction boundaries. 
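
A register-based interpreter for such programs can be sketched briefly. This is an illustration, not the system of [43.80]: the tuple encoding `(destination, operator, operand, operand)`, the register names, and the three-instruction program (one possible way to compute 4(x0 + x1)^2) are assumptions, and conditionals and loops are omitted.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_lgp(program, inputs, n_work=2, output="r0"):
    """Interpret a list of (dest, op, a, b) instructions sequentially."""
    regs = {f"x{i}": value for i, value in enumerate(inputs)}   # read-only inputs
    regs.update({f"r{i}": 0.0 for i in range(n_work)})          # read-write registers
    for dest, op, a, b in program:
        if dest.startswith("x"):
            continue  # writes to read-only registers are ignored in this sketch
        left = regs[a] if isinstance(a, str) else a     # register or constant
        right = regs[b] if isinstance(b, str) else b
        regs[dest] = OPS[op](left, right)
    return regs[output]


# Hypothetical program computing 4 * (x0 + x1)^2 with r0 as output register.
PROGRAM = [
    ("r0", "+", "x0", "x1"),
    ("r1", "*", "r0", "r0"),
    ("r0", "*", "r1", 4.0),
]
result = run_lgp(PROGRAM, [1.0, 2.0])

# The final instruction writes r1, which is never read again: an intron.
WITH_INTRON = PROGRAM + [("r1", "+", "x0", "x0")]
result_with_intron = run_lgp(WITH_INTRON, [1.0, 2.0])
```

Since any list of well-formed instructions is a valid program, random initialization, instruction-level mutation and list crossover at instruction boundaries need no further syntactic checks.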


Stack-Based GP 
A variant of linear GP avoids the need for registers
by adding a stack. The program is again a list of instructions,
each now represented by a single label. In
a simple arithmetic implementation, the label may be
one of the input variables (x_i), a numerical constant, or
a function (*, +, etc.). If it is a variable or constant,
the instruction is executed by pushing the value onto
the stack. If a function, it is executed by popping the
required number of operands from the stack, executing
the function on them, and pushing the result back
on. The result of the program is the value at the top of
the stack after all instructions have been executed. With
the stipulation that stack-popping instructions become
no-ops when the stack is empty, one can again implement
initialization, mutation, and crossover as simple
list-based operations [43.81]. One can also constrain the
operations to work on what are effectively subtrees, so
that stack-based GP becomes effectively equivalent to
a reverse Polish notation implementation of standard
GP [43.82]. A more sophisticated type of stack-based
GP is PushGP [43.83], in which multiple stacks are
used. Each stack is used for values of a different type,
such as integer, boolean, and float. When a function
requires multiple operands of different types, they are
taken as required from the appropriate stacks. With the
addition of an exec stack which stores the program
code itself, and the code stack which stores items of
code, both of which may be both read and written,
PushGP gains the ability to evolve programs with self-modification,
modularity, control structures, and even
self-reproduction.

Fig. 43.7 Linear GP representation. This implementation
has four registers in total: x0 and x1 are read-only, r0 and r1
are read-write (top). The representation is a list
of register-oriented instructions (bottom). In this example
program of three instructions, r0 is the output register, and
the formula 4(x0 + x1)^2 is calculated
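
The single-stack variant can be sketched in a few lines. The function set, the label encoding and the name `run_stack` are assumptions for illustration; the no-op rule is applied here whenever the stack holds fewer operands than the function needs, which slightly generalizes the empty-stack stipulation.

```python
import operator

# Illustrative function set: label -> (callable, number of operands).
FUNCS = {"+": (operator.add, 2), "-": (operator.sub, 2), "*": (operator.mul, 2)}


def run_stack(program, env):
    """Execute a list of labels; return the top of the stack (None if empty)."""
    stack = []
    for label in program:
        if label in FUNCS:
            func, arity = FUNCS[label]
            if len(stack) < arity:
                continue                       # not enough operands: no-op
            args = [stack.pop() for _ in range(arity)][::-1]
            stack.append(func(*args))
        elif isinstance(label, str):
            stack.append(env[label])           # input variable
        else:
            stack.append(label)                # numerical constant
    return stack[-1] if stack else None


env = {"x": 3.0, "y": 4.0}

# Reverse Polish form of (x * y) - (x + 2).
rpn_value = run_stack(["x", "y", "*", "x", 2.0, "+", "-"], env)

# A program that starts with a function: the "+" is skipped as a no-op.
odd_value = run_stack(["+", "x"], env)
```

Because every list of labels executes without error, any cut-and-splice crossover or label-replacement mutation yields a runnable program.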


Low-Level Programming 

Finally, several authors have evolved programs di- 
rectly in real-world low-level programming languages. 
Schulte et al. [43.84] automatically repaired programs 
written in Java byte code and in x86 assembly. Orlov 
and Sipper [43.85] evolved programs such as trail nav- 
igation and image classification de novo in Java byte 
code. This work made use of a specialized crossover 
operator which performed automated checks for com- 
patibility of the parent programs’ stack and control flow 
state. Nordin [43.86] proposed a machine-code repre- 
sentation for GP. Programs consist of lists of low-level 
register-oriented instructions which execute directly, 
rather than in a virtual machine or interpreter. The re- 
sult is a massive speed-up in execution. 


43.3.4 Population Models 


It is also useful to classify GP methods according 
to their population models. In general the population 
model and the representation can vary independently, 
and in fact all of the following population models can be applied
with any EC representation including bitstrings
and real-valued vectors, as well as with GP represen- 
tations. 

The simplest possible model, hill-climbing, uses
just one individual at a time [43.87]. At each iteration,
offspring are created until one of them is fitter
than the current individual, which it then replaces. If at
any iteration it becomes impossible to find an improvement,
the algorithm has climbed the hill, i.e., reached
a local optimum, and stops. It is common to use a random
restart in this case. The hill-climbing model can
be used in combination with any representation. Note
that it does not use crossover. Variants include ES-style
(μ, λ) or (μ+λ) schemes, in which multiple parents
each give rise to multiple offspring by mutation.
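
The hill-climbing loop is independent of the representation, which a short sketch makes concrete. Here the individual is a bit tuple and fitness is the number of ones, purely for illustration. The budget `max_tries` is an assumed stand-in for "impossible to find an improvement", and the names `hill_climb` and `flip_one_bit` are hypothetical.

```python
import random


def hill_climb(rng, start, mutate, fitness, max_tries=200):
    """Keep one individual; replace it whenever a fitter offspring appears."""
    current, current_fit = start, fitness(start)
    while True:
        for _ in range(max_tries):
            candidate = mutate(rng, current)
            candidate_fit = fitness(candidate)
            if candidate_fit > current_fit:
                current, current_fit = candidate, candidate_fit
                break                      # improvement found: next iteration
        else:
            # No improvement within the budget: treat as a local optimum.
            return current, current_fit


def flip_one_bit(rng, bits):
    """Mutation for the illustrative bit-tuple representation."""
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


best, best_fit = hill_climb(random.Random(0), (0,) * 8, flip_one_bit, sum)
```

Swapping in a GP tree with a subtree mutation operator in place of `flip_one_bit` would leave `hill_climb` itself unchanged. A random restart would simply call it again from a new starting individual.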

The most common model is an evolving popula- 
tion. Here a large number of individuals (from tens to 
many thousands) exist in parallel, with new genera- 
tions being created by crossover and mutation among 
selected individuals. Variants include the steady-state 
and the generational models. They differ only in that 
the steady-state model generates one or a few new indi- 
viduals at a time, adds them to the existing population 
and removes some old or weak individuals; whereas the 
generational model generates an entirely new popula- 
tion all at once and discards the old one. 
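
The difference between the two variants is easiest to see side by side. The sketch below assumes tournament selection, replacement of the single worst individual in the steady-state case, and a toy bit-tuple representation; none of these choices is prescribed by the text, and all function names are hypothetical.

```python
import random


def tournament(rng, population, fitness, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def generational_step(rng, population, fitness, vary):
    """Build an entirely new population and discard the old one."""
    return [vary(rng, tournament(rng, population, fitness),
                 tournament(rng, population, fitness))
            for _ in population]


def steady_state_step(rng, population, fitness, vary):
    """Create one offspring and let it replace the current worst individual."""
    child = vary(rng, tournament(rng, population, fitness),
                 tournament(rng, population, fitness))
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    updated = list(population)
    updated[worst] = child
    return updated


def vary(rng, a, b):
    """Toy variation for bit tuples: one-point crossover, then one bit flip."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    i = rng.randrange(len(child))
    return child[:i] + (1 - child[i],) + child[i + 1:]


rng = random.Random(0)
population = [tuple(rng.randint(0, 1) for _ in range(10)) for _ in range(20)]
next_generation = generational_step(rng, population, sum, vary)
after_one_birth = steady_state_step(rng, population, sum, vary)
```

Both steps keep the population size constant; the steady-state step changes at most one slot per call, so good individuals can persist for many births.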

The island model is a further addition, in which 
multiple populations all evolve in parallel, with infre- 
quent migration between them [43.88]. 
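
A migration step for the island model might look as follows. The ring topology, the policy of copying each island's best individuals over the next island's worst, and the name `migrate` are assumptions for this sketch; integers with identity fitness stand in for real individuals.

```python
def migrate(islands, fitness, n_migrants=1):
    """Ring migration: island i receives copies of the best of island i-1."""
    best = [sorted(pop, key=fitness, reverse=True)[:n_migrants] for pop in islands]
    new_islands = []
    for i, pop in enumerate(islands):
        incoming = best[(i - 1) % len(islands)]
        # Keep the fittest residents; the worst make room for the migrants.
        survivors = sorted(pop, key=fitness, reverse=True)[:len(pop) - n_migrants]
        new_islands.append(survivors + incoming)
    return new_islands


# Toy individuals: integers whose fitness is their own value.
islands = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
after = migrate(islands, fitness=lambda v: v)
```

In a full run each island would evolve on its own for several generations between calls to `migrate`, which is what keeps migration infrequent.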

In coevolutionary models, the fitness of an individ- 
ual cannot be calculated in an endogenous way. Instead 
it depends on the individual’s relationship to other in- 
dividuals in the population. A typical example is in 
game-playing applications such as checkers, where the 
best way to evaluate an individual is to allow it to play 
against other individuals. Coevolution can also use fit- 
ness defined in terms of an individual’s relationship to 
individuals in a population of a different type. A good 
example is the work of [43.89], which uses a type of 
predator-prey relationship between populations of pro- 
grams and populations of test cases. The test cases 
(predators) evolve to find bugs in the programs; the pro- 
grams (prey) evolve to fix the bugs being tested for by 
the test suites. 

Another group of highly biologically inspired pop- 
ulation models are those of swarm intelligence. Here 
the primary method of learning is not the creation of 
new individuals by inheritance. Instead, each individ- 
ual generally lives for the length of the run, but moves 
about in the search space with reference to other indi- 
viduals and their current fitness values. For example, in 
particle swarm optimization (PSO) individuals tend to 
move toward the global best and toward the best point in 
their own history, but tend to avoid moving too close to 
other individuals. Although PSO and related methods 
such as differential evolution (DE) are best applied in 
real-valued optimization, their population models and 
operators can be abstracted and applied in GP methods 
also [43.90, 91]. 


Finally, we come to estimation of distribution algorithms
(EDAs). Here the idea is to create a population,
select a subsample of the best individuals, model that
subsample using a distribution, and then create a new
population by sampling the distribution. This approach
is particularly common in grammar-based GP [43.73],
though it is also used with other representations [43.92-94].
The modeling-sampling process could be regarded
as a type of whole-population crossover. Alternatively
one can view EDAs as being quite far from the biological
inspiration of most EC, and in a sense they bridge
the gap between EC and statistical ML.


43.4 Uses of GP


Our introduction (Sect. 43.1) has touched on a wide array
of domains in which GP has been applied. In this
section, we give more detail on just a few of these.


43.4.1 Symbolic Regression


Symbolic regression is one of the most common tasks
for which GP is used [43.19, 95, 96]. It is used as a component
in techniques like data modeling, clustering, and
classification, for example in the modeling application
outlined in Sect. 43.4.2. It is named after techniques
such as linear or quadratic regression, and can be seen
as a generalization of them. Unlike those techniques
it does not require a priori specification of the model.
The goal is to find a function in symbolic form which
models a data set. A typical symbolic regression is implemented
as follows.

It begins with a dataset which is to be regressed,
in the form of a numerical matrix (Fig. 43.8, left).
Each row i is a data-point consisting of some input (explanatory)
variables x_i and an output (response) variable
y_i to be modeled. The goal is to produce a function
f(x) which models the relationship between x and y as
closely as possible. Figure 43.8 (right) plots the existing
data and one possible function f.

Typically StdGP is used, with a numerical language
which includes arithmetic operators, functions like sinusoids
and exponentials, numerical constants, and the
input variables of the dataset. The internal nodes of each
StdGP abstract syntax tree will be operators and functions,
and the leaf nodes will be constants and variables.

To calculate the fitness of each model, the explanatory
variables of the model are bound to their values
at each of the training points x_i in turn. The model is
executed, and the output f(x_i) is the model’s predicted
response. This value ŷ_i is then compared to the response
of the training point y_i. The error can be visualized as
the dotted lines in Fig. 43.8 (right). Fitness is usually
defined as the root-mean-square error of the model’s
outputs versus the training data. In this formulation,
therefore, fitness is to be minimized


$$\mathrm{fitness}(f) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2}$$

Over the course of evolution, the population moves to- 
ward better and better models f of the training data. 
After the run, a testing data set is used to confirm that 
the model is capable of generalization to unseen data. 


43.4.2 Machine Learning 


Like other ML methods, GP is successful in quantita- 
tive domains where data is available for learning and 
both approximate solutions and incremental improve- 
ments are valued. In modeling or supervised learning, 
GP is preferable to other ML methods in circumstances 
where the form of the solution model is unknown a pri- 
ori because it is capable of searching among possible 
forms for the model. Symbolic regression can be used 
as an approach to classification, regression modeling, 
and clustering. It can also be used to automatically 
extract influential features, since it is able to pare 
down the feature set it is given at initialization. GP-derived
classifiers have been integrated into ensemble
learning approaches and GP has been used in reinforcement
learning (RL) contexts. Figure 43.9 shows GP as
a means of ML which allows it to address problems
such as planning, forecasting, pattern recognition, and
modeling.

x    y
0.1  0.3
0.2  0.6
0.3  0.5
0.4  0.7
0.5  0.5

Fig. 43.8 Symbolic regression: a matrix of data (left) is
to be modeled by a function. It is plotted as dots in the
figure on the right. A candidate function f (solid line) can
be plotted, and its errors (dotted lines) can be visualized

For the sensory evaluation problem described 
in [43.97], the authors use GP as the anchor of a ML 
framework (Fig. 43.10). A panel of assessors provides 
liking scores for many different flavors. Each flavor 
consists of a mixture of ingredients in different proportions.
The goals are to discover the dependency of
a liking score on the concentration levels of flavors’
ingredients, to identify ingredients that drive liking,
to segment the panel into groups with similar liking
preferences, and to optimize flavors to maximize liking
per group. The framework uses symbolic regression and
ensemble methods to generate multiple diverse expla- 
nations of liking scores, with confidence information. It 
uses statistical techniques to extrapolate from the genet- 
ically evolved model ensembles to unobserved regions 
of the flavor space. It also segments the assessors into 
groups which either have the same propensity to like 
flavors, or whose liking is driven by the same ingredi- 
ents. 

Sensory evaluation data is very sparse and there
is large variation among the responses of different
assessors. A Pareto-GP algorithm (which uses multiobjective
techniques to maximize model accuracy and
minimize model complexity [43.98]) was therefore
used to evolve an ensemble of models for each assessor
and to use this ensemble as a source of robust variable
importance estimation. The frequency of variable
occurrences in the models of the ensemble was interpreted
as information about the ingredients that drive
the liking of an assessor. Model ensembles with the
same dominance of variable occurrences, and which
demonstrate similar effects when the important variables
are varied, were grouped together to identify
assessors who are driven by the same ingredient set and
in the same direction. Varying the input values of the
important variables, while using the model ensembles
of these panel segments, provided a means of conducting
focused sensitivity analysis. Subsequently, the same
model ensembles when clustered constitute the black
box which is used by an evolutionary algorithm in its
optimization of flavors that are well liked by assessors
who are driven by the same ingredient.

Fig. 43.9 GP as a component in ML. Symbolic regression can be
used as an approach to many ML tasks, and integrated with other
ML techniques
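
The two ideas used above, selecting models that trade accuracy against complexity and counting variable occurrences across the resulting ensemble, can be sketched as follows. This is not the algorithm of [43.98]: the model records, their error and complexity values, and the names `dominates` and `pareto_front` are hypothetical.

```python
from collections import Counter

# Hypothetical models: (name, error, complexity, variables used).
MODELS = [
    ("m1", 0.30, 3, {"a"}),
    ("m2", 0.20, 7, {"a", "b"}),
    ("m3", 0.25, 9, {"b", "c"}),        # worse than m2 on both objectives
    ("m4", 0.10, 15, {"a", "b", "d"}),
]


def dominates(p, q):
    """True if p is no worse than q on both objectives and better on one."""
    return p[1] <= q[1] and p[2] <= q[2] and (p[1] < q[1] or p[2] < q[2])


def pareto_front(models):
    """Keep the models that no other model dominates."""
    return [m for m in models if not any(dominates(other, m) for other in models)]


front = pareto_front(MODELS)

# Variable occurrence counts across the ensemble, read as importance hints.
variable_counts = Counter(v for _, _, _, variables in front for v in variables)
```

Here minimizing error plays the role of maximizing accuracy. A variable that appears in most nondominated models, such as `a` in this toy ensemble, would be read as a likely driver of the response.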


43.4.3 Software Engineering 


At least three areas of software engineering have 
been tackled with remarkable success by GP: bug- 
fixing [43.99], parallelization [43.100, 101], and op- 
timization [43.102-104]. These three areas are very 
different in their aims, scope, and methods; however, 
they all need to deal with two key problems in this do- 
main: the very large and unconstrained search space, 
and the problem of program correctness. They therefore 
do not aim to evolve new functionality from scratch, but 
instead use existing code as material to be transformed 
in some way; and they either guarantee correctness of 
the evolved programs as a result of their representa- 
tions, or take advantage of existing test suites in order 
to provide strong evidence of correctness. 

Fig. 43.10 GP symbolic regression is unique and useful as an ML technique because it obviates the need to define the
structure of a model prior to training. Here, it is used to form a personalized ensemble model for each assessor in a flavor
evaluation panel. Panel labels: Model ensemble (Assessor 1 ... Assessor 69); Consistently well-liked flavors; Flavors that
maximize an assessor's liking; Assessors who like the same thing?; Assessors with same propensity to like the flavor space;
Model similarity by feature importance; Model similarity by behavior


Le Goues et al. [43.99] show that automatically fixing
software bugs is a problem within the reach of GP.
They describe a system called GenProg. It operates
on C source code taken from open-source projects. It
works by forming an abstract syntax tree from the original
source code. The initial population is seeded with
variations of the original. Mutations and crossover are
constrained to copy or delete complete lines of code,
rather than editing subexpressions, and they are constrained
to alter only lines which are exercised by the
failing test cases. This helps to reduce the search space
size. The original test suites are used to give confidence
that the program variations have not lost their
original functionality. Fixes for several real-world bugs
are produced, quickly and with high certainty of success,
including bugs in HTTP servers, Unix utilities,
and a media player. The fixes can be automatically processed
to produce minimal patches. Best of all, the fixes
are demonstrated to be rather robust: some even generalize
to fixing related bugs which were not explicitly
encoded in the test suite.
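
The constrained operators described for GenProg can be illustrated with a minimal sketch. This is not GenProg's actual implementation: it assumes a program is a plain list of source lines, that the set of line indices exercised by the failing tests (`fault_lines`) has already been computed, and that tests are simple predicates. All names are hypothetical.

```python
import random


def mutate_lines(program, fault_lines, rng):
    """Return a variant of `program` (a list of source lines).

    Only a line exercised by the failing tests may be touched: it is
    either deleted, or a copy of some existing line is inserted after it.
    """
    variant = list(program)
    target = rng.choice(sorted(fault_lines))
    if rng.random() < 0.5:
        del variant[target]                 # delete one complete line
    else:
        donor = rng.choice(program)         # reuse code already in the program
        variant.insert(target + 1, donor)
    return variant


def fitness(variant, tests):
    """Fraction of test predicates passed; a stand-in for running a test suite."""
    return sum(1 for test in tests if test(variant)) / len(tests)
```

Because every edit copies or deletes whole lines near the fault, the variants stay close to the original program, which is the search-space reduction the text describes.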

Ryan [43.100] describes a system, Paragen, which 
automatically rewrites serial Fortran programs to par- 
allel versions. In Paragen I, the programs are directly 
varied by the genetic operators, and automated tests 
are used to reward the preservation of the program’s 
original semantics. The work of Williams [43.101] was 
in some ways similar to Paragen I. In Paragen II, 
correctness of the new programs is instead guaran- 
teed, using a different approach. The programs to 
be evolved are sequences of transformations defined 
over the original serial code. Each transformation is 
known to preserve semantics. Some transformations 
however directly transform serial operations to paral- 
lel, while other transformations merely enable the first 
type. 

A third goal of software engineering is optimization 
of existing code. White et al. [43.104] tackle this task 
using a multiobjective optimization method. Again, an 
existing program is used as a starting point, and the 
aim is to evolve a semantically equivalent one with im- 
proved characteristics, such as reduced memory usage, 
execution time, or power consumption. The system is
capable of finding nonobvious optimizations, i.e., ones
which cannot be found by optimizing compilers. A pop- 
ulation of test cases is coevolved with the population of 
programs. Stephenson et al. [43.102, 103] in the Meta 
Optimization project improve program execution speed 
by using GP to refine priority functions within the 
compiler. The compiler generates better code which ex- 
ecutes faster across the input range of one program and 
across the program range of a benchmark set. 

A survey of the broader field of search-based soft- 
ware engineering is given by Harman [43.105]. 


43.4.4 Design 


GP has been successfully used in several areas of de- 
sign. This includes both engineering design, where the 
aim is to design some hardware or software system 
to carry out a well-defined task, and aesthetic design, 
where the aim is to produce art objects with subjective 
qualities. 


Engineering Design 
One of the first examples of GP design was the synthe- 
sis of analog electrical circuits by Koza et al. [43.106]. 
This work addressed the problem of automatically creating
circuits to perform tasks such as a filter or an
amplifier. Eight types of circuit were automatically 
created, each having certain requirements, such as out- 
putting an amplified copy of the input, and low dis- 
tortion. These functions were used to define fitness. 
A complex GP representation was used, with both 
STGP (Sect. 43.3.3) and ADFs (Sect. 43.5.3). Exe- 
cution of the evolved program began with a trivial 
embryonic circuit. GP program nodes, when executed, 
performed actions such as altering the circuit topol- 
ogy or creating a new component. These nodes were 
parameterized with numerical parameters, also under 
GP control, which could be created by more typical 
arithmetic GP subtrees. The evolved circuits solved 
significant problems to a human-competitive standard 
though they were not fabricated. 

Another significant success story was the space- 
going antenna evolved by Hornby et al. [43.27] for the 
NASA (National Aeronautics and Space Administra- 
tion) Space Technology 5 spacecraft. The task was to 
design an antenna with certain beamwidth and band- 
width requirements, which could be tested in simulation 
(thus providing a natural fitness function). GP was used 
to reduce reliance on human labor and limitations on 
complexity, and to explore areas of the search space 
which would be rejected as not worthy of exploration 
by human designers. Both a GA and a GP representa- 
tion were used, producing quite similar results. The GP 
representation was in some ways similar to a 3-D turtle 
graphics system. Commands included forward which 
moved the turtle forward, creating a wire component, 
and rotate-x which changed orientation. Branching of 
the antenna arms was allowed with special markers 
similar to those used in turtle graphics programs. The 
program composed of these primitives, when run, cre- 
ated a wire structure, which was rotated and copied four 
times to produce a symmetric result for simulation and 
evaluation. 
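
A turtle-style interpreter of this kind can be sketched as follows. This is an illustrative reconstruction, not the code used in [43.27]: the command tuples, the use of the local z axis as the heading, the nested `branch` command standing in for the branching markers, and the four copies placed at 90 degree steps about the z axis are all assumptions.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(M, v):
    return tuple(sum(M[i][k] * v[k] for k in range(3)) for i in range(3))


def build_arm(program):
    """Interpret a command list and return wire segments as (start, end) pairs."""
    wires = []

    def run(cmds, pos, orient):
        for cmd in cmds:
            if cmd[0] == "forward":
                step = apply(orient, (0.0, 0.0, cmd[1]))  # move along the heading
                end = tuple(p + d for p, d in zip(pos, step))
                wires.append((pos, end))                  # each move lays a wire
                pos = end
            elif cmd[0] == "rotate-x":
                orient = matmul(orient, rot_x(cmd[1]))
            elif cmd[0] == "branch":
                run(cmd[1], pos, orient)  # sub-arm; outer turtle state is unchanged

    run(program, (0.0, 0.0, 0.0), [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    return wires


def four_fold(wires):
    """Copy one arm four times around the z axis to get a symmetric structure."""
    out = []
    for k in range(4):
        R = rot_z(k * math.pi / 2)
        out += [(apply(R, a), apply(R, b)) for a, b in wires]
    return out
```

The list returned by `four_fold` is what a simulator would receive for evaluation in this sketch.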


Aesthetic Design 

There have also been successes in the fields of graphical 
art, 3-D aesthetic design, and music. Given the aesthetic 
nature of these fields, GP fitness is often replaced by 
an interactive approach where the user performs direct 
selection on the population. This approach dates back 
to Dawkins’ seminal Biomorphs [43.107] and has been 
used in other forms of EC also [43.108]. Early suc- 
cesses were those of Todd and Latham [43.109], who 
created pseudo-organic forms, and Sims [43.35] who 
created abstract art. An overview of evolutionary art is 
provided by Lewis [43.110]. 


A key aim throughout aesthetic design is to avoid 
the many random-seeming designs which tend to be 
created by typical representations. For example, a naive 
representation for music might encode each quarter- 
note as an integer in a genome whose length is the 
length of the eventual piece. Such a representation will 
be capable of representing some good pieces of music, 
but it will have several significant problems. The vast 
majority of pieces will be very poor and random sound- 
ing. Small mutations will tend to gradually degrade 
pieces, rather than causing large-scale and semantically 
sensible transformations [43.111]. 

As a result, many authors have tried to use rep- 
resentations which take advantage of forms of reuse. 
Although reuse is also an aim in nonaesthetic GP 
(Sect. 43.5.3), the hypothesis that good solutions will 
tend to involve reuse, even on new, unknown problems, 
is more easily motivated in the context of aesthetic de- 
sign. 

In one strand of research, the time or space to be 
occupied by the work is predefined, and divided into 
a grid of 1, 2, or 3 dimensions. A GP function of 1, 2 or 
3 arguments is then evolved, and applied to each point 
in the grid with the coordinates of the point passed as 
arguments to the function. The result is that the func- 
tion is reused many times, and all parts of the work 
are felt to be coherent. The earliest example of such 
work was that of Sims [43.35], who created fascinat- 
ing graphical art (a 2-D grid) and some animations 
(a 3-D grid of two spatial dimensions and 1 time di- 
mension). The paradigm was later brought to a high 
degree of artistry by Hart [43.112]. The same gener- 
ative idea, now with a 1-D grid representing time, was 
used by Hoover et al. [43.113], Shao et al. [43.114] and 
McDermott and O’Reilly [43.115] to produce music as 
a function of time, and with a 3-D grid by Clune and 
Lipson [43.116] to produce 3-D sculptures. 
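
The grid-based form of reuse can be shown with a small sketch. The expression format (nested tuples), the tiny function set, and the coordinate range of -1 to 1 are assumptions chosen for illustration; they are not taken from the cited systems.

```python
import math


def evaluate(expr, x, y):
    """Evaluate a nested-tuple expression at the point (x, y)."""
    if expr == "x":
        return x
    if expr == "y":
        return y
    if isinstance(expr, (int, float)):
        return float(expr)
    op, *args = expr
    vals = [evaluate(a, x, y) for a in args]
    if op == "add":
        return vals[0] + vals[1]
    if op == "mul":
        return vals[0] * vals[1]
    if op == "sin":
        return math.sin(vals[0])
    raise ValueError(f"unknown operator: {op}")


def render(expr, width, height):
    """Apply the same expression at every point of a 2-D grid (width, height >= 2)."""
    rows = []
    for j in range(height):
        y = -1 + 2 * j / (height - 1)
        rows.append([evaluate(expr, -1 + 2 * i / (width - 1), y)
                     for i in range(width)])
    return rows


image = render(("add", ("mul", "x", "x"), "y"), 3, 3)
```

A single evolved expression determines every pixel, so any change to it alters the whole image in a coordinated way. A 1-D grid over time or a 3-D grid would use one or three coordinate arguments instead.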

Other successful work has used different ap- 
proaches to reuse. L-systems are grammars in which 
symbols are recursively expanded in parallel: after several
expansions (a growth process), the string will be
highly patterned, with multiple copies of some sub- 
strings. Interpreting this string as a program can then 
yield highly patterned graphics [43.117], artificial crea- 
tures [43.118], and music [43.119]. Grammars have 
also been used in 3-D and architectural design, both 
in a modified L-system form [43.36] and in the stan- 
dard GE form [43.120]. The Ossia system of Dahlst- 
edt [43.37] uses GP trees with recursive pointers to 
impose reuse and a natural, gestural quality on short 
pieces of art music. 
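
The parallel rewriting that gives L-systems their patterned output can be sketched in a few lines. The rule set used here is Lindenmayer's well-known two-symbol example; the function name is hypothetical.

```python
def expand(axiom, rules, steps):
    """Rewrite every symbol in parallel, `steps` times."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)  # symbols without a rule are kept
    return s


rules = {"A": "AB", "B": "A"}
grown = expand("A", rules, 3)  # A -> AB -> ABA -> ABAAB
```

The repeated substrings in the result are what an interpreter then turns into repeated graphical, morphological, or musical structure.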


43.5 Research Topics 


Many research topics of interest to GP practitioners 
are also of broader interest. For example, the self- 
adaptation of algorithm parameters is a topic of interest 
throughout EC. We have chosen to focus on four re- 
search topics of specific interest in GP: bloat, GP 
theory, modularity, and open-ended evolution. 


43.5.1 Bloat 


Most GP-type problems naturally require variable- 
length representations. It might be expected that se- 
lection pressure would effectively guide the popula- 
tion toward program sizes appropriate to the problem, 
and indeed this is sometimes the case. However, it 
has been observed that for many different represen- 
tations [43.121] and problems, programs grow over 
time without apparent fitness improvements. This phe- 
nomenon is called bloat. Since the time complexity for 
the evaluation of a GP program is generally propor- 
tional to its size, this greatly slows the GP run down. 
There are also other drawbacks. The eventual solution
may be so large and complex that it is unreadable,
negating a key advantage of symbolic methods like GP. 
Overly large programs tend to generalize less well than 
parsimonious ones. Bloat may negatively impact the 
rate of fitness improvement. Since bloat is a significant 
obstacle to successful GP, it is an important topic of re- 
search, with differing viewpoints both on the causes of 
bloat and the best solutions. 

The competing theories of the causes of bloat are 
summarized by Luke and Panait [43.122] and Silva 
et al. [43.123]. A fundamental idea is that adding ma- 
terial rather than removing material from a GP tree 
is more likely to lead to a fitness improvement. The 
hitchhiking theory is that noneffective code is carried 
along by virtue of being attached to useful code. Defense
against crossover suggests that large amounts of
noneffective code give a selection advantage later in GP
runs when crossover is likely to be highly destructive of
good, fragile programs. Removal bias is the idea that it
is harder for GP operators to remove exactly the right 
(i. e., noneffective) code than it is to add more. The fit- 
ness causes bloat theory suggests that fitness-neutral 
changes tend to increase program size just because 
there are many more programs with the same func- 
tionality at larger sizes than at smaller [43.124]. The 
modification point depth theory suggests that children 
formed by tree crossover at deep crossover points are 
likely to have fitness similar to their parents and thus
more likely to survive than the more radically different
children formed at shallow crossover points. Because 
larger trees have more very deep potential crossover 
points, there is a selection pressure toward growth. Fi- 
nally, the crossover bias theory [43.125] suggests that 
after many crossovers, a population will tend toward 
a limiting distribution of tree sizes [43.126] such that 
small trees are more common than large ones — note 
that this is the opposite of the effect that might be 
expected as the basis of a theory of bloat. However, 
when selection is considered, the majority of the small 
programs cannot compete with the larger ones, and 
so the distribution is now skewed in favour of larger 
programs. 

Many different solutions to the problem of bloat 
have been proposed, many with some success. One sim- 
ple method is depth limiting, imposing a fixed limit on 
the tree depth that can be produced by the variation op- 
erators [43.19]. 
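
A minimal sketch of depth limiting follows, assuming trees are nested tuples of the form `(operator, child, ...)` with non-tuple leaves. Returning the parent when the child is too deep is one common way to handle a violation and is an assumption here, as is the illustrative default limit.

```python
def depth(tree):
    """Depth of a nested-tuple tree; a bare leaf has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max((depth(child) for child in tree[1:]), default=0)


def limit_depth(parent, child, max_depth=17):
    """Keep the child only if it respects the depth limit."""
    return child if depth(child) <= max_depth else parent
```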

Another simple but effective method is Tarpeian 
bloat control [43.127]. Individuals which are larger than 
average receive, with a certain probability, a constant 
punitively bad fitness. The advantage is that these in- 
dividuals are not evaluated, and so a huge amount of 
time can be saved and devoted to running more generations
(as in [43.122]). The Tarpeian method does allow
programs in the population to grow beyond their initial size, since the
punishment is only applied to a proportion of the larger-than-average
individuals, typically around 1 in 3. This value can also be set
adaptively [43.127].
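
The Tarpeian idea can be sketched as a wrapper around fitness evaluation. The sketch assumes fitness is an error to be minimized, so the punitive value is infinity; the function and parameter names are hypothetical.

```python
import random


def tarpeian_scores(population, evaluate, size, p=1 / 3,
                    bad=float("inf"), rng=random):
    """Score a population, penalizing some larger-than-average programs.

    A penalized individual receives `bad` and is never passed to
    `evaluate`, which is where the time saving comes from.
    """
    mean_size = sum(size(ind) for ind in population) / len(population)
    scores = []
    for ind in population:
        if size(ind) > mean_size and rng.random() < p:
            scores.append(bad)
        else:
            scores.append(evaluate(ind))
    return scores
```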

The parsimony pressure method evaluates all indi- 
viduals, but imposes a fitness penalty on overly large 
individuals. This assumes that fitness is commensurable 
with size: the magnitude of the punishment establishes 
a de facto exchange rate between the two. Luke and 
Panait [43.122] found that parsimony pressure was ef- 
fective across problems and across a wide range of 
exchange rates. 
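
The exchange rate can be made concrete with a linear penalty, which is one simple form of parsimony pressure. The sketch assumes fitness is an error to be minimized; the rate and the example numbers are illustrative only.

```python
def parsimony_fitness(error, size, rate=0.01):
    """Penalized fitness: `rate` is the price, in error units, of one node."""
    return error + rate * size


# A slightly more accurate but much larger program loses once size is priced in.
large = parsimony_fitness(error=0.10, size=50)
small = parsimony_fitness(error=0.12, size=10)
prefers_small = small < large
```

With `rate=0` the larger program would win on error alone, which shows how the rate decides the trade-off.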

The choice of an exchange rate can be avoided using 
multiobjective methods, such as Pareto-GP [43.128], 
where one of the objectives is fitness and the other 
program length or complexity. The correct definition 
for complexity in this context is itself an interesting 
research topic [43.96, 129]. Alternatively, the pressure 
against bloat can be moved from the fitness evaluation
phase to the selection phase of the algorithm,
using the double tournament method [43.122]. Here
individuals must compete in one fitness-based tournament
and one size-based one. Another approach
is to incorporate tree size directly into fitness evaluation
using a minimum description length principle [43.130].
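
Two of the ideas in this paragraph can be sketched briefly, assuming each individual is summarized as an `(error, size)` pair with both values minimized. The `dominates` test is the standard Pareto comparison used by multiobjective methods. The `double_tournament` function is a simplified, deterministic reading of the scheme: the order of the two tournaments and their sizes are assumptions, not details taken from [43.122].

```python
import random


def dominates(a, b):
    """True if (error, size) pair `a` Pareto-dominates `b`."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def double_tournament(population, k_fit, k_size, rng):
    """Select one parent.

    Each of the `k_size` finalists is the winner of a fitness tournament
    among `k_fit` randomly drawn individuals; the smallest finalist wins.
    """
    finalists = [
        min(rng.sample(population, k_fit), key=lambda ind: ind[0])
        for _ in range(k_size)
    ]
    return min(finalists, key=lambda ind: ind[1])
```

Neither function needs an exchange rate between error and size, which is the point made in the text.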

Another technique is called operator length equal- 
ization. A histogram of program sizes is maintained 
throughout the run and is used to set the popula- 
tion’s capacity for programs of different sizes. A newly 
created program which would cause the population’s 
capacity to be exceeded is rejected, unless exception- 
ally fit. A mutation-based variation of the method 
instead mutates the overly large individuals using 
directed mutation to become smaller or larger as 
needed. 
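
The acceptance rule can be sketched with a histogram of size bins. How the capacities are derived from the run and how "exceptionally fit" is decided are left out; the bin width, the `capacity` dictionary, and the `exceptional` flag are assumptions made for illustration.

```python
from collections import Counter


def accept(program_size, counts, capacity, bin_width=5, exceptional=False):
    """Accept a new program only if its size bin still has room.

    `counts` holds how many programs each bin currently contains and is
    updated in place; `capacity` maps a bin index to its allowed maximum.
    """
    b = program_size // bin_width
    if counts[b] < capacity.get(b, 0) or exceptional:
        counts[b] += 1
        return True
    return False


counts = Counter()
capacity = {0: 1}  # room for one program of size 0..4, none elsewhere
```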

Some authors have argued that the choice of GP rep- 
resentation can avoid the issue of bloat [43.131]. Some 
aim to avoid the problem of bloat by speeding up fit- 
ness evaluation [43.82, 132] or avoiding wasted effort 
in evaluation [43.133, 134]. Sometimes GP techniques 
are introduced with other motivations but have the side- 
effect of reducing bloat [43.135]. 

In summary, researchers including Luke and 
Panait [43.122], Poli et al. [43.127], Miller [43.131],
and Silva et al. [43.123] have effectively declared vic- 
tory in the fight against bloat. However, their techniques 
have not yet become standard for new GP research and 
benchmark experiments. 


43.5.2 GP Theory 


Theoretical research in GP seeks to answer a variety of 
questions, for example: What are the drivers of popula- 
tion fitness convergence? How does the behavior of an 
operator influence the progress of the algorithm? How 
does the combination of different algorithmic mecha- 
nisms steer GP toward fitter solutions? What mecha- 
nisms cause bloat to arise? What problems are difficult 
for GP? How diverse is a GP population? Theoretical 
methodologies are based in mathematics and exploit 
formalisms, theorems, and proofs for rigor. While GP 
may appear simple, beyond its stochastic nature which 
it shares with all other evolutionary algorithms, its 
variety of representations each impose specific require- 
ments for theoretical treatment. All GP representations 
share two common traits which greatly contribute to 
the difficulty it poses for theoretical analysis. First, the 
representations have no fixed size, implying a complex 
search space. Second, GP representations do not imply
that parents will be equal in size and shape. While
crossover accommodates this lack of synchronization,
it generally allows the exchange of content from anywhere
in one parent to anywhere in the other parent's
tree. This implies a combinatorial number of possible outcomes, and
the exchanged subtrees need not be alike. This functionality contributes to
complicated algorithmic behavior which is challenging
to analyze.

Here, we select several influential methods of theo- 
retical analysis and very briefly describe them and their 
results: schema-based analysis, Markov chain model- 
ing, runtime complexity, and problem difficulty. We 
also introduce the No Free Lunch Theorem and describe 
its implications for GP. 


Schema-Based Analysis 

In schema-based analysis, the search space is con- 
ceptually partitioned into hyperplanes (also known as 
schemas) which represent sets of partial solutions. 
There are numerous ways to do this and, as a con- 
sequence, multiple schema definitions have been pro- 
posed [43.136-139]. The fitness of a schema is esti- 
mated as the average fitness of all programs in the 
sample of its hyperplane, given a population. The pro- 
cesses of fitness-based selection and crossover are for- 
malized in a recurrence equation which describes the 
expected number of programs sampling a schema from 
the current population to the next. Exact formulations 
have been derived for most types of crossover [43.140, 
141]. These alternatively depend on making explicit the 
effects and the mechanisms of schema creation. This 
leads to insight; however, tracking schema equations 
in actual GP population dynamics is infeasible. Also, 
while schema theorems predict changes from one gen- 
eration to the next, they cannot predict further into the 
future to predict the long-term dynamics that GP prac- 
titioners care about. 
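
The estimate of schema fitness from a population can be sketched as follows. The schema definition used here, a nested-tuple tree in which `"*"` matches any subtree, is only one simple possibility chosen for illustration; the definitions proposed in [43.136-139] differ in detail.

```python
def matches(schema, program):
    """True if `program` is an instance of `schema` ('*' matches any subtree)."""
    if schema == "*":
        return True
    if isinstance(schema, tuple) and isinstance(program, tuple):
        return (len(schema) == len(program)
                and schema[0] == program[0]
                and all(matches(s, p) for s, p in zip(schema[1:], program[1:])))
    return schema == program


def schema_fitness(schema, population, fitnesses):
    """Average fitness of the programs in the population that sample the schema."""
    sample = [f for prog, f in zip(population, fitnesses) if matches(schema, prog)]
    return sum(sample) / len(sample) if sample else None


population = [("add", "x", "y"), ("add", "x", ("mul", "y", "y")), ("mul", "x", "y")]
fitnesses = [1.0, 3.0, 10.0]
estimate = schema_fitness(("add", "x", "*"), population, fitnesses)
```

The estimate depends on which programs happen to be in the current population, which is why it changes from one generation to the next.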


Markov Chain Analysis 
Markov chain models are one means of describing such 
long-term GP dynamics. They take advantage of the 
Markovian property observed in a GP algorithm: the 
composition of one generation’s population relies only 
upon that of the previous generation. Markov chains 
describe the probabilistic movement of a particular pop- 
ulation (state) to others using a probabilistic transition 
matrix. In evolutionary algorithms, the transition matrix 
must express the effects of any selection and varia- 
tion operators. The transition matrix, when multiplied 
by itself k times, indicates which new populations can 
be reached in k generations. This, in principle, allows 
a calculation of the probability that a population with 
a solution can be reached. To date a Markov chain for 
a simplified GP crossover operator has been derived, 
see [43.142]. Another interesting Markov chain-based
result has revealed that the distribution of functionality
of non-Turing complete programs approaches a limit 
as length increases. Markov chain analysis has also 
been the means of describing what happens with GP 
semantics rather than syntax. The influence of sub- 
tree crossover is studied in a semantic building block 
analysis by [43.143]. Markov chains, unfortunately, 
combinatorially explode with even simple extensions of 
algorithm dynamics or, in GP’s case, its theoretically in- 
finite search space. Thus, while they can support further 
analysis, ultimately this complexity is unwieldy to work 
with. 
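
The k-step calculation can be illustrated on a toy chain. The three states and their transition probabilities are invented for the example; a real GP chain would have one state per possible population, which is the source of the combinatorial explosion noted above.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, k):
    """Multiply the transition matrix by itself k times (k = 0 gives identity)."""
    n = len(P)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


# State 2 stands for "the population contains a solution" and is absorbing.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.00, 1.00]]

# Probability of having reached a solution within 2 generations from state 0.
prob_solved_in_2 = mat_pow(P, 2)[0][2]
```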


Runtime Complexity 
Due to stochasticity, it is arguably impossible in most 
cases to make formal guarantees about the number of 
fitness evaluations needed for a GP algorithm to find 
an optimal solution. However, initial steps in the run- 
time complexity analysis of genetic programming have 
been made in [43.144]. The authors study the runtime 
of hill climbing GP algorithms which use a mutation 
operator called HVL-Prime (Figs. 43.11 and 43.12). 
Several of these simplified GP algorithms were ana- 
lyzed on two separable model problems, Order and 
Majority introduced in [43.145]. Order and Majority 
each have an independent, additive fitness structure. 
They each admit multiple solutions based on their ob- 
jective function, so they exhibit a key property of 
all real GP problems. They each capture a different 
relevant facet of typical GP problems. Order repre- 
sents problems, such as classification problems, where 
the operators include conditional functions such as 
an IF-THEN-ELSE. These functions give rise to con- 
ditional execution paths which have implications for 
evolvability and the effectiveness of crossover. Ma- 
jority is a GP equivalent of the GA OneMax prob- 
lem [43.146]. It reflects a general (and thus weak) 
property required of GP solutions: a solution must 
have correct functionality (by evolving an aggrega- 
tion of subsolutions) and no incorrect functionality. 
The analyses highlighted, in particular, the impact of
accepting or rejecting neutral moves and the importance
of a local mutation operator. A similar finding
[43.147] regarding mutation arose from the analysis
of the Max problem [43.148] and hillclimbing.
For a search process bounded by a maximally sized
tree of n nodes, the time complexity of the simple
GP mutation-based hillclimbing algorithms using
HVL-Prime for the entire range of MAX variants is
$O(n \log^2 n)$ when one mutation operation precedes each
fitness evaluation. When multiple mutations are successively
applied before each fitness evaluation, the time
complexity is $O(n^2)$. This complexity can be reduced
to $O(n \log n)$ if the mutations are biased to replace
a random leaf with distance d from the root with probability
$2^{-d}$.

Fig. 43.11a-c HVL-prime mutation: substitution and deletion (a) Original parse tree, (b) Result of substitution, (c) Result of deletion
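
The two model problems can be sketched as fitness functions. The definitions below follow our understanding of [43.145] and are not spelled out in this chapter, so they should be read as assumptions: a program is reduced to its list of leaves in in-order traversal, `+i` stands for the variable x_i and `-i` for its complement, and fitness counts the variables that are "expressed".

```python
from collections import Counter


def order_fitness(leaves):
    """ORDER: x_i is expressed if it appears before its complement."""
    seen, expressed = set(), set()
    for lit in leaves:
        var = abs(lit)
        if var not in seen:          # only the first occurrence of a variable counts
            seen.add(var)
            if lit > 0:
                expressed.add(var)
    return len(expressed)


def majority_fitness(leaves):
    """MAJORITY: x_i is expressed if it occurs at least as often as its complement."""
    c = Counter(leaves)
    return sum(1 for lit in c if lit > 0 and c[lit] >= c[-lit])
```

For the leaf list `[-1, 1, 1]`, ORDER gives 0 because the complement comes first, while MAJORITY gives 1 because x_1 outnumbers it.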

Runtime analyses have also considered parsimony 
pressure and multiobjective GP algorithms for general- 
izations of Order and Majority [43.149]. 

GP algorithms have also been studied in the PAC 
learning framework [43.150]. 


Fig. 43.12a,b HVL-prime mutation: insertion (a) Original parse tree, (b) Result of insertion


Problem Difficulty
Problem difficulty is the study of the differences between
algorithms and problems which lead to differences
in performance. Stated simply, the goal is to
understand why some problems are easy and some are
hard, and why some algorithms perform well on certain
problems and others do not. Problem difficulty work in
the field of GP has much in common with similar work
in the broader field of EC. Problem difficulty is naturally
related to the size of the search space; smaller
spaces are easier to search, as are spaces in which
the solution is over-represented [43.151]. Difficulty is
also related to the fitness landscape [43.152], which in
turn depends on both the problem and the algorithm
and representation chosen to solve it. Landscapes with
few local optima (visualized in the fitness landscape as
peaks which are not as high as that of the global optimum)
are easier to search. Locality, that is the property
that small changes to a program lead to small changes
in fitness, implies a smooth, easily searchable landscape
[43.151, 153].

However, more precise statements concerning prob- 
lem difficulty are usually desired. One important line of 
research was carried out by Vanneschi et al. [43.154-156].
This involved calculating various measures of the
correlation of the fitness landscape, that is the rela- 
tionship between distance in the landscape and fitness 
difference. The measures include the fitness distance 
correlation and the negative slope coefficient. These 
measures require the definition of a distance measure 
on the search space, which in the case of standard GP 
means a distance between pairs of trees. Various tree 
distance measures have been proposed and used for this 
purpose [43.157-160]. However, the reliable prediction
of performance based purely on landscape analysis re- 
mains a distant goal in GP as it does in the broader field 
of EC. 
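
Fitness distance correlation, one of the measures named above, is commonly computed as the Pearson correlation between sampled fitness values and the distance of each sample to a known global optimum. The sketch below assumes both lists have already been collected and that a suitable tree distance is available; the function name is hypothetical.

```python
import math


def fitness_distance_correlation(fitnesses, distances):
    """Pearson correlation between fitness and distance to the optimum."""
    n = len(fitnesses)
    mean_f = sum(fitnesses) / n
    mean_d = sum(distances) / n
    cov = sum((f - mean_f) * (d - mean_d)
              for f, d in zip(fitnesses, distances)) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitnesses) / n)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in distances) / n)
    return cov / (std_f * std_d)


# Fitness (to be maximized) falls steadily as distance from the optimum grows.
fdc = fitness_distance_correlation([4.0, 3.0, 2.0, 1.0], [0.0, 1.0, 2.0, 3.0])
```

Under the usual convention for maximization problems, a value near -1 as in this toy sample suggests that fitness reliably points toward the optimum.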


No Free Lunch 

In a nutshell, the No Free Lunch Theorem [43.161] 
proves that, averaged over all problem instances, no 
algorithm outperforms another. Follow-up NFL analysis
[43.162, 163] yields a similar result for problems
where the set of fitness functions is closed under permutation.
One question is whether the NFL theorem
applies to GP algorithms: for some problem class, is 
it worth developing a better GP algorithm, or will this 
effort offer no extra value when all instances of the 
problem are considered? Research has revealed two 
conditions under which the NFL breaks down for GP 
because the set of fitness functions is not closed un- 
der permutation. First, GP has a many-to-one syntax 
tree to program output mapping because many differ- 
ent programs have the same functionality while pro- 
gram output functionality is not uniformly distributed 
across syntax trees. Second, a geometric argument has
shown [43.164] that many realistic situations exist
where a set of GP problems is provably not closed un- 
der permutation. The implication of a contradiction to 
the No Free Lunch theorem is that it is worthwhile in- 
vesting effort in improving a GP algorithm for a class 
of problems. 


43.5.3 Modularity 


Modularity in GP is the ability of a representation
to evolve good building blocks and then encapsulate
and reuse them. This can be expected to make complex
programs far easier to find, since good building
blocks needed in multiple places in the program need not
be laboriously re-evolved each time. One of the best-known
approaches to modularity is automatically defined
functions (ADFs), where the building blocks are
implemented as functions which are defined in one
part of the evolving program and then invoked from
another part [43.58]. This work was followed by automatically
defined macros, which are more powerful
than ADFs and allow control of program flow [43.165];
automatically defined iteration, recursion, and memory
stores [43.10]; modularity in other representations
[43.166]; and demonstrations of the power of
reuse [43.167].
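
The way an ADF is defined in one branch and invoked from another can be shown with a small evaluator. The representation (nested tuples, a dictionary of ADF branches, the name `adf0`) is an assumption for illustration and does not reproduce the system of [43.58].

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def evaluate(expr, env, adfs):
    """Evaluate a nested-tuple expression that may call evolved functions."""
    if isinstance(expr, str):
        return env[expr]                       # variable or ADF parameter
    if isinstance(expr, (int, float)):
        return expr
    head, *args = expr
    vals = [evaluate(a, env, adfs) for a in args]
    if head in OPS:
        return OPS[head](*vals)
    params, body = adfs[head]                  # invoke an ADF branch
    return evaluate(body, dict(zip(params, vals)), adfs)


# One individual: a function-defining branch and a result-producing branch.
adfs = {"adf0": (("a", "b"), ("add", ("mul", "a", "a"), "b"))}
main = ("add", ("adf0", "x", 1), ("adf0", "x", 2))
result = evaluate(main, {"x": 3}, adfs)
```

The body of `adf0` is evolved once and used twice by the main branch, which is the saving in search effort that modularity aims for.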


43.5.4 Open-Ended Evolution and GP 


Biological evolution is a long-running exploration of 
the enormously varied and indefinitely sized DNA 
search space. There is no hint that a limit on new ar- 
eas of the space to be explored will ever be reached. 
In contrast, EC algorithms often operate in search 
spaces which are finite and highly simplified in com- 
parison to biology. Although GP itself can be used 
for a wide variety of tasks (Sect. 43.1), each specific 
instance of the GP algorithm is capable of solving 
only a very narrow problem. In contrast, some re- 
searchers see biological evolution as pointing the way 
to a more ambitious vision of the possibilities for 
GP [43.168]. In this vision, an evolutionary run would 
continue for an indefinite length of time, always ex- 
ploring new areas of an indefinitely sized search space; 
always responding to changes in the environment; and 
always reshaping the search space itself. This vision 
is particularly well suited to GP, as opposed to GAs 
and similar algorithms, because GP already works in 
search spaces which are infinite in theory, if not in 
practice. 

To make this type of GP possible, it is necessary to 
prevent convergence of the population on a narrow area 
of the search space. Diversity preservation [43.169], pe- 
riodic injection of new random material [43.170], and 
island-structured population models [43.88] can help in 
this regard. 

Open-ended evolution would also be facilitated by
complexity and nonstationarity in the algorithm's evolutionary
ecosystem. If fitness criteria are dynamic or
coevolutionary [43.171-173], there may be no natural
end-point to evolution, and so continued exploration under
different criteria can lead to unlimited new results.


43.6 Practicalities


43.6.1 Conferences and Journals


Several conferences provide venues for the publication
of new GP research results. The ACM Genetic and
Evolutionary Computation Conference (GECCO) alternates
annually between North America and the rest of
the world and includes a GP track. EuroGP is held annually
in Europe as the main event of Evo*, and focuses
only on GP. The IEEE Congress on Evolutionary Computation
is a larger event with broad coverage of EC
in general. Genetic Programming Theory and Practice
is held annually in Ann Arbor, MI, USA and provides
a focused forum for GP discussion. Parallel Problem
Solving from Nature is one of the older, general EC conferences,
held biennially in Europe. It alternates with
the Evolution Artificielle conference. Finally, Foundations
of Genetic Algorithms is a smaller, theory-focused
conference.

The journal most specialized to the field is probably
Genetic Programming and Evolvable Machines
(published by Springer). The September 2010, 10-year
anniversary issue included several review articles on
GP. Evolutionary Computation (MIT Press) and the
IEEE Transactions on Evolutionary Computation also
publish important GP material. Other on-topic journals
with a broader focus include Applied Soft Computing
and Natural Computing.


43.6.2 Software


A great variety of GP software is available. We will
mention only a few packages; further options can be
found online.

One of the well-known Java systems is
ECJ [43.174, 175]. It is a general-purpose system
with support for many representations, problems, and
methods, both within GP and in the wider field of EC.
It has a helpful mailing list. Watchmaker [43.176] is
another general-purpose system with excellent out-of-the-box
examples. GEVA [43.177, 178] is another
Java-based package, this time with support only for
GE.

For users of C++ there are also several options.
Some popular packages include Evolutionary
Objects [43.179], μGP [43.180-182], and OpenBeagle
[43.183, 184]. Matlab users may be interested in
GPLab [43.185], which implements standard GP, while 
DEAP [43.186] provides implementations of several al- 
gorithms in Python. PushGP [43.187] is available in 
many languages. 

Two more systems are worth mentioning for their 
deliberate focus on simplicity and understandability. 
TinyGP [43.188] and PonyGE [43.189] implement stan- 
dard GP and GE respectively, each in a single, readable 
source file. 

Moving on from open source, Michael Schmidt and 
Hod Lipson’s Eureqa [43.190] is a free-to-use tool with 
a focus on symbolic regression of numerical data and 
the built-in ability to use cloud resources. 

Finally, the authors are aware of two commercially 
available GP tools, each fast and industrial-strength. 
They have more automation and "it just works" functionality,
relative to most free and open-source tools. Free
trials are available. DataModeler (Evolved Analytics 
LLC) [43.191] is a notebook in Mathematica. It em- 
ploys the ParetoGP method [43.128] which gives the 
ability to trade program fitness off against complex- 
ity, and to form ensembles of programs. It also exploits 
complex population archiving and archive-based selec- 
tion. It offers means of dealing with ill-conditioned data 
and extracting information on variable importance from 
evolved models. Discipulus (Register Machine Learn- 
ing Technologies, Inc.) [43.192] evolves machine code 
based on the ideas of Nordin et al. [43.193]. It runs 
on Windows only. The machine code representation 
allows very fast fitness evaluation and low memory us- 
age, hence large populations. In addition to typical GP 
features, it can: use an ES to optimise numerical con- 
stants; automatically construct ensembles; preprocess 
data; extract variable importance after runs; automat- 
ically simplify results; and save them to high-level 
languages. 


43.6.3 Resources and Further Reading 


Another useful resource for GP research is the GP
Bibliography [43.194]. In addition to its huge, regularly
updated collection of BibTeX-formatted citations,
it has lists of researchers' homepages [43.195] and
co-authorship graphs. The GP mailing list [43.196] is one
well-known forum for discussion. 

Many of the traditional GP benchmark problems
have been criticized for being unrealistic in various
ways. The lack of standardized benchmark problems
also makes it possible to cherry-pick benchmarks.
Efforts are underway to bring some standardization
to the choice of GP benchmarks [43.197, 198].

Those wishing to read further have many good
options. The Field Guide to GP is a good introduction,
walking the reader through simple examples,
scanning large amounts of the literature, and offering
practical advice [43.199]. Luke’s Essentials of
Metaheuristics [43.200] also has an introductory style, but
is broader in scope. Both are free to download. Other
broad and introductory books include those by
Fogel [43.51] and Banzhaf et al. [43.201]. More specialized
books include those by Langdon and Poli [43.202]
(coverage of theoretical topics), Langdon [43.11]
(narrower coverage of GP with data structures), O’Neill and
Ryan [43.77] (GE), Iba et al. [43.203] (GP-style ML),
and Sipper [43.204] (games). Advances in Genetic
Programming, a series of four volumes, contains important
foundational work from the 1990s.


References


43.1 J. McCarthy: Programs with Common Sense, Technical Report (Stanford University,
Department of Computer Science, Stanford 1963)

43.2 F. Rosenblatt: The perceptron: A probabilistic model for information storage and
organization in the brain, Psychol. Rev. 65(6), 386 (1958)

43.3 D.E. Rumelhart, J.L. McClelland: Parallel Distributed Processing: Explorations in the
Microstructure of Cognition, Volume 1: Foundations (MIT, Cambridge 1986)

43.4 R.A. Brooks: Intelligence without representation, Artif. Intell. 47(1), 139-159 (1991)

43.5 C. Cortes, V. Vapnik: Support-vector networks, Mach. Learn. 20(3), 273-297 (1995)

43.6 L. Page, S. Brin, R. Motwani, T. Winograd: The Pagerank Citation Ranking: Bringing Order
to the Web, Technical Report 1999-66 (Stanford InfoLab, Stanford 1999), available online at
http://ilpubs.stanford.edu:8090/422/. Previous number = SIDL-WP-1999-0120.

43.7 J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J.Z. Kolter,
D. Langer, O. Pink, V. Pratt, M. Sokolsky, G. Stanek, D. Stavens, A. Teichman, M. Werling,
S. Thrun: Towards fully autonomous driving: Systems and algorithms, Intell. Veh. Symp. (IV)
IEEE (2011) pp. 163-168

43.8 C. Darwin: The Origin of Species by Means of Natural Selection: Or, the Preservation of
Favored Races in the Struggle for Life (John Murray, London 1859)

43.9 T. Bäck, D.B. Fogel, Z. Michalewicz (Eds.): Handbook of Evolutionary Computation
(IOP Publ., Bristol 1997)

43.10 J.R. Koza, D. Andre, F.H. Bennett III, M. Keane: Genetic Programming 3: Darwinian
Invention and Problem Solving (Morgan Kaufmann, San Francisco 1999), available online at
http://www.genetic-programming.org/gpbook3toc.html

43.11 W.B. Langdon: Genetic Programming and Data Structures: Genetic Programming + Data
Structures = Automatic Programming! Genetic Programming, Vol. 1 (Kluwer, Boston 1998),
available online at http://www.cs.ucl.ac.uk/staff/W.Langdon/gpdata

43.12 M. Suchorzewski, J. Clune: A novel generative encoding for evolving modular, regular
and scalable networks, Proc. 13th Annu. Conf. Genet. Evol. Comput. (2011) pp. 1523-1530

43.13 J. Woodward: Evolving Turing complete representations, Proc. 2003 Congr. Evol. Comput.
CEC2003, ed. by R. Sarker, R. Reynolds, H. Abbass, K.C. Tan, B. McKay, D. Essam, T. Gedeon
(IEEE, Canberra 2003) pp. 830-837, available online at
http://www.cs.bham.ac.uk/~jrw/publications/2003/EvolvingTuringCompleteRepresentations/cec032e.pdf

43.14 J. Tanomaru: Evolving Turing machines from examples, Lect. Notes Comput. Sci. 1363,
167-180 (1993)

43.15 D. Andre, F.H. Bennett III, J.R. Koza: Discovery by genetic programming of a cellular
automata rule that is better than any known rule for the majority classification problem,
Proc. 1st Annu. Conf. Genet. Progr., ed. by J.R. Koza, D.E. Goldberg, D.B. Fogel, R.L. Riolo
(MIT Press, Cambridge 1996) pp. 3-11, available online at
http://www.genetic-programming.com/jkpdf/gp1996gkl.pdf

43.16 F. Gruau: Neural Network Synthesis Using Cellular Encoding and the Genetic Algorithm,
Ph.D. Thesis (Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de
Lyon, France 1994), available online at
ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/PhD/PhD1994/PhD1994-01-E.ps.Z

43.17 A. Teller: Turing completeness in the language of genetic programming with indexed
memory, Proc. 1994 IEEE World Congr. Comput. Intell., Orlando, Vol. 1 (1994) pp. 136-141,
available online at http://www.cs.cmu.edu/afs/cs/usr/astro/public/papers/Turing.ps


43.18 S. Mabu, K. Hirasawa, J. Hu: A graph-based evolutionary algorithm: Genetic network
programming (GNP) and its extension using reinforcement learning, Evol. Comput. 15(3),
369-398 (2007)

43.19 J.R. Koza: Genetic Programming: On the Programming of Computers by Means of Natural
Selection (MIT, Cambridge 1992)

43.20 P. Prusinkiewicz, A. Lindenmayer: The Algorithmic Beauty of Plants (The Virtual
Laboratory) (Springer, Berlin, Heidelberg 1991)

43.21 J. Murphy, M. O'Neill, H. Carr: Exploring grammatical evolution for horse gait
optimisation, Lect. Notes Comput. Sci. 5481, 183-194 (2009)

43.22 T. Haynes, S. Sen: Evolving behavioral strategies in predators and prey, Lect. Notes
Comput. Sci. 1042, 113-126 (1995)

43.23 R. De Caux: Using Genetic Programming to Evolve Strategies for the Iterated Prisoner's
Dilemma, Master's Thesis (University College, London 2001), available online at
http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/decaux.masters.zip

43.24 A. Hauptman, M. Sipper: GP-endchess: Using genetic programming to evolve chess endgame
players, Lect. Notes Comput. Sci. 3447, 120-131 (2005), available online at
http://www.cs.bgu.ac.il/~sipper/papabs/eurogpchess-final.pdf

43.25 E. Galván-López, J.M. Swafford, M. O'Neill, A. Brabazon: Evolving a Ms. PacMan
controller using grammatical evolution, Lect. Notes Comput. Sci. 6024, 161-170 (2010)

43.26 J. Togelius, S. Lucas, H.D. Thang, J.M. Garibaldi, T. Nakashima, C.H. Tan, I. Elhanany,
S. Berant, P. Hingston, R.M. MacCallum, T. Haferlach, A. Gowrisankar, P. Burrow: The 2007
IEEE CEC simulated car racing competition, Genet. Program. Evol. Mach. 9(4), 295-329 (2008)

43.27 G.S. Hornby, J.D. Lohn, D.S. Linden: Computer-automated evolution of an X-band antenna
for NASA's space technology 5 mission, Evol. Comput. 19(1), 1-23 (2011)

43.28 M. Furuholmen, K.H. Glette, M.E. Hovin, J. Torresen: Scalability, generalization and
coevolution - experimental comparisons applied to automated facility layout planning,
GECCO '09: Proc. 11th Annu. Conf. Genet. Evol. Comput., Montreal, ed. by F. Rothlauf,
G. Raidl (2009) pp. 691-698, available online at http://doi.acm.org/10.1145/1569901.1569997

43.29 C.C. Bojarczuk, H.S. Lopes, A.A. Freitas: Genetic programming for knowledge discovery
in chest-pain diagnosis, IEEE Eng. Med. Biol. Mag. 19(4), 38-44 (2000), available online at
http://ieeexplore.ieee.org/iel5/51/18543/00853480.pdf

43.30 T. Hildebrandt, J. Heger, B. Scholz-Reiter, M. Pelikan, J. Branke: Towards improved
dispatching rules for complex shop floor scenarios: A genetic programming approach,
GECCO '10: Proc. 12th Annu. Conf. Genet. Evol. Comput., Portland, ed. by J. Branke (2010)
pp. 257-264


43.31 M.B. Bader-El-Den, R. Poli, S. Fatima: Evolving timetabling heuristics using a
grammar-based genetic programming hyper-heuristic framework, Memet. Comput. 1(3), 205-219
(2009), 10.1007/s12293-009-0022-y

43.32 M. Conrads, P. Nordin, W. Banzhaf: Speech sound discrimination with genetic
programming, Lect. Notes Comput. Sci. 1391, 113-129 (1998)

43.33 A. Esparcia-Alcazar, K. Sharman: Genetic programming for channel equalisation,
Lect. Notes Comput. Sci. 1596, 126-137 (1999), available online at
http://www.iti.upv.es/~anna/papers/evoiasp99.ps

43.34 R. Poli, M. Salvaris, C. Cinel: Evolution of a brain-computer interface mouse via
genetic programming, Lect. Notes Comput. Sci. 6621, 203-214 (2011)

43.35 K. Sims: Artificial evolution for computer graphics, ACM Comput. Gr. 25(4), 319-328
(1991), available online at http://delivery.acm.org/10.1145/130000/122752/p319-sims.pdf,
SIGGRAPH '91 Proceedings

43.36 U.-M. O'Reilly, M. Hemberg: Integrating generative growth and evolutionary computation
for form exploration, Genet. Program. Evol. Mach. 8(2), 163-186 (2007), Special issue on
developmental systems

43.37 P. Dahlstedt: Autonomous evolution of complete piano pieces and performances,
Proc. Music AL Workshop (2007)

43.38 H. Iba: Multiple-agent learning for a robot navigation task by genetic programming,
Genet. Program. Proc. 2nd Annu. Conf., Stanford, ed. by J.R. Koza, K. Deb, M. Dorigo,
D.B. Fogel, M. Garzon, H. Iba, R.L. Riolo (1997) pp. 195-200

43.39 T. Weise, K. Tang: Evolving distributed algorithms with genetic programming,
IEEE Trans. Evol. Comput. 16(2), 242-265 (2012)

43.40 L. Spector: Autoconstructive evolution: Push, pushGP, and pushpop, Proc. Genet. Evol.
Comput. Conf. (GECCO-2001), ed. by L. Spector, E. Goodman (Morgan Kaufmann, San Francisco
2001) pp. 137-146, available online at http://hampshire.edu/lspector/pubs/ace.pdf

43.41 J. Tavares, F. Pereira: Automatic design of ant algorithms with grammatical evolution.
In: Genetic Programming. 15th European Conference, EuroGP, ed. by A. Moraglio, S. Silva,
K. Krawiec, P. Machado, C. Cotta (Springer, Berlin, Heidelberg 2012) pp. 206-217

43.42 M. Hutter: A Gentle Introduction To The Universal Algorithmic Agent AIXI. Technical
Report IDSIA-01-03 (IDSIA, Manno-Lugano 2003)

43.43 J. Von Neumann, M.D. Godfrey: First draft of a report on the EDVAC, IEEE Ann. Hist.
Comput. 15(4), 27-75 (1993)

43.44 H.H. Goldstine, A. Goldstine: The electronic numerical integrator and computer (ENIAC),
Math. Tables Other Aids Comput. 2(15), 97-110 (1946)

43.45 A.M. Turing: Intelligent machinery. In: Cybernetics: Key Papers, ed. by C.R. Evans,
A.D.J. Robertson (Univ. Park Press, Baltimore 1968), Written 1948

43.46 A.M. Turing: Computing machinery and intelligence, Mind 59(236), 433-460 (1950)

43.47 A.L. Samuel: Some studies in machine learning using the game of checkers, IBM J. Res.
Dev. 3(3), 210 (1959)

43.48 R.M. Friedberg: A learning machine: Part I, IBM J. Res. Dev. 2(1), 2-13 (1958)

43.49 M. O'Neill, L. Vanneschi, S. Gustafson, W. Banzhaf: Open issues in genetic programming,
Genet. Program. Evol. Mach. 11(3/4), 339-363 (2010), 10th Anniversary Issue: Progress in
Genetic Programming and Evolvable Machines

43.50 L.J. Fogel, A.J. Owens, M.J. Walsh: Artificial Intelligence Through Simulated Evolution
(Wiley, Hoboken 1966)

43.51 D.B. Fogel: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence,
Vol. 1 (Wiley, Hoboken 2006)

43.52 S.F. Smith: A Learning System Based on Genetic Adaptive Algorithms, Ph.D. Thesis
(University of Pittsburgh, Pittsburgh 1980)

43.53 N.L. Cramer: A representation for the adaptive generation of simple sequential
programs, Proc. Int. Conf. Genet. Algorithms Appl., Pittsburgh, ed. by J.J. Grefenstette
(1985) pp. 183-187, available online at
http://www.sover.net/~nichael/nic-publications/icga85/index.html

43.54 J. Schmidhuber: Evolutionary Principles in Self-Referential Learning. On Learning How
to Learn: The Meta-Meta-Meta...-Hook, Diploma Thesis (Technische Universität, München 1987),
available online at http://www.idsia.ch/~juergen/diploma.html

43.55 C. Fujiki, J. Dickinson: Using the genetic algorithm to generate lisp source code to
solve the prisoner's dilemma, Proc. 2nd Int. Conf. Genet. Algorithms Appl., Cambridge, ed. by
J.J. Grefenstette (1987) pp. 236-240

43.56 A.S. Bickel, R.W. Bickel: Tree structured rules in genetic algorithms, Proc. 2nd Int.
Conf. Genet. Algorithms Appl., Cambridge, ed. by J.J. Grefenstette (1987) pp. 77-81

43.57 T.S. Ray: Evolution, Ecology and Optimization of Digital Organisms. Technical Report
Working Paper 92-08-042 (Santa Fe Institute, Santa Fe 1992) available online at
http://www.santafe.edu/media/workingpapers/92-08-042.pdf

43.58 J.R. Koza: Genetic Programming II: Automatic Discovery of Reusable Programs
(MIT, Cambridge 1994)

43.59 J.R. Koza, M.A. Keane, M.J. Streeter, W. Mydlowec, J. Yu, G. Lanza: Genetic Programming
IV: Routine Human-Competitive Machine Intelligence (Springer, Berlin, Heidelberg 2003),
available online at http://www.genetic-programming.org/gpbook4toc.html


43.60 J. Koza: http://www.genetic-programming.org/hc2011/combined.html

43.61 J.G. Carbonell, R.S. Michalski, T.M. Mitchell: An overview of machine learning.
In: Machine Learning: An Artificial Intelligence Approach, ed. by R.S. Michalski,
J.G. Carbonell, T.M. Mitchell (Tioga, Palo Alto 1983)

43.62 C. Rich, R.C. Waters: Automatic programming: Myths and prospects, Computer 21(8),
40-51 (1988)

43.63 S. Gulwani: Dimensions in program synthesis, Proc. 12th Int. SIGPLAN Symp. Princ.
Pract. Declar. Program. (2010) pp. 13-24

43.64 I. Rechenberg: Evolutionsstrategie - Optimierung Technischer Systeme nach Prinzipien
der Biologischen Evolution (Frommann-Holzboog, Stuttgart 1973)

43.65 H.-P. Schwefel: Numerische Optimierung von Computer-Modellen (Birkhäuser, Basel 1977)

43.66 J.H. Holland: Adaptation in Natural and Artificial Systems (University of Michigan,
Ann Arbor 1975)

43.67 D.J. Montana: Strongly typed genetic programming, Evol. Comput. 3(2), 199-230 (1995),
available online at http://vishnu.bbn.com/papers/stgp.pdf

43.68 T. Yu: Hierachical processing for evolving recursive and modular programs using higher
order functions and lambda abstractions, Genet. Program. Evol. Mach. 2(4), 345-380 (2001)

43.69 R. Poli: Parallel distributed genetic programming. In: New Ideas in Optimization,
Advanced Topics in Computer Science, ed. by D. Corne, M. Dorigo, F. Glover (McGraw-Hill,
London 1999) pp. 403-431, Chapter 27, available online at
http://citeseer.ist.psu.edu/328504.html

43.70 J.F. Miller, P. Thomson: Cartesian genetic programming, Lect. Notes Comput. Sci. 1802,
121-132 (2000), available online at
http://www.elec.york.ac.uk/intsys/users/jfm7/cgp-eurogp2000.pdf

43.71 K.O. Stanley: Compositional pattern producing networks: A novel abstraction of
development, Genet. Program. Evol. Mach. 8(2), 131-162 (2007)

43.72 L.J. Fogel, P.J. Angeline, D.B. Fogel: An evolutionary programming approach to
self-adaptation on finite state machines, Proc. 4th Int. Conf. Evol. Program. (1995)
pp. 355-365

43.73 R.I. McKay, N.X. Hoai, P.A. Whigham, Y. Shan, M. O'Neill: Grammar-based genetic
programming: A survey, Genet. Program. Evol. Mach. 11(3/4), 365-396 (2010), September Tenth
Anniversary Issue: Progress in Genetic Programming and Evolvable Machines

43.74 M. Keijzer, V. Babovic: Dimensionally aware genetic programming, Proc. Genet. Evol.
Comput. Conf., Orlando, Vol. 2, ed. by W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon,
V. Honavar, M. Jakiela, R.E. Smith (1999) pp. 1069-1076, available online at
http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-420.ps


43.75 A. Ratle, M. Sebag: Grammar-guided genetic programming and dimensional consistency:
Application to non-parametric identification in mechanics, Appl. Soft Comput. 1(1), 105-118
(2001), available online at
http://www.sciencedirect.com/science/article/B6W86-43S6W98-B/1/38e0fa6ac503a5ef310e2287be0leffs8

43.76 P.A. Whigham: Grammatically-based genetic programming, Proc. Workshop Genet. Program.:
From Theory Real-World Appl., Tahoe City, ed. by J.P. Rosca (1995) pp. 33-41, available
online at http://divcom.otago.ac.nz/sirc/Peterw/Publications/m195.zip

43.77 M. O'Neill, C. Ryan: Grammatical Evolution: Evolutionary Automatic Programming in an
Arbitrary Language, Genetic Programming, Vol. 4 (Kluwer, Boston 2003), available online at
http://www.wkap.nl/prod/b/1-4020-7444-1

43.78 N. Xuan Hoai, R.I. McKay, D. Essam: Representation and structural difficulty in genetic
programming, IEEE Trans. Evol. Comput. 10(2), 157-166 (2006), available online at
http://sc.snu.ac.kr/courses/2006/fall/pg/aai/GP/nguyen/Structdiff.pdf

43.79 Y. Shan, R.I. McKay, D. Essam, H.A. Abbass: A survey of probabilistic model building
genetic programming. In: Scalable Optimization via Probabilistic Modeling: From Algorithms to
Applications, Studies in Computational Intelligence, Vol. 33, ed. by M. Pelikan, K. Sastry,
E. Cantu-Paz (Springer, Berlin, Heidelberg 2006) pp. 121-160, Chapter 6

43.80 M. Brameier, W. Banzhaf: Linear Genetic Programming, Genetic and Evolutionary
Computation, Vol. 16 (Springer, Berlin, Heidelberg 2007), available online at
http://www.springer.com/west/home/default?SGWID=4-40356-22-173660820-0

43.81 T. Perkis: Stack-based genetic programming, Proc. 1994 IEEE World Congr. Comput.
Intell., Orlando, Vol. 1 (1994) pp. 148-153, available online at
http://citeseer.ist.psu.edu/432690.html

43.82 W.B. Langdon: Large scale bioinformatics data mining with parallel genetic programming
on graphics processing units. In: Parallel and Distributed Computational Intelligence,
Studies in Computational Intelligence, Vol. 269, ed. by F. de Fernandez Vega, E. Cantu-Paz
(Springer, Berlin, Heidelberg 2010) pp. 113-141, Chapter 5, available online at
http://www.springer.com/engineering/book/978-3-642-10674-3

43.83 L. Spector, A. Robinson: Genetic programming and autoconstructive evolution with the
push programming language, Genet. Program. Evol. Mach. 3(1), 7-40 (2002), available online at
http://hampshire.edu/lspector/pubs/push-gpem-final.pdf

43.84 E. Schulte, S. Forrest, W. Weimer: Automated program repair through the evolution of
assembly code, Proc. IEEE/ACM Int. Conf. Autom. Softw. Eng. (2010) pp. 313-316

43.85 M. Orlov, M. Sipper: Flight of the FINCH through the Java wilderness, IEEE Trans. Evol.
Comput. 15(2), 166-182 (2011)

43.86 P. Nordin: A compiling genetic programming system that directly manipulates the machine
code. In: Advances in Genetic Programming, ed. by K.E. Kinnear Jr. (MIT Press, Cambridge
1994) pp. 311-331, Chapter 14, available online at
http://cognet.mit.edu/library/books/view?isbn=0262111888

43.87 U.-M. O'Reilly, F. Oppacher: Program search with a hierarchical variable length
representation: Genetic programming, simulated annealing and hill climbing, Lect. Notes
Comput. Sci. 866, 397-406 (1994), available online at
http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/ppsn-94.ps.gz

43.88 M. Tomassini: Spatially Structured Evolutionary Algorithms (Springer, Berlin,
Heidelberg 2005)

43.89 A. Arcuri, X. Yao: A novel co-evolutionary approach to automatic software bug fixing,
IEEE World Congr. Comput. Intell., Hong Kong, ed. by J. Wang (2008)

43.90 A. Moraglio, C. Di Chio, R. Poli: Geometric particle swarm optimization, Lect. Notes
Comput. Sci. 4445, 125-136 (2007)

43.91 M. O'Neill, A. Brabazon: Grammatical differential evolution, Proc. Int. Conf. Artif.
Intell. ICAI 2006, Las Vegas, Vol. 1, ed. by H.R. Arabnia (2006) pp. 231-236, available
online at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.3012

43.92 R. Poli, N.F. McPhee: A linear estimation-of-distribution GP system, Lect. Notes
Comput. Sci. 4971, 206-217 (2008)

43.93 M. Looks, B. Goertzel, C. Pennachin: Learning computer programs with the Bayesian
optimization algorithm, GECCO 2005: Proc. Conf. Genet. Evol. Comput., Washington, Vol. 1,
ed. by U.-M. O'Reilly, H.-G. Beyer (2005) pp. 747-748, available online at
http://www.cs.bham.ac.uk/~wbl/biblio/gecco2005/docs/p747.pdf

43.94 E. Hemberg, K. Veeramachaneni, J. McDermott, C. Berzan, U.-M. O'Reilly: An
investigation of local patterns for estimation of distribution genetic programming,
Philadelphia, Proc. GECCO 2012 (2012)

43.95 M. Schmidt, H. Lipson: Distilling free-form natural laws from experimental data,
Science 324(5923), 81-85 (2009), available online at
http://ccsl.mae.cornell.edu/sites/default/files/Science09_Schmidt.pdf

43.96 E.J. Vladislavleva, G.F. Smits, D. den Hertog: Order of nonlinearity as a complexity
measure for models generated by symbolic regression via Pareto genetic programming,
IEEE Trans. Evol. Comput. 13(2), 333-349 (2009)

43.97 K. Veeramachaneni, E. Vladislavleva, U.-M. O'Reilly: Knowledge mining sensory
evaluation data: Genetic programming, statistical techniques, and swarm optimization,
Genet. Program. Evolvable Mach. 13(1), 103-133 (2012)

43.98 M. Kotanchek, G. Smits, E. Vladislavleva: Pursuing the Pareto paradigm tournaments,
algorithm variations & ordinal optimization. In: Genetic Programming Theory and Practice IV,
Genetic and Evolutionary Computation, Vol. 5, ed. by R.L. Riolo, T. Soule, B. Worzel
(Springer, Berlin, Heidelberg 2006) pp. 167-186, Chapter 12

43.99 C. Le Goues, T. Nguyen, S. Forrest, W. Weimer: GenProg: A generic method for automated
software repair, IEEE Trans. Softw. Eng. 38(1), 54-72 (2011)

43.100 C. Ryan: Automatic Re-Engineering of Software Using Genetic Programming, Genetic
Programming, Vol. 2 (Kluwer, Boston 2000), available online at
http://www.wkap.nl/book.htm/0-7923-8653-1

43.101 K.P. Williams: Evolutionary Algorithms for Automatic Parallelization, Ph.D. Thesis
(University of Reading, Reading 1998)

43.102 M. Stephenson, S. Amarasinghe, M. Martin, U.-M. O'Reilly: Meta optimization: Improving
compiler heuristics with machine learning, Proc. ACM SIGPLAN Conf. Program. Lang. Des.
Implement. (PLDI '03), San Diego (2003) pp. 77-90

43.103 M. Stephenson, U.-M. O'Reilly, M.C. Martin, S. Amarasinghe: Genetic programming
applied to compiler heuristic optimization, Lect. Notes Comput. Sci. 2610, 238-253 (2003)

43.104 D.R. White, A. Arcuri, J.A. Clark: Evolutionary improvement of programs, IEEE Trans.
Evol. Comput. 15(4), 515-538 (2011)

43.105 M. Harman: The current state and future of search based software engineering,
Proc. Future of Software Engineering FOSE '07, Washington, ed. by L. Briand, A. Wolf (2007)
pp. 342-357

43.106 J.R. Koza, F.H. Bennett III, D. Andre, M.A. Keane, F. Dunlap: Automated synthesis of
analog electrical circuits by means of genetic programming, IEEE Trans. Evol. Comput. 1(2),
109-128 (1997), available online at
http://www.genetic-programming.com/jkpdf/ieeetecjournal1997.pdf

43.107 R. Dawkins: The Blind Watchmaker (Norton, New York 1986)

43.108 H. Takagi: Interactive evolutionary computation: Fusion of the capabilities of EC
optimization and human evaluation, Proc. IEEE 89(9), 1275-1296 (2001)

43.109 S. Todd, W. Latham: Evolutionary Art and Computers (Academic, Waltham 1994)

43.110 M. Lewis: Evolutionary visual art and design. In: The Art of Artificial Evolution:
A Handbook on Evolutionary Art and Music, ed. by J. Romero, P. Machado (Springer, Berlin,
Heidelberg 2008) pp. 3-37


43.111 J. McDermott, J. Byrne, J.M. Swafford, M. O'Neill, A. Brabazon: Higher-order functions
in aesthetic EC encodings, 2010 IEEE World Congr. Comput. Intell., Barcelona (2010),
pp. 2816-2823, 18-23 July

43.112 D.A. Hart: Toward greater artistic control for interactive evolution of images and
animation, Lect. Notes Comput. Sci. 4448, 527-536 (2007)

43.113 A.K. Hoover, M.P. Rosario, K.O. Stanley: Scaffolding for interactively evolving novel
drum tracks for existing songs, Lect. Notes Comput. Sci. 4974, 412 (2008)

43.114 J. Shao, J. McDermott, M. O'Neill, A. Brabazon: Jive: A generative, interactive,
virtual, evolutionary music system, Lect. Notes Comput. Sci. 6025, 341-350 (2010)

43.115 J. McDermott, U.-M. O'Reilly: An executable graph representation for evolutionary
generative music, Proc. GECCO 2011 (2011) pp. 403-410

43.116 J. Clune, H. Lipson: Evolving three-dimensional objects with a generative encoding
inspired by developmental biology, Proc. Eur. Conf. Artif. Life (2011), available online at
http://endlessforms.com

43.117 J. McCormack: Evolutionary L-systems. In: Design by Evolution: Advances in
Evolutionary Design, ed. by P.F. Hingston, L.C. Barone, Z. Michalewicz, D.B. Fogel
(Springer, Berlin, Heidelberg 2008) pp. 169-196

43.118 G.S. Hornby, J.B. Pollack: Evolving L-systems to generate virtual creatures,
Comput. Graph. 25(6), 1041-1048 (2001)

43.119 P. Worth, S. Stepney: Growing music: Musical interpretations of L-systems, Lect. Notes
Comput. Sci. 3449, 545-550 (2005)

43.120 J. McDermott, J. Byrne, J.M. Swafford, M. Hemberg, C. McNally, E. Shotton, E. Hemberg,
M. Fenton, M. O'Neill: String-rewriting grammars for evolutionary architectural design,
Environ. Plan. B 39(4), 713-731 (2012), available online at
http://www.envplan.com/abstract.cgi?id=b38037

43.121 W. Banzhaf, W.B. Langdon: Some considerations on the reason for bloat, Genet. Program.
Evol. Mach. 3(1), 81-91 (2002), available online at
http://web.cs.mun.ca/~banzhaf/papers/genp_bloat.pdf

43.122 S. Luke, L. Panait: A comparison of bloat control methods for genetic programming,
Evol. Comput. 14(3), 309-344 (2006)

43.123 S. Silva, S. Dignum, L. Vanneschi: Operator equalisation for bloat free genetic
programming and a survey of bloat control methods, Genet. Program. Evol. Mach. 3(2), 197-238
(2011)

43.124 W.B. Langdon, R. Poli: Fitness causes bloat. In: Soft Computing in Engineering Design
and Manufacturing, ed. by P.K. Chawdhry, R. Roy, R.K. Pant (Springer, London 1997) pp. 13-22,
available online at http://www.cs.bham.ac.uk/~wbl/ftp/papers/WBL.bloat_wsc2.ps.gz


43.125 S. Dignum, R. Poli: Generalisation of the limiting distribution of program sizes in
tree-based genetic programming and analysis of its effects on bloat, GECCO '07 Proc. 9th
Annu. Conf. Genet. Evol. Comput., London, Vol. 2, ed. by H. Lipson, D. Thierens (2007)
pp. 1588-1595, available online at
http://www.cs.bham.ac.uk/~wbl/biblio/gecco2007/docs/p1588.pdf

43.126 W.B. Langdon: How many good programs are there? How long are they?, Found. Genet.
Algorithms VII, San Francisco, ed. by K.A. De Jong, R. Poli, J.E. Rowe (2002), pp. 183-202,
available online at http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/wbl_foga2002.pdf

43.127 R. Poli, M. Salvaris, C. Cinel: Evolution of an effective brain-computer interface
mouse via genetic programming with adaptive Tarpeian bloat control. In: Genetic Programming
Theory and Practice IX, ed. by R. Riolo, K. Vladislavleva, J. Moore (Springer, Berlin,
Heidelberg 2011) pp. 77-95

43.128 G. Smits, E. Vladislavleva: Ordinal pareto genetic programming, Proc. 2006 IEEE Congr.
Evol. Comput., Vancouver, ed. by G.G. Yen, S.M. Lucas, G. Fogel, G. Kendall, R. Salomon,
B.-T. Zhang, C.A. Coello Coello, T.P. Runarsson (2006) pp. 3114-3120, available online at
http://ieeexplore.ieee.org/servlet/opac?punumber=11108

43.129 L. Vanneschi, M. Castelli, S. Silva: Measuring bloat, overfitting and functional
complexity in genetic programming, GECCO '10: Proc. 12th Annu. Conf. Genet. Evol. Comput.,
Portland (2010) pp. 877-884

43.130 H. Iba, H. de Garis, T. Sato: Genetic programming using a minimum description length
principle. In: Advances in Genetic Programming, ed. by K.E. Kinnear Jr. (MIT Press, Cambridge
1994) pp. 265-284, available online at http://citeseer.ist.psu.edu/327857.html, Chapter 12

43.131 J. Miller: What bloat? Cartesian genetic programming on boolean problems, 2001 Genet.
Evol. Comput. Conf. Late Break. Pap., ed. by E.D. Goodman (2001) pp. 295-302, available
online at http://www.elec.york.ac.uk/intsys/users/jfm7/gecco2001Late.pdf

43.132 R. Poli, J. Page, W.B. Langdon: Smooth uniform crossover, sub-machine code GP and
demes: A recipe for solving high-order Boolean parity problems, Proc. Genet. Evol. Comput.
Conf., Orlando, Vol. 2, ed. by W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar,
M. Jakiela, R.E. Smith (1999) pp. 1162-1169, available online at
http://www.cs.bham.ac.uk/~wbl/biblio/gecco1999/GP-466.pdf

43.133 M. Keijzer: Alternatives in subtree caching for genetic programming, Lect. Notes
Comput. Sci. 3003, 328-337 (2004), available online at
http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=3003&spage=328


43.134 R. Poli, W.B. Langdon: Running genetic programming backward. In: Genetic Programming
Theory and Practice III, Genetic Programming, Vol. 9, ed. by T. Yu, R.L. Riolo, B. Worzel
(Springer, Berlin, Heidelberg 2005) pp. 125-140, Chapter 9, available online at
http://www.cs.essex.ac.uk/staff/poli/papers/GPTP2005.pdf

43.135 Q.U. Nguyen, X.H. Nguyen, M. O'Neill, R.I. McKay, E. Galván-López: Semantically-based
crossover in genetic programming: Application to real-valued symbolic regression,
Genet. Program. Evol. Mach. 12, 91-119 (2011)

43.136 L. Altenberg: Emergent phenomena in genetic programming, Evol. Progr. - Proc. 3rd
Annu. Conf., San Diego, ed. by A.V. Sebald, L.J. Fogel (1994) pp. 233-241, available online
at http://dynamics.org/~altenber/PAPERS/EPIGP/

43.137 U.-M. O'Reilly, F. Oppacher: The troubling aspects of a building block hypothesis for
genetic programming, Working Paper 94-02-001 (Santa Fe Institute, Santa Fe 1992)

43.138 R. Poli, W.B. Langdon: A new schema theory for genetic programming with one-point
crossover and point mutation, Proc. Second Annu. Conf. Genet. Progr. 1997, Stanford, ed. by
J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, H. Iba, R.L. Riolo (1997) pp. 278-285,
available online at http://citeseer.ist.psu.edu/327495.html

43.139 J.P. Rosca: Analysis of complexity drift in genetic programming, Proc. 2nd Annu. Conf.
Genet. Program. 1997, Stanford, ed. by J.R. Koza, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon,
H. Iba, R.L. Riolo (1997), pp. 286-294, available online at
ftp://ftp.cs.rochester.edu/pub/u/rosca/gp/97.gp.ps.gz

43.140 R. Poli, N.F. McPhee: General schema theory for genetic programming with
subtree-swapping crossover: Part I, Evol. Comput. 11(1), 53-66 (2003), available online at
http://cswww.essex.ac.uk/staff/rpoli/papers/ecj2003partl.pdf

43.141 R. Poli, N.F. McPhee: General schema theory for genetic programming with
subtree-swapping crossover: Part II, Evol. Comput. 11(2), 169-206 (2003), available online at
http://cswww.essex.ac.uk/staff/rpoli/papers/ecj2003partll.pdf

43.142 R. Poli, N.F. McPhee, J.E. Rowe: Exact schema theory and Markov chain models for
genetic programming and variable-length genetic algorithms with homologous crossover,
Genet. Program. Evol. Mach. 5(1), 31-70 (2004), available online at
http://cswww.essex.ac.uk/staff/rpoli/papers/GPEM2004.pdf

43.143 N.F. McPhee, B. Ohs, T. Hutchison: Semantic building blocks in genetic programming,
Lect. Notes Comput. Sci. 4971, 134-145 (2008)

43.144 G. Durrett, F. Neumann, U.-M. O'Reilly: Computational complexity analysis of simple
genetic programming on two problems modeling isolated program semantics, Proc. 11th Workshop
Found. Genet. Algorithm. (ACM, New York 2011) pp. 69-80, available online at
http://arxiv.org/pdf/1007.4636v1, arXiv:1007.4636v1

43.145 D.E. Goldberg, U.-M. O'Reilly: Where does the good stuff go, and why? How contextual
semantics influence program structure in simple genetic programming, Lect. Notes Comput. Sci.
1391, 16-36 (1998), available online at http://citeseer.ist.psu.edu/96596.html

43.146 D.E. Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning
(Addison-Wesley, Reading 1989)

43.147 T. Kötzing, F. Neumann, A. Sutton, U.-M. O'Reilly: The max problem revisited: The
importance of mutation in genetic programming, GECCO '12 Proc. 14th Annu. Conf. Genet. Evolut.
Comput. (ACM, New York 2012) pp. 1333-1340

43.148 C. Gathercole, P. Ross: The Max Problem for Genetic Programming - Highlighting an
Adverse Interaction Between the Crossover Operator and a Restriction on Tree Depth, Technical
Report (Department of Artificial Intelligence, University of Edinburgh, Edinburgh 1995)
available online at http://citeseer.ist.psu.edu/gathercole95max.html

43.149 F. Neumann: Computational complexity analysis of multi-objective genetic programming,
GECCO '12 Proc. 14th Annu. Conf. Genet. Evolut. Comput. (ACM, New York 2012) pp. 799-806

43.150
43.151
43.152
43.153
43.154
43.155
43.156
43.157

T. Kötzing, F. Neumann, R. Spöhel: PAC learn- 
ing and genetic programming, Proc. 13th Annu. 
Conf. Genet. Evol. Comput. (ACM, New York 2011) 
pp. 2091-2096 

F. Rothlauf: Representations for Genetic and Evo- 
lutionary Algorithms, 2nd edn. (Physica, Heidel- 
berg 2006) 

T. Jones: Evolutionary Algorithms, Fitness Land- 
scapes and Search, Ph.D. Thesis (University of New 
Mexico, Albuquerque 1995) 

J. McDermott, E. Galván-Lopéz, M. O'Neill: A fine- 
grained view of phenotypes and locality in ge- 
netic programming. In: Genetic Programming 
Theory and Practice, Vol. 9, ed. by R. Riolo, 
K. Vladislavleva, J. Moore (Springer, Berlin, Hei- 
delberg 2011) 

M. Tomassini, L. Vanneschi, P. Collard, M. Clergue: 
A study of fitness distance correlation as a dif- 
ficulty measure in genetic programming, Evol. 
Comput. 13(2), 213-239 (2005) 

L. Vanneschi: Theory and Practice for Efficient Ge- 
netic Programming, Ph.D. Thesis (Université de 
Lausanne, Lausanne 2004) 

L. Vanneschi, M. Tomassini, P. Collard, S. Verel, 
Y. Pirola, G. Mauri: A comprehensive view of 
fitness landscapes with neutrality and fitness 
clouds, Lect. Notes Comput. Sci. 4445, 241-250 
(2007) 

A. Ekart, S.Z. Németh: A metric for genetic pro- 
grams and fitness sharing, Lect. Notes Comput. 
Sci. 1802, 259-270 (2000) 


43.158 


43.159 


43.160 


43.161 


43.162 


43.163 


43.164 


43.165 


43.166 


43.167 


43.168 


43.169 


S. Gustafson, L. Vanneschi: Crossover-based tree 
distance in genetic programming, IEEE Trans. Evol. 
Comput. 12(4), 506-524 (2008) 

J. McDermott, U.-M. O'Reilly, L. Vanneschi, 
K. Veeramachaneni: How far is it from here to 
there? A distance that is coherent with GP op- 
erators, Lect. Notes Comput. Sci. 6621, 190-202 
(2011) 

U.-M. O'Reilly: Using a distance metric on genetic 
programs to understand genetic operators, Int. 
Conf. Syst. Man Cybern. Comput. Cybern. Simul. 
(1997) pp. 233-241 

D.H. Wolpert, W.G. Macready: No free lunch the- 
orems for optimization, Evol. Comput. IEEE Trans. 
1(1), 67-82 (1997) 

C. Schumacher, M.D. Vose, L.D. Whitley: The no 
free lunch and problem description length, Proc. 
Genet. Evol. Comput. Conf. GECCO-2001 (2001) 


pp. 565-570 
J.R. Woodward, J.R. Neil: No free lunch, 
program induction and combinatorial prob- 


lems, Lect. Notes Comput. Sci. 2610, 475-484 
(2003) 

R. Poli, M. Graff, N.F. McPhee: Free lunches for 
function and program induction, FOGA '09: Proc. 
10th ACM SIGEVO Workshop Found. Genet. Algo- 
rithms, Orlando (2009) pp. 183-194 

L. Spector: Simultaneous evolution of pro- 
grams and their control structures. In: Ad- 
vances in Genetic Programming, Vol. 2, ed. 
by P.J. Angeline, K.E. Kinnear Jr. (MIT, Cam- 
bridge 1996) pp. 137-154, Chapter 7, available 
online at http://helios.hampshire.edu/Ispector/ 
pubs/AiGP2- post-final-e.pdf 

L. Spector, B. Martin, K. Harrington, T. Hel- 
muth: Tag-based modules in genetic program- 
ming, Proc. Genet. Evol. Comput. Conf. GECCO-2011 
(2011) 

G.S. Hornby: Measuring, nabling and compar- 
ing modularity, regularity and hierarchy in evo- 
lutionary design, GECCO 2005: Proc. 2005 Conf. 
Genet. Evol. Comput., Washington, Vol. 2, ed. 
by H.-G. Beyer, U.-M. O'Reilly, D.V. Arnold, 
W. Banzhaf, C. Blum, E.W. Bonabeau, E. Cantu- 
Paz, D. Dasgupta, K. Deb, J.A. Foster, E.D. de 
Jong, H. Lipson, X. Llora, S. Mancoridis, M. Pe- 
likan, G.R. Raidl, T. Soule, A.M. Tyrrell, J.-P. Wat- 
son, E. Zitzler (2005) pp. 1729-1736, available on- 
line at http://www.cs.bham.ac.uk/~wbl/biblio/ 
gecco2005/docs/p1729.pdf 

J.H. Moore, C.S. Greene, P.C. Andrews, B.C. White: 
Does complexity matter? Artificial evolution, 
computational evolution and the genetic anal- 
ysis of epistasis in common human diseases. In: 
Genetic Programming Theory and Practice Vol. VI, 
ed. by R.L. Riolo, T. Soule, B. Worzel (Springer, 
Berlin, Heidelberg 2008) pp. 125-145, Chap. 9 

S. Gustafson: An Analysis of Diversity in Genetic 
Programming, Ph.D. Thesis (School of Computer 


Genetic Programming 


References 


43.170 


43.171 


43.172 


43.173 


43.174 
43.175 


43.176 


43.177 


43.178 


43.179 
43.180 


43.181 


43.182 


43.183 


43.184 


43.185 


Science and Information Technology, University 
of Nottingham, Nottingham 2004), available on- 
line at http://www.cs.nott.ac.uk/~smg/research/ 
publications/phdthesis- gustafson. pdf 

G.S. Hornby: A steady-state version of the age- 
layered population structure EA. In: Genetic Pro- 
gramming Theory and Practice, Vol. VII, Genetic 
and Evolutionary Computation, ed. by R.L. Riolo, 
U.-M. O'Reilly, T. McConaghy (Springer, Ann Arbor 
2009) pp. 87-102, Chap. 6 

J.C. Bongard: Coevolutionary dynamics of a multi- 
population genetic programming system, Lect. 
Notes Comput. Sci. 1674, 154 (1999), available 
online at http://www.cs.uvm.edu/~jbongard/ 
papers/s067. ps.gz 

|. Dempsey, M. O'Neill, A. Brabazon: Founda- 
tions in Grammatical Evolution for Dynamic 
Environments, Studies in Computational Intel- 
ligence, Vol. 194 (Springer, Berlin, Heidelberg 
2009), available online at http://www.springer. 
com/engineering/book/978-3-642-00313-4 

J. Doucette, P. Lichodzijewski, M. Heywood: 
Evolving coevolutionary classifiers under large 
attribute spaces. In: Genetic Programming The- 
ory and Practice Vol. VII, ed. by R.L. Riolo, U.- 
M. O'Reilly, T. McConaghy (Springer, Berlin, Hei- 
delberg 2009) pp. 37-54, Chap. 3 

S. Luke: http://cs.gmu.edu/~eclab/projects/ecj/ 
S. Luke: The EC) Owner's Manual - A User Man- 
ual for the EC) Evolutionary Computation Library, 
Oth edn. online version 0.2 edition, available on- 
line at http://www.cs.gmu.edu/~eclab/projects/ 
ecj/docs/manual/manual.pdf 

D.W. Dyer: 
https://github.com/dwdyer/watchmaker 

E. Hemberg, M. O'Neill: http://ncra.ucd.ie/Site/ 
GEVA.html 

M. O'Neill, E. Hemberg, C. Gilligan, E. Bartley, 
J. McDermott, A. Brabazon: GEVA: Grammatical 
evolution in Java, SIGEVOlution 3(2), 17-22 (2008), 
available online at http://www.sigevolution.org/ 
issues/pdf/SIGEVOlution200802. pdf 

J. Dréo: http://eodev.sourceforge.net/ 

G. Squillero: http://www.cad.polito.it/research/ 
Evolutionary_Computation/MicroGP/index.html 
M. Schillaci, E.E. Sanchez Sanchez: A brief survey 
of uGP, SIGEvolution 1(2), 17-21 (2006) 

G. Squillero: MicroGP - an evolutionary assembly 
program generator, Genet. Program. Evol. Mach. 
6(3), 247-263 (2005), Published online: 17 August 
2005. 

C. Gagné, M. Parizeau: http://beagle.sourceforge. 
net/ 

C. Gagné, M. Parizeau: Open BEAGLE A C++ frame- 
work for your favorite evolutionary algorithm, 
SIGEvolution 1(1), 12-15 (2006), available online at 
http://www. sigevolution.org/2006/01/issue. pdf 
S. Silva: http://gplab.sourceforge.net/ 


43 


43 


43 


43 


43 


43 


43 
43 


43 


43 


43 


43 
43 


43 


43. 


43. 


43. 


43. 


.186 


.187 


.188 


.189 


.190 


.191 


.192 
.193 


.194 


.195 


.196 


.197 
.198 


.199 


200 


201 


202 


203 


204 


F.M. De Rainville, F.-A. Fortin: http://code.google. 
com/p/deap/ 

L. Spector: http://hampshire.edu/Ispector/push. 
html 


R. Poli: http://cswww.essex.ac.uk/staff/rpoli/ 
TinyGP/ 

E. Hemberg, J. McDermott: http://code.google. 
com/p/ponyge/ 

H. Lipson: http://creativemachines.cornell.edu/ 
eureqa 


M.E. Kotanchek, E. Vladislavieva, G.F. Smits: 
http://www. evolved-analytics.com/ 

P. Nordin: http://www.rmitech.com/ 

P. Nordin, W. Banzhaf, F.D. Francone: Efficient 
evolution of machine code for CISC architec- 
tures using instruction blocks and homologous 
crossover. In: Advances in Genetic Programming, 
Vol. 3, ed. by L. Spector, W.B. Langdon, U.- 
M. O'Reilly, P.J. Angeline (MIT, Cambridge 1999) 
pp. 275-299, Chap. 12, available online at http:// 
www.aimlearning.com/aigp31.pdf 

W.B. Langdon: http://www.cs.bham.ac.uk/~wbl/ 
biblio/ 

W.B. Langdon: http://www.cs.ucl.ac.uk/staff/W. 
Langdon/homepages.html 

Genetic Programming Yahoo Group: http:// 
groups.yahoo.com/group/genetic_programming/ 
J. McDermott, D. White: http://gpbenchmarks.org 
J. McDermott, D.R. White, S. Luke, L. Manzoni, 
M. Castelli, L. Vanneschi, W. JaSkowski, K. Krawiec, 
R. Harper, K. De Jong, U.-M. O'Reilly: Genetic pro- 
gramming needs better benchmarks, Proc. GECCO 
2012, Philadelphia (2012) 

R. Poli, W.B. Langdon, N.F. McPhee: A Field Guide 
to Genetic Programming (Lulu, Raleigh 2008), 
Published via http://lulu.com and available at 
http://www.gp-field-guide.org.uk (With contri- 
butions by J. R. Koza) 

S. Luke: Essentials of Metaheuristics, 1st edn. 
(Lulu, Raleigh 2009), available online at http://cs. 
gmu.edu/~sean/books/metaheuristics/ 

W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone: 
Genetic Programming - An Introduction; On 
the Automatic Evolution of Computer Programs 
and Its Applications (Morgan Kaufmann, San 
Francisco 1998), available online at http://www. 
elsevier.com/wps/find/bookdescription.cws_ 
home/677869/description#description 

W.B. Langdon, R. Poli: Foundations of Genetic 
Programming (Springer, Berlin, Heidelberg 2002), 
available online at http://www.cs.ucl.ac.uk/staff/ 
W.Langdon/FOGP/ 

H. Iba, Y. Hasegawa, T. Kumar Paul: Applied Ge- 
netic Programming and Machine Learning, CRC 
Complex and Enterprise Systems Engineering (CRC, 
Boca Raton 2009) 

M. Sipper: Evolved to Win (Lulu, Raleigh 2011), 
available at http://www.lulu.com/ 


869 


€} | 3 Hed 


44. Evolution Strategies

Nikolaus Hansen, Dirk V. Arnold, Anne Auger

Evolution strategies (ES) are evolutionary algorithms that date back to the 1960s and that are most commonly applied to black-box optimization problems in continuous search spaces. Inspired by biological evolution, their original formulation is based on the application of mutation, recombination and selection in populations of candidate solutions. From the algorithmic viewpoint, ES are optimization methods that sample new candidate solutions stochastically, most commonly from a multivariate normal probability distribution. Their two most prominent design principles are unbiasedness and adaptive control of parameters of the sample distribution. In this overview, the important concepts of success-based step-size control, self-adaptation, and derandomization are covered, as well as more recent developments such as covariance matrix adaptation and natural ES. The latter give new insights into the fundamental mathematical rationale behind ES. A broad discussion of theoretical results includes progress rate results on various function classes and convergence proofs for evolution strategies.

44.1 Overview
44.2 Main Principles
    44.2.1 Environmental Selection
    44.2.2 Mating Selection and Recombination
    44.2.3 Mutation and Parameter Control
    44.2.4 Unbiasedness
    44.2.5 $(\mu/\rho \overset{+}{,} \lambda)$ Notation for Selection and Recombination
    44.2.6 Two Algorithm Templates
    44.2.7 Recombination Operators
    44.2.8 Mutation Operators
44.3 Parameter Control
    44.3.1 The 1/5th Success Rule
    44.3.2 Self-Adaptation
    44.3.3 Derandomized Self-Adaptation
    44.3.4 Nonlocal Derandomized Step-Size Control (CSA)
    44.3.5 Addressing Dependences Between Variables
    44.3.6 Covariance Matrix Adaptation (CMA)
    44.3.7 Natural Evolution Strategies
    44.3.8 Further Aspects
44.4 Theory
    44.4.1 Lower Runtime Bounds
    44.4.2 Progress Rates
    44.4.3 Convergence Proofs
References

44.1 Overview

Evolution strategies [44.1-4], sometimes also referred to as evolutionary strategies, and evolutionary programming [44.5] are search paradigms inspired by the principles of biological evolution. They belong to the family of evolutionary algorithms that address optimization problems by implementing a repeated process of (small) stochastic variations followed by selection. In each generation (or iteration), new offspring (or candidate solutions) are generated from their parents (candidate solutions already visited), their fitness is evaluated, and the better offspring are selected to become the parents for the next generation.

ES most commonly address the problem of continuous black-box optimization. The search space is the continuous domain, $\mathbb{R}^n$, and solutions in search space are $n$-dimensional vectors, denoted as $x$. We consider an objective or fitness function $f : \mathbb{R}^n \to \mathbb{R}$, $x \mapsto f(x)$ to be minimized. We make no specific assumptions on $f$, other than that $f$ can be evaluated for each $x$, and refer to this search problem as black-box optimization.




The objective is, loosely speaking, to generate solutions ($x$-vectors) with small $f$-values while using a small number of $f$-evaluations. Formally, we would like to converge to an essential global optimum of $f$, in the sense that the best $f(x)$ value gets arbitrarily close to the essential infimum of $f$ (i.e., the smallest $f$-value for which all larger, i.e., worse $f$-values have sublevel sets with positive volume).

In this context, we present an overview of methods that sample new offspring, or candidate solutions, from normal distributions. Naturally, such an overview is biased by the authors' viewpoints, and our emphasis will be on important design principles and on contemporary ES that we consider most relevant in practice or future research. More comprehensive historical overviews can be found elsewhere [44.6, 7].

In the next section, the main principles are intro- 
duced and two algorithm templates for an evolution 
strategy are presented. Section 44.3 presents six ES that 
mark important conceptual and algorithmic develop- 
ments. Section 44.4 summarizes important theoretical 
results. 


44.1.1 Symbols and Abbreviations 


Throughout this chapter, vectors like $z \in \mathbb{R}^n$ are column vectors, their transpose is denoted as $z^T$, and transformations like $\exp(z)$, $z^2$, or $|z|$ are applied component-wise. Further symbols are:


• $|z| = (|z_1|, |z_2|, \dots)^T$ absolute value taken component-wise
• $\|z\| = \sqrt{\sum_i z_i^2}$ Euclidean length of a vector
• $\sim$ equality in distribution
• $\propto$ in the limit proportional to
• $\circ$ binary operator giving the component-wise product of two vectors or matrices (Hadamard product), such that for $a, b \in \mathbb{R}^n$ we have $a \circ b \in \mathbb{R}^n$ and $(a \circ b)_i = a_i b_i$.
• $1_{\cdot}$ the indicator function, $1_\alpha = 0$ if $\alpha$ is false or 0 or empty, and $1_\alpha = 1$ otherwise.
• $\lambda \in \mathbb{N}$ number of offspring, offspring population size
• $\mu \in \mathbb{N}$ number of parents, parental population size
• $\mu_w = \left(\sum_{k=1}^{\mu} |w_k|\right)^2 / \sum_{k=1}^{\mu} w_k^2$, the variance effective selection mass or effective number of parents, where always $\mu_w \le \mu$ and $\mu_w = \mu$ if all recombination weights $w_k$ are equal in absolute value
• $(1+1)$ elitist selection scheme with one parent and one offspring, see Sect. 44.2.5
• $(\mu \overset{+}{,} \lambda)$, e.g., $(1+1)$ or $(1, \lambda)$, selection schemes, see Sect. 44.2.5
• $(\mu/\rho, \lambda)$ selection scheme with recombination (if $\rho > 1$), see Sect. 44.2.5
• $\rho \in \mathbb{N}$ number of parents for recombination
• $\sigma > 0$ a step-size and/or standard deviation
• $\boldsymbol{\sigma} \in \mathbb{R}^n_+$ a vector of step-sizes and/or standard deviations
• $\varphi \in \mathbb{R}$ a progress measure, see Definition 44.2 and Sect. 44.4.2
• $c_{\mu/\mu,\lambda}$ the progress coefficient for the $(\mu/\mu, \lambda)$-ES [44.8] equals the expected value of the average of the largest $\mu$ order statistics of $\lambda$ independent standard normally distributed random numbers and is of the order of $\sqrt{2 \log(\lambda/\mu)}$.
• $C \in \mathbb{R}^{n \times n}$ a (symmetric and positive definite) covariance matrix
• $C^{1/2} \in \mathbb{R}^{n \times n}$ a matrix that satisfies $C^{1/2} {C^{1/2}}^T = C$ and is symmetric if not stated otherwise. If $C^{1/2}$ is symmetric, the eigendecomposition $C^{1/2} = B \Lambda B^T$ with $B B^T = I$ and the diagonal matrix $\Lambda$ exists and we find $C = C^{1/2} C^{1/2} = B \Lambda^2 B^T$ as eigendecomposition of $C$.
• $e_i$ the $i$-th canonical basis vector
• $f : \mathbb{R}^n \to \mathbb{R}$ fitness or objective function to be minimized
• $I \in \mathbb{R}^{n \times n}$ the identity matrix (identity transformation)
• i.i.d. independent and identically distributed
• $\mathcal{N}(x, C)$ a multivariate normal distribution with expectation and modal value $x$ and covariance matrix $C$, see Sect. 44.2.8.
• $n \in \mathbb{N}$ search space dimension
• $P$ a multiset of individuals, a population
• $\boldsymbol{s}, \boldsymbol{s}_\sigma, \boldsymbol{s}_c \in \mathbb{R}^n$ a search path or evolution path
• $s, s_k$ endogenous strategy parameters (also known as control parameters) of a single parent or the $k$-th offspring; they typically parametrize the mutation, for example with a step-size $\sigma$ or a covariance matrix $C$
• $t \in \mathbb{N}$ time or iteration index
• $w_k \in \mathbb{R}$ recombination weights
• $x, x^{(t)}, x_k \in \mathbb{R}^n$ solution or object parameter vector of a single parent (at iteration $t$) or of the $k$-th offspring; an element of the search space $\mathbb{R}^n$ that serves as argument to the fitness function $f : \mathbb{R}^n \to \mathbb{R}$.
• $\mathrm{diag} : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ the diagonal matrix from a vector
• $\exp^\alpha : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$, $A \mapsto \sum_{i=0}^{\infty} (\alpha A)^i / i!$ is the matrix exponential for $n > 1$, otherwise the exponential function. If $A$ is symmetric and $B \Lambda B^T = A$ is the eigendecomposition of $A$ with $B B^T = I$ and $\Lambda$ diagonal, we have $\exp(A) = B \exp(\Lambda) B^T = B \left(\sum_{i=0}^{\infty} \Lambda^i / i!\right) B^T = I + B \Lambda B^T + B \Lambda^2 B^T / 2 + \cdots$. Furthermore, we have $\exp^\alpha(A) = \exp(A)^\alpha = \exp(\alpha A)$ and $\exp^\alpha(x) = (e^\alpha)^x = e^{\alpha x}$.

44.2 Main Principles

ES derive inspiration from principles of biological evolution. We assume a population, $P$, of so-called individuals. Each individual consists of a solution or object parameter vector $x \in \mathbb{R}^n$ (the visible traits) and further endogenous parameters, $s$ (the hidden traits), and an associated fitness value, $f(x)$. In some cases, the population contains only one individual. Individuals are also denoted as parents or offspring, depending on the context. In a generational procedure:

1. One or several parents are picked from the population (mating selection) and new offspring are generated by duplication and recombination of these parents.
2. The new offspring undergo mutation and become new members of the population.
3. Environmental selection reduces the population to its original size.

Within this procedure, ES employ the following main principles that are specified and applied in the operators and algorithms further below.

44.2.1 Environmental Selection

Environmental selection is applied as so-called truncation selection. Based on the individuals' fitnesses, $f(x)$, only the $\mu$ best individuals from the population survive. In contrast to roulette wheel selection in genetic algorithms [44.9], only fitness ranks are used. In evolution strategies, environmental selection is deterministic. In evolutionary programming, like in many other evolutionary algorithms, environmental selection has a stochastic component. Environmental selection can also remove overaged individuals first.

44.2.2 Mating Selection and Recombination

Mating selection picks individuals from the population to become new parents. Recombination generates a single new offspring from these parents. Specifically, we differentiate two common scenarios for mating selection and recombination:


• Fitness-independent mating selection and recombination do not depend on the fitness values of the individuals and can be either deterministic or stochastic. Environmental selection is then essential to drive the evolution toward better solutions.
• Fitness-based mating selection and recombination, where the recombination operator utilizes the fitness ranking of the parents (in a deterministic way). Environmental selection can potentially be omitted in this case.


44.2.3 Mutation and Parameter Control 


Mutation introduces small, random, and unbiased changes to an individual. These changes typically affect all variables. The average size of these changes depends on endogenous parameters that change over time. These parameters are also called control parameters, or endogenous strategy parameters, and define the notion of small, for example, via the step-size $\sigma$. In contrast, exogenous strategy parameters are fixed once and for all, for example, the parent number $\mu$. Parameter control is not always directly inspired by biological evolution, but is an indispensable and central feature of evolution strategies.


44.2.4 Unbiasedness 


Unbiasedness is a generic design principle of evolution strategies. Variation resulting from mutation or recombination is designed to introduce new, unbiased information. Selection, on the other hand, biases this information toward solutions with better fitness. Under neutral selection (i.e., fitness-independent mating and environmental selection), all variation operators are desired to be unbiased. Maximum exploration and unbiasedness are in accord. ES are unbiased in the following respects:


• The type of mutation distribution, the Gaussian or normal distribution, is chosen in order to have rotational symmetry and maximum entropy (maximum exploration) under the given variances. Decreasing the entropy would introduce prior information and therefore a bias.
• Object parameters and endogenous strategy parameters are unbiased under recombination and unbiased under mutation. Typically, mutation has expectation zero.
• Invariance properties avoid a bias toward a specific representation of the fitness function, e.g., representation in a specific coordinate system or using specific fitness values (invariance to strictly monotonic transformations of the fitness values can be achieved). Parameter control in evolution strategies strives for invariance properties [44.10].

44.2.5 $(\mu/\rho \overset{+}{,} \lambda)$ Notation for Selection and Recombination

An evolution strategy is an iterative (generational) procedure. In each generation new individuals (offspring) are created from existing individuals (parents). A mnemonic notation is commonly used to describe some aspects of this iteration. The $(\mu/\rho \overset{+}{,} \lambda)$-ES, where $\mu$, $\rho$ and $\lambda$ are positive integers, also frequently denoted as $(\mu \overset{+}{,} \lambda)$-ES (where $\rho$ remains unspecified), describes the following:

• The parent population contains $\mu$ individuals.
• For recombination, $\rho$ (out of $\mu$) parent individuals are used. We have therefore $\rho \le \mu$.
• $\lambda$ denotes the number of offspring generated in each iteration.
• $\overset{+}{,}$ describes whether or not selection is additionally based on the individuals' age. An evolution strategy applies either plus- or comma-selection. In plus-selection, age is not taken into account and the $\mu$ best of $\mu + \lambda$ individuals are chosen. Selection is elitist and, in effect, the parents are the $\mu$ all-time best individuals. In comma-selection, individuals die out after one iteration step and only the offspring (the youngest individuals) survive to the next generation. In that case, environmental selection chooses $\mu$ parents from $\lambda$ offspring.

In a $(\mu, \lambda)$-ES, $\lambda \ge \mu$ must hold and the case $\lambda = \mu$ requires fitness-based mating selection or recombination. In a $(\mu + \lambda)$-ES, $\lambda = 1$ is possible and known as steady-state scenario.

Occasionally, a subscript to $\rho$ is used in order to denote the type of recombination, e.g., $\rho_I$ or $\rho_W$ for intermediate or weighted recombination, respectively. Without a subscript, we tacitly assume intermediate recombination, if not stated otherwise. The notation has also been expanded to include the maximum age, $\kappa$, of individuals as $(\mu, \kappa, \lambda)$-ES [44.11], where plus-selection corresponds to $\kappa = \infty$ and comma-selection corresponds to $\kappa = 1$.

44.2.6 Two Algorithm Templates

Algorithm 44.1 gives pseudocode for the evolution strategy.

Algorithm 44.1 The $(\mu/\rho \overset{+}{,} \lambda)$-ES
 1: given $n, \rho, \mu, \lambda \in \mathbb{N}_+$
 2: initialize $P = \{(x_k, s_k, f(x_k)) \mid 1 \le k \le \mu\}$
 3: while not happy
 4:     for $k \in \{1, \dots, \lambda\}$
 5:         $(x_k, s_k) =$ recombine(select_mates($\rho$, $P$))
 6:         $s_k \leftarrow$ mutate_s($s_k$)
 7:         $x_k \leftarrow$ mutate_x($s_k$, $x_k$) $\in \mathbb{R}^n$
 8:     $P \leftarrow P \cup \{(x_k, s_k, f(x_k)) \mid 1 \le k \le \lambda\}$
 9:     $P \leftarrow$ select_by_age($P$)  // identity for '+'
10:     $P \leftarrow$ select_μ_best($\mu$, $P$)  // by f-ranking


Given is a population, $P$, of at least $\mu$ individuals $(x_k, s_k, f(x_k))$, $k = 1, \dots, \mu$. Vector $x_k \in \mathbb{R}^n$ is a solution vector and $s_k$ contains the control or endogenous strategy parameters, for example, a success counter or a step-size that primarily serves to control the mutation of $x$ (in Line 7). The values of $s_k$ may be identical for all $k$. In each generation, first $\lambda$ offspring are generated (Lines 4-7), each by recombination of $\rho \le \mu$ individuals from $P$ (Line 5), followed by mutation of $s$ (Line 6) and of $x$ (Line 7). The new offspring are added to $P$ (Line 8). Overaged individuals are removed from $P$ (Line 9), where individuals from the same generation have, by definition, the same age. Finally, the best $\mu$ individuals are retained in $P$ (Line 10).

The mutation of the $x$-vector in Line 7 always involves a stochastic component. Lines 5 and 6 may have stochastic components as well.
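
The template of Algorithm 44.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's reference implementation: the endogenous parameter $s$ is reduced to a single step-size, recombination is intermediate, the step-size is mutated log-normally with an assumed learning rate `tau`, the termination criterion is a fixed iteration budget, and the names `sphere`, `es_template` and `max_age` are hypothetical. The argument `max_age` plays the role of the maximum age $\kappa$: `max_age=1` gives comma-selection and `max_age=math.inf` gives plus-selection.

```python
import math
import random


def sphere(x):
    """Illustrative fitness function (an assumption): sum of squares."""
    return sum(xi * xi for xi in x)


def es_template(f, n, mu, rho, lam, max_age=1, iterations=200, seed=1):
    """Sketch of Algorithm 44.1; returns the best (x, f(x)) in the final population.

    max_age=1 gives comma-selection, max_age=math.inf gives plus-selection.
    """
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * n)  # assumed learning rate for the step-size
    # Line 2: each individual is [x, s, f(x), age]; s is a single step-size here.
    pop = []
    for _ in range(mu):
        x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        pop.append([x, 1.0, f(x), 0])
    for _ in range(iterations):  # Line 3: a fixed budget stands in for "not happy"
        offspring = []
        for _ in range(lam):  # Line 4
            mates = rng.sample(pop, rho)  # fitness-independent mating selection
            # Line 5: intermediate recombination of x and s (assumed operator)
            x = [sum(m[0][i] for m in mates) / rho for i in range(n)]
            s = sum(m[1] for m in mates) / rho
            s *= math.exp(tau * rng.gauss(0.0, 1.0))  # Line 6: mutate_s
            x = [xi + s * rng.gauss(0.0, 1.0) for xi in x]  # Line 7: mutate_x
            offspring.append([x, s, f(x), 0])
        for ind in pop:
            ind[3] += 1  # surviving parents grow older by one generation
        pop = pop + offspring  # Line 8
        pop = [ind for ind in pop if ind[3] < max_age]  # Line 9: select_by_age
        pop.sort(key=lambda ind: ind[2])  # Line 10: keep the mu best by f-ranking
        pop = pop[:mu]
    return pop[0][0], pop[0][2]


x_best, f_best = es_template(sphere, n=5, mu=3, rho=2, lam=12)
```

Because mating selection is fitness-independent in this sketch, all selection pressure comes from the truncation step in Line 10.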

When select_mates in Line 5 selects $\rho = \mu$ individuals from $P$, it reduces to the identity. If $\rho = \mu$ and recombination is deterministic, as is commonly the case, the result of recombine is the same parental centroid for all offspring. The computation of the parental centroid can be done once before the for loop or as the last step of the while loop, simplifying the initialization of the algorithm. Algorithm 44.2 shows the pseudocode in this case.


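Algorithm 44.2 The $(\mu/\mu \overset{+}{,} \lambda)$-ES
1: given $n, \lambda \in \mathbb{N}_+$
2: initialize $x \in \mathbb{R}^n$, $s$, $P = \{\}$
3: while not happy
4:     for $k \in \{1, \dots, \lambda\}$
5:         $s_k =$ mutate_s($s$)
6:         $x_k =$ mutate_x($s_k$, $x$)
7:         $P \leftarrow P \cup \{(x_k, s_k, f(x_k))\}$
8:     $P \leftarrow$ select_by_age($P$)  // identity for '+'
9:     $(x, s) \leftarrow$ recombine($P$, $x$, $s$)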


In Algorithm 44.2, only a single parental centroid $(x, s)$ is initialized. Mutation takes this parental centroid as input (notice that $s_k$ and $x_k$ in Lines 5 and 6 are now assigned rather than updated) and recombination is postponed to the end of the loop, computing in Line 9 the new parental centroid. While $(x_k, s_k)$ can contain all necessary information for this computation, it is often more transparent to use $x$ and $s$ as additional arguments in Line 9. Selection based on $f$-values is now limited to mating selection in procedure recombine (that is, procedure select_μ_best is omitted and $\mu$ is the number of individuals in $P$ that are actually used by recombine).

Using a single parental centroid has become the most popular approach, because such algorithms are simpler to formalize, easier to analyze, and even perform better in various circumstances as they allow for maximum genetic repair (see below). All instances of ES given in Sect. 44.3 are based on Algorithm 44.2.
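
The single-centroid template of Algorithm 44.2 might be sketched as follows. The sketch assumes comma-selection, a single step-size as the only endogenous parameter, log-normal step-size mutation with an assumed learning rate `tau`, and intermediate recombination of the $\mu$ best offspring; the name `es_centroid` and the fixed iteration budget are hypothetical choices for illustration.

```python
import math
import random


def es_centroid(f, x, sigma, lam, mu, iterations=200, seed=2):
    """Sketch of Algorithm 44.2 with comma-selection; returns the final (x, sigma)."""
    rng = random.Random(seed)
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * n)  # assumed learning rate for the step-size
    for _ in range(iterations):  # Line 3: a fixed budget stands in for "not happy"
        # Line 8 (comma-selection): old individuals are dropped, so P starts empty.
        pop = []
        for _ in range(lam):  # Line 4
            s_k = sigma * math.exp(tau * rng.gauss(0.0, 1.0))  # Line 5: assigned from s
            x_k = [xi + s_k * rng.gauss(0.0, 1.0) for xi in x]  # Line 6: assigned from x
            pop.append((f(x_k), x_k, s_k))  # Line 7
        # Line 9: recombine averages the mu best offspring into the new centroid.
        pop.sort(key=lambda ind: ind[0])
        best = pop[:mu]
        x = [sum(ind[1][i] for ind in best) / mu for i in range(n)]
        sigma = sum(ind[2] for ind in best) / mu
    return x, sigma


x_final, sigma_final = es_centroid(
    lambda v: sum(t * t for t in v), [3.0] * 5, 1.0, lam=10, mu=3
)
```

Here the only use of $f$-values is the ranking inside the recombination step, which matches the remark that select_μ_best is omitted in this template.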


44.2.7 Recombination Operators 


In ES, recombination combines information from several parents to generate a single new offspring. Often, multirecombination is used, where more than two parents are recombined ($\rho > 2$). In contrast, in genetic algorithms often two offspring are generated from the recombination of two parents. In evolutionary programming, recombination is generally not used. The most important recombination operators used in evolution strategies are the following:


• Discrete or dominant recombination, denoted by $(\mu/\rho_D \overset{+}{,} \lambda)$, is also known as uniform crossover in genetic algorithms. For each variable (component of the $x$-vector), a single parent is drawn uniformly from all $\rho$ parents to inherit the variable value. For $\rho$ parents that all differ in each variable value, the result is uniformly distributed across $\rho^n$ different $x$-values. The result of discrete recombination depends on the given coordinate system.
• Intermediate recombination, denoted by $(\mu/\rho_I \overset{+}{,} \lambda)$, takes the average value of all $\rho$ parents (computes the center of mass, the centroid).
• Weighted multirecombination [44.10, 12, 13], denoted by $(\mu/\rho_W \overset{+}{,} \lambda)$, is a generalization of intermediate recombination, usually with $\rho = \mu$. It takes a weighted average of all $\rho$ parents. The weight values depend on the fitness ranking, in that better parents never get smaller weights than inferior ones. With equal weights, intermediate recombination is recovered. By using comma selection and $\rho = \mu = \lambda$, where some of the weights may be zero, weighted recombination can take over the role of fitness-based environmental selection and negative weights become a feasible option [44.12, 13]. The sum of weights must be either one or zero, or recombination must be applied to the vectors $x_k - x$ and the result added to $x$.
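
The three operators above can be sketched in Python as follows. The function names, the example parents and the example weights are assumptions made for illustration; the weighted variant assumes that the parents are already ranked from best to worst and that the weights sum to one. The helper `effective_selection_mass` evaluates $\mu_w$ as defined in Sect. 44.1.1.

```python
import random


def discrete_recombination(parents, rng):
    """For each variable, copy the value of one uniformly drawn parent."""
    n = len(parents[0])
    return [rng.choice(parents)[i] for i in range(n)]


def intermediate_recombination(parents):
    """Average of all parents (the centroid)."""
    rho = len(parents)
    n = len(parents[0])
    return [sum(p[i] for p in parents) / rho for i in range(n)]


def weighted_recombination(parents_ranked, weights):
    """Weighted average; parents_ranked is sorted from best to worst.

    The weights are assumed to sum to one.
    """
    n = len(parents_ranked[0])
    return [sum(w * p[i] for w, p in zip(weights, parents_ranked)) for i in range(n)]


def effective_selection_mass(weights):
    """mu_w = (sum of |w_k|)^2 / (sum of w_k^2)."""
    return sum(abs(w) for w in weights) ** 2 / sum(w * w for w in weights)


rng = random.Random(0)
parents = [[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]]  # assumed to be ranked best to worst
weights = [0.5, 0.3, 0.2]  # non-increasing with rank, summing to one

child_discrete = discrete_recombination(parents, rng)
child_intermediate = intermediate_recombination(parents)
child_weighted = weighted_recombination(parents, weights)
mu_w = effective_selection_mass(weights)
```

With equal weights, `weighted_recombination` returns the same point as `intermediate_recombination` and `effective_selection_mass` returns the number of parents, in line with the description of the operators.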


In principle, recombination operators from genetic algorithms, like one-point and two-point crossover or line recombination [44.14], can alternatively be used. However, they have rarely been applied in ES.

In ES, the result of selection and recombination is often deterministic (namely, if $\rho = \mu$ and recombination is intermediate or weighted). This means that eventually all offspring are generated by mutation from the same single solution vector (the parental centroid) as in Algorithm 44.2. This leads, for given variances, to maximum entropy because all offspring are independently drawn from the same normal distribution. With discrete recombination, the offspring distribution is generated from a mixture of normal distributions with different mean values. The resulting distribution has lower entropy unless it has a larger overall variance.

The role of recombination, in general, is to keep the 
variation in a population high. Discrete recombination 
directly introduces variation by generating different 
solutions. Their distance resembles the distance be- 
tween the parents. However, discrete recombination, 
as it depends on the given coordinate system, relies 
on separability: it can introduce variation successfully 
only if values of disrupted variables do not strongly
depend on each other. Solutions resulting from discrete
recombination lie on the vertices of an axis-parallel
box.

Fig. 44.1a-c Three two-dimensional multivariate normal distributions N(0, C) ∼ C^{1/2} N(0, I). The covariance matrix C
of the distribution is, from left to right, the identity I (isotropic distribution), the diagonal matrix diag(1/4, 4) (axis-parallel
distribution), and the matrix with rows (2.125, 1.875) and (1.875, 2.125), which has the same eigenvalues (1/4, 4) as the diagonal matrix. Shown are in each subfigure
the mean at 0 as small black dot (a different mean solely changes the axis annotations), two eigenvectors of C along
the principal axes of the ellipsoids (thin black lines), two ellipsoids reflecting the set of points {x : (x − 0)ᵀ C⁻¹ (x − 0) ∈
{1, 4}} that represent the 1-σ and 2-σ lines of equal density, and 100 sampled points (however, a few of them are likely
to be outside of the area shown)

Intermediate and weighted multirecombination do 
not lead to variation within the new population as they 
result in the same single point for all offspring. How- 
ever, they do allow the mutation operator to introduce 
additional variation by means of genetic repair [44.15]. 
Recombinative averaging reduces the effective step
length taken in unfavorable directions by a factor of √μ
(or √μ_w in the case of weighted recombination), but
leaves the step length in favorable directions essentially
unchanged, see also Sect. 44.4.2. This may allow increased
variation by enlarging mutations by a factor of
about μ (or μ_w) as revealed in (44.16), to achieve maximal
progress.
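The √μ reduction can be checked numerically in the simplest setting, where the averaged steps are uncorrelated standard normal vectors (a stand-in, assumed here, for the unfavorable components of selected steps). The helper name and the sample sizes are arbitrary choices for this sketch.

```python
import math
import random

def mean_norm_of_average(mu, n, trials, rng):
    """Average Euclidean length of the mean of mu i.i.d. N(0, I) vectors."""
    total = 0.0
    for _ in range(trials):
        avg = [sum(rng.gauss(0, 1) for _ in range(mu)) / mu for _ in range(n)]
        total += math.sqrt(sum(a * a for a in avg))
    return total / trials

rng = random.Random(1)
single = mean_norm_of_average(1, 10, 2000, rng)     # one step, no averaging
averaged = mean_norm_of_average(9, 10, 2000, rng)   # average of mu = 9 steps
ratio = single / averaged                           # expected near sqrt(9) = 3
```

The ratio comes out close to 3, that is √μ for μ = 9, which is the factor by which mutations could be enlarged without lengthening the uncorrelated part of the recombined step.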


44.2.8 Mutation Operators 


The mutation operator introduces (small) variations by
adding a point symmetric perturbation to the result
of recombination, say a solution vector x ∈ ℝⁿ. This
perturbation is drawn from a multivariate normal
distribution, N(0, C), with zero mean (expected value)
and covariance matrix C ∈ ℝ^{n×n}. Besides normally
distributed mutations, Cauchy mutations [44.16–18] have
also been proposed in the context of ES and evolutionary
programming. We have x + N(0, C) ∼ N(x, C),
meaning that x determines the expected value of the
new offspring individual. We also have x + N(0, C) ∼
x + C^{1/2} N(0, I), meaning that the linear transformation
C^{1/2} generates the desired distribution from the vector
N(0, I) that has i.i.d. N(0, 1) components. (Using
the normal distribution has several advantages. The
N(0, I) distribution is the most convenient way to
implement an isotropic perturbation. The normal
distribution is stable: sums of independent normally distributed
random variables are again normally distributed. This
facilitates the design and analysis of algorithms remarkably.
Furthermore, the normal distribution has maximum
entropy under the given variances.)
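The relation x + N(0, C) ∼ x + C^{1/2} N(0, I) can be tried out in two dimensions with the standard library alone. The sketch below uses the closed-form symmetric square root of a 2×2 positive definite matrix; the function names, the sample size, and the particular matrix (a nondiagonal C with eigenvalues 1/4 and 4) are choices made for this illustration.

```python
import math
import random

def sqrt_spd_2x2(c):
    """Symmetric square root of a 2x2 symmetric positive definite matrix,
    using sqrt(C) = (C + sqrt(det C) I) / sqrt(trace C + 2 sqrt(det C))."""
    (a, b), (_, d) = c
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def mutate(x, c_half, rng):
    """Return x + C^(1/2) z with z ~ N(0, I)."""
    z = [rng.gauss(0, 1), rng.gauss(0, 1)]
    return [x[i] + c_half[i][0] * z[0] + c_half[i][1] * z[1] for i in range(2)]

C = [[2.125, 1.875], [1.875, 2.125]]        # eigenvalues 1/4 and 4
C_half = sqrt_spd_2x2(C)
rng = random.Random(0)
samples = [mutate([0.0, 0.0], C_half, rng) for _ in range(20000)]
var_x = sum(p[0] ** 2 for p in samples) / len(samples)      # estimates C[0][0]
cov_xy = sum(p[0] * p[1] for p in samples) / len(samples)   # estimates C[0][1]
```

The empirical second moments of the samples approximate the entries of C. For dimensions above two, a general eigendecomposition would be needed in place of the closed form.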

Figure 44.1 shows different normal distributions 
in dimension n= 2. Their lines of equal den- 
sity are ellipsoids. Any straight section through the 
two-dimensional density recovers a two-dimensional 
Gaussian bell. Based on multivariate normal distri- 
butions, three different mutation operators can be 
distinguished: 


• Spherical/isotropic (Fig. 44.1a) where the covariance
matrix is proportional to the identity, i.e.,
the mutation distribution follows σ N(0, I) with
step-size σ > 0. The distribution is spherical and
invariant under rotations about its mean. In the
following, Algorithm 44.3 uses this kind of mutation.

• Axis-parallel (Fig. 44.1b) where the covariance matrix
is a diagonal matrix, i.e., the mutation distribution
follows N(0, diag(σ)²), where σ is a vector
of coordinate-wise standard deviations and the diagonal
matrix diag(σ)² has eigenvalues σ_i² with
eigenvectors e_i. The principal axes of the ellipsoid
are parallel to the coordinate axes. This case
includes the previous isotropic case. Below,
Algorithms 44.4–44.6 implement this kind of mutation
distribution.

• General (Fig. 44.1c) where the covariance matrix
is symmetric and positive definite (i.e., xᵀCx > 0
for all x ≠ 0), generally nondiagonal and has
(n² + n)/2 degrees of freedom (control parameters).
The general case includes the previous
axis-parallel and spherical cases. Below,
Algorithms 44.7 and 44.8 implement general multivariate
normally distributed mutations.


In the first and the second cases, the variations of
variables are independent of each other, they are
uncorrelated. This limits the usefulness of the operator in
practice. The third case is incompatible with discrete
recombination: for a narrow, diagonally oriented ellipsoid
(not to be confused with a diagonal covariance
matrix), a point resulting from selection and discrete
recombination lies within this ellipsoid only if each
coordinate is taken from the same parent (which happens
with probability 1/ρ^{n−1}) or from a parent with
a very similar value in this coordinate. The narrower
the ellipsoid the more similar (i.e., correlated) the value
needs to be. As another illustration consider sampling,
neutral selection and discrete recombination based on
Fig. 44.1c: after discrete recombination the points
(−2, 2) and (2, −2) outside the ellipsoid have the same
probability as the points (2, 2) and (−2, −2) inside the
ellipsoid.

The mutation operators introduced are unbiased in
several ways. They are all point symmetrical and have
expectation zero. Therefore, mutation alone will almost
certainly not lead to better fitness values in expectation.
The isotropic mutation operator features the same
distribution along any direction. The general mutation
operator is, as long as C remains unspecified, unbiased
toward the choice of a Cartesian coordinate system,
i.e., unbiased toward the representation of solutions x,
which has also been referred to as invariance to affine
coordinate system transformations [44.10]. This, however,
depends on how C is adapted (see the following).


44.3 Parameter Control


Controlling the parameters of the mutation operator is
key to the design of ES. Consider the isotropic operator
(Fig. 44.1a), where the step-size σ is a scaling
factor for the random vector perturbation. The step-size
controls to a large extent the convergence speed. In
situations where larger step-sizes lead to larger expected
improvements, a step-size control technique should aim
at increasing the step-size (and decreasing it in the
opposite scenario).

The importance of step-size control is illustrated
with a simple experiment. Consider a spherical function
f(x) = ‖x‖^α, α > 0, and a (1+1)-ES with constant
step-size equal to σ = 10⁻², i.e., with mutations drawn
from 10⁻² N(0, I). The convergence of the algorithm
is depicted in Fig. 44.2 (constant σ graphs).

We observe, roughly speaking, three stages: up to
600 function evaluations, progress toward the optimum
is slow. At this stage, the fixed step-size is too small.
Between 700 and 800 evaluations, fast progress toward
the optimum is observed. At this stage, the step-size
is close to optimal. Afterward, the progress decreases
and approaches the rate of the pure random search
algorithm, well illustrated on the bottom subfigure. At this
stage the fixed step-size is too large and the probability
to sample better offspring becomes very small.

The figure also shows runs of the (1+1)-ES with 
1/5th success rule step-size control (as described in 
Sect. 44.3.1) and the step-size evolution associated to 
one of these runs. The initial step-size is far too small 
and we observe that the adaptation technique increases 
the step-size in the first iterations. Afterward, step-size 
is kept roughly proportional to the distance to the op- 
timum, which is in fact optimal and leads to linear 
convergence on the top subfigure. 

Generally, the goal of parameter control is to drive 
the endogenous strategy parameters close to their op- 
timal values. These optimal values, as we have seen 
for the step-size in Fig. 44.2, can significantly change 
over time or depending on the position in search space. 
In the most general case, the mutation operator has
(n² + n)/2 degrees of freedom (Sect. 44.2.8). The
conjecture is that in the desired scenario lines of equal
density of the mutation operator resemble locally the
lines of equal fitness [44.4, pp. 242f.]. In the case of
convex-quadratic fitness functions this resemblance can
be perfect and, apart from the step-size, optimal parameters
do not change over time (as illustrated in Fig. 44.3
below).

Control parameters like the step-size can be stored
on different levels. Each individual can have its own
step-size value (like in Algorithms 44.4 and 44.5), or
a single step-size is stored and applied to all individuals
in the population. In the latter case, sometimes different
populations with different parameter values are run in
parallel [44.19].

Fig. 44.2a,b Runs of the (1+1)-ES with constant step-size,
of pure random search (uniform in [−0.2, 1]^10), and
of the (1+1)-ES with 1/5th success rule (Algorithm 44.3)
on a spherical function f(x) = ‖x‖^α, α > 0 (because of
invariance to monotonic f-transformation the same graph
is observed for any α > 0). Both panels plot the distance to
the optimum against the number of function evaluations. For
each algorithm, there are three runs in (a) and (b). The x-axis
is linear in (a) and in log-scale in (b). For the (1+1)-ES with
constant step-size, σ equals 10⁻². For the (1+1)-ES with 1/5th
success rule, the initial step-size is chosen very small to 10⁻⁹
and the parameter d equals 1 + 10/3. In (a) also the evolution of
the step-size of one of the runs of the (1+1)-ES with 1/5th
success rule is shown. All algorithms are initialized at 1.
Eventually, the (1+1)-ES with 1/5th success rule reveals
linear behavior (a), while the other two algorithms reveal
eventually linear behavior in (b)

In the following, six specific ES are outlined, each 
of them representing an important achievement in pa- 
rameter control. 


44.3.1 The 1/5th Success Rule 


The 1/5th success rule for step-size control is based 
on an important discovery made very early in the re- 
search of evolution strategies [44.1]. A similar rule 
had also been found independently before in [44.20]. 
As a control mechanism in practice, the 1/5th success 
rule has been mostly superseded by more sophisticated 
methods. However, its conceptual insight remains re- 
markably valuable. 

Consider a linear fitness function, for example, f :
x ↦ x₁ or f : x ↦ Σ_i x_i. In this case, any point symmetrical
mutation operator has a success probability of 1/2:
in one half of the cases, the perturbation will improve
the original solution, in one half of the cases the solution
will deteriorate. By Taylor's formula, smooth functions
become more and more linear with decreasing
neighborhood size. Therefore, the success probability
approaches 1/2 for step-size σ → 0. On most nonlinear
functions, the success rate is indeed a monotonically
decreasing function in σ and goes to zero for σ → ∞.
This suggests controlling the step-size by increasing it
for large success rates and decreasing it for small ones.
This mechanism can drive the
step-size close to the optimal value.
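The dependence of the success probability on the step-size is easy to estimate by simulation. The following sketch (function names, test point, sample sizes, and the chosen σ values are all assumptions for illustration) counts how often an isotropic mutation improves a linear function and a sphere function.

```python
import random

def success_rate(f, x, sigma, trials, rng):
    """Fraction of isotropic mutations x + sigma * N(0, I) that improve f."""
    fx = f(x)
    hits = 0
    for _ in range(trials):
        y = [xi + sigma * rng.gauss(0, 1) for xi in x]
        hits += f(y) < fx
    return hits / trials

def linear(v):
    return v[0]

def sphere(v):
    return sum(t * t for t in v)

rng = random.Random(3)
x0 = [1.0] * 10
p_linear = success_rate(linear, x0, 1.0, 4000, rng)
# On the sphere, the success rate falls from about 1/2 toward 0 as sigma grows.
rates = [success_rate(sphere, x0, s, 4000, rng) for s in (0.01, 0.3, 1.0, 3.0)]
```

On the linear function the estimate stays near 1/2 for any σ, while on the sphere it starts near 1/2 for very small σ and decreases as σ grows, which is the monotonic relation the success rule exploits.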

Rechenberg [44.1] investigated two simple but quite
different functions, the corridor function

$$ f : x \mapsto \begin{cases} x_1 & \text{if } |x_i| \le 1 \text{ for } i = 2, \dots, n \\ \infty & \text{otherwise,} \end{cases} $$

and the sphere function

$$ f : x \mapsto \sum_{i=1}^{n} x_i^2 . $$

He found optimal success rates for the (1+1)-ES with
isotropic mutation to be ≈ 0.184 > 1/6 and ≈ 0.270 <
1/3, respectively (for n → ∞) [44.1]. Optimality here
means to achieve the largest expected approach of the
optimum in a single generation. This leads to approximately
1/5 as being the success value where to switch
between decreasing and increasing the step-size.




Algorithm 44.3 The (1+1)-ES with 1/5th Rule
1: given n ∈ ℕ₊, d ≈ √(n + 1)
2: initialize x ∈ ℝⁿ, σ > 0
3: while not happy
4:   x₁ = x + σ × N(0, I)   // mutation
5:   σ ← σ × exp^{1/d}(𝟙_{f(x₁) ≤ f(x)} − 1/5)
6:   if f(x₁) ≤ f(x)   // select if better
7:     x = x₁   // x-value of new parent


Algorithm 44.3 implements the (1+1)-ES
with 1/5th success rule in a simple and effective
way [44.21]. Lines 5-7 implement Line 9 from
Algorithm 44.2, including selection in Line 8. Line 5 in
Algorithm 44.3 updates the step-size σ of the single
parent. The step-size does not change if and only if the
argument of exp is zero. While this cannot happen in
a single generation, we still can find a stationary point
for σ: log σ is unbiased if and only if the expected
value of the argument of exp is zero. This is the case if
E 𝟙_{f(x₁) ≤ f(x)} = 1/5, in other words, if the probability of
an improvement with f(x₁) ≤ f(x) is 20%. Otherwise,
log σ increases in expectation if the success probability
is larger than 1/5 and decreases if the success probability
is smaller than 1/5. Hence, Algorithm 44.3 indeed
implements the 1/5th success rule.
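A direct transcription of the (1+1)-ES with 1/5th success rule into Python might look as follows. It is a minimal sketch: the function signature, the fixed iteration budget used in place of the "while not happy" condition, and the test problem are assumptions for this example, and exp^{1/d}(a) is read as exp(a/d).

```python
import math
import random

def one_plus_one_es(f, x, sigma, iterations, rng):
    """(1+1)-ES with 1/5th success rule; minimizes f."""
    n = len(x)
    d = math.sqrt(n + 1)                       # damping, d ~ sqrt(n + 1)
    fx = f(x)
    for _ in range(iterations):
        x1 = [xi + sigma * rng.gauss(0, 1) for xi in x]    # mutation
        f1 = f(x1)
        success = f1 <= fx
        # Increase sigma after a success, decrease it after a failure;
        # log(sigma) is stationary at a success probability of 1/5.
        sigma *= math.exp((success - 1 / 5) / d)
        if success:                                         # select if better
            x, fx = x1, f1
    return x, fx, sigma

def sphere(v):
    return sum(t * t for t in v)

rng = random.Random(42)
# Deliberately tiny initial step-size: the rule first has to enlarge it.
x_best, f_best, sigma_end = one_plus_one_es(sphere, [1.0] * 10, 1e-9, 3000, rng)
```

Starting from a step-size that is far too small, the rule first enlarges σ and then shrinks it together with the distance to the optimum.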


44.3.2 Self-Adaptation 


A seminal idea in the domain of ES is parameter 
control via self-adaptation [44.3]. In self-adaptation, 
new control parameter settings are generated similar 
to new x-vectors by recombination and mutation. Al- 
gorithm 44.4 presents an example with adaptation of 
n coordinate-wise standard deviations (individual step- 
sizes). 


Algorithm 44.4 The (μ/μ_I, λ)-σSA-ES
1: given n ∈ ℕ₊, λ ≥ 5n, μ ≈ λ/4 ∈ ℕ, τ ≈ 1/√n,
   τ_i ≈ 1/n^{1/4}
2: initialize x ∈ ℝⁿ, σ ∈ ℝⁿ₊
3: while not happy
4:   for k ∈ {1, …, λ}
       // random numbers i.i.d. for all k
5:     ξ_k = τ N(0, 1)       // global step-size
6:     𝝃_k = τ_i N(0, I)     // coordinate-wise σ
7:     z_k = N(0, I)         // x-vector change
       // mutation
8:     σ_k = σ ∘ exp(𝝃_k) × exp(ξ_k)
9:     x_k = x + σ_k ∘ z_k
10:  P = sel_μ_best({(x_k, σ_k, f(x_k)) | 1 ≤ k ≤ λ})
     // recombination
11:  σ = (1/μ) Σ_{σ_k ∈ P} σ_k
12:  x = (1/μ) Σ_{x_k ∈ P} x_k

First, for conducting the mutation, random events
are drawn in Lines 5-7. In Line 8, the step-size vector
for each individual undergoes (i) a mutation common
for all components, exp(ξ_k) with the scalar ξ_k, and
(ii) a component-wise mutation with exp(𝝃_k), where 𝝃_k
is a vector. These mutations are unbiased,
in that E log σ_k = log σ. The mutation of x in Line 9
uses the mutated vector σ_k. After selection in Line 10,
intermediate recombination is applied to compute x
and σ for the next generation. By taking the average
over σ_k we have E σ = E σ_k in Line 11. However, the
application of mutation and recombination on σ
introduces a moderate bias such that σ tends to increase
under neutral selection [44.22].

In order to achieve stable behavior of σ, the number
of parents μ must be large enough, which is reflected
in the setting of λ. A setting of τ ≈ 1/4 has been
proposed in combination with ξ_k being uniformly
distributed across the two values in {−1, 1} [44.2].
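The self-adaptation loop described above can be sketched compactly in Python. The code is an illustration under stated assumptions: the function name, the fixed generation budget, the sphere test function, and the concrete values n = 10 and λ = 50 are choices made here, and recombination of x and σ is the plain average over the μ selected offspring.

```python
import math
import random

def sa_es(f, x, sigma, lam, generations, rng):
    """(mu/mu_I, lambda)-ES with self-adaptation of coordinate-wise step-sizes."""
    n = len(x)
    mu = lam // 4
    tau, tau_i = 1 / math.sqrt(n), 1 / n ** 0.25
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            xi = tau * rng.gauss(0, 1)                            # global change
            xi_vec = [tau_i * rng.gauss(0, 1) for _ in range(n)]  # per coordinate
            z = [rng.gauss(0, 1) for _ in range(n)]
            # Mutate the step-sizes first, then use them to mutate x.
            sig_k = [s * math.exp(e) * math.exp(xi) for s, e in zip(sigma, xi_vec)]
            x_k = [xj + sj * zj for xj, sj, zj in zip(x, sig_k, z)]
            offspring.append((f(x_k), x_k, sig_k))
        offspring.sort(key=lambda t: t[0])                        # best first
        parents = offspring[:mu]
        # Intermediate recombination of step-sizes and of solution vectors.
        sigma = [sum(p[2][i] for p in parents) / mu for i in range(n)]
        x = [sum(p[1][i] for p in parents) / mu for i in range(n)]
    return x, sigma

def sphere(v):
    return sum(t * t for t in v)

rng = random.Random(7)
x_end, sigma_end = sa_es(sphere, [1.0] * 10, [0.5] * 10, 50, 200, rng)
f_end = sphere(x_end)
```

Each offspring carries its own mutated step-size vector, so step-sizes are selected only indirectly, through the fitness of the solution they helped to produce.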


44.3.3 Derandomized Self-Adaptation 


Derandomized self-adaptation [44.23] addresses the 
problem of selection noise that occurs with self- 
adaptation of ø as outlined in Algorithm 44.4. Selection 
noise refers to the possibility that very good offspring 
may be generated with poor strategy parameter settings 
and vice versa. The problem occurs frequently and has 
two origins: 


• A small/large component in |σ_k ∘ z_k| (Line 9 in
Algorithm 44.4) does not necessarily imply that the
respective component of σ_k is small/large. Selection
of σ is disturbed by the respective realizations of z.

• Selection of a small/large component of |σ_k ∘ z_k|
does not imply that this is necessarily a favorable
setting: more often than not, the sign of a component
is more important than its size and all other
components influence the selection as well.


Due to selection noise, poor values are frequently
inherited and we observe stochastic fluctuations of σ.
Such fluctuations can in particular lead to very small
values (very large values are removed by selection more
quickly). The overall magnitude of these fluctuations
can be implicitly controlled via the parent number μ,
because intermediate recombination (Line 11 in
Algorithm 44.4) effectively reduces the magnitude of
σ-changes and biases log σ to larger values.

For μ < n, the stochastic fluctuations become
prohibitive and therefore μ ≈ λ/4 ≥ 1.25n is chosen to
make σ-self-adaptation reliable.

Derandomization addresses the problem of selection
noise on σ directly without resorting to a large
parent number. The derandomized (1, λ)-σSA-ES is
outlined in Algorithm 44.5 and addresses selection
noise twofold.


Algorithm 44.5 Derandomized (1, λ)-σSA-ES
1: given n ∈ ℕ₊, λ ≈ 10, τ ≈ 1/3, d ≈ √n, d_i ≈ n
2: initialize x ∈ ℝⁿ, σ ∈ ℝⁿ₊
3: while not happy
4:   for k ∈ {1, …, λ}
       // random numbers i.i.d. for all k
5:     ξ_k = τ N(0, 1)
6:     z_k = N(0, I)
       // mutation, re-using random events
7:     x_k = x + exp(ξ_k) × σ ∘ z_k
8:     σ_k = σ ∘ exp^{1/d_i}(|z_k| / E|N(0, 1)| − 1)
             × exp^{1/d}(ξ_k)
9:   (x₁, σ₁, f(x₁)) ← select_single_best(
       {(x_k, σ_k, f(x_k)) | 1 ≤ k ≤ λ})
     // assign new parent
10:  σ = σ₁
11:  x = x₁


Instead of introducing new variations in σ by means
of exp(𝝃_k), the variations from z_k are directly used for
the mutation of σ in Line 8. The variations are dampened
compared to their use in the mutation of x (Line 7)
via d and d_i, thereby mimicking the effect of intermediate
recombination on σ [44.23, 24]. The order of the
two mutation equations becomes irrelevant.
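The derandomized scheme can be written down almost line by line. In the sketch below the function name, the generation budget, and the sphere test problem are assumptions for illustration; E|N(0, 1)| is the constant √(2/π), and exp^{1/d}(a) is read as exp(a/d).

```python
import math
import random

E_ABS_N01 = math.sqrt(2 / math.pi)        # E|N(0,1)|

def derandomized_sa_es(f, x, sigma, generations, rng, lam=10):
    """Derandomized (1, lambda)-ES with coordinate-wise step-sizes."""
    n = len(x)
    tau, d, d_i = 1 / 3, math.sqrt(n), float(n)
    for _ in range(generations):
        best = None
        for _ in range(lam):
            xi = tau * rng.gauss(0, 1)
            z = [rng.gauss(0, 1) for _ in range(n)]
            # The realized step uses the full variation exp(xi) * sigma * z ...
            x_k = [xj + math.exp(xi) * sj * zj for xj, sj, zj in zip(x, sigma, z)]
            # ... while the inherited step-sizes change only by damped amounts.
            sig_k = [sj * math.exp((abs(zj) / E_ABS_N01 - 1) / d_i) * math.exp(xi / d)
                     for sj, zj in zip(sigma, z)]
            f_k = f(x_k)
            if best is None or f_k < best[0]:
                best = (f_k, x_k, sig_k)
        _, x, sigma = best                                   # assign new parent
    return x, sigma

def sphere(v):
    return sum(t * t for t in v)

rng = random.Random(11)
x_end, sigma_end = derandomized_sa_es(sphere, [1.0] * 5, [0.5] * 5, 300, rng)
f_end = sphere(x_end)
```

Because the same random events ξ_k and z_k drive both the step and the step-size change, a large realized step in some coordinate is directly reflected in a (damped) increase of that coordinate's step-size.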

For Algorithm 44.5 also a (μ/μ, λ) variant with
recombination is feasible. However, in particular in the
(μ/μ_I, λ)-ES, σ-self-adaptation tends to generate too
small step-sizes. A remedy for this problem is to use
nonlocal information for step-size control.


44.3.4 Nonlocal Derandomized Step-Size 
Control (CSA) 


When using self-adaptation, step-sizes are associated
with individuals and selected based on the fitness of
each individual. However, step-sizes that serve individuals
well by giving them a high likelihood to be
selected are generally not step-sizes that maximize the
progress of the entire population or the parental
centroid x. We will see later that, for example, the optimal
step-size may increase linearly with μ (Sect. 44.4.2
and (44.16)). With self-adaptation on the other hand,
the step-size of the μ-th best offspring is typically
even smaller than the step-size of the best offspring.
Consequently, Algorithm 44.5 often assumes step-sizes
that are too small and can be considerably improved by using
nonlocal information about the evolution of the
population. Instead of single (local) mutation steps z, an
exponentially fading record, s_σ, of mutation steps is
taken. This record, referred to as search path or
evolution path, can be pictured as a sequence or sum of
consecutive successful z-steps that is nonlocal in time
and space. A search path carries information about the
interrelation between single steps. This information can
improve the adaptation and search procedure remarkably.
Algorithm 44.6 outlines the (μ/μ_I, λ)-ES with
cumulative path length control, also denoted as
cumulative step-size adaptation (CSA), and additionally
with nonlocal individual step-size adaptation [44.25,
26].


Algorithm 44.6 The (μ/μ_I, λ)-ES with Search Path
1: given n ∈ ℕ₊, λ ∈ ℕ, μ ≈ λ/4 ∈ ℕ,
   c_σ ≈ √(μ/(n + μ)), d ≈ 1 + √(μ/n), d_i ≈ 3n
2: initialize x ∈ ℝⁿ, σ ∈ ℝⁿ₊, s_σ = 0
3: while not happy
4:   for k ∈ {1, …, λ}
5:     z_k = N(0, I)   // i.i.d. for each k
6:     x_k = x + σ ∘ z_k
7:   P ← sel_μ_best({(x_k, z_k, f(x_k)) | 1 ≤ k ≤ λ})
     // recombination and parent update
8:   s_σ ← (1 − c_σ) s_σ + √(c_σ(2 − c_σ)) (√μ/μ) Σ_{z_k ∈ P} z_k   // search path
9:   σ ← σ ∘ exp^{1/d_i}(|s_σ| / E|N(0, 1)| − 1)
           × exp^{c_σ/d}(‖s_σ‖ / E‖N(0, I)‖ − 1)
10:  x = (1/μ) Σ_{x_k ∈ P} x_k


In the (μ/μ_I, λ)-ES with search path,
Algorithm 44.6, the factor ξ_k for changing the overall
step-size has disappeared (compared to Algorithm 44.5) and
the update of σ is postponed until after the for loop.
Instead of the additional random variate ξ_k, the length
of the search path ‖s_σ‖ determines the global step-size
change in Line 9. For the individual step-size change,
|z_k| is replaced by |s_σ|.

Using a search path is justified in two ways. First,
it implements a low-pass filter for selected z-steps,
removing high-frequency (most likely noisy) information.
Second, and more importantly, it utilizes information
that is otherwise lost: even if all single steps have
the same length, the length of s_σ can vary, because it
depends on the correlation between the directions of
z-steps. If single steps point into similar directions, the
path will be up to almost √(2/c_σ) times longer than
a single step and the step-size will increase. If they
oppose each other the path will be up to almost √(2/c_σ)
times shorter and the step-size will decrease. The same
is true for single components of s_σ.

The factors √(c_σ(2 − c_σ)) and √μ in Line 8
guarantee unbiasedness of s_σ under neutral selection, as
usual.
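The search path mechanism can be sketched as follows. Function name, generation budget, and test problem are assumptions for this example; E|N(0, 1)| = √(2/π) is exact, whereas the expression used for E‖N(0, I)‖ is a commonly used approximation that is not taken from the text above. As before, exp^{a}(b) is read as exp(a·b).

```python
import math
import random

def es_with_search_path(f, x, sigma, lam, generations, rng):
    """(mu/mu_I, lambda)-ES with cumulative step-size adaptation."""
    n = len(x)
    mu = lam // 4
    c_sig = math.sqrt(mu / (n + mu))
    d, d_i = 1 + math.sqrt(mu / n), 3.0 * n
    e_abs = math.sqrt(2 / math.pi)                                  # E|N(0,1)|
    e_norm = math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n))    # ~ E||N(0,I)||
    s = [0.0] * n                                                   # search path
    for _ in range(generations):
        pop = []
        for _ in range(lam):
            z = [rng.gauss(0, 1) for _ in range(n)]
            x_k = [xj + sj * zj for xj, sj, zj in zip(x, sigma, z)]
            pop.append((f(x_k), x_k, z))
        pop.sort(key=lambda t: t[0])                                # best first
        parents = pop[:mu]
        z_sum = [sum(p[2][i] for p in parents) for i in range(n)]
        a = math.sqrt(c_sig * (2 - c_sig)) * math.sqrt(mu) / mu
        s = [(1 - c_sig) * sj + a * zs for sj, zs in zip(s, z_sum)]
        s_norm = math.sqrt(sum(sj * sj for sj in s))
        g = math.exp(c_sig / d * (s_norm / e_norm - 1))             # global change
        sigma = [sg * math.exp((abs(sj) / e_abs - 1) / d_i) * g
                 for sg, sj in zip(sigma, s)]
        x = [sum(p[1][i] for p in parents) / mu for i in range(n)]
    return x, sigma

def sphere(v):
    return sum(t * t for t in v)

rng = random.Random(5)
x_end, sigma_end = es_with_search_path(sphere, [1.0] * 5, [0.3] * 5, 12, 200, rng)
f_end = sphere(x_end)
```

A path longer than expected under random selection signals consecutive steps in similar directions and enlarges the step-sizes; a shorter path shrinks them.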

All ES described so far are of somewhat limited
value, because they feature only isotropic or
axis-parallel mutation operators. In the remainder we
consider methods that entertain not only an n-dimensional
step-size vector σ, but also correlations between
variables for the mutation of x.


44.3.5 Addressing Dependences 
Between Variables 


The ES presented so far sample the mutation distri- 
bution independently in each component of the given 
coordinate system. The lines of equal density are either 
spherical or axis-parallel ellipsoids (compare Fig. 44.1). 
This is a major drawback, because problems with
a long or elongated valley can be solved efficiently
only if the valley is aligned with the coordinate system.
In this section, we discuss ES that allow us to traverse
non-axis-parallel valleys efficiently by sampling
distributions with correlations.


Full Covariance Matrix 
Algorithms that adapt the complete covariance ma- 
trix of the mutation distribution (compare Sect. 44.2.8) 
are correlated mutations [44.3], the generating set 
adaptation [44.26], the covariance matrix adaptation 
(CMA) [44.27], a mutative invariant adaptation [44.28], 
and some instances of natural evolution strategies 
(NES) [44.29-31]. Correlated mutations and some nat- 
ural ES are however not invariant under changes of the 


coordinate system [44.10, 31,32]. In the next sections, 
we outline two ES that adapt the full covariance ma- 
trix reliably and are invariant under coordinate system 
changes: the covariance matrix adaptation evolution 
strategy (CMA-ES) and the exponential natural evolu- 
tion strategy (xNES). 


Restricted Covariance Matrix 

Algorithms that adapt nondiagonal covariance matrices, 
but are restricted to certain matrices, are the momentum 
adaptation [44.33], direction adaptation [44.26], main 
vector adaptation [44.34], and limited memory CMA-ES
[44.35]. These variants are limited in their capability
to shape the mutation distribution, but they might be
advantageous for larger dimensional problems, say larger
than 100.


44.3.6 Covariance Matrix Adaptation (CMA) 


The CMA-ES [44.10, 27,36] is a de facto standard 
in continuous domain evolutionary computation. The 
CMA-ES is a natural generalization of Algorithm 44.6 
in that the mutation ellipsoids are not constrained to be 
axis-parallel, but can take on a general orientation. The 
CMA-ES is also a direct successor of the generating set 
adaptation [44.26], replacing self-adaptation to control 
the overall step-size with cumulative step-size adapta- 
tion [44.37]. 

The (μ/μ_W, λ)-CMA-ES is outlined in
Algorithm 44.7.


Algorithm 44.7 The (μ/μ_W, λ)-CMA-ES
1: given n ∈ ℕ₊, λ ≥ 5, μ ≈ λ/2,
   w_k = w′(k) / Σ_{k=1}^{μ} w′(k),
   w′(k) = log(λ/2 + 1/2) − log rank(f(x_k)),
   μ_w = 1 / Σ_{k=1}^{μ} w_k², c_σ ≈ μ_w/(n + μ_w),
   d ≈ 1 + √(μ_w/n),
   c_c ≈ (4 + μ_w/n)/(n + 4 + 2μ_w/n),
   c_1 ≈ 2/(n² + μ_w), c_μ ≈ μ_w/(n² + μ_w), c_m = 1
2: initialize s_σ = 0, s_c = 0, C = I, σ ∈ ℝ₊, x ∈ ℝⁿ
3: while not happy
4:   for k ∈ {1, …, λ}
5:     z_k = N(0, I)   // i.i.d. for all k
6:     x_k = x + σ C^{1/2} × z_k
7:   P = sel_μ_best({(z_k, f(x_k)) | 1 ≤ k ≤ λ})
8:   s_σ ← (1 − c_σ) s_σ + √(c_σ(2 − c_σ)) √μ_w Σ_{z_k ∈ P} w_k z_k   // search path for σ
9:   s_c ← (1 − c_c) s_c + h_σ √(c_c(2 − c_c)) √μ_w Σ_{z_k ∈ P} w_k C^{1/2} z_k   // search path for C
10:  x ← x + c_m σ C^{1/2} Σ_{z_k ∈ P} w_k z_k
11:  σ ← σ exp^{c_σ/d}(‖s_σ‖ / E‖N(0, I)‖ − 1)
12:  C ← (1 − c_1 + c_h − c_μ) C + c_1 s_c s_cᵀ + c_μ Σ_{z_k ∈ P} w_k C^{1/2} z_k (C^{1/2} z_k)ᵀ
where h_σ = 𝟙_{‖s_σ‖²/n < 2 + 4/(n+1)}, c_h = c_1 (1 − h_σ²) c_c (2 − c_c),
and C^{1/2} is the unique symmetric positive definite matrix
obeying C^{1/2} × C^{1/2} = C. All c-coefficients are ≤ 1.


Two search paths are maintained, s_σ and s_c. The
first path, s_σ, accumulates steps in the coordinate
system where the mutation distribution is isotropic and
which can be derived by scaling in the principal axes
of the mutation ellipsoid only. The path generalizes s_σ
from Algorithm 44.6 to nondiagonal covariance matrices
and is used to implement cumulative step-size
adaptation, CSA, in Line 11 (resembling Line 9 in
Algorithm 44.6). Under neutral selection, s_σ ∼ N(0, I)
and log σ is unbiased.

The second path, s_c, accumulates steps, disregarding
σ, in the given coordinate system. Whenever s_σ
is large and therefore σ is increasing fast, the
coefficient h_σ prevents s_c from getting large and quickly
changing the distribution shape via C. Given h_σ =
1, under neutral selection s_c ∼ N(0, C). The coefficient
c_h in Line 12 corrects for the bias on s_c introduced
by events h_σ = 0. The covariance matrix update
consists of a rank-1 update, based on the search path s_c,
and a rank-μ update with μ nonzero recombination
weights w_k. Under neutral selection, the expected
covariance matrix equals the covariance matrix before the
update.

The updates of x and C follow a common principle.
The mean x is updated such that the likelihood
of successful offspring to be sampled again is maximized
(or increased if c_m < 1). The covariance matrix
C is updated such that the likelihood of successful steps
(x_k − x)/σ to appear again, or the likelihood to sample
(in the direction of) the path s_c, is increased. A more
fundamental principle for the equations is given in the
next section.

Using not only the μ best but all λ offspring can
be particularly useful for the rank-μ update of C in
Line 12, where negative weights w_k for inferior offspring
are advisable. Such an update has been introduced
as active CMA [44.38].

The factor c_m in Line 10 can be equally written
as a mutation scaling factor κ = 1/c_m in Line 6,
compare [44.39]. This means that the actual mutation steps
are larger than the inherited ones, resembling the
derandomization technique of damping step-size changes to
address selection noise as described in Sect. 44.3.3.

An elegant way to replace Line 11 is

$$ \sigma \leftarrow \sigma \, \exp^{(c_\sigma/d)/2}\!\left(\frac{\|s_\sigma\|^2}{n} - 1\right) \qquad (44.1) $$

and often used in theoretical investigations of this
update as those presented in Sect. 44.4.2.

A single run of the (5/5_W, 10)-CMA-ES on
a convex-quadratic function is shown in Fig. 44.3. For
the sake of demonstration, the initial step-size is chosen
far too small (a situation that should be avoided
in practice) and increases quickly for the first 400
f-evaluations. After no more than 5500 f-evaluations the
adaptation of C is accomplished. Then the eigenvalues
of C (square roots of which are shown in the lower left)
reflect the underlying convex-quadratic function and the
convergence speed is the same as on the sphere function
and about 60% of the speed of the (1+1)-ES as
observed in Fig. 44.2. The resulting convergence speed is
about 10000 times faster than without adaptation of C
and at least 1000 times faster compared to any of the
algorithms from the previous sections.
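The strategy parameters listed in the first line of Algorithm 44.7 depend only on the dimension n and the population size λ, so they can be computed once before the loop starts. The following sketch does this; the function name, the dictionary keys, and the rounding of μ to λ // 2 are assumptions made for illustration.

```python
import math

def cma_parameters(n, lam):
    """Recombination weights and learning rates as functions of n and lambda."""
    mu = lam // 2
    # w'(k) = log(lambda/2 + 1/2) - log(k) for ranks k = 1..mu, then normalize.
    raw = [math.log(lam / 2 + 0.5) - math.log(k) for k in range(1, mu + 1)]
    total = sum(raw)
    w = [r / total for r in raw]
    mu_w = 1 / sum(wk * wk for wk in w)        # variance effective selection mass
    return {
        "weights": w,
        "mu_w": mu_w,
        "c_sigma": mu_w / (n + mu_w),
        "d": 1 + math.sqrt(mu_w / n),
        "c_c": (4 + mu_w / n) / (n + 4 + 2 * mu_w / n),
        "c_1": 2 / (n ** 2 + mu_w),
        "c_mu": mu_w / (n ** 2 + mu_w),
        "c_m": 1.0,
    }

params = cma_parameters(n=10, lam=10)
```

For n = 10 and λ = 10 the weights are positive and decreasing, μ_w lies between 1 and μ = 5, and the learning rates c_1 and c_μ for the covariance matrix are much smaller than c_σ, reflecting that C has many more degrees of freedom than σ.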


44.3.7 Natural Evolution Strategies 


The idea of using natural gradient learning [44.40] 
in ES has been proposed in [44.29] and further pur- 
sued in [44.31,41]. Natural evolution strategies (NES) 
put forward the idea that the update of all distribution 
parameters can be based on the same fundamental prin- 
ciple. NES have been proposed as a more principled 
alternative to CMA-ES and characterized by operat- 
ing on Cholesky factors of a covariance matrix. Only 
later was it discovered that also CMA-ES implements 
the underlying NES principle of natural gradient learn- 
ing [44.31, 42]. 

For simplicity, let the vector θ represent all
parameters of the distribution to sample new offspring.
In the case of a multivariate normal distribution as
above, we have a bijective transformation between θ
and mean and covariance matrix of the distribution,
θ = (x, σ²C).


Fig. 44.3a-d A single run of the (5/5_W, 10)-CMA-ES on the rotated ellipsoid function x ↦ Σ_{i=1}^{n} α_i² y_i² with α_i =
10^{3(i−1)/(n−1)}, y = Rx, where R is a random matrix with RᵀR = I, for n = 10. Shown is the evolution of various
parameters against the number of function evaluations. (a) best (gray), median and worst fitness value that reveal the final
convergence phase after about 5500 function evaluations where the ellipsoid function has been reduced to the simple
sphere; minimal and maximal coordinate-wise standard deviation of the mutation distribution and in between (mostly
hidden) the step-size σ that is initialized far too small and increases quickly in the beginning, that increases afterward
several times again by up to one order of magnitude and decreases with maximal rate during the last 1000 f-evaluations;
axis ratio of the mutation ellipsoid (square root of the condition number of C) that increases from 1 to 1000 where the
latter corresponds to α_n/α_1. (b) sorted principal axis lengths of the mutation ellipsoid disregarding σ (square roots of the
sorted eigenvalues of C, see also Fig. 44.1) that adapt to the (local) structure of the underlying optimization problem; they
finally reflect almost perfectly the factors α_i⁻¹ up to a constant factor. (c) x (distribution mean) that is initialized with all
ones and converges to the global optimum in zero while correlated movements of the variables can be observed.
(d) standard deviations in the coordinates disregarding σ (square roots of diagonal elements of C) showing the R-dependent
projections of the principal axis lengths into the given coordinate system. The straight lines to the right of the vertical
line at about 6300 only annotate the coordinates and do not reflect measured data


We consider a probability density $p(\cdot|\theta)$ over $\mathbb{R}^n$ parametrized by $\theta$ and a nonincreasing function $W^f_\theta : \mathbb{R} \to \mathbb{R}$. More specifically, $W^f_\theta : y \mapsto w\big(\Pr_{z \sim p(\cdot|\theta)}(f(z) \le y)\big)$ computes the $p_\theta$-quantile, or cumulative distribution function, of $f(z)$ with $z \sim p(\cdot|\theta)$ at point $y$, composed with a nonincreasing predefined weight function $w : [0,1] \to \mathbb{R}$ (where $w(0) > w(1/2) = 0$ is advisable). The value of $W^f_\theta(f(x))$ is invariant under strictly monotonous transformations of $f$. For $x \sim p(\cdot|\theta)$ the distribution of $W^f_\theta(f(x)) \sim w(U[0,1])$ depends only on the predefined $w$; it is independent of $\theta$ and $f$ and therefore also (time-)invariant under $\theta$-updates. Given $\lambda$ samples $x_k$, we have the rank-based consistent estimator

$$W^f_\theta(f(x_k)) \approx w\!\left(\frac{\mathrm{rank}(f(x_k)) - 1/2}{\lambda}\right).$$
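The rank-based estimator can be sketched in a few lines. This is a minimal illustration, not code from the chapter: the helper name `rank_based_weights` and the weight function `w_example` (chosen only so that $w(0) > w(1/2) = 0$) are assumptions made for the example.

```python
import numpy as np

def rank_based_weights(f_values, w):
    """Estimate W(f(x_k)) by w((rank(f(x_k)) - 1/2) / lambda); rank 1 is the smallest f."""
    f_values = np.asarray(f_values, dtype=float)
    lam = f_values.size
    ranks = np.empty(lam)
    ranks[np.argsort(f_values, kind="stable")] = np.arange(1, lam + 1)
    return w((ranks - 0.5) / lam)

def w_example(q):
    # Hypothetical nonincreasing weight function with w(0) > w(1/2) = 0.
    return 0.5 - q

f_vals = np.array([3.0, 0.1, 7.5, 1.2])
weights = rank_based_weights(f_vals, w_example)
# A strictly increasing transformation of f leaves the ranks, and hence the weights, unchanged.
weights_cubed = rank_based_weights(f_vals ** 3, w_example)
```

Because only ranks enter, the two weight vectors coincide, which is the invariance property stated above.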


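We consider the expected $W^f_{\theta'}$-transformed fitness [44.43]

$$J(\theta) = E\big(W^f_{\theta'}(f(x))\big), \quad x \sim p(\cdot|\theta)$$
$$\phantom{J(\theta)} = \int_{\mathbb{R}^n} W^f_{\theta'}(f(x))\, p(x|\theta)\, dx , \qquad (44.2)$$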


where the expectation is taken under the given sample distribution. The maximizer of $J$ w.r.t. $p(\cdot|\theta)$ is, for any fixed $W^f_{\theta'}$, a Dirac distribution concentrated on the minimizer of $f$. A natural way to update $\theta$ is therefore a gradient ascent step in the $\nabla_\theta J$ direction. However, the vanilla gradient $\nabla_\theta J$ depends on the specific parametrization chosen in $\theta$. In contrast, the natural gradient, denoted by $\tilde\nabla_\theta$, is associated to the Fisher metric that is intrinsic to $p$ and independent of the chosen $\theta$-parametrization. Developing $\tilde\nabla_\theta J(\theta)$ under mild assumptions on $f$ and $p(\cdot|\theta)$ by exchanging differentiation and integration, recognizing that the gradient $\tilde\nabla_\theta$ does not act on $W^f_{\theta'}$, using the log-likelihood trick $\tilde\nabla_\theta p(\cdot|\theta) = p(\cdot|\theta)\, \tilde\nabla_\theta \ln p(\cdot|\theta)$ and finally setting $\theta' = \theta$ yields

$$\tilde\nabla_\theta J(\theta) = E\big(W^f_\theta(f(x))\, \tilde\nabla_\theta \ln p(x|\theta)\big) . \qquad (44.3)$$

We set $\theta' = \theta$ because we will estimate $W^f_{\theta'}$ using the current samples that are distributed according to $p(\cdot|\theta)$. A Monte Carlo approximation of the expected value by the average finally yields the comparatively simple expression

$$\tilde\nabla_\theta J(\theta) \approx \frac{1}{\lambda} \sum_{k=1}^{\lambda} \underbrace{W^f_\theta(f(x_k))}_{\text{preference weight}}\; \underbrace{\tilde\nabla_\theta \ln p(x_k|\theta)}_{\text{intrinsic candidate direction}} \qquad (44.4)$$


for a natural gradient update of $\theta$, where $x_k \sim p(\cdot|\theta)$ is sampled from the current distribution. The natural gradient can be computed as $\tilde\nabla_\theta = F_\theta^{-1} \nabla_\theta$, where $F_\theta$ is the Fisher information matrix expressed in $\theta$-coordinates. For the multivariate Gaussian distribution, $\tilde\nabla_\theta \ln p(x_k|\theta)$ can indeed be easily expressed and computed efficiently. We find that in CMA-ES (Algorithm 44.7), the rank-$\mu$ update (Line 12 with $c_1 = 0$) and the update in Line 10 are natural gradient updates of $C$ and $x$, respectively [44.31,42], where the $k$-th largest $w_k$ is a consistent estimator for the $k$-th largest $W^f_\theta(f(x_k))$ [44.43].
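As a worked illustration of why the Gaussian case is easy, consider only the mean of the distribution. The symbols $m$ (mean) and $\Sigma$ (full covariance, i.e., $\sigma^2 C$ in the notation of the algorithms) are introduced here for the example and are not the chapter's notation; the covariance is held fixed.

```latex
% Natural gradient of the log-density with respect to the mean m of N(m, \Sigma), \Sigma fixed
\begin{aligned}
\ln p(x \mid m) &= -\tfrac{1}{2}(x-m)^{T}\Sigma^{-1}(x-m) + \text{const},\\
\nabla_m \ln p(x \mid m) &= \Sigma^{-1}(x-m),\\
F_m &= E\big[\nabla_m \ln p \,(\nabla_m \ln p)^{T}\big]
     = \Sigma^{-1}\,E\big[(x-m)(x-m)^{T}\big]\,\Sigma^{-1} = \Sigma^{-1},\\
\tilde\nabla_m \ln p(x \mid m) &= F_m^{-1}\,\nabla_m \ln p(x \mid m) = x - m .
\end{aligned}
```

Inserted into the Monte Carlo approximation of the natural gradient, this gives a weighted sum of the sampled steps $x_k - m$, which is the form of a weighted-recombination update of the mean.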

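While the natural gradient does not depend on the parametrization of the distribution, a finite step taken in the natural gradient direction does. This becomes relevant for the covariance matrix update, where natural ES take a different parametrization than CMA-ES. Starting from Line 12 in Algorithm 44.7, we find for $c_1 = c_h = 0$

$$
\begin{aligned}
C &\leftarrow (1-c_\mu)\,C + c_\mu \sum_{z_k \in P} w_k\, C^{1/2} z_k \big(C^{1/2} z_k\big)^{T} \\
  &= C^{1/2} \Big( (1-c_\mu)\, I + c_\mu \sum_{z_k \in P} w_k\, z_k z_k^{T} \Big) C^{1/2} \\
  &\overset{\sum w_k = 1}{=} C^{1/2} \Big( I + c_\mu \sum_{z_k \in P} w_k \big( z_k z_k^{T} - I \big) \Big) C^{1/2} \\
  &\overset{c_\mu \ll 1}{\approx} C^{1/2} \exp\Big( c_\mu \sum_{z_k \in P} w_k \big( z_k z_k^{T} - I \big) \Big) C^{1/2} . \qquad (44.5)
\end{aligned}
$$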


The term bracketed between the matrices $C^{1/2}$ in the lower three lines is a multiplicative covariance matrix update expressed in the natural coordinates, where the covariance matrix is the identity and $C^{1/2}$ serves as coordinate system transformation into the given coordinate system. Only the lower two lines of (44.5) do not rely on the constraint $\sum_k w_k = 1$ in order to satisfy a stationarity condition on $C$. For a given $C$ on the right-hand side of (44.5), we have under neutral selection the stationarity condition $E(C_{\mathrm{new}}) = C$ for the first three lines and $E(\log(C_{\mathrm{new}})) = \log(C)$ for the last line, where log is the inverse of the matrix exponential exp. The last line of (44.5) is used in the exponential natural evolution strategy, xNES [44.31], and guarantees positive definiteness of $C$ even with negative weights, independent of $c_\mu$ and of the data $z_k$. The xNES is depicted in Algorithm 44.8.
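The difference between the additive and the exponential form can be checked numerically in the natural coordinates (where $C = I$). The sketch below is illustrative only: the weights, the deliberately large value of `c_mu`, and the helper `sym_expm` (a matrix exponential for symmetric matrices via eigendecomposition) are assumptions of this example.

```python
import numpy as np

def sym_expm(a):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.exp(vals)) @ vecs.T

rng = np.random.default_rng(1)
n, lam, c_mu = 3, 6, 0.9  # c_mu is deliberately large here, for illustration only
z = rng.standard_normal((lam, n))
w = np.array([0.6, 0.4, 0.2, -0.2, -0.4, -0.6])  # hypothetical weights, some negative

# Term c_mu * sum_k w_k (z_k z_k^T - I) from the last two lines of (44.5).
m = c_mu * sum(wk * (np.outer(zk, zk) - np.eye(n)) for wk, zk in zip(w, z))

additive = np.eye(n) + m       # third line of (44.5): may lose positive definiteness
exponential = sym_expm(m)      # fourth line of (44.5): eigenvalues are exp(.) > 0

min_eig_additive = np.linalg.eigvalsh(additive).min()
min_eig_exponential = np.linalg.eigvalsh(exponential).min()
```

Whatever the sign of `min_eig_additive` turns out to be for a given sample, `min_eig_exponential` is positive by construction, which is the guarantee stated above.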




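Algorithm 44.8 The Exponential NES (xNES)
1: given $n \in \mathbb{N}_+$, $\lambda \ge 5$, $w_k = w'(k) / \sum_k |w'(k)|$,
   $w'(k) = \log(\lambda/2 + 1/2) - \log \mathrm{rank}(f(x_k))$,
   $\eta_c \approx (5+\lambda)/(5\, n^{1.5}) \le 1$, $\eta_\sigma \approx \eta_c$, $\eta_x \approx 1$
2: initialize $C^{1/2} = I$, $\sigma \in \mathbb{R}_+$, $x \in \mathbb{R}^n$
3: while not happy
4:   for $k \in \{1, \dots, \lambda\}$
5:     $z_k = \mathcal{N}(0, I)$  // i.i.d. for all k
6:     $x_k = x + \sigma C^{1/2} \times z_k$
7:   $P = \{ (z_k, f(x_k)) \mid 1 \le k \le \lambda \}$
8:   $x \leftarrow x + \eta_x\, \sigma\, C^{1/2} \sum_{z_k \in P} w_k z_k$
9:   $\sigma \leftarrow \sigma\, \exp^{\eta_\sigma/2}\Big( \sum_{z_k \in P} w_k \big( \|z_k\|^2/n - 1 \big) \Big)$
10:  $C^{1/2} \leftarrow C^{1/2} \times \exp^{\eta_c/2}\Big( \sum_{z_k \in P} w_k \big( z_k z_k^{T} - (\|z_k\|^2/n)\, I \big) \Big)$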


In xNES, sampling is identical to CMA-ES and environmental selection is omitted entirely. Line 9 resembles the step-size update in (44.1). Comparing the updates more closely, with $c_\sigma = 1$ (44.1) uses

$$\frac{\mu_w \big\| \sum_k w_k z_k \big\|^2}{n} - 1 ,$$

whereas xNES uses

$$\sum_k w_k \left( \frac{\|z_k\|^2}{n} - 1 \right)$$

for updating $\sigma$. For $\mu = 1$ the updates are the same. For $\mu > 1$, the latter only depends on the lengths of the $z_k$, while the former depends on their lengths and directions. Finally, xNES expresses the update (44.5) in Line 10 on the Cholesky factor $C^{1/2}$, which does not remain symmetric in this case ($C = C^{1/2} \times C^{1/2\,T}$ still holds). The term $-\|z_k\|^2/n$ keeps the determinant of $C^{1/2}$ (and thus the trace of $\log C^{1/2}$) constant and is of rather cosmetic nature. Omitting the term is equivalent to using $\eta_\sigma + \eta_c$ instead of $\eta_\sigma$ in Line 9.

The exponential natural evolution strategy is a very elegant algorithm. Like CMA-ES it can be interpreted as an incremental estimation of distribution algorithm [44.44]. However, it generally performs worse than CMA-ES because it does not use search paths for updating $\sigma$ and $C$.
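A compact sketch of Algorithm 44.8 follows. It is an illustration under stated assumptions rather than a reference implementation: the function names `xnes` and `sym_expm`, the default population size, the fixed iteration budget standing in for "while not happy", and the sphere test function are all choices made for this example. The notation $\exp^{a}(M)$ in the algorithm is read as $\exp(aM)$.

```python
import numpy as np

def sym_expm(a):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.exp(vals)) @ vecs.T

def xnes(f, x0, sigma0, lam=None, max_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    lam = lam or max(5, 4 + int(3 * np.log(n)))  # hypothetical default, not from the chapter
    # Line 1: weights by rank (rank 1 = best) and learning rates.
    w_prime = np.log(lam / 2 + 0.5) - np.log(np.arange(1, lam + 1))
    w = w_prime / np.abs(w_prime).sum()
    eta_c = min(1.0, (5 + lam) / (5 * n ** 1.5))
    eta_sigma, eta_x = eta_c, 1.0
    # Line 2: a plays the role of C^{1/2}.
    sigma, a = float(sigma0), np.eye(n)
    for _ in range(max_iter):  # "while not happy" replaced by a fixed budget
        # Lines 4-6: sample lambda candidates.
        z = rng.standard_normal((lam, n))
        xs = x + sigma * z @ a.T
        # Line 7: sort so that z[0] belongs to the best candidate.
        z = z[np.argsort([f(xk) for xk in xs])]
        sq = (z ** 2).sum(axis=1)
        # Line 8: mean update (uses the old sigma and the old C^{1/2}).
        x = x + eta_x * sigma * a @ (w @ z)
        # Line 9: step-size update.
        sigma *= np.exp(eta_sigma / 2 * (w @ (sq / n - 1)))
        # Line 10: multiplicative update of C^{1/2} with a trace-free exponent.
        m = sum(wk * (np.outer(zk, zk) - s / n * np.eye(n)) for wk, zk, s in zip(w, z, sq))
        a = a @ sym_expm(eta_c / 2 * m)
    return x, sigma, a

def sphere(x):
    return float(np.dot(x, x))

x_final, sigma_final, a_final = xnes(sphere, np.ones(5), 1.0)
```

Since the exponent in Line 10 has zero trace, the determinant of `a_final` stays at one up to rounding, in line with the remark on the term $-\|z_k\|^2/n$.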


44.3.8 Further Aspects 


Internal Parameters 
Adaptation and self-adaptation address the control of the most important internal parameters in ES. Yet, all algorithms presented have hidden and exposed parameters in their implementation. Many of them can be set to reasonable and robust default values. The population size parameters $\mu$ and $\lambda$ however change the search characteristics of an evolution strategy significantly. Larger values, in particular for parent number $\mu$, often help address highly multimodal or noisy problems successfully.

In practice, several experiments or restarts are advisable, where different initial conditions for $x$ and $\sigma$ can be employed. For exploring different population sizes, a schedule with increasing population size (IPOP) is advantageous [44.45-47], because runs with larger populations typically take more function evaluations. Preceding long runs (large $\mu$ and $\lambda$) with short runs (small $\mu$ and $\lambda$) leads to a smaller (relative) impairment of the later runs than vice versa.


Internal Computational Complexity 

Algorithms presented in Sects. 44.3.1-44.3.4 that sample isotropic or axis-parallel mutation distributions have an internal computational complexity linear in the dimension. The internal computational complexity of CMA-ES and xNES is, for constant population size, cubic in the dimension due to the update of $C^{1/2}$. Typical implementations of the CMA-ES however have quadratic complexity, as they implement a lazy update scheme for $C^{1/2}$, where $C$ is decomposed into $C^{1/2} C^{1/2\,T}$ only after about $n/\lambda$ iterations. An exact quadratic update for CMA-ES has also been proposed [44.48]. While never considered in the literature, a lazy update for xNES to achieve quadratic complexity seems feasible as well.


Invariance 

Selection and recombination in ES are based solely on the ranks of offspring and parent individuals. As a consequence, the behavior of ES is invariant under order-preserving (strictly monotonous) transformations of the fitness function value. In particular, all spherical unimodal functions belong to the same function class, of which the convex-quadratic sphere function is the most prominent member. This function is more thoroughly investigated in Sect. 44.4.




All algorithms presented are invariant under translations, and Algorithms 44.3, 44.7, and 44.8 are invariant under rotations of the coordinate system, provided that the initial $x$ is translated and rotated accordingly.

Parameter control can introduce yet further invariances. All algorithms presented are scale invariant due to step-size adaptation. Furthermore, ellipsoidal functions that are in the reach of the mutation operator of the ES presented in Sects. 44.3.2-44.3.7 are eventually transformed, effectively, into spherical functions. These ES are invariant under the respective affine transformations of the search space, provided the initial conditions are chosen accordingly.


Variants
Evolution strategies have been extended and combined with other approaches in various ways. We mention here constraint handling [44.49, 50], fitness surrogates [44.51], multiobjective variants [44.52, 53], and exploitation of fitness values [44.54].


44.4 Theory


There is ample empirical evidence that on many unimodal functions ES with step-size control, such as those outlined in the previous section, converge fast and with probability one to the global optimum. Convergence proofs supporting this evidence are discussed in Sect. 44.4.3. On multimodal functions on the other hand, the probability to converge to the global optimum (in a single run of the same strategy) is generally smaller than one (but larger than zero), as suggested by observations and theoretical results [44.55]. Without parameter control on the other hand, elitist strategies always converge to the essential global optimum, however at a much slower rate (compare random search in Fig. 44.2). On a bounded domain and with mutation variances bounded away from zero, nonelitist strategies generate a subsequence of $x$-values converging to the essential global optimum.

In this section, we use a time index $t$ to denote iteration and assume, for notational convenience and without loss of generality (due to translation invariance), that the optimum of $f$ is in $x^* = 0$. This simplifies writing $x^{(t)} - x^*$ to simply $x^{(t)}$ and then $\|x^{(t)}\|$ measures the distance to the optimum of the parental centroid in time step $t$.

Linear convergence plays a central role for ES. For a deterministic sequence $x^{(t)}$ linear convergence (toward zero) takes place if there exists a $c > 0$ such that

$$\lim_{t \to \infty} \frac{\|x^{(t+1)}\|}{\|x^{(t)}\|} = \exp(-c) , \qquad (44.6)$$

which means, loosely speaking, that for $t$ large enough, the distance to the optimum decreases in every step by the constant factor $\exp(-c)$. Taking the logarithm of (44.6), then exchanging the logarithm and the limit and taking the Cesàro mean yields

$$\lim_{t \to \infty} \underbrace{\frac{1}{t} \sum_{k=0}^{t-1} \log \frac{\|x^{(k+1)}\|}{\|x^{(k)}\|}}_{= \frac{1}{t} \log \|x^{(t)}\| / \|x^{(0)}\|} = -c . \qquad (44.7)$$


For a sequence of random vectors, we define linear con- 
vergence based on (44.7) as follows. 


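Definition 44.1 Linear Convergence
The sequence of random vectors $x^{(t)}$ converges almost surely linearly to 0 if there exists a $c > 0$ such that

$$-c = \lim_{t \to \infty} \frac{1}{t} \log \frac{\|x^{(t)}\|}{\|x^{(0)}\|} = \lim_{t \to \infty} \frac{1}{t} \sum_{k=0}^{t-1} \log \frac{\|x^{(k+1)}\|}{\|x^{(k)}\|} \quad \text{a.s.} \qquad (44.8)$$

The sequence converges in expectation linearly to 0 if there exists a $c > 0$ such that

$$-c = \lim_{t \to \infty} E \log \frac{\|x^{(t+1)}\|}{\|x^{(t)}\|} . \qquad (44.9)$$

The constant $c$ is the convergence rate of the algorithm.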


Linear convergence, hence, means that asymptotically in $t$, the logarithm of the distance to the optimum decreases linearly in $t$ like $-ct$. This behavior has been observed in Fig. 44.2 for the (1+1)-ES with 1/5th success rule on a unimodal spherical function.

Note that $\lambda$ function evaluations are performed per iteration and it is then often useful to consider a convergence rate per function evaluation, i.e., to normalize the convergence rate by $\lambda$.
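The convergence rate can be estimated from a recorded sequence of distances to the optimum as the negative slope of the log-distance. The sketch below uses a synthetic sequence whose log-ratios are drawn i.i.d. around $-c$; the sequence, the value of `c_true`, the noise level, and $\lambda = 10$ are assumptions of this example, not data from the chapter.

```python
import numpy as np

def convergence_rate(distances):
    """Estimate c as -(1/t) * log(||x^(t)|| / ||x^(0)||)."""
    d = np.asarray(distances, dtype=float)
    t = d.size - 1
    return -np.log(d[-1] / d[0]) / t

rng = np.random.default_rng(0)
c_true, t_max, lam = 0.1, 5000, 10
# Synthetic run: log ||x^(k+1)||/||x^(k)|| has mean -c_true and standard deviation 0.5.
log_ratios = rng.normal(-c_true, 0.5, size=t_max)
distances = np.exp(np.concatenate(([0.0], np.cumsum(log_ratios))))

c_hat = convergence_rate(distances)   # rate per iteration
c_per_evaluation = c_hat / lam        # rate per function evaluation
```

Individual steps may increase the distance; only the time average of the log-ratios has to settle at $-c$.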

The progress rate measures the reduction of the dis- 
tance to optimum within a single generation [44.1]. 


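Definition 44.2 Progress Rate
The normalized progress rate is defined as the expected relative reduction of $\|x^{(t)}\|$

$$\varphi^* = n\, E\!\left( \frac{\|x^{(t)}\| - \|x^{(t+1)}\|}{\|x^{(t)}\|} \,\Big|\, x^{(t)}, s^{(t)} \right) = n \left( 1 - E\!\left( \frac{\|x^{(t+1)}\|}{\|x^{(t)}\|} \,\Big|\, x^{(t)}, s^{(t)} \right) \right) , \qquad (44.10)$$

where the expectation is taken over $x^{(t+1)}$ given $(x^{(t)}, s^{(t)})$. In situations commonly considered in theoretical analyses, $\varphi^*$ does not depend on $x^{(t)}$ and is expressed as a function of strategy parameters $s^{(t)}$.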


Definitions 44.1 and 44.2 are related, in that for a given $x^{(t)}$

$$\varphi^* \le -n \log E \frac{\|x^{(t+1)}\|}{\|x^{(t)}\|} \qquad (44.11)$$
$$\phantom{\varphi^*} \le -n\, E \log \frac{\|x^{(t+1)}\|}{\|x^{(t)}\|} = nc . \qquad (44.12)$$

Therefore, progress rate $\varphi^*$ and convergence rate $nc$ do not agree and we might observe convergence ($c > 0$) while $\varphi^* < 0$. However for $n \to \infty$, we typically have $\varphi^* = nc$ [44.56].
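The two inequalities follow from elementary facts, sketched here with the shorthand $r = \|x^{(t+1)}\| / \|x^{(t)}\| > 0$ (a symbol introduced only for this note). The two-point distribution in the last line is a made-up example, not a model of any particular ES.

```latex
\begin{aligned}
\varphi^* = n\,(1 - E\,r)
  &\le -n \log E\,r   && \text{since } 1 - y \le -\log y \text{ for } y > 0,\\
  &\le -n\, E \log r  && \text{by Jensen's inequality, since } \log \text{ is concave.}\\[4pt]
\text{Example: } r \in \{0.4,\ 1.7\} \text{ with probability } \tfrac12 \text{ each:}\quad
  & E\,r = 1.05 \;\Rightarrow\; \varphi^* = -0.05\,n < 0,\\
  & E \log r = \tfrac12 \log 0.68 \approx -0.193 < 0 .
\end{aligned}
```

In this example the expected log-ratio is negative although the progress rate is negative as well, which illustrates how convergence can coexist with $\varphi^* < 0$.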

The normalized progress rate $\varphi^*$ for ES has been extensively studied in various situations, see Sect. 44.4.2. Scale-invariance and (sometimes artificial) assumptions on the step-size typically ensure that the progress rates do not depend on $t$.

Another way to describe how fast an algorithm approaches the optimum is to count the number of function evaluations needed to reduce the distance to the optimum by a given factor $1/\epsilon$ or, similarly, the runtime to hit a ball of radius $\epsilon$ around the optimum, starting, e.g., from the distance one.


Definition 44.3 Runtime
The runtime is the first hitting time of a ball around the optimum. Specifically, the runtime in number of function evaluations as a function of $\epsilon$ reads

$$\lambda \times \min \left\{ t : \|x^{(t)}\| \le \epsilon \times \|x^{(0)}\| \right\} = \lambda \times \min \left\{ t : \frac{\|x^{(t)}\|}{\|x^{(0)}\|} \le \epsilon \right\} . \qquad (44.13)$$

Linear convergence with rate $c$ as given in (44.9) implies that, for $\epsilon \to 0$, the expected runtime divided by $\log(1/\epsilon)$ goes to the constant $\lambda/c$.
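The runtime definition translates directly into a first-hitting-time loop. The sketch below applies it to an idealized, deterministic sequence that contracts by $\exp(-c)$ per iteration; the function name `runtime` and the values of `c`, `lam`, and `eps` are assumptions of this example.

```python
import math

def runtime(distances, eps, lam):
    """lam times the first t with ||x^(t)|| <= eps * ||x^(0)||, or None if never reached."""
    d0 = distances[0]
    for t, d in enumerate(distances):
        if d <= eps * d0:
            return lam * t
    return None

c, lam, eps = 0.05, 10, 1e-12
distances = [math.exp(-c * t) for t in range(2000)]  # idealized linear convergence

evals = runtime(distances, eps, lam)
ratio = evals / math.log(1 / eps)  # approaches lam / c as eps -> 0
```

For small `eps` the ratio is close to `lam / c`, matching the limit stated above.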


44.4.1 Lower Runtime Bounds 


Evolution strategies with a fixed number of parent and offspring individuals cannot converge faster than linearly and with a convergence rate of $O(1/n)$. This means that their runtime is lower bounded by a constant times $\log(1/\epsilon^n) = n \log(1/\epsilon)$ [44.57-61]. This result can be obtained by analyzing the branching factor of the tree of possible paths the algorithm can take. It therefore holds for any optimization algorithm taking decisions based solely on a bounded number of comparisons between fitness values [44.57-59].

More specifically, the runtime of any $(1 \stackrel{+}{,} \lambda)$-ES with isotropic mutations cannot be asymptotically faster than $\propto n \log(1/\epsilon)\, \lambda / \log(\lambda)$ [44.62]. Considering more restrictive classes of algorithms can provide more precise nonasymptotic bounds [44.60, 61]. Different approaches address in particular the (1+1)- and $(1, \lambda)$-ES and precisely characterize the fastest convergence rate that can be obtained with isotropic normal distributions on any objective function with any step-size adaptation mechanism [44.56, 63-65].

Considering the sphere function, the optimal convergence rate is attained with distance proportional step-size, that is, a step-size proportional to the distance of the parental centroid to the optimum, $\sigma = \mathrm{const} \times \|x\| = \sigma^* \|x\| / n$. Optimal step-size and optimal convergence rate according to (44.8) and (44.9) can be expressed in terms of expectation of some random variables that are easily simulated numerically. The convergence rate of the (1+1)-ES with distance proportional step-size is shown in Fig. 44.4 as a function of the normalized step-size $\sigma^* = n\sigma / \|x\|$. The peak of each curve is the upper bound for the convergence rate that can be achieved on any function with any form of step-size adaptation. As for the general bound, the evolution strategy converges linearly and the convergence rate $c$ decreases to zero like $1/n$ for $n \to \infty$ [44.56, 65, 66], which is equivalent to linear scaling of the runtime in the dimension. The asymptotic limit for the convergence rate of the (1+1)-ES, as shown in the lowest curve in Fig. 44.4, coincides with the progress rate expression given in the next section.


Fig. 44.4 Normalized convergence rate $nc$ versus normalized step-size $n\sigma / \|x\|$ of the (1+1)-ES with distance proportional step-size for $n = 2, 3, 5, 10, 20, \infty$ (top to bottom). The peaks of the graphs represent the upper bound for the convergence rate of the (1+1)-ES with isotropic mutation (corresponding to the lower runtime bound). The limit curve for $n$ to infinity (lowest curve) reveals the optimal normalized progress rate of $\varphi^* \approx 0.202$ of the (1+1)-ES on sphere functions for $n \to \infty$


44.4.2 Progress Rates 


This section presents analytical approximations to progress rates of ES for sphere, ridge, and cigar functions in the limit $n \to \infty$. Both one-generation results and those that consider multiple time steps and cumulative step-size adaptation are considered.

The first analytical progress rate results date back to the early work of Rechenberg [44.1] and Schwefel [44.3], who considered the sphere and corridor models and very simple strategy variants. Further results have since been derived for various ridge functions, several classes of convex quadratic functions, and more general constrained linear problems. The strategies for which results are available have increased in complexity as well and today include multiparent strategies employing recombination as well as several step-size adaptation mechanisms. Only strategy variants with isotropic mutation distributions have been considered up to this point. However, parameter control strategies that successfully adapt the shape of the mutation distribution (such as CMA-ES) effectively transform ellipsoidal functions into (almost) spherical ones, thus lending extra relevance to the analysis of sphere and sphere-like functions.


Fig. 44.5 Decomposition of mutation vector $z$ into a component $z_\odot$ in the direction of the negative of the gradient vector of the objective function and a perpendicular component $z_\ominus$

The simplest convex quadratic functions to be optimized are variants of the sphere function (see also the discussion of invariance in Sect. 44.3.8)

$$f(x) = \sum_{i=1}^{n} x_i^2 = R^2 ,$$

where $R$ denotes the distance from the optimal solution. Expressions for the progress rate of ES on sphere functions can be computed by decomposing mutation vectors into two components $z_\odot$ and $z_\ominus$ as illustrated in Fig. 44.5. Component $z_\odot$ is the projection of $z$ onto the negative of the gradient vector $\nabla f$ of the objective function. It contributes positively to the fitness of offspring candidate solution $y = x + z$ if and only if $-\nabla f(x) \cdot z > 0$.

Component $z_\ominus = z - z_\odot$ is perpendicular to the gradient direction and contributes negatively to the offspring fitness. Its expected squared length exceeds that of $z_\odot$ by a factor of $n - 1$. Considering normalized quantities $\sigma^* = \sigma n / R$ and $\varphi^* = \varphi n / R$ allows giving concise mathematical representations of the scaling properties of various ES on spherical functions as shown below. Constant $\sigma^*$ corresponds to the distance proportional step-size from Sect. 44.4.1.




(1+1)-ES on Sphere Functions
The normalized progress rate of the (1+1)-ES on sphere functions is

$$\varphi^* = \frac{\sigma^*}{\sqrt{2\pi}}\, e^{-\frac{1}{8} \sigma^{*2}} - \frac{\sigma^{*2}}{4} \left[ 1 - \mathrm{erf}\!\left( \frac{\sigma^*}{\sqrt{8}} \right) \right] \qquad (44.14)$$

in the limit of $n \to \infty$ [44.1]. The expression in square brackets equals twice the success probability (i.e., the probability that the offspring candidate solution is superior to its parent and thus replaces it). The first term in (44.14) is the contribution to the normalized progress rate from the component $z_\odot$ of the mutation vector that is parallel to the gradient vector. The second term results from the component $z_\ominus$ that is perpendicular to the gradient direction.

The black curve in Fig. 44.4 illustrates how the normalized progress rate of the (1+1)-ES on sphere functions in the limit $n \to \infty$ depends on the normalized mutation strength. For small normalized mutation strengths, the normalized progress rate is small as the short steps that are made do not yield significant progress. The success probability is nearly one-half. For large normalized mutation strengths, progress is near zero as the overwhelming majority of steps result in poor offspring that are rejected. The normalized progress rate assumes a maximum value of $\varphi^* = 0.202$ at normalized mutation strength $\sigma^* = 1.224$. The range of step-sizes for which close to optimal progress is achieved is referred to as the evolution window [44.1]. In the runs of the (1+1)-ES with constant step-size shown in Fig. 44.2, the normalized step-size initially is to the left of the evolution window (large relative distance to the optimal solution) and in the end to its right (small relative distance to the optimal solution), achieving maximal progress at a point in between.
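The location of the maximum can be reproduced by evaluating the progress rate formula of the (1+1)-ES on a grid. The function names and the grid resolution below are choices made for this sketch; the success probability is taken as one half of the error-function term, so that it tends to one-half for small step-sizes.

```python
import math

def progress_rate_1p1(sigma_star):
    """Normalized progress rate of the (1+1)-ES on the sphere for n -> infinity, (44.14)."""
    gain = sigma_star / math.sqrt(2 * math.pi) * math.exp(-sigma_star ** 2 / 8)
    loss = sigma_star ** 2 / 4 * (1 - math.erf(sigma_star / math.sqrt(8)))
    return gain - loss

def success_probability(sigma_star):
    """Probability that the offspring is better than the parent (same limit)."""
    return 0.5 * (1 - math.erf(sigma_star / math.sqrt(8)))

grid = [i / 1000 for i in range(1, 5001)]   # sigma* in (0, 5]
best_sigma = max(grid, key=progress_rate_1p1)
best_phi = progress_rate_1p1(best_sigma)
best_p_success = success_probability(best_sigma)
```

The grid maximum lies near $\sigma^* = 1.224$ with a progress rate near 0.202, and the success probability at that point is about 0.27.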


$(\mu/\mu, \lambda)$-ES on Sphere Functions
The normalized progress rate of the $(\mu/\mu, \lambda)$-ES on sphere functions is described by

$$\varphi^* = \sigma^* c_{\mu/\mu,\lambda} - \frac{\sigma^{*2}}{2\mu} \qquad (44.15)$$

in the limit $n \to \infty$ [44.2]. The term $c_{\mu/\mu,\lambda}$ is the expected value of the average of the $\mu$ largest order statistics of $\lambda$ independent standard normally distributed random numbers. For $\lambda$ fixed, $c_{\mu/\mu,\lambda}$ decreases with increasing $\mu$. For the fixed truncation ratio $\mu/\lambda$, $c_{\mu/\mu,\lambda}$ approaches a finite limit value as $\lambda$ and $\mu$ increase [44.8, 15].


Fig. 44.6 Maximal normalized progress per offspring $\varphi^*/\lambda$ of the $(\mu/\mu, \lambda)$-ES on sphere functions for $n \to \infty$ plotted against the truncation ratio $\mu/\lambda$. The curves correspond to, from bottom to top, $\lambda = 4, 10, 40, 100, \infty$. The dotted line represents the maximal progress rate of the (1+1)-ES


It is easily seen from (44.15) that the normalized progress rate of the $(\mu/\mu, \lambda)$-ES is maximized by normalized mutation strength

$$\sigma^* = \mu c_{\mu/\mu,\lambda} . \qquad (44.16)$$

The normalized progress rate achieved with that setting is

$$\varphi^* = \frac{\mu c_{\mu/\mu,\lambda}^2}{2} . \qquad (44.17)$$

The progress rate is negative if $\sigma^* > 2 \mu c_{\mu/\mu,\lambda}$. Figure 44.6 illustrates how the optimal normalized progress rate per offspring depends on the population size parameters $\mu$ and $\lambda$. Two interesting observations can be made from the figure:


• For all but the smallest values of $\lambda$, the $(\mu/\mu, \lambda)$-ES with $\mu > 1$ is capable of significantly more rapid progress per offspring than the $(1, \lambda)$-ES. This contrasts with findings for the $(\mu/1, \lambda)$-ES, the performance of which on sphere functions for $n \to \infty$ monotonically deteriorates with increasing $\mu$ [44.8].

• For large $\lambda$, the optimal truncation ratio is $\mu/\lambda = 0.27$, and the corresponding progress per offspring is 0.202. Those values are identical to the optimal success probability and resulting normalized progress rate of the (1+1)-ES. Beyer [44.8] shows that the correspondence is no coincidence and indeed exact. The step-sizes that the two strategies employ differ widely, however. The optimal normalized step-size of the (1+1)-ES is 1.224; that of the $(\mu/\mu, \lambda)$-ES is $\mu c_{\mu/\mu,\lambda}$ and for fixed truncation ratio $\mu/\lambda$ increases (slightly superlinearly) with the population size. For example, optimal step-sizes of $(\mu/\mu, 4\mu)$-ES for $\mu \in \{1, 2, 3\}$ are 1.029, 2.276, and 3.538, respectively. If offspring candidate solutions can be evaluated in parallel, the $(\mu/\mu, \lambda)$-ES is preferable to the (1+1)-ES, which does not benefit from the availability of parallel computational resources.
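The progress coefficient and the optimal step-sizes quoted in the list above can be approximated by simulation. The sketch below estimates $c_{\mu/\mu,\lambda}$ by plain Monte Carlo sampling; the function name, the sample size, and the seed are assumptions of this example, and the estimates carry sampling error in the third decimal.

```python
import numpy as np

def c_mu_lambda(mu, lam, samples=200_000, seed=0):
    """Monte Carlo estimate of the mean of the mu largest of lam standard normal variates."""
    rng = np.random.default_rng(seed)
    z = np.sort(rng.standard_normal((samples, lam)), axis=1)
    return float(z[:, -mu:].mean())

optimal_step = {}
optimal_progress_per_offspring = {}
for mu in (1, 2, 3):
    lam = 4 * mu
    c = c_mu_lambda(mu, lam)
    optimal_step[mu] = mu * c                                   # (44.16)
    optimal_progress_per_offspring[mu] = mu * c ** 2 / 2 / lam  # (44.17) divided by lam
```

Exact values could instead be obtained by numerical integration of the order-statistic densities; sampling is used here only because it needs no further machinery.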


Equation (44.15) holds in the limit $n \to \infty$ for any finite value of $\lambda$. In finite but high dimensional search spaces, it can serve as an approximation to the normalized progress rate of the $(\mu/\mu, \lambda)$-ES on sphere functions in the vicinity of the optimal step-size provided that $\lambda$ is not too large. A better approximation for finite $n$ is derived in [44.8, 15] (however compare also [44.56]).

The improved performance of the $(\mu/\mu, \lambda)$-ES for $\mu > 1$ compared to the strategy that uses $\mu = 1$ is a consequence of the factor $\mu$ in the denominator of the term in (44.15) that contributes negatively to the normalized progress rate. The components $z_\odot$ of mutation vectors selected for survival are correlated and likely to point in the direction opposite to the gradient vector. The perpendicular components $z_\ominus$ in the limit $n \to \infty$ have no influence on whether a candidate solution is selected for survival and are thus uncorrelated. The recombinative averaging of mutation vectors results in a length of the $z_\odot$-component similar to those of individual mutation vectors. However, the squared length of the components perpendicular to the gradient direction is reduced by a factor of $\mu$, resulting in the reduction of the negative term in (44.15) by a factor of $\mu$. Beyer [44.15] has coined the term genetic repair for this phenomenon.
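The reduction of the perpendicular component by averaging can be checked with a small simulation. As a simplifying assumption of this sketch, the perpendicular components of the selected mutation vectors are modeled as independent standard normal vectors, which is the uncorrelated limit described above; the dimension, $\mu$, and the number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, trials = 50, 5, 10_000

# Perpendicular components of mu selected mutation vectors, modeled as independent N(0, I).
z_perp = rng.standard_normal((trials, mu, n))

single_sq = (z_perp[:, 0, :] ** 2).sum(axis=1).mean()        # about n
averaged_sq = (z_perp.mean(axis=1) ** 2).sum(axis=1).mean()  # about n / mu
reduction = single_sq / averaged_sq                          # about mu
```

The squared length of the averaged vector is smaller by roughly the factor `mu`, which is the factor appearing in the denominator of the negative term of the progress rate.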

Weighted recombination (compare Algorithms 44.7 and 44.8) can significantly increase the progress rate of $(\mu/\mu, \lambda)$-ES on sphere functions. If $n$ is large, the $k$-th best candidate solution is optimally associated with a weight proportional to the expected value of the $k$-th largest order statistic of a sample of $\lambda$ independent standard normally distributed random numbers. The resulting optimal normalized progress rate per offspring candidate solution for large values of $\lambda$ then approaches a value of 0.5, exceeding that of optimal unweighted recombination by a factor of almost two and a half [44.13]. The weights are symmetric about zero. If only positive weights are employed and $\mu = \lfloor \lambda/2 \rfloor$, the optimal normalized progress rate per offspring with increasing $\lambda$ approaches a value of 0.25. The weights in Algorithms 44.7 and 44.8 closely resemble those positive weights.


$(\mu/\mu, \lambda)$-ES on Noisy Sphere Functions

Noise in the objective function is most commonly modeled as being Gaussian. If evaluation of a candidate solution $x$ yields a noisy objective function value $f(x) + \sigma_\epsilon\, \mathcal{N}(0, 1)$, then inferior candidate solutions will sometimes be selected for survival and superior ones discarded. As a result, progress rates decrease with increasing noise strength $\sigma_\epsilon$. Introducing normalized noise strength $\sigma_\epsilon^* = \sigma_\epsilon n / (2R^2)$, in the limit $n \to \infty$, the normalized progress rate of the $(\mu/\mu, \lambda)$-ES on noisy sphere functions is

$$\varphi^* = \frac{\sigma^* c_{\mu/\mu,\lambda}}{\sqrt{1 + \vartheta^2}} - \frac{\sigma^{*2}}{2\mu} , \qquad (44.18)$$

where $\vartheta = \sigma_\epsilon^* / \sigma^*$ is the noise-to-signal ratio that the strategy operates under [44.67]. Noise does not impact the term that contributes negatively to the strategy's progress. However, it acts to reduce the magnitude of the positive term stemming from the contributions of mutation vectors parallel to the gradient direction. Note that unless the noise scales such that $\sigma_\epsilon^*$ is independent of the location in search space (i.e., the standard deviation of the noise term increases in direct proportion to $f(x)$, such as in a multiplicative noise model with constant noise strength), (44.18) describes progress in single time steps only rather than a rate of convergence.

Figure 44.7 illustrates for different offspring population sizes $\lambda$ how the optimal progress rate per offspring depends on the noise strength. The curves have been obtained from (44.18) for optimal values of $\sigma^*$ and $\mu$. As the averaging of mutation vectors results in a vector of reduced length, increasing $\lambda$ (and $\mu$ along with it) allows the strategy to operate using larger and larger step-sizes. Increasing the step-size reduces the noise-to-signal ratio $\vartheta$ that the strategy operates under and thereby reduces the impact of noise on selection for survival. Through genetic repair, the $(\mu/\mu, \lambda)$-ES thus implicitly implements the rescaling of mutation vectors proposed in [44.2] for the $(1, \lambda)$-ES in the presence of noise. Compare $c_m$ and $\eta_x$ in Algorithms 44.7 and 44.8 that, for values smaller than one, implement the explicit rescaling. It needs to be emphasized though that in finite-dimensional search spaces, the ability to increase $\lambda$ without violating the assumptions made in the derivation of (44.18) is severely limited. Nonetheless, the benefits resulting from genetic repair are significant, and the performance of the $(\mu/\mu, \lambda)$-ES is much more robust in the presence of noise than that of the (1+1)-ES.


Cumulative Step-Size Adaptation
All progress rate results discussed up to this point consider single time steps of the respective ES only. Analyses of the behavior of ES that include some form of step-size adaptation are considerably more difficult. Even for objective functions as simple as sphere functions, the state of the strategy is described by several variables with nonlinear, stochastic dynamics, and simplifying assumptions need to be made in order to arrive at quantitative results.

In the following, we consider the $(\mu/\mu, \lambda)$-ES with cumulative step-size adaptation (Algorithm 44.6 with (44.1) in place of Line 9 for mathematical convenience) and parameters set such that $c_\sigma \to 0$ as $n \to \infty$ and $d = \Theta(1)$. The state of the strategy on noisy sphere functions with $\sigma_\epsilon^* = \mathrm{const}$ (i.e., noise that decreases in strength as the optimal solution is approached) is described by the distance $R$ of the parental centroid from the optimal solution, normalized step-size $\sigma^*$, the length of the search path $s$ parallel to the direction of the gradient vector of the objective function, and that path's overall squared length. After initialization effects have faded, the distribution of the latter three quantities is time invariant. Mean values of the time invariant distribution can be approximated by computing expected values of the variables after a single iteration of the strategy in the limit $n \to \infty$ and imposing the condition that those be equal to the respective values before that iteration. Solving the resulting system of equations for $\sigma_\epsilon^* \le \sqrt{2}\, \mu c_{\mu/\mu,\lambda}$ yields


Fig. 44.7 Optimal normalized progress rate per offspring $\varphi^*/\lambda$ of the $(\mu/\mu, \lambda)$-ES on noisy sphere functions for $n \to \infty$ plotted against the normalized noise strength $\sigma_\epsilon^*$. The solid lines depict results for, from bottom to top, $\lambda = 4, 10, 40, 100, \infty$ and optimally chosen $\mu$. The dashed line represents the optimal progress rate of the (1+1)-ES (after [44.68])


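$$\sigma^* = \mu c_{\mu/\mu,\lambda} \sqrt{2 - \left( \frac{\sigma_\epsilon^*}{\mu c_{\mu/\mu,\lambda}} \right)^2} \qquad (44.19)$$

for the average normalized mutation strength assumed by the strategy [44.69, 70]. The corresponding normalized progress rate

$$\varphi^* = \frac{\sqrt{2} - 1}{2}\, \mu c_{\mu/\mu,\lambda}^2 \left[ 2 - \left( \frac{\sigma_\epsilon^*}{\mu c_{\mu/\mu,\lambda}} \right)^2 \right] \qquad (44.20)$$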

is obtained from (44.18). Both the average mutation strength and the resulting progress rate are plotted against the noise strength in Fig. 44.8. For small noise strengths, cumulative step-size adaptation generates mutation strengths that are larger than optimal. The evolution window continually shifts toward smaller values of the step-size, and adaptation remains behind its target. However, the resulting mutation strengths achieve progress rates within 20% of optimal ones. For large noise strengths, the situation is reversed and the mutation strengths generated by cumulative step-size adaptation are smaller than optimal. However, increasing the population size parameters $\mu$ and $\lambda$ allows shifting the operating regime of the strategy toward the left-hand side of the graphs in Fig. 44.8, where step-sizes are near optimal. As above, it is important to keep in mind the limitations of the results derived in the limit $n \to \infty$. In finite-dimensional search spaces the ability to compensate for large amounts of noise by increasing the population size is more limited than (44.19) and (44.20) suggest.
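The relation between the noisy-sphere progress rate and the mutation strength generated by cumulative step-size adaptation can be checked numerically. In the sketch below the closed form used in `csa_progress_rate` is what results from inserting the adapted mutation strength into the noisy progress rate formula; the function names and the parameter values ($\mu = 3$ with a progress coefficient of about 1.179, noise strength 1.0) are assumptions of this example.

```python
import math

def progress_rate_noisy(sigma_star, noise_star, mu, c):
    """Normalized progress rate on the noisy sphere for n -> infinity, (44.18)."""
    theta = noise_star / sigma_star  # noise-to-signal ratio
    return sigma_star * c / math.sqrt(1 + theta ** 2) - sigma_star ** 2 / (2 * mu)

def csa_sigma_star(noise_star, mu, c):
    """Average normalized mutation strength under cumulative step-size adaptation, (44.19)."""
    return mu * c * math.sqrt(2 - (noise_star / (mu * c)) ** 2)

def csa_progress_rate(noise_star, mu, c):
    """(44.18) evaluated at the mutation strength (44.19), in closed form."""
    return (math.sqrt(2) - 1) / 2 * mu * c ** 2 * (2 - (noise_star / (mu * c)) ** 2)

mu, c, noise_star = 3, 1.179, 1.0   # hypothetical setting for illustration
sigma_csa = csa_sigma_star(noise_star, mu, c)
phi_direct = progress_rate_noisy(sigma_csa, noise_star, mu, c)
phi_closed = csa_progress_rate(noise_star, mu, c)

# Without noise: adapted step-size versus optimal step-size mu*c, and progress versus mu*c^2/2.
step_ratio_no_noise = csa_sigma_star(0.0, mu, c) / (mu * c)
progress_ratio_no_noise = csa_progress_rate(0.0, mu, c) / (mu * c ** 2 / 2)
```

In the noise-free case the adapted step-size is larger than optimal by the factor $\sqrt{2}$, and the progress rate reaches $2(\sqrt{2} - 1) \approx 0.83$ of the optimum, consistent with the statement that progress stays within 20% of optimal.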


Fig. 44.8a,b Normalized mutation strength and normalized progress rate of the $(\mu/\mu, \lambda)$-ES with cumulative step-size
adaptation on noisy sphere functions for $n \to \infty$ plotted against the normalized noise strength. The dashed lines depict
optimal values. (a) Mutation strength $\sigma^*/(\mu c_{\mu/\mu,\lambda})$; (b) progress rate $\varphi^*/(\mu c_{\mu/\mu,\lambda}^2/2)$; horizontal axes: noise strength $\sigma_\epsilon^*/(\mu c_{\mu/\mu,\lambda})$


Parabolic Ridge Functions
A class of test functions that poses difficulties very
different from those encountered in connection with
sphere functions are ridge functions,

$$ f(\mathbf{x}) = x_1 + \xi \left( \sum_{i=2}^{n} x_i^2 \right)^{\alpha/2} = x_1 + \xi R^\alpha , $$

which include the parabolic ridge for $\alpha = 2$. The $x_1$-axis
is referred to as the ridge axis, and $R$ denotes
the distance from that axis. Progress can be made by
minimizing the distance from the ridge axis or by
proceeding along it. The former requires decreasing
step-sizes and is limited in its effect as $R \geq 0$. The latter
allows indefinite progress and requires that the step- 
size does not decrease to zero. Short- and long-term 
goals may thus be conflicting, and inappropriate step- 
size adaptation may lead to stagnation. 

As an optimal solution to the ridge problem does 
not exist, the progress rate $\varphi$ of the $(\mu/\mu, \lambda)$-ES on
ridge functions is defined as the expectation of the step 
made in the direction of the negative ridge axis. For con- 
stant step-size, the distance R of the parental centroid 
from the ridge axis assumes a time-invariant limit dis- 
tribution. An approximation to the mean value of that 
distribution can be obtained by identifying that value 
of R for which the expected change is zero. Using this 
value yields 

$$ \varphi = \frac{2 \mu c_{\mu/\mu,\lambda}^2}{n \xi \left( 1 + \sqrt{1 + \left( \dfrac{2 \mu c_{\mu/\mu,\lambda}}{n \xi \sigma} \right)^2 } \right)} \tag{44.21} $$

for the progress rate of the $(\mu/\mu, \lambda)$-ES on parabolic
ridge functions [44.71]. The strictly monotonic behavior
of the progress rate, increasing from a value of
zero for $\sigma = 0$ to $\varphi = \mu c_{\mu/\mu,\lambda}^2/(n \xi)$ for $\sigma \to \infty$, is
fundamentally different from that observed on sphere
functions. However, the derivative of the progress rate
with regard to the step-size for large values of $\sigma$ tends to
zero. The limited time horizon of any search as well as 
the intent of using ridge functions as local rather than 
global models of practically relevant objective func- 
tions both suggest that it may be unwise to increase the 
step-size without bounds. 

The performance of cumulative step-size adaptation 
on parabolic ridge functions can be studied using the 
same approach as described above for sphere functions, 
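yielding

$$ \sigma = \frac{\mu c_{\mu/\mu,\lambda}}{\sqrt{2}\, n \xi} \tag{44.22} $$

for the (finite) average mutation strength [44.72].
From (44.21), the corresponding progress rate

$$ \varphi = \frac{\mu c_{\mu/\mu,\lambda}^2}{2 n \xi} \tag{44.23} $$
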


is greater than half of the progress rate attained with any 
finite step size. 
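
The relation between (44.21), (44.22), and (44.23) can be checked numerically. The sketch below is only an illustration: the progress coefficient $c_{\mu/\mu,\lambda}$ is passed in as a plain number `c`, and the parameter values are hypothetical. It evaluates the progress rate on the parabolic ridge at the step-size generated by cumulative step-size adaptation and compares it with the limit value $\mu c_{\mu/\mu,\lambda}^2/(n\xi)$ approached for $\sigma \to \infty$.

```python
import math

def ridge_progress_rate(sigma, mu, c, n, xi):
    """Progress rate (44.21) of the (mu/mu, lambda)-ES on the parabolic ridge."""
    if sigma <= 0.0:
        return 0.0  # no progress without mutation
    ratio = 2.0 * mu * c / (n * xi * sigma)
    return 2.0 * mu * c ** 2 / (n * xi * (1.0 + math.sqrt(1.0 + ratio ** 2)))

def csa_step_size(mu, c, n, xi):
    """Average mutation strength (44.22) under cumulative step-size adaptation."""
    return mu * c / (math.sqrt(2.0) * n * xi)

# Hypothetical parameter values, chosen only for illustration.
mu, c, n, xi = 3, 1.0, 40, 1.0
limit = mu * c ** 2 / (n * xi)  # progress rate approached for sigma -> infinity
phi_csa = ridge_progress_rate(csa_step_size(mu, c, n, xi), mu, c, n, xi)
print(f"phi_csa / limit = {phi_csa / limit:.6f}")
```

Inserting (44.22) into (44.21) makes the square root equal to 3, so the ratio is one half independently of the parameter values. This is the content of (44.23).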


Cigar Functions 
While parabolic ridge functions provide an environment
for evaluating whether step-size adaptation mechanisms
are able to avoid stagnation, the ability to
make continual meaningful positive progress with some
constant nonzero step-size is, of course, atypical for 
practical optimization problems. A class of ridge-like 
functions that requires continual adaptation of the mu- 
tation strength and is thus a more realistic model of 
problems requiring ridge following are cigar functions: 


$$ f(\mathbf{x}) = x_1^2 + \xi \sum_{i=2}^{n} x_i^2 = x_1^2 + \xi R^2 , $$

with parameter $\xi \geq 1$ being the condition number of
the Hessian matrix. Small values of $\xi$ result in sphere-like
characteristics, large values in ridge-like ones. As
above, $R$ measures the distance from the $x_1$-axis.

Assuming successful adaptation of the step-size, 
ES exhibit linear convergence on cigar functions. The 
expected relative per iteration change in the objective 
function value of the population centroid is referred to 
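as the quality gain $\Delta$ and determines the rate of convergence.
In the limit $n \to \infty$ it is described by

$$ \Delta^* = \begin{cases} \dfrac{\sigma^{*2}}{2\mu(\xi - 1)} & \text{if } \sigma^* < 2 \mu c_{\mu/\mu,\lambda} \dfrac{\xi - 1}{\xi} \\[2ex] c_{\mu/\mu,\lambda}\, \sigma^* - \dfrac{\sigma^{*2}}{2\mu} & \text{otherwise} , \end{cases} $$

where $\sigma^* = \sigma n / R$ and $\Delta^* = \Delta n / 2$ [44.73]. That relationship
is illustrated in Fig. 44.9 for several values
of the conditioning parameter. The parabola for $\xi = 1$
reflects the simple quadratic relationship for sphere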
functions seen in (44.15). (For the case of sphere func- 
tions, normalized progress rate and normalized quality 
gain are the same.) For cigar functions with large values 
of $\xi$, two separate regimes can be identified. For small
step-sizes, the quality gain of the strategy is limited by
the size of the steps that can be made in the direction of
the $x_1$-axis. The $x_1$-component of the population centroid
virtually never changes sign. The search process
resembles one of ridge following, and we refer to the
regime as the ridge regime. In the other regime, the
step-size is such that the quality gain of the strategy is
effectively limited by the ability to approach the optimal
solution in the subspace spanned by the $x_2, \ldots, x_n$-axes.
The $x_1$-component of the population centroid changes
sign much more frequently than in the ridge regime, 
as is the case on sphere functions. We thus refer to the 
regime as the sphere regime. 
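
The two regimes can be made concrete with a short sketch of the piecewise quality gain. The code assumes the $n \to \infty$ expression from [44.73], namely $\Delta^* = \sigma^{*2}/(2\mu(\xi-1))$ for $\sigma^* < 2\mu c_{\mu/\mu,\lambda}(\xi-1)/\xi$ and $\Delta^* = c_{\mu/\mu,\lambda}\sigma^* - \sigma^{*2}/(2\mu)$ otherwise. The progress coefficient is passed as a plain number `c`, and all parameter values are hypothetical.

```python
def cigar_quality_gain(sigma, mu, c, xi):
    """Normalized quality gain of the (mu/mu, lambda)-ES on the cigar (n -> infinity).

    sigma is the normalized mutation strength, xi >= 1 the condition number.
    """
    threshold = 2.0 * mu * c * (xi - 1.0) / xi
    if sigma < threshold:
        # Ridge regime: limited by the steps possible along the x_1-axis.
        return sigma ** 2 / (2.0 * mu * (xi - 1.0))
    # Sphere regime: same parabola as for sphere functions.
    return c * sigma - sigma ** 2 / (2.0 * mu)

# Hypothetical values for illustration.
mu, c = 3, 1.2
for xi in (1.0, 4.0, 100.0):
    boundary = 2.0 * mu * c * (xi - 1.0) / xi
    gain = cigar_quality_gain(boundary, mu, c, xi)
    print(f"xi={xi:5.1f}  regime boundary={boundary:.3f}  gain at boundary={gain:.4f}")
```

For $\xi = 1$ the boundary lies at zero, so only the sphere branch is ever used. For larger $\xi$ the two branches meet continuously at the boundary.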

Fig. 44.9 Normalized quality gain of $(\mu/\mu, \lambda)$-ES on
cigar functions for $n \to \infty$ plotted against the normalized
mutation strength for $\xi \in \{1, 4, 100\}$. The vertical line
represents the average normalized mutation strength generated
by cumulative step-size adaptation. Axes: quality gain $\Delta^*/(\mu c_{\mu/\mu,\lambda}^2/2)$ versus mutation strength $\sigma^*/(\mu c_{\mu/\mu,\lambda})$


The approach to the analysis of the behavior of
cumulative step-size adaptation explained above for
sphere and parabolic ridge functions can be applied to
cigar functions as well, yielding

$$ \sigma^* = \sqrt{2}\, \mu c_{\mu/\mu,\lambda} $$



for the average normalized mutation strength generated 
by cumulative step-size adaptation [44.73]. The corresponding
normalized quality gain is

$$ \Delta^* = \begin{cases} (\sqrt{2} - 1)\, \mu c_{\mu/\mu,\lambda}^2 & \text{if } \xi < \dfrac{\sqrt{2}}{\sqrt{2} - 1} \\[2ex] \dfrac{\mu c_{\mu/\mu,\lambda}^2}{\xi - 1} & \text{otherwise} . \end{cases} $$

Both are compared with optimal values in Fig. 44.10.
For small condition numbers, $(\mu/\mu, \lambda)$-ES operate in
the sphere regime and are within 20% of the optimal
quality gain as seen earlier. For large condition
numbers, the strategy operates in the ridge regime and
achieves a quality gain within a factor of 2 of the
optimal one, in accordance with the findings for the
parabolic ridge above.
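
The "within 20%" and "factor of 2" statements can be reproduced with a few lines of Python. This is a sketch under stated assumptions: all quality gains are expressed in units of $\mu c_{\mu/\mu,\lambda}^2$, the realized value uses the piecewise expression for cumulative step-size adaptation given above, and the closed form for the optimal value is derived here by maximizing the $n \to \infty$ quality gain over $\sigma^*$ rather than quoted from the text.

```python
import math

def csa_quality_gain(xi):
    """Quality gain realized by cumulative step-size adaptation, in units of mu*c^2."""
    if xi < math.sqrt(2.0) / (math.sqrt(2.0) - 1.0):
        return math.sqrt(2.0) - 1.0   # sphere regime
    return 1.0 / (xi - 1.0)           # ridge regime

def optimal_quality_gain(xi):
    """Largest attainable quality gain, in units of mu*c^2.

    Assumption of this sketch: for xi <= 2 the maximum of the sphere-regime
    parabola is admissible (value 1/2); for xi > 2 the maximum lies at the
    boundary between the regimes (value 2*(xi - 1)/xi^2).
    """
    if xi <= 2.0:
        return 0.5
    return 2.0 * (xi - 1.0) / xi ** 2

for xi in (1.0, 2.0, 10.0, 100.0):
    ratio = csa_quality_gain(xi) / optimal_quality_gain(xi)
    print(f"xi={xi:6.1f}  realized/optimal={ratio:.3f}")
```

For $\xi = 1$ the ratio is $2(\sqrt{2}-1) \approx 0.83$, and for large $\xi$ it tends to one half.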


Fig. 44.10a,b Normalized mutation strength and normalized quality gain of the $(\mu/\mu, \lambda)$-ES with cumulative step-size
adaptation on cigar functions for $n \to \infty$ plotted against the condition number of the cigar. The dashed curves represent
optimal values, the solid curves realized values. (a) Mutation strength $\sigma^*/(\mu c_{\mu/\mu,\lambda})$; (b) quality gain $\Delta^*/(\mu c_{\mu/\mu,\lambda}^2/2)$; horizontal axes: condition number $\xi$


Further Work
Further research regarding the progress rate of ES in
different test environments includes work analyzing the
behavior of mutative self-adaptation for linear [44.22],
spherical [44.74], and ridge functions [44.75]. Hierarchically
organized ES have been studied when applied
to both parabolic ridge and sphere functions [44.76, 
77]. Several step-size adaptation techniques have been 
compared for ridge functions, including, but not lim- 
ited to, parabolic ones [44.78]. A further class of convex 
quadratic functions for which quality gain results have 
been derived is characterized by the occurrence of only 
two distinct eigenvalues of the Hessian, both of which 
occur with high multiplicity [44.79, 80]. 

An analytical investigation of the behavior of the 
(1+1)-ES on noisy sphere functions finds that failure 
to re-evaluate the parental candidate solution results in 
the systematic overvaluation of the parent and thus in 
potentially long periods of stagnation [44.68]. Contrary 
to what might be expected, the increased difficulty of 
replacing parental candidate solutions can have a pos- 
itive effect on progress rates as it tends to prevent the 
selection for survival of offspring candidate solutions 
solely due to favorable noise values. The convergence 
behavior of the (1+1)-ES on finite-dimensional sphere 
functions is studied by Jebalia et al. [44.81] who show 
that the additive noise model is inappropriate in fi- 
nite dimensions unless the parental candidate solution 
is re-evaluated, and who suggest a multiplicative noise 
model instead. An analysis of the behavior of $(\mu, \lambda)$-ES
(without recombination) for noisy sphere functions
finds that in contrast to the situation in the absence of
noise, strategies with $\mu > 1$ can outperform $(1, \lambda)$-ES
if there is noise present [44.82]. The use of nonsingleton
populations increases the signal-to-noise ratio and
thus allows for more effective selection of good candidate
solutions. The effects of non-Gaussian forms of
noise on the performance of $(\mu/\mu, \lambda)$-ES applied to the
optimization of sphere functions have also been inves- 
tigated [44.83]. 

Finally, there are some results regarding the 
optimization of time-varying objectives [44.84] as 
well as analyses of simple constraint handling tech- 
niques [44.85—87]. 


44.4.3 Convergence Proofs 


In the previous section, we have described theoretical 
results that involve approximations in their derivation 
and consider the limit for $n \to \infty$. In this section, exact
results are discussed. 

Convergence proofs with only mild assumptions on 
the objective function are easy to obtain for ES with 
a step-size that is effectively bounded from below and 
above (and, for nonelitist strategies, when addition- 
ally the search space is bounded) [44.12, 64]. In this 
case, the expected runtime to reach an $\epsilon$-ball around
the global optimum (see also Definition 44.3) cannot
be faster than $\propto 1/\epsilon^n$, as obtained with pure random
search for $\epsilon \to 0$ or $n \to \infty$. If the mutation distribution
is not normal and exhibits a singularity in zero, conver- 
gence can be much faster than with random search even 
when the step-size is bounded away from zero [44.88]. 
Similarly, convergence proofs can be obtained for adaptive
strategies that include provisions for using a fixed
step-size and covariance matrix with some constant 
probability. 

Convergence proofs for strategy variants that do 
not explicitly ensure that long steps are sampled for 
a sufficiently long time typically require much stronger 
restrictions on the set of objective functions that they 
hold for. Such proofs, however, have the potential to 
reveal much faster, namely linear convergence. Evolution
strategies with the artificial distance proportional
step-size, $\sigma = \mathrm{const} \times \|x\|$, exhibit, as shown above,
linear convergence on the sphere function with an associated
runtime proportional to $\log(1/\epsilon)$ [44.63, 65,
81, 89]. This result can be easily proved by using a law
of large numbers, because $\|x^{(t+1)}\| / \|x^{(t)}\|$ are independent
and identically distributed for all $t$.
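
Linear convergence with a distance-proportional step-size is easy to observe empirically. The following sketch runs a (1+1)-ES on the sphere function with $\sigma = (\texttt{sigma\_factor}/n)\,\|x\|$. The dimension, the constant, the iteration budget, and the random seed are arbitrary choices made for illustration, not values taken from the cited proofs.

```python
import math
import random

def scale_invariant_es(n=10, iterations=2000, sigma_factor=1.2, seed=1):
    """(1+1)-ES on f(x) = ||x||^2 with step-size proportional to ||x||.

    Returns the distance to the optimum after every iteration.
    """
    rng = random.Random(seed)
    x = [1.0] * n
    norms = [math.sqrt(sum(v * v for v in x))]
    for _ in range(iterations):
        sigma = sigma_factor / n * norms[-1]       # artificial step-size rule
        y = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        y_norm = math.sqrt(sum(v * v for v in y))
        if y_norm <= norms[-1]:                    # elitist (plus) selection
            x = y
            norms.append(y_norm)
        else:
            norms.append(norms[-1])
    return norms

norms = scale_invariant_es()
rate = (math.log(norms[-1]) - math.log(norms[0])) / (len(norms) - 1)
print(f"average change of log-distance per iteration: {rate:.4f}")
```

Because the ratio of successive distances has the same distribution in every iteration, the logarithm of the distance decreases on average by a constant amount per iteration. Plotted against the iteration count, it follows a straight line.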

Without the artificial choice of step-size, $\sigma/\|x\|$
becomes a random variable. If this random variable
is a homogeneous Markov chain and stable enough
to satisfy the law of large numbers, linear convergence
is maintained [44.64, 89]. The stability of the
Markov chain associated with the self-adaptive $(1, \lambda)$-ES
on the sphere function has been shown in dimension
$n = 1$ [44.90], thus providing a proof of linear convergence
of this algorithm. The extension of this proof to
higher dimensions is straightforward.

Proofs that are formalized by upper bounds on
the time to reduce the distance to the optimum by
a given factor can also capture the linear dependency
of the convergence rate on the dimension $n$. The
$(1 + \lambda)$- and the $(1, \lambda)$-ES with common variants of the
1/5th success rule converge linearly on the sphere function
with a runtime of $O(n \log(1/\epsilon)\, \lambda / \sqrt{\log \lambda})$ [44.62,
91]. When $\lambda$ is smaller than $O(n)$, the $(1 + \lambda)$-ES
with a modified success rule is even $\sqrt{\log \lambda}$ times
faster and therefore matches the general lower runtime
bound $\Omega(n \log(1/\epsilon)\, \lambda / \log(\lambda))$ [44.62, Theorem 5].
On convex-quadratic functions, the asymptotic
runtime of the (1+1)-ES is the same as on
the sphere function and, at least in some cases,
proportional to the condition number of the problem
[44.92].

Convergence proofs of modern ES with recombination,
of CSA-ES, CMA-ES, or xNES are not yet
available; however, we believe that some of them are
likely to be achieved in the coming decade.


References


44.1 I. Rechenberg: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (Frommann-Holzboog, Stuttgart 1973)

44.2 I. Rechenberg: Evolutionsstrategie '94 (Frommann-Holzboog, Stuttgart 1994)

44.3 H.-P. Schwefel: Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie (Birkhäuser, Basel 1977)

44.4 H.-P. Schwefel: Evolution and Optimum Seeking (Wiley, New York 1995)

44.5 L.J. Fogel, A.J. Owens, M.J. Walsh: Artificial Intelligence through Simulated Evolution (Wiley, New York 1966)

44.6 H.-G. Beyer, H.-P. Schwefel: Evolution strategies - A comprehensive introduction, Nat. Comp. 1(1), 3-52 (2002)

44.7 D.B. Fogel: The Fossil Record (Wiley, New York 1998)

44.8 H.-G. Beyer: The Theory of Evolution Strategies (Springer, Berlin, Heidelberg 2001)

44.9 D.E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning (Addison Wesley, Reading 1989)

44.10 N. Hansen, A. Ostermeier: Completely derandomized self-adaptation in evolution strategies, Evol. Comp. 9(2), 159-195 (2001)



44.11 H.-P. Schwefel, G. Rudolph: Contemporary evolution strategies. In: Advances Artificial Life, ed. by F. Morán, A. Moreno, J.J. Merelo, P. Chacón (Springer, Berlin, Heidelberg 1995) pp. 891-907

44.12 G. Rudolph: Convergence Properties of Evolutionary Algorithms (Dr. Kovač, Hamburg 1997)

44.13 D.V. Arnold: Weighted multirecombination evolution strategies, Theor. Comp. Sci. 361(1), 18-37 (2006)

44.14 H. Mühlenbein, D. Schlierkamp-Voosen: Predictive models for the breeder genetic algorithm I. Continuous parameter optimization, Evol. Comp. 1(1), 25-49 (1993)

44.15 H.-G. Beyer: Toward a theory of evolution strategies: On the benefits of sex - The (μ/μ, λ) theory, Evol. Comp. 3(1), 81-111 (1995)

44.16 C. Kappler: Are evolutionary algorithms improved by large mutations?, Lect. Notes Comput. Sci. 1141, 346-355 (1996)

44.17 G. Rudolph: Local convergence rates of simple evolutionary algorithms with Cauchy mutations, IEEE Trans. Evol. Comp. 1(4), 249-258 (1997)

44.18 X. Yao, Y. Liu, G. Lin: Evolutionary programming made faster, IEEE Trans. Evol. Comp. 3(2), 82-102 (1999)

44.19 M. Herdy: The number of offspring as strategy parameter in hierarchically organized evolution strategies, ACM SIGBIO Newsl. 13(2), 2-9 (1993)

44.20 M. Schumer, K. Steiglitz: Adaptive step size random search, IEEE Trans. Autom. Control 13(3), 270-276 (1968)

44.21 S. Kern, S.D. Müller, N. Hansen, D. Büche, J. Ocenasek, P. Koumoutsakos: Learning probability distributions in continuous evolutionary algorithms - A comparative review, Nat. Comput. 3(1), 77-112 (2004)

44.22 N. Hansen: An analysis of mutative σ-self-adaptation on linear fitness functions, Evol. Comp. 14(3), 255-275 (2006)

44.23 A. Ostermeier, A. Gawelczyk, N. Hansen: A derandomized approach to self-adaptation of evolution strategies, Evol. Comp. 2(4), 369-380 (1994)

44.24 T. Runarsson: Reducing random fluctuations in mutative self-adaptation, Lect. Notes Comput. Sci. 2439, 194-203 (2002)

44.25 A. Ostermeier, A. Gawelczyk, N. Hansen: Step-size adaptation based on non-local use of selection information, Lect. Notes Comput. Sci. 866, 189-198 (1994)

44.26 N. Hansen, A. Ostermeier, A. Gawelczyk: On the adaptation of arbitrary normal mutation distributions in evolution strategies: The generating set adaptation, Int. Conf. Gen. Algorith., ed. by L.J. Eshelman (Morgan Kaufmann, San Francisco 1995) pp. 57-64

44.27 N. Hansen, A. Ostermeier: Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation, IEEE Int. Conf. Evol. Comp. (1996) pp. 312-317

44.28 A. Ostermeier, N. Hansen: An evolution strategy with coordinate system invariant adaptation of arbitrary normal mutation distributions within the concept of mutative strategy parameter control, Proc. Genet. Evol. Comput. Conf. (1999) pp. 902-909

44.29 D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber: Natural evolution strategies, IEEE Cong. Evol. Comp. (CEC 2008) (2008) pp. 3381-3387

44.30 Y. Sun, D. Wierstra, T. Schaul, J. Schmidhuber: Efficient natural evolution strategies, Proc. Genet. Evol. Comput. Conf. (2009) pp. 539-546

44.31 T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, J. Schmidhuber: Exponential natural evolution strategies, Proc. Genet. Evol. Comput. Conf. (2010) pp. 393-400

44.32 N. Hansen: Invariance, self-adaptation and correlated mutations and evolution strategies, Lect. Notes Comput. Sci. 1917, 355-364 (2000)

44.33 A. Ostermeier: An evolution strategy with momentum adaptation of the random number distribution. In: Proc. 2nd Conf. Parallel Probl. Solving Nat., ed. by R. Männer, B. Manderick (North-Holland, Amsterdam 1992) pp. 199-208


44.34 J. Poland, A. Zell: Main vector adaptation: A CMA variant with linear time and space complexity, Proc. Genet. Evol. Comput. Conf. (2001) pp. 1050-1055

44.35 J.N. Knight, M. Lunacek: Reducing the space-time complexity of the CMA-ES, Proc. Genet. Evol. Comput. Conf. (2007) pp. 658-665

44.36 N. Hansen, S. Kern: Evaluating the CMA evolution strategy on multimodal test functions, Lect. Notes Comput. Sci. 3242, 282-291 (2004)

44.37 H.-G. Beyer, B. Sendhoff: Covariance matrix adaptation revisited - The CMSA evolution strategy, Lect. Notes Comput. Sci. 3199, 123-132 (2008)

44.38 G.A. Jastrebski, D.V. Arnold: Improving evolution strategies through active covariance matrix adaptation, IEEE Cong. Evol. Comp. (CEC 2006) (2006) pp. 2814-2821

44.39 H.-G. Beyer: Mutate large, but inherit small! On the analysis of rescaled mutations in (1, λ)-ES with noisy fitness data, Lect. Notes Comput. Sci. 1498, 109-118 (1998)

44.40 S.I. Amari: Natural gradient works efficiently in learning, Neural Comput. 10(2), 251-276 (1998)

44.41 Y. Sun, D. Wierstra, T. Schaul, J. Schmidhuber: Stochastic search using the natural gradient, Int. Conf. Mach. Learn., ed. by A.P. Danyluk, L. Bottou, M.L. Littman (2009) pp. 1161-1168

44.42 Y. Akimoto, Y. Nagata, I. Ono, S. Kobayashi: Bidirectional relation between CMA evolution strategies and natural evolution strategies, Lect. Notes Comput. Sci. 6238, 154-163 (2010)

44.43 L. Arnold, A. Auger, N. Hansen, Y. Ollivier: Information-geometric optimization algorithms: A unifying picture via invariance principles, ArXiv e-prints (2011), DOI arXiv:1106.3708

44.44 M. Pelikan, M.W. Hauschild, F.G. Lobo: Introduction to estimation of distribution algorithms. MEDAL Rep. No. 2012003 (University of Missouri, St. Louis 2012)

44.45 G.R. Harik, F.G. Lobo: A parameter-less genetic algorithm, Proc. Genet. Evol. Comput. Conf. (1999) pp. 258-265

44.46 F.G. Lobo, D.E. Goldberg: The parameter-less genetic algorithm in practice, Inf. Sci. 167(1), 217-232 (2004)

44.47 A. Auger, N. Hansen: A restart CMA evolution strategy with increasing population size, IEEE Cong. Evol. Comp. (CEC 2005) (2005) pp. 1769-1776

44.48 T. Suttorp, N. Hansen, C. Igel: Efficient covariance matrix update for variable metric evolution strategies, Mach. Learn. 75(2), 167-197 (2009)

44.49 Z. Michalewicz, M. Schoenauer: Evolutionary algorithms for constrained parameter optimization problems, Evol. Comp. 4(1), 1-32 (1996)

44.50 E. Mezura-Montes, C.A. Coello Coello: Constraint-handling in nature-inspired numerical optimization: Past, present, and future, Swarm Evol. Comp. 1(4), 173-194 (2011)


44.51 M. Emmerich, A. Giotis, M. Özdemir, T. Bäck, K. Giannakoglou: Metamodel-assisted evolution strategies, Lect. Notes Comput. Sci. 2439, 361-370 (2002)

44.52 C. Igel, N. Hansen, S. Roth: Covariance matrix adaptation for multi-objective optimization, Evol. Comp. 15(1), 1-28 (2007)

44.53 N. Hansen, T. Voß, C. Igel: Improved step size adaptation for the MO-CMA-ES, Proc. Genet. Evol. Comput. Conf. (2010) pp. 487-494

44.54 R. Salomon: Evolutionary algorithms and gradient search: Similarities and differences, IEEE Trans. Evol. Comp. 2(2), 45-55 (1998)

44.55 G. Rudolph: Self-adaptive mutations may lead to premature convergence, IEEE Trans. Evol. Comp. 5(4), 410-414 (2001)

44.56 A. Auger, N. Hansen: Reconsidering the progress rate theory for evolution strategies in finite dimensions, Proc. Genet. Evol. Comput. Conf. (2006) pp. 445-452

44.57 O. Teytaud, S. Gelly: General lower bounds for evolutionary algorithms, Lect. Notes Comput. Sci. 4193, 21-31 (2006)

44.58 H. Fournier, O. Teytaud: Lower bounds for comparison based evolution strategies using VC-dimension and sign patterns, Algorithmica 59(3), 387-408 (2011)

44.59 O. Teytaud: Lower bounds for evolution strategies. In: Theory of Randomized Search Heuristics: Foundations and Recent Developments, ed. by A. Auger, B. Doerr (World Scientific Publ., Singapore 2011) pp. 327-354

44.60 J. Jägersküpper: Lower bounds for hit-and-run direct search, Lect. Notes Comput. Sci. 4665, 118-129 (2007)

44.61 J. Jägersküpper: Lower bounds for randomized direct search with isotropic sampling, Oper. Res. Lett. 36(3), 327-332 (2008)

44.62 J. Jägersküpper: Probabilistic runtime analysis of (1+λ) evolution strategies using isotropic mutations, Proc. Genet. Evol. Comput. Conf. (2006) pp. 461-468

44.63 M. Jebalia, A. Auger, P. Liardet: Log-linear convergence and optimal bounds for the (1+1)-ES, Lect. Notes Comput. Sci. 4926, 207-218 (2008)

44.64 A. Auger, N. Hansen: Theory of evolution strategies: A new perspective. In: Theory of Randomized Search Heuristics: Foundations and Recent Developments, ed. by A. Auger, B. Doerr (World Scientific Publ., Singapore 2011) pp. 289-325

44.65 A. Auger, D. Brockhoff, N. Hansen: Analyzing the impact of mirrored sampling and sequential selection in elitist evolution strategies, Proc. 11th Workshop Found. Gen. Algorith. (2011) pp. 127-138

44.66 A. Auger, D. Brockhoff, N. Hansen: Mirrored sampling in evolution strategies with weighted recombination, Proc. Genet. Evol. Comput. Conf. (2011) pp. 861-868


44.67 D.V. Arnold, H.-G. Beyer: Local performance of the (μ/μ, λ)-ES in a noisy environment, Proc. 11th Workshop Found. Gen. Algorith. (2001) pp. 127-141

44.68 D.V. Arnold, H.-G. Beyer: Local performance of the (1+1)-ES in a noisy environment, IEEE Trans. Evol. Comp. 6(1), 30-41 (2002)

44.69 D.V. Arnold: Noisy Optimization with Evolution Strategies (Kluwer Academic, Boston 2002)

44.70 D.V. Arnold, H.-G. Beyer: Performance analysis of evolutionary optimization with cumulative step length adaptation, IEEE Trans. Autom. Control 49(4), 617-622 (2004)

44.71 A.I. Oyman, H.-G. Beyer: Analysis of the (μ/μ, λ)-ES on the parabolic ridge, Evol. Comp. 8(3), 267-289 (2000)

44.72 D.V. Arnold, H.-G. Beyer: Evolution strategies with cumulative step length adaptation on the noisy parabolic ridge, Nat. Comput. 7(4), 555-587 (2008)

44.73 D.V. Arnold, H.-G. Beyer: On the behaviour of evolution strategies optimising cigar functions, Evol. Comp. 18(4), 661-682 (2010)

44.74 H.-G. Beyer: Towards a theory of evolution strategies: Self-adaptation, Evol. Comp. 3, 3 (1995)

44.75 S. Meyer-Nieberg, H.-G. Beyer: Mutative self-adaptation on the sharp and parabolic ridge, Lect. Notes Comput. Sci. 4436, 70-96 (2007)

44.76 D.V. Arnold, A. MacLeod: Hierarchically organised evolution strategies on the parabolic ridge, Proc. Genet. Evol. Comput. Conf. (2006) pp. 437-444

44.77 H.-G. Beyer, M. Dobler, C. Hämmerle, P. Masser: On strategy parameter control by meta-ES, Proc. Genet. Evol. Comput. Conf. (2009) pp. 499-506

44.78 D.V. Arnold, A. MacLeod: Step length adaptation on ridge functions, Evol. Comp. 16(2), 151-184 (2008)

44.79 D.V. Arnold: On the use of evolution strategies for optimising certain positive definite quadratic forms, Proc. Genet. Evol. Comput. Conf. (2007) pp. 634-641

44.80 H.-G. Beyer, S. Finck: Performance of the (μ/μ_I, λ)-σSA-ES on a class of PDQFs, IEEE Trans. Evol. Comp. 14(3), 400-418 (2010)

44.81 M. Jebalia, A. Auger, N. Hansen: Log-linear convergence and divergence of the scale-invariant (1+1)-ES in noisy environments, Algorithmica 59(3), 425-460 (2011)

44.82 D.V. Arnold, H.-G. Beyer: On the benefits of populations for noisy optimization, Evol. Comp. 11(2), 111-127 (2003)

44.83 D.V. Arnold, H.-G. Beyer: A general noise model and its effects on evolution strategy performance, IEEE Trans. Evol. Comp. 10(4), 380-391 (2006)

44.84 D.V. Arnold, H.-G. Beyer: Optimum tracking with evolution strategies, Evol. Comp. 14(3), 291-308 (2006)

44.85 D.V. Arnold, D. Brauer: On the behaviour of the (1+1)-ES for a simple constrained problem, Lect. Notes Comput. Sci. 5199, 1-10 (2008)


44.86 D.V. Arnold: On the behaviour of the (1, λ)-ES for a simple constrained problem, Lect. Notes Comput. Sci. 5199, 15-24 (2011)

44.87 D.V. Arnold: Analysis of a repair mechanism for the (1, λ)-ES applied to a simple constrained problem, Proc. Genet. Evol. Comput. Conf. (2011) pp. 853-860

44.88 A.A. Zhigljavsky: Theory of Global Random Search (Kluwer Academic, Boston 1991)

44.89 A. Bienvenüe, O. François: Global convergence for evolution strategies in spherical problems: Some simple proofs and difficulties, Theor. Comp. Sci. 306(1-3), 269-289 (2003)

44.90 A. Auger: Convergence results for (1, λ)-SA-ES using the theory of φ-irreducible Markov chains, Theor. Comp. Sci. 334(1-3), 35-69 (2005)

44.91 J. Jägersküpper: Algorithmic analysis of a basic evolutionary algorithm for continuous optimization, Theor. Comp. Sci. 379(3), 329-347 (2007)

44.92 J. Jägersküpper: How the (1+1) ES using isotropic mutations minimizes positive definite quadratic forms, Theor. Comp. Sci. 361(1), 38-56 (2006)


45. Estimation of Distribution Algorithms 


Martin Pelikan, Mark W. Hauschild, Fernando G. Lobo 


Estimation of distribution algorithms (EDAs) guide 
the search for the optimum by building and sam- 
pling explicit probabilistic models of promising 
candidate solutions. However, EDAs are not only 
optimization techniques; besides the optimum or 
its approximation, EDAs provide practitioners with 
a series of probabilistic models that reveal a lot 
of information about the problem being solved. 
This information can in turn be used to design 
problem-specific neighborhood operators for lo- 
cal search, to bias future runs of EDAs on similar 
problems, or to create an efficient computational 
model of the problem. This chapter provides an 
introduction to EDAs as well as a number of point- 
ers for obtaining more information about this class 
of algorithms. 


45.1 Basic EDA Procedure
   45.1.1 Problem Definition
   45.1.2 EDA Procedure
   45.1.3 Simulation of an EDA by Hand
45.2 Taxonomy of EDA Models
   45.2.1 Classification Based on Problem Decomposition
   45.2.2 Classification Based on Local Distributions in Graphical Models
45.3 Overview of EDAs
   45.3.1 EDAs for Fixed-Length Strings over Finite Alphabets
   45.3.2 EDAs for Real-Valued Vectors
   45.3.3 EDAs for Genetic Programming
   45.3.4 EDAs for Permutation Problems
45.4 EDA Theory
45.5 Efficiency Enhancement Techniques for EDAs
   45.5.1 Parallelization
   45.5.2 Hybridization
   45.5.3 Time Continuation
   45.5.4 Using Prior Knowledge and Learning from Experience
   45.5.5 Fitness Evaluation Relaxation
   45.5.6 Incremental and Sporadic Model Building
45.6 Starting Points for Obtaining Additional Information
   45.6.1 Introductory Books and Tutorials
   45.6.2 Software
   45.6.3 Journals
   45.6.4 Conferences
45.7 Summary and Conclusions
References


Estimation of distribution algorithms (EDAs) [45.1-8],
also called probabilistic model-building genetic algorithms
(PMBGAs) and iterated density estimation
evolutionary algorithms (IDEAs), view optimization as
a series of incremental updates of a probabilistic model,
starting with the model encoding the uniform distribution
over admissible solutions and ending with the
model that generates only the global optima. In the
past decade and a half, EDAs have been applied to
many challenging optimization problems [45.9-21]. In
many of these studies, EDAs were shown to solve
problems that were intractable with other techniques
or for which no other technique could achieve comparable
results. However, the motive for the use of EDAs in
practice is not only that these algorithms can solve
difficult optimization problems, but that in addition to
the optimum or its approximation EDAs provide practitioners
with a compact computational model of the
problem represented by a series of probabilistic models
[45.22-24]. These probabilistic models reveal a lot
of information about the problem domain, which can
in turn be used to bias optimization of similar problems,
create problem-specific neighborhood operators,
and perform many other tasks. While many metaheuristics exist
that essentially sample implicit probability distributions
by using a combination of stochastic search operators,
the insight into the problem represented by a series
of explicit probabilistic models of promising candidate
solutions gives EDAs an edge over most other
metaheuristics.

This chapter provides an introduction to EDAs. Additionally,
the chapter presents numerous pointers for
obtaining additional information about this class of
algorithms.


45.1 Basic EDA Procedure 
45.1.1 Problem Definition 


An optimization problem may be defined by specify- 
ing (1) a set of potential solutions to the problem and 
(2) a procedure for evaluating the quality of these so- 
lutions. The set of potential solutions is often defined 
using a general representation of admissible solutions 
and a set of constraints. The procedure for evaluating 
the quality of candidate solutions can either be defined 
as a function that is to be minimized or maximized 
(often referred to as an objective function or fitness 
function) or as a partial ordering operator. The task is 
to find a solution from the set of potential solutions 
that maximizes quality as defined by the evaluation 
procedure. 

As an example, let us consider the quadratic assign- 
ment problem (QAP), which is one of the fundamental 
NP-hard combinatorial problems [45.25]. In QAP, the 
input consists of distances between n locations and 
flows between n facilities. The task is to find a one- 
to-one assignment of facilities to locations so that the 
overall cost is minimized. The cost for a pair of loca- 
tions is defined as the product of the distance between 
these locations and the flow between the facilities as- 
signed to these locations; the overall cost is the sum of 
the individual costs for all pairs of locations. Therefore, 
in QAP, potential solutions are defined as permutations 
that define assignments of facilities to locations and 
the solution quality is evaluated using the cost func- 
tion discussed above. The task is to minimize the cost. 
As another example, consider the maximum satisfiabil- 
ity problem for propositional logic formulas defined in 
the conjunctive normal form with 3 literals per clause 
(MAX3SAT). In MAX3SAT, each potential solution 


The chapter is organized as follows. Section 45.1 
outlines the basic procedure of an EDA. Section 45.2 
presents a taxonomy of EDAs based on the type of 
decomposition encoded by the model and the type of 
local distributions used in the model. Section 45.3 re- 
views some of the most popular EDAs. Section 45.4 
discusses major research directions and the past results 
in theoretical modeling of EDAs. Section 45.5 fo- 
cuses on efficiency enhancement techniques for EDAs. 
Section 45.6 gives pointers for obtaining additional in- 
formation about EDAs. Section 45.7 summarizes and 
concludes the chapter. 


defines one interpretation of propositions (making each 
proposition either true or false), and the quality of a so- 
lution is measured by the number of clauses that are 
satisfied by the specific interpretation. The task is to find 
an interpretation that maximizes the number of satisfied 
clauses. 
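The two evaluation procedures described above can be sketched in a few lines. This is only an illustrative sketch: the function names, the encoding of a QAP solution as a list in which `assignment[loc]` is the facility placed at location `loc`, the summation over ordered pairs of locations, and the encoding of literals as signed integers are all assumptions made for this example, and the small matrices and clauses are made-up data.

```python
def qap_cost(distance, flow, assignment):
    """Overall QAP cost of a permutation.

    Assumption: assignment[loc] is the facility placed at location loc,
    and the sum runs over all ordered pairs of locations.
    """
    n = len(assignment)
    return sum(
        distance[a][b] * flow[assignment[a]][assignment[b]]
        for a in range(n)
        for b in range(n)
    )


def max3sat_quality(clauses, interpretation):
    """Number of clauses satisfied by a truth assignment.

    Assumption: literal +k means proposition k is true, -k means it is
    false (propositions are numbered from 1).
    """
    def literal_true(lit):
        value = interpretation[abs(lit) - 1]
        return value if lit > 0 else not value

    return sum(any(literal_true(lit) for lit in clause) for clause in clauses)


# Made-up 3x3 instance: symmetric distances and flows.
distance = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
cost_identity = qap_cost(distance, flow, [0, 1, 2])
cost_rotated = qap_cost(distance, flow, [2, 0, 1])

# Made-up formula with three clauses over three propositions.
clauses = [(1, -2, 3), (-1, 2, 3), (-1, -2, -3)]
satisfied = max3sat_quality(clauses, [True, True, False])
```

In QAP the value returned by `qap_cost` is to be minimized, whereas in MAX3SAT the value returned by `max3sat_quality` is to be maximized.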

Without additional assumptions about the problem, 
one way to find the optimum is to repeat three main 
steps: 


• Generate candidate solutions.

• Evaluate the generated solutions.

• Update the procedure for generating new candidate
solutions according to the results of the evaluation.


Ideally, the quality of generated solutions would 
improve over time and after a reasonable number of 
iterations, the execution of these three steps would gen- 
erate the global optimum or its accurate approximation. 
Different algorithms implement the above three steps in 
different ways, but the key idea remains the same — it- 
eratively update the procedure for generating candidate 
solutions so that generated candidate solutions continu- 
ally improve in quality. 


45.1.2 EDA Procedure 


In EDAs, the central idea is to maintain an explicit prob- 
abilistic model that represents a probability distribution 
over candidate solutions. In each iteration, the model is 
adjusted based on the results of the evaluation of can- 
didate solutions so that it will generate better candidate 
solutions in the subsequent iterations. Note that using
an explicit probabilistic model makes EDAs quite different
from many other metaheuristics, such as genetic
algorithms [45.26, 27] or simulated annealing [45.28,
29], in which the probability distribution used to generate
new candidate solutions is often defined implicitly
by a search operator or a combination of several search
operators. Researchers often distinguish two main types
of EDAs:


• Population-based EDAs. Population-based EDAs
maintain a population (multiset) of candidate solutions,
starting with a population generated at random
according to the uniform distribution over all
admissible solutions. Each iteration starts by creating
a population of promising candidate solutions
using the selection operator, which gives preference
to solutions of higher quality. Any popular selection
method for evolutionary algorithms can be used,
such as truncation or tournament selection [45.30,
31]. For example, truncation selection selects the
top t% of the members of the population. A probabilistic
model is then built for the selected solutions. New
solutions are created by sampling the distribution
encoded by the built model. The new solutions are
then incorporated into the original population using
a replacement operator. In full replacement, for example,
the entire original population of candidate
solutions is replaced by the new ones. Pseudocode
of a population-based EDA is shown in
Algorithm 45.1.

• Incremental EDAs. In incremental EDAs, the population
of candidate solutions is fully replaced by
a probabilistic model. The model is initialized so
that it encodes the uniform distribution over all
admissible solutions. The model is then updated
incrementally by repeating the process of (1) sampling
several candidate solutions from the current
model and (2) adjusting the model based on the
evaluation of these candidate solutions and their
comparison so that the model becomes more likely
to generate high-quality solutions in subsequent
iterations. Pseudocode of an incremental EDA is
shown in Algorithm 45.2.


Algorithm 45.1 Population-based estimation of
distribution algorithm
1: t ← 0
2: generate population P(0) of random solutions
3: while termination criteria not satisfied do
4:    evaluate all candidate solutions in P(t)
5:    select promising solutions S(t) from P(t)
6:    build a probabilistic model M(t) for S(t)
7:    generate new solutions O(t) by sampling M(t)
8:    create P(t + 1) by combining O(t) and P(t)
9:    t ← t + 1
10: end while


Algorithm 45.2 Incremental estimation of
distribution algorithm
1: t ← 0
2: initialize model M(0) to represent the uniform
   distribution over admissible solutions
3: while termination criteria not satisfied do
4:    generate population P(t) of candidate solutions
      by sampling M(t)
5:    evaluate all candidate solutions in P(t)
6:    create new model M(t + 1) by adjusting M(t)
      according to evaluated P(t)
7:    t ← t + 1
8: end while


Incremental EDAs often generate only a few can- 
didate solutions at a time, whereas population-based 
EDAs often work with a large population of candidate 
solutions, building each model from scratch. Nonethe- 
less, it is easy to see that the two approaches are 
essentially the same because even the population-based 
EDAs can be reformulated in an incremental-based 
manner. 

The main components of a population-based EDA 
thus include: 


(1) A selection operator for selecting promising solu- 
tions. 

(2) An assumed class of probabilistic models to use for 
modeling and sampling. 

(3) A procedure for learning a probabilistic model for 
the selected solutions. 

(4) A procedure for sampling the built probabilistic 
model. 

(5) A replacement operator for combining the popula- 
tions of old and new candidate solutions. 


The main components of an incremental EDA in- 
clude: 


(1) An assumed class of probabilistic models. 

(2) A procedure for adjusting the probabilistic model 
based on new candidate solutions and their evalua- 
tions. 

(3) A procedure for sampling the probabilistic model. 
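The component lists above map naturally onto a loop whose parts are passed in as functions. The following sketch of the population-based variant is one possible arrangement under stated assumptions, not a reference implementation: all function and parameter names are hypothetical, the termination criterion is simply a fixed number of iterations, and the concrete components chosen for the usage example (truncation selection of the top half, a probability vector, full replacement, an 8-bit onemax objective) are illustrative choices.

```python
import random


def run_population_eda(evaluate, new_solution, select, learn, sample, replace,
                       pop_size, iterations, seed=0):
    """Generic population-based EDA loop; the callables are the pluggable parts."""
    rng = random.Random(seed)
    population = [new_solution(rng) for _ in range(pop_size)]
    for _ in range(iterations):
        selected = select(population, evaluate)                    # component (1)
        model = learn(selected)                                    # components (2), (3)
        offspring = [sample(model, rng) for _ in range(pop_size)]  # component (4)
        population = replace(population, offspring)                # component (5)
    return max(population, key=evaluate)


# Illustrative components (assumptions, not prescribed by the text).
N_BITS = 8


def onemax(x):
    return sum(x)


def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]


def truncation(population, evaluate):
    """Keep the better half of the population."""
    ranked = sorted(population, key=evaluate, reverse=True)
    return ranked[: len(ranked) // 2]


def learn_probability_vector(selected):
    """Proportion of 1s in each position of the selected solutions."""
    return [sum(column) / len(selected) for column in zip(*selected)]


def sample_probability_vector(p, rng):
    return [1 if rng.random() < p_i else 0 for p_i in p]


def full_replacement(old, new):
    return new


best = run_population_eda(onemax, random_bits, truncation,
                          learn_probability_vector, sample_probability_vector,
                          full_replacement, pop_size=20, iterations=10)
```

Swapping in a different model class only requires replacing the `learn` and `sample` arguments; selection and replacement stay untouched.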



The procedure for learning a probabilistic model 
usually requires two subcomponents: a metric for eval- 
uating the probabilistic models from the assumed class, 
and a search procedure for choosing a particular model 
based on the metric used. EDAs differ mainly in 
the class of probabilistic models and the procedures 
used for evaluating candidate models and searching for 
a good model. 

The general outline of a population-based EDA is 
quite similar to that of a traditional evolutionary al- 
gorithm (EA) [45.32]; both guide the search toward 
promising solutions by iteratively performing selection 
and variation, the two key ingredients of any EA. In par- 
ticular, components (1) and (5) are precisely the same 
as those used in other EAs. Components (2), (3), and 
(4), however, are unique to EDAs, and constitute their 
way of producing variation, as opposed to using recom- 
bination and mutation operators as is often done with 
other EAs. 

As we shall see, this alternative perspective opens 
a way for designing search procedures from principled 
grounds by bringing to the evolutionary computation 
domain a vast body of knowledge from the machine 
learning literature, and in particular from probabilis- 
tic graphical models. The key idea of EDAs is to look 
at a population of previously visited good solutions as 
data, learn a model (or theory) of that data, and use 
the resulting model to infer where other good solutions 
might be. This approach is powerful, allowing a search 
algorithm to learn and adapt itself with respect to the 
optimization problem being solved, while it is being 
solved. 


45.1.3 Simulation of an EDA by Hand 


To better understand the EDA procedure, this section 
presents a simple EDA simulation by hand. The purpose 
of presenting the simulation is to clarify the components 
of the basic EDA procedure and to build intuition about 
the dynamics of an EDA run. 

The simulation assumes that candidate solutions are 
represented by binary strings of fixed length n > 0. The 
objective function to maximize is onemax, which is de- 
fined as the sum of the bits in the input binary string 
$(X_1, X_2, \ldots, X_n)$

$$f_{\mathrm{onemax}}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i . \tag{45.1}$$

The quality of a candidate solution improves with the 
number of 1s in the input string, and the optimum is the 
string of all 1s. 


To model and sample candidate solutions, the simulation
uses a probability vector [45.1, 6, 33]. A probability
vector $p$ for $n$-bit binary strings has $n$ components,
$p = (p_1, p_2, \ldots, p_n)$. The component $p_i$ represents the
probability of observing a 1 in position $i$ of a solution
string. To learn the probability vector, $p_i$ is set to the
proportion of 1s in position $i$ observed in the selected
set of solutions. To sample a new candidate solution
$(X_1, X_2, \ldots, X_n)$, the components of the probability
vector are polled and each $X_i$ is set to 1 with probability $p_i$,
and to 0 with probability $1 - p_i$.

The expected outcome of the learning and sampling 
of the probability vector is that the population of se- 
lected solutions and the population of new candidate 
solutions have the same proportion of 1s in each po- 
sition. However, since the sampling considers each new 
candidate solution independently of others, the actual 
proportions may vary a little from their expected val- 
ues. The probability-vector EDA described above is 
typically referred to as the univariate marginal distri- 
bution algorithm (UMDA) [45.6]; other EDAs based on 
the probability vector model [45.1, 33, 34] will be dis- 
cussed in Sect. 45.3.1. 

To keep the simulation simple, we consider a 5- 
bit onemax, a population of size N = 6, and truncation 
selection with threshold t = 50%. Recall that the trun- 
cation selection with t = 50% selects the top half of the 
current population. 

Figure 45.1 shows the first two iterations of the 
EDA simulation. The initial population of candidate 
solutions is generated at random. Truncation selection 
then selects the best 50% of candidate solutions based 
on their evaluation using onemax to form the set of 
promising solutions. Next, the probability vector is cre- 
ated based on the selected solutions and the distribution 
encoded by the probability vector is sampled to gener- 
ate new candidate solutions. The resulting population 
replaces the original population and the procedure re- 
peats. 
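The selection and learning steps of the first iteration can be traced with exact arithmetic. The sketch below is only illustrative: it starts from the initial population that appears in Fig. 45.1, applies truncation selection with a 50% threshold, and learns the probability vector as exact fractions. The function names are assumptions introduced for this example, and so is the tie-breaking rule (equally good strings keep their order in the population).

```python
from fractions import Fraction


def onemax(bits: str) -> int:
    """Number of 1s in a binary string."""
    return bits.count("1")


def truncation_select(population, fraction=Fraction(1, 2)):
    """Keep the best `fraction` of the population (stable for ties)."""
    ranked = sorted(population, key=onemax, reverse=True)
    return ranked[: int(len(ranked) * fraction)]


def probability_vector(selected):
    """p_i = proportion of 1s in position i among the selected strings."""
    n = len(selected[0])
    return [Fraction(sum(s[i] == "1" for s in selected), len(selected))
            for i in range(n)]


initial = ["11001", "11101", "00010", "10111", "00001", "10010"]
selected = truncation_select(initial)   # the three best strings
p = probability_vector(selected)        # one entry per bit position
expected_ones = sum(p)                  # expected onemax of a sampled string
```

The expected number of 1s in a sampled string, `sum(p)` = 11/3, equals the average onemax value of the three selected strings, whereas the average over the initial population is only 15/6.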

In both iterations of the simulation, the average ob- 
jective-function value in the new population is greater 
than the average value in the population before se- 
lection. The increase in the average quality of the 
population is good news for us because we want to max- 
imize the objective function, but why does this happen? 
Since for onemax the solutions with more 1s are better 
than those with fewer 1s, selection should increase the 
number of 1s in the population. The learning and sam- 
pling of the probability vector is not expected to create 
or destroy any bits and that is why the new population 
of candidate solutions should contain more 1s than the
original population (both in the proportion and in the
actual number). Since the onemax value increases with the
number of 1s, we can expect the overall quality of the
population to increase over time. Ideally, every iteration
should increase the objective-function values in the
population unless no improvement is possible.

Nonetheless, the increase of the average objective-function
value tells only half the story. A similar increase
in the quality of the population in the first
iteration would be achieved by just repeating selection
alone without the use of the probabilistic model.
However, by applying selection alone, no new solutions
are ever created and the resulting algorithm produces
no variation at all (i.e., there is no exploration of new
candidate solutions). Since the initial population is generated
at random, the EDA with selection alone would
be just a poor algorithm for obtaining the best solution
from the initial population. The learning and sampling
of the probabilistic model provides a mechanism for
both (1) improving quality of new candidate solutions
(under certain assumptions), and (2) facilitating exploration
of the set of admissible solutions.

What we have seen in this simulation was an example
of the simplest kind of EDAs. The assumed class of
probabilistic models, the probability vector, has a fixed
structure. Under these circumstances, the procedure for
learning it becomes trivial because there are really no
alternative models to choose from. This class of EDAs
is quite limited in what it can do. As we shall see in
a moment, there are other classes of EDAs that allow
richer probabilistic models capable of capturing interactions
among the variables of a given problem. More
importantly, these interactions can be learned automatically
on a problem-by-problem basis. This of course results
in a more complex model building procedure,
but the extra effort has been shown to be well worth
it, especially when solving more difficult optimization
problems [45.4, 5, 8, 22, 35-37].


Iteration 1
   Population:         11001 (3), 11101 (4), 00010 (1), 10111 (4), 00001 (1), 10010 (2)
   Selected (50%):     11101 (4), 10111 (4), 11001 (3)
   Probability vector: 3/3, 2/3, 2/3, 1/3, 3/3
   New population:     11101 (4), 10101 (3), 11011 (4), 10101 (3), 11011 (4), 11101 (4)

Iteration 2
   Population:         11101 (4), 10101 (3), 11011 (4), 10101 (3), 11011 (4), 11101 (4)
   Selected (50%):     11101 (4), 11011 (4), 11101 (4)
   Probability vector: 3/3, 3/3, 2/3, 1/3, 3/3
   New population:     11011 (4), 11101 (4), 11101 (4), 11001 (3), 11111 (5), 11101 (4)

Fig. 45.1 Simple simulation of an
EDA based on the probability-vector
model for onemax. The fitness values
of candidate solutions are shown
inside parentheses


45.2 Taxonomy of EDA Models


This section provides a high-level overview of the
distinguishing characteristics of probabilistic models. The
characteristics are discussed with respect to (1) the
types of interactions covered by the model and (2) the
types of local distributions. This section only focuses
on the key characteristics of the probabilistic models;
a more detailed overview of EDAs for various
representations of candidate solutions will be covered by the
following sections.


45.2.1 Classification Based 
on Problem Decomposition 


To make the estimation and sampling tractable with 
reasonable sample sizes, most EDAs use probabilistic 
models that decompose the problem using uncondi- 
tional or conditional independence. The way in which 
a model decomposes the problem provides one impor- 
tant characteristic that distinguishes different classes 
of probabilistic models. Classification of probabilistic 
models based on the way they decompose a prob- 
lem is relevant regardless of the types of the under- 
lying distributions or the representation of problem 
variables. 

Most EDAs assume that candidate solutions are rep- 
resented by fixed-length vectors of variables and they 
use graphical models to represent the underlying prob- 
lem structure. Graphical models allow practitioners to 
represent both direct dependencies between problem 
variables as well as independence assumptions. One 
way to classify graphical models is to consider a hi- 
erarchy of model types based on the complexity of 
a model (see Fig. 45.2 for illustrative examples) [45.3, 
4,7]: 


• No dependencies. In models that assume full
independence, every variable is assumed to be
independent of any other variable. That is, the
probability distribution $P(X_1, X_2, \ldots, X_n)$ of the vector
$(X_1, X_2, \ldots, X_n)$ of $n$ variables is assumed to
consist of a product of the distributions of individual
variables

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i) . \tag{45.2}$$



The simulation presented in Sect. 45.1.3 was based 
on a model that assumed full independence of bi- 
nary problem variables. EDAs based on univariate 
models that assume full independence of problem 
variables include the equilibrium genetic algorithm 
(EGA) [45.33], the population-based incremental 
learning (PBIL) [45.1], the UMDA [45.6], the com- 
pact genetic algorithm (cGA) [45.34], the stochastic 
hill climbing with learning by vectors of normal dis- 
tributions [45.38], and the continuous PBIL [45.39]. 
• Pairwise dependencies. In this class of models,
dependencies between variables form a tree or forest
graph. In a tree graph, each variable except for the
root of the tree is conditioned on its parent in a tree
that contains all variables. A forest graph, on the
other hand, is a collection of disconnected trees.
Again, the forest contains all problem variables.
Denoting by $R$ the set of roots of the trees in a forest,
and by $X = (X_1, X_2, \ldots, X_n)$ the entire vector of
variables, the distribution from this class can be
expressed as

$$P(X_1, X_2, \ldots, X_n) = \prod_{X_i \in R} P(X_i) \times \prod_{X_i \in X \setminus R} P(X_i \mid \mathrm{parent}(X_i)) . \tag{45.3}$$



A special type of a tree model is sometimes distinguished,
in which the variables form a sequence
(or a chain), and each variable except for the first
one depends directly on its predecessor. Denoting
by $\pi(i)$ the index of the $i$th variable in the sequence,
the distribution is given by

$$P(X_1, X_2, \ldots, X_n) = P(X_{\pi(1)}) \prod_{i=2}^{n} P(X_{\pi(i)} \mid X_{\pi(i-1)}) . \tag{45.4}$$

EDAs based on models with pairwise dependencies
include the mutual information maximizing input
clustering (MIMIC) [45.36], EDA based on dependency
trees [45.35], and the bivariate marginal
distribution algorithm (BMDA) [45.40].

Fig. 45.2a-f Illustrative examples
of graphical models. Problem
variables are displayed as circles
and dependencies are shown as
edges between variables or clusters
of variables. (a) Univariate
model. (b) Chain model. (c) Forest
model. (d) Marginal product model.
(e) Bayesian network. (f) Markov
network

• Multivariate dependencies. Multivariate models
represent dependencies using either directed acyclic
graphs or undirected graphs. Two representative
models are popular in EDAs: (1) Bayesian networks
and (2) Markov networks. A Bayesian network
is represented by a directed acyclic graph where
each node corresponds to a variable and each edge
defines a direct conditional dependence. The
probability distribution encoded by a Bayesian network
can be written as

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i)) . \tag{45.5}$$



A Bayesian network represents problem decom- 
position by conditional independence assumptions; 
each variable is assumed to be independent of any 
of its antecedents in the ancestral ordering of the 
variables, given the values of the variable’s parents. 
Note that all models discussed thus far were spe- 
cial cases of Bayesian networks. In fact, a Bayesian 
network can represent an arbitrary multivariate dis- 
tribution; however, for such a model to be practical, 
it is often desirable to consider Bayesian networks 
of limited complexity. 

In Markov networks (Markov random field mod- 
els), two variables are assumed to be independent of 
each other given a subset of variables defining the 
condition if every path between these variables is 
separated by one or more variables in the condition. 
A special subclass of multivariate models is some- 
times considered in which the variables are divided 
into disjoint clusters, which are independent of each 


other. These models are called marginal product 
models (MPM). Polytrees also represent a subclass 
of multivariate models in which a directed acyclic 
graph is used as the basic dependency structure but 
the graph is restricted so that at most one undirected 
path exists between any two vertices. 
EDAs based on models with multivariate depen- 
dencies include the factorized distribution algorithm 
(FDA) [45.37], the learning FDA (LFDA) [45.37], 
the estimation of Bayesian network algorithm 
(EBNA) [45.41], the Bayesian optimization algorithm
(BOA) [45.42, 43] and its hierarchical version
(hBOA) [45.44], the extended compact genetic algorithm
(ECGA) [45.45], the polytree EDA [45.46],
the continuous iterated density estimation algo- 
rithm [45.47], the estimation of multivariate nor- 
mal algorithm (EMNA) [45.48], and the real-coded 
BOA (rBOA) [45.49]. 

• Full dependence. Models may be used that do
not make any independence assumptions. However, 
such models must typically impose a number of 
other restrictions on the distribution to ensure that 
the models remain tractable for a moderate-to-large 
number of variables. 
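The factorizations in (45.2) to (45.5) all share one computational pattern: the joint probability is a product of local terms, and a sample is drawn by visiting variables so that parents come first. The sketch below illustrates this for binary variables under stated assumptions: the data layout (`parents[i]` as a tuple of parent indices, `cpt[i]` as a table mapping parent values to the probability of a 1), the function names, and the numbers in the chain example are all hypothetical.

```python
import random


def joint_probability(x, parents, cpt):
    """P(x) as a product of local terms P(x_i | parents(x_i))."""
    prob = 1.0
    for i, value in enumerate(x):
        context = tuple(x[j] for j in parents[i])
        p_one = cpt[i][context]   # probability that X_i = 1 in this context
        prob *= p_one if value == 1 else 1.0 - p_one
    return prob


def sample(parents, cpt, order, rng):
    """Ancestral sampling; `order` must list every parent before its children."""
    x = [None] * len(parents)
    for i in order:
        context = tuple(x[j] for j in parents[i])
        x[i] = 1 if rng.random() < cpt[i][context] else 0
    return x


# Chain model X0 -> X1 -> X2 with made-up probabilities.
chain_parents = [(), (0,), (1,)]
chain_cpt = [
    {(): 0.5},
    {(0,): 0.1, (1,): 0.9},
    {(0,): 0.2, (1,): 0.8},
]

# Univariate model: no variable has a parent.
univariate_parents = [(), (), ()]
univariate_cpt = [{(): 0.5}, {(): 0.9}, {(): 0.8}]

p_all_ones = joint_probability((1, 1, 1), chain_parents, chain_cpt)
draw = sample(chain_parents, chain_cpt, order=[0, 1, 2], rng=random.Random(1))
```

With empty parent sets the same two functions reduce to the probability-vector model of Sect. 45.1.3, which is one way to see that the univariate, chain, and forest models are special cases of a Bayesian network.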


There are two additional types of probabilistic mod- 
els that have been used in EDAs and that provide 
a somewhat different mechanism for decomposing the 
problem: 


• Grammar models. Some EDAs use stochastic or
deterministic grammars to represent the probabil- 
ity distribution over candidate solutions. The ad- 
vantage of grammars is that they allow model- 
ing of variable-length structures. Because of this, 
grammar distributions are mostly used as the ba- 
sis for implementing genetic programming using 
EDAs [45.50], which represents candidate solu- 
tions using labeled trees of variable size. Gram- 
mar models are used, for example, in the proba- 
bilistic-grammar based EDA for genetic program- 
ming [45.51], the program distribution estimation 
with grammar model (PRODIGY) [45.52], or the 
EDA based on probabilistic grammars with latent 
annotations [45.53]. 

• Feature-based models. Feature-based models encode
the distribution of the neighborhood of a candidate
solution using position-independent substructures,
which can be found in a variety of
positions in fixed-length or variable-length solu- 
tions. This approach is used in the feature-based 




BOA [45.54]. Other features may be discovered, 
encoded, and used for guiding the exploration of 
the space of candidate solutions. Model-directed 
neighborhood structures are also used in other EDA 
variants, as will be discussed in Sect. 45.5.2. 


45.2.2 Classification Based 
on Local Distributions 
in Graphical Models 


Regardless of how a graphical model decomposes the 
problem, each model must also assume one or more 
classes of distributions to encode local conditional 
and marginal distributions. Some of the most common 
classes of local distributions are discussed below: 


• Probability tables. For discrete representations,
conditional and marginal probabilities can be encoded
using probability tables, which define a probability
for each relevant combination of values in
each conditional or marginal probability term. This
was the case, for example, in the simulation in
Sect. 45.1.3, in which the probability distribution
for each string position $i$ was represented by the
probability $p_i$ of a 1; the probability of a 0 in
the same position was simply $1 - p_i$. As another
example, in Bayesian networks, for each variable
a probability table can be used to define conditional
probabilities of any value of the variable given any
combination of values of the variable's parents.
While probability tables cannot directly represent
continuous probability distributions, they can be
used even for real-valued representations in combination
with a discretization method that maps real-valued
variables into discrete categories; each of
the discrete categories can then be represented using
a single probability entry. Probability tables are
used, for example, in UMDA [45.6], BOA [45.43]
and ECGA [45.45]. An example conditional probability
table is shown in Fig. 45.3.

[Figure: conditional probability table with entries 0.75, 0.25, 0.25, 0.25, 0.20, 0.20, 0.20, 0.20 and the corresponding decision tree]

Fig. 45.3 A conditional probability table for p(X1 | X2, X3, X4) and
a corresponding decision tree that reduces the number of parameters
(probabilities) from 8 to 4


• Decision trees or graphs, default tables. To avoid
excessively large probability tables when many 
probabilities are either similar or negligible, more 
advanced local structures such as decision trees, de- 
cision graphs, or default tables may be used. In 
decision trees, for example, probabilities are stored 
in leaves of a decision tree in which each internal 
node represents a test on a variable and the children 
of the node correspond to the different outcomes of 
the test. Decision trees and decision graphs can also 
be used in combination with real-valued variables, 
in which the leaves store a continuous distribution 
in some way. More advanced structures such as de- 
cision trees and decision graphs are used, for exam- 
ple, in the decision-graph BOA (dBOA) [45.55], the 
hierarchical BOA (hBOA) [45.44], and the mixed 
BOA (mBOA) [45.56]. An example decision tree 
for representing conditional probabilities is shown 
in Fig. 45.3. 

• Multivariate, continuous distributions. The normal
distribution is by far the most popular distribution
used in EDAs to represent univariate
or multivariate distributions of real-valued vari- 
ables. A multivariate normal distribution can en- 
code a linear correlation between the variables 
using the covariance matrix, but it is often ineffi- 
cient in representing many other types of interac- 
tions [45.56, 57]. Normal distributions were used in 
many EDAs for real-valued vectors [45.38, 39, 47, 
48], although in many real-valued EDAs more ad- 
vanced distributions were used as well. Examples 
of multivariate normal distributions are shown in 
Fig. 45.4a-c. 

• Mixtures of distributions. A mixture distribution
consists of multiple components. Each component
is represented by a specific local probabilistic
model, such as a normal distribution, and each
component is assigned a probability. Mixture
distributions were used in EDAs especially to enable
EDAs for real-valued representations to deal
with real-valued distributions with multiple basins
of attraction, in which a single-peak distribution
does not suffice. Mixture distributions were used,
for example, in the real-valued iterated density
estimation algorithms [45.47] or the real-coded
BOA [45.49]. The use of mixture distributions is
more popular in EDAs for real-valued representations,
although mixture distributions were also
used to represent distributions over discrete
representations in which the population consists of
multiple dissimilar clusters [45.58] and in
multiobjective EDAs [45.59, 60]. An example of a mixture
of normal kernel distributions is shown in
Fig. 45.4d.

[Figure: four surface plots of f(X1, X2), panels a-d]

Fig. 45.4a-d Local models for continuous distributions over real-valued variables. (a) Multivariate normal distribution
with equal standard deviations and no covariance, (b) Multivariate normal distribution with arbitrary standard deviations
for each variable (diagonal covariance matrix), (c) Multivariate normal distribution with an arbitrary (nondiagonal)
covariance matrix, (d) Joint normal kernels distribution


• Histograms. In a number of EDAs for real-valued
representations, to encode local distributions, real- 
valued variables or sets of such variables are di- 
vided into rectangular regions using a histogram- 
like model, and a separate probabilistic model is 


used to represent the distribution in each region. 
Histogram models can be seen as a special sub- 
class of the decision-tree models for real-valued 
variables. In real-valued EDAs, histograms were 
used, for example, in the histogram-based con- 
tinuous EDA [45.61]. Histogram models can also 
be used for other representations; for example, 
when optimizing permutations, histograms can be 
used to represent different relative ordering con- 
straints and their importance with respect to solu- 
tion quality [45.62, 63]. 
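As a concrete instance of a local distribution for real-valued variables, the following sketch learns and samples a normal model with a diagonal covariance matrix, i.e., one independent normal distribution per variable as in Fig. 45.4b. It is a minimal sketch under stated assumptions: the function names are hypothetical, the standard deviation is estimated with the population (maximum-likelihood) formula, and the three selected vectors are made-up data.

```python
import random
import statistics


def learn_diagonal_normal(selected):
    """Estimate a mean and a standard deviation for each variable separately."""
    columns = list(zip(*selected))
    means = [statistics.fmean(column) for column in columns]
    stdevs = [statistics.pstdev(column) for column in columns]
    return means, stdevs


def sample_diagonal_normal(model, rng):
    """Draw each variable independently from its own normal distribution."""
    means, stdevs = model
    return [rng.gauss(mean, stdev) for mean, stdev in zip(means, stdevs)]


# Made-up set of selected real-valued solutions with two variables.
selected = [(1.0, 10.0), (2.0, 10.0), (3.0, 13.0)]
model = learn_diagonal_normal(selected)
new_solution = sample_diagonal_normal(model, random.Random(0))
```

A model of this kind cannot express correlations between variables; capturing them would require a full covariance matrix (Fig. 45.4c) or one of the richer local distributions listed above.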




45.3 Overview of EDAs 


This section gives an overview of EDAs organized by the
representation of candidate solutions, although some of
the EDAs can be used across several representations.
Due to the large volume of work on EDAs in the past
two decades, we do not aim to list every single variant
of an EDA discussed in the past; instead, we focus on
some of the most important representatives.


45.3.1 EDAs for Fixed-Length Strings 
over Finite Alphabets 


EDAs for candidate solutions represented by fixed- 
length strings over a finite alphabet can use a vari- 
ety of model types, from simple univariate models to 
complex Bayesian networks with local structures. This 
section reviews some of the work in this area. Candi- 
date solutions are assumed to be represented by binary 
strings of fixed length n, although most methods pre- 
sented here can be extended to optimization of strings 
over an arbitrary finite alphabet. The section classifies 
EDAs based on the order of interactions in the under- 
lying dependency model along the lines discussed in 
Sect. 45.2.1 [45.3, 4, 7]. 


No Interactions 

The EGA [45.33] and the population-based incremental
learning (PBIL) [45.1] replace the population of candidate
solutions represented as fixed-length binary strings
by a probability vector $(p_1, p_2, \ldots, p_n)$, where $n$ is the
number of bits in a string and $p_i$ denotes the probability
of a 1 in the $i$th position of solution strings. Each $p_i$ is
initially set to 0.5, which corresponds to a uniform
distribution over the set of all solutions. In each iteration,
PBIL generates $s$ candidate solutions according to the
current probability vector where $s > 2$ denotes the
selection pressure. Each value is generated independently
of its context (remaining bits) and thus no interactions
are considered (Fig. 45.2a). The best solution from the
generated set of $s$ solutions is then used to update the
probability-vector entries using

$$p_i = p_i + \lambda (x_i - p_i) ,$$

where $\lambda \in (0, 1)$ is the learning rate (say, 0.02), and $x_i$
is the $i$th bit of the best solution. Using the above update
rule, the probability $p_i$ of a 1 in the $i$th position
increases if the best solution contains a 1 in that position 
and decreases otherwise. In other words, probability- 
vector entries move toward the best solution and, con- 
sequently, the probability of generating this solution 


increases. The process of generating new solutions and 
updating the probability vector is repeated until some 
termination criteria are met; for instance, the run can 
be terminated if all probability-vector entries are suffi- 
ciently close to either 0 or 1. 
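
The PBIL loop described above can be sketched in a few lines of Python.
This is only an illustrative sketch: the function name, the default
parameter values other than the learning rate, and the convergence
threshold eps are assumptions, not part of the original algorithm
description.

```python
import random

def pbil(fitness, n, s=10, lam=0.02, max_iters=5000, eps=0.05, seed=0):
    """Minimal PBIL sketch; returns the final probability vector."""
    rng = random.Random(seed)
    p = [0.5] * n                      # uniform distribution over all strings
    for _ in range(max_iters):
        # Sample s candidates; each bit is drawn independently of the others.
        samples = [[1 if rng.random() < pi else 0 for pi in p]
                   for _ in range(s)]
        best = max(samples, key=fitness)
        # Move every entry toward the corresponding bit of the best sample.
        p = [pi + lam * (xi - pi) for pi, xi in zip(p, best)]
        # Assumed termination rule: every entry is close to 0 or 1.
        if all(pi < eps or pi > 1 - eps for pi in p):
            break
    return p
```

For example, pbil(sum, n=20) applies the sketch to the onemax problem,
where the fitness of a string is its number of 1s.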

Prior work refers to PBIL also as the hill 
climbing with learning (HCwL) [45.64] and the in- 
cremental univariate marginal distribution algorithm 
(IUMDA) [45.65]. 

PBIL is an incremental EDA, because it proceeds 
by executing incremental updates of the model using 
a small sample of candidate solutions. However, there is 
a strong correlation between the learning rate in PBIL 
and the population size in population-based EDAs or 
other evolutionary algorithms; essentially, decreasing 
the learning rate $\lambda$ corresponds to increasing the pop-
ulation size.

The cGA [45.34, 66] reduces the gap between PBIL 
and traditional steady-state genetic algorithms. Like 
PBIL, cGA replaces the population by a probability 
vector and all entries in the probability vector are ini- 
tialized to 0.5. Each iteration updates the probability 
vector by mimicking the effect of a single competition 
between two sampled solutions, where the best replaces 
the worst, in a hypothetical population of size N. De- 
noting the bit in the ith position of the best and worst of 
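the two sampled solutions by $x_i$ and $y_i$, respectively, the
probability-vector entries are updated as follows:


$$
p_i = \begin{cases}
p_i + \frac{1}{N} & \text{if } x_i = 1 \text{ and } y_i = 0 \\
p_i - \frac{1}{N} & \text{if } x_i = 0 \text{ and } y_i = 1 \\
p_i & \text{otherwise.}
\end{cases}
$$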


Although cGA uses the probability vector instead 
of a population, updates of the probability vec- 
tor correspond to replacing one candidate solution 
by another one using a population of size N and 
shuffling the resulting population using a univariate 
model that assumes full independence of problem 
variables. 
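
A minimal Python sketch of the cGA update follows. To keep the steps of
size 1/N exact, the sketch stores integer counts and derives each
probability as counts[i] / N; this bookkeeping, the function name, and
the stopping rule (all entries equal to 0 or 1) are assumptions made for
illustration.

```python
import random

def cga(fitness, n, N=100, max_iters=200000, seed=0):
    """Minimal cGA sketch; returns the final probability vector."""
    rng = random.Random(seed)
    counts = [N // 2] * n              # p_i = counts[i] / N, initially 0.5
    for _ in range(max_iters):
        p = [c / N for c in counts]
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        # One competition between the two sampled solutions.
        best, worst = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n):
            if best[i] == 1 and worst[i] == 0:
                counts[i] += 1         # p_i increases by 1/N
            elif best[i] == 0 and worst[i] == 1:
                counts[i] -= 1         # p_i decreases by 1/N
        if all(c in (0, N) for c in counts):
            break                      # every entry has reached 0 or 1
    return [c / N for c in counts]
```

Once an entry reaches 0 or 1, both sampled solutions agree in that
position, so the entry no longer changes.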

The UMDA [45.6] maintains a population of so- 
lutions. Each iteration of UMDA starts by selecting 
a population of promising solutions using an arbitrary 
selection method of evolutionary algorithms. A prob- 
ability vector is then computed using the selected 
population of promising solutions and new solutions 
are generated by sampling the probability vector. The 


new solutions replace the old ones and the process is 
repeated until termination criteria are met. Although 
UMDA uses a probabilistic model as an intermediate 
step between the original and new populations unlike 
PBIL and cGA, the performance, dynamics and limita- 
tions of PBIL, cGA, and UMDA are similar. 
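
The UMDA cycle of selection, model estimation, and sampling can be
sketched as follows. Truncation selection of the better half and full
replacement of the population are assumptions chosen for brevity; the
text above allows an arbitrary selection method.

```python
import random

def umda(fitness, n, pop_size=100, generations=50, seed=0):
    """Minimal UMDA sketch; returns the best solution of the last population."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Assumed selection method: keep the better half of the population.
        selected = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        # Univariate model: frequency of a 1 in each position.
        p = [sum(x[i] for x in selected) / len(selected) for i in range(n)]
        # New solutions replace the old ones.
        pop = [[1 if rng.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
    return max(pop, key=fitness)
```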

PBIL, cGA, and UMDA can solve problems de-
composable into subproblems of order 1 in a linear or
quadratic number of fitness evaluations. However, if de- 
composition into single-bit subproblems misleads the 
decision making away from the optimum, these algo- 
rithms scale up poorly with problem size [45.60, 67, 
68]. 


Pairwise Interactions 
EDAs based on pairwise probabilistic models, such 
as a chain, a tree or a forest, represent the first step 
toward EDAs being capable of learning variable inter- 
actions and therefore solving decomposable problems 
of bounded order (difficulty) in a scalable manner. 

The MIMIC algorithm [45.36] uses a chain distri- 
bution (Fig. 45.2b) specified by 


(1) an ordering of string positions (variables), 

(2) a probability of a 1 in the first position of the chain, 
and 

(3) conditional probabilities of every other position 
given the value in the previous position in the chain. 


A chain probabilistic model encodes the probabil- 
ity distribution where all positions except the first are 
conditionally dependent on the previous position in the 
chain. After selecting promising solutions and com- 
puting marginal and conditional probabilities, MIMIC 
uses a greedy algorithm to maximize mutual informa- 
tion between the adjacent positions in the chain. In this 
fashion, the Kullback—Leibler divergence [45.69] be- 
tween the chain and actual distributions is minimized. 
Nonetheless, the greedy algorithm does not guarantee 
global optimality of the constructed model (with respect 
to Kullback—Leibler divergence). The greedy algorithm 
starts in the position with the minimum unconditional 
entropy. The chain is expanded by adding a new posi- 
tion that minimizes the conditional entropy of the new 
variable given the last variable in the chain. Once the 
full chain is constructed for the selected population of 
promising solutions, new solutions are generated by 
sampling the distribution encoded by the chain. The use 
of pairwise interactions was one of the most important 
steps in the development of EDAs capable of solving 
decomposable problems of bounded difficulty scalably. 


MIMIC was the first discrete EDA that not only learned and
used a fixed set of statistics but was also capable of
identifying the statistics that should be considered to
solve the problem efficiently.
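
The greedy chain construction described above can be sketched as
follows. The sketch estimates entropies directly from the selected
solutions; the function names and the tie-breaking order are assumptions.

```python
import math
from collections import Counter

def entropy(columns, data):
    """Empirical joint entropy (in bits) of the given string positions."""
    counts = Counter(tuple(row[c] for c in columns) for row in data)
    total = len(data)
    return -sum(k / total * math.log2(k / total) for k in counts.values())

def mimic_chain(data):
    """Greedy ordering of positions for a chain model (sketch)."""
    n = len(data[0])
    # Start in the position with the minimum unconditional entropy.
    first = min(range(n), key=lambda i: entropy([i], data))
    chain, remaining = [first], set(range(n)) - {first}
    while remaining:
        last = chain[-1]
        # H(X_j | X_last) = H(X_j, X_last) - H(X_last)
        nxt = min(sorted(remaining),
                  key=lambda j: entropy([j, last], data) - entropy([last], data))
        chain.append(nxt)
        remaining.remove(nxt)
    return chain
```

Sampling would then draw the first position of the chain from its
marginal probability and every later position conditionally on the
value of its predecessor.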

Baluja and Davies [45.35] use dependency trees 
(Fig. 45.2c) to model promising solutions. Like in 
PBIL, the population is replaced by a probability vec- 
tor but in this case the probability vector contains all 
pairwise probabilities. The probabilities are initialized 
to 0.25. Each iteration adjusts the probability vector ac- 
cording to new promising solutions acquired on the fly. 
A dependency tree encodes the probability distribution 
where every variable except for the root is condi- 
tioned on the variable’s parent in the tree. A variant 
of Prim’s algorithm for finding the minimum spanning 
tree [45.70] can be used to construct an optimal tree 
distribution. Here the task is to find a tree that maxi- 
mizes mutual information between parents (nodes with 
successors) and their children (successors). This can be 
done by first randomly choosing a variable to form the 
root of the tree, and hanging new variables to the ex- 
isting tree so that the mutual information between the 
parent of the new variable and the variable itself is max- 
imized. In this way, the Kullback—Leibler divergence 
between the tree and actual distributions is minimized 
as shown in [45.71]. Once a full tree is constructed, 
new solutions are generated according to the distribu- 
tion encoded by the constructed dependency tree and 
the conditional probabilities computed from the proba- 
bility vector. 
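
A Prim-style construction of a dependency tree can be sketched as
follows. As a simplification, the sketch computes mutual information
from a sample of solutions rather than from an incrementally updated
vector of pairwise probabilities, and it uses a fixed root for
reproducibility; both choices and all names are assumptions.

```python
import math
from collections import Counter

def mutual_information(i, j, data):
    """Empirical mutual information (in bits) between positions i and j."""
    total = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    return sum(c / total * math.log2(c * total / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def dependency_tree(data, root=0):
    """Return {variable: parent}; the root has parent None."""
    n = len(data[0])
    parent, in_tree = {root: None}, [root]
    while len(in_tree) < n:
        # Attach the outside variable whose link to the tree has the
        # largest mutual information.
        p, c = max(((p, c) for p in in_tree for c in range(n)
                    if c not in parent),
                   key=lambda e: mutual_information(e[0], e[1], data))
        parent[c] = p
        in_tree.append(c)
    return parent
```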

The BMDA [45.40] uses a forest distribution 
(a set of mutually independent dependency trees, see 
Fig. 45.2c). This class of models is even more general 
than the class of dependency trees, because any forest 
that contains two or more disjoint trees cannot be gen- 
erally represented by a tree. As a measure to determine 
whether to connect two variables, BMDA uses a Pear- 
son’s chi-square test [45.72]. This measure is also used 
to discriminate the remaining dependencies in order to 
construct the final model. To learn a model, BMDA uses 
a variant of Prim’s algorithm [45.70]. 

Pairwise models capture some interactions in 
a problem with reasonable computational overhead. 
EDAs with pairwise probabilistic models can identify, 
propagate, and juxtapose partial solutions of order 2, 
and therefore they work well on problems decompos- 
able into subproblems of order at most two [45.35, 36, 
40, 65, 73]. Nonetheless, capturing only some pairwise 
interactions has still been shown to be insufficient for 
solving all decomposable problems of bounded diffi- 
culty scalably [45.40, 73]. 




Multivariate Interactions 
Using general multivariate models allows powerful 
EDAs capable of solving problems of bounded diffi- 
culty quickly, accurately, and reliably [45.4, 5, 8, 22, 
37]. On the other hand, learning distributions with mul- 
tivariate interactions necessitates more complex model- 
learning algorithms that require significant computa- 
tional time and still do not guarantee global optimality 
of the resulting model. Nonetheless, many difficult 
problems are intractable using simple models and the 
use of complex models and algorithms is necessary. 

The FDA [45.74] uses a fixed factorized distribution 
throughout the whole run. The model is allowed to con- 
tain multivariate marginal and conditional probabilities, 
but FDA learns only the probabilities, not the structure 
(dependencies and independencies). To solve a problem 
using FDA, we must first decompose the problem and 
then factorize the decomposition. While it is useful to 
incorporate prior information about the regularities in 
the search space, FDA necessitates that the practitioner 
is able to decompose the problem using a probabilistic 
model ahead of time. FDA does not learn what statistics 
are important to process within the EDA framework, it 
must be given that information in advance. A variant of 
FDA where probabilistic models are restricted to poly- 
trees was also proposed [45.46]. 

The ECGA [45.45] uses an MPM that partitions 
the variables into disjoint subsets (Fig. 45.2d). Each 
partition (subset) is treated as a single variable and dif- 
ferent partitions are considered to be mutually indepen- 
dent. To decide between alternative MPMs, ECGA uses 
a variant of the minimum description length (MDL) 
metric [45.75-77], which favors models that allow 
higher compression of data (in this case, the selected set 
of promising solutions). More specifically, the Bayesian 
information criterion (BIC) [45.78] is used. To find 
a good model, ECGA uses a greedy algorithm that 
starts with each variable forming one partition (like in 
UMDA). Each iteration of the greedy algorithm merges 
two partitions that maximize the improvement of the 
model with respect to BIC. If no more improvement 
is possible, the current model is used. ECGA provides
a robust and scalable solution for problems that can be
decomposed into independent subproblems of bounded
order (separable problems) [45.79-81]. However, many
real-world problems contain overlapping dependencies, 
which cannot be accurately modeled by dividing the 
variables into disjoint partitions; this can result in poor 
performance of ECGA. 

The dependency-structure matrix genetic algorithm 
(DSMGA) [45.82-84] uses a similar class of models as 


ECGA that splits the variables into independent clusters 
or linkage groups. However, DSMGA builds models via 
dependency structure matrix clustering techniques. 

The BOA [45.42] builds a Bayesian network for the 
population of promising solutions (Fig. 45.2e) and sam- 
ples the built network to generate new candidate solu- 
tions. BOA uses the Bayesian—Dirichlet metric subject 
to a maximum model-complexity constraint [45.85- 
87] to discriminate competing models, but other met- 
rics (such as BIC) have been analyzed in BOA as 
well [45.88]. In all variants of BOA, the model is con- 
structed by a greedy algorithm that iteratively adds 
a new dependency in the model that maximizes the 
model quality. Other elementary graph operators — such 
as edge removals and reversals — can be incorporated, 
but edge additions are most important. The construction 
is terminated when no more improvement is possible. 
The greedy algorithm used to learn a model in BOA is 
similar to the one used in ECGA. However, Bayesian 
networks can encode more complex dependencies and 
independencies than models used in ECGA. Therefore, 
BOA is also applicable to problems with overlapping 
dependencies. BOA uses an equivalent class of mod- 
els as FDA; however, BOA learns both the structure 
and the probabilities of the model. Although BOA 
does not require problem-specific knowledge in ad- 
vance, prior information about the problem can be 
incorporated using Bayesian statistics, and the relative 
influence of prior information and the population of 
promising solutions can be tuned by the user [45.89, 
90]. 

A discussion of the use of Bayesian networks 
as an extension to tree models can also be found 
in Baluja’s and Davies’ work [45.91]. An EDA that 
uses Bayesian networks to model promising solutions 
was independently developed by Etxeberria and Lar-
rañaga [45.41], who called it the EBNA. Mühlenbein
and Mahnig [45.37] improved the original FDA by
using Bayesian networks together with the greedy al-
gorithm for learning the networks described above; the
modification of FDA was named the learning FDA (LFDA). An incre-
mental version of BOA, the incremental BOA (iBOA),
was proposed by Pelikan et al. [45.92]. 

The hierarchical BOA (hBOA) [45.44] extends 
BOA by employing local structures to represent lo- 
cal distributions instead of using standard conditional 
probability tables. This enables hBOA to more effi- 
ciently represent distributions with high-order interac- 
tions. Furthermore, hBOA incorporates a niching tech- 
nique called restricted tournament selection [45.93] to 
ensure effective diversity preservation. The two exten- 


sions enable hBOA to solve problems decomposable 
into subproblems of bounded order over a number of 
levels of difficulty of a hierarchy [45.44, 94]. 

Markov networks are yet another class of models 
that can be used to identify and use multivariate in- 
teractions in EDAs. Markov networks are undirected 
graphical models (Fig. 45.2f). Compared to Bayesian 
networks, Markov networks may sometimes cover the 
same distribution using fewer edges in the dependency 
model, but the sampling of these models becomes 
more complicated than the sampling of Bayesian net- 
works. Markov networks are used, for example, in the 
Markov network EDA (MN-EDA) [45.95] and the den- 
sity estimation using Markov random fields algorithm 
(DEUM) [45.96, 97]. 

Helmholtz machines used in the Bayesian evolu- 
tionary algorithm proposed by Zhang and Shin [45.98] 
can also encode multivariate interactions. Helmholtz 
machines encode interactions by introducing new, hid- 
den variables, which are connected to every variable. 

EDAs that use models capable of covering multi- 
variate interactions can solve a wide range of prob- 
lems in a scalable manner; promising results were 
reported on a broad range of problems, including 
several classes of spin-glass systems [45.22, 99-101], 
graph partitioning [45.90, 102, 103], telecommunica- 
tion network optimization [45.104], silicon cluster op- 
timization [45.80], scheduling [45.105], forest man- 
agement [45.13], ground water remediation system 
design [45.106, 107], multiobjective knapsack [45.20], 
and others. 


45.3.2 EDAs for Real-Valued Vectors 


There are two basic approaches to extending EDAs for 
discrete, fixed-length strings to other domains such as 
real-valued vectors: 


• Map the other representation to the domain of fixed-
length discrete strings, solve the discrete problem,
and map the solution back to the problem’s original
representation.

• Extend or modify the class of probabilistic models
to other domains. 


A number of studies have been published about the 
mapping of real-valued representations into a discrete 
one in evolutionary computation [45.26, 108—111]; this 
section focuses on EDAs from the second category. The 
approaches are classified along the lines presented in 
Sect. 45.2 [45.7, 22]. 


Single-Peak Normal Distributions 

The stochastic hill climbing with learning by vec- 
tors of normal distributions (SHCLVND) [45.38] is 
a straightforward extension of PBIL to vectors of 
real-valued variables using a normal distribution to 
model each variable. SHCLVND replaces the popula-
tion of real-valued solutions by a vector of means
$\mu = (\mu_1, \ldots, \mu_n)$, where $\mu_i$ denotes the mean of the distribu-
tion for the $i$th variable. The same standard deviation $\sigma$
is used for all variables. See Fig. 45.4a for an example
model. In each generation (iteration), a random set of
solutions is first generated according to $\mu$ and $\sigma$. The
best solution out of this subset is then used to update
the entries in $\mu$ by shifting each $\mu_i$ toward the value
of the $i$th variable in the best solution using an update
rule similar to the one used in PBIL. Additionally, each
generation reduces the standard deviation to make the 
future exploration of the search space narrower. A sim- 
ilar algorithm was independently developed by Sebag 
and Ducoulombier [45.39], who also discussed several 
approaches to evolving a standard deviation for each 
variable. 
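
A minimal sketch of this scheme is given below. The initial mean of
zero, the geometric decay of the standard deviation, and all parameter
values are assumptions made for illustration; the source states only
that the deviation is reduced in each generation.

```python
import random

def shclvnd(fitness, n, s=20, lam=0.1, sigma=1.0, decay=0.99,
            iters=500, seed=0):
    """Sketch of a PBIL-like update on a vector of means; returns (mu, sigma)."""
    rng = random.Random(seed)
    mu = [0.0] * n                     # assumed initial means
    for _ in range(iters):
        # Sample s solutions; the same sigma is used for all variables.
        samples = [[rng.gauss(m, sigma) for m in mu] for _ in range(s)]
        best = max(samples, key=fitness)
        # Shift each mean toward the best solution, as in PBIL.
        mu = [m + lam * (x - m) for m, x in zip(mu, best)]
        sigma *= decay                 # assumed schedule: geometric reduction
    return mu, sigma
```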


Mixtures of Normal Distributions 
The probability density function of a normal distribu- 
tion is centered around its mean and decreases exponen- 
tially with square distance from the mean. If there are 
multiple clouds of values, a normal distribution must ei- 
ther focus on only one of these clouds, or it can embrace 
multiple clouds at the expense of including the low- 
density area between them. In both cases, the resulting 
distribution cannot model the data accurately. One way 
of extending standard single-peak normal-distribution 
models to enable coverage of multiple groups of sim- 
ilar points is to use a mixture of normal distributions. 
Each component of the mixture of normal distributions 
is a normal distribution by itself. A coefficient is spec- 
ified for each component of the mixture to denote the 
probability that a random point belongs to this compo- 
nent. The probability density function of a mixture is 
thus computed by multiplying the density function of 
each mixture component by the probability that a ran- 
dom point belongs to the component, and adding these 
weighted densities together. 
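
The weighted sum just described can be written down directly. In the
sketch below, a mixture is a list of (weight, mean, standard deviation)
triples whose weights sum to 1; this representation and the function
names are assumptions.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a single normal distribution at x."""
    return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

def mixture_pdf(x, components):
    """Weighted sum of the component densities."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu, sigma in components)

def sample_mixture(components, rng):
    """Pick a component by its weight, then sample from that component."""
    weights = [w for w, _, _ in components]
    _, mu, sigma = rng.choices(components, weights=weights, k=1)[0]
    return rng.gauss(mu, sigma)
```

With two well-separated components, for instance
[(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)], the density is high near both means
and low in the region between them, which a single normal distribution
cannot reproduce.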

Gallagher et al. [45.112, 113] extended EDAs
based on single-peak normal distributions by using an 
adaptive mixture of normal distributions to model each 
variable. The parameters of the mixture (including the 
number of components) evolve based on the discov- 
ered promising solutions. Using mixture distributions 
is a significant improvement compared to single-peak 




normal distributions, because mixtures allow simulta- 
neous exploration of multiple basins of attraction for 
each variable. 

Within the IDEA framework, Bosman and 
Thierens [45.47] proposed IDEAs using the joint 
normal kernels distribution, where a single normal 
distribution is placed around each selected solution 
(Fig. 45.4d). A joint normal kernels distribution 
can be therefore seen as an extreme use of mixture 
distributions with one mixture component per point 
in the training sample. The variance of each normal 
distribution can be fixed to a relatively small value, but 
it should be preferable to adapt variances according to 
the current state of search. Using kernel distributions 
corresponds to using a fixed zero-mean normally 
distributed mutation for each promising solution as 
is often done in evolution strategies [45.114]. That 
is why it is possible to directly take up strategies for 
adapting the variance of each kernel from evolution 
strategies [45.114-117]. 
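
Sampling from a normal kernels distribution with a fixed variance can be
sketched as follows; the function name and the use of one shared
standard deviation for all kernels and variables are assumptions.

```python
import random

def sample_normal_kernels(selected, sigma, rng):
    """Pick one selected solution uniformly and perturb it with
    zero-mean normal noise of standard deviation sigma."""
    center = rng.choice(selected)
    return [rng.gauss(c, sigma) for c in center]
```

Seen this way, each new point is a mutated copy of a promising solution,
which is the connection to evolution strategies noted above.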


Joint Normal Distributions and Their Mixtures 
What changes when instead of fitting each variable 
with a separate normal distribution or a mixture of nor- 
mal distributions, groups of variables are considered 
together? Let us first consider using a single-peak nor- 
mal distribution. In multivariate domains, a joint normal 
distribution can be defined by a vector of n means 
(one mean per variable) and a covariance matrix of 
size $n \times n$. Diagonal elements of the covariance matrix
specify the variances for all variables, whereas nondi- 
agonal elements specify linear dependencies between 
pairs of variables. Considering each variable separately 
corresponds to setting all nondiagonal elements in a co- 
variance matrix to 0. Using different deviations for 
different variables allows for squeezing or stretching 
the distribution along the axes. On the other hand, us- 
ing nondiagonal entries in the covariance matrix allows 
rotating the distribution around its mean. Figure 45.4b 
and c illustrates the difference between a joint normal 
distribution using only diagonal elements of the covari- 
ance matrix and a distribution using the full covariance 
matrix. Therefore, using a covariance matrix introduces 
another degree of freedom and improves the expressive- 
ness of a distribution. Again, one can use a number 
of joint normal distributions in a mixture, where each 
component consists of its mean, covariance matrix, and 
weight. 
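
For two variables, estimating and sampling a joint normal distribution
with a full covariance matrix can be sketched as follows. The sketch
uses maximum likelihood estimates and a hand-written Cholesky
factorization; the function names and the restriction to two dimensions
are assumptions made to keep the example short.

```python
import math
import random

def fit_joint_normal_2d(points):
    """Maximum likelihood mean and covariance (sxx, sxy, syy) of 2-D points."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points) / m
    syy = sum((p[1] - my) ** 2 for p in points) / m
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / m
    return (mx, my), (sxx, sxy, syy)

def sample_joint_normal_2d(mean, cov, rng):
    """Sample using the Cholesky factor of [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)               # requires sxx > 0
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 ** 2, 0.0))
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)
```

Setting sxy to 0 recovers the axis-aligned model in which every variable
is sampled independently.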

A joint normal distribution including a full or 
partial covariance matrix was used within the IDEA 
framework [45.47] and in the estimation of Gaussian 


networks algorithm (EGNA) [45.48]. Both these algo- 
rithms can be seen as extensions of EDAs that model 
each variable by a single normal distribution, with the
added ability to use nondiagonal elements of the co-
variance matrix.

Bosman and Thierens [45.118] proposed mixed 
IDEAs as an extension of EDAs that use a mixture 
of normal distributions to model each variable. Mixed 
IDEAs allow multiple variables to be modeled by a sep- 
arate mixture of joint normal distributions. At one 
extreme, each variable can have a separate mixture; 
at another extreme, one mixture of joint distributions 
covering all the variables is used. Although learning
such a general class of distributions is quite difficult and 
a large number of samples is necessary for reasonable 
accuracy, good results were reported on single-objec- 
tive [45.118] as well as multiobjective problems [45.59, 
119, 120]. Using mixture models for all variables was 
also proposed as a technique for reducing model com- 
plexity in discrete EDAs [45.58]. 

Real-valued EDAs presented so far are applicable 
to real-valued optimization problems without requiring 
differentiability or continuity of the underlying prob- 
lem. However, if it is possible to at least partially 
differentiate the problem, gradient information can be 
used to incorporate some form of gradient-based local 
search and the performance of real-valued EDAs can 
be significantly improved. A study on combining real- 
valued EDAs within the IDEA framework with gra- 
dient-based local search can be found, for example, 
in [45.121]. 

One of the crucial limitations of using estimation 
of real-valued distributions is that real-valued EDAs 
have a tendency to lose diversity too fast even when the 
problem is relatively easy to solve [45.122]; for exam- 
ple, maximum likelihood estimation and sampling of 
a normal distribution will lead to diversity loss even 
while climbing a simple linear slope. That is why sev- 
eral EDAs were proposed that aim to control variance 
of the probabilistic model so that the loss of variance is 
avoided and yet the effective exploration is not ham- 
pered by an overly large variance of the model. For 
example, the adapted maximum-likelihood Gaussian 
model iterated density-estimation evolutionary algo- 
rithm (AMaLGaM) scales up the covariance matrix 
to prevent premature convergence on slopes [45.123, 
124]. 
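
The loss of diversity on a linear slope is easy to reproduce. The sketch
below maximizes f(x) = x in one dimension using truncation selection and
maximum likelihood estimation of a normal distribution; the population
size, selection scheme, and function name are assumptions.

```python
import random
import statistics

def ml_gaussian_eda_on_slope(generations=20, pop_size=200, seed=0):
    """Return the (mean, standard deviation) of the model per generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = []
    for _ in range(generations):
        pop = [rng.gauss(mu, sigma) for _ in range(pop_size)]
        # Maximize f(x) = x: keep the larger half of the population.
        selected = sorted(pop, reverse=True)[: pop_size // 2]
        # Maximum likelihood estimates from the selected solutions.
        mu = statistics.mean(selected)
        sigma = statistics.pstdev(selected)
        history.append((mu, sigma))
    return history
```

Because the selected half always has a smaller spread than the
population it came from, the standard deviation shrinks in every
generation and the mean stalls after a short distance, although the
slope never ends.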


Other Real-Valued EDAs 
Using normal distributions is not the only approach to 
modeling real-valued distributions. Other density func- 


tions are frequently used to model real-valued proba- 
bility distributions, including histogram distributions, 
interval distributions, and others. A brief review of real- 
valued EDAs that use other than normal distributions or 
their mixtures follows. 

In the algorithm proposed by Servet et al. [45.125],
an interval $(a_i, b_i)$ and a number $z_i \in (0, 1)$ are stored
for each variable, where $z_i$ denotes the probability that
the $i$th variable lies in the upper half of $(a_i, b_i)$. Each $z_i$ is
initialized to 0.5. To generate a new candidate solution,
the value of each variable is selected randomly from the
corresponding interval. The best solution is then used to
update the value of each $z_i$. If the value of the $i$th vari-
able of the best solution is in the lower half of $(a_i, b_i)$, $z_i$ is
shifted toward 0; otherwise, $z_i$ is shifted toward 1. When
$z_i$ gets close to 0, interval $(a_i, b_i)$ is reduced to its lower
half; if $z_i$ gets close to 1, interval $(a_i, b_i)$ is reduced to
its upper half.
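
The interval-halving scheme can be sketched as follows. Several details
are assumptions made for illustration: z[i] is treated as the
probability of sampling from the upper half of the interval (which
matches the update direction described above), the shift uses a
PBIL-like learning rate lam, and z[i] is reset to 0.5 after each
halving.

```python
import random

def interval_eda(fitness, bounds, s=10, lam=0.1, eps=0.1, iters=300, seed=0):
    """Sketch of an interval-based EDA; returns the final intervals."""
    rng = random.Random(seed)
    lo = [a for a, _ in bounds]
    hi = [b for _, b in bounds]
    z = [0.5] * len(bounds)

    def sample():
        x = []
        for a, b, zi in zip(lo, hi, z):
            mid = (a + b) / 2
            # Assumption: choose the upper half with probability z_i.
            x.append(rng.uniform(mid, b) if rng.random() < zi
                     else rng.uniform(a, mid))
        return x

    for _ in range(iters):
        best = max((sample() for _ in range(s)), key=fitness)
        for i, xi in enumerate(best):
            mid = (lo[i] + hi[i]) / 2
            target = 1.0 if xi >= mid else 0.0
            z[i] += lam * (target - z[i])      # shift z_i toward 0 or 1
            if z[i] < eps:                     # reduce to the lower half
                hi[i], z[i] = mid, 0.5
            elif z[i] > 1 - eps:               # reduce to the upper half
                lo[i], z[i] = mid, 0.5
    return list(zip(lo, hi))
```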

EDAs proposed in [45.47, 126] use empirical his- 
tograms to model each variable as opposed to using 
a single normal distribution or a mixture of normal dis- 
tributions. In these approaches, a histogram for each 
single variable is constructed. New points are then gen- 
erated according to the distribution encoded by the 
histograms for all variables. The sampling of a his- 
togram proceeds by first selecting a particular bin based 
on its relative frequency, and then generating a ran- 
dom point from the interval corresponding to the bin. 
It is straightforward to replace the histograms in the 
above methods by various classification and discretiza- 
tion methods of statistics and machine learning (such as 
k-means clustering) [45.108]. 
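
Building and sampling an equal-width histogram for a single variable can
be sketched as follows; the function names and the fixed range [lo, hi]
are assumptions.

```python
import random

def build_histogram(values, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Values equal to hi are placed in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def sample_histogram(counts, lo, hi, rng):
    """Select a bin by relative frequency, then a uniform point inside it."""
    width = (hi - lo) / len(counts)
    b = rng.choices(range(len(counts)), weights=counts, k=1)[0]
    return rng.uniform(lo + b * width, lo + (b + 1) * width)
```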

Pelikan et al. [45.111,127] use an adaptive map- 
ping from the continuous domain to the discrete one 
in combination with discrete EDAs. The population 
of promising solutions is first discretized using equal- 
width histograms, equal-height histograms, k-means 
clustering, or other classification techniques. A popu- 
lation of promising discrete solutions is then selected. 
New points are created by applying a discrete recombi- 
nation operator to the selected population of promising 
discrete solutions. For example, new solutions can be 
generated by building and sampling a Bayesian net- 
work like in BOA. The resulting discrete solutions are 
then mapped back into the continuous domain by sam- 
pling each class (a bin or a cluster) using the original 
values of the variables in the selected population of con- 
tinuous solutions (before discretization). The resulting 
solutions are perturbed using one of the adaptive mu- 
tation operators of evolution strategies [45.114-117]. 
In this way, competent discrete EDAs can be com- 


bined with advanced methods based on adaptive local 
search in the continuous domain. A related approach 
was proposed by Chen and Chen [45.109], who pro- 
pose a split-on-demand adaptive discretization method 
to use in combination with ECGA and report promis- 
ing results on several benchmarks and one real-world 
problem. 

The mixed Bayesian optimization algorithm 
(mBOA) developed by Ocenasek and Schwarz [45.56] 
models vectors of real-valued variables using an 
extension of Bayesian networks with local structures. 
A model used in mBOA consists of a decision tree for 
each variable. Each internal node in the decision tree 
for a variable is a test on the value of another variable. 
Each test on a variable is specified by a particular value, 
which is also included in the node. The test considers 
two cases: the value of the variable is greater than or
equal to the value in the node, or it is smaller. Each internal
node has two children, each child corresponding to 
one of the two results of the test specified in this node. 
Leaves in a decision tree thus correspond to rectangular 
regions in the search space. For each leaf, the decision 
tree for the variable specifies a single-variable mixture 
of normal distributions centered around the values 
of this variable in the solutions consistent with the 
path to the leaf. Thus, for each variable, the model in 
mBOA divides the space reduced to other variables 
into rectangular regions, and it uses a single-variable 
normal kernels distribution to model the variable in 
each region. The adaptive variant of mBOA (am- 
BOA) [45.128] extends mBOA by employing variance 
adaptation with the goal of maximizing effective- 
ness of the search for the optimum on real-valued 
problems. 


45.3.3 EDAs for Genetic Programming 


In genetic programming [45.129], the task is to solve 
optimization problems with candidate solutions repre- 
sented by labeled trees that encode computer programs 
or symbolic expressions. Internal nodes of a tree repre- 
sent functions or commands; leaves represent functions 
with no arguments, variables, and constants. There are 
two key challenges that one must deal with when apply- 
ing EDAs to genetic programming. Firstly, the length 
of programs is expected to vary and it is difficult to es- 
timate how large the solution will be without solving 
the problem first. Secondly, small changes in parent- 
child relationships often lead to large changes in the 
performance of a candidate solution, and often the re- 
lationship between nodes in the program trees is more 




important than their actual position. Despite these chal- 
lenges, even in this problem domain, EDAs have been 
quite successful. In this section, we briefly outline some 
EDAs for genetic programming. 

The probabilistic incremental program evolution 
(PIPE) algorithm [45.130,131] uses a probabilistic 
model in the form of a tree of a specified maximum al- 
lowable size. Nodes in the model specify probabilities 
of functions and terminals. PIPE does not capture any 
interactions between the nodes in the model. The model 
is updated by adjusting the probabilities based on the 
population of selected solutions using an update rule 
similar to the one in PBIL [45.1]. New program trees 
are generated in a top-down fashion starting in the root 
and continuing to lower levels of the tree. More specif- 
ically, if the model generates a function in a node and 
that function requires additional arguments, the succes- 
sors (children) of the node are generated to form the 
arguments of the function. If a terminal is generated, the 
generation along this path terminates. An extension of 
PIPE named hierarchical probabilistic incremental pro- 
gram evolution (H-PIPE) was later proposed [45.132]. 
In H-PIPE, nodes of a model are allowed to contain sub- 
routines, and both the subroutines as well as the overall 
program are evolved. 
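
The top-down generation of program trees can be sketched as follows.
The instruction set, the class and function names, and the rule that
forces a terminal at the depth limit are hypothetical choices made for
illustration; the probability update is omitted.

```python
import random

# Hypothetical instruction set: symbol -> number of arguments.
ARITY = {"+": 2, "*": 2, "x": 0, "1": 0}

class ModelNode:
    """One node of the probabilistic model tree."""
    def __init__(self):
        self.probs = {sym: 1.0 / len(ARITY) for sym in ARITY}
        self.children = {}         # argument index -> ModelNode, made on demand

def sample_program(node, rng, depth=0, max_depth=4):
    """Sample a program as a nested list [symbol, arg1, arg2, ...]."""
    symbols = list(node.probs)
    if depth >= max_depth:         # assumption: force a terminal at the limit
        symbols = [s for s in symbols if ARITY[s] == 0]
    sym = rng.choices(symbols, weights=[node.probs[s] for s in symbols], k=1)[0]
    program = [sym]
    for k in range(ARITY[sym]):    # a terminal has arity 0, so the path ends
        child = node.children.setdefault(k, ModelNode())
        program.append(sample_program(child, rng, depth + 1, max_depth))
    return program

def tree_depth(program):
    """Number of levels in a sampled program tree."""
    return 1 + max((tree_depth(arg) for arg in program[1:]), default=0)
```

Each model node draws its symbol independently of the other nodes, which
reflects the statement that PIPE does not capture interactions between
nodes.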

Handley [45.133] used tree probabilistic models to 
represent populations of programs (trees) in genetic 
programming. Although the goal of this work was to 
compress the population of computer programs in ge- 
netic programming, Handley’s approach can be used 
within the EDA framework to model and sample can- 
didate solutions represented by computer programs or 
symbolic expressions. A similar model was used in es- 
timation of distribution programming (EDP) [45.134], 
which extended PIPE by employing parent-child de- 
pendencies in candidate labeled trees. Specifically, in 
EDP the content of each node is conditioned on the 
node’s parent. 

The extended compact genetic programming 
(ECGP) [45.135] assumes a maximum tree of max- 
imum branching like PIPE. Nonetheless, ECGP 
uses an MPM which partitions nodes into clusters 
of strongly correlated nodes. This allows ECGP to 
capture and exploit interactions between nodes in 
program trees, and solve problems that are difficult for 
conventional genetic programming and PIPE. There 
are three main characteristics that distinguish ECGP from
EDP. ECGP is able to capture dependencies between 
more than two nodes, it learns the dependency structure 
based on the promising candidate trees, and it is not 
restricted to the dependencies between parents and 


their children. On the other hand, ECGP is somewhat 
limited in its ability to efficiently encode long-range 
interactions compared to probabilistic models that 
do not assume that groups of variables must be fully 
independent of each other. 

Looks et al. [45.136] proposed using Bayesian
networks to model and sample program trees. Com- 
binatory logic is used to represent program trees in 
a unified manner. Program trees translated with combi- 
natory logic are then modeled with Bayesian networks 
of BOA, EBNA, and LFDA. Contrary to most other 
EDAs for genetic programming presented in this sec- 
tion, in the approach of Looks et al. the size of computer 
programs is not limited, but solutions are allowed to 
grow over time. Looks later developed a more power- 
ful framework for competent program evolution using 
EDAs, which was named meta-optimizing semantic 
evolutionary search (MOSES) [45.54, 137, 138]. The 
key facets of MOSES include the division of the pop- 
ulation into demes, the reduction of the problem of 
evolving computer programs to the one of building 
a representation with tunable features (knobs), and the 
use of hierarchical BOA [45.44] or another competent 
evolutionary algorithm to model demes and sample new 
candidate program solutions. 

Several EDAs for genetic programming used proba- 
bilistic models based on grammar rules [45.51, 52, 139, 
140]. Most grammar-based EDAs for genetic programming
use a context-free grammar. The stochastic
grammar-based genetic programming (SG-GP) [45.140]
started with a fixed context-free grammar with
a default probability for each rule; the probabilities at- 
tached to the different rules were gradually adjusted 
based on the best candidate programs. The program 
evolution with explicit learning (PEEL) [45.139] used 
a probabilistic L-system with rules applicable at spe- 
cific depths and locations; the probabilities of the rules 
were adapted using a variant of ant colony optimiza- 
tion (ACO) [45.141]. Another grammar-based EDA for 
genetic programming was proposed by Bosman and de 
Jong [45.51], who used a context-free grammar that 
was initialized to a minimum stochastic context-free 
grammar and adjusted to better fit promising candidate 
solutions by expanding rules and incorporating depth 
information into the rules. Grammar model-based pro- 
gram evolution (GMPE) [45.52, 142] also uses a proba- 
bilistic context-free grammar. In GMPE, new rules are 
allowed to be created and old rules may be eliminated 
from the model. A variant of the minimum-message- 
length metric is used in GMPE to compare grammars 
according to their quality. Tanev [45.143] incorporated 


Estimation of Distribution Algorithms | 45.3 Overview of EDAs 915 


stochastic context-sensitive grammars into the gram- 
mar-guided genetic programming [45.144-146]. 


45.3.4 EDAs for Permutation Problems 


In many problems, candidate solutions are most nat- 
urally represented by permutations. This is the case, 
for example, in many scheduling or facility location 
problems. These types of problems often contain two 
specific types of features or constraints that EDAs need 
to capture. The first is the absolute position of a symbol 
in a string and the second is the relative ordering of spe- 
cific symbols. In some problems, such as the traveling- 
salesman problem, relative ordering constraints matter 
the most. In others, such as the QAP, both the relative 
ordering and the absolute positions matter. 

One approach to permutation problems is to apply 
an EDA for problems not involving permutations in 
combination with a mapping function between the EDA 
representation and the admissible permutations. For ex- 
ample, one may use the random key encoding [45.147] 
to transfer the problem of finding a good permutation 
into the problem of finding a high-quality real-valued 
vector, allowing the use of EDAs for optimization of 
real-valued vectors in solving permutation-based prob- 
lems [45.148, 149]. Random key encoding represents 
a permutation as a vector of real numbers. The permu- 
tation is defined by the reordering of the values in the 
vector that sorts the values in ascending order. The main 
advantage of using random keys is that any real-valued 
vector defines a valid permutation and any EDA capable 
of solving problems defined on vectors of real numbers 
can thus be used to solve permutation problems. However,
since such EDAs do not process the aforementioned
types of regularities in permutation problems directly,
their performance can often be poor [45.148, 150]. That
is why several EDAs were developed that aim to encode
either type of constraint for permutation problems
explicitly.
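
As a minimal sketch of the random key decoding described above, the permutation can be read off with an argsort. The function name decode_random_keys and the sample vectors are illustrative assumptions, not taken from [45.147].

```python
import numpy as np

def decode_random_keys(keys):
    """Return the permutation that sorts the keys in ascending order.

    Hypothetical helper: position k of the result holds the index of the
    k-th smallest key.
    """
    return np.argsort(keys, kind="stable")

# A hand-picked key vector (illustrative values).
keys = np.array([0.46, 0.91, 0.33, 0.75, 0.51])
perm = decode_random_keys(keys)
print(perm)  # [2 0 4 3 1]

# Any real-valued vector decodes to a valid permutation, so a real-valued
# EDA can sample keys freely (here: a plain Gaussian draw as a stand-in).
rng = np.random.default_rng(0)
sampled_perm = decode_random_keys(rng.normal(size=10))
```

The decoding is many-to-one: every key vector with the same ordering maps to the same permutation, which is one way to see why a model over the keys only indirectly captures the ordering regularities.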

To solve problems where candidate solutions are 
permutations of a string, Bengoetxea et al. [45.151] 
start with a Bayesian network model built using the 
same approach as in EBNA [45.41]. However, the sam- 
pling method is changed to ensure that only valid 
permutations are generated. This approach was shown 
to have promise in solving the inexact graph matching
problem. In much the same way, the dependency-tree
EDA (dtEDA) of Pelikan et al. [45.152] starts with
a dependency-tree model [45.35, 71] and modifies the
sampling to ensure that only valid permutations are
generated. dtEDA for permutation problems was used
to solve structured QAPs with great success [45.152].
Bayesian networks and tree models are capable of 
encoding both the absolute position and the relative or- 
dering constraints, although for some problem types, 
such models may turn out to be rather inefficient. 

Bosman and Thierens [45.148] extended the real- 
valued EDA to the permutation domain by storing the 
dependencies between different positions in a permu- 
tation in the induced chromosome element exchanger 
(ICE). ICE works by first using a real-valued EDA, 
which encodes permutations as real-valued vectors us- 
ing the random keys encoding. ICE extends the real- 
valued EDA by using a specialized crossover operator. 
By applying the crossover directly to permutations in- 
stead of simply sampling the model, relative ordering is 
taken into account. The resulting algorithm was shown 
to outperform many real-valued EDAs that use the ran- 
dom key encoding alone [45.148]. 

The edge-histogram-based sampling algorithm 
(EHBSA) [45.63, 153] works by creating an edge his- 
togram matrix (EHM). For each pair of symbols, EHM 
stores the probabilities that one of these symbols will 
follow the other one in a permutation. To generate new 
solutions, EHBSA starts with a randomly chosen sym- 
bol. EHM is then sampled repeatedly to generate new 
symbols in the solution, normalizing the probabilities 
based on what values have already been generated. 
EHM does not take into account absolute positions 
at all; in order to address problems in which abso- 
lute positions are important, EHBSA was extended 
to use templates [45.153]. To generate new solutions, 
first a random string from the population was picked 
as a template. New solutions were then generated by 
removing random parts of the template string and gen- 
erating the missing parts by sampling from EHM. The 
resulting algorithm was shown to be better than most 
other EDAs on the traveling salesman problem. In 
another study, the node-histogram based sampling algo- 
rithm (NHBSA) was proposed by Tsutsui et al. [45.63], 
which used a model capable of storing node frequencies 
in each position (thereby encoding absolute position 
constraints) and also used a template. 
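
The edge-histogram sampling step described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [45.63, 153]: the matrix stores directed edge counts that are normalized at sampling time, tours are treated as cyclic, a small constant eps keeps every edge reachable, and the template extension is omitted. The names build_ehm and sample_permutation are hypothetical.

```python
import numpy as np

def build_ehm(population, eps=0.01):
    """Count how often symbol b directly follows symbol a in the population.

    Assumptions: permutations are cyclic (the last symbol is followed by the
    first) and eps is added to every off-diagonal entry.
    """
    n = population.shape[1]
    ehm = np.full((n, n), eps)
    np.fill_diagonal(ehm, 0.0)
    for perm in population:
        for a, b in zip(perm, np.roll(perm, -1)):
            ehm[a, b] += 1.0
    return ehm

def sample_permutation(ehm, rng):
    """Sample one permutation, renormalizing over symbols not yet used."""
    n = ehm.shape[0]
    current = int(rng.integers(n))      # random starting symbol
    perm = [current]
    unused = np.ones(n, dtype=bool)
    unused[current] = False
    while len(perm) < n:
        weights = ehm[current] * unused  # zero out symbols already placed
        current = int(rng.choice(n, p=weights / weights.sum()))
        perm.append(current)
        unused[current] = False
    return perm

rng = np.random.default_rng(1)
population = np.array([[0, 1, 2, 3], [0, 1, 2, 3], [1, 2, 3, 0]])
ehm = build_ehm(population)
new_solution = sample_permutation(ehm, rng)
```

Because only successor statistics are stored, rotating every permutation in the population leaves the matrix unchanged, which illustrates why EHM alone carries no absolute position information.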

Zhang et al. [45.154-156] proposed to use
guided mutation to optimize both permutation
problems [45.154] and graph problems [45.156]. In
guided mutation, the parts of the solution that are to 
be modified using a stochastic neighborhood operator 
are identified by analyzing a probabilistic model of the 
population of promising candidate solutions. 
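
A minimal sketch of the idea for binary strings, assuming a univariate probability vector as the model: each position is resampled from the model with probability beta and otherwise copied from the parent. The binary encoding, the parameter name beta, and the function name guided_mutation are assumptions made for illustration; the permutation and graph variants in [45.154-156] differ in detail.

```python
import numpy as np

def guided_mutation(parent, prob_vector, beta, rng):
    """Create an offspring by mixing a parent with samples from a model.

    With probability beta a position is drawn from prob_vector (the
    probability of a 1 at that position); otherwise it is copied from
    the parent.
    """
    parent = np.asarray(parent)
    prob_vector = np.asarray(prob_vector)
    resample = rng.random(parent.size) < beta
    sampled = (rng.random(parent.size) < prob_vector).astype(parent.dtype)
    return np.where(resample, sampled, parent)

rng = np.random.default_rng(0)
parent = np.array([1, 0, 1, 0, 1, 0])
model = np.array([0.9, 0.9, 0.5, 0.5, 0.1, 0.1])
child = guided_mutation(parent, model, beta=0.3, rng=rng)
```

Small values of beta keep the offspring close to the parent, while beta = 1 reduces the operator to plain sampling from the model.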




916 Part E | Evolutionary Computation 




45.4 EDA Theory 


Along with the design and application of EDAs, the 
theoretical understanding of these algorithms has im- 
proved significantly since the first EDAs were pro- 
posed. One way to classify key areas of theoretical 
study of EDAs follows [45.3]: 


• Convergence proofs. Some of the most important
results in EDA theory focus on the number of iter- 
ations of an EDA on a particular class of problems 
or the conditions that allow EDAs to provably con- 
verge to a global optimum. The convergence time 
(number of iterations until convergence) of UMDA 
on onemax for selection methods with fixed selection
intensity was derived by Mühlenbein and
Schlierkamp-Voosen [45.157]. The convergence of
FDA on separable additively decomposable functions
(ADFs) was explored by Mühlenbein and
Mahnig [45.158], who developed an exact formula 
for convergence time when using fitness-propor- 
tionate selection. Since, in practice, fitness-propor- 
tionate selection is rarely used because of its sensi- 
tivity to some linear and many other transformations 
of the objective function, truncation selection was 
also examined and an equation was derived giv- 
ing the approximate time to convergence from the 
analysis of the onemax function. Later, Mühlenbein
and Mahnig [45.37] adapted the theoretical model 
to the class of general ADFs where subproblems 
were allowed to interact. Under the assumption 
of Boltzmann selection, theory of graphical mod- 
els was used to derive sufficient conditions for an 
FDA model so that FDA with a large enough pop- 
ulation is guaranteed to converge to a model that 
generates only the global optima. Zhang [45.159] 
analyzed stability of fixed points of limit models of 
UMDA and FDA, and showed that at least for some 
problems the chance of converging to the global 
optimum is indeed increased when using higher or- 
der models of FDA rather than only the probability 
vector of UMDA. Convergence properties of PBIL 
were studied, for example, in [45.64, 160, 161]. 

• Population sizing. The convergence proofs mentioned
above assumed infinite populations in order
to simplify calculations. However, in practice, using
an infinite population is not possible and the
choice of an adequate population size is crucial,
as it is for other population-based evolutionary
algorithms [45.162-165]. Using a population that is
too small can lead to convergence to solutions of
low quality and inability to reliably find the global
optimum. On the other hand, using a population 
that is too large can lead to an increased complex- 
ity of building and sampling probabilistic models, 
evaluating populations, and executing other EDA 
components. Similar to genetic algorithms, EDAs 
must have a population size sufficiently large to pro- 
vide an adequate initial supply of partial solutions in 
an adequate problem decomposition [45.163, 166] 
and to ensure that good decisions are made between 
competing partial solutions [45.165]. However, the 
population must also be large enough for EDAs 
to make good decisions about the presence or the 
absence of statistically significant variable interac- 
tions. To examine this topic, Pelikan et al. [45.166] 
analyzed the population size required for BOA to 
solve decomposable problems of bounded difficulty 
with uniformly and nonuniformly scaled subprob- 
lems. The results showed that the population sizes 
required grew nearly linearly with the number of 
subproblems (or problem size). The results also 
showed that the approximate number of evaluations 
grew subquadratically for uniformly scaled sub- 
problems but was quadratic on some nonuniformly 
scaled subproblems. Yu et al. [45.167] refined the 
model of Pelikan et al. [45.166] to provide a more 
accurate bound for the adequate population size in 
multivariate entropy-based EDAs such as ECGA 
and BOA, and also examined the effects of the se- 
lection pressure on the population size. Population 
sizing was also empirically analyzed in FDA by 
Mühlenbein [45.168].

• Diversity loss. Stochastic errors in sampling can
lead to a loss of diversity that may sometimes ham- 
per EDA performance. Shapiro [45.169] examined 
the susceptibility of UMDA to diversity loss and 
discussed how it is necessary to set the learning 
parameters in such a way that this does not hap- 
pen. Bosman et al. [45.170] examined diversity loss 
in EDAs for solving real-valued problems and the 
approaches to alleviating this difficulty. The re- 
sults showed that due to diversity loss some of 
the state-of-the-art EDAs for real-valued problems 
could still fail on slope-like regions in the search 
space. The authors proposed using anticipated mean 
shift (AMS) to shift the mean of new solutions each 
generation in order to effectively maintain diversity. 

• Memory complexity. Another factor of importance
in EDA problem solving is the memory required to
solve the problem. Gao and Culberson [45.171] examined
the space complexity of the FDA and BOA
on additively decomposable functions where over- 
lap was allowed between subfunctions. Gao and 
Culberson [45.171] proved that the space complex- 
ity of FDA and BOA is exponential in the problem 
size even with very sparse interaction between vari- 
ables. While these results are somewhat negative, 
the authors point out that this only shows that EDAs 
have limitations and work best when the interaction 
structure is of bounded size. Note that one way to 
reduce the memory complexity of EDAs is to use in- 
cremental EDAs, such as PBIL [45.1], cGA [45.34] 
or the incremental Bayesian optimization algorithm 
(iBOA) [45.92]. 

• Model accuracy. A number of studies examined
the accuracy of models in EDAs. Hauschild
et al. [45.172] analyzed the models generated by 
hBOA when solving concatenated traps, random ad- 
ditively decomposable problems, hierarchical traps 
and two-dimensional Ising spin glasses. The mod- 
els generated were then compared to the underlying 
problem structure by analyzing the number of spu- 
rious and correct dependencies. The results showed 
that the models corresponded closely to the struc- 
ture of the underlying problems and that the models 
did not change significantly between consecutive
iterations of hBOA. The relationship between the 
probabilistic models learned by BOA and the under- 
lying problem structure was also explored by Lima 
et al. [45.173]. One of the most important contributions
of this study was to demonstrate the dramatic
effect that selection has on spurious dependencies. 
The results showed that model accuracy was signif- 
icantly improved when using truncation selection 
compared to tournament selection. Motivated by 
these results, the authors modified the complex- 
ity penalty of BOA model building to take into 
account tournament sizes when using binary tournament
selection. Echegoyen et al. [45.174] also
analyzed the structural accuracy of the models us- 
ing EBNA on concatenated traps, two variants of 
Ising spin glass and MAXSAT. In this work, two 
variations of EBNA were compared, one that was 
given the complete model structure based on the 
underlying problem and another that learned the 
approximate structure. The authors then examined 
the probability at any generation that the models 
would generate the optimal solution. The results 
showed that it was not strictly necessary to have 
all the interactions that were in the complete model 
in order to solve the problems. Finally, the effects 
of spurious linkages on EDA performance were 
examined by Radetic and Pelikan [45.175]. The au- 
thors started by proposing a theoretical model to 
describe the effects of spurious (unnecessary) de- 
pendencies on the population sizing of EDAs. This 
model was then tested empirically on onemax and 
the results showed that while it would be expected 
that spurious dependencies would have little effect 
on population size, when niching was included the 
effects were substantial. 
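
To make the setting of the convergence and population-sizing results above concrete, the following sketch runs UMDA with truncation selection on onemax and reports the generation in which the optimum first appears. The population size, truncation ratio, and stopping rule are assumptions chosen for illustration, not values from the cited analyses.

```python
import numpy as np

def umda_onemax(n_bits=50, pop_size=200, trunc=0.5, max_gens=200, seed=0):
    """Run UMDA on onemax; return (generation of first optimum, final model).

    The model is a probability vector p, where p[i] is the probability of
    a 1 at position i. Returns None for the generation if the optimum is
    not sampled within max_gens generations.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)                  # uniform initial model
    for gen in range(1, max_gens + 1):
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)  # sample
        fitness = pop.sum(axis=1)             # onemax: number of ones
        if fitness.max() == n_bits:
            return gen, p
        n_sel = int(trunc * pop_size)
        selected = pop[np.argsort(fitness)[-n_sel:]]  # truncation selection
        p = selected.mean(axis=0)             # re-estimate the marginals
    return None, p

generation, model = umda_onemax()
```

Rerunning the sketch with a much smaller pop_size tends to drive some entries of p to exactly 0 before the optimum is found, after which those bits can never be sampled as 1 again. This is the diversity-loss and population-sizing effect discussed in the list above.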


45.5 Efficiency Enhancement Techniques for EDAs 


EDAs can solve many classes of important problems 
in a robust and scalable manner, oftentimes requiring 
only a low-order polynomial growth of the number of 
function evaluations with respect to the number of decision
variables [45.4, 5, 8, 22, 74, 166, 176]. However,
even a low-order polynomial complexity is sometimes 
insufficient for practical application of EDAs especially 
when the number of decision variables is extremely 
large, when evaluation of candidate solutions is compu- 
tationally expensive, or when there are many conflicting 
objectives to optimize. The good news is that a number 
of approaches exist that can be used to further en- 
hance efficiency of EDAs. Some of these techniques can 
be adopted from genetic and evolutionary algorithms 
with little or no change. However, some techniques 
are directly targeted at EDAs because these techniques
exploit some of the unique advantages of EDAs over
most other metaheuristics. Specifically, some efficiency 
enhancements capitalize on the facts that the use of 
probabilistic models in EDAs provides a rigorous and 
flexible framework for incorporating prior knowledge 
about the problem into optimization, and that EDAs 
provide practitioners with a series of probabilistic mod- 
els that reveal a lot of information about the problem. 
This section reviews some of the most important ef- 
ficiency enhancement techniques for EDAs with main 
focus on techniques developed specifically for EDAs. 


45.5.1 Parallelization 


One of the most straightforward approaches to speed- 
ing up any algorithm is to distribute the computation 




over a number of computational nodes so that several 
computational tasks can be executed in parallel. There 
are two main bottlenecks of EDAs that are typically 
addressed by parallelization: (1) fitness evaluation, and 
(2) model building and sampling. If fitness evaluation is 
computationally expensive, a master-slave architecture 
can be used for distributing fitness evaluations and col- 
lecting the results [45.177]. If most computational time 
is spent in model building and sampling, model building 
and sampling should be parallelized [45.4, 178, 179]. 

Many parallelization techniques and much of the 
theory can be adopted from research on parallelization 
in genetic and evolutionary algorithms [45.177]. In the 
context of EDAs, parallelization of model building was 
discussed, for example, by Ocenasek et al. [45.178-181],
who proposed the parallel BOA, mBOA and
hBOA, and by Larrañaga and Lozano [45.4] who 
parallelized model building in EBNA. One of the 
most impressive results in parallelization of EDAs was 
published by Sastry et al. [45.14, 182] who proposed 
a highly efficient, fully parallelized implementation of 
cGA to solve large-scale problems with millions to bil- 
lions of variables even with a substantial amount of 
external noise in the objective function. 
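
The master-slave scheme for fitness evaluation mentioned above can be sketched with a worker pool from the Python standard library. The fitness function and the worker count are placeholders; a thread pool is used here only to keep the example self-contained, and for CPU-bound objective functions a process pool or a cluster scheduler would typically take its place.

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_fitness(solution):
    """Placeholder objective (onemax); stands in for a costly evaluation."""
    return sum(solution)

def evaluate_population(population, fitness_fn, n_workers=4):
    """Master distributes candidates to workers and collects the results.

    Results come back in the same order as the input population.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness_fn, population))

population = [[1, 0, 1], [1, 1, 1], [0, 0, 0]]
fitness_values = evaluate_population(population, expensive_fitness)
print(fitness_values)  # [2, 3, 0]
```

Model building and sampling stay on the master in this sketch, so the speedup is bounded by the share of run time spent in fitness evaluation.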


45.5.2 Hybridization 


An optimization hybrid combines two or more optimizers
in a single procedure [45.183-185]. Typically,
a global procedure and a local procedure are combined; 
the global procedure is expected to find promising re- 
gions and the local procedure is expected to find local 
optima quickly within reasonable basins of attraction. 
Global and local search are used in concert to find good 
solutions faster and more reliably than would be possi- 
ble using either procedure alone. 

Numerous studies have proposed to combine EDAs 
with variants of local search both in the discrete 
domain [45.22, 99, 186] and in the real-valued
domain [45.187]. The main reason for combining EDAs
with local search is that by reducing the search space 
to the local optima, the structure of the problem can 
be identified more easily and the population-sizing re- 
quirements can be significantly decreased [45.22, 99]. 
Furthermore, the search reduces to the space of basins 
of attraction around each local optimum as opposed to 
the space of all admissible solutions. 

However, hybridization of EDAs is not restricted to 
the combination of an EDA with simple local search. 
As was already pointed out, probabilistic models often 
contain a lot of information about the problem. By mining
these models for information about the structure and
other properties of the problem landscape, decisions 
can be made about the nature and likely effectiveness 
of particular local search procedures and appropriate 
neighborhood structures for those procedures [45.188- 
192]. In turn, subsequent local search as well as the 
coordination of the global and local search in a hybrid 
can be managed so that excellent solutions are found 
quickly, reliably and accurately. 

There are two main approaches to the design of 
EDA-based (model-directed) hybrids with advanced 
neighborhoods: (1) belief propagation, which uses the
probabilistic model to generate the most likely
instance [45.189-191] and (2) local search with an ad- 
vanced neighborhood structure derived from an EDA 
model [45.188, 192]. However, it is important to note 
that the use of EDA models is not limited to advanced 
neighborhood structures or belief propagation, and one 
may envision the use of probabilistic models to control 
the division of time resources between the global and 
local searcher and in a number of other tasks. 

Local search based on advanced neighborhood 
structures in a hill-climbing like procedure [45.193, 
194] is strongly related to model-directed hybridization 
using EDAs, although in this approach no estimation 
of distributions takes place. The basic idea is to use 
a linkage learning approach to detect important inter- 
actions between problem variables, and then run a local 
search based on a neighborhood defined by the under- 
lying problem decomposition. 
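
The following sketch illustrates local search over a neighborhood defined by a problem decomposition. It assumes the linkage groups are already known (in practice they would come from a linkage learning step or an EDA model) and uses a concatenated trap function as a hypothetical test problem; it is not the procedure of [45.193, 194].

```python
from itertools import product

def trap(block):
    """Deceptive trap: optimum at all ones, but fewer ones scores higher otherwise."""
    u, k = sum(block), len(block)
    return k if u == k else k - 1 - u

def fitness(x, groups):
    """Sum of traps over the (assumed known) linkage groups."""
    return sum(trap([x[i] for i in g]) for g in groups)

def linkage_local_search(x, groups, fitness_fn):
    """Hill climber whose moves reassign all variables of one group at once."""
    x = list(x)
    best = fitness_fn(x)
    improved = True
    while improved:
        improved = False
        for g in groups:
            for values in product([0, 1], repeat=len(g)):
                cand = list(x)
                for i, v in zip(g, values):
                    cand[i] = v
                f = fitness_fn(cand)
                if f > best:                 # accept strict improvements only
                    x, best, improved = cand, f, True
    return x, best

groups = [[0, 1, 2], [3, 4, 5]]
solution, value = linkage_local_search([0] * 6, groups, lambda s: fitness(s, groups))
print(solution, value)  # [1, 1, 1, 1, 1, 1] 6
```

A single-bit-flip hill climber started from the all-zeros string is stuck immediately on this function, because flipping any one bit lowers the fitness; the group-wise neighborhood escapes by changing a whole block in one move.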


45.5.3 Time Continuation 


To achieve the same solution quality, one may run an 
EDA or another population-based metaheuristic with 
a large population for one convergence epoch, or run the 
algorithm with a small population for a large number 
of convergence epochs with controlled restarts between 
these epochs [45.195]. Similar tradeoffs are involved 
in the design of efficient and reliable hybrid proce- 
dures where an appropriate division of computational 
resources between the component algorithms is critical. 
The term time continuation is used to refer to the trade- 
offs involved [45.196]. 

Two important studies related to time continuation
in EDAs were published by Sastry and Goldberg
[45.197, 198]. Based on a theoretical model of an
ECGA-based hybrid, Sastry showed that under certain 
assumptions, the neighborhoods created from EDA- 
built models provide sufficient information for local 
search to succeed on its own even on classes of problems
for which local search with standard neighborhoods
performs poorly. However, in many other cases,
EDA-driven search in a hybrid with local search based 
on the adaptive neighborhood should perform better, es- 
pecially if the structure of the problem is complex and 
the problem is affected by external noise. 

One of the promising research directions related 
to time continuation in EDAs is to mine probabilistic 
models discovered by EDAs to find an optimal way 
to exploit time continuation tradeoffs, be it in an EDA 
alone or in an EDA-based hybrid. 


45.5.4 Using Prior Knowledge and Learning 
from Experience 


Prior knowledge has long been studied and used
in optimization. For example, promising partial
solutions may be used to bias the initial population of 
candidate solutions, specialized search operators can be 
designed to solve a particular class of problems, or rep- 
resentations can be biased in order to make the search 
for the optimum an easier task. However, one of the lim- 
itations of most of these approaches is that the prior 
knowledge must be incorporated by hand and the ap- 
proaches are limited to one specific problem domain. 
The use of probabilistic models provides EDAs with 
a unique framework for incorporating prior knowledge 
into optimization because of the possibility of using 
Bayesian statistics to combine prior knowledge with 
data in the learning of probabilistic models [45.23, 90, 
199-201]. Furthermore, the use of probabilistic models 
in EDAs provides a basis for learning from previ- 
ous runs in order to solve new problem instances of 
similar type with increased speed, accuracy, and reliability
[45.22-24]. Practitioners can thus incorporate
two sources of bias into EDAs: (1) prior knowledge and 
(2) information obtained from models from prior EDA 
runs on similar problems (or runs of some other algo- 
rithm); these two sources can of course be combined 
using Bayesian statistics or in another way [45.23, 
24, 90, 200]. Then, the bias can be incorporated into 
EDAs either by restricting the class of allowable mod- 
els [45.199] or by increasing scores of models that ap- 
pear to be more likely than others [45.23, 24, 90, 200]. 
For example, Hauschild et al. [45.24, 89] proposed 
to use a probability coincidence matrix to store prob- 
abilities of Bayesian-network dependencies between 
pairs of problem variables in prior hBOA runs and to 
bias the model building in hBOA on future problem instances
of similar type using the matrix. Other related
approaches were proposed [45.24, 200] that were based
on combining a distance metric on problem variables 
and the pool of models obtained in previous runs on 
problems of similar type. The use of a distance metric 
in combination with prior EDA runs is somewhat more 
broadly applicable and promises to be more useful for 
practitioners. One of the main reasons for this is that 
this approach allows the use of bias derived from prior 
runs on problems of smaller size to bias optimization of 
larger problems. Furthermore, the approach is applica- 
ble even in cases where the meaning of a variable and its 
context change significantly from one problem instance 
to another. 
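
A simplified sketch of the probability coincidence matrix idea: given the dependency structures found in prior runs, estimate how often each pair of variables was linked and use the result to restrict the edges considered in later runs. Representing each model as a set of undirected variable pairs, ignoring the generation in which a dependency appeared, and the 0.5 threshold are all assumptions made for illustration; they do not reproduce the exact procedure of the cited work.

```python
import numpy as np

def coincidence_matrix(models, n_vars):
    """Fraction of prior models that contain a dependency between i and j.

    Each model is a set of undirected pairs (i, j) with i != j.
    """
    counts = np.zeros((n_vars, n_vars))
    for edges in models:
        for i, j in edges:
            counts[i, j] += 1
            counts[j, i] += 1
    return counts / len(models)

# Dependency structures from three hypothetical prior runs on 4 variables.
prior_models = [{(0, 1), (2, 3)}, {(0, 1)}, {(0, 1), (1, 2)}]
pcm = coincidence_matrix(prior_models, n_vars=4)

# One way to apply the bias: only allow edges seen in at least half of the runs.
allowed_edges = pcm >= 0.5
```

Thresholding corresponds to restricting the class of allowable models; the alternative mentioned in the text, adjusting model scores, would use the matrix entries as soft prior weights instead of a hard mask.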


45.5.5 Fitness Evaluation Relaxation 


To reduce the number of the objective function eval- 
uations, a model of the objective function can be 
built [45.202-204]. While models of the objective function
can be created for any optimization method, EDAs
enable the use of probabilistic models for creating 
relatively complex computational models of the prob- 
lem in a fully automated manner. Specifically, if an 
advanced EDA is used that contains a complex prob- 
abilistic model, the model can be mined to provide a set 
of statistics that can be estimated for an accurate, ef- 
ficient computational model of the objective function. 
The model can then be used to replace some of the evaluations,
possibly most of them. It was shown that the
use of adequate models of the objective function can 
yield multiplicative speedups of several tens [45.202- 
204]. 
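
As a hedged illustration of the idea, the sketch below builds a surrogate from univariate statistics only: for each position and value it records the average fitness of evaluated solutions carrying that value, relative to the population mean, and predicts the fitness of a new solution by summing these contributions. Surrogates mined from multivariate models use statistics over groups of variables instead; the function names and the onemax test data are assumptions.

```python
import itertools
import numpy as np

def fit_surrogate(pop, fitness):
    """Estimate per-position fitness contributions from evaluated solutions."""
    mean_f = fitness.mean()
    n = pop.shape[1]
    contrib = np.zeros((n, 2))
    for i in range(n):
        for v in (0, 1):
            mask = pop[:, i] == v
            if mask.any():
                contrib[i, v] = fitness[mask].mean() - mean_f
    return mean_f, contrib

def predict(x, mean_f, contrib):
    """Surrogate fitness: population mean plus the summed contributions."""
    x = np.asarray(x)
    return mean_f + contrib[np.arange(len(x)), x].sum()

# Evaluate all 4-bit strings on onemax, then query the surrogate.
pop = np.array(list(itertools.product([0, 1], repeat=4)))
fitness = pop.sum(axis=1).astype(float)
mean_f, contrib = fit_surrogate(pop, fitness)
print(predict([1, 1, 1, 1], mean_f, contrib))  # 4.0
```

On this separable function with a fully enumerated population the surrogate is exact; on problems with interacting variables a univariate surrogate of this kind would be biased, which is where the richer model structure of an advanced EDA becomes useful.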


45.5.6 Incremental and Sporadic Model 
Building 


With sporadic model-building, the probabilities (pa- 
rameters) of the model are updated in every itera- 
tion, but the structure of the probabilistic model is 
rebuilt only once in every few iterations (genera- 
tions) [45.205]. Sporadic model building was shown 
to yield significant speedups that increased with prob- 
lem size, mainly because building model structure 
is the most computationally expensive part of model 
building but model structure often changes only little 
between consecutive iterations of an EDA. With incremental
model building, the model is built incrementally
starting from the structure discovered in the previous 
iteration [45.41]. This can often reduce computational 
resources required to learn an accurate model. 
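
The control flow of sporadic model building can be sketched as a schedule that flags the generations in which the structure is rebuilt. The fixed rebuild period used here is an assumption made for illustration, and the counters stand in for the actual structure-learning and parameter-estimation steps.

```python
def sporadic_schedule(n_generations, period):
    """Yield (generation, rebuild_structure) pairs for a fixed rebuild period."""
    for gen in range(n_generations):
        yield gen, gen % period == 0

structure_builds = 0
parameter_updates = 0
for gen, rebuild in sporadic_schedule(n_generations=10, period=4):
    if rebuild:
        structure_builds += 1    # expensive step: search over model structures
    parameter_updates += 1       # cheap step: re-estimate the probabilities
print(structure_builds, parameter_updates)  # 3 10
```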




45.6 Starting Points for Obtaining Additional Information 


This section provides pointers for obtaining additional 
information about EDAs. 


45.6.1 Introductory Books and Tutorials 


Numerous books and other publications exist that 
provide an introduction to EDAs and additional starting
points. The following list of references includes some 
of them: [45.2-5, 7, 8, 22, 206]. 


45.6.2 Software 


The following list includes some of the popular 
EDA implementations available online. These im- 
plementations should provide a good starting point 
for the interested reader. Entries in the list are or- 
dered alphabetically. Note that the list is not exhaus- 
tive: 


• Adapted maximum-likelihood Gaussian model iterated
density estimation evolutionary algorithm
(AMaLGaM) [45.124]:
http://homepages.cwi.nl/~bosman/source_code.php

• Bayesian optimization algorithm (BOA) [45.43];
BOA with decision graphs [45.55];
dependency-tree EDA [45.35]: http://medal-lab.org/

• Demos of aggregation pheromone system
(APS) [45.207] and histogram-based EDAs for
permutation-based problems (EHBSA) [45.63]:
http://www.hannan-u.ac.jp/~tsutsui/research-e.html

• Distribution estimation using Markov random
fields (DEUM) [45.96, 97]:
http://sidshakya.com/Downloads/Main.html

• Extended compact genetic algorithm [45.45],
χ-ary ECGA, BOA [45.43], BOA with decision
trees/graphs [45.55], and others: http://illigal.org/

• Mixed BOA (mBOA) [45.56], adaptive mBOA
(amBOA) [45.128]: http://jiri.ocenasek.com/

• Probabilistic incremental program evolution (PIPE)
[45.131]: ftp://ftp.idsia.ch/pub/rafal/

• Real-coded BOA (rBOA) [45.49], multiobjective
rBOA [45.208]: http://www.evolution.re.kr/

• Regularity model based multiobjective EDA
(RM-MEDA) [45.209]; hybrid of differential
evolution and EDA [45.210]; model-based
multiobjective evolutionary algorithm (MMEA) [45.155],
and others:
http://cswww.essex.ac.uk/staff/qzhang/mypublication.htm


45.6.3 Journals 


The following journals are key venues for papers on 
EDAs and evolutionary computation, although papers 
on EDAs can be found in many other journals focusing 
on optimization, artificial intelligence, machine learn- 
ing, and applications: 


• Evolutionary Computation (MIT Press):
http://www.mitpressjournals.org/loi/evco

• Evolutionary Intelligence (Springer):
http://www.springer.com/engineering/journal/12065

• Genetic Programming and Evolvable Machines
(Springer):
http://www.springer.com/computer/ai/journal/10710

• IEEE Transactions on Evolutionary Computation
(IEEE Press):
http://ieeexplore.ieee.org/servlet/opac?punumber=4235

• Natural Computing (Springer):
http://www.springer.com/computer/theoretical+computer+science/journal/11047

• Swarm and Evolutionary Computation (Elsevier):
http://www.journals.elsevier.com/swarm-and-evolutionary-computation/


45.6.4 Conferences 


The following conferences provide the most important 
venues for publishing papers on EDAs and evolutionary 
computation, although, as with journals, papers
on EDAs are often published in other venues: 


• ACM SIGEVO Genetic and Evolutionary Computation
Conference (GECCO)

• European Workshops on Applications of Evolutionary
Computation (EvoWorkshops)

• IEEE Congress on Evolutionary Computation
(CEC)

• Main European Events on Evolutionary Computation
(EvoStar)

• Parallel Problem Solving from Nature (PPSN)

• Simulated Evolution and Learning (SEAL)




45.7 Summary and Conclusions 


EDAs are a class of stochastic optimization algorithms 
that have been gaining popularity due to their ability to 
solve a broad array of complex problems with excellent 
performance and scalability. Moreover, while many of 
these algorithms have been shown to perform well with 
little or no problem-specific information, such informa- 
tion can be used advantageously if available. 

EDAs have their roots in the fields of evolutionary
computation and machine learning. From evolutionary
computation, EDAs borrow the idea of using a population
of solutions that evolves through iterations of
selection and variation. From machine learning, EDAs
borrow the idea of learning models from data, and they
use the resulting models to guide the search for better
solutions. This approach is especially powerful because
it allows the search algorithm to adapt to the problem
being solved, which gives EDAs the potential to be
effective black-box search algorithms. Since most real-world
problems have some sort of inherent structure (as
opposed to being completely random), there is hope
that EDAs can learn such a structure, or at least parts of
it, and put that knowledge to good use in searching for
optima.
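The loop described above (sample a population, select the better
solutions, learn a model from them, sample again) can be sketched
with a very simple univariate model. The following is only a minimal
illustration in the spirit of univariate EDAs, not an algorithm
reproduced from this chapter: the names `onemax` and `umda`, the
parameter values, the use of truncation selection, and the clamping
of probabilities are all assumptions made for brevity.

```python
import random


def onemax(bits):
    """Toy objective (assumed for illustration): the number of ones."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """Hypothetical univariate EDA on bit strings; returns (best, probs)."""
    rng = random.Random(seed)
    # Probabilistic model: one independent Bernoulli parameter per bit.
    probs = [0.5] * n_bits
    # Keep marginals away from 0 and 1 so that no bit value is lost for good.
    margin = 1.0 / n_bits
    best = None
    for _ in range(generations):
        # Variation: sample a new population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Selection: keep the n_select best candidates (truncation selection).
        population.sort(key=onemax, reverse=True)
        selected = population[:n_select]
        if best is None or onemax(selected[0]) > onemax(best):
            best = selected[0]
        # Model learning: re-estimate each marginal from the selected set.
        probs = [
            min(1.0 - margin, max(margin, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
    return best, probs


if __name__ == "__main__":
    best, probs = umda()
    print(onemax(best), [round(p, 2) for p in probs])
```

Calling `umda()` returns the best bit string found and the final
marginal probabilities. More capable EDAs keep the same loop but
replace the independent marginals with models that capture
dependencies between variables.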

Another key characteristic of EDAs, and one that
sets them apart from other metaheuristics, lies in the
fact that the sequence of probabilistic models learned
along a particular run (or a sequence of runs) yields
important information that can be exploited for other
purposes. For example, such information can be used
for building surrogate models of the objective function,
leading to significant performance speedups, for
designing effective neighborhoods for local search when
conventional neighborhoods fail, and even for learning
about characteristics of an entire class of problems that
can in turn be used to solve other instances of the same
problem class.

This chapter introduced EDAs and reviewed both
the history and the state of the art of EDA research.
The basic concepts of these algorithms were presented,
and a taxonomy was outlined based on two views:
the model decomposition and the type of local
distributions. The most popular EDAs proposed in the
literature were then surveyed according to the most
common representations for candidate solutions.
Finally, the major theoretical research areas and efficiency
enhancement techniques for EDAs were highlighted.
This chapter should be valuable both for those who
want to grasp the basic ideas of EDAs and for
those who want to have a coherent view of EDA
research.



A. Mendiburu, R. Santana, J.A. Lozano: Introduc- 
ing belief propagation in estimation of distribu- 
tion algorithms: A parallel approach, Tech. Rep. 
EHU-KAT-IK-11-07 (University of the Basque Coun- 
try, San Sebastián 2007) 

A. Ochoa, R. Hiins, M. Soto, H. Miihlenbein: 
A maximum entropy approach to sampling in 
EDA, Lect. Notes Comput. Sci. 2905, 683-690 
(2003) 

K. Sastry, D.E. Goldberg: Designing competent 
mutation operators via probabilistic model build- 
ing of neighborhoods, Genet. Evol. Comput. Conf. 
(GECCO) (2004) pp. 114-125, Also II1iGAL Rep. No. 
2004006 

D. Iclanzan, D. Dumitrescu: Overcoming hierar- 
chical difficulty by hill-climbing the building 
block structure, Genet. Evol. Comput. Conf. (2007) 
pp. 1256-1263 

P. Posik, S. Vanicek: Parameter-less local opti- 
mizer with linkage identification for determinis- 
tic order-k decomposable problems, Genet. Evol. 
Comput. Conf. (2011) pp. 577-584 

D.E. Goldberg: Using time efficiently: Genetic- 
evolutionary algorithms and the continuation 
problem, Genet. Evol. Comput. Conf. (1999) 
pp. 212-219 

D.E. Goldberg, S. Voessner: Optimizing global- 
local search hybrids, Genet. Evol. Comput. Conf. 
(1999) pp. 220-228 

K. Sastry, D.E. Goldberg: Let's get ready to rum- 
ble: Crossover versus mutation head to head, 
Genet. Evol. Comput. Conf. (GECCO) (2004) pp. 126- 
137 

K. Sastry, D.E. Goldberg: Let's get ready to rumble 
redux: Crossover versus mutation head to head on 
exponentially scaled problems, Genet. Evol. Com- 
put. Conf. (GECCO) (2007) pp. 114-125, Also IIIiGAL 
Report No. 2004006 

S. Baluja: Incorporating a priori knowledge in 
probabilistic- model based optimization. In: Scal- 
able Optimization via Probabilistic Modeling: 
From Algorithms to Applications, ed. by E. Cantú- 
Paz, M. Pelikan, K. Sastry (Springer, Berlin, Hei- 
delberg 2006) pp. 205-219 

M. Pelikan, M. Hauschild: Distance-based bias 
in model-directed optimization of additively de- 
composable problems. MEDAL Rep. No. 2012001 
(University of Missouri, St. Louis 2012) 

M. Pelikan, M. Hauschild, P.L. Lanzi: Transfer 
learning, soft distance-based bias, and the hi- 
erarchical BOA, Lect. Notes Comput. Sci. 7491, 
173-183 (2012) 

M. Pelikan, K. Sastry: Fitness inheritance in the 
Bayesian optimization algorithm, Genet. Evol. 
Comput. Conf. (2004) pp. 48-59 


927 


Sh | J Hed 


928 PartE 


Evolutionary Computation 


Sh | J Hed 


45.203 


45.204 


45.205 


45.206 


K. Sastry, M. Pelikan, D.E. Goldberg: Efficiency 
enhancement of genetic algorithms via building- 
block-wise fitness estimation, IEEE Congr. Evol. 
Comput. (2004) pp. 720-727 

K. Sastry, M. Pelikan, D.E. Goldberg: Efficiency 
enhancement of estimation of distribution algo- 
rithms. In: Scalable Optimization via Probabilistic 
Modeling: From Algorithms to Applications, ed. 
by E. Cantu-Paz, M. Pelikan, K. Sastry (Springer, 
Berlin, Heidelberg 2006) pp. 161-185 

M. Pelikan, K. Sastry, D.E. Goldberg: Sporadic 
model building for efficiency enhancement of the 
hierarchical BOA, Genet. Progr. Evol. Mach. 9(1), 
53-84 (2008) 

M. Pelikan: Probabilistic model-building genetic 
algorithms, Proc. 13th Annu. Conf. Companion 
Genet. Evol. Comput. (2011) pp. 913-940 


45.207 


45.208 


45.209 


45.210 


S. Tsutsui, M. Pelikan, A. Ghosh: Performance of 
aggregation pheromone system on unimodal and 
multimodal problems, IEEE Congr. Evol. Comput. 
(2005) pp. 880-887 

C.W. Ahn, R.S. Ramakrishna: Multiobjective real- 
coded Bayesian optimization algorithm revisited: 
Diversity preservation, Genet. Evol. Comput. Conf. 
(2007) pp. 593-600 

Q. Zhang, A. Zhou, Y. Jin: RM-MEDA: A regularity 
model-based multiobjective estimation of distri- 
bution algorithm, IEEE Trans. Evol. Comput. 12(1), 
41-63 (2008) 

H. Li, Q. Zhang: A multiobjective differential 
evolution based on decomposition for mul- 
tiobjective optimization with variable link- 
ages, Lect. Notes Comput. Sci. 4193, 583-592 
(2006) 


929 


46. Parallel Evolutionary Algorithms 


Dirk Sudholt 

Evolutionary algorithms (EAs) have given rise to many parallel
variants, fuelled by the rapidly increasing number of CPU cores
and the ready availability of computation power through GPUs
and cloud computing. A very popular approach is to parallelize
evolution in island models, or coarse-grained EAs, by evolving
different populations on different processors. These populations
run independently most of the time, but they periodically
communicate genetic information to coordinate search. Many
applications have shown that island models can speed up
computation significantly, and that parallel populations can
further increase solution diversity.

The aim of this book chapter is to give a gentle introduction
into the design and analysis of parallel evolutionary algorithms,
in order to understand how parallel EAs work, and to explain
when and how speedups over sequential EAs can be obtained.

Understanding how parallel EAs work is a challenging goal as
they represent interacting stochastic processes, whose dynamics
are determined by several parameters and design choices. This
chapter uses a theory-guided perspective to explain how key
parameters affect performance, based on recent advances on the
theory of parallel EAs. The presented results give insight into
the fundamental working principles of parallel EAs, assess the
impact of parameters and design choices on performance, and
contribute to an informed design of effective parallel EAs.

46.1 Parallel Models
  46.1.1 Master-Slave Models
  46.1.2 Independent Runs
  46.1.3 Island Models
  46.1.4 Cellular EAs
  46.1.5 A Unified Hypergraph Model for Population Structures
  46.1.6 Hybrid Models
46.2 Effects of Parallelization
  46.2.1 Performance Measures for Parallel EAs
  46.2.2 Superlinear Speedups
46.3 On the Spread of Information in Parallel EAs
  46.3.1 Logistic Models for Growth Curves
  46.3.2 Rigorous Takeover Times
  46.3.3 Maximum Growth Curves
  46.3.4 Propagation
46.4 Examples Where Parallel EAs Excel
  46.4.1 Independent Runs
  46.4.2 Offspring Populations
  46.4.3 Island Models
  46.4.4 Crossover Between Islands
46.5 Speedups by Parallelization
  46.5.1 A General Method for Analyzing Parallel EAs
  46.5.2 Speedups in Combinatorial Optimization
  46.5.3 Adaptive Numbers of Islands
46.6 Conclusions
  46.6.1 Further Reading
References


Recent years have witnessed the emergence of a huge
number of parallel computer architectures. Almost every
desktop or notebook PC, and even mobile phones,
come with several CPU cores built in. Also GPUs
have been discovered as a source of massive computation
power at no extra cost. Commercial IT solutions
often use clusters with hundreds and thousands
of CPU cores and cloud computing has become
an affordable and convenient way of gaining CPU
power.

With these resources readily available, it has be- 
come more important than ever to design algorithms 
that can be implemented effectively in a parallel architecture.
Evolutionary algorithms (EAs) are popular
general-purpose metaheuristics inspired by the natural 
evolution of species. By using operators like mutation, 
recombination, and selection, a multi-set of solutions — 
the population — is evolved over time. The hope is that 
this artificial evolution will explore vast regions of the 
search space and yet use the principle of survival of 
the fittest to generate good solutions for the problem 
at hand. Countless applications as well as theoretical 
results have demonstrated that these algorithms are ef- 
fective on many hard optimization problems. 

One of many advantages of EAs is that they are 
easy to parallelize. The process of artificial evolution 
can be implemented on parallel hardware in various 
ways. It is possible to parallelize specific operations, or 
to parallelize the evolutionary process itself. The latter 
approach has led to a variety of search algorithms called 
island models or cellular evolutionary algorithms. They 
differ from a sequential implementation in that evo- 
lution happens in a spatially structured network. Sub- 
populations evolve on different processors and good 
solutions are communicated between processors. The 
spread of information can be tuned easily via key pa- 
rameters of the algorithm. A slow spread of information 
can lead to a larger diversity in the system, hence in- 
creasing exploration. 

Many applications have shown that parallel EAs can 
speed up computation and find better solutions, com- 
pared to a sequential EA. This book chapter reviews 
the most common forms of parallel EAs. We highlight 
what distinguishes parallel EAs from sequential EAs. 
We also make an effort to understand the search dynamics
of parallel EAs. This addresses a very hot topic
since, as of today, even the impact of the most basic
parameters of a parallel evolutionary algorithm is not
well understood.

The chapter has a particular emphasis on theoretical 
results. This includes runtime analysis, or computational
complexity analysis. The goal is to estimate the
expected time until an EA finds a satisfactory solution 
for a particular problem, or problem class, by rigorous 
mathematical studies. This area has led to very fruit- 
ful results for general EAs in the last decade [46.1, 
2]. Only recently have researchers turned to investigat- 
ing parallel evolutionary algorithms from this perspec- 
tive [46.3-7]. The results help to get insight into the 
search behavior of parallel EAs and how parameters 
and design choices affect performance. The presenta- 
tion of these results is kept informal in order to make 
it accessible to a broad audience. Instead of present- 
ing theorems and complete formal proofs, we focus on 
key ideas and insights that can be drawn from these 
analyses. 

The outline of this chapter is as follows. In 
Sect. 46.1 we first introduce parallel models of evo- 
lutionary algorithms, along with a discussion of key 
design choices and parameters. Section 46.2 considers 
performance measures for parallel EAs, particularly no- 
tions for speedup of a parallel EA when compared to 
sequential EAs. 

Section 46.3 deals with the spread of information 
in parallel EAs. We review various models used to de- 
scribe how the number of good solutions increases in a 
parallel EA. This also gives insight into the time until 
the whole system is taken over by good solutions, the 
so-called takeover time. 

In Sect. 46.4 we present selected examples 
where parallel EAs were shown to outperform se- 
quential evolutionary algorithms. Drastic speedups 
were shown on illustrative example functions. This 
holds for various forms of parallelization, from in- 
dependent runs to offspring populations and island 
models. 

Section 46.5 finally reviews a general method for 
estimating the expected running time of parallel EAs. 
This method can be used to transfer bounds for a se- 
quential EA to a corresponding parallel EA, in an 
automated fashion. We go into a bit more detail here, 
in order to enable the reader to apply this method 
by her-/himself. Illustrative example applications are 
given that also include problems from combinatorial 
optimization. 

The chapter finishes with conclusions in Sect. 46.6 
and pointers to further literature on parallel evolution- 
ary algorithms. 


Parallel Evolutionary Algorithms | 46.1 Parallel Models 


46.1 Parallel Models 
46.1.1 Master-Slave Models 


There are many ways to use parallel machines.
A simple way of using parallelization is to execute
operations on separate processors. This can concern
variation operators like mutation and recombination as
well as function evaluations. In fact, it makes most
sense for function evaluations as these operations can be
performed independently and they are often among the
most expensive operations. This kind of architecture is
known as the master-slave model. One machine represents
the master and it distributes the workload for executing
operations to several other machines called slaves. It is
well suited for the creation of offspring populations as
offspring can be created and evaluated independently,
after suitable parents have been selected.

The system is typically synchronized: the master 
waits until all slaves have completed their operations 
before moving on. However, it is possible to use asyn- 
chronous systems where the master does not wait for 
slaves that take too long. 

The behavior of synchronized master-slave mod- 
els is not different from their sequential counterparts. 
The implementation is different, but the algorithm — and 
therefore search behavior — is the same. 
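
The equivalence of a synchronized master-slave system and its
sequential counterpart can be illustrated with a minimal Python
sketch. It assumes that a thread pool stands in for the slave
machines and that OneMax (the number of one-bits) stands in for
an expensive fitness function; all function names are
illustrative and not taken from the chapter.

```python
from concurrent.futures import ThreadPoolExecutor

def onemax(bits):
    """Toy fitness function (an assumption): number of one-bits."""
    return sum(bits)

def evaluate_sequential(population, fitness):
    """Evaluate all individuals one after another."""
    return [fitness(x) for x in population]

def evaluate_master_slave(population, fitness, workers=4):
    """The master hands evaluations to workers and waits for all of them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the population
        return list(pool.map(fitness, population))

population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
# Same fitness values in the same order: only the implementation differs.
assert evaluate_master_slave(population, onemax) == evaluate_sequential(population, onemax)
```

Because the master waits for every worker before continuing, the
rest of the algorithm sees exactly the same fitness values as in
a sequential run.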


46.1.2 Independent Runs 


Parallel machines can also be used to simulate differ- 
ent, independent runs of the same algorithm in parallel. 
Such a system is very easy to set up as no communication
during the runtime is required. Only after all runs
have stopped do the results need to be collected,
and the best solution (or a selection of different
high-quality solutions) is output.

Alternatively, all machines can periodically com- 
municate their current best solutions so that the system 
can be stopped as soon as a satisfactory solution has 
been found. As for master-slave models, this pre- 
vents us from having to wait until the longest run has 
finished. 

Despite their simplicity, independent runs can be quite
effective. Consider a setting where a single run of 
an algorithm has a particular success probability, i.e., 
a probability of finding a satisfactory solution within 
a given time frame. Let this probability be denoted p. 
By using several independent runs, this success prob- 
ability can be increased significantly. This approach is 
commonly known as probability amplification. 


The probability that in $\lambda$ independent runs no run is
successful is $(1-p)^\lambda$. The probability that there is at
least one successful run among these is, therefore,

$$1 - (1-p)^\lambda. \qquad (46.1)$$

Figure 46.1 illustrates this amplified success probability
for various choices of $\lambda$ and $p$.

We can see that for a small number of proces- 
sors the success probability increases almost linearly. 
If the number of processors is large, a saturation effect 
occurs. The benefit of using ever more processors de- 
creases with the number of processors used. The point 
where saturation happens depends crucially on p; for 
smaller success probabilities saturation happens only 
with a fairly large number of processors. 
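
The near-linear growth and the saturation effect can be checked
numerically. The following short Python sketch evaluates (46.1)
for two values of p; the function name and the chosen values are
illustrative assumptions.

```python
def amplified_success(p, runs):
    """Probability that at least one of `runs` independent runs succeeds, Eq. (46.1)."""
    return 1.0 - (1.0 - p) ** runs

# For small p the values grow almost linearly in the number of runs;
# for larger p they approach 1 quickly (saturation).
for p in (0.05, 0.3):
    print(p, [round(amplified_success(p, k), 3) for k in (1, 2, 5, 10, 25)])
```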

Furthermore, independent runs can be set up with 
different initial conditions or different parameters. This 
is useful to effectively explore the parameter space and 
to find good parameter settings in a short time. 


46.1.3 Island Models 


Independent runs suffer from obvious drawbacks: once
a run reaches a situation where its population has become
stuck in a difficult local optimum, it will most
likely remain stuck forever. This is unfortunate since
other runs might reach more promising regions of the
search space at the same time. It makes more sense to
establish some form of communication between the different
runs to coordinate search, so that runs that have
reached low-quality solutions can join in on the search
in more promising regions.

Fig. 46.1 Plots of the amplified success probability
$1-(1-p)^\lambda$ of a parallel system with $\lambda$ independent runs,
each having success probability $p$ (horizontal axis: number of
independent runs; vertical axis: amplified success probability)


In island models, also called distributed EAs, the 
coarse-grained model, or the multi-deme model, the 
population of each run is regarded as an island. One often
speaks of islands as subpopulations that together
form the population of the whole island model. Is- 
lands evolve independently as in the independent run 
model, for most of the time. However, periodically solu- 
tions are exchanged between islands in a process called 
migration. 

The idea is to have a migration topology, a directed
graph with islands as its nodes and directed edges
connecting two islands. At certain points in time, selected
individuals from each island are sent off to neighbored
islands, i.e., islands that can be reached by a directed
edge in the topology. These individuals are called migrants
and they are included in the target island after
a further selection process. This way, islands can communicate
and compete with one another. Islands that
get stuck in low-fitness regions of the search space can
be taken over by individuals from more successful islands.
This helps to coordinate search, focus on the
most promising regions of the search space, and use the
available resources effectively. An example of an island
model is given in Fig. 46.2. Algorithm 46.1 shows the
general scheme of a basic island model.


Fig. 46.2 Sketch of an island model with six islands and
an example topology


Algorithm 46.1 Scheme of an island model with
migration interval $\tau$

 1: Initialize a population made up of subpopulations
    or islands, $P^{(0)} = \{P_1^{(0)}, \dots, P_m^{(0)}\}$.
 2: Let $t := 1$.
 3: loop
 4:   for each island $i$ do in parallel
 5:     if $t \bmod \tau = 0$ then
 6:       Send selected individuals from island $P_i^{(t)}$
          to selected neighbored islands.
 7:       Receive immigrants $I_i^{(t)}$ from islands for
          which island $P_i^{(t)}$ is a neighbor.
 8:       Replace $P_i^{(t)}$ by a subpopulation resulting
          from a selection among $P_i^{(t)}$ and $I_i^{(t)}$.
 9:     end if
10:     Produce $P_i^{(t+1)}$ by applying reproduction
        operators and selection to $P_i^{(t)}$.
11:   end for
12:   Let $t := t + 1$.
13: end loop
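
The scheme of Algorithm 46.1 can be sketched in Python as
follows. This is a minimal illustration under several
assumptions that are not prescribed by the chapter: the fitness
function is OneMax, each island runs a simple steady-state EA
with standard bit mutation, the topology is a unidirectional
ring, and the best individual of each island is copied to its
successor (pollination), where it replaces the worst individual
if it is at least as good. The islands are simulated
sequentially rather than on separate processors.

```python
import random

def onemax(x):
    """Toy fitness function (an assumption): number of one-bits."""
    return sum(x)

def mutate(x, rng):
    """Standard bit mutation: flip each bit independently with probability 1/n."""
    n = len(x)
    return [b ^ 1 if rng.random() < 1.0 / n else b for b in x]

def island_model(n=20, islands=4, island_size=3, tau=5, generations=200, seed=1):
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(n)] for _ in range(island_size)]
            for _ in range(islands)]
    for t in range(1, generations + 1):
        if t % tau == 0:
            # Emigrants are copies of each island's best individual (pollination).
            emigrants = [list(max(pop, key=onemax)) for pop in pops]
            for i, pop in enumerate(pops):
                immigrant = emigrants[(i - 1) % islands]  # unidirectional ring
                worst = min(range(island_size), key=lambda j: onemax(pop[j]))
                if onemax(immigrant) >= onemax(pop[worst]):
                    pop[worst] = immigrant
        for pop in pops:
            # Intra-island evolution: one offspring replaces the worst if not worse.
            child = mutate(rng.choice(pop), rng)
            worst = min(range(island_size), key=lambda j: onemax(pop[j]))
            if onemax(child) >= onemax(pop[worst]):
                pop[worst] = child
    return pops

final = island_model()
print([max(onemax(x) for x in pop) for pop in final])
```

Setting `tau` larger than `generations` disables migration in
this sketch, which corresponds to independent runs.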


There are many design choices that affect the be- 
havior of such an island model: 


• Emigration policy. When migrants are sent, they
  can be removed from the sending island. Alternatively,
  copies of selected individuals can be
  emigrated. The latter is often called pollination.
  Also the selection of migrants is important.
  One might select the best, worst, or random
  individuals.

• Immigration policy. Immigrants can replace the
  worst individuals in the target population, random
  individuals, or be subjected to the same kind of
  selection used within the islands for parent selection
  or selection for replacement. Crowding mechanisms
  can be used, such as replacing the most similar
  individuals. In addition, immigrants can be recombined
  with individuals present on the island before
  selection.

• Migration interval. The time interval between migrations
  determines the speed at which information
  is spread throughout an island model. Its reciprocal
  is often called migration frequency. Frequent migrations
  imply a rapid spread of information, while rare
  migrations allow for more exploration. Note that
  a migration interval of $\infty$ yields independent runs
  as a special case.

• Number of migrants. The number of migrants, also
  called migration size, is another parameter that
  determines how quickly an island can be taken over by
  immigrants.

• Migration topology. Also the choice of the migration
  topology impacts search behavior. The topology
  can be a directed or undirected graph; after
  all, undirected graphs can be seen as special cases
  of directed graphs. Common topologies include
  unidirectional rings (a ring with directed edges
  only in one direction), bidirectional rings, torus or
  grid graphs, hypercubes, scale-free graphs [46.8],
  random graphs [46.9], and complete graphs.
  Figure 46.3 sketches some of these topologies. An
  important characteristic of a topology $T = (V, E)$
  is its diameter: the maximum number of edges on
  any shortest path between two vertices. Formally,
  $\mathrm{diam}(T) = \max_{u,v \in V} \mathrm{dist}(u, v)$, where $\mathrm{dist}(u, v)$ is
  the graph distance, the number of edges on a shortest
  path from $u$ to $v$. The diameter gives a good
  indication of the time needed to propagate information
  throughout the topology. Rings and torus
  graphs have large diameters, while hypercubes,
  complete graphs, and many scale-free graphs have
  small diameters.
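
The difference in diameter between common topologies can be made
concrete with a short Python sketch. It builds a few topologies
with 16 islands as adjacency lists and computes the diameter by
breadth-first search from every vertex; the helper names are
illustrative, and the code assumes that every island can reach
every other island.

```python
from collections import deque

def unidirectional_ring(m):
    return {v: [(v + 1) % m] for v in range(m)}

def bidirectional_ring(m):
    return {v: [(v - 1) % m, (v + 1) % m] for v in range(m)}

def complete_graph(m):
    return {v: [u for u in range(m) if u != v] for v in range(m)}

def hypercube(d):
    # Vertices are d-bit numbers; neighbors differ in exactly one bit.
    return {v: [v ^ (1 << k) for k in range(d)] for v in range(2 ** d)}

def distances_from(graph, source):
    """Breadth-first search: number of edges on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(graph):
    """Largest shortest-path distance over all ordered pairs of vertices."""
    return max(max(distances_from(graph, v).values()) for v in graph)

print(diameter(unidirectional_ring(16)))  # 15
print(diameter(bidirectional_ring(16)))   # 8
print(diameter(hypercube(4)))             # 4
print(diameter(complete_graph(16)))       # 1
```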


Island models with non-complete topologies are 
also called stepping stone models. The impact of these 
design choices will be discussed in more detail in 
Sect. 46.3. 

If all islands run the same algorithm under identical 
conditions, we speak of a homogeneous island model. 
Heterogeneous island models contain islands with dif- 
ferent characteristics. Different algorithms might be 
used, different representations, objective functions, or 
parameters. Using heterogeneous islands might be use- 
ful if one is not sure what the best algorithm is for 
a particular problem. It also makes sense in the context 
of multiobjective optimization or when a diverse set of 
solutions is sought, as the islands can reflect different 
objective functions, or variations of the same objective 
functions, with an emphasis on different criteria. 

Skolicki [46.10] proposed a two-level view of search
dynamics in island models. The term intra-island evolution
describes the evolutionary process that takes place
within each island. On a higher level, inter-island evolution
describes the interaction between different islands.
He argues that islands can be regarded as individuals in
a higher-level evolution. Islands compete with one another
and islands can take over other islands, just like
individuals can replace other individuals in a regular
population. One conclusion is that with this perspective
an island model looks more like a compact entity.

Fig. 46.3 Sketches of common topologies: a unidirectional ring,
a torus, and a complete graph. Other common topologies include
bidirectional rings where all edges are undirected and grid
graphs where the edges wrapping around the torus are removed

The two levels of evolution obviously interact with 
one another. Which level is more important is deter- 
mined by the migration interval and the other parame- 
ters of the system that affect the spread of information. 


46.1.4 Cellular EAs 


Cellular EAs represent a special case of island models
with a more fine-grained form of parallelization.
As in the island model, we have islands connected
by a fixed topology. Rings and two-dimensional torus
graphs are the most common choice. The most striking
characteristic is that each island only contains a single
individual. Islands are often called cells in this context,
which explains the term cellular EA. Each individual is
only allowed to mate with its neighbors in the topology.
This kind of interaction happens in every generation.
This corresponds to a migration interval of 1 in the context
of island models. Figure 46.4 shows a sketch of
a cellular EA. A scheme of a cellular EA is given in
Algorithm 46.2.

Fig. 46.4 Sketch of a cellular EA on a 7×7 grid graph.
The dashed line indicates the neighborhood of the highlighted
cell


Algorithm 46.2 Scheme of a cellular EA

1: Initialize all cells to form a population
   $P^{(0)} = \{P_1^{(0)}, \dots, P_m^{(0)}\}$. Let $t := 0$.
2: loop
3:   for each cell $i$ do in parallel
4:     Select a set $S_i$ of individuals from $P^{(t)}$ out of
       all cells neighbored to cell $i$.
5:     Create a set $R_i$ by applying reproduction
       operators to $S_i$.
6:     Create $P_i^{(t+1)}$ by selecting an individual
       from $\{P_i^{(t)}\} \cup R_i$.
7:   end for
8:   Let $t := t + 1$.
9: end loop
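
A synchronous variant of Algorithm 46.2 can be sketched in Python
as follows. The sketch makes assumptions that go beyond the
scheme: cells are arranged on a bidirectional ring, the fitness
function is OneMax, the set of selected individuals consists of
the best neighbor only, and a single offspring is created by
standard bit mutation. Cells are simulated one after another, but
all parents are read from the current population.

```python
import random

def onemax(x):
    """Toy fitness function (an assumption): number of one-bits."""
    return sum(x)

def cellular_ea_step(cells, neighbors, rng):
    """One synchronous generation; new individuals go to a temporary population."""
    n = len(cells[0])
    new_cells = []
    for i, current in enumerate(cells):
        # S_i: here only the best individual among the neighbors of cell i
        parent = max((cells[j] for j in neighbors[i]), key=onemax)
        # R_i: one offspring created by standard bit mutation of the parent
        child = [b ^ 1 if rng.random() < 1.0 / n else b for b in parent]
        # New content of cell i: the better of current individual and offspring
        new_cells.append(child if onemax(child) >= onemax(current) else current)
    return new_cells

def run_cellular_ea(m=16, n=20, generations=100, seed=1):
    rng = random.Random(seed)
    cells = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    ring = {i: [(i - 1) % m, (i + 1) % m] for i in range(m)}  # bidirectional ring
    for _ in range(generations):
        cells = cellular_ea_step(cells, ring, rng)
    return cells

print([onemax(x) for x in run_cellular_ea()])
```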


Cellular EAs yield a much more fine-grained sys- 
tem; they have therefore been called fine-grained mod- 
els, neighborhood models, or diffusion models. The 
difference to island models is that no evolution takes 
place on the cell itself, i.e., there is no intra-island 
evolution. Improvements can only be obtained by cells 
interacting with one another. It is, however, possible 
that an island can interact with itself. 

In terms of the two-level view on island models, 
in cellular EAs the intra-island dynamics have effec- 
tively been removed. After all, each island only contains 
a single individual. Fine-grained models are well suited 
for investigations of inter-island dynamics. In fact, the 
first runtime analyses considered fine-grained island 
models, where each island contains a single individ- 
ual [46.4, 5]. Other studies dealt with fine-grained sys- 
tems that use a migration interval larger than 1 [46.3, 6, 
7]. 

For replacing individuals the same strategies as 
listed for island models can be used. All cells can 
be updated synchronously, in which case we speak of 
a synchronous cellular EA. A common way of imple- 
menting this is to create a new, temporary population. 
All parents are taken from the current population and 
new individuals are written into the temporary popula- 
tion. At the end of the process, the current population is 
replaced by the temporary population. 

Alternatively, cells can be updated sequentially, re- 
sulting in an asynchronous cellular EA. This is likely 
to result in a different search behavior as individuals
can mate with offspring of their neighbors. Alba
et al. [46.11] define the following update strategies. The 
terms are tailored towards two-dimensional grids or 
torus graphs as they are inspired by cellular automata. 
It is, however, easy to adapt these strategies to arbitrary 
topologies: 


• Uniform choice: the next cell to be updated is chosen
  uniformly at random.

• Fixed line sweep: the cells are updated sequentially,
  line by line in a grid/torus topology.

• Fixed random sweep: the cells are updated sequentially,
  according to some fixed order. This order
  is determined by a permutation of all cells. This
  permutation is created uniformly at random during
  initialization and kept throughout the whole
  run.

• New random sweep: this strategy is like fixed random
  sweep, but after each sweep is completed a new
  permutation is created uniformly at random.
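
The four update strategies differ only in the order in which
cells are visited. The following Python sketch generates such
orders for illustration; the function names are hypothetical, and
cells are identified by indices (or by row and column for the
line sweep).

```python
import random

def uniform_choice(m, rng):
    """m updates, each cell drawn uniformly at random (repeats are possible)."""
    return [rng.randrange(m) for _ in range(m)]

def fixed_line_sweep(rows, cols):
    """Line-by-line order over a rows x cols grid, identical in every time step."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def random_permutation(m, rng):
    """A uniformly random order in which every cell appears exactly once."""
    order = list(range(m))
    rng.shuffle(order)
    return order

rng = random.Random(42)
m = 9
fixed_order = random_permutation(m, rng)    # fixed random sweep: drawn once, reused
for step in range(3):
    new_order = random_permutation(m, rng)  # new random sweep: redrawn every sweep
    print(step, fixed_order, new_order, uniform_choice(m, rng))
```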


A time step or generation is defined as the time 
needed to update m cells, m being the number of cells 
in the grid. The last three strategies ensure that within 
each time step each cell is updated exactly once. This 
yields a much more balanced treatment for all cells. 
With the uniform choice model it is likely that some
cells must wait for a long time before being updated. In 
the limit, the waiting time for updates follows a Poisson 
distribution. Consider the random number of updates 
until the last cell has been updated at least once. This 
random process is known as the coupon collector prob- 
lem [46.12, page 32], as it resembles the process of 
collecting coupons, which are drawn uniformly at ran- 
dom. A simple analysis shows that the expected number 
of updates until the last cell has been updated in the uni- 
form choice model (or all coupons have been collected) 
equals 


$$m \cdot \sum_{i=1}^{m} \frac{1}{i} \approx m \cdot \ln(m).$$

This is equivalent to

$$\sum_{i=1}^{m} \frac{1}{i} \approx \ln m$$


time steps, which can be significantly larger than 1, the 
time for completing a sweep in any given order. 
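
The coupon collector bound can be compared with a simulation of
the uniform choice model. The Python sketch below is only an
illustration; the grid size, the number of repetitions, and the
function names are assumptions.

```python
import math
import random

def expected_updates(m):
    """Expected number of uniform-choice updates until every cell is updated."""
    return m * sum(1.0 / i for i in range(1, m + 1))

def simulate_updates(m, rng):
    """Count uniform random updates until each of the m cells was updated once."""
    seen, updates = set(), 0
    while len(seen) < m:
        seen.add(rng.randrange(m))
        updates += 1
    return updates

m = 49  # for example, a 7x7 grid
rng = random.Random(0)
average = sum(simulate_updates(m, rng) for _ in range(2000)) / 2000
print(expected_updates(m), m * math.log(m), average)
```

The harmonic sum exceeds ln m by roughly 0.58 (the
Euler-Mascheroni constant), so the exact expectation and
m · ln(m) agree only up to lower-order terms.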




Cellular EAs are often compared to cellular au- 
tomata. In the context of the latter, it is common practice 
to consider a two-dimensional grid and different neighborhoods.
The neighborhood in Fig. 46.4 is called the
von Neumann neighborhood or Linear 5. It includes 
the cell itself and its four neighbors along the directions 
north, south, west, and east. The Moore neighborhood 
or Compact 9 in addition also contains the four cells 
to the north west, north east, south west, and south 
east. Also larger neighborhoods are common, contain- 
ing cells that are further away from the center cell. 

Note that using a large neighborhood on a two- 
dimensional grid is equivalent to considering a graph 
where, starting with a torus graph, for each vertex 
edges to nearby vertices have been added. We will, 
therefore, in the remainder of this chapter stick to the 
common notion of neighbors in a graph (i.e., vertices 
connected by an edge), unless there is a good reason 
not to. 
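
The two neighborhoods can be written down directly as coordinate
offsets. The following Python sketch assumes a torus, i.e., a
grid whose rows and columns wrap around; on a grid graph without
wrap-around, cells outside the grid would have to be dropped
instead. The function names are illustrative.

```python
def von_neumann(row, col, rows, cols):
    """Linear 5: the cell itself plus its north, south, west, and east neighbors."""
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return [((row + dr) % rows, (col + dc) % cols) for dr, dc in offsets]

def moore(row, col, rows, cols):
    """Compact 9: the von Neumann neighborhood plus the four diagonal cells."""
    return [((row + dr) % rows, (col + dc) % cols)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

print(von_neumann(3, 3, 7, 7))
print(moore(0, 0, 7, 7))  # wraps around the border of the torus
```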


46.1.5 A Unified Hypergraph Model 
for Population Structures 


Sprave [46.13] proposed a unified model for population
structures. It is based on hypergraphs, an extension
of graphs where edges can connect more than two ver- 
tices. We present an informal definition to focus on 
the ideas; for formal definitions we refer to [46.13]. 
A hypergraph contains a set of vertices and a collec- 
tion of hyperedges. Each hyperedge is a non-empty set 
of vertices. Two vertices are neighbored in the hyper- 
graph if there is a hyperedge that contains both vertices. 
Note that the special case where each hyperedge con- 
tains two different vertices results in an undirected 
graph. 

In Sprave’s model each vertex represents an indi- 
vidual. Hyperedges represent the set of possible parents 
for each individual. The model unifies various common 
population models: 


• Panmictic populations: for panmictic populations
  we have a set of vertices V and there is a single
  hyperedge that equals the whole vertex set.
  This reflects the fact that in a panmictic population
  each individual has all individuals as potential
  parents.

• Island models with migration: if migration is understood
  in the sense that individuals are removed, the
  set of potential parents for an individual contains all
  potential immigrants as well as all individuals from
  its own island, except for those that are being
  emigrated.

• Island models with pollination: if pollination is
  used, the set of potential parents contains all
  immigrants and all individuals on its own island.

• Cellular EAs: for each individual, the potential
  parents are its neighbors in the topology.


In the case of coarse-grained models, the hypergraph
may depend on time. More precisely, we
have different sets of potential parents when migration
is used, compared to generations without migration.
Sprave considers this by defining a dynamic population
structure: instead of considering a single, fixed
hypergraph, we consider a sequence of hypergraphs over
time.


46.1.6 Hybrid Models


It is also possible to combine several of the above
approaches. For instance, one can imagine an island
model where each island runs a cellular EA to further
promote diversity. Or one can think of hierarchical
island models where islands are island models themselves.
In such a system it makes sense that the inner-layer
island models use more frequent migrations than
the outer-layer island model. Island models and cellular
EAs can also be implemented as master-slave models
to achieve a better speedup.


46.2 Effects of Parallelization


An obvious effect of parallelization is that the computation
time can be reduced by using multiple processors.
This section describes performance measures that can
be used to define this speedup. We also consider
beneficial effects of using parallel EAs that can lead to
superlinear speedups.


46.2.1 Performance Measures 
for Parallel EAs 


The computation time of a parallel EA can be defined
in various ways. It makes sense to use wall-clock time
as the performance measure as this accounts for the
overhead by parallelization. Under certain conditions,
it is also possible to use the number of generations or
function evaluations. This is feasible if these measures
reflect the real running time in an adequate way, for
instance if the execution of a generation (or a function
evaluation) dominates the computational effort, including
the effort for coordinating different machines. It is
also feasible if one can estimate the overhead or the
communication costs separately.

We consider settings where an EA is run until a cer- 
tain goal is fulfilled. Goals can be reaching a global or 
local optimum or reaching a certain minimum fitness. In 
such a setting the goal is fixed and the running time of 
the EA can vary. This is in contrast to setups where the 
running time is fixed to a predetermined number of gen- 
erations and then the quality or accuracy of the obtained 
solutions is compared. As Alba pointed out [46.14], per- 
formance comparisons of parallel and sequential EAs 
only make sense if they reach the same accuracy. In 
the following, we focus on the former setting where the 
same goal is used. 

Still, defining speedup formally is far from trivial. 
It is not at all clear against what algorithm a parallel al- 
gorithm should be compared. However, this decision is 
essential to clarify the meaning of speedup. Not clarify- 
ing it, or using the wrong comparison, can easily yield 
misleading results and false claims. We present a taxon- 
omy inspired by Alba [46.14], restricted to cases where 
a fixed goal is given: 


• Strong speedup: the parallel run time of a parallel
algorithm is compared against the sequential run time of
the best known sequential algorithm. It was called
absolute speedup by Barr and Hickman [46.15]. This measure
captures how far parallelization can improve upon the best
known algorithms. However, it is often difficult to
determine the best sequential algorithm. Most researchers,
therefore, do not use strong speedup [46.14].

• Weak speedup: the parallel run time of an algorithm
is compared against its own sequential run time.
This gives rise to two subcases where the notion of
its own sequential run time is made precise:

– Single machine/panmixia: the parallel EA is compared
against a canonical, panmictic version of it, running on
a single machine. For instance, we might compare an island
model with m islands against an EA running a single
island. The EA run on each island is the same in both
cases.

– Orthodox: the parallel EA running on m machines is
compared against the same parallel EA running on a single
machine. This kind of speedup was called relative speedup
by Barr and Hickman [46.15].


In light of these differences, it is essential for
researchers to state which notion of speedup they use.

Having clarified the comparison, we can now define the
speedup and other measures. Let $T_m$ denote the time for
$m$ machines to reach the goal. Let $T_1$ denote the time
for a single machine, where the algorithm is chosen
according to one of the notions of speedup given above.

The idea is to consider the ratio of $T_m$ and the time
for a single machine, $T_1$, as speedup. However, as we
are dealing with randomized algorithms, $T_1$ and $T_m$ are
random variables and so the ratio of both is a random
variable as well. It makes more sense to consider the
ratio of expected times for both the parallel and the
sequential algorithm as speedup


$$S_m = \frac{E(T_1)}{E(T_m)} .$$


Note that $T_1$ and $T_m$ might have very dissimilar
probability distributions. Even when both are re-scaled
appropriately to obtain the best possible match between
the two, they might still have different shapes and
different variances. In some cases it might make sense to
consider the median or other statistics instead of the
expectation.

According to the speedup $S_m$ we distinguish the
following cases:


• Sublinear speedup: if $S_m < m$ we speak of a sublinear
speedup. This implies that the total computation time
across all machines is larger than the total computation
time of the single machine (assuming no idle times in the
parallel algorithm).

• Linear speedup: the case $S_m = m$ is known as linear
speedup. There, the parallel and the sequential algorithm
have the same total time. This outcome is very desirable
as it means that parallelization does not come at a cost.
There is no noticeable overhead in the parallel algorithm.

• Superlinear speedup: if $S_m > m$ we have a superlinear
speedup. The total computation time of the parallel
algorithm is even smaller than that of the single machine.
This case is considered in more detail in the following
section.


[Figure 46.5, panels (a) and (b): total effort for the operation (vertical axis) plotted against the number of processors m (horizontal axis, 0 to 16), with one curve for the sequential and one for the parallel algorithm]

Fig. 46.5a,b Total effort for executing an operation on a single, panmictic population of size $\mu = 100$ (sequential
algorithm) and a parallel algorithm with $m$ processors and $m$ subpopulations of size $\mu/m = 100/m$ each. The effort on
a population of size $n$ is assumed to be $n \ln n$ (a) and $n^2$ (b). Note that no overhead is considered for the parallel algorithm


Speedup is the best known measure, but not the only 
one used regularly. For the sake of completeness, we 
mention other measures. The efficiency is a normaliza- 
tion of the speedup 


$$e_m = \frac{S_m}{m} .$$


Obviously, $e_m = 1$ is equivalent to a linear speedup.
Lower efficiencies correspond to sublinear speedups,
higher ones to superlinear speedups.

Another measure is called incremental efficiency
and it measures the speedup when moving from $m - 1$
processors to $m$ processors


$$\mathrm{ie}_m = \frac{(m - 1) \cdot E(T_{m-1})}{m \cdot E(T_m)} .$$


There is also a generalized form where $m - 1$ is
replaced by $m' < m$ in the above formula. This reflects
the speedup when going from $m'$ processors to $m$
processors.
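
The measures above can be estimated from repeated runs. The following is a minimal sketch, assuming that run times to reach the fixed goal have been recorded as plain lists of numbers; the function names and the sample values are hypothetical and only serve as an illustration. Expectations are replaced by sample means.

```python
from statistics import mean


def speedup(t1_samples, tm_samples):
    """Estimate S_m = E(T_1) / E(T_m) from samples of both run times."""
    return mean(t1_samples) / mean(tm_samples)


def efficiency(t1_samples, tm_samples, m):
    """Estimate e_m = S_m / m."""
    return speedup(t1_samples, tm_samples) / m


def incremental_efficiency(t_prev_samples, tm_samples, m):
    """Estimate (m - 1) * E(T_{m-1}) / (m * E(T_m))."""
    return (m - 1) * mean(t_prev_samples) / (m * mean(tm_samples))


def classify(s_m, m, tol=1e-9):
    """Name the speedup regime; tol is an arbitrary numerical tolerance."""
    if s_m > m + tol:
        return "superlinear"
    if s_m < m - tol:
        return "sublinear"
    return "linear"


# Hypothetical measurements: one machine versus m = 4 machines.
t1 = [100, 120]
t4 = [30, 25]
s4 = speedup(t1, t4)
print(s4, efficiency(t1, t4, 4), classify(s4, 4))
```

Which algorithm produces the single-machine samples depends on the chosen notion of speedup (strong, single machine/panmixia, or orthodox); the code does not decide this.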


46.2.2 Superlinear Speedups 


At first glance superlinear speedups seem astonishing.
How can a parallel algorithm have a smaller total
computation time than a sequential counterpart? After all,
parallelization usually comes with significant overhead
that slows down the algorithm. The existence of
superlinear speedups has been discussed controversially in
the literature. However, there are convincing reasons why
a superlinear speedup might occur.

Alba [46.14] mentions physical sources as one pos- 
sible reason. A parallel machine might have more 
resources in terms of memory or caches. When moving 
from a single machine to a parallel one, the algorithm 
might — purposely or not — make use of these additional 
resources. Also, each machine might only have to deal 
with smaller data packages. It might be that the smaller 
data fits into the cache while this was not the case for 
the single machine. This can make a significant perfor- 
mance difference. 

When comparing a single panmictic population 
against smaller subpopulations, it might be easier to 
deal with the subpopulations. This holds even when the 
total population sizes of both systems are the same. In 
particular, a parallel system has an advantage if oper- 
ations need time which grows faster than linearly with 
the size of the (sub)population. 

We give two illustrative examples. Compare a single
panmictic population of size $\mu$ with $m$ subpopulations
of size $\mu/m$ each. Some selection mechanisms, like
ranking selection, might have to sort the individuals in
the population according to their fitness. In
a straightforward implementation one might use well-known
sorting algorithms such as (randomized) QuickSort,
MergeSort, or HeapSort. All of these are known to take
time $\Theta(n \ln n)$ for sorting $n$ elements, on
average. Let us disregard the hidden constant and the
randomness of randomized QuickSort and assume that the
time is precisely $n \ln n$.

Now the effort of sorting the panmictic population
is $\mu \ln \mu$. The total effort for sorting $m$
populations of size $\mu/m$ each is


$$m \cdot \frac{\mu}{m} \cdot \ln(\mu/m) = \mu \cdot \ln(\mu/m) = \mu \ln(\mu) - \mu \cdot \ln(m) .$$


So, the parallel system executes this operation faster,
with a difference of $\mu \cdot \ln(m)$ time steps in terms
of the total computation time.

This effect becomes more pronounced the more
expensive operations are used (with respect to the
population size). Assume that some selection mechanism or
diversity mechanism is used, which compares every
individual against every other one. Then the effort for
the panmictic population is roughly $\mu^2$ time steps.
However, for the parallel EA and its subpopulations the
total effort would only be


$$m \cdot (\mu/m)^2 = \mu^2/m .$$


This is faster than the panmictic EA by a factor of $m$.

The above two growth curves are actually very typ- 
ical running times for operations that take more than 
linear time. A table with time bounds for common 
selection mechanisms can be found in Goldberg and 
Deb [46.16]. Figure 46.5 shows plots for the total effort
in both scenarios for a population size of $\mu = 100$.
One can see that even with a small number of pro- 
cessors the total effort decreases quite significantly. 
To put this into perspective, most operations require 
only linear time. Also the overhead by paralleliza- 
tion was not accounted for. However, the discussion 
gives some hints as to why the execution time for 
smaller subpopulations can decrease significantly in 
practice. 
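
The two cost models can be tabulated directly. The sketch below is only an illustration under the same idealized assumptions as the text: the effort is taken to be exactly n ln n or n², no parallelization overhead is included, and subpopulation sizes μ/m are allowed to be fractional. The function names are hypothetical.

```python
import math


def n_log_n(n):
    """Idealized sorting effort on a population of size n."""
    return n * math.log(n)


def n_squared(n):
    """Idealized effort of comparing every individual with every other one."""
    return n * n


def total_effort(mu, m, cost):
    """Total effort of m subpopulations of size mu / m under a cost model."""
    return m * cost(mu / m)


mu = 100
for m in (1, 2, 4, 8, 16):
    print(m,
          round(total_effort(mu, m, n_log_n), 1),
          round(total_effort(mu, m, n_squared), 1))
```

For m = 1 the values are those of the panmictic population, so each row can be compared against the first one.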


46.3 On the Spread of Information in Parallel EAs 


In order to understand how parallel EAs work, it is
vital to get an idea of how quickly information is
propagated. The spread of information is the most
distinguishing aspect of parallel EAs, particularly dis- 
tributed EAs. This includes island models and cellular 
EAs. Many design choices can tune the speed at which 
information is transmitted: the topology, the migration 
interval, the number of migrants, and the policies for 
emigration and immigration. 


46.3.1 Logistic Models for Growth Curves 


Many researchers have turned to investigating the selec- 
tion pressure in distributed EAs in a simplified model. 
Assume that in the whole system we only have two 
types of solutions: current best individuals and worse 
solutions. No variation is used, i.e., we consider EAs 
using neither mutation nor crossover. The question is 
the following. Using only selection and migration, how 
long does it take for the best solutions to take over the 
whole system? This time, starting from a single best so- 
lution, is referred to as takeover time. 

It is strongly related to the study of growth curves: 
how the number of best solutions increases over time. 
The takeover time is the first point of time at which the 
number of best solutions has grown to the whole popu- 
lation. 


Growth curves are determined by both inter-island 
dynamics and intra-island dynamics: how quickly cur- 
rent best solutions spread in one island’s population, 
and how quickly they populate neighbored islands, un- 
til the whole topology is taken over. Both dynamics are 
linked: intra-island dynamics can have a direct impact 
on inter-island dynamics as the fraction of best indi- 
viduals can decide how many (if any) best individuals 
emigrate. 

For intra-island dynamics one can consider results
on panmictic EAs. Logistic curves have been proposed
and found to fit simulations of takeover times very well
for common selection schemes [46.16]. These curves
are defined by the following equation. If $P(t)$ is the
proportion of best individuals in the population at time
$t$, then


$$P(t) = \frac{1}{1 + \left(\frac{1}{P(0)} - 1\right) e^{-at}} ,$$


where $a$ is called the growth coefficient. One can see
that the proportion of best individuals increases
exponentially, but then the curve saturates as the
proportion approaches 1.

Sarma and De Jong [46.17] considered growth
curves in cellular EAs. They presented a detailed
empirical study of the effects of the neighborhood size
and the shape of the neighborhood for different selection
schemes. They showed that logistic curves as defined
above can model the growth curves in cellular EAs
reasonably well.

Alba and Luque [46.18] proposed a logistic model
called LOG tailored towards distributed EAs with
periodic migration. If $\tau$ denotes the migration
interval and $m$ is the number of islands, then


$$P_{\mathrm{LOG}}(t) = \sum_{i=0}^{m-1} \frac{1/m}{1 + a \cdot e^{-b(t - \tau \cdot i)}} .$$


In this model $a$ and $b$ are adjustable parameters. The
model counts subsequent increases of the proportion of
best individuals during migrations. However, it does not
include any information about the topology and the
authors admit that it only works appropriately on the ring
topology [46.19, Section 4.2]. They, therefore, present
an even more detailed model called TOP, which includes
the diameter $\operatorname{diam}(T)$ of the topology $T$:


$$P_{\mathrm{TOP}}(t) = \sum_{i=0}^{\operatorname{diam}(T)-1} \frac{1/m}{1 + a \cdot e^{-b(t - \tau \cdot i)}} + \frac{(m - \operatorname{diam}(T))/m}{1 + a \cdot e^{-b(t - \tau \cdot \operatorname{diam}(T))}} .$$


Simulations show that this model yields very accurate
fits for ring, star, and complete topologies [46.19,
Section 4.3].

Luque and Alba [46.19, Section 4.3] proceed by 
analyzing the effect of the migration interval and the 
number of migrants. With a large migration interval, 
the growth curves tend to make jumps during migration 
and flatten out quickly to form plateaus during periods 
without migration. The resulting curves look like step 
functions, and the size of these steps varies with the mi- 
gration interval. 
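
To see how the logistic models produce such step-like curves, they can be evaluated numerically. The following sketch assumes the forms P(t) = 1/(1 + (1/P(0) − 1)·e^(−at)) for the plain logistic curve, a sum of m shifted logistic terms of weight 1/m for LOG, and diam(T) such terms plus one remaining term of weight (m − diam(T))/m for TOP. The function names and the parameter values for a, b, and the migration interval are hypothetical choices for illustration, not fitted values from the literature.

```python
import math


def _shifted_term(weight, a, b, shift, t):
    """weight / (1 + a * exp(-b * (t - shift))), guarded against overflow."""
    x = -b * (t - shift)
    if x > 700.0:  # exp(x) would overflow; the term is numerically zero
        return 0.0
    return weight / (1.0 + a * math.exp(x))


def logistic(t, a, p0):
    """Plain logistic growth curve with growth coefficient a and P(0) = p0."""
    x = -a * t
    if x > 700.0:
        return 0.0
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * math.exp(x))


def p_log(t, m, tau, a, b):
    """LOG model: m logistic steps, one per migration interval tau."""
    return sum(_shifted_term(1.0 / m, a, b, tau * i, t) for i in range(m))


def p_top(t, m, tau, diam, a, b):
    """TOP model: diam steps of weight 1/m plus one step for the rest."""
    head = sum(_shifted_term(1.0 / m, a, b, tau * i, t) for i in range(diam))
    tail = _shifted_term((m - diam) / m, a, b, tau * diam, t)
    return head + tail


# Hypothetical setting: 8 islands, migration every 50 generations.
for t in range(0, 401, 50):
    print(t, round(p_log(t, 8, 50.0, 5.0, 0.5), 3),
          round(p_top(t, 8, 50.0, 4, 5.0, 0.5), 3))
```

With a large migration interval each shifted term saturates before the next one starts to grow, which yields the plateaus described above.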

Varying the number of migrants changes the slope 
of these steps. A large number of migrants has a bet- 
ter chance of transmitting best individuals than a small 
number of migrants. However, the influence of the num- 
ber of migrants was found to be less drastic than the 
impact of the migration interval. When a medium or 
large migration frequency is used, the impact of the 
number of migrants is negligible [46.19, Section 4.5]. 
The same conclusion was made earlier by Skolicki and 
De Jong [46.20]. 

Luque and Alba also presented experiments with
a model based on Sprave’s hypergraph formulation
of distributed EAs [46.13]. This model gave a better fit
than the simple logistic model LOG, but it was less
accurate than the model TOP that included the diameter.

For the sake of completeness, we also mention that 
Giacobini et al. [46.21] proposed an improved model 
for asynchronous cellular EAs, which is not based on 
logistic curves. 


46.3.2 Rigorous Takeover Times 


Rudolph [46.22, 23] rigorously analyzed takeover times
in panmictic populations, for various selection schemes.
In [46.22] he also dealt with the probability that the best
solution takes over the whole population; this is not
evident for non-elitist algorithms. In [46.23] Rudolph
considered selection schemes made elitist by undoing
the last selection in case the best solution would
become extinct otherwise. Under this scheme the expected
takeover time in a population of size $\mu$ is
$O(\mu \log \mu)$.

In [46.24] Rudolph considered spatially structured
populations in a fine-grained model. Each population
has size 1, therefore vertices in the migration topology
can be identified with individuals. Migration happens in
every generation. Assume that initially only one vertex
$i$ in the topology is a best individual. If in every
generation each non-best vertex is taken over by the best
individual in its neighborhood, then the takeover time
from vertex $i$ equals


$$\max_{j \in V} \operatorname{dist}(i, j) ,$$


where $V$ is the set of vertices and
$\operatorname{dist}(i, j)$ denotes the graph distance, the
number of edges on a shortest path from $i$ to $j$.

Rudolph defines the takeover time in a setting where
the initial best solution has the same chance of evolving
at every vertex. Then


$$\frac{1}{|V|} \sum_{i \in V} \max_{j \in V} \operatorname{dist}(i, j)$$


is the expected takeover time if, as above, best solutions
are always propagated to their neighbors with
probability 1. If this probability is lower, the expected
takeover time might be higher. The above formula still
represents a lower bound. Note that in non-elitist EAs it
is possible that all best solutions might get lost,
leading to a positive extinction probability [46.24].

Note that $\max_{j \in V} \operatorname{dist}(i, j)$ is
bounded by the diameter of the topology. The diameter is
hence a trivial lower bound on the takeover times.
Rudolph [46.24]
conjectures that the diameter is more important than the
selection mechanism used in the distributed EA. 
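
The deterministic takeover time from a vertex, its average over a uniformly random starting vertex, and the diameter can all be computed by breadth-first search. The sketch below is an illustration only; it assumes the topology is given as an adjacency dictionary (a hypothetical representation, mapping each vertex to the vertices it sends individuals to) and that every vertex can reach all others.

```python
from collections import deque


def takeover_time_from(adj, i):
    """max_j dist(i, j): rounds until all vertices are reached from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        raise ValueError("not every vertex is reachable from %r" % (i,))
    return max(dist.values())


def expected_takeover_time(adj):
    """Average of takeover_time_from over a uniformly random start vertex."""
    return sum(takeover_time_from(adj, i) for i in adj) / len(adj)


def diameter(adj):
    """Largest takeover time over all start vertices."""
    return max(takeover_time_from(adj, i) for i in adj)


def unidirectional_ring(m):
    return {v: [(v + 1) % m] for v in range(m)}


def bidirectional_ring(m):
    return {v: [(v - 1) % m, (v + 1) % m] for v in range(m)}


path = {0: [1], 1: [0, 2], 2: [1]}  # a small non-symmetric example
print(expected_takeover_time(path), diameter(path))
print(diameter(unidirectional_ring(8)), diameter(bidirectional_ring(8)))
```

On vertex-transitive topologies such as rings the average and the diameter coincide; on the path example the average is smaller because the middle vertex reaches everything in one round.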

In [46.25] the author generalizes the above argu- 
ments to coarse-grained models. Islands can contain 
larger populations and migration happens with a fixed 
frequency. In his model the author assumes that in 
each island new best individuals can only be gener- 
ated by immigration. Migration always communicates 
best individuals. Hence, the takeover time boils down 
to a deterministic time until the last island has been 
reached, plus a random component for the time until 
all islands have been taken over completely. 

Rudolph [46.25] gives tight bounds for unidirec- 
tional rings, based on the fact that each island with 
a best individual will send one such individual to each 
neighbored island. Hence, on the latter island the num- 
ber of best individuals increases by 1, unless the island 
has been taken over completely. For denser topologies
he gives a general upper bound, which may not be
tight for all graphs. If there is an island that receives
best individuals from $k > 1$ other islands, the number
of best individuals increases by $k$. (The number $k$ could
even increase over time.) It was left as an open problem
to derive tighter bounds for interesting topologies
other than unidirectional rings.

Other researchers followed up on Rudolph’s sem- 
inal work. Giacobini et al. [46.26] presented theoret- 
ical and empirical results for the selection pressure 
on ring topologies, or linear cellular EAs. Giacobini 
et al. [46.27] did the same for toroidal cellular EAs.
In particular, they considered takeover times for asyn- 
chronous cellular EAs, under various common update 
schemes. Finally, Giacobini et al. investigated growth 
curves for small-world graphs [46.9]. 


[Plot for Fig. 46.6: fraction of global optima (vertical axis) against the number of migrations (horizontal axis, 0 to 30), with one curve each for the unidirectional ring, the bidirectional ring, the 8×8 torus, the 6-dimensional hypercube, and the complete graph]

The assumption from Rudolph’s model that only 
immigration can create new best individuals is not al- 
ways realistic. If standard mutation operators are used, 
there is a constant probability of creating a clone of a se- 
lected parent simply by not flipping any bits. This can 
lead to a rapid increase in the number of high-fitness 
individuals. 

This argument on the takeover of good solutions
in panmictic populations has been studied as part of
rigorous runtime analyses of population-based EAs.
Witt [46.28] considered a simple ($\mu$+1) EA with
uniform parent selection, standard bit mutations, no
crossover, and cut selection at the end of the generation.
From his work it follows that good solutions take over
the population in expected time $O(\mu \log \mu)$. More
precisely, if currently there is at least one individual
with current best fitness $i$, then after
$O(\mu \log \mu)$ generations all individuals in the
population will have fitness at least $i$.

Sudholt [46.29, Lemma 2] extended these arguments
to a ($\mu + \lambda$) EA and proved an upper bound of
$O(\mu/\lambda \cdot \log \mu + \log \mu)$. Note that, in contrast to other
studies of takeover times, both results apply to real EAs
that actually use mutation. Extending these arguments 
to distributed EAs is an interesting topic for future 
work. 
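
The order of growth μ log μ can be made plausible with a deliberately simplified model. This model is an assumption of the sketch and not the analysis from the cited works: with k best individuals in a population of size μ, a generation increases k by one exactly when a best individual is selected as parent (probability k/μ) and the offspring is a clone (probability `clone_prob`, a hypothetical constant). All other ways of gaining or losing best individuals are ignored. The expected takeover time is then the sum of μ/(clone_prob · k) over k = 1, ..., μ − 1.

```python
import random


def expected_takeover(mu, clone_prob):
    """Sum of expected waiting times mu / (clone_prob * k), k = 1..mu-1."""
    return sum(mu / (clone_prob * k) for k in range(1, mu))


def simulate_takeover(mu, clone_prob, rng):
    """Generations until all mu individuals are best, in the simple model."""
    k = 1
    generations = 0
    while k < mu:
        generations += 1
        parent_is_best = rng.random() < k / mu
        offspring_is_clone = rng.random() < clone_prob
        if parent_is_best and offspring_is_clone:
            k += 1  # the clone replaces a worse individual
    return generations


rng = random.Random(0)  # fixed seed, chosen arbitrarily
runs = 2000
average = sum(simulate_takeover(10, 0.5, rng) for _ in range(runs)) / runs
print(round(average, 2), round(expected_takeover(10, 0.5), 2))
```

The sum equals (μ/clone_prob) times a harmonic number, which is where the logarithmic factor comes from.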


46.3.3 Maximum Growth Curves 


Now, we consider inter-island dynamics in more detail.
Assume for simplicity that intra-island takeover happens
quickly: after each migration transmitting at least one
best solution, the target island is completely taken over
by best solutions before the next migration. We start with
only one island containing a best solution, assuming that
all individuals on this island are best solutions. We call
such an island an optimal island. If migrants are not
subject to variation while emigrating or immigrating, we
will always select best solutions for migration and,
hence, successfully transmit best solutions.

Fig. 46.6 Plots of growth curves in an island model with 64 islands. We assume that in between two migrations all islands containing a current best solution completely take over all neighbored islands in the topology. Initially, one island contains a current best solution and all other islands are worse. The curves show the fraction of current best solutions in the system for different topologies: a unidirectional ring, a bidirectional ring, a square torus, a hypercube, and a complete graph

These assumptions give rise to a deterministic 
spread of best solutions: after each migration, each opti- 
mal island will turn all neighbored islands into optimal 
islands. This is very similar to Rudolph’s model [46.25], 
but it also accounts for a rapid intra-island takeover in 
between migrations. 

We consider growth curves on various graph 
classes: unidirectional and bidirectional rings, square 
torus graphs, hypercubes, and complete graphs. Fig- 
ure 46.6 shows these curves for all these graphs 
on 64 vertices. The torus graph has side lengths 
8 × 8. The hypercube has dimension 6. Each vertex
has a label made of 6 bits. All possible values for
this bit string are present in the graph. Two vertices 
are neighbored if their labels differ in exactly one 
bit. 

For the unidirectional ring, after $i - 1$ migrations
we have exactly $i$ optimal islands, if $i \le m$. The
growth curve is, therefore, linear. For the bidirectional
ring information spreads twice as fast as it can spread in
two directions. After $i - 1$ migrations we have $2i - 1$
optimal islands if $2i - 1 \le m$.

The torus allows communication in two dimensions.
After one migration there are $1 + 4 = 5$ optimal islands.
After two migrations this number is $1 + 4 + 8$, and after
three migrations it is $1 + 4 + 8 + 12$. In general, after
$i - 1$ migrations we have


$$1 + \sum_{j=1}^{i-1} 4j = 1 + 2i(i-1) = 1 + 2i^2 - 2i$$


optimal islands, as long as the optimal islands can freely
spread out in all four directions, north, south, west, and
east. At some point the ends of the region of optimal
islands will meet, i.e., the northern tip meets the
southern one and the same goes for west and east.
Afterwards, we observe regions of non-optimal islands that
constantly shrink, until all islands are optimal. The
growth curve for the torus is hence quadratic at first and
then it starts to saturate. The deterministic growth on
torus graphs was also considered in [46.30].

For the hypercube, we can without loss of generality
assume that the initial optimal island has a label
containing only zeros. After one migration all islands
whose label contains a single one become optimal. After
two migrations the same holds for all islands with
two ones, and so on. The number of optimal islands
after $i$ migrations in a $d$-dimensional hypercube (i.e.,
$m = 2^d$) is hence $\sum_{j=0}^{i} \binom{d}{j}$. This
number is close to $d^i$ during the first migrations and
then at some point starts to saturate. The complete graph
is the simplest one to analyze here as it will be
completely optimal after one migration.

These arguments and Fig. 46.6 show that the 
growth curves can depend tremendously on the mi- 
gration topology. For sparse topologies like rings or 
torus graphs, in the beginning the growth is linear or 
quadratic, respectively. This is much slower than the 
exponential growth observed in logistic curves. Further- 
more, for the ring there is no saturation; linear curves 
are quite dissimilar to logistic curves. 
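
These maximum growth curves are easy to reproduce. The sketch below is an illustration under the stated model assumption that after each migration every optimal island turns all of its neighbors into optimal islands; the adjacency-dictionary representation and the function names are hypothetical.

```python
from math import comb


def unidirectional_ring(m):
    return {v: [(v + 1) % m] for v in range(m)}


def bidirectional_ring(m):
    return {v: [(v - 1) % m, (v + 1) % m] for v in range(m)}


def torus(side):
    """Square torus; vertices are (row, column) pairs."""
    return {(r, c): [((r - 1) % side, c), ((r + 1) % side, c),
                     (r, (c - 1) % side), (r, (c + 1) % side)]
            for r in range(side) for c in range(side)}


def hypercube(d):
    """Vertices are integers whose d bits form the label."""
    return {v: [v ^ (1 << k) for k in range(d)] for v in range(2 ** d)}


def complete_graph(m):
    return {v: [u for u in range(m) if u != v] for v in range(m)}


def growth_curve(adj, start, migrations):
    """Number of optimal islands after 0, 1, ..., `migrations` migrations."""
    optimal = {start}
    counts = [1]
    for _ in range(migrations):
        optimal = optimal | {v for u in optimal for v in adj[u]}
        counts.append(len(optimal))
    return counts


topologies = {
    "unidirectional ring": (unidirectional_ring(64), 0),
    "bidirectional ring": (bidirectional_ring(64), 0),
    "8x8 torus": (torus(8), (0, 0)),
    "6-dimensional hypercube": (hypercube(6), 0),
    "complete graph": (complete_graph(64), 0),
}
for name, (adj, start) in topologies.items():
    print(name, [c / 64 for c in growth_curve(adj, start, 8)])
```

Dividing the counts by 64 gives the fraction of optimal islands, which is the quantity plotted in Fig. 46.6.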

This suggests that logistic curves might not be the 
best models for growth curves across all topologies. 
The plots by Luque and Alba [46.19, Section 4.3] show 
a remarkably good overall fit for their TOP model. 
However, this might be due to the optimal choice of 
the parameters a and b and the fact that logistic curves 
are easily adaptable to various curves of roughly sim- 
ilar shape. We believe that it is possible to derive even 
more accurate models for common topologies, based on 
results by Giacobini et al. [46.9, 26, 27]. This is an in- 
teresting challenge for future work. 


46.3.4 Propagation 


So far, we have only considered models where migra- 
tion always successfully transmits best individuals. For 
non-trivial selection of emigrants, this is not always 
given. Also if crossover is used during migration, due 
to disruptive effects migration is not always successful. 
If we consider randomized migration processes, things 
become more interesting. 

Rowe et al. [46.31] considered a model of propa- 
gation in networks. Consider a network where vertices 
are either informed or not. In each round, each in- 
formed vertex tries to inform each of its neighbors. 
Every such trial is successful with a given probability p, 
and then the target island becomes informed. These 
decisions are made independently. Note that an unin- 
formed island might obtain a probability larger than p 
of becoming informed, in case several informed islands 
try to inform it. The model is inspired by models from 
epidemiology; it can be used to model the spread of 
a disease. 




The model of propagation of information directly 
applies to our previous setting where the network is the 
migration topology and p describes the probability of 
successfully migrating a current best solution. Note that 
when looking for estimations of growth curves and up- 
per bounds on the takeover time, we can assume that p 
is a lower bound on the actual probability of a success- 
ful transmission. Then the model becomes applicable 
to a broader range of settings, where islands can have 
different transmission probabilities. 

On some graphs like unidirectional rings, we can 
just multiply our growth curves by p to reflect the ex- 
pected number of optimal islands after a certain time. It 
then follows that the time for taking over all m islands 
is by a factor of 1/p larger than in the previous, deter- 
ministic model. 

However, this reasoning does not hold in general. 
Multiplying the takeover time in the deterministic set- 
ting by 1/p does not always give the expected takeover 
time in the random model. Consider a star graph (or 
hub), where initially only the center vertex is informed. 
In the deterministic case p = 1, the takeover time is 
clearly 1. However, if $0 < p < 1$, the time until the last
vertex is informed is given by the maximum of $n - 1$
independent geometric random variables with parameter $p$.
For constant $p$, this time is of order $\Theta(\log n)$, i.e., the
time until the last vertex is informed is much larger 
than the expected time for any specific island to be 
informed. 
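
The propagation model and the star graph example can be explored by simulation. The following is a minimal sketch, assuming synchronous rounds in which every vertex that was informed at the start of the round tries each neighbor independently with probability p; the adjacency-dictionary representation, the function names, and the fixed seed are hypothetical choices. The graph is assumed to be connected, otherwise the loop would not terminate.

```python
import random


def propagation_time(adj, start, p, rng):
    """Rounds until all vertices are informed, starting from `start`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    informed = {start}
    rounds = 0
    while len(informed) < len(adj):
        newly_informed = set()
        for u in informed:
            for v in adj[u]:
                # Each trial succeeds independently with probability p.
                if v not in informed and rng.random() < p:
                    newly_informed.add(v)
        informed |= newly_informed
        rounds += 1
    return rounds


def star(n):
    """Star graph on n vertices with center 0."""
    adj = {0: list(range(1, n))}
    for v in range(1, n):
        adj[v] = [0]
    return adj


def mean_time(adj, start, p, runs, seed=0):
    rng = random.Random(seed)
    total = sum(propagation_time(adj, start, p, rng) for _ in range(runs))
    return total / runs


for n in (10, 100, 1000):
    print(n, mean_time(star(n), 0, 0.5, 200))
```

For p = 0.5 a single leaf is informed after 2 rounds in expectation, whereas the averages printed for the whole star grow with n, in line with the logarithmic behavior described above.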

Rowe et al. [46.31] presented a detailed analysis of
hubs. They also show how to obtain a general upper
bound that holds for all graphs. For every graph $G$
with $n$ vertices and diameter $\operatorname{diam}(G)$ the
expected takeover time is bounded by


$$O\left(\frac{\operatorname{diam}(G) + \log n}{p}\right) .$$


Both terms $\operatorname{diam}(G)$ and $\log n$ make sense.
The diameter describes what distance needs to be overcome
in order to inform all vertices in the network. The
factor $1/p$ gives the expected time until a next vertex
is informed, assuming that it has only one informed
neighbor. We also obtain $\operatorname{diam}(G)$ (without
a factor $1/p$) as a lower bound on the takeover time. The
additive term $+ \log n$ is necessary to account for
a potentially large variance, as seen in the example for
star graphs.

If the diameter of the graph is at least $\Omega(\log n)$,
we can drop the $+ \log n$ term in the asymptotic bound,
leading to an upper bound of $O(\operatorname{diam}(G)/p)$.


Interestingly, the concept of propagation also ap- 
pears in other contexts. When solving shortest paths 
problems in graphs, metaheuristics like evolutionary 
algorithms [46.32-34] and ant colony optimization 
(ACO) [46.35,36] tend to propagate shortest paths 
through the graph. In the single-source shortest paths 
problem (SSSP) one is looking for shortest paths from 
a source vertex to all other vertices of the graph. The 
EAs and ACO algorithms tend to find shortest paths 
first for vertices that are close to the source, in a sense 
that their shortest paths only contain few edges. If 
these shortest paths are found, it enables the algo- 
rithm to find shortest paths for vertices that are further 
away. 

When a shortest path to vertex u is found and there
is an edge {u, v} in the graph, it is easy to find a shortest 
path for v. In the case of evolutionary algorithms, an EA 
only needs to assign u as a predecessor of v on the short- 
est path in a lucky mutation in order to find a shortest 
path to v. In the case of ACO, pheromones enable an ant 
to follow pheromones between the source and u, and so 
it only has to decide to travel between u and v to find 
a shortest path to v, with good probability. 

Doerr et al. [46.34, Lemma 3] used tail bounds
to prove that the time for propagating shortest paths
with an EA is highly concentrated. If the graph has
diameter $\operatorname{diam}(G) > \log n$, the EA with high probability
finds all shortest paths in time $O(\operatorname{diam}(G)/p)$,
where $p = \Theta(n^{-2})$ in this case. This result is similar
to the one obtained by Rowe et al. [46.31]; asymptot- 
ically, both bounds are equal. However, the result by 
Doerr et al. [46.33] also allows for conclusions about 
growth curves. 

Lässig and Sudholt [46.6, Theorem 3] introduced
yet another argument for the analysis of propagation 
times. They considered layers of vertices. The i-th layer 
contains all vertices that have shortest paths of at most i 
edges, and that are not on any smaller layer. They bound 
the time until information is propagated throughout all 
vertices of a layer. This is feasible since all vertices 
in layer i are informed with probability at least p if 
all vertices in layers 1, ..., i−1 are informed. If n_i is
the number of vertices in layer i, the time until the last
vertex in this layer is informed is O(n_i · log n_i). This
gives a bound for the total takeover time of O(diam(G) ·
ln(en/diam(G))). For small (diam(G) = O(1)) or large
(diam(G) = Ω(n)) diameters, we get the same asymptotic
bound as before. For other values it is slightly
worse.
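
The layer argument can be made concrete with a small simulation. The following Python sketch is an illustration only. It uses a simplified transmission model that is our assumption, not the exact model analyzed in [46.6]: in every generation, each uninformed vertex with at least one informed neighbor becomes informed with probability p. The helper names (bfs_layers, takeover_time, path_graph) are hypothetical, and the graph is assumed to be connected.

```python
import random
from collections import deque


def bfs_layers(adj, source):
    """Map each vertex to the number of edges on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def takeover_time(adj, source, p, rng):
    """Generations until every vertex is informed (graph must be connected).

    Assumed model: an uninformed vertex with at least one informed
    neighbor becomes informed with probability p in each generation.
    """
    informed = {source}
    generations = 0
    while len(informed) < len(adj):
        newly = {v for v in adj
                 if v not in informed
                 and any(u in informed for u in adj[v])
                 and rng.random() < p}
        informed |= newly
        generations += 1
    return generations


def path_graph(n):
    """Adjacency lists of a path with vertices 0, ..., n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
```

On a path with p = 1, information advances exactly one layer per generation, so the takeover time equals the diameter; for p < 1 it can only be larger.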

However, the layering of vertices allows for inclusion
of intra-island effects. Assume that the transmission
probability p only applies once islands have been
taken over (to a significantly large degree) by best
individuals. This is a realistic setting as with only a single
best individual the probability of selecting it for emigration
(or pollination, to be precise) might be very small.
If all islands need time T_intra in order to reach this stage
after the first best individual has reached the island, we
obtain an upper bound of

$$O\bigl(\mathrm{diam}(G) \cdot \ln(en/\mathrm{diam}(G))\bigr) + \mathrm{diam}(G) \cdot T_{\mathrm{intra}}$$

for the takeover time.


46.4 Examples Where Parallel EAs Excel 


Parallel EAs have been applied to a very broad range 
of problems, including many NP-hard problems from 
combinatorial optimization. The present literature is 
immense; already early surveys like the one by Alba 
and Troya [46.37] present long lists of applications 
of parallel EAs. Further applications can be found 
in [46.38-40]. Research on and applications of parallel
metaheuristics have increased in recent years, due to
the emergence of parallel computer architectures.

Crainic and Hail [46.41] review applications of 
parallel metaheuristics, with a focus on graph color- 
ing, partitioning problems, covering problems, Steiner 
tree problems, satisfiability problems, location and 
network design, as well as the quadratic assignment
problem with its famous special cases: the traveling
salesman problem and vehicle routing problems.
Luque and Alba [46.19] present selected applications 
for natural language tagging, the design of combina- 
torial logic circuits, the workforce planning problem, 
and the bioinformatics problem of assembling DNA 
fragments. 

The literature is too vast to be reviewed in this 
section. Also, for many hard practical problems it is 
often hard to determine the effect that parallelization 
has on search dynamics. The reasons behind the suc- 
cess of parallel models often remain elusive. We follow 
a different route and describe theoretical studies of evo- 
lutionary algorithms where parallelization was proven 
to be helpful. This concerns illustrative toy functions 
as well as problems from combinatorial optimization. 
All following settings are well understood and allow
us to gain insights into the effect of parallelization.
We consider parallel variants of the simplest evolutionary
algorithm, the (1+1) evolutionary algorithm, or
(1+1) EA for short. It is described in Algorithm 46.3
and it only uses mutation and selection in a population
containing just one current search point. We are interested
in the optimization time, defined as the number
of generations until the algorithm first finds a global
optimum. Unless noted otherwise, we consider pseudo-Boolean
optimization: the search space contains all bit
strings of length n and the task is to maximize a function
f: {0,1}^n → ℝ. We use the common notation x =
x_1 ... x_n for bit strings.


Algorithm 46.3 (1+1) EA for maximizing f: {0,1}^n → ℝ
Initialize x ∈ {0,1}^n uniformly at random.
loop
    Create x' by copying x and flipping each bit independently with probability 1/n.
    if f(x') ≥ f(x) then x := x'.
end loop
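
Algorithm 46.3 translates almost line by line into code. The following Python sketch is one possible rendering, not a reference implementation: the function name one_plus_one_ea, the generation budget max_generations (the algorithm itself loops forever), and the OneMax test function are our additions, and we assume that offspring of equal fitness are accepted.

```python
import random


def one_plus_one_ea(f, n, max_generations, rng):
    """Run the (1+1) EA on f: {0,1}^n -> R for a fixed generation budget.

    Returns the final search point. Ties are accepted (>=), which is an
    assumption of this sketch.
    """
    x = [rng.randrange(2) for _ in range(n)]      # uniform random start
    for _ in range(max_generations):
        # Standard bit mutation: flip each bit independently with prob. 1/n.
        y = [1 - bit if rng.random() < 1.0 / n else bit for bit in x]
        if f(y) >= f(x):                          # elitist selection
            x = y
    return x


def onemax(x):
    """Number of ones; a standard test function used here for illustration."""
    return sum(x)
```

A call such as one_plus_one_ea(onemax, 20, 5000, random.Random(1)) runs the algorithm on OneMax with n = 20. Passing the random generator explicitly keeps runs reproducible.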


The presentation in this section is kept informal. For 
theorems with precise results, including all precondi- 
tions, we refer to the respective papers. 


46.4.1 Independent Runs 


Independent runs prove useful if the running time has 
a large variance. The reason is that the optimization 
time equals the time until the fastest run has found 
a global optimum. 

The variance can be particularly large in the case 
when the objective function yields local optima that 
are very hard to overcome. Bimodal functions contain 
two local optima, and typically only one is a global 
optimum. One such example was already analyzed the- 
oretically in the seminal runtime analysis paper by 
Droste et al. [46.42]. 

We review the analysis of a similar function that 
leads to a simpler analysis. The function TwoMax was 
considered by Friedrich et al. [46.43] in the context of 
diversity mechanisms. It is a function of unitation: the 
fitness only depends on the number of bits set to 1. The 
function contains two symmetric slopes that increase 
linearly with the distance to n/2. Only one of these 
slopes leads to a global optimum. Formally, the function
is defined as the maximum of OneMax := Σ_{i=1}^{n} x_i and
its symmetric cousin ZeroMax := Σ_{i=1}^{n} (1 − x_i), with an
additional fitness bonus for the all-ones bit string

$$\mathrm{TwoMax}(x) := \max\left\{ \sum_{i=1}^{n} x_i,\; \sum_{i=1}^{n} (1 - x_i) \right\} + \prod_{i=1}^{n} x_i .$$


See Fig. 46.7 for a sketch. 
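
As a small illustration, the definition of TwoMax can be written directly in Python. The function name twomax and the list-of-bits representation are our choices for this sketch.

```python
def twomax(x):
    """TwoMax for a bit string given as a list of 0/1 values."""
    n = len(x)
    ones = sum(x)                      # OneMax(x)
    zeros = n - ones                   # ZeroMax(x)
    bonus = 1 if ones == n else 0      # product of all bits
    return max(ones, zeros) + bonus
```

The all-ones string is the unique global optimum with fitness n + 1, while the all-zeros string is a local optimum with fitness n.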

The (1 + 1) EA reaches either a local optimum or 
a global optimum in expected time O(n logn). Due to 
the perfect symmetry of the function on the remainder 
of the search space, the probability that this is the global 
optimum is exactly 1/2. If a local optimum is reached, 
the (1+ 1) EA has to flip all bits in one mutation in 
order to reach the global optimum. The probability for
this event is exactly n^{-n}.

The authors consider deterministic crowding [46.43]
in a population of size μ as a diversity
mechanism. It has the same search behavior as μ
independent runs of the (1+ 1) EA, except that the 
running time is counted in a different way. Their result 
directly transfers to this parallel model. The only 
assumption is that the number of independent runs is 
polynomially bounded in n. 

The probability of finding a global optimum after
O(n log n) generations of the parallel system is
amplified to 1 − 2^{-m}. This means that only with probability
2^{-m} we arrive at a situation where the parallel
EA needs to escape from a local optimum. When all m
islands are in this situation, the probability that at least
one island makes this jump in one generation is at most

$$1 - (1 - n^{-n})^m = \Theta(m \cdot n^{-n}) ,$$

where the last equality holds since m is asymptotically
smaller than n^n.

Fig. 46.7 Plots of the bimodal function TwoMax as defined
in [46.43] (horizontal axis: number of ones)

This implies that the expected number of generations
of a parallel system with m independent runs is

$$O(n \log n) + 2^{-m} \cdot \Theta\!\left(\frac{n^n}{m}\right) .$$


We can see from this formula that the number of runs m 
has an immense impact on the expected running time. 
Increasing the number of runs by 1 decreases the sec- 
ond summand by more than a factor of 2. The speedup 
is, therefore, exponential, up to a point where the run- 
ning time is dominated by the first term O(nlogn). 
Note in particular that log(n^n) = n log n processors
are sufficient to decrease the expected running time
to O(n log n).

This is a very simple example of a superlinear 
speedup, with regard to the optimization time. 
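
To get a feeling for the numbers, the second summand of the bound above can be evaluated on a logarithmic scale. The Python sketch below is a back-of-the-envelope aid, not part of the analysis: it sets the constant hidden in the Θ-term to 1 (an assumption), uses base-2 logarithms, and the function names are hypothetical.

```python
import math


def log2_escape_term(n, m):
    """log2 of 2^(-m) * n^n / m, i.e. the second summand with the
    constant hidden in the Theta set to 1 (an assumption)."""
    return -m + n * math.log2(n) - math.log2(m)


def runs_until_first_term_dominates(n):
    """Smallest m for which the escape term drops to at most n * log2(n)."""
    m = 1
    while log2_escape_term(n, m) > math.log2(n * math.log2(n)):
        m += 1
    return m
```

Going from m to m + 1 runs lowers log2_escape_term by 1 + log2((m + 1)/m), which is more than 1, so the second summand shrinks by more than a factor of 2, as stated above.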

The observed effects also occur in combinatorial 
optimization. Witt [46.44] analyzed the (1+ 1) EA on 
the NP-hard PARTITION problem. The task can be 
regarded as scheduling on two machines: given a sequence
of jobs, each with a specific effort, the goal is to
distribute the jobs on two machines so that the largest
execution time (the makespan) is minimized.

On worst-case instances the (1+1) EA has a constant
probability of getting stuck in a bad local optimum.
The expected time to find a solution with
a makespan of less than (4/3 − ε) · OPT is n^{Ω(n)}, where
ε > 0 is an arbitrary constant and OPT is the value of
the optimal solution.

However, if the (1+1) EA is lucky, it can, indeed,
achieve a good approximation of the global
optimum. Assume we are aiming at a solution with
a makespan of at most (1 + ε) · OPT, for some ε > 0
we can choose. Witt's analysis shows that then
2^{(e log e + e) · ⌈2/ε⌉ · ln(4/ε) + O(1/ε)} parallel runs output a
solution of this quality with probability at least 3/4. (This
probability can be further amplified quite easily by
using more runs.) Each run takes time O(n ln(1/ε)).
The parallel model represents what is known as
a polynomial-time randomized approximation scheme
(PRAS). The desired approximation quality (1 + ε) can
be specified, and if ε is fixed, the total computation
time is bounded by a polynomial in n. This was the
first example showing that parallel runs of a randomized
search heuristic constitute a PRAS for an NP-hard problem.


46.4.2 Offspring Populations 


Using offspring populations in a master-slave architec- 
ture can decrease the parallel running time and lead 
to a speedup. We will discuss this issue further in 
Sect. 46.5 as offspring populations are very similar to 
island models on complete topologies. For now, we 
present one example where offspring populations de- 
crease the optimization time very drastically. 

Jansen et al. [46.45] compared the (1+1) EA
against a variant, the (1+λ) EA, that creates λ offspring in
parallel and compares the current search point against
the best offspring. They constructed a function
SufSamp where offspring populations have a significant
advantage. We refrain from giving a formal definition, 
but instead describe the main ideas. The vast majority 
of all search points tend to lead an EA towards the start 
of a path through the search space. The points on this 
path have increasing fitness, thus encouraging an EA to 
follow it. All points outside the path are worse, so the 
EA will stay on the path. 

The path leads to a local optimum at the end. How- 
ever, the function also includes a number of smaller 
paths that branch off the main path, see Fig. 46.8. All 
these paths lead to global optima, but they are diffi- 
cult to discover. This makes a difference between the 
(1+1) EA and the (1+λ) EA for sufficiently large λ.
The (1 + 1) EA typically follows the main path without 
discovering the smaller paths branching off. At the end 
of the main path it thus becomes stuck in a local opti- 
mum. The analysis in [46.45] shows that the (1+ 1) EA 
needs superpolynomial time, with high probability. 

Contrarily, the (1+λ) EA performs a more thorough
search as it progresses on the main path. The many
offspring tend to discover at least one of the smaller 
branches. The fitness on the smaller branches is larger 
than the fitness of the main path, so the EA will move 
away from the main path and follow a smaller path. It 
then finds a global optimum in polynomial time, with 
high probability. 
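
For concreteness, one generation of the (1+λ) EA can be sketched as follows. This Python code is an illustration under our own assumptions: offspring are created sequentially here, whereas a master-slave implementation would evaluate them in parallel; ties among offspring are broken in favor of the first one; an offspring of equal fitness replaces the parent. The names mutate and one_plus_lambda_ea and the budget max_generations are hypothetical.

```python
import random


def mutate(x, rng):
    """Standard bit mutation: flip each bit independently with prob. 1/n."""
    n = len(x)
    return [1 - bit if rng.random() < 1.0 / n else bit for bit in x]


def one_plus_lambda_ea(f, n, lam, max_generations, rng):
    """Run the (1+lambda) EA for a fixed number of generations.

    Each generation creates lam offspring from the current search point
    and keeps the best offspring if it is at least as good as the parent.
    """
    x = [rng.randrange(2) for _ in range(n)]
    for _ in range(max_generations):
        offspring = [mutate(x, rng) for _ in range(lam)]
        best = max(offspring, key=f)      # first best offspring wins ties
        if f(best) >= f(x):
            x = best
    return x
```

Setting lam = 1 recovers the (1+1) EA. Note that the generation count ignores the cost of creating the lam offspring, in line with the way running time is counted in this section.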

Interestingly, this construction can be easily adapted
to show an opposite result. We replace the local optimum
at the end of the main path by a global optimum
and replace all global optima at the end of the smaller
branches by local optima. This yields another function
SufSamp', also shown in Fig. 46.8. By the same reasoning
as above, the (1+λ) EA will become stuck and the
(1+1) EA will find a global optimum in polynomial
time, with high probability.

While the example is clearly constructed and artifi- 
cial, it can be seen as a cautionary tale. The reader might 
be tempted to think that using offspring populations in- 
stead of creating a single offspring can never increase 
the number of generations needed to find the optimum. 
After all, evolutionary search with offspring population 
is more intense and improvements can be found more 
easily. As we focus on the number of generations (and 
do not count the effort for creating λ offspring), it is
tempting to claim that offspring populations are never 
disadvantageous. 

The second example shows that this claim — how- 
ever obvious it may seem — does not hold for general 
problem classes. Note that this statement is also implied 
by the well-known no free lunch theorems [46.46], but 
the above results are much stronger and more concrete. 


46.4.3 Island Models 


The examples so far have shown that a more thorough 
search — by independent runs or increased sampling of 
offspring — can lead to more efficient running times. 
Lässig and Sudholt [46.3] presented a first example 
where communication makes the difference between 
exponential and polynomial running times, in a typical
run. They constructed a family of problems called
LOLZ_{n,z,b,ℓ} where a simple island model finds the optimum
in polynomial time, with high probability. This
holds for a proper choice of the migration interval and 
any migration topology that is not too sparse. The is- 
lands run (1+ 1) EAs, hence the island model resembles 
a fine-grained model. 

Contrarily, both a panmictic population as well as
independent islands need exponential time, with high
probability. This shows that the weak speedup versus
panmixia is superlinear, even exponential (when considering
speedups with respect to the typical running
time instead of the expected running time). Unlike previous
examples, it also shows that more sophisticated
means of parallelization can be better than independent
runs.

Fig. 46.8 Sketches of the functions SufSamp (left) and
SufSamp' (right). The fitness is indicated by the color

Table 46.1 Examples of solutions for the function LOLZ with four blocks and z = 3, along with their fitness values. All
blocks have to be optimized from left to right. The sketch shows in bold all bits that are counted in the fitness evaluation.
Note how in x3 in the third block only the first z = 3 zeros are counted. Further 0-bits are ignored. The only way to
escape from this local optimum is to flip all z 0-bits in this block simultaneously

      Block 1    Block 2    Block 3    Block 4    Fitness
x1    11110011   11010100   11010110   01011110   LOLZ(x1) = 4
x2    11111111   11010100   11010110   01011110   LOLZ(x2) = 10
x3    11111111   11111111   00000110   01011110   LOLZ(x3) = 19

The basic idea of this construction is as follows. An
EA can increase the fitness of its current solutions by
gathering a prefix of bits with the same value. Generally,
a prefix of i leading ones yields the same fitness as
a prefix of i leading zeros. The EA has to make a decision
whether to collect leading ones (LOs) or leading
zeros (LZs). This not only holds for the (1+1) EA but
also for a (not too large) panmictic population as genetic
drift will lead the whole population to either leading
ones or leading zeros.

In the beginning, both decisions are symmetric.
However, after a significant prefix has been gathered,
symmetry is broken: after the prefix has reached
a length of z, z being a parameter of the function, only
leading ones lead to a further fitness increase. If the EA
has gone for leading zeros, it becomes stuck in a local
optimum. The parameter z determines the difficulty of
escaping from this local optimum.

This construction is repeated on several blocks of
the bit string that need to be optimized one-by-one.
Each block has length ℓ. Only if the right decision towards
the leading ones is made on the first block, can
the block be filled with further leading ones. Once the
first block contains only leading ones, the fitness depends
on the prefix in the second block, and a further
decision between leading ones and leading zeros needs
to be made. Figure 46.1 illustrates the problem definition.

So, the problem requires an EA to make several
decisions in succession. The number of blocks, b, is
another parameter that determines how many decisions
need to be made. Panmictic populations will sooner or
later make a wrong decision and become stuck in some
local optimum. If b is not too small, the same holds for
independent runs.

However, an island model can effectively communicate
the right decisions on blocks to other islands.
Islands that have become stuck in a local optimum can
be taken over by other islands that have made the correct
decision. These dynamics make up the success of
the island model as it can be shown to find global op- 
tima with high probability. A requirement is, though, 
that the migration interval is carefully tuned so that 
migration only transmits the right information. If mi- 
gration happens before the symmetry between leading 
ones and leading zeros is broken, it might be that islands 
with leading zeros take over islands with leading ones. 
Lässig and Sudholt [46.3] give sufficient conditions un- 
der which this does not happen, with high probability. 
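
The block structure of LOLZ described above can be sketched in a few lines of Python. This is our reading of the informal description, not the formal definition from [46.3]: bit strings are given as strings of '0' and '1', the bit string is assumed to consist only of complete blocks, and the names lolz and _prefix_length are hypothetical.

```python
def _prefix_length(block, symbol):
    """Length of the longest prefix of block consisting only of symbol."""
    count = 0
    for bit in block:
        if bit != symbol:
            break
        count += 1
    return count


def lolz(x, z, block_length):
    """Fitness of bit string x (a str of '0'/'1') under the LOLZ idea.

    Blocks are evaluated from left to right. In each block, leading ones
    count fully and leading zeros count up to z. Evaluation stops at the
    first block that does not consist of ones only.
    """
    total = 0
    for start in range(0, len(x), block_length):
        block = x[start:start + block_length]
        leading_ones = _prefix_length(block, "1")
        leading_zeros = _prefix_length(block, "0")
        total += leading_ones + min(leading_zeros, z)
        if leading_ones < len(block):
            break
    return total
```

With z = 3 and blocks of length 8, this sketch gives the fitness values 4, 10, and 19 for the three example solutions listed in Table 46.1.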

An interesting finding is also how islands can regain 
independence. During migration, genetic information 
about future blocks is transmitted. Hence, after migra- 
tion all islands contain the same genotype on future 
blocks. This is a real threat as this dependence might 
imply that all islands make the same decision after mov- 
ing on to the next block. Then all diversity would be 
lost. 

However, under the conditions given in [46.3] there 
is a period of independent evolution following mi- 
gration, before any island moves on to a new block. 
During this period of independence, the genotypes of 
future blocks are subjected to random mutations, inde- 
pendently for each island. The reader might think of 
moving particles in some space. Initially, all particles are in
the same position. Then the particles start moving
around randomly. Naturally, they will spread out and 
separate from one another. After some time the distri- 
bution of particles will resemble a uniform distribution. 
In particular, an observer would not be able to distin- 
guish whether the positions of particles were obtained 
by this random process or by simply drawing them from 
a uniform distribution. 

The same effect occurs with bits of future blocks; 
after some time all bits of a future block will be in- 
distinguishable from a random bit string. This shows 
that independence can not only be gained by indepen- 
dent runs, but also by periods of independent evolution. 
One could say that the island model combines the
advantages of two worlds: independent evolution and
selection pressure through migration. The island model
is only successful because it can use both migration and
periods of independent evolution.

Fig. 46.9 Sketch of the graph G'. The top shows a configuration where a decision at v* has to be made. The three
configurations below show the possible outcomes. All these transitions occur with equal probability, but only the one on
the bottom right leads to a solution where rotations are necessary

The theoretical results [46.3] were complemented 
by experiments in [46.47]. The aim was to look at what 
impact the choice of the migration topology and the 
choice of the migration interval have on performance, 
regarding the function LOLZ. The theoretical results 
made a statement about a broad class of dense topolo- 
gies, but required a very precise migration interval. The 
experiments showed that the island model is far more 
robust with respect to the migration interval than sug- 
gested by theory. 

Depending on the migration interval, some topolo- 
gies were better than others. The topologies involved 
were a bidirectional ring, a torus with edges wrapping 
around, a hypercube graph, and the complete graph. We 
considered the success rate of the island model, stop- 
ping it as soon as all islands had reached local or global 
optima. We then performed statistical tests comparing 
these success rates. For small migration intervals, i.e.,
frequent migrations, sparse topologies were better than
dense ones. For large migration intervals, i.e., rare migrations,
the effect was the opposite. This effect was
expected; however, we also found that the torus was 
generally better than the hypercube. This is surprising, 
as both have a similar density. Table 46.2 shows the 
ranking obtained for commonly used topologies. 

Table 46.2 Performance comparison according to success
rates for commonly used migration topologies. The notation
A < B means that topology A has a significantly smaller
success rate than topology B

Migration interval            Ranking
Small migration intervals     K_μ < hypercube < torus < ring
Medium migration intervals    hypercube < K_μ < ring < torus
High migration intervals      ring < torus < hypercube < K_μ

Superlinear speedups with island models also occur
in simpler settings. Lässig and Sudholt [46.6] also
considered island models for the Eulerian cycle problem.
Given an undirected Eulerian graph, the task is to
find a Eulerian cycle, i.e., a traversal of the graph on
which each edge is traversed exactly once. This problem
can be solved efficiently by tailored algorithms, but
it served as an excellent test bed for studying the performance
of evolutionary algorithms [46.48-51].

Instead of bit strings, the problem representation by
Neumann [46.48] is based on permutations of the edges
of the graph. Each such permutation gives rise to a walk:
starting with the first edge, a walk is the longest sequence
of edges such that two subsequent edges in the
permutation share a common vertex. The walk encoded
by the permutation ends when the next edge does not
share a vertex with the current one. A walk that contains
all edges represents a Eulerian cycle. The length
of the walk gives the fitness of the current solution.

Neumann [46.48] considered a simple instance that
consists of two cycles of equal size, connected by one
common vertex v* (Fig. 46.9). The instance is interesting
as it represents a worst case for the time until an
improvement is found. This is with respect to randomized
local search (RLS) working on this representation.
RLS works like the (1+1) EA, but it only uses local
mutations. As the mutation operator it uses jumps:
an edge is selected uniformly at random and then it is
moved to a (different) target position chosen uniformly
at random. All edges in between the two positions are
shifted accordingly.
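
The permutation representation and the jump operator described above can be illustrated with a short Python sketch. It reflects our reading of the informal description rather than the exact definitions in [46.48]: edges are pairs of vertices, the graph is assumed to be simple, the walk is followed from the first edge in whichever orientation gives the longer walk, and the names walk_length, _extend, and jump are hypothetical.

```python
import random


def _extend(perm, end):
    """Follow the walk from the first edge, whose far endpoint is end."""
    length = 1
    for u, v in perm[1:]:
        if u == end:
            end = v
        elif v == end:
            end = u
        else:
            break                 # next edge does not continue the walk
        length += 1
    return length


def walk_length(perm):
    """Number of edges in the walk encoded by a permutation of edges."""
    if not perm:
        return 0
    a, b = perm[0]
    # Try both orientations of the first edge and keep the longer walk.
    return max(_extend(perm, b), _extend(perm, a))


def jump(perm, rng):
    """Move a uniformly chosen edge to a different, uniformly chosen position.

    Requires at least two edges. Edges in between are shifted by one.
    """
    m = len(perm)
    i = rng.randrange(m)
    j = rng.randrange(m - 1)
    if j >= i:
        j += 1                    # guarantees j != i
    result = list(perm)
    edge = result.pop(i)
    result.insert(j, edge)
    return result
```

For a triangle given as [(0, 1), (1, 2), (2, 0)] the walk covers all three edges, so the permutation encodes a Eulerian cycle.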

On the considered instance RLS typically starts
constructing a walk within one of these cycles, either by
appending edges to the end of the walk or by prepending
edges to the start of the walk. When the walk
extends to v* for the first time, a decision needs to be
made. RLS can extend the walk into the opposite
cycle (Fig. 46.9). In this case, RLS can simply extend
both ends of the walk until a Eulerian cycle is formed.
The expected time until this happens is Θ(m^3), where m
denotes the number of edges.

However, if another edge in the same cycle is added
at v*, the walk will evolve into one of the two cycles
that make up the instance. It is not possible to add further
edges to the current walk, unless the current walk
starts and ends in v*. However, the walk can be rotated
so that the start and end vertex of the walk is moved to
a neighboring vertex. Such an operation takes expected
time Θ(m^2). Note that the fitness after a rotation is the
same as before. Rotations that take the start and end
closer to v* are as likely as rotations that move it away
from v*. The start and end of the walk hence performs
a fair random walk, and Θ(m^2) rotations are needed on
average in order to reach v*. The total expected time for
rotating the cycle is hence Θ(m^4).

Summarizing, if RLS makes the right decision then
expected time O(m^3) suffices in total. However, if rotations
become necessary the expected time increases
to O(m^4). Now consider an island model with μ islands
running RLS. If islands evolve independently for
at least τ ≥ m^3 generations, all mentioned decisions are
made independently, with high probability. The probability
of making a wrong decision is 1/3, hence with μ
islands the probability that all islands make the wrong
decision is 3^{-μ}. The expected time can be shown to be

$$O(m^3 + 3^{-\mu} \cdot m^4) .$$

The choice μ := log_3 m yields an expectation of Θ(m^3),
and every value up to log_3 m leads to a superlinear
speedup, asymptotically speaking. Technically, the
speedup is even exponential.
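
The effect of the number of islands can be checked with a short calculation. The following lines are a sketch under the assumption that the bound has the form m^3 + 3^{-μ} · m^4, writing μ for the number of islands and m for the number of edges, and ignoring the constants hidden in the asymptotic notation.

```latex
% Assumption: bound of the form m^3 + 3^{-\mu} m^4, constants ignored.
\[
  \mu = \log_3 m
  \;\Longrightarrow\;
  3^{-\mu} m^4 = \frac{m^4}{m} = m^3 ,
  \qquad\text{so}\qquad
  m^3 + 3^{-\mu} m^4 = 2 m^3 .
\]
\[
  \mu \le \log_3 m
  \;\Longrightarrow\;
  3^{-\mu} m^4 \ge m^3 ,
  \qquad\text{hence}\qquad
  \frac{m^4}{m^3 + 3^{-\mu} m^4} \;\ge\; \frac{3^{\mu}}{2} .
\]
```

The last ratio compares a time of order m^4 with the bound for μ islands. It grows like 3^μ, which exceeds μ for every μ ≥ 1; this is the sense in which the speedup is superlinear and even exponential.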

Interestingly, this good performance only holds if
migration is used rarely, or if independent runs are used.
If migration is used too frequently, the island model
rapidly loses diversity. If T is any strongly connected
topology and diam(T) is its diameter, we have the following.
If

$$\tau \cdot \mathrm{diam}(T) \cdot \mu = O(m^2) ,$$

then there is a constant probability that the island that
first arrives at a decision at v* propagates this solution
throughout the whole island model, before any other
island can make an improvement. This results in an
expected running time of Ω(m^4/log(μ)). This is almost
Θ(m^4), even for very large numbers of islands.
The speedup is, therefore, logarithmic at best, or even
worse. This natural example shows that the choice of
the migration interval can make a difference between
exponential and logarithmic speedups.


46.4.4 Crossover Between Islands 


It has long been known that island models can also 
be useful in the context of crossover. Crossover usu- 
ally requires a good diversity in the population to work 
properly. Due to the higher diversity between different 
islands, compared to panmixia, recombining individu- 
als from different islands is promising. 

Watson and Jansen [46.52] presented and analyzed 
a royal road function for crossover: a function where 
crossover drastically outperforms mutation-based evo- 
lutionary algorithms. In contrast to previous theoreti- 
cally studied examples [46.53-57], their goal was to 
construct a function with a clear building-block struc- 
ture. In order to prove that a GA was able to assemble 
all building blocks, they resorted to an island model 
with a very particular migration topology. In their 
single-receiver model all islands except one evolve 
independently. Each island sends its migrants to a designated
island called the receiver (Fig. 46.10). This way,
all sending islands are able to evolve the right building 
blocks, and the receiver is used to assemble all these 
building blocks to obtain the optimum. 


Fig. 46.10 The topology for Watson and Jansen’s single- 
receiver model (after [46.52]) 



Fig. 46.11 Vertex cover instance with bipartite graphs. The brown vertices denote selected vertices. In this configuration 
the second component shows a locally optimal configuration while all other components are globally optimal 


This idea was picked up later on by Neumann 
et al. [46.7] in a more detailed study of crossover in is- 
land models. We describe parts of their results, as their 
problem is more illustrative than the one by Watson 
and Jansen. The former authors considered instances of 
the NP-hard Vertex cover problem. Given an undirected
graph, the goal is to select a subset of vertices such that
each edge has at least one selected endpoint.
We say that an edge is covered if this property
holds for it. The objective is to minimize the number
of selected vertices. The problem has a simple and
natural binary representation where each bit indicates 
whether a corresponding vertex is selected or not. 

Prior work by Oliveto et al. [46.58] showed that 
evolutionary algorithms with panmictic populations 
even fail on simply structured instance classes like 
copies of bipartite graphs. An example is shown in 
Fig. 46.11. Consider a single bipartite graph, i.e., two 
sets of vertices such that each vertex in one set is con- 
nected to every vertex in the other set. If both sets 
have different sizes, the smaller set is an optimal Ver- 
tex cover. The larger set is another Vertex cover. It is, in 
fact, a non-optimal local optimum which is hard to over- 
come: the majority of bits has to flip in order to escape. 
If the instance consists of several independent copies of 
bipartite graphs, it is very likely that a panmictic EA 
will evolve a locally optimal configuration on at least 
one of the bipartite graphs. Then the algorithm fails to 
find a global optimum. 
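
The two competing covers of a single bipartite component are easy to verify in code. The Python sketch below uses the standard edge-based definition of a vertex cover (every edge needs at least one selected endpoint). The helper names is_vertex_cover and complete_bipartite are hypothetical and the instance sizes are chosen only for illustration.

```python
def is_vertex_cover(edges, selected):
    """True if every edge has at least one endpoint in selected."""
    return all(u in selected or v in selected for u, v in edges)


def complete_bipartite(left_size, right_size):
    """Vertex sets and edges of a complete bipartite graph."""
    left = list(range(left_size))
    right = list(range(left_size, left_size + right_size))
    edges = [(u, v) for u in left for v in right]
    return left, right, edges
```

For sizes 2 and 5, both sides are vertex covers, but dropping a single vertex from the larger side without selecting the whole smaller side leaves an edge uncovered. This is why the larger side is a local optimum that can only be left by flipping many bits at once.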


Island models perform better. Assume the topology
is the single-receiver model. In each migration
a 2-point crossover is performed between migrants and
the individual on the target island. All islands have
population size 1 for simplicity. We also assume that
the bipartite subgraphs are encoded in such a way
that each subgraph forms one block in the bit string.
This is a natural assumption as all subgraphs can be
clearly identified as building blocks. In addition, Jansen
et al. [46.59] presented an automated way of encoding
graphs in a crossover-friendly way, based on the degrees
of vertices.

The analysis in [46.7] shows the following. Assume
that the migration interval is at least τ ≥ n^{1+ε}
for some positive constant ε > 0. This choice implies
that all islands will evolve to configurations where
all bipartite graphs are either locally optimal or globally
optimal. With probability 1 − e^{-Ω(m)} we have that
for each bipartite graph at least a constant fraction
of all sender islands will have the globally optimal
configuration.

All that is left to do for the receiver island is to
rely on crossover combining all present good building
blocks. As two-point crossover can select one block
from an immigrant and the remainder from the current
solution on the receiver island, all good building blocks
have a good chance to be obtained. The island model
finds a global optimum within a polynomial number of
generations, with probability 1 − e^{-Ω(min{n^{ε/2}, m})}.


46.5 Speedups by Parallelization


46.5.1 A General Method for Analyzing Parallel EAs


We now finally discuss a method for estimating the
speedup by parallelization. Assume that, instead of running
a single EA, we run an island model where each
island runs the same EA. The question is by how much
the expected optimization time (i.e., the number of
generations until a global optimum is found) decreases,
compared to the single, panmictic EA. Recall that this
speedup is called weak orthodox speedup [46.14].

In the following we sometimes speak of the ex- 
pected parallel optimization time to emphasize that 
we are dealing with a parallel system. If the num- 
ber of islands and the population size on each island 
is fixed, we can simply multiply this time by a fixed 
factor to obtain the expected number of function evalu- 
ations. 




Lässig and Sudholt [46.4] presented a method for 
estimating the expected optimization time of island 
models. It combines growth curves with a well-known 
method for the analysis of evolutionary algorithms. 
The fitness-level method or method of f-based parti- 
tions [46.60] is a simple, yet powerful technique. The 
idea is to partition the search space into non-empty
sets $A_1, A_2, \dots, A_m$ such that the following holds:

• for each $1 \le i < m$ each search point in $A_i$ has
  a strictly worse fitness than each search point
  in $A_{i+1}$, and
• $A_m$ contains all global optima.

The described ordering with respect to the fitness $f$
is often denoted

$$A_1 <_f A_2 <_f \cdots <_f A_m.$$

Note that $A_m$ can also be redefined towards containing
all search points of some desired quality if the goal is
not global optimization.

We say that a population-based algorithm $\mathcal{A}$ (including
populations of size 1) is in $A_i$ or on fitness
level $i$ if the best search point in the population is in $A_i$.
Now, assume that we know that $s_i$ is a lower bound
on the probability that the algorithm finds a solution
in $A_{i+1} \cup \cdots \cup A_m$ if it is currently in $A_i$.
Then the reciprocal $1/s_i$ is an upper bound on the expected time
until this event happens. If the algorithm is elitist (i.e.,
it never loses the current best solution), then it will
never decrease its current fitness level. A sufficient condition
for finding an optimal solution is that all sets
$A_1, A_2, \dots, A_{m-1}$ are left in the described manner at
least once. This implies the following bound on the expected
optimization time.


Theorem 46.1 Wegener [46.60]

Consider an elitist EA and assume a fitness-level partition
$A_1 <_f \cdots <_f A_m$ where $A_m$ is the set of global
optima. Let $s_i$ be a lower bound for the probability
that in one generation the EA finds a search point in
$A_{i+1} \cup \cdots \cup A_m$ if the best individual in the parent
population is in $A_i$. Then the expected optimization time is
bounded by

$$\sum_{i=1}^{m-1} \frac{1}{s_i}.$$


The above bound applies to all elitist algorithms.
It is generally applicable and often quite versatile, as
we can freely choose the partition $A_1, \dots, A_m$. The
challenge is to find such a partition and to find corresponding
probability bounds $s_1, \dots, s_{m-1}$ for finding
improvements. Many papers have shown that this
method, applied explicitly or implicitly, yields tight
bounds on the expected optimization time of EAs for
various problems [46.32, 42, 48]. It can also be used as
part of a more general analysis [46.61, 62].
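
As a small illustration of how the bound from Theorem 46.1 is evaluated, the following sketch sums the reciprocals $1/s_i$ for a given list of level probabilities. The helper names are hypothetical, and the level probabilities used for ONEMAX, $s_i \ge (n-i)/(en)$ for level $i$ (flip one of the $n-i$ zero-bits and nothing else), are a standard estimate assumed here rather than taken from this section.

```python
import math


def fitness_level_bound(s):
    """Upper bound sum_i 1/s_i on the expected optimization time."""
    return sum(1.0 / s_i for s_i in s)


def onemax_levels(n):
    # Assumed estimate: on level i (i ones), one of the n - i zero-bits
    # flips and no other bit flips with probability >= (n - i) / (e * n).
    return [(n - i) / (math.e * n) for i in range(n)]


n = 100
bound = fitness_level_bound(onemax_levels(n))
harmonic = sum(1.0 / k for k in range(1, n + 1))
# The sum telescopes to e * n * H_n, i.e. O(n log n).
print(bound, math.e * n * harmonic)
```

The two printed values agree up to rounding, which is the familiar $O(n \log n)$ bound for ONEMAX.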

We are being pessimistic in assuming that every fitness
level has to be left. In reality, several fitness levels
might be skipped. The fitness-level method often yields
good bounds if not too many levels are skipped, and if
the probability bounds $s_i$ are good estimates for the real
probabilities of finding a better fitness-level set. Note
that the lower bound $s_i$ must apply regardless of the
precise search point(s) in $A_i$ present in the population,
hence we need to consider the worst-case probability of
escaping from $A_i$.

Nevertheless, the fitness-level method often yields 
tight bounds. Sudholt [46.63] recently developed 
a lower-bound method based on fitness levels, which 
in each case shows that the upper bound is tight. Also, 
Lehre [46.64] recently presented an extension of the 
method to non-elitist algorithms. Asymptotically, the 
same bound as in Theorem 46.1 applies, if some ad- 
ditional conditions on the selection pressure and the 
population size are fulfilled. For the sake of simplicity, 
we focus on elitist algorithms in the following. 

If $s_i$ denotes the probability of a single offspring
finding an improvement, this probability can be increased
by using $\lambda$ offspring in parallel. We have
already seen in Sect. 46.1 how $\lambda$ independent trials can
increase or amplify a success probability $p$ to
$1 - (1-p)^\lambda$. The same reasoning applies to the probability $s_i$
for finding an improvement on the current best level.
Figure 46.1 has shown how this probability increases
with the number of trials. Figure 46.12 shows how the
expected time for having a success decreases with the
number of offspring. In fact, the curves in Fig. 46.12
are just reciprocals of those in the previous Fig. 46.1.

Figure 46.12 shows that the speedup can be close to
linear (in a strict, non-asymptotic sense), especially for
low success probabilities. As the probability of increasing
the current fitness level $i$ is at least $1 - (1-s_i)^\lambda$, we
obtain the following.


Theorem 46.2

Consider an elitist EA creating $\lambda$ offspring independently
in each generation. Assume a fitness-level partition
$A_1 <_f \cdots <_f A_m$ where $A_m$ is the set of global
optima. Let $s_i$ be a lower bound for the probability that
in one generation a single offspring finds a search point
in $A_{i+1} \cup \cdots \cup A_m$ if the best individual in the parent
population is in $A_i$. Then the expected optimization time
is bounded by

$$\sum_{i=1}^{m-1} \frac{1}{1-(1-s_i)^\lambda} \;\le\; m - 1 + \frac{1}{\lambda} \sum_{i=1}^{m-1} \frac{1}{s_i}.$$


Note that the first bound for $\lambda = 1$ reproduces the
previous upper bound from Theorem 46.1. For the second
bound we used

$$\frac{1}{1-(1-s_i)^\lambda} \le 1 + \frac{1}{\lambda \cdot s_i}, \qquad (46.2)$$

where the inequality was proposed by Jon Rowe (personal
communication, 2011); it can be proven by a simple
induction.
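
The inequality (46.2) and the amplification effect can be checked numerically. The sketch below is only a sanity check on a handful of hand-picked values of $s$ and $\lambda$, not a proof; the function names are hypothetical.

```python
def expected_wait(s, lam):
    """Expected generations until one of lam independent offspring succeeds."""
    return 1.0 / (1.0 - (1.0 - s) ** lam)


def rowe_upper(s, lam):
    """Right-hand side of inequality (46.2)."""
    return 1.0 + 1.0 / (lam * s)


checks = [(s, lam) for s in (0.5, 0.1, 0.01, 0.001)
          for lam in (1, 2, 8, 64, 1024)]
holds = all(expected_wait(s, lam) <= rowe_upper(s, lam) + 1e-12
            for s, lam in checks)
print(holds)

# Speedup of lam offspring over a single offspring for a small success
# probability: close to lam as long as lam * s is small.
s = 0.001
for lam in (1, 2, 8, 64):
    print(lam, expected_wait(s, 1) / expected_wait(s, lam))
```

For small $s$ the printed speedups stay close to $\lambda$, which matches the observation that the speedup is nearly linear for low success probabilities.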

Our estimate of the probability for an improvement 
increases with the number of islands on the current best 
fitness level. In a spatially structured EA these growth 
curves are non-trivial. Especially with a sparse migra- 
tion topology, information about the current best fitness 
level is typically propagated quite slowly. The increased 
exploration slows down exploitation. Still, even sparse 
topologies lead to drastically improved upper bounds, 
when compared to the simple bound for a sequential 
EA from Theorem 46.1. The precise bounds crucially 
depend on the particular topology. 


Fig. 46.12 Plots of the expected parallel time until an offspring
population of size $\lambda$ has a success, if each offspring
independently has a success probability of $p$. The dashed
lines indicate a perfect linear speedup (horizontal axis: number
of independent trials; vertical axis: expected parallel time)


We first consider a setting where migration always 
transmits the current best fitness level and migration 
occurs in every generation. It is possible to adapt the re- 
sults to account for larger migration intervals. One way 
of doing this is to redefine s; to represent a lower bound 
of finding an improvement in a time period between 
migrations. Then we obtain an upper bound on the expected
number of migrations. For the sake of simplicity,
we only consider the case $\tau = 1$ in the following.

The following theorem was presented in Lässig and 
Sudholt [46.6]; it is a refined special case of previous 
results [46.4]. The main proof idea is to combine the 
investigation of growth curves with the consideration 
of amplified success probabilities. 


Theorem 46.3 Lässig and Sudholt [46.6]

Consider an island model with $\mu$ islands where each island
runs an elitist EA. In every iteration each island
sends copies of its best individual to all neighboring
islands (i.e., $\tau = 1$). Each island incorporates the best out
of its own individuals and its immigrants.

For every partition $A_1 <_f \cdots <_f A_m$, if $s_i$ is a lower
bound for the probability that in one generation an island
in $A_i$ finds a search point in $A_{i+1} \cup \cdots \cup A_m$, then
the expected parallel optimization time is bounded by:

1. $2 \sum_{i=1}^{m-1} \frac{1}{s_i^{1/2}} + \frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}$
   for every unidirectional ring (a ring with edges in one direction)
   or any other strongly connected topology,
2. $3 \sum_{i=1}^{m-1} \frac{1}{s_i^{1/3}} + \frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}$
   for every undirected grid or torus graph with side lengths
   at least $\sqrt{\mu} \times \sqrt{\mu}$,
3. $m - 1 + \frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}$
   for the complete topology $K_\mu$.


Note that the bound for the complete topology $K_\mu$ is
equal to the upper bound for offspring populations, Theorem 46.2.
This makes sense as an island model with
a complete topology propagates the current best fitness
level like an offspring population.

All bounds in Theorem 46.3 consist of two additive
terms. The second term

$$\frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}$$

represents a perfect linear speedup, compared to the upper
bound from Theorem 46.1. The larger we choose
the number of islands $\mu$, the smaller this term becomes.
The first additive term is related to the growth curves of
the current best fitness level in the island model. The
denser the topology, the faster information is spread,
and the smaller this term becomes. Note that it is independent
of $\mu$. It can be regarded as the term limiting the
degree of parallelizability. We can increase the number
of islands in order to decrease the second term

$$\frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i},$$

but we cannot decrease the first term by changing $\mu$.

This allows for immediate conclusions about cases
where we obtain an asymptotic linear speedup over
a single-island EA. For all choices of $\mu$ where the second
term is asymptotically no smaller than the first
term, the upper bound is smaller than the upper bound
from Theorem 46.1 by a factor of order $\mu$. This is an
asymptotic linear speedup if the upper bound from
Theorem 46.1 is asymptotically tight. (If it is not, we can
only compare upper bounds for a sequential and a parallel
EA.)

We illustrate this with a simple and well-known test
function from pseudo-Boolean optimization. The algorithm
considered is an island model where each island
runs a (1+1) EA; the island model is also called parallel
(1+1) EA. The function

$$\mathrm{LO}(x) := \sum_{i=1}^{n} \prod_{j=1}^{i} x_j \qquad \text{(LeadingOnes)}$$

counts the number of leading ones in the bit string.
We choose the canonical partition where $A_i$ contains all
search points with fitness $i$, i.e., $i$ leading ones. For any
set $A_i$, $0 \le i \le n-1$, we use the following lower bound
on the probability for an improvement.

An improvement occurs if the first 0-bit is flipped
from 0 to 1 and no other bit flips. The probability
of flipping the mentioned 0-bit is $1/n$ as each bit is
flipped independently with probability $1/n$. The probability
of not flipping any other bit is $(1 - 1/n)^{n-1}$. We
use the common estimate $(1 - 1/n)^{n-1} \ge 1/e$, where
$e = \exp(1) = 2.718\ldots$, so the probability of an improvement
is at least $s_i \ge 1/(en)$ for all $0 \le i \le n-1$.
Plugging this into Theorem 46.3, the second term is

$$\frac{1}{\mu} \sum_{i=0}^{n-1} en = \frac{e n^2}{\mu}$$

for all bounds. The first terms are

$$2n \cdot (en)^{1/2} = 2 e^{1/2} n^{3/2}$$

for the ring,

$$3n \cdot (en)^{1/3} = 3 e^{1/3} n^{4/3}$$

for the torus, and $n$ for the complete graph, respectively.

For the ring, choosing $\mu = O(n^{1/2})$ islands results
in an expected parallel time of $O(n^2/\mu)$ as the second
term is asymptotically not smaller than the first one.
This is asymptotically smaller by a factor of $1/\mu$ than
the expected optimization time of a single (1+1) EA,
$\Theta(n^2)$ [46.42]. Hence, each choice of $\mu$ up to
$\mu = O(n^{1/2})$ gives a linear speedup. For the torus we obtain
a linear speedup for $\mu = O(n^{2/3})$ in the same fashion.
For the complete graph this even holds for $\mu = O(n)$.
One can see here that the island model can decrease the
expected parallel running time by significant polynomial
factors.
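
To make the setting concrete, here is a minimal simulation sketch of the parallel (1+1) EA on LO with migration in every generation ($\tau = 1$). It is an illustration under simplifying assumptions, not the analyzed algorithm in every detail: islands start from uniform random bit strings, ties during migration keep the resident individual, and the function names and the `neighbours` callback are hypothetical.

```python
import random


def leading_ones(x):
    """Number of leading ones in the bit list x."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count


def mutate(x, rng):
    """Standard bit mutation: flip each bit independently with prob. 1/n."""
    n = len(x)
    return [b ^ 1 if rng.random() < 1.0 / n else b for b in x]


def parallel_one_plus_one_ea(n, mu, neighbours, rng):
    """Generations until some island holds the all-ones string."""
    islands = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    generations = 0
    while max(leading_ones(x) for x in islands) < n:
        generations += 1
        # One (1+1) EA step per island: keep the offspring if not worse.
        for k in range(mu):
            child = mutate(islands[k], rng)
            if leading_ones(child) >= leading_ones(islands[k]):
                islands[k] = child
        # Migration: each island receives copies from its in-neighbours
        # and keeps the best of these and its own individual.
        snapshot = [list(x) for x in islands]
        for k in range(mu):
            for j in neighbours(k, mu):
                if leading_ones(snapshot[j]) > leading_ones(islands[k]):
                    islands[k] = list(snapshot[j])
    return generations


def ring(k, mu):
    """Unidirectional ring: island k receives from island k - 1."""
    return [(k - 1) % mu]


def complete(k, mu):
    """Complete topology K_mu: island k receives from all other islands."""
    return [j for j in range(mu) if j != k]


rng = random.Random(1)
print(parallel_one_plus_one_ea(30, 8, ring, rng))
print(parallel_one_plus_one_ea(30, 8, complete, rng))
```

Averaging the returned generation counts over many seeds for several values of `mu` gives an empirical picture that can be compared with the upper bounds derived above.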

Table 46.3 lists expected parallel optimization time
bounds for several well-known pseudo-Boolean functions.
The above analysis for LO generalizes to all
unimodal functions. A function is called unimodal here
if every non-optimal search point has a better Hamming
neighbor, i.e., a better search point can be reached
by flipping exactly one specific bit.
$\mathrm{ONEMAX}(x) = \sum_{i=1}^{n} x_i$ counts the number of ones,
hence modeling a simple hill climbing task. Finally,
$\mathrm{Jump}_k$ [46.42] is a multimodal function of tunable
difficulty. An EA typically has to make a jump by flipping $k$ bits
simultaneously, where $2 \le k \le n$. The (1+1) EA has an
expected optimization time of $\Theta(n^k)$, hence growing
rapidly with increasing $k$.

Table 46.3 Upper bounds for expected parallel optimization times (number of generations) for the (1+1) EA and the
corresponding island model with $\mu$ islands in pseudo-Boolean optimization. The last but one column is for any unimodal
function with $d$ function values. The number of function evaluations in the island model is larger than the number of
generations by a factor of $\mu$

| Algorithm | ONEMAX | LO | Unimodal, $d$ values | $\mathrm{Jump}_k$, $k \ge 2$ |
|---|---|---|---|---|
| (1+1) EA | $\Theta(n \log n)$ [46.42] | $\Theta(n^2)$ [46.42] | $O(nd)$ | $\Theta(n^k)$ [46.42] |
| Island model on ring | $O\left(n + \frac{n \log n}{\mu}\right)$ | $O\left(n^{3/2} + \frac{n^2}{\mu}\right)$ | $O\left(d n^{1/2} + \frac{dn}{\mu}\right)$ | $O\left(n^{k/2} + \frac{n^k}{\mu}\right)$ |
| Island model on torus | $O\left(n + \frac{n \log n}{\mu}\right)$ | $O\left(n^{4/3} + \frac{n^2}{\mu}\right)$ | $O\left(d n^{1/3} + \frac{dn}{\mu}\right)$ | $O\left(n^{k/3} + \frac{n^k}{\mu}\right)$ |
| Island model on $K_\mu$ / $(1+\mu)$ EA | $O\left(n + \frac{n \log n}{\mu}\right)$ | $O\left(n + \frac{n^2}{\mu}\right)$ | $O\left(d + \frac{dn}{\mu}\right)$ | $O\left(n + \frac{n^k}{\mu}\right)$ |

One can see that the island model leads to drasti- 
cally reduced parallel optimization times. This particu- 
larly holds for problems where improvements are hard 
to find. 

We remark that Lässig and Sudholt [46.4] also con- 
sidered parallel EAs where migration is not always 
successful in transmitting information about the cur- 
rent best fitness level. This includes the case where 
crossover is used during migration and crossover has 
a certain probability of being disruptive. We do obtain 
upper bounds on the expected optimization time if we 
know a lower bound $p^+$ on the probability of a successful
transmission. The bounds depend on $p^+$; the
degree of this dependence is determined by the topol- 
ogy. For simplicity we only focus on the deterministic 
case here. 


46.5.2 Speedups in Combinatorial 
Optimization 


The techniques are also applicable in combinatorial op- 
timization. We review two examples here, presented 
in [46.6]. Scharnow et al. [46.32] considered the classi- 
cal sorting problem as an optimization problem: given 
a sequence of n distinct elements from a totally ordered 
set, sorting is the problem of maximizing sortedness. 
Without loss of generality the elements are $1, \dots, n$;
then the aim is to find the permutation $\pi_{\mathrm{opt}}$ such that
$(\pi_{\mathrm{opt}}(1), \dots, \pi_{\mathrm{opt}}(n))$ is the sorted sequence.

The search space is the set of all permutations $\pi$
on $1, \dots, n$. Two different operators are used for mutation.
An exchange chooses two indices $i \ne j$ uniformly
at random from $\{1, \dots, n\}$ and exchanges the entries at
positions $i$ and $j$. A jump chooses two indices in the
same fashion. The entry at $i$ is put at position $j$ and all
entries in between are shifted accordingly. For instance,
a jump with $i = 2$ and $j = 5$ would turn $(1, 2, 3, 4, 5, 6)$
into $(1, 3, 4, 5, 2, 6)$.
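
The two elementary operations can be sketched in a few lines. Positions are 1-based as in the text; the function names and the helper that picks a random operation are illustrative.

```python
import random


def exchange(perm, i, j):
    """Swap the entries at 1-based positions i and j."""
    p = list(perm)
    p[i - 1], p[j - 1] = p[j - 1], p[i - 1]
    return tuple(p)


def jump(perm, i, j):
    """Move the entry at 1-based position i to position j.

    All entries in between are shifted by one position.
    """
    p = list(perm)
    entry = p.pop(i - 1)
    p.insert(j - 1, entry)
    return tuple(p)


def random_elementary_operation(perm, rng):
    """Apply an exchange or a jump with uniformly chosen indices i != j."""
    i, j = rng.sample(range(1, len(perm) + 1), 2)
    operation = rng.choice((exchange, jump))
    return operation(perm, i, j)


print(jump((1, 2, 3, 4, 5, 6), 2, 5))
```

The printed tuple reproduces the jump example from the text.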

The (1+1) EA draws $S$ according to a Poisson distribution
with parameter $\lambda = 1$ and then performs $S + 1$
elementary operations. Each operation is either an exchange
or a jump, where the decision is made independently
and uniformly for each elementary operation.
The resulting offspring replaces its parent if its fitness
is not worse. The fitness function $f_{\pi_{\mathrm{opt}}}(\pi)$ describes the
sortedness of $(\pi(1), \dots, \pi(n))$. As in [46.32], we consider
the following measures of sortedness:


• INV($\pi$) measures the number of pairs $(i, j)$, $1 \le i < j \le n$,
  such that $\pi(i) < \pi(j)$ (pairs in correct order),
• HAM($\pi$) measures the number of indices $i$ such that
  $\pi(i) = i$ (elements at the correct position),
• LAS($\pi$) equals the largest $k$ such that $\pi(i_1) < \cdots < \pi(i_k)$
  for some $i_1 < \cdots < i_k$ (length of the longest
  ascending subsequence),
• EXC($\pi$) equals the minimal number of exchanges
  (of pairs $\pi(i)$ and $\pi(j)$) to sort the sequence, leading
  to a minimization problem.
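
The four sortedness measures can be computed directly from their definitions. The sketch below is one possible implementation for a permutation of $1, \dots, n$ given as a tuple; computing LAS by patience sorting and EXC as $n$ minus the number of cycles of the permutation are standard techniques chosen here, not something prescribed by the text.

```python
from bisect import bisect_left


def inv(perm):
    """Number of pairs i < j with perm[i] < perm[j] (pairs in correct order)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if perm[i] < perm[j])


def ham(perm):
    """Number of elements already at their correct (1-based) position."""
    return sum(1 for pos, value in enumerate(perm, start=1) if value == pos)


def las(perm):
    """Length of the longest ascending subsequence (patience sorting)."""
    tails = []
    for value in perm:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)


def exc(perm):
    """Minimal number of exchanges needed to sort: n minus number of cycles."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = perm[k] - 1
    return len(perm) - cycles


pi = (1, 3, 4, 5, 2, 6)
print(inv(pi), ham(pi), las(pi), exc(pi))
```

INV, HAM, and LAS are maximized by the sorted sequence, whereas EXC is minimized by it.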


The expected optimization time of the (1+1) EA
is $\Omega(n^2)$ and $O(n^2 \log n)$ for all fitness functions. The
upper bound is tight for LAS, and it is believed to
be tight for INV, HAM, and EXC as well [46.32].
Theorem 46.3 yields the following. For INV, all topologies
guarantee a linear speedup only in case $\mu = O(\log n)$
and the bound $O(n^2 \log n)$ for the (1+1) EA is
tight. The other functions allow for linear speedups up
to $\mu = O(n^{1/2} \log n)$ (ring), $\mu = O(n^{2/3} \log n)$ (torus),
and $\mu = O(n \log n)$ ($K_\mu$), respectively (again assuming
tightness, otherwise up to a factor of $\log n$). Note how
the results improve with the density of the topology.
HAM, LAS, and EXC yield much better guarantees
for the island model than INV. This is surprising as
there is no visible performance difference for a single
(1+1) EA. The bounds obtained from Theorem 46.3 are
also shown in Table 46.4.


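Table 46.4 Upper bounds for expected parallel optimization times for the (1+1) EA and the corresponding island
model with $\mu$ islands for sorting $n$ objects

| Algorithm | INV | HAM, LAS, EXC |
|---|---|---|
| (1+1) EA | $O(n^2 \log n)$ [46.32] | $O(n^2 \log n)$ [46.32] |
| Island model on ring | $O\left(n^2 + \frac{n^2 \log n}{\mu}\right)$ | $O\left(n^{3/2} + \frac{n^2 \log n}{\mu}\right)$ |
| Island model on torus | $O\left(n^2 + \frac{n^2 \log n}{\mu}\right)$ | $O\left(n^{4/3} + \frac{n^2 \log n}{\mu}\right)$ |
| Island model on $K_\mu$ / $(1+\mu)$ EA | $O\left(n^2 + \frac{n^2 \log n}{\mu}\right)$ | $O\left(n + \frac{n^2 \log n}{\mu}\right)$ |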




Table 46.5 Worst-case expected parallel optimization times for the (1+1) EA and the corresponding island model with
$\mu$ islands for the SSSP on graphs with $n$ vertices and $m$ edges. The value $\ell$ is the maximum number of edges on any
shortest path from the source to any vertex and $\ell^* := \max\{\ell, \ln n\}$. The second entry in each cell shows a range
of $\mu$-values yielding a linear speedup, apart from a factor $\ln(en/\ell)$

| Algorithm | Vertex-based mutation [46.32] | Edge-based mutation [46.65] |
|---|---|---|
| (1+1) EA | $O(n^2 \ell^*)$ [46.34] | $O(m \ell^*)$ [46.65] |
| Island model on ring | $O\left(n^{3/2} \ell^{1/2} + \frac{n^2 \ell \ln(en/\ell)}{\mu}\right)$; $\mu = O((n\ell)^{1/2})$ | $O\left(m^{1/2} n^{1/2} \ell^{1/2} + \frac{m \ell \ln(en/\ell)}{\mu}\right)$; $\mu = O((m/n \cdot \ell)^{1/2})$ |
| Island model on torus | $O\left(n^{4/3} \ell^{1/3} + \frac{n^2 \ell \ln(en/\ell)}{\mu}\right)$; $\mu = O((n\ell)^{2/3})$ | $O\left(m^{1/3} n^{2/3} \ell^{1/3} + \frac{m \ell \ln(en/\ell)}{\mu}\right)$; $\mu = O((m/n \cdot \ell)^{2/3})$ |
| Island model on $K_\mu$ / $(1+\mu)$ EA | $O\left(n + \frac{n^2 \ell \ln(en/\ell)}{\mu}\right)$; $\mu = O(n\ell)$ | $O\left(n + \frac{m \ell \ln(en/\ell)}{\mu}\right)$; $\mu = O(m/n \cdot \ell)$ |

An explanation is that INV leads to $\binom{n}{2}$ non-optimal
fitness levels that are quite easy to overcome. HAM,
LAS, and EXC have only $n$ non-optimal fitness levels
that are more difficult. For a single EA both settings
are equally difficult, leading to asymptotically equal
expected times (assuming all upper bounds are tight).
However, the latter setting is easier to parallelize than
the former as it is easier to amplify small success
probabilities.

We also consider parallel variants of the (1+1) EA
for the single source shortest path problem (SSSP) [46.32].
An SSSP instance is given by an undirected connected graph
with vertices $\{1, \dots, n\}$ and a distance matrix
$D = (d_{ij})_{1 \le i,j \le n}$, where $d_{ij} \in \mathbb{R}_0^+ \cup \{\infty\}$
defines the length value for given edges from node $i$ to node $j$.
We are searching for shortest paths from a node $s$ (without loss
of generality $s = n$) to each other node $1 \le i \le n-1$.

A candidate solution is represented as a shortest
paths tree, a tree rooted at $s$ with directed shortest paths
to all other vertices. We define a search point $x$ as a
vector of length $n-1$, where position $i$ describes the
predecessor node $x_i$ of node $i$ in the shortest path tree.
Note that infeasible solutions are possible if the predecessors
do not encode a tree. An elementary mutation
chooses a vertex $i$ uniformly at random and replaces
its predecessor $x_i$ by a vertex chosen uniformly at random
from $\{1, \dots, n\} \setminus \{i, x_i\}$. We call this a vertex-based
mutation. Doerr et al. [46.65] proposed an edge-based
mutation operator. An edge is chosen uniformly at random,
and the edge is made a predecessor edge for its
end node.

The (1+1) EA uses either vertex-based mutations
or edge-based ones. It creates an offspring using $S$ elementary
mutations, where $S$ is chosen according to
a Poisson distribution with $\lambda = 1$. The offspring
is accepted if no distance to any vertex has
become worse.
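
The representation, the vertex-based mutation, and the acceptance rule can be sketched as follows. This is a simplified illustration under stated assumptions: vertices are numbered $0, \dots, n-1$ with the source at $n-1$ (the text uses $1, \dots, n$ with $s = n$), the search point is stored as a dictionary, at least three vertices are assumed, and all function names and the small example instance are hypothetical.

```python
import math
import random


def path_lengths(pred, dist, source):
    """Distances from the source implied by the predecessor map.

    pred maps each non-source vertex to its predecessor; dist[u][v] is the
    edge length (math.inf if the edge is absent). A vertex whose predecessor
    chain does not reach the source gets length math.inf.
    """
    n = len(dist)
    lengths = {source: 0.0}
    for v in pred:
        total, u, steps = 0.0, v, 0
        while u != source and steps < n:
            total += dist[pred[u]][u]
            u = pred[u]
            steps += 1
        lengths[v] = total if u == source else math.inf
    return lengths


def vertex_mutation(pred, n, source, rng):
    """Replace the predecessor of a random vertex i by a random other vertex."""
    child = dict(pred)
    i = rng.choice([v for v in range(n) if v != source])
    candidates = [v for v in range(n) if v not in (i, child[i])]
    child[i] = rng.choice(candidates)
    return child


def accept(parent_lengths, child_lengths):
    """Accept the offspring if no distance to any vertex got worse."""
    return all(child_lengths[v] <= parent_lengths[v] for v in parent_lengths)


INF = math.inf
# Hypothetical instance with three vertices; the source is vertex 2.
dist = [[INF, 1.0, 5.0],
        [1.0, INF, 1.0],
        [5.0, 1.0, INF]]
parent = {0: 2, 1: 2}   # vertex 0 reached directly from the source
child = {0: 1, 1: 2}    # vertex 0 reached via vertex 1
print(accept(path_lengths(parent, dist, 2), path_lengths(child, dist, 2)))
```

In a full (1+1) EA the mutation would be applied a Poisson-distributed number of times per offspring, as described above; that loop is omitted here to keep the sketch short.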

Applying Theorem 46.3 along with a layering ar- 
gument as described at the end of Sect. 46.3.4 yields 
the bounds on the expected parallel optimization time 
shown in Table 46.5. 

The upper bounds for the island models with constant
$\mu$ match the expected time of the (1+1) EA if
$\ell = O(1)$ or $\ell = \Omega(n)$ as then $\ell \ln(en/\ell) = O(\ell^*)$. In
other cases, the upper bounds are off by a factor of
$\ln(en/\ell)$. Table 46.5 also shows a range of $\mu$-values for
which the speedup is linear (if $\ell = O(1)$ or $\ell = \Omega(n)$)
or almost linear, that is, when disregarding the $\ln(en/\ell)$
term.

Note how the possible speedups significantly increase
with the density of the topology. The speedups
also depend on the graph instance and the maximum
number of edges $\ell$ on any shortest path. For
a single (1+1) EA edge-based mutations are more
effective than vertex-based mutations [46.65]. Island
models with edge-based mutations cannot be parallelized
as effectively as those with vertex-based mutations
if the graph is sparse, i.e., $m = o(n^2)$. Then the number
of islands that guarantees a linear speedup is smaller
for edge-based mutations than for vertex-based mutations.
The reason is that with a more efficient mutation operator
there is less potential for further speedups with a parallel
EA.

Table 46.6 Asymptotic bounds for expected parallel running times and expected sequential running times for the
parallel (1+1) EA with adaptive population models

| Function | Scheme | Sequential | Parallel |
|---|---|---|---|
| ONEMAX | A | $O(n \log n)$ | $O(n \log n)$ |
| ONEMAX | B | $O(n \log n)$ | $O(n)$ |
| LO | A | $O(n^2)$ | $O(n \log n)$ |
| LO | B | $O(n^2)$ | $O(n)$ |
| Unimodal $f$ with $d$ $f$-values | A | $O(dn)$ | $O(d \log n)$ |
| Unimodal $f$ with $d$ $f$-values | B | $O(dn)$ | $O(d + \log n)$ |
| $\mathrm{Jump}_k$ with $k \ge 2$ | A | $O(n^k)$ | $O(n \log n)$ |
| $\mathrm{Jump}_k$ with $k \ge 2$ | B | $O(n^k)$ | $O(n + k \log n)$ |


46.5.3 Adaptive Numbers of Islands 


Theorem 46.3 presents a powerful tool for deter- 
mining the number of islands that give an asymp- 
totic linear speedup. However, it would be even more 
desirable to have an adaptive system that automati- 
cally finds the ideal number of islands throughout the 
run. 

In [46.5] Lässig and Sudholt proposed and analyzed 
two simple adaptive schemes for choosing the number 
of islands. Both schemes check whether in the cur- 
rent generation some island has found an improvement 
over the current best fitness in the system. If no is- 
land has found an improvement, the number of islands 
is doubled. This can be implemented, for instance, by 
copying each island. New processors can be allocated 
to host these islands in large clusters or by using cloud 
computing. 

If some island has found an improvement, the 
number of islands is reduced by removing selected 
islands from the system and de-allocating resources. 
Both schemes differ in the way they decrease the 
number of islands. The first scheme, simply called 
Scheme A, only keeps one island containing a cur- 
rent best solution. Scheme B halves the number of 
islands. Both schemes use complete topologies, so all 
remaining islands will contain current best individuals 
afterwards. 
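
The two schemes can be illustrated on an abstract fitness-level process. The sketch below is a strong simplification made for illustration: every island on level $i$ succeeds independently with probability `s[i]`, a success moves the whole system exactly one level up (no levels are skipped), and the function name is hypothetical.

```python
import random


def adaptive_run(s, scheme, rng):
    """Simulate Scheme A or Scheme B on an abstract fitness-level process.

    s[i] is the probability that a single island on level i improves in one
    generation. Returns (parallel_time, sequential_time), i.e. the number of
    generations and the total number of island steps.
    """
    islands, parallel, sequential = 1, 0, 0
    level = 0
    while level < len(s):
        parallel += 1
        sequential += islands
        improved = any(rng.random() < s[level] for _ in range(islands))
        if improved:
            level += 1
            if scheme == "A":
                islands = 1                      # keep a single best island
            else:
                islands = max(1, islands // 2)   # Scheme B: halve
        else:
            islands *= 2                         # no improvement: double
    return parallel, sequential


rng = random.Random(3)
levels = [0.02] * 20
print(adaptive_run(levels, "A", rng))
print(adaptive_run(levels, "B", rng))
```

With equally hard levels, Scheme B tends to keep the number of islands near a useful value after each improvement, whereas Scheme A has to double its way up again from a single island.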

Both mechanisms lead to optimal speedups in many 
cases. Doubling the number of islands may seem ag- 
gressive, but the analysis shows that the probability of 
allocating far more islands than necessary is very
small. The authors considered the expected sequential
optimization time, defined as the number of function 
evaluations, to measure the total effort over time. With 
both schemes it is guaranteed that the expected se- 
quential time does not exceed the simple bound for 
a sequential EA from Theorem 46.1, asymptotically. 
The expected parallel times on each fitness level can, 
roughly speaking, be replaced by their logarithms. 

The following is a slight simplification of results 
in [46.5]. 


Theorem 46.4 Lässig and Sudholt [46.5]

Given an $f$-based partition $A_1, \dots, A_m$ and lower
bounds $s_1, \dots, s_{m-1}$ on the probability of a single island
finding an improvement, the expected sequential
times for island models using a complete topology and
either Scheme A or Scheme B are bounded by

$$3 \sum_{i=1}^{m-1} \frac{1}{s_i}.$$

If each set $A_i$ contains only a single fitness value then
also the expected parallel time is bounded by

$$4 \sum_{i=1}^{m-1} \log\left(\frac{2}{s_i}\right).$$


Actually, for Scheme A we can obtain slightly better 
constants than the ones stated in Theorem 46.4. How- 
ever, with a more detailed analysis one can show that 
Scheme B can perform much better than Scheme A. 
The work of Lässig and Sudholt [46.5] contains a more
refined upper bound for Scheme B. We only show a spe- 
cial case where the fitness levels become increasingly 
harder. Then it makes sense to only halve the number 
of islands when an improvement is found, instead of re- 
setting the number of islands to 1. 


Theorem 46.5 Lässig and Sudholt [46.5]

Given an $f$-based partition $A_1, \dots, A_m$, where each
set $A_i$ contains only a single fitness value and for the
probability bounds it holds $s_1 \ge s_2 \ge \cdots \ge s_{m-1}$. Then
the expected parallel running time for an island model
using a complete topology and Scheme B is bounded by

$$3(m-2) + \log\left(\frac{1}{s_{m-1}}\right).$$


Example applications for a parallel (1+1) EA in
Table 46.6 show that Scheme B can automatically lead
to the same speedups as when using an optimal number
of islands. This holds for ONEMAX, LO, and the general
bound for unimodal functions. For $\mathrm{Jump}_k$ it also
holds in the most relevant cases, when $k = O(n/\log n)$,
as then the expected parallel time is $O(n)$.
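
As a worked example, the parallel-time bounds of Theorems 46.4 and 46.5 can be evaluated for LO with $s_i = 1/(en)$ on all $n$ non-optimal levels. The sketch assumes that the logarithm is taken to base 2, which is not stated explicitly in this section; the function names are hypothetical.

```python
import math


def parallel_bound_general(s):
    """4 * sum_i log(2 / s_i), the parallel-time bound of Theorem 46.4."""
    return 4.0 * sum(math.log2(2.0 / s_i) for s_i in s)


def parallel_bound_scheme_b(s):
    """3 (m - 2) + log(1 / s_{m-1}), the bound of Theorem 46.5.

    Requires non-increasing probability bounds s_1 >= ... >= s_{m-1}.
    """
    if any(a < b for a, b in zip(s, s[1:])):
        raise ValueError("probability bounds must be non-increasing")
    m = len(s) + 1
    return 3.0 * (m - 2) + math.log2(1.0 / s[-1])


n = 100
lo_levels = [1.0 / (math.e * n)] * n   # s_i = 1/(en) for LO
general = parallel_bound_general(lo_levels)
scheme_b = parallel_bound_scheme_b(lo_levels)
print(general, scheme_b)
```

The first value grows like $n \log n$ and the second like $n$, in line with the LO rows of Table 46.6.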

We conclude that simply doubling or halving the 
number of islands represents a simple and effective 
mechanism for finding optimal parameters adaptively. 




46.6 Conclusions 


Parallel evolutionary algorithms can effectively reduce
computation time and at the same time lead to an in- 
creased exploration and better diversity, compared to 
sequential evolutionary algorithms. 

We have surveyed various forms of parallel EAs, 
from independent runs to island models and cellular 
EAs. Different lines of research have been discussed that 
give insight into the working principles behind parallel 
EAs. This includes the spread of information, growth 
curves for current best solutions, and takeover times. 

A recurring theme was the possible speedup that can 
be achieved with parallel EAs. We have elaborated on 
the reasons why superlinear speedups are possible in 
practice. Rigorous runtime analysis has given examples 
where parallel EAs excel over sequential algorithms, 
with regard to the number of generations or the num- 
ber of function evaluations until a global optimum is 
found. The final section has covered a method for esti- 
mating the expected parallel optimization time of island 
models. The method is easy to apply as we can auto- 
matically transfer existing analyses for sequential EAs 
to a parallel version thereof. Examples have been given 
for pseudo-Boolean optimization and combinatorial op- 
timization. The results have also led to the discovery of 
a simple, yet surprisingly powerful adaptive scheme for 
choosing the number of islands. 

There are many possible avenues for future work. In 
the light of the development in computer architecture, it 
is important to develop parallel EAs that can run effec- 
tively on many cores. It also remains a crucial issue to 
increase our understanding of how design choices and 
parameters affect the performance of parallel EAs. Rig- 
orous runtime analysis has emerged recently as a new 
line of research that can give novel insights in this re- 
spect and opens new roads. The present results should 
be extended towards further algorithms, further prob- 
lems, and more detailed cost models that reflect the 
costs for communication in parallel architectures. It 
would also be interesting to derive further rigorous re- 
sults on takeover times in settings where propagation 
through migration is probabilistic. Finally, it is impor- 
tant to bring theory and practice together in order to 
create synergetic effects between the two areas. 


46.6.1 Further Reading 


This book chapter does not claim to be comprehen- 
sive. In fact, parallel evolutionary algorithms represent 


a vast research area with a long history. Early vari- 
ants of parallel evolutionary algorithms were devel- 
oped, studied, and applied more than 20 years ago. 
We, therefore, point the reader to references that may 
complement this chapter. Paz [46.66] presented a re- 
view of early literature and the history of parallel 
EAs. The survey by Alba and Troya [46.37] contains 
detailed overviews of parallel EAs and their character- 
istics. 

This chapter does not cover implementation de- 
tails of parallel evolutionary algorithms. We refer to 
the excellent survey by Alba and Tomassini [46.38]. 
This survey also includes an overview of the theory 
of parallel EAs. The emphasis is different from this 
chapter and it can be used to complement this chap- 
ter. 

Tomassini’s text book [46.67] describes various 
forms of parallel EAs like island models, cellular 
EAs, and coevolution. It also presents many mathe- 
matical and experimental results that help understand 
how parallel EAs work. Furthermore, it contains an 
appendix dealing with the implementation of parallel 
EAs. 

The book edited by Alba et al. [46.39] takes
a broader scope on parallel models that also in- 
clude parallel evolutionary multiobjective optimization 
and parallel variants of swarm intelligence algorithms 
like particle swarm optimization and ant colony opti- 
mization. The book contains a part on parallel hard- 
ware as well as a number of applications of parallel 
metaheuristics. 

Alba’s edited book on parallel metaheuris- 
tics [46.40] has an even broader scope. It covers 
parallel variants of many common metaheuristics such 
as genetic algorithms, genetic programming, evolu- 
tion strategies, ant colony optimization, estimation- 
of-distribution algorithms, scatter search, variable- 
neighborhood search, simulated annealing, tabu 
search, greedy randomized adaptive search procedures 
(GRASPs), hybrid metaheuristics, multiobjective 
optimization, and heterogeneous metaheuristics. 

The most recent text book was written by Luque and 
Alba [46.19]. It provides an excellent introduction into 
the field, with hands-on advice on how to present results 
for parallel EAs. Theoretical models of selection pres- 
sure in distributed GAs are presented. A large part of 
the book then reviews selected applications of parallel 
GAs. 


References

46.1 P.S. Oliveto, J. He, X. Yao: Time complexity of evolutionary algorithms for combinatorial optimization: A decade of results, Int. J. Autom. Comput. 4(3), 281-293 (2007)
46.2 F. Neumann, C. Witt: Bioinspired Computation in Combinatorial Optimization: Algorithms and Their Computational Complexity (Springer, Berlin, Heidelberg 2010)
46.3 J. Lässig, D. Sudholt: The benefit of migration in parallel evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2010) (ACM, New York 2010) pp. 1105-1112
46.4 J. Lässig, D. Sudholt: General scheme for analyzing running times of parallel evolutionary algorithms, 11th Int. Conf. Parallel Probl. Solving Nat. (PPSN 2010) (Springer, Berlin, Heidelberg 2010) pp. 234-243
46.5 J. Lässig, D. Sudholt: Adaptive population models for offspring populations and parallel evolutionary algorithms, Proc. 11th Workshop Found. Genet. Algorithms (FOGA 2011) (ACM, Berlin, Heidelberg 2011) pp. 181-192
46.6 J. Lässig, D. Sudholt: Analysis of speedups in parallel evolutionary algorithms for combinatorial optimization, 22nd Int. Symp. Algorithms Comput. (ISAAC '11) (Springer, Berlin, Heidelberg 2011) pp. 405-414
46.7 F. Neumann, P.S. Oliveto, G. Rudolph, D. Sudholt: On the effectiveness of crossover for migration in parallel evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2011) (ACM, New York 2011) pp. 1587-1594
46.8 M. De Felice, S. Meloni, S. Panzieri: Effect of topology on diversity of spatially-structured evolutionary algorithms, Proc. 13th Annu. Genet. Evol. Comput. Conf. (GECCO '11) (2011) pp. 1579-1586
46.9 M. Giacobini, M. Tomassini, A. Tettamanzi: Takeover time curves in random and small-world structured populations, Proc. Genet. Evol. Comput. Conf. (GECCO '05) (ACM, New York 2005) pp. 1333-1340
46.10 Z. Skolicki: An Analysis of Island Models in Evolutionary Computation, Ph.D. Thesis (George Mason University, Fairfax 2000)
46.14 E. Alba: Parallel evolutionary algorithms can achieve super-linear performance, Inf. Process. Lett. 82(1), 7-13 (2002)
46.15 R.S. Barr, B.L. Hickman: Reporting computational experiments with parallel algorithms: Issues, measures, and experts' opinion, ORSA J. Comput. 5(1), 2-18 (1993)
46.16 D.E. Goldberg, K. Deb: A comparative analysis of selection schemes used in genetic algorithms. In: Foundations of Genetic Algorithms, ed. by G.J.E. Rawlins (Morgan Kaufmann, Burlington 1991) pp. 69-93
46.17 J. Sarma, K. De Jong: An analysis of local selection algorithms in a spatially structured evolutionary algorithm, Proc. 7th Int. Conf. Genet. Algorithms (Morgan Kaufmann, Burlington 1997) pp. 181-186
46.18 E. Alba, G. Luque: Growth curves and takeover time in distributed evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (Springer, Berlin, Heidelberg 2004) pp. 864-876
46.19 G. Luque, E. Alba: Parallel Genetic Algorithms: Theory and Real World Applications, Studies in Computational Intelligence, Vol. 367 (Springer, Berlin, Heidelberg 2011)
46.20 Z. Skolicki, K.A. De Jong: The influence of migration sizes and intervals on island models, Proc. Genet. Evol. Comput. Conf. (GECCO '05) (ACM, New York 2005) pp. 1295-1302
46.21 M. Giacobini, E. Alba, M. Tomassini: Selection intensity in asynchronous cellular evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO '03) (Springer, Berlin, Heidelberg 2003) pp. 955-966
46.22 G. Rudolph: Takeover times and probabilities of non-generational selection rules, Proc. Genet. Evol. Comput. Conf. (GECCO '00) (Morgan Kaufmann, Burlington 2000) pp. 903-910
46.23 G. Rudolph: Takeover times of noisy non-generational selection rules that undo extinction, Proc. 5th Int. Conf. Artif. Neural Nets Genet. Algorithms (ICANNGA 2001) (Springer, Berlin, Heidelberg 2001) pp. 268-271
46.24 G. Rudolph: On takeover times in spatially structured populations: Array and ring, Proc. 2nd Asia-Pac. Conf. Genet. Algorithms Appl. (Global-Link Publishing, Hong Kong 2000) pp. 144-151
46.11 E. Alba, M. Giacobini, M. Tomassini, S. Romero: 46.25 G. Rudolph: Takeover time in parallel populations 
Comparing Synchronous and Asynchronous Cellu- with migration, Proc. 2nd Int. Conf. Bioinspired Op- 
lar Genetic Algorithms, Parallel Problem Solving tim. Methods Appl. (BIOMA 2006), ed. by B. Filipic, 
from Nature VII (Springer, Berlin, Heidelberg 2002) J. Silc (2006) pp. 63-72 
pp. 601-610 46.26 M. Giacobini, M. Tomassini, A. Tettamanzi: Mod- 
46.12 M. Mitzenmacher, E. Upfal: Probability and Com- elling selection intensity for linear cellular evo- 
puting (Cambridge Univ. Press, Cambridge 2005) lutionary algorithms, Proc. 6th Int. Conf. Artif. 
46.13 J. Sprave: A unified model of non-panmictic pop- Evol., Evol. Artif. (Springer, Berlin, Heidelberg 2003) 
ulation structures in evolutionary algorithms, Proc. pp. 345-356 
1999 Congr. Evol. Comput. (IEEE, Bellingham 1999) 46.27 M. Giacobini, E. Alba, A. Tettamanzi, M. Tomassini: 


pp. 1384-1391 


Selection intensity in cellular evolutionary algo- 


957 


9% |3 Hed 


958 PartE 


Evolutionary Computation 


9% |3 Hed 


46.28 


46.29 


46.30 


46.31 


46.32 


46.33 


46.34 


46.35 


46.36 


46.37 


46.38 


46.39 


46.40 


46.41 


46.42 


46.43 


46.44 


rithms for regular lattices, IEEE Trans. Evol. Comput. 
9, 489-505 (2005) 

C. Witt: Runtime analysis of the (u + 1)EA on sim- 
ple pseudo-Boolean functions, Evol. Comput. 14(1), 
65-86 (2006) 

D. Sudholt: The impact of parametrization in 
memetic evolutionary algorithms, Theor. Comput. 
Sci. 410(26), 2511-2528 (2009) 

M. Giacobini, E. Alba, A. Tettamanzi, M. Tomassini: 
Modeling selection intensity for toroidal cellular 
evolutionary algorithms, Proc. Genet. Evol. Com- 
put. Conf. (GECCO '04) (Springer, Berlin, Heidelberg 
2004) pp. 1138-1149 

J. Rowe, B. Mitavskiy, C. Cannings: Propaga- 
tion time in stochastic communication networks, 
2nd IEEE Int. Conf. Digit. Ecosyst. Technol. (2008) 
pp. 426-431 

J. Scharnow, K. Tinnefeld, |. Wegener: The analysis 
of evolutionary algorithms on sorting and shortest 
paths problems, J. Math. Model, Algorithms 3(4), 
349-366 (2004) 

B. Doerr, E. Happ, C. Klein: Crossover can prov- 
ably be useful in evolutionary computation, Theor. 
Comput. Sci. 425, 17-33 (2012) 

B. Doerr, E. Happ, C. Klein: A tight analysis of the 
(1+ 1)-EA for the single source shortest path prob- 
lem, Proc. IEEE Congr. Evol. Comput. (CEC '07) (IEEE, 
Bellingham 2007) pp. 1890-1895 

C. Horoba, D. Sudholt: Ant colony optimization for 
stochastic shortest path problems, Proc. Genet. 
Evol. Comput. Conf. (GECCO 2010) (ACM, New York 
2010) pp. 1465-1472 

D. Sudholt, C. Thyssen: Running time analysis of 
ant colony optimization for shortest path prob- 
lems, J. Discret. Algorithms 10, 165-180 (2012) 

E. Alba, J.M. Troya: A survey of parallel distributed 
genetic algorithms, Complexity 4, 31-52 (1999) 

E. Alba, M. Tomassini: Parallelism and evolutionary 
algorithms, IEEE Trans. Evol. Comput. 6, 443-462 
(2002) 

E. Alba, N. Nedjah, L. de Macedo Mourelle: Parallel 
Evolutionary Computations (Springer, Berlin, Hei- 
delberg 2006) 

E. Alba: Parallel Metaheuristics: A New Class of Al- 
gorithms (Wiley-Interscience, New York 2005) 

T.G. Crainic, N. Hail: Parallel metaheuristics appli- 
cations. In: Parallel Metaheuristics: A New Class of 
Algorithms, (Wiley-Interscience, New York 2005) 

S. Droste, T. Jansen, |. Wegener: On the analysis of 
the (1+ 1) evolutionary algorithm, Theor. Comput. 
Sci. 276, 51-81 (2002) 

T. Friedrich, P.S. Oliveto, D. Sudholt, C. Witt: 
Analysis of diversity-preserving mechanisms for 
global exploration, Evol. Comput. 17(4), 455-476 
(2009) 

C. Witt: Worst-case and average-case approxima- 
tions by simple randomized search heuristics, Proc. 
22nd Symp. Theor. Asp. Comput. Sci. (STACS '05) 
(Springer, Berlin, Heidelberg 2005) pp. 44-56 


46.45 


46.46 


46.47 


46.48 


46.49 


46.50 


46.51 


46.52 


46.53 


46.54 


46.55 


46.56 


46.57 


46.58 


46.59 


46.60 


T. Jansen, K.A. De Jong, |. Wegener: On the 
choice of the offspring population size in evo- 
lutionary algorithms, Evol. Comput. 13, 413-440 
(2005) 

C. Igel, M. Toussaint: A no-free-lunch theorem 
for non-uniform distributions of target func- 
tions, J. Math. Model, Algorithms 3(4), 313-322 
(2004) 

J. Lassig, D. Sudholt: Experimental supplements 
to the theoretical analysis of migration in the is- 
land model, 11th Int. Conf. Parallel Probl. Solving 
Nat. (PPSN 2010) (Springer, Berlin, Heidelberg 2010) 
pp. 224-233 

F. Neumann: Expected runtimes of evolutionary al- 
gorithms for the Eulerian cycle problem, Comput. 
Oper. Res. 35(9), 2750-2759 (2008) 

B. Doerr, N. Hebbinghaus, F. Neumann: Speed- 
ing up evolutionary algorithms through asymmet- 
ric mutation operators, Evol. Comput. 15, 401-410 
(2007) 

B. Doerr, D. Johannsen: Adjacency list matchings - 
An ideal genotype for cycle covers, Proc. Genet. 
Evol. Comput. Conf. (GECCO '07) (ACM, New York 2007) 
pp. 1203-1210 

B. Doerr, C. Klein, T. Storch: Faster evolutionary 
algorithms by superior graph representation, 1st 
IEEE Symp. Found. Comput. Intell. (FOCI '07) (2007) 
pp. 245-250 

R.A. Watson, T. Jansen: A building-block royal road 
where crossover is provably essential, Proc. Genet. 
Evol. Comput. Conf. (GECCO '07) (ACM, New York 2007) 
pp. 1452-1459 

T. Jansen, |. Wegener: On the analysis of evolution- 
ary algorithms — A proof that crossover really can 
help, Algorithmica 34(1), 47-66 (2002) 

T. Jansen, |. Wegener: Real royal road functions — 
Where crossover provably is essential, Discret. Appl. 
Math. 149, 111-125 (2005) 

T. Storch, |. Wegener: Real royal road functions for 
constant population size, Theor. Comput. Sci. 320, 
123-134 (2004) 

S. Fischer, |. Wegener: The one-dimensional ising 
model: Mutation versus recombination, Theor. 
Comput. Sci. 344(2/3), 208-225 (2005) 

D. Sudholt: Crossover is provably essential for the 
ising model on trees, Proc. Genet. Evol. Comput. 
Conf. (GECCO '05) (ACM, New York 2005) pp. 1161-1167 
P.S. Oliveto, J. He, X. Yao: Analysis of the (1 + 1)-EA 
for finding approximate solutions to vertex cover 
problems, IEEE Trans. Evol. Comput. 13(5), 1006- 
1029 (2009) 

T. Jansen, P.S. Oliveto, C. Zarges: On the analysis 
of the immune-inspired B-cell algorithm for the 
vertex cover problem, Proc. 10th Int. Conf. Artif. 
Immune Syst. (ICARIS 2011) (Springer, Berlin, Heidel- 
berg 2011) pp. 117-131 

|. Wegener: Methods for the analysis of evolu- 
tionary algorithms on pseudo-Boolean functions. 
In: Evolutionary Optimization, ed. by R. Sarker, 


Parallel Evolutionary Algorithms 


References 


46.61 


46.62 


46.63 


X. Yao, M. Mohammadian (Kluwer, Dordrecht 2002) 
pp. 349-369 

F. Neumann, I. Wegener: Randomized local search, 
evolutionary algorithms, and the minimum span- 
ning tree problem, Theor. Comput. Sci. 378(1), 32- 
40 (2007) 

D. Sudholt, C. Zarges: Analysis of an iterated local 
search algorithm for vertex coloring, 21st Int. Symp. 
Algorithms Comput. (ISAAC 2010) (Springer, Berlin, 
Heidelberg 2010) pp. 340-352 

D. Sudholt: General lower bounds for the run- 
ning time of evolutionary algorithms, 11th Int. Conf. 
Parallel Probl. Solving Nat. (PPSN 2010) (Springer, 
Berlin, Heidelberg 2010) pp. 124-133 


46.64 


46.65 


46.66 


46.67 


P.K. Lehre: Fitness-levels for non-elitist popu- 
lations, Proc. 13th Annu. Genet. Evol. Comput. 
Conf. (GECCO '11) (ACM, New York 2011) pp. 2075- 
2082 

B. Doerr, D. Johannsen, C. Winzen: Drift analy- 
sis and linear functions revisited, IEEE Congr. Evol. 
Comput. (CEC '10) (2010) pp. 1967-1974 

E. Cantú Paz: A survey of parallel genetic algo- 
rithms, Tech. Rep., Illinois Genetic Algorithms Lab- 
oratory (University of Illinois at Urbana Champaign, 
Urbana 1997) 

M. Tomassini: Spatially Structured Evolutionary Al- 
gorithms: Artificial Evolution in Space and Time 
(Springer, Berlin, Heidelberg 2005) 


959 


9% |3 Hed 


47. Learning Classifier Systems


Martin V. Butz 


Learning Classifier Systems (LCSs) essentially combine fast
approximation techniques with evolutionary optimization
techniques. Despite their somewhat misleading name, LCSs are
not only suitable for classification problems but may rather
be viewed as a very general, distributed optimization
technique. Essentially, LCSs have very high potential to be
applied in any problem domain that is best solved or
approximated by means of a distributed set of local
approximations, or predictions. The evolutionary component is
designed to optimize a partitioning of the problem domain for
generating maximally useful predictions within each subspace
of the partitioning. The predictions are generated and adapted
by the approximation technique. Generally, any form of spatial
partitioning and prediction is possible, such as a
Gaussian-based partitioning combined with linear
approximations, yielding a Gaussian mixture of linear
predictions. In fact, such a solution is developed and
optimized by XCSF (XCS for function approximation). The LCS
XCS (X classifier system) and its function approximation
version XCSF are probably the most well-known LCS
architectures to date. Their optimization technique is very
well balanced with the approximation technique: as long as the
approximation technique quickly yields reasonably good
solutions and evaluations of these solutions, the evolutionary
component will pick up on the evaluation signal and optimize
the partitioning. This chapter provides historical background
on LCSs. Then XCS and XCSF are introduced in detail, providing
enough information to be able to implement, understand, and
apply these systems. Further LCS architectures are surveyed,
and their potential for future research and for applications
is discussed. The conclusions provide an outlook on the many
possible future LCS applications and developments.


47.1 Background
47.1.1 Early Applications
47.1.2 The Pitt and Michigan Approach
47.1.3 Basic Knowledge Representation
47.2 XCS
47.2.1 System Overview
47.2.2 When and How XCS Works
47.2.3 When and How to Apply XCS
47.2.4 Parameter Tuning in XCS
47.3 XCSF
47.4 Data Mining
47.5 Behavioral Learning
47.5.1 Reward-Based Learning with LCSs
47.5.2 Anticipatory Learning Classifier Systems
47.5.3 Controlling a Robot Arm with an LCS
47.6 Conclusions
47.7 Books and Source Code
References


Learning classifier systems (LCSs) are machine learning
algorithms that combine gradient-based approximation with
evolutionary optimization. Due to this flexibility, LCSs have
been successfully applied to classification and data mining
problems, reinforcement learning (RL) problems, regression
problems, cognitive map learning, and even robot control
problems.



The main feature of LCSs is their innovative combina- 
tion of two learning principles; whereas gradient-based 
approximation adapts local, predictive approximations 
of target function values, evolutionary optimization 
structures individual classifiers to enable the formation 
of effectively distributed and accurate approximations. 
The two learning methods interact bidirectionally in 


961 


v 
o 
= 
et 
m 
S 


962 


24 |3 Hed 


Part E 


Evolutionary Computation 


that the gradient-based approximations yield local fit- 
ness quality estimates of the generated approximations, 
which the evolutionary optimization technique uses 
for optimizing classifier structures. Concurrently, the 
evolutionary optimization technique is generating new 
classifier structures, which again need to be evaluated 
by the gradient-based approach in competition with the 
other, locally overlapping, interacting classifiers. 

Due to the innovative combination of two learning 
and optimization techniques, LCSs are often perceived 
as being hard to understand. Facet-wise analyses of 
the individual LCS components and their interactions, 
however, give both mathematical scalability bounds 
for learning and an intuitive understanding of the sys- 
tems in general. Moreover, the currently most common 
LCS, which is the XCS classifier system (note that
the X in XCS does not really encode any particular
acronym according to the system creator Wilson), is
comparatively easy to understand, to tune, and to ap-
ply. Thus, the core of this chapter focuses on XCS,
gives a facet-wise overview of its functionality, de-
tails several enhancements, and highlights various suc-
cessful application domains. However, XCS is also
compared with other LCS architectures, and LCSs in
general are compared with other machine learning tech-
niques.

This chapter starts with a historical perspective,
providing information on the beginnings of LCSs and
establishing some terminology background. We then in-
troduce the XCS classifier system, providing a detailed
system overview as well as theoretical and facet-wise
conceptual insights on its performance. Tricks
and tweaks for tuning the system to the
problem at hand are also discussed. Next, the XCS counterpart for regres-
sion problems, XCSF, is introduced. Focusing then on
the application side, LCS applications to data mining
tasks and to behavioral learning and cognitive modeling
tasks are surveyed. We cover various LCS architectures
that have been successfully applied in the data mining
realm. With respect to behavioral learning, we point out
the relation of LCSs to reinforcement learning. More-
over, we cover anticipatory learning classifier systems
(ALCSs), which learn predictive schema models of
the environment rather than reward prediction maps,
and we introduce the modified XCSF version that can
effectively learn a redundant forward-inverse kinemat-
ics model of a robot arm. A summary and conclusions
wrap up the chapter.


47.1 Background


Learning classifier systems (LCSs) were proposed over
30 years ago by Holland [47.1-3]. Originally, Hol-
land and Reitman actually called LCSs cognitive sys-
tems [47.4], focusing on problems related to reinforce-
ment learning (RL) [47.5,6]. Their cognitive system
developed a memory of classifiers, where each classi-
fier consisted of a condition part (taxon), an action part
(originally consisting of a message and an effector bit),
a payoff prediction part, and several other parameters
that stored the age, the application frequency, and the
attenuation of the classifier.

Concurrently with the development of temporal dif-
ference learning techniques in RL (such as the now
well-known state-action-reward-state-action (SARSA)
algorithm [47.6]), Holland and Reitman introduced
the bucket brigade algorithm [47.4, 7], which also dis-
tributes reward backwards in time with a discounting
mechanism. In addition, the attenuation parameter in
a classifier realized something similar to an eligibility
trace in RL, distributing a currently encountered re-
ward also to classifiers that were active several time
steps ago and that thus indirectly led to gaining the cur-
rently experienced reward. Meanwhile, Holland's cog-
nitive system applied a genetic algorithm (GA) [47.1,
8] as its second learning mechanism. The GA modified
the taxa in Holland and Reitman's cognitive system.

In sum, the first actual LCS implementation, i.e., 
the cognitive system by Holland and Reitman [47.4], 
was ahead of its time. It implemented various reward- 
related ideas that were later established in the reinforce- 
ment learning community — and can now partially be 
regarded as standard RL techniques. However, the com- 
bination with GAs yielded a highly interactive and very 
complex system that was and still is hard to analyze. 
Thus, although it proposed a highly innovative cognitive
learning approach, the system's applicability
remained limited at the time.


47.1.1 Early Applications 


Nonetheless, early applications of LCSs were pub-
lished in the 1980s. Smith developed a poker deci-
sion making system [47.9] based on De Jong's ap-
proach to LCSs [47.10]. Booker worked on animal-like
automation based on the cognitive systems architec-
ture [47.11]. Wilson proposed and worked on the animat
problem with LCS architectures derived from Hol- 
land and Reitman’s cognitive systems approach [47.12, 
13]. Goldberg solved a gas pipeline control task with 
a simplified version of the cognitive system archi- 
tecture [47.8, 14]. Despite these successful early ap- 
plications, a decade passed until a growing research 
community developed that worked on learning classi- 
fier systems. 


47.1.2 The Pitt and Michigan Approach 


Two fundamentally different LCS approaches were pur- 
sued from early on. The Pitt approach was fostered by 
the work of De Jong et al. [47.10, 15, 16]. On the other 
hand, the Michigan approach was developed in subsequent
years at Michigan under the supervision of John H.
Holland [47.11, 14, 17, 18]. Diverse perspectives on the 
Michigan approach can be found in [47.19]. 

The essential difference between the two ap-
proaches is that in the Pitt approach rule sets are evolved
where each particular rule set constitutes an individual
for the GA. In contrast, in the Michigan approach one
set of rules is evolved and each rule is an individual for
the GA. As a consequence, the Pitt-style LCSs are much
closer to general GAs because each individual consti-
tutes an overall problem solution. In the Michigan-style
LCSs, on the other hand, each individual only applies
in a subspace of the overall problem and only the whole
set of rules that evolves constitutes the overall problem
solution. Figure 47.1 illustrates this fundamental con-
trast between the two approaches.

Fig. 47.1a,b (a) Pitt approach, (b) Michigan approach.
While the Pitt approach to LCSs evolves
a population of sets of rules, in the Michigan approach
there is only one set of rules (i.e., the population) that is
evolved

As a consequence of this contrast, Pitt-style LCSs 
usually apply rather standard GA approaches. The 
whole population of rule sets is evolved. For fitness 
evaluation purposes, each set of rules needs to be 
evaluated in the problem environment addressed. On 
the other hand, Michigan-style LCSs need to con- 
tinuously interact with an environment to sufficiently 
evaluate all the rules in the rule set — essentially ex- 
ploring all the environmental subspaces to make sure all 
rules can develop a sufficiently useful fitness estimate. 
This continuous interaction and the typical interacting 
components of Michigan-style LCSs are illustrated in 
further detail in Fig. 47.2. Due to the continuously de- 
veloping fitness estimates, often a more steady-state, 
niched GA is applied online in Michigan-style LCSs. 
The undertaken updates then depend directly on the cur- 
rent interaction and thus on the current subset of rules 
relevant in the experienced interaction. The steady- 
state, niched GA optimizes the internal knowledge 
base iteratively depending on the incoming learning 
samples. 
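
The contrast in what counts as a GA individual can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: rules are (ternary condition, action) pairs, a rule set is read as a decision list (first matching rule decides), and both evaluations are computed on a fixed batch of labeled samples, whereas a Michigan-style LCS would accumulate its per-rule estimates online. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, int]          # (ternary condition over '0', '1', '#', action)
Sample = Tuple[str, int]        # (binary problem instance, correct class)


def matches(condition: str, state: str) -> bool:
    """A '#' matches both 0 and 1; other symbols must agree with the state."""
    return all(c == "#" or c == s for c, s in zip(condition, state))


def classify(rule_set: List[Rule], state: str, default: int = 0) -> int:
    """Decision-list reading of a rule set (an assumption of this sketch)."""
    for condition, action in rule_set:
        if matches(condition, state):
            return action
    return default


def pitt_fitness(rule_set: List[Rule], samples: List[Sample]) -> float:
    """Pitt style: one fitness value for a complete rule set."""
    correct = sum(classify(rule_set, s) == label for s, label in samples)
    return correct / len(samples)


def michigan_rule_accuracy(rule: Rule, samples: List[Sample]) -> Optional[float]:
    """Michigan style: one estimate per rule, from the samples it matches."""
    condition, action = rule
    matched = [label for s, label in samples if matches(condition, s)]
    if not matched:
        return None  # the rule was never applicable, so nothing is known yet
    return sum(action == label for label in matched) / len(matched)


# Toy problem: the class equals the first bit.
samples = [("00", 0), ("01", 0), ("10", 1), ("11", 1)]
pitt_population = [[("1#", 1), ("0#", 0)], [("##", 1)]]   # individuals = rule sets
michigan_population = [("1#", 1), ("0#", 0), ("#1", 1)]   # individuals = rules
```

In the Pitt-style population each list element is a candidate solution to the whole problem, while in the Michigan-style population only the three rules together form the solution.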


[Diagram: learning classifier system architecture. A population of
classifiers (condition, action, reward) is modified by the evolutionary
learning component (rule selection, reproduction, mutation,
recombination, and deletion) and evaluated by the reinforcement learning
component (reward propagation and rule evaluation). Action decision
making (behavioral policy) sends an action (classification) to the
environment (problem), which returns reinforcement feedback and the next
problem instance (state information).]

Fig. 47.2 LCSs consist of a knowledge base (population of
classifiers), a genetic algorithm for rule structure evolution,
and a reinforcement learning component for rule evalua-
tion, reward propagation, and decision making. The system
interacts with its environment or problem iteratively, learn-
ing online




In summary, Pitt-style LCSs evaluate and opti- 
mize their rule sets globally based on sets of problem 
instances. They usually learn offline. Michigan-style 
LCSs evaluate and optimize their set of rules online 
while interacting with the problem, iteratively perceiv- 
ing problem instances. The major qualities of Pitt-style 
LCSs are that they evolve competing global problem 
solutions in the form of sets of rules. Evolutionary 
rule structure optimization is used, typically evolving
small sets of rules (tens of rules). Michigan-style LCSs, on the
other hand, are designed to develop one distributed, lo-
cally optimized problem solution by combining local
gradient-based approximation techniques with steady-
state, niched GAs. In consequence, typically larger,
more distributed sets of rules develop, yielding problem
solutions with potentially thousands of rules.


47.1.3 Basic Knowledge Representation 


Because an exemplary knowledge representation was 
already discussed for the early cognitive system imple- 
mentation of [47.4], we now provide a general sketch 
of the knowledge representation typically found in 
Michigan-style LCSs. 

The knowledge representation of an LCS consists 
of a finite population of classifiers (that is, a finite set 
of rules). This population of classifiers essentially rep- 
resents the current knowledge of the LCS about the 
problem the system is applied to. Each rule — or clas- 
sifier — usually consists of a condition and an action 
part, as well as a prediction and a fitness estimate. The 
condition part specifies the problem subspace in which 
the classifier is applicable. When the condition part is 
satisfied given a particular problem instance, a classi- 
fier is said to match that problem instance. The action 
part specifies an action that may be executed, or a clas- 
sification that may be tested. The prediction specifies 
the expected reward, or feedback value, given the spec- 
ified action was executed under the specified contextual 
conditions. The fitness estimates the value of this classi- 
fier relative to other, competing classifiers. In the early 
approaches, fitness was often simply equal to the pre- 
diction value. In the currently established LCSs, fitness 
typically estimates the accuracy of the prediction. 

Michigan-style LCSs usually learn online about 
a problem, iteratively perceiving or actively generating 
problem instances. Given a particular problem instance, 
first, the system forms a match set of those classifiers 
in the population whose conditions match. Next, the 
system decides on an action or classification and ex-
ecutes it. Classifiers in the match set that specify the
executed action constitute the current action set. After 
feedback is received, the predictions of the classifiers 
in the action set are adjusted. From the classifier pre- 
diction estimates, a fitness estimate is derived for each 
classifier. Finally, the steady-state GA is applied to the 
match set or the population as a whole. The GA mod- 
ifies classifier structures by reproducing, mutating, and 
recombining well-performing classifiers and by delet- 
ing ill-performing ones. In contrast to the Michigan 
approach, Pitt-style LCSs evaluate their sets of rules 
typically independently of each other in the provided 
problem. The GA exchanges rules and rule-structures 
within and across the sets of rules. 
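
The Michigan-style cycle described above (match set, action decision, action set, prediction adjustment) can be sketched as follows. This is a minimal illustration, not a reference implementation: classifiers are plain dictionaries, the action is chosen uniformly at random among the actions present in the match set, the prediction is adjusted by a simple moving-average step with a hypothetical learning rate `beta`, and the fitness derivation and the GA step are omitted.

```python
import random


def matches(condition, state):
    """Ternary matching: '#' is a wildcard, other symbols must agree."""
    return all(c == "#" or c == s for c, s in zip(condition, state))


def one_iteration(population, state, reward_fn, beta=0.2, rng=None):
    """One online learning step; returns (action, reward, action_set).

    `reward_fn(state, action)` stands in for the environment feedback and
    is an assumption of this sketch.
    """
    rng = rng or random.Random()
    # 1. Match set: all classifiers whose condition matches the instance.
    match_set = [cl for cl in population if matches(cl["condition"], state)]
    if not match_set:
        raise ValueError("empty match set; a covering mechanism would be needed")
    # 2. Decide on an action (here: uniformly at random, i.e., pure exploration).
    action = rng.choice(sorted({cl["action"] for cl in match_set}))
    # 3. Action set: matching classifiers that specify the executed action.
    action_set = [cl for cl in match_set if cl["action"] == action]
    # 4. Receive feedback and adjust the predictions in the action set.
    reward = reward_fn(state, action)
    for cl in action_set:
        cl["prediction"] += beta * (reward - cl["prediction"])
    # 5. A steady-state GA would be applied here (omitted in this sketch).
    return action, reward, action_set


population = [
    {"condition": "1#", "action": 1, "prediction": 0.0},
    {"condition": "0#", "action": 0, "prediction": 0.0},
]
action, reward, action_set = one_iteration(
    population, "10", lambda s, a: 1000.0 if a == int(s[0]) else 0.0
)
```

In this toy call only the first classifier matches the instance "10", so it alone forms the action set and its prediction moves one step towards the received reward.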

A Michigan-style LCS consequently is an interac- 
tive, online learning system. It maintains a population 
of classifiers as its knowledge base. It applies a niched, 
steady-state genetic algorithm for gradual rule structure 
evolution; it applies a gradient-based learning com- 
ponent for rule evaluation — yielding prediction and 
fitness estimates. Michigan-style LCSs are often ap- 
plied in RL scenarios in which reward estimates need 
to be propagated and action decisions are made based 
on the learned reward prediction estimates. In this 
case, typically techniques similar to SARSA learning 
or Q-learning are applied. Figure 47.2 shows the basic 
components of a Michigan-style LCS as well as their 
interactions. 

The earliest Michigan-style LCS implementation 
is the introduced cognitive system CS1 [47.4]. After 
various early applications of LCSs, Wilson set a mile- 
stone in LCS research by introducing the zeroth level 
classifier system ZCS [47.20] and the now most promi- 
nent and well-known LCS: the XCS classifier sys- 
tem [47.21]. Both systems were explicitly compared 
to the very well-known Q-learning [47.22] technique 
from the RL community, offering with ZCS and XCS 
two learning classifier systems that can learn Q-value 
functions with a compact highly generalized rule-based 
representation. 

In the following, we now first give a precise in- 
troduction to the XCS classifier system. We then also 
introduce the real-valued version for solving regression 
problems, with a Gaussian mixture of linear approxi-
mations, i.e., XCSF. After that, we provide spotlights
on various current application domains where vari- 
ous types of LCSs, including XCS(F), have produced 
highly competitive problem solutions, when compared 
to other machine learning techniques and regression 
algorithms. 




47.2 XCS 


Wilson introduced the XCS classifier system in 
1995 [47.21]. The two main novel features of XCS 
in comparison to earlier Michigan-style LCSs are its 
accuracy-based fitness estimation and its niche-based 
application of the evolutionary component. The intro- 
duction of accuracy-based fitness essentially decoupled 
the classifier fitness estimate from the reward pre- 
diction, enforcing that XCS learned complete payoff 
landscapes rather than only estimates for those sub- 
spaces where high reward is encountered. In addition, 
Wilson related XCS directly to Q-learning [47.21,
22]. Much later, even a relation to Kalman filtering 
and general regression tasks was made mathemati- 
cally explicit [47.23,24]. The niche-based GA repro- 
duction combined with population-wide deletion en- 
abled a much more focused GA-based optimization 
of classifier structures as well as the generalization 
of classifier structures based on the sampling distri-
bution [47.25]. In consequence, XCS is an LCS that
is designed to evolve not only the best solution to
a problem but also all alternative solutions, together with
associated Q-value estimates and variance esti-
mates of the respective Q-value estimates. Due to its
GA design and fitness definition, XCS strives to ap- 
proximate the full Q-table of a problem with a maxi- 
mally accurate and maximally compact classifier-based 
representation. 

Despite its original strong relation to Q-learning and 
RL in general, XCS has also been applied successfully 
to classification problems and regression problems. In 
the former case, XCS identifies locally relevant features 
for the generation of maximally accurate classification 
estimates. In the latter case, XCS optimizes the distri- 
bution and structure of local, typically linear estimators 
for a maximally accurate approximation of the func- 
tion surface. Thus, despite its original strong relation 
to RL, XCS is a much more generally applicable learn- 
ing system that can solve single-step classification or 
regression problems as well as multi-step RL prob- 
lems, which are typically defined as Markov decision 
processes. 


47.2.1 System Overview 


XCS evolves one population of classifiers. Classifier 
structures are optimized by means of a steady-state GA. 
A classifier consists of a condition part C, an action
part A, and estimates of the reward prediction r, the reward
prediction error ε, and the fitness f. While the condition and action
structures are iteratively optimized by the steady-state
GA, the estimates are adjusted using the Widrow-Hoff
delta rule [47.26] based on an approximation of the Q-
value signal.
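
As a brief illustration, the Widrow-Hoff delta rule moves an estimate a fraction of the way towards the currently observed target. The sketch below applies it to the reward prediction r and the prediction error ε; the learning rate name `beta`, its value, and the choice to update ε before r (so that the error is measured against the prediction that was actually in place) are assumptions of this example rather than details taken from the text.

```python
def delta_update(estimate: float, target: float, beta: float) -> float:
    """Widrow-Hoff step: estimate <- estimate + beta * (target - estimate)."""
    return estimate + beta * (target - estimate)


def update_estimates(r: float, eps: float, payoff: float, beta: float = 0.2):
    """Update prediction error eps and reward prediction r for one payoff.

    eps tracks a moving average of the absolute prediction error |payoff - r|;
    r tracks a moving average of the payoff itself.
    """
    eps = delta_update(eps, abs(payoff - r), beta)  # uses the old prediction
    r = delta_update(r, payoff, beta)
    return r, eps


# Two consecutive updates towards a constant payoff of 1000.
r1, eps1 = update_estimates(0.0, 0.0, 1000.0)
r2, eps2 = update_estimates(r1, eps1, 1000.0)
```

With a constant payoff, r converges geometrically to that payoff, while ε first rises and then decays towards zero as the prediction becomes accurate.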

While condition and action parts can be generally 
represented in any way desired [47.25], in this overview 
we focus on binary problems and a ternary representa- 
tion of the condition part. Conventionally, the condition 
part C is coded by C € {0, 1, #}", where the # symbol 
matches zero and one. Condition C essentially speci- 
fies a hypercube within which the classifier matches and 
can be said to cover a certain volume of the complete 
problem space. Action part A € A defines an action or 
classification from a provided finite set of possible ac- 
tions A. Reward prediction r € R estimates the moving 
average of the received reward in the recent activations 
of the classifier. Reward prediction error € estimates 
the moving average of the absolute error of the reward 
prediction. Finally, fitness f € [0, 1] estimates the mov- 
ing average of the relative accuracy of the classifier 
compared to the competing classifiers in the activated 
match sets (or action sets). The larger the fitness esti- 
mate, the on average larger the accuracy of a classifier 
in comparison to all classifiers that encode the same 
action and whose condition parts define overlapping 
subspaces. 

Each classifier also maintains several additional
parameters. The action set size estimate as estimates the
moving average of the sizes of the action sets the classifier
was part of. It is updated similarly to the reward
prediction r. A time stamp ts specifies the last time the
classifier was part of a GA competition. An experience
counter exp specifies the number of applied
parameter updates. The numerosity num specifies the
number of (micro-)classifiers that this macro-classifier
actually represents, mainly to save computation
time. 
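
The classifier structure and the ternary matching rule described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the field names and the default initial values are our own assumptions, and the volume computation assumes a binary problem space.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    """One XCS macro-classifier (field names and defaults are illustrative)."""
    condition: str      # ternary string over {'0', '1', '#'}
    action: int         # element of the finite action set
    r: float = 0.0      # reward prediction
    eps: float = 0.0    # reward prediction error
    f: float = 0.01     # fitness (assumed small initial value)
    as_: float = 1.0    # action set size estimate
    ts: int = 0         # time stamp of the last GA competition
    exp: int = 0        # number of parameter updates applied so far
    num: int = 1        # numerosity (micro-classifiers represented)

    def matches(self, state: str) -> bool:
        """True if every non-# symbol equals the corresponding input bit."""
        return len(self.condition) == len(state) and all(
            c == "#" or c == s for c, s in zip(self.condition, state)
        )

    def volume(self) -> float:
        """Fraction of the binary problem space covered by the condition."""
        return 2.0 ** (self.condition.count("#") - len(self.condition))


cl = Classifier(condition="1#0#", action=1)
matched = [s for s in ("1000", "1101", "0000", "1110") if cl.matches(s)]
```

Here the condition 1#0# matches the inputs 1000 and 1101 and covers one quarter of the four-bit problem space.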

Learning usually starts with an empty population. 
The problem faced is sampled iteratively, encountering
particular problem instances s ∈ S. The set of all
matching classifiers in the classifier population [P] is
termed the match set [M]. If some action in 𝒜 is not
represented in [M], a covering mechanism is applied.
Covering creates classifiers that match s (inserting
#-symbols in the new C with a probability P_# at each
position) and that specify the unrepresented actions. [M]
essentially contains all the knowledge of XCS about the 
current problem instance. Given [M], XCS estimates the 
payoff for each possible action, forming a prediction
array P(A),

$$
P(A) = \frac{\sum_{cl \in [M] \,\wedge\, cl.A = A} cl.r \cdot cl.f}{\sum_{cl \in [M] \,\wedge\, cl.A = A} cl.f} , \tag{47.1}
$$


where classifier parameters are addressed using the dot 
notation. P(A) computes the fitness-averaged Q-value 
estimates for each action in the current state s. Thus, 
P(A) can be used to decide on the currently most 
promising action. 
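
The prediction array of (47.1) can be sketched in a few lines. The tuple layout (action, r, f) for the classifiers in [M] is an assumption made only to keep the example self-contained.

```python
from collections import defaultdict


def prediction_array(match_set):
    """Fitness-weighted average of the reward predictions per action (47.1).

    match_set: iterable of (action, r, f) tuples, one per classifier in [M].
    """
    weighted_sum = defaultdict(float)
    fitness_sum = defaultdict(float)
    for action, r, f in match_set:
        weighted_sum[action] += r * f
        fitness_sum[action] += f
    # Actions whose classifiers have zero total fitness are left out.
    return {a: weighted_sum[a] / fitness_sum[a]
            for a in weighted_sum if fitness_sum[a] > 0.0}


match_set = [(0, 1000.0, 0.5), (0, 0.0, 0.5), (1, 200.0, 0.9), (1, 600.0, 0.1)]
P = prediction_array(match_set)
best_action = max(P, key=P.get)   # greedy choice on the prediction array
```

In this toy match set, action 0 receives the prediction 500 and action 1 the prediction 240, so a greedy policy would choose action 0.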

Any action selection policy may be applied, such 
as choosing the action with the largest Q-value ex- 
pectation. Because XCS relies on exploring the com- 
plete problem spaces, however, it is important that 
all actions are applied sufficiently frequently.
Alternatively, the prediction error estimates may also
be considered for action selection, choosing, for
example, the action with the highest fitness-averaged
ε value with the aim of maximizing information
gain (see also more elaborate techniques surveyed
recently in the computational intelligence literature
[47.27]).

After the choice of an action A, an action set
[A] is formed, which contains all classifiers in [M]
that specify the chosen action. Moreover, the chosen
action is executed, feedback is received in the
form of scalar reward R ∈ ℝ, and the next problem
instance may be perceived. In conjunction with
the maximum P(A) derived from the resulting match
set, the [A] formed is updated according to the estimated
Q-value signal, which is R + γ max_{A ∈ 𝒜} P(A).
Moreover, the steady-state GA may be applied, reproducing
two classifiers in [A], but choosing classifiers
from [P] for deletion. In classification problems, often
also termed single-step problems, the Q-learning
update only considers the immediate reward R. Figure 47.3
illustrates the iterative learning process applied
in XCS.


Rule Evaluation 
To evaluate the classifiers, it is crucial to update 
their parameter estimates and derive a relative fit- 
ness estimate. Parameter updates are applied itera- 
tively in respective action sets. Usually, the predic- 
tion error is updated before the prediction and the 
fitness. Other parameters may be updated in any 
order. 

In particular, the reward prediction error ε of each
classifier in [A] is updated by

$$
\varepsilon \leftarrow \varepsilon + \beta \left( |\rho - r| - \varepsilon \right) , \tag{47.2}
$$

where ρ = R in classification problems and

$$
\rho = R + \gamma \max_{A \in \mathcal{A}} P(A)
$$

in multi-step reinforcement learning problems. Parameter
β ∈ [0, 1] specifies a learning rate, which is typically
set to values between 0.05 and 0.2. The higher the value
of β is, the more the ε value depends on the most recent
problem interactions. Next, the reward prediction r of
each classifier in [A] is updated by

$$
r \leftarrow r + \beta (\rho - r) . \tag{47.3}
$$


Note that XCS essentially applies Q-learning updates, 
where Q-values are not approximated by a tabular entry 
but by a collection of rules expressed in the prediction 
array P(A) [47.21]. 
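
The two updates (47.2) and (47.3) can be sketched as follows. The function names are hypothetical, and the default values β = 0.2 and γ = 0.9 are only the typical settings quoted in this chapter.

```python
def q_target(reward, next_prediction_array=None, gamma=0.9):
    """Target rho: R in single-step problems, R + gamma * max P(A) otherwise."""
    if not next_prediction_array:
        return reward
    return reward + gamma * max(next_prediction_array.values())


def update_error_and_prediction(eps, r, rho, beta=0.2):
    """Apply (47.2) and then (47.3): the error is updated before the prediction."""
    eps = eps + beta * (abs(rho - r) - eps)
    r = r + beta * (rho - r)
    return eps, r


# Multi-step example: no immediate reward, best next-state prediction is 1000.
rho = q_target(reward=0.0, next_prediction_array={0: 500.0, 1: 1000.0})
eps, r = update_error_and_prediction(eps=0.0, r=0.0, rho=rho)
```

With these numbers the target is 900, and both the error and the prediction move one fifth of the way towards it, that is, to 180.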


Fig. 47.3 The XCS classifier system learns iteratively online. In each iteration it forms a match set given the current problem instance. Next, it chooses an action or classification and applies it. After the perception of feedback, the classifiers in the corresponding action set [A] are updated and the steady-state GA is applied. After that, the next problem iteration proceeds


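To update the fitness estimate of each classifier
in [A], a current scaled relative accuracy κ′ is determined.

$$
\kappa = \begin{cases} 1 & \text{if } \varepsilon < \varepsilon_0 \\ \alpha \left( \dfrac{\varepsilon}{\varepsilon_0} \right)^{-\nu} & \text{otherwise} \end{cases} \tag{47.4}
$$

$$
\kappa' = \frac{\kappa \cdot num}{\sum_{cl \in [A]} cl.\kappa \cdot cl.num} \tag{47.5}
$$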
clE[A] 


κ essentially measures the current inverse error of
a classifier. ε₀ specifies the targeted error below which
a classifier is considered maximally accurate. κ′ then
determines the current relative accuracy with respect
to all other classifiers in the current action set [A].
Thus, each classifier in [A] competes for a limited fitness
resource, which is distributed relative to the current
accuracy estimates κ. Finally, the fitness estimate f is
updated given the current κ′ by

$$
f \leftarrow f + \beta (\kappa' - f) . \tag{47.6}
$$

In effect, fitness reflects the moving average of the
set-relative accuracy of a classifier. As before, β controls
the sensitivity of the fitness estimates to changes in the
population. 
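
As a sketch of (47.4) to (47.6), the fitness update of one action set may be written as below. The dictionary keys are illustrative assumptions, and the defaults ε₀ = 0.1, α = 1, ν = 5, and β = 0.2 are the typical settings listed later in this chapter.

```python
def accuracy(eps, eps0=0.1, alpha=1.0, nu=5.0):
    """Accuracy kappa of (47.4)."""
    return 1.0 if eps < eps0 else alpha * (eps / eps0) ** (-nu)


def update_fitness(action_set, beta=0.2, **accuracy_params):
    """Update the fitness of every classifier in [A] in place.

    action_set: list of dicts with the keys 'eps', 'num', and 'f'.
    """
    kappas = [accuracy(cl["eps"], **accuracy_params) for cl in action_set]
    total = sum(k * cl["num"] for k, cl in zip(kappas, action_set))
    for k, cl in zip(kappas, action_set):
        relative = k * cl["num"] / total          # kappa' of (47.5)
        cl["f"] += beta * (relative - cl["f"])    # (47.6)
    return action_set


action_set = [
    {"eps": 0.05, "num": 1, "f": 0.0},   # accurate classifier
    {"eps": 0.20, "num": 1, "f": 0.0},   # error twice the threshold
]
update_fitness(action_set)
```

Because the relative accuracies in one action set sum to one, the fitness added in a single update sums to β, and the accurate classifier receives almost all of it.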

The action set size estimate as is updated similarly
to the reward prediction r but with respect to the current
action set size |[A]|

$$
\mathit{as} \leftarrow \mathit{as} + \beta \left( |[A]| - \mathit{as} \right) , \tag{47.7}
$$

so that as adapts to changes in |[A]| at a rate similar
to that of the fitness changes. Parameters r,
ε, and as are updated using the moyenne adaptative
modifiée technique [47.28]. This technique sets parameter
values directly to the average of the cases encountered
so far until the resulting update rate is smaller than β
(which is the case after 1/β updates). Finally, the experience
counter exp is increased by one. If the GA is applied, 
the time stamps ts of all classifiers in [A] are set to the 
current iteration time t. 
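
The averaging technique described above can be sketched as a single update function. The exact switch-over convention (strictly fewer than 1/β updates) is an assumption; implementations differ slightly on this point.

```python
def mam_update(value, target, exp, beta=0.2):
    """Running-average update with a Widrow-Hoff fallback.

    exp is the experience counter including the current update. While
    exp < 1/beta the plain average of all targets seen so far is kept;
    afterwards the fixed learning rate beta is used.
    """
    rate = 1.0 / exp if exp < 1.0 / beta else beta
    return value + rate * (target - value)


value = 0.0
for exp, target in enumerate([10.0, 20.0, 30.0], start=1):
    value = mam_update(value, target, exp)
# value now equals the plain average of the three targets
```

The effect is that young classifiers obtain sensible estimates after very few updates instead of slowly drifting away from their initial values.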


Rule Evolution 
XCS applies a steady-state genetic algorithm (GA) for 
rule evolution. Given a current action set [A], the GA
is invoked if the average time since the last GA application
(stored in parameter ts) in [A] is larger than
threshold θ_GA. This mechanism is applied to ensure sufficient
evaluation of classifiers, as well as to control
unbalanced sampling. The higher the threshold θ_GA is,
the slower evolution proceeds, but also the less prone
XCS is to unbalanced problem sampling [47.29].

The steady-state GA first selects two parental clas- 
sifiers for reproduction in [A]. While this selection 
process was done by proportionate selection based on 
fitness in the original XCS, more recently it was shown 
that tournament selection can improve the robustness of 
the system highly significantly [47.30]. Tournament selection
in XCS chooses the classifier with the highest
fitness from a tournament of randomly chosen classifiers
from [A]. The tournament size is usually set
relative to the current action set size |[A]| to τ · |[A]|. Two
classifiers are selected in two independent tournaments. 
The selected classifiers are reproduced generating the 
offspring. Crossover and mutation are applied to the 
offspring. The parents stay in the population. Mutation
usually changes each condition and action symbol
randomly with a certain probability μ. Crossover exchanges
condition and action symbols. Often, simple
uniform crossover is applied (exchanging each symbol
with a probability of 0.5). However, more sophisticated
estimation of distribution algorithms (EDAs) have also
been applied for more effective building block process- 
ing [47.31]. 
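
The basic genetic operators described above can be sketched as follows. This is a simplified illustration under stated assumptions: the tournament is held over macro-classifiers, only condition strings are recombined and mutated, and the dictionary keys are hypothetical.

```python
import random


def tournament_select(action_set, tau=0.4, rng=random):
    """Return the fittest member of a random tournament of size tau * |[A]|."""
    size = max(1, round(tau * len(action_set)))
    return max(rng.sample(action_set, size), key=lambda cl: cl["f"])


def uniform_crossover(c1, c2, rng=random):
    """Swap each symbol between the two offspring with probability 0.5."""
    pairs = [(b, a) if rng.random() < 0.5 else (a, b) for a, b in zip(c1, c2)]
    return "".join(p[0] for p in pairs), "".join(p[1] for p in pairs)


def mutate(condition, mu, rng=random):
    """Replace each symbol by a different ternary symbol with probability mu."""
    return "".join(
        rng.choice([s for s in "01#" if s != c]) if rng.random() < mu else c
        for c in condition
    )


rng = random.Random(0)
action_set = [
    {"c": "1#0#", "f": 0.2},
    {"c": "1###", "f": 0.7},
    {"c": "100#", "f": 0.1},
]
parent_a = tournament_select(action_set, tau=1.0, rng=rng)  # full tournament
parent_b = tournament_select(action_set, tau=0.4, rng=rng)
child_a, child_b = uniform_crossover(parent_a["c"], parent_b["c"], rng)
child_a = mutate(child_a, mu=1.0 / len(child_a), rng=rng)
```

With τ = 1 the tournament contains the whole action set, so selection becomes deterministic; smaller values of τ leave room for less fit classifiers to reproduce.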

The offspring parameters are initialized by setting
prediction r, ε, f, and as to the parental values. Fitness
f is often decreased to 10% of the parental fitness. Experience
counter exp and numerosity num are set to one.

The resulting offspring classifiers are finally added 
to the population. In this case, GA subsumption may be 
applied [47.32] to stress generalization. GA subsump- 
tion searches for another classifier in [A] that may sub- 
sume an offspring classifier. This classifier must have 
a more general condition than the offspring classifier, 
its error estimate must be below ε₀, and its experience
counter must be sufficiently high (exp > θ_sub). If such
a classifier is found, the offspring is subsumed, increas- 
ing the numerosity of the more general classifier by one 
and discarding the offspring. 
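
GA subsumption as described above can be sketched as follows. The dictionary keys are illustrative, and the additional check that both classifiers specify the same action is our assumption (it matters only if the action of the offspring was mutated).

```python
def is_more_general(general, specific):
    """True if `general` covers everything `specific` covers, and more."""
    if len(general) != len(specific) or general == specific:
        return False
    return all(g == "#" or g == s for g, s in zip(general, specific))


def can_subsume(cl, offspring, eps0=0.1, theta_sub=20):
    """Check the three criteria: general enough, accurate, and experienced."""
    return (
        cl["action"] == offspring["action"]   # assumed extra check
        and cl["exp"] > theta_sub
        and cl["eps"] < eps0
        and is_more_general(cl["condition"], offspring["condition"])
    )


def ga_subsumption(action_set, offspring, **thresholds):
    """Return True if the offspring was absorbed by a classifier in [A]."""
    for cl in action_set:
        if can_subsume(cl, offspring, **thresholds):
            cl["num"] += 1      # the subsumer grows, the offspring is dropped
            return True
    return False


action_set = [{"condition": "1###", "action": 1, "eps": 0.01, "exp": 50, "num": 3}]
offspring = {"condition": "1#0#", "action": 1}
subsumed = ga_subsumption(action_set, offspring)
```

If no subsumer is found, the offspring would be inserted into the population as usual.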

The population of classifiers [P] has a maximum
size N. When this size is exceeded after offspring
insertion, classifiers are deleted from [P]. Proportionate
selection is applied for deletion, depending on the action
set size estimates as. Note that tournament selection is
not suitable in this case because a balance in the action
set sizes is most desirable. The likelihood of deletion
of a classifier is further increased by a factor f̄/f if this
classifier is experienced (exp > θ_del) and additionally
its fitness f is below a fraction δ of the average fitness f̄
in the population. 
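
A possible sketch of this deletion scheme is given below. It follows the convention, common in algorithmic descriptions of XCS, of weighting the vote by numerosity and comparing the fitness per micro-classifier with the population average; these details, the guard against zero fitness, and the dictionary keys are assumptions that go beyond the text above.

```python
import random


def deletion_vote(cl, mean_fitness, theta_del=20, delta=0.1):
    """Deletion vote: action set size estimate, scaled up for unfit rules."""
    vote = cl["as"] * cl["num"]
    fitness_per_micro = max(cl["f"] / cl["num"], 1e-12)  # avoid division by zero
    if cl["exp"] > theta_del and fitness_per_micro < delta * mean_fitness:
        vote *= mean_fitness / fitness_per_micro
    return vote


def select_for_deletion(population, rng=random, **thresholds):
    """Roulette-wheel selection over the deletion votes."""
    total_num = sum(cl["num"] for cl in population)
    mean_fitness = sum(cl["f"] for cl in population) / total_num
    votes = [deletion_vote(cl, mean_fitness, **thresholds) for cl in population]
    return rng.choices(population, weights=votes, k=1)[0]


population = [
    {"as": 10.0, "num": 1, "f": 0.5, "exp": 50},
    {"as": 10.0, "num": 1, "f": 0.001, "exp": 50},
]
victim = select_for_deletion(population, rng=random.Random(0))
```

In this example both classifiers occupy equally large niches, but the experienced, unfit one receives a vote that is about 250 times larger.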




47.2.2 When and How XCS Works 


From the description above it may seem hard to un- 
derstand why XCS learns successfully. This section 
provides intuition about when and how XCS works and 
points to relevant literature that quantifies the sketched- 
out intuition. 

The two learning components, gradient-based rule
evaluation and evolutionary rule evolution, interact
strongly. From an evolutionary point of view, several
evolutionary pressures
yield particular learning biases. Since reproduction is 
designed to maximize fitness, XCS strives to develop 
maximally accurate classifiers applying a fitness pres- 
sure [47.33]. Meanwhile, rules are selected in [A] for 
reproduction but they are selected in [P] for deletion. 
Since the classifier conditions in [A] will on average 
cover a larger subspace, i.e., they have a larger vol- 
ume than the average condition volumes of classifiers 
in [P], more general classifiers will be reproduced on 
average (when ignoring the fitness pressure for the mo- 
ment), yielding a sampling-dependent generalization 
pressure [47.33]. In consequence, it has been put for- 
ward that XCS strives to evolve a complete problem 
solution that is represented by maximally general clas- 
sifiers that are meanwhile maximally accurate (error 
below the threshold ε₀). The resulting problem solution
representation was previously termed the optimal solu- 
tion representation [O] [47.34]. 

While these evolutionary pressures generally de- 
scribe how the GA in XCS works, successful rule 
evolution still relies on sufficiently accurate fitness sig- 
nals. Thus, rule evaluation needs to have enough time to 
estimate rule fitness before expected rule deletion. This 
leads to a covering bound, which quantifies the need 
for a sufficiently large population size given a particu- 
lar initial condition volume. Moreover, each particular 
problem can be assumed to have a certain complexity 
in terms of subspace sizes that need to be separated for 
learning to take place, that is, for decreasing the error 
below the average deviation of the payoff signal to per- 
ceive an initial fitness signal towards higher accuracy. In 
consequence, the subspace size requires the generation 
of classifiers with condition volumes of maximally that 
size, consequently yielding a schema bound on the pop- 
ulation size to be able to cover the full problem space 
with such condition volumes. Finally, better classifiers 
with a certain condition volume need to be able to grow, 
that is, have reproductive opportunities before deletion 
can be expected, consequently yielding a reproductive 
opportunity bound. 


Together these bounds give estimates on the neces- 
sary initial condition volumes and the resulting max- 
imal population size necessary to cover a problem 
space. For example, given the need for an initial 
classifier volume of 0.01 of the encountered prob- 
lem space, the population size N should be set to 
about 10/0.01 = 1000 to assure proper rule evolu- 
tion. Given that these factors are satisfied, better 
classifiers are assured to be identified and to grow 
in the population with high probability. For binary 
and for real-valued problem domains, these consid- 
erations have been quantified, showing that XCS is 
an approximate polynomial-time learning algorithm in 
problem domains with bounded complexity [47.25, 
35]. 

The considerations above ensure the theoretic 
growth of better classifiers. However, the evolutionary 
component may still destroy relevant classifier struc- 
tures due to mutation and crossover. Thus, neither 
mutation nor crossover may be overly disruptive. In ex- 
treme cases, where highly unstructured subspaces may 
need to be identified and recombined, estimation of 
distribution algorithms can help to identify these sub- 
spaces [47.31, 36]. In most cases, though, a sufficiently 
low mutation rate and uniform crossover suffice to learn 
successfully. However, clearly mutation is mandatory 
to detect more accurate classifier structures over time. 
Thus, a good compromise is necessary to ensure that 
offspring is usually mutated but its structure is not 
fully destroyed. In the binary domain, for example,
the mutation probability is consequently often set to
1/l, where l is the number of bits of a problem instance.
This is a typical choice for the mutation strength
used in genetic algorithms — essentially setting the ex- 
pected number of attributes that will be mutated to 
one. 
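
The effect of this choice can be made explicit with a short calculation, assuming that each of the l symbols is mutated independently with probability μ = 1/l:

```latex
\mathbb{E}[\text{number of mutated symbols}]
  = \sum_{i=1}^{l} \mu = l \cdot \frac{1}{l} = 1 ,
\qquad
\Pr[\text{at least one mutation}]
  = 1 - \left( 1 - \frac{1}{l} \right)^{l}
  \;\xrightarrow{\; l \to \infty \;}\; 1 - e^{-1} \approx 0.63 .
```

Under this assumption, roughly two out of three offspring are mutated, while on average only a single symbol is changed, so that most of the parental structure is preserved.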


47.2.3 When and How to Apply XCS 


From the reflections above it becomes clear that XCS 
is designed to learn the target function of a problem by 
a population of locally accurate predictors, that is, clas- 
sifiers. This target function may be the Q-value function 
in RL problems, a correctness function in classification 
problems, or also any other type of function. XCS is 
best suited to be applied in problem domains that can 
be partitioned into subspaces within which simple pre- 
dictions yield accurate values. Moreover, XCS is even 
better suited to be applied to problems where regular- 
ities in the target function can be well-represented in 
classifier conditions, that is, subspaces in which the
function values are approximately equal should be compactly
representable with few classifiers. Overall, XCS
thus strives to develop distributed problem solutions in 
the form of a set of locally partially overlapping classi- 
fier structures, which cover the whole sampled problem 
space in a generalized way. 

As long as a condition representation can be cho- 
sen that identifies expectable regularities in a data set 
or also in a reinforcement learning problem well, XCS 
is a good candidate to optimize these local condition 
structures iteratively online. However, XCS has also been
applied successfully to offline, data mining-based
classification problems, and it was shown that the
generalization and accuracy performance of XCS is
comparable to that of other state-of-the-art machine learning
algorithms [47.25, 37], such as decision tree learners,
instance-based classifiers, or support vector machines. 
Thus, XCS may be applied to multi-step Q-learning 
problems but also to single-step classification problems 
and general regression problems. Online generalization 
and optimal condition structuring for accurate predic- 
tions are the major features of XCS. From a regres- 
sion perspective, XCS is a non-parametric regression 
algorithm that strives to minimize the expected abso- 
lute function approximation error, or also the expected 
squared function approximation error as put forward 
elsewhere [47.24]. 

The two components, (a) gradient-based rule pre- 
diction approximation and evaluation and (b) evolution- 
ary rule structure evolution, are the key to successful 
XCS applications. With respect to rule structure evolu- 
tion, also the XCS system strongly depends on distance 
representations, which can be compared with general 
kernel representations as used in support vector ma- 
chines and elsewhere [47.38, 39]. As long as the repre- 
sented kernel-based condition structures can be mean- 
ingfully modified by genetic operators, evolution and 
thus also XCS can be applied. Meanwhile, sensible
value predictions also need to be generated. Gradient-based
methods work best to approximate these predictions;
whether the prediction is a single value, is computed
linearly or polynomially from input, or is structured
otherwise depends on the problem at hand and the
gradient-based approximation approach available. The
more the prediction structure fits with the regularities in 
the target function, the faster and more robust learning 
can be expected. While such structural considerations 
can improve system performance, the successful appli- 
cations of XCS to various problem domains show that 
successful learning is usually not precluded by subopti- 
mal structural choices. 


47.2.4 Parameter Tuning in XCS 


While XCS does, indeed, specify many parameters, 
only few parameters are really crucial. All other param- 
eter values can typically be set to standard values. Here 
we discuss some rules of thumb for tuning the critical 
parameter settings and also provide standard settings. 
While the following recommendations have not been 
published elsewhere so far, they can be derived from 
observations and other recommendations found in the 
literature [47.25, 35, 40]. 

The two most important parameters are the maximal
population size N and the targeted error threshold ε₀.
The larger the population size N is, the more capacity
XCS has for learning and thus the more complex the problems
XCS can learn. On the other hand, the larger N
is, the slower XCS learns, because it reproduces and
deletes only two classifiers in a typical learning iteration.
Parameter ε₀ specifies the targeted approximation
error. In continuous function approximation problems,
smaller ε₀ values demand finer problem space partitionings
and thus larger population sizes to cover the whole
problem space and to enable reproductive opportunities
(see above). Moreover, ε₀ can partially determine the
fitness signal available to XCS: if ε₀ is chosen very
small, (47.4) will yield values very close to zero for
all highly inaccurate classifiers. Thus, overly small ε₀
values should be avoided. In noisy problems, ε₀ should
thus also not be chosen much smaller than the standard
deviation of the noise expected in the function value
signal.

Without much knowledge of a problem, one may
start with a rather small population size N, say 1000,
and evaluate learning progress in this setting with a desired
ε₀. If the generated approximation error does not
decrease over time, then ε₀ should be set to about 1/10
of the encountered error. Next, the population size N
should be progressively increased, for example, to N =
5000 or more. If still no error decrease is observed, further
analysis is necessary. If the population is filled with
classifiers but the match set sizes are very small (below 5),
better classifiers probably do not receive enough
reproductive opportunities. In this case, first the initial
condition volume should be increased; for example, in
the binary domain the probability P_# would need to be
increased (up to close to 1). If the match set still decreases
to sizes below 5, the problem is rather hard,
requiring a further population size increase. On the
other hand, if the match set sizes are very large (above
100), then over-generalization takes place and XCS apparently
does not pick up the fitness signal. In this case,
the initial condition volume should be decreased. If this
does not help, then the GA application rate should be
decreased to enforce a more accurate classifier evaluation
before evolution applies. This can be accomplished
by increasing the threshold θ_GA to, say, 100, 500, or
even higher. An increase in θ_GA can also be crucial in
problems where the problem domain is sampled highly
unevenly, as is studied in detail elsewhere [47.29].

Several other parameter settings may be checked
as well; the mutation rate should not be set overly
high. As stated above, in the binary problem domain,
for example, a mutation rate of μ = 1/l, where l denotes
the condition size, is a good rule of thumb.
Crossover can mostly be applied without restrictions
(χ = 1.0), especially when tournament selection for
reproduction is chosen, because in this case disruption
is often prevented by choosing two equal classifiers.
Other parameters can be safely set to somewhat
standardized values. A typical initial parameter setting for
XCS is: N = 1000, ε₀ = 0.1, μ = 1/l, χ = 1, α = 1,
β = 0.2, ν = 5, θ_GA = 25, γ = 0.9, θ_del = 20, δ = 0.1,
θ_sub = 20, P_# = 0.5, and τ = 0.4.


47.3 XCSF


The XCS classifier system for real-valued inputs was
introduced by Wilson in 1999, introducing Michigan-style
LCSs to the real-valued problem domain [47.41, 42].
It was further enhanced to approximate continuous
real-valued function surfaces in 2001/2002 [47.43],
yielding an iterative online learning non-parametric
regression system. XCS for function approximation
(XCSF) essentially enhances and modifies XCS by
changing its classifier condition structure to accept
real-valued input. Moreover, the prediction part no longer
predicts single values, but it computes its prediction
from the input using linear approximation techniques,
such as recursive least squares (RLS) [47.44]. Finally,
the action part of the system is removed, applying the
parts of the algorithm that were previously applied to
[A] to the match set [M] in XCSF. Figure 47.4 illustrates
the iterative learning process in XCSF.

Fig. 47.4 The XCSF classifier system learns linear value predictions and usually does not specify actions. The feedback is the actual function value, which is used to update the linear approximators of the matching classifiers. The consequent error and fitness estimation updates are then considered in the evolutionary component for further optimization of the condition structures

XCSF is thus a regression system that solves func- 
tion approximation problems by developing partially 
overlapping locally weighted projections in the form of 
a population of classifiers. In this form, XCSF develops
problem solutions that are similar to those developed 
by the locally-weighted projection regression algorithm 
(LWPR), which is rather well-known in the robotics 
community [47.45]. A comparative study has shown 
that XCSF can outperform LWPR in various problem 
domains [47.46], often yielding better problem space 
partitionings, as well as more accurate function value 
approximations with a comparable number of individual
locally linear approximators (i.e., classifiers). In
XCSF, each classifier specifies in its condition the sub- 
space within which it is applicable. Thus, the condition 
may be compared with a receptive field determining the 
neural activity of the classifier. Moreover, each classi- 
fier specifies a linear approximator weighted within its 
subspace. In effect, the function approximation prob- 
lem is approximated by locally-weighted, overlapping 
linear approximations. While the weighting is typically
fitness-dependent, a weighting based on the distance
to the center of the classifier condition can also be
applied.

Fig. 47.5a,b Screenshots of the XCSF program learning to approximate the crossed ridge function. Current performance values are plotted on the top left. The current approximation surface is shown on the bottom left. On the right-hand side the classifier condition structures are plotted. For visualization purposes, the receptive field sizes are plotted smaller than their actual size. Darker classifier conditions have higher fitness values

With this structure it has been shown that XCSF is 
very well suited for developing any type of kernel struc- 
ture [47.47]. In effect, various condition structures have 
been applied, including rectangular structures with and 
without rotation and with various forms of represen- 
tation [47.35, 48, 49]. Moreover, the linear approxima- 
tions may be enhanced to polynomial approximations 
and others [47.50]. Finally, it is also possible to cluster 
a contextual space with conditions, while approximat- 
ing linear (or other) predictions given totally different 
inputs. For example, the velocity kinematics of an arm 
can be predicted locally dependent on the angular arm 
constellation for redundancy resolution [47.51] (see fur- 
ther details below). Thus, XCSF is a highly flexible 
system with which other modifications in the condi- 
tion and prediction parts of the classifiers may still yield 
highly vital system applications. 
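
To make the combination of a local condition and a linear predictor more concrete, the following sketch pairs a hyperrectangular receptive field with a recursive least squares (RLS) update. It is only one of the possible variants mentioned above. The constant offset input of 1, the initialization V = 1000 · I, and the forgetting factor λ = 1 are assumptions chosen for illustration.

```python
def in_receptive_field(lower, upper, x):
    """Hyperrectangular condition: match if x lies inside all intervals."""
    return all(lo <= xi <= up for lo, xi, up in zip(lower, x, upper))


def rls_update(w, V, x, y, lam=1.0):
    """One RLS step for the linear model y ~ w . x (V must stay symmetric)."""
    n = len(x)
    Vx = [sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Vx[i] for i in range(n))
    k = [v / denom for v in Vx]                     # gain vector
    err = y - sum(w[i] * x[i] for i in range(n))    # error before the update
    w = [w[i] + k[i] * err for i in range(n)]
    V = [[(V[i][j] - k[i] * Vx[j]) / lam for j in range(n)] for i in range(n)]
    return w, V


# One classifier covering [0, 1] learns y = 2 + 3 * x1 from eleven samples.
lower, upper = [0.0], [1.0]
w = [0.0, 0.0]                        # offset weight and slope
V = [[1000.0, 0.0], [0.0, 1000.0]]    # assumed initialization
for i in range(11):
    x1 = i / 10.0
    if in_receptive_field(lower, upper, [x1]):
        w, V = rls_update(w, V, [1.0, x1], 2.0 + 3.0 * x1)
```

After these updates the weights are close to the offset 2 and the slope 3. In XCSF every matching classifier would perform such an update, and the evolutionary component would adapt the interval bounds.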

As an example, we applied XCSF to the crossed
ridge function, a function that has been used as
a benchmark in the neural computation and machine
learning community for many years [47.45, 52]. The
function contains a mix of linear and non-linear subspaces.
It is specified in two dimensions as follows

$$
f(x_1, x_2) = \max \left\{ \exp(-10 x_1^2) ,\; \exp(-50 x_2^2) ,\; 1.25 \exp\!\left( -5 (x_1^2 + x_2^2) \right) \right\} . \tag{47.8}
$$

We ran XCSF with a maximum population size N =
4000 and a target error ε₀ = 0.005 on this function,
applying a condensation mechanism late in the run.
Figure 47.5 shows that XCSF is able to yield a good
function approximation very early in the run. The
evolving classifier structures learn to suitably partition
the problem space into local subspaces. In consequence,
a smooth overall approximation surface is generated.
Note how the inverse exponential hill in the center is
approximated with nearly circular receptive fields, while
the fields are selectively elongated in the x₁ or x₂
dimension due to the non-linearities caused by the ridges
extending to the four sides. Towards the corners of the
input space, the function flattens out so that the receptive
fields become increasingly wider.


47.4 Data Mining


Data mining is a rather large field of research that
generally addresses the challenge of extracting knowledge
from data. In the LCS realm, the addressed data usually
consists of a set of data instances, where each instance
specifies a set of features and a corresponding class the
data instance belongs to. LCSs then typically learn to
mine the data by predicting the class likelihoods of unseen
data instances, as well as by identifying the most
relevant features and feature interactions for classification.
Particularly Pitt-style LCSs have proven to be
highly valuable in data mining applications. However,
the XCS classifier system has also been applied successfully
in this domain.

The XCS system was also converted to an offline
learning system; the sUpervised classifier system
(UCS) algorithm [47.53] determines classifier predictions
and resulting fitness values in a supervised manner.
Meanwhile, the other learning aspects of UCS were
derived from XCS. Both XCS and UCS have shown
effective, if not superior, prediction accuracies in
various data mining tasks, most of them taken from
the UCI machine learning database repository [47.54].
When always applying the same standard setting and
comparing with various other decision making algorithms,
such as support vector machines, decision tree
learning, naive Bayes classifiers, and others implemented
in the WEKA machine learning tool [47.55],
XCS outperformed these competing techniques in
many cases, often depending on the problem at
hand [47.25]. A similar performance was achieved with
UCS, outperforming XCS in some cases due to its more
accurate classifier prediction estimates. XCS was also
further enhanced to be able to deal with highly unbalanced
datasets in data mining domains by automatically
adjusting the threshold θ_GA that controls the frequency of
GA applications [47.29].

Pitt-style LCSs have been evaluated and applied 
to data mining problems even more extensively. The 
typical offline-learning scenario faced in data min- 
ing particularly suits the Pitt approach. However, 
also the fact that often very compact rule sets are 
strived for is advantageous for the Pitt approach. More 
than 10 years ago, the GALE architecture [47.56, 
57] yielded very good performance results on a col- 
lection of datasets from the UCI repository. GALE 
distributes its evolutionary process adding additional 
niching biases due to a grid-based spatial distribution 
of individuals. A comparative study of GALE, XCS,
and other machine learning algorithms can be found
in [47.58]. 

The GAassist architecture [47.59, 60] develops
a priority list of classification rules. The advantage of
GAassist is the compactness of the rule lists it develops.
A comparative analysis with XCS is provided in [47.61]. Later,
the architecture was enhanced with ensemble learn- 
ing techniques [47.62] and memetic algorithms [47.63], 
proving high scalability and fast learning of very com- 
pact rule sets. 

Recently, many efficiency enhancement techniques 
from the GA literature (cf. [47.64]) and from other 
fields, including bioinformatics and systems biol- 
ogy [47.65] were applied to various LCSs. These tech- 
niques can help tremendously to improve the learning 
speed of LCSs, particularly in data mining realms. For
example, windowing techniques select subsets of data
instances to speed up the classifier evaluation process.
Fitness surrogates were used to make the fitness estimation
even cheaper [47.66]. Hybrid methods were
already mentioned above; they combine traditional GA
operators with informed ones, as is done when applying
memetic algorithms, which locally improve the
developing classifier structures when applied to LCSs.
In combination, such techniques can yield LCSs that
not only produce highly accurate classification performance
and good generalizations, but also offer
solution interpretability, allowing mining of the knowledge
developed in the LCS rules, and they generate
these results without requiring much computational
time, which is often comparable to the time needed
by much simpler machine learning techniques.


47.5 Behavioral Learning


While the application of LCSs to data mining problems
will certainly still produce many further impressive results
and promises to yield novel, deep insights into data
structures, LCSs were originally designed as cognitive
systems. Thus, in the following we will focus on LCSs
as cognitive systems, their structures, and their potential
as neural cognitive models. As had been sketched
out above, the XCS classifier system in particular was
compared with Q-learning in RL. We start from this
perspective and detail various successful applications of
XCS in reinforcement learning problems. Next, ALCSs
are surveyed. ALCSs learn generalized cognitive maps
that are suitable to apply Sutton’s Dyna algorithm and
value iteration techniques in general. A strong relation
to factored RL techniques was pointed out recently in
this respect [47.67]. Finally, robotics applications of
LCSs are discussed and their potential is revealed.


47.5.1 Reward-Based Learning with LCSs


From the beginning [47.2] a big appeal of LCSs lay in
the fact that they are designed for reward-based learning.
Once the original bucket-brigade algorithm was
replaced by Q-learning techniques, a theory developed
in the RL community also applied to LCSs to a certain
extent.

In XCS, in particular, it was shown that the system
approximates the Q-value function by a collection
of classifiers. The prediction array (47.1) calculation
essentially approximates the current Q-value estimates
for the current state in the environment. The fitness
weighting based on the relative accuracies, which are
normalized to one, assures that these Q-value estimates
on average do not over- or underestimate the
expected Q-value. Moreover, since Q-learning is an 
off-policy learning technique, XCS is well-suited to 
be combined with it because also XCS benefits from 
exploring all possible state—action combinations in 
the long run — striving to develop an approxima- 
tion of the complete Q-value function in the problem 
space. 
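
As a minimal sketch of this fitness-weighted estimate, the following Python snippet computes a prediction array from a match set. The tuple layout, the function name `prediction_array`, and the example numbers are assumptions made for illustration only; (47.1) remains the authoritative definition.

```python
from collections import defaultdict


def prediction_array(match_set):
    """Return the fitness-weighted average reward prediction per action.

    match_set: iterable of (action, prediction, fitness) tuples
    (a hypothetical, simplified classifier representation).
    """
    weighted_sum = defaultdict(float)
    fitness_sum = defaultdict(float)
    for action, prediction, fitness in match_set:
        weighted_sum[action] += prediction * fitness
        fitness_sum[action] += fitness
    # Actions whose classifiers have zero total fitness are left out.
    return {a: weighted_sum[a] / fitness_sum[a]
            for a in weighted_sum if fitness_sum[a] > 0.0}


# Illustrative match set: two classifiers advocate "north", one "east".
match_set = [
    ("north", 1000.0, 0.9),  # accurate classifier, high fitness
    ("north", 400.0, 0.1),   # inaccurate classifier, low fitness
    ("east", 700.0, 0.5),
]
pa = prediction_array(match_set)
greedy_action = max(pa, key=pa.get)  # action with the highest estimate
```

Because the estimate is a weighted average, the low-fitness classifier only slightly shifts the value for "north", which is the intended effect of the accuracy-based weighting.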

As a result, XCS has been successfully applied to 
learning optimal paths in various maze environments. 
Starting from the Woods1 and Woods2 environments
proposed by Wilson [47.20,21], XCS’s performance 
and generalization capabilities have been investigated 
in various mazes [47.68]. For illustrative purposes such 
mazes are shown in Fig. 47.6. These maze environ- 
ments provide information about the surrounding grid 
cells, indicating whether they are either free or occu- 
pied by an obstacle or by food. Reaching the latter cell 
usually results in a reward trigger. Movements are typ- 
ically possible to the eight surrounding cells, yielding 
a rather large action space. The point of providing sensory
state information rather than cell IDs or coordinates
is that XCS is then able to exhibit its generalization 
capabilities. It essentially manages to generalize over 
the sensory state space ignoring irrelevant bits and gen- 
eralizing over the states with respect to state—action 
combinations that yield the same reward. 
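
The following sketch illustrates how such a sensory state could be encoded and matched by a generalizing condition. The concrete two-bit codes, the clockwise cell ordering, and all names are assumptions for illustration; the text only states that two bits per neighboring cell are used.

```python
# Assumed two-bit codes per cell (hypothetical; not taken from the source).
CELL_CODE = {"free": "00", "blocked": "10", "food": "11"}


def encode(neighbors):
    """Concatenate the codes of the eight neighboring cells.

    neighbors: list of eight cell types, assumed clockwise starting north.
    """
    if len(neighbors) != 8:
        raise ValueError("expected exactly eight neighboring cells")
    return "".join(CELL_CODE[cell] for cell in neighbors)


def matches(condition, state):
    """A ternary condition matches if each non-'#' symbol equals the state bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state))


state = encode(["food", "free", "free", "blocked",
                "blocked", "free", "free", "free"])

# Maximally general condition for "food directly to the north":
# only the first two bits are specified, the other 14 are don't-cares.
food_north = "11" + "#" * 14
applies = matches(food_north, state)
```

A single condition of this kind covers every sensory state in which food lies to the north, which is the type of generalization over irrelevant bits described above.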

Fig. 47.6a,b Two highly typical maze environments used
as benchmarks in the LCS literature for generalized reinforcement
learning: (a) Woods1, (b) Maze6. Woods1 is a toroidal maze. In Maze6
the food location is much harder to find. In both cases,
the LCS-controlled agent perceives information about the
eight neighboring cells encoding free, blocked, and food
cells by means of two bits. The agent can execute movements
to each of these cells. Movements to blocked cells
yield no reward. A movement to the food cell triggers reward
and a reset of the agent

XCS has shown extreme generalization capabilities in many of
these environments. For example, in the Maze6 environment
(Fig. 47.6) up to 90 irrelevant bits were introduced, which
changed randomly while interacting with the environment. While
learning was slightly delayed and a larger population
size was needed for successful learning, the optimal
Q-value function was still extracted from iterative interactions
[47.25]. Thus, XCS learned the optimal
Q-value function in a problem space that contained 
more than 10% potential sensory state encodings. Also 
rather noisy action outcomes did not preclude learn- 
ing success. Later, it was shown that highly effective 
generalizations are even possible when each bit in the 
sensory encoding is relevant. In [47.36] the encoding 
for each bit was changed to a nested Boolean function, 
such as the parity function. XCS was still able to learn 
the optimal Q-value function, while Q-learning with- 
out generalization failed miserably due to the large state 
space. Thus, XCS is able to identify those aspects of the 
available sensory information that are relevant for accu- 
rate reward predictions. 

To successfully apply XCS in these scenarios, one
crucial modification was necessary to stabilize the Q-values
and thus the derived fitness values: the update of
the classifier predictions had to be further modified by
the error gradient factor, converting (47.3) to

r \leftarrow r + \beta (p - r) \frac{f}{\sum_{cl \in [A]} cl.f}    (47.9)

The exact derivation of this equation can be found in
the literature [47.69]. The gradient term essentially results
in much more stable performance and successful
learning and generalization in problems that require
the establishment of long reward chains. It stabilizes
the reward learning by down-scaling updates of inaccurate
and unreliable classifiers. Consequently, these rules
do not tend to overestimate reward, and thus learning
progress is stabilized. As a further consequence, XCS
with gradient-based reward prediction updates was
also successfully applied to blocks world problems, in
which even more generalizations are possible [47.25].
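
The gradient-based update can be sketched as follows, assuming the form of (47.9) in which each prediction update is scaled by the classifier's fitness relative to the summed fitness of the action set [A]. The dictionary keys `r` and `f`, the function name, and the value of `beta` are illustrative assumptions; the derivation is given in [47.69].

```python
def update_predictions(action_set, target, beta=0.2):
    """Apply r <- r + beta * (target - r) * f / (sum of f over the action set).

    action_set: list of dicts with keys "r" (reward prediction) and
    "f" (fitness); a hypothetical minimal classifier representation.
    """
    fitness_sum = sum(cl["f"] for cl in action_set)
    if fitness_sum <= 0.0:
        return  # no reliable classifier present, skip the update
    for cl in action_set:
        gradient_factor = cl["f"] / fitness_sum
        cl["r"] += beta * (target - cl["r"]) * gradient_factor


# One accurate (high fitness) and one unreliable (low fitness) classifier.
action_set = [{"r": 0.0, "f": 0.9}, {"r": 0.0, "f": 0.1}]
update_predictions(action_set, target=1000.0)
```

In this example the unreliable classifier moves far less toward the target than the accurate one, which is the down-scaling effect described above.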

The generalization capabilities of LCSs even extended
to the control of simple light-following behavior
on a real robot platform [47.70,71]. In this case,
however, reward learning was maximized and no complete
Q-value function approximation developed. Nonetheless,
this work constituted one of the first successful
applications in the robotics domain.

Besides condition—action Michigan-style LCSs, 
such as the XCS, other Michigan-style LCS techniques 
have been applied for behavioral learning and also 
for learning cognitive maps. Such anticipatory learning 
classifier systems are surveyed in the following. 


47.5.2 Anticipatory Learning Classifier Systems


Anticipatory learning classifier systems (ALCSs) are
learning systems that learn a generalized predictive
model or cognitive map [47.72] of the encountered environment
online. ALCSs are typical Michigan-style
LCSs. However, in contrast to the usual classifier structure,
classifiers in ALCSs have a state prediction or anticipatory
part that predicts the changes in the environment
caused by executing the specified action in the
specified context. As in XCS, ALCSs
derive classifier fitness estimates from the accuracy of
their predictions. However, the accuracy of the anticipatory
state predictions is considered, rather than the
accuracy of the reward prediction. Figure 47.7 illustrates
the typical structures and learning processes that
apply in an ALCS architecture.
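
A condition-action-effect rule of this kind can be sketched as a small data structure. The class name, the string-based encoding, and the convention that `#` in the effect part means "attribute stays unchanged" are assumptions for illustration; actual ALCS implementations differ in their details.

```python
from dataclasses import dataclass


@dataclass
class AnticipatoryClassifier:
    condition: str  # '#' matches any value of the attribute
    action: str
    effect: str     # '#' = attribute is expected to stay unchanged (assumed)

    def matches(self, state: str) -> bool:
        return len(state) == len(self.condition) and all(
            c == "#" or c == s for c, s in zip(self.condition, state))

    def anticipate(self, state: str) -> str:
        """Predict the next state by applying the effect part to the state."""
        return "".join(s if e == "#" else e
                       for s, e in zip(state, self.effect))

    def anticipation_correct(self, state: str, next_state: str) -> bool:
        """Accuracy signal: did the anticipated state actually occur?"""
        return self.anticipate(state) == next_state


# Hypothetical rule: executing "push" sets the first attribute to 1.
cl = AnticipatoryClassifier(condition="0#1#", action="push", effect="1###")
predicted = cl.anticipate("0011")
```

The fitness of such a rule would then be derived from how often `anticipation_correct` holds, rather than from the accuracy of a reward prediction.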

Fig. 47.7 Instead of the condition-action-reward prediction
rules in typical LCSs, ALCSs encode and develop
condition-action-effect rules. Typically, the structural
optimization of these rules is done by a combination of
evolutionary and heuristic techniques

Rick Riolo originally proposed an ALCS that generated
its cognitive map mediated by a message list
storage system, which was also used in Holland’s original
classifier system architecture [47.73]. However,
this approach did not appear sufficiently elegant
to enable any serious learning. Starting with Stolzmann’s
anticipatory classifier system [47.74], various
ALCS architectures were developed. Particularly in
maze problems, optimal behavior was achieved with
various ALCSs [47.75-79]. To prevent the development
of overgeneral models for concurrent reward learning, 
the reward learning process was often decoupled, yield- 
ing a system that learns a cognitive map based on 
LCS principles, and, additionally, a state value esti- 
mation system. In combination, DYNA-based learning 
techniques [47.80] were applied to improve the state 
value estimations also offline. These techniques al- 
lowed the simulation of animal-like behavioral patterns, 
such as reward adaptations based on knowledge about 
the behavioral consequences in rats in a T-maze envi- 
ronment [47.81], as well as in controlled devaluation 
or satiation experiments [47.82]. In these studies it 
was also pointed out that ALCSs do not only allow 
DYNA-based reward learning updates, but also en- 
able the application of search and planning techniques 
for improving behavioral performance of the system. 
Even curiosity mechanisms have been added [47.83] to 
speed up the learning progress. Most approaches, how- 
ever, never generalized the list of states with associated 
rewards. 
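
The decoupled scheme can be illustrated with a tabular sketch in which state values are improved offline by sweeping over a learned model. This is a simplification: full value iteration sweeps are used here in place of Dyna's sampled planning updates, the model is assumed to be deterministic, and the dictionary layout, function name, and parameter values are hypothetical.

```python
def dyna_value_sweeps(model, gamma=0.9, sweeps=50):
    """Improve state values offline using a learned transition model.

    model: dict mapping (state, action) -> (next_state, reward),
    standing in for the cognitive map learned by an ALCS.
    """
    states = {s for s, _ in model} | {s2 for s2, _ in model.values()}
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            outcomes = [r + gamma * values[s2]
                        for (s0, _a), (s2, r) in model.items() if s0 == s]
            if outcomes:  # states without known transitions keep value 0
                values[s] = max(outcomes)
    return values


# Hypothetical corridor A -> B -> C -> G with reward on reaching G.
model = {
    ("A", "right"): ("B", 0.0),
    ("B", "right"): ("C", 0.0),
    ("B", "left"): ("A", 0.0),
    ("C", "right"): ("G", 1000.0),
}
values = dyna_value_sweeps(model)
```

Note that the values are stored per state in a plain list-like table here, which mirrors the remark above that most approaches did not generalize over the states with associated rewards.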

The combination of the ACS2 system with the XCS
system for state-value estimations, termed
XACS (x-anticipatory classifier system), may
be the one with the greatest potential for future
research [47.84]. XACS essentially applies two LCS
learning mechanisms: one being an ALCS architecture 
in the form of ACS2, which learns a cognitive model of 
the encountered environment, and the other one being 
the XCS system, which learns state-value estimations 
in this case. Figure 47.8 illustrates the components in 
the XACS architecture and their interactions. 

XACS has been shown to develop optimal behavior
in blocks world problems in which other approaches
failed to yield proper generalizations and, as a result,
optimal behavior control. Moreover, the reward-based
generalization mechanism in XACS is directly based 
on the XCS classifier system, thus enabling the in- 
corporation of any tools and representations developed 
for XCS so far. The generalizations that were de- 
veloped confirmed the identification of task-relevant 
perceptual attributes. In the XCS component, reward-distinguishing
attributes were identified. In the ACS2
component, on the other hand, state prediction-relevant 
components were detected. In consequence, general- 
ized detectors for prediction with respect to reward 
and state could be distinguished. The implementa- 
tion of other anticipatory mechanisms in XACS, such 
as task-dependent attentional mechanisms, further in- 
teractions of the learning components, and multiple 
behavioral modules for the representation of multiple 
motivations (or needs) [47.84] are still open issues in 
the LCS realm. Further research with ALCSs is ex- 
pected to yield highly promising, cognitive learning 
architectures. 


47.5.3 Controlling a Robot Arm with an LCS 


We end this section on behavioral learning with the
XCSF system. Over the last decade or so it has become
increasingly clear that XCS is extremely well
suited to partition a contextual space for the generation
of accurate predictions. Predictions, however, do not
necessarily need to be reward predictions. Behavioral
consequences serve just as well as a target for predictions.
The forward kinematics mapping in the robotics
domain [47.85] offers yet another potential target for
learning.

Fig. 47.8 The XACS system combines the model learning capabilities
of ALCSs with the generalizing reinforcement learning capabilities of
XCS. Consequently, generalizations in the two system components are
targeted towards a compact representation for accurate predictions and
for reward predictions, respectively. The combined system enables the
application of lookahead planning and search techniques for behavioral
control as well as of reinforcement learning techniques and combinations
thereof

Consequently, XCSF was modified to learn the
forward velocity kinematics of a robotic arm in simulation
[47.86]. To do so, XCSF projects its condition
parts into the joint angle space of the robotic arm.
However, its locally linear predictions receive as input
small joint movements, that is, changes in joint space,
and predict the consequent change in task space, that
is, changes of the end-effector location. This mapping
has the great advantage that it is locally linear, so that,
given a current joint angle constellation of the arm, not
only can location changes caused by joint angle movements
be predicted, but directional motion of the end-effector
can also be invoked by inverting the locally linear
forward velocity mappings. Since those are linear,
the inversion can be done rather easily using linear
algebra techniques. Given a redundant arm system, that is,
one that has more degrees of freedom (i.e., joint angles
to manipulate) than the end-effector location has dimensions,
it is possible to add additional constraints to the arm
motion. For example, the arm can be driven to maintain
a relaxed arm posture while pursuing a certain goal,
or it may be prevented from moving a certain joint
at all [47.87]. Recent advancements in the exploration
strategy, which can be self-induced by the XCSF
controller during learning, have shown that XCSF is
able to learn to control all seven degrees of freedom
of a humanoid arm highly effectively, flexibly adhering
to different constraints while pursuing motions
to certain goal locations. Moreover, mappings could
be learned in different reference frame representations.
For example, end-effector locations were either represented
in a Cartesian coordinate system or in a distance
plus angles encoding. XCSF learned different classifier
structures due to the differences in the linearities
encountered. Nonetheless, XCSF yielded equally good
arm control in both cases [47.51]. Figure 47.9 illustrates
the XCSF setup for arm control.
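
The inversion of a locally linear forward velocity model, including a secondary constraint for a redundant arm, can be sketched in plain Python. This is a generic pseudo-inverse with null-space projection, given as one common linear algebra technique and not necessarily the procedure used in the cited work; the example matrix (three joints, two task dimensions) and all function names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("local model is singular in task space")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def pseudo_inverse(j):
    """Right pseudo-inverse J^T (J J^T)^-1 of a 2 x n full-row-rank matrix."""
    jt = transpose(j)
    return matmul(jt, inverse_2x2(matmul(j, jt)))


def joint_step(j, dx, dq_secondary):
    """Joint change that realizes task-space change dx; the secondary joint
    motion is projected into the null space so it does not disturb dx."""
    n = len(j[0])
    j_pinv = pseudo_inverse(j)
    primary = matmul(j_pinv, [[v] for v in dx])
    pj = matmul(j_pinv, j)
    null_proj = [[(1.0 if r == c else 0.0) - pj[r][c] for c in range(n)]
                 for r in range(n)]
    secondary = matmul(null_proj, [[v] for v in dq_secondary])
    return [primary[i][0] + secondary[i][0] for i in range(n)]


# Hypothetical local linear model of one classifier (2 task dims, 3 joints).
J = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
dq = joint_step(J, dx=[0.01, -0.02], dq_secondary=[0.0, 0.0, 0.05])
```

Multiplying `J` by the resulting `dq` reproduces the desired end-effector change, while the secondary term uses the remaining degree of freedom, for example to move toward a relaxed posture.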

These results confirmed that XCSF may very well
be further developed into a cognitive system architecture
for behavioral control. While this type of architecture
was probably not the one envisioned by Holland
originally, it may still prove highly valuable. Various
lines of neuroscientific evidence suggest that similar
forward-inverse predictive-control structures may be found in
the cerebellum [47.88, 89]. Only more detailed knowledge
on cortical and cerebellar structures may allow the
direct comparison of the shapes and orientations of the
receptive fields developed by the XCSF system and potential
cortical and neural structures found in the brain.
While the brain may not implement actual evolutionary
techniques literally, as XCSF does, it appears plausible
that local competitions take place [47.90]. Moreover,
it is known that neurons populate novel information
sources once available, as XCSF does. Further research
in neural computation with LCSs may prove
highly valuable.

Fig. 47.9 In the published robot arm control applications, XCSF
clusters the contextual configuration state of the arm and learns
linear approximations of the average Jacobian in the respective
subspaces. In consequence, the system can generate both forward
predictions of movement consequences as well as inverse control
commands when directional movements of the arm are desired


47.6 Conclusions


While LCSs have been applied to a wide variety of
problems, there are still many potential developments
that have not been further evaluated. In the following,
potential future research directions are summarized.

At the moment nearly all LCSs are flat in that
they develop one population of classifiers (or competing
sets of classifiers in the Pitt-style system). All of the
classifiers, however, apply to the same problem granularity.
Ever since the introduction of LCSs by Holland,
the development of default hierarchies was envisioned.
However, so far it was never convincingly or rigorously
accomplished [47.19]. Default hierarchies refer
to classifier systems in which general rules predict one
thing but more specialized rules predict exceptions to
the general rule. The emergent development of default
hierarchies in LCSs remains an open challenge.

With the most recent understanding of LCSs and
the XCS system in particular, it seems that at least
the development of a hierarchically structured LCS
architecture is within our grasp. We expect such a hierarchical
LCS to progressively refine its predictions
in a hierarchical way. Default rules may gain a certain
level of accuracy, but more specialized rules may
identify exceptions to the default prediction. Alternatively,
the more specialized rules may also simply add
further accuracy to the default predictions where and
when necessary. In the latter case, a hierarchical predictive
system may develop that allows the progressive
refinement of activated predictions until the finest prediction
granularity in the hierarchical representation is
reached.
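
The intended behavior of a default hierarchy can be illustrated with a hand-written rule list in which the most specific matching rule overrides more general ones. This only sketches how such a hierarchy would be resolved at prediction time; it does not address the open problem of evolving one. The rule format and all names are assumptions.

```python
def matches(condition, state):
    """A ternary condition matches if each non-'#' symbol equals the state bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state))


def specificity(condition):
    """Number of attributes that the condition actually tests."""
    return sum(1 for c in condition if c != "#")


def predict(rules, state, default=None):
    """Return the prediction of the most specific matching rule."""
    matching = [(cond, out) for cond, out in rules if matches(cond, state)]
    if not matching:
        return default
    return max(matching, key=lambda rule: specificity(rule[0]))[1]


# Hypothetical hierarchy: one default rule and one exception to it.
rules = [
    ("####", "safe"),    # general default rule
    ("11##", "danger"),  # more specialized rule predicting an exception
]
default_case = predict(rules, "0100")
exception_case = predict(rules, "1100")
```

Here the specialized rule takes precedence only in the subspace it covers, while the default rule handles everything else with a single, maximally general condition.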

When developing hierarchical LCSs, network
LCSs also seem to be of vital importance. For example,
when developing classifier structures in spatial domains
that are intricately structured, a network structure
may provide additional hints on the connectivity of
the space. Especially in the case where XCSF learns velocity
kinematics or, more generally, contextually dependent
sensorimotor contingencies (as sketched out in the
section on controlling a robot arm with XCSF above),
a network structure can give additional hints on how the
sensorimotor space is structured and may be traversed.
Networks of LCS classifiers may allow the application
of lookahead planning and goal-oriented control, as
was pursued in early work in [47.91].

A network structure may also enable the speed- 
up of the XCS matching process. For example, when 
a problem space is sampled by means of a random 
walk process, overlapping classifiers may be directly 
identified within a classifier structure instead of apply- 
ing a global matching process in each iteration. Also, 
when XCSF is used for goal-directed control — as men- 
tioned above with respect to velocity kinematics — this 
may improve the efficiency of the system tremendously. 
Furthermore, given a hierarchical, network-structured
LCS, matching may proceed from coarse-grained to
fine-grained levels. All these processes may speed up
the matching, which is often considered a bottleneck in
LCS research and has been improved by means of numerous
approaches in recent years [47.92, 93].

Besides these additions, ALCSs may also be pursued
further, as sketched out above. From a cognitive
modeling perspective, ALCSs essentially learn gener- 
alized schemata or production rules [47.94—96], which 
specify the expected state changes perceived after the 
execution of the specified action. Such rules may be ap- 
plied by the cognitive science community for learning, 
for example, ACT-R structures [47.97]. The lookahead 
planning capabilities, the sensorimotor generalization 
capabilities, as well as the abstraction capabilities of 
these systems still ask for further development. The recent
point that ALCSs can be very effectively applied
to factored RL problems [47.98] should be further pursued.
Also, the combination of ALCS-based cognitive
map or concept learning and XCS-based reward learning
promises further research advancements.

Even without the addition of hierarchies, network
structures, or anticipations, however, LCSs can be successfully
applied to various domains including reinforcement
learning problems, classification and data
mining problems, and regression problems. XCS, in
particular, learns iteratively online, striving for the
development of a compact, maximally general, and
maximally accurate problem solution. Pitt-style systems
typically learn offline and are thus most promising
in large-scale data mining tasks in which rather small,
compact sets of rules are searched for. Seeing that the
learning mechanisms of LCSs are highly flexible, it
is possible to substitute the condition of a classifier
with any other form or condition structure, as long
as this structure can be mutated and recombined in
a way that small structural changes also yield small
changes in the defined subspace within which the condition
matches. Similarly, the prediction structure can
be replaced with any other prediction structure that can
be quickly and accurately adapted by suitable learning
techniques. Thus, the available LCS techniques, such
as GALE and GAssist on the Pitt side and XCS, XCSF,
or XACS on the Michigan side, can be further exploited
and combined with novel structures and forms
of representations. Learning promises to be robust due
to the combination of a flexible evolutionary component,
which searches for optimal rule structures, and the
gradient-based fitness estimation, which quickly yields
useful prediction and fitness estimations. It seems only
a matter of time until LCSs gain even more recognition
and are successfully applied to even more diverse problem
domains and challenging research tasks.


47.7 Books and Source Code


Further information about learning classifier systems
can be found in the biannually published IWLCS
(International Workshop on Learning Classifier Systems)
workshop proceedings and yearly workshops
on the topic. A book on LCSs and the XCS classifier
system in particular covers XCS from a theoretical
and application-oriented point of view and
also provides a detailed algorithmic description of
the system [47.25]. A more theoretical coverage of
the approximation approach in XCS can be found
in [47.23]. Several books also give further details on
theoretical considerations [47.23,99] as well as on
successful applications of LCSs [47.100, 101]. The
source code can be found online, for example, for
XCS in C++ [47.102] as well as for XCSF in
Java [47.103].


References


47.1 J.H. Holland: Adaptation in Natural and Artificial Systems (Univ. of Michigan, Ann Arbor 1975)

47.2 J.H. Holland: Adaptation. In: Progress in Theoretical Biology, Vol. 4, ed. by R. Rosen, F.M. Snell (Academic, New York 1976) pp. 263-293

47.3 L.B. Booker, D.E. Goldberg, J.H. Holland: Classifier systems and genetic algorithms, Artif. Intell. 40, 235-282 (1989)

47.4 J.H. Holland, J.S. Reitman: Cognitive systems based on adaptive algorithms. In: Pattern Directed Inference Systems, ed. by D.A. Waterman, F. Hayes-Roth (Academic, New York 1978) pp. 313-329

47.5 L.P. Kaelbling, M.L. Littman, A.W. Moore: Reinforcement learning: A survey, J. Artif. Intell. Res. 4, 237-285 (1996)

47.6 R.S. Sutton, A.G. Barto: Reinforcement Learning: An Introduction (MIT Press, Cambridge 1998)

47.7 J.H. Holland: Properties of the bucket brigade algorithm, Proc. Int. Conf. Genet. Algorithms Appl. (1985) pp. 1-7

47.8 D.E. Goldberg: Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading 1989)

47.9 S.F. Smith: A learning system based on genetic adaptive algorithms, Ph.D. Thesis (Univ. of Pittsburgh, Pittsburgh 1980)

47.10 K.A. De Jong: An analysis of the behavior of a class of genetic adaptive systems, Ph.D. Thesis (Univ. of Michigan, Ann Arbor 1975)

47.11 L.B. Booker: Intelligent behavior as an adaptation to the task environment, Ph.D. Thesis (The Univ. of Michigan, Ann Arbor 1982)

47.12 S.W. Wilson: Knowledge growth in an artificial animal, Proc. Int. Conf. Genet. Algorithms Appl. (1985) pp. 16-23

47.13 S.W. Wilson: Classifier systems and the animat problem, Mach. Learn. 2, 199-228 (1987)

47.14 D.E. Goldberg: Computer-aided gas pipeline operation using genetic algorithms and rule learning, Diss. Abstr. Int. 44, 3174B (1983)

47.15 K.A. De Jong: Learning with genetic algorithms: An overview, Mach. Learn. 3, 121-138 (1988)

47.16 K.A. De Jong, W.M. Spears, D.F. Gordon: Using genetic algorithms for concept learning, Mach. Learn. 13, 161-188 (1993)

47.17 R.L. Riolo: Bucket brigade performance: I. Long sequences of classifiers, Proc. 2nd Int. Conf. Genet. Algorithms (ICGA87), ed. by J.J. Grefenstette (Lawrence Erlbaum Associates, Cambridge 1987) pp. 184-195

47.18 R.E. Smith, H. Brown Cribbs: Is a learning classifier system a type of neural network?, Evol. Comput. 2, 19-36 (1994)

47.19 J.H. Holland, L.B. Booker, M. Colombetti, M. Dorigo, D.E. Goldberg, S. Forrest, R.L. Riolo, R.E. Smith, P.L. Lanzi, W. Stolzmann, S.W. Wilson: What is a learning classifier system?, Lect. Notes Comput. Sci. 1813, 3-6 (2000)

47.20 S.W. Wilson: ZCS: A zeroth level classifier system, Evol. Comput. 2, 1-18 (1994)

47.21 S.W. Wilson: Classifier fitness based on accuracy, Evol. Comput. 3, 149-175 (1995)

47.22 C.J.C.H. Watkins: Learning from delayed rewards, Ph.D. Thesis (King's College, Cambridge 1989)

47.23 J. Drugowitsch: Design and Analysis of Learning Classifier Systems: A Probabilistic Approach, Studies in Computational Intelligence (Springer, Berlin, Heidelberg 2008)

47.24 J. Drugowitsch, A. Barry: A formal framework and extensions for function approximation in learning classifier systems, Mach. Learn. 70, 45-88 (2008)

47.25 M.V. Butz: Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design (Springer, Berlin, Heidelberg 2006)

47.26 B. Widrow, M. Hoff: Adaptive switching circuits, West. Electron. Show Conv. 4, 96-104 (1960)

47.27 P.-Y. Oudeyer, F. Kaplan, V.V. Hafner: Intrinsic motivation systems for autonomous mental development, IEEE Trans. Evol. Comput. 11, 265-286 (2007)

47.28 G. Venturini: Adaptation in dynamic environments through a minimal probability of exploration, from animals to animats 3, Proc. 3rd Int. Conf. Simul. Adapt. Behav. (1994) pp. 371-381

47.29 A. Orriols-Puig, E. Bernadó-Mansilla, D.E. Goldberg, K. Sastry, P.L. Lanzi: Facetwise analysis of XCS for problems with class imbalances, IEEE Trans. Evol. Comput. 13, 1093-1119 (2009)

47.30 M.V. Butz, K. Sastry, D.E. Goldberg: Strong, stable, and reliable fitness pressure in XCS due to tournament selection, Genet. Program. Evol. Mach. 6, 53-77 (2005)

47.31 M.V. Butz, M. Pelikan, X. Llorà, D.E. Goldberg: Automated global structure extraction for effective local building block processing in XCS, Evol. Comput. 14, 345-380 (2006)

47.32 S.W. Wilson: Generalization in the XCS classifier system, genetic programming 1998, Proc. 3rd Ann. Conf. (1998) pp. 665-674

47.33 M.V. Butz, T. Kovacs, P.L. Lanzi, S.W. Wilson: Toward a theory of generalization and learning in XCS, IEEE Trans. Evol. Comput. 8, 28-46 (2004)

47.34 T. Kovacs: XCS classifier system reliably evolves accurate, complete, and minimal representations for Boolean functions. In: Soft Computing in Engineering Design and Manufacturing, ed. by R. Roy, P.K. Chawdhry, R.K. Pant (Springer, Berlin, Heidelberg 1997) pp. 59-68

47.35 P.O. Stalph, X. Llorà, D.E. Goldberg, M.V. Butz: Resource management and scalability of the XCSF learning classifier system, Theor. Comput. Sci. 425, 126-141 (2012)


47.36 M.V. Butz, P.L. Lanzi: Sequential problems that test generalization in learning classifier systems, Evol. Comput. 2, 141-147 (2009)

47.37 L. Bull, E. Bernadó-Mansilla, J. Holmes (Eds.): Learning Classifier Systems in Data Mining, Studies in Computational Intelligence, Vol. 125 (Springer, Berlin, Heidelberg 2008)

47.38 B. Schölkopf, A.J. Smola: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge 2001)

47.39 W. Liu, J.C. Principe, S. Haykin: Kernel Adaptive Filtering: A Comprehensive Introduction, 1st edn. (Wiley, Hoboken 2010)

47.40 M.V. Butz, S.W. Wilson: An algorithmic description of XCS, Soft Comput. 6, 144-153 (2002)

47.41 S.W. Wilson: Get real! XCS with continuous-valued inputs. In: Festschrift in honor of John H. Holland, ed. by L. Booker, S. Forrest, M. Mitchell, R.L. Riolo (Center for the Study of Complex Systems, Ann Arbor 1999) pp. 111-121

47.42 S.W. Wilson: Get real! XCS with continuous-valued inputs, Lect. Notes Comput. Sci. 1813, 209-219 (2000)

47.43 S.W. Wilson: Classifiers that approximate functions, Nat. Comput. 1, 211-234 (2002)

47.44 S. Haykin: Adaptive Filter Theory, 4th edn. (Prentice Hall, Upper Saddle River 2002)

47.45 S. Vijayakumar, A. D'Souza, S. Schaal: Incremental online learning in high dimensions, Neural Comput. 17, 2602-2634 (2005)

47.46 P. Stalph, J. Rubinsztajn, O. Sigaud, M.V. Butz: Function approximation with LWPR and XCSF: A comparative study, Evol. Comput. 5, 103-116 (2012)

47.47 M.V. Butz: Kernel-based, ellipsoidal conditions in the real-valued XCS classifier system, Proc. Genet. Evol. Comput. Conf. (GECCO 2005) (2005) pp. 1835-1842

47.48 C. Stone, L. Bull: For real! XCS with continuous-valued inputs, Evol. Comput. 11, 299-336 (2003)

47.49 M.V. Butz, P.L. Lanzi, S.W. Wilson: Function Approximation With XCS: Hyperellipsoidal Conditions, Recursive Least Squares, and Compaction, IEEE Trans. Evol. Comput. 12, 355-376 (2008)

47.50 D. Loiacono, P.L. Lanzi: Recursive least squares and quadratic prediction in continuous multistep problems, Lect. Notes Comput. Sci. 6471, 70-86 (2010)

47.51 P.O. Stalph, M.V. Butz: Learning local linear Jacobians for flexible and adaptive robot arm control, Genet. Program. Evol. Mach. 13, 137-157 (2012)

47.52 S. Schaal, C.G. Atkeson: Constructive incremental learning from only local information, Neural Comput. 10, 2047-2084 (1998)

47.53 E. Bernadó-Mansilla, J.M. Garrell-Guiu: Accuracy-based learning classifier systems: Models, analysis, and applications to classification tasks, Evol. Comput. 11, 209-238 (2003)


47.54 K. Bache, M. Lichman: UCI Machine Learning Repository (Univ. of California, School of Information and Computer Sciences 2013) http://archive.ics.uci.edu/ml

47.55 I.H. Witten, E. Frank: Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations (Morgan Kaufmann, San Francisco 2000)

47.56 X. Llorà, J.M. Garrell: Knowledge independent data mining with fine-grained parallel evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2001) (2001) pp. 461-468

47.57 X. Llorà, J.M. Garrell: Inducing partially-defined instances with evolutionary algorithms, Proc. 18th Int. Conf. Mach. Learn. (ICML 2001) (2001)

47.58 E. Bernadó, X. Llorà, J.M. Garrell: XCS and GALE: A comparative study of two learning classifier systems and six other learning algorithms on classification tasks, Lect. Notes Comput. Sci. 2321, 115-132 (2002)

47.59 J. Bacardit, J.M. Garrell: Evolving multiple discretizations with adaptive intervals for a Pittsburgh rule-based learning classifier system, Lect. Notes Comput. Sci. 2724, 1818-1831 (2003)

47.60 J. Bacardit, M.V. Butz: Data mining in learning classifier systems: Comparing XCS with GAssist, Lect. Notes Comput. Sci. 4399, 282-290 (2007)

47.61 J. Bacardit, M.V. Butz: Data mining in learning classifier systems: Comparing XCS with GAssist (IlliGAL, Univ. of Illinois at Urbana-Champaign 2004)

47.62 J. Bacardit, N. Krasnogor: Empirical evaluation of ensemble techniques for a Pittsburgh learning classifier system, Lect. Notes Comput. Sci. 4998, 255-268 (2008)

47.63 J. Bacardit, N. Krasnogor: Performance and efficiency of memetic Pittsburgh learning classifier systems, Evol. Comput. 17, 307-342 (2009)

47.64 K. Sastry, D.E. Goldberg, X. Llorà: Towards billion-bit optimization via a parallel estimation of distribution algorithm, Proc. Genet. Evol. Comput. Conf. (GECCO 2007) (2007) pp. 577-584

47.65 J. Bacardit, E. Burke, N. Krasnogor: Improving the scalability of rule-based evolutionary learning, Memet. Comput. 1, 55-67 (2009)

47.66

47.67

47.68

47.69

X. Llora, K. Sastry, T.-L. Yu, D.E. Goldberg: Do not 
match, inherit: Fitness surrogates for genetics- 
based machine learning techniques, Proc. Genet. 
Evol. Comput. Conf. (GECCO 2007) (2007) pp. 1798- 
1805 

0. Sigaud, M.V. Butz, 0. Kozlova, C. Meyer: Antic- 
ipatory Learning Classifier Systems and Factored 
Reinforcement Learning (Springer, Berlin, Heidel- 
berg 2009) pp. 321-333 

P.L. Lanzi: An analysis of generalization in the XCS 
classifier system, Evol. Comput. 7, 125-149 (1999) 
M.V. Butz, D.E. Goldberg, P.L. Lanzi: Gradient de- 
scent methods in learning classifier systems: Im- 
proving XCS performance in multistep problems, 
IEEE Trans. Evol. Comput. 9, 452-473 (2005) 


Learning Classifier Systems | References 981 


47.70 


47.71 


47.72 


47.73 


47.74 


47.75 


47.76 


47.77 


47.78 


47.79 


47.80 


47.81 


47.82 


47.83 


47.84 


47.85 


47.86 


J. Hurst, L. Bull: Self-adaptation in classifier 
system controllers, Artif. Life Robot. 5, 109-119 
(2001) 

J. Hurst, L. Bull: A neural learning classifier sys- 
tem with self-adaptive constructivism for mobile 
robot learning, Artif. Life 12, 1-28 (2006) 

E.C. Tolman: Cognitive maps in rats and men, Psy- 
chol. Rev. 55, 189-208 (1948) 

R.L. Riolo: Lookahead planning and latent learn- 
ing in a classifier system, from animals to an- 
imats, Proc. 1st Int. Conf. Simul. Adapt. Behav. 
(1991) pp. 316-326 

W. Stolzmann: Anticipatory classifier systems, Ge- 
netic Programming 1998, Proc. 3rd Ann. Conf. 
(1998) pp. 658-664 

M.V. Butz: Anticipatory Learning Classifier Systems 
(Kluwer, Boston 2002) 

M.V. Butz, D.E. Goldberg, W. Stolzmann: The an- 
ticipatory classifier system and genetic general- 
ization, Nat. Comput. 1, 427-467 (2002) 

P. Gérard, 0. Sigaud: YACS: Combining dynamic 
programming with generalization in classifier 
systems, Lect. Notes Comput. Sci. 1996, 52-69 
(2001) 

P. Gérard, J.-A. Meyer, 0. Sigaud: Combining la- 
tent learning and dynamic programming in MACS, 
Eur. J. Oper. Res. 160, 614-637 (2005) 

W. Stolzmann, M.V. Butz: Latent learning and ac- 
tion planning in robots with anticipatory classi- 
fier systems, Lect. Notes Comput. Sci. 1813, 301-317 
(2000) 

R.S. Sutton: DYNA: an integrated architecture for 
learning, planning, and reacting, ACM SIGART Bull. 
2(4), 160-163 (1991) 

W. Stolzmann, M.V. Butz, J. Hoffmann, D.E. Gold- 
berg: First cognitive capabilities in the anticipa- 
tory classifier system, from animals to animats 6, 
Proc. 6th Int. Conf. Simul. Adapt. Behav. (2000) 
pp. 287-296 

M.V. Butz, J. Hoffmann: Anticipations control be- 
havior: Animal behavior in an anticipatory learn- 
ing classifier system, Adapt. Behav. 10, 75-96 
(2002) 

M.V. Butz: Biasing exploration in an anticipatory 
learning classifier system, Lect. Notes Comput. Sci. 
2321, 3-22 (2002) 

M.V. Butz, D.E. Goldberg: Generalized state values 
in an anticipatory learning classifier system. In: 
Anticipatory Behavior in Adaptive Learning Sys- 
tems: Foundations, Theories, and Systems, ed. by 
M.V. Butz, 0. Sigaud, P. Gérard (Springer, Berlin, 
Heidelberg 2003) pp. 282-301 

B. Siciliano, 0. Khatib: Springer Handbook of 
Robotics (Springer, Berlin, Heidelberg 2007) 

M.V. Butz, 0. Herbort: Context-dependent predic- 
tions and cognitive arm control with XCSF, Proc. 


47.87 


47.88 


47.89 


47.90 


47.91 


47.92 


47.93 


47.94 


47.95 


47.96 


47.97 


47.98 


47.99 


47.100 


47.101 


47.102 


47.103 


Genet. Evol. Comput. Conf. (GECCO 2008) (2008) 
pp. 1357-1364 

M.V. Butz, G.K.M. Pedersen, P.O. Stalph: Learn- 
ing sensorimotor control structures with XCSF: 
Redundancy exploitation and dynamic control, 
Proc. Genet. Evol. Comput. Conf. (GECCO 2009) 
(2009) pp. 1171-1178 

D.M. Wolpert, R.C. Miall, M. Kawato: Internal 
models in the cerebellum, Trends Cogn. Sci. 2, 
338-347 (1998) 

J.G. Fleischer: Neural correlates of anticipation 
in cerebellum, basal ganglia, and hippocampus, 
Lect. Notes Comput. Sci. 4520, 19-34 (2007) 

C.T. Fernando, E. Szathmary, P. Husbands: Se- 
lectionist and evolutionary approaches to brain 
function: A critical appraisal, Front. Comput. Neu- 
rosci. 6, doi: 10.3389/fncom.2012.00024 (2012) 

A. Tomlinson, L. Bull: A corporate XCS, Lect. Notes 
Comput. Sci. 1813, 195-208 (2000) 

X. Llorà, K. Sastry: Fast rule matching for learn- 
ing classifier systems via vector instructions, Proc. 
Genet. Evol. Comput. Conf. (GECCO 2006) (2006) 
pp. 1513-1520 

M.V. Butz, P.L. Lanzi, X. Llorà, D. Loiacono: An 
analysis of matching in learning classifier sys- 
tems, Proc. Genet. Evol. Comput. Conf. (GECCO 
2008) (2008) pp. 1349-1356 

J.R. Anderson: Rules of the Mind (Lawrence Erl- 
baum Associates, Hillsdale 1993) 

G.L. Drescher: Made-Up Minds: A Constructivist 
Approach to Artificial Intelligence (MIT Press, 
Cambridge 1991) 

A. Newell: Physical symbol systems, Cogn. Sci. 4, 
135-183 (1980) 

J.R. Anderson, D. Bothell, M.D. Byrne, S. Dou- 
glass, C. Lebiere, Y. Qin: An integrated theory of 
the mind, Psychol. Rev. 111, 1036-1060 (2004) 

0. Sigaud, S. Wilson: Learning classifier systems: 
A survey, soft computing - a fusion of founda- 
tions, Methodol. Appl. 11, 1065-1078 (2007) 

L. Bull, T. Kovacs (Eds.): Foundations of Learn- 
ing Classifier Systems, Stud. Fuzziness and Soft 
Comput, Vol. 183 (Springer, Berlin, Heidelberg 
2005) 

L. Bull (Ed.): Applications of Learning Classifier 
Systems (Springer, Berlin, Heidelberg 2004) 

L. Bull: On lookahead and latent learning in 
Simple LCS, Learn. Classif. Syst. Int. Workshops, 
IWLCS 2006-2007, ed. by J. Bacardit, E. Bernad- 
Mansilla, M.V. Butz (Springer, Berlin, Heidelberg 
2008) pp. 154-168 

P:L. Lanzi: xcslib - The XCS Library. http://xcslib. 
sourceforge.net/ 

P. 0. Stalph, M. V. Butz: Documentation of 
JavaXCSF (COBOSLAB, University of Wiirzburg, Ger- 
many, Y2009N001 2009) 


Zh |3 Hed 


48. Indicator-Based Selection 


Lothar Thiele 


The goal of multiobjective evolutionary optimization is to determine a set
of solutions that satisfies certain optimality properties. Recently, a
growing number of very competitive search algorithms have been based on an
explicit formulation of the optimization goal as a set property, i.e., they
build on the concept of set indicators. These indicators are used to guide
the selection process, which is usually denoted as indicator-based
selection. This major breakthrough leads to several advantages in terms of
analysis and algorithm design: algorithms are conceptually simpler and more
robust as they are largely based on a single indicator; certain convergence
properties can be proven; the optimization criterion is made explicit; and
by changing the set indicator, it is possible to explicitly consider the
preferences of a user. The chapter introduces step by step the concept of
set indicators and their use in indicator-based selection.


48.1 Motivation
48.2 Basic Concepts
     48.2.1 Notation
     48.2.2 Set Indicators
48.3 Selection Schemes
     48.3.1 Basic Search Algorithm
     48.3.2 Exhaustive Selection
     48.3.3 Steady State and Greedy Selection
     48.3.4 Hierarchical Set Preferences
     48.3.5 Using Binary Indicators
48.4 Preference-Based Selection
48.5 Concluding Remarks
References


48.1 Motivation


Variation and selection are the main ingredients of evolutionary
optimization algorithms. Despite the many variations that have been
developed in the past, their basic iterative structure can be described by
the following three steps:


1. From the current set of solutions (parent set), a subset is determined
   (mating pool) by mating selection.

2. From the solutions in the mating pool, new solutions are generated
   (offspring set) through variation operators such as mutation and
   recombination.

3. Environmental selection determines the new parent set as a subset of the
   union of the parent set and the offspring set.


As can be seen in the above template, selection denotes the process of
forming a subset of a set of solutions. Mating selection determines the set
of candidate solutions that will be further explored by constructing new
solutions, i.e., the offspring set. To this end, promising solutions need to
be selected whose offspring are expected to advance the optimization process
most. In contrast, environmental selection combines the parent and offspring
sets into the new parent set and thereby reduces the number of solutions
that are considered in the next iteration. Loosely speaking, mating
selection is involved in the exploration phase of the evolutionary
optimization, whereas environmental selection is central to the decision
phase.

Set indicators map sets of solutions to scalar values. They characterize to
which degree the set satisfies some desirable property. Therefore, they can
be used to guide the selection process, which is usually denoted as
indicator-based selection. This chapter concentrates on environmental
selection, as indicator-based methods have mainly been applied in this
context.




The goal of multiobjective evolutionary optimization is to determine a set
of solutions that satisfies certain optimality properties. The corresponding
notion of optimality is partially defined by solution preference, i.e., when
we consider one single solution to be preferable to another single solution.
One common choice of such a solution-oriented preference relation is Pareto
dominance. But there is still a large degree of freedom left in defining
what an optimal set of solutions is, as there may be many more
Pareto-optimal solutions than can be reasonably processed, stored, or
presented to the user. Therefore, we need additional information that
describes the preference of the user, i.e., which subset of Pareto-optimal
solutions the user is interested in. For example, the user may be interested
in a diverse set of solutions or in solutions which cover a certain subspace
of interest. Set indicators can now define such a preference relation and
influence:


a) The result of the population-based optimization and 

b) The characteristics of the sets of solutions during an 
optimization run and 

c) The search efficiency. 


Traditionally, multiobjective evolutionary optimization algorithms such as
NSGA-II (nondominated sorting genetic algorithm II) [48.1] or SPEA2
(strength Pareto evolutionary algorithm 2) [48.2] start from solution
preference, i.e., the Pareto-dominance relation between the solutions, and
then attempt to consider set preferences such as diversity using heuristics.
As a downside of this approach, deterioration and cyclic behavior have been
reported [48.3], formal convergence results cannot be obtained, and
unsatisfactory optimization results have been shown for high-dimensional
objective spaces [48.4].

On the other hand, indicator-based selection treats multiobjective
evolutionary optimization as a set optimization problem with a single
optimization criterion, namely the set-preference relation or its defining
set indicator. In other words, instead of focusing on individual solutions
with multiple criteria, set-based methods consider sets of solutions as the
object of optimization and a single set criterion, i.e., the set indicator.
This is a radical change from the traditional approach. The set indicator
directly represents the user preference and the optimization algorithm
determines a set of solutions that optimizes this single set indicator. The
advantages of this approach are a formal and unambiguous definition of the
optimization goal, the possibility to show strong convergence results, and a
clear approach to consider user preferences in the search method.


48.2 Basic Concepts


Before discussing the role of indicators, selection, and archiving in
multiobjective evolutionary algorithms, we will define the notation used in
the forthcoming sections. In particular, we will define the underlying class
of multiobjective optimization problems.


48.2.1 Notation


We will consider the minimization of a vector-valued objective function
$f = (f_1, \ldots, f_n) : X \to \mathbb{R}^n$ which maps each point in the
decision space $X$ to an $n$-dimensional vector. The decision space $X$
denotes the feasible set of alternatives for the optimization problem and
$n$ denotes the dimension of the minimization problem, i.e., the number of
objectives. For simplicity of notation, we suppose in the following that $X$
is finite. Often, we call an element of the decision space a solution and
the corresponding objective value $z = f(x)$ is denoted as objective vector.
The image of the decision space $X$ under the objective function $f$ is
called objective space $Z \subseteq \mathbb{R}^n$ with
$Z = \{ f(x) \mid x \in X \}$, i.e., it contains all objective vectors
corresponding to solutions in $X$.

In the above formulation, it is not yet clear what we understand as the
minimization of a vector-valued function. In this chapter, we follow the
usual concept of Pareto dominance which defines an order relation between
all solutions based on a preference relation, i.e., it defines when we call
a solution better than another one.


Definition 48.1 

A solution $a \in X$ weakly Pareto dominates a solution $b \in X$, denoted
as $a \preceq b$, if $f_i(a) \le f_i(b)$ for all $1 \le i \le n$. Solution
$a$ strongly Pareto dominates $b$, denoted as $a \prec b$, if
$(a \preceq b) \wedge (b \npreceq a)$.


We can rewrite the strong domination criterion as
$(a \prec b) \Leftrightarrow (a \preceq b) \wedge (f(a) \ne f(b))$. We also
say that solution $a$ is better than or weakly preferable to $b$ if
$a \prec b$ or $a \preceq b$, respectively. Note that the weak
Pareto-dominance relation is suitable for optimization as it defines a
preorder on the set of solutions $X$. A preorder $\preceq$ on a given set
$X$ is reflexive and transitive: $a \preceq a$ and
$(a \preceq b) \wedge (b \preceq c) \Rightarrow (a \preceq c)$ hold for all
$a, b, c \in X$.

In terms of optimization, we say that a solution $a \in X$ is Pareto-optimal
if there is no better solution in $X$, i.e., if $b \preceq a$ for some
$b \in X$ then $a \preceq b$. The set of all Pareto-optimal solutions is
denoted as the Pareto-optimal set and its image in the objective space as
the Pareto-optimal front.
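
The dominance relations of Def. 48.1 and the notion of Pareto optimality can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four-solution toy problem are hypothetical and not part of the chapter, and objective vectors are assumed to be tuples of numbers that are to be minimized.

```python
from typing import Callable, Hashable, Iterable, List, Sequence

Vector = Sequence[float]


def weakly_dominates(fa: Vector, fb: Vector) -> bool:
    """True if f_i(a) <= f_i(b) for all objectives i (minimization)."""
    return all(x <= y for x, y in zip(fa, fb))


def strongly_dominates(fa: Vector, fb: Vector) -> bool:
    """True if a weakly dominates b, but b does not weakly dominate a."""
    return weakly_dominates(fa, fb) and not weakly_dominates(fb, fa)


def pareto_optimal(X: Iterable[Hashable],
                   f: Callable[[Hashable], Vector]) -> List[Hashable]:
    """Solutions a such that every b weakly dominating a is weakly dominated by a."""
    X = list(X)
    return [
        a for a in X
        if all(weakly_dominates(f(a), f(b))
               for b in X if weakly_dominates(f(b), f(a)))
    ]


# Hypothetical toy problem: four named solutions with two objectives each.
objectives = {"a": (1, 4), "b": (2, 2), "c": (4, 1), "d": (3, 3)}
front = pareto_optimal(objectives, objectives.__getitem__)
print(front)  # ['a', 'b', 'c']; 'd' is strongly dominated by 'b'
```

The quadratic scan over $X$ mirrors the definition directly and is only meant for small finite decision spaces.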

Ideally, a multiobjective optimizer determines the 
Pareto-optimal set for a given objective function f and 
the corresponding decision space X. Traditionally, evo- 
lutionary multiobjective algorithms attempt to solve 
this problem by generating a suitable approximation of 
the Pareto-optimal set. To this end, they maintain and 
improve sets of solutions, denoted as populations. In 
this context, the following questions arise: 


• If the set of Pareto-optimal solutions is too large to be determined
  efficiently, how do we select those which will be the result of the
  optimization process?

• How do we evaluate a set of solutions, i.e., an approximation of the set
  of Pareto-optimal solutions, in terms of its degree of optimality in order
  to guide the optimization process?


The chapter describes how indicator-based selection can be used to answer
the above questions. It thereby touches two core issues for multiobjective
evolutionary algorithms: (a) how to formalize the optimization goal in the
sense of specifying what type of set is sought; (b) how to efficiently
determine a suitable subset to achieve the formalized optimization goal.

The following section introduces the concept of set indicators that can be
used to evaluate a set of solutions, i.e., to associate with it a quality
indicator which describes its degree of optimality.


48.2.2 Set Indicators 


Preference relations between sets of solutions are the 
basis of set-based multiobjective optimization. They 
provide the information on the basis of which the search 
is carried out, i.e., for any two Pareto set approxima- 
tions, they say whether one set is considered to be equal, 
better, or worse. 

A set indicator can now be used to define such a preference relation and
therefore to indicate whether one set of solutions is preferable to another
one. In addition, it also contains quantitative information about the degree
of preference.

Depending on the particular definition of the prefer- 
ence relation, a set can be considered to be better than 
another one or even the other way round. With different 
definitions of such a preference relation, we can expect 
that the optimal result of a search process will be differ- 
ent as well. Therefore, the definition of an indicator is 
essential for formally defining the goal of the set-based 
optimization. In addition, it allows us to adjust the opti- 
mization goal according to the preferences of the user, 
i.e., to provide flexibility with respect to the subset of 
Pareto-optimal solutions searched for. 

But the set indicator and the resulting preference relation cannot be chosen
arbitrarily as they need to conform to the concept of Pareto dominance.
Otherwise, the search process may end up with a set which is weakly
preferable to all other sets but does not contain any Pareto-optimal
solution.

In order to derive the requirements for a well- 
behaved set indicator, let us first generalize the concept 
of Pareto dominance of solutions to Pareto dominance 
of sets. 


Definition 48.2 

A set of solutions $A \subseteq X$ weakly Pareto dominates a set of
solutions $B \subseteq X$, denoted as $A \preceq B$, if
$\forall b \in B : \exists a \in A : a \preceq b$. Set $A$ strongly Pareto
dominates set $B$, denoted as $A \prec B$, if

\[ (A \preceq B) \wedge (B \npreceq A). \]


In other words, a set of solutions $A$ weakly dominates a set of solutions
$B$ if every solution in $B$ is weakly dominated by at least one solution in
$A$. Moreover, it can be shown that the set-based dominance relation defines
a preorder, i.e., it is suited for optimization purposes.
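
As a minimal sketch of Def. 48.2, the set-based dominance relations can be written as nested quantifiers over objective vectors. The function names and the example sets are assumptions made for illustration; sets are represented directly by lists of objective vectors (minimization).

```python
from typing import Iterable, Sequence

Vector = Sequence[float]


def weakly_dominates(fa: Vector, fb: Vector) -> bool:
    """Solution-level weak Pareto dominance (minimization)."""
    return all(x <= y for x, y in zip(fa, fb))


def set_weakly_dominates(FA: Iterable[Vector], FB: Iterable[Vector]) -> bool:
    """Every vector in FB is weakly dominated by at least one vector in FA."""
    FA = list(FA)
    return all(any(weakly_dominates(a, b) for a in FA) for b in FB)


def set_strongly_dominates(FA: Iterable[Vector], FB: Iterable[Vector]) -> bool:
    """FA weakly dominates FB, but FB does not weakly dominate FA."""
    FA, FB = list(FA), list(FB)
    return set_weakly_dominates(FA, FB) and not set_weakly_dominates(FB, FA)


# Hypothetical example sets in a two-dimensional objective space.
A = [(1, 3), (3, 1)]
B = [(2, 3), (3, 2)]
C = [(0, 5)]
print(set_strongly_dominates(A, B))                            # True
print(set_weakly_dominates(A, C), set_weakly_dominates(C, A))  # False False
```

The last line shows that the relation is only a preorder and not total: the sets `A` and `C` are incomparable.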

Now, let us define the concept of a set indicator 
and its induced preference relation. In the first part of 
this section, we restrict ourselves to unary indicators. 
A more detailed discussion on the various aspects of 
indicators is provided in [48.5]. 


Definition 48.3 

A unary indicator $I$ maps each set $A \subseteq X$ of the decision space to
a real number $I(A) \in \mathbb{R}$. Given an indicator, we can determine
the corresponding preference relation $\preceq_I$ as

\[ A \preceq_I B := (I(A) \ge I(B)). \]




In other words, the larger the set indicator of a set of solutions, the
better we consider the set. It can be shown that the preference relation
induced by the indicator defines a total preorder on the sets of solutions,
i.e., on the subsets of $X$.

As discussed above, not all preference relations can be used inside search
methods as they at least need to comply with the definition of Pareto
dominance in Def. 48.2. To this end, the following definition describes the
notion of preference refinement:


Definition 48.4 
A preference relation $\preceq_{\mathrm{ref}}$ is denoted as a refinement of
$\preceq$ if

\[ A \prec B \Rightarrow A \prec_{\mathrm{ref}} B. \]


What we need to guarantee can be formulated as follows: if a set of
solutions $A \subseteq X$ is strictly better than a set of solutions
$B \subseteq X$ in the sense of Pareto dominance, i.e., $A \prec B$, then
the preference relation used for optimization should say so as well, i.e.,
$A \prec_{\mathrm{ref}} B$, see also [48.5]. If we use this result for the
unary indicator according to Def. 48.3, then we directly get the following
condition for a compliant indicator, i.e., one whose corresponding
preference relation is a refinement of $\preceq$:

\[ A \prec B \Rightarrow (I(A) > I(B)). \tag{48.1} \]

In other words, if a set $A$ is strictly better than a set $B$, i.e.,
$A \prec B$, then the indicator should say so as well, i.e., $I(A) > I(B)$.
It has been shown in [48.5] that Pareto compliance guarantees that a set
with the maximal indicator value is minimal with respect to the
Pareto-dominance relation according to Def. 48.2.

Indicators were first introduced to the area of multiobjective evolutionary
optimization as a means to compare different optimization runs [48.6-9]. The
use of indicators to guide multiobjective search methods in general appeared
in the year 2003, notably in [48.10] and later in [48.11-13]. In a more
restricted setting, indicators have been used for archiving, i.e.,
maintaining a set of Pareto-approximate solutions [48.3, 14].

In several studies, the properties of set indicators have been investigated
in terms of their compliance with Pareto dominance [48.7, 15]. Whereas many
well-known and widely applied indicators do not fall into this class, there
are various indicators that at least satisfy a weak refinement
($A \prec B \Rightarrow A \preceq_{\mathrm{ref}} B$), e.g., the unary $R_2$
and $R_3$ indicators [48.16] and the unary additive and multiplicative
epsilon indicators [48.3, 15]. The latter two indicators are related to
additive or multiplicative approximation [48.17, 18].

Before discussing binary indicators, let us introduce an example of a set
indicator that is compliant with Pareto dominance. The hypervolume indicator
has been introduced to the field of multiobjective evolutionary optimization
in [48.19] for the purpose of performance assessment. It can be defined as

\[ I_H(A, R) = \int_{z \in H(A, R)} dz, \tag{48.2} \]

where $H(A, R)$ denotes the objective space dominated by $A$ and dominating
the reference set $R$

\[ H(A, R) = \{ z \in \mathbb{R}^n \mid \exists a \in A : \exists r \in R : f(a) \le z \le r \}. \]


In other words, we determine the volume covered by all points
$z \in \mathbb{R}^n$ that are enclosed between the image of the solutions in
objective space $f(A)$ and the reference set $R$, where enclosed is
interpreted in terms of weak Pareto dominance. Due to its compliance with
Pareto dominance it has been used in most of the indicator-based selection
schemes to date.
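
For two objectives, the hypervolume of (48.2) reduces to an area that can be computed by a single sweep. The following sketch makes several simplifying assumptions that are not part of the chapter: $n = 2$, the reference set consists of a single point $r$, and a set is passed directly as a list of objective vectors. The function name is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area weakly dominated by some point and weakly dominating ref (minimization)."""
    r1, r2 = ref
    area = 0.0
    best_f2 = r2  # lowest second objective covered so far
    # Sweep in order of increasing first objective.
    for f1, f2 in sorted(points):
        # Points beyond the reference point or already covered add nothing.
        if f1 <= r1 and f2 < best_f2:
            area += (r1 - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


# Hypothetical example: A strongly dominates B in the sense of Def. 48.2.
A = [(1, 3), (2, 2), (3, 1)]
B = [(2, 3), (3, 2)]
ref = (4, 4)
print(hypervolume_2d(A, ref), hypervolume_2d(B, ref))  # 6.0 3.0
```

In this example the indicator values agree with condition (48.1): the dominating set `A` receives the larger hypervolume.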

One of the major drawbacks of the hypervolume indicator is the associated
computational overhead. Bringmann and Friedrich [48.20] have proven that the
problem of computing the hypervolume is #P-hard, i.e., there exists no
polynomial algorithm unless NP = P. Several algorithms have been proposed in
the past to determine the hypervolume indicator, starting from the
hypervolume by slicing objectives approach independently proposed by several
authors (Knowles and Zitzler) with complexity $O(N^{n-1})$, where $N$ is the
number of solutions in the population and $n$ is the number of objectives.
Later on, improved versions appeared with complexity $O(N^{n-2} \log N)$
[48.21] and finally $O(N \log N + N^{n/2} \log N)$ [48.22]. An approximation
algorithm with proven bounds is presented in [48.23], which gives an
$\epsilon$-approximation of the hypervolume with probability $(1 - \delta)$
in time $O(\log(1/\delta)\, nN/\epsilon^2)$.

Binary indicators $I(A, B)$ can be used to compare two sets $A$ and $B$ as
described in the following Def. 48.5.


Definition 48.5
A binary indicator $I$ maps an ordered pair of sets $A, B \subseteq X$ of
the decision space to a real number $I(A, B) \in \mathbb{R}$. Given a binary
indicator, we can determine the corresponding preference relation
$\preceq_I$ as

\[ A \preceq_I B := (I(A, B) \ge I(B, A)). \]

In a similar way to (48.1), one can derive the condition for an indicator
whose corresponding preference relation is a refinement of $\preceq$,

\[ A \prec B \Rightarrow (I(A, B) > I(B, A)). \]
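
To make the preference relation of Def. 48.5 concrete, the sketch below uses the additive epsilon indicator of (48.3) as the binary indicator. The function names and example sets are assumptions for illustration only, and sets are again given as lists of objective vectors (minimization).

```python
from typing import Iterable, Sequence

Vector = Sequence[float]


def additive_epsilon(FA: Iterable[Vector], FB: Iterable[Vector]) -> float:
    """min over b in FB of max over a in FA of min_i (f_i(b) - f_i(a))."""
    FA = list(FA)
    return min(
        max(min(fb_i - fa_i for fa_i, fb_i in zip(a, b)) for a in FA)
        for b in FB
    )


def weakly_preferred(FA: Iterable[Vector], FB: Iterable[Vector]) -> bool:
    """Preference relation induced by the binary indicator: I(A,B) >= I(B,A)."""
    FA, FB = list(FA), list(FB)
    return additive_epsilon(FA, FB) >= additive_epsilon(FB, FA)


# Hypothetical example: A strongly dominates B in the sense of Def. 48.2.
A = [(1, 3), (3, 1)]
B = [(2, 3), (3, 2)]
print(additive_epsilon(A, B), additive_epsilon(B, A))  # 0 -1
print(weakly_preferred(A, B), weakly_preferred(B, A))  # True False
```

A nonnegative value of `additive_epsilon(A, B)` means that `A` already weakly dominates `B` without any shift, which is the case here.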


Two popular examples of binary indicators that have been successfully used
in indicator-based selection [48.11, 24, 25], archiving [48.3], and
approximation schemes [48.18] are the additive and multiplicative epsilon
indicators. They can be defined as

\[ I_{\epsilon}^{+}(A, B) = \min_{b \in B} \max_{a \in A} F_{\epsilon}^{+}(a, b), \qquad
   F_{\epsilon}^{+}(a, b) = \min_{1 \le i \le n} \bigl( f_i(b) - f_i(a) \bigr) \tag{48.3} \]

for the additive version and

\[ I_{\epsilon}^{*}(A, B) = \min_{b \in B} \max_{a \in A} F_{\epsilon}^{*}(a, b), \qquad
   F_{\epsilon}^{*}(a, b) = \min_{1 \le i \le n} \frac{f_i(b)}{f_i(a)} \tag{48.4} \]

for the multiplicative one. Formally speaking, $I_{\epsilon}^{+}(A, B)$ (or
$I_{\epsilon}^{*}(A, B)$) denotes the maximum amount one can add to (or
multiply with) every objective value $f_i(a)$ of every solution $a \in A$
such that the resulting set still weakly dominates $B$.

Unfortunately, the above binary indicators do not induce a preorder as the
resulting preference relation is not transitive in general [48.5]. This
negative result needs to be considered when deciding to use them (and
similar generalizations of unary indicators) in optimization algorithms.

Next, indicator-based selection and its integration into multiobjective
search algorithms will be discussed.


48.3 Selection Schemes

48.3.1 Basic Search Algorithm


Let us start the discussion of indicator-based selection with a simple
template of a multiobjective evolutionary algorithm (SPAM: set preference
algorithm for multiobjective optimization [48.26]) as shown in Alg. 48.1.


Algorithm 48.1 Simple SPAM
1: generate initial set of solutions $P$ of size $\mu$
2: while termination criterion not fulfilled do
3:     generate $\lambda$ offspring solutions $O \subseteq X$
4:     $S \leftarrow$ select($P \cup O$, $\mu$)
5:     if $S \preceq_{\mathrm{ref}} P$ then $P \leftarrow S$
6: return $P$


Obviously, the template is still very simplistic but it will help us to
understand the integration of the concept of indicators and selection in
multiobjective evolutionary algorithms. Line 3 in Alg. 48.1 refers to the
variation of solutions that are in population $P$, i.e., starting with
mating selection and then applying variation operators such as mutation and
recombination. This essential part of any evolutionary algorithm will not be
discussed further here. Line 4 is denoted as environmental selection and
reduces the union of parent set $P$ and offspring set $O$ from size
$\mu + \lambda$ to $\mu$ again, i.e., $S \subseteq P \cup O$ and
$|S| = \mu$. Finally, line 5 is responsible for selecting either the old
population $P$ or the new one $S$ depending on the chosen preference
relation $\preceq_{\mathrm{ref}}$.

In the following, we will stepwise refine the selection operator in line 4
and thereby relate the above template to existing indicator-based selection
schemes.


48.3.2 Exhaustive Selection 


Let us first suppose that the selection operator in line 4 is exhaustive in
the following sense: if there exists a subset $S \subseteq P \cup O$ with
$|S| = \mu$ that satisfies $S \preceq_{\mathrm{ref}} P$, then it will
generate it with nonzero probability. Under this condition, one can prove an
important convergence property of the algorithm that ensures that there is
no deterioration behavior as reported for algorithms such as NSGA-II and
SPEA2 [48.3]. The line of argument is just sketched here as it closely
follows the investigations in [48.27].

In most general terms, the goal of the optimization is to generate as large
a subset of the Pareto-optimal solutions as possible. Therefore, what we at
least require is that Alg. 48.1 generates such a set provided that it runs
long enough. Indeed, one can show that this is the case if:


a) The offspring generation in line 3 is exhaustive, i.e., all solutions are
   generated with nonzero probability,

b) The selection operator is exhaustive, and

c) $\preceq_{\mathrm{ref}}$ is a refinement of the Pareto dominance.


Let us suppose, for simplicity of argument, that there are more than $\mu$
Pareto-optimal solutions in $X$. Moreover, suppose that the population at
some point in time (still) contains a dominated solution. Then there is a
nonzero probability that the set $O$ of offspring contains a Pareto-optimal
solution not yet in $P$. Replacing the dominated solution with the
additional Pareto-optimal solution leads to a preferred set according to
$\preceq_{\mathrm{ref}}$ as it refines the Pareto dominance. Note, however,
that the above convergence property does not mean that Alg. 48.1 determines
an optimal set w.r.t. $\preceq_{\mathrm{ref}}$, i.e., that the resulting
subset of Pareto-optimal solutions actually is minimal in terms of
$\preceq_{\mathrm{ref}}$.


Exhaustive selection can usually not be implemented efficiently, as all
possible subsets must be tested, i.e., $\binom{\mu + \lambda}{\mu}$ possible
preference relations have to be evaluated. The following refinements of the
basic algorithm lead to more efficient schemes.
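
To give a feeling for this growth, the number of candidate subsets of size $\mu$ that can be drawn from $\mu + \lambda$ solutions can be evaluated directly. The population sizes below are arbitrary illustrative values and the function name is hypothetical.

```python
from math import comb


def exhaustive_candidates(mu: int, lam: int) -> int:
    """Number of subsets of size mu of a set with mu + lam elements."""
    return comb(mu + lam, mu)


for mu, lam in [(10, 1), (10, 10), (100, 1), (100, 100)]:
    print(mu, lam, exhaustive_candidates(mu, lam))
```

For $\lambda = 1$ the count is only $\mu + 1$, whereas for $\mu = \lambda = 10$ it is already 184756 and for $\mu = \lambda = 100$ it is of the order of $10^{58}$.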


48.3.3 Steady State and Greedy Selection 


A first possibility has been proposed in the indicator-based selection and
archiving schemes described in [48.12, 14]. The size of the offspring set is
$\lambda = 1$ and therefore, at most $\mu + 1$ preference relations need to
be constructed in each iteration. In particular, the hypervolume indicator
$I_H$ [48.9] (S measure) has been used to define the preference relation,
i.e., $\preceq_{\mathrm{ref}} := \preceq_{I_H}$. In this case, the selection
in line 4 just removes the solution that leads to the least loss in $I_H$.

Still, the convergence to a Pareto-optimal subset can be guaranteed if the
offspring generation is exhaustive. On the other hand, it cannot be
guaranteed that the algorithm determines an optimal set w.r.t.
$\preceq_{\mathrm{ref}}$, i.e., a set that is not strictly dominated by any
other subset of size $\mu$ w.r.t. $\preceq_{\mathrm{ref}}$. First
counterexamples for various set indicators that show this property for
$\lambda < \mu$ and especially for $\lambda = 1$ appeared in [48.5]. A more
in-depth discussion on this issue for the hypervolume indicator can be found
in [48.28].

A second approach allows for general sizes $\lambda$ of the offspring
population $O$ and employs a simple greedy strategy, i.e., solutions are
removed one by one from the set $P \cup O$ until a set with size $\mu$ is
obtained. The following template in Alg. 48.2 sketches the approach.


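Algorithm 48.2 Greedy Selection
1: procedure select($P \cup O$, $\mu$)
2:     $S \leftarrow P \cup O$
3:     while $|S| > \mu$ do
4:         for all $a \in S$ do
5:             $\delta_a \leftarrow$ loss($S$, $a$)
6:         choose $p \in S$ with $\delta_p = \min_{a \in S} \delta_a$
7:         $S \leftarrow S \setminus \{p\}$
8:     return $S$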


If $\lambda = 1$, then this template covers the steady-state selection
scheme in [48.12, 14]. The function loss($S$, $a$) quantifies the loss in
set quality if solution $a$ is removed from $S$. In line 6, the solution
with the smallest loss is chosen and removed from the population in line 7.
If the preference relation is based on a unary indicator as shown in
Def. 48.3, then the loss function can simply be determined as

\[ \mathrm{loss}(S, a) = I(S) - I(S \setminus \{a\}). \]


For the more general case of preference relations that
are not based on indicators, see also [48.5]. Note that
convergence to the set with the maximal indicator value
can no longer be guaranteed, as the greedy selection
is a heuristic. For the hypervolume indicator this is
shown in [48.29]. On the other hand, we can still guarantee
that SPAM with greedy selection generates a subset of the
Pareto-optimal solutions that is as large as possible
if the offspring generation in line 3 is exhaustive, i.e., all
solutions are generated with nonzero probability, and
$\preceq_{\mathrm{ref}}$ is a refinement of the Pareto dominance.

As shown in Alg. 48.2, we do not need to determine
the value of the set indicator but only the least
contributor, i.e., the solution that leads to the minimal
loss. Depending on the choice of the indicator, this
information may be easier to compute than evaluating the
indicator for $\lambda + \mu$ different sets and comparing the
values. In the context of the hypervolume indicator, a more
detailed discussion on this issue is provided in [48.30].
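
The greedy template of Alg. 48.2 with an indicator-based loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited algorithm: both objectives are minimized, a simple two-dimensional hypervolume serves as the unary indicator I, and the indicator is naively recomputed for every candidate instead of calling a dedicated least-contributor routine. The names `hypervolume_2d`, `greedy_select`, and the reference point `ref` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    area = 0.0
    best_f2 = ref[1]
    # Sweep in increasing f1; a point adds a rectangle only if it improves f2.
    for f1, f2 in sorted(points):
        if f1 < ref[0] and f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def greedy_select(candidates: Sequence[Point], mu: int,
                  indicator: Callable[[Sequence[Point]], float]) -> List[Point]:
    """Remove solutions one by one until only mu remain (cf. Alg. 48.2)."""
    S = list(candidates)
    while len(S) > mu:
        total = indicator(S)
        # loss(S, a) = I(S) - I(S \ {a}) for every a in S
        losses = [total - indicator(S[:i] + S[i + 1:]) for i in range(len(S))]
        # Drop the least contributor (first one in case of ties).
        S.pop(losses.index(min(losses)))
    return S


ref_point = (4.0, 4.0)  # assumed reference point
population = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]
survivors = greedy_select(population, 3, lambda s: hypervolume_2d(s, ref_point))
```

In this example the dominated point (2.5, 2.5) contributes nothing to the hypervolume and is therefore the first one to be removed.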

As has been mentioned already, the indicator-based
selection schemes described here have been applied
to the problem of archiving as well. Archiving algorithms
attempt to maintain a bounded set of solutions
given a sequence of solutions [48.3, 14]. In analogy to
the template in Alg. 48.1, the sequence of solutions
would be generated by the offspring generation and
the selection process would determine the new population
$P$. For archiving purposes, one is usually only
interested in maintaining a subset of all nondominated
solutions received so far. Dominated solutions in $P$ are
usually not considered in the underlying set preference
relations.

The Pareto-dominance relation on sets is by definition
insensitive to dominated solutions. The same
holds for set preference relations based on popular 
indicators such as the hypervolume indicator which re- 
flects the volume dominated by a set of solutions. On 
the other hand, preferences among dominated solutions 
may be of importance to guide the search. In partic- 
ular, all solutions in P (dominated and nondominated 
ones) are candidates for the mating selection and there- 
fore, may be chosen for variation in the generation of 
offspring solutions. Therefore, useful set preference re- 
lations that are refinements of Pareto dominance need 
to be constructed that allow us to consider preferences 
on dominated solutions as well. 


48.3.4 Hierarchical Set Preferences 


Most of the hierarchical indicator-based selection 
schemes that have been described so far combine set 
indicators with constructing a sequence of subsets of 
solutions [48.5, 12,31]. For example, nondominated 
sorting [48.32, 33] starts with the whole set as the first 
element of the sequence, and then removes the nondom- 
inated solutions from the previous subset to construct 
the next subset in the sequence. The following Alg. 48.3 
provides a template for a hierarchical selection scheme 
involving nondominated sorting. Many variants of this
basic scheme are conceivable, for example other
subset constructions such as dominance ranking [48.34].


Algorithm 48.3 Hierarchical Selection
 1: procedure select(P ∪ O, μ)
 2:   S ← P ∪ O
 3:   S' ← ∅
 4:   S'' ← ∅
 5:   repeat
 6:     S' ← S' ∪ S''
 7:     S'' ← {a ∈ S | ∄ b ∈ S : b ≺ a}
 8:     S ← S \ S''
 9:   until |S' ∪ S''| ≥ μ
10:   while |S' ∪ S''| > μ do
11:     for all a ∈ S'' do
12:       δ_a ← loss(S'', a)
13:     choose p ∈ S'' with δ_p = min_{a ∈ S''} δ_a
14:     S'' ← S'' \ {p}
15:   return S' ∪ S''


The set $S$ in Alg. 48.3 before the execution of the
iteration in lines 5-9 contains the last subset in the
sequence of dominating subsets. When leaving the iteration
with line 9, we have $P \cup O = S \cup S' \cup S''$ where $S''$
is the last dominating set that has been peeled off. The
iteration in lines 10-14 removes solutions from $S''$
one-by-one as in Alg. 48.2 until the set $S' \cup S''$ contains $\mu$
solutions. In [48.5], a detailed analysis of the optimization
and convergence properties of such constructions
is provided. In particular, conditions are derived under
which the corresponding preference relation is a refinement
of Pareto dominance, i.e., can safely be used in an
indicator-based selection.
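
The hierarchical template of Alg. 48.3 can be sketched in a few lines. The sketch assumes that all objectives are minimized and that solutions are represented directly by their objective vectors. The loss function is passed in as a parameter; the nearest-neighbor distance used in the example is only a hypothetical stand-in for an indicator-based loss, and the guard against running out of candidates is an addition that the template does not contain.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v (all objectives minimized)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def hierarchical_select(candidates: Sequence[Vector], mu: int,
                        loss: Callable[[List[Vector], Vector], float]) -> List[Vector]:
    S = list(candidates)
    kept: List[Vector] = []    # S' in Alg. 48.3
    front: List[Vector] = []   # S'' in Alg. 48.3
    while True:
        kept += front
        # Peel off the nondominated solutions of the remaining set.
        front = [a for a in S if not any(dominates(b, a) for b in S)]
        S = [a for a in S if a not in front]
        if len(kept) + len(front) >= mu or not S:
            break
    # Truncate the last front greedily, as in Alg. 48.2.
    while len(kept) + len(front) > mu:
        losses = [loss(front, a) for a in front]
        front.pop(losses.index(min(losses)))
    return kept + front


def nearest_neighbor_loss(front: List[Vector], a: Vector) -> float:
    """Hypothetical loss: distance from a to its closest neighbor in the front."""
    others = [b for b in front if b is not a]
    return min((math.dist(a, b) for b in others), default=math.inf)


pool = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
selected = hierarchical_select(pool, 4, nearest_neighbor_loss)
```

Here the first front has three members, so the whole second front, which consists of (3.0, 3.0) only, is added and no truncation is needed.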


48.3.5 Using Binary Indicators 


The previous discussion concentrated on the use of 
unary indicators in multiobjective evolutionary algo- 
rithms. On the other hand, one of the first indicator- 
based selection schemes used for optimization was 
based on the concept of binary indicators following 
Def. 48.5 [48.11]. In particular, the use of a binary vari- 
ant of the hypervolume indicator, (48.2), as well as the 
use of the binary additive epsilon indicator, (48.3), have 
been described. 

The structure of IBEA (indicator-based evolutionary
algorithm) directly follows the greedy selection
scheme as described in Alg. 48.2. In the basic scheme
of IBEA, i.e., without parameter adaptation, the loss
function is computed as follows,


$$\mathrm{loss}(S, a) = - \sum_{b \in S \setminus \{a\}} e^{I(\{b\},\{a\})/\kappa} . \qquad (48.5)$$


Let us interpret the above loss function by means of
the binary additive epsilon indicator, i.e.,
$I(A,B) := I_{\varepsilon}(A,B)$. The solution with the smallest
loss will be selected for removal from the set.
This actually is the solution with the largest sum
$\sum_{b \in S \setminus \{a\}} e^{I(\{b\},\{a\})/\kappa}$. If considering
a large scaling factor $\kappa$, the sum of exponentials actually
acts similar to sorting the indicators and just considers
the largest one. If $\kappa$ is smaller, then not only the largest
indicator is taken into account but also smaller ones. As a
result, the sum is dominated by the solution b which leads to
the largest value of $I(\{b\},\{a\})$.

Remember that $I(\{b\},\{a\}) = \min_{1 \le i \le n} (f_i(a) - f_i(b))$
denotes the maximal amount one can add to
every objective value $f_i(b)$ of solution b such that it



still weakly dominates a. As a result, for large values
of $\kappa$ we can summarize the selection as follows: For
each solution a, we determine the solution b to which
we can add the largest amount such that it still weakly
dominates a. The solution a is removed for which
this amount is largest. In some sense, the first step
determines the closest solution (or strongest dominator)
to a in the epsilon indicator and the second step then
removes the solution which has the closest neighbor (or
strongest dominator). If $\kappa$ is smaller, also the closeness
(or domination) of other solutions b is taken into
account.

Recently, an indicator-based algorithm has been
developed [48.25], which uses a similar principle. It
uses the binary epsilon indicator as defined in (48.3)
and (48.4), but instead of comparing successive populations
as in IBEA, it uses a possibly growing archive of
the best Pareto-approximations found so far as the reference
set. The solution to be removed according to the
template in Alg. 48.2 is determined by sorting.


48.4 Preference-Based Selection


Recently, there has been increasing interest in constructing
evolutionary optimization methods that allow the search
preferences of the user to be considered. In other
words, the resulting set of solutions should not contain
an arbitrary subset of Pareto-optimal solutions but one
that satisfies secondary criteria. For example, it may
be desirable to preferably determine solutions that are
close to some reference point, that are along a direction
in the objective space, or have some other predefined
distribution.

The choice of the right subset of solutions as a result
of an evolutionary multiobjective optimization has been
of major concern since the beginning. In particular, it
was a major goal to design algorithms that lead to
well-distributed solutions that are close to the Pareto-optimal
front. Various heuristics have been implemented
in standard algorithms like SPEA2 [48.2] and NSGA-II [48.1]
to achieve such an implicitly defined objective.

The concept of indicator-based selection changed
this approach fundamentally. It not only allows us to
formalize the objective of population-based multiobjective
optimization in general but also to design algorithms
that optimize a set of solutions toward it. As a result of
this achievement, the focus of research moved toward
the following questions:


• What kind of user preference is useful in the context
  of preference-based search?

• How can these preferences be mathematically formulated
  and incorporated in set indicators?

• How can a preference-based algorithm be integrated
  into an interactive optimization approach that involves
  the decision maker?


Including preference information in multiobjective
evolutionary methods has been investigated since the
beginning; see [48.37] for a survey. In a very early attempt,
Fonseca and Fleming [48.34] suggested to assign ranks
to the members of a population. Much later it was proposed
in [48.38] to include preferences through the use
of reference points, guided dominance schemes and
a biased crowding scheme. Preference-based multiobjective
evolutionary methods can be used within a hybrid
approach that combines ideas from both evolutionary
and interactive multiobjective optimization [48.24]:
In an iterative approach, several consecutive runs of the
evolutionary algorithm are performed. The user is asked
to give preference information in terms of a reference
point consisting of desirable aspiration levels for the
objective functions. This information is used in a
preference-based evolutionary algorithm that generates a new
population by combining the fitness function and a
so-called achievement scalarizing function containing the
reference point.

In the meantime, many other possibilities to for- 
malize user preferences have been investigated [48.35, 
39], for example weight functions in the objective space 
which change the desired (nonuniform) density of so- 
lutions, stressing objectives, guiding the search toward 
preference points, transforming objective functions, 
weighted Tchebycheff approaches using ideal points, 
epsilon-constraint methods, and desirability functions, 
just to name a few. 

Fig. 48.1a,b Pareto front approximations (dots) found by HypE (after [48.31]) using different
weight distribution functions, shown as contour lines at intervals of 10% of the maximum weight value:
(a) stressing an extreme, (b) emphasizing a preference point. For both rows
one parameter of the sampled distribution was modified, i.e., on top the rate parameter of the exponential distribution,
on the bottom the spread of a multivariate normal distribution (after [48.35]). The test problem is ZDT1 (after [48.36])
where the Pareto front is shown as a solid line. The graphics appeared in [48.35]

In the following, two examples for considering
preference information in selection schemes will be
described in some more detail. In [48.24], the greedy
selection scheme according to Alg. 48.2 with a binary
indicator according to (48.5) has been used. In particular,
a normalization according to


$$I^{p}(\{b\},\{a\}) = I(\{b\},\{a\}) / s^{*}(g, f(a))$$


is proposed. The normalization function $s(g, f(x))$ is
closely related to the concept of achievement scalarizing
functions, first proposed by Wierzbicki [48.40]


$$s(g, f(a)) = \max_{1 \le i \le n} (f_i(a) - g_i) ,$$


where $g$ denotes the reference point whose components
represent the desired values of the objective functions.
The function $s^{*}$ is obtained from $s$ by normalization
such that only positive values are obtained, i.e.,
$s^{*}(g, f(a)) > 0$ [48.24].
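
As a small illustration, the achievement scalarizing function s(g, f(a)) = max_i (f_i(a) - g_i) can be evaluated as follows. The sketch covers only the unnormalized function s; the normalization to s* is not reproduced because its details are given in [48.24]. The reference point and the objective vectors are made-up values.

```python
from typing import Sequence


def achievement(g: Sequence[float], fa: Sequence[float]) -> float:
    """s(g, f(a)): largest componentwise excess of f(a) over the reference point g."""
    return max(f_i - g_i for f_i, g_i in zip(fa, g))


g = (1.0, 1.0)  # assumed aspiration levels (objectives minimized)
points = [(2.0, 3.0), (0.5, 0.5), (1.5, 0.8)]
values = [achievement(g, fa) for fa in points]
# The point with the smallest value matches the aspiration levels best.
best = min(points, key=lambda fa: achievement(g, fa))
```

A negative value indicates that every aspiration level is met with some slack, as is the case for (0.5, 0.5) in this example.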

A second approach uses the concept of
a weighted hypervolume indicator as proposed
in [48.41]. In extension to (48.2), we now determine
the weighted volume that is covered by all points
$z \in \mathbb{R}^n$ that are enclosed between the image of the
solutions in objective space $f(A)$ and the reference
set $R$, where enclosed is interpreted in terms of weak
Pareto dominance:


Definition 48.6

Given a set of solutions $A \subseteq X$, a set of reference
points $R \subset \mathbb{R}^n$ and a positive weight function
$w : \mathbb{R}^n \to \mathbb{R}_{>0}$. Then the weighted hypervolume
indicator $I_H^w(A, R)$ of $A$ with respect to $R$ is defined as


$$I_H^w(A, R) = \int_{z \in H(A,R)} w(z) \, dz , \qquad (48.6)$$


where $H(A, R)$ denotes the objective space dominated
by $A$ and dominating $R$


$$H(A, R) = \{ z \in \mathbb{R}^n \mid \exists a \in A : \exists r \in R : f(a) \le z \le r \} .$$


The weight function is supposed to be integrable on any
bounded set, i.e., $\int_{B(0,\gamma)} w(z) \, dz < \infty$ for any
$\gamma > 0$, where $B(0, \gamma)$ is the open ball centered in 0
and of radius $\gamma$.


In a similar way to (48.2), the weighted hypervol- 
ume indicator is compliant to Pareto dominance and 
can safely be used in the previously described indica- 
tor-based selection schemes. 
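
To make Def. 48.6 concrete, the weighted hypervolume can be estimated by plain Monte Carlo integration. The sketch below is an illustration under stated assumptions and not the sampling scheme of HypE: it uses a single reference point, minimized objectives, and uniform samples drawn from a box between a user-supplied lower corner and the reference point. The box must contain H(A, R), so `lower` should not exceed any objective value in the set. All names are hypothetical.

```python
import random
from typing import Callable, Sequence


def weighted_hypervolume_mc(points: Sequence[Sequence[float]],
                            ref: Sequence[float],
                            lower: Sequence[float],
                            weight: Callable[[Sequence[float]], float],
                            n_samples: int = 20000,
                            seed: int = 0) -> float:
    """Monte Carlo estimate of the integral of w(z) over H(A, {ref})."""
    rng = random.Random(seed)
    box_volume = 1.0
    for lo, hi in zip(lower, ref):
        box_volume *= hi - lo
    total = 0.0
    for _ in range(n_samples):
        z = [rng.uniform(lo, hi) for lo, hi in zip(lower, ref)]
        # z is in H(A, R) if some point weakly dominates it; z <= ref by construction.
        if any(all(p_i <= z_i for p_i, z_i in zip(p, z)) for p in points):
            total += weight(z)
    return box_volume * total / n_samples


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
# With w(z) = 1 the estimate approximates the ordinary hypervolume of the set.
estimate = weighted_hypervolume_mc(front, ref=(4.0, 4.0), lower=(1.0, 1.0),
                                   weight=lambda z: 1.0)
```

Replacing the constant weight by a function that is large near a preferred region of the objective space biases the indicator toward that region.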

In later work [48.35, 39], the approach [48.41]
has been extended toward more general weight functions,
their relation to typical user preference specifications
and higher dimensions. Moreover, it is
well known that the exact computation of the hypervolume
is expensive in the number of objectives,
i.e., it is exponential unless P = NP. To this
end, efficient sampling methods [48.31] have been
combined with the general concept of weight functions.
Figure 48.1 shows some examples of the effect
of weighting the hypervolume indicator, taken
from [48.35].


48.5 Concluding Remarks


There has been a major shift in our understanding of
population-based multiobjective optimization. In a certain
sense, the focus of classical algorithms such as
NSGA-II or SPEA2 was the Pareto-dominance relation
between individual solutions. Properties such as
a large diversity of solutions in the final population were
achieved through (clever) heuristics and tuning of the
selection mechanisms.

The role of set indicators in multiobjective optimization
was first limited to performance assessment.
The possibility to assign a single measure to a set
was used in elaborate methods that allow us to compare
the results of optimization runs and to statistically
verify whether one algorithm is preferable to another.
In this context, indicators have been compared in
terms of their suitability for performance assessment,
e.g., whether they comply with the underlying preference
relation between solutions.

Recently, a growing number of very competitive
search algorithms have appeared that are based on an explicit
formulation of the optimization goal as a set property,
i.e., they build on the concept of set indicators.
In simplified terms, they can be regarded as optimization
methods that deal with sets of solutions as their
optimization object. In contrast, single-objective
optimization traditionally works with single solutions.
One can simply draw the correspondence between
traditional single-objective optimization and population-based
multiobjective optimization as follows: single
solution versus set of solutions, and single objective
function versus single set indicator.

This major breakthrough leads to several advantages
in terms of analysis and algorithm design:


• Algorithms are conceptually simpler as they are
  based on a single indicator and do not rely on
  heuristics to a large extent. As a result, it can be
  expected that they are more robust and that less parameter
  tuning is necessary.

• Certain convergence properties can be derived. As
  a result, the new class of algorithms does not show
  deterioration and/or cyclic behavior. It also appears
  in some of the experiments that have been conducted
  that they are more robust toward increasing
  the number of objectives.

• The optimization criterion is made explicit, i.e., the
  discussion about convergence versus diversity in the
  research community can now be based on quantitative
  measures.

• By changing the set indicator, it is possible to
  explicitly consider preferences of a user. It remains to be
  seen whether this possibility will lead to interactive
  methods that involve the decision maker in the
  optimization process.


The purpose of the chapter was to introduce the concept
of set indicators and to discuss several ways to
use them as essential parts of a set-based multiobjective
optimization algorithm. Many other important aspects
have not been discussed in detail. In particular, the
hypervolume indicator has been very popular as a component
of set-based optimization methods due to its superior
properties in terms of:


a) Compliance with Pareto dominance,
b) Sensitivity to changes of single solutions, and
c) Simple interpretation.


Unfortunately, it has some detrimental properties, such
as its computational complexity with respect to a growing
number of objectives. In addition, the complexity of
determining a subset of solutions that has the least influence
on the hypervolume increases exponentially with the
size of the subset.

Finally, it has been shown that removing solutions
one-by-one may lead to a major loss in optimization
quality. Therefore, its use in indicator-based selection
needs to be done with care. As described in
this chapter, recent methods overcome some of these
difficulties by using advanced methods such as sampling.
A more detailed investigation and review of the
hypervolume indicator can be found, e.g., in [48.31, 35].


References

48.1 K. Deb, S. Agrawal, A. Pratap, T. Meyarivan: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, Lect. Notes Comput. Sci. 1917, 849-858 (2000)
48.2 E. Zitzler, M. Laumanns, L. Thiele: SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization, Evol. Methods Des. Optim. Control Appl. Ind. Probl. (2002) pp. 95-100
48.3 M. Laumanns, L. Thiele, K. Deb, E. Zitzler: Combining convergence and diversity in evolutionary multiobjective optimization, Evol. Comput. 10(3), 263-282 (2002)
48.4 T. Wagner, N. Beume, B. Naujoks: Pareto-, aggregation-, and indicator-based methods in many-objective optimization, Lect. Notes Comput. Sci. 4403, 742-756 (2007)
48.5 E. Zitzler, L. Thiele, J. Bader: On set-based multiobjective optimization, IEEE Trans. Evol. Comput. 14(1), 58-79 (2010)
48.6 V. da Grunert Fonseca, C.M. Fonseca, A.O. Hall: Inferential performance assessment of stochastic optimisers and the attainment function, Conf. Evol. Multi-Criterion Optim. (EMO 2001), ed. by E. Zitzler, K. Deb, L. Thiele, C.A. Coello Coello, D. Corne (Springer, Berlin, Zurich 2001) pp. 213-225
48.7 J. Knowles, D. Corne: On metrics for comparing non-dominated sets, Conf. Evol. Comput. (2002) pp. 711-716
48.8 D.A. Van Veldhuizen, G.B. Lamont: On measuring multiobjective evolutionary algorithm performance, Congr. Evol. Comput. (2000) pp. 204-211
48.9 E. Zitzler, L. Thiele: Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach, IEEE Trans. Evol. Comput. 3(4), 257-271 (1999)
48.10 M. Fleischer: The measure of Pareto optima. Applications to multi-objective metaheuristics, Conf. Evol. Multi-Criterion Optim. (2003) pp. 519-533
48.11 E. Zitzler, S. Künzli: Indicator-based selection in multiobjective search, Lect. Notes Comput. Sci. 3242, 832-842 (2004)
48.12 M. Emmerich, N. Beume, B. Naujoks: An EMO algorithm using the hypervolume measure as selection criterion, Evol. Multi-Criterion Optim. 3rd Int. Conf. (2005) pp. 62-76
48.13 C. Igel, N. Hansen, S. Roth: Covariance matrix adaptation for multi-objective optimization, Evol. Comput. 15(1), 1-28 (2007)
48.14 J.D. Knowles, D. Corne: Properties of an adaptive archiving algorithm for storing nondominated vectors, IEEE Trans. Evol. Comput. 7(2), 100-116 (2003)
48.15 E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V. da Grunert Fonseca: Performance assessment of multiobjective optimizers: An analysis and review, IEEE Trans. Evol. Comput. 7(2), 117-132 (2003)
48.16 M.P. Hansen, A. Jaszkiewicz: Evaluating the Quality of Approximations to the Non-dominated Set, Tech. Rep. IMM-REP-1998-7 (Technical Univ. of Denmark, Lyngby 2010) pp. 1-31
48.17 C.H. Papadimitriou, M. Yannakakis: On the approximability of trade-offs and optimal access of web sources, 41st Annu. Symp. Found. Comput. Sci. (2000) pp. 86-92
48.18 K. Bringmann, T. Friedrich: The maximum hypervolume set yields near-optimal approximation, Genet. Evol. Comput. Conf. (2010) pp. 511-518
48.19 E. Zitzler, L. Thiele: Multiobjective optimization using evolutionary algorithms - A comparative case study, Lect. Notes Comput. Sci. 1498, 292-304 (1998)
48.20 K. Bringmann, T. Friedrich: Approximating the volume of unions and intersections of high-dimensional geometric objects, Lect. Notes Comput. Sci. 5369, 436-447 (2008)
48.21 C.M. Fonseca, L. Paquete, M. López-Ibáñez: An improved dimension-sweep algorithm for the hypervolume indicator, Congr. Evol. Comput. (2006) pp. 1157-1163
48.22 N. Beume: S-Metric calculation by considering dominated hypervolume as Klee's measure problem, Evol. Comput. 17(4), 477-492 (2009)
48.23 K. Bringmann, T. Friedrich: S-Metric calculation by considering dominated hypervolume as Klee's measure problem, Comput. Geom. 43(6/7), 601-610 (2010)
48.24 L. Thiele, K. Miettinen, P.J. Korhonen, J. Molina: A preference-based evolutionary algorithm for multi-objective optimization, Evol. Comput. 17(3), 411-436 (2009)
48.25 K. Bringmann, T. Friedrich, F. Neumann, M. Wagner: Approximation-guided evolutionary multi-objective optimization, Proc. 22nd Int. Jt. Conf. Artif. Intell. (2011) pp. 1198-1203
48.26 E. Zitzler, L. Thiele, J. Bader: SPAM: Set preference algorithm for multiobjective optimization, Lect. Notes Comput. Sci. 5199, 847-858 (2008)
48.27 G. Rudolph, A. Agapie: Convergence properties of some multi-objective evolutionary algorithms, Congr. Evol. Comput. (2000) pp. 1010-1016
48.28 K. Bringmann, T. Friedrich: Convergence of hypervolume-based archiving algorithms I: Effectiveness, 13th Annu. Genet. Evol. Comput. Conf. (2011) pp. 745-752
48.29 K. Bringmann, T. Friedrich: An efficient algorithm for computing hypervolume contributions, Evol. Comput. 18(3), 383-402 (2010)
48.30 K. Bringmann, T. Friedrich: Approximating the least hypervolume contributor: NP-hard in general, but fast in practice, evolutionary multi-criterion optimization, Lect. Notes Comput. Sci. 5467, 6-20 (2009)
48.31 J. Bader, E. Zitzler: HypE: An algorithm for fast hypervolume-based many-objective optimization, Evol. Comput. 19(1), 45-76 (2011)
48.32 N. Srinivas, K. Deb: Multiobjective optimization using nondominated sorting in genetic algorithms, Evol. Comput. 2(3), 221-248 (1994)
48.33 D.E. Goldberg: Multiobjective optimization. In: Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading 1989) pp. 197-201
48.34 C.M. Fonseca, P.J. Fleming: Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization, Proc. 5th Conf. Genet. Algorithms (1993) pp. 416-423
48.35 J. Bader: Hypervolume-Based Search for Multiobjective Optimization: Theory and Methods, Ph.D. Thesis (CreateSpace, ETH Zurich 2010)
48.36 E. Zitzler, K. Deb, L. Thiele: Comparison of multiobjective evolutionary algorithms: Empirical results, Evol. Comput. 8(2), 173-195 (2000)
48.37 C.A. Coello Coello: Handling preferences in evolutionary multiobjective optimization: A survey, Congr. Evol. Comput. (2000) pp. 30-37
48.38 K. Deb, J. Sundar: Reference point based multi-objective optimization using evolutionary algorithms, Genet. Evol. Comput. Conf. (2006) pp. 635-642
48.39 A. Auger, J. Bader, D. Brockhoff, E. Zitzler: Articulating user preferences in many-objective problems by sampling the weighted hypervolume, Genet. Evol. Comput. Conf. (2009) pp. 555-562
48.40 A. Wierzbicki: The use of reference objectives in multiobjective optimization, Lect. Notes Econ. Math. Syst. 177, 468-486 (1980)
48.41 E. Zitzler, D. Brockhoff, L. Thiele: The hypervolume indicator revisited: On the design of Pareto-compliant indicators via weighted integration, Lect. Notes Comput. Sci. 4403, 862-876 (2007)


49. Multi-Objective Evolutionary Algorithms


Kalyanmoy Deb 


Evolutionary algorithms (EAs) have amply shown 
their promise in solving various search and opti- 
mization problems for the past three decades. One 
of the hallmarks and niches of EAs is their ability 
to handle multi-objective optimization problems 
in their totality, which their classical counterparts 
lack. Suggested in the beginning of the 1990s, 
evolutionary multi-objective optimization (EMO) 
algorithms are now routinely used in solving 
problems with multiple conflicting objectives in 
various branches of engineering, science, and 
commerce. In this chapter, we provide an overview 
of EMO methodologies by first presenting princi- 
ples of EMO through an illustration of one specific 
algorithm and its application to an interesting 
real-world bi-objective optimization problem. 
Thereafter, we provide a list of recent research 
and application developments of EMO to provide 
a picture of some salient advancements in EMO re- 
search. The development and application of EMO to 
multi-objective optimization problems and their 
continued extensions to solve other related prob- 
lems has elevated EMO research to a level which 
may now undoubtedly be termed as an active field 
of research with a wide range of theoretical and 
practical research and application opportunities. 


49.1 Preamble
49.2 Evolutionary Multi-Objective Optimization (EMO)
   49.2.1 EMO Principles
   49.2.2 A Posteriori MCDM Methods and EMO
49.3 A Brief Timeline for the Development of EMO Methodologies
49.4 Elitist EMO: NSGA-II
   49.4.1 Sample Results
   49.4.2 Constraint Handling in EMO
49.5 Applications of EMO
   49.5.1 Spacecraft Trajectory Design
49.6 Recent Developments in EMO
   49.6.1 Hybrid EMO Algorithms
   49.6.2 Multi-Objectivization
   49.6.3 Uncertainty-Based EMO
   49.6.4 EMO and Decision-Making
   49.6.5 EMO for Handling a Large Number of Objectives: Many-Objective EMO
   49.6.6 Knowledge Extraction Through EMO
   49.6.7 Dynamic EMO
   49.6.8 Quality Estimates for EMO
   49.6.9 Exact EMO with Run-Time Analysis
   49.6.10 EMO with Meta-Models
49.7 Conclusions
References


49.1 Preamble


Search and optimization problems, particularly those involving
nonlinear, non-convex and non-differentiable objective
and constraint functions, provide a stiff challenge
even today. No known mathematical algorithm exists
to solve such problems to optimality. In such cases,
meta-heuristic optimization methods such
as evolutionary algorithms [49.1-3], simulated annealing
[49.4], tabu search [49.5, 6], and other methods
motivated by natural or physical phenomena
have been popularly applied.

EAs were traditionally used for solving problems
having a single goal or objective. However, most real-world
problems have multiple conflicting goals and
theoretically they give rise to a set of trade-off solutions.
The classical literature on solving multi-objective
optimization problems has been mostly indirect, mainly
due to the fact that there did not exist any search and
optimization methods which could find multiple optimal
solutions in a single simulation. While the scientific
community was waiting for a suitable algorithm for
handling such problems, evolutionary algorithms with
their population approach caught the eyes of a number
of researchers. This spurred the development of a series
of first generation evolutionary multi-objective optimization
(EMO) algorithms around 1993-1995. A set
of three different algorithms (but all motivated by
a single idea portrayed by legendary EA researcher,
Prof. Goldberg [49.1]) showed the world that EMOs
are viable candidates for multi-objective optimization,
and that there are meta-heuristic-based approaches for
finding multiple trade-off solutions in a single simulation.
EMO researchers, and in that spirit the whole EA
research community, realized the niche of EAs in such
problem-solving tasks and promoted the developmental
and application studies using EMO. Subsequently,
EMO methodologies were made to be better, faster, and
more accessible. The algorithms were commercialized
by various software companies and they made the field
of EMO more popular and applicable to many different
problems, which academic researchers alone could not
have done.

In this chapter, we provide a brief overview of the 
EMO principle, present one EMO algorithm in de- 
tail, and emphasize the importance of using EMO in 
practice. Besides this specific algorithm, there exist 
a number of other equally efficient EMO algorithms, 
which we do not describe here for brevity. Instead, 
in this chapter, we discuss a number of recent ad- 
vancements of EMO research and applications that 
are driving researchers and practitioners ahead. Fortu- 
nately, researchers have utilized the EMO principle of 
solving multi-objective optimization problems in han- 
dling various other problem-solving tasks. The diversity 
of EMO’s research is bringing together researchers 
and practitioners with different backgrounds, includ- 
ing computer scientists, mathematicians, economists, 
and engineers. The topics that we discuss here am- 
ply demonstrate why and how EMO researchers from 
different backgrounds must and should collaborate on 
complex problem-solving tasks, which have become the 
need of the hour in most branches of science, engineer- 
ing, and commerce today. 


49.2 Evolutionary Multi-Objective Optimization (EMO) 


Before we discuss an evolutionary algorithm for multi-objective
optimization, we present a generic problem
that involves multiple conflicting objectives. A multi-objective
optimization problem involves a number of
objective functions that are to be either minimized or
maximized, subject to a number of constraints and variable
bounds


$$
\begin{aligned}
\text{Minimize/Maximize} \quad & f_m(x), & m &= 1, 2, \ldots, M ; \\
\text{subject to} \quad & g_j(x) \ge 0, & j &= 1, 2, \ldots, J ; \\
& h_k(x) = 0, & k &= 1, 2, \ldots, K ; \\
& x_i^{(L)} \le x_i \le x_i^{(U)}, & i &= 1, 2, \ldots, n .
\end{aligned}
\qquad (49.1)
$$


A solution $x \in \mathbb{R}^n$ is a vector of $n$ decision variables:
$x = (x_1, x_2, \ldots, x_n)^T$. The solutions satisfying
the constraints and variable bounds constitute a feasible
set $S$ in the decision variable space $\mathbb{R}^n$. One
of the striking differences between single-objective
and multi-objective optimization is that in multi-objective
optimization the objective function vectors
belong to a multi-dimensional objective space
$\mathbb{R}^M$, where $M$ is the number of objectives. The objective
function vectors constitute a feasible set $Z$ in the
objective space. For each solution $x$ in $S$, there exists
a point $z \in Z$, denoted by
$f(x) = z = (z_1, z_2, \ldots, z_M)^T$. To make the descriptions
clear, we refer to a decision variable vector as
a solution and the corresponding objective vector as
a point.

The optimal solutions in multi-objective optimiza- 
tion can be defined from the mathematical concept 
of partial ordering [49.7]. In the parlance of multi- 
objective optimization, the term domination is used 
for this purpose. In this section, we restrict ourselves 
to discussing unconstrained (without any equality, in- 
equality or bound constraints) optimization problems. 
The domination between two solutions is defined as fol- 
lows [49.8, 9]: 


Fig. 49.1a,b A set of points and the
first non-dominated front are shown
(axes: f_1, maximize; f_2, minimize)


Definition 49.1
A solution x^(1) is said to dominate another solution x^(2),
if both the following conditions are true.


1. The solution x^(1) is no worse than x^(2) in all
objectives. Thus, the solutions are compared based on
their objective function values (or the location of the
corresponding points z^(1) and z^(2) in the objective
function set Z).

2. The solution x^(1) is strictly better than x^(2) in at
least one objective.
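The two conditions of Definition 49.1 can be checked directly on a pair of objective vectors. The following is a minimal sketch, assuming every objective is to be minimized (an objective to be maximized can be negated first); the function name `dominates` is illustrative and does not come from the chapter.

```python
from typing import Sequence


def dominates(z1: Sequence[float], z2: Sequence[float]) -> bool:
    """Return True if point z1 dominates point z2 (all objectives minimized)."""
    if len(z1) != len(z2):
        raise ValueError("points must have the same number of objectives")
    # Condition 1: z1 is no worse than z2 in all objectives.
    no_worse = all(a <= b for a, b in zip(z1, z2))
    # Condition 2: z1 is strictly better than z2 in at least one objective.
    strictly_better = any(a < b for a, b in zip(z1, z2))
    return no_worse and strictly_better


print(dominates((1.0, 2.0), (1.0, 3.0)))  # True: equal in f1, better in f2
print(dominates((1.0, 3.0), (2.0, 1.0)))  # False: the two points trade off
print(dominates((1.0, 2.0), (1.0, 2.0)))  # False: identical points
```

Note that domination is not a total order: for the second pair neither point dominates the other.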


For a given set of solutions (or corresponding
points in the objective function set Z, for example,
those shown in Fig. 49.1a), a pair-wise comparison
can be made using the above definition and whether
one point dominates another point can be established.
All points that are not dominated by any other member
of the set are called non-dominated points of class
one, or simply non-dominated points. For the set of six
points shown in the figure, they are points 3, 5, and
6. One property of any two such points is that a gain
in an objective from one point to the other happens
only due to a sacrifice in at least one other objective.
This trade-off property between non-dominated
points makes practitioners interested in finding a wide
variety of them before making a final choice. These
points make up a front when viewed together on
the objective space; hence non-dominated points are
often visualized to represent a non-dominated front.
The theoretical computational effort needed to select
the points of the non-dominated front from a set
of N points is O(N log N) for 2 and 3 objectives,
and O(N log^(M-2) N) for M > 3 objectives [49.10], but
for a moderate number of objectives, the procedure
need not be particularly computationally effective in
practice.
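As a rough illustration of the pair-wise comparison described above, the sketch below extracts the non-dominated points of a small set by comparing every point with every other point. This naive version costs O(M N^2) comparisons rather than the theoretical bounds quoted from [49.10]. The sample points are made up for illustration (they are not the six points of Fig. 49.1), and all objectives are assumed to be minimized.

```python
def dominates(z1, z2):
    """True if z1 is no worse than z2 everywhere and better somewhere."""
    return (all(a <= b for a, b in zip(z1, z2))
            and any(a < b for a, b in zip(z1, z2)))


def non_dominated_indices(points):
    """Indices of the points that no other member of the set dominates."""
    front = []
    for i, p in enumerate(points):
        dominated = any(dominates(q, p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            front.append(i)
    return front


# Hypothetical two-objective points (both objectives minimized).
points = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
print(non_dominated_indices(points))  # [0, 1, 2]
```

Moving between any two of the three reported points improves one objective only at the cost of the other, which is the trade-off property noted in the text.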




With the above concept, it is now easier to define the
Pareto-optimal solutions in a multi-objective optimization
problem. If the given set of points for the above task
contains all feasible points in the objective space, the
points lying on the first non-domination front, by
definition, are not dominated by any other point in the
objective space; hence they are Pareto-optimal points
(together they constitute the Pareto-optimal front), and
the corresponding pre-images (decision variable vectors)
are called Pareto-optimal solutions. However,
more mathematically elegant definitions of Pareto-optimality
(including the ones for continuous search
space problems) exist in the multi-objective optimization
literature [49.9, 11]. Some convergence analyses of
EMO under certain assumptions can also be found
elsewhere [49.12–15].


49.2.1 EMO Principles 


In the context of multi-objective optimization, the ex- 
tremist principle of finding the optimum solution cannot 
be applied to one objective alone, when the rest of the 
objectives are also important. This clearly suggests two 
ideal goals of multi-objective optimization. 


• Convergence: find a (finite) set of solutions which
lies on the Pareto-optimal front.

• Diversity: find a set of solutions which is diverse
enough to represent the entire range of the
Pareto-optimal front.


EMO algorithms attempt to follow both the above
principles, similar to the a posteriori multiple criteria
decision-making (MCDM) method. Figure 49.2
schematically shows the principles followed in an EMO
procedure. Since EMO procedures are heuristic based,
they may not guarantee finding exact Pareto-optimal
points, as a theoretically provable optimization method
would do for tractable (for example, linear or convex)
problems. However, EMO procedures have essential
operators to constantly improve the evolving
non-dominated points (from the point of view of
convergence and diversity mentioned above) similar to
how most natural and artificial evolving systems
continuously improve their solutions. To this effect, a
recent study [49.16] demonstrated that a particular EMO
procedure, starting from random non-optimal solutions,
can progress towards theoretical Karush–Kuhn–Tucker
(KKT) points with iterations in real-valued
multi-objective optimization problems. The main
difference and advantage of using EMO compared to
a posteriori MCDM procedures is that multiple trade-off
solutions can be found in a single run of an
EMO algorithm, whereas most a posteriori MCDM
methodologies would require multiple independent
runs.

Fig. 49.2 Schematic of a two-step multi-criteria optimization and decision-making procedure
(Step 1: a multi-objective optimization problem, minimize f_1, f_2, ..., f_M subject to constraints, is passed to an ideal multi-objective optimizer, which finds multiple trade-off solutions; Step 2: higher-level information is used to choose one solution)

In Step 1 of the EMO-based multi-objective op- 
timization and decision-making procedure (the task 
shown vertically downwards in Fig. 49.2), multiple 
trade-off, non-dominated points are found. Thereafter, 
in Step 2 (the task shown horizontally, towards the 
right), higher-level information is used to choose one 
of the trade-off points obtained. 


49.2.2 A Posteriori MCDM Methods and EMO 


In the a posteriori MCDM approaches (also known
as generating MCDM methods), the task of finding
multiple Pareto-optimal solutions is achieved by
executing many independent single-objective
optimizations, each time finding a single Pareto-optimal
solution [49.9]. A parametric scalarizing approach (such
as the weighted-sum approach, ε-constraint approach,
and others) can be used to convert multiple objectives
into a parametric single-objective function.
By simply varying the parameters (the weight vector
or the ε-vector) and optimizing the scalarized function,
different Pareto-optimal solutions can be found. In
contrast, an EMO attempts to find multiple Pareto-optimal
solutions in a single run of the algorithm by
emphasizing multiple non-dominated and isolated
solutions in each iteration of the algorithm and without the
use of any scalarization of objectives.
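To make the generating approach concrete, the sketch below applies the weighted-sum and ε-constraint scalarizations to a small illustrative problem that is not taken from the chapter: minimize f_1(x) = x^2 and f_2(x) = (x - 2)^2 over x in [-1, 3]. Each call performs one independent single-objective optimization (here a plain grid search, chosen only to keep the example self-contained), and each parameter value yields one Pareto-optimal solution.

```python
def f1(x):
    return x * x


def f2(x):
    return (x - 2.0) ** 2


# Illustrative search grid on [-1, 3]; a real study would use a proper optimizer.
GRID = [-1.0 + 4.0 * i / 4000 for i in range(4001)]


def weighted_sum_solution(w):
    """Minimize w*f1 + (1 - w)*f2 for a weight w in [0, 1]."""
    return min(GRID, key=lambda x: w * f1(x) + (1.0 - w) * f2(x))


def eps_constraint_solution(eps):
    """Minimize f1 subject to f2(x) <= eps."""
    feasible = [x for x in GRID if f2(x) <= eps]
    return min(feasible, key=f1)


# One independent optimization per parameter value.
for w in (0.1, 0.5, 0.9):
    x = weighted_sum_solution(w)
    print(f"w = {w:.1f}: x = {x:.3f}, f1 = {f1(x):.3f}, f2 = {f2(x):.3f}")

for eps in (0.25, 1.0, 2.25):
    x = eps_constraint_solution(eps)
    print(f"eps = {eps:.2f}: x = {x:.3f}, f1 = {f1(x):.3f}, f2 = {f2(x):.3f}")
```

For this convex toy problem the weighted-sum minimizer is x = 2(1 - w), so sweeping w traces the whole front; on non-convex fronts the weighted-sum approach can miss solutions, which is one reason the ε-constraint form is listed alongside it.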

Consider Fig. 49.3, in which we sketch how multiple
independent parametric single-objective optimizations
(through an a posteriori MCDM method) may find
different Pareto-optimal solutions. It is worth
highlighting here that the Pareto-optimal front corresponds
to the global optimal solutions of several problems,
each formed with a different scalarization of objectives.
During the course of an optimization task, algorithms
must overcome a number of difficulties, such
as infeasible regions, local optimal solutions, flat or
non-improving regions of objective landscapes, isolation
of the optimum, etc., to finally converge to the global
optimal solution. Moreover, due to practical limitations,
an optimization task must also be completed in
a reasonable computational time. All these difficulties
in a problem require that an optimization algorithm
strikes a good balance between exploring new
search directions and exploiting the extent of search
in the currently best search direction. When multiple runs
of an algorithm need to be performed independently
to find a set of Pareto-optimal solutions, the above
balancing act must be performed in every single run.
Since runs are performed independently from one another,
no information about the success or failure of
previous runs is utilized to speed up the overall process.
In difficult multi-objective optimization problems,
such memory-less, a posteriori methods may demand
a large overall computational overhead to find a set
of Pareto-optimal solutions [49.17]. Moreover, even
setting aside the issue of global convergence, independent
runs may not guarantee achieving a good distribution
among the obtained points by an easy variation of
scalarization parameters.

Fig. 49.3 A posteriori MCDM methodology employing
independent single-objective optimizations
(axes f_1 and f_2; the sketch marks local fronts, infeasible regions, and the Pareto-optimal front)

EMO, as was mentioned earlier, constitutes an
inherently parallel search. When a particular population
member overcomes certain difficulties and makes
progress towards the Pareto-optimal front, its variable
values and their combination must reflect this fact.
When a recombination takes place between this solution
and another population member, such valuable
information of variable value combinations is shared
through variable exchanges and blending, thereby making
the overall task of finding multiple trade-off solutions
a task processed in parallel.


49.3 A Brief Timeline for the Development of EMO Methodologies 


During the seventies and eighties, EA researchers re- 
alized the need for solving multi-objective optimiza- 
tion problems in practice and mainly resorted to using 
weighted-sum approaches to convert multiple objec- 
tives into a single goal [49.18, 19]. 

However, the first implementation of a real multi- 
objective evolutionary algorithm (vector-evaluated GA 
(genetic algorithm) or VEGA) was suggested by Schaf- 
fer in 1984 [49.20]. Schaffer modified the simple 
three-operator genetic algorithm [49.2] (with selection, 
crossover, and mutation) by performing independent 
selection cycles according to each objective. The selec- 
tion method is repeated for each individual objective 
to fill up a portion of the mating pool. Then the en- 
tire population is thoroughly shuffled to apply crossover 
and mutation operators. This is performed to achieve 
the mating of individuals of different subpopulation 
groups. The algorithm worked efficiently for some
generations but in some cases suffered from its bias towards
some individuals or regions (mostly individual objective
champions). This does not fulfill the second goal of
EMO (diversity), discussed earlier.

Ironically, no significant study was performed for
almost a decade after the pioneering work of Schaffer,
until a revolutionary 10-line sketch of a new
non-dominated sorting procedure was suggested by Goldberg
in his seminal book on GAs [49.1]. Since an
EA needs a fitness function for reproduction, the trick
was to find a single metric from a number of objective
functions. Goldberg's suggestion was to use
the concept of domination to assign more copies to
non-dominated individuals in a population. Since diversity
is the other concern, he also suggested the
use of a niching strategy [49.21] among solutions of
a non-dominated class. Taking this cue, at least three
independent groups of researchers developed different
versions of multi-objective evolutionary algorithms
during 1993–1994 [49.22–24]. These algorithms differ
in the way a fitness assignment scheme is introduced
to each individual. Independently, Poloni [49.25]
suggested a domination-based EMO approach (he called it
multi-objective genetic algorithm (MOGA)) in which
instead of niching, a toroidal grid-based local selection
method was used to find multiple trade-off solutions.
These early EMO methodologies gave a good head start
to the research and application of EMO, but
suffered from the fact that they did not use an
elite-preservation mechanism in their procedures. Inclusion
of elitists in an EMO provides a monotonically
non-degrading performance [49.26]. The second generation
EMO algorithms implemented an elite-preserving operator
in different ways and gave birth to elitist EMO
procedures, such as the non-dominated sorting GA (NSGA-II)
[49.27], strength Pareto EA (SPEA) [49.28],
Pareto-archived ES (PAES) [49.29], and others. Since these
EMO algorithms are state-of-the-art and commonly
used procedures, we describe one of these algorithms
in detail.


49.4 Elitist EMO: NSGA-II


The NSGA-II procedure [49.27] is one of the popularly
used EMO procedures which attempt to find multiple
Pareto-optimal solutions in a multi-objective optimization
problem and has the following three features:


1. It uses an elitist principle.
2. It uses an explicit diversity preserving mechanism.
3. It emphasizes non-dominated solutions.


At any generation t, the offspring population (say,
Q_t) is first created by using the parent population (say,
P_t) and the usual genetic operators. Thereafter, the two
populations are combined to form a new population
(say, R_t) of size 2N. Then, the population R_t is
classified into different non-dominated classes. Thereafter,
the new population is filled by points of different
non-dominated fronts, one at a time. The filling starts with
the first non-dominated front (of class 1) and continues
with points of the second non-dominated front, and so
on. Since the overall population size of R_t is 2N, not
all fronts can be accommodated in the N slots available
for the new population. All fronts that could not be
accommodated are deleted. When the last allowed front
is being considered, there may exist more points in the
front than the slots remaining in the new population.
This scenario is illustrated in Fig. 49.4. Instead of
arbitrarily discarding some members from the last front, the
points that will make the diversity of the selected points
the highest are chosen.

The crowded-sorting of the points of the last front
which could not be accommodated fully is achieved in
the descending order of their crowding distance values,
and points from the top of the ordered list are chosen.
The crowding distance d_i of point i is a measure of the
objective space around i which is not occupied by any
other solution in the population. Here, we simply
calculate this quantity d_i by estimating the perimeter of the
cuboid (Fig. 49.5) formed by using the nearest neighbors
in the objective space as the vertices (we call this
the crowding distance).
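The survivor selection just described (fill the new population front by front, then break the tie in the last front by crowding distance) can be sketched as follows. This is a simplified illustration rather than the reference NSGA-II code: fronts are peeled off with a naive pair-wise comparison, all objectives are assumed to be minimized, and the crowding distance sums the normalized side lengths of the cuboid per objective, with boundary points given an infinite distance. The function names are illustrative.

```python
import math


def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def sort_into_fronts(points):
    """Peel off non-dominated fronts one at a time (naive version)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


def crowding_distance(points, front):
    """Sum, over objectives, of the normalized gap between each point's neighbors."""
    dist = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf  # keep extremes
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def select_next_population(points, n_slots):
    """Choose n_slots indices from the combined population R_t."""
    chosen = []
    for front in sort_into_fronts(points):
        if len(chosen) + len(front) <= n_slots:
            chosen.extend(front)  # the whole front fits
        else:
            # Last allowed front: prefer the least crowded points.
            dist = crowding_distance(points, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(ranked[: n_slots - len(chosen)])
            break
    return chosen


# Hypothetical combined population of six points, three slots available.
R_t = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0), (0.4, 0.6), (0.8, 0.8), (1.0, 1.0)]
print(sorted(select_next_population(R_t, 3)))
```

In this example the first front holds four points but only three slots exist, so the interior point with the smaller crowding distance, (0.4, 0.6), is the one rejected.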


Fig. 49.4 Schematic of the NSGA-II procedure
(P_t and Q_t are combined into R_t; non-dominated sorting yields the fronts, crowding distance sorting is applied to the last accepted front to fill P_{t+1}, and the remaining points are rejected)


Fig. 49.5 The crowding distance calculation


49.4.1 Sample Results


Here, we show results from several runs of the NSGA-II
algorithm on two test problems. The first problem
(ZDT2, named after Zitzler, Deb, and Thiele) is a
two-objective, 30-variable problem with a concave
Pareto-optimal front

\[
\text{ZDT2:}\quad
\begin{cases}
\text{minimize } f_1(x) = x_1, \\
\text{minimize } f_2(x) = s(x)\left[1 - \left(f_1(x)/s(x)\right)^2\right], \\
\text{where } s(x) = 1 + \frac{9}{29}\sum_{i=2}^{30} x_i^2, \\
0 \le x_1 \le 1, \\
-1 \le x_i \le 1, \quad i = 2, 3, \ldots, 30.
\end{cases}
\tag{49.2}
\]


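The second problem (KUR, named after Kursawe), with three
variables, has a disconnected Pareto-optimal front

\[
\text{KUR:}\quad
\begin{cases}
\text{minimize } f_1(x) = \sum_{i=1}^{2}\left[-10\exp\left(-0.2\sqrt{x_i^2 + x_{i+1}^2}\right)\right], \\
\text{minimize } f_2(x) = \sum_{i=1}^{3}\left[|x_i|^{0.8} + 5\sin\left(x_i^3\right)\right], \\
-5 \le x_i \le 5, \quad i = 1, 2, 3.
\end{cases}
\tag{49.3}
\]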

NSGA-II is run with a population size of 100 and for
250 generations. The variables are used as real numbers
and a simulated binary crossover (SBX) recombination
operator [49.30] with p_c = 0.9, a distribution index
of η_c = 10, and a polynomial mutation operator [49.8]
with p_m = 1/n (n is the number of variables) and a
distribution index of η_m = 20 are used. Figures 49.6
and 49.7 show that NSGA-II converges to the
Pareto-optimal front and maintains a good spread of solutions
in both test problems.
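The two variation operators named above can be sketched in their basic textbook form. This is a simplified illustration under stated assumptions, not the exact implementation used for the reported runs: SBX is applied to every variable once crossover is triggered, the polynomial mutation is the simple (non-boundary-aware) variant, and out-of-bound values are clipped. Function names and the `rng` argument are illustrative.

```python
import random


def sbx_pair(p1, p2, eta_c=10.0, rng=random):
    """Simulated binary crossover on one real variable; returns two children."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2


def polynomial_mutation(x, lower, upper, eta_m=20.0, rng=random):
    """Basic polynomial mutation of one variable, clipped to its bounds."""
    u = rng.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
    return min(max(x + delta * (upper - lower), lower), upper)


def vary(parent1, parent2, bounds, p_c=0.9, eta_c=10.0, eta_m=20.0, rng=random):
    """Create two offspring with the parameter settings quoted in the text."""
    n = len(parent1)
    child1, child2 = list(parent1), list(parent2)
    if rng.random() < p_c:
        for i in range(n):
            child1[i], child2[i] = sbx_pair(parent1[i], parent2[i], eta_c, rng)
    p_m = 1.0 / n  # on average one mutated variable per child
    for child in (child1, child2):
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < p_m:
                child[i] = polynomial_mutation(child[i], lo, hi, eta_m, rng)
            child[i] = min(max(child[i], lo), hi)
    return child1, child2


rng = random.Random(1)
bounds = [(0.0, 1.0)] * 3
print(vary([0.2, 0.4, 0.6], [0.8, 0.1, 0.9], bounds, rng=rng))
```

Larger distribution indices concentrate children near their parents; SBX preserves the parents' mean for each variable before any mutation or clipping.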

There also exist other competent EMOs, such as
the strength Pareto evolutionary algorithm (SPEA)
and its improved version SPEA2 [49.31], the
Pareto-archived evolution strategy (PAES) and its improved
versions, the Pareto-envelope based selection algorithm
(PESA) and PESA2 [49.32], multi-objective
messy GA (MOMGA) [49.33], multi-objective
micro-GA [49.34], neighborhood constraint GA [49.35],
adaptive range MOGA (ARMOGA) [49.36], and others.
Moreover, there exist other EA-based methodologies,
such as particle swarm-based EMO [49.37, 38],
ant-based EMO [49.39, 40], and differential evolution-based
EMO [49.41]. The simulated annealing method has
also been used to find multiple Pareto-optimal solutions for
multi-objective optimization problems [49.42], as has
the tabu search method [49.43].


49.4.2 Constraint Handling in EMO 


The constraint handling method modifies the binary 
tournament selection, where two solutions are picked 
from the population, and the better solution is chosen. 
In the presence of constraints, each solution can be ei- 
ther feasible or infeasible. Thus, there may be at most 
three situations: 


i) Both solutions are feasible. 
ii) One is feasible and the other is not.
iii) Both are infeasible. 


Fig. 49.6 NSGA-II on ZDT2 (f_2 plotted against f_1)


Fig. 49.7 NSGA-II on KUR


We consider each case by simply redefining
the domination principle as follows (we call it the
constrained-domination condition for any two solutions
x^(1) and x^(2)):


Definition 49.2

A solution x^(1) is said to constrained-dominate a
solution x^(2) (or x^(1) ⪯_c x^(2)), if any of the following
conditions are true:


1. Solution x^(1) is feasible and solution x^(2) is not.

2. Solutions x^(1) and x^(2) are both infeasible, but
solution x^(1) has a smaller constraint violation, which
can be computed by adding the normalized violation
of all constraints

\[
\mathrm{CV}(x) = \sum_{j=1}^{J} \max\left(0, -\bar{g}_j(x)\right) + \sum_{k=1}^{K} \mathrm{abs}\left(\bar{h}_k(x)\right).
\]

The normalization is achieved with the population
minimum ((g_j)_min) and maximum ((g_j)_max)
constraint violations

\[
\bar{g}_j(x) = \frac{g_j(x) - (g_j)_{\min}}{(g_j)_{\max} - (g_j)_{\min}}.
\]

3. Solutions x^(1) and x^(2) are feasible and solution x^(1)
dominates solution x^(2) in the usual sense
(Definition 49.1).
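The three cases of the constrained-domination condition translate into a short comparison routine. The sketch below is illustrative only: it assumes all objectives are minimized, inequality constraints are written as g_j(x) >= 0, and the constraint values are used as given, without the population-based normalization of Definition 49.2. The function names are hypothetical.

```python
def dominates(z1, z2):
    return (all(a <= b for a, b in zip(z1, z2))
            and any(a < b for a, b in zip(z1, z2)))


def constraint_violation(g_values, h_values=()):
    """Total violation of g_j(x) >= 0 and h_k(x) = 0 (0.0 means feasible)."""
    return (sum(max(0.0, -g) for g in g_values)
            + sum(abs(h) for h in h_values))


def constrained_dominates(z1, cv1, z2, cv2):
    """True if solution 1 constrained-dominates solution 2."""
    if cv1 == 0.0 and cv2 > 0.0:      # case 1: feasible beats infeasible
        return True
    if cv1 > 0.0 and cv2 > 0.0:       # case 2: smaller violation wins
        return cv1 < cv2
    if cv1 == 0.0 and cv2 == 0.0:     # case 3: usual domination
        return dominates(z1, z2)
    return False                      # solution 1 infeasible, solution 2 feasible


# Hypothetical objective vectors and constraint values for three solutions.
a = ((3.0, 3.0), constraint_violation([0.5]))    # feasible
b = ((1.0, 1.0), constraint_violation([-0.2]))   # infeasible, better objectives
c = ((1.0, 1.0), constraint_violation([-0.7]))   # more infeasible

print(constrained_dominates(a[0], a[1], b[0], b[1]))  # True
print(constrained_dominates(b[0], b[1], c[0], c[1]))  # True
print(constrained_dominates(b[0], b[1], a[0], a[1]))  # False
```

Because this routine has the same signature role as the ordinary domination test, swapping it into the tournament selection and the non-dominated sorting is essentially the only change a constrained run needs.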


The above change in the definition requires a minimal
change in the NSGA-II procedure described earlier.
Figure 49.8 shows the non-dominated fronts on a
six-member population due to the introduction of two
constraints (the minimization problem is described as
CONSTR elsewhere [49.8]). In the absence of the
constraints, the non-dominated fronts (shown by dashed
lines) would have been ((1,3,5), (2,6), (4)),
but in their presence, the new fronts are ((4,5),
(6), (2), (1), (3)). The first non-dominated
front consists of the best (that is, non-dominated and
feasible) points from the population and any feasible
point lies on a better non-dominated front than an
infeasible point.


Fig. 49.8 Non-constrained-domination fronts


49.5 Applications of EMO


Since the early development of EMO algorithms in
1993, they have been applied to many challenging
real-world optimization problems. Descriptions of
some of these studies can be found in books [49.8,
44-47], dedicated conference proceedings [49.48–53],
and domain-specific books, journals, and proceedings.
A repository of most research and application
papers of EMO is available [49.54]. In this section,
we describe one case study that clearly demonstrates
the EMO philosophy which we described in
Sect. 49.2.1.


49.5.1 Spacecraft Trajectory Design


Coverstone-Carroll et al. [49.55] proposed a
multi-objective optimization technique using the original
non-dominated sorting algorithm (NSGA) [49.24] to find
multiple trade-off solutions in a spacecraft trajectory
optimization problem. To evaluate a solution (trajectory),
the SEPTOP (solar electric propulsion trajectory
optimization) software [49.56] is called, and the
delivered payload mass and the total time of flight are
calculated. The multi-objective optimization problem
has eight decision variables controlling the trajectory
and three objective functions:


i) Maximize the delivered payload at destination.

ii) Maximize the negative of the time of flight.

iii) Maximize the total number of heliocentric revolutions
in the trajectory.


The problem also has three constraints limiting
the SEPTOP convergence error and minimum and
maximum bounds on heliocentric revolutions.


On the Earth–Mars rendezvous mission, the study
found interesting trade-off solutions [49.55]. Using
a population of size 150, the NSGA was run for 30
generations. The non-dominated solutions obtained are
shown in Fig. 49.9 for two of the three objectives, and
some selected solutions are shown in Fig. 49.10. It is
clear that there exist short-time flights with smaller
delivered payloads (solution marked 44 with 1.12 years
of flight and delivering 685.28 kg load) and long-time
flights with larger delivered payloads (solution marked
36 with close to 3.5 years of flight and delivering about
900 kg load). While solution 44 can deliver a mass
of 685.28 kg and requires about 1.12 years, solution
72 can deliver almost 862 kg with a travel time of
about 3 years. In these figures, each continuous part
of a trajectory represents a thrusting arc and each
dashed part of a trajectory represents a coasting arc.
It is interesting to note that only a small improvement
in delivered mass occurs in the solutions between
73 and 72 with a sacrifice in flight time of about
1 year.

The multiplicity in trade-off solutions, as depicted
in Fig. 49.10, is what we envisage discovering
in a multi-objective optimization problem by using
an a posteriori procedure, such as a generating method
or an EMO procedure, vis-a-vis an a priori approach
in which a single scalarized problem is solved
with a single preferred parameter setting to find a single
Pareto-optimal solution. This aspect is also shown
in Fig. 49.2. Once a set of solutions with a good trade-off
among objectives is obtained, one can analyze them
to choose a particular solution. For example, in this
problem context, it makes sense not to choose a solution
between points 73 and 72 due to the poor trade-off
between the objectives in this range, a matter which
is only revealed after a representative set of trade-off
solutions is found. On the other hand, choosing a solution
between points 44 and 73 is worthwhile, but which
particular solution to choose depends on other
mission-related issues. However, by first finding a wide range
of possible solutions, thereby revealing the shape of the
front in a computationally quicker manner, EMO can
help a decision-maker in narrowing down the choices
and in allowing a better decision to be made. Without
the knowledge of such a wide variety of trade-off
solutions, proper decision-making may be a difficult
task. With the use of an a priori approach to find
a single solution using, for example, the ε-constraint
method with a particular ε vector, the decision-maker
will always wonder what solution would have been
derived if a different ε vector had been chosen. For
example, if ε_1 = 2.5 years is chosen and the mass
delivered to the target is maximized, a solution in between
points 73 and 72 will be found. As discussed earlier,
this part of the Pareto-optimal front does not provide
the best trade-offs between the objectives that this
problem can offer. A lack of knowledge of good trade-off
regions before a decision is made may lead the
decision-maker to settle for a solution which, although
optimal, may not be a good compromise solution. The
EMO procedure offers a flexible and pragmatic way
of finding a well-diversified set of solutions
simultaneously so as to enable picking a particular region
for further analysis or a particular solution for
implementation.


Fig. 49.9 Non-dominated solutions obtained using NSGA
(mass delivered to target in kg versus transfer time in years)


Fig. 49.10 Four trade-off trajectories
(after [49.55])


49.6 Recent Developments in EMO 


An interesting aspect regarding research and application
of EMO is that soon after a number of efficient
EMO methodologies had been suggested and applied
in various interesting problem areas, researchers wasted
no time in looking for opportunities to make
the field broader and more useful by diversifying EMO
applications to various other problem-solving tasks. In
this section, we describe a number of such salient recent
developments of EMO.


49.6.1 Hybrid EMO Algorithms 


The search operators used in EMO are heuristic based. 
Thus, these methodologies are not guaranteed to find 
Pareto-optimal solutions with a finite number of so- 
lution evaluations in an arbitrary problem. In single- 
objective EA research, hybridization of EAs is common 
for ensuring convergence to an optimal solution; it is not 
surprising that studies on developing hybrid EMOs are 
now being pursued to ensure that true Pareto-optimal 
solutions are found by hybridizing them with mathe- 
matically convergent ideas. 


EMO methodologies provide adequate emphasis 
on currently non-dominated and isolated solutions so 
that population members progress towards the Pareto- 
optimal front iteratively. To make the overall procedure 
faster and to perform the task with a more theo- 
retical emphasis, EMO methodologies are combined 
with mathematical optimization techniques having lo- 
cal convergence properties. A simple-minded approach 
would be to start the process with an EMO and the 
solutions obtained from EMO could be improved by 
optimizing a composite objective derived from multi- 
ple objectives to ensure a good spread by using a local 
search technique [49.57]. Another approach would be 
to use a local search technique as a mutation-like op- 
erator in an EMO, so that all population members are 
at least guaranteed to be local optimal solutions [49.57, 
58]. To save computational time, instead of performing
the local search for every solution in a generation, the
local-search-based mutation can be performed only once
every few generations. Some recent studies [49.58-60]
have demonstrated the usefulness of such hybrid EMOs
for a guaranteed convergence.
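The first, simple-minded hybrid described above (run an EMO, then polish its solutions with a local search on a composite objective) can be sketched as below. Everything specific here is an assumption made for illustration: the bi-objective test function, the use of a weighted sum as the composite objective, and the derivative-free compass search standing in for a local search with convergence properties.

```python
def objectives(x):
    """Illustrative bi-objective problem; its Pareto set is x[1] = 0, 0 <= x[0] <= 1."""
    f1 = x[0] ** 2 + x[1] ** 2
    f2 = (x[0] - 1.0) ** 2 + x[1] ** 2
    return f1, f2


def composite(x, weights):
    """Weighted-sum composite objective built from the multiple objectives."""
    return sum(w * f for w, f in zip(weights, objectives(x)))


def compass_search(func, x0, step=0.1, tol=1e-6):
    """Try +/- step along each coordinate; halve the step when nothing improves."""
    x, fx = list(x0), func(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = list(x)
                trial[i] += d
                ft = func(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5
    return x


# Pretend these near-optimal points came out of an EMO run, each paired with
# the weight vector used to polish it.
emo_output = [([0.4, 0.3], (0.5, 0.5)), ([0.9, -0.2], (0.2, 0.8))]
for start, w in emo_output:
    polished = compass_search(lambda x: composite(x, w), start)
    print([round(v, 4) for v in polished], objectives(polished))
```

Assigning a different weight vector to each EMO solution is one simple way to keep the polished points spread out; the studies cited in the text use more refined composite objectives.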


Multi-Objective Evolutionary Algorithms | 49.6 Recent Developments in EMO 1005 


Although these studies concentrated on ensuring
convergence to the Pareto-optimal front, some emphasis
should now be placed on providing adequate diversity
among the solutions obtained, particularly when a
continuous Pareto-optimal front is represented by a finite
set of points. Some ideas of maximizing the hypervolume
measure [49.61] or maintaining a uniform
distance between points have been proposed for this
purpose, but how such diversity-maintenance techniques
could be integrated with convergence-ensuring principles in
a synergistic way would be interesting and useful future
research. Some relevant studies in this direction already
exist [49.59, 62–65].


49.6.2 Multi-Objectivization 


Interestingly, the act of finding multiple trade-off so- 
lutions using an EMO procedure has found its ap- 
plication outside the realm of solving multi-objective 
optimization problems. The concept of finding near- 
optimal trade-off solutions is applied to solve other 
kinds of optimization problems as well. For example, 
the EMO concept is used to solve constrained single- 
objective optimization problems by converting the task 
into a two-objective optimization task of additionally 
minimizing an aggregate constraint violation [49.66]. 
This eliminates the need to specify a penalty param- 
eter while using a penalty-based constraint handling 
procedure. If viewed this way, the usual penalty func- 
tion approach used in classical optimization studies is 
a special weighted-sum approach to the bi-objective 
optimization problem of minimizing the objective func- 
tion and minimizing the constraint violation, for which 
the weight vector is a function of the penalty parameter. 
A well-known difficulty in genetic programming stud- 
ies, called bloating, arises due to the continual increase 
in the size of genetic programs evolved with iteration. 
The reduction of bloating by minimizing the size of 
a program as an additional objective has helped find 
high-performing solutions with a smaller size of the 
code [49.67, 68]. In clustering algorithms, minimizing 
the intra-cluster distance and maximizing inter-cluster 
distance simultaneously in a bi-objective formulation 
of a clustering problem is found to yield better solu- 
tions than the usual single-objective minimization of the 
ratio of the intra-cluster distance to the inter-cluster dis- 
tance [49.69]. An EMO is found to solve a minimum 
spanning tree problem better than a single-objective 
EA [49.70]. A recently edited book [49.71] describes
many interesting applications in which EMO
methodologies have helped to solve problems that are
otherwise (or traditionally) not treated as multi-objective
optimization problems.
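The remark above that a penalty function is a special weighted-sum scalarization can be made explicit. As a sketch, write f(x) for the single objective, CV(x) for the aggregate constraint violation, and R > 0 for the penalty parameter (the symbols P, w_1, and w_2 are introduced here for illustration):

```latex
\begin{align*}
P(x; R) &= f(x) + R\,\mathrm{CV}(x), \qquad R > 0, \\
w_1 &= \frac{1}{1+R}, \qquad w_2 = \frac{R}{1+R}, \qquad w_1 + w_2 = 1, \\
w_1 f(x) + w_2\,\mathrm{CV}(x) &= \frac{1}{1+R}\,P(x; R).
\end{align*}
```

Since 1/(1+R) is a positive constant, minimizing the penalized function P(x; R) and minimizing the weighted sum of the two objectives (f, CV) give the same minimizer, so each choice of R picks out one weight vector. A bi-objective EMO run on (f, CV) avoids committing to any single R.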


49.6.3 Uncertainty-Based EMO 


A major surge in EMO research has taken place in 
handling uncertainties among decision variables and 
problem parameters in multi-objective optimization. 
Practice is full of uncertainties and almost no parameter, 
dimension, or property can be guaranteed to be fixed at 
the value it is aimed at. In such scenarios, evaluation of 
a solution is not precise, and the resulting objective and 
constraint function values become probabilistic
quantities. Optimization algorithms are usually designed to
handle such stochasticities either by using crude methods,
such as Monte Carlo simulation of stochasticities in
uncertain variables and parameters, or by sophisticated
stochastic programming methods involving nested
optimization techniques [49.72]. When these effects are
taken care of during the optimization process, the re- 
sulting solution is usually different from the optimum 
solution of the problem and is known as a robust so- 
lution. Such an optimization procedure will then find 
a solution which may not be the true global optimum 
solution, but one which is less sensitive to uncertain- 
ties in decision variables and problem parameters. In 
the context of multi-objective optimization, a consider- 
ation of uncertainties for multiple objective functions 
will result in a robust frontier which may be different 
from the globally Pareto-optimal front. Each and every 
point on the robust frontier is then guaranteed to be less 
sensitive to uncertainties in decision variables and prob- 
lem parameters. Some such studies in EMO are [49.73, 
74]. 
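
The difference between a nominal optimum and a robust solution can be illustrated with a small sketch. It assumes a hypothetical one-variable objective `f` with a deep but narrow valley and a shallower but broad one, and it replaces `f` by its average over sampled perturbations of the decision variable (a crude Monte Carlo treatment). The function, the perturbation range, and the grid search are illustrative assumptions and are not taken from the cited studies.

```python
import math
import random


def f(x):
    """Hypothetical objective (minimized): a deep, narrow valley near
    x = 0.2 and a shallower, broad valley near x = 0.7."""
    narrow = 1.0 * math.exp(-((x - 0.2) / 0.02) ** 2)
    broad = 0.8 * math.exp(-((x - 0.7) / 0.2) ** 2)
    return -(narrow + broad)


def mean_effective(func, x, deltas):
    """Average of func over perturbed copies x + delta (Monte Carlo)."""
    return sum(func(x + d) for d in deltas) / len(deltas)


def argmin_on_grid(func, grid):
    """Grid point with the smallest function value."""
    return min(grid, key=func)


rng = random.Random(0)
# Assumed uncertainty: the variable may deviate uniformly by up to 0.1.
# The same samples are reused for every x (common random numbers).
deltas = [rng.uniform(-0.1, 0.1) for _ in range(500)]
grid = [i / 200 for i in range(201)]

nominal_x = argmin_on_grid(f, grid)
robust_x = argmin_on_grid(lambda x: mean_effective(f, x, deltas), grid)
print("nominal optimum:", nominal_x, "robust solution:", robust_x)
```

In this toy setting the nominal optimum sits in the narrow valley, whereas the averaged objective prefers the broad valley, which is less sensitive to perturbations.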

When the evaluation of constraints under uncertain- 
ties in decision variables and problem parameters is 
considered, deterministic constraints become stochastic 
(they are also known as chance constraints) and involve 
a reliability index (R) to handle the constraints. A constraint
g(x) ≥ 0 then becomes Prob(g(x) ≥ 0) ≥ R. In
order to find the left-hand side of the above chance con- 
straint, a separate optimization methodology [49.75] 
is needed, thereby making the overall algorithm a bi- 
level optimization procedure. Approximate single-loop 
algorithms exist [49.76] and recently one such method- 
ology was integrated with an EMO [49.72] and shown 
to find a reliable frontier corresponding to a specified
reliability index, instead of the Pareto-optimal frontier,
in problems having uncertainty in decision variables 
and problem parameters. More such methodologies are 
needed, as uncertainties are an integral part of practical 


problem-solving, and multi-objective optimization re- 
searchers must look for better and faster algorithms to 
handle them. 
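
As a rough illustration of a chance constraint, the sketch below estimates the probability that a constraint holds by plain Monte Carlo sampling and compares it with a reliability index R. This is the crude sampling route, not the dedicated optimization-based reliability computation of the cited studies; the constraint `g`, the Gaussian noise model, and all parameter values are assumptions made for the example.

```python
import random


def g(x):
    """Hypothetical deterministic constraint, feasible when g(x) >= 0."""
    return 1.0 - x


def estimated_reliability(constraint, x, sigma, n_samples, rng):
    """Monte Carlo estimate of the probability that the constraint holds
    when x is disturbed by Gaussian noise of standard deviation sigma."""
    hits = sum(
        1 for _ in range(n_samples)
        if constraint(x + rng.gauss(0.0, sigma)) >= 0.0
    )
    return hits / n_samples


def is_reliable(constraint, x, R, sigma=0.1, n_samples=5000, seed=0):
    """Check the chance constraint: estimated probability >= R."""
    rng = random.Random(seed)
    return estimated_reliability(constraint, x, sigma, n_samples, rng) >= R


print(is_reliable(g, 0.5, R=0.99))   # far from the constraint boundary
print(is_reliable(g, 0.95, R=0.99))  # close to the constraint boundary
```

A solution that is feasible in the deterministic sense (x = 0.95 here) can still fail the chance constraint, which is why a reliable frontier may differ from the Pareto-optimal frontier.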


49.6.4 EMO and Decision-Making 


Searching for a set of Pareto-optimal solutions by us- 
ing an EMO fulfills only one aspect of multi-objective 
optimization, as choosing a particular solution for an 
implementation is the remaining decision-making task, 
which is equally important. For many years, EMO re- 
searchers have postponed the decision-making aspect 
and concentrated on developing efficient algorithms for 
finding multiple trade-off solutions. With that part
now reasonably well developed, EMO researchers have in
the past few years turned to designing combined
algorithms for optimization and decision-making.
In the view of the author, the decision-making task can
be approached from two main considerations in an EMO
framework:


1. Generic consideration: there are some aspects that 
most practical users would like to use in narrowing 
down their choice. Above we discussed the im- 
portance of finding robust and reliable solutions in 
the presence of uncertainties in decision variables 
and/or problem parameters. In such scenarios, an 
EMO methodology can straightaway find a robust 
or a reliable frontier [49.72,73] and no subjective 
preference from any decision maker may be nec- 
essary. Similarly, if a problem resorts to a Pareto- 
optimal front having knee points, such points are 
often the choice of decision-makers. Knee points
demand a large sacrifice in at least one objective
to achieve a small gain in another, which discourages
moving away from a knee point [49.77].
Other such generic choices are related to Pareto- 
optimal points depicting a certain pre-specified rela- 
tionship between objectives, Pareto-optimal points 
having multiplicity (say, at least two or more so- 
lutions in the decision variable space mapping to 
identical objective values), Pareto-optimal solutions 
which do not lie close to variable boundaries, 
Pareto-optimal points having certain mathematical
properties, such as all Lagrange multipliers having
more or less identical magnitudes (a condition
often desired to give equal importance to all
constraints), and others. These considerations are
motivated from the fundamental and practical as- 
pects of optimization and may be applied to most 
multi-objective problem-solving tasks, without any 


consent of a decision-maker. These considerations 
may narrow down the set of non-dominated points. 
A further subjective consideration (which is dis- 
cussed below) may then be used to pick a preferred 
solution. 

2. Subjective consideration: in this category, any 
problem-specific information can be used to nar- 
row down the choices, and the process may 
even lead to a single preferred solution at the 
end. Most decision-making procedures use some 
preference information (utility functions, refer- 
ence points [49.78], reference directions [49.79], 
marginal rate of return, and a host of other consid- 
erations [49.9]) to select a subset of Pareto-optimal 
solutions. A recent book [49.80] is dedicated to 
the discussion of many such multi-criteria decision 
analysis (MCDA) tools and collaborative sugges- 
tions of using EMO with such MCDA tools. Some 
hybrid EMO and MCDA algorithms have been suggested
in the recent past [49.81–85].


Many other generic and subjective considerations 
are needed, and it is interesting that EMO and 
MCDM researchers are collaborating on developing 
such complete algorithms for multi-objective optimiza- 
tion [49.80]. 


49.6.5 EMO for Handling a Large Number
of Objectives: Many-Objective EMO


Initial studies of EMO amply showed that EMO algo- 
rithms can be used to find a wide spread of trade-off 
solutions on two- and three-objective optimization problems.
However, their performance on problems with four or more
objectives has not been studied enough. Recently,
such studies have become important and are
known as many-objective optimization studies in the
EMO literature.

A detailed study [49.86] made on eight-objective 
problems revealed somewhat negative results about 
the existing EMO methodologies. However, in his 
book [49.8] and recent other studies [49.87–90] the
author has clearly explained the reasons for this behav- 
ior of EMO algorithms. EMO methodologies work by 
emphasizing non-dominated solutions in a population. 
Unfortunately, as the number of objectives increases, 
most population members in a randomly created population
tend to become non-dominated with respect to each other.
For example, in a three-objective scenario, about 10%
of the members in a population of size 200 are
non-dominated, whereas in a 10-objective problem sce- 




nario, as much as 90% of the members in a population 
of size 200 are non-dominated. Thus, in a
many-objective problem, an EMO algorithm runs out of room
to introduce new population members into a genera- 
tion, thereby causing a stagnation in the performance 
of an EMO algorithm. It has been argued that to make 
EMO procedures efficient, an exponentially large popu- 
lation size (with respect to the number of objectives) is 
needed. This makes the EMO procedure slow and com- 
putationally less attractive. 
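
The growth of the non-dominated fraction with the number of objectives can be checked with a short experiment. The sketch assumes that objective vectors are drawn uniformly at random and that all objectives are minimized; the exact percentages depend on this assumption and on the random seed, so the numbers quoted in the text should be read as indicative.

```python
import random


def dominates(a, b):
    """True if a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated_fraction(n_objectives, pop_size=200, seed=0):
    """Fraction of a random population that no other member dominates."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_objectives)]
           for _ in range(pop_size)]
    count = sum(1 for p in pop if not any(dominates(q, p) for q in pop))
    return count / pop_size


for m in (2, 3, 5, 10):
    print(m, "objectives:", nondominated_fraction(m))
```

With most members already non-dominated, a selection scheme based on domination alone has little basis for preferring one member over another.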

However, recent techniques that use a fixed set of reference
points [49.91–93] or reference directions [49.94]
are promising, as they have been shown to find a widely
distributed set of solutions in 3- to 15-objective test and
real-world problems.

Practically speaking, however, even if an algorithm
can find tens of thousands of Pareto-optimal
solutions for a multi-objective optimization problem,
such a set gives an idea of the nature and shape
of the front but is simply too large to be
useful for any decision-making purpose. Keeping
these views in mind, EMO researchers have taken two
different approaches in dealing with many-objective
problems.


Finding a Partial Set 
Instead of finding the complete Pareto-optimal front 
in a problem having many objectives, EMO proce- 
dures can be used to find only a preferred part of the 
Pareto-optimal front. This can be achieved by indicating 
preference information by various means. Ideas such 
as reference point-based EMO [49.81, 85], light beam 
search [49.82], biased sharing approaches [49.95], 
cone-dominance [49.96], etc. have been suggested for 
this purpose. Each of these studies has shown that, for
problems with up to 10 or 20 objectives, finding the
complete frontier is difficult, whereas finding a partial
frontier corresponding to certain preference information is
not that difficult a proposition.

A parallel or distributed computing platform
can be combined with the above idea, and the complete
Pareto-optimal front can be obtained by a distributed
computing procedure [49.96]. In that study, each processor
in a distributed computing environment receives
a unique cone defining domination. The cones are de- 
signed carefully so that at the end of such a distributed 
computing EMO procedure, solutions are found to exist 
in various parts of the complete Pareto-optimal front. 
A collection of these solutions is then able to provide 
a good representation of the entire original Pareto- 
optimal front. 


Identifying and Eliminating Redundant Objectives
Many practical optimization problems can easily list
a large number of objectives (often more than 10),
as many different criteria or goals are often of interest
to practitioners. In most instances, it is not entirely
certain whether or not the chosen objectives are all in
conflict with each other. For example, the minimization of
weight and the minimization of cost of a component 
or a system are often mistaken to have an identical 
optimal solution, but may lead to a range of trade-off 
optimal solutions. Practitioners do not take any chances 
and tend to include all (or as many as possible) objec- 
tives into the optimization problem formulation. There 
is another fact which is more worrisome. Two appar- 
ently conflicting objectives may show a good trade-off 
when evaluated with respect to some randomly created 
solutions. However, if these two objectives are evalu- 
ated for solutions close to their optima, they tend to 
show a good correlation. That is, although objectives 
can exhibit conflicting behavior for random solutions, 
near their Pareto-optimal front, the conflict vanishes 
and the optimum of one becomes close to the optimum 
of the other. 

With such problems in mind, certain researchers
[49.90, 97, 98] applied linear and non-linear principal
component analysis (PCA) to a set of EMO-produced
solutions. Objectives showing a positively correlated
relationship across the obtained NSGA-II solutions were
identified and declared redundant. The EMO procedure
is then restarted with the non-redundant objectives.
This combined EMO-PCA procedure is continued
until no further reduction in the number of objectives
is possible. The procedure has handled practical
problems involving five and more objectives and
has been shown to reduce the set of truly conflicting
objectives to a few. On test problems, the proposed
approach has been shown to reduce an initial 50- 
objective problem to the correct three-objective Pareto- 
optimal front by eliminating 47 redundant objectives. 
Another study [49.99] used an exact and a heuristic- 
based conflict identification approach on a given set 
of Pareto-optimal solutions. For a given error mea- 
sure, an effort is made to identify a minimal subset of 
objectives that does not alter the original dominance 
structure on a set of Pareto-optimal solutions. This 
idea was recently introduced within an EMO [49.100], 
but a continual reduction of objectives through a suc- 
cessive application of the above procedure would be 
interesting. 
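
The idea of flagging positively correlated objectives as redundant can be sketched with pairwise correlations alone. This is a deliberate simplification and not the PCA-based procedure of the cited studies: the function names, the threshold of 0.95, and the greedy keep-the-first rule are all assumptions made for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equally long sequences (0 if constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (su * sv)


def split_objectives(F, threshold=0.95):
    """F is a list of objective vectors of obtained solutions.
    Returns (kept, redundant) lists of objective indices."""
    m = len(F[0])
    columns = [[row[j] for row in F] for j in range(m)]
    kept, redundant = [], []
    for j in range(m):
        # An objective strongly positively correlated with one that is
        # already kept carries no additional conflict.
        if any(pearson(columns[j], columns[k]) >= threshold for k in kept):
            redundant.append(j)
        else:
            kept.append(j)
    return kept, redundant


# Toy data: f2 rises with f1 (redundant), f3 falls as f1 rises (conflict).
xs = [i / 10 for i in range(11)]
F = [(x, 2 * x + 1, 1 - x) for x in xs]
print(split_objectives(F))
```

After such a reduction, the optimization would be restarted with the kept objectives only, and the test repeated on the new solution set.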




This is a promising area of EMO research and
computationally faster objective-reduction techniques
are definitely needed for the purpose. A recent
approach uses previously-fixed multiple directional 
searches to find a widely distributed set of Pareto- 
optimal points [49.94]. In this direction, the use of 
alternative definitions of domination may be beneficial. 
One such idea redefines domination:
a solution is said to dominate another solution if the
former is better than the latter in more
objectives. This certainly excludes finding the entire
Pareto-optimal front and helps an EMO to converge 
near the intermediate and central part of the Pareto- 
optimal front. Another EMO study used a fuzzy domi- 
nance [49.101] relation (instead of Pareto-dominance), 
in which superiority of one solution over another in any 
objective is defined in a fuzzy manner. Many other such 
definitions are possible and can be implemented based 
on the problem context. 
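
A minimal sketch of the count-based definition follows, placed next to the usual Pareto-dominance for comparison. It reads "better in more objectives" as: the number of objectives in which the first solution is better exceeds the number in which it is worse. That reading, and the assumption that all objectives are minimized, are interpretations made for this example.

```python
def pareto_dominates(a, b):
    """Usual Pareto-dominance for minimization."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def count_dominates(a, b):
    """a dominates b if a is better in more objectives than it is worse."""
    better = sum(1 for x, y in zip(a, b) if x < y)
    worse = sum(1 for x, y in zip(a, b) if x > y)
    return better > worse


a = (1.0, 1.0, 5.0)
b = (2.0, 2.0, 1.0)
print(pareto_dominates(a, b), count_dominates(a, b))
```

Here `a` and `b` are mutually non-dominated in the Pareto sense, yet the count-based relation prefers `a`. Relations of this kind order more pairs of solutions, which restores selection pressure but gives up the extreme parts of the front.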


49.6.6 Knowledge Extraction Through EMO 


One striking difference between single-objective opti- 
mization and multi-objective optimization is the car- 
dinality of the solution set. In the latter, multiple 
solutions are the outcome and each solution is theoret- 
ically an optimal solution corresponding to a particu- 
lar trade-off among the objectives. Thus, if an EMO 
procedure can find solutions close to the true Pareto- 
optimal set, what we have in our hands is a number 
of high-performing solutions trading-off the conflicting 
objectives considered in the study. Since these solu- 
tions are all near optimal, they can be analyzed for 
finding properties which are common to them. Such 
a procedure can then become a systematic approach in 
deciphering the important and hidden properties that 
optimal and high-performing solutions must have for 
that problem. In a number of practical problem-solving
tasks, the so-called innovization procedure is shown to
find important knowledge about high-performing solutions
[49.102]. Such useful properties are expected
to exist in practical problems, as they follow certain 
scientific and engineering principles at the core, but 
in the past not much attention had been paid to find- 
ing them through a systematic scientific procedure. The 
principle of first searching for multiple trade-off and 
high-performing solutions using a multi-objective opti- 
mization procedure and then analyzing them to discover 
useful knowledge certainly remains a viable way for- 
ward. The current efforts [49.103, 104] to automate the 
knowledge extraction procedure through a sophisticated 


data-mining task should make the overall approach 
more appealing and useful in practice. 


49.6.7 Dynamic EMO 


Dynamic optimization involves objectives, constraints, 
or problem parameters that change over time. This 
means that as an algorithm approaches the optimum 
of the current problem, the problem definition changes 
and now the algorithm must solve a new problem. 
This is not equivalent to another optimization task in 
which a new and different optimization problem must 
be solved afresh. In such dynamic optimization
problems, an algorithm is usually not expected to find
the optimum; instead, it is at best expected to track the
optimum as it changes with the iteration. The performance of
a dynamic optimizer then depends on how closely it is
able to track the true optimum (which changes with
iteration or time). Thus, practically speaking, it may be
hoped that optimization algorithms can handle problems
that do not change significantly with time. With
respect to the algorithm, since the problem is not
expected to change too much from one time instance to
another and some good solutions to the current problem
are already at hand in a population, researchers have
favored solving such dynamic optimization problems using
evolutionary algorithms [49.105].

A recent study [49.106] proposed the following procedure
for dynamic optimization involving single or
multiple objectives. Let P(t) be a problem that changes
with time t (from t = 0 to t = T). Despite the continual
change in the problem, we assume that the problem
is fixed for a time period τ, which is not known a priori,
and the aim of the (offline) dynamic optimization
study is to identify a suitable value of τ for an accurate
as well as a computationally faster approach. For this
purpose, an optimization algorithm with τ as a fixed
time period is run from t = 0 to t = T with the problem
assumed fixed for every τ time period. A measure
Γ(τ) determines the performance of the algorithm and
is compared with a pre-specified and expected value
Γ_L. If Γ(τ) ≥ Γ_L for the entire time domain of the
execution of the procedure, we declare τ to be a
permissible length of stasis. Then, we try a reduced
value of τ and check if a smaller length of stasis is
also acceptable. If not, we increase τ to allow the
optimization problem to remain static for a longer time
so that the chosen algorithm can have more iterations
(time) to perform better. Such a procedure will
eventually come up with a time period τ*, which would
be the smallest time of stasis allowed for the optimization
algorithm to work based on the chosen performance
requirement. Based on this study, a number of test problems
and a hydro-thermal power dispatch problem were
tackled recently [49.106].
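
The search for the smallest permissible length of stasis can be sketched as a scan over candidate values. In the sketch, `performance` stands for one complete offline run of the optimizer over the whole time horizon with the problem frozen for `tau` time units at a stretch, returning the performance measure of that run. The toy performance curve, the candidate list, and the threshold are hypothetical placeholders, not values from the cited study.

```python
import math


def smallest_stasis_length(performance, candidates, threshold):
    """Return the smallest candidate stasis length whose measured
    performance meets the threshold, or None if none does."""
    for tau in sorted(candidates):
        if performance(tau) >= threshold:
            return tau
    return None


def toy_performance(tau):
    """Hypothetical stand-in for an offline run: a longer stasis gives
    the optimizer more iterations, so tracking quality saturates at 1."""
    return 1.0 - math.exp(-tau / 10.0)


tau_star = smallest_stasis_length(
    toy_performance, candidates=[5, 10, 15, 20, 25, 30], threshold=0.9
)
print("smallest permissible stasis length:", tau_star)
```

In a real study each call to `performance` is expensive, so a bisection over the candidate values may be preferable when performance can be assumed to improve monotonically with the stasis length.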

In the case of dynamic multi-objective problem- 
solving tasks, there is an additional difficulty which is 
worth mentioning here. Not only does an EMO algo- 
rithm need to find or track the changing Pareto-optimal 
fronts, in a real-world implementation, it must also 
make an immediate decision about which solution to 
implement from the current front before the problem 
changes to a new one. Decision-making analysis is con- 
sidered to be time-consuming, involving execution of 
analysis tools, higher-level considerations, and some- 
times group discussions. If dynamic EMO is to be 
applied in practice, automated procedures for making 
decisions must be developed. Although it is not clear 
how to generalize such an automated decision-making 
procedure in different problems, problem-specific tools 
are certainly possible and a worthwhile and fertile area 
for research. 


49.6.8 Quality Estimates for EMO 


When algorithms are developed and test problems with
known Pareto-optimal fronts are available [49.107–110],
an important task is to have performance measures
with which the EMO algorithms can be evaluated.
Thus, a major focus of EMO research has been the
development of different performance measures. Since the
focus in an EMO task is multi-faceted (convergence to
the Pareto-optimal front and diversity of solutions along
the entire front), it is to be expected that a single performance
measure for evaluating EMO algorithms will be unsatisfactory.
In the early years of EMO research, three different
sets of performance measures were used:


1. Metrics evaluating convergence to the known 
Pareto-optimal front (such as error ratio, distance 
from reference set, etc.) 

2. Metrics evaluating spread of solutions on the known 
Pareto-optimal front (such as spread, spacing, etc.). 

3. Metrics evaluating certain combinations of conver- 
gence and spread of solutions (such as hypervol- 
ume, coverage, R-metric, etc.). 


Some of these metrics are described in texts [49.8, 
44]. A detailed study [49.111] comparing most ex- 
isting performance metrics based on out-performance 
relations recommended the use of the S-metric (or 
the hypervolume metric) and the R-metric suggested 


by [49.112]. A recent study argued that a single unary 
performance measure or any finite combination of them 
(for example, any of the first two metrics described 
above in the enumerated list or both together) can- 
not adequately determine whether one set is better 
than another [49.113]. That study also concluded that 
binary performance metrics (indicating usually two dif- 
ferent values when a set of solutions A is compared 
against B and B is compared against A), such as an 
epsilon-indicator, a binary hypervolume indicator, util- 
ity indicators R1 to R3, etc., are better measures for 
multi-objective optimization. The flip side is that the 
chosen binary metric must be computed K(K − 1) times
when comparing K different sets to make a fair com- 
parison, thereby making the use of binary metrics com- 
putationally expensive in practice. Importantly, these 
performance measures have allowed researchers to use 
them directly as fitness measures within indicator-based 
EAs (IBEAs) [49.114]. In addition, the attainment in- 
dicators of [49.115,116] provide further information 
about location and inter-dependencies among the solu- 
tions obtained. 
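
The asymmetry of a binary metric and the K(K − 1) cost can be seen in a small sketch of the additive epsilon indicator. It assumes all objectives are minimized and uses the common definition: the smallest shift by which the first set must be translated so that every point of the second set is weakly dominated. The helper names and the example sets are illustrative.

```python
def additive_epsilon(A, B):
    """Additive epsilon indicator I(A, B) for minimization: the smallest
    eps such that each b in B is weakly dominated by some a - eps."""
    return max(
        min(max(ai - bi for ai, bi in zip(a, b)) for a in A)
        for b in B
    )


def pairwise_indicator_table(sets):
    """Indicator value for every ordered pair of distinct sets,
    i.e. K(K - 1) evaluations for K sets."""
    K = len(sets)
    return {
        (i, j): additive_epsilon(sets[i], sets[j])
        for i in range(K) for j in range(K) if i != j
    }


A = [(1, 2), (2, 1)]
B = [(2, 3), (3, 2)]
C = [(1, 3), (3, 1)]
table = pairwise_indicator_table([A, B, C])
print(table[(0, 1)], table[(1, 0)])
```

A non-positive value in one direction together with a positive value in the other indicates that the first set is the better one; two positive values mean that neither set covers the other.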

The hypervolume metric is a popular metric used in
EMO studies. However, computing the hypervolume
metric for problems with more than three objectives
becomes computationally challenging. Recent
studies on computationally fast estimation methods for
the hypervolume metric have gained popularity among
theoreticians [49.62, 63, 117, 118]. These methods
estimate the hypervolume metric by computing the
proportion of randomly generated objective points that
are dominated by the current set of non-dominated
points. A reliable computation method arising from these
studies will facilitate the use of the hypervolume metric
in designing efficient EMO algorithms.
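
The sampling idea can be sketched in a few lines. The sketch assumes minimization, a user-supplied reference point, and a lower corner bounding the sampling box; it is a plain Monte Carlo estimator written for illustration and not the specific estimation schemes of the cited studies.

```python
import random


def weakly_dominates(a, b):
    """True if a is no worse than b in every (minimized) objective."""
    return all(x <= y for x, y in zip(a, b))


def hypervolume_mc(points, lower, reference, n_samples=20000, seed=0):
    """Estimate the hypervolume dominated by `points` inside the box
    [lower, reference] as (fraction of dominated samples) * box volume."""
    rng = random.Random(seed)
    box_volume = 1.0
    for lo, hi in zip(lower, reference):
        box_volume *= hi - lo
    hits = 0
    for _ in range(n_samples):
        sample = [rng.uniform(lo, hi) for lo, hi in zip(lower, reference)]
        if any(weakly_dominates(p, sample) for p in points):
            hits += 1
    return box_volume * hits / n_samples


front = [(0.25, 0.75), (0.75, 0.25)]
estimate = hypervolume_mc(front, lower=(0.0, 0.0), reference=(1.0, 1.0))
print("estimated hypervolume:", estimate)
```

For this two-point front the exact value is 0.3125, so the estimate can be checked by hand. The cost per sample grows only linearly with the number of objectives, which is what makes sampling attractive when exact computation becomes expensive.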


49.6.9 Exact EMO with Run-Time Analysis 


Since they were first suggested, efficient EMO algo- 
rithms have been increasingly applied in a wide vari- 
ety of problem domains to obtain trade-off frontiers. 
Simultaneously, some researchers have also devoted 
their efforts to developing exact EMO algorithms with 
a theoretical complexity estimate in solving certain 
discrete multi-objective optimization problems. The
first such study [49.119] suggested a pseudo-Boolean
multi-objective optimization problem, the two-objective
LOTZ (leading ones trailing zeroes) problem, and a couple
of EMO methodologies: a simple evolutionary multi-objective
optimizer (SEMO) and an improved version, the
fair evolutionary multi-objective optimizer (FEMO).




The study then estimated the worst-case computational 
effort needed to find all Pareto-optimal solutions of the 
LOTZ problem. This study spurred a number of improved
EMO algorithms with run-time estimates and resulted
in many other interesting test problems [49.120–123].
Although these test problems may not resemble
common practical problems, the working principles
of suggested EMO algorithms to handle specific prob- 
lem structures bring in a plethora of insights about the 
working of multi-objective optimization, particularly 
in comprehensively finding all (not just one or a few) 
Pareto-optimal solutions. 
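
A compact sketch of LOTZ and of a SEMO-style optimizer is given below. Both objectives (the number of leading ones and the number of trailing zeroes) are maximized. The loop follows the commonly described scheme of picking a population member at random, flipping one random bit, and keeping the child only if no current member weakly dominates it; details such as the fixed iteration budget and the seed are assumptions of this sketch rather than the exact setting analyzed in the cited study.

```python
import random


def lotz(bits):
    """Return (leading ones, trailing zeroes) of a 0/1 sequence."""
    leading_ones = 0
    for b in bits:
        if b != 1:
            break
        leading_ones += 1
    trailing_zeros = 0
    for b in reversed(bits):
        if b != 0:
            break
        trailing_zeros += 1
    return (leading_ones, trailing_zeros)


def weakly_dominates(f, g):
    """True if f is no worse than g in every (maximized) objective."""
    return all(a >= b for a, b in zip(f, g))


def semo(n, iterations, seed=0):
    """SEMO-style search on n-bit LOTZ; returns {solution: objectives}."""
    rng = random.Random(seed)
    start = tuple(rng.randint(0, 1) for _ in range(n))
    population = {start: lotz(start)}
    for _ in range(iterations):
        parent = rng.choice(list(population))
        i = rng.randrange(n)
        child = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        fc = lotz(child)
        if any(weakly_dominates(fp, fc) for fp in population.values()):
            continue  # child is dominated or duplicates an objective vector
        # Discard members the child dominates, then add the child.
        population = {y: fy for y, fy in population.items()
                      if not weakly_dominates(fc, fy)}
        population[child] = fc
    return population


result = semo(n=6, iterations=20000)
print(sorted(result.values()))
```

For an n-bit string the Pareto-optimal set consists of the n + 1 strings made of a block of ones followed by a block of zeroes, so the aim is to collect every one of them rather than a single optimum.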


49.6.10 EMO with Meta-Models 


The practice of optimization algorithms is often lim- 
ited by the computational overheads associated with 
evaluating solutions. Certain problems involve expen- 
sive computations, such as numerical solution of par- 
tial differential equations describing the physics of 
the problem, finite difference computations involving 
an analysis of a solution, computational fluid dynam- 
ics simulation to study the performance of a solu- 
tion over a changing environment, etc. In some such 
problems, evaluation of each solution to compute con- 
straints and objective functions may take a few hours 
to a complete day or two. In such scenarios, even 
if an optimization algorithm needs one hundred so- 
lutions to get anywhere close to a good and feasible 
solution, the application needs an easy 3 to 6 months 
of continuous computational time. In most practical 


purposes, this is considered a luxury in an industrial
set-up. Optimization researchers are constantly on
their toes to come up with approximate, yet faster
algorithms.

A little thought brings out an interesting fact about
how optimization algorithms work. The initial iterations
deal with solutions which may not be close to optimal
solutions. Therefore, these solutions need not be
evaluated with high precision. Meta-models for objective
functions and constraints have been developed for
this purpose. Mostly two different approaches are followed.
In one approach, a sample of solutions is used
to generate a meta-model (an approximate model of
the original objectives and constraints), and then efforts
are made to find the optimum of the meta-model,
assuming that the optimal solutions of both the
meta-model and the original problem are similar to each
other [49.124, 125]. In another method, a successive
meta-modeling approach is used in which the algorithm
starts to solve the first meta-model obtained from a sample
of the entire search space [49.126–128]. As the
solutions start to focus near the optimum region of the
meta-model, a new and more accurate meta-model is
generated in the region dictated by the solutions of the
previous optimization. A coarse-to-fine-grained
meta-modeling technique based on artificial neural networks
is shown to reduce the computational effort by about 30
to 80% on different problems [49.126]. Other successful
meta-modeling implementations for multi-objective
optimization based on Kriging and response surface
methodologies exist [49.128, 129].


49.7 Conclusions


The research and application in evolutionary multi-objective
optimization (EMO) over the past 15 years
have resulted in a number of efficient algorithms for
finding a set of well-diversified, near Pareto-optimal
solutions. EMO algorithms are now regularly being applied
to different problems in most areas of science,
engineering, and commerce. This chapter has discussed
the principles of EMO and illustrated them
by describing one efficient and popularly used EMO
algorithm. Results from an inter-planetary spacecraft
trajectory optimization problem reveal the importance
of the principles followed in EMO algorithms. Thereafter,
a specific constraint handling procedure used in
EMO studies was briefly described.

The main highlight of this chapter has been the
description of some of the current research and application
activities in EMO. One critical area of current
research lies in collaborative EMO-MCDM algorithms
for achieving a complete multi-objective optimization
task of finding a set of trade-off solutions and finally
arriving at a single preferred solution. Another direction
taken by researchers is to address guaranteed
convergence and diversity of EMO algorithms through
hybridizing them with mathematical and numerical
optimization techniques as local search algorithms.
Interestingly, EMO researchers have discovered the potential
of EMO algorithms for solving, in a convenient manner,
traditionally hard optimization problems that are
not necessarily multi-objective in nature. So-called
multi-objectivization studies are attracting researchers
from various fields to develop and apply EMO algorithms
in many innovative ways. Considerable interest


in research and application has also been shown in addressing
practical aspects in existing EMO algorithms.
In this direction, handling uncertainty in decision variables
and parameters, meeting an overall desired system
reliability in obtained solutions, handling dynamically
changing problems (on-line optimization), and handling
a large number of objectives have been discussed
in this chapter. Besides the practical aspects, EMO has
also attracted mathematically-oriented theoreticians to
develop EMO algorithms and design suitable problems
for coming up with a computational complexity analysis.
There are many other research directions which
could not even be mentioned due to space restrictions.

In the short span of about 15 years, it has become
clear that the field of EMO research and application
now has efficient algorithms and numerous interesting
and useful applications, and has been able to attract
theoretically and practically-oriented researchers to come
together and collaborate. The practical importance of
EMO’s working principle, the flexibility of evolutionary
optimization, which lies at the core of EMO algorithms,
and the demonstrated diversification of EMO’s
principle to a wide variety of different problem-solving
tasks are the main cornerstones for its success so far.
The scope of research and application in EMO and using
EMO is enormous and open-ended. This chapter
remains an open invitation to everyone who is interested
in any type of problem-solving task to take a look at
what has been done in EMO and to explore how one
can contribute to collaborating with EMO to address
problem-solving tasks that are still in need of a better
solution procedure.


References


49.1 D.E. Goldberg: Genetic Algorithms for Search,
Optimization, and Machine Learning (Addison-Wesley,
Reading 1989)

49.2 J.H. Holland: Adaptation in Natural and Artificial
Systems (MIT, Ann Arbor 1975)

49.3 K.A. De Jong: Evolutionary Computation: A Unified
Approach (MIT, Cambridge 2006)

49.4 P.J.M. Laarhoven, E.H.L. Aarts: Simulated Annealing:
Theory and Applications (Springer, Heidelberg 1987)

49.5 F. Glover: Tabu search - Part 1, ORSA J. Comput.
1(2), 190-206 (1989)

49.6 F. Glover: Tabu search - Part 2, ORSA J. Comput.
2(1), 4-32 (1990)

49.7 B.S.W. Schröder: Ordered Sets: An Introduction
(Birkhäuser, Boston 2003)

49.8 K. Deb: Multi-Objective Optimization Using
Evolutionary Algorithms (Wiley, Chichester 2001)

49.9 K. Miettinen: Nonlinear Multiobjective Optimization
(Kluwer, Boston 1999)

49.10 H.T. Kung, F. Luccio, F.P. Preparata: On finding
the maxima of a set of vectors, J. Assoc. Comput.
Mach. 22(4), 469-476 (1975)

49.11 J. Jahn: Vector Optimization (Springer, Berlin
2004)

49.12 G. Rudolph: On a multi-objective evolutionary
algorithm and its convergence to the Pareto set,
Proc. 5th IEEE Conf. Evol. Comput. (1998) pp. 511-516

49.13 G. Rudolph, A. Agapie: Convergence properties
of some multi-objective evolutionary algorithms,
Proc. 2000 Congr. Evol. Comput. (CEC2000) (2000)
pp. 1010-1016

49.14 O. Schütze, M. Laumanns, C.A.C. Coello, M. Dellnitz,
E.-G. Talbi: Convergence of stochastic
search algorithms to finite size Pareto set
approximations, J. Glob. Optim. 41(4), 559-577
(2008)

49.15 O. Schütze, M. Laumanns, E. Tantar, C.A.C. Coello,
E.-G. Talbi: Computing gap-free Pareto front ap- 
proximations with stochastic search algorithms, 
Evol. Comput. J. 18(1), 65-96 (2010) 

49.16 K. Deb, R. Tiwari, M. Dixit, J. Dutta: Finding trade- 
off solutions close to KKT points using evolu- 
tionary multi-objective optimization, Proc. Congr. 
Evol. Comput. (CEC-2007) (2007) pp. 2109-2116 

49.17 P. Shukla, K. Deb: On finding multiple Pareto- 
optimal solutions using classical and evolution- 
ary generating methods, Eur. J. Oper. Res. (EJOR) 
181(3), 1630-1652 (2007) 

49.18 R.S. Rosenberg: Simulation of Genetic Popula- 
tions with Biochemical Properties, Ph.D. Thesis 
(University of Michigan, Ann Arbor 1967) 

49.19 L.J. Fogel, A.J. Owens, M.J. Walsh: Artificial Intel- 
ligence Through Simulated Evolution (Wiley, New 
York 1966) 

49.20 J.D. Schaffer: Some Experiments in Machine 
Learning Using Vector Evaluated Genetic Al- 
gorithms, Ph.D. Thesis (Vanderbilt University, 
Nashville 1984) 

49.21 D.E. Goldberg, J. Richardson: Genetic algorithms 
with sharing for multimodal function optimiza- 
tion, Proc. First Int. Conf. Genet. Algorithms Their 
Appl. (1987) pp. 41-49 

49.22 C.M. Fonseca, P.J. Fleming: Genetic algorithms for 
multiobjective optimization: Formulation, dis- 
cussion, and generalization, Proc. Fifth Int. Conf. 
Genet. Algorithms (1993) pp. 416-423 

49.23 J. Horn, N. Nafpliotis, D.E. Goldberg: A niched
Pareto genetic algorithm for multi-objective op- 
timization, Proc. First IEEE Conf. Evol. Comput. 
(1994) pp. 82-87 




















50. Parallel Multiobjective Evolutionary Algorithms 


Francisco Luna, Enrique Alba 


The use of evolutionary algorithms (EAs) for solving multiobjective
optimization problems has been very active in the last few years. The
main reasons for this popularity are their ease of use with respect to
classical mathematical programming techniques, their scalability, and
their suitability for finding trade-off solutions in a single run.
However, these algorithms may be computationally expensive because
(1) many real-world optimization problems typically involve tasks
demanding high computational resources and (2) they are aimed at
finding a whole front of optimal solutions instead of searching for
a single optimum. Parallelizing EAs emerges as a possible way of
reducing the CPU time down to affordable values, but it also allows
researchers to use an advanced search engine (the parallel model) that
provides the algorithms with improved population diversity and enables
them to cooperate with other (possibly nonevolutionary) techniques.
The goal of this chapter is to provide the reader with an up-to-date
review of the recent literature on parallel EAs for multiobjective
optimization.


50.1 Multiobjective Optimization and Parallelism
50.2 Parallel Models for Evolutionary Multi-Objective Algorithms
     50.2.1 Specialized Models for Parallel EAs
     50.2.2 General Models for Parallel Metaheuristics
50.3 An Updated Review of the Literature
     50.3.1 Analysis by Year
     50.3.2 Analysis of the Parallel Models
     50.3.3 Review of the Software Implementations
     50.3.4 Main Application Domains
50.4 Conclusions and Future Works
     50.4.1 Summary
     50.4.2 Future Works
References


50.1 Multiobjective Optimization and Parallelism 


Multiobjective optimization arises in many real-world applications,
especially in engineering, in which several performance criteria
conflict with each other. Because the objectives conflict, no single
solution can usually optimize them all simultaneously. Indeed, the aim
of multiobjective optimization is to find a set of compromise
solutions with different trade-offs among the criteria, known as the
Pareto optimal set. When this set is plotted in the objective space,
it is called the Pareto front [50.1, 2].

Many different techniques have been proposed in the multiobjective
research community to address multiobjective optimization problems
(MOPs). Unlike classical mathematical programming approaches,
metaheuristics in general, and EAs (multiobjective evolutionary
algorithms, or MOEAs) in particular, have attracted growing attention
over the last decade for two main reasons. On the one hand, EAs can
generate several members of the Pareto optimal set in a single run, as
opposed to classical multicriteria decision-making techniques. They
are also less sensitive to the shape of the Pareto front and can
therefore deal with a large variety of MOPs. On the other hand, as
randomized black-box algorithms, EAs can address optimization problems
with nonlinear, nondifferentiable, or noisy objective functions.
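
The notions of Pareto optimal set and Pareto front used above can be made concrete with a small sketch. It assumes that all objectives are minimized and works directly on objective vectors; the function names and the sample points are illustrative assumptions, not taken from the chapter.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points: List[Vector]) -> List[Vector]:
    """Keep the points that no other point dominates (quadratic scan)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors for a bi-objective problem.
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = nondominated(points)
```

Here (3.0, 4.0) is dominated by (2.0, 3.0) and (5.0, 5.0) by (1.0, 5.0), so three mutually nondominated trade-off points remain. The quadratic scan is chosen for clarity only; practical MOEAs use faster sorting procedures.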

In spite of these advantages, these algorithms can be computationally
expensive. On the one hand, they need to explore larger portions of
the search space because they seek the entire Pareto front, which
usually results in more function evaluations being performed. On the
other hand, and even more importantly, many real-world multiobjective
problems use computationally expensive methods for computing the
objective functions and constraints.

These issues are usually addressed in two different ways. First, one
can use surrogate models of the fitness functions instead of true
fitness function evaluations [50.3-5]. The second, more important line
is to use parallel computing platforms to speed up the EA search
[50.6]. This is the main focus of this chapter.

Due to their population-based approach, EAs are very suitable for
parallelization because their main operations (i.e., crossover,
mutation, and in particular function evaluation) can be carried out
independently on different individuals. There is a vast amount of
literature on how to parallelize EAs; the reader is referred to
[50.7-10] for surveys on this topic. However, parallelism here is not
only a way of solving problems more rapidly, but also of developing
new and more efficient search models: a parallel EA can be more
effective than a sequential one, even when executed on a single
processor. The advantages that parallelism offers to single-objective
optimization also hold in multiobjective optimization. Of particular
interest for parallel MOEAs is the improvement in population
diversity, which helps to approximate the entire Pareto front of the
given optimization problem.

The contribution of this chapter is to provide the reader with
a recent review of publications related to parallel MOEAs, showing the
latest advances in the field. Given the sheer volume of papers, we
have been forced to restrict ourselves to only those works published
since 2008/09, the years to which the two best-known surveys date back
[50.11, 12]. The structure of this chapter distinguishes between the
numerical model of the parallel MOEA and its physical parallelization.
In seminal papers in the field [50.13], it was assumed that the model
maps directly onto the parallel computing platform. This is no longer
true: any (MO)EA can be deployed in parallel, although not always with
high performance. The next section is therefore devoted to presenting
the classical models for parallel EAs and a recent proposal that
considers not only EAs but also metaheuristics and exact algorithms in
general. Section 50.3 dives into the details of more than 80
publications, analyzing particular features of the MOEAs (fitness
assignment, diversity preservation) as well as of their
parallelization (model, topology, parallel platform). Finally, the
last section presents the main conclusions and the trends for future
research on parallel MOEAs.


50.2 Parallel Models for Evolutionary Multi-Objective Algorithms 


Parallelism arises naturally when dealing with populations of
individuals, since each individual is an independent unit. As
a consequence, population-based algorithms benefit particularly from
being run in parallel. The main models for parallel MOEAs have been
proposed within two clear scopes: models targeted specifically at EAs,
coming from the EA community [50.7, 14, 15], and those proposed for
parallel metaheuristics in general (of which EAs are a subclass)
[50.12, 16]. They are briefly presented in the following subsections.


50.2.1 Specialized Models for Parallel EAs 


The best-known models for parallel MOEAs have been inherited directly
from the single-objective parallel EA community, in which two
parallelizing strategies are defined for population-based algorithms:
(1) parallelization of computation, in which the operations commonly
applied to each individual are performed in parallel, and
(2) parallelization of population, in which the population is split
into different parts, each one evolving in semi-isolation (individuals
can be exchanged between subpopulations).

The simplest parallelization scheme for EAs is the well-known
master-slave or global parallelization method (Fig. 50.1a). In this
scheme, a central processor performs the selection operations while
the associated slave processors perform the recombination, mutation,
and/or the evaluation of the fitness function. The algorithm behaves
the same as the traditional one (a single panmictic population),
although it is faster, especially for time-consuming objective
functions. Its simplicity has made it the most popular among
practitioners.
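
The master-slave scheme can be sketched as follows, under stated assumptions: only the evaluation step is farmed out, the bi-objective function `evaluate` is a hypothetical stand-in for an expensive simulation, and a thread pool from the Python standard library plays the role of the slave processors. For CPU-bound objective functions a process pool or a cluster scheduler would be the more likely choice.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Individual = Sequence[float]
Objectives = Tuple[float, ...]


def evaluate(ind: Individual) -> Objectives:
    """Hypothetical bi-objective function (both objectives minimized)."""
    f1 = sum(x * x for x in ind)
    f2 = sum((x - 2.0) ** 2 for x in ind)
    return (f1, f2)


def master_slave_evaluate(
    population: List[Individual],
    evaluate_fn: Callable[[Individual], Objectives],
    workers: Executor,
) -> List[Objectives]:
    """The master hands individuals to the workers and collects the
    objective vectors in the same order as the population."""
    return list(workers.map(evaluate_fn, population))


population = [(0.0,), (1.0,), (2.0,)]
with ThreadPoolExecutor(max_workers=3) as pool:
    fitness = master_slave_evaluate(population, evaluate, pool)
```

Selection and replacement would then run on the master using `fitness`, so the search behaves like the sequential algorithm and only the wall-clock time changes.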

However, other models for parallel EAs use some kind of spatial
arrangement of the individuals (the population is then said to be
structured) and afterward parallelize the resulting chunks in a pool
of processors. Among the most widely known types of structured EAs,
the distributed (dEA, or coarse-grain) and cellular (cEA, fine-grain
or diffusion) algorithms are very popular optimization procedures
[50.7]. In the case of distributed EAs (Fig. 50.1b), the population is
partitioned into a set of islands in which isolated EAs run in
parallel. Sparse exchanges of individuals are performed among these
islands, with the goal of inserting some diversity into the
subpopulations and thus preventing them from getting stuck in local
optima. Islands may apply the same (homogeneous) or different
(heterogeneous) EAs [50.17]. In the case of a cellular EA
(Fig. 50.1c), subpopulations are typically composed of one individual,
which may only interact with its nearest neighbors in the breeding
loop, i.e., the concept of neighborhood is introduced. These
neighborhoods overlap, which implicitly defines a migration mechanism
and allows a smooth diffusion of the best solutions throughout the
population. This parallel scheme was originally targeted at massively
parallel computers, but nowadays it can be used sequentially on
a regular computer or in parallel on graphics processing units (GPUs).
Hybrid models have also been proposed (Fig. 50.1d-f) in which
a two-level approach to parallelization is undertaken. In these
models, the higher level of parallelization is usually a coarse-grain
implementation and the basic island performs a cEA, a master-slave
method, or even another distributed one.

Fig. 50.1a-f Different models of parallel EAs: (a) global
parallelization, (b) coarse grain, and (c) fine grain. Many hybrids
have been defined by combining parallel EAs at two levels: (d) coarse
and fine grain, (e) coarse grain and global parallelization, and
(f) coarse grain at the two levels

This taxonomy holds as well for parallel MOEAs [50.15], so we can
consider master-slave MOEAs (msMOEAs), distributed MOEAs (dMOEAs), and
cellular MOEAs (cMOEAs). Nevertheless, the two decentralized
population approaches (dMOEAs and cMOEAs) need a further
particularization for MOPs [50.14]. As we stated before, the main goal
of any multiobjective optimization algorithm is to find the optimal
Pareto front for a given MOP. It is clear that in msMOEAs the
management of this Pareto front is carried out by the master
processor. But when the search process is distributed among different
subalgorithms, as happens in dMOEAs and cMOEAs, the management of the
nondominated set of solutions during the optimization procedure
becomes a key issue. Hence, one can distinguish whether the Pareto
front is a centralized element of the algorithm or is distributed and
locally managed by each sub-EA during the computation. These variants
have been called centralized Pareto front (CPF) structured MOEAs and
distributed Pareto front (DPF) structured MOEAs, respectively [50.16].
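
A distributed MOEA with locally managed fronts (the DPF case) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited work: each island is reduced to random sampling of a hypothetical one-variable bi-objective problem, migration follows a ring topology every five generations, and all names are illustrative. A CPF variant would instead keep one shared archive that every island updates.

```python
import random
from typing import List, Tuple

Solution = Tuple[float, float]  # objective vector, both minimized


def dominates(a: Solution, b: Solution) -> bool:
    """Pareto dominance for minimization."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def update_archive(archive: List[Solution],
                   candidate: Solution) -> List[Solution]:
    """Add candidate unless it is dominated or already present, and
    drop the archive members that it dominates."""
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


def migrate_ring(archives: List[List[Solution]],
                 rng: random.Random) -> List[List[Solution]]:
    """Each island sends one random member to its successor in a ring."""
    emigrants = [rng.choice(a) for a in archives]
    n = len(archives)
    return [update_archive(archives[i], emigrants[(i - 1) % n])
            for i in range(n)]


def merge_fronts(archives: List[List[Solution]]) -> List[Solution]:
    """Combine the local fronts into one global nondominated set."""
    merged: List[Solution] = []
    for archive in archives:
        for solution in archive:
            merged = update_archive(merged, solution)
    return merged


rng = random.Random(0)
archives: List[List[Solution]] = [[] for _ in range(3)]
for generation in range(20):
    for i in range(len(archives)):
        x = rng.uniform(0.0, 2.0)  # stand-in for a real variation step
        archives[i] = update_archive(archives[i], (x * x, (x - 2.0) ** 2))
    if generation % 5 == 4:        # sparse exchanges between islands
        archives = migrate_ring(archives, rng)
global_front = merge_fronts(archives)
```

The final merge is needed because a solution that is nondominated on its own island may still be dominated by a solution held elsewhere.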

For distributed MOEAs, very specialized models have
been proposed in the literature that capture the
different approaches for partitioning the search among
the islands so that their explorations do not overlap
[50.18]. On the one hand, each island may consider
a different subset of the objectives and then either
aggregate them into a single-objective problem [50.19]
or use a coevolutionary approach [50.20]. On the other
hand, the search space (either the decision space or
the objective space) can be explicitly partitioned and
assigned to different islands. As stated in [50.11], in
a general multiobjective problem it is difficult to
design an a priori distribution that:




1. Covers the entire search space,

2. Assigns regions of equal size, and

3. Adds minimal complexity in constraining demes to
their assigned regions.


50.2.2 General Models 
for Parallel Metaheuristics 


Several models have been proposed for parallelizing 
metaheuristics [50.21,22] in which EAs, as a type of 
metaheuristic, perfectly fit. For parallel MOEAs, two 
main approaches have been proposed in the literature. 
In [50.16], the authors distinguish between single-walk 
and multiple-walk parallelizations. The former is aimed 
at speeding up the computations by parallelizing the 
evaluation of the objective functions or the search op- 
erators. In the latter, several search threads (EAs or 
any other search method) cooperate to better explore 
the search space (not only accelerating the execution). 
The same issue with the Pareto front as in the parallel 
MOEA models emerges here, so the authors also subdi- 
vide into centralized and distributed Pareto front models 
(CPF and DPF, respectively). 


On the other hand, Talbi et al. [50.12] categorize
parallel metaheuristics into three major hierarchical
models. The self-contained parallel cooperation model
is targeted at parallel computing platforms with
limited communication. The search is performed by
several subalgorithms in parallel, which might
cooperate by exchanging some kind of information. It
embraces the island model, or dMOEAs, explained before.
Two main groups are distinguished: cooperating
subpopulations, which are based on partitioning the
objective/search space; and the multistart approach, in
which several optimization algorithms run separately in
parallel. In the former, subpopulations can be
homogeneous or heterogeneous, explore separate regions
of the search space, etc. The latter consists of
running several local search algorithms in parallel. On
the second and third levels of the hierarchy, the
authors consider those models aimed merely at speeding
up the computations: problem-independent
parallelization, which mainly comprises the
master-slave approach of parallel fitness evaluation,
and problem-dependent parallelization, which focuses on
subdividing single evaluations into parallel tasks that
speed up the evaluation step.
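
The problem-independent (master-slave) parallelization of the fitness evaluation can be sketched in a few lines of Python. This is only an illustrative sketch: the bi-objective function `evaluate` is a hypothetical stand-in for an expensive simulation, and a thread pool from the standard library is assumed in place of the MPI-based workers that most surveyed works use. For CPU-bound evaluations, `ProcessPoolExecutor` (which offers the same `map` interface) or a message-passing layer would normally be chosen instead.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(x):
    """Hypothetical bi-objective function of one variable (both minimized)."""
    return (x * x, (x - 2.0) ** 2)


def parallel_evaluate(population, workers=4):
    """Master side: farm out evaluations and collect them in population order.

    The search itself (selection, variation, Pareto front management)
    would stay in the master; only the evaluations run on the workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, population))


population = [0.0, 0.5, 1.0, 2.0]
objectives = parallel_evaluate(population)
print(objectives)  # [(0.0, 4.0), (0.25, 2.25), (1.0, 1.0), (4.0, 0.0)]
```

The results come back in the same order as the population, so the master can proceed exactly as a sequential algorithm would, which is why this model leaves the search behavior unchanged.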


50.3 An Updated Review of the Literature 


This section is devoted to presenting and analyzing the 
most recent contributions in the literature to the parallel 
evolutionary multiobjective optimization field. We have 
structured the published material according to the clas- 
sical parallel EA models, i. e., master/slave, distributed, 
cellular, and hybrid models (Sect. 50.2.1) because this 
chapter is targeted precisely to EAs and, as a conse- 
quence, this classification better captures the design 
principles of the different contributions. Table 50.1 in- 
cludes, ordered by the year of publication, an updated 
review of the field. Also, in order to help the reader with 
the terminology of this table, Table 50.2 displays the 
symbols used and their definitions. Then, for each row 
of Table 50.1, the following information is shown: 


• FA-DP (fitness assignment and diversity
  preservation): As two of the most important design
  issues in EMO algorithms, the fitness assignment and
  diversity preservation mechanisms allow,
  respectively, the search to be better guided toward
  Pareto optimal solutions and these solutions to be
  spread out along the entire Pareto front. They are
  frequently merged into one single measure that
  translates the vector of objective function values
  of a multiobjective problem into one single scalar
  value, which is used to rank solutions properly (from
  a Pareto optimality point of view, nondominated
  solutions are noncomparable).

• PM (parallel model): It can take the values MS
  (master/slave), Dis (distributed model), Cell
  (cellular model), or Hyb (hybrid), according to the
  classical parallel EA categorization.

• PFC (Pareto front computation): This column
  distinguishes between the CPF and DPF strategies
  defined before.

• PP (parallel platform): When applicable, this column
  indicates the kind of parallel computing platform on
  which the given algorithm is executed (GPUs,
  multicore, cluster, grid, etc.).

• Topology: Communication topology of the parallel
  MOEA (Star, Hybrid, all-to-all [A2A], etc.).

• Programming: When publicly reported, the programming
  language used to implement the parallel MOEA is
  included in this column.

• Description: The main features of the parallel MOEA
  in a few words.

• Application domain: The area in which the parallel
  MOEA has been applied.




[Table 50.1 Parallel MOEAs since 2008 (columns: Reference, Year, FA-DP, PM, PFC, PP, Topology, Programming, Description, Application domain). Table 50.2 includes a list of the symbols used here and their definitions.]




50.3.1 Analysis by Year 


The first point of analysis of the published material is 
done with respect to the number of publications over 
the years considered in this chapter, i.e., the period 
between 2008 and 2011. Figure 50.2 displays this infor- 
mation not only for the period analyzed in this chapter, 


Table 50.2 List of symbols used in Table 50.1


Column        Symbol     Definition
FA-DP         R          Ranking
              RS         Ranking and sharing
              RC         Ranking and crowding
              SRF        Strength raw fitness
              WS         Weighted sum (aggregation)
              Ib         Indicator-based
              PT         Pareto tournaments
              ε          Epsilon dominance
              Tcheb      Tchebycheff aggregation
PM            MS         Master/slave model
              Dis        Distributed model
              Cell       Cellular model
              Hyb        Hybrid model
PFC           CPF        Centralized Pareto front
              DPF        Distributed Pareto front
PP            Seq        Sequential algorithm
              GPU        Graphics processing unit
Topology      A2A        All-to-all
              Rand       Random
              Isol       Isolated
              Eucl       Euclidean
              Hier       Hierarchical
              Hyb        Hybrid topology
Programming   PVM        Parallel virtual machine
              MPI        Message Passing Interface
              Mpich-G2   An MPI implementation for grid computing
              DEVS       Discrete event system
              SOA        Service-oriented architecture
              OpenMP     Open multiprocessing
              CUDA       Compute unified device architecture
              OpenMOLE   Open MOdeL Experiment
Description   MS         Master/slave
              MO         Multiobjective
              SO         Single-objective
              DE         Differential evolution
              dNSGA-II   Distributed NSGA-II
              Tech.      Techniques
              Async.     Asynchronous
              Het.       Heterogeneous
–             N/A        Not available


but also for the period 1993–2005 presented in [50.14].
The trend is fairly clear: this research topic has
remained active during the last few years. Indeed,
compared with the evolution reported in [50.14] up to
2006, where the highest number of works per year was
10, the published material has doubled (more than 20
publications per year in 2009, 2010, and 2011). Despite
the relative lack of novel, attractive approaches in
the field, parallelism remains a powerful tool in the
EMO community because of one major factor: the
optimization problems addressed require execution times
to be reduced to affordable values. This is reinforced
by the current availability of cheap parallel computing
platforms such as multicore processors and, lately,
GPUs. Indeed, multicore is the second most frequent
keyword in column PP of Table 50.1.


50.3.2 Analysis of the Parallel Models 


In this section, the different contributions are analyzed 
from the point of view of the characteristics of the 
parallel model. We will pay particular attention to the 
columns FA-DP, PM, PFC, and Topology. 

Fig. 50.2 Number of publications on parallel MOEAs grouped by the year of publication in the periods 1993–2005 (after [50.14]) and 2008–2011 (this chapter)

Fitness assignment and diversity preservation are
a major issue in parallel MOEAs because, in many cases,
the Pareto front is spread across different
subalgorithms (especially in the distributed models).
The management of optimal solutions (via fitness
assignment) and how they are distributed along the
Pareto front (diversity) deserve a brief review. The
FA-DP column shows that the ranking and crowding
mechanisms inherited from the most widely used
algorithm in the area, namely NSGA-II [50.95], are also
the most frequent in the literature, since NSGA-II is
the base algorithm for many of these parallel MOEAs. In
the case of the distributed and cellular models, it is
worth mentioning that crowding is applied locally,
i.e., diversity is kept within the same subalgorithm.
If no advanced mechanism is devised to partition the
search space (such as in [50.96, 97]), the algorithm
will be accepting or discarding solutions that probably
lie in the same region as those computed by the other
subalgorithms. The same happens with classical FA-DP
methods such as the strength raw fitness or the
indicator-based approaches (SRF and Ib in column FA-DP,
respectively). As a final note, we strongly believe
that algorithms based on decomposition, such as MOEA/D
[50.98], are especially well suited to profit from
parallel platforms. Indeed, they decompose the
multiobjective problem into a number of scalar
subproblems that are distributed along the Pareto
front. Therefore, the search space is implicitly
partitioned and these algorithms can take full
advantage of multicore processors. The works in
[50.80, 81] follow this line of research.
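
The idea of decomposition into scalar subproblems can be made concrete with a small Python sketch. It assumes a bi-objective minimization problem, the Tchebycheff aggregation (symbol Tcheb in Table 50.2), evenly spaced weight vectors, and a contiguous split of the subproblems among workers. The helper names `uniform_weights` and `split_among_workers` are hypothetical; the sketch does not reproduce the specific parallel MOEA/D implementations of [50.80, 81].

```python
def tchebycheff(objectives, weights, ideal):
    """Tchebycheff scalarization: max_i w_i * |f_i - z_i| (to be minimized)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def uniform_weights(n):
    """n evenly spaced weight vectors for a bi-objective problem (n >= 2)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def split_among_workers(items, workers):
    """Contiguous, near-equal chunks, so neighboring subproblems stay together."""
    size, extra = divmod(len(items), workers)
    chunks, start = [], 0
    for k in range(workers):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


weights = uniform_weights(5)
chunks = split_among_workers(weights, 2)
print([len(c) for c in chunks])  # [3, 2]
print(tchebycheff((1.0, 3.0), (0.5, 0.5), (0.0, 0.0)))  # 1.5
```

Each worker (for instance, a thread on a multicore processor) would then optimize only the subproblems in its chunk, so the partition of the search follows directly from the partition of the weight vectors.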

If one analyzes the usage of the different parallel
models in the reviewed publications (Fig. 50.3), i.e.,
MS, distributed (Dis), and cellular (Cell), several
clear conclusions can be drawn. First, despite the
simplicity of the MS model, it appears in almost half
of the related literature (45%). Multiobjective
optimization problems are becoming more and more
complex and demand high-end computational resources,
which makes this approach very suitable in this
context. Indeed, the underlying search model remains
unchanged (entirely located at the master process),
because authors are usually only interested in speeding
up the computations. Second, the distributed models are
still receiving much attention from the multiobjective
community (47% of the analyzed publications use this
model) in the quest for an improved algorithm that
reaches the Pareto fronts more effectively (not only
reducing execution times). The promising results
published with their single-objective counterparts
[50.7] have pushed forward the research in this area.
The major issue with distributed models is the
difficult management of the Pareto front, as stated in
Sect. 50.2.1 and as discussed below. Finally, a special
comment about the cellular model: even though it has
yet to be fully exploited in the literature, its
percentage of use has doubled since the previous
literature review in 2006 [50.14]. The point is that
these algorithms are usually executed sequentially,
with no parallelism at all, because they were
originally targeted at massively parallel machines, and
this kind of machine fell into disuse. There are, of
course, some exceptions such as [50.46], where
a cellular-like MOEA for aerodynamic optimization is
deployed on a cluster of computers.

Fig. 50.3 Percentage of use of the different parallel MOEA models in the reviewed literature

Pareto front computation (column PFC) is a hot topic in
parallel MOEAs. Handling the nondominated solutions
found during the search, when the search is distributed
across possibly separate processors, has promoted and
is still promoting fundamental research within the
community. The most widely used strategy, however, is
to keep a central pool of nondominated solutions (CPF),
that is, there is one single front. This approach
appears in 55% of the analyzed publications and matches
both the master/slave and the cellular parallel models.
This design option is straightforward and makes common
sense.

The DPF strategy is almost as common (45% of the
papers) and is mostly used with the distributed model
of parallel MOEAs: the Pareto front is approximated
separately by each of the subalgorithmic components
during the search and only merged into one single front
at the end of the exploration. In general, all DPF
strategies are complemented with occasional CPF phases,
which allow the searches of the different subalgorithms
to overlap [50.11]. A couple of exceptions are to be
found in the reviewed literature, in which
a distributed computation of the Pareto front is
combined with a fully centralized Pareto front. In
[50.50], a distributed MOEA has one single Pareto front
that is computed by several isolated islands. Each
island uses a weighted sum approach (with different
weights) and is targeted at one region of the search
space. The second exception appears in [50.80, 81], in
which a multithreaded parallelization of MOEA/D is
presented. The Pareto front is stored in global memory
and the different threads are in charge of separate
groups of weight combinations.
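
The difference between the two strategies comes down to where the nondominated filter is applied. The following Python sketch assumes minimization and plain lists of objective vectors. The names `update_front` and `merge_fronts` are hypothetical, and the linear-scan archive is a deliberately simple stand-in for the archiving schemes used in the surveyed algorithms.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_front(front, candidate):
    """Return the nondominated set after offering candidate to front."""
    if any(dominates(p, candidate) or p == candidate for p in front):
        return front  # candidate is dominated or duplicated: front unchanged
    return [p for p in front if not dominates(candidate, p)] + [candidate]


def merge_fronts(local_fronts):
    """DPF-style final step: fold the islands' local fronts into one front."""
    merged = []
    for front in local_fronts:
        for point in front:
            merged = update_front(merged, point)
    return merged


island_a = [(1.0, 5.0), (3.0, 3.0)]
island_b = [(2.0, 4.0), (4.0, 3.5), (5.0, 1.0)]
print(merge_fronts([island_a, island_b]))
# [(1.0, 5.0), (3.0, 3.0), (2.0, 4.0), (5.0, 1.0)]
```

In a CPF design, every new solution would be passed through `update_front` on one shared archive. In a DPF design, each island keeps its own list and `merge_fronts` runs only at gathering points. In the example, (4.0, 3.5) is nondominated within island_b but is discarded at the merge because (3.0, 3.0) from island_a dominates it.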

The final aspect under analysis in this section is the
interconnection topology of the different components of
the parallel algorithms. Star topologies are widely
used in two scenarios: (i) master/slave models and (ii)
distributed models with periodic gathering operations
required to generate a single Pareto front. A star
topology captures the idea of a central master that
delivers tasks to a set of worker nodes, which is why
it is so popular in these two cases. The column
Topology in Table 50.1 also reveals that all-to-all
(A2A) and random topologies exist in the literature.
The former enables the different components of the
parallel algorithm to be tightly coupled, thus quickly
spreading the nondominated solutions found for a faster
convergence toward the optimal Pareto front. The latter
implies that the genetic material may take longer to
reach all the algorithmic components, thus promoting
diversity.
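
The three topologies discussed above can be written down as adjacency lists that tell each component where its migrants go. The Python sketch below is illustrative only: the function names, the choice of node 0 as hub, and the fixed out-degree of the random topology are assumptions, not conventions taken from the surveyed papers.

```python
import random


def star(n, hub=0):
    """Every node talks only to the hub; the hub talks to everyone."""
    return {i: ([j for j in range(n) if j != hub] if i == hub else [hub])
            for i in range(n)}


def all_to_all(n):
    """Every node sends to every other node (tight coupling)."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def random_topology(n, degree, seed=None):
    """Every node sends to `degree` distinct, randomly chosen other nodes."""
    rng = random.Random(seed)
    return {i: sorted(rng.sample([j for j in range(n) if j != i], degree))
            for i in range(n)}


print(star(4))        # {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(all_to_all(3))  # {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(random_topology(5, 2, seed=1))  # two random targets per node
```

With all-to-all, a nondominated solution reaches every component in one migration step, whereas with a sparse random topology it may need several steps, which is the convergence versus diversity trade-off described in the text.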


50.3.3 Review of the Software 
Implementations 


This section summarizes the contents of the column
Programming in Table 50.1, which gives a note on the
implementation of the algorithms. A quick look at the
items of the column clearly shows that the combination
of C/C++ as the programming language and MPI (Message
Passing Interface) [50.99] as the technology enabling
parallel communication between the different components
of the parallel algorithms is the preferred option.
This can be explained by the strong engineering
background of most of the MOPs addressed (and of the
researchers), a field in which C/C++ has had a dominant
position for many years. Indeed, C/C++ allows
researchers to include very low-level routines (even
assembly code) that enable full control of all parts of
their applications. MPI, in turn, is a standard (not
just a library) for which many implementations exist
(MPICH, LAM, etc.), so its use favors correctness and
efficiency.


Beyond this dominant combination, only two novel,
relevant trends on this topic are worth mentioning in
detail. On the one hand, even though clusters of
computers are able to provide researchers with large
computational power, there are MOPs that require still
more resources. These resources can only be supplied by
grid computing platforms [50.100]. This has promoted
the parallelization of EMO algorithms with grid-enabled
technologies such as MPICH-G2 [50.40] or MatGrid
[50.34]. On the other hand, there already are several
seminal works on the parallelization of multiobjective
optimizers on GPUs, as stated in Sect. 50.3.1. To the
best of our knowledge, only implementations with C and
CUDA (compute unified device architecture) [50.101]
have been proposed [50.66, 93], but nowadays other
options have also emerged such as, for example, OpenCL
[50.102].


50.3.4 Main Application Domains 


One of the main reasons, if not the main one, for the
popularity of MOEAs is their success in solving
real-world problems. Parallel EMO algorithms are no
exception. As a consequence, the variability in the
application domains is very large, which makes the task
of classification rather difficult. By partially
following the categorization proposed in [50.18], three
main areas of application are distinguished:
engineering, industrial, and scientific. Figure 50.4
summarizes the percentage of reviewed publications that
fall into each area. Besides these three categories,
this figure also displays a fourth item devoted to
benchmarking. The latter is not an application, but it
appears frequently in the reviewed literature. There
are well-established testbeds such as
Zitzler-Deb-Thiele (ZDT) [50.103],
Deb-Thiele-Laumanns-Zitzler (DTLZ) [50.104], or the
Walking Fish Group (WFG) [50.105], that have been
widely used as a comparison basis for introducing new
algorithmic proposals.

Fig. 50.4 Application domains (Engineering, 35%; Benchmarking; Industrial; Scientific)

Among the real-world applications, engineering
applications are, by far, the most popular domain
within the parallel EMO literature (and in the entire
EMO field), principally because they usually have
suitable mathematical models for this kind of
algorithm. Indeed, Fig. 50.4 shows that almost 35% of
the papers analyzed address an optimization problem
from the engineering domain. Several relevant works
among those analyzed are devoted to aerodynamic shape
optimization [50.27, 30, 46], reconfiguration of
operational satellite constellations [50.41], or
combustion control for different types of vehicles
[50.85, 91].

The second place in terms of popularity is occupied by
industrial applications, which appear in 21% of the
papers reviewed. These applications are related to the
fields of manufacturing, scheduling, and management.
Very interesting problems have been addressed in this
industrial domain, such as the optimization of sonic
crystal attenuation properties [50.52], the Camembert
cheese ripening process [50.78], and the evaluation of
the input tax to regulate pollution from agricultural
production [50.65].

Scientific applications, the third category of
real-world applications analyzed (16% of the papers),
group optimization problems in bioinformatics,
chemistry, and computer science. The most successful
applications here are devoted to the bioinformatics and
chemistry fields, in problems on molecular docking
[50.33], protein design [50.40], drug design [50.56],
and phylogenetic inference [50.69, 70]. In our view,
the reason these applications (whether engineering,
industrial, or scientific) have been tackled with
parallel MOEAs is precisely the computational cost of
the tasks involved in computing their objective
functions. When a new enhanced parallel search model is
sought, authors usually rely on benchmarking functions.
Indeed, as stated above, 30% of the reviewed papers
(Fig. 50.4) use these testbeds either exclusively
[50.61, 68, 82, 94] or prior to solving a real-world
application [50.5, 20, 51].


50.4 Conclusions and Future Works
50.4.1 Summary


In this chapter, we have carried out a comprehensive
survey of the literature concerning parallel MOEAs
since 2008/2009, the period when two of the most
well-known comprehensive surveys were published. We
have first described the existing parallel models for
MOEAs, distinguishing between those specifically
targeted at EAs and those aimed at capturing the
essence of parallel metaheuristics in general. Based on
the former models (since we are interested in surveying
parallel MOEAs), more than 80 relevant papers have been
carefully analyzed (many dozens more were studied but
left out because of little relevance to this survey).
Fundamental aspects such as the fitness assignment and
the diversity preservation, the parallel model used,
the management of the approximated Pareto front, the
underlying parallel platform used (if any), and the
communication topology of the algorithms have been
reviewed. Their main application domains have been
gathered and structured into engineering, industrial,
and scientific real-world multiobjective problems.


Despite the lack of new, attractive ideas in the area,
this survey has revealed that the research on parallel
MOEAs is still moving forward for two main reasons. On
the one hand, researchers think of parallelism not only
as a way to speed up computations, but also as
a strategy to enhance the search engines of the
algorithms. On the other hand, the computational
demands of many multiobjective problems
(dimensionality, uncertainty, simulations, etc.) mean
that parallelism is the only suitable option to address
them.

In the following section, some topics for future
research for engineers interested in parallel MOEAs are
outlined. In the authors' opinion, these topics merit
particular attention so that the current state of the
art can be improved.


50.4.2 Future Trends 


We have structured the future trends section as
a bottom-up approach, proposing research lines for
parallel MOEAs that range from low-level algorithmic
details to more complex, enhanced strategies at
a higher level. We also suggest several studies that
are missing in the literature and that, in our opinion,
once completed, might have a great impact within the
community.

Let us start with one of the main issues of parallel
MOEAs: in the end, the algorithms have to compute one
single approximation to the Pareto front, i.e., they
have an inherently centralized structure. When several
subalgorithms cooperate to approximate such a Pareto
front (i.e., the distributed models), they should
coordinate themselves somehow in order to effectively
sample different regions of the front and avoid
overlaps. These overlaps are the main reason for the
distributed algorithms being outperformed by
centralized approaches (such as the master/slave). This
issue has been partially addressed in the literature
[50.96, 97] with some degree of success. However, these
results have not been widely used yet and new advances
are needed.

We propose here two lines of research that might 
help the distributed models of parallel MOEAs to over- 
come this issue. The first one is based on designing 
a fully distributed diversity preservation method (e.g., 
a distributed crowding). The research question here is 
whether it is possible to devise a density estimator that 
considers both local and global information from the 
other components of the distributed MOEA. We think 
so. Instead of trying to allocate a given portion of the 
Pareto front to each island, these islands should period- 
ically broadcast a list with the objective values of their 
local solutions (but not the decision variables). Then, 
when checking whether a solution is to be stored in its 
local Pareto front, the island has to consider both the 
local information and the global information received 
from the other islands. To the best of our knowledge, 
such a mechanism does not exist in the literature. The 
second proposal is to use rough set theory [50.106] to 
effectively partition the search space and allocate dif- 
ferent portions to different components of the parallel 
MOEA. There does exist a preliminary work [50.107] that uses a multiobjective simulated annealing, and these ideas will have a great impact on the design of parallel MOEAs.

On a higher algorithmic level, this survey has also revealed that, despite their advanced search model and accurate results on many benchmarking functions [50.108], cellular models of parallel MOEAs have been only marginally used in most of the application domains. We strongly believe that these algorithms might improve the state-of-the-art in many of these unexplored research areas. On a related matter, designing heterogeneous algorithms [50.17] is also a line of research with a high potential. Indeed, few contributions in the literature have considered parallel heterogeneous algorithms for multiobjective optimization [50.59]. To the best of our knowledge, there is no published analysis on the impact of heterogeneity (i.e., several different subalgorithms cooperating, namely, NSGA-II, SPEA2, PAES, MOEA/D, etc.) while approximating Pareto fronts. Our experience with these totally different algorithms is that each one is usually better suited to explore given regions of the search space, so their collaboration may result in a newly improved algorithm.

Finally, the last group of open research lines is devoted to well-grounded studies on the behavior of parallel MOEAs concerning their scalability with respect to the number of decision variables or their convergence speed toward the optimal Pareto front, as done for sequential MOEAs in [50.109, 110], respectively. Again, the influence of heterogeneity on these two search capabilities of parallel MOEAs is of interest.

Last but not least, there is a distinct lack of theoretical work in this area. For example, to the best of our knowledge, there is no analysis of the takeover time [50.111] in the multiobjective context, and we believe that characterizing parallel MOEAs based on this metric may help researchers in future developments. The difficulty of the landscapes in MOPs and the relative theoretical advantages for different types of problems are open research lines.


References


50.1 C.A. Coello Coello, D.A. Van Veldhuizen, G.B. Lamont: Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer, Boston 2002)

50.2 K. Deb: Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, New York 2001)

50.3 R.R. Coelho, P. Bouillard: Multi-objective reliability-based optimization with stochastic metamodels, Evol. Comput. 19(4), 525-560 (2011)

50.4 T. Goel, R. Vaidyanathan, R. Haftka, W. Shyy: Response surface approximation of Pareto optimization front in multi-objective optimization, 10th AIAA/ISSMO Multidiscip. Anal. Optim. Conf. (2004)

50.5 A. Syberfeldt, H. Grimm, A. Ng, R.I. John: A parallel surrogate-assisted multi-objective evolutionary algorithm for computationally expensive
optimization problems, IEEE Congr. Evol. Comput. (2008) pp. 3177-3184

50.6 E. Alba: Parallel Metaheuristics: A New Class of Algorithms (Wiley, New York 2005)

50.7 E. Alba, M. Tomassini: Parallelism and evolutionary algorithms, IEEE Trans. Evol. Comput. 6(5), 443-462 (2002)

50.8 E. Alba, J.M. Troya: A survey of parallel distributed genetic algorithms, Complexity 4(4), 31-52 (1999)

50.9 E. Cantu-Paz: Efficient and Accurate Parallel Genetic Algorithms (Kluwer, New York 2000)

50.10 G. Luque, E. Alba: Parallel Genetic Algorithms: Theory and Real World Applications (Springer, Berlin, Heidelberg 2011)

50.11 A. Lopez-Jaimes, C.A. Coello Coello: Applications of parallel platforms and models in evolutionary multi-objective optimization. In: Biologically-Inspired Optimisation Methods, ed. by A. Lewis, S. Mostaghim, M. Randall (Springer, Berlin, Heidelberg 2009) pp. 23-29

50.12 E.-G. Talbi, S. Mostaghim, T. Okabe, H. Ishibuchi, G. Rudolph, C.A. Coello Coello: Parallel approaches for multiobjective optimization, Lect. Notes Comput. Sci. 5252, 349-372 (2008)

50.13 A.J. Chipperfield, P.J. Fleming: Parallel genetic algorithms. In: Parallel and Distributed Computing Handbook, ed. by A.Y. Zomaya (McGraw Hill, New York 1996) pp. 1118-1143

50.14 F. Luna, A.J. Nebro, E. Alba: Parallel evolutionary multiobjective optimization. In: Parallel Evolutionary Computations, ed. by N. Nedjah, E. Alba, L. de Macedo (Springer, Berlin, Heidelberg 2006) pp. 33-56, Chapter 2

50.15 D.A. Van Veldhuizen, J.B. Zydallis, G.B. Lamont: Considerations in engineering parallel multiobjective evolutionary algorithms, IEEE Trans. Evol. Comput. 7(2), 144-173 (2003)

50.16 A.J. Nebro, F. Luna, E.-G. Talbi, E. Alba: Parallel multiobjective optimization. In: Parallel Metaheuristics, ed. by E. Alba (Wiley, New York 2005) pp. 371-394

50.17 F. Luna, E. Alba, A.J. Nebro: Parallel heterogeneous metaheuristics. In: Parallel Metaheuristics, ed. by E. Alba (Wiley, New York 2005) pp. 395-422

50.18 C.A. Coello Coello, G.B. Lamont, D.A. Van Veldhuizen: Evolutionary Algorithms for Solving Multi-Objective Problems, Genetic and Evolutionary Computation (Springer, Berlin, Heidelberg 2007)

50.19 E. Rashidi, M. Jahandar, M. Zandieh: An improved hybrid multi-objective parallel genetic algorithm for hybrid flow shop scheduling with unrelated parallel machines, Int. J. Adv. Manuf. Technol. 49, 1129-1139 (2010)

50.20 B. Dorronsoro, G. Danoy, P. Bouvry, A.J. Nebro: Multi-objective cooperative coevolutionary evolutionary algorithms for continuous and combinatorial optimization. In: Intelligent Decision Systems in Large-Scale Distributed Environments, Studies in Computational Intelligence, Vol. 362 (Springer, Berlin, Heidelberg 2011) pp. 49-74

50.21 T.G. Crainic, M. Toulouse: Parallel strategies for metaheuristics. In: Handbook of Metaheuristics, ed. by F.W. Glover, G.A. Kochenberger (Kluwer, Boston 2003)

50.22 V.-D. Cung, S.L. Martins, C.C. Ribeiro, C. Roucairol: Strategies for the parallel implementation of metaheuristics. In: Essays and Surveys in Metaheuristics, ed. by C.C. Ribeiro, P. Hansen (Kluwer, Boston 2003) pp. 263-308

50.23 L.F. Gonzalez: Robust Evolutionary Methods for Multi-objective and Multidisciplinary Design in Aeronautics, Ph.D. Thesis (University of Sydney, Sydney 2005)

50.24 D.S. Lee, L.F. Gonzalez, J. Periaux, G. Bugeda: Double-shock control bump design optimization using hybridized evolutionary algorithms, Proc. Inst. Mech. Eng. G: J. Aerosp. Eng. (2011) pp. 1175-1192

50.25 D.S. Lee, L.F. Gonzalez, J. Periaux, K. Srinivas: Evolutionary optimisation methods with uncertainty for modern multidisciplinary design in aeronautical engineering, Notes Numer. Fluid Mech. Multidiscip. Des. 100, 271-284 (2009)

50.26 D.S. Lee, L.F. Gonzalez, J. Periaux, K. Srinivas: Efficient hybrid-game strategies coupled to evolutionary algorithms for robust multidisciplinary design optimization in aerospace engineering, IEEE Trans. Evol. Comput. 15(2), 133-150 (2011)

50.27 D.S. Lee, L.F. Gonzalez, J. Periaux, K. Srinivas, E. Onate: Hybrid-game strategies for multi-objective design optimization in engineering, Comput. Fluids 47, 189-204 (2011)

50.28 D.S. Lee, L.F. Gonzalez, K. Srinivas, J. Periaux: Robust design optimisation using multi-objective evolutionary algorithms, Comput. Fluids 37(5), 565-583 (2008)

50.29 D.S. Lee, L.F. Gonzalez, K. Srinivas, J. Periaux: Robust evolutionary algorithms for UAV/UCAV aerodynamic and RCS design optimisation, Comput. Fluids 37(5), 547-564 (2008)

50.30 D.S. Lee, J. Periaux, L.F. Gonzalez, K. Srinivas, E. Onate: Robust multidisciplinary UAS design optimisation, Struct. Multidiscip. Optim. 45(3), 433-450 (2012)

50.31 D.S. Lee, J. Periaux, E. Onate, L.F. Gonzalez, N. Qin: Active transonic aerofoil design optimization using robust multiobjective evolutionary algorithms, J. Aircr. 48(3), 1084-1094 (2011)

50.32 J.-C. Boisson, L. Jourdan, E.-G. Talbi, D. Horvath: Parallel multi-objective algorithms for the molecular docking problem, IEEE Symp. Comput. Intell. Bioinform. Comput. Biol. (2008) pp. 187-194

50.33 J.-C. Boisson, L. Jourdan, E.-G. Talbi, D. Horvath: Single- and multi-objective cooperation for the flexible docking problem, J. Math. Model. Algorith. 9, 195-208 (2010)

50.34 G. Ewald, W. Kurek, M.A. Brdys: Grid implementation of a parallel multiobjective genetic algorithm for optimized allocation of chlorination stations in drinking water distribution systems: Chojnice case study, IEEE Trans. Syst. Man Cybern. C: Appl. Rev. 38(4), 497-509 (2008)

50.35 J.J. Durillo, A.J. Nebro, F. Luna, E. Alba: Solving three-objective optimization problems using a new hybrid cellular genetic algorithm, Lect. Notes Comput. Sci. 5199, 661-670 (2008)

50.36 C. Leon, G. Miranda, E. Segredo, C. Segura: Parallel hypervolume-guided hyperheuristic for adapting the multi-objective evolutionary island model, Nat. Inspir. Coop. Strat. Optim. (2009) pp. 261-272

50.37 C. Leon, G. Miranda, C. Segura: A self-adaptive island-based model for multi-objective optimization, Genet. Evol. Comput. Conf. (2008) pp. 757-758

50.38 C. Leon, G. Miranda, C. Segura: Hyperheuristics for a dynamic-mapped multi-objective island-based model, Lect. Notes Comput. Sci. 5518, 41-49 (2009)

50.39 C. Leon, G. Miranda, C. Segura: Optimizing the configuration of a broadcast protocol through parallel cooperation of multi-objective evolutionary algorithms, Int. Conf. Adv. Eng. Comput. Appl. Sci. (2008) pp. 135-140

50.40 P. Liu, S. Dong: Parallel multi-objective GA based rotamer optimization on grid, Int. Coll. Comput. Comm. Control. Manag. (CCCM) (2008) pp. 238-241

50.41 M.P. Ferringer, D.B. Spencer, P. Reed: Many-objective reconfiguration of operational satellite constellations with the large-cluster epsilon non-dominated sorting genetic algorithm II, IEEE Congr. Evol. Comput. (2009) pp. 340-349

50.42 P.M. Reed, J.B. Kollat, M.P. Ferringer, T.G. Thompson: Parallel evolutionary multi-objective optimization on large, heterogeneous clusters: An applications perspective, J. Aerosp. Comput. Inf. Commun. 5, 460-478 (2008)

50.43 J.L. Risco-Martin, D. Atienza, J.I. Hidalgo, J. Lanchares: A parallel evolutionary algorithm to optimize dynamic data types in embedded systems, Soft Comput. 12, 1157-1167 (2008)

50.44 J.L. Risco-Martin, D. Atienza, J.I. Hidalgo, J. Lanchares: Parallel and distributed optimization of dynamic data structures for multimedia embedded systems. In: Parallel and Distributed Computational Intelligence, ed. by F.F. Vega, E. Cantú-Paz (Springer, Berlin, Heidelberg 2010) pp. 263-290

50.45 D. Sharma, K. Deb, N.N. Kishore: Towards generating diverse topologies of path tracing compliant mechanisms using a local search based multi-objective genetic algorithm procedure, IEEE Congr. Evol. Comput. (2008) pp. 2004-2011

50.46 V.G. Asouti, K.C. Giannakoglou: Aerodynamic optimization using a parallel asynchronous evolutionary algorithm controlled by strongly interacting demes, Eng. Optim. 41(3), 241-257 (2009)

50.47 S. Bharti, M. Frecker, G. Lesieutre: Optimal morphing-wing design using parallel nondominated sorting genetic algorithm II, AIAA J. 47(7), 1627-1634 (2009)

50.48 M. Camara, J. Ortega, F. de Toro: A single front genetic algorithm for parallel multi-objective optimization in dynamic environments, Neurocomputing 72, 3570-3579 (2009)

50.49 M. Camara, J. Ortega, F. de Toro: Approaching dynamic multi-objective optimization problems by using parallel evolutionary algorithms. In: Advances in Multi-Objective Nature Inspired Computing, ed. by C.A. Coello Coello, C. Dhaenes, L. Jourdan (Springer, Berlin, Heidelberg 2010) pp. 63-86

50.50 P.-C. Chang, S.-H. Chen: The development of a sub-population genetic algorithm II (SPGA II) for multi-objective combinatorial problems, Appl. Soft Comput. 9, 173-181 (2009)

50.51 J. Bader, D. Brockhoff, S. Welten, E. Zitzler: On using populations of sets in multiobjective optimization, Lect. Notes Comput. Sci. 5467, 140-154 (2009)

50.52 J.M. Herrero, S. Garcia-Nieto, X. Blasco, V. Romero-Garcia, J.V. Sanchez-Perez, L.M. Garcia-Raffi: Optimization of sonic crystal attenuation properties by ev-MOGA multiobjective evolutionary algorithm, Struct. Multidiscip. Optim. 39, 203-215 (2009)

50.53 H. Ishibuchi, Y. Sakane, N. Tsukamoto, Y. Nojima: Effects of using two neighborhood structures on the performance of cellular evolutionary algorithms for many-objective optimization, IEEE Congr. Evol. Comput. (2009) pp. 2508-2515

50.54 H. Ishibuchi, Y. Sakane, N. Tsukamoto, Y. Nojima: Implementation of cellular genetic algorithms with two neighborhood structures for single-objective and multi-objective optimization, Soft Comput. 15, 1749-1767 (2011)

50.55 N. Jozefowiez, F. Semet, E.-G. Talbi: An evolutionary algorithm for the vehicle routing problem with route balancing, Eur. J. Oper. Res. 195, 761-769 (2009)

50.56 C.C. Kannas, C.A. Nicolaou, C.S. Pattichis: A parallel implementation of a multi-objective evolutionary algorithm, 9th Int. Conf. Inform. Technol. Appl. Biomed. (2009) pp. 1-6

50.57 C. Leon, G. Miranda, E. Segredo, C. Segura: Parallel library of multi-objective evolutionary algorithms, 17th Euromicro Int. Conf. IEEE (2009) pp. 28-35

50.58 C. Leon, G. Miranda, C. Segura: METCO: A parallel plugin-based framework for multi-objective optimization, Int. J. Artif. Intell. Tools 18(4), 569-588 (2009)

50.59 A. Rama Mohan Rao: Distributed evolutionary multi-objective mesh-partitioning algorithm for parallel finite element computations, Comput. Struct. 87(3), 1469-1473 (2009)

50.60 C. Segura, A. Cervantes, A.J. Nebro, M.D. Jaraiz-Simon, E. Segredo, S. Garcia, F. Luna, J.A. Gómez-Pulido, G. Miranda, C. Luque, E. Alba, M.Á. Vega-Rodríguez, C. León, I.M. Galván: Optimizing the DFCN broadcast protocol with a parallel cooperative strategy of multi-objective evolutionary algorithms, Lect. Notes Comput. Sci. 5467, 305-319 (2009)

50.61 E. Szlachcic, W. Zubik: Parallel distributed genetic algorithm for expensive multi-objective optimization problems, Lect. Notes Comput. Sci. 5717, 938-946 (2009)

50.62 N. Wang, C.-M. Tsai, K.-C. Cha: Optimum design of externally pressurized air bearing using cluster OpenMP, Tribol. Int. 42, 1180-1186 (2009)

50.63 T. Qiu, G. Ju: A selective migration parallel multi-objective genetic algorithm, Chin. Control Decis. Conf. (2010) pp. 463-467

50.64 Z.X. Wang, G. Ju: A parallel genetic algorithm in multi-objective optimization, Chin. Control Decis. Conf. (2009) pp. 3497-3501

50.65 G. Whittaker, R. Confesor Jr., S.M. Griffith, R. Fare, S. Grosskopf, J.J. Steiner, G.W. Mueller-Warrant, G.M. Banow: A hybrid genetic algorithm for multiobjective problems with activity analysis-based local search, Eur. J. Oper. Res. 193, 195-203 (2009)

50.66 M.L. Wong: Parallel multi-objective evolutionary algorithms on graphics processing units, Genet. Evol. Comput. Conf. (2009) pp. 2515-2522

50.67 M.L. Wong: Data mining using parallel multi-objective evolutionary algorithms on graphics hardware, IEEE Congr. Evol. Comput. (2010) pp. 1-8

50.68 A.A. Montaño, C.A. Coello Coello, E. Mezura-Montes: pMODE-LD + SS: An effective and efficient parallel differential evolution algorithm for multi-objective optimization, Lect. Notes Comput. Sci. 6239, 21-30 (2010)

50.69 W. Cancino, L. Jourdan, E.-G. Talbi, A.C.B. Delbem: Parallel multi-objective approaches for inferring phylogenies, Lect. Notes Comput. Sci. 6023, 26-37 (2010)

50.70 W. Cancino, L. Jourdan, E.-G. Talbi, A.C.B. Delbem: Parallel multi-objective evolutionary algorithm for phylogenetic inference, Lect. Notes Comput. Sci. 6073, 196-199 (2010)

50.71 D. Becerra, A. Sandoval, D. Restrepo-Montoya, L.F. Nino: A parallel multi-objective ab initio approach for protein structure prediction, IEEE Int. Conf. Bioinform. Biomed. (2010) pp. 137-141

50.72 D. Dasgupta, D. Becerra, A. Banceanu, F. Nino, J. Simien: A parallel framework for multi-objective evolutionary optimization, IEEE Congr. Evol. Comput. (2010) pp. 1-8

50.73 J.R. Figueira, A. Liefooghe, E.-G. Talbi, A.P. Wierzbicki: A parallel multiple reference point approach for multi-objective optimization, Eur. J. Oper. Res. 205, 390-400 (2010)

50.74 L. Fourment, R. Ducloux, S. Marie, M. Ejday, D. Monnereau, T. Masse, P. Montmitonnet: Mono and multi-objective optimization techniques applied to a large range of industrial test cases using metamodel assisted evolutionary algorithms, 10th Int. Conf. Numer. Methods Ind. Form. (2010) pp. 833-840

50.75 T. Hiroyasu, T. Noda, M. Yoshimi, M. Miki, H. Yokouchi: Examination of multi-objective genetic algorithm using the concept of a peer-to-peer network, 2nd World Congr. Nat. Biol. Inspir. Comput. (2010) pp. 508-512

50.76 I. Kamkar, M.-R. Akbarzadeh-T: Multiobjective cellular genetic algorithm with adaptive fuzzy fitness granulation, IEEE Int. Conf. Syst. Man Cybern. (2010) pp. 4147-4153

50.77 A. Kandil, K. El-Rayes, O. El-Anwar: Optimization research: Enhancing the robustness of large-scale multiobjective optimization in construction, J. Constr. Eng. Manag. 136(1), 17-25 (2009)

50.78 S. Mesmoudi, N. Perrot, R. Reuillon, P. Bourgine, E. Lutton: Optimal viable path search for a cheese ripening process using a multi-objective EA, Int. Conf. Evol. Comput. (2010)

50.79 J. Montgomery, I. Moser: Parallel constraint handling in a multiobjective evolutionary algorithm for the automotive deployment problem, 6th IEEE Int. Conf. e-Sci. Workshops (2010) pp. 104-109

50.80 J.J. Durillo, Q. Zhang, A.J. Nebro, E. Alba: Distribution of computational effort in parallel MOEA/D, Learn. Intell. Optim. (2011) pp. 488-502

50.81 A.J. Nebro, J.J. Durillo: A study of the parallelization of the multi-objective metaheuristic MOEA/D, Lect. Notes Comput. Sci. 6073, 303-317 (2010)

50.82 M. Pilat, R. Neruda: Combining multiobjective and single-objective genetic algorithms in heterogeneous island model, IEEE Congr. Evol. Comput. (2010) pp. 1-8

50.83 J.C. Calvo, J. Ortega, M. Anguita: Comparison of parallel multi-objective approaches to protein structure prediction, J. Supercomput. 58, 253-260 (2011)

50.84 M. Garza-Fabre, G. Toscano-Pulido, C.A. Coello Coello, E. Rodriguez-Tello: Effective ranking + speciation = many-objective optimization, IEEE Congr. Evol. Comput. (2011) pp. 2115-2122

50.85 D. Gladwin, P. Stewart, J. Stewart: Internal combustion engine control for series hybrid electric vehicles by parallel and distributed genetic programming/multiobjective genetic algorithms, Int. J. Syst. Sci. 42(2), 249-261 (2011)

50.86 D.S. Lee, C. Morillo, G. Bugeda, S. Oller, E. Onate: Multilayered composite structure design optimisation using distributed/parallel multi-objective evolutionary algorithms, Compos. Struct. 94(3), 1087-1096 (2012)

50.87 A.L. Marquez, C. Gil, R. Baños, J. Gomez: Parallelism on multicore processors using Parallel.FX, Adv. Eng. Softw. 42, 259-265 (2011)

50.88 B.S.P. Mishra, A.K. Addy, R. Roy, S. Dehuri: Parallel multi-objective genetic algorithms for associative classification rule mining, Int. Conf. Commun. Comput. Secur. (2011) pp. 409-414

50.89 E. Segredo, C. Segura, C. Leon: On the comparison of parallel island-based models for the multiobjectivised antenna positioning problem, 15th Int. Conf. Knowl. Intell. Inf. Eng. Syst. (2011) pp. 32-4

50.90 G.N. Shinde, S.B. Jagtap, S.K. Pani: Parallelizing multi-objective evolutionary genetic algorithms, Proc. World Congr. Eng. (2011) pp. 1534-1537

50.91 M. Yagoubi, L. Thobois, M. Schoenauer: Asynchronous evolutionary multi-objective algorithms with heterogeneous evaluation costs, IEEE Congr. Evol. Comput. (2011) pp. 21-28

50.92 A. Zhang, H. Li, C. Xiao: Parallel computing model for time-varied coordinated voltage/reactive power control, J. Electr. Syst. 7(1), 1-11 (2011)

50.93 W. Zhu, Y. Li: GPU-accelerated differential evolutionary Markov chain Monte Carlo method for multi-objective optimization over continuous space, 2nd Workshop Bio-Inspir. Algorithms Distrib. Syst. (2010) pp. 1-8

50.94 W. Zhu, A. Yaseen, Y. Li: DEMCMC-GPU: An efficient multi-objective optimization method with GPU acceleration on the Fermi architecture, New Gener. Comput. 29, 163-184 (2011)

50.95 K. Deb, A. Pratap, S. Agarwal, T. Meyarivan: A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6(2), 182-197 (2002)

50.96 J. Branke, H. Schmeck, K. Deb, M.S. Reddy: Parallelizing multi-objective evolutionary algorithms: Cone separation, Congr. Evol. Comput. (2004) pp. 1952-1957

50.97 F. Streichert, H. Ulmer, A. Zell: Parallelization of multi-objective evolutionary algorithms using clustering algorithms, Lect. Notes Comput. Sci. 3410, 92-107 (2005)

50.98 Q. Zhang, H. Li: MOEA/D: A multi-objective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11(6), 712-731 (2007)

50.99 W. Gropp, E. Lusk, A. Skjellum: Using MPI: Portable Parallel Programming with the Message-Passing Interface (MIT, London 2000)

50.100 F. Berman, G.C. Fox, A.J.G. Hey: Grid Computing: Making the Global Infrastructure a Reality, Communications Networking and Distributed Systems (Wiley, New York 2003)

50.101 NVIDIA Corporation: NVIDIA CUDA Compute Unified Device Architecture Programming Guide (NVIDIA Corporation, Santa Clara 2007)

50.102 R. Tsuchiyama, T. Nakamura, T. Iizuka, A. Asahara, S. Miki: The OpenCL Programming Book (Fixstars Corporation, Sunnyvale 2010)

50.103 E. Zitzler, K. Deb, L. Thiele: Comparison of multiobjective evolutionary algorithms: Empirical results, Evol. Comput. 8(2), 173-195 (2000)

50.104 K. Deb, L. Thiele, M. Laumanns, E. Zitzler: Scalable test problems for evolutionary multiobjective optimization. In: Evolutionary Multiobjective Optimization. Theoretical Advances and Applications, ed. by A. Abraham, L. Jain, R. Goldberg (Springer, Berlin, Heidelberg 2005) pp. 105-145

50.105 S. Huband, P. Hingston, L. Barone, L. While: A review of multiobjective test problems and a scalable test problem toolkit, IEEE Trans. Evol. Comput. 10(5), 477-506 (2006)

50.106 Z. Pawlak: Rough sets, Int. J. Parallel Program. 11, 341-356 (1982)

50.107 U. Maulik, A. Sarkar: Evolutionary rough parallel multi-objective optimization algorithm, Fundam. Inform. 99(1), 13-27 (2010)

50.108 A.J. Nebro, J.J. Durillo, F. Luna, B. Dorronsoro, E. Alba: A cellular genetic algorithm for multi-objective optimization, Int. J. Intell. Syst. 24(7), 723-725 (2009)

50.109 J.J. Durillo, A.J. Nebro, C.A. Coello, J. Garcia-Nieto, F. Luna, E. Alba: A study of multiobjective metaheuristics when solving parameter scalable problems, IEEE Trans. Evol. Comput. 14(4), 618-635 (2010)

50.110 J.J. Durillo, A.J. Nebro, F. Luna, C.A. Coello Coello, E. Alba: Convergence speed in multi-objective metaheuristics: Efficiency criteria and empirical study, Int. J. Numer. Methods Eng. 84(11), 1344-1375 (2010)

50.111 D.E. Goldberg, K. Deb: A comparative analysis of selection schemes used in genetic algorithms. In: Foundations of Genetic Algorithms, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Mateo 1991) pp. 69-93




51. Many-Objective Problems: 
Challenges and Methods 


Antonio López Jaimes, Carlos A. Coello Coello 


This chapter presents a short review of the state-of-the-art efforts for understanding and solving problems with a large number of objectives (usually known as many-objective optimization problems). The first part of the chapter presents the current studies aimed at discovering the sources that make a multiobjective optimization problem (MOP) harder when more objectives are added, thereby degrading the performance of a multiobjective evolutionary algorithm (MOEA). Next, some of the most relevant techniques designed to deal with many-objective problems are presented and categorized.


51.1 Background
51.2 Basic Concepts and Notation
  51.2.1 Multiobjective Optimization Problems
  51.2.2 Notions of Conflict Among Objectives
51.3 Sources of Difficulty to Solve Many-Objective Optimization Problems
  51.3.1 Deterioration of the Search Ability
  51.3.2 Effectiveness of Crossover Operators
  51.3.3 Dimensionality of the Pareto Front
  51.3.4 Visualization of the Pareto Front
51.4 Current Approaches to Deal with Many-Objective Problems
  51.4.1 Preference Relations to Deal with Many-Objective Problems
  51.4.2 Objective Reduction Approaches
  51.4.3 Preference Incorporation Approaches
51.5 Recombination Operators and Mating Restrictions
51.6 Scalarization Methods
51.7 Conclusions and Research Paths
References


51.1 Background


Since the first implementation of an MOEA in the mid-1980s [51.1], a wide variety of new MOEAs have been proposed, gradually improving in both their effectiveness and efficiency to solve MOPs [51.2]. However, most of these algorithms have been evaluated and applied to problems with only two or three objectives, in spite of the fact that many real-world problems have more than three objectives [51.3-6].

Recent experimental [51.7-9] and analytical [51.10, 11] studies have shown that MOEAs based on Pareto optimality [51.12] scale poorly in MOPs with a high number of objectives (4 or more). These MOPs are usually known in the community as many-objective problems. Although those scalability issues seem mainly to affect Pareto-based MOEAs, as we will see later in this chapter, optimization problems with a large number of objectives introduce some difficulties common to any other multiobjective optimizer.

The goal of this chapter is to present a general 
view of the difficulties posed by many-objective prob- 
lems for Pareto-based MOEAs. Specifically, we present 
a review of the potential sources of difficulty currently 
found in the specialized literature. Likewise, we present 
a brief review of the current proposals to deal with these 
sources of difficulty. These proposals are classified into 
five classes. Among the most common approaches to 
deal with MOPs, we can find the use of preference 
relations to further rank nondominated solutions, the 
removal of redundant objectives during or after the 
search, and the incorporation of preference information. 
Finally, at the end of the chapter some future research 
paths are outlined. 




51.2 Basic Concepts and Notation 


In this section, we will introduce the concepts and no- 
tation that will be used throughout the rest of the paper. 
Since some of these proposals are based on conflict 
information among the objectives, some definitions of 
conflict are also provided. 


51.2.1 Multiobjective Optimization Problems 


Definition 51.1 Multiobjective optimization problem
An MOP is defined as

$$\text{Minimize } \mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})]^T, \quad \text{subject to } \mathbf{x} \in X. \tag{51.1}$$


The vector $\mathbf{x} \in \mathbb{R}^n$ is formed by $n$ decision variables representing the quantities for which values are to be chosen in the optimization problem. The feasible set $X \subseteq \mathbb{R}^n$ is implicitly determined by a set of equality and inequality constraints. The vector function $\mathbf{f}: X \to \mathbb{R}^k$ is composed of $k \ge 2$ scalar objective functions $f_i: X \to \mathbb{R}$ ($i = 1, \ldots, k$). In multiobjective optimization, the sets $\mathbb{R}^n$ and $\mathbb{R}^k$ are known as the decision variable space and objective function space, respectively. The image of $X$ under the function $\mathbf{f}$ is a subset of the objective function space denoted by $Z = \mathbf{f}(X)$ and referred to as the feasible set in the objective function space.

In order to define precisely the multiobjective optimization problem stated in Definition 51.1, we have to establish the meaning of minimization in $\mathbb{R}^k$. That is to say, we need to define how vectors $\mathbf{z} = \mathbf{f}(\mathbf{x}) \in \mathbb{R}^k$ have to be compared for different solutions $\mathbf{x} \in \mathbb{R}^n$. In single-objective optimization the relation less than or equal ($\le$) is used to compare the scalar objective values. By using this relation there may be many different optimal solutions $\mathbf{x} \in X$, but only one optimal value $f_{\min} = \min\{f(\mathbf{x}) \mid \mathbf{x} \in X\}$ since the relation $\le$ induces a total order in $\mathbb{R}$ (i.e., every pair of solutions is comparable, and thus, we can sort solutions from the best to the worst one). In contrast, in multiobjective optimization problems, there is no canonical order of $\mathbb{R}^k$, and thus, we need weaker definitions of order to compare vectors in $\mathbb{R}^k$.

In multiobjective optimization, the Pareto domi- 
nance relation is usually adopted. This relation was 
originally proposed by Edgeworth in 1881 [51.13], but 
generalized by the French-Italian economist Pareto in 
1896 [51.12]. 


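Definition 51.2 Pareto dominance relation
We say that a vector $\mathbf{z}^1$ dominates vector $\mathbf{z}^2$, denoted by $\mathbf{z}^1 \prec \mathbf{z}^2$, if and only if

$$\forall i \in \{1, \ldots, k\}: z_i^1 \le z_i^2 \tag{51.2}$$

and

$$\exists i \in \{1, \ldots, k\}: z_i^1 < z_i^2. \tag{51.3}$$


If $\mathbf{z}^1 = \mathbf{z}^2$ or $z_i^1 > z_i^2$ for some $i$, then we say that $\mathbf{z}^1$ does not dominate $\mathbf{z}^2$ (denoted by $\mathbf{z}^1 \nprec \mathbf{z}^2$). Thus, to solve an MOP, we have to find those solutions $\mathbf{x} \in X$ whose images, $\mathbf{z} = \mathbf{f}(\mathbf{x})$, are not dominated by any other vector in the feasible space. It is said that two vectors, $\mathbf{z}^1$ and $\mathbf{z}^2$, are mutually nondominated vectors if $\mathbf{z}^1 \nprec \mathbf{z}^2$ and $\mathbf{z}^2 \nprec \mathbf{z}^1$.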
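As a minimal sketch of the Pareto dominance relation defined above, the test can be written directly from its two conditions: no worse in every objective, and strictly better in at least one. The function names `dominates` and `mutually_nondominated` are illustrative choices that do not appear in the chapter, and minimization of all objectives is assumed.

```python
from typing import Sequence


def dominates(z1: Sequence[float], z2: Sequence[float]) -> bool:
    """Return True if z1 Pareto-dominates z2 (all objectives minimized)."""
    if len(z1) != len(z2):
        raise ValueError("objective vectors must have the same length")
    # First condition: z1 is no worse than z2 in every objective.
    no_worse = all(a <= b for a, b in zip(z1, z2))
    # Second condition: z1 is strictly better in at least one objective.
    strictly_better = any(a < b for a, b in zip(z1, z2))
    return no_worse and strictly_better


def mutually_nondominated(z1: Sequence[float], z2: Sequence[float]) -> bool:
    """Return True if neither vector dominates the other."""
    return not dominates(z1, z2) and not dominates(z2, z1)
```

Under this sketch, `dominates([1, 2], [2, 2])` holds, a vector never dominates itself, and `[1, 3]` and `[2, 2]` are mutually nondominated because each is better in one objective.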


Definition 51.3 Pareto optimality
A solution $\mathbf{x}^* \in X$ is Pareto optimal if there does not exist another solution $\mathbf{x} \in X$ such that $\mathbf{f}(\mathbf{x}) \prec \mathbf{f}(\mathbf{x}^*)$.


Definition 51.4 Pareto optimal set
The Pareto optimal set, $P_{\mathrm{opt}}$, is defined as

$$P_{\mathrm{opt}} = \{\mathbf{x} \in X \mid \nexists\, \mathbf{y} \in X : \mathbf{f}(\mathbf{y}) \prec \mathbf{f}(\mathbf{x})\}. \tag{51.4}$$


Definition 51.5 Pareto front
For a Pareto optimal set $P_{\mathrm{opt}}$, the Pareto front $PF_{\mathrm{opt}}$ is defined as

$$PF_{\mathrm{opt}} = \{\mathbf{z} = (f_1(\mathbf{x}), \ldots, f_k(\mathbf{x})) \mid \mathbf{x} \in P_{\mathrm{opt}}\}. \tag{51.5}$$


In decision variable space, these vectors are referred to as decision vectors of the Pareto optimal set, while in objective space, they are called objective vectors of the Pareto optimal set. In practice, the goal of MOEAs is to find the best approximation set of the Pareto optimal front. An approximation set is a finite subset of $Z$ composed of mutually nondominated vectors and is denoted by $PF_{\mathrm{approx}}$. Currently, it is well accepted that the best approximation set is determined by the closeness to the Pareto optimal front, and the spread over the entire Pareto optimal front [51.2, 14, 15].
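To make the notion of an approximation set concrete, the following sketch filters a finite list of objective vectors down to its mutually nondominated members. It is a naive quadratic-time filter written for illustration only; the name `nondominated_subset` is hypothetical, and practical MOEAs typically rely on faster sorting or archiving schemes.

```python
from typing import Iterable, List, Sequence, Tuple


def dominates(z1: Sequence[float], z2: Sequence[float]) -> bool:
    """Pareto dominance for minimization (same test as in Definition 51.2)."""
    return (all(a <= b for a, b in zip(z1, z2))
            and any(a < b for a, b in zip(z1, z2)))


def nondominated_subset(vectors: Iterable[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Keep only the vectors that no other vector in the input dominates."""
    # Remove exact duplicates while preserving the input order.
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [z for z in unique
            if not any(dominates(other, z) for other in unique)]


if __name__ == "__main__":
    sample = [(1, 4), (2, 2), (3, 3), (4, 1), (2, 2)]
    print(nondominated_subset(sample))
```

In the sample, `(3, 3)` is discarded because `(2, 2)` is better in both objectives, and the repeated `(2, 2)` is reported once.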




A common approach to deal with a multiobjective optimization problem is to formulate it as a single-objective optimization problem by means of a kind of function called a scalarizing function.


Definition 51.6 Scalarizing function

A scalarizing function is a parameterized function $s: \mathbb{R}^k \to \mathbb{R}$. Thus, the multiobjective problem is transformed into the following scalar problem:

$$\text{Minimize } s(\mathbf{z}), \quad \text{subject to } \mathbf{z} \in Z. \tag{51.6}$$


It is worth noting, however, that scalarizing functions generate one point at a time (instead of several, as happens when using the definition of Pareto optimality). A common scalarizing function is based on the Chebyshev distance ($L_\infty$ metric) [51.16, 17].


Definition 51.7 Weighted Chebyshev scalarizing function
The weighted Chebyshev scalarizing function (or Chebyshev function for short) is defined by

$$s_\infty(\mathbf{z}, \mathbf{z}^{\mathrm{ref}}) = \max_{i=1,\ldots,k} \{\lambda_i |z_i - z_i^{\mathrm{ref}}|\}, \tag{51.7}$$

where $\mathbf{z}^{\mathrm{ref}}$ is a reference point, $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_k]$ is a vector of weights such that $\forall i\ \lambda_i \ge 0$ and, for at least one $i$, $\lambda_i > 0$.
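The sketch below illustrates a weighted Chebyshev scalarizing function as the largest weighted absolute deviation from a reference point, and shows how different weight vectors select different points from the same candidate set. The function name `chebyshev`, the candidate vectors, and the weights are assumptions made for this example only.

```python
from typing import Sequence


def chebyshev(z: Sequence[float], z_ref: Sequence[float],
              weights: Sequence[float]) -> float:
    """Weighted Chebyshev (L-infinity) distance from z to the reference point."""
    if not (len(z) == len(z_ref) == len(weights)):
        raise ValueError("z, z_ref and weights must have the same length")
    # Weights must be nonnegative with at least one strictly positive entry.
    if any(w < 0 for w in weights) or not any(w > 0 for w in weights):
        raise ValueError("weights must be >= 0 with at least one > 0")
    return max(w * abs(zi - ri) for w, zi, ri in zip(weights, z, z_ref))


if __name__ == "__main__":
    candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
    z_ref = (0.0, 0.0)
    # Each weight vector yields one minimizer, i.e., one point at a time.
    for weights in [(0.5, 0.5), (0.9, 0.1)]:
        best = min(candidates, key=lambda z: chebyshev(z, z_ref, weights))
        print(weights, "->", best)
```

Solving the scalar problem once returns a single point; covering a front this way requires repeating the minimization with several weight vectors or reference points.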


51.2.2 Notions of Conflict Among Objectives 


One important condition of a multiobjective problem is the conflict among its objectives. If the objectives have no conflict among them, then we could solve the problem by optimizing each objective function independently. Nonetheless, it has been found that in some problems, although a conflict exists elsewhere, some objectives behave in a nonconflicting manner. Although different authors have proposed definitions for conflict (nonconflict) among objectives [51.18-21], in this chapter we only present the conflict (nonconflict) definitions relevant to this document.


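Definition 51.8

Let $S_X$ be a subset of $X$, then, according to Carlsson and Fullér, two objectives can be related in the following ways (assuming minimization):


1. $f_i$ is in conflict with $f_j$ on $S_X$ if $f_i(\mathbf{x}^1) \le f_i(\mathbf{x}^2)$ implies $f_j(\mathbf{x}^1) \ge f_j(\mathbf{x}^2)$ for all $\mathbf{x}^1, \mathbf{x}^2 \in S_X$.
2. $f_i$ supports $f_j$ on $S_X$ if $f_i(\mathbf{x}^1) \ge f_i(\mathbf{x}^2)$ implies $f_j(\mathbf{x}^1) \ge f_j(\mathbf{x}^2)$ for all $\mathbf{x}^1, \mathbf{x}^2 \in S_X$.
3. $f_i$ and $f_j$ are independent on $S_X$, otherwise.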


In cases 2 and 3, those objectives are also called
nonconflicting objectives. When $S_X = X$, it is said that
$f_i$ is in conflict with (or supports) $f_j$ globally. However, in
many MOPs the relation among the objectives changes 
when comparing different subsets of X. Figure 51.1 
shows an example in which two functions are in con- 
flict in some subsets of X, while in others, they support 
each other. 
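
The three cases of Definition 51.8 can be checked empirically on a finite sample of $S_X$. The sketch below assumes non-strict inequalities in both implications and uses hypothetical one-variable functions; a finite sample only provides evidence about the relation, not a proof for the whole subset.

```python
from itertools import permutations


def classify(fi, fj):
    """Relation between two objectives sampled on the same points of S_X.

    fi[n] and fj[n] are the values of f_i and f_j at the n-th sample point.
    Non-strict inequalities are assumed in both implications.
    """
    pairs = list(permutations(range(len(fi)), 2))
    if all(fj[a] >= fj[b] for a, b in pairs if fi[a] <= fi[b]):
        return "conflict"
    if all(fj[a] >= fj[b] for a, b in pairs if fi[a] >= fi[b]):
        return "support"
    return "independent"


xs = [0.0, 1.0, 2.0, 3.0]             # hypothetical sample of S_X
f1 = [x for x in xs]                  # increasing
f2 = [-x for x in xs]                 # decreasing
f3 = [2.0 * x + 1.0 for x in xs]      # increasing
f4 = [(x - 1.5) ** 2 for x in xs]     # decreasing, then increasing

relations = (classify(f1, f2), classify(f1, f3), classify(f1, f4))
```

Restricting `xs` to a subset on which `f4` is monotone would change its classification, which mirrors the remark that the relation may differ between subsets of $X$.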

Nonconflicting objectives are also known as 
nonessential or redundant objectives because, as 
pointed out by Gal and Hanne [51.22], when a non- 
conflicting objective is removed from the original set of 
objectives, the resulting Pareto front does not change. 
Based on the notion of nonessential objectives, Brock- 
hoff and Zitzler [51.21] proposed a conflict definition 
that verifies whether the Pareto dominance relation 
changes when some objectives are removed, or not. The 
Pareto dominance relation induced by a given set of
objectives, $F \subseteq \{f_1, f_2, \ldots, f_k\}$, is defined as


$\preceq_F = \{(x, y) \mid x, y \in X \text{ and } \forall f_i \in F: f_i(x) \le f_i(y)\}$.


Definition 51.9 

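Let $F_1, F_2 \subseteq \mathcal{F}$ be two subsets of objectives, where $\mathcal{F}$
is the entire set of objectives $\mathcal{F} = \{f_1, f_2, \ldots, f_k\}$. Then,
we call $F_1$ nonconflicting with $F_2$ iff
$(\preceq_{F_1} \subseteq \preceq_{F_2}) \wedge (\preceq_{F_2} \subseteq \preceq_{F_1})$.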


In other words, $F_1$ and $F_2$ are called nonconflicting
if and only if the corresponding relations $\preceq_{F_1}$ and $\preceq_{F_2}$
are identical, but not necessarily $F_1 = F_2$. The nonconflicting
definition is useful since if $F$ and $F' \subseteq F$ are
nonconflicting, then we can replace $F$ with $F'$ and obtain
the same Pareto optimal front. The objectives in $F'$
are then called essential objectives, whereas the objectives
in $F \setminus F'$ are known as nonessential or redundant
objectives.

Fig. 51.1 Two objective functions can be in conflict in
some subsets of the feasible space, and can be supportive
in other subsets (regions labeled Conflict and Support)

In practice, however, it is useful to allow a certain 
extent of change on the Pareto front when an objec- 
tive is omitted in order to define degrees of nonconflict 
among objectives. In this direction, Brockhoff and Zitzler
proposed to use the additive $\epsilon$-dominance indicator
to measure the change between two dominance relations.
The $\epsilon$-dominance relation induced by a set $F$
is defined by
$\preceq_F^{\epsilon} = \{(x, y) \mid x, y \in X \text{ and } \forall f_i \in F: f_i(x) - \epsilon \le f_i(y)\}$.


Definition 51.10 

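Let $F_1, F_2 \subseteq \mathcal{F}$ be two subsets of objectives, where $\mathcal{F}$
is the entire set of objectives. Then, we call $F_1$ $\delta$-nonconflicting
with $F_2$ iff
$(\preceq_{F_1} \subseteq \preceq_{F_2}^{\delta}) \wedge (\preceq_{F_2} \subseteq \preceq_{F_1}^{\delta})$.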


In this case, if an objective subset $F' \subseteq F$ is $\delta$-nonconflicting
with $F$, then we can omit all objectives
in $F \setminus F'$ without causing a larger error than $\delta$ in the
omitted objectives.
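
For a finite set of objective vectors, the smallest $\delta$ for which a subset $F'$ is $\delta$-nonconflicting with the full objective set follows directly from the definitions above: whenever $x \preceq_{F'} y$, $\delta$ must be at least $\max_i (f_i(x) - f_i(y))$ over all objectives. The sketch below is a straightforward quadratic-time computation under that reading; the function name and the sample points are hypothetical.

```python
from itertools import permutations


def delta_error(points, subset):
    """Smallest delta such that the objective indices in `subset` are
    delta-nonconflicting with the full objective set on `points`."""
    k = len(points[0])
    delta = 0.0
    for x, y in permutations(points, 2):
        # x weakly dominates y when only the objectives in `subset` are kept
        if all(x[i] <= y[i] for i in subset):
            # worst violation of x <= y over all k objectives
            delta = max(delta, max(x[i] - y[i] for i in range(k)))
    return delta


# Hypothetical 3-objective sample in which f3 grows together with f1.
points = [(1.0, 2.0, 1.0), (2.0, 1.0, 2.0), (3.0, 0.0, 3.5)]

error_without_f3 = delta_error(points, [0, 1])   # keep f1 and f2
error_only_f1 = delta_error(points, [0])         # keep f1 alone
```

Here dropping `f3` induces no error on the sample, whereas keeping only `f1` does, because `f2` conflicts with `f1`.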


51.3 Sources of Difficulty to Solve Many-Objective Optimization Problems 


51.3.1 Deterioration of the Search Ability 


A widespread explanation for the deterioration of the
search ability of Pareto-based MOEAs is that the proportion
of nondominated solutions (i.e., equally good solutions
according to Pareto dominance) in a population increases
rapidly with the number of objectives [51.23, 24]. In order
to illustrate this condition, Fig. 51.2 shows the nondominated
regions with respect to a given solution z.

In general, as presented by Farina and Amato [51.23],
the proportion, e, of the search space occupied by regions
that are mutually nondominated with respect to a given
solution is given by $e = (2^k - 2)/2^k$, where $k$ is the number
of objectives. This proportion approaches 1 as
the number of objectives approaches infinity.
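
The growth of this proportion is easy to tabulate. The short Python sketch below evaluates $e = (2^k - 2)/2^k$ for a few hypothetical values of $k$.

```python
def nondominated_fraction(k: int) -> float:
    """Fraction of the space that is incomparable to a given point
    for k objectives: (2^k - 2) / 2^k."""
    return (2 ** k - 2) / 2 ** k


# Hypothetical selection of objective counts.
fractions = {k: nondominated_fraction(k) for k in (2, 3, 5, 10, 20)}

for k, e in fractions.items():
    print(f"k = {k:2d}: e = {e:.6f}")
```

Only two of the $2^k$ orthants around a point (the one that dominates it and the one it dominates) are comparable, so the fraction is already above 0.99 for 10 objectives.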


Fig. 51.2 Example of the increasing proportion of nondominated
solutions (axes $f_1$ and $f_2$; legend: Worst, Better, Equal): for
2 objectives 1/2 of the search space is composed of
nondominated regions, whereas for 3 objectives 3/4 of the search
space consists of nondominated regions. In general, for k objectives,
$(2^k - 2)/2^k$ of the objective space comprises nondominated regions


Therefore, since in MOPs with a high number of 
objectives almost all solutions are equivalent, many re- 
searchers have suggested [51.11, 23, 25-28] that in such 
problems, the selection of the appropriate individuals 
for steering the population toward the Pareto optimal 
set gets more difficult. As a result, an MOP gets harder 
to solve as more objectives are added. 

However, as pointed out by Schütze et al. [51.29],
the increase of the number of nondominated individuals
is not a sufficient condition for an increase of
the hardness of a problem. Specifically, they conclude
that, in a class of unimodal problems, the difficulty
increases only marginally when more objectives are
added, despite the exponential growth of the proportion
of nondominated solutions with k. Nonetheless,
they suggest that the hardness increase observed in
experimental studies might be the result of the addition
of local optima to the problem as more objectives are
added.

Therefore, although the rise of the proportion of 
incomparable solutions does not significantly deter- 
mine the difficulty of an MOP per se, it seems that 
the addition of objectives aggravates some particular 
difficulties observed in the context of 2 or 3 objec- 
tives. This is the case of the so-called dominance 
resistant solutions (DRSs) or outliers [51.14, 30-32]. 
DRSs are solutions with a poor value in at least 
one of the objectives, but with near optimal values 
in the others. In other words, those are nondomi- 
nated solutions, but far from the Pareto optimal front. 
Figure 51.3 shows an example of DRSs in the well-known
test problem DTLZ2 [51.14]. These kinds of
solutions represent a potential difficulty since, as many
researchers have pointed out [51.14, 30-32], the number
of DRSs grows as the number of objectives is
increased.


51.3.2 Effectiveness of Crossover Operators 


In a combinatorial class of MOPs, Sato et al. [51.33]
performed a series of experiments that revealed that
solutions in the variable space become more distant from
each other (in terms of the Hamming distance between
binary encoded solutions) as more objectives
are added to the problem. In this scenario, the
recombination of two parents close to the Pareto front
might generate an offspring far from the Pareto front
since a conventional crossover operator might be too
disruptive.


51.3.3 Dimensionality of the Pareto Front 


Due to the curse of dimensionality, the number of points 
required to represent accurately a Pareto front increases 
exponentially with the number of objectives. Formally, 
the number of points necessary to represent a Pareto
front with $k$ objectives and resolution $r$ is bounded by
$O(k r^{k-1})$ [51.34]. This expression is derived assuming
that each solution is contained inside a hypercube
to preserve an even distribution. As can be seen in
Fig. 51.4, the number of hypercubes determines the
resolution of the Pareto front, i.e., $r$ is the number of
hypercubes per dimension. An example of the shortest
connected and nondegenerate 2-objective Pareto front
(a straight line) is shown on the left side of Fig. 51.4.
The figure also shows a bound for the largest Pareto
front for 2 and 3 objectives. In general, the bounding
Pareto front is formed by $k$ hyperplanes containing $r^{k-1}$
hypercubes each (see, for example, the 3-objective case
shown on the right side of Fig. 51.4). This way, the maximum
number of points of a 2-objective Pareto front
with resolution $r = 6$ is $2 \times 6^{2-1} = 12$, whereas for 3
objectives and $r = 5$ it is $3 \times 5^{3-1} = 75$. Table 51.1 shows
the maximum number of points required to represent 
a Pareto front for different numbers of objectives using 
a resolution of r= 25, which is a conservative num- 
ber considering that a resolution of r = 50 is usually 
used in several studies to obtain 100 solutions in 2- 
objective problems. Notwithstanding, for 5 objectives, 
we would require approximately 2 million points to 
represent a Pareto front with resolution r = 25. There
are other formulations leading to a similar exponential
expression with respect to k. For example, using
the concept of $\epsilon$-dominance, Laumanns et al. [51.35]
and Schütze et al. [51.36] give a similar exponential
bound for the size of an approximation of a Pareto
front.

Fig. 51.3 Illustration of some DRSs in problem DTLZ2: although
the solutions marked as DRSs seem to be dominated by some
solution in the lower part of the circled solutions, they achieve marginal
improvements in objectives $f_1$ or $f_2$; therefore, they are nondominated
solutions, despite their poor values in objective $f_3$

Fig. 51.4 Number of points required to represent a Pareto
front with a resolution r, i.e., the number of hypercubes
per dimension

Table 51.1 Bound for the number of points required to represent
a Pareto front with resolution r = 25

k    Points
2    50
4    62 500
5    1 953 125
7    1 708 984 375
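
The bound $k r^{k-1}$ discussed in this section can be evaluated directly. The following Python sketch reproduces the worked examples in the text and the entries of Table 51.1; the function name is hypothetical.

```python
def max_points(k: int, r: int) -> int:
    """Upper bound k * r^(k-1) on the number of points needed to
    represent a k-objective Pareto front with resolution r."""
    return k * r ** (k - 1)


# Entries of Table 51.1 (resolution r = 25).
table = {k: max_points(k, 25) for k in (2, 4, 5, 7)}

for k, points in table.items():
    print(f"k = {k}: {points:,} points")
```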

This poses some difficulties to solve MOPs. The
most important one is the number of function evaluations
required to deal with a large number of solutions.
This is a serious issue since plenty of real-world
problems (e.g., [51.37-43]) have, due to time
constraints, a small budget of function evaluations.
In fact, there is an important research effort
toward designing MOEAs that generate good approximations
of the Pareto front using fewer than 1000
function evaluations (e.g., [51.44-47]). Other challenges
are related to the design of both data structures
to efficiently manage that number of points,
and density estimators to achieve an even distribution
of the solutions along the Pareto front. Unfortunately,
even if we could efficiently obtain an accurate
approximation of the Pareto front, the selection of
one solution among such a huge number of solutions
would be a very difficult task for a decision maker
(DM).


51.3.4 Visualization of the Pareto Front 


Clearly, with more than three objectives it is not pos- 
sible to plot the Pareto front as usual. This is a serious 
problem since visualization plays a key role for a proper 
decision-making process. Parallel coordinates [51.48] 
and self-organizing maps [51.49] are some of the meth- 
ods proposed to ease decision making in high dimen- 
sional problems. The reader is referred to Chapters 8 
and 9 of [51.50] for a good review of various visual- 
ization techniques. Nevertheless, more research in the 
many-objective optimization context is still required. 


51.4 Current Approaches to Deal with Many-Objective Problems 


Besides studies about the scalability of Pareto-based
MOEAs, in the current literature we can find several
proposals to overcome those scalability issues.
The most common approaches can be categorized as
follows:


1. Adopt or propose a preference relation that yields 
a finer solution ordering than the one yielded by 
Pareto optimality. In other words, these relations 
are able to further rank nondominated solutions. In 
addition, most of these preference relations share 
the property that their optimal set of solutions 
is a subset of the Pareto optimal set. Therefore, 
these techniques can also be used as a remedy to 
cope with the dimensionality of Pareto fronts in 
MOPs. 

2. Reduce the number of objectives of the problem 
during the search process or, a posteriori, once 
an approximation of the Pareto front has been 
found [51.21, 26, 51]. The main goal of these kinds
of reduction techniques is to identify the noncon- 
flicting objectives (at least to a certain extent) in 
order to discard them. 

3. Scalarizing decomposition of an MOP. As described
in the previous section, the degradation
observed on MOEAs when dealing with many-objective
problems is mainly attributed to the inefficiency
of the Pareto relation in high-dimensional
spaces. Therefore, methods that do not rely on
Pareto dominance, like scalarizing decomposition
methods, have been suggested as an alternative to
deal with many-objective problems. The underlying
idea of these types of methods is to perform a number
of single-objective searches along different
search vectors evenly distributed over the objective
space. Each single-objective search is formulated by
means of a scalarizing function. This way, the approximation
of the Pareto front is composed of the
optima found by every single-objective search.

4. Incorporation of preference information interactively
throughout the course of the optimization
process. By incorporating preferences we can cope
with MOPs in two aspects. First, the search can be
focused on the decision maker's region of interest,
thus avoiding the evaluation of a huge number
of solutions. Second, the preference relations
usually used in interactive methods help to deal
with a large number of objectives since they are
able to rank incomparable nondominated solutions.

5. Use of specialized recombination operators or 
strategies to control the mating among parents. 
The first approach tries to diminish the disruptive 
effect of recombination operators by regulating the 
proportion in which the traits of each parent con- 
tribute to create the offspring. The second approach 
restricts which individuals can be paired for recom- 
bination, for instance, using the similarity as mating 
criteria or the location in the objective space. 


In the remainder of this section, some of the most 
relevant approaches to deal with many-objective prob- 
lems are presented. 




51.4.1 Preference Relations to Deal 
with Many-Objective Problems 


Bentley and Wakefield [51.52] proposed the average
ranking (AR) and the maximum ranking (MR) preference
relations. The AR relation computes, for each
solution, a separate rank on each objective considered
independently. The final rank is obtained by summing
up the ranks on each objective. In turn, the MR relation
takes the best rank as the global rank. Clearly, the MR
relation favors extreme solutions, i.e., solutions with
high performance in some of the objectives, although
with poor overall performance. Although it is less evident,
the average ranking relation also favors extreme
solutions.
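
A compact sketch of both rankings, assuming minimization and assuming that tied values share the lowest rank (a detail the description above leaves open), could look as follows. The population is hypothetical.

```python
def objective_ranks(population):
    """ranks[p][i] is the rank (1 = best) of solution p on objective i.
    Minimization is assumed; tied values share the lowest rank."""
    k = len(population[0])
    ranks = [[0] * k for _ in population]
    for i in range(k):
        column = [z[i] for z in population]
        for p, value in enumerate(column):
            ranks[p][i] = 1 + sum(other < value for other in column)
    return ranks


def average_ranking(population):
    """AR: sum of the per-objective ranks (lower is better)."""
    return [sum(r) for r in objective_ranks(population)]


def maximum_ranking(population):
    """MR: best per-objective rank (lower is better)."""
    return [min(r) for r in objective_ranks(population)]


# Hypothetical population: two extreme solutions and a balanced one.
population = [(0.0, 1.0, 1.0), (0.4, 0.4, 0.4), (1.0, 0.0, 0.9)]
ar = average_ranking(population)
mr = maximum_ranking(population)
```

In this example MR assigns the same global rank to all three solutions, so an extreme solution that is best in a single objective is never ranked below the balanced one.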

In the favor relation, proposed by Drechsler
et al. [51.53], a vector $z^1$ is preferred to vector $z^2$ with
respect to the favor relation ($z^1 \prec_{favor} z^2$), if and only if


$|\{i : z^1_i < z^2_i,\ 1 \le i \le k\}| > |\{j : z^1_j > z^2_j,\ 1 \le j \le k\}|$.


In other words, the favored vector is that which outper- 
forms the other one in more objectives. Unfortunately, 
this relation emphasizes extreme solutions. 
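
The favor relation reduces to counting objectives. The sketch below, with hypothetical vectors and minimization assumed, shows how an extreme vector can be favored over a balanced one.

```python
def favors(z1, z2):
    """True if z1 is preferred to z2 under the favor relation, i.e., z1
    is strictly better in more objectives than it is strictly worse."""
    better = sum(a < b for a, b in zip(z1, z2))
    worse = sum(a > b for a, b in zip(z1, z2))
    return better > worse


# Hypothetical vectors: `extreme` is very poor in the third objective.
extreme = (0.1, 0.1, 9.0)
balanced = (0.2, 0.2, 0.2)

extreme_wins = favors(extreme, balanced)
balanced_wins = favors(balanced, extreme)
```

Because only the number of improved objectives counts, the size of the loss in the third objective plays no role.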

The preference order relation (POR), developed by
di Pierro [51.54], is based on the concept of efficiency
of order proposed by Das [51.55], which states that
a vector $z^*$ is efficient of order $q$ if it is not dominated
by any other vector in all the $\binom{k}{q}$ objective subsets
of size $q$.

Based on that definition, it is said that the vector
$z^1$ is preferred to the vector $z^2$ ($z^1 \prec_{por} z^2$), if and only
if, for some integer $q$ and $\forall I \subseteq \{1, 2, \ldots, k\}$ such that
$|I| = q$,

$z^1_i \le z^2_i \ \forall i \in I$, and $\exists j \in I: z^1_j < z^2_j$.

In other words, if $z^1$ and $z^2$ do not dominate each other,
then the solutions are compared in a lower dimensional
space in order to break the tie.
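
The notion of efficiency of order $q$ can be illustrated by brute force over all objective subsets of size $q$. The sketch below follows the statement attributed to Das above and is not di Pierro's complete procedure; the helper that returns the smallest such $q$ is an assumption added for illustration, as are the sample vectors.

```python
from itertools import combinations


def dominates(a, b, idx):
    """Pareto dominance of a over b restricted to the objectives in idx."""
    return (all(a[i] <= b[i] for i in idx)
            and any(a[i] < b[i] for i in idx))


def efficient_of_order(z, others, q):
    """True if z is nondominated in every objective subset of size q."""
    return all(not dominates(o, z, idx)
               for idx in combinations(range(len(z)), q)
               for o in others)


def efficiency_order(z, others):
    """Smallest q for which z is efficient of order q (hypothetical
    helper); None if z is dominated in the full objective space."""
    for q in range(1, len(z) + 1):
        if efficient_of_order(z, others, q):
            return q
    return None


# Hypothetical mutually nondominated vectors (minimization).
a, b, c = (1.0, 1.0, 3.0), (2.0, 2.0, 1.0), (1.5, 3.0, 2.0)
orders = [efficiency_order(a, [b, c]),
          efficiency_order(b, [a, c]),
          efficiency_order(c, [a, b])]
```

All three vectors are Pareto nondominated, yet `a` remains nondominated in every 2-objective subspace, which gives a way to break the tie in its favor.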

Sato et al. [51.56] proposed a preference relation to
control the dominance area of solutions. This method
controls the degree of expansion or contraction of the
dominance area by modifying each objective vector $z$
with the expression

$z'_i = \frac{r \cdot \sin(\omega_i + s_i \cdot \pi)}{\sin(s_i \cdot \pi)}, \quad \forall i = 1, 2, \ldots, k,$

where $s \in \mathbb{R}^k$ is a user-defined vector, $r = \|z\|$, and $\omega_i$
is the declination angle between $z$ and the axis of $f_i$.
If the user adopts values $s_i < 0.5$ ($\forall i = 1, 2, \ldots, k$),
the dominance area is expanded, which produces
a more fine-grained ranking of solutions and thus
strengthens the selection process. Thus, we can say that
vector $z$ is preferred to vector $y$ with respect to the
expansion relation ($z \prec_{expansion} y$), if and only if $z' \prec y'$.
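
The transformation can be sketched in a few lines of Python. The code below assumes that the declination angle is computed as $\omega_i = \arccos(z_i / r)$, which makes the transformation the identity for $s_i = 0.5$; the vectors and the value of $s$ are hypothetical.

```python
import math


def modify(z, s):
    """Objective vector z' used to control the dominance area.
    Assumes omega_i = arccos(z_i / r) with r the Euclidean norm of z."""
    r = math.sqrt(sum(v * v for v in z))
    if r == 0.0:
        return tuple(z)
    out = []
    for zi, si in zip(z, s):
        omega = math.acos(max(-1.0, min(1.0, zi / r)))
        out.append(r * math.sin(omega + si * math.pi)
                   / math.sin(si * math.pi))
    return tuple(out)


def dominates(a, b):
    """Usual Pareto dominance (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


# Hypothetical vectors that are incomparable under Pareto dominance.
z, y = (1.0, 2.0), (2.0, 1.9)
s = (0.25, 0.25)                      # s_i < 0.5 expands the dominance area

comparable_before = dominates(z, y) or dominates(y, z)
preferred_after = dominates(modify(z, s), modify(y, s))
```

With the expanded dominance area the two vectors become comparable, which is the finer ranking that the relation is meant to provide.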

Farina and Amato [51.57] proposed an alternative
relation which takes into account the number of
improved objectives between two solutions. This relation
employs three quantities, $n_b(x_1, x_2)$, $n_e(x_1, x_2)$ and
$n_w(x_1, x_2)$, which denote the number of objectives where
$x_1$ is better than, equal to, or worse than $x_2$, respectively.
Using these quantities, the concepts of $(1-k)$-dominance and
$k$-optimality are defined. A solution $x_1$ $(1-k)$-dominates
$x_2$ if and only if


$n_e(x_1, x_2) < M$ and $n_b(x_1, x_2) \ge \frac{M - n_e}{k + 1}$,


where $M$ denotes the number of objectives. In a similar way
to Pareto optimality, a solution $x^*$
is a $k$-optimum if and only if there is no $x$ in the decision
variable space such that $x$ $k$-dominates $x^*$.
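
A small Python sketch of this test is given below. It assumes the form $n_e < M$ and $n_b \ge (M - n_e)/(k + 1)$ with $M$ the number of objectives and $k$ a parameter in $[0, 1]$, so that $k = 0$ recovers Pareto dominance; the objective vectors are hypothetical.

```python
def counts(f1, f2):
    """Number of objectives where f1 is better than, equal to, or
    worse than f2 (minimization)."""
    n_b = sum(a < b for a, b in zip(f1, f2))
    n_e = sum(a == b for a, b in zip(f1, f2))
    n_w = sum(a > b for a, b in zip(f1, f2))
    return n_b, n_e, n_w


def one_minus_k_dominates(f1, f2, k):
    """(1-k)-dominance test; k is assumed to lie in [0, 1]."""
    m = len(f1)
    n_b, n_e, _ = counts(f1, f2)
    return n_e < m and n_b >= (m - n_e) / (k + 1)


# Hypothetical objective vectors: u is better in two of three objectives.
u, v = (1.0, 1.0, 5.0), (2.0, 2.0, 4.0)

pareto_like = one_minus_k_dominates(u, v, 0.0)   # k = 0: Pareto dominance
relaxed = one_minus_k_dominates(u, v, 0.5)
```

Increasing $k$ lowers the number of improved objectives that is required, so more pairs of solutions become comparable.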

An important caveat regarding new preference relations
is that, although some of them converge faster to the
Pareto front than the Pareto dominance relation, they
also stress the generation of solutions far from the
knee region (usually the middle region of the Pareto
front). This condition limits the applicability of these
relations since, in the general case, it is commonly
assumed that the DM prefers solutions from the knee
region [51.58-61].


51.4.2 Objective Reduction Approaches 


Deb and Saxena [51.26] proposed a method for re- 
ducing the number of objectives based on principal 
component analysis. The main assumption is that if two 
objectives are negatively correlated (taking the gener- 
ated Pareto front as the data set), then these objectives 
are in conflict with each other. To determine the most 
conflicting objectives (i. e., the most essential), the au- 
thors analyze in turn the eigenvectors (i. e., the principal 
components) of the correlation matrix. That is, by pick- 
ing the most negative and the most positive elements 
from the first eigenvector, we can identify the two most 
important conflicting objectives. To add more
objectives to the set of essential objectives, the remainder
of the eigenvectors are analyzed in a similar way
until the cumulative contribution of the eigenvalues
exceeds a threshold cut (TC). This method is incorporated
into an iterative scheme which uses a multiobjective 
optimizer (the actual implementation uses the nondom- 
inated sorting genetic algorithm II (NSGA-II) [51.62]) 
to obtain a reduced objective set containing only the 
nonredundant objectives according to the analysis of the 
eigenvectors. In this scheme, the evolutionary multiob- 
jective optimizer is first run and then, the correlation 
analysis is carried out to obtain a reduced set of objec- 
tives. This process is repeated using the new reduced 
set of objectives. The process stops when the current 
subset is equal to the subset generated in the previous 
iteration. 
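
The first step of this analysis (picking the most negative and the most positive elements of the first eigenvector of the correlation matrix) can be sketched without external libraries. The code below is only an illustration of that step: it substitutes a plain power iteration for a full eigendecomposition, omits the threshold cut and the iterative scheme with the optimizer, and uses a hypothetical Pareto front approximation.

```python
import math


def correlation_matrix(front):
    """Pearson correlation matrix of the objectives over the front.
    Assumes that no objective is constant on the front."""
    n, k = len(front), len(front[0])
    means = [sum(z[i] for z in front) / n for i in range(k)]
    devs = [[z[i] - means[i] for i in range(k)] for z in front]
    norms = [math.sqrt(sum(d[i] ** 2 for d in devs)) for i in range(k)]
    return [[sum(d[i] * d[j] for d in devs) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def first_eigenvector(matrix, iterations=500):
    """Dominant eigenvector by power iteration (illustrative substitute
    for a proper eigendecomposition)."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]     # arbitrary start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def most_conflicting_pair(front):
    """Indices of the most negative and most positive elements of the
    first principal component."""
    v = first_eigenvector(correlation_matrix(front))
    return sorted((v.index(min(v)), v.index(max(v))))


# Hypothetical front: f1 and f2 are perfectly negatively correlated.
front = [(0, 4, 2), (1, 3, 1), (2, 2, 3), (3, 1, 2), (4, 0, 3)]
pair = most_conflicting_pair(front)
```

The pair is reported as sorted indices because the sign of an eigenvector is arbitrary, so only the two selected objectives are meaningful, not which of them is labeled negative.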

Brockhoff and Zitzler [51.21] defined two kinds of 
objective reduction problems and two corresponding al- 
gorithms to solve them. The problems proposed are the 
following: 


1. The $\delta$-MOSS problem. Given an MOP, the $\delta$-minimum
objective subset problem is defined as follows.
• Input: A Pareto front approximation of the MOP
and a $\delta \in \mathbb{R}$.
• Task: Compute the minimum objective subset
$F' \subseteq F$ such that $F'$ is $\delta$-nonconflicting with $F$.

2. The K-EMOSS problem. Given an MOP, the problem
of finding the minimum objective subset of size
K with minimum error is defined as follows.
• Input: A Pareto front approximation of the MOP
and a $K \in \mathbb{N}$.
• Task: Compute an objective subset $F' \subseteq F$ with
size $|F'| \le K$, such that $F'$ is $\delta$-nonconflicting
with $F$ with the minimum possible $\delta$.


Since both problems are NP-hard, the authors
proposed both an exact and a greedy algorithm for
each of them. The exact algorithms for both problems
have time complexity $O(m^2 k \cdot 2^k)$, where $m$ is the size
of the given nondominated set and $k$ is the number
of objectives. On the other hand, the greedy algorithm
for the $\delta$-MOSS problem has time complexity
$O(\min\{m^2 k^3, m^4 k^2\})$, while the greedy algorithm for
the K-EMOSS problem has time complexity $O(m^2 k^3)$.
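
To make the K-EMOSS task concrete, the following sketch enumerates every objective subset of a given size and keeps the one with the smallest $\delta$ error. This is a naive enumeration written for illustration, not the exact or greedy algorithm of Brockhoff and Zitzler; it relies on the observation that adding objectives to a subset cannot increase its error, so only subsets of exactly the requested size need to be examined. All names and data are hypothetical.

```python
from itertools import combinations, permutations


def delta_error(points, subset):
    """Smallest delta for which `subset` is delta-nonconflicting with
    the full objective set on the given points."""
    k = len(points[0])
    delta = 0.0
    for x, y in permutations(points, 2):
        if all(x[i] <= y[i] for i in subset):
            delta = max(delta, max(x[i] - y[i] for i in range(k)))
    return delta


def k_emoss_bruteforce(points, size):
    """Objective subset of the given size with minimum delta error,
    found by exhaustive enumeration (exponential in the worst case)."""
    k = len(points[0])
    best = min(combinations(range(k), size),
               key=lambda subset: delta_error(points, subset))
    return best, delta_error(points, best)


# Hypothetical nondominated set with three objectives.
points = [(1.0, 2.0, 1.0), (2.0, 1.0, 2.0), (3.0, 0.0, 3.5)]
subset, error = k_emoss_bruteforce(points, 2)
```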

A similar approach was proposed by López Jaimes 
et al. [51.51]. They proposed two different objective re- 
duction algorithms: 


1. An algorithm that finds a minimum subset of 
nonredundant objectives with the minimum error 
possible. 


2. An algorithm that finds a K-size subset of nonre- 
dundant objectives, yielding the minimum error 
possible. 


Both algorithms are based on an unsupervised 
feature selection technique proposed by Mitra 
et al. [51.63], in which the correlation coefficient 
is used to estimate the conflict among objectives. 
Specifically, a negative correlation between a pair of 
objectives means that one objective increases, while 
the other decreases and vice versa (see, for example, 
the functions in Fig. 51.1). On the other hand, if the 
correlation is positive, then both objectives increase or 
decrease at the same time. This way, we could interpret 
that the more negative the correlation between two 
objectives, the more the conflict between them. 
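
The correlation-based estimate of conflict can be sketched as follows. The code only computes the pairwise Pearson correlation over a Pareto front approximation and reports the most negatively correlated pair; it does not reproduce the feature selection technique itself, and the front is hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences.
    Assumes that neither sequence is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def most_conflicting(front):
    """Pair of objective indices with the most negative correlation."""
    k = len(front[0])
    columns = [[z[i] for z in front] for i in range(k)]
    pair = min(combinations(range(k), 2),
               key=lambda p: pearson(columns[p[0]], columns[p[1]]))
    return pair, pearson(columns[pair[0]], columns[pair[1]])


# Hypothetical Pareto front approximation with three objectives.
front = [(0, 4, 2), (1, 3, 1), (2, 2, 3), (3, 1, 2), (4, 0, 3)]
pair, corr = most_conflicting(front)
```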

These two algorithms were designed to be used af- 
ter an approximation of the Pareto front has been found. 
From a general point of view, the removal of the
nonconflicting objectives can help the problem designer
or the decision maker to gain knowledge about the
relation and importance of the objectives according to the
conflict. With regard to the decision-making process,
the removal of the nonconflicting objectives eases the
visualization of the approximation of the Pareto front.
In cases with a moderate number of objectives (i.e., 4-7),
the reduced objective set might be visualized using
the traditional 3D plots.

However, an objective reduction technique can also 
be used in the course of the search. In [51.64], for 
instance, the authors proposed the incorporation of 
an objective reduction technique into a Pareto-based 
MOEA in order to cope with many-objective problems 
during the search. One possible approach is gradually 
reducing the number of objectives throughout different 
stages of the search until a target objective subset size 
has been reached. In each reduction stage, an objec- 
tive reduction method is applied on the current Pareto 
front approximation. Toward the end of the search, the 
original objective set is used again to approximate the 
entire Pareto front. This kind of approach can be advan- 
tageous for solving real-world problems with expensive 
objective functions since only a small subset of the 
objective functions is evaluated. Additionally, the use 
of a small set of objectives throughout the course of 
the search makes possible the adoption of expensive 
ranking schemes (e.g., those based on the hypervolume 
indicator) in problems with a high number of objec- 
tives [51.65]. 

A further approach, presented in [51.66], consists
in partitioning the objective set into several subsets so
that a different portion of the population focuses the
search on a different subspace. The partitioning of the
set of objectives is based on the analysis of the conflict
information obtained from the current Pareto front
approximation.


51.4.3 Preference Incorporation Approaches 


Like the alternative preference relations reviewed in 
Sect. 51.4.1, the integration of DM’s preferences pro- 
vides a finer rank of the solutions. However, unlike 
preference relation approaches, in an interactive ap- 
proach, the region of interest can be changed during 
the search according to the requirements of the decision 
maker. 

Among the earliest attempts to incorporate prefer- 
ences in an MOEA, we can find Fonseca and Flem- 
ing’s proposal [51.67, 68]. This proposal consisted of 
extending the ranking mechanism of multiobjective ge- 
netic algorithm (MOGA) [51.69] using the so-called 
preferability relation. This relation accommodates goal 
information (equivalent to a reference point in other 
methods) and priorities in a single preference relation. 
The DM should define goal values and group objectives
according to their priority. Using the preferability
relation, two solutions are first compared in terms of 
the group of objectives with the highest priority. If the 
objectives of both solutions meet all their goal values 
or, contrarily, violate some or all of their goal values 
in a similar way, the next priority objective group is 
considered. This process continues until reaching the 
lowest priority group, where solutions are compared us- 
ing the Pareto dominance relation. By setting particular 
goals and priorities the authors derived the following 
special cases: the usual Pareto relation, lexicographic 
relation, constrained optimization, and goal program- 
ming. One disadvantage of this relation is that it is 
affected by the feasibility of the goal provided by the 
decision maker. If the given goal is far away from the 
feasible region, then the solutions will be mainly com- 
pared in terms of the objective priorities, reducing the 
relation to the lexicographic relation. In addition, if two 
solutions either do or do not meet their goals, the rela- 
tion does not take into account the degree of under- or 
over-attainment. 

Deb [51.70] proposed a technique to transform goal
programming problems into multiobjective optimization
problems which are then solved using an MOEA.
In goal programming, the DM has to assign the goals that
he or she wishes to achieve for each objective, and these
values are incorporated into the problem as additional
constraints. The objective function then attempts to
minimize the absolute deviations from the goals to the
objectives. Unfortunately, like the previous method, this
approach is sensitive to the feasibility of the goal values.
If the goal is contained in the feasible space, it
could prevent the generation of a better solution. On
the other hand, if the goal is located far away from the
feasible space, the effect of the method is practically
nonexistent.

More recently, Deb and Sundar [51.71] incorporated
a reference point approach into the NSGA-II [51.72].
They introduced a modification in the crowding
distance operator in order to select from the last
nondominated front the solutions that would be part
of the new population. They used the Euclidean distance
to sort and rank the population accordingly (the
solution closest to the reference point receives the best 
rank). This method was designed to take into account 
a set of reference points. The drawback of this scheme 
is that it only guarantees weak Pareto optimality. That 
is to say, besides Pareto optimal solutions, the method 
might generate some weakly Pareto optimal solutions, 
particularly in MOPs with disconnected Pareto fronts. 
A similar approach was also proposed by Deb and 
Kumar [51.73], in which the light beam search pro- 
cedure [51.74] was incorporated into the NSGA-II. 
Similar to the previous approach, they modified the 
crowding operator to incorporate DM’s preferences. 
They used a weighted achievement function to assign 
a crowding distance to each solution in each front. 
Thus, the solution with the least distance will have 
the best crowding rank. Like in the previous approach, 
this algorithm finds a subset of solutions around the 
optimum of the achievement function adopting the 
usual outranking relation. A vector $z^1$ outranks vector
$z^2$ if $z^1$ is considered to be at least as good as $z^2$.
In [51.74], three kinds of thresholds are defined to de- 
termine if one solution outranks another one, namely, 
indifference, preference, and veto threshold. However, 
in [51.73] the veto threshold is the only one used. This 
relation depends on the crowding comparison opera- 
tor. In contrast, the new preference relation presented 
in this work does not depend on external methods, 
and, therefore, it can be used in every Pareto-based 
MOEA. 

Recently, Thiele et al. [51.75] proposed a variant
of the indicator-based evolutionary algorithm
(IBEA) [51.76], in which preference information is
incorporated by means of an achievement scalarization
function. The basic idea is to divide the original
indicator value (which is to be maximized) by the
achievement value (which is to be minimized). Thus,
solutions with a smaller achievement value will be preferred
since the modified indicator value is larger. In
a further paper, the new IBEA of Thiele et al. was used
in [51.77] in order to approximate the entire Pareto front
by defining several reference points.

Fig. 51.5 Nondominated solutions with respect to the
Chebyshev relation

A recent interactive optimization method was proposed
by López Jaimes et al. [51.78] to deal with
MOPs. This method is based on a Chebyshev achievement
function. The basic idea of the Chebyshev preference
relation is to combine the Pareto dominance
relation and the achievement function to compare
solutions in objective function space. The Chebyshev
preference relation is defined as follows.


Definition 51.11 
A solution $z^1$ is preferred to solution $z^2$ with respect to
the Chebyshev relation ($z^1 \prec_{cheby} z^2$), if and only if:


1. $s_\infty(z^1, z^{ref}) < s_\infty(z^2, z^{ref}) \wedge \{z^1 \notin R(z^{ref}, \delta) \vee z^2 \notin R(z^{ref}, \delta)\}$, or,
2. $z^1 \prec z^2 \wedge \{z^1, z^2 \in R(z^{ref}, \delta)\}$,


where
$R(z^{ref}, \delta) = \{z \mid s_\infty(z, z^{ref}) \le s_\infty^{\min} + \delta\}$


is the region of interest (ROI) with respect to the vector
of aspiration levels $z^{ref}$.


As an illustration of the preference relation, consider
solutions $z^1$ and $z^2$ presented in Fig. 51.5.
Since $z^2 \notin R(z^{ref}, \delta)$ and $s_\infty(z^1, z^{ref}) < s_\infty(z^2, z^{ref})$,
then $z^1 \prec_{cheby} z^2$.
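
The two cases of Definition 51.11 translate into a short comparison routine. In the sketch below, $s_\infty^{\min}$ is assumed to be the smallest achievement value found in the current set of solutions, and the weights, reference point, $\delta$, and population are hypothetical.

```python
def chebyshev(z, z_ref, weights):
    """Weighted Chebyshev achievement value of z."""
    return max(w * abs(a - r) for a, r, w in zip(z, z_ref, weights))


def dominates(a, b):
    """Usual Pareto dominance (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def cheby_preferred(z1, z2, z_ref, weights, s_min, delta):
    """True if z1 is preferred to z2 under the Chebyshev relation."""
    s1 = chebyshev(z1, z_ref, weights)
    s2 = chebyshev(z2, z_ref, weights)
    both_in_roi = s1 <= s_min + delta and s2 <= s_min + delta
    if both_in_roi:
        return dominates(z1, z2)      # case 2: Pareto dominance in the ROI
    return s1 < s2                    # case 1: achievement value decides


# Hypothetical data for a 2-objective problem.
z_ref, weights, delta = (0.0, 0.0), (1.0, 1.0), 0.6
population = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (4.0, 3.0)]
# Assumption: s_min is the best achievement value in the population.
s_min = min(chebyshev(z, z_ref, weights) for z in population)

outside = cheby_preferred((1.0, 2.0), (4.0, 3.0), z_ref, weights, s_min, delta)
inside = cheby_preferred((1.5, 1.5), (1.0, 2.0), z_ref, weights, s_min, delta)
```

Inside the ROI, nondominated solutions remain incomparable, so diversity is preserved there, while solutions outside the ROI are totally ordered by their achievement values.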


51.5 Recombination Operators and Mating Restrictions 


The idea of restricted mating is not new in the field of 
evolutionary optimization. For instance, in 1989 Deb 
and Goldberg [51.79] suggested the use of restrictive 
mating with respect to the phenotypic (i. e., using the 
decoded values of the variables) distance using some 
metric. A different approach consisted in distributing 
solutions on a logical topology. For example, Baita 
et al. [51.80] placed solutions on a grid and restricted 
the area within which each solution could mate. For 
more examples of restricted mating, the reader is re- 
ferred to [51.2]. 

Recently, specific mating techniques to deal with
many-objective problems have been proposed. Sato
et al. [51.81] described a local recombination scheme
that recombines individuals if they have similar search
directions in the objective space. The search direction is
defined by the polar coordinates of each solution, i.e.,
its norm and declination angles to the axis associated
with the first $k - 1$ objectives.


In order to control the disruptive effect of recombination,
in [51.33], a crossover operator for binary
representation was proposed, namely the controlling
crossed genes (CCG) operator. This technique
was applied to the two-point and uniform
crossover operators. In two-point crossover, from the
three binary segments in which two parents are di- 
vided, the middle segment is exchanged between 
the parents to produce two children. Thus, in the 
CCG operator for two-point crossover, the length of 
the middle segment is regulated by a user param- 
eter. This way, as the middle segment gets shorter, 
the generated children become more similar to each 
parent. 

Regarding uniform crossover, the number of ex- 
changed bits between parents is regulated with the 
probability of writing a 1 or a 0 in the bit mask string 
that determines which parent bit will be copied into the 
produced offspring. 
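To make the two variants concrete, the sketch below shows one possible reading in Python. It is not the operator of [51.33]: how the user parameter regulates the segment length is not specified here, so the sketch simply assumes an upper bound `max_len` on the exchanged segment, and a swap probability `p_swap` for the mask bits. Both parameter names and both function names are hypothetical.

```python
import random


def controlled_two_point(p1, p2, max_len, rng):
    """Two-point crossover whose exchanged middle segment has at most
    max_len bits (assumed form of the user parameter)."""
    n = len(p1)
    length = rng.randint(1, min(max_len, n))   # segment length in [1, max_len]
    start = rng.randint(0, n - length)         # first exchanged position
    end = start + length
    child1 = p1[:start] + p2[start:end] + p1[end:]
    child2 = p2[:start] + p1[start:end] + p2[end:]
    return child1, child2


def controlled_uniform(p1, p2, p_swap, rng):
    """Uniform crossover in which each mask bit is 1 with probability
    p_swap; a 1 means the bit is exchanged between the parents."""
    mask = [1 if rng.random() < p_swap else 0 for _ in p1]
    child1 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    child2 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    return child1, child2


rng = random.Random(1)
zeros, ones = [0] * 10, [1] * 10
c1, c2 = controlled_two_point(zeros, ones, max_len=3, rng=rng)
u1, u2 = controlled_uniform(zeros, ones, p_swap=0.2, rng=rng)
```

Small values of `max_len` or `p_swap` keep each child close to one parent, which is the intended way of limiting the disruptive effect of recombination.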




51.6 Scalarization Methods 


Most of the scalarization methods have in common the 
following mechanisms (although they differ in the way 
in which they are implemented): 


• A class of scalarizing functions to evaluate solutions.

• A mechanism to generate a uniform distribution of
search direction vectors.

• A mechanism to obtain an overall ranking of the
solutions derived from the evaluation of each
scalarizing function.
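The three mechanisms listed above can be sketched generically in Python. This is a minimal illustration rather than any specific published method: it assumes minimization, uses the standard weighted sum and weighted Chebyshev functions, generates direction vectors on a simplex lattice, and ranks each solution by the best position it reaches over all directions. The function names, the ideal point `z_ideal`, and the lattice resolution `h` are assumptions made for the example.

```python
from itertools import product


def weighted_sum(z, w):
    """Weighted sum scalarizing function."""
    return sum(wi * zi for wi, zi in zip(w, z))


def weighted_chebyshev(z, w, z_ideal):
    """Weighted Chebyshev scalarizing function relative to z_ideal."""
    return max(wi * abs(zi - ui) for wi, zi, ui in zip(w, z, z_ideal))


def simplex_lattice(k, h):
    """Weight vectors with k components in {0, 1/h, ..., 1} that sum to 1."""
    return [tuple(c / h for c in combo)
            for combo in product(range(h + 1), repeat=k)
            if sum(combo) == h]


def best_rank(points, weights, z_ideal):
    """For each point, the best (lowest) rank over all search directions."""
    best = [len(points)] * len(points)
    for w in weights:
        order = sorted(range(len(points)),
                       key=lambda i: weighted_chebyshev(points[i], w, z_ideal))
        for rank, i in enumerate(order):
            best[i] = min(best[i], rank)
    return best


weights = simplex_lattice(2, 2)
points = [(1, 4), (4, 1), (3, 3), (5, 5)]
ranks = best_rank(points, weights, z_ideal=(0, 0))
```

In this small example the first three points are each the best solution for at least one direction, while the dominated point (5, 5) is last for every direction.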


Hughes [51.82] proposed a method in which the
weighted Chebyshev function and the vector angle
distance scaling are used as scalarizing functions. The
method to generate the search directions is formulated
as the problem of maximizing the angle between each
pair of neighboring search vectors. The fitness of each
solution in the current population is based on the best
result obtained over all the scalarizing functions, i.e., the
search direction in which the solution performs best.

Another algorithm that has recently been tested
on many-objective problems is the multiobjective
evolutionary algorithm based on decomposition
(MOEA/D) [51.83]. In [51.84], the performance of
MOEA/D using either a weighted sum function or
a Chebyshev function was studied using several
instances of a knapsack problem. The results showed that
the weighted sum function provided better results than
the Chebyshev function, while in nonconvex problems,
the Chebyshev function helped to achieve a better
performance of MOEA/D.


51.7 Conclusions and Research Paths


This chapter presented a short review of the current
advances in using MOEAs to cope with MOPs that have
a high number of objectives. We covered results aimed at
discovering and studying the causes that make an MOP
more difficult as more objectives are added. We also
described and classified some of the current techniques
to deal with MOPs.

Regarding the sources of difficulty of many-objective
optimization problems, we observe that most of the
initial works are based on experimental analysis, and
only a few studies focus on investigating the nature of
the problem using theoretical considerations. When
interest in many-objective optimization problems began,
some hypotheses about the causes of the poor
performance of MOEAs on MOPs were suggested. Although
some of them were considered highly probable and may
turn out to be true, further investigation is still needed
to confirm or refute these hypotheses. This was the case
for the proportion of nondominated solutions, which was
often taken as a sufficient condition to increase the
difficulty of an MOP. However, recent studies have shown
that there exist some problems in which this proportion
rises exponentially, while the hardness of the problem
only increases marginally. In this sense, future research
paths must be channeled to investigate other sources of
difficulty. Some promising areas of future research are,
for example, the following:


• Since DRSs are not present in every MOP, a
characterization of the problems that promote the
creation of DRSs is required.

• Investigate whether recombination operators in
continuous spaces also represent an issue, as observed
in discrete spaces.


Regarding the methods to solve MOPs, many proposals
have been designed to improve the search ability
of MOEAs in high-dimensional scenarios. However,
few efforts have been made to develop visualization
methods specialized for MOPs. Similarly, more proposals
for coping with the dimensionality of the Pareto
front are needed, for instance, diversity mechanisms
that are effective in large spaces, or data structures that
efficiently manage a large number of solutions. With
respect to the assessment of a new MOEA in
many-objective scenarios, we recommend adopting
a diverse set of MOPs, taking instances from different
families of test suites.




References 


51.1 J.D. Schaffer: Multiple objective optimization with vector evaluated genetic algorithms, Proc. 1st Int. Conf. Genet. Algorithms (1985) pp. 93-100
51.2 C.A. Coello Coello, G.B. Lamont, D.A. Van Veldhuizen: Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. (Springer, New York 2007)
51.3 C.M. Fonseca, P.J. Fleming: Multiobjective optimization and multiple constraint handling with evolutionary algorithms – Part II: Application Example, IEEE Trans. Syst. Man Cybern. Part A 28(1), 38-47 (1998)
51.4 E.J. Hughes: Radar waveform optimisation as a many-objective application benchmark, Lect. Notes Comput. Sci. 4403, 700-714 (2007)
51.5 T. Stewart, O. Bandte, H. Braun, N. Chakraborti, M. Ehrgott, M. Göbelt, Y. Jin, H. Nakayama, S. Poles, D. Di Stefano: Real-world applications of multiobjective optimization, Lect. Notes Comput. Sci. 5252, 285-327 (2009)
51.6 R.A. Shah, P.M. Reed, T.W. Simpson: Many-objective evolutionary optimisation and visual analytics for product family design. In: Multi-objective Evolutionary Optimisation for Product Design and Manufacturing, ed. by L. Wang, A.H.C. Ng, K. Deb (Springer, London 2011) pp. 137-159
51.7 E.J. Hughes: Evolutionary many-objective optimisation: Many once or one many?, IEEE Congr. Evol. Comput. (CEC'2005), Vol. 1 (IEEE Service Center, Edinburgh 2005) pp. 222-227
51.8 T. Wagner, N. Beume, B. Naujoks: Pareto-, aggregation-, and indicator-based methods in many-objective optimization, Lect. Notes Comput. Sci. 4403, 742-756 (2007)
51.9 K. Praditwong, X. Yao: How well do multi-objective evolutionary algorithms scale to large problems, IEEE Congr. Evol. Comput. (CEC'2007) (IEEE, Singapore 2007) pp. 3959-3966
51.10 O. Teytaud: On the hardness of offline multi-objective optimization, Evol. Comput. 15, 475-491 (2007)
51.11 J. Knowles, D. Corne: Quantifying the effects of objective space dimension in evolutionary multiobjective optimization, Lect. Notes Comput. Sci. 4403, 757-771 (2007)
51.12 V. Pareto: Cours D'Economie Politique (F. Rouge, Paris 1896)
51.13 F.Y. Edgeworth: Mathematical Psychics (P. Keagan, London 1881)
51.14 K. Deb, L. Thiele, M. Laumanns, E. Zitzler: Scalable multi-objective optimization test problems, Congr. Evol. Comput. (CEC'2002), Vol. 1 (IEEE Service Center, New Jersey 2002) pp. 825-830
51.15 E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V.G. da Fonseca: Performance assessment of multiobjective optimizers: An analysis and review, IEEE Trans. Evol. Comput. 7(2), 117-132 (2003)

51.16 K.M. Miettinen: Nonlinear Multiobjective Optimization (Kluwer, Boston 1998)
51.17 M. Ehrgott: Multicriteria Optimization, 2nd edn. (Springer, Berlin 2005)
51.18 C. Carlsson, R. Fullér: Multiple criteria decision making: The case for interdependence, Comput. Oper. Res. 22(3), 251-260 (1995)
51.19 R.C. Purshouse, P.J. Fleming: Conflict, harmony, and independence: Relationships in evolutionary multi-criterion optimisation, Lect. Notes Comput. Sci. 2632, 16-30 (2003)
51.20 K.C. Tan, E.F. Khor, T.H. Lee: Multiobjective Evolutionary Algorithms and Applications (Springer, London 2005)
51.21 D. Brockhoff, E. Zitzler: Are all objectives necessary? On dimensionality reduction in evolutionary multiobjective optimization, Lect. Notes Comput. Sci. 4193, 533-542 (2006)
51.22 T. Gal, T. Hanne: Consequences of dropping nonessential objectives for the application of MCDM methods, Eur. J. Oper. Res. 119(2), 373-378 (1999)
51.23 M. Farina, P. Amato: On the optimal solution definition for many-criteria optimization problems, Proc. NAFIPS-FLINT Int. Conf. 2002 (IEEE Service Center, New Jersey 2002) pp. 233-238
51.24 P. Winkler: Random orders, Order 1(4), 317-331 (1985)
51.25 R.C. Purshouse, P.J. Fleming: Evolutionary multi-objective optimisation: An exploratory analysis, Proc. Congr. Evol. Comput. (CEC'2003), Vol. 3 (IEEE, Canberra 2003) pp. 2066-2073
51.26 K. Deb, D.K. Saxena: Searching for Pareto-optimal solutions through dimensionality reduction for certain large-dimensional multi-objective optimization problems, IEEE Congr. Evol. Comput. (CEC'2006) (IEEE, Vancouver 2006) pp. 3353-3360
51.27 M. Köppen, K. Yoshida: Substitute distance assignments in NSGA-II for handling many-objective optimization problems, Lect. Notes Comput. Sci. 4403, 727-741 (2007)
51.28 H. Ishibuchi, N. Tsukamoto, Y. Nojima: Evolutionary many-objective optimization: A short review, Congr. Evol. Comput. (CEC'2008) (IEEE Service Center, Hong Kong 2008) pp. 2424-2431
51.29 O. Schütze, A. Lara, C.A. Coello Coello: On the influence of the number of objectives on the hardness of a multiobjective optimization problem, IEEE Trans. Evol. Comput. 15(4), 444-455 (2011)
51.30 K. Ikeda, H. Kita, S. Kobayashi: Failure of Pareto-based MOEAs: Does non-dominated really mean near to optimal?, Proc. IEEE Congr. Evol. Comput. (CEC'2001), Vol. 2 (IEEE Service Center, Piscataway 2001) pp. 957-962


51.31 T. Hanne: Global multiobjective optimization with evolutionary algorithms: Selection mechanisms and mutation control, Lect. Notes Comput. Sci. 1993, 197-212 (2001)
51.32 S. Huband, L. Barone, L. While, P. Hingston: A scalable multi-objective test problem toolkit, Lect. Notes Comput. Sci. 3410, 280-295 (2005)
51.33 H. Sato, H. Aguirre, K. Tanaka: Genetic diversity and effective crossover in evolutionary many-objective optimization, Lect. Notes Comput. Sci. 6683, 91-105 (2011)
51.34 P. Sen, J.-B. Yang: Multiple Criteria Decision Support in Engineering Design (Springer, London 1998)
51.35 M. Laumanns, L. Thiele, K. Deb, E. Zitzler: Combining convergence and diversity in evolutionary multiobjective optimization, Evol. Comput. 10, 263-282 (2002)
51.36 O. Schütze, M. Laumanns, E. Tantar, C.A. Coello Coello, E.-G. Talbi: Computing gap free Pareto front approximations with stochastic search algorithms, Evol. Comput. 18(1), 65-96 (2010)
51.37 A. Arias Montaño, C.A. Coello Coello, E. Mezura-Montes: Multi-objective evolutionary algorithms in aeronautical and aerospace engineering, IEEE Trans. Evol. Comput. 16(5), 662-694 (2012)
51.38 J. Braun, J. Krettek, F. Hoffmann, T. Bertram: Multi-objective optimization with controlled model assisted evolution strategies, Evol. Comput. 17(4), 577-593 (2009)
51.39 K. Chiba, A. Oyama, S. Obayashi, K. Nakahashi, H. Morino: Multidisciplinary design optimization and data mining for transonic regional-jet wing, AIAA J. Aircr. 44(4), 1100-1112 (2007)
51.40 T. Kipouros, G.T. Parks, A.M. Savill, D.M. Jaeggi: Multi-objective aerodynamic design optimisation, ERCOFTAC Design Optimization: Methods Appl. Conf. Proc., ed. by K.C. Giannakoglou, W. Haase (2004), on CD-ROM
51.41 P.M. Kruse, J. Wegener, S. Wappler: A highly configurable test system for evolutionary black-box testing of embedded systems, Proc. 11th Annu. Conf. Genet. Evol. Comput., GECCO '09 (ACM, New York 2009) pp. 1545-1552
51.42 P. Stewart, D.A. Stone, P.J. Fleming: Design of robust fuzzy-logic control systems by multi-objective evolutionary methods with hardware in the loop, Eng. Appl. Artif. Intell. 17(3), 275-284 (2004)
51.43 P. Wozniak: Preferences in multi-objective evolutionary optimisation of electric motor speed control with hardware in the loop, Appl. Soft Comput. 11(1), 49-55 (2011)
51.44 M. Emmerich, B. Naujoks: Metamodel assisted multiobjective optimisation strategies and their application in airfoil design. In: Adaptive Computing in Design and Manufacture VI, ed. by I.C. Parmee (Springer, London 2004) pp. 249-260
51.45 J. Knowles: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems, IEEE Trans. Evol. Comput. 10(1), 50-66 (2006)

51.46 C.A. Georgopoulou, K.C. Giannakoglou: A multi-objective metamodel-assisted memetic algorithm with strength-based local refinement, Eng. Optim. 41(10), 909-923 (2009)
51.47 S. Zapotecas Martinez, C.A. Coello Coello: A memetic algorithm with non gradient-based local search assisted by a meta-model, Lect. Notes Comput. Sci. 6238, 576-585 (2010)
51.48 E.J. Wegman: Hyperdimensional data analysis using parallel coordinates, J. Am. Stat. Assoc. 85, 664-675 (1990)
51.49 S. Obayashi, D. Sasaki: Visualization and data mining of Pareto solutions using self-organizing map, Lect. Notes Comput. Sci. 2632, 796-809 (2003)
51.50 J. Branke, K. Deb, K. Miettinen, R. Słowiński (Eds.): Multiobjective Optimization: Interactive and Evolutionary Approaches, Lecture Notes in Computer Science, Vol. 5252 (Springer, New York 2008)
51.51 A. López Jaimes, C.A. Coello Coello, D. Chakraborty: Objective reduction using a feature selection technique, Genet. Evol. Comput. Conf. (GECCO'2008) (ACM Press, Atlanta 2008) pp. 674-680
51.52 P.J. Bentley, J.P. Wakefield: Finding acceptable solutions in the Pareto-optimal range using multi-objective genetic algorithms. In: Soft Computing in Engineering Design and Manufacturing Part 5, ed. by P.K. Chawdhry, R. Roy, R.K. Pant (Springer, London 1997) pp. 231-240
51.53 N. Drechsler, R. Drechsler, B. Becker: Multi-objected optimization in evolutionary algorithms using satisfiability classes, Lect. Notes Comput. Sci. 1625, 108-117 (1999)
51.54 F. di Pierro, S.-T. Khu, D.A. Savić: An investigation on preference order ranking scheme for multiobjective evolutionary optimization, IEEE Trans. Evol. Comput. 11(1), 17-45 (2007)
51.55 I. Das: A preference ordering among various Pareto optimal alternatives, Struct. Multidiscip. Optim. 18(1), 30-35 (1999)
51.56 H. Sato, H.E. Aguirre, K. Tanaka: Controlling dominance area of solutions and its impact on the performance of MOEAs, Lect. Notes Comput. Sci. 4403, 5-20 (2007)
51.57 M. Farina, P. Amato: A fuzzy definition of "optimality" for many-criteria optimization problems, IEEE Trans. Syst. Man Cybern. Part A 34(3), 315-326 (2004)
51.58 I. Das: On characterizing the "knee" of the Pareto curve based on normal-boundary intersection, Struct. Optim. 18(2/3), 107-115 (1999)
51.59 C.A. Mattson, A.A. Mullur, A. Messac: Smart Pareto filter: Obtaining a minimal representation of multiobjective design space, Eng. Optim. 36(6), 721-740 (2004)
51.60 J. Branke, K. Deb, H. Dierolf, M. Osswald: Finding knees in multi-objective optimization, Lect. Notes Comput. Sci. 3242, 722-731 (2004)


51.61 O. Schütze, M. Laumanns, C.A. Coello Coello: Approximating the knee of an MOP with stochastic search algorithms, Lect. Notes Comput. Sci. 5199, 795-804 (2008)
51.62 K. Deb, S. Agrawal, A. Pratap, T. Meyarivan: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, Lect. Notes Comput. Sci. 1917, 849-858 (2000)
51.63 P. Mitra, C.A. Murthy, S.K. Pal: Unsupervised feature selection using feature similarity, IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 301-312 (2002)
51.64 A. López Jaimes, C.A. Coello Coello, J.E. Urías Barrientos: Online objective reduction to deal with many-objective problems, Lect. Notes Comput. Sci. 5467, 423-437 (2009)
51.65 D. Brockhoff, E. Zitzler: Improving hypervolume-based multiobjective evolutionary algorithms by using objective reduction methods, 2007 IEEE Congr. Evol. Comput. (CEC'2007) (IEEE, Singapore 2007) pp. 2086-2093
51.66 A. López Jaimes, H. Aguirre, K. Tanaka, C.A. Coello Coello: Objective space partitioning using conflict information for many-objective optimization, Lect. Notes Comput. Sci. 6238, 657-666 (2010)
51.67 C.M. Fonseca, P.J. Fleming: Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization, Proc. 5th Int. Conf. Genet. Algorithms, ed. by S. Forrest (Morgan Kaufmann, San Mateo 1993) pp. 416-423
51.68 C.M. Fonseca, P.J. Fleming: Multiobjective optimization and multiple constraint handling with evolutionary algorithms – Part I: A unified formulation, IEEE Trans. Syst. Man Cybern. Part A 28(1), 26-37 (1998)
51.69 C.M. Fonseca, P.J. Fleming: An overview of evolutionary algorithms in multiobjective optimization, Evol. Comput. 3(1), 1-16 (1995)
51.70 K. Deb: Solving goal programming problems using multi-objective genetic algorithms, Congr. Evol. Comput. 1999 (IEEE Service Center, Washington 1999) pp. 77-84
51.71 K. Deb, J. Sundar: Reference Point Based Multi-Objective Optimization Using Evolutionary Algorithms, 2006 Genetic Evol. Comput. Conf. (GECCO'2006), ed. by M. Keijzer (ACM, Seattle 2006) pp. 635-642
51.72 K. Deb, A. Pratap, S. Agarwal, T. Meyarivan: A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6(2), 182-197 (2002)

51.73 K. Deb, A. Kumar: Light beam search based multi-objective optimization using evolutionary algorithms, IEEE Congr. Evol. Comput. (CEC'2007) (IEEE, Singapore 2007) pp. 2125-2132
51.74 A. Jaszkiewicz, R. Słowiński: The light beam search approach – An overview of methodology and applications, Eur. J. Oper. Res. 113(2), 300-314 (1999)
51.75 L. Thiele, K. Miettinen, P.J. Korhonen, J. Molina: A preference-based evolutionary algorithm for multi-objective optimization, Evol. Comput. 17, 411-436 (2009)
51.76 E. Zitzler, S. Künzli: Indicator-based selection in multiobjective search, Lect. Notes Comput. Sci. 3242, 832-842 (2004)
51.77 J.R. Figueira, A. Liefooghe, E.-G. Talbi, A.P. Wierzbicki: A parallel multiple reference point approach for multi-objective optimization, Eur. J. Oper. Res. 205(2), 390-400 (2010)
51.78 A. López-Jaimes, A. Arias-Montaño, C.A. Coello Coello: Preference incorporation to solve many-objective airfoil design problems, IEEE Congr. Evol. Comput. (CEC'2011) (2011) pp. 1605-1612
51.79 K. Deb, D.E. Goldberg: An investigation of niche and species formation in genetic function optimization, Proc. 3rd Int. Conf. Genet. Algorithms (Morgan Kaufmann, San Francisco 1989) pp. 42-50
51.80 F. Baita, F. Mason, C. Poloni, W. Ukovich: Genetic algorithm with redundancies for the vehicle scheduling problem. In: Evolutionary Algorithms in Management Applications, ed. by J. Biethahn, V. Nissen (Springer, Berlin 1995) pp. 341-353
51.81 H. Sato, H.E. Aguirre, K. Tanaka: Local dominance and local recombination in MOEAs on 0/1 multiobjective knapsack problems, Eur. J. Oper. Res. 181(3), 1708-1723 (2007)
51.82 E.J. Hughes: MSOPS-II: A general-purpose many-objective optimiser, IEEE Congr. Evol. Comput. (CEC'2007) (IEEE, Singapore 2007) pp. 3944-3951
51.83 Q. Zhang, H. Li: MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11(6), 712-731 (2007)
51.84 H. Ishibuchi, Y. Sakane, N. Tsukamoto, Y. Nojima: Adaptation of scalarizing functions in MOEA/D: An adaptive scalarizing function-based multiobjective evolutionary algorithm, Lect. Notes Comput. Sci. 5467, 438-452 (2009)


52. Memetic and Hybrid Evolutionary Algorithms 


Jhon Edgar Amaya, Carlos Cotta Porras, Antonio J. Fernandez Leiva 


This chapter presents an overview of hybridization
mechanisms in evolutionary algorithms. Such
mechanisms are aimed at introducing problem
knowledge into the optimization technique
by means of the synergistic combination of
general-purpose methods and problem-specific
add-ons. This combination is presented in this
work from two wide perspectives: memetic
algorithms and cooperative optimization models.
Memetic algorithms are based on the smart
orchestration of global (population-based) and local
(trajectory-based) techniques, using an algorithmic
scheme in which the latter are often subordinated
to the former. As to cooperative models, they are
based on the collaboration of different optimization
techniques that exchange information in order
to boost their respective performances. Both
approaches, memetic algorithms and cooperative
models, provide a framework to achieve synergistic
algorithmic combinations for the resolution of
large-scale combinatorial problems.


52.1 Overview
52.2 A Bird's View of Evolutionary Algorithms
52.3 From Hybrid Metaheuristics to Hybrid EAs
52.3.1 Hybridization Mechanisms
52.4 Memetic Algorithms
52.5 Cooperative Optimization Models
52.6 Conclusions
References


52.1 Overview


Heuristic methods aim to efficiently produce
near-optimal solutions for hard problems (optimization
problems in particular). We are here specifically
concerned with those methods used to solve an
optimization problem by means of an intelligent
exploration of the search space and the fruitful exploitation
of knowledge about the problem structure. This is
admittedly a very broad class of methods that comprises,
among others, classical artificial intelligence tools
such as the A* algorithm as well as modern optimization
techniques such as metaheuristics [52.1]. The latter
are general-purpose techniques for optimization that
guide some underlying basic heuristics for intelligently
exploring the search space of the problem under
consideration.

There exists a plethora of metaheuristic methods,
each of them with its own distinctive features and
governing parameters, typically (yet not always) based
on some analogy of a real-world phenomenon (be it
in the area of biology, zoology, physics, etc.). Indeed,
there have been several attempts in the literature to 
classify these techniques according to different crite- 
ria, e.g. whether they are inspired by nature or not, 
use of memory, neighborhood structure, use of sin- 
gle solutions or populations thereof, etc. Blum and 
Roli [52.1] proposed a classification in which a dis- 
tinction was firstly made between trajectory-based (or 
single-point search) and population-based techniques 
(see also Fig. 52.1). The former can be depicted as fol- 
lowing a particular trajectory (sequence of points) in 
the search space by the smart exploration of the neigh- 
borhood of a single solution (this is to some extent 
an oversimplification, since trajectory-based techniques 
are often endowed with intensification/diversification 
mechanisms that may turn this trajectory into complex 
branching paths; nevertheless, it serves as an initial
analogy). The latter are, however, better imagined as
a cloud of points moving through the search space, 
expanding and contracting according to some internal 
dynamics. 

Fig. 52.1 Classification of metaheuristics according to Blum and
Roli (after [52.1]). Criteria shown: nature-inspired vs.
non-nature-inspired; population-based vs. single-point search;
one neighborhood structure vs. various neighborhood structures;
methods with memory vs. memory-less methods; dynamic vs.
static objective function

Despite what the above depiction may suggest in
terms of the superiority or adequateness of methods
falling within some particular class, it is not possible
to state that any method is better than any other
one, at least not in a general sense. This is a somewhat
counterintuitive result that was formally derived
by Wolpert and Macready [52.2] in the so-called "no
free lunch theorem" (NFL). This theorem can be
formulated as

$$\sum_f P(x_m \mid f, A, e) = \sum_f P(x_m \mid f, B, e), \tag{52.1}$$

where $P(x_m \mid f, A, e)$ is the probability that algorithm A
detects the optimal solution for a generic objective
function f using computational effort e (i.e., generating
e different solutions) and $P(x_m \mid f, B, e)$ is the
analogous probability for algorithm B. In other words,
the average performance of any pair of algorithms
across all possible problems defined on particular
domains and co-domains is identical. Hence, whenever an
algorithm performs well on a certain problem or class
of problems, it follows that it will exhibit degraded
performance on the set of all remaining problems. While
the initial assumptions from which the NFL theorem is
derived are questionable (most importantly, the
consideration of all possible problems includes many functions
that are random or incompressible in an algorithmic
information-complexity sense, and hence cannot be
efficiently calculated, thus rendering them irrelevant from
an optimization point of view), the concept that there
is no universal optimizer had a significant impact on
the scientific community and provides a safe ground
onto which particular optimization procedures can be
built. To be precise, the NFL theorem highlights the
limitations of black-box optimization procedures, i.e.,
techniques whose search strategy is independent or
unaware of the internal working of the objective function
that is being optimized, and emphasizes the need for
trying to exploit domain knowledge within the search
algorithm in order to tailor the optimization process to
the problem under consideration.
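The identity (52.1) can be checked by brute force on a toy case. The following Python sketch is only an illustration under stated assumptions: a three-point search space, binary objective values, maximization, and two deterministic algorithms that never revisit a point. The names `algo_a`, `algo_b`, and `successes` are hypothetical; `algo_a` scans the points in a fixed order, while `algo_b` adapts its next point to the last observed value.

```python
from itertools import product

X = (0, 1, 2)   # search space
Y = (0, 1)      # possible objective values


def algo_a(history):
    """Non-adaptive search: visit the points in the fixed order 0, 1, 2."""
    visited = [x for x, _ in history]
    return next(x for x in X if x not in visited)


def algo_b(history):
    """Adaptive search: start at 1, then try 0 first after observing a
    high value, and 2 first after observing a low value."""
    if not history:
        return 1
    visited = [x for x, _ in history]
    preferred = (0, 2) if history[-1][1] == 1 else (2, 0)
    return next(x for x in preferred if x not in visited)


def successes(algo, effort):
    """Number of functions f: X -> Y for which the algorithm has sampled
    a global maximum after `effort` distinct evaluations."""
    count = 0
    for values in product(Y, repeat=len(X)):
        f = dict(zip(X, values))
        history = []
        for _ in range(effort):
            x = algo(history)
            history.append((x, f[x]))
        if max(y for _, y in history) == max(values):
            count += 1
    return count


table = {e: (successes(algo_a, e), successes(algo_b, e)) for e in (1, 2, 3)}
```

Summed over all eight functions, both algorithms succeed equally often for every effort level, even though they behave differently on individual functions.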

The argument above is commonly used to support
the development and utilization of hybrid metaheuristics,
where the term hybrid is used to denote in a broad
sense the exploitation of problem-dependent knowledge
(typically attained via the sensible combination of
general-purpose and problem-specific mechanisms).
Indeed, these hybrid methods can be shown to provide
efficient behavior and notable flexibility for dealing
with real-world problems. The general idea here
is to achieve a synergistic combination of complementary
techniques in order to enhance their strengths and
alleviate their weaknesses. Roughly speaking,
such hybrid approaches can be attained in two different
(and complementary) ways: cooperation
(the techniques involved exchange information in order
to boost their respective performances) and integration
(one of the techniques is subordinated to the other one,
which uses the former as a tool to achieve some internal
goal) [52.3].

Arguably, one of the advantages (if not from 
the performance point of view at least from the de- 
sign point of view) of population-based methods over 
trajectory-based methods is their greater flexibility 
when it comes to integrating different metaheuristics. 
For example, cooperative methods can often be defined
as a population of (possibly heterogeneous) search
agents exchanging information according to some
underlying connection topology. Different architectures
for such cooperative methods have been defined, e.g. 
MAGMA [52.4] or COSEARCH [52.5], depending 
on the communication strategy and the intervening 
methods. We can also cite hyper-heuristics [52.6, 7] 
in this regard, i.e., the use of a high-level heuristic 
to control the application of a population of low-level 
heuristics. 

The above ideas fit nicely with the notion of 
memetic algorithm (MA). MAs are a family of meta- 
heuristics that try to blend several concepts from 
population-based and trajectory-based techniques. The 
term memetic comes from meme, a word coined by 
Dawkins [52.8] as an analogy to the gene in the con- 
text of cultural evolution. In this sense, there is, indeed, 
a connection between cultural evolution and memetic 


Memetic and Hybrid Evolutionary Algorithms | 52.2 A Bird's View of Evolutionary Algorithms 1049 


algorithms, in the sense that memes are much more 
plastic and flexible than genes — and hence evolve 
faster — and can be subject to lifetime learning, thus 
leading to the transmission of acquired traits (much un- 
like biological evolution). Due to the way in which this 
can be implemented, MAs are often termed hybrid EAs 
or Lamarckian EAs, among other fancy terms. From 
a general perspective, we can say that an MA is a search 
strategy in which a population of optimizing agents — 
explicitly concerned with using knowledge from the 
problem being solved — synergistically cooperate and 
compete [52.9]. 

Focusing on combinatorial optimization problems, 
that is, problems whose solution space is composed 
of combinatorial structures such as graphs, trees, sets, 
lists, permutations, etc., built on a discrete collec- 
tion of variables, MAs and hybrid metaheuristics are 
very well suited to their resolution. On the one hand, 
the solutions to these problems are information-rich 
structures that the algorithmic designer can analyze 
in order to extract problem information to be later 
used in the optimizer (some attempts have been made 
to automatically extract this information in combinatorial
contexts as well [52.10]). This contrasts with
most continuous optimization problems, in which the
high dimensionality and highly nonlinear coupling of
variables make them much more opaque in general
(not to mention that black-box scenarios are more fre- 
quent in this continuous domain, e.g., optimization of 
physical or industrial processes via simulations of the 
system). On the other hand, it is very often the case 
that the objective function for combinatorial problems 
is decomposable or at least incrementally computable, 
meaning that after a small perturbation has been intro- 
duced in a solution the latter does not need to be fully 
evaluated from scratch (only an incremental term de- 
pendent on the modification done must be computed). 
This makes the use of local search strategies much 
more computationally amenable than in most contin- 
uous domains. Before describing MAs in more detail, 
let us first overview some generic ideas about evo- 
lutionary algorithms (EAs) and hybrid metaheuristics. 
Throughout the discussion we will focus on combi- 
natorial problems and provide illustrative examples 
on the instantiation of these techniques for discrete 
optimization. 
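As one such illustration, incremental evaluation can be sketched for the traveling salesman problem under a 2-opt move. The Python code below is a minimal example, assuming a symmetric distance matrix and a tour stored as a list of city indices. The function names are hypothetical. Reversing a segment changes only two edges, so the change in tour length is obtained from four matrix lookups instead of a full re-evaluation.

```python
def tour_length(tour, dist):
    """Full evaluation: sum of all edge lengths of the closed tour."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def two_opt_delta(tour, dist, i, j):
    """Incremental evaluation: change in length caused by reversing
    tour[i+1 .. j], for 0 <= i < j < len(tour). Assumes dist is symmetric."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[(j + 1) % n]
    # Edges (a, b) and (c, d) are replaced by (a, c) and (b, d).
    return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]


def apply_two_opt(tour, i, j):
    """Return a new tour with the segment tour[i+1 .. j] reversed."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]


dist = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
tour = [0, 2, 1, 4, 3]
delta = two_opt_delta(tour, dist, 0, 2)
new_tour = apply_two_opt(tour, 0, 2)
```

The full evaluation costs time proportional to the number of cities, whereas the delta is computed in constant time, which is what makes repeated neighborhood exploration affordable in a local search.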


52.2 A Bird's View of Evolutionary Algorithms 


An EA is a stochastic iterative procedure for gener- 
ating candidate solutions for a certain problem. The 
algorithm manipulates a pool pop of individuals (the so- 
called population), each of them carrying one or more 
chromosomes. Chromosomes are, in turn, composed 
of smaller pieces called genes, each of them taking 
a value from a certain domain (the allele set). Chro- 
mosomes represent a solution for the problem at hand 
via an encoding/decoding process. More precisely, EAs 
assume the existence of a phenotype space compris- 
ing the solutions for the problem under consideration 
and a genotype space, comprising all possible chro- 
mosomes. It is between these two sets that the growth 
(or expression) function is defined so as to have the 
mapping between chromosomes and solutions. While 
in some cases these two spaces may be identical, this 
does not generally happen. In this general situation, the 
growth function is merely required to be surjective. 
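A toy growth function makes the surjectivity requirement concrete. In the Python sketch below, which is purely illustrative, the genotype space is the set of 3-bit chromosomes and the phenotype space is the set of integers 0 to 4. The decoding rule (binary value modulo 5) and all names are assumptions chosen for brevity. Every solution is reachable, but several chromosomes may map to the same solution.

```python
from itertools import product

N_BITS = 3
PHENOTYPES = range(5)   # the solutions 0, 1, 2, 3, 4


def growth(chromosome):
    """Map a bit tuple to a solution: binary value modulo 5."""
    value = int("".join(str(bit) for bit in chromosome), 2)
    return value % len(PHENOTYPES)


genotypes = list(product((0, 1), repeat=N_BITS))
images = {growth(g) for g in genotypes}
```

Here eight genotypes cover five phenotypes, so the mapping is surjective but redundant: for example, (0, 0, 0) and (1, 0, 1) both express the solution 0.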
The pool of solutions is initialized either at ran- 
dom or by means of some heuristic seeding procedure. 
Each individual then receives a fitness value quantify- 
ing how good the solution it carries is. This value will 
be used by the EA for guiding the search. The initial
population is actually the playground on which the
EA will subsequently work, iteratively applying some 
evolutionary operators to modify its contents. More pre- 
cisely, the process comprises two major stages: selec- 
tion (promising solutions are selected for breeding and 
survival), and reproduction (new solutions are created 
by modifying selected solutions using some reproduc- 
tive operators). Selection is further decomposed into 
two sub-stages: the first one is selection for reproduc- 
tion (often simply called selection) in which solutions 
from the population are picked and fed to the repro- 
duction stage; the second one is selection for survival 
(commonly called replacement) in which the new solu- 
tions obtained in the reproduction stage are inserted in 
the population at the expense of removing some older 
solutions. Both selection sub-stages are present in EAs 
(although in some cases one of these sub-stages may 
take a very simplistic form; e.g., random selection for 
reproduction is sometimes used in evolution strategies). 
This selection-reproduction cycle is repeated until a certain
termination criterion is fulfilled (usually reaching a maximum
number of fitness computations; some more complex
criteria based on stagnation detection are also
possible [52.11]). Each iteration of this process
is commonly termed a generation. The whole process is 
illustrated in Algorithm 52.1. Every possible instantia- 
tion of this algorithmic template leads to a different EA. 


Algorithm 52.1 A Basic Evolutionary Algorithm
function BasicEA (in P: Problem, in par: Parameters): Solution;
begin
    pop ← INITIALIZE(par, P);
    repeat
        newpop1 ← SELECT(pop, par, P);
        newpop2 ← REPRODUCE(newpop1, par, P);
        pop ← REPLACE(pop, newpop2);
    until TERMINATIONCRITERION(par);
    return GETBEST(pop);
end
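
The template of Algorithm 52.1 can be instantiated in many ways. The following Python sketch is one possible instantiation, not the authors' implementation: it assumes bit-string chromosomes, a fitness function to be maximized, binary tournament selection, one-point crossover, bit-flip mutation, and plus replacement. All of these operator choices, as well as the names `basic_ea`, `pop_size`, `max_evals`, and `p_mut`, are illustrative assumptions.

```python
import random


def basic_ea(fitness, n_bits, pop_size=20, max_evals=2000, p_mut=None, seed=0):
    """Illustrative instantiation of the basic EA template (maximization).

    Assumes n_bits >= 2. Returns a (fitness, chromosome) pair.
    """
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits if p_mut is None else p_mut

    # INITIALIZE: random bit strings, each stored together with its fitness.
    scored = []
    for _ in range(pop_size):
        ind = [rng.randint(0, 1) for _ in range(n_bits)]
        scored.append((fitness(ind), ind))
    evals = pop_size

    # TERMINATIONCRITERION: a budget of fitness computations.
    while evals < max_evals:
        # SELECT (selection for reproduction): binary tournaments.
        parents = []
        for _ in range(pop_size):
            a, b = rng.sample(scored, 2)
            parents.append(max(a, b, key=lambda s: s[0])[1])

        # REPRODUCE: one-point crossover followed by bit-flip mutation.
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            offspring.append((fitness(child), child))
        evals += pop_size

        # REPLACE (selection for survival): keep the best of old and new.
        merged = sorted(scored + offspring, key=lambda s: s[0], reverse=True)
        scored = merged[:pop_size]

    # GETBEST
    return max(scored, key=lambda s: s[0])
```

For instance, `basic_ea(sum, 20)` runs the sketch on the OneMax toy problem (maximize the number of ones). Swapping any of the three commented stages for another operator yields a different EA, which is the point made in the text.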




52.3 From Hybrid Metaheuristics to Hybrid EAs 


As was mentioned before, hybrid metaheuristics (and 
in particular hybrid EAs) are developed aiming to at- 
tain a synergistic combination of several techniques, 
exploiting their strengths and mitigating their weak- 
nesses [52.12]. Besides the theoretical justifications for 
hybrid metaheuristics (arising, for example, from the 
NFL theorem sketched before), these techniques have 
been repeatedly vindicated by their practical success. 
Before getting to hybrid EAs, let us first focus on how 
hybridization can be approached. 


52.3.1 Hybridization Mechanisms 


As was already mentioned in Sect. 52.1, attempts to 
classify hybrid metaheuristics are manifold [52.13-17].
We will focus in the following on two of these,
namely the classification of Talbi [52.13] and that of 
Raidl [52.16]. 


Fig. 52.2 Classification of hybrid metaheuristics by Talbi (after [52.13])


Talbi proposed a hierarchical taxonomy based on 
two design issues: functionality and algorithmic archi- 
tecture. According to this, we can distinguish between 
high/low-level hybrids and relay/teamwork hybrids.
Low-level hybridization addresses the functional com- 
position of a single optimization method in which 
a certain function of a metaheuristic is replaced by 
another metaheuristic. On the contrary, in high-level hy- 
brids, the internals of different metaheuristics are non- 
intersecting. As for relay hybridization, it comprises 
models in which a set of metaheuristics are sequen- 
tially applied, each using the output of the previous as 
its input. On the other hand, teamwork hybridization 
represents cooperative optimization models. These two 
distinctions (low versus high, relay versus teamwork) 
are orthogonal, and hence lead to four different com- 
binations. These four classes can, in turn, be refined 
using three additional dichotomies, namely homoge- 
neous versus heterogeneous (referring to the type of 
metaheuristics involved in the hybrid), global versus 
partial (referring to whether or not each technique ex- 
plores the whole search space) and specialist versus 
general (referring to whether or not all algorithms solve 
the same optimization problem). Figure 52.2 shows this 
taxonomy. 

Raidl [52.16], in turn, proposed a hybrid meta- 
heuristic classification centered around four elements: 
type of hybridization, level/strength of hybridiza- 
tion, control strategy, and execution order. Regard- 
ing the type of hybridization, we can distinguish 
between: 


1) Combinations of different metaheuristics 

2) Combinations of metaheuristics and problem-specific
algorithms

3) Combinations of metaheuristics with general oper- 
ational research (OR), artificial intelligence (AI), or 
constraint programming (CP) techniques. 




Regarding the hybridization strength, we can 
distinguish high-level/weakly-coupled hybrids and 
low-level/strongly-coupled hybrids. As to the con- 
trol strategy, there are two possibilities: integrative 
(a technique takes a subordinate role) and collaborative 
(exchange of information without subordination). 
Finally, the order of execution captures the temporal 
aspect of the interaction among techniques. Thus, we 
can have sequential execution (a technique takes as 
input the output of another technique), intertwined 
execution (both techniques alternate parts of their 
execution at a computational or algorithmic level), 
and parallel execution (the techniques run in parallel). 
Figure 52.3 shows this classification. 


52.3.2 Hybrid EAs 


One of the most classical hybridization approaches 
for EAs is defined in the context of knowledge- 
augmented representations, particularly in the case 
that the solutions sought have an extremely complex 
structure for which a direct search does not seem 
adequate, or with problems that exhibit constraints. 
In the latter case, these can be handled in three 
ways: 


i) By using penalty functions that guide the search to 
feasible solutions 

ii) By using repairing mechanisms that turn infeasible 
solutions into feasible ones 


iii) By defining reproductive operators that always re- 
main in the feasible region. 


Fig. 52.3 Classification of hybrid metaheuristics by Raidl (after [52.16])


While the complexity of the representation and the
operators can be kept low in the first two cases (i.e.,
the complexity is moved to the fitness function and the
repairing function, respectively), the third case requires
either a careful representation safeguarding feasibility,
or complex operators intelligently handling the
constraints of the problem. Focusing on representations,
decoders [52.18] are commonly used. These provide
a complex genotype-to-phenotype mapping that may
not just produce feasible solutions, but can also
provide better quality solutions. Consider, for example,
the knapsack problem: solutions are sets of objects in
this case, but clearly a random set may be infeasible
due to the knapsack capacity constraint. This could
be handled with a penalty term to account for this
capacity violation or by adding/removing some
objects to turn the solution into a feasible one [52.19].
A decoder approach could, however, encode solutions
as permutations, indicating the order in which objects
are to be considered for inclusion in the knapsack.
Since any object violating the capacity constraint
would be skipped, a feasible solution would always be
obtained. Problem-space search [52.20] (the use of
a construction heuristic that is guided through
problem-space) also falls within this class of
low-level/strong hybrids. Continuing with the knapsack
problem, solutions could in this case be represented as
perturbations of the value of objects. Each of these
solutions would be evaluated by constructing the
so-modified problem instance, solving it with a greedy
heuristic and using the original instance to evaluate
the quality of the solution obtained. This strategy
is very competitive for this problem, as is shown
in [52.21].
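
The two knapsack encodings just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of [52.18-21]: the function names are hypothetical, objects are identified by their index, weights are positive, and the greedy heuristic used for problem-space search is assumed to rank objects by value/weight ratio.

```python
def decode_permutation(order, weights, capacity):
    """Decoder: scan objects in the given order and skip any object
    that would violate the capacity, so the result is always feasible."""
    chosen, load = [], 0
    for obj in order:
        if load + weights[obj] <= capacity:
            chosen.append(obj)
            load += weights[obj]
    return chosen


def greedy_by_density(values, weights, capacity):
    """Assumed greedy heuristic: try objects by decreasing value/weight."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    return decode_permutation(order, weights, capacity)


def evaluate_perturbation(perturbed_values, values, weights, capacity):
    """Problem-space search: the chromosome is a perturbed value vector.
    The modified instance is solved greedily, and the resulting set of
    objects is scored with the ORIGINAL values."""
    chosen = greedy_by_density(perturbed_values, weights, capacity)
    return sum(values[i] for i in chosen), chosen


# Toy instance (made-up numbers, for illustration only).
values = [10, 7, 4, 3]
weights = [6, 4, 3, 2]
capacity = 9
```

In both cases the EA searches an unconstrained space (permutations or perturbation vectors), and feasibility is guaranteed by the decoding step rather than by the reproductive operators.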

On the other hand, high-level/weak hybrid evolutionary
algorithms are most typically obtained by integrating
within the EA either a local-search (single-point
or trajectory-based) method or other techniques from
the realm of OR, AI, CP, etc. Regarding the former
approach, the underlying idea is to boost the
intensification capabilities of the algorithm by improving the
solutions generated by the population-based search
engine. This kind of combination dates back to the late
1980s, when it used to take the form of a genetic
algorithm hybridized with simulated annealing (SA) [52.22].
A particularly interesting hybrid EA along this
line is the parallel recombinative simulated annealing
algorithm [52.23], in which a pool of SA algorithms
cooperate/compete in a genetic algorithm framework.
Tabu search (TS) is another popular local search
metaheuristic to be hybridized with EAs (see [52.24-29], to
mention just a few).

Other EA techniques such as estimation of distribu- 
tion algorithms (EDAs) have also been hybridized with 
local search approaches, e.g., Campelo et al. [52.30] 
for the design of electromagnetic devices and Laguna 
et al. [52.31] for maximum cut. It is also worth men- 
tioning the work by Santana et al. [52.32] on the combi- 
nation of variable neighborhood search [52.33] (VNS) 
with EDAs for protein structure prediction. This is 
done in different ways, most notably either integrating 
VNS within the EDA or alternating the two algorithms. 
Zhang et al. [52.34] propose an analogous approach
for quadratic programming based on the hybridization
of EDAs and 2-opt hill climbing. A very interesting
approach was also proposed by Peña et al. [52.35],
who hybridized a steady-state genetic algorithm and an
EDA; each of these algorithms is responsible for
generating a part of the population. On the other hand,
Zhou et al. [52.36] and Ahn et al. [52.37] propose the
hybridization of an EDA with particle swarm optimization
(PSO) where the latter is used for intensification
purposes.

As for hybridization with techniques from the
realms of AI/OR or constraint programming, examples
date back to the mid-1990s. Particularly interesting
is the combination of EAs with exact techniques and
derivatives thereof. For example, branch and bound
(BnB) can be integrated within an EA as a recombination
operator [52.38, 39] or in the decoding process [52.40].
Conversely, an EA can be used for the
strategic guidance of BnB [52.41]. As for collaborative
combinations, intertwined approaches were considered
in [52.42] by combining EAs and BnB within a parallel
multiagent system, and in [52.38, 43] by defining
a model in which the exact technique provided partial
promising solutions, and the EA returned improved
bounds. A related multilevel approach involving beam
search and an EA hybridized with a local-search
algorithm can be found in [52.44-46]. For further details on
this kind of exact/metaheuristic hybridization we refer
the reader to [52.3].

Most of the above hybrid EAs can be safely described
as memetic algorithms, if only under the broad
interpretation of MAs emanating from seminal and
early works on the topic [52.9, 47]. Indeed, the
algorithmic hybridizations mentioned above can be seen
as combinations of global and local search, probably
the most widely recognized feature of MAs.
The next section provides a more detailed overview
of MAs.


52.4 Memetic Algorithms


MAs are population-based metaheuristics and as such
they keep a population of candidate solutions for the
problem under consideration. While these solutions
were called individuals in EA jargon, in the context
of MAs it is sometimes more appropriate to think of
them as agents, thus highlighting their more active
nature (i.e., behavior purposefully directed at optimizing
some problem) in contrast to the passive nature of EA
individuals (which are mere information placeholders
subject to evolutionary operations). The particular way
in which this active behavior can be captured will be
discussed later on. Algorithm 52.2 shows the general
pseudocode of a simple MA.


Algorithm 52.2 Pseudocode of a basic MA based on
a local search LS
function BasicMA (in P: Problem, in par: Parameters): Solution;
    for i ∈ N do
        pop[i] ← GENERATE-SOLUTION(P);
        pop[i] ← LOCAL-IMPROVEMENT(pop[i], P, par);
    end for
    i ← 0;
    while i < MaxEvals do
        auxpop[0] ← SELECT(pop);
        for j ← 1 to #op do
            auxpop[j] ← APPLY(op[j], auxpop[j−1], P, par);
        end for
        newpop ← LOCAL-IMPROVEMENT(auxpop[#op], P, par);
        pop ← REPLACE(pop, newpop);
        if DEGENERATED(pop) then
            RESTART(pop, P);
        end if
    end while
    return GETBEST(pop);
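
As a rough illustration of Algorithm 52.2, the following Python sketch instantiates each step for bit-string solutions. It is a sketch under assumptions rather than a reference implementation: the local improver is a budget-limited bit-flip hill climber, the operator pipeline is uniform crossover followed by bit-flip mutation, replacement is steady-state (replace the worst agent if the newcomer is better), and the population is deemed degenerate when all agents are identical. The evaluation counter is updated explicitly, which the pseudocode leaves implicit. All names and parameter values are hypothetical.

```python
import random


def basic_ma(fitness, n_bits, pop_size=10, max_evals=3000, ls_budget=20, seed=0):
    """Illustrative memetic algorithm (maximization) over bit strings.

    Returns a (fitness, solution) pair. The budget may be slightly
    exceeded because it is only checked once per iteration.
    """
    rng = random.Random(seed)
    evals = 0

    def evaluate(sol):
        nonlocal evals
        evals += 1
        return fitness(sol)

    def local_improvement(sol, fit):
        # LOCAL-IMPROVEMENT: flip random bits, keep only improving moves.
        for _ in range(ls_budget):
            i = rng.randrange(n_bits)
            sol[i] ^= 1
            new_fit = evaluate(sol)
            if new_fit > fit:
                fit = new_fit
            else:
                sol[i] ^= 1  # undo a non-improving flip
        return fit, sol

    def new_agent():
        # GENERATE-SOLUTION followed by LOCAL-IMPROVEMENT.
        sol = [rng.randint(0, 1) for _ in range(n_bits)]
        return local_improvement(sol, evaluate(sol))

    pop = [new_agent() for _ in range(pop_size)]
    while evals < max_evals:
        # SELECT: two binary tournaments.
        p1 = max(rng.sample(pop, 2), key=lambda a: a[0])[1]
        p2 = max(rng.sample(pop, 2), key=lambda a: a[0])[1]
        # APPLY the operator pipeline: recombination, then mutation.
        child = [rng.choice(pair) for pair in zip(p1, p2)]
        child = [g ^ 1 if rng.random() < 1.0 / n_bits else g for g in child]
        agent = local_improvement(child, evaluate(child))
        # REPLACE: the newcomer takes the place of the worst agent if better.
        worst = min(range(pop_size), key=lambda k: pop[k][0])
        if agent[0] > pop[worst][0]:
            pop[worst] = agent
        # DEGENERATED / RESTART: keep the best agent, regenerate the rest.
        if len({tuple(sol) for _, sol in pop}) == 1:
            best = max(pop, key=lambda a: a[0])
            pop = [best] + [new_agent() for _ in range(pop_size - 1)]
    return max(pop, key=lambda a: a[0])
```

A call such as `basic_ma(sum, 30)` runs the sketch on the OneMax toy problem. In a realistic setting the random initialization, the operators, and the local improver would all be replaced by problem-aware counterparts, as discussed next.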


First of all, the population must be initialized. Prob- 
lem knowledge can be introduced in this stage by using 
constructive heuristics. For example, greedy strategies
based on the nearest neighbor heuristic [52.48]
could be used to generate solutions for the travel- 
ing salesman problem (TSP) — see also [52.49-51] 
for other examples in the context of scheduling and 
timetabling. Then, the population of agents is subject 
to processes of competition and mutual cooperation 
much like in EAs. Competition (i.e., selection and 
replacement) can be done in general using any of 
the well-known strategies used in EAs, e.g., tournament,
ranking, or fitness-proportionate selection, plus or
comma replacement, etc. As for cooperation, it is
accomplished by using a number of reproductive operators.
Many different such operators can be used in an
MA, as illustrated in the general pseudocode shown in 
Algorithm 52.2: an array op of operators is sequen- 
tially applied to the population in a pipeline fashion. 
Note also how these operators receive as input not 
just the solutions they act on but also problem data, 
thus emphasizing the usage of problem knowledge. 
While it is possible to consider local improvement as 
one of these operators, it plays such a distinctive role 
in most MAs that it is independently depicted in the 
pseudocode. 

Recombination is the algorithmic component that 
best captures cooperation among two (or more [52.52]) 
agents in MAs. By using this operation, the relevant 
information contained in the parents is combined to
produce new solutions. Relevance here means being
significant when it comes to evaluating the quality
of solutions. As an example, consider again the
TSP. While solutions can be encoded as permuta- 
tions, a standard permutational recombination opera- 
tor will not perform adequately in general. The rea- 
son is that permutations are information-rich struc- 
tures carrying positional, precedence, and adjacency 
information [52.53]. Clearly, the latter is the really 
relevant piece of information when the TSP is in- 
volved. Hence, an edge-manipulation operator such as 
edge recombination [52.54] (ER) will perform better 
than position-based operators such as partially-mapped 
crossover [52.55] (PMX) or uniform cycle crossover 
[52.56] (UCX). There are several principled approaches 
to define measures capturing the goodness of different 
representations (that is, the way a particular encoding 
is interpreted) among which we can cite epistasis (non- 
additive influence on the fitness function of combining 
several information units) [52.57,58], fitness variance 
of formae (variance of the fitness values of a representa- 
tive subset of solutions carrying a particular information 
unit) [52.59], and fitness correlation (correlation in the 
fitness values of parents and offspring) [52.60, 61]. 
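
To make the distinction between positional and adjacency information concrete, the short Python sketch below compares two tours under both views. The helper names are hypothetical, and tours are assumed to be closed cycles over a symmetric TSP instance with at least three cities.

```python
def edge_set(tour):
    """Undirected edges of a closed tour (adjacency information)."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def common_positions(a, b):
    """Number of positions holding the same city (positional information)."""
    return sum(x == y for x, y in zip(a, b))


def common_edges(a, b):
    """Number of edges shared by both tours."""
    return len(edge_set(a) & edge_set(b))


a = [0, 1, 2, 3, 4, 5]
b = [3, 4, 5, 0, 1, 2]  # the same cycle as a, started at another city
```

Here `a` and `b` describe the same tour, so they share all six edges and have the same length, yet no city occupies the same position in both. A position-based operator sees two unrelated parents, whereas an edge-based operator sees two identical ones.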
Mutation is the other classical reproductive oper- 
ator. Its role is that of injecting new material in the 
population (at a low rate to prevent the search from
degrading to a random walk in the solution space). This
view of mutation as an important operator that is
nevertheless secondary to recombination departs from
the interpretation of the search process made in, e.g.,
evolutionary programming [52.62]. In either case,
mutation plays an important role in EAs since it favors
the effectiveness of recombination (particularly in some 
unstructured landscapes). Furthermore, if the problem 
exhibits constraints it is commonly much easier to han- 
dle these in a local way and maintain/achieve feasibility 
by introducing small perturbations in a solution than via 
recombination (e.g., consider a university timetabling 
problem [52.63]: given a feasible solution, it is easier to 
exchange a couple of slots, and keep them feasible, than 
to produce a new feasible solution that comes from the 
combination of two feasible assignments). However, it 
must be noted that unlike classical EAs, in which re- 
combination is a mere random shuffler of information 
(and hence can be arguably cast as a macro-mutational 
process), MAs usually utilize intelligent problem-aware
mechanisms for recombination, which thus plays a crucial
role in the search. Broadly speaking, this inclusion
of problem knowledge during recombination can be
projected on two aspects of the process, namely the
selection of the pieces of information from the parents
that will be transmitted to the offspring, and the selec- 
tion of the external information that will be added to 
it. Regarding the former issue, it is commonly assumed 
that transmission of common features is beneficial for 
some problems [52.54, 64]. Further completion of the 
descendant can be done in several ways. Radcliffe and 
Surry [52.59] proposed the use of local improvers or 
implicit enumeration schemes. Cotta and Troya sug- 
gested the use of exact techniques to find the best 
way of combining the information present in the par- 
ents [52.39]. Ibaraki [52.65] and Gallardo et al. [52.66] 
used dynamic programming for this purpose. 

The use of local search (LS) is one of the most dis- 
tinctive components of MAs, to the extent that MAs 
are often equated to EAs endowed with LS. While this 
is certainly a very popular implementation of MAs, 
several authors [52.47,67,68] advocate a broader in- 
terpretation of the paradigm in which an explicit local 
search algorithm need not be present (e.g., local im- 
provement can take place during recombination as in 
the edge assembly crossover defined in [52.69] for the 
TSP). In their simplest incarnation, these local improvers
can be hill climbers, exploring the neighborhood of the
current solution and performing uphill moves in the
corresponding fitness landscape [52.70] until a local
optimum is found or the computational budget assigned
to this operator is exhausted. Obviously, much more
complex mechanisms can be defined for this purpose,
such as fully-fledged metaheuristics (TS, SA, or VNS,
to mention just a few).
It must be also noted that it is mainly because of the 
use of this mechanism for improving solutions on a lo- 
cal (and even autonomous) basis that the term agent 
is deserved. Under this interpretation, the MA can be 
viewed as a collection of agents that autonomously ex- 
plore the search space, cooperate via recombination, 
and compete for computational resources via selection 
and replacement. This also provides an interesting link 
to cooperative models for optimization and to memetic 
computing in general [52.71, 72]. 

One of the crucial elements governing the suc- 
cessful application of local search within an MA is 
achieving a good balance between global and local 
search. This amounts to determining when to apply lo- 
cal search (how often and on which solutions) and how 
intense this local search has to be. This parameteriza- 
tion problem is very hard and constitutes an active area 
of research [52.73]. An additional issue is the selection 
of a particular local search scheme within the MA. This 
has actually led to a very fruitful line of research in
so-called multimemetic algorithms (MMAs). Therein,
a meme is interpreted as a lifetime learning procedure
capable of improving individual solutions [52.74-79].
Each solution in an MMA carries a gene indicating
the particular LS operator that has to be applied on
it (a pointer to an existing operator, or the
parameterization of a generic local search template). MMAs
thus constitute a generalization of meta-Lamarckian
EAs [52.80] (in which the selection of the LS operator,
from a pre-fixed set, is made using some rules
that are hard-wired into the MA) and an intermediate 
step in the direction of co-evolving MAs [52.75] (in 
which a population of LS operators co-evolve along 
with a population of solutions). Finally, it is essential 
from a purely computational perspective to be able to 
apply LS in an efficient way. As was mentioned in 
Sect. 52.1 this is normally attained in combinatorial 
problems by incrementally evaluating solutions belong- 
ing to the neighborhood area. For example, consider the 
2-opt neighborhood in the TSP [52.48]: each neighbor 
of a given solution is obtained by a 2-opt move that re- 
moves two edges and adds two new edges; the fitness of 
such a neighbor can thus be computed by taking the fit- 
ness of the initial solution and adding a term accounting 
for the difference between added edges and removed 
edges. 
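
The incremental evaluation of a 2-opt move can be sketched as follows. This is a minimal Python illustration assuming a symmetric distance matrix, tours stored as lists of city indices, and tour length as the quantity to minimize; the function names are hypothetical.

```python
import math


def distance_matrix(points):
    """Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def tour_length(tour, dist):
    """Full evaluation: n distance lookups."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt_delta(tour, dist, i, j):
    """Incremental evaluation of the 2-opt move that reverses
    tour[i+1..j], for 0 <= i < j < len(tour): four lookups only.

    Removed edges: (a, b) and (c, d). Added edges: (a, c) and (b, d).
    """
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[(j + 1) % n]
    return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]


def apply_two_opt(tour, i, j):
    """Return the neighbor obtained by reversing the segment tour[i+1..j]."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
```

A hill climber would compute `two_opt_delta` for the candidate moves and call `apply_two_opt` only for a move with negative delta, so that the cost of scanning the neighborhood does not depend on the length of the tour for each neighbor.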

Another interesting element of MAs is the restart- 
ing process invoked whenever the population is deemed 
degenerate due to a lack of diversity or any other factor 
impairing the subsequent performance of the algorithm. 
This restarting process can be done in numerous ways 
(for example, triggering hypermutation [52.81] or in- 
troducing random solutions in the population [52.82]) 
and can be often found in plain EAs as well (indeed 
the use of restarting procedures in EAs can be traced 
back to the CHC algorithm [52.83] in the early 1990s). 
This said, it constitutes a generic element to be rou- 
tinely included in MAs. Indeed, scatter search [52.84] 
(a technique that can reasonably be termed memetic, 
despite having an independent origin from MAs and 
its fair share of distinctive features such as the em- 
phasis on using deterministic strategies) has a restart 
as a crucial element in its algorithmic cycle. Note 
also that it is not unusual to have MAs without mu- 
tation, given the fact that new information can also 
be injected in the population via local search, and the 
availability of restarting mechanisms. Indeed, in some 
applications, it may be better to converge quickly and 
then restart, rather than continuously diversifying the 
search. This is not the general norm at any rate. As 
a matter of fact, one can easily find MAs that use several
mutation operators, either by considering different
basic neighborhoods [52.27,85] or by defining light 
and heavy mutations that introduce different amounts 
of new information [52.86,87] — cf. hypermutation. 
Needless to say, the use of restarting strategies is
a corrective measure that is taken once a diversity problem
is encountered and can be complemented with preventive
measures aimed at hindering (or even avoiding) this
problem in the first place. For example,
structured populations [52.88] could be used to
slow down the propagation of information across
the population, hence hindering the appearance of
super agents that might quickly take the population over
and destroy diversity. Also, population management
strategies based on the use of distance measures have 
been utilized with notable success in combinatorial 
problems [52.89]. More traditional strategies for main- 
taining diversity during selection and replacement, such 
as crowding [52.90] or sharing [52.91], can be used as 
well. 
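
One simple way of detecting a degenerate population and restarting it is sketched below in Python. The choices made here are assumptions for illustration only: solutions are bit strings, diversity is measured as the mean pairwise Hamming distance, the threshold value is arbitrary, and the restart keeps the best solutions while replacing the rest with random ones. None of this is prescribed by the works cited above.

```python
import random


def mean_hamming_distance(pop):
    """Average Hamming distance over all pairs (needs at least two solutions)."""
    pairs = [(a, b) for k, a in enumerate(pop) for b in pop[k + 1:]]
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def degenerated(pop, threshold=1.0):
    """Assumed criterion: diversity has dropped below a fixed threshold."""
    return mean_hamming_distance(pop) < threshold


def restart(pop, fitness, keep=1, rng=random):
    """Keep the `keep` best solutions and fill the remaining slots
    with fresh random solutions."""
    n_bits = len(pop[0])
    survivors = sorted(pop, key=fitness, reverse=True)[:keep]
    fresh = [[rng.randint(0, 1) for _ in range(n_bits)]
             for _ in range(len(pop) - keep)]
    return survivors + fresh
```

Keeping a few survivors preserves the best information found so far, while the fresh solutions play the role that mutation would otherwise have in reintroducing diversity.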


52.5 Cooperative Optimization Models 


As stated in the previous section, the interpretation of MAs
as a collection of interacting agents that autonomously 
explore the search space while cooperating/competing 
with each other seamlessly integrates with the more 
general notion of memetic computing and coopera- 
tive optimization models. According to the definition 
in [52.92], memetic computing is: 


a paradigm that uses the notion of meme(s) as units 
of information encoded in computational represen- 
tations for the purpose of problem solving, 


where a meme should be interpreted as a local-search
operator, as mentioned before. This orchestration of
different LS operators naturally links with cooperative
models dating back to the late 1990s [52.93]. These 
attempts to attain an effective mechanism for explor- 
ing the search space try to escape from local optima 
by combining search agents that have diverse inten- 
sification/diversification characteristics and that start 
from different points in the search space [52.94]. Ac- 
cording to [52.95], the distinctive features of this kind 
of models are (1) a collection of autonomous algo- 
rithms (agents), each of them supporting a different 
optimization method, and (2) a cooperative scheme for 
combining these autonomous elements into a unified
problem-solving strategy. 

Early cooperative models involve an algorithmically
homogeneous collection of algorithms exchanging
information. For example, Toulouse et al. [52.93]
considered a collection of TS algorithms exchanging 
tabu attributes (notice the relation of this model with 
the parallel recombinative SA algorithm mentioned in 
Sect. 52.3.2) and later proposed a hierarchical decom- 
position approach [52.96]. A related model was also 
proposed by Crainic and Gendreau [52.97]. Crainic
et al. [52.98] put forward an asynchronous cooperative
search procedure on the basis of VNS. A different
approach based on the use of a central manager was
proposed by Pelta et al. [52.99]. This central manager 
gathers information about the performance of the differ- 
ent agents and acts on them, altering their behavior — see 
also [52.100]. Other centralized approaches were de- 
fined by LeBouthillier and Crainic [52.101] by means 
of maintaining a solution warehouse upon which indi- 
vidual heuristics act. More recently, Barbucha [52.102] 
explored synchronous and asynchronous versions of 
an analogous memory-centralized approach in the con- 
text of vehicle routing problems. Leung et al. [52.103] 
proposed, in turn, a cooperative/competitive scheme in 
which the problem space is partitioned and a pool of 
agents is structured into several subgroups which repel 
each other, thus contributing to keeping diversity. 
Multilevel models have also received a lot of
attention in recent years. These models consist of
layered algorithmic approaches and are not to be con- 
fused with multilevel partitioning strategies proposed 
for combinatorial optimization [52.104, 105], in which 
the resolution of the problem is attained via its incre- 
mental reduction and further reconstruction, using some 
solver at each level and the solution obtained therein 
as seeds for solving the next higher level. Hulianytskyi 
and Sirenko [52.106] presented a two-level coopera- 
tive approach: the lower level corresponds to basic 
algorithms, whereas the upper level combines the infor- 
mation found by these and broadcasts a refined version 
back to the basic algorithms. Milano and Roli [52.4] 
developed a multiagent system called MAGMA (mul- 
tiagent metaheuristic architecture) allowing the use of 
metaheuristics at different levels (generating solutions, 
improving them, defining search strategies, and
coordinating lower-level agents). Each level (or layer)
provides a different abstraction level and can contain
several agents loaded with a particular search
algorithm. The lowest layer (level 0) generates solutions
to be fed to level 1. The latter provides local
improvement of these solutions. Level 2 has a global view of the
search space and provides the means for escaping from
local optima. The uppermost level (level 3) coordinates
the functioning of the underlying agents, rewarding
those which perform well or adapting and improving
their functioning. They specifically adapt this
framework for deployment of MAs within it. Finally, Amaya
et al. [52.107] defined a multilevel model in which
heterogeneous simple MAs (i.e., MAs obtained from the
hybridization of an EA and a local search method) are
combined in a cooperative model, and exchange
information following an underlying arbitrary topology.


52.6 Conclusions


Memetic algorithms in particular, and memetic
computing in general, constitute a flexible and powerful
optimization approach. Rather than being competitors
for existing methods and/or paradigms, they are
a very suitable framework for integrating such existing
techniques in order to attain synergistic combinations
or to deal with the curse of dimensionality
in large-scale optimization settings. They
are also a very active research area in which, in
addition to a steadily growing number of application
works, new fundamental issues are attracting the
interest of the research community. Among these we
can cite the theoretical study of their self-adaptation
capabilities and their deployment on the emerging
computational platforms that are available nowadays. We
refer to [52.72, 108, 109] for recent reviews of the field.
For an overview of the literature dealing with the
application of these techniques to combinatorial
optimization problems we refer the reader to [52.67, 108]
for a general perspective and to [52.63] for a review of
scheduling and timetabling applications. Finally, we
refer the reader to [52.110, 111] for further information
on the deployment of MAs on combinatorial optimization
problems.


References 

52.1 C. Blum, A. Roli: Metaheuristics in combinatorial optimization: Overview and conceptual comparison, ACM Comput. Surv. 35(3), 268-308 (2003)

52.2 D.H. Wolpert, W.G. Macready: No free lunch theorems for optimization, IEEE Trans. Evol. Comput. 1(1), 67-82 (1997)

52.3 J. Puchinger, G.R. Raidl: Combining metaheuristics and exact algorithms in combinatorial optimization: A survey and classification, Lect. Notes Comput. Sci. 3562, 113-124 (2005)

52.4 M. Milano, A. Roli: MAGMA: A multiagent architecture for metaheuristics, IEEE Trans. Syst. Man Cybern. Part B 34(2), 925-941 (2004)

52.5 E.-G. Talbi, V. Bachelet: COSEARCH: A parallel cooperative metaheuristic, J. Math. Model. Algorithms 5(1), 5-22 (2006)

52.6 P. Cowling, G. Kendall, E. Soubeiga: A hyperheuristic approach to scheduling a sales summit, Lect. Notes Comput. Sci. 2079, 176-190 (2001)

52.7 K. Chakhlevitch, P.I. Cowling: Hyperheuristics: Recent developments. In: Adaptive and Multilevel Metaheuristics, Studies in Computational Intelligence, Vol. 136, ed. by C. Cotta, M. Sevaux, K. Sörensen (Springer, Berlin 2008) pp. 3-29

52.8 R. Dawkins: The Selfish Gene (Clarendon, Oxford 1976)

52.9 P. Moscato: On Evolution, Search, Optimization, Genetic Algorithms and Martial Arts: Towards Memetic Algorithms. Technical Report Caltech Concurrent Computation Program, Report 826 (California Institute of Technology, Pasadena 1989)

52.10 R. Santana, C. Bielza, P. Larrañaga: Network Measures for Re-using Problem Information in EDAs. Technical Report UPM-FI/DIA/2010-3 (Department of Artificial Intelligence, Faculty of Informatics, Technical University of Madrid 2010)

52.11 C. Cotta, E. Alba, J.M. Troya: Stochastic reverse hillclimbing and iterated local search, Proc. 1999 Congr. Evol. Comput. (IEEE Neural Network Council - Evolutionary Programming Society - Institution of Electrical Engineers, Washington 1999) pp. 1558-1565

52.12 C. Blum, J. Puchinger, G. Raidl, A. Roli: A brief survey on hybrid metaheuristics, 4th Int. Conf. Bioinspired Optim. Methods Appl. (BIOMA 2010), ed. by B. Filipic, J. Silc (Ljubljana, Slovenia 2010) pp. 3-16


52.13 E.-G. Talbi: A taxonomy of hybrid metaheuristics, J. Heuristics 8, 541-564 (2002)

52.14 C. Cotta, E.G. Talbi, E. Alba: Parallel hybrid metaheuristics. In: Parallel Metaheuristics, ed. by E. Alba (Wiley-Interscience, Hoboken 2005) pp. 347-370

52.15 M. El-Abd, M. Kamel: A taxonomy of cooperative search algorithms, Lect. Notes Comput. Sci. 3636, 32-41 (2005)

52.16 G. Raidl: A unified view on hybrid metaheuristics, Lect. Notes Comput. Sci. 4030, 1-12 (2006)

52.17 L. Jourdan, M. Basseur, E.-G. Talbi: Hybridizing exact methods and metaheuristics: A taxonomy, Eur. J. Oper. Res. 199(3), 620-629 (2009)

52.18 Z. Michalewicz: Decoders. In: Handbook of Evolutionary Computation, ed. by T. Back, D.B. Fogel, Z. Michalewicz (Institute of Physics Publishing and Oxford Univ. Press, Bristol 1997)

52.19 P.C. Chu, J.E. Beasley: A genetic algorithm for the multidimensional knapsack problem, J. Heuristics 4, 63-86 (1998)

52.20 R.H. Storer, S.D. Wu, R. Vaccari: New search spaces for sequencing problems with application to job-shop scheduling, Manag. Sci. 38, 1495-1509 (1992)

52.21 C. Cotta, J.M. Troya: A hybrid genetic algorithm for the 0-1 multiple knapsack problem. In: Artificial Neural Nets and Genetic Algorithms 3, ed. by G.D. Smith, N.C. Steele, R.F. Albrecht (Springer, Wien 1998) pp. 251-255

52.22 M.G. Norman, P. Moscato: A competitive and cooperative approach to complex combinatorial search, Proc. 20th Inf. Oper. Res. Meet., Buenos Aires (1989), pp. 3.15-3.29

52.23 S.W. Mahfoud, D.E. Goldberg: Parallel recombinative simulated annealing: A genetic algorithm, Parallel Comput. 21(1), 1-28 (1995)

52.24 C. Fleurant, J.A. Ferland: Genetic and hybrid algorithms for graph coloring, Ann. Oper. Res. 63, 437-461 (1996)

52.25 H. Kim, Y. Hayashi, K. Nara: The performance of hybridized algorithm of genetic algorithm simulated annealing and Tabu search for thermal unit maintenance scheduling, 2nd IEEE Conf. Evol. Comput. ICEC'95 (Perth, Australia 1995) pp. 14-119

52.26 J. Thiel, S. Voss: Some experiences on solving multiconstraint zero-one knapsack problems with genetic algorithms, INFOR 32(4), 226-242 (1994)

52.27 C.-F. Liaw: A hybrid genetic algorithm for the open shop scheduling problem, Eur. J. Oper. Res. 124, 28-42 (2000)

52.28 E.K. Burke, A.J. Smith: A memetic algorithm to schedule planned maintenance for the national grid, J. Exp. Algorithmics 4, 1-13 (1999)

52.29 J.E. Gallardo, C. Cotta, A.J. Fernández: Finding low autocorrelation binary sequences with memetic algorithms, Appl. Soft Comput. 9(4), 1252-1262 (2009)

52.30 F. Campelo, F.G. Guimaraes, J.A. Ramirez, H. Igarashi: Hybrid estimation of distribution algorithm using local function approximations, IEEE Trans. Magn. 45(3), 1558-1561 (2009)

52.31 M. Laguna, A. Duarte, R. Martí: Hybridizing the cross-entropy method: An application to the max-cut problem, Comput. Oper. Res. 36(2), 487-498 (2009)

52.32 R. Santana, P. Larrañaga, J.A. Lozano: Combining variable neighborhood search and estimation of distribution algorithms in the protein side chain placement problem, J. Heuristics 14, 519-547 (2008)

52.33 P. Hansen, N. Mladenović: Variable neighborhood search: Principles and applications, Eur. J. Oper. Res. 130(3), 449-467 (2001)

52.34 Q. Zhang, J. Sun, E. Tsang, J. Ford: Estimation of distribution algorithm with 2-opt local search for the quadratic assignment problem. In: Towards a New Evolutionary Computation, Studies in Fuzziness and Soft Computing, Vol. 192, ed. by J. Lozano, P. Larrañaga, I. Inza, E. Bengoetxea (Springer, Berlin, Heidelberg 2006) pp. 281-292

52.35 J.M. Peña, V. Robles, P. Larrañaga, V. Herves, F. Rosales, M.S. Pérez: GA-EDA: Hybrid evolutionary algorithm using genetic and estimation of distribution algorithms, Lect. Notes Comput. Sci. 3029, 361-371 (2004)

52.36 Y. Zhou, J. Wang, J. Yin: A discrete estimation of distribution particle swarm optimization for combinatorial optimization problems, 3rd Int. Conf. Nat. Comput. (ICNC 2007) (2007) pp. 80-84

52.37 C.W. Ahn, J. An, J.-C. Yoo: Estimation of particle swarm distribution algorithms: Combining the benefits of PSO and EDAs, Inf. Sci. 192, 109-119 (2012)

52.38 C. Cotta, J.F. Aldana, A.J. Nebro, J.M. Troya: Hybridizing genetic algorithms with branch and bound techniques for the resolution of the TSP. In: Artificial Neural Nets and Genetic Algorithms 2, ed. by D.W. Pearson, N.C. Steele, R.F. Albrecht (Springer, Wien 1995) pp. 277-280

52.39 C. Cotta, J.M. Troya: Embedding branch and bound within evolutionary algorithms, Appl. Intell. 18(2), 137-153 (2003)

52.40 J. Puchinger, G.R. Raidl, G. Koller: Solving a real-world glass cutting problem, Lect. Notes Comput. Sci. 3004, 165-176 (2004)

52.41 K. Kostikas, C. Fragakis: Genetic programming applied to mixed integer programming, Lect. Notes Comput. Sci. 3003, 113-124 (2004)

52.42 J. Denzinger, T. Offermann: On cooperation between evolutionary algorithms and other search paradigms, 6th Int. Conf. Evol. Comput. IEEE (1999) pp. 2317-2324

52.43 J.E. Gallardo, C. Cotta, A.J. Fernandez: Solving the multidimensional knapsack problem using an evolutionary algorithm hybridized with branch and bound, Lect. Notes Comput. Sci. 3562, 21-30 (2005)

52.44 J.E. Gallardo, C. Cotta, A.J. Fernandez: A multilevel memetic/exact hybrid algorithm for the still life problem, Lect. Notes Comput. Sci. 4193, 212-221 (2006)

52.45 J.E. Gallardo, C. Cotta, A.J. Fernandez: On the hybridization of memetic algorithms with branch-and-bound techniques, IEEE Trans. Syst. Man Cybern. Part B 37(1), 77-83 (2007)

52.46 J.E. Gallardo, C. Cotta, A.J. Fernandez: Reconstructing phylogenies with memetic algorithms and branch-and-bound. In: Analysis of Biological Data: A Soft Computing Approach, ed. by S. Bandyopadhyay, U. Maulik, J.T.-L. Wang (World Scientific, Singapore 2007) pp. 59-84

52.47 P. Moscato: Memetic algorithms: A short introduction. In: New Ideas in Optimization, ed. by D. Corne, M. Dorigo, F. Glover (McGraw-Hill, Maidenhead 1999) pp. 219-234

52.48 G. Reinelt: The Traveling Salesman. Computational Solutions for TSP Applications (Springer, Berlin, Heidelberg 1994)

52.49 W.-C. Yeh: A memetic algorithm for the n/2/Flowshop/αF + βCmax scheduling problem, Int. J. Adv. Manuf. Technol. 20, 464-473 (2002)

52.50 R. Varela, J. Puente, C.R. Vela, A. Gomez: A knowledge-based evolutionary strategy for scheduling problems with bottlenecks, Eur. J. Oper. Res. 145(1), 57-71 (2003)

52.51 O. Rossi-Doria, B. Paechter: A memetic algorithm for university course timetabling. In: Combinatorial Optimisation 2004 Book of Abstracts, Lancaster 2004, p. 56, ed. by Lancaster University

52.52 A.E. Eiben, P.-E. Raue, Z. Ruttkay: Genetic algorithms with multi-parent recombination, Lect. Notes Comput. Sci. 866, 78-87 (1994)

52.53 B.R. Fox, M.B. McMahon: Genetic operators for sequencing problems. In: Foundations of Genetic Algorithms I, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Mateo 1991) pp. 284-300

52.54 K. Mathias, L.D. Whitley: Genetic operators, the fitness landscape and the traveling salesman problem. In: Parallel Problem Solving From Nature II, ed. by R. Manner, B. Manderick (Elsevier Science B.V., Amsterdam 1992) pp. 221-230

52.55 D.E. Goldberg, R. Lingle Jr.: Alleles, loci and the traveling salesman problem, Proc. 1st Int. Conf. Genet. Algorithms, ed. by J.J. Grefenstette (Lawrence Erlbaum Associates, Hillsdale 1985) pp. 154-159

52.56 C. Cotta, J.M. Troya: Genetic forma recombination in permutation flowshop problems, Evol. Comput. 6(1), 25-44 (1998)

52.57 Y. Davidor: Epistasis variance: Suitability of a representation to genetic algorithms, Complex Syst. 4(4), 369-383 (1990)


52.58 Y. Davidor: Epistasis variance: A viewpoint on GA-hardness. In: Foundations of Genetic Algorithms I, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Mateo 1991) pp. 23-35

52.59 N.J. Radcliffe, P.D. Surry: Fitness variance of formae and performance prediction. In: Foundations of Genetic Algorithms III, ed. by L.D. Whitley, M.D. Vose (Morgan Kaufmann, San Francisco 1994) pp. 51-72

52.60 B. Manderick, M. de Weger, P. Spiessens: The genetic algorithm and the structure of the fitness landscape, Proc. 4th Int. Conf. Genet. Algorithms, ed. by R.K. Belew, L.B. Booker (Morgan Kaufmann, San Mateo 1991) pp. 143-150

52.61 J. Dzubera, L.D. Whitley: Advanced correlation analysis of operators for the traveling salesman problem, Lect. Notes Comput. Sci. 866, 68-77 (1994)

52.62 L.J. Fogel, A.J. Owens, M.J. Walsh: Artificial Intelligence Through Simulated Evolution (Wiley, New York 1966)

52.63 C. Cotta, A.J. Fernandez: Memetic algorithms in planning, scheduling, and timetabling. In: Evolutionary Scheduling, Studies in Computational Intelligence, Vol. 49, ed. by K. Dahal, K.C. Tan, P.I. Cowling (Springer, Berlin, Heidelberg 2007) pp. 1-30

52.64 C. Oğuz, M.F. Ercan: A genetic algorithm for hybrid flow-shop scheduling with multiprocessor tasks, J. Sched. 8, 323-351 (2005)

52.65 T. Ibaraki: Combination with dynamic programming. In: Handbook of Evolutionary Computation, ed. by T. Back, D. Fogel, Z. Michalewicz (Oxford Univ. Press, New York 1997), pp. D3.4:1-2

52.66 J.E. Gallardo, C. Cotta, A.J. Fernandez: A memetic algorithm with bucket elimination for the still life problem, Lect. Notes Comput. Sci. 3906, 73-85 (2006)

52.67 P. Moscato, C. Cotta: A gentle introduction to memetic algorithms. In: Handbook of Metaheuristics, ed. by F. Glover, G. Kochenberger (Kluwer, Boston 2003) pp. 105-144

52.68 P. Moscato, C. Cotta, A.S. Mendes: Memetic algorithms. In: New Optimization Techniques in Engineering, ed. by G.C. Onwubolu, B.V. Babu (Springer, Berlin, Heidelberg 2004) pp. 53-85

52.69 Y. Nagata, S. Kobayashi: Edge assembly crossover: A high-power genetic algorithm for the traveling salesman problem, Proc. 17th Int. Conf. Genet. Algorithms (ICGA), ed. by T. Back (Morgan Kaufmann, San Mateo 1997) pp. 450-457

52.70 T.C. Jones: Evolutionary Algorithms, Fitness Landscapes and Search, Ph.D. Thesis (University of New Mexico, Albuquerque 1995)

52.71 F. Neri, C. Cotta: A primer on memetic algorithms. In: Handbook of Memetic Algorithms, Studies in Computational Intelligence, Vol. 379, ed. by F. Neri, C. Cotta, P. Moscato (Springer, Berlin, Heidelberg 2012) pp. 43-52


52.72 F. Neri, C. Cotta: Memetic algorithms and memetic computing optimization: A literature review, Swarm Evol. Comput. 2, 1-14 (2012)

52.73 D. Sudholt: Parametrization and balancing local and global search. In: Handbook of Memetic Algorithms, Studies in Computational Intelligence, Vol. 379, ed. by F. Neri, C. Cotta, P. Moscato (Springer, Berlin, Heidelberg 2012) pp. 55-72

52.74 N. Krasnogor, B.P. Blackburne, E.K. Burke, J.D. Hirst: Multimeme algorithms for protein structure prediction, Lect. Notes Comput. Sci. 2439, 769-778 (2002)

52.75 J.E. Smith: Co-evolution of memetic algorithms: Initial investigations, Lect. Notes Comput. Sci. 2439, 537-548 (2002)

52.76 N. Krasnogor: Self generating metaheuristics in bioinformatics: The proteins structure comparison case, Genet. Program. Evol. Mach. 5(2), 181-201 (2004)

52.77 N. Krasnogor, S.M. Gustafson: A study on the use of "self-generation" in memetic algorithms, Nat. Comput. 3(1), 53-76 (2004)

52.78 J.E. Smith: Coevolving memetic algorithms: A review and progress report, IEEE Trans. Syst. Man Cybern. Part B 37(1), 6-17 (2007)

52.79 J.E. Smith: Credit assignment in adaptive memetic algorithms, GECCO '07: Proc. 9th Annu. Conf. Genet. Evol. Comput. Conf., ed. by H. Lipson (2007) pp. 1412-1419

52.80 Y.-S. Ong, A.J. Keane: Meta-Lamarckian learning in memetic algorithms, IEEE Trans. Evol. Comput. 8(2), 99-110 (2004)

52.81 H.G. Cobb: An Investigation into the Use of Hypermutation as an Adaptive Operator in Genetic Algorithms Having Continuous, Time-Dependent Nonstationary Environments. Technical Report AIC-90-001 (Naval Research Laboratory, Washington, DC 1990)

52.82 J.J. Grefenstette: Genetic algorithms for changing environments. In: Parallel Problem Solving from Nature II, ed. by R. Manner, B. Manderick (Elsevier, Amsterdam 1992) pp. 137-144

52.83 L.J. Eshelman: The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination. In: Foundations of Genetic Algorithms I, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Mateo 1991) pp. 265-283

52.84 M. Laguna, R. Marti: Scatter search. In: Methodology and Implementations in C, Operations Research/Computer Science Interfaces, Vol. 24, ed. by R. Sharda, S. Voß (Kluwer, Boston 2003)

52.85 M. Sevaux, S. Dauzère-Pérés: Genetic algorithms to minimize the weighted number of late jobs on a single machine, Eur. J. Oper. Res. 151, 296-306 (2003)

52.86 E.K. Burke, J. Newall, R. Weare: A memetic algorithm for university exam timetabling, Lect. Notes Comput. Sci. 1153, 241-250 (1996)

52.87 P.M. França, J.N.D. Gupta, A.S. Mendes, P. Moscato, K.J. Veltnik: Evolutionary algorithms for scheduling a flowshop manufacturing cell with sequence dependent family setups, Comput. Ind. Eng. 48, 491-506 (2005)

52.88 M. Tomassini: Spatially Structured Evolutionary Algorithms: Artificial Evolution in Space and Time (Springer, New York 2005)

52.89 K. Sörensen, M. Sevaux: MA|PM: Memetic algorithms with population management, Comput. Oper. Res. 33(5), 1214-1225 (2006)

52.90 O.J. Mengshoel, D.E. Goldberg: The crowding approach to niching in genetic algorithms, Evol. Comput. 16(3), 315-354 (2008)

52.91 D.E. Goldberg, J. Richardson: Genetic algorithms with sharing for multimodal function optimization, Proc. 2nd Int. Conf. Genet. Algorithms Genet. Algorithms Appl. (L. Erlbaum Associates, Hillsdale 1987) pp. 41-49

52.92 Y.-S. Ong, M.-H. Lim, X. Chen: Memetic computation - Past, present and future, IEEE Comput. Intell. Mag. 5(2), 24-31 (2010)

52.93 M. Toulouse, T.G. Crainic, B. Sansó, K. Thulasiraman: Self-organization in cooperative Tabu search algorithms, IEEE Int. Conf. Syst. Man Cybern., Vol. 3 (1998) pp. 2379-2384

52.94 M. Toulouse, T.G. Crainic, B. Sansó: Systemic behavior of cooperative search algorithms, Parallel Comput. 30(1), 57-79 (2004)

52.95 T.G. Crainic, M. Toulouse: Explicit and emergent cooperation schemes for search algorithms, Lect. Notes Comput. Sci. 5313, 95-109 (2008)

52.96 M. Toulouse, K. Thulasiraman, F. Glover: Multilevel cooperative search: A new paradigm for combinatorial optimization and an application to graph partitioning, Lect. Notes Comput. Sci. 1685, 533-542 (1999)

52.97 T.G. Crainic, M. Gendreau: Cooperative parallel tabu search for capacitated network design, J. Heuristics 8(6), 601-627 (2002)

52.98 T.G. Crainic, M. Gendreau, P. Hansen, N. Mladenović: Cooperative parallel variable neighborhood search for the p-median, J. Heuristics 10, 293-314 (2004)

52.99 D. Pelta, C. Cruz, A. Sancho-Royo, J. Verdegay: Using memory and fuzzy rules in a co-operative multi-thread strategy for optimization, Inf. Sci. 176, 1849-1868 (2006)

52.100 C. Cruz, D. Pelta: Soft computing and cooperative strategies for optimization, Appl. Soft Comput. 9(1), 30-38 (2009)

52.101 A. LeBouthillier, T.G. Crainic: A cooperative parallel meta-heuristic for the vehicle routing problem with time windows, Comput. Oper. Res. 32(7), 1685-1708 (2005)

52.102 D. Barbucha: Synchronous vs. asynchronous cooperative approach to solving the vehicle routing problem, Lect. Notes Comput. Sci. 6421, 403-412 (2010)




52.103 K.S. Leung, I. King, Y.B. Wong: A probabilistic cooperative-competitive hierarchical model for global optimization, Appl. Math. Comput. 175(2), 1092-1124 (2006)

52.104 S.T. Barnard, H.D. Simon: Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, Concurr. Pract. Exp. 6(2), 101-117 (1994)

52.105 C. Walshaw: A multilevel approach to the travelling salesman problem, Oper. Res. 50(5), 862-877 (2002)

52.106 L. Hulianytskyi, S. Sirenko: Cooperative model-based metaheuristics, Electron. Notes Discret. Math. 36, 33-40 (2010)

52.107 J. Amaya, C. Cotta, A.J. Fernandez-Leiva: Memetic cooperative models for the tool switching problem, Memetic Comput. 3, 199-216 (2011)

52.108 P. Moscato, C. Cotta: A modern introduction to memetic algorithms. In: Handbook of Metaheuristics, International Series in Operations Research and Management Science, Vol. 146, ed. by M. Gendreau, J.Y. Potvin (Springer, New York, Dordrecht, Heidelberg, London 2010) pp. 141-183

52.109 F. Neri, C. Cotta, P. Moscato: Handbook of Memetic Algorithms, Studies in Computational Intelligence, Vol. 379 (Springer, Berlin, Heidelberg 2012)

52.110 J.-K. Hao: Memetic algorithms in discrete optimization. In: Handbook of Memetic Algorithms, Studies in Computational Intelligence, Vol. 379, ed. by F. Neri, C. Cotta, P. Moscato (Springer, Berlin, Heidelberg 2012) pp. 73-95

52.111 P. Merz: Memetic algorithms and fitness landscapes in combinatorial optimization. In: Handbook of Memetic Algorithms, Studies in Computational Intelligence, Vol. 379, ed. by F. Neri, C. Cotta, P. Moscato (Springer, Berlin, Heidelberg 2012) pp. 96-122


53. Design of Representations and Search Operators 


Franz Rothlauf 


Successful and efficient use of evolutionary algo- 
rithms depends on the choice of genotypes and 
the representation — that is, the mapping from 
genotype to phenotype — and on the choice of 
search operators that are applied to the genotypes. 
These choices cannot be made independently of 
each other. This chapter gives recommendations 
on the design of representations and correspond- 
ing search operators and discusses how to consider 
problem-specific knowledge. For most problems in 
the real world, similar solutions have similar fitness 
values. This fact can be exploited by evolutionary 
algorithms if they ensure that the representa- 
tions and search operators used are defined in 
such a way that similarities between phenotypes 
correspond to similarities between genotypes. 
Furthermore, the performance of evolutionary al- 
gorithms can be increased by problem-specific 
knowledge. We discuss how properties of high- 
quality solutions can be exploited by biasing 
representations and search operators. 


53.1 Representations 


Successful and efficient use of evolutionary algorithms 
(EA) and other types of modern heuristics [53.1, 2] 
depends on the choice of genotypes and the repre- 
sentation — that is, the mapping from genotype to 
phenotype — and on the choice of search operators that 
are applied to the genotypes. These choices cannot be 
made independently of each other [53.2]. The ques- 
tion whether a certain representation leads to a better 
performing EA than an alternative representation can 
only be answered when the operators applied are taken 
into account. The reverse is also true: deciding between 
alternative operators is only meaningful for a given 
representation. 

53.1 Representations
53.1.1 Genotypes and Phenotypes
53.1.2 Genotype and Phenotype Search Spaces
53.1.3 Benefits of Representations
53.1.4 Standard Genotypes
53.2 Search Operators
53.2.1 General Design Guidelines
53.2.2 Local Search Operators
53.2.3 Recombination Operators
53.2.4 Direct Representations
53.2.5 Standard Search Operators
53.3 Problem-Specific Design of Representations and Search Operators
53.3.1 High Locality
53.3.2 Biasing Search
53.4 Summary and Conclusions
References

In practice, one can distinguish two complementary
approaches to the design of representations and
search operators [53.3]. The first approach defines
representations (also known as decoders or indirect 
representations) where a solution is encoded in a stan- 
dard data structure, such as strings or vectors, and 
applies standard off-the-shelf search operators to these 
genotypes. To evaluate a solution, the genotype needs 
to be mapped to the phenotype space. The proper 
choice of this genotype-phenotype mapping is impor- 
tant for the performance of the search process. The 
second approach encodes solutions to the problem in 
its most natural problem space and designs search op- 
erators to operate on this search space. In this case, 
often no additional mapping between genotypes and 
phenotypes is necessary, but domain-specific search op- 
erators need to be defined. The resulting combination
of representation and operator is often called direct
representation. 

This section focuses on representations. It intro- 
duces genotypes and phenotypes (Sect. 53.1.1) and 
discusses properties of the resulting genotype and phe- 
notype space (Sect. 53.1.2). Section 53.1.3 lists the 
benefits of using (indirect) representations. Finally, 
Sect. 53.1.4 gives an overview of standard genotypes. 


53.1.1 Genotypes and Phenotypes 


In 1866, Mendel recognized that nature stores the com- 
plete genetic information for an individual in pairwise 
alleles [53.4]. The genetic information that determines 
the properties, appearance, and shape of an individual 
is stored by a number of strings. Later, it was discov- 
ered that the genetic information is formed by a double 
string of four nucleotides, called DNA (deoxyribonu- 
cleic acid). 

Mendel found that nature distinguishes between the 
genetic code of an individual and its outward appear- 
ance. The genotype represents all the information stored 
in the chromosomes and allows us to describe an indi- 
vidual on the level of genes. The phenotype describes 
the outward appearance of an individual. A transfor- 
mation exists — a genotype-phenotype mapping or 
a representation — that uses the genotype information 
to construct the phenotype. To represent the large num- 
ber of possible phenotypes with only four nucleotides, 
the genotype information is not stored in the allele it- 
self, but in the sequence of alleles. By interpreting the 
sequence of alleles, nature can encode a large number 
of different phenotypes using only a few different types 
of alleles. 

Figure 53.1 illustrates the differences between chromosome,
gene, and allele. A chromosome is a string of some length l
where all the genetic information of an individual is stored.
Although nature often uses more than one chromosome, many EAs
use only one chromosome for encoding all phenotype information.
Each chromosome consists of many alleles. Alleles are the
smallest information units in a chromosome. In nature, alleles
exist pairwise, whereas in most EA implementations an allele is
represented by only one symbol. For example, binary genotypes
only have alleles with value zero or one. If a phenotypic
property of an individual (solution), like its hair color or
eye size, is determined by one or more alleles, then these
alleles together are called a gene. A gene is a region on
a chromosome that must be interpreted together and which is
responsible for a specific property of a phenotype.

Fig. 53.1 Alleles, genes, and chromosomes (a binary chromosome with one allele and one gene labeled)

We must carefully distinguish between genotypes 
and phenotypes. The phenotypic appearance of a so- 
lution determines its objective value. Therefore, when 
comparing the quality of different solutions, we must 
judge them on the phenotype level. However, when it 
comes to the application of variation operators we must 
view solutions on the genotype level. New solutions that 
are created using variation operators do not inherit the
phenotypic properties of their parents, but only the genotype
information regarding the phenotypic properties.
Therefore, search operators work on the genotype level, 
whereas the evaluation of the solutions is performed on 
the phenotype level. 

Formally, we define $\Phi_g$ as the genotype space where
the variation operators are applied. An optimization
problem on $\Phi_g$ could be formulated as
$f(x): \Phi_g \to \mathbb{R}$, where $f$ assigns an element
(fitness value) in $\mathbb{R}$ to every element in the
genotype space $\Phi_g$. A maximization problem is defined
as finding the optimal solution
$x^* = \{x \in \Phi_g \mid \forall y \in \Phi_g: f(y) \le f(x)\}$,
where $x$ is usually a vector or string of decision variables
(alleles) and $f(x)$ is the objective or fitness function.
$x^*$ is the global maximum. To be able to apply EAs to
a problem, the inverse function $f^{-1}$ does not need to
exist.
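
As a minimal sketch of this definition, the following Python snippet enumerates a small binary genotype space and collects every genotype whose fitness is not exceeded by any other. The function names `global_maxima` and `onemax`, and the choice of a bit-counting fitness, are illustrative assumptions and not part of the chapter; exhaustive enumeration is only feasible for very short genotypes.

```python
from itertools import product
from typing import Callable, List, Tuple

Genotype = Tuple[int, ...]


def global_maxima(f: Callable[[Genotype], float], length: int) -> List[Genotype]:
    """Return all x in {0,1}^length with f(y) <= f(x) for every y in the space."""
    space = list(product((0, 1), repeat=length))  # the full genotype space
    best = max(f(x) for x in space)               # largest fitness value found
    return [x for x in space if f(x) == best]


def onemax(x: Genotype) -> float:
    """Hypothetical fitness function: the number of ones in the genotype."""
    return float(sum(x))


if __name__ == "__main__":
    print(global_maxima(onemax, 3))
```

Note that only evaluations of f are needed; nothing in the sketch requires inverting f.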


53.1.2 Genotype 
and Phenotype Search Spaces 


When using a representation, we have to define, in analogy
to nature, genotypes and a genotype-phenotype mapping
[53.5,6]. Therefore, the fitness function $f$ can be
decomposed into two parts: $f_g$ maps the genotype space
$\Phi_g$ to the phenotype space $\Phi_p$, and $f_p$ maps
$\Phi_p$ to the fitness space $\mathbb{R}$:

$$f_g(x^g): \Phi_g \to \Phi_p, \qquad f_p(x^p): \Phi_p \to \mathbb{R}, \tag{53.1}$$

where $f = f_p \circ f_g = f_p(f_g(x^g))$. The
genotype-phenotype mapping $f_g$ is determined by the type of
genotype used. $f_p$ represents the fitness function and
assigns a fitness value $f_p(x^p)$ to each solution
$x^p \in \Phi_p$. The search operators are applied to the
genotypes [53.7, 8].
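
The decomposition of the fitness function into a genotype-phenotype mapping and a phenotype fitness can be sketched as follows. This is an illustration under assumptions: the genotype is taken to be a bitstring decoded as an unsigned integer, and the phenotype fitness (a quadratic with its peak at 5) is hypothetical.

```python
from typing import Sequence


def f_g(genotype: Sequence[int]) -> int:
    """Genotype-phenotype mapping: decode bits (most significant first) to an integer."""
    value = 0
    for bit in genotype:
        value = 2 * value + bit
    return value


def f_p(phenotype: int) -> float:
    """Hypothetical phenotype fitness: largest (zero) at the integer 5."""
    return -float((phenotype - 5) ** 2)


def f(genotype: Sequence[int]) -> float:
    """Overall fitness as the composition of f_p and f_g."""
    return f_p(f_g(genotype))


if __name__ == "__main__":
    print(f_g([1, 0, 1]), f([1, 0, 1]), f([1, 1, 1]))
```

Search operators would modify the list of bits, while the quality of a solution is judged only through the decoded integer.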




The search space describes the set of feasible so- 
lutions of an optimization problem and defines rela- 
tionships (for example, distances) between solutions. 
A metric defined on a search space can be used for 
measuring similarities between solutions [53.2]. Usually,
defining a search space $\Phi$ also defines a metric.
Using a metric, the distance $d(x, y)$ between two solutions
$x, y \in \Phi$ measures how different the two solutions
are. The larger the distance, the more different two 
individuals are with respect to the metric used. In 
principle, different metrics can be used for the same 
search space. Different metrics result in different dis- 
tances and different measurements for the similarity of 
solutions. 

In metric search spaces, the similarities between solutions
are measured by a distance. Therefore, we have a set $X$ of
solutions and a real-valued distance function (also called
a metric) $d: X \times X \to \mathbb{R}$ that assigns
a real-valued distance to any combination of two elements
$x, y \in X$.

An example of a metric space is the set of real numbers
$\mathbb{R}$. Here, a metric can be defined by
$d(x, y) := |x - y|$. Therefore, the distance between any
solutions $x, y \in \mathbb{R}$ is the absolute value of
their difference. Extending this definition to
two-dimensional spaces $\mathbb{R}^2$, we obtain the
city-block metric (also known as the taxicab metric or the
Manhattan distance). It is defined for two-dimensional spaces
as $d(x, y) := |x_1 - y_1| + |x_2 - y_2|$, where
$x = (x_1, x_2)$ and $y = (y_1, y_2)$. This metric is named
the city-block metric as it describes the distance between
two points on a two-dimensional plane in a city like
Manhattan or Mannheim with a rectangular ground plan. On
$n$-dimensional search spaces $\mathbb{R}^n$, the city-block
metric becomes $d(x, y) := \sum_{i=1}^{n} |x_i - y_i|$,
where $x, y \in \mathbb{R}^n$.

Another example of a metric that can be defined on
$\mathbb{R}^n$ is the Euclidean metric. In Euclidean spaces,
a solution $x = (x_1, \dots, x_n)$ is a vector of continuous
values ($x_i \in \mathbb{R}$). The Euclidean distance between
two solutions $x$ and $y$ is defined as
$d(x, y) := \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.
For $n = 1$, the Euclidean metric coincides with the
city-block metric. For $n = 2$, we have a standard
two-dimensional search space and the distance between two
elements $x, y \in \mathbb{R}^2$ is just the length of the
direct line between two points on a two-dimensional plane.
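
The city-block and Euclidean metrics described above can be written down directly. The sketch below is illustrative only; the function names `city_block` and `euclidean` are assumptions and not taken from the chapter.

```python
import math
from typing import Sequence


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    """City-block (Manhattan) distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("both solutions must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("both solutions must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    print(city_block((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))
    print(city_block((2,), (5,)), euclidean((2,), (5,)))
```

For the one-dimensional pair both functions return the same value, which mirrors the statement that the two metrics coincide for n = 1.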

If we assume that we have a binary space
($x \in \{0, 1\}^n$), a commonly used metric is the binary
Hamming metric [53.9] $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$,
where $d(x, y) \in \{0, \dots, n\}$. The binary Hamming
distance between two binary vectors $x$ and $y$ of length $n$
is just the number of binary decision variables on which $x$
and $y$ differ. For continuous and discrete decision
variables, it becomes $d(x, y) = \sum_{i=1}^{n} z_i$, where

$$z_i = \begin{cases} 0, & \text{for } x_i = y_i, \\ 1, & \text{for } x_i \neq y_i. \end{cases} \tag{53.2}$$


In general, the Hamming distance measures the number 
of decision variables on which x and y differ. Two in- 
dividuals are neighbors if the distance between them is 
minimal. For the binary Hamming metric, the minimal 
distance between two individuals is dmin = 1. There- 
fore, two individuals x and y are neighbors if their 
distance d(x, y) = 1. 
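
A short sketch of the Hamming distance and of the resulting neighborhood of a binary genotype follows. The helper names `hamming` and `binary_neighbors` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def hamming(x: Sequence, y: Sequence) -> int:
    """Number of positions (decision variables) on which x and y differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs solutions of equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


def binary_neighbors(x: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """All binary vectors at Hamming distance 1 from x (one bit flipped each)."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]


if __name__ == "__main__":
    x = (1, 0, 1)
    for neighbor in binary_neighbors(x):
        print(neighbor, hamming(x, neighbor))
```

A binary vector of length n has exactly n such neighbors, one per decision variable.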

Using the Euclidean or the Hamming metric only 
makes sense for measuring distances between solutions 
of the same length n. If solutions have different lengths, 
the Levenshtein distance (or edit distance) [53.10] can 
be used. This distance counts the minimum number 
of insertion, deletion, or substitution operations that 
transform one solution into the other. The Levenshtein 
distance between two solutions can be calculated with 
polynomial effort using dynamic programming [53.11]. 
For solutions of equal length, the Levenshtein distance is
at most the Hamming distance; the two coincide when only
substitutions are allowed.
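
The dynamic-programming computation of the Levenshtein distance mentioned above can be sketched as follows. This is the standard row-by-row formulation with unit costs for insertion, deletion, and substitution; the function name `levenshtein` is an illustrative choice.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning i items of a into an empty prefix costs i deletions
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein((1, 0, 1), (1, 0, 1, 1)))
```

The two nested loops visit each pair of positions once, so the effort grows with the product of the two solution lengths.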

Using a representation f_g, we obtain two different
search spaces, Φ_g and Φ_p. Therefore, different
metrics can be defined for the phenotype and the
genotype space. The metric used on the phenotype
search space Φ_p is usually determined by the specific
problem to be solved and describes which problem
solutions are similar to each other. Examples of common
phenotypes and corresponding metrics are given
in Sect. 53.2.5. In contrast, the metric defined on Φ_g
is not defined by the specific problem but can be
defined by the search operators selected for the
optimization method. As we can define different types of
genotypes to represent the phenotypes, we are able to
define different metrics on Φ_g. However, if the metrics
on Φ_p and Φ_g are different, different neighborhoods
can exist on Φ_p and Φ_g. For example, when encoding
phenotype integers using genotype bitstrings, the
phenotype x^p = 5 has two neighbors, y^p = 6 and z^p = 4.
When using the Hamming metric and binary genotypes,
the corresponding binary string x^g = 101 has
three different neighbors, y^g = 001, z^g = 111, and
w^g = 100 [53.12].
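
The mismatch between the two neighborhoods can be reproduced with a few lines of code. The sketch below assumes standard binary decoding (so that 101 corresponds to 5, as in the example); the helper names are illustrative only.

```python
def phenotype_neighbors(x):
    """Integer neighbors at distance 1: x - 1 and x + 1."""
    return {x - 1, x + 1}


def genotype_neighbors(bits):
    """All bitstrings at Hamming distance 1 from `bits`."""
    flip = {"0": "1", "1": "0"}
    return {bits[:i] + flip[b] + bits[i + 1:] for i, b in enumerate(bits)}


def decode(bits):
    """Standard binary decoding of a bitstring to an integer."""
    return int(bits, 2)


x_g = "101"
x_p = decode(x_g)                                         # 5
decoded = {decode(g) for g in genotype_neighbors(x_g)}

print(sorted(phenotype_neighbors(x_p)))   # [4, 6]
print(sorted(genotype_neighbors(x_g)))    # ['001', '100', '111']
print(sorted(decoded))                    # [1, 4, 7]
```

Only the phenotype 4 is a neighbor in both spaces; the phenotype neighbor 6 (binary 110) is two bit flips away from 101.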

Therefore, the metric on the genotype space should 
be chosen such that it fits the metric on the phenotype 
space well. A representation introduces an additional
genotype-phenotype mapping and thus modifies the fit.




When designing optimization methods, we have to en- 
sure that the metric on the genotype search space fits the 
original problem metric. We should choose the geno- 
type metric in such a way that phenotypic neighbors 
remain neighbors in the genotype search space. Repre- 
sentations that ensure that neighboring phenotypes are 
also neighboring genotypes are called high-locality rep- 
resentations (Sect. 53.3.1). 


53.1.3 Benefits of Representations 


In principle, a representation is not necessary for the 
application of EAs as search operators may also be 
directly applied to phenotypes. However, the use of 
an additional genotype-phenotype mapping has some 
benefits: 


• The use of representations is necessary for problems
where a phenotype cannot be depicted as
a string or in another way that is accessible to
variation operators. A representative example is the
shape of an object, for example the wing of an airplane.
EAs that are used to find the optimal shape
usually require a representation as the direct application
of search operators to the shape of a wing is
difficult. Therefore, additional genotype-phenotype
mappings are used and variation operators are applied
to genotypes that indirectly determine the
shape.

• The introduction of a representation can be useful
if there are constraints or restrictions on the phenotype
space that can be advantageously modeled by
a specific encoding. An example is a tree problem
where the optimal solution is a star. Instead of
applying search operators directly to trees, we can
introduce genotypes that only encode stars, resulting
in a much smaller search space.

• The use of the same genotypes for different types
of problems, and only interpreting them differently
by using a different genotype-phenotype mapping,
allows us to use standard search operators
(Sect. 53.2.5) with known properties. In this case,
we do not need to develop new operators with
unknown properties and behavior.

• Finally, using an additional genotype-phenotype
mapping can change the difficulty of a problem.
A representation can reduce the difficulty of the
problem and make it easier to solve for a particular
optimization method. However, usually the definition
of a proper representation is difficult and
problem specific.


53.1.4 Standard Genotypes 


We characterize some of the most important and widely 
used genotypes. For a more detailed overview of differ- 
ent types of genotypes, we refer to [53.13, Sect. C1]. 


Binary Genotypes 
Binary genotypes are commonly used in genetic
algorithms [53.14, 15]. Such EA types use recombination
as the main search operator and mutation only
serves as background noise. A typical search space is
Φ_g = {0, 1}^l, where l is the length of a binary vector
x^g = (x^g_1, ..., x^g_l). The genotype-phenotype mapping f_g
depends on the specific optimization problem to be
solved. For many combinatorial optimization problems,
using binary genotypes allows a direct and very natural
encoding.

When using binary genotypes for encoding integer
phenotypes, specific genotype-phenotype mappings are
necessary. Different types of binary representations for
integers assign the integers x^p ∈ Φ_p (phenotypes) in
different ways to the binary vectors x^g ∈ Φ_g (genotypes).
The most common binary genotype-phenotype mappings
are binary, Gray, and unary encoding [53.3, 16,
Chap. 5].
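
As a rough illustration of the three mappings, the following sketch encodes an integer x^p with l bits. The function names are hypothetical. The Gray variant is the binary-reflected Gray code, and the unary variant returns only one canonical string (ones first), on the assumption that the phenotype is the number of ones in the genotype.

```python
def to_binary(x, l):
    """Standard binary code of x with l bits, most significant bit first."""
    return format(x, f"0{l}b")


def to_gray(x, l):
    """Binary-reflected Gray code of x with l bits."""
    return format(x ^ (x >> 1), f"0{l}b")


def to_unary(x, l):
    """One canonical unary genotype: x ones followed by l - x zeros."""
    return "1" * x + "0" * (l - x)


def from_unary(bits):
    """Unary decoding: the phenotype is the number of ones."""
    return bits.count("1")


for x in range(8):
    print(x, to_binary(x, 3), to_gray(x, 3), to_unary(x, 7))
```

The phenotype neighbors 3 and 4 differ in all three bits under binary encoding (011 and 100) but in a single bit under Gray encoding (010 and 110).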

When using binary genotypes to encode continu- 
ous phenotypes, the accuracy (precision) depends on 
the number of bits that represent one phenotype vari- 
able. By increasing the number of bits that are used to 
represent one continuous variable the accuracy of the 
representation can be increased. 
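
The dependence of precision on the number of bits can be made concrete with a simple linear mapping. This is a sketch under assumptions: the interval bounds `lower` and `upper` and the linear decoding are one common choice, not something prescribed by the text.

```python
def decode_real(bits, lower, upper):
    """Map a bitstring linearly onto the interval [lower, upper]."""
    return lower + int(bits, 2) * (upper - lower) / (2 ** len(bits) - 1)


def resolution(l, lower, upper):
    """Spacing between adjacent representable values with l bits."""
    return (upper - lower) / (2 ** l - 1)


for l in (4, 8, 16):
    print(l, resolution(l, 0.0, 1.0))
```

Each additional bit roughly halves the spacing between representable values of the phenotype variable.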


Integer Genotypes 
Instead of using binary strings with cardinality χ = 2,
higher χ-ary alphabets, where {χ ∈ N | χ > 2}, can
also be used for the genotypes. Then, instead of a binary
alphabet, a χ-ary alphabet is used for a string of
length l. Instead of encoding 2^l different individuals
with a binary alphabet, we are able to encode χ^l different
possibilities. The size of the search space increases
from |Φ_g| = 2^l to |Φ_g| = χ^l.

For integer problems, users sometimes prefer to use 
binary instead of integer genotypes because schema 
processing is maximally efficient with binary alphabets 
when using standard recombination operators in genetic 
algorithms [53.17]. Goldberg [53.17] qualified this rec- 
ommendation and emphasized that the alphabet used 
in the encoding should be as small as possible while 
still allowing a natural representation of solutions. To 
give general recommendations is difficult, as users often 
do not know a priori whether binary genotypes allow
a natural encoding of integer phenotypes [53.18, 19].
We recommend that users use binary genotypes for en- 
coding binary decision variables and integer genotypes 
for integer decision variables. 


Continuous Genotypes 
When using continuous genotypes, the search space
is Φ_g = R^l, where l is the size of a real-valued
string or vector. Continuous genotypes are often 
used in local search methods like evolution strate- 
gies or evolutionary programming. These types of 
optimization methods mainly rely on local search 
and search through the search space by adding 
a multivariate zero-mean Gaussian random variable 
to each continuous variable. In contrast, when us- 
ing recombination-based genetic algorithms, continu- 
ous decision variables are often represented by using 
binary genotypes. 

Continuous genotypes can be used not only for encoding
continuous problems, but also for permutation
and combinatorial problems. Trees, schedules, tours, or
other combinatorial problems can easily be represented
by using continuous genotypes and special
genotype-phenotype mappings (for an example, see weighted
encodings for trees [53.20, 21]).


Messy Representations

In all previously discussed genotypes, the position of
each allele is fixed along the chromosome and only
the corresponding value is specified. A first
gene-independent genotype was proposed in [53.22], where
an inversion operator changes the relative order of the
alleles in the string. The position of an allele and the
corresponding value are coded together as a tuple in
a string. This concept can be used for all types of
genotypes such as binary, integer, and real-valued alleles,
and allows an encoding which is independent of
the position of the alleles in the chromosome. Later,
Goldberg et al. [53.23] used this position-independent
representation for the messy genetic algorithm.


53.2 Search Operators


This section distinguishes between standard search
operators, which are applied to genotypes, and
problem-specific search operators that can also be applied
to phenotypes (often called direct representations).
We start with an overview of general design guidelines.
Sections 53.2.2 and 53.2.3 discuss local and
recombination-based search operators. In Sect. 53.2.4,
we focus on direct representations, where search operators
are directly applied to phenotypes and no
explicit genotype-phenotype mapping exists. Finally,
Sect. 53.2.5 gives an overview of standard search
operators.


53.2.1 General Design Guidelines


During the 1990s, Radcliffe developed guidelines for
the design of search operators. It is important for search
operators that the representation used is taken into account
as search operators are based on the metric that
is defined on the genotype space. Radcliffe introduced
the principle of formae, which are subsets of the search
space [53.24-29]. Formae are defined as equivalence
classes that are induced by a set of equivalence relations.
Any possible solution of an optimization problem
can be identified by specifying the equivalence class
to which it belongs for each of the equivalence
relations. For example, if we have a search space of
faces [53.30], basic equivalence relations might be 
same hair color or same eye color, which would in- 
duce the formae red hair, dark hair, green eyes, etc. 
Formae of higher order like red hair and green eyes 
are then constructed by composing simple formae. The 
search space, which includes all possible faces, can be 
constructed with strings of alleles that represent the dif- 
ferent formae. For the definition of formae, the structure 
of the phenotypes is relevant. For example, for binary 
problems, possible formae would be bit i is equal to 
one/zero. 

It is an unsolved problem to find appropriate equivalences
for a particular problem. From the equivalences,
the genotype search space Φ_g and the
genotype-phenotype mapping f_g can be constructed. Usually,
a solution is encoded as a string of alleles. The
value of an allele indicates whether the solution sat- 
isfies a particular equivalence. Radcliffe [53.25] pro- 
posed several design guidelines for creating appro- 
priate equivalences for a given problem. The most 
important one is that the generated formae should 
group together solutions of related fitness [53.28], 
in order to create a fitness landscape or structure 
of the search space that can be exploited by search 
operators. 




Radcliffe recognized that the genotype search space,
the genotype-phenotype mapping, and the search operators
belong together, and their design cannot be
separated from each other [53.26]. He assumed that 
search operators create offspring solutions from a set 
of parent solutions. For the development of appropriate 
search operators that are based on predefined formae, he 
formulated the following four design principles [53.25, 
29]: 


• Respect: offspring produced by recombination
should be members of all formae to which both their
parents belong. For the face example this means that
offspring should have red hair and green eyes if both
parents have red hair and green eyes.

• Transmission: an offspring should be equivalent to
at least one of its parents under each of the basic
equivalence relations. This means that every gene
should be set to an allele which is taken from one
of the parents. If one parent has dark hair and the
other red hair, then the offspring has either dark or
red hair.

• Assortment: an offspring can be formed with any
compatible characteristics taken from the parents.
Assortment is necessary as some combinations
of equivalence relations may be infeasible. This
means, for example, that the offspring inherits dark
hair from the first parent and blue eyes from the
second parent only if dark hair and blue eyes are
compatible. Otherwise, the alleles are set to feasible
values taken from a random parent.

• Ergodicity: an iterative use of search operators allows
us to reach any point in the search space from
all possible starting solutions.
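
The first two principles can be checked mechanically for the face example. The sketch below is a simplified reading: faces are dictionaries with the hypothetical genes `hair` and `eyes`, formae are restricted to single-gene equivalences, and the gene-wise recombination operator is an illustrative choice, not Radcliffe's own operator.

```python
import random


def uniform_recombine(p1, p2, rng):
    """Gene-wise choice from one of two parents (dicts with equal keys)."""
    return {gene: rng.choice((p1[gene], p2[gene])) for gene in p1}


def respects(child, p1, p2):
    """Respect: every allele shared by both parents appears in the child."""
    return all(child[g] == p1[g] for g in p1 if p1[g] == p2[g])


def transmits(child, p1, p2):
    """Transmission: every allele of the child comes from some parent."""
    return all(child[g] in (p1[g], p2[g]) for g in child)


rng = random.Random(0)
mother = {"hair": "red", "eyes": "green"}
father = {"hair": "dark", "eyes": "green"}
child = uniform_recombine(mother, father, rng)
print(child, respects(child, mother, father), transmits(child, mother, father))
```

Because every gene is copied from one of the parents, this operator transmits by construction, and shared alleles (green eyes) are always preserved.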


Radcliffe developed a consistent concept of how
to design efficient EAs once appropriate equivalence
classes (formae) are defined. However, finding
appropriate equivalence classes, which is equivalent
to either defining the genotype search space and
the genotype-phenotype mapping or defining appropriate
direct search operators on the phenotypes, is often
difficult and remains an unsolved problem.

As long as the genotypes are either binary, inte- 
ger, or real-valued strings, standard recombination and 
mutation operators can be used. The situation is differ- 
ent if direct representations (Sect. 53.2.4) are used for 
problems whose phenotypes are not binary, integer, or 
real-valued. Specialized operators are necessary that allow
offspring to inherit important properties from their
parents [53.24, 25, 27, 31]. In general, these operators
are problem-specific and must be developed separately
for every optimization problem.


53.2.2 Local Search Operators 


Local search and the use of local search operators 
are at the core of EAs. The goal of local search is 
to find fitter individuals by performing neighborhood 
search [53.32]. Usually, a local search operator creates 
offspring that have a small or sometimes even mini- 
mal distance to their parents. Therefore, local search 
operators and the metric on the corresponding search 
space cannot be decided independently of each other 
but determine each other. A metric defines possible
local search operators and a local search operator
determines the metric. As search operators are applied to
the genotypes, the metric on Φ_g is relevant for the
definition of local search operators.

The basic idea behind using local search operators 
is that the structure of a fitness landscape should guide 
a search heuristic to high-quality solutions [53.33], and 
that good solutions can be found by performing small 
iterated changes. We assume that in most real-world 
problems high-quality solutions are not isolated in the 
search space but grouped together [53.34, 35]. There- 
fore, better solutions can be found by searching in the 
neighborhood of already found good solutions. The 
search steps must be small because too large search 
steps would result in randomization of the search, and 
guided search around good solutions would become 
impossible. In contrast, when using search operators 
that perform large steps in the search space it would 
not be possible to find better solutions by searching 
around already found good solutions but the search al- 
gorithm would jump randomly around the search space 
(Sect. 53.3.1). 

The following paragraphs review some common lo- 
cal search operators for binary, integer, and continuous 
genotypes and illustrate how they are designed based on 
the underlying metric. The local search operators (and 
underlying metrics) are commonly used and are usu- 
ally a good choice. However, in principle, we are free to 
choose other metrics and to define corresponding search 
operators. Then, the metric should be chosen such that 
high-quality solutions are neighboring solutions and the 
resulting fitness landscape leads guided search methods 
to an optimal solution. The choice of a proper metric 
and corresponding search operators are always problem 
specific and the ultimate goal is to choose a metric such 
that the problem becomes easy for EAs. However, we
want to emphasize that for most practical applications
the illustrated search operators are a good choice and
allow us to design efficient and effective EAs.


Binary Genotypes 
When using binary genotypes, the distance between
two solutions x, y ∈ {0, 1}^l is often measured using the
Hamming distance. Local search operators based on
this metric generate new solutions with the Hamming
distance d(x, y) = 1. This type of search operator is
also known as a standard mutation operator for binary
strings or a bit-flipping operator. As each binary solution
of length l has l neighbors, this search operator can
create l different offspring. For example, applying the
bit-flipping operator to (0, 0, 0, 0) can result in four
different offspring (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and
(0, 0, 0, 1).
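
A minimal sketch of the bit-flipping operator follows. Solutions are represented as tuples of 0 and 1; the function names are illustrative.

```python
import random


def bit_flip_neighbors(x):
    """All solutions at Hamming distance 1 from the binary tuple x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]


def bit_flip(x, rng):
    """Local search step: flip one randomly chosen bit."""
    return rng.choice(bit_flip_neighbors(x))


print(bit_flip_neighbors((0, 0, 0, 0)))
# [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(bit_flip((0, 0, 0, 0), random.Random(0)))
```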

Reeves [53.36] proposed another local search operator
for binary strings based on a different neighborhood
definition: for a randomly chosen k ∈ {0, ..., l},
it complements the bits x_k, ..., x_l. Again, each solution
has l neighbors. For example, applying this
search operator to (0, 0, 0, 0) can result in (1, 1, 1, 1),
(0, 1, 1, 1), (0, 0, 1, 1), or (0, 0, 0, 1). Although the operator
is of minor practical importance, it has some
interesting theoretical properties. First, it is closely related
to one-point crossover (see
below) as it chooses a random point and inverts all
x_i with i ≥ k. Therefore, it has also been called the
complementary crossover operator. Second, if all genotypes
are encoded using Gray code [53.37, 38], the
neighbors of a solution in the Gray-coded search space
using Hamming distance are identical to the neighbors
in the original binary-coded search space using the
complementary crossover operator. Therefore, Hamming
distances between Gray encoded solutions are
equivalent to the distances between the original binary
encoded solutions using the metric induced by the
complementary crossover operator (neighboring solutions
have distance one). For more information regarding the
equivalence of different neighborhood definitions and
search operators we refer to the literature [53.36, 39,
40].
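
The stated equivalence between Gray-coded bit flips and the complementary crossover operator can be verified exhaustively for short strings. The sketch assumes the binary-reflected Gray code and uses 0-based positions k = 0, ..., l - 1, which yields exactly l neighbors per solution; both are choices made for this illustration.

```python
from itertools import product


def suffix_complement_neighbors(x):
    """Complementary crossover: complement all bits from position k on."""
    return {x[:k] + tuple(1 - b for b in x[k:]) for k in range(len(x))}


def bit_flip_neighbors(x):
    """All solutions at Hamming distance 1 from x."""
    return {x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))}


def binary_to_gray(b):
    """Binary-reflected Gray code, most significant bit first."""
    return (b[0],) + tuple(b[i - 1] ^ b[i] for i in range(1, len(b)))


def gray_to_binary(g):
    """Inverse mapping: each binary bit is the XOR of the Gray prefix."""
    b = [g[0]]
    for bit in g[1:]:
        b.append(b[-1] ^ bit)
    return tuple(b)


for x in product((0, 1), repeat=4):
    flipped = bit_flip_neighbors(binary_to_gray(x))
    via_gray = {gray_to_binary(g) for g in flipped}
    assert via_gray == suffix_complement_neighbors(x)

print(sorted(suffix_complement_neighbors((0, 0, 0, 0))))
# [(0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
```

The check works because flipping Gray bit k changes the running XOR from position k onward, which complements exactly the binary suffix starting at k.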


Integer Genotypes 
For integer genotypes, different metrics are common,
leading to different local search operators. When using
the binary Hamming metric, two individuals are
neighbors if they differ in one decision variable. Search
operators based on this metric can assign a random
value to a randomly chosen allele. Therefore, each solution
x ∈ {0, ..., k}^l has lk neighbors. For example,
x = (0, 0) with x_i ∈ {0, 1, 2} has four different neighbors
((1, 0), (2, 0), (0, 1), and (0, 2)).

The situation changes when defining local search
operators based on the city-block metric. Then, a local
search operator can create new solutions by slightly
increasing or decreasing one randomly chosen decision
variable. For example, new solutions are generated
by adding ±1 to a randomly chosen variable
x_i. Each solution of length l has a maximum
of 2l different neighbors. For example, x = (0, 0) with
x_i ∈ {0, 1, 2} has only two different neighbors (0, 1)
and (1, 0).

Finally, we can define search operators such that
they do not modify values of decision variables but
exchange values of two decision variables x_i and x_j.
Therefore, using the Hamming distance, two neighbors
have distance d = 2 and each solution has a maximum
of (l choose 2) = l(l-1)/2 different neighbors. For example,
x = (3, 5, 2) has three different neighbors ((5, 3, 2),
(2, 5, 3), and (3, 2, 5)).
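
The three neighborhood definitions for integer genotypes can be compared directly. The following sketch uses tuples over the alphabet {0, ..., k}; the function names are chosen for this illustration.

```python
from itertools import combinations


def hamming_neighbors(x, k):
    """Change exactly one variable to any other value in {0, ..., k}."""
    return [x[:i] + (v,) + x[i + 1:]
            for i in range(len(x)) for v in range(k + 1) if v != x[i]]


def city_block_neighbors(x, k):
    """Add +1 or -1 to exactly one variable, staying inside {0, ..., k}."""
    return [x[:i] + (x[i] + step,) + x[i + 1:]
            for i in range(len(x)) for step in (-1, 1)
            if 0 <= x[i] + step <= k]


def swap_neighbors(x):
    """Exchange the values of two variables (identical results collapse)."""
    out = set()
    for i, j in combinations(range(len(x)), 2):
        y = list(x)
        y[i], y[j] = y[j], y[i]
        if tuple(y) != x:
            out.add(tuple(y))
    return sorted(out)


print(hamming_neighbors((0, 0), 2))     # [(1, 0), (2, 0), (0, 1), (0, 2)]
print(city_block_neighbors((0, 0), 2))  # [(1, 0), (0, 1)]
print(swap_neighbors((3, 5, 2)))        # [(2, 5, 3), (3, 2, 5), (5, 3, 2)]
```

The solution (0, 0) sits at the boundary of the alphabet, which is why it has only two of the at most 2l city-block neighbors.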


Continuous Genotypes 
For continuous genotypes, we can define local search
operators analogously to integer genotypes. Based on
the binary Hamming metric, the application of a local
search operator can assign a random value
x_i ∈ [x_{i,min}, x_{i,max}] to the i-th decision variable. Furthermore,
we can define a local search operator such that it
exchanges the values of two decision variables x_i and x_j.
The binary Hamming distance between old and new
solutions is d = 2.

The situation is a little more complex in comparison
to integer genotypes when designing a local search
operator based on the city-block metric. We must define
a search operator such that its iterative application
allows us to reach all solutions in reasonable time.
Therefore, a search step should be not too small (we
want to have some progress in search) and not too large
(the offspring should be similar to the parent solution).
A commonly used concept for such search operators is
to add a random variable with zero mean to the decision
variables. This results in x'_i = x_i + m, where m
is a random variable and x' is the offspring generated
from x. Sometimes m is uniformly distributed in [-a, a],
where a < (x_{i,max} - x_{i,min}). More common is the use of
a normal distribution N(0, σ) with zero mean and standard
deviation σ. The addition of zero-mean Gaussian
random variables generates offspring that have, on average,
the same statistical properties as their parents. For
more information on local search operators for continuous
variables, we refer to [53.41].
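
A minimal sketch of both variants of this mutation is given below. The optional clipping to `bounds` is an assumption of this illustration (the text does not say how out-of-range values are handled), and the function names are hypothetical.

```python
import random


def gaussian_mutation(x, sigma, rng, bounds=None):
    """Return x'_i = x_i + m with m drawn from N(0, sigma) per variable.
    If bounds = (lower, upper) is given, the result is clipped to it;
    the clipping is an assumption of this sketch."""
    child = [xi + rng.gauss(0.0, sigma) for xi in x]
    if bounds is not None:
        lower, upper = bounds
        child = [min(max(ci, lower), upper) for ci in child]
    return child


def uniform_mutation(x, a, rng):
    """Return x'_i = x_i + m with m uniformly distributed in [-a, a]."""
    return [xi + rng.uniform(-a, a) for xi in x]


rng = random.Random(1)
parent = [0.5, 2.0, -1.0]
print(gaussian_mutation(parent, 0.1, rng))
print(uniform_mutation(parent, 0.1, rng))
```

The step-size parameters `sigma` and `a` control the trade-off described above: small values keep offspring close to the parent, large values make the search resemble random sampling.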




53.2.3 Recombination Operators 


To be able to use recombination operators, a set of so- 
lutions (population) must exist as the goal of recombi- 
nation is to recombine meaningful properties of parent 
solutions. Thus, for the application of recombination 
operators at least two parent solutions are necessary; 
otherwise local search operators are the only option. 
Recombination operators should be designed according 
to Radcliffe’s recommendations (Sect. 53.2.1). 

Analogously to local search operators, recombination
operators should be designed based on the metric
used [53.6, 30]. Given two parent solutions x^{p1} and x^{p2}
and one offspring solution x^o, recombination operators
should be designed such that


max(d(x^{p1}, x^o), d(x^{p2}, x^o)) ≤ d(x^{p1}, x^{p2}) .   (53.3)


Therefore, the application of recombination operators
should result in offspring where the distances between
offspring and its parents are equal to or smaller than
the distance between the parents. When viewing the
distance between two solutions as a measurement of
dissimilarity, this design principle ensures that offspring
solutions are similar to parents. Consequently, applying
a recombination operator to the same parent solutions
x^{p1} = x^{p2} should also result in the same offspring
(x^o = x^{p1} = x^{p2}).

In the last few years, this basic concept of the design
of recombination operators has been interpreted as
geometric crossover [53.42-44]. This work builds upon
previous work [53.6, 30, 45] and defines crossover and
mutation representation-independently using the notion 
of distance associated with the search space. 

Why should we use recombination operators in 
EAs? The motivation is that we assume that many real- 
world problems are decomposable. Therefore, prob- 
lems can be solved by decomposing them into smaller 
subproblems, solving these smaller subproblems, and 
combining the optimal solutions of the subproblems to 
obtain overall problem solutions. The purpose of re- 
combination operators is to form new overall solutions 
by recombining solutions of smaller subproblems that 
exist in different parent solutions. If this juxtaposition 
of smaller, highly fit, partial solutions (often denoted 
as building blocks) does not result in good solutions, 
search strategies that are based on recombination op- 
erators will show low performance. However, as many 
problems of practical relevance can be decomposed into 
smaller problems (they are decomposable), the use of 
recombination operators often results in good perfor- 
mance of EAs. 


Common recombination operators for standard
genotypes are one-point crossover [53.22] and uniform
crossover [53.46-48]. We assume a vector or string x
of decision variables of length l. When using one-point
crossover, a crossover point c ∈ {1, ..., l-1} is
initially chosen randomly. Usually, two offspring solutions
are created from two parent solutions by swapping
the partial strings. As a result, we obtain for the parents
x^{p1} = [x^{p1}_1, x^{p1}_2, ..., x^{p1}_l] and
x^{p2} = [x^{p2}_1, x^{p2}_2, ..., x^{p2}_l] the offspring
x^{o1} = [x^{p1}_1, ..., x^{p1}_c, x^{p2}_{c+1}, ..., x^{p2}_l] and
x^{o2} = [x^{p2}_1, ..., x^{p2}_c, x^{p1}_{c+1}, ..., x^{p1}_l].
A generalized
version of one-point crossover is n-point crossover. For
this type of crossover operator, we choose n different 
crossover points and create an offspring by alternately 
selecting alleles from parent solutions. For uniform 
crossover, we decide independently for every single 
allele of the offspring from which parent solution it 
inherits the value of the allele. In most implementa- 
tions, no parent is preferred and the probability of an 
offspring inheriting the value of an allele from a spe- 
cific parent is p = 1/m, where m denotes the number 
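of parents that are considered for recombination. For
example, when two parents are considered
with the same probability (p = 1/2), we could obtain
as offspring
x^{o1} = [x^{p1}_1, x^{p1}_2, x^{p2}_3, ..., x^{p1}_{l-1}, x^{p2}_l] and
x^{o2} = [x^{p2}_1, x^{p2}_2, x^{p1}_3, ..., x^{p2}_{l-1}, x^{p1}_l].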

Figure 53.2 presents examples for the three
crossover variants. All three recombination operators
are based on the binary Hamming distance and follow
(53.3) as d(x^{p1}, x^{p2}) ≥ max(d(x^{p1}, x^o), d(x^{p2}, x^o)).
Therefore, the similarity between offspring and parent
is not lower than between the parents.
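
The crossover variants and the distance condition (53.3) can be sketched as follows. The two parent vectors are those shown in Fig. 53.2; the cut positions, the function names, and the use of Python lists are illustrative assumptions.

```python
import random


def hamming(x, y):
    """Number of positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def n_point_crossover(p1, p2, points):
    """Alternate between the parents at the cut positions in `points`
    (a cut at c separates positions c and c + 1, counted from 1)."""
    o1, o2 = [], []
    take_first, start = True, 0
    for cut in sorted(points) + [len(p1)]:
        a, b = (p1, p2) if take_first else (p2, p1)
        o1.extend(a[start:cut])
        o2.extend(b[start:cut])
        take_first, start = not take_first, cut
    return o1, o2


def uniform_crossover(p1, p2, rng):
    """Each allele is inherited from either parent with probability 1/2."""
    o1, o2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        o1.append(a)
        o2.append(b)
    return o1, o2


p1 = [2.7, 0.6, 7, 9, 23, 2]
p2 = [4.3, 0.7, 10, 5, 17, 1]

print(n_point_crossover(p1, p2, [4]))
# ([2.7, 0.6, 7, 9, 17, 1], [4.3, 0.7, 10, 5, 23, 2])
print(n_point_crossover(p1, p2, [1, 4]))
# ([2.7, 0.7, 10, 5, 23, 2], [4.3, 0.6, 7, 9, 17, 1])

# Condition (53.3) with the Hamming distance for a uniform crossover.
for child in uniform_crossover(p1, p2, random.Random(0)):
    assert max(hamming(p1, child), hamming(p2, child)) <= hamming(p1, p2)
```

Since every allele of an offspring is copied from one of the parents, an offspring can differ from a parent only where the parents differ from each other, which is why (53.3) holds for all three variants.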

Uniform and n-point crossover can be used inde- 
pendently of the type of decision variables (binary, 
integer, continuous, etc.), since these operators only 
exchange alleles between parents. In contrast, inter- 
mediate recombination operators attempt to average 
or blend components across multiple parents and are 
designed for continuous and integer problems. Given
two parents x^{p1} and x^{p2}, a crossover operator known
as arithmetic crossover [53.49] creates an offspring
x^o as x^o = α x^{p1} + (1 - α) x^{p2}, where α ∈ [0, 1]. If α =
0.5, the crossover just takes the average of both parent
solutions. In general, for m parents, this operator
becomes x^o = Σ_{i=1}^{m} α_i x^{pi}, where Σ_{i=1}^{m} α_i = 1.
Arithmetic crossover is based on the city-block metric. With
respect to this metric, the distance between offspring 
and parent is smaller than the distance between the par- 
ents. Another type of crossover operator that is based on 
the Euclidean metric is geometrical crossover [53.49]. 


Fig. 53.2a-c Different crossover variants. (a) One-point crossover. (b) Two-point crossover. (c) Uniform crossover


Given two parents, an offspring is created as
x^o_i = sqrt(x^{p1}_i x^{p2}_i). For further information on
crossover operators we refer to [53.50] (binary crossover)
and [53.41] (continuous crossover).
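
Both intermediate recombination operators fit in a few lines. The sketch assumes real-valued parents stored as lists and the component-wise square-root form of geometrical crossover, which requires non-negative variables; the function names are illustrative.

```python
import math


def arithmetic_crossover(parents, weights):
    """x^o = sum_i alpha_i * x^{pi}, with the weights alpha_i summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return [sum(w * p[j] for w, p in zip(weights, parents))
            for j in range(len(parents[0]))]


def geometrical_crossover(p1, p2):
    """x^o_i = sqrt(x^{p1}_i * x^{p2}_i); needs non-negative variables."""
    return [math.sqrt(a * b) for a, b in zip(p1, p2)]


p1 = [2.0, 4.0]
p2 = [4.0, 8.0]
print(arithmetic_crossover([p1, p2], [0.5, 0.5]))        # [3.0, 6.0]
print(geometrical_crossover([1.0, 4.0], [4.0, 9.0]))     # [2.0, 6.0]
```

With two parents and weights (α, 1 - α), every offspring variable lies between the corresponding parent values, so the offspring is never farther from a parent than the parents are from each other.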


53.2.4 Direct Representations 


If we apply search operators directly to phenotypes, it is
not necessary to specify a representation and a genotype
space. Then, phenotypes are the same as genotypes:


f(x): Φ_g → R .   (53.4)


The mapping f_g does not exist and we have a direct
representation. Because there is no longer an additional mapping
between Φ_g and Φ_p, a direct representation does not
change any aspect of the phenotype problem such as
difficulty or metric. However, when using direct
representations, we often cannot use standard search operators,
but have to define problem-specific operators.
Therefore, what matters for the success of EAs using
a direct representation is not finding a good representation
for a specific problem, but developing proper search
operators defined on phenotypes.

The definition of the variation operators is relevant
for different implementations of direct representations.
Since we assume that local search operators always
generate neighboring solutions, the definition of a local
search operator induces a metric on the genotypes.
Therefore, the metric that we use on the genotype
space should be chosen in such a way that new solutions
that are generated by local search operators have
a small distance to the old solutions and the solutions
are neighbors with respect to the metric used. Furthermore,
the distance between two solutions x ∈ Φ_g and
y ∈ Φ_g should be proportional to the minimal number
of local search steps that are necessary to move from x
to y. Analogously, the definition of a recombination
operator also induces a metric on the search space. The
metric used should guarantee that the application of
a recombination operator to two solutions x^p ∈ Φ_g and
y^p ∈ Φ_g creates a new solution x^o ∈ Φ_g whose distances
to the parents are not larger than the distance between
the parents (53.3).
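
The metric induced by a local search operator can be computed for small search spaces by a breadth-first search over the neighborhood graph. The sketch below is an illustration under assumptions: `swap_neighbors` is a hypothetical problem-specific operator on tours (permutations), and exhaustive search is only feasible for small instances.

```python
from collections import deque


def induced_distance(x, y, neighbors):
    """Minimal number of local search steps from x to y, found by
    breadth-first search; returns None if y cannot be reached."""
    seen = {x}
    queue = deque([(x, 0)])
    while queue:
        current, steps = queue.popleft()
        if current == y:
            return steps
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None


def swap_neighbors(perm):
    """Hypothetical operator: exchange two positions of a tour."""
    out = []
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            p = list(perm)
            p[i], p[j] = p[j], p[i]
            out.append(tuple(p))
    return out


print(induced_distance((1, 2, 3, 4), (2, 1, 4, 3), swap_neighbors))  # 2
print(induced_distance((1, 2, 3, 4), (2, 3, 4, 1), swap_neighbors))  # 3
```

Applied to the bit-flipping operator on binary strings, the same procedure returns the Hamming distance, which shows how an operator and a metric determine each other.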

For the definition of variation operators, we should 
also consider that for many problems we have a natural 
notion of similarity between phenotypes. When we cre- 
ate a problem model, we often know which solutions are 
similar to each other. Such a notion of similarity should 
be considered for the definition of variation operators. 
We should design local search operators in such a way 
that their application creates solutions which we view 
as similar. Such a definition of local search operators 
ensures that neighboring phenotypes are also neighbors 
with respect to the metric that is induced by the search 
operators. 

At first glance, it seems that the use of direct
representations makes life easier as direct representations
release us from the challenge of designing efficient
representations. However, we are confronted with some
problems:


• There are many phenotypes to which no standard
variation operators can be applied.

• The design of high-quality problem-specific search
operators is difficult.

• We cannot use EAs that only work on standard
genotypes.


For indirect representations with standard geno- 
types, the definition of search operators is straightfor- 
ward as these are usually based on the metric of the 
genotype space (Sects. 53.2.2 and 53.2.3). The behavior
of EAs using standard search operators is usually
well studied and well understood. However, when us- 
ing direct representations, standard operators can often 
no longer be used. Instead, problem-specific operators 
must be developed for each phenotype. This is difficult, 
as we cannot use most of our knowledge about the be- 
havior of EAs using standard genotypes and standard 
operators. 

The design of proper search operators is often de- 
manding as phenotypes are usually not string-like but 
are more complicated structures like trees, schedules, 
time tables, or other structures (Sect. 53.2.5). In this 
case, phenotypes cannot be depicted as a string or in an- 
other way that is accessible to variation operators. Other 
representative examples are the form or shape of an ob- 
ject. Search operators that can be directly applied to the 
shape of an object are often difficult to design. 

Finally, using specific variants of EAs like estima- 
tion of distribution algorithms (EDAs) becomes very 
difficult. These types of EAs do not use standard search 
operators that are applied to genotypes but build new
solutions according to a probabilistic model of previously
generated solutions [53.51-56]. These search methods
were developed for a few standard genotypes (usually 
binary and continuous) and result in better performance 
than, for example, traditional simple genetic algorithms 
for decomposable problems [53.57, 58]. However, be- 
cause direct representations with non-standard pheno- 
types and problem-specific search operators can hardly 
be implemented in EDAs, direct representations cannot 
benefit from these optimization methods. 


53.2.5 Standard Search Operators 


We provide an overview of standard search spaces and 
the corresponding search operators. The search spaces 
can either represent genotypes (indirect representation) 
or phenotypes (direct representation). We order the 
search spaces by increasing complexity. With increas- 
ing complexity, the design of search operators becomes 
more demanding. An alternative to designing complex
search operators for complex search spaces is to
introduce additional mappings that map complex search
spaces to simpler ones. Then, the design of the
corresponding search operators becomes easier; however,
a proper design of the additional mapping (representation)
becomes more important.


Strings and Vectors 
Strings and vectors of either fixed or variable length are
the most elementary search spaces. They are the most
frequently used genotype structures. Vectors allow us
to represent an ordered list of decision variables and are
the standard genotypes for the majority of optimization
problems. Strings are appropriate for sequences of
characters or patterns. Consequently, strings are suited for
problems where the objects modeled are text, characters,
or patterns.

For strings and vectors with fixed length, we can 
use standard local search and recombination operators 
(Sects. 53.2.2 and 53.2.3) that are based on the Ham- 
ming metric or the binary Hamming metric. Search 
operators for strings and vectors with variable length 
are often based on the Levenshtein distance. 
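The two metrics named above can be sketched in a few lines. This is a minimal illustration, not an implementation taken from the cited works; the function names `hamming` and `levenshtein` are our own choice.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence costs i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]
```

A bit-flip mutation on a fixed-length binary vector moves a solution by Hamming distance 1, whereas inserting or deleting one element of a variable-length string moves it by Levenshtein distance 1.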


Coordinates/Points 
To represent locations in a geometric space, coordinates 
can be used. Coordinates can be either integer or con- 
tinuous. Common examples are locations of cities or 
other spots on two-dimensional grids. Coordinates are 
appropriate for problems that work on sites, positions, 
or locations. 

We can use standard local and recombination opera- 
tors for continuous variables and integers, respectively. 
For coordinates, the Euclidean metric is often used to 
measure the similarity of solutions. 


Graphs 
Graphs allow us to represent relationships between ar- 
bitrary objects. Usually, the structure of a graph is 
described by listing its edges. An edge represents a
relationship between a pair of objects. Given a graph
with n nodes (objects), there are n(n-1)/2 possible
edges. Using graphs is appropriate for problems that
seek a network, circuit, or relationship.

Common genotypes for graphs are lists of edges
indicating which edges are used. For example, the
characteristic vector representation encodes graphs of fixed
size using a binary vector of length n(n-1)/2 [53.3,
Sect. 6.3]. Standard search operators for the characteristic
vector representation are based on the Hamming metric,
as the distance between two graphs can be calculated
as the number of different edges. Standard search
operators can be used if there are no additional constraints.
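As a small illustration of the characteristic vector idea, the sketch below encodes an undirected graph on nodes 0, ..., n-1 as a 0/1 list with one entry per possible edge. The ordering of the entries (lexicographic node pairs) and the helper names are assumptions made for this example.

```python
from itertools import combinations


def characteristic_vector(n, edges):
    """Encode an undirected graph on nodes 0..n-1 as a 0/1 list
    of length n(n-1)/2, one entry per possible edge."""
    present = {frozenset(e) for e in edges}
    return [int(frozenset(pair) in present)
            for pair in combinations(range(n), 2)]


def graph_distance(u, v):
    """Hamming distance between two characteristic vectors,
    i.e., the number of edges in which the two graphs differ."""
    return sum(a != b for a, b in zip(u, v))


# Two spanning trees on four nodes that differ in one exchanged edge.
g1 = characteristic_vector(4, [(0, 1), (1, 2), (2, 3)])
g2 = characteristic_vector(4, [(0, 1), (1, 2), (1, 3)])
```

Here `g1` and `g2` have length 6 and Hamming distance 2, since one edge was removed and another one added. Note that a single bit flip on `g1` would no longer yield a tree, which is an example of the additional constraints mentioned above.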


Subsets 
Subsets represent selections from a set of objects. Given
n different objects, the number of subsets having exactly
k elements is equal to $\binom{n}{k}$. Thus, the number of
possible subsets is $\sum_{k=0}^{n} \binom{n}{k} = 2^n$. For subsets, the
order of the objects does not matter. The two example
subsets {1, 3, 5} and {3, 5, 1} represent the same phenotype
solution. Local search operators that can be applied
directly to subsets often either modify the objects in
the subset, or increase/reduce the number of objects in
one subset. Recombination operators that are directly
applied to subsets are more sophisticated as no standard
operators can be used. We refer to [53.59] for
detailed information on their design. Subsets are often
used for problems that seek a cluster, collection, partition,
group, packaging, or selection.

Given n different objects, a subset of fixed size k
can be represented using an integer vector x of length
k, where the $x_i$ indicate the selected objects and $x_i \neq x_j$
for $i \neq j$ and $i, j \in [1, k]$. Then, standard local search
operators can be applied if we assume that each of the k
selected objects is unique. The application of recombination
operators is more demanding as each subset is
represented by k! different genotypes (integer vectors)
and the distances between the k! different genotypes
that represent the same subset are large [53.60].
Recombination operators must be designed such that the
distances between offspring and parents are smaller
than the distances between parents (53.3) and the
recombination of two genotypes that represent the same
subset always results in the same subset. For guidelines
on the design of appropriate recombination operators
and examples, we refer to [53.61] and [53.60].


Permutations 
A large variety of EAs have been developed for per- 
mutation problems as many such problems are of 
practical relevance but NP-hard (NP: non-deterministic 
polynomial-time). Permutations are orderings of items. 
The order of the objects is relevant for permutations. 
The number of permutations on a set of n elements is 
n!. For example, 1-2-3 and 1-3-2 are two permutations
of three integer numbers $x \in \{1, 2, 3\}$. The traveling
salesperson problem (TSP) is a prominent example of 
a permutation problem. Permutations are commonly 
used for problems that seek an arrangement, tour, or- 
dering, or sequence. 

The design of appropriate search operators for permutations
is demanding. In many approaches, permutations
are encoded using an integer genotype vector of
length n, where each decision variable $x_i$ indicates an
object and has a unique value ($x_i \neq x_j$ for $i \neq j$ and
$i, j \in \{1, \dots, n\}$). Standard recombination and mutation
operators applied to such genotypes fail since the resulting
solutions usually represent no permutations. Therefore,
a variety of permutation-specific variation operators
have been developed in the literature. They are based on
either the absolute or the relative ordering of the objects
in a permutation. When using the absolute ordering of
objects in a permutation as the distance metric, two solutions
are similar to each other if the objects have the same
position in the two solutions ($x_i^1 = x_i^2$). For example,
1-2-3-4 and 2-3-4-1 have a maximum absolute distance of
d = 4, as the two solutions have no common absolute positions.
In contrast, when using relative ordering, two solutions
are similar if the sequence of objects is similar for
the two solutions. For example, 1-2-3-4 and 2-3-4-1 have
distance d = 1 as the two permutations are shifted by one
position. Based on the metric used (relative versus absolute
ordering), a large variety of different recombination
and local search operators have been developed. Examples
are the order crossover [53.62], partially mapped
crossover [53.63], cycle crossover [53.64], generalized
order crossover [53.65], and precedence preservative
crossover [53.66]. For more information on the design
of such permutation-specific variation operators, we refer
to [53.67, 68] and [53.60].
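The difference between the two orderings can be made concrete with a short sketch. The absolute distance counts positions that hold different objects. For the relative distance, we assume one possible formalization, namely the number of adjacent (predecessor, successor) pairs of one permutation that do not occur in the other; other definitions exist, and this one is chosen only because it reproduces the example values given in the text.

```python
def absolute_distance(p, q):
    """Number of positions that hold different objects in p and q."""
    return sum(a != b for a, b in zip(p, q))


def relative_distance(p, q):
    """Number of adjacent (predecessor, successor) pairs of p
    that do not appear as adjacent pairs in q (assumed definition)."""
    pairs_q = set(zip(q, q[1:]))
    return sum(pair not in pairs_q for pair in zip(p, p[1:]))


p = [1, 2, 3, 4]
q = [2, 3, 4, 1]
```

For these two permutations, `absolute_distance(p, q)` is 4 and `relative_distance(p, q)` is 1: no object keeps its position, but only the adjacency 1-2 is lost by the shift.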


Trees 
Trees are used to describe hierarchical relationships be- 
tween objects. Trees are a specialized variant of graphs 
where only one path exists between each pair of nodes. 
As standard search operators cannot be applied to tree
structures, we either need to define problem-specific
search operators that are directly applied to trees or
additional genotype-phenotype mappings that map each
tree to simpler genotypes where standard variation
operators can be applied.

We can distinguish between trees of fixed and vari- 
able size. For trees of fixed size, we refer to [53.2]. 
Search operators for tree structures of variable size are 
at the core of genetic programming. They are often 
based on the Levenshtein distance. Further information 
about appropriate search operators for trees of variable 
size can be found in [53.69] and [53.70]. 


53.3 Problem-Specific Design of Representations and Search Operators 


Jones and Forrest [53.71] assumed that the difficulty of
an optimization problem is determined by how the objective
values are assigned to the solutions $x \in X$ and
what metric is defined on X. They classified fitness
landscapes into three classes: straightforward, difficult,
and misleading.


1. Straightforward. For such problems, the fitness of 
a solution is correlated with the distance to the 
optimal solution. With lower distance, the fitness 
difference to the optimal solution decreases. As the
structure of the search space guides search methods
towards the optimal solution, such problems are usually
easy for guided search methods.

2. Difficult. There is no correlation between the fitness 
difference and the distance to the optimal solu- 
tion. The fitness values of neighboring solutions 
are uncorrelated and the structure of the search 
space provides no information about which solu- 
tions should be sampled next by the search method. 

3. Misleading. The fitness difference is negatively cor- 
related to the distance to the optimal solution. 
Therefore, the structure of the search space misleads 
a guided search method to sub-optimal solutions. 
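The three classes can be illustrated by correlating, over a set of solutions, the fitness difference to the optimum with the distance to the optimum. The following sketch does this for two small binary problems. The cut-off value 0.15, the function names, and the two test problems (a count-the-ones problem and a trap-like problem) are assumptions made for illustration.

```python
from itertools import product
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def classify_landscape(fitness_gaps, distances, threshold=0.15):
    """Classify by the correlation between |f(x) - f(x*)| and d(x, x*).
    The threshold is an illustrative assumption."""
    r = pearson(fitness_gaps, distances)
    if r >= threshold:
        return "straightforward"
    if r <= -threshold:
        return "misleading"
    return "difficult"


optimum = (1, 1, 1, 1)
points = list(product((0, 1), repeat=4))
distances = [sum(a != b for a, b in zip(x, optimum)) for x in points]

# Count-the-ones: the fitness gap equals the Hamming distance to the optimum.
onemax_gaps = [4 - sum(x) for x in points]

# Trap-like problem: fitness is the number of zeros, except that the
# all-ones string is the isolated optimum with fitness 5.
trap_fitness = [5 if x == optimum else 4 - sum(x) for x in points]
trap_gaps = [5 - f for f in trap_fitness]
```

With these definitions, the count-the-ones problem is classified as straightforward, whereas the trap-like problem is classified as misleading because solutions far from the optimum have a small fitness gap.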


The general idea is to measure how well the metric 
defined on the search space fits the structure of the ob- 
jective function. A high fit between metric and structure 
of the fitness function makes a problem easy for guided 
search methods. 

A fundamental assumption about the application of
EAs is that the vast majority of optimization problems
that we can observe in the real world:

• Are neither misleading nor difficult,
• Have high locality (distance between solutions is
correlated with their fitness difference).


We assume that misleading problems have no im- 
portance in the real world as usually optimal solutions 
are not isolated in the search space surrounded by only 
low-quality solutions. Furthermore, we assume that the 
metric of a search space is meaningful and, on average, 
the fitness differences between neighboring solutions 
are smaller than between randomly chosen solutions. 
It is only because most real-world problems are nei- 
ther difficult nor misleading that guided search methods 
which use information about previously sampled so- 
lutions can outperform random search for real-world 
problems [53.2, 34]. 

Since we assume that high locality is a general prop- 
erty of real-world problems, EAs must ensure that their 
design does not destroy the high locality of a problem. 
If the high locality of a problem is destroyed, straight- 
forward problems turn into difficult problems and can- 
not be solved better than by random search [53.2]. 
Therefore, EAs must ensure that the search operators
used fit the metric on the search space and that
representations have high locality; this means phenotype
distances must correspond to genotype distances.

The second aspect of this section is how we can con- 
sider knowledge about problem-specific properties for 
the design of EAs. For example, we have knowledge 
about the character and properties of high-quality (or 
low-quality) solutions. Such problem-specific knowl- 
edge can be exploited by introducing a bias into EAs. 
The bias should consider this knowledge and, for exam- 
ple, concentrate search on solutions that are expected 
to be of high quality or avoid solutions expected to be 
of low quality. A bias can be considered in the repre- 
sentation as well as the search operator. However, EAs 
should only be biased if we have obtained some par- 
ticular knowledge about an optimization problem or 
problem instance. If we have no knowledge about prop- 
erties of a problem, we should not bias EAs as this will 
mislead the search heuristics. 

Section 53.3.1 discusses how the design of EAs can 
modify the locality of a problem. To ensure guided 
search, the search operators must fit the problem met- 
ric. Local search operators must generate neighboring 
solutions and recombination operators must generate 
offspring where the distances between offspring and 
parents do not exceed the distances between parents. 
Section 53.3.2 focuses on the possibility of biasing
EAs. We discuss how problem-specific construction
heuristics can be used as genotype-phenotype mappings
(Sect. 53.3.2) and how redundant representations
affect heuristic search (Sect. 53.3.2).


53.3.1 High Locality 


The locality of a problem measures how well the distances
$d(x, y)$ between any two solutions $x, y \in X$ correspond
to the difference in their fitness values $|f(x) - f(y)|$.
Locality is high if neighboring solutions have
similar fitness values and fitness differences correlate 
positively with distances. In contrast, the locality of 
a problem is low if low distances do not correspond to 
low differences in the fitness values. Important for the 
locality of a problem is the metric defined on the search 
space. 

The performance of guided search methods is high 
if the locality of a problem is relatively high; this means 
that the structure of the fitness landscape leads search 
algorithms to high quality solutions [53.2]. Local search 
methods show especially good performance if either 
high-quality or low-quality solutions are grouped together
in the solution space. Optimization problems
with high locality can usually be solved well using EAs,
as all EAs have some kind of local search elements.

The following paragraphs provide design guidelines 
for search operators and representations. Search oper- 
ators must fit the metric of a search space, because 
otherwise EAs show low performance as they behave 
like random search. A representation introduces an additional
genotype-phenotype mapping. The locality of
a representation describes how well the metric on the
phenotype space fits the metric on the genotype
space. Low locality, which means there is a poor fit,
randomizes the search and also leads to low performance
of EAs.


Search Operator 
EAs rely on the concept of local search. Local search 
iteratively generates new solutions similar to existing 
ones. Local search is a reasonable and successful search 
approach for real-world problems, as most real-world 
problems have high locality and are neither mislead- 
ing nor difficult. In addition, to avoid being trapped in 
local optima, EAs use diversification steps. Diversifi- 
cation steps randomize search and allow EAs to jump 
through the search space. 

Different types of EAs use different concepts for 
controlling intensification and diversification [53.2, 
Chap. 5]. Local search intensifies the search as it allows 
incremental improvements of already found solutions. 
Diversification steps must be relatively rare as they usu- 
ally lead to inferior solutions. When designing search 
operators, we must have in mind that EAs use local 
search operators as well as recombination operators 
for intensifying the search. Solutions that are gener- 
ated should be similar to the existing ones. Therefore, 
we must ensure that search operators (local search op- 
erators as well as recombination operators) generate 
similar solutions and do not jump around in the search 
space. This can be done by ensuring that local search
operators generate neighboring solutions and recombination
operators generate solutions where the distances
between parent and offspring are smaller than or equal to the
distances between parents (Sect. 53.2.3 and (53.3)).
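The distance requirement for recombination can be checked empirically. The sketch below uses uniform crossover on binary vectors with the Hamming metric as an example; the helper names and the random test data are assumptions made for illustration. Since uniform crossover copies every gene from one of the two parents, an offspring can differ from a parent only where the parents differ from each other.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def uniform_crossover(p1, p2, rng):
    """Take each gene from one of the two parents at random."""
    return [rng.choice((a, b)) for a, b in zip(p1, p2)]


def respects_parent_distance(child, p1, p2, dist=hamming):
    """True if the child is no farther from either parent
    than the parents are from each other."""
    d_parents = dist(p1, p2)
    return dist(child, p1) <= d_parents and dist(child, p2) <= d_parents


rng = random.Random(0)  # fixed seed so that the run is repeatable
p1 = [rng.randint(0, 1) for _ in range(12)]
p2 = [rng.randint(0, 1) for _ in range(12)]
children = [uniform_crossover(p1, p2, rng) for _ in range(100)]
all_ok = all(respects_parent_distance(c, p1, p2) for c in children)
```

An operator that returns an arbitrary solution would fail this check as soon as the offspring lies farther from a parent than the other parent does, which corresponds to jumping around in the search space.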

Applying search operators to solutions defines 
a metric on the corresponding search space. With re- 
spect to the search operators, solutions are similar to 
each other if only a few local search steps suffice to 
transform one solution into another. Therefore, when 
designing search operators, it is important that the metric
induced by the search operators fits the problem
metric. If both metrics are similar (this means a local
search operator creates neighboring solutions with
respect to the problem metric), guided search will perform
well as it can systematically explore promising
areas of the search space.

Therefore, we should make sure that local search 
operators generate neighboring solutions. The fit be- 
tween the problem metric and the metric induced by the 
search operators should be high. Then, most real-world 
problems, where neighboring solutions have, on aver- 
age, similar fitness values, are easy to solve for EAs. 

For real-world problems, the design or choice of 
proper search operators can be difficult if it is unclear 
what a natural problem metric is. We want to illustrate 
this issue for a scheduling problem. Given a number of 
tasks, we want to find an optimal schedule. There are 
different metrics that can be relevant for such a permu- 
tation problem. We have the choice between metrics 
based either on the relative or absolute ordering of 
the tasks (Sect. 53.2.5). The choice of the right prob- 
lem metric depends on the properties of the scheduling 
problem. For example, if we want to find an optimal 
class schedule, usually it is more natural to use a met- 
ric based on the absolute ordering of the tasks (classes). 
The relative ordering of the tasks is much less important 
as we have fixed time slots. The situation is reversed 
if we want to find an optimal schedule for a paint 
shop. For example, when painting different cars con- 
secutively, color changes are time-consuming as paint 
tools have to be cleaned before a new color can be used. 
Therefore, the relative ordering of the tasks (paint jobs) 
is important, as the tasks should be grouped together 
such that tasks that require the same color are painted 
consecutively and ordered such that the paint shop starts 
with the brightest colors and ends with the darkest ones. 

This example makes it clear that the most natural 
problem metric does not depend on the set of possible 
solutions but on the character of the underlying opti- 
mization problem and fitness function. The goal is to 
choose a metric such that the locality of the problem 
is high. The same is true for the design of operators. 
A high-quality local search operator should generate 
solutions with similar fitness. Then, problems become 
easy to solve for EAs. 


Representation 
Representations introduce an additional genotype-phenotype
mapping and thus modify the fit between
the metric on the genotype space (which is induced
by the search operators used) and the original problem
metric on the phenotype space. High-quality representations
ensure that the metric on the genotype space
fits the original problem metric. The locality of a
representation describes how well neighboring genotypes
correspond to neighboring phenotypes [53.72-76]. In
contrast to the locality of a problem, which measures
the fit between fitness differences and phenotype distances,
the locality of a representation measures the fit
between phenotype distances and genotype distances.

The use of a representation can change the dif- 
ficulty of problems (Sect. 53.1.2) [53.6]. The ability 
of representations to change the difficulty of a prob- 
lem is closely related to their locality. The locality of 
a representation is high if all neighboring genotypes 
correspond to neighboring phenotypes. In contrast, it 
is low if neighboring genotypes do not correspond to 
neighboring phenotypes. Therefore, the locality $d_m$ of
a representation can be defined as [53.3, Sect. 3.3]

$$d_m = \sum_{d^g_{x,y} = d^g_{\min}} \left| d^p_{x,y} - d^p_{\min} \right| , \qquad (53.5)$$

where $d^p_{x,y}$ is the phenotype distance between the phenotypes
$x^p$ and $y^p$, $d^g_{x,y}$ is the genotypic distance between
the corresponding genotypes, and $d^p_{\min}$ and $d^g_{\min}$
are the minimum distances between two (neighboring)
phenotypes and genotypes, respectively. Without loss of
generality, we assume that $d^g_{\min} = d^p_{\min}$. For $d_m = 0$, all
genotypic neighbors correspond to phenotypic neighbors
and the encoding has perfect (high) locality.
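The locality measure of (53.5) can be evaluated by enumeration for small search spaces. The sketch below assumes a one-to-one representation and sums over unordered pairs of neighboring genotypes; the function names and the two example encodings are our own choices for illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def abs_diff(x, y):
    """Distance between two integers."""
    return abs(x - y)


def locality(genotypes, decode, d_g, d_p):
    """Sum of |d_p - d_p_min| over all genotype pairs at minimal
    genotype distance (each unordered pair is counted once)."""
    pairs = list(combinations(genotypes, 2))
    dg_min = min(d_g(x, y) for x, y in pairs)
    dp_min = min(d_p(decode(x), decode(y)) for x, y in pairs)
    return sum(abs(d_p(decode(x), decode(y)) - dp_min)
               for x, y in pairs if d_g(x, y) == dg_min)


def binary_to_int(bits):
    """Decode a tuple of bits as an unsigned binary number."""
    return int("".join(map(str, bits)), 2)


# Binary encoding of the integers 0..7 with bit-flip neighborhood.
bit_strings = list(product((0, 1), repeat=3))
d_m_binary = locality(bit_strings, binary_to_int, hamming, abs_diff)

# Direct representation: genotype and phenotype are the same integer.
d_m_direct = locality(range(8), lambda x: x, abs_diff, abs_diff)
```

For the direct representation, all genotypic neighbors are phenotypic neighbors and the measure is 0. For the binary encoding, flipping a higher-order bit changes the integer by 2 or 4, so the measure is greater than 0 and the locality is not perfect.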

We want to emphasize that the locality of a representation
depends on the representation $f_g$ and the
metrics that are defined on $\Phi_g$ and $\Phi_p$. $f_g$ alone only
determines which phenotypes are represented by which
genotypes and cannot be used for measuring similarities
between solutions. To describe or measure the locality
of a representation, a metric must be defined on $\Phi_g$
and $\Phi_p$.

Figure 53.3 illustrates the difference between high- 
locality and low-locality representations. We assume 
12 different phenotypes (a-l) and measure distances 
between solutions using the Euclidean metric. Each 
phenotype (lower case symbol) corresponds to one 
genotype (upper case symbol). The representation $f_g$
has perfect (high) locality if neighboring genotypes correspond
to neighboring phenotypes. Then, local search
steps have the same effect in the phenotype and genotype
search space.

If we assume that $f_g$ is a one-to-one mapping, every
phenotype is represented by exactly one genotype
and there are $|\Phi_g|! = |\Phi_p|!$ different representations.
Each of these many different representations assigns the
genotypes to the phenotypes in a different way.


We want to ask how the locality of a representa- 
tion influences the performance of EAs. Often, there is 
a natural problem metric describing which phenotypes 
are similar to each other. A representation introduces 
a new genotype metric based on the genotypes and 
search operators used. This metric can be different from 
the problem (phenotype) metric. Therefore, the charac- 
ter of search operators can be different for genotypes 
versus phenotypes. If the locality of a representation 
is high, then a search operator has the same effect 
on the phenotypes as on the genotypes. As a result, 
the original problem difficulty remains unchanged by 
a representation $f_g$. Easy (straightforward) problems
remain easy and misleading problems remain mislead- 
ing. Figure 53.4 (left) illustrates the effect of local 
search operators for high-locality representations. A lo- 
cal search step has the same effect on the phenotypes as 
on the genotypes. 

Fig. 53.3 High versus low-locality representations (panels: high locality, low locality; each shows the phenotype search space and the genotype search space)

Fig. 53.4 The effect of local search operators for high versus low-locality representations (panels: high locality, low locality)

For low-locality representations, the situation is different
and the influence of a representation on the
difficulty of a problem depends on its character. If
a problem $f_p$ is straightforward, a low-locality representation
$f_g$ randomizes the problem by destroying the
correlation between distance and fitness and making the
problem $f = f_p(f_g(x^g))$ more difficult. When using
low-locality representations, a small change in a genotype
does not correspond to a small change in the phenotype,
but larger changes in the phenotype are possible
(Fig. 53.4, right). Therefore, when using low-locality
representations, straightforward problems become, on
average, difficult as low-locality representations lead
to a more uncorrelated fitness landscape and heuristics
can no longer extract meaningful information about the
structure of the problem. Guided search becomes more
difficult as many genotypic search steps do not result in
a similar solution but in a random one.

Summarizing the results, low-locality representa- 
tions have the same effect as using random search. 
Therefore, on average, straightforward problems be- 
come more difficult for guided search methods. As 
most real-world problems are straightforward, the use 
of low-locality representations makes these problems 
more difficult. Therefore, we strongly encourage users 
of EAs to use high-locality representations for problems 
of practical relevance. Of course, low-locality repre- 
sentations make misleading problems easier for guided 
search [53.3]; however, these are problems which we do 
not expect to meet in reality and we do not really want 
to solve. 

For more information on the influence of the local- 
ity of representations on the performance of EAs, we 
refer the interested reader to [53.3, Sect. 3.3] and [53.2]. 


53.3.2 Biasing Search 


This section discusses how to bias EAs. If we have 
some knowledge about the properties of either high- 
quality or low-quality solutions, we can make use of 
this knowledge for the design of EAs. For representa- 
tions, we can incorporate heuristics or introduce redun- 
dant encodings and assign a larger number of genotypes 
to high-quality phenotypes. Search operators can be 
designed in such a way that they distinguish between 
high-quality and low-quality solution features (building 
blocks) and prefer the high-quality ones [53.2]. 

A representation or search operator is biased if
the application of a variation operator generates some
solutions in the search space with higher probability
than others [53.12]. We can bias representations by
incorporating heuristics into the genotype-phenotype mapping.
Furthermore, representations can be biased if the number
of genotypes exceeds the number of phenotypes.
Then, representations are called redundant [53.77-80].
Redundant representations are biased if some phenotypes
are represented by a larger number of genotypes.
Analogously, search operators are biased if some solutions
are generated with higher probability.

When biasing EAs, we must make sure that we 
have a priori knowledge about the problem and the bias 
exploits this knowledge in an appropriate way. Intro- 
ducing an inappropriate or wrong bias into EAs would 
mislead search and result in low solution quality. Fur- 
thermore, we must make sure that a bias is not too 
strong. Using a bias can focus the search on specific 
areas of the search space and exclude solutions from 
consideration. If the bias is too strong, EAs can easily 
fail. 

The following paragraphs discuss biasing representations
and search operators. We first give an
overview of how problem-specific construction heuristics
can be used as genotype-phenotype mappings.
In this case, heuristic search varies either the input (problem
space search) or the parameters (heuristic space search)
of the construction heuristic. Afterwards, we
discuss redundant representations. Redundant representations
with low locality randomize guided search
and thus should not be used. Redundant representations
with high locality can be biased by overrepresenting
solutions similar to optimal solutions.


Incorporating Construction Heuristics in Representations
We focus on combining problem-specific construction
heuristics with genotype-phenotype mappings. The
possibility to design problem-specific representations
and to incorporate relevant knowledge about the problem
into the genotype-phenotype mapping by using
construction heuristics is a promising line of research
and is continuously discussed in the operations research
and evolutionary computation communities [53.6, 22,
81-84].

Genotype-phenotype mappings map genotypes to
phenotypes and can incorporate problem-specific construction
heuristics. An early example of a problem-specific
representation is the ordinal representation
of [53.85], who studied the performance of genetic
algorithms for the TSP. The ordinal representation
encodes a tour (permutation of n integers) by a genotype
$x^g$ of length n, where $x_i^g \in \{1, \dots, n-i\}$ and
$i \in \{0, \dots, n-1\}$. For constructing a phenotype, a predefined
permutation $x^s$ of n integers representing the
n different cities is used. $x^s$ can be problem-specific and,
for example, consider edge weights. A phenotype (tour)
is constructed from $x^g$ by subsequently adding (starting
with $i = 0$) the $x_i^g$-th element of $x^s$ to the phenotype
(which initially contains no elements) and removing the
$x_i^g$-th element of $x^s$. Problem-specific knowledge can be
considered by choosing an appropriate $x^s$, as genotypes
define perturbations of $x^s$ and using small integers for
the $x_i^g$ results in a bias of the resulting phenotypes towards
$x^s$. For example, for $x_i^g = 1$ ($i \in \{0, \dots, n-1\}$),
the resulting phenotype is $x^s$.
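The construction of a tour from an ordinal genotype can be sketched as follows. The function name `decode_ordinal` and the example reference permutation are assumptions made for illustration; the genes are 1-based, as in the description above.

```python
def decode_ordinal(genotype, reference):
    """Build a tour from an ordinal genotype.

    In step i, gene g (1-based) selects the g-th element of the
    remaining reference list; that element is appended to the tour
    and removed from the list.
    """
    if len(genotype) != len(reference):
        raise ValueError("genotype and reference must have equal length")
    remaining = list(reference)  # copy, so the reference stays intact
    tour = []
    for i, g in enumerate(genotype):
        if not 1 <= g <= len(remaining):
            raise ValueError(f"gene {i} must lie in 1..{len(remaining)}")
        tour.append(remaining.pop(g - 1))
    return tour


reference = [3, 1, 4, 2]  # hypothetical problem-specific city order
same_tour = decode_ordinal([1, 1, 1, 1], reference)
other_tour = decode_ordinal([2, 1, 2, 1], reference)
```

A genotype consisting only of ones reproduces the reference permutation, and genotypes with small gene values yield tours that stay close to it. Every genotype whose genes respect the bounds decodes to a valid permutation, so standard operators that keep each gene within its range can be applied.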

Other early examples of problem-specific representations
can be found in [53.86], where a problem-specific
schedule builder was incorporated into representations
for job shop scheduling problems, and in [53.87],
where representations use a greedy adding heuristic
for partitioning problems. The more general
adaptive representation genetic optimization technique
(ARGOT) strategy was studied in [53.88, 89]. ARGOT
dynamically changes either the structure of the genotypes
or the genotype-phenotype mapping according to
the progress made during search.

In parallel to, and independently of, work on representations,
Storer et al. [53.90] proposed problem space
search (PSS) and heuristic space search (HSS) as approaches
that also combine problem-specific heuristics
with problem-independent EAs. In each search iteration
of a modern heuristic, PSS and HSS apply a problem-specific
base heuristic H that exploits known properties
of the problem. The base heuristic H should be fast;
it creates a phenotype from a genotype. Results presented
for different applications show that this approach
can lead to improved performance of EAs [53.82, 91,
92]. For PSS, H is applied to perturbed versions of the
genotype. The perturbations of the genotypes are usually
small and based on a definition of neighborhood in
the genotype space. For HSS, in each search step of the
modern heuristic, the genotypes remain unchanged, but
(slightly) different variants of the base heuristic H are
used. For scheduling problems, linear combinations of
priority dispatching rules with different weights, or the
application of different base heuristics to different parts
of the genotype, have been proposed [53.91].

PSS and HSS use the same underlying concepts
as problem-specific representations. The base heuristic
H is equivalent to a (usually problem-specific)
genotype-phenotype mapping and assigns phenotypes
to genotypes. PSS performs heuristic search by modifying
(perturbing) the genotypes, which is equivalent
to the concept of representations originally proposed
by [53.22] (for an early example, see [53.85]). HSS
perturbs the base heuristic (genotype-phenotype mapping),
which is similar to the concept of adaptive
representations (for early examples, see [53.88] or [53.93]).


Redundant Representation 
We assume a combinatorial optimization problem with
a finite number of phenotypes. If the size of the genotype
search space is equal to the size of the phenotype
search space ($|\Phi_g| = |\Phi_p|$) and the representation maps
the genotypes one-to-one onto the phenotypes (bijection),
a representation cannot be biased. All solutions are
represented with the same probability and a bias can only
be a result of the search operator used.

The situation is different for representations where 
the number of genotypes exceeds the number of pheno- 
types. We still assume that all phenotypes are encoded 
by at least one genotype. Such representations are
usually called redundant (e.g., in [53.78, 94, 95], or [53.80]).
Radcliffe and Surry [53.28] introduced a different notion
of redundancy and distinguished between degenerated 
representations, where more than one genotype en- 
codes one phenotype, and redundant representations 
where parts of the genotypes are not used for the 
construction of a phenotype. However, this distinction 
has not generally been accepted in the EA commu- 
nity. Therefore, we follow the majority of the lit- 
erature and define encodings to be redundant if the 
number of genotypes exceeds the number of pheno- 
types (which is equivalent to the notion of degeneracy 
of [53.28]). 

Rothlauf and Goldberg [53.80] distinguished be- 
tween different types of redundant representations 
based on the similarity of the genotypes that are as- 
signed to the same phenotype. A representation is 
defined to be synonymously redundant if the geno- 
types that are assigned to the same phenotype are 
similar to each other. Consequently, representations are 
non-synonymously redundant if the genotypes that are 
assigned to the same phenotype are not similar to each 
other. Therefore, the synonymity of a representation 
depends on the genotype and phenotype metric. Fig- 
ure 53.5 illustrates the differences between synonymous 
and non-synonymous redundant encodings. Distances 
between solutions are measured using a Euclidean met- 
ric. The symbols indicate different genotypes and their 
corresponding phenotypes. When using synonymously 
redundant representations (left), genotypes that repre- 
sent the same phenotype are similar to each other. 
When using non-synonymously redundant representa- 
tions (right), genotypes that represent the same pheno- 
type are not similar to each other but distributed over 
the whole search space. 




Formally, a redundant representation $f_g$ assigns
a phenotype $x^p$ to a set of different genotypes
$x^g \in \Phi_g^{x^p}$, where $\forall x^g \in \Phi_g^{x^p}: f_g(x^g) = x^p$.
All genotypes $x^g$ in the genotype set $\Phi_g^{x^p}$ represent the
same phenotype $x^p$. A representation is synonymously redundant
if the genotype distances between all $x^g \in \Phi_g^{x^p}$ are small
for all different $x^p$. Therefore, if for all phenotypes the sum
over the distances between all genotypes that correspond
to the same phenotype

$$\sum_{x^p} \sum_{x^g \in \Phi_g^{x^p}} \sum_{y^g \in \Phi_g^{x^p}} d(x^g, y^g), \qquad (53.6)$$

where $x^g \neq y^g$, is reasonably small, a representation is
called synonymously redundant. $d(x^g, y^g)$ depends on
the metric used and measures the distance between two
genotypes $x^g \in \Phi_g^{x^p}$ and $y^g \in \Phi_g^{x^p}$, which both represent
the same phenotype $x^p$.
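
The sum in (53.6) can be evaluated directly for small encodings. The following is a minimal sketch, assuming binary genotypes and the Hamming distance as the genotype metric; the two decoders (`decode_syn`, `decode_nonsyn`) are hypothetical examples chosen for illustration and do not come from the chapter. The double sum is taken over ordered pairs, so each unordered pair of genotypes is counted twice.

```python
from collections import defaultdict
from itertools import permutations, product


def hamming(a, b):
    """Number of positions in which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def synonymity_sum(genotypes, decode, dist=hamming):
    """Sum of distances between all ordered pairs of distinct genotypes
    that decode to the same phenotype (the triple sum of Eq. 53.6)."""
    groups = defaultdict(list)
    for g in genotypes:
        groups[decode(g)].append(g)
    return sum(
        dist(x, y)
        for members in groups.values()
        for x, y in permutations(members, 2)
    )


def decode_syn(g):
    """Hypothetical synonymous encoding: the last bit is unused."""
    return g[:2]


def decode_nonsyn(g):
    """Hypothetical non-synonymous encoding: a genotype and its bitwise
    complement (maximal Hamming distance) share one phenotype."""
    complement = tuple(1 - b for b in g)
    return min(g, complement)


genotypes = list(product((0, 1), repeat=3))
sum_syn = synonymity_sum(genotypes, decode_syn)
sum_nonsyn = synonymity_sum(genotypes, decode_nonsyn)
print(sum_syn, sum_nonsyn)
```

Both encodings map eight genotypes onto four phenotypes, so they have the same amount of redundancy; only the distances inside each genotype set differ, which is what the sum measures.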


Non-Synonymously Redundant Representations. 
The synonymity of a representation can have a large 
influence on the performance of EAs. When using non- 
synonymously redundant representations, a local search 
operator can result in solutions that are phenotypically 
completely different from their parents. For recombi- 
nation operators, the distances between offspring and 
parents are not necessarily smaller than the distances 
between parents. 

Local search methods outperform random search if
solutions with similar fitness are grouped together in
the search space and are not scattered over the whole
search space [53.25, 33-35, 96, 97]. Furthermore, problems
are easy for guided search methods if distances
between solutions are related to corresponding fitness
differences. However, non-synonymously redundant
representations destroy existing correlations between
solutions and their corresponding fitness values. Thus,
search heuristics cannot use any information learned
during the search for determining future search steps.
As a result, it makes no sense for guided search approaches
to search around already found high-quality
genotypes and guided search algorithms become random
search. A local search step does not result in
a solution with similar properties but in a random
solution. Analogously, recombination is not able to
create new solutions with similar properties to their
parents, but creates new, random solutions. Therefore,
non-synonymously redundant representations have the
same effect on EAs as low-locality representations
(Sect. 53.3.1).

Fig. 53.5 Synonymous (left) versus non-synonymous (right) redundancy

The use of non-synonymously redundant represen- 
tations allows us to reach many different phenotypes 
in a single local search step (Fig. 53.6). However, in- 
creasing the connectivity between phenotypes results 
in random search and decreases the efficiency of EAs. 
As for low-locality representations, a genotype search 
step does not result in a similar phenotype but cre- 
ates a random solution. Therefore, guided search is no 
longer possible but becomes random search. As a result, 
we obtain reduced performance of EAs on straightfor- 
ward problems when using non-synonymously redun- 
dant representations. 
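
The increased connectivity can be made concrete by counting how many other phenotypes are reachable from one phenotype with a single bit flip. This is a minimal sketch under stated assumptions: 6-bit genotypes, eight phenotypes, and two hypothetical decoders that are not taken from the chapter. `decode_syn` ignores the last three bits, whereas `decode_nonsyn` mixes both halves arithmetically so that genotypes of one phenotype are spread over the genotype space.

```python
from itertools import product

N_BITS = 6  # assumed genotype length; each of 8 phenotypes has 8 genotypes


def to_int(bits):
    """Interpret a tuple of bits as an unsigned integer (first bit is most significant)."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


def decode_syn(g):
    """Hypothetical synonymous encoding: the last three bits are unused."""
    return to_int(g[:3])


def decode_nonsyn(g):
    """Hypothetical non-synonymous encoding: both halves are mixed modulo 8."""
    return (to_int(g[:3]) + 3 * to_int(g[3:])) % 8


def one_bit_neighbors(g):
    """All genotypes at Hamming distance one from g."""
    for i in range(len(g)):
        yield g[:i] + (1 - g[i],) + g[i + 1:]


def reachable_phenotypes(decode, phenotype):
    """Other phenotypes reachable by one bit flip from any genotype of `phenotype`."""
    reached = set()
    for g in product((0, 1), repeat=N_BITS):
        if decode(g) == phenotype:
            reached.update(decode(n) for n in one_bit_neighbors(g))
    reached.discard(phenotype)
    return reached


reach_syn = reachable_phenotypes(decode_syn, 0)
reach_nonsyn = reachable_phenotypes(decode_nonsyn, 0)
print(len(reach_syn), len(reach_nonsyn))
```

With the synonymous decoder a phenotype is connected only to its three phenotypic neighbors; with the non-synonymous decoder every other phenotype is one step away, so a local step carries little information about where it will land.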

On average, non-synonymously redundant representations
transform straightforward (as well as misleading)
problems into difficult problems where the
fitness differences between two solutions are not correlated
to their distances. Easy problems become
more difficult. Therefore, we do not recommend using
non-synonymously redundant encodings. A more
detailed discussion of non-synonymously redundant
representations can be found in [53.3, Sect. 3.1]
and [53.60].

Fig. 53.6a,b The effects of local search steps for (a) synonymously
versus (b) non-synonymously redundant representations.
The arrows indicate search steps


Bias of Synonymously Redundant Representations. 
The use of synonymously redundant representations al- 
lows local search to generate neighboring solutions. 
Small variations of genotypes cannot result in large 
phenotypic changes but result either in the same or 
a similar phenotype (Fig. 53.6, left). 

To describe relevant properties of synonymously
redundant representations, we can use
the order of redundancy, $k_r$, which is defined as
$k_r = \log(|\Phi_g|)/\log(|\Phi_p|)$ [53.3, Sect. 3.1]. $k_r$ measures
the amount of redundant information in the
encoding. Furthermore, we are especially interested
in biases of synonymously redundant representations.
$r$ measures a bias and denotes the number of genotypes
that represent the optimal solution. When using
non-redundant representations, every phenotype is
assigned to exactly one genotype ($r = 1$). In general,
$1 \le r \le |\Phi_g| - |\Phi_p| + 1$.
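
As a small worked example, the order of redundancy and the number of genotypes per phenotype can be computed by enumeration. This is a sketch under stated assumptions: 4-bit genotypes and two hypothetical decoders that are not from the chapter. `decode_biased` (the number of ones) is chosen only to show unequal counts and is not meant as an example of a synonymous encoding.

```python
import math
from collections import Counter
from itertools import product


def order_of_redundancy(n_genotypes, n_phenotypes):
    """k_r = log(|Phi_g|) / log(|Phi_p|)."""
    return math.log(n_genotypes) / math.log(n_phenotypes)


def copies_per_phenotype(genotypes, decode):
    """Number of genotypes that encode each phenotype."""
    return Counter(decode(g) for g in genotypes)


def r_bounds(n_genotypes, n_phenotypes):
    """Bounds on r when every phenotype is encoded by at least one genotype."""
    return 1, n_genotypes - n_phenotypes + 1


def decode_uniform(g):
    """Hypothetical uniformly redundant encoding: last two bits unused."""
    return g[:2]


def decode_biased(g):
    """Hypothetical non-uniformly redundant encoding: number of ones."""
    return sum(g)


genotypes = list(product((0, 1), repeat=4))
uniform = copies_per_phenotype(genotypes, decode_uniform)
biased = copies_per_phenotype(genotypes, decode_biased)

k_r_uniform = order_of_redundancy(len(genotypes), len(uniform))
k_r_biased = order_of_redundancy(len(genotypes), len(biased))
print(k_r_uniform, dict(uniform))
print(k_r_biased, dict(biased), r_bounds(len(genotypes), len(biased)))
```

If the optimum were the phenotype with two ones, the biased encoding would give r = 6, whereas the uniform encoding gives r = 4 for every phenotype.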

Synonymously redundant representations are unbi- 
ased (uniformly redundant) if each phenotype is, on 
average, encoded by the same number of genotypes. 
In contrast, encodings are biased (non-uniformly redun- 
dant) if some phenotypes are represented by a different 
number of genotypes. Rothlauf and Goldberg [53.80] 
studied how the bias of synonymously redundant representations
influences the performance of EAs. If representations
are uniformly redundant, unbiased search
operators generate each phenotype with the same prob- 
ability as for a non-redundant representation. Further- 
more, variation operators have the same effect on the 
genotypes and phenotypes and the performance of EAs 
using a uniformly and synonymously redundant encod- 
ing is similar to non-redundant representations. 

The situation is different for non-uniformly re- 
dundant encodings. The probability P of finding the 
correct solution follows $P \propto 1 - \exp(-r/2^{k_r})$ [53.3].
Therefore, uniformly redundant representations do not 
change the behavior of EAs. Only by increasing r, 
which means overrepresenting optimal solutions, does 
the performance of EAs increase. In contrast, the per- 
formance of EAs decreases if the optimal solution is 
underrepresented. Therefore, non-uniformly redundant 
representations can only be used advantageously if a priori
information exists regarding optimal solutions.
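
The qualitative effect of over- and underrepresenting the optimum can be illustrated by evaluating the expression 1 - exp(-r/2^{k_r}) for several values of r. This is a minimal sketch that reads the exponent of 2 in the relation above as the order of redundancy k_r; the value k_r = 2 and the values of r are hypothetical. Because the relation is only a proportionality, the numbers are meaningful relative to each other, not as absolute probabilities.

```python
import math


def success_factor(r, k_r):
    """Value of 1 - exp(-r / 2**k_r); compare settings, do not read as a probability."""
    return 1.0 - math.exp(-r / 2 ** k_r)


k_r = 2  # hypothetical order of redundancy
factors = {r: success_factor(r, k_r) for r in (1, 2, 4, 8)}
for r, value in factors.items():
    print(f"r = {r}: factor = {value:.3f}")
```

The factor grows monotonically with r, which matches the statement that overrepresenting the optimal solution helps and underrepresenting it hurts.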


For more information on redundant representations, we 
refer the reader to [53.80]. 


Search Operators 
Search operators are applied either to genotypes or phe- 
notypes and subsequently create new solutions. Search 
operators can be either biased or unbiased. In the un- 
biased case, each solution in the search space has the 
same probability of being created. If some phenotypes 
have higher probabilities to be created by applying 
a search operator to a randomly chosen solution, we call 
this a bias towards those phenotypes [53.3, 98]. 
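
For small search spaces, this definition of bias can be checked exactly by enumerating all parents and all operator outcomes. The following is a minimal sketch, assuming binary strings and operators that act on one uniformly chosen position; `flip_bit` and `set_bit` are hypothetical operators introduced for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def offspring_distribution(operator, n_bits):
    """Exact distribution of offspring when `operator` is applied at a
    uniformly chosen position of a uniformly chosen parent."""
    parents = list(product((0, 1), repeat=n_bits))
    weight = Fraction(1, len(parents) * n_bits)
    dist = defaultdict(Fraction)
    for parent in parents:
        for pos in range(n_bits):
            dist[operator(parent, pos)] += weight
    return dict(dist)


def flip_bit(parent, pos):
    """Standard bit-flip move: invert the bit at `pos`."""
    return parent[:pos] + (1 - parent[pos],) + parent[pos + 1:]


def set_bit(parent, pos):
    """Hypothetical biased move: force the bit at `pos` to one."""
    return parent[:pos] + (1,) + parent[pos + 1:]


def is_unbiased(dist, n_bits):
    """True if every solution is created with the same probability."""
    uniform = Fraction(1, 2 ** n_bits)
    return len(dist) == 2 ** n_bits and all(p == uniform for p in dist.values())


dist_flip = offspring_distribution(flip_bit, 3)
dist_set = offspring_distribution(set_bit, 3)
print(is_unbiased(dist_flip, 3), is_unbiased(dist_set, 3))
```

The bit-flip move creates every solution with equal probability, whereas the bit-setting move can never create the all-zero string and favors strings with many ones.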

Using biased search operators can be helpful if 
some knowledge about the structure of high-quality 
solutions exists and the search operators are biased 
such that high-quality solutions are preferred. Then, the 
average fitness of solutions that are generated by the biased
search operator is higher than that of randomly generated
solutions or of solutions generated by search operators
without a bias. Identifying high-quality solutions is often difficult, as exact
optimization methods usually need exponential effort 
to find optimal solutions for relevant (often NP-hard) 
problems, and heuristic optimization methods usually 
do not provide any guarantee on solution quality. Two
approaches can help to overcome these problems. First,
we can solve small problem instances exactly and deduce the
structure of high-quality solutions from the solutions
obtained. However, this assumes that the relevant properties
of optimal solutions are independent of the
problem size. Second, we can identify solutions of higher
quality with heuristic optimization methods by increasing
the time spent on the heuristic search. Although
heuristic optimization methods usually do not provide
any guarantee of finding optimal solutions, the probability
of finding high-quality solutions increases with
the time spent on the heuristic search. For an example,
we refer the reader to [53.99].

Problems can occur if a bias induced by a search 
operator is either too strong or towards solutions that 
have a large distance to an optimal solution. If the bias 
is too strong, the diversity of a population is reduced and the
individuals in a population quickly converge towards 
those solutions to which the search operators are biased. 
Then, after a few search steps, heuristic search is no 
longer possible. Furthermore, the performance of EAs 
decreases if a bias exists towards solutions that have 
a large distance to optimal solutions. The biased search 
operators push a population of solutions in the wrong 
direction and it is difficult for EAs to find optimal solu- 
tions. Therefore, biased search operators should be used 
with care. 




Standard search operators for binary, integer, or 
continuous genotypes (Sect. 53.2.5) are unbiased. The 
situation is slightly different for standard recombina- 
tion operators. Recombination operators never intro- 
duce new solution features but only recombine existing 
properties. Thus, once some solution features are lost 
in a population of solutions they can never be regained 
later by using recombination operators alone. Given 
a randomly generated and unbiased initial population 
of solutions, the iterative application of recombination 
operators can result in a random fixation of some decision
variables, which reduces the diversity of solutions that
exist in a population. This effect is known as genetic 
drift. The existence of genetic drift is widely known 
and has been addressed in the field of population genetics
[53.100-104] and also in the field of evolutionary
algorithms [53.105-108].
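
Genetic drift under recombination alone can be observed in a small simulation. This is a minimal sketch, assuming binary genotypes, uniform crossover, random mating without any selection, and generational replacement; the population size, string length, and seed are arbitrary illustrative choices.

```python
import random


def uniform_crossover(a, b, rng):
    """Create one child by taking each position from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def fixed_loci(population):
    """Number of positions at which all individuals carry the same value."""
    n_bits = len(population[0])
    return sum(len({ind[i] for ind in population}) == 1 for i in range(n_bits))


def drift(pop_size=20, n_bits=30, generations=200, seed=1):
    """Apply recombination only (no selection, no mutation) and record
    the number of fixed positions after every generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [fixed_loci(population)]
    for _ in range(generations):
        population = [
            uniform_crossover(rng.choice(population), rng.choice(population), rng)
            for _ in range(pop_size)
        ]
        history.append(fixed_loci(population))
    return history


history = drift()
print(history[0], history[-1])
```

Because crossover only recombines values that are present in the parents, a position that has become fixed can never be unfixed again, so the recorded count never decreases. How quickly positions fix depends on the population size and the random seed.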

When using more sophisticated phenotypes and
direct search operators, identifying the bias of search
operators may be difficult. For example, in standard
genetic programming approaches, programs and syntactical
expressions are encoded as trees of variable size.
Daida et al. [53.109] showed that the two standard search
operators in genetic programming (sub-tree swapping
crossover and sub-tree mutation) are biased as they do
not effectively search all tree shapes. In particular, very full
or very narrow tree solutions are extraordinarily difficult
to find, even when the fitness function provides
good guidance to the optimum solutions. Therefore, genetic
programming approaches will perform poorly on
problems where optimal solutions require full or narrow
trees. Furthermore, since the search operators do
not find solutions that are at either end of this fullness
spectrum, problems may arise if we use those
search operators to solve problems whose solutions
are restricted to a particular shape, of whatever degree
of fullness. Hoai et al. [53.110] studied approaches to
overcome these problems and introduced a new tree-based
representation and local insertion and deletion
search operators with a lower bias.

In general, the bias of search operators can be
measured by comparing the properties of randomly
generated solutions with solutions that are created by
subsequent applications of search operators. Examples
of how to analyze the bias of search operators can be
found in [53.2].


53.4 Summary and Conclusions


This chapter discusses the design of representations and
search operators. Representations and search operators
cannot be designed independently of each other, as together
they define the structure of the search space.
Section 53.1 summarized the benefits of representations
and gave an overview of standard genotypes.
Section 53.2 reviewed design guidelines for local, as
well as recombination operators and gave an overview
of standard search operators. Section 53.3 discussed
possibilities for a problem-specific design of representations
and search operators.

Representations and search operators should have
high locality; this means that applying a local search
operator to a genotype should result in a neighboring
phenotype. If we have some knowledge about the properties
of high-quality solutions, we can bias evolutionary
algorithms by either incorporating problem-specific
heuristics in the representation, using biased representations,
or biased search operators.

For problems of practical relevance, we assume that
the metric of a search space is meaningful and, on
average, the fitness differences between neighboring
solutions are smaller than between randomly chosen
solutions. Search operators and representations should be
designed in such a way that they fit the metric of the
search space. If local as well as recombination-based
search operators are not able to generate similar solutions,
intensification of search is not possible, and EAs
behave like random search. For a representation which
introduces an additional genotype-phenotype mapping,
we must make sure that it does not alter the character
of the search operators. Therefore, its locality must
be high; this means that the phenotype metric must fit
the genotype metric. Low locality of a representation
randomizes the search and leads to low performance of
evolutionary algorithms.

There is a general trade-off between the effectiveness
and application range of optimization methods.
Usually, the more problems that can be solved with
one particular optimization method, the lower its resulting
average performance. Therefore, standard EAs
that are not problem-specific often only work for small
or toy problems. As the problem becomes larger and
more realistic, performance degrades. To improve the
performance for selected optimization problems, we
must design EAs in a more problem-specific way. By
assuming that most problems in the real world have
high locality, EAs already exploit a specific property
of problems, namely their high locality. We can further
increase the performance of EAs if we have some
idea about properties of high-quality solutions. Such
problem-specific knowledge can be exploited by introducing
a bias into EAs. The bias should consider this
knowledge and, for example, concentrate search on solutions
that are expected to be of high quality or avoid
solutions expected to be of low quality.


References 
53.1 Z. Michalewicz, D.B. Fogel: How to Solve It: Modern Heuristics, 2nd edn. (Springer, Berlin 2004)
53.2 F. Rothlauf: Design of Modern Heuristics (Springer, Berlin, Heidelberg 2011)
53.3 F. Rothlauf: Representations for Genetic and Evolutionary Algorithms, 2nd edn. (Springer, Berlin, Heidelberg 2006)
53.4 G. Mendel: Versuche über Pflanzen-Hybriden, Verhandlungen Des Naturforschenden Vereins (Naturforschender Verein zu Brünn) 4(1), 3-47 (1866)
53.5 R.C. Lewontin: The Genetic Basis of Evolutionary Change, Columbia Biological Series, Vol. 25 (Columbia Univ. Press, New York 1974)
53.6 G.E. Liepins, M.D. Vose: Representational issues in genetic optimization, J. Exp. Theor. Artif. Intell. 2, 101-115 (1990)
53.7 J.D. Bagley: The Behavior of Adaptive Systems Which Employ Genetic and Correlation Algorithms, Ph.D. Thesis (University of Michigan, College of Literature, Science, and the Arts. Department of Communication Sciences. Ann Arbor, Michigan 1967)
53.8 M.D. Vose: Modeling simple genetic algorithms. In: Foundations of Genetic Algorithms, Vol. 2, ed. by L.D. Whitley (Morgan Kaufmann, San Mateo 1993) pp. 63-73
53.9 R. Hamming: Coding and Information Theory (Prentice Hall, Englewood Cliffs, New Jersey 1980)
53.10 V.I. Levenshtein: Binary codes capable of correcting deletions, insertions and reversals, Sov. Phys. Dokl. 10(8), 707-710 (1966), Doklady Akademii Nauk SSSR, V163 No4 845-848 1965
53.11 R.A. Wagner, M.J. Fisher: The string-to-string correction problem, J. ACM 21, 168-174 (1974)
53.12 R.A. Caruana, J.D. Schaffer: Representation and hidden bias: Gray vs. binary coding for genetic algorithms, Proc. 5th Int. Workshop Mach. Learn., ed. by L. Laird (Morgan Kaufmann, San Francisco 1988) pp. 153-161
53.13 T. Bäck, D.B. Fogel, Z. Michalewicz: Handbook of Evolutionary Computation (Institute of Physics Publishing/Oxford Univ. Press, Bristol, New York 1997)
53.14 D.E. Goldberg: The Design of Innovation, Series on Genetic Algorithms and Evolutionary Computation (Kluwer, Boston 2002)
53.15 D.E. Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading 1989)
53.16 J.E. Rowe, L.D. Whitley, L. Barbulescu, J.-P. Watson: Properties of Gray and binary representations, Evol. Comput. 12(1), 46-76 (2004)
53.17 D.E. Goldberg: Real-coded genetic algorithms, virtual alphabets, and blocking, Complex Syst. 5(2), 139-167 (1991)
53.18 N.J. Radcliffe: Theoretical foundations and properties of evolutionary computations: Schema processing. In: Handbook of Evolutionary Computation, ed. by T. Bäck, D.B. Fogel, Z. Michalewicz (Institute of Physics Publishing/Oxford Univ. Press, Bristol, New York 1997), pp. B2.5:1-B2.5:10
53.19 D.B. Fogel, L.C. Stayton: On the effectiveness of crossover in simulated evolutionary optimization, BioSystems 32, 171-182 (1994)
53.20 G.R. Raidl, B.A. Julstrom: Edge-sets: An effective evolutionary coding of spanning trees, IEEE Trans. Evol. Comput. 7(3), 225-239 (2003)
53.21 F. Rothlauf: A problem-specific and effective encoding for metaheuristics for the minimum communication spanning tree problem, INFORMS J. Comput. 21(4), 575-584 (2009)
53.22 J.H. Holland: Adaptation in Natural And Artificial Systems (University of Michigan, Ann Arbor 1975)
53.23 D.E. Goldberg, B. Korb, K. Deb: Messy genetic algorithms: Motivation, analysis, and first results, Complex Syst. 3(5), 493-530 (1989)
53.24 N.J. Radcliffe: Forma analysis and random respectful recombination. In: Foundations of Genetic Algorithms, ed. by G.J.E. Rawlins (Morgan Kaufmann, San Mateo 1991) pp. 222-229
53.25 N.J. Radcliffe: Equivalence class analysis of genetic algorithms, Complex Syst. 5(2), 183-205 (1991)
53.26 N.J. Radcliffe: Non-linear genetic representations, Parallel Problem Solving from Nature - PPSN II, ed. by R. Männer, B. Manderick (Springer, Berlin, Heidelberg 1992) pp. 259-268
53.27 N.J. Radcliffe: Genetic set recombination. In: Foundations of Genetic Algorithms, Vol. 2, ed. by L.D. Whitley (Morgan Kaufmann, San Mateo 1993) pp. 203-219
53.28 N.J. Radcliffe, P.D. Surry: Fitness variance of formae and performance prediction. In: Foundations of Genetic Algorithms 3, ed. by L.D. Whitley, M.D. Vose (Morgan Kaufmann, San Mateo 1994) pp. 51-72
53.29 N.J. Radcliffe: The algebra of genetic algorithms, Ann. Maths. Artif. Intell. 10, 339-384 (1994)
53.30 N.J. Radcliffe, P.D. Surry: Formal algorithms + formal representations = search strategies, Parallel Problem Solving from Nature - PPSN IV, ed. by H.-M. Voigt, W. Ebeling, I. Rechenberg, H.-P. Schwefel (Springer, Berlin 1996) pp. 366-375
53.31 H. Kargupta, K. Deb, D.E. Goldberg: Ordering genetic algorithms and deception, Parallel Problem Solving from Nature - PPSN IV, ed. by H.-M. Voigt, W. Ebeling, I. Rechenberg, H.-P. Schwefel (Springer, Berlin 1996) pp. 47-56
53.32 J. Doran, D. Michie: Experiments with the graph traverser program, Proc. R. Soc. Lond. (A) 294, 235-259 (1966)
53.33 B. Manderick, M. de Weger, P. Spiessens: The genetic algorithm and the structure of the fitness landscape, Proc. 4th Int. Conf. Genet. Algorithm., ed. by R.K. Belew, L.B. Booker (Morgan Kaufmann, Burlington 1991) pp. 143-150
53.34 S. Christensen, F. Oppacher: What can we learn from no free lunch?, Proc. Genet. Evol. Comput. Conf. (GECCO 2001), ed. by L. Spector, E. Goodman, A. Wu, W.B. Langdon, H.-M. Voigt, M. Gen, S. Sen, M. Dorigo, S. Pezeshk, M. Garzon, E. Burke (Morgan Kaufmann, San Francisco 2001) pp. 1219-1226
53.35 D. Whitley, K. Mathias, S. Rana, J. Dzubera: Evaluating evolutionary algorithms, Artif. Intell. 85, 245-276 (1996)
53.36 C.R. Reeves: Landscapes, operators and heuristic search, Ann. Oper. Res. 86, 473-490 (1999)
53.37 F. Gray: Pulse code communications, U.S. Patent 263 2058 (1953)
53.38 R.A. Caruana, J.D. Schaffer, L.J. Eshelman: Using multiple representations to improve inductive bias: Gray and binary coding for genetic algorithms, Proc. 6th Int. Workshop Mach. Learn., ed. by A.M. Segre (Morgan Kaufmann, San Francisco 1989) pp. 375-378
53.39 C. Höhn, C.R. Reeves: Are long path problems hard for genetic algorithms?, Parallel Problem Solving from Nature - PPSN IV, ed. by H.-M. Voigt, W. Ebeling, I. Rechenberg, H.-P. Schwefel (Springer, Berlin 1996) pp. 134-143
53.40 C. Höhn, C.R. Reeves: The crossover landscape for the onemax problem, Proc. 2nd Nordic Workshop Genet. Algorithm. Appl. (2NWGA), ed. by J.T. Alander (Vaasa, Finland 1996), pp. 27-43, Department of Information Technology and Production Economics, University of Vaasa
53.41 D.B. Fogel: Real-valued vectors. In: Handbook of Evolutionary Computation, ed. by T. Bäck, D.B. Fogel, Z. Michalewicz (Institute of Physics Publishing/Oxford Univ. Press, Bristol, New York 1997), pp. C3.2:2-C3.2:5
53.42 A. Moraglio, R. Poli: Topological interpretation of crossover. In: Proc. Genet. Evol. Comput. Conf. (GECCO 2004), ed. by K. Deb, R. Poli, W. Banzhaf, H.-G. Beyer, E. Burke, P. Darwen, D. Dasgupta, D. Floreano, J. Foster, M. Harman, O. Holland, P.L. Lanzi, L. Spector, A. Tettamanzi, D. Thierens, A. Tyrrell (Springer, Berlin, Heidelberg 2004) pp. 1377-1388


53.43 A. Moraglio, Y.-H. Kim, Y. Yoon, B.R. Moon: Geometric crossovers for multiway graph partitioning, Evol. Comput. 15(4), 445-474 (2007)
53.44 A. Moraglio: Towards a Geometric Unification of Evolutionary Algorithms, Ph.D. Thesis (Department of Computer Science, University of Essex, Colchester 2007)
53.45 F. Rothlauf: Representations for Genetic and Evolutionary Algorithms, Studies on Fuzziness and Soft Computing, Vol. 104, 1st edn. (Springer, Berlin, Heidelberg 2002)
53.46 J. Reed, R. Toombs, N.A. Barricelli: Simulation of biological evolution and machine learning: I. Selection of self-reproducing numeric patterns by data processing machines, effects of hereditary control, mutation type and crossing, J. Theor. Biol. 17, 319-342 (1967)
53.47 D.H. Ackley: A Connectionist Machine for Genetic Hill Climbing (Kluwer Academic, Boston 1987)
53.48 G. Syswerda: Uniform crossover in genetic algorithms, Proc. 3rd Int. Conf. Genet. Algorithm. (ICGA), ed. by J.D. Schaffer (Morgan Kaufmann, Burlington 1989) pp. 2-9
53.49 Z. Michalewicz: Genetic Algorithms + Data Structures = Evolution Programs (Springer, New York 1996)
53.50 L.B. Booker: Binary strings. In: Handbook of Evolutionary Computation, ed. by T. Bäck, D.B. Fogel, Z. Michalewicz (Institute of Physics Publishing/Oxford Univ. Press, Bristol, New York 1997), pp. C3.3:1-C3.3:10
53.51 H. Mühlenbein, G. Paaß: From recombination of genes to the estimation of distributions I. Binary parameters, Parallel Problem Solving from Nature - PPSN IV, ed. by H.-M. Voigt, W. Ebeling, I. Rechenberg, H.-P. Schwefel (Springer, Berlin 1996) pp. 178-187
53.52 H. Mühlenbein, T. Mahnig: FDA - a scalable evolutionary algorithm for the optimization of additively decomposed functions, Evol. Comput. 7(4), 353-376 (1999)
53.53 M. Pelikan, D.E. Goldberg, E. Cantu-Paz: BOA: The Bayesian optimization algorithm, IlliGAL Report No. 99003 (University of Illinois, Urbana 1999)
53.54 M. Pelikan, D.E. Goldberg, F. Lobo: A survey of optimization by building and using probabilistic models, IlliGAL Report No. 99018 (University of Illinois, Urbana 1999)
53.55 P. Larrañaga, R. Etxeberria, J.A. Lozano, J.M. Peña: Optimization by learning and simulation of Bayesian and Gaussian networks. Technical Report EHU-KZAA-IK-4/99 (University of the Basque Country, San Sebastián 1999)
53.56 P.A.N. Bosman: Design and Application of Iterated Density-Estimation Evolutionary Algorithms, Ph.D. Thesis (Universiteit Utrecht, Utrecht 2003)
53.57 P. Larrañaga, J.A. Lozano: Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation (Springer, Berlin 2001)




53.58 M. Pelikan: Hierarchical Bayesian Optimization Algorithm: Toward a New Generation of Evolutionary Algorithms, Studies in Fuzziness and Soft Computing (Springer, New York 2006)
53.59 E. Falkenauer: Genetic Algorithms and Grouping Problems (Wiley, Chichester 1998)
53.60 S.-S. Choi, B.-R. Moon: Normalization for genetic algorithms with non-synonymously redundant encodings, IEEE Trans. Evol. Comput. 12(5), 604-616 (2008)
53.61 S.-S. Choi, B.-R. Moon: Normalization in genetic algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2003), ed. by E. Cantu-Paz, J.A. Foster, K. Deb, D. Davis, R. Roy, U.-M. O'Reilly, H.-G. Beyer, R. Standish, G. Kendall, S. Wilson, M. Harman, J. Wegener, D. Dasgupta, M.A. Potter, A.C. Schultz, K. Dowsland, N. Jonoska, J. Miller (Springer, Berlin, Heidelberg 2003) pp. 862-873
53.62 L. Davis: Applying adaptive algorithms to epistatic domains, Proc. 9th Int. Joint Conf. Artif. Intell., ed. by A. Joshi (Morgan Kaufmann, San Francisco 1985) pp. 162-164
53.63 D.E. Goldberg, R. Lingle Jr.: Alleles, loci, and the traveling salesman problem, Proc. Int. Conf. Genet. Algorithm. Their Appl., ed. by J.J. Grefenstette (Lawrence Erlbaum, Hillsdale 1985) pp. 154-159
53.64 I.M. Oliver, D.J. Smith, J.R.C. Holland: A study of permutation crossover operators on the traveling salesman problem, Proc. 2nd Int. Conf. Genet. Algorithm. (ICGA), ed. by J.J. Grefenstette (Lawrence Erlbaum, Hillsdale 1987)
53.65 C. Bierwirth: A generalized permutation approach to job shop scheduling with genetic algorithms, OR Spektrum 17, 87-92 (1995)
53.66 C. Bierwirth, D.C. Mattfeld, H. Kopfer: On permutation representations for scheduling problems, Parallel Problem Solving from Nature - PPSN IV, ed. by H.-M. Voigt, W. Ebeling, I. Rechenberg, H.-P. Schwefel (Springer, Berlin 1996) pp. 310-318
53.67 L.D. Whitley: Permutations. In: Handbook of Evolutionary Computation, ed. by T. Bäck, D.B. Fogel, Z. Michalewicz (Institute of Physics Publishing/Oxford Univ. Press, Bristol, New York 1997), pp. C3.3:14-C3.3:20
53.68 D.C. Mattfeld: Evolutionary Search and the Job Shop: Investigations on Genetic Algorithms for Production Scheduling (Physica, Berlin, Heidelberg 1996)
53.69 J.R. Koza: Genetic Programming: On the Programming of Computers by Natural Selection (MIT, Cambridge 1992)
53.70 W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone: Genetic Programming, An Introduction (Morgan Kaufmann, Burlington 1997)
53.71 T. Jones, S. Forrest: Fitness distance correlation as a measure of problem difficulty for genetic algorithms, Proc. 6th Int. Conf. Genet. Algorithms, ed. by L. Eshelman (Morgan Kaufmann, San Francisco 1995) pp. 184-192
53.72 F. Rothlauf, D.E. Goldberg: Tree network design with genetic algorithms - an investigation in the locality of the Prüfernumber encoding. In: Late Breaking Papers at the Genetic and Evolutionary Computation Conference 1999, ed. by S. Brave, A.S. Wu (Omni, Orlando 1999) pp. 238-244
53.73 J. Gottlieb, G.R. Raidl: Characterizing locality in decoder-based EAs for the multidimensional knapsack problem, Lect. Notes Comput. Sci. 1829, 38-52 (1999)
53.74 J. Gottlieb, G.R. Raidl: The effects of locality on the dynamics of decoder-based evolutionary search, Proc. Genet. Evol. Comput. Conf. (GECCO 2000), ed. by L.D. Whitley, D.E. Goldberg, E. Cantu-Paz, L. Spector, L. Parmee, H.-G. Beyer (Morgan Kaufmann, San Francisco 2000) pp. 283-290
53.75 J. Gottlieb, B.A. Julstrom, G.R. Raidl, F. Rothlauf: Prüfer numbers: A poor representation of spanning trees for evolutionary search, Proc. Genet. Evol. Comput. Conf. (GECCO 2001), ed. by L. Spector, E. Goodman, A. Wu, W.B. Langdon, H.-M. Voigt, M. Gen, S. Sen, M. Dorigo, S. Pezeshk, M. Garzon, E. Burke (Morgan Kaufmann, San Francisco 2001)
53.76 F. Rothlauf, D.E. Goldberg: Prüfernumbers and genetic algorithms: A lesson on how the low locality of an encoding can harm the performance of GAs. In: Parallel Problem Solving from Nature - PPSN VI, ed. by M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J.J. Merelo, H.-P. Schwefel (Springer, Berlin 2000) pp. 395-404
53.77 M. Gerrits, P. Hogeweg: Redundant coding of an NP-complete problem allows effective genetic algorithm search, Lect. Notes Comput. Sci. 496, 70-74 (1991)
53.78 S. Ronald, J. Asenstorfer, M. Vincent: Representational redundancy in evolutionary algorithms, Proc. 1995 IEEE Int. Conf. Evol. Comput. 2, Piscataway, ed. by D.B. Fogel, Y. Attikiouzel (1995) pp. 631-636
53.79 R. Shipman: Genetic redundancy: Desirable or problematic for evolutionary adaptation?, Proc. 4th Int. Conf. Artif. Neural Netw. Genet. Algorithm. (ICANNGA), ed. by A. Dobnikar, N.C. Steele, D.W. Pearson, R.F. Albrecht (Springer, Berlin 1999) pp. 1-11
53.80 F. Rothlauf, D.E. Goldberg: Redundant representations in evolutionary computation, Evol. Comput. 11(4), 381-415 (2003)
53.81 Z. Michalewicz, C.Z. Janikow: Handling constraints in genetic algorithm, Proc. 3rd Int. Conf. Genet. Algorithm. (ICGA), ed. by J.D. Schaffer (Morgan Kaufmann, Burlington 1989) pp. 151-157
53.82 R.H. Storer, S.D. Wu, R. Vaccari: Problem and heuristic space search strategies for job shop scheduling, ORSA J. Comput. 7(4), 453-467 (1995)




53.83 Z. Michalewicz, M. Schoenauer: Evolutionary computation for constrained parameter optimization problems, Evol. Comput. 4(1), 1-32 (1996)
53.84 P.D. Surry: A Prescriptive Formalism for Constructing Domain-Specific Evolutionary Algorithms, Ph.D. Thesis (University of Edinburgh, Edinburgh 1998)
53.85 J.J. Grefenstette, R. Gopal, B.J. Rosmaita, D. Van Gucht: Genetic algorithms for the traveling salesman problem, Proc. Int. Conf. Genet. Algorithm. Their Appl., ed. by J.J. Grefenstette (Lawrence Erlbaum, Hillsdale 1985) pp. 160-168
53.86 S. Bagchi, S. Uckun, Y. Miyabe, K. Kawamura: Exploring problem-specific recombination operators for job shop scheduling, Proc. 4th Int. Conf. Genet. Algorithm., ed. by R.K. Belew, L.B. Booker (Morgan Kaufmann, Burlington 1991) pp. 10-17
53.87 D.R. Jones, M.A. Beltramo: Solving partitioning problems with genetic algorithms, Proc. 4th Int. Conf. Genet. Algorithm., ed. by R.K. Belew, L.B. Booker (Morgan Kaufmann, Burlington 1991) pp. 442-449
53.88 C.G. Shaefer: The ARGOT strategy: Adaptive representation genetic optimizer technique, Proc. 2nd Int. Conf. Genet. Algorithm. (ICGA), ed. by J.J. Grefenstette (Lawrence Erlbaum, Hillsdale 1987) pp. 50-58
53.89 C.G. Shaefer, J.S. Smith: The ARGOT strategy II: Combinatorial optimizations, Technical Report RL90-1 (Thinking Machines Inc., Cambridge, 1990)
53.90 R.H. Storer, S.D. Wu, R. Vaccari: New search spaces for sequencing problems with application to job shop scheduling, Manag. Sci. 38(10), 1495-1509 (1992)
53.91 K.S. Naphade, S.D. Wu, R.H. Storer: Problem space search algorithms for resource-constrained project scheduling, Ann. Oper. Res. 70, 307-326 (1997)
53.92 A.T. Ernst, M. Krishnamoorthy, R.H. Storer: Heuristic and exact algorithms for scheduling aircraft landings, Netw.: Int. J. 34, 229-241 (1999)
53.93 N.N. Schraudolph, R.K. Belew: Dynamic parameter encoding for genetic algorithms, Mach. Learn. 9, 9-21 (1992)
53.94 J.P. Cohoon, S.U. Hegde, W.N. Martin, D. Richards: Distributed genetic algorithms for the floorplan design problem, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 10(4), 483-492 (1991)
53.95 B.A. Julstrom: Redundant genetic encodings may not be harmful, Proc. Genet. Evol. Comput. Conf. (GECCO '99), ed. by W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela, R.E. Smith (Morgan Kaufmann, Burlington 1999) p. 791
53.96 J. Horn: Genetic algorithms, problem difficulty, and the modality of fitness landscapes, IlliGAL Report No. 95004 (University of Illinois, Urbana 1995)


53.97 


53.98 


53.99 


53.100 


53.101 


53.102 


53.103 


53.104 


53.105 


53.106 


53.107 


53.108 


53.109 


53.110 


K. Deb, L. Altenberg, B. Manderick, T. Bäck, 
Z. Michalewicz, M. Mitchell, S. Forrest: Theoret- 
ical foundations and properties of evolutionary 
computations: Fitness landscapes. In: Handbook 
of Evolutionary Computation, ed. by T. Bäck, 
D.B. Fogel, Z. Michalewicz (Institute of Physics 
Publishing/Oxford Univ. Press, Bristol, New York 
1997), pp. B2.7:1-B2.7:25 

G.R. Raidl, J. Gottlieb: Empirical analysis of local- 
ity, heritability and heuristic bias in evolutionary 
algorithms: A case study for the multidimensional 
knapsack problem, Evol. Comput. 13(4), 441-475 
(2005) 

F. Rothlauf: On optimal solutions for the opti- 
mal communication spanning tree problem, Oper. 
Res. 57(2), 413-425 (2009) 

M. Kimura: On the probability of fixation of mu- 
tant genes in a population, Genetics 47, 713-719 
(1962) 

M. Kimura: Diffusion models in population genet- 
ics, J. Appl. Prob. 1, 177-232 (1964) 

J.S. Gale: Theoretical Population Genetics (Unwin 
Hyman, London 1990) 

T. Nagylaki: Introduction to Theoretical Popula- 
tion. Genetics (Springer, Berlin, Heidelberg 1992) 

D.L. Hartl, A.G. Clark: Principles of Population Ge- 
netics, 3rd edn. (Sinauer, Sunderland 1997) 

D.E. Goldberg, P. Segrest: Finite Markov chain 
analysis of genetic algorithms, Proc. 2nd Int. Conf. 
Genet. Algorithm. (ICGA), ed. by J.J. Grefenstette 
(Lawrence Erlbaum, Hillsdale 1987) pp. 1-8 

H. Asoh, H. Mühlenbein: On the mean con- 
vergence time of evolutionary algorithms with- 
out selection and mutation. In: Parallel Problem 
Solving from Nature — PPSN III, Lecture Notes in 
Computer Science, Vol. 866, ed. by Y. Davidor, 
H.-P. Schwefel, R. Männer (Springer, Berlin 1994) 
pp. 88-97 

D. Thierens, D.E. Goldberg, Â.G. Pereira: Domino 
convergence, drift, and the temporal-salience 
structure of problems, Proc. 1998 IEEE Int. Conf. 
Evol. Comput., Piscataway, ed. by D.B. Fogel 
(1998) pp. 535-540 

F.G. Lobo, D.E. Goldberg, M. Pelikan: Time com- 
plexity of genetic algorithms on exponentially 
scaled problems, Proc. Genet. Evolu. Comput. 
Conf. (GECCO 2000), ed. by L.D. Whitley, D.E. Gold- 
berg, E. Cantú-Paz, L. Spector, L. Parmee, H.- 
G. Beyer (Morgan Kaufmann, San Francisco 2000) 
pp. 151-158 

J.M. Daida, R. Bertram, S. Stanhope, J. Khoo, 
S. Chaudhary, 0. Chaudhri, J. Polito: What makes 
a problem GP-hard?, Analysis of a tunably dif- 
ficult problem in genetic programming, Genet. 
Program. Evol. Mach. 2(2), 165-191 (2001) 

N.X. Hoai, R.I. McKay, D.L. Essam: Representation 
and structural difficulty in genetic programming, 
IEEE Trans. Evol. Comput. 10(2), 157-166 (2006) 




54. Stochastic Local Search Algorithms: An Overview 


Holger H. Hoos, Thomas Stützle


In this chapter, we give an overview of the main 
concepts underlying the stochastic local search 
(SLS) framework and outline some of the most rel- 
evant SLS techniques. We also discuss some major 
recent research directions in the area of stochas- 
tic local search. The remainder of this chapter is 
structured as follows. In Sect. 54.1, we situate the 
notion of SLS within the broader context of fun- 
damental search paradigms and briefly review the 
definition of an SLS algorithm. In Sect. 54.2, we 
summarize the main issues and trends in the 
design of greedy constructive and iterative im- 
provement algorithms, while in Sects. 54.3-54.5, 
we provide a concise overview of some of the 
most widely used simple, hybrid, and popula- 
tion-based SLS methods. Finally, in Sect. 54.6, 
we discuss some recent topics of interest, such 
as the systematic design of SLS algorithms and 
methods for the automatic configuration of SLS 
algorithms. 


54.1 The Nature and Concept of SLS
54.2 Greedy Construction Heuristics and Iterative Improvement
54.3 Simple SLS Methods
     54.3.1 Randomized Iterative Improvement
     54.3.2 Probabilistic Iterative Improvement
     54.3.3 Simulated Annealing
     54.3.4 Tabu Search
     54.3.5 Dynamic Local Search
54.4 Hybrid SLS Methods
     54.4.1 Greedy Randomized Adaptive Search Procedures
     54.4.2 Iterated Greedy Algorithms
     54.4.3 Iterated Local Search
54.5 Population-Based SLS Methods
     54.5.1 Ant Colony Optimization
     54.5.2 Evolutionary Algorithms
54.6 Recent Research Directions
     54.6.1 Combination of SLS Algorithms with Systematic Search Techniques
     54.6.2 SLS Algorithm Engineering
     54.6.3 Automatic Configuration of SLS Algorithms
References


Stochastic local search (SLS) algorithms are the method
of choice for solving computationally hard decision
and optimization problems from a wide range of areas,
including computing science, operations research,
engineering, chemistry, biology and physics. SLS comprises
a spectrum of techniques ranging from simple
constructive and iterative improvement procedures to
more complex methods, such as simulated annealing (SA),
iterated local search or evolutionary algorithms (EAs).
As evident from the term stochastic local search,
randomization can, and often does, play a prominent
role in these methods. Randomized choices may be used
in the generation of initial solutions or in deciding
which of several possible search steps to perform next,
sometimes merely to break ties between equivalent
alternatives, and sometimes to heuristically and
probabilistically select from large and diverse sets of
possible candidates. Judicious use of randomization can
arguably simplify algorithm design and help achieve
robust algorithm behavior.

The concept of an SLS algorithm has been defined
formally [54.1] and provides a unifying framework
for many different types of algorithms, including
not only the previously mentioned constructive and
iterative improvement procedures, but also a wide range
of more complex search methods commonly known as
metaheuristics.




Greedy constructive and iterative improvement pro- 
cedures are important SLS algorithms, since they typ- 
ically serve as building blocks for more complex SLS 
algorithms, whose performance critically depends on 
the design choices and fine tuning of these underly- 
ing components. Greedy constructive algorithms and 
iterative improvement procedures terminate naturally 
when a complete solution has been generated or a local 
optimum of a given evaluation function is reached, re- 
spectively. One possible way to obtain better solutions 
is to restart these basic SLS procedures from randomly 
chosen initial search positions. However, this approach
has been shown to be relatively ineffective in practice for
reasonably sized problem instances (and it breaks down for
large instances [54.2]).

To overcome these limitations, over the last
decades, a large number of more sophisticated,
general-purpose SLS methods [54.1] have been introduced;
these are often called metaheuristics [54.3], since they
are based on higher level schemes for controlling one
or more subsidiary heuristic search procedures. We
divide these general-purpose SLS methods into three
broad classes: simple, hybrid and population-based
SLS methods. Simple SLS methods typically use one
neighborhood relation during the search and either
modify the acceptance criterion for search steps, so
that worsening steps are occasionally accepted, or modify
the evaluation function that is used during the local
search process. Examples of simple SLS methods include
SA [54.4,5] and (simple) tabu search [54.6-9].
A number of SLS methods combine different types of
search steps (for example, construction steps and
perturbative local search steps) or introduce occasional
larger modifications into current candidate solutions,
to provide appropriate starting points for subsequent
iterative improvement search. Examples of such hybrid
SLS methods include greedy randomized adaptive
search procedures (GRASPs) [54.10] and iterated local
search [54.11]. Finally, several SLS methods maintain
and manipulate at each iteration a set, or population,
of candidate solutions, which provides a natural way
of increasing search diversification. Examples of such
population-based SLS methods include EAs [54.12-15],
scatter search [54.16, 17] and ant colony
optimization [54.18-20].

Our classification into simple, hybrid and
population-based SLS methods is not the only possible one,
and certain SLS algorithms could be seen as belonging
to more than one category. For example, many
population-based SLS methods are also hybrid, as they use
different search operators or combine the manipulation
of the population of candidate solutions with iterative
improvement on members of the population to achieve
increased performance. In fact, there is an increasing
trend to design and apply SLS algorithms that are
not merely based on a single, well-established
general-purpose SLS method, but rather flexibly combine
elements of different SLS methods or incorporate
mechanisms taken from systematic search algorithms, such
as branch and bound or dynamic programming. The
conceptual framework of SLS naturally accommodates
this development, and the composition of more complex
SLS algorithms from conceptually simpler components
is explicitly supported, for example, by the concept of
generalized local search machines [54.1]. In this
context, methodological issues concerning the engineering
of SLS algorithms [54.21, 22] are increasingly gaining
importance. Similarly, the exploitation of automatic
algorithm configuration techniques and, more generally,
the programming by optimization paradigm [54.23]
enable the systematic development of high-performance
SLS algorithms.


54.1 The Nature and Concept of SLS


Computational approaches for the solution of hard,
combinatorial problems can all be viewed as performing
some form of search. Essentially, search algorithms
generate and evaluate candidate solutions for the
problem instance at hand. For combinatorial decision
problems, the evaluation of a candidate solution requires
checking whether the candidate solution is a feasible
solution satisfying all given constraints; for combinatorial
optimization problems, it involves computing the value
of the given objective function. For NP-complete
decision problems and NP-equivalent optimization
problems, even the most efficient algorithms known to date
require running time exponential in the instance size in
the worst case, while candidate solutions can be
evaluated in polynomial time.

A candidate solution for an instance of a
combinatorial problem is generally composed of solution
components. Consider, for example, the well-known
traveling salesperson problem (TSP). In the TSP, one is
given a weighted, fully connected graph $G = (V, E, w)$,
where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of $|V| = n$ vertices,
$E \subseteq V \times V$ is the set of edges that fully connects the
graph, and $w: E \to \mathbb{R}^+$ is a function that assigns to
each edge $e \in E$ a nonnegative weight $w(e)$. The
objective is to find a minimum-weight Hamiltonian cycle in
$G$. A candidate solution for a TSP instance can be
represented by a permutation $\pi = (\pi(1), \pi(2), \ldots, \pi(n))$ of
the vertex indices, and the objective function $w$ is given
as

$$w(\pi) = w(v_{\pi(n)}, v_{\pi(1)}) + \sum_{i=1}^{n-1} w(v_{\pi(i)}, v_{\pi(i+1)}) . \qquad (54.1)$$

In the TSP, a (complete) candidate solution, commonly
also called a tour, can be seen as consisting of $n$ out of
the $n \cdot (n-1)$ possible edges, and each edge represents
a solution component.

Any given tour can be modified by removing two 
edges and introducing two unique new edges such that 
another valid tour is obtained. This modification is an 
example of a perturbation of a complete candidate so- 
lution, and we refer to search algorithms that make 
systematic use of such solution modifications as per- 
turbative search methods. In practice, such perturbative 
search methods iteratively modify a current candidate 
solution according to some rule, and this process ends 
when a given termination criterion is met. 

Perturbative search methods start from some com- 
plete candidate solution. The task of generating such 
candidate solutions is commonly accomplished by con- 
structive search methods or construction heuristics. 
Constructive search methods iteratively extend an ini- 
tially empty candidate solution by one or several solu- 
tion components until a complete candidate solution is 
obtained. Constructive search methods can thus be seen 
as operating in a search space of partial candidate solu- 
tions. An example of a constructive search method is the 
nearest neighbor heuristic for the TSP. An initial ver- 
tex is chosen randomly, and at each construction step, 
the nearest neighbor heuristic follows a minimal weight 
edge to one of the vertices that have not yet been vis- 
ited. These steps are iterated until all vertices have been 
visited, and the tour is completed by returning to the 
initial vertex. 
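
The nearest neighbor heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation; the weight matrix `W`, the function names, and the tie-breaking rule (lowest vertex index) are assumptions made for the example. Vertices are indexed from 0, and `tour_weight` evaluates the closed tour in the sense of (54.1).

```python
import random

def tour_weight(tour, w):
    """Weight of the closed tour, cf. Eq. (54.1), using 0-based indices."""
    n = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))

def nearest_neighbor_tour(w, rng=None):
    """Build a tour by repeatedly moving to the closest unvisited vertex."""
    rng = rng or random.Random()
    n = len(w)
    current = rng.randrange(n)              # random initial vertex
    tour = [current]
    unvisited = set(range(n)) - {current}
    while unvisited:
        # Follow a minimal-weight edge to a vertex not yet visited
        # (assumption: ties are broken by lowest vertex index).
        nxt = min(unvisited, key=lambda v: (w[current][v], v))
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour  # the closing edge back to tour[0] is implicit

# Hypothetical symmetric 4-vertex instance, for illustration only.
W = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]

tour = nearest_neighbor_tour(W, random.Random(0))
```

Each construction step here extends the partial tour by exactly one solution component, so the procedure moves through the space of partial candidate solutions until the tour is complete.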

Generally speaking, local search algorithms start at
some initial search position and iteratively move, based
on local information, from the current position to
neighboring positions in the search space. Both perturbative
and constructive search methods match this general
description. While in the literature, the term local search
is mostly used for perturbative search methods, it also
applies to constructive search methods: A partial
solution corresponds to a position in the search space of
partial candidate solutions, and the neighbors of a
partial solution are obtained by extending it with one or
more solution components. In fact, there are a number
of well-known generic SLS methods, such as GRASP,
iterated greedy and ant colony optimization, that are
based on constructive local search.

Many local search algorithms use randomized de- 
cisions, for example, for generating initial solutions or 
when determining search steps. We therefore refer to 
such methods as stochastic local search (SLS) algo- 
rithms. The following components need to be specified 
to define an SLS algorithm (for a formal definition, we 
refer to Chap. 1 of [54.1]). 


• Search space: comprises the set of candidate
  solutions (or search positions) for the given problem
  instance.

• Solution set: consists of the search positions that
  are considered to be solutions of the given problem
  instance. In the case of decision problems, the
  solution set comprises all feasible candidate solutions;
  in the case of optimization problems, the solution
  set typically comprises all optimal feasible
  candidate solutions.

• Neighborhood relation: specifies the direct
  neighbors of each candidate solution s, i.e., the search
  positions that can be reached from s in a single
  search step of the SLS algorithm.

• Memory states: hold additional information about
  the search beyond the search position. If an
  algorithm is memoryless, the memory may consist of
  a single, constant state.

• Initialization function: specifies the search
  initialization in the form of a probability distribution over
  initial search positions and memory states.

• Step function: determines the computation of
  search steps by mapping each search position and
  memory state to a probability distribution over its
  neighboring search positions and memory states.

• Termination predicate: used to decide search
  termination based on the current search position and
  memory state.


The formal definition of an SLS algorithm specifies
the initialization function, the step function, and
the termination predicate as probability distributions,
which the algorithm samples at each step during any
given run. In practice, however, the initialization
function, the step function, and the termination predicate
will be specified by procedures, and the corresponding
probability distributions are only implicitly defined.
Note that the definition of an SLS algorithm is general
enough to include deterministic local search algorithms.
In fact, formally we can describe deterministic local
search algorithms as special cases of SLS algorithms:
deterministic decisions can be modeled using degenerate
probability distributions (Dirac delta).

The working principle of an SLS algorithm is then 
as follows. The search process starts from some ini- 
tial search state that is generated by the initialization 
function. While some termination criterion is not satis- 
fied, search steps are performed according to the step 
function. In the case of optimization problems, the 
SLS algorithm keeps track of the best solution found 
so far, which is then returned upon termination of 
the algorithm. In the case of decision problems, the 
SLS algorithm typically stops as soon as a (feasible) 
solution is found or another termination criterion is 
satisfied. 
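
The working principle just described can be sketched as a generic loop for the optimization case. This is a minimal sketch under stated assumptions: the function names (`sls_minimize`, `init`, `step`, `terminate`), the convention that smaller evaluation values are better, and the toy bit-string problem are all illustrative choices, not part of the formal definition in [54.1]. The initialization function, step function and termination predicate are given as procedures, so the underlying probability distributions are only implicitly defined.

```python
import random

def sls_minimize(init, step, terminate, evaluate, rng):
    """Generic SLS loop for a minimization problem.

    init(rng)           -> (position, memory)
    step(pos, mem, rng) -> (position, memory)
    terminate(pos, mem) -> bool
    evaluate(pos)       -> number (smaller is better)
    """
    pos, mem = init(rng)
    best, best_val = pos, evaluate(pos)
    while not terminate(pos, mem):
        pos, mem = step(pos, mem, rng)
        val = evaluate(pos)
        if val < best_val:          # keep track of the best solution so far
            best, best_val = pos, val
    return best, best_val

# Toy usage (hypothetical): uninformed random walk over bit strings,
# minimizing the number of ones; the memory state is a step counter.
N_BITS, MAX_STEPS = 8, 200

def init(rng):
    return tuple(rng.randint(0, 1) for _ in range(N_BITS)), 0

def step(pos, steps, rng):
    i = rng.randrange(N_BITS)       # flip one randomly chosen bit
    return pos[:i] + (1 - pos[i],) + pos[i + 1:], steps + 1

def terminate(pos, steps):
    return steps >= MAX_STEPS

best, best_val = sls_minimize(init, step, terminate, sum, random.Random(1))
```

A deterministic local search fits the same interface: its `step` procedure simply ignores `rng`, which corresponds to the degenerate distributions mentioned above.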

In all but the simplest cases, the search process is 
guided by an evaluation function, which measures the 
quality of candidate solutions. The efficacy of this guid- 
ance depends on the properties of the evaluation func- 
tion and the way in which it is integrated into the search 
process. Evaluation functions are generally problem 
specific. For many optimization problems, the objec- 
tive function given by the problem definition is used; 
however, different evaluation functions can sometimes 
provide better guidance, for example, in the sense of ap- 
proximation guarantees [54.24]. In decision problems, 
an appropriate evaluation function has to be defined by 
the algorithm designer. Often, the objective function 
used for optimization variants of the decision prob- 
lem can provide useful guidance. For example, for the 
satisfiability problem in propositional logic (SAT), the 
objective function of MAX-SAT, which, in a nutshell, 
counts the number of constraint violations, provides ef- 
fective guidance. Some SLS methods, such as dynamic 
local search (briefly discussed in Sect. 54.3), modify the 
evaluation function during the search process. 
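
As an illustration of such an evaluation function for SAT, the following sketch counts the clauses left unsatisfied by a complete truth assignment. The clause encoding (signed integers in the style of the DIMACS CNF format), the function name `num_unsat` and the example formula are assumptions made for this example.

```python
def num_unsat(clauses, assignment):
    """Count the clauses violated by a complete truth assignment.

    clauses:    list of clauses, each a list of nonzero ints
                (k means variable k is true, -k means it is false).
    assignment: dict mapping variable index -> bool.
    """
    def satisfied(clause):
        return any(assignment[abs(lit)] == (lit > 0) for lit in clause)
    return sum(not satisfied(c) for c in clauses)

# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
F = [[1, -2], [2, 3], [-1, -3]]

violations = num_unsat(F, {1: True, 2: False, 3: False})
```

An assignment is a solution of the decision problem exactly when this evaluation function value is zero, which is why minimizing it guides the search toward satisfying assignments.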

The general concept of SLS algorithms, as introduced
above and discussed in depth by Hoos and
Stützle [54.1], provides a unified view of constructive
and perturbative local search techniques that range from
rather simplistic greedy constructive heuristics and
iterative improvement algorithms to rather complex hybrid
and population-based SLS methods. Population-based
algorithms, which manipulate sets of candidate
solutions at each iteration, fall under the definition of an
SLS algorithm by considering search positions consisting
of sets of candidate solutions. In this case, the step
function also operates on sets of candidate solutions for
the given problem instance. For example, in the case of
typical EAs, recombination, mutation, and selection can
all be modeled as operations on sets of candidate
solutions, which are formally parts of a single-step function
used for mapping one generation to the next.

It is instructive to contrast the concept of an SLS 
algorithm with that of a metaheuristic. Metaheuristics 
have been described as heuristics that are superimposed 
on another heuristic [54.6], a [54.25]: 


master strategy that guides and modifies other
heuristics to produce solutions beyond those that
are normally generated in a quest for local optimality,


as [54.20]:


a set of algorithmic concepts that can be used to 
define heuristic methods applicable to a wide set of 
different problems, 


and as [54.26]: 


a high-level problem-independent algorithmic 
framework that provides a set of guidelines 
or strategies to develop heuristic optimization 
algorithms. 


However, the term metaheuristic [54.26]: 


is also used to refer to a problem-specific implemen- 
tation of a heuristic optimization algorithm accord- 
ing to the guidelines expressed in such a framework. 


As is evident from these characterizations, there is 
no formal definition of the term metaheuristic, and its 
precise meaning has evolved over time. The term metaheuristic
is commonly used to refer to the high-level
guidance strategies that on many occasions are used
to extend underlying greedy constructive or perturbative
search procedures. Hence, the scope of the term
metaheuristic differs from that of an SLS algorithm; it 
comprises what can be similarly loosely characterized 
as general-purpose SLS methods, but extends naturally 
to higher-level search strategies involving paradigms 
other than SLS, such as systematic search methods 
based on backtracking. 

Conversely, the term metaheuristic is usually not
applied to simple SLS procedures (such as random
sampling, random walk and iterative improvement),
nor to problem-specific SLS algorithms with provable
properties. Therefore, there are SLS algorithms
based on metaheuristics (such as ant colony
optimization, iterated local search or EAs for various
problems), SLS algorithms that are not metaheuristics
(such as 2-opt for the TSP or conflict-directed
random walk for SAT) and metaheuristics that are not
based on SLS (such as various branch and bound
methods and hybrids between systematic and local
search).

Because the notion of an SLS algorithm explicitly
refers to aspects that are not related to the high-level
guidance of the search process, such as the choice of
a neighborhood relation, evaluation function and
termination predicate, research on SLS also covers the
design, implementation and analysis of these more
problem-specific components.


54.2 Greedy Construction Heuristics and Iterative Improvement 


The main SLS techniques underlying more complex 
SLS methods (or metaheuristics) comprise (greedy) 
constructive search and iterative improvement algo- 
rithms. In the following, we discuss the main principles 
and choices underlying these methods. 

Constructive search procedures (or construction 
heuristics) typically evaluate at each construction step 
the quality of the available solution components based 
on a heuristic function. Greedy construction heuristics 
choose to add at each step a solution component with 
best heuristic value, breaking ties either randomly or 
by means of a secondary heuristic function. For several 
polynomially solvable problems, such as the minimum 
spanning tree problem, greedy construction heuristics 
(for example, Kruskal’s algorithm) are guaranteed to 
produce optimal solutions [54.27]; unfortunately, for 
NP-hard problems, this is generally not the case, due 
to the myopic decisions taken during solution construc- 
tion. 

A useful distinction can be made between static and
adaptive construction heuristics. In static construction
heuristics, the heuristic values associated with solution
components are precomputed before the actual
construction process is executed and remain unchanged
throughout. In adaptive construction heuristics, the
heuristic values are recomputed at each construction
step to take into account the impact of the current
partial solution. Adaptive construction heuristics tend to
be more accurate and result in better quality candidate
solutions than static heuristics, but they are also
computationally more expensive.

Fig. 54.1 A 2-exchange move for the symmetric TSP.
Note that the pair of edges to be introduced is uniquely
determined to ensure that the neighbor is again a tour

Construction heuristics are often used to provide 
good initial candidate solutions for perturbative local 
search algorithms. One of the most basic SLS meth- 
ods is to iteratively improve a candidate solution for 
a given problem instance. Such an iterative improve- 
ment algorithm starts from some initial search position 
and iteratively replaces the current candidate solution s 
by an improving neighboring candidate solution $s'$. The
local search is terminated once no improving neighbor
is available, that is, $\forall s' \in N(s): g(s) \le g(s')$, where $g(\cdot)$
is the evaluation function to be minimized, and $N(s)$
denotes the set of neighbors of $s$. In the literature, iterative
improvement algorithms are also referred to as iterated
descent or (in the case of maximization problems)
hill-climbing procedures.

Neighborhoods are problem specific, and it is
generally difficult to predict a priori which of several possible
neighborhoods results in best performance. However,
for most problems, standard neighborhoods exist.
Under the k-exchange neighborhood, two candidate
solutions are neighbors if they differ by at most k solution
components. An example is the 2-exchange
neighborhood for the TSP, where two tours are neighbors if they
differ by a pair of edges. Figure 54.1 illustrates a move
in this neighborhood. In a k-exchange neighborhood,
each candidate solution has $O(n^k)$ direct neighbors,
where n is the number of solution components in each
candidate solution. Thus, the neighborhood size is
exponential in k, as is the time to identify improving
neighbors. While using larger neighborhoods typically
makes it possible to reach better solutions, finding those
solutions also takes more time. In other words, there
is a tradeoff between the quality of the local optima
reachable by an iterative improvement algorithm and
its run time. In practice, neighborhoods that involve
a quadratic or cubic time-complexity may already
result in prohibitive computation times for large problem
instances.

The overall time-complexity of searching a given 
neighborhood is determined by its size and the cost of 
evaluating each neighbor. The power of local search 
crucially relies on the fact that caching and incremen- 
tal updating techniques can significantly reduce the cost 
of evaluating neighbors compared to computing the re- 
spective evaluation function values from scratch. For 
example, the quality of a 2-exchange neighbor of a tour 
for a TSP instance with n vertices can be computed 
from the quality of the current tour by subtracting and 
adding two edge weights (that is, two numbers) each; 
computing the weight of such a tour from scratch, on 
the other hand, requires n additions. Sometimes, to 
render the computation of the incremental updates as 
efficient as possible, additional data structures need to 
be implemented, but the net effect is often a very large 
reduction in computational effort. 
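
The incremental evaluation of a 2-exchange neighbor can be sketched as follows. The sketch assumes symmetric edge weights stored in a matrix `w`, a tour represented as a Python list of 0-based vertex indices, and positions `0 <= i < j < n`; the function names are chosen for this example only.

```python
def tour_weight(tour, w):
    """Weight of the closed tour computed from scratch (n additions)."""
    n = len(tour)
    return sum(w[tour[k]][tour[(k + 1) % n]] for k in range(n))

def two_exchange_delta(tour, w, i, j):
    """Change in tour weight when edges (tour[i], tour[i+1]) and
    (tour[j], tour[j+1]) are replaced by the uniquely determined new pair.
    Requires 0 <= i < j < len(tour) and symmetric weights."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[(j + 1) % n]
    # Two edge weights are added and two are subtracted: constant time.
    return (w[a][c] + w[b][d]) - (w[a][b] + w[c][d])

def apply_two_exchange(tour, i, j):
    """Return the neighbor obtained by reversing the segment tour[i+1..j]."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

# Hypothetical symmetric 4-vertex instance, for illustration only.
W = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]

t = [0, 2, 1, 3]
delta = two_exchange_delta(t, W, 0, 2)
```

Here the delta is obtained without touching the other edges of the tour, whereas `tour_weight` has to sum all n edge weights. For asymmetric instances the reversed segment would also change weight, so this constant-time formula would no longer apply as written.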

A second important technique for reducing the 
time-complexity of evaluating a given neighborhood 
is based on the idea of excluding from consideration 
neighbors that are unlikely or provably unable to lead 
to improvements. These neighborhood pruning techniques
play a crucial role in many high-performance
SLS algorithms. Examples of such pruning techniques
are the fixed radius searches and nearest neighbor lists
used for the TSP [54.28-30], the use of so-called don’t 
look bits [54.28], as well as reduced neighborhoods for 
the job-shop scheduling problem [54.31] and pre-tests 
for search steps, as done for the single machine total 
weighted tardiness problem [54.32]. 

The speed and performance of iterative improvement
algorithms also depend on the mechanism
used to determine search steps, the so-called pivoting
rule [54.33]. Iterative best improvement chooses at each
step a neighboring candidate solution that improves
the evaluation function value the most. Any ties that occur
can be broken either randomly, according to the order
in which the neighborhood is searched, or based on
a secondary criterion (as in [54.34]). In order to find
a most improving neighbor, iterative best improvement
needs to examine the entire neighborhood in each step.
Iterative first improvement, in contrast, examines the
neighborhood in some given order and moves to the first
improving neighboring candidate solution found during
this neighborhood scan. Iterative first improvement
applies improving search steps earlier than iterative best
improvement, but the amount of improvement achieved
in each step tends to be smaller; therefore, it usually
requires more improvement steps to reach a local
optimum. If a candidate solution is a local optimum, first-
and best-improvement algorithms detect this only by
inspecting the entire neighborhood of that solution; don’t
look bits [54.28, 29] offer a particularly useful
mechanism for reducing the time required by this final check,
the so-called check-out time.
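
The two pivoting rules can be contrasted in a small sketch. The interface is an assumption made for illustration: `neighbors(s)` returns an iterable of neighbors, `g` is the evaluation function to be minimized, and each step function returns `None` when `s` is a local optimum. The toy problem on the integers is hypothetical and only serves to make the difference in step counts visible.

```python
import random

def best_improvement_step(s, neighbors, g):
    """Scan the whole neighborhood; return a most improving neighbor or None."""
    best = min(neighbors(s), key=g, default=None)
    return best if best is not None and g(best) < g(s) else None

def first_improvement_step(s, neighbors, g, rng=None):
    """Return the first improving neighbor in scan order, or None.
    If rng is given, the neighborhood is scanned in random order."""
    candidates = list(neighbors(s))
    if rng is not None:
        rng.shuffle(candidates)
    gs = g(s)
    for t in candidates:
        if g(t) < gs:
            return t
    return None

def iterative_improvement(s, step):
    """Apply the pivoting rule `step` until a local optimum is reached."""
    steps = 0
    while True:
        t = step(s)
        if t is None:
            return s, steps
        s, steps = t, steps + 1

# Toy problem (hypothetical): minimize (x - 10)^2 over the integers.
def neighbors(x):
    return [x - 1, x + 1, x - 3, x + 3]

def g(x):
    return (x - 10) ** 2

best_run = iterative_improvement(0, lambda s: best_improvement_step(s, neighbors, g))
first_run = iterative_improvement(0, lambda s: first_improvement_step(s, neighbors, g))
```

In this toy instance both rules end in the same local optimum, but best improvement needs fewer, larger steps, each of which scans the full neighborhood. Passing an `rng` to `first_improvement_step` gives a random-order first improvement variant.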

Interestingly, the local optimum found by itera- 
tive first improvement depends on the order in which 
the neighborhood is examined. This property can be 
exploited by using a random order for scanning the 
neighborhood, and repeated runs of random-order first 
improvement algorithms can identify very different lo- 
cal optima, even if each run is started from the same ini- 
tial position [54.1, Sect. 2.1]. Thus, the search process 
in random-order first improvement is more diversified 
than in the first improvement algorithms that scan local 
neighborhoods in fixed order. 

The notion of local optimality is defined with re- 
spect to a specific neighborhood. Thus, changing the 
neighborhood during the local search process may pro- 
vide an effective means for escaping from poor quality 
local optima, and offers the opportunity to benefit from 
the advantages of large neighborhoods without incur- 
ring the computational burden associated with using 
them exclusively. In the context of iterative improve- 
ment algorithms, this idea forms the basis of variable 
neighborhood descent (VND), a variant of a general- 
purpose SLS method known as variable neighborhood 
search (VNS) [54.35,36]. VND uses a sequence of 
neighborhoods $N_1, N_2, \ldots, N_k$; this sequence is typically
ordered according to increasing neighborhood
size or increasing time complexity of searching the
neighborhoods. VND starts by using the first neighborhood,
$N_1$, until a local optimum is reached. Every
time the exploration of a neighborhood $N_i$ does not
identify an improving local search step, that is, a local
optimum w.r.t. neighborhood $N_i$ is found, VND
switches to the next neighborhood, $N_{i+1}$, in the given sequence.
Whenever an improving move has been made
in a neighborhood $N_i$, VND switches back to $N_1$ and
continues using the subsequent neighborhoods, $N_2$ etc.,
from there. The search is terminated when a local optimum
w.r.t. $N_k$ has been reached. The central idea of this
scheme is to use small neighborhoods whenever possi- 
ble, since they allow for the most efficient local search 
process. The VND scheme typically results in a signif- 
icant reduction of computation time when compared to 
an iterative improvement algorithm that uses the largest
neighborhood only. VND typically finds high-quality
local optima, because upon termination, the resulting 
candidate solution is locally optimal with respect to all 
k neighborhoods examined. 
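
The switching rule of VND can be sketched as follows. This is a minimal sketch that assumes best-improvement steps within each neighborhood; the name `vnd` and the representation of neighborhoods as a list of functions are illustrative choices, not part of the original description.

```python
def vnd(s, neighborhoods, g):
    """Variable neighborhood descent (sketch).

    neighborhoods: list of functions N_1, ..., N_k, each mapping a candidate
    solution to an iterable of its neighbors; g is minimized.
    """
    i = 0
    while i < len(neighborhoods):
        best = min(neighborhoods[i](s), key=g, default=s)
        if g(best) < g(s):
            s, i = best, 0  # improving move found: switch back to N_1
        else:
            i += 1          # local optimum w.r.t. N_i: switch to N_{i+1}
    return s                # locally optimal w.r.t. all k neighborhoods
```

Because the index only advances when a neighborhood contains no improving neighbor, the loop can exit only at a solution that is locally optimal with respect to every neighborhood in the list.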

Finally, recent years have seen an explosion in the 
development of iterative improvement methods that exploit
very large scale neighborhoods, whose size is
typically exponential in the size of the given problem
instance [54.37]. There are two main approaches
to searching these neighborhoods. The first
is to perform a heuristic search in the neighborhood,
since an exact search would be computationally too
demanding. This idea forms the basis of variable- 
depth search algorithms, where the number of solu- 
tion components that are modified in each step is not 
determined a priori. Interestingly, the two best-known 
variable-depth search algorithms, the Kernighan-Lin
algorithm for graph partitioning [54.38] and the Lin-Kernighan
algorithm for the TSP [54.39], were
devised in the early 1970s, a fact that illustrates
the lasting interest in these types of methods.
The more recent concept of ejection chains [54.40]
is related to variable-depth search. Another interesting
approach is to devise neighborhoods with a special
structure that allows them to be searched either in
polynomial time or at least very efficiently in practice
[54.37, 41–43]. This is the central idea behind many
recent developments in very large scale neighborhoods,
which include techniques such as Dynasearch [54.32,
44] and cyclic exchange neighborhoods [54.45, 46].
As a result of these research efforts, current state-of-the-art
methods for a variety of combinatorial problems
such as the TSP [54.47] or the single machine
total weighted tardiness problem [54.48] rely on iterative
improvement algorithms based on very large scale
neighborhoods.


54.3 Simple SLS Methods


Iterative improvement algorithms accept only improving
neighbors as new current candidate solutions, and
they terminate when encountering a local optimum. To
allow the search process to progress beyond local optima,
many SLS methods permit moves to worsening
neighbors. We refer to the methods discussed in the
following as simple SLS methods, because they essentially
use only one type of search step, in a single, fixed
neighborhood relation.


54.3.1 Randomized Iterative Improvement


The key idea behind randomized iterative improvement
(RII) is to occasionally perform moves to random
neighboring candidate solutions irrespective of their
evaluation function value. The simplest way of implementing
this idea is to apply, with a given probability
$w_p$, a so-called uninformed random walk step, which
chooses a neighbor of the current candidate solution
uniformly at random, while with probability $1 - w_p$, an
improvement step is performed. Often, the improvement
step will correspond to one iteration of a best
improvement procedure. The parameter $w_p$ is referred
to as walk probability or, simply, noise parameter. RII
algorithms have the property that they can perform arbitrarily
long sequences of random walk steps; the length
of these sequences (i.e., the number of consecutive
random walk steps) follows a geometric distribution


with parameter $w_p$. This allows effective escapes from
local optima and renders RII probabilistically approx- 
imately complete [54.1, Sect. 4.1]. A main advantage 
of RII is ease of implementation — often, only a few 
additional lines of code are required to extend an it- 
erative improvement procedure to an RII procedure — 
and its behavior is effectively controlled by a single 
parameter. 
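
To illustrate how little code is needed, the following sketch extends a best-improvement step to RII. It is a minimal example under stated assumptions: the names `rii_step` and `rii`, the fixed step budget used as termination criterion, and the tracking of the best solution seen so far are choices made here for illustration.

```python
import random


def rii_step(s, neighbors, g, wp, rng):
    """One RII step: random walk with probability wp, else best improvement."""
    candidates = list(neighbors(s))
    if rng.random() < wp:
        return rng.choice(candidates)    # uninformed random walk step
    best = min(candidates, key=g)
    return best if g(best) < g(s) else s  # improvement step (stay put at a local optimum)


def rii(s, neighbors, g, wp, steps, seed=0):
    """Run RII for a fixed number of steps (assumed termination criterion)."""
    rng = random.Random(seed)
    best = s
    for _ in range(steps):
        s = rii_step(s, neighbors, g, wp, rng)
        if g(s) < g(best):
            best = s                      # remember the incumbent solution
    return best
```

With `wp = 0` the procedure reduces to iterative best improvement, and with `wp = 1` it performs a pure uninformed random walk.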

RII algorithms have been shown to perform quite 
well in a number of applications. For example, in the 
1990s, minor variations of RII, in which random walk 
steps are determined based on the status of constraint 
violations rather than chosen uniformly at random, have 
been state of the art for solving the SAT [54.49, 50] and 
other constraint satisfaction problems [54.51]. Due to 
their simplicity, RII algorithms also facilitate theoreti- 
cal analyses, including characterization of performance 
in dependence of parameter settings [54.52]. 


54.3.2 Probabilistic Iterative Improvement 


Instead of accepting worsening search steps regardless 
of the amount of deterioration in evaluation function 
value they caused (as is the case for random walk 
steps), it may be preferable to have the probability 
of acceptance depend on the change of the evaluation 
function value incurred. This is the key idea underlying 
probabilistic iterative improvement (PII). Unlike RII, 


EHS | J Hed 


1092 


EHS | 3 Hed 


Part E 


Evolutionary Computation 


each step of PII involves two phases: first, a neighbor- 
ing candidate solution s’ € N(s) is selected uniformly 
at random (proposal mechanism); then, a probabilis- 
tic decision is made whether to accept s’ as the new 
search position (acceptance test). For minimization 
problems, the acceptance probability is often based on 
the Metropolis condition and defined as 


\[
P_{\text{accept}}(T, s, s') :=
\begin{cases}
1 & \text{if } g(s') \le g(s) \\
\exp\left(\dfrac{g(s) - g(s')}{T}\right) & \text{otherwise,}
\end{cases}
\tag{54.2}
\]


where $P_{\text{accept}}(T, s, s')$ is the acceptance probability, $g$
is the evaluation function to be minimized, and T is 
a parameter that influences the probability of accept- 
ing a worsening search step. PII is closely related to 
simulated annealing (SA), discussed next; in fact, when 
using the acceptance mechanism given above, PII is 
equivalent to constant-temperature SA. In light of this 
connection, parameter T is also called temperature. For 
various applications, such PII procedures have been 
shown to perform quite well, provided that T is cho- 
sen carefully [54.53, 54]. It is worth noting that in the 
limit for $T \to 0$, PII effectively turns into an iterative
improvement procedure (i.e., never accepts worsening
steps), while for $T \to \infty$, it performs a uniform random
walk. 
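
The Metropolis condition (54.2) and the two-phase PII step translate directly into code. The sketch below assumes $T > 0$; the function names and the `rng` argument are illustrative.

```python
import math
import random


def p_accept(T, g_s, g_new):
    """Metropolis acceptance probability (54.2) for minimization; assumes T > 0."""
    if g_new <= g_s:
        return 1.0
    return math.exp((g_s - g_new) / T)


def pii_step(s, neighbors, g, T, rng):
    """One PII step: uniform random proposal followed by the acceptance test."""
    proposal = rng.choice(list(neighbors(s)))            # proposal mechanism
    if rng.random() < p_accept(T, g(s), g(proposal)):    # acceptance test
        return proposal
    return s
```

For a worsening step with $g(s') - g(s) = 2$, the acceptance probability is $e^{-1} \approx 0.37$ at $T = 2$; it approaches 0 for very small $T$ and 1 for very large $T$, matching the two limiting cases described above.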


54.3.3 Simulated Annealing 


Simulated annealing (SA) [54.4,5] is similar to PII, 
except that the parameter T is modified at run time. 
Following the analogy of the physical annealing of 
solid materials (e.g., metals and glass), which inspired 
SA, the temperature T is initially set to some high 
value and then gradually decreased. At the beginning 
of the search process, high temperature values result 
in relatively high probabilities of accepting worsening 
candidate solutions. As the temperature is decreased, 
the search process becomes increasingly greedy; for 
very low settings of the temperature, almost only im- 
proving neighbors or neighbors with evaluation func- 
tion value equal to the current candidate solution are 
accepted. 

Standard SA algorithms iterate over the same two-stage
process as PII, typically using uniform sampling
(with or without replacement) from the neighborhood
as a proposal mechanism and a parameterized
acceptance test based on the Metropolis condition
(54.2) [54.4, 5]. The modification of temperature $T$
is managed by a so-called annealing (or cooling) sched- 
ule, which is a function that determines the temperature 
value at each search step. One of the most common 
choices is a geometric cooling schedule, defined by an 
initial temperature, $T_0$, a parameter $\alpha$ between 0 and
1, and a value $k$, called the temperature length, which
defines the number of candidate solutions that are proposed
at each fixed value of the temperature; every $k$
steps, the temperature is updated as $T := \alpha \cdot T$.
Important parameters of SA are often determined based on
characteristics of the problem instance to be solved. 
For example, the initial temperature may be based on 
statistics derived from an initial, short random walk, 
the temperature length may be set to a multiple of 
the neighborhood size, and the search process may 
be terminated when the frequency with which pro- 
posed search steps are accepted falls below a given 
threshold. 
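
A geometric cooling schedule and the resulting SA loop can be sketched as follows. This is a simplified illustration: stopping once the temperature drops below a threshold `T_min` is an assumption made here for brevity (the acceptance-frequency criterion mentioned above would be an alternative), and all names are hypothetical.

```python
import math
import random


def temperature_at(step, T0, alpha, k):
    """Temperature of a geometric schedule after `step` proposals."""
    return T0 * alpha ** (step // k)


def simulated_annealing(s, neighbors, g, T0, alpha, k, T_min=1e-3, seed=0):
    """SA with geometric cooling; T_min is an assumed termination threshold."""
    rng = random.Random(seed)
    T, best = T0, s
    while T > T_min:
        for _ in range(k):                        # temperature length
            t = rng.choice(list(neighbors(s)))    # uniform proposal
            delta = g(t) - g(s)
            # Metropolis condition (54.2)
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                s = t
                if g(s) < g(best):
                    best = s
        T *= alpha                                # T := alpha * T every k steps
    return best
```

With $T_0 = 10$, $\alpha = 0.5$ and $k = 3$, the temperature is 10 for proposals 0 to 2, 5 for proposals 3 to 5, and 2.5 for proposals 6 to 8.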

SA is one of the oldest and most studied SLS 
methods. It has been applied to a very broad range of 
computational problems, and many types of annealing 
schedules, proposal mechanisms, and acceptance tests 
have been investigated. SA has also been subject to 
a substantial amount of theoretical analysis, which has 
yielded various convergence results. For more details 
on SA, we refer to [54.55, 56]. 


54.3.4 Tabu Search 


Tabu search (TS) differs significantly from the previ- 
ously discussed SLS methods, in that it makes a direct 
and systematic use of memory to direct the search pro- 
cess [54.25]. In its most basic form, which is also called 
simple tabu search, TS expands an iterative improve- 
ment procedure with a short-term memory to prevent 
the local search process from returning to recently vis- 
ited search positions. Instead of memorizing complete 
candidate solutions and forbidding these explicitly, TS 
usually associates a tabu status with specific solution 
components. In the latter case, TS stores for each so- 
lution component the time (i.e., the iteration number) 
at which it was last modified. Each solution component
is then considered as potentially tabu if the difference
between the current iteration number and the stored
iteration number is smaller than the value of a parameter
called tabu tenure (or tabu list length). The tabu status
of a local search step is then determined based on spe- 
cific tabu criteria, which are a function of the tabu status 
of solution components that are affected by it. One effect
is that once a search step has been performed, it is
tabu in that it cannot be reversed for a certain number 
of iterations. 

Seen from a neighborhood perspective, TS dynam- 
ically restricts the set of neighbors permissible at each 
local search step by excluding neighbors that are cur- 
rently tabu. Since the tabu mechanism through prohi- 
bition of solution components is quite restrictive, many 
simple TS algorithms use an aspiration criterion, which 
overrides the tabu status of neighbors if specific condi- 
tions are satisfied; for example, if a local search step 
leads to a new best solution, aspiration allows it to be 
accepted regardless of its tabu status. 

As an example, consider a simple TS algorithm for 
the TSP, based on the 2-exchange neighborhood. Edges 
that are removed (or introduced) by a 2-exchange step 
may then not be reintroduced into (or removed from) 
the current tour for tt search steps, where tt is the tabu 
tenure. 
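
The iteration-stamp bookkeeping and the aspiration criterion can be illustrated on a simpler problem than the TSP. The sketch below assumes bit-vector candidate solutions with a one-flip neighborhood, so that each bit position is a solution component; the function name, this problem setting, and the convention that a flipped bit stays tabu for exactly `tt` subsequent iterations are assumptions for illustration.

```python
def tabu_search(x, g, tt, steps):
    """Simple tabu search on bit vectors (one-flip neighborhood, minimization)."""
    x = list(x)
    last_changed = [-(tt + 1)] * len(x)  # iteration at which each bit was last flipped
    best, best_val = x[:], g(x)
    for it in range(steps):
        move, move_val = None, None
        for i in range(len(x)):
            x[i] ^= 1                    # tentatively flip bit i
            val = g(x)
            x[i] ^= 1                    # undo the flip
            is_tabu = it - last_changed[i] <= tt
            if is_tabu and not val < best_val:
                continue                 # tabu, and aspiration does not apply
            if move is None or val < move_val:
                move, move_val = i, val
        if move is None:
            continue                     # every neighbor is tabu in this iteration
        x[move] ^= 1                     # best admissible neighbor, even if worsening
        last_changed[move] = it
        if move_val < best_val:
            best, best_val = x[:], move_val
    return best, best_val
```

On a deceptive four-bit function that equals the number of ones except for the all-ones vector (value -1), a tenure of 2 lets the search climb out of the local optimum at the all-zeros vector, whereas a tenure of 0 makes it cycle between two solutions.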

For several problems, even simple TS algorithms 
have been shown to perform quite well. However, the 
performance of TS strongly depends on the tabu tenure 
setting. To avoid the difficulty of finding fixed settings 
suitable for a given problem, mechanisms such as re- 
active tabu search [54.57] have been devised to adapt 
the tabu tenure at run time. Simple TS algorithms can 
be improved in many different ways. In particular, var- 
ious mechanisms have been developed that make use 
of intermediate-term and long-term memory to further 
enhance the performance of simple TS. For a detailed 
description of such techniques, which aim either at in- 
tensifying the search in specific areas of the search 
space or at diversifying the search to explore unvisited 
search space regions, we refer to the book by Glover 
and Laguna [54.25]. 


54.3.5 Dynamic Local Search 


In contrast to the simple SLS methods discussed so far, 
dynamic local search (DLS) does not accept worsening 
search steps, but rather modifies the evaluation func- 
tion during the search in order to escape from local 
optima. These modifications of the evaluation func- 
tion g are commonly triggered whenever the underlying 
local search algorithm, typically an iterative improve- 
ment procedure, has reached a locally optimal solution 
with respect to g’, the current evaluation function. Next, 
the evaluation function is modified and the subsidiary 
local search algorithm is run until a local optimum 
(with respect to the new g’) is encountered. These lo- 
cal search phases and evaluation function updates are 


iterated until some termination criterion is met (see Al- 
gorithm 54.1). 


Algorithm 54.1 High-level outline of dynamic local search
Dynamic local search (DLS):
    determine initial candidate solution s
    initialize penalties
    while termination criterion is not satisfied do
        compute modified evaluation function g’
            from g and penalties
        perform subsidiary local search on s using g’
        update penalties based on s
    end while


The modified evaluation function g’ is typically 
computed as the sum of the original evaluation function 
and penalties associated with each solution component, 
that is 


\[
g'(s) := g(s) + \sum_{i \in SC(s)} \mathrm{penalty}(i) , \tag{54.3}
\]


where g is the original evaluation function, SC(s) is 
the set of solution components of candidate solution s, 
and penalty(i) is the penalty of solution component i. 
Initially, all penalties are set to zero. Variants of DLS 
differ in the details of their penalty update mecha- 
nism (e.g., additive vs. multiplicative updates, occa- 
sional reduction of penalties) and the choice of the 
solution components whose penalties are adjusted. For 
example, guided local search [54.58, 59] uses the fol- 
lowing mechanism for choosing the solution compo- 
nents whose penalties are increased: First, a utility 
value $u(i) := g_i(s)/(1 + \mathrm{penalty}(i))$ is computed for
each solution component $i$, where $g_i(s)$ measures the
impact of i on the evaluation function; then, the penal- 
ties of solution components with maximal utility are 
increased. 
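
The modified evaluation function (54.3) and the utility-based penalty update can be sketched as follows. The dictionary-based representation and the function names are illustrative, and incrementing the selected penalties by exactly 1 is an assumption made here (the text above only states that they are increased).

```python
def penalized_value(g_s, components, penalty):
    """g'(s) = g(s) + sum of penalties of the solution components of s (54.3)."""
    return g_s + sum(penalty[i] for i in components)


def gls_update(components, cost, penalty):
    """Increase the penalties of the components of s with maximal utility.

    cost[i] plays the role of g_i(s); the increment of 1 is an assumption.
    """
    utility = {i: cost[i] / (1 + penalty[i]) for i in components}
    u_max = max(utility.values())
    for i, u in utility.items():
        if u == u_max:
            penalty[i] += 1
```

Because the utility divides by `1 + penalty(i)`, a component that has already been penalized often becomes less likely to be selected again, which spreads the penalties over different components.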

DLS algorithms are sometimes referred to as a soft 
form of tabu search, since solution components are not 
strictly forbidden, but the effect of the penalties resem- 
bles a soft prohibition. There are also conceptual links 
to Lagrangian methods [54.60,61]. DLS algorithms 
have been shown to reach state-of-the-art performance 
for SAT [54.62] and for the maximum clique prob- 
lem [54.63]. 


54.4 Hybrid SLS Methods 


The performance of basic SLS techniques can often 
be improved by combining them with each other. In 
fact, even RII can be seen as a combination of iterative 
improvement and random walk, using the same neigh- 
borhood. Several other SLS methods combine different 
types of search steps, and in the following, we briefly 
discuss some prominent examples. 


54.4.1 Greedy Randomized 
Adaptive Search Procedures 


As mentioned previously, construction heuristics can
be easily and effectively combined with perturbative
local search procedures. While greedy construction
heuristics generally generate only one or very few
different candidate solutions, randomization of the construction
process makes it possible to generate many
different high-quality solutions. The idea underlying 
GRASP [54.10, 64] is to combine randomized greedy 
construction with a subsequent perturbative local search 
phase, whose goal is to improve the candidate so- 
lutions produced by the construction heuristic. The 
two phases of solution construction and perturbative 
local search are repeated until a termination crite- 
rion, e.g., maximum computation time, is met. The 
term adaptive in GRASP refers to the fact that the 
hybrid search process typically uses an adaptive con- 
struction heuristic. Randomization in GRASP is real- 
ized based on the concept of a restricted candidate 
list, which contains the best-scoring solution compo- 
nents according to the given heuristic function. In 
the simplest and most common GRASP variants, el- 
ements are chosen uniformly at random from this 
restricted candidate list during the construction pro- 
cess. For a detailed description, various extensions, 
and an overview of applications of GRASP, we refer 
to [54.64]. 
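
As an illustration, the sketch below applies the restricted candidate list idea to the TSP with a nearest-neighbor-style heuristic function. The problem choice, the fixed list size `rcl_size`, the fixed iteration budget, and all function names are assumptions made for this example; `local_search` is a hypothetical callable supplied by the user.

```python
import random


def tour_length(tour, dist):
    """Length of the closed tour under the distance matrix dist."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def randomized_greedy_tour(dist, rcl_size, rng, start=0):
    """Build a tour; each step picks uniformly from the rcl_size closest vertices."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        # adaptive heuristic: scores depend on the current partial tour
        ranked = sorted(unvisited, key=lambda j: dist[last][j])
        nxt = rng.choice(ranked[:rcl_size])   # restricted candidate list
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def grasp(dist, rcl_size, iterations, local_search, seed=0):
    """Repeat randomized construction plus local search; keep the best tour."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        s = local_search(randomized_greedy_tour(dist, rcl_size, rng), dist)
        if best is None or tour_length(s, dist) < tour_length(best, dist):
            best = s
    return best
```

Setting `rcl_size = 1` recovers the deterministic nearest neighbor heuristic, while larger values trade greediness for diversity of the constructed solutions.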


54.4.2 Iterated Greedy Algorithms 


A disadvantage of GRASP is that new candidate 
solutions are constructed from scratch and indepen- 
dently of previously found solutions. Iterated greedy 
(IG) algorithms iteratively apply greedy construction 
heuristics to generate a chain of high-quality candi- 
date solutions. The central idea is to alternate be- 
tween solution construction and destruction phases, 
and thus to combine at least two different types of 
search steps. IG algorithms first build an initial, com- 


plete candidate solution s. Then, they iterate over 
the following phases, until a termination criterion is 
met: 


1. Starting from the current candidate solution, s, a de- 
struction phase is executed, during which some 
solution components are removed from s, result- 
ing in a partial candidate solution s’. The solution 
components that are removed in this phase may be 
chosen at random or, for example, based on their 
impact on the evaluation function. 

2. Starting from s’, a construction heuristic is used to 
generate another candidate solution, s”. This con- 
struction heuristic may differ from the one used to 
generate the initial candidate solution. 

3. Based on an acceptance criterion, a decision is
made whether to continue the search from s or s”.

Additionally, it is often useful to further improve
complete candidate solutions by means of
a subsidiary perturbative local search procedure
(see Algorithm 54.2 for a high-level outline of
IG).


Algorithm 54.2 High-level outline of an iterated greedy (IG) algorithm
Iterated greedy (IG):
    construct initial candidate solution s
    perform subsidiary local search on s
    while termination criterion is not satisfied do
        apply destruction to s, resulting in s’
        apply constructive heuristic starting from s’,
            resulting in s”
        perform subsidiary local search on s” (optional)
        based on acceptance criterion, keep s or
            accept s := s”
    end while


The principle underlying IG methods has been 
rediscovered several times, and consequently, can 
be found under various names, including ruin-and- 
recreate [54.65], iterative flattening [54.66], and it- 
erative construction heuristic [54.67]; it has also 
been used in the context of SA [54.68]. IG al- 
gorithms, especially when combined with perturba- 
tive local search methods, have reached state-of-the- 
art performance for a number of problems, includ- 
ing several variants of flowshop scheduling [54.69, 
70]. 
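
For sequencing problems such as flowshop scheduling, the destruction and construction phases are often realized by removing a few elements from the sequence and greedily reinserting them. The following sketch shows this pattern under stated assumptions: candidate solutions are lists, `g` can evaluate partial sequences, destruction removes `d` random elements, the acceptance criterion keeps the new solution if it is not worse, and the optional local search is omitted. All names are illustrative.

```python
import random


def iterated_greedy(s, g, d, iterations, seed=0):
    """Iterated greedy for sequencing problems (sketch); g is minimized."""
    rng = random.Random(seed)
    s = list(s)
    best = s[:]
    for _ in range(iterations):
        # destruction: remove d randomly chosen elements, giving partial solution s'
        partial = s[:]
        removed = [partial.pop(rng.randrange(len(partial))) for _ in range(d)]
        # construction: reinsert each removed element at its best position, giving s''
        for c in removed:
            pos = min(range(len(partial) + 1),
                      key=lambda p: g(partial[:p] + [c] + partial[p:]))
            partial.insert(pos, c)
        # acceptance criterion (assumed here): accept s'' if it is not worse than s
        if g(partial) <= g(s):
            s = partial
        if g(s) < g(best):
            best = s[:]
    return best
```

Other acceptance criteria, for example a Metropolis-type test with a fixed temperature, can be substituted without changing the rest of the loop.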


54.4.3 Iterated Local Search 


Iterated local search (ILS) generates a sequence of so- 
lutions by alternating applications of a perturbation 
mechanism and of a subsidiary local search algorithm. 
Consequently, ILS can be seen as a hybrid between the 
search methods underlying the local search and pertur- 
bation phases. 

An ILS algorithm is specified by four main compo- 
nents. The first is the mechanism used for generating 
an initial solution, for example, a greedy constructive 
heuristic. The second is a subsidiary (perturbative) local 
search procedure; typically, this is an iterative improve- 
ment algorithm, but often, other simple SLS methods 
are used. The third component is a perturbation proce- 
dure that introduces a modification to a given candidate 
solution. These perturbations should be complementary 
to the modifications introduced by the subsidiary local 
search procedure; in particular, the effect of the pertur- 
bation procedure should not be easily reversible by the 
local search procedure. The fourth component is an ac- 
ceptance criterion, which is used to decide whether to 
accept the outcome of the latest perturbation and local 
search phase. 

ILS starts by generating an initial candidate solu- 
tion, to which then subsidiary local search is applied. It 
then iterates over the following phases, until a termina- 
tion criterion is met: 


1. Perturbation is applied to the current candidate 
solution s, to obtain an intermediate candidate so- 
lution s’. 

2. Subsidiary local search is applied to s’. 

3. Based on the acceptance criterion, a decision is 
made whether to continue the search from s or s’ 
(see Algorithm 54.3 for a high-level outline of 
ILS). 
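
The phases above, together with the initial local search, can be sketched generically. In this minimal example, `local_search` and `perturb` are user-supplied callables (hypothetical names), the termination criterion is a fixed iteration budget, and the acceptance criterion accepts the new local optimum if it is not worse; all of these are assumptions for illustration.

```python
import random


def iterated_local_search(s, local_search, perturb, g, iterations, seed=0):
    """Iterated local search (sketch); g is minimized."""
    rng = random.Random(seed)
    s = local_search(s)
    best = s
    for _ in range(iterations):
        t = local_search(perturb(s, rng))  # perturbation, then subsidiary local search
        if g(t) <= g(s):                   # acceptance criterion (assumed: not worse)
            s = t
        if g(s) < g(best):
            best = s
    return best
```

Replacing the acceptance test by `True` would accept every new local optimum, which increases diversification at the expense of intensification.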


Often, the subsidiary search is based on iterative improvement
and ends in a local optimum; ILS can therefore
be seen as performing a biased random walk in the
space of local optima produced by the given subsidiary
local search procedure. The acceptance criterion (together
with the strength of the perturbation mechanism)
then determines the degree of search intensification: if
only improving candidate solutions are accepted, ILS
performs a randomized first-improvement search in the
space of local optima; if any new local optimum is accepted,
ILS performs a random walk in the space of
local optima.


Algorithm 54.3 High-level outline of iterated local search
Iterated local search (ILS):
    generate initial candidate solution s
    perform subsidiary local search on s
    while termination criterion is not satisfied do
        apply perturbation to s, resulting in s’
        perform subsidiary local search on s’
        based on acceptance criterion, keep s
            or accept s := s’
    end while


An attractive feature of ILS is that basic versions
can be quickly and easily implemented, especially if
a simple SLS algorithm or an iterative improvement
procedure is already available. Using some additional
refinements, ILS methods define the current state of
the art for solving many combinatorial problems, including
the TSP [54.71]. Similar to IG, ILS is based
on an idea that has been rediscovered several times
and is known under various names, including large-step
Markov chains [54.29] and chained local optimization
[54.72]. There is also a close conceptual connection
with several variants of variable neighborhood search
(VNS) [54.35]; in fact, the so-called basic VNS and
skewed VNS algorithms can be seen as variants of
ILS that adapt the perturbation strength at run time.
For more details on iterated local search, we refer
to [54.73].


54.5 Population-Based SLS Methods


The use of a population of candidate solutions offers
a convenient way to increase diversification in SLS.
For example, population-based extensions of ILS algorithms
have been proposed with this aim in mind [54.74,
75]. A further potential benefit comes from the inherent
parallelizability of most population-based SLS


methods, although the parallelization thus achieved is 
not necessarily more effective than the simple and 
generic approach of performing multiple independent 
runs of an SLS algorithm in parallel (see also [54.1], 
Sect. 4.4). As previously remarked, population-based 
methods can be cast into the SLS framework described 


1095 


G'HS |3 Hed 


1096 PartE 


Evolutionary Computation 


S°4S |3 Hed 


in Sect. 54.1 by defining search positions to consist of 
sets of candidate solutions and by using neighborhood 
relations, initialization, and step functions that operate 
on such populations. 

Unfortunately, the benefits derived from the use of 
populations come at the cost of increased complex- 
ity, in terms of implementation effort, and parameters 
that need to be set appropriately. In what follows, 
we describe two of the most prominent population- 
based methods, one based on a constructive search 
paradigm (ant colony optimization), and the other based 
on a perturbative search paradigm (evolutionary algo- 
rithms). 


54.5.1 Ant Colony Optimization 


Ant colony optimization (ACO) algorithms were originally
inspired by the trail-following behavior of
real ant species, which allows them to find shortest 
paths [54.76,77]. This biological phenomenon gave 
rise to a surprisingly effective algorithm for combina- 
torial optimization [54.18, 19]. In ACO, the artificial 
ants perform a randomized constructive search that is 
biased by (artificial) pheromone trails and heuristic in- 
formation derived from the given problem instance. The 
pheromone trails are numerical values associated with 
solution components that are adapted at run time to 
reflect experience gleaned from the search process so 
far. 

During solution construction, at each step every 
ant chooses a solution component, probabilistically
preferring those with high pheromone trail and heuristic
information values. For illustration, consider the
TSP, the first problem to which ACO was
applied [54.18]. Each edge $(i,j)$ has an associated
pheromone value $\tau_{ij}$ and a heuristic value $\eta_{ij}$, which for
the TSP is typically defined as $1/w(i,j)$, that is, the inverse
of the edge weight. In ant system [54.19], the first
ACO algorithm for the TSP, an ant located at vertex 
i would add vertex j to its current partial tour s’ with 
probability 


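\[
p_{ij} = \frac{\tau_{ij}^{\alpha} \cdot \eta_{ij}^{\beta}}
              {\sum_{l \in N(i)} \tau_{il}^{\alpha} \cdot \eta_{il}^{\beta}} , \tag{54.4}
\]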


where N(i) is the feasible neighborhood of vertex i, i. e., 
the set of all vertices that have not yet been visited in s’, 
and $\alpha$ and $\beta$ are parameters that control the relative importance
of pheromone trails and heuristic information,
respectively. Note that the tour construction procedure 


implemented by the artificial ants is a randomized ver- 
sion of the nearest neighbor construction heuristic. In 
fact, randomizing a greedy construction heuristic based 
on pheromone trails associated with the decisions to 
be made would generally be a good initial step to- 
ward an effective ACO algorithm for a combinatorial 
problem. 
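
The selection rule (54.4) can be computed directly from the pheromone and heuristic values. In the sketch below, `tau` and `eta` are assumed to be nested mappings (or matrices) indexed by vertex, and `unvisited` plays the role of the feasible neighborhood $N(i)$; these representations and the function name are illustrative.

```python
def transition_probabilities(i, unvisited, tau, eta, alpha, beta):
    """Probability of moving from vertex i to each j in N(i), following (54.4)."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}
```

For example, with uniform pheromone values, $\alpha = \beta = 1$, and edge weights 1, 2 and 4 from the current vertex, the three candidate vertices are chosen with probabilities 4/7, 2/7 and 1/7, so shorter edges are preferred but longer ones remain possible.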

Once every ant has constructed a complete can- 
didate solution, it is typically highly advantageous to 
apply an iterative improvement procedure or a sim- 
ple SLS algorithm [54.20,78]. Next, the pheromone 
trail values are updated by means of two counteracting 
mechanisms. The first models pheromone evaporation 
and decreases some or all pheromone trail values by 
a constant factor. The second models pheromone de- 
posit and increases the pheromone trail levels of solu- 
tion components that have been used by one or more 
ants. The amount of pheromone deposited typically de- 
pends on the quality of the respective solutions. In the 
best performing ACO algorithms, only some of the ants 
with the highest quality solutions are allowed to deposit 
pheromone. The overall result of the pheromone update 
is an increased probability of choosing solution com- 
ponents in subsequent solution constructions that have 
previously been found to occur in high-quality solu- 
tions. ACO algorithms then cycle through these phases 
of solution construction, application of local search, 
and pheromone update until some termination criterion 
is met (see Algorithm 54.4 for a high-level outline of 
ACO). 
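
The two counteracting update mechanisms can be sketched for the symmetric TSP as follows. The evaporation rate `rho`, the deposit amount of 1/L for a tour of length L, and the function name are assumptions chosen for illustration; the text above only requires that the deposit depends on solution quality.

```python
def update_pheromones(tau, tours, lengths, rho):
    """Evaporate all trails by factor (1 - rho), then deposit on the given tours."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)                  # pheromone evaporation
    for tour, length in zip(tours, lengths):
        for a, b in zip(tour, tour[1:] + tour[:1]):   # edges of the closed tour
            tau[a][b] += 1.0 / length                 # pheromone deposit (assumed 1/L)
            tau[b][a] += 1.0 / length                 # symmetric instance
```

Passing only the best tours of an iteration in `tours` corresponds to the restriction, mentioned above, that only some of the ants with the highest quality solutions deposit pheromone.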


Algorithm 54.4 High-level outline of ant colony optimization
Ant colony optimization (ACO):
    initialize pheromone trails
    while termination criterion is not satisfied do
        generate population sp of candidate solutions
            using subsidiary randomized
            constructive search
        perform subsidiary local search on sp
        update pheromone trails
    end while


Many different variants of ACO algorithms have 
been studied. Along with many additional details on 
ACO, these are described in the book by Dorigo and 
Stützle [54.20]; for more recent surveys, we refer the
reader to [54.79, 80]. The ACO metaheuristic [54.81, 
82] provides a general framework for these variants 
and a generic view of how to apply ACO algorithms. 
ACO is also one of the most successful algorithmic 


Stochastic Local Search Algorithms: An Overview 


54.6 Recent Research Directions 


techniques within the broader field of swarm intelli- 
gence [54.83]. 


54.5.2 Evolutionary Algorithms 


Evolutionary algorithms (EAs) are a prominent class of 
population-based SLS methods that are loosely inspired 
by concepts from biological evolution. Unlike ACO al- 
gorithms, EAs work with a population of complete can- 
didate solutions. The initial set of candidate solutions 
is typically created randomly, but greedy construction 
heuristics may also be used to seed the population. This 
population then undergoes an artificial evolution, where 
at each iteration, the population of candidate solutions 
is modified by means of mutation, recombination and 
selection. 

Mutation operators typically introduce small, ran- 
dom perturbations into individual candidate solutions. 
The strength of these perturbations is usually controlled 
by a parameter called mutation rate; alternatively, a spe- 
cific, fixed perturbation, akin to a random walk step 
in RII, may be performed. Recombination operators 
generate one or more new candidate solutions by com- 
bining information from two or more parent candidate 
solutions. The most common type of recombination 
is crossover, inspired by the homonymous mechanism 
in biological evolution; it generates offspring by as- 
sembling partial candidate solutions from linear repre- 
sentations of two parents. In addition to mutation and 
recombination, selection mechanisms are used to deter- 
mine the candidate solutions that will undergo mutation 
and recombination, as well as those that will form the 
population used in the next iteration of the evolutionary 
process. Selection is based on the fitness, i.e., evalu- 
ation function values, of the candidate solutions, such 
that better candidate solutions have a higher probability 
to be selected. 
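
The three operator types can be illustrated on bit-string candidate solutions. The sketch below makes several assumptions that are not prescribed by the text: bit-flip mutation with a per-bit mutation rate, one-point crossover, binary tournament selection of parents, and truncation selection of survivors, all for a minimization problem. The function names are illustrative.

```python
import random


def mutate(x, rate, rng):
    """Flip each bit independently with probability `rate` (the mutation rate)."""
    return [b ^ 1 if rng.random() < rate else b for b in x]


def one_point_crossover(a, b, rng):
    """Combine a prefix of parent a with the matching suffix of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def tournament_select(pop, g, rng, size=2):
    """Return the best of `size` randomly drawn individuals (g is minimized)."""
    return min(rng.sample(pop, size), key=g)


def evolve(pop, g, generations, rate, seed=0):
    """Simple generational loop: recombination, mutation, then selection."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = []
        for _ in range(len(pop)):
            child = one_point_crossover(tournament_select(pop, g, rng),
                                        tournament_select(pop, g, rng), rng)
            offspring.append(mutate(child, rate, rng))
        # survivor selection: keep the best individuals among parents and offspring
        pop = sorted(pop + offspring, key=g)[:len(pop)]
    return pop
```

Because survivors are drawn from parents and offspring together, the best evaluation function value in the population never deteriorates from one generation to the next.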

Details of the mutation, recombination and selec- 
tion mechanisms all have a strong impact on the per- 
formance of an EA. Generally, the use of problem 
specific knowledge within these mechanisms leads to 
better performance. In fact, much research in EAs has 
been devoted to the design of effective mutation and
recombination operators; a good example for this is
the TSP [54.84, 85]. To achieve cutting-edge performance
in an EA, it is often useful to improve at least
the best candidate solutions in a given population by
means of a perturbative local search method, such as
iterative improvement. The resulting hybrid algorithms,
which are also known as memetic algorithms
(MA) [54.86], are enjoying increasing popularity as
a broadly applicable method for solving combinatorial
problems (see Algorithm 54.5 for a high-level
outline of an MA).


Algorithm 54.5 High-level outline of a memetic algorithm
Memetic algorithm (MA):
    initialize population p
    perform subsidiary local search on each
        candidate solution in p
    while termination criterion is not satisfied do
        generate set pr of candidate solutions
            through recombination
        perform subsidiary local search on each
            candidate solution of pr
        generate set pm of candidate solutions
            from p U pr through mutation
        perform subsidiary local search on each
            candidate solution of pm
        select new population p from candidate
            solutions in p U pr U pm
    end while


Several other techniques are conceptually related to
evolutionary algorithms but have different roots. Scatter
search and path relinking are SLS methods whose
origins can be traced back to the mid-1970s [54.16].
Scatter search can be seen as a memetic algorithm that
uses special types of recombination and selection mechanisms.
Path relinking corresponds to a specific form of
interpolation between two (or possibly more) candidate
solutions and is thus conceptually related to recombination
operators. Both methods have recently become
increasingly popular; details can be found in [54.17,
87].


54.6 Recent Research Directions


In this section, we concisely discuss three research
directions that we regard as particularly timely and
promising: combinations of SLS and systematic search


techniques, SLS algorithm engineering, and automated 
configuration and design of SLS algorithms. For other 
topics of interests, such as SLS algorithms for mul- 


1097 


9°nS | J Hed 


1098 PartE 


Evolutionary Computation 


9°nS | J Hed 


tiobjective [54.88—90], stochastic [54.91] or dynamic 
problems [54.92, 93], we refer to the literature for more 
details. 


54.6.1 Combination of SLS Algorithms 
with Systematic Search Techniques 


Systematic search and SLS are traditionally seen as 
two distinct approaches for solving challenging com- 
binatorial problems. Interestingly, the particular ad- 
vantages and disadvantages of each of these ap- 
proaches render them rather complementary. There- 
fore, it is hardly surprising that over the last few 
years, there has been increased interest in the ex- 
ploration and development of hybrid algorithms that 
combine ideas from both paradigms. For example, re- 
lated to the area of mathematical programming, the 
term Matheuristics has recently been coined to refer 
to methods that combine elements from mathematical 
programming techniques (which are primarily based 
on systematic search) and (meta)heuristic search algo- 
rithms [54.94]. 

Hybrids between SLS and systematic search fall 
into two main classes. The first of these consists of ap- 
proaches where the systematic search algorithm plays 
the role of the master process, and an SLS proce- 
dure is used to solve subproblems that arise during the 
systematic search process. Probably the simplest, yet
potentially effective, method is to use an SLS algorithm
to provide an initial high-quality (primal) bound on the 
optimal solution of the problem, which is then used 
by the systematic search algorithm for pruning parts of 
the search tree. Several more elaborate schemes have 
been devised, e.g., in the context of column generation 
and separation routines in integer programming [54.95]. 
Other approaches introduce the spirit of local search 
into integer programming solvers; examples of these in- 
clude local branching [54.96] and relaxation-induced 
neighborhood search [54.97]. We refer to [54.95] for 
a recent overview of such combinations. 
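
The simplest hybrid of the first class, an SLS run that supplies a primal bound for pruning, can be sketched as follows. The sketch assumes a small 0/1 knapsack instance (a maximization problem, so the primal bound is a lower bound), a toy greedy-plus-swap procedure standing in for the SLS algorithm, and a depth-first branch and bound with a fractional relaxation bound. The names `sls_primal_bound` and `branch_and_bound` and the instance data are hypothetical.

```python
import random

def sls_primal_bound(values, weights, cap, steps=500, seed=0):
    """Toy SLS stand-in: greedy construction followed by random swap moves."""
    rng = random.Random(seed)
    n = len(values)
    chosen, load = set(), 0
    for i in sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True):
        if load + weights[i] <= cap:
            chosen.add(i)
            load += weights[i]
    for _ in range(steps):
        if not chosen or len(chosen) == n:
            break
        out = rng.choice(sorted(chosen))
        inn = rng.choice(sorted(set(range(n)) - chosen))
        # Accept a swap only if it stays feasible and increases the value.
        if load - weights[out] + weights[inn] <= cap and values[inn] > values[out]:
            chosen.remove(out)
            chosen.add(inn)
            load += weights[inn] - weights[out]
    return sum(values[i] for i in chosen)

def branch_and_bound(values, weights, cap, incumbent=0):
    """Depth-first branch and bound; returns (best value, nodes expanded)."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best, nodes = incumbent, 0

    def bound(k, value, room):
        # Fractional relaxation: upper bound on any completion from item k on.
        for j in range(k, len(v)):
            if w[j] <= room:
                room -= w[j]
                value += v[j]
            else:
                return value + v[j] * room / w[j]
        return value

    def visit(k, value, room):
        nonlocal best, nodes
        nodes += 1
        best = max(best, value)
        if k == len(v) or bound(k, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        if w[k] <= room:
            visit(k + 1, value + v[k], room - w[k])
        visit(k + 1, value, room)

    visit(0, 0, cap)
    return best, nodes

values = [60, 100, 120, 30, 70, 45, 90, 20]
weights = [10, 20, 30, 5, 15, 9, 25, 4]
cold = branch_and_bound(values, weights, 50)
warm = branch_and_bound(values, weights, 50,
                        incumbent=sls_primal_bound(values, weights, 50))
```

Both calls return the same optimal value; the warm-started search never expands more nodes than the cold one, because a better incumbent can only make the pruning test succeed earlier.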

The second class of hybrid approaches is based on
the idea of using systematic search procedures to deal
with specific tasks arising while running an SLS
algorithm. Very large-scale neighborhood search [54.37],
as discussed in Sect. 54.2, is probably one of the
best-known examples. Elements of tree search methods can
also be exploited within constructive search algorithms,
as exemplified by the use of branch and bound techniques
in ACO algorithms [54.98, 99]. Other examples include
tour merging [54.100] and the usage of information
derived from integer programming formulations of
optimization problems in heuristic methods [54.101].
We refer to [54.102] for a survey of this general
approach. A taxonomy of the possible combinations of
exact and local search algorithms has been introduced
by Jourdan et al. [54.103].

Despite an increasing number of efforts on com- 
bining systematic search methods and SLS methods, as 
reviewed in [54.94], much work remains to be done in 
this direction, especially considering that the two un- 
derlying fundamental search paradigms are developed 
primarily in rather disjoint communities. We believe
that much can be gained by overcoming the traditional
view of these two approaches as competitors and
focusing instead on the synergies that arise from
their complementarity.


54.6.2 SLS Algorithm Engineering 


Despite the impressive successes in SLS research and 
applications — SLS algorithms are now firmly estab- 
lished as the method of choice for tackling a broad 
range of combinatorial problems — there are still sig- 
nificant shortcomings. Perhaps most prominently, there 
is a lack of guidelines and best practices regarding the 
design and development of effective SLS algorithms. 
Current practice is to implement one specific SLS 
method, based on one or more construction heuristics 
or iterative improvement procedures. However, general- 
purpose SLS methods are not fully defined recipes: they 
leave many design choices open, and typically only
specific combinations of these choices will result in
an effective algorithm for a given problem. Even worse,
the underlying basic construction and iterative improve- 
ment procedures have a tremendous influence on the 
final performance of the SLS algorithms built on them, 
and this influence is frequently neglected. 

We firmly believe that a more methodological ap- 
proach needs to be taken toward the design and imple- 
mentation of SLS algorithms. The research direction 
dedicated to developing such an approach is called 
stochastic local search algorithm engineering or, for 
short, SLS engineering; it is conceptually related to 
algorithm engineering [54.104] and software engineer- 
ing [54.105], where similar methodological issues are 
tackled in a different context. Algorithm engineering is 
rather closely related to SLS engineering; it has been 
conceived as an extension to the traditionally more the- 
oretically oriented research on algorithms. Algorithm 
engineering, according to [54.104], deals with the it- 
erative process of designing, analyzing, implementing, 
tuning and experimentally evaluating algorithms. SLS
engineering shares this motivation; however, the
algorithms that are dealt with in the context of SLS
approaches have substantially more complex and un- 
predictable behavior than those typically considered 
in algorithm engineering. There are several reasons 
for this: SLS algorithms are usually used for solv- 
ing NP-hard problems, they allow for many more 
degrees of freedom in the choice of algorithm com- 
ponents, and their stochasticity makes analysis more 
complex. 

From a high-level perspective, an initial approach 
to a successful SLS engineering process would proceed 
in a bottom-up fashion. Starting from knowledge about 
the problem, it would build SLS algorithms by itera- 
tively adding complexity to simple, basic algorithms. 
More concretely, a tentative first attempt at such a pro- 
cess could be as follows: 


1. Study existing knowledge on the problem to be 
solved and its characteristics; 

2. Implement basic and advanced constructive and it- 
erative improvement procedures; 

3. Starting from these, add complexity (for example, 
by moving to simple SLS methods); 

4. Improve performance by gradually adding concepts 
from more complex SLS techniques (for exam- 
ple, perturbations, prohibition mechanisms, popula- 
tions); 

5. Further configure and fine-tune parameters and de- 
sign choices; 

6. If found to be useful: iterate over steps 4-5. 


Obviously, such a process would not necessarily 
strictly follow this outline, but insights gained at later 
stages could prompt revisiting earlier design decisions. 
Several high-performance SLS algorithms have already 
been developed following roughly the process outlined 
above (see [54.106] for an explicit example). 
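
The bottom-up process outlined in steps 2 to 4 can be sketched for a small Euclidean TSP instance. The sketch assumes a nearest-neighbour construction heuristic, 2-exchange iterative improvement and a double-bridge perturbation; these are common choices picked here for illustration only, and the implementation deliberately favours clarity over speed.

```python
import math
import random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))

def nearest_neighbour(pts):
    """Step 2 (construction): always move to the closest unvisited city."""
    tour, left = [0], set(range(1, len(pts)))
    while left:
        nxt = min(left, key=lambda j: math.dist(pts[tour[-1]], pts[j]))
        tour.append(nxt)
        left.remove(nxt)
    return tour

def two_opt(tour, pts):
    """Step 2 (iterative improvement): reverse segments while this shortens the tour."""
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = cand, True
    return tour

def double_bridge(tour, rng):
    """Step 4 (perturbation): reconnect four tour segments in a new order."""
    a, b, c = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:a] + tour[b:c] + tour[a:b] + tour[c:]

def iterated_local_search(pts, iterations=30, seed=0):
    """Steps 3-4: wrap the basic procedures into a simple SLS method."""
    rng = random.Random(seed)
    best = two_opt(nearest_neighbour(pts), pts)
    for _ in range(iterations):
        cand = two_opt(double_bridge(best, rng), pts)
        if tour_length(cand, pts) < tour_length(best, pts):
            best = cand
    return best

gen = random.Random(1)
cities = [(gen.random(), gen.random()) for _ in range(12)]
tour = iterated_local_search(cities)
```

Each layer reuses the one below it unchanged, so the contribution of every added component can be measured separately before moving on to parameter tuning (step 5).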

The SLS engineering process can be supported in
various ways. Algorithm development, implementation
and testing are facilitated by the use of programming
frameworks like Paradiseo [54.107, 108] and EasyLocal++
[54.109, 110], dedicated languages and systems like
COMET [54.111], libraries of data types (such as LEDA
[54.112]), and statistical tools, such as the
comprehensive, open-source R environment [54.113].
We expect that software environments specifically
designed for the automated empirical analysis and design
of algorithms, such as HAL [54.114, 115], will be
especially useful in this context. Tools for the
automatic configuration and tuning of algorithms,
discussed further in the next section, are also of
considerable importance.

Furthermore, we see an improved understanding of 
the relationship between problem and instance features 
on the one side, and the properties and the behavior of 
SLS methods on the other side as key enabling fac- 
tors for advanced SLS engineering approaches. The 
potential insights to be gained are not only of practical 
value to SLS engineering but also of considerable sci- 
entific interest. Progress in this direction is facilitated 
by advanced search space analysis techniques, statis- 
tical methods and machine learning approaches (see, 
e.g., Merz and Freisleben [54.116], Xu et al. [54.117]
and Watson et al. [54.118]). Another promising avenue
for future research involves the integration of
theoretical insights into the design process, for ex- 
ample, by restricting design alternatives or parameter 
choices. 

It is important to note that research toward SLS 
engineering adopts a component-wise view of SLS meth- 
ods. For example, iterated local search (ILS) uses 
perturbations to diversify the search as well as ac- 
ceptance tests (components: perturbations, acceptance 
tests), while evolutionary algorithms prominently in- 
volve the use of a population of solutions (component: 
population of solutions). Each of these components 
can be instantiated in different ways, and various com- 
binations are possible. An effective SLS engineering 
process should provide guidance to the algorithm de- 
signer regarding the choice and configuration of these 
components. It would naturally and incrementally lead 
to combinations of algorithmic components taken from 
different SLS methods (or other paradigms, such as
mathematical programming [54.94]), if these contribute
to desirable performance characteristics of the
algorithm under design. Such an engineering process 
would therefore rather naturally produce hybrid algo- 
rithms that are effective for solving the given computa- 
tional problem. 
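
The component-wise view can be made concrete with a small sketch in which the perturbation and the acceptance test of an iterated local search are passed in as interchangeable functions. The example problem (two-way number partitioning), the data in `NUMBERS` and all function names are assumptions made for illustration.

```python
import random

NUMBERS = [23, 31, 29, 44, 53, 38, 63, 85, 89, 82, 17, 71]

def cost(s):
    """Absolute difference between the sums of the two parts."""
    return abs(sum(x if b else -x for x, b in zip(NUMBERS, s)))

def flip_descent(s):
    """Component: subsidiary local search (first-improvement single flips)."""
    s = list(s)
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            before = cost(s)
            s[i] ^= 1
            if cost(s) < before:
                improved = True
            else:
                s[i] ^= 1  # undo a non-improving flip
    return tuple(s)

def flip_k(k):
    """Component family: perturbation that flips k random positions."""
    def perturb(s, rng):
        s = list(s)
        for i in rng.sample(range(len(s)), k):
            s[i] ^= 1
        return tuple(s)
    return perturb

def accept_if_better(old, new, rng):
    """Component: acceptance test that favours intensification."""
    return new < old

def accept_always(old, new, rng):
    """Component: acceptance test that favours diversification."""
    return True

def ils(init, local_search, perturb, accept, cost, iterations, rng):
    """Generic ILS skeleton; its behavior is fixed only once the
    components are plugged in."""
    current = best = local_search(init)
    for _ in range(iterations):
        candidate = local_search(perturb(current, rng))
        if accept(cost(current), cost(candidate), rng):
            current = candidate
        if cost(current) < cost(best):
            best = current
    return best

start = (0,) * len(NUMBERS)
results = {
    (k, accept.__name__): cost(ils(start, flip_descent, flip_k(k), accept,
                                   cost, 50, random.Random(0)))
    for k in (2, 4)
    for accept in (accept_if_better, accept_always)
}
```

The dictionary `results` holds one final cost per combination of components, which is the kind of comparison an engineering process would use to decide which instantiation to keep.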

Finally, SLS engineering places more emphasis on
decisions concerning the underlying basic SLS techniques
(such as construction heuristics, neighborhoods and
efficient data structures) than on the general-purpose
SLS methods (or metaheuristics) used in a given
algorithm design scenario. In fact, in our experience,
such fundamental choices, together with (i) the level of
expertise of the SLS algorithm developer and implementer,
(ii) the time invested in designing and configuring the
SLS algorithm, and (iii) the creative use of insights
into algorithm behavior and its interaction with problem
characteristics, play a considerably more important role
in the design of effective SLS algorithms than the focus
on specific features prescribed by so-called
metaheuristics.


54.6.3 Automatic Configuration 
of SLS Algorithms 


The performance of algorithms for virtually any com- 
putationally challenging problem (and in particular, for 
any NP-hard problem) depends strongly on appropriate 
settings of algorithm parameters. In many cases, there 
are tens of such parameters; for example, the well-known
commercial CPLEX solver for integer programming 
problems has more than 130 user-specifiable param- 
eters that influence its search behavior. Likewise, the 
behavior of most SLS algorithms is controlled by pa- 
rameters, and many design choices can be exposed in 
the form of parameters. This gives rise to algorithms 
with many categorical and numerical parameters. Cate- 
gorical parameters are used to make choices from a dis- 
crete set of design variants, such as search strategies, 
neighborhoods or perturbation mechanisms. Numeri- 
cal parameters often arise as subordinate parameters 
that directly control the behavior of a search strategy 
(e.g., temperature in SA and tabu tenure in simple tabu 
search). The goal in automated algorithm configura- 
tion is to find settings of these parameters that achieve 
optimized performance w.r.t. a performance metric of 
interest (for example, solution quality or computation 
time). 
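
The configuration task described above can be sketched with a deliberately simple configurator. The sketch assumes a toy SLS algorithm for number partitioning with one categorical parameter (`acceptance`) and one subordinate numerical parameter (`temperature`, used only by the Metropolis variant), and it uses plain random sampling of configurations evaluated on a set of training instances. All names are hypothetical, and random sampling is only a baseline, not one of the configuration procedures cited in this chapter.

```python
import math
import random

def target_algorithm(instance, config, seed):
    """Toy SLS for number partitioning; returns the best cost found."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in instance]
    cost = abs(sum(s * x for s, x in zip(signs, instance)))
    best = cost
    for _ in range(300):
        i = rng.randrange(len(instance))
        signs[i] = -signs[i]
        new = abs(sum(s * x for s, x in zip(signs, instance)))
        if config["acceptance"] == "improve":
            ok = new <= cost
        else:  # "metropolis": worsening moves accepted with some probability
            ok = new <= cost or rng.random() < math.exp(
                (cost - new) / config["temperature"])
        if ok:
            cost = new
            best = min(best, cost)
        else:
            signs[i] = -signs[i]  # reject: undo the flip
    return best

def sample_config(rng):
    return {
        "acceptance": rng.choice(["improve", "metropolis"]),  # categorical
        "temperature": 10 ** rng.uniform(-1, 3),               # numerical
    }

def configure(train, budget=20, seeds=(0, 1, 2), seed=0):
    """Return the sampled configuration with the lowest mean training cost."""
    rng = random.Random(seed)

    def mean_cost(cfg):
        runs = [target_algorithm(inst, cfg, s) for inst in train for s in seeds]
        return sum(runs) / len(runs)

    candidates = [sample_config(rng) for _ in range(budget)]
    return min(candidates, key=mean_cost)

gen = random.Random(42)
train = [[gen.randint(1, 1000) for _ in range(20)] for _ in range(5)]
best_cfg = configure(train)
```

Evaluating every candidate on several instances and seeds matters because the target algorithm is randomized; a configuration that looks good on a single run may simply have been lucky.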

Automated algorithm configuration methods are an
active area of research and have been demonstrated to
achieve very substantial performance gains on many
widely studied and challenging problems [54.119].
So-called offline configuration methods, which determine
performance-optimizing parameter settings on a
representative set of benchmark instances during a
training phase before algorithm deployment, have arguably
been studied most intensely. These include procedures
that are limited to tuning numerical parameters, such as
CALIBRA [54.120], experimental design-based approaches
[54.69, 121], SPO [54.122] and SPO+ [54.123]. Methods
that can handle categorical as well as numerical
parameters are considerably more versatile; these include
racing procedures [54.124, 125], model-free configuration
procedures [54.126–128], and recent sequential
model-based techniques [54.129, 130].

In online configuration, algorithm parameters are
modified while attempting to solve a given problem
instance. There are some inherent advantages of online
configuration methods over offline methods, especially
when targeting very heterogeneous instances, where
appropriate algorithm parameters may depend strongly on
the problem instance to be solved. Some of these methods
fall into the realm of reactive search methods [54.131];
others have been studied in the area of evolutionary
computation (for an overview, we refer to [54.132]).
Unfortunately, most online configuration methods
presently available deal with very few parameters
primarily responsible for algorithm performance (often
only one) and rely on specific insight into the working
principles of the given algorithm.

There are also various approaches for determining
configurations of a given algorithm that result in good
performance on a given problem instance. These
per-instance configuration methods typically make use of
computationally cheap instance features, which provide
the basis for selecting the configuration to be used for
solving a given instance [54.133–135].

Finally, we believe that there is significant promise
in approaches for automating large parts of the design
process of performance-optimized SLS algorithms as, for
example, outlined in recent work on computer-aided
algorithm design for generalized local search machines
[54.136] and the programming by optimization (PbO)
software design paradigm [54.23].


References


54.1 H.H. Hoos, T. Stützle: Stochastic Local Search - Foundations and Applications (Morgan Kaufmann, San Francisco 2004)

54.2 G.R. Schreiber, O.C. Martin: Cut size statistics of graph bisection heuristics, SIAM J. Optim. 10(1), 231-251 (1999)

54.3 M. Gendreau, J.-Y. Potvin (Eds.): Handbook of Metaheuristics, International Series in Operations Research & Management Science, Vol. 146 (Springer, New York 2010)

54.4 S. Kirkpatrick, C.D. Gelatt Jr., M.P. Vecchi: Optimization by simulated annealing, Science 220, 671-680 (1983)

54.5 V. Černý: A thermodynamical approach to the traveling salesman problem, J. Optim. Theory Appl. 45(1), 41-51 (1985)


54.6 F. Glover: Future paths for integer programming and links to artificial intelligence, Comput. Oper. Res. 13(5), 533-549 (1986)

54.7 F. Glover: Tabu search - Part I, ORSA J. Comput. 1(3), 190-206 (1989)

54.8 F. Glover: Tabu search - Part II, ORSA J. Comput. 2(1), 4-32 (1990)

54.9 P. Hansen, B. Jaumard: Algorithms for the maximum satisfiability problem, Computing 44, 279-303 (1990)

54.10 T.A. Feo, M.G.C. Resende: A probabilistic heuristic for a computationally difficult set covering problem, Oper. Res. Lett. 8(2), 67-71 (1989)

54.11 H.R. Lourenço, O. Martin, T. Stützle: Iterated local search. In: Handbook of Metaheuristics, ed. by F. Glover, G. Kochenberger (Kluwer, Norwell 2002) pp. 321-353

54.12 J.H. Holland: Adaptation in Natural and Artificial Systems (The University of Michigan, Ann Arbor 1975)

54.13 D.E. Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading 1989)

54.14 I. Rechenberg: Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (Fromman, Freiburg, Germany 1973)

54.15 H.-P. Schwefel: Numerical Optimization of Computer Models (Wiley, Chichester 1981)

54.16 F. Glover: Heuristics for integer programming using surrogate constraints, Decis. Sci. 8, 156-164 (1977)

54.17 F. Glover, M. Laguna, R. Martí: Scatter search and path relinking: Advances and applications. In: Handbook of Metaheuristics, ed. by F. Glover, G. Kochenberger (Kluwer, Norwell 2002) pp. 1-35

54.18 M. Dorigo, V. Maniezzo, A. Colorni: Positive feedback as a search strategy. Techn. Rep. 91-016, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1991

54.19 M. Dorigo, V. Maniezzo, A. Colorni: Ant System: Optimization by a colony of cooperating agents, IEEE Trans. Syst. Man. Cybern. B 26(1), 29-41 (1996)

54.20 M. Dorigo, T. Stützle: Ant Colony Optimization (MIT, Cambridge 2004)

54.21 T. Stützle, M. Birattari, H.H. Hoos: Engineering stochastic local search algorithms - designing, implementing and analyzing effective heuristics, Lect. Notes Comput. Sci. 4638, 1-221 (2007)

54.22 T. Stützle, M. Birattari, H.H. Hoos: Engineering stochastic local search algorithms - designing, implementing and analyzing effective heuristics, Lect. Notes Comput. Sci. 5217, 1-155 (2009)

54.23 H.H. Hoos: Programming by optimization, Commun. ACM 55, 70-80 (2012)

54.24 S. Khanna, R. Motwani, M. Sudan, U. Vazirani: On syntactic versus computational views of approximability, Proc. 35th Annu. IEEE Symp. Found. Comput. Sci. (IEEE Computer Society, Los Alamitos 1994) pp. 819-830

54.25 F. Glover, M. Laguna: Tabu Search (Kluwer, Boston 1997)

54.26 K. Sörensen, F. Glover: Metaheuristics. In: Encyclopedia of Operations Research and Management Science, ed. by S.I. Gass, M.C. Fu (Springer, Berlin 2013) pp. 960-970

54.27 C.H. Papadimitriou, K. Steiglitz: Combinatorial Optimization - Algorithms and Complexity (Prentice Hall, Englewood Cliffs 1982)

54.28 J.L. Bentley: Fast algorithms for geometric traveling salesman problems, ORSA J. Comput. 4(4), 387-411 (1992)

54.29 O.C. Martin, S.W. Otto, E.W. Felten: Large-step Markov chains for the traveling salesman problem, Complex Syst. 5(3), 299-326 (1991)

54.30 D.S. Johnson, L.A. McGeoch: The traveling salesman problem: A case study in local optimization. In: Local Search in Combinatorial Optimization, ed. by E.H.L. Aarts, J.K. Lenstra (Wiley, Chichester 1997) pp. 215-310

54.31 A.S. Jain, B. Rangaswamy, S. Meeran: New and "stronger" job-shop neighbourhoods: A focus on the method of Nowicki and Smutnicki, J. Heuristics 6(4), 457-480 (2000)

54.32 R.K. Congram, C.N. Potts, S. van de Velde: An iterated dynasearch algorithm for the single-machine total weighted tardiness scheduling problem, INFORMS J. Comput. 14(1), 52-67 (2002)

54.33 M. Yannakakis: The analysis of local search problems and their heuristics, Lect. Notes Comput. Sci. 415, 298-310 (1990)

54.34 R. Battiti, M. Protasi: Reactive search, a history-based heuristic for MAX-SAT, ACM J. Exp. Algorithmics 2, 2 (1997)

54.35 P. Hansen, N. Mladenović: Variable neighborhood search: Principles and applications, Eur. J. Oper. Res. 130(3), 449-467 (2001)

54.36 P. Hansen, N. Mladenović: Variable neighborhood search. In: Handbook of Metaheuristics, ed. by F. Glover, G. Kochenberger (Kluwer, Norwell 2002) pp. 145-184

54.37 R.K. Ahuja, Ö. Ergun, J.B. Orlin, A.P. Punnen: A survey of very large-scale neighborhood search techniques, Discrete Appl. Math. 123(1-3), 75-102 (2002)

54.38 B.W. Kernighan, S. Lin: An efficient heuristic procedure for partitioning graphs, Bell Syst. Technol. J. 49, 213-219 (1970)

54.39 S. Lin, B.W. Kernighan: An effective heuristic algorithm for the traveling salesman problem, Oper. Res. 21(2), 498-516 (1973)

54.40 F. Glover: Ejection chain, reference structures and alternating path methods for traveling salesman problems, Discrete Appl. Math. 65(1-3), 223-253 (1996)

54.41 R.K. Ahuja, Ö. Ergun, J.B. Orlin, A.P. Punnen: Very large-scale neighborhood search. In: Handbook of Approximation Algorithms and Metaheuristics, Computer and Information Science Series, ed. by T.F. Gonzalez (Chapman Hall/CRC, Boca Raton 2007) pp. 1-12

54.42 I. Dumitrescu: Constrained Path and Cycle Problems, Ph.D. Thesis (University of Melbourne, Department of Mathematics and Statistics 2002)

54.43 M. Chiarandini, I. Dumitrescu, T. Stützle: Very large-scale neighborhood search: Overview and case studies on coloring problems. In: Hybrid Metaheuristics - An Emergent Approach to Optimization, Studies in Computational Intelligence, Vol. 117, ed. by C. Blum, M.J. Blesa Aguilera, A. Roli, M. Sampels (Springer, Berlin 2008) pp. 117-150

54.44 C.N. Potts, S. van de Velde: Dynasearch: Iterative local improvement by dynamic programming; Part I, the traveling salesman problem. Techn. Rep. LPOM-9511, Faculty of Mechanical Engineering, University of Twente, Enschede, The Netherlands, 1995

54.45 P.M. Thompson, J.B. Orlin: The theory of cycle transfers, Working Paper OR 200-89, Operations Research Center, MIT, Cambridge 1989

54.46 P.M. Thompson, H.N. Psaraftis: Cyclic transfer algorithm for multivehicle routing and scheduling problems, Oper. Res. 41, 935-946 (1993)

54.47 K. Helsgaun: An effective implementation of the Lin-Kernighan traveling salesman heuristic, Eur. J. Oper. Res. 126(1), 106-130 (2000)

54.48 A. Grosso, F. Della Croce, R. Tadei: An enhanced dynasearch neighborhood for the single-machine total weighted tardiness scheduling problem, Oper. Res. Lett. 32(1), 68-72 (2004)

54.49 B. Selman, H. Kautz: Domain-independent extensions to GSAT: Solving large structured satisfiability problems, Proc. 13th Int. Jt. Conf. Artif. Intell., ed. by R. Bajcsy (Morgan Kaufmann, San Francisco 1993) pp. 290-295

54.50 B. Selman, H. Kautz, B. Cohen: Noise strategies for improving local search, Proc. 12th Natl. Conf. Artif. Intell., AAAI/The MIT (1994) pp. 337-343

54.51 O. Steinmann, A. Strohmaier, T. Stützle: Tabu search vs. random walk, Lect. Notes Artif. Intell. 1303, 337-348 (1997)

54.52 O.J. Mengshoel: Understanding the role of noise in stochastic local search: Analysis and experiments, Artif. Intell. 172(8/9), 955-990 (2008)

54.53 D.T. Connolly: An improved annealing scheme for the QAP, Eur. J. Oper. Res. 46(1), 93-100 (1990)

54.54 M. Fielding: Simulated annealing with an optimal fixed temperature, SIAM J. Optim. 11(2), 289-307 (2000)

54.55 E.H.L. Aarts, J.H.M. Korst, P.J.M. van Laarhoven: Simulated annealing. In: Local Search in Combinatorial Optimization, ed. by E.H.L. Aarts, J.K. Lenstra (Wiley, Chichester 1997) pp. 91-120

54.56 A.G. Nikolaev, S.H. Jacobson: Simulated annealing. In: Handbook of Metaheuristics, International Series in Operations Research & Management Science, Vol. 146, ed. by M. Gendreau, J.-Y. Potvin (Springer, New York 2010) pp. 1-40, 2nd edn., Chap. 8

54.57 R. Battiti, G. Tecchiolli: Simulated annealing and tabu search in the long run: A comparison on QAP tasks, Comput. Math. Appl. 28(6), 1-8 (1994)

54.58 C. Voudouris: Guided Local Search for Combinatorial Optimization Problems, Ph.D. Thesis (University of Essex, Department of Computer Science, Colchester 1997)

54.59 C. Voudouris, E. Tsang: Guided local search and its application to the travelling salesman problem, Eur. J. Oper. Res. 113(2), 469-499 (1999)

54.60 Y. Shang, B.W. Wah: A discrete Lagrangian-based global-search method for solving satisfiability problems, J. Glob. Optim. 12(1), 61-100 (1998)

54.61 D. Schuurmans, F. Southey, R.C. Holte: The exponentiated subgradient algorithm for heuristic boolean programming, Proc. 17th Int. Jt. Conf. Artif. Intell., ed. by B. Nebel (Morgan Kaufmann, San Francisco 2001) pp. 334-341

54.62 F. Hutter, D.A.D. Tompkins, H.H. Hoos: Scaling and probabilistic smoothing: Efficient dynamic local search for SAT, Lect. Notes Comput. Sci. 2470, 233-248 (2002)

54.63 W.J. Pullan, H.H. Hoos: Dynamic local search for the maximum clique problem, J. Artif. Intell. Res. 25, 159-185 (2006)

54.64 M.G.C. Resende, C.C. Ribeiro: Greedy randomized adaptive search procedures: Advances and applications. In: Handbook of Metaheuristics, International Series in Operations Research & Management Science, Vol. 146, ed. by M. Gendreau, J.-Y. Potvin (Springer, New York 2010) pp. 281-317

54.65 G. Schrimpf, J. Schneider, H. Stamm-Wilbrandt, G. Dueck: Record breaking optimization results using the ruin and recreate principle, J. Comput. Phys. 159(2), 139-171 (2000)

54.66 A. Cesta, A. Oddi, S.F. Smith: Iterative flattening: A scalable method for solving multi-capacity scheduling problems, Proc. 17th Natl. Conf. Artif. Intell., AAAI/The MIT (2000) pp. 742-747

54.67 A.J. Richmond, J.E. Beasley: An iterative construction heuristic for the ore selection problem, J. Heuristics 10, 153-167 (2004)

54.68 L.W. Jacobs, M.J. Brusco: A local search heuristic for large set-covering problems, Nav. Res. Logist. 42(7), 1129-1140 (1995)

54.69 R. Ruiz, T. Stützle: A simple and effective iterated greedy algorithm for the permutation flowshop scheduling problem, Eur. J. Oper. Res. 177(3), 2033-2049 (2007)

54.70 R. Ruiz, T. Stützle: An iterated greedy heuristic for the sequence dependent setup times flowshop problem with makespan and weighted tardiness objectives, Eur. J. Oper. Res. 187(3), 1143-1159 (2008)

54.71 D.S. Johnson, L.A. McGeoch: Experimental analysis of heuristics for the STSP. In: The Traveling Salesman Problem and its Variations, ed. by G. Gutin, A. Punnen (Kluwer, Dordrecht, The Netherlands 2002) pp. 369-443

54.72 D. Applegate, W. Cook, A. Rohe: Chained Lin-Kernighan for large traveling salesman problems, INFORMS J. Comput. 15(1), 82-92 (2003)

54.73 H.R. Lourenço, O. Martin, T. Stützle: Iterated local search: Framework and applications. In: Handbook of Metaheuristics, International Series in Operations Research & Management Science, Vol. 146, ed. by M. Gendreau, J.-Y. Potvin (Springer, New York 2010) pp. 363-397

54.74 T. Stützle: Iterated local search for the quadratic assignment problem, Eur. J. Oper. Res. 174(3), 1519-1539 (2006)

54.75 I. Hong, A.B. Kahng, B.R. Moon: Improved large-step Markov chain variants for the symmetric TSP, J. Heuristics 3(1), 63-81 (1997)

54.76 S. Goss, S. Aron, J.L. Deneubourg, J.M. Pasteels: Self-organized shortcuts in the Argentine ant, Naturwissenschaften 76, 579-581 (1989)

54.77 J.-L. Deneubourg, S. Aron, S. Goss, J.-M. Pasteels: The self-organizing exploratory pattern of the Argentine ant, J. Insect Behav. 3, 159-168 (1990)

54.78 T. Stützle, H.H. Hoos: MAX-MIN ant system, Future Gener. Comput. Syst. 16(8), 889-914 (2000)

54.79 M. Dorigo, M. Birattari, T. Stützle: Ant colony optimization: Artificial ants as a computational intelligence technique, IEEE Comput. Intell. Mag. 1(4), 28-39 (2006)

54.80 M. Dorigo, T. Stützle: Ant colony optimization: Overview and recent advances. In: Handbook of Metaheuristics, International Series in Operations Research & Management Science, Vol. 146, ed. by M. Gendreau, J.-Y. Potvin (Springer, New York 2010) pp. 227-263

54.81 M. Dorigo, G. Di Caro: The ant colony optimization meta-heuristic. In: New Ideas in Optimization, ed. by D. Corne, M. Dorigo, F. Glover (McGraw Hill, London 1999) pp. 11-32

54.82 M. Dorigo, G. Di Caro, L.M. Gambardella: Ant algorithms for discrete optimization, Artif. Life 5(2), 137-172 (1999)

54.83 E. Bonabeau, M. Dorigo, G. Theraulaz: Swarm Intelligence: From Natural to Artificial Systems (Oxford Univ. Press, New York 1999)

54.84 J.-Y. Potvin: Genetic algorithms for the traveling salesman problem, Ann. Oper. Res. 63, 339-370 (1996)

54.85 P. Merz, B. Freisleben: Memetic algorithms for the traveling salesman problem, Complex Syst. 13(4), 297-345 (2001)

54.86 P. Moscato: Memetic algorithms: A short introduction. In: New Ideas in Optimization, ed. by D. Corne, M. Dorigo, F. Glover (McGraw Hill, London 1999) pp. 219-234

54.87 M. Laguna, R. Martí: Scatter Search: Methodology and Implementations in C, Vol. 24 (Kluwer, Boston 2003)

54.88 M. Ehrgott, X. Gandibleux: Approximative solution methods for combinatorial multicriteria optimization, TOP 12(1), 1-88 (2004)

54.89 M. Ehrgott, X. Gandibleux: Hybrid metaheuristics for multi-objective combinatorial optimization. In: Hybrid Metaheuristics: An emergent approach for optimization, ed. by C. Blum, M.J. Blesa, A. Roli, M. Sampels (Springer, Berlin, Germany 2008) pp. 221-259

54.90 L. Paquete, T. Stützle: Stochastic local search algorithms for multiobjective combinatorial optimization: A review. In: Handbook of Approximation Algorithms and Metaheuristics, Computer and Information Science Series, ed. by T.F. Gonzalez (Chapman Hall/CRC, Boca Raton 2007) pp. 1-15

54.91 L. Bianchi, M. Dorigo, L.M. Gambardella, W.J. Gutjahr: A survey on metaheuristics for stochastic combinatorial optimization, Nat. Comput. 8(2), 239-287 (2009)

54.92 D. Ouelhadj, S. Petrovic: A survey of dynamic scheduling in manufacturing systems, J. Sched. 12(4), 417-431 (2009)

54.93 V. Pillac, M. Gendreau, C. Guéret, A.L. Medaglia: A review of dynamic vehicle routing problems. Techn. Rep. CIRRELT-2011-62, Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation, Montréal, Canada, October 2011

54.94 V. Maniezzo, T. Stützle, S. Voß (Eds.): Matheuristics - Hybridizing Metaheuristics and Mathematical Programming, Annals of Information Systems, Vol. 10 (Springer, New York 2010)

54.95 J. Puchinger, G.R. Raidl, S. Pirkwieser: MetaBoosting: Enhancing integer programming techniques by metaheuristics. In: Matheuristics - Hybridizing Metaheuristics and Mathematical Programming, Annals of Information Systems, Vol. 10, ed. by V. Maniezzo, T. Stützle, S. Voß (Springer, New York 2010) pp. 71-102

54.96 M. Fischetti, A. Lodi: Local branching, Math. Program. 98(1/3), 23-47 (2003)

54.97 E. Danna, E. Rothberg, C. Le Pape: Exploring relaxation induced neighborhoods to improve MIP solutions, Math. Program. 102(1), 71-90 (2005)

54.98 V. Maniezzo: Exact and approximate nondeterministic tree-search procedures for the quadratic assignment problem, INFORMS J. Comput. 11(4), 358-369 (1999)

54.99 C. Blum: Beam-ACO for simple assembly line balancing, INFORMS J. Comput. 20(4), 618-627 (2008)

54.100 W. Cook, P. Seymour: Tour merging via branch-decomposition, INFORMS J. Comput. 15(3), 233-248 (2003)

54.101 M.A. Boschetti, V. Maniezzo: Benders decomposition, Lagrangean relaxation and metaheuristic design, J. Heuristics 15(3), 283-312 (2009)

54.102 I. Dumitrescu, T. Stützle: Usage of exact algorithms to enhance stochastic local search algorithms. In: Matheuristics - Hybridizing Metaheuristics and Mathematical Programming, Annals of Information Systems, Vol. 10, ed. by V. Maniezzo, T. Stützle, S. Voß (Springer, New York 2010) pp. 103-134

54.103 L. Jourdan, M. Basseur, E.-G. Talbi: Hybridizing exact methods and metaheuristics: A taxonomy, Eur. J. Oper. Res. 199(3), 620-629 (2009)

54.104 C. Demetrescu, I. Finocchi, G.F. Italiano: Algorithm engineering, Bulletin EATCS 79, 48-63 (2003)

54.105 I. Sommerville (Ed.): Software Engineering, 7th edn. (Addison Wesley, Boston 2004)

54.106 P. Balaprakash, M. Birattari, T. Stützle: Engineering stochastic local search algorithms: A case study in estimation-based local search for the probabilistic traveling salesman problem. In: Recent Advances in Evolutionary Computation for Combinatorial Optimization, Studies in Computational Intelligence, Vol. 153, ed. by C. Cotta, J. van Hemert (Springer, Berlin 2008) pp. 55-69

54.107
54.108
54.109
54.110
54.111
54.112
54.113
54.114
54.115
54.116
54.117
54.118
54.119

S. Cahon, N. Melab, E.-G. Talbi: ParadisEO: 
A framework for the reusable design of parallel 
and distributed metaheuristics, J. Heuristics 10(3), 
357-380 (2004) 

Paradiseo: A Software Framework for Metaheuris- 
tics, http://paradiseo.gforge.inria.fr 

L. Di Gaspero, A. Schaerf: Writing local search 
algorithms using EASYLOCAL++. In: Optimiza- 
tion Software Class Libraries, ed. by S. Voß, 
D.L. Woodruff (Kluwer, Boston, 2002) pp. 155-175 
Atlassian Bitbucket: https://bitbucket.org/satt/ 
easylocal-3 

P. Van Hentenryck, L. Michel: Constraint-Based 
Local Search (MIT, Cambridge 2005) 

K. Mehlhorn, S. Näher: LEDA: A Platform for Com- 
binatorial and Geometric Computing (Cambridge 
Univ. Press, Cambridge 1999) 

The R Project for Statistical Computing, http:// 
www.r-project.org 

C.W. Nell, C. Fawcett, H.H. Hoos, K. Leyton-Brown: 
HAL: A framework for the automated design and 
analysis of high-performance algorithms, Lect. 
Notes Comput. Sci. 6683, 600-615 (2011) 

HAL: The High-performance Algorithm Laboratory, 
http://hal.cs.ubc.ca/ 

P. Merz, B. Freisleben: Fitness landscapes and 
memetic algorithm design. In: New Ideas in Op- 
timization, ed. by D. Corne, M. Dorigo, F. Glover 
(McGraw Hill, London 1999) pp. 244-260 

L. Xu, H. Hoos, K. Leyton-Brown: Hierarchical 
hardness models for SAT, Lect. Notes Comput. Sci. 
4741, 696-711 (2007) 

J.-P. Watson, L.D. Whitley, A.E. Howe: Linking 
search space structure, run-time dynamics, and 
problem difficulty: A step towards demystifying 
tabu search, J. Artif. Intell. Res. 24, 221-261 (2005) 
H.H. Hoos: Automated algorithm configuration 
and parameter tuning. In: Autonomous Search, 
ed. by Y. Hamadi, E. Monfroy, F. Saubion (Springer, 
Berlin 2012) pp. 37-71 


54.120 


54.121 


54.122 


54.123 


54.124 


54.125 


54.126 


54.127 


54.128 


54.129 


54.130 


54.131 


54.132 


B. Adenso-Diaz, M. Laguna: Fine-tuning of 
algorithms using fractional experimental de- 
signs and local search, Oper. Res. 54(1), 99-114 
(2006) 

S.P. Coy, B.L. Golden, G.C. Runger, E.A. Wasil: Using 
experimental design to find effective parame- 
ter settings for heuristics, J. Heuristics 7(1), 77-97 
(2001) 

T. Bartz-Beielstein: Experimental Research in 
Evolutionary Computation — The New Experimen- 
talism (Springer, Berlin 2006) 

F. Hutter, H.H. Hoos, K. Leyton-Brown, K.P. Mur- 
phy: An experimental investigation of model- 
based parameter optimisation: SPO and beyond, 
Genet. Evol. Comput. Conf., GECCO 2009, ed. 
by F. Rothlauf (ACM, New York 2009) pp. 271- 
278 

M. Birattari, T. Stiitzle, L. Paquete, K. Varrentrapp: 
A racing algorithm for configuring metaheuristics, 
Proc. Genet. Evol. Comput. Conf. (GECCO-2002), 
ed. by W.B. Langdon, E. Cantu-Paz, K.E. Math- 
ias, R. Roy, D. Davis, R. Poli, K. Balakrishnan, 
V. Honavar, G. Rudolph, J. Wegener, L. Bull, 
M.A. Potter, A.C. Schultz, J.F. Miller, E.K. Burke, 
N. Jonoska (Morgan Kaufmann, San Francisco 
2002) pp. 11-18 

M. Birattari, Z. Yuan, P. Balaprakash, T. Stiitzle: 
F-Race and iterated F-Race: An overview. In: 
Experimental Methods for the Analysis of Opti- 
mization Algorithms, ed. by T. Bartz-Beielstein, 
M. Chiarandini, L. Paquete, M. Preuss (Springer, 
Berlin, Germany 2010) pp. 311-336 

F. Hutter, H.H. Hoos, T. Stiitzle: Automatic algo- 
rithm configuration based on local search, Proc. 
22nd Conf. Artif. Intell. (AAAI), ed. by R.C. Holte, 
A. Howe (AAAI / The MIT, Menlo Park 2007) pp. 1152- 
1157 

C. Ansétegui, M. Sellmann, K. Tierney: A gender- 
based genetic algorithm for the automatic con- 
figuration of algorithms, Proc. 15th Int. Conf. 
Princ. Pract. Constraint Program. (CP 2009) (2009) 
pp. 142-157 

F. Hutter, H.H. Hoos, K. Leyton-Brown, T. Stützle: 
Param ILS: An automatic algorithm configura- 
tion framework, J. Artif. Intell. Res. 36, 267-306 
(2009) 

F. Hutter, H.H. Hoos, K. Leyton-Brown: Sequen- 
tial model-based optimization for general al- 
gorithm configuration, Lect. Notes Comput. Sci. 
6683, 507-523 (2011) 

F. Hutter, H.H. Hoos, K. Leyton-Brown: Parallel al- 
gorithm configuration, Lect. Notes Comput. Sci. 
7219, 55-70 (2011) 

R. Battiti, M. Brunato, F. Mascia: Reactive Search 
and Intelligent Optimization, Operations Re- 
search/Computer Science Interfaces Series, Vol. 45 
(Springer, New York 2008) 

A.E. Eiben, Z. Michalewicz, M. Schoenauer, 
J.E. Smith: Parameter control in evolutionary 


Stochastic Local Search Algorithms: An Overview 


References 


54.133 


54.134 


algorithms. In: Parameter Setting in Evolu- 
tionary Algorithms, ed. by F. Lobo, C.F. Lima, 
Z. Michalewicz (Springer, Berlin, Germany 2007) 
pp. 19-46 

F. Hutter, Y. Hamadi, H.H. Hoos, K. Leyton- 
Brown: Performance prediction and automated 
tuning of randomized and parametric algo- 
rithms, Lect. Notes Comput. Sci. 4204, 213-228 
(2006) 

L. Xu, H.H. Hoos, K. Leyton-Brown: Hydra: Au- 
tomatically configuring algorithms for portfolio- 


54.135 


54.136 


based selection, Proc. 24th AAAI Conf. Artif. Intell. 
(AAAI-10) (2010) pp. 210-216 

S. Kadioglu, Y. Malitsky, M. Sellmann, K. Tierney: 
ISAC - Instance-specific algorithm configuration, 
Proc. 19th Eur. Conf. Artif. Intell. (ECAI 2010) (2010) 
pp. 751-756 

H.H. Hoos: Computer-aided algorithm design us- 
ing generalised local search machines and related 
design patterns. Techn. Rep. TR-2009-26, Univer- 
sity of British Columbia, Department of Computer 
Science, 2009 


1105 


4S |3 Hed 


55. Parallel Evolutionary Combinatorial Optimization 


El-Ghazali Talbi 


In this chapter, a clear distinction is made between
the parallel design aspect and the parallel
implementation aspect of evolutionary algorithms (EAs).
From the algorithmic design point of view, the main
parallel models for EAs are presented, and a unifying
view of these models is outlined. The chapter is
organized as follows. In Sect. 55.2, the main parallel
models for designing EAs are presented. Section 55.3
deals with the implementation issues of parallel EAs:
the main concepts of parallel architectures and
parallel programming paradigms that affect the design
and implementation of parallel EAs are outlined, and
the main performance indicators that can be used to
evaluate a parallel EA in terms of efficiency are
detailed. Finally, Sect. 55.4 deals with the design
and implementation of different parallel models for
EAs based on the software framework ParadisEO.


55.1 Motivation
55.2 Parallel Design of EAs
  55.2.1 Algorithmic-Level Parallel Model
  55.2.2 Iteration-Level Parallel Model
  55.2.3 Solution-Level Parallel Model
  55.2.4 Hierarchical Combination of the Parallel Models
55.3 Parallel Implementation of EAs
  55.3.1 Parallel and Distributed Architectures
  55.3.2 Dedicated Architectures
  55.3.3 Parallel Programming Environments and Middlewares
  55.3.4 Performance Evaluation
  55.3.5 Main Properties of Parallel EAs
  55.3.6 Algorithmic-Level Parallel Model
  55.3.7 Iteration-Level Parallel Model
  55.3.8 Solution-Level Parallel Model
55.4 Parallel EAs Under ParadisEO
55.5 Conclusions and Perspectives
References


55.1 Motivation


On the one hand, optimization problems are increasingly
complex, and the resources required to solve them are
ever increasing. Real-life optimization problems are
often NP-hard and CPU-time and/or memory consuming.
Although the use of evolutionary algorithms (EAs)
allows us to significantly reduce the computational
complexity of the solving algorithm, the latter remains
time-consuming for many problems in diverse domains of
application, where the objective function and the
constraints associated with the problem are resource
(e.g., CPU, memory) intensive and the size of the
search space is huge. Moreover, more and more complex
and resource-intensive EAs are being developed (e.g.,
hybrid EAs, multiobjective EAs) [55.1].



On the other hand, the rapid development of technology
in designing processors (e.g., multicore processors,
dedicated architectures), networks (local area networks
(LAN) such as Myrinet and Infiniband, or wide area
networks (WAN) such as optical networks), and data
storage makes the use of parallel computing more and
more popular. Such architectures represent an effective
opportunity for the design and implementation of
parallel EAs. Indeed, sequential architectures are
reaching physical limitations (speed of light,
thermodynamics). Nowadays, even laptops and
workstations are equipped with multicore processors,
which represent one class of parallel architecture.
Moreover, the cost/performance ratio is constantly
decreasing. The proliferation of powerful workstations
and fast communication networks has led to the
emergence of dedicated architectures (e.g., GPUs),
clusters of processors (COPs), networks of workstations
(NOWs), and large-scale networks of machines (grids) as
platforms for high-performance computing.

Parallel and distributed computing can be used in
the design and implementation of EAs for the following
reasons:


• Speed up the search: One of the main goals in
  parallelizing an EA is to reduce the search time.
  This helps in designing real-time and interactive
  optimization methods. This is a very important
  aspect for classes of problems with hard
  requirements on search time, such as dynamic
  optimization problems and time-critical control
  problems (e.g., real-time planning).

• Improve the quality of the obtained solutions: Some
  parallel models for EAs allow us to improve the
  quality of solutions. Indeed, exchanging information
  between algorithms will alter their behavior in
  terms of searching in the landscape associated with
  the problem. The main goal of the cooperation
  between algorithms is to improve the quality of
  solutions. Both convergence to better solutions and
  reduced search time may happen. Let us note that
  a parallel model for EAs may be more effective than
  a sequential algorithm even on a single processor
  [55.2].

• Improve the robustness: A parallel EA may be more
  robust in terms of solving, in an effective manner,
  different optimization problems and different
  instances of a given problem. Robustness may be
  measured in terms of the sensitivity of the
  algorithm to its parameters.

• Solve large-scale problems: Parallel EAs make it
  possible to solve large-scale instances of complex
  optimization problems. A challenge here is to solve
  very large instances that cannot be solved on
  a sequential machine. Another similar challenge is
  to solve more accurate mathematical models
  associated with different optimization problems.
  Improving the accuracy of mathematical models in
  general increases the size of the associated
  problems to be solved. Moreover, some optimization
  problems, such as data mining problems, need the
  manipulation of huge databases.


The implementation point of view deals with the
efficiency of a parallel EA on a target parallel
architecture using a given parallel language,
programming environment, or middleware. The focus is on
the parallelization of EAs on general-purpose parallel
and distributed architectures, since this is the most
widespread computational platform. This chapter also
deals with the implementation of EAs on dedicated
architectures such as reconfigurable architectures and
GPUs (graphics processing units). Different
architectural criteria that affect the efficiency of
the implementation will be considered: shared memory
versus distributed memory, homogeneous versus
heterogeneous, shared versus nonshared by multiple
users, local network versus large network. Indeed,
those criteria have a strong impact on the deployment
technique employed, such as load balancing and fault
tolerance. Depending on the type of parallel
architecture used, different parallel and distributed
languages, programming environments, and middlewares
may be used, such as message passing (e.g., MPI),
shared memory (e.g., multithreading, OpenMP, CUDA),
remote procedure call (e.g., Java RMI, RPC),
high-throughput computing (e.g., Condor), and grid
computing (e.g., Globus).


55.2 Parallel Design of EAs


In terms of designing parallel EAs, three major parallel
models are identified. They follow three hierarchical
levels (Table 55.1):


• Algorithmic level: In this model, independent or
  cooperating self-contained EAs are used. It is
  a problem-independent interalgorithm
  parallelization. If the different EAs are
  independent, the search will be equivalent to the
  sequential execution of the algorithms in terms of
  the quality of solutions. However, the cooperative
  model will alter the behavior of the EAs and enable
  an improvement in terms of the quality of solutions.

• Iteration level: In this model, each iteration of
  an EA is parallelized. It is a problem-independent
  intra-algorithm parallelization. The behavior of
  the EA is not altered. The main objective is to
  speed up the algorithm by reducing the search time.
  Indeed, the iteration cycle of EAs on large
  populations, especially for real-world problems,
  requires a large amount of computational resources.

• Solution level: In this model, the parallelization
  process handles a single solution of the search
  space. It is a problem-dependent intra-algorithm
  parallelization. In general, evaluating the
  objective function(s) or constraints for
  a generated solution is the most costly operation
  in EAs. In this model, the behavior of the EA is
  not altered. The objective is mainly to speed up
  the search.


Table 55.1 Parallel models of EAs

Parallel model     Problem dependency  Behavior    Granularity  Goal
Algorithmic level  Independent         Altered     EA           Effectiveness
Iteration level    Independent         Nonaltered  Iteration    Efficiency
Solution level     Dependent           Nonaltered  Solution     Efficiency


In the following sections, the different parallel
models are detailed and analyzed in terms of
algorithmic design.


55.2.1 Algorithmic-Level Parallel Model


In this model, many EAs are launched in parallel. They
may or may not cooperate to solve the target
optimization problem.


Independent Algorithmic-Level Parallel Model
In the independent algorithmic-level parallel model,
different EAs are executed without any cooperation.
The different EAs may be initialized with different
populations. Different parameter settings may be used
for the EAs, such as the mutation and crossover
probabilities. Moreover, each search component of an
EA may be designed differently: encoding, search
operators (e.g., variation operators), objective
function, constraints, stopping criteria, etc.

This parallel model is straightforward to design and
implement. The master/worker paradigm is well suited
to this model. A worker implements an EA. The master
defines the different parameters to be used by the
workers and determines the best found solution from
those obtained by the different workers. In addition
to speeding up the algorithm, this parallel model
enables us to improve its robustness [55.3].

This model raises the following question in
particular: Is it equivalent to execute k EAs during
a time t and to execute a single EA during a time kt?
The answer depends on the landscape properties of the
problem (e.g., the presence of multiple basins of
attraction, the distribution of the local optima, and
the fitness distance correlation) [55.4].
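
The independent model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the chapter: the toy OneMax objective, the (1+1) EA used as the worker, the parameter values, and the use of a thread pool in place of a real parallel platform (threads keep the sketch short; processes or message passing would be needed for an actual speedup of CPU-bound Python code).

```python
import random
from concurrent.futures import ThreadPoolExecutor

def onemax(bits):
    """Toy objective (assumption for illustration): number of ones, to be maximized."""
    return sum(bits)

def run_ea(seed, mutation_rate, n_bits=40, generations=200):
    """Worker: a self-contained (1+1) EA with its own seed and parameter setting."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(generations):
        # Bit-flip mutation; the child replaces the parent if it is not worse.
        child = [b ^ 1 if rng.random() < mutation_rate else b for b in parent]
        if onemax(child) >= onemax(parent):
            parent = child
    return onemax(parent), parent

def master(settings):
    """Master: gives one (seed, mutation_rate) pair to each worker and keeps the best result."""
    with ThreadPoolExecutor(max_workers=len(settings)) as pool:
        results = list(pool.map(lambda s: run_ea(*s), settings))
    return max(results, key=lambda r: r[0])

# Four independent EAs, each with a different initial population and mutation rate.
settings = [(1, 0.01), (2, 0.025), (3, 0.05), (4, 0.1)]
best_fitness, best_solution = master(settings)
```

Because the workers never communicate, the result is the same as running the four EAs one after another and keeping the best solution; only the elapsed time changes.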


Cooperative Algorithmic-Level Parallel Model
In the cooperative model for parallel EAs, the
different algorithms exchange information related to
the search with the intent to compute better and more
robust solutions.

In designing a parallel cooperative model for any
EA, the same design questions need to be answered:


• The exchange decision criterion (When?): The
  exchange of information between the EAs can be
  decided either in a blind (periodic or
  probabilistic) way or according to an intelligent
  adaptive criterion. Periodic exchange occurs in
  each algorithm after a fixed number of iterations;
  this type of communication is synchronous.
  Probabilistic exchange consists in performing
  a communication operation after each iteration
  with a given probability. Conversely, adaptive
  exchanges are guided by some run-time
  characteristics of the search. For instance, the
  exchange may depend on the evolution of the
  quality of the solutions or on the search memory.
  A classical criterion is related to the
  improvement of the best found local solution.

• The exchange topology (Where?): The communication
  exchange topology indicates for each EA its
  neighbor(s) regarding the exchange of information,
  i.e., the source/destination algorithm(s) of the
  information. Several works have been dedicated to
  the study of the impact of the topology on the
  quality of the provided results, and show that
  cyclic graphs are better [55.5, 6]. The ring,
  mesh, and hypercube regular topologies are often
  used.

• The information exchanged (What?): This parameter
  specifies the information to be exchanged between
  the EAs. In general, this information can be
  composed of:

  - Solutions: This information deals with
    a selection of the solutions found during the
    search. In general, it contains elite solutions
    that have been found, such as the best solution
    at the current iteration, local best solutions,
    the global best solution, the neighborhood best
    solution, the best diversified solutions, and
    randomly selected solutions. The number of
    solutions to exchange may be an absolute value
    or a given percentage of the population. Any
    selection mechanism can be used to select the
    solutions.

  - Search memory: This information deals with any
    element of the search memory that is associated
    with the involved EA.

• The integration policy (How?): Analogously to the
  information exchange policy, the integration
  policy deals with the usage of the received
  information. In general, there is a local copy of
  the received information. The local variables are
  updated using the received ones. For instance, the
  best found solution is simply replaced by the
  better of the local best solution and the
  neighboring best solution. Any replacement
  strategy may be used to integrate the set of
  received solutions into the local population.


Fig. 55.1a,b The traditional parallel (a) island and
(b) cellular models for evolutionary algorithms (panel
titles: (a) parallel insular model for EAs,
(b) parallel cellular model for EAs)
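
The "When?" and "Where?" questions lend themselves to a small sketch. The function names and default values below are hypothetical, and the adaptive criterion shown (exchange right after the local best has improved, assuming minimization) is only one possible reading of the classical criterion mentioned above.

```python
import random

def periodic_exchange(iteration, period=10):
    """Blind, periodic criterion: exchange every `period` iterations."""
    return iteration > 0 and iteration % period == 0

def probabilistic_exchange(rng, probability=0.1):
    """Blind, probabilistic criterion: exchange after an iteration with a given probability."""
    return rng.random() < probability

def adaptive_exchange(best_history):
    """Adaptive criterion (one possible reading): exchange when the local best just improved.

    Assumes minimization; best_history[i] is the best objective value known at iteration i.
    """
    return len(best_history) >= 2 and best_history[-1] < best_history[-2]

def ring_neighbors(i, n):
    """Neighbors of algorithm i in a ring of n algorithms."""
    return [(i - 1) % n, (i + 1) % n]

def hypercube_neighbors(i, dimension):
    """Neighbors of algorithm i in a hypercube with 2**dimension algorithms."""
    return [i ^ (1 << d) for d in range(dimension)]
```

For example, in a three-dimensional hypercube of eight EAs, `hypercube_neighbors(5, 3)` lists the three algorithms whose index differs from 5 in exactly one bit.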


Traditional Parallel Models for EAs. Historically, 
the cooperative parallel model has been largely used 
in EAs [55.7]. In sequential genetic algorithms (the 
sequential model is known as the panmictic genetic 
algorithm), the selection takes place globally. Any indi- 
vidual can potentially reproduce with any other individ- 
ual of the population. Among the best-known parallel 
algorithmic-level models for evolutionary algorithms 
are the island model and the cellular model. In the 
island model (also known as the migration model, dis- 
tributed model, multideme EA, or coarse-grained EA) 
for genetic algorithms, the population is decomposed 
into several subpopulations distributed among different 
nodes (Fig. 55.1). Each node is responsible for the evolution
of one subpopulation. It executes all the steps of
a classical EA from the selection to the replacement on 
the subpopulation. Each island may use different pa- 
rameter values and different strategies for any search 
component such as selection, replacement, variation 
operators (mutation, crossover), and encodings. After 
a given number of generations (synchronous exchange),
or when a condition holds (asynchronous exchange),
the migration process is activated. Then, exchanges of 
some selected individuals between subpopulations are 
realized, and received individuals are integrated into the 
local subpopulation. The selection policy of emigrants 
indicates for each island in a deterministic or stochastic 
way the individuals to be migrated. The stochastic or 
random policy does not guarantee that the best individ- 
uals will be selected, but its associated computation cost 
is relatively lower. The deterministic strategy allows the 
selection of the best individuals. The number of emi- 
grants can be expressed as a fixed or variable number 
of individuals, or through a percentage of individuals 
from the population. The choice of the value of such pa- 
rameter is crucial. Indeed, if the number of emigrants is 
low, the migration process will be less efficient as the is- 
lands will have the tendency to evolve in an independent 
way. Conversely, if the number of emigrants is high, the 
EAs will likely converge to the same solutions [55.8]. In 
EAs, the replacement/integration policy of immigrants 
indicates in a stochastic or deterministic way the local 
individuals to be replaced by the newcomers. The ob- 
jective of the model is to delay the global convergence 
and encourage diversity [55.9, 10]. 
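
The island model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the OneMax objective, the variation operators, the one-directional ring, the deterministic best-replace-worst policy, and all parameter values are hypothetical choices, and the islands are stepped one after another here although each would run on its own node in practice.

```python
import random

def fitness(bits):
    return sum(bits)  # toy OneMax objective (assumption for illustration)

def evolve(pop, rng, mutation_rate=0.02):
    """One generation on one island: tournament selection, uniform crossover,
    bit-flip mutation, and elitist replacement."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    children = []
    for _ in range(len(pop)):
        p, q = pick(), pick()
        child = [x if rng.random() < 0.5 else y for x, y in zip(p, q)]
        child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
        children.append(child)
    # Keep the best len(pop) individuals among parents and children.
    return sorted(pop + children, key=fitness, reverse=True)[:len(pop)]

def migrate(islands, n_emigrants):
    """Ring topology: island i receives copies of the best individuals of island i-1
    and replaces its own worst individuals with them (deterministic policies)."""
    emigrants = [sorted(pop, key=fitness, reverse=True)[:n_emigrants] for pop in islands]
    for i, pop in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        pop.sort(key=fitness)                       # worst individuals first
        pop[:n_emigrants] = [ind[:] for ind in incoming]

def island_model(n_islands=4, pop_size=20, n_bits=30, generations=60,
                 period=10, n_emigrants=2, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [evolve(pop, rng) for pop in islands]   # conceptually one island per node
        if g % period == 0:                               # synchronous, periodic migration
            migrate(islands, n_emigrants)
    return islands
```

Raising `n_emigrants` or lowering `period` makes the islands converge toward the same solutions more quickly, while very small values let them evolve almost independently, which is the trade-off discussed above.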

The other well-known parallel model for EAs, the 
cellular model (also known as the diffusion or fine- 
grained model), may be seen as a special case of the 
island model where an island is composed of a sin- 
gle individual. Traditionally, an individual is assigned 
to a cell of a grid. The selection occurs in the neighborhood
of the individual [55.11–13]. Hence, the selection
pressure is lower than in sequential EAs. The overlapping
small neighborhoods in cellular EAs help explore the
search space because a slow
diffusion of solutions through the population provides 
a kind of exploration, while exploitation takes place 
inside each neighborhood. Cellular models applied to 
complex problems can have a higher convergence prob- 
ability to better solutions than panmictic EAs [55.14, 
15]. 
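
A minimal sketch of one generation of a cellular EA is given below. The toroidal grid, the von Neumann neighborhood, the choice of the best neighbor as mate, and the replace-if-not-worse rule are assumptions made for illustration; the chapter does not prescribe them.

```python
import random

def fitness(bits):
    return sum(bits)  # toy OneMax objective (assumption for illustration)

def neighbors(r, c, rows, cols):
    """Von Neumann neighborhood on a toroidal grid: north, south, west, east."""
    return [((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)]

def cellular_step(grid, rng, mutation_rate=0.02):
    """One synchronous generation: each cell mates with its best neighbor, and the
    child replaces the cell's individual only if it is not worse."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            current = grid[r][c]
            # Selection is restricted to the local neighborhood of the cell.
            mate = max((grid[i][j] for i, j in neighbors(r, c, rows, cols)), key=fitness)
            child = [x if rng.random() < 0.5 else y for x, y in zip(current, mate)]
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            new_grid[r][c] = child if fitness(child) >= fitness(current) else current
    return new_grid

rng = random.Random(1)
grid = [[[rng.randint(0, 1) for _ in range(16)] for _ in range(5)] for _ in range(5)]
for _ in range(30):
    grid = cellular_step(grid, rng)
```

Because neighborhoods overlap, a good solution can only spread one cell per generation, which is the slow diffusion referred to above.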


55.2.2 Iteration-Level Parallel Model


In this parallel model, a focus is made on the paral- 
lelization of each iteration of EAs. The iteration-level 
parallel model is generally based on the distribution 
of the handled solutions. Indeed, the most resource- 
consuming part in an EA is the evaluation of the 
generated solutions. Our concerns in this model are 
only search mechanisms that are problem-independent 
operations such as the generation of successive pop- 
ulations. Any search operator of an EA which is not 
specific to the tackled optimization problem is involved 
in the iteration-level parallel model. This model keeps 
the sequentiality of the original algorithm, and, hence, 
the behavior of the EA is not altered. 

Parallel iteration level models arise naturally when 
dealing with EAs, since each element belonging to 
the population is an independent unit. The iteration- 
level parallel model involves the distribution of the 
population. The operations commonly applied to 
each of the population elements are performed in 
parallel. 

The population of individuals can be decomposed
and handled in parallel. In the early work on parallel
EAs, the well-known master-worker method (also known
as global parallelization) was used. In this scheme,
a master performs the selection operations and the
replacement. Selection and replacement are generally
sequential procedures, as they require a global
management of the population. The associated workers
perform the recombination, the mutation, and the
evaluation of the objective function. The master
sends the partitions (subpopulations) to the workers.
The workers return the newly evaluated solutions to
the master.
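
The master-worker scheme can be sketched as below. This is an illustration under assumptions that are not in the chapter: a sphere function stands in for a costly objective (minimized), one task is sent per offspring rather than one partition per worker, the operators and parameter values are arbitrary, and a thread pool replaces a real parallel platform.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def evaluate(solution):
    """Stand-in for a costly objective (assumption: sphere function, to be minimized)."""
    return sum(x * x for x in solution)

def worker(task):
    """Worker side: recombination, mutation, and evaluation of one offspring."""
    parent_a, parent_b, seed = task
    rng = random.Random(seed)                     # per-task seed keeps runs reproducible
    child = [(a + b) / 2.0 for a, b in zip(parent_a, parent_b)]   # arithmetic crossover
    child = [x + rng.gauss(0.0, 0.1) for x in child]              # Gaussian mutation
    return child, evaluate(child)

def generation(population, fitnesses, pool, rng):
    """Master side: selection, dispatch to the workers, then replacement."""
    def tournament():
        i, j = rng.randrange(len(population)), rng.randrange(len(population))
        return population[i] if fitnesses[i] <= fitnesses[j] else population[j]
    tasks = [(tournament(), tournament(), rng.randrange(2**32)) for _ in population]
    results = list(pool.map(worker, tasks))       # blocks until every worker has answered
    merged = list(zip(fitnesses, population)) + [(f, c) for c, f in results]
    merged.sort(key=lambda pair: pair[0])         # elitist replacement: keep the best
    merged = merged[:len(population)]
    return [s for _, s in merged], [f for f, _ in merged]

rng = random.Random(42)
population = [[rng.uniform(-5, 5) for _ in range(5)] for _ in range(20)]
fitnesses = [evaluate(s) for s in population]
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(30):
        population, fitnesses = generation(population, fitnesses, pool, rng)
```

Since the master waits for all results before replacement, the search trajectory is the same as that of the corresponding sequential EA; only the elapsed time of each generation changes.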

According to the order in which the evaluation
phase is performed in comparison with the other parts
of the EA, two modes can be distinguished:


• Synchronous: In the synchronous mode, the master
  manages the evolution process and performs the
  different steps of selection and replacement in
  a serial way. At each iteration, the master
  distributes the set of newly generated solutions
  among the workers and waits for the results to be
  returned. After the results are collected, the
  evolution process is restarted. The model does not
  change the behavior of the EA compared to
  a sequential model. The synchronous execution of
  the model is always synchronized with the return
  of the last evaluated solution.

• Asynchronous: In the asynchronous mode, the
  evaluation phase is not synchronized with the
  other parts of the EA. The master does not wait
  for the return of all evaluations to perform the
  selection, reproduction, and replacement steps.
  The steady-state EA is a good example illustrating
  the asynchronous model and its advantages. In the
  asynchronous model applied to a steady-state EA,
  the recombination and the evaluation steps may be
  done concurrently. The master manages the
  evolution engine and two queues of individuals of
  a given fixed size: individuals to be evaluated,
  and evaluated solutions. The individuals of the
  first queue wait for a free evaluating node. When
  the queue is full, the process blocks. The
  individuals of the second queue are assimilated
  into the population as soon as possible
  (Fig. 55.2). The reproduced individuals are stored
  in a FIFO data structure, which represents the
  individuals to be evaluated. The EA continues its
  execution in an asynchronous manner, without
  waiting for the results of the evaluation phase.
  The selection and reproduction phases are carried
  out until the queue of nonevaluated individuals is
  full. Each evaluator agent picks an individual
  from the data structure, evaluates it, and stores
  the results into another data structure storing
  the evaluated individuals. The order of evaluation
  defined by the selection phase may not be the same
  as in the replacement phase. The replacement phase
  consists in receiving, in a synchronous manner,
  the results of the evaluated individuals, and
  applying a given replacement strategy to the
  current population.


Fig. 55.2 Parallel asynchronous evaluation of
a population (diagram labels: selection and
reproduction, solutions to evaluate (FIFO), parallel
evaluators, evaluated solutions (FIFO), replacement)


In some EAs (e.g., blackboard-based ones), some
information must be shared. For instance, in ant colony
optimization (ACO), the pheromone matrix must be
shared by all ants. The master has to broadcast the
pheromone trails to each worker. Each worker handles
an ant process. It receives the pheromone trails,
constructs a complete solution, and evaluates it.
Finally, each worker sends the constructed and
evaluated solution back to the master. When the master
has received all the constructed solutions, it updates
the pheromone trails [55.16–19].


Fig. 55.3 Combination of the three parallel
hierarchical models (diagram labels: algorithmic level:
independent or cooperative self-contained
metaheuristics; iteration level: parallel handling of
solutions or populations, by neighborhood or population
partitioning; solution level: parallel handling of
a single solution, by functional or data partitioning)


55.2.3 Solution-Level Parallel Model


In this model, problem-dependent operations performed
on solutions are parallelized. In general, the
interest here is the parallelization of the evaluation
of a single solution (objective and/or constraints;
also called the acceleration move parallel model).
This model is particularly interesting when the
objective function or the constraints are time and/or
memory consuming, and/or input/output intensive.
Indeed, most real-life optimization problems need the
intensive calculation of the objectives and/or access
to large input files or databases.

Two different solution-level parallel models may be
carried out:


• Functional decomposition: In function-oriented
  parallelization, the objective function(s) and/or
  constraints are partitioned into different partial
  functions. The objective function(s) or the
  constraints are viewed as the aggregation of some
  partial functions. Each partial function is
  evaluated in parallel. Then, a reduction operation
  is performed on the results returned by the
  computed partial functions. By definition, this
  model is synchronous, so one has to wait for the
  termination of all workers calculating the partial
  functions.

• Data partitioning: For some problems, the objective
  function may require access to a huge database
  that cannot be managed on a single machine. Due to
  this memory requirement constraint, the database
  is distributed among different sites, and data
  parallelism is exploited in the evaluation of the
  objective function. In data-oriented
  parallelization, the same function is computed on
  different partitions of the input data of the
  problem. The data is then partitioned or
  duplicated over different workers.


In the solution-level parallel model, the maximum 
number of parallel operations will be equal to the num- 
ber of partial functions or the number of data partitions. 
A hybrid model can also be used in which a functional 
decomposition and a data partitioning are combined. 
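
The two decompositions can be sketched as follows. The partial functions, the threshold-classifier objective, and the data are hypothetical and serve only to show where the parallel evaluation and the reduction take place; a thread pool again stands in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Functional decomposition: the objective is an aggregation of partial functions.
def travel_cost(solution):      # hypothetical partial function
    return sum(abs(a - b) for a, b in zip(solution, solution[1:]))

def load_penalty(solution):     # hypothetical partial function
    return max(0, sum(solution) - 20)

PARTIAL_FUNCTIONS = [travel_cost, load_penalty]

def evaluate_functional(solution, pool):
    """Evaluate every partial function in parallel, then reduce the results (here: sum)."""
    partial_values = pool.map(lambda f: f(solution), PARTIAL_FUNCTIONS)
    return sum(partial_values)   # the reduction waits for all partial results

# Data partitioning: the same function is applied to different partitions of the data.
def partition_error(args):
    """Count misclassified rows in one partition for a threshold classifier."""
    solution, rows = args
    threshold = solution[0]
    return sum(1 for value, label in rows if (value > threshold) != label)

def evaluate_data_parallel(solution, partitions, pool):
    """Apply the same function to each data partition in parallel, then reduce (here: sum)."""
    return sum(pool.map(partition_error, [(solution, rows) for rows in partitions]))

with ThreadPoolExecutor(max_workers=2) as pool:
    cost = evaluate_functional([1, 4, 2, 10, 8], pool)
    partitions = [[(1.0, False), (3.0, True)], [(2.5, False), (0.5, True)]]
    errors = evaluate_data_parallel([2.0], partitions, pool)
```

In both cases the number of tasks that can run at the same time is bounded by the number of partial functions or data partitions, as stated above.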


55.2.4 Hierarchical Combination 
of the Parallel Models 


The three presented models for parallel EAs may
be used in conjunction within a hierarchical
structure [55.20, 21] (Fig. 55.3). The degree of
parallelism associated with this hybrid model is very
high. Indeed, this hybrid model is very scalable; the
degree of concurrency is k × m × n, where k is the
number of EAs used, m is the size of the population,
and n is the number of partitions or tasks associated
with the evaluation of a single solution.



55.3 Parallel Implementation of EAs 


Parallel implementation of EAs deals with the efficient 
mapping of a parallel model of EAs on a given parallel 
architecture. 


55.3.1 Parallel and Distributed Architectures 


Parallel architectures are evolving quickly. The main 
criteria of parallel architectures, which will have an 
impact on the implementation of parallel EAs, are: 
memory sharing, homogeneity of resources, resource 
sharing by multiple users, scalability, and volatility 
(Fig. 55.4). Those criteria will be used to analyze differ- 
ent parallel models and their efficient implementation. 
A guideline is given for the efficient implementation of 
each parallel model of EAs according to each class of 
parallel architectures. 


Shared Memory/Distributed Memory Architectures.
In shared memory parallel architectures, the processors
are connected by a shared memory. There are
different interconnection schemes for the network (e.g.,
bus, crossbar, multistage crossbar). This architecture
is easy to program: conventional operating systems
and the paradigms of sequential programming can be
used. There is only one address space for data exchange,
but the programmer must take care of synchronization in
memory access, such as mutual exclusion in critical
sections. This type of architecture scales poorly (from 2
to 128 processors in current technologies) and has
a higher cost. Examples of shared memory architectures
are symmetric multiprocessor (SMP) machines and
multicore processors.
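Mutual exclusion in a critical section can be illustrated with threads that share a best-so-far solution. This is a minimal sketch; the class `SharedBest`, the toy objective, and the way candidates are split among threads are hypothetical.

```python
import threading

class SharedBest:
    """Best-so-far solution kept in the single shared address space."""

    def __init__(self):
        self.value = float("inf")
        self.solution = None
        self._lock = threading.Lock()

    def update(self, solution, value):
        # Critical section: the compare-and-replace must not interleave
        # with the same operation performed by another thread.
        with self._lock:
            if value < self.value:
                self.value = value
                self.solution = solution

def worker(shared, candidates):
    for s in candidates:
        shared.update(s, (s - 7) ** 2)   # toy objective with its minimum at s = 7

shared = SharedBest()
threads = [threading.Thread(target=worker, args=(shared, range(k, 20, 4)))
           for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both pass the comparison and overwrite each other's result, leaving `value` and `solution` inconsistent.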

In distributed memory architectures, each processor
has its own memory. The processors are connected by
a given interconnection network using different topologies
(e.g., hypercube, 2D or 3D torus, fat-tree, and
multistage crossbars). This architecture is harder to
program; data and/or tasks have to be explicitly distributed
to processors. Exchanging information is also explicitly
handled using message passing between nodes
(synchronous or asynchronous communications). The cost
of communication is not negligible and must be minimized
to design an efficient parallel EA. However, this
architecture scales well in terms of the number of
processors. In recent years, clusters of workstations
(COWs) have become one of the most popular parallel
distributed memory architectures. A good ratio between
cost and performance is obtained with this class of
architectures.


Homogeneous/Heterogeneous Parallel Architectures.
Parallel architectures may be characterized by
the homogeneity of the processors, communication
networks, operating systems, etc. that they use. For
instance, COWs are in general homogeneous parallel
architectures. The proliferation of powerful workstations
and fast communication networks has led to the
emergence of heterogeneous networks of workstations
(NOWs) as platforms for high-performance computing.
This type of architecture is present in any laboratory,
company, campus, institution, etc. These parallel
platforms are generally composed of a large number of
individually owned heterogeneous workstations shared
by many users.


Shared/Nonshared Parallel Architectures. Most
massively parallel machines (MPPs) and clusters of
workstations (COWs) are generally nonshared by the
applications. Indeed, at a given time, the processors
composing those architectures are dedicated to the
execution of a single application. NOWs constitute
a low-cost hardware alternative to run parallel
algorithms but are in general shared by multiple users and
applications.


Fig. 55.4 Hierarchical and flat classification of target parallel architectures for EAs (dedicated architectures: reconfigurable architectures such as FPGAs, and GPUs; general-purpose parallel architectures: shared or distributed memory, homogeneous or heterogeneous, local or large network, volatile or nonvolatile)


Local Network (LAN)/Wide-Area Network (WAN).
Massively parallel machines, clusters, and local networks
of workstations may be considered as tightly
coupled architectures. Large networks of workstations
and grid computing platforms are loosely coupled and
are affected by a higher cost of communication. During
the last decade, grid computing systems have been
widely deployed to provide high-performance computing
platforms. A computational grid is a scalable pool of
heterogeneous and dynamic resources geographically
distributed across multiple administrative domains and
owned by different organizations [55.22]. Two types of
grids may be distinguished:


• High-Performance Computing Grid (HPC grid):
This grid interconnects supercomputers or clusters
via a dedicated high-speed network. In general, this
type of grid is nonshared by multiple users (at the
level of processors).

• Desktop Grid: This class of grids is composed
of numerous owned workstations connected via
a nondedicated network such as the Internet. This grid
is volatile and shared by multiple users and
applications.


Peer-to-peer networks have been developed in parallel
with grid computing technologies. Peer-to-peer
infrastructures have focused on sharing data and are
increasingly popular for sharing computation.


Volatile/Nonvolatile Parallel Architectures. Desktop
grids constitute an example of volatile parallel
architectures. In a volatile parallel architecture, the
availability of resources varies dynamically in time and
space. In a desktop grid or a large network of shared
workstations, volatility is not the exception but the rule.
Due to the large-scale nature of the grid, the probability
of resource failure is high. For instance, desktop grids
are failure-prone (e.g., reboot, shutdown, and failure).

Table 55.2 recapitulates the characteristics of the
main parallel architectures according to the presented
criteria. Those criteria will be used to analyze the
efficient implementation of the different parallel models of
EAs.


Table 55.2 Characteristics of the main parallel architectures. Hom: Homogeneous, Het: Heterogeneous

Criteria        Memory       Homogeneity  Sharing    Network  Volatility
SMP, Multicore  Shared       Hom          Yes or No  Local    No
COW             Distributed  Hom or Het   No         Local    No
NOW             Distributed  Het          Yes        Local    Yes
HPC grid        Distributed  Het          No         Large    No
Desktop grid    Distributed  Het          Yes        Large    Yes


55.3.2 Dedicated Architectures 


Dedicated hardware represents programmable hardware
or specific architectures that can be designed or
reused to execute a parallel EA. The best-known
dedicated hardware is represented by field programmable
gate arrays (FPGAs) and graphics processing units
(GPUs) (Fig. 55.4).

FPGAs are hardware devices that can be used to
implement digital circuits by means of a programming
process (not to be confused with evolvable hardware,
where the architecture is reconfigured using EAs) [55.23].
The use of Xilinx FPGAs to implement different EAs
is increasingly popular. The design and the prototyping
of an FPGA-based hardware board to execute
parallel EAs may restrict the design of some search
components. However, for some specific challenging
optimization problems with a high use rate, such as in
bioinformatics, dedicated hardware may be a good
alternative.

A GPU is a dedicated graphics rendering device for
a workstation, personal computer, or game console.
Recent GPUs are very efficient at manipulating computer
graphics, and their parallel SIMD structure makes them
more efficient than general-purpose CPUs for a range
of complex algorithms [55.24]. The main companies
producing GPUs are AMD (ATI Radeon series) and
NVIDIA (NVIDIA GeForce series). The use of GPUs
for an efficient implementation of EAs is a challenging
issue [55.25].


55.3.3 Parallel Programming Environments 
and Middlewares 


The architecture of the target parallel machine strongly
influences the choice of the parallel programming
model to use. There are two main parallel programming
paradigms: shared memory and message passing
(Fig. 55.5).


Fig. 55.5 Main parallel programming languages, programming environments and middlewares (shared memory PPE: multithreaded programming with the Pthread library or Java threads, compiler directives with OpenMP; distributed memory PPE: message passing with sockets, PVM, or MPI, remote procedure call with RPC, Java RMI, or GridRPC, object-oriented programming with ProActive or CORBA; PPE: parallel programming environment)

Two main alternatives exist to program shared
memory architectures:


• Multithreading: A thread may be viewed as
a lightweight process. Different threads of the same
process share some resources and the same address
space. The main advantages of multithreading are
the fast context switch, the low resource usage,
and the possible overlap of communication
and computation. Each thread can be executed on
a different processor or core. Multithreaded
programming is available through libraries such as the
standard Pthreads library [55.26] or within programming
languages such as Java (Java threads) [55.27].

• Compiler directives: One of the standard shared
memory paradigms is OpenMP. It provides a set of
compiler directives interfaced with the languages
Fortran, C, and C++ [55.28]. Those directives are
inserted in a program to specify which sections of the
program are to be parallelized by the compiler. CUDA is
often mentioned alongside OpenMP, but it targets GPUs
and is a language extension with a runtime library
rather than a set of directives.


Distributed memory parallel programming environments
are based mainly on the following three
paradigms:


• Message passing: Message passing is probably the
most widely used paradigm to program parallel
architectures. In the message passing paradigm,
processes of a given parallel program communicate
by exchanging messages in a synchronous or
asynchronous way. The well-known programming
environments based on message passing are sockets
and the message passing interface (MPI).

• Remote procedure call: Remote procedure call
(RPC) represents a traditional way of programming
parallel and distributed architectures. It allows
a program to cause a procedure to execute on another
processor.

• Object-oriented models: As in sequential programming,
parallel object-oriented programming is a natural
evolution of RPC. A classical example of such
a model is Java RMI (Remote Method Invocation).
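The difference between synchronous (blocking) and asynchronous (nonblocking) reception of messages can be illustrated with a small in-process emulation. This sketch assumes a `queue.Queue` as a stand-in for a communication channel; it does not use MPI or sockets, and the function names and the end-of-stream marker are conventions of the example only.

```python
import queue
import threading

def producer(mailbox, items):
    for item in items:
        mailbox.put(item)          # send a message
    mailbox.put(None)              # end-of-stream marker (convention of this sketch)

def blocking_consumer(mailbox):
    """Synchronous style: each receive blocks until a message is available."""
    received = []
    while True:
        msg = mailbox.get()
        if msg is None:
            return received
        received.append(msg)

def try_receive(mailbox):
    """Asynchronous style: poll and return immediately if nothing has arrived."""
    try:
        return mailbox.get_nowait()
    except queue.Empty:
        return None

mailbox = queue.Queue()
sender = threading.Thread(target=producer, args=(mailbox, [3, 1, 4]))
sender.start()
messages = blocking_consumer(mailbox)
sender.join()
nothing_pending = try_receive(mailbox)   # the mailbox is empty, so this does not block
```

A process using the nonblocking form can continue its own computation between two polls, which is the property exploited by asynchronous parallel EAs.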


In the last decade, considerable work has been carried
out on the development of grid middleware. The
Globus toolkit represents the de facto standard grid
middleware. It supports the development of distributed
service-oriented computing applications [55.29].

It is not easy to propose a guideline on which
environment to use in programming a parallel EA.
The choice depends on the target architecture, the
parallel model of EAs, and the user preferences. Some
languages, such as C and C++, are more system oriented.
More portability is obtained with Java, but at the price
of lower efficiency; this tradeoff is the classical
efficiency/portability compromise. A Fortran programmer
will be more comfortable with OpenMP. RPC
models are better suited to implementing services. Condor
represents an efficient and easy way to implement
parallel programs on shared and volatile distributed
architectures such as large networks of heterogeneous
workstations and desktop grids, where fault tolerance is
ensured by a checkpoint/recovery mechanism. The use
of MPI within Globus is more or less adapted to
high-performance computing (HPC) grids. However, the user
has to deal with complex mechanisms such as dynamic
load balancing and fault tolerance. Table 55.3 presents
a guideline depending on the target parallel architecture.


Table 55.3 Parallel programming environments for different parallel architectures

Architecture    Examples of suitable programming environment
SMP             Multithreading library within an operating system (e.g., Pthreads)
Multicore       Multithreading within languages: Java
                OpenMP interfaced with C, C++ or Fortran
COW             Message passing library: MPI interfaced with C, C++, Fortran
Hybrid ccNUMA   MPI or hybrid models: MPI/OpenMP, MPI/Multithreading
NOW             Message passing library: MPI interfaced with C, C++, Fortran
                Condor or object models (JavaRMI)
HPC grid        MPICH-G (Globus) or GridRPC models (Netsolve, Diet)
Desktop grid    Condor-G or object models (Proactive)


55.3.4 Performance Evaluation 


For sequential algorithms, the main performance mea- 
sure is the execution time as a function of the input 
size. In parallel algorithms, this measure also depends 
on the number of processors and the characteristics of 
the parallel architecture. Hence, some classical perfor- 
mance indicators such as speedup and efficiency have 
been introduced to evaluate the scalability of parallel al- 
gorithms [55.30]. The scalability of a parallel algorithm 
measures its ability to achieve performance propor- 
tional to the number of processors. 

The speedup $S_N$ is defined as the time $T_1$ it takes to
complete a program with one processor divided by the
time $T_N$ it takes to complete the same program with $N$
processors

$$S_N = \frac{T_1}{T_N}. \qquad (55.1)$$


One can use wall-clock time instead of CPU time. The
CPU time is the time a processor spends in the execution
of the program, and the wall-clock time is the
time of the whole program including the input and output.
Conceptually, the speedup is defined as the gain
achieved by parallelizing a program. If $S_N > N$ (resp.,
$S_N = N$), a superlinear (resp., linear) speedup is
obtained [55.14]. Mostly, a sublinear speedup $S_N < N$ is
obtained. This is due to the overhead of communication
and synchronization costs. The case $S_N < 1$ means
that the sequential time is smaller than the parallel time,
which is the worst case. This occurs when the
communication cost is much higher than the execution cost.

The efficiency $E_N$ using $N$ processors is defined as
the speedup $S_N$ divided by the number of processors $N$

$$E_N = \frac{S_N}{N}. \qquad (55.2)$$

Conceptually, the efficiency measures how well the
$N$ processors are used when the program is computed
in parallel. An efficiency of 100% means that all of
the processors are fully used all the time. For some
large real-life applications, the sequential time cannot
be obtained because the sequential execution of the
algorithm cannot be performed. Then, the incremental
efficiency $E_{NM}$ may be used to evaluate the efficiency
of extending the number of processors from $N$ to $M$
processors

$$E_{NM} = \frac{N \times T_N}{M \times T_M}, \qquad (55.3)$$

where $T_N$ and $T_M$ are the times taken with $N$ and $M$
processors, so that no sequential time is needed.
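The indicators above can be computed directly from measured times. The following sketch uses made-up timings; the function names are hypothetical, and the incremental efficiency is taken as N T_N / (M T_M), that is, computed from parallel times only, which is an assumption about the intended reading of Eq. (55.3).

```python
def speedup(t1, tn):
    """S_N = T_1 / T_N, Eq. (55.1)."""
    return t1 / tn

def efficiency(t1, tn, n):
    """E_N = S_N / N, Eq. (55.2)."""
    return speedup(t1, tn) / n

def classify(t1, tn, n):
    """Name the speedup regime discussed in the text."""
    s = speedup(t1, tn)
    if s > n:
        return "superlinear"
    if s == n:
        return "linear"
    if s < 1:
        return "slower than sequential"
    return "sublinear"

def incremental_efficiency(tn, n, tm, m):
    """(N * T_N) / (M * T_M): uses parallel times only (assumed reading of Eq. 55.3)."""
    return (n * tn) / (m * tm)

# Illustrative wall-clock times in seconds, indexed by number of processors.
times = {1: 120.0, 4: 40.0, 8: 25.0}
s4 = speedup(times[1], times[4])                           # 3.0
e4 = efficiency(times[1], times[4], 4)                     # 0.75
regime4 = classify(times[1], times[4], 4)                  # "sublinear"
e48 = incremental_efficiency(times[4], 4, times[8], 8)     # 0.8
```

With these numbers, doubling the processors from 4 to 8 retains 80% of the ideal gain, a statement that can be made even when the run on one processor is not available.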


Different definitions of speedup may be used depending
on the definition of the sequential time reference $T_1$.
Asking what is the best measure is useless; there is no
global dominance between the different measures. The
choice of a given definition depends on the objective of
the performance evaluation analysis. It is therefore
important to specify clearly the choice and the objective of
the analysis.

The absolute speedup is used when the sequential
time $T_1$ corresponds to the best-known sequential time
to solve the problem. Unlike other scientific domains
such as numerical algebra where for some operations
the best sequential algorithm is known, in EA search,
it is difficult to identify the best sequential algorithm.
So, the absolute speedup is rarely used. The relative
speedup is used when the sequential time $T_1$ corresponds
to the parallel program executed on a single
processor.

Moreover, different stopping conditions may be 
used: 


• Fixed number of iterations: This condition is the
most used to evaluate the efficiency of a parallel EA.
Using this definition, a superlinear speedup $S_N > N$
is possible. This is due to the characteristics of the
parallel architecture, where there are more resources
(e.g., size of main memory and cache) than in a single
processor (Fig. 55.6a). For instance, the search
memory of an EA executed on a single processor
may be larger than the main memory of a single
processor and then some swapping will be carried
out, which represents an overhead in the sequential
time. When using a parallel architecture, the whole
memory of the EA may fit in the main memory of
its processors, and then the memory swapping
overhead will not occur.

• Convergence to a solution with a given quality: This
measure is interesting to evaluate the effectiveness
of a parallel EA. It is only valid for parallel models
of EAs based on the algorithmic level, which alters
the behavior of the sequential EA. A superlinear
speedup is possible and is due to the characteristics
of the parallel search (Fig. 55.6b). Indeed, the
order of searching different regions of the search
space may be different from sequential search. The
sequences of visited solutions in parallel and
sequential search are different. This is similar to the
superlinear speedups obtained in exact search
algorithms such as branch and bound (this phenomenon
is called speedup anomaly) [55.31].


Most evolutionary algorithms are stochastic algorithms
(scatter search, if considered as an evolutionary
algorithm, is a deterministic algorithm). When the
stopping condition is based on the quality of the solution,
one cannot use the speedup metric as defined previously.
The original definition may be extended to the
average speedup

$$S_N = \frac{E(T_1)}{E(T_N)}. \qquad (55.4)$$

For a fairer experimental performance evaluation, the
same seeds for the generation of random numbers
must be used in the compared runs.
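The average speedup can be estimated from repeated runs that share the same list of seeds. In the sketch below, `run_time` is a hypothetical stand-in for one timed run that stops at a target quality; its cost model and the noise range are assumptions made only so that the example is self-contained.

```python
import random
from statistics import mean

def run_time(n_processors, seed):
    """Hypothetical stand-in for one timed run stopped at a target quality."""
    rng = random.Random(seed)
    base = 100.0 / n_processors + 2.0      # assumed parallel part plus fixed overhead
    return base * rng.uniform(0.8, 1.2)    # run-to-run variation of a stochastic EA

def average_speedup(n_processors, seeds):
    """E(T_1) / E(T_N), with both means taken over the same seeds."""
    t1 = mean(run_time(1, s) for s in seeds)
    tn = mean(run_time(n_processors, s) for s in seeds)
    return t1 / tn

seeds = list(range(30))
s8 = average_speedup(8, seeds)
```

Because both configurations are measured with identical seeds, the random factor cancels in this toy model and the estimate reflects only the modeled costs; with real runs the shared seeds reduce, but do not remove, the variance of the comparison.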

The speedup metrics have to be reformulated for 
heterogeneous architectures. The efficiency metric may 
be used for this class of architectures. Moreover, it 
can be used for shared parallel machines with multiple 
users. 


55.3.5 Main Properties of Parallel EAs 


The performance of a parallel EA on a given par- 
allel architecture depends mainly on its granularity. 


Fig. 55.6a,b Superlinear speedups for a parallel EA. (a) Parallel architecture source: memory hierarchy (processors, caches, and main memories holding the search memory, linked by an interconnection network). (b) Parallel search source: parallel search trajectories (local searches started from different initial solutions reach different local optima)


The granularity of a parallel program is the amount of
computation performed between two communications.
It is measured as the ratio between the computation time
and the communication time. The three parallel models
(algorithmic level, iteration level, and solution level)
have a decreasing granularity, from coarse-grained to
fine-grained. The granularity has an important
impact on the speedup: the larger the granularity, the
better the obtained speedup.
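The link between granularity and speedup can be made concrete with a deliberately simple cost model. The model below (perfectly divisible computation plus a fixed communication time per step) is an assumption introduced for illustration, as are the function names and the numbers.

```python
def granularity(computation_time, communication_time):
    """Ratio between computation time and communication time."""
    return computation_time / communication_time

def modeled_speedup(computation_time, communication_time, n):
    """Assumed model: the work divides evenly over n processors,
    and a fixed communication time is added to the parallel run."""
    parallel_time = computation_time / n + communication_time
    return computation_time / parallel_time

coarse = modeled_speedup(100.0, 1.0, n=10)    # granularity 100
fine = modeled_speedup(100.0, 20.0, n=10)     # granularity 5
```

Under this model the coarse-grained case stays close to the ideal speedup of 10, while the fine-grained case loses most of it to communication.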

The degree of concurrency of a parallel EA is represented
by the maximum number of parallel processes at
any time. This measure is independent of the target
parallel architecture. It is an indication of the number
of processors that can be employed usefully by the
parallel EA.

Asynchronous communications and the overlap of
computation and communication are also important
issues for an efficient parallel implementation.
Indeed, most current processors integrate different
parallel elements such as an arithmetic logic unit
(ALU), a floating point unit (FPU), a graphical processing
unit (GPU), direct memory access (DMA), etc. Most of
the computation takes place in cache. Hence, the random
access memory (RAM) bus is often free and can
be used by other elements such as the DMA. Input/output
operations can therefore be overlapped with computation
tasks.

Scheduling the different tasks composing a parallel EA
is another classical issue to deal with for an efficient
implementation. Different scheduling strategies may be
used depending on whether or not the number and the
location of work units (tasks, data) depend on the load
state of the target machine:


• Static scheduling: This class represents parallel EAs
in which both the number of tasks of the application
and the location of work (tasks, data) are determined
at compile time. Static scheduling is useful
for homogeneous architectures and for nonshared,
nonvolatile heterogeneous parallel architectures. Indeed,
when there are noticeable load or power differences
between processors, the search time of an iteration is
determined by the maximum execution time over all
processors, presumably that of the most highly loaded
processor or the least powerful processor. A significant
number of tasks are then often idle, waiting for other
tasks to complete their work.

• Dynamic scheduling: This class represents parallel
EAs for which the number of tasks is fixed at
compile time, but the location of work is determined
and/or changed at run time. The tasks are
dynamically scheduled on different processors of
the parallel architecture. Dynamic load balancing
is important for shared (multiuser) architectures,
where the load of a given processor cannot be
determined at compile time. Dynamic scheduling is
also important for irregular parallel EAs in which
the execution time cannot be predicted at compile
time and varies during the search. For instance, this
happens when the evaluation cost of the objective
function depends on the solution.

Many dynamic load-balancing strategies may be applied.
For instance, during the search, each time
a processor finishes its work, it issues a work request.
The degree of parallelism of this class of
scheduling algorithms is not related to load variations
in the target machine. When the number of
tasks exceeds the number of idle nodes, multiple
tasks are assigned to the same node. Moreover,
when there are more idle nodes than tasks, some of
them will not be used.

• Adaptive scheduling: Parallel adaptive algorithms
are parallel computations with a dynamically
changing set of tasks. Tasks may be created or killed
as a function of the load state of the parallel machine.
A task is created automatically when a node
becomes idle. When a node becomes busy, the task
is killed. Adaptive load balancing is important for
volatile architectures such as desktop grids.
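The effect of static versus dynamic scheduling on irregular tasks can be sketched with a small deterministic simulation. The task costs, the round-robin static mapping, and the function names are assumptions of this example; the dynamic variant models the work-request strategy in which an idle processor takes the next task.

```python
import heapq

def static_makespan(costs, n_procs):
    """Tasks are mapped round-robin before the run; no rebalancing occurs."""
    loads = [0.0] * n_procs
    for i, cost in enumerate(costs):
        loads[i % n_procs] += cost
    return max(loads)

def dynamic_makespan(costs, n_procs):
    """Each processor requests the next task as soon as it becomes idle."""
    finish_times = [0.0] * n_procs
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)      # first processor to become idle
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Irregular evaluation costs: two expensive tasks among cheap ones.
costs = [8, 1, 1, 1, 8, 1, 1, 1]
static_time = static_makespan(costs, n_procs=4)     # both expensive tasks on one processor
dynamic_time = dynamic_makespan(costs, n_procs=4)
```

In this instance the static mapping places both expensive tasks on the same processor, so the iteration time is set by that processor while the others stay idle; the dynamic strategy spreads them.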


For some parallel and distributed architectures such
as shared networks of workstations and grids, fault
tolerance is an important issue. Indeed, in volatile shared
architectures and large-scale parallel architectures, the
fault probability is relatively high. Checkpointing
and recovery techniques constitute one answer to
this problem. Application-level checkpointing is much
more efficient than system-level checkpointing. Indeed,
in system-level checkpointing, a checkpoint of
the global state of a distributed application composed
of a set of processes is carried out. In application-level
checkpointing, only minimal information is
checkpointed (e.g., population of individuals, generation
number). Compared to system-level checkpointing,
a reduced cost is then obtained in terms of memory and
time.
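Application-level checkpointing of an EA can be sketched as saving only the population and the generation number. The file name, the JSON format, and the helper names `save_checkpoint` and `load_checkpoint` are assumptions of this example, not part of any particular middleware.

```python
import json
import os
import tempfile

def save_checkpoint(path, population, generation):
    """Write only the minimal EA state, then rename so readers never see a partial file."""
    state = {"generation": generation, "population": population}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)

def load_checkpoint(path):
    """Restore the population and generation number after a failure."""
    with open(path) as f:
        state = json.load(f)
    return state["population"], state["generation"]

checkpoint_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(checkpoint_dir, "ea_state.json")

population = [[0, 1, 1, 0], [1, 1, 0, 0]]     # toy binary individuals
save_checkpoint(checkpoint_path, population, generation=42)
restored_population, restored_generation = load_checkpoint(checkpoint_path)
```

The checkpoint here is a few bytes per individual, whereas a system-level checkpoint would have to capture the full memory image of every process.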

Finally, security issues may be important for large- 
scale distributed architectures such as grids and peer- 
to-peer systems (multidomain administration, firewall, 
etc.) and some specific applications such as medical and 
bioinformatics research applications of industrial con- 
cern. 


55.3.6 Algorithmic-Level Parallel Model 


Granularity 

The algorithmic-level parallel model has the largest
granularity. Indeed, the time for exchanging the
information is in general much less than the computation
time of an EA. There are relatively low communication
requirements for this model. The higher the frequency
of exchange and the larger the size of the exchanged
information, the smaller the granularity. This parallel
model is the best suited to large-scale distributed
architectures over the Internet, such as grids. Moreover, the
trivial model with independent algorithms is convenient
for low-speed networks of workstations over an intranet.
As there is no essential dependency or communication
between the algorithms, the speedup is generally
linear for this parallel model.



For an efficient implementation, the frequency of 
exchange (resp., the size of the exchanged data) must 
be correlated to the latency (resp., bandwidth) of the 
communication network of the parallel architecture. 

To optimize the communication between proces- 
sors, the exchange topology can be specified according 
to the interconnection network of the parallel archi- 
tecture. The specification of the different parameters 
associated with the blind or intelligent migration deci- 
sion criterion (migration frequency/probability and im- 
provement threshold) is particularly crucial on a com- 
putational grid. Indeed, due to the heterogeneous nature 
of computational grids these parameters must be spec- 
ified for each EA in accordance with the machine it is 
hosted on. 


Scalability 
The degree of concurrency of the algorithmic-level par- 
allel model is limited by the number of EAs involved in 
solving the problem. In theory, there is no limit. How- 
ever, in practice, it is limited by the owned resources of 
the target parallel architectures, and also by the effec- 
tiveness aspect of using a large number of EAs. 


Synchronous Versus Asynchronous Communications
The implementation of the algorithmic-level model is
either asynchronous or synchronous. The asynchronous
mode associates with each EA an exchange decision
criterion, which is evaluated at each iteration of the EA
from the state of its memory. If the criterion is satisfied,
the EA communicates with its neighbors. The exchange
requests are managed by the destination EAs within an
undetermined delay. The reception and integration of
the received information is thus performed during the
next iterations. However, in a computational grid context,
due to hardware and/or software heterogeneity,
the EAs could be at different evolution stages,
leading to the noneffect and/or supersolution problem.
For instance, the arrival of poor solutions at a very
advanced stage will not bring any contribution, as these
solutions will likely not be integrated (noneffect). In the
opposite situation, the arrival of very good solutions at
an early stage (supersolution) will lead to premature
convergence.

From another point of view, as it is nonblocking,
the asynchronous model is more efficient and fault
tolerant, provided that a threshold of wasted exchanges is
not exceeded. In the synchronous mode, the EAs perform
a synchronization operation at a predefined iteration by
exchanging some data. Such an operation guarantees that
the EAs are at the same evolution stage, and so prevents
the noneffect and supersolution problems mentioned above.
However, in heterogeneous parallel architectures, the
synchronous mode is less efficient in terms of consumed
CPU time. Indeed, the evolution process often hangs
on powerful machines, which wait for the less powerful
ones to complete their computation. The synchronous
model is also not fault tolerant, as a fault of a single EA
blocks the whole model in a volatile
environment. The synchronous mode is thus globally
less efficient on a computational grid.

Asynchronous communication is more efficient
than synchronous communication for shared architectures
such as NOWs and desktop grids (e.g., multiple
users, multiple applications). Indeed, as the load of
networks and processors is not homogeneous, the use
of synchronous communication will degrade the
performance of the whole system. The least powerful
machine will determine the performance.

On a volatile computational grid, it is difficult to
efficiently maintain topologies such as rings and tori.
Indeed, the disappearance of a given node (i.e., EA)
requires a dynamic reconfiguration of the topology. Such
reconfiguration is costly and makes the migration process
inefficient. Designing a cooperation between a set
of EAs without any fixed topology may be considered. For
instance, a communication scheme in which the target
EA is selected randomly is more efficient for volatile
architectures such as desktop grids. Many experimental
results show that such a topology allows a significant
improvement of the robustness and quality of solutions.
The random topology is therefore conceivable and even
advisable in a computational grid context.
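A migration scheme without a fixed topology can be sketched as follows. Individuals are represented only by their fitness values (minimization), and the dictionary of islands, the replacement rule, and the name `migrate_random` are assumptions of this illustration.

```python
import random

def migrate_random(islands, rng, n_migrants=1):
    """Each EA sends copies of its best individuals to a randomly chosen EA."""
    alive = list(islands)                      # EAs that are currently reachable
    for source in alive:
        targets = [name for name in alive if name != source]
        if not targets:
            continue
        target = rng.choice(targets)           # no ring or torus to maintain
        migrants = sorted(islands[source])[:n_migrants]
        # The target keeps its population size: the best of (residents + migrants).
        merged = sorted(islands[target] + migrants)
        islands[target] = merged[:len(islands[target])]

islands = {
    "ea0": [5.0, 9.0, 7.0],
    "ea1": [1.0, 8.0, 6.0],
    "ea2": [4.0, 3.0, 2.0],
}
del islands["ea1"]                             # a node disappears from the grid
migrate_random(islands, random.Random(1))
```

When a node disappears, the remaining EAs simply draw their targets from the nodes that are still alive, so no reconfiguration step is required.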


Scheduling 
Concerning the scheduling aspect, in the algorithmic- 
level parallel model the tasks correspond to EAs. 
Hence, the different scheduling strategies will differ as 
follows: 


• Static scheduling: The number of EAs is constant
and correlated to the number of processors of the
parallel machine. A static mapping between the EAs
and the processors is realized. The location of the
EAs does not change during the search.

• Dynamic scheduling: EAs are dynamically scheduled
on different processors of the parallel architecture.
Hence, EAs may migrate between different machines
during the search.

• Adaptive scheduling: The number of EAs involved
in the search varies dynamically. For example,
when a machine becomes idle, a new EA is
launched to perform a new search. When a machine
becomes busy or faulty, the associated EA is
stopped.


Fault Tolerance 
The memory state of the algorithmic-level parallel
model required for the checkpointing mechanism is
composed of the memory of each EA and the information
being migrated (i.e., population, generation
number).


55.3.7 Iteration-Level Parallel Model 


Granularity 

A medium granularity is associated with the iteration- 
level parallel model. The ratio between the evaluation 
of a partition and the communication cost of a parti- 
tion determines the granularity. This parallel model is 
then efficient if the evaluation of a solution is time- 
consuming and/or there are a large number of candidate 
solutions to evaluate. The granularity will depend on the 
number of solutions in each subpopulation. 


Scalability 
The degree of concurrency of this model is limited by 
the size of the population. The use of large populations 
will increase the scalability of this parallel model. 


Synchronous Versus Asynchronous Communications
Introducing asynchronism in the iteration-level parallel
model will increase the efficiency of parallel EAs. In the
iteration-level parallel model, asynchronous communications
are related to the asynchronous evaluation of
partitions and construction of solutions. Unfortunately,
this model is more or less synchronous. Asynchronous
evaluation is more efficient for heterogeneous, shared,
or volatile parallel architectures. Moreover, asynchronism
is necessary for optimization problems where the
computation cost of the objective function (and constraints)
depends on the solution, so that different solutions
may have different evaluation costs.

Asynchronism may be introduced by relaxing the 
synchronization constraints. For instance, steady-state 
algorithms may be used in the reproduction phase. 

The two main advantages of the asynchronous model over the
synchronous model are fault tolerance and robustness when fitness
computations take very different amounts of time. In the synchronous
model, time-out detection can be used to address the former issue,
and the latter can be partially overcome by setting the grain to
very small values, as individuals are then sent out for evaluation
upon request of the workers. Nevertheless, the synchronous model is
blocking and, thus, less efficient on a heterogeneous computational
grid. Moreover, as it is not fault tolerant, the disappearance of an
evaluating agent requires the redistribution of its individuals to
other agents. As a consequence, it is essential to store all the
solutions not yet evaluated. The scalability of the model is limited
by the size of the population.


Scheduling 
In the iteration-level parallel model, tasks correspond to 
the construction/evaluation of a set of solutions. Hence, 
the different scheduling strategies will differ as follows: 


• Static scheduling: Here, a static partitioning of the population
is applied. For instance, the population is decomposed into
equal-size partitions depending on the number of processors of the
parallel homogeneous nonshared machine. A static mapping between
the partitions and the processors is realized. For a heterogeneous
nonshared machine, the size of each partition must be initialized
according to the performance of the processors. The static
scheduling strategy is not efficient when equal partitions have
variable computational costs. This happens for optimization
problems where different costs are associated with the evaluation
of solutions. For instance, in genetic programming individuals may
vary widely in size and complexity. This makes a static scheduling
of the parallel evaluation of the individuals inefficient
[55.32, 33].

• Dynamic scheduling: A static partitioning is applied but a dynamic
migration of tasks can be carried out depending on the varying load
of processors. The number of tasks generated may be equal to the
size of the population. Many tasks may be mapped on the same
processor. Hence, more flexibility is obtained for the scheduling
algorithm. For instance, the approach based on master-workers cycle
stealing may be applied. Each worker is first allocated a small
number of solutions. Once it has performed its iterations, the
worker requests additional solutions from the master. All the
workers are stopped once the final result is returned. Faster and
less loaded processors handle more solutions than the others. This
approach reduces the execution time compared to the static one.

• Adaptive scheduling: The objective in this model is to adapt the
number of partitions generated to the load of the target
architecture. More efficient scheduling strategies are obtained for
shared, volatile, and heterogeneous parallel architectures such as
desktop grids.
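
The difference between static and dynamic scheduling can be illustrated with a small simulation. The following is a minimal sketch, not an implementation taken from the chapter: the function names, the evaluation costs, and the processor speeds are hypothetical, and communication costs are ignored. It compares the completion time of equal-size static partitions with a master-workers scheme in which each idle worker requests the next chunk of solutions.

```python
import heapq


def static_makespan(costs, speeds):
    """Static scheduling: equal-size contiguous partitions, one per processor."""
    p = len(speeds)
    size = -(-len(costs) // p)  # ceiling division
    finish_times = []
    for k, speed in enumerate(speeds):
        partition = costs[k * size:(k + 1) * size]
        finish_times.append(sum(partition) / speed)
    return max(finish_times)


def dynamic_makespan(costs, speeds, chunk=1):
    """Dynamic scheduling: the next chunk goes to the first worker that is idle."""
    # Heap of (time at which the worker becomes idle, worker id).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    handled = [0] * len(speeds)  # number of solutions evaluated per worker
    makespan = 0.0
    for start in range(0, len(costs), chunk):
        batch = costs[start:start + chunk]
        t, w = heapq.heappop(idle)
        t_done = t + sum(batch) / speeds[w]
        handled[w] += len(batch)
        makespan = max(makespan, t_done)
        heapq.heappush(idle, (t_done, w))
    return makespan, handled


# Hypothetical workload: four expensive evaluations followed by twelve cheap ones,
# on a heterogeneous machine with processors of relative speed 1, 1, 2 and 4.
costs = [5.0] * 4 + [1.0] * 12
speeds = [1.0, 1.0, 2.0, 4.0]

static_time = static_makespan(costs, speeds)
dynamic_time, handled = dynamic_makespan(costs, speeds)
print(static_time, dynamic_time, handled)
```

In this example the static partitioning assigns all expensive evaluations to a slow processor, whereas under the dynamic scheme the faster processors handle more solutions, as described above.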


Fault Tolerance 
The memory of the iteration-level parallel model re- 
quired for the checkpointing mechanism is composed 
of different partitions. The partitions are composed of 
a set of (partial) solutions and their associated objective 
values. 


55.3.8 Solution-Level Parallel Model 


Granularity 
This parallel model has a fine granularity. Its communication
requirements are relatively high. In the functional decomposition
parallel model, the granularity will depend on the ratio between the
evaluation cost of the subfunctions and the communication cost of
a solution. In the data decomposition parallel model, it depends on
the ratio between the evaluation of a data partition and its
communication cost.

The fine granularity of this model makes it less suitable for
large-scale distributed architectures where the communication cost
(in terms of latency and/or bandwidth) is relatively important, such
as in grid computing systems. Indeed, its implementation is often
restricted to clusters, networks of workstations, or shared-memory
machines.


Scalability 
The degree of concurrency of this parallel model is limited by the
number of subfunctions or data partitions. Although its scalability
is limited, using the solution-level parallel model in conjunction
with the two other parallel models makes it possible to extend the
scalability of a parallel EA.


Synchronous Versus Asynchronous Communications
The implementation of the solution-level parallel model is always
synchronous following a master-workers paradigm. Indeed, the master
must wait for all partial results to compute the global value of the
objective function. The execution time T will be bounded by the
maximum time T_i of the different tasks. An exception occurs for
hard-constrained optimization problems, where feasibility of the
solution is first tested. The master terminates the computations as
soon as a given task detects that the solution does not satisfy
a given hard constraint. Due to its heavy synchronization steps, this
parallel model is worth applying to problems in which the
calculations required at each iteration are time consuming. The
relative speedup may be approximated as follows:

S_n = \frac{T}{\alpha + T/n},    (55.5)

where \alpha is the communication cost.
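
As a quick numerical check of Eq. (55.5), the approximation can be evaluated directly. This is a minimal sketch under one stated assumption: the text does not define n, so it is taken here to be the number of parallel tasks over which the evaluation is split. The function name and the sample values are hypothetical.

```python
def relative_speedup(T, n, alpha):
    """Approximate relative speedup T / (alpha + T / n) of Eq. (55.5).

    T     -- sequential evaluation time of one solution
    n     -- number of parallel tasks (assumption: not defined in the text)
    alpha -- communication cost
    """
    return T / (alpha + T / n)


# Without communication cost the speedup is ideal (equal to n).
print(relative_speedup(T=8.0, n=4, alpha=0.0))

# A communication cost of the same order as T / n halves the speedup.
print(relative_speedup(T=8.0, n=4, alpha=2.0))
```

For a fixed communication cost the speedup cannot exceed T / alpha however large n becomes, which is consistent with the remark that the model is worthwhile mainly when each evaluation is time consuming.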


Scheduling 
In the solution-level parallel model, tasks correspond 
to subfunctions in the functional decomposition and to 
data partitions in the data decomposition model. Hence, 
different scheduling strategies will differ as follows: 


• Static scheduling: Usually, the subfunctions or data are decomposed
into equal-size partitions depending on the number of processors of
the parallel machine. A static mapping between the subfunctions (or
data partitions) and the processors is applied. As for the other
parallel models, this static scheme is efficient for parallel
homogeneous nonshared machines. For a heterogeneous nonshared
machine, the size of each partition in terms of subfunctions or data
must be initialized according to the performance of the processors.

• Dynamic scheduling: Dynamic load balancing will be necessary for
shared parallel architectures or variable costs for the associated
subfunctions or data partitions. Dynamic load balancing may be easily
achieved by evenly distributing at run time the subfunctions or the
data among the processors. In optimization problems where the
computing cost of the subfunctions is unpredictable, dynamic load
balancing is necessary. Indeed, a static scheduling cannot be
efficient because there is no appropriate estimation of the task
costs (i.e., unpredictable costs).

• Adaptive scheduling: In adaptive scheduling, the number of
subfunctions or data partitions generated is adapted to the load of
the target architecture. More efficient scheduling strategies are
obtained for shared, volatile, and heterogeneous parallel
architectures such as desktop grids.


Fault Tolerance 
The memory of the solution-level parallel model re- 
quired for the checkpointing mechanism is straightfor- 
ward. It is composed of the solution and its partial 
objective value calculations. 

Depending on the target parallel architecture, Table 55.4 presents
a general guideline for the efficient implementation of the different
parallel models of EAs. For each parallel model (algorithmic level,
iteration level, and solution level), the table shows its
characteristics according to the outlined criteria (granularity,
scalability, asynchronism, scheduling and fault tolerance).


Table 55.4 Efficient implementation of parallel EAs according to some performance metrics and used strategies

| Property | Algorithmic level | Iteration level | Solution level |
|----------|-------------------|-----------------|----------------|
| Granularity | Coarse (frequency of exchange, size of information) | Medium (nb. of solutions per partition) | Fine (eval. subfunctions, eval. data partitions) |
| Scalability | Number of EAs | Neighborhood size, population size | Nb. of subfunctions, nb. data partitions |
| Asynchronism | High (information exchange) | Moderate (eval. of solutions) | Exceptional (feasibility test) |
| Scheduling and fault tolerance | EA | Solution(s) | Partial solution(s) |


55.4 Parallel EAs Under ParadisEO


Designing generic software frameworks to deal with the design and
efficient transparent implementation of parallel and distributed EAs
is an important challenge. Indeed, efficient implementation of
parallel EAs is a complex task, which depends on the type of the
parallel architecture used. In designing a software framework for
parallel EAs, one has to keep in mind the following important
properties: portability, efficiency, ease of use, and flexibility in
terms of parallel architectures and models.

Several white-box frameworks for the reusable design of parallel EAs
have been proposed and are available from the Web. The most important
of them are: DREAM (distributed resource evolutionary algorithm
machine) [55.34], ECJ (Java evolutionary computation) [55.35], JDEAL
(Java distributed evolutionary algorithms library) and Distributed
BEAGLE (distributed Beagle engine advanced genetic learning
environment) [55.36]. These frameworks are reusable as they are based
on a clear object-oriented conceptual separation. They are also
portable as they are developed in Java; the exception is the last
system, which is programmed in C++. However, they are limited
regarding the parallel distributed models. Indeed, in DREAM and ECJ
only the island model is implemented using Java threads and TCP/IP
sockets. DREAM is particularly deployable on peer-to-peer platforms.
Furthermore, JDEAL provides only the master-worker model
(iteration-level parallel model) using TCP/IP sockets. The latter
also provides the synchronous migration-based island model, but
implemented on a single processor.

Few frameworks available on the Web are devoted to EAs and their
hybridization. MALLBA [55.37], MAFRA (Java Memetic Algorithms
Framework) [55.38] and ParadisEO are good examples of such
frameworks. MAFRA is developed in Java using design patterns [55.39].
It is strongly hybridization-oriented, but it is very limited
regarding parallelism and distribution. MALLBA and ParadisEO have
numerous similarities. They are C++/MPI open source frameworks. They
provide all the previously presented distributed models, and
different hybridization mechanisms. However, they are quite different
as ParadisEO is more flexible thanks to the finer granularity of its
classes. Moreover, ParadisEO also provides the MPI-based
communication layer and Pthreads-based multithreading. MALLBA is
deployable on wide area networks using NetStream, a message passing
service upon MPI [55.37]. ParadisEO is deployable on grid computing
platforms using the Globus toolkit [55.21].

ParadisEO-PEO offers transparent implementation of the different
parallel models on different architectures using suitable programming
environments, and it makes the three main parallel models easy to
implement. The algorithmic-level parallel model allows several
optimization algorithms to cooperate and exchange any kind of data.
The iteration-level parallel model parallelizes and distributes a set
of identical operations. In the solution-level parallel model, any
calculation block specific to the optimization problem can be divided
into smaller units to speed up the treatment and gain efficiency.

ParadisEO contains three interconnected modules (Fig. 55.7): EO for
evolutionary algorithms (population-based metaheuristics), MO for
single solution-based metaheuristics (e.g., local search, tabu
search, simulated annealing), and MOEO for multiobjective
evolutionary algorithms. ParadisEO offers transparency in the sense
that the user does not have to deal explicitly with parallel
programming. One has just to instantiate the needed ParadisEO
components. The implementation is portable on distributed-memory
machines as well as on shared-memory multiprocessors. The user does
not have to manage the communications and threads-based concurrency.
Moreover, the same parallel design (i.e., the same program) is
portable over different architectures. Hence, ParadisEO-PEO has been
implemented on different parallel programming environments and
middlewares (MPI, Pthreads, Condor, Globus, CUDA) which are adapted
to different target architectures (shared and distributed memory,
cluster and network of workstations, desktop and high-performance
grid computing platforms, GPUs) (Fig. 55.7). The deployment of the
presented parallel and distributed models is transparent for the
user.


Fig. 55.7 ParadisEO-PEO implementation under different parallel programming environments and middlewares
(Modules shown: ParadisEO-EO, ParadisEO-MO, ParadisEO-MOEO, ParadisEO-PEO. Target architectures shown: distributed-memory architectures such as clusters; shared-memory architectures such as SMP and multi-cores; GPUs; networks of workstations and desktop grids; high-performance grids.)


55.5 Conclusions and Perspectives


Parallel and distributed computing can be used in the design and
implementation of EAs to speed up the search, to improve the quality
of the obtained solutions, to improve the robustness, and to solve
large-scale problems. The clear separation between parallel design
and parallel implementation aspects of EAs is important to analyze
parallel EAs. The most important lessons of this chapter can be
summarized as follows:


• In terms of parallel design, the different parallel models for
mono-objective EAs have been unified. Three hierarchical parallel
models have been extracted: algorithmic-level, iteration-level, and
solution-level parallel models.

• In terms of parallel implementation, the question of an efficient
mapping of a parallel model of EAs on a given parallel architecture
and programming environment (i.e., language, library, and middleware)
is handled. The focus was on the key criteria of parallel
architectures that influence the efficiency of an implementation of
parallel EAs.

• The use of the ParadisEO-PEO software framework allows the parallel
design of the different parallel models of EAs. It also allows their
transparent and efficient implementation on different parallel and
distributed architectures (e.g., clusters and networks of
workstations, multicores, GPUs, high-performance computing and
desktop grids) using suitable programming environments (e.g., MPI,
Threads, Globus, Condor, CUDA).


One of the perspectives in the coming years is to achieve petascale
performance. The emergence of heterogeneous platforms composed of
multicore and many-core chip technologies will speed up the
achievement of this goal. In terms of programming models, cloud
computing will become an important alternative to traditional
high-performance computing for the development of large-scale EAs
that harness massive computational resources. This is a great
challenge, as cloud frameworks for parallel EAs are only just
emerging.

In the future design of high-performance computers, the ratio between
power and performance will be increasingly important. The power
represents the electrical power consumption of the computer. An
excess in power consumption uses unnecessary energy, generates waste
heat and decreases reliability. Very few vendors of high-performance
architecture publicize the power consumption data compared to the
performance data (the web site www.green500.org ranks the top 500
machines using the number of megaflops they produce for each watt of
power and complements the www.top500.org site).

In terms of target optimization problems, parallel EAs constitute
unavoidable approaches to solve large-scale real-life challenging
problems (e.g., engineering design, drug design) [55.23]. They are
also an important alternative to solve dynamic and robust
optimization problems, in which the complexities in terms of time and
quality are more difficult to handle by traditional sequential
approaches. Moreover, parallel models for optimization and learning
problems under the presence of uncertainty have to be deeply
investigated.


References


55.1 E.-G. Talbi: Metaheuristics: From Design to Implementation (Wiley, Hoboken 2009)
55.2 H. Mühlenbein: Parallel genetic algorithms, population genetics and combinatorial optimization, 3rd Int. Conf. Genet. Algorithms (1989) pp. 416-421
55.3 E. Alba, M. Tomassini: Parallelism and evolutionary algorithms, IEEE Trans. Evol. Comput. 6(5), 443-462 (2002)
55.4 E. Alba, E.-G. Talbi, G. Luque, N. Melab: Metaheuristics and parallelism. In: Parallel Metaheuristics, ed. by E. Alba (Wiley, Hoboken 2005)
55.5 J. Cohoon, S. Hedge, W. Martin, D. Richards: Punctuated equilibria: A parallel genetic algorithm, Second Int. Conf. Genet. Algorithms (1987) pp. 148-154
55.6 T. Belding: The distributed genetic algorithm revisited, 6th Int. Conf. Genet. Algorithms (1995)
55.7 E. Cantú-Paz: Efficient and Accurate Parallel Genetic Algorithms (Kluwer, Boston 2000)
55.8 E. Alba, J.M. Troya: Influence of the migration policy in parallel distributed GAs with structured and panmictic populations, Appl. Intell. 12(3), 163-181 (2000)
55.9 T. Hiroyasu, M. Miki, M. Negami: Distributed genetic algorithms with randomized migration rate, Proc. IEEE Conf. Systems, Man Cybern. 1 (1999) pp. 689-694
55.10 S.-L. Lin, W.F. Punch, E.D. Goodman: Coarse-grain parallel genetic algorithms: Categorization and new approach, 6th IEEE Symp. Parallel Distrib. Proces. (1994) pp. 28-37
55.11 P. Spiessens, B. Manderick: A massively parallel genetic algorithm, Proc. 4th Int. Conf. Genet. Algorithms (1991) pp. 279-286
55.12 G. von Laszewski, H. Mühlenbein: Partitioning a graph with parallel genetic algorithm, Lect. Notes Comput. Sci. 496, 165-169 (1990)
55.13 E.-G. Talbi, P. Bessière: A parallel genetic algorithm for the graph partitioning problem, Proc. 5th Int. Conf. Supercomput. (1991) pp. 312-320
55.14 E.-G. Talbi, P. Bessière: Superlinear speedup of a parallel genetic algorithm on the supernode, SIAM News 24(4), 12-27 (1991)
55.15 J.M. Ahuactzin, E.-G. Talbi, P. Bessière, E. Mazer: Using genetic algorithms for robot motion planning, Lect. Notes Comput. Sci. 708, 84-93 (1993)
55.16 K.F. Doerner, R.F. Hartl, G. Kiechle, M. Lucka, M. Reimann: Parallel ant systems for the capacitated vehicle routing problem, Lect. Notes Comput. Sci. 3004, 72-83 (2004)
55.17 M. Rahoual, R. Hadji, V. Bachelet: Parallel ant system for the set covering problem, Lect. Notes Comput. Sci. 2463, 262-267 (2002)
55.18 M. Randall, A. Lewis: A parallel implementation of ant colony optimization, J. Parallel Distrib. Comput. 62(9), 1421-1432 (2002)
55.19 E.-G. Talbi, O. Roux, C. Fonlupt, D. Robillard: Parallel ant colonies for combinatorial optimization problems, Lect. Notes Comput. Sci. 1586, 239-247 (1999)
55.20 E.-G. Talbi, S. Cahon, N. Melab: Designing cellular networks using a parallel hybrid metaheuristic on the computational grid, Comput. Commun. 30(4), 698-713 (2007)
55.21 N. Melab, S. Cahon, E.-G. Talbi: Grid computing for parallel bioinspired algorithms, J. Parallel Distrib. Comput. 66(8), 1052-1061 (2006)
55.22 I. Foster, C. Kesselman (Eds.): The Grid: Blueprint for a New Computing Infrastructure (Morgan Kaufmann, San Mateo 1999)
55.23 R. Zeidman: Designing with FPGAs and CPLDs (CMP, Lawrence 2002)
55.24 M. Pharr, R. Fernando: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (Addison-Wesley, Upper Saddle River 2005)
55.25 T.-V. Luong, N. Melab, E.-G. Talbi: Parallel hybrid evolutionary algorithms on GPU, IEEE Congr. Evol. Comput. (2010) pp. 1-8
55.26 D.R. Butenhof: Programming with POSIX Threads (Addison-Wesley, Upper Saddle River 1997)
55.27 P. Hyde: Java Thread Programming (Sams, Indianapolis 1999)
55.28 B. Chapman, G. Jost, R. VanderPas, D.J. Kuck: Using OpenMP: Portable Shared Memory Parallel Programming (MIT, Cambridge 2007)
55.29 B. Sotomayor, L. Childers: Globus Toolkit 4: Programming Java Services (Morgan Kaufmann, San Mateo 2005)
55.30 V. Kumar, A. Grama, A. Gupta, G. Karypis: Introduction to Parallel Computing: Design and Analysis of Algorithms (Addison-Wesley, Upper Saddle River 1994)
55.31 E.-G. Talbi: Parallel Combinatorial Optimization (Wiley, Hoboken 2006)
55.32 H. Juille, J.B. Pollack: Massively parallel genetic programming. In: Advances in Genetic Programming 2, ed. by P.J. Angeline, K.E. Kinnear Jr. (MIT, Cambridge 1996) pp. 339-358
55.33 G. Folino, C. Pizzuti, G. Spezzano: CAGE: A tool for parallel genetic programming applications, Lect. Notes Comput. Sci. 2038, 64-73 (2001)
55.34 M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preuss, M. Schoenauer: A framework for distributed evolutionary algorithms, Lect. Notes Comput. Sci. 2439, 665-675 (2002)
55.35 G.C. Wilson, A. McIntyre, M.I. Heywood: Resource review: Three open source systems for evolving programs-Lilgp, ECJ and grammatical evolution, Genet. Program. Evol. Mach. 5(19), 103-105 (2004)
55.36 C. Gagné, M. Parizeau, M. Dubreuil: Distributed Beagle: An environment for parallel and distributed evolutionary computations, Proc. 17th Ann. Int. Symp. High Perform. Comput. Syst. Appl. (2003) pp. 201-208
55.37 E. Alba, F. Almeida, M. Blesa, C. Cotta, M. Diaz, I. Dorta, J. Gabarró, J. Gonzalez, C. León, L. Moreno, J. Petit, J. Roda, A. Rojas, F. Xhafa: MALLBA: A library of skeletons for combinatorial optimisation, Lect. Notes Comput. Sci. 2400, 927-932 (2002)
55.38 N. Krasnogor, J. Smith: MAFRA: A Java memetic algorithms framework, Workshop Proc. GECCO (2002)
55.39 E. Gamma, R. Helm, R. Johnson, J. Vlissides: Design Patterns, Elements of Reusable Object-Oriented Software (Addison-Wesley, Upper Saddle River 1994)

56. How to Create Generalizable Results 


Thomas Bartz-Beielstein 


Basically, this chapter tries to find answers for the 
following fundamental questions in experimental 
research. 


(Q-1) How can problem instances be generated? 
(Q-2) How can experimental results be generalized? 


The chapter is structured as follows. Sec- 
tion 56.2 introduces real-world and artificial 
optimization problems. Algorithms are described 
in Sect. 56.3. Objective functions and statistical 
models are introduced in Sect. 56.4; these models 
take problem and algorithm features into con- 
sideration. Section 56.5 presents case studies that 
illustrate our methodology. The chapter closes with 
a summary and an outlook. 


56.1 Test Problems in Computational Intelligence
56.2 Features of Optimization Problems
  56.2.1 Problem Classes and Instances
  56.2.2 Feature Extraction and Instance Generation
56.3 Algorithm Features
  56.3.1 Factors and Levels
  56.3.2 Example: Evolution Strategy
56.4 Objective Functions
56.5 Case Studies
  56.5.1 Single Problem Designs: SASP and MASP
  56.5.2 SAMP: Single Algorithm, Multiple Problems
  56.5.3 MAMP: Multiple Algorithms, Multiple Problems
56.6 Summary and Outlook
References


56.1 Test Problems in Computational Intelligence


Computational intelligence (CI) methods have gained importance in
several real-world domains such as process optimization, system
identification, data mining, or statistical quality control. Tools to
determine the applicability of CI methods in these application
domains in an objective manner are missing. Statistics provide
methods for comparing algorithms on certain data sets. In the past,
several test suites were presented and considered as state of the
art. However, these test suites have several drawbacks, namely:


• Problem instances are mostly artificial and have no direct link to
real-world settings.

• Since there is a fixed number of test instances, algorithms can be
fitted or tuned to this specific and very limited set of test
functions. As a consequence, studies (benchmarks) provide insight
into how these algorithms perform on this specific set of test
instances, but no insight into how they perform in general.

• Statistical tools for comparisons of several algorithms on several
test problem instances are relatively complex and not easy to
analyze.


We propose a methodology to overcome these dif- 
ficulties. This methodology, which generates problem 
classes rather than uses one instance, is constructed as 
follows: 


1. First, we pre-process the underlying real-world 
data. 

2. In a second step, features from these data are extracted. This
extraction relies on the assumption that mathematical variables can
be used to represent real-world features. For example, decomposition
techniques can be applied to model the underlying data structures, if
we are using time-series data. The original time series is
deconstructed into a number of component series, where each of these
reflects a certain type of behavior, e.g., a trend or seasonality
[56.1]. We obtain an analytical model of the data.

3. Then, we parameterize this model. Based on 
this parametrization and randomization, we can 
generate infinitely many new problem instances. 

4. If no real-world data are available, problem in- 
stances can be generated using test-problem gen- 
erators. The generation of test problems, which 
are well-founded and have practical relevance, has 
been an on-going field of research for several 
decades. 

5. From this infinite set, we can draw a limited num- 
ber of problem instances which will be used for the 
comparison. 

6. Since problem instances are selected randomly, we apply random and
mixed models for the analysis [56.2]. Mixed models include fixed and
random effects. A fixed effect is an unknown constant. Its estimation
from the data is a common practice in analysis of variance (ANOVA) or
regression. A random effect is a random variable. We estimate the
parameters that describe its distribution because, in contrast to
fixed effects, it makes no sense to estimate the random effect
itself.
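
Steps 2 to 5 above can be sketched in a few lines. The following is an illustrative sketch only, not the generator used in this chapter: it assumes a simple analytical model consisting of a level, a linear trend, and a sinusoidal seasonal component, and all function names, parameter names, and values are hypothetical. New instances are obtained by randomly perturbing the model parameters, and a limited number of them is drawn for the comparison.

```python
import math
import random


def make_instance(level, slope, amplitude, period=12, length=48,
                  noise=0.0, rng=None):
    """Generate one time series from the parameterized model
    level + slope * t + amplitude * sin(2 * pi * t / period) + Gaussian noise."""
    rng = rng or random.Random()
    return [
        level + slope * t
        + amplitude * math.sin(2 * math.pi * t / period)
        + rng.gauss(0.0, noise)
        for t in range(length)
    ]


def sample_instances(base, q, rel_jitter=0.05, seed=0):
    """Draw q problem instances by randomly perturbing the base parameters."""
    rng = random.Random(seed)
    instances = []
    for _ in range(q):
        # Each parameter is scaled by a random factor in [1 - jitter, 1 + jitter].
        params = {
            name: value * (1.0 + rng.uniform(-rel_jitter, rel_jitter))
            for name, value in base.items()
        }
        instances.append(make_instance(rng=rng, **params))
    return instances


# Hypothetical parameters, as if estimated from pre-processed real-world data.
base = {"level": 100.0, "slope": 2.0, "amplitude": 10.0}
instances = sample_instances(base, q=5, seed=1)
print(len(instances), len(instances[0]))
```

Fixing the seed makes the drawn set of instances reproducible, which is convenient when the same instances must be presented to several algorithms.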


This chapter combines ideas from two approaches: 
problem generation and statistical analysis of com- 
puter experiments. The work presented by Chiarandini 
and Goegebeur [56.3] provides the basis of our sta- 
tistical analysis. They present a systematic and well- 
developed framework for mixed models. Related mod- 
eling approaches were suggested by McGeoch [56.4] 
and Birattari [56.5]. Gallagher and Yuan [56.6] present 
a problem instance (landscape) generator that is pa- 
rameterized by a small number of parameters, and the 
values of these parameters have a direct and intuitive 
interpretation in terms of the geometric features of the 
landscapes that they produce. Castiñeiras et al. [56.7] 
present a parameterizable benchmark generator for bin 
packing instances based on the well-known Weibull dis- 
tribution. Using the shape and scale parameters of the 
Weibull distribution, the authors generate benchmarks 
that contain a variety of item size distributions. They 
report that for all bin capacities, the number of bins re- 
quired in an optimal solution increases as the Weibull 
shape parameter increases. Using this feature, scalabil- 
ity is enabled. 


56.2 Features of Optimization Problems 


56.2.1 Problem Classes and Instances 


Nowadays, it is common practice in optimization to choose a fixed set
of problem instances in advance and to apply classical ANOVA or
regression analysis. In many experimental studies a few problem
instances π_i (i = 1, 2, ..., q) are used and the results of some
runs of the algorithms a_j (j = 1, 2, ..., h) on these instances are
collected. The instances can be treated as blocks and all algorithms
are run on each single instance. Results are grouped per instance
π_i. Analyses of these experiments shed some light on the performance
of the algorithms on those specific instances. However, the interest
of the researcher should not be just the performance of the
algorithms on those specific instances chosen, but rather the
generalization of the results to the entire class Π. Generalizations
about the algorithm's performance on new problem instances are
difficult or impossible in this setting.

Based on ideas from Chiarandini and Goegebeur [56.3], to overcome
this difficulty, we propose the following approach: a small set of
problem instances {π_i ∈ Π | i = 1, 2, ..., q} is chosen at random
from a large set, or class Π, of possible instances of the problem.
Problem instances are considered as factor levels. However, this
factor is of a different nature from the fixed algorithmic factors in
the classical ANOVA setting. Indeed, the levels are chosen at random
and the interest is not in these specific levels but in the problem
class Π from which they are sampled. Therefore, the levels and the
factor are random. Consequently, our results are not based on
a limited, fixed number of problem instances. They are randomly drawn
from an infinite set, which enables generalization.
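
To make the idea of a random instance factor concrete, the sketch below estimates the variance attributable to the problem instances for a single algorithm. It is a minimal illustration under stated assumptions, not the analysis carried out in this chapter: it assumes the balanced one-way random-effects model y_ij = mu + tau_i + eps_ij, with q randomly drawn instances and r runs per instance, and uses the classical ANOVA (method-of-moments) estimators. The function name and the data are hypothetical.

```python
from statistics import mean


def variance_components(results):
    """Estimate (overall mean, instance variance, error variance).

    results -- list of q lists; results[i] holds the r run results obtained
               on the i-th randomly drawn problem instance (balanced design).
    """
    q = len(results)
    r = len(results[0])
    instance_means = [mean(runs) for runs in results]
    grand_mean = mean(instance_means)
    # Mean square between instances and mean square within instances.
    ms_between = r * sum((m - grand_mean) ** 2 for m in instance_means) / (q - 1)
    ms_within = sum(
        (y - m) ** 2
        for runs, m in zip(results, instance_means)
        for y in runs
    ) / (q * (r - 1))
    # E[ms_between] = sigma^2 + r * sigma_tau^2, hence the moment estimator;
    # negative estimates are truncated at zero.
    var_instance = max(0.0, (ms_between - ms_within) / r)
    return grand_mean, var_instance, ms_within


# Hypothetical results: q = 3 random instances, r = 2 runs on each.
results = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
grand_mean, var_instance, var_error = variance_components(results)
print(grand_mean, var_instance, var_error)
```

The quantity of interest is the estimated instance variance, a parameter of the distribution of the random effect, rather than the effect of any particular instance.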


56.2.2 Feature Extraction 
and Instance Generation 


A problem class Π can be generated in different manners. We will
consider artificial and natural problem class generators.
Artificially generated problems allow feature generation based on
some predefined characteristics. They are basically theory driven,
i.e., the researcher defines certain features such as linearity or
multimodality. Based on these features, a model (formula) is
constructed. By integrating parameters into this formula, many
problem instances can be generated by parameter variation. We will
exemplify this approach in the following paragraph. The second way,
which will generate natural problem classes, uses a three-stage
approach. First, the real-world system and its components are
described. Then, features are extracted from the real-world system.
Based on this feature set, a model is defined. By adding parameters
to this model, new problem instances can be generated. There is also
a third way to generate test instances: if we are lucky, many data
are available. In this case, we can sample a limited number of
problem instances from the larger set of real-world data. The
statistical analysis is similar for these three cases.


Artificial Test Functions 

Several problem instance generators have been pro- 
posed over the last years. For example, Gallagher and 
Yuan present a landscape test generator, which can be 
used to set up problem instances for continuous, bound- 
constrained optimization problems [56.6]. The Max-set 
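of Gaussian landscape generator (MSG) uses the maximum
of $m$ weighted Gaussian functions

$$G(x) = \max_{i \in \{1, 2, \ldots, m\}} \bigl( w_i \, g_i(x) \bigr),$$

where $g : \mathbb{R}^n \to \mathbb{R}$ denotes an $n$-dimensional Gaussian
function

$$g(x) = \left( \frac{\exp\left( -\frac{1}{2} (x - \mu) \, \Sigma^{-1} (x - \mu)^{\mathsf{T}} \right)}{(2\pi)^{n/2} \, |\Sigma|^{1/2}} \right)^{1/n},$$

$\mu$ is an $n$-dimensional vector of means, and $\Sigma$ is an
($n \times n$) covariance matrix. The mean of each Gaussian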
corresponds to an optimum on the landscape and the 
location of all optima is known. The global optimum 
is the one with the largest value. We will use the MSG 
problem instance generator in Sect. 56.5 to demonstrate 
our approach. 
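The definition of $G(x)$ can be illustrated with a minimal sketch. It is not the generator of [56.6]; it only evaluates the two formulas above for the two-dimensional case, where the inverse of the covariance matrix has a closed form. The function names `gaussian_2d` and `msg`, and the example weights, means, and covariances, are assumptions made for illustration.

```python
import math

def gaussian_2d(x, mean, cov):
    """n-th root (n = 2) of a bivariate normal density, following the formula for g."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    n = 2
    density = math.exp(-0.5 * quad) / ((2 * math.pi) ** (n / 2) * math.sqrt(det))
    return density ** (1.0 / n)

def msg(x, weights, means, covs):
    """Max-set of Gaussians: the largest weighted component value at x."""
    return max(w * gaussian_2d(x, m, c) for w, m, c in zip(weights, means, covs))

# Hypothetical landscape with two components; the first one holds the global optimum.
weights = [1.0, 0.5]
means = [(0.5, 0.5), (-0.5, -0.5)]
covs = [((0.1, 0.0), (0.0, 0.1)), ((0.1, 0.0), (0.0, 0.1))]
value_at_global = msg(means[0], weights, means, covs)
value_at_local = msg(means[1], weights, means, covs)
```

Because each component peaks at its own mean, the landscape value at a mean is the weighted peak height of that component as long as the other components are far enough away, which is why the location of all optima is known in advance.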


Natural Problem Classes 
This section exemplifies the three fundamental steps for 
generating real-world problem instances, namely: 


1. Describing the real-world system and its data 
2. Feature extraction and model construction 
3. Instance generation. 


We will illustrate this procedure by using the classic 
Box and Jenkins airline data [56.8]. These data contain 


the monthly totals of international airline passengers 
from 1949 to 1961. The feature extraction is based on 
methods from time-series analysis. Because of its simplicity,
the Holt-Winters method is popular in many
application domains. It is able to adapt to changes in
trends and seasonal patterns. The Holt-Winters prediction
function requires the estimation of three parameters,
i.e., $\alpha$, $\beta$, and $\gamma$, which can be estimated from
the original time-series data. Their optimal values are determined
by minimizing the squared one-step prediction
error. To generate new problem instances, these parameters
can be slightly modified. Based on these modified
values, the model is re-fitted. Finally, we can extract the
new time series. One typical result from this instance
generation is shown in Fig. 56.1. Bartz-Beielstein [56.9]
describes this procedure in detail.
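The three steps (estimate the parameters, modify them slightly, extract a new series) can be sketched as follows. This is an illustrative sketch only, not the procedure of [56.9]: it assumes an additive Holt-Winters model with crude start values, a coarse grid search instead of a proper optimizer, and a synthetic series in place of the airline data. All function names and the perturbation size `rel` are hypothetical.

```python
import itertools
import math
import random

def holt_winters_fitted(y, period, alpha, beta, gamma):
    """One-step-ahead predictions of an additive Holt-Winters model."""
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period                       # crude start values (assumption)
    trend = (sum(second) - sum(first)) / period ** 2
    season = [v - level for v in first]
    fitted = []
    for t in range(period, len(y)):
        s_old = season[t % period]
        fitted.append(level + trend + s_old)          # prediction made before seeing y[t]
        new_level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - new_level) + (1 - gamma) * s_old
        level = new_level
    return fitted

def sse(y, period, params):
    """Squared one-step prediction error for a parameter triple."""
    fitted = holt_winters_fitted(y, period, *params)
    return sum((obs - fit) ** 2 for obs, fit in zip(y[period:], fitted))

def perturb(params, rel=0.05, rng=random):
    """Slightly modify (alpha, beta, gamma), keeping each value inside [0, 1]."""
    return tuple(min(1.0, max(0.0, p * (1 + rng.uniform(-rel, rel)))) for p in params)

# Synthetic monthly series with trend and seasonality (stand-in for real data).
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best = min(itertools.product(grid, repeat=3), key=lambda p: sse(series, 12, p))
new_params = perturb(best, rng=random.Random(42))
instance = holt_winters_fitted(series, 12, *new_params)   # one new problem instance
```

Repeating the last two lines with different seeds yields a family of related series, which plays the role of the problem class from which instances are sampled.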

Fig. 56.1 Holt-Winters problem instance generator. The solid line
represents the real data, the dotted line predictions from the
Holt-Winters model and the fine dotted line modified predictions,
respectively (vertical axis: airline passengers, 100 to 600;
horizontal axis: time, 1950 to 1960)

To illustrate the wide applicability of this approach,
we will list further real-world problem domains, which
are the subject of our current research:

• Smart metering: The development of accurate forecasting
  methods for electrical energy consumption
  profiles is an important task. We consider time series
  collected from a manufacturing process. Each
  time series contains quarter-hourly samples of the
  energy consumption of a bakery. A detailed data
  description can be found in [56.10].

• Water industry: Canary is a software developed by
  the United States Environmental Protection Agency
  (US EPA) and Sandia National Laboratories. Its
  purpose is to detect events in the context of water
  contamination. An event is in this context defined
  as a certain time period where a contaminant
  significantly deteriorates the water quality.
  Distinguishing events from (i) background changes,
  (ii) maintenance and modification due to operation,
  and (iii) outliers is an essential task, which was
  implemented in the Canary software. Therefore,
  deviations are compared to regular patterns and
  short-term changes. The corresponding data
  contain multivariate time-series data. They are a selection
  from a larger dataset shipped with the open source
  event-detection software CANARY developed by
  US EPA and Sandia National Laboratories [56.11].
• Finance: The data are real-world data from intraday
  foreign exchange (FX) trading. The FX market
  is a financial market for trading currencies to
  enable international trade and investment. It is the
  largest and most liquid financial market in the
  world. Currencies can be traded via a wide variety
  of different financial instruments, ranging from simple
  spot trades to highly complex derivatives.
  We use three foreign exchange (currency rate) time
  series collected from Bloomberg. Each time series
  contains hourly samples of the change in currency
  exchange rate [56.12].

One typical goal in forecasting is the minimization
of the forecast errors or the differences between
real (observed) values, say $y_i$, and predicted values,
say $\hat{y}_i$. This goal can be considered as an optimization
problem.

As stated in Sect. 56.2.2, the statistical analysis
is similar for artificial and natural problem classes.
Our goal can be stated as follows: For a given problem
class $\Pi$, which can be artificial or natural, we
try to determine if an optimization algorithm $\alpha$ or
several algorithm instances $\alpha_i$ show similar behavior
on randomly selected problem instances $\pi_i \in \Pi$. This
question will be formulated as a statistical hypothesis.
Based on the related statistical framework, we can
determine confidence intervals for the performance of the
algorithm on unseen problem instances.


56.3 Algorithm Features


56.3.1 Factors and Levels


Evolutionary algorithms (EA) belong to the large class
of bio-inspired search heuristics. They combine specific
components, which may be qualitative, like the recombination
operator, or quantitative, like the population
size. Our interest is in understanding the contribution
of these components. In statistical terms, these components
are called factors. The interest is in the effects
of the specific levels chosen for these factors. Hence,
we say that the levels and, consequently, the factors
are fixed. Although modern search techniques like
sequential parameter optimization or Pareto genetic
programming [56.13] allow multi-objective performance
measures (solution quality versus variability or description
length), we restrict ourselves to analyzing the effect
of these factors on a univariate measure of performance.
We will use the quality of the solutions returned by the
algorithm at termination as the performance measure.


56.3.2 Example: Evolution Strategy

Fig. 56.2 The evolutionary cycle, basic working scheme
of all ES and EA. Terms common for describing evolution
strategies are used, alternative terms are added below in
brown (stages shown: initialization and evaluation, test for
termination, mating selection, recombination (crossover),
mutation, evaluation, environmental selection (replacement))

Table 56.1 Settings of exogenous parameters of an ES. Recombination
operators are labeled as follows: 1 = no, 2 = dominant, 3 = intermediate,
4 = intermediate as in [56.14]. Mutation uses the following encoding:
1 = no mutation, 2 = self-adaptive mutation

Parameter | Symbol              | Name                                                           | Range           | Value
mue       | $\mu$               | Number of parent individuals                                   | $\mathbb{N}$    | 5
nu        | $\nu = \lambda/\mu$ | Offspring-parent ratio                                         | $\mathbb{R}_+$  | 2.0
sigmaInit | $\sigma_i^{(0)}$    | Initial standard deviations                                    | $\mathbb{R}_+$  | 1.0
nSigma    | $n_\sigma$          | Number of standard deviations. d denotes the problem dimension | $\{1, d\}$      | 1
          | $c_\tau$            | Multiplier for individual and global mutation parameters       | $\mathbb{R}_+$  | 1.0
tau0      |                     |                                                                | $\mathbb{R}_+$  | 0.0
tau       |                     |                                                                | $\mathbb{R}_+$  | 1.0
rho       | $\rho$              | Mixing number                                                  | $\{1, \mu\}$    | 2
sel       | $\kappa$            | Maximum age                                                    | $\mathbb{R}_+$  | 1.0
mutation  |                     | Mutation                                                       | $\{1, 2\}$      | 2
sreco     | $r_\sigma$          | Recombination operator for strategy variables                  | $\{1, 2, 3, 4\}$ | 3
oreco     | $r_x$               | Recombination operator for object variables                    | $\{1, 2, 3, 4\}$ | 2

Evolution strategies (ES) are prominent representatives
of evolutionary algorithms, which include genetic
algorithms and genetic programming as well [56.15].
They can be classified as generic population-based
metaheuristic optimization algorithms for global
optimization that in some sense mimic natural evolution.
Evolution strategies are applied to hard real-valued
optimization problems. Mutation is performed
by adding a normally distributed random value to each
vector component. The standard deviation of these random
values is modified by self-adaptation. Evolution

strategies can use a population of several solutions.
Each solution is considered an individual and consists
of object and strategy variables. Object variables represent
the position in the search space, whereas strategy
variables store the step sizes, i.e., the standard deviations
for the mutation. We analyze the ES basic variant,
which was proposed in [56.14]. 
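The interplay of object and strategy variables can be sketched in a few lines. The sketch below is not the ES variant of [56.14]. It assumes a single step size per individual, a log-normal update of that step size (a common self-adaptation rule that the text does not spell out), no recombination, and elitist plus selection so that the best solution never gets worse. The names `mutate`, `sphere`, and `evolution_strategy` are hypothetical.

```python
import math
import random

def mutate(x, sigma, tau, rng):
    """Self-adaptive mutation with one step size per individual."""
    # Strategy variable first: log-normal update keeps sigma positive (assumed rule).
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    # Object variables: add a normally distributed value to each component.
    new_x = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
    return new_x, new_sigma

def sphere(x):
    """Simple test function with minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def evolution_strategy(f, x0, sigma0=1.0, mu=5, nu=2.0, tau=1.0,
                       generations=50, seed=1):
    """Minimal ES loop: mu parents create mu * nu offspring by mutation only."""
    rng = random.Random(seed)
    parents = [(list(x0), sigma0) for _ in range(mu)]
    for _ in range(generations):
        offspring = [mutate(*rng.choice(parents), tau, rng)
                     for _ in range(int(mu * nu))]
        # Plus selection (assumption of this sketch): keep the mu best of all.
        parents = sorted(parents + offspring, key=lambda ind: f(ind[0]))[:mu]
    return parents[0]

start = [2.0, -1.5]
best_x, best_sigma = evolution_strategy(sphere, start)
```

The default values of `mu`, `nu`, `sigma0`, and `tau` mirror the entries mue, nu, sigmaInit, and tau of Table 56.1, but the selection and recombination settings of that table are not reproduced here.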

Mutation means neighborhood-based movement in
search space, which includes the exploration of the
outer space currently not covered by a population,
whereas recombination rearranges existing information
and so focuses on the inner space. Selection is meant
to introduce a bias towards better fitness values. A concrete
ES may contain specific mutation, recombination,
or selection operators, or call them only with a certain
probability, but the control flow is usually left
unchanged. Each of the consecutive cycles is termed
a generation. The control flow is shown in Fig. 56.2.

Concerning the representation, it should be noted
that most empirical studies are based on canonical forms
such as binary strings or real-valued vectors, whereas many
real-world applications require specialized,
problem-dependent ones. Table 56.1 summarizes important ES
parameters. This chapter presents two case studies. The
first case study is based on a fixed ES parameter setting,
whereas the second case study modifies the recombination
operator for object variables. We are convinced
that the applicability of the methods presented in this
chapter goes far beyond the simplified case studies. Our
main contribution is a framework, which allows conclusions
that are not limited to a small number of problem
instances but extend to problem classes.


56.4 Objective Functions


We will use the following optimization framework:
an ES is applied as a minimizer on the test function
$f(x)$. Formally speaking, let $S$ denote some set,
e.g., $S \subseteq \mathbb{R}^n$. We are seeking values $f^*$ and $x^*$
such that $f^* = \min_{x \in S} f(x)$ and
$x^* = \arg\min_{x \in S} f(x)$. This approach can be extended in many
ways. For example, if $S$ denotes time-series data, then
an optimization algorithm can be applied to minimize
the empirical mean squared prediction error.

Test problem instances will be drawn from Gallagher's
and Yuan's MSG test function generator. The
following parameters can be used to specify the MSG
generator:


• The number of Gaussian components $m$.
• The mean vector $\mu$ of each component.
• The covariance matrix $\Sigma$ of each component.
• The weight $w_i$ of each component.
• A maximum threshold $t \in [0, 1]$ can be specified for
  local optima and the fitness value of the global
  optimum $G^*$. Local optima are randomly generated
  within $[0, t \times G^*]$.

The following tuple can be used to specify an MSG
generator

$$\Pi := \left( [-c, c]^n, n, m, D_\mu, \{D_\Sigma\}, \{t, G^*\} \right), \qquad (56.1)$$




where $c \in \mathbb{R}$ defines the boundary constraints of the
search space, $n$ the search space dimensionality, $m$
the number of Gaussian components, $D_\mu$ the distribution
used to generate the mean vectors of components,
$D_\Sigma$ the distribution or procedures used to generate
covariances of components, $t \in [0, 1]$ the threshold for
local optima, and $G^*$ the function value of the global
optimum.


Fig. 56.3a-i Nine test problem instances from $\Pi_{\mathrm{MSG}}$, generated with the MSG landscape generator as specified in (56.2). These
figures exemplify how numbers and locations of the randomly generated optima can vary. Usually, the optima are evenly
distributed in the search space. In some settings, there are a few dominating optima as can be seen in part (g)


Based on (56.1), we have specified the following
MSG landscape generator for our experiments

$$\Pi_{\mathrm{MSG}} := \left( [-1, 1]^2, 2, 10, U[-1, 1], \{U[0.05, 0.15], U[-\pi/4, \pi/4]\}, \{0.8, 1\} \right). \qquad (56.2)$$

With this setting, the mean vector of each component
is generated randomly within $[-1, 1]^2$. The covariance
matrix of each component is generated with the procedure
$D_\Sigma$ in three steps:

1. A diagonal matrix S with eigenvalues is generated.
2. An orthogonal matrix T is generated through
   $n(n-1)/2$ rotations with random angles between
   $[-\pi/4, \pi/4]$.

3. The covariance matrix is generated as T'ST. 
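For $n = 2$ a single rotation suffices, so the three steps can be sketched directly. The sketch below is an assumption-laden illustration, not the reference implementation of [56.6]: it draws the eigenvalues from $U[0.05, 0.15]$ and the angle from $U[-\pi/4, \pi/4]$ as in (56.2), and the helper names `matmul`, `transpose`, and `random_covariance_2d` are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def random_covariance_2d(rng, eig_range=(0.05, 0.15),
                         angle_range=(-math.pi / 4, math.pi / 4)):
    # Step 1: diagonal matrix S holding the eigenvalues.
    S = [[rng.uniform(*eig_range), 0.0], [0.0, rng.uniform(*eig_range)]]
    # Step 2: orthogonal matrix T from n(n-1)/2 = 1 rotation with a random angle.
    theta = rng.uniform(*angle_range)
    T = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    # Step 3: covariance matrix T' S T.
    return matmul(matmul(transpose(T), S), T)

cov = random_covariance_2d(random.Random(7))
trace = cov[0][0] + cov[1][1]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
```

Since T is orthogonal, the eigenvalues of the result are exactly the diagonal entries of S, so the rotation changes only the orientation of the Gaussian, not its spread.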


The weight $w_i$ of the component corresponding
to the global optimum is set to 1 while other
weights are randomly generated within $[0, 0.8]$. The
nine problem instances $\pi_i \in \Pi_{\mathrm{MSG}}$ $(i = 1, \ldots, 9)$
from Fig. 56.3 were generated with this parametrization.

Note that we are using the distance to the optimum
as an objective function in our experiments.
Our objective function reads $G^* - f(x)$, because we are
considering minimization problems. Other measures of
interest might be the gap percent of optimality

$$\frac{G^* - f(x)}{G^*} \times 100$$

or computation time, etc., see, e.g., [56.16].


56.5 Case Studies


Bartz-Beielstein [56.9] introduced the acronyms:

• SASP: one single algorithm and one single problem
  instance
• SAMP: one single algorithm and multiple problem
  instances
• MASP: multiple algorithms and one single problem
  instance
• MAMP: multiple algorithms and multiple problem
  instances

for classifying optimization designs [56.17].


56.5.1 Single Problem Designs:
SASP and MASP


In SASP we analyze the performance of an optimization
algorithm $\alpha$ on a single problem instance $\pi$. An
optimization problem has a set of input data which
instantiate the problem. This might be a function in
continuous optimization or the location and distances
between cities in a traveling salesman problem. In the
following, we will use $Y$ to denote the random performance
measure obtained by $r$ runs of algorithm $\alpha$
on problem instance $\pi$. Because many optimization
algorithms such as evolutionary algorithms are randomized,
their performance $Y$ on one instance is a random
variable. It might be described by a probability
density/mass function $p(y \mid \pi)$. Running the algorithm with
different random seeds on one problem instance, we
collect sample data $y_1, \ldots, y_r$, which are independent
and identically distributed (i.i.d.).

There are situations in which SASP is the method
of first choice. Real-world problems, which have to be
solved only once in a very limited time, are good examples
for using SASP optimizations. MASP shares
several characteristics with SASP. Because of their limited
capacities for generalization, SASP and MASP will
not be investigated further in this study.


56.5.2 SAMP: Single Algorithm, Multiple 
Problems 


Fixed-Effects Models 
This setup is commonly used for testing an algorithm 
on a given (fixed) set of problem instances. Standard 
assumptions from analysis of variance (ANOVA) lead 
us to propose the following fixed-effects model [56.2] 


$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad (56.3)$$

where $\mu$ is an overall mean, $\tau_i$ is a parameter unique
to the $i$-th treatment (problem instance factor), and $\varepsilon_{ij}$
is a random error term for replication $j$ on problem
instance $i$. Usually, the model errors $\varepsilon_{ij}$ are assumed to be
normally and independently distributed with mean zero
and variance $\sigma^2$. If problem instance factors are considered
fixed, i.e., nonrandom, the stochastic behavior
of the response variable originates from the algorithm.
This implies the experimental results

$$Y_{ij} \sim N(\mu + \tau_i, \sigma^2), \quad i = 1, \ldots, q, \; j = 1, \ldots, r, \qquad (56.4)$$

and that the $Y_{ij}$ are mutually independent. Results
from statistical analyses remain valid only on the specific
instances. Furthermore, SAMP with a fixed set of
problem instances is subject to criticism, e.g., that algorithms
are trained for this specific setup of test instances
(overfitting).

In order to make the results of the analysis independent
of the specific instances and dependent instead
on the class of instances from which the specific instances
are drawn, Chiarandini and Goegebeur propose
randomized and mixed models for the experimental
analysis of optimization algorithms as an extension of
(56.3) [56.3]. In contrast to model (56.3), these models
allow generalizations of results to the whole class of
instances.


Randomized Models 
In the following, we consider a population or class of
instances $\Pi$. The class $\Pi$ consists of a large, possibly
infinite, number of problem instances $\pi_i$, $i = 1, 2, 3, \ldots$
Let $p(\pi)$ denote the probability of sampling
instance $\pi$. The performance $Y$ of the algorithm $\alpha$ on
the class $\Pi$ is described by the probability function

$$p(y) = \sum_{\pi \in \Pi} p(y \mid \pi) \, p(\pi). \qquad (56.5)$$

If we run an algorithm $\alpha$ $r$ times on instance $\pi$, then
we receive $r$ replicates of $\alpha$'s performance, denoted
by $Y_1, \ldots, Y_r$. These $r$ observations are i.i.d., i.e.,

$$p(y_1, \ldots, y_r \mid \pi) = \prod_{j=1}^{r} p(y_j \mid \pi). \qquad (56.6)$$

So far, we have considered $r$ replicates of the performance
measure $Y$ on one problem instance $\pi$. Now we
consider several, randomly sampled problem instances.
Over all the instances the joint probability distribution
of the observed performance measures is obtained by
marginalizing over all instances

$$p(y_1, \ldots, y_r) = \sum_{\pi \in \Pi} p(y_1, \ldots, y_r \mid \pi) \, p(\pi). \qquad (56.7)$$

Extending the model (56.7) to the case where one
algorithm with several parameter settings or several
algorithms are analyzed leads to mixed models, which
will be discussed in Sect. 56.5.3.


Example SAMP: ES on $\Pi_{\mathrm{MSG}}$
(Random-Effects Design)
The simplest random-effects experiment is performed
as follows. For $i = 1, \ldots, q$ a problem instance $\pi_i$ is
drawn randomly from the class of problem instances $\Pi$.
On each of the sampled $\pi_i$, the algorithm $\alpha$ is run $r$
times using different seeds for $\alpha$. Due to $\alpha$'s stochastic
nature, we obtain, conditionally on the sampled instance,
$r$ replications of the performance measure that
are i.i.d.
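The two-stage sampling of this design can be mimicked with synthetic data. The sketch below does not run an ES; it assumes that the performance on instance $i$ is an overall mean plus a normally distributed instance effect plus normally distributed run-to-run noise, which anticipates the random-effects model used later. The function name and all numeric values are hypothetical.

```python
import random

def run_random_effects_experiment(q, r, mu, sigma_tau, sigma, seed=0):
    """Return a q x r table: q sampled instances, r replicated runs on each."""
    rng = random.Random(seed)
    data = []
    for _ in range(q):
        # Stage 1: draw an instance; its effect is fixed for all runs on it.
        tau_i = rng.gauss(0.0, sigma_tau)
        # Stage 2: r runs with different seeds give i.i.d. replicates,
        # conditionally on the sampled instance.
        data.append([mu + tau_i + rng.gauss(0.0, sigma) for _ in range(r)])
    return data

# Nine instances with ten runs each, as in the case study (values are made up).
samp_data = run_random_effects_experiment(q=9, r=10, mu=-12.0,
                                          sigma_tau=1.0, sigma=3.0, seed=3)
# With no run-to-run noise, all replicates on one instance coincide.
noise_free = run_random_effects_experiment(q=3, r=4, mu=0.0,
                                           sigma_tau=1.0, sigma=0.0, seed=3)
```

The table layout (one row per instance, one column per replicate) is the balanced layout assumed by the ANOVA formulas that follow.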


Let $Y_{ij}$ $(i = 1, \ldots, q;\; j = 1, \ldots, r)$ denote the random
performance measure obtained in the $j$-th replication
of $\alpha$ on $\pi_i$. We are interested in drawing conclusions
about $\alpha$'s performance on a larger set of problem
instances from $\Pi$ and not just on those $q$ problem
instances included in the experiment. A systematic
approach to accomplish this task comprises the following
steps:

• SAMP-1 algorithm and problem instances
• SAMP-2 ANOVA and restricted maximum likelihood
  estimator (REML) model building
• SAMP-3 validation of the model assumptions
• SAMP-4 hypothesis testing
• SAMP-5 confidence intervals and prediction.


SAMP-1 Algorithm and Problem Instances. The
goal of this case study is to analyze if one algorithm
shows a similar performance on a class of problem instances,
say $\Pi_{\mathrm{MSG}}$. A random-effects design will be
used to model the results. We illustrate the decomposition
of the variance of the response values into (i) the
variance due to problem instance and (ii) the variance
due to the algorithm and derive results, which are based
on hypothesis testing as introduced in (56.12).

We consider one algorithm, an ES, which is run $r = 10$
times on a set of randomly generated problem
instances. The ES is parameterized with the default
setting from Table 56.1. These parameters are kept constant
during the experiment. Nine instances are drawn
from the set of problem instances $\Pi_{\mathrm{MSG}}$. Problem
instances were generated with the MSG landscape generator
as specified in (56.2). The corresponding problem
instances are shown in Fig. 56.3.

The null hypothesis reads "There is no instance effect."
Since we are considering the SAMP case, our
experiment is based on one ES instance only. There
are 90 observations, because 10 repeats were performed
on 9 problem instances. Figure 56.4 shows the performance
of the ES on these nine instances. The variable
fSeed is used to denote the problem instance number $i$.


SAMP-2 ANOVA and REML Model Building. 


ANOVA Model Building. The following analysis is
based on the linear statistical model

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \ldots, q, \; j = 1, \ldots, r, \qquad (56.8)$$

where $\mu$ is an overall mean and $\varepsilon_{ij}$ is a random error
term for replication $j$ on instance $i$. Note that in
contrast to the fixed-effects model from (56.3), $\tau_i$ is
a random variable representing the effect of instance $i$.
The stochastic behavior of the response variable originates
from both the instance and the algorithm. This
is reflected in (56.8), where both $\tau_i$ and $\varepsilon_{ij}$ are random
variables. The model (56.8) is the so-called
random-effects model, cf. [56.2] or [56.3].

Fig. 56.4a,b Performance of the ES on nine test problem instances. (a) Problem instances plotted versus algorithm
performance. (b) Problem instances plotted against logarithmic performance. Smaller values are better
(axes: fSeed versus y in (a), fSeed versus yLog in (b))

We assume that $\tau_1, \ldots, \tau_q$ are i.i.d. $N(0, \sigma_\tau^2)$ and
$\varepsilon_{ij}$, $i = 1, \ldots, q$, $j = 1, \ldots, r$, are i.i.d. $N(0, \sigma^2)$. If $\tau_i$
is independent of $\varepsilon_{ij}$ and has variance $V(\tau_i) = \sigma_\tau^2$, the
variance of any observation is $V(Y_{ij}) = \sigma^2 + \sigma_\tau^2$. Similar
to the partition in classical ANOVA, the variability
in the observations can be partitioned into a component
that measures the variation between treatments
and a component that measures the variation within
treatments. Based on the fundamental ANOVA identity
$SS_{\mathrm{total}} = SS_{\mathrm{treat}} + SS_{\mathrm{err}}$, we define

$$MS_{\mathrm{treat}} = \frac{SS_{\mathrm{treat}}}{q-1} = \frac{r \sum_{i=1}^{q} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2}{q-1}$$

and

$$MS_{\mathrm{err}} = \frac{SS_{\mathrm{err}}}{q(r-1)} = \frac{\sum_{i=1}^{q} \sum_{j=1}^{r} (y_{ij} - \bar{y}_{i\cdot})^2}{q(r-1)}.$$

It can be shown that

$$E(MS_{\mathrm{treat}}) = \sigma^2 + r\sigma_\tau^2 \quad \text{and} \quad E(MS_{\mathrm{err}}) = \sigma^2, \qquad (56.9)$$

cf. [56.2]. Therefore, the estimators of the variance
components are

$$\hat{\sigma}^2 = MS_{\mathrm{err}}, \qquad (56.10)$$

$$\hat{\sigma}_\tau^2 = \frac{MS_{\mathrm{treat}} - MS_{\mathrm{err}}}{r}. \qquad (56.11)$$

The corresponding ANOVA table is shown in Table 56.2.
Based on ANOVA calculations, with (56.11)
we obtain an estimator of the first variance component
$\hat{\sigma}_\tau^2 = -0.4848257$, and from (56.10), we obtain
the second component $\hat{\sigma}^2 = 11.32854$. The model variance
can be determined as $\hat{\sigma}_\tau^2 + \hat{\sigma}^2 = 10.84372$. The
mean $\mu = -12.05554$ from (56.8) can be extracted. Finally,
the p value in the ANOVA table is calculated
as 0.7979083.

Table 56.2 ANOVA table for a one-factor fixed and random effects models

Source of variation | Sum of squares         | Degrees of freedom | Mean square           | EMS fixed                                           | EMS random
Treatment           | $SS_{\mathrm{treat}}$  | $q-1$              | $MS_{\mathrm{treat}}$ | $\sigma^2 + \frac{r \sum_{i=1}^{q} \tau_i^2}{q-1}$  | $\sigma^2 + r\sigma_\tau^2$
Error               | $SS_{\mathrm{err}}$    | $q(r-1)$           | $MS_{\mathrm{err}}$   | $\sigma^2$                                          | $\sigma^2$
Total               | $SS_{\mathrm{total}}$  | $qr-1$             |                       |                                                     |

Note that we have obtained a negative variance estimate.
Since negative variances are not feasible, we can
set their values to zero and proceed with
these modified values. A more elegant way is presented
in the following.
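The method-of-moments estimators (56.10) and (56.11) are easy to compute from a balanced table of results. The following sketch assumes the data are stored as a list of $q$ rows with $r$ replicates each; the function name `variance_components` and the two small data sets are made up for illustration and are not the data of the case study.

```python
def variance_components(data):
    """ANOVA estimators for the one-factor random-effects model (balanced data)."""
    q, r = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (q * r)
    row_means = [sum(row) / r for row in data]
    ss_treat = r * sum((m - grand_mean) ** 2 for m in row_means)
    ss_err = sum((y - m) ** 2 for row, m in zip(data, row_means) for y in row)
    ms_treat = ss_treat / (q - 1)
    ms_err = ss_err / (q * (r - 1))
    return {
        "ms_treat": ms_treat,
        "ms_err": ms_err,
        "sigma2": ms_err,                        # (56.10)
        "sigma_tau2": (ms_treat - ms_err) / r,   # (56.11), may be negative
        "f0": ms_treat / ms_err,                 # ratio of the two mean squares
    }

# Clear instance effect: row means differ, replicates are tight.
with_effect = variance_components([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
# No instance effect: identical row means, so MS_treat < MS_err.
without_effect = variance_components([[0, 10], [1, 9]])
```

The second data set reproduces the situation described above: the row means coincide, the treatment mean square falls below the error mean square, and (56.11) yields a negative estimate.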


Restricted Maximum Likelihood. In some cases, the
standard ANOVA, which was used in our example, produces
a negative estimate of a variance component.
This can be seen in (56.11): if $MS_{\mathrm{err}} > MS_{\mathrm{treat}}$, negative
values occur. By definition, variance components
are nonnegative. Methods that always yield nonnegative
variance components have been developed. Here, we will
use restricted maximum likelihood estimators (REML).
The ANOVA method of variance component estimation,
which is a method of moments procedure, and
REML estimation may lead to different results. Output
from an R-based analysis with the function lmer from
the package lme4 reads as follows (fSeed denotes the
problem instance) [56.18]:


    Linear mixed model fit by REML
    Formula: yLog ~ 1 + (1 | fSeed)
       Data: samp.df
       AIC   BIC logLik deviance REMLdev
     475.6 483.1 -234.8    469.3   469.6
    Random effects:
     Groups   Name        Variance Std.Dev.
     fSeed    (Intercept)  0.000   0.0000
     Residual             10.893   3.3004
    Number of obs: 90, groups: fSeed, 9

    Fixed effects:
                Estimate Std. Error t value
    (Intercept) -12.0555     0.3479  -34.65


Compared to the ANOVA setting, different values
for $\hat{\sigma}^2$, $\hat{\sigma}_\tau^2$, and $\mu$ were obtained. However, the
REML-based analysis also shows that the variability in the
response observations can be attributed to the variability
of the algorithm.


SAMP-3 Validation of the Model Assumptions. Before
performing hypothesis testing based on the models
introduced in SAMP-2, the validity of the model
assumptions has to be investigated. If the model is
adequate, the residuals should exhibit no structure.
Residuals are plotted against fitted values to check the
assumption of homoscedasticity and quantile-quantile
(Q-Q) plots are used to check if residuals meet the
normality assumption. Quantile-quantile plots of the
residuals are shown in Fig. 56.5 for the raw and the
log-transformed responses. These plots provide a good
way to compare the distribution of a sample with
a theoretical distribution, here the normal distribution.
Large deviations from the line indicate
non-normality of the sample data. These Q-Q plots indicate
that a log transformation of the response might
be useful in our setting.


Fig. 56.5a,b (a) Q-Q plot of the residuals for raw data. (b) Q-Q plot for the log-transformed responses
(axes: theoretical quantiles versus sample quantiles)

SAMP-4 Hypothesis Testing. Testing hypotheses
about individual treatments (instances) is useless because
the problem instances $\pi_i$ are here considered as
samples from some larger population of instances $\Pi$.
We test hypotheses about the variance component $\sigma_\tau^2$,
i.e., the null hypothesis

$$H_0 : \sigma_\tau^2 = 0 \quad \text{versus} \quad H_1 : \sigma_\tau^2 > 0. \qquad (56.12)$$

Under $H_0$, the algorithm performance is identical on all
problem instances (all treatments are identical), i.e.,
$r\sigma_\tau^2$ is very small. Based on (56.9), we conclude that
$E(MS_{\mathrm{treat}}) = \sigma^2 + r\sigma_\tau^2$ and $E(MS_{\mathrm{err}}) = \sigma^2$ are similar.
Under the alternative, variability exists between treatments.
Standard analysis shows that $SS_{\mathrm{err}}/\sigma^2$ is distributed
as chi-square with $q(r-1)$ degrees of freedom.
Let $F_{u,v}$ denote the F distribution with $u$ numerator
and $v$ denominator degrees of freedom. Under $H_0$, the
ratio

$$F_0 = \frac{SS_{\mathrm{treat}}/(q-1)}{SS_{\mathrm{err}}/(q(r-1))} = \frac{MS_{\mathrm{treat}}}{MS_{\mathrm{err}}}$$

is distributed as $F_{q-1,\,q(r-1)}$. To test hypotheses in
(56.8), we require that $\tau_1, \ldots, \tau_q$ are i.i.d. $N(0, \sigma_\tau^2)$,
$\varepsilon_{ij}$, $i = 1, \ldots, q$, $j = 1, \ldots, r$, are i.i.d. $N(0, \sigma^2)$, and
all $\tau_i$ and $\varepsilon_{ij}$ are independent of each other. These
considerations lead to the decision rule to reject $H_0$ at the
significance level $\alpha$ if

$$f_0 > F(1-\alpha;\, q-1,\, q(r-1)), \qquad (56.13)$$

where $f_0$ is the realization of $F_0$ from the data observed.
An intuitive motivation for the form of statistic $F_0$ can
be obtained from the expected mean squares. Under $H_0$
both $MS_{\mathrm{treat}}$ and $MS_{\mathrm{err}}$ estimate $\sigma^2$ in an unbiased way,
and $F_0$ can be expected to be close to one. On the other
hand, large values of $F_0$ give evidence against $H_0$.

Regarding the SAMP case, we obtain the following
values: Based on (56.9) and (56.13), we can determine
the F statistic and the p value. We get $MS_{\mathrm{treat}} =
MS_{\mathrm{err}} = 10.89275$ and $f_0 = 1$, which results in a large p
value: 0.4426363. The null hypothesis $H_0 : \sigma_\tau^2 = 0$
from (56.12) cannot be rejected, i.e., we conclude that
there is no instance effect. A similar conclusion was obtained
from the ANOVA method of variance component
estimation as introduced in Table 56.2.


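SAMP-5 Confidence Intervals and Prediction. An
unbiased estimator of the overall mean $\mu$ is

$$\bar{y}_{\cdot\cdot} = \frac{1}{qr} \sum_{i=1}^{q} \sum_{j=1}^{r} y_{ij}.$$

Its variance is given by

$$V(\bar{y}_{\cdot\cdot}) = V\left( \frac{1}{qr} \sum_{i=1}^{q} \sum_{j=1}^{r} Y_{ij} \right) = \frac{r\sigma_\tau^2 + \sigma^2}{qr}.$$

With (56.9) and (56.10), we obtain an estimator of the
variance of the overall mean $\mu$ as

$$\hat{V}(\bar{y}_{\cdot\cdot}) = \frac{MS_{\mathrm{treat}}}{qr}.$$

Since

$$\frac{\bar{y}_{\cdot\cdot} - \mu}{\sqrt{MS_{\mathrm{treat}}/(qr)}} \sim t_{q-1},$$

the confidence limits for $\mu$ can be derived as

$$\bar{y}_{\cdot\cdot} \pm t_{1-\alpha/2;\,q-1} \sqrt{\frac{MS_{\mathrm{treat}}}{qr}}. \qquad (56.14)$$

We conclude the SAMP case study with prediction
of the algorithm's performance on a new instance
from the same class. Based on (56.14), we
obtain the following 95% confidence interval:
$[2.6773 \times 10^{-6};\, 1.262 \times 10^{-5}]$. Again, confidence intervals from
the REML and ANOVA methods are very similar.
Summarizing, we can conclude that the ES performs
similarly on instances from $\Pi_{\mathrm{MSG}}$, which were generated
with (56.2).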
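The confidence limits (56.14) need only the grand mean, the treatment mean square, and a quantile of the t distribution. In the sketch below the quantile is passed in by the caller so that no statistics library is required; for example, the 0.975 quantile with 8 degrees of freedom (nine instances) is approximately 2.306. The function name and the tiny data set are hypothetical.

```python
import math

def mean_confidence_interval(data, t_quantile):
    """Confidence limits for the overall mean in the random-effects model.

    data       : q rows (instances) with r replicates each (balanced design)
    t_quantile : t_{1 - alpha/2; q - 1}, supplied by the caller
    """
    q, r = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (q * r)
    row_means = [sum(row) / r for row in data]
    ms_treat = r * sum((m - grand_mean) ** 2 for m in row_means) / (q - 1)
    half_width = t_quantile * math.sqrt(ms_treat / (q * r))
    return grand_mean - half_width, grand_mean + half_width

# Three instances, three runs each; 4.303 is roughly t_{0.975; 2}.
lower, upper = mean_confidence_interval([[1, 2, 3], [2, 3, 4], [6, 7, 8]], 4.303)
```

If the analysis is carried out on log-transformed responses, as suggested by the Q-Q plots, the limits refer to the log scale; applying `math.exp` to both limits would give an interval on the original scale (an assumption about how the reported interval was obtained).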


56.5.3 MAMP: Multiple Algorithms, Multiple 
Problems 


In the MAMP case study, fixed effects are included
in the conditional structure of (56.6), which leads to
a mixed model. Instead of one fixed algorithm as in
the SAMP case, we consider either several algorithms
or algorithms with several parameters. Both situations
can be treated while considering algorithms as levels
of a fixed factor, whereas problem instances are drawn
randomly from the population of instances $\Pi_{\mathrm{MSG}}$:

• MAMP-1 algorithm and problem instances
• MAMP-2 ANOVA and REML model building
• MAMP-3 validation of the model assumptions
• MAMP-4 hypothesis testing:
  1. Random effects
  2. Fixed effects
• MAMP-5 confidence intervals and prediction.




MAMP-1 Algorithm and Problem Instances 

We aim at comparing the performance of the ES
with different recombination operators over an instance
class. More precisely, we have four ES instances using
recombination operators {1, 2, 3, 4} and nine instances
randomly sampled from the class $\Pi_{\mathrm{MSG}}$ as illustrated in
Fig. 56.3. Each run is repeated ten times. In this study
$4 \times 9 \times 10 = 360$ observations were used. We are interested in
the following questions:

• Is there an instance effect?
• Do the mean performances of the ES with different
  recombination operators differ?
• Do the instance-algorithm interactions contribute to
  the variability of the response?


A first visual inspection, which plots the perfor- 
mance of the algorithm within each problem instance, 
is shown in Fig. 56.6. In eight of the nine instances the 
linear regression line has a negative slope and the inter- 
cepts do not differ very much. This indicates that there 
is no significant interaction between the fixed and the 
random factors. 


Fig. 56.6 Four algorithms (ES with modified recombination operators)
on nine test problem instances. Each panel represents one
problem instance and problem instances are labeled from 124 to
132. Performance is plotted against the level of the recombination
operator (horizontal axis: objreco)


MAMP-2 ANOVA and REML Model Building 
The variability in the performance measure can be 
decomposed according to the following mixed-effects 
ANOVA model 


$$Y_{ijk} = \mu + \alpha_j + \tau_i + \gamma_{ij} + \varepsilon_{ijk} , \tag{56.15}$$


where $\mu$ is an overall performance level common to all
observations, $\alpha_j$ is a fixed effect due to the algorithm $j$,
$\tau_i$ is a random effect associated with instance $i$, $\gamma_{ij}$ is
a random interaction between instance $i$ and algorithm $j$,
and $\varepsilon_{ijk}$ is a random error for replication $k$ of algorithm $j$
on instance $i$. We assume that the $\alpha_j$'s are fixed effects
such that $\sum_{j=1}^{h} \alpha_j = 0$ and that the random elements $\tau_i$
are i.i.d. $\mathcal{N}(0, \sigma_\tau^2)$, $\gamma_{ij}$ are i.i.d. $\mathcal{N}(0, \sigma_\gamma^2)$, $\varepsilon_{ijk}$ are i.i.d.
$\mathcal{N}(0, \sigma^2)$, and $\tau_i$, $\gamma_{ij}$, and $\varepsilon_{ijk}$ are mutually independent
random variables. Similarly to (56.6) the conditional
distribution of the performance measure given the
instance and the instance-algorithm interaction is given
by


$$Y_{ijk} \mid \tau_i, \gamma_{ij} \sim \mathcal{N}(\mu + \alpha_j + \tau_i + \gamma_{ij}, \sigma^2) , \tag{56.16}$$


with $i = 1,\dots,q$, $j = 1,\dots,h$, and $k = 1,\dots,r$. The
marginal model reads (after integrating out the random
effects $\tau_i$ and $\gamma_{ij}$):


$$Y_{ijk} \sim \mathcal{N}(\mu + \alpha_j, \sigma^2 + \sigma_\tau^2 + \sigma_\gamma^2) . \tag{56.17}$$
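
The marginal model (56.17) can be checked by simulation. The following Python sketch draws observations from the mixed model (56.15) and compares the empirical variance of the centered responses with the sum of the three variance components. The function name, the parameter values, and the number of instances are illustrative assumptions and are not taken from the MAMP experiments.

```python
import random
import statistics


def simulate_mixed_model(mu, alpha, var_tau, var_gamma, var_eps, q, r, seed=0):
    """Draw Y_ijk = mu + alpha_j + tau_i + gamma_ij + eps_ijk.

    q instances, len(alpha) algorithms, r replications per cell.
    Returns a list of (i, j, k, y) tuples. All names are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(q):
        tau_i = rng.gauss(0.0, var_tau ** 0.5)            # random instance effect
        for j, alpha_j in enumerate(alpha):
            gamma_ij = rng.gauss(0.0, var_gamma ** 0.5)   # random interaction
            for k in range(r):
                eps = rng.gauss(0.0, var_eps ** 0.5)      # replication error
                rows.append((i, j, k, mu + alpha_j + tau_i + gamma_ij + eps))
    return rows


# Illustrative values (assumptions); the fixed effects sum to zero.
mu = -6.0
alpha = [0.6, 0.7, -0.7, -0.6]
var_tau, var_gamma, var_eps = 0.6, 0.3, 4.7

rows = simulate_mixed_model(mu, alpha, var_tau, var_gamma, var_eps, q=2000, r=5)

# Subtract the marginal mean mu + alpha_j, then estimate the marginal variance.
centered = [y - (mu + alpha[j]) for (_, j, _, y) in rows]
marginal_var = statistics.pvariance(centered, mu=0.0)

print("empirical marginal variance:", round(marginal_var, 2))
print("sum of variance components: ", var_tau + var_gamma + var_eps)
```

With many instances, the empirical value should lie close to the sum of the variance components, which is what (56.17) states.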


Based on these statistical assumptions, hypothesis tests
can be performed about fixed and random factor effects.
Using the mixed model (56.16), we are interested in
testing whether there is a difference between the factor
level means $\mu + \alpha_j$ ($j = 1,\dots,h$). The hypotheses
for testing the fixed effects can be formulated as


$$H_0: \alpha_j = 0 \ \forall j \quad \text{against} \quad H_1: \exists\, \alpha_j \neq 0 . \tag{56.18}$$


Regarding random effects, tests about particular levels
are useless. This is similar to the random-effects model
(56.8). Again, we perform tests on the variance components
$\sigma_\tau^2$ and $\sigma_\gamma^2$ instead. These can be formulated as
follows

$$\begin{aligned}
H_0: \sigma_\tau^2 = 0 \quad &\text{and} \quad H_1: \sigma_\tau^2 > 0 , \\
H_0: \sigma_\gamma^2 = 0 \quad &\text{and} \quad H_1: \sigma_\gamma^2 > 0 ,
\end{aligned} \tag{56.19}$$


respectively. If all treatment (problem instances) combinations
have the same number of observations, i.e.,
if the design is balanced, the test statistics for these
hypotheses are ratios of mean squares that are chosen such
that the expected mean squares of the numerator differs
from the expected mean squares of the denominator
only by the variance components of the random factor
under test. Chiarandini and Goegebeur [56.3] present
the resulting analysis of variance, which is shown in
Table 56.3.


Table 56.3 Expected mean squares and consequent appropriate test statistics for a mixed two-factor model with h fixed
factors, q random factors, and r repeats (after [56.3])

Effects        Mean squares  Df              Expected mean squares                                                  Test statistics
Fixed factor   MSA           h - 1           $\sigma^2 + r\sigma_\gamma^2 + rq \sum_{j=1}^{h} \alpha_j^2 / (h-1)$   MSA/MSAB
Random factor  MSB           q - 1           $\sigma^2 + r\sigma_\gamma^2 + rh\sigma_\tau^2$                        MSB/MSAB
Interaction    MSAB          (h - 1)(q - 1)  $\sigma^2 + r\sigma_\gamma^2$                                          MSAB/MSE
Error          MSE           hq(r - 1)       $\sigma^2$


Table 56.4 ANOVA for the MAMP case

Mean squares  Factors        Df   Sum Sq   Mean Sq  F value  Pr(>F)
MSA           objreco          3   154.59    51.53    11.05  0.0000
MSB           fSeed            8   251.79    31.47     6.75  0.0000
MSAB          objreco:fSeed   24   185.60     7.73     1.66  0.0288
MSE           Residuals      324  1511.27     4.66


ANOVA Model Building
The ANOVA table for the experiments from the MAMP
case study is shown in Table 56.4. Equating the observed
mean squares in the lines of the ANOVA table to
their expected values and solving for the variance
components leads to the following equations [56.2]


$$\hat{\sigma}_\tau^2 = \frac{\mathrm{MSB} - \mathrm{MSAB}}{rh} = 0.593502 ,$$

$$\hat{\sigma}_\gamma^2 = \frac{\mathrm{MSAB} - \mathrm{MSE}}{r} = 0.306907 ,$$

$$\hat{\sigma}^2 = \mathrm{MSE} = 4.664423 .$$


Next, we will compare these results to the REML-based
analysis of the mixed model.


REML Model Building
We have specified sum contrasts instead of the default
treatment contrasts used in lmer(). Again, fSeed represents
the problem instance, whereas the algorithm
instance $\alpha_j$, $j = 1,\dots,4$, is represented by objreco.


Linear mixed model fit by REML
Formula: yLog ~ objreco + (1 | fSeed)
                + (1 | fSeed:objreco)

Random effects:
 Groups        Name        Variance Std.Dev.
 fSeed:objreco (Intercept) 0.30691  0.55399
 fSeed         (Intercept) 0.59351  0.77039
 Residual                  4.66442  2.15973

Number of obs: 360,
groups: fSeed:objreco, 36; fSeed, 9

Fixed effects:
            Estimate Std. Error t value
(Intercept)  -6.0222     0.2956 -20.370
objreco1      0.6176     0.2539   2.433
objreco2      0.6918     0.2539   2.725
objreco3     -0.6671     0.2539  -2.628


As can be seen from the Random effects section
of the REML model output, the estimated variances
for the problem instance and the instance-algorithm interaction
random effects are $\hat{\sigma}_\tau^2 = 0.59351$ and $\hat{\sigma}_\gamma^2 = 0.30691$,
respectively. These values agree with the ANOVA-based
estimates. The Fixed effects section presents
the estimates of the fixed effects model parameters, i.e.,
objreco.
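
The ANOVA-based variance component estimates can be reproduced from the sums of squares in Table 56.4. The sketch below is a minimal illustration in Python: the function name is hypothetical, r = 10 replications is derived from the 360 observations in 36 instance-algorithm cells reported in the REML output, and the residual mean square is taken from the text.

```python
def anova_variance_components(ms_b, ms_ab, ms_e, r, h):
    """Method-of-moments estimates for the balanced two-factor mixed model.

    ms_b  : mean square of the random factor (problem instance)
    ms_ab : mean square of the interaction
    ms_e  : residual mean square
    r     : replications per cell, h : levels of the fixed factor
    """
    var_tau = (ms_b - ms_ab) / (r * h)   # instance variance
    var_gamma = (ms_ab - ms_e) / r       # interaction variance
    var_eps = ms_e                       # residual variance
    return var_tau, var_gamma, var_eps


# Values from Table 56.4 (sum of squares divided by degrees of freedom).
ms_b = 251.79 / 8
ms_ab = 185.60 / 24
ms_e = 4.664423          # residual mean square as quoted in the text
r = 360 // 36            # observations per instance-algorithm cell
h = 4                    # recombination operator levels

var_tau, var_gamma, var_eps = anova_variance_components(ms_b, ms_ab, ms_e, r, h)
print(round(var_tau, 4), round(var_gamma, 4), round(var_eps, 4))
```

Up to rounding of the tabulated sums of squares, the results match both the ANOVA estimates and the variances in the Random effects section of the REML output.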


MAMP-3 Validation of the Model Assumptions 
Again, a check of the diagnostic plots (Fig. 56.7) reveals 
that a log transformation of the response improves the 
model adequacy. 


MAMP-4a Hypothesis Testing: Random Effects 
We will consider random effects first. Regarding problem
instances, tests about levels are meaningless. Hence,
we perform tests about the variance components $\sigma_\tau^2$
and $\sigma_\gamma^2$, which were presented in (56.19). First, we
test the null hypothesis, which states that the components
of the random effects are zero. Based on
the ANOVA from Table 56.3, we obtain the values
for the MAMP case that are shown in Table 56.4.
The values reveal that there are main factor effects
(fixed and random), but no significant interaction
effects.




[Figure 56.7 plots: (a) and (b) Q-Q plots for residuals; horizontal axis: theoretical quantiles; vertical axis: sample quantiles]


Fig. 56.7a,b (a) Q-Q plot of the residuals for raw data. (b) Q-Q plot for the log-transformed responses 


Alternatively, we can compute the likelihood ratios 
of models with and without the factors under observa- 
tion. 


Data: mamp.df
Models:
mamp.lmer2: yLog ~ objreco + (1 | fSeed)
mamp.lmer3: yLog ~ objreco + (1 | fSeed)
                 + (1 | fSeed:objreco)
           Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
mamp.lmer2  6 1616.7 1640.0 -802.35
mamp.lmer3  7 1616.6 1643.8 -801.31 2.0929      1      0.148


These tests indicate that there are also no significant 
instance-algorithm interactions. Additional likelihood- 
ratio tests show that the fixed factor and random factor 
effects are significant. 
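
The likelihood-ratio statistic for the interaction term can be recomputed from the two log-likelihoods in the model comparison output. The following Python sketch is an illustration under the usual chi-square reference distribution with one degree of freedom; the function name is hypothetical. Because the printed log-likelihoods are rounded, the statistic differs slightly from the reported value of 2.0929.

```python
import math


def lrt_one_df(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic and chi-square(1) upper-tail p value.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    chisq = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, p_value


# Log-likelihoods of the models without and with the interaction term.
chisq, p_value = lrt_one_df(-802.35, -801.31)
print(round(chisq, 2), round(p_value, 3))

# The same p value formula applied to the reported statistic.
p_reported = math.erfc(math.sqrt(2.0929 / 2.0))
print(round(p_reported, 3))
```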


MAMP-4b Hypothesis Testing: Fixed Factor Effects
Regarding fixed factors, we are interested in testing for
differences in the factor level means $\mu + \alpha_j$. These tests
were formulated in (56.18), i.e., we are testing $H_0$: all
$\alpha_j$ are equal to 0 versus $H_1$: at least one $\alpha_j \neq 0$. Here,
we use the test statistic from [56.2, p. 523] for testing
that the means of the fixed factor effects are equal,
i.e., that $H_0$ is true:


$$F_0 = \frac{\mathrm{MSA}}{\mathrm{MSAB}} = \frac{154.59/3}{185.6/24} = 6.663362 ,$$


with values taken from Table 56.4. The reference
distribution is $F_{h-1,(h-1)(q-1)}$. We calculate the p value
for the test on the fixed-effect term. The p value
obtained is 0.002, hence the results collected indicate that
the factor recombination (objreco) has a statistically
significant impact on the performance of the algorithm.
Using sum contrasts implies that $\sum_j \alpha_j = 0$. The
point estimates for the mean algorithm performance
with the j-th fixed factor setting can be obtained by
$\hat{\mu}_j = \hat{\mu} + \hat{\alpha}_j$. The fixed factor effects can be estimated
in the mixed model as


$$\hat{\mu} = \bar{y}_{\cdot\cdot\cdot} , \qquad \hat{\alpha}_j = \bar{y}_{\cdot j \cdot} - \bar{y}_{\cdot\cdot\cdot} ,$$


which results in the following estimates: $\hat{\alpha}_1 = 0.6175519$,
$\hat{\alpha}_2 = 0.6918047$, $\hat{\alpha}_3 = -0.6671266$, and
$\hat{\alpha}_4 = -0.6423659$.

The same estimates were obtained with the REML
analysis as can be seen from the REML model output
in Sect. 56.5.3. The corresponding fixed effects are
shown in the Fixed effects section of the REML
output. For example, we obtain the following value:
objreco1 $= \hat{\alpha}_1 = 0.6176$.


[Figure 56.8 plot: one interval per recombination operator, labeled 1 to 4]

Fig. 56.8 Paired comparison plots. Results from four ES
instances with different recombination operators are shown
in this plot
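
The F statistic and the p value for the fixed-effect test can be verified with a few lines of Python. This sketch uses only the standard library: the upper tail of the F distribution is written as a regularized incomplete beta function and evaluated by Simpson integration. The function name and the number of integration steps are assumptions made for this illustration; a statistics package would normally provide this computation.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=20000):
    """P(F > f_value) for an F(df1, df2) distribution, assuming df2 >= 2.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    where I_x is the regularized incomplete beta function. The integral
    is computed with the composite Simpson rule (steps must be even).
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    width = x / steps
    total = integrand(0.0) + integrand(x)
    for n in range(1, steps):
        total += (4.0 if n % 2 else 2.0) * integrand(n * width)
    incomplete_beta = total * width / 3.0

    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return incomplete_beta / math.exp(log_beta)


# Mean squares from Table 56.4 (sum of squares / degrees of freedom).
ms_a = 154.59 / 3
ms_ab = 185.60 / 24
f0 = ms_a / ms_ab

h, q = 4, 9  # recombination operators and problem instances
p_value = f_upper_tail(f0, h - 1, (h - 1) * (q - 1))
print(round(f0, 6), round(p_value, 3))
```

Note that the fixed factor is tested against the interaction mean square MSAB, not against the residual mean square, which is why the value of $F_0$ differs from the F value listed for objreco in Table 56.4.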


MAMP-5 Confidence Intervals and Prediction
We generate paired comparisons plots, which are
based on confidence intervals. The wrapper function
intervals() from Chiarandini and Goegebeur [56.3]
was used for visualizing these confidence
intervals as shown in Fig. 56.8. When intervals overlap
we conclude that there is no significant difference.
Here, we can conclude that the recombination operators
(1) and (2) show a similar performance,
whereas performances between (3) and (2) are
different.

Intermediate recombination of the object variables,
i.e., (3) and (4), results in a significant improvement
of the performance.


56.6 Summary and Outlook


In order to answer question (Q-1), we propose an approach
to generate natural problem classes, which are
based on real-world data. If no such data are available,
artificial problem generators such as MSG can be used.
Since our approach uses a model, say M, to generate
new problem instances, one conceptual problem arises:
this approach is not applicable, if the final goal is the
determination of a model for the data, because M is per
definition the best model in this case, and the search for
good models will result in M. However, there is a simple
solution to this problem. In this case, the feature
extraction and model generation should be skipped and
the original data should be modified by adding some
noise or performing transformations on the data.
Nevertheless, if applicable, the model-based approach is
preferred, because it sheds some light on the underlying
problem structure.

The model-based approach can be used to generate
infinitely many test-problem instances. Instead of using
a fixed number of problem instances, we propose:


1. Using randomly generated problem instances
2. Treating the problem instance as a random factor.


Algorithms with different parameterizations are
tested on this set of randomly generated problem
instances. This experimental setup requires modified
statistics, so-called random-effects models or mixed
models. This approach may lead to objective evaluations
and comparisons. If normality assumptions are
met, confidence intervals can be determined, which
forecast the behavior of an algorithm on unseen problem
instances. Furthermore, results can be generalized
in real-world settings. This gives an answer to question
(Q-2).

In order to demonstrate the applicability of our approach,
the performance of an evolution strategy was
analyzed. The first SAMP example illustrates that the
selection of the problem instance from the problem
class $\Pi_{\mathrm{MSG}}$ has no significant impact on the
performance of the optimization algorithm. Furthermore,
confidence intervals, which can be used to predict the
performance of the algorithm on a problem class, were
determined. The MAMP case exemplifies how to analyze
the effect of different algorithm parameter settings
on the performance. Four variants of the recombination
operator and nine problem instances were used.
The analysis reveals that the choice of the recombination
operator has a significant effect on the algorithm's
performance: the performance of the algorithm differs
with different recombination operators. Intermediate
recombination of the object variables results in a
performance improvement. We demonstrated that the problem
instances contribute significantly to the variability in
the response and that there is no significant
instance-algorithm interaction.

The software that was used in this study will be inte- 
grated into the R package SPOT (sequential parameter 
optimization toolbox) [56.19]. 




References 


56.1 P.J. Brockwell, R.A. Davis: Introduction to Time Series and Forecasting (Springer, New York 2002)

56.2 D.C. Montgomery: Design and Analysis of Experiments (Wiley, New York 2001)

56.3 M. Chiarandini, Y. Goegebeur: Mixed models for the analysis of optimization algorithms. In: Experimental Methods for the Analysis of Optimization Algorithms, ed. by T. Bartz-Beielstein, M. Chiarandini, L. Paquete, M. Preuss (Springer, Berlin Heidelberg 2010) pp. 225-264, preliminary version available as Tech. Rep. DMF-2009-07-001 at The Danish Mathematical Society

56.4 C.C. McGeoch: Toward an experimental method for algorithm simulation, INFORMS J. Comput. 8(1), 1-15 (1996)

56.5 M. Birattari: On the Estimation of the Expected Performance of a Metaheuristic on a Class of Instances, Tech. Rep. (IRIDIA, Bruxelles 2004)

56.6 M. Gallagher, B. Yuan: A general-purpose tunable landscape generator, IEEE Trans. Evol. Comput. 10(5), 590-603 (2006)

56.7 I. Castiñeiras, M.D. Cauwer, B. O'Sullivan: Weibull-based benchmarks for bin packing, Lect. Notes Comput. Sci. 7514, 207-222 (2012)

56.8 G.E.P. Box, G.M. Jenkins, G.C. Reinsel: Time Series Analysis, Forecasting and Control (Holden-Day, San Francisco 1976)

56.9 T. Bartz-Beielstein: Beyond Particular Problem Instances: How to Create Meaningful and Generalizable Results, Tech. Rep. TR 03/2012 (Cologne University of Applied Sciences, 2012)

56.10 T. Bartz-Beielstein, M. Friese, B. Naujoks, M. Zaefferer: SPOT applied to non-stochastic optimization problems - An experimental study, Genet. Evol. Comput. Conf. (GECCO 2012) (ACM, Philadelphia 2012)

56.11 M. Zaefferer: Optimization and Empirical Analysis of an Event Detection Software for Water Quality Monitoring, Master Thesis (University of Applied Sciences, Cologne 2012)

56.12 O. Flasch, T. Bartz-Beielstein, D. Bicker, W. Kantschik, C. von Strachwitz: Results of the GECCO 2011 Industrial Challenge: Optimizing Foreign Exchange Trading Strategies. CIOP Tech. Rep. 10/11, Res. Center CIOP (Cologne University of Applied Science, Cologne 2011)

56.13 E. Vladislavleva: Model-based Problem Solving through Symbolic Regression via Pareto Genetic Programming, Ph.D. Thesis (Tilburg University, Tilburg 2008)

56.14 H.-G. Beyer, H.-P. Schwefel: Evolution strategies - A comprehensive introduction, Nat. Comput. 1, 3-52 (2002)

56.15 H.-P. Schwefel: Numerical Optimization of Computer Models (Wiley, Chichester 1981)

56.16 T. Bartz-Beielstein: Experimental Research in Evolutionary Computation - The New Experimentalism, Natural Computing Series (Springer, Berlin, Heidelberg, New York 2006)

56.17 T. Bartz-Beielstein, M. Preuss: Automatic and interactive tuning of algorithms, Proc. 13th Annu. Conf. Companion Genet. Evol. Comput. (GECCO) (ACM, New York 2011) pp. 1361-1380

56.18 J. Pinheiro, D. Bates: Mixed-Effects Models in S and S-PLUS (Springer, New York 2000)

56.19 T. Bartz-Beielstein, M. Zaefferer: A Gentle Introduction to Sequential Parameter Optimization, Tech. Rep. TR 01/2012 CIplus Bd. 1/2012 (Cologne University of Applied Sciences, Cologne 2012)




57. Computational Intelligence 
in Industrial Applications 


Ekaterina Vladislavleva, Guido Smits, Mark Kotanchek 


In this chapter, we review the progress and the
impact of computational intelligence for industrial
applications sampled from the last 10 years
of our personal careers and areas of research (all
authors of this chapter do computational modeling
for a living). This chapter is structured as
follows. Section 57.2 introduces a classification of
data-driven predictive analytics problems into
three groups based on the goals and the information
content of the data. Section 57.3 briefly
covers most frequently used methods for predictive
modeling and compares them in the context of
available a priori knowledge and required execution
time. Section 57.4 focuses on the importance of
good workflows for successful predictive analytics
projects. Section 57.5 provides several examples of
such workflows. Section 57.6 concludes the chapter.

57.1 Intelligence and Computation
57.2 Computational Modeling for Predictive Analytics
     57.2.1 Business Analytics
     57.2.2 Process Analytics
     57.2.3 Research Analytics
57.3 Methods
57.4 Workflows
     57.4.1 Data Collection and Adaptation
     57.4.2 Model Development
     57.4.3 Problem Analysis and Reduction
57.5 Examples
     57.5.1 Hybrid Intelligent Systems for Process Analytics
     57.5.2 Symbolic-Regression Workflow for Process Analytics
     57.5.3 Sensory Evaluation Workflow for Research Analytics
57.6 Conclusions
References


57.1 Intelligence and Computation 


Developments in computational intelligence (CI) are
driven by real-world applications. Over the years a lot
of CI has become ubiquitous to the average user and
is deeply interwoven into the way modern design,
research and development is done.

In our view, CI is human intelligence assisted and
(dramatically) enhanced by computational modeling.
Intelligence is the capability to predict, and, in theory,
there are two directions to get to prediction through
computing: data-driven modeling and first principle
modeling. In reality though, since even fundamental
models and theories have to be validated by data,
everything is data driven. For this reason, from now on
we will focus on data-driven computational modeling,
and say that it exists to enhance predictive capabilities
of the human or business. While prediction is the
ultimate goal and computational modeling is the means to
achieve this goal, we will use concepts of predictive
analytics and (data-driven) computational modeling as if
they were the same.

Computational modeling methods allow us to
generate various hypotheses about a specific problem based
on observations in an objective way. The mental
models that the scientists develop during this process help
them to filter through these hypotheses and come up
with new experiments that either support or falsify
some of the previous hypotheses or lead to new ones.
This process supports the scientific method and
significantly accelerates technological development and
innovation.




There are many examples of new computational 
methods empowering problem solving in the areas of 
material science, energy management, plant optimiza- 
tion, sensory evaluation science, broadband technology, 
social science (economic modeling), infectious disease 
prevention, etc. And while success in many cases is un- 
deniable, two main challenges still remain. 

First, there is an education gap to bridge before 
modern CI techniques can reach their full potential, are 
widely accepted, and become as natural as performing 
experiments in the lab. While many engineering edu- 
cational programs are embracing these techniques and 
help raise awareness of the useful methods in data- 
driven modeling and computational statistics, the ma- 
jority of programs in pure sciences tend to ignore them 
for the most part. There is still a considerable (psycho- 
logical, cognitive, educational) barrier for experimental 
scientists — biologists, chemists, physicists, computer 
scientists — to fully exploit the potential of CI. People 
will happily save an hour of computing time by spend- 


ing an additional week in the lab, while in some cases it 
makes much more sense to spend a week of computing 
time to spare one experiment in the lab (consider, e.g., 
an expensive car crash-test). We appeal to educational 
programs to nurture the interest in computation among 
graduates and facilitate the joint projects of academia 
with industry targeted at the use and further develop- 
ment of computational intelligence methods for real- 
world problems. 

Second, there is a development gap in the pro- 
duction of scalable off-the-shelf CI algorithms. The 
parallelization bottleneck seems to affect most CI 
methods when they are executed on massively par- 
allel architectures. The fact that computational ad- 
vances in hardware (exa-scale computing) happen at 
a much faster pace than advances in the design of 
scalable CI algorithms raises the question: Up to 
which moment can we get more intelligence, i.e., 
more predictive capability, with more computational 
power? 


57.2 Computational Modeling for Predictive Analytics 


While many barriers remain in improving the incorpo- 
ration of CI in classical education, in solving the new 
(previously unthinkable) challenges, and in further in- 
novating CI technology, the current time is a perfect 
moment to make this happen. 

First of all, the realization of the indispensability of
CI across all industries and all sciences grows, as does
the number of required CI practitioners (computational
statisticians, data scientists, modelers). The report of
Manyika et al. on Big Data [57.1] predicts a potential
gap of 50-60% (300 000 people) in demand relative to
the supply of well-educated analytical talent in the USA
by 2018. Data science and big data have grown in the
last decade into buzzwords omnipresent
in scientific magazines, technology reviews,
and business offerings.

While we are happy that the attention of the aver- 
age user is being focused on the importance and impact 
of computational modeling, we are also concerned with 
the fact that too many details are omitted and almost 
everything (business strategies, CI methods, targets for 
predictive analytics, etc.) gets thrown onto one pile. 

While Big Data is occupying the minds of future 
engineers, data scientists, and business majors as a next 
big thing to watch and a synonym of predictive analyt- 
ics, we want to balance the story some more and provide 
a full picture of what we think constitutes predictive an- 


alytics by computational modeling. While business and 
industry is striving to become data driven these days, 
it seeks CI strategies to compete, innovate, and capture 
value. Success and impact of CI will be generated only 
if the right strategies are used in the right place. 

Success of CI in industry will be awarded to methods
that create impact, measured by the new level of
understanding and knowledge attained and in units of
dollars. In Fig. 57.1 we sketch a relation between the
degree of intelligence and the level of competitive
advantage, taken from [57.2]. Further on, we will use the terms
predictive analytics and computational modeling (for
predictive analytics to sustain human intelligence) as if
they were the same.

We distinguish three pillars of computational mod- 
eling for predictive analytics: business analytics applied 
to big data (millions to billions of records, dozens 
to hundreds of variables), process analytics applied to 
medium-sized data (tens of thousands of records, hun- 
dreds of variables), and research analytics applied to 
precious data (tens to thousands of records, dozens to 
hundreds to thousands of variables) (Fig. 57.2). 


57.2.1 Business Analytics 


Business analytics is the part of predictive analytics 
associated with big data. In recent years, other sciences
also created big data problems, so the field could
be called big data analytics. The distinguishing fea- 
ture of business analytics is the fact that it is used 
to inspect big data streams to provide a quick and 
simple analysis with immediate value reliably and con- 
sistently. Because of the size, big data already offers 
tremendous challenges in stages preceding analytics — 
in storage, retrieval, and visualization. These imply that 
the predictive goals can only be modest, except when 
big computing facilities and specialized data bases are 
available (like it happens in environmental and biolog- 
ical research, Internet search, smart grids, etc.). Main 
goals here are: 


• Visualization (e.g., dashboards).

• Recommendation (e.g., studying customer habits
and preferences to recommend a new suitable product
item). Recommendation uses network analysis
to select relevant or similar items.

• Identification of (simple) trends to enhance customer
experience and increase surplus. Trends are
typically found using time series analysis.

• Binary classification to distinguish out-of-the-ordinary
data points from the prototypes following the
trends (credit risk analysis, fraud detection, spam
identification).


Because of the memory limitations, the challenge 
in business analytics is to quickly give an answer to 
simple questions with the main focus on algorithms for 
in- and out-of-memory computation and visualization. 
Industries benefitting most from business analytics are 
retail, banking, insurance, health-care, telecommunica- 
tions, and social networks. 

For example, at large multinational manufactur- 
ing companies, business analytics predominantly re- 
volves around the multivariate forecasting of sup- 
ply and demand. The expected prices and volumes 
of feedstocks and raw materials as well as the ex- 
pected demand for various products are important 
to minimize risk and optimize production as well 
as the supply chain. Classical statistical forecasting 
techniques are the main workhorse for this area and 
the main challenges consist of being able to gather 
the required data, dealing with possibly large num- 
bers of candidate inputs and outputs for the models 
and properly dealing with the hierarchies that exist, 
e.g., products-markets-industry resulting in an explo- 
sion in the number of models that have to be built and 
maintained. 


[Figure 57.1 diagram: competitive advantage (vertical axis) against degrees of intelligence (horizontal axis). Access and reporting: standard reports (What happened?), ad hoc reports (How many? How often? Where?), query/drill down (Where exactly is the problem?), alerts (What actions are needed?). Analytics: statistical analysis (Why is this happening?), forecasting/extrapolation (What if these trends continue?), predictive modeling (What will happen next?), optimization (What's the best that can happen?)]


Fig. 57.1 Davenport and Harris [57.2] have wonderfully adapted
the graphics from SAS software. The graph above eloquently
explains why to use predictive modeling to excel, compete, and
capture value


[Figure 57.2 diagram: predictive modeling with three components. Business analytics: big data, high data redundancy, immediate value. Medium-sized data: high deployment constraints, immediate value. Precious data, context matters: customized solutions, long-term value]


Fig. 57.2 Predictive modeling has three components: business
analytics, process analytics, and research analytics


57.2.2 Process Analytics 


Process analytics exploits medium-size data to gener- 
ate time-sensitive prediction of an industrial process 
(e.g., manufacturing, process monitoring, remote sens- 
ing, etc.) with immediate value. 

Process analytics models must be very robust, sim- 
ple (mostly linear), and concise to be deployed in real 
industrial processes. 

This well understood and probably most conserva- 
tive area of predictive analytics has experienced a big 
change in the last years. A couple of decades ago, pro- 
cess optimization and control groups had more people 
and less pressure. Nowadays, pressure for integrating 
production workflows has increased together with the
need to meet tighter quality specifications, much tighter
emission thresholds, requirements to reduce produc- 
tion, operation, and energy costs, and to maximize 
throughput. The sensor side has changed as well: sensors
have become much more sophisticated and much more
abundant. Human intervention has also decreased
due to the (sometimes exaggerated) drive toward automation
and cost reduction.

All these factors have dramatically increased the de- 
mand for reliable optimization and control. In general, 
process analytics models must be very robust, simple 
(mostly linear) and concise to be deployed in real in- 
dustrial processes. The main challenge for this industry 
is to integrate more sophisticated models and adopt new 
computational methods for process analytics to adapt to 
the changing world of new requirements while main- 
taining robustness over a wide process range. At the 
time when this chapter was written data-driven mod- 
eling was still considered exotic for the field of process 
analytics, and model deployment is still heavily con- 
strained. 

The main goals in process analytics are process 
forecasting and process optimization and control. 

The challenge in process forecasting is to build
models that balance interpretability against
long-term (real-time) predictive
power. The technological challenge for successful CI
methods is the capability to identify driving features
in a large set of correlated features. For example,
think of a problem of predicting the quality of a
manufactured plastic using the smallest subset of available
factors controlling the production process (pressures,
temperatures, flows). Robust feature selection is
as important as good prediction accuracy: models
that are too bulky will never be accepted by process
engineers.

The main challenge in process control is the mul- 
tiobjective nature of control specifications and subse- 
quent optimization problems. Consider an example of 
manufacturing and wholesaling thin sheets of metal. 
The thickness of the sheet is an important quality 
characteristic that should not fall below a predefined 
minimum, or the product will be considered off-spec.
Suppose that the processing conditions make the thickness
variability high (sheets are several meters wide and tens of
meters long, rolled at high speeds and high temperatures),
that the penalty for off-spec material is high, and that the
costs for raw steel are also high. The manufacturer then
faces a delicate problem: making the sheet thicker than the
allowed minimum to keep the clients happy, but not so thick
that production costs rise. These competing


objectives usually require a multiobjective approach to 
process optimization. 
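
The sheet-thickness trade-off can be made concrete with a small numerical sketch. The Python code below assumes, purely for illustration, that the thickness is normally distributed around a chosen target, and it treats the off-spec probability and the material cost as two objectives to be minimized. All numbers (minimum thickness, standard deviation, cost and penalty values) and all function names are hypothetical.

```python
import math


def off_spec_probability(target, minimum, sigma):
    """P(thickness < minimum) for thickness ~ Normal(target, sigma^2)."""
    z = (minimum - target) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z


def pareto_front(points):
    """Return the points not dominated by another point (both minimized)."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical process data: thickness in mm, cost per mm of target thickness.
minimum, sigma = 2.0, 0.05
steel_cost, penalty = 100.0, 50.0

targets = [2.0 + 0.01 * k for k in range(21)]
points = [
    (off_spec_probability(t, minimum, sigma), steel_cost * t) for t in targets
]
front = pareto_front(points)

# One possible scalarization: material cost plus expected off-spec penalty.
best = min(
    targets,
    key=lambda t: steel_cost * t + penalty * off_spec_probability(t, minimum, sigma),
)
print(len(front), round(best, 2))
```

Every candidate target lies on the Pareto front here, because a thicker sheet always lowers the off-spec risk and always raises the material cost. Picking a single operating point therefore requires an explicit weighting of the two objectives, such as the penalty-based sum used in the last step.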

Process analytics relies on a rich data set coming 
from the many sensors in a typical plant. Mature plat- 
forms exist that store this, often high-frequency, sensor 
data in databases and plant information systems. The 
primary intent for this data is to run the various plant 
control and quality control systems but archived data 
are often available for predictive modeling as well. The 
use of models that predict the aging and lifetime of cat- 
alysts and the associated changes in optimal settings for 
the plant are good examples. 

Another example is the building of the so-called 
soft sensors that link difficult measurements, such as, 
e.g., grab samples that need to be brought to the lab 
for analysis with results only becoming available af- 
ter some time to some of the easier high-frequency 
measurements, such as, e.g., temperatures, flows, and 
pressures. These models then serve as substitutes for 
the difficult measurements at a high frequency and 
can be calibrated if needed when the slow measure- 
ments become available. There are also many opportu- 
nities to use the demand and supply forecasting models 
from the business analytics side to optimize produc- 
tion and product mix that is most optimal for a given 
scenario. As an extreme example, it may be cheaper 
to shutdown a plant for a while vs continued pro- 
duction when demand is forecasted to be very weak. 
The level and amount of coupling that is possible be- 
tween demand-supply forecasts and actual production 
can vary significantly and depends on many factors, 
but it is clear that much more is possible in this 
area. 
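
The soft-sensor idea described above can be sketched in a few lines of Python. The example fits a straight line that maps a fast measurement (here a temperature) to a slow lab measurement and then adjusts a bias term whenever a new lab result arrives. The class and function names, the data, and the simple bias-update rule are assumptions chosen for illustration; practical soft sensors typically use several inputs and more careful calibration schemes.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


class SoftSensor:
    """Predicts a slow lab measurement from a fast process measurement."""

    def __init__(self, intercept, slope):
        self.intercept = intercept
        self.slope = slope
        self.bias = 0.0  # correction learned from later lab samples

    def predict(self, temperature):
        return self.intercept + self.slope * temperature + self.bias

    def recalibrate(self, temperature, lab_value, gain=0.5):
        """Move the bias a fraction `gain` toward the latest lab residual."""
        residual = lab_value - self.predict(temperature)
        self.bias += gain * residual


# Hypothetical historical data: temperatures and matching lab results.
temperatures = [300.0, 310.0, 320.0, 330.0]
lab_values = [1.0, 1.5, 2.0, 2.5]

sensor = SoftSensor(*fit_line(temperatures, lab_values))
before = sensor.predict(315.0)

# A delayed lab sample shows that the process has drifted slightly.
sensor.recalibrate(315.0, lab_value=1.95)
after = sensor.predict(315.0)
print(round(before, 3), round(after, 3))
```

Between lab samples the model delivers predictions at the frequency of the fast measurement; each new lab value only shifts the bias, so the fitted relationship itself stays simple and easy to inspect.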

Examples of industries employing process analytics 
are manufacturing, chemical engineering, energy, envi- 
ronmental science. 


57.2.3 Research Analytics 


Research analytics is used to accelerate the devel- 
opment of new products and systems. This task is 
fundamentally different from all the ones mentioned 
previously as it is usually applied to small, com- 
plex, and precious data, is heavily dependent on 
problem context and provides long-term value without
immediate rewards. (By small we mean any
data set where the number of records is comparable
to or even smaller than the number of features.
In this way, gene expression data with thousands
of variables taken over dozens of individuals is
small.)




Research analytics provides very customized solu- 
tions and requires a close collaboration between ana- 
lysts/modelers and subject matter experts. 

Research analytics is by nature much less generic 
and becomes very dependent on the specific product 
that is being developed. In research, once you have pre- 
dictive analytics, then there is only a small step to make 
from optimization of existing products to the design of 
new ones. One example of a research analytics success 
story is the development of an application to predict 
the exact color of a plastic part based on the composition
of the colorants and the specific grade of plastic
being used, see [57.3]. Robust color prediction models
led to the capability of actually designing colorant
compositions in silico directly from customer specifica- 
tions. The models also provided the specifications that 
were necessary to even let the customer produce that 
part himself. 

How far one is able to take this depends on the fidelity
of the models as well as the quality of the data
that is available. Another example of research analytics
at work is the design of new coatings and catalysts
based on high-throughput experimentation, where all
the available data is used to build models on
the fly. These models are then used to design the next
experiments such that the information gain is maximized.
The requirements for the modeling process are
quite high because everything is embedded in a
high-throughput workflow, but the benefits are also huge.
Significant speedups in the total design time as well as
improvements in the performance of new products can be
achieved this way.

We stress that because research analytics is an
enhancement to human intelligence for the development
of new products and systems, the benefits of its
application scale proportionally with the size of the problem
and the impact of that particular product or system. For
big enough problems the benefits quickly get into the
hundreds of thousands to millions of dollars.

Research analytics can help drive innovation in all
industry segments, particularly in materials science,
formulation design, pharmaceuticals, engineering,
simulation-based optimization in research, bio-engineering,
healthcare, telecommunications, etc. In the coming
10 years, we will continue to see the trend of innovation
enabled to a large extent by predictive modeling.


57.3 Methods


Over many years of exercising process and research
analytics, we built up a practice of using predictive
modeling as the integration technology for real-world
problem solving. In the last 8 to 10 years, predictive
modeling for computational intelligence has evolved from
the solution of last resort to the mainstream approach
to industrial problem solving (prediction, control, and
optimization). It is a technology that glues together
fundamental modeling and domain expertise,
high-performance computing and computer science, empirical
modeling and mathematics: a heaven for an inquiring
mind and interdisciplinary enthusiast.

Predictive modeling is a bridge that connects theory
and facts (data) to enable insight and system
understanding. The theory for poorly understood problems
is often based on simplifying assumptions, on which
the fundamental models are built. The facts, or empirical
evidence, are often affected by high uncertainty and
a limited observability of the system's behavior.

Predictive modeling applied iteratively to a growing
set of facts tests the theory against the data and
extrapolates models built on the data to confirm or adjust
the theory until the theory and facts start to agree.
The validation always lies in the hands of a subject matter
expert who, in the case of success, accepts both the
theory and the designed predictive models as plausible
and interesting. While the real validation comes when
models are deployed and keep generating value, without
the first step of intriguing the subject matter expert
the project does not have a chance to succeed.

To clear any obstacles toward the acceptance of 
models by the domain expert the models should be: 


1. Interpretable
2. Parsimonious
3. Accurate
4. Extrapolative
5. Trustable, and
6. Robust.


In an industrial setting, the capability of having
a trustable prediction of the output within and outside
the training range is as important as interpretability and
the possibility of integrating information from first
principles, low maintenance and development costs with no
(or negligible) operator interference, robustness with
respect to the variability in data, and the ability to detect
novelties in data to attune itself toward changes in the
system's behavior.

There is no single technique producing models that 
are guaranteed to fulfill all of the requirements above, 
but rather there is a continuum of methods (and hybrids) 
offering different tradeoffs in these competing objec- 
tives. 

Commonly used predictive modeling techniques in- 
clude linear regression, and nonlinear regression [57.4], 
boosting, regression random forests [57.5], radial-ba- 
sis functions [57.6], neural networks [57.7], support 
vector machines (SVM) [57.8, 9], and symbolic regres- 
sion [57.10, 11] (see more in [57.12]). 

In Fig. 57.3, we place some of the most common 
methods for predictive modeling for process and re- 
search analytics in the objective space of development 
time versus the level of a priori knowledge about the 
problem. When identifying which methods to use, other
objectives (like interpretability and extrapolative
capability) must also be taken into account. The time axis is
depicted on a log scale, and the exact development time 
depends on implementation and a particular algorithm 
flavor. 

Support vector machines and ensemble-based neu- 
ral networks lose to linear, nonlinear, and regularized 
regression in interpretability, but have advantages for 
problems where little a priori information is known 
about the system, and no assumptions on model struc- 
tures can be made (see Fig. 57.3). 

Regression random forests, and symbolic regres- 
sion [57.13—15] have further advantages for problems 
where not only model structures but also the variable 
drivers (significant factors) are unknown. 


[Figure 57.3: vertical axis, a priori knowledge; horizontal axis, time. Linear regression and non-linear regression: variables are known, model structure is known. SVMs and NNs: variables are known, model structure is NOT known. Random forests and symbolic regression: variables are NOT known, model structure is NOT known.]

Fig. 57.3 Predictive modeling methods as competing
tradeoffs in development time versus the level of a priori
knowledge about the problem


Random forests proved to be robust and very ef- 
ficient for predicting the response within the training 
range and for identifying the most significant variables, 
but because they do not possess extrapolative properties 
they can only be used in problems where no extrapolation
is necessary. Recent studies [57.16] indicate
that variable selection information obtained by random
forests can lose meaning if correlated variables are
present in the data and affect the response differently.

In business analytics, when the speed of model
development is the main goal, linear regression and
regularized learning are the only remaining options. (Recent
developments in predictive modeling for big data
also focus on the feature generation problem,
where the set of original data variables is expanded
to a much larger set of new features, i.e., transformations
of the original variables, to which regularized linear
regression is applied, much as in support vector
regression.)
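
As a minimal sketch of feature generation followed by regularized linear regression, the example below expands a single input into three features and fits them by ridge regression. The feature map (constant, x, x squared), the choice of a ridge penalty, the penalty values, and all function names are assumptions made for illustration; note that this simple version also penalizes the constant term.

```python
def expand(x):
    """Hypothetical feature map: constant, x, and x squared."""
    return [1.0, x, x * x]


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def ridge_fit(xs, ys, lam):
    """Minimize squared error + lam * ||w||^2 on the expanded features."""
    rows = [expand(x) for x in xs]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(gram, rhs)


# Hypothetical data generated from y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]

unregularized = ridge_fit(xs, ys, lam=0.0)  # recovers the generating weights
shrunk = ridge_fit(xs, ys, lam=10.0)        # weights pulled toward zero
```

With many generated features the penalty is what keeps the fit stable; the model remains linear in the features, which is why development stays fast.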

In process analytics, when the driving input factors
are known, ensemble-based neural networks, support
vector regression, and ensemble-based symbolic
regression are the modeling alternatives.

If very little is known about the process or system,
experiments show correlations among input variables,
and concise interpretable models are required,
symbolic regression is the only resort,
which comes at the price of a higher development time
(Fig. 57.3).

We stress the importance of using ensembles of pre- 
dictive models irrespective of which modeling method 
is used. Ensemble disagreement used as a trustability 
measure defines the confidence of prediction and is cru- 
cial for reliable extrapolation. (It cannot be stressed 
enough that all prediction in a space of dimensionality 
above 3 is mostly extrapolation, even when evaluated 
inside the training range.) 
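
A minimal sketch of ensemble disagreement as a trustability measure follows. It assumes the ensemble prediction is the median of the member outputs and the disagreement is their standard deviation; both choices, as well as the three toy models, are illustrative assumptions rather than the definitions used in any particular tool.

```python
from statistics import median, pstdev


def ensemble_predict(models, x):
    """Return (prediction, disagreement) for one input point.

    prediction   -- median of the individual model outputs
    disagreement -- population standard deviation of those outputs
    """
    outputs = [model(x) for model in models]
    return median(outputs), pstdev(outputs)


# Three hypothetical models that agree closely on [0, 1] and diverge outside.
models = [
    lambda x: x,
    lambda x: x + 0.1 * x * (x - 1.0),
    lambda x: x - 0.1 * x * (x - 1.0),
]

inside = ensemble_predict(models, 0.5)   # within the training range
outside = ensemble_predict(models, 5.0)  # far outside the training range
```

The prediction alone looks equally confident in both cases; only the second number reveals that the members have stopped agreeing, which is the warning one wants when extrapolating.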

We deal mostly with process and research analytics.
In our experience, the aspect of trustability via
ensembles of global transparent models, coupled with
the massive algorithmic efficiency gains and the ability
to easily handle real-world data with spurious and
correlated inputs, has led to symbolic regression largely
replacing neural networks and support vector machines
in our industrial modeling. Our experience also is that
symbolic regression models tend to extrapolate well and
also provide a warning of that extrapolation.

The reason for symbolic regression being successful
for process and research analytics is the fact that all
real-world modeling problems we have seen up to now
contained only about a dozen relevant inputs (never more
than 25 variables, in most cases fewer than 10) which
were truly significantly related to the response. Because
symbolic regression searches for plausible models in
a space of all possible structures from the given set of
potential inputs and allowed functional transforms, the
computational complexity increases nonlinearly with
the dimensionality of the true design space. For this
reason, symbolic regression effortlessly identifies dozens
of driving variables among tens to hundreds of candidates
(albeit using hours of multicore computing time).
But it should not be used for problems where hundreds
of inputs are significantly related to the response and
should be filtered out of thousands of candidates. We
claim though that no methods are available to solve the
latter kind of problems because the necessary amount
of data capturing true input-output relationships will
never be collected.

Although tremendous progress has been made
over the past decade in terms of the efficiency and
quality of symbolic regression model development,
we also have made corresponding advances from
a holistic perspective encompassing the overall
modeling workflows from data collection through model
deployment.


57.4 Workflows


Although there is no universal solution for predictive
modeling and no one size fits all, especially for research
analytics, nothing is as important for a successful solution
as a good modeling workflow.

We would like to make a case for the utmost importance
of workflows and the need to nurture and actively
proliferate them through all CI projects. In the next
section, we give an example of how a successful workflow
developed in a project from flavor science could be
seamlessly applied to a project in video quality prediction.
And because predictive modeling for CI will soon
be used in nearly all industry segments and research
domains, we believe that it is the responsibility of CI
practitioners to facilitate innovation through proliferation
and popularization of (interpretable) workflows
allowing straightforward application in new domains.

The most general approach to practical predictive
modeling is depicted in Fig. 57.4.

We view this generic framework as an iterative feedback
loop between three stages of problem solving (just
as it usually happens in real-life applications):


1. Data generation, analysis and adaptation
2. Model development, and
3. Problem analysis and reduction.


An important observation is made in the Toward
2020 Science report edited by Emmott and Rison [57.17]:


What is surprising is that science largely looks at
data and models separately, and as a result, we miss
the principal challenge – the articulation of modelling
and experimentation. Put simply, models both
consume experimental data, in the form of the context
or parameters with which they are supplied, and
yield data in the form of the interpretations that are
the product of analysis or execution. Models themselves
embed assumptions about phenomena that
are subject of experimentation. The effectiveness of
modeling as a future scientific tool and the value of
data as a scientific resource are tied into precisely
how modelling and experimentation will be brought
together.


This is exactly the challenge of predictive modeling 
workflows — a holistic approach to bring together data, 
models, and problem analysis into one generic frame- 
work. Ultimately, we want to automate this iterative 
feedback loop over data analysis and generation, model 
development, and problem reduction as much as possi- 
ble, not in order to eliminate the expert, but in order to 
free as much thinking time for the expert as possible. 

This philosophical shift away from human replace- 
ment in the modeling workflow toward human aug- 
mentation has been very important in the last decade. 
A successful workflow must offer suites which mine the 
developed models to identify driving factors, variable 
combinations, and key variable transforms that lead to 
insight as well as robust prediction. 


57.4.1 Data Collection and Adaptation 


Very often, especially in big companies, and especially 
for process analytics, CI practitioners do not have ac- 
cess to data creation and experiment planning. This gap 
is a typical example of a situation, where multivariate 
data is given and there is no possibility to gather better 
sampled data. 




In other situations, there is a possibility to plan the 
experiments, and gather new observations of the re- 
sponse for desired combinations of input variables, but 
the assumption always is that these experiments are 
very expensive, i.e., require long computation, simulation,
or experimentation time. Such a situation is most
common in research analytics and meta modeling for 
the design and analysis of simulation experiments. 

The questions to ask at the data collection and adap- 
tation stage are: How to design experiments within the 
available timing and cost budget to optimally cover 
the design space (possibly containing spurious vari- 
ables)? How can available data and developed models 
guide design-space exploration in the next iterations? 
Is available data well sampled? Is it balanced? What 
is the information content of performed experiments? 
Is there redundancy in the data and how to minimize 
it? 


57.4.2 Model Development 


In model development, the focus is on automatic cre- 
ation of collections of diverse data-driven models that 
infer hidden dependencies on given data and pro- 
vide insight into the problem, process, or system in 
question. 


Irrespective of which modeling engines are used at
this stage, the questions on how to best generate,
evaluate, select, and validate models given particular data
features (size and dimensionality) are of great importance.
Model quality, in general, i.e., generalization,
interpretability, efficiency, trustworthiness, and robustness,
is the main focus for model analysis leading to the
next stage.


57.4.3 Problem Analysis and Reduction


The stage of problem analysis and reduction supposes
that developed models are carefully scrutinized, filtered,
and validated, to infer preliminary conclusions
on problem difficulty. The focus is on driving inputs,
assessment of variable contribution, linkages among
variables, dimensionality analysis, and construction of
trustable model ensembles. The latter, if defined well,
will contribute to intelligent data collection in the style
of active learning.

With a goal to augment human intelligence by computation,
we emphasize the critical need for a human,
an inquiring mind who will test the theory, the facts
(data) and their interpretations (models) against each
other to iteratively develop a convincing story where all
elements fit and agree.


57.5 Examples


57.5.1 Hybrid Intelligent Systems
for Process Analytics


A good example of a unified workflow for process
analytics is the hybrid intelligent systems framework
popularized at the Core R&D department of the Dow
Chemical Company in the late 1990s.

The methodology was developed to improve soft
sensor performance (performance of predictive models),
to shorten its development time, and minimize
maintenance. It employed different intelligent system
components: genetic programming, support vector
machines, and analytic neural networks [57.18].

The process analytics in this framework consists of
three steps following data collection:


1. Data preprocessing and compression. Support vector
regression using the ε-insensitive margin is used to
identify and remove data outliers and compress data
to a representative set of prototypes (support vectors).
The result is a clean and compressed data set.


2. Preliminary variable selection using ensemble-based
stacked analytic neural networks [57.19]. The
result of this step is a ranking of input variables and
quantification of variable contribution based on
iterative input elimination and re-training.

3. Convolution parameter estimation to identify relevant
time-lags of significant inputs using appropriate
convolution functions.

4. Development of transparent predictive models using
symbolic regression via genetic programming
and final variable selection using symbolic regression
models.

5. Model selection and analytical function validation.

6. Online implementation.

7. Soft sensor maintenance to guarantee robustness of
process prediction.
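
The time-lag identification in step 3 can be illustrated with a deliberately simplified stand-in. Instead of fitting convolution functions, the sketch below scans a range of pure delays and keeps the one at which the absolute correlation between the delayed input and the output is largest. The signals, the function names, and the correlation criterion are assumptions for illustration and do not reproduce the method of [57.18].

```python
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_lag(inputs, outputs, max_lag):
    """Return the delay (in samples) at which |correlation| is largest."""
    scores = {}
    for lag in range(max_lag + 1):
        lagged_in = inputs[: len(inputs) - lag]  # input at time t - lag
        aligned_out = outputs[lag:]              # output at time t
        scores[lag] = abs(correlation(lagged_in, aligned_out))
    return max(scores, key=scores.get)


# Hypothetical signals: the output repeats the input three samples later.
signal = [0.0, 1.0, 0.0, 2.0, 5.0, 1.0, 3.0, 0.0, 4.0, 2.0, 6.0, 1.0]
delayed = [0.0, 0.0, 0.0] + signal[:-3]

lag = best_lag(signal, delayed, max_lag=5)
```

A real process responds with a smeared rather than a pure delay, which is why the original workflow estimates parameters of a convolution function instead of a single shift.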




Examples of the use of this workflow for reactor 
modeling can be found in [57.18]. 

We all practiced the hybrid intelligent systems
workflow in the past, but the massive algorithmic efficiency
gains in ensemble-based symbolic regression via
genetic programming over the last decade [57.14, 20] have
led us to simplify the workflow: steps one and two are
largely eliminated and replaced by direct application
of symbolic regression.


57.5.2 Symbolic-Regression Workflow 
for Process Analytics 


The major modeling engine breakthrough was the in- 
corporation of a multiobjective viewpoint; this intro- 
duced orders of magnitude improvements in model 
development speed while simultaneously allowing the 
analyst to choose the proper balance between complex- 
ity and accuracy post facto. In essence, the data could 
now define the appropriate model structure and driving 
inputs, which became the main reason for symbolic re- 
gression’s success for predictive modeling. 
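
The multiobjective viewpoint can be made concrete with a small sketch that keeps only the models not dominated in complexity and error, leaving the final choice along that front to the analyst. The tuple layout, the model names, and the numbers are hypothetical; error is taken here to be 1 - R², and the brute-force comparison is chosen for clarity, not speed.

```python
def pareto_front(models):
    """Keep models not dominated in (complexity, error); both are minimized.

    `models` is a list of (name, complexity, error) tuples.
    """
    front = []
    for name, cx, err in models:
        dominated = any(
            (c2 <= cx and e2 <= err) and (c2 < cx or e2 < err)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((name, cx, err))
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (name, complexity, 1 - R^2).
candidates = [
    ("m1", 10, 0.10),
    ("m2", 20, 0.05),
    ("m3", 25, 0.06),  # dominated by m2
    ("m4", 60, 0.02),
    ("m5", 60, 0.03),  # dominated by m4
]

front = pareto_front(candidates)
```

Because the whole front is retained, the balance between complexity and accuracy does not have to be fixed before the search starts.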

Other conceptual advances, such as ordinal genetic
programming, interval arithmetic, Lamarckian evolution,
and secondary optimization objectives (age,
model dimensionality, nonlinearity, etc.), have brought
us to the current situation where we can largely inject
data into a (properly designed) symbolic regression engine
and interesting and useful models will emerge.

The symbolic regression workflow now follows the
generic loop depicted in Fig. 57.4, with model development
done using Pareto-aware symbolic regression [57.14].


Distillation Tower Example 
The dataset comes from an industrial problem on mod- 
eling gas chromatography measurements of the compo- 
sition of a distillation tower and is available online at 
http://www.symbolicregression.com. 


[Figure 57.4: cycle of three stages: data collection & adaptation, model development, problem analysis & reduction.]

Fig. 57.4 Generic iterative model-based problem solving
workflow (after [57.15])


A chemical reaction typically generates a variety of 
chemicals along with the one (or several) of interest. 
One method of isolating the mixture coming from the 
reactor into various purified components is to use a dis- 
tillation column. The (hot gaseous) input stream is fed 
into the bottom and on the way to the top goes through 
a series of trays having successively cooler tempera- 
tures. The temperature at the top is the coolest. Along 
the way, different components will condense at different 
temperatures and be isolated (with some statistical dis- 
tribution on the actual components). With vapors rising 
and liquids falling through the column, purified frac- 
tions (different chemical compounds) can be retrieved 
from the various trays. The distillation column is very 
important for the chemical industry because it allows 
continuous operation as opposed to a batch process and 
is relatively efficient. 

This distillation column problem contains nearly
7000 records and 23 potential input variables (a mixture
of flows, pressures, and temperatures) in addition
to the quality metric and material balance. The response
variable is the concentration of a purified component at
the top of the distillation tower. This quality variable
needs to be modeled as a function of relevant inputs only.
The range of the measured quality metric is very broad
and covers most of the expected operating conditions in
the distillation column.

To design the test data, we sorted the samples by
their response values and selected every third and
seventh sample for the validation set and every fourth
and eighth sample for the test set. The remaining points
formed the training set.
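
One possible reading of this split can be sketched as follows. The text does not say how the positions are counted, so the sketch assumes that positions run from 1 to 10 inside consecutive blocks of ten sorted samples; that interpretation, the function name, and the toy data are assumptions.

```python
def split_by_sorted_response(records, response_index=-1):
    """Split records into train/validation/test after sorting by response.

    Assumption: positions are counted 1..10 inside consecutive blocks of ten
    sorted samples; 3 and 7 go to validation, 4 and 8 to test.
    """
    ordered = sorted(records, key=lambda r: r[response_index])
    train, validation, test = [], [], []
    for i, record in enumerate(ordered):
        position = i % 10 + 1
        if position in (3, 7):
            validation.append(record)
        elif position in (4, 8):
            test.append(record)
        else:
            train.append(record)
    return train, validation, test


# Hypothetical data: (input, response) pairs with responses 0..19.
data = [(x * 0.5, float(x)) for x in range(20)]
train, validation, test = split_by_sorted_response(data)
```

Whatever the exact counting rule, sorting first guarantees that all three subsets span the full range of the response, so the test set does not probe operating conditions absent from training.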

Many input variables in the data are heavily cor- 
related. Because symbolic regression can deal with 
correlated variables, we used all 23 inputs in the first 
round of modeling to perform initial variable impor- 
tance analysis. 

The workflow that follows exploratory data analysis 
is described below: 


1. Initial modeling: We allocated 2 hours of computing
time on a quad-core machine to perform
24 independent 20-minute runs of symbolic regression
by genetic programming using Evolved
Analytics' DataModeler [57.14]. All symbolic regression
runs used basic arithmetic operators augmented
by negation and square as primitives. All
models were stored on disk, and all other settings
were left at the defaults of the symbolic regression
function of [57.14]. In total, more than 3000 symbolic
regression models were generated during the 24
independent runs.

2. Variable importance analysis: Presence-based
importances were computed for all models.
Figure 57.5 demonstrates that only a handful of variables
are identified as drivers ([57.14] suggests using
an importance threshold of 20%).

3. Variable combination analysis: All developed mod- 
els were analyzed for dimensionality and most 
frequent variable combinations. In Fig. 57.6, one 
can see model subsets niched according to con- 
stituting variable combinations. The bottom graph 
suggests that variables colTemp1, colTemp3, and
colTemp5 might be sufficient for describing the re- 
sponse, since they cover the knee of the Pareto front 
in complexity vs. accuracy space. 

4. Variable contribution analysis: Models were sim- 
plified by identifying and eliminating the least con- 
tributing variable. Variable combination analysis 
was repeated for simplified models and resulted in 
identifying colTemp1 and colTemp3 as new candi- 
dates for a sufficient subspace. 

5. New runs performed on a subset of input variables
identified as drivers: A new batch of
independent symbolic regression runs was applied
to the same data, but using only colTemp1
and colTemp3 as the candidate input variables.
As expected, models generated in this experiment
demonstrated that the same complexity-accuracy
tradeoffs can be achieved in a two-dimensional
rather than a 23-dimensional input space.

6. Ensemble generation using developed models and
a validation set: The final model ensemble was
generated automatically using the developed symbolic
regression models and the validation data set. It was
augmented by quadratic and cubic models on the two
variable drivers.

7. Ensemble prediction validation using test data:
Ensemble prediction and ensemble disagreement
were finally evaluated on the test data. The initial
requirement that the prediction error not exceed
5 to 7% of the standard deviation was met by all
ensemble models. Ensemble prediction is graphed in
Fig. 57.7.


[Figure 57.5: bar chart of presence (0 to 80%) for the candidate inputs headPressure1, headPressure2, headTemp1, refluxFlow, headTemp2, feedFlow, bottomFlow1, colTemp1, colTemp2, colTemp3, colTemp4, colTemp5, matBalance, colTemp6, colTemp7, colTemp8, colTemp9, colTemp10, colTemp11, upstreamFlow1, upstreamFlow2, bottomFlow2, bottomTemp.]

Fig. 57.5 Variable presence in developed symbolic regression
models
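
The presence-based importance of step 2 and the combination counting of step 3 can be sketched in a few lines. Each model is represented only by the list of inputs it uses; the toy model set and the function names are hypothetical, and the 20% threshold is the one the text attributes to [57.14] (whether it is applied as "at least" or "more than" is an assumption here).

```python
from collections import Counter


def presence_importance(models):
    """Fraction of models in which each variable appears."""
    counts = Counter(var for variables in models for var in set(variables))
    return {var: n / len(models) for var, n in counts.items()}


def drivers(models, threshold=0.20):
    """Variables present in at least `threshold` of the models."""
    importance = presence_importance(models)
    return sorted(v for v, share in importance.items() if share >= threshold)


def frequent_combinations(models, top=3):
    """Most common variable sets, as (sorted tuple of variables, count)."""
    combos = Counter(tuple(sorted(set(variables))) for variables in models)
    return combos.most_common(top)


# Hypothetical model set: each entry lists the inputs one model uses.
models = (
    [["colTemp1", "colTemp3"]] * 5
    + [["colTemp1", "colTemp3", "colTemp5"]] * 3
    + [["colTemp1", "feedFlow"]] * 1
    + [["refluxFlow"]] * 1
)

importance = presence_importance(models)
selected = drivers(models)
top_combo = frequent_combinations(models, top=1)
```

Presence counts say nothing about how much a variable contributes once it is in a model, which is why the workflow follows them with the contribution analysis of step 4.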


This example demonstrates the use of a good model 
development workflow. An ensemble similar to the 
one described here has been deployed for controlling 
a gas chromatography measurement in a real distilla- 
tion column. 


57.5.3 Sensory Evaluation Workflow 
for Research Analytics 


A flavor design case study is an example of a more 
specialized workflow [57.21]. In sensory evaluation, 
scientifically designed experiments are used to define 
a small set of mixtures that can be presented aromati- 
cally to evaluators to identify the ingredients that drive 
hedonic response (positively or negatively) of a target 
panel of consumers. Each panelist is asked how much 
they like the flavor, ranging from like extremely to dis- 
like extremely with 9 distinctions. Details of the study 
can be found in [57.21]. Our focus here is the workflow
that allowed us to evaluate the consistency of liking
preferences in the target population and gain insight into how
to design or identify flavors that most consumers would
consistently like.

The data for this project was provided by the Gi- 
vaudan Flavors Corp. It falls into a category of pre- 
cious data. It consists of sensory evaluation scores of 
36 mixed flavors containing seven ingredients evalu- 
ated by 69 human panelists. In other words, data has 
seven input variables (flavor ingredients), 36 records 
(flavors), and 69 response measurements per record 
(Fig. 57.8). 

[Figure 57.6: five Pareto front plots, one per variable combination; horizontal axis, complexity (20 to 140); vertical axis, 1 - R² (0.02 to 0.1).]

Combination   Models   Share   Variables used
1             35       4.1%    refluxFlow, feedFlow, colTemp1, upstreamFlow2
2             29       3.4%    refluxFlow, feedFlow, colTemp1, colTemp5, upstreamFlow2
3             22       2.6%    refluxFlow, feedFlow, colTemp1, colTemp4, upstreamFlow2
4             20       2.4%    colTemp1, colTemp3, colTemp5
5             20       2.4%    refluxFlow, feedFlow, colTemp1

Fig. 57.6 Complexity-accuracy tradeoffs for most frequent variable combinations in the distillation column example

Fig. 57.7 Prediction of the final ensemble of symbolic
regression models on test data. All models seem to agree on
unseen test data set. This should not be surprising, because
the training, validation, and the test set were designed to
cover the full range of operating conditions


Because of the high variability of response values
per flavor, panelist responses were modeled individually.
Because transparent and diverse input-response
models were required to approximate this challenging
data set, modeling was done using ensemble-based
symbolic regression.

For each panelist, a standard workflow was applied
to identify the ingredients that drive changes in the
panelist's liking [57.22].

[Figure 57.8: four panels of probability density versus liking score. (a) Decision regions: hard to please, neutral, easy to please. (b) Panelists 10, 15, 20, 28, 40, 62, 53, 8, 14, 19, 25, 32, 44, 59, 66, 12, 17, 23, 29, 42, 54, 64, 67. (c) Panelists 2, 5, 11, 22, 34, 37, 45, 51, 61, 30, 7, 18, 26, 33, 36, 41, 50, 57, 4, 63, 1, 3, 6, 13, 24, 31, 35, 38, 52, 49, 65. (d) Panelists 21, 46, 55, 68, 9, 43, 48, 58, 27, 47, 56, 69.]

Fig. 57.8a-d Example of panel segmentation by propensity to like from [57.22]: (a) decision regions for evaluating
cumulative distribution for liking score density model, (b) hard to please panelists, (c) neutral panelists, (d) easy to please
panelists


When developed, model ensembles predicting
individual responses could be bootstrapped to a richer
set of virtual mixtures (tens of thousands of flavors
instead of the available 36). The bootstrapped responses
were used to cluster the target population into three
segments: easy to please, i.e., (cyber)individuals who
consistently give high ratings to most flavors; hard to
please, i.e., individuals that consistently use a low range of
scores for all flavors; and neutral panelists, whose
preferential range is centered around the medium score
(neither like, nor dislike). Such segmentation of the
target population by people's propensity to like products
turned out to be very useful in several other applications
beyond flavor design. It focuses product development
by giving insight into the fundamental variability in the
preferences of the target audience.
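
As a rough illustration of segmentation by propensity to like, the sketch below labels each virtual panelist from the mean of the predicted scores on the 9-point scale. The study itself works with decision regions on the score density (Fig. 57.8a); the use of the mean, the cut points of 4 and 6, and the toy scores are all assumptions made for this example.

```python
from statistics import mean


def segment_panelist(scores, low=4.0, high=6.0):
    """Label one panelist from predicted liking scores on a 1-9 scale.

    The cut points `low` and `high` are illustrative, not from the study.
    """
    centre = mean(scores)
    if centre <= low:
        return "hard to please"
    if centre >= high:
        return "easy to please"
    return "neutral"


# Hypothetical bootstrapped scores for three virtual panelists.
panel = {
    "p1": [7.5, 8.0, 6.5, 7.0],
    "p2": [2.0, 3.5, 3.0, 2.5],
    "p3": [5.0, 4.5, 5.5, 5.0],
}

segments = {name: segment_panelist(s) for name, s in panel.items()}
```

Because the scores come from model ensembles evaluated on many virtual mixtures, each panelist's label rests on far more evidence than the 36 flavors actually tasted.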

The standard workflow for variable importance
estimation applied to model ensembles forecasting the
scores of individual panelists also allowed us to segment
the target population by ingredients that drive liking in
the same direction. Such segmentation of the consumer
market combined with the cost analysis for new prod- 
uct design offers visualization and analysis of beneficial 
tradeoffs for product specialization. 

The third outcome of this study was the development
of a model-guided optimization workflow
for designing optimal virtual mixtures. Multi-objective
optimization using swarm intelligence was used
to find tradeoffs in the flavor design space that
simultaneously maximize the average liking score
and minimize variance in the liking across virtual
panelists.

Such a model-guided optimization workflow, combined
with the standard ensemble-based modeling
workflow, presents a strong motivation for the development
of a targeted data collection system for designing
new products.

We should point out that despite a very custom
design and specialized domain of sensory evaluation
in food science, the workflow could successfully be
applied in the very different domain of video quality
prediction. Ensemble-based symbolic regression
was used to model the perceived quality of perturbed
video frames and results were used to predict
customer satisfaction and segment the representative
population of video viewers by propensity to notice
perturbations and sensitivity to particular perturbations
[57.23].


57.6 Conclusions


In this chapter, we discussed how computational
intelligence leads to predictive analytics to produce business
impact. We identified three main areas of predictive
analytics: business analytics that deals mainly
with visualization and forecasting, process analytics
which aims to improve optimization and control of
manufacturing processes, and research analytics which
aims at speeding up and improving product and process
design. All three areas have the potential to
save and earn many millions of dollars but deal with
very different data sources, context, information content,
amount of available domain knowledge, and
time and prediction requirements for value generation.
Driven by different motivations, the areas
consequently employ different predictive modeling
methods.

We presented several predictive modeling methods
in the context of different prediction requirements,
solution development, and deployment constraints. We
emphasized that there is no single method that fits all
problems, but rather there is a continuum of methods,
and each problem dictates selection of a method by
specific time requirements and the amount of available
a priori subject-matter knowledge (Fig. 57.3).


We stressed the importance of good and stable pre- 
dictive modeling workflows for success in CI projects 
and provided several examples of such workflows for 
process and research analytics, illustrating that re- 
search analytics projects require highly customized 
approaches. 

We point out that successful CI projects are amplifiers that
necessarily keep the human in the loop and vastly enhance her/his
capabilities. Because of this, integrating CI into the various
process and business workflows is essential!

It is clear that our ability to generate data as well as our
ability to analyze it and produce actionable knowledge are quickly
expanding. The challenge remains how to develop scalable CI
algorithms that keep up with the ever-rising tide of data, given
that computational advances in hardware (massive parallelization,
exa-scale computing) are developing at a much faster pace than the
CI algorithms.

A question that still puzzles us is: Can we get more intelligence
with more computational power, and where (and whether) does it
stop? Undoubtedly, the right answer lies in the development of new
algorithms that can tackle the new challenges: advanced material
design, problems in bio-informatics, complex-system modeling in
social sciences, and social networks. We expect the largest impact
of predictive modeling to happen in the areas of research and
process analytics, that is, in the design of new products and new
processes. Examples of design problems that can be assisted by
data-driven CI methods for research analytics are the development
of advanced materials: photovoltaic cells, alternative fuels,
bio-degradable replacements for paints and plastics, composite
materials, and sustainable food sources. From the process analytics
side, we would like to see CI methods used for optimization of
water purification, emission control in combustion processes,
simulation-based optimization of social events on a world scale
(terror attacks, revolutions, pandemic spread), efficiency
optimization of manufacturing cycles, garbage minimization, and
recycling.

It cannot be stressed enough that the dynamics around CI are
changing: instead of being an optional addition to the arsenal of
problem-solving tools and methods, CI is becoming indispensable for
dealing with and making progress on this new breed of real-world
problems. The only way for CI practitioners to bring CI to prime
time is to develop scalable algorithms, proliferate good workflows,
and implement them in great applications.


References 

57.1 M.J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A.H. Byers: Big data: The next frontier for innovation, competition, and productivity, available online at http://www.mckinsey.com/mgi (2011)

57.2 T.H. Davenport, J.G. Harris: Competing on Analytics: The New Science of Winning, 1st edn. (Harvard Business School, Boston 2007)

57.3 J.C. Torfs, G.J. Brands, E.G. Goethals, E.M. Dedeyne: Method for characterizing the appearance of a particular object, for predicting the appearance of an object, and for manufacturing an object having a predetermined appearance, which has optionally been determined on a basis of a reference object, WO Patent Ser 20 0204 2750 A1 (2004)

57.4 R.A. Johnson, D.W. Wichern: Applied Multivariate Statistical Analysis (Prentice Hall, Englewood Cliffs 1988)

57.5 L. Breiman: Random forests, Mach. Learn. 45, 5-32 (2001)

57.6 M.D.J. Powell: Radial basis functions for multivariable interpolation: A review. In: Algorithms for Approximation, ed. by J. Mason, M.G. Cox (Clarendon, Oxford 1987) pp. 143-167

57.7 S. Haykin: Neural Networks and Learning Machines, 3rd edn. (Pearson Educ., Harlow 2008)

57.8 V. Vapnik: Estimation of Dependences Based on Empirical Data (Springer, Berlin, Heidelberg 1982)

57.9 V. Vapnik: The support vector method, Proc. 7th Int. Conf. Artif. Neural Netw. (1997) pp. 263-271

57.10 J.R. Koza: Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT, Cambridge 1992)

57.11 R. Poli, W.B. Langdon, N.F. McPhee: A Field Guide to Genetic Programming (Lulu, Raleigh 2008)

57.12 A.K. Kordon: Applying Computational Intelligence: How to Create Value (Springer, Berlin, Heidelberg 2010)

57.13 W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone: Genetic Programming: An Introduction on the Automatic Evolution of Computer Programs and its Applications (Morgan Kaufmann, San Francisco 1998)

57.14 M. Kotanchek: Evolved Analytics LLC: DataModeler Release 8.0 (Evolved Analytics LLC, Midland 2010)

57.15 E. Vladislavleva: Model-based Problem Solving through Symbolic Regression via Pareto Genetic Programming (Tilburg Univ., Tilburg 2008)

57.16 S. Stijven, W. Minnebo, K. Vladislavleva: Separating the wheat from the chaff: On feature selection and feature importance in regression random forests and symbolic regression, Proc. 13th Annu. Conf. Companion Genet. Evol. Comput. (2011) pp. 623-630

57.17 S. Emmott, S. Rison: Towards 2020 Science, Microsoft, Cambridge (2006)

57.18 A.K. Kordon, G.F. Smits: Soft sensor development using genetic programming, Proc. Genet. Evolut. Comput. Conf. (2001) pp. 1346-1351

57.19 A.K. Kordon, G.F. Smits, A.N. Kalos, E.M. Jordaan: Robust soft sensor development using genetic programming. In: Nature-Inspired Methods in Chemometrics: Genetic Algorithm and Artificial Neural Networks, ed. by R. Leardi (Elsevier, Amsterdam 2003) pp. 69-108

57.20 M. Kotanchek: Real-world data modeling, Proc. 12th Annu. Conf. Companion Genet. Evol. Comput. (2010) pp. 2863-2896

57.21 K. Veeramachaneni, E. Vladislavleva, U.-M. O'Reilly: Knowledge mining sensory evaluation data: Genetic programming, statistical techniques, and swarm optimization, Genet. Progr. Evol. Mach. 13(1), 103-133 (2012)

57.22 K. Vladislavleva, K. Veeramachaneni, U.-M. O'Reilly: Learning a lot from only a little: Genetic programming for panel segmentation on sparse sensory evaluation data, Proc. 13th Eur. Conf. Genet. Progr. (2010) pp. 244-255

57.23 N. Staelens, D. Deschrijver, E. Vladislavleva, B. Vermeulen, T. Dhaene, P. Demeester: Constructing a no-reference H.264/AVC bitstream-based video quality metric using genetic programming-based symbolic regression, IEEE Trans. Circuits Syst. Video Technol. 23(8), 1322-1333 (2013)


58. Solving Phase Equilibrium Problems 
by Means of Avoidance-Based Multiobjectivization 


Mike Preuss, Simon Wessing, Günter Rudolph, Gabriele Sadowski


Phase-equilibrium problems are good examples of real-world
engineering optimization problems with a certain characteristic.
Despite their low dimensionality, finding the desired optima is
difficult as their basins of attraction are small and surrounded by
the much larger basin of the global optimum, which unfortunately
resembles a physically impossible and therefore unwanted solution.
We tackle such problems by means of a multiobjectivization-assisted
multimodal optimization algorithm which explicitly uses problem
knowledge concerning where the sought solutions are not in order to
find the desired ones. The method is successfully applied to three
phase-equilibrium problems and shall be suitable also for tackling
difficult multimodal optimization problems from other domains.


58.1 Coping with Real-World Optimization Problems
58.2 The Phase-Equilibrium Calculation Problem
58.3 Multiobjectivization-Assisted Multimodal Optimization: MOAMO
     58.3.1 Basics of Multiobjective Optimization
58.4 Solving General Phase-Equilibrium Problems
     58.4.1 Ternary Liquid-Liquid Equilibrium: Water/Methanol/MMA
     58.4.2 Three Phase Equilibria: Water/MMA and Water/Furfural
     58.4.3 Obtaining the Phase Diagrams
58.5 Conclusions and Outlook
References


58.1 Coping with Real-World Optimization Problems 


A multitude of methods from within and beyond evolutionary
computation (EC) has been applied to real-valued multimodal
optimization problems. Such problems are generally considered
harder the more basins of attraction they contain and the less
smooth the fitness landscape is. Additionally, a search space that
extends over a large number of dimensions is said to complicate the
search for the desired global or good local optima [58.1].

However, in a real-world setting, even a low- 
dimensional problem may turn out to be quite difficult. 
This can stem from different factors, one of which 
would be a very small extent of the basins that con- 
tain the sought optima. Figure 58.1 visualizes the fitness 
landscape of an optimization problem that possesses 
this property. The application background will be de- 
tailed in Sect. 58.2, but for now it suffices to know 


that there are only two variables a and b, and that the 
desired minima (function values do not depend on vari- 
able order and are thus symmetric to the main diagonal) 
are located near (0.650,0.001) and (0.001,0.650), re- 
spectively. It is easy to see that the appropriate basins 
are small; in the figure, they are hardly recognizable at 
all. 

Another complicating factor would be uncertainty about the relative
target function value of the sought optima. If it is not known a
priori whether we are looking for a global or only a certain local
optimum, there is no way around enumerating all existing optima and
choosing the right solution out of these afterward. Such
difficulties may occur in cases where it is not possible to
integrate all available application-specific knowledge into the
established target function, i.e., if its value must be obtained by
simulation and the existing simulation tool is not able to
represent all important features of the real system.

[Figure 58.1: two panels showing the fitness landscape, (a) over the variables a and b, (b) over sqrt(a) and sqrt(b); axis ticks run from 0 to 0.8]

Fig. 58.1a,b Visualizations of the two-dimensional example problem.
In the bottom panel, the search space is transformed by a square
root. The desired optima are marked with white dots. Note that the
diagonal consists of globally optimal but undesired (trivial)
solutions

Obviously, there are several workarounds to over- 
come the difficulties imposed by this problem: 


• Applying a transformation to the search space, so
that the local optima at the lower boundaries occupy
more space. This is shown in Fig. 58.1.

• Only initializing the optimization algorithm with
solutions on the boundaries of the search space. In this
case, we sometimes start from very near to the local
optima, and thus have a higher chance to find them.

• Exploiting the symmetry of the landscape by a
special representation. This can be done by enforcing
a > b and would help, e.g., recombination operators
of evolutionary algorithms (EAs).
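
The effect of the first workaround can be checked with simple arithmetic. The following minimal sketch assumes a hypothetical basin that occupies the strip 0 <= a <= 0.01 at the lower boundary of the unit interval; the strip width and all function names are illustrative and are not taken from the chapter.

```python
import math

def to_original(u):
    """Map a point of the transformed space back to the original one (a = u**2)."""
    return u * u

def basin_share(width, transformed):
    """Fraction of [0, 1] covered by the strip 0 <= a <= width.

    In the transformed coordinate u = sqrt(a), the same strip becomes
    0 <= u <= sqrt(width), so its share of the unit interval grows.
    """
    return math.sqrt(width) if transformed else width

share_plain = basin_share(0.01, transformed=False)  # share in the original space
share_sqrt = basin_share(0.01, transformed=True)    # share after the square-root transform
print(share_plain, share_sqrt)
```

Under this assumption, a uniformly sampled start point hits the strip about ten times more often per coordinate in the transformed space, which is why the basins become visible in the bottom panel of Fig. 58.1.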


However, all these approaches are dependent on
the location of the desired optima. Any algorithm
exploiting this expert knowledge would necessarily show
a worse performance on problems without these
special features, as predicted by the no free lunch
theorem [58.2]. Instead, a more general method, which uses
information on where the desired optima are not, will be
discussed and evaluated in this chapter.

Many different EAs may be used to tackle this 
global or multimodal optimization problem because 
they are able to detect several optima simultaneously 
or sequentially. The latter may be achieved by multi-
start approaches such as sequential niching [58.3], whereas
the former is established by means of diversity mainte- 
nance. That is, candidate solutions of the search popula- 
tions are prevented from converging to the same region 
by implicitly or explicitly keeping them apart [58.4]. 
Prominent examples are crowding [58.5] and fitness 
sharing [58.6], and their successors. More recent ap- 
proaches include, but are not limited to UEGO [58.7], 
clearing [58.8], species conservation [58.9], clustering- 
based niching [58.10], and cellular EA (CEA) [58.11]. 
Although there is no commonly accepted formal defini- 
tion of what a niching method is [58.12], most of these 
algorithms may be subsumed under the term niching 
EA. They all use the distance between candidate solu- 
tions (diversity) as an implicit criterion which shall be 
maximized. 

However, nothing prevents us from utilizing a diver-
sity criterion directly. A step in this direction has been
taken in the shifting balance GA [58.13]. But although 
it employs a separate diversity evaluation via subpopu- 
lation distance computation, it finally resorts to a single 
objective by weighting the distance and target function 
values. 

In [58.14], we established a more radical approach that
employs diversity in search space as an additional
objective and treats the resulting combined problem with
an evolutionary multiobjective algorithm (EMOA). The
expected benefit is twofold:


• It enables placing solution candidates in basins that
would otherwise go unnoticed due to their small
size.

• We obtain a good overview of the available
interesting search space regions in a single run.


As we presume that this approach is not only applicable to the
thermodynamic problems treated in this work but also to real-valued
engineering problems with similar properties, it is also followed
and further extended here. Other related multiobjectivization
approaches are discussed in Sect. 58.3 after introducing the
problem context.


58.2 The Phase-Equilibrium Calculation Problem 


The knowledge of phase equilibria is required for the design and
optimization of separation processes, which are essential parts of
typical chemical plants. The aim of a phase-equilibrium calculation
is to quantitatively relate the variables (in particular,
temperature T, pressure p, and mole fraction x) which describe the
state of equilibrium of two or more homogeneous phases [58.15].

In any problem concerning the equilibrium distributions of k
components between two phases, one must always begin with the
equality of the chemical potential

$$\forall i \in \{1,\dots,k\}: \quad \mu_i' = \mu_i'' . \tag{58.1}$$

To establish the relation of $\mu_i$ to T, p, and $x_i$, it is
convenient to introduce a certain auxiliary function such as the
fugacity coefficient $\varphi_i(T, p, x_i)$, which can be
calculated by a thermodynamic model. (We use the domain-specific
notation with the upper index denoting different phases and the
lower index standing for separate substances.) Then, (58.1) can be
rewritten as

$$\forall i \in \{1,\dots,k\}: \quad x_i' \varphi_i' = x_i'' \varphi_i'' . \tag{58.2}$$
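
The step from the equality of chemical potentials to the isofugacity condition can be made explicit with the standard fugacity formulation. The reference-state symbols $\mu_i^0$ and $p^0$ are introduced here for illustration only and do not appear in the chapter:

$$\mu_i = \mu_i^0(T) + RT \ln \frac{x_i \varphi_i p}{p^0} .$$

Both phases share the same temperature and pressure, so $\mu_i^0(T)$, $p$, and $p^0$ cancel when the chemical potentials of the two phases are equated:

$$RT \ln\left(x_i' \varphi_i'\right) = RT \ln\left(x_i'' \varphi_i''\right) \quad \Longrightarrow \quad x_i' \varphi_i' = x_i'' \varphi_i'' .$$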


Typically, the calculation is performed at constant temperature and
pressure, and the remaining concentrations $x_i'$ and $x_i''$,
respectively, are to be found. The fugacity coefficient
$\varphi_i$ of component i in the mixture is calculated as

$$\ln \varphi_i = \frac{\mu_i^{\mathrm{res}}}{RT} - \ln Z , \tag{58.3}$$

with Z being the compressibility factor, defined as

$$Z = \frac{p v}{R T} , \tag{58.4}$$

where v is the molar volume, and R is the gas constant.
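
As a plausibility check of (58.3) and (58.4), the following sketch evaluates both formulas for an ideal gas, where the residual chemical potential vanishes and Z equals 1. The function names and the SI unit convention are assumptions made for this illustration; the residual chemical potential would in practice come from a thermodynamic model such as PC-SAFT, which is not implemented here.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)

def compressibility(p, v, T):
    """Z = p v / (R T), with p in Pa, v in m^3/mol, and T in K."""
    return p * v / (R * T)

def fugacity_coefficient(mu_res, p, v, T):
    """phi_i from ln(phi_i) = mu_i^res / (R T) - ln(Z), cf. (58.3)."""
    return math.exp(mu_res / (R * T) - math.log(compressibility(p, v, T)))

# Ideal-gas check at 90 degrees C and 1.0132 bar: mu_res = 0 and v = R T / p.
T, p = 363.15, 101320.0
v_ideal = R * T / p
phi = fugacity_coefficient(0.0, p, v_ideal, T)
print(phi)
```

For the ideal gas the fugacity coefficient comes out as 1 up to rounding, as expected; any deviation from 1 in a real mixture stems from the residual term and from Z.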


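The residual chemical potential $\mu_i^{\mathrm{res}}$ is given by

$$\mu_i^{\mathrm{res}} = a^{\mathrm{res}} + RT(Z-1) + \frac{\partial a^{\mathrm{res}}}{\partial x_i} - \sum_{j=1}^{k} x_j \left(\frac{\partial a^{\mathrm{res}}}{\partial x_j}\right) , \tag{58.5}$$

where $(\partial a^{\mathrm{res}}/\partial x_i)$ is a partial
derivative of the residual Helmholtz energy with respect to the
mole fraction stated in the denominator, while all other mole
fractions are considered constant.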

The residual Helmholtz energy according to the 
perturbed chain statistical associating fluid theory 
(PC-SAFT) is considered as the sum of different con- 
tributions resulting from repulsion (hard chain), van 
der Waals attraction (dispersion), and hydrogen bond- 
ing (association) 


$$a^{\mathrm{res}} = a^{\mathrm{hc}} + a^{\mathrm{disp}} + a^{\mathrm{assoc}} . \tag{58.6}$$


The detailed equations for each contribution can be 
found in [58.16] and [58.17]. 

Solving phase-equilibrium problems according to (58.2) may lead to
trivial solutions, i.e., $x_i' = x_i''$, which are mathematically
correct but have no physical meaning (except at the so-called
critical demixing point). To avoid obtaining them, the initial
guesses for the minimization procedure must not be too far away
from the correct solutions, which presupposes that the correct
solutions are approximately known.

In the case of polymer solutions, initialization is very critical,
because the concentration of the polymer in the solvent-rich phase
can be in the order of magnitude of $10^{-72}$, which is a
numerical challenge for simulation programs [58.18]. Another
difficulty arises as the number of components in the mixture
increases. All these challenges point out the need for a robust
algorithm that solves the phase-equilibrium calculation for an
arbitrary number of components and phases and is also applicable to
polymer solutions.

Figure 58.1 actually shows a phase-equilibrium problem, namely a
simple two-component mixture of water and pentanol. Liquid-liquid
equilibrium (LLE) data of this type are necessary for the design
and optimization of liquid-liquid extractors and decanters. The two
variables correspond to the concentrations of water in the
water-rich phase (for the larger of the two) and in the
pentanol-rich phase (for the smaller one). Under the assumption
that a > b, and that w stands for water and p for pentanol, we have
$a = x_w'$ and $b = x_w''$. The remaining mole fractions $x_p'$ and
$x_p''$ can be obtained indirectly as $x_p' = 1 - x_w'$ and
$x_p'' = 1 - x_w''$, because for every phase, the following
equality holds:

$$\sum_{i=1}^{k} x_i' = \sum_{i=1}^{k} x_i'' = 1 . \tag{58.7}$$


For this two-component problem, two equations of type (58.2) have
to be satisfied, resulting in two error values
$e_w = |x_w' \varphi_w' - x_w'' \varphi_w''|$ and
$e_p = |x_p' \varphi_p' - x_p'' \varphi_p''|$. A feasible solution
to the problem shall exhibit errors below $10^{-10}$ due to
practical requirements. In the following, $e_w$ and $e_p$ are
aggregated into a single target function value by using the sum of
squares, which is to be minimized (note the vector notation)

$$f(\mathbf{x}) = e_w^2 + e_p^2 . \tag{58.8}$$
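
To illustrate the structure of this target function, the sketch below evaluates it for a toy binary mixture. It is only a stand-in: instead of fugacity coefficients from PC-SAFT, it uses activity coefficients of a one-parameter Margules model with a hypothetical interaction parameter A = 3, and all function names are assumptions made for this example.

```python
import math

A = 3.0  # hypothetical Margules parameter; values above 2 yield two liquid phases

def activity(x_w):
    """Toy stand-in for the thermodynamic model: returns (gamma_w, gamma_p)."""
    x_p = 1.0 - x_w
    return math.exp(A * x_p * x_p), math.exp(A * x_w * x_w)

def target(a, b):
    """Sum of squared equilibrium errors for the water mole fractions a and b."""
    g_w1, g_p1 = activity(a)
    g_w2, g_p2 = activity(b)
    e_w = abs(a * g_w1 - b * g_w2)
    e_p = abs((1.0 - a) * g_p1 - (1.0 - b) * g_p2)
    return e_w ** 2 + e_p ** 2

def symmetric_root(lo=1e-6, hi=0.4, steps=200):
    """Bisection on ln(x/(1-x)) = A(2x-1), the nontrivial solution with a = 1 - b."""
    def g(x):
        return math.log(x / (1.0 - x)) - A * (2.0 * x - 1.0)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

b_star = symmetric_root()      # water fraction in the water-poor phase
a_star = 1.0 - b_star          # water fraction in the water-rich phase
print(target(0.3, 0.3))        # a trivial solution on the diagonal
print(target(a_star, b_star))  # the nontrivial solution of the toy model
```

Both calls return a target value of (numerically) zero, which shows why the target function alone cannot distinguish the trivial solutions on the diagonal from the physically meaningful one.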


Table 58.1 The sought optima at different temperatures


Mole fraction   40 °C        60 °C        90 °C
$x_w'$          0.74698      0.7097       0.65084
$x_w''$         0.00020913   0.00038142   0.00082809


In Fig. 58.1, (58.8) is modeled at a temperature of 
90°C, for which the sought optimum is located near 
the coordinates (0.650,0.001). As system properties 
change with temperature and pressure, the pursued 
optimum also moves through the search space. Ta-
ble 58.1 lists approximate solutions for different
temperatures and a constant pressure of 1.0132 bar.
The trivial solutions are the only feature representa- 
tive for all phase-equilibrium problems. Thus, this is 
the only information that shall be exploited in the 
following. 


58.3 Multiobjectivization-Assisted Multimodal Optimization: MOAMO 


As seen in Sect. 58.1, the optimization problem at hand 
is inherently multimodal. That is, local optimization 
schemes are only successful if started from a region 
near the desired nontrivial solution. To make things 
worse, the basin of attraction of the undesired triv- 
ial solutions may largely dominate the search space as 
found for the very simple LLE problem (two phases, 
two components: water/pentanol). Hitting the basin 
of attraction of the desired solution can be very dif- 
ficult, and if failing on this, the final outcome of 
quasi-Newton or similar algorithms will be a trivial 
solution. 

Stochastic optimization methods like EAs and other 
metaheuristics employ a more globally oriented opti- 
mization scheme. Several attempts using these methods 
have been tried on equilibrium detection problems in 
recent years, namely genetic algorithms (GA) and simu- 
lated annealing in [58.19] or differential evolution (DE) 
and tabu search (TS) in [58.20]. The algorithms have 
been mostly used in their canonical form with some
parameter tuning and a concluding local optimization
step by means of a quasi-Newton method. Alternative 
approaches applied artificial neural networks for learn- 
ing and predicting phase equilibria as in [58.21], the 
authors of which evolve the neural networks by means 
of genetic programming (GP), and [58.22], where the 
authors employ a real-coded GA to optimize initial 
weights and biases of the neural network before it is 
further refined using a quasi-Newton method. Where 


enough training data is available, the binodal curves of 
equilibria can be learned and predicted for the missing 
areas. 

Some recent metaheuristic attempts concentrate on 
the global (multimodal) nature of the optimization 
problem to find equilibrium points for rather difficult 
systems where global optima are located in relatively 
small basins. [58.23] use tabu search, [58.24] a ran- 
dom tunneling method, and [58.25] a DE hybrid with 
TS components. While we agree that looking elsewhere 
for even better solutions is mandatory for a multi- 
modal problem, it may be even more rewarding to 
obtain a good overview over large portions of the search 
space before climbing down into the individual optima. 
This has been attempted by using a refined version of 
the algorithm of [58.26] which has been applied to 
phase stability problems by [58.27]. The base algorithm 
GLOBAL has been developed further in [58.28]. As the 
latter methods start from a random sample, it may how- 
ever happen that either the initial sample is too small so 
that important optima are missed, or it is relatively large 
and thus costly. 

The optimization concept suggested in this work therefore relies on
an evolutionary multiobjective algorithm (EMOA) approach in order
to generate a spectrum of possible near-optimal solutions before
applying a local search method on these. We term it
multiobjectivization-assisted multimodal optimization (MOAMO). The
method was successfully applied to the two-phase, two-component
water/pentanol system in [58.14]. Here, we demonstrate that it is
viable for more complicated equilibrium problems with more phases
and components. Although not yet tried on polymer problems, this
ultimate goal seems to be within reach as very small basins of
attraction can be attained reliably.

[Figure 58.2: schematic with panels in objective space (axes f1, f2) and in search space (axes x1, x2)]

Fig. 58.2 The general concept of MOAMO and its influence on search
and objective space

Figure 58.2 shows the main concept of the 
MOAMO approach. The key idea is to use a population- 
based multiobjective algorithm as a preprocessing step 
for generating search points in the different basins of 
attraction of the tackled problem, the basin of the non- 
trivial optimum being among them. To do this, the 
practitioner first has to formulate an additional objec- 
tive function. This second objective is then employed 
to obtain good coverage of the search space despite 
the high attraction of certain areas. We label this type 
of multiobjectivization avoidance-based because appli- 
cation knowledge about where the sought optimum is 
not helps to transform the single-objective optimiza- 
tion problem into a multiobjective one that is easier to 
solve. More precisely, it enables detecting several dif- 
ferent basins, among them many that would most likely 
have gone unnoticed with the single-objective approach 
alone. 

For this specific application, the distance to the triv- 
ial solution (equal concentrations) is taken into account. 
From then on, the system can work autonomously. In 
the next step, the multiobjective optimization is car- 
ried out. The obtained search points then are fed one 
by one into a local optimization method, until a satisfy- 
ing nontrivial solution is found. For this local search, 
only the original objective is relevant. We employed 
the algorithm of [58.29] and the covariance matrix 
adaptation evolution strategy (CMA-ES) of [58.30] for 


this last step. The experimental results suggest that 
especially the latter seems well suited for the task. 
However, one may resort to another method here (e.g., 
quasi-Newton or similar standard optimization algo- 
rithms as described in [58.31]) if it is deemed more 
appropriate. To avoid superfluous local optimization 
steps on candidate solutions that are close to each 
other, this phase may be prepended with a clustering 
step so that one tries a representative of each group 
of solutions first and then proceeds in a round robin 
fashion. 
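
The avoidance objective and the final refinement stage described above can be sketched as follows. This is a simplified illustration under several assumptions: the landscape `toy_target` is hypothetical, the candidate list merely stands in for the output of the multiobjective run, a basic compass search replaces the local optimizers named in the text, and the optional clustering step is omitted.

```python
import math

def distance_to_trivial(x):
    """Euclidean distance of the point (a, b) to the diagonal a = b."""
    return abs(x[0] - x[1]) / math.sqrt(2.0)

def biobjective(f, x):
    """Both objectives are minimized: original target and negated distance."""
    return f(x), -distance_to_trivial(x)

def compass_search(f, x, step=0.05, tol=1e-12, lo=0.0, hi=1.0):
    """Simple derivative-free local search on the original objective only."""
    x, fx = list(x), f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for s in (step, -step):
                y = list(x)
                y[i] = min(hi, max(lo, y[i] + s))
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5  # refine the mesh once no axis move helps
    return x, fx

def refine_until_nontrivial(f, candidates, f_tol=1e-10, d_tol=1e-3):
    """Feed candidates one by one into local search; stop at the first nontrivial optimum."""
    for start in candidates:
        x, fx = compass_search(f, start)
        if fx < f_tol and distance_to_trivial(x) > d_tol:
            return x, fx
    return None

def toy_target(x):
    """Hypothetical landscape: trivial optima on the diagonal, one narrow nontrivial basin."""
    a, b = x
    return min((a - b) ** 2, 50.0 * ((a - 0.65) ** 2 + (b - 0.001) ** 2))

candidates = [(0.3, 0.5), (0.66, 0.01)]  # stand-in for the multiobjective result set
result = refine_until_nontrivial(toy_target, candidates)
print(result)
```

In this toy setting the first candidate converges to the diagonal and is rejected as trivial, while the second one, which lies far from the diagonal and therefore scores well on the avoidance objective, leads the local search into the narrow basin.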

The idea of simplifying a difficult single-objective
problem by a multiobjective approach has some precur-
sors in evolutionary computation and was termed
multiobjectivization in [58.32]. The approach can
be divided into two general categories, namely mul- 
tiobjectivization by adding objectives and multiobjec- 
tivization by the decomposition of a scalar objective 
function. 

For the latter one, it can be proven that the ap- 
proach can only decrease the number of local op- 
tima [58.33]. It was for example successfully applied 
to protein structure prediction problems in [58.34, 35]. 
MOAMO belongs to the category of multiobjectiviza- 
tion by adding objectives. No theoretic guarantees of 
benefits can be given [58.36] for this approach, but 
nonetheless it has already been tried in several dif- 
ferent ways [58.37-40]. However, these applications 
somewhat remain in the tradition of evolutionary multi- 
objective algorithms that already contain diversity pre- 
serving mechanisms. The second objectives suggested 
all refer to the current population or single individu- 
als thereof and do not take characteristics of the actual 
problem into account. MOAMO differs strongly in that, instead of a
population-relative objective, it employs an absolute distance
objective, namely the distance to the known trivial solutions. The
MOAMO approach is therefore especially well suited to phase
equilibrium problems, as the fugacity equations do not allow us to
directly conclude where the sought solution is, but at least they
provide information on where it is not. It has been demonstrated in
[58.14] that by using the multiobjective EA as a preprocessing
step, the important basin can be located with a much smaller number
of function evaluations than would be needed by sampling the search
space randomly, even if the basin is very small.

[Figure 58.3: two panels in objective space with axes f1 and f2, showing the points p1 to p4; panel (b) additionally shows the reference point r and the contributions c1 to c4]

Fig. 58.3 (a) Pareto dominance for minimization: $p_1$ and $p_2$
are non-dominated, whereas $p_3$ and $p_4$ are each dominated by at
least one other point. (b) A non-dominated front between objectives
$f_1$ and $f_2$, consisting of points $p_1$ to $p_4$. $c_1$ to
$c_4$ denote the hypervolume contribution of each point (the space
not covered by any other point) against the reference point r

[Figure 58.4: flowchart. Start; initialize and evaluate population; produce 1 new individual by variation and add it to the population; compute domination count dc per individual; if dominated individuals exist, remove the individual with max(dc), otherwise compute the S-metric contribution ΔS of each individual and remove the individual with min(ΔS); repeat until the termination condition holds; stop]

Fig. 58.4 Working scheme of the SMS-EMOA. Termination is done
according to predefined conditions, e.g., a certain budget of
fitness evaluations

In the following, basic EMOA concepts are sum- 
marized and the particular multiobjective optimization 
algorithm employed in our experiments is introduced, 
namely the SMS-EMOA by [58.41] and [58.42]. 


58.3.1 Basics of Multiobjective Optimization 


Multi-objective optimization fundamentally relies on
Pareto dominance. A point in the objective space of
two or more objective functions is dominated if there
is at least one other point that is no worse in any
objective and strictly better in at least one (Fig. 58.3a). As
the optimization progresses, the population approaches
the Pareto front, which represents the set of optimal
compromises and consists of non-dominated points
only.
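
The dominance relation translates directly into code. The following minimal sketch assumes minimization of all objectives; the four sample points are hypothetical and are not the coordinates used in Fig. 58.3.

```python
def dominates(p, q):
    """True if p Pareto-dominates q when all objectives are minimized."""
    no_worse = all(pi <= qi for pi, qi in zip(p, q))
    better = any(pi < qi for pi, qi in zip(p, q))
    return no_worse and better

def non_dominated(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors (f1, f2).
points = [(1.0, 4.0), (3.0, 1.0), (2.0, 5.0), (4.0, 4.5)]
front = non_dominated(points)
print(front)
```

Here the first two points form the non-dominated set, the third point is dominated by the first one, and the fourth point is dominated by both members of the front.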

Several criteria exist to judge the quality of whole
populations within the algorithm run (as a means to
determine the next search steps) and thereafter to
assess optimization success. One of the most popular
is the hypervolume, the amount of objective space
covered by the population with regard to a
reference point, as illustrated in the right panel of
Fig. 58.3.
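
For two objectives, the hypervolume and the contribution of each point reduce to sums of rectangle areas. The sketch below assumes minimization, a set of mutually non-dominated points, and a reference point that is worse than every point in both objectives; the sample values are hypothetical.

```python
def hypervolume_2d(front, ref):
    """Area dominated by mutually non-dominated points, bounded by the reference point."""
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):  # ascending f1 implies descending f2
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area

def contributions(front, ref):
    """Hypervolume lost when each point is removed on its own."""
    total = hypervolume_2d(front, ref)
    return [total - hypervolume_2d(front[:i] + front[i + 1:], ref)
            for i in range(len(front))]

front = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)]
ref = (5.0, 5.0)
total = hypervolume_2d(front, ref)
contrib = contributions(front, ref)
print(total, contrib)
```

With these values the total hypervolume is 12, and the individual contributions are 1, 2, and 2; a selection scheme based on this measure would therefore discard the first point.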

The S-metric selection evolutionary multiobjective 
algorithm (SMS-EMOA) is a further development of 
the popular NSGA2 (nondominated sorting genetic al- 
gorithm 2) by [58.43]. Figure 58.4 displays its major 
steps. Starting from a usually randomly placed popu- 
lation, a loop begins with deriving one new individual 
(search point) and adding it to the population. The 
domination count of each individual is computed by 
counting how many other individuals dominate it. If 
such dominated individuals exist, the one with the 
largest domination count is deleted. Otherwise, the 
hypervolume contribution of each individual is deter- 
mined (Fig. 58.3b), and the individual with the smallest 
contribution is deleted. If the current state does not ful- 
fill the termination criterion (e.g., a predefined budget 
of function evaluations) the loop starts over. After ter- 
minating, the remaining population is the result set. 
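
The loop described above can be sketched for two objectives as follows. This is a simplified illustration, not the reference implementation: Gaussian mutation of a randomly chosen parent stands in for the variation operators, the test problem (`schaffer`) and all parameter values are assumptions, and the reference point must be worse than every attainable objective vector.

```python
import random

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def hv_contributions(objs, ref):
    """Exclusive hypervolume of each point of a two-objective non-dominated set."""
    order = sorted(range(len(objs)), key=lambda i: objs[i])
    contrib = [0.0] * len(objs)
    for k, i in enumerate(order):
        right = objs[order[k + 1]][0] if k + 1 < len(order) else ref[0]
        upper = objs[order[k - 1]][1] if k > 0 else ref[1]
        contrib[i] = (right - objs[i][0]) * (upper - objs[i][1])
    return contrib

def sms_emoa(evaluate, bounds, ref, mu=10, budget=500, sigma=0.1, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(mu)]
    objs = [evaluate(x) for x in pop]
    for _ in range(budget - mu):
        # Produce one new individual by variation and add it to the population.
        parent = rng.choice(pop)
        child = [min(hi, max(lo, v + rng.gauss(0.0, sigma * (hi - lo))))
                 for v, (lo, hi) in zip(parent, bounds)]
        pop.append(child)
        objs.append(evaluate(child))
        # Domination count: number of individuals dominating each one.
        dc = [sum(dominates(q, p) for q in objs) for p in objs]
        if max(dc) > 0:
            worst = dc.index(max(dc))
        else:
            contrib = hv_contributions(objs, ref)
            worst = contrib.index(min(contrib))
        del pop[worst], objs[worst]
    return pop, objs

def schaffer(x):
    """Hypothetical bi-objective test problem with one decision variable."""
    return (x[0] ** 2, (x[0] - 2.0) ** 2)

pop, objs = sms_emoa(schaffer, [(-5.0, 5.0)], ref=(50.0, 50.0))
print(sorted(round(x[0], 3) for x in pop))
```

Because exactly one individual is added and one is removed per iteration, the population size stays constant, and the returned population is the result set that would be handed to the local search stage.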




58.4 Solving General Phase-Equilibrium Problems 


We present the results of phase-equilibrium calculations 
for the three-component system water/methanol/MMA 
as well as for the three-phase systems water/MMA and 
water/furfural. The corresponding optimization prob- 
lems have four, three, and three decision variables. 
PC-SAFT uses statistical mechanics for its simulation of thermodynamic systems and thus requires a calibration of some pure-component parameters and one binary parameter. The aim of this calibration is to achieve consistency between the calculated phase equilibria and the results of physical experiments. Carrying out this task manually for a single substance takes a chemical engineer several days of work, although the data of the physical experiments are already available in the literature [58.44]. These data contain series of measurements of temperature, density, and pressure for the vapor and the liquid phase of each substance. Among the several parameters that model the molecular properties, there are five per substance that have to be estimated. These are the number of sphere segments $m$, the segment diameter $\sigma$, the segment energy parameter $\varepsilon/k$, an association energy $\varepsilon^{A_iB_i}/k$, and the effective association volume $\kappa^{A_iB_i}$. Two different association sites are assigned to all the considered substances. If the substance is non-self-associating, then association energy as well as association volume are set to zero. Besides the five (three) parameters per substance, the model requires one parameter $k_{ij}$ that is characteristic for each binary mixture. The respective values for all these parameters were taken from [58.45,46] and are summarized in Tables 58.2 and 58.3. The applicability of PC-SAFT to model the mentioned systems in good agreement with experimental data has been proved in [58.46].

Table 58.2 PC-SAFT pure-component parameters for considered components

Substance                  $m$     $\sigma$  $\varepsilon/k$  $\varepsilon^{A_iB_i}/k$  $\kappa^{A_iB_i}$
Water                      1.0656  3.0007    366.5121         2500.6706                 0.0349
Methyl methacrylate (MMA)  3.0632  3.6238    265.6874         0                         0.0349
Methanol                   1.5255  3.2300    188.9046         2899.4906                 0.0352
Furfural                   4.1604  3.0204    270.0700         0                         0.0349

Table 58.3 PC-SAFT binary parameters

Binary system    $k_{ij}$
Water/MMA        0
Water/methanol   -0.05
Water/furfural   -0.006
MMA/methanol     0

The following experiments show that the MOAMO approach provides a reliable and fast tool for the detection of equilibrium points which are difficult to find with standard optimization tools such as a gradient or quasi-Newton search.


58.4.1 Ternary Liquid-Liquid Equilibrium: Water/Methanol/MMA


In Fig. 58.5, the ternary phase diagram of water/methanol/MMA at 50 °C and 1.013 bar with two liquid phases is shown. The calculation of the tie-lines was performed for different fixed concentrations of MMA in one liquid phase ($x_{\mathrm{MMA}}$), see Table 58.4, at constant temperature and pressure.

Fig. 58.5 Phase diagram of water/methanol/MMA system at 50 °C and 1.013 bar. The symbols are experimental data from [58.47] and [58.48]. The line is the calculation result of PC-SAFT with MOAMO-approach (ternary diagram with corners water, MMA, and methanol)


Pre-Experimental Planning
The first objective (58.9) is generated from the error values output by PC-SAFT. These refer to the departure from the equilibrium state between every two phases of one component as given by (58.1).


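$$f_1(x', x'') = \sum_{i=1}^{3} \lvert e_i(x', x'') \rvert \tag{58.9}$$

where $e_i$ is the error value of component $i$ output by PC-SAFT.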


Different formulations of the second objective for the 
SMS-EMOA were tried and several of them work well. 
Therefore, a generalization of the distance criterion for 
the two-component two-phase case in Sect. 58.1 was 
chosen. It measures the Euclidean norm of a vector 
of concentration differences (slightly shifted to allow 
for minimization) and is easily extendable for more 
components 


$$f_2(x', x'') = \sqrt{2} - \lVert x' - x'' \rVert_2 \tag{58.10}$$
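The distance criterion described above (Euclidean norm of the concentration differences, shifted so that it can be minimized) may be sketched as follows. The shift constant of $\sqrt{2}$ is an assumption here; it is the largest possible distance between two mole-fraction vectors. The function name is illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math


def distance_objective(x_phase1, x_phase2):
    """Shifted Euclidean distance between two composition vectors (to be minimized)."""
    return math.sqrt(2.0) - math.dist(x_phase1, x_phase2)


# Trivial solution: both phases have the same composition, so the value is worst.
print(distance_objective((0.2, 0.3, 0.5), (0.2, 0.3, 0.5)))  # sqrt(2)
# Maximally different compositions give the best (smallest) value.
print(distance_objective((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # approximately 0
```

Minimizing this value pushes the population away from the trivial solution, in which both phases coincide, without prescribing where the sought optimum lies.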


Experimental Task 

The task for MOAMO in this experiment is to reliably reach the sought optimum for all indicated MMA concentrations, that is, the number of individuals converging to the optimum in the local search phase shall be considerably larger than 1 on average. Furthermore, the MOAMO-based approach shall find the optimum considerably faster than a naïve multistart local search procedure.


Setup 
For each of the concentrations indicated in Table 58.4, MOAMO is run five times with 30 individuals in the multiobjective first step. Each search point contained in the last population is then optimized by a local search procedure (CMA-ES is employed for this second step). For each local search, it is recorded whether the undesired trivial solution or the sought optimum is obtained, or whether the search did not converge. Other than population size and run length (30 and 5000), the SMS-EMOA parameters are chosen as in [58.41].

In order to perform a comparison, the local search procedure (CMA-ES) is started 1000 times for each MMA concentration from a randomized start point and the rate of success for converging to the sought optimum is recorded. The CMA-ES terminates if progress or adapted step sizes decrease below $10^{-12}$, as usual.

Table 58.4 MOAMO with 30 individuals, remaining population put into local optimization and rate of success and convergence to trivial solution, averaged over five runs. Where the sum of optimum and trivial is below 30, some local searches did not converge. The last column gives the empirical success probabilities for random start points of the local search

$x_{\mathrm{MMA}}$  Optimum  Trivial  Success rate (%)
0.05                25.2     3.8      45.0
0.15                0.0      30.0     Del
0.25                5.8      24.0     3.6
0.35                19.6     10.2     3.9
0.45                25.8     4.0      BD
0.55                29.4     0.6      2.9
0.65                28.8     0.0      Bell
0.75                29.0     0.6      Bull
0.85                23.2     0.6      2.3


Observations 
Table 58.4 comprises the results for the MOAMO approach and, in comparison, the success rates for the random start local search procedure. Run lengths of the CMA-ES are not given in detail, but mostly range between 2600 and 5000 evaluations.

For the MMA concentrations from 0.25 to 0.85, both methods are consistent: MOAMO obtains the sought optimum from at least 60% of the last population's search points, while the success rates of the random start local search vary between 2 and 4%. However, 0.05 and 0.15 are special cases: In the first case, the problem is obviously not that hard, as the random start local search also detects the sought optimum often, and in the second case, the MOAMO approach completely fails.


Discussion 
The most striking result of the experiment is that the hardness of the problem for the two compared approaches seems uncorrelated. An MMA concentration of 0.05 is much more easily solved by the random start local search than any other, but the success rates for MOAMO do not reflect this. For 0.15, the opposite happens, as the problem poses average difficulties for the random start local search procedure, but is very hard for MOAMO. We conjecture that this is an exception, as we are almost at the critical point here, where concentrations in both phases differ less and less. Presumably, the trivial solution and the sought optimum are too similar to be separated in the SMS-EMOA phase via the distance objective. However, we can be satisfied with the results for the other concentrations, where the MOAMO approach reliably detects the sought optimum and is much faster than the random start local search procedure, even if the effort for the first (multiobjective) phase is considered (which is on the order of one or two local searches).
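The speed advantage can be made tangible with a small calculation. The sketch below assumes that local searches succeed independently, so that the expected number of searches until the first success is the reciprocal of the success probability; it uses only the clearly legible rows of Table 58.4, and the function and variable names are illustrative.

```python
def expected_searches(success_probability: float) -> float:
    """Mean number of independent tries until the first success (geometric law)."""
    if not 0.0 < success_probability <= 1.0:
        raise ValueError("success probability must lie in (0, 1]")
    return 1.0 / success_probability


POPULATION = 30  # MOAMO start points handed to the local search

# Legible rows of Table 58.4:
# x_MMA -> (mean number of MOAMO starts reaching the optimum, random-start success rate in %)
table_rows = {0.25: (5.8, 3.6), 0.35: (19.6, 3.9), 0.55: (29.4, 2.9)}

for x_mma, (hits, random_rate) in table_rows.items():
    random_start = expected_searches(random_rate / 100.0)
    moamo_start = expected_searches(hits / POPULATION)
    print(f"x_MMA={x_mma:.2f}: random start ~{random_start:.1f} searches, "
          f"MOAMO start ~{moamo_start:.1f} searches")
```

Under these assumptions, a random start needs roughly 26 to 34 local searches on average, whereas a start point taken from the MOAMO population needs about 1 to 5, to which the cost of the multiobjective phase (one or two local searches, as stated above) has to be added.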


58.4.2 Three Phase Equilibria: 
Water/MMA and Water/Furfural 


We now turn to an application of the MOAMO approach to two-component/three-phase systems in order to detect the heteroazeotrope point (a three-phase equilibrium). The first objective is again obtained from the phase equilibrium equations and differs from the one chosen for the three-component/two-phase system (58.9) in the number of relevant phase equations. Due to transitivity, four error values remain here. Additionally, a quadratic form is chosen here instead of the absolute value form used in the previous case, under the assumption that the quadratic form simplifies the local optimization task (quasi-Newton as well as evolutionary optimization methods usually perform better in this case).


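$$f_1(x', x'', x''') = \sum_{i=1}^{2} \left[ e_i(x', x'')^2 + e_i(x'', x''')^2 \right], \tag{58.11}$$

with $e_i(\cdot, \cdot)$ the PC-SAFT error value of component $i$ between the two given phases.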


Fig. 58.6 Phase diagram of water/MMA system at 1 bar. The symbols are experimental data from [58.49] and [58.50]. Lines are calculation results of PC-SAFT with MOAMO-approach (legend: PC-SAFT with $k_{ij} = 0$; experimental VLE and LLE data; horizontal axis: $x_{\mathrm{MMA}}$)

As for the previous system, it is necessary to determine a suitable second (distance) criterion for the multiobjective first step. However, for three phases, the approach taken in [58.14] has to be generalized in a different way than done for three components. Interestingly, our preliminary tests showed that it is sufficient to consider only one component and its three phases to create a distance criterion. We may use mutual phase concentration differences of phases 1 and 2, 2 and 3, and 1 and 3 to aggregate an objective function. (Note that Euclidean distances have been employed in the previous section; however, our tests show that for the multiobjective MOAMO step, the choice of the distance norm itself is not very important and Manhattan distances as used here are also sufficient.)


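$$f_2(x', x'', x''') = 2 - \sum_{i=1}^{2} \left( \lvert x_i' - x_i'' \rvert + \lvert x_i'' - x_i''' \rvert + \lvert x_i' - x_i''' \rvert \right) \tag{58.12}$$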


Alternatively, the phase concentration differences can 
also be stated as three separate criteria, resulting in 
a four-objective problem for the SMS-EMOA 


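$$f_2(x', x'', x''') = 1 - \sum_{i=1}^{2} \lvert x_i' - x_i'' \rvert,$$

$$f_3(x', x'', x''') = 1 - \sum_{i=1}^{2} \lvert x_i'' - x_i''' \rvert,$$

$$f_4(x', x'', x''') = 1 - \sum_{i=1}^{2} \lvert x_i' - x_i''' \rvert. \tag{58.13}$$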


The following experiment will show whether the ag- 
gregated formulation or the separate criteria are more 
advisable. 
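The difference between the two formulations can be sketched in a few lines. The code below is only an illustration under stated assumptions: the shift constants (2 for the aggregated criterion, 1 for each separate criterion) and the restriction to a single component are assumptions, and the function names are hypothetical. The example values are the MMA concentrations reported for the water/MMA heteroazeotrope in this section.

```python
from itertools import combinations


def pairwise_manhattan(phases):
    """Manhattan distance between the composition vectors of every pair of phases."""
    return [sum(abs(a - b) for a, b in zip(p, q))
            for p, q in combinations(phases, 2)]


def aggregated_objective(phases, shift=2.0):
    """One distance criterion: all pairwise differences summed up."""
    return shift - sum(pairwise_manhattan(phases))


def separate_objectives(phases, shift=1.0):
    """One distance criterion per pair of phases."""
    return [shift - d for d in pairwise_manhattan(phases)]


# MMA concentration in the three phases (one component only, as assumed here).
phases = [(0.841826,), (0.488033,), (0.002577,)]
print(aggregated_objective(phases))  # single value for a two-objective problem
print(separate_objectives(phases))   # three values, giving a four-objective problem

# The trivial solution (all phases identical) receives the worst possible values.
trivial = [(0.5,), (0.5,), (0.5,)]
print(aggregated_objective(trivial), separate_objectives(trivial))
```

Together with the first objective, the aggregated variant yields two objectives and the separate variant four, which is the trade-off examined in the experiment.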

The binary system water/MMA in Fig. 58.6 exhibits a heteroazeotrope behavior at 1 bar. According to the phase rule, only one variable can be fixed to determine the heteroazeotrope, in this case the pressure. The temperature of the heteroazeotrope was found at 81.93 °C and the concentrations of MMA in the three phases were $x_{\mathrm{MMA}} = 0.841826$, $0.488033$, and $0.002577$.

The identification of the heteroazeotrope point for water/furfural at 1 bar was more complicated than for the previous system because two of the sought water concentrations are close to each other ($x_{\mathrm{water}} = 0.911822$ and $0.973374$), see Fig. 58.7. The third water concentration was found at $x_{\mathrm{water}} = 0.507017$ and the heteroazeotrope temperature was determined at 97.64 °C.




Fig. 58.7 Phase diagram of water/furfural system at 1 bar. The symbols are experimental data from [58.51]. Lines are calculation results of PC-SAFT with MOAMO-approach (legend: PC-SAFT with $k_{ij} = -0.006$; experimental VLE and LLE data; horizontal axis: $x_{\mathrm{water}}$)


Pre-Experimental Planning 

Taking over the SMS-EMOA parameters (population 
size and run length) from the previous experiment led 
to an unreliable behavior for the two systems tested 
here. Seemingly, they are more difficult to solve than 
the given three-component/two-phase system. There- 
fore, population size is doubled to 60 individuals and 
run length is accordingly slightly increased to 6000 
evaluations. 


Experimental Task 

As the last paragraph indicated that the problems in this section are even more difficult than that of Sect. 58.4.1, there is no point in testing against random start local search again. Instead, it shall be determined whether the aggregated (58.12) or the separate criteria approach (58.13) is more suitable for solving the problems with MOAMO. To enable a decision between the two, a significant difference in success rates is required.


Setup 
For each of the two systems (water/MMA and water/furfural) and each of the problem formulations (aggregated/separate), 30 MOAMO runs are performed and the number of successes is recorded. A run is successful if at least one of the local search steps obtains the sought optimum; the number of successful local searches is not recorded. As before, we employ the combination of SMS-EMOA and CMA-ES. The resulting values for the first objective function (computed from the error values output by PC-SAFT) shall be below 10~!> in this case, which requires modifying the CMA-ES internal stopping criteria accordingly. Its initial step size is set to 0.01. The SMS-EMOA parameters are set as in the previous experiment, except for population size and run length, which are modified as documented above.

Table 58.5 Success rates for detecting the heteroazeotrope point via MOAMO approach under different formulations of the distance criterion

System          Distance criterion  Success rate (%)
Water/MMA       Aggregated          100.0
Water/MMA       Separate            50.0
Water/furfural  Aggregated          O83
Water/furfural  Separate            36.7


Observations 
The number of successful runs is given in Table 58.5. The aggregated approach seems to consistently perform better than the one with separate criteria, and the success rates hint at the second system posing more difficulty than the first one.
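Whether such a difference is significant can be checked, for example, with a one-sided Fisher exact test. The chapter does not state which test was applied, so the following is only a sketch of one possible check; it uses the water/MMA row of Table 58.5 (100% and 50% of 30 runs, that is, 30 and 15 successes), and the function name is illustrative.

```python
from math import comb


def fisher_one_sided(success_a, runs_a, success_b, runs_b):
    """P-value that A reaches at least its observed number of successes
    if both formulations had the same success probability."""
    total_success = success_a + success_b
    total_runs = runs_a + runs_b
    upper = min(runs_a, total_success)
    favorable = sum(
        comb(total_success, k) * comb(total_runs - total_success, runs_a - k)
        for k in range(success_a, upper + 1)
    )
    return favorable / comb(total_runs, runs_a)


# Water/MMA: aggregated 30 of 30 successful runs, separate 15 of 30.
p_value = fisher_one_sided(30, 30, 15, 30)
print(f"p = {p_value:.2e}")
```

For the water/MMA system the resulting p-value is far below common significance levels, which supports the preference for the aggregated formulation under the assumptions of this test.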


Discussion 

Fortunately, the much simpler (aggregated) approach is also the more reliable one for both systems. The much larger objective space in the first phase (four instead of two objective functions) obviously outweighs the benefits of a correct mapping by far. Furthermore, for higher numbers of phases, the number of objectives would grow faster than linearly, so that, in conclusion, the aggregated approach is much more suitable than the one with separate objective functions.
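The growth argument can be illustrated by counting objectives. The sketch assumes one separate distance criterion per pair of phases, as in the three-phase case above, plus the first (error) objective; the function name is illustrative.

```python
from math import comb


def objective_counts(phases: int) -> dict:
    """Number of objectives for both formulations, given the number of phases."""
    pairs = comb(phases, 2)  # one distance criterion per pair of phases
    return {"aggregated": 2, "separate": 1 + pairs}


for phases in (2, 3, 4, 5):
    print(phases, objective_counts(phases))
```

The number of separate objectives grows quadratically with the number of phases (4, 7, and 11 objectives for three, four, and five phases), while the aggregated formulation always remains a two-objective problem.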


58.4.3 Obtaining the Phase Diagrams 


Once the heteroazeotrope point is detected, a phase di- 
agram of the system may be obtained by systematic 
exploration of the two-phase equilibria at different tem- 
peratures. We simply increase or decrease the temper- 
ature (which is a free variable for two-phase systems) 
by 1°C and take the solution for the last tempera- 
ture step as initial point for a local search (executed 
by the CMA-ES) on every binodal curve. Figures 58.6 
and 58.7 have been generated by means of this method. 
(Note that this is different from the common approach 
of detecting several two-phase equilibria by means of 
a quasi-Newton method first and then to conclude on 
the heteroazeotrope point from these.) 
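The temperature stepping can be sketched as a simple continuation loop in which each local search is warm-started from the previous solution. The code is a schematic illustration only: the local search is passed in as a function, and the stand-in used in the example is a few Newton iterations on a toy equation instead of the CMA-ES on the PC-SAFT objective; all names are hypothetical.

```python
def trace_curve(local_search, start_solution, start_temp, end_temp, step=1.0):
    """Follow one branch of a curve, warm-starting each solve from the last solution."""
    direction = 1.0 if end_temp >= start_temp else -1.0
    n_steps = int(abs(end_temp - start_temp) / step)
    curve, solution, temp = [], start_solution, start_temp
    for _ in range(n_steps):
        temp += direction * step
        solution = local_search(temp, solution)  # previous solution as initial point
        curve.append((temp, solution))
    return curve


def toy_local_search(temp, x0):
    """Stand-in solver: Newton iterations for x**2 = temp, started at x0."""
    x = x0
    for _ in range(20):
        x -= (x * x - temp) / (2.0 * x)
    return x


# Start at a known solution (x = 2 at temp = 4) and step the "temperature" up to 9.
curve = trace_curve(toy_local_search, 2.0, 4.0, 9.0)
print(curve[-1])
```

Warm-starting keeps every local search close to the branch being followed, which is why a purely local method suffices once the heteroazeotrope point is known.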


58.5 Conclusions and Outlook


In this chapter, a multistage method named MOAMO (multiobjectivization-assisted multimodal optimization) was presented. It is especially designed for difficult multimodal direct search problems as arising in phase equilibrium detection. However, the method is very well applicable whenever some problem knowledge is available concerning where the global optimum is not. The experimental analysis, performed on three different systems with either three components and two phases or two components and three phases, has shown that the approach is reliable and fast. It outperforms random multistart local search by a large margin under nearly all tested conditions. Two important properties of the approach need to be emphasized:

• Unlike many attempts to solve phase-equilibrium problems by means of evolutionary or related algorithms, MOAMO utilizes known features of the problem to direct the search and thereby avoids spending too much effort in repeatedly approaching trivial solutions. However, it does not make any assumptions about the location of the sought optima and is thus still a generic approach.

• Unlike in some other multiobjectivization approaches, the second objective is population independent. Moving a single individual does not change the objective function values of any other. This prevents unwanted feedback loops. The optimization focuses on the problem and not on the current population.

Our results indicate that the MOAMO approach is remarkably independent of the actual formulation of the second objective. Performance increases or decreases only gradually for alternative objectives; the overall concept remains intact. However, obtaining better guidelines for setting up a matching second objective is an area for future research, as is the comparison with further algorithms and the adoption for other problems, not necessarily restricted to phase equilibrium detection.


References


58.1 A.A. Torn, A. Žilinskas (Eds.): Global Optimization, Lecture Notes in Computer Science, Vol. 350 (Springer, Berlin, Heidelberg 1989)

58.2 D.H. Wolpert, W.G. Macready: No free lunch theorems for optimization, IEEE Trans. Evol. Comput. 1(1), 67-82 (1997)

58.3 D. Beasley, D.R. Bull, R.R. Martin: A sequential niche technique for multimodal function optimization, Evol. Comput. 1(2), 101-125 (1993)

58.4 A.E. Eiben, J.E. Smith: Introduction to Evolutionary Computing (Springer, Berlin, Heidelberg 2003)

58.5 K.A. De Jong: An Analysis of the Behavior of a Class of Genetic Adaptive Systems, Ph.D. Thesis (University of Michigan, Ann Arbor 1975)

58.6 D.E. Goldberg, J. Richardson: Genetic algorithms with sharing for multimodal function optimization, Proc. Second Int. Conf. Genet. Algorithm. Their Appl. (1987) pp. 41-49

58.7 M. Jelasity: UEGO, an abstract niching technique for global optimization, Lect. Notes Comput. Sci. 1498, 378-387 (1998)

58.8 A. Pétrowski: A clearing procedure as a niching method for genetic algorithms, Proc. 1996 IEEE Int. Conf. Evol. Comput. (1996) pp. 798-803

58.9 J.-P. Li, M.E. Balazs, G.T. Parks, P.J. Clarkson: A species conserving genetic algorithm for multimodal function optimization, Evol. Comput. 10(3), 207-234 (2002)


58.10 F. Streichert, G. Stein, H. Ulmer, A. Zell: A clustering based niching method for evolutionary algorithms, Proc. Genet. Evol. Comput. (2003) pp. 644-645

58.11 M. Tomassini: Spatially Structured Evolutionary Algorithms: Artificial Evolution in Space and Time (Springer, Berlin, Heidelberg 2005)

58.12 M. Preuss: Niching prospects, bioinspired optimization methods and their applications, BIOMA 2006 (2006) pp. 25-34

58.13 F. Oppacher, M. Wineberg: The shifting balance genetic algorithm: Improving the GA in a dynamic environment, Proc. Genet. Evol. Comput. Conf. (1999) pp. 504-510

58.14 M. Preuss, G. Rudolph, F. Tumakaka: Solving multimodal problems via multiobjective techniques with application to phase equilibrium detection, IEEE Cong. Evol. Comput. (CEC 2007) (2007) pp. 2703-2710

58.15 E.G. de Azevedo, J.M. Prausnitz, R.N. Lichtenthaler: Molecular Thermodynamics of Fluid Phase Equilibria (Prentice Hall, Englewood Cliffs 1986)

58.16 J. Gross, G. Sadowski: Perturbed-chain SAFT: An equation of state based on a perturbation theory for chain molecules, Ind. Eng. Chem. Res. 40(4), 1244-1260 (2001)

58.17 M. Kleiner, F. Tumakaka, G. Sadowski, H. Latz, M. Buback: Phase equilibria in polydisperse and associating copolymer solutions: Poly(ethene-co-(meth)acrylic acid)-monomer mixtures, Fluid Ph. Equilib. 241(1/2), 113-123 (2006)

58.18 S. Behme: Thermodynamik von Polymersystemen bei hohen Drücken, Ph.D. Thesis (Technische Universität, Berlin 2000)

58.19 G.P. Rangaiah: Evaluation of genetic algorithms and simulated annealing for phase equilibrium and stability problems, Fluid Ph. Equilib. 187/188, 83-109 (2001)

58.20 M. Srinivas, G.P. Rangaiah: A study of differential evolution and tabu search for benchmark, phase equilibrium and phase stability problems, Comput. Chem. Eng. 31(7), 760-772 (2007)

58.21 L. Gao, N.W. Loney: New hybrid neural network model for prediction of phase equilibrium in a two-phase extraction system, Ind. Eng. Chem. Res. 41(1), 112-119 (2002)

58.22 X. He, X. Zhanga, S. Zhanga, J. Liub, C. Lia: Prediction of phase equilibrium properties for complicated macromolecular systems by HGALM neural networks, Fluid Ph. Equilib. 238(1), 52-57 (2005)

58.23 Y.S. Teh, G.P. Rangaiah: Tabu search for global optimization of continuous functions with application to phase equilibrium calculations, Comput. Chem. Eng. 27(11), 1665-1679 (2003)

58.24 M. Srinivas, G.P. Rangaiah: Implementation and evaluation of random tunneling algorithm for chemical engineering applications, Comput. Chem. Eng. 30(9), 1400-1415 (2006)

58.25 M. Srinivas, G.P. Rangaiah: Differential evolution with tabu list for global optimization and its application to phase equilibrium and parameter estimation problems, Ind. Eng. Chem. Res. 46(10), 3410-3421 (2007)

58.26 C.G.E. Boender, A.H.G. Rinnooy Kan, G.T. Timmer, L. Stougie: A stochastic method for global optimization, Math. Program. 22(1), 125-140 (1982)

58.27 J. Balogh, T. Csendes, R.P. Stateva: Application of a stochastic method to the solution of the phase stability problem: cubic equations of state, Fluid Ph. Equilib. 212(1/2), 257-267 (2003)

58.28 T. Csendes, L. Pal, J.O.H. Sendin, J.R. Banga: The global optimization method revisited, Optim. Lett. 2(4), 445-454 (2008)

58.29 R. Hooke, T.A. Jeeves: Direct search solution of numerical and statistical problems, J. ACM 8, 212-229 (1961)

58.30 N. Hansen, A. Ostermeier: Completely derandomized self-adaptation in evolution strategies, Evol. Comput. 9(2), 159-195 (2001)

58.31 J.C. Nash: Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, 2nd edn. (Adam Hilger, Bristol 1990)

58.32 J.D. Knowles, R.A. Watson, D.W. Corne: Reducing local optima in single-objective problems by multi-objectivization, Lect. Notes Comput. Sci. 1993, 269-283 (2001)

58.33 J. Handl, S. Lovell, J. Knowles: Multiobjectivization by decomposition of scalar cost functions, Lect. Notes Comput. Sci. 5199, 31-40 (2008)

58.34 J. Handl, S. Lovell, J. Knowles: Investigations into the effect of multiobjectivization in protein structure prediction, Lect. Notes Comput. Sci. 5199, 702-711 (2008)

58.35 V. Cutello, G. Narzisi, G. Nicosia: Computational studies of peptide and protein structure prediction problems via multiobjective evolutionary algorithms. In: Multiobjective Problem Solving from Nature. From Concepts to Applications, ed. by J. Knowles, D. Corne, K. Deb (Springer, Berlin, Heidelberg 2008) pp. 93-114

58.36 D. Brockhoff, T. Friedrich, N. Hebbinghaus, C. Klein, F. Neumann, E. Zitzler: Do additional objectives make a problem harder?, Proc. 9th Annu. Conf. Genet. Evol. Comput. (2007) pp. 765-772

58.37 H.A. Abbass, K. Deb: Searching under multi-evolutionary pressures, Lect. Notes Comput. Sci. 2632, 391-404 (2003)

58.38 A. Toffolo, E. Benini: Genetic diversity as an objective in multi-objective evolutionary algorithms, Evol. Comput. 11(2), 151-167 (2003)

58.39 L.T. Bui, J. Branke, H.A. Abbass: Diversity as a selection pressure in dynamic environments, Proc. 2005 Conf. Genet. Evol. Comput. (2005) pp. 1557-1558

58.40 K. Deb, A. Saha: Multimodal optimization using a bi-objective evolutionary algorithm, Evol. Comput. 20(1), 27-62 (2012)

58.41 M. Emmerich, N. Beume, B. Naujoks: An EMO algorithm using the hypervolume measure as selection criterion, Lect. Notes Comput. Sci. 3410, 62-76 (2005)

58.42 N. Beume, B. Naujoks, M. Emmerich: SMS-EMOA: Multiobjective selection based on dominated hypervolume, Eur. J. Oper. Res. 181(3), 1653-1669 (2007)

58.43 K. Deb: Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, New York 2001)

58.44 T.E. Daubert, R.P. Danner: Data Compilation Tables of Properties of Pure Compounds, Design Institute for Physical Property Data (American Institute of Chemical Engineers, New York 1985)

58.45 J. Gross, G. Sadowski: Application of the perturbed-chain SAFT equation of state to associating systems, Ind. Eng. Chem. Res. 41(22), 5510-5515 (2002)

58.46 M. Kleiner, G. Sadowski: Modeling of polar systems using PC-SAFT: An approach to account for induced-association interactions, J. Phys. Chem. C 111(43), 15544-15553 (2007)

58.47 G.A. Chubarov, S.M. Danov, G.V. Brovkina, T.V. Kupriyanov: Equilibrium in system methanol methyl methacrylate water, J. Appl. Chem. USSR 51(2), 434-437 (1978)

58.48 J. Kooi: The system methylmethacrylate - methanol - water, J. R. Neth. Chem. Soc. 68(1), 34-42 (1949)


58.49 S.M. Danov, T.N. Obmelyukhina, G.A. Chubarov, A.L. Balashov, A.A. Dolgopolov: Investigation and calculations of liquid-vapor-equilibrium in binary methyl-methacrylate impurity systems, J. Appl. Chem. USSR 63(3), 566-568 (1990)

58.50 J. Fu, K. Wang, Y. Hu: Studies on the vapor-liquid equilibrium and vapor-liquid-liquid equilibrium for a methanol-methyl methacrylate-water ternary system (II) Ternary system, J. Chem. Ind. Eng. (China) 4(1), 14-25 (1988)

58.51 A.C.G. Marigliano, M.B.G. de Doz, H.N. Solimo: Influence of temperature on the liquid-liquid equilibria containing two pairs of partially miscible liquids - water + furfural + 1-butanol ternary system, Fluid Ph. Equilib. 153(2), 279-292 (1998)




59. Modeling and Optimization 
of Machining Problems 


Dirk Biermann, Petra Kersting, Tobias Wagner, Andreas Zabel 


In this chapter, applications of computational 
intelligence methods in the field of production en- 
gineering are presented and discussed. Although 
a special focus is set to applications in machining, 
most of the approaches can be easily transferred to 
respective tasks in other fields of production engi- 
neering, e.g., forming and coating. The complete 
process chain of machining operations is consid- 
ered: The design of the machine, the tool, and the 
workpiece, the computation of the tool paths, the 
model selection and parameter optimization of the 
empirical or simulation-based surrogate model, 
the actual optimization of the process parameters, 
the monitoring of important properties during the 
process, as well as the posterior multicriteria de- 
cision analysis. For all these steps, computational 
intelligence techniques provide established tools. 
Evolutionary and genetic algorithms are commonly 
utilized for the internal optimization tasks. Model- 
ing problems can be solved using artificial neural 
networks. Fuzzy logic represents an intuitive way 
to formalize expert knowledge in automated de- 
cision systems. 


59.1 Elements of a Machining Process
59.2 Design Optimization
59.2.1 Optimal Design of Machines
59.2.2 Tool Optimization
59.2.3 Workpiece Layout Optimization
59.3 Computer-Aided Design and Manufacturing
59.3.1 Surface Reconstruction
59.3.2 Optimization of NC Paths
59.4 Modeling and Simulation of the Machining Process
59.4.1 Empirical Modeling
59.4.2 Physical Modeling for Simulation
59.5 Optimization of the Process Parameters
59.6 Process Monitoring
59.7 Visualization
59.8 Summary and Outlook
References


In production engineering and particularly in the field of machining, improvements in materials, coatings, tools, and machines continuously provide potentials for improving the processes. In order to exploit these potentials, however, optimal setups of the changing processes have to be found. Since modern production processes involve many complex subsystems, as well as preceding and subsequent steps, all these systems and steps have to be adapted for achieving the optimal result.

In this chapter, it is shown that computational intelligence (CI) provides methods to assist in achieving this ambitious aim. A particular focus is on the applications of evolutionary computation (EC) in machining, but artificial neural networks (NN) and fuzzy logic are also considered. A comprehensive overview is presented by considering several subsystems, as well as the preceding and subsequent steps in the operating sequence. In this respect, the chapter complements common surveys in the literature [59.1–5], which often focus only on the modeling and optimization of the actual process. In order to assist interested engineers in choosing a suitable method for their problem, the solutions offered by CI are structured according to the specific subproblems to be solved in a machining problem. To keep the big picture apparent, these subproblems are integrated into the complete operating sequence in the following section. They are then discussed according to their chronological order in the sequence. The chapter is concluded with summarizing remarks on CI applications in the field of production engineering.




59.1 Elements of a Machining Process 


An overview of the elements and steps to be considered when optimizing a machining process is shown in Fig. 59.1. The actual process is the focus of the considerations. The results of this process, however, significantly depend on its elements, in particular on the mechanical properties and the dynamic characteristics of the machine, the geometry and the properties of the tools, as well as the layout of the workpiece, which determines the required machining operations. All these elements can be individually optimized to improve the results of the process. For the latter, often complex numerical control (NC) paths for the machines have to be generated using computer-assisted manufacturing (CAM) software. To accomplish this, a model of the final workpiece geometry is required. If no such model is available, e.g., after manual modifications of a prototype, CI-based methods can assist in computing an optimized workpiece model for the CAM software. However, even if a model is available, the NC paths computed by the CAM software can be far from optimal due to the complexity of the process, e.g., in five-axis milling operations. In this case, the subsequent optimization of the position-dependent parameters of the NC code, such as the inclination angles $\alpha$ and $\beta$, and the feed rate $f$ [59.6], can significantly increase the efficiency of the process.

Fig. 59.1 Overview of the elements and steps of an arbitrary machining process (diagram elements: process optimization, process monitoring, process analysis/decision making; process: milling, grinding, turning; CAD/CAM, tool, machine; empirical model, simulation model)

When all the components of the actual process are selected and thereby fixed, the optimization of the adjustable process parameters can begin. Here, CI-based techniques usually rely on a self-organizing process. In order to let the self-organization take effect, a high number of experiments is required. Since a real-world experiment involves high costs, it can become necessary to use a surrogate model to which the method is applied. In this case, however, additional problems have to be solved. It has to be decided which kind of model (empirical, analytical, physical, numerical) is applied and which type or realization of this kind of model is implemented, e.g., an empirical model can be computed using artificial neural networks, Gaussian processes, or regression techniques. As soon as a model is chosen, the parameters of this model (internal coefficients, material constants, etc.) have to be adapted with respect to the given application. This often represents an additional nonlinear optimization problem which can be solved using techniques of EC.

Moreover, the robustness of the process can be 
increased by a monitoring-based process control. To 
accomplish this, dynamic characteristics of the pro- 
cess, such as acoustic emission signals and force 
measurements, are analyzed online and control op- 
erations are initiated as soon as these characteristics 
show suspicious patterns. In this kind of applica- 
tion, however, it is necessary to automatically detect 
what indeed is a suspicious pattern. Fuzzy logic and 
NNs have proven to be capable of performing these 
tasks. 

A lot of information can be obtained in order to 
analyze the process and its results. This information 
can either be obtained by measurements during and
after the process or by performing simulation stud- 
ies. They usually build the basis for the calculation 
of the actual objectives. In this context, machining 
processes have to be optimized with respect to sev- 
eral conflicting aims, e.g., a simultaneous minimiza- 
tion of tool wear and maximization of the material 
removal rate. Even if multiobjective optimization tech- 
niques are used, a lot of details can be lost in this 
formalization step. Often the first version of the objectives
does not yield the desired results. Additional
objectives have to be defined or preferences have to 
be integrated. In order to allow a deeper understand- 
ing of the process to be obtained and a refinement 
of the objectives to be made, an intuitive visualization
and exploration of the detailed information is required.
For this task, CI-based techniques can again be
used.
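
The conflict between aims such as tool wear and material removal rate can be illustrated with a small non-dominated filter. The following is only a minimal sketch: the function names and the sample values are hypothetical and are not taken from the cited studies. Both objectives are written as minimization, so the material removal rate is negated.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better once (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical evaluations: (tool wear, negated material removal rate).
candidates = [(0.10, -20.0), (0.15, -35.0), (0.30, -30.0), (0.25, -50.0)]
front = pareto_front(candidates)
```

In this toy data set the third candidate is removed because the second one has both lower wear and a higher removal rate; the remaining three are mutually incomparable trade-offs.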


59.2 Design Optimization 


The optimal design of a machine, tool, or workpiece 
is a great challenge in the field of production engi- 
neering. The optimization task is often conducted as 
an iterative manual process which is based on expert 
knowledge and which can be very cost and time con- 
suming. Roy et al. [59.7] gave an extensive overview of 
the recent advances in automated and interactive design 
optimization. They presented a classification of the op- 
timization problems and discussed the most important 
optimization approaches and techniques. In the follow- 
ing subsections, examples of successful applications of 
CI for the optimization of machine, tool, and workpiece 
designs are provided. 


59.2.1 Optimal Design of Machines 


Designing machines necessitates the consideration of 
multiple objectives, such as geometric accuracy and 
costs. Liu and Liang [59.8], for instance, presented an 
approach combining a modified Chebyshev program- 
ming method for the scalarization of these objectives 
and a particle swarm optimization algorithm for evolv- 
ing the machine designs. They were dealing with re- 
configurable machine tools, so not only the process 
accuracy and investment costs of the machine layouts, 
but also the configurability was considered. Signifi- 
cant changes in the shape of the product could thus be 
easily adapted. Mekid and Khalid [59.9] discussed an 
optimization method based on a multiobjective genetic 
algorithm for the design of three-axis micromilling 
machines. They took user requirements (for example 
the workspace volume), axis positions, and geomet- 
ric errors of the machine into account. For the latter, 
they used a mathematical error model of the three-axis 
milling machines. 
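
To make the idea of scalarizing several design objectives more concrete, the following sketch shows the standard weighted Chebyshev scalarization. It is an assumption for illustration only: the modified Chebyshev programming method of [59.8] is not reproduced here, and the design names, objective values, weights, and ideal point are hypothetical.

```python
from typing import Sequence


def chebyshev_scalarization(
    objectives: Sequence[float],
    weights: Sequence[float],
    ideal: Sequence[float],
) -> float:
    """Weighted Chebyshev distance to an ideal point (all objectives minimized)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


# Hypothetical designs described by (accuracy error, investment cost), both normalized.
designs = {"A": (0.2, 0.9), "B": (0.5, 0.5), "C": (0.9, 0.1)}
weights = (0.5, 0.5)
ideal = (0.0, 0.0)

scores = {name: chebyshev_scalarization(f, weights, ideal) for name, f in designs.items()}
best = min(scores, key=scores.get)
```

The scalar value can then serve as the fitness of a single-objective method such as particle swarm optimization; changing the weights moves the preferred design along the trade-off.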


59.2.2 Tool Optimization 


Designing machining tools is a very difficult optimiza- 
tion task since not only complex geometries, but also 
different machining criteria have to be taken into ac- 
count [59.10]. Abele and Fujara, for example, presented 
a simulation approach for optimizing the drill geometry 
based on a genetic algorithm [59.11]. They consid- 
ered not only the structural stiffness of the tool during 
their optimization run, but also took the coolant flow
resistance and the chip evacuation capability into account.
They also defined the machinability, especially
the grindability of the chip flute, as a constraint. In order
to take all these criteria into account, different simu- 
lation approaches have to be used (Sect. 59.4). Abele 
and Fujara used, for example, the finite element method 
in order to analyze the structural stiffness. The cutting 
forces were computed using a semiempirical cutting 
force model. Additionally, a model of the grinding 
wheel had to be determined in order to evaluate the 
grindability of the optimized drill geometry. Another 
application was presented by Jared et al. [59.12] who 
integrated GA into the computer-aided design software 
CATIA. In one of their case studies, the volume and the 
tip deflection of a cutting tool were minimized by au- 
tomatically parameterizing length and angles between 
segments of a 2-D (two-dimensional) profile which 
were then extruded to the actual tool. 


59.2.3 Workpiece Layout Optimization 


The layout of products can usually be described as
a multiobjective optimization problem. For example, the
design of aerospace structures always faces a trade- 
off between the stiffness and the weight of the prod- 
ucts [59.13]. The layout of a cooling system, e.g., for 
a turbine blade [59.13] is a tradeoff between the ma- 
chining quality, the cooling effect, and the production 
costs. Weinert et al. [59.14–17] developed a simulation
system for optimizing the layout of mold temperature
control systems in order to minimize the production cycle
times and costs, and to maximize the product quality.
They developed an efficient simulation system in order
to evaluate the effect and homogeneity of the tempering
of the design layout and to estimate the manufacturing
costs [59.18]. Using fast but sufficiently accurate evaluation
methods, a computer-aided optimization of the
temperature control system based on multiobjective
optimization methods, like NSGA-II [59.19] and
SMS-EMOA [59.20], became possible [59.21–24]. Nevertheless,
this optimization task is very complex and the
engineer's experience is still necessary. Due to this,
Biermann et al. [59.25] combined the computer-aided
optimization system with the possibility of user interaction
so that a visual real-time manipulation of target functions
is possible. Dürr and Jurklies [59.26] presented
a fuzzy expert system in order to use the expert knowledge
in a computer-assisted way.




59.3 Computer-Aided Design and Manufacturing 


In the modern construction process, computer-aided de- 
sign (CAD) software is used for all design tasks — for 
example for the model of the workpiece. This model 
is the basis for the generation of the NC paths by 
CAM software. However, if only a physical prototype 
exists or manual modifications of the original model 
have been performed, methods to compute a respec- 
tive model are required. To accomplish this, the original 
object is scanned and a point-based representation is ob- 
tained. From this point data, a new CAD model has to 
be calculated or the original model has to be adapted. 
This process is called surface reconstruction or reverse 
engineering. 

When a model of the workpiece is available, 
NC paths can be generated based on CAM software 
for most machining processes. For complex five-axis 
milling processes, however, the results of standard 
CAM software are not always optimal with respect to 
the requirements of the specific machine and process. In 
this case, CI-based techniques can be used to improve 
the NC paths generated by the CAM software. 


59.3.1 Surface Reconstruction 


The optimization of the visual quality of triangulations 
with respect to different quality criteria was successfully
performed using evolutionary algorithms by Weinert
et al. [59.27]. Based on an initial triangulation, as
provided by the software of the scanning system, edges 
were flipped in order to minimize the total length of all 
edges, the surface area, the sum of angles between nor- 
mals, and the total absolute curvature. It was found that 
the latter is best suited for generating visually smooth 
surfaces. 

Small tolerances in the representation of the origi- 
nal object, however, result in a huge number of required 
scan points. Current scanners are able to provide this 
dense and precise set of scan points, but the result- 
ing triangulations become very large and difficult to 
handle. Approximating triangulations tackle this prob- 
lem. The number of control points for the triangles 
is independent of the size of the point set and usu- 
ally considerably smaller than the number of scan 
points. Weinert et al. [59.28] documented the capabil- 
ities of a standard evolution strategy to optimize the 
control point positions of approximating triangulations. 
In order to avoid an uncontrolled expansion of the tri- 
angulation, balancing strategies based on mass-spring 
systems were integrated. 


Unfortunately, even approximating triangulations 
produce a nonsmooth surface and are therefore not con- 
venient for the later computation of NC paths. Nonuni- 
form rational B-splines (NURBS) [59.29] are another 
popular mathematical model for free-form surfaces 
in CAD software. The most important advantages of 
NURBS over triangulations are their smoothness, their 
compact definition, the possibility for an intuitive local 
manipulation, as well as the ability to combine NURBS 
patches to larger structures. Mehnen et al. [59.30, 31] 
applied an evolution strategy to the coordinates of the 
NURBS’s control points in order to minimize the dis- 
tance between the scan points and their projection to 
the NURBS. Wagner et al. [59.32] did the same using
a real-valued genetic algorithm. They also proposed
another distance indicator that is based on a sampling
of the NURBS and that is much cheaper to evaluate.
The use of the sampling-based distance measure in
combination with an equation-solver-based hybrid real-valued
genetic algorithm significantly reduced the runtime
of the optimization. This approach was further
enhanced [59.16] to a two-step approach, in which the 
single-objectively optimized solution is used as initial 
individual for a multiobjective optimization. As addi- 
tional objective, the smoothness of the NURBS was 
considered. This objective was also considered by Jared 
et al. [59.12] in their GA-based optimization of NURBS 
in CATIA. 
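
The idea of a sampling-based distance indicator can be sketched as follows: instead of projecting every scan point onto the surface, the surface is sampled once on a grid and each scan point is compared with its nearest sample. This is a minimal sketch under stated assumptions. A plain parametric function stands in for a NURBS evaluator, all names are hypothetical, and the indicator actually used in [59.32] may differ in detail.

```python
import numpy as np


def sample_surface(surface, n_u: int = 30, n_v: int = 30) -> np.ndarray:
    """Evaluate a parametric surface on a regular (u, v) grid in [0, 1]^2."""
    u, v = np.meshgrid(np.linspace(0.0, 1.0, n_u), np.linspace(0.0, 1.0, n_v))
    return surface(u.ravel(), v.ravel())  # shape (n_u * n_v, 3)


def sampled_distance(scan_points: np.ndarray, samples: np.ndarray) -> float:
    """Mean distance from each scan point to its nearest surface sample."""
    diff = scan_points[:, None, :] - samples[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return float(nearest.mean())


def plane(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Stand-in surface (hypothetical): the plane z = 0 over the unit square."""
    return np.column_stack([u, v, np.zeros_like(u)])


# Synthetic scan points hovering 0.05 above the stand-in surface.
rng = np.random.default_rng(0)
scan = np.column_stack([rng.random(200), rng.random(200), np.full(200, 0.05)])
error = sampled_distance(scan, sample_surface(plane))
```

The accuracy of such an indicator is limited by the grid spacing, which is the price paid for avoiding the point projection; an evolutionary algorithm would minimize `error` over the control point coordinates.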

In addition, Weinert et al. [59.33] combined
NURBS with constructive solid geometries [59.34] 
in a hybrid evolutionary algorithm/genetic program- 
ming approach. By these means, the construc- 
tional logic behind the workpiece could also be 
approximated. 


59.3.2 Optimization of NC Paths 


The five-axis milling process offers the possibilities 
to tilt the milling tool and, thus, to use shorter and 
therewith stiffer tools. This allows complex free-form 
surfaces to be machined in one workpiece clamping, 
and the engagement conditions to be adapted [59.35]. 
An improvement of the machining results and a reduc- 
tion of the machining time can be achieved. However, in 
contrast to the three-axis process, the generation of the 
NC paths particularly for the machining of free-form 
surfaces is much more complex [59.6]. 

Weinert and Stautner [59.36] presented an algorithm
for converting three- into five-axis milling paths
in which the position of the tool tip is kept from
the three-axis NC program. An optimization approach
based on an evolution strategy was used to improve
the tool movement [59.37]. To accomplish this, they de- 
veloped a fast simulation system of the five-axis milling 
process based on a discrete dexel model of the work- 
piece (Sect. 59.4) [59.38]. 

The NC paths generated for a five-axis milling pro- 
cess are often not smooth enough since the kinematic 
behavior of the specific milling machine is not taken 
into account. Zabel et al. developed a simulation ap- 
proach which is placed in the process chain between 
the CAM system and the real milling process [59.39].
The five-axis tool movement is optimized taking the 
tool axis configuration and the dynamic behavior of 
the milling machine into account. For this purpose, 
methods of evolutionary computation and wavelet theory
were combined [59.35]. In 2007, Mehnen et al.
integrated a multiobjective optimization algorithm into 
this simulation system which combined the variation 
of a modern single-objective approach with the se- 
lection mechanism of a classical multiobjective opti- 
mization algorithm in order to optimize the tool move- 
ment [59.40]. 

One challenging task during the optimization of 
the five-axis milling process is the avoidance of col- 
lisions between the milling tool and the workpiece. 
Kersting and Zabel [59.6] developed an efficient sim- 
ulation approach, which maps the high-dimensional 
restriction area on a two-dimensional matrix struc- 
ture. They showed that the use of a multipopula- 
tion multiobjective evolutionary algorithm in the re- 
striction-free area improved the corresponding Pareto 
fronts [59.41]. 


59.4 Modeling and Simulation of the Machining Process 


The optimization of real-world applications using CI- 
based or classical optimization approaches requires that 
a performance value or vector can be obtained for all 
possible settings of the input parameters, whereby the 
performance values are usually calculated based on 
measurements during or after the actual process. In or- 
der to achieve a near-optimal result, however, far more 
than 100 different parameter vectors have to be evalu- 
ated — even for low-dimensional problems. This amount 
of real experiments is often impossible due to the costs 
related to them. As a possible solution, the use of em- 
pirical or physical (simulation) models can significantly 
reduce the number of required experiments since most 
of the evaluations can be performed on the model. For 
both kinds of approaches, CI techniques have already 
been successfully used. Some examples are presented 
in the following subsections. 


59.4.1 Empirical Modeling 


For the use of empirical models, real or simulated ex- 
periments are still required in order to build up a data 
base for the training of the model. In contrast to the 
direct optimization of the process, however, these ex- 
periments are performed as a block of moderate size in 
the beginning of the optimization. Afterward, the model 
allows new parameter settings to be predicted based on 
the information obtained from training data. The deter- 
mination of near-optimal solutions can be performed on 
the model. 
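
The workflow described above (one block of experiments, model training, optimization on the model) can be sketched as follows. This is an illustration under stated assumptions: a quadratic least-squares regression serves as the empirical model, and the synthetic function `run_experiment` stands in for a real machining experiment. All names and numbers are hypothetical.

```python
import numpy as np


def run_experiment(x: np.ndarray) -> float:
    """Stand-in for a costly experiment (hypothetical response of two settings)."""
    return float((x[0] - 0.3) ** 2 + 2.0 * (x[1] - 0.7) ** 2 + 0.1)


def features(x: np.ndarray) -> np.ndarray:
    """Full quadratic basis in two variables for an array of shape (n, 2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])


rng = np.random.default_rng(1)

# Step 1: one block of experiments of moderate size.
design = rng.random((20, 2))
responses = np.array([run_experiment(x) for x in design])

# Step 2: train the empirical model by least squares.
coef, *_ = np.linalg.lstsq(features(design), responses, rcond=None)

# Step 3: search for a near-optimal setting on the model only.
candidates = rng.random((5000, 2))
predictions = features(candidates) @ coef
best = candidates[np.argmin(predictions)]
```

Only the 20 evaluations in step 1 touch the expensive process; the 5000 evaluations in step 3 are model predictions. In practice the model would be a neural network, Gaussian process, or similar, and the random search would be replaced by an EC method.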


The number of empirical models is exhaus- 
tive [59.42]. Nevertheless, NNs often showed their 
capability to empirically model responses from ma- 
chining processes. For instance, the material removal 
rate of an abrasive jet drilling process was successfully 
predicted by using an NN with back error propaga- 
tion [59.43]. As input parameters, varying gas pres- 
sure, nozzle inside diameter, abrasive flow rate, size of 
the medium particle, and standoff distance were con- 
sidered. Accordingly, the ablating depth obtained for 
specific values of the peak power, pulsing frequency, 
and overlapping in a laser drilling process could be 
predicted using NN [59.44]. Casalino et al. [59.45]
showed that NN achieve higher prediction accuracies 
than regression techniques in predicting surface rough- 
ness and resultant forces for varying cutting speed, 
feed rate, and radial depth in milling. In the same line, 
NN were used for the prediction of the specific cut- 
ting constants resulting from different cutting speeds, 
feeds, inclination angles α and β, cutting depths, and
cutting widths [59.46]. With respect to tool wear, the 
wheel life of a cylindrical grinding wheel was modeled 
using a feedforward backpropagation NN. A direct pre- 
diction of the tool wear was also accomplished using 
NN [59.47, 48]. Moreover, the thermal expansion of the 
Y-axis ball screw was predicted based on temperature 
measurements at different points of the machine struc- 
ture [59.49]. 

In addition, CI-based techniques can also indirectly
be used for empirical modeling. As soon as complex
empirical models, such as Gaussian processes, support
vector or other kernel machines, are used, the
determination of the optimal model parameters is an
individual nonlinear optimization problem. Evolutionary
algorithms, in particular the covariance matrix adaptation
evolution strategy (CMA-ES) [59.50], have been shown to
be suitable for solving these problems [59.51, 52].
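
As an illustration of tuning model parameters with an evolutionary method, the sketch below adapts the length scale and the regularization of a kernel ridge regression by minimizing a validation error. It deliberately uses a much simpler (1+1) evolution strategy with success-based step-size control instead of CMA-ES. The data, parameter bounds, and function names are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length: float) -> np.ndarray:
    """Squared-exponential kernel between two 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)


def validation_error(log_params, x_tr, y_tr, x_va, y_va) -> float:
    """Mean squared validation error of kernel ridge regression."""
    # Clipping keeps the linear system well conditioned (assumed bounds).
    length, ridge = np.exp(np.clip(log_params, -10.0, 5.0))
    k = rbf_kernel(x_tr, x_tr, length) + ridge * np.eye(len(x_tr))
    alpha = np.linalg.solve(k, y_tr)
    pred = rbf_kernel(x_va, x_tr, length) @ alpha
    return float(np.mean((pred - y_va) ** 2))


def one_plus_one_es(loss, start, sigma=0.5, iterations=200, seed=0):
    """(1+1)-ES: one parent, one mutated child, the better one survives."""
    rng = np.random.default_rng(seed)
    parent, parent_loss = np.asarray(start, dtype=float), loss(start)
    for _ in range(iterations):
        child = parent + sigma * rng.standard_normal(parent.shape)
        child_loss = loss(child)
        if child_loss <= parent_loss:
            parent, parent_loss = child, child_loss
            sigma *= 1.5  # enlarge the step after a success
        else:
            sigma *= 0.9  # shrink it after a failure
    return parent, parent_loss


# Synthetic training and validation data (hypothetical).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x) + 0.05 * rng.standard_normal(x.size)
x_tr, y_tr, x_va, y_va = x[::2], y[::2], x[1::2], y[1::2]

loss = lambda p: validation_error(p, x_tr, y_tr, x_va, y_va)
start = np.zeros(2)  # log(length) = 0, log(ridge) = 0
start_loss = loss(start)
best_params, best_loss = one_plus_one_es(loss, start)
```

The factors 1.5 and 0.9 keep the step size roughly constant when about one in five mutations succeeds. Searching in the logarithm of the parameters keeps both of them positive without explicit constraints.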


59.4.2 Physical Modeling for Simulation 


In cases where sufficient knowledge about the physi- 
cal laws of the process is available, simulation models 
based on equations representing these physical laws 
are likely to be superior to the very general formula- 
tions of the empirical models. Nevertheless, also these 
models have parameters that are related to the prop- 
erties of the material, tool, and machine. Since these 
parameters can often not be measured, their values 
are usually set by minimizing the error between the 
predictions of the simulation and a training set of observations
from real-world experiments. As a consequence,
EC is a valuable tool for calibrating simulation models 
which was shown to be superior to classical data fitting 
tools [59.53]. 

In an exemplary application, the dynamic behav- 
ior of manufacturing systems was characterized by its 
frequency response function. This function can be modeled
by a superposition of decoupled damped harmonic
oscillators, whereby each oscillator has three parameters
(mass, natural frequency, and damping) [59.54]. In
order to minimize the deviation between the measured
frequency response function and that of the oscillators,
an interactive approach based on evolutionary algorithms
was successfully implemented [59.54].
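
The oscillator model can be written down compactly. The sketch below assumes the common receptance form H(ω) = Σ 1 / (m_k (ω_k² − ω² + 2iζ_k ω_k ω)) with mass m_k, natural frequency ω_k, and damping ratio ζ_k. The exact parameterization used in [59.54] is not given in the text, and the numerical values are hypothetical.

```python
import numpy as np


def frf(omega: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Receptance of decoupled damped oscillators; params rows are (mass, omega_n, zeta)."""
    omega = np.asarray(omega, dtype=float)[:, None]
    m, w_n, zeta = params[:, 0], params[:, 1], params[:, 2]
    single = 1.0 / (m * (w_n ** 2 - omega ** 2 + 2j * zeta * w_n * omega))
    return single.sum(axis=1)  # superposition over all oscillators


def deviation(params: np.ndarray, omega: np.ndarray, measured: np.ndarray) -> float:
    """Sum of squared magnitudes of the complex residual, usable as an EC fitness."""
    return float(np.sum(np.abs(frf(omega, params) - measured) ** 2))


# Hypothetical two-mode system used to generate a synthetic "measurement".
true_params = np.array([[2.0, 600.0, 0.03], [1.0, 1500.0, 0.05]])
omega = np.linspace(100.0, 2500.0, 400)
measured = frf(omega, true_params)

# A perturbed parameter set, e.g., one individual of an evolutionary algorithm.
guess = np.array([[2.5, 650.0, 0.03], [1.0, 1400.0, 0.05]])
guess_error = deviation(guess, omega, measured)
```

An evolutionary algorithm would treat the flattened parameter matrix as the individual and `deviation` as the objective; the number of oscillators is a modeling choice that has to be fixed beforehand or selected interactively.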

An open issue in the simulation of machining pro- 
cesses is the modeling of the extremely high strain 
rates which can only rarely be covered by classical 
material models and tensile tests. As a possible solu- 
tion, EC can be used as a submodule of a simulation 
in order to predict the deformation and flow charac- 
teristics for high strain rates. For instance, Weinert 
et al. used symbolic regression by means of a genetic
programming system to evolve mathematical formulae
that describe the trajectories of single particles of
steel based on recordings of a high-speed camera during
the turning process [59.55, 56]. Teti et al. [59.57]
employed NN to reconstruct the stress-strain curve of 
the workpiece material from experimental data of ten- 
sile tests. They found out that the learned NN is capable 
of predicting workpiece material properties in a wide 
range of temperature and strain rate values. A hybrid 
simulation model based on physical equations and the 
empirical stress-strain prediction was finally proposed. 
Two recent overviews of hybrid models for simulation 
which also incorporate CI techniques were provided by 
Jawahir et al. [59.58, 59]. 


59.5 Optimization of the Process Parameters 


In this section, possible applications of EC methods for 
the optimization of the actual process parameters are 
discussed. Since a recent survey book for the model-based
optimization of process parameters exists [59.1],
only a short summary of possible applications is pro- 
vided. In contrast to this survey, the following presen- 
tation does not distinguish between different processes, 
as the aspects related to the use of EC are independent 
of the actual process, e.g., milling, turning, or grinding. 

As already discussed in the previous section, it 
is mandatory to approximate the process quality indi- 
cators by means of analytical, empirical, or physical 
models. In the literature, no direct application of EC
optimization techniques to machining processes has been
reported so far. Instead, polynomial or process-related
equations were usually fitted to experimental
data [59.60–78]. Neural networks [59.63, 79–83], other
empirical models [59.51, 62, 84], and simulation models
[59.85, 86] were also popular to accomplish this
task.

For the actual optimization, two important deci- 
sions on the formulation of the problem have to be 
taken in order to choose the EC method. These deci- 
sions are concerned with the representation of the input 
parameters and the objectives. In most cases, continu- 
ously defined input parameters, such as feed and cutting 
speed, are to be optimized. This relates to techniques 
such as evolution strategies, particle swarm optimization,
and real-valued genetic algorithms (GAs). If
discrete parameters, such as the cooling concept or
tool material, are also considered, special evolution strategies
[59.65, 87] or binary GAs may be better suited.
With respect to the objectives, it has to be decided 
whether a single optimal solution or a set of tradeoffs 
is desired. In the former case, almost all EC tech- 
niques can directly be used. Due to the complexity of 


Modeling and Optimization of Machining Problems 


59.7 Visualization 


production engineering problems, however, a suitable 
scalarization of the different objectives has to be found 
in order to achieve reasonable results. In the latter case 
of searching for an approximation of the trade-off struc- 
ture, it is important that the algorithm is capable of 
coping with multiple objectives which have to be con- 
sidered in parallel [59.51, 72, 74, 78, 79, 84, 86]. 

In the literature, the use of continuous input vari- 
ables and single-objective formulations is established. 
The most popular EC methods are particle swarm op- 
timization (PSO) [59.63, 68, 75, 76, 81-83, 85, 88] and 
standard GA or evolutionary algorithm (EA) [59.60, 
62, 64, 67, 69, 77, 80]. The use of specifically designed 
heuristics [59.71, 75, 89] is rather uncommon. Never- 
theless, the formulation of the problem and the design 
of the algorithm should aim at incorporating as much 
knowledge as possible into the optimization [59.16]. 
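
The most common setting described above (continuous inputs, a single objective, PSO on a fitted model) can be sketched as follows. This is a minimal global-best PSO with an inertia weight. The coefficient values are typical textbook choices, and the quadratic `surrogate` over feed and cutting speed is a hypothetical stand-in for a fitted process model.

```python
import numpy as np


def pso(objective, lower, upper, n_particles=20, iterations=100, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5):
    """Global-best particle swarm optimization for box-constrained minimization."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[np.argmin(best_val)].copy()
    for _ in range(iterations):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c_personal * r1 * (best_pos - pos) + c_global * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the box
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved], best_val[improved] = pos[improved], val[improved]
        g = best_pos[np.argmin(best_val)].copy()
    return g, float(best_val.min())


def surrogate(x: np.ndarray) -> float:
    """Hypothetical fitted model of a quality indicator over (feed, cutting speed)."""
    feed, speed = x
    return (feed - 0.12) ** 2 * 100.0 + ((speed - 180.0) / 100.0) ** 2


best_x, best_f = pso(surrogate, lower=[0.05, 50.0], upper=[0.30, 300.0])
```

Problem knowledge can enter such a scheme through the box limits, through seeding the initial swarm with known good settings, or through the scaling of the objective terms.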

Unfortunately, the generality of CI-based techniques
often results in problem formulations which are
not completely sophisticated. An important factor often
neglected when optimizing production engineering
problems is the uncertainty about the external process
variables, e.g., properties of the tool or material.
Although modern algorithms are capable of incorporating
them into the optimization [59.90], only a few
applications actually take these uncertainties into account
[59.70]. More specifically, two sources of uncertainty
can be considered [59.91]: perturbations in
the input variables, e.g., due to online control, and
environmental uncertainties, such as outdoor temperature,
humidity, and the already mentioned external
process variables. A detailed overview of such factors
can be found in the literature [59.92]. A comprehensive
survey of possible problem formulations and
respective optimization approaches was presented by
Beyer and Sendhoff [59.91]. In production-engineering
applications, however, classical statistical methods
are usually used to cope with these problems. The
potential of CI-based techniques has not yet been
exploited.


59.6 Process Monitoring


The analysis of different process variables (for
example the cutting forces, acoustic emission, or
temperatures) allows conclusions about the process-dependent
state of the machining processes and its
components (tools, machines, workpieces, etc.) to be
drawn and provides the possibility for an adaptive
process control [59.93]. The idea of process monitoring
is to measure, visualize, and analyze the values
of these variables during the machining process. Teti
et al. [59.93] gave an extensive overview of advanced
monitoring of machining operations describing sensor
systems for machining, signal processing, monitoring
scopes, and the decision-making support systems.
In order to evaluate the measured values, cognitive
computing methods (for example genetic algorithms,
fuzzy logic, or NNs) can be used. In contrast to
the rule-based fuzzy logic approach, NNs do not store
the knowledge in an explicit form. A survey of the
successful applications of these techniques for the advanced
monitoring of machining operations was provided
by Teti et al. [59.93]. It is thus omitted in this
section.


59.7 Visualization


In the field of production engineering, the complex
optimization problems are often characterized by multiple
objectives and restrictions. Additionally, the decision
space can be high dimensional, for example in the
case of optimizing NC paths (Sect. 59.3.2) [59.6]. In
order to analyze the optimization problems and the applied
optimization approach, an intuitive visualization
of the data resulting from the evolutionary process is
advisable [59.94]. For this purpose, Pohlheim [59.95]
reviewed several visualization techniques in order to
obtain a better understanding of the optimization process
of real-world problems. He recommended the
use of three diagrams in order to analyze the optimization
algorithm: a convergence diagram, a visualization
of the change of the best individual during the
optimization approach, and a diagram of the objective
values of all individuals in the population of all
generations.

Müller et al. discussed techniques for an intuitive
visualization and interactive analysis of Pareto sets applied
on production engineering systems [59.94]. They
analyzed different visualization and analysis methods
in order to gain insight into both the optimization problem
and the optimization algorithm, and to support
an intuitive decision-making process. For this purpose,
they presented a simultaneous visualization of the decision
and the objective space. An interactive navigation
through the solution sets supports the user to detect specific
process characteristics [59.94]. This also helps to
redesign the objective formulation in cases where the
optimization results are not in agreement with the actual
preferences of the decision maker.

In order to support the trade-off analysis in multiple
dimensions, Obayashi and Sasaki [59.96] presented
a visualization approach based on self-organizing maps
(SOMs). The idea is to map from the high-dimensional
objective function space to two-dimensional map
units. They showed the applicability of this approach
analyzing two multiobjective aerodynamic design problems
[59.96].

The innovization approach of Deb [59.97] provides
an automated identification of design principles by
searching for common features of the optimal trade-offs
in a multiobjective optimization problem. These
features are provided by means of analytical relations
between the design variables. A successful application
of innovization in machining was already
reported [59.78]. Another possibility to learn about
the structure of the objectives and the effect of the
input parameters is provided by visualizations and analyses
based on the surrogate models of the process
(Sect. 59.4) [59.51, 98].


59.8 Summary and Outlook


This chapter focused on applications of CI in the optimization
of machining problems. For this purpose,
the whole process chain, from the design of a machine,
tool, or workpiece, as well as the corresponding
optimization of process parameters, to the process monitoring
and subsequent analysis of the results, was
taken into account. Different modeling and simulation
techniques, which are necessary to optimize real-world
problems, were discussed. Successful examples
in the field of production engineering were compiled to
present the applicability of the CI methods. In conclusion,
evolutionary and genetic algorithms are general
and powerful solvers for nonlinear optimization tasks,
artificial neural networks can be used for continuous
modeling problems, and fuzzy logic provides an intuitive
way to represent expert knowledge.

Unfortunately, the generality of CI-based techniques
often results in problem formulations which are
not completely sophisticated. For instance, possibilities
of creating good initial solutions, uncertainty in the
design variables, and specific aspects of the quality indicators
resulting in undesirable scalarizations, are often
neglected. EC provides the means to appropriately consider
these aspects. A proper analysis of the results can
assist in identifying such pitfalls and in improving the
problem formulation.


References


59.1 E. Venkata Rao: Advanced Modeling and Optimization
of Manufacturing Processes (Springer, Berlin,
Heidelberg 2011)

59.2 D. Dasgupta, Z. Michalewicz: Evolutionary Algorithms
in Engineering Applications (Springer,
Berlin, Heidelberg 1997)

59.3 W. Banzhaf, M. Brameier, M. Stautner, K. Weinert:
Genetic programming and its application
in machining technology. In: Advances in Computational
Intelligence: Theory and Practice,
ed. by H.-P. Schwefel, I. Wegener, K. Weinert
(Springer, Berlin, Heidelberg 2003) pp. 194-244,
Chap. 7

59.4 I. Mukherjee, P. Ray: A review of optimization techniques
in metal cutting processes, Comput. Ind.
Eng. 50(1-2), 15-34 (2006)


59.5 H. Aytug, M. Khouja, F.E. Vergara: Use of genetic al- 
gorithms to solve production and operations man- 
agement problems: A review, Int. J. Prod. Res. 
41(17), 3955-4009 (2003) 

59.6 P. Kersting, A. Zabel: Optimizing NC-tool paths for 
simultaneous five-axis milling based on multi- 
population multi-objective evolutionary algo- 
rithms, Adv. Eng. Softw. 40(6), 452-463 (2009) 

59.7 R. Roy, S. Hinduja, R. Teti: Recent advances in engi- 
neering design optimisation: Challenges and future 
trends, CIRP Ann. Manuf. Technol. 57(2), 697-715 
(2008) 

59.8 W. Liu, M. Liang: A particle swarm optimization 
approach to a multi-objective reconfigurable ma- 
chine tool design problem, IEEE Cong. Evol. Com- 
put. (2006) pp. 2222-2229 


59.9 S. Mekid, A. Khalid: Robust design with error optimization
analysis of CNC micromilling machine, 5th
CIRP Int. Semin. Intell. Comput. Manuf. Eng. (2006)
pp. 583-587

59.10 H. Schulz, A.K. Emrich: Optimization of the chip
flute of drilling tools using the principle of genetic
algorithms, 2nd CIRP Int. Semin. Intell. Comput.
Manuf. Eng. (2000) pp. 371-376

59.11 E. Abele, M. Fujara: Simulation-based twist drill
design and geometry optimization, CIRP Ann.
Manuf. Technol. 59(1), 145-150 (2010)

59.12 G. Jared, R. Roy, J. Grau, T. Buchannan: Flexible optimization
within the CAD/CAM environment, CIRP
Int. Seminar Intell. Comput. Manuf. Eng. (1998)
pp. 503-508

59.13 R. Roy, A. Tiwari, J. Corbett: Designing a turbine
blade cooling system using a generalised regression
genetic algorithm, CIRP Ann. Manuf. Technol.
52(1), 415-418 (2003)

59.14 J. Mehnen, K. Weinert, H.-W. Meyer: Evolutionary
optimization of deep drilling strategies for mold
temperature control, 3rd CIRP Int. Semin. Intell.
Comput. Manuf. Eng. (2002)

59.15 T. Michelitsch, J. Mehnen: Evolutionary optimization
of cooling circuit layouts based on the electrolytic
tank method, 4th CIRP Int. Semin. Intell.
Comput. Manuf. Eng. (2004)

59.16 K. Weinert, A. Zabel, P. Kersting, T. Michelitsch,
T. Wagner: On the use of problem-specific candidate
generators for the hybrid optimization of
multi-objective production engineering problems,
Evol. Comput. 17(4), 527-544 (2009)

59.17 J. Mehnen, T. Michelitsch, K. Weinert: Production
engineering: Optimal structures of injection molding
tools. In: Emergence, Analysis and Optimization
of Structures: Concepts and Strategies across
Disciplines, ed. by K. Lucas, P. Roosen (Springer,
Berlin, Heidelberg 2010) pp. 75-90

59.18 T. Michelitsch, J. Mehnen: Optimization of production
engineering problems with discontinuous
cost-functions, 5th CIRP Int. Semin. Intell. Comput.
Manuf. Eng. (2006) pp. 275-280

59.19 K. Deb, A. Pratap, S. Agarwal, T. Meyarivan: A fast
and elitist multiobjective genetic algorithm: NSGA-II,
IEEE Trans. Evol. Comput. 6(2), 182-197 (2002)

59.20 N. Beume, B. Naujoks, M. Emmerich: SMS-EMOA:
Multiobjective selection based on dominated hypervolume,
Eur. J. Oper. Res. 181(3), 1653-1669
(2007)

59.21 J. Mehnen, T. Michelitsch, T. Bartz-Beielstein,
N. Henkenjohann: Systematic analyses of multi-objective
evolutionary algorithms applied to real-world
problems using statistical design of experiments,
4th CIRP Int. Semin. Intell. Comput. Manuf.
Eng. (2004)

59.22 J. Mehnen, H. Trautmann: Integration of expert's
preferences in pareto optimization by desirability
function techniques, 5th CIRP Int. Semin. Intell.
Comput. Manuf. Eng. (2006) pp. 293-298


59.23 


59.24 


59.25 


59.26 


59.27 


59.28 


59.29 


59.30 


59.31 


59.32 


59.33 


59.34 


59.35 


59.36 


T. Michelitsch, T. Wagner, D. Biermann, C. Hoff- 
mann: Designing memetic algorithms for real- 
world applications using self-imposed constraints, 
Proc. 2007 IEEE Congr. Evol. Comput. (2007) 
pp. 3050-3057 

D. Biermann, R. Joliet, T. Michelitsch, T. Wagner: 
Sequential parameter optimization of an evolution 
strategy for the design of mold temperature control 
systems, Proc. 2010 IEEE Congr. Evol. Comput. (2010) 
pp. 4071-4078 

D. Biermann, R. Joliet, T. Michelitsch: Interactive 
manipulation of target functions for the optimiza- 
tion of mold temperature control systems, 2nd Int. 
Conf. Manuf. Eng., Qual. Prod. Syst. (2010) pp. 239- 
244 

H. Dürr, I. Jurklies: A fuzzy expert system assist the 
CAD/CAM/CAE process chain in the tool and mould 
making industry, 3rd CIRP Int. Semin. Intell. Com- 
put. Manuf. Eng. (2002) 

K. Weinert, J. Mehnen, F. Albersmann, P. Dre- 
rup: New solutions for surface reconstruction from 
discrete point data by means of computational in- 
telligence, CIRP Int. Seminar Intell. Comput. Manuf. 
Eng. (1998) pp. 431-438 

K. Weinert, J. Mehnen, M. Schneider: Evolutionary 
optimization of approximating triangulations for 
surface reconstruction from unstructured 3D Data, 
Proc. 6th Jt. Conf. Inf. Sci. (2002) pp. 578-581 

L. Piegl, W. Tiller: The NURBS Book (Springer, Berlin, 
Heidelberg 1997) 

K. Weinert, J. Mehnen: NURBS-surface approxima- 
tion of discrete 3D-point data by means of evo- 
lutionary algorithms, 2nd CIRP Int. Semin. Intell. 
Comput. Manuf. Eng. (2000) pp. 263-268 

T. Beielstein, J. Mehnen, L. Schönemann, H.- 
P. Schwefel, T. Surmann, K. Weinert, D. Wiesmann: 
Design of evolutionary algorithms and applications 
in surface reconstruction. In: Advances in Compu- 
tational Intelligence: Theory and Practice, ed. by 
H.-P. Schwefel, |. Wegener, K. Weinert (Springer, 
Berlin, Heidelberg 2003) pp. 164-193 

T. Wagner, T. Michelitsch, A. Sacharow: On the 
design of optimisers for surface reconstruction, 
Proc. 9th Annu. Genet. Evol. Comput. Conf. (2007) 
pp. 2195-2202 

K. Weinert, T. Surmann, J. Mehnen: Evolutionary 
surface reconstruction using CSG-NURBS-Hybrids, 
Proc. Genet. Evol. Comput. Conf. (2001) pp. 1456- 
1463 

C.M. Hoffmann: Geometric e7 Solid Modeling (Kauf- 
mann Publ., San Mateo 1989) 

K. Weinert, A. Zabel, H. Müller, P. Kersting: Op- 
timizing NC tool paths for five-axis milling using 
evolutionary algorithms on wavelets, 8th Annu. 
Genet. Evol. Comput. Conf. (2006) pp. 1809-1816 
K. Weinert, M. Stautner: Generating multiaxis tool 
paths for die and mold making with evolutionary 
algorithms, Lect. Notes Comput. Sci. 3103, 1287- 
1298 (2004) 


181 


6S |3 Hed 


1182 


6S | 3 Hed 


Part E 


Evolutionary Computation 


59.37 


59.38 


59.39 


59.40 


59.41 


59.42 


59.43 


59.44 


59.45 


59.46 


59.47 


59.48 


59.49 


59.50 


K. Weinert, M. Stautner: A new system optimizing 
tool paths for multi-axis die and mould making 
by using evolutionary algorithms, production en- 
gineering, Res. Dev. 12(1), 15-20 (2005) 

A. Zabel, M. Stautner: Optimizing the multi- 
axis milling process via evolutionary algorithms, 
Berichte aus dem IWU, 8th CIRP Int. Workshop 
Model. Mach. Oper. (2005) pp. 363-370 

A. Zabel, H. Müller, M. Stautner, P. Kersting: Im- 
provement of machine tool movements for simul- 
taneous five-axes milling, 5th CIRP Inter. Semin. 
Intell. Comput. Manuf. Eng. (2006) pp. 159- 
164 

J. Mehnen, R. Roy, P. Kersting, T. Wagner: ICS- 
PEA: Evolutionary five-axis milling path optimisa- 
tion, 9th Annu. Genet. Evol. Comput. Conf. (2007) 
pp. 2122-2128 

D. Biermann, A. Zabel, T. Michelitsch, P. Kerst- 
ing: Intelligent process planning methods for the 
manufacturing of moulds, Inter. J. Comput. Appl. 
Technol. 40(1/2), 64-70 (2011) 

T. Hastie, R. Tibshirani, J. Friedman: The Elements 
of Statistical Learning: Data Mining, Inference, and 
Prediction, 2nd edn. (Springer, Berlin, Heidelberg 
2009) 

M. Gheorghe, C. Neagu, S. Antoniu, C. lonita: Mod- 
eling of abrasive jet drilling by applying a neu- 
ral network method, 2nd CIRP Int. Seminar Intell. 
Comput. Manuf. Eng. (2000) pp. 221-226 

S.L. Campanelli, A.D. Ludovico, C. Bonserio, P. Cav- 
alluzzi: Artificial neural network modelling of the 
laser milling process, 5th CIRP Int. Semin. Intell. 
Comput. Manuf. Eng. (2006) pp. 107-111 

G. Casalino, A.D. Ludovico, F.M.C. Minutolo, A. Ro- 
tondo: On the numerical modelling of a milling 
operation: Data recoveringand interpolation, 5th 
CIRP Int. Semin. Intell. Comput. Manuf. Eng. (2006) 
pp. 193-197 

P. Clayton, M.A. Elbestawi, T.I. El-Wardany, 
D. Viens: An innovative calibration technique 
using neural networks for a mechanical model of 
the 5-Axis milling process, 2nd CIRP Int. Semin. 
Intell. Comput. Manuf. Eng. (2000) pp. 391-396 

G. Casalino, A.D. Ludovico: Tool life estimation in 
single point turning using artificial neural net- 
works, 4th CIRP Int. Semin. Intell. Comput. Manuf. 
Eng. (2004) 

G. Ambroglio, D. Umbrello, L. Filice: Diffusion wear 
modelling in machining using ANN, 5th CIRP Int. 
Semin. Intell. Comput. Manuf. Eng. (2006) pp. 69- 
73 

C. Bruni, A. Forcellese, F. Gabrielli, M. Simoncini: 
Thermal error prediction in a machining center us- 
ing statistical and neural network-based models, 
4th CIRP Int. Semin. Intell. Comput. Manuf. Eng. 
(2004) 

N. Hansen, A. Ostermeier: Completely derandom- 
ized self-adaptation in evolution strategies, Evol. 
Comput. 9(2), 159-195 (2001) 


59.51 


59.52 


59.53 


59.54 


59.55 


59.56 


59.57 


59.58 


59.59 


59.60 


59.61 


59.62 


59.63 


59.64 


D. Biermann, K. Weinert, T. Wagner: Model-based 
optimization revisited: Towards real-world pro- 
cesses, Proc. 2008 IEEE Congr. Evol. Comput. (2008) 
pp. 2980-2987 

0. Kramer: Covariance matrix self-adaptation and 
kernel regression — perspectives of evolutionary 
optimization in kernel machines, J. Fundam. In- 
form. 98(1), 87-106 (2010) 

T. Özel, Y. Karpat: Identification of constitutive ma- 
terial model parameters for high-strain rate metal 
cutting conditions using evolutionary computa- 
tional algorithms, Mater. Manuf. Process. 22, 659- 
667 (2007) 

D. Biermann, T. Surmann, G. Kehl: Oscillator model 
of machine tools for the simulation of self excited 
vibrations in machining processes, 1st Int. Conf. 
Process Mach. Interact. (2008) pp. 23-29 

K. Weinert, M. Stautner: Reconstruction of parti- 
cle flow mechanisms with symbolic regression via 
genetic programming, Proc. Genet. Evol. Comput. 
Conf. (2001) pp. 1439-1443 

K. Weinert, M. Stautner, J. Mehnen: Automatic gen- 
eration of mathematical descriptions of cutting 
processes from video data, production engineer- 
ing, Res. Dev. 9(2), 55-58 (2002) 

R. Teti, G. Giorleo, U. Prisco, D. DAddona: Inte- 
gration of neural network material modelling into 
the FEM simulation of metal cutting, 3rd CIRP Int. 
Semin. Intell. Comput. Manuf. Eng. (2002) 

I.S. Jawahir, X. Wang: Development of hybrid pre- 
dictive models and optimization techniques for 
machining operations, J. Mater. Process. Technol. 
185(1-3), 46-59 (2007) 

A.D. Jayal, I.S. Jawahir: Analytical and computa- 
tional challenges for developing predictive models 
and optimization strategies for sustainable ma- 
chining, 7th CIRP Int. Conf. Intell. Comput. Manuf. 
Eng. (2010) 

G. Celano, S. Fichera, E.L. Valvo: Optimization of 
cutting parameters in multi pass turning operations 
for continuous forms, 2nd CIRP Int. Semin. Intell. 
Comput. Manuf. Eng. (2000) pp. 417-422 

C.W. Lee, Y.C. Shin: Evolutionary modelling and op- 
timization of grinding processes, Intern. J. Prod. 
Res. 38(12), 2787-2813 (2000) 

X. Wang, Z.J. Da, A.K. Balaji, I.S. Jawahir: 
Performance-based optimal selection of cutting 
parameters and cutting tools in multi-pass turning 
operations using genetic algorithms, 2nd CIRP Int. 
Semin. Intell. Comput. Manuf. Eng. (2000) pp. 409- 
414 

V. Tandon, H. El-Mounayri, H. Kishawy: NC end 
milling optimization using evolutionary compu- 
tation, Int. J. Mach. Tools Manuf. 42(5), 595-605 
(2002) 

X. Wang, l.S. Jawahir: Web-based optimization 
of milling operations for the selection of cutting 
conditions using genetic algorithms, 3rd CIRP Int. 
Semin. Intell. Comput. Manuf. Eng. (2002) 


Modeling and Optimization of Machining Problems 


References 


59.65 


59.66 


59.67 


59.68 


59.69 


59.70 


59.71 


59.72 


59.73 


59.74 


59.75 


59.76 


59.77 


59.78 


59.79 


C.W. Lee, T. Choi, Y.C. Shin: Intelligent model-based 
optimization of the surface grinding process for 
heat-treated 4140 steel alloys with aluminum oxide 
grinding wheels, J. Manuf. Sci. Eng. 125(1), 65-76 
(2003) 

K. Vijayakumar, G. Prabhaharan, P. Asokan, R. Sar- 
avanan: Optimization of multi-pass turning oper- 
ations using ant colony system, Inter. J. Mach. Tools 
Manuf. 43(15), 1633-1639 (2003) 

X. Wang, A. Kardekar, I.S. Jawahir: Performance- 
based optimization of multi-pass face-milling op- 
erations using genetic algorithms, 4th CIRP Int. 
Semin. Intell. Comput. Manuf. Eng. (2004) 

Y. Karpat, T. Özel: Hard turning optimization using 
neural network modeling and swarm intelligence, 
Trans. North Am. Manuf. Res. Inst. 33, 179-186 
(2005) 

T.-H. Hou, C.-H. Su, W.-L. Liu: Parameters op- 
timization of a nano-particle wet milling pro- 
cess using the Taguchi method, response surface 
method and genetic algorithm, Powder Technol. 
173(3), 153-162 (2007) 

J.L. Vigouroux, L. Deshayes, S. Foufou, L.A. Welsh: 
An approach for optimization of machining pa- 
rameters under uncertainties using intervals and 
evolutionary algorithms, CIRP J. Manuf. Syst. 5(36), 
395-399 (2007) 

A.R. Yildiz: A novel hybrid immune algorithm for 
global optimization in design and manufactur- 
ing, Robot. Comput. Integr. Manuf. 25(2), 261-270 
(2009) 

R. Roy, J. Mehnen: Dynamic multi-objective op- 
timisation for machining gradient materials, CIRP 
Ann. Manuf. Technol. 57(1), 429-432 (2008) 

F. Cus, J. Balic, U. Zuperl: Hybrid ANFIS-ants sys- 
tem based optimisation of turning parameters, 
J. Achiev. Mater. Manuf. Eng. 36(1), 79-86 (2009) 
R. Datta, K. Deb: A classical-cum-evolutionary 
multi-objective optimization for optimal machin- 
ing parameters, Nat. Biol. Inspir. Comput. (2009) 
pp. 607-612 

A.R. Yildiz: A novel particle swarm optimization 
approach for product design and manufacturing, 
Inter. J. Adv. Manuf. Technol. 40(5-6), 617-628 
(2009) 

A.N. Sait: Optimization of machining parameters 
of GFRP pipes using evolutionary techniques, Int. 
J. Precis. Eng. Manuf. 11(6), 891-900 (2010) 

A.M. Zain, H. Haron, S. Sharif: Application of GA to 
optimize cutting conditions for minimizing surface 
roughness in end milling machining process, Ex- 
pert Syst. Appl. 37(6), 4650-4659 (2010) 

K. Deb, R. Datta: Hybrid evolutionary multi- 
objective optimization of machining parameters, 
Eng. Optim. 44(6), 685-706 (2011) 

A.A. Krimpenis, P.I.K. Liakopoulos, K.C. Gian- 
nakoglou, G.-C. Vosniakos: Multi-objective design 
of optimal sculptured surface rough machining 
through pareto and nash techniques, 6th Conf. 


59.80 


59.81 


59.82 


59.83 


59.84 


59.85 


59.86 


59.87 


59.88 


59.89 


59.90 


59.91 


59.92 


59.93 


59.94 


Evol. Determ. Methods Des. Optim. Contr. Appl. Ind. 
Soc. Probl. (2005) 

I.N. Tansel, B. Ozcelik, W.Y. Bao, P. Chen, D. Rincon, 
SY. Yang, A. Yenilmez: Selection of optimal cutting 
conditions by using GONNS, Inter. J. Mach. Toolls 
Manuf. 46(1), 26-35 (2006) 

F. Cus, U. Zuperl, V. Gecevska: High speed end- 
milling optimisation using particle swarm intelli- 
gence, J. Achiev. Mater. Manuf. Eng. 22(2), 75-78 
(2007) 

U. Zuperl, F. Cus, V. Gecevska: Optimization of the 
characteristic parameters in milling using the PSO 
evolution technique, J. Mech. Eng. 6, 354-368 
(2007) 

F. Cus, U. Zuperl: Particle swarm intelligence 
based optimisation of high speed end-milling, 
Arch. Comput. Mater. Sci. Surf. Eng. 1(3), 148-154 
(2009) 

T. Wagner, H. Trautmann: Integration of preferences 
in hypervolume-based multiobjective evolutionary 
algorithms by means of desirability functions, IEEE 
Trans. Evol. Comput. 14(5), 688-701 (2010) 

E.L. Valvo, B. Martuscelli, M. Piacentini: NC end 
milling optimization within CAD/CAM system using 
particle swarm optimization, 4th CIRP Int. Semin. 
Intell. Comput. Manuf. Eng. (2004) pp. 357-362 

E. Borsetto, N. Gramegna: Multi-objective opti- 
mization of machining process using advantedge 
FEM tool, 7th CIRP Int. Conf. Intell. Comput. Manuf. 
Eng. (2010) 

R. Li, M.T.M. Emmerich, J. Eggermont, T. Bäck, 
M. Schütz, J. Dijkstra, J.H.C. Reiber: Mixed-integer 
evolution strategies for parameter optimization, 
Evol. Comput. 21(1), 29-64 (2013) 

R.V. Rao, P.J. Pawar: Parameter optimization of a 
multi-pass milling process using non-traditional 
optimization algorithms, Appl. Soft Comput. 10(2), 
445-456 (2010) 

Z.G. Wang, M. Rahman, Y.S. Wong, J. Sun: Opti- 
mization of multi-pass milling using parallel ge- 
netic algorithm and parallel genetic simulated an- 
nealing, Int. J. Mach. Tool. Manuf. 45(15), 1726-1734 
(2005) 

R. Roy, Y.T. Azene, D. Farrugia, C. Onisa, J. Mehnen: 
Evolutionary multi-objective design optimisation 
with real life uncertainty and constraints, CIRP 
Annu. Manuf. Tech. 58(1), 169-172 (2009) 

H.-G. Beyer, B. Sendhoff: Robust optimization - 
A comprehensive survey, Comput. Method. Appl. 
Mech. Eng. 196(33/34), 3190-3218 (2007) 

S. Mekid, T. Ogedengbe: A review of machine tool 
accuracy enhancement through error compensa- 
tion in serial and parallel kinematic machines, Int. 
J. Precis. Technol. 1(314), 251-286 (2010) 

R. Teti, K. Jemielniak, G. O'Donnell, D. Dornfeld: 
Advanced monitoring of machining operations, 
CIRP Ann. Manuf. Technol. 59, 717-739 (2010) 

H. Müller, D. Biermann, P. Kersting, T. Miche- 
litsch, C. Begau, C. Heuel, R. Joliet, J. Kolanski, 


183 


6S |3 Hed 


184 PartE 


6S | 3 Hed 


Evolutionary Computation 


59.95 


M. Kröller, C. Moritz, D. Niggemann, M. Stöber, 
T. Stönner, J. Varwig, D. Zhai: Intuitive visualiza- 
tion and interactive analysis of pareto sets ap- 
plied on production engineering. In: Success in 
Evolutionary Computation, Studies Computational 
Intelligence, Vol. 92, ed. by A. Yang, Y. Shan, 
L.T. Bui (Springer, Berlin, Heidelberg 2008) pp. 189- 
214 

H. Pohlheim: Understanding the course and state 
of evolutionary optimizations using visualization: 
Ten years of industry experience with evolutionary 
algorithms, Artif. Life 12(2), 217-227 (2006) 


59.96 


59.97 


59.98 


S. Obayashi, D. Sasaki: Evolutionary multi-criterion 
optimization. In: Visualization and Data Mining of 
Pareto Solutions Using Self-Organizing Map, ed. by 
C. Fonseca, P. Fleming, E. Zitzler, L. Thiele, K. Deb 
(Springer, Berlin Heidelberg 2003) pp. 796-809 

K. Deb: Innovization: Discovering Innovative So- 
lution Principles Through Optimization (Springer, 
Berlin, Heidelberg 2011) 

B. Sieben, T. Wagner, D. Biermann: Empirical mod- 
eling of hard turning of AISI 6150 steel using design 
and analysis of computer experiments, Prod. Eng. 
Res. Dev. (2-3), 115-125 (2010) 


1185 


60. Aerodynamic Design 
with Physics-Based Surrogates 


Emiliano Iuliano, Domenico Quagliarella


Details, references and guidelines are given about the adoption of
surrogate models and reduced-order models within the aerodynamic shape
optimization context. The aerodynamic design problem and its
approximated version are introduced and discussed and then, an overview
of various surrogate models and surrogate-based optimization methods is
given. Subsequently, the concept of model order reduction is recalled,
and the performance analysis of reduced-order models based on proper
orthogonal decomposition (POD) is discussed. Within this context, some
techniques to adaptively and globally improve the accuracy of POD-based
surrogates are illustrated. Finally, an aerodynamic shape design problem
of a transonic airfoil is used to practically analyze and compare the
performances of various surrogate-based optimization methods.


60.1 The Aerodynamic Design Problem
    60.1.1 Problem Approximation
60.2 Literature Review of Surrogate-Based Optimization
60.3 POD-Based Surrogates
    60.3.1 Model Order Reduction
    60.3.2 POD Theory and Solution
60.4 Application Example
    60.4.1 ... and Design Space Definition
    60.4.2 Design of Experiments
    60.4.3 Zonal POD
    60.4.4 Model Training, Validation, and Error Analysis
60.5 ... POD Model Quality: ...
    60.5.1 Rationale
    60.5.2 Improvement of the Modal Basis
    60.5.3 Improvement of the Modal Coefficients
60.6 Aerodynamic Shape Optimization by Surrogate Modeling
     and Evolutionary Computing
    60.6.1 Problem Definition
    60.6.2 Optimization Strategies and Setup
    60.6.3 Non-Adaptive Optimization Results
    60.6.4 Adaptive Optimization Results
    60.6.5 Optima Analysis
60.7 Conclusions
References


Modern air vehicle design has been increasingly driven
by environmental as well as operational constraints.
Environmental concerns, including emissions and noise,
are gaining increasing importance in the design and
operations of commercial aircraft. Taking into account
the current prognoses for the growth in air traffic, the
above-mentioned challenges become even more significant
[60.1-4]. In this context, the development and
assessment of new theoretical methodologies are
a cornerstone for reducing the experimental load,
exploring trade-offs, and proposing alternatives along the
design path. The fidelity of such methods is essential
to reproduce real-life phenomena with a significant degree
of accuracy and to take them into account from
the very beginning of the design process. Due to the
intrinsic complexity of aircraft design, the design space
is often huge and difficult to explore fully, so that fast
semi-empirical tools and rules [60.5-7], derived from
classical configuration data, have been traditionally
applied. However, they exhibit a severe lack of accuracy
when designing novel and unconventional concepts.
Therefore, highly accurate analysis methods have been
continuously introduced both in geometric representation
and physical modeling, but the main drawback
is that they are computationally expensive. For example,
the solution of the Navier-Stokes equations around
complex aerodynamic configurations requires a huge
amount of computational resources even on modern
state-of-the-art computing platforms. This turns out to be
an even bigger issue when hundreds or thousands of
analysis evaluations, as in parametric or optimization
studies, have to be performed. In order to speed up the
computation while keeping a high level of fidelity, the
scientific community is increasingly focusing on
surrogate methodologies like meta-models, multi-fidelity
models, or reduced-order models. These can provide
a compact, accurate, and computationally efficient
representation of aircraft design performance.

The present chapter will give details and references
about the adoption of surrogate models and, in
particular, reduced-order models within the aerodynamic
shape optimization context. In Sect. 60.1, the
aerodynamic design problem and its approximated version
will be introduced. Then, in Sect. 60.2, an overview of
various surrogate models and surrogate-based optimization
methods will be given. In Sects. 60.3 and 60.4, the concept of
model order reduction will be recalled and the
performance analysis of reduced-order models based on
proper orthogonal decomposition (POD) will be
discussed. In Sect. 60.5, some techniques to adaptively
and globally improve the accuracy of POD-based
surrogates will be presented. Finally, Sect. 60.6 will be
devoted to the analysis and comparison of the
performances of various surrogate-based optimization
methods with respect to the aerodynamic shape design of
a transonic airfoil.


60.1 The Aerodynamic Design Problem 


A broad class of aircraft design applications can be
numerically modeled as the minimization of a function
f which depends on two sets of variables: the
design variables w, which the designer can directly
control, and the state variables x, which describe the
evolution of the system representing the underlying
physics. The design problem can be formulated as the
non-linear programming problem


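\[
\begin{aligned}
\min_{w}\quad & f(w, x) \\
\text{subject to}\quad & r(w, x) = 0, \quad h(w, x) = 0, \\
& g(w, x) \le 0, \\
& w_L \le w \le w_U .
\end{aligned}
\tag{60.1}
\]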


f is the objective function, which the designer wants
to minimize to improve performance. In aircraft design,
typical objective functions are weight, noise, drag,
aerodynamic efficiency, or a combination thereof.
r(w, x) is the set of state equations, which links the
design variables to the state variables and usually
represents the governing laws of physics. In aerodynamic
design, the state equations are modeled through
computational fluid dynamics, e.g., the Navier-Stokes
equations, which relate scalar or vector field (state)
variables, like pressure or velocity, to the design variable
vector. In a shape optimization problem, the design
vector is related to the aircraft component shape
by means of a parameterization approach. The vectors
g(w, x) and h(w, x) collect the inequality and equality
constraint functions, respectively, which must
be satisfied for a design candidate to be considered
feasible. Typical constraint functions in aircraft design are
related to the generation of a minimum lift level to
balance the weight or a threshold pitching moment
coefficient to allow for trim. $w_L$ and $w_U$ are the lower
and upper bounds of the design variables and thus
specify the range of allowable values for the design
vector w.
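
As a minimal sketch of how a single design candidate is evaluated under formulation (60.1), the Python fragment below first enforces the state equation and then checks bounds and constraints. The state equation, objective, and constraint functions are hypothetical toy expressions chosen only for illustration; they are not taken from this chapter and stand in for a flow solver and aerodynamic coefficients.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Evaluation:
    objective: float
    feasible: bool


def solve_state(w: Sequence[float]) -> float:
    """Toy state equation r(w, x) = x - (w[0]**2 + w[1]) = 0, solved in closed form."""
    return w[0] ** 2 + w[1]


def evaluate(w, w_lower, w_upper, tol=1e-9) -> Evaluation:
    # Bound constraints w_L <= w <= w_U.
    in_bounds = all(lo <= wi <= up for wi, lo, up in zip(w, w_lower, w_upper))
    x = solve_state(w)                      # enforce r(w, x) = 0
    f = (x - 1.0) ** 2 + 0.1 * w[1] ** 2    # toy objective f(w, x)
    g = 0.5 - x                             # toy inequality constraint g(w, x) <= 0
    h = w[0] - w[1]                         # toy equality constraint h(w, x) = 0
    feasible = in_bounds and g <= tol and abs(h) <= tol
    return Evaluation(f, feasible)
```

In a real aerodynamic problem, `solve_state` would be by far the most expensive call, which is why the remainder of the chapter concentrates on approximating it.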


60.1.1 Problem Approximation 


The computational time required to solve this problem
is driven by two factors: the number of function
evaluations required to minimize the
objective function and the cost of a single evaluation.
Given a vector $w^*$, the latter is dominated by
the computational effort needed to solve the state
equations


\[ r(w^*, x) = 0 . \]


The adoption of a surrogate model reduces the cost
per objective function evaluation. A surrogate model
replaces the expensive objective f and
constraint functions g, h with less expensive,
lower-fidelity models $\hat{f}$ and $\hat{g}$, $\hat{h}$. Concerning reduced-order
modeling, it can be observed that the dimensionality
of the optimization problem is twofold: the state
vector dimension and the design vector dimension. As the
former is usually much larger than the latter, model
reduction can be applied to make explicit the dependency
of x on w and to solve for the state variables as functions
of the design ones. In other words, unlike response
surfaces and meta-models, an approximation $\hat{x}$ of the state
variables is available thanks to the model order
reduction. As a consequence, the reduced-order approximate
form of the optimization problem (60.1) can be cast as


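\[
\begin{aligned}
\min_{w}\quad & \hat{f}(w) \\
\text{subject to}\quad & \hat{h}(w) = 0, \quad \hat{g}(w) \le 0, \\
& w_L \le w \le w_U ,
\end{aligned}
\tag{60.2}
\]

where the dependence on the state variables has been
dropped and the state vector has an explicit approximate
relation with the design vector: $x = \hat{x}(w)$.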


60.2 Literature Review of Surrogate-Based Optimization 


This section surveys the most relevant
surrogate-based optimization concepts. These topics have
been widely discussed in the recent past, thanks to
their innovative character and broad application areas.
The introduction of surrogate models as fitness
approximations within an evolutionary optimization system
mitigates the demand for large computational resources
associated with such search algorithms, allowing
a proper balance to be found between the complete exploration
of huge design spaces and limited cost. To this aim,
reduced-order modeling through POD is a step forward,
as a modal decomposition of an ensemble of
functions, derived from numerical simulations, is
performed to extract the most relevant patterns in the data
set. Hence, compared to standard interpolating
meta-models, which are usually trained on an integral
function representing the objective, reduced-order models
should provide deeper insight into the phenomena
modeled.

Surrogate-based optimization (SBO) has been
introduced to reduce the number of expensive function
evaluations required in many engineering optimization
problems. In common aircraft design practice, surrogates
can be used as quick evaluators in several tasks:
parametric analyses over the design space, optimization
and control, and uncertainty quantification. A special
challenge is their use in global optimization, since
state-of-the-art global methods often require more function
evaluations than can be comfortably afforded.
A well-established approach consists in fitting some kind
of response function to basic data obtained by evaluating
the objectives and constraints at a few points. The
resulting surfaces, which are cheap to evaluate, can provide
fast answers in terms of trade-off analysis and
optimization, as well as an intuitive sketch of the problem
behavior by means of simple visualization. The basic
process consists of the following steps:

1. Sampling the design space: once the design variables
   have been chosen, a sampling plan is defined and some
   initial sample designs are analyzed with an accurate
   solver.
2. Surrogate model selection and construction:
   a surrogate model type is selected and used to build
   a meta-model of the underlying problem.
3. Model validation: the model is checked according to
   some statistical metrics and, if it is not accurate enough,
   a search is carried out using the model to identify new
   design points for analysis.
4. Model updating: the new results are added to
   those already available and a new meta-model is built
   (repeating the last three steps).
5. Optimization: the refined surrogate is used to provide
   objective/constraint functions.
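
The basic process described above can be sketched in a few lines of Python. This is only an illustrative loop under simplifying assumptions, not the chapter's implementation: `expensive_solver` is a hypothetical one-dimensional test function standing in for the accurate solver, the surrogate is a least-squares polynomial, and the search on the surrogate is a plain grid search rather than an evolutionary algorithm.

```python
import numpy as np


def expensive_solver(w):
    """Hypothetical stand-in for the accurate (high-fidelity) analysis."""
    return (w - 0.3) ** 2 + 0.05 * np.sin(15.0 * w)


def build_surrogate(ws, fs, degree=4):
    """Data-fit surrogate: least-squares polynomial through the samples."""
    return np.poly1d(np.polyfit(ws, fs, min(degree, len(ws) - 1)))


def surrogate_based_optimization(n_init=5, n_updates=6, tol=1e-3):
    candidates = np.linspace(0.0, 1.0, 201)   # discretized design space
    ws = np.linspace(0.0, 1.0, n_init)        # sampling plan
    fs = expensive_solver(ws)                 # accurate analyses of the samples
    for _ in range(n_updates):
        model = build_surrogate(ws, fs)       # surrogate construction
        # Search on the cheap surrogate to propose a new design point.
        w_new = candidates[np.argmin(model(candidates))]
        f_new = expensive_solver(w_new)
        error = abs(model(w_new) - f_new)     # validation at the new point
        ws = np.append(ws, w_new)             # model updating with the new result
        fs = np.append(fs, f_new)
        if error < tol:                       # surrogate judged accurate enough
            break
    best = int(np.argmin(fs))                 # best design confirmed by the solver
    return ws[best], fs[best], len(ws)
```

The returned count of accurate evaluations (at most `n_init + n_updates`) is the quantity that SBO tries to keep small; every other operation in the loop acts on the cheap surrogate.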

As SBO covers so many topics, the literature on
the subject is huge. Many ideas have been proposed in
the last 20 years, which can be classified by design space
dimension, surrogate method, search algorithm,
updating algorithm, and application area. Hence, an
exhaustive survey of all the possible ideas for each topic
and all their possible combinations would go beyond
the scope of the present chapter. Generally speaking,
surrogate models can be roughly divided into three
classes: data fit surrogates, multi-fidelity models, and
reduced-order models. Data fit models rely on the
approximation of data (response values, gradients, and
Hessians) generated from the high-fidelity model. In
order to give a global behavior to surrogate methods,
the whole design space must be sampled in advance
by using design of experiments. Global approximations,
often referred to as response surface methods,
can be obtained with polynomial regression [60.8],
Gaussian processes, Kriging interpolation [60.9],
radial basis function networks [60.10], multivariate adaptive
regression splines [60.11], and artificial neural
networks [60.12].




A second class of surrogates is the hierarchical one 
(also called multi-fidelity or variable fidelity). Unlike 
the data fit surrogates, they do not need to be trained 
on a sampling dataset, but they rely on a lower fi- 
delity approximation which, however, is still inspired 
by the physical behavior of the system. Multi-fidelity
models are classified according to the way they achieve
the fidelity reduction: examples in aerodynamics are
coarser mesh discretization, partially converged solutions
[60.13], and model fidelity reduction [60.14-16]
(e.g., using the Euler model instead of the Navier-Stokes
equations by neglecting the effects of fluid
viscosity and heat transfer). The name multi-fidelity
usually refers to the capability of mixing and exploiting 
both high-fidelity and lower-fidelity models in an effi- 
cient way so as to keep the fidelity of the former only 
when it is needed and to take advantage of the higher 
speed of the latter otherwise. 

A third class is represented by reduced-order models.
A reduced-order model (ROM) is mathematically
derived from a high-fidelity model using a projection
technique. It consists in computing a set of basis
functions (e.g., eigenmodes, left singular vectors) upon
which the available dataset (ensemble) is projected to
compute the unknown model parameters. The model
reduction is obtained by capturing the principal dynamics
of the system and neglecting the less significant ones from
a physical point of view. Hence, similarly to data fit
surrogates, reduced-order models require the a-priori
solution of the expensive high-fidelity model. The
advantage of reduced-order models with respect to data
fits is that the most significant features of the flow field
can be derived by approximation, thus offering the
potential to keep more physics within the surrogate.
Proper orthogonal decomposition (POD), also known as
principal component analysis (PCA), is an elegant and
powerful data-reduction method for non-linear physical
systems. Its application as a surrogate to the aerodynamic
optimization of aircraft components is the core of the present
chapter.

Hereinafter, a more in-depth look is given at the various
methods of constructing a surrogate model and,
in particular, at optimization assisted with the surrogate.
Jones et al. [60.17] were among the first to propose
a response surface methodology based on modeling 
the objective and constraint functions with stochastic 
processes (Kriging). The so-called design and analy- 
sis of computer experiments (DACE) stochastic process 
model was built as a sum of regression terms and 
normally distributed error terms. The main concep- 
tual assumption was that the lack of fit due only to
the regression terms can be considered as entirely due
to modeling error, not measurement error or noise, 
because the training data are derived from a determin- 
istic simulation. Hence, by assuming that the errors 
at different points in the design space are not inde- 
pendent and the correlation between them is related 
to the distance between the computed points, the au- 
thors came up with an interpolating surrogate model 
that is able to provide not only the prediction of ob- 
jectives/constraints at a desired sample point, but also 
an estimation of the approximation error. After the 
construction of such a surrogate model, the latter pow- 
erful property is exploited to perform an efficient global 
optimization (EGO), which can be considered as the 
progenitor of a long and still-evolving chain of
SBO methods. Indeed, they found a proper balance be- 
tween the need to exploit the approximation surface (by 
sampling where it is minimized) with the need to im- 
prove the approximation (by sampling where prediction 
error may be high). This was done by introducing the 
expected improvement (EI) concept, already proposed 
by Schonlau et al. [60.18], which is an auxiliary func- 
tion to be maximized instead of the original objective. 
The EI function is designed in order to provide a proper 
balance between exploration and exploitation. 
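
As a minimal sketch of the expected improvement idea, the snippet below evaluates the standard closed-form EI for a minimization problem, assuming the surrogate prediction at a point is Gaussian with mean `mu` and standard deviation `sigma`. The function name and argument names are illustrative and not taken from [60.17] or [60.18].

```python
import math


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI (minimization) for a Gaussian prediction N(mu, sigma^2).

    f_best is the best true objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a low predicted value (exploitation),
    # second term rewards a large predicted error (exploration).
    return (f_best - mu) * cdf + sigma * pdf
```

The two terms make the balance explicit: the first grows where the surrogate predicts a value below the current best, the second grows where the estimated prediction error is large.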

In a further work, Jones [60.19] proposed a tax- 
onomy of global SBO methods. Seven methods were 
identified and classified according to whether they were 
interpolating (cubic splines, thin-plate splines, mul- 
tiquadrics, Kriging) or not (quadratic polynomials), 
whether they provided statistical information (Kriging) 
or not (splines), and whether the method for select- 
ing search points (updating the model by adding new 
sample points) was two stage (probability/expected 
improvement) or one stage (goal-seeking, credibility 
function). 

Gutmann [60.10] reported excellent numerical re- 
sults for a spline-based implementation of Method 7 
and proved the convergence of the method. Compared 
to previous methods, Method 7 required a high number 
of true function evaluations to find the global opti- 
mum, but, as Jones wrote, this is the price we pay for 
the additional robustness. An overview of SBO tech- 
niques was also presented by Queipo et al. [60.20] and 
Simpson et al. [60.21]. They covered some of the most 
popular methods in design space sampling, surrogate 
model construction, model selection and validation, 
sensitivity analysis, and surrogate-based optimization. 
Forrester and Keane [60.22] recently presented a review
of some advances in surrogate-based optimization.
An important lesson learned is that only calling the
true function can confirm the results coming from the
surrogate model. Indeed, the path towards the global 
optimum is made of iterative steps where, even exploit- 
ing some surrogate model, only the best results coming 
from the true function evaluations are taken as optimal 
or sub-optimal design. The true function evaluation also 
has to be invoked to improve the surrogate model. With 
the term in-fill criteria we usually mean some princi- 
ples that allow us to intelligently place new points (in- 
fill points) at which the true function should be called. 
The selection of in-fill points, also referred to as adaptive
sampling or model updating, represents the core of
a surrogate-based optimization method and helps to im- 
prove the surrogate prediction in promising areas of the 
objective space. 
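
The in-fill loop described above can be sketched on a one-dimensional toy problem. This is only an illustration under stated assumptions: `true_function` is a hypothetical stand-in for the expensive simulation, the surrogate is a plain Gaussian RBF interpolant, and the in-fill criterion is pure exploitation (minimize the surrogate, then call the true function there).

```python
import numpy as np


def true_function(x):
    # Hypothetical stand-in for an expensive simulation.
    return (x - 0.3) ** 2 + 0.1 * np.sin(15.0 * x)


def fit_rbf(xs, ys, width=0.15):
    # Gaussian RBF interpolant passing through all samples.
    phi = np.exp(-((xs[:, None] - xs[None, :]) / width) ** 2)
    weights = np.linalg.solve(phi, ys)

    def predict(x):
        x = np.atleast_1d(x)
        return np.exp(-((x[:, None] - xs[None, :]) / width) ** 2) @ weights

    return predict


def sbo_exploit(n_init=4, n_infill=6):
    xs = np.linspace(0.0, 1.0, n_init)       # initial sampling plan
    ys = true_function(xs)
    candidates = np.linspace(0.0, 1.0, 101)  # where the surrogate is searched
    for _ in range(n_infill):
        model = fit_rbf(xs, ys)
        x_new = candidates[np.argmin(model(candidates))]
        if np.any(np.isclose(xs, x_new)):
            break                            # surrogate minimum already sampled
        xs = np.append(xs, x_new)            # in-fill point
        ys = np.append(ys, true_function(x_new))
    best = np.argmin(ys)                     # only true evaluations are trusted
    return xs[best], ys[best], len(xs)
```

Note that the returned optimum is always taken from the true function evaluations, never from the surrogate prediction, in line with the lesson stated above.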

The right choice of the number of points which the 
initial sampling plan would comprise and the ratio be- 
tween initial/in-fill points has been the focus of several 
recent studies. However, it must be emphasized that no 
universal rules exist, as each choice should be care- 
fully evaluated according to the design problem (e.g., 
the number of variables, computational budget, type of 
surrogate). Forrester and Keane [60.22] assumed that 
there is a maximum budget of function evaluations, so 
as to define the number of points as a fraction of this 
budget. They identified three main cases according to 
the aim of the surrogate construction: pure visualization 
and design space comprehension, model exploitation, 
and balanced exploration/exploitation. In the first case, 
the sampling plan should contain all of the budgeted 
points, as no further refinement of the model is fore- 
seen. In the exploitation case, the surrogate can be used 
as the basis for an in-fill criterion, which means some 
computational budget must be saved for adding points 
to improve the model. They also proposed to reserve 
less than one half of the points for the exploitation 
phase, as a small amount of surrogate enhancement is 
possible during the in-fill process. In the third case, that 
is the two-stage balanced exploitation/exploration
in-fill criterion, as also shown by Sóbester et al. [60.23],
they suggested employing one third of the points in the 
initial sample while saving the remaining for the in- 
fill stage. Indeed, such balanced methods rely less on 
the initial prediction, and so fewer points are required. 
Concerning the choice of the surrogate, the authors observed
that it should depend on the problem size, i.e.,
the dimensionality of the design space, the expected 
complexity, the cost of the true analyses, and the in- 
fill strategy to be adopted. 

However, for a given problem, there is no general
rule. The proper choice could emerge after applying various
model selection and validation criteria. The accuracy
of a number of surrogates could be compared by as- 
sessing their ability to predict a validation data set. 
Therefore, part of the true computed data should be 
used for validation purposes only and not for model 
training. This approach can be infeasible when the true 
evaluations are computationally expensive. To over- 
come this issue, Goel et al. [60.24] proposed a weighted 
average of an ensemble of surrogates. For example, 
a better model can be achieved by combining Kriging, 
which might accurately predict the non-linear aspects 
of a function, and polynomials to better capture the 
regression trends. Forrester also underlined that some 
in-fill criteria and certain surrogate models are some- 
what intimately connected. For a surrogate model to 
be considered suitable for a given in-fill criterion, the
mathematical machinery of the surrogate should ex- 
hibit the capability to adapt to unexpected, local non- 
linear behavior of the true function to be mimicked. 
From this point of view, polynomials can be imme- 
diately excluded since a very high order would be 
required to match this capability, implying a high num- 
ber of sampling points. In principle, the convergence 
to a local optimum might be achieved by simply min- 
imizing the surrogate, evaluating the true function at 
the minimum point and updating the model database 
with the new point. Conversely, a global search would 
require a surrogate model able to provide an estimate 
of the error it commits when predicting. Thus, the 
authors suggested the use of Gaussian process-based 
methods like Kriging, although citing the work of Gut- 
mann [60.10] as an example of a one-stage goal seeking 
approach employing various radial basis functions. Finally,
some suitable convergence criteria
to stop the surrogate in-fill process were proposed. In
an exploitation case, i.e., when minimizing the sur- 
rogate prediction, one can rather obviously choose to 
stop when no further significant improvement is de- 
tected. On the other hand, when an exploration method 
is employed, one is interested in obtaining a satisfying 
prediction everywhere, so that one can decide to stop 
the in-filling when some generalization error metrics, 
e.g., cross-validation, fall below a certain threshold. 
When using the probability or expectation of improve- 
ment, a natural choice is to consider the algorithm 
converged when the probability is very low or the ex- 
pected improvement drops below a percentage of the 
range of objective function values observed. However, 
the authors also observed that a discussion on conver- 
gence criteria may be interesting and fruitful, but in 
many real engineering problems we actually stop when
we run out of available time or resources, as dictated
by design cycle scheduling or costs. This is what typically
happens in aerodynamic design, where the high
dimensionality of the design space and expensive computer
simulations often do not allow us to reach the
global optimum of the design problem but suggest that
we consider even a premature, sub-optimal solution as
a converged point.


60.3 POD-Based Surrogates


In this section a review of the mathematical core of
POD is presented. POD is a mathematical procedure
that allows us to perform a modal decomposition of
a large set of multi-dimensional data so as to derive a
dimensionality reduction and describe the original system
with much fewer unknowns. The mathematical development
of POD for fluid flow applications, in particular,
is described in some detail in [60.25]. Here, the main
aspects related to the construction of a reduced-order
model through singular value decomposition are presented
and mainly the use of this technique for steady-state
problems is addressed.


60.3.1 Model Order Reduction


Physics-based approximation concepts require a deep
understanding of the governing equations and the numerical
methods employed for their solution. The substantial
difference between a reduced-order model and
a data fit model consists in retaining an explicit dependency
between state variables, related to the governing
equations and design parameters. In other words,
reduced-order models operate on the dimensionality of
the discretization of the state equations rather than
on the design space. Thus, such models are partially
independent of a notable increase in the number of
design variables. A reduced-order model, in fact, mimics
the basic structure of the problem and not just
a functional relationship between input and output parameters.
Hence, the main advantage of using reduced-order
models lies in their being mostly insensitive to the
curse of dimensionality.

To illustrate the model order reduction concept,
consider the discrete mathematical model (e.g., the
Navier-Stokes equations) of a physical system written
in the form

$$R(w, x(w)) = 0, \tag{60.3}$$

where $w \in \mathbb{R}^t$ is the vector of design variables and
$x \in \mathbb{R}^q$ the discretized vector of state (or field) variables
(velocity, energy, density). Note that x is an implicit vector
function of the design variable vector. Unlike classical
data fit methods (e.g., Kriging, RBF) which work on 
local or integral values of the state variables, reduced- 
order methods, instead, provide an approximation of the 
state vector in the form 


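$$x \approx c_1 \phi_1 + \dots + c_M \phi_M = \Phi c,$$

where

$$\Phi = \{\phi_1, \dots, \phi_M\} \in \mathbb{R}^{q \times M}$$

is a matrix of known basis vectors and

$$c = \{c_1, \dots, c_M\} \in \mathbb{R}^{M}$$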

is a vector of unknown coefficients. The underlying 
approximation is that the state vector lies in the sub- 
space spanned by a set of basis vectors. Obviously, 
this is not true for each state vector, but a proper 
choice of an orthonormal basis can lead to the mini- 
mization of the approximation error in a least squares 
sense. This is how a proper orthogonal decomposi- 
tion is derived. Following this approach, the problem 
of representing a state vector with q unknowns can be 
recast into a problem with M unknowns and, as usually
$q \gg M$, it is possible to obtain an approximation
of x very efficiently. The estimation of the vector c 
can be obtained with different techniques, classified as 
intrusive and non-intrusive. The former introduce the approximation
in the governing equations and find the
coefficients by minimization of the residual norm; the
latter employ data fit techniques trained on a set of
known coefficients.
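
To make the least-squares statement concrete, the sketch below projects a state vector onto an orthonormal basis. The basis and the state vector are random placeholders (assumptions for illustration only); the point is that, for orthonormal columns, the coefficients minimizing the approximation error are simply the inner products of the state vector with the basis vectors.

```python
import numpy as np


def projection_coefficients(Phi, x):
    """Coefficients c minimizing ||x - Phi c||_2 when Phi has orthonormal columns."""
    return Phi.T @ x


rng = np.random.default_rng(0)
q, M = 200, 5                                       # q unknowns, M basis vectors
Phi, _ = np.linalg.qr(rng.standard_normal((q, M)))  # placeholder orthonormal basis
x = rng.standard_normal(q)                          # placeholder state vector

c = projection_coefficients(Phi, x)                 # M unknowns instead of q
x_approx = Phi @ c                                  # reduced-order approximation
residual = x - x_approx                             # orthogonal to the basis
```

The residual is orthogonal to every basis vector, which is the defining property of the least-squares approximation in the spanned subspace.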

The basis vectors can be computed starting from 
state solutions of the discrete governing equations 
which correspond to M different values of the parameters
w. As a consequence, the matrix $\Phi$ contains the
basis vectors of the subspace

$$\Phi = \mathrm{span}\{x(w_1), x(w_2), \dots, x(w_M)\} \in \mathbb{R}^{q \times M}. \tag{60.4}$$




For instance, the state solutions $x_i = x(w_i)$ are obtained
by solving the Reynolds-averaged Navier-Stokes
(RANS) equations on M different configurations, generated
by applying the chosen parameterization method
to M design vectors $w_i$. Choosing the
M design sites at which to compute the solutions is not
a trivial issue; generally speaking, standard design of
experiments techniques are used to sample the design
space with good coverage properties, but, as will be
discussed in the next sections, this approach may lead to
erroneous results when facing highly multi-modal, highly
non-linear problems. Indeed, the quality of the approximation
strongly depends on the location of training data
in the design space.


60.3.2 POD Theory and Solution 


The construction and training of POD-based surrogate 
models for aerodynamic applications are described in 
detail in [60.26, 27]. The singular value decomposition 
(SVD) solution of the POD basis vectors and coeffi- 
cients is used for steady-state problems. This approach 
is normally preferred to the eigenvalue/eigenvector so- 
lution, as it is faster and easier to implement. POD 
modeling is specifically focused on compressible aero- 
dynamic problems, hence the space domain will be 
represented by the discretized volume occupied by the 
flowing air, and the snapshot vectors will be defined 
from computed flow fields. The column vectors of the 
snapshot matrix (here also referred to as the ensemble 
matrix) contain the volume grid and flow variables as 
computed with a computational fluid dynamics (CFD) 
solver. 

The SVD solution allows us to obtain an optimal 
basis in the sense of the maximization of the averaged 
projection of the ensemble onto it. Hence, each snapshot
vector can be retrieved as a linear combination of
the POD basis. If a fluid dynamics problem is approximated
with a suitable number of snapshots from which
a rich set of basis vectors is available, the singular values
become small rapidly and a small number of basis
vectors are adequate to reconstruct and approximate the
snapshots as they preserve the most significant ensemble
energy contribution. In this way, POD provides an
efficient means of capturing the dominant features of
a multi-degree of freedom system and representing it to
the desired precision by using the relevant set of modes.
The reduced-order model is derived by projecting the 
CFD model onto a reduced space spanned by only some 
of the proper orthogonal modes or POD eigenfunctions. 
This process realizes a kind of lossy data compression 
of the original ensemble. 
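
The SVD-based construction and the energy-based truncation can be sketched as follows. This is an illustrative sketch, not the implementation of [60.26, 27]: the snapshot matrix is assumed to store one snapshot per column, no mean subtraction is applied, and the names `pod_basis`, `reconstruct`, and `energy` are hypothetical.

```python
import numpy as np


def pod_basis(snapshots, energy=0.99):
    """Return the leading POD modes of a (q, M) snapshot matrix.

    Modes are the left singular vectors; enough modes are kept so that the
    retained fraction of squared singular values reaches `energy`.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    n_modes = int(np.searchsorted(cumulative, energy)) + 1
    n_modes = min(n_modes, len(s))
    return U[:, :n_modes], s


def reconstruct(basis, snapshot):
    # Project onto the retained modes and map back (lossy compression).
    return basis @ (basis.T @ snapshot)
```

With a rapidly decaying singular value spectrum, `n_modes` stays far below the number of snapshots, which is the data compression effect described above.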

The resulting POD-based reduced-order model can 
be used in an optimization process to predict state so- 
lutions that are not included in the original ensemble. 
This useful feature requires the transformation of the 
projection coefficients from the discrete sample space 
for which they have been computed to a continuous 
space. In other words, by itself the POD model does 
not have a predictive feature globally, i.e., over the 
whole design space. Among the possible options to 
accomplish this task, here a functional relation is estab- 
lished between the POD coefficients, which represent 
the projection of a generic CFD flow field onto the 
set of POD basis vectors, and the design variables. It 
is well known that regression techniques are partic- 
ularly suitable to fit experimental data, as they filter 
the random noise out from the data. This behavior is
less desirable when working with deterministic computer
simulations. In this case, one asks the
data fit model to exactly reproduce the sample data
used for training and to consistently capture the local
data trends. A radial basis function (RBF) network
meets these criteria and, therefore, is used here to
interpolate the POD coefficients over the whole design
space.
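
A minimal sketch of this interpolation step is given below, assuming a Gaussian kernel with a single user-chosen width; the kernel type, the width, and the function names are assumptions for illustration and may differ from the RBF network actually used in the chapter. The interpolant reproduces the training coefficients exactly at the design sites.

```python
import numpy as np


def gaussian_kernel(A, B, width):
    # Pairwise Gaussian RBF values between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / width**2)


def fit_rbf_coefficients(W, C, width=1.0):
    """Interpolate POD coefficients over the design space.

    W: (M, t) design sites; C: (M, n_modes) POD coefficients at those sites.
    Returns a function mapping new design vectors to predicted coefficients.
    """
    weights = np.linalg.solve(gaussian_kernel(W, W, width), C)

    def predict(w_new):
        return gaussian_kernel(np.atleast_2d(w_new), W, width) @ weights

    return predict
```

A predicted flow field at a new design vector would then be obtained by multiplying the POD basis by the predicted coefficients.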


60.4 Application Example of POD-Based Surrogates 


This section proposes an application of POD-based re- 
duced-order models to the transonic flow around an 
airfoil. Indeed, as the POD model should introduce 
more physics within a surrogate approximation, one 
is interested in comparing such a novel approach to 
standard methodologies in order to establish, though in 
a preliminary way, its advantages and drawbacks. We
consider a typical case in aircraft aerodynamics which,
while far from being of industrial interest in itself, retains
the main physical features of such problems. The aerodynamic case
is represented by the steady, viscous air flow around 
a scaled RAE 2822 airfoil. This geometry was selected 
as it is a standard object in CFD numerical modeling 
and validation [60.28]. The POD snapshots are ob- 
tained by perturbing the RAE 2822 airfoil by means 
of the parameterization described later on. A mixed
POD/CFD approach (zonal POD) is proposed to in-
crease the accuracy level of the surrogate model in 
transonic conditions. 


60.4.1 Parameterization and Design Space Definition


In the present context, surrogate modeling is aimed at 
providing a fast and accurate tool to speed up the pro- 
cess in aerodynamic shape design. As a consequence, 
one of the most important issues is to show its suitabil- 
ity and applicability to the shape optimization problem. 
Indeed, the definition of the design space through shape 
modification parameters typically involves a complex, 
often highly non-linear relation between the flow field 
and the design variables. Moreover, modifying an air- 
craft component (e.g., a wing airfoil) requires several 
parameters, thus enlarging the dimensions of the design 
space. It follows, then, that the complexity of
the problem increases and approaches a real-world ap- 
plication level. The class-shape transformation (CST) 
method [60.29] provides an analytical form to represent 
various geometries of aeronautical interest and it shows 
the interesting properties of continuity, differentiability, 
and reproducibility of a huge number of test shapes. It 
allows us to specify the airfoil contour as a product of 
a class function, which in the proposed case defines the 
rounded leading edge/pointed trailing edge airfoil class, 
and a shape function obtained as a linear combination 
of n-th-order Bernstein polynomials. The design vector
is

$$w = (A_0^u, A_1^u, \dots, A_n^u, A_0^l, A_1^l, \dots, A_n^l),$$

where the first and last parameters $A_0^{u,l}$, $A_n^{u,l}$ are related,
respectively, to the upper and lower leading edge radius
$R_{le}^{u,l}$ ($A_0^{u,l} = \sqrt{2 R_{le}^{u,l}}$)
and to the trailing edge closure angle $\beta^{u,l}$
($A_n^{u,l} = \tan \beta^{u,l}$), as is shown in detail in [60.29].

In the present context, seventh-order Bernstein 
polynomials are considered, hence each airfoil side 
is described by eight design variables. The design 
space DW is then a subset of $\mathbb{R}^{16}$. A scaled 14%
thickness ratio RAE 2822 airfoil is selected as base- 
line airfoil. The airfoil geometry is shown in Fig. 60.1, 
where the x-coordinate is the abscissa along the airfoil 
chord and y is given by the aforementioned approach. 


The values of the corresponding design parameters, 
which define the RAE 2822 profile according to the 
parameterization, and their range of variation, which 
defines the design space, are reported in Table 60.1. 
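
As an illustration of the parameterization, the sketch below evaluates one airfoil side as the product of a class function and a Bernstein-polynomial shape function. The class-function exponents 0.5 and 1.0 (round leading edge, pointed trailing edge) follow the usual CST convention and are an assumption here, as is the ordering of the baseline coefficients, which are read from Table 60.1 as $A_0, \dots, A_7$ for each side.

```python
import numpy as np
from math import comb


def cst_surface(x, coeffs, n1=0.5, n2=1.0):
    """y(x) = C(x) * S(x) on the unit chord, for one airfoil side."""
    x = np.asarray(x, dtype=float)
    n = len(coeffs) - 1                          # Bernstein polynomial order
    class_fn = x**n1 * (1.0 - x) ** n2           # rounded nose, sharp trailing edge
    shape_fn = sum(
        a * comb(n, i) * x**i * (1.0 - x) ** (n - i) for i, a in enumerate(coeffs)
    )
    return class_fn * shape_fn


# Baseline values as listed in Table 60.1 (ordering assumed).
UPPER = [0.1293, 0.1282, 0.1771, 0.1219, 0.2393, 0.1662, 0.1976, 0.2110]
LOWER = [-0.1280, -0.1483, -0.1080, -0.2580, -0.0918, -0.1079, -0.0561, 0.0638]

x = np.linspace(0.0, 1.0, 201)
y_upper = cst_surface(x, UPPER)
y_lower = cst_surface(x, LOWER)
```

Perturbing any of the sixteen coefficients within its range and re-evaluating `cst_surface` yields a new candidate geometry for the snapshot ensemble.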


60.4.2 Design of Experiments 


The location of sample points is an important issue with 
respect to the cost and accuracy of any surrogate model. 
Design of experiments (DOE) theory [60.30] was
developed to provide experimentalists with a tool to 
optimally choose the independent variable values for 
a limited number of experiments. The aim is generally 
to use the results of the experiments to study and inves- 
tigate the response and sensitivity of some dependent 
quantity to the independent variables. Classical DOE 
methods were originally designed to alleviate the ef- 
fects of noise, so they have been generally employed 
in conjunction with regression techniques. However, 
computer experiments are not subject to random er- 
rors, hence it is worthwhile using a different strategy to 
obtain as much information as possible about the
input-output dependence. A variety of methods have been
developed to fill the design space in an optimal sense. 
One of the most widely adopted is the Latin hyper- 
cube sampling (LHS) technique. It was first proposed 
by McKay et al. [60.31] as an alternative to Monte Carlo 
techniques for the design of computer experiments. The 
basic principle of LHS is to bound the randomness of 
the sample selection in a given region. In fact, given 
that t is the number of design variables, each design
variable range is divided into m intervals or bins of
equal probability. This generates a total of $m \times t$ bins
in the whole space. Within each bin only one sample 
is allocated randomly. This ensures that a one-dimen- 
sional projection onto the parameter space will produce 
one sample in each bin, thus eliminating the correla- 
tions between the design variables. LHS is useful for 
the initialization of POD-based surrogate models, but,
as will be detailed later on, it exhibits some major
limits that prevent it from being used as a standard
sampling technique for optimization purposes. Indeed,
LHS is optimal in the sense of design space coverage,
but it does not allow for refining the sampling
distribution according to enrichment or improvement
criteria, e.g., design space exploration or objective function
minimization.

Fig. 60.1 Baseline geometry, RAE 2822 airfoil (axes: x, y)

Table 60.1 Design parameters, values, and ranges

Parameter   Baseline value   Range
$A_0^u$     0.1293           [0.1293, 0.2293]
$A_1^u$     0.1282           [0.1282, 0.2282]
$A_2^u$     0.1771           [0.1771, 0.2771]
$A_3^u$     0.1219           [0.1219, 0.2219]
$A_4^u$     0.2393           [0.2393, 0.3393]
$A_5^u$     0.1662           [0.1662, 0.2662]
$A_6^u$     0.1976           [0.1976, 0.2976]
$A_7^u$     0.2110           [0.2110, 0.3110]
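
The Latin hypercube construction described in this section can be sketched in a few lines. This is a basic, non-optimized variant on the unit hypercube (uniform bins, independent random permutations per variable); the function name and the `rng` argument are illustrative.

```python
import numpy as np


def latin_hypercube(m, t, rng=None):
    """Return an (m, t) sample: one point in each of the m bins of every variable."""
    rng = np.random.default_rng(rng)
    # One random location inside each of the m equal-probability bins.
    samples = (np.arange(m)[:, None] + rng.random((m, t))) / m
    # Shuffle bin order independently for each design variable.
    for j in range(t):
        samples[:, j] = rng.permutation(samples[:, j])
    return samples
```

A sample on the actual design space follows by mapping each column to its range, e.g. `lower + latin_hypercube(m, t) * (upper - lower)` with `lower` and `upper` taken from Table 60.1.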


60.4.3 Zonal POD 


The POD surrogate model is mainly designed as a ROM
within a shape optimization process, where the geome- 
try and, hence, the volume mesh vary with the design 
site. Moreover, the application is focused on transonic 
aerodynamics with potential flow separations and shock 
waves. Therefore, care must be taken with the defi- 
nition of the snapshot domain and how to extract the 
integral quantities of interest (e.g., aerodynamic force 
coefficients) from the snapshot structure. Indeed, as the 
snapshots are expressed through a linear combination of 
POD modes, shock waves, flow separations, and other 
non-linearities present in the training ensemble would 
be captured and replicated in the POD modes, so that any
prediction of a new snapshot would be likely to bring 
the footprint of those flow features with it. This is desir- 
able behavior on average for a physics-based approach, 
but when approaching the optima, which should be fea- 
tured with a shockless and fully attached flow profile, 
a POD approximation of this type would hide the po- 
tential improvement behind the trace of the original 
snapshots. This issue is of paramount importance and 
can be tackled by introducing and combining two con- 
cepts: zonal POD and adaptive sampling. The first will 
allow us to reduce the inherent variability of the snapshots
by means of a domain partitioning, so that the POD basis
does not have to capture all the physics within the field.
The second technique will allow us to enrich the POD 
approximation by sampling at new points, which are 
optimal in the sense of exploration/model improvement 
balance. In this section, the discussion will be focused
on the zonal POD approach.

Table 60.1 (continued)

Parameter   Baseline value   Range
$A_0^l$     -0.1280          [-0.2280, -0.1280]
$A_1^l$     -0.1483          [-0.2483, -0.1483]
$A_2^l$     -0.1080          [-0.2080, -0.1080]
$A_3^l$     -0.2580          [-0.3580, -0.2580]
$A_4^l$     -0.0918          [-0.1918, -0.0918]
$A_5^l$     -0.1079          [-0.2079, -0.1079]
$A_6^l$     -0.0561          [-0.1561, -0.0561]
$A_7^l$     0.0638           [-0.0362, 0.0638]

The basic idea proposed
in [60.26] is to use a mixed full-order/reduced-order 
model (FOM/ROM) by splitting the solution domain 
into two sub-domains: the FOM (i.e., the CFD RANS
model) is used only in the vicinity of the surface to
accurately resolve the near-wall boundary layer, non-linearities
(e.g., shock waves), and flow separations where
they occur; the ROM (i.e., the POD surrogate model) is
exploited to reconstruct the flow field far from the solid 
wall, where a smoother and weakly varying solution is 
expected. 

Figure 60.2 shows a sketch of the domain decom- 
position. The POD-based surrogate model is built on 
the spatial domain defined in the light gray region. 
Once the POD model has been trained, the surrogate 
response on the FOM/ROM boundary interface is ex- 
tracted and used as boundary conditions to iterate the 
full-order CFD solver in the inner domain (dark gray). 
Details about the specific boundary condition formula- 
tion across the two domains can be found in [60.26]. 

A useful advantage of the zonal POD is that any
aerodynamic coefficient or surface distribution of interest
(e.g., pressure or skin friction distributions) can be
directly extracted from the CFD solution in the inner
domain. On the other hand, when the full POD model
is used (i.e., trained on the whole domain), the surrogate
model provides a prediction of the mesh and
state variables, so that a properly designed condensation
procedure has to be applied to retrieve the integral
coefficients like lift ($C_l$), drag ($C_d$), and pitching moment
($C_m$) coefficients.

Fig. 60.2 Zonal approach, FOM/ROM domains and volume mesh (labels: POD/ROM domain, boundary interfaces, FOM domain)


60.4.4 Model Training, Validation, and Error Analysis


Training and validation are key phases that POD surro- 
gates must undergo for assessment of their performance 
and potential. Validation means measuring the good- 
ness of a surrogate model with respect to a so-called 
truth response (e.g., the CFD solution) and, therefore, 
drawing information to eventually optimize it. The goal 
is to evaluate the potential of the model to globally ap- 
proximate the design space. Once a surrogate model 
has been trained, classical validation is carried out by 
sampling the design space once more, evaluating the
full and reduced-order models on the new sampling set, 
and computing a set of statistics from the data obtained. 
This approach requires computing new CFD solutions 
and, for this reason, it is computationally expensive. 
In order to reduce the number of full order computa- 
tions, the validation points could be represented by the 
same set used for training, provided that a cross-vali- 
dation technique is used (e.g., leave-one-out). Indeed, 
cross-validation implies the partitioning of a sample of 
data into complementary subsets: one subset is used for 
training, the other one for validation or testing. The 
variability due to the choice of the partitions is usu- 
ally reduced by performing multiple rounds of cross- 
validation and averaging over the rounds. Here a classi- 
cal validation is performed, while cross-validation will 
be used later on in auto-adaptive sampling, when the 
estimation of the quality of the POD model basis and 
coefficients will be required. 
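
A leave-one-out check of the POD basis alone can be sketched as follows: each snapshot is removed in turn, the basis is rebuilt from the remaining ones, and the relative error of projecting the held-out snapshot is recorded. This is an illustrative sketch only (it does not assess the interpolated coefficients), and the function name is hypothetical.

```python
import numpy as np


def loo_projection_error(snapshots, n_modes):
    """Relative projection error of each held-out column of a (q, M) snapshot matrix."""
    n_snapshots = snapshots.shape[1]
    errors = np.empty(n_snapshots)
    for i in range(n_snapshots):
        training = np.delete(snapshots, i, axis=1)        # all snapshots but the i-th
        U, _, _ = np.linalg.svd(training, full_matrices=False)
        basis = U[:, :n_modes]
        held_out = snapshots[:, i]
        residual = held_out - basis @ (basis.T @ held_out)
        errors[i] = np.linalg.norm(residual) / np.linalg.norm(held_out)
    return errors
```

No new full-order solution is needed: the same training ensemble supplies both the basis and the validation data, at the price of M additional decompositions.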


Model Training and Setup 
Before getting into the validation process, the
POD/ROM models have to be trained. To this end, an initial
Latin hypercube sampling is performed on the design space
made of 16 variables (see Sect. 60.4.1). The size M
of the training sample is chosen to be very large
(M = 180) to cover each design variable with a sufficient
number of samples. The set of design sites $\{w_i\}$
is then transformed into the physical representation of
the airfoil geometry through the chosen parameterization.
The baseline geometry is a modified RAE 2822
airfoil, scaled to 14% thickness-to-chord ratio to amplify
compressibility effects. A total of 180 calls of the volume
mesh generator and CFD solver are launched in paral- 
lel at fixed flow conditions to compute the flow field 
around each airfoil shape. Due to a proper selection 
of the baseline geometry and design weight ranges, 
a wide and varied distribution of shock wave locations 
and flow separations is obtained through the training 
dataset. This is a highly desirable feature to test the 
predictive capability of such a physics-based surrogate 
model. The Mach number is 0.729, the Reynolds number
is $6.5 \times 10^6$, and the flow angle of attack is 2°.
Fully turbulent flow is assumed. For each airfoil shape,
a single-block structured volume mesh made of 25 186
points (12 288 cells) is computed by means of an automatic
hyperbolic grid generator. Using fixed topology,
mesh parameters, and sizes, standard quality grids are
obtained for each geometry. The first cell at the wall
is placed so as to have a unit $y^+$ at the specified flow
conditions. A sketch of the volume mesh around the 
baseline airfoil is shown in Fig. 60.2. 

With reference to Fig. 60.3, the mesh partitioning 
is applied to define the FOM/ROM domains, which are 
required to be non-overlapping and adjoining. This can 
be easily done when a structured mesh is available as 
the grid lines can be used as interfaces between do- 
mains. To this aim, the d parameter is introduced as 
the distance of the FOM/ROM interface from the airfoil 
leading edge. Indeed, different POD-based reduced-or- 
der models can be defined by varying this distance 
and, hence, reducing or increasing not only the size but 
also the inherent variability of the snapshot set. This 
mechanism has to be carefully considered together with the
coexistence of eight heterogeneous variables (spatial 
coordinates, density, pressure and velocity) in the same 
snapshots, as it could introduce a bias in the correlation 
process. For example, the POD reduction could give 
more importance to the flow features related to the snap- 
shot variables which exhibit the largest absolute values 
or the widest range of variation. To avoid this, a scaling 
operator is applied to the snapshot set prior to feeding 
the POD model. The scaling factors are designed so as 
to map each variable to the interval [0, 1] by normaliz- 
ing as follows 
k Xh — min (xn) 

Mh max Ga) — min Om) ° 


(60.5) 


where xp is the vector containing the h-th flow variable 
in the snapshot s = (x1, x2, . . ., xg)?” and the minimum 


Aerodynamic Design with Physics-Based Surrogates 


60.4 Application Example of POD-Based Surrogates 


FOM domain d = 1.25 


and maximum are taken over the vector $x_h$. In the
present investigation the scaling factors are defined 
once for each variable and kept constant even when 
varying the FOM/ROM domains (and hence the snap- 
shot size). 
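The scaling of Eq. (60.5) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the array layout (snapshots, variables, points), the function names, and the choice of taking the minimum and maximum over the whole ensemble are assumptions made here for the example.

```python
import numpy as np

def fit_scaling(ensemble):
    """Per-variable (min, max) for an ensemble of shape
    (n_snapshots, n_variables, n_points). Assumed layout, for illustration."""
    lo = ensemble.min(axis=(0, 2))
    hi = ensemble.max(axis=(0, 2))
    return lo, hi

def scale(ensemble, lo, hi):
    """Map each variable to [0, 1], following the form of Eq. (60.5)."""
    return (ensemble - lo[None, :, None]) / (hi - lo)[None, :, None]

def unscale(scaled, lo, hi):
    """Invert the scaling to recover physical values."""
    return scaled * (hi - lo)[None, :, None] + lo[None, :, None]

rng = np.random.default_rng(0)
# Hypothetical toy ensemble: 5 snapshots, 3 variables with very
# different magnitudes (e.g., coordinate, pressure, velocity), 10 points.
magnitudes = np.array([1.0, 1.0e5, 300.0])
raw = rng.normal(size=(5, 3, 10)) * magnitudes[None, :, None]

lo, hi = fit_scaling(raw)       # computed once and then kept constant
scaled = scale(raw, lo, hi)
```

Because `lo` and `hi` are stored separately from the data, the same factors can be reused when the FOM/ROM partition (and hence the snapshot size) changes, which mirrors the choice described in the text.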

Dealing with a zonal approach, we do not know a priori the optimal
distance of the domain interface from the airfoil surface. Hence, in
order to assess the effects of changing the interface position, three
domain partitions are considered, corresponding to three different
settings of the domain interface. As a consequence, three POD-surrogate
models (here referred to with the initials SM) are built exploiting the
CFD data obtained in the training phase. SM1 is a POD model with d = 0,
i.e., the full-order model is not invoked in the validation, as the POD
approximation is used to get the flow field everywhere and no boundary
condition is exchanged; the snapshot size N is 201 488. SM2 is a POD
model with d = 0.35, i.e., the full-order domain is the dark gray one
in Fig. 60.3, while the POD approximation is used to get the flow field
anywhere else; the snapshot size N is 91 792. SM3 is a POD surrogate
model with d = 1.25, similar to the previous one, but now the full-order
domain is the light gray one in Fig. 60.3; the snapshot size N is
75 232. Besides, two more surrogates are introduced and trained on the
same dataset to act as standard meta-models. SM4 is a Kriging
interpolation model with Gaussian correlation using the aerodynamic
efficiency $C_l/C_d$ as response function (Dakota package
implementation [60.32]). SM5 is a quadratic polynomial regression model
using the aerodynamic efficiency $C_l/C_d$ as response function. Given
t the number of design variables, at least (t + 1)(t + 2)/2 design
sites should be evaluated to train this type of model. In the present
case, t = 16 and (t + 1)(t + 2)/2 = 153, hence the size of the a-priori
sampling (180 sites) is sufficient.
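The bound (t + 1)(t + 2)/2 is simply the number of coefficients of a full quadratic polynomial in t variables (constant, linear, and second-order terms). The short check below, with function and variable names chosen here for illustration, counts the monomials explicitly and compares the result with the closed form for t = 16.

```python
from itertools import combinations_with_replacement

def n_quadratic_terms(t):
    """Count the monomials of total degree <= 2 in t variables."""
    count = 0
    for degree in range(3):  # degrees 0, 1, and 2
        count += sum(1 for _ in combinations_with_replacement(range(t), degree))
    return count

t = 16                                  # number of design variables
closed_form = (t + 1) * (t + 2) // 2    # (t + 1)(t + 2)/2
n_training_sites = 180                  # size of the a-priori sampling
```

With 16 design variables there are 1 constant, 16 linear, and 136 second-order terms, so the 180 training sites leave a small margin over the 153 unknown coefficients.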

For each of the POD-based approximations, the
ensemble energy content threshold ε is reported in


Fig. 60.3 FOM/ROM domains with varying interface (FOM domains for d = 0.35 and d = 1.25)


Fig. 60.4a,b Effect of zonal interface on the energy amount captured by POD. (a) Energy content (%) versus number of POD modes for SM1, SM2, and SM3


Fig. 60.4a as a function of the number of POD modes.
It is evident that SM1 requires a large number of
modes even to reproduce a relatively low energy level
(95%), while SM3 performs considerably better (97%)




with just four modes preserved. SM2 requires more 
modes with respect to SM3 because the corresponding 
ROM domain embeds part of the supersonic region on 
the airfoil suction side; Fig. 60.4b clarifies this issue 
as it reports the FOM/ROM domains (as in Fig. 60.3) 
superimposed with the local Mach number contours. 
The solution is here computed around an airfoil se- 
lected within the ensemble database. While the SM3 
ROM domain is quite far off the supersonic region, 
the SM2 FOM/ROM interface lies across it, thus in- 
troducing a stronger source of variability (and of slight 
discontinuity due to the shock wave) into the ensemble. 
Therefore, for each model a given energy level is obtained
with a different number of modes. In order to make
a fair assessment, the models will not be compared using
a pre-defined number of modes, but at a fixed energy
threshold (95%). Accordingly, the number of preserved modes is
ten for SM1 (95%), seven for SM2 (96.4%), and four
for SM3 (97%).
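Selecting the number of modes from an energy threshold can be sketched as follows. The sketch assumes the common definition of POD energy content as the cumulative fraction of squared singular values of the snapshot matrix; the chapter's own definition is given in an earlier section and may differ in detail. The function name and the toy matrix are illustrative only.

```python
import numpy as np

def modes_for_energy(snapshot_matrix, threshold):
    """Smallest number of POD modes whose cumulative energy fraction
    reaches `threshold` (a value between 0 and 1).

    Energy is taken here as the squared singular values (an assumption)."""
    sigma = np.linalg.svd(snapshot_matrix, compute_uv=False)
    energy = np.cumsum(sigma**2) / np.sum(sigma**2)
    # First index at which the cumulative energy is >= threshold.
    k = int(np.searchsorted(energy, threshold)) + 1
    return min(k, len(sigma)), energy

# Toy snapshot matrix with known singular values 3, 2, 1.
S = np.diag([3.0, 2.0, 1.0])
k90, energy = modes_for_energy(S, 0.90)
k95, _ = modes_for_energy(S, 0.95)
```

Here the cumulative energy is 9/14, 13/14, and 1, so a 90% threshold keeps two modes and a 95% threshold keeps all three. The same mechanism explains why models compared at a fixed threshold end up with different mode counts and slightly different retained energies.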


Error Estimation 
For each design candidate, the aerodynamic efficiency
$E = C_l/C_d$ is computed and used to assess some error
measures: the percentage error

$$\mathrm{PE}_i = \frac{E_i - \hat{E}_i}{E_i} \times 100 , \quad i = 1, \dots, M ;$$

the mean percentage error (MPE)

$$\mathrm{MPE} = \frac{1}{M} \sum_{i=1}^{M} \mathrm{PE}_i ;$$

the standard deviation of the percentage error (SDPE)

$$\mathrm{SDPE} = \sqrt{\frac{1}{M-1} \sum_{i=1}^{M} \left[\mathrm{PE}_i - \mathrm{MPE}\right]^2} ;$$

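the R-squared coefficient of determination

$$R^2 = 1 - \frac{\sum_{i=1}^{M} \left(E_i - \hat{E}_i\right)^2}{\sum_{i=1}^{M} \left(E_i - \bar{E}\right)^2} , \qquad \bar{E} = \frac{1}{M} \sum_{i=1}^{M} E_i ,$$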


where index i denotes the i-th sample of the DOE 
validation plan, the hat quantities refer to the surro- 
gate predictions, while the hatless ones to the full-order 
predictions. This type of error measure provides a pic- 
ture of how the POD model reconstruction error is 


propagated on a surface integral, as only aerodynamic 
force coefficients appear. Therefore, it is very useful 
to understand the suitability of the surrogate model 
to approximate the fitness function in an aerodynamic 
optimization process, which usually requires the evalu- 
ation of aero-coefficients. However, in order to ensure 
a more general error analysis, the mean absolute per- 
centage error between the exact CFD computation and 
the predicted value is introduced at snapshot level as 


$$e_i = \frac{1}{N} \sum_{j=1}^{N} \left| \frac{s_{ij} - \hat{s}_{ij}}{s_{ij}} \right| \times 100 ,$$

where N is the snapshot size and $s_{ij}$ ($\hat{s}_{ij}$) is the j-th element
of the computed (predicted) snapshot vector at the i-th
validation site.
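The error measures of this subsection can be computed as in the following sketch. It assumes the signed percentage error, the sample standard deviation (denominator M − 1), and the usual definition of R-squared as one minus the ratio of residual to total sum of squares; function names and the toy numbers are illustrative and are not taken from the chapter's data.

```python
import numpy as np

def efficiency_errors(e_true, e_pred):
    """Return (PE, MPE, SDPE, R2) for true and predicted efficiencies."""
    e_true = np.asarray(e_true, dtype=float)
    e_pred = np.asarray(e_pred, dtype=float)
    pe = (e_true - e_pred) / e_true * 100.0          # percentage error
    mpe = pe.mean()                                  # mean percentage error
    sdpe = pe.std(ddof=1)                            # sample standard deviation
    ss_res = np.sum((e_true - e_pred) ** 2)
    ss_tot = np.sum((e_true - e_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return pe, mpe, sdpe, r2

def snapshot_error(s_true, s_pred):
    """Mean absolute percentage error over the elements of one snapshot."""
    s_true = np.asarray(s_true, dtype=float)
    s_pred = np.asarray(s_pred, dtype=float)
    return np.mean(np.abs((s_true - s_pred) / s_true)) * 100.0

# Toy validation data with two design candidates.
pe, mpe, sdpe, r2 = efficiency_errors([20.0, 40.0], [22.0, 38.0])
err = snapshot_error([1.0, 2.0, 4.0], [1.1, 2.0, 3.0])
```

The first set of measures looks only at the integrated quantity (the efficiency), while `snapshot_error` compares the full field element by element, which is why the two can rank surrogates differently.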

Finally, monotonicity is one of the properties a good
surrogate should have in an optimization process. Given
two true data $f(w_i)$ and $f(w_j)$ and the corresponding
surrogate predictions $\hat{f}(w_i)$ and $\hat{f}(w_j)$, a surrogate model is
monotonic when

$$f(w_i) < f(w_j) \implies \hat{f}(w_i) < \hat{f}(w_j) . \tag{60.6}$$


This property can be global (i.e., valid for each $w_i, w_j \in DW \subset \mathbb{R}^{t}$)
or local. In order to measure the monotonicity,
the following metric is introduced

$$G = \sum_{i=1}^{M} \sum_{j=1}^{M} -\min\left( \Delta E_{ij} \, \Delta\hat{E}_{ij} ,\, 0 \right) , \tag{60.7}$$

where $\Delta E_{ij} = E_i - E_j$ and $\Delta\hat{E}_{ij} = \hat{E}_i - \hat{E}_j$. The G index
can assume any non-negative value, zero value indicates
global monotonicity, and the higher the magnitude, the
more significant the monotonicity loss.
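A simple way to test the property in Eq. (60.6) on a validation set is to count the ordered pairs whose ranking is not preserved by the surrogate. The sketch below is only a pairwise check of Eq. (60.6); it is not the G index of the chapter, which also accounts for the magnitude of the loss. Names are illustrative.

```python
import numpy as np

def monotonicity_violations(f_true, f_pred):
    """Number of ordered pairs (i, j) with f_true[i] < f_true[j]
    for which the surrogate does not predict f_pred[i] < f_pred[j]."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    lt_true = f_true[:, None] < f_true[None, :]   # true ordering of each pair
    lt_pred = f_pred[:, None] < f_pred[None, :]   # predicted ordering
    return int(np.sum(lt_true & ~lt_pred))

# A surrogate that swaps the two best candidates breaks one pair.
swapped = monotonicity_violations([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
# A surrogate with large errors can still be perfectly monotonic.
biased = monotonicity_violations([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

The second case illustrates why monotonicity matters for optimization: a surrogate that is inaccurate in value but preserves the ranking still drives a rank-based optimizer toward the same candidates.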


Validation Results and Analysis 
The validation plan is generated with a new Latin 
hypercube sampling of size M = 50. The goodness- 
of-fit for each model is estimated; the results are sum- 
marized in Table 60.2. The surrogate models can be 


Table 60.2 Surrogate goodness-of-fit estimation 


Surrogate   R-squared   MPE     SDPE          G             Ranking
SM1         0.5876      10.33   48.14         1597.22       4
SM2         0.8899      4.55    12.85         647.83        2
SM3         0.9791      2.30    [illegible]   [illegible]   1
SM4         0.8657      4.56    26.61         853.98        3
SM5         0.06074     15.62   171.62        1761.64       5




Fig. 60.5a,b Correlation plot of the prediction of surrogate models: model CL/CD versus exact CL/CD for (a) SM1, SM2, and SM5 and (b) SM3 and SM4, with the CFD data as reference


ranked as reported in the rightmost column: SM3 exhibits
superior performance for each quality index,
while the quadratic polynomial surface is very poor
in approximating the objective function. SM3 performs
very well even on the SDPE estimate, which measures
the variation of the percentage prediction error along
the validation sampling. Hence, the prediction errors
at any validation site are comparable and close to the
mean value. This is a very desirable feature for a surrogate
model designed for optimization. On the other
hand, SM5 shows very poor performance because
such a polynomial regression is unable to approximate
a multi-modal, rapidly changing objective function.
Looking at the figures in Table 60.2, models SM2
and SM4 show similar results, even if they differ completely
in methodology and construction. This is a useful
indication when seeking the proper balance between
the FOM and ROM domains (i.e., to determine the
distance d): the POD surrogate accuracy increases by
moving the FOM/ROM interface away from the airfoil

Table 60.3 Surrogate estimations of aerodynamic efficiency for best and worst validation airfoils

Surrogate   ID of max   ID of min   max     min     Δ max (%)     Δ min (%)
CFD         12          22          46.43   20.61   0             0
SM1         12          22          52.19   22.84   12.40         10.81
SM2         12          22          48.27   20.45   3.85          −0.77
SM3         12          22          46.40   20.12   −0.07         −2.39
SM4         26          22          47.40   19.86   [illegible]   [illegible]
SM5         12          39          54.62   16.78   17.63         −18.59


surface, and there exists a particular value of the distance
d for which its predictive power is very close to that of standard
and efficient interpolation techniques. As shown
in Table 60.2, the monotonicity measure is coherent
with the previously introduced indicators and, considering
the large difference between SM3 and the other models,
it provides additional evidence of the quality of this
model.

Figure 60.5 reports the correlation plot between the 
prediction of the model and the true CFD data. Again, 
SM2, SM3, and SM4 are globally closer to the lin- 
ear trend, resulting in a better fit. The correlation plot 
highlights another significant feature of SM3 model, as 
it generally underestimates the aerodynamic efficiency. 
For further comparisons, Table 60.3 summarizes the
validation set indices where each model predicts the
highest and lowest efficiencies, the corresponding values
of aerodynamic efficiency, and the percentage error
with respect to the CFD datum. This is useful for evaluating
the capability of the model to identify the global
extrema of the objective function. It is observed that
only SM4 leads to a wrong estimation of the position of
the optimal airfoil, while SM5 underestimates the performance
of the worst profile.

The last two properties, i.e., the capability to pre- 
serve the monotonicity of the dataset and to correctly 
identify the best/worst candidates, are crucial aspects in 
SBO, so that models SM2 and SM3 seem to be more 
suitable for this purpose. 

Fig. 60.6 Snapshot error prediction (error on a logarithmic scale versus validation sample ID for SM1, SM2, and SM3)

The accuracy of POD models is also evaluated and
compared in terms of the point-to-point snapshot error.
Figure 60.6 shows the results for each snapshot
belonging to the validation plan (again ranging from 1 
to 50). The error index is plotted in logarithmic scale. 
It turns out that, in strong transonic conditions, train- 
ing a POD model on the full CFD domain (SM1) would 
lead to misleading results in the prediction phase, as the 
model would not be able to catch the highly non-linear 
trends that characterize this kind of flow. Indeed, the 
high number of POD modes required and the low good- 
ness-of-fit performance suggest that further modeling 
is needed to optimize the computation of the basis vec- 
tors and modal coefficients in transonic aerodynamics. 
In the next sections, we will introduce some adaptive 
sampling concepts to globally improve the reduced-or- 
der models predictions. 

A final comparison is possible in terms of POD 
model accuracy versus computational time and cell sav- 
ing. In particular, the R-squared prediction error can be 
taken as a measure of the model accuracy, while the 
time saving index TS and the cell saved index CS are 
defined as 


$$TS = \frac{T_{FULL} - T_{SM}}{T_{FULL}} , \qquad CS = \frac{N_{FULL} - N_{SM}}{N_{FULL}} , \tag{60.8}$$


where T and N are, respectively, the computational time 
for 1000 CFD iterations and the number of solved com- 
putational cells. The subscripts FULL and SM refer to 
the full grid CFD computation and the CFD compu- 
tation on the smaller FOM domain. In Fig. 60.7a the 
three indices (R-squared, TS, and CS) are plotted against 


the distance d of the FOM/ROM interface from the
airfoil leading edge. It shows that a clear trade-off
exists between accuracy and time/cell saving and provides
useful guidelines to tailor the choice of the best
POD model to the basic requirements of the target application.
For instance, if the target is to do a pre-screening
of the objective space, one could use a faster
and less accurate POD model that, however, guarantees
the preservation of the physics. Figure 60.7b proposes
a comparison between surrogate models in terms of the
MPE and SDPE. The plot gives a graphical picture
of the results in Table 60.2 concerning the accuracy
of the POD models with moving FOM/ROM interface
and the comparison with more classical meta-models.


Conclusions 

Three POD/ROM models have been trained and com- 
pared: the first one consisted in feeding the POD en- 
semble with the full field, hence without any domain 
decomposition; in the second and third one, the zonal 
approach was applied by defining two different values 
of the distance of the interface from the airfoil lead- 
ing edge. Results showed that the model accuracy is 
strongly dependent on the distance parameter, mainly 
because of the presence of the supersonic expansion 
lobe and the pressure jump across the shock wave on the 
airfoil suction side. In fact, the SM3 model showed su- 
perior performance with respect to both the other POD 
models and standard interpolation techniques like Krig- 
ing and regression methods like quadratic polynomial 
fitting. It also allows us to obtain a very accurate re- 
construction of airfoil surface distributions and, hence, 
of aerodynamic coefficients, which are very often the 
actual target of aerodynamic design. Another important 
conclusion of the work is that it seems completely mis- 
leading to base the POD ensemble on the full flow field 
when transonic conditions and shape modifications act 
together. Indeed, as the POD reconstruction is a linear 
combination of POD modes, capturing the combined 
non-linear effects of boundary layer and compressibil- 
ity is hardly possible when the position and intensity of 
the shock wave and its interaction with the boundary 
layer vary too much. Globally, the proposed POD sur- 
rogate model was shown to have many characteristics 
that make it suitable for aerodynamic design. However, 
a trade-off was found between POD model accuracy 
and resource saving as a function of the distance pa- 
rameter: the smaller the full-order domain, the shorter 
the computational time required but also the less accu- 
rate the reconstruction. 


Fig. 60.7a,b Performance of surrogate models as a function of FOM/ROM interface positioning. (a) R-squared, time-saving ratio, and cell-saving ratio versus d, with the R-squared of SM4 and SM5 for comparison; (b) index value on a logarithmic scale versus d


Indeed, one of the key points of the research is to
remedy the accuracy issues by optimally selecting the
training candidates. In the proposed example, we selected
180 sites to a-priori sample the 16-dimensional
design space, but in principle we do not have any information
about the appropriate size and locations of
the sample points. Intuitively, we would like to have
a sampling strategy that fills the space in an
efficient manner and allocates more points in
regions of the design space where the simulation response
is strongly non-linear or where an optimum is
likely to be found. In industrial practice, this would mean, given
a computational budget, improving the quality of the
POD surrogate by intelligently choosing the training
samples. Conversely, given a POD model with a certain
quality level, the rationale would be to reach
the same accuracy with fewer high-fidelity computations.


60.5 Strategies for Improving POD Model Quality: Adaptive Sampling 


In order to address the issues raised in the previous
section, a set of strategies has been proposed to update
and enhance the surrogate/POD model through adaptive
DOE techniques. Indeed, the selection of the design
sites to be included in the POD ensemble, instead of
being fully derived from an a-priori sampling strategy,
can be tailored to match specific POD-related improvement
requirements. Adaptive sampling strategies can
be properly designed to account for these requirements
by means of the so-called in-fill criteria. While a-priori
sampling techniques do not use any information
about the model prediction, adaptive techniques incrementally
select new sampling points by exploiting the
input/output relation observed at the previous stages.
Hence, some adaptive DOE strategies for POD-based
reduced-order models are proposed. The main references
are the works by Goblet and Lepot [60.33] and
Sainvitu et al. [60.34].


60.5.1 Rationale 


Adaptive sampling is aimed at improving the modal basis
or the modal coefficients set, which represents the
core of POD modeling. Indeed, given a POD model
built on a snapshot ensemble $\{s(w_1), s(w_2), \dots, s(w_M)\}$,
the aim is to find a new point $w_{new}$ in the design
space so that the new POD model, built on the new
set $\{s(w_1), s(w_2), \dots, s(w_M), s(w_{new})\}$, will provide
improved predictions and better exploration of the design
space at the same time. The fundamental idea is
to realize a proper trade-off between local accuracy and
global design space exploration. On the one hand, we
would like to sample near those design sites whose importance
is higher (exploitation). On the other hand,
knowing the relative distances between new sampling
points and snapshot sites, we would like to sample far
away from the known points in order to potentially
enrich the global prediction of the POD model (exploration).
In other words, we need to know how close
a new potential sample is to the training set and
how much the nearest training sample weighs on the
POD model. Of course, the meaning of distance and
importance needs to be mathematically defined, but the
underlying idea is to combine the information about the
relative importance of the snapshot and the nearest distance
in order to satisfy both requirements. This leads to
the definition of a potential of enrichment in the general
form


$$V_e(w_1, w_2, \dots, w_M, y) = R(w_1, w_2, \dots, w_M, y) \times I_e(w_1, w_2, \dots, w_M, y) , \tag{60.9}$$

where y is a generic point in the design space and
$\{w_1, w_2, \dots, w_M\}$ is the usual set of training points.


A new sample can be obtained by simply maximizing
the enrichment function. The function R gives a measure
of the distance of y from the set $\{w_1, w_2, \dots, w_M\}$,
according to a certain norm, and helps to obtain good
space-filling properties. The function $I_e$ can be referred
to as an importance function, as it has to properly
provide the information about the quantity to be improved.
A natural candidate could be a measure of
the error at each new point, if the surrogate model
were designed so as to provide an error estimation
at each y. Otherwise, in a surrogate-based
optimization process, the function $I_e$ could be directly
linked to the approximation of the objective function
to drive the search for a new optimized design sample.
With reference to the POD modeling presented
so far, the importance function should be closely tied
to the quality of the modal basis $\{\phi_1, \dots, \phi_M\}$ or the
quality of the RBF models built on the modal coefficients
$\{\alpha_1(w), \alpha_2(w), \dots, \alpha_M(w)\}$. Hence, two different
approaches can be followed, both based on the leave-one-out
cross-validation technique. For the sake of clarity,
the superscript $-j$ will indicate that the referenced
element (basis vector, coefficient model, SVD matrices)
has been obtained by means of a leave-one-out process,
i.e., by removing the j-th sample from the training set
and re-computing the model.


60.5.2 Improvement of the Modal Basis 


The first strategy consists in defining the importance 
function as a measure of the relative influence of each 
snapshot on the modal basis. This requires evaluating 
how much the modal basis changes when removing the 
snapshots one by one. The relative influence of the j-th 


snapshot on the modal basis is defined as

$$\tilde{I}_\phi(w_j) = \frac{I_\phi(w_j)}{\sum_{k=1}^{M} I_\phi(w_k)} , \tag{60.10}$$
where 
$$I_\phi(w_j) = \sum_{i=1}^{M} \sigma_i \, [\ldots] \tag{60.11}$$


is the influence of the j-th snapshot on the modal basis,
$\sigma_i$ is the singular value associated with the i-th basis
vector, and $\phi_i^{-j}$ is the i-th basis vector obtained after the
substitution of the j-th snapshot vector with a null vector
in the ensemble matrix. Goblet and Lepot [60.33] show
how to efficiently compute $\phi_i^{-j}$ for each j. According
to (60.9), this choice of the importance function drops
the dependency on y, so that this dependency has to be
condensed in the distance function.

As was mentioned above, a new optimal sample can
be found by maximizing the potential of enrichment.
However, in order to avoid solving a maximization
problem, the design space is heavily sampled with
a Latin hypercube technique, e.g., 100 times the dimension
t of the design space. Then, the Euclidean distance
of each new sampled point $y_i$, $i = 1, \dots, l = 100t$, from
each of the snapshot sites $w_k$, $k = 1, \dots, M$, is computed
and, for each $y_i$, the distance from the nearest
snapshot $w_z$ is stored as $\Delta(w_z, y_i)$. This represents the
distance function. Hence, for each new candidate $y_i$,
the potential of enrichment can be written as
$V_\phi(y_i) = \Delta(w_z, y_i)\, \tilde{I}_\phi(w_z)$. Finally, a new sample point is selected
at $w_{new} = \arg\max_{y_i} V_\phi(y_i)$.
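The candidate-selection step just described (nearest-snapshot distance multiplied by the relative influence of that snapshot, then an arg-max over the candidates) can be sketched as follows. This is a minimal illustration with hypothetical names: the candidate set `Y` would come from a dense Latin hypercube sampling and the influence values from the leave-one-out analysis of the modal basis, neither of which is reproduced here.

```python
import numpy as np

def select_new_sample(W, rel_influence, Y):
    """Pick the candidate with the largest potential of enrichment.

    W             : (M, t) snapshot sites in the design space
    rel_influence : (M,) relative influence of each snapshot
    Y             : (l, t) candidate points
    Returns the selected point, its index, and all potentials."""
    # Euclidean distance of every candidate from every snapshot site.
    dist = np.linalg.norm(Y[:, None, :] - W[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)                    # index z of nearest snapshot
    delta = dist[np.arange(len(Y)), nearest]         # distance to nearest snapshot
    potential = delta * rel_influence[nearest]       # distance times importance
    best = int(potential.argmax())
    return Y[best], best, potential

# Toy 2-D example: the second snapshot is four times more influential.
W = np.array([[0.0, 0.0], [1.0, 0.0]])
rel_influence = np.array([0.2, 0.8])
Y = np.array([[0.0, 0.5], [1.0, 0.5], [0.4, 0.0]])
w_new, best, potential = select_new_sample(W, rel_influence, Y)
```

In the toy data the first two candidates are equally far from their nearest snapshots, so the one close to the more influential snapshot wins. Scanning a dense candidate set in this way replaces a continuous maximization with a cheap lookup, at the cost of resolution limited by the candidate density.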


60.5.3 Improvement 
of the Modal Coefficients 


The second adaptive method is conceived to improve 
the quality of the RBF networks built on the POD 
modal coefficients. Two sub-strategies are proposed: 
the first aims at improving the worst modal coefficient, 
the second is designed to improve all coefficients simul- 
taneously. 


First Sub-Strategy 
This strategy is applied when one of the coefficient
models $\{\alpha_1(w), \dots, \alpha_M(w)\}$ exhibits low quality with respect
to the others. Therefore, we first need to select
a modal coefficient. The quality of the i-th modal coefficient
$\alpha_i(w)$ is estimated by using a leave-one-out
process and computing a weighted form of the Pearson
correlation coefficient as


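$$\rho_c(\alpha_i) = \frac{\sigma_i}{\sum_{k=1}^{M} \sigma_k} \times \frac{\mu\left(\alpha_i \alpha_i^{-j}\right) - \mu(\alpha_i)\, \mu\left(\alpha_i^{-j}\right)}{\sqrt{\left[\mu\left(\alpha_i^2\right) - \mu(\alpha_i)^2\right] \left[\mu\left(\left(\alpha_i^{-j}\right)^2\right) - \mu\left(\alpha_i^{-j}\right)^2\right]}} , \tag{60.12}$$

where $\mu$ is the arithmetic mean operator applied to
a generic dataset $K = \{k_1, \dots, k_M\}$

$$\mu(K) = \frac{1}{M} \sum_{j=1}^{M} k_j .$$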

Weighting the correlation coefficient is needed because
the POD modal coefficients are not equally
important, but can be ranked according to the magnitude
of the corresponding singular values.

The modal coefficient with the lowest value of the
weighted correlation coefficient is selected and tagged
as $\tilde{i}$. The importance function is then defined as

$$I_{\alpha_{\tilde{i}}}(w_j) = \left| \alpha_{\tilde{i}}(w_j) - \alpha_{\tilde{i}}^{-j}(w_j) \right| ,$$

i.e., the absolute error of the $\tilde{i}$-th model when leaving
out the j-th snapshot. The choice of the distance function
is the same as in the previous case. Hence, for each
new candidate $y_i$, the potential of enrichment (with respect
to the worst coefficient model) is defined as

$$V_{\alpha_{\tilde{i}}}(y_i) = \Delta(w_z, y_i)\, I_{\alpha_{\tilde{i}}}(w_z) .$$

Finally, a new sample point is selected at
$w_{new} = \arg\max_{y_i} V_{\alpha_{\tilde{i}}}(y_i)$.


The evaluation of the quantities $\alpha_i^{-j}(w)$ is very expensive
as, for each j, a new model has to be computed.
However, when using RBF network interpolators for
POD coefficient models, the leave-one-out procedure
can be performed at no extra cost by using the efficient
formula provided by Rippa [60.35].
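The shortcut attributed to Rippa can be illustrated on a small RBF interpolation problem. For an interpolant with kernel matrix A and weights c solving A c = f, the leave-one-out error at site j equals c_j divided by the j-th diagonal entry of the inverse of A, so no model has to be refitted. The sketch below assumes a Gaussian kernel with a hypothetical shape parameter `eps` and compares the shortcut with a brute-force loop; it is an illustration of the identity, not the chapter's RBF network implementation.

```python
import numpy as np

def gaussian_kernel(X, Z, eps):
    """Gaussian RBF matrix between point sets X (n, t) and Z (m, t)."""
    r2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-(eps ** 2) * r2)

def loo_errors_rippa(X, f, eps):
    """Leave-one-out errors f_j - s^{-j}(x_j) from a single factorization."""
    A_inv = np.linalg.inv(gaussian_kernel(X, X, eps))
    c = A_inv @ f                     # interpolation weights
    return c / np.diag(A_inv)         # Rippa's identity

def loo_errors_brute(X, f, eps):
    """Same quantity obtained by refitting the interpolant n times."""
    n = len(X)
    errors = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j
        c = np.linalg.solve(gaussian_kernel(X[keep], X[keep], eps), f[keep])
        prediction = gaussian_kernel(X[j:j + 1], X[keep], eps) @ c
        errors[j] = f[j] - prediction[0]
    return errors

# Toy data: one modal coefficient sampled at six design sites in 1-D.
X = np.linspace(0.0, 1.0, 6)[:, None]
f = np.sin(2.0 * np.pi * X[:, 0])
fast = loo_errors_rippa(X, f, eps=4.0)
slow = loo_errors_brute(X, f, eps=4.0)
```

The absolute values of these errors are exactly the quantities needed by the importance function above, so the whole leave-one-out analysis costs about as much as building the interpolant once.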


Second Sub-Strategy 

This sub-strategy is used when the quality of all coefficient
models is comparable and it is very similar to the
improvement of the modal basis. The importance function
is defined as a measure of the relative influence
of each snapshot on the whole set of coefficient models.
This requires evaluating the absolute error of each
modal coefficient model when removing the snapshots
one by one. The relative influence of the j-th snapshot
on the whole set of coefficient models is defined
as

$$\tilde{I}_\alpha(w_j) = \frac{I_\alpha(w_j)}{\sum_{k=1}^{M} I_\alpha(w_k)} ,$$

where

$$I_\alpha(w_j) = \sum_{i=1}^{M} \sigma_i\, I_{\alpha_i}(w_j) = \sum_{i=1}^{M} \sigma_i \left| \alpha_i(w_j) - \alpha_i^{-j}(w_j) \right| . \tag{60.13}$$

As in the previous cases, the potential of enrichment $V_\alpha$
(with respect to all POD coefficients) is defined as

$$V_\alpha(y_i) = \Delta(w_z, y_i)\, \tilde{I}_\alpha(w_z) .$$

Finally, a new sample point is selected at
$w_{new} = \arg\max_{y_i} V_\alpha(y_i)$.


60.6 Aerodynamic Shape Optimization by Surrogate Modeling and Evolutionary Computing


The POD surrogates as well as the adaptive sampling
techniques described in Sects. 60.3 and 60.5 have been
included within an evolutionary optimization loop. The
aim is to assess POD-based surrogate models as fitness
function evaluators in a shape optimization problem in
transonic flow. Several approaches are proposed, differing
in the key ingredients of the methodology: the
construction of the POD model (full/zonal approach),
the strategy chosen to compute the training sample (a-priori,
auto-adaptive), and the strategy to exploit the
optimization results (single optimization, iterative optimization
with real-time updating). The optimization
approaches share the same target, i.e., to improve the
aerodynamic performance of the scaled RAE 2822 airfoil.
The surrogate-based shape optimization process
consists of an a-priori design of the experiment module
(e.g., LHS), the CST parameterization module, an
in-house developed automatic mesh generator, the ZEN
CFD flow solver [60.36], the POD/ROM module, which
also encloses the adaptive sampling techniques, and
the in-house adaptive genetic algorithm optimization library
ADGLIB [60.15, 37-39].


60.6.1 Problem Definition 


The geometry parameters, the design variable ranges,
the parameterization technique, and the design point
from Sect. 60.4.1 will be used. Here, we define the
airfoil shape optimization problem in terms of objective/constraint
function specifications as

$$\begin{aligned}
&\underset{w \in DW \subset \mathbb{R}^{16}}{\text{minimize}} && -\frac{C_l}{C_d} \\
&\text{subject to} && \left(\frac{t}{c}\right)_{\max} = 0.14 , \quad C_l \ge 0.5 , \\
& && C_m \ge -0.05 , \\
& && C_m \le 0.05 .
\end{aligned} \tag{60.14}$$


In other words, the goal is to maximize the aerodynamic
efficiency $C_l/C_d$ while keeping a minimum level of lift
generation ($C_l \ge 0.5$) and pitching moment controllability
($|C_m| \le 0.05$). Moreover, a geometric constraint
is added in order to set the airfoil maximum thickness-to-chord
ratio t/c at 14%: this constraint is implicitly
treated within the parameterization. The constraint
functions are actually treated as quadratic penalties,
hence the constrained optimization is transformed into
the following unconstrained problem

$$\begin{aligned}
\underset{w \in DW \subset \mathbb{R}^{16}}{\text{minimize}} \quad -\frac{C_l}{C_d} &+ K\left[\min(C_l - 0.5,\, 0)\right]^2 \\
&+ K\left[\min(C_m + 0.05,\, 0)\right]^2 \\
&+ K\left[\min(-C_m + 0.05,\, 0)\right]^2 ,
\end{aligned} \tag{60.15}$$


where K is a constant weight (equal to $10^4$) which
amplifies the relative importance of possible constraint
violations. For instance, a unit penalty will be applied
to the objective function in the case of an airfoil having
a pitching moment of +0.06, since
$K[\min(-0.06 + 0.05, 0)]^2 = 10^4 \times 10^{-4} = 1$.
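The penalized objective of Eq. (60.15) is straightforward to express in code. The sketch below uses illustrative names and takes the aerodynamic coefficients as plain numbers; in the actual process they would come from the CFD solver or from the surrogate.

```python
def penalized_objective(cl, cd, cm, k=1.0e4):
    """Objective of Eq. (60.15): negative efficiency plus quadratic
    penalties on the lift and pitching-moment constraints."""
    penalty = (
        min(cl - 0.5, 0.0) ** 2       # lift constraint, C_l >= 0.5
        + min(cm + 0.05, 0.0) ** 2    # lower moment bound, C_m >= -0.05
        + min(-cm + 0.05, 0.0) ** 2   # upper moment bound, C_m <= 0.05
    )
    return -cl / cd + k * penalty

# Feasible airfoil: no penalty, objective is just -C_l/C_d.
feasible = penalized_objective(cl=0.8, cd=0.02, cm=0.03)
# Pitching moment of +0.06: the upper bound is violated by 0.01,
# which adds a unit penalty with K = 1e4.
violated = penalized_objective(cl=0.8, cd=0.02, cm=0.06)
```

With efficiencies of a few tens, a unit penalty for a 0.01 violation is small but grows quadratically, so mildly infeasible candidates are not discarded outright while strongly infeasible ones are heavily penalized.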


60.6.2 Optimization Strategies and Setup 


In Sect. 60.4, three POD-based surrogate models were
introduced, trained, and validated against an independent
dataset. Here, we propose a set of numerical experiments
to assess their potential as fitness evaluators and
their suitability for an evolutionary optimization problem.
Several optimization approaches were set up and
tested in order to cover, as far as possible, the issues concerning
surrogate/ROM training and prediction. Table 60.4
summarizes the characteristics of each optimization
in terms of: fitness evaluator, optimization algorithm,
POD energy threshold (when using POD as surrogate),
high-fidelity computational budget, i.e., the total number
of computations with the ZEN RANS solver during
the optimization process, number of a-priori LHS samples
$M_{apr}$, number of adaptively added samples $M_{adp}$,
and number of surrogate-based optima $M_{opt}$, which are
iteratively added to the ensemble database. It must be
noted that not all the optimization strategies use POD
as a surrogate; in particular, optimizations tagged as
Kriging-driven Genetic Algorithm (KGA) and EGO
have been performed by using a Kriging method as the
fitness evaluator and the EGO algorithm [60.17], based
on Kriging and expected improvement evaluation, to
compute new optimal samples. The EGO algorithm represents
a state-of-the-art method in surrogate-based
global optimization. In the following, with the term
truth or true we will indicate the results obtained with
the Zonal Euler-Navier-Stokes (ZEN) CFD solver, as
it is adopted as the reference high-fidelity simulation
tool. Each optimization method is described in detail
here:


@ DGA (Direct Genetic Algorithm) — a plain, brute- 
force genetic optimization with the full high-fidelity 
solver ZEN called as fitness evaluator. 

© FPGAI (Full POD Genetic Algorithm 1) — a sur- 
rogate-based optimization where the aerodynamic 
analysis is carried out through a POD model built 
on the complete flow field of a set of 180 initial 
samples. This case corresponds to the POD-driven 
standalone mode and the surrogate POD evaluator 
is the one presented as SM1. No zonal approach is 
used. The POD energy content is 85%. The snap- 
shot size N is 201 488. 

@ FPGA2 (Full POD Genetic Algorithm 2), FPGA3 
(Full POD Genetic Algorithm 3) — same as FPGA1, 
but the POD models are defined by increasing the en- 
ergy content (95 and 99%, respectively). 

@ MPGAI (Mixed-flow POD Genetic Algorithm 1) — 
a surrogate-based optimization where the zonal 
CFD/POD model is trained on the initial design 
space sampling (180 snapshots) and adopted as 
the objective function evaluator throughout the op- 
timization cycle. The FOM domain is defined at 
a distance d = 1.25 chord length from the airfoil’s 


Aerodynamic Design with Physics-Based Surrogates 


60.6 Application to Aerodynamic Design 


Table 60.4 Optimization approaches 


Opt Tag Fitness evaluator Optimizer 
DGA ZEN ADGLIB 
FPGA1 standalone POD ADGLIB 
FPGA2 standalone POD ADGLIB 
FPGA3 standalone POD ADGLIB 
MPGAI1 zonal POD ADGLIB 
MPGA2 zonal POD ADGLIB 
KGA Kriging Dakota SOGA 
EGO Kriging Dakota EGO 
AFPGA1 standalone POD ADGLIB 
AFPGA2 standalone POD ADGLIB 
AFPGA3 standalone POD ADGLIB 
AMPGA1 zonal POD ADGLIB 
AMPGA2 zonal POD ADGLIB 


leading edge. The POD model used here has been 
already validated as SM3 in previous sections. The 
POD energy threshold is set at 95%. The snapshot 
size is 75 232. 

MPGA2 (Mixed-flow POD Genetic Algorithm 2) — 
same as MPGA1, but the POD energy content is in- 
creased up to 99%. 

KGA -— a surrogate-based optimization where 
a Kriging meta-model, built on the objective func- 
tion, is coupled to the genetic optimization. Here, 
the DAKOTA package [60.32] is used both for 
optimization process control and algorithm capa- 
bilities. The John Eddy Genetic Algorithm (JEGA) 
library [60.40] was used for optimization purposes. 
In particular, the single-objective genetic algorithm 
(SOGA) was used to perform optimization on a sin- 
gle objective function with general constraints. 
Kriging is initially trained on the 180 samples 
dataset. Then, a classical surrogate-based iterative 
optimization scheme is performed, consisting in 
building the surrogate, optimizing the surrogate ob- 
jective, evaluating the minimizers with the truth 
(CFD) model, and rebuilding the surrogate. In the 
present optimization, 10 SBO iterations are per- 
formed. 

EGO - the key idea in EGO [60.17—19] is to exploit 
the Gaussian process capability to provide both the 
prediction at a new input location as well as the un- 
certainty associated with that prediction. 

AFPGAI (Adaptive Full POD Genetic Algo- 
rithm 1), AFPGA2 (Adaptive Full POD Genetic 
Algorithm 2), AFPGA3 (Adaptive Full POD Ge- 
netic Algorithm 3) — the surrogate model employed 
is the same as FPGA3, but the training method 
is different and an adaptive sampling strategy is 


Run ID   POD energy (%)  Budget hi-fi  Mapr  Maap  Mopt
DGA      -               9600          0     0     0
FPGA1    85              180           180   0     0
FPGA2    95              180           180   0     0
FPGA3    99              180           180   0     0
MPGA1    95              180           180   0     0
MPGA2    99              180           180   0     0
KGA      -               190           180   0     10
EGO      -               553           153   400   0
AFPGA1   99              96            32    16    48
AFPGA2   99              96            16    32    48
AFPGA3   99              96            4     44    48
AMPGA1   99              112           8     56    48
AMPGA2   99              96            8     40    48


added. In particular, it was decided to follow a different
approach: the aim was to check whether, with a limited
computational budget, better results can be obtained by
adaptively training the POD model. Hence, the surrogate
training phase was split into three contributions: an
a-priori contribution, which samples the design space with
the LHS technique and produces Mapr samples; an iterative,
adaptive sampling aimed at improving the modal basis and
enriching the ensemble dataset with Maap samples; and
a series of Mopt genetic optimizations, each producing an
optimal candidate used to update the ensemble and recompute
the surrogate. The last phase is also called real-time
updating. The three strategies differ in the relative
amount of these three contributions, as highlighted in
Table 60.4, while the total computational budget is kept
fixed. The POD energy content is 99%. The snapshot size N
is 201 488.

AMPGA1 - the surrogate model employed is the 
SM2. The FOM/ROM interface is defined at d = 
0.35 chord length from the airfoil’s leading edge. 
However, the training method is different as it em- 
beds a-priori, auto-adaptive, and optimal samples as 
described earlier. The POD energy content is 99%. 
The snapshot size N is 91 792.

AMPGA2 - the surrogate model employed is again 
the SM3, but it differs from MPGA2 because 
the training method embeds a-priori, auto-adap- 
tive, and optimal samples as described before. The 
POD energy content is 99%. The snapshot size N 
is 75 232. 
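The POD energy thresholds quoted for the runs above (85%, 95%, 99%) fix how many modes are retained. A minimal sketch of that truncation follows, assuming a snapshot matrix with one column per sample and mean-centring before the decomposition; the function name `pod_basis`, the data layout, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pod_basis(snapshots, energy=0.99):
    """Return the POD modes capturing at least `energy` of the snapshot energy.

    snapshots: array of shape (N, M), one column per sample (assumed layout).
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    # Thin SVD of the mean-centred snapshot matrix.
    U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    # Cumulative energy fraction, measured by the squared singular values.
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of modes whose cumulative energy reaches the threshold.
    k = min(int(np.searchsorted(cumulative, energy)) + 1, s.size)
    return U[:, :k], cumulative[:k]

# Toy data (hypothetical): 200 values per snapshot, 30 snapshots, rank 5.
rng = np.random.default_rng(0)
X = (rng.standard_normal((200, 5))
     @ np.diag([10.0, 5.0, 2.0, 1.0, 0.5])
     @ rng.standard_normal((5, 30)))

for threshold in (0.85, 0.95, 0.99):
    modes, cum = pod_basis(X, threshold)
    print(f"energy >= {threshold:.2f}: {modes.shape[1]} modes retained")
```

Raising the threshold can only keep the same number of modes or add more, which is why the 99% runs carry a richer basis than the 85% and 95% ones.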


The optimization setup is the same for all the approaches,
except for AMPGA1 and AMPGA2. A population of 64 individuals
evolves for 150 generations with an 80% bit crossover rate
and a 2% bit mutation rate. The genetic evolution is
repeated every time a new optimal sample has to be added to
the ensemble. Hence, a total of 9600 evaluations is required
for each optimization process. The setup of AMPGA1 and
AMPGA2 differs slightly because the surrogate models adopted
are more expensive (Fig. 60.7a). In order to increase the
frequency of the model updating stages, a population of 48
individuals evolves for just 10 generations, and the process
is repeated 48 times to iteratively provide new optimal
samples. The new feature is that each optimization step is
a restart of the previous one, with re-evaluation of the
population candidates because the surrogate model has
meanwhile been updated. In other words, the idea is to
update the surrogate model more frequently (after just 10
genetic algorithm (GA) generations instead of 150), even if
each update brings a smaller improvement (10 generations
are not enough to converge the GA).
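As a quick check of the figures in this paragraph, the evaluation counts follow directly from population size, number of generations, and number of restarts. The helper name `ga_evaluations` is hypothetical, and the AMPGA count assumes one surrogate evaluation per individual per generation, ignoring the extra re-evaluation of the population at each restart.

```python
def ga_evaluations(population, generations, restarts=1):
    """Number of objective evaluations for a GA run repeated `restarts` times."""
    return population * generations * restarts

# Standard setup: 64 individuals evolved for 150 generations.
standard = ga_evaluations(64, 150)

# AMPGA1/AMPGA2 setup: 48 individuals, 10 generations, repeated 48 times.
ampga = ga_evaluations(48, 10, restarts=48)

print(standard)  # 9600, the figure quoted in the text
print(ampga)     # 23040 surrogate evaluations under the assumption above
```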


Fig. 60.8a,b Non-adaptive POD-driven optimization history.
(a) FPGA1, FPGA2, and FPGA3 compared with DGA;
(b) MPGA1 and MPGA2 compared with DGA. Both panels plot the
objective function of the best individual per generation


By looking at the details of the SBO approaches 
described so far, it seems quite natural to divide 
them into two main classes: non-adaptive (FPGAx, 
MPGAx), i.e., those without any adaptation/real-time 
updating, and adaptive optimizations (KGA, EGO, 
AFPGAx, AMPGAx). Consequently, the presenta- 
tion of the results obtained will follow this logical 
sequence. 


60.6.3 Non-Adaptive Optimization Results 


Figure 60.8 shows the convergence history of the three 
FPGA optimizations compared to the plain DGA (solid 
black line) on the left and the two MPGA optimization 
histories on the right. The graphs show the sequence 
of the best candidates for each generation. It must be 
pointed out that while the DGA predictions (solid black 
lines) are obtained with the CFD solver, the POD-based 
predictions (dash, dotted, and dash-dotted lines) are the 
surrogate ones. For example, the dash-dotted line does 
not indicate that FPGA1 reached objective levels signif- 
icantly lower than DGA, but simply that the predicted 
values of the airfoil performances were significantly 
overestimated. The plot clearly highlights that what- 
ever the energy content, the full-POD approximation 
is not able to match the true data during the search 
process. Moreover, the general trend is towards an over- 
estimation of the aerodynamic characteristics, which 
leads to lower values of the objective function. On the 
other hand, the MPGA model agreement with the CFD 
progress is very satisfying, both in terms of trends and 
accuracy. 


60.6.4 Adaptive Optimization Results 


Figure 60.9a shows the convergence history of the iterative
SBSO (surrogate-based shape optimization) KGA run. As
already mentioned, it consists of ten sequential surrogate
optimizations; at the end of each of them, the optimal
candidate is re-evaluated with the CFD solver and injected
into the training set, so that an updated surrogate is
available. The left-hand figure compares the surrogate and
true predictions of the optimal candidate at each iteration.
After about 6-7 SBO iterations, the Kriging model has
improved enough to match the CFD solver closely. In
Fig. 60.9b, the convergence history corresponding to the
tenth SBO iteration is superimposed on the DGA run:
a noticeable agreement is found, both in the initial drop in
the fitness function and in the final plateau. Among the
SBO minimizers, the ninth iteration shows the lowest true
objective function value, so it will be considered as the
actual KGA optimum.

Fig. 60.9a-c Kriging-based optimization
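The iterative scheme behind the KGA run (build the surrogate, optimize it, re-evaluate the minimizer with the truth model, enrich the training set) can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the one-dimensional `truth` function stands in for the CFD solver, a polynomial fit replaces the Kriging model, and a grid search replaces the genetic algorithm.

```python
import numpy as np

def truth(x):
    """Stand-in for the expensive CFD objective (hypothetical test function)."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(12.0 * x)

def fit_surrogate(xs, ys, degree=4):
    """Polynomial least-squares fit, used here in place of a Kriging model."""
    return np.poly1d(np.polyfit(xs, ys, degree))

rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 1.0, size=8)     # initial training sites
ys = truth(xs)
grid = np.linspace(0.0, 1.0, 501)      # candidate designs for the inner search

history = []
for it in range(10):                   # ten SBO iterations, as in the KGA run
    surrogate = fit_surrogate(xs, ys)           # 1. build the surrogate
    x_new = grid[np.argmin(surrogate(grid))]    # 2. optimize the surrogate
    y_pred = surrogate(x_new)
    y_true = truth(x_new)                       # 3. evaluate with the truth model
    history.append((it, x_new, y_pred, y_true))
    xs = np.append(xs, x_new)                   # 4. enrich the training set;
    ys = np.append(ys, y_true)                  #    the next pass rebuilds the model

# As in the text, the optimum is the iteration with the lowest TRUE objective,
# which need not be the last one.
best = min(history, key=lambda h: h[3])
print(f"best iteration {best[0]}: predicted {best[2]:.4f}, true {best[3]:.4f}")
```

Keeping both the predicted and the true value at each iteration is what allows a comparison like the one in Fig. 60.9a.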

Figure 60.9c reports the convergence history of the EGO
optimization. Dark gray circles depict the initial DOE
sampling (153 candidates), while light gray circles denote
the subsequent 400 candidates found by maximizing the
expected improvement. The graph also reports the expected
improvement values as gray, white-filled circles on
a logarithmic scale (right axis). It is evident how the
progressive decrease of the expected improvement function
(EIF) produces a better quality of the Kriging model, which
in turn results in a minimization of the true objective
function.
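For reference, the expected improvement used by EGO combines the Gaussian-process prediction and its uncertainty in closed form. The sketch below uses the standard expression for a minimization problem from the EGO literature [60.17]; the function and argument names are illustrative, and the sample numbers are hypothetical.

```python
import math

def expected_improvement(mu, sigma, f_min):
    """Expected improvement at a candidate, for a minimization problem.

    mu, sigma: surrogate prediction and its standard deviation at the candidate.
    f_min: lowest true objective value found so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_min - mu) * cdf + sigma * pdf

# A candidate predicted to be worse than the incumbent can still be worth
# sampling if its prediction is uncertain enough.
f_best = -49.0
print(expected_improvement(mu=-48.0, sigma=0.5, f_min=f_best))
print(expected_improvement(mu=-48.0, sigma=2.0, f_min=f_best))
```

As training points accumulate, the predictive uncertainty shrinks and the largest attainable expected improvement drops, which is the behavior plotted on the right axis of Fig. 60.9c.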

The convergence histories of the AFPGA1, AFPGA2, and AFPGA3
optimizations are reported in Fig. 60.10a together with the
objective function values computed on the training points.
In the plot, each point represents a single high-fidelity
evaluation: the squares depict the a-priori and adaptive
training sites, while the circles connected with lines
represent the sequence of optima from the Mopt GA
optimizations. It is fairly evident that adaptive sampling
is often helpful, as it allows us to find sub-optimal
solutions even before optimization (see AFPGA1 and AFPGA2).
On the other hand, the somewhat disappointing behavior in
the optimization step is due to the fact that the surrogate
underestimates the objective function, thus pushing the
surrogate-based optimizer to explore uninteresting regions
of the design space. In particular, the results show that
the more adapted the initial sampling, the smaller the
underestimation. Hence, the ratio Mapr/Maap should be kept
low. However, another important feature is related to the
AFPGA3 method: it shows that by lowering the ratio
Mapr/Maap too much (down to 0.09), the performance of the
method deteriorates, as the final AFPGA3 optimum is worse
than the previous ones. Indeed, leaving too much room for
adaptive criteria seems to produce a sampling with very poor
exploratory capabilities.
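The ratios discussed above can be recomputed from the sample counts listed in Table 60.4. The mapping of table rows to run names follows the order of the table and is an assumption of this sketch, as are the variable names.

```python
# (Mapr, Maap, Mopt) per run, read from Table 60.4 (row order assumed).
budgets = {
    "AFPGA1": (32, 16, 48),
    "AFPGA2": (16, 32, 48),
    "AFPGA3": (4, 44, 48),
    "AMPGA1": (8, 56, 48),
    "AMPGA2": (8, 40, 48),
}

# Ratio of a-priori to adaptive samples, and total high-fidelity budget.
ratios = {run: m_apr / m_adp for run, (m_apr, m_adp, _) in budgets.items()}
totals = {run: sum(parts) for run, parts in budgets.items()}

for run in budgets:
    print(f"{run}: Mapr/Maap = {ratios[run]:.2f}, hi-fi budget = {totals[run]}")
```

The AFPGA3 value of about 0.09 is the ratio quoted in the text as too low, and the three contributions add up to the high-fidelity budget reported for each run.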

These considerations give a helpful hint about the right
combination of a-priori and adaptive sampling: the ratio
Mapr/Maap should be kept between 0.1 and 0.5, which is in
line with the values used in the EGO and goal-seeking
methods proposed by Jones [60.19]. This information is
exploited in tuning the parameters for the AMPGA1 and AMPGA2
optimizations. Figure 60.10b shows the true objective
functions of the training samples and of the sequence of
optimal candidates. Even if AMPGA1 performs quite well, it
exhibits characteristics similar to the AFPGAx
optimizations. On the other hand, the AMPGA2 optimum
outperforms the optima seen so far and, as will become clear
in the next section, it is the only candidate to get very
close to the truth optimum, i.e., the DGA optimum.


Fig. 60.9 panels: (a) objective function vs. SBO iterations
(truth and Kriging objectives); (b) objective function vs.
best individual per generation; (c) EGO history vs. design
candidates, with the expected improvement on a logarithmic
right axis


Fig. 60.10a,b Convergence history of POD-based optimization.
(a) AFPGA1, AFPGA2, AFPGA3 and their training sites;
(b) AMPGA1, AMPGA2 and their training sites. Both panels
plot the true objective function against the design
candidates


Table 60.5 Optimal candidates, objective function breakdown

Opt. run ID  Truth obj.  Predicted obj.  Penalty      cl     cd      cm
DGA          -51.18      -51.18          1.025        0.619  0.0118  -0.0602
MPGA1        -48.70      -50.86          [illegible]  0.578  0.0116  -0.0606
FPGA3        -38.33      -73.45          0.608        0.553  0.0142  -0.0578
KGA          -47.65      -51.94          0.585        0.612  0.0127  -0.0576
EGO          -49.71      -49.71          0.530        0.618  0.0123  -0.0573
AFPGA1       -49.24      -47.14          1.12         0.635  0.0126  -0.0606
AFPGA2       -49.20      [illegible]     0.551        0.631  0.0127  -0.0574
AFPGA3       -48.13      -47.88          1.29         0.583  0.0118  -0.0614
AMPGA1       -48.58      -44.61          0.567        0.576  0.0117  -0.0575
AMPGA2       -51.13      -50.31          0.947        0.612  0.0117  -0.0597
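One way to read Table 60.5 is to measure how far each optimum lies from the DGA point in the (predicted objective, true objective) plane. The sketch below does this with a plain Euclidean distance. It assumes that the DGA optimum lies on the line of perfect fit at -51.18, since its truth value is garbled in the source table, and it leaves out FPGA3 and AFPGA2 because one of their two values is uncertain or illegible there.

```python
import math

# (predicted objective, true objective) pairs read from Table 60.5.
optima = {
    "MPGA1": (-50.86, -48.70),
    "KGA": (-51.94, -47.65),
    "EGO": (-49.71, -49.71),
    "AFPGA1": (-47.14, -49.24),
    "AFPGA3": (-47.88, -48.13),
    "AMPGA1": (-44.61, -48.58),
    "AMPGA2": (-50.31, -51.13),
}
# DGA optimum; assumed to sit on the line of perfect fit (see lead-in).
target = (-51.18, -51.18)

distance = {run: math.dist(point, target) for run, point in optima.items()}
ranked = sorted(distance, key=distance.get)

for run in ranked:
    print(f"{run:7s} {distance[run]:5.2f}")
```

With these numbers, AMPGA2 and EGO come out as the two candidates closest to the target point.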


60.6.5 Optima Analysis 


This section gives details about the optima computed with
each of the methodologies presented. In the following, ten
optimal candidates will be considered to assess the
optimization results, namely the optima from runs DGA,
FPGA3, MPGA1, KGA, EGO, AFPGA1, AFPGA2, AFPGA3, AMPGA1, and
AMPGA2. FPGA3 and MPGA1 have been selected among the FPGAx
and MPGAx optima because they are the closest to the
high-fidelity DGA optimum. The objective function breakdown
for each optimal candidate is summarized in Table 60.5. The
table reports both the true data, obtained by re-computing
each design with the CFD solver, and the predicted objective
function as calculated by the surrogate model. None of the
optima satisfies the pitching moment constraint because the
quadratic penalty function and its weight, chosen in the
problem definition, purposely do not enforce this constraint
strictly, so as to obtain a less stiff optimization problem.
Indeed, getting precisely within the constraint boundaries
would probably have penalized the aerodynamic efficiency,
i.e., the actual objective function, too much, while
applying small penalties near a constraint boundary gives
more flexibility to the search for the optimal design.
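The trade-off between aerodynamic efficiency and penalty can be checked numerically against Table 60.5. The tabulated values appear consistent, to within the rounding of the listed coefficients, with a true objective equal to the negative lift-to-drag ratio plus the penalty. This is an inference from the numbers, not the authors' stated formula; reading the last coefficient columns as lift (cl) and drag (cd) is likewise an interpretation of the garbled table header, and the 0.3 tolerance is a rough allowance for rounding.

```python
# (truth objective, penalty, cl, cd) from the legible rows of Table 60.5.
rows = {
    "KGA":    (-47.65, 0.585, 0.612, 0.0127),
    "EGO":    (-49.71, 0.530, 0.618, 0.0123),
    "AFPGA1": (-49.24, 1.12,  0.635, 0.0126),
    "AFPGA2": (-49.20, 0.551, 0.631, 0.0127),
    "AFPGA3": (-48.13, 1.29,  0.583, 0.0118),
    "AMPGA1": (-48.58, 0.567, 0.576, 0.0117),
    "AMPGA2": (-51.13, 0.947, 0.612, 0.0117),
}

def reconstructed_objective(penalty, cl, cd):
    """Assumed form: negative aerodynamic efficiency plus constraint penalty."""
    return -cl / cd + penalty

residuals = {
    run: reconstructed_objective(penalty, cl, cd) - truth
    for run, (truth, penalty, cl, cd) in rows.items()
}

for run, r in residuals.items():
    # cd is tabulated to three significant figures, so differences of a few
    # tenths are expected from rounding alone.
    print(f"{run:7s} residual = {r:+.3f}")
```

Under this reading, a penalty of order one costs about 2% of the objective value, which illustrates why a slightly violated constraint can still yield a better overall design.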


Non-Adaptive Optima
Among the non-adaptive methods (here KGA is considered
non-adaptive for the sake of comparison), the optimal
designs coming from MPGA1 and KGA are the closest to the
plain DGA one in terms of global performance. The MPGA1
optimum matches the DGA constraint violation almost
perfectly, while the KGA design performs even better on
pitching moment but at the cost of a slightly lower
aerodynamic efficiency. The FPGA3 design, although using 75
POD modes, does not belong to an optimal sub-set but
exhibits a small penalty. The best surrogate solution is
MPGA1, where a weak shock appears on the suction side but at
a lower lift level. Indeed, the optimal leading edge radius
is almost twice the DGA value, and this feature causes an
over-expansion on the suction side, which in turn makes the
shock wave occur more upstream and more strongly. The
Kriging-based best candidate shows a reduced rear loading,
which limits the nose-down pitching moment and the trim drag
associated with the rear location of the center of pressure.
This beneficial feature is counterbalanced by the increased
lift production in the fore part of the airfoil and,
consequently, by a more pronounced pressure jump across the
shock wave.


Fig. 60.11a,b Computed optima in the surrogate vs truth
objective plane. Both panels plot the true objective against
the predicted objective together with the line of perfect
fit: (a) full view, (b) close-up of the clustered optima


Adaptive Optima
In order to highlight the undertaken improvement path,
Fig. 60.11 reports a correlation plot where the whole set of
optima is depicted in the surrogate objective vs. true
objective plane. Two different zooming levels are set, as
they reflect the non-adaptive and adaptive processes: the
FPGA1, FPGA2, and FPGA3 optima show very large discrepancies
between true value and surrogate prediction, hence they are
located very far from the line of perfect fit. However,
a trend is observable: on increasing the POD energy content
(i.e., passing from FPGA1 to FPGA3), the best candidate gets
closer to the true optimum. By looking at the top part of
the figure, a clustering of the remaining optima is
observable, so a closer look is offered in the bottom figure
for better understanding. Among the adaptive optima, AMPGA2
and EGO produce the best results and demonstrate the
benefits of suitably coupling the zonal approach with an
intelligent design space sampling. Indeed, these optimal
candidates are the closest to the target point in the sense
of the Euclidean distance in the objective plane. From
a stricter aerodynamic point of view, turning on the
adaptive criteria brought the quality of the optimization
result to a level that is very close to the expectations of
the designer, as a shockless profile featuring a gentle
re-compression on the suction side represents the ideal
outcome for the proposed optimization problem.


60.7 Conclusions


The aim of the present chapter was to review and investigate
ad-hoc computational techniques to ease the solution of
complex aerodynamic shape optimization problems such as
those commonly encountered in aerospace design at industrial
level. Among the various approaches that are the subject of
research investigations, we chose to focus on ad-hoc
surrogate methods. In particular, we demonstrated that the
well-known proper orthogonal decomposition approach is not
adequate to provide reliable predictions in peculiar
aerodynamic conditions like transonic flow and when the
boundary of the computational domain changes, as in shape
optimization. We proposed a zonal approach to de-couple the
strong non-linearities occurring near the body wall from the
POD approximation. This zonal approach proved to give
reliable results at a reduced computational cost compared to
the full CFD simulation. Furthermore, we showed that the
zonal approach can give an accurate approximation of the
true optimum when trained with specifically designed
adaptive sampling techniques. The latter have been purposely
conceived to improve the POD model machinery, namely the
basis vectors and coefficients. By using such an intelligent
design of experiment method, the high-fidelity computational
budget can be further reduced and the overall performance of
the design loop increases. The beneficial effects of this
approach have been illustrated by a comparison of several
surrogate-based optimization processes on the shape design
of a two-dimensional airfoil. The extension of the
methodology to complex three-dimensional problems is
straightforward and under way. Indeed, one of the main
advantages of the proposed methodology is its relative
insensitivity to the curse of dimensionality of the design
parameter space. On the other hand, the larger snapshot size
required by three-dimensional CFD flow fields, where
millions of unknowns may be handled, does not represent
a big issue with current linear algebra numerical solver
technology. Another significant advantage of the zonal
approach with respect to other surrogates lies in its
favorable scaling property when the third dimension is
introduced, because the ratio between CFD-solved and
POD-predicted points decreases. Furthermore, zonal POD
allows us to solve the high-fidelity flow field locally,
i.e., only where it is required by geometry-driven
considerations. This represents a tremendous benefit when
the complexity of the design case grows. As an example, if
the goal is to optimally fit a nacelle body in an already
optimal wing, the high-fidelity computation zone can be
restricted to capture only the wing-nacelle interaction
phenomena, leaving the POD model to predict the flow field
outside. To further bridge the gap with real-world
applications and needs, future work will be focused on
validating the proposed methodology in large-scale,
multi-point aerodynamic problems involving huge design
parameter spaces and aiming at predicting the aerodynamic
characteristics in deep transonic flow. Finally,
improvements towards a more efficient exploitation of
approximate models in the search algorithm are under
investigation. In particular, a bi-objective approach seems
promising, where the first objective is the true function
evaluation and the second is the surrogate one. An
asymmetric algorithm, i.e., an algorithm that invokes the
surrogate many more times than the high-fidelity one, was
already proposed in a multi-fidelity environment [60.16] and
will be extended to adaptive POD-based surrogate models.


References


60.1 Advisory Council for Aviation Research and Innovation
in Europe: European aeronautics: A vision for 2020
(European Communities, Luxembourg 2001)

60.2 G. Schrauf: KATnet. Key Aerodynamic Technologies for
Aircraft Performance Improvement (DGLR - Deutsche
Gesellschaft für Luft- und Raumfahrt, Vienna 2006)

60.3 O. von Tronchin: The Airbus Global Market Forecast
(Herndon 2010)

60.4 Boeing: Current Market Outlook 2009-2028 (Seattle 2009)

60.5 E. Torenbeek: Synthesis of Subsonic Airplane Design: An
Introduction to the Preliminary Design of Subsonic General
Aviation and Transport Aircraft, with Emphasis on Layout,
Aerodynamic Design, Propulsion and Performance (Delft Univ.
Press, Delft 1982) p. 620


60.6 D.P. Raymer: Aircraft Design: A Conceptual Approach
(AIAA, Reston 1999)

60.7 D.P. Raymer: RDS-Student, Software for Aircraft Design,
Sizing, and Performance (AIAA, Reston 2000)

60.8 R.H. Myers, D.C. Montgomery: Response Surface
Methodology: Process and Product Optimization Using Designed
Experiments (Wiley, New York 1995)

60.9 A. Giunta, L.T. Watson: A comparison of approximation
modeling techniques: Polynomial versus interpolating models,
Proc. 11th AIAA/ISSMO Multidiscip. Anal. Optim. Conf.,
St. Louis (1998) pp. 392-404

60.10 H.M. Gutmann: A radial basis function method for
global optimization, J. Global Optim. 19, 201-227 (2001)

60.11 J.H. Friedman: Multivariate adaptive regression
splines, Ann. Stat. 19(1), 1-141 (1991)

60.12 M.H. Hassoun: Fundamentals of Artificial Neural
Networks (MIT Press, Cambridge 1995)

60.13 A.I.J. Forrester, N.W. Bressloff, A.J. Keane:
Optimization using surrogate models and partially converged
computational fluid dynamics simulations, Proc. R. Soc.
A Math. Phys. Eng. Sci. 462(2071), 2177-2204 (2006)

60.14 P.I.K. Liakopoulos, I.C. Kampolis, K.C. Giannakoglou:
Grid enabled, hierarchical distributed metamodel-assisted
evolutionary algorithms for aerodynamic shape optimization,
Future Gener. Comput. Syst. 24(7), 701-708 (2008)

60.15 N.M. Alexandrov, R.M. Lewis, C.R. Gumbert, L.L. Green,
P.A. Newman: Optimization with variable-fidelity models
applied to wing design, ICASE Rep. No. 99-49 (Institute for
Computer Applications in Science and Engineering, Hampton
1999)

60.16 D. Quagliarella: Airfoil design using Navier-Stokes
equations and an asymmetric multi-objective genetic
algorithm, Proc. EUROGEN 2003 Conf., Barcelona (2003)

60.17 D.R. Jones, M. Schonlau, W.J. Welch: Efficient global
optimization of expensive black-box functions, J. Global
Optim. 13, 455-492 (1998)

60.18 M. Schonlau, W.J. Welch, D.R. Jones: Global versus
local search in constrained optimization of computer models,
IMS Lect. Notes 34, 11-25 (1998)

60.19 D.R. Jones: A taxonomy of global optimization methods
based on response surfaces, J. Global Optim. 21, 345-383
(2001)

60.20 N. Queipo, R. Haftka, W. Shyy, T. Goel,
R. Vaidyanathan, P. Kevin Tucker: Surrogate-based analysis
and optimization, Prog. Aerosp. Sci. 41(1), 1-28 (2005)

60.21 T.W. Simpson, V.V. Toropov, V. Balabanov, F.A.C. Viana:
Design and analysis of computer experiments in
multidisciplinary design optimization: A review of how far
we have come - or not, Proc. 12th AIAA/ISSMO Multidiscip.
Anal. Optim. Conf. (2008) pp. 1-22

60.22 A.I.J. Forrester, A.J. Keane: Recent advances in
surrogate-based optimization, Prog. Aerosp. Sci. 45(1-3),
50-79 (2009)

60.23 A. Sóbester, S.J. Leary, A.J. Keane: A parallel
updating scheme for approximating and optimizing high
fidelity computer simulations, Struct. Multidiscip. Optim.
27, 371-383 (2004)

60.24 T. Goel, R.T. Haftka, W. Shyy, N.V. Queipo: Ensemble of
surrogates, Struct. Multidiscip. Optim. 33(3), 199-216
(2007)

60.25 J.L. Lumley: The structure of inhomogeneous turbulent
flows, Atmos. Turbul. Radio Wave Propag.-Proc. Int. Colloq.
(1965) pp. 166-178

60.26 E. Iuliano: Towards a POD-based surrogate model for
CFD optimization, Proc. ECCOMAS CFD Optim. Conf. (2011)

60.27 E. Iuliano, D. Quagliarella: Surrogate-based
aerodynamic optimization via a zonal POD model, Proc.
EUROGEN 2011 Conf., Barcelona (2011)

60.28 P.H. Cook, M.C.P. Firmin, M.A. McDonald: Aerofoil RAE
2822: Pressure Distributions, and Boundary Layer and Wake
Measurements, Technical memorandum (Royal Aircraft
Establishment, Farnborough 1977)

60.29 B.M. Kulfan: Universal parametric geometry
representation method, J. Aircr. 45(1), 142-158 (2008)

60.30 D.C. Montgomery: Design and Analysis of Experiments
(Wiley, New York 2006)

60.31 M.D. McKay, W.J. Conover, R.J. Beckman: A comparison of
three methods for selecting values of input variables in the
analysis of output from a computer code, Technometrics 21,
239-245 (1979)

60.32 M.S. Eldred, B.J. Bichon, B.M. Adams, S. Mahadevan:
Overview of reliability analysis and design capabilities in
DAKOTA with application to shape optimization of MEMS. In:
Structural Design Optimization Considering Uncertainties,
ed. by Y. Tsompanakis, N.D. Lagaros, M. Papadrakakis
(Taylor & Francis, New York 2008) pp. 401-432

60.33 J. Goblet, I. Lepot: Two Adaptive DOE Strategies for
POD-Based Surrogate Models, Tech. Rep. (CENAERO 2010)

60.34 C. Sainvitu, M. Guenot, I. Lepot, J. Goblet: Adaptive
sampling strategies for POD-based surrogate models in an
optimization framework, Proc. EUROGEN 2011 Conf. (2011)

60.35 S. Rippa: An algorithm for selecting a good value for
the parameter c in radial basis function interpolation, Adv.
Comput. Math. 11, 193-210 (1999)

60.36 M. Amato, P. Catalano: Non linear κ-ε turbulence
modeling for industrial applications, Proc. ICAS 2000 Conf.
(2000)

60.37 D. Quagliarella, P. Iannelli, P.L. Vitagliano,
G. Chinnici: Aerodynamic shape design using hybrid
evolutionary computation and fitness approximation, AIAA 1st
Int. Syst. Tech. Conf. (2004)

60.38 D. Quagliarella, A. Vicini: GAs for aerodynamic shape
design I: General issues, shape parametrization problems and
hybridization techniques, Lect. Ser. von Karman Inst. Fluid
Dyn. (2000)

60.39 D. Quagliarella, A. Vicini: GAs for aerodynamic shape
design II: Multiobjective optimization and multi-criteria
design, Lect. Ser. von Karman Inst. Genet. Algorithm. Optim.
Aeronaut. Turbomach. (2000)

60.40 J.E. Eddy, K. Lewis: Effective generation of Pareto
sets using genetic programming, Proc. DETC '01 ASME Comput.
Inform. Eng. Conf. (2001)


61. Knowledge Discovery in Bioinformatics 


Julie Hamon, Julie Jacques, Laetitia Jourdan, Clarisse Dhaenens 


Biomedical research progresses rapidly, in par- 
ticular in the area of genomic and postgenomic 
research. Hence many challenges appear for bio- 
statistics and bioinformatics to deal with the large 
amount of data generated. After presenting some 
of these challenges, this chapter aims at pre- 
senting evolutionary combinatorial optimization 
approaches proposed to deal with knowledge dis- 
covery in bioinformatics. Therefore, the chapter 
will focus on three main tasks of data mining (as- 
sociation rules, feature selection, and clustering) 
widely encountered in bioinformatics applications. 
For each of them, a description of the task will 
be given as well as information about their uses 
in bioinformatics. Then, some evolutionary ap- 
proaches proposed to cope with such a task will 
be exposed and discussed. 


61.1 Challenges in Bioinformatics
61.2 Association Rules by Evolutionary Algorithm
     in Bioinformatics
     61.2.1 Association Rules Discovery
     61.2.2 Evolutionary Approaches for Association Rules
            in Bioinformatics
61.3 Feature Selection for Classification and Regression
     by Evolutionary Algorithm in Bioinformatics
     61.3.1 Feature Selection
     61.3.2 Evolutionary Approaches for Feature Selection
            for Classification and Regression
            in Bioinformatics
61.4 Clustering by Evolutionary Algorithm in Bioinformatics
     61.4.1 Clustering
     61.4.2 Evolutionary Approaches for Clustering
            in Bioinformatics
61.5 Conclusion
References


61.1 Challenges in Bioinformatics


Biomedical research progresses rapidly, in particular in
the area of genomic and postgenomic research. Hence many
challenges appear for biostatistics and bioinformatics to
deal with the large amount of data generated. These data,
related to the sequencing of the genome, may deal, for
example, with the identification of more than 1 million
single nucleotide polymorphisms (SNPs), corresponding to
genetic variations, that can be used to carry out
genome-wide association studies (GWAS). Analyzing such data
requires advanced methods able to deal with such a large
amount of information and with its specificities. This is
the reason why knowledge discovery approaches have been
proposed to either:


1. Extract interesting rules or 
2. Reduce the dimensionality of the data or 
3. Classify/cluster data. 


All these knowledge discovery tasks have been addressed by
several communities: statistics, machine learning, and
combinatorial optimization. The latter is the subject of
this chapter, and a recent review reports synergies between
operations research and data mining [61.1]. In this chapter,
we focus on evolutionary combinatorial optimization and see
how it may be used to extract knowledge for bioinformatics.




Many problems arise in bioinformatics. In order to 
illustrate this chapter, three types of applications will be 
mainly used. They are described hereafter: 


• Microarray - Gene expression data: A typical
bioinformatics application requiring knowledge discovery
deals with DNA microarray data analysis. Indeed, DNA
microarray experiments are of great interest and importance
for biologists, thanks to their ability to simultaneously
measure the expressions and interactions of thousands of
genes. Such experiments are used to point out, for example,
genes of predisposition to diseases such as diabetes or
cancer. These experiments generate huge amounts of data that
need to be analyzed. Those data are mainly represented as
a gene expression matrix. Some experiments add the time
parameter (to analyze the evolution of the expressions after
a stress, for example) and report results at different time
points. This special case is sometimes called 3-D microarray
(three-dimensional microarray). For microarray data
analysis, several data mining approaches have been proposed
(association rule discovery, feature selection, clustering,
and bi-clustering) [61.2], and benchmarks are available to
compare the efficiency of methods. Hence it will provide
a good illustrative application throughout this chapter. As
the number of genes to consider is huge, many heuristics,
and in particular evolutionary algorithms, have been
proposed to deal with such data.

• Genome-wide association studies: Another interesting
approach to find genetic susceptibility for disease is to
track genetic variations. Indeed, as indicated by Moore
et al. in their recent study of bioinformatics challenges
for genome-wide association studies, the sequencing of the
human genome has made it possible to identify more than
1 million SNPs (genetic variations) across the genome that
can be used to carry out GWAS in order to reveal the genetic
basis of disease susceptibility [61.3]. The first approaches
used to deal with these massive amounts of GWAS data, mainly
based on biostatistics, have enabled the discovery of new
associations. However, as such approaches consider only one
SNP at a time and most of the time ignore the genomic and
environmental contexts, more complex approaches that
consider genotype-phenotype relationships have to be
proposed. Given the large number of SNPs to consider and the
complexity of the relationships to discover, the knowledge
discovery paradigm has been used to deal with such data, and
optimization approaches have been proposed.

• Protein analysis: There are now plenty of identified
proteins that are not completely known. For example, their
function may still be unknown (even if the sequence is
known). However, the knowledge of their functions is crucial
for the development of new drugs. Hence, automated function
prediction is an active research field, and computational
techniques that use high-throughput experimental data
(protein and genome sequences, protein interaction networks,
phylogenetic profiles, etc.) have been developed. Once again
such experiments produce a large amount of data that need to
be analyzed.


Considering this variety of applications, this chapter aims at
presenting evolutionary combinatorial optimization approaches proposed
to deal with knowledge discovery in bioinformatics. The chapter
therefore focuses on three main data mining tasks widely encountered in
bioinformatics applications: association rules, feature selection, and
clustering (unsupervised classification). For each of them, a
description of the task is given, together with information about its
uses in bioinformatics. Then, some evolutionary approaches proposed to
cope with the task are presented and discussed. For each task, a table
provides an overview of these approaches and serves as a guideline to
help the reader decide which type of approach to use in a specific
context.


61.2 Association Rules by Evolutionary Algorithm in Bioinformatics 


61.2.1 Association Rules Discovery 


Task Description 
The problem of discovering association rules was first formulated in
[61.4] and was called the market-basket problem. The initial problem
was the following: given a set of items and a large collection of sales
records, each consisting of a transaction date and the items purchased
in that transaction, the task is to find significant relationships
between the items contained in different transactions. Since this first
application, many other problems, in particular in bioinformatics, have
been studied with association rules, which may be defined in a more
general way. Let us consider a database composed of transactions
(records or objects) described according to several (possibly many)
attributes (features or columns). Association rules provide a very
simple (but useful) way to present correlations or other relationships
among attributes (features), expressed in the form A => C, where A is
the antecedent part (condition) and C the consequent part (prediction).
A and C are disjoint sets of attributes. The best-known algorithm to
mine association rules is A-priori, proposed by Agrawal and
Srikant [61.5]. This two-phase algorithm first finds all frequent item
sets (sets of items, or attributes, that often occur together within
transactions), that is, those whose support reaches a given minimum
level. This is done via an efficient search exploiting the downward
closure property of support (which measures the frequency of the
rules). In the second phase, it derives from these item sets the rules
that have at least a given minimum level of confidence. Many
improvements upon the initial method, as well as efficient
implementations (including parallel implementations), have been
proposed to deal with very large databases [61.6-8].
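To make the two phases concrete, the following minimal sketch mines a toy set of transactions level by level and then splits each frequent item set into rules. It is only an illustration of the principle, not the implementation of [61.5]: the function names, the gene identifiers g1 to g3, and the thresholds are assumptions chosen for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise search for item sets with support >= min_support."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Candidates of size k are unions of two frequent sets of size k-1 ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... kept only if every (k-1)-subset is frequent (downward closure).
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return frequent

def rules(frequent, min_confidence):
    """Phase 2: split each frequent item set into a rule A => C."""
    out = []
    for s, supp in frequent.items():
        for r in range(1, len(s)):
            for a in combinations(sorted(s), r):
                a = frozenset(a)
                conf = supp / frequent[a]  # confidence = supp(A u C) / supp(A)
                if conf >= min_confidence:
                    out.append((a, s - a, supp, conf))
    return out

# Hypothetical toy data: each transaction lists the genes "present" in a sample.
transactions = [frozenset(t) for t in (
    {"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g3"},
    {"g2", "g3"}, {"g1", "g2", "g3"})]
freq = frequent_itemsets(transactions, min_support=0.5)
found = rules(freq, min_confidence=0.7)
for antecedent, consequent, supp, conf in found:
    print(sorted(antecedent), "=>", sorted(consequent), round(supp, 2), round(conf, 2))
```

On this toy data the three-gene item set is never turned into a rule because its support falls below the threshold, which is exactly the pruning that makes the search tractable but also hides rare rules.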

We note that a specific case of rule mining deals with classification
rules, where the consequent is the same for every rule. This may be
seen as a straightforward classification task; however, the models and
methods used for it are close to those used more generally in rule
mining; hence it will be considered in this section.

The task of discovering effective association rules may be seen as
a combinatorial optimization problem, as rules are combinations of
attributes. Each attribute may participate in the rule in the
antecedent or in the consequent part. Each attribute may have several
values that have to be checked. As the number of attributes may be very
large (up to several thousands), the number of possible rules (choice
of the attributes that participate in the rule and of their values) may
be very large. Therefore, efficient methods (heuristic approaches and
in particular evolutionary approaches) are direly needed.
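To give an order of magnitude, the sketch below counts the rules A => C that can be built from n attributes under a simplifying assumption that is ours, not the chapter's: attributes are binary (present or absent), and both A and C must be non-empty. Each attribute is then in A, in C, or unused, which gives 3^n - 2^(n+1) + 1 rules; the brute-force function only serves to check the closed form on small n. With multivalued attributes the count is larger still.

```python
from itertools import product

def count_rules_closed_form(n):
    """3^n role assignments, minus those with empty A or empty C."""
    return 3 ** n - 2 ** (n + 1) + 1

def count_rules_brute_force(n):
    """Enumerate every assignment of a role to each of the n attributes."""
    count = 0
    # "A": antecedent, "C": consequent, "X": attribute not used in the rule
    for roles in product("ACX", repeat=n):
        if "A" in roles and "C" in roles:
            count += 1
    return count

for n in (2, 3, 6, 10):
    print(n, count_rules_closed_form(n))
```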


Use in Bioinformatics 
In their survey, Atluri et al. present different types of association
patterns and discuss some of their applications in
bioinformatics [61.9]. They indicate that association rule discovery
has not yet been widely used in bioinformatics, except to deal with
microarray data and data on genetic variations (SNP data), for which
several works exist. Their view is that association rules have been
underutilized in bioinformatics, and their article gives hints on how
to exploit the potential benefits of such an approach for protein
function prediction, in particular to address the noise and
incompleteness issues of current protein interaction network data.

In addition, association rules discovery allows the 
integration of external biological information with gene 
expression data. For example, Carmona-Saez et al. pro- 
pose an approach based on co-occurrence patterns, 
that integrates gene annotations and expression data 
to discover intrinsic associations among both data 
sources [61.10]. 


61.2.2 Evolutionary Approaches for Association Rules in Bioinformatics


Motivations 
Association rules are a very general model and may overcome some
drawbacks of other classical knowledge discovery tasks such as
classification. For example, consider microarray data in which
relationships between genes are searched for. Classification forces
a gene participating in several relations to be assigned to a single
group. Classification also has difficulty revealing relations between
genes belonging to the same group. Finally, classification is made
according to the whole set of experiments, which does not allow
relationships between genes that hold only in a subset of conditions to
be exhibited. Association rules may overcome these drawbacks by
providing relationships between genes that occur under certain
conditions.

However, one of the drawbacks of classical association rule discovery
approaches (the A-priori algorithm, for example) is the role played by
the support measure. Indeed, lowering the support threshold to identify
low-support rules (which may still be interesting, as rare rules can be
very important in the context of bioinformatics) generates a huge
number of rules that are difficult to interpret. Hence, other types of
approaches, using different quality measures, have to be proposed.

For example, Khabzaoui et al. proposed to analyze microarray data with
an association rule-based technique. They modeled this problem as
a multiobjective combinatorial optimization problem (which allows
quality criteria other than the support to be used) and solved it using
an evolutionary algorithm based on a genetic algorithm. Specific
mechanisms (mutation and crossover operators, elitism, and so on) are
designed for this task [61.11]. In order to improve the quality of the
rules obtained, cooperative approaches are proposed [61.12].




Overview 

First, this section introduces common approaches used in evolutionary
rule mining, such as learning classifier systems (LCSs), the rough set
approach, or genetic programming. Second, the Pittsburgh and Michigan
rule designs are detailed. Finally, implementation details of genetic
algorithms for rule mining with applications in bioinformatics are
presented, and a summary table is provided.


Some Classical Approaches. Learning classifier systems (LCS) come from
the machine learning community and are useful for classification tasks
using classification rules. LCS evolve a population of classifiers
(decision trees, rules, or rule sets) using a genetic algorithm and
a credit assignment module that rewards good classifiers. A more
detailed introduction to LCS can be found in [61.13]. Some
bioinformatics applications have been realized with the GAssist
algorithm [61.14] and its successor BioHEL (bioinformatics-oriented
hierarchical evolutionary learning) [61.15].

The rough set approach consists in finding approximation sets of
features: a lower set, whose features allow us to identify objects that
certainly belong to the approximated set, and an upper set, whose
features describe objects that possibly belong to the approximated set.
Rules can be generated from these resulting sets. More complete
information about rough set theory and applications can be found
in [61.16]. In [61.17, 18], the Rosetta toolkit was used to solve
bioinformatics problems; Vinterbo and Øhrn explained in [61.19] the
implementation of a genetic algorithm for rough sets in the Rosetta
toolkit and showed that this algorithm produces smaller rules with
better predictability. They measured the predictability with the AUC
score (area under the ROC curve). The ROC (receiver operating
characteristic) curve is often used in data mining to assess the
performance of classification algorithms, especially ranking
algorithms. It is plotted using the true positive rate (known as
sensitivity) and the false positive rate (also called 1 − specificity)
as axes. More details can be found in [61.20].
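As an illustration of how an AUC score can be obtained, the sketch below builds the ROC points of a ranking classifier by lowering the decision threshold and integrates them with the trapezoidal rule. The scores and labels are made-up values, and the function names are ours; this is one common way to compute the AUC, not necessarily the procedure used in [61.19].

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Objects sharing the same score are processed together (ties).
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve, by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical classifier outputs: higher score = more likely positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve, area)
```

The area equals the fraction of (positive, negative) pairs that the classifier ranks in the right order, here 8 pairs out of 9.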

Genetic programming has been used to extract rules from biological
data. For example, Pappa and Freitas proposed an original approach to
predict protein postsynaptic activity [61.22]. Since there are many
classification algorithms, and the choice of the algorithm has an
important impact, they chose to design an algorithm that searches for
a good classification algorithm. To do so, they use grammar-based
genetic programming (GGP). The main difference with the genetic
programming approach implemented by Yang et al. [61.23] is the use of
a grammar.


Rule Design. When mining rules, two designs are available: in the
Michigan design, each solution is a rule, while in the Pittsburgh
design, each solution is a rule set. The Pittsburgh design has a larger
search space; moreover, fitness and operators are harder to implement.
However, with this design there is no need to use a covering algorithm
to encourage rules from the same solution (rule set) to cover different
objects. With the Michigan design, without any covering algorithm,
solutions (rules) can cover the same objects. This may cause problems
when searching for classification rules. In [61.26], Bacardit and Butz
compared Michigan and Pittsburgh LCS. They concluded that both are
suitable for data mining. Michigan LCS tend to overfit the data (rules
are too specific), while Pittsburgh LCS are sometimes too general and
miss some search subspaces.
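The difference between the two designs can be sketched with two small data structures. Everything here is illustrative and assumed for the example (the Rule and RuleSet classes, the tuple-based objects, the ordered-list semantics of the rule set); actual systems such as those compared in [61.26] use richer representations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((attribute_index, required_value), ...)
    predicted: int     # class predicted when all conditions hold

    def covers(self, obj):
        return all(obj[i] == v for i, v in self.conditions)

# Michigan design: one individual of the population = one rule.
michigan_population = [
    Rule(((0, 1),), 1),
    Rule(((0, 1), (1, 0)), 1),  # covers a subset of what the first rule covers
    Rule(((2, 1),), 0),
]

# Pittsburgh design: one individual = a whole rule set, evaluated as a single
# classifier (here an ordered list of rules with a default class).
@dataclass
class RuleSet:
    rules: list
    default: int = 0

    def predict(self, obj):
        for rule in self.rules:
            if rule.covers(obj):
                return rule.predicted
        return self.default

def covered(rule, objects):
    """Indices of the objects matched by a rule."""
    return {k for k, obj in enumerate(objects) if rule.covers(obj)}

objects = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
classes = [1, 1, 0, 0]

# Without a covering mechanism, two Michigan individuals may match the same objects.
overlap = covered(michigan_population[0], objects) & covered(michigan_population[1], objects)

# A Pittsburgh individual is scored as a whole, e.g., by its accuracy.
individual = RuleSet(rules=[michigan_population[0], michigan_population[2]])
accuracy = sum(individual.predict(o) == c for o, c in zip(objects, classes)) / len(objects)
print(overlap, accuracy)
```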


Genetic Algorithm Design. As many genetic algo- 
rithms have been proposed to deal with rule mining 
(Table 61.1), this paragraph presents the main compo- 
nents used: 


• Initial population: Most of the time, the initial population is
composed of random individuals. However, Cho et al. initialized their
population with rules generated by a neural network [61.21].
In [61.14], the population is initialized by iteratively choosing one
object and generating a rule covering it.

• Fitness function: It often contains an objective on the rule size to
limit the bloat effect and overfitting, both of which lead to overly
specific and complicated rules. This is an application of the minimum
description length (MDL) principle. It is frequently associated with
a performance measure: quality of hitting sets in rough sets, accuracy,
coverage, confidence, or AUC. As many measures have been proposed, we
will not detail all of them, but refer to the review of Geng and
Hamilton on rule interestingness measures [61.27]. Pappa and Freitas
recommend using sensitivity × specificity as the fitness function when
the class distribution is unbalanced (in their protein data, only 6.04%
of objects had the positive class) [61.22]. The majority of
bioinformatics applications use an aggregation to combine these
multiple objectives. Weights are sometimes introduced to balance the
objectives. For example, Bacardit uses an automatic weighting function
that changes weights while the algorithm is running [61.28]. However,
weights can be difficult to configure; multiobjective algorithms can
overcome this problem [61.11, 29].

• Encoding: Most encodings are binary. In the rough set approach, the
encoding has a fixed size and matches selected features of the
approximated sets. Rules can be encoded as binary vectors, as lists of
values, or as lists of features and values. In [61.15], a rule
representation for real-valued features is used: hyper-rectangles,
instead of fuzzy rule approaches or data preparation with
discretization methods. The encoding of Pappa and Freitas differs from
the others in that it uses a grammar derivation tree [61.22]. This is
needed since they search for classification algorithms, and not for
rules.

• Operators: The most commonly used crossover operators are 1-point
crossovers. Casillas et al. proposed adapted crossover and mutation
operators to deal with rule overlapping when using rule sets [61.29].
Parents are mainly selected using fitness proportionate selection, such
as stochastic universal sampling (SUS) or roulette wheel selection.
Less frequently, elite selection and tournament selection are used.

Table 61.1 Overview of applications of evolutionary rule mining in bioinformatics

| Application | EA | Approach | Design | Evaluation function | Encoding | Operators | Reference |
|---|---|---|---|---|---|---|---|
| Protein structure prediction | GA | LCS | Pittsburgh | Size, accuracy | Hyper-rectangle | Tournament selection, 1-point crossover | [61.15] |
| Protein binding | GA (hybrid) | NN | Michigan | Confidence, size | Fixed-size value vectors | RWS, 1-point crossover | [61.21] |
| Protein classification | GGP | | | Normalized accuracy | Grammar derivation tree | | [61.22] |
| Protein binding | GA | Rough sets | Michigan | Size, coverage | Binary | SUS | [61.18] |
| Microarray | GP | | Michigan | Size, coverage | Binary | Elite selection, cut & splice crossover | [61.23] |
| Microarray | MOEA | | Michigan | Support, J-measure, interest, surprise, confidence | | RWS | [61.11] |
| Microarray | GA | LCS | Pittsburgh | Accuracy, misclassification rate | Binary | Tournament selection, 1-point crossover | [61.24] |
| 3-D microarray (time series) | GA | Rough sets | Michigan | Size, coverage | Binary | SUS | [61.25] |

MOEA: multiobjective evolutionary algorithm, GA: genetic algorithm, GP: genetic programming, GGP: grammar-based genetic
programming, SUS: stochastic universal sampling, LCS: learning classifier systems, NN: neural network, RWS: roulette wheel
selection
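The components listed above can be sketched in a few lines. The code below is an assumption-laden illustration, not the fitness or the operators of any cited work: the aggregated fitness combines sensitivity × specificity with a parsimony term through a fixed weight (size_weight is a made-up parameter), and the selection and crossover functions are textbook versions of SUS and 1-point crossover.

```python
import random

def sensitivity_specificity(predicted, actual):
    """Both rates are computed from binary labels (1 = positive class)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    positives = sum(actual)
    negatives = len(actual) - positives
    return tp / positives, tn / negatives

def aggregated_fitness(predicted, actual, rule_size, max_size, size_weight=0.1):
    """Weighted sum of sensitivity x specificity and a bonus for short rules."""
    sens, spec = sensitivity_specificity(predicted, actual)
    parsimony = 1.0 - rule_size / max_size
    return (1.0 - size_weight) * sens * spec + size_weight * parsimony

def stochastic_universal_sampling(fitnesses, n_parents, rng):
    """Select n_parents indices with equally spaced pointers on the fitness wheel."""
    step = sum(fitnesses) / n_parents
    start = rng.uniform(0, step)
    selected, cumulative, i = [], fitnesses[0], 0
    for k in range(n_parents):
        pointer = start + k * step
        # Advance to the individual whose wheel segment contains the pointer.
        while cumulative < pointer and i < len(fitnesses) - 1:
            i += 1
            cumulative += fitnesses[i]
        selected.append(i)
    return selected

def one_point_crossover(parent_a, parent_b, rng):
    """Exchange the tails of two equal-length encodings after a random cut."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

rng = random.Random(0)
score = aggregated_fitness([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1],
                           rule_size=2, max_size=10)
parents = stochastic_universal_sampling([1.0, 3.0, 0.0, 4.0], n_parents=4, rng=rng)
child_a, child_b = one_point_crossover([0] * 6, [1] * 6, rng)
print(score, parents, child_a, child_b)
```

With SUS, an individual of fitness zero is never selected, and an individual holding half of the total fitness receives about half of the selections, whatever the random start.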


61.3 Feature Selection for Classification and Regression by Evolutionary Algorithm in Bioinformatics


61.3.1 Feature Selection 


Feature selection is an active research domain in the statistics
(variable selection) and data mining communities. Used jointly with
classification (or clustering), feature selection can significantly
improve the comprehensibility of the resulting classifier models and
often yields a model that generalizes better to unseen points. The main
idea of feature selection is to choose a subset of input variables by
eliminating features with little or no predictive information. Hence,
finding the correct subset of predictive features is an important
problem in itself.

Feature selection for classification can be divided into three classes
depending on how the selection process is combined with the classifier:
the wrapper approach, the filter approach, and the embedded approach.




The wrapper approach uses a learning algorithm during the feature
selection process and assesses the selected features by the learning
algorithm's performance, using, for example, accuracy, sensitivity, or
specificity. The filter approach considers statistical characteristics
of a dataset directly, without involving any learning algorithm. In the
embedded approach, the learning algorithm uses its own embedded feature
selection algorithm (either explicit or implicit). Let us remark that
a hybrid approach is sometimes used: a filter approach first reduces
the number of features to consider, and a wrapper approach then selects
features more accurately among the remaining ones.

The general task of feature selection can be formulated as an
optimization problem. Binary variables $x_i$ are used to indicate the
presence ($x_i = 1$) or the absence ($x_i = 0$) of feature $i$ in the
optimal feature set. Then, the problem is formulated as
$\max_{x = (x_1, \ldots, x_n) \in \{0,1\}^n} F(x)$ for a function $F$
that has to be determined according to the context (filter, wrapper, or
embedded approach and application under study). In filter approaches,
many different statistical feature selection measures, such as the
correlation feature selection (CFS) measure, the
minimal-redundancy-maximal-relevance (mRMR) measure, the discriminant
function, or the Mahalanobis distance, have been used to assign a score
to each feature. In wrapper approaches, classification algorithms may
be used to assign to a selection of features a score that represents
the ability of the selection to lead to a correct classification.
Classical algorithms used for this purpose are KNN (k nearest
neighbors), SVMs (support vector machines), NNs (neural networks), etc.
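As a minimal sketch of a wrapper objective F(x), the function below scores a binary vector x by the leave-one-out accuracy of a 1-nearest-neighbor classifier restricted to the selected features. The toy dataset, the choice of 1-NN with squared Euclidean distance, and the exhaustive search over the four possible vectors are assumptions made for the example; with thousands of features the exhaustive search is replaced by a heuristic.

```python
from itertools import product

def loocv_1nn_accuracy(mask, X, y):
    """Wrapper score F(x): leave-one-out accuracy of 1-NN on the selected features."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty selection cannot classify anything

    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in selected)

    correct = 0
    for k, (xk, yk) in enumerate(zip(X, y)):
        # Nearest neighbor among all other objects (object k is left out).
        others = [(dist(xk, xj), yj)
                  for j, (xj, yj) in enumerate(zip(X, y)) if j != k]
        nearest_label = min(others, key=lambda pair: pair[0])[1]
        correct += nearest_label == yk
    return correct / len(X)

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 5.0), (0.2, 1.0), (0.1, 9.0), (1.0, 5.2), (1.1, 0.9), (0.9, 9.1)]
y = [0, 0, 0, 1, 1, 1]

best = max(product((0, 1), repeat=2), key=lambda m: loocv_1nn_accuracy(m, X, y))
print(best, loocv_1nn_accuracy(best, X, y))
```

On this toy data, keeping the noisy feature makes every leave-one-out prediction wrong, whereas selecting feature 0 alone classifies every object correctly, which illustrates why a nonsignificant feature can mask a significant one.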

As reported by Kim et al., traditional approaches to feature selection
with a single criterion have shown some limitations [61.30]. Therefore,
they propose to consider this problem as a multiobjective one and
present an adaptation of ELSA (evolutionary local selection algorithm),
inspired by artificial life models of adaptive agents, to cope with
this multiobjective problem. Another multiobjective approach may be
found in Garcia-Nieto et al., where a multiobjective genetic algorithm
is used for cancer diagnosis from gene selection in microarray
datasets [61.31].


Use in Bioinformatics 
As indicated by Saeys et al., feature selection in bioinformatics is
motivated by the high-dimensional nature of modeling tasks (sequence
analysis, microarray analysis, spectral analyses, literature mining,
etc.) [61.32]. Let us remark that, in contrast with other
dimensionality reduction techniques (based on projection or
compression, for example), feature selection techniques do not modify
the data. Thus, they preserve the original semantics of the variables,
which helps the interpretability of results.

In their review of feature selection techniques in bioinformatics,
Saeys et al. identify three classes of problems where feature selection
is involved [61.32]:

• Sequence analysis
• Microarray analysis
• Mass spectra analysis.


Sequence analysis deals with the study of either the content of the
sequence or its signal. As far as the content is concerned, the
prediction of subsequences that code for proteins requires feature
selection to cope with the large number of features that can be
extracted from a sequence and the small number of samples available.
Recently, feature selection approaches have also been used for other
applications such as the recognition of promoter regions or the
prediction of microRNA targets. Regarding signal analysis, the aim is
mainly to identify more or less conserved signals in the sequence
(motifs), representing binding sites for proteins. Regression
approaches have been proposed to relate motifs to gene expression
levels, and feature selection can be used to search for the best
motifs.

Microarray analysis, as already mentioned, poses a great challenge for
computational techniques because of the large dimensionality of the
data. Saeys et al. give in their survey an overview of the most
influential techniques [61.32]. In particular, genetic algorithms can
be used to deal with microarray data in wrapper-type approaches.

Mass spectra analysis deals with the analysis of thousands of signal
intensity measures. Although the data are different, this context is
very similar to microarray analysis, and feature selection is an
important step to reduce the dimensionality of the problem. Genetic
algorithms and nature-inspired algorithms have been proposed to deal
with such data, using wrapper approaches.


61.3.2 Evolutionary Approaches for Feature Selection for Classification and Regression in Bioinformatics


Motivations 
With the development of technologies, many bioinformatics applications
deal with large datasets, often with more features than objects
(samples). However, among these features, some are irrelevant or
redundant. That is why feature selection aims to select a subset of
relevant features. By reducing the dimension of the problem, this
method can reduce the computational time and improve prediction
accuracy. Indeed, including nonsignificant features can introduce noise
and may mask significant ones.

Traditional feature subset selection methods are sequential and based
on greedy heuristics. For example, sequential forward selection (SFS)
starts with an empty subset and iteratively adds features, whereas
sequential backward selection (SBS) starts with the full feature set
and iteratively removes features [61.47]. An important drawback of
these methods is that they consider one feature at a time, ignoring
possible interactions between features. Fairly recently, more advanced
methods such as evolutionary algorithms have been proposed to explore
the space of feature subsets [61.35, 48].
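The drawback of greedy sequential methods can be reproduced on a small example. The sketch below implements a plain SFS loop and applies it to a hypothetical scoring function, built for the illustration, in which features 0 and 1 are informative only together while feature 2 is moderately informative alone. The names and the score values are assumptions, not taken from [61.47].

```python
def sequential_forward_selection(n_features, score, max_size):
    """Greedy SFS: repeatedly add the single feature that most improves `score`."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_size:
        candidates = [f for f in range(n_features) if f not in selected]
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best_score:
            break  # no single additional feature improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score

def toy_score(subset):
    """Hypothetical quality of a feature subset (higher is better)."""
    chosen = set(subset)
    value = 0.0
    if 2 in chosen:
        value = 0.6            # feature 2 is moderately useful on its own
    if {0, 1} <= chosen:
        value = max(value, 0.9)  # features 0 and 1 are useful only together
    return value - 0.01 * len(chosen)  # small penalty on the subset size

greedy_subset, greedy_score = sequential_forward_selection(3, toy_score, max_size=3)
print(greedy_subset, greedy_score, toy_score([0, 1]))
```

SFS stops with feature 2 alone, because adding feature 0 or feature 1 individually does not help; the better subset {0, 1} is never visited. Population-based searches that modify several features at once can reach such subsets.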


Overview 
Evolutionary algorithms for feature selection in bioinformatics are
rarely used in a filter approach, as such approaches ignore the effects
of the selected feature subset on the performance of the classifier and
do not consider existing correlations between features. Hence,
considering jointly the feature subset selection and the classification
(or regression) process is more promising. This can be performed by
three different approaches: embedded [61.40], hybrid [61.41], or, more
frequently, wrapper approaches.

Table 61.2 Overview of evolutionary feature selection applications in bioinformatics

| Application | EA | Approach | Classifier | Evaluation function | Encoding | Operators | Reference |
|---|---|---|---|---|---|---|---|
| Microarray (Cancer) | GA | W | KNN | CA (LOOCV) | Binary | | [61.33] |
| Mass spectra (Cancer) | GA | W | KNN | CA (LOOCV) | Discrete | | [61.34] |
| Microarray (Cancer) | Hybrid GA | W | KNN | CA (LOOCV), # features | Binary | Rank-based RWS, m-point crossover | [61.35] |
| Microarray | GA | W | SVM | Sensitivity, specificity, geometric mean | Binary | RWS, 1-point/2-point crossover | [61.36] |
| Microarray (Cancer) | GA | W | AP/SVM | CA (LOOCV) | | SUS and RWS, uniform and 1-point crossover | [61.37] |
| Microarray (Cancer) | GA | W | SVM | CA (10-fold), # features, feature cost | Binary | RWS, 2-point crossover, elitist replacement | [61.38] |
| Microarray (Cancer) | GA | W | SVM | CA (10-fold) | Binary | Specific SSOCF crossover | [61.39] |
| Microarray (Cancer) | GA | E | SVM | CA (10-fold), # features | Binary + coefficient vector | SUS, specific crossover, specific mutation | [61.40] |
| Microarray (Cancer) | GA | H | SVM | CA (LOOCV) | Binary | RWS, random 1-point crossover, multiuniform mutation | [61.41] |
| Microarray (Cancer) | GP | | | CA (10-fold), # features | Binary | Reproduction, homo(hetero)geneous crossover | [61.42] |
| Microarray (Cancer) | GP | | | AUC-ROC | | Generational, tournament selection | [61.43] |
| Microarray (Cancer) | MOEA | W | GS | CA (LOOCV), # features | Binary | Elitist + ranking selection | [61.44] |
| Microarray (Cancer) | MOEA (NSGA-II) | W | SVM | Sensitivity, specificity (10-fold), # features | Binary | SSOCF crossover, bit-flip mutation (uniform, one reduction, zero reduction) | [61.31] |
| Microarray (Diabetes/obesity) | Parallel GA | | | EH-DIALL, CLUMP | Discrete | Uniform crossover, specific mutation | [61.45] |
| Mass spectra (Regression) | GA | W | MLR, PLS | RMSEP (data-splitting) | Binary | | [61.46] |

EA: evolutionary algorithm, MOEA: multiobjective evolutionary algorithm, GA: genetic algorithm, GP: genetic programming, GPSO: geometric
particle swarm optimization. Approach: wrapper (W), embedded (E), hybrid (H). Classifiers: KNN: k nearest neighbor, SVM: support
vector machine, AP: all paired, MLR: multiple linear regression, PLS: partial least square. Evaluation functions: CA: classification accuracy,
AUC: area under curve, LOOCV: leave-one-out cross-validation, RMSEP: root-mean-square error of prediction. Operators: SUS: stochastic
universal sampling, RWS: roulette wheel selection, SSOCF: subset size-oriented common features

Any evolutionary algorithm can be used for feature selection. For
example, some methods use genetic programming [61.42], but the vast
majority use genetic algorithms (GAs). Table 61.2 reports some works on
evolutionary feature selection applications in bioinformatics. These
works are described according to the application field, the
evolutionary algorithm used, the approach (embedded, hybrid, wrapper),
the classifier used (when one is used), the evaluation function, and
the specificities of the encoding and operators. This table helps to
identify trends in the use of evolutionary algorithms for feature
selection in bioinformatics:


• Encoding: Solutions are mainly encoded as binary vectors of size n
(the initial number of features), each bit indicating whether a feature
is selected or not. However, in their studies, Jourdan et al. [61.45]
and Li et al. [61.34] propose to use a discrete vector encoding, where
each solution is described by the list of the selected features; this
encoding is particularly well adapted to the large datasets encountered
in bioinformatics.

• Evaluation functions: As explained before, evolutionary algorithms
are mainly used jointly with a classification algorithm such as
KNN [61.35] or SVM [61.36]. Using such a classifier allows us to
evaluate the potential of the selection to lead to a good
classification by computing the classification accuracy. This accuracy
can be computed with various methodologies such as k-fold
cross-validation (10-fold, for example), leave-one-out cross-validation
(LOOCV), or the bootstrap methodology. The 0.632 bootstrap was found to
be the best estimator in [61.49], but the drawback of this method is
its computational cost in comparison to LOOCV. For this reason, most
authors use LOOCV, which is fast and almost unbiased [61.35]. When
dealing with larger datasets, 10-fold cross-validation can also be
used [61.38].
The evaluation function can also take into account other parameters
such as the number of features [61.39]. In this context (feature subset
size minimization and performance maximization), feature selection can
be viewed as a multiobjective optimization problem [61.31, 44].

• Operators: In terms of operators, some works use specific
ones [61.39], but classical operators are mainly used. For example, we
may cite SUS or roulette wheel selection (RWS) [61.37] for selection,
1-point or 2-point crossovers [61.36] for the evolution of solutions,
bit-flip mutation, etc.
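The encodings and classical operators mentioned above fit together as in the following sketch of a small generational GA with elitism. It is a generic illustration under our own assumptions (population size, mutation rate 1/n, a toy fitness in which features 0 and 3 are the relevant ones), not the algorithm of any work in Table 61.2; in a wrapper approach, toy_fitness would be replaced by a cross-validated classifier accuracy.

```python
import random

def to_discrete(mask):
    """Binary encoding -> discrete encoding (list of selected feature indices)."""
    return [i for i, bit in enumerate(mask) if bit]

def to_binary(indices, n):
    """Discrete encoding -> binary encoding of size n."""
    chosen = set(indices)
    return [1 if i in chosen else 0 for i in range(n)]

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def bit_flip(mask, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def roulette_wheel(population, fitnesses, rng):
    """Fitness proportionate selection (fitness values must be positive)."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def evolve(fitness, n, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        children = [best]  # elitism: the best solution found so far survives
        while len(children) < pop_size:
            a = roulette_wheel(population, scores, rng)
            b = roulette_wheel(population, scores, rng)
            for child in one_point_crossover(a, b, rng):
                children.append(bit_flip(child, 1.0 / n, rng))
        population = children[:pop_size]
        best = max(population + [best], key=fitness)
    return best

RELEVANT = {0, 3}  # hypothetical relevant features

def toy_fitness(mask):
    """Reward relevant features, penalize the subset size (always positive here)."""
    hits = sum(mask[i] for i in RELEVANT)
    return 1.0 + hits - 0.1 * sum(mask)

best_mask = evolve(toy_fitness, n=8)
print(best_mask, to_discrete(best_mask), toy_fitness(best_mask))
```

The discrete encoding stores only the selected indices, which is the reason it is attractive when n is very large and few features are selected.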


Feature selection is often used for classification, in order to predict
a discrete trait and to classify samples (diseased or not, for
example). However, to predict a quantitative trait (a value indicating
a good disposition for a treatment, for example), regression is used
instead of classification. The problem is the same: if too many
features are available, including nonsignificant ones, the regression
method will have difficulty giving good results. Hence, feature
selection may also be used in a regression context. For example,
Broadhurst et al. combined a genetic algorithm with multiple linear
regression (MLR) or with partial least squares (PLS) regression on
a mass spectrometry problem [61.46].


61.4 Clustering by Evolutionary Algorithm in Bioinformatics 


61.4.1 Clustering 


Task Description 
Clustering or unsupervised classification aims at de- 
composing or partitioning a (usually multivariate) 
dataset into groups so that objects in a group are similar 
to each other, and are as different as possible from ob- 
jects of other groups. A survey of clustering algorithms 
can be found in [61.50]; thus we will just introduce gen- 
eralities below. 

Clustering techniques can be broadly divided into three main types:
partitional, hierarchical, and overlapping. Partitional and
hierarchical clusterings produce a hard partition of the data, as an
object must belong to one and only one group, whereas in overlapping
clustering objects may belong to several groups. In clustering, the
number of groups can be known and fixed before performing the
clustering, or must be determined directly by the algorithm.

For partitional methods, the most common algorithm is k-means [61.51],
which is often described as a local search. For hierarchical
clustering, two distinct types of methods are identifiable: the
agglomerative ones and the divisive ones.
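The local search nature of k-means can be seen on four points placed at the corners of a wide rectangle. The sketch below is a plain textbook version written for this illustration (the function names and the data are assumptions); it alternates assignment and centroid update until the assignment no longer changes, and it is run from two different initializations.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the assignment is stable."""
    centroids = list(initial_centroids)
    k = len(centroids)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points]
        if new_assignment == assignment:
            break  # no move improves the solution: a local optimum is reached
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return assignment, centroids

def within_cluster_sse(points, assignment, centroids):
    """Sum of squared distances of the objects to their cluster centroid."""
    return sum(squared_distance(p, centroids[a])
               for p, a in zip(points, assignment))

points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = kmeans(points, initial_centroids=[points[0], points[2]])
poor = kmeans(points, initial_centroids=[points[0], points[1]])
good_sse = within_cluster_sse(points, *good)
poor_sse = within_cluster_sse(points, *poor)
print(good[0], good_sse)
print(poor[0], poor_sse)
```

Both runs stop at a stable partition, but the second one groups the points by rows instead of by columns and has a much larger within-cluster error: the result depends on the starting point, as for any local search.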




Use in Bioinformatics 
Clustering is the most popular method currently used in the first step
of gene expression matrix analysis. Clustering is appropriate when
there is no a priori knowledge about the data. In such circumstances,
the only possible approach is to study the similarity between different
samples or experiments. There are two straightforward ways to study the
gene expression matrix: comparing expression profiles of genes by
comparing rows in the expression matrix, and comparing expression
profiles of samples by comparing columns in the matrix. By comparing
rows, we may find similarities or differences between genes and thus
draw conclusions about the correlation between them. If we find that
two rows are similar, we can hypothesize that the respective genes are
co-regulated and possibly functionally related. By comparing samples,
we can find which genes are differentially expressed in different
situations.
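Comparing rows or columns requires a similarity measure. The sketch below uses the Pearson correlation coefficient, one common choice for expression profiles (the text above does not prescribe a particular measure), on a made-up expression matrix whose rows are genes and whose columns are samples.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two profiles of equal length."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return covariance / (norm_u * norm_v)

# Hypothetical expression matrix: one row per gene, one column per sample.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # same profile as geneA, at another scale
    "geneC": [4.0, 3.0, 2.0, 1.0],  # opposite profile
}

def sample_profile(matrix, j):
    """Column j of the matrix: the expression of every gene in sample j."""
    return [row[j] for row in matrix.values()]

# Comparing rows (genes).
r_ab = pearson(expression["geneA"], expression["geneB"])
r_ac = pearson(expression["geneA"], expression["geneC"])
# Comparing columns (samples).
r_samples = pearson(sample_profile(expression, 0), sample_profile(expression, 1))
print(r_ab, r_ac, r_samples)
```

A correlation close to 1 between geneA and geneB is the kind of observation that leads to the co-regulation hypothesis mentioned above; a correlation close to -1 indicates opposite behavior.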


61.4.2 Evolutionary Approaches for Clustering in Bioinformatics


Motivations 

Evolutionary clustering has been widely used in bioinformatics, as
datasets are particularly large and classical methods often lead to
suboptimal solutions. A good survey of the use of evolutionary
algorithms for clustering can be found in [61.52]. The authors proposed
a classification of algorithms taking into consideration different
aspects of evolutionary data clustering:

• Fixed or variable number of clusters
• Cluster-oriented or nonoriented operators
• Context-sensitive or context-insensitive operators
• Binary, integer, or real encoding
• Centroid-based, medoid-based, label-based, tree-based, or graph-based
representations.


Other surveys cover genetic-based clustering [61.53] and multiobjective
clustering [61.54, 55].


Overview 
Evolutionary algorithms for clustering bioinformatics data are applied
with both a fixed and a variable number of clusters. The majority of
the applications concern microarray data. In Table 61.3, some works are
presented through their important components (application field,
evolutionary algorithm used, fixed or variable number of clusters
(whether k is known or not), etc.). Below, we attempt to describe
trends in existing methods by separating the two cases: fixed or
variable number of clusters. Finally, information is given about
biclustering, which is increasingly used in bioinformatics.


Fixed Number of Clusters. The number of clusters
(often denoted by k) can be fixed before searching for
a clustering model with an evolutionary algorithm. It
can be set by a domain expert (here, a biologist, for
example) or by using a specific criterion, such as the
naive rule $k \approx \sqrt{n/2}$, where n is the number
of objects, or a more specific one based on an
information criterion, such as the Akaike information
criterion (AIC), the Bayesian information criterion
(BIC), or the deviance information criterion (DIC). The
main components of such algorithms are the following:


• Encoding: For a fixed number of clusters, several
  encodings are possible: binary, integer, and real.
  With a binary encoding, either a prototype-based or
  a partition-based representation can be realized. With
  an integer encoding, two representations are usual:
  the label-based encoding, where each gene represents
  an object and its value indicates the label of the
  cluster the object is assigned to [61.60], and the
  medoid-based encoding, which represents the prototype
  of each cluster (the object that depicts the cluster).
  The real encoding is used to represent the coordinates
  of the center of each cluster and corresponds to the
  centroid-based representation [61.57, 62, 63].

• Fitness function: Another specific component of
  evolutionary clustering of bioinformatics data is the
  fitness function. Many clustering validity criteria
  exist and can be adapted to measure the quality of
  a solution of an evolutionary algorithm. Some examples
  are the minimization of the sum of within-cluster
  distances, the minimization of the sum of the squared
  Euclidean distances of the objects from their
  respective cluster means [61.57], and the minimization
  of the distortion of the clusters (intracluster
  diversity).

• Operators: Some authors use classical operators such
  as the 1-point crossover, but many articles show the
  drawbacks of classical genetic operators [61.57] and
  prefer context-sensitive operators [61.56].
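
The encodings and the squared-distance fitness described above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function names and the toy data are assumptions made for illustration only. A label-based chromosome stores one cluster label per object, whereas a centroid-based chromosome stores the cluster centers directly; both are scored here by the sum of squared Euclidean distances to the cluster means, which is to be minimized.

```python
import math


def labels_to_clusters(labels, k):
    """Decode a label-based chromosome: labels[i] is the cluster of object i."""
    clusters = [[] for _ in range(k)]
    for obj_index, label in enumerate(labels):
        clusters[label].append(obj_index)
    return clusters


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sse_from_labels(data, labels, k):
    """Fitness (to minimize): sum of squared distances to the cluster means."""
    total = 0.0
    for members in labels_to_clusters(labels, k):
        if not members:
            continue  # an empty cluster contributes nothing
        center = centroid([data[i] for i in members])
        total += sum(math.dist(data[i], center) ** 2 for i in members)
    return total


def sse_from_centroids(data, centroids):
    """Centroid-based chromosome: each object is assigned to its nearest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centroids) for x in data)


# Toy data (hypothetical): two well-separated groups of two objects each.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]  # label-based chromosome matching the two groups
bad = [0, 1, 0, 1]   # label-based chromosome mixing the groups

print(sse_from_labels(data, good, 2))
print(sse_from_labels(data, bad, 2))
print(sse_from_centroids(data, [(0.0, 1.0), (10.0, 1.0)]))
```

On this toy data the chromosome that respects the two groups obtains a much lower fitness value than the one that mixes them, which is the signal an evolutionary algorithm would exploit during selection.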


Table 61.3 Overview of applications of evolutionary clustering in bioinformatics

| Application | EA | K? | Evaluation function | Encoding | Operators | Reference |
|---|---|---|---|---|---|---|
| Microarrays | GA | Y | Interestingness measure | Set of clusters + label of objects | Specific crossover, specific mutation | [61.56] |
| Microarrays (HL-60, HD-4cl, Yeast) | Memetic (GA) | Y | Minimum sum of squares | Centroid | K-means (LS), uniform + specific crossover, split mutation | [61.57] |
| Microarrays (AD400_10_10, Yeast, Human, Rats) | GA | N | XB (cluster validity index) | Centroid | RWS, specific crossover, mutation value | [61.58] |
| Microarrays | MOEA | N | Overall deviation + connectivity | Set of clusters | Binary tournament, specific crossover, no mutation | [61.59] |
| Microarrays | GA | N | Silhouette + K | Label | Mutation: split + eliminate a cluster, specific crossover | [61.60] |
| Microarrays (Lung + Leukemia) | GA | N | Silhouette + VRC + distance + centroid | Label + K | Centroid-based crossover | [61.61] |
| Microarrays | GA | N | Bayesian validation | Centroid | RWS, 1-point crossover, mutation value | [61.62, 63] |
| Protein structure | Chaotic GA | N | Max clustering coefficient | Binary | Specific crossover, specific mutation | [61.64] |
| Protein-protein functional interaction | MOEA | N | Cluster size + 3 problem-related functions | Centroid | No crossover, mutation: split + delete, merge | [61.65] |

K?: is the number of clusters known? (Y: yes, N: no); MOEA: multiobjective evolutionary algorithm; GA: genetic algorithm; K: number of
clusters; VRC: variance ratio criterion; XB: Xie-Beni cluster validity index; LS: local search; RWS: roulette wheel selection


Variable Number of Clusters.


• Encoding: For a variable number of clusters, where
  evolutionary algorithms aim at optimizing both the
  number of clusters and the partition of objects, the
  previously mentioned representations can be used,
  but there are also some new ones. For example, the
  number of clusters can be stored in the representation
  [61.61]. There are also rule-based representations
  [61.66] and graph-based representations.

• Operators: As the encodings can be more complex than
  in the case of a fixed number of clusters [61.61],
  operators are adapted to the representation and the
  context of clustering.

• Fitness function: For the evaluation, authors often
  use clustering validity criteria [61.67]. The
  silhouette coefficient is also frequently used to
  evaluate the quality of a clustering [61.60]. Authors
  may also add problem-related functions, as in [61.65].
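
As a rough illustration of how a silhouette-based fitness can score chromosomes with different numbers of clusters, the following sketch computes the mean silhouette width of a label-encoded partition. It is an assumption-laden example rather than the implementation of any cited work: the function name, the conventions for singleton clusters and single-cluster partitions, and the toy data are all chosen here for illustration. The number of clusters is not passed as a parameter; it is read from the labels present in the chromosome.

```python
import math


def silhouette(data, labels):
    """Mean silhouette width of a label-encoded partition (higher is better)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # convention assumed here: one cluster scores 0

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(data[i], data[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: silhouette taken as 0
        a = mean_dist(i, own)  # cohesion: mean distance inside own cluster
        b = min(mean_dist(i, m) for l, m in clusters.items() if l != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(data)


# Toy data (hypothetical): two groups of two objects.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
two_clusters = [0, 0, 1, 1]
three_clusters = [0, 0, 1, 2]
mixed = [0, 1, 0, 1]

for chromosome in (two_clusters, three_clusters, mixed):
    print(len(set(chromosome)), silhouette(data, chromosome))
```

Because the score is comparable across partitions with different numbers of clusters, it can rank the two-cluster chromosome above the three-cluster one here without k being fixed in advance.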


Biclustering. Biclustering (also called co-clustering
or two-mode clustering) aims at computing biclusters
(or co-clusters), which are associations of (possibly
overlapping) sets of objects with sets of features.
A biclustering algorithm simultaneously computes linked
partitions on both rows and columns. Many formulations
of the biclustering problem have been proposed, such as
the hierarchical model, the biclustering model, and the
pattern-based model. The term biclustering was
introduced by Cheng and Church in [61.68]. Up to now, in
the context of bioinformatics, biclustering approaches
have been proposed mainly to deal with microarray
data [61.69]. As biclusters may overlap in the two
dimensions of the matrix and no constraint is given on
their size, a very large number of significant
biclusters may be found. Hence, to obtain a concise
description of the data through biclusters, the size
aspect is often considered as an additional objective.
This leads to multiobjective models for which MOEAs
have been proposed [61.70, 71].


61.5 Conclusion


Bioinformatics research generates a lot of data, and the
knowledge extracted from this data is still basic. Much
more knowledge could be discovered through the proposal
of new data mining methods. Many of these data mining
problems can be modeled as combinatorial optimization
problems, and efficient algorithms such as evolutionary
algorithms can be used to explore the huge search space
of these problems. Some research has been conducted in
this direction, and the aim of this chapter was to
present the tendencies of these works. It shows that
some promising results have been obtained for several
applications in bioinformatics.

However, there is much room for future research, since
the problems in bioinformatics presented in this chapter
require even more effective approaches to gain important
knowledge from biological and biomedical experiments. In
particular, information about the domain under study is
still underutilized within research methods. Biological
aspects should be more present, which may be achieved
through more accurate modeling or the incorporation of
biological concepts within evaluation functions, for
example. This could lead to multiobjective models of
these problems, where classical data mining criteria are
used jointly with biological ones. Some interesting
works on multiobjective optimization in bioinformatics
and computational biology are reported in [61.72].

Another interesting perspective is the cooperation
between methods coming from different domains. Indeed,
several communities are working on these problems, and
each of them has acquired experience that can be
exploited by making them cooperate.


References


61.1 D. Corne, C. Dhaenens, L. Jourdan: Synergies be- 
tween operations research and data mining: The 
emerging use of multi-objective approaches, Eur. 
J. Oper. Res. 221(3), 469-479 (2012) 

61.2 F. Valafar: Pattern recognition techniques in mi- 
croarray data analysis, Ann. N. Y. Acad. Sci. 980(1), 
41-64 (2002) 

61.3 J.H. Moore, F.W. Asselbergs, S.M. Williams: Bioin- 
formatics challenges for genome-wide association 
studies, Bioinformatics 26(4), 445-455 (2010) 

61.4 R. Agrawal, T. Imielinski, A.N. Swami: Mining as- 
sociation rules between sets of items in large 
databases, Proc. 1993 ACM SIGMOD Int. Conf. Manag. 
Data (ACM, New York 1993) pp. 207-216 

61.5 R. Agrawal, R. Srikant: Fast algorithms for mining 
association rules in large databases, VLDB '94: Proc. 
20th Int. Conf. Very Large Data Bases (Morgan Kauf- 
mann, 1994) pp. 487-499 

61.6 C. Borgelt: Efficient implementations of a priori and 
eclat, Proc. 1st IEEE ICDM Workshop Freq. Item Set 
Min. Implement. (FIMI 2003) (2003), p. 90 

61.7 Y. Ye, C.-C. Chiang: A parallel apriori algorithm
for frequent itemsets mining, Proc. 4th Int. Conf. 
Softw. Eng. Res. Manag. Appl. (2006) pp. 87-94 

61.8 M.J. Zaki: Parallel sequence mining on shared- 
memory machines, J. Parallel Distrib. Comput. 
61(3), 401-426 (2001) 

61.9 G. Atluri, R. Gupta, G. Fang, G. Pandey, M. Stein- 
bach, V. Kumar: Association analysis techniques for 
bioinformatics problems, Proc. 1st Int. Conf. Bioin- 
form. Comput. Biol. (BICOB '09) (Springer, Berlin, 
Heidelberg 2009) pp. 1-13 

61.10 P. Carmona-Saez, M. Chagoyen, A. Rodriguez, 
O. Trelles, J. Carazo, A. Pascual-Montano: Integrated
analysis of gene expression by association
rules discovery, BMC Bioinformatics 7(1), 54 (2006) 

61.11 M. Khabzaoui, C. Dhaenens, E.-G. Talbi: A mul- 
ticriteria genetic algorithm to analyze microarray 
data, Evol. Comput., CEC2004. Congr., Vol. 2 (2004) 
pp. 1874-1881 


61.12 L. Jourdan, M. Khabzaoui, C. Dhaenens, E.-G. Talbi:
A hybrid evolutionary algorithm for knowledge dis- 
covery in microarray experiments. In: Handbook 
of Bioinspired Algorithms and Applications, ed. by 
S. Olariu, A.Y. Zomaya (CRC, London 2005) pp. 491- 
508 

61.13 P. Lanzi: Learning classifier systems: Then and now, 
Evol. Intell. 1, 63-82 (2008) 

61.14 M. Stout, J. Bacardit, J.D. Hirst, R.E. Smith, 
N. Krasnogor: Prediction of topological contacts 
in proteins using learning classifier systems, Soft 
Comput. J. 13(3), 245-258 (2009) 

61.15 J. Bacardit, E.K. Burke, N. Krasnogor: Improving 
the scalability of rule-based evolutionary learning, 
Memet. Comput. 1(1), 55-67 (2008) 

61.16 R. Slowinski, S. Greco, B. Matarazzo: Rough sets 
in decision making. In: Encyclopedia of Complexity 
and Systems Science, ed. by R.A. Meyers (Springer, 
New York 2009) pp. 7753-7787 

61.17 J. Komorowski, A. Øhrn, A. Skowron: The ROSETTA 
Rough Set Software System (Oxford Univ. Press, New 
York 2002), Chap. D.2.3. 

61.18 H. Strombergsson, P. Prusis, H. Midelfart, M. Lap- 
insh, J.E.S. Wikberg, J. Komorowski: Rough set- 
based proteochemometrics modeling of G-pro- 
tein-coupled receptor-ligand interactions, Pro- 
teins: Struct. Funct. Bioinform. 63(1), 24-34 (2006) 

61.19 S. Vinterbo, A. Øhrn: Minimal approximate hitting 
sets and rule templates, Int. J. Approx. Reason. 
25(2), 123-143 (2000) 

61.20 T. Fawcett: An introduction to ROC analysis, Pattern 
Recognit. Lett. 27(8), 861-874 (2006) 

61.21 Y.J. Cho, H. Kim, H.-B. Oh: Generating rules for predicting
MHC class I binding peptide using ANN and
knowledge-based GA, JDCTA: Int. J. Dig. Content 
Technol. Appl. 3, 111-119 (2009) 

61.22 G.L. Pappa, A.A. Freitas: Automatically evolving rule 
induction algorithms tailored to the prediction of 
postsynaptic activity in proteins, Intell. Data Anal. 
13, 243-259 (2009) 


61.23 Z.R. Yang, G. Lertmemongkolchai, G. Tan, P.L. Felgner,
R.W. Titball: A genetic programming approach for
Burkholderia pseudomallei diagnostic pattern discovery,
Bioinformatics 25(17), 2256-2262 (2009)

61.24 X. Llorá, R. Reddy, B. Matesic, R. Bhargava: Towards
better than human capability in diagnosing prostate cancer
using infrared spectroscopic imaging, GECCO '07 Proc. 9th
Annu. Conf. Genet. Evol. Comput. (2007)

61.25 A. Laegreid, T.R. Hvidsten, H. Midelfart, J. Komorowski,
A.K. Sandvik: Predicting gene ontology biological process
from temporal gene expression patterns, Genome Res. 13(5),
965-979 (2003)

61.26 J. Bacardit, M.V. Butz: Data mining in learning
classifier systems: Comparing XCS with GAssist. IWLCS
2003-2005, Lect. Notes Artif. Intell. 4399, 282-290 (2007)

61.27 L. Geng, H.J. Hamilton: Interestingness measures for
data mining: A survey, ACM Comput. Surv. (CSUR) 38(3), 9 (2006)

61.28 J. Bacardit: Pittsburgh Genetic-Based Machine Learning
in the Data Mining Era: Representations, Generalization, and
Run-Time, Ph.D. Thesis (Universitat Ramon Llull, Barcelona 2004)

61.29 J. Casillas, P. Martínez, A. Benítez: Learning
consistent, complete and compact sets of fuzzy rules in
conjunctive normal form for regression problems, Soft Comput.
Fus. Found. Methodol. Appl. 13, 451-465 (2009)

61.30 Y.S. Kim, W.M. Street, F. Menczer: Feature selection in
data mining. In: Data Mining: Opportunities and Challenges,
ed. by J. Wang (Idea Group, Hershey 2002) pp. 80-105

61.31 J. García-Nieto, E. Alba, L. Jourdan, E.-G. Talbi:
Sensitivity and specificity based multiobjective approach for
feature selection: Application to cancer diagnosis, Inf.
Process. Lett. 109, 887-896 (2009)

61.32 Y. Saeys, I. Inza, P. Larrañaga: A review of feature
selection techniques in bioinformatics, Bioinformatics 23(19),
2507-2517 (2007)

61.33 T.J. Umpai, S. Aitken: Feature selection and
classification for microarray data analysis: Evolutionary
methods for identifying predictive genes, BMC Bioinformatics
6(1), 148 (2005)

61.34 L. Li, D.M. Umbach, P. Terry, J.A. Taylor: Application
of the GA/KNN method to SELDI proteomics data, Bioinformatics
20(10), 1638-1640 (2004)

61.35 I.-S. Oh, J.-S. Lee, B.-R. Moon: Hybrid genetic
algorithms for feature selection, IEEE Trans. Pattern Anal.
Mach. Intell. 26(11), 1424-1437 (2004)

61.36 P. Xuan, M.Z. Guo, J. Wang, C.Y. Wang, X.Y. Liu, Y. Liu:
Genetic algorithm-based efficient feature selection for
classification of pre-miRNAs, Genet. Mol. Res. 10(2),
588-603 (2011)

61.37 S. Peng: Molecular classification of cancer types from
microarray data using the combination of genetic algorithms
and support vector machines, FEBS Letters 555(2), 358-362 (2003)


61.38 C.-L. Huang, C.-J. Wang: A GA-based feature selection
and parameters optimization for support vector machines,
Expert Syst. Appl. 31(2), 231-240 (2006)

61.39 E.-G. Talbi, L. Jourdan, J. García-Nieto, E. Alba:
Comparison of population based metaheuristics for feature
selection: Application to microarray data classification,
IEEE/ACS Int. Conf. Comput. Syst. Appl. (2008) pp. 45-52

61.40 J.C.H. Hernandez, B. Duval, J.-K. Hao: A genetic
embedded approach for gene selection and classification of
microarray data, Proc. 5th Eur. Conf. Evol. Comput. Mach.
Learn. Data Min. Bioinform. (EvoBIO'07) (Springer, Berlin,
Heidelberg 2007) pp. 90-101

61.41 E.B. Huerta, B. Duval, J.-K. Hao: A hybrid GA/SVM
approach for gene selection and classification of microarray
data, Lect. Notes Comput. Sci. 3907, 34-44 (2006)

61.42 D.P. Muni, N.R. Pal, J. Das: Genetic programming for
simultaneous feature selection and classifier design, IEEE
Trans. Syst. Man Cybern. Part B 36(1), 106-117 (2006)

61.43 J. Yu, J. Yu, A.A. Almal, S.M. Dhanasekaran, D. Ghosh,
W.P. Worzel, A.M. Chinnaiyan: Feature selection and molecular
classification of cancer using genetic programming, Neoplasia
9(4), 292-303 (2007)

61.44 J. Liu, H. Iba, M. Ishizuka: Selecting informative
genes with parallel genetic algorithms in tissue
classification, Genome Inform. Ser. 9, 14-23 (2001)

61.45 L. Jourdan, C. Dhaenens, E.-G. Talbi: Linkage
disequilibrium study with a parallel adaptive GA, Int.
J. Found. Comput. Sci. 16(2), 241-260 (2004)

61.46 D. Broadhurst, R. Goodacre, A. Jones, J.-J. Rowland,
D.B. Kell: Genetic algorithms as a method for variable
selection in multiple linear regression and partial least
squares regression, with applications to pyrolysis mass
spectrometry, Anal. Chim. Acta 348, 71-86 (1997)

61.47 A.W. Whitney: A direct method of nonparametric
measurement selection, IEEE Trans. Comput. C-20(9),
1100-1103 (1971)

61.48 M. Pei, E.D. Goodman, W.F. Punch: Feature extraction
using genetic algorithms, Proc. 1st Int. Symp. Intell. Data
Eng. Learn. (IDEAL), Vol. 98 (1998) pp. 371-384

61.49 U.M. Braga-Neto, E.R. Dougherty: Is cross-validation
valid for small-sample microarray classification?,
Bioinformatics 20(3), 374-380 (2004)

61.50 R. Xu, D. Wunsch: Survey of clustering algorithms,
IEEE Trans. Neural Netw. 16, 645-678 (2005)

61.51 J.B. MacQueen: Some methods for classification and
analysis of multivariate observations, Proc. 5th Berkeley
Symp. Math. Stat. Probab. (1967) pp. 281-297

61.52 E.R. Hruschka, R.J. Campello, A.A. Freitas, A.C. de
Carvalho: A survey of evolutionary algorithms for clustering,
IEEE Trans. Syst. Man Cybern. Part C 39(2), 133-155 (2009)

61.53 R.H. Sheikh, M.M. Raghuwanshi, A.N. Jaiswal: Genetic
algorithm based clustering: A survey, 1st Int. Conf. Emerg.
Trends Eng. Technol. ICETET '08. (2008) pp. 314-319

61.54 J. Handl, J. Knowles: An evolutionary approach to
multiobjective clustering, IEEE Trans. Evol. Comput. 11(1),
56-76 (2007)

61.55 J. Handl, J. Knowles: Evolutionary multiobjective
clustering, Parallel Problem Solving Nat. 3242, 1081-1091 (2004)

61.56 P.C. Ma, K.C. Chan, Y. Xin, D.K. Chiu: An evolutionary
clustering algorithm for gene expression microarray data
analysis, IEEE Trans. Evol. Comput. 10(3), 296-314 (2006)

61.57 P. Merz, A. Zell: Clustering gene expression profiles
with memetic algorithms, Proc. 7th Int. Conf. Parallel Problem
Solving Nat. (PPSN VII) (Springer, London 2002) pp. 811-820

61.58 S. Bandyopadhyay, A. Mukhopadhyay, U. Maulik: An
improved algorithm for clustering gene expression data,
Bioinformatics 23(21), 2859-2865 (2007)

61.59 K. Faceli, M. de Souto, D. de Araujo, A. de Carvalho:
Multi-objective clustering ensemble for gene expression data
analysis, Neurocomputing 72(13-15), 2763-2774 (2009)

61.60 E. Hruschka, L. de Castro, R. Campello: Evolutionary
algorithms for clustering gene-expression data, 4th IEEE Int.
Conf. Data Min. (ICDM '04) (2004) pp. 403-406

61.61 M.C. Naldi, A. de Carvalho: Clustering using genetic
algorithm combining validation criteria, Proc. 15th Eur. Symp.
Artif. Neural Netw. (2007) pp. 139-147

61.62 H.S. Park, S.H. Yoo, S.B. Cho: Evolutionary fuzzy
clustering algorithm with knowledge-based evaluation and
applications for gene expression profiling, J. Comput. Theor.
Nanosci. 2(4), 524-533 (2005)

61.63 H.S. Park, S.B. Cho: Evolutionary fuzzy cluster analysis
with bayesian validation of gene expression profiles,
J. Intell. Fuzzy Syst. 18(6), 543-559 (2007)


61.64 D. Hutchison, T. Kanade, J. Kittler, J.M. Kleinberg,
F. Mattern, J.C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu
Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar,
M.Y. Vardi, G. Weikum, H. Liu, J. Liu: Clustering protein
interaction data through chaotic genetic algorithm. In:
Simulated Evolution and Learning, Vol. 4247, ed. by T.-D. Wang,
X. Li, S.-H. Chen, X. Wang, H. Abbass, H. Iba, G.-L. Chen,
X. Yao (Springer, Berlin, Heidelberg 2006) pp. 858-864

61.65 J.J. Tapia, E.E. Vallejo, E. Morett: MOCEA:
A multi-objective clustering evolutionary algorithm for
inferring protein-protein functional interactions, Proc. 11th
Annu. Conf. Genet. Evol. Comput. (2009) pp. 1793-1794

61.66 I.A. Sarafis, P.W. Trinder, A.M.S. Zalzala: NOCEA:
A rule-based evolutionary algorithm for efficient and
effective clustering on massive high-dimensional databases
(invited paper), Int. J. Appl. Soft Comput. 7(3), 668-710 (2007)

61.67 J.J. Tapia, E. Morett, E.E. Vallejo: A clustering
genetic algorithm for genomic data mining. In: Foundations of
Computational Intelligence (4), Studies in Computational
Intelligence, Vol. 204, ed. by A. Abraham, A.E. Hassanien,
A.C.P.L. de Ferreira Carvalho (Springer, Berlin, Heidelberg
2009) pp. 249-275

61.68 Y. Cheng, G.M. Church: Biclustering of expression data,
Proc. 8th Int. Conf. Intell. Syst. Mol. Biol. (ISMB 2000),
San Diego (2000) pp. 93-103

61.69 F. Divina, J.S. Aguilar-Ruiz: Biclustering of
expression data with evolutionary computation, IEEE Trans.
Knowl. Data Eng. (2006) p. 18

61.70 S. Mitra, H. Banka: Multi-objective evolutionary
biclustering of gene expression data, Pattern Recognit.
39(12), 2464-2477 (2006)

61.71 K. Seridi, L. Jourdan, E.-G. Talbi: Multi-objective
evolutionary algorithm for biclustering in microarrays data,
IEEE Congr. Evol. Comput. (2011) pp. 2593-2599

61.72 J. Handl, D.B. Kell, J. Knowles: Multiobjective
optimization in bioinformatics and computational biology,
IEEE/ACM Trans. Comput. Biol. Bioinform. 4(2), 279-292 (2007)




62. Integration of Metaheuristics 
and Constraint Programming 


Luca Di Gaspero 


A promising research line in the optimization
community regards the hybridization of exact and
heuristic methods. In this chapter we survey the
specific integration of two complementary optimization
paradigms, namely constraint programming, for the exact
part, and metaheuristics.


62.1 Constraint Programming and Metaheuristics

62.2 Constraint Programming Essentials
62.2.1 Modeling
62.2.2 Solution Methods
62.2.3 Systems

62.3 Integration of Metaheuristics and CP
62.3.1 Local Search and CP
62.3.2 Genetic Algorithms and CP
62.3.3 ACO and CP

62.4 Conclusions

References


62.1 Constraint Programming and Metaheuristics 


Constraint programming (CP) [62.1, 2] is an effective
methodology for the solution of combinatorial problems
that has been successfully applied in many domains. In
a nutshell, CP is a declarative programming paradigm
based on the idea of describing the relations (i.e.,
constraints) between variables that must hold in all
solutions of the combinatorial problem at hand. For
example, in the solution to a Sudoku puzzle, the numbers
to be placed must be unique with respect to columns,
rows, and blocks of the board.

CP has an interdisciplinary nature, since it relies on
contributions and methods from the communities of logic
programming (LP), artificial intelligence (AI), and
operations research (OR). Indeed, the simple declarative
modeling language of CP, consisting of variables and
constraints, is very similar to those available in
classical LP languages such as Prolog. The solution
method features constraint propagation, which, in
essence, is a reasoning or inference procedure typical
of AI. Finally, especially for optimization problems,
the solution process makes use of OR-inspired
branch-and-bound procedures and/or of dedicated OR
solvers for specific types of variables/constraints
(e.g., the simplex method for real variables and linear
constraints).


A CP model is an encoding of the problem statement using
the basic CP building blocks, i.e., variables and
constraints. Once a CP model of the problem under
consideration has been stated, a CP solver is used to
systematically search the solution space by alternating
deterministic phases (constraint propagation) and
nondeterministic phases (variable assignment, tree
search), thus exploring implicitly or explicitly the
whole search space. In this respect, CP belongs to the
family of complete (or exact) solution methods. In other
words, CP guarantees finding the (optimal) solution of
the problem or proving that the problem is not
satisfiable.
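
The alternation of propagation and tree search can be sketched in a few lines. The following is a simplified illustration, not the algorithm of any particular CP system: it assumes finite domains and binary constraints given as Python predicates, and the names `propagate` and `solve` are hypothetical. Propagation repeatedly removes values that have no support in a neighboring domain; when propagation alone does not decide the problem, the solver branches on a variable and recurses.

```python
def propagate(domains, constraints):
    """Deterministic phase: prune unsupported values until a fixpoint.

    Returns False if some domain becomes empty (inconsistency detected).
    """
    changed = True
    while changed:
        changed = False
        for (x, y), allowed in constraints.items():
            # Check both directions of each binary constraint.
            directions = (
                (x, y, allowed),
                (y, x, lambda u, v, f=allowed: f(v, u)),
            )
            for a, b, ok in directions:
                supported = {
                    va for va in domains[a]
                    if any(ok(va, vb) for vb in domains[b])
                }
                if not supported:
                    return False
                if supported != domains[a]:
                    domains[a] = supported
                    changed = True
    return True


def solve(domains, constraints):
    """Complete search: returns one solution, or None if unsatisfiable."""
    domains = {v: set(d) for v, d in domains.items()}  # work on a copy
    if not propagate(domains, constraints):
        return None
    if all(len(d) == 1 for d in domains.values()):
        return {v: next(iter(d)) for v, d in domains.items()}
    # Nondeterministic phase: branch on the smallest undecided domain.
    var = min((v for v in domains if len(domains[v]) > 1),
              key=lambda v: len(domains[v]))
    for value in sorted(domains[var]):
        trial = dict(domains)
        trial[var] = {value}
        result = solve(trial, constraints)
        if result is not None:
            return result
    return None


different = lambda u, v: u != v
triangle = {("x", "y"): different, ("y", "z"): different, ("x", "z"): different}

print(solve({v: {1, 2, 3} for v in "xyz"}, triangle))
print(solve({v: {1, 2} for v in "xyz"}, triangle))
```

With three values per variable the search returns an assignment in which all variables differ; with only two values the whole search tree is exhausted and `None` is returned, which corresponds to a proof that the problem is not satisfiable.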

A different approach is usually taken by metaheuris- 
tics [62.3], such as local search [62.4], evolutionary 
algorithms [62.5], and ant colony optimization [62.6], 
just to name a few. These methods are incomplete, since 
they rely on heuristic information to focus on inter- 
esting areas of the search space and, in general, do 
not explore it entirely but are stopped after a given 
time limit. As a consequence, these algorithms do not 
guarantee finding the (optimal) solution, trading com- 
pleteness for a (possibly) greater (empirical) efficiency 
in the solution process. 

Just looking at completeness, it seems that the clear
choice for solving combinatorial problems would be to
always prefer CP over metaheuristics as the solution
method. However, in practice completeness is hindered by
the high computational effort due to the worst-case
complexity of the problems considered (usually
NP-complete or NP-hard). Therefore, for practical
purposes, the execution of CP solvers is also terminated
before the whole search space has been explored, and
a number of heuristics are used to focus the search on
the regions where solutions are more likely to be found.
Consequently, CP and metaheuristics can be seen as
complementary approaches.

Although these two kinds of methods have been studied
individually by separate scientific communities (for
historical reasons), in recent years we have witnessed
an increasing interest in their integration. In many
cases, indeed, each approach has its own strengths and
weaknesses, and the general aim of method integration is
to create hybrid algorithms that enhance the strengths
of both approaches and (possibly) overcome some of the
weaknesses. In this respect, Yunes maintains a web page
listing a number of success stories of hybrid solution
methods [62.7], that is, papers describing integrated
approaches that outperform single optimization methods.

A number of conferences and workshops specifically
aiming at bringing together researchers working on the
integration of solution techniques for combinatorial
problems have also recently started. Notable examples
are the series of CP-AI-OR conferences [62.8, 9],
started in 1999, and the Hybrid Metaheuristics workshops
[62.10-16], started in 2004. The scope of these
conferences is not limited to the integration of CP
techniques with metaheuristics, but they also consider
hybridization among other methods.

Additionally, a few surveys on the integration of
complete methods with metaheuristics have appeared in
the literature [62.17-19]. However, these surveys deal
with a particular class of metaheuristics (i.e., local
search) [62.17, 19] and/or with a different class of
complete methods (integer linear programming) [62.17,
18]. Jourdan et al. [62.20] also took CP methods into
account, but they provide mostly a taxonomy of
cooperation between optimization methods rather than
surveying the specific integrations. Wallace and Azevedo
et al. [62.21, 22] survey hybrid algorithms, but from
a constraint programming viewpoint and mainly in the
setting of hybrid exact methods. In their recent review
of hybrid metaheuristics, Blum et al. [62.23] include
a section on the integration of CP with local search and
ant colony optimization (ACO). However, to the best of
our knowledge, no specific survey on the integration of
metaheuristics and constraint programming has been
published so far. This work tries to fill this gap and
to review the different approaches specifically employed
in the integration of CP methods within metaheuristics.

The chapter is organized as follows. In Sect. 62.2 
the basic concepts of the constraint programming 
paradigm are introduced. They include modeling 
(Sect. 62.2.1), solution methods (Sect. 62.2.2), and CP 
systems (Sect. 62.2.3). The integration of CP with meta- 
heuristics is presented in Sect. 62.3, which is organized 
on the basis of the metaheuristic type involved in the 
integration. Finally, in Sect. 62.4 some conclusions are 
drawn. 


62.2 Constraint Programming Essentials 


In this section, we will briefly describe the essential 
concepts of CP, which are needed to understand the 
following sections. The readers interested in a more de- 
tailed introduction to CP are referred to the book of 
Apt [62.1] and to the recent comprehensive Handbook 
of Constraint Programming [62.2]. 

In order to apply constraint programming to a com- 
binatorial problem one first needs to model it through 
the specific formalism of constraint satisfaction or con- 
strained optimization problems. Afterwards, the model 
can be solved by a CP solver, which alternates the anal- 
ysis of constraints with tree search. Let us review these 
basic concepts. 


62.2.1 Modeling 


Constraint satisfaction problems (CSPs) are a useful
formalism for modeling many real-world problems, either
discrete or continuous. Remarkable examples are
planning, scheduling, and timetabling, just to name
a few. A CSP is generally defined as the problem of
associating values (taken from a set of domains) to
variables subject to a set of constraints. A solution of
a CSP is an assignment of values to all the variables so
that the constraints are satisfied. In some cases not
all solutions are equally preferable and we can
associate a cost function to the variable assignments.
In these cases, we talk about constrained optimization
problems (COPs), and we look for a solution that
(without loss of generality) minimizes the cost value.
These concepts are formally introduced in the following.


Constraint Satisfaction Problems 


Given:

• $X = \{x_1, \ldots, x_k\}$ is a set of variables.
• $D = \{D_1, \ldots, D_k\}$ is a set of domains associated
  to the variables. In other words, each variable $x_i$ can
  assume value $d_i$ if and only if $d_i \in D_i$.
• $\mathcal{C}$ is a set of constraints, i.e., mathematical
  relations over $\mathit{Dom} = D_1 \times \cdots \times D_k$.

We say that a tuple $\langle d_1, \ldots, d_k \rangle \in \mathit{Dom}$
satisfies a constraint $C \in \mathcal{C}$ if and only if
$\langle d_1, \ldots, d_k \rangle \in C$.

A constraint satisfaction problem (CSP) $P$, described by
the triple $\langle X, D, \mathcal{C} \rangle$, is the problem of
finding the tuples $d = \langle d_1, \ldots, d_k \rangle \in \mathit{Dom}$
that satisfy every constraint $C \in \mathcal{C}$. Such tuples are
called solutions of the CSP, and the set of solutions of $P$
is denoted by $\mathrm{sol}(P)$.

$P$ is said to be consistent or satisfiable if and only if
$\mathrm{sol}(P) \neq \emptyset$.

Notice that, depending on the modeling of the com- 
binatorial problem at hand, we could be interested in 
determining different properties of the CSP. In the ex- 
treme case, for example, one could just want to know 
whether the problem is satisfiable, regardless of the ac- 
tual solutions. The most common case is to search and 
provide a single solution to the problem, whereas some- 
times one could be interested in all the solutions. 
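
The definition of the solution set can be made concrete with a brute-force sketch that enumerates Dom and keeps the tuples satisfying every constraint. This is only an illustration of the formalism under assumed names (`solutions`, the toy variables and constraints); a real CP solver does not enumerate the full Cartesian product. The three questions mentioned above (satisfiability, one solution, all solutions) then correspond to three ways of consuming the same generator.

```python
from itertools import product


def solutions(variables, domains, constraints):
    """Yield every tuple of Dom = D_1 x ... x D_k that satisfies all constraints."""
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(constraint(assignment) for constraint in constraints):
            yield assignment


# Toy CSP (hypothetical): three variables over {1, 2, 3}.
variables = ["x", "y", "z"]
domains = {v: range(1, 4) for v in variables}
constraints = [
    lambda a: a["x"] != a["y"],
    lambda a: a["y"] != a["z"],
    lambda a: a["x"] + a["y"] == a["z"],
]

# Is the problem satisfiable, i.e., is sol(P) non-empty?
is_satisfiable = next(solutions(variables, domains, constraints), None) is not None

# A single solution (the first one found in enumeration order).
first_solution = next(solutions(variables, domains, constraints), None)

# All solutions, i.e., the whole set sol(P).
all_solutions = list(solutions(variables, domains, constraints))

print(is_satisfiable, first_solution, len(all_solutions))
```

Since the generator is lazy, the satisfiability check and the single-solution query stop at the first satisfying tuple, whereas listing all solutions traverses the whole of Dom.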


Constrained Optimization Problems 
A constrained optimization problem (COP)
$O = \langle X, D, \mathcal{C}, f \rangle$ is a CSP
$P = \langle X, D, \mathcal{C} \rangle$ with an associated
objective function $f : \mathrm{sol}(P) \to E$, where
$\langle E, \leq \rangle$ is a well-ordered set (typically,
$E$ is one of the sets $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{R}$).

Differently from the previous case, the tuples
$d \in \mathrm{sol}(O)$ that satisfy every constraint are called
feasible solutions, and the set of these tuples is usually
assumed to be non-empty. A solution of the COP $O$ is
a feasible solution $e \in \mathrm{sol}(O)$ for which the value
of the objective function $f$ is minimized, i.e.,


$$\forall d \in \mathrm{sol}(O): \quad f(e) \leq f(d).$$
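
The minimization condition above can likewise be illustrated by exhaustive enumeration. The sketch below is an assumption-based example (the name `solve_cop`, the toy constraints, and the objective are chosen here for illustration): it scans the feasible solutions and keeps one whose objective value is not larger than that of any other feasible solution. The objective is deliberately nonlinear.

```python
from itertools import product


def solve_cop(variables, domains, constraints, objective):
    """Return (solution, cost) minimizing the objective over feasible tuples.

    Returns (None, None) if there is no feasible solution.
    """
    best, best_cost = None, None
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if not all(constraint(assignment) for constraint in constraints):
            continue  # infeasible tuple
        cost = objective(assignment)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


# Toy COP (hypothetical): two variables over {0, ..., 4}.
variables = ["x", "y"]
domains = {v: range(0, 5) for v in variables}
constraints = [
    lambda a: a["x"] + a["y"] >= 4,
    lambda a: a["x"] != a["y"],
]
objective = lambda a: a["x"] ** 2 + 2 * a["y"]  # nonlinear cost function

best, best_cost = solve_cop(variables, domains, constraints, objective)
print(best, best_cost)
```

Only the strict comparison `cost < best_cost` is used, so in case of ties the first minimizer in enumeration order is kept; any of the tied tuples would satisfy the definition.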


Observations 
A few observations about this formalism are worth noting.
First, notice that the general framework does not
impose any restriction on either the type of domains
and constraints or on the form of the objective func- 
tion that can be used to express the problem. The basic 
type of domain is a finite set of integer values (also 
known as a finite domain), but there are other possibili- 
ties that enhance the expressive power of the modeling 
framework and capture some combinatorial substruc- 
tures of the problem more naturally. For example, it 
is possible to deal with variables whose values are fi- 
nite (multi)sets, (hyper)graphs, real valued intervals, or 
resources of a scheduling problem. Moreover, the set of
constraints that can be employed is also quite rich
and includes arithmetic constraints, set constraints, per- 
mutation, counting and other types of combinatorial 
constraints, resource scheduling constraints, path con- 
straints on graphs, and constraints expressible through 
regular expressions, just to name a few possibilities 
(see [62.24] for a comprehensive set of constraints and 
their implementation in actual CP systems). 

These features clearly make the modeling phase 
easier and more precise with respect to other for- 
malisms such as integer linear programming. Indeed, 
part of the combinatorial structure of the problem 
can be directly captured by the use of complex do- 
mains/constraints and, as for the objective function, 
there is no general limitation on its form, in particular, 
there is no assumption of linearity. 

Another important point to be noticed regards the 
role of constraints. Differently from other modeling 
formalisms, which distinguish between constraints that 
must be satisfied (called hard constraints) and that 
should preferably be satisfied (soft constraints), in the 
original CSP/COP framework constraints are all hard 
and the solution methods, described in the following 
section, consider it mandatory to satisfy all of them. 
There have been several attempts in the CP literature to 
include soft constraints in the general framework (see, 
e.g., [62.25] for a review) but the most common way to 
handle them is to include a measure of their violation in 
the objective function of the problem. 
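The most common treatment of soft constraints mentioned above can be sketched as follows: each violated soft constraint adds a penalty to the objective. The helper name `penalized`, the uniform weight, and the toy objective are assumptions of this example, not part of the CSP/COP framework itself.

```python
def penalized(f, soft_constraints, weight):
    """Return a new objective: f plus `weight` for each violated soft constraint."""
    def g(d):
        violated = sum(
            1 for scope, pred in soft_constraints
            if not pred(*(d[x] for x in scope))
        )
        return f(d) + weight * violated
    return g

# Hypothetical example: minimize x + y, and preferably keep x different from y.
f = lambda d: d["x"] + d["y"]
soft = [(("x", "y"), lambda x, y: x != y)]
g = penalized(f, soft, weight=10)

cost_violating = g({"x": 1, "y": 1})   # soft constraint violated
cost_satisfying = g({"x": 1, "y": 2})  # soft constraint satisfied
```

The weight controls the trade-off between the original objective and the degree of violation; choosing it is problem dependent.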


62.2.2 Solution Methods 


CP solution methods basically exploit a form of tree 
search that interleaves a branching phase with an anal- 
ysis of constraints called constraint propagation. These 
two components are described in the following. 


Branching and Tree Search 
Once the combinatorial problem has been modeled as
a CSP or a COP, CP solves it by constructing a solution
by a process that exploits a non-deterministic variable
assignment, where one variable is selected together with
one value in its current domain. This phase is also called
labeling using (constraint) logic programming termi- 
nology, and a solution to the problem is a complete 
labeling. The process proceeds by recursively checking 
whether the current labeling can be extended to a con- 
sistent solution or, in the negative case, undoing the 
current assignment. 

The pseudocode of the procedure, called (chrono- 
logical) backtracking, is given in Algorithm 62.1. 
The procedure is initially called with the full
set of variables and an empty labeling, i.e., as
Backtracking($X, \emptyset, \mathcal{C}, \mathrm{Dom}$). The procedure performs
an implicit form of tree search, where a branch is iden- 
tified by the selection of one variable (a node of the 
search tree) and all the possible values for that variable 
(the edges). 

Note that, at each step of the recursive procedure,
the choice of the variable and the value to branch on is
non-deterministic. Therefore, these choices can be guided
by heuristics to improve performance.

In addition, there are other possibilities for defining
a branching rule. For example, instead of selecting
a possible value for the selected variable (i.e., the
assignment $x_i := v$), the branching rule could split the
domain of a given variable $x_i$ in two by selecting a value
$v \in D_i$ and adding the constraint $x_i \le v$ on one branch
and $x_i > v$ on the other.
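The two branching rules can be contrasted with a short sketch. It assumes that the splitting rule partitions the domain into the values not greater than v and the values greater than v; the function names are hypothetical.

```python
def assignment_branches(domain):
    """Labeling rule: one branch per value, i.e., x := v for every v in the domain."""
    return [{v} for v in sorted(domain)]

def split_branches(domain, v):
    """Domain splitting rule: x <= v on one branch and x > v on the other."""
    left = {d for d in domain if d <= v}
    right = {d for d in domain if d > v}
    return left, right

domain = {1, 2, 3, 4, 5, 6}
by_value = assignment_branches({3, 1, 2})
left, right = split_branches(domain, 3)
```

Labeling produces as many children as there are values, whereas splitting always produces a binary tree and leaves the final choice of value to deeper levels.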


Consistency and Constraint Propagation 
The check for solution consistency does not need all 
the variables to be instantiated, in particular for de- 
tecting the unsatisfiability of the CSP with respect 
to some constraint. For example, in Algorithm 62.2, 
the most straightforward implementation of the proce- 
dure Consistent(L, C, Dom) is reported. The procedure 
simply checks whether the satisfiability of a given con- 
straint can be ascertained according to the current label- 
ing (i. e., if all of the constraint variables are assigned). 
However, the reasoning about the current labeling with 
respect to the constraints of the problem and the do- 
mains of the unlabeled variables does not necessarily 
need all the variables appearing in a constraint to be in- 
stantiated. Moreover, the analysis can prune (as a side 
effect) the domains of the unlabeled variables while 
preserving the set of solutions sol(P), making the ex- 
ploration of the subtree more effective. This phase is 
called constraint propagation and is interleaved with 
the variable assignment. In general, the analysis of each
constraint is repeated until a fixed point for the current
situation is reached. If one of the domains becomes
empty, consistency cannot be achieved and, consequently,
the procedure returns a fail.

Different notions of consistency can be employed.
For example, one of the most common and most studied
notions is hyper-arc consistency [62.26]. For a $k$-ary
constraint $C$, it checks the compatibility of a value $v$ in
the domain of one of the variables with the currently
possible combinations of values for the remaining $k - 1$
variables, pruning $v$ from the domain if no supporting
combination is found. The algorithms that maintain
hyper-arc consistency have a complexity that is polynomial
in the size of the problem (measured in terms of
number of variables/constraints and size of domains).
Other consistency notions have been introduced in the
literature, each with different pruning capabilities and
computational complexity; usually, the stronger the
pruning, the higher the computational cost.
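The pruning step and the fixed-point iteration described above can be sketched in a few lines. This is a naive illustration, assuming constraints are given as (scope, predicate) pairs and domains as mutable sets; it searches for supports by brute force and is not one of the optimized algorithms from the literature. The names `revise` and `propagate` are hypothetical.

```python
from itertools import product

def revise(scope, pred, domains):
    """Remove every value that has no supporting combination; report any change."""
    changed = False
    for pos, x in enumerate(scope):
        others = [domains[y] for i, y in enumerate(scope) if i != pos]
        for v in list(domains[x]):
            supported = any(
                pred(*(combo[:pos] + (v,) + combo[pos:]))
                for combo in product(*others)
            )
            if not supported:
                domains[x].remove(v)
                changed = True
    return changed

def propagate(constraints, domains):
    """Repeat the analysis until a fixed point; return False if a domain is empty."""
    changed = True
    while changed:
        changed = False
        for scope, pred in constraints:
            if revise(scope, pred, domains):
                changed = True
            if any(not domains[x] for x in scope):
                return False
    return True

lt = lambda a, b: a < b
domains = {"x": {1, 2, 3}, "y": {1, 2, 3}, "z": {1, 2, 3}}
ok = propagate([(("x", "y"), lt), (("y", "z"), lt)], domains)

cyclic = {"x": {1, 2}, "y": {1, 2}}
ok_cyclic = propagate([(("x", "y"), lt), (("y", "x"), lt)], cyclic)
```

In the first instance propagation alone reduces every domain to a single value, so no branching is needed; in the second it proves unsatisfiability by emptying a domain.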

One of the major drawbacks of (practical) consistency
notions is that they are local in nature, that is, they
just look at the current situation (partial labeling and
current domains). This means that they cannot detect
future inconsistencies due to the interaction
of variables. A basic technique, called forward checking,
can be used to mitigate this problem. This method
exploits a one-step look-ahead with respect to the current
assignment, i.e., it simulates the assignment of a pair
of variables, instead of a single one, thus evaluating the
next level of the tree through a consistency notion. This
technique can be generalized to several other problems.
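A common way to realize this look-ahead is sketched below: after a tentative assignment, every constraint with exactly one unlabeled variable left is used to filter that variable's domain. The function name `forward_check`, the (scope, predicate) encoding, and the graph coloring instance are assumptions of this example.

```python
def forward_check(x, v, labeling, constraints, domains):
    """Return pruned copies of the domains after x := v, or None on a wipe-out."""
    new_labeling = {**labeling, x: v}
    new_domains = {y: set(d) for y, d in domains.items()}
    new_domains[x] = {v}
    for scope, pred in constraints:
        free = [y for y in scope if y not in new_labeling]
        if x not in scope or len(free) != 1:
            continue
        y = free[0]
        for w in list(new_domains[y]):
            trial = {**new_labeling, y: w}
            if not pred(*(trial[z] for z in scope)):
                new_domains[y].discard(w)
        if not new_domains[y]:
            return None  # the next level of the tree has no consistent value
    return new_domains

# Hypothetical instance: color a triangle a-b-c with only two colors.
ne = lambda p, q: p != q
constraints = [(("a", "b"), ne), (("b", "c"), ne), (("a", "c"), ne)]
domains = {"a": {0, 1}, "b": {0, 1}, "c": {0, 1}}

after_a = forward_check("a", 0, {}, constraints, domains)
after_b = forward_check("b", 1, {"a": 0}, constraints, after_a)
```

The failure is detected when labeling b, one level before a plain consistency check on fully labeled constraints would notice it.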


Algorithm 62.1 Backtracking($U, L, \mathcal{C}, \mathrm{Dom}$)
 1: if $U = \emptyset$ then
 2:   return $L$
 3: end if
 4: pick variable $x_i \in U$
    /* possibly $x_i$ is selected non-deterministically */
 5: for $v \in D_i$ do  /* try to label $x_i$ with value $v$ */
 6:   $\mathrm{Dom}' \leftarrow \mathrm{Dom}$
 7:   if Consistent($L \cup \{x_i := v\}, \mathcal{C}, \mathrm{Dom}'$) then
      /* consistency notions can be different and have
         side effects on Dom */
 8:     $r \leftarrow$ Backtracking($U \setminus \{x_i\}, L \cup \{x_i := v\}, \mathcal{C}, \mathrm{Dom}'$)
 9:     if $r \neq \mathit{fail}$ then
10:       return $r$  /* a consistent assignment has
          been found for the variables in $U \setminus \{x_i\}$ with
          respect to $x_i := v$ */
11:     end if
12:   end if
13: end for
14: return fail  /* backtrack to the previous variable (no
    consistent assignment has been found for $x_i$) */


Algorithm 62.2 Consistent($L, \mathcal{C}, \mathrm{Dom}$)
 1: for $C \in \mathcal{C}$ do
 2:   if all variables in $C$ are labeled in $L$ $\wedge$ $C$ is not
      satisfied by $L$ then
 3:     return fail
 4:   end if
 5: end for
 6: return true
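Algorithms 62.1 and 62.2 can be turned into a short runnable sketch. The version below assumes a fixed variable order and domain order in place of the non-deterministic choices, encodes constraints as (scope, predicate) pairs, and omits the side effects on the domains; the 4-queens instance is a hypothetical test case.

```python
FAIL = None

def consistent(labeling, constraints):
    """Algorithm 62.2: fail only if a fully labeled constraint is violated."""
    for scope, pred in constraints:
        if all(x in labeling for x in scope) and not pred(*(labeling[x] for x in scope)):
            return False
    return True

def backtracking(unlabeled, labeling, constraints, domains):
    """Algorithm 62.1: chronological backtracking."""
    if not unlabeled:
        return labeling
    x = unlabeled[0]                     # variable selection: first unlabeled variable
    for v in domains[x]:                 # value selection: domain order
        extended = {**labeling, x: v}
        if consistent(extended, constraints):
            result = backtracking(unlabeled[1:], extended, constraints, domains)
            if result is not FAIL:
                return result
    return FAIL                          # backtrack to the previous variable

# Hypothetical instance: 4 queens, one per column; the value is the row.
n = 4
variables = [f"q{i}" for i in range(n)]
domains = {x: list(range(n)) for x in variables}
constraints = [
    ((variables[i], variables[j]),
     lambda a, b, gap=j - i: a != b and abs(a - b) != gap)
    for i in range(n) for j in range(i + 1, n)
]

solution = backtracking(variables, {}, constraints, domains)
```

Replacing `unlabeled[0]` and the iteration order over `domains[x]` with heuristic choices is where variable and value ordering strategies would plug in.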


Algorithm 62.3 BranchAndBound($U, L, \mathcal{C}, \mathrm{Dom}, f, b, L_b$)
 1: if $U = \emptyset$ then
 2:   if $f(L) < b$ then
 3:     $b \leftarrow f(L)$; $L_b \leftarrow L$
 4:   end if
 5: else
 6:   pick variable $x_i \in U$
      /* possibly $x_i$ is selected non-deterministically */
 7:   for $v \in D_i$ do  /* try to label $x_i$ with value $v$ */
 8:     $\mathrm{Dom}' \leftarrow \mathrm{Dom}$
 9:     if Consistent($L \cup \{x_i := v\}, \mathcal{C}, \mathrm{Dom}'$) $\wedge$
        bound($f, L \cup \{x_i := v\}, \mathrm{Dom}'$) $< b$ then
        /* additionally verify whether the current
           solution is bounded */
10:       BranchAndBound($U \setminus \{x_i\}, L \cup \{x_i := v\}, \mathcal{C}, \mathrm{Dom}', f, b, L_b$)
11:     end if
12:   end for
13: end if


Branch and Bound 

In the case of a COP, the problem is solved by exploring
the set $\mathrm{sol}(O)$ in the way described above, storing the
best value for $f$ found, as sketched in Algorithm 62.3. However,
a constraint analysis (bound($f, L \cup \{x_i := v\}, \mathrm{Dom}'$))
based on a partial assignment and on the best value
already computed might allow the search tree to be
pruned considerably. This complete search heuristic is called
(with a slight ambiguity with respect to the same concept
in operations research) branch and bound.
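A runnable sketch of Algorithm 62.3 follows. It assumes a fixed variable order, passes the incumbent (b and L_b) as a mutable dictionary, and uses a hypothetical bound that lets every unlabeled variable take its smallest value; the instance and all names are illustrative only.

```python
import math

def consistent(labeling, constraints):
    """Check only the constraints whose variables are all labeled."""
    return all(
        pred(*(labeling[x] for x in scope))
        for scope, pred in constraints
        if all(x in labeling for x in scope)
    )

def branch_and_bound(unlabeled, labeling, constraints, domains, f, bound, best):
    """Algorithm 62.3; `best` holds the incumbent value b and labeling L_b."""
    if not unlabeled:
        value = f(labeling)
        if value < best["value"]:
            best["value"], best["labeling"] = value, labeling
        return
    x = unlabeled[0]
    for v in domains[x]:
        extended = {**labeling, x: v}
        if consistent(extended, constraints) and bound(extended, domains) < best["value"]:
            branch_and_bound(unlabeled[1:], extended, constraints, domains, f, bound, best)

# Hypothetical instance: minimize x + 2y + 3z, all different, x + y >= 3.
weights = {"x": 1, "y": 2, "z": 3}
variables = ["x", "y", "z"]
domains = {v: [0, 1, 2, 3] for v in variables}
ne = lambda a, b: a != b
constraints = [
    (("x", "y"), ne), (("y", "z"), ne), (("x", "z"), ne),
    (("x", "y"), lambda x, y: x + y >= 3),
]
f = lambda d: sum(weights[v] * d[v] for v in variables)

def bound(labeling, domains):
    """Optimistic estimate: each unlabeled variable takes its smallest value."""
    return sum(weights[v] * labeling.get(v, min(domains[v])) for v in domains)

best = {"value": math.inf, "labeling": None}
branch_and_bound(variables, {}, constraints, domains, f, bound, best)
```

Because the bound never overestimates the cost of any completion, discarding branches whose bound is not below the incumbent keeps the search complete.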


62.2.3 Systems 


A number of practical CP systems are available.
They mostly differ with regard to the targeted programming
language and the modeling features available.
For historical reasons, the first constraint programming
systems were built around a Prolog system.
For example, SICStus Prolog [62.27] was one of
the first logic programming systems supporting constraint
programming; it is still developed and released
under a commercial license. Another Prolog-based
system specifically intended for constraint programming
is ECLiPSe [62.28], which, differently from
SICStus Prolog, is open source. Thanks to their
longevity, both systems cover many of the modeling
features described in the previous sections (such
as different types of domains, rich sets of constraints,
etc.).

Another notable commercial system specifically designed
for constraint programming is the ILOG CP
Optimizer, now developed by IBM [62.29]. This system
offers modeling capabilities either by means of
a dedicated modeling language (called OPL [62.30]) or
by means of a callable library accessible from different
imperative programming languages such as C/C++,
Java, and C#. The modeling capabilities of the system
are mostly targeted at scheduling problems, featuring
a very rich set of constructs for this kind of problem.
Interestingly, this system is currently available
at no cost for researchers through the IBM Academic
Initiative.

Open source alternatives that can be interfaced with 
the most common programming languages are the C++ 
libraries of Gecode [62.31], and the Java libraries of 
Choco [62.32]. Both systems are well documented and 
constantly developed. 

A different approach has been taken by other authors,
who developed a number of modeling languages
for constraint satisfaction and optimization problems
that can be interfaced to different types of general
purpose CP solvers. A notable example is MiniZinc
[62.33], which is an expressive modeling language for
CP. MiniZinc models are translated into a lower level
language, called FlatZinc, that can be compiled and
executed, for example, by Gecode, ECLiPSe, or SICStus
Prolog.

Finally, a mixed approach has been taken by the 
developers of Comet [62.34]. Comet is a hybrid CP 
system featuring a specific programming/modeling lan- 
guage and a dedicated solver. The system has been 
designed with hybridization in mind and, among other 
features, it natively supports the integration of meta- 
heuristics (especially in the family of local search 
methods) with CP. 


62.3 Integration of Metaheuristics and CP 


Differently from Wallace [62.21], we will review the 
integration of CP with metaheuristics from the perspec- 
tive of metaheuristics, and we classify the approaches 
on the basis of the type of metaheuristic employed. 
Moreover, following the categorization of Puchinger 
and Raidl [62.18], we are mostly interested in review- 
ing the integrative combinations of metaheuristics and 
constraint programming, i. e., those in which constraint 
programming is embedded as a component of a meta- 
heuristic to solve a subproblem or vice versa. 

Indeed, the types of collaborative combinations are 
either straightforward (e.g., collaborative-sequential ap- 
proaches using CP as a constructive algorithm for 
finding a feasible initial solution of a problem) or rather 
uninvestigated (e.g., parallel or intertwined hybrids of 
metaheuristics and CP). 


62.3.1 Local Search and CP 


Local search methods [62.4] are based on an iterative 
scheme in which the search moves from the current so- 
lution to an adjacent one on the basis of the exploration 
of a neighborhood obtained by perturbing the current
solution.

The hybridization of constraint programming with 
local search metaheuristics is the most studied one and 
there is an extensive literature on this subject. 


CP Within Local Search 

The integration of CP within local search methods is
the most mature form of integration. It dates back
to the mid-1990s [62.35], and two main streams are
identifiable in this respect. The first one consists in
defining the search for the candidate neighbor (e.g.,
the best one) as a constrained optimization problem.
The neighborhoods induced by these definitions can
be quite large; therefore, a variant of this technique
is known by the name of large neighborhood search
(LNS) [62.36]. The other kind of integration, lately
named constraint-based local search (CBLS) [62.34],
is based on the idea of expressing local search algorithms
by exploiting constraint programming primitives
in their control (e.g., for constraint checks during the
exploration of the neighborhood) [62.37]. In fact, the
two streams have a non-empty intersection, since the
CP primitives employed in CBLS could be used to
explore the neighborhood in an LNS fashion. In the
following sections we review some of the work in these
two areas.


A few surveys on the specific integration between 
local search and constraint programming exist, for ex- 
ample [62.38, 39]. 


Large Neighborhood Search. In LNS [62.36, 40] an
existing solution is not modified just by applying small
perturbations; instead, a large part of the problem
is perturbed and searched for improving solutions
in a sort of re-optimization approach. This part can be
represented by a set $F \subseteq X$ of released variables, called
fragment, which determines the neighborhood relation
$N$. Precisely, given a solution $s = (d_1, \dots, d_k)$ and a set
$F \subseteq \{x_1, \dots, x_k\}$ of free variables, then

$$N(s, F) = \{ (e_1, \dots, e_k) \in \mathrm{sol}(O) : (x_i \notin F) \Rightarrow (e_i = d_i) \}.$$


Given F, the neighborhood exploration is performed 
through CP methods (i. e., propagation and tree search). 

The pseudocode of the general LNS procedure is 
shown in Algorithm 62.4. Notice that in the proce- 
dure there are a few hotspots that can be customized. 
Namely, one of the key issues of this technique con- 
sists in the criterion for the selection of the set F 
given the current solution s, which is denoted by 
SelectFragment(s) in the algorithm. The most straight- 
forward way to select it is to randomly release a per- 
centage of the problem variables. However, the vari- 
ables in F could also be chosen in a structured 
way, i. e., by releasing related variables simultaneously. 
In [62.41], the authors compare the effectiveness of 
these two alternative choices in the solution of a job- 
shop scheduling problem. 

The upper bounds employed for the branch and
bound procedure can also be subject to a few design
alternatives. A possibility, for example, is to set the bound
value to $f(s_b)$, the best solution value found so far, so
that the procedure is forced to search at each step only
for improving solutions. This alternative can enhance
the technique when the propagation on the cost functions
is particularly effective in pruning the domains
of the released variables. At the opposite extreme,
instead, the upper bound could be set to an infinite value
so that a solution is searched for regardless of whether
it improves the cost function with respect to the current
incumbent.

Moreover, another design point is the solution
acceptance criterion, which is implemented by the
AcceptSolution function. In general, all the classical
local search solution acceptance criteria are applicable,
depending on the neighborhood selection
criterion employed. For example, in the case of
randomly released variables a Metropolis acceptance
criterion could be adequate to implement a sort of
simulated annealing.

Finally, the TerminateSearch criterion is one of 
those usually adopted in non-systematic search meth- 
ods, such as the expiration of a time/iteration budget, 
either absolute or relative, or the discovery of an opti- 
mal solution. 


Algorithm 62.4 LargeNeighborhoodSearch($X, \mathcal{C}, \mathrm{Dom}, f$)
 1: create a (feasible) initial solution $s_0 = (d_1^0, \dots, d_k^0)$
    /* possibly random or finding the first feasible
       solution of the full CP model */
 2: $s_b \leftarrow s_0$
 3: $i \leftarrow 0$
 4: while not TerminateSearch($i, s_i, f(s_i), s_b$) do
 5:   $F \leftarrow$ SelectFragment($s_i$)
      /* strategy for selecting the released variables */
 6:   $L \leftarrow \{x_j := d_j^i : x_j \notin F\}$
 7:   $U \leftarrow F$
 8:   BranchAndBound($U, L, \mathcal{C}, \mathrm{Dom}, f$, ChooseBounds($s_i, s_b$))
      /* neighborhood exploration */
 9:   if AcceptSolution($L$) then
10:     $s_{i+1} \leftarrow L$
11:     if $f(s_{i+1}) < f(s_b)$ then
12:       $s_b \leftarrow s_{i+1}$
13:     end if
14:   else
15:     $s_{i+1} \leftarrow s_i$
16:   end if
17:   $i \leftarrow i + 1$
18: end while
19: return $s_b$
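The control flow of Algorithm 62.4 can be sketched as follows. To keep the example short, the neighborhood is explored by exhaustive enumeration of the fragment rather than by CP propagation and tree search, the fragment is chosen at random, the bound is set to the cost of the current solution, and only improving solutions are accepted. These choices, the function names, and the small permutation instance are all assumptions of this illustration.

```python
import random
from itertools import product

def feasible(s, constraints):
    return all(pred(*(s[x] for x in scope)) for scope, pred in constraints)

def large_neighborhood_search(variables, domains, constraints, f, initial,
                              fragment_size, iterations, seed=0):
    rng = random.Random(seed)
    current, best = dict(initial), dict(initial)
    for _ in range(iterations):                          # TerminateSearch: iteration budget
        fragment = rng.sample(variables, fragment_size)  # SelectFragment: random release
        fixed = {x: current[x] for x in variables if x not in fragment}
        incumbent = current                              # bound set to f(current)
        for values in product(*(domains[x] for x in fragment)):
            s = {**fixed, **dict(zip(fragment, values))}
            if feasible(s, constraints) and f(s) < f(incumbent):
                incumbent = s
        current = incumbent                              # AcceptSolution: improving only
        if f(current) < f(best):
            best = current
    return best

# Hypothetical instance: assign distinct values 0..5, minimizing a weighted sum.
n = 6
variables = [f"x{i}" for i in range(n)]
domains = {x: range(n) for x in variables}
constraints = [((variables[i], variables[j]), lambda a, b: a != b)
               for i in range(n) for j in range(i + 1, n)]
f = lambda s: sum((i + 1) * s[x] for i, x in enumerate(variables))
initial = {x: i for i, x in enumerate(variables)}        # identity permutation, cost 70

best = large_neighborhood_search(variables, domains, constraints, f, initial,
                                 fragment_size=3, iterations=50)
```

Swapping the acceptance test for a Metropolis criterion, or the random fragment for a structured one, changes only the two lines marked AcceptSolution and SelectFragment.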


LNS has been successfully applied to routing problems
[62.36, 42-45], nurse rostering [62.46], university
course timetabling [62.47], protein structure predic- 
tion [62.48, 49], and car sequencing [62.50]. 

Cipriano et al. propose GELATO, a modeling
language and a hybrid solver specifically designed for 
LNS [62.51-53]. The system has been tested on a set 
of benchmark problems, such as the asymmetric travel- 
ing salesman problem, minimum energy broadcast, and 
university course timetabling. 

The developments of the LNS technique in the
wider perspective of very large neighborhood search
(VLNS) were recently reviewed by Pisinger and
Ropke [62.54]. Carchrae and Beck [62.55] also propose
a methodological contribution to this area with
some design principles for LNS.


Constraint-Based Local Search. The idea of encod- 
ing a local search algorithm by means of constraint 
programming primitives was originally due to Pesant 
and Gendreau [62.35, 56], although in their papers they 
focus on a framework that allows neighborhoods to be 
expressed by means of CP primitives. The basic idea 
is to extend the original CP model of the problem with 
a sort of surrogate model comprising a set of variables 
and constraints that intentionally describe a neighbor- 
hood of the current solution. 

Pseudocode for CBLS defined along these lines
is reported in Algorithm 62.5. The core of the procedure
is at line 5, which determines the neighborhood
model on the basis of the current solution. The main
components of the neighborhood model are the new set
of variables $Y$ and the constraints $\mathcal{C}_{X,Y}$ that act as an
interface of the neighborhood variables $Y$ with respect to
those of the original problem $X$. For example, the classical
swap neighborhood, which perturbs two
variables of the problem by exchanging their values,
can be modeled by the set $Y = \{y_1, y_2\}$, consisting of the
variables to exchange, and with the interface constraints

$$(y_1 = i \wedge y_2 = j) \Leftrightarrow (x_i = s_j \wedge x_j = s_i) \quad \forall i, j \in \{1, \dots, n\}.$$


Moreover, an additional component of the neighborhood
model is the evaluator of the move impact $\Delta f$,
which can usually be computed incrementally on the
basis of the single move.
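The incremental evaluation of a move can be illustrated on the swap neighborhood. The sketch assumes a separable cost f(s) = sum of c[i][s_i], for which only the two exchanged positions contribute to the difference; the cost matrix and the function names are hypothetical.

```python
def cost(s, c):
    """Full evaluation: f(s) = sum over i of c[i][s[i]]."""
    return sum(c[i][s[i]] for i in range(len(s)))

def swap_delta(s, c, i, j):
    """Change in cost if the values at positions i and j are exchanged."""
    return (c[i][s[j]] + c[j][s[i]]) - (c[i][s[i]] + c[j][s[j]])

def apply_swap(s, i, j):
    t = list(s)
    t[i], t[j] = t[j], t[i]
    return t

c = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
s = [0, 1, 2]

pairs = [(i, j) for i in range(len(s)) for j in range(i + 1, len(s))]
deltas = {(i, j): swap_delta(s, c, i, j) for i, j in pairs}
best_move = min(deltas, key=deltas.get)
```

The delta costs a constant number of lookups regardless of the number of variables, whereas recomputing `cost` from scratch is linear in it.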

It is worth noticing that the use of different mod- 
eling viewpoints is common practice in constraint pro- 
gramming. In classical CP modeling the different view- 
points usually offer a convenient way to express some 
constraint in a more concise or more efficient manner. 
The consistency between the viewpoints is maintained 
through the use of channeling constraints that link the 
different modelings. Similarly, although with a different 
purpose, in CBLS the linking between the full problem 
model and the neighborhood model is achieved through 
interface constraints. 


Algorithm 62.5 ConstraintBasedLocalSearch($X, \mathcal{C}_X, \mathrm{Dom}_X, f$)
 1: create a (feasible) initial solution $s_0 = (d_1^0, \dots, d_k^0)$
    /* possibly random or finding the first feasible
       solution of the original CP model */
 2: $s_b \leftarrow s_0$
 3: $i \leftarrow 0$
 4: while not TerminateSearch($i, s_i, f(s_i), s_b$) do
 5:   $(Y, \mathcal{C}_{X,Y}, \mathrm{Dom}_Y, \Delta f) \leftarrow$ NeighborhoodModel($s_i$)
 6:   $L \leftarrow \emptyset$
 7:   $U \leftarrow Y$
 8:   BranchAndBound($U, L, \mathcal{C}_{X,Y}, \mathrm{Dom}_Y, \Delta f$)
      /* neighborhood exploration */
 9:   if AcceptSolution($L$) then
10:     $s_{i+1} \leftarrow$ Apply($L, s_i$)
11:     if $f(s_{i+1}) < f(s_b)$ then
12:       $s_b \leftarrow s_{i+1}$
13:     end if
14:   else
15:     $s_{i+1} \leftarrow s_i$
16:   end if
17:   $i \leftarrow i + 1$
18: end while
19: return $s_b$


This stream of research has been revamped thanks
to the design of the Comet language [62.34, 57], the aim
of which is specifically to support declarative components
inspired by CP primitives for expressing local
search algorithms. An example of such primitives is
differentiable invariants [62.58], which are declarative
data structures that support incremental differentiation
to effectively evaluate the effect of local moves (i.e., the
$\Delta f$ in Algorithm 62.5). Moreover, Comet supports control
abstractions [62.59, 60] specifically designed for
local search, such as the neighbors construct, which
aims at expressing the unions of heterogeneous
neighborhoods. Finally, Comet has also been extended to
support distributed computing [62.61].

The embedding of local search within a constraint
programming environment and the employment of
a common programming language make it possible
to automate the synthesis of CBLS algorithms from
a high-level model expressed in Comet [62.62, 63]. The
synthesizer analyzes the combinatorial structure of the 
problem, expressed through the variables and the con- 
straints, and combines a set of basic recommendations, 
which are the basic constituents of the synthesized 
algorithm. 


Other Integrations. The idea of using local search to
explore a space of incomplete solutions (i.e., those
where not all variables have been assigned a value)
while exploiting constraint propagation has been pursued,
among others, by Jussien and Lhomme [62.64] for
an open-shop scheduling problem. Constraint propagation
in the spirit of forward checking and, more generally,
look-ahead has been effectively employed, among
others, by Schaerf [62.65] and Prestwich [62.66] for
scheduling and graph coloring problems, respectively.


Local Search Within CP 

Moving to the integration of local search within con- 
straint programming, the most common utilization of 
local search-like techniques consists in limiting the ex- 
ploration of the tree search only to paths that are “close” 
to a reference one. An example of such a procedure is 
limited discrepancy search (LDS) [62.67], an incom- 
plete method for tree search in which only neighboring 
paths of the search tree are explored, where the proxim- 
ity is defined in terms of different decision points called 
discrepancies. Only the paths (i. e., complete solutions) 
with at most k discrepancies are considered, as outlined 
in Algorithm 62.6. 


Algorithm 62.6 LimitedDiscrepancySearch($X, \mathcal{C}, \mathrm{Dom}, f, k$)
 1: $s^* \leftarrow$ FirstSolution($X, \mathcal{C}, \mathrm{Dom}, f$)
 2: $s_b \leftarrow s^*$
 3: for $i \in \{1, \dots, k\}$ do
 4:   for $t \in \{s : s$ differs w.r.t. $s^*$ for
      exactly $i$ variables$\}$ do
 5:     if Consistent($t, \mathrm{Dom}$) $\wedge$ $f(t) < f(s_b)$ then
 6:       $s_b \leftarrow t$
 7:     end if
 8:   end for
 9: end for
10: return $s_b$
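Algorithm 62.6 translates almost line by line into the following sketch, which enumerates the complete assignments that differ from the reference solution in exactly i variables, for i up to k. The reference solution is supplied by the caller instead of being computed by FirstSolution, and the instance and function names are assumptions of this example.

```python
from itertools import combinations, product

def feasible(s, constraints):
    return all(pred(*(s[x] for x in scope)) for scope, pred in constraints)

def limited_discrepancy_search(variables, domains, constraints, f, reference, k):
    """Explore solutions with at most k discrepancies from `reference`."""
    best = dict(reference)
    for i in range(1, k + 1):
        for changed in combinations(variables, i):
            alternatives = [[v for v in domains[x] if v != reference[x]] for x in changed]
            for values in product(*alternatives):
                t = {**reference, **dict(zip(changed, values))}
                if feasible(t, constraints) and f(t) < f(best):
                    best = t
    return best

# Hypothetical instance: minimize a + 2b + 3c subject to a + b + c >= 3.
variables = ["a", "b", "c"]
domains = {x: [0, 1, 2] for x in variables}
constraints = [(("a", "b", "c"), lambda a, b, c: a + b + c >= 3)]
f = lambda s: s["a"] + 2 * s["b"] + 3 * s["c"]
reference = {"a": 2, "b": 2, "c": 2}

one_discrepancy = limited_discrepancy_search(variables, domains, constraints, f, reference, 1)
two_discrepancies = limited_discrepancy_search(variables, domains, constraints, f, reference, 2)
```

Raising k widens the explored region around the reference path; on this instance two discrepancies already suffice to reach the optimum.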


Another approach due to Prestwich [62.68] is called 
incomplete dynamic backtracking. Differently from 
LDS, in this approach proximity is defined among par- 
tial solutions, and when backtracking needs to take 
place it is executed by randomly unassigning (at most) 
b variables. This way, the method could be intended as 
a local search on partial solutions. In fact, the method 
also features other CP machinery, such as forward 
checking, which helps in boosting the search. 

An alternative possibility is to employ local search
in constraint propagation. Local probing [62.69, 70] is
based on the partition of constraints into a set of
easy ones and a set of hard ones. At each choice point in
the search tree the set of easy constraints is dealt with by
a local search metaheuristic (namely simulated annealing),
while the hard constraints are handled by classical
constraint propagation. This idea generalizes the
approach of Zhang and Zhang [62.71], who first presented
such a combination. Another similar approach
was taken by Sellmann and Harvey [62.72], who used
local search to propagate redundant constraints.

In [62.73] the authors discuss the incorporation of 
the tabu search machinery within CP tree search. In 
particular, they look at the memory mechanisms for 
limiting the size of the tree and the elite candidate list 
for keeping the most promising choices in order to be 
evaluated first. 


62.3.2 Genetic Algorithms and CP 


A genetic algorithm [62.5] is an iterative metaheuris- 
tic in which a population of strings, which represent 
candidate solutions, evolves toward better solutions in 
a process that mimics natural evolution. The main com- 
ponents of the evolution process are crossover and 
mutation operators, which, respectively, combine two 
parent solutions generating an offspring and mutate 
a given solution. Another important component is the 
strategy for the offspring selection, which determines 
the population at the next iteration of the process. 

To the best of our knowledge, one of the first attempts
to integrate constraint programming and genetic
algorithms is due to Barnier and Brisset [62.74]. They
employ the following genetic representation: given
a CSP with variables $\{x_1, \dots, x_k\}$, the $i$-th gene in
the chromosomes is related to the variable $x_i$ and it
stores a subset of the domain $D_i$ that is allowed to be
searched. Each chromosome is then decoded by CP,
which searches for the best solution of the sub-CSP
induced by the restrictions in the domains. The genetic
operators used are a mutation operator that changes values
on the subdomain of randomly chosen genes and
a crossover operator that is based on a recombination of
the set-union of the subdomains of each pair of genes.
The method was applied to a vehicle routing problem
and outperformed both a CP and a GA solver.
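The representation just described can be sketched as follows: a chromosome maps each variable to a subdomain, and decoding searches the induced sub-CSP. Here the decoder is a brute-force search standing in for CP, and the crossover samples a fixed number of values from the union of the parents' subdomains; both are simplifying assumptions of this example and not necessarily the operators used in [62.74].

```python
import random
from itertools import product

def feasible(s, constraints):
    return all(pred(*(s[x] for x in scope)) for scope, pred in constraints)

def decode(chromosome, variables, constraints, f):
    """Best solution of the sub-CSP whose domains are the gene subdomains."""
    best = None
    for values in product(*(sorted(chromosome[x]) for x in variables)):
        s = dict(zip(variables, values))
        if feasible(s, constraints) and (best is None or f(s) < f(best)):
            best = s
    return best

def crossover(parent_a, parent_b, size, rng):
    """Each child gene samples `size` values from the union of the parents' genes."""
    child = {}
    for x in parent_a:
        union = sorted(parent_a[x] | parent_b[x])
        child[x] = set(rng.sample(union, min(size, len(union))))
    return child

# Hypothetical instance: x + y = 9 over 0..9, minimizing |x - y|.
variables = ["x", "y"]
constraints = [(("x", "y"), lambda x, y: x + y == 9)]
f = lambda s: abs(s["x"] - s["y"])

parent_a = {"x": {0, 1, 2}, "y": {7, 8, 9}}
parent_b = {"x": {4, 5}, "y": {4, 5}}
decoded_a = decode(parent_a, variables, constraints, f)
decoded_b = decode(parent_b, variables, constraints, f)
child = crossover(parent_a, parent_b, size=2, rng=random.Random(0))
```

A chromosome whose sub-CSP has no solution decodes to `None`; how such individuals are penalized or repaired is a separate design decision.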

A different approach, somewhat similar to local 
probing, was used in [62.75] for tackling a production 
scheduling problem. In this case, the problem variables 
are split into two sets, defining two coupled subprob- 
lems. The first set of variables is dealt with by the 
genetic algorithm, which determines a partial schedule. 
This partial solution is then passed to CP for complet- 
ing (and optimizing) the assignment of the remaining 
variables. 

Finally, CP has been used as a post-processing
phase for optimizing the current population in the spirit
of memetic algorithms. In [62.76] CP actually acts
as an infeasibility repairing method for a university
course timetabling problem, whereas in [62.77] the
optimization performed by CP on a flow-shop scheduling
problem is an alternative to the classical local search
applied in memetic algorithms. This approach is
illustrated in Algorithm 62.7.


Algorithm 62.7 A Memetic Algorithm with CP for
Flow-Shop Scheduling (adapted from [62.77])
 1: generate an initial population $P = \{p_1, \dots, p_l\}$ of
    permutations of $n$ jobs (each composed of $k$ tasks
    $\tau_{ij}$ whose start time and end time are denoted by $\sigma_{ij}$
    and $\eta_{ij}$, respectively)
 2: $g \leftarrow 0$
 3: while not TerminateSearch($g, P, \min_{p \in P} f(p)$) do
 4:   select $p_1$ and $p_2$ from $P$ by binary tournament
 5:   $c \leftarrow p_1 \otimes p_2$  /* apply crossover */
 6:   if $f(c) > \min_{p \in P} f(p)$ then
 7:     mutate $c$ under probability $p_m$
 8:   end if
 9:   decode $c = (c_1, \dots, c_n)$ to the set of precedence
      constraints $\mathcal{C} = \{\eta_{1 c_j} \le \sigma_{1 c_{j+1}} : j = 1, \dots, n - 1\}$
10:   $L \leftarrow \emptyset$
11:   $U \leftarrow \{\sigma_{ij}, \eta_{ij} : i = 1, \dots, k,\ j = 1, \dots, n\}$
12:   BranchAndBound($U, L, \mathcal{C} \cup \{\eta_{ij} \le \sigma_{i+1,j} : i = 1, \dots, k\}, \mathrm{Dom}, f$)
13:   if $f(c) > \max_{p \in P} f(p)$ then
14:     discard $c$
15:   else
16:     select $r$ by reverse binary tournament
17:     $c$ replaces $r$ in $P$
18:   end if
19:   $g \leftarrow g + 1$
20: end while
21: return $\arg\min_{p \in P} f(p)$


62.3.3 ACO and CP 


Ant colony optimization [62.6] is an iterative construc- 
tive metaheuristic, inspired by ant foraging behavior. 
The ACO construction process is driven by a proba- 
bilistic model, based on pheromone trails, which are 
dynamically adjusted by a learning mechanism. 

The first attempt to integrate ACO and CP is due 
to Meyer and Ernst [62.78], who apply the method for 
solving a job-shop scheduling problem. The proposed 
procedure employs ACO to learn the variable and value 
ordering used by CP for branching in the tree search. 
The solutions found by the CP procedure are fed back
to the ACO in order to update its probabilistic model.
In this approach, ACO can be conceived as a master 
online-learning branching heuristic aimed at enhancing 
the performance of a slave CP solver. 

A slightly different approach was taken by 
Khichane et al. [62.79, 80]. Their hybrid algorithm
works in two phases. At first, CP is employed to sample 
the space of feasible solutions, and the information col- 
lected is processed by the ACO procedure for updating 
the pheromone trails according to the solutions found 
by CP. In the second phase, the learned pheromone in- 
formation is employed as the value ordering used for 
CP branching. This approach, differently from the pre- 
vious one, uses the learning capabilities of ACO in an 
offline-learning fashion. 

More standard approaches in which CP is used to
keep track of the feasibility of the solution constructed
by ACO and to reduce the domains through constraint
propagation have been used by a number of authors.
Khichane et al. apply this idea to job-shop scheduling
[62.78] and car sequencing [62.79, 81]. Their general
idea is outlined in Algorithm 62.8, where each ant
maintains a partial assignment of values to variables.
The choice to extend the partial assignment with a new
variable/value pair is driven by the pheromone trails
and the heuristic factors in lines 7-8 through a standard
probabilistic selection rule. Propagation is employed at
line 10 to prune the possible values for the variables not
included in the current assignment.
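The construction loop of Algorithm 62.8 can be sketched for binary constraints as follows. This simplified version is an assumption-laden illustration: it uses a fixed variable order, selects values with probability proportional to the pheromone only (no heuristic factor), propagates by forward checking, and rewards the longest partial assignment of each cycle. The function names, parameters, and the 5-queens instance are hypothetical.

```python
import random

def ant_walk(variables, domains, conflict, pheromone, rng):
    """Build one assignment; return (assignment, complete_flag)."""
    live = {x: set(domains[x]) for x in variables}
    assignment = {}
    for x in variables:                      # fixed variable order, for brevity
        values = sorted(live[x])
        if not values:
            return assignment, False         # Failure: a domain was wiped out
        weights = [pheromone[(x, v)] for v in values]
        v = rng.choices(values, weights=weights)[0]
        assignment[x] = v
        for y in variables:                  # Propagate: forward checking
            if y not in assignment:
                live[y] = {w for w in live[y] if not conflict(x, v, y, w)}
    return assignment, True

def ant_cp(variables, domains, conflict, n_ants=5, max_cycles=100, rho=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = {(x, v): 1.0 for x in variables for v in domains[x]}
    for _ in range(max_cycles):
        walks = [ant_walk(variables, domains, conflict, pheromone, rng)
                 for _ in range(n_ants)]
        for assignment, complete in walks:
            if complete:
                return assignment
        longest = max((a for a, _ in walks), key=len)
        for key in pheromone:                # evaporation
            pheromone[key] *= 1.0 - rho
        for x, v in longest.items():         # reward the longest partial assignment
            pheromone[(x, v)] += 1.0
    return None

# Hypothetical instance: 5 queens, variable = column, value = row.
n = 5
variables = list(range(n))
domains = {x: range(n) for x in variables}
conflict = lambda x, v, y, w: v == w or abs(v - w) == abs(x - y)

solution = ant_cp(variables, domains, conflict)
```

Because propagation removes every value that conflicts with the partial assignment, any complete assignment returned by an ant is feasible by construction; the search may still return `None` if the cycle budget is exhausted.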


Another work along this line is due to Benedettini
et al. [62.82], who integrate a constraint propagation
phase for Boolean constraints to boost an ACO approach
for a bioinformatics problem (namely, haplotype
inference). Finally, in the same spirit as the previous idea,
Crawford et al. [62.83, 84] employ a look-ahead technique
within ACO and apply the method to solve set
covering and set partitioning problems.


Algorithm 62.8 Ant Constraint Programming
(adapted from [62.79])
 1: initialize all pheromone trails to $\tau_{\max}$
 2: $g \leftarrow 0$
 3: repeat
 4:   for $k \in \{1, \dots, n\}$ do
 5:     $A_k \leftarrow \emptyset$
 6:     repeat
 7:       select a variable $x_i \in X$ so that $x_i \notin \mathrm{var}(A_k)$
          according to the pheromone trail $\tau$
 8:       choose a value $v \in D_i$ according to the
          pheromone trail $\tau_i$ and a heuristic factor $\eta_i$
 9:       add $\{x_i := v\}$ to $A_k$
10:       Propagate($A_k, \mathcal{C}$)
11:     until $\mathrm{var}(A_k) = X$ or Failure
12:     update pheromone trails using $\{A_1, \dots, A_n\}$
13:   end for
14: until $\mathrm{var}(A_i) = X$ for some $i \in \{1, \dots, n\}$ or
    TerminateSearch($g, A_i$)


62.4 Conclusions


In this chapter we have reviewed the basic concepts of
constraint programming and its integration with
metaheuristics. Our main contribution is the attempt to give
a comprehensive overview of such integrations from the
viewpoint of metaheuristics.

We believe that the reason why these integrations
are very promising resides in the complementary merits
of the two approaches. Indeed, on the one hand,
metaheuristics are, in general, more suitable to deal
with optimization problems, but their treatment of
constraints can be very awkward, especially in the case of
tightly constrained problems. On the other hand,
constraint programming is specifically designed for finding
feasible solutions, but it is not particularly effective
for handling optimization. Consequently, a hybrid
algorithm that uses CP for finding feasible solutions and
metaheuristics to search among them has good chances
to outperform its single components.

Despite the important steps made in this field during
the last decade, there are still promising research
opportunities, especially in investigating topics such
as collaborative hybridization of CP and metaheuristics
and in validating existing integration approaches in the
as yet uninvestigated area of multiobjective optimization. We
believe that further research should devote more attention
to these aspects.


Integration of Metaheuristics and Constraint Programming 


References


62.1 K.R. Apt: Principles of Constraint Programming (Cambridge Univ. Press, Cambridge 2003)

62.2 F. Rossi, P. van Beek, T. Walsh: Handbook of Constraint Programming, Foundations of Artificial Intelligence (Elsevier Science, Amsterdam 2006)

62.3 M. Dorigo, M. Birattari, T. Stützle: Metaheuristic. In: Encyclopedia of Machine Learning, ed. by C. Sammut, G.I. Webb (Springer, Berlin, Heidelberg 2010) p. 662

62.4 H.H. Hoos, T. Stützle: Stochastic Local Search: Foundations & Applications (Morgan Kaufmann, San Francisco 2004)

62.5 C. Sammut: Genetic and evolutionary algorithms. In: Encyclopedia of Machine Learning, ed. by C. Sammut, G.I. Webb (Springer, Berlin, Heidelberg 2010) pp. 456-457

62.6 M. Dorigo, M. Birattari: Ant colony optimization. In: Encyclopedia of Machine Learning, ed. by C. Sammut, G.I. Webb (Springer, Berlin, Heidelberg 2010) pp. 36-39

62.7 T. Yunes: Success stories in integrated optimization (2005) http://moya.bus.miami.edu/~tallys/integrated.php

62.8 W.J. van Hoeve: CPAIOR conference series (2010) available online from http://www.andrew.cmu.edu/user/vanhoevelcpaior/

62.9 P. van Hentenryck, M. Milano (Eds.): Hybrid Optimization: The Ten Years of CPAIOR, Springer Optimization and Its Applications, Vol. 45 (Springer, Berlin 2011)

62.10 C. Blum, A. Roli, M. Sampels (Eds.): Hybrid Metaheuristics, First International Workshop (HM 2004), Valencia (2004)

62.11 M.J. Blesa, C. Blum, A. Roli, M. Sampels (Eds.): Hybrid Metaheuristics: Second International Workshop (HM 2005), Lecture Notes in Computer Science, Vol. 3636 (Springer, Berlin, Heidelberg 2005)

62.12 F. Almeida, M.J. Blesa Aguilera, C. Blum, J.M. Moreno-Vega, M. Pérez, A. Roli, M. Sampels (Eds.): Hybrid Metaheuristics: Third International Workshop, Lecture Notes in Computer Science, Vol. 4030 (Springer, Berlin, Heidelberg 2006)

62.13 T. Bartz-Beielstein, M.J. Blesa Aguilera, C. Blum, B. Naujoks, A. Roli, G. Rudolph, M. Sampels (Eds.): Hybrid Metaheuristics: 4th International Workshop (HM 2007), Lecture Notes in Computer Science, Vol. 4771 (Springer, Berlin, Heidelberg 2007)

62.14 M.J. Blesa, C. Blum, C. Cotta, A.J. Fernández, J.E. Gallardo, A. Roli, M. Sampels (Eds.): Hybrid Metaheuristics: 5th International Workshop (HM 2008), Lecture Notes in Computer Science, Vol. 5296 (Springer, Berlin, Heidelberg 2008)

62.15 M.J. Blesa, C. Blum, L. Di Gaspero, A. Roli, M. Sampels, A. Schaerf (Eds.): Hybrid Metaheuristics: 6th International Workshop (HM 2009), Lecture Notes in Computer Science, Vol. 5818 (Springer, Berlin, Heidelberg 2009)

62.16 M.J. Blesa, C. Blum, G.R. Raidl, A. Roli, M. Sampels (Eds.): Hybrid Metaheuristics: 7th International Workshop (HM 2010), Lecture Notes in Computer Science, Vol. 6373 (Springer, Berlin, Heidelberg 2010)

62.17 I. Dumitrescu, T. Stützle: Combinations of local search and exact algorithms, Lect. Notes Comput. Sci. 2611, 211-223 (2003)

62.18 J. Puchinger, G. Raidl: Combining metaheuristics and exact algorithms in combinatorial optimization: A survey and classification, Lect. Notes Comput. Sci. 3562, 113-124 (2005)

62.19 S. Fernandes, H. Ramalhinho Dias Lourenço: Hybrids combining local search heuristics with exact algorithms, V Congr. Esp. Metaheurísticas, Algoritm. Evol. Bioinspirados (MAEB2007), Tenerife, ed. by F. Rodriguez, B. Mélian, J.A. Moreno, J.M. Moreno (2007) pp. 269-274

62.20 L. Jourdan, M. Basseur, E.-G. Talbi: Hybridizing exact methods and metaheuristics: A taxonomy, Eur. J. Oper. Res. 199(3), 620-629 (2009)

62.21 M. Wallace: Hybrid algorithms in constraint programming, Lect. Notes Comput. Sci. 4651, 1-32 (2007)

62.22 F. Azevedo, P. Barahona, F. Fages, F. Rossi (Eds.): Recent Advances in Constraints: 11th Annual ERCIM International Workshop on Constraint Solving and Constraint Logic Programming (CSCLP 2006), Lecture Notes in Computer Science, Vol. 4651 (Springer, Berlin, Heidelberg 2007)

62.23 C. Blum, J. Puchinger, G.R. Raidl, A. Roli: Hybrid metaheuristics in combinatorial optimization: A survey, Appl. Soft Comput. 11(6), 4135-4151 (2011)

62.24 N. Beldiceanu, H. Simonis: Global constraint catalog (2011), available online from http://www.emn.fr/z-info/sdemasse/gccat/

62.25 P. Meseguer, F. Rossi, T. Schiex: Soft constraints. In: Handbook of Constraint Programming, Foundations of Artificial Intelligence, ed. by F. Rossi, P. van Beek, T. Walsh (Elsevier, Amsterdam 2006)

62.26 A.K. Mackworth: Consistency in networks of relations, Artif. Intell. 8(1), 99-118 (1977)

62.27 SICStus Prolog homepage, available online from http://www.sics.se/isl/sicstuswwwi/site/index.html

62.28 K.R. Apt, M. Wallace: Constraint Logic Programming Using Eclipse (Cambridge Univ. Press, Cambridge 2007)

62.29 ILOG CP optimizer, available online from http://www-01.ibm.com/software/integration/optimization/cplex-cp-optimizer/

62.30 P. van Hentenryck: The OPL Optimization Programming Language (MIT Press, Cambridge 1999)

62.31 Gecode Team: Gecode: Generic constraint development environment (2006), available online from http://www.gecode.org

62.32 CHOCO Team: Choco: An open source java constraint programming library, Res. Rep. 10-02-INFO (Ecole des Mines de Nantes, Nantes 2010)

62.33 N. Nethercote, P.J. Stuckey, R. Becket, S. Brand, G.J. Duck, G. Tack: Minizinc: Towards a standard CP modelling language, Lect. Notes Comput. Sci. 4741, 529-543 (2007)

62.34 P.V. Hentenryck, L. Michel: Constraint-Based Local Search (MIT Press, Cambridge 2005)

62.35 G. Pesant, M. Gendreau: A view of local search in constraint programming, Lect. Notes Comput. Sci. 1118, 353-366 (1996)

62.36 P. Shaw: Using constraint programming and local search methods to solve vehicle routing problems, Lect. Notes Comput. Sci. 1520, 417-431 (1998)

62.37 B.D. Backer, V. Furnon, P. Shaw, P. Kilby, P. Prosser: Solving vehicle routing problems using constraint programming and metaheuristics, J. Heuristics 6(4), 501-523 (2000)

62.38 F. Focacci, F. Laburthe, A. Lodi: Local search and constraint programming. In: Handbook of Metaheuristics, ed. by F. Glover, G. Kochenberger (Kluwer, Boston 2003) pp. 369-403

62.39 P. Shaw: Constraint programming and local search hybrids. In: Hybrid Optimization, Springer Optimization and Its Applications, Vol. 45, ed. by P. van Hentenryck, M. Milano (Springer, Berlin, Heidelberg 2011) pp. 271-303

62.40 L. Perron, P. Shaw, V. Furnon: Propagation guided large neighborhood search, Lect. Notes Comput. Sci. 3258, 468-481 (2004)

62.41 E. Danna, L. Perron: Structured vs. unstructured large neighborhood search: A case study on job-shop scheduling problems with earliness and tardiness costs, Lect. Notes Comput. Sci. 2833, 817-821 (2003)

62.42 Y. Caseau, F. Laburthe, G. Silverstein: A meta-heuristic factory for vehicle routing problems, Lect. Notes Comput. Sci. 1713, 144-158 (1999)

62.43 L.M. Rousseau, M. Gendreau, G. Pesant: Using constraint-based operators to solve the vehicle routing problem with time windows, J. Heuristics 8(1), 43-58 (2002)

62.44 S. Jain, P. van Hentenryck: Large neighborhood search for dial-a-ride problems, Lect. Notes Comput. Sci. 6876, 400-413 (2011)

62.45 J.H.-M. Lee (Ed.): Principles and Practice of Constraint Programming - CP 2011 - 17th International Conference, CP 2011, Perugia, Italy, September 12-16, 2011, Proceedings, Lecture Notes in Computer Science, Vol. 6876 (Springer, Berlin, Heidelberg 2011)

62.46 R. Cipriano, L. Di Gaspero, A. Dovier: Hybrid approaches for rostering: A case study in the integration of constraint programming and local search, Lect. Notes Comput. Sci. 4030, 110-123 (2006)

62.47 H. Cambazard, E. Hebrard, B. O'Sullivan, A. Papadopoulos: Local search and constraint programming for the post enrolment-based course timetabling problem, Ann. Oper. Res. 194(1), 111-135 (2012)

62.48 I. Dotu, M. Cebrián, P. van Hentenryck, P. Clote: Protein structure prediction with large neighborhood constraint programming search. In: Principles and Practice of Constraint Programming, ed. by I. Dotu, M. Cebrián, P. van Hentenryck, P. Clote (Springer, Berlin, Heidelberg 2008) pp. 82-96

62.49 R. Cipriano, A. Dal Palù, A. Dovier: A hybrid approach mixing local search and constraint programming applied to the protein structure prediction problem, Proc. Workshop Constraint Based Methods Bioinform. (WCB 2008), Paris (2008)

62.50 L. Perron, P. Shaw: Combining forces to solve the car sequencing problem, Lect. Notes Comput. Sci. 3011, 225-239 (2004)

62.51 R. Cipriano, L. Di Gaspero, A. Dovier: A hybrid solver for Large Neighborhood Search: Mixing Gecode and EasyLocal++, Lect. Notes Comput. Sci. 5818, 141-155 (2009)

62.52 R. Cipriano: On the hybridization of constraint programming and local search techniques: Models and software tools, Lect. Notes Comput. Sci. 5366, 803-804 (2008)

62.53 R. Cipriano: On the Hybridization of Constraint Programming and Local Search Techniques: Models and Software Tools, Ph.D. Thesis (PhD School in Computer Science, University of Udine, Udine 2011)

62.54 D. Pisinger, S. Ropke: Large neighborhood search. In: Handbook of Metaheuristics, ed. by M. Gendreau, J.-Y. Potvin (Springer, Berlin, Heidelberg 2010) pp. 399-420, 2nd edn., Chap. 13

62.55 T. Carchrae, J.C. Beck: Principles for the design of large neighborhood search, J. Math. Model. Algorithms 8(3), 245-270 (2009)

62.56 G. Pesant, M. Gendreau: A constraint programming framework for local search methods, J. Heuristics 5(3), 255-279 (1999)

62.57 L. Michel, P. van Hentenryck: A constraint-based architecture for local search, Proc. 17th ACM SIGPLAN Object-oriented Program. Syst. Lang. Appl. (OOPSLA '02), New York (2002) pp. 83-100

62.58 P. van Hentenryck, L. Michel: Differentiable invariants, Lect. Notes Comput. Sci. 4204, 604-619 (2006)

62.59 P. van Hentenryck, L. Michel: Control abstractions for local search, J. Constraints 10(2), 137-157 (2005)

62.60 P. van Hentenryck, L. Michel: Nondeterministic control for hybrid search, Lect. Notes Comput. Sci. 3524, 863-864 (2005)

62.61 L. Michel, A. See, P. van Hentenryck: Distributed constraint-based local search, Lect. Notes Comput. Sci. 4204, 344-358 (2006)

62.62 P. van Hentenryck, L. Michel: Synthesis of constraint-based local search algorithms from high-level models, 22nd Natl. Conf. Artif. Intell. AAAI, Vol. 1 (2007) pp. 273-278

62.63 S.A. Mohamed Elsayed, L. Michel: Synthesis of search algorithms from high-level cp models, Lect. Notes Comput. Sci. 6876, 256-270 (2011)

62.64 N. Jussien, O. Lhomme: Local search with constraint propagation and conflict-based heuristic, Artif. Intell. 139(1), 21-45 (2002)

62.65 A. Schaerf: Combining local search and look-ahead for scheduling and constraint satisfaction problems, 15th Int. Joint Conf. Artif. Intell. (IJCAI-97), Nagoya (1997) pp. 1254-1259

62.66 S. Prestwich: Coloration neighbourhood search with forward checking, Ann. Math. Artif. Intell. 34, 327-340 (2002)

62.67 W.D. Harvey, M.L. Ginsberg: Limited discrepancy search, 14th Int. Joint Conf. Artif. Intell., Montreal (1995) pp. 607-613

62.68 S. Prestwich: Combining the scalability of local search with the pruning techniques of systematic search, Ann. Oper. Res. 115(1), 51-72 (2002)

62.69 O. Kamarainen, H. Sakkout: Local probing applied to scheduling, Lect. Notes Comput. Sci. 2470, 81-103 (2006)

62.70 O. Kamarainen, H. El Sakkout: Local probing applied to network routing, Lect. Notes Comput. Sci. 3011, 173-189 (2004)

62.71 J. Zhang, H. Zhang: Combining local search and backtracking techniques for constraint satisfaction, Proc. 13th Natl. Conf. Artif. Intell. (AAAI96) (1996) pp. 369-374

62.72 M. Sellmann, W. Harvey: Heuristic constraint propagation, Lect. Notes Comput. Sci. 2470, 319-325 (2006)

62.73 M. Dell'Amico, A. Lodi: On the integration of metaheuristic strategies in constraint programming. In: Metaheuristic Optimization Via Memory and Evolution: Tabu Search and Scatter Search, Operations Research/Computer Science Interfaces, Vol. 30, ed. by C. Rego, B. Alidaee (Kluwer, Boston 2005) pp. 357-371, Chap. 16

62.74 N. Barnier, P. Brisset: Combine & conquer: Genetic algorithm and CP for optimization, Lect. Notes Comput. Sci. 1520, 463-463 (1998)

62.75 H. Hu, W.-T. Chan: A hybrid GA-CP approach for production scheduling, 5th Int. Conf. Nat. Comput. (2009) pp. 86-91

62.76 S. Deris, S. Omatu, H. Ohta, P. Saad: Incorporating constraint propagation in genetic algorithm for university timetable planning, Eng. Appl. Artif. Intell. 12(3), 241-253 (1999)

62.77 A. Jouglet, C. Oguz, M. Sevaux: Hybrid flow-shop: a memetic algorithm using constraint-based scheduling for efficient search, J. Math. Model. Algorithms 8(3), 271-292 (2009)

62.78 B. Meyer, A. Ernst: Integrating ACO and constraint propagation, Lect. Notes Comput. Sci. 3172, 166-177 (2004)

62.79 M. Khichane, P. Albert, C. Solnon: CP with ACO. In: Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, ed. by L. Perron, M.A. Trick (Springer, Berlin, Heidelberg 2008) pp. 328-332

62.80 M. Khichane, P. Albert, C. Solnon: Strong combination of ant colony optimization with constraint programming optimization, Lect. Notes Comput. Sci. 6140, 232-245 (2010)

62.81 M. Khichane, P. Albert, C. Solnon: Integration of ACO in a constraint programming language, Lect. Notes Comput. Sci. 5217, 84-95 (2008)

62.82 S. Benedettini, A. Roli, L. Di Gaspero: Two-level ACO for haplotype inference under pure parsimony, Lect. Notes Comput. Sci. 5217, 179-190 (2008)

62.83 B. Crawford, C. Castro: Integrating lookahead and post processing procedures with ACO for solving set partitioning and covering problems, Lect. Notes Comput. Sci. 4029, 1082-1090 (2006)

62.84 B. Crawford, C. Castro, E. Monfroy: Constraint programming can help ants solving highly constrained combinatorial problems, ICSOFT 2008 - Proc. 3rd Int. Conf. Software Data Technol., INSTICC, Porto (2008) pp. 380-383



63. Graph Coloring and Recombination 


Rhyd Lewis 


It is widely acknowledged that some of the most 
powerful algorithms for graph coloring involve the 
combination of evolutionary-based methods with 
exploitative local search-based techniques. This 
chapter conducts a review and discussion of such 
methods, principally focussing on the role that re- 
combination plays in this process. In particular we 
observe that, while in some cases recombination 
seems to be usefully combining substructures in- 
herited from parents, in other cases it is merely 
acting as a macro perturbation operator, helping 
to reinvigorate the search from time to time. 


63.1 Graph Coloring
63.2 Algorithms for Graph Coloring
     63.2.1 EAs for Graph Coloring
63.3 Setup
     63.3.1 Problem Instances
63.4 Experiment 1
63.5 Experiment 2
63.6 Conclusions and Discussion
References


63.1 Graph Coloring


Graph coloring is a well-known NP-hard combinatorial
optimization problem that involves using a minimal
number of colors to paint all vertices in a graph such
that all adjacent vertices are allocated different colors.
The problem is more formally stated as follows:
given an undirected simple graph $G = (V, E)$, with vertex
set $V$ and edge set $E$, our task is to assign each vertex
$v \in V$ an integer $c(v) \in \{1, 2, \ldots, k\}$ so that:

• $c(v) \ne c(u) \;\; \forall \{v, u\} \in E$
• $k$ is minimal.

Though essentially a theoretical problem, graph
coloring is seen to underpin a wide variety of
seemingly unrelated operational research problems,
including satellite scheduling [63.1], educational
timetabling [63.2,3], sports league scheduling [63.4],
frequency assignment problems [63.5,6], map coloring
[63.7], airline crew scheduling [63.8], and compiler
register allocation [63.9]. The design of effective
algorithms for graph coloring thus has positive implications
for a large range of real-world problems.

Some common terms used with graph coloring are
as follows:

• A coloring of a graph is called complete if all vertices
  $v \in V$ are assigned a color $c(v) \in \{1, \ldots, k\}$;
  else it is considered partial.

• A clash describes a situation where a pair of adjacent
  vertices $u, v \in V$ are assigned the same color
  (that is, $\{u, v\} \in E$ and $c(v) = c(u)$). If a coloring
  contains no clashes, then it is considered proper;
  else it is improper.

• A coloring is feasible if and only if it is both complete
  and proper.

• The chromatic number of a graph $G$, denoted $\chi(G)$,
  is the minimal number of colors required in a feasible
  coloring. If a feasible coloring uses $\chi(G)$ colors,
  it is considered optimal.

• An independent set is a subset of vertices $I \subseteq V$
  that are mutually non-adjacent. That is, $\forall u, v \in I$,
  $\{u, v\} \notin E$. Similarly, a clique is a subset of vertices
  $C \subseteq V$ that are mutually adjacent: $\forall u, v \in C$,
  $\{u, v\} \in E$.
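The terms defined above translate directly into small predicates. The following Python sketch is illustrative only: the five-vertex graph, the dictionary-based coloring, and all function names are hypothetical choices made for this example and do not correspond to the graph of Fig. 63.1.

```python
from itertools import combinations

def is_complete(coloring, vertices, k):
    """Every vertex has a color in {1, ..., k}."""
    return all(coloring.get(v) in range(1, k + 1) for v in vertices)

def is_proper(coloring, edges):
    """No pair of adjacent, colored vertices shares a color (no clashes)."""
    return not any(u in coloring and v in coloring and coloring[u] == coloring[v]
                   for u, v in edges)

def is_feasible(coloring, vertices, edges, k):
    return is_complete(coloring, vertices, k) and is_proper(coloring, edges)

def is_independent_set(subset, edges):
    members = set(subset)
    return not any(u in members and v in members for u, v in edges)

def is_clique(subset, edges):
    adjacent = {frozenset(e) for e in edges}
    return all(frozenset(pair) in adjacent for pair in combinations(subset, 2))

# Hypothetical example graph: a triangle 1-2-3 with a path 3-4-5 attached.
vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
feasible = {1: 1, 2: 2, 3: 3, 4: 1, 5: 2}   # complete and proper with k = 3
partial = {1: 1, 2: 2, 3: 3, 4: 1}          # vertex 5 is uncolored
improper = {1: 1, 2: 1, 3: 3, 4: 1, 5: 2}   # vertices 1 and 2 clash
```

Since the triangle {1, 2, 3} is a clique of size 3, no feasible coloring of this example graph can use fewer than three colors, so the coloring `feasible` is also optimal.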


Given these definitions, we might also view graph
coloring as a type of partitioning/grouping problem
where the aim is to split the vertices into a set of subsets
$\mathcal{U} = \{U_1, \ldots, U_k\}$ such that $U_i \cap U_j = \emptyset$ $(1 \le i < j \le k)$.
If $\bigcup_{i=1}^{k} U_i = V$, then the partition represents a complete
coloring. Moreover, if all subsets $U_1, \ldots, U_k$ are
independent sets, the coloring is also feasible.

Fig. 63.1 A simple graph (left) and a feasible five-coloring (right)

To exemplify these concepts, Fig. 63.1 shows an
example graph with ten vertices, together with a
corresponding coloring. In this case the presented coloring
is both complete and proper, and therefore feasible.
It is also optimal because it uses just five
colors, which happens to be the chromatic number
in this case. The graph also contains one clique
of size 5 (vertices $v_1, v_3, v_4, v_6$, and $v_7$), and
numerous independent sets, such as vertices $v_2, v_3, v_8$,
and $v_9$. As a partition, this coloring is represented
$\mathcal{U} = \{\{v_1, v_{10}\}, \{v_7, v_8\}, \{v_3, v_5\}, \{v_2, v_4, v_9\}, \{v_6\}\}$.

It should be noted that various subsidiary problems
related to the graph coloring problem are also
known to be NP-hard. These include computing the
chromatic number itself, identifying the size of the
largest clique, and determining the size of the largest
independent set in a graph [63.10,11]. In addition,
the decision variant of the graph coloring problem,
which asks: given a fixed positive integer $k$, is
there a feasible $k$-coloring of the vertices? is
NP-complete.


63.2 Algorithms for Graph Coloring


Graph coloring has been studied as an algorithmic problem
since the late 1960s and, as a result, an abundance
of methods have been proposed. Loosely speaking,
these methods might be grouped into two main classes:
constructive methods, which build solutions step-by-step,
perhaps using various heuristic and backtracking
operators; and stochastic search-based methods, which
attempt to navigate their way through a space of
candidate solutions while optimizing a particular objective
function.

The earliest proposed algorithms for graph coloring
generally belong to the class of constructive methods.
Perhaps the simplest of these is the first-fit (or
greedy) algorithm. This operates by taking each vertex
in turn in a specified order and assigning it to the
lowest indexed color where no clash is induced, creating
new colors when necessary [63.12]. A development
on this method is the DSATUR algorithm [63.13, 14] in
which the ordering of the vertices is determined
dynamically: specifically, by choosing at each step the
uncolored vertex that currently has the largest number
of different colors assigned to adjacent vertices, breaking
ties by taking the vertex with the largest degree.
Other constructive methods have included backtracking
strategies, such as those of Brown [63.15] and Korman
[63.16], which may ultimately perform complete
enumerations of the solution space given excess time.
A survey of backtracking approaches was presented by
Kubale and Jackowski [63.17] in 1985.
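The two constructive heuristics described above, first-fit and DSATUR, can be sketched as follows. This is a minimal illustration rather than a reference implementation: the adjacency-dictionary input, the function names, and the rule for resolving ties that remain after the degree comparison (first vertex encountered) are assumptions made here.

```python
def lowest_free_color(v, adj, coloring):
    """Lowest-indexed color not used by any colored neighbor of v."""
    used = {coloring[u] for u in adj[v] if u in coloring}
    color = 1
    while color in used:
        color += 1
    return color

def first_fit(adj, order):
    """Greedy coloring: take vertices in the given order."""
    coloring = {}
    for v in order:
        coloring[v] = lowest_free_color(v, adj, coloring)
    return coloring

def dsatur(adj):
    """Choose next the uncolored vertex with the most distinct neighbor colors,
    breaking ties by largest degree."""
    coloring = {}
    while len(coloring) < len(adj):
        def priority(v):
            saturation = len({coloring[u] for u in adj[v] if u in coloring})
            return (saturation, len(adj[v]))
        v = max((u for u in adj if u not in coloring), key=priority)
        coloring[v] = lowest_free_color(v, adj, coloring)
    return coloring

# Hypothetical example: the path 1-2-3-4, which is 2-colorable.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
bad_order = first_fit(path, [1, 4, 2, 3])   # colors both end vertices first
good_order = first_fit(path, [1, 2, 3, 4])
by_dsatur = dsatur(path)
```

On this path, the ordering [1, 4, 2, 3] forces first-fit to open a third color for vertex 3, whereas the natural ordering and DSATUR both use two colors, which shows how strongly the result of a constructive method depends on the vertex order.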

Many of the more recent methods for graph coloring 
have followed the second approach mentioned above, 
which is to search a space of candidate solutions and 
attempt to identify members that optimize a specific 
objective function. Such methods can be further classi- 
fied according to the composition of their search spaces, 
which can comprise: 




(a) The set of all feasible solutions (using an undefined 
number of colors) 

(b) The set of complete colorings (proper and im- 
proper) for a fixed number of colors k 

(c) The set of proper solutions (partial and complete), 
also for a fixed number of colors k. 


Algorithms following scheme (a) have been con- 
sidered by, among others, Culberson and Luo [63.18], 
Mumford [63.19], Erben [63.20], and Lewis [63.21]. 
Typically, these methods consider different permuta- 
tions of the vertices, which are then fed into a construc- 
tive method (such as first-fit) to form feasible solutions. 
An intuitive cost function in such cases is simply the 
number of colors used in a solution, though other more 
fine-grained functions have been suggested, such as the 
following due to Erben [63.20] 


$$f_1 = \frac{\sum_{U_i \in \mathcal{U}} \left( \sum_{v \in U_i} \deg(v) \right)^2}{|\mathcal{U}|} \tag{63.1}$$


Here, the term $\left( \sum_{v \in U_i} \deg(v) \right)$ gives the sum of the
degrees of all vertices assigned to a color class $U_i$. The aim
is to maximize $f_1$ by making increases to the numerator
(by forming large color classes that contain high-degree
vertices), and decreases to the denominator (by reducing
the number of color classes).
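As a small worked illustration of this cost function, the sketch below evaluates it for three feasible colorings of a hypothetical four-vertex path 1-2-3-4. It assumes the squared degree-sum form of Erben's function, in which each color class contributes the square of its total degree; the function name and the example data are not taken from the source.

```python
def erben_f1(partition, degree):
    """Sum over color classes of (total degree of the class) squared,
    divided by the number of color classes."""
    total = sum(sum(degree[v] for v in color_class) ** 2
                for color_class in partition)
    return total / len(partition)

# Degrees in the path 1-2-3-4.
degree = {1: 1, 2: 2, 3: 2, 4: 1}

two_colors = [{1, 3}, {2, 4}]        # (3^2 + 3^2) / 2
three_colors_a = [{1, 4}, {2}, {3}]  # (2^2 + 2^2 + 2^2) / 3
three_colors_b = [{1, 3}, {2}, {4}]  # (3^2 + 2^2 + 1^2) / 3

values = [erben_f1(p, degree) for p in (two_colors, three_colors_a, three_colors_b)]
```

The two three-color solutions receive different scores even though they use the same number of colors, which is the finer granularity the text refers to: the second one concentrates more degree in a single class and is therefore rated higher.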

On the other hand, algorithms following scheme (b) 
operate by first proposing a fixed number of colors k. At 
the start of a run, each vertex will be assigned to one of 
the k colors using heuristics, or randomly. However, this 
may involve the introduction of one or more clashes, 
resulting in a complete, improper k-coloring. The cost 
of such a solution might then be evaluated using the 
following cost function, which is simply a count on the 
number of clashes 


$$f_2 = \sum_{\forall \{v, u\} \in E} g(v, u), \quad \text{where} \quad g(v, u) = \begin{cases} 1 & \text{if } c(v) = c(u) \\ 0 & \text{otherwise.} \end{cases} \tag{63.2}$$


The strategy in such approaches is to make alterations 
to a solution such that the number of clashes is re- 
duced to zero. If this is achieved k can be reduced; 
alternatively if all clashes cannot be eliminated, k can 
be increased. This strategy has been quite popular in 
the literature, involving the use of various stochas- 
tic search methodologies, including simulated anneal- 
ing [63.22, 23], tabu search [63.24], greedy randomized 
adaptive search procedure (GRASP) methods [63.25], 


iterated local search [63.26, 27], variable neighborhood 
search [63.28], ant colony optimization [63.29], and 
evolutionary algorithms (EA) [63.30-35]. 
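To make the clash-counting strategy concrete, the sketch below counts clashes for a complete k-coloring and then removes them with a plain steepest-descent loop. The descent is a deliberately simple stand-in for the stochastic search methods listed above, not any of the cited algorithms; the function names and the four-vertex example graph are hypothetical.

```python
def clash_count(coloring, edges):
    """Number of edges whose two endpoints share a color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def descend(coloring, edges, k):
    """Recolor one vertex at a time for as long as the clash count strictly decreases."""
    current = dict(coloring)
    improved = True
    while improved:
        improved = False
        base = clash_count(current, edges)
        for v in list(current):
            for color in range(1, k + 1):
                if color == current[v]:
                    continue
                trial = dict(current)
                trial[v] = color
                if clash_count(trial, edges) < base:
                    current, improved = trial, True
                    break
            if improved:
                break
    return current

# Hypothetical example: triangle 1-2-3 plus the edge 3-4, with k = 3 colors.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
start = {1: 1, 2: 1, 3: 1, 4: 1}   # complete but improper: every edge clashes
result = descend(start, edges, 3)
```

A pure descent of this kind can stall in a local optimum with clashes remaining, which is one reason the literature relies on methods such as tabu search or simulated annealing; if zero clashes are reached, the same procedure could be rerun with k - 1 colors.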

Finally, scheme (c) also involves using a fixed 
number of colors k; however in this case, rather than 
allowing clashes to occur in a solution, vertices that 
cannot be feasibly assigned to a color are placed into 
a set of uncolored vertices S. The aim is, therefore, to 
make changes to a solution so that these vertices can 
eventually be feasibly colored, resulting in S = Ø. This 
approach has generally been less popular in the litera- 
ture than scheme (b), though some prominent examples 
include the simulated annealing approach of Morgen- 
stern [63.36], the tabu search method of Blochliger and 
Zufferey [63.37], and the EA of Malaguti et al. [63.38]. 
More recently, Hertz et al. [63.39] also suggested an 
algorithm that searches different solution spaces dur- 
ing different stages of a run. The idea is that when the 
search is deemed to have stagnated in one space, a pro- 
cedure is used to alter the current solution so that it 
becomes a member of another space (e.g., clashing ver- 
tices are uncolored by transferring them to S). Once this 
has been done, the search can then be continued in this 
new space where further improvements might be made. 


63.2.1 EAs for Graph Coloring 


In this section we now examine the ways in which EAs 
have been applied to the graph coloring problem, partic- 
ularly looking at issues surrounding the recombination 
of solutions. 


Assignment-Based Operators 
Perhaps the most intuitive way of applying EAs to 
the graph coloring problem is to view the task as one 
of assignment. In this case, a candidate solution can
be viewed as a mapping of vertices to colors $c : V \to \{1, \ldots, k\}$,
and a natural chromosome representation is
a vector $(c(v_1), c(v_2), \ldots, c(v_{|V|}))$, where $c(v_i)$ gives the
color of vertex $v_i$ (the solution given in Fig. 63.1 would
be represented by (1,4,3,4,3,5,2,2,4,1) under this
sort of approach brings disadvantages, not least because 
it contradicts a fundamental design principle of EAs: 
the principle of minimum redundancy [63.40], which 
states that each member of the search space should be 
represented by as few distinct chromosomes as possible.
To expand upon this point, we observe that under
this assignment-based representation, if we are given
a solution using $l \le k$ colors, the number of different
chromosomes representing this solution will be
${}^{k}P_{l} = k!/(k-l)!$ due
to the arbitrary way in which colors are allocated labels.
(For example, swapping the labels of colors 2 and 4
in Fig. 63.1's solution would give a new chromosome
(1,2,3,2,3,5,4,4,2,1), but the same solution.) Of course,
this implies a search space that is far larger than necessary.
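The amount of redundancy can be checked by brute force for small cases. The sketch below enumerates every chromosome obtained by injectively relabeling the colors used in a solution and compares the count with the permutation number k!/(k-l)!; the function name is hypothetical and the enumeration is only practical for small k.

```python
from itertools import permutations
from math import perm

def equivalent_chromosomes(chromosome, k):
    """All chromosomes that encode the same partition using labels 1..k."""
    used = sorted(set(chromosome))
    result = set()
    for labels in permutations(range(1, k + 1), len(used)):
        mapping = dict(zip(used, labels))
        result.add(tuple(mapping[c] for c in chromosome))
    return result

# The assignment-based chromosome quoted in the text for Fig. 63.1 (l = k = 5).
fig_solution = (1, 4, 3, 4, 3, 5, 2, 2, 4, 1)
variants = equivalent_chromosomes(fig_solution, 5)

# A smaller case with l = 3 colors drawn from k = 4 labels.
small_variants = equivalent_chromosomes((1, 2, 1, 3), 4)
```

For the ten-vertex example this gives 120 distinct chromosomes for a single partition, one of which is the relabeled chromosome (1,2,3,2,3,5,4,4,2,1) mentioned in the text.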

Furthermore, authors such as Falkenauer [63.41] 
and Coll et al. [63.42] have also argued that tradi- 
tional recombination schemes such as 1, 2, and n-point 
crossover with this representation have a tendency to 
recklessly break up building-blocks that we might want 
promoted in a population. As an example, consider a re- 
combination of the two example chromosomes given 
in the previous paragraph using two-point crossover: 
(1,4,3,4,3,5,2,2,4,1) crossed with (1,2,3,2,3,5,4,4,2,1) 
would give (1,4,3,4,3,5,4,4,4,1) as one of the offspring.
Here, despite the fact that the two parent chromosomes
actually represent the same feasible solution, the resultant
offspring seems to have little in common with its
parents: it has lost one of its colors, and a number
of clashes have been introduced. Thus, it is concluded
by these authors that such operations actually consti- 
tute more of a random perturbation operator, rather than 
a mechanism for combining meaningful substructures 
from existing solutions. Nevertheless, recent algorithms 
following this scheme are still reported in the litera- 
ture [63.43]. 
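The crossover example discussed above can be reproduced with a few lines of Python. The cut points used here (positions 6 and 8, counting from zero) are an assumption chosen because they yield the offspring quoted in the text; the function name is likewise hypothetical.

```python
def two_point_crossover(parent1, parent2, a, b):
    """Exchange the segment [a, b) between two equal-length chromosomes."""
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

parent1 = (1, 4, 3, 4, 3, 5, 2, 2, 4, 1)
parent2 = (1, 2, 3, 2, 3, 5, 4, 4, 2, 1)   # same partition, labels 2 and 4 swapped

child1, child2 = two_point_crossover(parent1, parent2, 6, 8)
colors_in_child1 = set(child1)
```

Although both parents encode the same partition, `child1` no longer contains color 2: the seventh and eighth vertices have been merged into the class of color 4, so the grouping inherited from the parents has been disrupted.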

In recognition of the proposed disadvantages of
the assignment-based representation, Coll et al. [63.42]
proposed a procedure for relabeling the colors of one
of the parent chromosomes before applying crossover.
Consider two (not necessarily feasible) parent solutions
represented as partitions: $\mathcal{U}_1 = \{U_{1,1}, \ldots, U_{1,k}\}$ and
$\mathcal{U}_2 = \{U_{2,1}, \ldots, U_{2,k}\}$. Now, using $\mathcal{U}_1$ and $\mathcal{U}_2$, a
complete bipartite graph $K_{k,k}$ is formed. This bipartite graph
has $k$ vertices in each partition, and the weights between
two vertices $i, j$ from different partitions are defined as
$w_{i,j} = |U_{1,i} \cap U_{2,j}|$. Given $K_{k,k}$, a maximum weighted
matching can then be determined using any suitable
algorithm (e.g., the Hungarian algorithm [63.44] or auction
algorithm [63.45]), and this matching can be used
to relabel the colors in one of the chromosomes.

Figure 63.2 gives an example of this procedure and 
shows how the second parent can be altered so that 
its color labelings maximally match those of parent 1. 
In this case, we note that the color classes $\{v_1, v_{10}\}$,
$\{v_3, v_5\}$, and $\{v_6\}$ occur in both parents and will be
preserved in any offspring produced via a traditional 
crossover operator. However, this will not always be the 
case and will depend very much on the best matching 
that is available in each case. 
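The relabeling step can be sketched as follows. To keep the example self-contained, the maximum weighted matching is found here by exhaustive search over all k! permutations, which is a swapped-in substitute for the Hungarian or auction algorithm and is only sensible for small k; the function name is hypothetical, and the two parent chromosomes are those of the example in Fig. 63.2.

```python
from itertools import permutations

def relabel_to_match(parent1, parent2, k):
    """Relabel parent2's colors so that they agree with parent1 as far as possible."""
    # weights[i][j] = number of vertices with color i+1 in parent1 and j+1 in parent2
    weights = [[0] * k for _ in range(k)]
    for c1, c2 in zip(parent1, parent2):
        weights[c1 - 1][c2 - 1] += 1
    # Brute-force maximum weighted matching: best[i] is the parent2 class
    # (zero-based) matched to parent1 class i.
    best = max(permutations(range(k)),
               key=lambda p: sum(weights[i][p[i]] for i in range(k)))
    mapping = {best[i] + 1: i + 1 for i in range(k)}   # parent2 label -> new label
    return tuple(mapping[c] for c in parent2), mapping

parent1 = (1, 4, 3, 4, 3, 5, 2, 2, 4, 1)
parent2 = (3, 2, 1, 2, 1, 5, 4, 4, 4, 3)
relabeled, mapping = relabel_to_match(parent1, parent2, 5)
```

For these parents the best matching has total weight 9 and is unique, so the relabeled chromosome differs from parent 1 in a single position (the ninth vertex).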

A further scheme for color relabeling that also ad- 
dresses the issue of redundancy was proposed by Tucker 
et al. [63.46]. This method involves representing solu- 
tions using the assignment-based scheme, but under the 
following restriction 


$$c(v_1) = 1, \tag{63.3}$$
$$c(v_{i+1}) \le \max\{c(v_1), \ldots, c(v_i)\} + 1. \tag{63.4}$$


Chromosomes obeying these labeling criteria might,
therefore, be considered as being in their canonical
form such that, by definition, vertex $v_1$ is always
colored with color 1, $v_2$ is always colored with color 1 or 2,
and so on. (The solution given in Fig. 63.1 would be
represented by (1,2,3,2,3,4,5,5,2,1) under this scheme.)
However, although this ensures a one-to-one correspon- 
dence between the set of chromosomes and the set of 
vertex partitions (thereby removing any redundancy), 
research by Lewis and Pullin [63.47] demonstrated that 
this scheme is not particularly useful for graph color- 
ing, not least because minor changes to a chromosome 
(such as the recoloring a single vertex) can lead to major 
changes to the way colors are labeled, making the prop- 
agation of useful solution substructures more difficult to 
achieve when applying traditional crossover operators. 
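Both the canonical form and its sensitivity to small changes can be illustrated with a short sketch. The conversion below simply renumbers colors in order of first appearance, which satisfies (63.3) and (63.4); the function name and the choice of which vertex to recolor are assumptions made for this example.

```python
def canonical_form(chromosome):
    """Renumber colors in order of first appearance, starting from 1."""
    mapping = {}
    result = []
    for color in chromosome:
        if color not in mapping:
            mapping[color] = len(mapping) + 1
        result.append(mapping[color])
    return tuple(result)

original = (1, 4, 3, 4, 3, 5, 2, 2, 4, 1)
swapped = (1, 2, 3, 2, 3, 5, 4, 4, 2, 1)    # labels 2 and 4 exchanged
recolored = (5, 4, 3, 4, 3, 5, 2, 2, 4, 1)  # only the first vertex moved to color 5

canon_original = canonical_form(original)
canon_recolored = canonical_form(recolored)
changed_genes = sum(a != b for a, b in zip(canon_original, canon_recolored))
```

Recoloring a single vertex alters four genes of the canonical chromosome here, none of which is the gene of the recolored vertex itself, which illustrates why traditional crossover struggles to propagate substructures under this scheme.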


Fig. 63.2 Example of the relabeling procedure proposed by Coll
et al. [63.42]. Parent 1 is (1, 4, 3, 4, 3, 5, 2, 2, 4, 1), with partition
$U_{1,1} = \{v_1, v_{10}\}$, $U_{1,2} = \{v_7, v_8\}$, $U_{1,3} = \{v_3, v_5\}$,
$U_{1,4} = \{v_2, v_4, v_9\}$, $U_{1,5} = \{v_6\}$. Parent 2 is
(3, 2, 1, 2, 1, 5, 4, 4, 4, 3), with partition $U_{2,1} = \{v_3, v_5\}$,
$U_{2,2} = \{v_2, v_4\}$, $U_{2,3} = \{v_1, v_{10}\}$, $U_{2,4} = \{v_7, v_8, v_9\}$,
$U_{2,5} = \{v_6\}$. The weights $|U_{1,i} \cap U_{2,j}|$ (rows $i$, columns $j$) are:

        j = 1  2  3  4  5
i = 1       0  0  2  0  0
i = 2       0  0  0  2  0
i = 3       2  0  0  0  0
i = 4       0  2  0  1  0
i = 5       0  0  0  0  1

Here, parent 2 is relabeled as 1 → 3, 2 → 4, 3 → 1, 4 → 2, and 5 → 5,
giving (1, 4, 3, 4, 3, 5, 2, 2, 2, 1).


Partition-Based Operators
Given the proposed issues with the assignment-based
approach, the last 15 years or so have also seen a number
of articles presenting recombination operators focussed
on the partition (or grouping) interpretation of
graph coloring. The philosophy behind this approach 
is that it is actually the color classes (and the vertices 
that are assigned to them) that represent the underly- 
ing building blocks of the graph coloring problem. In 
other words, it is not the color of individual vertices 
per se, but the way in which vertices are grouped that 
forms the meaningful substructures. Consequently, the
focus should be on the design of operators that are 
successfully able to combine and promote these within 
a population. 

Perhaps the first major work in this area was due 
to Falkenauer [63.48] in 1994 (and later [63.41]) who 
argued in favor of the partition interpretation in the jus- 
tification of his grouping genetic algorithm (GGA) — 
an EA methodology specifically designed for use with 
partitioning problems. Falkenauer applied this GGA 
to two important operational research problems: the 
bin-packing problem and bin-balancing problem, with 
strong results being reported. In subsequent work, Er- 
ben [63.20] also tailored the GGA for graph coloring. 
Erben’s approach operates in the space of feasible col- 
orings and allows the number of colors in a solution 
to vary. Solutions are then stored as partitions, and 
evaluated using (63.1). In this approach, recombination 
operates by taking two parent solutions and randomly 
selecting a subset of color classes from the second. 
These color classes are then copied into the first par- 
ent, and all color classes coming from the first parent 
containing duplicate vertices are deleted. This operation
results in an offspring solution that is proper, but
most likely partial. Thus uncolored vertices are then 
reinserted into the solution, in this case using the first-fit 
algorithm. A number of other recombination operators 
for use in the space of feasible solutions have also been 
suggested by Mumford [63.19]. These operate on per- 
mutations of vertices, which are again decoded into 
solutions using the first-fit algorithm. 

Another recombination operator that focusses on 
the partition interpretation of graph coloring is due 
to Galinier and Hao, who in 1999 proposed an EA 
that, at the date of writing, is still understood to be 
one of the best performing algorithms for graph color- 
ing [63.33, 38, 49, 50]. Using a fixed number of colors k, 
Galinier and Hao’s method operates in the space of 
complete (proper and improper) k-colorings using cost 
function f_2 (63.2). A population of candidate solutions
is then evolved using local search (based on tabu
search) together with a specialized recombination oper- 
ator called greedy partition crossover (GPX). The latter 
is used as a global operator and is intended to guide the 
search over the long term, gently directing it towards fa- 
vorable regions of the search space (exploration), while 
the local search element is used to identify high quality 
solutions within these regions (exploitation). 

Fig. 63.3 Demonstration of the GPX operator using k = 3

          Parent 1              Parent 2              Offspring
a) U1 =   {v1, v2, v3}          {v3, v4, v5, v7}      {}
   U2 =   {v4, v5, v6, v7}      {v1, v6, v9}          {}
   U3 =   {v8, v9, v10}         {v2, v8, v10}         {}
   Select the color with most vertices and copy to the child
   (U2 from parent 1 here). Delete copied vertices from both parents.

b) U1 =   {v1, v2, v3}          {v3}                  {v4, v5, v6, v7}
   U2 =   {}                    {v1, v9}              {}
   U3 =   {v8, v9, v10}         {v2, v8, v10}         {}
   Select the color with most vertices in parent 2 and copy to child.
   Delete copied vertices from both parents.

c) U1 =   {v1, v3}              {v3}                  {v4, v5, v6, v7}
   U2 =   {}                    {v1, v9}              {v2, v8, v10}
   U3 =   {v9}                  {}                    {}
   Select the color with most vertices in parent 1 and copy to the child.
   Delete copied vertices from both parents.

d) U1 =   {}                    {}                    {v4, v5, v6, v7}
   U2 =   {}                    {v9}                  {v2, v8, v10}
   U3 =   {v9}                  {}                    {v1, v3}
   Having formed k colors, assign any missing vertices to random colors.

e) Offspring: U1 = {v4, v5, v6, v7}, U2 = {v2, v8, v10, v9}, U3 = {v1, v3}
   A complete (though not necessarily proper) solution results.


The idea behind GPX is to construct offspring
using large color classes inherited from the parent solutions.
A demonstration of how this is done is given
in Fig. 63.3. As is shown, the largest (not necessarily
proper) color class in the parents is first selected and
copied into the offspring. Then, in order to avoid duplicate
vertices occurring in the offspring at a later stage,
these copied vertices are removed from both parents. 
To form the next color, the other (modified) parent is 
then considered and, again, the largest color class is 
selected and copied into the offspring, before again re- 
moving these vertices from both parents. This process 
is continued by alternating between the parents until the 
offspring’s k color classes have been formed. At this 
point, each color class in the offspring will be a subset 
of a color class existing in one or both of the parents. 
That is 

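\forall U_i \in \mathcal{U}_c \;\; \exists U_j \in (\mathcal{U}_1 \cup \mathcal{U}_2) : U_i \subseteq U_j , (63.5)
where \mathcal{U}_c, \mathcal{U}_1, and \mathcal{U}_2 represent the offspring, and
parents 1 and 2, respectively.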
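The construction just described can be sketched as follows, assuming that each parent is given as a list of k sets of vertices. The function name `gpx` and its signature are hypothetical. Ties between equally large classes are broken here by taking the first such class, which is a simplification of ours. Missing vertices are placed in random classes at the end.

```python
import random


def gpx(parent1, parent2, vertices, rng=random):
    """Greedy partition crossover sketch on two partitions with k classes each.

    Returns the offspring (list of k sets) and the list of vertices that were
    missing after construction and were therefore assigned at random.
    """
    k = len(parent1)
    # Work on copies so that the callers' parents are left untouched.
    parents = [[set(c) for c in parent1], [set(c) for c in parent2]]
    offspring = []
    for step in range(k):
        donor = parents[step % 2]             # alternate, starting with parent 1
        chosen = set(max(donor, key=len))     # largest remaining class (first on ties)
        offspring.append(chosen)
        for parent in parents:                # delete copied vertices from both parents
            for cls in parent:
                cls -= chosen
    assigned = set().union(*offspring)
    missing = [v for v in vertices if v not in assigned]
    for v in missing:                         # random reassignment of missing vertices
        rng.choice(offspring).add(v)
    return offspring, missing
```

Run on the two parents of Fig. 63.3, the sketch builds the classes {v4, v5, v6, v7}, {v2, v8, v10}, and {v1, v3}, leaving v9 to be assigned at random. Before that final step every offspring class is a subset of a parent class, as required by (63.5).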

One feature of the GPX operator is that on produc- 
tion of an offspring’s k color classes, some vertices may 
be missing (this occurs with vertex v9 in Fig. 63.3).
Galinier and Hao [63.33] suggest assigning these un- 
colored vertices to random classes, which of course 
could introduce further clashes. This element of the 
procedure might, therefore, be viewed as a type of per- 
turbation (mutation) operator in which the number of 
random assignments (the size of the perturbation) is de- 
termined by the construction stages of GPX. However, 
Glass and Prügel-Bennett [63.49] observe that GPX’s
strategy of inheriting the largest available color class at 
each step (as opposed to a random color class) generally 
reduces the number of uncolored vertices. This means 
that the amount of information inherited directly from 
the parents is increased, reducing the potential for dis- 
ruption. Once a complete offspring is formed, it is then 
modified and improved via a local search procedure be- 
fore being inserted into the population. 

Since the proposal of GPX by Galinier and 
Hao [63.33], further recombination schemes based on 
this method have also been suggested, differing primar- 
ily in the criteria used for selecting the color classes 
that are inherited by the offspring. Lü and Hao [63.34],
for example, extended the GPX operator to allow more
than two parents to play a part in producing a single
offspring (Sect. 63.5). On the other hand, Porumbel
et al. [63.35] suggest that instead of choosing the largest
available color class at each stage of construction,
classes with the least number of clashes should be
prioritized, with class size (and information regarding
the degrees of the vertices) then being used to break
ties. Malaguti et al. [63.38] also use a modified version
of GPX with an EA that navigates the space of partial,
proper solutions. In all of these cases the authors
combined their recombination operators with a local
search procedure in the same manner as Galinier and
Hao [63.33] and, with the problem instances considered,
the reported results are generally claimed to be
competitive with the state of the art.


Assessing the Effectiveness of EAs for Graph Coloring
In recent work carried out by the author of this chapter
[63.50], a comparison of six different graph coloring
algorithms was presented. This study was quite broad
and used over 5000 different problem instances. Its
conclusions were also rather complex, with each method
outperforming all others on at least one class of problems.
However, a salient observation was that the GPX-based
EA of Galinier and Hao [63.33] was by far the
most consistent and high-performing algorithm across
the comparison.

In the remainder of this chapter we pursue this matter
further, particularly focussing on the role that GPX
plays in this performance. Under a common EA framework,
described in Sect. 63.3, we first evaluate the
performance of GPX by comparing it to two other
recombination operators (Sect. 63.4). Using information
gained from these experiments, Sect. 63.5 then looks
at how the performance of the GPX-based EA might
be enhanced, particularly by looking at ways in which
population diversity might be prolonged during a run.
Finally, conclusions and a further discussion surrounding
the virtues of recombination in this problem domain
are presented in Sect. 63.6.


63.3 Setup


The EA used in the following experiments operates in
the same manner as Galinier and Hao’s [63.33]. To
form an initial population, a modified version of the
DSATUR algorithm is used. Specifically, each individual
is formed by taking the vertices in turn according to
the DSATUR heuristic and then assigning it to the lowest
indexed color i ∈ {1,...,k} where no clash occurs. Vertices
for which no clash-free color exists are assigned
to random colors at the end of this process. Ties in the
DSATUR heuristic are broken randomly, providing diversity
in the initial population. Each individual is then
improved by the local search routine.


Table 63.1 Details of the five problem instances used in our analysis

                                      Vertex degree                        Best known
#: Name        |V|    Density   min; med; max    Mean          SD     (colors)
1: Random      1000   0.499     450; 499; 555    499.4         16.1   83
2: Flat(10)    500    0.103     36; 52; 61       [illegible]   4.4    10
3: Flat(100)   500    0.841     393; 421; 445    420.7         7.6    100
4: TT(A)       682    0.128     0; 77; 472       87.4          62.0   27
5: TT(B)       2419   0.029     0; 47; 857       [illegible]   92.3   32

The EA evolves the population using recombina- 
tion, local search, and replacement pressure. In each 
iteration two parent solutions are selected at random, 
and the selected recombination operator is used to pro- 
duce one offspring. This offspring is then improved via 
local search and inserted into the population by replac- 
ing the weaker of its two parents. 

The local search element of this EA makes use of 
tabu search — specifically the TABUCOL algorithm of 
Hertz and de Werra [63.24], run for a fixed number of 
iterations. In this method, moves in the search space 
are achieved by selecting a vertex v whose assignment 
to color i is currently causing a clash, and moving it
to a new color j ≠ i. The inverse of this move is then
marked as tabu for the next t steps of the algorithm
(meaning that v cannot be re-assigned to color i until
at least t further moves have been performed). In each
iteration, the complete neighborhood is considered, and 
the non-tabu move that is seen to invoke the largest de- 
crease in cost (or failing that, the smallest increase) is 
performed. Ties are broken randomly, and tabu moves 
are also carried out if they are seen to improve on the 
best solution observed so far in the process. The tabu 
search routine terminates when the iteration limit is 
reached (in which case the best solution found during
the process is taken), or when a zero cost solution is 
achieved. Further descriptions of this method, includ- 
ing implementation details, can be found in [63.51]. 

In terms of parameter settings, in all cases we use 
a population size of 20 (as in [63.34,35]) and set the 
tabu search iteration limit to 16|V|, which approximates 
the settings used in the best reported runs in [63.33]. 
As with other algorithms that use this local search 
technique [63.29, 33, 37], the tabu tenure t is made proportional
to the current solution cost: specifically, t =
\lfloor 0.6 f_2 \rfloor + r, where r is an integer uniformly selected
from the range 0 to 9 inclusive.
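The tabu search moves and the tenure rule described above can be sketched as follows. This is a compact illustration rather than the implementation used in the experiments: the function name `tabucol`, the data layout (an adjacency dictionary and colors numbered from 0), and the choice to compute the tenure from the cost after the move are assumptions of ours. The sketch also recomputes neighbor counts in every iteration, which an efficient implementation would avoid.

```python
import random


def tabucol(adj, coloring, k, max_iters, rng=random):
    """Tabu search sketch on complete k-colorings.

    adj: dict vertex -> set of neighbors; coloring: dict vertex -> color in range(k).
    Returns the best coloring found and its number of clashes.
    """
    col = dict(coloring)
    cost = sum(1 for u in adj for w in adj[u] if col[u] == col[w]) // 2
    best, best_cost = dict(col), cost
    tabu = {}  # (vertex, color) -> last iteration at which the move is still tabu
    for it in range(max_iters):
        if best_cost == 0:
            break
        clashing = [v for v in adj if any(col[w] == col[v] for w in adj[v])]
        candidates, best_delta = [], None
        for v in clashing:
            counts = [0] * k                      # neighbors of v in each color
            for w in adj[v]:
                counts[col[w]] += 1
            for j in range(k):
                if j == col[v]:
                    continue
                delta = counts[j] - counts[col[v]]
                is_tabu = tabu.get((v, j), -1) >= it
                if is_tabu and cost + delta >= best_cost:
                    continue                      # tabu and no aspiration
                if best_delta is None or delta < best_delta:
                    candidates, best_delta = [(v, j)], delta
                elif delta == best_delta:
                    candidates.append((v, j))
        if not candidates:
            continue                              # every move is currently tabu
        v, j = rng.choice(candidates)             # ties broken randomly
        old, col[v] = col[v], j
        cost += best_delta
        tenure = int(0.6 * cost) + rng.randint(0, 9)
        tabu[(v, old)] = it + tenure              # forbid moving v back to old
        if cost < best_cost:
            best, best_cost = dict(col), cost
    return best, best_cost
```

For example, starting from a triangle whose three vertices all share one color and using k = 3, the first move removes two clashes and the second removes the last one, so the routine stops with a zero cost solution after two iterations.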

Finally, because this algorithm operates in the space 
of complete k-colorings (proper and improper), values 
for k must be specified. In our case, initial values are
determined by executing DSATUR on each instance and
setting k to the number of colors used in the resultant so- 
lution. During runs, k is then decremented by 1 as soon 
as a feasible k-coloring is found, and the algorithm is 
restarted. Computational effort is measured by count- 
ing the number of constraint checks carried out by the 
algorithm, which occur when the algorithm requests in- 
formation about a problem instance, including checking 
whether two vertices are adjacent (by accessing an ad- 
jacency list or matrix), and referencing the degree of 
a vertex. In all trials a cut-off point of 5 × 10^11 checks
is imposed, which is roughly double the length of the 
longest run performed in [63.33]. In our case, this led 
to run times of ~ 1h on our machines (algorithms were 
coded in C++ and executed on a PC under Windows XP 
using a 3.0 GHz processor with 3.18 GB of RAM). 


63.3.1 Problem Instances 


For our trials a set of five problem instances is con- 
sidered. Though this set is quite small, its members 
should be considered as case studies that have been 
deliberately chosen to cover a wide range of graph 
structure — a factor that we have found to be very im- 
portant in influencing the relative performance of graph 
coloring algorithms [63.50]. The first three graphs are 
generated using the publicly available software of Cul- 
berson [63.52], while the remaining two are taken from 
a collection of real-world timetabling problems com- 
piled by Carter et al. [63.53]. Names and descriptions 
of these graphs now follow. Further details are also 
given in Table 63.1: 


#1: Random. This graph features |V| = 1000 and is 
generated such that each of the \binom{|V|}{2} pairs of vertices
is linked by an edge with probability 0.5. Graphs of 
this nature are nearly always considered in compar- 
isons of coloring algorithms. 

#2: Flat(10). Flat graphs are generated by partition- 
ing the vertices into K equi-sized groups and then 
adding edges between vertices in different groups 
with probability p. This is done such that the variance
in vertex degrees is kept to a minimum. It
is well known that feasible K-colored solutions
to such graphs are generally easy to achieve except
in cases where p is within a specific range of
values, which results in problems that are notoriously
difficult. Such ranges are commonly termed
phase transition regions [63.54]. This particular instance
is generated so that it features a relatively
small number of large color classes (using |V| =
500 and K = 10, implying ~ 50 vertices per color).
A value of p = 0.115 is used, which has been observed
to provide very difficult instances for a range
of different graph coloring algorithms [63.50].

#3: Flat(100). This graph is generated in the same manner
as the previous one, using |V| = 500, K = 100,
and p = 0.85. Solutions thus feature a relatively
large number of small color classes (~ 5 vertices
per color).

#4: TT(A). This graph is named car_s_91 in the original
dataset of Carter et al. [63.53]. It is chosen because
it is quite large and, unlike the previous three
graphs, the variance in vertex degrees is quite high.
This problem’s structure is also much less regular
than the previous three graphs, which are generated
in a fairly regimented manner.

#5: TT(B). This graph, originally named pur_s_93, is
the largest problem in Carter’s dataset, with |V| =
2419. It is also quite sparse compared to the previous
graph, though it still features a high variance in
vertex degrees (Table 63.1).


The rightmost column of Table 63.1 also gives information
on the best solutions known for each graph.
These values were determined via extended runs of our
algorithms, or due to information provided by the problem
generator.


63.4 Experiment 1


Our first set of experiments looks at the performance of
GPX by comparing it to two additional recombination
operators. To gauge the advantages of using a global operator
(recombination in this case), we also consider the
performance of TABUCOL on its own, which iterates on
a single solution until the run cut-off point is met.

Our first additional recombination operator follows
the assignment-based scheme discussed in Sect. 63.2.1
and, in each application, utilizes the procedure of Coll
et al. [63.42] (Fig. 63.2) to relabel the second parent.
Offspring are then formed using the classical n-point
crossover, with each gene being inherited from either
parent with probability 0.5.

Our second recombination operator is based on
the grouping genetic algorithm (GGA) methodology
(Sect. 63.2.1), adapted for use in the space of k-colorings.
An example is given in Fig. 63.4. Given
two parents, the color classes in the second parent
are first relabeled using Coll et al.’s procedure. Using
the partition-based representations of these solutions,
a subset of colors in parent 2 is then chosen randomly,
and these replace the corresponding colors in a copy
of parent 1. Duplicate vertices are then removed from
color classes originating from parent 1 and uncolored
vertices are assigned to random color classes. Note that
like GPX, before uncolored vertices are assigned, the
property defined by (63.5) is satisfied by this operator;
however, unlike GPX there is no requirement to inherit
larger color classes or to inherit half of its color classes
from each parent.


Fig. 63.4 Demonstration of the GGA recombination operator
(panels: parent 1, parent 2, offspring).
Here, color classes in parent 2 are labeled to maximally match those
of parent 1
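The GGA-style operator described above can be sketched as follows, assuming partitions are lists of k sets and that parent 2 has already been relabeled to match parent 1. The function name, the optional `chosen` argument, and the rule of selecting each class of parent 2 independently with probability 0.5 are assumptions made for illustration, since the text only states that the subset is chosen randomly.

```python
import random


def gga_recombine(parent1, parent2, vertices, rng=random, chosen=None):
    """GGA-style recombination sketch in the space of k-colorings.

    chosen: indices of the classes taken from parent2; a random subset if None.
    Returns the offspring and the vertices that had to be assigned at random.
    """
    k = len(parent1)
    if chosen is None:
        chosen = [i for i in range(k) if rng.random() < 0.5]
    chosen = set(chosen)
    injected = set()
    for i in chosen:
        injected |= parent2[i]
    offspring = []
    for i in range(k):
        if i in chosen:
            offspring.append(set(parent2[i]))             # class copied from parent 2
        else:
            offspring.append(set(parent1[i]) - injected)  # duplicates removed
    assigned = set().union(*offspring)
    uncolored = [v for v in vertices if v not in assigned]
    for v in uncolored:
        rng.choice(offspring).add(v)                      # random reassignment
    return offspring, uncolored
```

As a hypothetical example, take parent 1 as {v1, v10}, {v7, v8}, {v3, v5}, {v2, v4, v9}, {v6} and parent 2 as {v1, v9}, {v7}, {v3, v8}, {v2, v4, v5}, {v6, v10}. Taking the third and fourth classes from parent 2 removes v8 from the second class of parent 1 and leaves only v9 uncolored.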

A summary of the results achieved by the three 
recombination operators (together with TABUCOL) is 
given in Table 63.2. For each instance the same set of 20 
initial populations was used with the EAs, and entries 
in bold signify samples that are significantly different 
to the non-bold EA entries according to a Wilcoxon 
signed-rank test at the 0.01 significance level. For graph 
#1 we see that GPX has clearly produced the best re- 
sults — indeed, even its worst result features two fewer 
colors than the next best solution. However, for graphs 
#2 and #5, no significant difference between the EAs 
is observed, while for #3 and #4, better results are pro- 
duced by the GGA and the n-point crossover. 

Fig. 63.5a,b Run profiles for the instances (mean of 20 runs): (a) #1 (Random). (b) #3 (Flat(100)).
Both panels plot colors against checks (×10^11) for GPX, n-point, GGA, and TABUCOL


Figure 63.5 shows run profiles for two example
graphs. We see that in both cases TABUCOL provides


the fastest rates of improvement, though it is eventu- 
ally overtaken by at least one of the EAs. Table 63.2, 
however, also reveals that TABUCOL performs very 
poorly with graphs #4 and #5. This seems due to the 
high degree variance in these cases, which we observe 
makes the cost of neighboring solutions in the search 
space vary more widely. This suggests a more spiky cost 
landscape in which the use of local search in isolation 
exhibits a susceptibility for becoming trapped at local 
optima (see also [63.50]). 

An important factor behind the differing perfor- 
mances of these EAs is the effect that recombination 
has on the population diversity. To examine this, we first 
define a metric for measuring the distance between two 
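solutions: Given a solution \mathcal{U}, let P_{\mathcal{U}} = \{\{u, v\} : c(u) =
c(v)\}, for all u, v \in V, u \ne v. The distance between two
solutions \mathcal{U}_1 and \mathcal{U}_2 can then be defined,

D(\mathcal{U}_1, \mathcal{U}_2) = \frac{|P_{\mathcal{U}_1} \cup P_{\mathcal{U}_2}| - |P_{\mathcal{U}_1} \cap P_{\mathcal{U}_2}|}{|P_{\mathcal{U}_1} \cup P_{\mathcal{U}_2}|} . (63.6)

This measure gives the proportion of vertex pairings
(assigned to the same color) that exist in
just one of the two solutions. Consequently, if \mathcal{U}_1
and \mathcal{U}_2 are identical, then P_{\mathcal{U}_1} \cup P_{\mathcal{U}_2} = P_{\mathcal{U}_1} \cap P_{\mathcal{U}_2},
giving D(\mathcal{U}_1, \mathcal{U}_2) = 0. Conversely, if no vertex pair
is assigned the same color in both solutions, P_{\mathcal{U}_1} \cap P_{\mathcal{U}_2} = \emptyset,
implying D(\mathcal{U}_1, \mathcal{U}_2) = 1. Population diversity can also be
defined as the mean distance between each pair of solutions
in the population. That is, given a set of m
individuals \mathbf{U} = \{\mathcal{U}_1, \mathcal{U}_2, \ldots, \mathcal{U}_m\}

\mathrm{Diversity}(\mathbf{U}) = \frac{1}{\binom{m}{2}} \sum_{\mathcal{U}_i, \mathcal{U}_j \in \mathbf{U} : i < j} D(\mathcal{U}_i, \mathcal{U}_j) . (63.7)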
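The distance (63.6) and the diversity (63.7) can be computed directly from their definitions, as in the sketch below. Solutions are assumed to be lists of sets of vertices, and the function names are hypothetical. Returning 0 when neither solution contains any same-colored pair is a convention of ours for a case the definition leaves open.

```python
from itertools import combinations


def same_color_pairs(classes):
    """Set of unordered vertex pairs that share a color class."""
    return {frozenset(pair) for cls in classes for pair in combinations(cls, 2)}


def distance(u1, u2):
    """Proportion of same-colored pairs that exist in just one of the solutions."""
    p1, p2 = same_color_pairs(u1), same_color_pairs(u2)
    union = p1 | p2
    if not union:
        return 0.0  # assumption: no same-colored pairs in either solution
    return (len(union) - len(p1 & p2)) / len(union)


def diversity(population):
    """Mean distance over all pairs of solutions (needs at least two solutions)."""
    pairs = list(combinations(population, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

For the two parents of Fig. 63.3, each solution contains 12 same-colored pairs, of which 4 are shared, so the union holds 20 pairs and the distance is 16/20 = 0.8. The pair sets grow quadratically with class size, so this direct approach is intended only for illustration on small instances.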


Table 63.2 Number of colors in the best feasible solution achieved at the cut-off point (mean (min; median; max) of 20 runs)

      GPX                       n-point                   GGA                       TABUCOL
#1    87.00 (87; 87; 87)        93.35 (93; 93; 94)        91.55 (91; 92; 92)        89.10 (89; 89; 90)
#2    12.95 (12; 13; 13)        13.00 (13; 13; 13)        13.00 (13; 13; 13)        13.00 (13; 13; 13)
#3    105.60 (105; 106; 106)    105.05 (105; 105; 106)    105.05 (105; 105; 106)    105.90 (105; 106; 106)
#4    29.05 (28; 29; 30)        28.00 (28; 28; 28)        27.90 (27; 28; 29)        38.20 (32; 37.5; 46)
#5    33.30 (33; 33; 34)        33.15 (32; 33; 34)        33.10 (32; 33; 34)        52.05 (47; 52; 56)


Fig. 63.6a-d Relationship between parental distance and number of uncolored vertices with the GPX (a) and GGA (b)
operators. Also shown is the number of uncolored vertices in the first 10000 applications of GPX (c) and GGA (d)


Considering our results, the two scatter plots of
Fig. 63.6 demonstrate the positive correlation that exists
between parental distance and the number of uncolored
vertices that result in applications of the GPX and
GGA operators. This data was derived from graph #4,
though similar patterns were observed for the other instances.
Note that the correlation is weaker for GGA
due to two reasons. First, unlike GPX, which requires
half of the color classes to be inherited from each parent,
with GGA this proportion can vary. Thus if the
majority of color classes are inherited from just one parent,
it is possible to have two very different parents, but
only a small number of uncolored vertices. Second, as 
mentioned earlier GGA shows no bias towards inher- 
iting larger color classes, meaning that the number of 
uncolored vertices can also be higher than with GPX, particularly
when inheriting around half of the color classes
from each parent. An effect of these patterns is shown 
in the lower graphs of Fig. 63.6, where throughout the 
evolutionary process, the number of uncolored vertices 
occurring during recombination is fewer and less varied 
with GPX. In comparison to GGA, this behavior leads 
to a more rapid loss of diversity, as is demonstrated in 
Fig. 63.7 for two example graphs. 

Whether sustained diversity is a help or hindrance
with these EAs thus seems to depend on the type of
graph being tackled. As can be seen in Fig. 63.7, for
graph #1 GPX is the only recombination operator that
leads to any sort of population convergence, and it is
also the algorithm that produces the best solutions given
sufficient time, suggesting that it is suitably homing in on
high-quality regions of the search space. On the other
hand, for graphs #3 and #4, GGA’s more sustained diversity
(caused and perpetuated by the greater number
of uncolored vertices that occur during recombination)
causes the operator to be more disruptive. However, in
these cases this factor also seems to provide a useful
diversification mechanism, allowing the algorithm to
sample wider areas of the search space, leading to better
results. An extreme case of diversity loss occurs with
graph #5, which we recall has a low density and high
degree variance. In this case, when using GPX large
color classes of low-degree vertices that are formed in
early stages of the algorithm quickly come to dominate
the population, limiting the exploration that then takes
place; indeed, in many runs the algorithm was actually
unable to improve on costs achieved in the initial
population.

Figure 63.7 also shows that n-point crossover tends
to maintain diversity for longer periods than GPX in this
case, allowing it to produce superior results for graphs
#3 and #4. However, the sustained diversity is not due
to uncolored vertices (which do not occur with this operator);
rather, it seems due to the naturally occurring
disruption that results from the color labeling issues
mentioned in Sect. 63.2.1.

Finally, we also mention that during our runs with
these EAs, the local search element was observed to be
by far the most expensive part of the algorithm, with
none of the recombination operators consuming more
than 1.8% of the available run time.


Fig. 63.7a,b Population diversity during the first 10000 recombinations with (a) the random (#1) and (b) TT(A) (#4)
instances. Both panels plot diversity against crossover (×1000) for GPX, n-point, and GGA


63.5 Experiment 2


In this section we now consider ways in which the
results of the GPX operator might be improved, particularly
looking at how we might encourage diversity to
be sustained in the population.

As mentioned in Sect. 63.2.1, Lü and Hao [63.34]
previously proposed extending the GPX operator to allow
offspring to be produced using m > 2 parents. In
this operator, which we call MULTIX, offspring are
constructed in the same manner as GPX, except that at
each stage the largest color class from multiple parents
is chosen to be copied into the offspring. The intention
behind this increased choice is that larger color classes
will be identified, resulting in fewer uncolored vertices
once the k color classes have been constructed. In order
to prohibit too many colors being inherited from one
particular parent, Lü and Hao also make use of a parameter
q, specifying that if the i-th color class in an
offspring is copied from a particular parent, then this
parent should not be considered for further q colors. In
our application of MULTIX we follow the recommendations
of Lü and Hao, choosing m randomly from the
set {2,...,6} in each application and using q = \lfloor m/2 \rfloor.
Note also that GPX is simply an application of MULTIX
using m = 2 and q = 1.
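A multi-parent variant of the construction can be sketched as below, assuming each parent is a list of k sets. The function name `multix`, the first-come tie-breaking, and the bookkeeping used to exclude a donor for the next q colors are our own reading of the description above, and the details in [63.34] may differ.

```python
import random


def multix(parents, vertices, q, rng=random):
    """Multi-parent greedy partition crossover sketch.

    parents: list of m partitions, each a list of k sets of vertices.
    After donating a class, a parent is skipped for the next q colors.
    """
    k = len(parents[0])
    work = [[set(c) for c in p] for p in parents]     # copies of the parents
    blocked_until = [-1] * len(work)                  # last step at which a parent is excluded
    offspring = []
    for step in range(k):
        available = [p for p in range(len(work)) if blocked_until[p] < step]
        if not available:                             # cannot happen when q < m
            available = list(range(len(work)))
        # parent holding the largest remaining class (first on ties)
        donor = max(available, key=lambda p: max(len(c) for c in work[p]))
        chosen = set(max(work[donor], key=len))
        offspring.append(chosen)
        blocked_until[donor] = step + q
        for parent in work:                           # delete copied vertices everywhere
            for cls in parent:
                cls -= chosen
    assigned = set().union(*offspring)
    missing = [v for v in vertices if v not in assigned]
    for v in missing:
        rng.choice(offspring).add(v)
    return offspring, missing
```

With m = 2 and q = 1 the donors strictly alternate, so on the parents of Fig. 63.3 this sketch reproduces the GPX offspring classes {v4, v5, v6, v7}, {v2, v8, v10}, and {v1, v3}, with v9 left for random assignment.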

Though having the potential to produce good re- 
sults [63.34], an issue with MULTIX is that it could 
result in diversity being lost even more rapidly than with 
GPX, particularly if fewer vertices need to be randomly 
recolored at the end of each application. In [63.34], Lü 
and Hao attempt to deal with this using a mechanism 
whereby offspring are only inserted into the population 
if they are seen to be sufficiently different or better than 
existing members. However, in our case, we suggest 
two alternative methods. 

Fig. 63.8 Example Kempe chain involving, e.g., vertex v7 and color 4 (left), and the resultant coloring due to a color
interchange (right)


The first of these involves altering the MULTIX
operator so that it works exclusively with proper colorings.
As noted, GPX and MULTIX currently operate
on colorings in which clashes are permitted; however, 
this could in theory result in large color classes that 
feature many clashes being unduly promoted in the pop- 
ulation, when perhaps the real emphasis should be on 
the promotion of large color classes that are indepen- 
dent sets. The ISETS approach thus operates by first 
iteratively removing clashing vertices from each parent 
(in a random order, until proper colorings are achieved), 
and then using the MULTIX operator to produce an off- 
spring as before. This implies that, before recoloring 
missing vertices, offspring will also be proper, since 
subsets of independent sets are themselves independent 
sets. A further effect is that a greater number of vertices 
might need to be recolored, since vertices originally re- 
moved from the parents could also be missing in the 
resultant offspring. 
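The clash-removal step that precedes recombination in this approach can be sketched as follows. The function name and data layout (an adjacency dictionary and a list of sets) are assumptions, and so is the choice to process one color class at a time, picking a random clashing vertex within it; the text only specifies that clashing vertices are removed in a random order.

```python
import random


def remove_clashes(classes, adj, rng=random):
    """Remove clashing vertices until every color class is an independent set.

    classes: list of sets of vertices; adj: dict vertex -> set of neighbors.
    Returns the cleaned classes and the list of removed (uncolored) vertices.
    """
    cleaned = [set(c) for c in classes]   # leave the caller's solution untouched
    removed = []
    for cls in cleaned:
        while True:
            clashing = [v for v in cls if adj[v] & cls]
            if not clashing:
                break                     # this class is now an independent set
            v = rng.choice(clashing)
            cls.remove(v)
            removed.append(v)
    return cleaned, removed
```

The cleaned partitions can then be passed to the recombination operator, with the removed vertices joining any others that are missing from the offspring when the random reassignment takes place.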

Our second proposal for prolonging diversity is to 
make changes directly to an offspring to try to increase 
its distance from its parents before reinsertion into the 
population. One way of doing this would be to in- 
crease the iteration limit of the local search procedure, 
as demonstrated by Galinier and Hao [63.33]. How- 
ever, we find that such an approach can slow the algo- 
rithm unnecessarily, particularly because as the proce- 
dure progresses, movements in the search space (due 
to improving or sideways moves) become less frequent. 
An alternative in this case is to exploit the structure of 
the graph coloring problem via the use of a Kempe chain 
interchange operator. Kempe chains define connected 
sub-graphs that involve exactly two colors, and can be 
generated by taking an arbitrary vertex v and color i, 
such that c(v) ≠ i. An example is given in Fig. 63.8.
Note that when interchanging the colors of vertices in
a Kempe chain, if the original coloring is proper, then
so is the new coloring. Thus we have the opportunity 
to quickly alter colorings without compromising their 
quality. 
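A Kempe chain and the corresponding color interchange can be sketched with a breadth-first search, as below. The function names and the data layout (an adjacency dictionary and a dictionary of vertex colors) are assumptions made for illustration, and the caller is expected to supply a color i that differs from the current color of v.

```python
from collections import deque


def kempe_chain(adj, color, v, i):
    """Vertices reachable from v using only vertices colored color[v] or i."""
    j = color[v]
    chain, queue = {v}, deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in chain and color[w] in (i, j):
                chain.add(w)
                queue.append(w)
    return chain


def kempe_interchange(adj, color, v, i):
    """Return a new coloring with colors color[v] and i swapped on the chain."""
    j = color[v]
    chain = kempe_chain(adj, color, v, i)
    new_color = dict(color)
    for u in chain:
        new_color[u] = i if color[u] == j else j
    return new_color
```

Because the chain is a whole connected component of the subgraph induced by the two colors, no vertex outside the chain that has either color is adjacent to a vertex inside it, which is why a proper coloring remains proper after the swap.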

Our KEMPE approach operates in the same man- 
ner as ISETS, except that before reassigning uncolored 
vertices, a series of randomly selected Kempe chain 
interchanges are performed on the existing proper col- 
oring. In our case, 2k such moves are applied. 

The results achieved by our three modifications are 
summarized in Table 63.3, where bold entries signify 
samples that are significantly different to GPX at signif- 
icance level 0.01. We see that improvements over GPX 
were only obtained on graph #1, where all three vari- 
ants were successful, and graph #4 using the KEMPE 
variant. In practice, we found that MULTIX causes di- 
versity to be lost more quickly than GPX with these 
graphs — however, the ISETS mechanism did not seem 
to alter this behavior a great deal, usually because the 
number of clashing vertices needing to be removed was 
quite small (less than 10). 

Surprisingly, we also found that the KEMPE vari- 
ant was only able to maintain higher levels of diversity 
with instances #4 and #5. For graphs #1, #2, and #3, 
it turns out that when using a suitably low number of 
colors k, the bipartite graphs induced by most pairs of 
color classes in a solution are connected. In these cases, 
all of the vertices belonging to the two color classes are 
included in the Kempe chain, meaning that a color inter- 
change does not alter the structure of the solution, but 
merely produces a relabeling of the two color classes. 
(An example of such a Kempe chain would occur in 
Fig. 63.8 using vertex v3 and color 2.) This is not the 
case for the less structured graphs #4 and #5, where we 


Table 63.3 Number of colors in the best feasible coloring achieved at the cut-off point (mean (min; median; max) from 20 runs)

Graph  GPX                       MULTIX                    ISETS                     KEMPE
#1     87.00 (87; 87; 87)        85.00 (85; 85; 85)        85.05 (85; 85; 86)        85.15 (85; 85; 86)
#2     12.95 (12; 13; 13)        13.00 (13; 13; 13)        13.00 (13; 13; 13)        12.90 (12; 13; 13)
#3     105.60 (105; 106; 106)    105.55 (105; 106; 106)    105.85 (105; 106; 106)    105.30 (105; 105; 106)
#4     29.05 (28; 29; 30)        29.10 (29; 29; 30)        29.00 (28; 29; 30)        28.00 (28; 28; 28)
#5     33.30 (33; 33; 34)        33.30 (33; 33; 34)        33.30 (33; 33; 34)        33.30 (33; 33; 34)
[Figure 63.9: panel (a) Colors versus Checks (×10^11); panel (b) Diversity versus Crossover (×1000); curves for MULTIX, ISETS, and KEMPE]


Fig. 63.9 (a) Run profile for TT(A) (graph #4, left), and (b) its diversity over the first 10 000 recombinations 


found that diversity could be maintained for longer periods.
However, this only led to significant improvements in the
results for graph #4, whose run profiles are shown in
Fig. 63.9. Also note that these enhanced results still fail
to beat those of the GGA and n-point operators, as shown in
Table 63.2.


63.6 Conclusions and Discussion


In this chapter we have examined the relative performance
of a number of different graph coloring recombination
operators. Using a common evolutionary framework, we have
seen that this performance varies, particularly due to the
underlying structures of the graphs being tackled.

A desirable property of recombination is that it should be
able to combine meaningful substructures of existing
candidate solutions (parents) in the production of new,
hopefully fitter, offspring. However, does that process
actually occur with any of these operators? Or, by involving
the random reassignment of some vertices, do the operators
simply provide a mechanism by which large random
perturbations are periodically applied to a solution,
helping to re-invigorate the search process?


Again, the answer to such a question seems to de- 
pend on the problem instance at hand. In Table 63.4 
we compare the costs of solutions achieved by the best 
available recombination operator for each instance, to- 
gether with those produced by a corresponding random 
perturbation operator. Specifically, for each graph we 
identified the best run from the EA’s sample of 20 and 
recorded the number of uncolored vertices that resulted 
in each application of recombination. We then used 
these figures, together with the same k-value, to specify 
the number of vertices that would be randomly selected 
and reassigned in each corresponding application of our 
random perturbation operator. In each iteration this
algorithm then operated by selecting two parents, making
a copy of parent 1, randomly perturbing this copy, applying
local search, and finally replacing the weaker of the two
parents.


Table 63.4 Comparison of the best EA and corresponding random perturbation operator. (Cost of best solutions using f2 (63.2); mean (min; median; max) from 20 runs, and proportion of runs where f2 = 0 (feasibility) was achieved)

                    EA                          Random
Graph  k    Type    Cost              Feas.     Cost                   Feas.
#1     85   MULTIX  0.00 (0; 0; 0)    1.00      16.80 (4; 17.5; 31)    0.00
#2     12   GPX     2.40 (0; 2; 4)    0.05      7.60 (5; 8; 10)        0.00
#3     105  GPX     0.90 (0; 1; 2)    0.40      1.75 (0; 2; 3)         0.15
#4     2    GGA     1.10 (0; 1; 2)    0.15      1.35 (0; 1; 2)         0.05
#5     32   GGA     1.75 (0; 2; 3)    0.05      1.50 (0; 1.5; 3)       0.15

The results in Table 63.4 indicate that, for graph 
#1, recombination is clearly doing more than just ran- 
domly perturbing solutions since all runs have resulted 
in feasible 85-colorings. However, although recombina- 
tion has achieved significantly lower costs with graph 
#2, the proportion of runs where feasibility has been 
achieved shows no significant difference for any of the 
graphs #2 to #5 (according to McNemar’s test at signif- 
icance level 0.01). We find this observation compelling 
as it might suggest that better results might ultimately 
be achieved using schemes that make more informed 
decisions about the size and frequency of perturbations. 
Indeed, currently the size of random perturbations tends 
to fall as the run progresses (Fig. 63.6); however, it may 
be useful to allow this trend to be reversed, particularly 
if improvements are not achieved for a lengthy period of 
time. In addition, the way in which vertices are chosen 
for random reassignment might also influence perfor- 
mance — for example, we might target those belonging 
to a specific color, those that are causing clashes, those 
that have been assigned to a particular color for the 
longest, and so on. This requires further research. 

An interesting point regarding the structure of solutions
was raised previously by Porumbel et al. [63.35], who
considered the sizes of the color classes. Specifically,
they propose that when solutions involve a small number of
large color classes (such as graph #2 in our case), good
quality colorings tend to result through the identification
of large independent sets. On the other hand, if a solution
involves many small color classes, quality is determined
more by the productive interaction between classes. In other
words, the proposal is that small independent sets in
isolation do not constitute good features in these cases;
rather, quality results from appropriate combinations of
these sets. Such an observation might provide evidence as to
why the GGA recombination has outperformed GPX with graph #3
because, unlike GPX, it does not require half of the color
classes to be inherited from each parent, thus potentially
allowing more class-combinations to be considered. However,
this argument is countered by the fact that, according to
Table 63.4, GGA has not outperformed the random perturbation
operator, suggesting that it is actually this mechanism that
influences the search. Clearly, further research in this
area is also required.

Given such observations, another important avenue of future
research will be to increase our understanding of the links
between a graph's structure and the best algorithms that can
then be used to color it. This might, for example, be
derived by increasing our understanding of the behavior,
strengths, and weaknesses of the various algorithmic
operators available for graph coloring, and also via more
empirical means such as data mining, as discussed by
Smith-Miles and Lopes [63.55].


References


63.1 N. Zufferey, P. Amstutz, O. Giaccari: Graph colouring approaches for a satellite range scheduling problem, J. Sched. 11(4), 263-277 (2008)

63.2 M. Carter: A survey of practical applications of examination timetabling algorithms, Oper. Res. 34(2), 193-202 (1986)

63.3 R. Lewis: A survey of metaheuristic-based techniques for university timetabling problems, OR Spectrum 30(1), 167-190 (2008)

63.4 R. Lewis, J. Thompson: On the application of graph colouring techniques in round-robin sports scheduling, Comput. Oper. Res. 38(1), 190-204 (2010)

63.5 K. Aardal, S. van Hoesel, A. Koster, C. Mannino, A. Sassano: Models and solution techniques for the frequency assignment problems, 4OR: Q. J. Belg. Fr. Ital. Oper. Res. Soc. 1(4), 1-40 (2002)

63.6 C.M. Valenzuela: A study of permutation operators for minimum span frequency assignment using an order based representation, J. Heuristics 7, 5-21 (2001)

63.7 K. Appel, W. Haken: Solution of the four color map problem, Sci. Am. 237(4), 108-121 (1977)

63.8 M. Gamache, A. Hertz, J. Ouellet: A graph coloring model for a feasibility problem in monthly crew scheduling with preferential bidding, Comput. Oper. Res. 34, 2384-2395 (2007)

63.9 G. Chaitin: Register allocation and spilling via graph coloring, ACM SIGPLAN Notices 39(4), 66-74 (2004)

63.10 M.R. Garey, D.D. Johnson: Computers and Intractability - A guide to NP-completeness, 1st edn. (W. H. Freeman, San Francisco 1979)

63.11 M. Karp: Reducibility among combinatorial problems. In: Complexity of Computer Computations, The IBM Research Symposia Series, Vol. 1972, ed. by R.E. Miller, J.W. Thatcher, J.D. Bohlinger (Plenum Press, New York 1972) pp. 85-103

63.12 D. Welsh, M. Powell: An upper bound for the chromatic number of a graph and its application to timetabling problems, Comput. J. 12, 317-322 (1967)

63.13 D. Brélaz: New methods to color the vertices of a graph, Commun. ACM 22(4), 251-256 (1979)

63.14 P. Spinrad, G. Vijayan: Worse case analysis of a graph colouring algorithm, Discrete Appl. Math. 12, 89-92 (1984)

63.15 R. Brown: Chromatic scheduling and the chromatic number problem, Manag. Sci. 19(4), 451-463 (1972)

63.16 S. Korman: The graph-colouring problem. In: Combinatorial Optimization, ed. by N. Christofides, A. Mingozzi, P. Toth, C. Sandi (Wiley, New York 1979) pp. 211-235

63.17 M. Kubale, B. Jackowski: A generalized implicit enumeration algorithm for graph colouring, Commun. ACM 28(28), 412-418 (1985)

63.18 J. Culberson, F. Luo: Exploring the k-colorable landscape with iterated greedy, Proc. 2nd DIMACS Implement. Chall. (1996), pp. 245-284

63.19 C. Mumford: New order-based crossovers for the graph coloring problem, Lect. Notes Comput. Sci. 4193, 880-889 (2006)

63.20 E. Erben: A grouping genetic algorithm for graph colouring and exam timetabling, Lect. Notes Comput. Sci. 2079, 132-158 (2001)

63.21 R. Lewis: A general-purpose hill-climbing method for order independent minimum grouping problems: A case study in graph colouring and bin packing, Comput. Oper. Res. 36(7), 2295-2310 (2009)

63.22 M. Chams, A. Hertz, O. Dubuis: Some experiments with simulated annealing for coloring graphs, Eur. J. Oper. Res. 32, 260-266 (1987)

63.23 D. Johnson, C. Aragon, L. McGeoch, C. Schevon: Optimization by simulated annealing: An experimental evaluation; part II, graph coloring and number partitioning, Oper. Res. 39, 378-406 (1991)

63.24 A. Hertz, D. de Werra: Using tabu search techniques for graph coloring, Computing 39(4), 345-351 (1987)

63.25 M. Laguna, R. Martí: A GRASP for coloring sparse graphs, Comput. Optim. Appl. 19, 165-178 (2001)

63.26 M. Chiarandini, T. Stützle: An application of iterated local search to graph coloring, Proc. Comput. Symp. Graph Color. Gen. (2002) pp. 112-125

63.27 L. Paquete, T. Stützle: An experimental investigation of iterated local search for coloring graphs, Applications of Evolutionary Computing, Lect. Notes Comput. Sci. 2279, 121-130 (2002)

63.28 C. Avanthay, A. Hertz, N. Zufferey: A variable neighborhood search for graph coloring, Eur. J. Oper. Res. 151, 379-388 (2003)

63.29 J. Thompson, K. Dowsland: An improved ant colony optimisation heuristic for graph colouring, Discrete Appl. Math. 156, 313-324 (2008)

63.30 R. Dorne, J.-K. Hao: A new genetic local search algorithm for graph coloring, Lect. Notes Comput. Sci. 1498, 745-754 (1998)

63.31 A.E. Eiben, J.K. van der Hauw, J.I. van Hemert: Graph coloring with adaptive evolutionary algorithms, J. Heuristics 4(1), 25-46 (1998)

63.32 C. Fleurent, J. Ferland: Genetic and hybrid algorithms for graph colouring, Ann. Oper. Res. 63, 437-461 (1996)

63.33 P. Galinier, J.-K. Hao: Hybrid evolutionary algorithms for graph coloring, J. Comb. Optim. 3, 379-397 (1999)

63.34 Z. Lü, J.-K. Hao: A memetic algorithm for graph coloring, Eur. J. Oper. Res. 203(1), 241-250 (2010)

63.35 D. Porumbel, J.-K. Hao, P. Kuntz: An evolutionary approach with diversity guarantee and well-informed grouping recombination for graph coloring, Comput. Oper. Res. 37, 1822-1832 (2010)

63.36 C. Morgenstern: Distributed coloration neighborhood search, Discrete Math. Theor. Comput. Sci. 26, 335-358 (1996)

63.37 I. Blöchliger, N. Zufferey: A graph coloring heuristic using partial solutions and a reactive tabu scheme, Comput. Oper. Res. 35, 960-975 (2008)

63.38 E. Malaguti, M. Monaci, P. Toth: A metaheuristic approach for the vertex coloring problem, INFORMS J. Comput. 20(2), 302-316 (2008)

63.39 A. Hertz, M. Plumettaz, N. Zufferey: Variable space search for graph coloring, Discrete Appl. Math. 156(13), 2551-2560 (2008)

63.40 N.J. Radcliffe: Forma analysis and random respectful recombination, Proc. 4th Int. Conf. Genet. Algorithms (1991) pp. 222-229

63.41 E. Falkenauer: Genetic Algorithms and Grouping Problems, 1st edn. (Wiley, New York 1998)

63.42 E. Coll, G. Duran, P. Moscato: A discussion on some design principles for efficient crossover operators for graph coloring problems, An. XXVII Simp. Bras. Pesqui. Oper. (1995)

63.43 R. Abbasian, M. Mouhoub, A. Jula: Solving graph coloring problems using cultural algorithms, Proc. 24th Florida Artif. Intell. Res. Soc. Conf. (2011)

63.44 J. Munkres: Algorithms for the assignment and transportation problems, J. Soc. Ind. Appl. Math. 5(1), 32-38 (1957)

63.45 D. Bertsekas: Auction algorithms for network flow problems: A tutorial introduction, Comput. Optim. Appl. 1, 7-66 (1992)

63.46 A. Tucker, J. Crampton, S. Swift: RGFGA: An efficient representation and crossover for grouping genetic algorithms, Evol. Comput. 13(4), 477-499 (2005)

63.47 R. Lewis, E. Pullin: Revisiting the restricted growth function genetic algorithm for grouping problems, Evol. Comput. 19(4), 693-704 (2011)

63.48 E. Falkenauer: A new representation and operators for genetic algorithms applied to grouping problems, Evol. Comput. 2(2), 123-144 (1994)

63.49 C. Glass, A. Prügel-Bennett: Genetic algorithms for graph coloring: Exploration of Galinier and Hao's algorithm, J. Comb. Optim. 7, 229-236 (2003)

63.50 R. Lewis, J. Thompson, C. Mumford, J. Gillard: A wide-ranging computational comparison of high-performance graph colouring algorithms, Comput. Oper. Res. 39(9), 1933-1950 (2012)

63.51 P. Galinier, A. Hertz: A survey of local search algorithms for graph coloring, Comput. Oper. Res. 33, 2547-2562 (2006)

63.52 J. Culberson: Graph coloring page, http://web.cs.ualberta.ca/~joe/Coloring/ (2010)

63.53 M. Carter, G. Laporte, S.Y. Lee: Examination timetabling: Algorithmic strategies and applications, J. Oper. Res. Soc. 47, 373-383 (1996)

63.54 T. Hogg, B. Huberman, C. Williams: Refining the phase transition in combinatorial search, Artif. Intell. 81(1/2), 127-154 (1996)

63.55 K. Smith-Miles, L. Lopes: Measuring instance difficulty for combinatorial optimization problems, Comput. Oper. Res. 39(5), 875-889 (2012)


64. Metaheuristic Algorithms and Tree Decomposition 


Thomas Hammerl, Nysret Musliu, Werner Schafhauser 


This chapter deals with the application of evo- 
lutionary approaches and other metaheuristic 
techniques for generating tree decompositions. 
Tree decomposition is a concept introduced by 
Robertson and Seymour [64.1] and it is used to 
characterize the difficulty of constraint satisfaction 
and NP-hard problems that can be represented 
as a graph. Although, in general, no polynomial 
algorithms have been found for such problems, 
particular instances can be solved in polyno- 
mial time if the treewidth of their corresponding 
graph is bounded by a constant. The process of 
solving problems based on tree decomposition 
comprises two phases. First, a decomposition with 
small width is generated. Basically in this phase 
the problem is divided into several subproblems, 
each included in one of the nodes of the tree de- 
composition. The second phase includes solving 
a problem (based on the generated tree decompo- 
sition) with a particular algorithm such as dynamic 
programming. The main idea is that by decompos- 
ing a problem into subproblems of limited size, 
the whole problem can be solved more efficiently. 
The time for solving the problem based on its tree 
decomposition usually depends on the width of 
the tree decomposition. Thus, it is of high inter- 
est to generate tree decompositions having small 
widths. 

Finding the treewidth of a graph is an NP-hard
problem [64.2]. In order to solve this problem,
different algorithms have been proposed in the
literature. Exact methods such as branch and
bound techniques can be used only for small
graphs. Therefore, metaheuristic algorithms based
on genetic algorithms [64.3], simulated annealing
[64.4], tabu search [64.5], iterated local
search [64.6], and ant colony optimization
(ACO) [64.7, 8] have been proposed in the literature
to generate good upper bounds for larger graphs.
Such techniques have been applied very successfully
and they are able to find the best existing
upper bounds for many benchmark problems in
the literature.

In this chapter, we will first introduce the
concept of tree decomposition, and then give
a survey on metaheuristic techniques used to
generate tree decompositions. Three approaches
based on genetic algorithms, iterated local search,
and ACO that were proposed in the literature
will be described in detail. Finally, we will also
mention briefly two recent approaches that exploit
tree decompositions within metaheuristic
search.


64.1 Tree Decompositions
     64.1.1 Formal Definitions
64.2 Generating Tree Decompositions by Metaheuristic Techniques
     64.2.1 Genetic Algorithms for Tree Decomposition
     64.2.2 Ant Colony Optimization for Tree Decomposition
     64.2.3 Iterated Local Search for Tree Decomposition
     64.2.4 Other Techniques for Tree Decomposition
     64.2.5 Comparison of Algorithms for Tree Decomposition
     64.2.6 Application of Tree Decomposition in Metaheuristic Techniques
64.3 Conclusion
References




64.1 Tree Decompositions 


We start with an informal description of tree decompo- 
sition. Suppose that we have to find solutions for the 
graph coloring problem (GCP), which is a well-known 
constraint satisfaction problem (CSP) in the literature. 
For this problem, we have to find a coloring of vertices 
of a given graph in such a way that no two vertices con- 
nected by an edge share the same color. An instance of 
the GCP is shown on the left-hand side of Fig. 64.1. The 
task is now to find a valid coloring just using the colors 
red, green, and blue. 

A naive approach to solve this problem might be
to try out all possible combinations of variable
assignments and see which ones are valid. In general, there
are $d^n$ possible combinations, where $d$ is the number of
available colors and $n$ is the number of vertices.

To solve this problem by tree decomposition, we first
generate a tree decomposition of the corresponding
problem graph. Informally, a tree decomposition is
a tree in which each node contains a group of graph
vertices and the following conditions are fulfilled:
each vertex of the graph appears in at least one of the
nodes of the tree; if two vertices are connected in the
graph, they must appear together in at least one of the
tree nodes; and the connectedness condition must hold,
i.e., if a vertex appears in two different nodes of the
tree, it must also appear in all nodes on the path between
these two nodes. The formal definition of tree
decomposition is given in the next section.

Fig. 64.1a,b Instance of the graph coloring problem and
a possible tree decomposition


The constraint graph of a coloring problem and a possible
tree decomposition are shown in Fig. 64.1a,b. If we want to
solve the graph coloring problem based on this tree
decomposition, we can start by solving the subproblems
given by each node in the tree decomposition. Using the
naive approach of trying out all possible combinations of
variable assignments, one has to generate $3^3 = 27$
solution candidates for the tree node containing A, B,
and C. Because of the constraints $A \neq B$, $A \neq C$,
and $B \neq C$, only six of them are valid. For the
subproblem containing the vertices C and D we generate
$3^2 = 9$ solution candidates and rule out three of them
because of the constraint $C \neq D$. We can now obtain all
solutions to the whole problem by joining the subproblem
solutions. To do so, we look at the variables that both
subproblems have in common; in this case, that is the
variable C. Each solution for the subproblem A, B, C is
joined with the solutions for the subproblem C, D that
share the same color for the vertex C. By using the tree
decomposition, we have to generate $27 + 9 = 36$
combinations of variable assignments in order to determine
all solutions, compared to the $3^4 = 81$ combinations we
would have to generate without the tree decomposition.
This difference increases very quickly with the size of
the graph coloring problem and of constraint satisfaction
problems in general. The smaller the subproblems in the
tree decomposition, the more efficiently we can solve
a particular problem. This motivates our interest in
finding tree decompositions of small width.
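The counting argument above can be reproduced with a short sketch. It assumes the instance described in the text (a triangle A, B, C plus the edge C-D, and three colors); the function names `solve_subproblem` and `join` are hypothetical and chosen only for this illustration.

```python
from itertools import product

COLORS = ("red", "green", "blue")

def solve_subproblem(variables, constraints):
    """Enumerate all assignments; keep those where each constrained pair differs."""
    candidates = [dict(zip(variables, values))
                  for values in product(COLORS, repeat=len(variables))]
    valid = [a for a in candidates
             if all(a[x] != a[y] for x, y in constraints)]
    return candidates, valid

def join(solutions_a, solutions_b):
    """Combine partial solutions that agree on all shared variables."""
    joined = []
    for a in solutions_a:
        for b in solutions_b:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined

cand_abc, valid_abc = solve_subproblem(
    ("A", "B", "C"), [("A", "B"), ("A", "C"), ("B", "C")])
cand_cd, valid_cd = solve_subproblem(("C", "D"), [("C", "D")])
solutions = join(valid_abc, valid_cd)

print(len(cand_abc) + len(cand_cd))   # 36 candidates generated in total
print(len(valid_abc), len(valid_cd))  # 6 6
print(len(solutions))                 # 12 valid colorings of the whole graph
```

The join only compares the shared variable C, which is why the work is governed by the sizes of the tree nodes rather than by the size of the whole graph.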

Note that tree decompositions have been used in
several applications, such as combinatorial optimization
problems, expert systems, and computational biology.
The use of tree decomposition for inference problems
in probabilistic networks is shown in [64.9].
Koster et al. [64.10] propose the application of tree
decompositions to frequency assignment problems. Tree
decomposition has also been applied to the vertex
cover problem on planar graphs [64.11]. Furthermore,
solving partial constraint satisfaction problems (e.g.,
MAX-SAT) with tree-decomposition-based methods has
been investigated in [64.12]. In computational biology,
tree decompositions have been used for protein structure
prediction [64.13]. Recently, the application of tree
decomposition in Answer-Set Programming has been
investigated in [64.14].


Fig. 64.2a,b A graph G (a) and a tree decomposition
of G (b)


64.1.1 Formal Definitions 


The concept of tree decompositions was first introduced
by Robertson and Seymour [64.1]. The formal
definition of tree decomposition is given as follows
[64.1, 15].


Definition 64.1

Let $G = (V, E)$ be a graph. A tree decomposition of $G$
is a pair $(T, \chi)$, where $T = (I, F)$ is a tree with node
set $I$ and edge set $F$, and $\chi = \{\chi_i : i \in I\}$ is a family of
subsets of $V$, one for each node of $T$, such that:

1. $\bigcup_{i \in I} \chi_i = V$,

2. for every edge $(v, w) \in E$, there is an $i \in I$ with
   $v \in \chi_i$ and $w \in \chi_i$, and

3. for all $i, j, k \in I$, if $j$ is on the path from $i$ to $k$ in $T$,
   then $\chi_i \cap \chi_k \subseteq \chi_j$.

The width of a tree decomposition is $\max_{i \in I} |\chi_i| - 1$.
The treewidth of a graph $G$, denoted by $tw(G)$, is the
minimum width over all possible tree decompositions
of $G$.
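The three conditions of Definition 64.1 can be checked mechanically. The following is a minimal sketch under stated assumptions: bags are given as a dictionary from tree node to vertex set, the tree as a list of node pairs, and condition 3 is tested in its equivalent form (for every vertex, the tree nodes containing it form a connected subtree). The function names are hypothetical.

```python
def _connected(nodes, adj):
    """True if `nodes` induce a connected subgraph of the tree given by `adj`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def is_tree_decomposition(vertices, edges, bags, tree_edges):
    adj = {i: set() for i in bags}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    # T must be a tree: |I| - 1 edges and connected.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, adj):
        return False
    # Condition 1: the bags cover all vertices.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every edge is contained in some bag.
    for v, w in edges:
        if not any(v in bag and w in bag for bag in bags.values()):
            return False
    # Condition 3: the nodes containing each vertex form a connected subtree.
    return all(_connected([i for i in bags if v in bags[i]], adj)
               for v in vertices)

def width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# The coloring instance from Sect. 64.1: triangle A, B, C plus edge C-D.
V = ["A", "B", "C", "D"]
E = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
good = {0: {"A", "B", "C"}, 1: {"C", "D"}}
bad = {0: {"A", "B", "C"}, 1: {"C", "D"}, 2: {"A", "D"}}  # A skips node 1
```

With the tree edge `(0, 1)` the first decomposition is valid and has width 2; the second violates condition 3 when the nodes are chained as 0-1-2.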


Figure 64.2 shows a graph G and a possible tree
decomposition of G. The width of the tree decomposition
shown is 3.

For a given graph G, the treewidth can be found
from its triangulation. In the following, we give basic
definitions, explain how the triangulation of a graph
can be constructed, and give lemmas that relate the
treewidth to the triangulated graph.

Two vertices $u$ and $v$ of a graph $G = (V, E)$ are
neighbors if they are connected by an edge $e \in E$. The
neighborhood of a vertex $v$ is defined as
$N(v) := \{w \mid w \in V, (v, w) \in E\}$. A set of vertices is
a clique if its vertices are fully connected. An edge
connecting two nonadjacent vertices of a cycle is called
a chord. A graph is triangulated if there exists a chord
in every cycle of length larger than 3.

A vertex of a graph is simplicial if its neighbors
form a clique. An ordering $\sigma(1, 2, \ldots, n)$ of the nodes
of $V$ is called a perfect elimination ordering for $G$ if
for any $i \in \{1, 2, \ldots, n\}$, $\sigma(i)$ is a simplicial vertex in
$G[\sigma(i), \ldots, \sigma(n)]$ [64.16]. In [64.17] it is proved that
a graph $G$ is triangulated if and only if it has a perfect
elimination ordering. Given an elimination ordering of
nodes, the triangulation $H$ of graph $G$ can be constructed
as follows. Initially $H = G$; then, in the process of
eliminating vertices, the next vertex in the ordering to be
eliminated is made a simplicial vertex by adding new
edges that connect all of its neighbors in the current $G$
and in $H$. The vertex is then eliminated from $G$. This
process is repeated for all vertices in the ordering.
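The construction of the triangulation from an elimination ordering can be sketched as follows. This is an illustrative implementation, not the one from [64.15]; the function name `triangulate`, the adjacency-set representation, and the 4-cycle example are assumptions made here.

```python
def triangulate(vertices, edges, ordering):
    """Return (fill_edges, h), where h is the adjacency of the triangulation H."""
    g = {v: set() for v in vertices}
    for u, w in edges:
        g[u].add(w)
        g[w].add(u)
    h = {v: set(nbrs) for v, nbrs in g.items()}  # initially H = G
    fill_edges = []
    for v in ordering:
        nbrs = sorted(g[v])
        # Make v simplicial: connect every pair of its current neighbors.
        for idx, a in enumerate(nbrs):
            for b in nbrs[idx + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    h[a].add(b)
                    h[b].add(a)
                    fill_edges.append((a, b))
        # Eliminate v from the working graph G (H keeps all vertices).
        for a in nbrs:
            g[a].discard(v)
        del g[v]
    return fill_edges, h

# Hypothetical example: the 4-cycle 1-2-3-4-1 has no chord.
fill, h = triangulate([1, 2, 3, 4],
                      [(1, 2), (2, 3), (3, 4), (4, 1)],
                      [1, 2, 3, 4])
print(fill)  # [(2, 4)]
```

Eliminating vertex 1 first forces its two neighbors 2 and 4 to be joined, which supplies the chord the 4-cycle was missing.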

The treewidth of a triangulated graph can be calculated
based on its cliques. For a given triangulated
graph, the treewidth is equal to the size of its largest
clique minus 1 [64.18]. Moreover, the largest clique of
a triangulated graph can be calculated in polynomial time.
The complexity of calculating the largest clique for
triangulated graphs is $O(|V| + |E|)$ [64.18]. For every
graph $G = (V, E)$, there exists a triangulation of $G$,
$\bar{G} = (V, E \cup E_t)$, with $tw(\bar{G}) = tw(G)$. Thus, finding
the treewidth of a graph $G$ is equivalent to finding
a triangulation $\bar{G}$ of $G$ with the minimum clique size (for
more information see [64.15]).
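Since each eliminated vertex together with its current neighbors forms a clique of the triangulation, the width obtained from an ordering can be read off during elimination without building the cliques explicitly. The sketch below relies on that standard observation; the name `ordering_width` and the path example are assumptions for illustration.

```python
def ordering_width(vertices, edges, ordering):
    """Width of the tree decomposition induced by an elimination ordering."""
    g = {v: set() for v in vertices}
    for u, w in edges:
        g[u].add(w)
        g[w].add(u)
    best = 0
    for v in ordering:
        nbrs = set(g[v])
        # {v} plus its neighbors is a clique of size |nbrs| + 1 in the triangulation.
        best = max(best, len(nbrs))
        for a in nbrs:
            g[a] |= nbrs - {a}   # connect all neighbors of v
            g[a].discard(v)      # and remove v from the working graph
        del g[v]
    return best

# Hypothetical example: the path 1-2-3.
path_vertices, path_edges = [1, 2, 3], [(1, 2), (2, 3)]
print(ordering_width(path_vertices, path_edges, [1, 2, 3]))  # 1
print(ordering_width(path_vertices, path_edges, [2, 1, 3]))  # 2
```

The two orderings of the same path give different widths, which is why the search for a good decomposition can be cast as a search over vertex permutations.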

The process of eliminating nodes from a given
graph G is illustrated in Fig. 64.3. Suppose that we are
given the following elimination ordering: 10, 9, 8, 7, 2,
3, 6, 1, 5, 4. Vertex 10 is eliminated from G first.
When this vertex is eliminated, no new edges are added
to the graphs G and H (graph H is not shown in the
figure), as all neighbors of node 10 are connected. From
the remaining graph G, vertex 9 is eliminated. To
connect all neighbors of vertex 9, two new edges are
added to G and H (edges (5, 7) and (6, 7)). The process
of elimination continues until the triangulation H is
obtained. A more detailed description of the algorithm for
constructing a graph's triangulation for a given
elimination ordering can be found in [64.15].

Fig. 64.3 Elimination of vertices 10, 9, 8, 7, 2, 3, 6, 1,
5, 4. When a vertex is eliminated, a tree node containing
the eliminated vertex and its neighbors is created


To generate the tree decomposition during the
vertex elimination process, the nodes of the tree
decomposition are created first. This is illustrated in
Fig. 64.3. When vertex 10 is eliminated, a new tree
decomposition node is created. This node contains vertex
10 and all other vertices that are connected with
this vertex in the current graph G. Next, the tree node
with vertices {5, 6, 7, 9} is created when vertex 9 is
eliminated. By the end of the elimination process, all
tree decomposition nodes have been created. The
created tree nodes must be connected such that the
connectedness condition for vertices is fulfilled. This
is the third condition in the tree decomposition
definition. To fulfil this condition, the tree decomposition
nodes are connected as follows. The tree decomposition
node with vertices {7, 9, 10}, which is created when
vertex 10 is eliminated, is connected with the tree
decomposition node that is created when the next
vertex appearing in {7, 9, 10} is eliminated. In this
case, the node {7, 9, 10} is connected with the
node created when vertex 9 is eliminated, because this
is the next vertex in the ordering that is contained in
{7, 9, 10}. This rule is then applied to connect the
other tree decomposition nodes, and in this way the
tree decomposition in Fig. 64.2 is constructed from the
graph. Note that some of the tree nodes that are created
in the elimination process are not shown in the tree
decomposition, because they are contained in larger tree
nodes. For example, the node {4, 5, 6}, which is created
by eliminating vertex 6, is already contained in the
node {4, 5, 6, 7}, which is created by eliminating vertex
7. Moreover, the tree nodes that are created by
eliminating vertices 1, 5, and 4 are also contained in
other, larger tree nodes.
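The node creation and connection rule described above can be sketched as follows. This is a simplified illustration that assumes a connected input graph and does not remove tree nodes contained in larger ones; the function name and the small example graph (not the graph of Fig. 64.2, whose edges are not listed in the text) are assumptions.

```python
def tree_decomposition_from_ordering(vertices, edges, ordering):
    """Return (bags, tree_edges); bags[v] is the node created when v is eliminated."""
    g = {v: set() for v in vertices}
    for u, w in edges:
        g[u].add(w)
        g[w].add(u)
    position = {v: idx for idx, v in enumerate(ordering)}
    bags, tree_edges = {}, []
    for v in ordering:
        nbrs = set(g[v])
        bags[v] = {v} | nbrs              # eliminated vertex plus its neighbors
        for a in nbrs:
            g[a] |= nbrs - {a}            # make v simplicial
            g[a].discard(v)               # then eliminate it
        del g[v]
        if nbrs:
            # Connect to the node of the next vertex of the bag in the ordering.
            parent = min(nbrs, key=position.__getitem__)
            tree_edges.append((v, parent))
    return bags, tree_edges

# Hypothetical example: triangle 1-2-3 with a pendant vertex 4 attached to 3.
bags, tree_edges = tree_decomposition_from_ordering(
    [1, 2, 3, 4], [(1, 2), (1, 3), (2, 3), (3, 4)], [4, 1, 2, 3])
print(bags)        # {4: {3, 4}, 1: {1, 2, 3}, 2: {2, 3}, 3: {3}}
print(tree_edges)  # [(4, 3), (1, 2), (2, 3)]
```

In this example the nodes created for vertices 2 and 3 are contained in the node {1, 2, 3}, so a cleaned-up decomposition would keep only {1, 2, 3} and {3, 4}.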


64.2 Generating Tree Decompositions by Metaheuristic Techniques 


As described in the previous section, the width of the
tree decomposition depends on the elimination ordering
of vertices. Therefore, the task of finding a tree
decomposition with minimal width consists of finding the
best permutation of the graph vertices. This problem is
similar to the traveling salesman problem, but with
a different objective function.

In the last two decades, researchers have proposed
different techniques to find tree decompositions
for different benchmark examples. These include
exact techniques based on tree search and branch
and bound, simple greedy techniques, and metaheuristic
techniques. In this chapter, we focus on metaheuristic
techniques. At the end of this section, we will
also briefly describe other approaches used for tree
decompositions.

The metaheuristic techniques applied to tree
decomposition can be divided into two groups:
population-based/nature-inspired techniques and local
search techniques. Regarding nature-inspired techniques,
the application of genetic algorithms has been investigated
in [64.19, 20], and ACO has been used in [64.21].
Examples of local search techniques for tree
decompositions are [64.16, 22, 23].


64.2.1 Genetic Algorithms 
for Tree Decomposition 


The application of genetic algorithms to tree
decomposition was first investigated in [64.19]. This
algorithm tried to minimize a weight associated with the
decompositions of Bayesian networks, which is not
exactly the same as the width of the tree decomposition.
In [64.20], this algorithm was extended to generate
hypertree decompositions and, with some changes
to the fitness function (the width of the tree
decomposition was used as the objective function), was
tested on different problems from the literature. The
following description of the genetic algorithm for tree
decomposition is based on our previous work in [64.20].




Genetic algorithms (GAs) were introduced in
[64.3]. They try to find a good solution for an
optimization problem by imitating the principle of evolution.
Genetic algorithms alter and select individuals from
a population of solutions for the optimization problem.
In the following, we describe frequently used terms
within the field of genetic algorithms:


Population ... set of candidate solutions.

Individual ... a single candidate solution.

Chromosome ... set of parameters determining the
properties of a solution.

Gene ... a single parameter.


A genetic algorithm aims to optimize the value of
the objective function of an optimization problem, which
in the terminology of genetic algorithms is also called
the fitness function. At the beginning, a genetic
algorithm creates an initial population containing randomly
or heuristically created individuals. These individuals
are evaluated and assigned a fitness value, which is the
value of the fitness function for the solution represented
by the individual. The population is evolved over
a number of generations until a halting criterion is
satisfied. At each generation, the population undergoes
selection, recombination (also called crossover), and
mutation.

During the selection process, the genetic algorithm
decides which individuals from the current population
are allowed to enter the next population. This decision
is based on the fitness values of the individuals:
individuals of better fitness should enter the next
population with higher probability than individuals of
lower fitness. Individuals that are not selected are
discarded and will not be evolved further.

The recombination process, or crossover, combines
different properties of several parent solutions within
one or more child solutions, also called offspring.
Crossover exchanges properties between the
individuals with the aim of increasing the average
quality of the population.

During the mutation process, individuals are 
slightly altered. Mutation is used to explore new regions 
of the search space and to avoid early convergence to 
local optima. 

In practice, parameters are used in order to control the 
behavior of a genetic algorithm. Typical control param- 
eters are mutation rate, crossover rate, population size, 
and parameters for selection techniques. The choice of 
the control parameters has a crucial effect on the quality 
of the best solution found by a genetic algorithm. 

The genetic algorithm for tree decomposition pre- 
sented below is named GA-tw and was implemented 


in [64.20]. Algorithm 64.1 presents algorithm GA-tw in 
pseudo code notation. 

The algorithm takes as input a graph and several 
control parameters. Individual solutions are vertex or- 
derings. Each individual is assigned the width of the 
tree decomposition returned from the corresponding 
vertex ordering as its fitness value. 
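
The fitness evaluation can be illustrated with a short Python sketch. It is not taken from [64.20]; the function name ordering_width and the adjacency-dictionary graph format are assumptions made for illustration. The sketch eliminates the vertices in the given order, turns the remaining neighbors of each eliminated vertex into a clique, and reports the largest neighborhood encountered, which is the width of the tree decomposition induced by the ordering.

```python
def ordering_width(adj, order):
    """Width of the tree decomposition induced by an elimination ordering.

    adj   -- dict mapping each vertex to the set of its neighbors (assumed format)
    order -- list containing every vertex of the graph exactly once
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    width = 0
    for v in order:
        nbrs = graph.pop(v)
        # The bag created for v holds v and its current neighbors.
        width = max(width, len(nbrs))
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= nbrs - {u}  # connect the neighbors of v pairwise
    return width


# A 4-cycle a-b-c-d-a and a path a-b-c as small test graphs.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(ordering_width(cycle, ["a", "b", "c", "d"]))  # 2
print(ordering_width(path, ["a", "b", "c"]))        # 1
print(ordering_width(path, ["b", "a", "c"]))        # 2
```

The path example shows why the ordering matters: eliminating the middle vertex first produces a wider decomposition than eliminating the end vertices first.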

Initially GA-tw generates a population consisting 
of randomly created individuals. Tournament selection 
was chosen as the selection technique. Tournament 
selection selects an individual by randomly choosing 
a group of several individuals from the former popula- 
tion. The individual of highest fitness (smallest width) 
within this group is selected to join the next popula- 
tion. This process is applied until enough individuals 
have entered the next population. Finally, after a certain 
number of generations, algorithm GA-tw will return the 
best fitness (smallest width) of an individual found dur- 
ing the search process. 
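
A minimal sketch of tournament selection as described above is given next. The function name, the list-based population, and the precomputed widths list are assumptions for illustration and do not reproduce the implementation of [64.20].

```python
import random


def tournament_selection(population, widths, s, rng):
    """Build the next population by repeated tournaments of size s.

    population -- list of vertex orderings (lists)
    widths     -- widths[i] is the width obtained from population[i]
    s          -- tournament size (group size), assumed <= len(population)
    rng        -- random.Random instance, passed in for reproducibility
    """
    selected = []
    for _ in range(len(population)):
        group = rng.sample(range(len(population)), s)   # random group of s individuals
        winner = min(group, key=lambda i: widths[i])    # smallest width = highest fitness
        selected.append(list(population[winner]))       # copy, so later mutation is safe
    return selected


rng = random.Random(1)
population = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
widths = [1, 2, 1]
next_population = tournament_selection(population, widths, s=2, rng=rng)
print(len(next_population))
```

A larger group size s increases the selection pressure: with s equal to the population size, only the best individual is ever selected.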


Crossover and Mutation Operators 
Within the genetic algorithms in [64.20] nearly all types 
of crossover operators and all mutation operators were 
implemented. The same operators were also applied 
in [64.19] for decomposing the moral graph of Bayesian 
networks. 


Algorithm 64.1 Genetic Algorithm for Tree Decompositions - GA-tw
Input: a graph G = (V, E)
       control parameters for the GA: n, p_m, p_c, s, and max_iterations
Output: an upper bound on the treewidth of the graph
t = 0
initialize(population(t), n)
evaluate population(t)
while t < max_iterations do
    t = t + 1
    population(t) = tournament_selection(population(t - 1), s)
    recombine(population(t), p_c)
    mutate(population(t), p_m)
    evaluate population(t)
end while
return the smallest width found during the search


Crossover Operators
• Partially mapped crossover (PMX)
• Cycle crossover (CX)
• Order crossover (OX1)
• Order-based crossover (OX2)
• Position-based crossover (POS)
• Alternating-position crossover (AP).


Mutation Operators
• Displacement mutation operator (DM)
• Exchange mutation operator (EM)
• Insertion mutation operator (ISM)
• Simple-inversion mutation operator (SIM)
• Inversion mutation operator (IVM)
• Scramble mutation operator (SM).


We will describe the crossover and mutation opera- 
tors which returned the best results of algorithm GA-tw 
in more detail. 


Order Crossover (OX1)
The order crossover operator determines a crossover 
area within the parents by randomly selecting two posi- 
tions within the ordering. The elements in the crossover 
area of the first parent are copied to the offspring. Start- 
ing at the end of the crossover area all elements outside 
the area are inserted in the same order in which they 
occur in the second parent. 
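
The following Python sketch illustrates one common reading of OX1. The function name and the explicit crossover bounds a and b are assumptions for illustration (in GA-tw the two positions are chosen at random). The filling step starts right after the crossover area and wraps around the end of the ordering.

```python
def order_crossover(p1, p2, a, b):
    """Order crossover (OX1) with crossover area p1[a:b] (b exclusive).

    The area is copied from p1; the remaining positions are filled,
    starting at position b and wrapping around, with the elements of p2
    in the order in which they occur in p2 from position b onward.
    """
    n = len(p1)
    child = [None] * n
    child[a:b] = p1[a:b]
    used = set(p1[a:b])
    # Elements of p2 outside the crossover area, read cyclically from b.
    fill = [p2[(b + k) % n] for k in range(n) if p2[(b + k) % n] not in used]
    for k, v in enumerate(fill):
        child[(b + k) % n] = v
    return child


p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [2, 4, 6, 8, 7, 5, 3, 1]
print(order_crossover(p1, p2, 2, 5))  # [8, 7, 3, 4, 5, 1, 2, 6]
```

With the crossover area covering the elements 3, 4, 5 of the first parent, the result appears to match the first OX1 offspring shown in Fig. 64.4.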


Order-Based Crossover (OX2)
The order-based crossover operator selects several positions in the
parent orderings at random by tossing a coin for each position. The
elements found at these positions in the second parent are deleted
from the first parent. Afterward they are reinserted into the vacated
positions in the order in which they occur in the second parent. The
second offspring is obtained by exchanging the roles of the parents.


Position-Based Crossover (POS) 

The position-based crossover operator also starts by selecting
a random set of positions in the parent orderings by tossing a coin
for each position. The elements at the selected positions are
exchanged between the parents in order to create the offspring. The
elements missing after the exchange are reinserted in the order of
the second parent.
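
A small Python sketch of one offspring of POS follows. The names donor and other, and the explicit set of positions, are assumptions for illustration; in the operator described above the positions result from one coin toss per position.

```python
def position_based_crossover(donor, other, positions):
    """One offspring of position-based crossover (POS).

    The offspring receives the elements of `donor` at the selected
    positions; all remaining positions are filled with the missing
    elements in the order in which they occur in `other`.
    """
    positions = set(positions)
    fixed = {donor[i] for i in positions}
    rest = iter(v for v in other if v not in fixed)
    return [donor[i] if i in positions else next(rest) for i in range(len(donor))]


p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [2, 4, 6, 8, 7, 5, 3, 1]
# Positions 1, 2 and 5 (0-based) are taken from p2, the rest keeps the order of p1.
print(position_based_crossover(p2, p1, {1, 2, 5}))  # [1, 4, 6, 2, 3, 5, 7, 8]
```

The second offspring is obtained by calling the function with the roles of the two parents exchanged.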


Exchange Mutation Operator (EM) 
The exchange mutation operator randomly selects two 
elements in the solution and exchanges them. 


Insertion Mutation Operator (ISM) 
The insertion mutation operator randomly chooses an 
element in the solution and moves it to a randomly se- 
lected position (Fig. 64.5). 
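
Both mutation operators can be sketched in a few lines of Python. The function names and the explicit index arguments are assumptions for illustration; the random variants simply draw these indices.

```python
import random


def exchange_mutation(order, i, j):
    """EM: exchange the elements at positions i and j."""
    out = list(order)
    out[i], out[j] = out[j], out[i]
    return out


def insertion_mutation(order, src, dst):
    """ISM: remove the element at position src and reinsert it at position dst."""
    out = list(order)
    out.insert(dst, out.pop(src))
    return out


def random_insertion_mutation(order, rng):
    """ISM with randomly chosen element and target position."""
    return insertion_mutation(order, rng.randrange(len(order)), rng.randrange(len(order)))


order = [1, 2, 3, 4, 5, 6, 7, 8]
print(exchange_mutation(order, 2, 6))    # [1, 2, 7, 4, 5, 6, 3, 8]
print(insertion_mutation(order, 2, 6))   # [1, 2, 4, 5, 6, 7, 3, 8]
print(random_insertion_mutation(order, random.Random(0)))
```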


The genetic algorithm implemented in [64.19] was applied to two
artificial graphs. This genetic approach returned competitive results
when compared to results obtained by simulated annealing [64.22]. The
algorithm implemented in [64.20] was evaluated on 62 graphs of the
second DIMACS graph coloring challenge [64.24]. Different experiments
were performed to find the best values for the parameters of the
genetic algorithm, and it turned out that the position-based
crossover operator (POS) and the insertion mutation operator (ISM)
were best suited for finding tree decompositions of small width.
Existing upper bounds on the treewidth of several DIMACS instances
could be improved.


64.2.2 Ant Colony Optimization 
for Tree Decomposition 


Ant colony optimization (ACO) has been applied to tree
decompositions in [64.21,25]. The current section is based on [64.21]
and describes different ant colony optimization variants applied to
tree decomposition.

ACO is a population-based metaheuristic intro- 
duced by Dorigo et al. [64.7, 8]. As the name suggests, 
the technique was inspired by the behavior of real 
ants. Ant colonies are able to find the shortest path be- 
tween their nest and a food source just by depositing 
and reacting to pheromones while they are explor- 
ing their environment. The basic principles driving 
this system can also be applied to many combinatorial optimization
problems. For a detailed description of different ACO algorithms and
their applications the reader is referred to the book Ant Colony
Optimization [64.26].

Fig. 64.4 Selected crossover operators for vertex orderings (panels
OX1, OX2, and POS, each showing two parents and the resulting
offspring)

Fig. 64.5 Selected mutation operators for vertex orderings (panels EM
and ISM)

The following variants of ACO algorithms for finding good upper
bounds for tree decompositions were investigated in [64.21,25]:
simple ant system [64.7,8], elitist ant system [64.7,8], rank-based
ant system [64.27], max-min ant system [64.28,29], and ant colony
system [64.30]. Two different pheromone update strategies were
proposed, and two stagnation measures were implemented that indicate
the degree of diversity of the solutions constructed by the ants.
Furthermore, two constructive heuristics (min-degree and min-fill)
were implemented and incorporated alternatively into every ACO
variant as a guiding function, and the combination of ACO with two
existing local search methods, hill climbing and iterated local
search [64.23], was investigated.

A simple constraint graph and the corresponding 
ACO construction tree are shown in Fig. 64.6. The con- 
struction tree can be obtained from the constraint graph 
as follows: 


1. Create a root node s that will be the starting point of 
every ant in the colony. 

2. For every vertex of the constraint graph append 
a child node to the root node s. 

3. To every leaf node append a child node for every 
vertex of the constraint graph that is neither repre- 
sented by the leaf node itself nor by an ancestor of 
this node. 

4. Repeat step 3 until there are no nodes left to append. 


All possible elimination orderings for the constraint 
graph can now be represented as a path from the root 
node s to one of the leaf nodes in the construction tree. 
Therefore, each of the ants finds such a path and at 
each node on its way the ant decides where to move 
next probabilistically based on the pheromone trails 
and a heuristic value both associated with the outgoing 
edges. 


Pheromone Trails 
A pheromone trail indicates how favorable it is to eliminate
a certain vertex x after another vertex y. The more pheromone is
located on a trail, the more likely the corresponding vertex will be
chosen by the ant. One way to represent the pheromone trails of the
construction tree in Fig. 64.6 is the following matrix:

\[
T = \begin{pmatrix}
\tau_{x_1 x_1} & \tau_{x_1 x_2} & \tau_{x_1 x_3} \\
\tau_{x_2 x_1} & \tau_{x_2 x_2} & \tau_{x_2 x_3} \\
\tau_{x_3 x_1} & \tau_{x_3 x_2} & \tau_{x_3 x_3} \\
\tau_{s x_1} & \tau_{s x_2} & \tau_{s x_3}
\end{pmatrix}
\tag{64.1}
\]

Fig. 64.6a,b Constraint graph G and the ACO construction tree

In this matrix, each row contains the amounts of pheromone located on
the trails connecting a certain node with all the other nodes. For
example, the first row contains the pheromone levels related to the
node $x_1$, describing the desirability of eliminating $x_2$
($\tau_{x_1 x_2}$) or $x_3$ ($\tau_{x_1 x_3}$) immediately after
$x_1$. The last row is related to the root node s, which is the
starting point for every ant.

All pheromone trails are initialized to the same value at the
beginning of the algorithm. This value is computed according to the
following equation:

\[
\tau_{ij} = \frac{m}{W_\eta}, \quad \forall \tau_{ij} \in T,
\tag{64.2}
\]

where $W_\eta$ is the width of the decomposition obtained using the
guiding heuristic (min-degree or min-fill), while m is the size of
the ant colony.


Heuristic Information 
The ants make their decision about which vertex to eliminate next
not solely based on the pheromone matrix but also consider a guiding
heuristic. Two different heuristics have been implemented. In order
to compute them, a separate graph is maintained in addition to the
construction tree. This graph is called the elimination graph because
it is obtained from the original constraint graph by successively
eliminating the vertices traversed by the ant in the construction
tree. In the following, this graph is denoted as $E(G, \sigma)$,
where G is the original constraint graph and $\sigma$ is a partial
elimination ordering.


Min-Degree. The value for the min-degree heuristic is computed
according to this equation:

\[
\eta_{ij} = \frac{1}{d(j, E(G,\sigma)) + 1}.
\tag{64.3}
\]

The node i represents the last eliminated node, whereas j is a node
which is not eliminated yet. The expression $d(j, E(G,\sigma))$
represents the degree of vertex j in the elimination graph
$E(G,\sigma)$.


Min-Fill. The value for the min-fill heuristic is computed according
to this equation:

\[
\eta_{ij} = \frac{1}{f(j, E(G,\sigma)) + 1}.
\tag{64.4}
\]

The expression $f(j, E(G,\sigma))$ represents the number of edges
that would be added to the elimination graph due to the elimination
of vertex j.
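
The two heuristic values can be computed directly on the elimination graph. The sketch below is illustrative only: the function names and the adjacency-dictionary format are assumptions, and the +1 in both denominators is assumed so that vertices of degree zero or fill zero do not cause a division by zero.

```python
def eta_min_degree(graph, j):
    """Min-degree value of vertex j in the elimination graph."""
    return 1.0 / (len(graph[j]) + 1)


def fill_in(graph, j):
    """Number of edges added to the elimination graph when j is eliminated."""
    nbrs = list(graph[j])
    return sum(
        1
        for a in range(len(nbrs))
        for b in range(a + 1, len(nbrs))
        if nbrs[b] not in graph[nbrs[a]]   # the pair is not yet adjacent
    )


def eta_min_fill(graph, j):
    """Min-fill value of vertex j in the elimination graph."""
    return 1.0 / (fill_in(graph, j) + 1)


# Path a-b-c: eliminating b adds the edge a-c, eliminating a adds nothing.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(eta_min_degree(path, "b"), eta_min_fill(path, "b"))
print(eta_min_degree(path, "a"), eta_min_fill(path, "a"))
```

Note that the values depend only on the candidate vertex j and the current elimination graph, not on the last eliminated node i.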


Probabilistic Vertex Elimination 

In the following it is shown how exactly the ants move from node to
node on the construction tree. All of the ACO variants with the
exception of ant colony system use (64.5) alone to compute the
probability $p_{ij}$ of moving from a node i to another node j, where
$\alpha$ and $\beta$ are parameters that can be passed to the
algorithm in order to weight the pheromone trails and the heuristic
values:

\[
p_{ij} = \frac{[\tau_{ij}]^{\alpha} [\eta_{ij}]^{\beta}}
              {\sum_{l \in E(G,\sigma)} [\tau_{il}]^{\alpha} [\eta_{il}]^{\beta}},
\quad \text{if } j \in E(G,\sigma).
\tag{64.5}
\]

This probability is computed for each vertex left in the elimination
graph. According to these probabilities, the ant decides which vertex
to eliminate next.
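
The probabilistic decision of (64.5) can be sketched as follows. The dictionaries tau_i and eta_i, which hold the pheromone and heuristic values of all vertices still present in the elimination graph for the current node i, and both function names are assumptions made for this illustration.

```python
import random


def transition_probabilities(tau_i, eta_i, alpha, beta):
    """Probabilities of (64.5) for all vertices j left in the elimination graph."""
    weights = {j: (tau_i[j] ** alpha) * (eta_i[j] ** beta) for j in eta_i}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def choose_vertex(probs, rng):
    """Draw the next vertex to eliminate according to the given probabilities."""
    vertices = list(probs)
    return rng.choices(vertices, weights=[probs[v] for v in vertices], k=1)[0]


tau_i = {"x1": 1.0, "x2": 1.0, "x3": 1.0}   # equal pheromone on all trails
eta_i = {"x1": 1.0, "x2": 0.5, "x3": 0.5}   # heuristic prefers x1
probs = transition_probabilities(tau_i, eta_i, alpha=1, beta=1)
print(probs)  # {'x1': 0.5, 'x2': 0.25, 'x3': 0.25}
print(choose_vertex(probs, random.Random(0)))
```

Larger values of alpha emphasize the pheromone trails, larger values of beta emphasize the guiding heuristic.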

Ant colony system introduces an additional parameter $q_0$ that
constitutes the probability that the ant makes a greedy move instead
of making a probabilistic decision:

\[
j = \begin{cases}
\arg\max_{l \in E(G,\sigma)} \{ [\tau_{il}]^{\alpha} [\eta_{il}]^{\beta} \}, & \text{if } q \le q_0; \\
\text{(64.5)}, & \text{otherwise}.
\end{cases}
\tag{64.6}
\]

If a randomly generated number q in the interval [0, 1] is less than
or equal to $q_0$, then the ant moves to the node that otherwise
would have the highest probability of being chosen. Ties are broken
randomly.
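
A sketch of this decision rule is given below. As before, the dictionaries tau_i and eta_i and the function name are assumptions for illustration.

```python
import random


def acs_choose(tau_i, eta_i, alpha, beta, q0, rng):
    """Decision rule of (64.6): greedy with probability q0, else rule (64.5)."""
    score = {j: (tau_i[j] ** alpha) * (eta_i[j] ** beta) for j in eta_i}
    if rng.random() <= q0:
        best = max(score.values())
        # Greedy move; ties are broken randomly.
        return rng.choice([j for j, s in score.items() if s == best])
    vertices = list(score)
    # Probabilistic move: weights proportional to the scores, as in (64.5).
    return rng.choices(vertices, weights=[score[v] for v in vertices], k=1)[0]


tau_i = {"x1": 1.0, "x2": 1.0, "x3": 1.0}
eta_i = {"x1": 1.0, "x2": 0.5, "x3": 0.5}
rng = random.Random(0)
print(acs_choose(tau_i, eta_i, 1, 1, q0=1.0, rng=rng))  # always greedy: x1
print(acs_choose(tau_i, eta_i, 1, 1, q0=0.5, rng=rng))
```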

Ant colony system also introduces a so-called local pheromone
update. After an ant has constructed its solution, it removes
pheromone from the trails belonging to its solution according to the
following equation, where $\xi$ is a variant-specific parameter and
$\tau_0$ is the initial amount of pheromone:

\[
\tau_{ij} \leftarrow (1 - \xi)\,\tau_{ij} + \xi\,\tau_0.
\tag{64.7}
\]

The motivation is to diversify the search so that subsequent ants
will more likely choose other branches of the construction tree.


Pheromone Update 

After each of the ants has constructed an elimina- 
tion ordering (that optionally has been improved by 
a local search thereafter) the values in the pheromone 
matrix are updated reflecting the quality of the con- 
structed solutions which will enable the subsequent 
ants in the following iteration to make decisions in 
a more informed manner. Moreover, pheromone is 
gradually removed from the pheromone trails so that 
solutions that might have been the best known so- 
lutions in earlier iterations of the algorithm can be 
forgotten. 


Pheromone Deposition 
In this step, for an elimination ordering $\sigma_k$ that was
constructed by an ant k, the amount of pheromone that will be
deposited for each (i, j) in $\sigma_k$ is determined. An
edge-independent and an edge-specific pheromone update strategy were
considered. The first adds the same amount of pheromone to all trails
belonging to $\sigma_k$, while the latter adds more or less pheromone
to individual trails depending on the quality of a certain
elimination.

The edge-independent pheromone update strategy adds the reciprocal
value of the tree decomposition's width to all pheromone trails that
are part of $\sigma_k$:

\[
\Delta\tau_{ij}^{k} = \begin{cases}
\dfrac{1}{W(\sigma_k)}, & \text{if } (i,j) \text{ belongs to } \sigma_k; \\
0, & \text{otherwise}.
\end{cases}
\tag{64.8}
\]


In contrast to the edge-independent update strategy, the
edge-specific update strategy deposits different amounts of pheromone
onto the trails belonging to the same elimination ordering:

\[
\Delta\tau_{ij}^{k} = \begin{cases}
\dfrac{1}{d(j, E(G,\sigma_{kj})) / |E(G,\sigma_{kj})|} \cdot \dfrac{1}{W(\sigma_k)}, & \text{if } (i,j) \text{ belongs to } \sigma_k; \\
0, & \text{otherwise}.
\end{cases}
\tag{64.9}
\]

This amount depends on the ratio between the degree of the vertex j
when it was eliminated, $d(j, E(G,\sigma_{kj}))$, and the number of
vertices left in the elimination graph, $|E(G,\sigma_{kj})|$, at that
time ($\sigma_{kj}$ is the partial elimination ordering that is
obtained from $\sigma_k$ by omitting j and all vertices that are
eliminated after j).

The selection of ants that deposit pheromone and 
the weighting of this pheromone varies between the dif- 
ferent ACO variants. The reader is referred to [64.26] 
for description of these variants. 


Pheromone Evaporation 
After the pheromone has been added to the trails, a certain amount
of pheromone is removed. This amount is determined by the pheromone
evaporation rate $\rho$:

\[
\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij}, \quad \forall \tau_{ij} \in T.
\tag{64.10}
\]

Ant colony system only removes pheromone from the trails belonging
to the best known elimination ordering $\sigma_{bs}$:

\[
\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij}, \quad \forall (i,j) \in \sigma_{bs}.
\tag{64.11}
\]
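
The global pheromone update, deposition followed by evaporation, can be sketched as follows for the edge-independent strategy. The dictionary representation of the pheromone matrix (keys are pairs (i, j), with "s" for the root node), the function names, and the optional trails argument are assumptions made for this illustration.

```python
def trails_of(order, root="s"):
    """Trails (i, j) used by an ordering: j is eliminated immediately after i."""
    return list(zip([root] + list(order[:-1]), order))


def deposit_edge_independent(tau, order, width):
    """Add 1 / width to every trail that belongs to the ordering."""
    for trail in trails_of(order):
        tau[trail] = tau.get(trail, 0.0) + 1.0 / width


def evaporate(tau, rho, trails=None):
    """Scale pheromone by (1 - rho) on all trails, or only on the given ones."""
    for trail in (list(tau) if trails is None else trails):
        tau[trail] *= 1.0 - rho


tau = {("s", "a"): 1.0, ("a", "b"): 1.0, ("b", "a"): 1.0, ("s", "b"): 1.0}
deposit_edge_independent(tau, ["a", "b"], width=2)   # ant found ordering a, b
evaporate(tau, rho=0.5)                              # all trails
print(tau[("s", "a")], tau[("b", "a")])              # 0.75 0.5
```

Passing trails=trails_of(best_order) to evaporate restricts the evaporation to the trails of the best known ordering, which corresponds to the ant colony system variant.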

Hybridization with Local Search 
All ACO variants were extended with two local search 
methods for tree decompositions. Both of these algo- 
rithms try to improve the quality of the solutions that 
were constructed by the ant colony by changing the po- 
sitions of certain vertices in the elimination orderings. 
Two local search techniques were used: a hill climbing algorithm and
an iterated local search similar to the algorithm proposed
in [64.23].


Stagnation Measures 
If the distribution of the pheromone on the trails becomes too
unbalanced due to the pheromone depositions, the ants will generate
very similar solutions, causing the search to stagnate. In order to
enable the algorithm to detect such situations, two stagnation
measures proposed by Dorigo and Stützle [64.26] were implemented (the
variation coefficient and the λ-branching factor). They indicate how
explorative the search behavior of the ants is. A detailed
description of the stagnation measures is given in [64.25, page 67].

All ACO variants described in [64.21] were evaluated experimentally
on DIMACS graph coloring challenge instances. Max-min ant system and
ant colony system performed slightly better than the other variants.
Although ant colony optimization in general could not compete with
iterated local search and genetic algorithms, it could improve the
upper bound for one of the problems.


64.2.3 Iterated Local Search 
for Tree Decomposition 


The application of iterated local search for generating 
tree decompositions has been investigated in [64.23, 
31]. In this section, we give the description of this al- 
gorithm based on these references. 

The algorithm is based on the iterated local search 
framework and it includes a simple local search heuris- 
tic to generate good orderings, and an iterative process 
in which the algorithm calls a local search technique 
with the initial solution produced in the previous it- 
eration. The algorithm also includes a mechanism for 
acceptance of a candidate solution for the next itera- 
tion. Although the constructing phase is very important, 
choosing the appropriate perturbation at each iteration 
as well as the mechanism for acceptance of solution are 
also crucial to obtain good results for an iterative local 
search algorithm. The iterated local search algorithm 
for tree decomposition is presented below. 


Algorithm 64.2 Iterative Heuristic Algorithm - IHA
Generate initial solution S1
BestSolution = S1
while Termination Criteria is not fulfilled do
    S2 = ConstructionPhase(S1)
    if Solution S2 fulfils the acceptance criteria then
        S1 = S2
    else
        S1 = BestSolution
    end if
    Apply perturbation in solution S1
    Update BestSolution if solution S2 has better (or equal) width
    than the current best solution
end while
return BestSolution


The algorithm starts with an initial solution that takes the nodes
in the order in which they appear in the input. Better initial
solutions can also be constructed by using other heuristics that run
in polynomial time, such as maximum cardinality search or the
min-fill heuristic. However, as the proposed method usually reaches
the quality of the solutions produced by these heuristics in a very
short time, the algorithm starts with the ordering of nodes given in
the input.

After constructing the initial solution the iterative 
phase starts. In this phase, the local search method is 
called iteratively, and then the selected solution is per- 
turbed. Two different local search techniques that can 
be used in the construction phase were proposed. The 
solution returned from the construction phase is ac- 
cepted for the next iteration if it fulfils the specific 
criteria determined by the solution acceptance mech- 
anism. Experiments with different possibilities for the 
acceptance of the solution returned from the construc- 
tion phase were performed. If the solution does not fulfil 
the acceptance criteria this solution is discarded and the 
currently best solution is selected. In the selected so- 
lution, the perturbation mechanism is applied. Different 
possibilities are used for perturbation. The perturbed so- 
lution is given as an input solution in the next call of the 
construction phase. This process continues until the ter- 
mination criterion is fulfilled. 
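
The iterative phase described above can be sketched in Python as follows. This is a schematic skeleton rather than the implementation of [64.23]: the function names, the callback parameters (construction_phase, perturb, accept), the fixed iteration count used as termination criterion, and the graph format are assumptions for illustration.

```python
import random


def ordering_width(adj, order):
    """Width induced by an elimination ordering (adj: vertex -> set of neighbors)."""
    graph = {v: set(n) for v, n in adj.items()}
    width = 0
    for v in order:
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= nbrs - {u}
    return width


def iterated_local_search(adj, initial, construction_phase, perturb, accept, iterations):
    s1, best = list(initial), list(initial)
    best_width = ordering_width(adj, best)
    for _ in range(iterations):                 # termination: iteration limit
        s2 = construction_phase(s1)             # local search on the perturbed input
        w2 = ordering_width(adj, s2)
        s1 = list(s2) if accept(w2, best_width) else list(best)
        s1 = perturb(s1)                        # input for the next iteration
        if w2 <= best_width:                    # better or equal width
            best, best_width = list(s2), w2
    return best, best_width


rng = random.Random(0)
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}


def swap_two(order):
    out = list(order)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


best, width = iterated_local_search(
    path, ["c", "b", "d", "a", "e"],
    construction_phase=lambda s: list(s),       # placeholder for LS1 or LS2
    perturb=swap_two,
    accept=lambda w, best_w: w < best_w,
    iterations=50,
)
print(best, width)
```

The identity function used as construction phase is only a placeholder; in the algorithm described here one of the two local search techniques takes its place.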

Two local search methods were proposed for gener- 
ating a good solution which is used as an initial solution 
with some perturbation in the next call of the same local 
search algorithm. Both techniques are based on the idea 
of moving only vertices in the ordering which cause the 
largest clique during the elimination process. The mo- 
tivation for using this method is to reduce the number 
of solutions that should be evaluated. The first proposed 
technique named LS1 is presented below.


Algorithm 64.3 Local Search Algorithm 1 - LS1 (InputSolution)
BestLSSolution = InputSolution
NrNotImprovments = 0
while NrNotImprovments < MAXNotImprovments do
    In the current solution (InputSolution) select a vertex in the
    elimination ordering which causes the largest clique when
    eliminated; ties are broken randomly if there are several
    vertices which cause a clique equal in size to the largest clique
    Swap this vertex with another vertex located in a randomly
    chosen position
    if the current solution is better than BestLSSolution then
        BestLSSolution = InputSolution
        NrNotImprovments = 0
    else
        NrNotImprovments = NrNotImprovments + 1
    end if
end while
return BestLSSolution


The proposed algorithm applies a simple heuris- 
tic. In the current solution a vertex is chosen randomly 
among the vertices that produce the largest clique in the 
elimination process. Then the selected vertex is moved 
from its position. Two types of moves were used. In the 
first variant, the vertex is inserted in a random position 
in the elimination ordering, while in the second vari- 
ant the vertex is swapped with another vertex located 
in a randomly selected position, i.e., the two chosen 
vertices change their position in the elimination order- 
ing. The swap move was shown to give better results. 
The heuristic stops if the solution does not improve for 
a certain number of iterations. Experiments with different values of
MAXNotImprovments were performed. LS1 alone is
a simple heuristic and usually cannot produce good re- 
sults for tree decompositions. However, by using this 
heuristic as a local search heuristic in the iterated local 
search algorithm good results for tree decompositions 
are obtained. 
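
The swap variant of LS1 can be sketched in Python as follows. The sketch assumes a graph given as an adjacency dictionary with at least two vertices; the function names and the parameter max_not_improvements are illustrative and do not reproduce the code of [64.23]. A vertex is taken to cause the largest clique if its degree at elimination time equals the width of the current ordering.

```python
import random


def elimination_degrees(adj, order):
    """Degree of every vertex at the moment it is eliminated (aligned with order)."""
    graph = {v: set(n) for v, n in adj.items()}
    degrees = []
    for v in order:
        nbrs = graph.pop(v)
        degrees.append(len(nbrs))
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= nbrs - {u}
    return degrees


def ls1(adj, order, max_not_improvements, rng):
    current = list(order)
    best, best_width = list(order), max(elimination_degrees(adj, order))
    not_improved = 0
    while not_improved < max_not_improvements:
        degrees = elimination_degrees(adj, current)
        largest = max(degrees)
        # Position of a vertex causing the largest clique; ties broken randomly.
        i = rng.choice([k for k, d in enumerate(degrees) if d == largest])
        # Swap it with a vertex at another, randomly chosen position.
        j = rng.choice([k for k in range(len(current)) if k != i])
        current[i], current[j] = current[j], current[i]
        width = max(elimination_degrees(adj, current))
        if width < best_width:
            best, best_width, not_improved = list(current), width, 0
        else:
            not_improved += 1
    return best, best_width


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(ls1(path, ["c", "b", "d", "a", "e"], 20, random.Random(0)))
```

Each iteration evaluates a single neighbor, which keeps the cost of one LS1 call low compared with a full neighborhood scan.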

The second proposed heuristic (LS2) is similar to algorithm LS1 but
differs in how the neighborhood is explored. In some iterations of
LS2, the neighborhood of the current solution consists of only one
solution, which is generated by swapping a vertex that causes the
largest clique in the elimination ordering with another vertex
located in a randomly chosen position. This neighborhood is used in
a particular iteration with probability p. Experiments with different
values for parameter p were performed. With probability 1 - p, the
other type of neighborhood is explored. In this case, the
neighborhood of the current solution consists of all solutions that
can be obtained by swapping a vertex that causes the largest clique
in the elimination ordering with one of its neighbors. The best
solution from the generated neighborhood is selected for the next
iteration of LS2. Note that in this technique the number of solutions
that have to be evaluated is much larger than in LS1. In particular,
in the first phase of the search the node which causes the largest
clique usually has many neighbors, and therefore the number of
solutions to be evaluated when the second type of neighborhood is
used is equal to the size of the largest clique produced during the
elimination process.


Perturbation 

During the perturbation phase, the solution obtained by the local
search procedure is perturbed, and the newly obtained solution is
used as the initial solution for the next call of the local search
technique. The main idea is to avoid a random restart. Instead of
restarting randomly, the solution is perturbed with larger moves than
those applied in the local search technique. This enables some
diversification that helps to escape from the local optimum, but
avoids beginning from scratch (as in the case of a random restart),
which is very time consuming. Three perturbation mechanisms were
proposed:


• RandPert: N vertices are chosen randomly and moved to new random
  positions in the ordering.

• MaxCliquePer: All nodes that produce the maximal clique in the
  elimination ordering are inserted at new randomly chosen positions
  in the ordering.

• DestroyPartPert: All nodes between two randomly selected positions
  in the ordering are inserted at new randomly chosen positions in
  the ordering.


The perturbation RandPert just perturbs the solution with a larger
random move and would amount to a kind of random restart if N is
very large. Keeping N small avoids restarting from a completely new
solution, so that the perturbed solution does not differ much from
the previous solution. MaxCliquePer concentrates on moving only the
vertices which produce the maximal clique in the elimination
ordering. The basic idea of this perturbation is to apply a technique
similar to the min-conflict heuristic, by moving only the vertices
that cause a large width. DestroyPartPert is similar to RandPert,
except that the nodes selected to be moved are located near each
other in the elimination ordering.
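
Two of the perturbation mechanisms can be sketched in Python as follows. The function names follow the names used in the text, while the signatures and the use of a random.Random instance are assumptions for illustration.

```python
import random


def rand_pert(order, n, rng):
    """RandPert: move n randomly chosen vertices to new random positions."""
    out = list(order)
    for v in rng.sample(out, n):
        out.remove(v)
        out.insert(rng.randrange(len(out) + 1), v)
    return out


def destroy_part_pert(order, rng):
    """DestroyPartPert: move all vertices between two random positions."""
    out = list(order)
    i, j = sorted(rng.sample(range(len(out)), 2))
    segment = out[i:j + 1]
    del out[i:j + 1]
    for v in segment:
        out.insert(rng.randrange(len(out) + 1), v)
    return out


rng = random.Random(0)
order = ["a", "b", "c", "d", "e", "f"]
print(rand_pert(order, 2, rng))
print(destroy_part_pert(order, rng))
```

Both functions return a new ordering and leave the input untouched, so the best solution found so far is never modified by a perturbation.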

Determining the number of nodes N that will be moved is complex and
may depend on the problem. To avoid this difficulty, an adaptive
perturbation mechanism was proposed that takes into consideration the
feedback from the search process. The number of nodes N varies from 2
to some number y (determined experimentally), and the algorithm
begins with a small perturbation (N = 2). If during the iterative
process (for a determined number of iterations) the local search
technique produces solutions with the same width in more than 20% of
cases, the perturbation size N is increased by 1; otherwise N is
decreased by 1. This enables an automatic change of the perturbation
size based on the repetition of solutions with the same width.

The combination of two perturbations was also considered. The mixed
perturbation applies two perturbations: RandPert and MaxCliquePer.
The algorithm starts with RandPert and alternates between the two
perturbations whenever the solution is not improved for a determined
number of iterations. Experiments with different perturbation sizes
for each type of perturbation were performed.


Acceptance Criterion 
Different techniques can be applied for accepting the solution
obtained by the local search technique. The following variants for
accepting a solution for the next iteration were used:


• The solution returned from the construction phase is accepted only
  if it has a better width than the best existing solution.

• The solution returned from the construction phase is always
  accepted.

• The solution is accepted if its width is smaller than the width of
  the best solution found so far plus x, where x is an integer.


The first variant for accepting a solution is very 
restrictive. In this variant, the solution from the con- 
struction phase is accepted only if it improves the best 
existing solution. Otherwise, the best existing solution 
is perturbed and it is used as input solution for next call 
of the construction phase. In the second variant, the it- 
erated local search applies the perturbation in a solution 
returned from the construction phase, independently 
from the quality of produced solution. The third vari- 
ant is between the first and the second variant, and in 
this case the solution which does not improve the best 
existing solution can be accepted for the next iteration, 
if its width is smaller than the best found width plus 
some bound. 
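
The three acceptance variants reduce to simple predicates on the width of the new solution and the best width found so far. The following sketch uses illustrative function names that are not taken from [64.23].

```python
def accept_if_better(width, best_width):
    """Variant 1: accept only solutions that improve the best width."""
    return width < best_width


def accept_always(width, best_width):
    """Variant 2: accept every solution returned by the construction phase."""
    return True


def accept_within(x):
    """Variant 3: accept if the width is smaller than the best width plus x."""
    def accept(width, best_width):
        return width < best_width + x
    return accept


accept = accept_within(3)
print(accept_if_better(10, 10), accept_always(12, 10), accept(12, 10), accept(13, 10))
# False True True False
```

With x = 0 the third variant coincides with the first, and for large x it behaves like the second.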




64.2.4 Other Techniques 
for Tree Decomposition 


This section gives a short overview of other approaches applied to
tree decomposition. Examples of complete algorithms for tree
decompositions are [64.32-34]. Gogate and Dechter [64.33] reported
good results for tree decompositions by using branch and bound
algorithms. They showed that their algorithm is superior to the
algorithm proposed in [64.32]. The branch and bound algorithm
proposed in [64.33] applies different pruning techniques and provides
anytime solutions, which are good upper bounds for tree
decompositions. The algorithm proposed in [64.34] includes several
other pruning and reduction rules and is successful on small graphs.
The complete techniques described above have exponential running time
in the worst case and can only be used to find the optimal width for
graphs that are not too large.

To generate good upper bounds (which can be suf- 
ficient for many applications) for treewidth several 
greedy heuristic techniques that run in polynomial time 
have been proposed. These heuristics select the order- 
ing of nodes step by step based on different criteria, 
such as the degree of the nodes, the number of edges to 
be added to make the node simplicial, etc. The most popular
techniques are maximum cardinality search (MCS), the min-fill
heuristic, and the minimum degree heuristic.

MCS [64.35] initially selects a random vertex of the 
graph to be the first vertex in the elimination ordering 
(the elimination ordering is constructed from right to 
left). The next vertex will be picked such that it has 
the highest connectivity with the vertices previously 
selected in the elimination ordering. Ties are broken 
randomly. MCS repeats this process iteratively until all 
vertices are selected. 

The min-fill heuristic first picks the vertex which 
adds the smallest number of edges when eliminated 
(ties are broken randomly). The selected vertex is made 
simplicial (a vertex of a graph is simplicial if its neigh- 
bors form a clique) and it is eliminated from the graph. 
The next vertex in the ordering will be any vertex that 
adds the minimum number of edges when eliminated 
from the graph. This process is repeated iteratively un- 
til the whole elimination ordering is constructed. 

The minimum degree heuristic picks first the vertex 
with the minimum degree. The selected vertex is made 
simplicial and it is removed from the graph. Further, 
the vertex that has the minimum number of unselected 
neighbors will be chosen as the next node in the elimi- 
nation ordering. This process is repeated iteratively. 
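
The min-fill and minimum degree heuristics differ only in the criterion used to pick the next vertex, which the following Python sketch makes explicit. The function names and the graph format are assumptions for illustration; for reproducibility the sketch takes the first vertex with minimal cost instead of breaking ties randomly.

```python
def degree(graph, v):
    """Minimum degree criterion: number of current neighbors of v."""
    return len(graph[v])


def fill(graph, v):
    """Min-fill criterion: number of edges added when v is eliminated."""
    nbrs = list(graph[v])
    return sum(
        1
        for a in range(len(nbrs))
        for b in range(a + 1, len(nbrs))
        if nbrs[b] not in graph[nbrs[a]]
    )


def greedy_ordering(adj, cost):
    """Greedy elimination ordering; returns (ordering, width)."""
    graph = {v: set(n) for v, n in adj.items()}
    order, width = [], 0
    while graph:
        v = min(graph, key=lambda u: cost(graph, u))  # first vertex of minimal cost
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:                                # make v simplicial, then remove it
            graph[u].discard(v)
            graph[u] |= nbrs - {u}
        order.append(v)
    return order, width


cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
print(greedy_ordering(cycle, degree))
print(greedy_ordering(star, fill))
```

The width returned is an upper bound on the treewidth of the input graph.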


MCS, min-fill, and min-degree heuristics run in 
polynomial time and usually produce tree decompo- 
sitions in a reasonable amount of time. According 
to [64.33], the min-fill heuristic performs better than 
MCS and min-degree heuristic. Although these heuris- 
tics sometimes give good upper bounds for tree decom- 
positions, more advanced techniques usually provide 
better upper bounds for most problems. Min-degree 
heuristic has been improved by Clautiaux et al. [64.16] 
by adding a new criterion based on the lower bound 
of the treewidth for the graph obtained when the node 
is eliminated. Recently, Kask et al. [64.36] proposed 
an iterative greedy variable ordering algorithm to im- 
prove the greedy heuristics given earlier. We refer 
to [64.15,37] for a survey of different upper bounds 
algorithms. 


64.2.5 Comparison of Algorithms 
for Tree Decomposition 


In this section, we compare results obtained with the metaheuristic
approaches described in this chapter and other existing methods in
the literature. The results of these methods for 62 DIMACS vertex
coloring instances are given. These instances have been used for
testing several methods for tree decompositions proposed in the
literature. The compared methods have been executed on different
computers, and we give here only results regarding the width of the
tree decomposition. The reader is referred to [64.15,16,20,23,25,33]
for information about the computers used and the time needed to
generate solutions.

In Tables 64.1 and 64.2, the results for DIMACS graph coloring instances
are presented. The first and second columns of the tables present the
instances and the number of nodes and edges of each instance. Column KBH
shows the best results obtained by the algorithms in [64.15]. The TabuS
column presents the results reported in [64.16], and column BB shows the
results obtained with the branch and bound algorithm proposed in
[64.33]. Finally, columns GA, IHA, and ACO represent, respectively,
results obtained with a genetic algorithm [64.20], iterated local
search [64.23], and ant colony optimization [64.21, 25].

Based on the results given in Tables 64.1 and 64.2, we conclude that,
regarding the width of the tree decomposition, the metaheuristic
techniques described in this chapter give very good results and, for
many instances, the best existing upper bounds for the treewidth.




Table 64.1 Algorithms comparison regarding treewidth for DIMACS graph coloring instances

Instance      |V|/|E|      KBH   TabuS  BB    GA    IHA   ACO
anna          138/986      12    12     12    12    12    12
david         87/812       13    13     13    13    13    13
huck          74/602       10    10     10    10    10    10
homer         561/3258     31    31     31    31    31    30
jean          80/508       9     9      9     9     9     9
games120      120/638      37    33     -     32    32    37
queen5_5      25/160       18    18     18    18    18    18
queen6_6      36/290       26    25     25    26    25    25
queen7_7      49/476       35    35     35    35    35    35
queen8_8      64/728       46    46     46    45    45    46
queen9_9      81/1056      59    58     59    58    58    59
queen10_10    100/1470     73    72     72    72    72    ?
queen11_11    121/1980     89    88     89    87    87    89
queen12_12    144/2596     106   104    110   104   103   109
queen13_13    169/3328     125   122    125   121   121   128
queen14_14    196/4186     145   141    143   141   140   150
queen15_15    225/5180     167   163    167   162   162   174
queen16_16    256/6320     191   186    205   186   186   201
fpsol2.i.1    269/11654    66    66     66    66    66    66
fpsol2.i.2    363/8691     31    31     31    32    31    31
fpsol2.i.3    363/8688     31    31     31    31    31    31
inithx.i.1    519/18707    56    56     56    56    56    56
inithx.i.2    558/13979    35    35     31    35    35    31
inithx.i.3    559/13969    35    35     31    35    35    31
miles1000     128/3216     49    49     49    50    49    50
miles1500     128/5198     77    77     77    77    77    77
miles250      125/387      9     9      9     10    9     9
miles500      128/1170     22    22     22    24    22    25
miles750      128/2113     37    36     37    37    36    38
mulsol.i.1    138/3925     50    50     50    50    50    50
mulsol.i.2    173/3885     32    32     32    32    32    32
mulsol.i.3    174/3916     32    32     32    32    32    32
mulsol.i.4    175/3946     32    32     32    32    32    32
mulsol.i.5    176/3973     31    31     31    31    31    31
myciel3       11/20        5     5      5     5     5     5
myciel4       23/71        11    10     10    10    10    10
myciel5       47/236       20    19     19    19    19    19
myciel6       95/755       35    35     35    35    35    35
myciel7       191/2360     74    66     54    66    66    66


64.2.6 Application of Tree Decomposition in Metaheuristic Techniques


Traditionally, tree decompositions have been used to solve constraint
satisfaction problems exactly by dynamic programming algorithms.
Recently, researchers have been investigating the incorporation of tree
decomposition within metaheuristic techniques. The work in this
direction is just in the starting phase, and to the best of our
knowledge only two papers have so far investigated the application of
tree decomposition in metaheuristic search.

In [64.38] tree-decomposition-based heuristics have been developed for
the two-dimensional bin packing problem with conflicts. The aim is to
find a conflict-free packing of given items using a minimal number of
bins. Tree decomposition is applied to decompose a problem instance into
subproblems that can be solved independently. First a tree decomposition
is obtained, and then each item is assigned to a specific cluster (this
phase is called cluster separation). These clusters are then considered
as subproblems, which are solved iteratively. Finally, the partial
solutions from the subproblems are merged to obtain solutions for the
whole problem.

Another application of tree decomposition is the approach introduced by
Fontaine et al. [64.39], where tree decomposition is used to guide the
exploration of the search space. The authors propose a method called
decomposition guided VNS (variable neighborhood search) that exploits
the graph of clusters to build neighborhood structures. Using clusters
achieves better intensification and diversification. For example, moves
are favored in regions that are closely linked, and the search is
diversified by selecting new clusters and therefore exploring new
regions of the search space.


Table 64.2 Algorithms comparison regarding treewidth for DIMACS graph coloring instances

Instance      |V|/|E|      KBH   TabuS  BB    GA    IHA   ACO
school1       385/19095    244   188    -     185   178   228
school1_nsh   352/14612    192   162    -     157   152   185
zeroin.i.1    126/4100     50    50     -     50    50    50
zeroin.i.2    157/3541     33    32     -     32    32    33
zeroin.i.3    157/3540     33    32     -     32    32    33
le450_5a      450/5714     310   256    307   243   244   304
le450_5b      450/5734     313   254    309   248   246   308
le450_5c      450/9803     340   272    315   265   266   309
le450_5d      450/9757     326   278    303   265   265   290
le450_15a     450/8168     296   272    -     265   262   288
le450_15b     450/8169     296   270    289   265   258   292
le450_15c     450/16680    376   359    372   351   350   368
le450_15d     450/16750    375   360    371   353   355   371
le450_25a     450/8260     255   234    255   225   216   249
le450_25b     450/8263     251   233    251   227   219   245
le450_25c     450/17343    355   327    349   320   322   346
le450_25d     450/17425    356   336    349   327   328   355
dsjc125.1     125/736      67    65     64    61    60    63
dsjc125.5     125/3891     110   109    109   109   108   108
dsjc125.9     125/6961     119   119    119   119   119   119
dsjc250.1     250/3218     179   173    176   169   167   174
dsjc250.5     250/15668    233   232    231   230   229   231
dsjc250.9     250/27897    243   243    243   243   243   243


64.3 Conclusion


Several metaheuristic approaches based on nature-inspired strategies and
local search have been used successfully in the literature for
generating tree decompositions. Among these approaches, genetic
algorithms and iterated local search-based algorithms provide the best
upper bounds for many benchmark instances.

Although metaheuristic techniques currently provide state-of-the-art
upper bounds for most problems, the runtime of such algorithms for large
graphs is still high. Greedy heuristic approaches generate slightly
worse upper bounds, but are more efficient. Therefore, developing more
efficient metaheuristics for tree decompositions is still a challenging
task. Moreover, for many problems the treewidth is still not known, and
the question is whether the current metaheuristics can still be improved
to find new upper bounds for such problems. To obtain better upper
bounds, it would be interesting to investigate other approaches such as
memetic algorithms, large neighborhood search, and other hybrid
techniques. Furthermore, the iterative improvement of the initially
generated tree decomposition (based on vertex ordering) is an
interesting question.

Finally, in some applications, the treewidth is not the only important
parameter for solving problems based on tree decompositions efficiently.
Therefore, the development of metaheuristics for generating tree
decompositions that optimize other features of the tree decomposition
would be of interest in the future.


References


64.1 N. Robertson, P.D. Seymour: Graph minors II: Algorithmic
aspects of tree-width, J. Algorithms 7,
309-322 (1986) 

64.2 S. Arnborg, D.G. Corneil, A. Proskurowski: Com- 
plexity of finding embeddings in a k-tree, SIAM 
J. Algebr. Discrete Methods 8, 277-284 (1987) 

64.3 J.H. Holland: Adaptation in Natural and Artificial 
Systems (Univ. of Michigan Press, Ann Arbor 1975) 

64.4 S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi: Optimization
by simulated annealing, Science 220(4598), 671-680 (1983)

64.5 F. Glover: Future paths for integer programming
and links to artificial intelligence, Comput. Oper.
Res. 13(5), 533-549 (1986)

64.6 H. Lourenço, O. Martin, T. Stützle: Iterated local
search. In: Handbook of Metaheuristics, Vol. 57, ed.
by F. Glover, G.A. Kochenberger (Springer, New York
2003) pp. 320-353

64.7 M. Dorigo: Optimization, Learning and Natural Al- 
gorithms, Ph.D. Thesis (Dipartimento di Elettronica, 
Politecnico di Milano, Italy 1992), in Italian 

64.8 M. Dorigo, V. Maniezzo, A. Colorni: The ant system: 
Optimization by a colony of cooperating agents, 
IEEE Trans. Syst. Man Cybern. B 26(1), 29-41 (1996) 

64.9 S. Lauritzen, D. Spiegelhalter: Local computations 
with probabilities on graphical structures and their 
application to expert systems, J. R. Stat. Soc. Ser. B 
50, 157-224 (1988) 

64.10 A.M. Koster, S.P. van Hoesel, A.W. Kolen: Optimal 
solutions for frequency assignment problems via 
tree decomposition, Lect. Notes Comput. Sci. 1665, 
338-350 (1999) 

64.11 J. Alber, F. Dorn, R. Niedermeier: Experimental 
evaluation of a tree decomposition based algo- 
rithm for vertex cover on planar graphs, Discrete 
Appl. Math. 145, 210-219 (2004) 

64.12 A. Koster, S. van Hoesel, A. Kolen: Solving par- 
tial constraint satisfaction problems with tree- 
decomposition, Networks 40(3), 170-180 (2002) 

64.13 J. Xu, F. Jiao, B. Berger: A tree-decomposition ap- 
proach to protein structure prediction, Proc. IEEE 
Comput. Syst. Bioinform. Conf. (2005) pp. 247-256 

64.14 M. Morak, N. Musliu, R. Pichler, S. Rümmele,
S. Woltran: Evaluating tree-decomposition based 
algorithms for answer set programming, Proc. 
Learn. Intell. Optim. Conf. (LION 6) (2012) 

64.15 A. Koster, H. Bodlaender, S. van Hoesel: Treewidth: 
Computational Experiments, Electronic Notes in 
Discrete Mathematics, Vol. 8 (Elsevier Science, Am- 
sterdam 2001) 

64.16 F. Clautiaux, A. Moukrim, S. Négre, J. Carlier:
Heuristic and meta-heuristic methods for computing
graph treewidth, RAIRO Oper. Res. 38, 13-26 (2004)

64.17 D.R. Fulkerson, O. Gross: Incidence matrices
and interval graphs, Pac. J. Math. 15, 835-855 
(1965) 

64.18 F. Gavril: Algorithms for minimum coloring, maxi- 
mum clique, minimum coloring cliques and max- 
imum independent set of a chordal graph, SIAM 
J. Comput. 1, 180-187 (1972) 

64.19 P. Larranaga, C. Kuijpers, M. Poza, R. Murga: De- 
composing Bayesian networks: Triangulation of the 
moral graph with genetic algorithms, Stat. Comput. 
7(1), 19-34 (1997) 

64.20 N. Musliu, W. Schafhauser: Genetic algorithms for 
generalized hypertree decompositions, Eur. J. Ind. 
Eng. 1(3), 317-340 (2007) 

64.21 T. Hammerl, N. Musliu: Ant colony optimization for 
tree decompositions. In: EvoCOP, ed. by P. Cowling, 
P. Merz (Springer, Berlin, Heidelberg 2010) pp. 95- 
106 

64.22 U. Kjaerulff: Optimal decomposition of probabilis- 
tic networks by simulated annealing, Stat. Comput. 
2(1), 2-17 (1992) 

64.23 N. Musliu: An iterative heuristic algorithm for tree 
decomposition. In: Studies in Computational Intel- 
ligence, Recent Advances in Evolutionary Compu- 
tation for Combinatorial Optimization, Vol. 153, ed. 
by C. Cotta, J.I. van Hemert (Springer, Berlin,
Heidelberg 2008) pp. 133-150

64.24 D.S. Johnson, M.A. Trick: The Second Dimacs Im- 
plementation Challenge: NP-Hard Problems: Max- 
imum Clique, Graph Coloring, and Satisfiability, 
Series in Discrete Mathematics and Theoretical 
Computer Science (American Mathematical Society, 
Boston 1993) 

64.25 T. Hammerl: Ant Colony Optimization for Tree 
and Hypertree Decompositions, M.S. Thesis (Vienna 
University of Technology, Vienna 2009) 

64.26 M. Dorigo, T. Stützle: Ant Colony Optimization,
A Bradford Book (MIT Press, Cambridge 2004) 

64.27 B. Bullnheimer, R.F. Hartl, C. Strauss: A new rank 
based version of the ant system: A computational 
study, Cent. Eur. J. Oper. Res. Econ. 7(1), 25-38 
(1999) 

64.28 T. Stützle, H. Hoos: Max-min ant system and local
search for the traveling salesman problem, IEEE Int. 
Conf. Evol. Comput. (1997) pp. 309-314 

64.29 T. Stützle, H. Hoos: Max-min ant system, 
Future Gener. Comput. Syst. 16(9), 889-914 
(2000) 

64.30 M. Dorigo, L.M. Gambardella: Ant colony system:
A cooperative learning approach to the traveling
salesman problem, IEEE Trans. Evol. Comput. 1(1),
53-66 (1997)

64.31 N. Musliu: Generation of tree decompositions by
iterated local search. In: EvoCOP, ed. by C. Cotta,
J. van Hemert (Springer, Berlin, Heidelberg 2007)
pp. 130-141

64.32 K. Shoikhet, D. Geiger: A practical algorithm for
finding optimal triangulations, Proc. Natl. Conf.
Artif. Intell. (AAAI'97) (1997) pp. 185-190

64.33 V. Gogate, R. Dechter: A complete anytime algorithm
for treewidth, Proc. 20th Annu. Conf. Uncertain.
Artif. Intell. UAI-04 (2004) pp. 201-208

64.34 E. Bachoore, H. Bodlaender: A branch and bound
algorithm for exact, upper, and lower bounds on
treewidth, Lect. Notes Comput. Sci. 4041, 255-266
(2006)

64.35 R. Tarjan, M. Yannakakis: Simple linear-time
algorithm to test chordality of graphs, test acyclicity
of hypergraphs, and selectively reduce acyclic
hypergraphs, SIAM J. Comput. 13, 566-579 (1984)

64.36 K. Kask, A. Gelfand, L. Otten, R. Dechter: Pushing
the power of stochastic greedy ordering schemes
for inference in graphical models, Proc. Natl. Conf.
Artif. Intell. (AAAI) (2011) pp. 54-60

64.37 H.L. Bodlaender, A.M.C.A. Koster: Treewidth
computations I. Upper bounds, Inf. Comput. 208(3),
259-275 (2010)

64.38 A. Khanafer, F. Clautiaux, E.-G. Talbi: Tree-decomposition
based heuristics for the two-dimensional bin packing
problem with conflicts, Comput. Oper. Res. 39(1),
54-63 (2012)

64.39 M. Fontaine, S. Loudni, P. Boizumault: Guiding VNS
with tree decomposition, 23rd IEEE Int. Conf. Tools
Artif. Intell. (ICTAI) (IEEE, Boca Raton 2011)
pp. 505-512


65. Evolutionary Computation and Constraint Satisfaction


Jano I. van Hemert


In this chapter we will focus on the combination of evolutionary
computation (EC) techniques and constraint satisfaction problems (CSPs).
Constraint programming (CP) is another approach to deal with constraint
satisfaction problems. In fact, it is an important prelude to the work
covered here as it advocates itself as an alternative approach to
programming [65.1]. The first step is to formulate a problem as a CSP
such that techniques from CP, EC, combinations of the two, often
referred to as hybrids [65.2,3], or other approaches can be deployed to
solve the problem. The formulation of a problem has an impact on its
complexity in terms of the effort required to either find a solution or
prove that no solution exists. It is, therefore, vital to spend time on
getting this right.

CP defines search as iterative steps over a search tree where nodes are
partial solutions to the problem in which not all variables are assigned
values. The search then maintains a partial solution that satisfies all
constraints over the variables with assigned values. In EC, by contrast,
algorithms sample a space of candidate solutions where for each sample
point all variables are assigned values. None of these candidate
solutions will satisfy all constraints in the problem until a solution
is found. The tree search algorithms of CP are often classified as
Davis-Putnam-Logemann-Loveland (DPLL) algorithms, after the first
backtracking algorithm for solving CSP [65.4].

Another major difference is that many constraint solvers from CP are
sound, whereas EC solvers are not. A solver is sound if it always finds
a solution if it exists. Furthermore, most constraint solvers from CP
can easily be made complete, although this is often not a desired
property for a constraint solver. A constraint solver is complete if it
can find every solution to a problem.


65.1 Informal Introduction to CSP
65.2 Formal Definitions
65.3 Solving CSP with Evolutionary Algorithms
    65.3.1 Direct Encoding
    65.3.2 Indirect Encoding
    65.3.3 General Techniques to Improve Performance
65.4 Performance Indicators
    65.4.1 Efficiency
    65.4.2 Effectiveness
65.5 Specific Constraint Satisfaction Problems
    65.5.1 Boolean Satisfiability Problem
    65.5.2 Graph Coloring
    65.5.3 Binary Constraint Satisfaction Problems
    65.5.4 Examination Timetabling
65.6 Creating Rather than Solving Problems
    65.6.1 Evolving Binary Constraint Satisfaction Problem Instances
    65.6.2 Evolving Boolean Satisfiability Problem Instances
    65.6.3 Further Investigations
65.7 Conclusions and Future Directions
References


65.1 Informal Introduction to CSP


For a formal definition please skip to the next section. A constraint
satisfaction problem consists of a set of variables, and each variable
must be assigned one value from its finite set of values, called its
domain.


A set of constraints restricts certain simultaneous assignments. In most
CSPs, the objective is to search for a simultaneous assignment of all
the variables such that all constraints are satisfied, i.e., no
forbidden simultaneous assignment from the set of constraints is used.

A famous example is the SEND MORE MONEY puzzle, where each letter must
be replaced by a unique digit such that the following sum holds [65.5]

    SEND + MORE = MONEY

In this CSP, the variables are S, E, N, D, M, O, R, Y and the domains
are {1,...,9} for S, M and {0,...,9} for E, N, D, O, R, Y. The
constraint can also be written as
$1000 S + 100 E + 10 N + D + 1000 M + 100 O + 10 R + E = 10000 M + 1000 O + 100 N + 10 E + Y$.
Every CSP A can be rewritten into another CSP B where a bijective
mapping exists between the solutions of A and B, which follows from the
reducibility theorem from complexity theory [65.6]. The solution to this
CSP is the assignment S = 9, E = 5, N = 6, D = 7, M = 1, O = 0, R = 8,
Y = 2, which uniquely satisfies the constraint.

Other very well-known constraint satisfaction problems are map coloring,
more commonly known as vertex coloring (Sect. 65.5.2), and the
recreational game Sudoku, which is equivalent to completing a graph
9-coloring problem on a given specific graph with 81 vertices. A
specific EC solution is provided by Lewis [65.7]. Quite a lot of
constraint satisfaction problems exist; we will first look at CSP in
general within the context of EC as problem solvers. Then we will
discuss several specific constraint satisfaction problems and the
particular EC approaches applied to these problems. Last, we will
provide a brief overview of using EC for generating problem instances
for CSP.


65.2 Formal Definitions


Slightly different, but equivalent, formal definitions of CSP exist. The
most common definition is:


Definition 65.1 (Constraint Satisfaction Problem)
is a triple $\langle V, D, C \rangle$:

• $V$ is an n-tuple of variables $V = (v_1, v_2, \dots, v_n)$,
• Each $v \in V$ has a corresponding m-tuple of values called its
  domain, $D_v = (d_1, d_2, \dots, d_m)$, of which it can be assigned
  one, and
• $C = (c_1, \dots, c_t)$ is a t-tuple of constraints where each
  $c \in C$ restricts certain simultaneous variable assignments from
  occurring.


In the literature on generic CSP, the definition of a constraint is
often reversed: constraints are defined as the set of assignments that
are allowed rather than restricted. Note that in generic CSP literature
variables are often denoted with X, whereas in graph-oriented problem
domains such as graph coloring and maximum clique, V is adopted.


Definition 65.2 (Solution to a CSP)
is an assignment of variables
$(d_1, \dots, d_n) \in D_1 \times \cdots \times D_n$ such that for every
constraint $c \in C$ on $x_{i_1}, \dots, x_{i_m}$:
$(d_{i_1}, \dots, d_{i_m}) \in c$.


In the context of one constraint c, we say an assignment of variables
satisfies the constraint c if the assignment is in c, or violates the
constraint c if the assignment is not in c. A CSP can be insoluble (more
commonly written as insolvable), which means every assignment of
variables will violate at least one constraint.

A constraint solver is an algorithm that takes as input a CSP and
produces as output either a solution, a proof that no solution exists,
or a notification of failure. The input is often referred to as a
problem instance, as a CSP is often defined to cover a class of problems
such as 3-satisfiability. The output can be more than one solution; in
fact, it could be every solution. However, as EC techniques are based on
sampling, in principle they cannot prove that every solution has been
found, which is referred to as not complete. Moreover, they cannot prove
that no solution exists, which is referred to as not sound. Therefore,
constraint solvers based on EC and other heuristic approaches often
terminate after a certain criterion is met, e.g., a predefined budget is
reached in terms of the number of solutions evaluated or the computation
time spent, or a certain convergence of the population is reached.

We recommend the following books for further 
reading on constraint satisfaction. For the foundations 
of the problem and basic algorithms, Tsang [65.8]; for 
an introduction with comprehensive overview of con- 
straint programming techniques, Dechter [65.9] and 
Lecoutre [65.10]; and for a more theoretical approach 
Apt [65.1] and Chen [65.11]. 
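
To connect Definitions 65.1 and 65.2 with the SEND MORE MONEY puzzle of Sect. 65.1, the following minimal sketch enumerates all assignments by brute force and keeps those that satisfy the constraints. It is only an illustration, not an EC method; the function name `solve_send_more_money` is hypothetical, and the requirement that letters take distinct digits is enforced by iterating over permutations.

```python
from itertools import permutations

LETTERS = "SENDMORY"  # the variables of the CSP


def solve_send_more_money():
    """Return every assignment that satisfies SEND + MORE = MONEY."""
    solutions = []
    # permutations() yields only assignments with pairwise distinct digits.
    for s, e, n, d, m, o, r, y in permutations(range(10), len(LETTERS)):
        if s == 0 or m == 0:  # domains of S and M are {1,...,9}
            continue
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            solutions.append(dict(zip(LETTERS, (s, e, n, d, m, o, r, y))))
    return solutions


if __name__ == "__main__":
    for assignment in solve_send_more_money():
        print(assignment)
```

Because the enumeration covers the whole search space, this procedure can report every solution or the absence of one, which a sampling-based EC solver cannot guarantee; the price is a running time that grows with the size of the space.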




65.3 Solving CSP with Evolutionary Algorithms 


In this chapter we will restrict ourselves to covering
the conceptual mapping required to solve a CSP with 
an evolutionary algorithm. This mapping will con- 
sist of choosing a representation for the problem and 
a corresponding fitness function to determine the qual- 
ity of a solution. Once this mapping is complete, 
the evolutionary algorithm will require other compo- 
nents, such as appropriate variation operators, selection 
mechanisms, and a suitable initialization method for 
the population and termination criteria. All these, and 
other optional variants can be found elsewhere in the 
handbook. 

We will explain the two most common mappings, direct encoding and
indirect encoding, using the well-known n-queens problem on an n × n
chessboard. First we introduce a conceptual definition of the problem.

The n-queens problem requires the placing of n queens on an n × n
chessboard such that no queen attacks any of the other n − 1 queens.
Thus, a solution requires that no two queens share the same row, column,
or diagonal. Several common formal definitions of the problem exist. The
most common is to define n variables $\{q_1, \dots, q_n\}$, where each
variable $q_i$ has a domain that consists of the row position the queen
will be placed on in its corresponding unique column $i$, i.e.,
$q_i \in \{1, \dots, n\}$ for all $i = 1, \dots, n$. The set of
constraints consists of $q_i \neq q_j$ (i.e., not in the same row) and
$|q_i - q_j| \neq |i - j|$ (i.e., not on the same diagonal) for all
$i, j = 1, \dots, n$ with $i \neq j$.


The n-queens problem is no longer considered 
a challenging problem as it has a structure that can be 
exploited to solve very large problems of over 9 million 
queens by repeating a pattern [65.12]. It is, however, an 
excellent problem for explaining characteristics of con- 
straint satisfaction problems and their solvers due to the 
simple 2-D spatial nature of the problem. For instance, to explain
symmetry in CSP, the 8-queens problem can be used: of its 92 distinct
solutions, only 12 are unique once variants due to rotational and
reflection symmetry are removed, as shown in Fig. 65.1.
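
The two counts can be reproduced with a short sketch. It assumes 0-based rows and columns and represents a placement as a tuple giving the row of the queen in each column; the function names are hypothetical. Each solution is mapped to the smallest of its eight rotated and mirrored variants, so symmetric solutions collapse to one representative.

```python
from itertools import permutations


def solutions(n):
    """All n-queens placements; p[c] is the row of the queen in column c."""
    # A permutation already rules out shared rows and columns,
    # so only the two diagonal directions need to be checked.
    return [p for p in permutations(range(n))
            if len({p[c] + c for c in range(n)}) == n
            and len({p[c] - c for c in range(n)}) == n]


def symmetries(p):
    """The eight rotations and reflections of placement p."""
    n = len(p)
    cells = list(enumerate(p))  # (column, row) pairs
    boards = []
    for _ in range(4):
        cells = [(r, n - 1 - c) for c, r in cells]     # rotate by 90 degrees
        mirrored = [(n - 1 - c, r) for c, r in cells]  # reflect left-right
        for variant in (cells, mirrored):
            boards.append(tuple(r for _, r in sorted(variant)))
    return boards


def count_unique(n):
    """Number of solutions that remain after removing symmetric variants."""
    return len({min(symmetries(p)) for p in solutions(n)})


if __name__ == "__main__":
    print(len(solutions(8)), count_unique(8))
```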


65.3.1 Direct Encoding 


With a direct encoding the genotype consists of a vec- 
tor where each element corresponds uniquely to one 
variable of the CSP; an element g; contains values di- 
rectly from the domain of its corresponding variable D;. 
A wide variety of genetic operators both for mutation 
and recombination are applicable to this encoding and 


can be found in [65.13]. Most of these operators will be 
called discrete or mixed-integer operations. 

The genotype is mapped to the phenotype by taking 
into consideration the constraints; it requires a measure- 
ment for determining the quality of candidate solutions. 
Thus, we need to introduce a fitness function. The most 
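common fitness function takes the sum of all constraints
violated by a candidate solution

\[
\mathrm{fitness}(g) = \sum_{c \in C} \mathrm{violated}(c), \quad
\text{where } \mathrm{violated}(c) =
\begin{cases}
1 & \text{if } c \text{ is violated by } g \\
0 & \text{if } c \text{ is satisfied by } g
\end{cases}
\]
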


The fitness should be minimized and once it reaches 
zero, a solution has been found. 
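
As an illustration, the fitness function can be written down for the n-queens formulation given earlier in this section. This is a minimal sketch with a hypothetical function name; the genotype `g` is assumed to be a list in which `g[i]` is the row of the queen in column `i`, and every pair of columns contributes one row constraint and one diagonal constraint.

```python
from itertools import combinations


def fitness(g):
    """Number of violated constraints of the n-queens CSP for genotype g."""
    violated = 0
    for i, j in combinations(range(len(g)), 2):
        if g[i] == g[j]:                    # same row
            violated += 1
        if abs(g[i] - g[j]) == abs(i - j):  # same diagonal
            violated += 1
    return violated


if __name__ == "__main__":
    print(fitness([2, 4, 1, 3]))  # a solution of the 4-queens problem
    print(fitness([1, 2, 3, 4]))  # all queens on one diagonal
```

With this encoding the column constraint never needs to be checked, because each position of the vector corresponds to exactly one column.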


65.3.2 Indirect Encoding 


With an indirect encoding the genotype first needs to be transformed
into a full or partial assignment of the variables of the CSP. This
transformation is also referred to as local search. Depending on the
level of sophistication, these transformations range from a simple
greedy assignment all the way to sound search algorithms evaluating a
small part of the CSP.

The most common approach for this representa- 
tion takes as a genotype the permutation of variables 
of the CSP. Many genetic operators are designed to 
maintain a permutation and several are explained in the 
Handbook of Evolutionary Computation [65.13]. The 
permutation is the input to the local search and de- 
termines the order in which variables are processed; 
processing a variable involves trying to assign a value 
such that no constraint is violated and perhaps further 
steps if no value can be assigned without violating at 
least one constraint. 

More advanced encodings may also include the ordering in which to
consider values from each variable's domain. From constraint programming
we know that the order in which variables and values are considered has
a huge impact on the efficiency of search algorithms [65.14]; more often
it is the search method that determines the order using a particular
heuristic, such as choosing the next vertex with the maximum saturation
degree, as is used in DSatur [65.15]. The saturation degree of a vertex
is defined as the number of different colors used for coloring its
neighbors. The principle has been used in many algorithms since its
introduction in 1979.


Fig. 65.1 The 12 unique solutions under symmetry via rotations and reflections for the 8-queens problem


The most common fitness function used with indi- 
rect encoding simply counts the number of unassigned 
variables after the local search terminates. Note that 
two different strategies will influence the resolution of 
this function. If the local search terminates after it first 
encounters a variable it cannot assign, then many candi- 
date solutions will have the same fitness but can still be 
very different. On the other hand, terminating after all 
variables have been considered will give a richer land- 
scape to consider but may incur more computational 
effort. See [65.16] for a comprehensive theoretical and 
empirical analysis of sampling in EC. 
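
The two termination strategies can be compared with a minimal sketch of an indirect encoding for the n-queens problem. It assumes the simplest possible decoder, a greedy assignment that gives each column, in the order fixed by the permutation, the lowest row that conflicts with no queen placed so far; the function names and the `stop_at_first_failure` flag are hypothetical.

```python
def decode(perm, n, stop_at_first_failure=False):
    """Greedy decoder: returns a dict column -> row for the assigned columns."""
    placed = {}
    for col in perm:
        for row in range(n):
            # A row is acceptable if no placed queen shares it or a diagonal.
            if all(row != r and abs(row - r) != abs(col - c)
                   for c, r in placed.items()):
                placed[col] = row
                break
        else:
            # No row could be assigned to this column.
            if stop_at_first_failure:
                break
    return placed


def fitness(perm, n, stop_at_first_failure=False):
    """Number of variables left unassigned by the decoder (to be minimized)."""
    return n - len(decode(perm, n, stop_at_first_failure))


if __name__ == "__main__":
    print(fitness([0, 1, 2, 3], 4))        # all columns considered
    print(fitness([0, 1, 2, 3], 4, True))  # stop at the first failure
    print(fitness([2, 0, 3, 1], 4))        # an ordering that decodes to a solution
```

For the ordering 0, 1, 2, 3 the decoder fails on the third column. Stopping there leaves two variables unassigned, whereas continuing still places the last queen, which illustrates the finer resolution of the second strategy.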


65.3.3 General Techniques 
to Improve Performance 


Over the past two decades, many techniques were de- 
veloped to improve the efficiency and/or the effective- 
ness of EC for solving constraint satisfaction problems. 


Only a handful of these techniques were evaluated on 
more than one problem. Hence, we cannot draw any 
general conclusions about the success of these tech- 
niques. Even worse is that many studies will show 
improvement only compared to their previous results 
or compare their results with an algorithm that has 
already been superseded in terms of performance by 
many other techniques. Often the set of competitor al- 
gorithms is chosen to fall within EC, which severely 
limits the strength of the competition. Therefore, we 
will discuss techniques for improving performance in 
the context of the problems they were developed for. 
Section 65.5 reviews several popular CSPs used for 
developing more efficient and effective evolutionary 
algorithms. 

One approach that has been applied to several CSPs 
with varying success is that of assigning weights to 
constraints to allow biasing the search towards satis- 
fying certain constraints; in the first experiments this 
approach was referred to as penalty functions [65.17]. 
Moreover, the search can be influenced dynamically
by adapting weights according to heuristics, such as
increasing the weight of the constraint that has been sat- 
isfied the least number of times recently [65.18]. The 
origin of this idea can be found in the self-adaptation 
used in evolution strategies [65.19]. 
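
A minimal sketch of weighted penalties, using graph 3-coloring as the example problem, may help to make the idea concrete. All names are hypothetical: a candidate `s` maps each vertex to a color, `w` maps each edge constraint (or each vertex) to its weight, and the adaptation step is only one possible reading of the heuristic mentioned above, raising the weight of the constraint satisfied least often in a list of recent candidates.

```python
def constraint_penalty(s, edges, w):
    """Weighted count of edges whose end points share a color."""
    return sum(w[(k, l)] for k, l in edges if s[k] == s[l])


def variable_penalty(s, edges, w):
    """Weighted count of vertices that have a neighbor of the same color."""
    bad = {v for k, l in edges if s[k] == s[l] for v in (k, l)}
    return sum(w[v] for v in bad)


def bump_least_satisfied(recent, edges, w, delta=1):
    """Raise the weight of the edge constraint satisfied least often."""
    satisfied = {e: sum(s[e[0]] != s[e[1]] for s in recent) for e in edges}
    worst = min(edges, key=lambda e: satisfied[e])
    w[worst] += delta
    return worst


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    w_edge = {e: 1 for e in edges}
    w_vertex = {v: 1 for v in range(4)}
    a = {0: 1, 1: 1, 2: 2, 3: 2}
    b = {0: 1, 1: 2, 2: 3, 3: 3}
    print(constraint_penalty(a, edges, w_edge), variable_penalty(a, edges, w_vertex))
    print(bump_least_satisfied([a, b], edges, w_edge))
    print(constraint_penalty(b, edges, w_edge))
```

With all weights equal to 1 the two penalties simply count conflicting edges and conflicting vertices. After the adaptation step, candidates that still violate the persistently unsatisfied constraint are penalized more heavily, which biases the search towards satisfying it.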

With penalty functions, the optimization objectives 
replacing the constraints are traditionally viewed as 
penalties for constraint violation, hence to be mini- 
mized [65.20]. There are two basic types of penalties: 


1. Penalty for violated constraints 
2. Penalty for wrongly instantiated variables. 


Formally, let us assume that we have constraints 
c; (i= {1,...,m}) and variables y (j= {1,...,n}). 
Let C’ be the set of constraints involving variable vj. 
Then the penalties relative to the two options described 
above can be expressed as follows: 


1. fils) = OL, wi x (s, ci), where 


1 ifs violates c; 


X(8, ci) = | 


0 otherwise 
2. f(s) = } j= w x X05, CÌ), where 


; 1 ifs violates at least one c € C 
x(s,C) = 


0 otherwise, 


where the w; and w; are weights that correspond to 
a constraint and a variable, respectively. These will 
be important later on, for now we assume all these 
weights equal to 1. 


65.4 Performance Indicators 


An understanding of the efficiency and effectiveness is 
vital when choosing which solver to use or when de- 
veloping an algorithm to deal with a specific CSP. In 
this section we briefly explain measures for determining 
these properties in the context of solving CSP. How- 
ever, these properties must be measured using a suite 
of benchmark instances and, as EAs are generally ran- 
domized algorithms, with multiple independent runs of 
the algorithm on each instance. Choosing an appro- 
priate suite of benchmark instances is paramount to 
making decisions on which algorithm, parameter set- 
ting, or next algorithmic feature to add. 


Obviously, for each of the above functions f € 
{fi,f2} and for each se S we have that $(s) = true 
if and only if f(s) = 0. For instance, in the graph 3- 
coloring problem the vertices of a given graph G = 
(V, E), E C V x V, have to be colored by three colors 
in such a way that no neighboring vertices, i. e., graph 
nodes connected by an edge, have the same color. This 
problem can be formalized by means of a CSP with n = 
|V| variables, each with the same domain D = {1, 2, 3}. 
Furthermore, we have m = |E| constraints, one for each 
edge e = (k,l) € E, with ce(s) = true if and only if sk # 
sı. Then the corresponding CSP is (S,@), where S = 
D" and $(s) = Neeg Ce. Using the constraint-oriented 
penalty function fı with w; = 1 for all i= {1,...,m} 
we count the incorrect edges that connect two vertices 
with the same color. The variable-oriented penalty func- 
tion f with w; = 1 for all i= {1,...,m} amounts to 
counting the incorrect vertices that have a neighbor with 
the same color. 

Advantages of indirect encoding:


• Generality: penalty functions such as $f_1$ and $f_2$ are
problem-independent

• Reduces the problem to simple optimization

• Allows user preferences to be expressed by weights.


Disadvantages of indirect encoding:


• Loss of information by packing everything into a single
number

• In the case of constrained optimization (as opposed
to CSP, as we are handling here) $f_1$ and $f_2$ are reported
to be weak [65.21].


In a sense, the search for a good algorithm is in
itself an optimization problem. The suite of benchmark
instances is only a sample of the problem, just as
training data in a machine learning problem is only
a sample of all data possibly encountered. Changing an
algorithm and tuning its parameters on the same small
suite of instances could lead to over-fitting [65.22,23],
which in turn means the algorithm will perform worse
in the general case. Therefore, the first step
should be to characterize the problem well and to have
a good representation, e.g., spread, of the instances
possibly encountered when deployed.




65.4.1 Efficiency 


The time taken by an algorithm to provide a solution is
an important factor, even more so in situations where
solutions are required in real time. Much research is
devoted to speeding up algorithms, either by cleverly
exploiting properties of the problem, by parallelization,
or by balancing aspects of the quality of the solution.

The most common approach to measuring the ef- 
ficiency of evolutionary algorithms is by counting the 
number of evaluations, i.e., the number of times the 
fitness function is executed. This approach has several 
drawbacks. First, the approach allows comparison only 
with algorithms that use the exact same fitness func- 
tion and spend the most significant part of their time 
on computing that function. Second, the computational 
complexity of the evolutionary algorithm may not be 
dependent on the fitness function. For instance, with the 
indirect encoding described in Sect. 65.3.2, much com- 
putational effort will go into the local search, whereas 
the computation of the fitness is trivial. 

Another common approach is to measure time spent 
as reported by the operating system. This has even 
more drawbacks as the reported numbers will depend 
on the computer programming language used for im- 
plementing the algorithm, the compiler and its setting 
for translating the implementation into machine code, 
the architecture of the computer for executing the ma- 
chine code, and the operating system for hosting the 
execution environment. Variations of these will have an
effect on the reported results and, moreover, as these
environments themselves change over time, future stud- 
ies will find it hard to reproduce results accurately 
or even create meaningful comparisons to reported 
results. 

A more meaningful solution is to count all the
atomic operations that are directly related to the
problem. The operations that must be included should be
those that in theory increase exponentially in number
with larger problems, as CSPs fall under the class of
NP-complete problems. The most common
operation will be a conflict check; this is also referred
to as a constraint check, but in the strictest sense,
a constraint check consists of multiple conflict checks [65.8].
For example, when solving the n-queens problem, every
time the algorithm checks $q_i \neq q_j$ for any $q_i$ and $q_j$,
this should be recorded as one check. The same procedure
should be followed for the constraint concerning
diagonal attacks, $|q_i - q_j| \neq |i - j|$. The sum of all checks
when the algorithm terminates is the computational
effort spent.
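One way to record conflict checks for the n-queens example is to route every comparison through a counter. The sketch below is illustrative only: the class `CheckCounter` and the function `count_conflicts` are hypothetical names, and the representation (a list `q` where `q[i]` is the position of the queen in line `i`) is an assumption of this example.

```python
from typing import List


class CheckCounter:
    """Counts every atomic conflict check performed on a candidate solution."""

    def __init__(self) -> None:
        self.checks = 0

    def same_line(self, q: List[int], i: int, j: int) -> bool:
        # One conflict check: is q_i = q_j, i.e., is q_i != q_j violated?
        self.checks += 1
        return q[i] == q[j]

    def same_diagonal(self, q: List[int], i: int, j: int) -> bool:
        # One conflict check: is |q_i - q_j| != |i - j| violated?
        self.checks += 1
        return abs(q[i] - q[j]) == abs(i - j)


def count_conflicts(q: List[int], counter: CheckCounter) -> int:
    """Number of attacking queen pairs; all checks are recorded in counter."""
    conflicts = 0
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            # 'or' short-circuits: the diagonal check is only performed
            # (and counted) when the first check found no conflict.
            if counter.same_line(q, i, j) or counter.same_diagonal(q, i, j):
                conflicts += 1
    return conflicts


counter = CheckCounter()
print(count_conflicts([1, 3, 0, 2], counter))  # 0 (a solution for n = 4)
print(counter.checks)                          # 12 (6 pairs, 2 checks each)
```

Because the counter is part of the algorithm rather than of the runtime environment, the reported number stays the same across languages, compilers, and machines.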

By reporting the number of conflict checks we ensure that
future studies can compare with current results as
this measurement will not be affected by future changes 
in hardware and software environments. We are mea- 
suring a property of the algorithm here as opposed to 
a property of one implementation of the algorithm run- 
ning in one particular environment. 

It is important to note that there are subtle differ- 
ences in the reporting used in different studies. Some 
studies report the average number of operations over all 
independent runs, including runs that are unsuccessful, 
i. e., where no solution was found. Other studies report 
the average number of operations to a solution, where 
only the runs that yield a solution are taken into ac- 
count. The former method will produce higher averages 
than the latter if the success rate is less than 1. 


65.4.2 Effectiveness 


Efficiency is only one aspect by which to measure the
success of a constraint solver. The other most important
aspect is that of effectiveness, which measures how
successful an algorithm is in finding or approximating
a solution. The easiest and most commonly used
measurement is the success rate, which is defined
for an experiment as the number of runs in which an
algorithm finds a solution divided by the total number
of runs of the same algorithm in that experiment. As
no prior knowledge is required about whether problem
instances are unsolvable, this measurement is
straightforward to implement.
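The success rate and the two averaging conventions for the number of operations (over all runs versus over successful runs only) can be computed as in the following sketch. The record type `Run` and the sample numbers are hypothetical and serve only to show how the two averages diverge when the success rate is below 1.

```python
from statistics import mean
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    solved: bool  # whether this run found a solution
    checks: int   # conflict checks spent when the run terminated


def success_rate(runs: List[Run]) -> float:
    """Fraction of runs in which a solution was found."""
    return sum(r.solved for r in runs) / len(runs)


def avg_checks_all_runs(runs: List[Run]) -> float:
    """Average effort over all runs, including unsuccessful ones."""
    return mean(r.checks for r in runs)


def avg_checks_to_solution(runs: List[Run]) -> Optional[float]:
    """Average effort over successful runs only; None if no run succeeded."""
    solved = [r.checks for r in runs if r.solved]
    return mean(solved) if solved else None


# Hypothetical experiment: two successes, two runs that hit a 10000-check cap.
runs = [Run(True, 1000), Run(True, 3000), Run(False, 10000), Run(False, 10000)]
print(success_rate(runs))            # 0.5
print(avg_checks_all_runs(runs))     # 6000
print(avg_checks_to_solution(runs))  # 2000
```

When comparing with published results it is therefore worth stating explicitly which of the two averages is reported.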

Another popular measurement in combinatorial
optimization is distance to the optimal solution. This
measurement poses two challenges in the context of
constraint satisfaction. Unlike a combinatorial
optimization problem, which has a function to optimize,
a CSP has no such function. As an alternative we could
use the fitness function, but that is not an inherent
property of the problem. Also, we often do not know
whether a CSP has a solution and, when it does not,
we do not know the optimal fitness value. Distance
to the optimal solution is rarely used when solving CSPs
due to these impracticalities.



65.5 Specific Constraint Satisfaction Problems 


Many specific constraint satisfaction problems have 
been addressed in the literature. A full overview 
of these would not provide much benefit, as the 
most likely scenario is that one is looking for pa- 
pers that provide descriptions of algorithms and re- 
sults with those algorithms on a certain problem. 
The exceptions to this are several problems that in 
the literature are used to drive the development of 
algorithms in terms of efficiency and effectiveness. 
These core problems are used over and over to test 
whether new algorithms are better than existing algo- 
rithms. 

Several reasons exist for the choice of these problems.
Their compact definition means that the problem
is easy to replicate by everyone and quick to introduce
in papers. The most popular problems were used in the
1970s when the theory of NP-complete problems was
developed, and they were consequently
seen as important intelligent building blocks. Also, test
sets and later problem generators were released in the
public domain, thereby providing easy access to test
suites.

We will use several of these core problems to de- 
scribe the progress of development in evolutionary 
computation for constraint satisfaction problems. For 
each problem we will provide a quick introduction, 
a justification of its importance in terms of practical ap- 
plications, and a set of pointers to problem suites before 
describing the approaches used. 


65.5.1 Boolean Satisfiability Problem 


Given a Boolean formula $\phi$, determine whether an
assignment of the variables in $\phi$ exists that makes it
TRUE. The problem is often referred to as satisfiability and
abbreviated to SAT [65.24]. In SAT, a variable or its
negation is referred to as a literal. Most often the problem
is studied in conjunctive normal form (CNF), where $\phi$ is
a conjunction of clauses and each clause is a disjunction
of literals. Every SAT problem can be reduced to
3-CNF-SAT (three literals per clause, conjunctive normal
form, satisfiability) [65.25], where each clause has
three literals.

3-CNF-SAT was the first problem to be shown to 
be NP-complete [65.26]. It serves as an important basis
for proving that other problems are NP-complete, such
as the maximal clique problem. Such a proof involves 
a polynomial-time reduction from 3-CNF-SAT to the 
other problem [65.6]. 


The following is an example of 3-CNF-SAT:


• $\phi = (x_1 \lor \lnot x_3 \lor x_4) \land (x_2 \lor x_1 \lor \lnot x_6) \land (x_3 \lor x_2 \lor \lnot x_5)$

• A solution: $x_1 = 1$, $x_2 = 0$, $x_3 = 1$, $x_4 = 0$, $x_5 = 0$,
$x_6 = 0$.


Important practical applications of SAT are model 
checking [65.27], for example, in mathematical proof 
planning [65.28], generic planning problems, espe- 
cially using the planning domain definition language 
(PDDL) [65.29], test pattern generation [65.30], and 
haplotyping in the scientific field of bioinformat- 
ics [65.31]. 

As far as the development of efficient and effective
CSP solvers goes, SAT is the most active field. It has
an annual conference, The International Conference
on Theory and Applications of Satisfiability Testing,
which also hosts an annual competition to determine
the current best solvers. The latter also ensures that new
problem instances are continuously added, which
prevents what is called overfitting [65.32] of the solvers to
an existing set of problem instances.

The general approach to solving satisfiability with EC
is to directly represent the variables in $\phi$ and assign
these either TRUE or FALSE, i.e., these two values form
the domain. The fitness function used is the number of
clauses violated, which should be minimized.
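The fitness function described above can be sketched in a few lines. The encoding is an assumption of this example: each clause is a list of signed integers in the style of the DIMACS CNF format (`3` for $x_3$, `-3` for its negation), and the formula used is the 3-CNF-SAT example of this section as read from the text.

```python
from typing import Dict, List

Clause = List[int]  # signed variable indices, e.g. -3 stands for NOT x3


def violated_clauses(clauses: List[Clause], assignment: Dict[int, bool]) -> int:
    """Fitness to minimize: the number of clauses with no true literal."""

    def literal_is_true(literal: int) -> bool:
        value = assignment[abs(literal)]
        return value if literal > 0 else not value

    return sum(
        1
        for clause in clauses
        if not any(literal_is_true(lit) for lit in clause)
    )


# (x1 v -x3 v x4) ^ (x2 v x1 v -x6) ^ (x3 v x2 v -x5)
phi = [[1, -3, 4], [2, 1, -6], [3, 2, -5]]

solution = {1: True, 2: False, 3: True, 4: False, 5: False, 6: False}
other = {1: False, 2: False, 3: True, 4: False, 5: True, 6: True}

print(violated_clauses(phi, solution))  # 0
print(violated_clauses(phi, other))     # 2
```

A candidate is a solution exactly when the fitness reaches zero, which matches the indirect, penalty-based view of constraint satisfaction used elsewhere in this chapter.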

The earliest evolutionary algorithm for SAT was re- 
ported in 1994 by [65.33] and was soon followed by 
the work of Gottlieb and Voss [65.34,35], who were 
looking to improve its performance. Soon after, inde- 
pendent efforts led to parallelized algorithms [65.36, 
37]. In 2000, the first adaptive evolutionary algorithms 
were applied [65.38], which was 3 years after they were 
applied to graph coloring (Sect. 65.5.2). 

The introduction of hybrid evolutionary algorithms 
with local search created a real boost of research ac- 
tivity [65.39-43]. However, a major issue remains with 
research on solving satisfiability with EC, as all studies 
include only local search and evolutionary algorithms 
without comparing to the state-of-the-art DPLL and heuristic
solvers from the annual satisfiability community.
This holds true even for recent studies such as [65.44]. 
Due to this major gap between the two communities of 
EC and CP, we do not comment on the comparison in 
terms of effectiveness and efficiency. 

New research [65.45] focuses on using EC to
evolve parameter settings for existing sound SAT
solvers, mostly ones based on the
Davis–Putnam–Logemann–Loveland (DPLL) algorithm [65.46].
All modern SAT solvers have many parameters to tune how
the search is organized. These parameters are often tuned
manually, which allows for only a small exploration.
Using EC, a much larger space can be explored in order
to create fast SAT solvers for a given benchmark.


65.5.2 Graph Coloring 


Graph coloring has several variants. The most commonly
used definition is that of graph k-coloring, also
known as the vertex coloring problem. Given a graph of
vertices and edges $(V, E)$ the goal is to find a coloring
of the vertices $V$ of the graph such that no two adjacent
vertices have the same color. If $c(v)$ provides
the color assigned to $v$, then
$\forall v, w \in V : (v, w) \in E \Rightarrow c(v) \neq c(w)$.
The objective is to make use of $k$ or fewer colors.
The problem is known to be NP-complete for $k \geq 3$
and to be decidable in linear time for $k \leq 2$.

Graph coloring is an abstract problem that lies at 
the core of many applications. Well-known applications 
are scheduling, most specifically timetabling [65.47], 
register allocation in compilers [65.48], and frequency 
assignment in wireless communication [65.49]. It is 
a well-studied problem as is shown by the number of 
entries in the best-kept bibliography source until April 
2010 with over 450 publications contributing to vertex 
coloring [65.50]. 

The Second DIMACS Implementation Challenge 
in 1992-1993 focused on maximum clique, graph col- 
oring, and satisfiability. The challenge provided not 
only a standard format for graph k-coloring prob- 
lem instances, but also provided a set of problem 
instances that is still popular today. Soon after, in 
1994, Culberson and Luo [65.51] created a problem in- 
stance generator, which can create problem instances 
with a known k and various other properties. Sev- 
eral other generators exist with specific goals, such as 
to hide cliques [65.52], to create register-interference 
graphs [65.53], and to create timetabling problems 
(Sect. 65.5.4). 

The most straightforward approach to solving graph 
k-coloring with EC is to represent a genome as a vec- 
tor of all variables of the problem. This vector can then 
undergo genetic operators suitable for integer represen- 
tations. The fitness function is simply the number of 
violated constraints, which should be minimized until 
a solution is found when the fitness is equal to zero. Un- 
fortunately, this approach leads to algorithms that are 
inefficient and ineffective [65.54]. 


To make EC more efficient and effective for solving 
graph k-coloring, new algorithms have been developed; 
these broadly fall into two categories. The first cate- 
gory consists of adding mechanisms that prevent the 
stagnation of search due to premature convergence. 
The second category consists of alternative representa- 
tions that make use of decoders to map genotypes to 
phenotypes. The two categories are not mutually exclu- 
sive, and studies have included algorithms that combine 
mechanisms from both categories. 

The earliest work on solving graph k-coloring with 
EC includes the following. Fleurent and Ferland suc- 
cessfully considered various hybrid evolutionary algo- 
rithms [65.55] with Tabu search and extended their 
work into a general implementation of heuristic search 
methods in [65.56]. Von Laszewski looked at structured 
operators and used adaption to improve the convergence 
rate of a genetic algorithm [65.57]. Davis designed an 
algorithm [65.58] to maximize the total of weights of 
nodes in a graph colored with a fixed number of col- 
ors. Coll et al. [65.59] discussed graph coloring and 
crossover operators in a more general context. 

Juhos and van Hemert introduced several heuris- 
tics [65.60, 61] for guiding the search of an evolutionary 
algorithm. All these heuristics depend on their novel 
representation that collapses the graph by combining 
nodes assigned with the same color into one hypernode, 
which speeds up further constraint checking as edges 
are merged into hyperedges [65.62]. This representation 
benefits both complete and heuristic methods. 

Moreover, as shown in the results in Fig. 65.2, 
the evolutionary algorithms developed by Juhos and 
van Hemert are able to outperform a complete method 
(Backtracking-DSatur) on very difficult problem in- 
stances where the chromatic number is 10 or 20. These 
algorithms are unable to compete with the complete 
method for smaller chromatic numbers of 3 and 5. 


[Figure 65.2a-d: four panels, each plotting the average number of colors used (vertical axis) against the average density of edges in the graphs (horizontal axis) for EA-cos (bmt), EA-dotprod (bmt), EA-noheur, and Backtrack-dsatur (imt); panel a is labeled $\chi = 3$.]

Fig. 65.2a-d Results of several evolutionary algorithms against the complete method Backtracking-DSatur; average minimum number of colors used through the phase transition


65.5.3 Binary Constraint Satisfaction Problems


A binary constraint satisfaction problem (BINCSP) is
a CSP where every constraint $c \in C$ restricts at most
two variables [65.63]. Often, network graphs are used
to visualize CSP instances. In Fig. 65.3, we provide
an example of a restricting hypergraph of a BINCSP.
It consists of three variables $V = \{v_1, v_2, v_3\}$, all of
which have domain $D = \{a, b\}$. In a hypergraph every
vertex corresponds to a possible variable assignment,
i.e., $(v, d)$, where $v \in V$ and $d \in D_v$. Every edge
indicates the variable assignments that are forbidden by the
set of constraints $C$. In the example, we show all the
edges that correspond to the following set of forbidden
value pairs:

$$C = \{ \{(v_1, a), (v_2, a)\}, \{(v_1, a), (v_3, b)\}, \{(v_1, b), (v_2, a)\}, \{(v_1, b), (v_2, b)\}, \{(v_1, b), (v_3, a)\}, \{(v_1, b), (v_3, b)\}, \{(v_2, a), (v_3, a)\}, \{(v_2, a), (v_3, b)\} \} .$$

Fig. 65.3 Example of a $|V|$-partite hypergraph of a BINCSP with one solution: $\{(v_1, a), (v_2, b), (v_3, a)\}$

For problem instances, studies on BINCSP generally
create large sets of instances using one of many
problem instance generators. Several models to randomly
create BINCSPs have been designed and analyzed [65.63-65].
All of these incorporate a set of
parameters that may be used to control the size and
difficulty of the problems. Often, these parameters can be
used to create a set of problems that go through a phase
transition. That is, we order the set on the parameters
and observe how the algorithms behave when we move
through the parameter space. In most constraint
satisfaction problems we observe that the performance drops
gradually until it reaches a minimum, after which it
rises again. Most researchers test their algorithms in the
region where the minimum is reached. Here the set of
most difficult to solve problem instances is found. We
will discuss these methods next.

The model most often used in empirical research on
binary constraint satisfaction problems is one that uses
four parameters to control, to some degree, the
difficulty of an instance. By varying these global parameters
one can characterize instances that are more likely to be
either more or less difficult to solve. These parameters
are: the number of variables $n = |V|$, the size of each
variable's domain $m = |D_{v_1}| = |D_{v_2}| = \cdots = |D_{v_n}|$, the
density of constraints $p_1$, and the average tightness of
all the constraints $p_2$. There are two ways of looking at
parameters $p_1$ and $p_2$. We will use the following
definitions.


Definition 65.3 (Density)

The density of a BINCSP is the ratio of the actual number
of constraints $|C|$ to the maximum number of
constraints $\binom{n}{2}$,

$$p_1 = \frac{|C|}{\binom{n}{2}} .$$


Definition 65.4 (Tightness)

The tightness of a constraint $c \in C$ over the variables
$v, w \in V$ of a BINCSP $(V, D, C)$ is the ratio of
the total number of forbidden variable assignments $|c|$
to the total number of combinations of variable
assignments possible, $m^2 = |D_v||D_w|$,

$$p_2(c) = \frac{|c|}{m^2} .$$


Definition 65.5 (Average Tightness)

The average tightness of a BINCSP $(V, D, C)$ is the
sum of the tightness over all constraints divided by the
number of constraints,

$$p_2 = \frac{\sum_{c \in C} p_2(c)}{|C|} .$$
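Definitions 65.3 to 65.5 translate directly into code. The sketch below applies them to the three-variable example of this section, grouping the forbidden value pairs by the pair of variables they constrain; the dictionary layout and the function names are assumptions of this illustration, and the example data are as read from the text.

```python
from math import comb
from typing import Dict, Set, Tuple

# A constraint over variables (v, w) is stored as its set of nogoods,
# i.e., forbidden value pairs (value of v, value of w).
Constraints = Dict[Tuple[int, int], Set[Tuple[str, str]]]


def density(n: int, constraints: Constraints) -> float:
    """p1: actual number of constraints over the maximum number C(n, 2)."""
    return len(constraints) / comb(n, 2)


def tightness(nogoods: Set[Tuple[str, str]], m: int) -> float:
    """p2(c): forbidden value pairs over all m^2 possible value pairs."""
    return len(nogoods) / m ** 2


def average_tightness(constraints: Constraints, m: int) -> float:
    """p2: mean tightness over all constraints."""
    total = sum(tightness(nogoods, m) for nogoods in constraints.values())
    return total / len(constraints)


# Example with n = 3 variables (indexed 1..3) and domain {a, b}, so m = 2.
example: Constraints = {
    (1, 2): {("a", "a"), ("b", "a"), ("b", "b")},
    (1, 3): {("a", "b"), ("b", "a"), ("b", "b")},
    (2, 3): {("a", "a"), ("a", "b")},
}

print(density(3, example))            # 1.0
print(tightness(example[(2, 3)], 2))  # 0.5
print(average_tightness(example, 2))
```

For this instance the constraint network is fully connected, and the average tightness is (3/4 + 3/4 + 2/4) / 3 = 2/3.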


These definitions give the density and tightness in 
terms of a ratio, or in other words, as the percent- 
ages of the maximum. Another way of looking at these 
two properties uses probabilities [65.66]. We could de- 
fine the density of a BINCSP as the probability that 
a constraint exists between two variables. The tight- 
ness can be alternatively defined in an analogous way, 
as the probability that a conflict exists between two
instantiations of two variables. The differences in these
viewpoints become apparent in the different
implementations of algorithms that generate BINCSPs, as
with uniform generation the ratio in an instance is deter- 
mined beforehand, while with probability the ratio will 
vary according to a normal distribution. When compar- 
ing studies it is important to know when probabilities 
are used whether the results reported are against the 
probability set or the actual measured ratio in the whole 
instance. 


Table 65.1 Different models for the general method for
generating binary constraint satisfaction problems


                              Nogoods
                     Probability    Uniform
Constraints
  Probability        Model A        Model C
  Uniform            Model D        Model B


The simplest way to empirically test the performance
of an algorithm on solving CSPs is by generating
instances using different settings for the four main
parameters, $n$, $m$, $p_1$, and $p_2$. However, there are two ways
of choosing where to put constraints in a constraint
network. We can choose the number of constraints we want
to have beforehand and then uniformly distribute them
in the constraint network. Alternatively, we can decide
for each possible edge in the constraint network, with
probability $p_1$, whether this edge is inserted, i.e.,
a constraint is added. We will call the first model the uniform
model and the second the probability model. The same
categorization holds for nogoods. Given a constraint we
can either distribute $p_2 m^2$ nogoods uniformly or decide
with probability $p_2$ for each value pair whether it becomes
a nogood. Now we can define four different models and
we will name them according to the models in [65.63,
65]. The models are shown in Table 65.1.
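The four combinations in Table 65.1 can be captured by one generator with two switches, sketched below. This is an illustration under stated assumptions rather than the generator used in the cited studies: the function name, the flag names, and the rounding of $p_1 \binom{n}{2}$ and $p_2 m^2$ to the nearest integer are choices made for this example.

```python
import random
from itertools import combinations, product
from typing import Dict, Set, Tuple


def generate_bincsp(
    n: int,
    m: int,
    p1: float,
    p2: float,
    uniform_constraints: bool,
    uniform_nogoods: bool,
    rng: random.Random,
) -> Dict[Tuple[int, int], Set[Tuple[int, int]]]:
    """Return a map from constrained variable pairs to their nogoods.

    (probability, probability) -> Model A, (uniform, uniform) -> Model B,
    (probability, uniform) -> Model C, (uniform, probability) -> Model D,
    where each pair reads (constraints, nogoods).
    """
    pairs = list(combinations(range(n), 2))
    if uniform_constraints:
        # Fix the number of constraints beforehand, then place them uniformly.
        chosen = rng.sample(pairs, round(p1 * len(pairs)))
    else:
        # Decide independently for each possible edge with probability p1.
        chosen = [pair for pair in pairs if rng.random() < p1]

    value_pairs = list(product(range(m), repeat=2))
    constraints = {}
    for pair in chosen:
        if uniform_nogoods:
            nogoods = rng.sample(value_pairs, round(p2 * m * m))
        else:
            nogoods = [vp for vp in value_pairs if rng.random() < p2]
        constraints[pair] = set(nogoods)
    return constraints


# Model B: both ratios are met exactly in every generated instance.
model_b = generate_bincsp(10, 5, 0.4, 0.2, True, True, random.Random(0))
print(len(model_b))                                # 18 constraints
print({len(ng) for ng in model_b.values()})        # {5}
```

With the probability switches, the measured density and tightness of an individual instance vary around $p_1$ and $p_2$, which is why studies should state whether they report the probability set or the measured ratio.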


Definition 65.6 (Parameter Vector of a BINCSP)

A parameter vector of a binary constraint satisfaction
problem (BINCSP) with $n$ variables and $m$ as each
variable's domain size is a 4-tuple $(n, m, p_1, p_2)$ of four
parameters: the number of variables $n$, the domain size
of each variable $m$, the density $p_1$, and the average
tightness $p_2$.


We can also characterize a set of binary constraint
satisfaction problems using the parameter vector as a set
$B$ of BINCSP instances where

$$\forall (n, m, p_1, p_2), (n', m', p_1', p_2') \in B : n = n' \land m = m' \land p_1 = p_1' \land p_2 = p_2' .$$

Such a set we call a suite of problem instances.

Achlioptas et al. prove in [65.64] that as the number
of variables becomes large almost all instances
created by Models A-D become unsolvable. The reason
lies in the existence of flawed variables. Whenever
a variable $v$ is involved in a constraint and has all its
values incompatible with a value of an adjacent variable $w$,
this variable is called flawed. In terms of compound
labels using the constraint $c$ over variables $v$ and $w$ this is
written as

$$\forall d_v \in D_v : \nexists d_w \in D_w : \text{satisfies}(\langle (v, d_v), (w, d_w) \rangle, c) \land c \in C .$$


When the number of variables is increased without 
changing the other parameters, the number of flawed 
variables will increase, thus making it easy to prove 
instances have no solution. To overcome the problems 
a new model is proposed [65.64]: 


Definition 65.7 (Model E)

The graph $C^{\Pi}$ is a random $n$-partite graph with $m$ nodes
in each part that is constructed by uniformly,
independently, and with repetitions selecting $p_e \binom{n}{2} m^2$ edges
out of the $\binom{n}{2} m^2$ possible ones.


The idea behind this model is that the difficulty is
controlled by the tightness and not influenced by the
structure of the constraint network. The parameter $p_e$
is responsible for the average tightness of the BINCSP.
However, it is not the same parameter as the average
tightness $p_2$. Because we allow repetitions in the
process we end up with an average tightness smaller than
or at most equal to $p_e$.

Parameter $p_e$ also influences the value of $p_1$.
In [65.65] we find the proof that using Model E with
fairly small values ($p_e < 0.05$) will result in a fully
connected constraint network ($p_1 = 1$). This is seen as
a flaw in Model E, as many problems do not require
a fully connected constraint network. This has led to
yet another model.

MacIntyre et al. propose a more generalized version
of Model E called Model F [65.65]. This model starts
out the same way as Model E by generating $p_1 p_2 m^2 \binom{n}{2}$
nogoods. Afterwards, a constraint network is
generated with exactly $p_1 \binom{n}{2}$ edges in the uniform way. All
nogoods that are not in a constraint in the constraint
network are removed from the problem instance. Model
E is the special case of Model F where $p_1 = 1$. The
benefit of Model F is the ability to generate problems
where $p_1 < 1$, which is closer to real-world problems.
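The three steps of Model F can be sketched as follows. This is a minimal reading of the description above, not the reference implementation: the function name is hypothetical, nogoods are drawn with repetitions as in Model E (so duplicates collapse into one), and the requested counts are rounded to the nearest integer.

```python
import random
from itertools import combinations
from typing import Set, Tuple

Nogood = Tuple[Tuple[int, int], Tuple[int, int]]  # ((v, value), (w, value))


def generate_model_f(
    n: int, m: int, p1: float, p2: float, rng: random.Random
) -> Tuple[Set[Tuple[int, int]], Set[Nogood]]:
    """Return (constraint network, nogoods) following the Model F steps."""
    pairs = list(combinations(range(n), 2))

    # Step 1: draw p1 * p2 * m^2 * C(n, 2) nogoods uniformly, independently,
    # and with repetitions; repeated draws collapse in the set.
    draws = round(p1 * p2 * m * m * len(pairs))
    nogoods: Set[Nogood] = set()
    for _ in range(draws):
        v, w = rng.choice(pairs)
        nogoods.add(((v, rng.randrange(m)), (w, rng.randrange(m))))

    # Step 2: generate a constraint network with exactly p1 * C(n, 2) edges.
    network = set(rng.sample(pairs, round(p1 * len(pairs))))

    # Step 3: remove all nogoods that are not on an edge of the network.
    kept = {ng for ng in nogoods if (ng[0][0], ng[1][0]) in network}
    return network, kept


network, kept = generate_model_f(10, 5, 0.4, 0.2, random.Random(1))
print(len(network))  # 18 edges out of 45 possible ones
```

Calling the same function with `p1 = 1.0` keeps every edge in the network, so no nogood is removed and the result corresponds to Model E with $p_e = p_2$.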

Craenen et al. [65.67] present the largest comparison
study of EC and CP approaches for the BINCSP.
In this study they compare the success rate and
average number of conflict checks to a solution of 11
evolutionary algorithms. The best four evolutionary
algorithms are compared with forward checking with
conflict-directed backjumping [65.68], and the authors
concluded that the latter has superior performance on
every problem instance in the benchmark.

The following heuristic approaches are included in 
the study. In [65.69, 70], Eiben et al. propose to incor- 
porate existing CSP heuristics into genetic operators. 
A study on the performance of these heuristic-based 
operators when solving binary CSPs was published 
in [65.71]. Two heuristic-based genetic operators are 
specified: an asexual operator that transforms one in- 
dividual into a new one and a multi-parent operator 
that generates one offspring using a number of parents.
In [65.72-74], Riff-Rojas introduced an EA for
solving CSPs that uses information about the con- 
straint network in the fitness function and in the genetic 
operators (crossover and mutation). The fitness func- 
tion is based on the notion of the error evaluation of 
a constraint. Marchiori et al. introduced and investi- 
gated EAs for solving CSPs based on pre-processing 
and post-processing techniques [65.75-77]. Included
in the comparison is the variant form [65.75, 78] that 
transforms constraints into a canonical form in such 
a way that there is only one single (type of) primi- 
tive constraint; we call this algorithm glass-box. This 
approach is used in constraint programming, where 
CSPs are given in implicit form by means of for- 
mulas of a given specification language. In [65.79, 
80], Handa et al. formulate a coevolutionary algorithm
where a population of schemata is parasitic on the host
population. Schemata in this algorithm are individuals 
where a portion of variables in the individual has val- 
ues while all other variables have do-not-care symbols 
represented by asterisks. 

The following approaches with an emphasis on adaptive
features are included in the comparison. The first is
a co-evolutionary approach invented by Paredis and
evaluated on different problems, such as neural net
learning [65.81], constraint satisfaction [65.81,82], and
searching for cellular automata that solve the
density classification task [65.83]. Furthermore, results
on the performance of the co-evolutionary approach
when facing the task of solving binary CSPs are
reported in [65.84,85]. In the co-evolutionary
approach for CSPs two populations evolve according to
a predator-prey model: a population of candidate
solutions and a population of constraints. In the approach
proposed by Dozier et al. in [65.86] and further
refined and applied in [65.87-89], information about
the constraints is incorporated both in the genetic
operators and in the fitness function. In the microgenetic
iterative descent algorithm the fitness function
is adaptive and employs Morris' breakout creating
mechanism [65.90] to escape from local optima. The
stepwise adaptation of weights mechanism was
introduced by Eiben and van der Hauw [65.91,92] as
an improved version of the weight adaptation
mechanism of Eiben et al. [65.93,94]. The approach has
been studied in several comparisons and often proved
to be a robust technique for solving several specific
CSPs [65.95-97]. A comprehensive study of different
parameters and genetic operators can be found
in [65.98]. The basic idea is that constraints that are
not satisfied or variables causing constraint violations
after a certain number of steps must be hard, thus
must be given a high weight (penalty) in the fitness
function.
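The basic idea of stepwise adaptation of weights can be sketched with two small functions. This is an illustration of the idea as stated above, not the cited implementation: the increment `delta`, the choice to update from the constraints violated by the current best candidate, and the function names are assumptions of this example.

```python
from typing import List


def weighted_penalty(violated: List[bool], weights: List[int]) -> int:
    """Fitness to minimize: sum of the weights of all violated constraints."""
    return sum(w for w, is_violated in zip(weights, violated) if is_violated)


def adapt_weights(
    weights: List[int], violated_by_best: List[bool], delta: int = 1
) -> List[int]:
    """Raise the weight of every constraint the best candidate still violates.

    Intended to be called after a fixed number of search steps, so that
    constraints that stay unsatisfied are treated as hard.
    """
    return [
        w + delta if is_violated else w
        for w, is_violated in zip(weights, violated_by_best)
    ]


weights = [1, 1, 1]
violated_by_best = [True, False, True]  # constraints 0 and 2 remain unsatisfied

weights = adapt_weights(weights, violated_by_best)
print(weights)                                      # [2, 1, 2]
print(weighted_penalty(violated_by_best, weights))  # 4
```

Because the weights change during the run, fitness values from different periods are not directly comparable; the unweighted number of violations is what determines whether a solution has been found.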


65.5.4 Examination Timetabling 


Examination timetabling has been studied for many 
years as it is a common problem in many organi- 
zations. Already in 1986, Carter gave an extended 
survey of work on automated timetabling [65.99]. He 
is also responsible for providing problem instances, 
which are still available and popular today [65.100], 
although a more diverse benchmark is used in the 
annual timetabling competition [65.101]. Burke et al. 
provide the most extensive recent surveys of automated 
timetabling in [65.102, 103]. Examination timetabling 
is just one of many problems under the topic of 
timetabling [65.104]. 

Timetabling as a problem has many different def- 
initions due to different kinds of constraints and ob- 
jectives. The definition that is most relevant for con- 
straint satisfaction is often referred to as examination 
timetabling. The most abstract definition simply
consists of a matrix $C$ where $C_{ij} = 1$ if exam $i$ conflicts
with exam $j$ by having common students that must take
both exams, and $C_{ij} = 0$ otherwise. This definition is
equivalent to a graph coloring problem if the objective is
to minimize the number of exam slots required, where
the number of slots equals the number of colors
required for coloring the graph with adjacency matrix $C$.
Hence, an appropriate approach to performance testing 
is via graph coloring instances based on examination 
timetabling, such as the problem instances labeled SCH 
(school) in the graph coloring instances suite provided 
by Lewandowski [65.105]. 
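The conflict matrix and its use as a graph coloring instance can be sketched as follows. The enrollment data, the function names, and the representation of a timetable as a list of slot numbers are hypothetical and only illustrate the definition above.

```python
from typing import Dict, List, Set


def conflict_matrix(enrollments: Dict[str, Set[int]], num_exams: int) -> List[List[int]]:
    """Build C with C[i][j] = 1 if some student must take both exam i and exam j."""
    matrix = [[0] * num_exams for _ in range(num_exams)]
    for exams in enrollments.values():
        for i in exams:
            for j in exams:
                if i != j:
                    matrix[i][j] = 1
    return matrix


def is_feasible(matrix: List[List[int]], slots: List[int]) -> bool:
    """A timetable is feasible if conflicting exams never share a slot.

    This is the proper-coloring condition with slots playing the role of colors.
    """
    n = len(matrix)
    return all(
        slots[i] != slots[j]
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j]
    )


# Hypothetical data: student name -> set of exams (indexed 0..3) to be taken.
enrollments = {"ann": {0, 1}, "bob": {1, 2}, "eve": {3}}
C = conflict_matrix(enrollments, 4)

print(C[0][1], C[1][2], C[0][2])     # 1 1 0
print(is_feasible(C, [0, 1, 0, 0]))  # True, two slots suffice
print(is_feasible(C, [0, 0, 1, 1]))  # False, exams 0 and 1 clash
```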


Many problem instances and problem instance gen- 
erators exist. Infrequently, an International Timetabling 
Competition is organized by The International Series 
of Conferences on the Practice and Theory of Auto- 
mated Timetabling. At each event, another definition 
of timetabling problems is tackled. The differences be- 
tween definitions are in the objectives and the soft 
and hard constraints used. Hard constraints are treated 
the same as in constraint satisfaction, whereas soft 
constraints may be violated but will either incur an ad- 
ditional penalty on the objective function or be used 
to prioritize solutions otherwise, for instance, using 
a Pareto front. Corne et al. [65.106] identified five
categories of constraints: unary, binary, capacity, event
spread, and agent preference.

Three approaches exist to solving timetabling prob- 
lems. The first approach is called one-stage optimiza- 
tion. It aggregates all types of constraints of one prob- 
lem, often by summation, into one objective function 
where each type is assigned a weight. The advantage 
is that, in principle, the approach can be applied to 
any set of constraints. In practice, it may prove
difficult to optimize such a function. Representations of
the problem fall into two main categories: direct
encoding (Sect. 65.3.1) [65.107] and indirect encoding
(Sect. 65.3.2) [65.106, 108].
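The aggregation used in one-stage optimization amounts to a weighted sum over constraint types. In the sketch below the category names follow the five categories identified by Corne et al.; the weights and violation counts are hypothetical values chosen only to show the computation.

```python
from typing import Dict


def one_stage_objective(violations: Dict[str, int], weights: Dict[str, float]) -> float:
    """Aggregate all constraint types into one value by weighted summation."""
    return sum(weights[kind] * count for kind, count in violations.items())


# Hypothetical weights: hard constraint types are penalized more heavily.
weights = {
    "unary": 10.0,
    "binary": 10.0,
    "capacity": 5.0,
    "event_spread": 1.0,
    "agent_preference": 0.5,
}
# Hypothetical number of violations per type for one candidate timetable.
violations = {
    "unary": 0,
    "binary": 2,
    "capacity": 1,
    "event_spread": 4,
    "agent_preference": 3,
}

print(one_stage_objective(violations, weights))  # 30.5
```

The single number hides which constraint type caused the penalty, which is one reason such a function can be difficult to optimize in practice.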

The second approach is called two-stage optimization.
It first solves the problem of finding a feasible
solution where all the hard constraints are satisfied. In
the second stage it searches within the space delimited by
these hard constraints and optimizes only against the
soft constraints. The benefit is that during search we
do not have to distinguish between feasible and
infeasible solutions and, therefore, are not in danger of
the search wandering off into an infeasible part of the
search space. Thompson and Dowsland [65.109] were
the first to report on this approach using simulated
annealing, closely followed by the first EA by Yu and
Sung [65.110].

The third approach uses relaxation of constraints.
Typically, relaxation in timetabling is achieved by
leaving events unassigned or by adding time slots.
An early EA example is by Burke et al. [65.111], who
use an indirect encoding and relax the problem by
adding time slots.


Evolutionary Computation and Constraint Satisfaction 


65.6 Creating Rather than Solving Problems 




So far we have covered evolutionary computation for 
solving CSP. A contrasting idea proposed first for con- 
straint satisfaction in [65.112] is to use evolutionary 
computation to generate problem instances. Such an 
approach allows a search for problem instances that 
adhere to certain properties as long as these can be mea- 
sured efficiently by a fitness function. 

A straightforward use for such an approach is to
evolve problem instances that are difficult to solve for
a particular algorithm. By measuring the efficiency with
which an algorithm solves instances of a certain problem,
we can change the instances with the aim of decreasing
that efficiency. Measurements for the efficiency of EC
for CSP are discussed in Sect. 65.4.1. Note that the
algorithm for which we evolve problem instances can be
of any kind, as long as we can execute it on the
generated instances and measure its efficiency.
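
The loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the evolutionary algorithm of the cited studies: the instance sizes, the (1+1)-style accept-if-not-worse scheme, the mutation rate, and the use of conflict checks as the effort measure are all hypothetical choices.

```python
import random
from itertools import combinations

N_VARS, DOMAIN = 4, 3  # hypothetical toy sizes
PAIRS = list(combinations(range(N_VARS), 2))
GENES = len(PAIRS) * DOMAIN * DOMAIN  # one bit per conflict matrix entry


def conflict(instance, i, a, j, b):
    """True if value a for variable i and value b for variable j (i < j) clash."""
    p = PAIRS.index((i, j))
    return instance[(p * DOMAIN + a) * DOMAIN + b]


def backtrack_effort(instance):
    """Run chronological backtracking; return the number of conflict checks."""
    checks = 0
    assignment = []

    def extend():
        nonlocal checks
        j = len(assignment)
        if j == N_VARS:
            return True
        for b in range(DOMAIN):
            ok = True
            for i, a in enumerate(assignment):
                checks += 1
                if conflict(instance, i, a, j, b):
                    ok = False
                    break
            if ok:
                assignment.append(b)
                if extend():
                    return True
                assignment.pop()
        return False

    extend()
    return checks


def evolve(generations=200, seed=0):
    """Mutate a single instance and keep the child if it is at least as hard."""
    rng = random.Random(seed)
    parent = [rng.random() < 0.3 for _ in range(GENES)]
    start = best = backtrack_effort(parent)
    for _ in range(generations):
        child = [(not g) if rng.random() < 1.0 / GENES else g for g in parent]
        effort = backtrack_effort(child)
        if effort >= best:  # fitness = solver effort, to be maximized
            parent, best = child, effort
    return start, best, parent
```

Because the solver is only called as a black box that reports its effort, `backtrack_effort` could be swapped for any other algorithm whose efficiency can be measured on a generated instance.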

Such hard problem instances identify the weak
spots in the algorithm that tries to solve them. Moreover,
if we can characterize a set of problem instances in which
all members are hard for an algorithm, then we can use
that characterization to decide which algorithm is
suitable for solving a new problem instance. This is
worthwhile only if obtaining the characteristics of an
instance takes less effort than solving the instance
itself [65.113].


65.6.1 Evolving Binary Constraint 
Satisfaction Problem Instances 


The first application to constrained problems was
for the binary constraint satisfaction problem
(Sect. 65.5.3), where problem instances are represented
as a binary vector with each element corresponding to
an element of the conflict matrix between two
variables [65.114]. Even the small instances investigated
in the study led to large vectors: with 15 variables,
each with a domain of size 15, the corresponding vector
has $\binom{15}{2} \cdot 15^2 = 23\,625$ elements.
Results with problem instances of this size show that
instances can be evolved that are far more difficult to
solve than those obtained by creating a much larger set
of randomly generated instances [65.112]. Furthermore,
analysis of these instances provides insight into what
structure is responsible for making instances difficult
for the algorithm. Two well-known algorithms from
constraint programming were tested: chronological
backtracking [65.115] and forward checking with
conflict-directed backjumping [65.116].
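
The vector length of 23 625 quoted above is consistent with one bit per pair of values for every unordered pair of variables. The short check below makes that arithmetic explicit; the function name is a hypothetical helper introduced only for this example.

```python
from math import comb


def conflict_vector_length(n_variables: int, domain_size: int) -> int:
    """One bit per value pair, for each unordered pair of variables."""
    return comb(n_variables, 2) * domain_size ** 2


length = conflict_vector_length(15, 15)  # 105 variable pairs times 225 value pairs
```

The quadratic growth in both the number of variables and the domain size explains why even small instances lead to long genotypes.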


65.6.2 Evolving Boolean Satisfiability 
Problem Instances 


In [65.114] an evolutionary algorithm is used to evolve 
solvable Boolean satisfiability problem instances that 
are in conjunctive normal form and have three variables 
per clause. A 3-SAT problem is represented by a list 
of natural numbers. A number in the list, i.e., a gene, 
corresponds to a unique clause with three different lit- 
erals. The number of possible unique clauses depends 
on the number of variables and the size of the clause. 
Here, the number of variables is set to 100 and the 
size of the clause is 3, hence there are 1 313 400 unique 
clauses. This representation has strong advantages over
a simple one-gene-for-every-literal approach. Most
importantly, it prevents duplicate variables in clauses,
which reduces the state space; duplicates could otherwise
introduce trivial clauses, e.g., (x ∨ ¬x ∨ y), or 2-SAT
clauses, e.g., (x ∨ x ∨ y). Also, the variation operators
now simply become mutation and uniform crossover for
lists of natural numbers over a fixed domain.
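
As a check on the size of the gene domain, the figure of 1 313 400 coincides with the number of ways to choose three distinct literals out of the 200 literals available over 100 variables. The sketch below verifies this and, for comparison, also counts clauses built from three distinct variables. Which counting convention the cited study actually used is an assumption here, and both function names are hypothetical.

```python
from math import comb


def clauses_from_literals(n_variables: int, clause_size: int) -> int:
    """Unordered choices of distinct literals; each variable yields x and not-x."""
    return comb(2 * n_variables, clause_size)


def clauses_from_variables(n_variables: int, clause_size: int) -> int:
    """Unordered choices of distinct variables, each either negated or not."""
    return comb(n_variables, clause_size) * 2 ** clause_size


by_literals = clauses_from_literals(100, 3)    # 200 choose 3
by_variables = clauses_from_variables(100, 3)  # (100 choose 3) times 8
```

Either way, every gene is simply an index into a fixed, enumerable set of clauses, which is what allows standard mutation and uniform crossover to be applied unchanged.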

Two problem solvers are used from the annual SAT
competition [65.117]; both are based on the
Davis-Putnam procedure [65.4]. zChaff [65.118] is based
on Chaff [65.119], a SAT solver that employs a
particularly efficient implementation of Boolean
constraint propagation and a novel low-overhead decision
strategy. Relsat [65.120] is explained in [65.121, 122].
In both solvers, the number of instantiation states
visited is counted to determine the search effort
required.

The change of certain structural properties over the 
duration of evolution was analyzed. Two established 
properties were used: the number of solutions [65.123, 
124] and the backbone size [65.125]. No clear relation- 
ship was identified with these properties. 

However, a new relationship was identified: when 
problem instances are becoming more difficult to solve, 
the variance in the frequency in variable usage de- 
creases. In other words, the distribution of variables 
throughout the instances is more uniform when prob- 
lems are more difficult to solve. 
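
The property described above can be measured with a few lines of code. The sketch below assumes a DIMACS-like convention in which a clause is a tuple of nonzero integers and the sign denotes negation; this convention, the function name, and the two tiny example instances are assumptions made for illustration.

```python
from collections import Counter
from statistics import pvariance


def variable_usage_variance(clauses, n_variables):
    """Population variance of how often each variable occurs in the instance."""
    counts = Counter(abs(lit) for clause in clauses for lit in clause)
    freqs = [counts.get(v, 0) for v in range(1, n_variables + 1)]
    return pvariance(freqs)


# Every variable occurs three times: perfectly uniform usage.
uniform = [(1, -2, 3), (2, -3, 4), (3, 4, -1), (4, 1, 2)]
# Variables 1 and 2 dominate, variable 4 is rare: skewed usage.
skewed = [(1, 2, 3), (1, -2, 4), (1, 2, -3), (1, -2, 3)]

var_uniform = variable_usage_variance(uniform, 4)
var_skewed = variable_usage_variance(skewed, 4)
```

Under the relationship reported above, instances evolved to be harder would be expected to move toward the low-variance end of this measure.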


65.6.3 Further Investigations 


The application of evolutionary computation in
problem generation is widespread. Smith-Miles and
Lopes [65.126] provide an extensive review of
measuring instance difficulty in combinatorial
optimization problems, which also discusses studies that
evolve problem instances for constrained optimization
as well as for constraint satisfaction problems.

The maximization of the effort required to solve
a problem instance highlights only one aspect of
problem difficulty. Another aspect, which concerns
effectiveness, is to maximize the distance between the
best solution a solver is able to reach and the optimal
solution. To compute this distance, we require the
fitness of the optimal solution a priori. Note, however,
that we do not need to know what the optimal solution is,
only its fitness. Another approach is to compare solvers
directly by maximizing the difference between two
solvers in some aspect, e.g., efficiency or effectiveness.
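
A minimal sketch of such an effectiveness-based instance fitness is given below. It uses coin change as a stand-in problem, with a greedy heuristic as the solver under test and a dynamic program supplying the optimal fitness; the problem, both solvers, and all names are hypothetical choices for illustration and do not come from the studies cited above.

```python
def greedy_coins(coins, target):
    """Heuristic solver: always take the largest coin that fits (assumes 1 in coins)."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += target // c
        target %= c
    return count


def optimal_coins(coins, target):
    """Exact solver: dynamic programming over all amounts up to target."""
    best = [0] + [float("inf")] * target
    for t in range(1, target + 1):
        for c in coins:
            if c <= t and best[t - c] + 1 < best[t]:
                best[t] = best[t - c] + 1
    return best[target]


def gap_fitness(coins, target, solver, optimal_fitness):
    """Instance fitness: distance between the solver's result and the optimum."""
    return solver(coins, target) - optimal_fitness


hard = gap_fitness((1, 3, 4), 6, greedy_coins, optimal_coins((1, 3, 4), 6))
easy = gap_fitness((1, 5, 10), 6, greedy_coins, optimal_coins((1, 5, 10), 6))
```

Only the fitness of the optimum enters `gap_fitness`, not the optimal solution itself. Replacing `optimal_fitness` by the result of a second solver turns the same function into a direct comparison of two solvers.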


65.7 Conclusions and Future Directions 


Research on solving constraint satisfaction problems
with evolutionary computation has produced a rich
set of research papers that contribute solvers, insights
into solvers and their performance, and heuristic
subroutines. One major flaw in this research has remained
consistent over the past 20 years: most studies compare
performance results only to other evolutionary or closely
associated techniques. Even recent studies, such as
[65.127-129], restrict themselves to comparing only
results from other heuristic methods or have not included
alternative techniques at all.

Many studies report on the promising performance
of a particular evolutionary algorithm over another
existing heuristic technique. The few systematic studies
that do compare evolutionary and constraint programming
techniques conclude that constraint programming
is superior in terms of efficiency [65.60, 67]. Also,
constraint programming techniques are generally sound
and, therefore, given sufficient time, always find
a solution or proof that none exists. Hence, these solvers
are more effective unless they are bounded by time.
Recent efforts have shown success in speeding up modern
DPLL-based techniques using heuristics for guiding the
search [65.130, 131].

In Sect. 65.5 we reviewed many techniques that
were developed and studied for the purpose of improving
EC in terms of efficiency and effectiveness. The vast
majority of these techniques were applied to one problem
only. A huge benefit would come from studies that
show the success of a technique across several CSPs.
Such studies would be especially opportune for the SAT
problem, which is still the most actively used CSP for
benchmarking algorithms [65.132].


References


65.1 K. Apt: Principles of Constraint Programming (Cambridge Univ. Press, Cambridge 2003)

65.2 B.G.W. Craenen, A.E. Eiben: Hybrid evolutionary algorithms for constraint satisfaction problems: Memetic overkill?, 2005 IEEE Congr. Evol. Comput., Vol. 3 (2005) pp. 1922-1928

65.3 R. Kibria, Y. Li: Optimizing the initialization of dynamic decision heuristics in DPLL SAT solvers using genetic programming, Lect. Notes Comput. Sci. 3905, 331-340 (2006)

65.4 M. Davis, H. Putnam: A computing procedure for quantification theory, Journal ACM 7, 201-215 (1960)

65.5 H.E. Dudeney: Cryptarithm, Strand Mag. 68, 97 and 214 (1924)

65.6 M.R. Garey, D.S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness (W.H. Freeman, San Francisco 1979)

65.7 R. Lewis: Metaheuristics can solve Sudoku puzzles, J. Heuristics 13, 387-401 (2007)

65.8 E. Tsang: Foundations of Constraint Satisfaction (Academic, London 1993)

65.9 R. Dechter: Constraint Processing (Morgan Kaufmann, San Francisco 2003) pp. 1-481

65.10 C. Lecoutre: Constraint Networks: Techniques and Algorithms (Wiley, Hoboken 2009)

65.11 H. Chen: A rendezvous of logic, complexity, and algebra, ACM Comput. Surv. 42(1), 2 (2009)

65.12 B. Bernhardsson: Explicit solutions to the n-queens problem for all n, SIGART Bull. 2, 7 (1991)

65.13 T. Bäck, D. Fogel, Z. Michalewicz (Eds.): Handbook of Evolutionary Computation (Oxford Univ. Press, New York 1997)

65.14 F. Rossi, P. Van Beek, T. Walsh: Handbook of Constraint Programming (Elsevier, Amsterdam 2006)

65.15 D. Brélaz: New methods to color the vertices of a graph, Communications ACM 22, 251-256 (1979)


65.16 D.B. Fogel: Evolutionary Computation: Towards a New Philosophy of Machine Intelligence, 2nd edn. (Wiley, Hoboken 1999)

65.17 A.E. Eiben, Z. Ruttkay: Self-adaptivity for constraint satisfaction: Learning penalty functions, Int. Conf. Evol. Comput. (1996) pp. 258-261

65.18 R. Hinterding, Z. Michalewicz, A.E. Eiben: Adaptation in evolutionary computation: A survey, Proc. 4th IEEE Conf. Evol. Comput. (1997) pp. 65-69

65.19 T. Bäck: Introduction to the special issue: Self-adaptation, Evol. Comput. 9(2), 3-4 (2001)

65.20 T. Runarsson, X. Yao: Constrained evolutionary optimization - The penalty function approach. In: Evolutionary Optimization, ed. by R. Sarker, M. Mohammadian, X. Yao (Kluwer, Boston 2002) pp. 87-113, Chap. 4

65.21 J.T. Richardson, M.R. Palmer, G. Liepins, M. Hilliard: Some guidelines for genetic algorithms with penalty functions, Proc. 3rd Int. Conf. Genet. Algorithms (1989) pp. 191-197

65.22 M.L. Braun, J.M. Buhmann: The noisy Euclidean traveling salesman problem and learning, Proc. 2001 Neural Inf. Process. Syst. Conf. (2002)

65.23 D. Whitley, J.P. Watson, A. Howe, L. Barbulescu: Testing, evaluation and performance of optimization and learning systems. In: Adaptive Computing in Design and Manufacture, ed. by I.C. Parmee (Springer, Berlin, Heidelberg 2002) pp. 27-39

65.24 A. Biere, M. Heule, H. van Maaren, T. Walsh: Handbook of Satisfiability (IOS, Amsterdam 2009)

65.25 V. Marek: Introduction to Mathematics of Satisfiability (Chapman Hall, Boca Raton 2009)

65.26 S.A. Cook: The complexity of theorem-proving procedures, Proc. 3rd Annu. ACM Symp. Theory Comput. (1971) pp. 151-158

65.27 M. Utting, B. Legeard: Practical Model-Based Testing: A Tools Approach (Morgan Kaufmann, San Francisco 2007)

65.28 A. Bundy: A science of reasoning: extended abstract, Proc. 10th Int. Conf. Autom. Deduc. (1990) pp. 633-640

65.29 D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, D. Wilkins: PDDL - The planning domain definition language, Tech. Rep. TR-98-003, Yale Center for Computational Vision and Control (1998)

65.30 R. Drechsler, S. Eggersglüß, G. Fey, D. Tille: Test Pattern Generation using Boolean Proof Engines (Springer, Berlin, Heidelberg 2009) pp. 1-192

65.31 D. He, A. Choi, K. Pipatsrisawat, A. Darwiche, E. Eskin: Optimal algorithms for haplotype assembly from whole-genome sequence data, Bioinformatics 26(12), i183-i190 (2010)

65.32 I.V. Tetko, D.J. Livingstone, A.I. Luik: Neural network studies. 1. Comparison of overfitting and overtraining, J. Chem. Inf. Comput. Sci. 35, 826-833 (1995)


65.33 J.-K. Hao, R. Dorne: An empirical comparison of two evolutionary methods for satisfiability problems, Int. Conf. Evol. Comput. (1994) pp. 451-455

65.34 J. Gottlieb, N. Voss: Fitness functions and genetic operators for the satisfiability problem, Lect. Notes Comput. Sci. 1363, 55-68 (1997)

65.35 J. Gottlieb, N. Voss: Improving the performance of evolutionary algorithms for the satisfiability problem by refining functions, Lect. Notes Comput. Sci. 1498, 755-764 (1998)

65.36 G. Folino, C. Pizzuti, G. Spezzano: Solving the satisfiability problem by a parallel cellular genetic algorithm, Proc. 24th Euromicro Conf. (1998) pp. 715-722

65.37 N. Nemer-Preece, R.W. Wilkerson: Parallel genetic algorithm to solve the satisfiability problem, Proc. 1998 ACM Symp. Appl. Comput. (1998) pp. 23-28

65.38 C. Rossi, E. Marchiori, J.N. Kok: An adaptive evolutionary algorithm for the satisfiability problem, Proc. 2000 ACM Symp. Appl. Comput. (2000) pp. 463-469

65.39 J.-K. Hao, F. Lardeux, F. Saubion: Evolutionary computing for the satisfiability problem, Lect. Notes Comput. Sci. 2611, 258-267 (2003)

65.40 M.E. Bachir Menai: An evolutionary local search method for incremental satisfiability, Lect. Notes Comput. Sci. 3249, 143-156 (2004)

65.41 L. Aksoy, E.O. Günes: An evolutionary local search algorithm for the satisfiability problem, Lect. Notes Comput. Sci. 3949, 185-193 (2005)

65.42 M.E. Bachir Menai, M. Batouche: Solving the maximum satisfiability problem using an evolutionary local search algorithm, Int. Arab J. Inf. Technol. 2(2), 154-161 (2005)

65.43 P. Guo, W. Luo, Z. Li, H. Liang, X. Wang: Hybridizing evolutionary negative selection algorithm and local search for large-scale satisfiability problems, Lect. Notes Comput. Sci. 5821, 248-257 (2009)

65.44 Y. Kilani: Comparing the performance of the genetic and local search algorithms for solving the satisfiability problems, Appl. Soft Comput. 10(1), 198-207 (2010)

65.45 R.H. Kibria: Soft Computing Approaches to DPLL SAT Solver Optimization, Ph.D. Thesis (TU Darmstadt, Darmstadt 2011)

65.46 M. Davis, G. Logemann, D. Loveland: A machine program for theorem-proving, Communications ACM 5(7), 394-397 (1962)

65.47 R. Lewis, J. Thompson: On the application of graph colouring techniques in round-robin sports scheduling, Comput. Oper. Res. 38, 190-204 (2011)

65.48 S.S. Muchnick: Advanced Compiler Design and Implementation (Morgan Kaufmann, San Francisco 1997)

65.49 W.K. Hale: Frequency assignment: Theory and applications, Proc. IEEE 68(12), 1497-1514 (1980)


65.50 J. Culberson: Graph Coloring Page (2010), available online at http://webdocs.cs.ualberta.ca/~joe/Coloring/

65.51 J.C. Culberson, F. Luo: Exploring the k-colorable landscape with iterated greedy. In: Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, ed. by D.S. Johnson, M.A. Trick (American Mathematical Society, Providence 1996) pp. 245-284

65.52 M. Brockington, J.C. Culberson: Camouflaging independent sets in quasi-random graphs. In: Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, ed. by D.S. Johnson, M.A. Trick (American Mathematical Society, Providence 1996) pp. 75-88

65.53 G.J. Chaitin, M.A. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, P.W. Markstein: Register allocation via coloring, Comput. Lang. 6(1), 47-57 (1981)

65.54 D.J.A. Welsh, M.B. Powell: An upper bound for the chromatic number of a graph and its application to timetabling problems, Comput. J. 10(1), 85-86 (1967)

65.55 C. Fleurent, J. Ferland: Genetic and hybrid algorithms for graph coloring, Ann. Oper. Res. 63(3), 437-461 (1996)

65.56 C. Fleurent, J.A. Ferland: Object-oriented implementation of heuristic search methods for graph coloring, maximum clique, and satisfiability. In: Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, ed. by D.S. Johnson, M.A. Trick (American Mathematical Society, Providence 1996) pp. 619-652

65.57 G. von Laszewski: Intelligent structural operators for the k-way graph partitioning problem, Proc. 4th Int. Conf. Genet. Algorithms (1991) pp. 45-52

65.58 L. Davis: Order-based genetic algorithms and the graph coloring problem. In: Handbook of Genetic Algorithms, ed. by L. Davis (Van Nostrand Reinhold, New York 1991) pp. 72-90

65.59 P.E. Coll, G.A. Duran, P. Moscato: A discussion on some design principles for efficient crossover operators for graph coloring problems, An. XXVII Simp. Brasil. Pesqui. Oper. (1995)

65.60 I. Juhos, J.I. van Hemert: Contraction-based heuristics to improve the efficiency of algorithms solving the graph colouring problem. In: Recent Advances in Evolutionary Computation for Combinatorial Optimization, ed. by C. Cotta, J.I. van Hemert (Springer, Berlin, Heidelberg 2008) pp. 167-184


65.61 I. Juhos, J.I. van Hemert: Graph colouring heuristics guided by higher order graph properties, Lect. Notes Comput. Sci. 4972, 97-109 (2008)

65.62 I. Juhos, J.I. van Hemert: Increasing the efficiency of graph colouring algorithms with a representation based on vector operations, J. Softw. 1(2), 24-33 (2006)

65.63 E.M. Palmer: Graphical Evolution (Wiley, New York 1985)

65.64 D. Achlioptas, L.M. Kirousis, E. Kranakis, D. Krizanc, M.S.O. Molloy, Y.C. Stamatiou: Random constraint satisfaction: A more accurate picture, Lect. Notes Comput. Sci. 1330, 107-120 (1997)

65.65 E. MacIntyre, P. Prosser, B.M. Smith, T. Walsh: Random constraint satisfaction: Theory meets practice. In: Principles and Practice of Constraint Programming - CP98, ed. by M. Maher, J.-F. Puget (Springer, Berlin, Heidelberg 1998) pp. 325-339

65.66 E. Freuder, R.J. Wallace: Partial constraint satisfaction, Artif. Intell. 65, 363-376 (1992)

65.67 B.G.W. Craenen, A.E. Eiben, J.I. van Hemert: Comparing evolutionary algorithms on binary constraint satisfaction problems, IEEE Trans. Evol. Comput. 7(5), 424-444 (2003)

65.68 R. Haralick, G. Elliott: Increasing tree search efficiency for constraint-satisfaction problems, Artif. Intell. 14(3), 263-313 (1980)

65.69 A.E. Eiben, P.-E. Raué, Z. Ruttkay: Heuristic Genetic Algorithms for Constrained Problems, Part I: Principles, Tech. Rep. IR-337 (Vrije Universiteit Amsterdam 1993)

65.70 A.E. Eiben, P.-E. Raué, Z. Ruttkay: Solving constraint satisfaction problems using genetic algorithms, Proc. 1st IEEE Conf. Evol. Comput. (1994) pp. 542-547

65.71 B.G.W. Craenen, A.E. Eiben, E. Marchiori: Solving constraint satisfaction problems with heuristic-based evolutionary algorithms, Congr. Evol. Comput. (2000)

65.72 M.C. Riff-Rojas: Using the knowledge of the constraint network to design an evolutionary algorithm that solves CSP, Proc. 3rd IEEE Conf. Evol. Comput. (1996) pp. 279-284

65.73 M.C. Riff-Rojas: Evolutionary search guided by the constraint network to solve CSP, Proc. 4th IEEE Conf. Evol. Comput. (1997) pp. 337-348

65.74 M.-C. Riff-Rojas: A network-based adaptive evolutionary algorithm for constraint satisfaction problems. In: Meta-heuristics: Advances and Trends in Local Search Paradigms for Optimization, ed. by S. Voss (Kluwer, Boston 1998) pp. 325-339

65.75 E. Marchiori: Combining constraint processing and genetic algorithms for constraint satisfaction problems, Proc. 7th Int. Conf. Genet. Algorithms (1997) pp. 330-337


65.76 E. Marchiori, A. Steenbeek: A genetic local search algorithm for random binary constraint satisfaction problems, Proc. ACM Symp. Appl. Comput. (2000) pp. 458-462

65.77 B.G.W. Craenen, A.E. Eiben, E. Marchiori, A. Steenbeek: Combining local search and fitness function adaptation in a GA for solving binary constraint satisfaction problems, Proc. Genet. Evol. Comput. Conf. (2000)

65.78 P. van Hentenryck, V. Saraswat, Y. Deville: Constraint processing in cc(FD). In: Constraint Programming: Basics and Trends, ed. by A. Podelski (Springer, Berlin, Heidelberg 1995)

65.79 H. Handa, C.O. Katai, N. Baba, T. Sawaragi: Solving constraint satisfaction problems by using coevolutionary genetic algorithms, Proc. 5th IEEE Conf. Evol. Comput. (1998) pp. 21-26

65.80 H. Handa, N. Baba, O. Katai, T. Sawaragi, T. Horiuchi: Genetic algorithm involving coevolution mechanism to search for effective genetic information, Proc. 4th IEEE Conf. Evol. Comput. (1997)

65.81 J. Paredis: Co-evolutionary computation, Artif. Life 2(4), 355-375 (1995)

65.82 J. Paredis: Coevolutionary constraint satisfaction, Lect. Notes Comput. Sci. 866, 46-55 (1994)

65.83 J. Paredis: Coevolving cellular automata: Be aware of the red queen, Proc. 7th Int. Conf. Genet. Algorithms (1997)

65.84 A.E. Eiben, J.I. van Hemert, E. Marchiori, A.G. Steenbeek: Solving binary constraint satisfaction problems using evolutionary algorithms with an adaptive fitness function, Lect. Notes Comput. Sci. 1498, 196-205 (1998)

65.85 J.I. van Hemert: Applying Adaptive Evolutionary Algorithms to Hard Problems, M.Sc. Thesis (Leiden University, Leiden 1998)

65.86 G. Dozier, J. Bowen, D. Bahler: Solving small and large constraint satisfaction problems using a heuristic-based microgenetic algorithm, Proc. 1st IEEE Conf. Evol. Comput. (1994) pp. 306-311

65.87 J. Bowen, G. Dozier: Solving constraint satisfaction problems using a genetic/systematic search hybrid that realizes when to quit, Proc. 6th Int. Conf. Genet. Algorithms (Morgan Kaufmann, Burlington 1995) pp. 122-129

65.88 G. Dozier, J. Bowen, D. Bahler: Solving randomly generated constraint satisfaction problems using a micro-evolutionary hybrid that evolves a population of hill-climbers, Proc. 2nd IEEE Conf. Evol. Comput. (1995) pp. 614-619

65.89 P.J. Stuckey, V. Tam: Improving evolutionary algorithms for efficient constraint satisfaction, Int. J. Artif. Intell. Tools 8(4), 363-384 (1999)

65.90 P. Morris: The breakout method for escaping from local minima, Proc. 11th Natl. Conf. Artif. Intell. (1993) pp. 40-45

65.91 A.E. Eiben, J.K. van der Hauw: Adaptive penalties for evolutionary graph-coloring, Lect. Notes Comput. Sci. 1363, 95-106 (1998)


65.92 J.K. van der Hauw: Evaluating and Improving Steady State Evolutionary Algorithms on Constraint Satisfaction Problems, M.Sc. Thesis (Leiden University, Leiden 1996)

65.93 A.E. Eiben, P.-E. Raué, Z. Ruttkay: Constrained problems. In: Practical Handbook of Genetic Algorithms, ed. by L. Chambers (Taylor Francis, Boca Raton 1995) pp. 307-365

65.94 A.E. Eiben, Z. Ruttkay: Self-adaptivity for constraint satisfaction: Learning penalty functions, Proc. 3rd IEEE Conf. Evol. Comput. (1996) pp. 258-261

65.95 T. Bäck, A.E. Eiben, M.E. Vink: A superior evolutionary algorithm for 3-SAT, Lect. Notes Comput. Sci. 1477, 125-136 (1998)

65.96 A.E. Eiben, J.K. van der Hauw, J.I. van Hemert: Graph coloring with adaptive evolutionary algorithms, J. Heuristics 4(1), 25-46 (1998)

65.97 A.E. Eiben, J.I. van Hemert: SAW-ing EAs: Adapting the fitness function for solving constrained problems. In: New Ideas in Optimization, ed. by D. Corne, M. Dorigo, F. Glover (McGraw Hill, New York 1999) pp. 389-402

65.98 B.G.W. Craenen, A.E. Eiben: Stepwise adaption of weights with refinement and decay on constraint satisfaction problems, Proc. Genet. Evol. Comput. Conf. (2001) pp. 291-298

65.99 M.W. Carter: A survey of practical applications of examination timetabling algorithms, Oper. Res. 34, 193-202 (1986)

65.100 M.W. Carter, G. Laporte, S.Y. Lee: Examination timetabling: Algorithmic strategies and application, J. Oper. Res. Soc. 47(3), 373-383 (1996)

65.101 International Timetabling Competition 2011: available online at http://www.utwente.nl/ctit/itc2011/

65.102 E.K. Burke, S. Petrovic: Recent research directions in automated timetabling, Eur. J. Oper. Res. 140(2), 266-280 (2002)

65.103 R. Qu, E.K. Burke, B. McCollum, L.T. Merlot, S.Y. Lee: A survey of search methodologies and automated system development for examination timetabling, J. Sched. 12, 55-89 (2009)

65.104 E.K. Burke, D. Corne, B. Paechter, P. Ross (Eds.): Proc. 1st Int. Conf. Pract. Theory Autom. Timetabling (Napier University, Edinburgh 1995)

65.105 G. Lewandowski: Course scheduling: Metrics, Models, and Methods (Xavier University, Cincinnati 1996)

65.106 D. Corne, P. Ross, H.-L. Fang: Evolving timetables. In: Practical Handbook of Genetic Algorithms: Applications, Vol. I, ed. by L. Chambers (Taylor Francis, Boca Raton 1995) pp. 219-276

65.107 A. Colorni, M. Dorigo, V. Maniezzo: Metaheuristics for high school timetabling, Comput. Optim. Appl. 9(3), 275-298 (1998)

65.108 M.P. Carrasco, M.V. Pato: A multiobjective genetic algorithm for the class/teacher timetabling problem, Proc. 3rd Int. Conf. Pract. Theory Autom. Timetabling (2001) pp. 3-17

65.109 J.M. Thompson, K.A. Dowsland: A robust simulated annealing based examination timetabling system, Comput. Oper. Res. 25, 637-648 (1998)

65.110 E. Yu, K.-S. Sung: A genetic algorithm for a university weekly courses timetabling problem, Int. Trans. Oper. Res. 9(6), 703-717 (2002)

65.111 E.K. Burke, D. Elliman, R.F. Weare: A hybrid genetic algorithm for highly constrained timetabling problems, Proc. 6th Int. Conf. Genet. Algorithms (1995) pp. 605-610

65.112 J.I. van Hemert: Evolving binary constraint satisfaction problem instances that are difficult to solve, Proc. IEEE 2003 Congr. Evol. Comput. (New York) (2003) pp. 1267-1273

65.113 K. Smith-Miles, J.I. van Hemert: Discovering the suitability of optimisation algorithms by learning from evolved instances, Ann. Math. Artif. Intell. 61(2), 87-104 (2011)

65.114 J.I. van Hemert: Evolving combinatorial problem instances that are difficult to solve, Evol. Comput. 14(4), 433-462 (2006)

65.115 S.W. Golomb, L.D. Baumert: Backtrack programming, Journal ACM 12(4), 516-524 (1965)

65.116 P. Prosser: Hybrid algorithms for the constraint satisfaction problem, Comput. Intell. 9(3), 268-299 (1993)

65.117 D. Le Berre, L. Simon: Sat Competitions, http://www.satcompetition.org (2005)

65.118 Z. Fu: zChaff (Princeton University) Version 2004.11.15, http://www.princeton.edu/~chaff/zchaff.html (2004)

65.119 M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, S. Malik: Chaff: Engineering an efficient SAT solver, Proc. 38th Design Autom. Conf. (2001) pp. 530-535

65.120 R. Bayardo: Relsat. Version 2.00 (IBM, San Jose 2005), available online at http://www.almaden.ibm.com/cs/people/bayardo/resources.html

65.121 R. Bayardo, R.C. Schrag: Using CSP look-back techniques to solve real world SAT instances, Proc. 14th Natl. Conf. Artif. Intell. (1997) pp. 203-208


65.122 R. Bayardo, J. Pehoushek: Counting models using connected components, Proc. 17th Natl. Conf. Artif. Intell. (2000)

65.123 D. Achlioptas, C.P. Gomes, H.A. Kautz, B. Selman: Generating satisfiable problem instances, Proc. 17th Natl. Conf. Artif. Intell. 12th Conf. Innov. Appl. Artif. Intell. (2000) pp. 256-261

65.124 D. Achlioptas, H. Jia, C. Moore: Hiding satisfying assignments: Two are better than one, J. Artif. Intell. Res. 24, 623-639 (2005)

65.125 S. Boettcher, G. Istrate, A.G. Percus: Spines of random constraint satisfaction problems: Definition and impact on computational complexity, 8th Int. Symp. Artif. Intell. Math. (2005), extended version

65.126 K. Smith-Miles, L. Lopes: Review: Measuring instance difficulty for combinatorial optimization problems, Comput. Oper. Res. 39, 875-889 (2012)

65.127 R. Abbasian, M. Mouhoub: An efficient hierarchical parallel genetic algorithm for graph coloring problem, Proc. 13th Annu. Conf. Genet. Evol. Comput. (2011) pp. 521-528

65.128 D.C. Porumbel, J.-K. Hao, P. Kuntz: An evolutionary approach with diversity guarantee and well-informed grouping recombination for graph coloring, Comput. Oper. Res. 37(10), 1822-1832 (2010)

65.129 M. Mouhoub, B. Jafari: Heuristic techniques for variable and value ordering in CSPs, Proc. 13th Annu. Conf. Genet. Evol. Comput. (2011) pp. 457-464

65.130 J. Chen: Building a hybrid sat solver via conflict-driven, look-ahead and XOR reasoning techniques, Lect. Notes Comput. Sci. 5584, 298-311 (2009)

65.131 A. Balint, M. Henn, O. Gableske: A novel approach to combine a SLS- and a DPLL-solver for the satisfiability problem, Lect. Notes Comput. Sci. 5584, 284-297 (2009)

65.132 O. Kullmann (Ed.): Theory and Applications of Satisfiability Testing - SAT 2009, Lecture Notes in Computer Science, Vol. 5584 (Springer, Berlin, Heidelberg 2009)


Part F Swarm Intelligence

Ed. by Christian Blum, Roderich Groß


66 Swarm Intelligence in Optimization and Robotics
Christian Blum, San Sebastian, Spain
Roderich Groß, Sheffield, UK

67 Preference-Based Multiobjective Particle Swarm Optimization for Airfoil Design
Robert Carrese, Clayton North, Australia
Xiaodong Li, Melbourne, Australia

68 Ant Colony Optimization for the Minimum-Weight Rooted Arborescence Problem
Christian Blum, San Sebastian, Spain
Sergi Mateo Bellido, Barcelona, Spain

69 An Intelligent Swarm of Markovian Agents
Dario Bruneo, Messina, Italy
Marco Scarpa, Messina, Italy
Andrea Bobbio, Alessandria, Italy
Davide Cerotti, Milano, Italy
Marco Gribaudo, Milano, Italy

70 Honey Bee Social Foraging Algorithm for Resource Allocation
Jairo Alonso Giraldo, Bogota, Colombia
Nicanor Quijano, Bogota, Colombia
Kevin M. Passino, Columbus, USA

71 Fundamental Collective Behaviors in Swarm Robotics
Vito Trianni, Roma, Italy
Alexandre Campo, Brussels, Belgium

72 Collective Manipulation and Construction
Lynne Parker, Knoxville, USA

73 Reconfigurable Robots
Kasper Støy, Copenhagen S, Denmark

74 Probabilistic Modeling of Swarming Systems
Nikolaus Correll, Boulder, USA
Heiko Hamann, Paderborn, Germany


66. Swarm Intelligence in Optimization and Robotics 


Christian Blum, Roderich Groß 


Swarm intelligence is an artificial intelligence discipline, which was created on the basis of the laws that govern the behavior of, for example, social insects, fish schools, and flocks of birds. The organization of these animal societies has always mesmerized humans. Therefore, it is surprising that it has only been in the second half of the last century that some of the most important principles of swarm intelligent behavior have been unraveled. A prime example is stigmergy, which refers to a self-organization of the animal society via changes applied to the environment.

In this chapter, we provide a concise introduction to swarm intelligence, with two main research lines in mind: optimization and robotics. Popular examples of optimization algorithms based on swarm intelligence principles are ant colony optimization and particle swarm optimization. On the other side, the field of robotics has adopted various swarm intelligent behaviors for problem solving and organizing groups of robots. This has resulted in a separate research field nowadays known as swarm robotics.


66.1 Overview
66.2 SI in Optimization
66.2.1 Ant Colony Optimization
66.2.2 Particle Swarm Optimization
66.2.3 Artificial Bee Colony Algorithm
66.2.4 Other SI Techniques for Optimization and Management Tasks
66.3 SI in Robotics: Swarm Robotics
66.3.1 Systems
66.3.2 Tasks
66.4 Research Challenges
References


66.1 Overview


Swarm intelligence (SI) [66.1-3] is a subfield of the more general field of artificial intelligence [66.4]. The term swarm intelligence was introduced and used for the first time by Beni et al. [66.5-7] in the context of cellular robotic systems. Nowadays, SI research is generally concerned with the design of intelligent multiagent systems whose inspiration is taken from the collective behavior of social (or even eusocial) insects and other animal populations. Examples include ant colonies, bee hives, wasp colonies, frog populations, flocks of birds, and fish schools. Among these, social insects have always played a prominent role in the inspiration of SI techniques. Even though their intrinsic ways of functioning have fascinated researchers for many years, the mechanisms that govern their behavior remained unknown for a long time. In colonies of social insects, for example, single colony members are unsophisticated individuals, yet they are able to achieve complex tasks in cooperation. Essential colony behaviors emerge from relatively simple interactions between the colony's individual members.

An important aspect of any SI system is self-organization [66.8]. Originally, the term self-organization was introduced by the German philosopher Immanuel Kant [66.9] in an attempt to characterize what makes organisms so different from other objects. Nowadays, the term self-organization refers to a process where some form of global order or coordination emerges from rather simple interactions between low-level components of an initially unordered system. Self-organizing processes are neither directed nor controlled by any agent or component, neither from inside nor from outside the system. They are often triggered by random fluctuations that are amplified by positive feedback and possibly counterbalanced by negative feedback, which generally aids in stabilizing the system. The global properties exhibited by self-organizing systems are thus the result of this distributed interplay of their components. As such, self-organization is typically robust and able to survive and self-repair damage or perturbations. Historically, self-organization processes have been studied in physical, chemical, biological, social, and cognitive systems. Well-known examples are crystallization, molecular self-assembly, and the way in which neural networks learn to recognize complex patterns.

During the last 50 years or so, biologists discovered that many aspects of the collective activities of social insects are self-organized as well, that is, they function without a central control. For example, the African weaver ant constructs nests by pulling leaves together. Where the gap between leaves exceeds the body length of an individual ant, multiple ants organize into pulling chains. Once the leaves are in contact, they are glued together using silk from larvae, which are carried to the site by other workers of the colony [66.10]. Other examples concern the recruitment of fellow colony members for prey retrieval (Fig. 66.1), the capabilities of termites and wasps to build sophisticated nests, or the ability of bees and ants to orient themselves in their environment. For more examples, we refer the interested reader to [66.1, 2].

Fig. 66.1 Ants cooperate for retrieving a heavy prey (photo courtesy of M. J. Blesa)

In the meantime, some of the above-mentioned behaviors have been used as inspiration for the resolution of technical problems, especially in the context of optimization and in robotics. This chapter is dedicated to reviewing some of the (in the opinion of the authors) most interesting algorithms/systems from these two fields.


66.2 SI in Optimization


The use of SI techniques for solving optimization problems already has a rather extensive history. SI techniques have been used for solving both combinatorial and continuous optimization problems in static and in distributed settings. Two of the most well-known SI techniques for solving optimization problems are ant colony optimization (ACO) and particle swarm optimization (PSO). More recently, other techniques such as the artificial bee colony algorithm have been developed. Apart from solving optimization problems, SI techniques are being used for management tasks, for example, in distributed settings or in online optimization. The following sections will give a brief overview of this application field of SI.


66.2.1 Ant Colony Optimization


ACO [66.11] is one of the earliest SI techniques for optimization. Dorigo and colleagues developed the first ACO algorithms in the early 1990s [66.12-14]. The development of these algorithms was inspired by the observation of ant colonies. Ants are social insects. They live in colonies and their behavior is governed by the goal of colony survival rather than being focused on the survival of individuals. The behavior that provided the inspiration for ACO is the ants' foraging behavior, and in particular, how ants of many species can find shortest paths between food sources and their nest. In order to search for food, ants initially explore the area around their nest by means of random walks. While moving, ants leave tiny drops of a pheromone substance on the ground. Ants are also able to scent these pheromones. When choosing their way, they are attracted by paths marked by strong pheromone concentrations. Once they have identified a food source, ants evaluate the quantity and the quality of the food and carry some of it back to their nest. During the return trip, the quantity of pheromone that ants leave on the ground may depend on the quantity and quality of the food. The pheromone trails will guide other ants to the food source. It has been shown in [66.15] that the indirect communication between the ants via pheromone trails, known as stigmergy [66.16], enables them to find the shortest paths between their nest and food sources. Initially, ACO algorithms were developed with the aim of solving discrete optimization problems. It should be mentioned, however, that nowadays the class of ACO algorithms also comprises methods for the application to problems arising in networks, such as routing and load balancing [66.17], and for the application to continuous optimization problems [66.18].
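
The positive feedback behind this shortest-path behavior can be illustrated with a deliberately simplified two-path model. The following sketch is not the experiment reported in [66.15]; the function name, the deposit rule (inversely proportional to path length), and the evaporation rate are assumptions made for illustration only. It tracks expected pheromone amounts rather than individual ants, so it is deterministic.

```python
def two_path_pheromone(short_len=1.0, long_len=2.0, evaporation=0.1, steps=200):
    """Expected-value model of ants choosing between two paths.

    Assumptions (not from the chapter): a unit flow of ants splits in
    proportion to pheromone, each path receives a deposit of flow/length,
    and a fixed fraction of the pheromone evaporates in every step.
    """
    tau_short, tau_long = 1.0, 1.0  # equal initial pheromone: no initial bias
    for _ in range(steps):
        total = tau_short + tau_long
        p_short = tau_short / total  # share of ants taking the short path
        p_long = tau_long / total    # share of ants taking the long path
        # Evaporation (negative feedback) followed by deposit (positive feedback).
        tau_short = (1.0 - evaporation) * tau_short + p_short / short_len
        tau_long = (1.0 - evaporation) * tau_long + p_long / long_len
    return tau_short / (tau_short + tau_long)


if __name__ == "__main__":
    print(f"share of ants on the short path: {two_path_pheromone():.3f}")
```

Under these assumptions the shorter path is reinforced more strongly per ant, so its share of the traffic grows toward one, whereas two paths of equal length keep an equal share.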

ACO algorithms may be regarded from different 
perspectives. First of all, as mentioned above, they are 
SI techniques. However, seen from an operations re- 
search perspective, ACO algorithms belong to the class 
of metaheuristics [66.19-21]. The term metaheuristic, 
first introduced in [66.22], has been derived from the 
composition of two Greek words. Heuristic derives 
from the verb heuriskein (εὑρίσκειν), which means to
find, while the prefix meta means beyond, in an upper 
level. Before this term was widely adopted, metaheuris- 
tics were often called modern heuristics [66.23]. In 
addition to ACO, other algorithms such as evolutionary 
computation, iterated local search, simulated annealing, 
and tabu search, are often regarded as metaheuristics. 
For books and surveys on metaheuristics, we refer the 
reader to [66.19—21, 23]. 


Algorithm 66.1 Ant colony optimization (ACO)
1: while termination conditions not met do
2:   ScheduleActivities
3:     AntBasedSolutionConstruction()
4:     PheromoneUpdate()
5:     DaemonActions() {optional}
6:   end ScheduleActivities
7: end while


From a technical perspective, ACO algorithms work 
as follows. Given a combinatorial optimization problem 
to be solved, first a finite set C of the so-called solution 
components, used for assembling solutions to the prob- 
lem, must be defined. Second, a set T of pheromone 
values must be defined. This set of values is commonly 
called the pheromone model, which is — from a mathe- 
matical point of view — a parameterized probabilistic 
model. The pheromone model is one of the central 
components of any ACO algorithm. The pheromone 
values $\tau_i \in T$ are commonly associated with solution
components. The pheromone model is used to prob- 
abilistically generate solutions to the problem under 


consideration by assembling them from the set of solu- 
tion components. In general, ACO algorithms attempt 
to solve an optimization problem by iterating the fol- 
lowing two steps: 


• Candidate solutions are constructed using
a pheromone model, that is, a parameterized
probability distribution over the search space.

• The candidate solutions are used to update the
pheromone values in a way that is deemed to bias
future sampling toward high-quality solutions.


The pheromone update aims to concentrate the 
search in regions of the search space containing high- 
quality solutions. In particular, the reinforcement of so- 
lution components depending on the solution quality 
is an important ingredient of ACO algorithms. It im- 
plicitly assumes that good solutions consist of good 
solution components. Learning which components con-
tribute to good solutions can help assemble them into
better solutions. The main steps of any ACO algorithm
are shown in Algorithm 66.1. DaemonActions (see 
line 5 of Algorithm 66.1) may include, for example, the 
application of local search to solutions constructed in 
function AntBasedSolutionConstruction(). 
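
To make the two iterated steps concrete, the following minimal sketch applies them to a small symmetric traveling salesman problem. It is loosely in the spirit of early ACO algorithms and is not a reproduction of any specific variant cited in this chapter. The function name aco_tsp, the parameters alpha, beta, and rho, and the deposit rule 1/length are assumptions chosen for illustration.

```python
import math
import random


def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Minimal ACO sketch for a symmetric TSP given a distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone model: one value per edge
    best_tour, best_len = None, math.inf

    def tour_length(tour):
        return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

    for _ in range(n_iters):
        tours = []
        # AntBasedSolutionConstruction: each ant assembles a tour edge by edge.
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Edge weight combines pheromone and a greedy 1/distance bias.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour_length(tour), tour))
        # PheromoneUpdate: evaporate, then reinforce edges by tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for length, tour in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
        # DaemonActions (optional): here only bookkeeping of the best tour.
        it_len, it_tour = min(tours)
        if it_len < best_len:
            best_len, best_tour = it_len, it_tour
    return best_tour, best_len
```

In this sketch the solution components are the edges of the graph, and the daemon action is limited to remembering the best tour; a local search step could be inserted at the same place.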

The class of ACO algorithms comprises several 
variants. Among the most popular ones are MAX- 
MIN Ant System (MMAS) [66.24] and ant colony 
system (ACS) [66.25]. For more comprehensive infor- 
mation, we refer the interested reader to [66.26]. 


66.2.2 Particle Swarm Optimization 


PSO [66.2, 27] is an SI technique for optimization that 
is inspired by the collective behavior of flocks of birds 
and/or fish schools. The first PSO algorithm was intro- 
duced in 1995 by Kennedy and Eberhart [66.28] for the 
purpose of optimizing the weights of a neural network, 
that is, for continuous optimization. In the meantime, 
PSO has also been adapted for its application to discrete 
optimization problems [66.29]. 

In PSO, solutions to the problem under consider- 
ation are labeled particles. The algorithm works on 
a whole set of particles at the same time, the so-called 
swarm. Therefore, PSO can be seen as a population-based optimization technique. During the run of the algorithm, particles move through the search space in search of an optimal, or good enough, solution. Moreover, particles communicate their current positions to neighboring particles. The position of each particle is updated according to three terms: its so-called velocity, the difference between its current position and the best position it has found so far, and the difference between its current position and the best position found by its neighbors. This has
the effect that, during the execution of the algorithm, 
the swarm increasingly focuses on areas of the search 
space containing high-quality solutions. The term parti- 
cle swarm was chosen by Kennedy and Eberhart for the 
following reason. Their initial intention was to model 
the movements of flocks of birds and fish schools. As 
their model further evolved toward an algorithm for op- 
timization, the visual plots produced from the results of 
the algorithm rather resembled swarms of mosquitoes. 
The term particle was chosen because the algorithm
makes use of the notion of velocity, and particle seemed
to be the most appropriate term in this context.

PSO is closely related to artificial life models. Early 
works by Reynolds on the flocking model known as 
boids [66.30], and Heppner and Grenander’s studies 
on rules governing large numbers of birds flocking 
synchronously [66.31], suggested that bird flocking is 
an emergent behavior resulting from local interactions 
between the birds. These studies laid the foundation 
for the development of PSO for solving optimization 
problems. PSO is — in some way — similar to cellular 
automata (CA), which are often used for generating as- 
tonishing self-replicating patterns based on simple local 
rules. CAs may be characterized by the following three 
main attributes: 


1. Cells are updated in parallel. 

2. The value of each new cell depends only on the old 
values of the cell and its neighbors. 

3. There is no difference in rules for updating different 
cells [66.32]. 


These three attributes also hold for the particles in 
PSO. 
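
The three attributes can be seen in a few lines of code. The sketch below implements one step of an elementary (one-dimensional, binary) cellular automaton with periodic boundaries; the function name and the choice of rule 110 as default are illustrative assumptions.

```python
def ca_step(cells, rule=110):
    """One synchronous update of an elementary cellular automaton.

    cells is a list of 0/1 values with periodic boundaries; rule is the
    Wolfram rule number (0..255) shared by all cells.
    """
    n = len(cells)
    new_cells = []  # built from the old state only: cells update in parallel
    for i in range(n):
        # The new value depends only on the cell and its two neighbors.
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        # The same rule table is applied to every cell.
        new_cells.append((rule >> index) & 1)
    return new_cells


if __name__ == "__main__":
    state = [0, 0, 0, 1, 0]
    for _ in range(4):
        print("".join(str(c) for c in state))
        state = ca_step(state)
```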

Henceforth, $v_i$ denotes the velocity of the $i$th particle in the swarm, $x_i$ denotes its position, $p_i$ denotes the personal best position, and $p_g$ is the best position found by particles in its neighborhood. In the original PSO algorithm, $v_i$ and $x_i$, for $i = 1, \ldots, n$, are updated according to the following two equations [66.28]:

$$v_i \leftarrow v_i + c_1 R_1 \otimes (p_i - x_i) + c_2 R_2 \otimes (p_g - x_i), \tag{66.1}$$

$$x_i \leftarrow x_i + v_i, \tag{66.2}$$

where $R_1$ and $R_2$ are independent functions returning a vector of values, generated uniformly at random, from the range $[0, 1]$. Moreover, $c_1$ and $c_2$ are the so-called acceleration coefficients. The symbol $\otimes$ refers to point-wise vector multiplication. As shown in (66.1), the velocity term $v_i$ of a particle is composed of three components: the momentum, the cognitive, and the social terms. The momentum term $v_i$ carries the particle toward the previous direction; the cognitive term,

$$c_1 R_1 \otimes (p_i - x_i),$$

represents a force that pulls the particle toward its personal-best position; finally, the social term,

$$c_2 R_2 \otimes (p_g - x_i),$$

represents a force that influences the new direction toward the best position of neighboring particles. Various different neighborhood topologies may be used
for this purpose. Examples include ring, star, and von 
Neumann. The use of rather small neighborhood topolo- 
gies — such as the one induced by the von Neumann 
neighborhood — has generally been shown to lead to 
better results when rather complex problems are ad- 
dressed, whereas larger neighborhoods generally lead 
to a better performance for simpler problems [66.33]. 
Algorithm 66.2 summarizes the basic PSO algorithm. 


Algorithm 66.2 Particle swarm optimization (PSO)
1: Randomly generate an initial swarm
2: while termination conditions not met do
3:   for each particle i do
4:     if f(x_i) < f(p_i) then p_i ← x_i
5:     p_g = min(p_neighbors)
6:     Update velocity (66.1)
7:     Update position (66.2)
8:   end for
9: end while
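
A direct transcription of Algorithm 66.2 for minimization might look as follows. This is a sketch under stated assumptions rather than a reference implementation: the ring neighborhood, the search bounds lo and hi, the velocity clamp v_max (added to keep the uninertial update of (66.1) from diverging), and all default parameter values are choices made here for illustration.

```python
import random


def pso(f, dim, n_particles=20, n_iters=200, c1=2.0, c2=2.0,
        lo=-5.0, hi=5.0, v_max=1.0, seed=0):
    """Minimal PSO sketch (minimization) following Algorithm 66.2."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]        # personal best positions p_i
    p_val = [f(xi) for xi in x]    # cached values f(p_i)
    history = []                   # best known value after each iteration

    for _ in range(n_iters):
        for i in range(n_particles):
            fx = f(x[i])
            if fx < p_val[i]:      # line 4: update the personal best
                p[i], p_val[i] = x[i][:], fx
            # line 5: best personal best in a ring neighborhood (assumption)
            hood = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            g = min(hood, key=lambda k: p_val[k])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # line 6: velocity update (66.1)
                v[i][d] += (c1 * r1 * (p[i][d] - x[i][d])
                            + c2 * r2 * (p[g][d] - x[i][d]))
                # velocity clamping is an added assumption, not part of (66.1)
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # line 7: position update (66.2)
                x[i][d] += v[i][d]
        history.append(min(p_val))
    best = min(range(n_particles), key=lambda k: p_val[k])
    return p[best], p_val[best], history


if __name__ == "__main__":
    sphere = lambda z: sum(t * t for t in z)
    best_x, best_f, _ = pso(sphere, dim=2)
    print(best_x, best_f)
```

Replacing the ring by a neighborhood that contains all particles yields the fully connected topology; the choice affects how quickly information about good positions spreads through the swarm.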


The class of PSO algorithms is characterized by 
a multitude of different variants, rendering it impos- 
sible to mention all of them here. However, popular 
variants include the Inertia Weight PSO [66.34], fully 
informed PSO [66.33], and adaptive hierarchical parti- 
cle swarm optimizer [66.35]. Moreover, Frankenstein’s 
PSO [66.36] is a PSO variant that was created by 
analyzing the components of existing PSO variants 
and combining (some of) them in a beneficial way. 
For more information, the interested reader may con- 
sult [66.37]. 


66.2.3 Artificial Bee Colony Algorithm 


The artificial bee colony (ABC) algorithm was first 
proposed by Karaboga and Basturk in 2005 [66.38, 39]. The inspiration for the ABC algorithm is to be
found in the foraging behavior of honey bees, which 
essentially consists of three components: food source 
positions, amount of nectar and three types of honey 
bees, that is, employed bees, onlookers, and scouts. In 
short, the algorithm works as follows. Feasible solu- 
tions to the problem under consideration are modeled as 
food source positions. Moreover, the quality of a feasi- 
ble solution is modeled as the amount of nectar present 
at the corresponding food source position. Each type 
of bee is responsible for one particular operation in the 
context of generating new candidate food source po- 
sitions, that is, new candidate solutions. Specifically, 
employed bees will search in the vicinity of the food 
source position that is presently in their memory; mean- 
while they pass information about good food source 
positions to onlooker bees. Onlooker bees tend to se- 
lect good food source positions from those found by 
the employed bees, and then further search for bet- 
ter food source positions around the selected food 
source position. In case the employed bee and the 
onlookers associated with a food source position are 
not able to find a better food source position, their 
current food source position is abandoned and the em- 
ployed bee associated with this food source becomes 
a scout bee that performs a search for discovering 
new food source positions. If a scout identifies a new 
food source position, it turns into an employed bee 
again. 
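
The interplay of the three bee types can be sketched as follows for a continuous minimization problem. The neighbor move (perturbing one coordinate relative to a randomly chosen other food source), the abandonment threshold limit, and the rank-free selection weights used by the onlookers are assumptions borrowed from common descriptions of the ABC algorithm; they are not specified in this chapter.

```python
import random


def abc_minimize(f, dim, n_sources=10, n_iters=100, limit=20,
                 lo=-5.0, hi=5.0, seed=0):
    """Minimal artificial bee colony sketch for minimizing f on [lo, hi]^dim."""
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]  # food source positions
    values = [f(s) for s in sources]   # lower value corresponds to more nectar
    trials = [0] * n_sources           # unsuccessful searches per food source
    best_val = min(values)
    best_x = sources[values.index(best_val)][:]

    def search_near(i):
        """Perturb one coordinate of source i; keep the move only if it improves."""
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        cand[d] = max(lo, min(hi, cand[d]))
        val = f(cand)
        if val < values[i]:
            sources[i], values[i], trials[i] = cand, val, 0
        else:
            trials[i] += 1

    for _ in range(n_iters):
        for i in range(n_sources):                  # employed bees
            search_near(i)
        worst = max(values)
        nectar = [1.0 + worst - v for v in values]  # positive selection weights
        for i in rng.choices(range(n_sources), weights=nectar, k=n_sources):
            search_near(i)                          # onlooker bees
        m = min(range(n_sources), key=values.__getitem__)
        if values[m] < best_val:                    # memorize the best source
            best_x, best_val = sources[m][:], values[m]
        for i in range(n_sources):                  # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                values[i], trials[i] = f(sources[i]), 0
    return best_x, best_val
```

The best solution is memorized before the scout phase because a scout may abandon the food source that currently holds it.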

Essentially, the difference between the ABC algo- 
rithm and other population-based optimization tech- 
niques is to be found in the specific way of managing 
the resources of the algorithm, as suggested by the 
foraging behavior of honey bees. Due to its simplic- 
ity and ease of implementation, the ABC algorithm 
has captured much attention recently. It should also 
be mentioned that, although the algorithm was initially
introduced for continuous optimization, it has since
been adapted to combinatorial optimization problems
as well [66.40, 41].
For a recent survey, we refer to [66.42]. 


66.2.4 Other SI Techniques 
for Optimization 
and Management Tasks 


In the following, we briefly mention other applica-
tions of SI techniques for optimization and management
tasks, the latter especially in distributed environments.
They are grouped with respect to their natural
inspiration.


Division of Labor (Ants/Wasps) 

In colonies of ants and wasps, for example, there are 
various tasks to be dealt with by the colony members. 
However, the urgency to engage in certain tasks may 
change over time. In 1984, Wilson [66.43] showed that 
the concept of division of labor in colonies of Pheidole 
genus ants allows the colony to adapt to these changing 
demands. Division of labor was later modeled in [66.44, 
45] by means of response threshold models. 
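
In a form often found in the response threshold literature, an individual with threshold θ engages in a task with stimulus intensity s with probability s^n / (s^n + θ^n). The chapter does not state this formula, so the sketch below should be read as an assumption; the function name and the default steepness n = 2 are likewise illustrative.

```python
def response_probability(stimulus, threshold, steepness=2):
    """Probability of engaging in a task under a response threshold model.

    Assumed form: s^n / (s^n + theta^n), with stimulus s >= 0, threshold
    theta > 0, and steepness n > 0.
    """
    s_n = stimulus ** steepness
    return s_n / (s_n + threshold ** steepness)


if __name__ == "__main__":
    # Two individuals facing the same stimulus: the one with the lower
    # threshold is far more likely to take up the task.
    for theta in (1.0, 5.0):
        print(theta, round(response_probability(2.0, theta), 3))
```

Because individuals with low thresholds respond first, and their work reduces the stimulus, the remaining individuals stay available for other tasks, which is how such models produce a division of labor.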

These models were later used in several techni- 
cal applications. In the following, we mention a few 
of them. Nouyan et al. [66.46] consider static and dy- 
namic task allocation problems in which trucks have 
to be painted in painting booths. Another applica- 
tion concerns media streaming in peer-to-peer net- 
works [66.47]. A multiagent system for the schedul- 
ing of dynamic job shops with flexible routing and 
sequence-dependent setups is considered in [66.48]. 
Merkle et al. [66.49] made use of a response threshold 
model for self-organized task allocation in the context 
of computing systems with reconfigurable components. 
Finally, a system for task allocation in distributed
environments is presented in [66.50].


Cemetery Formation (Ants) 

The term cemetery formation refers to a behavior which 
has been observed in ant colonies of the species Phei- 
dole pallidula, among others, which cluster the bodies 
of dead nest mates. This self-organized behavior has 
given rise to several applications, especially in the con- 
text of clustering and sorting. In 1991, a model for the 
clustering and sorting behavior of ants was published 
in [66.51]. Note that clustering refers in this context to 
the formation of piles, and sorting, on the other hand, 
refers to the spatial arrangement of objects according to 
their properties. 
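
Models of this kind are commonly described through two probabilities: an unladen agent picks up an item with a probability that decreases with the perceived local density f of similar items, and a laden agent drops its item with a probability that increases with f. The functional forms and the constants k1 and k2 below follow a frequently cited formulation and are assumptions here; the chapter does not state whether [66.51] uses exactly these expressions.

```python
def pick_up_probability(f, k1=0.1):
    """Probability that an unladen agent picks up an item.

    f is the perceived fraction of similar items nearby (0 <= f <= 1);
    isolated items (small f) are picked up almost surely.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.3):
    """Probability that a laden agent drops its item.

    Items are dropped preferentially where many similar items already lie.
    """
    return (f / (k2 + f)) ** 2


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.5, 1.0):
        print(f, round(pick_up_probability(f), 3), round(drop_probability(f), 3))
```

Small piles are thus eroded while large piles grow, which yields clustering without any agent knowing the global arrangement.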

Mainly based on the model from [66.51], several 
algorithms for clustering and sorting were proposed in 
the literature. The first one was presented in [66.52], 
extending the original model to handle numerical data. 
More recent papers include [66.53] which deals with 
clustering and topographic mapping. Finally, the ceme- 
tery formation behavior of ants has also inspired an 
algorithm for dynamic load balancing [66.54]. 


Flashing in Fireflies 
Fireflies are winged beetles that make use of bioluminescence to attract mates or prey. Moreover, tropical fireflies, in particular the ones from Southeast Asia, synchronize their light flashes in large groups of individuals. This is a self-organized phenomenon which is mathematically described by the so-called phase-coupled oscillator models [66.55]. The benefits of this self-synchronization are not yet fully understood. Current hypotheses consider diet, social interaction, and altitude.
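
As an illustration of a phase-coupled oscillator model, the sketch below simulates identical oscillators with all-to-all sinusoidal coupling (a Kuramoto-type model) using explicit Euler steps. The choice of this particular model, the function names, and all parameter values are assumptions; the model analyzed in [66.55] may differ.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Magnitude of the mean phase vector: near 0 incoherent, 1 synchronized."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def simulate_flashing(n=20, coupling=1.0, omega=1.0, dt=0.05, steps=2000, seed=0):
    """Euler simulation of n identical phase oscillators with all-to-all coupling.

    Returns the order parameter before and after the simulation.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    r_start = order_parameter(phases)
    for _ in range(steps):
        new_phases = []
        for p_i in phases:
            # Each oscillator is pulled toward the phases of the others.
            pull = sum(math.sin(p_j - p_i) for p_j in phases) / n
            new_phases.append(p_i + dt * (omega + coupling * pull))
        phases = new_phases  # synchronous update of all oscillators
    return r_start, order_parameter(phases)


if __name__ == "__main__":
    r0, r1 = simulate_flashing()
    print(f"order parameter: {r0:.2f} -> {r1:.2f}")
```

With positive coupling the phases are drawn together from a random initial state, without any oscillator acting as a leader.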

The literature contains, at least, two types of tech- 
nical applications that are inspired by different aspects 
of the flashing of fireflies. First, there are applications 
that require some type of self-synchronization. Exam- 
ples include, but are not limited to, a synchronization 
protocol in sensor networks [66.56], the synchroniza- 
tion in overlay networks [66.57], and dynamic pricing 
in online markets [66.58]. Second, the literature offers 
the so-called firefly algorithm (FA) [66.59], which is in- 
spired by the way in which fireflies attract mates or prey. 
This algorithm was initially introduced for continuous 
optimization. It has, however, been adapted for the ap- 
plication to combinatorial optimization as well [66.60]. 


Fish Schooling 
A group of fish that have gathered is commonly called
an aggregation of fish. Such a fish aggregation is called
unstructured in the case in which the group consists of 
various species of fish having randomly gathered, for 
example, in the vicinity of a food source. If there is 
some social component to this gathering, the fish are 
said to be shoaling. Shoaling fish are aware of each 
other’s presence, adjusting, for example, their swim- 
ming behavior to each other in order to stay together. 
However, their relation is rather loose. If, in contrast, 
an aggregation of fish is more tightly organized, for ex- 
ample, when all fish move at the same speed in the same 
direction, then the aggregation is said to be school- 
ing. Schooling is a self-organized behavior that results 
from local interactions between the fish. This behav- 
ior comes with several advantages such as providing 
a means for social interactions, more successful forag- 
ing, and predator avoidance. 

There are basically two different algorithms for optimization based on fish schooling to be found in the literature. The first algorithm is referred to as the artificial fish swarm algorithm (AFSA). It has, for example, been applied to the training of feed-forward neural networks [66.61], multiuser detection [66.62], image segmentation [66.63], and generally to continuous optimization [66.64]. The second algorithm is known as fish school search [66.65].


Self-Desynchronized Croaking (Japanese Tree Frogs)

Different biological studies, for example [66.66], have dealt with the croaking of Japanese tree frogs. The male individuals make use of their croaks in order to attract females. Moreover, females of this family of frogs can recognize the source of such a croak and are able to determine the current location of the corresponding male. However, this is only possible if no two frogs (that are close enough to the female) croak at the same time. If two nearby males croak simultaneously, the female is not able to detect where the croaks came from. This is why, over time, male frogs evolved a self-organized way of desynchronizing their croaks. Aihara et al. [66.67] introduced a first formal model based on a set of pulse-coupled oscillators for capturing this behavior. So far, this model has only been applied to distributed graph coloring [66.68, 69]. However, the algorithm proposed in [66.69] is currently the state of the art for this problem.


Nest Building (Termites/Wasps)

Both termites and wasps build highly complex nests in cooperation. The construction of such nests is well beyond the capabilities of an individual insect. The nests of both termites and wasps have a very complex internal structure. Moreover, termite nests are extremely large in comparison to individual insects. Scientists studying the nest-building behavior came up with probabilistic models for describing (parts of) the behavior [66.70]. It is nowadays generally accepted that stigmergy plays a central role in nest building.

Models for nest building based on stigmergy have been used mainly in software tools for simulating the automated building of certain structures. Examples can be found in [66.71-74].


66.3 SI in Robotics: Swarm Robotics


Swarm robotics refers to the study and use of SI techniques for the coordination of groups of robots. The following sections provide a brief overview of this field, with a focus on swarm robotic systems and the tasks they accomplish.


66.3.1 Systems 


In the late 1940s, Walter [66.75] built two autonomous 
robots called Machina speculatrix, or simply tortoise, 
which exhibited behaviors resembling those of simple
animals. The robots had a driving/steering mechanism,
a head light, a photoreceptor, and a bump sensor. They 
were designed to search for and approach light sources 
of moderate intensity. If a robot observed such a source, 
its head light was turned off, otherwise it was turned on. 
In an experiment, the robots were set up in a dark envi- 
ronment, where they approached each other exhibiting 
complex motion patterns. Such mutual recognition al- 
lowed a population of machines to form a sort of 
community, which broke up once an external light 
source was introduced [66.75, p. 129]. This two-robot 
system may be the first self-organizing multirobot sys- 
tem. Interestingly, even a single robot was reported to 
exhibit complex interactions when facing its mirror im- 
age — such a behavior, if observed in an animal, might 
be accepted as evidence of some degree of self-aware- 
ness [66.75, pp. 128-129]. 

In the 1950s, inspired by von Neumann’s kinematic 
model of machine replication [66.76], the first physi- 
cal models of self-replication were built. Penrose and 
Penrose [66.77] studied a system in which passive me- 
chanical parts move on a linear track when the latter is 
subjected to side-to-side agitation. In their default posi- 
tion, the parts do not link under the influence of shaking 
alone. If a seed object composed of two complementary 
parts, one hooked up to the other, is added, it repli- 
cates by interacting with the other parts on the track. 
Jacobson [66.78] implemented a system in which self- 
propelled electromechanical parts move on a circular 
track with several branches. A seed object composed 
of two parts could trigger other parts to assemble into 
identical objects without human intervention. 

In the late 1980s, studies of Fukuda and Nak- 
agawa [66.79-81], Beni [66.5], and Wang and 
Beni [66.82] provided an enormous impetus for the field 
that developed into swarm robotics. Fukuda and Naka- 
gawa proposed a novel type of robotic system called dy- 
namically reconfigurable robotic system (DRRS), which 
can dynamically reorganize its shape and structure 
[...] for a given task and strategic purpose. DRRS 
is made of several cells with built-in intelligence and 
the ability to autonomously connect to and detach from 
one another [66.81, pp. 55-56]. The authors also pre- 
sented a first prototype of this system, the CEBOT 
Mark I [66.80]. At the same time, Beni introduced 
the term cellular robotic system, referring to a sys- 
tem that can encode information as patterns of its 
own structural units [66.5, p. 59]; the units would be 
structural elements, each with built-in intelligence, able 
to move in space and act asynchronously under dis- 
tributed control. Beni and Wang also used the terms 


swarm and swarm intelligence in this context [66.83, 
84]. 

Other early physical implementations of distributed 
robotic systems include the CEBOT Mark II [66.85], 
ACTRESS [66.86], and GOFER [66.87]. 


Hardware Architectures 
Advances in technology, for example, in computers, 
manufacturing and mobile devices have made it af- 
fordable to study swarms of around 20—1000 physi- 
cal robots [66.88] and up to around 1000000 robots 
in simulation [66.93-95]. At present, most swarm 
robotic systems consist of mobile robots that operate 
on the ground. An example is the Kilobot platform 
(Fig. 66.2a), which was designed to facilitate the fab- 
rication and operation of thousands of robots — includ- 
ing their charging, programming and activation all at 
once [66.88]. Other state-of-the-art robotic systems in- 
clude the r-one (Fig. 66.2b), which features, among 
others, a set of IR transmitters and receivers for com- 
munication and relative localization [66.89], and the 
Khepera I-IV [66.96] and e-puck [66.97], which fea- 
ture a range of sensors including a camera. Increasingly, 
swarm robotic systems operate in spaces other than on 
the ground, such as underwater [66.90, 98] (Fig. 66.2c) 
or in the air [66.99, 100]. In some robotic systems, 
the swarms operate and collaborate across multiple 
spaces, such as on the ground and in the air [66.91, 101] 
(Fig. 66.2d,e). 

According to their system architecture, most swarm 
robotic systems can be categorized into either multi- 
robot systems or modular reconfigurable robot systems. 
Multirobot systems are composed of multiple distinct 
robots, which are typically mobile and able to perform 
(collectively) more than one task in parallel (Fig. 66.2a— 
c). Modular reconfigurable robot systems are composed 
of component modules that can be physically linked 
together to form a robot (Fig. 66.2f). A few hybrid sys- 
tems exist, sharing properties of both multirobot and 
modular reconfigurable robot systems [66.91, 102-104] 
(Fig. 66.2d). 

Of particular interest among systems of modular reconfigurable robots are those where the robots can build themselves [66.105, 106]. The term self-reconfigurable denotes the general ability of physical modules to reconfigure themselves, regardless of whether the process is centrally controlled, for example, by an external computer, or decentralized and autonomous. In the following, we use the term self-assembly to refer to processes by which pre-existing components (separate or distinct parts of a disordered structure) autonomously organize into patterns or structures without external intervention. Self-assembly is responsible for the generation of much of the order in nature [66.107] and has widely been applied in the synthesis of products from molecular components. Increasingly, the potential of self-assembly processes involving larger components, up to the centimeter scale, is being recognized [66.108]. In robotic systems, two distinct classes of self-assembling systems exist [66.109]: (i) systems in which the components that self-assemble are externally propelled, and (ii) systems in which the components that self-assemble are self-propelled.

Fig. 66.2a-f Examples of swarm robotic systems: (a) Kilobots developed by Harvard University [66.88]; (b) r-one (after [66.89], photo courtesy of J. McLurkin, Rice University); (c) Lily developed in the CoCoRo project (after [66.90], photo courtesy of T. Schmickl, University of Graz); (d,e) a heterogeneous system studied in the Swarmanoid project (after [66.91], photo courtesy of M. Dorigo, Université Libre de Bruxelles); (f) Pebbles (after [66.92], photo courtesy of D. Rus, MIT)


Sensing and Communication 
In most multirobot systems, robots interact with each 
other by using their sensors or some form of com- 
munication. Dudek et al. [66.110] presented a detailed 
taxonomy considering communication range, topology, 
and bandwidth. In the following, we adopt a simpler 
categorization proposed by Cao et al. [66.111]: 


• Interaction via environment refers to the transfer
of information that is mediated through the mem-
ory of the environment. In this case, robots leave
persistent signs that stimulate the activity of other
robots. This kind of indirect communication is also
referred to as stigmergy [66.16]. Stigmergic com-
munication is widely used in social insect societies,
for example, during the construction of mounds by
termites of Macrotermes bellicosus [66.8], and has
been implemented in several swarm robotic sys-
tems [66.112-116].

• Interaction via sensing refers to local interactions
that occur between agents as a result of agents
sensing one another, but without explicit communi-
cation [66.111, p. 12]. We include in this category
interactions where agents sense each other indi-
rectly, that is, where the current presence or motion
of another agent can be inferred from changes in the
environment. Note that the boundary to stigmergic
communication is blurred; for example, consider
the situation where multiple agents push an object
simultaneously [66.117-119].

In some social animals, the members of a group ob-
serve a common leader individual. Their actions can
be highly dependent on the observed behavior of 
the leader, as, for instance, during an attack of the 
group [66.120]. In other animals, no recognizable 
leader individual exists; instead, individuals observe 
nearby group members. The latter situation is typi- 
cal for swarm systems. It is reported, for instance, 
for animal groups that exhibit herding, flocking, 
and schooling behavior [66.8]. Note that where the 
groups are not homogeneous, even a minority of in- 
dividuals may be able to influence the rest of the 
group [66.121]. 

In principle, interaction via sensing can be consid-
ered an implicit form of communication, in par-
ticular because an observed agent can change its action
and thereby influence the behavior of its observers.
Arkin [66.122] referred to the interaction via sens-
ing category as cooperation without communica-
tion, and showed that it is sufficient to accom-
plish tasks that require the cooperation of multiple
robots. Other examples of swarm robotic studies us-
ing interaction via sensing include [66.123-126].

• Interaction via communication refers to interac-
tions involving explicit communication. Here,
information is either broadcast or transferred to
specific teammates. Information transfer can also take
place through direct physical interactions, such as
touch; this form of communication is also
referred to as direct interaction [66.127]. Ex-
plicit communication can improve the performance
of a multirobot system. This is typically the case
where the system benefits from robots being re-
cruited to certain areas of the environment. Balch
and Arkin [66.128] studied such a scenario
and showed that it can be sufficient for each robot
to signal its overall state. The transfer of more
elaborate information, however, would not result in
any significant increase in task performance. Ex-
plicit communication is commonly used in modular
reconfigurable robot systems, for example, to ex-
change information between inter-connected mod-
ules or to support the docking process of separate
modules [66.129].


Control and Coordination 
Over the last two decades, a range of design methods 
have been proposed for the control of swarm robotic 
systems. They can be broadly classified into behavior- 
based design methods and automated design meth- 
ods [66.130]. 

In behavior-based design methods, the user ap- 
proaches the problem in a bottom-up manner [66.131]. 
A repertoire of behaviors for individual robots is de- 
fined and often refined through a trial-and-error process. 
A common approach is the use of finite state machines. 
Each state defines a basic behavior. Transitions between 
states can be triggered by probability, external events, 
time-outs, and combinations of these [66.132-134]. 
A prominent example is the use of response threshold 
functions, for example, 


$$
\frac{s_i^2}{s_i^2 + \theta_i^2}
\quad \text{or} \quad
1 - \exp\left(-\frac{s_i}{\theta_i}\right),
$$

which define the probability for an individual to engage
in task $i$ based on the perceived task stimulus $s_i$ and
threshold $\theta_i$. The particular threshold value $\theta_i$ can either
be fixed for each individual from the outset [66.135] or 
learned during its lifetime [66.136, 137]. In both cases, 
the mechanism can facilitate the emergent allocation of 
tasks in groups of otherwise identical individuals (see 
also Sect. 66.2.4). In addition, intentional approaches 
to task allocation have been considered [66.138, 139]. 
These require the agents to cooperate explicitly with 
each other. For example, the decentralized ALLIANCE 
algorithm [66.140, 141] can be used for groups of het-
erogeneous robots to perform tasks and subtasks, which
may have ordering dependencies, in a fault-tolerant 
way. It assumes that the robots detect with some prob- 
ability the effect of their own actions as well as the 
actions of other team members. 
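
The response threshold mechanism can be illustrated with a minimal Python sketch. It assumes the two commonly used forms of the response function, s^2/(s^2 + theta^2) and 1 - exp(-s/theta); the function names and the example values are hypothetical and are not taken from the cited studies.

```python
import math
import random


def quadratic_response(stimulus: float, threshold: float) -> float:
    """Probability s^2 / (s^2 + theta^2) of engaging in a task."""
    if stimulus <= 0.0:
        return 0.0  # no stimulus, no engagement (also avoids 0/0)
    return stimulus ** 2 / (stimulus ** 2 + threshold ** 2)


def exponential_response(stimulus: float, threshold: float) -> float:
    """Probability 1 - exp(-s / theta); requires threshold > 0."""
    return 1.0 - math.exp(-stimulus / threshold)


def engages(stimulus, threshold, rng, response=quadratic_response) -> bool:
    """Draw one probabilistic decision for a single individual."""
    return rng.random() < response(stimulus, threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two otherwise identical individuals with different (assumed) thresholds:
    # the low-threshold individual takes up the task far more often.
    for theta in (0.5, 5.0):
        count = sum(engages(1.0, theta, rng) for _ in range(1000))
        print(f"threshold={theta}: engaged in {count} of 1000 draws")
```

With both forms, an individual whose threshold is low relative to the stimulus responds with high probability, which is what allows a division of labor to emerge among otherwise identical individuals.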

Virtual potential fields [66.142, 143] and physi-
comimetics [66.144] are another widely used family of
behavior-based design methods. Each robot mimics a physical
particle under the influence of a potential field. The latter
guides the particle toward a point of minimal poten- 
tial energy. While the goal point, which the robot shall 
reach, would exert an attractive force on the particle, 
any obstacle would exert a repulsive force. Other robots 
can exert forces on the particle as well. Using this con- 
cept, a wide repertoire of behaviors can be realized, 
such as the collective movement of robots arranged in 
particular formations [66.145], or the tracking of mul- 
tiple moving targets [66.146]. The properties of the 
resulting swarm systems, for example, the cohesion of 
the swarm, can also be formally analyzed [66.147]. 
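
As a rough illustration of the potential field idea, the sketch below moves a point robot under an attractive force toward the goal and repulsive forces from nearby obstacles. The particular force laws (a linear attraction and an inverse-distance repulsion with a limited range of influence) and all gains are assumptions chosen for simplicity; the cited works use their own formulations.

```python
import math


def attractive_force(pos, goal, gain=1.0):
    """Force pulling the particle toward the goal (linear in distance)."""
    return (gain * (goal[0] - pos[0]), gain * (goal[1] - pos[1]))


def repulsive_force(pos, obstacle, gain=1.0, influence=1.0):
    """Force pushing the particle away from an obstacle within `influence`."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist >= influence:
        return (0.0, 0.0)
    magnitude = gain * (1.0 / dist - 1.0 / influence) / dist ** 2
    return (magnitude * dx / dist, magnitude * dy / dist)


def total_force(pos, goal, obstacles):
    """Sum of the goal attraction and all obstacle repulsions."""
    fx, fy = attractive_force(pos, goal)
    for obstacle in obstacles:
        rx, ry = repulsive_force(pos, obstacle)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)


def step(pos, goal, obstacles, dt=0.05):
    """Move the particle a small step along the resulting force."""
    fx, fy = total_force(pos, goal, obstacles)
    return (pos[0] + dt * fx, pos[1] + dt * fy)


if __name__ == "__main__":
    pos = (0.0, 0.0)
    for _ in range(200):
        pos = step(pos, goal=(3.0, 0.0), obstacles=[(1.5, 0.4)])
    print(pos)
```

Other robots can be treated in the same way as obstacles or goals, which is how formation and cohesion behaviors are typically obtained from the same machinery.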

Other design methods include the Growing 
Point Language [66.148], the Origami Shape Lan- 
guage [66.149], and Proto [66.150]. These languages 
were developed in the context of Amorphous Comput- 
ing [66.151], which considers systems of massively 
distributed, disordered, asynchronous, and locally 
interacting computational devices. The Proto language 
has been extended for use on mobile devices. This 
extension was validated with a swarm of 40 iRobot 
robots [66.152]. Some amorphous computing ap- 
proaches allow users to specify desired global system 
properties in the language. A compiler then produces 
the local rule set for the agents to achieve these 
properties [66.149]. 

Automated design methods can be grouped into 
reinforcement learning and evolutionary robotics. In re- 
inforcement learning [66.153], an agent interacts with 
its environment by choosing actions and receiving re-
wards. Matarić [66.154, 155] applied reinforcement
learning in a swarm robotic context. The robots had to
learn how to collaborate in a foraging task. The robots
were provided with a set of hand-coded behaviors (as 
in a behavior-based approach) and were required to 
learn how to correlate appropriate conditions for each 
of these behaviors in order to optimize the higher-level 
behavior [66.155]. The difficulties of using reinforce- 
ment learning in a swarm robotic context are discussed 
in [66.130]. A recent survey of reinforcement learning 
in robotics is reported in [66.156]. 

Evolutionary robotics is an approach to design- 
ing robots, or aspects of them (e.g., morphology, 
control) using evolutionary algorithms [66.157, 158]. 
This approach can also be applied to the design of 
swarm robotic systems. In principle, evolution can 
bypass both the problem of decomposing a given 
task and the problem of identifying basic behaviors 
that achieve the subtasks [66.159, 160]. Early studies 
in evolutionary robotics developed collective behav- 
ior such as herding or flocking in simplistic simula-
tion environments [66.161-163]. Simulation environ-
ments with physically embodied agents were consid-
ered in [66.159], where neural network controllers for 
aggregation were first evolved using a group of five 
robots in a simple simulation environment; the best 
of these controllers were subsequently validated us- 
ing a more detailed simulation model of the robots. 
Quinn et al. [66.164] evolved neural network con-
trollers for collective motion using a group of three
simulated robots and subsequently tested the best-rated 
network in 100 trials with a group of three physical 
robots. Watson et al. [66.165] went a step further in that 
controllers for a simple phototaxis task were directly 
evolved on a group of eight physical robots. Working 
toward a distributed evolution of robot morphologies in 
hardware, Griffith et al. [66.166] demonstrated a sys- 
tem of template-replicating polymers, which were made 
of reconfigurable modules that slid passively on an 
air table and executed a finite state machine to con- 
trol their connectivity. Recent work on evolutionary 
swarm robotics considers cultural evolution, for ex- 
ample, where behaviors that can be imitated (memes) 
are subject to an evolutionary process. In these, the 
robots engage as both teachers and learners to exchange 
memes [66.167]. 

Several design methods were developed specifically 
for, or mainly adopted in, the context of modular re- 
configurable robot systems. One class of algorithms 
addresses the problem of how to adjust the relative po- 
sitions of modules without changing their connection 
topology. Yim [66.168] proposed the use of gait con- 
trol tables to produce a range of animal-like locomotion 
patterns, such as the walking gaits of hexapods. Each
gait control table specifies for each control cycle and
module a basic action to be performed. The controller 
is executed either from a central place or in a distributed 
fashion. In the latter case, the modules synchronize their 
actions using internal timers. Shen et al. [66.169] pro- 
posed hormone-inspired communication and control, in 
which artificial hormones help modules to synchronize 
actions and discover changes in their topology. For ex- 
ample, a set of independent caterpillar-like robots could 
be connected into a single entity, which would adapt 
its gait to the new topology. In a similar experiment, 
a connected entity was manually split into smaller enti- 
ties that continued to move as independent caterpillars. 
Støy [66.170] proposed a role-based control algorithm 
to let modular robots display periodic locomotion pat- 
terns. A module’s role specifies its actions and how 
to synchronize them with neighbor modules. For com- 
munication, a parent-child architecture is used; thus, 
modules need to be arranged in acyclic graphs. An ex- 
tended version of the control algorithm can also cope 
with cycles. 
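
The gait control table idea can be sketched in a few lines of Python. The table contents (joint angle targets in degrees for three modules over four control cycles) are purely hypothetical, as are the names; the sketch only shows that a centrally executed table and modules stepping through their own column with internal timers produce the same action sequence as long as the timers stay synchronized.

```python
# Hypothetical gait control table: one row per control cycle,
# one column per module; each entry is a joint angle target in degrees.
GAIT_TABLE = [
    [30, 0, -30],
    [0, 30, 0],
    [-30, 0, 30],
    [0, -30, 0],
]


def run_central(table, n_cycles):
    """Central execution: one controller reads a full row per cycle."""
    return [list(table[cycle % len(table)]) for cycle in range(n_cycles)]


class Module:
    """Distributed execution: each module stores only its own column."""

    def __init__(self, index, table):
        self.column = [row[index] for row in table]
        self.timer = 0  # internal timer, assumed to tick in lockstep

    def tick(self):
        action = self.column[self.timer % len(self.column)]
        self.timer += 1
        return action


def run_distributed(table, n_cycles):
    modules = [Module(i, table) for i in range(len(table[0]))]
    return [[m.tick() for m in modules] for _ in range(n_cycles)]


if __name__ == "__main__":
    print(run_central(GAIT_TABLE, 6) == run_distributed(GAIT_TABLE, 6))
```

In hardware, timer drift would break the lockstep assumption made here, which is one motivation for the explicit synchronization mechanisms described above.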

Another class of algorithms addresses the problem 
of how to adjust the relative positions of modules by 
changing the connection topology [66.106]. One ap- 
proach is to formulate the problem as a search problem. 
For example, in order to reconfigure a lattice-based 
robot from one topology to another, a graph search 
is performed, where the start node of the graph cor- 
responds to the initial topology of the robot and the 
end node corresponds to the desired topology of the 
robot [66.171]. Due to the combinatorial explosion of 
possibilities, an exhaustive search of such graphs is im- 
practical whenever the number of modules is not small. 
State-of-the-art approaches are thus heuristic and con- 
sider ways of reducing the problem complexity. For 
example, Yoshida et al. [66.172] proposed a two-level 
motion planner. A global planner ensures that the robot 
as a whole follows a predefined 3D trajectory. To do 
so, it specifies several candidate paths that bring indi- 
vidual modules from the tail to the head of the robot. 
A motion scheme selector chooses a feasible path for 
each module based on a rule database. Another exam- 
ple is to merge logically a group of nearby modules into 
meta-modules, which, typically, have more advanced 
locomotion abilities than the individual modules. The 
problem is then reduced to developing controllers for 
both meta-modules and modular robots composed of 
meta-modules [66.173]. In principle, modular robots 
can solve the search problem on the fly [66.174]. Other 
than by search, the reconfiguration problem can also
be attempted by local movement strategies, for ex-
ample, random walks [66.175, 176], cellular automata
rules [66.177], gradient rules [66.178, 179], or combi- 
nations of these [66.180]. These approaches naturally 
lead to decentralized implementations, as is desired in 
swarm robotics. 
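
The formulation of reconfiguration as a graph search can be illustrated with a toy model. In the sketch below, a configuration is a set of occupied cells on a 2D square lattice, and a move relocates one module to a free cell adjacent to the remaining modules, provided the remaining modules stay connected. This move model is an assumption made for illustration: it ignores the motion constraints of real modules, and the breadth-first search is only practical for a handful of modules, in line with the combinatorial explosion noted above.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def is_connected(cells):
    """True if the occupied cells form one 4-connected component."""
    cells = set(cells)
    if not cells:
        return True
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in NEIGHBORS:
            nxt = (x + dx, y + dy)
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(cells)


def successors(config):
    """All configurations reachable by relocating a single module."""
    for module in config:
        rest = config - {module}
        if not is_connected(rest):
            continue  # removing this module would split the robot
        free = {(x + dx, y + dy) for (x, y) in rest for dx, dy in NEIGHBORS}
        for target in free - config:
            yield rest | {target}


def plan(start, goal, max_states=100_000):
    """Breadth-first search from the start to the goal configuration."""
    start, goal = frozenset(start), frozenset(goal)
    parent = {start: None}
    queue = deque([start])
    while queue and len(parent) <= max_states:
        config = queue.popleft()
        if config == goal:
            path = []
            while config is not None:
                path.append(config)
                config = parent[config]
            return path[::-1]
        for nxt in successors(config):
            if nxt not in parent:
                parent[nxt] = config
                queue.append(nxt)
    return None  # goal not found within the state budget


if __name__ == "__main__":
    line = {(0, 0), (1, 0), (2, 0)}
    column = {(0, 0), (0, 1), (0, 2)}
    for config in plan(line, column):
        print(sorted(config))
```

Because the search is breadth-first, the returned sequence uses the fewest single-module moves under this move model; heuristic planners trade that guarantee for tractability.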


66.3.2 Tasks 


A range of capabilities have already been demonstrated 
with swarm robotic systems. In the following, a brief 
overview is given. More detailed information is pro- 
vided in Chaps. 71-74 of Part F of this handbook. 
Garnier et al. [66.189] demonstrated how a group of 20 
Alice robots aggregate in a homogeneous environment. 
The robots mimic the aggregation behavior of cock- 
roaches, which are reported to join and leave clusters 
with probabilities that depend on the sizes of clus- 
ters [66.190]. Such probabilistic algorithms have the ad- 
vantage that, as long as the environment is bounded, it 
is not required that the robots initially form a connected 
graph in terms of their sensing and/or communication. 
A deterministic algorithm for aggregation is considered 
in [66.181]. It requires robots to have one binary sen- 
sor, which informs them whether or not there is another 
robot in their line of sight. The robots do not need mem- 
ory and do not need to perform arithmetic computation. 
They rotate on the spot when they perceive another 
robot, and move backward along a circular trajectory 
otherwise. This algorithm was validated with groups of 
40 e-puck robots (Fig. 66.3a). 
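
The deterministic aggregation controller described above is simple enough to be written as a lookup from the binary sensor reading to a pair of wheel speeds. The sketch below assumes a differential-drive robot; the speed values, the turning direction, and the axle length are placeholders and not the parameters reported in [66.181].

```python
import math


def wheel_speeds(sees_robot: bool):
    """Memoryless controller: (left, right) wheel speeds, normalized."""
    if sees_robot:
        return (1.0, -1.0)  # rotate on the spot (direction is assumed)
    return (-0.7, -1.0)  # move backward along a circular arc (assumed values)


def step(pose, sees_robot, axle=1.0, dt=0.1):
    """Advance a differential-drive pose (x, y, heading) by one time step."""
    x, y, heading = pose
    left, right = wheel_speeds(sees_robot)
    linear = (left + right) / 2.0
    angular = (right - left) / axle
    return (
        x + linear * math.cos(heading) * dt,
        y + linear * math.sin(heading) * dt,
        heading + angular * dt,
    )


if __name__ == "__main__":
    pose = (0.0, 0.0, 0.0)
    for reading in (False, False, True, True):
        pose = step(pose, reading)
        print(pose)
```

The controller uses no state and no arithmetic beyond the lookup, which matches the minimal requirements stated in the text; the kinematics in `step` belong to the simulation, not to the robot.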

Werfel et al. [66.116] developed a system of robots 
that can simultaneously construct and navigate struc- 
tures from a supply of building blocks (Fig. 66.3b). The 
robots are inspired by termites, which use stigmergic 
rules to construct sophisticated structures, in particular, 
the mounds they inhabit. Given a desired target struc- 
ture, it is possible to generate automatically a set of 
rules to be uploaded onto each robot. Using only local 
information, these rules allow the robots to coordinate 
their activities in a way that avoids conflict. A group of 
three robots constructed several structures, one resem- 
bling a castle. 

Halloy et al. [66.182] showed that hybrid societies 
comprising both cockroaches and robots can collec- 
tively decide to aggregate under either of two shel- 
ters (Fig. 66.3c) and that it is possible for the robots 
to influence the decision-making process. In general, 
such interactive robots could be used to study and 
control animal groups [66.182, 191], including live- 
stock [66.192, 193], and to inform ecological conserva- 
tion policy. 


Following the pioneering simulation works on 
boids [66.30], Turgut et al. [66.183] demonstrated how 
a group of robots can flock through a real environment 
using simple rules. To align with each other, the robots 
used virtual heading sensors, each comprising a digital 
compass and a wireless communication module. Flock- 
ing was demonstrated with 9 Kobot robots in a bounded 
environment (Fig. 66.3d). 
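
To make the alignment step concrete, the following sketch shows one generic way a robot could use heading values received from its neighbors. It is not the rule used in [66.183], whose details are not given here; the averaging scheme, the gain, and the function names are assumptions. Headings are averaged as unit vectors so that angles near the wrap-around at plus or minus pi are handled correctly.

```python
import math


def mean_heading(headings):
    """Circular mean of headings (radians), computed via unit vectors."""
    x = sum(math.cos(h) for h in headings)
    y = sum(math.sin(h) for h in headings)
    return math.atan2(y, x)


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def align(own_heading, neighbour_headings, gain=0.5):
    """Turn part of the way toward the mean heading of self and neighbors."""
    target = mean_heading([own_heading] + list(neighbour_headings))
    return wrap(own_heading + gain * wrap(target - own_heading))


if __name__ == "__main__":
    headings = [0.0, 1.0, 2.0]
    for _ in range(20):
        headings = [
            align(h, headings[:i] + headings[i + 1:])
            for i, h in enumerate(headings)
        ]
    print(headings)
```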

Krieger et al. [66.184] studied algorithms that al-
low a group of robots to forage (Fig. 66.3e). The robots
rested in a central place, the nest. A robot would leave 
the nest if the total energy of the colony dropped below 
a threshold. Each robot had its own threshold, which ef- 
fectively enabled the division of labor within the group. 
In addition, a robot would leave the nest when being 
recruited by another robot that had found a cluster of 
food. The pair of robots would then perform a tandem 
run to reach the cluster. The algorithms were tested on 
groups of up to 12 Khepera robots. The groups were 
reported to perform more efficiently when employing 
the division of labor and recruitment mechanisms than 
without such mechanisms. 
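
The nest-leaving rule described above can be written down directly. In the sketch below, the threshold values and the energy scale are hypothetical; only the rule itself (leave when the colony energy falls below the robot's own threshold, or when recruited) follows the description in the text.

```python
def should_leave_nest(colony_energy, threshold, recruited=False):
    """A robot leaves if recruited or if colony energy is below its threshold."""
    return recruited or colony_energy < threshold


def active_foragers(colony_energy, thresholds):
    """Indices of the robots that would leave the nest at this energy level."""
    return [i for i, t in enumerate(thresholds) if colony_energy < t]


if __name__ == "__main__":
    thresholds = [0.2, 0.5, 0.8]  # hypothetical per-robot thresholds
    for energy in (0.9, 0.6, 0.3, 0.1):
        print(energy, active_foragers(energy, thresholds))
```

Because the thresholds differ, the number of active foragers grows gradually as the colony energy drops, rather than all robots leaving at once.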

Groß et al. demonstrated how a group of 16
s-bot robots self-assemble into a single composite en-
tity [66.185]. The process was seeded by one of the
robots activating its light emitting diode (LED) ring
in red. Other robots activated their LED rings in blue.
Once a robot connected to the seed structure, it
switched its LED ring to red too, thereby attracting other
robots to the structure as it grew (Fig. 66.3f). The problem of self-
assembling into arbitrary morphologies of s-bot robots
was considered in [66.194].

Holland and Melhuish [66.186] studied algorithms 
that allow groups of robots to sort (and cluster) ob- 
jects of different types (Fig. 66.3g). Six robots were 
programmed using simple rules, which regulated the 
conditions under which objects of different types were 
picked up and deposited. 

Following the pioneering work of Kube
et al. [66.195, 196], Chen et al. [66.187] proposed
an algorithm for a group of robots to transport ob- 
jects larger than themselves toward a goal location 
(Fig. 66.3h). The robots were programmed to only 
push the object across the portion of its surface where 
the direct line of sight to the goal is occluded by the 
object. The algorithm was proven to work for objects 
of arbitrary convex shape and it was tested with 20 
e-puck robots. 
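
The occlusion test at the heart of this strategy can be sketched geometrically. The example assumes a circular object (one particular convex shape) and a robot that knows the positions involved, which is a simplification: in the strategy described above the robot only needs to perceive whether it can see the goal. The action taken when the goal is visible is labeled here as "reposition", which is an assumption.

```python
import math


def goal_occluded(robot, goal, center, radius):
    """True if the segment robot->goal passes through the circular object."""
    px, py = robot
    dx, dy = goal[0] - px, goal[1] - py
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return False
    # Parameter of the point on the segment closest to the object center.
    t = ((center[0] - px) * dx + (center[1] - py) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    nearest = (px + t * dx, py + t * dy)
    return math.hypot(center[0] - nearest[0], center[1] - nearest[1]) < radius


def action(robot, goal, center, radius):
    """Push only from positions where the object hides the goal."""
    if goal_occluded(robot, goal, center, radius):
        return "push"
    return "reposition"


if __name__ == "__main__":
    goal, center, radius = (10.0, 0.0), (0.0, 0.0), 1.0
    for robot in [(-1.1, 0.0), (0.0, 1.1), (1.1, 0.0)]:
        print(robot, action(robot, goal, center, radius))
```

Robots behind the object, as seen from the goal, push, and their combined pushes move the object roughly toward the goal without any explicit communication.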

Ijspeert et al. [66.188] studied an algorithm that al-
lows a group of robots to pull sticks out of the ground
collaboratively (Fig. 66.3i). Upon encountering a stick,
a robot would only be able to pull it partially out of
the ground. It would then wait for a second robot to
arrive and pull the stick out completely. The optimal
waiting time for the first robot was derived from an an-
alytic model of the system. The algorithm was validated
using a system of six Khepera robots.

Fig. 66.3a-i Examples of capabilities demonstrated by swarm robotic systems: (a) aggregation (after [66.181]); (b)
construction (after [66.116]; reprinted with permission from AAAS); (c) decision making (after [66.182]; photo courtesy
of J. Halloy, Université Libre de Bruxelles); (d) flocking (after [66.183]; photo courtesy of E. Sahin, Middle East Tech-
nical University); (e) foraging (after [66.184]; photo courtesy of L. Keller, University of Lausanne); (f) self-assembly
(after [66.185]); (g) sorting of objects (after [66.186]; photo courtesy of C. Melhuish, Bristol Robotics Laboratory); (h)
transport of objects (after [66.187]); (i) pulling sticks out of the ground (after [66.188]; reprinted with permission from
Springer)


66.4 Research Challenges


Research challenges concerning the use of swarm intel-
ligence in optimization mainly concern efficiency.
More specifically, in addition to pro-
viding an innovative way of problem solving, swarm
intelligence approaches must also be efficient with
respect to, for example, computation time in order to
compete with state-of-the-art optimization tech-
niques. This may often be achieved by hybridizing
swarm intelligence approaches with components taken
from optimization algorithms in other fields such as
operations research. The interested reader
may find various references to such techniques
in [66.197].

With regard to swarm robotics, a major challenge
is the transition from systems operating in structured
indoor environments, as typically found in laborato-
ries, to the more complex environments found in the
real world. Over the next decades, swarms of robots
are expected to have an impact in a range of application
scenarios, including cognitive factories, deep sea ex-
ploration, disaster management, precision farming, and
space systems. Working toward more complex environ-
ments also concerns the ability of swarms of robots to
interact safely with humans. Another challenge con-
cerns the miniaturization of swarm robotic systems.
Most of the current systems comprise centimeter-
sized robots. The swarm robotics approach, however,
should be equally applicable to intelligent autonomous
devices operating at scales from a millimeter down to
a micrometer. This could have profound implications,
for example, on advanced materials and healthcare
technologies.


References


66.1 E. Bonabeau, M. Dorigo, G. Theraulaz: Swarm 
Intelligence: From Natural to Artificial Systems 
(Oxford Univ. Press, New York 1999) 

66.2 J. Kennedy, R.C. Eberhart, Y. Shi: Swarm Intelli- 
gence (Morgan Kaufmann, San Francisco 2001) 

66.3 C. Blum, D. Merkle (Eds.): Swarm Intelligence: 
Introduction and Applications (Springer, Berlin, 
Heidelberg 2008) 

66.4 S.J. Russell, P. Norvig: Artificial Intelligence. 
A Modern Approach (Simon Schuster Co., Engle- 
wood Cliffs 1995) 

66.5 G. Beni: The concept of cellular robotic systems, 
Proc. 3rd IEEE Int. Symp. Intell. Syst., Piscataway 
(1988) pp. 57-62 

66.6 G. Beni, J. Wang: Swarm intelligence, Proc. 7th 
Annu. Meet. Robot. Soc. Japan, RSJ, Tokyo (1989) 
pp. 425-428 

66.7 G. Beni, S. Hackwood: Stationary waves in cyclic 
swarms, Proc. 1992 IEEE Int. Symp. Intell. Control, 
Los Alamitos (1992) pp. 234-242 

66.8 S. Camazine, J.-L. Deneubourg, N.R. Franks, 
J. Sneyd, G. Theraulaz, E. Bonabeau: Self- 
Organization in Biological Systems (Princeton 
Univ. Press, New Jersey 2001) 

66.9 I. Kant: Critique of Judgement (Hackett, Indi-
anapolis 1987), Translated by W. S. Pluhar

66.10 B. Hölldobler, E.O. Wilson (Eds.): The Ants
(Springer, Berlin, Heidelberg 1990)

66.11 M. Dorigo, T. Stützle: Ant Colony Optimization (MIT,
Cambridge 2004) 

66.12 M. Dorigo: Optimization, Learning and Natural Al- 
gorithms, Ph.D. Thesis (Dipartimento di Elettron- 
ica, Politecnico di Milano, Italy 1992), in Italian 

66.13 M. Dorigo, V. Maniezzo, A. Colorni: Positive feed- 
back as a search strategy, Tech. Rep. 91-016, Di- 
partimento di Elettronica, Politecnico di Milano, 
Italy, 1991 

66.14 M. Dorigo, V. Maniezzo, A. Colorni: Ant System: 
Optimization by a colony of cooperating agents, 
IEEE Trans. Syst. Man Cybern. Part B 26(1), 29-41 
(1996) 

66.15 J.-L. Deneubourg, S. Aron, S. Goss, J.-M. Pasteels: 
The self-organizing exploratory pattern of the ar- 
gentine ant, J. Insect Behav. 3, 159-168 (1990) 

66.16 P.-P. Grassé: La reconstruction du nid et les coor-
dinations interindividuelles chez Bellicositermes
natalensis et Cubitermes sp. La théorie de la stig-
mergie: Essai d'interprétation du comportement
des termites constructeurs, Insectes Soc. 6(1), 41-
80 (1959), in French

66.17 G. Di Caro, M. Dorigo: AntNet: Distributed stig- 
mergetic control for communications networks, 
J. Artif. Intell. Res. 9, 317-365 (1998) 

66.18 K. Socha: ACO for continuous and mixed-variable 
optimization, Lect. Notes Comput. Sci. 3172, 25-36 
(2004) 

66.19 C. Blum, A. Roli: Metaheuristics in combina- 
torial optimization: Overview and conceptual 
comparison, ACM Comput. Surv. 35(3), 268-308 
(2003) 

66.20 F. Glover, G. Kochenberger (Eds.): Handbook of 
Metaheuristics (Kluwer, Boston 2002) 

66.21 H.H. Hoos, T. Stützle: Stochastic Local Search:
Foundations and Applications (Elsevier, Amster- 
dam 2004) 

66.22 F. Glover: Future paths for integer programming 
and links to artificial intelligence, Comput. Oper. 
Res. 13, 533-549 (1986) 

66.23 C.R. Reeves (Ed.): Modern Heuristic Techniques for 
Combinatorial Problems (Wiley, New York 1993) 

66.24 T. Stützle, H.H. Hoos: MAX-MIN Ant System,
Futur. Gener. Comput. Syst. 16(8), 889-914 (2000) 

66.25 M. Dorigo, L.M. Gambardella: Ant colony system: 
A cooperative learning approach to the traveling 
salesman problem, IEEE Trans. Evol. Comput. 1(1), 
53-66 (1997) 

66.26 M. Dorigo, T. Stützle: Ant colony optimization: 
Overview and recent advances. In: Handbook of 
Metaheuristics, ed. by M. Gendreau, J.-Y. Potvin
(Springer, Berlin, Heidelberg 2010) pp. 227-264 

66.27 M. Clerc (Ed.): Particle Swarm Optimization (ISTE, 
Newport Beach 2006) 

66.28 J. Kennedy, R.C. Eberhart: Particle swarm opti- 
mization, Proc. 1995 IEEE Int. Conf. Neural Netw., 
Piscataway, Vol. 4 (1995) pp. 1942-1948 

66.29 Q.-K. Pan, M. Fatih Tasgetiren, Y.-C. Liang: A dis- 
crete particle swarm optimization algorithm for 
the no-wait flowshop scheduling problem, Com- 
put. Oper. Res. 35(9), 2807-2839 (2008) 

66.30 C.W. Reynolds: Flocks, herds and schools: A dis- 
tributed behavioral model, Comput. Graph. 21(4), 
25-34 (1987) 

66.31 F. Heppner, U. Grenander: A stochastic nonlinear
model for coordinated bird flocks. In: The Ubiq-
uity of Chaos, ed. by S. Krasner (AAAS, Washington
DC 1990)

66.32 R. Rucker: Seek! (Four Walls Eight Windows, New
York 1999)

66.33 R. Mendes, J. Kennedy, J. Neves: The fully in-
formed particle swarm: Simpler, maybe better,
IEEE Trans. Evol. Comput. 8(3), 204-210 (2004)

66.34 Y. Shi, R. Eberhart: A modified particle swarm
optimizer, Proc. 1998 IEEE World Congr. Comput.
Intell. (1998) pp. 69-73

66.35 S. Janson, M. Middendorf: A hierarchical parti-
cle swarm optimizer and its adaptive variant, IEEE
Trans. Syst. Man Cybern. Part B Cybern. 35(6), 1272-
1282 (2005)

66.36 M.A. Montes de Oca, T. Stützle, M. Birattari,
M. Dorigo: Frankenstein's PSO: A composite par-
ticle swarm optimization algorithm, IEEE Trans.
Evol. Comput. 13(5), 1120-1132 (2009)

66.37 M. Dorigo, M.A. Montes de Oca, A. Engelbrecht:
Particle swarm optimization, Scholarpedia 3(11),
1486 (2008)

66.38 D. Karaboga, B. Basturk: A powerful and efficient
algorithm for numerical function optimization:
artificial bee colony (ABC) algorithm, J. Glob. Op-
tim. 39(3), 459-471 (2007)

66.39 D. Karaboga, B. Basturk: On the performance of
artificial bee colony (ABC) algorithm, Appl. Soft
Comput. 8(1), 687-697 (2008)

66.40 Q.-K. Pan, M.F. Tasgetiren, P.N. Suganthan,
T.J. Chua: A discrete artificial bee colony algo-
rithm for the lot-streaming flow shop scheduling
problem, Inf. Sci. 181(12), 2455-2468 (2011)

66.41 F.J. Rodriguez, C. Garcia-Martinez, C. Blum,
M. Lozano: An artificial bee colony algorithm
for the unrelated parallel machines scheduling
problem, Lect. Notes Comput. Sci. 7492, 143-152
(2012)

66.42 D. Karaboga, B. Gorkemli, C. Ozturk, N. Karaboga:
A comprehensive survey: Artificial bee colony
(ABC) algorithm and applications, Artif. Intell.
Rev. 42, 21-57 (2014)

66.43 E.O. Wilson: The relation between caste ratios and
division of labour in the ant genus Pheidole, Be-
hav. Ecol. Sociobiol. 16(1), 89-98 (1984)

66.44 G. Theraulaz, E. Bonabeau, J.-L. Deneubourg:
Response threshold reinforcement and division
of labour in insect societies, Proc. Biol. Sci.
265(1393), 327-332 (1998)

66.45 E. Bonabeau, G. Theraulaz, J.-L. Deneubourg:
Fixed response thresholds and the regulation of
division of labor in insect societies, Bull. Math.
Biol. 60, 753-807 (1998)

66.46 S. Nouyan, R. Ghizzioli, M. Birattari, M. Dorigo: An
insect-based algorithm for the dynamic task al-
location problem, Künstl. Intell. 4, 25-31 (2005)

66.47 M. Sasabe, N. Wakamiya, M. Murata, H. Miyahara:
Effective methods for scalable and continuous
media streaming on peer-to-peer networks, Eur.
Trans. Telecommun. 15, 549-558 (2004)


66.48 X. Yu, B. Ram: Bio-inspired scheduling for dy-
namic job shops with flexible routing and
sequence-dependent setups, Int. J. Prod. Res.
44(22), 4793-4813 (2006)

66.49 D. Merkle, M. Middendorf, A. Scheidler: Self-
organized task allocation for computing systems
with reconfigurable components, Proc. 20th Int.
Parallel Distrib. Proc. Symp., IPDPS 2006 (2006)
p. 8

66.50 R. Klazar, A.P. Engelbrecht: Dynamic load balanc-
ing inspired by division of labour in ant colonies,
Proc. 2011 IEEE Symp. Swarm Intell., SIS (2011)
pp. 1-8

66.51 J.-L. Deneubourg, S. Goss, N. Franks, A. Sendova-
Franks, C. Detrain, L. Chrétien: The dynamics of
collective sorting: Robot-like ants and ant-like
robots, Proc. 1st Int. Conf. Simul. Adapt. Behav.:
From Animals to Animats 1, SAB 91 (MIT, Cambridge
1991) pp. 356-365

66.52 E.D. Lumer, B. Faieta: Diversity and adaptation in
populations of clustering ants, Proc. 3rd Int. Conf.
Simul. Adapt. Behav.: From Animals to Animats 3,
SAB 94, ed. by D. Cliff, P. Husbands,
J.-A. Meyer, S.W. Wilson (MIT, Cambridge 1994) pp. 501-508

66.53 J. Handl, J. Knowles, M. Dorigo: Ant-based clus-
tering and topographic mapping, Artif. Life 12(1),
35-62 (2006)

66.54 R. Klazar, A.P. Engelbrecht: Dynamic load bal-
ancing inspired by cemetery formation in ant
colonies, Lect. Notes Comput. Sci. 7461, 236-243
(2012)

66.55 R.E. Mirollo, S.H. Strogatz: Synchronization of
pulse-coupled biological oscillators, SIAM J. Appl.
Math. 50(6), 1645-1662 (1990)

66.56 Y.-W. Hong, A. Scaglione: A scalable synchroniza-
tion protocol for large scale sensor networks and
its applications, IEEE J. Sel. Areas Commun. 23(5),
1085-1099 (2005)

66.57 O. Babaoglu, T. Binci, M. Jelasity, A. Montre-
sor: Firefly-inspired heartbeat synchronization in
overlay networks, Proc. SASO 2007 - 1st Int. Conf.
Self-Adapt. Self-Organ. Syst. (2007) pp. 77-86

66.58 J. Jumadinova, P. Dasgupta: Firefly-inspired syn-
chronization for improved dynamic pricing in on-
line markets, Proc. SASO 2008 - 2nd IEEE Int. Conf.
Self-Adapt. Self-Organ. Syst. (2008) pp. 403-412

66.59 X.S. Yang: Nature Inspired Metaheuristic Algo-
rithms (Luniver, UK 2010)

66.60 G.K. Jati, S. Suyanto: Evolutionary discrete firefly
algorithm for travelling salesman problem, Lect.
Notes Comput. Sci. 6943, 393-403 (2011)

66.61 C.-R. Wang, C.-L. Zhou, J.-W. Ma: An improved
artificial fish-swarm algorithm and its applica-
tion in feed-forward neural networks, Proc. 2005
Int. Conf. Mach. Learn. Cybern., Vol. 5 (2005)
pp. 2890-2894

66.62 M. Jiang, Y. Wang, S. Pfletschinger, M.A. Lagunas,
D. Yuan: Optimal multiuser detection with artifi-
cial fish swarm algorithm. In: Advanced Intelli-
gent Computing Theories and Applications. With
Aspects of Contemporary Intelligent Computing
Techniques, Communications in Computer and
Information Science, Vol. 2, ed. by D.-S. Huang,
L. Heutte, M. Loog (Springer, Berlin, Heidelberg
2007) pp. 1084-1093

66.63 M. Jiang, N.E. Mastorakis, D. Yuan, M.A. Lagu-
nas: Image segmentation with improved arti-
ficial fish swarm algorithm, Proc. Eur. Comput.
Conf., Lect. Notes Electr. Eng., Vol. 28, ed. by
N. Mastorakis, V. Mladenov, V.T. Kontargyri (2009)
pp. 133-138

66.64 A.M.A.C. Rocha, T.F.M.C. Martins, E.M.G.P. Fernan-
des: An augmented Lagrangian fish swarm based
method for global optimization, J. Comput. Appl.
Math. 235(16), 4611-4620 (2011)

66.65 C.J.A.B. Filho, F.B. de Lima Neto, A.J.C.C. Lins,
A.I.S. Nascimento, M.P. Lima: A novel search al-
gorithm based on fish school behavior, Proc. SMC
2008 - IEEE Int. Conf. Syst. Man Cybern. (2008)
pp. 2646-2651

66.66 K.D. Wells: The social behaviour of anuran am-
phibians, Anim. Behav. 25, 666-693 (1977)

66.67 I. Aihara, H. Kitahata, K. Yoshikawa, K. Aihara:
Mathematical modeling of frogs' calling behavior
and its possible application to artificial life and
robotics, Artif. Life Robot. 12(1), 29-32 (2008)

66.68
66.69
66.70
66.71
66.72
66.73
66.74
66.75
66.76

S.A. Lee, R. Lister: Experiments in the dynamics of 
phase coupled oscillators when applied to graph 
coloring, ACSC 2008 —- Proc. 31st Australas. Conf. 
Comput. Sci., Darlinghurst (2008) pp. 83-89 

H. Hernandez, C. Blum: Distributed graph color- 
ing: An approach based on the calling behavior of 
Japanese tree frogs, Swarm Intell. 6, 117-150 (2012) 
E. Boneabeau, G. Theraulaz, J.-L. Deneubourg, 
N.-R. Franks, 0. Rafelsberger, J.L. Joly, S. Blanco: 
A model for the emergence of pillars, walls and 
royal chambers in termite nests, Philos. Trans. R. 
Soc. 353(1375), 1561-1576 (1997) 

Z. Mason: Programming with stigmergy: Using 
swarms for construction, Proc. Artif. Life VIII - 8th 
Int. Conf. Artif. Life (2003) pp. 371-374 

R.L. Stewart, R.A. Russell: A distributed feed- 
back mechanism to regulate wall construction by 
a robotic swarm, Adapt. Behav. 14(1), 21-51 (2006) 
A. Grushin, J.A. Reggia: Stigmergic self-assembly 
of prespecified artificial structures in a con- 
strained and continuous environment, J. Integr. 
Comput.-Aided Eng. 13(4), 289-312 (2006) 

E. Bonabeau, S. Guerin, D. Snyers, P. Kuntz, 
G. Theraulaz: Three-dimensional architectures 
grown by simple stigmergic agents, Biosystems 
56(1), 13-32 (2000) 

W.G. Walter: The Living Brain (W. W. Norton, New 
York 1953) 

J. von Neumann: The general and logical theory of 
automata, Cerebral Mechanisms in Behavior: The 
Hixon Symposium, ed. by L.A. Jeffress (Wiley, New 
York 1951) pp. 1-41 


66.77 


66.78 


66.79 


66.80 


66.81 


66.82 


66.83 


66.84 


66.85 


66.86 


66.87 


66.88 


66.89 


66.90 


66.91 


L.S. Penrose, R. Penrose: A self-reproducing ana- 
logue, Nature 179(4571), 1183 (1957) 

H. Jacobson: On models of reproduction, Am. Sci. 
46, 255-284 (1958) 

T. Fukuda, S. Nakagawa: A dynamically reconfig- 
urable robotic system (concept of a system and 
optimal configurations), Proc. 1987 IEEE Int. Conf. 
Ind. Electron. Control Instrum., Piscataway (1987) 
pp. 588-595 

T. Fukuda, S. Nakagawa: Dynamically reconfig- 
urable robotic system, Proc. 1988 IEEE Int. Conf. 
Robot. Autom., Piscataway, Vol. 3 (1988) pp. 1581- 
1586 

T. Fukuda, S. Nakagawa: Approach to the dy- 
namically reconfigurable robotic system, J. Intell. 
Robot. Syst. 1(1), 55-72 (1988) 

J. Wang, G. Beni: Pattern generation in cellular 
robotic systems, Proc. 3rd IEEE Int. Symp. Intell. 
Control (IEEE, Piscataway 1988) pp. 63-69 

G. Beni, J. Wang: Swarm intelligence in cellular 
robotic systems, Proc. NATO Adv. Workshop Robot. 
Biol. Syst., Il Ciocco (1989) pp. 703-712 

G. Beni, J. Wang: Swarm intelligence, Proc. 7th 
Annu. Meet. Robot. Soc. Japan, RSJ, Tokyo (1989) 
pp. 425-428, in Japanese 

T. Fukuda, S. Nakagawa, Y. Kawauchi, M. Buss: Self 
organizing robots based on cell structures — CE- 
BOT, Proc. 1988 IEEE Int. Workshop Intell. Robot., 
Piscataway (1988) pp. 145-150 

H. Asama, A. Matsumoto, Y. Ishida: Design of an 
autonomous and distributed robot system: AC- 
TRESS, Proc. 1989 IEEE/RSJ Int. Workshop Intell. 
Robot. Syst., Piscataway (1989) pp. 283-290 

P. Caloud, W. Choi, J.-C. Latombe, C. Le Pape, 
M. Yim: Indoor automation with many mo- 
bile robots, Proc. 1990 IEEE Int. Workshop Intell. 
Robot. Syst., Piscataway, Vol. 1 (1990) pp. 67-72 
M. Rubenstein, A. Cornejo, R. Nagpal: Pro- 
grammable self-assembly in a thousand-robot 
swarm, Science 345, 795-799 (2014) 

J. McLurkin, A.J. Lynch, S. Rixner, T.W. Barr, 
A. Chou, K. Foster, S. Bilstein: A low-cost multi- 
robot system for research, teaching, and out- 
reach, Proc. 10th Int. Symp. Distrib. Auton. Robot. 
Syst. (DARS 2010), ed. by A. Martinoli, F. Mon- 
dada, N. Correll, G. Mermoud, M. Egerstedt, M. Ani 
Hsieh, L.E. Parkes, K. Støy (2010) pp. 597-609 

T. Schmickl, R. Thenius, C. Möslinger, J. Timmis, 
A. Tyrrell, M. Read, J. Hilder, J. Halloy, A. Campo, 
C. Stefanini, L. Manfredi, S. Orofino, S. Kernbach, 
T. Dipper, D. Sutantyo: Cocoro: The self-aware un- 
derwater swarm, 2011 Firth IEEE Conf. Self-Adapt. 
Self-Organ. Syst. Workshop, SASOW (2011) pp. 120- 
126 

M. Dorigo, D. Floreano, L.M. Gambardella, F. Mon- 
dada, S. Nolfi, T. Baaboura, M. Birattari, M. Bo- 
nani, M. Brambilla, A. Brutschy, D. Burnier, 
A. Campo, A.L. Christensen, A. Decugniére, G.A. Di 
Caro, F. Ducatelle, E. Ferrante, A. Förster, J. Guzzi, 


1305 


99 | 4 Hed 


1306 Part F 


Swarm Intelligence 


99 | 4 Hed 


66.92 


66.93 


66.94 


66.95 


66.96 


66.97 


66.98 


66.99 


66.100 


66.101 


66.102 


66.103 


66.104 


V. Longchamp, S. Magnenat, J. Martinez Gonza- 
lez, N. Mathews, M.A. de Montes Oca, R. O'Grady, 
C. Pinciroli, G. Pini, P. Rétornaz, J. Roberts, 
V. Sperati, T. Stirling, A. Stranieri, T. Stuetzle, 
V. Trianni, E. Tuci, A.E. Turgut, F. Vaussard: Swar- 
manoid: A novel concept for the study of hetero- 
geneous robotic swarms, IEEE Robot. Autom. Mag. 
20(4), 60-71 (2013) 

K. Gilpin, A. Knaian, D. Rus: Robot pebbles: One 
centimeter modules for programmable matter 
through self-disassembly, IEEE Int. Conf. Robot. 
Auton., ICRA (2010) pp. 2485-2492 

R. Vaughan: Massively multi-robot simulation in 
stage, Swarm Intell. 2(2—4), 189-208 (2008) 

R. Fitch, Z. Butler: Million module march: Scalable 
locomotion for large self-reconfiguring robots, 
Int. J. Robot. Res. 27(3-4), 331-343 (2008) 

M.P. Ashley-Rollman, P. Pillai, M.L. Goodstein: 
Simulating multi-million-robot ensembles, Proc. 
2011 IEEE Int. Conf. Robotic. Autom. (2011) 
pp. 1006-1013 

F. Mondada, E. Franzi, A. Guignard: The De- 
velopment of Khepera, Exp. Mini-Robot Khep- 
era, Proc. Ist Int. Khepera Workshop (HNI- 
Verlagsschriftenreihe, Heinz Nixdorf Institut 1999) 
pp. 7-14 

F. Mondada, M. Bonani, X. Raemy, J. Pugh, 
C. Cianci, A. Klaptocz, S. Magnenat, J.-C. Zufferey, 
D. Floreano, A. Martinoli: The e-puck, a robot de- 
signed for education in engineering, Proc. 9th 
Conf. Mobile Robot. Compet., ROBOTICA 2009, 
Castelo Branco (2009) pp. 59-65 

N. Kottege, U.R. Zimmer: Underwater acoustic lo- 
calization for small submersibles, J. Field Robot. 
28(1), 40-69 (2011) 

J.F. Roberts, T. Stirling, J.-C. Zufferey, D. Floreano: 
3-D relative positioning sensor for indoor flying 
robots, Auton. Robot. 33(1-2), 5-20 (2012) 

A. Kushleyev, D. Mellinger, C. Powers, V. Kumar: 
Towards a swarm of agile micro quadrotors, Au- 
ton. Robot. 35(4), 287-300 (2013) 

L. Chaimowicz, V. Kumar: Aerial shepherds: Co- 
ordination among UAVs and swarms of robots, 
7th Int. Symp. Distrib. Auton. Robot. Syst. (2004) 
pp. 23-25 

F. Mondada, L.M. Gambardella, D. Floreano, 
S. Nolfi, J.-L. Deneubourg, M. Dorigo: The coop- 
eration of swarm-bots: Physical interactions in 
collective robotics, IEEE Robot. Autom. Mag. 12(2), 
21-28 (2005) 

S. Kernbach, E. Meister, F. Schlachter, K. Jebens, 
M. Szymanski, J. Liedke, D. Laneri, L. Winkler, 
T. Schmickl, R. Thenius, P. Corradi, L. Ricotti: Sym- 
biotic robot organisms: REPLICATOR and SYMBRION 
projects, Proc. 8th Workshop Perform. Metr. Intell. 
Syst. (2008) pp. 62-69 

H. Wei, Y. Chen, J. Tan, T. Wang: Sambot: A self- 
assembly modular robot system, IEEE/ASME Trans. 
Mechatr. 16(4), 745-757 (2011) 


66.105 


66.106 


66.107 


66.108 


66.109 


66.110 


66.111 


66.112 


66.113 


66.114 


66.115 


66.116 


66.117 


66.118 


66.119 


66.120 


66.121 


M. Yim, W.-M. Shen, B. Salemi, D. Rus, M. Moll, 
H. Lipson, E. Klavins, G.S. Chirikjian: Modular self- 
reconfigurable robot systems, IEEE Robot. Autom. 
Mag. 14(1), 43-52 (2007) 

K. Støy, D. Brandt, D.J. Christensen: Self- 
Reconfigurable Robots: An Introduction (MIT, 
Cambridge 2010) 

G.M. Whitesides, B. Grzybowski: Self-assembly at 
all scales, Science 295(5564), 2418-2421 (2002) 
G.M. Whitesides, M. Boncheva: Beyond 
molecules: Self-assembly of mesoscopic and 
macroscopic components, Proc. Natl. Acad. Sci. 
USA 99(8), 4769-4774 (2002) 

R. Groß, M. Dorigo: Self-assembly at the macro- 
scopic scale, Proc. IEEE 96(9), 1490-1508 (2008) 
G. Dudek, M. Jenkin, E. Milios: A taxonomy of 
multirobot systems. In: Robot teams: From Diver- 
sity to Polymorphism, ed. by T. Balch, L.E. Parker 
(A. K. Peters, Natick 2002) pp. 3-22 

Y.U. Cao, A.S. Fukunaga, A.B. Kahng: Cooperative 
mobile robotics: Antecedents and directions, Au- 
ton. Robot. 4(1), 7-27 (1997) 

S. Goss, J.-L. Deneubourg: Harvesting by a group 
of robots, Proc. 1st Eur. Conf. Artif. Life (MIT, Cam- 
bridge 1992) pp. 195-204 

R. Beckers, 0. Holland, J.-L. Deneubourg: From 
local actions to global tasks: Stigmergy and col- 
lective robotics, Proc. 4th Int. Workshop Synth. 
Simul. Living Syst. (Artificial Life IV) (MIT, Cam- 
bridge 1994) pp. 181-189 

A. Martinoli: Swarm Intelligence in Autonomous 
Collective Robotics: From Tools to the Analysis and 
Synthesis of Distributed Control Strategies, Ph.D. 
Thesis (EPFL, Lausanne, Switzerland 1999) 

I.A. Wagner, Y. Altshuler, V. Yanovski, A.M. Bruck- 
stein: Cooperative cleaners: A study in ant 
robotics, Int. J. Robot. Res. 27(1), 127-151 (2008) 

J. Werfel, K. Petersen, R. Nagpal: Designing 
collective behavior in a termite-inspired robot 
construction team, Science 343(6172), 754-758 
(2014) 

D.J. Stilwell, J.S. Bay: Toward the development 
of a material transport system using swarms of 
ant-like robots, Proc. 1993 IEEE Int. Conf. Robot. 
Autom. 1 (1993) pp. 766-771 

S. Sen, M. Sekaran, J. Hale: Learning to coor- 
dinate without sharing information, Proc. 12th 
Natl. Conf. Artif. Intell., AAAI'94, Menlo Park (1994) 
pp. 426-431 

R. Groß, M. Dorigo: Evolution of solitary and group 
transport behaviors for autonomous robots ca- 
pable of self-assembling, Adapt. Behav. 16(5), 
285-305 (2008) 

M.R.A. Chance, J.J. Clifford: Social Groups of Mon- 
keys, Apes and Men (E. P. Dutton, New York 1970) 
I.D. Couzin, J. Krause, N.R. Franks, S.A. Levin: 
Effective leadership and decision-making in an- 
imal groups on the move, Nature 433(7025), 513- 
516 (2005) 


Swarm Intelligence in Optimization and Robotics 


References 


66.122 


66.123 


66.124 


66.125 


66.126 


66.127 


66.128 


66.129 


66.130 


66.131 


66.132 


66.133 


66.134 


66.135 


66.136 


66.137 


R.C. Arkin: Cooperation without communica- 
tion: Multiagent schema-based robot navigation, 
J. Robot. Syst. 9(3), 351-364 (1992) 

M.J. Matarić: Designing emergent behaviors: From 
local interactions to collective intelligence, Proc. 
2nd Int. Conf. Simul. Adapt. Behav. (MIT, Cam- 
bridge 1992) pp. 432-441 

M.J. Matarić: Minimizing complexity in con- 
trolling a mobile robot population, Proc. 1992 
IEEE Int. Conf. Robot. Autom., Piscataway (1992) 
pp. 830-835 

Y. Kuniyoshi, N. Kita, S. Rougeaux, S. Sakane, 
M. Ishii, M. Kakikura: Cooperation by observation: 
The framework and basic task patterns, Proc. 1994 
IEEE Int. Conf. Robot. Autom., Piscataway (1994) 
pp. 767-774 

B.P. Gerkey, M.J. Matari¢: Sold!: Auction methods 
for multirobot coordination, IEEE Trans. Robot. 
Autom. 18(5), 758-768 (2002) 

V. Trianni, M. Dorigo: Self-organisation and com- 
munication in groups of simulated and physical 
robots, Biol. Cybern. 95(3), 213-231 (2006) 

T. Balch, R.C. Arkin: Communication in reactive 
multiagent robotic systems, Auton. Robot. 1(1), 
27-52 (1994) 

M. Rubenstein, K. Payne, P. Will, W.-M. Shen: 
Docking among independent and autonomous 
CONRO self-reconfigurable robots, Proc. 2004 IEEE 
Int. Conf. Robot. Autom. 3 (2004) pp. 2877-2882 
M. Brambilla, E. Ferrante, M. Birattari, M. Dorigo: 
Swarm robotics: A review from the swarm engi- 
neering perspective, Swarm Intell. 7(1), 1-41 (2013) 
R.A. Brooks: Intelligence without representation, 
Artif. Intell. 47(1-3), 139-159 (1991) 

S. Berman, A. Halász, M.A. Hsieh, V. Kumar: Op- 
timized stochastic policies for task allocation in 
swarms of robots, IEEE Trans. Robot. 25(4), 927- 
937 (2009) 

S. Nouyan, R. Groß, M. Bonani, F. Mondada, 
M. Dorigo: Teamwork in self-organized robot 
colonies, IEEE Trans. Evol. Comput. 13(4), 695-711 
(2009) 

W. Liu, A.F.T. Winfield: Modeling and optimization 
of adaptive foraging in swarm robotic systems, 
Int. J. Robot. Res. 29(14), 1743-1760 (2010) 

E. Bonabeau, G. Theraulaz, J.-L. Deneubourg: 
Quantitative study of the fixed threshold model 
for the regulation of division of labour in in- 
sect societies, Proc. R. Soc. B 263(1376), 1565-1569 
(1996) 

G. Theraulaz, E. Bonabeau, J.-L. Deneubourg: Re- 
sponse threshold reinforcements and division of 
labour in insect societies, Proc. R. Soc. B 265(1393), 
327-332 (1998) 

T.H. Labella, M. Dorigo, J.-L. Deneubourg: Divi- 
sion of labor in a group of robots inspired by ants’ 
foraging behavior, ACM Trans. Auton. Adapt. Syst. 
1(1), 4-25 (2006) 


66.138 


66.139 


66.140 


66.141 


66.142 


66.143 


66.144 


66.145 


66.146 


66.147 


66.148 


66.149 


66.150 


66.151 


66.152 


66.153 


66.154 


66.155 


66.156 


L.E. Parker, F. Tang: Building multirobot coalitions 
through automated task solution synthesis, Proc. 
IEEE 94(7), 1289-1305 (2006) 

G.A. Korsah, A. Stentz, M. Bernardine Dias: A com- 
prehensive taxonomy for multi-robot task alloca- 
tion, Int. J. Robot. Res. 32(12), 1495-1512 (2013) 
L.E. Parker: ALLIANCE: An architecture for fault- 
tolerant multi-robot cooperation, IEEE Trans. 
Robot. Autom. 14(2), 220-240 (1998) 

L.E. Parker: Adaptive heterogeneous multi-robot 
teams, Neurocomputing 28(1-3), 75-92 (1999) 

0. Khatib: Real-time obstacle avoidance for ma- 
nipulators and mobile robots, Int. J. Robot. Res. 
5(1), 90-98 (1986) 

J.H. Reif, H. Wang: Social potential fields: 
A distributed behavioral control for autonomous 
robots, Robot. Auton. Syst. 27(3), 171-194 (1999) 
W.M. Spears, D.F. Spears (Eds.): Physicomimet- 
ics: Physics-Based Swarm Intelligence (Springer, 
Berlin, Heidelberg 2011) 

W.M. Spears, D.F. Spears: Distributed, physics- 
based control of swarms of vehicles, Auton. 
Robot. 17(2-3), 137-162 (2004) 

A. Kolling, S. Carpin: Cooperative observation of 
multiple moving targets: An algorithm and its 
formalization, Int. J. Robot. Res. 26(9), 935-953 
(2007) 

V. Gazi, K.M. Passino: Stability analysis of social 
foraging swarms, IEEE Trans. Syst. Man Cybern. 
Part B Cybern. 34(1), 539-557 (2004) 

D. Coore: Botanical Computing: A Developmental 
Approach to Generating Interconnect Topologies 
on an Amorphous Computer, Ph.D. Thesis (MIT, 
Cambridge 1999) 

R. Nagpal: Programmable self-assembly using 
biologically-inspired multiagent control, Proc 1st 
Int. Joint Conf. Auton. Agents Multiagent Syst.: 
Part 1, AAMAS '02, New York (2002) pp. 418-425 

J. Beal, J. Bachrach: Infrastructure for engineered 
emergence on sensor/actuator networks, IEEE In- 
tell. Syst. 21(2), 10-19 (2006) 

H. Abelson, D. Allen, D. Coore, C. Hanson, 
G. Homsy, T.F. Knight Jr., R. Nagpal, E. Rauch, 
G.J. Sussman, R. Weiss: Amorphous computing, 
Commun. ACM 43(5), 74-82 (2000) 

J. Bachrach, J. Beal, J. McLurkin: Composable 
continuous-space programs for robotic swarms, 
Neural Comput. Appl. 19(6), 825-847 (2010) 

R.S. Sutton, A.G. Barto: Reinforcement Learning: 
An Introduction (MIT, Cambridge 1998) 

M.J. Mataric: Reward functions for accelerated 
learning, Proc. 11th Int. Conf. Mach. Learn. (Mor- 
gan Kaufmann, San Francisco 1994) pp. 181-189 
M.J. Matari¢: Learning social behavior, Robot. Au- 
ton. Syst. 20(2-4), 191-204 (1997) 

J. Kober, J.A. Bagnell, J. Peters: Reinforcement 
learning in robotics: A survey, Int. J. Robot. Res. 
32(11), 1238-1274 (2013) 


1307 


99 | 4 Hed 


1308 Part F 


Swarm Intelligence 


99 | 4 Hed 


66.157 


66.158 


66.159 


66.160 


66.161 


66.162 


66.163 


66.164 


66.165 


66.166 


66.167 


66.168 


66.169 


66.170 


I. Harvey, P. Husbands, D. Cliff, A. Thompson, 
N. Jakobi: Evolutionary robotics: The Sussex ap- 
proach, Robot. Auton. Syst. 20(2-4), 205-224 
(1997) 

S. Nolfi, D. Floreano: Evolutionary Robotics — 
The Biology, Intelligence, and Technology of Self- 
Organizing Machines (MIT, Cambridge 2000) 


M. Dorigo, V. Trianni, E. Sahin, R. Groß, 
T.H. Labella, G. Baldassarre, S. Nolfi, J.- 
L. Deneubourg, F. Mondada, D. Floreano, 


L.M. Gambardella: Evolving self-organizing be- 
haviors for a swarm-bot, Auton. Robot. 17(2-3), 
223-245 (2004) 

V. Trianni: Evolutionary Swarm Robotics: Evolv- 
ing Self-Organising Behaviours in Groups of Au- 
tonomous Robots. In: Studies in Computational 
Intelligence, Vol. 108, (Springer, Berlin, Heidelberg 
2008) 

C.W. Reynolds: An evolved, vision-based be- 
havioral model of coordinated group motion, 
From Animals to Animats 2. Proc. 2nd Int. Conf. 
Simul. Adapt. Behav. (SAB92) (MIT, Cambridge 
1993) pp. 384-392 

G.M. Werner, M.G. Dyer: Evolution of herding 
behavior in artificial animals, From Animals to 
Animats 2. Proc. 2nd Int. Conf. Simul. Adapt. Be- 
hav. (SAB92) (MIT, Cambridge 1993) pp. 393-399 
L. Spector, J. Klein, C. Perry, M. Feinstein: Emer- 
gence of collective behavior in evolving popu- 
lations of flying agents, Genet. Program. Evol. 
Mach. 6(1), 111-125 (2005) 

M. Quinn, L. Smith, G. Mayley, P. Husbands: 
Evolving controllers for a homogeneous system 
of physical robots: Structured cooperation with 
minimal sensors, Philos. Trans. R. Soc. A 361(1811), 
2321-2343 (2003) 

R.A. Watson, S.G. Ficici, J.B. Pollack: Embodied 
evolution: Distributing an evolutionary algorithm 
in a population of robots, Robot. Auton. Syst. 
39(1), 1-18 (2002) 

S. Griffith, D. Goldwater, J.M. Jacobson: Self- 
replication from random parts, Nature 437(7059), 
636 (2005) 

A.F.T. Winfield, M. Dincer Erbas: On embodied 
memetic evolution and the emergence of be- 
havioural traditions in robots, Memet. Comput. 
3(4), 261-270 (2011) 

M. Yim: Locomotion with a Unit-Modular Recon- 
figurable Robot, Ph.D. Thesis (Dept. Mech. Eng., 
Stanford Univ., Stanford 1994) 

W.-M. Shen, B. Salemi, P. Will: Hormone-inspired 
adaptive communication and distributed control 
for CONRO self-reconfigurable robots, IEEE Trans. 
Robot. Autom. 18(5), 700-712 (2002) 

K. Stgy: Emergent Control of Self-Reconfigurable 
Robots, Ph.D. Thesis (The Maersk Mc-Kinney 
Moller Institute for Production Technology, Univ. 
Southern Denmark, Denmark 2004) 


66.171 


66.172 


66.173 


66.174 


66.175 


66.176 


66.177 


66.178 


66.179 


66.180 


66.181 


66.182 


66.183 


66.184 


66.185 


G. Chirikjian, A. Pamecha: Evaluating efficiency of 
self-reconfiguration in a class of modular robots, 
J. Robot. Syst. 13(5), 317-338 (1996) 

E. Yoshida, S. Murata, A. Kamimura, K. Tomita, 
H. Kurokawa, S. Kokaji: A motion planning 
method for a self-reconfigurable modular robot, 
Proc. 2001 IEEE/RSJ Int. Conf. Intell. Robot. Syst., 
Piscataway, Vol. 1 (2001) pp. 590-597 

D. Brandt, D.J. Christensen: A new meta-module 
for controlling large sheets of ATRON modules, 
Proc. 2007 IEEE Int. Workshop Intell. Robot. Syst. 
(2007) pp. 2375-2380 

Z. Butler, D. Rus: Distributed planning and control 
for modular robots with unit-compressible mod- 
ules, Int. J. Robotic. Res. 22(9), 699-715 (2003) 

S. Murata, H. Kurokawa, S. Kokaji: Self- 
assembling machine, Proc. 1994 IEEE Int. Conf. 
Robot. Autom. 1 (1994) pp. 441-448 

M.D. Rosa, S. Goldstein, P. Lee, J. Campbell, P. Pil- 
lai: Scalable shape sculpturing via hole motion: 
Motion planning in lattice-constrained modular 
robots, Proc. 2006 IEEE Int. Conf. Robot. Autom. 
(2006) pp. 1462-1468 

Z. Butler, K. Kotay, D. Rus, K. Tomita: Generic 
decentralized control for a class of self- 
reconfigurable robots, Proc. 2002 IEEE Int. 
Conf. Robot. Autom., Piscataway, Vol. 1 (2002) 
pp. 809-816 

K. Hosokawa, T. Tsujimori, T. Fujii, H. Kaetsu, 
H. Asama, Y. Kuroda, |. Endo: Self-organizing col- 
lective robots with morphogenesis in a vertical 
plane, Proc. 1998 IEEE Int. Conf. Robot. Autom., 
Piscataway, Vol. 4 (1998) pp. 2858-2863 

M. Yim, Y. Zhang, J. Lamping, E. Mao: Distributed 
control for 3D metamorphosis, Auton. Robot. 
10(1), 41-56 (2001) 

K. Støy: Using cellular automata and gradients to 
control self-reconfiguration, Robot. Auton. Syst. 
54(2), 135-141 (2006) 

M. Gauci, J. Chen, W. Li, T.J. Dodd, R. Groß: Self- 
organised aggregation without computation, Int. 
J. Robot. Res. 33, 1145-1161 (2014) 

J. Halloy, G. Sempo, G. Caprari, C. Rivault, 
M. Asadpour, F. Tche, I. Sad, V. Durier, S. Canonge, 
J.M. Am, C. Detrain, N. Correll, A. Martinoli, 
F. Mondada, R. Siegwart, J.L. Deneubourg: Social 
integration of robots into groups of cockroaches to 
control self-organized choices, Science 318(5853), 
1155-1158 (2007) 

A.E. Turgut, H. Çelikkanat, F. Gökçe, E. Şahin: 
Self-organized flocking in mobile robot swarms, 
Swarm Intell. 2(2-4), 97-120 (2008) 

M.J.B. Krieger, J.-B. Billeter, L. Keller: Ant-like 
task allocation and recruitment in cooperative 
robots, Nature 406(6799), 992-995 (2000) 

R. Grok, M. Bonani, F. Mondada, M. Dorigo: 
Autonomous self-assembly in swarm-bots, IEEE 
Trans. Robot. 22(6), 1115-1130 (2006) 


Swarm Intelligence in Optimization and Robotics 


References 


66.186 


66.187 


66.188 


66.189 


66.190 


66.191 


0. Holland, C. Melhuish: Stigmergy, self- 
organization, and sorting in collective robotics, 
Artif. Life 5(2), 173-202 (1999) 

J. Chen, M. Gauci, W. Li, A. Kolling, R. Groß: 
Occlusion-based cooperative transport with 
a swarm of miniature mobile robots, IEEE Trans. 
Robotic. (in press) 

A.J. ljspeert, A. Martinoli, A. Billard, L.M. Gam- 
bardella: Collaboration through the exploitation 
of local interactions in autonomous collective 
robotics: The stick pulling experiment, Auton. 
Robot. 11(2), 149-171 (2001) 

S. Garnier, C. Jost, J. Gautrais, M. Asadpour, 
G. Caprari, R. Jeanson, A. Grimal, G. Theraulaz: The 
embodiment of cockroach aggregation behavior 
in a group of micro-robots, Artif. Life 14(4), 387- 
408 (2008) 

R. Jeanson, C. Rivault, J.-L. Deneubourg, 
S. Blanco, R. Fournier, C. Jost, G. Theraulaz: 
Self-organized aggregation in cockroaches, 
Anim. Behav. 69, 169-180 (2005) 

J. Krause, A.F.T. Winfield, J.-L. Deneubourg: In- 
teractive robots in experimental biology, Trends 
Ecol. Evol. 26(7), 369-375 (2011) 


66.192 


66.193 


66.194 


66.195 


66.196 


66.197 


R. Vaughan, N. Sumpter, J. Henderson, A. Frost, 
S. Cameron: Experiments in automatic flock 
control, Robot. Auton. Syst. 31(1/2), 109-117 
(2000) 

A. Gribovskiy, J. Halloy, J.-L. Deneubourg, 
H. Bleuler, F. Mondada: Towards mixed societies 
of chickens and robots, Proc. 2010 IEEE Int. 
Workshop Intell. Robot. Syst. (2010) pp. 4722- 
4728 

A.L. Christensen, R. O'Grady, M. Dorigo: 
Swarmorph-script: A language for arbitrary 
morphology generation in self-assembling 
robots, Swarm Intell. 2(2-4), 143-165 
(2008) 

C.R. Kube, H. Zhang: Collective robotics: From so- 
cial insects to robots, Adapt. Behav. 2(2), 189-218 
(1993) 

C.R. Kube, E. Bonabeau: Cooperative transport by 
ants and robots, Robot. Auton. Syst. 30(1-2), 85- 
101 (2000) 

C. Blum, J. Puchinger, G. Raidl, A. Roli: Hybrid 
metaheuristics in combinatorial optimization: 
A survey, Appl. Soft Comput. 11(6), 4135-4151 
(2011) 


1309 


99 | 4 Hed 


1311 


67. Preference-Based Multiobjective Particle Swarm 
Optimization for Airfoil Design 


Robert Carrese, Xiaodong Li 


A significant challenge to the application of evolutionary multiobjective optimization (EMO) for transonic airfoil design is the often excessive number of computational fluid dynamics (CFD) simulations required to ensure convergence. In this study, a multiobjective particle swarm optimization (MOPSO) framework is introduced, which incorporates designer preferences to provide further guidance in the search. A reference point is projected onto the Pareto landscape by the designer to guide the swarm towards solutions of interest. The framework is applied to a typical transonic airfoil design scenario for robust aerodynamic performance. Time-adaptive Kriging models are constructed based on a high-fidelity Reynolds-averaged Navier-Stokes (RANS) solver to assess the performance of the solutions. The successful integration of these design tools is facilitated through the reference point, which ensures that the swarm does not deviate from the preferred search trajectory. A comprehensive discussion on the proposed optimization framework is provided, highlighting its viability for the intended design problem.

67.1 Airfoil Design
  67.1.1 Airfoil Design Architecture
  67.1.2 Intelligent Optimization: PSO
  67.1.3 Multiobjective Optimization
  67.1.4 Surrogate Modeling
67.2 Shape Parameterization and Flow Solver
  67.2.1 The PARSEC Parameterization Method
  67.2.2 Transonic Flow Solver
67.3 Optimization Algorithm
  67.3.1 The Reference Point Method
  67.3.2 User-Preference Multiobjective PSO: UPMOPSO
  67.3.3 Kriging Modeling
  67.3.4 Reference Point Screening Criterion
67.4 Case Study: Airfoil Shape Optimization
  67.4.1 Pre-Optimization and Variable Screening
  67.4.2 Optimization Results
  67.4.3 Post-Optimization and Trade-Off Visualization
  67.4.4 Final Designs
67.5 Conclusion
References



67.1 Airfoil Design 


Airfoil design originates from an understanding of the 
fundamental physics of flight, where the aim is to 
identify or conform to the best possible shape for the 
given operating requirements. It has evolved from the 
use of wind tunnel catalogs and traditional cut-and- 
try methods to automated computational frameworks. 
While automated frameworks effectively simplify the 
design process, success is still largely dependent on 
the fidelity of the computational methods, as well as 
the experience of the designer in formulating the problem
[67.1]. This section is devoted to a discussion
of airfoil design optimization architecture. The concepts
that are especially applicable to this study are
introduced, laying the foundations for the proposed 
methodology. 


67.1.1 Airfoil Design Architecture 


The direct method of airfoil design, pioneered by the 
work of Hicks and Henne [67.2], refers to the philos- 
ophy of using mathematical optimization methods to 
identify the optimal shape that achieves the prescribed
design criteria. The generalized framework for an aerodynamic
shape optimization process is shown in
Fig. 67.1. The success of the direct approach is essen- 
tially dependent on three main components within the 
design loop: 


• Shape parameterization. All design strategies share
the common requirement that the geometry is rep- 
resented by a finite number of design variables. 
A method to mathematically parameterize shapes 
is, therefore, required so that modifications can be 
made via direct manipulation of the design vari- 
ables. The number of design variables is directly 
proportional to the geometrical degrees of freedom 
and, therefore, governs the dimensionality of the 
problem. 

• Computational flow solver. The objective function
is obtained from the flow solver and it is, therefore, 
up to the discretion of the designer to appropriately 
formulate the objective and constraint functions, 
such that they reflect the design and operating re- 
quirements. The choice of the flow solver ultimately 
governs the overall fidelity and efficiency of the op- 
timization process, since repeated evaluations of the 
objective function are required for each candidate 
shape. 

• Optimization algorithm. The responsibility of the
optimizer is to iteratively determine the shape modi- 
fications required to satisfy the objective, whilst ad- 
hering to any shape or performance constraints. The 
optimizer should be robust and applicable to a wide 
operational spectrum, yet efficient to guarantee con- 
vergence with the least computational expense. 
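
The interplay of the three components above can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions, not the framework used in this chapter: `parameterize`, `flow_solver`, and `satisfies_constraints` are hypothetical stand-ins (the "flow solver" is a cheap quadratic function rather than a CFD code), and the optimizer is a plain random-perturbation search chosen for brevity.

```python
import random


def parameterize(x):
    """Hypothetical shape parameterization: the design variables are used directly."""
    return list(x)


def flow_solver(shape):
    """Stand-in for an expensive flow evaluation: a quadratic 'cost' minimal at 0.3."""
    return sum((s - 0.3) ** 2 for s in shape)


def satisfies_constraints(x, lower=0.0, upper=1.0):
    """Hypothetical box constraints on the design variables."""
    return all(lower <= xi <= upper for xi in x)


def direct_design_loop(n_vars=4, max_iters=2000, step=0.05, tol=1e-6, seed=0):
    rng = random.Random(seed)
    # Initialize a candidate shape and evaluate it once.
    best_x = [rng.random() for _ in range(n_vars)]
    best_f = flow_solver(parameterize(best_x))
    for _ in range(max_iters):
        if best_f <= tol:  # convergence obtained?
            break
        # Perturb the candidate shape through its design variables.
        trial = [xi + rng.gauss(0.0, step) for xi in best_x]
        if not satisfies_constraints(trial):
            continue  # reject shapes that violate the constraints
        f = flow_solver(parameterize(trial))  # one "solver" call per candidate
        if f < best_f:  # keep the perturbation only if it improves the objective
            best_x, best_f = trial, f
    return best_x, best_f


if __name__ == "__main__":
    x_opt, f_opt = direct_design_loop()
    print(x_opt, f_opt)
```

Every accepted or rejected candidate that passes the constraint check costs one solver call, which is why the efficiency of the optimizer dominates the overall expense when the solver is a high-fidelity one.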


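Fig. 67.1 Generalized process flowchart for direct airfoil shape optimization (boxes: define objective function(s); define constraints; geometrical shape parameterization; computational flow solver; initialize candidate shape(s); convergence obtained?; perturb candidate shape(s); optimization; optimal shape)

The integration of high-fidelity flow solvers and
flexible parameterization methods for numerical optimization
is still a computationally challenging and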
intensive undertaking. The extension to multiple objec- 
tives leads to a more generalized problem formulation, 
yet significantly increases the computational cost of 
convergence. While all elements of the design loop in- 
fluence the efficiency of the process, arguably the most 
important element is the optimizer itself. The following 
section introduces the optimization paradigm adopted 
in this study, derived from the field of computational 
swarm intelligence. 


67.1.2 Intelligent Optimization: PSO 


The formation of hierarchies within groups of animals 
is a naturally occurring phenomenon and is simple to 
comprehend. Even humans have the intuitive tendency 
to appoint leaders (e.g., political leaders, military gen- 
erals, etc.). Another interesting phenomenon, which is 
more difficult to perceive, is the self-organized behav- 
ior of groups where a leader cannot be identified. This 
is known as swarming and is evident from the flock- 
ing behavior of birds or fish moving in unison. The 
increasingly cited field of computational swarm intel- 
ligence focuses on the artificial simulation of swarming 
behavior to model a wide range of applications, includ- 
ing optimization [67.3]. 

Particle swarm optimization (PSO) is the stochas- 
tic population-based technique described by Kennedy 
and Eberhart [67.4] in accordance with the principles 
of swarm intelligence. The PSO architecture was de- 
rived from a synthesis of the fields of social psychology 
and engineering optimization. As was eloquently stated 
by the authors in their original paper [67.4]: 


Why is social behavior so ubiquitous in the animal
kingdom? Because it optimizes. What is a good way
to solve engineering optimization problems? Modeling
social behavior.


The dynamics of the swarm are modeled on the 
social-psychological tendency of individuals to learn 
from previous experience and emulate the success of 
others. Similar to most evolutionary techniques, the 
swarm is initialized with a population of random indi- 
viduals (particles) sampled over the design space. The 
particles navigate the multi-dimensional design space 
over a number of iterations or time steps. Each particle 
maintains knowledge of its current position in the de- 
sign space. This is analogous to the fitness concept of 
conventional evolutionary algorithms (EAs). Each par- 
ticle also records its personal best position, which is 
where the particle has experienced the greatest success. 
Aside from recording personal information, each par- 
ticle also tracks the position of other members in the 
swarm. This level of social interaction between particles
is termed the swarm topology. Particles may either
be confined to share information only with their imme- 
diate neighbors, or they may be encouraged to share 
their experiences with the entire swarm. Utilizing this 
information, each particle adjusts its position in the de- 
sign space by accelerating towards the successful areas 
of the design space. The absence of selection is com- 
pensated by this use of leaders to guide the swarm to 
converge to the most successful position. In this way, 
a solution which initially performs poorly may possibly 
be on the future road to success. 
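The dynamics described above can be sketched in a few lines of Python. This is a minimal global-best swarm written under assumptions that are not taken from this chapter: the inertia-weight velocity update, the coefficient values `w`, `c1` and `c2`, the clamping of positions to the bounds, and the `sphere` test function are all illustrative choices.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=20, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best position and value.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # Fully connected topology: one leader shared by the entire swarm.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Accelerate towards the personal best and the swarm leader.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate to the design-space bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = pso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
```

No selection step appears anywhere in the loop: a particle that starts in a poor region is never discarded, it is simply pulled towards the leaders over successive iterations.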

PSO has steadily gained popularity as a global op- 
timization technique [67.3]. Its increasing use in the 
literature is due to its simple and straightforward imple- 
mentation (despite its intricate origins) and its efficient 
and accurate convergence rates [67.5]. 


67.1.3 Multiobjective Optimization 


Airfoil design problems are often characterized by several
interacting or conflicting requirements, which must
be satisfied simultaneously. In the case of an airfoil
operating within the transonic regime, airfoil shape optimization
is performed to limit shock and viscous drag
($C_d$) losses, and reduce shock-induced boundary layer
instability at the design Mach number ($M$) and lift
coefficient ($C_l$). This often occurs at the expense of excessive
pitching moments ($C_m$) due to aft loading and
performance degradation under off-design conditions.
Achieving adequate performance over a wide operational
spectrum requires a search algorithm capable of
handling multiple conflicting objectives.


Let $S \subseteq \mathbb{R}^n$ denote the design space and let
$\mathbf{x} = \{x_1, x_2, \ldots, x_n\} \in S$ denote the decision vector with
lower and upper bounds $\mathbf{x}_{\min}$ and $\mathbf{x}_{\max}$, respectively. The
generic unconstrained multiobjective problem (MOP) is
thus expressed as

$$\min \mathbf{f}(\mathbf{x}) = \{f_1(\mathbf{x}), \ldots, f_m(\mathbf{x})\} , \tag{67.1}$$

where $f_i(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}$ is the i-th component of the objective
vector and $m$ is the number of objectives. The
definition of the optimum must be redefined since in
the presence of conflicting objectives, improvement in
one objective may cause a deterioration in another. It is
often necessary to identify a set of trade-off solutions,
which can all be considered equally optimal. A solution
is termed non-dominated or Pareto optimal (after the
nineteenth century Italian economist Vilfredo Pareto) if
the value of any objective cannot be improved without
deteriorating at least one other objective. The candidate
solutions are defined as $a$ and $b \in S$. The candidate $a$
dominates the candidate $b$ (denoted by $a \prec b$) if

$$\forall i = 1, \ldots, m: \; f_i(a) \le f_i(b) \;\wedge\; \exists j: \; f_j(a) < f_j(b) . \tag{67.2}$$


The concept of dominance is illustrated in Fig. 67.2.
The shaded area denotes the area of objective vectors
dominated by $a$. A decision vector $a^*$ is, therefore, non-dominated
or Pareto optimal if there is no other feasible
decision vector $a \ne a^* \in S$ such that $f(a) \prec f(a^*)$. The
Pareto front is the set of objective vectors which correspond
to all non-dominated solutions. Multiobjective
algorithms aim to identify the closest approximation to
the true Pareto front, while ensuring a diverse Pareto
optimal set.
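The dominance relation of Eq. (67.2) translates directly into code. The following Python sketch assumes minimization of every objective; the function names and the four candidate objective vectors are hypothetical and serve only as an illustration.

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimization)."""
    no_worse = all(x <= y for x, y in zip(fa, fb))
    strictly_better = any(x < y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical bi-objective values (f1, f2) of four candidate designs.
candidates = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = non_dominated(candidates)
```

Here `(3.0, 3.0)` is removed because `(2.0, 2.0)` is better in both objectives, while the remaining three vectors are mutually non-dominated trade-offs.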


Fig. 67.2 Illustration of dominance on a two-objective
landscape




Techniques for Solving MOP 
From a design perspective, the primary aim of mul- 
tiobjective optimization is to obtain Pareto optimal 
solutions which are in the preferred interests of the de- 
signer, or best suit the intended application. Methods 
for solving MOPs are, therefore, characterized by how 
the designer preferences are articulated. As suggested 
by Fonseca and Fleming [67.6], there are three generic 
classes of methods for solving multiobjective problems: 


• A priori methods. The preferences of the designer
are expressed by aggregating the objective functions
into a single scalar through weights or bias, ultimately
making the problem single objective.

• A posteriori methods. The algorithm first identifies
a set of non-dominated solutions, subsequently providing
the designer greater flexibility in selecting
the most appropriate solution.

• Interactive methods. The decision making and optimization
processes occur at interleaved steps, and
the preferences of the designer are interactively refined.


The a priori strategy requires the designer to indi- 
cate the relative importance of each objective before 
performing the optimization. A notable method that 
falls into this category is the weighted aggregation 
method, which is a fairly popular choice for airfoil de- 
sign applications due to its simplicity and capability of 
handling many flight conditions [67.7–9]. Despite its
popularity, there are recognized deficiencies with this 
strategy [67.10]. The prior selection of weights does 
not necessarily guarantee that the final solution will 
faithfully reflect the preferred interests of the designer, 
and varying the weights continuously will not neces- 
sarily result in an even distribution of Pareto optimal 
solutions, nor a complete representation of the Pareto 
front [67.11]. 
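The last deficiency is easy to reproduce numerically. The sketch below is an illustration under stated assumptions, not a result from the cited studies: it uses a hypothetical non-convex front $f_2 = \sqrt{1 - f_1^2}$ (both objectives minimized) and shows that sweeping the weight only ever returns the two end points of that front.

```python
import math

# Hypothetical non-convex Pareto front for a minimization problem:
# f2 = sqrt(1 - f1^2) sampled at 101 points with 0 <= f1 <= 1.
front = [(i / 100.0, math.sqrt(1.0 - (i / 100.0) ** 2)) for i in range(101)]


def weighted_sum_pick(points, w1):
    """Return the point minimizing the aggregate w1*f1 + (1 - w1)*f2."""
    return min(points, key=lambda f: w1 * f[0] + (1.0 - w1) * f[1])


# Sweep the weight over (0, 1); w1 = 0.5 is skipped because both end
# points then give the same aggregate value.
weights = [k / 20.0 for k in range(1, 20) if k != 10]
picks = {weighted_sum_pick(front, w) for w in weights}
```

Every one of the 18 weight settings selects either `(0.0, 1.0)` or `(1.0, 0.0)`, so none of the 99 interior trade-off points is reachable by this aggregation on such a front.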

Alternatively, the a posteriori methods provide maximum
flexibility to the designer to identify the most preferred
solution, at the expense of greater computational
effort. Generally, these methods involve explicitly solving
each objective to obtain a set of non-dominated
solutions, a concept which is ideal for population-based
evolutionary algorithms [67.12–14]. While these methods
are computationally more complex, researchers in
aerodynamic design are realizing the benefits of evolutionary
multiobjective optimization (EMO), especially
if there is a certain ambiguity in selecting the final design
[67.15–17]. However, this approach poses the challenge of
identifying and exploiting the entire Pareto front, which
may be impractical for design applications due to the
excessive number of function evaluations.

While conventional EMO techniques may be com- 
putationally demanding, Fonseca and Fleming [67.12] 
argue that their most attractive aspect is the intermedi- 
ate information generated, which can be exploited by 
the designer to refine preferences and improve conver- 
gence. These interactive methods involve the progres- 
sive articulation of preferences, which originates from 
the multicriteria decision making literature [67.18]. The 
optimization and decision making processes are in- 
terleaved, exploiting the intermediate information pro- 
vided by the optimizer to refine preferences [67.6]. 


Handling Multiple Objectives with PSO 
PSO has been demonstrated to be an effective tool 
for single-objective optimization problems due to its 
fast convergence [67.5]. It has also gained rapid pop- 
ularity in the field of multi-objective optimization 
(MOO) [67.19]. Since PSO is a population-based tech- 
nique, it could ideally be tailored to identify a number 
of trade-off solutions to a MOP in one single run, sim- 
ilar to EMO techniques. Comprehensive surveys on 
extending PSO to handle multiple objectives have been 
provided by Engelbrecht [67.20], and more recently by 
Sierra and Coello Coello [67.19]. It was established that 
the primary ambiguity in specifically tailoring PSO to 
handle multiple objectives was the selection of guides 
for each particle to avoid convergence to a single so- 
lution. The selection process for particle leaders must, 
therefore, be restructured, to encourage search diversity 
and to ensure that non-dominated solutions found dur- 
ing the search are maintained. 

Initial attempts to design a multiobjective particle 
swarm optimization (MOPSO) algorithm were motivated
by the archive strategy of [67.21]. Coello Coello
and Lechuga [67.22] incorporated the concept of Pareto 
dominance in PSO by maintaining two independent 
populations: the particle swarm and the elitist archive. 
Non-dominated solutions are stored in the archive and 
subsequently used as neighborhood leaders. The objec- 
tive space is separated into hypercubes, which serve 
as a particle anti-clustering mechanism. Solutions in 
sparsely populated hypercubes have a higher selection 
pressure to be leaders, and solutions in densely pop- 
ulated hypercubes are removed if the archive limit is 
exceeded. This initial approach was later extended by 
Mostaghim and Teich [67.23], who studied the concept 
of ε-dominance and compared it to existing clustering
techniques for fixing the archive size, with favorable re- 
sults. 




Fieldsend and Singh [67.24] addressed the compu- 
tational complexity of maintaining a restricted archive, 
by incorporating the dominated tree method. This data 
structure allows for an unrestricted archive size, which 
interacts with the population to define global leaders. 
A turbulence operator (similar to the concept of mu- 
tation in EA) was also implemented, where swarm 
members were randomly displaced on the design space 
to reduce the probability of premature stagnation. In 
the non-dominated sorting particle swarm optimization 
(NSPSO) algorithm of Li [67.25], the non-dominated 
sorting mechanisms of non-dominated sorting genetic 
algorithm (NSGA-II) are incorporated. The popula- 
tion and the personal best position of each particle are 
consolidated to form one single population, and the 
non-dominated sorting scheme is utilized to rank each 
solution. Global guides are selected based on particle 
clustering, where a niching or crowding distance met- 
ric is used to further classify non-dominated solutions. 
Li later proposed the maximinPSO algorithm [67.26], 
which does not use any niching method to maintain 
diversity. 

Sierra and Coello Coello [67.27] proposed an elitist
archive incorporating the ε-dominance strategy to
maintain global leaders for the swarm. A crowding distance
operator is employed to classify non-dominated
solutions and maintain uniformity. The crowding distance
operator is also used to limit the number of
candidate leaders after each population update, simplifying
the mechanism to control the set of candidate
leaders. A turbulence operator is implemented to encourage
diversity, whereby particles are randomly mutated.
A similar approach by [67.28] was developed in
parallel (although this method does not implement ε-dominance),
where the crowding distance was used to
both define the global guides and truncate the size of the
archive. The proposed algorithm is primarily influenced
by the latter two studies.
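A crowding distance operator of the kind referred to above can be sketched as follows. The code follows the commonly used NSGA-II formulation (boundary solutions receive an infinite distance, interior solutions accumulate the normalized gap between their neighbors in each objective); it is an assumption that the cited algorithms use exactly this form, and the sample front is hypothetical.

```python
def crowding_distance(front):
    """Crowding distance of each objective vector in a non-dominated set."""
    n = len(front)
    if n == 0:
        return []
    n_obj = len(front[0])
    dist = [0.0] * n
    for k in range(n_obj):
        # Indices of the solutions sorted by the k-th objective.
        order = sorted(range(n), key=lambda i: front[i][k])
        f_min, f_max = front[order[0]][k], front[order[-1]][k]
        # Boundary solutions are always retained.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if f_max == f_min:
            continue
        for j in range(1, n - 1):
            gap = front[order[j + 1]][k] - front[order[j - 1]][k]
            dist[order[j]] += gap / (f_max - f_min)
    return dist


# Hypothetical front with one crowded and one isolated interior solution.
front = [(0.0, 1.0), (0.1, 0.8), (0.5, 0.5), (1.0, 0.0)]
distances = crowding_distance(front)
```

A larger value marks a solution in a sparsely populated part of the front, which makes it a stronger leader candidate and a weaker candidate for removal when the archive is truncated.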


Preference-Based Optimization 
The concept of interactive optimization has led to 
an increasing interest in coupling classical interactive 
methods to EMO as an intuitive way of reflecting the 
designer preferences and identifying solutions of inter- 
est to the designer. This has led to the development 
of the preference-based optimization philosophy, which 
provides the motivation for the current study. Compre- 
hensive surveys on preference-based optimization are 
provided by Coello Coello [67.29] and Rachmawati 
and Srinivasan [67.30]. The first recorded attempt at 
incorporating preferences within an evolutionary multiobjective
optimization framework was made by Fonseca
and Fleming [67.31] using the goal programming
approach. Goal programming [67.11] is an ideal ap- 
proach to indicate desired levels of performance for 
each objective, since they closely relate to the final 
solution of the problem. Goals may either represent 
target or ideal values. Fonseca and Fleming later ex- 
tended the approach where an online decision making 
strategy was proposed based on goal and priority in- 
formation [67.6]. A goal programming mechanism for 
identifying preferred solutions for MOP was also pro- 
posed by [67.32]. While the reported frameworks draw 
on the preferred interests of the designer to aid the 
optimization process, the goal programming approach 
is computationally complex, and there is no means of 
specifying any relation or trade-off between the objec- 
tives [67.30]. 

Thiele et al. [67.33] proposed another variant of 
interactive evolutionary multiobjective optimization. 
A coarse representation of the Pareto front is initially 
presented to the designer. The most interesting regions
are subsequently isolated, and the algorithm continues
to focus on these exclusively. This proposal effectively
removes the necessity to predefine target values for each 
objective and provides the designer with a means of 
isolating the preferred trade-offs. However, it is a two- 
stage approach requiring an initial approximation to the 
Pareto front, which may be unnecessarily expensive. 
The integration of other classical preference articula- 
tion methods has also been proposed in the literature. 
A reference point-based evolutionary multiobjective 
optimization framework was proposed by Deb and Sundar [67.34]. The
crowding distance operator of the NSGA-II algorithm
was modified to include the reference point information
and the extent of the preferred region was controlled by
ε-dominance. Deb and Kumar also experimented with
the use of other classical preference methods, such as 
the reference direction method [67.35] and the light 
beam search method [67.36]. 

Recently, the use of interactive methods has also 
been integrated within PSO frameworks. Wickramas- 
inghe and Li [67.37] integrated the reference point 
method into both the NSPSO [67.25] and maximinPSO
[67.26] algorithms. Significant improvement in
convergence efficiency was highlighted, and it was 
demonstrated that final solutions are of higher rele- 
vance to the designer. Wickramasinghe and Li [67.38] 
later extended their approach to handle MOP, by re- 
placing the dominance criteria entirely with the simpler 
distance metric. It was conclusively demonstrated that
without the use of the reference point, obtaining a final
set of preferred solutions solely through conventional
dominance-based techniques is improbable.


67.1.4 Surrogate Modeling 


The most prohibitive factor of design optimization
is the cost of evaluating the objective and constraint 
functions. For high-fidelity airfoil design, function eval- 
uations may very well be measured in hours. This 
computational burden ultimately questions the practi- 
cality of performing an optimization study, and is often 
alleviated by simply reducing the level of sophistica- 
tion of the solver. This consequently reduces the fidelity 
of the final design, which is undesirable. Another miti- 
gating strategy which has steadily gained popularity in 
design is the use of inexpensive surrogates or metamod- 
els [67.39]. These models emulate the response of the 
expensive function at an unobserved location, based on 
observations at other locations. Surrogate models are 
not specifically optimization methods, but rather they 
may ideally be used in lieu of the expensive function 
to extract information from the design space during the 
optimization process. 

The insightful texts by Keane and Nair [67.39] and 
Forrester et al. [67.40] provide a detailed account of the 
use of surrogates in design. The most common use is to 
construct a curve fit of an expensive function landscape, 
which can be used to predict results without recourse to 
the original function. This is supported by the assump- 
tion that the inexpensive surrogate will still be usefully 
accurate when predicting sufficiently far from observed 
data points [67.40]. Figure 67.3 illustrates the use of
a surrogate to fit the one-dimensional multi-modal function,
based on four sample observations. It is important
to note, however, that the original function landscape
could potentially represent any deterministic quantity of
the design space. Rather than exactly emulating the response
of a high-fidelity flow solver, the surrogate may,
in fact, be used to bridge the gap between flow solvers
of varying fidelity [67.40]. Alternatively, a surrogate
may be used to interpret or filter noisy landscapes, so
as to eliminate the adverse effects of flow solver convergence
or grid discretization. Surrogates may also be
used for data mining and design space visualization.
Such methodologies are applied to extract useful information
about the relationship between the design space
and the objective space, allowing informed decisions to
be made, which could simplify a seemingly complex
problem.

Fig. 67.3 Constructing an interpolation-based surrogate to
fit a one-dimensional function
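The idea of an interpolating surrogate built from four observations can be illustrated with a short Python sketch. Several assumptions are made here that do not come from the chapter: the "expensive" function is the one-variable test function $(6x-2)^2 \sin(12x-4)$ (not necessarily the function plotted in Fig. 67.3), the sample locations are equally spaced, and a Lagrange polynomial is used as a simple stand-in for the radial basis function or Kriging models normally preferred in practice.

```python
import math


def expensive_function(x):
    """Stand-in for a costly solver call (hypothetical 1-D test function)."""
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)


def build_interpolant(xs, ys):
    """Return a polynomial surrogate passing through every (x, y) sample."""
    def surrogate(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Lagrange basis: equals 1 at xi and 0 at every other sample.
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return surrogate


# Four observations of the expensive function on [0, 1].
samples_x = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]
samples_y = [expensive_function(x) for x in samples_x]
f_hat = build_interpolant(samples_x, samples_y)

# The surrogate is cheap to query anywhere, e.g. at an unobserved location.
prediction = f_hat(0.5)
error_at_half = abs(prediction - expensive_function(0.5))
```

The surrogate reproduces the four observations exactly, whereas `error_at_half` measures how far it deviates from the true function at an unobserved point; this deviation is what additional samples are meant to reduce.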

For the aforementioned uses of surrogate model- 
ing, the common requirement is to replicate the func- 
tion relationship between the variable inputs and the 
output quantity of interest. This is typically achieved 
by sampling the design space using the exact func- 
tion to sufficiently model the underlying relationship 
within the allowable computational budget. Whether 
the aim is to locally model the design space surround- 
ing an existing design or tune a surrogate to repli- 
cate the global design space is entirely dependent on 
the formation of the sampling plan [67.39]. The con- 
struction of a surrogate model in either case should 
ideally make use of a parallel computing structure. 
A suitable surrogate model $\hat{f}$ of the precise objective
function $f$ should then be constructed to fit the
dataset.
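One way to form a space-filling sampling plan is Latin hypercube sampling. The chapter does not name a specific plan, so the choice of this technique, the function name, and the bounds below are assumptions made purely for illustration.

```python
import random


def latin_hypercube(n_samples, lower, upper, seed=0):
    """Latin hypercube plan: each variable range is split into n_samples
    equal strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    dim = len(lower)
    columns = []
    for d in range(dim):
        # One random point inside each stratum of the unit interval.
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired randomly across variables.
        rng.shuffle(strata)
        columns.append([lower[d] + u * (upper[d] - lower[d]) for u in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Five sample points in a hypothetical two-variable design space.
plan = latin_hypercube(5, [0.0, -1.0], [1.0, 1.0])
```

Each point of the plan would then be evaluated with the exact function, ideally in parallel since the evaluations are independent, before the surrogate is fitted to the resulting dataset.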

There are a multitude of popular techniques for 
constructing surrogates in the literature. For a com- 
prehensive review of different methods, the reader is 
referred to (among others) [67.39-42]. The selection 
of the surrogate model is dependent on the information 
that the designer is attempting to extract from the design 
space. Polynomial response surfaces and radial basis 
functions are fairly popular techniques for constructing 
local surrogates, especially if some level of regression 
is desirable. Techniques such as Kriging or support 
vector machines are more ideally suited to global op- 
timization studies, since they offer greater flexibility in 
tuning model parameters and provide a confidence in- 
terval of the predicted output. Neural networks require 
extensive training and validation, yet have also been 
a popular technique for design applications, notably in 
aerodynamic modeling [67.43] and visualization tech- 
niques [67.44, 45]. 




67.2 Shape Parameterization and Flow Solver 


It was established in Sect. 67.1.1 that the shape param- 
eterization method essentially governs the dimension- 
ality of the problem and the attainable shapes, whereas 
the objective flow solver dictates the overall fidelity of 
the optimum design. In this section, we present a dis- 
cussion on these elements of the design loop to be used 
in conjunction with the developed optimizer for the sub- 
sequent design process. 


67.2.1 The PARSEC Parameterization Method 


Geometry manipulation is of particular importance in 
aerodynamic design. The selection of the shape param- 
eterization method is an important contributing factor, 
since it will effectively define the objective landscape 
and the topology of the design space [67.46]. If the 
aim of the optimization process is to improve on an 
established design, then perhaps local parameterization 
methods, which offer a greater number of geometri- 
cal degrees of freedom, are desirable. However, the 
large number of variables may cause the convergence 
rate for global design applications to deteriorate. The 
development of efficient parameterization models has, 
therefore, been given significant attention, to increase 
the flexibility of geometrical control with a minimum 
number of design variables. 

Fig. 67.4 Airfoil representation via the PARSEC method

For certain applications, it is possible to make use of
fundamental aerodynamic theory to refine the parameterization
method, such that the design variables relate
to important aerodynamic or geometric quantities.
A common method for airfoil shape parameterization
is the PARSEC method [67.47]. It has the advantage
of strict control over important aerodynamic features,
and it allows independent control over the airfoil geometry
for imposing shape constraints. The methodology
is characterized by 11 design variables (Fig. 67.4), including
leading edge radius ($r_{LE}$), upper and lower
thickness locations ($x_{UP}$, $z_{UP}$, $x_{LO}$, $z_{LO}$) and curvatures
($z_{xx,UP}$, $z_{xx,LO}$), trailing edge direction ($\alpha_{TE}$) and wedge
angle ($\beta_{TE}$), and trailing edge coordinate ($z_{TE}$) and
thickness ($\Delta z_{TE}$). The shape function is modeled via
a sixth-order polynomial function

$$z_k = \sum_{n=1}^{6} a_{n,k} \, x_k^{\,n - \frac{1}{2}} , \tag{67.3}$$

where $(x, z)$ are the shape coordinates and $k$ denotes
either the upper (suction) or lower (pressure) airfoil
surface. The coefficients $a_{n,k}$ are determined from the geometric
parameters. A modification by Jahangirian and
Shahrokhi [67.48] was introduced to provide additional
control over the trailing edge curvature. For supercritical
transonic airfoils, this is beneficial to reduce the
probability of downstream boundary layer separation,
which results in increased drag values. A new variable
$\Delta\alpha_{TE}$ was introduced, which directly influences
the additional curvature of the trailing edge. The modification
decouples the trailing edge parameterization by
first defining a smoother upper surface contour and then
constraining the lower surface to intersect the trailing
edge coordinate. Figure 67.5 illustrates the modification
to the trailing edge curvature. The modification is
applied to the upper and lower surfaces as follows
L-tan AQTE 


= 


te [1 +n- (1-7)“] , 7.) 


where the constants 7, jz, and t are set to 0.8, 2, and 6, 
respectively. The modification is applied over the entire
surface, such that $L_{UP} = L_{LO} = c$, where $c$ is the airfoil
chord length.

Fig. 67.5 Additional trailing edge curvature via the modified
PARSEC method

Table 67.1 presents the upper and lower boundaries
for the subsequent optimization case study. These
boundaries have been selected based on a thorough
screening study involving a statistical sample of a number
of benchmark airfoils.

Table 67.1 PARSEC parameter ranges for transonic optimization

Description                  Variable               Lower bound   Upper bound
Leading edge radius          $r_{LE}$                0.0063        0.0151
Trailing edge direction      $\alpha_{TE}$          -0.2405       -0.0026
Trailing edge wedge angle    $\beta_{TE}$            0.0655        0.2618
Upper-crest abscissa         $x_{UP}$                0.3170        0.5250
Upper-crest ordinate         $z_{UP}$                0.0497        0.0683
Upper-crest curvature        $z_{xx,UP}$            -0.5135       -0.2393
Lower-crest abscissa         $x_{LO}$                0.2835        0.3418
Lower-crest ordinate         $z_{LO}$               -0.0603       -0.0478
Lower-crest curvature        $z_{xx,LO}$             0.2535        0.8405
Trailing edge curvature      $\Delta\alpha_{TE}$    -0.0080        0.3696


67.2.2 Transonic Flow Solver


The optimization process is ultimately dependent on
the selection of the flow solver, since it is the most
computationally expensive component, and repeated
evaluations of the objective and constraint functions
are required for each candidate shape. However, if the
flow solver is not sufficiently accurate, the optimization
process will converge to shapes that exploit the numerical
errors or limitations, rather than the fundamental
physics of the problem. For this reason, it is desirable
to maintain the correct balance between solution accuracy
and computational expense, which is dictated by
the flow regime. For certain problems where the aerodynamic
flow field is well behaved, it may be sufficient
to consider more robust linear solvers. However, for
high-fidelity design it is prudent to consider non-linear
and more computationally demanding solvers, to ensure
that optimized shapes provide the anticipated performance
requirements in flight.

The general purpose finite volume code ANSYS
Fluent is adopted in this study. A pressure-based numerical
procedure is adopted with third-order spatial
discretization to capture the occurring flow phenomena.
The momentum equations and pressure-based continuity
equation are solved concurrently, with the Courant–Friedrichs–Lewy
number set at 200. The one-equation

Spalart–Allmaras turbulence model [67.49] is selected,
and turbulent flow is modeled over the entire airfoil
surface. The C-type grid (as represented in Fig. 67.6)
stretches 25 chord lengths aft and normal of the airfoil
section. Resolution of the C-grid is 460 × 65, providing
an affordable mesh size of approximately 30000 elements.
The first grid point is located $2.5 \times 10^{-4}$ units
normal to the airfoil surface, resulting in an average y-plus
value of 120. In the interest of robust and efficient
convergence rates, a full multi-grid (FMG) initialization
scheme is employed, with coarsening of the grid
to 30 cells. In the FMG initialization process, the Euler
equations are solved using a first-order discretization to
obtain a flow field approximation before submission to
the full iterative calculation.

Fig. 67.6 C-type grid for transonic simulation




67.3 Optimization Algorithm 


The proposed algorithm was primarily motivated by the 
studies of Wickramasinghe and Li [67.38]. The prin- 
cipal argument is that for most design applications, 
to explore the entire Pareto front is often unneces- 
sary, and the computational burden can be alleviated by 
considering the immediate interests of the designer. In 
Sect. 67.1.3, a discussion on the benefits of preference- 
based optimization was provided. Drawing on these 
concepts, a preference-based algorithm is proposed, 
where a designer-driven distance metric is used to quantify
the success of a solution as a single scalar. The multiobjective
search effort is coordinated via a MOPSO algorithm. 
The swarm is guided by a reference point, which is an 
intuitive means of articulating the preferences of the 
designer and can ideally be based on an existing or 
target design. This section provides a comprehensive 
discussion on the proposed algorithm, highlighting its 
viability for the intended domain of application. 


67.3.1 The Reference Point Method 


In this research, the swarm is guided by a reference 
point to confine its search focus exclusively on the 
preferred region of the Pareto front as dictated by the 
preferences of the designer. Introducing the preferred 
region provides the designer flexibility to explore other 
interesting alternatives. This hybrid methodology is ad- 
vantageous for navigating high-dimensional and multi- 
modal landscapes, which are typical of aerodynamic 
design problems. Furthermore, inherently considering 
the preferences of the designer provides a feasible 
means of quantifying the practicality of a design. 


The Reference Point Distance Metric 
The reference point method has been integrated into 
MOO algorithms, notably by Deb and Sundar [67.34] 
and Wickramasinghe and Li [67.37,38]. These stud- 
ies highlight the benefits of incorporating preference 
information via the reference point in terms of con- 
vergence. Guided by the information provided by the 
reference point, the swarm can simultaneously identify 
multiple solutions in the preferred region. This provides 
the designer flexibility to explore several preferred de- 
signs, while alleviating the computational burden of 
identifying the entire Pareto front. A reference point 
distance metric following the work of Wickramasinghe 
and Li [67.37] is proposed. This metric provides an in- 
tuitive criterion to select global leaders and assists the 
swarm to identify only solutions of interest to the designer.
The distance of a particle $\mathbf{x}$ to the reference point
$\bar{z}$ is defined as

$$d_z(\mathbf{x}) = \max_{i = 1, \ldots, m} \{ f_i(\mathbf{x}) - \bar{z}_i \} . \tag{67.5}$$

A solution $a$ is, therefore, preferred to solution $b$ if
$d_z(a) < d_z(b)$. This condition is an extension of the condition
$f(a) \prec f(b)$; therefore, the distance metric may, in
fact, substitute the dominance criteria entirely [67.38].
Using this distance metric, the swarm is guided to preferred
regions of the Pareto optimal front. Figure 67.7
illustrates the search directions of the algorithm when
guided by a reference point, and the corresponding
preferred design as a direct result of minimizing the distance
metric $d_z$.

The distinguishing feature of the reference point
distance metric over the mathematical Euclidean distance
is that solutions do not converge to the reference
point, but on the preferred region of the Pareto front
as dictated by the search direction. This is illustrated
in Fig. 67.8. All solutions are non-dominated and lie
on the circular arc surrounding the reference point $\bar{z}$,
and thus the Euclidean distance to the reference point
is equal. However, since solution $i$ has the smallest
maximum translational distance to the reference point
compared to any other solution, it is considered preferred.
The definition of the reference point distance
also suggests negative values. If the distance of the
preferred solution $d_z(z') < 0$, then it can simply be considered
that the reference point is dominated or $z' \prec \bar{z}$.
Since the designer generally has no prior knowledge
of the topology and location of the Pareto front,
a reference point may be ideally placed in any feasible
or infeasible region, as is shown in Fig. 67.7.
It is, therefore, the consensus that the reference point
draws on the experience of the designer to express
the preferred compromise, rather than specific target
values or goals. Similarly, the reference point distance
metric ranks or assesses the success of a particle
as one single scalar, instead of an array of objective
values.

Fig. 67.7 Illustration of the search direction governed by
the reference point
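The reference point distance of Eq. (67.5) reduces to a one-line function. In the Python sketch below, the reference point and the candidate objective vectors are hypothetical values chosen only to show how the metric ranks solutions and when it becomes negative.

```python
def reference_distance(f, z_ref):
    """Eq. (67.5): largest signed offset of the objective vector f
    from the reference point z_ref, taken over all objectives."""
    return max(fi - zi for fi, zi in zip(f, z_ref))


# Hypothetical reference point and three non-dominated objective vectors.
z_ref = (0.5, 0.5)
candidates = [(0.2, 0.9), (0.6, 0.6), (0.9, 0.2)]

# A smaller distance means a more preferred solution.
ranked = sorted(candidates, key=lambda f: reference_distance(f, z_ref))

# A negative distance means the solution is better than the reference
# point in every objective, i.e. the reference point is dominated.
beats_reference = reference_distance((0.4, 0.3), z_ref) < 0
```

The balanced vector `(0.6, 0.6)` is ranked first even though none of the three candidates dominates another, which is how the metric supplies a single scalar for selecting leaders.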


Defining the Preferred Region 
As is demonstrated in Fig. 67.7, if there is no control
over the solution spread, the swarm will explore the
preferred search direction and converge to the single
solution $z'$ dictated by the reference point $\bar{z}$.
Maintaining a population of particles, however, gives
the designer the possibility to explore a range of
interesting alternatives within a preferred region of
the Pareto front. The aim is, therefore, to identify
a set of solutions surrounding the intersection point
$z'$. A threshold parameter $\delta > 0$ is defined, such that
a solution $x$ is within the preferred region if the
following condition holds:

$$d_z(x) < d_z(z') + \delta . \tag{67.6}$$

Figure 67.9 illustrates the preferred region for a
bi-objective problem. The extent of the solution spread
is proportional to $\delta$: as $\delta \to 0$, the designer
is interested in determining only the most preferred
solution $z'$. Conversely, as $\delta \to \infty$, the designer is
interested in determining all solutions along the Pareto
front, and thus the influence of the reference point
location diminishes.
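
As a rough illustration of condition (67.6), the sketch below filters a list of precomputed distance values. The function name and the numerical values are hypothetical; the distance values themselves would come from (67.5).

```python
def preferred_region(d_values, delta):
    """Indices of solutions with d < d(z') + delta, where z' has the smallest d."""
    d_best = min(d_values)
    return [i for i, d in enumerate(d_values) if d < d_best + delta]

d_values = [-0.0010, -0.0004, 0.0032, 0.0150]  # hypothetical d_z values
narrow = preferred_region(d_values, delta=0.001)
wide = preferred_region(d_values, delta=0.01)
```

A larger threshold admits more of the front, which is the behavior described for increasing $\delta$.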




Fig. 67.8 Illustration of the reference point distance for 
solutions with equal Euclidean distance 


67.3.2 User-Preference Multiobjective PSO: 
UPMOPSO 


The proposed algorithm combines the search proficiency
of PSO with the guidance of the reference point method.
The swarm is guided by the user-defined reference point
to confine its search exclusively to the identified
preferred region of the Pareto front. While the concept
of the reference point is fairly intuitive, ensuring that
the swarm uses this information to identify preferred
solutions is less straightforward. The algorithm is
summarized in Algorithm 67.1 and described further in
the steps that follow.


Algorithm 67.1 The UPMOPSO algorithm
 1: OBTAIN user-defined preferences
 2: INITIALIZE swarm
 3: EVALUATE fitness and distance metric
 4: ASSIGN personal best
 5: CONSTRUCT archive
 6: t = 1
 7: repeat
 8:    SELECT global leaders
 9:    UPDATE particle velocity
10:    UPDATE particle position
11:    ADJUST boundary violation
12:    EVALUATE fitness and distance metric
13:    UPDATE personal best
14:    UPDATE archive
15:    t = t + 1
16: until t = t_max OR f_max


Fig. 67.9a,b Definition of the preferred region via the
parameter $\delta$. (a) $\delta_1 = 0.01$, (b) $\delta_2 = 0.001$


Preference-Based Multiobjective Particle Swarm Optimization for Airfoil Design | 67.3 Optimization Algorithm


OBTAIN User-Defined Preferences 

The designer stipulates the reference point $\bar{z}$ and the
corresponding solution spread $\delta$ to define the location
and extent of the preferred region. For airfoil
design applications, designers can exploit the existing
domain knowledge to determine the most feasible
performance compromise for the desired operating
conditions.


INITIALIZE the Particles 

A swarm of $N$ particles is required to navigate the
design space $S$ bounded by $x_{min}$ and $x_{max}$. To safeguard
against magnitude and scaling issues, all variables are
normalized into the unit cube, such that $S = [0, 1]^n$. The
i-th particle in the swarm is characterized by the
n-dimensional vectors $x_i$ and $v_i$, which are the particle
position and velocity, respectively. These vectors are
randomly initialized within the unit cube at time $t = 0$.
The particle personal best position is recorded as
the particle position, such that $p_i = x_i$. The particles
are then evaluated with the objective functions and
fitness is assigned. The reference point distance metric is
computed for each particle to measure the individual
preference value.
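
The initialization step can be sketched as follows. This is a minimal illustration under the stated normalization, not the authors' code: the function names are hypothetical, and the mapping back to physical variables is an assumed linear rescaling between the bounds.

```python
import random

def initialize_swarm(n_particles, n_dims, rng):
    """Random positions and velocities in the unit cube; personal best = position."""
    x = [[rng.random() for _ in range(n_dims)] for _ in range(n_particles)]
    v = [[rng.random() for _ in range(n_dims)] for _ in range(n_particles)]
    p = [xi[:] for xi in x]  # independent copies, so later moves do not alias
    return x, v, p

def denormalize(x_unit, x_min, x_max):
    """Map a unit-cube position back to physical design variables (assumed linear)."""
    return [lo + u * (hi - lo) for u, lo, hi in zip(x_unit, x_min, x_max)]

rng = random.Random(1)
x, v, p = initialize_swarm(n_particles=5, n_dims=3, rng=rng)
```

The objective functions would then be evaluated on `denormalize(x_i, x_min, x_max)` while the swarm itself moves in the unit cube.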


UPDATE Archive and SELECT Global Leaders 

A secondary population of non-dominated solutions
in the form of an elitist archive is maintained at
each time step $t$. The non-dominated solutions identified
by the particles are appended to the archive. A
non-dominated sorting procedure is then applied, and all
members belonging to inferior fronts are discarded.
The archive serves as a mutually accessible
memory bank for the particles of the swarm. Each
member is a potential candidate for global leadership
of the particles during the subsequent velocity
update.


Defining the global leaders ultimately governs the
direction of the search. The swarm should efficiently
navigate the design space such that the search effort is
locally focused within the preferred region and provides
a uniform spread of solutions. Since all members of the
archive are mutually non-dominated, a ranking procedure
is necessary to distinguish the most appropriate
candidates for leadership from the remaining members.
At each time step $t$, the most preferred solution $z'(t)$
is recorded. The subset of members $X_g(t)$ selected for
global leadership satisfy the condition of (67.6), such
that

$$X_g(t) = \{\, x : d_z(x) < d_z(z'(t)) + \delta \,\}. \tag{67.7}$$


Since not every member will initially satisfy this condi- 
tion, the number of candidate leaders may fluctuate over 
time. This condition provides the necessary selection 
pressure for particles to locally focus the search effort 
within the preferred region, avoiding the unnecessary 
computational effort of exploring undesired regions of 
the design space. Each swarm particle is randomly as- 
signed a leader to promote diversity in the search. In the 
case where all non-dominated solutions satisfy the con- 
dition of (67.7), additional guidance through a crowding 
distance metric (as described in [67.27]) is provided to 
promote a uniform spread. 
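
A minimal sketch of the leader assignment described above is given below. It covers only the random assignment among archive members that satisfy (67.7); the additional crowding-based guidance of [67.27] is not included. The function name and the sample distances are hypothetical.

```python
import random

def select_leaders(archive_d, n_particles, delta, rng):
    """Assign each particle a random leader from the preferred archive members.

    archive_d holds the reference point distance of every archive member.
    Returns (leader index per particle, candidate indices).
    """
    d_best = min(archive_d)  # distance of the most preferred member z'(t)
    candidates = [k for k, d in enumerate(archive_d) if d < d_best + delta]
    leaders = [rng.choice(candidates) for _ in range(n_particles)]
    return leaders, candidates

rng = random.Random(0)
archive_d = [0.001, 0.02, 0.004, 0.5]  # hypothetical d_z values
leaders, candidates = select_leaders(archive_d, n_particles=6, delta=0.005, rng=rng)
```

Because the most preferred member always satisfies the condition for $\delta > 0$, the candidate list is never empty.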

As the particles are guided to converge to the pre- 
ferred region, the number of identified non-dominated 
solutions will steadily increase. To avoid this number 
escalating unnecessarily and to maintain high com- 
petitiveness within the archive, there is a restriction 
(denoted by Kmax) on the number of solutions permit- 
ted for entry. If the number of members K > Kmax, 
the newest solution is permitted entry, and the exist- 
ing least preferred member is removed. If all archive 
members exist within the preferred region, the most 
crowded solutions are removed. This ensures that solu- 
tions in densely populated regions are removed in favor 
of solutions which exploit sparsely populated regions, 
to further promote a uniform spread. 
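
The archive restriction can be sketched as below. The crowding measure is assumed to be the NSGA-II-style crowding distance; the source defers to [67.27] for its exact definition, so this choice and all names are illustrative.

```python
def crowding_distance(front):
    """NSGA-II-style crowding distance for a list of objective vectors."""
    n = len(front)
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        span = front[order[-1]][m] - front[order[0]][m] or 1.0
        dist[order[0]] = dist[order[-1]] = float("inf")  # keep boundary points
        for a, b, c in zip(order, order[1:], order[2:]):
            dist[b] += (front[c][m] - front[a][m]) / span
    return dist

def truncate_archive(objs, d_values, k_max, delta):
    """Return the indices kept after removing members until k_max remain."""
    keep = list(range(len(objs)))
    while len(keep) > k_max:
        d_best = min(d_values[i] for i in keep)
        if all(d_values[i] < d_best + delta for i in keep):
            # Everyone is inside the preferred region: drop the most crowded.
            cd = crowding_distance([objs[i] for i in keep])
            victim = keep[min(range(len(keep)), key=cd.__getitem__)]
        else:
            # Otherwise drop the least preferred member (largest d_z).
            victim = max(keep, key=lambda i: d_values[i])
        keep.remove(victim)
    return keep

objs = [(0.0, 1.0), (0.45, 0.55), (0.5, 0.5), (1.0, 0.0)]  # hypothetical front
kept_far = truncate_archive(objs, [0.0, 0.001, 0.002, 0.5], k_max=3, delta=0.01)
kept_crowded = truncate_archive(objs, [0.0, 0.001, 0.002, 0.003], k_max=3, delta=0.01)
```

In the first call one member lies outside the preferred region and is removed; in the second call all members are preferred, so the member in the densest part of the front is removed instead.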


UPDATE Particle Position 
The update equations of PSO adjust the position of
the i-th particle from time $t$ to $t + 1$. In this
algorithm, the constriction type 1 framework of Clerc and
Kennedy [67.50] is adopted, which the authors derived
from an eigenvalue analysis of swarm dynamics. The
velocity update of the i-th particle is a function of
acceleration components to both the personal best
position, $p_i$, and the global best position, $p_g$. The
updated velocity vector is given by the expression

$$v_i(t+1) = \chi \left\{ v_i(t) + R_1[0,\varphi_1] \otimes \left( p_i(t) - x_i(t) \right) + R_2[0,\varphi_2] \otimes \left( p_g(t) - x_i(t) \right) \right\}. \tag{67.8}$$


The velocity update of (67.8) is quite complex and is
composed of many quantities that affect certain search
characteristics. The previous velocity $v_i(t)$ serves as
a memory of the previous flight direction and prevents
the particle from drastically changing direction; it is
referred to as the inertia component. The cognitive
component of the update equation, $(p_i(t) - x_i(t))$, quantifies
the performance of the i-th particle relative to past
performances. The effect of this term is that particles are
drawn back to their own best positions, which resembles
the tendency of individuals to return to situations
where they experienced most success. The social
component, $(p_g(t) - x_i(t))$, quantifies the performance of the
i-th particle relative to the global (or neighborhood) best
position. This resembles the tendency of individuals to
emulate the success of others.

The two functions $R_1[0,\varphi_1]$ and $R_2[0,\varphi_2]$ return
a vector of uniform random numbers in the range $[0,\varphi_1]$
and $[0,\varphi_2]$, respectively. The constants $\varphi_1$ and $\varphi_2$ are
equal to $\varphi/2$, where $\varphi = 4.1$. This randomly affects the
magnitude of both the social and cognitive components.
The constriction factor $\chi$ applies a dampening effect as
to how far the particle explores within the search space
and is given by

$$\chi = \frac{2}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}. \tag{67.9}$$

Once the particle velocity is calculated, the particle is
displaced by adding the velocity vector (over the unit
time step) to the current position,

$$x_i(t+1) = x_i(t) + v_i(t+1). \tag{67.10}$$
Particle flight should ideally be confined to the feasible
design space. However, a particle may involuntarily
violate the boundaries of the design space during flight.
While it has been suggested that particles which
leave the confines of the design space should simply
be ignored [67.51], here the violated dimension is instead
restricted such that the particle remains within the
feasible design space without affecting the flight trajectory.
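
The update of (67.8) to (67.10) can be sketched as follows, with $\varphi = 4.1$ as stated. The boundary treatment shown (clipping only the violated position coordinate to the unit cube and leaving the velocity untouched) is one plausible reading of the text and is an assumption, as are the function and variable names.

```python
import math
import random

PHI = 4.1
# Constriction factor of (67.9); evaluates to roughly 0.73 for PHI = 4.1.
CHI = 2.0 / abs(2.0 - PHI - math.sqrt(PHI * PHI - 4.0 * PHI))

def update_particle(x, v, p_best, g_best, rng):
    """One constricted velocity and position update for a single particle."""
    phi1 = phi2 = PHI / 2.0
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        r1 = rng.uniform(0.0, phi1)  # cognitive weight, drawn per dimension
        r2 = rng.uniform(0.0, phi2)  # social weight, drawn per dimension
        vj_new = CHI * (vj + r1 * (pj - xj) + r2 * (gj - xj))
        new_v.append(vj_new)
        # Assumed boundary handling: clip the position, keep the velocity.
        new_x.append(min(1.0, max(0.0, xj + vj_new)))
    return new_x, new_v

rng = random.Random(0)
x1, v1 = update_particle([0.2, 0.9], [0.5, 0.5], [0.1, 1.0], [0.3, 0.8], rng)
x0, v0 = update_particle([0.4, 0.6], [0.0, 0.0], [0.4, 0.6], [0.4, 0.6], rng)
```

The second call shows the fixed point of the update: a particle at rest on both its personal best and its leader does not move.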


UPDATE Personal Best 
The ambiguity in updating the personal best using the
dominance criteria lies in the treatment of the case


when the personal best solution $p_i(t)$ is mutually
non-dominated with the solution $x_i(t + 1)$. The introduction
of the reference point distance metric elegantly deals
with this ambiguity. If the particle position $x_i(t + 1)$ is
preferred to the existing personal best $p_i(t)$, then the
personal best is replaced. Otherwise, the personal best
remains unchanged.
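
In code, the rule above reduces to a scalar comparison. The sketch assumes that "preferred" means a strictly smaller reference point distance; the names are hypothetical.

```python
def update_personal_best(p_best, d_best, x_new, d_new):
    """Replace the personal best only if the new position has a smaller d_z."""
    if d_new < d_best:
        return x_new[:], d_new
    return p_best, d_best

p, d = update_personal_best([0.2, 0.3], 0.010, [0.25, 0.28], 0.004)
q, e = update_personal_best([0.2, 0.3], 0.010, [0.90, 0.10], 0.020)
```

No dominance check is needed, which is what removes the ambiguity for mutually non-dominated positions.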


67.3.3 Kriging Modeling 


Airfoil design optimization problems benefit from the 
construction of inexpensive surrogate models that em- 
ulate the response of exact functions. This section 
presents a novel development in the field of preference- 
based optimization. Adaptive Kriging models are in- 
corporated within the swarm framework to efficiently 
navigate design spaces restricted by a computational 
budget. The successful integration of these design tools 
is facilitated through the reference point distance met- 
ric, which provides an intuitive criterion to update the 
Kriging models during the search. 

In most engineering problems, constructing a globally
accurate surrogate of the original objective landscape
is impractical due to the weakly correlated design
space. It is more common to locally update the
prediction accuracy of the surrogate as the search progresses
towards promising areas of the design space [67.40].
For this purpose, the Kriging method has received
much interest, because it inherently provides confidence
intervals for the predicted outputs. For a complete
derivation of the Kriging method, readers are encouraged
to follow the work of Jones [67.41] and Forrester
et al. [67.40]. We provide a very brief introduction to
the ordinary Kriging method, which expresses the
unknown function $y(x)$ as

$$y(x) = \beta + z(x), \tag{67.11}$$

where $x = [x_1, \ldots, x_n]$ is the data location, $\beta$ is a
constant global mean value, and $z(x)$ represents a local
deviation at the data location $x$ based on a stochastic
process with zero mean and variance $\sigma^2$ following the
Gaussian distribution. The approximation $\hat{y}(x)$ is
obtained from

$$\hat{y}(x) = \hat{\beta} + r^T R^{-1} (Y - \mathbf{1}\hat{\beta}), \tag{67.12}$$

where $\hat{\beta}$ is the approximation of $\beta$, $R$ is the
correlation matrix, $r$ is the correlation vector, $Y$ is the training
dataset of $N$ observed samples at location $X$, and $\mathbf{1}$ is
a column vector of $N$ elements of 1. The correlation
matrix is a modification of the Gaussian basis function,

$$R(x^i, x^j) = \exp\left( -\sum_{k=1}^{n} \theta_k \left| x_k^i - x_k^j \right|^2 \right), \tag{67.13}$$

where $\theta_k > 0$ is the k-th element of the correlation
parameter $\theta$. Following the work of Jones [67.41], the
correlation parameter $\theta$ (and hence the approximations
$\hat{\beta}$ and $\hat{\sigma}^2$) are estimated by maximizing the
concentrated ln-likelihood of the dataset $Y$, which is an
n-variable single-objective optimization problem, solved
using a pattern search method. The accuracy of the
prediction $\hat{y}$ at the unobserved location $x$ depends on the
correlation distance with sample points $X$. The closer
the location of $x$ to the sample points, the more
confidence in the prediction $\hat{y}(x)$. The measure of uncertainty
in the prediction is estimated as

$$\hat{s}^2(x) = \hat{\sigma}^2 \left[ 1 - r^T R^{-1} r + \frac{\left( 1 - \mathbf{1}^T R^{-1} r \right)^2}{\mathbf{1}^T R^{-1} \mathbf{1}} \right]. \tag{67.14}$$

If $x \in X$, it is observed from (67.14) that $\hat{s}(x)$ reduces
to zero.
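
The predictor and its uncertainty can be sketched in plain Python for a fixed correlation parameter. The sketch does not perform the likelihood maximization over $\theta$ described above; $\theta$ is simply supplied by the caller. The closed forms used for $\hat{\beta}$ and $\hat{\sigma}^2$ are the standard ordinary Kriging estimates, $\hat{\beta} = \mathbf{1}^T R^{-1} Y / \mathbf{1}^T R^{-1} \mathbf{1}$ and $\hat{\sigma}^2 = (Y - \mathbf{1}\hat{\beta})^T R^{-1} (Y - \mathbf{1}\hat{\beta}) / N$, which the text does not state explicitly. All function names and data are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def corr(xa, xb, theta):
    """Gaussian correlation of (67.13)."""
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, xa, xb)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kriging_predict(X, Y, theta, x):
    """Ordinary Kriging mean (67.12) and variance (67.14) for a fixed theta."""
    n = len(X)
    R = [[corr(X[i], X[j], theta) for j in range(n)] for i in range(n)]
    ones = [1.0] * n
    Ri_1 = solve(R, ones)                       # R^-1 1
    beta = dot(ones, solve(R, Y)) / dot(ones, Ri_1)
    resid = [y - beta for y in Y]
    Ri_res = solve(R, resid)                    # R^-1 (Y - 1 beta)
    sigma2 = dot(resid, Ri_res) / n
    r = [corr(xi, x, theta) for xi in X]
    Ri_r = solve(R, r)                          # R^-1 r
    y_hat = beta + dot(r, Ri_res)
    s2 = sigma2 * (1.0 - dot(r, Ri_r)
                   + (1.0 - dot(ones, Ri_r)) ** 2 / dot(ones, Ri_1))
    return y_hat, max(s2, 0.0)                  # clamp round-off negatives

X = [[0.0], [0.5], [1.0]]   # hypothetical sample locations
Y = [1.0, 0.2, 0.8]         # hypothetical observations
y_at_sample, s2_at_sample = kriging_predict(X, Y, [5.0], [0.5])
y_between, s2_between = kriging_predict(X, Y, [5.0], [0.25])
```

At a sample location the predictor interpolates the observation and the variance vanishes, in line with the remark following (67.14); between samples the variance is positive.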


67.3.4 Reference Point Screening Criterion 


Training a Kriging model from a training dataset is time
consuming, with a cost of the order $O(N^3)$. Stratified
sampling using a maximin Latin hypercube (LHS) [67.52]
is used to construct a global Kriging approximation
from the sample $[X, Y]$. The non-dominated subset of $Y$
is then stored within the elitist archive. This ensures
that candidates for global leadership have been precisely
evaluated (or with negligible prediction error) and,
therefore, offer no false guidance to other particles.
Adopting the concept of individual-based control [67.42],
Kriging predictions are then used to pre-screen each
candidate particle after the population update (or after
mutation) and subsequently flag it for precise evaluation
or rejection. The Kriging model estimates a
lower-confidence bound for the objective array as

$$\{\hat{f}_1(x), \ldots, \hat{f}_m(x)\}_{lb} = \left[ \{\hat{y}_1(x) - \omega \hat{s}_1(x)\}, \ldots, \{\hat{y}_m(x) - \omega \hat{s}_m(x)\} \right], \tag{67.15}$$

where $\omega = 2$ provides a 97% probability for $\hat{f}_{lb}(x)$ to
be the lower bound value of $\hat{f}(x)$. An approximation to
the reference point distance, $d_z(x)$, can thus be obtained
using (67.5). This value, whilst providing a means of
ranking each solution as a single scalar, also gives an
estimate of the improvement that is expected from the
solution. At time $t$, the archive member with the highest
ranking according to (67.5) is recorded as $d_{min}$. The
candidate $x$ may then be accepted for precise evaluation,
and subsequent admission into the archive, if
$d_z(x) < d_{min}$. Particles will thus be attracted towards the
areas of the design space that provide the greatest
resemblance to $\bar{z}$, and the direction of the search will
remain consistent.
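
The screening rule can be sketched as below. The lower bound follows (67.15) with $\omega = 2$; the distance is again assumed to take the max-translation form $\max_i(\hat{f}_{lb,i} - \bar{z}_i)$ in place of the unseen (67.5). Function names and numbers are hypothetical.

```python
def lower_bound_objectives(y_hat, s_hat, omega=2.0):
    """Lower-confidence bound of (67.15) for each objective."""
    return [y - omega * s for y, s in zip(y_hat, s_hat)]

def flag_for_evaluation(y_hat, s_hat, z_ref, d_min, omega=2.0):
    """True if the optimistic distance estimate beats the best archive member."""
    f_lb = lower_bound_objectives(y_hat, s_hat, omega)
    d_lb = max(f - z for f, z in zip(f_lb, z_ref))  # assumed form of (67.5)
    return d_lb < d_min

z_ref = [0.0087, 0.1024]                 # hypothetical reference point
uncertain = flag_for_evaluation([0.010, 0.11], [0.002, 0.01], z_ref, d_min=0.0)
confident = flag_for_evaluation([0.010, 0.11], [0.0, 0.0], z_ref, d_min=0.0)
```

The same predicted objectives are flagged while the model is uncertain and rejected once the predicted error vanishes, which mirrors the shift from exploration to exploitation.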

As the search begins in the explorative phase and the
prediction accuracy of the surrogate model(s) is low, a
large percentage of the swarm will initially be flagged
for precise evaluation, depending on the deceptivity of
the objective landscape(s). Subsequently,
as the particles begin to identify the preferred region
and the prediction accuracy of the surrogate model(s)
gradually increases, the screening criterion becomes
increasingly difficult to satisfy, thereby reducing the
number of flagged particles at each time step. To
restrict saturation of the dataset used to train the Kriging
models, a limit of $N = 200$ sample points is imposed;
beyond this limit, the lowest ranked solutions according
to (67.5) are removed.


67.4 Case Study: Airfoil Shape Optimization 


The parameterization method and transonic flow solver 
described in the preceding section are now integrated 
within the Kriging-assisted UPMOPSO algorithm for 
an efficient airfoil design framework. The framework is 
applied to the re-design of the NASA-SC(2)0410 airfoil 
for robust aerodynamic performance. A three-objective 
constrained optimization problem is formulated, with 
$f_1 = C_d$ and $f_2 = -C_m$ for $M = 0.79$, $C_l = 0.4$, and
$f_3 = \partial C_d / \partial M$ for the design range $M = [0.79, 0.82]$,
$C_l = 0.4$. The lift constraint is satisfied internally within


the solver, by allowing Fluent to determine the an- 
gle of incidence required. A constraint is imposed on 
the allowable thickness, which is defined through the 
parameter ranges (see Table 67.2) as approximately 
9.75% chord. The reference point is logically selected 
as the NASA-SC(2)0410, in an attempt to improve on 
the performance characteristics of the airfoil, whilst still 
maintaining a similar level of compromise between the 
design objectives. The solution variance is controlled
by $\delta = 5 \times 10^{-3}$.




The design application is segregated into three
phases: pre-optimization and variable screening;
optimization; and post-optimization and trade-off
screening.


67.4.1 Pre-Optimization 
and Variable Screening 


Global Kriging models are constructed for the
aerodynamic coefficients from a stratified sample of
$N = 100$ design points based on a Latin hypercube
design. This sampling plan size is considered large
enough to give sufficient confidence in the
results of the subsequent design variable screening
analysis. Whilst a larger sampling plan is essential
to obtain fairly accurate correlation, the interest here
is to quantify the elementary effect of each variable
on the objective landscapes. The global Kriging
models are initially trained via cross-validation. The
cross-validation curves for the Kriging models are
illustrated in Fig. 67.10. The subscripts to the
aerodynamic coefficients refer to the respective operating
Mach number.

It is observed in Fig. 67.10 that the Kriging models
constructed for the aerodynamic coefficients are able
to reproduce the training samples with sufficient
confidence, recording error margins between 2 and 4%.
It is hence concluded that the Kriging method is very
adept at modeling complex landscapes represented by
a limited number of precise observations.

To investigate the elementary effect of each de- 
sign variable on the metamodeled objective landscapes, 
we present a quantitative design space visualization 
technique. A popular method for designing prelimi- 
nary experiments for design space visualization is the 
screening method developed by Morris [67.53]. This
algorithm calculates the elementary effect of a variable $x_i$
and establishes its correlation with the objective space $f$
as:


a) Negligible 

b) Linear and additive 

c) Nonlinear 

d) Nonlinear and/or involved in interactions with x. 


Table 67.2 NASA-SC(2)0410 airfoil results for the formulated objectives

Airfoil            Mach number, M   f1         f2       f3
NASA-SC(2)0410     0.79             0.008708   0.1024   0.189625


Fig. 67.10a-c Cross-validation curves for the constructed
Kriging models. (a) Training sample for $C_d$ at $M = 0.79$.
(b) Training sample for $C_m$ at $M = 0.79$. (c) Training
sample for $C_d$ at $M = 0.82$


In plain terminology, the Morris algorithm measures
the sensitivity of the objective landscape $f$ to the
i-th variable. For a detailed discussion on the Morris
algorithm the reader is referred to Forrester et al. [67.40]
and Campolongo et al. [67.54]. Presented here are the
results of the variable screening analysis using the
Morris algorithm for the proposed design application.
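
To make the classification above concrete, the following is a simplified sketch of Morris-style screening: random one-at-a-time trajectories in the unit cube, without the level grid of the original method. It returns, per variable, the mean absolute elementary effect and the standard deviation of the effects. A small mean suggests a negligible variable, a large mean with small deviation a linear and additive one, and a large deviation nonlinearity or interactions. The function name, the step size, and the test function are hypothetical.

```python
import random
import statistics

def elementary_effects(f, n_vars, n_traj, delta, rng):
    """Return (mean |effect|, std of effect) per variable from random trajectories."""
    effects = [[] for _ in range(n_vars)]
    for _ in range(n_traj):
        x = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_vars)]
        y = f(x)
        for i in rng.sample(range(n_vars), n_vars):  # random variable order
            x_next = x[:]
            x_next[i] += delta                       # perturb one variable
            y_next = f(x_next)
            effects[i].append((y_next - y) / delta)
            x, y = x_next, y_next                    # continue the trajectory
    mu_star = [statistics.fmean([abs(e) for e in ee]) for ee in effects]
    sigma = [statistics.pstdev(ee) for ee in effects]
    return mu_star, sigma

def toy(x):
    """Hypothetical response: x[0] inert, x[1] linear, x[2] nonlinear."""
    return 0.0 * x[0] + 2.0 * x[1] + x[2] ** 2

mu_star, sigma = elementary_effects(toy, n_vars=3, n_traj=20, delta=0.25,
                                    rng=random.Random(0))
```

In the chapter's setting, `f` would be a Kriging prediction of an aerodynamic coefficient (or of $d_z$) rather than an analytic function, so the screening costs no additional flow solutions.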

Figure 67.11 graphically shows the results obtained
from the design variable screening study. It is
immediately observed that the upper thickness coordinates have
a relatively large influence on the drag coefficient for
both design conditions. At higher Mach numbers the
effect of the lower surface curvature $z_{xx,lo}$ is also
significant. It is demonstrated, however, that the variables
$z_{xx,lo}$ and $\alpha_{TE}$ have the largest effect on the moment
coefficient. These variables directly influence the aft
camber of the airfoil and will no doubt shift the loading
on the airfoil forward and aft, resulting in highly
fluctuating moment values.

Similar deductions can be made by examining the
variable influence on $d_z$ shown in Fig. 67.11d. The
variable influence on $d_z$ is case specific and entirely
dependent on the reference point chosen for the proposed
optimization study. Since the value of $d_z$ is a means
of ranking the success of a multiobjective solution as
one single scalar, variables may be ranked by influence,
which is otherwise not possible when considering
a multiobjective array. Preliminary conclusions on the
priority weighting of the objectives in the reference
point compromise can also be made. It is observed that
the variable influence on $d_z$ most closely resembles the
plots of the drag coefficients $C_{d79}$ and $C_{d82}$, suggesting
that the moment coefficient is of least priority for the
preferred compromise. It is interesting to see that the
trailing edge modification variable $\Delta\alpha_{TE}$ is of particular
importance for all design coefficients, which validates
its inclusion in the subsequent optimization study.


Fig. 67.11a-d Variable influence on aerodynamic
coefficients (subscripts refer to the operating Mach
number). (a) Drag $C_{d79}$. (b) Moment $C_{m79}$.
(c) Drag $C_{d82}$. (d) $d_z$


67.4.2 Optimization Results 


A swarm population of $N_s = 100$ particles is flown to
solve the optimization problem. The objective space is
normalized for the computation of the reference point
distance by $f_{max} - f_{min}$. Instead of specifying a maximum
number of time steps, a computational budget of 250
evaluations is imposed. A stratified sample of $N = 100$
design points using an LHS methodology was used to
construct the initial global Kriging approximations for
each objective. A further 150 precise updates were
performed over $t \approx 100$ time steps until the computational
budget was exhausted. As is shown in Fig. 67.12a, the
largest number of update points was recorded during
the initial explorative phase. As the preferred region
becomes populated and $\hat{s} \to 0$, the algorithm triggers
exploitation, and the number of update points steadily
reduces.

The UPMOPSO algorithm has proven to be very
capable for this specific problem. Figure 67.12b
features the progress of the highest ranked solution (i.e.,
$d_{min}$) as the number of precise evaluations increases. The
reference point criterion is shown to be proficient in
filtering out poorer solutions during exploration, since
only 50 update evaluations are required to reach within
15% of $d_{min}$, and a further 50 evaluations to reach within
3%. Furthermore, no needless evaluations as a result of
the lower-bound prediction are performed during the
exploitation phase. This conclusion is further
complemented by Fig. 67.13a, as a distinct attraction to the
preferred region is clearly visible. A total of 30
non-dominated solutions were identified in the preferred
region, which are shown in Fig. 67.13b.


Fig. 67.12a,b UPMOPSO performance for transonic airfoil
shape optimization. (a) History of precise updates.
(b) Progress of most preferred solution


67.4.3 Post-Optimization 
and Trade-Off Visualization 


The reference point distance also provides a feasible
means of selecting the most appropriate solutions.
For example, solutions may be ranked according to
how well they represent the reference point compromise.
To illustrate this concept, self-organizing maps
(SOMs) [67.44] are utilized to visualize the interaction
of the objectives with the reference point compromise.
SOM clustering techniques are based on unsupervised
artificial neural networks that can classify, organize,
and visualize large sets of data by mapping them from
a high- to a low-dimensional space [67.45].
A neuron used in this SOM analysis is associated
with the weighted vector of $m$ inputs. Each neuron is
connected to its adjacent neurons by a neighborhood
relation and forms a two-dimensional hexagonal
topology [67.45]. The SOM learning algorithm will attempt
to increase the correlation between neighboring neurons
to provide a global representation of all solutions and
their corresponding resemblance to the reference point
compromise.

Each input objective acts as a neuron to the SOM.
The corresponding output measures the reference point
distance (i.e., the resemblance to the reference point
compromise). A two-dimensional representation of the
data is presented in Fig. 67.14, organized by six
SOM-Ward clusters. Solutions that yield negative $d_z$ values
indicate success in the improvement over each
aspiration value. Solutions with positive $d_z$ values do not
surpass each aspiration value but provide significant
improvement in at least one other objective. Each of
the node values represents one possible Pareto-optimal
solution that the designer may select. The SOM chart
colored by $d_z$ is a measure of how far a solution deviates
from the preferred compromise. However, the concept
of the preferred region ensures that only solutions that
slightly deviate from the compromise dictated by $\bar{z}$ are
identified. Following the SOM charts, it is possible to
visualize the preferred compromise between the design
objectives that is obtained. The chart of $d_z$ closely
follows the $f_1$ chart, which suggests that this objective
has the highest priority. If the designer were slightly
more inclined towards another specific design objective,
then solutions that perhaps place more emphasis
on the other objectives should be considered. In this
study, the most preferred solution is ideally selected as
the highest ranked solution according to (67.5).


Fig. 67.13a,b Precise evaluations performed and the
resulting non-dominated solutions. (a) Scatter plot of all
precise evaluations. (b) Preferred region 250 evaluations


Table 67.3 Preferred airfoil objective values with measure
of improvement

Airfoil            f1         f2       f3
NASA-SC(2)0410     0.008708   0.1024   0.189625
Preferred design   0.008106   0.0933   0.168809
% Improvement      6.9        8.8      10.9


Fig. 67.14 SOM charts to visualize optimal trade-offs between the
design objectives. (a) $f_1$, (b) $f_2$, (c) $f_3$, (d) $d_z$


67.4.4 Final Designs 


Table 67.3 shows the objective comparisons with the
NASA-SC(2)0410. Of interest to note is that the most
active objective is $f_1$, since the solution which provides
the minimum $d_z$ value also provides the minimum $f_1$
value. This implies that the reference point was
situated near the $f_1$ Pareto boundary. Of the identified
set of Pareto-optimal solutions, the largest improvements
obtained in objectives $f_2$ and $f_3$ are 36.4 and
91.6%, respectively, over the reference point. The
preferred airfoil geometry is shown in Fig. 67.15a in
comparison with the NASA-SC(2)0410. The preferred
airfoil has a thickness of 9.76% chord and maintains
a moderate curvature over the upper surface.
A relatively small aft curvature is used to generate
the required lift, whilst reducing the pitching
moment.


Fig. 67.15a,b The most preferred solution observed by
the UPMOPSO algorithm. (a) Preferred airfoil geometry.
(b) $C_p$ distributions for $M = 0.79$, $C_l = 0.4$


Fig. 67.16a,b Pressure contours for design condition of
$M = 0.79$, $C_l = 0.4$. (a) NASA-SC(2)0410, (b) Preferred airfoil


Fig. 67.17 Drag rise curves for $C_l = 0.4$ (preferred airfoil,
most robust solution, and NASA-SC(2)0410)
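
The improvement figures of Table 67.3 can be checked with a short calculation. The sketch assumes that improvement means the relative reduction of each minimized objective with respect to the NASA-SC(2)0410 value; the variable names are illustrative.

```python
import math

baseline = {"f1": 0.008708, "f2": 0.1024, "f3": 0.189625}   # NASA-SC(2)0410
preferred = {"f1": 0.008106, "f2": 0.0933, "f3": 0.168809}  # preferred design

def improvement_percent(ref, new):
    """Relative reduction of a minimized objective, in percent."""
    return 100.0 * (ref - new) / ref

improvement = {k: improvement_percent(baseline[k], preferred[k]) for k in baseline}
# One-decimal truncation (not rounding) of each percentage.
truncated = {k: math.floor(10.0 * v) / 10.0 for k, v in improvement.items()}
```

With the tabulated objective values, the computed reductions are about 6.91%, 8.89%, and 10.98%. Truncated to one decimal they match the table (6.9, 8.8, 10.9); rounding would instead give 8.9 and 11.0 for $f_2$ and $f_3$, so the tabulated percentages may have been truncated or derived from unrounded data.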

Performance comparisons between the NASA-SC(2)0410
and the preferred airfoil at the design condition
of $M = 0.79$ can be made from the static pressure
contour output in Fig. 67.16, and the surface pressure
distribution of Fig. 67.15b. The reduction in $C_d$ is
attributed to the significantly weaker shock that appears
slightly upstream of the supercritical shock position.
The reduction in the pitching moment is clearly visible
from the reduced aft loading. Along with the improvement
at the required design condition, the preferred
airfoil exhibits a lower drag rise by comparison, as is
shown in Fig. 67.17. There is a notable decline in the
drag rise at the design condition of $M = 0.79$, and the
drag is recorded as lower than that of the NASA-SC(2)0410,
even beyond the design range. Also visible is the
solution that provides the most robust design (i.e., min $f_3$).
The most robust design is clearly not obtained at the
expense of poor performance at the design condition,
due to the compromising influence of $\bar{z}$. If the designer
were interested in obtaining further alternative solutions
which provide greater improvement in either objective,
it would be sufficient to re-commence (at the current
time step) the optimization process with a larger value
of $\delta$, or by relaxing one or more of the aspiration
values, $\bar{z}_i$.




67.5 Conclusion 


In this chapter, an optimization framework has been
introduced and applied to the aerodynamic design of
transonic airfoils. A surrogate-driven multiobjective
particle swarm optimization algorithm is applied to
navigate the design space to identify and exploit preferred
regions of the Pareto frontier. The integration of all
components of the optimization framework is entirely
achieved through the use of a reference point distance
metric which provides a scalar measure of the preferred
interests of the designer. This effectively allows for the
scale of the design space to be reduced, confining it to
the interests reflected by the designer.

The developmental effort reported here aims to reduce
the often prohibitive computational cost
of multiobjective optimization to the level of practical
affordability in computational aerodynamic design.
A concise parameterization model was considered
to perform the necessary shape modifications in
conjunction with a Reynolds-averaged Navier-Stokes
flow solver. Kriging models were constructed based
on a stratified sample of the design space. A
pre-optimization visualization tool was then applied to
screen variable elementary influence and quantify its
relative influence on the preferred interests of the
designer. Initial design drivers were easily identified and
an insight into the optimization landscape was obtained.
Optimization was achieved by driving a
surrogate-assisted particle swarm towards a sector of special
interest on the Pareto front, which is shown to be an
effective and efficient mechanism. It is observed that
there is a distinct attraction towards the preferred region
dictated by the reference point, which implies that the
reference point criterion is adept at filtering out
solutions that will disrupt or deviate from the optimal search
path.

Non-dominated solutions that provide significant
improvement over the reference geometry were
identified within the computational budget imposed and
are clearly reflective of the preferred interest. A
post-optimization data mining tool was finally applied to
facilitate a qualitative trade-off visualization study. This
analysis provides an insight into the relative priority of
each objective and their influence on the preferred
compromise.


References


67.1 T.E. Labrujére, J.W. Sloof: Computational methods
for the aerodynamic design of aircraft components,
Annu. Rev. Fluid Mech. 51, 183-214 (1993)

67.2 R.M. Hicks, P.A. Henne: Wing design by numerical
optimization, J. Aircr. 15(7), 407-412 (1978)

67.3 J. Kennedy, R.C. Eberhart, Y. Shi: Swarm Intelligence
(Morgan Kaufmann, San Francisco 2001)

67.4 J. Kennedy, R.C. Eberhart: Particle swarm optimization,
Proc. IEEE Intl. Conf. Neural Netw. (1995)
pp. 1942-1948

67.5 I.C. Trelea: The particle swarm optimization algorithm:
Convergence analysis and parameter selection,
Inform. Proces. Lett. 85(6), 317-325 (2003)

67.6 C.M. Fonseca, P.J. Fleming: Multiobjective optimization
and multiple constraint handling with
evolutionary algorithms. A unified formulation,
IEEE Trans. Syst. Man Cybern. A 28(1), 26-37 (1998)

67.7 M. Drela: Pros and cons of airfoil optimization,
Front. Comput. Fluid Dynam. 19, 1-19 (1998)

67.8 W. Li, L. Huyse, S. Padula: Robust airfoil optimization
to achieve drag reduction over a range of
Mach numbers, Struct. Multidiscip. Optim. 24, 38-50 (2002)

67.9 M. Nemec, D.W. Zingg, T.H. Pulliam: Multipoint and
multi-objective aerodynamic shape optimization,
AIAA J. 42(6), 1057-1065 (2004)




68. Ant Colony Optimization for the Minimum-Weight 
Rooted Arborescence Problem 


Christian Blum, Sergi Mateo Bellido 


The minimum-weight rooted arborescence problem
is an NP-hard combinatorial optimization
problem which has important applications, for
example, in computer vision. An example of such
an application is the automated reconstruction of
consistent tree structures from noisy images. In this
chapter, we present an ant colony optimization
approach to tackle this problem. Ant colony
optimization is a metaheuristic which is inspired by
the foraging behavior of ant colonies. By means of
an extensive computational evaluation, we show
that the proposed approach has advantages over
an existing heuristic from the literature, especially
as far as rather dense graphs are concerned.

68.1 Introductory Remarks
68.2 The Minimum-Weight Rooted Arborescence Problem
68.3 DP-HEUR: A Heuristic Approach to the MWRA Problem
68.4 Ant Colony Optimization for the MWRA Problem
68.5 Experimental Evaluation
     68.5.1 Benchmark Instances
     68.5.2 Algorithm Tuning
     68.5.3 Results
68.6 Conclusions and Future Work
References


68.1 Introductory Remarks


Solving combinatorial optimization problems with
approaches from the swarm intelligence field already
has a considerably long tradition. Examples of
such approaches include particle swarm optimization 
(PSO) [68.1] and artificial bee colony (ABC) optimiza- 
tion [68.2]. The oldest — and most widely used — algo- 
rithm from this field, however, is ant colony optimiza- 
tion (ACO) [68.3]. In general, the ACO metaheuristic 
attempts to solve a combinatorial optimization prob- 
lem by iterating the following steps: (1) Solutions to 
the problem at hand are constructed using a pheromone 
model, that is, a parameterized probability distribution 
over the space of all valid solutions, and (2) (some 
of) these solutions are used to change the pheromone 
values in a way that aims to bias subsequent sampling
toward areas of the search space containing
high-quality solutions. In particular, the reinforcement of
solution components depending on the quality of the
solutions in which they appear is an important aspect
of ACO algorithms. It is implicitly assumed that good
solutions consist of good solution components. Learning
which components contribute to good solutions most
often helps to assemble them into better solutions.

In this chapter, ACO is applied to solve the 
minimum-weight rooted arborescence (MWRA) prob- 
lem, which has applications in computer vision such as, 
for example, the automated reconstruction of consistent 
tree structures from noisy images [68.4]. The structure 
of this chapter is as follows. Section 68.2 provides a de- 
tailed description of the problem to be tackled. Then, 
in Sect. 68.3 a new heuristic for the MWRA problem 
is presented which is based on the deterministic con- 
struction of an arborescence of maximal size, and the 
subsequent application of dynamic programming (DP) 
for finding the best solution within this constructed ar- 
borescence. The second contribution is to be found in 
the application of ACO [68.3] to the MWRA prob- 
lem. This algorithm is described in Sect. 68.4. Finally, 
in Sect. 68.5 an exhaustive experimental evaluation of 
both algorithms in comparison with an existing heuris- 
tic from the literature [68.5] is presented. The chapter is 
concluded in Sect. 68.6. 




68.2 The Minimum-Weight Rooted Arborescence Problem 


As mentioned before, in this work we consider the
MWRA problem, which is a generalization of the
problem proposed by Venkata Rao and Sridharan in [68.5,
6]. The MWRA problem can technically be described
as follows. Given is a directed acyclic graph G = (V, A)
with integer weights on the arcs, that is, for each a ∈ A
there exists a corresponding weight w(a) ∈ ℤ. Moreover,
a vertex v_r ∈ V is designated as the root vertex. Let 𝒜
be the set of all arborescences in G that are rooted
in v_r. In this context, note that an arborescence is a
directed, rooted tree in which all arcs point away from
the root vertex (see also [68.7]). Moreover, note that 𝒜
contains all arborescences, not only those with maximal
size. The objective function value (that is, the
weight) f(T) of an arborescence T ∈ 𝒜 is defined as
follows:


$$ f(T) := \sum_{a \in T} w(a) . \tag{68.1} $$


The goal of the MWRA problem is to find an
arborescence T* ∈ 𝒜 such that the weight of T* is
smaller than or equal to the weight of every other
arborescence in 𝒜. In other words, the goal is to
minimize objective function f(·). An example of the
MWRA problem is shown in Fig. 68.1.


Fig. 68.1a,b (a) An input DAG with eight vertices and 14
arcs. The uppermost vertex is the root vertex v_r. (b) The
optimal solution (value: −19), that is, the arborescence
rooted in v_r which has the minimum weight among all
arborescences rooted in v_r that can be found in the
input graph


Fig. 68.2a,b (a) A 2D image of the retina of a human eye.
The problem consists in the automatic reconstruction (or
delineation) of the vascular structure. (b) The
reconstruction of the vascular structure as produced by the
algorithm proposed in [68.4]
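
As a small illustration of the objective function (68.1), the following sketch evaluates the weight of a candidate arborescence and checks that a set of arcs is indeed rooted in the given root. The representation (a dictionary from arc to weight), the function names, and the toy graph are assumptions made for this example only; the toy graph is not the instance of Fig. 68.1.

```python
def weight(tree, arcs):
    """Objective (68.1): sum of w(a) over the arcs a of the arborescence."""
    return sum(arcs[a] for a in tree)


def is_rooted_arborescence(tree, root):
    """True if `tree` (a collection of arcs taken from a DAG) forms an
    arborescence rooted at `root`: the root has no incoming arc, every
    other vertex has exactly one, and every tail belongs to the tree."""
    heads = [head for _, head in tree]
    if root in heads or len(set(heads)) != len(heads):
        return False
    vertices = {root, *heads}
    return all(tail in vertices for tail, _ in tree)


# Hypothetical toy DAG: (tail, head) -> integer weight.
ARCS = {("r", "a"): 5, ("r", "b"): -2, ("a", "c"): -10,
        ("b", "c"): 3, ("b", "d"): 4, ("d", "e"): -1}

candidate = [("r", "a"), ("a", "c"), ("r", "b")]
print(is_rooted_arborescence(candidate, "r"), weight(candidate, ARCS))
# prints: True -7
```

Note that the empty arborescence, consisting only of the root vertex, is also a member of 𝒜 and has weight 0, so the optimal value is never positive.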

The differences from the problem proposed in [68.5]
are as follows. The authors of [68.5] require the root
vertex v_r to have only one single outgoing arc.
Moreover, numbering the vertices from 1 to |V|, the given
acyclic graph G is restricted to contain only arcs a_{i,j}
such that i < j. These restrictions do not apply to the
MWRA problem. Nevertheless, as a generalization of 
the problem proposed in [68.5], the MWRA prob- 
lem is NP-hard. Concerning the existing work, the 
literature only offers the heuristic proposed in [68.5], 
which can also be applied to the more general MWRA 
problem. 

The definition of the MWRA problem as previ- 
ously outlined is inspired by a novel method which 
was recently proposed in [68.4] for the automated re- 
construction of consistent tree structures from noisy 
images, which is an important problem, for example, 
in neuroscience. Tree-like structures, such as dendritic,
vascular, or bronchial networks, are pervasive
in biological systems. Examples are 2D retinal fun- 
dus images and 3D optical micrographs of neurons. 
The approach proposed in [68.4] builds a set of can- 
didate arborescences over many different subsets of 
points likely to belong to the optimal delineation and 
then chooses the best one according to a global ob- 
jective function that combines image evidence with 
geometric priors (Fig. 68.2, for example). The so- 
lution of the MWRA problem (with additional hard 
and soft constraints) plays an important role in this 
process. Therefore, developing better algorithms for 
the MWRA problem may help in composing bet- 
ter techniques for the problem of the automated re- 
construction of consistent tree structures from noisy 
images. 




68.3 DP-HEUR: A Heuristic Approach to the MWRA Problem 


In this section, we propose a new heuristic approach
for solving the MWRA problem. First, starting from
the root vertex v_r, a spanning arborescence T' in G is
constructed as outlined in lines 2-9 of Algorithm 68.1.
Second, a DP algorithm is applied to T' in order
to obtain the minimum-weight arborescence T that
is contained in T' and rooted in v_r. The DP algorithm
from [68.8] is used for this purpose. Given an
undirected tree T = (V_T, E_T) with vertex and/or edge
weights, and any integer number k ∈ [0, |V_T| − 1], this
DP algorithm provides, among all trees with exactly k
edges in T, the minimum-weight tree T*. The first step
of the DP algorithm consists in artificially converting
the input tree T into a rooted arborescence. Therefore,
the DP algorithm can directly be applied to
arborescences. Moreover, as a side product, the DP algorithm
also provides the minimum-weight arborescences for
all l with 0 ≤ l ≤ k, as well as the minimum-weight
arborescences rooted in v_r for all l with 0 ≤ l ≤ k.
Therefore, given an arborescence of maximal size T',
which has |V| − 1 arcs (where V is the vertex set
of the input graph G), the DP algorithm is applied
with k = |V| − 1. Then, among all the minimum-weight
arborescences rooted in v_r for l ≤ |V| − 1, the one with
minimum weight is chosen as the output of the DP
algorithm. In this way, the DP algorithm is able to
generate the minimum-weight arborescence T (rooted in v_r)
which can be found in arborescence T'. The heuristic
described above is henceforth labeled DP-HEUR. As
a final remark, let us mention that for the description
of this heuristic, it was assumed that the input graph is
connected. Appropriate changes have to be applied to
the description of the heuristic if this is not the case.


Algorithm 68.1 Heuristic DP-HEUR for the MWRA problem

 1: input: a DAG G = (V, A), and a root node v_r
 2: T'_0 := (V'_0 = {v_r}, A'_0 = ∅)
 3: A_pos := {a = (v_q, v_l) ∈ A | v_q ∈ V'_0, v_l ∉ V'_0}
 4: for i = 1, ..., |V| − 1 do
 5:    a* = (v_q, v_l) := argmin{w(a) | a ∈ A_pos}
 6:    A'_i := A'_{i−1} ∪ {a*}
 7:    V'_i := V'_{i−1} ∪ {v_l}
 8:    T'_i := (V'_i, A'_i)
 9:    A_pos := {a = (v_q, v_l) ∈ A | v_q ∈ V'_i, v_l ∉ V'_i}
10: end for
11: T := Dynamic_Programming(T'_{|V|−1}, k = |V| − 1)
12: output: arborescence T
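
The two phases of DP-HEUR can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph representation and all function names are hypothetical, and the second phase uses a simple bottom-up recursion in place of the k-cardinality DP algorithm of [68.8]. Because the size of the output is unconstrained here, keeping a child branch exactly when its arc weight plus the best weight below it is negative yields the minimum-weight arborescence rooted in v_r that is contained in T'.

```python
from collections import defaultdict


def greedy_spanning_arborescence(arcs, root):
    """Lines 2-10 of Algorithm 68.1: grow T' from the root by always adding
    the minimum-weight arc that leaves the current vertex set.
    `arcs` maps (tail, head) -> integer weight."""
    visited, tree = {root}, []
    while True:
        a_pos = [a for a in arcs if a[0] in visited and a[1] not in visited]
        if not a_pos:          # no extension possible: T' is complete
            return tree
        best = min(a_pos, key=arcs.get)
        tree.append(best)
        visited.add(best[1])


def best_rooted_subtree(tree, arcs, root):
    """Minimum-weight arborescence rooted at `root` inside the tree T'.
    Simplified stand-in for the DP of [68.8]; recursive, so very deep
    trees may need an iterative rewrite."""
    children = defaultdict(list)
    for tail, head in tree:
        children[tail].append(head)

    def solve(v):
        total, kept = 0, []
        for c in children[v]:
            sub_weight, sub_arcs = solve(c)
            branch = arcs[(v, c)] + sub_weight
            if branch < 0:     # keep the branch only if it lowers the weight
                total += branch
                kept += [(v, c)] + sub_arcs
        return total, kept

    return solve(root)


def dp_heur(arcs, root):
    tree = greedy_spanning_arborescence(arcs, root)
    return best_rooted_subtree(tree, arcs, root)


# Hypothetical toy DAG: (tail, head) -> integer weight.
ARCS = {("r", "a"): 5, ("r", "b"): -2, ("a", "c"): -10,
        ("b", "c"): 3, ("b", "d"): 4, ("d", "e"): -1}
print(dp_heur(ARCS, "r"))   # prints: (-2, [('r', 'b')])
```

On this toy graph the greedy phase reaches vertex c through the arc (b, c) rather than through (a, c), so the heuristic returns weight −2 although an arborescence of weight −7 exists. This illustrates why the result is heuristic: the DP is exact only with respect to the fixed tree T'.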


68.4 Ant Colony Optimization for the MWRA Problem 


The ACO approach for the MWRA problem which is
described in the following is a MAX-MIN Ant System
(MMAS) [68.9] implemented in the hyper-cube
framework (HCF) [68.10]. The algorithm, whose
pseudocode can be found in Algorithm 68.2, works roughly
as follows. At each iteration, a number of n_a solutions
to the problem are probabilistically constructed based
on both pheromone and heuristic information. The
second algorithmic component which is executed at each
iteration is the pheromone update. Hereby, some of
the constructed solutions (that is, the iteration-best
solution T^ib, the restart-best solution T^rb, and the
best-so-far solution T^bs) are used for a modification of
the pheromone values. This is done with the goal of
focusing the search over time on high-quality areas
of the search space. Just like any other MMAS
algorithm, our approach employs restarts consisting of
a re-initialization of the pheromone values. Restarts are
controlled by the so-called convergence factor (cf) and
a Boolean control variable called bs_update. The main
functions of our approach are outlined in detail in the
following.


Algorithm 68.2 Ant Colony Optimization for the MWRA Problem

 1: input: a DAG G = (V, A), and a root node v_r
 2: T^bs := ({v_r}, ∅), T^rb := ({v_r}, ∅), cf := 0, bs_update := false
 3: τ_a := 0.5 for all a ∈ A
 4: while termination conditions not met do
 5:    S := ∅
 6:    for i = 1, ..., n_a do
 7:       T_i := Construct_Solution(G, v_r)
 8:       S := S ∪ {T_i}
 9:    end for
10:    T^ib := argmin{f(T) | T ∈ S}
11:    if T^ib < T^rb then T^rb := T^ib
12:    if T^ib < T^bs then T^bs := T^ib
13:    ApplyPheromoneUpdate(cf, bs_update, 𝒯, T^ib, T^rb, T^bs)
14:    cf := ComputeConvergenceFactor(𝒯)
15:    if cf > 0.99 then
16:       if bs_update = true then
17:          τ_a := 0.5 for all a ∈ A
18:          T^rb := ({v_r}, ∅)
19:          bs_update := false
20:       else
21:          bs_update := true
22:       end if
23:    end if
24: end while
25: output: T^bs, the best solution found by the algorithm
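
The restart control in lines 14-23 of Algorithm 68.2 can be sketched as follows. This is a hedged illustration: the function names and the dictionary representation of the pheromone model are assumptions, and the convergence factor is written in the form commonly used for MMAS in the HCF, as reconstructed from the description of ComputeConvergenceFactor at the end of this section. A solution is represented as a list of arcs, so the root-only arborescence is the empty list.

```python
TAU_MIN, TAU_MAX = 0.01, 0.99   # pheromone bounds stated in this section


def convergence_factor(tau):
    """cf = 2 * (sum_a max(tau_max - tau_a, tau_a - tau_min)
                 / (|T| * (tau_max - tau_min)) - 0.5).
    `tau` maps each arc to its pheromone value."""
    total = sum(max(TAU_MAX - t, t - TAU_MIN) for t in tau.values())
    return 2.0 * (total / (len(tau) * (TAU_MAX - TAU_MIN)) - 0.5)


def restart_step(tau, cf, bs_update, t_rb):
    """Lines 15-23 of Algorithm 68.2. Returns the new (bs_update, t_rb);
    `tau` is re-initialized in place when a restart is triggered."""
    if cf > 0.99:
        if bs_update:
            for a in tau:
                tau[a] = 0.5          # line 17: reset all pheromone values
            return False, []          # lines 18-19: forget restart-best
        return True, t_rb             # line 21: switch to best-so-far updates
    return bs_update, t_rb


tau = {("r", "a"): 0.99, ("r", "b"): 0.01}
cf = convergence_factor(tau)                       # all values at a bound
state = restart_step(tau, cf, False, [("r", "a")])
print(state)   # prints: (True, [('r', 'a')])
```

A restart therefore needs two consecutive convergence events: the first only switches the update to the best-so-far solution, and the second resets the pheromone values.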


Construct_Solution(G, v_r): This function first
constructs a spanning arborescence T' in the way
shown in lines 2-9 of Algorithm 68.1. However, the
choice of the next arc to be added to the current
arborescence at each step (see line 5 of Algorithm 68.1)
is done in a different way. Instead of deterministically
choosing from A_pos the arc which has the smallest
weight value, the choice is done probabilistically,
based on pheromone and heuristic information. The
pheromone model 𝒯 that is used for this purpose
contains a pheromone value τ_a for each arc a ∈ A. The
heuristic information η(a) of an arc a is computed as
follows. First, let


$$ w_{\max} := \max\{ w(a) \mid a \in A \} . \tag{68.2} $$


Based on this maximal weight of all arcs in G, the
heuristic information is defined as follows:


$$ \eta(a) := w_{\max} + 1 - w(a) . \tag{68.3} $$


In this way, the heuristic information of all arcs is a
positive number. Moreover, the arc with minimal weight
will have the highest value concerning the heuristic
information. Given an arborescence T'_i (obtained
after the ith construction step), and the nonempty set of
arcs A_pos that may be used for extending T'_i, the
probability for choosing arc a ∈ A_pos is defined as follows:


$$ p(a \mid T'_i) := \frac{\tau_a \cdot \eta(a)}{\sum_{\hat{a} \in A_{\mathrm{pos}}} \tau_{\hat{a}} \cdot \eta(\hat{a})} . \tag{68.4} $$


However, instead of choosing an arc from A_pos always
in a probabilistic way, the following scheme is applied
at each construction step. First, a value r ∈ [0, 1] is
chosen uniformly at random. Second, r is compared to
a so-called determinism rate δ ∈ [0, 1], which is a fixed
parameter of the algorithm. If r ≤ δ, arc a* ∈ A_pos is
chosen to be the one with the maximum probability,
that is


$$ a^* := \operatorname{argmax}\{ p(a \mid T'_i) \mid a \in A_{\mathrm{pos}} \} . \tag{68.5} $$


Otherwise, that is, when r > δ, arc a* ∈ A_pos is chosen
probabilistically according to the probability values.

The output T of the function Construct_Solution(G, v_r)
is chosen to be the minimum-weight arborescence
which is encountered during the process of
constructing T', that is,


$$ T := \operatorname{argmin}\{ f(T'_i) \mid i = 0, \ldots, |V| - 1 \} . $$
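
The construction step can be sketched in Python as shown below. The function signature, the dictionary representations of the arcs and of the pheromone model, and the toy graph are assumptions made for this illustration. Since the denominator of (68.4) is the same for all arcs in A_pos, the sketch works with the unnormalized products τ_a · η(a); `random.choices` normalizes the weights internally.

```python
import random


def construct_solution(arcs, root, tau, delta, rng=random):
    """Build T' arc by arc and return (weight, arcs) of the lightest
    intermediate arborescence T'_i, i = 0, ..., |V| - 1.
    `arcs` maps (tail, head) -> weight, `tau` maps arc -> pheromone."""
    w_max = max(arcs.values())                             # (68.2)
    eta = {a: w_max + 1 - w for a, w in arcs.items()}      # (68.3)
    visited, tree, weight = {root}, [], 0
    best_weight, best_tree = 0, []      # T'_0: the root only, weight 0
    while True:
        a_pos = [a for a in arcs if a[0] in visited and a[1] not in visited]
        if not a_pos:
            return best_weight, best_tree
        scores = [tau[a] * eta[a] for a in a_pos]   # numerators of (68.4)
        if rng.random() <= delta:                   # deterministic, (68.5)
            arc = a_pos[scores.index(max(scores))]
        else:                                       # proportional sampling
            arc = rng.choices(a_pos, weights=scores, k=1)[0]
        tree.append(arc)
        visited.add(arc[1])
        weight += arcs[arc]
        if weight < best_weight:
            best_weight, best_tree = weight, list(tree)


# Hypothetical toy DAG and uniform initial pheromone values.
ARCS = {("r", "a"): 5, ("r", "b"): -2, ("a", "c"): -10,
        ("b", "c"): 3, ("b", "d"): 4, ("d", "e"): -1}
TAU = {a: 0.5 for a in ARCS}
print(construct_solution(ARCS, "r", TAU, delta=1.0))
# prints: (-2, [('r', 'b')])
```

With δ = 1 and uniform pheromone values, the construction coincides with the greedy choice of Algorithm 68.1; smaller values of δ make the arc choice increasingly random.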


ApplyPheromoneUpdate(cf, bs_update, 𝒯, T^ib,
T^rb, T^bs): The pheromone update is performed in the
same way as in all MMAS algorithms implemented in
the HCF. The three solutions T^ib, T^rb, and T^bs (as
described at the beginning of this section) are used for
the pheromone update. The influence of these three
solutions on the pheromone update is determined by the
current value of the convergence factor cf, which is
defined later. Each pheromone value τ_a ∈ 𝒯 is updated as
follows:


$$ \tau_a := \tau_a + \rho \cdot (\xi_a - \tau_a) , \tag{68.6} $$


where


$$ \xi_a := \kappa_{ib} \cdot \Delta(T^{ib}, a) + \kappa_{rb} \cdot \Delta(T^{rb}, a) + \kappa_{bs} \cdot \Delta(T^{bs}, a) , \tag{68.7} $$


where κ_ib is the weight of solution T^ib, κ_rb the one of
solution T^rb, and κ_bs the one of solution T^bs. Moreover,
Δ(T, a) evaluates to 1 if and only if arc a is a component
of arborescence T. Otherwise, the function evaluates
to 0. Note also that the three weights must be chosen
such that κ_ib + κ_rb + κ_bs = 1. After the application
of (68.6), pheromone values that exceed τ_max = 0.99
are set back to τ_max, and pheromone values that have
fallen below τ_min = 0.01 are set back to τ_min. This
prevents the algorithm from reaching a state of complete
convergence. Finally, note that the exact values of the
weights depend on the convergence factor cf and on
the value of the Boolean control variable bs_update.
The standard schedule as shown in Table 68.1 has been
adopted for our algorithm.


Table 68.1 Setting of κ_ib, κ_rb, and κ_bs depending on the
convergence factor cf and the Boolean control variable
bs_update

         bs_update = FALSE                         bs_update = TRUE
         cf < 0.7    cf ∈ [0.7, 0.9)    cf ≥ 0.9
κ_ib     2/3         1/3                0          0
κ_rb     1/3         2/3                1          0
κ_bs     0           0                  0          1
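
The update rule (68.6)-(68.7) together with the schedule of Table 68.1 can be sketched as follows. Function and parameter names are assumptions for this illustration; solutions are given as collections of arcs, and the pheromone model is a dictionary that is modified in place.

```python
def kappa_weights(cf, bs_update):
    """Schedule of Table 68.1: returns (kappa_ib, kappa_rb, kappa_bs)."""
    if bs_update:
        return 0.0, 0.0, 1.0
    if cf < 0.7:
        return 2 / 3, 1 / 3, 0.0
    if cf < 0.9:
        return 1 / 3, 2 / 3, 0.0
    return 0.0, 1.0, 0.0


def apply_pheromone_update(tau, cf, bs_update, t_ib, t_rb, t_bs, rho,
                           tau_min=0.01, tau_max=0.99):
    """Apply (68.6) to every arc, then clamp to [tau_min, tau_max]."""
    k_ib, k_rb, k_bs = kappa_weights(cf, bs_update)
    ib, rb, bs = set(t_ib), set(t_rb), set(t_bs)
    for a in tau:
        # (68.7): Delta(T, a) is 1 if arc a belongs to T, else 0.
        xi = k_ib * (a in ib) + k_rb * (a in rb) + k_bs * (a in bs)
        tau[a] = min(tau_max, max(tau_min, tau[a] + rho * (xi - tau[a])))


tau = {("r", "a"): 0.5, ("r", "b"): 0.5}
apply_pheromone_update(tau, cf=0.0, bs_update=False,
                       t_ib=[("r", "b")], t_rb=[("r", "b")], t_bs=[], rho=0.2)
print(tau)   # ("r", "b") moves toward 1, ("r", "a") moves toward 0
```

Because ξ_a lies in [0, 1] and the weights sum to 1, each update moves τ_a a fraction ρ of the way toward ξ_a, which is the reason the pheromone values stay inside the unit hyper-cube.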
ComputeConvergenceFactor(𝒯): The convergence
factor (cf) is computed on the basis of the
pheromone values


$$ cf := 2 \left( \left( \frac{\sum_{\tau_a \in \mathcal{T}} \max\{ \tau_{\max} - \tau_a ,\, \tau_a - \tau_{\min} \}}{|\mathcal{T}| \cdot (\tau_{\max} - \tau_{\min})} \right) - 0.5 \right) . $$


This results in cf = 0 when all pheromone values are set
to 0.5. On the other side, when all pheromone values
have either value τ_min or τ_max, then cf = 1. In all other
cases, cf has a value in (0, 1). This completes the
description of all components of the proposed algorithm,
which is henceforth labeled ACO.


68.5 Experimental Evaluation


The algorithms proposed in this chapter (that is,
DP-HEUR and ACO) were implemented in ANSI C++
using GCC 4.4 for compiling the software. Moreover,
the heuristic proposed in [68.5] was reimplemented. As
mentioned before, this heuristic, henceforth labeled
VENSRI, is the only existing algorithm which can
directly be applied to the MWRA problem. All three
algorithms were experimentally evaluated on a cluster of
PCs equipped with Intel Xeon X3350 processors with
2667 MHz and 8 GB of memory. In the following, we
first describe the set of benchmark instances that have
been used to test the three algorithms. Afterward, the
algorithm tuning and the experimental results are
described in detail.


68.5.1 Benchmark Instances


A diverse set of benchmark instances was generated
in the following way. Three parameters are necessary
for the generation of a benchmark instance G = (V, A).
Hereby, n and m indicate, respectively, the number of
vertices and the number of arcs of G, while q ∈ [0, 1]
indicates the probability for the weight of any arc to be
positive (rather than negative). The process of the
generation of an instance starts by constructing a random
arborescence T with n vertices. The root vertex of T is
called v_r. Each of the remaining m − n + 1 arcs was
generated by randomly choosing two vertices v_i and v_j, and
adding the corresponding arc a = (v_i, v_j) to T. In this
context, a = (v_i, v_j) may be added to T if and only if
its addition produces no directed cycle, and neither
(v_i, v_j) nor (v_j, v_i) already forms part of the graph. The
weight of each arc was chosen by, first, deciding with
probability q if the weight is to be positive (or
nonpositive). In the case of a positive weight, the weight value
was chosen uniformly at random from [1, 100], while in
the case of a nonpositive weight, the weight value was
chosen uniformly at random from [−100, 0].

In order to generate a diverse set of benchmark 
instances, the following values for n, m, and q were con- 
sidered: 


• n ∈ {20, 50, 100, 500, 1000, 5000};
• m ∈ {2n, 4n, 6n};
• q ∈ {0.25, 0.5, 0.75}.


For each combination of n, m, and q, a total of 10 
problem instances were generated. This resulted in a to- 
tal of 540 problem instances, that is, 180 instances for 
each value of q. 
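
The generation procedure described above can be sketched as follows. Several details are assumptions made for this illustration because the text does not specify them: the random arborescence is built by attaching each new vertex to a uniformly chosen earlier vertex, vertex 0 plays the role of the root v_r, and cycle creation is detected with a depth-first reachability test. The function name and the `seed` parameter are likewise hypothetical.

```python
import random


def generate_instance(n, m, q, seed=None):
    """Return a dict (tail, head) -> integer weight describing a DAG with
    n vertices (0 is the root) and m arcs; weights are positive with
    probability q."""
    if not n - 1 <= m <= n * (n - 1) // 2:
        raise ValueError("m must lie between n - 1 and n(n - 1)/2")
    rng = random.Random(seed)
    succ = {v: set() for v in range(n)}
    arcs = {}

    def draw_weight():
        if rng.random() < q:
            return rng.randint(1, 100)
        return rng.randint(-100, 0)

    def reaches(src, dst):
        stack, seen = [src], {src}
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            for u in succ[v] - seen:
                seen.add(u)
                stack.append(u)
        return False

    def add_arc(u, v):
        succ[u].add(v)
        arcs[(u, v)] = draw_weight()

    for v in range(1, n):                  # random spanning arborescence
        add_arc(rng.randrange(v), v)
    while len(arcs) < m:                   # remaining m - n + 1 arcs
        u, v = rng.sample(range(n), 2)
        # Reject duplicates, reversed duplicates, and arcs closing a cycle.
        if (u, v) in arcs or (v, u) in arcs or reaches(v, u):
            continue
        add_arc(u, v)
    return arcs


instance = generate_instance(n=20, m=40, q=0.5, seed=1)
print(len(instance))   # prints: 40
```

Rejection sampling is adequate for the arc densities used here (m ≤ 6n); for graphs close to the maximum number of arcs a different sampling scheme would be preferable.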


68.5.2 Algorithm Tuning 


The proposed ACO algorithm has several parameters 
that require appropriate values. The following parame- 
ters, which are crucial for the working of the algorithm, 
were chosen for tuning: 


• n_a ∈ {3, 5, 10, 20}: the number of ants (solution
  constructions) per iteration;

• ρ ∈ {0.05, 0.1, 0.2}: the learning rate;

• δ ∈ {0.0, 0.4, 0.7, 0.9}: the determinism rate.


We chose the first problem instance (out of 10 problem
instances) for each combination of n, m, and q for
tuning. A full factorial design was utilized. This means
that ACO was applied exactly once, with each of the
4 · 3 · 4 = 48 parameter combinations, to each of the
problem instances chosen for tuning. The stopping
criterion was fixed to 20 000 solution evaluations for each
application of ACO. For analyzing the results, we used
a rank-based analysis. However, as the set of problem
instances is quite diverse, this rank-based analysis was
performed separately for six subsets of instances. For
defining these subsets, we refer to the instances with
n ∈ {20, 50, 100} as small instances, and the remaining
ones as large instances. With this definition, each of the
three subsets of instances concerning the three different
values for q was further separated into two subsets
concerning the instance size. For each of these six
subsets, we used the parameter setting with which ACO
achieved the best average rank for the corresponding
tuning instances. These parameter settings are given in
Table 68.2.
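
The full factorial design and a rank-based comparison can be sketched as follows. The text does not state how ties are ranked, so the sketch assumes that a configuration's rank on an instance is one plus the number of configurations with a strictly better (smaller) objective value; the function name and the layout of `results` are also assumptions.

```python
from itertools import product

N_ANTS = (3, 5, 10, 20)
RHO = (0.05, 0.1, 0.2)
DELTA = (0.0, 0.4, 0.7, 0.9)

# Full factorial design: every combination of the three parameter values.
CONFIGS = list(product(N_ANTS, RHO, DELTA))


def average_ranks(results):
    """`results[config][instance]` holds the best objective value found
    (lower is better). Returns config -> average rank over instances."""
    configs = list(results)
    instances = list(next(iter(results.values())))
    avg = {}
    for c in configs:
        ranks = [1 + sum(results[o][i] < results[c][i] for o in configs)
                 for i in instances]
        avg[c] = sum(ranks) / len(ranks)
    return avg


# Made-up values for two configurations on two instances, for illustration.
toy = {(20, 0.2, 0.7): {"inst1": -12, "inst2": -5},
       (3, 0.05, 0.0): {"inst1": -10, "inst2": -5}}
print(len(CONFIGS), average_ranks(toy))
# prints: 48 {(20, 0.2, 0.7): 1.0, (3, 0.05, 0.0): 1.5}
```

The configuration with the smallest average rank within a subset of tuning instances would then be selected for that subset.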


68.5.3 Results 


The three algorithms considered for the comparison
were applied exactly once to each of the 540 problem
instances of the benchmark set. Although ACO is
a stochastic search algorithm, this is a valid choice,
because results are averaged over groups of instances that
were generated with the same parameters. As in the
case of the tuning experiments, the stopping criterion
for ACO was fixed to 20 000 solution evaluations.
Tables 68.3-68.5 present the results averaged, for each
algorithm, over the 10 instances for each combination
of n and m (as indicated in the first two table columns).
Four table columns are used for presenting the results
of each algorithm. The column with heading value
provides the average of the objective function values of
the best solutions found by the respective algorithm for
the 10 instances of each combination of n and m. The
second column (with heading std) contains the
corresponding standard deviation. The third column (with
heading size) indicates the average size (in terms of the
number of arcs) of the best solutions found by the
respective algorithm (remember that solutions, that is,
arborescences, may have any number of arcs between
0 and |V| − 1, where |V| is the number of vertices of the
input DAG G = (V, A)). Finally, the fourth column (with
heading time (s)) contains the average computation time
(in seconds). For all three algorithms, the computation time
indicates the time of the algorithm termination. In the
case of ACO, an additional table column (with heading
evals) indicates at which solution evaluation, on
average, the best solution of a run was found. Finally, for
each combination of n and m, the result of the
best-performing algorithm is indicated in bold font.


Table 68.2 Parameter setting (concerning ACO) used for
the final experiments

                   q = 0.25     q = 0.5      q = 0.75
Small instances    n_a = 20     n_a = 20     n_a = [unreadable]
                   ρ = 0.2      ρ = 0.2      ρ = 0.05
                   δ = 0.7      δ = 0.7      δ = 0.4
Large instances    n_a = 20     n_a = 20     n_a = 20
                   ρ = 0.2      ρ = 0.2      ρ = 0.2
                   δ = 0.9      δ = 0.9      δ = 0.9


Fig. 68.3a-c Average improvement (in %) of ACO and
DP-HEUR over VENSRI for (a) q = 0.25, (b) q = 0.5, and
(c) q = 0.75. Positive values correspond to
an improvement, while negative values indicate that the
respective algorithm is inferior to VENSRI. The
improvement is shown for the three different arc-densities that are
considered in the benchmark set, that is, m = 2n, m = 4n,
and m = 6n




Table 68.3 Experimental results for the 180 instances with q = 0.25. ACO is
compared to the heuristic proposed in this work (DP-HEUR) and the algorithm
from [68.5] (VENSRI)

Columns: n | m | DP-HEUR: Value, Std, Size, Time (s) | VENSRI: Value, Std, Size, Time (s) | ACO: Value, Std, Size, Evals, Time (s)

[Table body not recoverable from the source scan]


Table 68.4 Experimental results for the 180 instances with q = 0.5. ACO is
compared to the heuristic proposed in this work (DP-HEUR) and the algorithm
from [68.5] (VENSRI)

[Table body: one row per combination of n ∈ {20, 50, 100, 500, 1000, 5000} and
m ∈ {2n, 4n, 6n}; columns Value, Std, Size, and Time (s) for each of DP-HEUR,
VENSRI, and ACO, plus Evals for ACO]




Table 68.5 Experimental results for the 180 instances with q = 0.75. ACO is
compared to the heuristic proposed in this work (DP-HEUR) and the algorithm
from [68.5] (VENSRI)

[Table body: one row per combination of n ∈ {20, 50, 100, 500, 1000, 5000} and
m ∈ {2n, 4n, 6n}; columns Value, Std, Size, and Time (s) for each of DP-HEUR,
VENSRI, and ACO, plus Evals for ACO]




[Figure 68.4, panels a) q = 0.25, b) q = 0.5, c) q = 0.75: bar charts of the
average solution size (axis from 0 to 5000 arcs) of DP-HEUR, VENSRI, and ACO
for n ∈ {20, 50, 100, 500, 1000, 5000} and m ∈ {2n, 4n, 6n}]

Fig. 68.4 These graphics show, for each combination of n and m,
information about the average size (in terms of the number of
arcs) of the solutions produced by DP-HEUR, ACO, and VENSRI

Concerning the 180 instances with q = 0.25, the
results allow us to make the following observations.
First, ACO is the best-performing algorithm for all
combinations of n and m. Averaged over all problem
instances, ACO obtains an improvement of 29.8%
over VENSRI. Figure 68.3a shows the average
improvement of ACO over VENSRI for the three groups
of input instances with different arc densities.
It is interesting to observe that the advantage
of ACO over VENSRI seems to grow when the arc
density increases. On the downside, these improvements
are obtained at the cost of a significantly
increased computation time. Concerning the heuristic
DP-HEUR, we can observe that it improves over
VENSRI for all combinations of n and m, apart from
(n = 100, m = 2n) and (n = 500, m = 2n). This seems
to indicate that, also for DP-HEUR, the sparse
instances pose more of a challenge than the dense
instances. Averaged over all problem instances,
DP-HEUR obtains an improvement of 18.6% over VENSRI.
The average improvement of DP-HEUR over VENSRI
for the three groups of input instances with
different arc densities is also shown in Fig. 68.3a.
Concerning the computation times, we can state
that DP-HEUR has a clear advantage over VENSRI,
especially for large problem instances.

Concerning the remaining 360 instances (q = 0.5 
and q = 0.75), we can make the following additional 
observations. First, both ACO and DP-HEUR seem to 
experience a downgrade in performance (in compari- 
son to the performance of VENSRI) when q increases. 
This holds especially for rather large and rather sparse 
graphs. While both algorithms still obtain an aver- 
age improvement over VENSRI in the case of q = 
0.5 — that is, 19.9% improvement in the case of ACO 
and 7.3% in the case of DP-HEUR — both algorithms 
are on average inferior to VENSRI in the case of 
q = 0.75. 

Finally, Fig. 68.4 presents the information which 
is contained in column size of Tables 68.3-68.5 in 
graphical form. It is interesting to observe that the 
solutions produced by DP-HEUR consistently seem 
to be the smallest ones, while the solutions pro- 
duced by VENSRI seem generally to be the largest 
ones. The size of the solutions produced by ACO 
is generally in between these two extremes. More- 
over, with growing q the difference in solution size 
as produced by the three algorithms seems to be 
more pronounced. We currently have no explana- 
tion for this aspect, which certainly deserves further 
examination. 




68.6 Conclusions and Future Work 


In this work, we have proposed a heuristic and an ACO
approach for the minimum-weight rooted arborescence
problem. The heuristic makes use of dynamic programming
as a subordinate procedure. Therefore, it may be
regarded as a hybrid algorithm. In contrast, the
proposed ACO algorithm is a pure metaheuristic approach.
The experimental results show that both approaches are
superior to an existing heuristic from the literature in
those cases in which the number of arcs with positive
weights is not too high and in the case of rather dense
graphs. However, as far as sparse graphs with a rather
large fraction of positive weights are concerned, the
existing heuristic from the literature seems to have
advantages over the algorithms proposed in this chapter.

Concerning future work, we plan to develop a hybrid
ACO approach which makes use of dynamic programming
as a subordinate procedure, in a way similar
to the proposed heuristic. Moreover, we plan to
implement an integer programming model for the tackled
problem, in the line of the model proposed in [68.11]
for a related problem, and to solve the model with an
efficient integer programming solver.


References


68.1 R. Poli, J. Kennedy, T. Blackwell: Particle swarm
optimization - an overview, Swarm Intell. 1(1), 33-57
(2007)

68.2 M.F. Tasgetiren, Q.-K. Pan, P.N. Suganthan,
A.H.-L. Chen: A discrete artificial bee colony algorithm
for the total flowtime minimization in permutation
flow shops, Inf. Sci. 181(16), 3459-3475 (2011)

68.3 M. Dorigo, T. Stützle: Ant Colony Optimization (MIT,
Cambridge 2004)

68.4 E. Türetken, G. González, C. Blum, P. Fua: Automated
reconstruction of dendritic and axonal trees
by global optimization with geometric priors,
Neuroinformatics 9(2/3), 279-302 (2011)

68.5 V. Venkata Rao, R. Sridharan: Minimum-weight
rooted not-necessarily-spanning arborescence
problem, Networks 39(2), 77-87 (2002)

68.6 V. Venkata Rao, R. Sridharan: The minimum weight
rooted arborescence problem: Weights on arcs
case, IIMA Working Papers WP1992-05-01_01106
(Indian Institute of Management Ahmedabad,
Research and Publication Department, Ahmedabad
1992)

68.7 W.T. Tutte: Graph Theory (Cambridge Univ. Press,
Cambridge 2001)

68.8 C. Blum: Revisiting dynamic programming for finding
optimal subtrees in trees, Eur. J. Oper. Res.
177(1), 102-114 (2007)

68.9 T. Stützle, H.H. Hoos: MAX-MIN ant system,
Future Gener. Comput. Syst. 16(8), 889-914
(2000)

68.10 C. Blum, M. Dorigo: The hyper-cube framework for
ant colony optimization, IEEE Trans. Syst. Man
Cybern. B 34(2), 1161-1172 (2004)

68.11 C. Duhamel, L. Gouveia, P. Moura, M. Souza: Models
and heuristics for a minimum arborescence problem,
Networks 51(1), 34-47 (2008)




69. An Intelligent Swarm of Markovian Agents 


Dario Bruneo, Marco Scarpa, Andrea Bobbio, Davide Cerotti, Marco Gribaudo 


We define a Markovian agent model (MAM) as an 
analytical model formed by a spatial collection of 
interacting Markovian agents (MAs), whose prop- 
erties and behavior can be evaluated by numerical 
techniques. MAMs have been introduced with the 
aim of providing a flexible and scalable frame- 
work for distributed systems of interacting objects, 
where both the local properties and the interac- 
tions may depend on the geographical position. 
MAMs can be proposed to model biologically in- 
spired systems since they are suited to cope with 
the four common principles that govern swarm 
intelligence: positive feedback, negative feedback, 
randomness, and multiple interactions. In the 
present work, we report some results of a MAM for 
a wireless sensor network (WSN) routing protocol 
based on swarm intelligence, and some prelim- 
inary results in utilizing MAs for very basic ant 
colony optimization (ACO) benchmarks. 


69.1 Swarm Intelligence: A Modeling Perspective
69.2 Markovian Agent Models
     69.2.1 Mathematical Formulation
69.3 A Consolidated Example: WSN Routing
     69.3.1 A Swarm Intelligence Based Routing
     69.3.2 The MAM Model
     69.3.3 Numerical Results
69.4 Ant Colony Optimization
     69.4.1 The MAM Model
     69.4.2 Numerical Results for ACO Double Bridge Experiment
69.5 Conclusions
References


69.1 Swarm Intelligence: A Modeling Perspective 


Swarm intelligence (SI) algorithms are variously inspired
by the way in which colonies of biological organisms
self-organize to produce a wide diversity of
functions [69.1, 2]. Individuals of the colony have a limited
knowledge of the overall behavior of the system and
follow a small set of rules that may be influenced by the
interaction with other individuals or by modifications
produced in the environment. The collective behavior of
large groups of relatively simple individuals, interacting
only locally with a few neighboring elements, produces
global patterns. Although many approaches have been
proposed that differ in many respects, four basic
common principles that govern SI have been isolated:


• Positive feedback
• Negative feedback
• Randomness
• Multiple interactions.


The same four principles also govern a class of al- 
gorithms inspired by the expansion dynamics of slime 
molds in the search for food [69.3, 4], that have been 
utilized as the base for the generation of routing proto- 
cols in wireless sensor networks (WSNs). 

Through the adoption of the above four principles, 
it is possible to design distributed, self-organizing, and 
fault tolerant algorithms able to self-adapt to the en- 
vironmental changes, that present the following main 
properties [69.1]: 


i) Single individuals are assumed to be simple with 
low computational intelligence and communication 
capabilities. 




ii) Individuals communicate indirectly, through modi- 
fication of the environment (this property is known 
as stigmergy [69.2]). 

iii) The range of the interaction may be very short; nev- 
ertheless, a robust global behavior emerges from the 
interaction of the nodes. 

iv) The global behavior adapts to topological and envi- 
ronmental changes. 


The usual way to study such systems is through
simulation, because the large number of individuals
involved leads to the well-known state explosion
problem. Analytical techniques are preferable if,
starting from the peculiarities of SI systems, they make it
possible to build effective and scalable models. Along this line,
new stochastic entities, called Markovian agents (MAs) 
[69.5,6] have been introduced with the aim of pro- 
viding a flexible, powerful, and scalable technique for 
modeling complex systems of distributed interacting 
objects, for which feasible analytical and numerical so- 
lution algorithms can be implemented. Each object has 
its own local behavior that can be modified by the mu- 
tual interdependences with the other objects. MAs are 
scattered over a geographical area and retain their spa- 
tial position so that the local behavior and the mutual 
interdependencies may be related to their geographical 
positions and other features like the transmittance char- 
acteristics of the interposed medium. MAs are modeled 
by a discrete-state continuous-time finite Markov chain 
(CTMC) whose infinitesimal generator is influenced by 
the interaction with other MAs. The interaction among 
agents is represented by a message passing model com- 
bined with a perception function. When residing in 
a state or during a transition, an MA is allowed to 
send messages that are perceived by the other MAs, 
according to a spatial-dependent perception function, 
modifying their behavior. Messages may model real
physical messages (as in WSNs) or simply the mutual
influences of an MA over the other ones.

The flexibility of the MA representation, the spatial
dependency, and the mutual interaction through message
passing and perception function make MA models
suited to cope with various biologically inspired
mechanisms governed by the four aforementioned principles.
In fact, the MAM, whose constituent elements are the
MAs, was specifically studied to cope with the following
needs [69.6]:

i) Provide analytical models that can be solved by
numerical techniques, thus avoiding the need for long
and expensive simulation runs.

ii) Provide a flexible and scalable modeling framework
for distributed systems of interacting objects.

iii) Provide a framework in which local properties can
be coupled with global properties.

iv) Allow local and global properties and interactions to
depend on the position of the objects in the space
(space-sensitive models).

v) Provide a solution algorithm that self-adapts to
variations in the system topology and in the interaction
mechanisms.

Interactive Markovian agents were first introduced
in [69.5, 7] for single-class MAs and then
extended to multiclass, multimessage Markovian agent
models in subsequent works [69.8-10]. In [69.9, 11, 12],
MAs have been applied to routing algorithms in WSNs,
adopting SI principles [69.13].

This work describes the structure of MAMs and
the numerical solution algorithms in Sect. 69.2. Then,
applications derived from biological models are
presented: a swarm intelligent algorithm for routing
protocols in WSNs (Sect. 69.3) and a simple ant colony
optimization (ACO) example (Sect. 69.4).


69.2 Markovian Agent Models


The structure of a single MA is represented in Fig. 69.1.
States i, j, ..., k are the states of the CTMC representing
the MA. The transitions among the states are of two
possible types and are drawn in different ways:

• Solid lines (like the transition from i to j or the
self-loops in i or in j) indicate the fixed component
of the infinitesimal generator and represent the
local or autonomous behavior of the object that is
independent of the interaction with the other MAs
(like, for instance, the time-to-failure distribution,
or the reaction to an external stimulus). Note that
we also include self-loop transitions in the
representation; they require a particular notation since they
are not visible in the infinitesimal generator of the
CTMC [69.14].

• Dashed lines (like the transition from i to k or the
transitions entering into i or j from other states



not shown in the figure) represent the transitions 
induced by the interaction with the other MAs. 
The way in which the rates of the induced tran- 
sitions are computed is explained in the following 
section. 


During a local transition (or a self-loop) an MA can
emit a message of any type with an assigned probability,
as represented by the dotted arrows in Fig. 69.1
emerging from the solid transitions. The pair $(g_{ij}, m)$
denotes both the message generation probability and
the message type. Messages generated by an MA may
be perceived by other MAs with a given probability,
according to a suitable perception function, and the
interaction mechanism between emitted messages and
perceived messages generates the induced transitions
(dashed lines). The pair $(m, a_{ik})$ denotes both the type
of the perceived message and the corresponding
acceptance probability.

An MAM is a collection of interacting MAs defined
over a geographical space $\mathcal{V}$. Given a position v inside
$\mathcal{V}$, $\rho(v)$ denotes the density of MAs in v. According
to the definition of the density $\rho(v)$, we can classify
an MAM with the following taxonomy:

• An MAM is static if $\rho(v)$ does not depend on time,
and dynamic if it does depend on time.

• An MAM is discrete if the geographical area on
which the MAs are deployed is discretized and $\rho(v)$
is a discrete function of the space, and continuous
if $\rho(v)$ is a continuous function of the space.


Further, MAs may belong to a single class or to
different classes with different local behaviors and
interaction capabilities, and messages may belong to
different types where each type induces a different effect
on the interaction mechanism. The perception function
describes how a message of a given type emitted by an
MA of a given class in a given position in the space
is perceived by an MA of a given class in a different
position.

Fig. 69.1 Schematic structure of a Markovian agent


69.2.1 Mathematical Formulation 


A multiple agent class, multiple message type MAM is
defined by the tuple [69.12]

$$\mathrm{MAM} = \{\mathcal{C}, \mathcal{M}, \mathcal{V}, \mathcal{U}, \mathcal{R}\}, \tag{69.1}$$

where $\mathcal{C} = \{1, \ldots, C\}$ is the set of agent classes. We
denote with $MA^c$ an agent of class $c \in \mathcal{C}$. $\mathcal{M} = \{1, \ldots, M\}$
is the set of message types. Each agent (independently
of its class) can send or receive messages of type
$m \in \mathcal{M}$. $\mathcal{V}$ is the finite space over which Markovian agents
are spread. $\mathcal{U} = \{u^1(\cdot), \ldots, u^M(\cdot)\}$ is a set of M
perception functions (one for each message type).
$\mathcal{R} = \{\rho^1(\cdot), \ldots, \rho^C(\cdot)\}$ is a set of C agent density functions
(one for each agent class). Each agent $MA^c$ of class c
is characterized by a state space with $n_c$ states, and it is
defined by the tuple

$$MA^c = \{Q^c(v), \Lambda^c(v), G^c(v, m), A^c(v, m), \pi_0^c(v)\}, \tag{69.2}$$

where $Q^c(v)$ is the local component of the infinitesimal
generator; $\Lambda^c(v)$ is the vector of the self-jump transition
rates; $G^c(v, m)$ is the matrix containing the probabilities
of generating a message of type m; $A^c(v, m)$ is the
matrix containing the probabilities of accepting a message
of type m; $\pi_0^c(v)$ is the initial probability vector.

Note that even if the structure of the CTMC associated
with each MA of a given class is the same for all the
objects, the values of the parameters may depend on
position v and, therefore, may vary among MAs belonging
to the same class.

An MAM can be analyzed by solving a set of coupled
differential equations. Let us call $\rho_i^c(t, v)$ the density of
agents of class c, in state i, located in position v at time
t. In the following, we will focus on static MAMs, thus
assuming that the total density of agents in position v
remains constant over time; we have that

$$\sum_{i=1}^{n_c} \rho_i^c(t, v) = \rho^c(v), \quad \forall v, \; \forall t \geq 0. \tag{69.3}$$

We collect the state densities into a vector
$\rho^c(t, v) = [\rho_i^c(t, v)]$ and we are interested in computing the
transient evolution of $\rho^c(t, v)$.

From the above definitions, we can compute the
total rate $\beta_j^c(v, m)$ at which messages of type m are
generated by an agent of class c in state j in position v

$$\beta_j^c(v, m) = \lambda_j^c(v)\, g_{jj}^c(v, m) + \sum_{k \neq j} q_{jk}^c(v)\, g_{jk}^c(v, m), \tag{69.4}$$

where the first term on the right-hand side is the
contribution of the messages of type m emitted during
a self-loop from j and the second term is the contribution
of messages of type m emitted during a transition
from j to any k (≠ j).
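The emission rate in (69.4) can be illustrated with a minimal Python sketch for a single agent class, a single position, and one message type. The function name `emission_rates` and the two-state agent below are hypothetical and chosen only for illustration; the lists `Q`, `lam`, and `G` stand for $Q^c(v)$, $\Lambda^c(v)$, and $G^c(v, m)$.

```python
def emission_rates(Q, lam, G):
    """Return beta[j] = lam[j]*G[j][j] + sum_{k != j} Q[j][k]*G[j][k], as in (69.4).

    Q   : local generator matrix (off-diagonal entries are transition rates)
    lam : self-loop rates, one per state
    G   : message generation probabilities for one message type m
    """
    n = len(Q)
    beta = []
    for j in range(n):
        self_loop = lam[j] * G[j][j]                       # messages emitted on the self-loop
        jumps = sum(Q[j][k] * G[j][k] for k in range(n) if k != j)  # messages emitted on jumps
        beta.append(self_loop + jumps)
    return beta


# Hypothetical two-state agent: state 0 always emits on its self-loop (rate 3),
# state 1 emits with probability 0.25 when it jumps to state 0 (rate 2).
Q = [[-0.5, 0.5], [2.0, -2.0]]
lam = [3.0, 0.0]
G = [[1.0, 0.0], [0.25, 0.0]]
print(emission_rates(Q, lam, G))  # [3.0, 0.5]
```
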

The interdependences among MAs are ruled by a set
of perception functions whose general form is

$$u^m(c, v, i, c', v', j). \tag{69.5}$$

The perception function $u^m(\cdot)$ in (69.5) represents how
an MA of class c in position v in state i perceives the
messages of type m emitted by an MA of class c' in
position v' in state j. The functional form of $u^m(\cdot)$
identifies the perception mechanisms and must be specified
for any given application since it determines how an
MA is influenced by the messages emitted by the other
MAs. The transition rates of the induced transitions are
primarily determined by the structure of the perception
function.

A pictorial and intuitive representation of how the
perception function $u^m(c, v, i, c', v', j)$ acts is given in
Fig. 69.2. The MA in the top right portion of the figure
in position v' broadcasts a message of type m from state
j that propagates in the geographical area until it reaches
the MA in the bottom left portion of the figure in
position v and in state i. Upon acceptance of the message
according to the acceptance probability $a_{ik}^c(v, m)$, an
induced transition from state i to state k (represented by
a dashed line) is triggered in the model.


With the above definitions we are now in the
position to compute the components of the infinitesimal
generator of an MA that depend on the interaction with
the other MAs and that constitute the original and
innovative part of the approach.

We define $\gamma_{ii}^c(t, v, m)$ as the total rate at which
messages of type m coming from the whole volume $\mathcal{V}$ are
perceived by an MA of class c in state i in location v:

$$\gamma_{ii}^c(t, v, m) = \int_{\mathcal{V}} \sum_{c'=1}^{C} \sum_{j=1}^{n_{c'}} u^m(c, v, i, c', v', j)\, \beta_j^{c'}(v', m)\, \rho_j^{c'}(t, v')\, dv', \tag{69.6}$$

where $\gamma_{ii}^c(t, v, m)$ is computed by taking into account
the total rate of messages of type m emitted by all
the MAs in state j and in a given position v' (the
term $\beta_j^{c'}(v', m)$) times the density of MAs in v' (the
term $\rho_j^{c'}(t, v')$) times the perception function (the term
$u^m(c, v, i, c', v', j)$), summed over all the possible states
j and classes c' of each MA and integrated over the whole
space $\mathcal{V}$. From an MA of class c in position v and in
state i, an induced transition to state k (drawn as a dashed
line) is triggered with rate $\gamma_{ii}^c(t, v, m)\, a_{ik}^c(v, m)$, where
$a_{ik}^c(v, m)$ is the appropriate entry of the acceptance
matrix $A^c(v, m)$.

We collect the rates (69.6) in a diagonal matrix
$\Gamma^c(t, v, m) = \mathrm{diag}(\gamma_{ii}^c(t, v, m))$. This matrix can be
used to compute $K^c(t, v)$, the infinitesimal generator of
a class c agent at position v at time t

$$K^c(t, v) = Q^c(v) + \sum_{m} \Gamma^c(t, v, m)\,[A^c(v, m) - I]. \tag{69.7}$$

The first term on the right-hand side is the local
transition rate matrix and the second term contains the rates
induced by the interactions.
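As an illustration of (69.7), the following Python sketch assembles the generator of one agent at one position from the local generator and the perceived message rates. The function name `induced_generator` and the numerical values are hypothetical. The sketch also assumes that each row of the acceptance matrix sums to one, with the diagonal entry read as the probability of ignoring a perceived message, so that the rows of the resulting matrix sum to zero.

```python
def induced_generator(Q, gammas, accepts):
    """Return K = Q + sum_m Gamma_m (A_m - I), with Gamma_m = diag(gammas[m]).

    Q       : local generator (n x n list of lists)
    gammas  : one list of perceived rates per message type (length n each)
    accepts : one acceptance matrix per message type (n x n each)
    """
    n = len(Q)
    K = [row[:] for row in Q]
    for gamma, A in zip(gammas, accepts):
        for i in range(n):
            for k in range(n):
                identity = 1.0 if i == k else 0.0
                K[i][k] += gamma[i] * (A[i][k] - identity)
    return K


# Hypothetical two-state agent with a single message type:
# state 0 ignores messages, state 1 moves to state 0 when it accepts one.
Q = [[-1.0, 1.0], [0.0, 0.0]]
gammas = [[0.0, 2.0]]                  # messages are perceived only in state 1
accepts = [[[1.0, 0.0], [1.0, 0.0]]]
K = induced_generator(Q, gammas, accepts)
print(K)  # [[-1.0, 1.0], [2.0, -2.0]]
```

The local part contributes the transition from state 0 to state 1, while the induced part adds the transition from state 1 back to state 0 at the perceived rate.
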


Fig. 69.2 Message passing mecha- 
nism ruled by a perception function 




The evolution of the entire model can be studied by
solving, for all v and c, the following differential equations:

$$\rho^c(0, v) = \rho^c(v)\, \pi_0^c(v), \tag{69.8}$$

$$\frac{d\rho^c(t, v)}{dt} = \rho^c(t, v)\, K^c(t, v). \tag{69.9}$$

From the density of agents in each state, we can
compute the probability of finding a class c agent at time t
in position v in state i as

$$\pi_i^c(t, v) = \frac{\rho_i^c(t, v)}{\rho^c(v)}. \tag{69.10}$$

We collect all the terms in a vector $\pi^c(t, v) = [\pi_i^c(t, v)]$.
Note that the definition in (69.10) together with (69.3)
ensures that $\sum_i \pi_i^c(t, v) = 1$, $\forall t$, $\forall v$.

Note that each equation in (69.9) has the dimension
$n_c$ of the CTMC of a single MA of class c. In this
way, a problem defined over the product state space of
all the MAs is decomposed into several subproblems,
one for each MA, having decoupled the interaction
by means of (69.6). Equation (69.9) provides the basic
time-dependent measures to evaluate more complex
performance indices associated with the system. Equation
(69.9) is discretized both in time and space and is
solved by resorting to standard numerical techniques
for differential equations.
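A minimal Python sketch of how (69.6)-(69.9) can be discretized is given below, assuming a single agent class, a single message type, a space made of a few discrete cells, and an explicit Euler scheme in time. These choices, the perception function (equal to 1 for neighboring cells and 0 elsewhere), the function names, and all numerical values are assumptions made for illustration; the text does not state which numerical scheme is actually used.

```python
def euler_step(rho, Q, A, beta, neighbors, dt):
    """One explicit Euler step of d rho(t, v)/dt = rho(t, v) K(t, v) in every cell v."""
    n = len(Q)
    new_rho = []
    for v, rho_v in enumerate(rho):
        # Perceived message rate in cell v, a discrete version of (69.6).
        # Assumed perception function: 1 for neighboring cells, 0 otherwise,
        # independent of the sender and receiver states.
        gamma = sum(beta[j] * rho[w][j] for w in neighbors[v] for j in range(n))
        # K = Q + gamma * (A - I), since gamma is the same for every receiving state.
        K = [[Q[i][k] + gamma * (A[i][k] - (1.0 if i == k else 0.0))
              for k in range(n)] for i in range(n)]
        drho = [sum(rho_v[i] * K[i][k] for i in range(n)) for k in range(n)]
        new_rho.append([rho_v[k] + dt * drho[k] for k in range(n)])
    return new_rho


def simulate(rho0, Q, A, beta, neighbors, dt, steps):
    """Integrate the coupled equations for a fixed number of time steps."""
    rho = [row[:] for row in rho0]
    for _ in range(steps):
        rho = euler_step(rho, Q, A, beta, neighbors, dt)
    return rho


# Hypothetical example: three cells on a line, states 0 = idle and 1 = active.
Q = [[0.0, 0.0], [0.5, -0.5]]      # active agents fall back to idle at rate 0.5
A = [[0.0, 1.0], [0.0, 1.0]]       # an idle agent that accepts a message becomes active
beta = [0.0, 1.0]                  # only active agents emit messages (rate 1)
neighbors = [[1], [0, 2], [1]]
rho0 = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]   # initial densities, as in (69.8)

final = simulate(rho0, Q, A, beta, neighbors, dt=0.01, steps=500)
for v, rho_v in enumerate(final):
    # The total density per cell is 1 here, so rho_v is also the vector pi(t, v) of (69.10).
    print(v, [round(x, 3) for x in rho_v])
```

Each cell only requires a system of size equal to the number of states of one agent, which reflects the decomposition discussed above: the coupling between cells enters only through the scalar rate `gamma`.
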


69.3 A Consolidated Example: WSN Routing 


In this section, we present our first attempt to model 
swarm intelligence inspired mechanisms through the 
MAM formalism. This application describes an MAM 
model for the analysis of a swarm intelligence rout- 
ing protocol in WSNs and was first proposed in [69.9] 
and then enriched in [69.12]. In this work, we show 
new experiments to illustrate the self-adaptability of the 
MAM model to the changing of environmental condi- 
tions. 

WSNs are large networks of tiny sensor nodes that 
are usually randomly distributed over a geographical 
region. The network topology may vary in time in an 
unpredictable manner due to many different causes. 
For example, in order to reduce power consumption,
battery-operated sensors undergo cycles of sleeping and
active periods; additionally, sensors may be located in
hostile environments, increasing their likelihood of
failure; furthermore, data might also be collected from
different sources at different times and directed to different
sinks. For this reason, multihop routing algorithms used
to route messages from a sensor node to a sink should 
be rapidly adaptable to the changing topology. Swarm 
intelligence has been successfully used to face these 
problems thanks to its ability in converging to a single 
global behavior starting from the interaction of many 
simple local agents. 


69.3.1 A Swarm Intelligence Based Routing 


In [69.15], a new routing algorithm, inspired by the
biological process of pheromone emission (a chemical
substance produced and laid down by ants and
other biological entities), has been proposed. The
routing table in each node stores the pheromone level owned
by each neighbor, coded as a natural number [69.15];
when a data packet has to be sent, it is
forwarded to the neighbor with the highest pheromone
level. This approach works correctly only if a sequence
of increasing pheromone levels toward the
sinks exists; in other words, the sinks must have the
maximum pheromone level in the WSN and a decreasing
pheromone gradient must be established around the
sinks, covering the whole network.
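The forwarding rule can be sketched in a few lines of Python. The function name `route`, the node names, the four-node line topology, and the pheromone levels below are hypothetical and serve only to show how a packet climbs the gradient; the hop limit is an added safeguard that is not part of the description above.

```python
def route(source, sinks, neighbors, level, max_hops=100):
    """Follow the pheromone gradient: always forward to the best-ranked neighbor."""
    path = [source]
    node = source
    while node not in sinks and len(path) <= max_hops:
        # Next hop is the neighbor with the highest stored pheromone level.
        node = max(neighbors[node], key=lambda n: level[n])
        path.append(node)
    return path


# Hypothetical line topology c - b - a - sink with a gradient toward the sink.
neighbors = {"c": ["b"], "b": ["c", "a"], "a": ["b", "sink"], "sink": ["a"]}
level = {"sink": 9, "a": 7, "b": 3, "c": 0}
print(route("c", {"sink"}, neighbors, level))  # ['c', 'b', 'a', 'sink']
```

If the levels did not increase toward the sink, the packet would bounce between nodes until the hop limit is reached, which is why the gradient must cover the whole network.
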

To build the pheromone gradient, the initial setting 
of the WSN is as follows: the sinks are set to a fixed 
maximum pheromone level, whereas the sensor nodes’ 
pheromone levels are set to 0. When the WSN is oper- 
ating, each node periodically sends a signaling packet 
with its pheromone level and updates its value based on 
the level of its neighbors. 

More specifically, the algorithm for establishing the
pheromone gradient is based on two types of nodes in
the WSN, called sinks and sensors, respectively, and the
pheromone is assumed to be discretized into P different
levels, ranging from 0 to P − 1. In this way, routing paths
toward the sink are established through the exchange
of pheromone packets containing the pheromone level
p (0 ≤ p ≤ P − 1) of each node.

Sink nodes, once activated, set their internal
pheromone level to the highest value p = P − 1. Then,
at fixed time intervals, they broadcast a pheromone
message with the value p to their neighbors. We denote by T1
the time interval between two consecutive
transmissions of a pheromone message.

Instead, the pheromone level of a sensor node is
initially set to 0 and then periodically updated
according to two distinct actions, the excitation action (the
positive feedback) and the evaporation action (the negative
feedback):


• Excitation action: Sensor nodes periodically broadcast
  to their neighbors a pheromone message containing their
  internal pheromone level p. Like the sink node, sensor
  nodes send it at a regular time interval T1. When a sensor
  node receives a pheromone level $p_n$ sent by a neighbor,
  it compares $p_n$ with its own level p and updates the
  latter if $p_n > p$. The new value is computed as a function
  $\mathrm{update}(p, p_n)$ of the current and the received
  pheromone levels. In this context, we use the average of
  the sender and the receiver levels as the new value, thus
  the function is assumed to be
  $\mathrm{update}(p, p_n) = \mathrm{round}((p + p_n)/2)$.

• Evaporation action: It is triggered at a regular time
  interval T2 and simply decreases the current value of p
  by one unit, ensuring that it remains greater than or
  equal to 0.
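The two actions can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `excite` and `evaporate` are hypothetical, and averages ending in .5 are assumed to round upward, a convention the text does not specify (Python's built-in `round` rounds ties to the nearest even integer, so it is avoided here).

```python
def excite(p: int, p_n: int) -> int:
    """Excitation: move toward a neighbor's higher level p_n.

    Implements update(p, p_n) = round((p + p_n) / 2), applied only if p_n > p.
    Assumption: ties (x.5) round upward; the source does not state the rule.
    """
    if p_n > p:
        return (p + p_n + 1) // 2  # integer form of round-half-up
    return p


def evaporate(p: int) -> int:
    """Evaporation: lose one unit, never dropping below 0."""
    return max(p - 1, 0)


if __name__ == "__main__":
    P = 20                 # number of discrete levels, illustrative value
    level = 0              # a sensor node starts at 0
    for _ in range(6):     # repeatedly hear a sink broadcasting P - 1
        level = excite(level, P - 1)
    print(level)
```

Under the assumed tie rule, a node that only hears the sink climbs toward P − 1 in a few steps, which is why a counteracting evaporation is needed.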


We note that, although all nodes perform their excitation
action with the same mean time interval T1, no
synchronization activity is required among the nodes;
all of them act asynchronously, in accordance with the
principles of biological systems where each entity acts
autonomously with respect to the others.

The excitation—evaporation process, like in biologi- 
cal systems, assures the stability of the system and the 
adaptability to possible changes in the environment or 
in some nodes. Any change in the network condition 
is captured by an update of the pheromone level of 
the involved nodes that modifies the pheromone gradi- 
ent automatically driving the routing decisions toward 
the new optimal solution. In this way, the network can 
self-organize its topology and adapt to environmen- 
tal changes. Moreover, when link failures occur, the 
network reorganization task is accomplished by those 
nodes near the broken links. This results in a robust and 
self-organized architecture. 

The major drawback of this algorithm is the difficulty
of appropriately setting the parameters T1 and T2: as shown
in [69.12, 15], the stability of the system and the quality
of the produced pheromone gradient depend strictly on the
ratio of the two parameters. When T1 decreases and T2 is
fixed, pheromone messages are exchanged more rapidly among
the nodes and their pheromone levels tend to the maximum
level, because the sink node always sends the same maximum
value. Without an appropriate balancing action, the
pheromone level saturates in all the nodes of the WSN.
Conversely, suppose T1 is fixed and T2 decreases; in this
case, the pheromone level in each sensor node decreases
more quickly than it is updated according to the values of
the neighbors. As a result, all the levels will be close to
zero. From this behavior, we note that: (1) both timers are
necessary for the algorithm to work properly, and (2) a
smart setting of both timers is necessary in order to obtain
the best gradient shape over the whole network. The MAM
model described in the next section helps us to determine
the best parameter values.
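The three regimes (saturation, a usable gradient, and levels collapsing to zero) can be reproduced with a deliberately simplified toy simulation. The sketch below is not the asynchronous protocol and not the MAM model: it assumes a one-dimensional chain of nodes with the sink at index 0, synchronous rounds in which every sensor hears its neighbors once, a hypothetical parameter `evap_per_round` standing in for the T1/T2 ratio, and round-half-up averaging.

```python
def simulate_chain(length: int, P: int, rounds: int, evap_per_round: int) -> list[int]:
    """Toy synchronous version of excitation and evaporation on a chain.

    Index 0 is the sink (fixed at P - 1); the other nodes start at 0.
    evap_per_round is a hypothetical knob: units evaporated per excitation round.
    """
    levels = [P - 1] + [0] * (length - 1)
    for _ in range(rounds):
        heard = levels[:]  # levels broadcast at the start of the round
        for idx in range(1, length):
            for nb in (idx - 1, idx + 1):
                if nb < length and heard[nb] > levels[idx]:
                    # excitation: average with the higher neighbor (ties round up)
                    levels[idx] = (levels[idx] + heard[nb] + 1) // 2
            # evaporation: never below zero
            levels[idx] = max(levels[idx] - evap_per_round, 0)
    return levels


if __name__ == "__main__":
    for evap in (0, 1, 10):
        print(evap, simulate_chain(length=6, P=20, rounds=100, evap_per_round=evap))
```

With no evaporation every sensor ends at the maximum level, with very strong evaporation every sensor stays at 0, and only an intermediate setting leaves levels that decrease with the distance from the sink.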


69.3.2 The MAM Model 


The MAM model used to study the gradient forma- 
tion is based on two agent classes: the class sink node 
denoted by a superscript s and the class sensor node 
denoted by a superscript n. The message exchange is 
modeled by using M different message types. As we 
will explain later, since each message is used to send 
a pheromone level, we set M = P, where P is the num- 
ber of different pheromone intensities considered in the 
model. 


Geographical Space 

The geographical space V where the N agents are located
is modeled as an $n_h \times n_w$ rectangular grid, and each
cell has a square shape with side $d_s$. Sensors can only be
located in the center of each cell and we allow at most
one node per cell, i.e., some cells might be empty, and
$N \le n_h \times n_w$. Moreover, sink nodes are very few with
respect to the number of sensor nodes.


Agent's Structure and Behavior 
Irrespective of the MA class considered, we model the 
pheromone level of a node with a state and this choice 
determines two different MA structures. 

The sink class (Fig. 69.3a) is very simple and is
characterized by a single state labeled P − 1 with a
self-loop of rate $\lambda = 1/T1$. The sink always has the same
maximum pheromone level, and emits a single message
of type P − 1 with rate $\lambda$.

Instead, the sensor class (Fig. 69.3b) has P states
identifying the range of all the possible pheromone
levels. Each state is labeled with the pheromone intensity
i ($i = 0, \ldots, P-1$) in the corresponding node
and has a self-loop of rate $\lambda = 1/T1$ that represents the
firing of the timer at regular intervals equal to T1. This
event causes the sending of a message (Sect. 69.3.2).
The evaporation phenomenon is modeled by the solid
arcs (local transitions) connecting state i with state i − 1
($0 < i \le P-1$). The evaporation rate is set to $\mu = 1/T2$,
which represents the firing of timer T2.


Message Types 
The types of messages in the model correspond to the
different levels of pheromone a node can store, thus we
define $M = \{0, 1, \ldots, P-1\}$. Any self-loop transition
in state i emits a message of the corresponding type i at
a constant rate $\lambda$, both in sink and in sensor nodes. The
sink message is always of type P − 1, representing the
maximum pheromone intensity, whereas the messages
emitted by a sensor node correspond to the state it is
currently in.

When a message of type m is emitted, neighboring
nodes are able to receive it, changing their state
accordingly. This behavior is implemented through the
dashed arcs (whose labels are defined through (69.11))
that model the transitions induced by the reception of
a message. In particular, when a node in state i receives
a message of type m, it immediately jumps to state j if
$m \in M(i, j)$, with

$$M(i, j) = \{ m \in [0, \ldots, P-1] : \mathrm{round}((m + i)/2) = j \}, \quad \forall i, j \in [0, \ldots, P-1] : j > i . \tag{69.11}$$


In other words, an MA in state i jumps to the state j 
that represents the pheromone level equal to the mean 
between the current level i and the level m encoded in 
the perceived message. 
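The sets M(i, j) of (69.11) can be enumerated directly. The sketch below is illustrative only: the name `message_set` is hypothetical, and round() is assumed to round ties upward, which the text does not specify.

```python
def message_set(i: int, j: int, P: int) -> set[int]:
    """Message types m that move a node from state i to a higher state j.

    Follows (69.11): round((m + i) / 2) == j with j > i.
    Assumption: ties round upward, written here in integer arithmetic.
    """
    if not (0 <= i < P and 0 <= j < P) or j <= i:
        return set()
    return {m for m in range(P) if (m + i + 1) // 2 == j}


if __name__ == "__main__":
    P = 20  # illustrative number of pheromone levels
    print(message_set(0, 5, P))    # messages lifting an empty node to level 5
    print(message_set(10, 15, P))  # messages lifting level 10 to level 15
```

Under the assumed tie rule, each pair (i, j) with j > i is reached by at most two adjacent message types, so the dashed arcs of the sensor agent carry small label sets.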


Perception Function 
Messages of any type sent by a node are characterized
by the same transmission range $t_r$, which defines the radius
of the area in which an MA can perceive a message produced
by another MA. This property is reflected in the
perception function $u^m(\cdot)$ that, $\forall m \in [1, \ldots, M]$,
is defined as

$$u^m(v, c, i, v', c', i') = \begin{cases} 0 & \mathrm{dist}(v, v') > t_r \\ 1 & \mathrm{dist}(v, v') \le t_r , \end{cases} \tag{69.12}$$

where dist(v, v') represents the Euclidean distance
between the two nodes at positions v and v'.

Fig. 69.3a,b Markovian agent models. (a) Agent class = sink, (b) Agent class = sensor

As can be observed, the perception function in
(69.12) is defined irrespective of the message type,
because in this kind of application the reception of
a message of any type i depends only on the distance
between the emitting and the perceiving node. The
transmission range $t_r$ depends on the properties of the
sensor and it influences the number n of neighbors
perceiving the message. In the numerical experimentation,
we consider $d_s \le t_r < \sqrt{2}\, d_s$, corresponding to n = 4.
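A short check makes the link between the transmission range and the number of neighbors concrete. This is a sketch under stated assumptions: nodes sit at cell centers of a full grid, grid coordinates are (row, column) pairs, the names `perceives` and `neighbors` are hypothetical, and a node is not counted as its own neighbor.

```python
import math


def perceives(v: tuple[float, float], v_prime: tuple[float, float], t_r: float) -> int:
    """Distance-based perception: 1 within the transmission range, 0 otherwise."""
    return 1 if math.dist(v, v_prime) <= t_r else 0


def neighbors(cell: tuple[int, int], n_h: int, n_w: int,
              d_s: float, t_r: float) -> list[tuple[int, int]]:
    """Cells whose centers perceive a message emitted from `cell`."""
    row, col = cell
    origin = (row * d_s, col * d_s)
    found = []
    for r in range(n_h):
        for c in range(n_w):
            if (r, c) != cell and perceives(origin, (r * d_s, c * d_s), t_r):
                found.append((r, c))
    return found


if __name__ == "__main__":
    # side 1.0 and range 1.2 are illustrative values inside the stated interval
    print(len(neighbors((15, 15), 31, 31, d_s=1.0, t_r=1.2)))
```

Any range between one cell side and the cell diagonal reaches only the four orthogonal neighbors of an interior cell; border and corner cells have fewer.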

Generation and Acceptance Probabilities 
In this application, messages are only generated during
self-loop transitions with probability 1, so that, $\forall i, j$,
$g_{ii}(m) = 1$ and $g_{ij}(m) = 0$ ($i \ne j$). Similarly, we assume
either $a_{ij}(m) = 0$ or $a_{ij}(m) = 1$, that is, incoming
messages are always accepted or always ignored.


69.3.3 Numerical Results 


In order to analyze the behavior of the WSN model, the
main measure of interest is the evolution of $\pi^n(t, v)$, i.e.,
the distribution of the pheromone intensity of a sensor
node over the entire area V as a function of time.
The value of $\pi^n_i(t, v)$ can be computed from (69.10) and
allows us to obtain several performance indices, like the
average pheromone intensity $\varphi(t, v)$ at time t for each
cell $v \in V$

$$\varphi(t, v) = \sum_{i=0}^{P-1} i \cdot \pi^n_i(t, v) . \tag{69.13}$$

The distribution of the pheromone intensity over the entire
area V depends both on the pheromone emission
rate $\lambda$ and on the pheromone evaporation rate $\mu$;
furthermore, the excitation-evaporation process depends
on the transmission range $t_r$ that determines the number
of neighboring cells n perceived by an MA in a given
position. To take into account this physical mechanism,


Fig. 69.4a-c Distribution of the pheromone intensity varying r. (a) r = 1.2, (b) r = 1.8, (c) r = 2.4

Fig. 69.5a-f Distribution of the pheromone intensity with respect to t when two sinks are alternately activated. The
change is applied at time t = 17.5 s. (a) t = 0 s, (b) t = 17 s, (c) t = 17.5 s, (d) t = 19 s, (e) t = 24 s, (f) t = 29 s


we define the following quantity,

$$r = \frac{\lambda \cdot n}{\mu} , \tag{69.14}$$


which regulates the balance between the pheromone 
emission and evaporation in the SI routing algorithm. 
For a complete discussion about the performance in- 
dices that can be derived and analyzed using the de- 
scribed MAM, refer to [69.12]. 

The numerical results have been obtained with the
following experimental setting. The geographical space
is a square grid of size $n_h = n_w = 31$, where N = 961
sensors are uniformly distributed with a spatial density
equal to 1 (one sensor per cell). Further, we set $\lambda = 4.0$,
P = 20, and n = 4. The first experiment aims at investigating
the formation of the pheromone gradient around
the sink as a function of the model parameters. To this
end, a single sink node is placed at the center of the area
and the pheromone intensity distribution is evaluated as
a function of the parameter r, by varying $\mu$ while $\lambda$ and
n are kept fixed.
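Because r is varied through the evaporation rate alone, each target value of r fixes one evaporation rate once the emission rate and the number of neighbors are chosen. The sketch below simply rearranges (69.14); the function name `evaporation_rate` is hypothetical, and the conversion to a timer value assumes that the evaporation rate is the reciprocal of T2, as in the sensor agent described earlier.

```python
def evaporation_rate(r: float, lam: float, n: int) -> float:
    """Evaporation rate mu that yields a given r = lam * n / mu."""
    if r <= 0:
        raise ValueError("r must be positive")
    return lam * n / r


if __name__ == "__main__":
    lam, n = 4.0, 4  # emission rate and neighbor count of the experimental setting
    for r in (1.2, 1.8, 2.4):
        mu = evaporation_rate(r, lam, n)
        # assumption: mu = 1 / T2, so the evaporation timer is 1 / mu
        print(f"r = {r}: mu = {mu:.3f}, T2 = {1.0 / mu:.4f}")
```

Larger values of r correspond to a smaller evaporation rate, that is, to a longer evaporation timer relative to the emission timer.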

Figure 69.4 shows the distribution of the pheromone
intensity $\varphi(t, v)$ measured in the stable state for three
different values of r. If the value of r is small (r = 1.2)
or high (r = 2.4), the quality of the gradient is
poor. This is due to the prevalence of one of the two
feedbacks: negative (with r = 1.2 evaporation prevails)
or positive (with r = 2.4 excitation prevails and all
sensors saturate). On the contrary, intermediate values
(r = 1.8) generate well-formed pheromone gradients
able to cover the whole area, thanks to the correct
balance between the two feedbacks. Therefore, the value
of r has to be appropriately evaluated in order to generate
a pheromone gradient that fits the topological
specification of the WSN under study.

Fig. 69.6 The 100 × 100 grid with 10000 cells and 50 randomly scattered sinks

In order to understand the dynamic behavior of the 
SI algorithm, we carried out a transient analysis able to 
highlight different phases of the gradient construction 
process when the position of the sink changes in time. 
In particular, in the following experiment (Fig. 69.5) we 
analyzed how the algorithm self-adapts to topological 
modifications by recalculating the pheromone gradient 
when two different sinks are present in the network and 
they are alternately activated. Figure 69.5a,b show how 
the pheromone signal is spread on the space V until 
the stable state is reached. At this point (t = 17.5 s), we 
deactivated the old sink and we activated a new one 
in a different position (Fig. 69.5c). Figure 69.5d,e de- 
scribe the evolution of the gradient modification. It is 
possible to observe that, thanks to the properties of the 
SI algorithm, the WSN is able to rapidly discover the 
new sink and to change the pheromone gradient by for- 
getting the old information until a new stable state is 
reached (Fig. 69.5f). 


Fig. 69.7 Distribution of the pheromone intensity when 
the network is composed by a grid of 10000 sensor nodes 
with 50 sinks 


Finally, in order to test the scalability of the MAM
in more complex scenarios, we have assumed a rectangular
grid with $n_h = n_w = 100$, hence with N = 100 ×
100 = 10000 sensors, and we have randomly scattered
50 sinks in the grid. The grid is represented in Fig. 69.6,
where the sinks are drawn as black spots. Since each
sensor is represented by an MA with P = 20 states
(Fig. 69.3b), the product state space of the overall system
has $20^{10000}$ states!
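The size gap between a joint state space and a per-cell description can be made tangible with a few lines of arithmetic. This is a rough sketch under an assumption: the per-cell count below supposes that the analysis tracks one probability per pheromone level and per cell, which is how the per-cell distribution is used in this section; the function names are hypothetical.

```python
import math


def product_state_space_digits(P: int, N: int) -> int:
    """Number of decimal digits of P**N, computed without building the integer."""
    return math.floor(N * math.log10(P)) + 1


def per_cell_unknowns(P: int, N: int) -> int:
    """Assumed size of a per-cell description: one probability per state and cell."""
    return P * N


if __name__ == "__main__":
    P, N = 20, 10_000
    print(product_state_space_digits(P, N))  # digits of the joint state count
    print(per_cell_unknowns(P, N))           # unknowns in the per-cell view
```

The joint state count has thousands of decimal digits, whereas the per-cell description grows only linearly with the number of sensors.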

The steady-state pheromone intensity distribution for the
geographical space represented in Fig. 69.6 is reported
in Fig. 69.7. Through this experiment, we can assess
that the pheromone gradient is also reached when no
symmetries are present in the network, and that the
proposed model is able to capture the behavior of the
protocol in generating a correct pheromone gradient also
in the presence of multiple maxima. Using the same
protocol configuration found for a simple scenario, the
SI algorithm is able to create a well-formed pheromone
gradient also in a completely different situation, making
this routing technique suitable for unpredictable
scenarios. This scenario also demonstrates the scalability
of the proposed analytical technique, which can be easily
adopted in the analysis of very large networks.




69.4 Ant Colony Optimization 


The aim of this section is to show how MAMs
can be adopted to represent one of the most classical
swarm intelligence algorithms, known as ACO
[69.2], which was inspired by the foraging behavior
of ant colonies that, during food search, exhibit
the ability to solve simple shortest path problems.
To this end, in this work, we simply show
how to build a MAM that solves the famous Double
Bridge Experiment, which was first proposed by
Deneubourg et al. in the early 1990s [69.16, 17], and that
has been proposed as an entry benchmark for ACO
models.

In the experiment, a nest of Argentine ants is connected
to a food source using a double bridge as shown
in Fig. 69.8. Two scenarios are considered: in the
first one the bridges have equal length (Fig. 69.8a),
in the second one the lengths of the bridges are different
(Fig. 69.8b). The collective behavior can be
explained by the way in which ants communicate indirectly
with one another (stigmergy). During the journey
from the nest to the food source and vice versa, ants
release an amount of pheromone on the ground. Moreover,
ants can perceive pheromone and they choose
with greater probability a path marked by a stronger
concentration of pheromone. As a result, ants releasing
pheromone on a branch increase the probability
that other ants choose it. This phenomenon is the
realization of the positive feedback process described
in Sect. 69.1 and it is the reason for the convergence
of ants to the same branch in the equal length
bridge case. When the lengths are different, the ants
choosing the shorter path reach the food source more
quickly than those choosing the longer path. Therefore,
the pheromone trail grows faster on the shorter bridge
and more ants choose it to reach food. As a result,
eventually all ants converge to follow the shortest
path.


Fig. 69.8a,b Experiment scenarios. Modified from Goss et al.
[69.17]. (a) Equal branches, (b) Different branches


69.4.1 The MAM Model 


We represent the double bridge experiment through 
a multiple agent class and multiple message type MAM. 
We model ants by messages, and locations that ants 
traverse by MAs. Three different MA classes are intro- 
duced: the class Nest denoted by superscript n, the class 
Terrain denoted by superscript t, and the class Food de- 
noted by superscript f. Two types of messages are used: 
ants walking from the nest to the food source corre- 
spond to messages of type fw (forward), whereas ants 
coming back to the nest correspond to messages of type 
bw (backward). 


Geographical Space 

Agents (either nest, terrain, or food source) are de- 
ployed on a discrete geographical space V represented 
as an undirected graph G = (V, E), where the elements 
in the set V are the vertices and the elements in the 
set E are the edges of the graph. In Fig. 69.9a,b, we 
show the locations of agents for the equal and the dif- 
ferent length bridge scenarios, respectively. The squares 
are the vertices of the graph and the labels inside them 
indicate the class of the agent residing on the vertex. 
In this model, we assume that only a single agent re- 
sides on each vertex. Message passing from a node to 
another is depicted as little arrows labeled by the mes- 
sage type. As shown in Fig. 69.9, different lengths of the 
branches are represented by a different number of hops 
needed by a message to reach the food source starting 
from the nest. Figure 69.9c represents a three branches 
bridge with branches of different length. 


Agent's Structure and Behavior 
The structure of the three MA classes is described in the 
following: 


• MA Nest: The nest is represented by a single MA
  of class n, shown in Fig. 69.10a. The nest $MA^n$ is
  composed of a single state that emits messages of
  type fw at a constant rate $\lambda$, modeling ants leaving
  the nest in search of food.

• MA Terrain: An MA of class t (Fig. 69.10c) represents
  a portion of terrain on which an ant walks and
  encodes in its state space the concentration of the
  pheromone trail on that portion of the ground. We
  assume that the intensity of the pheromone trail is
  discretized in P levels numbered 0, 1, …, P − 1.


Fig. 69.9a-c Graph used to model the experiment scenarios. (a) Equal branches, (b) two different branches, (c) three
different branches

With reference to Fig. 69.10c, the meaning of the
states is the following:

• $t_0$ denotes no pheromone on the ground and no ant
  walking on it;
• $t_i$ denotes a concentration of pheromone of level i
  and no ant on the ground;
• $t_{if}$ denotes an ant of forward type residing on the
  terrain while the pheromone concentration is at level i;
• $t_{ib}$ denotes an ant of backward type residing on
  the terrain while the pheromone concentration is at
  level i.

The behavior of the $MA^t$ agent at the reception of
the messages is the following:

• fw (forward ant): A message of type fw perceived
  by an $MA^t$ in state $t_i$ induces a transition to state
  $t_{(i+1)f}$, meaning that the arrival of a forward ant
  increases the pheromone concentration by one level
  (positive feedback).
• bw (backward ant): A message of type bw perceived
  by an $MA^t$ in state $t_i$ induces a transition to state
  $t_{(i+1)b}$, meaning that the arrival of a backward ant
  increases the pheromone concentration by one level
  (positive feedback).

Ants remain on a single terrain portion for a mean
time of $1/\eta$, then they leave toward another destination.
The local transitions from states $t_{if}$ to states $t_i$ and
the generation of message fw model this behavior for
forward ants. An analogous behavior is represented for
backward ants by local transitions from states $t_{ib}$ to
states $t_i$. The local transitions at constant rate $\mu$ from
states $t_i$ to states $t_{i-1}$ indicate the decrease by one unit
of the concentration of pheromone due to evaporation
(negative feedback).

• MA Food source: An MA of class f represents the
  food source (Fig. 69.10b). The reception of a message
  of type fw in state $f_0$ indicates that a forward
  ant has reached the food source. After a mean time
  of $1/\eta$, such an ant leaves the food and starts
  the way back to the nest, becoming a backward ant
  (emission of a message of type bw).

Fig. 69.10a-c Markovian agent models for the ACO experiment.
(a) $MA^n$: Agent of class nest. (b) $MA^f$: Agent of class food.
(c) $MA^t$: Agent of class terrain

In order to keep model complexity low, thus increasing
the model readability, we have chosen to limit to 1
the number of ants that can reside at the same time on
a portion of terrain (or in the food source). For this reason,
message reception is not enabled in states $t_{if}$ or $t_{ib}$
for MAs of class t and in the state $f_1$ for MAs of class f.
In future works, we will study effective techniques (e.g.,
intervening on MA density) in order to relax this
assumption.

Perception Function
The perception function rules the interactions among
agents and, in this particular example, defines the
probability that a message (ant) follows a specific path in both
the forward and backward directions. The definition
of the perception function takes inspiration from the
stochastic model proposed in [69.16, 17] to describe the
dynamics of the ant colony. In such a model, the probability
of choosing the shorter branch is given by


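$$p_{is}(\tau) = \frac{(k + \varphi_{is}(\tau))^{\alpha}}{(k + \varphi_{is}(\tau))^{\alpha} + (k + \varphi_{il}(\tau))^{\alpha}} , \tag{69.15}$$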


where $p_{is}(\tau)$ (respectively $p_{il}(\tau)$) is the probability of
choosing the shorter (longer) branch, and $\varphi_{is}(\tau)$ ($\varphi_{il}(\tau)$) is
the total amount of pheromone on the shorter (longer)
branch at time $\tau$. The parameter k is the degree of
attraction attributed to an unmarked branch. It is needed
to provide a non-null probability of choosing a path not
yet marked by pheromone. The exponent $\alpha$ provides
a nonlinear behavior.

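In our MA model, the perception function $u^m(\cdot)$ is
defined, $\forall m \in \{fw, bw\}$, as

$$u^m(v, c, i, v', c', j, \tau) = \frac{(k + E[\pi^c(\tau, v)])^{\alpha}}{\sum_{(c'', v'') \in \mathrm{Next}^m(v', c')} (k + E[\pi^{c''}(\tau, v'')])^{\alpha}} , \tag{69.16}$$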


where k and $\alpha$ have the same meaning as in (69.15),
$E[\pi^c(\tau, v)]$ gives the mean value of the concentration
of pheromone at time $\tau$ in position v on the
ground, and corresponds to $\varphi(\tau)$. The computation of
$E[\pi^c(\tau, v)]$ will be addressed in (69.18). The function
$\mathrm{Next}^m(v', c')$ gives the set of pairs $\{(c'', v'')\}$ such that
the agent of class c'' in position v'' perceives a message
of type m emitted by the agent of class c' in position v'.
Figure 69.11a helps to interpret (69.16). The
multiple box stands for all the agents receiving a message
m sent by the agent of class c' in position v'.
The value of $u^m(v, c, i, v', c', j, \tau)$ is proportional to the
mean pheromone concentration of the agent in class c
at position v with respect to the sum of the mean
concentrations of all the agents that receive message m from
the agent in class c' and position v'. For instance, we
consider the scenario depicted in Fig. 69.11b, where
a class n agent in position $b_0$ sends messages of type
fw to two other class t agents at positions $b_1$ and $b_2$,
and we compute $u^{fw}(b_2, t, i, b_0, n, j, \tau)$. In such a case,
the evaluation of the function $\mathrm{Next}^{fw}(b_0, n)$ gives the set
of pairs $\{(t, b_1), (t, b_2)\}$ and the value of the function
is

$$u^{fw}(b_2, t, i, b_0, n, j, \tau) = \frac{(k + E[\pi^t(\tau, b_2)])^{\alpha}}{(k + E[\pi^t(\tau, b_1)])^{\alpha} + (k + E[\pi^t(\tau, b_2)])^{\alpha}} . \tag{69.17}$$

Fig. 69.11a,b Perception function description. (a) General case,
(b) example of scenario in Fig. 69.9b


As a final remark, we highlight that $u^m(\cdot)$ does not
depend on the state variables i and j of the sender and
receiver agents, even if these variables appear in the
definition of $u^m(\cdot)$ in (69.16). Instead, $u^m(\cdot)$ depends on the
whole probability distribution $\pi^c(\tau, v)$ needed to
compute the mean value $E[\pi^c(\tau, v)]$.
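The normalization performed by the perception function can be illustrated with a small Python sketch. Everything named here is an assumption made for illustration: the function `perception`, the representation of agents as (class, position) pairs, the dictionary of mean pheromone values, and the numerical values of k and the exponent. Returning 0 for an agent outside the receiving set is also an assumption.

```python
from typing import Dict, List, Tuple

Agent = Tuple[str, str]  # hypothetical encoding: (class, position), e.g. ("t", "b2")


def perception(receiver: Agent, next_set: List[Agent],
               mean_pheromone: Dict[Agent, float],
               k: float, alpha: float) -> float:
    """Share of messages routed to `receiver` among the agents in `next_set`.

    Each candidate is weighted by (k + mean pheromone) ** alpha and the
    weights are normalized over the whole receiving set.
    """
    weights = {a: (k + mean_pheromone[a]) ** alpha for a in next_set}
    if receiver not in weights:
        return 0.0  # assumption: agents outside the set never perceive the message
    return weights[receiver] / sum(weights.values())


if __name__ == "__main__":
    # the nest sends forward ants to two terrain agents; values are illustrative
    receivers = [("t", "b1"), ("t", "b2")]
    means = {("t", "b1"): 1.0, ("t", "b2"): 3.0}
    for agent in receivers:
        print(agent, perception(agent, receivers, means, k=1.0, alpha=2.0))
```

The values over the receiving set always sum to 1, and a positive k keeps an unmarked branch reachable even when its mean pheromone is zero.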


Generation and Acceptance Probabilities 

As in Sect. 69.3, also in this ACO-MAM model we
only allow $g^c_{ij}(m) = 0$ or $g^c_{ij}(m) = 1$ and $a^c_{ij}(m) = 0$
or $a^c_{ij}(m) = 1$, $\forall c, m$. In particular, for the terrain
agent $MA^t$, messages of type fw are sent with probability
$g^t_{t_{if}, t_i}(fw) = 1$, and are accepted with probability
$a^t_{t_i, t_{(i+1)f}}(fw) = 1$ only in a $t_i$ state, inducing a transition
to a $t_{(i+1)f}$ state. An analogous behavior is followed
during emission and reception of messages of
type bw.


Fig. 69.12 Mean pheromone concentration with $\lambda = 1.0$,
$\mu = 1$ and $\eta = 1$ for the equal branches experiment
(vertical axis: $E[\pi^t(\tau, b_i)]$; horizontal axis: $\tau$)


Fig. 69.13a-f Mean pheromone concentration for the case with two different branches (vertical axis: $E[\pi^t(\tau, b_i)]$;
horizontal axis: $\tau$; curves: Short: $b_0$, Long: $b_1$). (a) $\lambda = 1.0$, $\mu = 0$ and $\eta = 1$, (b) $\lambda = 1.0$, $\mu = 0$ and $\eta = 10$,
(c) $\lambda = 1.0$, $\mu = 0.5$ and $\eta = 1$, (d) $\lambda = 1.0$, $\mu = 0.5$ and $\eta = 10$, (e) $\lambda = 1.0$, $\mu = 2$ and $\eta = 1$,
(f) $\lambda = 1.0$, $\mu = 2$ and $\eta = 10$


69.4.2 Numerical Results
       for ACO Double
       Bridge Experiment

We have performed several experiments on the ACO
model. In particular, we study the mean value of the
concentration of pheromone at time $\tau$ in position v
for a class c agent, $E[\pi^c(\tau, v)]$, defined as

$$E[\pi^c(\tau, v)] = \sum_{s \in S^c} \pi^c_s(\tau, v)\, I(s) , \tag{69.18}$$

where $S^c$ denotes the state space of a class c agent, and I(s)
represents the pheromone level in state s, which
corresponds to

$$I(s) = i , \quad \forall s \in \{t_i\} \cup \{t_{if}\} \cup \{t_{ib}\} . \tag{69.19}$$

This value is used in (69.16) to compute $u^m(\cdot)$ which,
as previously said, rules the ant's probability to follow
a specific path; therefore, such a performance index
provides useful insights into the modeled ant's behavior.

Fig. 69.14a,b Mean pheromone concentration for the case
with three different branches (curves: Short: $b_0$, Medium: $b_1$,
Long: $b_2$). (a) $\eta = 10$, (b) $\eta = 10$
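As a small numerical illustration of (69.18) and (69.19), the mean pheromone concentration is a probability-weighted sum of the levels attached to the states. The sketch below assumes a hypothetical encoding of terrain states as pairs (kind, level), with kind "t" for no ant, "tf" for a forward ant and "tb" for a backward ant; the distribution values are invented for the example.

```python
from typing import Dict, Tuple

State = Tuple[str, int]  # hypothetical encoding: ("t" | "tf" | "tb", level i)


def pheromone_level(state: State) -> int:
    """I(s): the pheromone level carried by a state, whatever ant is on it."""
    _, level = state
    return level


def mean_pheromone(distribution: Dict[State, float]) -> float:
    """Probability-weighted mean of I(s) over the states of one agent."""
    return sum(prob * pheromone_level(s) for s, prob in distribution.items())


if __name__ == "__main__":
    # invented state probabilities for one terrain agent (they sum to 1)
    pi = {("t", 0): 0.5, ("tf", 2): 0.25, ("tb", 4): 0.25}
    print(mean_pheromone(pi))
```

States that differ only in the presence or direction of an ant contribute the same level, so only the marginal distribution over levels matters for the mean.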


We consider the three scenarios depicted in Fig. 69.9;
the labels $b_i$ denote the positions where we compute
the mean value of the concentration of pheromone. In
all the experiments, the intensity of the pheromone trail
is discretized in P = 8 levels.

In Fig. 69.12, the mean pheromone concentration
$E[\pi^c(\tau, b_i)]$ over time for the equal branches experiment
is plotted. As can be seen, both mean pheromone
concentrations have exactly the same evolution, showing
that ants do not prefer one of the routes.

The case with two different branches is considered
in Fig. 69.13. The speed of the ants (i.e., parameter $\eta$)
varies across the columns (the left column corresponds
to $\eta = 1.0$ and the right column to $\eta = 10$), while the
evaporation of the pheromone varies across
the rows (respectively with $\mu = 0$, $\mu = 0.5$, and $\mu = 2$).
When no evaporation is considered (Fig. 69.13a,b),
both paths are equally chosen due to the finite
maximum pheromone level considered in this
work. However, the shorter path reaches its maximum
level earlier than the longer route. In all the other cases,
it can be seen that the longer path is abandoned after
a while in favor of the shorter one. The evaporation of
the pheromone and the speed of the ants both play a role
in the time required to drop the longer path. Increasing
either of the two reduces the time required to discover
the shorter route.

Finally, Fig. 69.14 considers a case with three
branches of different length and different evaporation
levels ($\eta = 1$ and $\eta = 10$). Also in this case, the model
is able to predict that ants will choose the shortest
route. It also shows that longer paths are dropped in
an order determined by their length: the longest route
is dropped first, and the intermediate route is discarded
second. Also in this case, the evaporation rates
determine the speed at which paths are chosen and
discarded.


69.5 Conclusions


In this work, we have presented how the Markovian
agents performance evaluation formalism can be used
to study swarm intelligence algorithms. Although the
formalism was developed to study largely distributed
systems like sensor networks, or physical propagation
phenomena like fire or earthquakes, it has proven
to be very efficient in capturing the main features of
swarm intelligence.

Besides the two cases presented in this chapter,
routing in WSNs and ant colony optimization,
the formalism is capable of considering other
cases like Slime Mold models. Future research lines
will try to emphasize the relations between Markovian
agents and swarm intelligence, trying to integrate
both approaches: using Markovian agents
to formally study new swarm intelligence algorithms,
and using swarm intelligence techniques to
study complex Markovian agent models in order
to find optimal operation points and best connection
strategies.


References

69.1  M.G. Hinchey, R. Sterritt, C. Rouff: Swarms and
      swarm intelligence, IEEE Comput. 40(4), 111-113
      (2007)
69.2  M. Dorigo, T. Stützle: Ant Colony Optimization (MIT,
      Cambridge 2004)
69.3  K. Li, K. Thomas, L.F. Rossi, C.-C. Shen: Slime mold
      inspired protocols for wireless sensor networks,
      2nd IEEE Int. Conf. Self-Adapt. Self-Organ. Syst.
      (SASO), Venice (2008) pp. 319-328
69.4  K. Li, C.E. Torres, K. Thomas, L.F. Rossi, C.-C. Shen:
      Slime mold inspired routing protocols for wireless
      sensor networks, Swarm Intell. 5(3/4), 183-223
      (2011)
69.5  D. Cerotti, M. Gribaudo, A. Bobbio: Performability
      analysis of a sensor network by interacting
      Markovian agents, 4th IEEE Int. Workshop Sensor
      Netw. Syst. Pervasive Comput. (PerSens), Hong Kong
      (2008) pp. 300-305
69.6  A. Bobbio, D. Bruneo, D. Cerotti, M. Gribaudo:
      Markovian agents: A new quantitative analytical
      framework for large-scale distributed interacting
      systems, IIIS Int. Conf. Design Model. Sci. Educ.
      Technol. (DeMset), Orlando (2011) pp. 327-332
69.7  M. Gribaudo, A. Bobbio: Performability analysis of
      a sensor network by interacting Markovian agents,
      8th Int. Workshop Perform. Model. Comput. Commun.
      Syst. (PMCCS), Edinburgh (2007)
69.8  A. Bobbio, D. Cerotti, M. Gribaudo: Presenting dynamic
      Markovian agents with a road tunnel application,
      17th IEEE/ACM Int. Symp. Model. Anal.
      Simul. Comput. Telecommun. Syst. (MASCOTS), London
      (2009)
69.9  D. Bruneo, M. Scarpa, A. Bobbio, D. Cerotti,
      M. Gribaudo: Analytical modeling of swarm intelligence
      in wireless sensor networks, 4th Int. Conf.
      Perform. Eval. Methodol. Tools (Valuetools), Pisa
      (2009)
69.10 D. Cerotti, M. Gribaudo, A. Bobbio, C.T. Calafate,
      P. Manzoni: A Markovian agent model for fire propagation
      in outdoor environments, 7th Eur. Perform.
      Eng. Workshop (EPEW), Bertinoro (2010) pp. 131-146
69.11 D. Bruneo, M. Scarpa, A. Bobbio, D. Cerotti, M. Gribaudo:
      Adaptive swarm intelligence routing algorithms
      for WSN in a changing environment, 9th
      Annu. IEEE Conf. Sensors (SENSORS), Waikoloa (2010)
      pp. 1813-1818
69.12 D. Bruneo, M. Scarpa, A. Bobbio, D. Cerotti, M. Gribaudo:
      Markovian agent modeling swarm intelligence
      algorithms in wireless sensor networks, Perform.
      Eval. 69(3/4), 135-149 (2011)
69.13 M. Saleem, G.A. Di Caro, M. Farooq: Swarm intelligence
      based routing protocol for wireless sensor
      networks: Survey and future directions, Inf. Sci.
      181(20), 4597-4624 (2011)
69.14 K. Trivedi: Probability & Statistics with Reliability,
      Queueing & Computer Science Applications, 2nd
      edn. (Wiley, New York 2002)
69.15 M. Paone, L. Paladina, D. Bruneo, A. Puliafito:
      A swarm-based routing protocol for wireless sensor
      networks, 6th IEEE Int. Symp. Netw. Comput. Appl.
      (NCA), Cambridge (2007) pp. 265-268
69.16 J. Deneubourg, S. Aron, S. Goss, J.M. Pasteels: The
      self-organizing exploratory pattern of the Argentine
      ant, J. Insect Behav. 3(2), 159-168 (1990)
69.17 S. Goss, S. Aron, J. Deneubourg, J. Pasteels:
      Self-organized shortcuts in the Argentine ant,
      Naturwissenschaften 76(12), 579-581 (1989)




70. Honey Bee Social Foraging Algorithm 
for Resource Allocation 


Jairo Alonso Giraldo, Nicanor Quijano, Kevin M. Passino 


Bioinspired mechanisms are an emerging area in the field of optimization,
and various algorithms have been developed in the last decade. We introduce
a novel bioinspired model based on the social behavior of honey bees during
the foraging process, and we show how this algorithm solves a class of
dynamic resource allocation problems. To illustrate the practical utility
of the algorithm, we show how it can be used to solve a dynamic voltage
allocation problem to achieve a maximum uniform temperature in a multizone
temperature grid. Its behavior is compared with other evolutionary
algorithms.

70.1 Honey Bee Foraging Algorithm
70.1.1 Landscape of Foraging Profitability
70.1.2 Roles and Expedition of Bees
70.1.3 Dance Strength Determination
70.1.4 Explorer Allocation and Forager Recruitment
70.2 Application in a Multizone Temperature Control Grid
70.2.1 Hardware Description
70.2.2 Other Algorithms for Resource Allocation
70.3 Results
70.3.1 Experiment I: Maximum Uniform Temperature
70.3.2 Experiment II: Disturbance
70.3.3 Experiment III: Multiple Set Points
70.4 Discussion
70.5 Conclusions
References


Over several decades researchers’ interest in under- 
standing the patterns and collective behaviors of some 
organisms has increased because of the possibility 
of generating mathematical models that can be used 
for solving problems [70.1]. These bioinspired mod- 
els have been used to develop robust technological 
solutions in different research fields [70.2]. One of 
the first models based on natural behaviors is the ge- 
netic algorithm (GA), proposed by Holland in [70.3]. 
This method reproduces the concepts of evolution con- 
sidering natural selection, reproduction, and mutation 
in organisms. Many variations have been developed 
since then [70.4], and a wide variety of applica- 
tions have been implemented [70.5,6]. A sub-field 
of bioinspired algorithms is the so-called swarm in- 
telligence [70.1,7], which is inspired by the collec- 
tive behavior of social animals that are able to solve 
distributed and complex problems following individ- 
ual simple rules and producing emerging behaviors. 
Swarm intelligence mainly refers to those techniques
inspired by the social behavior of insects, such as
ants [70.8] and bees [70.9-11], or the social inter- 
action of different animal societies (e.g., flocks of 
birds) [70.12]. Ant colony optimization (ACO), as in- 
troduced by Dorigo et al. [70.13], mimics the foraging 
behavior of a colony of ants, based on pheromone 
proliferation, and it has been used in the solution 
of optimization problems [70.1, 14], and in some en- 
gineering applications [70.15-17]. Another common 
approach is particle swarm optimization (PSO), which 
mimics the behavior of social organisms that move 
according to the knowledge of their neighbors’ good- 
ness, and it is able to solve continuous optimization 
problems [70.18]. This technique has been widely implemented
in a variety of applications, such as economic
dispatch [70.19, 20], feature selection [70.21], and
some resource allocation problems [70.22, 23], to name 
just a few. 

There are also several bioinspired techniques based 
on the collective behavior of foraging bees, and 




1362 PartF | Swarm Intelligence 




each one has different characteristics and applications. 
In [70.24], a decentralized honey bee algorithm is pre- 
sented, which is based on the distribution of forager 
bees amongst flower patches, which occurs in such 
a way that the nectar intake is maximized. This tech- 
nique has been applied to Internet servers hosting dis- 
tribution. Tereshko in [70.25] also developed a model of 
the foraging behavior of a honey bee colony based only 
on the recruitment and abandonment process, taking 
into account just the local information of a food source. 
However, in [70.26], the algorithm was improved by
considering both local and global information of food
sources. Another honey bee foraging algorithm
was developed by Karaboga [70.27]; it is called
the artificial bee colony (ABC), and it
can be used to solve unconstrained optimization
problems. In [70.28], a comparison of the ABC algorithm
performance was made with other common heuris- 
tic algorithms, such as genetic algorithms and par- 
ticle swarm optimization. The authors conclude that 
ABC can be used for multivariable and multimodal 
function optimization. Several applications have been 
developed [70.29-31], and some improvements have 
been made in order to solve constrained optimization 
problems [70.32]. Teodorović and Dell'Orco in [70.33]
introduced another algorithm based on honey bee for- 
aging called bee colony optimization (BCO). This 
technique is very similar to ABC and follows almost 
the same steps of exploring, foraging, and recruitment 
based on the waggle dance. However, in ABC, the ini- 
tial population is distributed in such a way that scouts 
and foragers are in equal proportion, while in BCO 
the initial population distribution is not fixed. Some 
applications have been developed using BCO to solve 
difficult optimization problems, such as combined heat 
and power dispatch [70.34], and job scheduling [70.35]. 
There are many other applications, and we refer the 
reader to [70.36] for an extensive literature review of 
the field. 

In general, none of the previous optimization meth- 
ods based on honey bee social foraging attempt to 
mimic the whole behavior of the foraging process. They 
mainly concentrate on the communication between the 
agents (bees), which is achieved through the waggle 
dance. One of the goals of this chapter is to show 
another swarm intelligence method (i.e., honey bee
social foraging) that closely mimics the real behavior
of a hive (or even multiple hives) of bees, in
order to solve dynamic resource allocation problems. 
This method is based on the models obtained by Seeley 
and Passino in [70.37,38], where each bee can be an
explorer, an employed forager, an observer, or a rester.
The foraging process consists of exploring a landscape 
with different profitability sites. Hence, if a site is good 
enough, the explorer will try to recruit other bees in the 
hive using the waggle dance, which varies its intensity 
according to the quality of the site and the nectar un- 
loading time. The observers will tend to follow the bees 
with the higher dance intensity and they may become 
employed foragers. If a site is no longer good enough, 
the bees may tend to become observers and will try to 
follow another waggle dance. One of the advantages of 
this method is that each bee only considers local in- 
formation about its position and the profitability of the 
forage site. Besides, the communication is only consid- 
ered in the waggle dance process, which depends on 
the nectar unloading wait time. Hence, we do not need 
to have full information of each agent. However, with 
only the unloading wait time information, an emerging 
behavior is produced, and complex resource allocation 
problems can be solved. This method is based on exper- 
imental results and imitates almost the whole behavior 
of honey bees during the foraging process, which is not 
the case with the other approaches presented before, 
which only considered a few actions of the foraging 
activity. In addition, the utility of the theoretical
concepts introduced in this chapter is
illustrated in an engineering application, which consists
of a multizone temperature control grid. These kinds
of problems are very important in commercial and in- 
dustrial applications, including the distributed control 
of thermal processes, semiconductor processing, and 
smart building temperature control [70.39-42]. Here, 
we use a multizone grid similar to the one in [70.43], 
with four zones, each one with a temperature sensor
and a lamp that varies the zone temperature. The complexity
of these kinds of problems arises mainly due to the in- 
terzone effects (e.g., lamps affecting the temperature in 
neighboring zones), ambient temperature and external 
wind currents, zone component differences, and sensor 
noise. This is why common control strategies cannot 
be applied. For this reason, different experiments are 
implemented in order to observe the performance of 
the algorithm under different conditions. Besides, we 
compare its behavior with two common evolutionary al- 
gorithms, i.e., genetic algorithm and PSO, which have 
been selected because of their low computational cost 
and their high capability to solve optimization prob- 
lems. These algorithms have been modified in order to 
solve dynamic resource allocation problems, and their 
behavior can be compared with the honey bee social 
foraging algorithm. 


This chapter is organized as follows. First, in
Sect. 70.1, we introduce the honey bee social foraging
algorithm. Then, in Sect. 70.2 the multizone temperature
problem is presented and the other two evolutionary
strategies, genetic algorithm and PSO for resource
allocation, are introduced. The results and comparisons
are presented in Sects. 70.3 and 70.4, and in Sect. 70.5
some conclusions are drawn.


70.1 Honey Bee Foraging Algorithm


The honey bee social foraging algorithm models the
behavior of social honey bees during nectar foraging,
based on experimental studies summarized in [70.37]
and some ideas from other mathematical models. This
algorithm models some activities such as exploration
and foraging, nectar unload, dance strength decisions,
explorer allocation, recruitment on the dance floor, and
interactions with other hives. The theory and the experiments
are based on the work developed by Quijano and
Passino in [70.43].


70.1.1 Landscape of Foraging Profitability


The landscape is assumed to be a spatial distribution of
forage sites with encoded information of the foraging
profitability that quantifies the distance from the hive,
nectar sugar content, nectar abundance, and any other
relevant site variables. There are $B$ bees, each
represented by a two-dimensional position $\theta^i \in \mathbb{R}^2$,
for $i = 1, 2, \ldots, B$. During foraging, bees sample a foraging
profitability landscape denoted by $J_f(\theta) \in [0, 1]$,
which is proportional to the profitability of nectar at location
$\theta$. Hence, $J_f(\theta) = 1$ represents a location with
the highest possible profitability and $J_f(\theta) = 0$ represents
a location with no profitability.

As an example, assume the foraging landscape $J_f(\theta)$
is zero everywhere except at forage sites. We could have
four forage sites, indexed by $j = 1, 2, 3, 4$, centered at
various positions that are initially unknown to the bees.
Each site can be represented as a cylinder with radius
$\epsilon_f^j$ and height $N_f^j \in [0, 1]$ that is proportional to nectar
profitability. We may also assume that the profitability
of a bee being at site $j$ decreases as the number of bees
visiting that site increases. This can be denoted by $s^j$,
which in behavioral ecology theory is called the suitability
function [70.44].


70.1.2 Roles and Expedition of Bees

There are several kinds of bees involved in the foraging
process during an expedition, and each kind has a different
function. An expedition can be considered as a time
instant where each bee executes a function according
to its role. There are $B_f(k)$ employed foragers that actively
bring nectar back from some site, and some of
them dance to recruit new bees if the site is good. $B_e(k)$
explorer foragers go to random positions in the environment,
bring their nectar back if they find any, dance
to recruit, and they can become foragers if they find
a relatively good site. There are $B_u(k) = B_o(k) + B_r(k)$
unemployed foragers, with $B_r(k)$ bees that rest (or are
involved in some other activity), and $B_o(k)$ that observe
the dances of employed and explorer foragers on
the dance floor. Some of the observers will follow the
dances.

We ignore the specific path used by the foragers on
expeditions and we assume that a bee samples the foraging
profitability landscape once on its expedition, and
this value is held when the bee returns to the hive. Let
the foraging profitability assessment by the employed
forager or explorer $i$ be

$$F^i(k) = \begin{cases}
1 & \text{if } J_f(\theta^i(k)) + w_f^i(k) \ge 1, \\
J_f(\theta^i(k)) + w_f^i(k) & \text{if } 1 > J_f(\theta^i(k)) + w_f^i(k) > \epsilon_n, \\
0 & \text{if } J_f(\theta^i(k)) + w_f^i(k) \le \epsilon_n,
\end{cases}$$

where $\theta^i(k)$ represents the position of the $i$-th bee at
the $k$-th expedition, and $w_f^i(k)$ is the profitability assessment
noise, which can be considered uniformly distributed
on $(-0.1, 0.1)$. The value $\epsilon_n$ sets a lower
threshold on site profitability, and here we use $\epsilon_n = 0.1$.
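The cylinder-shaped landscape and the saturated, noisy assessment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site centers, radii, and heights in `SITES` are made-up values, and the function names `landscape` and `assess` are hypothetical, not taken from the chapter.

```python
import math
import random

EPS_N = 0.1   # lower profitability threshold epsilon_n (value from the text)
NOISE = 0.1   # assessment noise is uniform on (-0.1, 0.1)

# Hypothetical sites: (center_x, center_y, radius, height N_f^j in [0, 1])
SITES = [(1.0, 1.0, 0.5, 0.9), (3.0, 1.0, 0.5, 0.6),
         (1.0, 3.0, 0.5, 0.4), (3.0, 3.0, 0.5, 0.2)]

def landscape(theta, sites=SITES):
    """J_f(theta): the site height inside a cylinder, zero elsewhere."""
    x, y = theta
    for cx, cy, radius, height in sites:
        if math.hypot(x - cx, y - cy) <= radius:
            return height
    return 0.0

def assess(theta, rng=random, sites=SITES):
    """Noisy profitability assessment F^i(k) for a bee at position theta."""
    sample = landscape(theta, sites) + rng.uniform(-NOISE, NOISE)
    if sample >= 1.0:
        return 1.0          # saturate at the highest profitability
    if sample > EPS_N:
        return sample       # keep the noisy sample
    return 0.0              # below the threshold: site is not worth reporting
```

A bee outside every site always reports zero, because the noise alone cannot exceed the threshold of 0.1.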


70.1.3 Dance Strength Determination 


The number of waggle runs of bee $i$ at the $k$-th expedition
is called dance strength and is denoted by $L_f^i(k)$.
The unemployed foragers have $L_f^i(k) = 0$, and the employed
foragers that have $F^i(k) = 0$ will have $L_f^i(k) = 0$
since they do not find a location above the profitability
threshold $\epsilon_n$, and for this reason they will become
unemployed foragers.


Unloading Waiting Time 
Now, we will explain dance strength decisions for the
employed foragers and explorers that find a site of sufficiently
good profitability and have $F^i(k) > \epsilon_n$. Firstly,
we have to model the unloading wait time in order to relate
it to the dance strength. Let $F_t(k) = \sum_{i=1}^{B_f(k)} F^i(k)$
be the total nectar profitability assessment at time $k$ for
the hive and $F_q^i(k)$ be the quantity of nectar gathered
for a profitability assessment $F^i(k)$. We assume that
$F_q^i(k) = \alpha F^i(k)$, where $\alpha > 0$ is a proportionality constant.
We may choose $\alpha = 1$, such that the total hive
nectar influx $F_{tq}(k)$ is equal to the total nectar profitability
assessment. Suppose that the number of food-storer
bees is sufficiently large so the wait time $W^i(k)$ that
bee $i$ experiences is given by

$$W^i(k) = \psi \max\{F_{tq}(k) + w_w^i(k),\, 0\}, \tag{70.1}$$

where $\psi$ is a scale factor and $w_w^i(k)$ is a random
variable uniformly distributed in $(-w_w, w_w)$ that represents
variations in the wait time a bee experiences.
When the total nectar influx is maximum, the value of
the wait time is approximately 30 s, based on the experiments
in [70.45]. With this assumption we can obtain
the values of $\psi$ and $w_w$ from the fact that the maximum
wait time from (70.1) is given by $\psi(B + w_w) = 30$.
Hence, it can be noted that $\psi w_w$ is the variation in the
number of seconds in wait time due to the noise, and
$w_w$ has to be set adequately. If we let $\psi w_w = 5$ and
assume that $B = 200$, we obtain two equations
in two unknowns, which give $\psi = 25/200$ and
$w_w = 40$.
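The two conditions on the wait-time parameters, $\psi(B + w_w) = 30$ and $\psi w_w = 5$ with $B = 200$, can be solved directly. Subtracting the second from the first gives $\psi B = 25$, so $\psi = 25/200 = 0.125$ and $w_w = 5/\psi = 40$. The following sketch checks this arithmetic and evaluates (70.1); the names `psi`, `w_w`, and `wait_time` are chosen here for illustration.

```python
import random

B = 200            # number of bees (value assumed in the text)
MAX_WAIT = 30.0    # seconds, wait time at maximum nectar influx
NOISE_SPAN = 5.0   # seconds, the product psi * w_w

# Solve psi * (B + w_w) = MAX_WAIT together with psi * w_w = NOISE_SPAN.
psi = (MAX_WAIT - NOISE_SPAN) / B   # 25 / 200 = 0.125
w_w = NOISE_SPAN / psi              # 40.0

def wait_time(total_influx, rng=random):
    """W^i(k) = psi * max(F_tq(k) + w_w^i(k), 0), noise uniform on (-w_w, w_w)."""
    noise = rng.uniform(-w_w, w_w)
    return psi * max(total_influx + noise, 0.0)
```

With the maximum influx of 200, the sampled wait times fall between 20 s and 30 s; with no influx they fall between 0 s and 5 s.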


Dance Decision Function 
Now, we assume that each successful forager converts
the wait time it experienced into a scaled version of an
estimate of the total nectar influx that we define as

$$\hat{F}_{tq}^i(k) = \delta W^i(k). \tag{70.2}$$

The value $\hat{F}_{tq}^i(k)$ provides bee $i$ a noisy estimate of the
whole colony's foraging performance, since it provides
an indication of how many successful foragers are waiting
to be unloaded [70.37]. The proportionality constant
is $\delta > 0$, and since $W^i(k) \in [0, \psi(B + w_w)] = [0, 30]$ s,
it follows that $\hat{F}_{tq}^i(k) \in [0, 30\delta]$. In order to ensure that
$\hat{F}_{tq}^i(k) \in [0, 1]$, we consider that $0 < \delta \le 1/30$.


With this estimation, each bee has to decide how
long to dance according to some forage site variables
that determine the energetic profitability (e.g., distance
from hive, sugar content of nectar, nectar abundance),
and some conditions that determine the threshold of
the dance response (e.g., weather, time of day, colony's
nectar influx). The decision function is

$$L_f^i(k) = \max\{\beta\,(F^i(k) - \hat{F}_{tq}^i(k)),\, 0\}, \tag{70.3}$$

which indicates the number of waggle runs of bee $i$
at expedition $k$. The parameter $\beta > 0$ has the effect of
a gain on the rate of recruitment for sites above the
dance threshold, and experimentally [70.37] we can set
$\beta = 100$.

When a bee has $L_f^i(k) > 0$, it may consider dancing
for its forage site. The probability that bee $i$ will choose
to dance for the site it is dedicated to is given by

$$p_r(i, k) = \frac{\phi}{\beta}\, L_f^i(k),$$

where $\phi \in [0, 1]$; matching the behavior of what is
found in experiments, we choose $\phi = 1$.
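A minimal sketch of the dance decision follows: the scaled influx estimate is subtracted from the bee's own profitability assessment, the difference is multiplied by the gain $\beta$, and negative values are clipped to zero; the bee then dances with probability $(\phi/\beta)$ times its dance strength. The value `DELTA = 1/30` is an assumption made here (the largest value that keeps the estimate in $[0, 1]$ for wait times up to 30 s), and the function names are hypothetical.

```python
import random

BETA = 100.0         # recruitment gain beta (value from the text)
DELTA = 1.0 / 30.0   # assumed: largest delta keeping the estimate in [0, 1]
PHI = 1.0            # phi, as chosen in the text

def dance_strength(profitability, wait_time):
    """L_f^i(k) = max(beta * (F^i(k) - delta * W^i(k)), 0)."""
    influx_estimate = DELTA * wait_time   # scaled estimate of total nectar influx
    return max(BETA * (profitability - influx_estimate), 0.0)

def dances(strength, rng=random):
    """The bee dances with probability p_r = (phi / beta) * L_f^i(k)."""
    return rng.random() < (PHI / BETA) * strength
```

For example, a bee that assessed its site at 0.9 and waited 15 s gets a dance strength of about 40 waggle runs, whereas the same site yields fewer runs when the wait time (and hence the colony's influx) is higher.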


70.1.4 Explorer Allocation 
and Forager Recruitment 


Bees that are not successful on an expedition, or those
that do not consider dancing, become unemployed foragers.
Some of these bees will start to rest or they
will become observers and they will start seeking dancing
bees in order to get recruited. The probability that
an unemployed forager or current rester bee will become
an observer is $p_o \in [0, 1]$. Based on the results
in [70.46], we choose $p_o = 0.35$ in such a way that in
times where there are no forage sites being harvested
there can be about 35% of the bees performing as forage
explorers.

If an observer bee does not find any dance to follow,
it will go exploring. So we take the $B_o(k)$ observer bees
and each one can become an explorer with probability
$p_e(k)$ or can follow the dance and become an employed
forager with probability $1 - p_e(k)$. We choose

$$p_e(k) = \exp\left(-\frac{1}{2}\,\frac{L_t^2(k)}{\sigma^2}\right), \tag{70.4}$$

where $L_t(k) = \sum_{i=1}^{B_f(k)} L_f^i(k)$ is the total number of waggle
runs on the dance floor at step $k$. Notice that
if $L_t(k) = 0$, there are no dancing bees on the dance
floor, so $p_e(k) = 1$ and all the observers will explore
(i.e., 35% of the unemployed foragers). Here,
we choose $\sigma = 1000$ since it produces patterns of
foraging behavior in simulations that correspond to
experiments.
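The explorer probability can be evaluated numerically to see how quickly exploration shuts down as dancing increases. The sketch below assumes the Gaussian-shaped form $p_e(k) = \exp(-\tfrac{1}{2} L_t^2(k)/\sigma^2)$ with $\sigma = 1000$; the function name `explore_probability` is hypothetical.

```python
import math

SIGMA = 1000.0   # sigma, as chosen in the text

def explore_probability(total_waggle_runs):
    """p_e(k) = exp(-0.5 * L_t(k)^2 / sigma^2)."""
    return math.exp(-0.5 * total_waggle_runs ** 2 / SIGMA ** 2)
```

With no dancing the probability is exactly 1, so every observer explores; at 1000 total waggle runs it has dropped to about 0.61, and at 3000 runs it is close to 0.01, so nearly all observers follow a dance instead.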

Now, we take the observer bees that did not go to
explore and some of them will be recruited in order to
follow the dance of bee $i$ with probability

$$p_i(k) = \frac{L_f^i(k)}{\sum_{j=1}^{B_f(k)} L_f^j(k)}. \tag{70.5}$$

In this way, bees that dance more strongly will tend to
recruit more foragers for their site.


Algorithm 70.1 summarizes the pseudo-code of the
honey bee social foraging algorithm described above.


Algorithm 70.1 Honey Bee Social Foraging Algorithm

Set the parameter values
while the stopping criterion is not reached do
    Determine the number of bees at each forage site, and compute the suitability of each site
    for each employed forager and explorer do
        Define a noisy profitability assessment $F^i(k)$ according to the location
        if $F^i(k) > \epsilon_n$ then
            if the bee is an employed forager then
                It stays that way
            else
                The bee becomes an employed forager
            end if
        else
            The bee becomes an observer or rester
        end if
    end for
    Compute the total nectar profitability and the total nectar influx
    for all employed foragers do
        Compute the wait time $W^i$ and the noise for the unloading wait time $w_w^i$
        Compute the estimate of the scaled total nectar influx $\hat{F}_{tq}^i$
        Compute the dance decision function $L_f^i$
        if $L_f^i = 0$ then
            The bee becomes unemployed
        end if
        if the employed forager should not recruit then
            Set $L_f^i = 0$. Bee $i$ is removed from those that dance
        end if
    end for
    Determine $L_t$. Employed foragers and successful forager explorers may dance based on sampling of profitability
    Send all employed foragers back to their previous site after recruitment for the next expedition
    for unemployed foragers do
        Set $W^i = L_f^i = 0$
        Unemployed foragers become observers with probability $p_o$. The remaining unemployed foragers become resters
    end for
    Set $p_e$
    for unemployed observers do
        if rand < $p_e$ then
            The bee becomes an explorer. Set the explorer location for the next expedition
        else
            The observer is recruited by bee $i$ with probability $p_i$
        end if
    end for
end while
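The last part of the loop, in which unemployed bees rest, explore, or get recruited, can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes the explorer probability $\exp(-\tfrac{1}{2} L_t^2/\sigma^2)$, recruits observers by roulette-wheel selection proportional to dance strength, and uses the hypothetical function name `allocate_unemployed`.

```python
import math
import random

P_O = 0.35       # probability that an unemployed bee becomes an observer
SIGMA = 1000.0   # sigma in the explorer probability

def allocate_unemployed(n_unemployed, dance_strengths, rng=random):
    """Return (resters, explorers, recruits); recruits[i] counts the observers
    recruited by dancer i, and dance_strengths holds one L_f^i(k) per dancer."""
    total = sum(dance_strengths)                      # L_t(k)
    p_e = math.exp(-0.5 * total ** 2 / SIGMA ** 2)    # explorer probability
    resters = explorers = 0
    recruits = [0] * len(dance_strengths)
    for _ in range(n_unemployed):
        if rng.random() >= P_O:
            resters += 1                              # bee rests this expedition
        elif total == 0 or rng.random() < p_e:
            explorers += 1                            # observer finds no dance to follow
        else:
            # Choose dancer i with probability L_f^i / L_t.
            i = rng.choices(range(len(recruits)), weights=dance_strengths)[0]
            recruits[i] += 1
    return resters, explorers, recruits
```

Calling `allocate_unemployed(1000, [3000.0, 1000.0])` sends roughly 65% of the bees to rest and splits most of the remaining observers about three to one between the two dancers.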


70.2 Application in a Multizone Temperature Control Grid 


In order to apply the proposed algorithm in the context 
of a physical resource allocation problem, we imple- 
ment the multi-zone temperature control grid intro- 
duced in [70.43], with four zones as shown in Fig. 70.1. 
The relations between our problem and the proposed al- 
gorithm can be summarized as follows: 


1. We assume that there is a population of $B$ individuals
(i.e., bees, chromosomes, or particles), each of which
contains the information of a position $\theta_i$.

2. The search space is composed of allocation sites,
which are denoted with a position $R_j$ and a width $\epsilon_f^j$.

3. Each allocation site corresponds to the zone $j$ in the
temperature grid, for $j = 1, \ldots, 4$.

4. Let $T_j^d$ and $T_j$ be a temperature reference and the
temperature for each zone $j$, respectively.

5. We consider the temperature error for zone $j$ as
$e_j = T_j^d - T_j$, and if an individual is located in a zone,
its fitness is given by $\gamma e_j$, for $\gamma$ being a positive constant
that sets the fitness value in the range of $[0, 1]$.


Fig. 70.1 Layout for the multizone temperature control
grid (four zones, each with a lamp and a temperature sensor)
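The fitness seen by an individual located in a zone can be sketched as a scaled temperature error. The scaling value `GAMMA = 0.1` and the explicit clipping to $[0, 1]$ are assumptions made for this illustration (the text only states that a positive constant keeps the fitness in that range), and `zone_fitness` is a hypothetical name.

```python
GAMMA = 0.1   # hypothetical scaling constant; the text only requires it to be positive

def zone_fitness(t_ref, t_zone, gamma=GAMMA):
    """Scaled error gamma * (T_ref - T_zone), clipped to [0, 1]."""
    error = t_ref - t_zone
    return min(max(gamma * error, 0.0), 1.0)
```

A zone that is 5 °C below its reference gets a fitness of 0.5 with this scaling, while a zone at or above its reference gets 0 and therefore attracts no further individuals.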


70.2.1 Hardware Description 


A zone contains a temperature sensor LM35 and a lamp
that varies its intensity in order to increase or decrease
the temperature of the zone. The data acquisition
and the lamps' intensity variations are performed using
a microcontroller PIC18F4550, which receives the
temperature values (voltages between 0 and 5 V) and
transmits them through the USB port to a PC using
the USB-bulk communication class. We cannot guarantee
that the four sensors have the same characteristics
(they have ±0.2 °C typical accuracy, and ±0.5 °C
guaranteed). With these temperature values, a Matlab
program executes an iteration of one of the algorithms
and sends pulse width modulation (PWM) width information
back to the peripheral interface controller
(PIC) (Fig. 70.2). PWM signals are generated using
four digital outputs of the PIC and a couple of transistors
that drive the amount of current and voltage
necessary to control the lamps. The width of the PWM
signal depends on the number of individuals (i.e., bees,
chromosomes, or particles) allocated on a site. Each
individual is equivalent to a portion of the PWM, and
a 100% duty cycle corresponds to 12 V of direct current
(DC).


Fig. 70.2 Layout of data acquisition and temperature
control

We assume that there is a total amount of volt- 
age that can be distributed between the four zones. 
The goal is to allocate that voltage in such a way 
that the reference temperature for each zone can be 
achieved. However, achieving this goal is complicated
by external effects, such as ambient temperature and
wind currents, interzone effects, differences between 
the components of a zone, and sensor noise. For this 
reason, the total voltage amount has to be dynamically 
allocated, despite the external and internal effects. 
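One way to picture the allocation is to treat every individual as an equal share of a single full duty cycle, so that the head count in each zone sets that zone's PWM width. This equal-share mapping and the function name `zone_voltages` are assumptions made here for illustration; only the 12 V level at 100% duty cycle comes from the hardware description.

```python
V_MAX = 12.0   # volts at 100% duty cycle (from the hardware description)

def zone_voltages(individuals_per_zone, population_size):
    """Map the head count in each zone to a (duty cycle, DC voltage) pair."""
    duties = [count / population_size for count in individuals_per_zone]
    return [(duty, duty * V_MAX) for duty in duties]
```

With 200 individuals spread evenly over four zones, each lamp receives a 25% duty cycle, that is, 3 V; moving individuals from one zone to another shifts voltage between lamps while the total stays fixed.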


70.2.2 Other Algorithms 
for Resource Allocation 


In order to assess how well the honey bee social
foraging algorithm solves dynamic resource allocation
problems, two evolutionary algorithms were selected for comparison,
i.e., the genetic algorithm (GA) and particle swarm
optimization (PSO). These methods have been implemented
in a wide variety of applications because of
their low computational cost and their strong capability
for solving optimization problems. Some implementations
for resource allocation can be found in [70.22,
23,47]. We will show below that these algorithms can 
be adapted in such a way that these resource allocation 
problems can be solved. 


Genetic Algorithms 
A genetic algorithm (GA) is a random search algorithm
based on the mechanics of natural selection, genetics,
and evolution [70.48]. The basic structure of the population
is the chromosome. During each generation,
chromosomes are evaluated based on a fitness function
and some of them are stochastically selected depending
on their fitness values. The population evolves from
generation to generation through the application of genetic
operators [70.49].

Fig. 70.3 Selection and crossover in GA (good chromosomes
are selected from the population and recombined; the new
chromosomes form a new population)

The simplest GA form uses chromosome information,
which is encoded into a binary string. The
chromosomes are modified using three operators: selection,
crossover, and mutation [70.3]. Selection is
an artificial version of natural selection, and only the
fittest chromosomes from the population are selected.
With crossover, two parents are chosen for reproduction,
and a crossover site (a bit position) is randomly
selected. The subsequences after the crossover site
are exchanged with a probability $p_c$, producing two
offspring with information from both parents. Then,
mutation randomly inverts a bit on a string with a very
low probability, and introduces new information into
the population at the bit level. Figure 70.3 illustrates
the GA selection and crossover process for the simplest
GA.
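The crossover and mutation operators described above can be sketched on bit strings as follows. The default probabilities `p_c=0.7` and `p_m=0.01` are illustrative values chosen here, not parameters reported in the chapter, and the function names are hypothetical.

```python
import random

def crossover(parent_a, parent_b, p_c=0.7, rng=random):
    """Single-point crossover on two equal-length bit strings."""
    if rng.random() >= p_c or len(parent_a) < 2:
        return parent_a, parent_b              # no exchange this time
    site = rng.randrange(1, len(parent_a))     # crossover site (bit position)
    return (parent_a[:site] + parent_b[site:],
            parent_b[:site] + parent_a[site:])

def mutate(chromosome, p_m=0.01, rng=random):
    """Flip each bit independently with a small probability p_m."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_m else bit
        for bit in chromosome
    )
```

Applied to the parents `"1100010"` and `"1111101"`, crossover keeps each parent's prefix up to the site and swaps the tails, so every bit position of the two offspring still holds the same pair of bits as the parents.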

To solve a resource allocation problem, we adjust
this algorithm as follows: we set a population of $B$ individuals,
each one with binary encoded information
about its position $\theta_i$, for $i = 1, \ldots, B$. There is a landscape
that contains four different resource sites, each one
located at a position $R_j$ with a width $\epsilon_f^j$, for $j = 1, \ldots, 4$.
Each resource site corresponds to a temperature zone
in the multizone control grid. When an individual is located
in a resource site $j$, the fitness of that individual
is given by the error between the current temperature $T_j$
and the reference temperature $T_j^d$; otherwise, the fitness
is 0. As was pointed out before, each individual corresponds
to a portion of the total amount of voltage.
For that reason, if the fitness is good, the population
evolves, most of the individuals will have the same genetic
information (position), and they will be allocated
at that site. Then, the amount of voltage applied to that
zone increases, as does the temperature, which causes
the fitness associated with that site to decrease,
so that the site becomes less profitable. For the next generation,
the individuals tend to be reallocated to another,
more profitable zone, and after some generations, the
population is distributed in such a way that a uniform
temperature is achieved for all sites.


Fig. 70.4a-d Average temperature (solid lines) and number of bees per zone (stem plots) using the honey bee foraging
algorithm

Fig. 70.5a-d Average temperature (solid lines) and number of chromosomes per zone (stem plots) using the genetic
algorithm

Fig. 70.6a-d Average temperature (solid) and number of particles per zone (stem plots) using the PSO algorithm

Fig. 70.7a-d Average temperature (solid lines) and number of bees per zone (stem plots) using the honey bee foraging
algorithm. Dashed lines indicate the beginning and end of the disturbance


Particle Swarm Optimization 
Particle swarm optimization (PSO) is a population- 
based stochastic optimization technique, inspired by 
the social behavior of animals (e.g., bird flocks, fish 
schools, or even human groups) [70.7]. In PSO, the 
potential solutions, called particles, fly through the 
problem space by following the currently best particles. 
They have two essential reasoning capabilities: memory
of their own best position and knowledge of the
global or their neighborhood’s best. Each particle in 
a population has the information about its current po- 
sition that defines a potential solution, and its fitness 
value associated to that position. A change of posi- 
tion of a particle is defined by the velocity, which is 
a vector of numbers that are added to the position co- 
ordinates in order to move the particle from one time 
step to another. At each iteration, a particle’s veloc- 
ity is updated depending on the difference between the 
individual’s previous best and current positions, and 
the difference between the neighborhood’s best and the 
individual’s current position. With these simple rules, 
individuals tend to follow the particles associated to the 
more profitable sites, and optimization problems can 
be solved. The details of this method are summarized 
in [70.7]. 
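
The velocity rule described above can be illustrated with a short sketch. It is not the implementation used in the experiments: the inertia weight and the acceleration coefficients c1 and c2 are common textbook choices assumed here for illustration, and the function name pso_step is hypothetical.

```python
import random


def pso_step(positions, velocities, personal_best, neighborhood_best,
             inertia=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity-and-position update for a swarm of 1-D particles.

    inertia, c1 and c2 are assumed values, not taken from the chapter.
    """
    new_positions, new_velocities = [], []
    for x, v, p in zip(positions, velocities, personal_best):
        r1, r2 = rng.random(), rng.random()
        # Pull toward the particle's own best position (memory) and
        # toward the best position known in its neighborhood.
        v_new = (inertia * v
                 + c1 * r1 * (p - x)
                 + c2 * r2 * (neighborhood_best - x))
        new_velocities.append(v_new)
        # The velocity is added to the position to move the particle.
        new_positions.append(x + v_new)
    return new_positions, new_velocities
```

A particle that already sits on both its personal best and the neighborhood best, with zero velocity, does not move under this rule; any other particle is pulled toward the two attractors by random amounts.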




[Figure panels a-d: temperature (°C) and number of chromosomes (genetic algorithm) or particles (PSO) per zone versus time (s)]




Fig. 70.8a-d Average temperature (solid lines) and number of chromosomes per zone (stem plots) using the genetic
algorithm. Dashed lines indicate the beginning and end of the disturbance


[Figure panels a-d: temperature (°C) and number of bees per zone versus time (s)]


Fig. 70.10a-d Average temperature (solid lines) and number of bees per zone (stem plots) using the honey bee foraging 
algorithm. The dashed lines indicate the different set points for each zone 


Fig. 70.9a-d Average temperature (solid lines) and number of particles per zone (stem plots) using the PSO algorithm.
Dashed lines indicate the beginning and end of the disturbance


To solve a dynamic resource allocation problem, this
algorithm is adjusted as follows. First, we assume a
population of B particles, where each particle
represents a one-dimensional position θ_i in the search
space. The fitness of each individual is defined by the
temperature error e_j given above. When a particle has
a good fitness, most of the individuals tend to fly to
the same position, provoking a temperature increase and
a decrease of the error. For that reason, the particles
are then reallocated to a more profitable place, and
after some generations the population is distributed in
such a way that a uniform temperature is achieved for
all sites.


70.3 Results


In order to compare the behavior of the proposed
algorithms when solving the dynamic resource allocation
problem, three experiments are performed using the
multizone temperature control grid described in
Sect. 70.2.1. In the first experiment, we seek the
maximum uniform temperature with a single population of
B = 200 individuals. The second experiment illustrates
the response of the individuals when a disturbance is
applied in the fourth zone. Finally, multiple
temperature set points are assigned to each zone. Our
results show the behavior




[Figure panels a-d: temperature (°C) and number of chromosomes per zone versus time (s)]

Fig. 70.11a-d Average temperature (solid lines) and number of chromosomes per zone (stem plots) using the genetic
algorithm. The dashed lines show the set point for each zone


of the three algorithms applied in a resource allocation 
implementation under the same circumstances. 


70.3.1 Experiment I: Maximum Uniform Temperature


In this experiment, we want to achieve a maximum
uniform temperature for all zones. For this reason, we
set the reference temperature of every zone
j = 1, ..., 4 to 30 °C (a value that the system cannot
reach). We have a population of 200 individuals and a
PWM frequency of 70 Hz, where each individual
corresponds to a duty cycle of 0.5% (i.e., 0.06 V). For
example, 50 individuals in a zone amount to a 25% duty
cycle, which corresponds to 3 V.
Figures 70.4-70.6 show the temperature results and the
number of individuals allocated to each zone.
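
The conversion from individuals to actuator drive quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the constant and function names are hypothetical, and the implied full-scale value of 12 V (200 individuals at 0.06 V each) is inferred from the stated figures rather than given in the text.

```python
POPULATION = 200             # individuals B used in the experiments
DUTY_PER_INDIVIDUAL = 0.5    # percent of PWM duty cycle per individual
VOLTS_PER_INDIVIDUAL = 0.06  # volts contributed by each individual


def zone_drive(individuals):
    """Return (duty cycle in percent, voltage in volts) for one zone."""
    duty = individuals * DUTY_PER_INDIVIDUAL
    volts = individuals * VOLTS_PER_INDIVIDUAL
    return duty, volts


if __name__ == "__main__":
    # 50 individuals in a zone: a 25% duty cycle, about 3 V.
    print(zone_drive(50))
    # The whole population in one zone: a 100% duty cycle.
    print(zone_drive(POPULATION))
```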


70.3.2 Experiment II: Disturbance

This experiment is similar to the first one, but now we
add a controlled disturbance. An extra lamp is placed
next to zone 4; it is turned on after 4 min and turned
off 2 min later. When we apply this disturbance, the
temperature in zone 4 increases drastically and site 4
becomes the least profitable. The individuals in that
site are then reallocated, provoking a small increase
in the temperatures of the other three zones.
Figures 70.7-70.9 illustrate the behavior of the system
under the three algorithms.


70.3.3 Experiment III: Multiple Set Points


In this experiment we want to achieve multiple set
points (26, 24, 27, and 25 °C, respectively, for the
four zones), which are lower than the temperatures
achieved before. Figure 70.10 presents the results
obtained using the honey bee foraging algorithm. We can
observe that the set points are never reached exactly,
but the temperatures get very close to them, and the
steady states are reached quickly. This is because the
algorithm requires a positive error e_j > 0 for each
zone, which implies that the temperatures always stay
below the set point.

On the other hand, with the GA and PSO, the be- 
havior is very different. When one of the set points is 
achieved, the resources tend to reallocate to the other 




[Figure panels a-d: temperature (°C) and number of particles per zone versus time (s)]

Fig. 70.12a-d Average temperature (solid lines) and number of particles per zone (stem plots) using the PSO algorithm. The
dashed lines show the set point for each zone


zones. Therefore, when all the zones reach their set
points, all the fitnesses (which are proportional to the
error values) are equal to 0, and the remaining
resources should not be allocated. However, with these
two methods, most of the resources must be allocated to
some site, and the remaining resources stay in the last
site, provoking a drastic temperature increase in that
zone only. When the temperature in any other zone
decreases, agents tend to be reallocated to that site
very quickly, until the temperature increases again.
Figures 70.11 and 70.12 show this behavior for the GA
and PSO, respectively, with a fixed set point of 26 °C
for all four zones. The drastic changes produced by the
unneeded resources can be observed, and it can be seen
that these methods are not feasible for this kind of
problem.


70.4 Discussion


The experiments performed with the bioinspired
algorithms show behaviors common to the three
techniques, and some advantages and disadvantages can be
discussed. In Sect. 70.3.1, we saw how the maximum
uniform temperature can be achieved by the three methods
implemented. We obtained the average values and standard
deviations of the temperatures over ten experiments
(Table 70.1). We observe that the genetic algorithm
achieves the highest average temperatures, but its
standard deviations are also the highest, which means
that the GA behavior is less steady. This is because the
GA is very susceptible to changes and, as soon as a zone
becomes more profitable (its temperature decreases),
most chromosomes tend to abandon their current positions
and go to the more profitable one. That is why the
number of chromosomes changes abruptly (Fig. 70.5), and
the temperature variation is also high in each of the
ten experiments. On the other hand, the other algorithms
react more slowly, and the number of individuals remains
almost constant (very low variations). Table 70.1
illustrates that the honey bee social foraging algorithm
has the lowest temperature variation, which means that
in every experiment the maximum temperature achieved for
each zone is practically the same. This is an advantage,
because repeatability is a very important characteristic
in practical applications. Besides, the low variations
in the number of individuals (i.e., low voltage changes)
imply a low deterioration of the electric elements.


Table 70.1 Average temperatures and standard deviations (°C) for the last 10 min, using each algorithm

Algorithm    Zone 1          Zone 2          Zone 3          Zone 4
Honey bee    27.28 ± 0.12    27.26 ± 0.12    [illegible]     27.18 ± 0.11
GA           29.78 ± 0.38    29.8 ± 0.3      29.75 ± 0.34    29.76 ± 0.36
PSO          27.84 ± 0.17    27.86 ± 0.18    28.55 ± 0.14    27.81 ± 0.16


Section 70.3.2 shows the results when a disturbance is
applied to the fourth zone. When this disturbance is
turned on, the temperature increases in that zone, and
that site becomes less profitable, provoking a
reallocation of the individuals to the other zones.
Figures 70.7-70.9 show how the resources are reallocated
and how the temperatures in the other sites increase. It
can be observed that during the disturbance the number
of individuals in the fourth zone tends to 0 and, as
soon as the disturbance is turned off, the temperature
in that zone decreases. These results illustrate the
robustness of the three techniques to external
disturbances, and we can also observe that, regardless
of the type of disturbance, the maximum uniform
temperature is achieved.

Experiment III illustrates the behavior of the
algorithms when low temperature set points are
considered. We observed that, with the GA and PSO, most
of the resources have to be allocated to the sites, and
low temperature references could not be achieved. This
is because the individuals move from one place to
another, looking for the more profitable site, i.e., the
one with the lowest temperature. When all the set points
are achieved, all profitability values are 0, and the
individuals remain in their current position. Hence,
temperatures continue to rise even if the reference has
been achieved. Figures 70.11 and 70.12 illustrate this
behavior for the GA and PSO when a reference temperature
of 26 °C is set. However, the honey bee foraging
algorithm can solve this kind of problem because of its
capability to allocate only the necessary bees to the
foraging places, while the unneeded resources are simply
not used. This characteristic of our technique may be
very useful in practical applications, such as smart
building temperature control, where multiple temperature
references for different rooms need to be achieved.


70.5 Conclusions


In this chapter, a novel bioinspired method for dynamic
resource allocation based on the social behavior of
honey bees during the foraging process was presented,
and an application that illustrates the validity of the
approach was studied. The application that we used is a
multizone temperature control grid, where the objective
is to achieve some reference temperatures for each one
of the zones, taking into account the complexity induced
by the interzone effects and external or internal noise.
Some comparative analyses have been developed with two
evolutionary algorithms, i.e., GA and PSO. We can see
that the proposed method has some advantages compared to
other bioinspired methods due to its capability of
allocating only the necessary resources, and the low
variability of the number of individuals in each one of
the four zones. Clearly, there are other applications of
the social foraging method for resource allocation, for
instance, in task allocation of agents, formation
control, economic dispatch, and smart building
temperature control.


References



70.1 E. Bonabeau, M. Dorigo, G. Theraulaz: Swarm Intelligence: From Natural to Artificial Systems (Oxford Univ. Press, New York 1999)
70.2 K.M. Passino: Biomimicry for Optimization, Control and Automation (Springer, London 2005)
70.3 J.H. Holland: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1st edn. (Univ. Michigan Press, Ann Arbor 1975)
70.4 T.P. Hong, W.Y. Lin, S.M. Liu, J.H. Lin: Dynamically adjusting migration rates for multi-population genetic algorithms, J. Adv. Comput. Intell. Intell. Inform. 11, 410-415 (2007)
70.5 M. Affenzeller, S. Winkler: Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications, Vol. 6 (Chapman Hall/CRC, Boca Raton 2009)
70.6 M.D. Higgins, R.J. Green, M.S. Leeson: A genetic algorithm method for optical wireless channel control, J. Lightwave Technol. 27(6), 760-772 (2009)
70.7 J. Kennedy, R.C. Eberhart: Swarm Intelligence (Morgan Kaufmann, San Francisco 2001)
70.8 M. Dorigo, C. Blum: Ant colony optimization theory: A survey, Theor. Comput. Sci. 344(2), 243-278 (2005)
70.9 D. Sumpter, D.S. Broomhead: Formalising the link between worker and society in honey bee colonies. In: Multi-Agent Systems and Agent-Based Simulation, ed. by J. Sichman, R. Conte, N. Gilbert (Springer, Berlin 1998) pp. 95-110
70.10 D.T. Pham, A. Ghanbarzadeh, E. Koc, S. Otri, S. Rahim, M. Zaidi: The bees algorithm - a novel tool for complex optimisation problems, Proc. 2nd Int. Virtual Conf. Intell. Prod. Mach. Syst. (Elsevier, Oxford 2006) pp. 454-461
70.11 D. Teodorovic: Bee Colony Optimization (BCO), Studies in Computational Intelligence, Vol. 248 (Springer, Berlin/Heidelberg 2009)
70.12 R.C. Eberhart, Y. Shi, J. Kennedy: Swarm Intelligence, 1st edn. (Morgan Kaufmann, San Francisco 2001)
70.13 M. Dorigo, V. Maniezzo, A. Colorni: Ant system: Optimization by a colony of cooperating agents, IEEE Trans. Syst. Man Cybern. B 26(1), 29-41 (1996)
70.14 M. Dorigo, L.M. Gambardella: Ant colony system: A cooperative learning approach to the traveling salesman problem, IEEE Trans. Evol. Comput. 1(1), 53-66 (1997)
70.15 K.M. Sim, W.H. Sun: Ant colony optimization for routing and load-balancing: Survey and new directions, IEEE Trans. Syst. Man Cybern. A 33(5), 560-572 (2003)
70.16 J. Zhang, H.S.-H. Chung, A.W.-L. Lo, T. Huang: Extended ant colony optimization algorithm for power electronic circuit design, IEEE Trans. Power Syst. 24(1), 147-162 (2009)
70.17 D. Merkle, M. Middendorf, H. Schmeck: Ant colony optimization for resource-constrained project scheduling, IEEE Trans. Evol. Comput. 6(4), 333-346 (2002)
70.18 R. Poli, J. Kennedy, T. Blackwell: Particle swarm optimization: An overview, Swarm Intell. 1(1), 33-57 (2007)
70.19 A.I. Selvakumar, K. Thanushkodi: A new particle swarm optimization solution to nonconvex economic dispatch problems, IEEE Trans. Power Syst. 22(1), 42-51 (2007)
70.20 Z.L. Gaing: Particle swarm optimization to solving the economic dispatch considering the generator constraints, IEEE Trans. Power Syst. 18(3), 1187-1195 (2003)
70.21 Y. Liu, G. Wang, H. Chen, H. Dong, X. Zhu, S. Wang: An improved particle swarm optimization for feature selection, J. Bionic Eng. 8(2), 191-200 (2011)
70.22 P.Y. Yin, J.Y. Wang: A particle swarm optimization approach to the nonlinear resource allocation problem, Appl. Math. Comput. 183(1), 232-242 (2006)
70.23 S. Gheitanchi, F. Ali, E. Stipidis: Particle swarm optimization for adaptive resource allocation in communication networks, EURASIP J. Wirel. Commun. Netw. 2010, 9-21 (2010)
70.24 S. Nakrani, C. Tovey: From honeybees to internet servers: Biomimicry for distributed management of internet hosting centers, Bioinspir. Biomim. 2(4), S182-S197 (2007)
70.25 V. Tereshko: Reaction-diffusion model of a honeybee colony's foraging behaviour, Proc. 6th Int. Conf. Parallel Probl. Solving Nat. PPSN VI (Springer, London 2000) pp. 807-816
70.26 V. Tereshko, A. Loengarov: Collective decision making in honey-bee foraging dynamics, Comput. Inf. Syst. 9(3), 1-7 (2005)
70.27 B. Basturk, D. Karaboga: An artificial bee colony (ABC) algorithm for numeric function optimization, IEEE Swarm Intell. Symp. 2006 (2006) pp. 12-14
70.28 D. Karaboga, B. Basturk: A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm, J. Global Optim. 39(3), 459-471 (2007)
70.29 C. Ozturk, D. Karaboga, B. Gorkemli: Probabilistic dynamic deployment of wireless sensor networks by artificial bee colony algorithm, Sensors 11(6), 6056-6065 (2011)
70.30 W.Y. Szeto, Y. Wu, S.C. Ho: An artificial bee colony algorithm for the capacitated vehicle routing problem, Eur. J. Oper. Res. 215(1), 126-135 (2011)
70.31 C. Zhang, D. Ouyang, J. Ning: An artificial bee colony approach for clustering, Expert Syst. Appl. 37(7), 4761-4767 (2010)
70.32 D. Karaboga, B. Akay: A modified artificial bee colony (ABC) algorithm for constrained optimization problems, Appl. Soft Comput. 11(3), 3021-3031 (2011)
70.33 D. Teodorović, M. Dell'Orco: Bee colony optimization - A cooperative learning approach to complex transportation problems. In: Advanced OR and AI Methods in Transportation, ed. by A. Jaszkiewicz, M. Kaczmarek, J. Zak, M. Kubiak (Publishing House of Poznan University of Technology, Poznan 2005) pp. 51-60
70.34 M. Basu: Bee colony optimization for combined heat and power economic dispatch, Expert Syst. Appl. 38(11), 13527-13531 (2011)
70.35 C.S. Chong, A.I. Sivakumar, M.Y. Low, K.L. Gay: A bee colony optimization algorithm to job shop scheduling, Proc. 38th Conf. Winter Simul. (2006) pp. 1954-1961
70.36 A. Kaur, S. Goyal: A survey on the applications of bee colony optimization techniques, Int. J. Comput. Sci. Eng. 3(8), 3037-3046 (2011)
70.37 T.D. Seeley: The Wisdom of the Hive (Harvard Univ. Press, Cambridge 1995)
70.38 K.M. Passino, T.D. Seeley: Modeling and analysis of nest-site selection by honeybee swarms: The speed and accuracy trade-off, Behav. Ecol. Sociobiol. 59(3), 427-442 (2006)
70.39 G. Obando, A. Pantoja, N. Quijano: Evolutionary game theory applied to building temperature control, Proc. Nolcos (IFAC, Bologna 2010) pp. 1140-1145
70.40 A. Pantoja, N. Quijano, S. Leirens: A bioinspired approach for a multizone temperature control system, Bioinspir. Biomim. 6(1), 16007-16020 (2011)
70.41 N. Quijano, K.M. Passino: The ideal free distribution: Theory and engineering application, IEEE Trans. Syst. Man Cybern. B 37(1), 154-165 (2007)
70.42 N. Quijano, A.E. Gil, K.M. Passino: Experiments for dynamic resource allocation, scheduling, and control, IEEE Control Syst. Mag. 25(1), 63-79 (2005)
70.43 N. Quijano, K.M. Passino: Honey bee social foraging algorithms for resource allocation: Theory and application, Eng. Appl. Artif. Intell. 23(6), 845-861 (2010)
70.44 S.D. Fretwell, H.L. Lucas: On territorial behavior and other factors influencing habitat distribution in birds. I. Theoretical development, Acta Biotheor. 19, 16-36 (1970)
70.45 T.D. Seeley, C.A. Tovey: Why search time to find a food-storer bee accurately indicates the relative rates of nectar collecting and nectar processing in honey bee colonies, Animal Behav. 47(2), 311-316 (1994)
70.46 T.D. Seeley: Division of labor between scouts and recruits in honeybee foraging, Behav. Ecol. Sociobiol. 12, 253-259 (1983)
70.47 J. Alcaraz, C. Maroto: A robust genetic algorithm for resource allocation in project scheduling, Ann. Oper. Res. 102, 83-109 (2001)
70.48 S.N. Sivanandam, S.N. Deepa: Introduction to Genetic Algorithms (Springer, Berlin 2007)
70.49 M. Mitchell: An Introduction to Genetic Algorithms (MIT Press, Cambridge 1996)


71. Fundamental Collective Behaviors in Swarm Robotics


Vito Trianni, Alexandre Campo


In this chapter, we present and discuss a number of
types of fundamental collective behaviors studied within
the swarm robotics domain. Swarm robotics is a
particular approach to the design and study of
multi-robot systems, which emphasizes decentralized and
self-organizing behavior that deals with limited
individual abilities, local sensing, and local
communication. The desired features for a swarm robotics
system are flexibility to variable environmental
conditions, robustness to failure, and scalability to
large groups. These can be achieved thanks to
well-designed collective behavior (often obtained via
some sort of bio-inspired approach) that relies on
cooperation among redundant components. In this chapter,
we discuss the solutions proposed for a limited number
of problems common to many swarm robotics systems,
namely aggregation, synchronization, coordinated motion,
collective exploration, and decision making. We believe
that many real-world applications subsume one or more of
these problems, and tailored solutions can be developed
starting from the studies we review in this chapter.
Finally, we propose possible directions for future
research and discuss the relevant challenges to be
addressed in order to push forward the study and the
applications of swarm robotics systems.


71.1 Designing Swarm Behaviours
71.2 Getting Together: Aggregation
  71.2.1 Variants of Aggregation Behavior
  71.2.2 Self-Organized Aggregation in Biological Systems
  71.2.3 Self-Organized Aggregation in Swarm Robotics
  71.2.4 Other Studies
71.3 Acting Together: Synchronization
  71.3.1 Variants of Synchronization Behavior
  71.3.2 Self-Organized Synchronization in Biology
  71.3.3 Self-Organized Synchronization in Swarm Robotics
  71.3.4 Other Studies
71.4 Staying Together: Coordinated Motion
  71.4.1 Variants of the Coordinated Motion Behavior
  71.4.2 Coordinated Motion in Biology
  71.4.3 Coordinated Motion in Swarm Robotics
  71.4.4 Other Studies
71.5 Searching Together: Collective Exploration
  71.5.1 Variants of Collective Exploration Behavior
  71.5.2 Collective Exploration in Biology
  71.5.3 Collective Exploration in Swarm Robotics
71.6 Deciding Together: Collective Decision Making
  71.6.1 Variants of Collective Decision Making Behavior
  71.6.2 Collective Decision Making in Biology
  71.6.3 Collective Decision Making in Swarm Robotics
  71.6.4 Other Studies
71.7 Conclusions
References




71.1 Designing Swarm Behaviours 


Imagine the following scenario: in a large area there 
are multiple items that must be reached, and possibly 
moved elsewhere or processed in some particular way. 
There is no map of the area to be searched, and the area 
is rather unknown, unstructured, and possibly danger- 
ous for the intervention of humans or any valuable asset. 
The items must be reached and processed as quickly as 
possible, as a timely intervention would correspond to 
a higher overall performance. This is the typical sce- 
nario to be tackled with swarm robotics. It contains all 
the properties and complexity issues that make a swarm 
robotics solution particularly appropriate. Parallelism, 
scalability, robustness, flexibility, and adaptability to 
unknown conditions are features that are required from 
a system confronted with such a scenario, and exactly 
those features are sought in swarm robotics research. 

Put in other terms, swarm robotics promises the so- 
lution of complex problems through robotic systems 
made up of multiple cooperating robots. With respect to 
other approaches in which multiple robots are exploited 
at the same time, swarm robotics emphasizes aspects 
like decentralization of control, limited individual abil- 
ities, lack of global knowledge, and scalability to large 
groups. 

One important aspect that characterizes a swarm
robotics system concerns the robotic units, which are
unable to solve the given problem individually. The
limitation is given either by physical constraints that
would prevent a single robot from tackling the problem
on its own (e.g., the robot has to move some items
that are too heavy), or by time constraints that would
make a solitary action very inefficient (e.g., there are
too many items to be collected in a limited time).
Another source of limitation for the individual robot
comes from its inability to acquire a global picture of
the problem, having access only to partial (local)
information about the environment and about the
collective activity. These limitations imply the need
for cooperation to ensure task achievement and better
efficiency. Groups of autonomous cooperating robots can
be exploited to synergistically achieve a complex task,
by joining forces and sharing information, and to
undertake the given task in a distributed fashion,
achieving higher efficiency through parallelism.

The second important aspect in swarm robotics is
redundancy in the system, which is intimately connected
with robustness and scalability. Swarm robotics systems
are made up of homogeneous robots (or of relatively few
heterogeneous groups of homogeneous robots). This means
that the failure of a single robot or of a few robots
is not a relevant fact for the system as a whole,
because the failing robot can easily be replaced by
another teammate. Unlike a centralized system, a swarm
robotics system has no single point of failure, and
every component is interchangeable with other
components. Redundancy, distributed control, and local
interactions also allow for scalability, enabling the
robotic system to seamlessly adapt to varying group
sizes. This is a significant advantage with respect to
centralized systems, which would present an exponential
increase in complexity for larger group sizes.

Because all the above features are desiderata, the 
problem remains as to how to design and implement 
such a robotic system. The common starting point in 
swarm robotics is the biological metaphor, for which 
the fundamental mechanisms that govern the organiza- 
tion of animal societies can be distilled in simple rules 
to be implemented in the robotic swarm. This approach 
allowed us to extract the basic working principles for 
many types of collective behavior, and several examples 
will be presented in this chapter. However, it is worth 
noting that swarm robotics systems are not constrained 
to mimicking nature. Indeed, in many cases there is 
no biological example to be taken as reference, or the 
mechanisms observed in the natural system are too dif- 
ficult to be implemented in the robotic swarm (e.g., 
odor perception is an open problem in robotics, prevent- 
ing easy exploitation of pheromone-based mechanisms 
by using real chemicals). Still, even in those systems 
that have no natural counterpart, the relevant property 
that should be present is self-organization, for which 
group behavior is the emergent result of the numer- 
ous interactions among different individuals. Thanks to 
self-organization, simple control rules repeatedly exe- 
cuted by the individual robots may result in complex 
group behavior. 

If we consider the scenario presented at the begin- 
ning of this chapter, it is possible to recognize a number 
of problems common to many swarm robotics systems, 
which need to be addressed in order to develop suitable 
controllers. One first problem in swarm robotics is hav- 
ing robots get together in some place, especially when 
the robotic system is composed by potentially many 
individuals. Getting together (i. e., aggregation) is the 
precondition for many types of collective behavior, and 
needs to be addressed according to the particular char- 
acteristics of the robotic system and of the environment 


Fundamental Collective Behaviors in Swarm Robotics | 71.2 Getting Together: Aggregation 1379 


in which it must take place. The aggregation problem 
is discussed in Sect. 71.2. Once groups are formed, 
robots need some mechanism to stay together and to 
keep a coherent organization while performing their 
task. A typical problem is, therefore, how to maintain 
such coherence, which corresponds to ensuring the syn- 
chronization of the group activities (Sect. 71.3), and to 
keep the group in coordinated motion when the swarm 
must move across the environment (Sect. 71.4). An- 
other common problem in swarm robotics corresponds 
to searching together and processing some items in the 
environment. To this aim, different strategies can be 
adopted to cover the available space, and to identify 
relevant navigation routes without resorting to maps
and global knowledge (Sect. 71.5). Finally, to maintain
coherence and efficiency, the swarm robotics system
is often confronted with the necessity to behave as
a single whole. Therefore, it must be endowed with
collective perception and collective decision mecha-
nisms. Some examples are discussed in Sect. 71.6.
For each of these problems, we describe some semi-
nal work that produced solutions in a swarm robotics
context. In each section, we describe the problem along
with some possible variants, the biological inspiration
and the theoretical background, the relevant studies in
swarm robotics, and a number of other works that are
relevant for some particular contribution given to the
specific problem.


71.2 Getting Together: Aggregation


Aggregation is a task of fundamental importance in
many biological systems. It is the basic behavior for
the creation of functional groups of individuals, and
therefore, supports the emergence of various forms of
cooperation. Indeed, it can be considered a prerequi-
site for the accomplishment of many collective tasks.
In swarm robotics too, aggregation has been widely
studied, either as a stand-alone problem or within
a broader context. In general terms, aggrega-
tion is a collective behavior that leads a group of agents
to gather in some place. Therefore, from a (more or less)
uniform distribution of agents in the available space, the
system converges to a varied distribution, with the for-
mation of well recognizable aggregates. In other words,
during aggregation there is a transition from a homoge-
neous to a heterogeneous distribution of agents.


71.2.1 Variants of Aggregation Behavior


Aggregation can be achieved in many different ways.
The main issue to be considered is whether or not the
environment contains pre-existing heterogeneities that
can be exploited for aggregation: light or humidity gra-
dients (think of flies or sow bugs), corners, shelters, and
so forth represent heterogeneities that can be easily ex-
ploited. Their presence can, therefore, be at the basis
of a collective aggregation behavior, which, however,
may not exploit interactions between different agents.
Instead, whenever heterogeneities are not present (or
cannot be exploited for the aggregation behavior), the
problem is more complex. The agents must behave in
order to create the heterogeneities that support the for-
mation of aggregates. In this case, the basic mechanism
of aggregation relies on a self-organizing process based 
on a positive feedback mechanism. Agents are sources 
of some small heterogeneity in the environment (e.g., 
being the source of some signal that can be chemical,
tactile, or visual). The more agents have aggregated, the
higher the probability that others are attracted by the signal.
This mechanism leads to amplification of small hetero- 
geneities, leading to the formation of large aggregates. 


71.2.2 Self-Organized Aggregation 
in Biological Systems 


Several biological systems present self-organized ag- 
gregation behavior. One of the best studied examples 
is given by the cellular slime mold Dictyostelium 
discoideum, in which aggregation is enabled by self- 
generated biochemical signals that support the migra- 
tion of cells and the formation of a multi-cellular 
body [71.1, 2]. A similar aggregation process can be ob- 
served in many other unicellular organisms [71.3]. So- 
cial and pre-social insects also present multiple forms of 
aggregation [71.4, 5]. In all these systems, it is possible 
to recognize two main variants of the aggregation pro- 
cess. On the one hand, the agents can emit a signal that 
creates an intensity gradient in the surrounding space. 
This gradient enables the aggregation process: agents 
react by moving in the direction of higher intensity, 
therefore aggregating with their neighbors (Fig. 71.1). 
On the other hand, aggregation may result from agents 
modulating their stopping time in response to social 
cues. Agents have a certain probability to stop and re- 
main still for some time. The vicinity to other agents 
increases the probability of stopping and of remaining 




Fig. 71.1a-d Aggregation process based on a diffusing signal that creates an intensity gradient. (a) Agents individually
emit a signal and move in the direction of higher concentration. (b) The individual signals sum up to form a stronger
intensity gradient in correspondence with forming aggregates. (c) A positive feedback loop amplifies the aggregation
process until all agents are in the same cluster (d)


within the aggregate, eventually producing an aggrega- 
tion process mediated by social influences (Fig. 71.2). 
In both cases, the same general principle is at work. 
Aggregation is dependent on two main probabilities: 
the probability to enter an aggregate, which increases 
with the aggregate size, and the probability to leave 
an aggregate, which decreases accordingly. This creates 
a positive feedback loop that makes larger aggregates 
more and more attractive with respect to small ones. 
Some randomness in the system helps in breaking the 
symmetry and reaching a stable configuration. 
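
The interplay of the two probabilities can be illustrated with a minimal simulation sketch. All names and numbers below (`p_join`, `p_leave`, the number of candidate sites, and every constant) are assumptions chosen for illustration and are not taken from the cited studies. The only properties that matter are that the joining probability grows with the aggregate size and that the leaving probability shrinks with it.

```python
import random

def p_join(size, base=0.1, gain=0.3):
    """Assumed probability that a wandering agent joins an aggregate; grows with size."""
    return min(1.0, base + gain * size)

def p_leave(size, base=0.5):
    """Assumed probability that an agent leaves an aggregate; shrinks with size (size >= 1)."""
    return base / (size ** 2)

def simulate(n_agents=30, n_sites=4, steps=3000, seed=0):
    """Return the final aggregate sizes after `steps` rounds of join/leave decisions."""
    rng = random.Random(seed)
    site = [None] * n_agents      # None means the agent is wandering
    sizes = [0] * n_sites
    for _ in range(steps):
        for a in range(n_agents):
            if site[a] is None:
                s = rng.randrange(n_sites)            # agent bumps into a random site
                if rng.random() < p_join(sizes[s]):
                    site[a] = s
                    sizes[s] += 1
            else:
                s = site[a]
                if rng.random() < p_leave(sizes[s]):  # larger aggregates hold agents longer
                    site[a] = None
                    sizes[s] -= 1
    return sizes
```

Calling `simulate()` returns one size per site. Because the leaving probability falls quickly with size, small aggregates lose agents much faster than large ones, which is the symmetry-breaking ingredient discussed above; the exact outcome depends on the assumed constants and on the random seed.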


71.2.3 Self-Organized Aggregation 
in Swarm Robotics 


On the basis of the studies of aggregation in biologi- 
cal systems, various robotic implementations have been 
presented, based on either of the two behavioral mod- 


els described above. Of particular interest is the work 
presented in [71.6], in which the robotic system was de- 
veloped to accurately replicate the dynamics observed 
in the cockroach aggregation experiments presented 
in [71.5]. In this work, a group of Alice robots [71.7] 
was used and their controller was implemented by 
closely following the behavioral model derived from 
experiments with cockroaches. The behavioral model 
consists of four main conditions: 


i) Moving in the arena center 

ii) Moving in the arena periphery 
iii) Stopping in the center 

iv) Stopping in the periphery. 


When stopping, the mean waiting time is influenced 
by the number of perceived neighbors (for more de- 
tails, see [71.6]). The group behavior resulting from the 




Fig. 71.2a-d Aggregation process based on variable probability of stopping within an aggregate. (a) Agents move ran- 
domly and may stop for some time (gray agent). (b) When encountering a stopped agent, other agents stop as well, 
therefore increasing the size of the aggregate. (c) The probability of meeting an aggregate increases with the aggregate 
size for geometric reasons. Social interactions modulate the probability of leaving the aggregate, which diminishes with 
the increasing number of individuals. (d) Eventually, all agents are in the same aggregate 




interaction among Alice robots was analyzed with the 
same tools used for cockroaches [71.5, 6]. The compar- 
ison of the robotic system with the biological model 
shows a very good correspondence, demonstrating that 
the mechanisms identified by the behavioral model are 
sufficient to support aggregation in a group of robots, 
with dynamics that are comparable to that observed in 
the biological system. Additionally, the robotic model 
constitutes a constructive proof that the identified mech- 
anisms really work as suggested. 

This study exemplifies the approach of distilling, in the
form of simple rules, the relevant mechanisms that
produce a given self-organizing behavior. A different
approach consists in exploiting artificial evolution to 
synthesize the controllers for the robotic swarm. This 
allows the user to simply define some performance met- 
ric for the group and let the evolutionary algorithm 
find the controllers capable of producing the desired 
behavior. This generic approach has been exploited to 
evolve various self-organizing behaviors, including
aggregation [71.8]. In this case, robots were rewarded
for minimizing their distance from the geometric center
of the group and for keeping in motion. The analysis of
the evolved behavior revealed that in all cases robots 
are attracted by teammates and repelled by obstacles. 
When a small aggregate forms, robots keep on moving 
thanks to the delicate balance between attractive and re- 
pulsive forces. This makes the aggregate continuously 
expand and shrink, moving slightly across the arena. 
This slow motion of the aggregate makes it possible to 
attract other robots or other aggregates formed in the 
vicinity, and results in a very good scalability of the ag- 


gregation behavior with respect to the group size. This 
experiment revealed a possible alternative mechanism 
for aggregation, which is not dependent on the proba- 
bility of joining or leaving an aggregate. In fact, robots 
here never quit an aggregate to which they are attracted. 
Rather, the aggregates themselves are dynamic struc- 
tures capable of moving within the environment, and in 
doing so they can be attracted by neighboring aggre- 
gates, until all robots belong to the same group. 


71.2.4 Other Studies 


The seminal papers described above are representative 
of other studies, which either exploit a probabilistic ap- 
proach [71.9, 10], or rely on artificial evolution [71.11]. 
Approaches grounded on mathematical models and 
control theory are also worth mentioning [71.12, 13]. 
Other variants of the aggregation behavior can be con- 
sidered. The aggregate may be characterized by an 
internal structure, that is, agents in the aggregate are 
distributed on a regular lattice or form a specific shape. 
In such cases, we talk about pattern/shape forma- 
tion [71.14]. Another possibility is given by the admis- 
sibility of multiple aggregates. In the studies mentioned 
so far, multiple aggregates may form at the beginning 
of the aggregation process, but as time goes by smaller 
aggregates are disbanded in favor of larger ones, eventu- 
ally leading to a single aggregate for the whole swarm. 
However, it could be desirable to obtain multiple ag- 
gregates forming functional groups of a specific size. In 
this case, it is necessary to devise mechanisms for con- 
trolling the group size [71.15]. 


71.3 Acting Together: Synchronization 


Synchronization is a common phenomenon observed 
both in the animate and inanimate world. In a syn- 
chronous system, the various components present 
a strong time coherence between the individual types of 
behavior. In robotics, synchronization can be exploited 
for the coordination of actions, both within a single or 
a multi-robot domain. In the latter case, synchronization 
may be particularly useful to enhance the system effi- 
ciency and/or to reduce the interferences among robots. 


71.3.1 Variants of Synchronization Behavior 


Synchronization in a multi-agent system takes two
main forms: loose and tight. In the case of loose


synchronization, we observe a generic coordination in 
time of the activities brought forth by different agents. 
In this case, single individuals do not present a periodic 
behavior, but as a group it is possible to observe bursts 
of synchronized activities. Often in this case there are 
external cues that influence synchrony, such as the day- 
light rhythm. On the other hand, it is possible to observe 
tight synchronization when the individual actions are 
perfectly coherent. To ensure tight synchronization in 
a group, it is possible to rely on either a centralized or 
a distributed approach. In the former, one agent acts as 
a reference (e.g., a conductor for the orchestra or the 
music theme for a ballet) and drives the behavior of the 
other system components. In the latter, a self-organizing 




process is in place, and the system shows the ability 
to synchronize without an externally-imposed rhythm. 
It is worth noting that tight synchronization does not 
necessitate individual periodic behavior, neither in the 
centralized nor in the self-organized case. For instance, 
synchronization has also been studied between cou- 
pled chaotic systems [71.16]. In the following, we focus 
on self-organized synchronization of periodic behavior, 
which is the most studied phenomenon as it is com- 
monly observed in many different systems. 


71.3.2 Self-Organized Synchronization 
in Biology 


Although synchronization has always been a well- 
known phenomenon [71.17], its study did not arouse 
much interest until the late 1960s, when Winfree be- 
gan investigating the mechanisms underlying biological 
rhythms [71.18]. He observed that many systems in 
biology present periodic oscillations, which can get 
entrained when there is some coupling between the 
oscillators. A mathematical description of this phe- 
nomenon was first introduced by Kuramoto [71.19], 
who developed a very influential model that was after- 
wards refined and applied to various domains [71.17]. 
Similar mechanisms are at the base of the syn- 
chronous signaling behavior observed in various animal 
species [71.3]. Chorusing is a term commonly used to 
refer to the coordinated emission of acoustic commu- 
nication signals by large groups of animals. To cite 
a few examples, chorusing has been observed in frogs, 
crickets, and spiders. However, probably the most fasci- 
nating synchronous display is the synchronous flashing 
of fireflies from South-East Asia. This phenomenon was 




thoroughly studied until a self-organizing explanation 
was proposed to account for the emergence of syn- 
chrony [71.20]. 

A rather simple model describes the behavior of 
fireflies as the interactions between pulse-coupled oscil- 
lators [71.21]. In Fig. 71.3, the activity of two oscilla- 
tors is represented as a function of time. Each oscillator 
is of the integrate-and-fire type, which well represents 
a biological oscillator such as the one of fireflies. The 
oscillator is characterized by a voltage-like variable that 
is integrated over time until a threshold is reached. At 
this point, a pulse is fired and the variable is reset to the 
base level (Fig. 71.3). Interactions between oscillators 
take the form of constant phase shifts induced by in- 
coming pulses, which bring other oscillators close to the 
firing state, or make them directly fire. These simple in- 
teractions are sufficient for synchronization; in a group 
of similarly pulse-coupled oscillators, constant adjust- 
ments of the phase made by all the individuals lead to 
a global synchronization of pulses (for a detailed de- 
scription of this model, see [71.21]). 
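
The mechanism can be sketched in a few lines of event-driven code. This is a minimal illustration under stated assumptions, not the model of [71.21] itself: the concave state curve `state`, its curvature `B`, and the pulse strength `eps` are hypothetical choices. Each call to `fire_once` advances time until the leading oscillator reaches the threshold, resets it, and raises the state of the others by a constant amount; any oscillator pushed over the threshold fires in the same instant.

```python
import math

B = 3.0  # assumed curvature of the voltage-like state curve

def state(phi):
    """Map phase in [0, 1] to a concave state in [0, 1] (state(0) = 0, state(1) = 1)."""
    return math.log(1.0 + (math.exp(B) - 1.0) * phi) / B

def phase(x):
    """Inverse of `state`: map a state value back to a phase."""
    return (math.exp(B * x) - 1.0) / (math.exp(B) - 1.0)

def fire_once(phases, eps=0.1):
    """Advance to the next firing event and return the phases right after it."""
    n = len(phases)
    lead = max(phases)
    phases = [p + (1.0 - lead) for p in phases]       # time passes until the leader fires
    fired = [p >= 1.0 - 1e-12 for p in phases]
    new_pulses = sum(fired)
    while new_pulses:                                 # pulses may trigger further firings
        pulses, new_pulses = new_pulses, 0
        for i in range(n):
            if not fired[i]:
                x = state(phases[i]) + eps * pulses   # constant advance per received pulse
                if x >= 1.0:
                    fired[i] = True
                    new_pulses += 1
                else:
                    phases[i] = phase(x)
    return [0.0 if fired[i] else phases[i] for i in range(n)]
```

Starting from `[0.0, 0.3]` and applying `fire_once` repeatedly, the two oscillators end up firing in the same instant and then stay locked, because oscillators that fire together are reset together.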


71.3.3 Self-Organized Synchronization 
in Swarm Robotics 


The main purpose of synchronization in swarm robotics 
is the coordination of the activities in a group. This can 
be achieved in different ways, and mechanisms inspired 
by the behavior of pulse-coupled oscillators have been 
developed. In [71.15], synchronization is exploited to 
regulate the size of traveling robotic aggregates. Robots 
can emit a short sound signal (a chirp), and enter a re- 
fractory state for a short time after signaling. Then, 
robots enter an active state in which they may signal 


Fig. 71.3 Synchronization between pulse-coupled oscillators (panels: Oscillator 1 and Oscillator 2; horizontal axis: Time (s)).
The oscillator emits pulses each time its state variable reaches the threshold level (corresponding to 1 in the plot).
When one oscillator emits a pulse, its state is reset while the state of the other oscillator is advanced by a constant
amount, which corresponds to a phase shift, or to the oscillator firing if it overcomes the threshold



at any time, on the basis of a constant probability per 
time-step. Therefore, the chirping period is not constant 
and depends on the chirping probability. In this state, 
robots also listen to external signals and react by im- 
mediately emitting a chirp. This mechanism, similar to 
chorusing in frogs and crickets, leads to synchronized 
emission of signals. Thanks to this simple synchronization
mechanism, the size of an aggregate can be approximately
estimated. Given the probabilistic nature of chirping,
a robot has a probability of independently initiating
signaling that depends on the number of individuals in the
group; estimating this probability by listening to its own
and others’ chirps allows an approximate group size
estimation. Synchronization, therefore, ensures a mech- 
anism to keep coherence in the group, which is the 
precondition for group size estimation. 
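
A rough way to see why the initiation frequency carries information about the group size: if all n robots chirp spontaneously with the same probability, each robot initiates about 1/n of the synchronized chirp rounds. The sketch below simulates this and inverts the ratio. It is a deliberately simplified stand-in, not the estimator used in [71.15]; the refractory state is omitted, and `p_chirp`, `rounds`, and the random tie-breaking rule are assumptions.

```python
import random

def estimate_group_size(n_robots, p_chirp=0.05, rounds=5000, seed=1):
    """Estimate the group size as seen by robot 0 from how often it initiates a round."""
    rng = random.Random(seed)
    own_initiations = 0
    for _ in range(rounds):
        initiators = []
        while not initiators:
            # One time step: every robot chirps spontaneously with probability p_chirp.
            initiators = [r for r in range(n_robots) if rng.random() < p_chirp]
        # All other robots echo immediately; ties are credited to one random initiator.
        if rng.choice(initiators) == 0:
            own_initiations += 1
    if own_initiations == 0:
        return float("inf")
    return rounds / own_initiations
```

With five robots, `estimate_group_size(5)` should return a value close to 5; the estimate is noisy for short observation windows, which is why only an approximate group size can be expected.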

In [71.22], synchronization is instead necessary to 
reduce the interferences between robots, which pe- 
riodically perform foraging and homing movements 
in a cluttered environment. Without coordination, the 
physical interferences between robots going toward and 
away from the home location lead to a reduced overall 
performance. Therefore, a synchronization mechanism 
based on the firefly behavior was devised. Robots emit 
a signal in correspondence to the switch from foraging 
to homing. This signal can be perceived by neighboring 
robots within a limited radius and induces a reset of the 
internal rhythm that corresponds to a behavioral shift 
to homing. Despite the limited range of communica- 
tion among robots, a global synchronization is quickly 
achieved, which leads the group to reduce interferences 
and increase the system performance [71.22]. 

A different approach to the study of synchronization 
is described in [71.23]. Here, artificial evolution is ex- 
ploited to synthesize the behavior of a group of robots, 
with the objective of obtaining minimal communication 
strategies for synchronization. Robots were rewarded for
displaying an individual periodic movement and for signaling
in order to synchronize the individual oscillations. The 
results obtained through artificial evolution are then an- 
alyzed to understand the mechanisms that can support 


synchronization, showing that two types of strategies 
are evolved: one is based on a modulation of the oscil- 
lation frequency, the other relies on a phase reset. These 
two strategies are also observed in biological oscilla- 
tors: for instance, different species of fireflies present 
different synchronization mechanisms, based on de- 
layed or advanced phase responses [71.20]. 


71.3.4 Other Studies 


While self-organized synchronization is a well-known 
phenomenon, its application in collective and swarm 
robotics has not been largely exploited. The coupled- 
oscillator synchronization mechanism was applied to 
a cleaning task to be performed by a swarm of micro 
robots [71.24]. Another interesting implementation of 
the basic model can be found in [71.25]. Here, syn- 
chronization is exploited to detect and correct faults in 
a swarm robotics system. It is assumed that robots can 
synchronize a periodic flashing behavior while moving 
in the arena and accomplishing their task. If a robot 
incurs some fault, it will necessarily stop synchronizing.
This fault can be detected and recovered by neighboring 
robots. Similar to the heartbeat in distributed com- 
puting, correct synchronization corresponds to a well- 
functioning system, while the lack of synchronization 
corresponds to a faulty condition. 

Finally, synchronization behavior may emerge
spontaneously in an evolutionary robotics setup, even
if it is not explicitly rewarded. In [71.26], synchronization
of group activities evolved spontaneously as
a result of the need to limit the interferences among
robots in a foraging task. In [71.27], robots were
rewarded for maximizing the mean mutual information
between their motor actions. Mutual information is 
a statistical measure derived in information theory, and 
roughly corresponds to the correlation between the out- 
put of two stochastic processes. Evolution, therefore, 
produced synchronous movements among the robots, 
which could actually maximize the mutual information 
while maintaining a varied behavior. 
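
To make the role of mutual information concrete, the following sketch computes its empirical value for two discrete sequences, which could stand for discretized motor actions of two robots. The function name and the discretization are assumptions for illustration; the measure actually used in [71.27] may differ in its details.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of co-occurring symbol pairs
    px = Counter(xs)               # marginal counts of the first sequence
    py = Counter(ys)               # marginal counts of the second sequence
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Two identical alternating sequences such as `[0, 1, 0, 1]` share one bit of information, whereas two identical constant sequences share none. This is why maximizing mutual information favors behavior that is both synchronized and varied.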


71.4 Staying Together: Coordinated Motion 


Another fundamental problem for a swarm is ensuring 
coherence in space. This means that the individuals in 
the swarm must display coordinated movement in order 
to maintain a consistent spatial structure. Coordinated 
motion is often observed in groups of animals. Flocks of 


birds or schools of fish are fascinating examples of self- 
organized behavior producing a collective motion of the 
group. Similar problems need to be tackled in robotics, 
for instance for moving in formation or for deciding
a common direction of motion in a distributed way.




71.4.1 Variants of the Coordinated Motion 
Behavior 


The coordinated motion of a group of agents can be 
achieved in different ways. Also in this case, we can 
distinguish mainly between a centralized and a dis- 
tributed approach. In a centralized approach, one agent 
can be considered the leader and the other agents fol- 
low (e.g., the mother duck with her ducklings). In the 
distributed approach, instead, there is no single leader 
and some coordination mechanism must be found to 
let the group move in a common direction. Of par- 
ticular interest for swarm robotics are the coordinated 
motion models based on self-organization. Such models 
consider multi-agent systems that are normally homo- 
geneous and characterized by a uniform distribution of 
information: no agent is more informed than the others, 
and there exists no a priori preference for any direction 
of motion (i.e., agents start being uniformly distributed
in space). However, through self-organization and am- 
plification of shared information, the system can break 
the symmetry and converge to a common direction 
of motion. A possible variant of the self-organized 
coordinated motion consists in having a non-uniform 
distribution of information, which corresponds to hav- 




Fig. 71.4a-c Self-organized coordinated motion in a group of 
agents. In the bottom part, a group of agents is moving in roughly 
the same direction. According to the model presented in [71.28], 
agents react to the closest neighbor within their perception range 
and follow three main rules: (a) agents move toward a neighbor 
when it is too far; (b) agents move away from a neighbor when it 
is too close; (c) agents rotate and align with a neighbor situated at 
intermediate distances. The iterated application of these rules leads 
the group to move in the same direction


ing some agents that are more informed than the others 
on a preferred direction of motion. In this case, a few 
informed agents may influence the motion of the entire 


group.


71.4.2 Coordinated Motion in Biology


Many animal species present coordinated motion be- 
havior, ranging from bacteria to fish and birds. Not all 
animal species employ the same mechanisms, but in 
general it is possible to recognize various types of in- 
teractions among individuals that have a bearing on the 
choice of the motion direction. Coordinated motion has 
mainly been studied in various species of fish, in birds, 
and in insect swarms [71.29, 30]. The most influential 
model was introduced by Huth and Wissel to describe 
the behavior of various species of fish observed [71.28]. 
In this model, it is assumed that each fish is influenced 
solely by its nearest neighbor. Also, the movement of 
each fish is based on the same behavioral model, which 
also includes some inherent random fluctuation. Ac- 
cording to the proposed behavioral model, each fish 
follows essentially three rules: 


i) Approach a far away individual 
ii) Get away from individuals that are too close 
iii) Align with the neighbor direction (Fig. 71.4). 


When the nearest neighbor is within the closest re- 
gion, the fish reacts by moving away. When the nearest 
neighbor is in the farthest region, the fish reacts by 
approaching. Otherwise, if the neighbor is within the 
intermediate region, the fish reacts by aligning. These 
simple rules are sufficient to produce collective group 
motion, and the final direction emerges from the inter- 
actions among the individuals. 
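
The three rules can be written down as a small decision function. This is only a sketch of the zone logic described above: the zone radii `r_near` and `r_far` are hypothetical values, the random fluctuation of the original model is left out, and headings are expressed in radians.

```python
import math

def new_heading(pos, neighbors, r_near=1.0, r_far=3.0):
    """Heading chosen in reaction to the nearest neighbor, or None if nobody is perceived.

    `neighbors` is a list of ((x, y), heading) pairs within the perception range.
    """
    if not neighbors:
        return None
    (nx, ny), n_heading = min(neighbors, key=lambda nb: math.dist(pos, nb[0]))
    d = math.dist(pos, (nx, ny))
    if d < r_near:
        # Neighbor too close: head directly away from it.
        return math.atan2(pos[1] - ny, pos[0] - nx)
    if d > r_far:
        # Neighbor too far: head directly toward it.
        return math.atan2(ny - pos[1], nx - pos[0])
    # Intermediate distance: align with the neighbor.
    return n_heading
```

Applying this function to every agent at every time step, and then moving each agent a short distance along its new heading, gives the iterated dynamics from which a common direction can emerge.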

Starting from the above model, a number of variants 
have been proposed, which take into account differ- 
ent parameters and different numbers of individuals. 
In [71.31], a model including all individuals in the 
perceptual range was introduced, and a broad analysis 
of the parameters was performed, showing how minor 
differences at the individual level correspond to large 
differences at the group level. In [71.32], an experimen- 
tal study on bird flocks in the field was performed, and 
position and velocity data were obtained for each bird in 
a real flock through stereo-photography and 3-D mapping.
The data obtained were used to verify the
assumption about the number of individuals that each 
bird monitors during flocking, showing that this num- 
ber is constant (and corresponds to about 7 individuals) 




notwithstanding the varying density of the flock. Fi- 
nally, in [71.33], a model was developed in which some 
of the group members have individual knowledge on 
a preferential direction. The model describes the out- 
come of a consensus decision in the flock as a result 
of the interaction between informed and uninformed 
individuals. 


71.4.3 Coordinated Motion 
in Swarm Robotics 


The models introduced for characterizing the self- 
organized behavior of fish schools or bird flocks have 
also inspired a number of interesting studies. The most 
influential work is definitely that of Reynolds, who de- 
veloped virtual creatures called boids [71.34]. In this 
work, each creature executes three simple types of be- 
havior: 


i) Collision avoidance, to avoid collisions with nearby
flockmates

ii) Velocity matching, to move in the same way as
nearby flockmates

iii) Flock centering, to stay close to nearby flockmates. 
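
As an illustration, the three types of behavior can be combined into a single steering vector. The weighted sum used below is a common simplification and is an assumption here, as are the weights, the perception `radius`, and the `min_dist` threshold; it is not claimed to be the arbitration scheme of [71.34].

```python
import math

def boid_steering(i, positions, velocities, radius=5.0, min_dist=1.0,
                  w_avoid=1.5, w_match=1.0, w_center=1.0):
    """Return a 2-D steering vector for boid i from its neighbors within `radius`."""
    px, py = positions[i]
    vx, vy = velocities[i]
    avoid_x = avoid_y = 0.0
    sum_px = sum_py = sum_vx = sum_vy = 0.0
    count = 0
    for j, ((qx, qy), (ux, uy)) in enumerate(zip(positions, velocities)):
        if j == i:
            continue
        d = math.hypot(qx - px, qy - py)
        if d > radius:
            continue                      # outside the perception range
        count += 1
        sum_px += qx
        sum_py += qy
        sum_vx += ux
        sum_vy += uy
        if d < min_dist:                  # collision avoidance: push away
            avoid_x += px - qx
            avoid_y += py - qy
    if count == 0:
        return (0.0, 0.0)
    # Velocity matching: steer toward the mean neighbor velocity.
    match_x, match_y = sum_vx / count - vx, sum_vy / count - vy
    # Flock centering: steer toward the mean neighbor position.
    center_x, center_y = sum_px / count - px, sum_py / count - py
    return (w_avoid * avoid_x + w_match * match_x + w_center * center_x,
            w_avoid * avoid_y + w_match * match_y + w_center * center_y)
```

For two boids two units apart and moving with the same velocity, only flock centering is active, so each is steered toward the other; when they come closer than `min_dist`, the avoidance term dominates and pushes them apart.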


Notice that the behavioral model corresponds to the 
models proposed in biological studies. The merit of this 
work is that it is the first implementation of the rules 
studied for real flocks in a virtual 3-D world, showing 
a close correspondence of the behavior of boids with 
that of flocks, herds, and schools. Reynolds’ research 
has been taken as inspiration by many other studies on 
coordinated motion, mainly in simulation. In [71.35], 
an implementation of the flocking behavioral model 
was proposed and tested on real robots. Robots use 
infrared proximity sensors to recognize the presence 
of other robots and their distance, which is necessary 
for collision avoidance and flock centering behavior. 
Additionally, a dedicated sensor to perceive the head- 
ing of neighbors was developed to support aligning 
behavior. This system, called the virtual heading sys- 
tem (VHS), is based on a digital compass and wireless 
communication. Despite the fact that a digital compass 
cannot reliably work in an indoor environment, it is 
assumed that neighboring robots have similar percep- 
tions. The heading perceived with respect to the local 
north is communicated over the wireless channel, and 
it is exploited for alignment behavior. This system al- 


lowed testing the flocking behavior of small robotic 
groups in a physical setting and studying the dynam- 
ics of flocking with up to 1000 simulated robots. This 
work was later extended in [71.36], by having a sub- 
group of informed individuals which could steer the 
whole flock, following the model presented in [71.33]. 
The dynamics of steered flocking have been studied by 
varying the percentage of informed robots in simula- 
tion, and tests with real robots have been performed as 
well. 


71.4.4 Other Studies 


As mentioned above, there exist numerous studies that 
were inspired by the schooling/flocking models. All 
these studies adopt some variants of the behavioral rules 
described above, or analyze the group dynamics under 
some particular perspective. A different approach to co- 
ordinated motion can be found in [71.37]. In this work, 
robots have to transport a heavy object and have imper- 
fect knowledge of the direction of motion. They can, 
however, negotiate the goal direction by displaying their 
own preferred direction of motion and by adjusting it 
on the basis of the direction displayed by others. On 
the whole, this mechanism implements similar dynam- 
ics to the alignment behavior of the classical flocking 
model. Here, however, robots are connected together to 
the object to be transported, adding a further constraint 
to the system that obliges a good negotiation to allow 
motion. A similar constraint characterizes the coordi- 
nated motion studies with physically assembled robots 
presented in [71.38, 39]. Here, robots form a physical 
structure of varying shape and can rotate their chas- 
sis in order to match the direction of motion of the 
other robots. In this case, there is no direct detection 
of the motion direction of neighbors. Instead, robots 
can sense the pulling and pushing forces that are ex- 
erted by the other connected robots through the physical 
connections. These pulling/pushing forces are naturally 
averaged by the force sensor, which returns their resul- 
tant. Artificial evolution was exploited to synthesize an 
artificial neural network that could transform the forces 
sensed to motor commands. The results obtained show 
the impressive capability of self-organized coordination 
between the robots, as well as scalability and generalization
to different sizes and shapes [71.38], and the
ability to cope with obstacles and to avoid falling out- 
side the borders of the arena [71.39]. 




71.5 Searching Together: Collective Exploration 


Exploring and searching the environment is an impor- 
tant behavior for robot swarms. In many tasks, the 
swarm must interact with the environment, sometimes 
only to monitor it, but sometimes also to process ma- 
terials or other kinds of resources. Usually, the swarm 
cannot completely perceive the environment, and the 
environment may also change during the operation of 
the robots. Hence, robots need to explore and search 
the environment to monitor for changes or in order to 
detect new resources. 

To cope with its partial perception of the environ- 
ment, a swarm can move, for instance using flocking, in 
order to explore new places (some locations may be un- 
available, though). Hence, most of the environment can 
be perceived, but not at the same time. As in many other 
artificial systems, a tradeoff between exploration and 
exploitation exists and requires careful design choices. 


71.5.1 Variants of Collective Exploration 
Behavior 


There is no perfect exploration and search strategy be- 
cause the structure of the environment in which the 
swarm is placed can take many different shapes. Strate- 
gies only perform more or less well as a function of 
the situation with which they are faced [71.40]. For 
instance, the swarm could be in a maze, in an open
environment with few obstacles, or in an environment with
many obstacles. 

We identified a restricted number of environmen- 
tal characteristics that play an important role in the 
choice of searching behavior in swarm robotics. These 
characteristics are commonly found in swarm robotics 
scenarios: the presence of a central place, the
size of the environment, and the presence of obstacles.

The central place is a specific location where robots 
must come back regularly, for instance for maintenance 
or to deposit foraged items. A scenario that involves 
a central place requires a swarm able to either remem- 
ber or keep track of that location. 

If the environment is closed (finite area) and not too 
large, the swarm may use random motion to explore, 
with fair chances to rapidly locate resources (or even 
the central place). In an open environment, robots can 
get lost very quickly. In this type of environment, it is 
necessary to use a behavior that allows robots to stay 
together and maintain connectivity. 

Obstacles are environmental elements that constrain 
the motion of the swarm. If the configuration of the ob- 


stacles is known in advance, the swarm can move in 
the environment following appropriate patterns. In most 
cases, however, obstacles are unexpected or might be 
dynamic and may prevent the swarm from exploring 
parts of the environment. 


71.5.2 Collective Exploration in Biology 


In nature, animals are constantly looking for resources 
such as food, sexual partners, or nesting sites. Animals 
living in groups may use several types of behavior to 
explore their environment and locate these resources. 

For instance, fish can take advantage of the number 
of individuals in a shoal to improve their capabilities to 
find food [71.41–43]. To do so, they move while maintaining
large distances between individuals. In this way,
fish increase their perceptual coverage as well as their 
chances to find new resources. 

Animals also rely heavily on random motion to
explore their environment [71.44–46]. Usually the
exploratory pattern is not fully random (that is,
isotropic), because animals use all available
environmental cues to guide themselves. Random motion
can be biased towards a given direction, or it can be
constrained to a specific area, for instance around
a previously memorized location [71.47]. Some desert
ants achieve high localization performance by combining
odometry (counting their footsteps) with cues from
gravity and the polarization of natural light. They may
move randomly to look for resources, but they are able
to return quickly to their nest and also to return to
an interesting location previously identified.


71.5.3 Collective Exploration in Swarm Robotics


One of the most common exploration strategies used
in robotics is random exploration. In a typical
implementation, robots wander in the environment until
they perceive a feature of interest [71.48–50]. In
doing so, robots may lose contact with each other and,
therefore, their ability to work together. Hence this
strategy is not suited for large or open environments.
Due to the stochastic nature of the strategy, its
performance can only be evaluated statistically. On
average, the time to locate a feature is proportional
to the square of its distance from the robots [71.44].
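
The quadratic scaling can be illustrated with a small numerical experiment. The sketch below is only a simplified stand-in for a robot: it assumes a symmetric random walk on a one-dimensional lattice and measures how many steps the walker needs to first reach a given distance from its start. The function names, the distances 5 and 10, and the number of trials are illustrative choices, not taken from the cited studies.

```python
import random

def exit_time(d, rng):
    """Steps a symmetric 1-D lattice walk needs to first reach distance d."""
    x, steps = 0, 0
    while abs(x) < d:
        x += rng.choice((-1, 1))
        steps += 1
    return steps

def mean_exit_time(d, trials, rng):
    """Average exit time over independent trials."""
    return sum(exit_time(d, rng) for _ in range(trials)) / trials

rng = random.Random(0)                 # fixed seed for reproducibility
t_near = mean_exit_time(5, 2000, rng)  # feature at distance 5
t_far = mean_exit_time(10, 2000, rng)  # feature twice as far
print(t_near, t_far, t_far / t_near)
```

For this idealized walk the expected exit time equals the square of the distance, so doubling the distance should roughly quadruple the average time, up to sampling noise.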

Systematic exploration strategies are very different.
Robots use some a priori knowledge about the structure
of the environment in order to methodically sweep it
and find features. To ensure that robots do not
repeatedly cover the same places, they may need to
memorize which places have already been explored. This
is often implemented with localization techniques and
mapping of the environment [71.51]. The advantage of
this technique is that an answer will be found with
certainty, and the time of exploration has a lower and
upper bound if the environment is not open. However,
memory requirements may be excessive, and the strategy
is not suited for open environments.

Fig. 71.5a-d Gas expansion behavior to monitor the surroundings of a central place. (a) The swarm starts aggregated
around a central place (represented by a black spot). (b,c) Robots try to move as far as possible from their neighbors,
while maintaining some visual or radio connection. (d) As a result, the whole swarm expands in the environment, like
a gas, covering part of the environment

Between the two extreme strategies reported in the 
previous paragraph lie a number of more specialized 
strategies that present advantages and drawbacks de- 
pending on the structure of the environment and the 
distribution of the resources. 


Collective motion (which has already been detailed 
in Sect. 71.4) allows swarms to maintain their cohesion 
while moving through the environment. Flocking be- 
havior can be employed in an open environment with 
a limited risk of losing contact between robots. The 
swarm behaves like a sort of physical mesh that covers
part of the environment; to maximize the area covered
during exploration, robots can increase their
interdistance during motion as much as possible.

Gas expansion behavior (Fig. 71.5) allows robots to
quickly and exhaustively explore the surroundings of
a central place [71.52-55]. While one or several robots
keep track of a central place, other robots try to move
as far as possible from their neighbors, while still
maintaining direct line of sight with at least one of
them. The swarm behaves like a fluid or gas that
penetrates the asperities of the environment. The
exploration is very effective and any change or new
resource within the perception range of the swarm is
immediately perceived. However, since robots are bound
to the central place, the area that they can explore is
limited by the number of robots in the swarm. If robots
do not stick to a central place, the resulting behavior
shifts to a type of flocking or moving formation.

Fig. 71.6a-d Chaining behavior in action with a central place represented by a large black dot, bottom left. (a) Robots
start aggregated around the central place. (b) While maintaining visual or radio contact with neighbors, some robots
change role and become part of a chain (grayed out). (c) Other robots move around the central place and encounter the
early chain of robots. With some probability, they also turn into new parts of the chain. (d) At the end of the iterative
process, robots form a long chain that spans through the environment and maintains a physical link to the central place
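
A minimal simulation sketch of gas expansion follows. It is not the controller of any cited study: robots are points in the plane, line of sight is approximated by a fixed range `R`, and a move is rejected if it would leave any robot without a neighbor in range. The names `R`, `STEP`, `repulsion`, and `try_move`, and all numerical values, are assumptions made for illustration.

```python
import math
import random

R = 2.0      # assumed sensing/communication range
STEP = 0.1   # assumed distance moved per update

def in_range(p, q):
    return math.dist(p, q) <= R

def has_neighbor(i, pos):
    return any(in_range(pos[i], pos[j]) for j in range(len(pos)) if j != i)

def repulsion(i, pos):
    """Unit vector pointing away from robot i's neighbors (closer ones weigh more)."""
    fx = fy = 0.0
    for j, q in enumerate(pos):
        if j == i or not in_range(pos[i], q):
            continue
        dx, dy = pos[i][0] - q[0], pos[i][1] - q[1]
        d2 = dx * dx + dy * dy + 1e-9   # avoid division by zero
        fx += dx / d2
        fy += dy / d2
    norm = math.hypot(fx, fy)
    return (fx / norm, fy / norm) if norm > 0 else (0.0, 0.0)

def try_move(i, pos):
    """Move robot i away from its neighbors unless some robot would become isolated."""
    affected = [j for j in range(len(pos)) if j != i and in_range(pos[i], pos[j])]
    ux, uy = repulsion(i, pos)
    old = pos[i]
    pos[i] = (old[0] + STEP * ux, old[1] + STEP * uy)
    if not all(has_neighbor(k, pos) for k in affected + [i]):
        pos[i] = old   # reject the move: contact with a neighbor must be kept

def spread(pos):
    """Mean distance of the robots from the central place at the origin."""
    return sum(math.hypot(x, y) for x, y in pos) / len(pos)

rng = random.Random(1)
robots = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(12)]
robots[0] = (0.0, 0.0)   # robot 0 stays at the central place
initial_spread = spread(robots)
for _ in range(200):
    for i in range(1, len(robots)):
        try_move(i, robots)
final_spread = spread(robots)
print(initial_spread, final_spread)
```

In this sketch the swarm expands from its initial cluster while every robot keeps at least one neighbor within range, which mirrors the trade-off described above between coverage and attachment to the central place.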

With chaining behavior, swarms can form a chain
with one end that sticks to a central place and the
other end that freely moves through the environment
(Fig. 71.6). In [71.56], minimalistic behavior
produces a static chain, but different types of chain
motions can be imagined. In [71.57], for instance,
chains can build up, move, and disaggregate until
a resource is found. Contrary to gas expansion
behavior, a chaining swarm may not immediately perceive
changes in the environment because it has to
constantly sweep the space. Chaining allows robots to
cover a larger area than gas expansion behavior,
ideally a disc of radius proportional to the number
of robots.
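
The difference in coverage can be made concrete with a back-of-the-envelope comparison. The sketch below rests on two idealizing assumptions that are not stated in the text: each robot perceives a disc of radius `r`, and neighboring robots in a chain are spaced `r` apart. Under these assumptions the area seen at once by an expanded swarm grows at most linearly with the number of robots `n`, whereas the disc swept by a rotating chain grows quadratically. Both function names are hypothetical.

```python
import math

def gas_area_bound(n, r):
    """Upper bound on the area perceived at once by n robots of perception radius r."""
    return n * math.pi * r ** 2

def chain_sweep_area(n, r):
    """Area of the disc swept by a chain of n robots spaced r apart around the central place."""
    return math.pi * (n * r) ** 2

for n in (5, 10, 20):
    print(n, gas_area_bound(n, 1.0), chain_sweep_area(n, 1.0))
```

The ratio between the two areas equals `n` in this idealization, which is the price paid for the slower, sweeping perception of the chain.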


71.6 Deciding Together: Collective Decision Making 


Decision making is a behavior used by any artificial 
system that must produce an adapted response when 
facing new or unexpected situations. Because the best 
action depends on the situation encountered, a swarm 
cannot rely on a pre-programmed and systematic re- 
action. Monolithic artificial systems make decisions all 
the time, by gathering information and then evaluating 
the different options at hand. However, when it comes to 
swarms, each group member might have its own opin- 
ion about the correct decision. If all individuals perceive 
the same information and process it in the same way, 
then they might independently make the same decision. 
However, in practice, the more common case is that in- 
dividuals perceive partial and noisy information about 
the situation. Thus, if no coordination among group 
members occurs, a segregation based on differing opin- 
ions might take place, thereby removing the advantages 
of being a swarm. Therefore, the challenge is to have 
the whole group collaborate to make a collective deci- 
sion and take action accordingly. 


71.6.1 Variants of Collective Decision Making Behavior


Three main mechanisms are reported in the literature
that allow swarms to make collective decisions. The
first and simplest mechanism is based
on opinion propagation. As soon as a group member 
has enough information about a situation to make up its 
mind, it propagates its opinion through the whole group. 

The second mechanism is based on opinion averaging.
All individuals constantly share their opinion with
their neighbors and adjust their own opinion
accordingly. This iterative process leads to the
emergence of a collective decision. The adjustment of
the opinion is typically achieved with an average
function, especially if opinions are about quantitative
values such as a location, a distance, or a weight.
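
As an illustration of opinion averaging, the following sketch assumes six robots arranged on a ring, each of which repeatedly replaces its estimate by the mean of its own value and those of its two neighbors. The ring topology, the initial values, and the function name `averaging_round` are assumptions chosen for the example.

```python
def averaging_round(opinions):
    """Each robot averages its opinion with its left and right neighbors on a ring."""
    n = len(opinions)
    return [
        (opinions[i - 1] + opinions[i] + opinions[(i + 1) % n]) / 3
        for i in range(n)
    ]

opinions = [2.0, 9.0, 4.0, 7.0, 3.0, 5.0]   # hypothetical initial estimates
target = sum(opinions) / len(opinions)       # mean of the initial opinions
for _ in range(100):
    opinions = averaging_round(opinions)
print(target, opinions)
```

Because every robot gives and receives the same weights, the mean of the opinions is preserved at each round, and all robots converge to the average of the initial estimates without any robot ever computing it centrally.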

The third and last mechanism relies on amplifi- 
cation to produce a collective decision. In a nutshell, 
all individuals start with an opinion, and may decide 
to change their opinion to another one. The switch to 
a new opinion happens with a probability calculated on 
the basis of the frequency of this opinion in the swarm. 
In practice, this means that the more often an opinion
is represented in the group, the more likely it is to
be adopted by an individual, which is why the term
amplification is used.
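
A minimal sketch of amplification is given below, under the assumption that an individual reconsidering its opinion simply copies the opinion of a randomly chosen group member. An opinion is then adopted with a probability equal to its current frequency in the swarm. The function name, the group size, and the step limit are illustrative.

```python
import random

def run_until_consensus(opinions, rng, max_steps=100_000):
    """Repeatedly let one random individual copy a random member's opinion."""
    steps = 0
    while len(set(opinions)) > 1 and steps < max_steps:
        i = rng.randrange(len(opinions))
        # the copied opinion is drawn in proportion to its frequency
        opinions[i] = rng.choice(opinions)
        steps += 1
    return steps

rng = random.Random(3)
swarm = ["A"] * 10 + ["B"] * 10   # the swarm starts evenly split
steps = run_until_consensus(swarm, rng)
print(steps, swarm[0])
```

Random fluctuations break the initial symmetry, and whichever opinion gains a lead becomes more likely to be copied until the whole group agrees.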

Each of the three aforementioned mechanisms has 
some advantages over the others and may be preferred, 
depending on the situation faced. The factors that play 
an important role in collective decision processes in- 
clude the speed and the accuracy needed to make the 
choice, the robustness of communication, and the relia- 
bility of individual information. 

In terms of speed, opinion propagation allows 
fast collective decisions, in contrast with the two 
other mechanisms, which require numerous interactions
among individuals. However, this speed generally
comes at the cost of robustness or accuracy [71.58–60].
If communication is not robust enough, messages can 
be corrupted. The mechanism of opinion propagation 
is particularly sensitive to such effects, and a wrong or 
random collective decision might be made by a swarm 
in that case. 

The averaging mechanism would produce a more
robust decision because wrong information from
erroneous messages is diluted in the larger amount of
information present in the swarm [71.61]. However,
opinion averaging works best if all individuals have
roughly identical knowledge. If a small proportion of
individuals have excellent knowledge to make the
decision, while the remaining individuals have poor
information, opinion propagation may produce better
results than opinion averaging [71.33].

Lastly, the amplification mechanism is the main 
choice for a gradually emerging collective decision if 
opinions cannot be merged with some averaging func- 
tion. Instead of adjusting opinions, individuals simply 
adopt new opinions with some probability. It is worth 
noting that this mechanism can produce good decisions 
even if individuals have poor information. 


71.6.2 Collective Decision Making in Biology 


The powerful possibilities of decision making in groups
were already suggested by Galton back in 1907 [71.62].
In that paper, Galton reports the results of a
weight-judging competition in which competitors had to
estimate the weight of a fat ox. With slightly fewer
than 800 independent estimates, Galton observed that
the average estimate was accurate to 1% of the real
weight of the ox. This early observation opened
interesting perspectives about the accuracy of
collective estimations, but it did not describe
a collective decision mechanism, since Galton himself
had to gather the estimates and apply some calculation
to evaluate the estimate of the crowd.

More recent studies about group navigation have 
shown that groups of animals cohesively moving to- 
gether towards a goal direction reach their objective 
faster than independent individuals [71.63,64]. The 
mechanism of collective navigation not only allows the 
individuals to move and stay together, but it also acts 
as a distributed averaging function that locally fuses the 
opinions of individuals about the direction of motion, 
allowing them to improve their navigation performance. 
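
The averaging effect on headings can be sketched with a circular mean. The example assumes that the goal direction is 0 rad and that each individual holds a noisy estimate of it; the error values are invented for illustration, and `mean_heading` is a hypothetical helper.

```python
import math

def mean_heading(headings):
    """Circular mean of headings given in radians."""
    s = sum(math.sin(h) for h in headings)
    c = sum(math.cos(h) for h in headings)
    return math.atan2(s, c)

# hypothetical individual estimates of a goal direction of 0 rad
estimates = [0.4, -0.3, 0.2, -0.5, 0.1]
fused = mean_heading(estimates)
mean_individual_error = sum(abs(h) for h in estimates) / len(estimates)
print(fused, mean_individual_error)
```

The circular mean is used rather than the arithmetic mean so that headings on either side of the wrap-around, such as 350 and 10 degrees, fuse to a direction near 0 rather than 180 degrees.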

In the last decades, the amplification mechanism 
has been identified as a source of collective decision in 
a broad range of animal species such as ants [71.65, 
66], honeybees [71.67,68], spiders [71.69], cock- 
roaches [71.70], monkeys [71.71], and sheep [71.72]. 

Ants that choose one route to a resource probably
constitute the best-known example of the amplification
mechanism. In [71.66], an ant colony is offered
two paths to two identical resource sites. Initially,
the two resources are exploited equally, but after
a short time ants focus on a single resource. This
collective choice happens because ants that have found
the resource come back to the nest, marking the ground
with a pheromone trail. The next ants that try to reach
the resource are sensitive to this odor and are more
likely to follow the path with the higher pheromone
concentration. As a result of this amplified response,
a collective decision rapidly emerges. In addition, it
was shown in [71.73] that when ants are presented with
two paths of different lengths to the same resource,
the same pheromone-based mechanism allows them to
choose the shorter path. This can be explained by the
fact that ants using the shorter path need less time to
make round trips, making the pheromone concentration on
this path grow faster.
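
The pheromone-based amplification can be sketched as follows. The choice rule used here, in which the probability of taking a branch grows nonlinearly with the pheromone already deposited on it, is a form commonly used to model such experiments; the parameters `k` and `n` and the absence of evaporation are assumptions for illustration and are not taken from [71.66] or [71.73].

```python
import random

def prob_choose_a(a, b, k=20.0, n=2.0):
    """Probability of choosing branch A given pheromone amounts a and b."""
    wa, wb = (k + a) ** n, (k + b) ** n
    return wa / (wa + wb)

def run_colony(ants, rng):
    """Each ant chooses a branch and deposits one unit of pheromone on it."""
    a = b = 0
    for _ in range(ants):
        if rng.random() < prob_choose_a(a, b):
            a += 1
        else:
            b += 1
    return a, b

a, b = run_colony(1000, random.Random(0))
print(a, b)
```

With equal pheromone on both branches the choice is unbiased, but any early imbalance raises the probability of choosing the leading branch, so repeated runs typically end with one branch carrying most of the traffic.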

Quorum sensing is a special case of the amplification
mechanism which has been notably used to explain nest
site selection in ants and bees [71.74–76]. The most
basic example of quorum sensing uses a threshold to
dictate if individuals should change their opinion. If
an individual perceives enough neighbors (above the
threshold) that already share the opposite opinion,
then it will in turn adopt this opinion. It has been
shown that this threshold makes quorum sensing more
robust to the propagation of erroneous information
during the decision process. In addition, the accuracy
of collective decisions made with quorum sensing may
improve with group size, and cognitive capabilities of
groups may outperform those of single individuals
[71.77, 78]. In the case of nest site selection,
cohesion is mandatory for the group. A cross inhibition
mechanism complementing amplification was identified as
a key feature to ensure that groups do not
split [71.79].
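
The threshold rule can be written in a few lines. The sketch below assumes binary opinions and treats the quorum as reached when at least `threshold` perceived neighbors hold the other opinion; the function name and this exact comparison are illustrative choices.

```python
def quorum_update(opinion, neighbor_opinions, threshold):
    """Switch opinion only if enough neighbors hold the other opinion."""
    dissenting = [o for o in neighbor_opinions if o != opinion]
    if len(dissenting) >= threshold:
        return dissenting[0]   # binary case: all dissenters agree
    return opinion

# a single (possibly erroneous) dissenting neighbor does not trigger a switch
print(quorum_update("A", ["B", "A", "A"], threshold=2))
# two dissenting neighbors reach the quorum
print(quorum_update("A", ["B", "B", "A"], threshold=2))
```

The first call shows why the threshold adds robustness: an isolated wrong message is ignored, whereas with plain opinion propagation it could spread through the group.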


71.6.3 Collective Decision Making in Swarm Robotics


In swarm robotics, opinion averaging has been used
to improve the localization capabilities of robots.
In [71.50], a swarm of robots carries out a foraging
task between a central place and a resource site. The
robots have to navigate back and forth between the two
places and use odometry to estimate their location. As
odometry provides noisy estimates, robots using solely
this technique may quickly get lost. Here, robots can
share and merge their localization opinions when they
meet, by means of local infrared communication. By
doing so, robots achieve better localization and
improve their performance in the foraging task.
Moreover, robots associate a confidence level to their
estimates, which is used to decide how information is
merged. If a robot advertises an opinion with very high
confidence, then the mechanism produces opinion
propagation. Hence the two mechanisms of averaging and
propagation are blended in a single behavior, and the
balance between them is tuned by the user with
a control parameter.

Fig. 71.7a-d A swarm of robots is presented with two resource sites in its environment and must collectively choose
one. (a) Initially, robots are randomly scattered. (b) Using a random walk, they move until a resource site is found. On
average, the swarm is split in the two sites. (c) The more neighbors they perceive, the longer the robots stay. A competition
between the two sites takes place and any random event may change the situation. Here a robot just left the top right site,
further reducing the chances that other robots stay there. (d) The swarm has made a choice in favor of the bottom left
site. The choice is stable, although some robots may frequently leave the site for exploration
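
One way such a blend could work is sketched below. This is not the rule used in [71.50]: it assumes that two position estimates are merged by a weighted average whose weights are the confidences raised to an exponent `sharpness`, which plays the role of the control parameter. All names and values are hypothetical.

```python
def merge_estimates(own, own_conf, other, other_conf, sharpness=1.0):
    """Confidence-weighted merge of two (x, y) position estimates.

    sharpness = 0 gives plain averaging; large values let the more
    confident estimate dominate, which approaches opinion propagation.
    """
    w_own = own_conf ** sharpness
    w_other = other_conf ** sharpness
    w = w_other / (w_own + w_other)
    return tuple((1 - w) * a + w * b for a, b in zip(own, other))

# equal confidence: the estimates are simply averaged
print(merge_estimates((0.0, 0.0), 1.0, (2.0, 4.0), 1.0))
# much higher confidence and sharpness: the other estimate is nearly copied
print(merge_estimates((0.0, 0.0), 1.0, (2.0, 4.0), 3.0, sharpness=8.0))
```

Intermediate values of the exponent move the behavior continuously between the two mechanisms, so a single merge rule covers both.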

The aggregation behavior previously mentioned in
Sect. 71.2.3 can be exploited to trigger collective
decision making in situations where there are several
environmental heterogeneities. In [71.80], the robots
are presented with two shelters and they choose one of
them as a resting site by aggregating there. The
behavior of the robots closely follows the one observed
in cockroaches (Fig. 71.7). In [71.81], both robots and
cockroaches are introduced in an arena with two
shelters, demonstrating the influence of the two groups
on each other when making the collective decision. The
collective decision is the result of an amplification
mechanism, implemented via the probability of a robot
leaving an aggregate. This probability diminishes with
the number of perceived neighbors, allowing larger
aggregates to attract more robots.
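
The leaving rule can be sketched with a toy model of two shelters. The functional form of `leave_probability`, its parameters, and the assumption that a robot that leaves settles at either shelter with equal probability are all illustrative simplifications, not the models of [71.80] or [71.81].

```python
import random

def leave_probability(neighbors, base=0.5, steepness=1.0):
    """Per-step probability of leaving a shelter; decreases with perceived neighbors."""
    return base / (1.0 + steepness * neighbors ** 2)

def simulate(n_robots, steps, rng):
    """Return the final number of robots under shelter 0 and shelter 1."""
    site = [rng.randrange(2) for _ in range(n_robots)]
    counts = [site.count(0), site.count(1)]
    for _ in range(steps):
        for i in range(n_robots):
            others = counts[site[i]] - 1
            if rng.random() < leave_probability(others):
                counts[site[i]] -= 1
                site[i] = rng.randrange(2)   # wander, then settle at a random shelter
                counts[site[i]] += 1
    return counts[0], counts[1]

result = simulate(20, 500, random.Random(2))
print(result)
```

Because robots in the larger aggregate leave less often, a small initial imbalance tends to grow, which is the amplification effect described above.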


71.6.4 Other Studies 


The opinion averaging mechanism was investigated in
depth with a general mathematical approach in [71.82,
83]. These studies demonstrate convergence of the
mechanism and emphasize the importance of the topology
of the communication network through which interactions
take place.

Another amplification mechanism inspired by the
behavior of honeybees was implemented in [71.84].
With this mechanism, it was shown that robots are able
to make a collective decision and, between two sites,
reliably choose the one offering the best illumination
conditions.

The amplification mechanism based on pheromone
trails, which is used by ants, has also inspired
several swarm robotics studies. In [71.85,86], the
pheromone is replaced by light projected by a video
projector. This implementation is limited to laboratory
studies, but it allowed demonstrating path selection
with robot swarms. In [71.87], the process is
abstracted inside a network of robots that are deployed
in the environment. Virtual ants hop from robot to
robot and deposit pheromone inside them. Eventually,
the shortest path to a resource is marked out by robots
with high and sustained levels of pheromone.


71.7 Conclusions


In this chapter, we have presented a broad overview
of the common problems faced in a swarm robotics
context, and we pointed to possible approaches to
obtain solutions based on a self-organizing process.
We have discussed aggregation, synchronization,
coordinated motion, collective exploration, and
decision making, and we argued that many application
scenarios could be solved by a mix of the above
solutions. So, are we done with swarm robotics
research? Definitely not.


First of all, the fact that possible solutions exist
does not mean that they are the most suitable for any
possible application scenario. Hardware constraints,
miniaturization, environmental contingencies, and
performance issues may require the design of different
solutions, which may strongly depart from the examples
given above. Still, the approaches we presented
constitute a logical starting point, as well as a valid
benchmark against which novel approaches can be
compared.

Another important research direction consists in
characterizing the self-organizing behavior we
presented in terms of abstract properties, such as the
time of convergence toward a stable state, sensitivity
to parameter changes, robustness to failures, and so
forth. From this perspective, the main problem is to
ensure a certain functionality of the system with
respect to the needs of the application and to predict
the system features before actual development and
testing. In many cases, a precise characterization of
the system is not possible, and only a statistical
description can be achieved. Still, such an enterprise
would bring swarm robotics closer to an engineering
practice, eventually allowing us to guarantee a certain
performance of the developed system, as well as other
properties that engineering commonly deals with.

The examples we presented all refer to homogeneous
systems, in which all individuals are physically
identical and follow exactly the same rules. This is
a strong simplification, which follows the tradition of
biological modeling of self-organizing behavior.
However, instead of being a limitation, heterogeneity
is potentially a richness to be exploited in a swarm
robotics system, which can lead to more complex group
behavior. For instance, different individual reactions
to features in the environment can be at the basis of
optimal decision making at the group level [71.88].
Alternatively, heterogeneity between groups of
individuals can be exploited for performing tasks that
require specialized abilities, while maintaining an
overall redundancy of the system that ensures
robustness and scalability [71.89].

In conclusion, swarm robotics research still has
many challenges to address, which range from the
need for a more theoretical understanding of the
relation between individual behavior and group
dynamics, to the autonomy and adaptation to varied
real-world conditions in order to face complex
application scenarios (e.g., due to harsh environmental
conditions such as planetary or underwater exploration,
or to strong miniaturization down to the micro scale).
Whatever the theoretical or practical driver is, we
believe that the studies presented in this chapter
constitute fundamental reference points that teach us
how self-organization can be achieved in a swarm
robotics system.


References


71.1 J.T. Bonner: Chemical signals of social amoebae,
Sci. Am. 248, 114-120 (1983)

71.2 C. van Oss, A.V. Panfilov, P. Hogeweg, F. Siegert,
C.J. Weijer: Spatial pattern formation during
aggregation of the slime mould Dictyostelium
discoideum, J. Theor. Biol. 181(3), 203-213 (1996)

71.3 S. Camazine, J.-L. Deneubourg, N. Franks, J. Sneyd,
G. Theraulaz, E. Bonabeau: Self-Organization in
Biological Systems (Princeton Univ. Press, Princeton
2001)

71.4 J.-L. Deneubourg, J.C. Grégoire, E. Le Fort: Kinetics
of larval gregarious behavior in the bark beetle
Dendroctonus micans (coleoptera, scolytidae),
J. Insect Behav. 3(2), 169-182 (1990)

71.5 R. Jeanson, C. Rivault, J.-L. Deneubourg, S. Blanco,
R. Fournier, C. Jost, G. Theraulaz: Self-organized
aggregation in cockroaches, Anim. Behav. 69(1),
169-180 (2005)

71.6 S. Garnier, C. Jost, J. Gautrais, M. Asadpour,
G. Caprari, R. Jeanson, A. Grimal, G. Theraulaz: The
embodiment of cockroach aggregation behavior in
a group of micro-robots, Artif. Life 14(4), 387-408
(2008)

71.7 G. Caprari, R. Siegwart: Mobile micro-robots ready
to use: Alice, Proc. 2005 IEEE/RSJ Int. Conf. Intell.
Robot. Syst. (IROS 2005), Piscataway (2005)
pp. 3295-3300

71.8 M. Dorigo, V. Trianni, E. Sahin, R. Groß,
T.H. Labella, G. Baldassarre, S. Nolfi, J.-L. Deneubourg,
F. Mondada, D. Floreano, L.M. Gambardella: Evolving
self-organizing behaviors for a swarm-bot,
Auton. Robot. 17(2/3), 223-245 (2004)

71.9 S. Kernbach, R. Thenius, O. Kernbach, T. Schmickl:
Re-embodiment of honeybee aggregation behavior in an
artificial micro-robotic system, Adapt. Behav. 17(3),
237-259 (2009)

71.10 O. Soysal, E. Şahin: A macroscopic model for
self-organized aggregation in swarm robotic systems,
Lect. Notes Comput. Sci. 4433, 27-42 (2007)

71.11 E. Bahçeci, E. Şahin: Evolving aggregation
behaviors for swarm robotic systems: A systematic case
study, Proc. IEEE Swarm Intell. Symp. (SIS 2005),
Piscataway (2005) pp. 333-340

71.12 H. Ando, Y. Oasa, I. Suzuki, M. Yamashita:
Distributed memoryless point convergence algorithm
for mobile robots with limited visibility, IEEE Trans.
Robot. Autom. 15(5), 818-828 (1999)

71.13 V. Gazi: Swarm aggregations using artificial
potentials and sliding-mode control, IEEE Trans. Robot.
21(6), 1208-1214 (2005)

71.14 W.M. Spears, D.F. Spears, J.C. Hamann, R. Heil:
Distributed, physics-based control of swarms of
vehicles, Auton. Robot. 17(2), 137-162 (2004)

71.15 C. Melhuish, O. Holland, S. Hoddell: Convoying:
Using chorusing to form travelling groups of minimal
agents, Robot. Auton. Syst. 28, 207-216 (1999)

71.16 A. Pikovsky, M. Rosenblum, J. Kurths: Phase
synchronization in regular and chaotic systems, Int.
J. Bifurc. Chaos 10(10), 2291-2305 (2000)

71.17 S.H. Strogatz: Sync: The Emerging Science of
Spontaneous Order (Hyperion, New York 2003)

71.18 A.T. Winfree: Biological rhythms and the behavior
of populations of coupled oscillators, J. Theor. Biol.
16(1), 15-42 (1967)

71.19 Y. Kuramoto: Phase dynamics of weakly unstable
periodic structures, Prog. Theor. Phys. 71(6),
1182-1196 (1984)

71.20 J. Buck: Synchronous rhythmic flashing of
fireflies. II, Q. Rev. Biol. 63(3), 256-289 (1988)

71.21 R.E. Mirollo, S.H. Strogatz: Synchronization of
pulse-coupled biological oscillators, SIAM J. Appl.
Math. 50(6), 1645-1662 (1990)

71.22 S. Wischmann, M. Huelse, J.F. Knabe, F. Pasemann:
Synchronization of internal neural rhythms in
multi-robotic systems, Adapt. Behav. 14(2), 117-127
(2006)

71.23 V. Trianni, S. Nolfi: Self-organising sync in
a robotic swarm. A dynamical system view, IEEE Trans.
Evol. Comput. 13(4), 722-741 (2009)

71.24 M. Hartbauer, H. Roemer: A novel distributed
swarm control strategy based on coupled signal
oscillators, Bioinspiration Biomim. 2(3), 42-56 (2007)

71.25 A.L. Christensen, R. O'Grady, M. Dorigo: From
fireflies to fault-tolerant swarms of robots, IEEE
Trans. Evol. Comput. 13(4), 754-766 (2009)

71.26 S. Wischmann, F. Pasemann: The emergence of
communication by evolving dynamical systems,
Lect. Notes Artif. Intell. 4095, 777-788 (2006)

71.27 V. Sperati, V. Trianni, S. Nolfi: Evolving
coordinated group behaviours through maximization of
mean mutual information, Swarm Intell. 2(2-4), 73-95
(2008)

71.28 A. Huth, C. Wissel: The simulation of the movement
of fish schools, J. Theor. Biol. 156(3), 365-385 (1992)

71.29 I. Aoki: A simulation study on the schooling
mechanism in fish, Bull. Jpn. Soc. Sci. Fish. 48(8),
1081-1088 (1982)

71.30 A. Okubo: Dynamical aspects of animal grouping:
Swarms, schools, flocks, and herds, Adv. Biophys.
22, 1-94 (1986)

71.31 I.D. Couzin, J. Krause, R. James, G.D. Ruxton,
N.R. Franks: Collective memory and spatial sorting in
animal groups, J. Theor. Biol. 218(1), 1-11 (2002)

71.32 M. Ballerini, N. Cabibbo, R. Candelier,
A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte,
A. Orlandi, G. Parisi, A. Procaccini, M. Viale,
V. Zdravkovic: Interaction ruling animal collective
behavior depends on topological rather than metric
distance: Evidence from a field study, Proc. Natl.
Acad. Sci. USA 105(4), 1232-1237 (2008)

71.33 I.D. Couzin, J. Krause, N.R. Franks, S.A. Levin:
Effective leadership and decision-making in animal
groups on the move, Nature 433(7025), 513-516 (2005)

71.34 C.W. Reynolds: Flocks, herds, and schools:
A distributed behavioral model, Comput. Graph. 21(4),
25-34 (1987)

71.35 A.E. Turgut, H. Celikkanat, F. Gökçe, E. Sahin:
Self-organized flocking in mobile robot swarms, Swarm
Intell. 2(2-4), 97-120 (2008)

71.36 H. Celikkanat, E. Sahin: Steering self-organized
robot flocks through externally guided individuals,
Neural Comput. Appl. 19(6), 849-865 (2010)

71.37 A. Campo, S. Nouyan, M. Birattari, R. Groß,
M. Dorigo: Negotiation of goal direction for
cooperative transport, Lect. Notes Comput. Sci. 4150,
191-202 (2006)

71.38 G. Baldassarre, V. Trianni, M. Bonani, F. Mondada,
M. Dorigo, S. Nolfi: Self-organised coordinated motion
in groups of physically connected robots, IEEE
Trans. Syst. Man Cybern. B 37(1), 224-239 (2007)

71.39 V. Trianni, M. Dorigo: Self-organisation and
communication in groups of simulated and physical
robots, Biol. Cybern. 95, 213-231 (2006)

71.40 D.H. Wolpert, W.G. Macready: No free lunch
theorems for search. Technical Report SFI-TR-95-02-010
(Santa Fe Institute 1995)

71.41 T.J. Pitcher, A.E. Magurran, I.J. Winfield: Fish in
larger shoals find food faster, Behav. Ecol. Sociobiol.
10(2), 149-151 (1982)

71.42 T.J. Pitcher, J.K. Parrish: Functions of shoaling
behaviour in teleosts, Behav. Teleost Fishes 2, 369-439
(1993)

71.43 D.J. Hoare, I.D. Couzin, J.-G.J. Godin, J. Krause:
Context-dependent group size choice in fish, Anim.
Behav. 67(1), 155-164 (2004)

71.44 E.A. Codling, M.J. Plank, S. Benhamou: Random
walk models in biology, J. R. Soc. Interface 5(25),
813-834 (2008)

71.45 P. Turchin: Quantitative Analysis of Movement:
Measuring and Modeling Population Redistribution in
Animals and Plants (Sinauer Associates Sunderland,
Massachusetts 1998)

71.46 A. Okubo, S.A. Levin: Diffusion and Ecological
Problems: Modern Perspectives, Vol. 14 (Springer,
Berlin, Heidelberg 2001)

71.47 S. Benhamou: Spatial memory and searching
efficiency, Animal Behav. 47(6), 1423-1433 (1994)

71.48 J.-L. Deneubourg, S. Goss, N. Franks,
A. Sendova-Franks, C. Detrain, L. Chrétien: The
dynamics of collective sorting robot-like ants and
ant-like robots, Proc. 1st Int. Conf. Simul. Adapt.
Behav. Anim. Animat. (1991) pp. 356-363

71.49 T. Schmickl, K. Crailsheim: Trophallaxis within
a robotic swarm: Bio-inspired communication among
robots in a swarm, Auton. Robot. 25(1), 171-188 (2008)

71.50 Á. Gutiérrez, A. Campo, F. Santos,
F. Monasterio-Huelin Maciá, M. Dorigo: Social odometry:
Imitation based odometry in collective robotics, Int.
J. Adv. Robot. Syst. 6(2), 129-136 (2009)

71.51 W. Burgard, M. Moors, D. Fox, R. Simmons, S. Thrun:
Collaborative multi-robot exploration, Proc. IEEE
Int. Conf. Robot. Autom. (ICRA '00), San Francisco
(2000) pp. 476-481

71.52 D. Payton, M. Daily, R. Estowski, M. Howard,
C. Lee: Pheromone robotics, Auton. Robot. 11(3), 319-324
(2001)

71.53 D. Payton, R. Estkowski, M. Howard: Progress in
pheromone robotics. In: Intelligent Autonomous
Systems 7, ed. by M. Gini, W.-M. Shen, C. Torras,
H. Yuasa (IOS, Amsterdam 2002) pp. 256-264

71.54 A. Howard, M.J. Matarić, G.S. Sukhatme: An
incremental self-deployment algorithm for mobile
sensor networks, Auton. Robot. 13(2), 113-126 (2002)

71.55 M. Batalin, G. Sukhatme: Spreading out: A local
approach to multi-robot coverage. In: Distributed
Autonomous Robotic Systems 5, ed. by H. Asama,
T. Arai, T. Fukuda, T. Hasegawa (Springer, Berlin,
Heidelberg 2002) pp. 373-382

71.56 B.B. Werger, M.J. Matarić: Robotic food chains:
Externalization of state and program for minimal-agent
foraging, Proc. 4th Int. Conf. Simul. Adapt.
Behav. Anim. Animat., ed. by P. Maes, M.J. Mataric,
J. Meyer, J. Pollack, S. Wilson (MIT, Cambridge 1996)
pp. 625-634

71.57 S. Nouyan, R. Groß, M. Bonani, F. Mondada,
M. Dorigo: Teamwork in self-organized robot
colonies, IEEE Trans. Evol. Comput. 13(4), 695-711
(2009)

71.58 L. Chittka, P. Skorupski, N.E. Raine: Speed-accuracy
tradeoffs in animal decision making, Trends Ecol.
Evol. 24(7), 400-407 (2009)

71.59 N.R. Franks, A. Dornhaus, J.P. Fitzsimmons,
M. Stevens: Speed versus accuracy in collective
decision making, Proc. R. Soc. B 270(1532), 2457-2463
(2003)

71.60 J.A.R. Marshall, A. Dornhaus, N.R. Franks,
T. Kovacs: Noise, cost and speed-accuracy trade-offs:
Decision-making in a decentralized system, J. R.
Soc. Interface 3(7), 243-254 (2006)

71.61 A. Gutiérrez, A. Campo, F.C. Santos,
F. Monasterio-Huelin, M. Dorigo: Social odometry:
Imitation based odometry in collective robotics, Int.
J. Adv. Robot. Syst. 6(2), 1-8 (2009)

71.62 F. Galton: Vox populi, Nature 75, 450-451 (1907)

71.63 A.M. Simons: Many wrongs: The advantage of
group navigation, Trends Ecol. Evol. 19(9), 453-455
(2004)


71.64 E.A. Codling, J.W. Pitchford, S.D. Simpson: Group
navigation and the many-wrongs principle in
models of animal movement, Ecology 88(7), 1864-1870
(2007)

71.65 J.-L. Deneubourg, S. Goss: Collective patterns and
decision-making, Ethol. Ecol. Evol. 1, 295-311 (1989)

71.66 R. Beckers, J.-L. Deneubourg, S. Goss,
J.M. Pasteels: Collective decision making through food
recruitment, Insectes Soc. 37(3), 258-267 (1990)

71.67 T.D. Seeley, S. Camazine, J. Sneyd: Collective
decision-making in honey bees: How colonies
choose among nectar sources, Behav. Ecol. Sociobiol.
28(4), 277-290 (1991)

71.68 T.D. Seeley, S.C. Buhrman: Nest-site selection in
honey bees: How well do swarms implement the
best-of-N decision rule?, Behav. Ecol. Sociobiol.
49(5), 416-427 (2001)

71.69 F. Saffre, R. Furey, B. Krafft, J.-L. Deneubourg:
Collective decision-making in social spiders:
Dragline-mediated amplification process acts as
a recruitment mechanism, J. Theor. Biol. 198(4),
507-517 (1999)

71.70 J.M. Amé, J. Halloy, C. Rivault, C. Detrain,
J.-L. Deneubourg: Collegial decision making based on
social amplification leads to optimal group formation,
Proc. Natl. Acad. Sci. USA 103(15), 5835-5840 (2006)

71.71 O. Petit, J. Gautrais, J.B. Leca, G. Theraulaz,
J.-L. Deneubourg: Collective decision-making in
white-faced capuchin monkeys, Proc. R. Soc. B
276(1672), 3495 (2009)

71.72 P. Michelena, R. Jeanson, J.-L. Deneubourg,
A.M. Sibbald: Personality and collective
decision-making in foraging herbivores, Proc. R. Soc. B
277(1684), 1093 (2010)

71.73 S. Goss, S. Aron, J.-L. Deneubourg, J.M. Pasteels:
Self-organized shortcuts in the Argentine
ant, Naturwissenschaften 76(12), 579-581 (1989)

71.74 D.J.T. Sumpter, J. Krause, R. James, I.D. Couzin,
A.J.W. Ward: Consensus decision making by fish,
Curr. Biol. 18(22), 1773-1777 (2008)

71.75 A.J.W. Ward, D.J.T. Sumpter, I.D. Couzin,
P.J.B. Hart, J. Krause: Quorum decision-making
facilitates information transfer in fish shoals, Proc.
Natl. Acad. Sci. USA 105(19), 6948 (2008)

71.76 S.C. Pratt, E.B. Mallon, D.J. Sumpter, N.R. Franks:
Quorum sensing, recruitment, and collective 
decision-making during colony emigration by the 
ant Leptothorax albipennis, Behav. Ecol. Sociobiol. 
52(2), 117-127 (2002) 

D.J.T. Sumpter, S.C. Pratt: Quorum responses and 
consensus decision making, Philos. Trans. R. Soc. 
B 364(1518), 743-753 (2009) 

S. Canonge, J.-L. Deneubourg, S. Sempo: Group liv- 
ing enhances individual resources discrimination: 
The use of public information by cockroaches to as- 
sess shelter quality, PLoS ONE 6(6), e19748 (2011) 
T.D. Seeley, P.K. Visscher, T. Schlegel, P.M. Hogan, 
N.R. Franks, J.A.R. Marshall: Stop signals pro- 


1393 


LZ | 4 Hed 


1394 Part F 


Swarm Intelligence 


LZ | 4 Hed 


71.80 


71.81 


71.82 


71.83 


71.84 


71.85 


vide cross inhibition in collective decision-making 
by honeybee swarms, Science 335(6064), 108-111 
(2012) 

S. Garnier, J. Gautrais, M. Asadpour, C. Jost, G. Ther- 
aulaz: Self-organized aggregation triggers collec- 
tive decision making in a group of cockroach-like 
robots, Adapt. Behav. 17(2), 109-133 (2009) 

J. Halloy, G. Sempo, G. Caprari, C. Rivault, M. Asad- 
pour, F. Tâche, I. Said, V. Durier, S. Canonge, 
J.M. Amé, C. Detrain, N. Correll, A. Martinoli, 
F. Mondada, R. Siegwart, J.L. Deneubourg: Social 
integration of robots into groups of cockroaches to 
control self-organized choices, Science 318(5853), 
1155 (2007) 

R. Olfati-Saber, R.M. Murray: Consensus problems 
in networks of agents with switching topology 
and time-delays, IEEE Trans. Autom. Control 49(9), 
1520-1533 (2004) 

R. Olfati-Saber, J.A. Fax, R.M. Murray: Consensus 
and cooperation in networked multi-agent sys- 
tems, Proc. IEEE 95(1), 215-233 (2007) 

T. Schmickl, R. Thenius, C. Moeslinger, G. Rad- 
spieler, S. Kernbach, M. Szymanski, K. Crailsheim: 
Get in touch: Cooperative decision making based 
on robot-to-robot collisions, Auton. Agents Multi- 
Agent Syst. 18(1), 133-155 (2009) 

K. Sugawara, T. Kazama, T. Watanabe: Forag- 
ing behavior of interacting robots with virtual 


71.86 


71.87 


71.88 


71.89 


pheromone, Proc. Int. Conf. Intell. Robot. Syst. 
(IROS 2004) (2004) pp. 3074-3079 

S. Garnier, F. Tache, M. Combe, A. Grimal, G. Ther- 
aulaz: Alice in pheromone land: An experimental 
setup for the study of ant-like robots, Proc. IEEE 
Swarm Intell. Symp. (SIS 2007), Piscataway (2007) 
pp. 37-44 

A. Campo, Á. Gutiérrez, S. Nouyan, C. Pinciroli, 
V. Longchamp, S. Garnier, M. Dorigo: Artificial 
pheromone for path selection by a foraging 
swarm of robots, Biol. Cybern. 103(5), 339-352 
(2010) 

E.J.H. Robinson, N.R. Franks, S. Ellis, S. Okuda, 
J.A.R. Marshall: A simple threshold rule is suffi- 
cient to explain sophisticated collective decision- 
making, PLoS ONE 6(5), e19981 (2011) 

M. Dorigo, D. Floreano, L.M. Gambardella, F. Mon- 
dada, S. Nolfi, T. Baaboura, M. Birattari, M. Bonani, 
M. Brambilla, A. Brutschy, D. Burnier, A. Campo, 
A.L. Christensen, A. Decugnire, G.A. Di Caro, 
F. Ducatelle, E. Ferrante, A. Fröster, J.M. Gonzales, 
J. Guzzi, V. Longchamp, S. Magnenat, N. Math- 
ews, M.A. de Montes Oca, R. O'Grady, C. Pinciroli, 
G. Pini, P. Rétornaz, J. Roberts, V. Sperati, T. Stirling, 
A. Stranieri, T. Stiitzle, V. Trianni, E. Tuci, A.E. Turgut, 
F. Vaussard: Swarmanoid: A novel concept for 
the study of heterogeneous robotic swarms, IEEE 
Robot. Autom. Mag. 20(4), 60-71 (2012) 


1395 


72. Collective Manipulation and Construction 


Lynne Parker 


Many practical applications can make use of robot collectives that
can manipulate objects and construct structures. Examples include
applications in warehousing, truck loading and unloading,
transporting large objects in industrial environments, and assembly
of large-scale structures. Creating such systems, however, can be
challenging. When collective robots work together to manipulate
physical objects in the environment, their interactions necessarily
become more tightly coupled. This need for tight coupling can lead
to important control challenges, since actions by some robots can
directly interfere with those of other robots. This chapter explores
techniques that have been developed to enable robot swarms to
effectively manipulate and construct objects in the environment.
The focus in this chapter is on decentralized manipulation and
construction techniques that would likely scale to large robot
swarms (at least 10 robots), rather than approaches aimed primarily
at smaller teams that attempt the same objectives. This chapter
first discusses the swarm task of object transportation; in this
domain, the objective is for robots to collectively move objects
through the environment to a goal destination. The chapter then
discusses object clustering and sorting, which requires objects in
the environment to be aggregated at one or more locations in the
environment. The final task discussed is that of collective
construction and wall building, in which robots work together to
build a prespecified structure. While these different tasks vary in
their specific objectives for collective manipulation, they also
have several commonalities. This chapter explores the state of the
art in this area.


72.1 Object Transportation
     72.1.1 Transport by Pushing
     72.1.2 Transport by Grasping
     72.1.3 Transport by Caging

72.2 Object Sorting and Clustering

72.3 Collective Construction and Wall Building

72.4 Conclusions

References


72.1 Object Transportation


Some of the earliest work in swarm robotics was aimed at the object
transportation task [72.1-6], which requires a swarm of robots to
move an object from its current position in the environment to some
goal destination. The primary benefit of using collective robots for
this task is that the individual robots can combine forces to move
objects that are too heavy for individual robots working alone or in
small teams. However, the task is not without its challenges; it is
nontrivial to design decentralized robot control algorithms that can
effectively coordinate robot team members during object
transportation. A further complication is that the interaction
dynamics of the robots with the object can be sensitive to certain
object geometries [72.7, 8] and object rotations during
transportation [72.8], thus exacerbating the control problem.

There are many ways to compare and contrast
alternative distributed techniques for collective object
transport. The most common distinctions are:


• Local knowledge only versus some required global
  knowledge (e.g., of team size, state, position).
• Homogeneous swarms versus heterogeneous
  swarms (e.g., teams with leaders and followers).
• Manual controller design versus autonomously
  learned control.
• 2-D (two-dimensional) versus 3-D (three-dimensional)
  environments.
• Obstacle-free environments versus cluttered
  environments.
• Static environments versus dynamic environments.
• Dependent on fully functioning robots versus
  systems robust to error.


Alternatively, we can compare transportation tech- 
niques by focusing on the specific manipulation tech- 
nique employed. The manipulation techniques used for 
collective object transportation can be grouped into 
three primary methods [72.9]: pushing, grasping, and 
caging. The pushing approach requires contact between 
each robot and the object, in order to impart force in the 
goal direction; however, the robots are not physically 
connected with the object. In the grasping approach, 
each robot in the swarm is physically attached to the 
object being transported. Finally, the caging approach 
involves robots encircling the object so that the object 
moves in the desired direction, even without the con- 
stant contact of all the robots with the object. 

This section outlines some of the key techniques 
developed to address this object transportation task, or- 
ganized according to these three main techniques. 


72.1.1 Transport by Pushing 


A canonical task often used as a testbed in distributed 
robotics is the box pushing task. The number, size, 
or weight of the boxes can be varied to explore dif- 
ferent types of multirobot cooperation. This task typ- 
ically involves robots first locating a box, positioning 
themselves at the box, and then moving the box co- 
operatively toward a goal position. Typically, this task 
is explored in 2-D. The domain of box pushing is 
also popular because it has relevance to several real- 
world applications [72.10], including warehouse stock- 
ing, truck loading and unloading, transporting large 
objects in industrial environments, and assembling of 
large-scale structures. 

The pushing technique was first demonstrated in 
the early work of Kube and Zhang [72.1], inspired 
by the cooperative transport behavior in ants [72.7]. 
They proposed a behavior-based approach that com- 
bined behaviors for seeking out the object (illuminated 
by a light), avoiding collisions, following other robots, 
and motion control. An additional behavior to detect
stagnation was used to ensure that the robots did not
persistently work against one another. In this approach,
all robots acted similarly; there was no concept of
a leader and followers. While some of the robots in
the swarm might not contribute to the pushing task due 
to poor alignment or positioning along the nondominant 
pushing direction, Kube and Zhang showed that care- 
ful design of these behaviors enabled the robot swarm 
to distribute along the boundary of the object and push 
it. Figure 72.1 shows five robots cooperatively pushing 
a lighted box. 
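
As a rough illustration of the behavior-based scheme described above, the following Python sketch arbitrates among a few behaviors by fixed priority and adds a stagnation check. It is a minimal sketch under stated assumptions, not the controller of Kube and Zhang: the sensor fields, the priority ordering, the thresholds, the gains, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# All names, thresholds, and gains below are hypothetical.
STAGNATION_STEPS = 20   # consecutive stalled steps before repositioning
MIN_PROGRESS = 1e-3     # displacement below this counts as "not moving"
TURN_GAIN = 0.8         # proportional steering gain
AVOID_TURN = 1.0        # fixed turn (rad) used to steer away from obstacles
REPOSITION_TURN = 2.0   # fixed turn (rad) used to leave a stalled contact


@dataclass
class Sensors:
    touching_box: bool       # robot is in contact with the box
    displacement: float      # robot's own motion during the last step
    obstacle_close: bool     # something other than the box is too close
    light_visible: bool      # the lighted box is in view
    light_bearing: float     # bearing to the light in rad (0 = straight ahead)
    neighbor_visible: bool   # another robot is in view
    neighbor_bearing: float  # bearing to that robot in rad


def select_action(s: Sensors, stalled_steps: int) -> Tuple[str, float, int]:
    """Return (behavior name, turn command, updated stall counter)."""
    # Count consecutive steps in which the robot pushes without moving.
    if s.touching_box and s.displacement < MIN_PROGRESS:
        stalled_steps += 1
    else:
        stalled_steps = 0

    # Fixed-priority arbitration: the first applicable behavior wins.
    if stalled_steps >= STAGNATION_STEPS:
        return "reposition", REPOSITION_TURN, 0
    if s.obstacle_close:
        return "avoid", AVOID_TURN, stalled_steps
    if s.light_visible:
        return "seek_box", TURN_GAIN * s.light_bearing, stalled_steps
    if s.neighbor_visible:
        return "follow", TURN_GAIN * s.neighbor_bearing, stalled_steps
    return "wander", 0.0, stalled_steps
```

Every robot runs the same function, which mirrors the absence of a leader in the approach; the stall counter is the only state carried between control steps.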

Other researchers have explored different aspects of
box pushing in multirobot systems. While much of this
early work involved demonstrations of smaller robot
teams, many of these techniques could theoretically
scale to larger numbers of robots. Task allocation and
action selection are often demonstrated using collective
box pushing experiments; examples of this work
include that of Parker [72.11, 12], who illustrated
aspects of adaptive task allocation and learning; Gerkey
and Matarić [72.13], who presented a publish/subscribe
dynamic task allocation method; and Yamada and
Saito [72.14], who developed a behavior-based action
selection technique that does not require any
communication.

Other work using box pushing as an implementation
domain for multirobot studies includes Donald
et al. [72.15], who illustrated concepts of information
invariance and the interchangeability of sensing,
communication, and control; Simmons et al. [72.16],
who demonstrated the feasibility of cooperative control
for building planetary habitats; Brown and
Jennings [72.17] and Böhringer et al. [72.18], who
explored notions of strong cooperation without
communication in pusher/steerer models; Rus et al. [72.19], who
studied different cooperative manipulation protocols in
robot teams that make use of different combinations
of state, sensing, and communication; and Jones and
Matarić [72.20], who developed general methods for
automatically synthesizing controllers for multirobot
systems.

Fig. 72.1 Demonstration of five robots collectively pushing
a lighted box (after [72.7])

Most of this existing work in box pushing has fo- 
cused, not on box pushing as the end objective, but 
rather on using box pushing for demonstrating various 
techniques for multirobot control. However, for studies 
whose primary objective is to generate robust cooper- 
ative transport techniques, work has more commonly 
focused on manipulation techniques involving grasp- 
ing and caging, rather than pushing, since grasping and 
caging provide more controllability by the robot team. 


72.1.2 Transport by Grasping 


Grasping approaches for object transportation in swarm 
robotics typically make use of form closure and force 
closure properties [72.21]. In form closure, the ob- 
ject motion is constrained via frictionless contact con- 
straints; in force closure, frictional contact forces ex- 
erted by the robots prevent unwanted motions of the 
manipulated object. The earliest work representing the 
grasping technique is that of Wang et al. [72.4]. This ap- 
proach uses form closure, along with a behavior-based 
control approach that is similar to the early swarm 
robot pushing technique of Kube and Zhang [72.1]. 
The technique of Wang et al., called BeRoSH (for
Behavior-based Multiple Robot System with Host for 
Object Manipulation), incorporates behaviors for push- 
ing, maintaining contact, moving, and avoiding objects. 
In this approach, the goal pose of the object is provided 
directly to each robot from an external source (i.e., 
the Host); otherwise, the robots work independently ac- 
cording to their designed behaviors. As a collective, the 
swarm exhibits form closure. Wang et al. showed that 
this form closure technique can successfully transport 
an object to its desired goal pose from a variety of dif- 
ferent starting locations. 

Another early work using the force closure grasping 
technique is that of Stilwell and Bay [72.2] and Johnson 
and Bay [72.3]. They developed distributed leader- 
follower techniques that enable swarms of tank-like 
robots to transport pallets collectively while maintain- 
ing a level height of the pallet during transportation 
(Fig. 72.2). In their approaches, a pallet sits atop sev- 
eral tank-like robots; the weight of the pallet creates 
a coupling with the robots that could be viewed as similar
to a grasp. To transport the pallet, one vehicle is
designated as the leader. This leader then perturbs the
dynamics of the system to move the swarm in the de- 
sired direction, and with the desired pallet height. The 
remaining robots in the swarm react to the perturbations 
to stabilize the forces in the system. The system is fully 
distributed, and requires robots to only use local force 
information to achieve the collective motion. The in- 
dividual robots do not require knowledge of the pallet 
mass or inertia, the size of the collective, or the robot 
positions relative to the pallet’s center of gravity. They 
showed the control stability of their approach for this 
application, even in the presence of inaccurate sensor 
data. 

A related approach is that of Kosuge and 
Oosumi [72.5], who also used a decentralized leader-follower
approach for multiple holonomic robots grasping
and moving an object, in a manner similar to that
of [72.2]. Their approach defines a compliant motion 
control algorithm for each velocity-controlled robot. 
The main difference of this work compared to [72.2] 
is that the control algorithm specifies the desired in- 
ternal force as part of the coordination algorithm. This 
approach was validated in simulation for robots carry- 
ing an aluminum steel pipe. 

Another related approach is that of Miyata 
et al. [72.6], who addressed the need for nonholonomic 
vehicles to regrasp the object during transport. Their ap- 
proach includes a hybrid system that makes use of both 
centralized and decentralized planners. The centralized 
planner develops an approximate motion plan for the 
object, along with a regrasping plan at low resolution; 
the decentralized planner precisely estimates object motion
and robot control at a much higher resolution. They
demonstrated the effectiveness of this approach in
simulation.

Fig. 72.2 Cooperative transport of a pallet using tank-like
robots (after [72.2])

Sugar and Kumar [72.22] developed distributed 
control algorithms enabling robots with manipulators to 
grasp and cooperatively transport a box. In this work, 
a novel manipulator design enables the locomotion 
control to be decoupled from the manipulation con- 
trol. Only a small number of the team members need 
to be equipped with actively controlled end effectors. 
This approach was shown to be robust to position- 
ing errors related to the misalignment between the two 
platforms and errors in the measurement of the box 
size. 

Cooperative stick pulling [72.23, 24] was explored 
by Ijspeert et al.; this task requires robots to pull sticks
out of the ground (Fig. 72.3). The robot controllers 
are behavior-based, and include actions such as look- 
ing for sticks, detecting sticks, gripping sticks, obstacle 
avoidance, and stick release. Experiments show that the 
dynamics are dependent on the ratio between the num- 
ber of robots and sticks; that collaboration can increase 
superlinearly with certain team sizes; that heterogene- 
ity in the robots can increase the collaboration rate 
in certain circumstances; and that a simple signalling 
scheme can increase the effectiveness of the collabo- 
ration for certain team sizes. A main objective of this 
research was to explore the effectiveness of various 
modeling techniques for group behavior. These model- 
ing techniques are discussed in more detail in a separate 
chapter. 

The SWARM-BOTS project is a more recent ex- 
ample of the use of grasping for collective transport; 
it also makes use of self-assembly as a novel approach 
for achieving distributed transport. In this work [72.25], 
s-bot robots are developed that have grippers en- 
abling the robots to create physical links with other 
s-bots or objects, thus creating assemblies of robots. 
These assemblies can then work together for naviga- 
tion across rough terrain, or to collectively transport 
objects. The s-bots are cylindrical, with a flexible arm
and toothed gripper that can connect one s-bot to another
(Fig. 72.4).

Fig. 72.3 Stick pulling experiment using robot collectives
(after [72.23])

The decentralized control of the SWARM-BOT 
robots is learned using evolutionary techniques in sim- 
ulation, then ported to the physical robots. The learned 
s-bot control [72.26] consists of an assembly module, 
which is a neural network that controls the robot prior 
to connection, and a transport module, which is a neu- 
ral network that enables the s-bot to move the object 
toward the goal after a grasp connection is made. The 
self-assembly process involves the use of a red-colored 
seed object, to which other s-bots are attracted. S-bots 
initially light themselves with a blue ring, and then are 
attracted to the red color, while being repulsed by the 
blue color. Once robots make a connection, they color 
themselves red. 

The interaction of these attractive and repulsive 
forces across the s-bots enables the robots to self- 
assemble into various connection patterns. Once the s- 
bots have self-assembled, they use the transport module 
to align toward a light source, which indicates the tar- 
get position. The s-bots then apply pushing and pulling 
motions to transport the object to the destination. Simi- 
lar to the approach of Kube and Zhang [72.1], the s-bots 
also check for stagnation and execute a recovery move 
when needed. The authors demonstrate [72.8] how the
evolutionary learning approach allows the collective
to successfully deal with different object geometries,
adapt to changes in target location, and scale to larger
team sizes.

Fig. 72.4 An s-bot, developed as part of the SWARM-BOTS
project (after [72.25])

This technique for collective transport using self- 
assembly was demonstrated [72.25] in an interesting 
application of object transport, in which 20 s-bots self- 
assembled into four chains in order to pull a child across 
the floor (Fig. 72.5). In this experiment, the user spec- 
ifies the number of assembled chains, the distribution 
of the s-bots into the chains, the global localization of 
the child, and the global action timing. The s-bots then 
autonomously form the chains using self-assembly and 
execute the pull. 

Fig. 72.5 SWARM-BOTS experiment in which s-bots
self-assemble to pull a child across the floor (after [72.25])

Several additional interesting phenomena regarding
collective transport were discovered in related studies
with the SWARM-BOTS. Nouyan et al. [72.27] showed
that the different collective tasks of path formation,
self-assembly, and group transport can be solved in
a single system using a homogeneous robot team. They
further introduced the notion of chains with cyclic
directional patterns, which facilitate swarm exploration
in unknown environments, and assist in establishing
paths between the object and goal. The paths established
by the robot-generated chains mimic pheromone
trails in ants. In [72.28], Groß and Dorigo determined
that, while robots that behave as if they are solitary
robots can still collectively move objects, robots that
learn transport behaviors in a group can achieve better
performance. In [72.29], Campo et al. showed that the
SWARM-BOTS robots could effectively transport objects
even with only partial knowledge of the direction
of the goal. They investigated four alternative control
strategies, which vary in the degree to which the robots
negotiate regarding the goal position during transport.
Their results showed that negotiating throughout object
transport can improve motion coordination. All of
these works are based on inspiration from biological
systems.
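
To make the idea of negotiating the goal direction more concrete, the sketch below has a robot blend its own heading estimate with the estimates it receives from teammates, using a circular mean so that headings on either side of ±π average correctly. This is only an illustrative consensus-style update; the function names and the blending weight are assumptions, and it is not one of the four strategies studied in [72.29].

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def circular_mean(angles):
    """Mean direction of a non-empty list of headings (radians)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def negotiate(own, received, weight=0.5):
    """Move `own` toward the circular mean of `received` headings.

    weight = 0 ignores teammates; weight = 1 adopts their mean.
    The weight is a hypothetical tuning parameter.
    """
    if not received:
        return own
    target = circular_mean(received)
    # Shortest signed angular difference from own estimate to the target.
    diff = wrap(target - own)
    return wrap(own + weight * diff)
```

Calling `negotiate` once per control step with the most recent messages corresponds loosely to negotiating throughout transport, whereas calling it only before the robots start moving corresponds to a one-off agreement.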

The work of Berman et al. [72.31] is not only bio- 
inspired, but also seeks to directly model the group 
retrieval behavior in ants. Their studies examined the 
ants’ roles during transport in order to define rules that 
govern the ants’ actions. They further explored mea- 
surements of individual forces used by the ants to guide 
food to their nest. They found that the distributed ant 
transport behavior exhibits an initial disordered phase, 
which then transitions to a more highly coordinated 
phase of increased load speed. From these studies, 
a computational dynamic model of the ant behavior was 
designed and implemented in simulations, showing that 
the derived model matches ant behavior. Ultimately, 
this approach could be adapted for use on physical robot 
teams. 

Once a robot collective has begun transporting an 
object, the question arises as to how new robots can 
join the group to help with the transport task. Es- 
posito [72.30] addresses this challenge by adapting 
a grasp quality function from the multifingered hand 
literature. This approach assumes that robots know the 
object geometry, the total number of robots in the 
swarm, and the actuator limitation. Individual robot 
contact configurations are defined relative to the ob- 
ject center and object boundary. The objective is to 
find an optimal position for a new robot by opti- 
mizing across the grasping wrench space. A numeri- 
cal algorithm was developed to address this problem, 
which incorporates the force closure criteria. This approach
was demonstrated on unmanned tugboats collectively
moving a barge, as illustrated in Fig. 72.6.
In this demonstration, the robots are equipped with
articulated magnetic attachments that allow them to
grasp the barge. This approach is scalable to larger
numbers of robots, with constant best-case runtime
and median runtimes polynomial in the number of
robots.

Fig. 72.6 Illustration of unmanned tugboats autonomously
transporting a barge (after [72.30])

Fig. 72.7a-e Illustration in simulation of object closure
by 20 robots (after [72.32])

Fig. 72.8 Demonstration of the use of vector fields for
collective transport via caging (after [72.33])


72.1.3 Transport by Caging 


The caging approach simplifies the object manipulation 
task, compared to the grasping approach, by making 
use of the concept of object closure [72.34]. In ob- 
ject closure, a bounded movable area is defined for 
the object by the robots surrounding it. The benefit 
of this approach is that continuous contact between 
the object and the robots is not needed, thus making 
for simpler motion planning and control techniques, 
compared to grasping techniques based on form or
force closure. Wang and Kumar [72.32] developed this
object-closure technique under the assumptions that the 
robots are circular and holonomic, the object is star- 
shaped, the robots know the number of robots in the 
collective, and can estimate the geometric properties 
of the object, along with the distance and orientation 
to other robots and the object. Their approach causes 
the robots to first approach the object independently, 
and then search for an inescapable formation, which 
is a configuration of the robots from which the object 
cannot escape. Finally, the robots execute a formation 
control strategy to guide the object to the goal des- 
tination. The object approach technique is based on 
potential fields, in which force vectors attract the robot 
toward the object and generally away from other robots. 
Song and Kumar [72.35] proved the stability of this po- 
tential field approach for collective transport. Robots 
search for proper configurations around the object by 
representing the problem as a path finding problem in 
configuration space. This work describes a necessary 
and sufficient condition for testing for object closure. 
Later work [72.36] presents a fast algorithm to test for 
object closure. Experiments with 20 robots validate the 
proposed approach (Fig. 72.7). 
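
The approach phase described above can be sketched as a simple potential-field velocity command: an attractive term pulls the robot toward the object, and repulsive terms push it away from nearby robots. The gains, the repulsion radius, and the function name are assumptions for illustration; this is not the controller of [72.32], nor the field whose stability is proved in [72.35].

```python
import math


def approach_velocity(robot, obj, others, k_att=1.0, k_rep=0.5, radius=1.0):
    """Planar velocity command (vx, vy) for one robot.

    robot, obj: (x, y) positions of this robot and of the object.
    others:     iterable of (x, y) positions of the other robots.
    k_att, k_rep, radius: hypothetical gains and repulsion range.
    """
    # Attractive term: proportional to the offset toward the object.
    vx = k_att * (obj[0] - robot[0])
    vy = k_att * (obj[1] - robot[1])

    for ox, oy in others:
        dx, dy = robot[0] - ox, robot[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < radius:
            # Repulsion grows as robots get closer and vanishes at `radius`.
            push = k_rep * (1.0 / d - 1.0 / radius)
            vx += push * dx / d
            vy += push * dy / d
    return vx, vy
```

Because each robot needs only its own position relative to the object and to nearby robots, the command can be computed locally, which is in keeping with the decentralized setting of this section.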

A further enhancement of this vector-based control 
strategy was developed in [72.33], which can account 
for inter-robot collisions. This latter strategy imple- 
ments three primary behaviors — approach, surround, 
and transport. In this variant of the work, robots con- 
verge to a smooth boundary using control-theoretic 
techniques. This work was implemented on a collective 
of physical robots, as illustrated in Fig. 72.8. 



72.2 Object Sorting and Clustering 


Collective object sorting and clustering requires robot 
teams to sort objects from multiple classes, typically 
into separate physical clusters. There are different types 
of related tasks in this domain [72.37], including clus- 
tering, segregation, patch sorting, and annular sorting. 
Early discussions of this task in robot swarms were 
given by Deneubourg et al. [72.38], with the ideas
inspired by similar behaviors in ant colonies. The ob- 
jective is to achieve clustering and sorting behaviors 
without any need for hierarchical decision making, 
inter-robot communication, or global representations of 
the environment. Deneubourg et al. showed that stig- 
mergy could be used to cluster scattered objects of 
a single type, or to sort objects of two different types. To 
achieve the sorting behavior, the robots sensed the local 
densities of the objects, as well as the type of object 
they were carrying. Clustering resulted from a simi- 
lar mechanism operating on a single type of object. 
Beckers et al. [72.39] achieved clustering from even
simpler robots and behaviors, via stigmergic threshold 
mechanisms. 
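
The density-dependent pick-up and drop behavior can be illustrated with the probability functions commonly associated with the Deneubourg et al. model: an unladen agent picks up an object with probability (k1/(k1 + f))^2, and a laden agent drops its object with probability (f/(k2 + f))^2, where f is the locally perceived fraction of similar objects. The sketch below is a minimal illustration; the values of k1 and k2, the function names, and the way f is obtained are assumptions.

```python
import random


def pick_probability(f, k1=0.1):
    """Probability of picking up an object; high where similar objects are sparse."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.3):
    """Probability of dropping a carried object; high where similar objects are dense."""
    return (f / (k2 + f)) ** 2


def decide(carrying, f, rng=random):
    """Return 'pick', 'drop', or 'keep' for one agent at one step.

    carrying: whether the agent currently holds an object.
    f:        perceived local fraction (0..1) of objects of the relevant type.
    rng:      any object with a random() method returning a value in [0, 1).
    """
    if carrying:
        return "drop" if rng.random() < drop_probability(f) else "keep"
    return "pick" if rng.random() < pick_probability(f) else "keep"
```

With a single object type these rules produce clustering; sorting follows when f is computed separately for each type, so that an agent tends to drop an object only among others of the same type.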

Holland and Melhuish [72.37] explored the ef- 
fect of stigmergy and self-organization in swarms of 
homogeneous physical robots. The robots are pro- 
grammed with simple rule sets with no ability for 
spatial orientation or memory. The experiments show 
the ability of the robots to achieve effective sort- 
ing and clustering, as illustrated in Fig. 72.9. In this 
work, a variety of influences were explored, includ- 
ing boundary effects and the distance between ob- 
jects when deposited. The authors concluded that the 
effectiveness of the developed sorting behaviors is 
critically dependent on the exploitation of real-world 
physics. An implication of this finding is that simu- 
lators must be used with care when exploring these 
behaviors. 

Wang and Zhang [72.40, 41] explored similar aims, 
but focused on discovering a general approach to the 
sorting problem. They conjecture that the outcome of 
the sorting task is dependent primarily on the capabil- 
ities of the robots, rather than the initial configuration. 
This conjecture is validated in simulation experiments, 
as illustrated in Fig. 72.10. 

Other work on this topic includes that of Yang
and Kamel [72.42], who present research using three
colonies of ants having different speed models. The
approach is a two-step process. The first step is for
clusterings to be visually formed on the plane by agents
walking, picking up, or setting down objects according
to a probabilistic model, which is based on Deneubourg
et al. [72.38]. The second step is for clusters to be
combined using a hypergraph model. Experiments were
conducted in simulation to show the viability of the
approach. The authors also discovered that having too
many agents can lead to a deterioration in the swarm
performance.

Martinoli and Mondada [72.43] implemented an- 
other bio-inspired approach to clustering, in which the 
robot behavior is similar to that of a Braitenberg vehi- 
cle. They also discovered that large numbers of robots 
can cause interference in this task, concluding that
noncooperative tasks cannot always be improved with more
robots.




Fig. 72.9 Results of physical robot experiments in sorting. Panel 
(a) shows the starting configuration, while (b) shows the sorting 
results after 1.75 h (after [72.37]) 


Fig. 72.10 Results of simulations of sorting tasks, with 8 
robots and 40 objects of two types (after [72.40]) 




72.3 Collective Construction and Wall Building

The objective of the collective construction and wall
building task is for robots to build structures of a
specified form, in either 2-D or 3-D. This task is
distinguished from self-reconfigurable robots, whose bodies
themselves serve as the dynamic structure. This section
is focused on the former situation, in which manipulation
is required to create the desired structure.
The argument in favor of this separation of mobility
and structure is that, once formed, the structure
does not need to move again, and thus the ability
to move could serve as a liability [72.44]. Furthermore,
robotic units that serve both as mobility and
structure might not be effective as passive structural
elements.

Werfel et al. have extensively explored this topic,
developing distributed algorithms that enable simplified
robots to build structures based on provided blueprints,
both in 2-D [72.45-47] and in 3-D [72.44]. In their
3-D approach, the system consists of idealized mobile
robots that perform the construction, and smart blocks
that serve as the passive structure. The robots' job is to
provide the mobility, while the blocks' role is to identify
places in the growing structure at which an additional
block can be placed that is on the path toward obtaining
the desired final structure. The goal of their work is
to be able to deploy some number of robots and free
blocks into a construction zone, along with a single
block that serves as a seed for the structure, and then
have the construction proceed autonomously according
to the provided blueprint of the desired structure.

Several simplifying assumptions are made in this
work [72.44], such as the environment being weightless
and the robots being free to move in any direction
in 3-D, including along the surface of the structure
under construction. This work does not address physical
robot navigation and locomotion challenges, grasping
challenges, etc.

Fig. 72.11 Experiments for a variety of 3-D structures, built
autonomously by a system of simple robots and blocks; panels are
labeled 512, 451, 220, 258, and 465 blocks (after [72.44])


Fig. 72.12a-f Proof-of-principle experiments for 2-D con- 
struction, using a single robot (after [72.45]) 


Collective Manipulation and Construction | 72.3 Collective Construction and Wall Building 1403

Fig. 72.13 Geometric structures built by a team of 30
robots, in simulation (after [72.48])

Fig. 72.14 Experiments with prototype hardware designed for
multirobot construction tasks; panels are labeled Step 1-1,
Step 1-2, Step 2, Step 3-1, and Step 3-2 (after [72.49])

In this approach, blocks are smart cubes; they
can communicate with attached neighbors, they share
a global coordinate system, and they can communicate
with passing robots regarding the validity of block
attachments to exposed faces. Once robots have transported
a free block to the structure, they locate attachment
points in one of three ways: random movement,
systematic search, or gradient following. A significant
contribution of this work is the development of
the block algorithm that enables the blocks to specify
how to grow the developing structure with guarantees,
and with only limited required communication. More
specifically, the communication requirement between
blocks scales linearly in the size of the structure, while
no explicit communication between the mobile robots
is needed.
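The division of labor described above, in which blocks decide where attachment is valid and robots merely find such a place, can be sketched in a few lines. The following is a deliberately simplified 2-D illustration and not the published block algorithm: it only requires that a candidate site lies in the blueprint, is empty, and touches an already placed block, and it omits the additional conditions and guarantees of the algorithm in [72.44-47]. The grid representation and the names valid_attachment_sites and build_order are hypothetical.

```python
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def valid_attachment_sites(blueprint: Set[Cell], structure: Set[Cell]) -> Set[Cell]:
    """Return empty blueprint cells that touch at least one placed block.

    This stands in for the answer a placed block would give a passing robot
    about its exposed faces (simplified assumption, see lead-in).
    """
    sites = set()
    for (x, y) in structure:
        for dx, dy in NEIGHBORS:
            cell = (x + dx, y + dy)
            if cell in blueprint and cell not in structure:
                sites.add(cell)
    return sites


def build_order(blueprint: Set[Cell], seed: Cell, rng: random.Random) -> List[Cell]:
    """Grow the structure from the seed block until the blueprint is filled.

    A robot using random movement is modeled as picking a valid site
    uniformly at random; no robot-to-robot communication is involved.
    """
    structure = {seed}
    order = [seed]
    while structure != blueprint:
        sites = valid_attachment_sites(blueprint, structure)
        if not sites:  # blueprint not reachable from the seed
            break
        site = rng.choice(sorted(sites))
        structure.add(site)
        order.append(site)
    return order


if __name__ == "__main__":
    square = {(x, y) for x in range(3) for y in range(3)}
    print(build_order(square, (0, 0), random.Random(1)))
```

In this sketch all knowledge of the blueprint is consulted only through the blocks' validity check, which mirrors the idea that the robots themselves need no explicit communication with one another.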

Experiments using this approach have shown the 
ability of the system to build a variety of structures 
in simulation, as illustrated in Fig. 72.11. A proof-of- 
principle physical robot experiment using a single robot 
in the 2-D case [72.45] is illustrated in Fig. 72.12. 

Werfel [72.48] also describes a system for arranging 
inert blocks into arbitrary shapes. The input to the robot 
system is a high-level geometric program, which is then 
translated by the robots into an appropriate arrange- 
ment of blocks using their programmed behaviors. The 
desired structure is communicated to the robots as a list 
of corners, the angles between corners, and whether 
the connection between corners is to be straight or 
curved. Robots are provided with behaviors such as 
clearing, doneClearing, beCorner, collect, seal, and 
off. Figure 72.13 shows some example structures built 
using this system in simulation. 


Fig. 72.15 Experimental trial demonstrating a swarm 
building a loose wall via a spatiotemporal varying template 
(after [72.50]) 


Hardware challenges of collective robot construc- 
tion are addressed by Terada and Murata [72.49]. In 
this work, a hardware design is proposed that defines 
passive building blocks, along with an assembler robot 
that constructs structures with the blocks. Figure 72.14
shows the prototype hardware completing an assem- 
bly task. In principle, multiple assembler robots could 
work together to create larger construction teams more 
closely aligned with the concept of swarm construc- 
tion. 

Other related work on the topic of collective con- 
struction includes the work of Wawerla et al. [72.51], 
in which robots use a behavior-based approach to build 
a linear wall using blocks equipped with either posi- 
tive or negative Velcro, distinguished by block color. 
Their results show that adding 1 bit of state information
to communicate the color of the last attached block
provides a significant improvement in the collective 
performance. The work by Stewart and Russell [72.50, 
52] proposes a distributed approach to building a loose 
wall structure with a robot swarm. The approach makes 
use of a spatiotemporal varying light-field template, 
which is generated by an organizer robot to help di- 
rect the actions of the builder robots. Builder robots 
deposit objects in locations indicated by the template. 


Fig. 72.16 Experiments of blind bulldozing for site clearing using 
physical robots (after [72.53]) 



Figure 72.15 shows the results from one of the experi- 
ments using this approach on physical robots. 

Another type of construction is called blind bulldozing,
which is inspired by a behavior observed in certain
ant colonies. Rather than constructing by accumulating
materials, this approach achieves construction by
removing materials. This task has practical application
in site clearing, such as would be needed for planetary
exploration [72.54]. Early ideas of this concept were
discussed by Brooks et al. [72.55], who argue for
large numbers of small robots to be delivered to the
lunar surface for site preparation. Parker et al. [72.53]
further developed this idea by proposing robots that use
force sensors to clear an area by pushing material to
the edges of the work site. In this approach, the collective
behavior of the robot system is modeled in terms of how
the nest grows over time. Stigmergy is used to control
the construction process, in that the work achieved by
each robot affects the other robots' behaviors through
the environment. Figure 72.16 shows some experiments
using this approach on physical robots. The authors
argue that blind bulldozing is appropriate in applications
where the cost, complexity, and reliability of the robots
are a concern.


72.4 Conclusions


This chapter has surveyed some of the important
techniques that have been developed for collective object
transport and manipulation. While many advances have
been made, many open challenges remain. Open problems
include: how to deal with
faults in the robot team members during task execution;
how to address construction in dynamic and cluttered
environments; how to enable humans to interact with
the robot swarms; how to extend more of the existing
techniques to 3-D applications; how to design formal
techniques for predicting and guaranteeing swarm
behavior; how to realize larger scale systems on physical
robots; and how to apply swarm techniques for
manipulation and construction to practical applications.


References 
72.1 C.R. Kube, H. Zhang: Collective robotics: From social insects to robots, Adapt. Behav. 2(2), 189-218 (1993)
72.2 D. Stilwell, J.S. Bay: Toward the development of a material transport system using swarms of ant-like robots, IEEE Int. Conf. Robot. Autom. (1993) pp. 766-771
72.3 P.J. Johnson, J.S. Bay: Distributed control of simulated autonomous mobile robot collectives in payload transportation, Auton. Robot. 2(1), 43-63 (1995)
72.4 Z. Wang, E. Nakano, T. Matsukawa: Realizing cooperative object manipulation using multiple behaviour-based robots, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst. (1996) pp. 310-317
72.5 K. Kosuge, T. Oosumi: Decentralized control of multiple robots handling an object, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst. (1996) pp. 318-323
72.6 N. Miyata, J. Ota, Y. Aiyama, J. Sasaki, T. Arai: Cooperative transport system with regrasping car-like mobile robots, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst. (1997) pp. 1754-1761
72.7 C.R. Kube, E. Bonabeau: Cooperative transport by ants and robots, Robot. Auton. Syst. 30(1), 85-101 (2000)
72.8 R. Groß, M. Dorigo: Towards group transport by swarms of robots, Int. J. Bio-Inspir. Comput. 1(1), 1-13 (2009)
72.9 Y. Mohan, S.G. Ponnambalam: An extensive review of research in swarm robotics, World Congr. Nat. Biol. Inspir. Comput. 2009 (2009) pp. 140-145
72.10 D. Nardi, A. Farinelli, L. Iocchi: Multirobot systems: A classification focused on coordination, IEEE Trans. Syst. Man Cybern. B 34(5), 2015-2028 (2004)
72.11 L.E. Parker: ALLIANCE: An architecture for fault tolerant, cooperative control of heterogeneous mobile robots, Proc. IEEE/RSJ/GI Int. Conf. Intell. Robot. Syst. (1994) pp. 776-783
72.12 L.E. Parker: Lifelong adaptation in heterogeneous teams: Response to continual variation in individual robot performance, Auton. Robot. 8(3), 239-269 (2000)
72.13 B.P. Gerkey, M.J. Mataric: Sold! Auction methods for multi-robot coordination, IEEE Trans. Robot. Autom. 18(5), 758-768 (2002)
72.14 S. Yamada, J. Saito: Adaptive action selection without explicit communication for multirobot box-pushing, IEEE Trans. Syst. Man Cybern. C 31(3), 398-404 (2001)
72.15 B. Donald, J. Jennings, D. Rus: Analyzing teams of cooperating mobile robots, IEEE Int. Conf. Robot. Autom. (1994) pp. 1896-1903
72.16 R. Simmons, S. Singh, D. Hershberger, J. Ramos, T. Smith: First results in the coordination of heterogeneous robots for large-scale assembly, ISER 7th Int. Symp. Exp. Robot. (2000)
72.17 R.G. Brown, J.S. Jennings: A pusher/steerer model for strongly cooperative mobile robot manipulation, Proc. 1995 IEEE Int. Conf. Intell. Robot. Syst. (1995) pp. 562-568
72.18 K. Bohringer, R. Brown, B. Donald, J. Jennings, D. Rus: Distributed robotic manipulation: Experiments in minimalism, Lect. Notes Comput. Sci. 223, 11-25 (1997)
72.19 D. Rus, B. Donald, J. Jennings: Moving furniture with teams of autonomous robots, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst. (1995) pp. 235-242
72.20 C. Jones, M.J. Mataric: Automatic synthesis of communication-based coordinated multi-robot systems, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2004) pp. 381-387
72.21 A. Bicchi: On the closure properties of robotic grasping, Int. J. Robot. Res. 14(4), 319-334 (1995)
72.22 T.G. Sugar, V. Kumar: Control of cooperating mobile manipulators, IEEE Trans. Robot. Autom. 18(1), 94-103 (2002)
72.23 A.J. Ijspeert, A. Martinoli, A. Billard, L.M. Gambardella: Collaboration through the exploitation of local interactions in autonomous collective robotics: The stick pulling experiment, Auton. Robot. 11(2), 149-171 (2001)
72.24 A. Martinoli, K. Easton, W. Agassounon: Modeling swarm robotic systems: A case study in collaborative distributed manipulation, Int. J. Robot. Res. 23(4/5), 415-436 (2004)
72.25 F. Mondada, L.M. Gambardella, D. Floreano, S. Nolfi, J.L. Deneubourg, M. Dorigo: The cooperation of swarm-bots: Physical interactions in collective robotics, IEEE Robot. Autom. Mag. 12(2), 21-28 (2005)
72.26 R. Groß, E. Tuci, M. Dorigo, M. Bonani, F. Mondada: Object transport by modular robots that self-assemble, IEEE Int. Conf. Robot. Autom. (2006) pp. 2558-2564
72.27 S. Nouyan, R. Groß, M. Bonani, F. Mondada, M. Dorigo: Group transport along a robot chain in a self-organised robot colony, Proc. 9th Int. Conf. Intell. Auton. Syst. (2006) pp. 433-442
72.28 R. Groß, M. Dorigo: Evolution of solitary and group transport behaviors for autonomous robots capable of self-assembling, Adapt. Behav. 16(5), 285-305 (2008)
72.29 A. Campo, S. Nouyan, M. Birattari, R. Groß, M. Dorigo: Negotiation of goal direction for cooperative transport, Lect. Notes Comput. Sci. 4150, 191-202 (2006)
72.30 J.M. Esposito: Distributed grasp synthesis for swarm manipulation with applications to autonomous tugboats, IEEE Int. Conf. Robot. Autom. (2008) pp. 1489-1494
72.31 S. Berman, Q. Lindsey, M.S. Sakar, V. Kumar, S.C. Pratt: Experimental study and modeling of group retrieval in ants as an approach to collective transport in swarm robotic systems, Proc. IEEE 99(9), 1470-1481 (2011)
72.32 Z. Wang, V. Kumar: Object closure and manipulation by multiple cooperating mobile robots, IEEE Int. Conf. Robot. Autom. (2002) pp. 394-399
72.33 J. Fink, N. Michael, V. Kumar: Composition of vector fields for multi-robot manipulation via caging, Robot. Sci. Syst. (2007) pp. 25-32
72.34 Z.D. Wang, V. Kumar: Object closure and manipulation by multiple cooperating mobile robots, IEEE Int. Conf. Robot. Autom. (2002) pp. 394-399
72.35 P. Song, V. Kumar: A potential field based approach to multi-robot manipulation, IEEE Int. Conf. Robot. Autom. (2002) pp. 1217-1222
72.36 Z. Wang, Y. Hirata, K. Kosuge: Control a rigid caging formation for cooperative object transportation by multiple mobile robots, Proc. IEEE Int. Conf. Robot. Autom. (2004) pp. 1580-1585
72.37 O. Holland, C. Melhuish: Stigmergy, self-organization, and sorting in collective robotics, Artif. Life 5(2), 173-202 (1999)
72.38 J.L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, L. Chretien: The dynamics of collective sorting robot-like ants and ant-like robots, Proc. 1st Int. Conf. Simul. Adapt. Behav. Anim. Anim. (1990)
72.39 R. Beckers, O. Holland, J. Deneubourg: From local actions to global tasks: Stigmergy and collective robotics, Proc. 14th Int. Workshop Synth. Simul. Living Syst. (1994) pp. 181-189
72.40 T. Wang, H. Zhang: Multi-robot collective sorting with local sensing, IEEE Intell. Autom. Conf. (2003)
72.41 T. Wang, H. Zhang: Collective Sorting with Multiple Robots, IEEE Int. Conf. Robot. Biomim. (2004) pp. 716-720
72.42 Y. Yang, M. Kamel: Clustering ensemble using swarm intelligence, IEEE, Swarm Intell. Symp. (2003) pp. 65-71
72.43 A. Martinoli, F. Mondada: Collective and cooperative group behaviours: Biologically inspired experiments in robotics, Lect. Notes Comput. Sci. 223, 1-10 (1997)
72.44 J. Werfel, R. Nagpal: Three-dimensional construction with mobile robots and modular blocks, Int. J. Robot. Res. 27(3/4), 463-479 (2008)
72.45 J. Werfel, Y. Bar-Yam, D. Rus, R. Nagpal: Distributed construction by mobile robots with enhanced building blocks, IEEE Int. Conf. Robot. Autom. (2006) pp. 2787-2794
72.46 J. Werfel: Building patterned structures with robot swarms, Proc. 19th Int. Joint Conf. Artif. Intell. (2005) pp. 1495-1502
72.47 J. Werfel, R. Nagpal: Extended stigmergy in collective construction, IEEE Intell. Syst. 21(2), 20-28 (2006)
72.48 J. Werfel: Building blocks for multi-robot construction, Distrib. Auton. Robot. Syst. 6, 285-294 (2007)
72.49 Y. Terada, S. Murata: Automatic modular assembly system and its distributed control, Int. J. Robot. Res. 27, 445-462 (2008)
72.50 R.L. Stewart, R.A. Russell: A distributed feedback mechanism to regulate wall construction by a robotic swarm, Adapt. Behav. 14(1), 21-51 (2006)
72.51 J. Wawerla, G.S. Sukhatme, M.J. Mataric: Collective construction with multiple robots, IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2002) pp. 2696-2701
72.52 R.L. Stewart, R.A. Russell: Building a loose wall structure with a robotic swarm using a spatio-temporal varying template, IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2004) pp. 712-716
72.53 C.A.C. Parker, H. Zhang, C.R. Kube: Blind bulldozing: Multiple robot nest construction, IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2003) pp. 2010-2015
72.54 T. Huntsberger, G. Rodriguez, P.S. Schenker: Robotics challenges for robotic and human mars exploration, Proc. Robot. (2000) pp. 340-346
72.55 R.A. Brooks, P. Maes, M.J. Mataric, G. More: Lunar based construction robots, Proc. Int. Conf. Intell. Robot. Syst. (1990)


73. Reconfigurable Robots


Kasper Støy


Reconfigurable robots are robots built from mechatronics
modules that can be connected in different
ways to create task-specific robot morphologies.
In this chapter we introduce reconfigurable robots
and provide a brief taxonomy of this type of robot.
However, the main focus of this chapter is on the
four most important challenges in realizing
reconfigurable robots. The first two are mechatronics
challenges, namely the challenges of connector
design and energy. Connectors are the most important
design element of any reconfigurable robot
because they provide it with much of its functionality,
but also many of its limitations. Supplying energy
to a connected, distributed multi-robot system
such as a reconfigurable robot is an important, but
often underestimated problem. The third challenge
is distributed control of reconfigurable robots. We
examine both how reconfigurable robots can be
controlled in static configurations to produce
locomotion and manipulation and how configurations
can be transformed through a self-reconfiguration
process. The fourth challenge that we will discuss
is programmability and debugging of reconfigurable
robot systems. The chapter is concluded with a brief
perspective. Overall, the chapter provides a general
overview of the field of reconfigurable robots and
is a perfect starting point for anyone interested in
this exciting field.


73.1 Mechatronics System Integration
73.2 Connection Mechanisms
73.3 Energy
73.4 Distributed Control
  73.4.1 Communication
  73.4.2 Locomotion
  73.4.3 Self-Reconfiguration
  73.4.4 Manipulation
73.5 Programmability and Debugging
  73.5.1 Iterative, Incremental Programming and Debugging
  73.5.2 [title illegible in source]
  73.5.3 Emerging Solutions
73.6 Perspective
73.7 Further Reading
References


Reconfigurable robots are a kind of robot built from
modules that can be connected in different ways to
form different morphologies for different purposes. The
motivation for this is that conventional robots are
limited by their morphology. For example, the size of the
wheels of a wheeled robot determines what terrain it can
traverse. If it has small wheels it can operate in confined
spaces, but not traverse rugged terrain and, vice versa, if
it has large wheels. A robot arm may be limited in terms
of reach or inability to move around in the environment.
Reconfigurable robots aim to solve this problem
by providing a robotic system that can be manually
reconfigured to be physically suited for the specific task
at hand. It is conceivable that a reconfigurable robot
on-site can be fitted with several types of wheels, and that
different segments can be added to form the body or arms.
In this way, the reconfigurable robot can be a perfect
physical fit for its task despite the task not being known
or defined in advance.

While this practically oriented motivation is sufficient
for developing and working with reconfigurable
robots, the vision for the field is much deeper. The
underlying, long-term vision is to develop robots that are
robust, self-healing, versatile, cheap, and autonomous.
Hence most research in the field of reconfigurable
robots has been on self-reconfigurable robots. These
robots consist of modules just like reconfigurable
robots, but in addition the modules can automatically
connect, disconnect, and move with respect to neighboring
modules, and thus the robot as a whole can
change shape, not unlike the robot systems envisioned
in science fiction movies such as Terminator and
Transformers or the children's cartoon Barbapapa. Another
way to view self-reconfigurable robots is as
multi-cellular robots [73.1]. In this case, each module is
comparable to a cell in an organism. In fact, the small animal
hydra is able to self-reconfigure in the sense that if it is
cut in half, the two sections each form a smaller, but
complete hydra. While not strictly a multi-cellular
organism, the slime mold Dictyostelium has also been
a significant inspiration to the self-reconfigurable robot
community. In this slime mold individual cells search
for food in a local area, but once the food sources are
used up the cells aggregate, as is shown in Fig. 73.1, to
form slugs that number in the hundreds of thousands
of cells. Such a slug, however, is still able to function as
one organism, whose mission is to find a suitable
place to disperse spores for the next generation of slime
molds.

Fig. 73.1 A gathering of Dictyostelium discoideum amoebae
cells can be seen migrating (some individually, and
some in streams) toward a central point. Aggregation
territories can be as much as a centimeter in diameter
(after [73.2])

The modular design gives reconfigurable robots
a number of useful features. First of all the robots are
robust since if modules fail, the rest of the modules will
still continue working and thus the robot can maintain
a level of functionality, although it is slightly reduced.
Self-reconfigurable robots extend this by being able
to eject failed modules from the robot and replace them
with modules from other parts of their bodies and
effectively achieve self-healing. A powerful demonstration
of this uses the CKBot shown in Fig. 73.2: after a kick
has broken the robot into several clusters of modules,
these clusters are able to locate each other again and
recreate the original structure [73.3]. Another vision of
reconfigurable robots is that they are versatile due to
their modular structure. Reconfigurable robots are not
limited by a fixed shape, but only by the number and
capabilities of the modules available. A final feature is that
the individual modules can be produced relatively cheaply
due to production at scale. The implication is that even
though the individual modules can be quite complex, they
can be mass produced and thus their cost can be
relatively low compared to their complexity.

Fig. 73.2 The CKBot in a chain configuration with wheels
attached (courtesy of Modlab at University of Pennsylvania)

It is also important to point out that reconfigurable 
robots can never be a universal robot that can take on the 
functionality of any robot. It is, of course, the ultimate 
dream, but given a known task-environment a robot can 
be custom-designed for this and thus will be simpler 
and better performing than a comparable reconfigurable 
robot. Thus reconfigurable robots are best suited for sit- 
uations where the task or environment is not known in 
advance or in locations where many different types of 
robots are needed, but it is not possible to bring them 
all. Optimal applications for reconfigurable robots are 
thus in extra-planetary missions, disaster areas, and so 
on. However, they may also find use in more down-to-earth
applications such as educational robotics [73.4]
or as robot construction kits [73.5].

It is important to note that these are all visions that 
the reconfigurable robot community is striving towards, 
but has only realized in limited ways. However, it is the 
vision of these truly amazing reconfigurable robots that 
drives us forward. 


Reconfigurable Robots | 73.1 Mechatronics System Integration 


73.1 Mechatronics System Integration 


In mechatronics system integration the main challenge 
is how to make a trade-off between different potential 
features of a module and the need to fit everything in 
a mechatronics module, which typically has a radius on 
the order of centimeters. The different classes of recon- 
figurable robots reflect different trade-offs. 

The oldest class is the mobile reconfigurable robot,
which can be traced back to the cellular robot (CEBOT)
developed in the late 1980s [73.6], but new instantiations
of this class, which are an order of magnitude
smaller, have also recently been published [73.7-9].
This class of reconfigurable robots is characterized by
the modules having a high degree of self-mobility,
typically obtained by providing each module with a set of
wheels. Modules of other classes also have a limited
degree of self-mobility, e.g., a module can perform
inchworm-like gaits, but it is so limited that we do not
consider it mobile.

Chain-type reconfigurable robots (Fig. 73.3a) were
the first of the modern reconfigurable robots in the
sense that they successfully demonstrated versatility:
given the same reconfigurable robot several locomotion
gaits could be implemented, including inch-worming,
rolling, and even walking, as was demonstrated using
the PolyPod robot [73.10, 11] and later its descendant
PolyBot [73.12] and CONRO [73.13]. The chain-type
modules are characterized by having a high degree of
internal actuation, which typically allows a module to
bend or twist. However, there are examples of modules
which provide for both bending and twisting [73.14].
These modules are also typically elongated, which
makes them suitable for forming chains of modules
with many degrees of freedom, making them appropriate
for making limbs.

Fig. 73.3a-c Examples of the three main types of
reconfigurable robots: (a) CONRO chain-type, (b) Molecule
lattice-type, (c) M-TRAN hybrid, courtesy of USC's
Information Sciences Institute (a); Distributed Robotics
Laboratory, MIT (b); AIST, Japan (c)

Another class of reconfigurable robots is the
lattice-type robots (Fig. 73.3b). This class of robot addresses
one of the shortcomings of chain-type robots, namely
that it is difficult for chains to align and connect
because this requires precision that they often do not have.
This makes it difficult to achieve self-reconfiguration
with chain-type modular robots. The solution that
lattice-type reconfigurable robots represent is a
geometric design that allows the modules to fit in a lattice
just like atoms in a crystal. The movement of the modules
is then limited to moving from lattice position to lattice
position, a task that only requires limited precision, sensing,
and actuation. However, this often means that modules
have limited functionality outside of the lattice. Early
two-dimensional lattice-type robots include the Fracta
and Metamorphic robots [73.15, 16]; the first
three-dimensional lattice-type robots were the Molecule and the
3-D-Unit [73.17, 18].
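The restriction of module motion to discrete lattice positions can be made concrete with a small sketch. It models modules as occupied cells of a cubic lattice and checks whether a single module may move to a nearby cell. The specific rules used here (the destination must be empty, at most one cell away along each axis, and the robot must remain face-connected after the move) are illustrative assumptions and do not describe any particular robot cited in this chapter; the names is_connected and is_valid_move are hypothetical.

```python
from collections import deque
from typing import Set, Tuple

Cell = Tuple[int, int, int]
FACE_NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1))


def neighbors(cell: Cell):
    """Yield the six face-adjacent lattice positions of a cell."""
    x, y, z = cell
    for dx, dy, dz in FACE_NEIGHBORS:
        yield (x + dx, y + dy, z + dz)


def is_connected(modules: Set[Cell]) -> bool:
    """Breadth-first search: do all modules form one face-connected body?"""
    if not modules:
        return True
    start = next(iter(modules))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for n in neighbors(current):
            if n in modules and n not in seen:
                seen.add(n)
                queue.append(n)
    return len(seen) == len(modules)


def is_valid_move(modules: Set[Cell], src: Cell, dst: Cell) -> bool:
    """Check one lattice-to-lattice move under the assumed rules."""
    if src not in modules or dst in modules:
        return False
    # Only short moves: at most one lattice step along every axis.
    if max(abs(a - b) for a, b in zip(src, dst)) != 1:
        return False
    # The robot must still be a single connected body afterwards.
    return is_connected((modules - {src}) | {dst})


if __name__ == "__main__":
    chain = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
    print(is_valid_move(chain, (2, 0, 0), (1, 1, 0)))  # end module steps around
    print(is_valid_move(chain, (1, 0, 0), (1, 1, 0)))  # middle module would split robot
```

Because every configuration is a set of integer coordinates, planning a self-reconfiguration reduces to searching over sequences of such discrete moves, which is one reason the lattice design requires only limited precision from the hardware.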

Chain-type and lattice-type robots have largely been
superseded by hybrid reconfigurable robots (Fig. 73.3c).
These robots combine the characteristics of
chain-type and lattice-type robots in one system. That 
is, they can both fit in a lattice structure, which al- 
lows for relatively easy self-reconfiguration, and out- 
of-lattice, which allows for efficient locomotion or 
manipulation. The recent generation of self-reconfig- 
urable robots, including M-TRAN II, ATRON shown 
in Fig. 73.8, and SuperBot, are all of the hybrid 
type [73.14, 19, 20]. 

A final type of reconfigurable robot are the actua- 
tion-less robots. These robots depend on external forces 
to provide reconfiguration capabilities and are not able 
to move once they are in a lattice. There are stochastic 
versions of actuation-less modules that are suspended in 
a fluid [73.21] or float on an air-hockey table [73.22], 
and the random movements of the modules in the 
medium allow them by chance to get close enough to 
form connections. A slight twist of this approach is 
that modules only use the external forces to swing from 
lattice position to lattice position, but maintain control 
of when to disconnect and connect themselves [73.23]. 
There are also deterministic versions, which employ an 


73.2 Connection Mechanisms 


An element of the mechatronic design of reconfig- 
urable modular robots that has turned out to be a sig- 
nificant challenge is the mechanism that connects 
modules to one another. This may appear puzzling 
at first, but individual modules are functionally lim- 
ited and, hence, reconfigurable robots perform most 
tasks using groups of modules. This means that ev- 
erything has to be passed across connectors from 
module to module, including forces, communication, 
and energy. For self-reconfigurable robots connector 
design is even more difficult because the connector 
also has to be able to actively connect and discon- 
nect. The optimal connector would have the following 
features: 


@ Small size 
@ Fast 


assembly-by-disassembly process where modules start 
connected in a lattice and modules that are not needed 
in the specific configuration can then deterministically 
decide to disconnect from the structure (typically based 
on electro-magnetic forces) [73.24]. Finally, there are 
the manually reconfigurable robots that depend on the 
human user (or another robot) to perform the reconfig- 
uration [73.25, 26]. 

An orthogonal classification of reconfigurable 
robots is according to whether they are homogeneous 
or heterogeneous. Homogeneous modular robots con- 
sist of identical modules and have been favored in the 
community because they lend themselves to self-recon- 
figuration. However, it is becoming clear that if we want 
to keep modules simple and provide a certain level of 
functionality, we need to focus more on heterogeneous 
systems. It is not cost effective to provide all modules 
with the same level of functionality, and more impor- 
tantly, the modules become too complex, heavy, and 
large if they are to contain all the functionality needed, 
which in practice make them unsuited for practical ap- 
plications [73.27]. 

Another emerging idea is that of soft modules. In 
fact, quite a number of rigid modules that have been 
built come out of projects that aimed to build soft re- 
configurable robots. The motivation for soft modules is 
that they provide a certain level of compliance in the 
interaction with the environment and also within the 
robot. However, a good way to realize soft reconfig- 
urable robots has not yet been discovered. 


• Strong
• Robust to wear and tear
• High tolerance to alignment errors
• Energy use only in the transition phases
• Transferal of electrical and/or communication signals between modules
• Genderless
• Allows connection with different orientations
• Disconnects from both sides
• Dirt resistant.


While there are a few connectors that incorporate most of
these features, none has implemented them all. It is easy
to imagine a solution along the lines of a self-cleaning,
conducting, active velcro or gecko skin. Unfortunately,
this does not exist (yet), and connector designs are,
therefore, based on conventional electro-magnetic,
mechanical, or electro-static principles of connection.

Magnetic connectors and combinations with elec- 
tro-magnets for active connection and disconnection 
have been quite successful given that they meet most 
of the above requirements except for being genderless
and strong. The gender issue is normally solved
by laying out magnets in a geometrical pattern on the
surface of the connector that allows male and female
connectors to be connected at a discrete
number of angles. However, the main shortcoming of
the magnetic solution is the strength it provides. It 
is clear that if the magnetic force is too weak mod- 
ules will fall apart easily. However, it is, in fact, also 
a problem if the magnetic force is too strong, because 
for active mechanisms the modules have to overcome 
the magnetic force to disconnect (or for manually re- 
configurable robots, the human user has to overcome 
the magnetic force). Therefore, a compromise has to 
be found, which is not optimal for any of the situa- 
tions. A recent solution to this problem is the use of 
switchable magnets. Switchable magnets come in sim- 
ple mechanical forms where magnets are physically 
turned to change the direction of the magnetic flux and 
thus the connection strength. The more advanced type
uses electro-magnets to change the magnetic polarization
of a permanent magnet, achieving the same effect but in
a much smaller form factor. These developments have
opened up the possibility of using magnetic connectors 
again, but it remains largely unexplored in the com- 
munity except for the robot pebbles system (where the 
technology originated) [73.24]. 


The most recent generations of self-reconfigurable
robots have all favored mechanical solutions. A mechanical
solution is based on hooks coming out of one connector
surface and attaching to holes in the opposing connector
surface. A mechanical solution immediately solves the
problem of having strong connectors, but unfortunately
introduces others. The most important problems are that
such connectors are large and slow; e.g., in the ATRON
self-reconfigurable robots the connector mechanism and
associated actuators and electronics account for up to
60% of the modules' size and weight, and it takes 2 s
to connect. The Terada connector [73.28] used in
M-TRAN III and the SINGO connector [73.29] used in
SuperBot appear to have provided potential solutions to
the problem of size, but the time issue still remains.

A last class of connectors is based on electro-static
forces [73.30]. The idea is to charge two opposing metal
surfaces, causing the two surfaces to connect strongly.
While being an interesting option, the realized systems
are impractical, because they are large and sensitive to
the distance between the connection surfaces. This
approach makes most sense at smaller scales, but despite
some effort this has not been realized. In this category
of non-standard connectors, unisex velcro connectors
[73.31] should also be mentioned, but here, of course,
the main problem is to obtain enough connection strength.

Overall, connector technology is fairly advanced,
but there is certainly room for improvement. However,
at this point, for significant progress to be made, new
results probably have to emerge from material science
and not from the reconfigurable robotic community itself.


73.3 Energy


Reconfigurable robots are typically designed for autonomy
and hence rely on on-board batteries for power.
The challenges here are to enable the modules to share
the available energy and to allow the robot to recharge
once batteries are depleted.

It is important for the modules of a reconfigurable
robot to be able to share energy because modules may
have very different activity levels and hence very
different levels of energy consumption. Therefore, the life
of the robot can be extended significantly by allowing
inactive modules to donate their energy to more
active ones. The issue is largely unexplored, but there
have been attempts at passing energy across connectors
[73.32] through physical connections, and it has
been discussed in the context of electro-static connectors
[73.30]. For self-reconfigurable robots there is also
likely to be an algorithmic solution where modules
change roles over time and, hence, distribute the energy
consumption equally among modules over time.

For recharging, a solution is to run an energy 
bus through the robot that both allows modules to 
recharge their batteries and run off an external power 
supply [73.33]. It has also been proposed for more sta- 
tionary applications that modules do not have onboard 
batteries, but are powered by the external power sup- 
ply. A way to achieve this while still giving the modules 
some autonomy of movement is to charge them through 
the floor; this has been investigated both mechatronically
[73.15,34] and algorithmically [73.35]. A more
flexible, but challenging approach is to allow a subset
of robot modules to return to the charger and then
return to charge the remaining modules [73.36].

At the more explorative end of the spectrum there
may be interesting possibilities in wireless energy
transfer [73.37], solar energy, or other alternative forms of
obtaining or harvesting energy.


73.4 Distributed Control


Reconfigurable robots were born out of the distributed
autonomous robot systems community and there has,
therefore, been a focus on distributed control algorithms
since the very beginning. The reason why distributed
control is such a good match for reconfigurable robots
is that, if designed well, distributed controllers have
characteristics such as robustness and scalability.
Robustness in this context refers to the ability of the
control system and, hence, the robot to continue to
function despite module failures and communication
errors. This may seem a relatively modest advantage;
however, it turns out to be crucial.

Communication systems on reconfigurable robots
tend to be unreliable. In the case of wired communication,
signals have to be passed across the physical connector
between modules; the connectors may not have mated
perfectly and, hence, the electrical terminals for passing
the communication signals may not have made a completely
stable physical connection. It may also be that dust has
temporarily ruined the physical interface between the two
modules, not allowing electrical communication signals
to pass through. For these reasons, reconfigurable
robots often rely on wireless communication in the
form of either infrared communication or more global
forms of wireless communication such as Bluetooth.
However, this does not solve the problem, it just
changes it. For infrared communication the transmitter
and receiver on modules that are to communicate
may not be aligned perfectly. There may also be crosstalk
caused by reflections of signals, which causes interference
between signals and even causes modules to receive
messages that were not meant for them in the first
place [73.38]. Communication relying on electro-magnetic
waves does not have these problems, but then
interference between modules and even background
wireless signals can often cause communication to
fail. The point here is that communication errors cannot
be attributed only to poor design or the immature
nature of module prototypes, but are fundamental
problems that the algorithms have to be able to handle
robustly.


Scalability is the other advantage of distributed 
control algorithms. The motivation here is that while 
current reconfigurable robots consist of tens of mod- 
ules, the ambition is that eventually we will have robots 
consisting of hundreds, thousands, or maybe even mil- 
lions of modules. It is, therefore, important that the 
controller does not rely on a central module for con- 
trol, since this module would be both a bottleneck for 
the responsiveness of the system and also be a single 
point of failure. Therefore, scalability of control al- 
gorithms is crucial, and distributed control algorithms 
have the potential to provide just that. Also, it is im- 
portant here to understand that algorithms also have to 
be able to deal with module failure, since as the num- 
ber of modules increases the chance of module failures 
and communication errors increases. In fact, given a high
enough number of modules, some modules will fail. Assume
that each module fails independently with probability
$p_1$; the probability that at least one module out of
$n$ fails, $p_n$, is then given by


$$p_n = 1 - (1 - p_1)^n . \qquad (73.1)$$


This is a very basic consideration, but it is important
to understand that the probabilities are working against
a controller that is not fault tolerant. For example, given
ten modules that each have just a 1% probability of failing,
the probability of at least one of them failing is 9.6%.
Given this background, we come to realize that distributed
control algorithms are not just a luxury, but absolutely
required if we are to realize reconfigurable robots.
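
The failure probability above is easy to evaluate numerically. The following is a minimal Python sketch, assuming independent module failures; the function name `prob_any_failure` is chosen here for illustration only. It reproduces the ten-module example and shows how quickly the probability grows with the number of modules.

```python
def prob_any_failure(p1: float, n: int) -> float:
    """Probability that at least one of n modules fails (Eq. 73.1).

    Assumes each module fails independently with probability p1.
    """
    return 1.0 - (1.0 - p1) ** n


if __name__ == "__main__":
    # Per-module failure probability of 1%, as in the example in the text.
    for n in (10, 100, 1000):
        print(f"n = {n:5d}: p_n = {prob_any_failure(0.01, n):.3f}")
```

With a 1% per-module failure probability, a failure somewhere in the robot is already more likely than not at one hundred modules.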


73.4.1 Communication 


A fundamental basis for all distributed control systems 
is the supporting communication system. In reconfig- 
urable robots there are local, global, and hybrid com- 
munication systems, see Fig. 73.4a-c, respectively. 
Local communication systems are based on module-to-module
communication using, for instance, infrared transceivers.
While fairly primitive, this form of communication is
essential in reconfigurable robots because it allows two
modules to determine their relative positions and
orientations. This is not possible using global
communication systems, regardless of whether they are
wired or wireless. Local communication also scales, since
there is no common communication medium that becomes
saturated.

Fig. 73.4a-c The underlying communication models used in reconfigurable robots are (a) local, (b) global, or (c) hybrid (after [73.39])

Global communication systems using wired bus
systems or wireless communication are also useful for
real-time control of the reconfigurable robot, because
in a purely local communication system there may be
significant lag even if the communication volume is low,
since messages may have to travel many links to
arrive. Hence, an optimal design is one that has both
a local communication system to support topology dis- 
covery and a global communication system to perform 
high-speed global coordination. 

An alternative is hybrid communication [73.39]. 
The idea behind this is to make a bus system whose 
topology can be changed dynamically. Initially, mod- 
ules can connect to neighbors to discover the local 
topology. As the need arises modules may connect 
or disconnect, i.e., reconfigure, their busses to match 
a given communication load and distribution across the 
system. While the idea seems to hold potential, it has 
not been thoroughly investigated. 


73.4.2 Locomotion 


One of the basic tasks of a reconfigurable robot is
locomotion, so let us take a look at some of the
algorithms that have been proposed for controlling
locomotion. One of the first distributed control
algorithms was gait control tables [73.11,40] (Fig. 73.5).
Despite being very simple, the algorithm is a pow- 
erful demonstration that often practical and robust 
algorithms are more useful than theoretically sound 
algorithms. 


Each cell in a control table corresponds to the position
of one actuator of one module in a specific time
interval. The column identifies the actuator and the
row identifies the time interval. The algorithm is based
on the assumption that all modules are synchronized.
When the algorithm is activated, each module moves its
actuators to the positions identified in the first row of
the gait control table. It then waits until the time
interval has passed and then moves its actuators to the
positions identified by the second row, and so on. When the end
of the gait control table is reached the controller loops 
back to the first row. 
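
The table-driven scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in [73.11,40]: the class name `GaitTableController`, the table values, and the use of a local clock value `t` are assumptions made for the example. Each module holds its own copy of the table and reads only the columns of its own actuators, so no communication is involved.

```python
from typing import List, Sequence


class GaitTableController:
    """Open-loop controller for one module driven by its own copy of a gait table.

    table[row][col] is the target position of actuator `col` during time
    interval `row`. Names and units are illustrative assumptions.
    """

    def __init__(self, table: Sequence[Sequence[float]], interval_s: float):
        if not table:
            raise ValueError("gait table must have at least one row")
        self.table = table
        self.interval_s = interval_s

    def row_at(self, t: float) -> int:
        """Index of the active row at local time t (seconds); loops at the end."""
        return int(t // self.interval_s) % len(self.table)

    def targets_at(self, t: float, actuators: Sequence[int]) -> List[float]:
        """Target positions at time t for this module's actuators (column indices)."""
        row = self.table[self.row_at(t)]
        return [row[a] for a in actuators]


if __name__ == "__main__":
    # Hypothetical 4-row table for two modules with one actuator each (degrees).
    table = [[30, -30], [0, 0], [-30, 30], [0, 0]]
    module0 = GaitTableController(table, interval_s=0.5)
    module1 = GaitTableController(table, interval_s=0.5)
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, module0.targets_at(t, [0]), module1.targets_at(t, [1]))
```

Because every module evaluates `row_at` against its own clock, the sketch also makes the synchronization assumption visible: two modules agree on the active row only as long as their clocks agree.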

Gait control tables are a simple form of distributed
control, but they only work with the specific number
of modules for which they were designed, and they make
the relatively large assumption that all modules' clocks
are synchronized. There is no way around the first
limitation; however, the second one in practice often holds
long enough to make successful experiments that rely on
modules being initialized at the same time. While being
fairly primitive, the algorithm has been successful,
and given our motivation above it is fairly clear why:
it is very robust. It does not rely on communication;
in fact, there is no communication between modules,
so the failure of one module does not influence
the control of other modules, but may of course reduce
the performance of the robot. However, this is often what
we refer to as graceful degradation of performance,
because the degradation is proportional to the number of
failed modules.

Fig. 73.5 The PolyBot robot in a loop configuration controlled by a gait control table (after [73.41])

Above we have described gait control tables in
their most basic form; in reality, however, each cell
of a control table may refer to a general behavior
instead of a specific actuator position. For example, it is
possible that an actuator should implement a spring- 
like function, be turned off completely, or be com- 
pletely stiff. In this way, the behavior of individual 
modules can be influenced by the behavior of other 
modules and the environment through which the robot 
is navigating. 

Theoretically, the main problem of gait control 
tables is that modules have no mechanism to stay syn- 
chronized and, hence, over time modules will lose syn- 
chronization. Another more serious consequence of this 
is that the robot as a whole cannot react to the environ- 
ment as a whole and, for instance, change locomotion 
pattern or shape. One solution to this problem is rep- 
resented by hormone-based control algorithms [73.42]. 
Slightly simplified, the idea is that before each module 
executes a row of the gait control table a synchroniz- 
ing hormone is passed through the robot. This ensures 
that modules stay synchronized and, in addition, it is 
also possible to pass different hormones to reflect
different desired locomotion patterns. While theoretically
well developed, hormone-based control slows down the
robot due to the overhead of synchronization and, even
worse, if synchronization hormones are lost the robot
may stop for a while until a new hormone is generated.
Role-based control [73.43] is a compromise
between the two. The main idea of role-based control 
is to have a looser coupling between action and syn- 
chronization. The modules have autonomy like in gait 
control tables and achieve synchronization over time. 
However, the robot is not able to react globally as 
fast as hormone-based controllers because synchroniza- 
tion signals are slow compared to the movement of the 
robot. 

These algorithms are mainly suited for open-loop
control. However, an important challenge is to understand
how to adapt locomotion patterns and configuration
to the environment. This is a less explored
challenge [73.44, 45] and a very important one
for the future of reconfigurable robots.


73.4.3 Self-Reconfiguration 


A challenge that has received significant attention is 
that of self-reconfiguration control. This challenge is, 
of course, tied to self-reconfigurable robots and not 
to reconfigurable robots in general. It turns out that 
the general problem of reconfiguring one configuration 
into another is computationally intractable. In fact, it 
is currently believed that to find the optimal solution 
is NP-hard [73.46]. However, it is not entirely clear if 
there exists a subspace of the problem where it is com- 
putationally more tractable. There may be special cases 
where this is the case, but we currently do not know. 
The current status is that self-reconfiguration control re- 
mains difficult, in particular because we also aim for 
distributed, not just centralized, algorithms for solving
this believed-to-be NP-hard problem. In Fig. 73.6
we show an example of a short self-reconfiguration
sequence.


Definition 
One way to define the distributed self-reconfiguration 
problem is: 


Given a start configuration A and a final configura- 
tion B, distributedly find and execute a sequence of 
disconnections, moves, and connections that trans- 
forms A into B. 


This formulation shies away from the optimality 
criteria because, in practice, good-enough solutions 
are what we are after. Of course, optimal solutions 
would be better, but given that the problem is NP- 
hard we cannot find them efficiently. A likely false 
assumption this formulation implies is that configu- 
rations A and B are known. While this is true for 
very simple cases where the self-reconfigurable robot 
is to transform itself into a pre-specified object like 
a chair, in general the robot should be able to dis- 
tributedly discover suitable configurations. That is, the 
final configuration B is often not known beforehand. 
This leads to the flip-side of the self-reconfiguration 
problem, which has only been addressed to a limited 
degree: 


Given a start configuration A, a task T, and an en- 
vironment E, distributedly find a configuration B 
better suited for task T in environment E. 




It is likely that the split between configuration dis- 
covery and self-reconfiguration control is not entirely 
productive and thus maybe a formulation like this is bet- 
ter: 


Given a start configuration A, a task T, and an en- 
vironment E, distributedly find an action that makes 
configuration A better suited for task T in environ- 
ment E. 


A final comment on the problem formulation is that 
we may need all three variations of the problem. The 
last formulation is useful for incremental improvement, 
but occasionally the robot has to go through a paradigm 
shift, e.g., from a snake to a walking robot, and to 
achieve this we need the first two formulations of the 
problem. 


Fig. 73.6 A self-reconfiguration sequence that transforms
M-TRAN from a walker to a snake configuration (after [73.47])


Algorithms 
A self-reconfiguration algorithm consists of a repre- 
sentation of the final configuration and a movement 
strategy. 

The movement strategies that have been researched
so far are random movement [73.15], local rules [73.48],
coordinate attractors [73.49], gradients [73.50], and
recruitment [73.51]. The conceptually simplest
algorithm is one where modules know the
global coordinates of the positions contained in the goal
configuration. In this case, modules can move around 
randomly and stop when they are at a coordinate which 
is contained in the goal configuration. While this strat- 
egy is attractive due to its simplicity, it has a number of 
drawbacks. In a three-dimensional self-reconfigurable 
robot random movement is dangerous because mod- 
ules may by accident disconnect from the structure and 
fall down. While this problem may be solved by building
sturdy, soft modules that are able to reattach to the
structure after a fall, this is a difficult solution (at least
nobody has attempted it so far). Another problem is
that when modules settle randomly, hollow subspaces or
sealed-off caves may be created that modules cannot
enter. Again, in practice this may not be a problem since 
these subspaces are likely to be relatively small. Finally, 
and this is probably the least problematic, random walk 
is inefficient for a large number of modules and self- 
reconfiguration sequences consisting of a large number 
of actions. In other words, a movement strategy based 
on random walk has scalability issues. Coordinate
attractors, gradients, and recruitment are all designed
to improve scalability. The idea behind coordinate
attractors is that a module that has reached a coordinate
contained in the goal configuration and, through local
sensing, discovers that an adjacent position in the goal
configuration is unfilled can broadcast this coordinate
to all the modules to attract free modules to this
location.

Gradients are again an improvement over this strat- 
egy because coordinate attractors are prone to local 
minima. That is, there may be no direct path from a free 
module to the free goal position. A gradient-based strat- 
egy does not broadcast a coordinate, but communicates 
an integer to neighbor modules; these communicate 
this integer minus 1 to their neighbors, and so on. 
Modules listen for integers and pass the highest one 
they have received minus 1 on. Once this process is 
complete, free modules can climb the gradient to find 
the location of the available goal position and, impor- 
tantly, they can do so by following the structure of 
the robot and avoid local minima (this is the strategy
used in the self-reconfiguration sequence shown in
Fig. 73.7).
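
The gradient mechanism described above can be sketched as follows. This is a centralized Python simulation of the distributed process, written for illustration and not taken from [73.50]; the function names, the dictionary-based connectivity graph, and the simplification that a free module moves from module to adjacent module are all assumptions of the example.

```python
from collections import deque
from typing import Dict, Hashable, List


def propagate_gradient(neighbors: Dict[Hashable, List[Hashable]],
                       source: Hashable,
                       start_value: int) -> Dict[Hashable, int]:
    """Return the highest gradient value each module ends up holding.

    The source (the module next to an unfilled goal position) holds
    `start_value`; every module passes its value minus 1 to its neighbors,
    and a module keeps the highest value it has received.
    """
    value = {source: start_value}
    queue = deque([source])
    while queue:
        module = queue.popleft()
        outgoing = value[module] - 1
        if outgoing <= 0:
            continue  # the gradient has died out
        for nb in neighbors[module]:
            if outgoing > value.get(nb, 0):
                value[nb] = outgoing
                queue.append(nb)
    return value


def climb(neighbors: Dict[Hashable, List[Hashable]],
          value: Dict[Hashable, int],
          start: Hashable) -> List[Hashable]:
    """Follow strictly increasing gradient values from `start`; return the path."""
    path = [start]
    current = start
    while neighbors[current]:
        best = max(neighbors[current], key=lambda nb: value.get(nb, 0))
        if value.get(best, 0) <= value.get(current, 0):
            break  # no neighbor is closer to the source
        current = best
        path.append(current)
    return path


if __name__ == "__main__":
    # A U-shaped chain: A and E may be close in space, but are only
    # connected through B, C, and D.
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D"]}
    values = propagate_gradient(chain, source="E", start_value=10)
    print(values)
    print(climb(chain, values, "A"))
```

Because the values are passed only along physical connections, climbing them always follows the structure of the robot, which is the property that lets this strategy avoid the local minima of coordinate attractors.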

Finally, the recruitment strategy is a more conserva- 
tive version of gradients because here the module next 
to the goal position sends out a single message whose 
purpose it is to recruit a single free module for the un- 
filled goal position that it knows about. Whether it pays 
off to be conservative or not is a matter of priorities. 
However, it may often be a good strategy to attract 
more modules because where one module is needed 
probably more are needed later, and then accept the 
movement overhead when this is not the case. The fi- 
nal movement strategy is based on local rules. These 
rules only allow a module to move if the local configu- 
ration satisfies a rule in a rule set. In this case, the rule 
will fire and an action will be executed. By defining 
the local rules cleverly the resulting configurations can 
be constrained. The movement strategy of local rules 
is typically used to control cluster-flow or water-flow 
locomotion where modules from the back of the robot 
move towards the front, resulting in a forward locomo- 
tion of the robot [73.52]. 

All these movement strategies, except for local 
rules, rely on a representation of the goal configura- 
tion. A simple representation is to represent all the goal 
coordinates in the final configuration. However, given 
that all modules need a copy of this representation it 
is important that it is space efficient and, therefore, 
a direct representation like this is only suited for small 
configurations. Another representation used is one of 
overlapping cubes [73.50], but other representations 
could also be used. Typically, which representation to 
use depends on a trade-off between space and
computational complexity. It is also possible to have indirect
representations that code for growth patterns instead,
but these are explored to a lesser degree.

Fig. 73.7a-f Simulated, large-scale self-reconfiguration
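
The space versus computation trade-off between goal representations can be illustrated with a small Python sketch. It is our own illustration of the general idea, not the representation of [73.50]: the goal shape is stored either as a handful of overlapping axis-aligned boxes or as the explicit set of all goal coordinates, and the type and function names are assumptions of the example.

```python
from typing import Iterable, Set, Tuple

Coord = Tuple[int, int, int]
Box = Tuple[Coord, Coord]  # inclusive (min corner, max corner) on the lattice


def in_goal(pos: Coord, boxes: Iterable[Box]) -> bool:
    """True if lattice position `pos` lies inside at least one box."""
    return any(all(lo[i] <= pos[i] <= hi[i] for i in range(3))
               for lo, hi in boxes)


def expand(boxes: Iterable[Box]) -> Set[Coord]:
    """Direct representation: the explicit set of all goal coordinates."""
    cells: Set[Coord] = set()
    for lo, hi in boxes:
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    cells.add((x, y, z))
    return cells


if __name__ == "__main__":
    # A hypothetical L-shaped goal: a 10 x 10 floor plate and a 10 x 10 wall
    # that overlap along one edge.
    goal_boxes = [((0, 0, 0), (9, 9, 0)), ((0, 0, 0), (0, 9, 9))]
    print(len(goal_boxes), "boxes versus", len(expand(goal_boxes)), "coordinates")
    print(in_goal((0, 5, 7), goal_boxes), in_goal((5, 5, 5), goal_boxes))
```

The box form is compact but needs a scan over all boxes for each membership query, whereas the explicit set answers queries in constant expected time at the cost of storing every coordinate in every module.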

The standing challenge for self-reconfiguration is 
to make algorithms that are practical to use on physi- 
cal self-reconfigurable robots. The range of algorithms 
covered here is sufficient for robots consisting of tens 
of modules and, thus the self-reconfiguration challenge 
is currently more of a mechatronics problem than an al- 
gorithmic problem [73.53]; however there is certainly 
room for algorithmic improvements as well. 


73.4.4 Manipulation 


Manipulation is another task that is suitable for recon- 
figurable robots. If modules are connected in a chain 
configuration, they can form a serial manipulator with 
properties not unlike those of traditional robot manip- 
ulators. However, two important problems have to be 
solved before they can work as a traditional robot ma- 
nipulator. 

One problem is how to calculate the inverse kine- 
matics for a chain of modules. The other is how to 
increase the strength of a modular manipulator since 
it is relatively weak because of the limited actuator 
strength of the individual module. Inverse kinematics
provides a way to calculate the joint angles of the
modules needed for the outermost
module to reach a given position and orientation in
space [73.54]. Several answers to the question of how
to calculate inverse kinematics have been explored. 
One option is to fit the serial chain of modules to 
a curve [73.55]. Another is to use constrained opti- 
mization techniques [73.56]. A third option is to use 
a fast method based on what is defined as dexterous 
workspaces of subchains [73.57]. An important aspect
of this work is also the potential for the robot to
discover its own kinematics [73.58]. However, maintaining
a correspondence between the kinematic model and
the physical robot remains a significant challenge for
these approaches, given that a chain of modules is often
not rigid enough, in part due to the connectors.

Fig. 73.8 The ATRON hybrid self-reconfigurable robot

The other problem that needs to be addressed is 
the relative weakness of the modules and, as a con- 
sequence, the modular manipulator as a whole. We 
need to find a way to make modules work together 
to produce cooperative actuation that allows them to 
produce larger forces than those that the modules can 
produce individually. This question is a little harder 
to answer, but one option is to exploit the large me- 
chanical advantages formed near singularities of the 
joints [73.59]. The idea is that a closed loop of mod- 
ules forms a manipulator. If specific sets of modules 
are alternately moved and locked, the chain of modules 
can continually maintain a large mechanical advan- 
tage and then generate a much larger force than the 
one an individual module can provide. However, problems
remain with internal forces of the chain and the
weight of the modules involved. Alternative, mostly
theoretical approaches include a biologically inspired 
approach to mechanical design where modules form 
the equivalent of bones, muscles, and tendons [73.60] 
and use these construction elements to propagate forces 
to where they are needed in the configuration. It may 
also be possible to coordinate the movement of all mod- 
ules so that the movements add up to produce a larger 
movement on the global level, i.e., perform collective 
actuation [73.61]. 

Besides the traditional arm-based approach to ma- 
nipulation, it is also possible to use reconfigurable 
robots as a distributed actuator array [73.62]. In this 
approach, modules are spread over a surface. When an 
object is placed on the surface, many modules can work 
together to manipulate it. The array can handle heavy 
objects as long as their surface area is relatively large. 
In combination with self-reconfiguration, it is also pos- 
sible to use this approach in three dimensions [73.63]. 


73.5 Programmability and Debugging 


An area that is receiving more and more focus in the 
community is the challenge of how to efficiently pro- 
gram and debug reconfigurable robots. In the previous 
section, we discussed how to control reconfigurable 
robots distributedly; while these distributed control al- 
gorithms are desired for their robustness and scalability 
they are notoriously difficult to implement and debug 
in general, but even more so on modular robots that are 
resource constrained, embedded platforms that often 
only allow for debugging output in the form of blink- 
ing LEDs. 


73.5.1 Iterative, Incremental Programming 
and Debugging 


Conventionally, the challenge is met by developing ap- 
plications for reconfigurable robots using an iterative, 
incremental approach to programming and debugging. 
This ensures that we can locate errors relatively quickly 
and correct them before we introduce additional func- 
tionality and hence complexity. A good programming 
practice is also to develop an application programming
interface that hides the low-level hardware interface from
the programmer who is more interested in the higher-level
control algorithms. This conventional approach is suitable
for small demonstration programs, but as the complexity
of the task and thus the program increases, this
approach becomes intractable. The main reason is that
it becomes increasingly difficult to obtain reliable de- 
bugging information due to the distributed and dynamic 
nature of reconfigurable robots and also more often than 
not due to the immaturity of the physical platforms. 


73.5.2 Simulation 


Given the shortcomings of iterative, incremental de- 
velopment, researchers use simulations, e.g., [73.64], 
to ensure that the distributed and dynamic aspects of 
the controller are thoroughly debugged before being 
deployed on the physical platform. However, building
a reliable simulator is a feat in its own right. The
simplest simulators are logical in nature:
events are discrete and instantaneous in time, e.g., the
transmission of a message or the movement of a mod- 
ule from one lattice position to another. These logic- 
based simulators are useful for experimentally convinc- 
ing oneself that the logic of the distributed algorithm 
under development is correct. If this algorithm is then 
transferred to the physical, reconfigurable robot and 
allowed to run on a carefully debugged application pro- 
gramming interface, it is only real-world issues and 
hardware limitations that stand in the way of success. 




However, these issues and limitations should not be 
underestimated and include, but are not limited to: un- 
reliable communication, limited communication band- 
width, time, parallelism, hardware failures, differences 
between modules in terms of actuation, sensing, and 
communication performance. These issues and limita- 
tions can to some degree be handled by proper simu- 
lation, but typically are not considered and thus leave 
algorithms stranded in simulation because they are un- 
able to deal with the real-world constraints of a physical 
reconfigurable robot. The gap between the simulated 
world and the real world is often referred to as the re- 
ality-gap [73.65]. This gap is widened even more by 
using simulations based on simplified physics engines, 
because precise modeling of the physics of a recon- 
figurable modular robot is almost impossible due to 
the complex interactions between the modules them- 
selves and the modules and the environment. While 
the physics-based simulations increase the reality gap, 
they do allow for a wider area of study, e.g., study of 
locomotion algorithms. However, often at the cost of re- 
duced transfer of results to the physical reconfigurable 
robot. 
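
To make the idea of a logic-based simulator concrete, the following Python sketch simulates a message flooded through a robot using discrete, instantaneous events. It is a minimal illustration under stated assumptions, not the simulator of [73.64]: the function name `simulate_flood`, the fixed per-hop delay, and the loss-free links are all simplifications made for the example.

```python
import heapq
from typing import Dict, List, Tuple


def simulate_flood(neighbors: Dict[str, List[str]],
                   origin: str,
                   hop_delay: int = 1) -> Dict[str, int]:
    """Logic-level simulation of a message flooded from `origin`.

    Events are (time, sequence number, receiver) tuples in a priority queue.
    Each delivery is instantaneous, and forwarding a message to a neighbor
    takes `hop_delay` ticks. Returns the first-arrival time at each module.
    """
    arrival: Dict[str, int] = {}
    events: List[Tuple[int, int, str]] = [(0, 0, origin)]
    seq = 1  # tie-breaker so that simultaneous events keep insertion order
    while events:
        time, _, module = heapq.heappop(events)
        if module in arrival:
            continue  # duplicate delivery, already handled
        arrival[module] = time
        for nb in neighbors[module]:
            if nb not in arrival:
                heapq.heappush(events, (time + hop_delay, seq, nb))
                seq += 1
    return arrival


if __name__ == "__main__":
    # Five modules connected in a chain, using local communication only.
    chain = {"m0": ["m1"], "m1": ["m0", "m2"], "m2": ["m1", "m3"],
             "m3": ["m2", "m4"], "m4": ["m3"]}
    print(simulate_flood(chain, "m0"))
```

A simulator of this kind checks only the logic of an algorithm; message loss, limited bandwidth, and timing variation between modules would have to be modeled explicitly to narrow the reality gap.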


73.5.3 Emerging Solutions 


Above we have presented current approaches and their 
advantages and in particular their disadvantages. It is 
clear that we are far from meeting the challenge of 
efficient programming and debugging of physical re- 
configurable robots. However, there are currently two 
approaches under investigation that may provide some 
solutions. 

One is the use of domain-specific programming languages [73.66, 67].
The fundamental idea is to expose programming primitives at the level
of abstraction preferred by the programmer and hide the implementation
of the primitives. For instance, communication is not necessarily
central to the programmer and can, therefore, be implicitly handled by
the programming language. The advantage of this approach is that it
frees the programmer from error-prone, repetitive programming and
leaves more energy for the programming challenges related to the task
at hand. The language can also to some degree help deal with hardware
limitations, allowing the programmer to build on reliable programming
primitives. However, there may be a problem of leaky abstractions,
where it can be hard for a programmer to discover hardware problems
because the software that interfaces with the hardware is hidden from
the programmer.

Another development is that reconfigurable robots get increasingly
powerful processors and communication hardware, a development that is
primarily driven and made possible by the cell-phone industry. This
development opens an opportunity for efficient programming and
debugging of reconfigurable robots. The reason is that up until now
reconfigurable robots have been resource constrained to the degree that
it was not possible to run programs targeted at debugging in parallel
to the executing program, because either there was simply not enough
available memory and processing power, or the debugging tool would
interfere with the executing program to the degree that its behavior
would be completely altered or it would simply not work at all.
However, with the increase in processing and communication power this
problem may be reduced, and this will open the door to new forms of
programming middleware of the kind that has been instrumental to the
success of other areas of robotics, such as Player/Stage [73.68] and
the Robot Operating System (ROS) [73.69].

In the broader context, breakthroughs in programming and debugging of
reconfigurable robots will be a significant contribution that can help
us develop solutions to complex tasks that are currently beyond our
reach.


73.6 Perspective


This concludes the overview of reconfigurable robots. The question is
where does this leave us? What does the future of reconfigurable robots
and reconfigurable robot research look like?

First of all, the field of reconfigurable robots has matured
significantly over the last decade, leading to the first applications
of modular robot technology: Cubelets [73.4], a construction kit
teaching children about emergent behavior in complex systems. Another
example is LocoKit, which has been developed to efficiently explore
morphology-related questions in the context of robot locomotion
[73.5, 70]. The community, in general, is very engaged in understanding
how reconfigurable robots can be adapted to this specific application,
which in time is likely to lead to more applications.


From a research point of view, the vision of an autonomously
distributed reconfigurable robot still remains to be realized. This
requires advances in all areas covered in this chapter. Like other
fields of robotics, reconfigurable robotics is nurtured by the
progressive development of rapid prototyping technology and smartphone
technology, including wireless charging, which may open the path to
novel mechatronic designs. Also, the emerging field of soft robotics
may hold potential for radical new designs of reconfigurable robots.
Overall, there seems to be a growing opportunity to exploit these
advances to design a new generation of reconfigurable robots.

In the area of distributed control, which is probably the best
understood area of reconfigurable robots, there is a clear
understanding of how the modules of a reconfigurable robot can be
coordinated internally. However, there is still a significant open
challenge in making reconfigurable robots adapt to and interact with
unknown environments through sensors. Also, a more engineering oriented
area of programmability and debugging is crucial for the field if we
are to handle the complexity of reconfigurable robots more efficiently.

The potential of autonomously distributed reconfigurable robots is as
exciting as ever, and their realization today appears much more
realistic than when the idea was conceived 25 years ago, thanks to the
hard work of the community. However, there are still important
discoveries to be made before the potential of autonomously distributed
reconfigurable robots is realized.


73.7 Further Reading


This chapter has provided a high-level overview of the field of
reconfigurable robots. Those readers interested in a complementary
introduction should consider reading [73.71] and those needing a more
detailed introduction are referred to the books [73.72, 73].


References


73.1 T. Fukuda, T. Ueyama: Cellular Robotics and Micro 
Robotics Systems (World Scientific, Singapore 1994) 

73.2 R.H. Kessin: Cell motility: Making streams, Nature 
422, 482 (2003) 

73.3 M. Yim, B. Shirmohammadi, J. Sastra, M. Park, 
M. Dugan, C.J. Taylor: Towards robotic self- 
reassembly after explosion, Proc. IEEE/RSJ Int. Conf. 
Intell. Robot. Syst. (2007) pp. 2767-2772 

73.4 E. Schweikardt et al.: Cubelets robot construction 
kit, https://modrobotics.com/ (2015) 

73.5 J.C. Larsen, D. Brandt, K. Stoy: LocoKit: A construc- 
tion kit for building functional morphologies for 
robots, Proc. 12th Int. Conf. Adapt. Behav. (2012) 
pp. 12-24 

73.6 T. Fukuda, Y. Kawauchi, M. Buss: Self organiz- 
ing robots based on cell structures — CEBOT, Proc. 
IEEE/RSJ Int. Workshop Intell. Robot. Syst. (1988) 
pp. 145-150 

73.7 R. Groß, M. Bonani, F. Mondada, M. Dorigo: 
Autonomous self-assembly in swarm-bots, IEEE 
Trans. Robot. 22(6), 1115-1130 (2006) 

73.8 M.D.M. Kutzer, M.S. Moses, C.Y. Brown, D.H. Scheidt, 
G.S. Chirikjian, M. Armand: Design of a new 
independently-mobile reconfigurable modular 
robot, Proc. IEEE Int. Conf. Robot. Autom. (2010) 
pp. 2758-2764 


73.9 G.G. Ryland, H.H. Cheng: Design of imobot, an in- 
telligent reconfigurable mobile robot with novel 
locomotion, Proc. IEEE Int. Conf. Robot. Autom. 
(2010) pp. 60-65 

73.10 M. Yim: A reconfigurable modular robot with many 
modes of locomotion, Proc. JSME Int. Conf. Adv. 
Mechatron. (1993) pp. 283-288 

73.11 M. Yim: Locomotion with a Unit-Modular Recon- 
figurable Robot, Ph.D. Thesis (Department of Me- 
chanical Engineering, Stanford University, Stanford 
1994) 

73.12 M. Yim, D.G. Duff, K. Roufas, Y. Zhang, C. El- 
dershaw: Evolution of PolyBot: A modular recon- 
figurable robot, Proc. Harmon. Drive Int. Symp. 
(2001) 

73.13 A. Castano, R. Chokkalingam, P. Will: Autonomous 
and self-sufficient CONRO modules for reconfig- 
urable robots, Proc. 5th Int. Symp. Distrib. Auton. 
Robot. Syst. (2000) pp. 155-164 

73.14 B. Salemi, M. Moll, W.-M. Shen: SuperBot: A de- 
ployable, multi-functional, and modular self- 
reconfigurable robotic system, Proc. IEEE/RSJ Intl. 
Conf. Intell. Robot. Syst. (2006) pp. 3636-3641 

73.15 S. Murata, H. Kurokawa, S. Kokaji: Self-assembling 
machine, Proc. IEEE Int. Conf. Robot. Autom. (1994) 
pp. 441-448 




73.16 G.S. Chirikjian: Kinematics of a metamorphic robotic system,
Proc. IEEE Int. Conf. Robot. Autom. (1994) pp. 449-455

73.17 K. Kotay, D. Rus, M. Vona, C. McGray: The self-reconfiguring
robotic molecule, Proc. IEEE Int. Conf. Robot. Autom. (1998)
pp. 424-431

73.18 S. Murata, H. Kurokawa, E. Yoshida, K. Tomita, S. Kokaji:
A 3-d self-reconfigurable structure, Proc. IEEE Int. Conf. Robot.
Autom. (1998) pp. 432-439

73.19 H. Kurokawa, K. Tomita, A. Kamimura, S. Kokaji, T. Hasuo,
S. Murata: Distributed self-reconfiguration of M-TRAN III modular
robotic system, Int. J. Robot. Res. 27(3/4), 373-386 (2008)

73.20 E.H. Østergaard, K. Kassow, R. Beck, H.H. Lund: Design of the
ATRON lattice-based self-reconfigurable robot, Auton. Robot. 21(2),
165-183 (2006)

73.21 P. White, V. Zykov, J. Bongard, H. Lipson: Three dimensional
stochastic reconfiguration of modular robots, Proc. Robot. Sci. Syst.
(2005) pp. 161-168

73.22 J. Bishop, S. Burden, E. Klavins, R. Kreisberg, W. Malone,
N. Napp, T. Nguyen: Self-organizing programmable parts, Proc. Int.
Conf. Intell. Robot. Syst. (2005) pp. 3684-3691

73.23 P.J. White, M. Yim: Reliable external actuation for full
reachability in robotic modular self-reconfiguration, Int. J. Robot.
Res. 29(5), 598-612 (2010)

73.24 K. Gilpin, A. Knaian, D. Rus: Robot pebbles: One centimeter
modules for programmable matter through self-disassembly, Proc. IEEE
Int. Conf. Robot. Autom. (2010) pp. 2485-2492

73.25 V. Zykov, A. Chan, H. Lipson: Molecubes: An open-source modular
robotics kit, Proc. IEEE/RSJ Int. Conf. Robot. Syst., Self-Reconfig.
Robot. Workshop (2007)

73.26 A. Lyder, R.F.M. Garcia, K. Stoy: Mechanical design of ODIN, an
extendable heterogeneous deformable modular robots, Proc. IEEE/RSJ Int.
Conf. Int. Robot. Syst., Nice (2008) pp. 883-888

73.27 A. Lyder, K. Stoy, R.F.M. Garciá, J.C. Larsen, P. Hermansen: On
sub-modularization and morphological heterogeneity in modular robotics,
Proc. 12th Int. Conf. Intell. Auton. Syst. (2012) pp. 1-14

73.28 Y. Terada, S. Murata: Automatic modular assembly system and its
distribution control, Int. J. Robot. Res. 27, 445-462 (2008)

73.29 W.-M. Shen, R. Kovac, M. Rubenstein: SINGO: A single-end-operative
and genderless connector for self-reconfiguration, self-assembly and
self-healing, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., Workshop
Self-Reconfig. Robot., Syst. Appl. (2008) pp. 64-67

73.30 M.E. Karagozler, J.D. Campbell, G.K. Fedder, S.C. Goldstein,
M.P. Weller, B.W. Yoon: Electrostatic latching for inter-module
adhesion, power transfer, and communication in modular robots, Proc.
IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2007) pp. 2779-2786

73.31 A. Ishiguro, M. Shimizu, T. Kawakatsu: Don't try to control
everything! An emergent morphology control of a modular robot, Proc.
IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2004) pp. 981-985

73.32 M.W. Jørgensen, E.H. Østergaard, H.H. Lund: Modular ATRON:
Modules for a self-reconfigurable robot, Proc. IEEE/RSJ Int. Conf.
Robot. Syst. (2004) pp. 2068-2073

73.33 R.F.M. Garcia, A. Lyder, D.J. Christensen, K. Stoy: Reusable
electronics and adaptable communication as implemented in the ODIN
modular robot, Proc. IEEE Int. Conf. Robot. Autom. (2009)

73.34 B. Kirby, B. Aksak, J. Hoburg, T. Mowry, P. Pillai: A modular
robotic system using magnetic force effectors, Proc. IEEE/RSJ Int.
Conf. Intell. Robot. Syst. (2007) pp. 2787-2793

73.35 J. Campbell, P. Pillai, S.C. Goldstein: The robot is the tether:
Active, adaptive power routing for modular robots with unary
inter-robot connectors, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.
(2005) pp. 4108-4115

73.36 S. Kernbach, O. Kernbach: Collective energy homeostasis in
a large-scale micro-robotic swarm, Robot. Auton. Syst. 59, 1090-1101
(2011)

73.37 M.P.O. Cabrera, R.S. Trifonov, G.A. Castells, K. Stoy: Wireless
communication and power transfer in modular robots, Proc. IROS Workshop
Reconfig. Modul. Robot. (2011)

73.38 D.J. Christensen, U.P. Schultz, D. Brandt, K. Stoy: Neighbor
detection and crosstalk elimination in self-reconfigurable robots,
Proc. 1st Int. Conf. Robot Commun. Coord. (2007)

73.39 R.F.M. Garcia, D.J. Christensen, K. Stoy, A. Lyder: Hybrid
approach: A self-reconfigurable communication network for modular
robots, Proc. 1st Int. Conf. Robot Commun. Coord. (2007) pp. 23:1-23:8

73.40 M. Yim: New locomotion gaits, Proc. Int. Conf. Robot. Autom.
(1994) pp. 2508-2514

73.41 M. Yim, Y. Zhang, K. Roufas, D.G. Duff, C. Eldershaw: Connecting
and disconnecting for chain self-reconfiguration with PolyBot,
IEEE/ASME Trans. Mechatron. 7(4), 442 (2002)

73.42 W.-M. Shen, B. Salemi, P. Will: Hormone-inspired adaptive
communication and distributed control for conro self-reconfigurable
robots, IEEE Trans. Robot. Autom. 18(5), 700-712 (2002)

73.43 K. Støy, W.-M. Shen, P. Will: Using role based control to produce
locomotion in chain-type self-reconfigurable robot, IEEE Trans.
Mechatron. 7(4), 410-417 (2002)

73.44 K. Støy, W.-M. Shen, P. Will: On the use of sensors in
self-reconfigurable robots, Proc. 7th Int. Conf. Simul. Adapt. Behav.
(2002) pp. 48-57

73.45 A. Kamimura, H. Kurokawa, E. Yoshida, S. Murata, K. Tomita,
S. Kokaji: Automatic locomotion design and experiments for a modular
robotic system, IEEE/ASME Trans. Mechatron. 10(3), 314-325 (2005)

73.46 F. Hou, W.-M. Shen: On the complexity of optimal reconfiguration
planning for modular reconfigurable robots, Proc. IEEE Int. Conf.
Robot. Autom. (2010) pp. 2791-2796




73.47 H. Kurokawa et al.: M-TRAN (Modular Transformer), Research,
https://unit.aist.go.jp/is/frrg/dsysd/mtran3/research.htm (2010)

73.48 Z. Butler, K. Kotay, D. Rus, K. Tomita: Generic de-centralized
control for lattice-based self-reconfigurable robots, Int. J. Robot.
Res. 23(9), 919-937 (2004)

73.49 M. Yim, Y. Zhang, J. Lamping, E. Mao: Distributed control for 3-D
metamorphosis, Auton. Robot. 10(1), 41-56 (2001)

73.50 K. Støy, R. Nagpal: Self-reconfiguration using directed growth,
Proc. Int. Conf. Distrib. Auton. Robot Syst. (2004) pp. 1-10

73.51 Z. Butler, R. Fitch, D. Rus: Experiments in distributed control
of modular robots, Proc. Int. Symp. Exp. Robot. (2003) pp. 307-316

73.52 E.H. Østergaard, H.H. Lund: Distributed cluster walk for the
ATRON self-reconfigurable robot, Proc. 8th Conf. Intell. Auton. Syst.
(2004) pp. 291-298

73.53 K. Stoy, H. Kurokawa: Current topics in classic
self-reconfigurable robot research, Proc. IROS/RSJ Workshop Reconfig.
Modul. Robot. (2011)

73.54 J.J. Craig: Introduction to Robotics: Mechanics and Control,
3rd edn. (Prentice Hall, Reading 2003)

73.55 G.S. Chirikjian, J.W. Burdick: The kinematics of hyper-redundant
robot locomotion, IEEE Trans. Robot. Autom. 11(6), 781-793 (1995)

73.56 Y. Zhang, M. Fromherz, L. Crawford, Y. Shang: A general
constraint-based control framework with examples in modular
self-reconfigurable robots, Proc. IEEE/RSJ Int. Conf. Intell. Robot.
Syst. (2002) pp. 2163-2168

73.57 S.K. Agrawal, L. Kissner, M. Yim: Joint solutions of many
degrees-of-freedom systems using dextrous workspaces, Proc. IEEE Int.
Conf. Robot. Autom. (2001) pp. 2480-2485

73.58 M. Bordignon, U.P. Schultz, K. Stoy: Model-based kinematics
generation for modular mechatronic toolkits, Proc. 9th Int. Conf.
Gener. Progr. Compon. Eng. (2010) pp. 157-166

73.59 M. Yim, D. Duff, Y. Zhang: Closed chain motion with large
mechanical advantage, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.
(2001) pp. 318-323

73.60 D.J. Christensen, J. Campbell, K. Stoy: Anatomy-based organization
of morphology and control in self-reconfigurable modular robots, Neural
Comput. Appl. 19(6), 787-805 (2010)

73.61 J. Campbell, P. Pillai: Collective actuation, Int. J. Robot. Res.
27(3/4), 299-314 (2007)

73.62 M. Yim, J. Reich, A. Berlin: Two approaches to distributed
manipulation. In: Distributed Manipulation, ed. by H. Choset,
K. Bohringer (Kluwer Academic, Boston 2000) pp. 237-260

73.63 J. Kubica, A. Casal, T. Hogg: Agent-based control for object
manipulation with modular self-reconfigurable robots, Proc. Int. Jt.
Conf. Artif. Intell. (2001) pp. 1344-1352

73.64 D. Christensen, U.P. Schultz, D. Brandt, K. Stoy: A unified
simulator for self-reconfigurable robots, Proc. IEEE/RSJ Int. Conf.
Intell. Robot. Syst. (2008) pp. 870-876

73.65 N. Jakobi, P. Husbands, I. Harvey: Noise and the reality gap: The
use of simulation in evolutionary robotics, Adv. Artif. Life: Proc.
Third Eur. Conf. Artif. Life (1995) pp. 704-720

73.66 U.P. Schultz, M. Bordignon, K. Stoy: Robust and reversible
execution of self-reconfiguration sequences, Robotica 29(1), 35-57
(2011)

73.67 M.P. Ashley-Rollman, P. Lee, S.C. Goldstein, P. Pillai,
J.D. Campbell: A language for large ensembles of independently
executing nodes, Proc. Int. Conf. Log. Progr. (2009) pp. 265-280

73.68 B.P. Gerkey, R.T. Vaughan, K. Støy, A. Howard, G.S. Sukhatme,
M.J. Mataric: Most valuable player: A robot device server for
distributed control, Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.
(2001) pp. 1226-1231

73.69 M. Quigley, K. Conley, B. Gerkey, J. Faust, T.B. Foote, J. Leibs,
R. Wheeler, A.Y. Ng: ROS: An open-source robot operating system, Proc.
ICRA Workshop Open Source Softw. (2009)

73.70 J.C. Larsen et al.: LocoKit: Robots that move,
http://www.locokit.sdu.dk (2013)

73.71 M. Yim, W.-M. Shen, B. Salemi, D. Rus, M. Moll, H. Lipson,
E. Klavins, G.S. Chirikjian: Modular self-reconfigurable robot systems,
IEEE Robot. Autom. Mag. (2007) pp. 43-52

73.72 S. Murata, H. Kurokawa: Self-Organizing Robots (Springer, Berlin,
Heidelberg 2012)

73.73 K. Støy, D.J. Christensen, D. Brandt: Self-Reconfigurable Robots:
An Introduction (MIT, Cambridge 2010)




74. Probabilistic Modeling of Swarming Systems 


Nikolaus Correll, Heiko Hamann 


This chapter provides an overview of probabilistic modeling of
swarming systems. We first show how population dynamics models can be
derived from the master equation in physics. We then present models
with increasing complexity and with varying degrees of spatial
dynamics. We will first introduce a model for collaboration and show
how macroscopic models can be used to derive optimal policies for the
individual robot analytically. We then introduce two models for
collective decisions; first modeling spatiality implicitly by tracking
the number of robots at specific sites and then explicitly using
a Fokker-Planck equation. The chapter is concluded with open challenges
in combining non-spatial with spatial probabilistic modeling
techniques.


74.1 From Biological to Artificial Swarms
74.2 The Master Equation
74.3 Non-Spatial Probabilistic Models
     74.3.1 Collaboration
     74.3.2 Collective Decisions
74.4 Spatial Models: Collective Optimization
74.5 Conclusion
References


74.1 From Biological to Artificial Swarms


The swarming behavior of ants, wasps, and bees 
demonstrates the emergence of stupendously complex 
spatio-temporal patterns ranging from a swarm finding 
the shortest paths to the assembly of three-dimensional 
structures with intricate architecture and well-regulated 
thermodynamics [74.1,2]. In the bigger scheme of 
things, these systems represent just the tip of the ice- 
berg; their behavior is considerably less complex than 
that of the brain, cities, or galaxies, all of which are es- 
sentially swarming systems (and all of which can be 
reduced to first principles and interactions on atomic 
scale). Yet, social insects make the world of self-organization
accessible to us as they are comparatively
easy to observe. Studying these systems is interesting
from an engineering perspective as they demonstrate 
how collectives can transcend the abilities of the indi- 
vidual member and let the organism as a whole exhibit 
cognitive behavior. 

Cognition is derived from the Latin word cognoscere and means to know,
to recognize, and also to conceptualize. In the human brain, cognition
emerges, to the best of our knowledge, from the complex interactions of
highly connected, large-scale distributed neural activity. We argue
that cognition can manifest itself at multiple different levels of
complexity, ranging from conceptualizing collective decisions such as
assuming a certain shape or deciding between different abstract choices
in social insects to reasoning on complex problems and expressing
emotions in humans, the combination of the latter two often framed as
the Turing test in artificial intelligence. This chapter aims at
developing formal models to capture the characteristic properties of
the simplest cognitive primitives in swarming systems. In particular,
we wish to understand the relationship between the activities of the
individual member of the swarm and the dynamics that arise at the
collective level. The resulting models can be matched to data recorded
from physical systems, used to predict the outcome of a robot's
individual behavior on a larger swarm, and used in an optimization
framework to determine the best parameters that help improve a certain
metric [74.3].

This chapter reviews probabilistic models of three swarming primitives
that are examples of conceptualizations that are exclusively
represented at the collective level: collaboration, collective decision
making, and collective optimization. Guided by examples from social
insects, we present models that generalize to arbitrary agent systems
and can serve as building blocks for more complex systems. The
probabilistic component of the models arises from:


1. The agent’s motion, which often has a random com- 
ponent 

2. Explicit random decisions made by individual 
agents 

3. Random encounters between agents. 


Randomness in an agent's motion can be introduced, for example, by
physical properties such as slip, by deficits in robot hardware, or by
explicitly explorative behavior, e.g., based on random turns. It is,
therefore, reasonable to model at least the single-agent behavior with
probabilistic methods. Yet, it is possible to model the expected
swarm-level behavior using deterministic models. In such a swarm-level
model the underlying stochastic motion of agents is summarized in
macroscopic properties, which are averages such as the expected swarm
fraction in a certain state or at a certain position [74.4-7].

Such probabilistic models are in contrast with deterministic models of
swarming systems, which explicitly model the positions of individual
robots. Representative examples include controllers for flocking
[74.8], consensus [74.9], and optimal sensor distribution for sampling
a given probability density function [74.10]. While the robots' spatial
distribution is explicitly modeled, those models have difficulties
dealing with randomness or robot populations in which robots can be in
different states at the same time.

After providing a brief background on phenomenological probabilistic
models based on the master equation, this chapter will first review
population dynamic models that ignore the spatial distribution of the
individual robots and the swarm and then present models that explicitly
model the spatial distribution of the robot swarm using time-dependent,
spatial probability density functions.


74.2 The Master Equation


Let a robot be in a discrete set of states with probability
$p_i \in P$, with $P$ a vector maintaining the probabilities of all
possible states and $\sum_i p_i = 1$. These states model internal
states of the robot, determined by its program, or external states,
determined by the state of the robot within its environment. Actions of
the robot and environmental effects will change these probabilities.
This is captured by a phenomenological set of first-order differential
equations, also known as the master equation [74.11],

$$\frac{dP}{dt} = A(t)\,P , \tag{74.1}$$

where $A(t)$ is the transition matrix consisting of entries $p_{ij}(t)$
that correspond to the probability of a transition from state $i$ to
state $j$ at time $t$. Multiplying both sides with the total number of
robots $N_0$ allows us to calculate the average number of robots in
each state. For brevity, we write

$$N_i(t) = N_0\,p_i(t) .$$

Similarly, when expanding the master equation for a continuous space
variable, one finds the Fokker-Planck equation, also known as the
Kolmogorov forward equation or the Smoluchowski equation [74.12, 13].
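As a minimal sketch of how the master equation (74.1) can be evaluated numerically, the following Python snippet integrates a two-state system with a forward-Euler scheme. The two states, the rate values, the step size, and the matrix layout (entry `A[j][i]` holds the rate from state `i` to state `j`, with diagonal entries chosen so that every column sums to zero and total probability is conserved) are assumptions made for illustration and are not taken from the chapter.

```python
def integrate_master_equation(A, p0, dt, steps):
    """Forward-Euler integration of dP/dt = A P for a constant matrix A."""
    p = list(p0)
    n = len(p)
    for _ in range(steps):
        # dp[j] = sum_i A[j][i] * p[i]  (matrix-vector product A P)
        dp = [sum(A[j][i] * p[i] for i in range(n)) for j in range(n)]
        p = [p[j] + dt * dp[j] for j in range(n)]
    return p

# Hypothetical two-state robot: state 0 = searching, state 1 = waiting.
k01 = 0.2  # assumed rate searching -> waiting
k10 = 0.1  # assumed rate waiting -> searching

# Columns sum to zero, so the probabilities keep summing to one.
A = [[-k01,  k10],
     [ k01, -k10]]

N0 = 20                                   # total number of robots
p_final = integrate_master_equation(A, [1.0, 0.0], dt=0.01, steps=10000)
N_final = [N0 * p for p in p_final]       # N_i(t) = N0 * p_i(t)
```

For constant rates the probabilities settle at a stationary distribution, and multiplying by `N0` gives the average number of robots per state, as described in the text.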


74.3 Non-Spatial Probabilistic Models 


We will first consider two models that assume the spa- 
tial distribution of the agents in the environment to be 
uniform: collaboration and collective decision. 


74.3.1 Collaboration 


An important swarming primitive is collaboration, which requires
a number of agents ($n$) to get together at a site. Collaboration is
different from the more general task allocation problem, in which the
number of agents is not explicitly specified. In swarm robotics, a site
can have spatial meaning, but can also be understood in an abstract way
as a means to form teams. Although there are many different algorithms
for team formation, we focus on a collaboration mechanism that was
introduced in the stick-pulling experiment [74.14] and turned out to be
a recurrent primitive in swarm robotic systems, e.g., in swarm robotic
inspection, where robots can serve as temporary markers in the
environment [74.15]. Here, collaboration happens when an inspecting
robot encounters a marker, which informs it that this specific area has
already been inspected. The collaboration model, therefore, finds
application in studying trade-offs between serving as memory to the
swarm and actively contributing to the swarming behavior's metric.

In the stick-pulling experiment, $N_0$ robots are concerned with
pulling $M_0$ sticks out of the ground in a bounded environment. This
task requires exactly $n = 2$ robots. Physically, this can be
understood as a stick that is too long to be extracted from the ground
by a single robot. Rather, every robot that grabs the stick can pull it
out a little further and keeps it there until the next robot arrives.
In this work, we abstract the classical stick-pulling experiment to
a generic collaboration model in which robots are simply required to
meet, see also Fig. 74.1. Intuitively, the amount of time spent waiting
for collaboration to happen is a trade-off between (1) waiting at
a site to find a collaborator and (2) having the chance to find
a collaborator oneself by actively browsing the environment. Finding
the collaboration rate that is optimal for a given environment, i.e.,
for a given number of collaboration sites and agents, together with the
individual parameters that lead to it, illustrates how probabilistic
models can be employed to design this process and find optimal
collaboration policies.

The following model is loosely based on the development in [74.16],
which applies discrete time difference equations. For simplicity, we
assume that collaboration happens instantaneously and focus on
a continuous-time representation and stochastic waiting times. The
reader is referred to [74.16] for an extensive treatment of
deterministic waiting times and [74.17] for an extension to $n > 2$
agents. Variables used in the equations that follow are summarized in
Table 74.1.

Fig. 74.1 A collaboration example. $N_0 = 5$ robots (black) in
a bounded environment with $M_0 = 3$ collaboration sites, each
requiring $n = 2$ robots to be present simultaneously for collaboration
to happen

Let $n_s(t)$ with $n_s(0) = N_0$ be the number of searching agents at
time $t \in \mathbb{R}^+$ and $N_0$ the total number of agents. Let
$n_w(t) = N_0 - n_s(t)$ be the number of waiting agents at time $t$.
With $p$ the probability to encounter or match a waiting agent and
$T_r$ the average time an agent will wait for collaboration, we can
write

\begin{align}
\dot{n}_s(t) &= -p\,\bigl(M_0 - n_w(t)\bigr)\,n_s(t) \tag{74.2}\\
&\quad + \frac{1}{T_r}\,n_w(t) + p\,n_s(t)\,n_w(t) . \tag{74.3}
\end{align}

Thus, $n_s(t)$ decreases by the rate at which searching agents
encounter empty collaboration sites (of which there exist
$M_0 - n_w(t)$ at time $t$), and it increases by those agents that
return either from unsuccessful (at rate $1/T_r$) or successful
collaboration, i.e., find any of the $n_w(t)$ waiting agents.
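The dynamics described above can be explored with a short numerical sketch. The Python snippet below integrates the searching-agent equation with forward Euler, using $n_w = N_0 - n_s$. The function name, the parameter values, the step size, and the name `T_r` for the average waiting time are illustrative assumptions, not values from the chapter.

```python
def simulate_collaboration(N0, M0, p, T_r, dt=0.001, t_end=200.0):
    """Forward-Euler integration of the searching-agent dynamics.

    N0: total agents, M0: collaboration sites, p: encounter probability,
    T_r: average waiting time (all values supplied by the caller).
    Returns the final (n_s, n_w).
    """
    n_s = float(N0)                       # n_s(0) = N0: everyone searches
    for _ in range(int(t_end / dt)):
        n_w = N0 - n_s                    # waiting agents
        dn_s = (-p * (M0 - n_w) * n_s     # searchers occupying empty sites
                + n_w / T_r               # waiting agents giving up
                + p * n_s * n_w)          # waiting agents released by a match
        n_s += dt * dn_s
    return n_s, N0 - n_s

# Hypothetical parameter set for illustration only.
n_s, n_w = simulate_collaboration(N0=5, M0=10, p=0.05, T_r=10.0)
rate = 0.05 * n_s * n_w                   # collaboration rate p * n_s * n_w
```

With these assumed values the trajectory settles to a fixed point where the three terms balance, which is the steady state analyzed next in the text.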

In order to maximize the collaboration rate in the system we are
interested in maximizing the rate at which robots return from
successful collaboration, i.e., $c(t) = p\,n_s(t)\,n_w(t)$.

Solving for $\dot{n}_s(t) = 0$ and substituting
$n_w(t) = N_0 - n_s(t)$ allows us to calculate the number of searching
robots at steady state, $n_s^*$,

$$n_s^* = \frac{(2N_0 - M_0)\,pT_r - 1 + \sqrt{8 N_0\,pT_r + \bigl(1 + (M_0 - 2N_0)\,pT_r\bigr)^2}}{4\,pT_r} . \tag{74.4}$$

As $n_w^* = N_0 - n_s^*$ by definition, we can write

$$c^* = p\,\bigl(n_s^* N_0 - n_s^{*2}\bigr) . \tag{74.5}$$

Table 74.1 Notation used in the collaboration model

Symbol      Meaning
$n_s(t)$    Average number of searching agents
$n_w(t)$    Average number of waiting agents
$p$         Probability to encounter/match a waiting agent
$c(t)$      Average rate of collaboration matches
$N_0$       Total number of agents
$M_0$       Total number of collaboration sites
$T_r$       Waiting time




The collaboration rate as a function of $T_r$ and $N_0$ is shown in
Fig. 74.2. By solving $dc^*/dn_s^* = 0$, we can calculate
$n^*_{s,\mathrm{opt}} = \frac{1}{2} N_0$ that maximizes $c^*$.
Substituting $n^*_{s,\mathrm{opt}}$ into (74.4) and solving for $T_r$,
we can calculate the optimal waiting time $T_{r,\mathrm{opt}}$ as

$$T_{r,\mathrm{opt}} = \frac{1}{(M_0 - N_0)\,p} . \tag{74.6}$$

As $T_{r,\mathrm{opt}}$ cannot be negative, an optimal waiting time can
only exist if $M_0 > N_0$. This intuitively makes sense, because if
there are fewer agents than collaboration sites, waiting too long might
consume all agents in waiting states. We can also see that the more
collaboration sites there are, the less an agent should wait. There are
two interesting special cases. First, for $N_0 = M_0$,
$T_{r,\mathrm{opt}}$ is undefined. Second, when collaboration sites
exceed agents by exactly one ($M_0 = N_0 + 1$), $T_{r,\mathrm{opt}}$ is
fully defined by $1/p$. Thus, the higher the likelihood that agents
find a collaboration site, the lower the waiting time should be. In
this case, it makes sense to release agents from wait states so that
they can search for another agent to collaborate with. If this
likelihood is low, however, agents are better off waiting to serve as
collaborators for the few searching agents.
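The optimal waiting time can be cross-checked numerically. The sketch below computes the closed-form optimum $1/((M_0 - N_0)\,p)$ and compares it against a brute-force search over candidate waiting times, using the steady-state collaboration rate from the model. Function names, the candidate grid, and the parameter values are assumptions for illustration; `T_r` denotes the average waiting time.

```python
import math

def collaboration_rate(N0, M0, p, T_r):
    """Steady-state collaboration rate p * n_s* * (N0 - n_s*)."""
    a = p * T_r
    b = (2 * N0 - M0) * a - 1
    n_s = (b + math.sqrt(8 * N0 * a + b * b)) / (4 * a)
    return p * n_s * (N0 - n_s)

def optimal_waiting_time(N0, M0, p):
    """Closed-form optimum; None when no finite positive optimum exists."""
    if M0 <= N0:
        return None
    return 1.0 / ((M0 - N0) * p)

# Hypothetical example: fewer agents than sites, so an optimum exists.
N0, M0, p = 5, 10, 0.05
T_opt = optimal_waiting_time(N0, M0, p)

# Brute-force check on an assumed grid of waiting times 0.5, 1.0, ..., 20.0.
candidates = [0.5 * k for k in range(1, 41)]
best = max(candidates, key=lambda T: collaboration_rate(N0, M0, p, T))

# With as many agents as sites there is no finite optimum.
no_optimum = optimal_waiting_time(10, 10, p)
```

In this example the grid search and the closed form agree, and at the optimum half of the agents are searching, consistent with $n^*_{s,\mathrm{opt}} = N_0/2$.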

With $T_{r,\mathrm{opt}}$ given by (74.6) we can derive the following
guidelines for agent behavior. First, an optimal wait time exists only
if there are fewer agents than collaboration sites. Otherwise, longer
waits improve the chance of collaboration. Second, if the number of
agents, the number of collaboration sites, and the likelihood to
encounter a collaboration site are known to each agent at all times,
e.g., due to global communication or shared memory, agents could
calculate $T_{r,\mathrm{opt}}$ at all times. If these quantities are
not known, however, agents can estimate them based on their
interactions in the environment by observing the rates at which they
encounter collaboration and empty sites. Individual agent learning
algorithms that accomplish this goal are discussed in detail in
[74.18].

Fig. 74.2 The collaboration rate as a function of $T_r$ and $N_0$ for
$M_0 = 10$ collaboration sites. There exists an optimal $T_r$ for
$N_0 < M_0$, whereas the collaboration rate increases steadily
otherwise for increasing values of $T_r$

Table 74.2 Notation used in the collective decision model

Symbol      Meaning
$n_s(t)$    Average number of searching agents
$n_i(t)$    Average number of agents committed to choice $i$
$p_i$       Unbiased probability to select choice $i$
$\tau_i$    Unbiased time to stay with choice $i$
$N_0$       Total number of agents
$M_0$       Total number of choices
$T_i$       Waiting time


74.3.2 Collective Decisions 


Another swarm-intelligent primitive is the collective
decision. Collective decisions can be observed in the
path selection of ants [74.19] or the shelter selection of
cockroaches [74.20] or robots [74.21], but they can also
have a non-spatial meaning, for example when a consensus
on one of $M_0$ different discrete values is needed. An
example of such a situation is shown in Fig. 74.3.
While the above references provide models that
are specific to their application, this chapter provides
a generalized model for collective decisions
that rely on different ways of social amplification,
i.e., a change of the behavior based on the activities
of other swarm members, or the absence
thereof.


Fig. 74.3 Collective decision example. $N_0 = 6$ robots decide
between $M_0 = 2$ choices. Three plus one robots
have already made decisions, two robots remain
undecided




Model parameters for the collective decision model
are summarized in Table 74.2. Let $n_s(t)$ with $n_s(0) = N_0$
be the number of searching/undecided agents at time
$t \in \mathbb{R}^+$ and $N_0$ the total number of agents. Let $p_i$,
$0 < i \le M_0$, be the unbiased probability for an agent to
select value $i$ from $M_0$ different values. This probability is
unbiased as it does not depend on social amplification.
We can then write the following differential equations
for the number of agents $n_i(t)$ that have selected
value $i$

$$\dot{n}_i(t) = R_i(t)\,n_s(t) - Q_i(t)\,n_i(t), \qquad n_i(0) = 0, \tag{74.7}$$

$$n_s(t) = N_0 - \sum_{i=1}^{M_0} n_i(t), \tag{74.8}$$

where $R_i(t)$ and $Q_i(t)$ are the rates at which agents commit
to and discard choice $i$, respectively, and $\tau_i$ is the
average time spent on solution $i$ before resuming search.
$R_i(t), Q_i(t) : n_i(t), n_s(t) \to \mathbb{R}^+$ are functions that
might or might not depend on the number of agents in other
states, thereby making the differential equation for $n_i(t)$
non-linear or linear, respectively. There are four
interesting cases: both $R_i(t)$ and $Q_i(t)$ being constant,
both being functions of one or more states of the system,
e.g., $n_i(t)$ or $n_s(t)$, and the two combinations thereof.

If both $R_i(t)$ and $Q_i(t)$ are constants, one can show
that the number of agents selecting choice $i$ at steady
state, $n_i^*$, is given by

$$n_i^* = \frac{R_i}{Q_i}\,n_s^*, \tag{74.9}$$

with $n_s^*$ the number of agents that remain undecided
at steady state. (This results from agents discarding
choices at rate $1/\tau_i$.) For example, for a two-choice
system, using $n_s^* = N_0 - n_1^* - n_2^*$ yields the steady states

$$n_1^* = \frac{N_0\,Q_2 R_1}{Q_1 Q_2 + Q_2 R_1 + Q_1 R_2}, \tag{74.10}$$

$$n_2^* = \frac{N_0\,Q_1 R_2}{Q_1 Q_2 + Q_2 R_1 + Q_1 R_2}. \tag{74.11}$$

A solution for $R_1 = 0.01$, $R_2 = 0.04$, and $Q_1 = Q_2 =
1/10$ is depicted in Fig. 74.4 and leads to ≈ 7% and
≈ 27% of agents in states one and two, respectively,
while most agents remain undecided. In this system,
the speed at which the steady state is reached depends
on the values of $R_i$, with higher values of $R_i$ leading
to faster decisions, whereas the steady-state number of
undecided agents is determined by $Q_i$, with lower values of
$Q_i$ corresponding to lower values of $n_s^*$. In particular,
values of $Q_i = 1/100$ or $Q_i = 1/1000$ will drastically
increase convergence, in this example to 67% and 78%
for the majority choice, respectively.
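The percentages quoted above can be checked with a few lines of code. The following is a minimal sketch, assuming constant rates and using only the steady-state relation (74.9) together with the conservation condition (74.8); the function name `steady_state` and its argument names are illustrative and do not come from the chapter.

```python
def steady_state(rates_in, rates_out, n_total=1.0):
    """Steady state of the linear model with constant rates.

    Uses n_i* = (R_i / Q_i) * n_s* and n_s* = N_0 - sum_i n_i*.
    Returns (n_s*, [n_1*, n_2*, ...]).
    """
    ratios = [r / q for r, q in zip(rates_in, rates_out)]
    n_s = n_total / (1.0 + sum(ratios))
    return n_s, [ratio * n_s for ratio in ratios]


if __name__ == "__main__":
    # R_1 = 0.01, R_2 = 0.04 as in the text; Q_1 = Q_2 is varied.
    for q in (1 / 10, 1 / 100, 1 / 1000):
        n_s, (n_1, n_2) = steady_state([0.01, 0.04], [q, q])
        print(f"Q = {q:g}: undecided {n_s:.1%}, "
              f"choice 1 {n_1:.1%}, choice 2 {n_2:.1%}")
```

With `n_total=1.0` the returned values are fractions of the swarm, so they can be compared directly with the percentages in the text.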

If $Q_i(t)$ is constant, but $R_i(t)$ is a non-linear function
of the form $R_i(t) = f[n_i(t)]^{\alpha_i}$ with $\alpha_i > 1$ a constant,
we observe $n_i(t)$ to grow faster due to social amplification
of attraction; the larger $n_i(t)$, the larger the positive
influx into $\dot{n}_i(t)$. Systems with this property usually
converge much faster than linear systems. For example,
a system with


(74.12)


shows faster convergence than a linear system for
$\alpha_i > 1$. Here, normalizing social attraction with $N_0$
makes the dynamics independent of the number of
agents. An example with $\alpha_i = 5$ is shown in Fig. 74.4b.

Fig. 74.4a-d Time evolution of a collective decision where
solution two is picked four times as likely as solution one, and
both solutions are re-evaluated after an average of 100 s, for
different non-linear dynamics. Graphs show the fraction of agents
picking solutions one and two (vertical axis: n, 0 to 1000;
horizontal axis: t, 0 to 100). (a) Linear system achieving steady
states of ≈ 20% and ≈ 80%, matching analytical results. (b) Time
evolution of a system with social amplification of attraction
using $R_i$ given by (74.12) and $\alpha_i = 5$ for both choices.
(c) Social amplification of rest using (74.13) and $\beta_i = 5$.
(d) Social amplification of both attraction and rest with
$\alpha_i = \beta_i = 5$

Similarly, if $R_i(t)$ is constant, but $Q_i(t)$ is a non-linear
function of the form $Q_i(t) = f[n_i(t)]^{\beta_i}$ with $\beta_i < 0$
a constant, we observe $n_i(t)$ to grow faster due to social
amplification of rest; the larger $n_i(t)$, the smaller
the out-flux from $\dot{n}_i(t)$. For example, a system with


$$Q_i(t) = \left(1 + \frac{n_i(t)}{N_0}\right)^{\beta_i} \tag{74.13}$$

also shows faster convergence than a linear system. Notice
that we do not consider positive exponents for $\beta_i$,
as this will drive agents away from decisions
exponentially fast and will simply increase $n_s(t)$, i.e., the
number of undecided agents. Results for a two-choice
system with $\beta_i = 5$ are shown in Fig. 74.4c.

Finally, systems that rely on social amplification of both
attraction and rest exhibit the best convergence
when compared with a purely linear system as well
as with systems that rely on only one social amplification
mechanism. Results for a two-choice system with
$\alpha_i = \beta_i = 5$ are shown in Fig. 74.4d.
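The qualitative effect of the two amplification mechanisms can be explored numerically. The following is a minimal sketch that integrates $\dot{n}_i = R_i n_s - Q_i n_i$ with a forward-Euler scheme. The concrete rate functions are assumptions made for illustration only: attraction is taken as $r_i (1 + n_i/N_0)^{\alpha}$ and rest as $q_i (1 + n_i/N_0)^{-\beta}$, so that a positive `beta` reduces the out-flux as $n_i$ grows. The base rates `r` and `q`, the step size, and the function name `simulate` are likewise hypothetical.

```python
def simulate(alpha, beta, r=(0.01, 0.04), q=(0.01, 0.01),
             n_total=1000.0, t_end=100.0, dt=0.01):
    """Forward-Euler integration of dn_i/dt = R_i(t) n_s(t) - Q_i(t) n_i(t).

    Assumed (hypothetical) rate functions:
        R_i(t) = r_i * (1 + n_i(t) / N_0) ** alpha
        Q_i(t) = q_i * (1 + n_i(t) / N_0) ** (-beta)
    alpha = beta = 0 recovers the linear system with constant rates.
    Returns the list [n_1(t_end), n_2(t_end), ...].
    """
    n = [0.0 for _ in r]                      # all agents start undecided
    for _ in range(int(round(t_end / dt))):
        n_s = n_total - sum(n)                # undecided agents
        growth = []
        for n_i, r_i, q_i in zip(n, r, q):
            rate_in = r_i * (1.0 + n_i / n_total) ** alpha
            rate_out = q_i * (1.0 + n_i / n_total) ** (-beta)
            growth.append(rate_in * n_s - rate_out * n_i)
        n = [n_i + dt * g for n_i, g in zip(n, growth)]
    return n


if __name__ == "__main__":
    for label, a, b in (("linear", 0, 0), ("attraction", 5, 0),
                        ("rest", 0, 5), ("both", 5, 5)):
        n_1, n_2 = simulate(a, b)
        print(f"{label:>10}: n_1 = {n_1:6.1f}, n_2 = {n_2:6.1f}")
```

Under these assumptions the majority choice collects a larger share of the swarm at `t_end` when amplification is switched on than in the linear case; the exact numbers depend on the assumed rate functions and should not be read as a reproduction of Fig. 74.4.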

Similar models, i.e., models that rely on non-linear
amplification of attraction, rest, or both, have been
proposed for a series of social insect experiments. For
example, in [74.19] an ant colony is presented with a binary
choice to select the shorter of two branches of
a bridge that connect their nest to a food source. Here,
a model with social amplification of attraction (by
means of an exponentially higher likelihood to choose
a branch with higher pheromone concentration) is chosen
and successfully models the dynamics observed
experimentally. In [74.20] a model that uses social
amplification of rest is chosen to model the behavior of
a swarm of cockroaches deciding between two shelters
of equal size but different brightness. The preference
of cockroaches for darker shelters is expressed with
a higher $p_i$ for this shelter. Convergence to the dark
shelter is then achieved by social amplification of rest,
increasing the time cockroaches remain in a shelter
exponentially with the number of individuals that are
already in the shelter. Here, all cockroaches converge
to a single shelter, even though the model proposed
in [74.20] employs negative social amplification of
attraction by introducing a notion of shelter capacity,
which cancels the positive term in $\dot{n}_i(t)$ when the
shelter reaches a constant carrying capacity. Finally, [74.22]
presents a model for cockroach aggregation in which
the likelihood to join an aggregate of cockroaches
increases with the size of the aggregate, whereas the
likelihood to leave a cluster decreases exponentially
with its size.

The examples from the social insect domain illustrate
trade-offs between the expressiveness of a model and
its complexity. As the true parameter values of $\alpha_i$ and
$\beta_i$ are unknown, the same experimental data can be
accurately matched by models with different dynamics.
For example, social amplification of attraction as
observed on larvae of German cockroaches in [74.22]
was deemed to have negligible influence on mature
American cockroaches selecting shelters with limiting
carrying capacity in [74.20].

With respect to artificial agent and robotic systems, 
the models presented can instead provide design guide- 
lines for achieving a desired convergence rate. At the 
same time, the models are able to support decisions 
on sensing and communication sub-systems that are 
required to implement one or the other social amplifi- 
cation mechanism. 


74.4 Spatial Models: Collective Optimization 


The concept of optimization in collective systems is 
difficult to separate from the concept of collective 
decisions. Rather, there seems to be a continuous tran- 
sition. Collective decisions are made between several 
distinct alternatives, implying a discrete world of op- 
tions (e.g., left and right branches in path selection, 
two shelters, etc.). Typically, one refers to the term op- 
timization in collective systems in the case of tasks 
that allow for a vast (possibly even infinite) num- 
ber of alternatives implying a continuous world of 
options. 

For this optimization scenario we apply the probabilistic
model reported in [74.4, 5, 23-25]. It is based
on a stochastic differential equation (SDE, the Langevin
equation) and a partial differential equation (PDE, the
Fokker–Planck equation), which can be derived from
the former. While the Langevin equation is a stochastic
description of the trajectory in space over time of a single
robot, the Fokker–Planck equation describes the
temporal evolution of the probability density in space
for these trajectories. Hence, it can be interpreted as the
average over many samples of robot trajectories (i.e.,
ensembles of trajectories). A second, more daring
interpretation also arises: we can interpret this probability
density directly as a swarm density, that is, the expected
fraction of the robot swarm for a given area and time.




The deterministic PDE describes the mean swarm frac- 
tion in space and time. Interactions between robots can 
be modeled via dependence on the swarm density it- 
self [74.4]. 

We introduce our formalism (see Table 74.3 for
a summary of all variables used). The Langevin equation
that gives the position of a robot R at time t is

$$\dot{R}(t) = -A(R(t), t) + B(R(t), t)\,F(t), \tag{74.14}$$

where A defines directed motion via drift depending
on the current position R, and $B(R(t), t)F(t)$ defines random
motion based on F, which is a stochastic process
(e.g., white noise). Based on the Langevin equation the
Fokker–Planck equation can be derived [74.4, 11, 12, 26]

$$\frac{\partial \rho(r,t)}{\partial t} = -\nabla\left(A(r,t)\,\rho(r,t)\right) + \frac{1}{2}\,Q\,\nabla^2\left(B^2(r,t)\,\rho(r,t)\right), \tag{74.15}$$

for a swarm density $\rho(r,t)$ (according to the above
interpretation) at position r and time t, a drift term
$-\nabla(A(r,t)\rho(r,t))$ due to directed motion, and a diffusion
term $\frac{1}{2}Q\nabla^2(B^2(r,t)\rho(r,t))$ due to random motion,
where we typically set $Q = 2$ for simplicity.
According to our general approach [74.4] we introduce
a Fokker–Planck equation for each robot state and manage
the transitions between states by rates similar to the
rate equation approach of the above sections.
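To make the role of the Langevin equation concrete, the following minimal sketch integrates a one-dimensional version of (74.14) for a single robot with the Euler-Maruyama scheme. It rests on assumptions that are not stated in the text: F is taken to be Gaussian white noise with correlation $\langle F(t)F(t')\rangle = Q\,\delta(t-t')$, the position is scalar, and the names `simulate_langevin`, `drift`, and `intensity` are illustrative.

```python
import math
import random


def simulate_langevin(drift, intensity, r0=0.0, t_end=1.0, dt=1e-3,
                      q=2.0, rng=None):
    """Euler-Maruyama integration of dR/dt = -A(R, t) + B(R, t) F(t) in 1-D.

    drift(r, t) plays the role of A, intensity(r, t) the role of B.
    F is assumed to be Gaussian white noise with <F(t)F(t')> = q*delta(t-t'),
    so each step adds a Gaussian increment with variance q * dt.
    Returns the position at time t_end.
    """
    if rng is None:
        rng = random.Random()
    r, t = r0, 0.0
    for _ in range(int(round(t_end / dt))):
        noise = rng.gauss(0.0, 1.0) * math.sqrt(q * dt)
        r += -drift(r, t) * dt + intensity(r, t) * noise
        t += dt
    return r


if __name__ == "__main__":
    rng = random.Random(1)
    # Pure random motion (A = 0, constant B): an ensemble of trajectories.
    finals = [simulate_langevin(lambda r, t: 0.0, lambda r, t: 0.5,
                                t_end=1.0, dt=0.01, rng=rng)
              for _ in range(1000)]
    mean = sum(finals) / len(finals)
    var = sum((x - mean) ** 2 for x in finals) / len(finals)
    print(f"ensemble mean {mean:.3f}, ensemble variance {var:.3f}")
```

For A = 0 and constant B, the ensemble variance under these assumptions grows as $Q B^2 t$, which is the spreading that the diffusion term of the Fokker–Planck equation describes for the density of many such trajectories.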

Table 74.3 Notation used in the optimization model

Symbol     Meaning
R          Robot position
A          Direction and intensity of robots' directed motion
B          Intensity of robots' random motion
F          Stochastic process (fluctuating directions)
r          Point in space
Q          Theoretic term describing intensity of collisions
$\rho_s$   Expected density of robots in state stopped
$\rho_m$   Expected density of robots in state moving
w          Waiting time
$\varphi$  Rate of stopping robots

The optimization scenario considered here was inspired
by the behavior of young honeybees. The algorithm,
which defines the robots' behavior, is derived
from a behavioral model of honeybees [74.27, 28].
Honeybees younger than 24 h stay in the hive,
cannot yet fly, navigate towards spots of a preferred
warmth of 36 °C, and stay mostly inactive. An interesting
example of swarm-intelligent behavior is how they
search for and find the temperature that their bodies
need. It turns out that they do not seem to perform a gradient
ascent in the temperature field but rather a correlated
random walk with inactive periods triggered by social
interaction. Both the above-mentioned behavioral
model and the robot controller (called BEECLUST)
are defined by the following:


1. Each robot moves straight until it perceives an obstacle
Ω within sensor range.

2. If Ω is a wall, the robot turns away and continues
with step 1.

3. If Ω is another robot, the robot measures the local
temperature. The higher the temperature, the
longer the robot stays stopped. When the waiting
time elapses, the robot turns away from the other robot
and continues with step 1.
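The three steps above can be sketched as a single decision function. This is only an illustration of the control logic, not the controller used by the authors: the obstacle classification, the action labels, and in particular the linear map from temperature to waiting time (with the hypothetical parameters `t_min`, `t_max`, and `w_max`) are assumptions, since the text only states that higher temperatures lead to longer waits.

```python
from enum import Enum


class Obstacle(Enum):
    NONE = 0
    WALL = 1
    ROBOT = 2


def waiting_time(temperature, t_min=20.0, t_max=36.0, w_max=60.0):
    """Hypothetical monotone map from local temperature (deg C) to a
    waiting time in seconds: 0 s at t_min, w_max at t_max and above."""
    fraction = (temperature - t_min) / (t_max - t_min)
    return w_max * min(1.0, max(0.0, fraction))


def beeclust_step(obstacle, temperature):
    """Return (action, wait_seconds) for one control decision."""
    if obstacle is Obstacle.NONE:
        return "move_straight", 0.0                 # step 1
    if obstacle is Obstacle.WALL:
        return "turn_away", 0.0                     # step 2
    # step 3: another robot was detected; stop, wait, then turn away
    return "stop_then_turn_away", waiting_time(temperature)


if __name__ == "__main__":
    print(beeclust_step(Obstacle.NONE, 25.0))
    print(beeclust_step(Obstacle.WALL, 25.0))
    print(beeclust_step(Obstacle.ROBOT, 32.0))
    print(beeclust_step(Obstacle.ROBOT, 36.0))
```

Note that the temperature is only read after a robot-to-robot encounter; the motion itself never uses the temperature field, which is the property exploited in the model that follows.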


The temperature field that we investigate in the sce- 
nario here has one global optimum (36 °C) at the right 
end of the arena and one local optimum (32 °C) at the 
left end of the arena. In analogy to the behavior ob- 
served in young honeybees, the swarm is desired to 
aggregate fully at the global optimum but, at the same 
time, should also stay flexible within a possibly dy- 
namic environment. The latter is implemented by robots 
(bees) that leave the cluster from time to time and ex- 
plore the remaining arena. If a more preferable spot 
were to emerge elsewhere they would start to aggregate 
there, and the former cluster might shrink in size and 
finally vanish fully. 

Now we apply the above modeling approach to
this scenario. We have two states: moving and stopped.
It turns out that in the moving state we do not have
any directed motion; hence, we turn off the bias
in the Langevin equation ((74.14), A = 0). Without any
directed motion in BEECLUST (no gradient ascent; the
movement is in fact fully independent of the temperature
field), the Fokker–Planck equation can be reduced
to a mere diffusion equation in order to model the moving
robots

$$\frac{\partial \rho(r,t)}{\partial t} = \nabla^2\left(B^2(r,t)\,\rho(r,t)\right). \tag{74.16}$$

This equation is our approach for state moving, as yet
without addressing state transition rates.

The state stopped is even easier to model as it naturally
lacks motion. It can therefore be viewed as
a reduction to a mere rate equation defined at each
position r. The state transition rates are defined by
a stopping rate $\varphi$, which can be determined, for example,
empirically or by geometrical investigations (e.g.,
calculation of collision probabilities) [74.4]. For the
stopped state we obtain

$$\frac{\partial \rho_s(r,t)}{\partial t} = \rho_m(r,t)\,\varphi - \rho_m(r,t-w(r))\,\varphi, \tag{74.17}$$

for a stopping swarm fraction $\rho_m(r,t)\varphi$ at spot r and
time t, and an awakening swarm fraction
$\rho_m(r,t-w(r))\varphi$. The robots stop and wait for a time
period $w(r)$, which depends on the temperature at spot r.

Here, we choose to approximate the robots' correlated
random walk as mere diffusion in a rough estimation.
The function B in (74.16) is reduced to a diffusion
constant D. We add the rates of stopping/awakening and
obtain the equation for state moving

$$\frac{\partial \rho_m(r,t)}{\partial t} = D\,\nabla^2 \rho_m(r,t) - \rho_m(r,t)\,\varphi + \rho_m(r,t-w(r))\,\varphi. \tag{74.18}$$

If we ignore diffusion and focus on one point in space
we would have a mere rate equation similar to the above
sections (except for the time delay)

$$\dot{\rho}_m(t) = -\rho_m(t)\,\varphi + \rho_m(t-w(r))\,\varphi. \tag{74.19}$$


Using (74.18) ((74.17) is mathematically not necessary) 
we can model the BEECLUST behavior. For a provided 
initial distribution of the robots we end up with an initial 
value problem for a PDE that we can solve numerically. 
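As a simplified illustration of the numerical treatment, the sketch below integrates only the single-point rate equation (74.19) with a forward-Euler scheme and a fixed delay; it does not solve the full PDE (74.18). The initial condition (all robots moving at t = 0 and no robot stopped before t = 0), the parameter values, and the function name `integrate_delay_rate` are assumptions made for this example.

```python
def integrate_delay_rate(phi, w, t_end, dt=0.01):
    """Forward-Euler integration of the delay rate equation (74.19):

        d rho_m / dt = -phi * rho_m(t) + phi * rho_m(t - w)

    Assumptions: rho_m(0) = 1 (all robots moving) and rho_m(t) = 0 for
    t < 0 (no robot was stopped before the start).
    Returns the list of rho_m values at t = 0, dt, 2*dt, ..., t_end.
    """
    delay_steps = int(round(w / dt))
    steps = int(round(t_end / dt))
    rho = [1.0]
    for k in range(steps):
        # robots that stopped w time units ago wake up now
        delayed = rho[k - delay_steps] if k >= delay_steps else 0.0
        rho.append(rho[k] + dt * phi * (delayed - rho[k]))
    return rho


if __name__ == "__main__":
    phi, w = 0.5, 4.0
    rho = integrate_delay_rate(phi, w, t_end=200.0)
    print(f"moving fraction at t = 200: {rho[-1]:.4f}")
    print(f"1 / (1 + phi * w)         : {1.0 / (1.0 + phi * w):.4f}")
```

Under the stated initial condition the total swarm fraction is conserved, so the moving fraction settles at $1/(1+\varphi w)$ and the stopped fraction at $\varphi w/(1+\varphi w)$: longer waiting times (warmer spots) hold a larger share of the robots.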


Fig. 74.5a-f Comparison of histograms of 
swarm density obtained by an agent-based 
simulation and the corresponding model 
based on (74.18) for different times and an 
initial state with equal distribution of robots. 
An optimal temperature peak of 36 °C is at 
the right end of the arena, at the left end there 
is a suboptimal peak in temperature of 32°C, 
the middle part is cooler. We observe that 

on average at first clusters form at both ends 
of the arena, but later those on the left van- 
ish. Swarm size is N = 25. The histograms 
obtained by simulation are based on 10° sam- 
ples. (a) Simulation, t = 30; (b) model, t = 
30; (c) simulation, t = 130; (d) model, t = 
130 (e) simulation, t = 200; (f) model, t = 
200 


The solution of this initial value problem is the tem- 
poral evolution of the swarm density. In Fig. 74.5 we 
compare the model to the results obtained by a simple 
agent-based simulation of BEECLUST. This compari- 
son is meant to be qualitative only. The model catches 
most of the qualitative features that occur in simulation, 
although we approximate the robots’ motion in a rough 
estimation by diffusion. 

Our approach shows how borders between the fields 
of engineering and biology vanish in swarm robotics. 
The BEECLUST algorithm is at the same time a con- 
troller for robots but also a behavioral model of an 
animal. The same Fokker—Planck model is used to 
model the macroscopic behavior of honeybees and 
robot swarms. 

The Fokker—Planck model gives good estimates 
for expected swarm densities in space, the tran- 
sient/asymptotic behavior of the swarm, and density 
flows. Modeling space explicitly allows for specific in- 
vestigations such as objective areas and obstacles of 
certain shapes. Other case studies included an emer- 
gent taxis task which relies on one group of robots that 
is pushing another group by collision avoidance [74.4], 
a collective perception task in which robots have to dis- 
criminate aggregation areas of different sizes [74.29], 
and a foraging task [74.30]. This model is mostly rele- 
vant to scenarios with spatially inhomogeneous swarm 
densities, that is, swarms forming particular spatial 
structures that cannot be averaged over several runs. 




74.5 Conclusion 


We presented mathematical models for three distributed
swarming behaviors: collaboration, deciding between
different choices, and optimization. Each of these
processes is a collective decision of increasing complexity.

While the behaviors and trajectories of individual
robots might be erratic and probabilistic, the average
swarm behavior might be considered deterministic.
This holds for both the models and the observed reality
in robotic and biological experiments. An analogy
is the distinction between the complex, microscopic
dynamics of multi-particle systems and the much simpler
properties of the corresponding ensembles of such
systems in thermodynamics. This insight is important
as it allows us to design the individual behavior so
that the expected value of collective performance is
maximized.

Although we presented models with an increasing
level of spatiality, from collaboration sites in the
environment to modeling the distribution of robots over
continuous space, modeling swarming systems with
heterogeneous spatial and state distributions using
closed-form expressions is still a major challenge. A better
understanding of swarming systems with non-uniform
spatial distributions will help us to better understand
the impact of environmental patterns such as terrain,
winds, or currents, thereby enabling swarm engineering
for a series of real-world applications that swarming
systems have yet to tackle.


References


74.1 E. Bonabeau, M. Dorigo, G. Theraulaz: Swarm Intelligence: From Natural to Artificial Systems, SFI Studies in the Science of Complexity (Oxford Univ. Press, New York 1999)

74.2 S. Camazine, J.-L. Deneubourg, N.R. Franks, J. Sneyd, G. Theraulaz, E. Bonabeau: Self-Organization in Biological Systems, Princeton Studies in Complexity (Princeton Univ. Press, Princeton 2001)

74.3 N. Correll, A. Martinoli: Towards optimal control of self-organized robotic inspection systems, 8th Int. IFAC Symp. Robot Control (SYROCO), Bologna (2006)

74.4 H. Hamann: Space-Time Continuous Models of Swarm Robotics Systems: Supporting Global-to-Local Programming (Springer, Berlin, Heidelberg 2010)

74.5 A. Prorok, N. Correll, A. Martinoli: Multi-level spatial modeling for stochastic distributed robotic systems, Int. J. Robot. Res. 30(5), 574-589 (2011)

74.6 D. Milutinovic, P. Lima: Cells and Robots: Modeling and Control of Large-Size Agent Populations (Springer, Berlin, Heidelberg 2007)

74.7 A. Kettler, H. Wörn: A framework for Boltzmann-type models of robotic swarms, Proc. IEEE Swarm Intell. Symp. (SIS'11) (2011) pp. 131-138

74.8 A. Jadbabaie, J. Lin, A.S. Morse: Coordination of groups of mobile autonomous agents using nearest neighbor rules, IEEE Trans. Autom. Control 48(6), 988-1001 (2003)

74.9 R. Olfati-Saber, R. Murray: Consensus problems for networks of dynamic agents with switching topology and time-delays, IEEE Trans. Autom. Control 49, 1520-1533 (2004)

74.10 J. Cortés, S. Martínez, T. Karatas, F. Bullo: Coverage control for mobile sensing networks, IEEE Trans. Autom. Control 20(2), 243-255 (2004)


74.11 N.G. van Kampen: Stochastic Processes in Physics and Chemistry (Elsevier, Amsterdam 1981)

74.12 H. Haken: Synergetics - An Introduction (Springer, Berlin, Heidelberg 1977)

74.13 F. Schweitzer: Brownian Agents and Active Particles. On the Emergence of Complex Behavior in the Natural and Social Sciences (Springer, Berlin, Heidelberg 2003)

74.14 A.J. Ijspeert, A. Martinoli, A. Billard, L. Gambardella: Collaboration through the exploitation of local interactions in autonomous collective robotics: The stick pulling experiment, Auton. Robot. 11, 149-171 (2001)

74.15 N. Correll, A. Martinoli: Modeling and analysis of beaconless and beacon-based policies for a swarm-intelligent inspection system, Proc. 2005 IEEE Int. Conf. Robot. Autom. (ICRA 2005) (2005) pp. 2477-2482

74.16 A. Martinoli, K. Easton, W. Agassounon: Modeling of swarm robotic systems: A case study in collaborative distributed manipulation, Int. J. Robot. Res. 23(4), 415-436 (2004)

74.17 K. Lerman, A. Galstyan, A. Martinoli, A.-J. Ijspeert: A macroscopic analytical model of collaboration in distributed robotic systems, Artif. Life 7(4), 375-393 (2001)

74.18 L. Li, A. Martinoli, Y. Abu-Mostafa: Learning and Measuring Specialization in Collaborative Swarm Systems, Adapt. Behav. 12(3/4), 199-212 (2004)

74.19 J.-L. Deneubourg, S. Aron, S. Goss, J.M. Pasteels: The self-organizing exploratory pattern of the argentine ant, J. Insect Behav. 3, 159-168 (1990)

74.20 J. Halloy, J.-M. Amé, G.S.C. Detrain, G. Caprari, M. Asadpour, N. Correll, A. Martinoli, F. Mondada, R. Siegwart, J.-L. Deneubourg: Social integration of robots in groups of cockroaches to control self-organized choice, Science 318(5853), 1155-1158 (2009)

74.21 S. Garnier, C. Jost, R. Jeanson, J. Gautrais, M. Asadpour, G. Caprari, J.-L. Deneubourg, G. Theraulaz: Collective decision-making by a group of cockroach-like robots, 2nd IEEE Swarm Intell. Symp. (SIS) (2005)

74.22 R. Jeanson, C. Rivault, J.-L. Deneubourg, S. Blanco, R. Fournier, C. Jost, G. Theraulaz: Self-organized aggregation in cockroaches, Anim. Behav. 69, 169-180 (2005)

74.23 H. Hamann, H. Wörn: A framework of space-time continuous models for algorithm design in swarm robotics, Swarm Intell. 2(2-4), 209-239 (2008)

74.24 H. Hamann, H. Wörn, K. Crailsheim, T. Schmickl: Spatial macroscopic models of a bio-inspired robotic swarm algorithm, IEEE/RSJ 2008 Int. Conf. Intell. Robot. Syst. (IROS'08), Los Alamitos (2008) pp. 1415-1420

74.25 T. Schmickl, H. Hamann, H. Wörn, K. Crailsheim: Two different approaches to a macroscopic model of a bio-inspired robotic swarm, Robot. Auton. Syst. 57(9), 913-921 (2009)

74.26 J.L. Doob: Stochastic Processes (Wiley, New York 1953)

74.27 T. Schmickl, R. Thenius, C. Möslinger, G. Radspieler, S. Kernbach, K. Crailsheim: Get in touch: Cooperative decision making based on robot-to-robot collisions, Auton. Agents Multi-Agent Syst. 18(1), 133-155 (2008)

74.28 S. Kernbach, R. Thenius, O. Kornienko, T. Schmickl: Re-embodiment of honeybee aggregation behavior in an artificial micro-robotic swarm, Adapt. Behav. 17, 237-259 (2009)

74.29 H. Hamann, H. Wörn: A space- and time-continuous model of self-organizing robot swarms for design support, 1st IEEE Int. Conf. Self-Adapt. Self-Organ. Syst. (SASO'07), Boston, Los Alamitos (2007) pp. 23-31

74.30 H. Hamann, H. Wörn: An analytical and spatial model of foraging in a swarm of robots, Lect. Notes Comput. Sci. 4433, 43-55 (2007)


Part G Hybrid Systems

Ed. by Oscar Castillo, Patricia Melin


75 A Robust Evolving Cloud-Based Controller
Plamen P. Angelov, Bailrigg, Lancaster, UK
Igor Škrjanc, Ljubljana, Slovenia
Sašo Blažič, Ljubljana, Slovenia

76 Evolving Embedded Fuzzy Controllers
Oscar H. Montiel Ross, Mesa de Otay, Tijuana, Mexico
Roberto Sepúlveda Cruz, Mesa de Otay, Tijuana, Mexico

77 Multiobjective Genetic Fuzzy Systems
Hisao Ishibuchi, Osaka, Japan
Yusuke Nojima, Osaka, Japan

78 Bio-Inspired Optimization of Type-2 Fuzzy Controllers
Oscar Castillo, Tijuana, Mexico

79 Pattern Recognition with Modular Neural Networks and Type-2 Fuzzy Logic
Patricia Melin, Chula Vista, USA

80 Fuzzy Controllers for Autonomous Mobile Robots
Patricia Melin, Chula Vista, USA
Oscar Castillo, Tijuana, Mexico

81 Bio-Inspired Optimization Methods
Fevrier Valdez, Tijuana, Mexico


75. A Robust Evolving Cloud-Based Controller 


Plamen P. Angelov, Igor Škrjanc, Sašo Blažič 


In this chapter a novel online self-evolving cloud-based
controller, called Robust Evolving Cloud-based
Controller (RECCo), is introduced. This type
of controller has a parameter-free antecedent
(IF) part, a locally valid PID consequent part, and
a center-of-gravity based defuzzification. A first-order
learning method is applied to the consequent
parameters, and reference model adaptive control
is used locally in the ANYA type fuzzy rule-based
system. An illustrative example is provided
mainly as a proof of concept. The proposed controller
can start with no pre-defined fuzzy rules
and does not need the range of the output, the
number of rules, the membership functions,
or connectives such as AND, OR to be pre-defined.
The RECCo controller learns autonomously from its own
actions while controlling the plant. It does not use any
off-line pre-training or explicit models (e.g., in
the form of differential equations) of the plant.
It has been demonstrated that it is possible to
generate and self-tune/learn a non-linear controller
structure and evolve it in online mode fully
autonomously and in an unsupervised manner (based only
on the data density and on selecting representative
prototypes/focal points from the control hypersurface
acting as a data space). Moreover,
the results demonstrate that this autonomous
controller has no parameters in the antecedent
part and surpasses both traditional PID controllers,
being a non-linear, fuzzy combination of locally
valid PID controllers, and traditional fuzzy
(Mamdani and Takagi–Sugeno) type controllers,
owing to its lean structure and higher performance,
its lack of membership functions and antecedent
parameters, and because it does not need off-line
tuning.


75.1 Overview of Some Adaptive and Evolving Control Approaches
75.2 Structure of the Cloud-Based Controller
75.3 Evolving Methodology for RECCo
75.3.1 Online Adaptation of the Rule Consequents
75.3.2 Evolution of the Structure: Adding New Clouds
75.4 Simulation Study
75.5 Conclusions
References


75.1 Overview of Some Adaptive and Evolving Control Approaches 


Fuzzy logic controllers were proposed some four
decades ago by Mamdani and Assilian [75.1]. Their
main advantage is that they do not require the model
of the plant to be known, and their linguistic form is
closer to the way human reasoning is expressed and
formalized. It is difficult to identify all possible events
or the frequency of their occurrence while modeling
a system. The lack of this knowledge requires the use of an
approximate model of the system.


Due to the fact that a fuzzy logic algorithm has the 
characteristic of a universal approximator, it is possible 
to model systems containing unknown nonlinearities 
using a set of IF-THEN fuzzy rules. 

The main challenges in designing conventional 
fuzzy controllers are that they are sometimes designed 
to work in certain modeling conditions [75.2]. More- 
over, fuzzy controllers include at least two parameters 
per fuzzy set, which are usually predefined in advance 




and tuned off-line [75.3]. Many techniques have been
presented for auto-tuning of the parameters of controllers
in batch mode [75.4], mostly using genetic
algorithms [75.5, 6] or neural networks [75.7, 8] off-line.
From a practical point of view, however, there
is no guarantee that pre-trained parameters yield
satisfactory performance in online applications when the
environment or the object of the controller changes. To
tackle this problem several approaches have been proposed
for online adaptation of fuzzy parameters [75.9-15].

Nevertheless, only a few approaches have been 
introduced for online adaptation of fuzzy controller 
structures when no prior knowledge of the system is 
available. Evolving fuzzy rule-based controllers were 
introduced in 2001 by Angelov et al. [75.16]. They al- 
low the controller structure (fuzzy rules, fuzzy sets, 
membership functions, etc.) to be created based on 
data collected online. This is based on a combination 
of inverse plant dynamic modeling [75.17] using self- 
evolving fuzzy rule-based systems [75.18]. The pro- 
posed approach is applied to autonomously learning 
controllers that are self-designed online. 

The advantage of this method is that there is no 
need for pre-tuning of the control parameters. More- 
over, the proposed method can start with an empty 
topology, and the structure of the controller is modified 
online based on the data obtained during the opera- 
tion of the closed loop system. Two main phases were 
introduced for parameter learning of the controller’s 
consequents and modifying the structure of the con- 
troller. The proposed approach was successfully applied 
to a nonlinear servo system consisting of a DC motor 
and showed satisfactory performance [75.19], as well 
as to control of mobile robots [75.20]. The drawback of 
this approach is that the addition of new membership 
functions increases the number of rules exponentially, 
and each membership requires at least two parameters 
in the antecedent part to be specified, plus connec- 
tives such as AND, OR, NOT. Many of these problems 
have been overcome with the latest version of the ap- 
proach [75.21], which combines the Angelov—Yager 
(ANYA) type fuzzy rule-based system (FRB) [75.22] 
with the inverse plant dynamics model. ANYA can 
be seen as the next form of FRB system types after 
the two well-known Mamdani and Takagi—Sugeno type 
FRBs. It does not require the membership functions 
to be defined for the antecedent part, nor the connec- 
tives such as AND, OR, NOR. It still has a linguistic 
form and is non-linear. It is fuzzy in terms of the
defuzzification. In order to clarify what that means, let us
recall that all three types of FRB (Mamdani, Takagi–Sugeno,
and ANYA) can be represented as a set of
fuzzy rules of the form IF (antecedent) THEN
(consequent). While in the Mamdani type FRB both
antecedent and consequent parts are fuzzy, in the so- 
called Takagi-Sugeno type FRB the consequent part 
is a functional, f(x) (most often linear) with the an- 
tecedent part being fuzzy. In both types of FRB the 
defuzzification can be either of so-called center-of- 
gravity (COG) or winner takes all (WTA) type. There 
are variations such as few winners take all, etc., but usu- 
ally COG is applied unless a classification problem is 
considered, where WTA performs better. In ANYA type 
FRBs the antecedent part is defined using an alterna- 
tive, density-based representation which is parameter- 
free and reduces the problems of definition and tuning 
of membership functions (one of the stumbling blocks 
in the application of the fuzzy set theory overall). The 
consequent part of the ANYA type FRB can still be the
same as in Takagi-Sugeno type FRBs. For more detail,
the reader is referred to [75.21, 22]. The so-called
SPARC self-evolving controller, however, has poorer
performance in the first moments when applied from
scratch (with no pre-trained model and no rules). In this
chapter, model reference control and gradient-based
learning of the consequents of the individual locally
valid rules are therefore proposed. In the proposed method the antecedent part
is determined using focal points/prototypes (selected 
descriptive actual data points) instead of pre-defining 
the membership functions in an explicit manner. The 
fuzzy rules are formed around selected representative 
points from the control surface; thus, there is no need 
to define the membership functions per variable. It has 
a much simplified antecedent part which is formed us- 
ing so-called data clouds. Fuzzy data clouds are fuzzy 
sets of data samples which have no specific shape, pa- 
rameters, or boundaries. With ANYA type FRBs the 
relative density is used to define the relative member- 
ship to a particular cloud. It takes into account the 
distance to all previous data samples and can be cal- 
culated recursively. 

In order to show the effective performance of the 
proposed controller, it is applied to a simulated problem 
of temperature control in a water bath [75.21]. 

The remainder of this chapter is organized as follows.
In Sect. 75.2 the new simplified FRB system
is introduced, including rule representation and the
associated inference process. The evolving methodology
used for the online learning of both the structure
and the parameters of RECCo is described in
Sect. 75.3. First, we present the mechanism for the
online adaptation of the consequents in Sect. 75.3.1
and then we illustrate the structure evolution process
in Sect. 75.3.2. In Sect. 75.4, the simulation example
is presented as a proof of concept for the proposed
methodology. Finally, conclusions are drawn in
Sect. 75.5.


75.2 Structure of the Cloud-Based Controller 


ANYA [75.22] is a recently proposed type of FRB 
system characterized by the use of non-parametric 
antecedents. Unlike traditional Mamdani and Takagi- 
Sugeno FRB systems, ANYA does not require an ex- 
plicit definition of fuzzy sets (and their corresponding 
membership functions) for each input variable. On the 
contrary, ANYA applies the concepts of fuzzy data 
clouds and relative data density to define antecedents 
that represent exactly the real data density and distri- 
bution and that can be obtained recursively from the 
streaming data online. 

Data clouds are subsets of previous data samples 
with common properties (closeness in the data space). 
Contrary to traditional membership functions, they rep- 
resent directly and exactly all the previous data samples. 
Some given data can belong to all the data clouds with 
a different degree $\gamma \in [0, 1]$; thus the fuzziness in the
model is preserved and used in the defuzzification, as 
will be shown later. It is important to stress that clouds 
are different from traditional clusters in that they do not 
have specific shapes and, thereby, do not require the 
definition of boundaries. 

ANYA was first used in [75.21] to design fuzzy
controllers for situations in which the lack of knowledge
about the plant makes it difficult to define the rule
antecedents. SPARC autonomous controllers
have a rule base with $N$ rules of the following form

$$ R^i : \text{IF } (x \sim X^i) \text{ THEN } (u^i) , \tag{75.1} $$

where $\sim$ denotes the fuzzy membership expressed
linguistically as "is associated with", $X^i \in \mathbb{R}^n$ is the $i$-th data
cloud defined in the input space, $x = [x_1, x_2, \ldots, x_n]^T$ is
the controller’s input vector, and $u^i$ is the control action
defined by the $i$-th rule.

It is to be noted that no aggregation operator is required
to combine premises of the form IF $x_j$ is $X_j$, as
in traditional fuzzy systems. All the remaining components
of the FRB system (e.g., the consequents and the
defuzzification method) can be selected as in any of the 
traditional fuzzy systems. 

A rule base of the form (75.1) can describe complex,
generally non-linear, non-stationary, non-deterministic
systems that can be only observed through their inputs
and outputs. Hence, autonomous controllers based on
ANYA type FRB systems are suitable to describe dependence
of the type IF $X$ THEN $U$ based on the history
of pairs of data observations of the form $z_j = [x_j^T; u_j]^T$
(with $j = 1, \ldots, k-1$ and $z \in \mathbb{R}^{n+1}$), and the current
$k$-th input $x_k$.

The degree of membership of the data sample $x_k$
to the cloud $X^i$ is measured by the normalized relative
density as follows

$$ \lambda_k^i = \frac{\gamma_k^i}{\sum_{j=1}^{N} \gamma_k^j} , \quad i = 1, \ldots, N , \tag{75.2} $$

where $\gamma_k^i$ is the local density of the $i$-th cloud for that
data sample.

This local density is defined by a suitable kernel
over the distance between $x_k$ and all the other samples
in the cloud, i.e.,

$$ \gamma_k^i = K\left( \sum_{j=1}^{M^i} d_{kj}^i \right) , \quad i = 1, \ldots, N , \tag{75.3} $$

where $d_{kj}^i$ denotes the distance between the data samples
$x_k$ and $x_j$, and $M^i$ is the number of input data samples
associated with the cloud $X^i$.

In a similar manner, we consider that a sample is
associated with the cloud with the highest local density.
In addition, we use the Euclidean distance, i.e.,
$d_{kj}^i = \|x_k - x_j\|^2$. Nonetheless, any other type of distance could
also be used [75.22].

In this study we used a Cauchy kernel. Thereby,
(75.3) can be recursively determined as follows [75.23]

$$ \gamma_k^i = \frac{1}{1 + \|x_k - \mu_k^i\|^2 + \Sigma_k^i - \|\mu_k^i\|^2} , \tag{75.4} $$

where $\gamma_k^i$ denotes the relative density to the $i$-th data
cloud calculated in the $k$-th time instant, and $\Sigma_k^i$ denotes
the scalar product of the data $x_k$

$$ \Sigma_k^i = \frac{k-1}{k} \Sigma_{k-1}^i + \frac{1}{k} \|x_k\|^2 , \tag{75.5} $$

with starting condition $\Sigma_1^i = \|x_1\|^2$. The update of the
mean value $\mu_k^i$ is straightforward

$$ \mu_k^i = \frac{k-1}{k} \mu_{k-1}^i + \frac{1}{k} x_k , \tag{75.6} $$

with the starting condition $\mu_1^i = x_1$.
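The recursion in (75.4) to (75.6) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the class name `CloudDensity`, its methods, and the helper `brute_force_density` are hypothetical, samples are plain lists of floats, and the counter `k` is assumed to be the number of samples absorbed by the cloud so far.

```python
class CloudDensity:
    """Recursive local density of one data cloud, following (75.4)-(75.6)."""

    def __init__(self, x1):
        # Starting conditions: mu_1 = x_1 and Sigma_1 = ||x_1||^2.
        self.k = 1
        self.mu = list(x1)
        self.sigma = sum(v * v for v in x1)

    def update(self, x):
        """Absorb a new sample into the running mean and mean squared norm."""
        self.k += 1
        w = 1.0 / self.k
        self.mu = [(1.0 - w) * m + w * v for m, v in zip(self.mu, x)]
        self.sigma = (1.0 - w) * self.sigma + w * sum(v * v for v in x)

    def density(self, x):
        """Cauchy-kernel density of x with respect to the stored statistics."""
        dist2 = sum((v - m) ** 2 for v, m in zip(x, self.mu))
        mu2 = sum(m * m for m in self.mu)
        return 1.0 / (1.0 + dist2 + self.sigma - mu2)


def brute_force_density(x, samples):
    """Same quantity computed from all stored samples (for comparison only)."""
    mean_d2 = sum(
        sum((a - b) ** 2 for a, b in zip(x, s)) for s in samples
    ) / len(samples)
    return 1.0 / (1.0 + mean_d2)


samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
cloud = CloudDensity(samples[0])
for s in samples[1:]:
    cloud.update(s)
```

The point of the recursive form is that only the mean and one scalar per cloud are stored; the comparison with `brute_force_density` shows that no information relevant to the density is lost by discarding the individual samples.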

As for the defuzzification, it is to be noted that
ANYA can work with both Mamdani and Takagi-Sugeno-Kang
(TSK) consequents. In this case, we use
the latter type, as is usual in control applications [75.19,
24-26]. Hence, if we consider the weighted average for
the defuzzification, the output of the ANYA controller is

$$ u_k = \sum_{i=1}^{N} \lambda_k^i u_k^i = \frac{\sum_{i=1}^{N} \gamma_k^i u_k^i}{\sum_{i=1}^{N} \gamma_k^i} , \tag{75.7} $$

where $u_k^i$ denotes the $i$-th rule consequent.
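As a small illustration of (75.2) and (75.7), the weighted-average defuzzification can be written as follows. The function name `defuzzify` and the list-based interface are assumptions made for this sketch; the local densities are taken as already computed and strictly positive, which holds for the Cauchy kernel.

```python
def defuzzify(densities, consequents):
    """Weighted-average output (75.7) from local densities and rule outputs.

    densities   -- local densities gamma_k^i, one per cloud (all > 0)
    consequents -- rule outputs u_k^i, one per cloud
    Returns the controller output u_k and the normalized weights lambda_k^i.
    """
    total = sum(densities)
    lambdas = [g / total for g in densities]  # normalized densities (75.2)
    u = sum(lam * u_i for lam, u_i in zip(lambdas, consequents))
    return u, lambdas


u_k, lambdas = defuzzify([0.5, 0.25, 0.25], [2.0, 4.0, 8.0])
```

Because the weights sum to one, the output always lies between the smallest and the largest rule consequent.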

From a local point of view, the goal of the controller
is to bring the plant’s output from its current
value to the desired reference value as soon as possible,
i.e., ideally $y_{k+1} = r_k$, where $k$ and $k+1$ represent
consecutive control steps. It is well known that it is not
possible to do this immediately due to several limitations
in the system (most notably the actuator ones).
A useful practice is to introduce a reference model that
represents the desired closed-loop dynamics of the controlled
system [75.27]. The simplest choice is to use
a linear reference model of the first order. Then the prediction
of the reference output $y^r$ can be obtained using
the following equation

$$ y_{k+1}^r = a_r y_k^r + (1 - a_r) r_k , \quad 0 < a_r < 1 , \tag{75.8} $$

where $a_r$ is the pole of the first-order filter.

It can be tuned according to the desired speed of the
closed-loop system. Comparing the output of the plant
$y_k$ to the output of the reference model $y_k^r$, the tracking
error $\varepsilon_k$ is obtained

$$ \varepsilon_k = y_k^r - y_k . \tag{75.9} $$

The goal of the controller in terms of the tracking error
$\varepsilon_k$ is to keep it as low as possible. Since the reference
model output $y_k^r$ is a filtered version of the reference
signal $r_k$, the tracking error has no step
changes due to reference signal changes. It also has
to be noted that the tracking error is used as a driving
error during parameter adaptation, as we shall see in
Sect. 75.3.1.
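The reference model (75.8) and the tracking error (75.9) are simple enough to sketch directly. The function names below are hypothetical, and the pole value 0.925 is the one quoted later in the simulation study; the loop only illustrates how the model smooths a unit step in the reference.

```python
def reference_model_step(y_ref, r, a_r):
    """One step of the first-order reference model (75.8)."""
    if not 0.0 < a_r < 1.0:
        raise ValueError("a_r must lie strictly between 0 and 1")
    return a_r * y_ref + (1.0 - a_r) * r


def tracking_error(y_ref, y):
    """Tracking error (75.9): reference-model output minus plant output."""
    return y_ref - y


# Response of the reference model to a unit step in r, starting from rest.
y_ref = 0.0
trajectory = []
for _ in range(100):
    y_ref = reference_model_step(y_ref, r=1.0, a_r=0.925)
    trajectory.append(y_ref)
```

After n steps the model output equals 1 - a_r^n for this input, so a pole closer to 1 gives a slower but smoother desired response.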

As noted above, the proposed approach is compatible
with a wide spectrum of control laws in rule
consequents. Here, the PID-based rule consequents are
proposed

$$ u_k^i = P_k^i \varepsilon_k + I_k^i \Sigma_k^{\varepsilon} + D_k^i \Delta_k^{\varepsilon} , \quad i = 1, \ldots, N , \tag{75.10} $$

where $\Sigma_k^{\varepsilon}$ and $\Delta_k^{\varepsilon}$ denote the discrete-time integral and
derivative of the tracking error, respectively,


(75.11) 


while $P_k^i$, $I_k^i$, and $D_k^i$ are parameters that will be tuned
by means of adaptation of rule consequents.

The approach offers the possibility of implement- 
ing several subsets of PID-based controllers such as P, 
PI, PD, etc. For simplicity, only proportional controllers 
will be used in the rule consequent for the rest of this 
chapter 

$$ u_k^i = P_k^i \varepsilon_k , \quad i = 1, \ldots, N . \tag{75.12} $$

It has to be kept in mind that most real-life con- 
trollers are limited in their operation and can only 
provide control actions within a specific range, namely, 
the actuator’s interval $[u_{\min}, u_{\max}]$. If the computed
control signal $u_k$ is outside the actuator’s interval,
it can simply be projected onto the interval if P or 
PD type controllers are used. If integral controllers 
are used as well, some classical approaches to avoid 
integral windup should be implemented. When the 
violations of the actuator’s constraints are more dras- 
tic, because of the chosen dynamics of the refer- 
ence model and a narrow actuator interval, then also 
the interruption of the control parameters adaptation 
can be employed to make the adaptation more ro- 
bust. This modification will also be explained in detail 
later. 



75.3 Evolving Methodology for RECCo 


In this section, we present the methodology applied 
for evolving the structure and parameters of the con- 
sequents of RECCo online. Initially, the controller is 
empty, so it has to be initialized from the first data sam- 
ple received. After this, the same steps are repeated for 
all incoming data. First, the consequents of the current 
rules are updated according to the error at the plant out- 
put. Then, a new control action is generated by applying 
the inference process described in Sect. 75.2. Finally, 
the structure of the controller is updated. If the appro- 
priate conditions are satisfied, a new cloud (and hence, 
a rule) is created; otherwise, the new sample is used to 
update the information about the data density and the 
consequent parameters of the current configuration of 
the controller. 

In the following sections, the entire process is de- 
scribed in more detail. Section 75.3.1 is devoted to 
the mechanism for online adaptation of the rule conse- 
quents. In Sect. 75.3.2 the process of adding new clouds 
is described. 


75.3.1 Online Adaptation 
of the Rule Consequents 


Assuming that the plant is monotonic with respect to the 
control signal, the partial derivative of the plant’s output 
with respect to the control signal has a definite constant 
sign $G_{\text{sign}} = \pm 1$, which has to be known in advance.
Therefore, the combination of the error at the plant’s 
output and the sign of the monotonicity of the plant 
with respect to the control signal provides information 
about the right direction in which to move the rule con- 
sequents to achieve the local control objective [75.10]. 

As is already known, the parameters of the rule
consequents are obtained by means of adaptation. In
normal circumstances the parameter changes are calculated
as follows

$$ \Delta P_k^i = \gamma_P G_{\text{sign}} \lambda_k^i(x_k) \varepsilon_k , \quad i = 1, \ldots, N , \tag{75.13} $$

where $\gamma_P$ is an adaptive gain for the proportional
controller gains.

Equation (75.13) is obtained by using gradient descent
and having the square of the tracking error as
a cost function. The controller gains are obtained by
summing up the terms obtained in (75.13)

$$ P_k^i = P_{k-1}^i + \Delta P_k^i , \quad i = 1, \ldots, N . \tag{75.14} $$


Note that parameters keep changing until the tracking 
error is driven towards 0. Note also that only parameters 
corresponding to the active clouds are adapted, while 
the others are kept constant. 

Systems with parameter adaptation are subjected to 
parameter drift, which can lead to performance degra- 
dation and, eventually, to system instability [75.28]. 
There exist many known approaches to make adaptive 
laws more robust [75.29-31]. We will employ pa- 
rameter projection, parameter leakage, introduce dead 
zone into adaptive laws, and employ the saturation 
of the adaptive parameters when the actuator is in 
saturation. 


Dead Zone in the Adaptive Law 

As has already been said, adaptation of the
parameters in the closed loop always presents potential
danger to the system’s stability. The adaptation
is driven by an error signal that is always composed
of the useful component and the harmful one. The
latter is due to disturbances and parasitic dynamics,
and is usually bounded. Large errors are usually
mostly composed of useful components, while a small
tracking error is very often due to harmful signals.
Having the adaptation active during the time that the
error is small results in a false adaptation. The idea
behind the dead zone in the adaptive law is that
the adaptation is simply switched off if the absolute
value of the error that governs the adaptation is
small [75.32]

$$ \Delta P_k^i = \begin{cases} \gamma_P G_{\text{sign}} \lambda_k^i(x_k) \varepsilon_k , & |\varepsilon_k| \ge d_{\text{dead}} \\ 0 , & |\varepsilon_k| < d_{\text{dead}} \end{cases} \quad i = 1, \ldots, N . \tag{75.15} $$


Parameter Projection 
Parameter projection is a natural way to prevent parameter
drift. The idea is to project the parameters onto
a compact set [75.27]. In our case, each individual parameter
is projected on a certain interval or a ray. When
projecting the parameters some prior knowledge must
always be available. Since in our case proportional controller
gains are adapted, their sign is always known
and is equal to $G_{\text{sign}}$. In the case of positive plant gain
all the consequent parameters should be bounded by 0
from below, while an upper bound may or may not (if
not enough prior knowledge is available) be provided.
The adaptive law given by (75.14) is generalized as
follows if the controller gains are projected onto the
interval $[\underline{P}, \overline{P}]$

$$ P_k^i = \begin{cases} P_{k-1}^i + \Delta P_k^i , & \underline{P} \le P_{k-1}^i + \Delta P_k^i \le \overline{P} \\ \underline{P} , & P_{k-1}^i + \Delta P_k^i < \underline{P} \\ \overline{P} , & P_{k-1}^i + \Delta P_k^i > \overline{P} \end{cases} \quad i = 1, \ldots, N , \tag{75.16} $$

where $\underline{P}$ and $\overline{P}$ are two design parameters. In our
approach $\underline{P} = 0$ and $\overline{P} = \infty$ will be used.


Leakage in the Adaptive Law 
The idea of leakage is that the discrete integration in 
the adaptive law in (75.14) presents a potential danger 
to the adaptive system, and the pole due to the integra- 
tor should be pushed inside the unit disc [75.33], which 
results in the adaptive law 


$$ P_k^i = (1 - \sigma_P) P_{k-1}^i + \Delta P_k^i , \quad i = 1, \ldots, N , \tag{75.17} $$


where $\sigma_P$ defines the extent of the leakage. The introduction
of the leakage results in adaptive parameter
boundedness. This is why leakage is sometimes referred 
to as soft projection. 
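The boundedness can be seen directly from (75.17). The following short derivation is a sketch under two assumptions that are not stated in the text: the increments are bounded, $|\Delta P_k^i| \le \bar{\Delta}$ for all $k$ (the symbol $\bar{\Delta}$ is introduced here only for illustration), and $0 < \sigma_P < 1$. Unrolling the recursion gives

$$ P_k^i = (1 - \sigma_P)^k P_0^i + \sum_{j=1}^{k} (1 - \sigma_P)^{k-j} \Delta P_j^i , $$

so that

$$ |P_k^i| \le (1 - \sigma_P)^k |P_0^i| + \bar{\Delta} \sum_{j=0}^{k-1} (1 - \sigma_P)^j \le (1 - \sigma_P)^k |P_0^i| + \frac{\bar{\Delta}}{\sigma_P} . $$

The first term decays because the pole $1 - \sigma_P$ lies inside the unit disc, so in the long run the gain stays within $\bar{\Delta} / \sigma_P$ of zero. A smaller $\sigma_P$ therefore gives a looser bound and a weaker pull of the gains towards zero.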


Interruption of Adaptation 

When the chosen dynamics and the actuator constraints 
are in conflict, such that tracking of the reference model 
cannot be achieved in a sufficiently small interval, then 
drift of the control parameters often occurs, because the 
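adaptive law is driven with a tracking error $\varepsilon_k$ which
cannot be reduced, due to the control signal constraints.
The interruption of the adaptation results in the following
modification

$$ \Delta P_k^i = \begin{cases} \gamma_P G_{\text{sign}} \lambda_k^i(x_k) \varepsilon_k , & u_{\min} \le u_k \le u_{\max} \\ 0 , & \text{else} \end{cases} \quad i = 1, \ldots, N . \tag{75.18} $$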
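The four modifications of this section can be combined in a single update of one proportional gain. The sketch below is only one possible arrangement: the function name `robust_gain_update`, the order in which the modifications are applied, and the choice to let leakage act even when the increment is switched off are all assumptions of this example. The default values are those quoted in the simulation study.

```python
def robust_gain_update(p_prev, lam, eps, u, *, gamma_p=0.1, g_sign=1.0,
                       d_dead=0.25, sigma_p=1e-5,
                       p_min=0.0, p_max=float("inf"),
                       u_min=-6.0, u_max=6.0):
    """Return the new gain P_k^i of one rule.

    p_prev -- previous gain P_{k-1}^i
    lam    -- normalized density lambda_k^i of the rule
    eps    -- tracking error epsilon_k
    u      -- computed control signal u_k (before clipping)
    """
    # Basic gradient-type increment (75.13).
    delta = gamma_p * g_sign * lam * eps
    # Dead zone (75.15): ignore small errors.
    if abs(eps) < d_dead:
        delta = 0.0
    # Interruption of adaptation (75.18): actuator constraints violated.
    if not (u_min <= u <= u_max):
        delta = 0.0
    # Leakage (75.17): pull the integrator pole inside the unit disc.
    p_new = (1.0 - sigma_p) * p_prev + delta
    # Projection (75.16): keep the gain inside [p_min, p_max].
    return min(max(p_new, p_min), p_max)
```

Setting `sigma_p=0.0`, `d_dead=0.0`, and infinite bounds recovers the plain law (75.14), which makes it easy to switch individual modifications on and off when comparing them.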


75.3.2 Evolution of the Structure: 
Adding New Clouds 


The adaptation of the parameters of the consequents
is performed online in a closed loop manner (while
the controller operates over the real plant). The control
is applied from the first moment (no a-priori information
or controller structure is needed). Adaptive systems
traditionally [75.34] concern tuning parameters of the
controllers for which the structure has been pre-selected
by the designer. Self-evolving controllers [75.16, 24]
offer the possibility of evolving the structure of the controller
as well as adapting parameters. This helps to design
controllers on the fly that are non-linear and have
no pre-defined structure or knowledge about the plant
model. This requires us to define a mechanism for the
online evolution of the controller’s structure, i.e., for
adding new antecedents and fuzzy rules.

We already defined the local density earlier. Now,
the global density $\Gamma_k$ will be defined. Its definition is
analogous to the one given for the local density, except
that it takes into account the distance to all the previously
observed samples $z_j$ ($j = 1, \ldots, k-1$). It has to be
noted that the global density is computed for the points
$z_k = [x_k^T; u_k]^T$, whilst the local density is defined only
for the input vectors $x_k$. Using again the Cauchy kernel,
this density can be defined as


$$ \Gamma_k = \frac{1}{1 + \frac{1}{k-1} \sum_{j=1}^{k-1} \|z_k - z_j\|^2} , \tag{75.19} $$


and can be computed recursively by [75.23]

$$ \Gamma_k = \frac{1}{1 + \|z_k - \mu_k^G\|^2 + \Sigma_k^G - \|\mu_k^G\|^2} , \tag{75.20} $$

where $\Gamma_k$ denotes the global density to all the data
calculated in the $k$-th time instant, and $\Sigma_k^G$ denotes the
scalar product of the data $z_k$

$$ \Sigma_k^G = \frac{k-1}{k} \Sigma_{k-1}^G + \frac{1}{k} \|z_k\|^2 , \tag{75.21} $$

with starting condition $\Sigma_1^G = \|z_1\|^2$. The update of the
mean value $\mu_k^G$ is straightforward

$$ \mu_k^G = \frac{k-1}{k} \mu_{k-1}^G + \frac{1}{k} z_k , \tag{75.22} $$

with starting condition $\mu_1^G = z_1$.

Since this measure considers all existing samples,
it provides an indication of how representative a given
point $z_k$ is with respect to the entire data distribution.

Additionally, and only for learning purposes, a focal
point $X_f^i$ and a radius $r_{jk}^i$ are defined for each cloud.
The focal point is a real data sample that has highly
representative qualities. The fact that the focal point is
always a real sample is important, as it avoids problems
that may appear when only a descriptive measure is
used instead (as in the case of the average of the points
in the cloud). In order to follow the philosophy of the
proposed methodology (i.e., avoiding the need to pre-define
the parameters of the rules), the focal point is
updated online. Thus, for each new sample $z_k$, the
following process is applied:


1. Find the associated cloud $C^i$, according to

   $$ C^i = \arg\max_{C^i} \left( \gamma_k^i \right) . \tag{75.23} $$

2. Check the representative qualities of the new data
   point using the following conditions,

   $$ \gamma_k^i > \gamma_f^i , \tag{75.24a} $$
   $$ \Gamma_k > \Gamma_f^i , \tag{75.24b} $$

   where $\gamma_f^i$ and $\Gamma_f^i$ represent the local and global
   density of the current focal point, respectively.

3. If both conditions are satisfied, then replace the focal
   point by applying $X_f^i \leftarrow x_k$.
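The three steps above can be sketched as follows. The dictionary layout of a cloud (`focal`, `gamma_f`, `Gamma_f`) and both function names are hypothetical; storing the densities of the new sample as the focal-point densities after a replacement is an assumption of this sketch, since the text does not say how these values are refreshed.

```python
def associated_cloud(local_densities):
    """Step 1: index of the cloud with the highest local density (75.23)."""
    return max(range(len(local_densities)), key=local_densities.__getitem__)


def maybe_replace_focal_point(cloud, x, local_density, global_density):
    """Steps 2 and 3: replace the focal point if (75.24a) and (75.24b) hold.

    cloud is a dict with keys 'focal', 'gamma_f' and 'Gamma_f'.
    Returns True when the focal point was replaced.
    """
    if local_density > cloud["gamma_f"] and global_density > cloud["Gamma_f"]:
        cloud["focal"] = list(x)
        cloud["gamma_f"] = local_density    # assumed bookkeeping
        cloud["Gamma_f"] = global_density   # assumed bookkeeping
        return True
    return False


cloud = {"focal": [0.0, 0.0], "gamma_f": 0.5, "Gamma_f": 0.4}
```

Note that only the input part of the sample becomes the focal point, in line with the local density being defined over input vectors only.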


The radius provides an idea of the spread of the
cloud. Since the cloud does not have a definite shape or
boundary, the radius represents only an approximation
of the spread of the data in the different dimensions. It
is also recursively updated as follows [75.35]

$$ r_{jk}^i = \rho\, r_{j(k-1)}^i + (1 - \rho)\, \sigma_{jk}^i ; \quad r_{j1}^i = 1 , \tag{75.25} $$

where $\rho$ is a constant that regulates the compatibility
of the new information with the old one and is usually
set to $\rho = 0.5$ [75.35]. The value $\sigma_{jk}^i$ denotes the cloud’s
local scatter over the input data space and is given by

$$ \sigma_{jk}^i = \sqrt{\frac{1}{M^i} \sum_{l=1}^{M^i} \left\| X_f^i - x_l \right\|_j^2} ; \quad \sigma_{j1}^i = 1 . \tag{75.26} $$


It is important to note that the radii and focal points 
are only used to provide an idea of the location and dis- 
tribution of the data in the clouds during the structure 
evolution process. However, they are not actually used 
to represent the clouds or the fuzzy rules and do not 
affect the inference process at any point. 

The structure-learning mechanism applied for the 
proposed RECCo is based on the following princi- 
ples [75.36, 37]: 


a) Good generalization and summarization are
achieved by forming new clouds from data samples
with high global density $\Gamma$.

b) Excessive overlap between clouds is avoided by 
controlling the minimum distance between them. 
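These two principles amount to a simple predicate that decides whether a sample should start a new cloud. The sketch below is illustrative only: the function name `should_create_cloud` and its arguments are hypothetical, and the minimum distance per cloud is passed in as a parameter rather than fixed, so that the concrete threshold used by the method can be supplied by the caller.

```python
def should_create_cloud(global_density, focal_global_densities,
                        distances, min_distances):
    """Decide whether a new sample should become the focal point of a new cloud.

    global_density         -- global density of the new sample
    focal_global_densities -- global densities of the existing focal points
    distances              -- distances from the sample to each focal point
    min_distances          -- required minimum distance for each cloud
    """
    # Principle a): the sample must be more representative than every focal point.
    dense_enough = all(global_density > g for g in focal_global_densities)
    # Principle b): the sample must not overlap excessively with any cloud.
    far_enough = all(d > d_min for d, d_min in zip(distances, min_distances))
    return dense_enough and far_enough
```

Both tests must hold for every existing cloud, which is why new clouds are created only rarely on real data.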


Hence, the evolution of the structure is based on the
addition of new clouds and the associated rules. First,
RECCo is initialized by creating a cloud $C^1$ from the
first data sample $z_1 = [x_1^T; u_1]^T$. The antecedent of the
first rule is then defined by this cloud and its consequent
equals the value $u_1$. Next, for all the further incoming
data samples $z_k$, the following steps are applied:


1. The sample $z_k = [x_k^T; u_k]^T$ is considered to have
   good generalization and summarization capabilities
   if its global density is higher than the global density
   of all the existing clouds. Thus, the following
   condition is defined

   $$ \Gamma_k > \Gamma_f^i , \quad \forall i = 1, \ldots, N . \tag{75.27} $$

   Note that this is a very restrictive condition that
   requires that the inequality is satisfied for all the existing
   clouds, which does not happen very often for real data.

2. Check if the existing clouds are sufficiently far
   from $z_k$ with the following condition

   $$ d_k^i > \frac{r_k^i}{2} , \quad \forall i = 1, \ldots, N , \tag{75.28} $$

   where $d_k^i$ represents the distance from the current
   sample to the focal point of the associated cloud, $X_f^i$.

3. According to the result of the previous steps, take
   one of the following actions:

   a) If conditions (75.27) and (75.28) are both satisfied,
      then create a new cloud $C^{N+1}$. The focal
      point of the new cloud is $X_f^{N+1} = x_k$. Its local
      scatter is initialized based on the average of the
      local scatters of the existing clouds [75.35]

      $$ \sigma_{jk}^{N+1} = \frac{1}{N} \sum_{i=1}^{N} \sigma_{jk}^i , \quad j = 1, \ldots, n . \tag{75.29} $$

      Additionally, the corresponding rule has to be
      added to the rule base. The antecedent of the
      new rule is defined by the newly created cloud
      $C^{N+1}$. For the consequent, we provide an initial
      value that guarantees that the output of the
      new controller when the input vector is equal
      to the focal point (i.e., $x = X_f^{N+1}$) equals the
      controller’s output under its previous configuration
      for that same input [75.21]. The rationale
      behind this initialization is to provide
      a smooth transition from the old configuration
      of the controller to the new one. This
      avoids sudden changes in the output surface
      that could damage the controller’s performance
      in the first time instants after the rule has
      been added (and before the consequents are
      adapted) [75.25].


   b) If the conditions are not satisfied, update the
      parameters of the cloud $C^i$ associated with $z_k$, as
      previously explained.


It is important to stress that the methodology presented
for the evolution of the controller’s structure
starts from an empty controller. However, if an initial
set of rules is known beforehand (e.g., provided by an
expert or obtained from any other training method), it
can be used for the initial controller. In this case, the
algorithm’s initialization step can be omitted.


75.4 Simulation Study


A simulation study of the proposed self-organizing
controller is presented in this section. The main attention
is given to the study of different modifications of
self-organizing controllers to make the adaptive laws
more robust. In the study, we show the implementation
of parameter projection, parameter leakage, and
the introduction of dead zone into the adaptive laws.
The study was carried out with the assumption of no
prior knowledge about the plant dynamics. The mathematical
model was only used to simulate the plant
dynamics.

Fig. 75.1 Open-loop response of the plant: output temperature
and input variable

The plant for the simulation study is the thermal
process of a water bath. The main goal is the control of
the temperature in the water bath. The plant is described
with the following mathematical model in a discrete
form


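$$ y(k+1) = a\,y(k) + b\,u(k) + (1 - a)\,y_0 , \tag{75.30} $$
where $a = e^{-\alpha T_s}$ and $b = \frac{\beta}{\alpha}\left(1 - e^{-\alpha T_s}\right)$. The parameters
of the plant are estimated as $\alpha = 10^{-4}$, $\beta = 8.7 \times 10^{-3}$,
$\gamma = 40$, and $y_0 = 20\,^{\circ}\mathrm{C}$. The sampling period, $T_s$, is
equal to 25 s.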

The open loop response of the plant is shown in
Fig. 75.1. The behavior of the plant
exhibits a strong nonlinearity in the static gain of the process.
The reference signal is chosen to show the ability
of self-learning and dealing with nonlinearity, which
is the main advantage of the proposed algorithm. The
clouds are defined in a way so as to enable dealing with
nonlinearity, and for that reason the input variables for
the controller are the reference value $r_k$ and the tracking
error $\varepsilon_k$.

Fig. 75.2 Reference, model reference, output signal tracking,
and control signal in the case of no robust adaptive
laws

All the simulations started from zero fuzzy rules and
membership functions, and new rules were generated
during the process. The first simulation was done for
the case without any robust adaptive laws. The parameters
of the control law do not converge and drift, which
leads to performance degradation and, after some time,
also to instability. In this example, the actuator interval
was given as $[-6, 6]$, the reference model parameter was
defined as $a_r = 0.925$, and the adaptive gain $\gamma_P = 0.1$.
The upper plot of Fig. 75.2 shows the reference $r_k$, the
model reference $y_k^r$, and the output signal $y_k$, and the lower
plot shows the control signal. The seven clouds generated
during the self-learning procedure are given in
Fig. 75.3.

Fig. 75.3 Clouds in the case of no robust adaptive laws

Fig. 75.4 Adaptive parameters $P_k^i$ in the case of no robust
laws

The drifts of the adaptive parameters $P_k^i$ are shown
in Fig. 75.4, where the parameters for all clouds are
shown, and in Fig. 75.5, where a single adaptive parameter
is shown.

The second simulation is done with dead zone modification
to make the adaptation robust. The dead zone
was chosen as $d_{\text{dead}} = 2$. All the other parameters of
the algorithm were the same as in the first simulation:
the actuator’s interval was given as $[-6, 6]$, the reference
model parameter was defined as $a_r = 0.925$, and
the adaptive gain $\gamma_P = 0.1$. In the upper plot, Fig. 75.6
shows the reference, the model reference, and the output
signal, and in the lower part the control signal. The
model reference tracking is suitable and the parameters
of the control law converge and enable a reasonable
performance.

Fig. 75.5 Drift of the adaptive parameter $P_k^i$ in the case of
no robust laws

Fig. 75.6 Reference, model reference, output signal tracking,
and control signal in the case of a dead zone

The clouds generated during the self-learning procedure
are the same as those obtained in the first example
in Fig. 75.3. During the procedure seven clouds
were generated again. The adaptive parameters $P_k^i$ are
shown in Fig. 75.7, where the parameters for all clouds
are shown, and in Fig. 75.8, where a single adaptive
parameter is shown.

Fig. 75.7 Adaptive parameters $P_k^i$ in the case of a dead
zone

Fig. 75.8 Drift of the adaptive parameter $P_k^i$ in the case of
dead zone

The tracking error $\varepsilon_k$ in the case of dead zone modification
is shown in Fig. 75.9.

The results of the last 500 samples are shown in
detail in Fig. 75.10. It can be seen that tracking with
the proposed modification of the adaptive laws gives
very good control performance.

The relatively big dead zone stops the adaptation of
the control parameters and results in a bigger tracking
error. On the other hand, a smaller dead zone would
result in longer settling of the adaptive parameters and
also in possible drifting. Therefore, the combination of
a dead zone and a leakage adaptive law is proposed in
the third simulation, where the rest of the parameters
are the same as in the previous simulation, except the
dead zone, which is now chosen to be $d_{\text{dead}} = 0.25$, and
the leakage term, which is defined as $\sigma_P = 10^{-5}$.

Fig. 75.9 Tracking error

Fig. 75.10 Reference, model reference, output signal
tracking, and control signal in the case of a dead zone, in
detail

Figure 75.11 shows the reference, the model ref- 
erence, the output signal, and the control signal. The 
model reference tracking is satisfactory and the parame- 
ters of the control law converge and enable a reasonable 
performance. 


Fig. 75.11 Reference, model reference, output signal tracking,
and control signal in the case of leakage in the adaptive
law

Fig. 75.12 Adaptive parameters $P_k^i$ in the case of leakage
in the adaptive law

The clouds generated during the self-learning procedure
with leakage in the adaptive law are the same as
those obtained in the previous two examples, and are
given in Fig. 75.3. The positions of the clouds remain
the same in all three approaches using different robust
modifications of the adaptive laws. The adaptive parameters
$P_k^i$ are shown in Fig. 75.12, where the parameters
for all clouds are shown, and in Fig. 75.13, where
a single adaptive parameter is shown.

The tracking error $\varepsilon_k$ in the case of leakage in the
adaptive law is shown in Fig. 75.14. Due to the use of
a smaller dead zone and leakage in the adaptive law, the
tracking is better and also the parameter convergence is
good.

Fig. 75.13 Drift of the adaptive parameter $P_k^i$ in the case of
leakage in the adaptive law

Fig. 75.14 Tracking error


The results of the last 500 samples are shown in de- 
tail in Fig. 75.15. It is shown that the tracking using 
the proposed leakage in the modification of the adap- 
tive laws has a high control performance. 

In the fourth simulation study we would like to 
show an example of drastic constraints in the process 
actuator. In this case, the actuator constraints are given 
by the interval [-1,2]. The dead zone is now chosen 


Tk, Yk and y; 
804 


70 


> 
100 150 200 250 300 350 400 450 500 
k 


0 50 


0 a 


0 50 100 150 200 250 300 350 400 450 500 


k 


Fig. 75.15 Reference, model reference, output signal 
tracking, and control signal in the case of leakage in the 
adaptive law, in detail 


Tk, Yk and y; 
100 


0 02 04 06 0.8 1 12 14 16 18 2 


0 02 04 06 08 1 (ee teak ley bk 


kx 104 


Fig. 75.16 Reference, model reference, output signal 
tracking, and control signal in the case of adaptation in- 
terruption 


to be daeaad = 0.5. Figure 75.16 shows the reference, the 
model reference, the output signal, and the control sig- 
nal in the case of adaptation interruption. The model 
reference tracking is satisfactory and the parameters of 
the control law converge and enable a reasonable per- 
formance. 
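The three robust modifications compared in these simulation studies (dead zone, leakage, and adaptation interruption) can be illustrated on a generic scalar gradient adaptive law. The sketch below is not the adaptive law of the controller discussed in this chapter: the update form, the gain `gamma`, the leakage factor `sigma`, and the function name `robust_update` are assumptions chosen only for illustration. The actuator interval [-1, 2] and the dead zone 0.5 are the values quoted in the text.

```python
def robust_update(theta, error, regressor, u_unsat,
                  gamma=0.1, sigma=0.01, d_dead=0.5,
                  u_min=-1.0, u_max=2.0):
    """One step of a generic gradient adaptive law with three robust
    modifications (hypothetical form, for illustration only)."""
    # Adaptation interruption: freeze the parameter while the
    # unsaturated control signal violates the actuator constraints.
    if u_unsat < u_min or u_unsat > u_max:
        return theta
    # Dead zone: ignore tracking errors smaller than d_dead.
    if abs(error) < d_dead:
        return theta
    # Leakage: the -sigma * theta term pulls the parameter toward
    # zero and counteracts parameter drift.
    return theta + gamma * (error * regressor - sigma * theta)


theta = 1.0
theta = robust_update(theta, error=2.0, regressor=1.0, u_unsat=0.0)
```

With the error inside the dead zone, or with the control signal outside the actuator interval, the call returns `theta` unchanged, which is what prevents the parameter from drifting in those situations.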

The clouds generated during the self-learning procedure are shown
in Fig. 75.17. The positions of the clouds are now different because
of the constraints and different tracking errors. The adaptive
parameters $P_k^i$ are shown in Fig. 75.18, where the parameters for
all clouds are shown, and in Fig. 75.19, where the adaptive parameter
$P_k$ is shown.

Fig. 75.17 Clouds in the case of adaptation interruption

Fig. 75.18 The adaptive parameters $P_k^i$ in the case of adaptation
interruption

Fig. 75.19 The drift of the adaptive parameter $P_k$ in the case of
adaptation interruption

The tracking error $\varepsilon_k$ in the case of adaptation
interruption is shown in Fig. 75.20.

Fig. 75.20 Tracking error $\varepsilon_k$

The results of the last 500 samples are shown in detail in Fig. 75.21.
It is shown that perfect tracking in the case of a highly constrained
control signal can be achieved by using the proposed modification
based on adaptation interruption.

Fig. 75.21 Reference, model reference, output signal tracking, and
control signal in the case of adaptation interruption


75.5 Conclusions


In this chapter, a new approach for an online self-evolving
cloud-based fuzzy rule-based controller (RECCo), which has no
antecedent parameters, was proposed. One illustrative example was
provided to support the concept. It has been shown that the proposed
controller can start with no a priori knowledge. All the fuzzy rules
are defined during the self-evolving phase. The controller performs
the self-evolving algorithm simultaneously with the control of the
plant. The advantage of the proposed controller is the self-evolving
procedure, which enables a working algorithm that starts from no
a priori knowledge. It can cope with nonlinearity because of the use
of fuzzy data clouds, which divide the input space and enable the use
of different control parameters in each cloud, and it also enables
adaptation to changes of the process parameters during the control.
No explicit membership function, pre-training, or explicit model in
any form is needed. The proposed algorithm combines the well-known
concept of model-reference adaptive control algorithms with the
concepts of evolving fuzzy systems of ANYA type (no antecedent
parameters and density-based fuzzy aggregation of the linguistic
rules). In this work, we analyzed problems related to the adaptive
approach. Different modifications of adaptive laws were studied.
Those modifications make the adaptive laws more robust to parameter
drift, which often leads to performance degradation and instability.




76. Evolving Embedded Fuzzy Controllers 


Oscar H. Montiel Ross, Roberto Sepulveda Cruz 


The interest in research and implementations of type-2 fuzzy
controllers (T2FCs) is increasing. It has been demonstrated that
these controllers provide more advantages in handling uncertainties
than type-1 FCs (T1FCs). This characteristic is very appealing
because real-world problems are full of inaccurate information from
diverse sources. Nowadays, it is no problem to implement an
intelligent controller (IC) for microcomputers since they offer
powerful operating systems, high-level languages, microprocessors
with several cores, and co-processing capacities on graphic
processing units (GPUs), which are interesting characteristics for
the implementation of fast type-2 ICs (T2ICs). However, the above
benefits are not directly available for the design of embedded ICs
for consumer electronics that need to be implemented in devices such
as an application-specific integrated circuit (ASIC), a
field-programmable gate array (FPGA), etc. Fortunately, for T1FCs
there are platforms that generate code in VHSIC hardware description
language (VHDL; VHSIC: very high speed integrated circuit), C++, and
Java. This is not true for the design of T2ICs, since there are no
specialized tools to develop the inference system as well as to
optimize it.

The aim of this chapter is to present different ways of achieving
high-performance computing for evolving T1 and T2 ICs embedded into
FPGAs. Therefore, we provide a compiled introduction to T1 and T2
FCs, with emphasis on the well-known bottleneck of the interval T2FC
(IT2FC), and software and hardware proposals to minimize its effect
regarding computational cost. An overview of learning systems and
hosting technology for their implementation is given. We explain
different ways to achieve such implementations: at the circuit level
using a hardware description language, using a multiprocessor system
and a high-level language, and combining both methods. We explain how
to use the IT2FC developed in VHDL as a standalone system, and as a
coprocessor for the FPGA Fusion of Actel, Spartan 6, and Virtex 5. We
present the methodology and two new proposals to achieve evolution of
the IT2FC for FPGA, one for the static region of the FPGA, and the
other one for the reconfigurable region using the dynamic partial
reconfiguration methodology.

76.1 Overview
76.2 Type-1 and Type-2 Fuzzy Controllers
76.3 Host Technology
76.4 Hardware Implementation Approaches
  76.4.1 Multiprocessor Systems
  76.4.2 Implementations into FPGAs
76.5 Development of a Standalone IT2FC
  76.5.1 Development of the IT2 FT2KM Design Entity
76.6 Developing of IT2FC Coprocessors
  76.6.1 Integrating the IT2FC Through Internal Ports
  76.6.2 Development of IP Cores
76.7 Implementing a GA in an FPGA
  76.7.1 GA Software Based Implementations
  76.7.2 GA Hardware Implementations
76.8 Evolving Fuzzy Controllers
  76.8.1 EAPR Flow for Changing the Controller Structure
  76.8.2 Flexible Coprocessor Prototype of an IT2FC
  76.8.3 Conclusion and Further Reading
References




76.1 Overview 


An intelligent system and evolution are intrinsically re- 
lated since it is difficult to conceive intelligence without 
evolution because intelligence cannot be static. Hu- 
man beings create, adapt, and replace their own rules 
throughout their whole lives. The idea to apply evolu- 
tion to a fuzzy system is an attempt to construct a math- 
ematical assembly that can approximate human-like 
reasoning and learning mechanisms [76.1]. A mathe- 
matical tool that has been successfully applied to better 
represent different forms of knowledge is fuzzy logic 
(FL); also if-then rules are a good way to express hu- 
man knowledge, so the application of FL to a rule-based 
system leads to a Fuzzy Rule-Based System (FRBS). 
Unfortunately, an FRBS is not able to learn by itself; the
knowledge needs to be derived from the expert or generated
automatically with an evolutionary algorithm (EA) such as
a genetic algorithm (GA) [76.2].

The use of GAs to design machine learning systems
constitutes the soft computing paradigm known as the
genetic fuzzy system, where the goal is to incorporate
learning into the system or to tune different components
of the FRBS. Other proposals in the same line of work
are: genetic fuzzy neural networks, genetic fuzzy clus- 
tering, and fuzzy decision trees. A system with the 
capacity to evolve can be defined as a self-developing, 
self-learning, fuzzy rule-based or neuro-fuzzy system 
with the ability to self-adapt its parameters and struc- 
ture online [76.3]. 

Figure 76.1 shows the general structure of an evolutionary FRBS
(EFRBS) that can be used for tuning or learning purposes. Although it
is difficult to make a clear distinction between tuning and learning,
the particular aspect of each process can be summarized as follows.
The tuning process is assumed to work on a predefined rule base, with
the target of finding the optimal set of parameters for the
membership functions and/or scaling functions. The learning process,
on the other hand, requires a more elaborate search in the space of
possible rule bases, or in the whole knowledge base, as well as for
the scaling functions. Since the learning approach does not depend on
a predefined set of rules and knowledge, the system can change its
fundamental structure with the aim of improving its performance
according to some criteria. The idea of using scaling functions for
input and output variables is to normalize the universe of discourse
in which membership functions were defined.
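As a small illustration of the scaling idea, the sketch below maps a physical variable linearly onto a normalized universe of discourse. The linear form, the target interval [-1, 1], the sensor range, and the names `make_scaler`, `lo`, and `hi` are assumptions made for this example; the chapter does not prescribe a particular scaling function.

```python
def make_scaler(lo, hi):
    """Return a function that maps [lo, hi] linearly onto [-1, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo

    def scale(x):
        # Clamp so that out-of-range inputs stay inside the
        # normalized universe of discourse.
        x = min(max(x, lo), hi)
        return 2.0 * (x - lo) / span - 1.0

    return scale


# Hypothetical temperature sensor with a range of 0..100 units.
scale_temp = make_scaler(0.0, 100.0)
normalized = [scale_temp(v) for v in (0.0, 50.0, 100.0, 150.0)]
```

In a tuning process of the kind described above, the bounds `lo` and `hi` could be among the parameters adjusted by the optimizer, together with the membership function parameters.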
According to De Jong [76.4]: 


the common denominator in most learning systems 
is their capability of making structural changes to 
themselves over time with the intent of improving 
performance on tasks defined by the environment, 
discovering and subsequently exploiting interesting 
concepts, or improving the consistency and gener- 
ality of internal knowledge structures. 


Hence, it is important to have a clear understanding of the
strengths and limitations of a particular learning system, to achieve
a precise characterization of all the permitted structural changes
and how they are going to be made.

Fig. 76.1 General structure of an evolutionary fuzzy rule-based
system (blocks: learning or tuning process with an evolutionary
algorithm or related optimization method; knowledge base with scaling
functions, membership functions, and fuzzy rules; input scaling,
fuzzification, inference engine, defuzzification, output scaling)

De Jong identifies three levels of complexity at which
the GA can perform legal structural changes in pursuit
of a goal [76.4]:


1. By changing critical parameters’ values 

2. By changing key data structures 

3. By changing the program itself with the idea of 
achieving effective behavioral changes in a task 
subsystem where a prominent representative of this 
branch is the learning production-systems program. 


One reason for the success of production systems in machine
learning is that they have a representation of knowledge that can
simultaneously support two kinds of activities: (1) the knowledge
can be treated as data that can be manipulated according to some
criteria; (2) for a particular task, the knowledge can be used as an
executable entity.

The two classical approaches for working with evolutionary FRBS
(EFRBS) for a learning system are the Pittsburgh and Michigan
approaches. Historically, in 1975 Holland [76.5] affirmed that
a natural way to represent an entire rule set is to use a string,
i.e., an individual; so, the population is formed by candidate rule
sets, and to achieve evolution it is necessary to use selection and
genetic operators to produce new generations of rule sets. This was
the approach taken by De Jong at the University of Pittsburgh, hence
the name Pittsburgh approach. During the same period, Holland
developed a model of cognition in which the members of the population
are individual rules, and the entire population constitutes the rule
set; this quickly became the Michigan approach [76.6, 7].
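The difference between the two encodings can be sketched as data structures. In the illustration below, a rule is a tuple of linguistic-label indices; the number of labels, the number of rules per set, and all function names are assumptions made for the example and are not taken from the cited works.

```python
import random

# A rule is a tuple of linguistic-label indices:
# (label of input 1, label of input 2, label of output).
N_LABELS = 3        # e.g. 0 = low, 1 = medium, 2 = high (assumed)
RULES_PER_SET = 4   # assumed size of a rule base


def random_rule(rng):
    """Draw one rule with random antecedent and consequent labels."""
    return tuple(rng.randrange(N_LABELS) for _ in range(3))


def pittsburgh_population(size, rng):
    """Pittsburgh: each individual is a complete candidate rule set."""
    return [[random_rule(rng) for _ in range(RULES_PER_SET)]
            for _ in range(size)]


def michigan_population(size, rng):
    """Michigan: each individual is one rule; the whole population
    forms the rule set."""
    return [random_rule(rng) for _ in range(size)]


rng = random.Random(0)
pitt = pittsburgh_population(5, rng)
mich = michigan_population(5, rng)
```

In the Pittsburgh encoding, selection and genetic operators act on whole rule sets, so fitness is assigned to a complete controller. In the Michigan encoding, fitness has to be assigned to individual rules that only work together as a set.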

There is extensive pioneering and recent work on tuning and
learning using FRBSs; most of it falls in some way within the
Michigan or the Pittsburgh approach, for example, the supervised
inductive algorithm [76.8, 9], the iterative rule learning approach
[76.10], coverage-based genetic induction (COGIN) [76.11, 12], the
relational genetic algorithm learner (REGAL) system [76.13], and the
compact fuzzy classification system [76.14], with applications to
fuzzy control [76.15, 16] and to the tuning of type-2 fuzzy
controllers [76.17–20].

The focus of this chapter is on evolving embedded fuzzy
controllers. This subclassification reduces the number of related
works; however, there are still many, since by an embedded system
(ES) we understand a combination of computer hardware (HW) and
software (SW) devoted to a specific control function within a larger
system. Typically, the HW of an ES can be a dedicated computer
system, a microcontroller, a digital signal processor, or an
FPGA-based system. If the SW of the ES is fixed, it is called
firmware; because there are no strict boundaries between firmware and
software, and the ES has the capability of being reprogrammed, the
firmware can be low level or high level. Low-level firmware tells the
hardware how to work and typically resides in a read-only memory
(ROM) or in a programmable logic array (PLA); high-level firmware can
be updated, hence it is usually stored in a flash memory, and it is
often considered software.

In the literature, there is extensive work on successful
applications of type-1 and type-2 fuzzy systems; with regard to
evolving embedded fuzzy systems, they were applied in a control
mechanism for autonomous mobile robot navigation in real environments
in [76.21]. To further limit the content of this chapter, we have
focused on EFRBSs to be implemented in an FPGA HW platform, with
special emphasis on type-2 FRBSs. In this last category, with respect
to type-1 FRBSs, the following proposals attracted our attention: the
development of an FPGA-based proportional-differential (PD) fuzzy
look-up table controller [76.22]; the FPGA implementation of embedded
fuzzy controllers for robotic applications [76.23]; a non-fixed
structure fuzzy logic controller [76.24]; a flexible architecture to
implement a fuzzy controller in an FPGA [76.25]; a very simple method
for tuning the input membership function (MF) to modify the response
of the implemented FPGA controller [76.26]; and explanations of how
to test and simulate the different stages of an FRBS for future
implementation in an FPGA [76.27–29]. On type-1 EFRBSs there are
works such as a reconfigurable hardware platform for evolving a fuzzy
system by using a cooperative coevolutionary methodology [76.30], and
the tuning of input MFs for an incremental fuzzy PD controller using
a GA [76.31]. In the type-2 FRBS category, less work has been
reported; representative work can be listed as follows: an
architectural proposal of a hardware-based interval type-2 fuzzy
inference engine for FPGA is presented in [76.32]; the use of
a parallel HW implementation of an interval type-2 fuzzy logic
controller, using bespoke coprocessors handled by a soft-core
processor, is explored in [76.33]; a high-performance interval type-2
fuzzy inference system (IT2-FIS) that can achieve the four stages of
fuzzification, inference, KM type reduction, and defuzzification in
four clock cycles is shown in [76.34]; the same system is suitable
for implementation in pipelines, providing the complete IT2-FIS
process in just one clock cycle.

This work deals with the development of evolving embedded type-1
and type-2 fuzzy controllers. In the chapter, a broad exploration of
several ways to implement evolving embedded fuzzy controllers is
presented. We choose to work with the Mamdani fuzzy controller
proposal since it provides a highly flexible means to formulate
knowledge.

The organization of this chapter is as follows. In Sect. 76.2 we
present the basis of T1 and T2 FL to explain how to achieve the HW
implementation of an FRBS. In Sect. 76.3 a brief description of the
state of the art in hosting technology for high-performance embedded
systems is given.


76.2 Type-1 and Type-2 Fuzzy Controllers 


The type-2 fuzzy sets (T2FSs) were developed with the aim of
handling uncertainty in a better way than T1FSs do, since a T1FS has
crisp grades of membership, whereas a T2FS has fuzzy grades of
membership. An important point to note is that if all uncertainty
disappears, a T2FS can be reduced to a T1FS. A type-2 membership
function (T2MF) is an FS that has primary and secondary membership
values; the primary MF is a representation of an FS, and serves to
create a linguistic representation of some concept with linguistic
and random uncertainties with limited capabilities; the secondary MF
allows capturing more about linguistic uncertainty than a T1MF.

There are two common ways to use a T2FS: the generalized T2FS
(GT2) and the interval T2FS (IT2FS). The former has secondary
membership grades of different values to represent more accurately
the existing uncertainty; on the other hand, in an IT2FS the
secondary membership value always takes the value of 1.
Unfortunately, to date no one knows how to choose the best secondary
MFs for a GT2; moreover, this method introduces a lot of
computations, making it inappropriate for current application in
real-time (RT) systems, even those with small time constraints; in
contrast, the calculations are easy to perform in an IT2FS.

Fig. 76.2 Type-2 membership function. For the triangular MF the
FOU is shown. The FOU is bounded by the upper part UMF($\tilde{A}$)
and the lower part LMF($\tilde{A}$). A vertical slice at $x'$ is
illustrated. Right, top: secondary MF values for a generalized T2MF;
bottom: secondary MF values of an IT2MF

A T2MF can be represented using a 3-D figure that is not as easy to
sketch as a T1MF. A more common way to visualize a T2MF is to sketch
its footprint of uncertainty (FOU) on the 2-D domain of the T2FS. We
illustrate this concept in Fig. 76.2, where we show a vertical slice
of the FOU at the primary MF value $x'$; in the case of a GT2, in the
upper right part of the figure, the secondary MF shows different
height values; in the case of an IT2FS, just below, the secondary MF
has uniform values. Note that the secondary values sit on top of the
FOU.

Figure 76.3 shows the main components of a fuzzy logic system,
showing the differences between the T1 and T2 FC. For T1 systems,
there are three components: fuzzifier, inference engine, and the
defuzzifier, which is the only output processing unit; whereas for
a T2 system there are four components, since the output processing
unit is formed by the interconnected type reducer (TR) block and the
defuzzifier.

Fig. 76.3 Type-1 and type-2 FC. The T2FC at the output
processing has the type reducer block

Ordinary fuzzy sets were developed by Zadeh in 1965 [76.35]; they
are an extension of classical set theory where the concept of
membership was extended to have various grades of membership on the
real continuous interval [0, 1]. The original idea was to use a fuzzy
set (FS), i.e., a linguistic term, to model a word; however, after
almost 10 years, Zadeh introduced the concept of type-n FS as an
extension of an ordinary FS (T1FS) with the idea of blurring the
degrees of membership values [76.36].

T1FSs have been demonstrated to work efficiently in many
applications; most of them use the mathematics of fuzzy sets but lose
the focus on words, since the sets are mainly used to represent
a function that is more mathematical than linguistic [76.37].

A T1FS is a set of ordered pairs represented by (76.1) [76.38],

$$A = \{(x, \mu_A(x)) \mid x \in X\}, \tag{76.1}$$

where each element is mapped to [0, 1] by its MF $\mu_A$, where
[0, 1] means real numbers between 0 and 1, including the values 0
and 1,

$$\mu_A(x): X \to [0, 1]. \tag{76.2}$$
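As a concrete instance of (76.1) and (76.2), the sketch below builds a T1FS over a small discrete universe using a triangular MF. The triangular shape, its parameters, and the names `triangular_mf`, `X`, and `A` are assumptions chosen for illustration.

```python
def triangular_mf(x, a, b, c):
    """Membership grade of x in a triangular T1FS with feet a and c
    and peak b; the result always lies in [0, 1]."""
    if not a < b < c:
        raise ValueError("require a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Discrete universe of discourse X and the T1FS A as a set of
# ordered pairs (x, mu_A(x)).
X = [0.0, 2.5, 5.0, 7.5, 10.0]
A = [(x, triangular_mf(x, 0.0, 5.0, 10.0)) for x in X]
```

Each pair in `A` couples an element of the universe with a single crisp membership grade, which is exactly what distinguishes a T1FS from the type-2 sets defined next.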


A pointwise definition of a T2FS is given as follows: $\tilde{A}$ is
characterized by a T2MF $\mu_{\tilde{A}}(x, u)$, where $x \in X$ and
$u \in J_x \subseteq [0, 1]$, i.e. [76.39],

$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X,\ \forall u \in J_x \subseteq [0, 1]\}, \tag{76.3}$$

where $0 \le \mu_{\tilde{A}}(x, u) \le 1$.
Another way to express $\tilde{A}$ is

$$\tilde{A} = \int_{x \in X} \int_{u \in J_x} \mu_{\tilde{A}}(x, u) / (x, u), \quad J_x \subseteq [0, 1], \tag{76.4}$$

where $\int\!\int$ denotes the union over all admissible input
variables $x$ and $u$. For discrete universes of discourse $\int$ is
replaced by $\sum$ [76.39]. In fact, $J_x \subseteq [0, 1]$
represents the primary membership of $x \in X$ and
$\mu_{\tilde{A}}(x, u)$ is a T1FS known as the secondary set. Hence,
a T2MF can be any subset in [0, 1], the primary membership, and
corresponding to each primary membership, there is a secondary
membership (which can also be in [0, 1]) that defines the uncertainty
for the primary membership.


When $\mu_{\tilde{A}}(x, u) = 1$, where $x \in X$ and
$u \in J_x \subseteq [0, 1]$, we have the IT2MF shown in Fig. 76.2.
The uniform shading for the FOU represents the entire IT2FS and it
can be described in terms of an upper membership function and a lower
membership function

$$\overline{\mu}_{\tilde{A}}(x) = \overline{\mathrm{FOU}(\tilde{A})} \quad \forall x \in X, \tag{76.5}$$
$$\underline{\mu}_{\tilde{A}}(x) = \underline{\mathrm{FOU}(\tilde{A})} \quad \forall x \in X. \tag{76.6}$$

Figure 76.2 shows an IT2MF; the shadow region is the FOU. At the
points $x_1$ and $x_2$ are the primary MFs $J_{x_1}$ and $J_{x_2}$,
and the corresponding secondary MFs $\mu_{\tilde{A}}(x_1)$ and
$\mu_{\tilde{A}}(x_2)$ are also shown.
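Because every secondary grade of an IT2FS equals 1, a vertical slice is fully described by the interval between the lower and the upper MF. The sketch below evaluates that interval for a triangular FOU; the MF shapes, their parameters, and the names `tri` and `it2_membership` are assumptions for illustration and are not the MFs of Fig. 76.2.

```python
def tri(x, a, b, c, height=1.0):
    """Triangular function with feet a and c, peak b, and the
    given peak height."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return height * (x - a) / (b - a)
    return height * (c - x) / (c - b)


def it2_membership(x):
    """Return the primary membership interval J_x = [lower, upper]
    of an interval type-2 set at the point x."""
    upper = tri(x, 0.0, 5.0, 10.0)              # UMF (assumed)
    lower = tri(x, 2.0, 5.0, 8.0, height=0.6)   # LMF (assumed)
    return lower, upper


for x in (1.0, 3.0, 5.0):
    lower, upper = it2_membership(x)
    print(x, lower, upper)
```

The LMF is chosen with a narrower support and a smaller height than the UMF, so the lower grade never exceeds the upper grade and the pair always forms a valid interval.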

The basics and principles of fuzzy logic do not change from T1FSs
to T2FSs [76.37, 40, 41]; they are independent of the nature of the
membership functions and, in general, will not change for any type-n.
When a FIS uses at least one type-2 fuzzy set, it is a type-2 FIS.

In this chapter we based our study on IT2FSs, so the IT2 FIS can be
seen as a mapping from the inputs to the output and it can be
interpreted quantitatively as $Y = f(X)$, where
$X = \{x_1, x_2, \ldots, x_n\}$ are the inputs to the IT2 FIS $f$,
and $Y = \{y_1, y_2, \ldots, y_n\}$ are the defuzzified outputs.
These concepts can be represented by rules of the form

$$\text{If } x_1 \text{ is } \tilde{F}_1 \text{ and } \ldots \text{ and } x_n \text{ is } \tilde{F}_n, \text{ then } y \text{ is } \tilde{G}. \tag{76.7}$$
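For a rule of this form, evaluating the antecedent of an interval type-2 system yields a firing interval rather than a single firing strength. The sketch below assumes the minimum t-norm for the "and" connective, which is a common choice but is not stated at this point in the text; the function name and the membership grades are illustrative.

```python
def firing_interval(memberships):
    """memberships: list of (lower, upper) membership grades, one
    pair per antecedent. Returns the firing interval of the rule
    under the minimum t-norm (an assumption of this sketch)."""
    f_lower = min(lo for lo, _ in memberships)
    f_upper = min(up for _, up in memberships)
    return f_lower, f_upper


# Rule: IF x1 is F1 AND x2 is F2 THEN y is G, with assumed grades
# for the current inputs x1 and x2.
grades = [(0.2, 0.6), (0.4, 0.5)]
f_low, f_up = firing_interval(grades)
```

The resulting pair `(f_low, f_up)` is the quantity that the type reduction stage consumes for each rule.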


In a T1FC, where the output sets are T1FSs, the defuzzification
produces a number, which is in some sense a crisp representation of
the combined output sets. In the T2 case, the output sets are T2, so
an extended defuzzification operation is necessary to get a T1FS at
the output. Since this operation converts T2 output sets to a T1FS,
it is called type reduction, and the T1FS is called a type-reduced
set, which may then be defuzzified to obtain a single crisp number.

The TR stage is the most computationally expen- 
sive stage of the T2FC; therefore, several proposals to 
improve this stage have been developed. One of the 
first proposals was the iterative procedure known as the 
Karnik–Mendel (KM) algorithm.
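The iterative character of the KM procedure can be seen in the following sketch, which computes one endpoint of the type-reduced interval from rule consequents and firing intervals. It is a plain textbook-style rendering and not the hardware design discussed later in the chapter. It assumes crisp consequent values, at least one rule, and strictly positive upper firing strengths; the name `km_endpoint` and the example numbers are illustrative.

```python
def km_endpoint(y, f_lower, f_upper, left=True, max_iter=100):
    """Karnik-Mendel iteration for one endpoint of the type-reduced
    set: y_l if left is True, otherwise y_r."""
    # Sort the rules by consequent value, as the procedure requires.
    order = sorted(range(len(y)), key=lambda n: y[n])
    y = [y[n] for n in order]
    lo = [f_lower[n] for n in order]
    hi = [f_upper[n] for n in order]

    # Initialise with the midpoint of each firing interval.
    w = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    y_prev = sum(yi * wi for yi, wi in zip(y, w)) / sum(w)

    for _ in range(max_iter):
        # Switch point k: largest index with y[k] <= y_prev,
        # restricted to 0 .. N-2.
        k = 0
        for n in range(len(y) - 1):
            if y[n] <= y_prev:
                k = n
        if left:
            # Left endpoint: large weights on small consequents.
            w = [hi[n] if n <= k else lo[n] for n in range(len(y))]
        else:
            # Right endpoint: large weights on large consequents.
            w = [lo[n] if n <= k else hi[n] for n in range(len(y))]
        y_new = sum(yi * wi for yi, wi in zip(y, w)) / sum(w)
        if abs(y_new - y_prev) < 1e-12:
            return y_new
        y_prev = y_new
    return y_prev


# Two rules with consequents 1 and 3 and firing intervals [0.5, 1].
y_l = km_endpoint([1.0, 3.0], [0.5, 0.5], [1.0, 1.0], left=True)
y_r = km_endpoint([1.0, 3.0], [0.5, 0.5], [1.0, 1.0], left=False)
y_crisp = (y_l + y_r) / 2.0   # defuzzified output
```

For this small example the endpoints are 5/3 and 7/3. The repeated search for the switch point and the recomputation of the weighted average are what make type reduction the costly stage, and they are the parts that the algorithmic and hardware proposals listed below try to shorten or parallelize.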

In general, all the proposals can be classified into 
two big groups. Group I embraces all the algorithmic 
improvements and Group II all the hardware improve- 
ments, as follows [76.42]: 


1. Improvements to software algorithms, where the 
dominant idea is to reduce computational cost of 
IT2-FIS based on algorithmic improvements. This 
group can be subdivided into three subgroups. 




(a) Enhancements to the KM TR algorithm. As the name of the
classification indicates, the aim is to improve the original KM TR
algorithm directly, to speed it up. The best known algorithms in this
classification are:

i. Enhanced KM (EKM) algorithms. They have three improvements over
the original KM algorithm. First, a better initialization is used to
reduce the number of iterations. Second, the termination condition of
the iterations is changed to remove one unnecessary iteration.
Finally, a subtle computing technique is used to reduce the
computational cost of each iteration.

ii. The enhanced Karnik–Mendel algorithm with new initialization
(EKMANI) [76.43]. It computes the generalized centroid of general
T2FSs. It is based on the observation that for two alpha-planes close
to each other, the centroids of the two resulting IT2FSs are also
close to each other. So, it may be advantageous to use the switch
points obtained from the previous alpha-plane to initialize the
switch points in the current alpha-plane. Although EKMANI was
primarily intended for computing the generalized centroid, it may
also be used in the TR of IT2-FIS, because usually the output of an
IT2-FIS changes only a small amount at each step.

iii. The iterative algorithm with stop condition (IASC). This was
proposed by Melgarejo et al. [76.44] and is based on the analysis of
the behavior of the firing strengths.

iv. The enhanced IASC [76.45] is an improvement of the IASC.

v. Enhanced opposite directions searching (EODS), which is
a proposal to speed up KM algorithms. The aim is to search in both
directions simultaneously, and in each iteration the points L and R
are the switching points.


(b) Alternative TR algorithms. Unlike iterative KM algorithms, most
alternative TR algorithms have a closed-form representation. Usually,
they are faster than KM algorithms. Two representative examples are:

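i. The Gorzalczany method. A polygon is constructed using the firing
strengths $[\underline{f}^n, \overline{f}^n]$ and the consequent
intervals $[\underline{y}^n, \overline{y}^n]$, which can be viewed as
an IT2FS. Here, $y^n = \underline{y}^n = \overline{y}^n$ for
$n = 1, 2, 3, \ldots, N$. The method computes an approximate
membership value for each point,

$$\mu(y) = \frac{\underline{f}(y) + \overline{f}(y)}{2}\,\bigl[1 - \bigl(\overline{f}(y) - \underline{f}(y)\bigr)\bigr], \qquad (76.8)$$

where $\overline{f}(y) - \underline{f}(y)$ is called the bandwidth.
Then the defuzzified output can be computed as

$$y_G = \arg\max_y \mu(y). \qquad (76.9)$$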


ii. The Wu-Tan (WT) method. It searches for an equivalent T1FS, to
which the centroid method is then applied to obtain the defuzzified
output. This is the fastest method in this category.
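To make the closed-form character of the Gorzalczany method concrete, the following Python sketch evaluates the approximate membership value as the mean of the lower and upper firing strengths multiplied by one minus the bandwidth, and then takes the arg max. As a simplifying assumption it evaluates the membership only at the N consequent points rather than over the whole polygon; the function name and the sample data are illustrative.

```python
def gorzalczany_output(y, f_lo, f_hi):
    """Return (y_G, mu_max) evaluated at the consequent points only.

    mu = mean firing strength * (1 - bandwidth), bandwidth = f_hi - f_lo.
    """
    best_y, best_mu = None, -1.0
    for yn, lo, hi in zip(y, f_lo, f_hi):
        mu = 0.5 * (lo + hi) * (1.0 - (hi - lo))
        if mu > best_mu:          # keep the first point with the largest mu
            best_y, best_mu = yn, mu
    return best_y, best_mu


# Illustrative data (assumed, not taken from the chapter).
y = [1.0, 2.0, 3.0, 4.0]
f_lo = [0.2, 0.3, 0.1, 0.4]
f_hi = [0.6, 0.8, 0.5, 0.9]
y_g, mu_max = gorzalczany_output(y, f_lo, f_hi)
```

No iteration is involved: a single pass over the rules suffices, which is why methods of this kind are usually faster than the KM algorithms.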

2. Hardware implementation. The main idea is to take advantage of the
intrinsic parallelism of the hardware and/or of combinations of
hardware and parallel programming. Here, we divide this group into
four main approaches that cover the existing proposals for reducing
the computational time of the type reduction stage by the use of
parallelism at different levels.

(a) The use of multiprocessor systems, including multicore systems
that enable the same benefits at a reduced cost. In this category are
personal and industrial computers with processors such as the Intel
Pentium Core Processor family, which includes the Intel Core i3, i5,
and i7; the AMD Quad-Core Opteron; the AMD Phenom X4 Quad-Core
processors; and multicore microcontrollers such as the Propeller
P8X32A from Parallax or the F28M35Hx of the Concerto Microcontrollers
family of Texas Instruments. Multicore processors can also be
implemented in FPGAs.

(b) The use of a general-purpose GPU (GPGPU) and the compute unified
device architecture (CUDA). In general, GPUs provide a new way to
perform high-performance computing on hardware. In particular, IT2FCs
can take great advantage of this technology because of their
complexity. Traditionally, before the development of the CUDA
technology, the programming was achieved by translating a
computational procedure into a graphic format with the idea of
executing it using the standard graphic pipeline, a process known as
encoding data into a texture format. The CUDA technology of NVIDIA
offers a parallel programming model for GPUs that does not require the
use of a graphic application programming interface (API), such as
OpenGL [76.46].




(c) The use of FPGAs. This approach offers the best processing speed
and flexibility. One of the main advantages is that the developer can
determine the desired degree of parallelism by a trade-off analysis.
Moreover, this technology allows us to use the strengths of all
platforms in tight integration to provide the highest performance
available at the present time. It is possible to have a standalone
T1/IT2FC, or to integrate the same T1/T2FC as a coprocessor as part
of a high-performance computing system.

(d) The use of ASICs. The T1/T2FC is factory integrated using
complementary metal-oxide-semiconductor (CMOS) technology. The main
advantage is that ASICs are cheaper than FPGAs. Unlike FPGA
technology, ASIC solutions are not field reprogrammable.


A system based on an FPGA platform allows us to program all the
Group I algorithms, since modern FPGAs have embedded hard and/or soft
processors; this kind of system can be programmed using high-level
languages such as C/C++ and can also incorporate operating systems
such as Linux. On the other hand, T1/T2FC hardware implementations in
FPGAs have the advantage of providing speeds that are competitive
with those of ASIC systems, together with in-field reconfigurability.


76.3 Host Technology


Until the beginning of this century, general-purpose computers with a
single-core processor were the systems of choice for high-performance
computing (HPC) for many applications; they replaced existing big and
expensive computer architectures [76.47]. In 2001, IBM introduced a
reduced instruction set computer (RISC) microarchitecture named
POWER4 (performance optimization with enhanced RISC) [76.48]. This
was the first dual-core processor embedded into a single die, and
subsequently other companies introduced different multicore
microprocessor architectures to the market, such as the ARM Cortex A9
[76.49], Sparc64 [76.50], Intel and AMD quad-core processors, Intel
i7 processors, and others [76.51]. These developments, together with
the rapid development of GPUs that offer massively parallel
architectures to develop high-performance software, are an attractive
choice for professionals, scientists, and researchers interested in
speeding up applications. Undoubtedly, the use of a generic computer
with GPU technology has many advantages for implementing an embedded
learning fuzzy system [76.46]; the disadvantages are mainly related
to size and power consumption. A solution to these problems is the
use of application-specific integrated circuit (ASIC) fuzzy
processors [76.52–54], or reprogrammable hardware based on
microcontrollers and/or FPGAs.

The orientation of this paper is towards tuning and learning using
FRBS for embedded applications; for now, we are going to focus on
FPGA and ASIC technology [76.55], since they provide the best level
of parallelization. Both families of devices provide characteristics
for HPC that the other options cannot. Each technology has its own
advantages and disadvantages, and the differences between them are
narrowing due to recent developments. In general, ASICs are
integrated circuits that are designed to implement a single
application directly in fixed hardware; therefore, they are very
specialized for solving a particular problem. The costs of ASIC
implementations are reduced for high volumes; they are faster and
consume less power; and it is possible to implement analog circuitry,
as well as mixed-signal designs, but the time to market can be a year
or more. Several design tasks must be carried out for an ASIC that
are not needed with FPGAs, and the development tools are very
expensive. On the other hand, FPGAs can be introduced to the market
very fast, since the user only needs a personal computer and low-cost
hardware to burn the hardware description language (HDL) code into
the FPGA before it is ready to work. They can be remotely updated
with new software since they are field reprogrammable. They have
specific dedicated hardware such as blocks of random access memory
(RAM); they also provide high-speed programmable I/O, hardware
multipliers for digital signal processing (DSP), intellectual
property (IP) cores, and microprocessors in the form of hard cores
(factory implemented) such as PowerPC and ARM for Xilinx, or soft
cores (user implemented) such as MicroBlaze and Nios for Xilinx and
Altera, respectively. They can have built-in analog-to-digital
converters (ADCs). The synthesis process is easier. A significant
point is that the tested HDL code developed for FPGAs may be used in
the design process of an ASIC.

There are three main disadvantages of FPGAs versus ASICs: FPGA
devices consume more power than ASICs; the design is limited to the
resources available in the FPGA; and FPGAs are suitable mainly for
low-quantity production. To overcome these disadvantages it is very
important to achieve optimized designs, which can only be attained by
coding efficient algorithms.

During the last decade, there has been an increasing interest in
evolving hardware by the use of evolutionary computation applied to
an embedded digital system. Although different custom chips have been
proposed for this purpose, the most popular device is the FPGA
because its architecture is designed for general-purpose commercial
applications. New FPGAs allow part of the programmed logic to be
modified, or new logic to be added, at run time. This feature is
known as dynamic or active reconfiguration, and because in an FPGA we
can combine a multiprocessor system and coprocessors, FPGAs are very
attractive for implementing evolvable hardware algorithms. Therefore,
in the next sections, we shall put special emphasis on multiprocessor
systems and FPGAs.


76.4 Hardware Implementation Approaches 


In this section, we give an overview of the three main approaches to
the hardware implementation of an intelligent system.


76.4.1 Multiprocessor Systems 


Multiprocessor systems consist of multiple processors residing within
one system; they have been available for many years. Multicore
processors offer benefits equivalent to those of multiprocessors at a
lower cost; their cores are integrated in the same electronic
component. At the present time, most modern computer systems have
many processors that can be single-core or multicore processors;
therefore, we can have three different layouts for multiprocessing:
a multicore system, a multiprocessor system, and a
multiprocessor/multicore system.

Fig. 76.4 Multicore system embedded into an FPGA. Embedded is
a hard-processor PowerPC440 and five MicroBlaze soft-processors. In
this system we can process an EA using the island model
(Diagram blocks: PowerPC 440 system master; five MicroBlaze
processors running software programs 1 to 5; XPS_mailbox cores;
peripherals: serial port, IIC, PWM, etc.)


Fig. 76.5 The whole embedded evolutionary IT2FC implemented in the
program memory of the multiprocessor system, similarly as in a
desktop computer


Figure 76.4 shows a multicore system embedded into a Virtex 5 FPGA
XC5VFX70; it has the capacity to integrate a distributed multicore
system with a hard-processor PowerPC 440 as the master, five
MicroBlaze 32-bit soft-processor slaves, coprocessors, and
peripherals. The number of devices that can be integrated is, of
course, limited by the size of the FPGA. Figure 76.5 shows the full
implementation in the program memory of the multiprocessor system.
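The island model mentioned in the caption of Fig. 76.4 can be illustrated in software. The following Python fragment is only a sketch under stated assumptions: each island stands for one soft-processor running its own population, and the migration step stands for the exchange coordinated by the master. The toy objective, the ring migration topology, the population sizes, and all function names are assumptions made for illustration and are not taken from the chapter.

```python
import random


def fitness(x):
    # Toy objective (assumption): minimise the sum of squares.
    return sum(v * v for v in x)


def evolve_island(pop, rng, sigma=0.1):
    # One generation with elitism: keep the best half, refill with
    # Gaussian-mutated copies of randomly chosen survivors.
    pop = sorted(pop, key=fitness)
    survivors = pop[: len(pop) // 2]
    children = [[v + rng.gauss(0.0, sigma) for v in rng.choice(survivors)]
                for _ in range(len(pop) - len(survivors))]
    return survivors + children


def island_model(n_islands=5, pop_size=8, dim=3, generations=30,
                 migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    initial_best = min(fitness(ind) for isl in islands for ind in isl)
    for g in range(1, generations + 1):
        # Each island evolves independently (one processor per island).
        islands = [evolve_island(isl, rng) for isl in islands]
        if g % migrate_every == 0:
            # Migration step: the best individual of each island replaces
            # the worst individual of the next island in a ring.
            bests = [min(isl, key=fitness) for isl in islands]
            for i, isl in enumerate(islands):
                worst = max(range(len(isl)), key=lambda j: fitness(isl[j]))
                isl[worst] = list(bests[(i - 1) % n_islands])
    final_best = min(fitness(ind) for isl in islands for ind in isl)
    return initial_best, final_best


initial_best, final_best = island_model()
```

Because every island keeps its best individuals and migration only overwrites the worst one, the best fitness over all islands never gets worse from one generation to the next.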


(Fig. 76.5 block labels: hard processor, soft processor, OS, data
memory, program memory, genetic algorithm, knowledge base, inference)


76.4.2 Implementations into FPGAs 


The architecture of FPGAs offers massive parallelism because they are
composed of a large array of configurable logic blocks (CLBs),
digital signal processing blocks (DSPs), block RAM, and input/output
blocks (IOBs). Similarly to a processor's arithmetic logic unit
(ALU), CLBs and DSPs can be programmed to perform arithmetic and
logic operations such as compare, add/subtract, multiply, divide,
etc. In a processor, ALU architectures are fixed because they have
been designed in a general-purpose manner to execute various
operations. CLBs can be programmed using just the operations that are
needed by the application, which results in increased computation
efficiency. An FPGA therefore consists of a set of programmable logic
cells manufactured into the device according to a connection paradigm
to build an array of computing resources; the resulting arrangement
can be classified into four categories: symmetrical array, row-based,
hierarchy-based, and sets of gates [76.56]. Figure 76.6 shows a
symmetrical array-based FPGA that consists of a two-dimensional array
of logic blocks immersed in a set of vertical and horizontal lines;
examples of FPGAs in this category are Spartan and Virtex from
Xilinx, and Atmel AT40K. In Fig. 76.6 three main parts can be
identified: a set of programmable logic cells, also called logic
blocks (LBs) or configurable logic blocks (CLBs), a programmable
interconnection network, and a set of input/output blocks (IOBs).


Fig. 76.6 Symmetric array-based FPGA architecture, island style
(diagram labels: interconnect, switch matrix)


Embedded programmable logic devices usually integrate one or several
processor cores, programmable logic, and memory on the same chip (an
FPGA) [76.56]. Developments in the field of FPGAs have been
remarkable in the last two decades; FPGAs have moved on from tiny
devices with a few thousand gates that were used in small
applications such as finite state machines, glue logic for complex
devices, and very limited CPUs. In a 10-year period, a 200% growth
rate in the capacity of Xilinx FPGA devices was observed, together
with a 50% reduction rate in power consumption, and prices also show
a significant rate of decrease. Other FPGA vendors, such as Actel and
Altera, show similar developments, and this trend still continues.
These developments, together with the progress in development tools
that include software and low-cost evaluation boards, have boosted
the acceptance of FPGAs for different technological applications.


Fig. 76.7 IT2FC design entity (FT2KM). This top-level module contains
instances of the four fuzzy controller submodules
(external ports shown: inputs x1 and x2, clk, reset; output y)


Development Flow 
The development flow of an FPGA-based system con- 
sists of the following major steps: 


1. Write in VHDL the code that describes the system's logic; usually
a top-down and bottom-up methodology is used. For example, to design
an IT2FC, we follow this procedure:

(a) Describe the design entity, where the designer defines the inputs
and outputs of the top VHDL module. The idea is to present the
complex object at different hierarchical levels of abstraction. For
our example, the top design entity is FT2KM.

(b) Once the design entity has been defined, its architecture must be
defined, which gives the description of the design entity; in this
step, we define its behavior, its structure, or a mixture of both.
For the case of the IT2 FLS, we define the system's internal
behavior, which led us to a logic design formed by four
interconnected modules: fuzzification, inference engine, type
reduction, and defuzzification. The VHDL circuits (submodules) are
described at the register-transfer level (RTL) as a sequence, since
we can divide the functionality into a sequence of steps. At each
step, the circuit performs a task consisting of data transfers
between registers and the evaluation of some conditions in order to
go to the next step; in other words, each VHDL module (design entity)
can be divided into two areas: data and control. Each of the four
modules needs to be conceptualized, so we need to define its own
design entity and, therefore, its particular architecture as well as
its interconnections with internal modules. This process is complete
when we have reached the last system component.

(c) Integrate the system. It is necessary to create 
a main design entity (top level) that integrates 
the submodules defining their interconnections. 
In Fig. 76.7 the integration of the four modules 
is shown. 

2. Develop the test bench in VHDL and perform RTL 
simulations for each submodule of the main design 
entity. It is necessary to achieve timing and func- 
tional simulations to create reliable internal design 
entities. 

3. Perform synthesis and implementation. In the synthesis process,
the software transforms the VHDL constructs into generic gate-level
components, such as simple logic gates and flip-flops. The
implementation process is composed of three smaller subprocesses:
translate, map, and place and route. In the translate process, the
multiple design files of a project are merged into a single netlist.
The map process maps the generic gates in the netlist to the FPGA's
logic cells and IOBs; this process is also known as technology
mapping. In the place and route process, using the physical layout
inside the FPGA chip, the tool places the cells in physical locations
and determines the routes to connect the various signals. In the
Xilinx flow, the static timing analysis performed at the end of the
implementation process determines various timing parameters such as
the maximal clock frequency and the maximal propagation delay
[76.57].

4. Generate the programming file and download it to 
the FPGA. According to the final netlist a configu- 
ration file is generated, which is downloaded to the 
FPGA serially. 

5. Test the design entity using a simulation program such as Simulink
of Matlab and the Xilinx System Generator (XSG) for Xilinx devices.
The idea here is first to plot the control surface in order to
analyze the general behavior of the design (a controller in our
example), and second to integrate the design entity as a block of the
desired system to be controlled. Although this fifth step is not in
the current literature on logic design for FPGA implementation, it is
the authors' recommendation, since we have experienced good results
following this practice.
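The first part of step 5, obtaining the control surface, amounts to sweeping both inputs over their range and recording the output. The Python fragment below is a minimal sketch of such a sweep for 8-bit inputs; `toy_controller` is a hypothetical stand-in for the real IT2FC (or for a co-simulated design entity), and the grid step is an arbitrary assumption.

```python
def control_surface(controller, step=51):
    """Evaluate controller(x1, x2) on a regular grid of 8-bit inputs."""
    grid = list(range(0, 256, step))          # 0, 51, ..., 255 for step=51
    surface = [[controller(x1, x2) for x2 in grid] for x1 in grid]
    return grid, surface


def toy_controller(x1, x2):
    # Hypothetical placeholder for the IT2FC: truncated mean of the inputs.
    return (x1 + x2) // 2


grid, surface = control_surface(toy_controller)
```

The resulting table can be passed to any plotting tool; irregularities in the surface (discontinuities, saturated regions) are usually easier to spot there than in waveform simulations.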


Using the design entity FT2KM.vhd, which was created and tested using
the aforementioned development flow, we can integrate it into an FPGA
in two ways:


1. As a standalone system. Here, we mean an independent system that
does not require the support of any microprocessor to work; the
system itself is a specialized circuit that can produce the desired
output. The IT2FC is implemented using the FPGA design flow;
therefore, it is programmed using the complete development flow for a
specific application.

2. As a coprocessor. The coprocessor performs specialized functions
that the main system processor cannot perform as well or as fast.
When an IT2FC is programmed in a high-level language, the time
needed to produce an output for a given input is too long to achieve
adequate control of many plants, even if a parallel programming
paradigm is used. Since a coprocessor is a dedicated circuit designed
to offload the main processor, and the FPGA can offer parallelism at
the circuit level, the designer of the IT2FC coprocessor has control
over the controller performance. The coprocessor can be physically
separate, i.e., in a different FPGA circuit (or module), or it can be
part of the system, in the same FPGA circuit. In this work, we show
two methods to develop a system with an IT2FC as a coprocessor. In
both methods, we consider that we have a tested IT2FC design entity.
In the first case, we shall use the FT2KM design entity to
incorporate the fuzzy controller as a coprocessor of an ARM processor
in a Fusion FPGA. In the second case, we are going to create the
IT2FC IP core using the Xilinx Platform Studio; the core will serve
as a coprocessor of the MicroBlaze processor embedded into a
Spartan 6 FPGA.


76.5 Development of a Standalone IT2FC 


Figure 76.7 shows the top-level design entity (FT2KM) of the IT2FC
and its components (submodules) for FPGA implementation. The code of
the top-level entity and its components is given in Sect. 76.5.1. All
stages include the clock (clk) and reset (rst) signals. In the
defuzzifier, we have included these two signals to illustrate that a
full process takes only four clock cycles, one for each stage. In
practice, we did not add these two signals to the defuzzifier, since
when the controller is used as a coprocessor, one 8-bit data latch is
added at the output in order to incorporate it into the system. For a
detailed description of the IT2FC stages, consult [76.34].

The fuzzification stage has two input variables, x1 and x2. This
module contains one fuzzifier for the upper MFs and another for the
lower MFs of the IT2FC. For the upper part and the first input x1,
considering that a crisp value can be fuzzified by two MFs because it
may have membership values in two contiguous T2MFs, the linguistic
terms are assigned to the VHDL variables e1Up and e2Up, and their
upper membership values are ge1Up and ge2Up. For the second input x2,
the linguistic terms are assigned to the VHDL variables de1Up and
de2Up, and gde1Up and gde2Up are the upper membership values. The
lower part of the fuzzifier is similar; for example, for the input
variable x1 the assigned VHDL variables are e1Low and e2Low, and
their lower MF values are ge1Low and ge2Low, etc. The fuzzification
stage entity needs only one clock cycle to perform the fuzzification.
These variables, eight for each bound, are the inputs of the
inference engine stage [76.58].
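The idea that one crisp input activates two contiguous MFs, each with an upper and a lower membership grade, can be sketched in software. The Python fragment below is only an illustration under strong assumptions that are not taken from the chapter: five evenly spaced triangular upper MFs over an 8-bit universe, and lower MFs obtained by scaling the upper ones by a constant factor. The function name and all parameters are hypothetical.

```python
def fuzzify(x, n_terms=5, lower_scale=0.8):
    """Fuzzify one 8-bit crisp input.

    Returns ((term1, term2), (upper1, upper2), (lower1, lower2)):
    the two active linguistic-term indices with their upper and lower
    membership grades (compare e1Up/e2Up, ge1Up/ge2Up, ge1Low/ge2Low).
    """
    if not 0 <= x <= 255:
        raise ValueError("x must be an 8-bit value")
    spacing = 256 // (n_terms - 1)             # distance between MF peaks
    t1 = min(x // spacing, n_terms - 2)        # left active term
    t2 = t1 + 1                                # right (contiguous) term
    frac = (x - t1 * spacing) / spacing        # position between the peaks
    upper = (1.0 - frac, frac)                 # triangular upper MFs
    lower = (lower_scale * upper[0], lower_scale * upper[1])
    return (t1, t2), upper, lower


terms, upper, lower = fuzzify(96)
```

In hardware the same information would be carried by short term codes and fixed-point grades; floating-point values are used here only to keep the sketch readable.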


Fig. 76.8 A standalone IT2FC is embedded into an FPGA. The fuzzifier
reads the inputs directly from the FPGA terminals. The defuzzifier
sends the crisp output to the FPGA terminals. The system may be
embedded in the static region or in the reprogrammable region
(diagram labels: instances of VHDL entities, fuzzifier, defuzzifier)


The inference engine is divided into two parallel inference engine
entities: IEEup manages the upper bound of the IT2FC, and IEElow the
lower bound. Each entity has eight inputs from the corresponding
fuzzifier stage and eight outputs; four of the outputs are the output
linguistic terms, and the rest are their firing strengths. All the
inputs enter a parallel selection VHDL process, and the circuits
inside the process are placed in parallel; the degree of parallelism
can be tailored by an adequate coding style. In our case, all the
rules are processed in parallel and the eight outputs of each
inference engine section (upper bound and lower bound) are obtained
at the same time because the clk signal synchronizes the process;
hence this stage needs only one clock cycle to perform a whole
inference and provide the output to the next stage. In the upper
bound, the four antecedents are formed at the same time; for example,
for the first rule, the antecedent is formed using the concatenation
operator &, so it looks like ante := e1 & de1. Each antecedent can
address up to four rules and, depending on the combination, one of
the four rules is chosen; the upper inference engine output provides
the active consequents and their firing strengths. The lower bound of
the inference engine is treated in the same way [76.59].
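The antecedent formation by concatenation can be mimicked in software to check a rule base before coding it in VHDL. The following Python sketch models one bound of the inference engine under assumptions that are not stated in the chapter: 3-bit term codes, a rule base stored as a lookup table indexed by the concatenated antecedent, and the minimum as the t-norm for the firing strength. The rule table used in the example is purely hypothetical.

```python
def infer_bound(e, de, ge, gde, rule_table):
    """Model one bound (upper or lower) of the inference engine.

    e, de   : the two active 3-bit term codes of inputs x1 and x2
    ge, gde : the corresponding membership grades
    Returns the four active consequents and their firing strengths.
    """
    consequents, strengths = [], []
    for i in range(2):
        for j in range(2):
            # Software model of ante := e & de for two 3-bit codes.
            ante = (e[i] << 3) | de[j]
            consequents.append(rule_table[ante])
            strengths.append(min(ge[i], gde[j]))   # min t-norm (assumption)
    return consequents, strengths


# Hypothetical rule base: consequent code = saturated sum of the term codes.
rule_table = {(a << 3) | b: min(a + b, 7) for a in range(8) for b in range(8)}
c, gc = infer_bound((1, 2), (3, 4), (0.5, 0.5), (0.25, 0.75), rule_table)
```

The four loop iterations are independent of one another, which is what allows the hardware version to evaluate all active rules in the same clock cycle.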

At the input of the TR, we have the equivalent values of the
pre-computed y_l, i.e., the linguistic terms of the active
consequents (c1left, c2left, c3left, and c4left) and the upper firing
strengths (gc1up, gc2up, gc3up, and gc4up), in addition to the
equivalent values of the pre-computed y_r (c1right, c2right, c3right,
and c4right) and the lower firing strengths (gc1low, gc2low, gc3low,
and gc4low) [76.60]. All the above-mentioned signals go to a parallel
selection process to perform the KM algorithm [76.39]. There are
parallel blocks to obtain the average of the upper and lower firing
strengths for the active consequents, required to obtain the average
of y_r and y_l; a block to obtain the different defuzzified values of
y_r and y_l; and parallel comparator blocks to obtain the final
result of y_r and y_l [76.61].

The final result of the IT2FC is obtained using the defuzzification
block, which computes the average of y_r and y_l and produces the
single output y.


76.5.1 Development of the IT2 FT2KM 
Design Entity 


Figure 76.8 shows the implementation of a static IT2FC that can work
as a standalone system. By static, we mean that the only way to
reconfigure (modify) the FC is to stop the application and upload the
whole configuration bit file (bitstream). In this system, the inputs
of the fuzzifier and the defuzzifier output are connected directly to
the FPGA terminals. The assignment of the terminals is achieved in
accordance with the internal architecture of the chosen FPGA. Hence,
it is necessary to provide the Xilinx Integrated Synthesis
Environment (ISE) program with special instructions (constraints) to
carry through the synthesis process. They are generally placed in the
user constraint file (UCF), although they may exist in the HDL code.
In general, constraints are instructions given to the FPGA
implementation tools to direct the mapping, placement, timing, or
other guidelines that the tools follow while processing an FPGA
design. The overall design entity of the IT2FC (FT2KM), shown in
Fig. 76.7, was defined as follows,


library ieee;
use ieee.std_logic_1164.all;

entity FT2KM is
  port(clk, reset : in std_logic;                   -- clock and reset
       x1, x2 : in std_logic_vector(8 downto 1);    -- crisp inputs
       y : out std_logic_vector(8 downto 1)         -- crisp output
  );
end FT2KM;


The architecture of FT2KM has four components, and all of them have
two common input ports: clock (clk) and reset (rst). All ports in an
entity are signals by default. This is important since a signal
serves to pass values in and out of the circuit; a signal represents
circuit interconnects (wires). A component is a simple piece of
customized code formed by entities with their corresponding
architectures, as well as library declarations. To allow a
hierarchical design, each component must be declared before being
used by another circuit, and to use a component it is necessary to
instantiate it first. In this approach the components are:


1. The component labeled as fuzzyUpLw. It is the T2 fuzzifier, which
consists of one fuzzifier for the upper MF of the FOU and one for the
lower MF. It has two input ports, x1 and x2, and 16 output ports,
e1Up to gde2Low.


component fuzzyUpLw is
  -- n (data width) is assumed to be declared elsewhere; the signal
  -- declarations in this section imply n = 8
  port(clk, reset : in std_logic;
       x1, x2 : in std_logic_vector(n downto 1);
       e1Up, e2Up, de1Up, de2Up, e1Low, e2Low, de1Low,
       de2Low : out std_logic_vector(3 downto 1);
       ge1Up, ge2Up, gde1Up, gde2Up, ge1Low, ge2Low, gde1Low,
       gde2Low : out std_logic_vector(n downto 1)
  );
end component;




The instantiation of this component is achieved using nominal (named)
mapping, and the name of this instance is fuzt2. Note that the ports
clk, reset, x1, and x2 are mapped (connected) directly to the ports
of the design entity FT2KM since, as we explained before, all ports
are signals by default, which represent wires. The piece of code that
defines the instantiation of the fuzzyUpLw component is as follows,


fuzt2 : fuzzyUpLw port map(
  clk => clk, reset => reset, x1 => x1, x2 => x2,
  e1Up => e1upsig, e2Up => e2upsig, de1Up => de1upsig,
  de2Up => de2upsig, ge1Up => ge1upsig, ge2Up => ge2upsig,
  gde1Up => gde1upsig, gde2Up => gde2upsig, e1Low => e1lowsig,
  e2Low => e2lowsig, de1Low => de1lowsig, de2Low => de2lowsig,
  ge1Low => ge1lowsig, ge2Low => ge2lowsig, gde1Low => gde1lowsig,
  gde2Low => gde2lowsig
);


2. The component Infer_type_2 corresponds to the T2 inference engine
of the controller. It has 16 inputs that match the 16 outputs of the
fuzzification stage. This component has 16 outputs to be connected to
the type reduction stage. The piece of code to include this component
is:


component Infer_type_2 is
  port(rst, clk : in std_logic;
       e1, e2, de1, de2, e1_2, e2_2, de1_2, de2_2 : in STD_LOGIC_VECTOR (m downto 1);
       g_e1, g_e2, g_de1, g_de2, g_e1_2, g_e2_2,
       g_de1_2, g_de2_2 : in STD_LOGIC_VECTOR (n downto 1);
       c1, c2, c3, c4, c1_2, c2_2, c3_2, c4_2 : out STD_LOGIC_VECTOR (m downto 1);
       gc1_2, gc2_2, gc3_2, gc4_2, gc1, gc2, gc3, gc4 : out STD_LOGIC_VECTOR (n downto 1)
  );
end component;


This component is instantiated with the name inft2 as follows,


inft2 : Infer_type_2 port map(
  rst => reset, clk => clk, e1 => e1upsig, e2 => e2upsig, de1 => de1upsig,
  de2 => de2upsig, g_e1 => ge1upsig, g_e2 => ge2upsig, g_de1 => gde1upsig,
  g_de2 => gde2upsig, e1_2 => e1lowsig, e2_2 => e2lowsig, de1_2 => de1lowsig,
  de2_2 => de2lowsig, g_e1_2 => ge1lowsig, g_e2_2 => ge2lowsig, g_de1_2 => gde1lowsig,
  g_de2_2 => gde2lowsig, c1 => c1sig, c2 => c2sig, c3 => c3sig, c4 => c4sig,
  gc1 => gc1sig, gc2 => gc2sig, gc3 => gc3sig, gc4 => gc4sig, c1_2 => c12sig,
  c2_2 => c22sig, c3_2 => c32sig, c4_2 => c42sig, gc1_2 => gc12sig,
  gc2_2 => gc22sig, gc3_2 => gc32sig, gc4_2 => gc42sig
);


To connect the instances fuzt2 and inft2 it is necessary to define
some signals (wires),


signal e1upsig, e2upsig, de1upsig, de2upsig : std_logic_vector (m-1 downto 0);
signal ge1upsig, ge2upsig, gde1upsig, gde2upsig : std_logic_vector (7 downto 0);
signal e1lowsig, e2lowsig, de1lowsig, de2lowsig : std_logic_vector (m-1 downto 0);
signal ge1lowsig, ge2lowsig, gde1lowsig, gde2lowsig : std_logic_vector (7 downto 0);




3. The component TypeRed corresponds to the type reduction stage of
the T2FC. It has 16 inputs that should be connected to the inference
engine's outputs, and it has two outputs, yr and yl, that should be
connected to the defuzzifier through signals once both components
have been instantiated. The piece of code to include this component
is:


component TypeRed is
  port(clk, rst : in std_logic;
       c1, c2, c3, c4, c1_2, c2_2, c3_2, c4_2 : in STD_LOGIC_VECTOR (3 downto 1);
       gc1, gc2, gc3, gc4, gc1_2, gc2_2, gc3_2, gc4_2 : in STD_LOGIC_VECTOR (7 downto 0);
       yl, yr : out std_logic_vector (8 downto 1));
end component;


This component is instantiated with the name trkm as follows,


trkm : TypeRed port map(
  -- associations follow the TypeRed port list and the signals
  -- declared in this section
  clk => clk, rst => reset,
  c1 => c1sig, c2 => c2sig, c3 => c3sig, c4 => c4sig,
  c1_2 => c12sig, c2_2 => c22sig, c3_2 => c32sig, c4_2 => c42sig,
  gc1 => gc1sig, gc2 => gc2sig, gc3 => gc3sig, gc4 => gc4sig,
  gc1_2 => gc12sig, gc2_2 => gc22sig, gc3_2 => gc32sig, gc4_2 => gc42sig,
  yl => ylsig, yr => yrsig
);


The signals that connect the instance inft2 to the instance trkm are


signal c1sig, c2sig, c3sig, c4sig : std_logic_vector (m-1 downto 0);
signal gc1sig, gc2sig, gc3sig, gc4sig : std_logic_vector (7 downto 0);
signal c12sig, c22sig, c32sig, c42sig : std_logic_vector (m-1 downto 0);
signal gc12sig, gc22sig, gc32sig, gc42sig : std_logic_vector (7 downto 0);


4. The last component, defit2, corresponds to the defuzzifier stage
of the T2FC. It has two inputs and one output.


component defit2 is
  port(yl, yr : in std_logic_vector (n-1 downto 0);
       y : out std_logic_vector (n-1 downto 0));
end component;


This component is instantiated with the name dfit2 
as follows, 


dfit2 : defit2 port map(yl => ylsig, yr => yrsig, y => y); 


We did not define any signal for the port y since 
it can be connected directly to the entity of design 
FT2KM. The instances trkm and dfit2 are connected 
using the following signals, 


signal ylsig, yrsig : std_logic_vector (n-1 downto 0); 




This approach to implementing an IT2FC provides the 
fastest response. The whole process, consisting of 
fuzzification, inference, type reduction, and 
defuzzification, is completed in four clock cycles, which 
for a Spartan family implementation using a 50 MHz clock 
represents 80 × 10⁻⁹ s, and for a Virtex 5 FPGA-based 
system 40 × 10⁻⁹ s. 
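
The latency figures above follow directly from the cycle count and the clock period. The short Python sketch below reproduces the arithmetic. The 100 MHz value used for the Virtex 5 is not stated in the text; it is only the clock frequency implied by four cycles taking 40 ns, and the function name is an illustrative choice.

```python
def latency_seconds(clock_cycles: int, clock_hz: float) -> float:
    """Time taken by a fixed number of clock cycles at a given clock frequency."""
    return clock_cycles / clock_hz


# One cycle each for fuzzification, inference, type reduction, defuzzification.
CYCLES = 4

# 50 MHz is the Spartan clock quoted in the text.
spartan = latency_seconds(CYCLES, 50e6)

# ASSUMPTION: the text gives 40 ns for the Virtex 5 without naming its clock;
# 100 MHz is the frequency implied by four cycles in 40 ns.
virtex5 = latency_seconds(CYCLES, 100e6)

print(f"Spartan:  {spartan * 1e9:.0f} ns")
print(f"Virtex 5: {virtex5 * 1e9:.0f} ns")
```

The same function can be used to check other cycle counts, for example when a pipeline stage is added or the clock is changed.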


76.6 Developing of IT2FC Coprocessors 


Embedding an IT2FC into an FPGA can certainly be the 
option that offers the best performance and flexibility. 
As we shall see, the best performance is obtained when the 
embedded FC is used as a standalone system. Unfortunately, 
this gain in performance comes with some drawbacks: the 
design is hard to use for people who were not involved in 
the design process of the controller or who are not 
familiar with VHDL coding, and the code owners may simply 
want to keep the code secret. All these obstacles can be 
overcome by the use of IP cores. Next, we shall explain two 
methods of implementing an IT2FC as a coprocessor. 


76.6.1 Integrating the IT2FC Through 
Internal Ports 


In Fig. 76.9, we show a control system that integrates 
the FT2KM design entity embedded into the Actel Fusion 
FPGA [76.62] as a coprocessor of an ARM processor. This 
FPGA allows incorporating the ARM Cortex soft processor, 
as well as other IP cores, to make a custom configuration. 
The embedded system contains the ARM processor, two memory 
blocks, timers, an interrupt controller (IC), a universal 
asynchronous receiver/transmitter (UART) serial port, an 
IIC interface, a pulse width modulator/tachometer block, 
and a general-purpose input/output (GPIO) interface to the 
FT2KM block. All the factory embedded components are soft 
IP cores. The FT2KM is a VHDL module that, together with 
the GPIO, forms the Ft2km_core soft coprocessor (the IT2 
coprocessor). It is handled as an IP core; however, in this 
case, it is necessary to have the VHDL code. The system 
also includes a DC motor with a high-resolution quadrature 
optical encoder, the system's power supply, an H-bridge for 
power direction, a personal computer, and a digital 
display. 

The Ft2km_core has six inputs and two outputs. The 
inputs are error, c.error, ce, rst, w, and clk. The 8-bit 
inputs error and c.error are the controller inputs for the 
error and change of error values. The input ce is used to 
enable/disable the fuzzy controller, the input rst resets 
all the internal registers of the IT2FC, and the input 
w starts a fuzzy inference cycle. The outputs are out and 
IRQ/RDY; the first one is the crisp output value, which is 
8 bits wide. IRQ/RDY is produced when the output data 
corresponding to the respective input is ready to be read. 
IRQ is a pulse used to request an interrupt, whereas RDY is 
a signal that can be programmed to be active at the high or 
low logic level, indicating that a valid output was 
produced; this last signal can be used in a polling mode. 
In Fig. 76.9 we used only 1 bit for the IRQ/RDY signal, so 
when designing the system the designer has to decide on one 
method. It is possible to use both by modifying the logic 
or by separating the signal and adding an extra 1-bit 
output. 

The GPIO IP has two 32-bit-wide ports, one for input 
(read bus) and one for output (write bus). The output 
bus connects the GPIO IP to the ARM Cortex using the 
32-bit APB bus. The input bus connects the IT2FC IP to 
the GPIO IP. The ARM Cortex uses the Ft2km_core as 
a coprocessor. 
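
As a rough illustration of how software could drive the Ft2km_core through the two 32-bit GPIO words, the Python sketch below packs the inputs into a write word and unpacks a read word. The bit positions are purely hypothetical, since the text does not specify how the signals are mapped onto the GPIO buses; only the signal names and the 8-bit widths come from the description above.

```python
# HYPOTHETICAL layout of the 32-bit GPIO write word (not given in the text):
#   bits 7..0   error    (8-bit error input)
#   bits 15..8  c_error  (8-bit change-of-error input)
#   bit 16      ce       (enable/disable the fuzzy controller)
#   bit 17      rst      (reset the internal registers)
#   bit 18      w        (start a fuzzy inference cycle)
def pack_inputs(error, c_error, ce, rst, w):
    """Build the word written by the processor to the GPIO output port."""
    if not (0 <= error <= 0xFF and 0 <= c_error <= 0xFF):
        raise ValueError("error and c_error must fit in 8 bits")
    return (error
            | (c_error << 8)
            | (int(bool(ce)) << 16)
            | (int(bool(rst)) << 17)
            | (int(bool(w)) << 18))


# HYPOTHETICAL layout of the 32-bit GPIO read word:
#   bits 7..0 = out (crisp output), bit 8 = RDY (used in polling mode)
def unpack_output(word):
    """Split the word read from the GPIO input port into (out, rdy)."""
    return word & 0xFF, bool((word >> 8) & 1)


word = pack_inputs(error=0x12, c_error=0x34, ce=1, rst=0, w=1)
print(hex(word))
print(unpack_output(0x1AB))
```

In polling mode, the application would write such a word, then read the input port repeatedly until the RDY flag is set before using the out value.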


76.6.2 Development of IP Cores 


In Sect. 76.6.1, we showed how to integrate the fuzzy 
coprocessor through an input/output port, i.e., the IP 
GPIO. We also commented on the existence of IP cores 
such as the UART and the timers that are connected 
directly to the system bus as in any microcontroller 
system with integrated peripherals. In this section, we 
shall show how to implement an IT2FC connected to 
the system bus to obtain an IT2FC IP core integrated into 
the system architecture. The procedure is basically the 
same for any FPGA of the Xilinx family. We worked 
with the Spartan 6 and Virtex 5, so the Xilinx ISE De- 
sign Suite was used. 

Fig. 76.9 A coprocessor implemented into the Actel Fusion FPGA. The system has an ARM processor, the IT2FC 
coprocessor implemented through the general-purpose input/output port, and some peripherals 

The whole process to start an application that includes 
a microprocessor and a coprocessor can be broadly divided 
into five steps: 


1. Design and implement the design entity that will be 
integrated as an IP core in later steps, following the 
development flow explained in Sect. 76.4.2. In our case, 
the design entity is FT2KM. 

2. Create the basic embedded microcontroller system 
tailored for our application. We already know the kind and 
amount of memory that we will need, as well as the 
peripherals. This step is achieved as follows: we create 
the microprocessor system using the base system builder 
(BSB) of the Xilinx Platform Studio (XPS) software. The 
system contains a MicroBlaze soft core, 16 KB of local 
memory, the data controller bus (dlmb_cntlr), and the 
instruction controller bus (ilmb_cntlr). 

3. Create the IP core, which should contain the desired 
design entity, in our case the FT2KM. This step is achieved 
using the Import Peripheral Wizard found in the Hardware 
option in the XPS. The idea is to connect the FT2KM design 
entity to the processor local bus (PLB V4.6) through three 
registers, one for each input (two registers) and one for 
the output. Upon completion, this tool will create 
a synthesizable HDL file (ft2km_core) that implements the 
intellectual property interface (IPIF) services required 
and a stub user_logic_module. These two modules are shown 
in Fig. 76.10. The IPIF connects the user logic module to 
the system bus using the PLB or the on-chip peripheral bus 
(OPB). At this stage, we will need to use the ISE Project 
Navigator (ISE) software to integrate into the 
user_logic_module all the required files that implement the 
FT2KM design entity. Edit the User_Logic_I.vhd file to 
define the FT2KM component and signals. Open the 
ftk2_core.vhd file and create the ftk2_core entity and user 
logic. Synthesize the HDL code and exit ISE. Return to the 
XPS and add the FTK2_core IP to the embedded system, 
connect the new IP core to the mb_plb bus system, and 
generate addresses. Figure 76.10 shows the IT2FC IP core; 
the IPIF consists of the PLB V4.6 bus controller that 
provides the necessary signals to interface the IP core to 
the embedded soft core bus system. 

4. Design the drivers (software) to handle this design 
entity as a peripheral. 

5. Design the application software to use the design 
entity. 


Fig. 76.10 IP core implementation of a user-defined peripheral. The IT2FC coprocessor is implemented into the 
user logic module. This module achieves communication with the rest of the system through the PLB or the on-chip 
peripheral bus (OPB). For a static coprocessor, use the PLB. For an implementation in the reconfigurable region, use the 
OPB 


76.7 Implementing a GA in an FPGA 


In essence, evolution is a two-step process of random 
variation and selection acting on a population of 
individuals that responds with a collection of behaviors to 
the environment. Selection tends to eliminate those 
individuals that do not demonstrate an appropriate 
behavior. The survivors reproduce and combine their 
features to obtain better offspring. In replication, random 
mutation always occurs, which introduces novel behavioral 
characteristics. The evolution process optimizes behavior, 
and this is a desirable characteristic for a learning 
system. Although the term evolutionary computation dates 
back to 1991, the field has decades of history, genetic 
algorithms (GAs) being one avenue of investigation in 
simulated evolution [76.63]. GAs are a family of 
computational models that imitate the principles of natural 
evolution. For consistency they adopt biological 
terminology to describe operations. There are six main 
steps in a GA: population initialization, evaluation of 
candidates using a fitness function, selection, crossover, 
mutation, and termination judgment, as shown in 
Algorithm 76.1. The first step is to decide how to code 
a solution to the problem that we want to optimize; hence, 
each individual is represented using a chromosome that 
contains the parameters. Common encodings of solutions are 
binary, integer, and real value. In binary encoding, every 
chromosome is a string of bits. In real-value encoding, 
every chromosome is a string that can contain one or 
several parameters encoded as real numbers. Algorithm 76.1 
starts by initializing a population with random solutions, 
and then each individual of the population is evaluated 
using a fitness function, which is selected according to 
the optimization goals. For example, for tuning 
a controller it may be enough to check whether the 
controller is minimizing the error between the actual 
output and the reference. However, one or more complex 
fitness functions can be designed in order to carry out the 
control goal. In steps 3 to 5 the genetic operations are 
applied, i.e., selection, crossover (recombination), and 
mutation. In step 6, the termination criteria are checked, 
stopping the procedure if such criteria have been 
fulfilled. 


Algorithm 76.1 General scheme of a GA 
initialize population with random candidate solutions 
evaluate each candidate 
repeat 
    select parents 
    recombine pairs of parents 
    mutate the resulting offspring 
    evaluate new candidates 
    select individuals for the next generation 
until termination condition is satisfied 
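
The general scheme above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the chapter: binary encoding, binary tournament selection, one-point crossover, bit-flip mutation, elitist survivor selection, a fixed generation budget as the termination condition, and the OneMax toy fitness (number of 1 bits) are all choices made here for brevity.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Maximize `fitness` over bit strings, following the steps of Algorithm 76.1."""
    rng = random.Random(seed)
    # initialize population with random candidate solutions
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # evaluate each candidate
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):  # termination condition: fixed generation budget
        offspring = []
        while len(offspring) < pop_size:
            # select parents (binary tournament, an illustrative choice)
            p1 = max(rng.sample(range(pop_size), 2), key=lambda i: scores[i])
            p2 = max(rng.sample(range(pop_size), 2), key=lambda i: scores[i])
            # recombine pairs of parents (one-point crossover)
            cut = rng.randint(1, n_bits - 1)
            child = pop[p1][:cut] + pop[p2][cut:]
            # mutate the resulting offspring (independent bit flips)
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            offspring.append(child)
        # evaluate new candidates
        off_scores = [fitness(ind) for ind in offspring]
        # select individuals for the next generation (best of parents + offspring)
        ranked = sorted(zip(scores + off_scores, pop + offspring),
                        key=lambda pair: pair[0], reverse=True)[:pop_size]
        scores = [s for s, _ in ranked]
        pop = [ind for _, ind in ranked]

    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


# OneMax toy problem: the fitness is the number of 1 bits in the chromosome.
best_ind, best_score = run_ga(fitness=sum)
print(best_ind, best_score)
```

For controller tuning, the toy fitness would be replaced by a function that decodes the chromosome into controller parameters and measures the resulting control error.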


In this work, we have chosen a GA to evolve the IT2FC. 
However, the ideas presented here are valid for most 
evolutionary and natural computing methods. There are two 
methods to implement any evolutionary algorithm. One is 
based on executing software written in a computer language 
such as C/C++, as on a desktop computer. The second method 
is based on designing specialized hardware using an HDL. 
Both have advantages and disadvantages. The first method is 
the easier one, since there is much information about 
coding different EAs in a high-level language. However, 
this solution may have limitations for real-time systems, 
since software implementations are slower than hardware 
implementations by at least five orders of magnitude. On 
the other hand, hardware designs based on state machines 
are more complex to implement and use. In this section we 
shall present a small overview of both methods. 


Fig. 76.11 High-level view of the structure of a GA for 
FPGA implementation 


76.7.1 GA Software Based Implementations 


It is well known that a GA can run in parallel, taking 
advantage of the two known types of parallelism: data and 
control parallelism. Data parallelism refers to executing 
one process over several instances of the EA, while control 
parallelism works with separate instances. 

Coarse-grained parallelism and fine-grained parallelism 
are two methods often associated with the use of EAs in 
parallel. The use of both methods is called a hybrid 
approach. In coarse-grained parallelism, the EA cores work 
in conjunction to solve a problem: each node swaps 
individuals of its population with another node running the 
same problem. The cores can exchange individuals with each 
other to improve diversity. The amount of information, 
frequency of exchange, direction, data pattern, etc., are 
factors that can affect the efficiency of this approach. 

In fine-grained parallelism, the approach is to share 
mating partners instead of populations. The members of 
populations across the parallel cores mate their fittest 
members with the fittest found in a neighboring node's 
population. Then, the offspring of the selected individuals 
are distributed. This next generation can go to one of the 
parents' populations, to both parents' populations, or to 
all cores' populations, depending on the means of 
distribution. 

Figure 76.4 shows a six-core architecture design 
for the Virtex 5. Here, we can make fine-grained or 
coarse-grained implementations of an EA. For example, for 
a coarse-grained implementation, the island model with 
one processor per island can be used. 
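
A coarse-grained island model can be sketched as follows. This Python example runs the islands one after another in a single process, whereas in the multi-core design each island would run on its own processor. The ring migration topology, the migration interval, the genetic operators, and the OneMax toy fitness are all assumptions made for illustration.

```python
import random


def evolve(pop, fitness, rng, p_mut=0.05):
    """One generation on a single island (tournament, one-point crossover, bit flips)."""
    n_bits = len(pop[0])
    children = []
    for _ in pop:
        # Two binary tournaments pick the two parents.
        a, b = (max(rng.sample(pop, 2), key=fitness) for _ in range(2))
        cut = rng.randint(1, n_bits - 1)
        # XOR with a boolean flips the bit with probability p_mut.
        child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
        children.append(child)
    # Elitist replacement: keep the best individuals of parents + children.
    return sorted(pop + children, key=fitness, reverse=True)[:len(pop)]


def island_model(fitness, n_islands=4, pop_size=10, n_bits=16,
                 epochs=10, gens_per_epoch=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for _ in range(epochs):
        # Each island evolves independently between migrations.
        for k in range(n_islands):
            for _ in range(gens_per_epoch):
                islands[k] = evolve(islands[k], fitness, rng)
        # Migration on a ring: the best of island k replaces the worst of island k+1.
        bests = [max(isl, key=fitness) for isl in islands]
        for k in range(n_islands):
            dest = islands[(k + 1) % n_islands]
            worst = min(range(pop_size), key=lambda i: fitness(dest[i]))
            dest[worst] = list(bests[k])
    return max((ind for isl in islands for ind in isl), key=fitness)


best = island_model(fitness=sum)   # OneMax toy fitness
print(best, sum(best))
```

The migration interval (gens_per_epoch) and the number of migrants are the kind of exchange parameters that, as noted above, affect the efficiency of the approach.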


76.7.2 GA Hardware Implementations 


Figure 76.11 shows a high-level view of the architecture 
of a GA for hardware implementation. The system has eight 
basic kinds of modules: a selection module, a crossover 
module, a mutation module, a fitness evaluation module, 
a control module, an observer module, four random number 
generator (RNG) modules, and two random access memory (RAM) 
modules. 

The control module is a Mealy state machine designed to 
feed all other modules with the necessary control signals 
to synchronize the algorithm execution. The selection 
module can use any existing selection method, for example 
the roulette wheel selection algorithm. This method picks 
the genes of the parents from the current population, and 
the parents are processed to create new individuals. At the 
current generation, the crossover and mutation modules 
perform the corresponding genetic operations on the 
selected parents. The fitness evaluation module computes 
the fitness of each offspring and applies elitism to the 
population. The observer module determines the stopping 
criterion and observes its fulfilment. RNGs are 
indispensable to provide the randomness that EAs require. 
Additionally, RAM 1 is necessary to store the current 
population and RAM 2 to store the selected parents of 
each generation. 
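
Roulette wheel selection, mentioned above as one option for the selection module, can be sketched in software as follows. This is a generic fitness-proportionate selection written in Python for illustration; it is not the chapter's hardware module, and in an FPGA the random spin would come from one of the RNG modules rather than from a software generator.

```python
import random
from itertools import accumulate


def roulette_wheel_select(fitnesses, rng):
    """Return an index chosen with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0 or min(fitnesses) < 0:
        raise ValueError("fitness values must be nonnegative with a positive sum")
    spin = rng.random() * total          # position of the pointer on the wheel
    for index, cumulative in enumerate(accumulate(fitnesses)):
        if spin < cumulative:            # pointer falls inside this individual's slot
            return index
    return len(fitnesses) - 1            # guard against floating-point round-off


rng = random.Random(0)
fitnesses = [1.0, 0.0, 3.0]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[roulette_wheel_select(fitnesses, rng)] += 1
print(counts)
```

An individual with zero fitness owns a slot of zero width and is never selected, while the others are picked roughly in proportion to their fitness.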




76.8 Evolving Fuzzy Controllers 


In Sect. 76.1 the general structure of an EFRBS was 
presented. It was mentioned that the common denominator in 
most learning systems is their capability of making 
structural changes to themselves over time to improve their 
performance for defined tasks. It was also mentioned that 
the two classical approaches for fuzzy learning systems are 
the Michigan and Pittsburgh approaches, and that there 
exist newer proposals with the same target. Although 
programming a learning system in a computer using 
a high-level language, such as C/C++, requires some skill, 
system knowledge, and experimentation, there are no 
technical problems with achieving a system with such 
characteristics. This can also be true for a hardware 
implementation: if the EFRBS was developed in C/C++ and is 
executed by a hard or soft processor such as PowerPC or 
MicroBlaze, it works much as it does in a computer. How to 
develop a coprocessor was explained in Sect. 76.6. The 
coprocessor was developed in the FPGA's static (base) 
region, which cannot be changed during a partial 
reconfiguration process. Therefore, such coprocessors 
cannot undergo any structural change. Achieving an EFRBS in 
hardware is quite different from achieving it using 
a high-level language, because it is more difficult to 
change the circuitry than to modify lines of code. 


Fig. 76.12 The FPGA is divided into two regions: static and reconfigurable. 
The soft processor and peripherals are in the static region. 
Different fuzzy controller architectures are in the reconfigurable region. 
The bus macros are fixed data paths for signals going between 
a reconfigurable module and another module 

FPGAs are reprogrammable devices that need a design 
methodology to be successfully used as reconfigurable 
devices. Since there are several vendors with different 
architectures, the methodology usually changes from vendor 
to vendor and from device to device. For Xilinx FPGAs the 
configuration memory is volatile, so the device needs to be 
configured every time it is powered up by uploading the 
configuration data, known as the bitstream. Configuring an 
FPGA this way is not useful for many applications that need 
to change their behavior while they are still working 
online. A solution to overcome this limitation is to use 
partial reconfiguration, which splits the FPGA into two 
kinds of regions. The static (base) region is the portion 
of the design that does not change during partial 
reconfiguration; it may include logic that controls the 
partial reconfiguration process. The reconfigurable region 
is the portion that can be changed. In other words, partial 
reconfiguration (PR) is the ability to reconfigure selected 
areas of an FPGA any time after its initial configuration 
[76.64]. It can be divided into two groups: dynamic partial 
reconfiguration (DPR) and static partial reconfiguration 
(SPR). DPR is also known as active partial reconfiguration. 
It allows changing a part of the device while the rest of 
the FPGA is still running. DPR is used to allow the FPGA to 
adapt to changing algorithms and enhance performance, or 
for critical missions that cannot be disrupted while some 
subsystems are being redefined. On the other hand, in SPR 
the static section of the FPGA needs to be stopped, so 
autoreconfiguration is impossible (Fig. 76.12). 

For Xilinx FPGAs, there are basically three ways to 
achieve DPR for devices that support this feature. The two 
basic styles are difference-based partial reconfiguration 
and module-based partial reconfiguration. The first one can 
be used to make small changes to the design; the partial 
bitstream only contains information about the differences 
between the current design structure that resides in the 
FPGA and the new content of the FPGA. Since the bitstream 
differences are usually small, the changes can be made very 
quickly. Module-based partial reconfiguration is useful for 
reconfiguring large blocks of logic using modular design 
concepts. The third style is also based on modular design 
but is more flexible and less restrictive. This style was 
introduced by Xilinx in 2006 and is known as early access 
partial reconfiguration (EAPR) [76.65, 66]. There are two 
key differences between the EAPR design flow and the 
module-based one. (1) In the EAPR flow the shape and size 
of partially reconfigurable regions (PRRs) can be defined 
by the user. Each PRR has at least one, and usually 
multiple, partially reconfigurable modules (PRMs) that can 
be loaded into the PRR. (2) For modules that communicate 
with each other, a special bus macro allows signals to 
cross over a partial reconfiguration boundary. This is an 
important consideration, since without this feature 
intermodule communication would not be feasible, as it is 
impossible to guarantee routing between modules. The bus 
macro provides a fixed bus for inter-design communication. 
Each time partial reconfiguration is performed, the bus 
macro is used to establish unchanging routing channels 
between modules, guaranteeing correct connections [76.65]. 

An important core that enables embedded micropro- 
cessors such as MicroBlaze and PowerPC to achieve 
reconfiguration at run time is HWICAP (hardware in- 
ternal configuration access point) for the OPB. The 
HWICAP allows the processors to read and write the 
FPGA configuration memory through the ICAP (in- 
ternal configuration access point). Basically it allows 
writing and reading the configurable logic block (CLB) 
look-up table (LUT) of the FPGA. 

The process to achieve reconfigurable computing with 
application to the IT2FC will be explained in more detail 
in Sect. 76.8.2. Moreover, how to evolve an IT2FC embedded 
into an FPGA, whether it resides in the static or in the 
reconfigurable region, will also be explained therein. 


76.8.1 EAPR Flow for Changing 
the Controller Structure 


Figure 76.12 shows the basic idea of using the EAPR flow 
for reconfigurable computing to change from one IT2FC 
structure to a different one. In this figure the MicroBlaze 
soft processor can evaluate each controller structure 
according to single or multiobjective criteria. The 
processor communicates with a PR region using the bus 
macro, which provides a means of locking the routing 
between the PRM and the base design. The system can achieve 
fast reconfiguration operations since partial bitstreams 
are transferred between the FPGA and the compact flash (CF) 
memory where the bitstreams are stored. 


In general, the EAPR design flow is as fol- 
lows [76.64, 67, 68]: 


1. Hardware description language design and synthe- 
sis. The first steps in the EAPR design flow are very 
similar to the standard modular design flow. We can 
summarize this in three steps: 

(a) Top-level design. In this step, the design de- 
scription must only contain black-box instanti- 
ations of lower-level modules. Top-level design 
must contain: I/O instantiations, clock primi- 
tives instantiations, static module instantiations, 
PR module instantiations, signal declarations, 
and bus macro instantiations, since all non- 
global signals between the static design and the 
PRMs must pass through a bus macro. 

(b) Base design. Here, the static modules of the 
system contain logic that will remain constant 
during reconfiguration. This step is very simi- 
lar to the design flow explained in Sect. 76.4.2. 
However, the designer must consider input and 
output assignment rules for PR. 

(c) PRM design. Similarly to static modules, PR 
modules must not include global clock sig- 
nals either, but may use those from top-level 
modules. When designing multiple PRMs to 
take advantage of the same reconfigurable area, 
for each module, the component name and 
port configuration must match the reconfig- 
urable module instantiation of the top-level 
module. 

2. Set design constraints. In this step, we need to 
place constraints in the design for place and route 
(PAR). The constraints included are: area group, 
reconfiguration mode, timing constraint, and loca- 
tion constraints. The area group constraint specifies 
which modules in the top-level module are static 
and which are reconfigurable. Each module instanti- 
ated by the top-level module is assigned to a group. 
The reconfiguration mode constraint is only applied 
to the reconfigurable group, which specifies that the 
group is reconfigurable. Location constraints must 
be set for all pins, clocking primitives, and bus 
macros in top-level design. Bus macros must be lo- 
cated so that they straddle the boundary between the 
PR region and the base design. 

3. Implement base design. Before the implementation 
of the static modules, the top level is translated 
to ensure that the constraints file has been created 
properly. The information generated by implementing 
the base design is used for the PRM implementation 
step. Base design implementation follows three steps: 
translate, map, and PAR. 

4. Implement PRMs. Each of the PRMs must be 
implemented separately within its own directory, 
and follows the base design implementation steps, i.e., 
translate, map, and PAR. 

5. Merge. The final step in the partial reconfiguration 
flow is to merge the top level, base, and PRMs. Dur- 
ing the merge step, a complete design is built from 
the base design and each PRM. In this step, many 
partial bitstreams for each PRM and initial full bit- 
streams are created to configure the FPGA. 


Partial dynamic reconfigurable computing allows us to 
achieve online reconfiguration. By selecting a certain 
bitstream it is possible to change the full controller 
structure, or any of the stages (fuzzification, inference 
engine, type reduction, and defuzzification), as well as 
any individual section of each stage, for example, 
different membership functions for the fuzzification stage. 
However, we need to have all the reconfigurable modules 
previously synthesized because they are loaded using 
partial bitstreams. Therefore, to be able to evolve 
reconfigurable modules we need to provide them with 
a control register (CR) to change the desired parameters. 

Next, a flexible coprocessor (FlexCo) prototype of an 
IT2FC (FlexCo IT2FC) that can be implemented either in the 
static region or in the PR region is presented. 


76.8.2 Flexible Coprocessor Prototype 
of an IT2FC 


Figure 76.13 illustrates the FlexCo IT2FC, which contains 
the four stages (fuzzification, inference engine, type 
reduction, and defuzzification). Depending on the target 
region, they are connected to the PLB or to the OPB through 
a 32-bit control register (CR), which is formed by four 
8-bit registers named R1 to R4 (Fig. 76.14). The parameters 
of each stage can be changed by the programmer since they 
are no longer static, as they were previously for the FT2KM 
(Sect. 76.5). Now, the parameter values are saved in 
volatile registers connected through signals. The processor 
(MicroBlaze) can send two kinds of commands to the CR 
through the PLB or the OPB: control words (CWs) and data 
words (DWs). The state machine of the FlexCo IT2FC 
interprets the command. 

Figure 76.14 illustrates the CR coding for static and 
reconfigurable FCs. This register is used to perform 
parameter modification in both modes, static and 
reconfigurable. In general, bit 7 of R4 is used to 
differentiate between a CW and a DW: 1 means a CW, whereas 
0 means a DW. The StaGe bits (SG-bits) serve to identify 
the IT2FC stage that is to be modified. 


Fig. 76.13 Flexible coprocessor proposal of an IT2FC for 
the static region 


Fig. 76.14 The control register is used for both styles of 
implementation, in the static region or in the 
reconfigurable region. Fields shown in the figure: 
R4 (bits 31 to 24): 
    Control bit: 1 = control word, 0 = data word 
    Ant/Con: 1 = input MF, 0 = output MF 
    S-bit: 1 = upper section, 0 = lower section 
    LVT/Act: 1 = linguistic variable/activate, 0 = linguistic term/deactivate 
    DC/AD bit: 1 = default method, 0 = change method 
    SG-bits (states 0 to 4): 00 = fuzzification, 01 = inference engine, 
    10 = type reduction, 11 = defuzzification 
R3 (bits 23 to 16): number of the LV, LT, type reducer, or defuzzifier 
R2 (bits 15 to 8) and R1 (bits 7 to 0): parameter value or rule coding 


Fig. 76.15 In the static region of the FPGA there is a multiprocessor 
system (MPS) with an operating system. The GA resides 
in the program memory and is executed by the MPS. The 
IT2FC may be implemented in the reconfigurable region 
(Fig. 76.16) or in the static region (Fig. 76.13) 


• SG-bits = 00: The fuzzification stage has been chosen; 
it is then necessary to set the Ant/Con bit to 1 to 
indicate that the antecedent MFs are going to be modified. 
With the section bit (S-bit) we indicate which part of the 
FOU (upper or lower) will be modified. The 
linguistic-variable-term/active bit (LVT/Active) indicates 
whether we want to modify a linguistic variable (LV) or 
a linguistic term (LT); the Active option is for the 
inference engine (IE). In accordance with the LVT/Active 
bit value, in register R3 we set the number of the LV or 
the LT that will be changed. Finally, registers R1 and R2 
give the parameter value of the LV or the LT; R1 is the 
least significant byte. 

• SG-bits = 01: With this setting, the state machine 
identifies that the IE will be modified. It works in 
conjunction with Ant/Con, S-bit, and the registers R1, R2, 
and R3. Set a 0 value in the Ant/Con bit to change the 
consequent parameters of a Mamdani inference system, in 
S-bit choose the upper or lower MF, using R3 indicate the 
number of the MF, and with R1 and R2 set the corresponding 
value. For a static implementation, it is possible to 
activate and deactivate rules using the bit LVT/Active. 
With the dynamic change/activate-deactivate (DC/AD) bit, it 
is possible to change the combination of antecedents and 
consequents of a specific rule, provided that we have made 
this part flexible by using registers. For an 
implementation in the reconfigurable region, it is possible 
to add or remove rules. These two features need to work in 
conjunction with registers R1, R2, and R3. 

• SG-bits = 10: This selection is to modify the type 
reduction stage. It is possible to have more than one type 
reducer. By setting the DC/AD bit to 1, we indicate that we 
wish to change the method at run time without the need for 
a reconfiguration process that implies uploading partial 
bitstreams. The methods can be selected using register R3. 
By using a DC/AD bit equal to 0 and LVT/Act equal to 0, in 
combination with registers R1 to R3, we can indicate that 
we wish to change the preloaded values that the KM 
algorithm needs to achieve the TR. 


• SG-bits = 11: Similarly to the type reduction stage, 
we can change the defuzzifier at run time. 


Fig. 76.16 Flexible coprocessor proposal of an IT2FC for 
the reconfigurable region 
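
The control-word fields described in the list above can be illustrated with a small Python encoder and decoder for the 32-bit CR. Only three facts are taken from the text: bit 7 of R4 distinguishes a CW from a DW, the SG-bits use the codes 00 to 11, and R1 is the least significant byte. Every other bit position inside R4 is an assumption made for this sketch, as are the function and key names.

```python
# Stage codes taken from the SG-bits description in the text.
STAGES = {"fuzzification": 0b00, "inference": 0b01,
          "type_reduction": 0b10, "defuzzification": 0b11}

# ASSUMED bit positions inside R4. Only the control bit (bit 7) is stated in
# the text; the other positions are hypothetical and chosen for illustration.
CONTROL_BIT, ANT_CON_BIT, S_BIT, LVT_ACT_BIT, DC_AD_BIT = 7, 5, 4, 3, 2


def encode_cr(stage, control=1, ant_con=0, upper=0, lvt_act=0, dc_ad=0,
              number=0, value=0):
    """Pack one 32-bit CR word as R4 | R3 | R2 | R1 (R1 = least significant byte)."""
    if not 0 <= number <= 0xFF or not 0 <= value <= 0xFFFF:
        raise ValueError("number must fit in 8 bits and value in 16 bits")
    if any(flag not in (0, 1) for flag in (control, ant_con, upper, lvt_act, dc_ad)):
        raise ValueError("flag fields must be 0 or 1")
    r4 = ((control << CONTROL_BIT) | (ant_con << ANT_CON_BIT) | (upper << S_BIT)
          | (lvt_act << LVT_ACT_BIT) | (dc_ad << DC_AD_BIT) | STAGES[stage])
    return (r4 << 24) | (number << 16) | value


def decode_cr(word):
    """Split a CR word back into its registers and the two fields fixed by the text."""
    r4 = (word >> 24) & 0xFF
    return {"control": (r4 >> CONTROL_BIT) & 1,   # 1 = control word, 0 = data word
            "stage": r4 & 0b11,                   # SG-bits
            "R3": (word >> 16) & 0xFF,            # number of the LV, LT, or method
            "R2": (word >> 8) & 0xFF,             # parameter value, high byte
            "R1": word & 0xFF}                    # parameter value, low byte


# Example: modify the upper antecedent MF of linguistic term number 3,
# giving it the (arbitrary) parameter value 0x01F4.
word = encode_cr("fuzzification", ant_con=1, upper=1, lvt_act=0,
                 number=3, value=0x01F4)
print(hex(word))
print(decode_cr(word))
```

A driver for the FlexCo IT2FC would write such words to the CR over the PLB or the OPB, and the coprocessor's state machine would interpret them.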


With respect to the type reduction and defuzzification 
stages, we give the option to have more than one module, 
which has the advantage of making the process easier and 
possible for static designs, but the disadvantage is that 
the design will consume more macrocells, increasing the 
cost of the required FPGAs, boards, and power consumption. 
Next, we will explain the implementation of the FlexCo 
IT2FC for the static region and the reconfigurable region. 


Fig. 76.17 This design may be implemented in both regions to have 
a dynamic reconfigurable system. For a static implementation, the 
system must have registers for all the variable parameters to make 
it possible to change their values (Fig. 76.13) 


Implementing the FlexCo IT2FC on the Static Region 
The IT2FC is connected to the PLB. Although the 
controller structure is static, this system can be evolved 
for tuning and learning because it is possible to make 
parametric modifications to all the IT2FC stages. 
Figure 76.13 shows the architecture of this system and 
Fig. 76.15 a conceptual model of the possible 
implementation. 


Implementing the FlexCo IT2FC on the PR
Figure 76.16 illustrates a more flexible architecture for
the FlexCo IT2FC. The IT2FC is implemented in the
reconfigurable region, using a partially reconfigurable region
(PRR) for each stage. This is convenient because each region
can have multiple modules that can be swapped in and out of
the device on the fly. This is the recommended method for
achieving an evolving IT2FC because it is more flexible. One
disadvantage is that at run time it is slower than the static
implementation because more logic circuits are incorporated.

Figure 76.17 shows an evolving standalone system; as
mentioned, the IT2FC and the GA can be placed in either the
static or the reconfigurable region.


76.8.3 Conclusion and Further Reading 


FPGAs combine the best parts of ASICs and processor-based
systems, since they do not require high volumes to justify
a custom design. Moreover, they provide the flexibility of
software running on a processor-based system without being
limited by the number of cores available. They are one of the
best options for parallelizing a system since they are parallel
in nature. A typical whole T2 inference, computed on an
industrial computer equipped with a quad-core processor, lasts
about $18 \times 10^{-3}$ s. A whole IT2FC (fuzzification,
inference, KM type reducer, and defuzzification) implemented
in an FPGA lasts only four clock cycles, which for a Spartan
implementation using a 50 MHz clock represents
$80 \times 10^{-9}$ s, and for a Virtex 5 FPGA-based system
represents $40 \times 10^{-9}$ s. For the Spartan family the
typical implementation speedup is 225 000, whereas for the
Virtex 5 it is 450 000. Using a pipeline architecture, the whole
IT2 process can be completed in just one clock cycle, so using
the same criteria for comparison, the speedup is 90 000 for
Spartan and 2 400 000 for Virtex. Reported speedups of GAs
implemented in an FPGA are at least 5 times higher than in
a computer system. For all these reasons, FPGAs are suitable
devices for embedding evolving fuzzy logic controllers,
especially the IT2FC, since they are computationally expensive.
There are some drawbacks to this technology, mostly the need
for a highly experienced development team because of its
implementation complexity. Achieving an evolving intelligent
system using reconfigurable computing is not as direct as it
is using a computer system. It requires knowledge of FPGA
architectures, VHDL coding, soft processor implementation,
the development of coprocessors, high-level languages, and
the basics of reconfigurable computing. Therefore, people
interested in achieving such implementations require expertise
in the above fields, and further reading should focus on these
topics: FPGA vendor manuals and white papers, as well
as papers and books on reconfigurable computing.
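The four-clock-cycle speedup figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch: it assumes the quad-core inference time is 18 ms (the value consistent with the quoted speedups), and the names `CPU_TIME_S`, `CYCLES`, and `speedup` are illustrative, not taken from the chapter. The Virtex 5 clock rate is not stated in the text; the value computed here is merely what 40 ns for four cycles implies.

```python
# Sanity check of the four-cycle speedups quoted in the text.
CPU_TIME_S = 18e-3   # assumed quad-core time for one whole T2 inference
CYCLES = 4           # clock cycles for one whole IT2FC evaluation


def speedup(fpga_time_s):
    """Ratio of the CPU inference time to the FPGA inference time."""
    return CPU_TIME_S / fpga_time_s


spartan_time_s = CYCLES / 50e6   # four cycles of a 50 MHz clock (80 ns)
virtex_time_s = 40e-9            # quoted time for the Virtex 5 system

spartan_speedup = round(speedup(spartan_time_s))
virtex_speedup = round(speedup(virtex_time_s))

# Clock rate implied by four cycles in 40 ns (an inference, not a quoted value).
implied_virtex_clock_hz = CYCLES / virtex_time_s

print(spartan_speedup, virtex_speedup, implied_virtex_clock_hz)
```

Both rounded ratios agree with the 225 000 and 450 000 given in the text for the non-pipelined case.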




77. Multiobjective Genetic Fuzzy Systems 


Hisao Ishibuchi, Yusuke Nojima 


This chapter explains evolutionary multiobjective 
design of fuzzy rule-based systems in comparison 
with single-objective design. Evolutionary algo- 
rithms have been used in many studies on fuzzy 
system design for rule generation, rule selection, 
input selection, fuzzy partition, and membership 
function tuning. Those studies are referred to as 
genetic fuzzy systems because genetic algorithms 
have been mainly used as evolutionary algorithms. 
In many studies on genetic fuzzy systems, the ac- 
curacy of fuzzy rule-based systems is maximized. 
However, accuracy maximization often leads to the 
deterioration in the interpretability of fuzzy rule- 
based systems due to the increase in their com- 
plexity. Thus, multiobjective genetic algorithms 
were used in some studies to maximize not only 
the accuracy of fuzzy rule-based systems but also 
their interpretability. Those studies, which can be 
viewed as a subset of genetic fuzzy system stud- 
ies, are referred to as multiobjective genetic fuzzy 
systems (MoGFS). A number of fuzzy rule-based 
systems with different complexities are obtained 
along the interpretability-accuracy tradeoff curve.
One extreme of the tradeoff curve is a simple highly 
interpretable fuzzy rule-based system with low 
accuracy while the other extreme is a complicated 
highly accurate one with low interpretability. In 
MoGFS, multiple accuracy measures such as a true 
positive rate and a true negative rate can be si- 
multaneously used as separate objectives. Multiple 
interpretability measures can also be simultane- 
ously used in MoGFS. 


77.1 Fuzzy System Design
77.2 Accuracy Maximization
    77.2.1 Types of Fuzzy Rules
    77.2.2 Types of Fuzzy Partitions
    77.2.3 Handling of High-Dimensional Problems with Many Input Variables
    77.2.4 Hybrid Approaches with Neural Networks and Genetic Algorithms
77.3 Complexity Minimization
    77.3.1 Decreasing the Number of Fuzzy Rules
    77.3.2 Decreasing the Number of Antecedent Conditions
    77.3.3 Other Interpretability Improvement Approaches
77.4 Single-Objective Approaches
    77.4.1 Use of Scalarizing Functions
    77.4.2 Handling of Objectives as Constraint Conditions
    77.4.3 Minimization of the Distance to the Reference Point
77.5 Evolutionary Multiobjective Approaches
    77.5.1 Basic Idea of Evolutionary Multiobjective Approaches
    77.5.2 Various Evolutionary Multiobjective Approaches
    77.5.3 Future Research Directions
77.6 Conclusion
References


77.1 Fuzzy System Design


A fuzzy rule-based system is a set of fuzzy rules, which
has been successfully used as a nonlinear controller in
various real-world applications. The basic structure of
fuzzy rules for multi-input and single-output fuzzy
control can be written as follows [77.1-3]


Rule $R_q$: If $x_1$ is $A_{q1}$ and ... and $x_n$ is $A_{qn}$
then $y$ is $B_q$ ,   (77.1)


where $q$ is a rule index, $R_q$ is the label of the $q$th rule, $n$
is the number of input variables, $x_i$ is the $i$th input
variable ($i = 1, 2, \ldots, n$), $A_{qi}$ is an antecedent fuzzy set for
the $i$th input variable $x_i$, $y$ is an output variable, and $B_q$
is a consequent fuzzy set for the output variable $y$.
The antecedent and consequent fuzzy sets $A_{qi}$ and $B_q$
are specified by their membership functions $\mu_{A_{qi}}(x_i)$
and $\mu_{B_q}(y)$, respectively. Examples of antecedent fuzzy
sets are shown in Fig. 77.1, where the domain interval
[0, 1] of the input variable $x_i$ is partitioned into
three fuzzy sets small, medium, and large with triangular
membership functions.
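A partition of this kind can be sketched in a few lines of Python. This is an illustrative sketch only: the exact shapes in Fig. 77.1 are not recoverable here, so the peak positions 0, 0.5, and 1.0 and the names `triangular`, `FUZZY_SETS`, and `memberships` are assumptions.

```python
def triangular(x, left, peak, right):
    """Membership value of x in a triangular fuzzy set with feet at
    `left` and `right` and full membership at `peak`."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed uniform partition of [0, 1]; the outer feet lie outside the
# domain so that small peaks at 0 and large peaks at 1.
FUZZY_SETS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}


def memberships(x):
    """Membership value of x in each of the three linguistic values."""
    return {name: triangular(x, *abc) for name, abc in FUZZY_SETS.items()}


print(memberships(0.25))
```

With this assumed partition, an input of 0.25 belongs equally to small and medium and not at all to large, which is the kind of graded overlap a linguistic value is meant to capture.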

Fuzzy rules of the form in (77.1) are based on
the concept of linguistic variables introduced by Zadeh
[77.4-6]. According to Zadeh [77.4-6], a fuzzy set with
a linguistic meaning such as small and large is referred
to as a linguistic value, while a variable with
linguistic values is called a linguistic variable. For
example, in Fig. 77.1, the three fuzzy sets are linguistic
values while $x_i$ is a linguistic variable. In our daily
life, we almost always use linguistic variables and lin- 
guistic values. When we say your car is fast but my 
car is slow, the speed of cars is a linguistic variable 
while fast and slow are linguistic values. When we 
say it is hot today, the temperature is a linguistic vari- 
able while hot is a linguistic value. Of course, we 
use those linguistic values without explicitly specify- 
ing their meanings by membership functions. However, 
we have our own vague definitions of those linguis- 
tic values, which may be approximately represented by 
membership functions. 

Fig. 77.1 Three antecedent fuzzy sets small, medium, and
large (horizontal axis: input variable $x_i$ from 0 to 1.0;
vertical axis: membership value up to 1.0)


The main advantage of fuzzy rule-based systems
over other nonlinear models such as multilayer feedforward
neural networks is their linguistic interpretability.
In Fig. 77.2, we show a two-input and single-output
fuzzy rule-based system with the following nine fuzzy
rules


Rule $R_1$: If $x_1$ is small and $x_2$ is small then $y$ is medium ,
Rule $R_2$: If $x_1$ is small and $x_2$ is medium then $y$ is small ,
Rule $R_3$: If $x_1$ is small and $x_2$ is large then $y$ is medium ,
Rule $R_4$: If $x_1$ is medium and $x_2$ is small then $y$ is small ,
Rule $R_5$: If $x_1$ is medium and $x_2$ is medium then $y$ is medium ,
Rule $R_6$: If $x_1$ is medium and $x_2$ is large then $y$ is large ,
Rule $R_7$: If $x_1$ is large and $x_2$ is small then $y$ is medium ,
Rule $R_8$: If $x_1$ is large and $x_2$ is medium then $y$ is large ,
Rule $R_9$: If $x_1$ is large and $x_2$ is large then $y$ is medium .


A linguistic value in each cell in Fig. 77.2 shows the
consequent fuzzy set of the corresponding fuzzy rule.
For example, medium in the bottom-right cell shows
the consequent fuzzy set of the fuzzy rule $R_7$ with the
antecedent fuzzy sets large for $x_1$ and small for $x_2$. Let
us assume that the consequent fuzzy sets in Fig. 77.2
are defined by the triangular membership functions in
Fig. 77.1. Then, we can roughly understand the shape
of the two-input and single-output nonlinear function
represented by the fuzzy rule-based system in Fig. 77.2
(even when we do not know anything about fuzzy
reasoning).


Fig. 77.2 A two-input and single-output fuzzy rule-based
system with nine fuzzy rules (horizontal axis: input variable
$x_1$; vertical axis: input variable $x_2$; each from 0 to 1.0)
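To see roughly what function the nine rules describe, the rule table can be evaluated numerically. The sketch below is not the reasoning method of the chapter, which does not specify one at this point: it assumes product-based rule activation and a weighted average of the consequent peak values (0, 0.5, and 1.0 for small, medium, and large), and all names (`tri`, `SETS`, `PEAK`, `RULES`, `infer`) are illustrative.

```python
def tri(x, left, peak, right):
    """Triangular membership function."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed partition of [0, 1] (peaks at 0, 0.5, and 1.0).
SETS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}
PEAK = {"small": 0.0, "medium": 0.5, "large": 1.0}

# (antecedent for x1, antecedent for x2) -> consequent, rules R1 to R9.
RULES = {
    ("small", "small"): "medium",
    ("small", "medium"): "small",
    ("small", "large"): "medium",
    ("medium", "small"): "small",
    ("medium", "medium"): "medium",
    ("medium", "large"): "large",
    ("large", "small"): "medium",
    ("large", "medium"): "large",
    ("large", "large"): "medium",
}


def infer(x1, x2):
    """Weighted average of consequent peaks (an assumed, simplified scheme)."""
    num = den = 0.0
    for (a1, a2), out in RULES.items():
        w = tri(x1, *SETS[a1]) * tri(x2, *SETS[a2])
        num += w * PEAK[out]
        den += w
    return num / den  # den is 1 on [0, 1] x [0, 1] for this partition


print(infer(1.0, 0.0), infer(0.5, 1.0), infer(0.3, 0.6))
```

At the cell centers the output equals the peak of the corresponding consequent, for example 0.5 (medium) at $x_1 = 1$, $x_2 = 0$, which is rule $R_7$; between the centers the output is interpolated.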

It is easy to linguistically understand the input-output
relation of the fuzzy rule-based system in
Fig. 77.2. That is, the fuzzy rule-based system in
Fig. 77.2 has high interpretability. However, it is
difficult to approximate a complicated highly nonlinear
function by such a simple 3 × 3 fuzzy rule-based system.
More membership functions for the input and output
variables may be needed to improve the accuracy of
fuzzy rule-based systems. The tuning of each membership
function may also be needed. Theoretically, fuzzy
rule-based systems are universal approximators of
nonlinear functions. This property has been shown for
fuzzy rule-based systems [77.7-9] and for multilayer
feedforward neural networks [77.10-12]. This means that
fuzzy rule-based systems, like neural networks, have
a high ability to approximate nonlinear functions.

In Fig. 77.3, we show an example of a tuned
7 × 7 fuzzy partition of the two-dimensional input
space [0, 1] × [0, 1]. We can design a much more
accurate fuzzy rule-based system by using such a tuned
fuzzy partition than the simple 3 × 3 fuzzy partition in
Fig. 77.2. That is, we can say that Fig. 77.3 is a better
fuzzy partition than Fig. 77.2 with respect to the
accuracy of fuzzy rule-based systems. However, it is very
difficult to linguistically interpret each antecedent fuzzy
set in Fig. 77.3. In other words, it is very difficult to
assign an appropriate linguistic value such as small and
large to each antecedent fuzzy set in Fig. 77.3. Thus,
we can say that the fuzzy partition in Fig. 77.3 does not
have high linguistic interpretability. That is, Fig. 77.2 is
a better fuzzy partition than Fig. 77.3 with respect to the
linguistic interpretability of fuzzy rule-based systems.
As shown by the comparison between the two fuzzy
partitions in Figs. 77.2 and 77.3, accuracy maximization
usually conflicts with interpretability maximization
in the design of fuzzy rule-based systems.


Fig. 77.3 A tuned 7 × 7 fuzzy partition (horizontal axis: input
variable $x_1$; vertical axis: input variable $x_2$)

Let us denote a fuzzy rule-based system by $S$. The
fuzzy rule-based system $S$ is a set of fuzzy rules. In
fuzzy system design, the accuracy of $S$ is maximized.
The accuracy maximization of $S$ is usually formulated
as the following error minimization


Minimize $f(S) = \mathrm{Error}(S)$ ,   (77.2)


where $f(S)$ is an objective function to be minimized,
and $\mathrm{Error}(S)$ is an error measure.

As shown in Fig. 77.3, the accuracy maximization
often leads to a complicated fuzzy rule-based system
with low interpretability. Thus, a complexity measure
is combined into the objective function in (77.2) as
follows [77.13, 14]


Minimize $f(S) = w_1 \, \mathrm{Complexity}(S) + w_2 \, \mathrm{Error}(S)$ ,   (77.3)


where $w_1$ and $w_2$ are nonnegative weights, and
$\mathrm{Complexity}(S)$ is a complexity measure.
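The effect of the weights in (77.3) can be illustrated with a small sketch. Everything numeric here is hypothetical: the three candidate systems, their error values, and the use of the number of rules as the complexity measure are assumptions made only to show that different weight choices select different systems.

```python
def weighted_sum(complexity, error, w1, w2):
    """Scalarized objective f(S) = w1 * Complexity(S) + w2 * Error(S)."""
    return w1 * complexity + w2 * error


# Hypothetical candidates: name -> (number of rules, error rate).
systems = {
    "3x3": (9, 0.30),
    "5x5": (25, 0.12),
    "7x7": (49, 0.05),
}


def best(w1, w2):
    """Name of the candidate minimizing the weighted sum."""
    return min(systems, key=lambda name: weighted_sum(*systems[name], w1, w2))


print(best(0.0, 1.0))    # error only
print(best(0.01, 1.0))   # small penalty on complexity
print(best(1.0, 1.0))    # complexity dominates
```

A single run of a single-objective optimizer therefore returns one system per weight vector, and the result depends strongly on how the weights are chosen.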

In the late 1990s, the idea of multiobjective fuzzy
system design [77.15] was proposed, where the accuracy
maximization and the complexity minimization were
handled as separate objectives


Minimize $f_1(S) = \mathrm{Complexity}(S)$ and $f_2(S) = \mathrm{Error}(S)$ ,   (77.4)


where $f_1(S)$ and $f_2(S)$ are separate objectives to be
minimized.

The two-objective optimization problem in (77.4)
does not have a single optimal solution that simultaneously
optimizes the two objectives $f_1(S)$ and $f_2(S)$.
This is because the error minimization increases the
complexity of fuzzy rule-based systems (i.e., the
optimization of $f_2(S)$ deteriorates $f_1(S)$). That is, the two
objectives $f_1(S)$ and $f_2(S)$ in (77.4) are conflicting with
each other. In general, a multiobjective optimization
problem has a number of nondominated solutions with
different tradeoffs among the conflicting objectives.
Those solutions are referred to as Pareto optimal
solutions. The two-objective optimization problem in (77.4)
has a number of nondominated fuzzy rule-based systems
with different complexities (Figs. 77.2 and 77.3).
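The notion of nondominated solutions for the two minimization objectives in (77.4) can be made concrete with a short sketch. The candidate values below are hypothetical (complexity, error) pairs, and the function names `dominates` and `nondominated` are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated(points):
    """Points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (Complexity(S), Error(S)) values of five candidate systems.
candidates = [(9, 0.30), (49, 0.05), (25, 0.12), (30, 0.20), (49, 0.08)]

print(sorted(nondominated(candidates)))
```

Here (30, 0.20) is dominated by (25, 0.12) and (49, 0.08) by (49, 0.05); the remaining three systems are mutually incomparable and form the tradeoff set.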

In Fig. 77.4, we illustrate the concept of the
complexity-accuracy tradeoff in the design of fuzzy rule-based
systems. The horizontal axis of Fig. 77.4 shows the
values of the complexity measure (i.e., Complexity(S)),
while the vertical axis shows the values of the error
measure (i.e., Error(S)). Around the top-left corner
of Fig. 77.4, we have simple fuzzy rule-based systems
with high interpretability and low accuracy (e.g.,
a simple 3 × 3 fuzzy rule-based system in Fig. 77.2).
The improvement in their accuracy increases their
complexity. By minimizing the error measure Error(S), we
have complicated fuzzy rule-based systems with high
accuracy and low interpretability around the bottom-right
corner of Fig. 77.4 (e.g., a tuned 7 × 7 fuzzy
rule-based system in Fig. 77.3). In Fig. 77.4, we have
many nondominated fuzzy rule-based systems along the
complexity-accuracy tradeoff curve. It should be noted
that there exist no fuzzy rule-based systems around the
bottom-left corner (i.e., no ideal fuzzy rule-based
systems with high accuracy and high interpretability). This
is because the two objectives in Fig. 77.4 are conflicting
with each other.


Fig. 77.4 Nondominated fuzzy rule-based systems with
different complexity-accuracy tradeoffs (horizontal axis:
Complexity(S), from simple to complicated; vertical axis:
Error(S); an interpretable fuzzy system lies near the top
left and an accurate fuzzy system near the bottom right)


Since the late 1990s, a number of multiobjective
approaches have been proposed for fuzzy system
design [77.16-19]. In this chapter, we explain the basic
idea of multiobjective fuzzy system design using
multiobjective evolutionary algorithms [77.20-22]. Whereas
we started with fuzzy rules for fuzzy control in (77.1),
our explanations in this chapter are mainly about
multiobjective design of fuzzy rule-based systems for
pattern classification. This is because early multiobjective
approaches were mainly proposed for pattern
classification problems.


77.2 Accuracy Maximization


In this section, we briefly explain various approaches
proposed for improving the accuracy of fuzzy rule-based
systems. Those approaches often deteriorate the
interpretability.


77.2.1 Types of Fuzzy Rules


Fuzzy rules of the form in (77.1) have been successfully
used in fuzzy controllers since Mamdani's pioneering
work in the 1970s [77.23, 24]. Those fuzzy rules have
often been called Mamdani-type fuzzy rules or Mamdani
fuzzy rules. A heuristic method for generating such
fuzzy rules from numerical data was proposed by Wang
and Mendel [77.25], and it has been used for function
approximation.


A well-known idea for improving the approximation
ability of fuzzy rules in (77.1) is the use of a linear
function instead of a linguistic value in the consequent
part


Rule $R_q$: If $x_1$ is $A_{q1}$ and ... and $x_n$ is $A_{qn}$
then $y = b_{q0} + b_{q1} x_1 + b_{q2} x_2 + \cdots + b_{qn} x_n$ ,   (77.5)


where $b_{qi}$ is a real number coefficient ($i = 0, 1, \ldots, n$).
Fuzzy rules of this type were proposed by Takagi and
Sugeno [77.26]. A fuzzy rule-based system with fuzzy
rules in (77.5) is referred to as a Takagi-Sugeno model.
The use of a linear function instead of a linguistic value
in the consequent part of fuzzy rules clearly increases
the accuracy of fuzzy rule-based systems. However, it
degrades their interpretability.
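A minimal sketch of how rules of the form (77.5) produce an output is given below. The section itself does not state the reasoning formula; the sketch uses the usual Takagi-Sugeno scheme, in which each rule's linear consequent is weighted by the compatibility of the input with its antecedent. The two rules, their coefficients, the two-set partition, and the product operator for the antecedent are all assumptions chosen for illustration.

```python
def tri(x, left, peak, right):
    """Triangular membership function."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed two-set partition of [0, 1] for every input variable.
SETS = {"small": (-1.0, 0.0, 1.0), "large": (0.0, 1.0, 2.0)}

# Hypothetical rules: (antecedent labels, coefficients b_q0, b_q1, ..., b_qn).
RULES = [
    (("small", "small"), (0.0, 1.0, 1.0)),
    (("large", "large"), (1.0, -0.5, 0.5)),
]


def ts_output(x):
    """Compatibility-weighted average of the linear consequents."""
    num = den = 0.0
    for labels, b in RULES:
        w = 1.0
        for xi, label in zip(x, labels):
            w *= tri(xi, *SETS[label])          # product-based compatibility
        y_q = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
        num += w * y_q
        den += w
    return num / den if den > 0 else None       # None: no rule is activated


print(ts_output((0.0, 0.0)), ts_output((1.0, 1.0)), ts_output((0.5, 0.5)))
```

Each rule now carries $n + 1$ real coefficients instead of one linguistic label, which is where the gain in approximation ability and the loss in linguistic interpretability both come from.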

The following simplified version of fuzzy rules in
Takagi-Sugeno models has also been used


Rule $R_q$: If $x_1$ is $A_{q1}$ and ... and $x_n$ is $A_{qn}$
then $y = b_q$ ,   (77.6)


where $b_q$ is a consequent real number. It is easy to
tune the consequent real number of each fuzzy rule.
This is the main advantage of simplified fuzzy rules
in (77.6). Thus, simplified fuzzy rules have often been
used in trainable fuzzy rule-based systems called
neuro-fuzzy systems [77.27-29]. In those studies, antecedent
fuzzy sets as well as consequent real numbers are
adjusted in the same manner as the learning of neural
networks.

Due to their simple structure, simplified fuzzy 
rules in (77.6) may have higher interpretability than 
Takagi-Sugeno fuzzy rules in (77.5). However, it is
usually difficult to linguistically interpret a conse- 
quent real number. Thus, the linguistic interpretability 
of simplified fuzzy rules in (77.6) is usually viewed 
as being limited if compared with Mamdani fuzzy 
rules with a linguistic value in their consequent part 
in (77.1). 

For pattern classification problems, three types of
fuzzy rules have been used in the literature [77.30]. The
simplest structure of fuzzy rules for pattern
classification problems is as follows


Rule $R_q$: If $x_1$ is $A_{q1}$ and ... and $x_n$ is $A_{qn}$
then Class $C_q$ ,   (77.7)


where $C_q$ is a consequent class.

The compatibility grade of an input pattern
$x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ with the antecedent part of the fuzzy
rule $R_q$ in (77.7) is usually calculated by the minimum
or product operator. In this chapter, we use the following
product operator


$\mu_{A_q}(x_p) = \mu_{A_{q1}}(x_{p1}) \cdot \mu_{A_{q2}}(x_{p2}) \cdot \ldots \cdot \mu_{A_{qn}}(x_{pn})$ ,   (77.8)


where $A_q = (A_{q1}, A_{q2}, \ldots, A_{qn})$ is an antecedent fuzzy
set vector, and $\mu_{A_q}(x_p)$ shows the compatibility of $x_p$
with the antecedent fuzzy set vector $A_q$.

Let S be a set of fuzzy rules of the form in
(77.7). The rule set S can be viewed as a fuzzy
rule-based classifier. When an input pattern $x_p =
(x_{p1}, x_{p2}, \ldots, x_{pn})$ is presented to S, $x_p$ is classified
by a single winner rule with the maximum compatibility.
Such a single winner-based fuzzy reasoning
method has been frequently used in fuzzy rule-based
classifiers.
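
The product-based compatibility in (77.8) and the single winner-based reasoning can be sketched as follows. The triangular membership functions, their parameters, and the two example rules are hypothetical and serve only as an illustration.

```python
from math import prod


def triangular(center, width):
    """Return a triangular membership function peaking at `center`."""
    def mu(x):
        return max(0.0, 1.0 - abs(x - center) / width)
    return mu


# Hypothetical linguistic values on [0, 1] (shapes are an assumption).
SMALL = triangular(0.0, 0.5)
LARGE = triangular(1.0, 0.5)


def compatibility(antecedent, pattern):
    """Product of the membership grades of all attribute values."""
    return prod(mu(x) for mu, x in zip(antecedent, pattern))


def classify_single_winner(rules, pattern):
    """Class of the rule with maximum compatibility, or None if no rule fires."""
    best_class, best_grade = None, 0.0
    for antecedent, consequent_class in rules:
        grade = compatibility(antecedent, pattern)
        if grade > best_grade:
            best_class, best_grade = consequent_class, grade
    return best_class


rules = [((SMALL, SMALL), 1), ((LARGE, LARGE), 2)]
print(classify_single_winner(rules, (0.2, 0.3)))
print(classify_single_winner(rules, (0.9, 0.8)))
```

A pattern that is compatible with no rule is left unclassified here (the function returns `None`); how such patterns are treated is a design choice not covered by this sketch.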

Let us assume that we have nine fuzzy rules in 
Fig. 77.5 for a pattern classification problem with the 
two-dimensional pattern space [0, 1] x [0, 1]. A different 
consequent class is assigned to each rule in Fig. 77.5 
for explanation purposes. The grid lines in the pattern 
space in Fig. 77.5 show the classification boundary be- 
tween different classes when we use the single winner- 
based fuzzy reasoning method together with the product 
operator-based compatibility calculation. It should be
noted that the classification boundary by the nine fuzzy
rules in Fig. 77.5 can also be generated by nine
non-fuzzy rules with interval antecedent conditions [77.31, 32].

The second type of fuzzy rules for pattern classifi- 
cation problems has a rule weight [77.30] 


Rule $R_q$: If $x_1$ is $A_{q1}$ and ... and $x_n$ is $A_{qn}$
then Class $C_q$ with $CF_q$ ,   (77.9)


where $CF_q$ is a real number in the unit interval [0, 1],
which is called a rule weight or a certainty fac- 
tor. This type of fuzzy rules has been used in many 
studies on fuzzy rule-based classifiers since the early 
1990s [77.33, 34]. 


Fig. 77.5 A fuzzy rule-based classifier with nine fuzzy
classification rules (axes: input variables x1 and x2, each from 0 to 1.0)




1484 Part G | Hybrid Systems 




Fig. 77.6a-d Classification boundaries generated by assigning a different rule weight to each of the nine fuzzy rules in
Fig. 77.5. In each plot, the default setting of $CF_q$ is 1.0 (e.g., $CF_q = 1.0$ for $q = 4, 5, \ldots, 9$ in (b))


When an input pattern $x_p$ is presented to a fuzzy
rule-based classifier with fuzzy rules of the form in
(77.9), a single winner rule is determined using the
product of the compatibility $\mu_{A_q}(x_p)$ of $x_p$ with each
rule $R_q$ and its rule weight $CF_q$: $\mu_{A_q}(x_p) \cdot CF_q$.
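
The effect of a rule weight on the winner can be seen in a one-dimensional sketch. The two membership functions and the weight 0.25 are hypothetical; only the rule "compatibility times rule weight" is taken from the text.

```python
def small(x):
    """Hypothetical membership function of 'small' on [0, 1]."""
    return 1.0 - x


def large(x):
    """Hypothetical membership function of 'large' on [0, 1]."""
    return x


def winner_class(rules, x):
    """Class of the rule maximizing compatibility * rule weight."""
    best_class, best_score = None, 0.0
    for membership, rule_class, cf in rules:
        score = membership(x) * cf
        if score > best_score:
            best_class, best_score = rule_class, score
    return best_class


unweighted = [(small, 1, 1.0), (large, 2, 1.0)]
weighted = [(small, 1, 1.0), (large, 2, 0.25)]

print(winner_class(unweighted, 0.6))  # boundary at x = 0.5, so Class 2 wins
print(winner_class(weighted, 0.6))    # boundary moves to x = 0.8, so Class 1 wins
```

The boundary moves although neither membership function has changed, which is the kind of adjustment that rule weights make possible.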

Fuzzy rules with a rule weight have higher clas- 
sification ability than those with no rule weight. For 
example, the classification boundary in Fig. 77.5 can 
be adjusted by assigning a different rule weight to 
each rule (without changing the shape of each an- 
tecedent fuzzy set). Examples of the adjusted classi- 
fication boundaries are shown in Fig. 77.6. As shown 
in Fig. 77.6, the accuracy of fuzzy rule-based classi- 
fiers can be improved by using fuzzy rules with a rule 
weight. However, the use of a rule weight degrades 


the interpretability of fuzzy rule-based classifiers. It is 
a controversial issue to compare the interpretability of 
fuzzy rule-based classifiers between the following two 
approaches: One is the use of fuzzy rules with a rule 
weight and the other is the modification of antecedent 
fuzzy sets [77.35, 36]. 

The third type of fuzzy rules has multiple rule 
weights as follows [77.30] 


Rule $R_q$: If $x_1$ is $A_{q1}$ and ... and $x_n$ is $A_{qn}$
then Class $C_1$ with $CF_{q1}$, ..., Class $C_m$ with $CF_{qm}$ ,   (77.10)


where m is the number of classes and $CF_{qj}$ is a real
number in the unit interval [0, 1], which can be viewed
as a rule weight for the jth class $C_j$ ($j = 1, 2, \ldots, m$).
When we use the single winner-based fuzzy reasoning
method, the classification result of each pattern depends
only on the maximum rule weight $CF_q$ of each
rule (i.e., $CF_q = \max\{CF_{q1}, CF_{q2}, \ldots, CF_{qm}\}$). Thus,
the use of multiple rule weights in (77.10) is meaningless
under the single winner-based fuzzy reasoning
method. However, multiple rule weights can improve the
accuracy of fuzzy rule-based classifiers when we use a
voting-based fuzzy reasoning method [77.30, 37]. Of course, the use
of multiple rule weights further degrades the
interpretability of fuzzy rule-based classifiers.


Fig. 77.7 Examples of fuzzy rules in an approximative
fuzzy rule-based system (axes: input variables x1 and x2, each from 0 to 1.0)
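
The difference between the two reasoning methods can be sketched as follows. The voting scheme used here (each rule votes for every class with its compatibility grade times the corresponding rule weight, and the class with the largest total wins) is an assumed form chosen for illustration; the weights and compatibility grades are hypothetical.

```python
def single_winner(weights, grades):
    """Class (1-based) of the rule maximizing grade * (largest weight of the rule)."""
    best_class, best_score = None, 0.0
    for cf, grade in zip(weights, grades):
        j = max(range(len(cf)), key=lambda k: cf[k])
        score = grade * cf[j]
        if score > best_score:
            best_class, best_score = j + 1, score
    return best_class


def voting(weights, grades):
    """Class (1-based) with the largest total vote (assumed voting scheme)."""
    num_classes = len(weights[0])
    totals = [sum(g * cf[j] for cf, g in zip(weights, grades))
              for j in range(num_classes)]
    return max(range(num_classes), key=lambda j: totals[j]) + 1


# Three hypothetical rules, two classes; grades are compatibility values.
weights = [[0.6, 0.4], [0.3, 0.7], [0.3, 0.7]]
grades = [0.5, 0.4, 0.4]

print(single_winner(weights, grades))  # only the largest weight of each rule matters
print(voting(weights, grades))         # all weights contribute to the totals
```

Replacing the non-maximum weights by zero leaves the single-winner result unchanged, which illustrates why multiple rule weights carry no extra information under that method.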


77.2.2 Types of Fuzzy Partitions 


Since Mamdani’s pioneering work in the 1970s [77.23, 
24], grid-type fuzzy partitions have frequently been 
used in fuzzy control (e.g., the 3 x 3 fuzzy partition in 
Fig. 77.2). Such a grid-type fuzzy partition has high 
interpretability when it is used for two-dimensional 
problems (i. e., for the design of two-input single-output 
fuzzy rule-based systems). However, grid-type fuzzy 
partitions have the following two difficulties. One diffi- 
culty is the inflexibility of membership function tuning. 
Since each antecedent fuzzy set is used in multiple 
fuzzy rules, membership function tuning for improving 
the accuracy of one fuzzy rule may degrade the accu- 
racy of some other fuzzy rules. The other difficulty is 


the exponential increase in the number of fuzzy rules
with respect to the number of input variables. Let L
be the number of antecedent fuzzy sets for each of
the n variables. In this case, the number of cells in
the corresponding n-dimensional fuzzy grid is $L^n$ (e.g.,
$5^{10} = 9\,765\,625$ when L = 5 and n = 10).
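
The growth of the grid can be checked with a few lines of Python; the function name is chosen for this example only.

```python
def grid_rule_count(num_fuzzy_sets, num_inputs):
    """Number of cells (fuzzy rules) in a grid-type fuzzy partition."""
    return num_fuzzy_sets ** num_inputs


for n in (2, 5, 10):
    print(n, grid_rule_count(5, n))
```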

These two difficulties can be removed by assign- 
ing different antecedent fuzzy sets to each fuzzy rule 
as shown in Fig. 77.7. Each fuzzy rule has its own an- 
tecedent fuzzy sets. That is, no antecedent fuzzy set is 
shared by multiple fuzzy rules. 

Fuzzy rule-based systems with this type of fuzzy 
rules are referred to as approximative models whereas 
grid-type fuzzy rule-based systems such as Fig. 77.2 are 
called descriptive models [77.38, 39]. If the accuracy 
of fuzzy rule-based systems is much more important 
than their interpretability, approximative models may 
be a better choice than descriptive models. Approx- 
imative models have been used as fuzzy rule-based 
classifiers since the early 1990s [77.40, 41]. 

One limitation of approximative models with re- 
spect to accuracy maximization is that every antecedent 
fuzzy set is defined on a single input variable. As 
a result, the shape of a fuzzy subspace covered by 
the antecedent part of each fuzzy rule is rectangular 
as shown in Fig. 77.7. This means that such a fuzzy 
subspace cannot handle any correlation among input 
variables. One approach to the handling of correlated
subspaces is the use of a single high-dimensional
antecedent fuzzy set in each fuzzy rule

Rule $R_q$: If $\mathbf{x}$ is $A_q$ then Class $C_q$ ,   (77.11)

where $\mathbf{x}$ is an n-dimensional input vector (i.e., $\mathbf{x} =
(x_1, x_2, \ldots, x_n)$) and $A_q$ is an n-dimensional antecedent
fuzzy set directly defined in the n-dimensional input
space. This type of fuzzy rule has also been used for
pattern classification problems since the 1990s [77.42].
Figure 77.8 illustrates an example of the n-dimensional
antecedent fuzzy set $A_q$ in the case of n = 2. As we can
see from Fig. 77.8, antecedent fuzzy sets in fuzzy rules 
of the type in (77.11) can cover correlated fuzzy sub- 
spaces of the input space. This characteristic feature is 
an advantage over single-dimensional antecedent fuzzy 
sets with respect to the accuracy of fuzzy rule-based 
systems. However, as we can see from Fig. 77.8, it is 
almost impossible to linguistically interpret a high-di- 
mensional antecedent fuzzy set. That is, the use of high- 
dimensional antecedent fuzzy sets may improve the ac- 
curacy of fuzzy rule-based systems but degrade their 
interpretability. 


Fig. 77.8 Illustration of an n-dimensional antecedent
fuzzy set $A_q$ in the two-dimensional input space (axes: input variables x1 and x2, each from 0 to 1.0)


Fig. 77.9 A five-input single-output fuzzy rule-based system
with a hierarchical structure (subsystems combining the inputs x1, x2, x3, x4, and x5)


77.2.3 Handling of High-Dimensional Problems with Many Input Variables


As we have already explained, the number of fuzzy 
rules exponentially increases with the number of in- 
put variables when we use a descriptive model with 
a grid-based fuzzy partition. Thus, it looks impracti- 
cal to design a descriptive model for high-dimensional 
problems. 

Approximative models do not suffer from this difficulty
of grid-based fuzzy partitions, because the
number of fuzzy rules in an approximative model is
independent of the number of input variables. That
is, we can design fuzzy rule-based systems for
high-dimensional problems by using approximative models.

One difficulty in the use of approximative models is 
poor interpretability of fuzzy rule-based systems due to 
the following two reasons: (i) it is difficult to linguis- 
tically interpret antecedent fuzzy sets in approximative 
models as shown in Fig. 77.7, and (ii) it is also diffi- 
cult to understand a fuzzy rule with a large number of 
antecedent conditions. 


77.2.4 Hybrid Approaches with Neural Networks and Genetic Algorithms


In the 1990s, a large number of learning and optimization
methods were proposed for accuracy maximization
of fuzzy rule-based systems. Almost all of
those approaches were hybrid approaches with neural
networks, called neuro-fuzzy systems [77.27–29, 43,
44], and with genetic algorithms, called genetic fuzzy
systems [77.45–48]. In neuro-fuzzy systems, learning
algorithms of neural networks were utilized for parameter
tuning (e.g., for membership function tuning). As
shown in Fig. 77.3, parameter tuning in fuzzy rule-based
systems usually leads to accuracy improvement
and interpretability deterioration.

Genetic fuzzy systems can be used not only for parameter
tuning but also for structure optimization such
as rule selection, input selection, and fuzzy partition.
As we will explain in the next section, rule selection
and input selection can improve the interpretability of
fuzzy rule-based systems by decreasing their complexity,
whereas parameter tuning almost always deteriorates
their interpretability. Genetic fuzzy systems were
also used for constructing a hierarchical structure of 
fuzzy rule-based systems [77.49]. Figure 77.9 shows 
an example of a fuzzy rule-based system with a hierar- 
chical structure. The use of hierarchical structures can 
prevent the exponential increase in the number of fuzzy 
rules because each subsystem has only a few inputs 
(e.g., in Fig. 77.9, each subsystem has only two inputs). 
However, it significantly degrades the interpretability of 
fuzzy rule-based systems. This is because the interpre- 
tation of intermediate variables between subsystems is 
usually impossible. 




77.3 Complexity Minimization 


In this section, we briefly explain various approaches 
proposed for decreasing the complexity of fuzzy rule- 
based systems. Those approaches improve the inter- 
pretability of fuzzy rule-based systems but often de- 
grade their accuracy. 


77.3.1 Decreasing the Number of Fuzzy Rules


A simple idea for complexity reduction of fuzzy rule- 
based systems is to decrease the number of fuzzy rules. 
Let us consider a three-class pattern classification prob- 
lem in Fig. 77.10. All patterns in Fig. 77.10 can be 
correctly classified by the following nine fuzzy rules 
with the 3 x 3 fuzzy grid in Fig. 77.10 


Rule $R_1$: If $x_1$ is small and $x_2$ is small
then Class 2 ,

Rule $R_2$: If $x_1$ is small and $x_2$ is medium
then Class 2 ,

Rule $R_3$: If $x_1$ is small and $x_2$ is large
then Class 1 ,

Rule $R_4$: If $x_1$ is medium and $x_2$ is small
then Class 2 ,

Rule $R_5$: If $x_1$ is medium and $x_2$ is medium
then Class 2 ,

Rule $R_6$: If $x_1$ is medium and $x_2$ is large
then Class 1 ,

Rule $R_7$: If $x_1$ is large and $x_2$ is small
then Class 3 ,

Rule $R_8$: If $x_1$ is large and $x_2$ is medium
then Class 3 ,

Rule $R_9$: If $x_1$ is large and $x_2$ is large
then Class 3 .


That is, all patterns in Fig. 77.10 can be correctly
classified by a fuzzy rule-based classifier with these
nine fuzzy rules. It is also possible to correctly classify
all patterns in Fig. 77.10 using a simple fuzzy
rule-based classifier with only the four fuzzy rules
around the top-right corner (i.e., fuzzy rules $R_5$, $R_6$, $R_8$,
and $R_9$). This example illustrates the simplification of
fuzzy rule-based systems through rule selection.

The use of genetic algorithms for fuzzy rule selection
was proposed by Ishibuchi et al. [77.13, 14] in the
1990s. Let $S_{ALL}$ be a set of all fuzzy rules. Since an
arbitrary subset of $S_{ALL}$ can be represented by a binary
string of length $|S_{ALL}|$, standard genetic algorithms for
binary strings can be directly applied to fuzzy rule
selection [77.13, 14]. The number of fuzzy rules, which
should be minimized, was used as a part of a fitness
function in single-objective approaches [77.13, 14]. It
was also used as a separate objective in multiobjective
approaches [77.15].
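
The binary-string coding of rule subsets can be sketched on a toy problem. Everything concrete below is an assumption made for illustration: each candidate rule is reduced to the set of training patterns it classifies correctly, the fitness is a simple weighted combination of correctly classified patterns and rule count, and exhaustive enumeration stands in for the genetic search because the string has only four bits.

```python
from itertools import product

# Hypothetical candidate rules: each rule is represented only by the set of
# training-pattern indices it classifies correctly.
ALL_RULES = [{0, 1}, {2, 3}, {1, 2}, {0}]


def decode(bits):
    """Map a binary string (one bit per candidate rule) to a rule subset."""
    return [rule for bit, rule in zip(bits, ALL_RULES) if bit == 1]


def fitness(bits, w_correct=10.0, w_rules=1.0):
    """Assumed fitness: reward classified patterns, penalize the rule count."""
    subset = decode(bits)
    correct = len(set().union(*subset))
    return w_correct * correct - w_rules * len(subset)


# Exhaustive enumeration replaces the genetic algorithm in this toy example.
best = max(product((0, 1), repeat=len(ALL_RULES)), key=fitness)
print(best, fitness(best))
```

In a multiobjective formulation the two terms of this fitness would be kept as separate objectives instead of being combined with weights.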


77.3.2 Decreasing the Number of Antecedent Conditions


In Fig. 77.10, the rightmost three fuzzy rules (i.e., $R_7$,
$R_8$, and $R_9$ with the same antecedent condition on $x_1$)
can be combined into a single fuzzy rule: If $x_1$ is large
then Class 3. This fuzzy rule has no condition on the
second input variable $x_2$. In this manner, the 3 x 3 fuzzy
rule-based classifier with the nine fuzzy rules can be
simplified to a simpler classifier with seven fuzzy
rules.

The fuzzy rule If $x_1$ is large then Class 3 is viewed
as having a don’t care condition on the second input
variable $x_2$: If $x_1$ is large and $x_2$ is don’t care
then Class 3. In this fuzzy rule, don’t care is a special
antecedent fuzzy set that is fully compatible with
any input value. The use of don’t care enables us to
perform rule-level input selection, which significantly
improves the applicability of descriptive fuzzy rule-based
systems to high-dimensional problems [77.50].


Fig. 77.10 A three-class pattern classification problem and
a 3 x 3 fuzzy grid (axes: input variables x1 and x2, each from 0 to 1.0; the markers distinguish Class 1, Class 2, and Class 3)


Fig. 77.11a,b Projection of two-dimensional fuzzy sets and the merge of similar fuzzy sets. (a) Projection onto each input
variable, (b) merging similar antecedent fuzzy sets


When we use don’t care as a special antecedent
fuzzy set, the number of antecedent conditions in
a fuzzy rule excluding don’t care conditions is referred
to as the rule length, since don’t care conditions are usually
omitted (e.g., If $x_1$ is large and $x_2$ is don’t care
then Class 3 is usually written as If $x_1$ is large then
Class 3). A short fuzzy rule with a small number of
antecedent conditions covers a large fuzzy subspace of
a high-dimensional pattern space, while a long fuzzy
rule covers a small fuzzy subspace. For example, let us
consider a 50-dimensional pattern classification problem
with the pattern space $[0, 1]^{50}$. A fuzzy rule with
the antecedent fuzzy set small on all the 50 input variables
covers less than $1/10^{15}$ of the pattern space $[0, 1]^{50}$.
However, a short fuzzy rule with the antecedent fuzzy
set small on only two input variables (e.g., If $x_1$ is
small and $x_{49}$ is small then Class 3) covers 1/4 of the
pattern space $[0, 1]^{50}$. As a result, almost all of the
high-dimensional pattern space can be covered by
a small number of short fuzzy rules. That is, we can design
a simple fuzzy rule-based classifier with a small
number of short fuzzy rules for a high-dimensional
pattern classification problem. It should be noted that
different fuzzy rules may have antecedent conditions
on different input variables. Moreover, the rule length
of each fuzzy rule may be different (e.g., one fuzzy
rule has an antecedent condition only on $x_1$ while another
fuzzy rule has antecedent conditions on $x_2$, $x_3$,
and $x_4$).
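
The coverage figures can be reproduced under one simplifying assumption: each antecedent fuzzy set such as small is taken to cover half of its axis, which is consistent with the stated coverage of 1/4 for a rule with two conditions. The function name and this per-axis fraction are assumptions of the sketch.

```python
from fractions import Fraction


def covered_fraction(rule_length, per_axis=Fraction(1, 2)):
    """Fraction of the pattern space covered by a rule with `rule_length`
    antecedent conditions; don't care conditions cover their whole axis."""
    return per_axis ** rule_length


print(covered_fraction(2))          # rule with conditions on two inputs
print(float(covered_fraction(50)))  # rule with conditions on all 50 inputs
```

Exact fractions are used so that the very small value for 50 conditions is not affected by rounding.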

The total rule length (i. e., the total number of an- 
tecedent conditions), which should be minimized, was 
used as a part of a fitness function in single-objec- 
tive approaches [77.51]. It was also used as a separate 
objective in multiobjective approaches [77.52, 53]. In
multiobjective approaches, the total rule length instead 
of the average rule length has been used in the literature. 
This is because the minimization of the average rule 
length does not necessarily mean the complexity mini- 
mization of fuzzy rule-based systems. In many cases, 
the average rule length can be decreased by adding 
a new fuzzy rule with a single antecedent condition, 
which leads to the increase in the complexity of a fuzzy 
rule-based system. 
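
A small numerical example, with hypothetical rule lengths, shows why the average rule length is a misleading complexity measure.

```python
def total_rule_length(rule_lengths):
    """Total number of antecedent conditions in the rule set."""
    return sum(rule_lengths)


def average_rule_length(rule_lengths):
    """Average number of antecedent conditions per rule."""
    return sum(rule_lengths) / len(rule_lengths)


before = [3, 3, 2]     # hypothetical rule set with three rules
after = before + [1]   # the same rule set plus one rule with a single condition

print(total_rule_length(before), average_rule_length(before))
print(total_rule_length(after), average_rule_length(after))
```

Adding the short rule lowers the average while both the number of rules and the total rule length increase.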


77.3.3 Other Interpretability Improvement Approaches


For the design of accurate fuzzy rule-based systems for
high-dimensional problems, clustering techniques such
as fuzzy c-means [77.54–56] have often been used to
generate fuzzy rules [77.42, 57–61]. Fuzzy rules with
ellipsoidal high-dimensional antecedent fuzzy sets are
often obtained from clustering-based fuzzy rule generation
methods. Fuzzy rules of this type have high
accuracy but low interpretability. Their interpretability
is improved by projecting high-dimensional antecedent
fuzzy sets onto each input variable. As a result, we have
approximative fuzzy rule-based systems (Fig. 77.11a).
The interpretability of the obtained fuzzy rule-based
systems can be further improved by merging similar
antecedent fuzzy sets on each input variable into a single
one (Fig. 77.11b). Each of the antecedent
fuzzy sets generated by a merging procedure is replaced with a
linguistic value to further improve the interpretability of
fuzzy rule-based systems.

It should be noted that each of the abovementioned
interpretability improvement steps (i.e., projection
of high-dimensional antecedent fuzzy sets, merging
similar fuzzy sets, and replacement with linguistic
values) deteriorates the accuracy of fuzzy rule-based
systems. Thus, the design of fuzzy rule-based systems
can be viewed as being the search for a good
tradeoff solution between accuracy and interpretability.
From this viewpoint, some sophisticated approaches
were proposed [77.62–68] after a large number of
accuracy improvement algorithms were proposed in
the 1990s. Some of those approaches tried to improve
the accuracy of fuzzy rule-based systems without
severely deteriorating their interpretability. Other
approaches tried to improve the interpretability of fuzzy
rule-based systems without severely deteriorating their
accuracy.


77.4 Single-Objective Approaches


As we have already explained, the simplest multiobjective
formulation of fuzzy system design has two
objectives (i.e., error minimization and complexity
minimization) as follows


Minimize $\mathbf{f}(S) = (f_1(S), f_2(S))
= (\mathrm{Complexity}(S), \mathrm{Error}(S))$ ,   (77.12)


where $\mathbf{f}(S)$ shows an objective vector. In this section,
we explain how the two-objective problem
in (77.12) can be handled by single-objective approaches.
For more general and comprehensive explanations
on the handling of multiobjective problems
through single-objective optimization, see textbooks
on multicriteria decision making such as
Miettinen [77.69].


77.4.1 Use of Scalarizing Functions


One of the most frequently used approaches to multiobjective
optimization is the use of scalarizing functions.
Multiple objective functions are combined into a single
scalarizing function. That is, a multiobjective problem
is handled as a single-objective problem. Our two
objectives in multiobjective fuzzy system design are
combined as follows


Minimize $f(S) = f(f_1(S), f_2(S))
= f(\mathrm{Complexity}(S), \mathrm{Error}(S))$ ,   (77.13)


where f(S) is a scalarizing function to be minimized. 
A simple but frequently used scalarizing function is the 
weighted sum 


Minimize $f(S) = w_1 f_1(S) + w_2 f_2(S)
= w_1 \, \mathrm{Complexity}(S) + w_2 \, \mathrm{Error}(S)$ ,   (77.14)


where $w_1$ and $w_2$ are nonnegative weights ($\mathbf{w}$ is a weight
vector: $\mathbf{w} = (w_1, w_2)$).

Single-objective optimization algorithms such as
genetic algorithms are used to search for the optimal
solution (i.e., the optimal fuzzy rule-based system) of the
minimization problem in (77.13). In Fig. 77.12, we
illustrate the search for the optimal fuzzy rule-based
system of the weighted-sum minimization problem in
(77.14) together with the nondominated fuzzy rule-based
systems of the original two-objective problem in
(77.12).


Fig. 77.12 The optimal fuzzy rule-based system of the
weighted-sum minimization problem in (77.14) and the
nondominated fuzzy rule-based systems of the original
two-objective problem (horizontal axis: Complexity(S), from simple to complicated; vertical axis: Error(S), from small to large)

As shown in Fig. 77.12, a single optimal fuzzy rule- 
based system is obtained from a scalarizing function- 
based approach. The main difficulty of this approach is 
the dependency of the obtained fuzzy rule-based sys- 
tem on the choice of a scalarizing function. A different 
fuzzy rule-based system is likely to be obtained from 
a different scalarizing function. For example, a different 
specification of the weight vector in Fig. 77.12 leads to 
a different fuzzy rule-based system. Moreover, an ap- 
propriate choice of a scalarizing function is not easy. 
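
The dependency on the weight vector can be sketched with a handful of candidate systems. The candidates, given as (Complexity(S), Error(S)) pairs, and the weight vectors are hypothetical, and a direct minimum over the candidates stands in for the single-objective search.

```python
# Hypothetical candidate systems as (Complexity(S), Error(S)) pairs.
CANDIDATES = [(2, 0.30), (4, 0.15), (6, 0.10), (9, 0.08)]


def weighted_sum(objectives, weights):
    """Weighted-sum scalarizing function: w1 * Complexity(S) + w2 * Error(S)."""
    (complexity, error), (w1, w2) = objectives, weights
    return w1 * complexity + w2 * error


def best_for(weights):
    """Candidate minimizing the weighted sum for the given weight vector."""
    return min(CANDIDATES, key=lambda c: weighted_sum(c, weights))


for w in [(0.01, 1.0), (0.05, 1.0), (1.0, 1.0)]:
    print(w, best_for(w))
```

Each weight vector selects a different candidate, so the result reflects the chosen weights as much as the problem itself.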


77.4.2 Handling of Objectives as Constraint Conditions


If we have a pre-specified requirement about the com- 
plexity or the accuracy, we can use it as a constraint 
condition. For example, let us assume that the error 
measure Error(S) in our two-objective problem is the 
classification error rate. We also assume that the upper
bound of the allowable error rate is given as α%. In this
case, our two-objective problem can be reformulated as
the following single-objective problem with a constraint
condition


Minimize $f_1(S) = \mathrm{Complexity}(S)$
subject to $\mathrm{Error}(S) \le \alpha$ .   (77.15)


This single-objective problem is to find the simplest
fuzzy rule-based system among those with a pre-specified
accuracy (i.e., with error rates smaller than or equal
to α%).

It is also possible to use a constraint condition on 
the complexity measure Complexity(S). For example, 
let us assume that Complexity (S) is the number of fuzzy 
rules. We also assume that the upper bound of the allowable
number of fuzzy rules is given as β. In this case,
the following single-objective problem is formulated


Minimize $f_2(S) = \mathrm{Error}(S)$
subject to $\mathrm{Complexity}(S) \le \beta$ .   (77.16)


This formulation is illustrated in Fig. 77.13, where the
optimal solution is the most accurate fuzzy rule-based
system under the constraint condition $\mathrm{Complexity}(S) \le \beta$.

When we have more than two objectives, only a sin- 
gle objective is used as an objective function while all 
the others are used as constraint conditions in this ap- 
proach. That is, an m-objective problem is reformulated 
as a single-objective problem with (m - 1) constraint
conditions. The main difficulty in this constraint con- 
dition-based approach is an appropriate specification of 
the upper bound for each objective. 
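
Both constrained formulations can be sketched over a small set of candidates. The candidate (Complexity(S), Error(S)) pairs and the bounds are hypothetical; the functions return `None` when the bound leaves no feasible candidate, which illustrates why the specification of the upper bound matters.

```python
# Hypothetical candidate systems as (Complexity(S), Error(S)) pairs.
CANDIDATES = [(2, 0.30), (4, 0.15), (6, 0.10), (9, 0.08)]


def simplest_within_error(candidates, alpha):
    """Minimize complexity subject to Error(S) <= alpha."""
    feasible = [c for c in candidates if c[1] <= alpha]
    return min(feasible, key=lambda c: c[0]) if feasible else None


def most_accurate_within_complexity(candidates, beta):
    """Minimize error subject to Complexity(S) <= beta."""
    feasible = [c for c in candidates if c[0] <= beta]
    return min(feasible, key=lambda c: c[1]) if feasible else None


print(simplest_within_error(CANDIDATES, alpha=0.12))
print(most_accurate_within_complexity(CANDIDATES, beta=5))
print(simplest_within_error(CANDIDATES, alpha=0.05))  # bound too strict
```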


77.4.3 Minimization of the Distance to the Reference Point


In the abovementioned constraint condition-based approach,
the right-hand side constant for each objective
is the upper bound of the allowable error or complexity
(e.g., the error rate should be smaller
than or equal to α%). The right-hand side constant
should be specified so that the formulated constrained
optimization problem has feasible fuzzy rule-based
systems.

Fig. 77.13 The optimal fuzzy rule-based system of the
constrained optimization problem in (77.16) and the nondominated
fuzzy rule-based systems of the original two-objective
problem (horizontal axis: Complexity(S) with the upper bound β; vertical axis: Error(S), from small to large)


A single-objective problem can also be formulated
when an ideal fuzzy rule-based system is given as a reference
point in the objective space. We assume that
the given reference point is outside the feasible region
of the original two-objective problem in (77.12).
That is, the ideal fuzzy rule-based system does not
exist as a feasible solution of the two-objective problem.
Let the reference point in the two-dimensional
objective space be $\mathbf{f}^* = (f_1^*, f_2^*)$. The following
single-objective problem can be formulated to search for
the fuzzy rule-based system closest to the reference
point


Minimize $\mathrm{distance}(\mathbf{f}(S), \mathbf{f}^*)$ ,   (77.17)


where $\mathbf{f}(S)$ is the objective vector (i.e., $\mathbf{f}(S) =
(f_1(S), f_2(S))$), and distance(A, B) is a distance measure
between the two points A and B in the objective space.
Various distance measures can be used in (77.17).
We illustrate the reference point-based approach in
Fig. 77.14, where the Euclidean distance is used. As
shown in Fig. 77.14, the fuzzy rule-based system closest
to the given reference point $(f_1^*, f_2^*)$ is the optimal
solution of the single-objective problem in (77.17).
The main difficulty of the reference point-based approach
is an appropriate specification of the reference
point. When we have no information about the complexity
and the accuracy of fuzzy rule-based systems,
it is very difficult to appropriately specify the reference
point in the reference point-based approach as well
as the right-hand side constant for each objective in
the constraint condition-based approach. However, if
we know the shape of the complexity-accuracy tradeoff
surface in the objective space (i.e., if we know the
nondominated fuzzy rule-based systems in the objective
space), such a parameter specification becomes much
easier.


Fig. 77.14 The optimal fuzzy rule-based system of the distance
minimization problem from the reference point in
(77.17) and the nondominated fuzzy rule-based systems of
the original two-objective problem (horizontal axis: Complexity(S), from simple to complicated; vertical axis: Error(S); the reference point $(f_1^*, f_2^*)$ is marked)
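
The reference point-based selection with the Euclidean distance can be sketched as follows. The candidates and the reference points are hypothetical, and the objectives are used on their raw scales; whether and how the objectives should be normalized before computing the distance is not addressed here.

```python
from math import dist

# Hypothetical candidate systems as (Complexity(S), Error(S)) pairs.
CANDIDATES = [(2, 0.30), (4, 0.15), (6, 0.10), (9, 0.08)]


def closest_to_reference(candidates, reference):
    """Candidate with the smallest Euclidean distance to the reference point."""
    return min(candidates, key=lambda c: dist(c, reference))


print(closest_to_reference(CANDIDATES, (3, 0.05)))
print(closest_to_reference(CANDIDATES, (1, 0.0)))
```

Moving the reference point changes the selected system, which reflects the difficulty of specifying it without knowledge of the tradeoff surface.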


77.5 Evolutionary Multiobjective Approaches 


Since an early study in the 1990s [77.15], various mul- 
tiobjective approaches have been proposed to search for 
a large number of nondominated solutions of multiob- 
jective fuzzy system design problems. In this section, 
we explain the basic idea of those multiobjective ap- 
proaches, recent studies on multiobjective fuzzy system 
design, and future research directions. 


77.5.1 Basic Idea of Evolutionary Multiobjective Approaches


Multiobjective fuzzy system design was first formu- 
lated as a two-objective optimization problem to max- 
imize the accuracy of fuzzy rule-based classifiers 
and to minimize the number of fuzzy rules in the 
1990s [77.15]. Then this two-objective optimization 
problem was extended to a three-objective problem by 
including an additional objective to minimize the total
rule length (i.e., the total number of antecedent
conditions) in [77.52].

The main characteristic feature of evolutionary multiobjective
approaches to fuzzy system design is that
a number of nondominated fuzzy rule-based systems
are obtained by a single run of an evolutionary multiobjective
optimization (EMO) algorithm. This is clearly
different from the single-objective approaches where
a single fuzzy rule-based system is obtained by a single
run of a single-objective optimization algorithm.
In Fig. 77.15, we illustrate the search for nondominated
fuzzy rule-based systems in evolutionary multiobjective
approaches. The population of solutions (i.e.,
fuzzy rule-based systems) is pushed toward the Pareto
front and widened along the Pareto front to search
for a variety of nondominated solutions. Well-known
and frequently used EMO algorithms such as the nondominated
sorting genetic algorithm II (NSGA-II) [77.70],
strength Pareto evolutionary algorithm (SPEA) [77.71],
multiobjective evolutionary algorithm based on decomposition
(MOEA/D) [77.72], and S metric selection
evolutionary multiobjective optimisation algorithm
(SMS-EMOA) [77.73] have their own mechanisms to
push the population toward the Pareto front and widen
the population along the Pareto front.


Fig. 77.15 Search for a variety of nondominated fuzzy
rule-based systems along the Pareto front by evolutionary
multiobjective approaches (horizontal axis: Complexity(S), from simple to complicated; vertical axis: Error(S), from small to large)

The obtained set of nondominated solutions can be
used to examine the complexity-accuracy tradeoff relation
in the design of fuzzy rule-based systems [77.53].
A human decision maker is supposed to choose a final 
fuzzy rule-based system from the obtained nondomi- 
nated ones according to his/her preference. It should be 
noted that the decision maker’s preference is needed in 
the problem formulation phase in the single-objective 
approaches in the previous section (i. e., in the form of 
a scalarizing function, the upper bound of the allow- 
able values for each objective, and the reference point 
in the objective space). However, the evolutionary mul- 
tiobjective approaches do not need any information on 
the decision maker’s preference in their search for non- 
dominated fuzzy rule-based systems. That is, a number 
of nondominated fuzzy rule-based systems can be ob- 
tained with no information on the decision maker’s 
preference. A human decision maker is needed only in 
the solution selection phase after a number of nondom- 
inated solutions are obtained. 
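
The notion of a nondominated solution can be made concrete with a small filter over hypothetical (Complexity(S), Error(S)) pairs. This is not an EMO algorithm; it only shows the dominance relation on which such algorithms rely, applied to a fixed list of candidates.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (both objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated(points):
    """Points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate systems as (Complexity(S), Error(S)) pairs.
candidates = [(2, 0.30), (4, 0.15), (5, 0.20), (6, 0.10), (9, 0.08), (9, 0.12)]
print(nondominated(candidates))
```

No preference information enters this filter; the choice among the remaining candidates is left to the decision maker.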


77.5.2 Various Evolutionary Multiobjective Approaches


We have explained multiobjective fuzzy rule-based 
design using the two-objective formulation with the 
complexity minimization and the error minimization in 
Fig. 77.15. However, various evolutionary multiobjec- 
tive approaches have been proposed for multiobjective 
fuzzy system design (for their review, see [77.19]). In 
this subsection, we briefly explain some of those evolu- 
tionary multiobjective approaches. 

In some real-world applications, the design of 
fuzzy rule-based systems involves multiple perfor- 
mance measures. Especially in multiobjective fuzzy 
controller design, multiple performance measures have 
been frequently used with no complexity measures. For 
example, in Stewart et al. [77.74], multiobjective fuzzy 
controller design was formulated as a three-objective 
problem with three performance measures: a current 


tracking error, a velocity tracking error, and a power 
consumption. In Chen and Chiang [77.75], fuzzy 
controller design was formulated using no complexity 
measure and three accuracy measures: the number of 
collisions, the distance between the target and lead 
points of the new path, and the number of explored 
actions. Whereas multiple performance measures have 
been frequently used in multiobjective fuzzy controller 
design, a single performance measure such as the 
error rate has been mainly used in multiobjective 
fuzzy classifier design. However, for the handling of 
classification problems with imbalanced and cost- 
sensitive data sets, multiple performance measures 
were used in some studies on multiobjective fuzzy 
classifier design. For example, a true positive rate and 
a false positive rate were used as separate performance 
measures together with a complexity measure in three- 
objective fuzzy classifier design in [77.76]. 
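
As a minimal sketch of how a true positive rate and a false positive rate can serve as separate performance measures, the following code computes both from predicted labels and combines them with a complexity value into a three-objective vector. The function name, the label data, and the choice of the number of rules as the complexity measure are assumptions made for illustration; they are not the exact formulation of [77.76].

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for a two-class problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Illustrative imbalanced data: four positive and six negative patterns.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)

# Three objectives to be minimized: (1 - TPR, FPR, complexity).
num_rules = 5  # hypothetical complexity of the evaluated classifier
objectives = (1.0 - tpr, fpr, num_rules)
```

Keeping the two rates separate, rather than merging them into a single error rate, lets the search expose classifiers that trade missed positives against false alarms.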

Multiple complexity measures have been frequently 
used in multiobjective fuzzy classifier design. In 
the first study on multiobjective fuzzy classifier de- 
sign [77.15], the number of fuzzy rules was used 
as a complexity measure. Then the total rule length 
(i.e., the total number of antecedent conditions) was 
added as another complexity measure in three-objec- 
tive fuzzy classifier design [77.52, 53]. The number of 
fuzzy rules and the total rule length have been used 
in many other studies on multiobjective fuzzy classifier
design [77.77-79]. In some studies, the number of
antecedent fuzzy sets was used instead of the total rule 
length [77.80, 81]. 
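
The three complexity measures mentioned above can be sketched as follows. The rule representation (a tuple of antecedent labels, with None standing for a don't-care condition) and the reading of the number of antecedent fuzzy sets as the number of distinct (attribute, label) pairs are assumptions made for this illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical rule encoding: (antecedent labels per attribute, consequent class).
# None means that the attribute is not used in the rule (don't care).
Rule = Tuple[Tuple[Optional[str], ...], str]

rule_base: List[Rule] = [
    (("small", None, "large"), "class1"),
    (("small", "medium", None), "class2"),
    ((None, None, "large"), "class1"),
]


def number_of_rules(rules: List[Rule]) -> int:
    """Complexity measure 1: the number of fuzzy rules."""
    return len(rules)


def total_rule_length(rules: List[Rule]) -> int:
    """Complexity measure 2: the total number of antecedent conditions."""
    return sum(1 for antecedent, _ in rules for label in antecedent if label is not None)


def number_of_antecedent_fuzzy_sets(rules: List[Rule]) -> int:
    """Complexity measure 3 (assumed reading): distinct (attribute, fuzzy set) pairs in use."""
    used = {
        (index, label)
        for antecedent, _ in rules
        for index, label in enumerate(antecedent)
        if label is not None
    }
    return len(used)
```

For the rule base above, the three measures are 3, 5, and 3, respectively.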

When membership function tuning is performed to- 
gether with fuzzy rule generation in fuzzy classifier 
design, complexity measures such as the number of 
fuzzy rules and the total rule length are not always 
enough to evaluate the interpretability of fuzzy rule- 
based systems. Let us compare two fuzzy partitions 
in Fig. 77.16 with each other. The 5 x 5 fuzzy parti- 
tion in Fig. 77.16a has 25 fuzzy rules while the 4 x 4 
fuzzy partition in Fig. 77.16b has 16 fuzzy rules. Thus, 
the fuzzy partition in Fig. 77.16a is evaluated as be- 
ing more complicated than that of Fig. 77.16b when the 
abovementioned simple complexity measures are used. 
However, we intuitively feel that the simple 5 x 5 fuzzy 
partition in Fig. 77.16a is more interpretable than the 
tuned 4 x 4 fuzzy partition in Fig. 77.16b. This is be- 
cause the tuned antecedent fuzzy sets in Fig. 77.16b are 
not easy to interpret linguistically. These discussions on 
the comparison between the two fuzzy partitions show 
the necessity of interpretability measures in addition 
to the abovementioned simple complexity measures in
fuzzy classifier design when membership function
tuning is performed.

Fig. 77.16a,b Two fuzzy partitions: (a) simple 5 x 5 grid, (b) tuned 4 x 4 grid

Interpretability of fuzzy rule-based systems has 
been a hot topic in the field of fuzzy systems [77.82]. 
Various aspects of fuzzy rule-based systems are related 
to their interpretability [77.83-88]. Some studies focus 
on the explanation ability of fuzzy rule-based classifiers 
to explain why each pattern is classified as a particular 
class in an understandable manner [77.89]. 

Whereas a number of studies have already addressed
the interpretability of fuzzy rule-based systems
[77.82-89], it is still a very difficult open problem
to quantitatively define all aspects of the interpretability 
of fuzzy rule-based systems. This is because the inter- 
pretability is totally subjective. That is, its definition 
totally depends on human users. Each human user may 
have a different idea about the interpretability of fuzzy 
rule-based systems. 

A number of approaches have been proposed to 
incorporate the interpretability into evolutionary mul- 
tiobjective fuzzy system design [77.90-94]. The basic 
idea is to significantly improve the accuracy of fuzzy 
rule-based systems by slightly deteriorating their in- 
terpretability (e.g., by slightly tuning antecedent fuzzy 
sets). Since the interpretability is totally subjective, it is 
not easy to compare those approaches. In this sense, ex- 
perimental studies on the interpretability of fuzzy rule- 
based systems seem to be one of the promising research 
directions [77.83]. 

Whereas multiobjective genetic algorithms have 
been mainly used in evolutionary multiobjective fuzzy 
system design, the use of other algorithms was also
examined. This is closely related to the increase in the
popularity of not only multiobjective genetic algorithms
but also other multiobjective algorithms. For example,
multiobjective versions of particle swarm optimization
(PSO) have been actively studied in the field of
evolutionary computation [77.95-99]. In response to those
active studies, multiobjective PSO algorithms were used 
for multiobjective fuzzy system design [77.100-104]. 


77.5.3 Future Research Directions 


Formulation of interpretability is still an important
issue to be further studied. As pointed out by many
studies [77.83-88], various aspects are related to the
interpretability of fuzzy rule-based systems. One problem
is to quantitatively formulate those aspects so that they 
can be used as objectives in evolutionary multiobjec- 
tive fuzzy system design. Another problem is how to 
use them. We may have several options: the use of all 
aspects as separate objectives, the choice of only a few 
aspects as separate objectives, and the integration of all 
or some aspects into a few interpretability measures. If 
we use all aspects as separate objectives, multiobjective 
fuzzy system design is formulated as a many-objective 
problem. It is well-known that many-objective problems
are usually very difficult for evolutionary multiobjective
optimization algorithms [77.105-107]. However,
both the choice of only a few aspects and the integration 
into a few interpretability measures are also difficult. 
The main advantage of multiobjective approaches to 
fuzzy system design over single-objective approaches is 
that a number of nondominated fuzzy rule-based systems
are obtained along the interpretability-accuracy
tradeoff surface as we explained in the complexity-accuracy
objective space. One issue, which has not been
discussed in many studies, is how to choose a sin- 
gle fuzzy rule-based system from a large number of 
obtained ones. It is implicitly assumed that a single 
fuzzy rule-based system is to be selected by a hu- 
man decision maker. However, the selection of a single 
fuzzy rule-based system is an important issue espe- 
cially when a large number of fuzzy rule-based systems 
are obtained in a high-dimensional objective space. 
A related research topic is the elicitation of the deci- 
sion maker’s preference about interpretability—accuracy 
tradeoffs and its utilization in evolutionary multiobjec- 
tive fuzzy system design. 
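
One simple way to use an elicited preference in the solution selection phase is sketched below. It assumes that the preference is given as a hypothetical weight for each objective, normalizes each objective over the obtained nondominated set, and returns the solution with the smallest weighted sum. This is only one of many possible selection rules and is not a method proposed in the studies cited in this chapter; the objective values are made up.

```python
from typing import List, Sequence, Tuple

Objective = Tuple[float, float]  # (error rate, number of fuzzy rules), both minimized


def select_by_preference(front: List[Objective], weights: Sequence[float]) -> Objective:
    """Pick the solution with the smallest weighted sum of normalized objectives."""
    n = len(front[0])
    lows = [min(p[i] for p in front) for i in range(n)]
    highs = [max(p[i] for p in front) for i in range(n)]

    def score(p: Objective) -> float:
        total = 0.0
        for i, w in enumerate(weights):
            span = highs[i] - lows[i]
            # Normalize to [0, 1]; a constant objective contributes nothing.
            total += w * ((p[i] - lows[i]) / span if span > 0 else 0.0)
        return total

    return min(front, key=score)


# Illustrative nondominated set in the complexity-accuracy objective space.
front = [(0.30, 2), (0.18, 5), (0.12, 9), (0.10, 20)]

balanced = select_by_preference(front, (0.5, 0.5))
prefers_simple = select_by_preference(front, (0.1, 0.9))
prefers_accurate = select_by_preference(front, (0.9, 0.1))
```

A weighted sum cannot reach solutions in nonconvex regions of the tradeoff surface, which is one reason why the choice of a selection rule remains an open issue.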

Performance improvement of evolutionary multiob- 
jective approaches is still an important research topic. 
Since multiobjective fuzzy system design is often for- 
mulated as complicated multiobjective optimization 
problems with many discrete and continuous decision 
variables, it is very difficult to search for their true 
Pareto optimal solutions. Thus, it is likely that better 
fuzzy rule-based systems than reported results in the 
literature would be obtained by more efficient
multiobjective algorithms and/or better problem formulations.
Actually, better results are continuously reported in the
literature. A related research topic is the parallel
implementation of evolutionary multiobjective approaches.
In general, parallel implementation of evolutionary
algorithms is not difficult due to their population-based
search mechanisms (i.e., because the fitness evaluation
of multiple individuals in the current population can be
easily performed in parallel).

Multiobjective genetic algorithms have been mainly
used for evolutionary multiobjective fuzzy system design.
As we have already mentioned, recently the use
of multiobjective PSO has been examined [77.100-104].
Since other population-based search algorithms
such as ant colony optimization (ACO) have already
been used in single-objective approaches to fuzzy system
design [77.108-112], the use of their multiobjective
versions will be examined for multiobjective fuzzy system
design.

A very important and promising research direction
is multiobjective design of type-2 fuzzy systems
[77.113]. A number of single-objective approaches
have already been proposed for type-2 fuzzy
system design [77.114-116]. However, multiobjective
type-2 fuzzy system design has not been discussed in
many studies.


77.6 Conclusion


We explained the basic idea of evolutionary multiobjective
fuzzy system design using a simple two-objective
formulation for complexity and error minimization in
comparison with single-objective approaches. The main
advantage of multiobjective approaches is that a large
number of fuzzy rule-based systems with different
complexity-accuracy tradeoffs are obtained from a single
run of a multiobjective approach. A human user
can choose a single fuzzy rule-based system based on
his/her preference and the requirement in each application
field. Highly interpretable fuzzy systems may be
needed in some application fields while highly accurate
ones may be preferred in other application fields.
See [77.19] for a more comprehensive review on
evolutionary multiobjective approaches to fuzzy rule-based
system design, [77.88] for single-objective and
multiobjective approaches, and [77.115, 116] for type-2
fuzzy system design.


References 

77.1 C.C. Lee: Fuzzy logic in control systems: Fuzzy logic controller - Part I, IEEE Trans. Syst. Man Cybern. 20(2), 404-418 (1990)
77.2 C.C. Lee: Fuzzy logic in control systems: Fuzzy logic controller - Part II, IEEE Trans. Syst. Man Cybern. 20(2), 419-435 (1990)
77.3 J.M. Mendel: Fuzzy logic systems for engineering: A tutorial, Proc. IEEE 83(3), 345-377 (1995)
77.4 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning - I, Inf. Sci. 8(3), 199-249 (1975)
77.5 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning - II, Inf. Sci. 8(4), 301-357 (1975)
77.6 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning - III, Inf. Sci. 9(1), 43-80 (1975)


77.7 B. Kosko: Fuzzy systems as universal approximators, Proc. 1992 IEEE Int. Conf. Fuzzy Syst. (IEEE, San Diego 1992) pp. 1153-1162
77.8 L.X. Wang: Fuzzy systems are universal approximators, Proc. 1992 IEEE Int. Conf. Fuzzy Syst. (IEEE, San Diego 1992) pp. 1163-1170
77.9 L.X. Wang, J.M. Mendel: Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. Neural Netw. 3(5), 807-814 (1992)
77.10 K. Funahashi: On the approximate realization of continuous mappings by neural networks, Neural Netw. 2(3), 183-192 (1989)
77.11 K. Hornik, M. Stinchcombe, H. White: Multilayer feedforward networks are universal approximators, Neural Netw. 2(5), 359-366 (1989)
77.12 J. Park, I.W. Sandberg: Universal approximation using radial-basis-function networks, Neural Comput. 3(2), 246-257 (1991)
77.13 H. Ishibuchi, K. Nozaki, N. Yamamoto, H. Tanaka: Construction of fuzzy classification systems with rectangular fuzzy rules using genetic algorithms, Fuzzy Sets Syst. 65(2/3), 237-253 (1994)
77.14 H. Ishibuchi, K. Nozaki, N. Yamamoto, H. Tanaka: Selecting fuzzy if-then rules for classification problems using genetic algorithms, IEEE Trans. Fuzzy Syst. 3(3), 260-270 (1995)
77.15 H. Ishibuchi, T. Murata, I.B. Türkşen: Single-objective and two-objective genetic algorithms for selecting linguistic rules for pattern classification problems, Fuzzy Sets Syst. 89(2), 135-150 (1997)
77.16 H. Ishibuchi: Multiobjective genetic fuzzy systems: Review and future research directions, Proc. 2007 IEEE Int. Conf. Fuzzy Syst. (IEEE, London 2007) pp. 913-918
77.17 H. Ishibuchi, Y. Nojima, I. Kuwajima: Evolutionary multiobjective design of fuzzy rule-based classifiers. In: Computational Intelligence: A Compendium, ed. by J. Fulcher, L.C. Jain (Springer, Berlin 2008) pp. 641-685
77.18 H. Ishibuchi, Y. Nojima: Multiobjective genetic fuzzy systems. In: Computational Intelligence: Collaboration, Fusion and Emergence, ed. by C.L. Mumford, L.C. Jain (Springer, Berlin 2009) pp. 131-173
77.19 M. Fazzolari, R. Alcalá, Y. Nojima, H. Ishibuchi, F. Herrera: A review of the application of multiobjective evolutionary fuzzy systems: Current status and further directions, IEEE Trans. Fuzzy Syst. 21(1), 45-65 (2013)
77.20 K. Deb: Multi-Objective Optimization Using Evolutionary Algorithms (Wiley, Chichester 2001)
77.21 K.C. Tan, E.F. Khor, T.H. Lee: Multiobjective Evolutionary Algorithms and Applications (Springer, Berlin 2005)
77.22 C.A.C. Coello, G.B. Lamont: Applications of Multi-Objective Evolutionary Algorithms (World Scientific, Singapore 2004)


77.23 E.H. Mamdani, S. Assilian: An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man-Mach. Stud. 7(1), 1-13 (1975)
77.24 E.H. Mamdani: Application of fuzzy logic to approximate reasoning using linguistic synthesis, IEEE Trans. Comput. C-26(12), 1182-1191 (1977)
77.25 L.X. Wang, J.M. Mendel: Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern. 22(6), 1414-1427 (1992)
77.26 T. Takagi, M. Sugeno: Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst. Man Cybern. 15(1), 116-132 (1985)
77.27 C.T. Lin, C.S.G. Lee: Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. 40(12), 1320-1336 (1991)
77.28 S. Horikawa, T. Furuhashi, Y. Uchikawa: On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm, IEEE Trans. Neural Netw. 3(5), 801-806 (1992)
77.29 J.S.R. Jang: ANFIS: Adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23(3), 665-685 (1993)
77.30 O. Cordón, M.J. del Jesus, F. Herrera: A proposal on reasoning methods in fuzzy rule-based classification systems, Int. J. Approx. Reason. 20(1), 21-45 (1999)
77.31 L.I. Kuncheva: How good are fuzzy If-then classifiers?, IEEE Trans. Syst. Man Cybern. B 30(4), 501-509 (2000)
77.32 L.I. Kuncheva: Fuzzy Classifier Design (Physica, Heidelberg 2000)
77.33 H. Ishibuchi, K. Nozaki, H. Tanaka: Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets Syst. 52(1), 21-32 (1992)
77.34 H. Ishibuchi, T. Nakashima, M. Nii: Classification and Modeling with Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining (Springer, Berlin 2004)
77.35 D. Nauck, R. Kruse: How the learning of rule weights affects the interpretability of fuzzy systems, Proc. IEEE Int. Conf. Fuzzy Syst. (IEEE, Anchorage 1998) pp. 1235-1240
77.36 H. Ishibuchi, T. Nakashima: Effect of rule weights in fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. 9(4), 506-515 (2001)
77.37 H. Ishibuchi, T. Nakashima, T. Morisawa: Voting in fuzzy rule-based systems for pattern classification problems, Fuzzy Sets Syst. 103(2), 223-238 (1999)
77.38 O. Cordón, F. Herrera: A three-stage evolutionary process for learning descriptive and approximate fuzzy-logic-controller knowledge bases from examples, Int. J. Approx. Reason. 17(4), 369-407 (1997)
77.39 J.G. Marin-Blazquez, Q. Shen: From approximative to descriptive fuzzy classifiers, IEEE Trans. Fuzzy Syst. 10(4), 484-497 (2002)


77.40 P.K. Simpson: Fuzzy min-max neural networks - Part 1: Classification, IEEE Trans. Neural Netw. 3(5), 776-786 (1992)
77.41 S. Abe, M.S. Lan: A method for fuzzy rules extraction directly from numerical data and its application to pattern classification, IEEE Trans. Fuzzy Syst. 3(1), 18-28 (1995)
77.42 S. Abe, R. Thawonmas: A fuzzy classifier with ellipsoidal regions, IEEE Trans. Fuzzy Syst. 5(3), 358-368 (1997)
77.43 D. Nauck, F. Klawonn, R. Kruse: Foundations of Neuro-Fuzzy Systems (Wiley, New York 1997)
77.44 S. Abe: Pattern Classification: Neuro-Fuzzy Methods and Their Comparison (Springer, Berlin 2001)
77.45 O. Cordón, F. Herrera, F. Hoffmann, L. Magdalena: Genetic Fuzzy Systems (World Scientific, Singapore 2001)
77.46 O. Cordón, F. Gomide, F. Herrera, F. Hoffmann, L. Magdalena: Ten years of genetic fuzzy systems: Current framework and new trends, Fuzzy Sets Syst. 141(1), 5-31 (2004)
77.47 F. Herrera: Genetic fuzzy systems: Status, critical considerations and future directions, Int. J. Comput. Intell. Res. 1(1), 59-67 (2005)
77.48 F. Herrera: Genetic fuzzy systems: Taxonomy, current research trends and prospects, Evol. Intell. 1(1), 27-46 (2008)
77.49 K. Shimojima, T. Fukuda, Y. Hasegawa: Self-tuning fuzzy modeling with adaptive membership function, rules, and hierarchical structure based on genetic algorithm, Fuzzy Sets Syst. 71(3), 295-309 (1995)
77.50 H. Ishibuchi, T. Nakashima, T. Murata: Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems, IEEE Trans. Syst. Man Cybern. B 29(5), 601-618 (1999)
77.51 H. Ishibuchi, T. Yamamoto, T. Nakashima: Hybridization of fuzzy GBML approaches for pattern classification problems, IEEE Trans. Syst. Man Cybern. B 35(2), 359-365 (2005)
77.52 H. Ishibuchi, T. Nakashima, T. Murata: Three-objective genetics-based machine learning for linguistic rule extraction, Inf. Sci. 136(1-4), 109-133 (2001)
77.53 H. Ishibuchi, Y. Nojima: Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning, Int. J. Approx. Reason. 44(1), 4-31 (2007)
77.54 J.C. Dunn: A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. 3(3), 32-57 (1973)
77.55 J.C. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum Press, New York 1981)
77.56 J.C. Bezdek, R. Ehrlich, W. Full: FCM: The fuzzy c-means clustering algorithm, Comput. Geosci. 10(2/3), 191-203 (1984)


77.57 S.L. Chiu: Fuzzy model identification based on cluster estimation, J. Intell. Fuzzy Syst. 2(3), 267-278 (1994)
77.58 J.A. Dickerson, B. Kosko: Fuzzy function approximation with ellipsoidal rules, IEEE Trans. Syst. Man Cybern. 26(4), 542-560 (1996)
77.59 C.J. Lin, C.T. Lin: An ART-based fuzzy adaptive learning control network, IEEE Trans. Fuzzy Syst. 5(4), 477-496 (1997)
77.60 M. Delgado, A.F. Gomez-Skarmeta, F. Martin: A fuzzy clustering-based rapid prototyping for fuzzy rule-based modeling, IEEE Trans. Fuzzy Syst. 5(2), 223-233 (1997)
77.61 M. Setnes: Supervised fuzzy clustering for rule extraction, IEEE Trans. Fuzzy Syst. 8(4), 416-424 (2000)
77.62 M. Setnes, R. Babuska, H.B. Verbruggen: Rule-based modeling: Precision and transparency, IEEE Trans. Syst. Man Cybern. C 28(1), 165-169 (1998)
77.63 M. Setnes, R. Babuska, U. Kaymak, H.R. van Nauta Lemke: Similarity measures in fuzzy rule base simplification, IEEE Trans. Syst. Man Cybern. B 28(3), 376-386 (1998)
77.64 Y. Jin, W. von Seelen, B. Sendhoff: On generating FC3 fuzzy rule systems from data using evolution strategies, IEEE Trans. Syst. Man Cybern. B 29(6), 829-845 (1999)
77.65 M. Setnes, H. Roubos: GA-fuzzy modeling and classification: Complexity and performance, IEEE Trans. Fuzzy Syst. 8(5), 509-522 (2000)
77.66 H. Roubos, M. Setnes: Compact and transparent fuzzy models and classifiers through iterative complexity reduction, IEEE Trans. Fuzzy Syst. 9(4), 516-524 (2001)
77.67 J. Abonyi, J.A. Roubos, F. Szeifert: Data-driven generation of compact, accurate, and linguistically sound fuzzy classifiers based on a decision-tree initialization, Int. J. Approx. Reason. 32(1), 1-21 (2003)
77.68 R. Alcalá, J. Alcalá-Fdez, F. Herrera, J. Otero: Genetic learning of accurate and compact fuzzy rule based systems based on the 2-tuples linguistic representation, Int. J. Approx. Reason. 44(1), 45-64 (2007)
77.69 K. Miettinen: Nonlinear Multiobjective Optimization (Kluwer, Boston 1999)
77.70 K. Deb, A. Pratap, S. Agarwal, T. Meyarivan: A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6(2), 182-197 (2002)
77.71 E. Zitzler, L. Thiele: Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach, IEEE Trans. Evol. Comput. 3(4), 257-271 (1999)
77.72 Q. Zhang, H. Li: MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. 11(6), 712-731 (2007)


77.73 N. Beume, B. Naujoks, M. Emmerich: SMS-EMOA: Multiobjective selection based on dominated hypervolume, Eur. J. Oper. Res. 181(3), 1653-1669 (2007)
77.74 P. Stewart, D.A. Stone, P.J. Fleming: Design of robust fuzzy-logic control systems by multi-objective evolutionary methods with hardware in the loop, Eng. Appl. Artif. Intell. 17(3), 275-284 (2004)
77.75 L.H. Chen, C.H. Chiang: An intelligent control system with a multi-objective self-exploration process, Fuzzy Sets Syst. 143(2), 275-294 (2004)
77.76 P. Ducange, B. Lazzerini, F. Marcelloni: Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets, Soft Comput. 14(7), 713-728 (2010)
77.77 C. Setzkorn, R.C. Paton: On the use of multi-objective evolutionary algorithms for the induction of fuzzy classification rule systems, BioSystems 81(2), 101-112 (2005)
77.78 H. Wang, S. Kwong, Y. Jin, W. Wei, K.F. Man: Agent-based evolutionary approach for interpretable rule-based knowledge extraction, IEEE Trans. Syst. Man Cybern. C 35(2), 143-155 (2005)
77.79 C.H. Tsang, S. Kwong, H.L. Wang: Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection, Pattern Recognit. 40(9), 2373-2391 (2007)
77.80 H. Wang, S. Kwong, Y. Jin, W. Wei, K.F. Man: Multi-objective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction, Fuzzy Sets Syst. 149(1), 149-186 (2005)
77.81 Z.Y. Xing, Y. Zhang, Y.L. Hou, L.M. Jia: On generating fuzzy systems based on Pareto multi-objective cooperative coevolutionary algorithm, Int. J. Control Autom. Syst. 5(4), 444-455 (2007)
77.82 J. Casillas, O. Cordón, F. Herrera, L. Magdalena (Eds.): Interpretability Issues in Fuzzy Modeling (Springer, Berlin 2003)
77.83 J.M. Alonso, L. Magdalena, G. González-Rodríguez: Looking for a good fuzzy system interpretability index: An experimental approach, Int. J. Approx. Reason. 51(1), 115-134 (2009)
77.84 H. Ishibuchi, Y. Kaisho, Y. Nojima: Design of linguistically interpretable fuzzy rule-based classifiers: A short review and open questions, J. Mult.-Valued Log. Soft Comput. 17(2/3), 101-134 (2011)
77.85 J.M. Alonso, L. Magdalena: Editorial: Special issue on interpretable fuzzy systems, Inf. Sci. 181(20), 4331-4339 (2011)
77.86 M.J. Gacto, R. Alcalá, F. Herrera: Interpretability of linguistic fuzzy rule-based systems: An overview of interpretability measures, Inf. Sci. 181(20), 4340-4360 (2011)
77.87 C. Mencar, C. Castiello, R. Cannone, A.M. Fanelli: Interpretability assessment of fuzzy knowledge bases: A cointension based approach, Int. J. Approx. Reason. 52(4), 501-518 (2011)
77.88 O. Cordón: A historical review of evolutionary learning methods for Mamdani-type fuzzy rule-based systems: Designing interpretable genetic fuzzy systems, Int. J. Approx. Reason. 52(6), 894-913 (2011)
77.89 H. Ishibuchi, Y. Nojima: Toward quantitative definition of explanation ability of fuzzy rule-based classifiers, Proc. 2011 IEEE Int. Conf. Fuzzy Syst. (IEEE, Taipei 2011) pp. 549-556
77.90 M. Antonelli, P. Ducange, B. Lazzerini, F. Marcelloni: Learning concurrently partition granularities and rule bases of Mamdani fuzzy systems in a multi-objective evolutionary framework, Int. J. Approx. Reason. 50(7), 1066-1080 (2009)
77.91 A. Botta, B. Lazzerini, F. Marcelloni, D.C. Stefanescu: Context adaptation of fuzzy systems through a multi-objective evolutionary approach based on a novel interpretability index, Soft Comput. 13(5), 437-449 (2009)
77.92 M.J. Gacto, R. Alcalá, F. Herrera: Integration of an index to preserve the semantic interpretability in the multiobjective evolutionary rule selection and tuning of linguistic fuzzy systems, IEEE Trans. Fuzzy Syst. 18(3), 515-531 (2010)
77.93 Y. Zhang, X.B. Wu, Z.Y. Xing, W.L. Hu: On generating interpretable and precise fuzzy systems based on Pareto multi-objective cooperative co-evolutionary algorithm, Appl. Soft Comput. 11(1), 1284-1294 (2011)
77.94 R. Alcalá, Y. Nojima, F. Herrera, H. Ishibuchi: Multiobjective genetic fuzzy rule selection of single granularity-based fuzzy classification rules and its interaction with the lateral tuning of membership functions, Soft Comput. 15(12), 2303-2318 (2011)
77.95 C.A.C. Coello, G.T. Pulido, M.S. Lechuga: Handling multiple objectives with particle swarm optimization, IEEE Trans. Evol. Comput. 8(3), 256-279 (2004)
77.96 D. Liu, K.C. Tan, C.K. Goh, W.K. Ho: A multiobjective memetic algorithm based on particle swarm optimization, IEEE Trans. Syst. Man Cybern. B 37(1), 42-50 (2007)
77.97 Y. Wang, Y. Yang: Particle swarm optimization with preference order ranking for multi-objective optimization, Inf. Sci. 179(12), 1944-1959 (2009)
77.98 A. Elhossini, S. Areibi, R. Dony: Strength Pareto particle swarm optimization and hybrid EA-PSO for multi-objective optimization, Evol. Comput. 18(1), 127-156 (2010)
77.99 C.K. Goh, K.C. Tan, D.S. Liu, S.C. Chiam: A competitive and cooperative co-evolutionary approach to multi-objective particle swarm optimization algorithm design, Eur. J. Oper. Res. 202(1), 42-54 (2010)
77.100 A.R.M. Rao, K. Sivasubramanian: Multi-objective optimal design of fuzzy logic controller using a self configurable swarm intelligence algorithm, Comput. Struct. 86(23/24), 2141-2154 (2008)
77.101 M. Marinaki, Y. Marinakis, G.E. Stavroulakis: Fuzzy control optimized by a multi-objective particle swarm optimization algorithm for vibration suppression of smart structures, Struct. Multidiscip. Optim. 43(1), 29-42 (2011)
77.102 C.N. Nyirenda, D.S. Dawoud, F. Dong, M. Negnevitsky, K. Hirota: A fuzzy multiobjective particle swarm optimized TS fuzzy logic congestion controller for wireless local area networks, J. Adv. Comput. Intell. Intell. Inf. 15(1), 41-54 (2011)
77.103 Q. Zhang, M. Mahfouf: A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels, Appl. Soft Comput. 11(2), 2419-2443 (2011)
77.104 B.J. Park, J.N. Choi, W.D. Kim, S.K. Oh: Analytic design of information granulation-based fuzzy radial basis function neural networks with the aid of multiobjective particle swarm optimization, Int. J. Intell. Comput. Cybern. 5(1), 4-35 (2012)
77.105 H. Ishibuchi, N. Tsukamoto, Y. Nojima: Evolutionary many-objective optimization: A short review, Proc. 2008 IEEE Congr. Evol. Comput. (IEEE, Hong Kong 2008) pp. 2424-2431
77.106 H. Ishibuchi, N. Akedo, H. Ohyanagi, Y. Nojima: Behavior of EMO algorithms on many-objective optimization problems with correlated objectives, Proc. 2011 IEEE Congr. Evol. Comput. (IEEE, New Orleans 2011) pp. 1465-1472
77.107 O. Schutze, A. Lara, C.A.C. Coello: On the influence of the number of objectives on the hardness of a multiobjective optimization problem, IEEE Trans. Evol. Comput. 15(4), 444-455 (2011)


77.108 C.F. Juang, C.M. Lu, C. Lo, C.Y. Wang: Ant colony optimization algorithm for fuzzy controller design and its FPGA implementation, IEEE Trans. Ind. Electron. 55(3), 1453-1462 (2008)
77.109 C.F. Juang, C.Y. Wang: A self-generating fuzzy system with ant and particle swarm cooperative optimization, Expert Syst. Appl. 36(3), 5362-5370 (2009)
77.110 C.F. Juang, P.H. Chang: Designing fuzzy-rule-based systems using continuous ant-colony optimization, IEEE Trans. Fuzzy Syst. 18(1), 138-149 (2010)
77.111 C.F. Juang, P.H. Chang: Recurrent fuzzy system design using elite-guided continuous ant colony optimization, Appl. Soft Comput. 11(2), 2687-2697 (2011)
77.112 G.M. Fathi, A.M. Saniee: A fuzzy classification system based on ant colony optimization for diabetes disease diagnosis, Expert Syst. Appl. 38(12), 14650-14659 (2011)
77.113 S. Wang, M. Mahfouf: Multi-objective optimisation for fuzzy modelling using interval type-2 fuzzy sets, Proc. 2012 IEEE Int. Conf. Fuzzy Syst. (IEEE, Brisbane 2012) pp. 722-729
77.114 O. Castillo, P. Melin, A.A. Garza, O. Montiel, R. Sepulveda: Optimization of interval type-2 fuzzy logic controllers using evolutionary algorithms, Soft Comput. 15(6), 1145-1160 (2011)
77.115 O. Castillo, P. Melin: A review on the design and optimization of interval type-2 fuzzy controllers, Appl. Soft Comput. 12(4), 1267-1278 (2012)
77.116 O. Castillo, R. Martinez-Marroquin, P. Melin, F. Valdez, J. Soria: Comparative study of bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for an autonomous mobile robot, Inf. Sci. 192(1), 19-38 (2012)




78. Bio-Inspired Optimization 
of Type-2 Fuzzy Controllers 


Oscar Castillo 


A review of the bio-inspired optimization methods
used in the design of type-2 fuzzy systems,
which are relatively novel models of imprecision,
is considered in this chapter. The fundamental
focus of the work is based on the basic reasons
for the need for optimization of type-2 fuzzy systems
for different areas of application. Recently,
bio-inspired methods have emerged as powerful
optimization algorithms for solving complex problems.
In the case of designing type-2 fuzzy systems
for particular applications, the use of bio-inspired
optimization methods has helped in the complex
task of finding the appropriate parameter values
and structure of fuzzy systems. In this chapter,
we consider the application of genetic algorithms,
particle swarm optimization, and ant colony optimization
as three different paradigms that help in
the design of optimal type-2 fuzzy systems. We also
provide a comparison of the different optimization
methods for the case of designing type-2 fuzzy
systems.

78.1 Related Work in Type-2 Fuzzy Control
78.2 Fuzzy Logic Systems
78.2.1 Type-1 Fuzzy Logic Systems
78.2.2 Type-2 Fuzzy Logic Systems
78.3 Bio-Inspired Optimization Methods
78.3.1 Particle Swarm Optimization
78.3.2 Genetic Algorithms
78.3.3 Ant Colony Optimization
78.3.4 General Remarks About Optimization of Type-2 Fuzzy Systems
78.4 General Overview of the Area and Future Trends
78.5 Conclusions
References


78.1 Related Work in Type-2 Fuzzy Control 


Uncertainty affects decision-making and appears in 
a number of different forms. The concept of infor- 
mation is fully connected with the concept of un- 
certainty [78.1]. The most fundamental aspect of this 
connection is that the uncertainty involved in any 
problem-solving situation is a result of some informa- 
tion deficiency, which may be incomplete, imprecise, 
fragmentary, not fully reliable, vague, contradictory, or 
deficient in some other way. Uncertainty is an attribute 
of information [78.2]. The general framework of fuzzy 
reasoning allows handling much of this uncertainty, and 
fuzzy systems that employ type-1 fuzzy sets represent 
uncertainty by numbers in the range [0, 1]. When
something is uncertain, like a measurement, it is difficult to
determine its exact value, and of course using type-1
fuzzy sets makes more sense than using crisp sets [78.3].


However, it is not reasonable to use an accurate mem- 
bership function for something uncertain, so in this case 
what we need are higher-order fuzzy sets, which are 
able to handle these uncertainties, like the so-called 
type-2 fuzzy sets [78.3]. So, the degree of uncertainty 
can be managed by using type-2 fuzzy logic because 
this offers better capabilities to handle linguistic un- 
certainties by modeling vagueness and unreliability of 
information [78.4—6]. 

Recently, we have seen the use of type-2 fuzzy 
sets in fuzzy logic systems (FLS) in different ar- 
eas of application [78.7-11]. In this paper we deal 
with the application of interval type-2 fuzzy con- 
trol to non-linear dynamic systems [78.4, 12-15]. It 
is a well-known fact that in the control of real systems, the instrumentation elements (instrumentation amplifier, sensors, digital-to-analog and analog-to-digital converters, etc.) introduce some sort of unpredictable values in the information that has been collected [78.16]. So, the controllers designed under idealized conditions tend to behave in an inappropriate manner [78.17].


Part G | Hybrid Systems


78.2 Fuzzy Logic Systems


In this section, a brief overview of type-1 and type-2 fuzzy systems is presented. This overview is considered to be necessary to understand the basic concepts needed to develop the methods and algorithms presented later in the chapter.


78.2.1 Type-1 Fuzzy Logic Systems


Soft computing techniques have become an important research topic that can be applied in the design of intelligent controllers, which utilize human experience in a more natural form than the conventional mathematical approach [78.18, 19]. An FLS described completely in terms of type-1 fuzzy sets is called a type-1 fuzzy logic system (type-1 FLS). In this paper, the fuzzy controller has two input variables, which are the error $e(t)$ and the error variation $\Delta e(t)$,

$$e(t) = r(t) - y(t), \tag{78.1}$$
$$\Delta e(t) = e(t) - e(t-1), \tag{78.2}$$

so the control system can be represented as shown in Fig. 78.1.


Fig. 78.1 System used to obtain the experimental results (block diagram: the error feeds the type-2 FLC, whose output u drives the plant or process y = f(u); uncertainty can be introduced to the system through y = y + 0.05 · randn)

Fig. 78.2 Type-1 membership function
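The two controller inputs of (78.1) and (78.2), together with the noise injection shown in Fig. 78.1, can be sketched as follows. This is only an illustrative sketch: the function names `control_errors` and `noisy`, and the convention of setting the first error variation to zero, are assumptions and are not taken from the chapter.

```python
import random


def control_errors(reference, outputs):
    """Return (e, delta_e) for e(t) = r(t) - y(t) and delta_e(t) = e(t) - e(t-1).

    Assumption: delta_e at the first sample is set to 0 because e(t-1) is unknown.
    """
    errors = [r - y for r, y in zip(reference, outputs)]
    deltas = [0.0] + [errors[t] - errors[t - 1] for t in range(1, len(errors))]
    return errors, deltas


def noisy(y, sigma=0.05, rng=random):
    """Perturb a plant output as in Fig. 78.1: y + 0.05 * randn."""
    return y + sigma * rng.gauss(0.0, 1.0)


# Example: constant reference r = 1 and three plant output samples.
e, de = control_errors([1.0, 1.0, 1.0], [0.0, 0.4, 0.7])
```

The pair (e, de) at each sample is what the fuzzy controller would receive as its two crisp inputs.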


78.2.2 Type-2 Fuzzy Logic Systems 


If for a type-1 membership function, as in Fig. 78.2, 
we blur it to the left and to the right, as illustrated 
in Fig. 78.3, then a type-2 membership function is 
obtained. In this case, for a specific value x’, the mem- 
bership function (u’) takes on different values, which 
are not all weighted the same, so we can assign an am- 
plitude distribution to all of those points. 

A type-2 fuzzy set $\tilde{A}$ is characterized by the membership function [78.1, 3]

$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\}, \tag{78.3}$$


Fig. 78.3 Blurred type-1 membership function


Fig. 78.4 Interval type-2 membership function (the shaded region is bounded by the upper MF and the lower MF)


in which $0 \le \mu_{\tilde{A}}(x, u) \le 1$. Another expression for $\tilde{A}$ is

$$\tilde{A} = \int_{x \in X} \int_{u \in J_x} \mu_{\tilde{A}}(x, u) / (x, u), \quad J_x \subseteq [0, 1], \tag{78.4}$$

where $\int\int$ denotes the union over all admissible input variables $x$ and $u$. For discrete universes of discourse, $\int$ is replaced by $\sum$. In fact, $J_x \subseteq [0, 1]$ represents the primary membership of $x$, and $\mu_{\tilde{A}}(x, u)$ is a type-1 fuzzy set known as the secondary set. Hence, a type-2 membership grade can be any subset in [0, 1], the primary membership, and corresponding to each primary membership, there is a secondary membership (which can also be in [0, 1]) that defines the possibilities for the primary membership. Uncertainty is represented by a region, which is called the footprint of uncertainty (FOU). When $\mu_{\tilde{A}}(x, u) = 1, \forall u \in J_x \subseteq [0, 1]$, we have an interval type-2 membership function, as shown in Fig. 78.4. The uniform shading for the FOU represents the entire interval type-2 fuzzy set and it can be described in terms of an upper membership function $\bar{\mu}_{\tilde{A}}(x)$ and a lower membership function $\underline{\mu}_{\tilde{A}}(x)$.
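To make the upper and lower membership functions concrete, the following sketch evaluates an interval type-2 set at a point. The Gaussian shape with a fixed mean and an uncertain standard deviation is an assumption chosen for illustration (a common parameterization, not one prescribed by this chapter), and the names `gaussian` and `it2_gaussian` are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership grade in [0, 1]."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_gaussian(x, mean, sigma_lo, sigma_hi):
    """Return (lower, upper) membership grades of an interval type-2 set.

    Assumption: the FOU is generated by a Gaussian with fixed mean whose
    standard deviation lies in [sigma_lo, sigma_hi]. The narrow Gaussian is
    the lower MF and the wide Gaussian is the upper MF.
    """
    lower = gaussian(x, mean, sigma_lo)
    upper = gaussian(x, mean, sigma_hi)
    return lower, upper


lower, upper = it2_gaussian(1.0, mean=0.0, sigma_lo=0.8, sigma_hi=1.2)
```

For every x the interval [lower, upper] is the primary membership $J_x$; all points inside it carry secondary grade 1, which is what makes the set an interval type-2 set.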

An FLS described using at least one type-2 fuzzy set is called a type-2 FLS. Type-1 FLSs are unable to directly handle rule uncertainties, because they use type-1 fuzzy sets that are certain [78.3]. On the other hand, type-2 FLSs are very useful in circumstances where it is difficult to determine an exact membership function and there are measurement uncertainties [78.14, 20, 21].

A type-2 FLS is again characterized by IF-THEN rules, but its antecedent or consequent sets are now of type-2. Similar to a type-1 FLS, a type-2 FLS includes a fuzzifier, a rule base, a fuzzy inference engine, and an output processor, as we can see in Fig. 78.5. The output processor includes a type-reducer and a defuzzifier; it generates a type-1 fuzzy set output (type-reducer) or a crisp number (defuzzifier).


Fuzzifier 

The fuzzifier maps a crisp point $x = (x_1, \ldots, x_p)^T \in X_1 \times X_2 \times \cdots \times X_p \equiv X$ into a type-2 fuzzy set $\tilde{A}_x$ in $X$ [78.1], interval type-2 fuzzy sets in this case. We will use a type-2 singleton fuzzifier; in singleton fuzzification, the input fuzzy set has only a single point of nonzero membership [78.3]. $\tilde{A}_x$ is a type-2 fuzzy singleton if $\mu_{\tilde{A}_x}(x) = 1/1$ for $x = x'$ and $\mu_{\tilde{A}_x}(x) = 1/0$ for all other $x \ne x'$ [78.1].


Rules 
The structure of rules in a type-1 FLS and a type-2 FLS is the same, but in the latter the antecedents and the consequents will be represented by type-2 fuzzy sets. So for a type-2 FLS with $p$ inputs $x_1 \in X_1, \ldots, x_p \in X_p$ and one output $y \in Y$ (multiple input single output, MISO), if we assume that there are $M$ rules, the $l$-th rule in the type-2 FLS can be written as follows [78.3]

$$R^l : \text{IF } x_1 \text{ is } \tilde{F}_1^l \text{ and } \cdots \text{ and } x_p \text{ is } \tilde{F}_p^l, \text{ THEN } y \text{ is } \tilde{G}^l, \quad l = 1, \ldots, M. \tag{78.5}$$

Fig. 78.5 Type-2 FLS (block diagram: crisp inputs, fuzzification, inference driven by the rules, and an output processor consisting of a type-reducer and a defuzzifier that delivers the crisp output)

Inference 
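In the type-2 FLS, the inference engine combines rules and gives a mapping from input type-2 fuzzy sets to output type-2 fuzzy sets. It is necessary to compute the join $\sqcup$ (unions) and the meet $\sqcap$ (intersections), as well as extended sup-star compositions of type-2 relations [78.3]. If

$$R^l : \tilde{F}_1^l \times \cdots \times \tilde{F}_p^l \to \tilde{G}^l = \tilde{A}^l \to \tilde{G}^l, \tag{78.6}$$

$R^l$ is described by the membership function $\mu_{R^l}(x, y) = \mu_{R^l}(x_1, \ldots, x_p, y)$, where

$$\mu_{R^l}(x, y) = \mu_{\tilde{A}^l \to \tilde{G}^l}(x, y) \tag{78.7}$$

can be written as [78.3]

$$\mu_{R^l}(x, y) = \mu_{\tilde{A}^l \to \tilde{G}^l}(x, y) = \mu_{\tilde{F}_1^l}(x_1) \sqcap \cdots \sqcap \mu_{\tilde{F}_p^l}(x_p) \sqcap \mu_{\tilde{G}^l}(y) = \left[\sqcap_{i=1}^{p} \mu_{\tilde{F}_i^l}(x_i)\right] \sqcap \mu_{\tilde{G}^l}(y). \tag{78.8}$$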


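In general, the p-dimensional input to $R^l$ is given by the type-2 fuzzy set $\tilde{A}_x$, whose membership function is

$$\mu_{\tilde{A}_x}(x) = \mu_{\tilde{X}_1}(x_1) \sqcap \cdots \sqcap \mu_{\tilde{X}_p}(x_p) = \sqcap_{i=1}^{p} \mu_{\tilde{X}_i}(x_i), \tag{78.9}$$

where $\tilde{X}_i$ ($i = 1, \ldots, p$) are the labels of the fuzzy sets describing the inputs. Each rule $R^l$ determines a type-2 fuzzy set $\tilde{B}^l = \tilde{A}_x \circ R^l$ such that [78.3]

$$\mu_{\tilde{B}^l}(y) = \mu_{\tilde{A}_x \circ R^l}(y) = \sqcup_{x \in X} \left[\mu_{\tilde{A}_x}(x) \sqcap \mu_{R^l}(x, y)\right], \quad y \in Y, \; l = 1, \ldots, M. \tag{78.10}$$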


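This equation is the input/output relation in Fig. 78.5 between the type-2 fuzzy set that activates one rule in the inference engine and the type-2 fuzzy set at the output of that engine [78.3]. In the FLS we used interval type-2 fuzzy sets and meet under product t-norm, so the result of the input and antecedent operations, which are contained in the firing set $\sqcap_{i=1}^{p} \mu_{\tilde{F}_i^l}(x_i') \equiv F^l(x')$, is an interval type-1 set [78.3],

$$F^l(x') = [\underline{f}^l(x'), \bar{f}^l(x')] \equiv [\underline{f}^l, \bar{f}^l], \tag{78.11}$$

where

$$\underline{f}^l(x') = \underline{\mu}_{\tilde{F}_1^l}(x_1') * \cdots * \underline{\mu}_{\tilde{F}_p^l}(x_p'), \tag{78.12}$$
$$\bar{f}^l(x') = \bar{\mu}_{\tilde{F}_1^l}(x_1') * \cdots * \bar{\mu}_{\tilde{F}_p^l}(x_p'), \tag{78.13}$$

where $*$ is the product operation.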
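The firing interval of (78.11)-(78.13) can be sketched in a few lines. The function name `firing_interval` and the representation of each antecedent grade as a (lower, upper) pair are assumptions made for illustration.

```python
def firing_interval(antecedent_grades):
    """Return the firing interval (f_lower, f_upper) of one rule.

    antecedent_grades holds one (lower, upper) membership pair per input,
    already evaluated at the crisp input x'. With singleton fuzzification and
    the product t-norm, the lower bounds are multiplied together (78.12) and
    the upper bounds are multiplied together (78.13).
    """
    f_lower, f_upper = 1.0, 1.0
    for lower, upper in antecedent_grades:
        f_lower *= lower
        f_upper *= upper
    return f_lower, f_upper


# Two inputs, e.g. the error and the error variation of the controller.
f_lo, f_hi = firing_interval([(0.5, 0.8), (0.4, 0.9)])
```

In this example the lower bound is 0.5 · 0.4 and the upper bound is 0.8 · 0.9, so the rule fires with the interval [0.2, 0.72].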


Type-Reducer 
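The type-reducer generates a type-1 fuzzy set output, which is then converted into a crisp output through the defuzzifier. This type-1 fuzzy set is also an interval set; for the case of our FLS we used center of sets (cos) type reduction, $Y_{\cos}$, which is expressed as [78.3]

$$Y_{\cos}(x) = [y_l, y_r] = \int_{y^1 \in [y_l^1, y_r^1]} \cdots \int_{y^M \in [y_l^M, y_r^M]} \int_{f^1 \in [\underline{f}^1, \bar{f}^1]} \cdots \int_{f^M \in [\underline{f}^M, \bar{f}^M]} 1 \Big/ \frac{\sum_{i=1}^{M} f^i y^i}{\sum_{i=1}^{M} f^i}. \tag{78.14}$$

This interval set is determined by its two end points, $y_l$ and $y_r$, which correspond to the centroid of the type-2 interval consequent set $\tilde{G}^i$ [78.3],

$$C_{\tilde{G}^i} = \int_{\theta_1 \in J_{y_1}} \cdots \int_{\theta_N \in J_{y_N}} 1 \Big/ \frac{\sum_{i=1}^{N} y_i \theta_i}{\sum_{i=1}^{N} \theta_i} = [y_l^i, y_r^i]. \tag{78.15}$$

Before the computation of $Y_{\cos}(x)$, we must evaluate (78.15) and its two end points, $y_l$ and $y_r$. If the values of $f_i$ and $y_i$ that are associated with $y_l$ are denoted $f_l^i$ and $y_l^i$, respectively, and the values of $f_i$ and $y_i$ that are associated with $y_r$ are denoted $f_r^i$ and $y_r^i$, respectively, from (78.14), we have [78.3]

$$y_l = \frac{\sum_{i=1}^{M} f_l^i y_l^i}{\sum_{i=1}^{M} f_l^i}, \tag{78.16}$$

$$y_r = \frac{\sum_{i=1}^{M} f_r^i y_r^i}{\sum_{i=1}^{M} f_r^i}. \tag{78.17}$$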


Defuzzifiers
From the type-reducer we obtain an interval set $Y_{\cos}$; to defuzzify it we use the average of $y_l$ and $y_r$, so the defuzzified output of an interval singleton type-2 FLS is [78.3]

$$y(x) = \frac{y_l + y_r}{2}. \tag{78.18}$$
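The end points in (78.16) and (78.17) and the average in (78.18) can be sketched as follows. The chapter does not state how the firing strengths $f_l^i$ and $f_r^i$ are selected, so this sketch makes an assumption: it searches every switch point exhaustively, which is a simple substitute for the iterative procedures usually used in practice. All function names are hypothetical.

```python
def _end_point(y, f_lower, f_upper, want_min):
    """Extreme value of sum(f*y)/sum(f) with each f_i in [f_lower_i, f_upper_i].

    The extreme is reached by using one firing bound for the rules left of a
    switch point (in order of increasing y) and the other bound to the right,
    so every switch point is tried and the best value is kept.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    best = None
    for k in range(len(y) + 1):
        num = den = 0.0
        for pos, i in enumerate(order):
            left = pos < k
            # Minimum: heavy weights on small y. Maximum: heavy weights on large y.
            f = f_upper[i] if left == want_min else f_lower[i]
            num += f * y[i]
            den += f
        if den > 0.0:
            value = num / den
            if best is None or (value < best if want_min else value > best):
                best = value
    return best  # None if no rule fires at all


def cos_type_reduce(y_left, y_right, f_lower, f_upper):
    """Return (y_l, y_r) from consequent centroid intervals and firing intervals."""
    return (_end_point(y_left, f_lower, f_upper, True),
            _end_point(y_right, f_lower, f_upper, False))


def defuzzify(y_l, y_r):
    """Crisp output of (78.18): the average of the two end points."""
    return (y_l + y_r) / 2.0


# Two rules: centroid intervals [1, 2] and [3, 4]; firing intervals [0.2, 0.6] and [0.3, 0.9].
y_l, y_r = cos_type_reduce([1.0, 3.0], [2.0, 4.0], [0.2, 0.3], [0.6, 0.9])
crisp = defuzzify(y_l, y_r)
```

When the lower and upper firing strengths coincide, both end points reduce to ordinary weighted averages, which is the type-1 special case.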


78.3 Bio-Inspired Optimization Methods 


In this section a brief overview of the basic concepts 
from bio-inspired optimization methods needed for this 
work is presented. 


78.3.1 Particle Swarm Optimization 


Particle swarm optimization (PSO) is a population-based stochastic optimization technique, which was developed by Eberhart and Kennedy in 1995. It was inspired by the social behavior of bird flocking or fish schooling [78.7]. PSO shares many similarities with evolutionary computation techniques such as the genetic algorithm (GA) [78.22].

The system is initialized with a population of ran- 
dom solutions and searches for optima by updating 
generations. However, unlike the GA, PSO has no 
evolution operators such as crossover and mutation. 
In PSO, the potential solutions, called particles, fly 
through the problem space by following the current op- 
timum particles [78.18]. Each particle keeps track of 
its coordinates in the problem space, which are asso- 
ciated with the best solution (fitness) it has achieved 
so far (the fitness value is also stored). This value 
is called pbest. Another best value that is tracked by
the particle swarm optimizer is the best value obtained
so far by any particle in the neighborhood of the
particle. This location is called lbest. When a particle
takes all the population as its topological neighbors,
the best value is a global best and is called
gbest [78.15].

The particle swarm optimization concept consists of, at each time step, changing the velocity of (accelerating) each particle toward its pbest and lbest locations (local version of PSO). Acceleration is weighted by a random term, with separate random numbers being generated for acceleration toward pbest and lbest locations [78.7]. In the past several years, PSO has been successfully applied in many research and application areas. PSO has been reported to obtain better results in a faster, cheaper way compared with other methods [78.15]. Another reason that PSO is attractive is that there are few parameters to adjust. One version, with slight variations, works well in a wide variety of applications. Particle swarm optimization has been considered for approaches that can be used across a wide range of applications, as well as for specific applications focused on a specific requirement.

The basic algorithm of PSO has the following 
nomenclature: 


$x_k^i$: Particle position
$v_k^i$: Particle velocity
$w_{ij}$: Inertia weight
$p_k^i$: Best remembered individual particle position
$p_k^g$: Best remembered swarm position
$c_1, c_2$: Cognitive and social parameters
$r_1, r_2$: Random numbers between 0 and 1.


The equation to calculate the velocity is

$$v_{k+1}^i = w_{ij} v_k^i + c_1 r_1 (p_k^i - x_k^i) + c_2 r_2 (p_k^g - x_k^i), \tag{78.19}$$

and the position of the individual particles is updated as follows

$$x_{k+1}^i = x_k^i + v_{k+1}^i. \tag{78.20}$$


The basic PSO algorithm is defined as follows: 


1) Initialize
   a) Set constants $k_{\max}$, $c_1$, $c_2$
   b) Randomly initialize particle position $x_0^i \in D$ in $\mathbb{R}^n$ for $i = 1, \ldots, p$
   c) Randomly initialize particle velocities $0 \le v_0^i \le v_0^{\max}$ for $i = 1, \ldots, p$
   d) Set $k = 1$
2) Optimize
   a) Evaluate function value $f_k^i$ using design space coordinates $x_k^i$
   b) If $f_k^i \le f_{best}^i$ then $f_{best}^i = f_k^i$, $p_k^i = x_k^i$
   c) If $f_k^i \le f_{best}^g$ then $f_{best}^g = f_k^i$, $p_k^g = x_k^i$
   d) If stopping condition is satisfied then go to 3.
   e) Update all particle velocities $v_k^i$ for $i = 1, \ldots, p$
   f) Update all particle positions $x_k^i$ for $i = 1, \ldots, p$
   g) Increment $k$.
   h) Go to 2(a).
3) Terminate.
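The steps of the basic PSO algorithm can be sketched as follows for the global-best version, using the velocity and position updates of (78.19) and (78.20). The parameter values (inertia weight 0.7, c1 = c2 = 1.5, 20 particles, 100 iterations), the fixed iteration budget as the only stopping condition, and the sphere test function are assumptions chosen for illustration; they are not the settings used in the chapter.

```python
import random


def pso(f, dim, bounds, n_particles=20, k_max=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f with a global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.1 * (hi - lo)  # assumed bound on the initial velocities
    # Step 1: random positions and velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(0.0, v_max) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    f_best = [float("inf")] * n_particles
    g_best, fg_best = x[0][:], float("inf")
    for _ in range(k_max):
        # Steps 2a-2c: evaluate and update the individual and swarm bests.
        for i in range(n_particles):
            fi = f(x[i])
            if fi <= f_best[i]:
                f_best[i], p_best[i] = fi, x[i][:]
            if fi <= fg_best:
                fg_best, g_best = fi, x[i][:]
        # Steps 2e-2f: velocity update (78.19), then position update (78.20).
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                x[i][d] += v[i][d]
    return g_best, fg_best


def sphere(point):
    """Toy objective with its minimum 0 at the origin."""
    return sum(c * c for c in point)


best_x, best_f = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

For controller design, the objective f would instead decode the particle into fuzzy system parameters, simulate the closed loop, and return an error measure.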


78.3.2 Genetic Algorithms 


Genetic algorithms (GAs) are adaptive heuristic search 
algorithms based on the evolutionary ideas of natu- 
ral selection and genetic processes [78.21]. The basic 
principles of GAs were first proposed by Holland in 
1975, inspired by the mechanism of natural selection, 
where stronger individuals are likely to be the winners 
in a competing environment [78.22]. GA assumes that 
the potential solution of any problem is an individual 
and can be represented by a set of parameters. These pa- 
rameters are regarded as the genes of a chromosome and 
can be structured by a string of values in binary form. 
A positive value, generally known as a fitness value, is 
used to reflect the degree of goodness of the chromo- 
some for the problem, which would be highly related 
with its objective value. The pseudocode of a GA is as 
follows: 


1) Start with a randomly generated population of n chromosomes (candidate solutions to a problem).
2) Calculate the fitness of each chromosome in the population.
3) Repeat the following steps until n offspring have been created:
   a) Select a pair of parent chromosomes from the current population, the probability of selection being an increasing function of fitness. Selection is done with replacement, meaning that the same chromosome can be selected more than once to become a parent.
   b) With a given probability (the crossover rate), perform crossover on the pair at a randomly chosen point to form two offspring.
   c) Mutate the two offspring at each locus with a given probability (the mutation rate), and place the resulting chromosomes in the new population.
4) Replace the current population with the new population.
5) Go to step 2.


The simple procedure just described above is the 
basis for most applications of GAs found in the liter- 
ature [78.23, 24]. 
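The simple GA described above can be sketched as follows for binary chromosomes. Fitness-proportional (roulette-wheel) selection, single-point crossover, a fixed number of generations, and the bit-counting toy fitness are assumptions made for illustration; the function and parameter names are hypothetical.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=30, generations=60,
                      crossover_rate=0.7, mutation_rate=0.01, seed=0):
    """Maximize a strictly positive fitness over bit strings.

    pop_size is assumed to be even, because offspring are created in pairs.
    Returns (best_chromosome, best_fitness) seen over all generations.
    """
    rng = random.Random(seed)
    # Random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]           # fitness of each chromosome
        new_pop = []
        while len(new_pop) < pop_size:               # create n offspring
            # Selection with replacement, probability increasing with fitness.
            mum, dad = rng.choices(pop, weights=scores, k=2)
            a, b = mum[:], dad[:]
            if rng.random() < crossover_rate:        # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                a, b = mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]
            for child in (a, b):                     # mutation at each locus
                for locus in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[locus] ^= 1
                new_pop.append(child)
        pop = new_pop                                # replace the population
        best = max(pop + [best], key=fitness)[:]
    return best, fitness(best)


def count_ones(chromosome):
    """Toy fitness: number of 1 bits, shifted by 1 so it is never zero."""
    return 1 + sum(chromosome)


best, best_fit = genetic_algorithm(count_ones, n_bits=20)
```

In a fuzzy design problem the bit string would encode membership function parameters or rule consequents, and the fitness would come from evaluating the resulting controller.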


78.3.3 Ant Colony Optimization 


Ant colony optimization (ACO) is a probabilistic tech- 
nique that can be used for solving problems that can 
be reduced to finding good paths along graphs. This
method was inspired by the behavior exhibited by
ants in finding paths from the nest or colony to the food
source.

Simple ant colony optimization (S-ACO) is an al- 
gorithmic implementation that adapts the behavior of 
real ants to solutions of minimum cost path problems 
on graphs [78.11]. A number of artificial ants build 
solutions for a certain optimization problem and ex- 
change information about the quality of these solutions 
making allusion to the communication system of real 
ants [78.25]. 

Let us define the graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the matrix of the links between nodes. $G$ has $n_G = |V|$ nodes. Let us define $L^k$ as the number of hops in the path built by the ant $k$ from the origin node to the destination node. Therefore, it is necessary to find

$$Q = \{q_a, \ldots, q_f \mid q_1 \in C\}, \tag{78.21}$$

where $Q$ is the set of nodes representing a continuous path with no obstacles; $q_a, \ldots, q_f$ are former nodes of the path, and $C$ is the set of possible configurations of the free space. If $x^k(t)$ denotes a $Q$ solution in time $t$, $f(x^k(t))$ expresses the quality of the solution. The S-ACO algorithm is based on (78.22)-(78.24)

$$p_{ij}^k(t) = \begin{cases} \dfrac{\tau_{ij}^{\alpha}(t)}{\sum_{j \in N_i^k} \tau_{ij}^{\alpha}(t)} & \text{if } j \in N_i^k, \\ 0 & \text{if } j \notin N_i^k, \end{cases} \tag{78.22}$$

$$\tau_{ij}(t) \leftarrow (1 - \rho)\,\tau_{ij}(t), \tag{78.23}$$

$$\tau_{ij}(t+1) = \tau_{ij}(t) + \sum_{k=1}^{n_k} \Delta\tau_{ij}^k(t). \tag{78.24}$$
k=1 


Equation (78.22) represents the probability that an ant $k$ located on a node $i$ selects the next node, denoted by $j$, where $N_i^k$ is the set of feasible nodes (in a neighborhood) connected to node $i$ with respect to ant $k$, $\tau_{ij}$ is the total pheromone concentration of link $ij$, and $\alpha$ is a positive constant used as a gain for the pheromone influence.

Equation (78.23) represents the evaporation pheromone update, where $\rho \in [0, 1]$ is the evaporation rate value of the pheromone trail. The evaporation is added to the algorithm in order to force the exploration of the ants and avoid premature convergence to sub-optimal solutions. For $\rho = 1$ the search becomes completely random.

Equation (78.24) represents the concentration pheromone update, where $\Delta\tau_{ij}^k$ is the amount of pheromone that an ant $k$ deposits in a link $ij$ in a time $t$.

The general steps of S-ACO are the following: 


1. Set a pheromone concentration $\tau_{ij}$ to each link $(i, j)$.
2. Place the ants $k = 1, 2, \ldots, n_k$ in the nest.
3. Iteratively build a path to the food source (destination node), using (78.22) for every ant.
   - Remove cycles and compute each route weight $f(x^k(t))$. A cycle could be generated when there are no feasible candidate nodes, that is, for any $i$ and any $k$, $N_i^k = \emptyset$; then the predecessor of that node is included as a former node of the path.
4. Apply evaporation using (78.23).
5. Update the pheromone concentration using (78.24).
6. Finally, finish the algorithm in any of the three different ways:
   - When a maximum number of epochs has been reached.
   - When it has found an acceptable solution, with $f(x^k(t)) < \varepsilon$.
   - When all ants follow the same path.
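The general steps of S-ACO can be sketched on a small graph as follows. Several choices here are assumptions made for illustration and are not specified in the chapter: the graph is an adjacency dictionary, an ant never revisits a node (which avoids cycles), a dead end is handled by stepping back to the predecessor, each ant deposits 1/L on the links of a path with L hops, and the run stops only after a fixed number of epochs. The source and target nodes are assumed to be different.

```python
import random


def s_aco(graph, source, target, n_ants=10, epochs=30, alpha=1.0, rho=0.1, seed=0):
    """Return the shortest path (in hops) found from source to target."""
    rng = random.Random(seed)
    tau = {(i, j): 1.0 for i in graph for j in graph[i]}   # step 1
    best = None
    for _ in range(epochs):
        paths = []
        for _ in range(n_ants):                            # steps 2 and 3
            path, visited = [source], {source}
            while path and path[-1] != target:
                i = path[-1]
                feasible = [j for j in graph[i] if j not in visited]
                if not feasible:
                    path.pop()                             # step back to the predecessor
                    continue
                # Transition probabilities of (78.22), left unnormalized for choices().
                weights = [tau[(i, j)] ** alpha for j in feasible]
                j = rng.choices(feasible, weights=weights, k=1)[0]
                path.append(j)
                visited.add(j)
            if path:
                paths.append(path)
                if best is None or len(path) < len(best):
                    best = path
        for link in tau:                                   # step 4, evaporation (78.23)
            tau[link] *= (1.0 - rho)
        for path in paths:                                 # step 5, deposit (78.24)
            deposit = 1.0 / (len(path) - 1)
            for i, j in zip(path, path[1:]):
                tau[(i, j)] += deposit
    return best


# Undirected toy graph: 0-1-3 is the 2-hop route, 0-2-4-3 the 3-hop route.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
route = s_aco(toy_graph, source=0, target=3)
```

Shorter paths receive a larger deposit per link, so over the epochs the pheromone concentrates on the 2-hop route.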


78.3.4 General Remarks About 
Optimization of Type-2 Fuzzy 
Systems 


The problem of designing type-2 fuzzy systems can 
be solved with any of the above-mentioned optimiza- 
tion methods. The main issue in any of these methods 
is to decide on the appropriate representation of the 
type-2 fuzzy system in the corresponding optimization 
paradigm. For example, in the case of GAs, the type-2 
fuzzy systems must be represented in the chromosomes. 
On the other hand, in PSO the fuzzy system is repre- 
sented as a particle in the optimization process. In the 
ACO method, the fuzzy system can be represented as 
one of the paths that the ants can follow in a graph. 
Also, the evaluation of the fuzzy system must be rep- 
resented as an objective function in any of the methods. 
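As an illustration of the representation issue, a particle or a real-coded chromosome can be a flat vector that is decoded into membership function parameters before the fuzzy system is evaluated. The layout below, three numbers per interval type-2 Gaussian set (mean, and two standard deviations bounding the footprint of uncertainty), is an assumption chosen for this sketch and not the encoding used in the works reviewed here.

```python
def decode(vector, n_sets):
    """Split a flat vector into n_sets triples (mean, sigma_lo, sigma_hi).

    The two spread values are made non-negative and ordered, so that any
    candidate produced by the optimizer maps to a valid footprint of
    uncertainty (lower MF inside upper MF).
    """
    if len(vector) != 3 * n_sets:
        raise ValueError("expected 3 parameters per fuzzy set")
    sets = []
    for s in range(n_sets):
        mean, a, b = vector[3 * s: 3 * s + 3]
        sigma_lo, sigma_hi = sorted((abs(a), abs(b)))
        sets.append((mean, sigma_lo, sigma_hi))
    return sets


# A candidate solution encoding two fuzzy sets.
params = decode([0.0, 1.2, 0.8, 1.0, 0.5, 0.5], n_sets=2)
```

The objective function of the optimizer would then build the fuzzy system from the decoded parameters and score its performance.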


78.4 General Overview of the Area and Future Trends 


In this section, a general overview of the area of type-2 
fuzzy system optimization is presented. Also, possible 
future trends that we can envision based on the review 
of this area are presented. It has been well known for 
a long time that to design fuzzy systems is a difficult 
task, and this is especially true in the case of type-2 
fuzzy systems [78.4]. The use of GAs, ACO, and PSO 
in designing type-1 fuzzy systems has become stan- 
dard practice for automatically designing this sort of 
system [78.7, 8, 23,25]. This trend has also continued 
to the type-2 fuzzy systems area, which has been ac- 
counted for with the review of papers presented in the 
previous sections. In the case of designing type-2 fuzzy
systems the problem is more complicated due to the
higher number of parameters to consider, making it
very important to use bio-inspired optimization
techniques to achieve the optimal designs of this sort of
system. In this section, a summary of the total number
of papers published in the area of type-2 fuzzy sys- 
tem optimization is presented, so that the increasing 
trend occurring in this area can be better appreciated. 
Also, the distribution of papers according to the opti- 
mization technique used is presented, so that a general 


idea of how these different techniques contribute to the 
automatic design of optimal type-2 fuzzy systems is 
obtained. 

Figure 78.6 shows the distribution of the papers published on the optimization of type-2 fuzzy systems according to the different bio-inspired optimization techniques previously mentioned. From Fig. 78.6 it can be noted that the use of GAs has been decreasing recently. On the other hand, the use of PSO, ACO, and other methods has been increasing. The increase in the use of PSO and ACO may be due to recent work in which either PSO or ACO has been able to outperform GAs for different applications. Regarding the question of which method would be the most appropriate for optimizing type-2 fuzzy systems, there is no easy answer. At the moment, what we can be sure of is that the techniques mentioned in this paper, and probably newer ones that may appear in the future, will certainly be tested in the optimization of type-2 fuzzy systems because the problem of automatically designing these types of systems is complex enough to require their use.

Fig. 78.6 Distribution of publications per area and year (number of publications per year, 2006-2011)


There are other bio-inspired or nature-inspired techniques that at the moment have not been applied to the optimization of type-2 fuzzy systems that may be worth mentioning. For example, membrane computing, harmony computing, electromagnetism-based computing, and other similar approaches have not been applied (to date) in the optimization of type-2 fuzzy systems. It is expected that these approaches and similar ones could be applied in the near future in the area of type-2 fuzzy system optimization. Of course, as new bio-inspired and nature-inspired optimization methods are continuously being proposed in this fruitful area of research, it is expected that newer optimization techniques will also be tried in the near future in the automatic design of optimal type-2 fuzzy systems.


78.5 Conclusions


In this chapter we have presented a representative account of the different optimization methods that have been applied in the optimal design of type-2 fuzzy systems. To date, genetic algorithms have been used more frequently to optimize type-2 fuzzy systems. However, more recently PSO and ACO have attracted more attention and have also been applied with some degree of success to the problem of the optimal design of type-2 fuzzy systems. There have also been other optimization methods applied to the optimization of type-2 fuzzy systems, like artificial immune systems and the chemical optimization paradigm. At this time, it would be very difficult to declare one of these optimization techniques as the best for optimizing type-2 fuzzy systems, because different techniques have had success in different applications of type-2 fuzzy logic. In any case, the need for bio-inspired optimization methods is justified due to the complexity of designing type-2 fuzzy systems.


References


78.1 J.M. Mendel: Uncertainty, fuzzy logic, and signal processing, Signal Process. J. 80, 913-933 (2000)
78.2 L.A. Zadeh: The concept of a linguistic variable and its application to approximate reasoning, Inf. Sci. 8, 43-80 (1975)
78.3 N.N. Karnik, J.M. Mendel: An Introduction to Type-2 Fuzzy Logic Systems, Technical Report (University of Southern California, Los Angeles 1998)
78.4 O. Castillo, P. Melin, A. Alanis, O. Montiel, R. Sepulveda: Optimization of interval type-2 fuzzy logic controllers using evolutionary algorithms, J. Soft Comput. 15(6), 1145-1160 (2011)
78.5 R. Sepulveda, O. Montiel, O. Castillo, P. Melin: Embedding a high speed interval type-2 fuzzy controller for a real plant into an FPGA, Appl. Soft Comput. 12(3), 988-998 (2012)
78.6 R.R. Yager: Fuzzy subsets of type II in decisions, J. Cybern. 10, 137-159 (1980)


78.7 Z. Bingül, O. Karahan: A fuzzy logic controller tuned with PSO for 2 DOF robot trajectory control, Expert Syst. Appl. 38(1), 1017-1031 (2011)
78.8 J. Cao, P. Li, H. Liu, D. Brown: Adaptive fuzzy controller for vehicle active suspensions with particle swarm optimization, Proc. SPIE Int. Soc. Opt. Eng., Vol. 7129 (2008)
78.9 J.R. Castro, O. Castillo, P. Melin: An interval type-2 fuzzy logic toolbox for control applications, Proc. FUZZ-IEEE (2007) pp. 1-6
78.10 T. Dereli, A. Baykasoglu, K. Altun, A. Durmusoglu, I.B. Turksen: Industrial applications of type-2 fuzzy sets and systems: A concise review, Comput. Ind. 62, 125-137 (2011)
78.11 C.-F. Juang, C.-H. Hsu: Reinforcement ant optimized fuzzy controller for mobile-robot wall-following control, IEEE Trans. Ind. Electron. 56(10), 3931-3940 (2009)




78.12 O. Castillo, G. Huesca, F. Valdez: Evolutionary computing for topology optimization of type-2 fuzzy controllers, Stud. Fuzziness Soft Comput. 208, 163-178 (2008)
78.13 O. Castillo, L.T. Aguilar, N.R. Cazarez-Castro, S. Cardenas: Systematic design of a stable type-2 fuzzy logic controller, Appl. Soft Comput. J. 8, 1274-1279 (2008)
78.14 R. Martinez, O. Castillo, L.T. Aguilar: Optimization of interval type-2 fuzzy logic controllers for a perturbed autonomous wheeled mobile robot using genetic algorithms, Inf. Sci. 179(13), 2158-2174 (2009)
78.15 S.-K. Oh, H.-J. Jang, W. Pedrycz: A comparative experimental study of type-1/type-2 fuzzy cascade controller based on genetic algorithms and particle swarm optimization, Expert Syst. Appl. 38(9), 11217-11229 (2011)
78.16 R. Sepulveda, O. Castillo, P. Melin, A. Rodriguez-Diaz, O. Montiel: Experimental study of intelligent controllers under uncertainty using type-1 and type-2 fuzzy logic, Inf. Sci. 177(10), 2023-2048 (2007)
78.17 H. Hagras: Hierarchical type-2 fuzzy logic control architecture for autonomous mobile robots, IEEE Trans. Fuzzy Syst. 12, 524-539 (2004)
78.18 R. Martinez, A. Rodriguez, O. Castillo, L.T. Aguilar: Type-2 fuzzy logic controllers optimization using genetic algorithms and particle swarm optimization, Proc. IEEE Int. Conf. Granul. Comput. (2010) pp. 724-727


78.19 S.M.A. Mohammadi, A.A. Gharaveisi, M. Mashinchi: An evolutionary tuning technique for type-2 fuzzy logic controller in a non-linear system under uncertainty, Proc. 18th Iran. Conf. Electr. Eng. (2010) pp. 610-616
78.20 J.R. Castro, O. Castillo, L.G. Martinez: Interval type-2 fuzzy logic toolbox, Eng. Lett. 15(1), 14 (2007)
78.21 O. Cordon, F. Gomide, F. Herrera, F. Hoffmann, L. Magdalena: Ten years of genetic fuzzy systems: Current framework and new trends, Fuzzy Sets Syst. 141, 5-31 (2004)
78.22 O. Cordon, F. Herrera, P. Villar: Analysis and guidelines to obtain a good uniform fuzzy partition granularity for fuzzy rule-based systems using simulated annealing, Int. J. Approx. Reason. 25, 187-215 (2000)
78.23 C. Wagner, H. Hagras: A genetic algorithm based architecture for evolving type-2 fuzzy logic controllers for real world autonomous mobile robots, Proc. IEEE Conf. Fuzzy Syst. (2007)
78.24 D. Wu, W.-W. Tan: Genetic learning and performance evaluation of interval type-2 fuzzy logic controllers, Eng. Appl. Artif. Intell. 19(8), 829-841 (2006)
78.25 C.-F. Juang, C.-H. Hsu: Reinforcement interval type-2 fuzzy controller design by online rule generation and Q-value-aided ant colony optimization, IEEE Trans. Syst. Man Cybern. B 39(6), 1528-1542 (2009)




79. Pattern Recognition with Modular Neural 
Networks and Type-2 Fuzzy Logic 


Patricia Melin 


Interval type-2 fuzzy systems can be of great help in image analysis and pattern recognition applications. In particular, edge detection is a process usually applied to image sets before the training phase in recognition systems. This preprocessing step helps to extract the most important shapes in an image, ignoring the homogeneous regions and highlighting the real objective to classify or recognize. Many traditional and fuzzy edge detectors can be used, but it is very difficult to demonstrate which one is better before the recognition results are obtained. In this chapter, we show experimental results where several edge detectors were used to preprocess the same image sets. Each resulting image set was used as training data for a modular neural network recognition system, and the recognition rates were compared. The goal of these experiments is to find the better edge detector that can be used to improve the training data of a modular neural network for an image recognition system.


79.1 Related Work in the Area
79.2 Overview of Fuzzy Edge Detectors
79.2.1 Sobel Edge Detector Improved with Fuzzy Logic
79.2.2 Morphological Gradient Edge Detector Improved with Fuzzy Logic
79.3 Experimental Setup
79.3.1 … for the Experiments
79.3.2 … for the Images Databases
79.3.3 The Modular Neural Network
79.4 Experimental Results
79.5 Conclusions
References


79.1 Related Work in the Area


In previous work, we have proposed extensions to the traditional edge detectors to improve their performance by using fuzzy systems [79.1-3]. The performed experiments have shown that the resulting images obtained with fuzzy edge detectors were visually better than the ones obtained with the traditional edge detection methods.

There is still work to be done on developing formal validation metrics for fuzzy edge detectors. In the literature, we can find comparisons of edge detectors based on human observations [79.4-8], and some others that found the optimal values for parametric edge detectors [79.9].

Edge detectors can be used in recognition systems for different purposes, but in this work we are particularly interested in knowing which is the best edge detector for a neural recognition system. In this chapter, we present some experiments which show that fuzzy edge detectors are a good method to improve the performance of neural recognition systems, and for this reason we propose that the recognition rate of the neural networks can be used as an edge detection performance index.

The rest of the chapter is organized as fol- 
lows. Section 79.2 presents an overview of fuzzy 
edge detectors. Section 79.3 describes the exper- 
imental setup used to test the proposed fuzzy 
edge detectors in a modular neural recognition sys- 
tem. Section 79.4 presents the experimental results 
achieved with the proposed fuzzy edge detectors. Fi- 
nally, Sect. 79.5 outlines the conclusions and future 
work. 




79.2 Overview of Fuzzy Edge Detectors 


In this section, an overview of the previously proposed
fuzzy edge detectors is presented: first the Sobel
edge detector improved with fuzzy logic, and then the
morphological gradient edge detector enhanced with
fuzzy logic.


79.2.1 Sobel Edge Detector Improved
with Fuzzy Logic


In the Sobel fuzzy edge detector we used the individual
operators Sobel_x and Sobel_y as in the traditional method,
and then we substituted the Euclidean distance of (79.1)
with a fuzzy system, as shown in Fig. 79.1 [79.3].


Sobel_edges = \sqrt{Sobel_x^2 + Sobel_y^2}   (79.1)


Fig. 79.1 Sobel edge detector enhanced with fuzzy
logic (blocks: Sobel method; ED or FIS1 or FIS2; Edges)

Fig. 79.2 Membership functions of the variables for the
Sobel+FIS1 edge detector (panels: m, dh (Sobel_x), dv (Sobel_y),
hp, and y1 (Sobel+FIS1 edges); terms Low, Middle, High)

Fig. 79.3 Membership functions of the variables for the
Sobel+FIS2 edge detector (panels: m, dh (Sobel_x), dv (Sobel_y),
hp, and y1 (Sobel+FIS2 edges); terms Low, Middle, High)
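As a point of reference for (79.1), the traditional Sobel magnitude can be sketched in a few lines of Python. The 3x3 kernels below are the usual textbook Sobel masks and the border handling (border pixels left at 0) is an assumption of this sketch; neither is taken from the chapter.

```python
import math

# Textbook 3x3 Sobel masks (assumed here, not quoted from the chapter).
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(img, kernel, r, c):
    """Weighted sum of the 3x3 neighborhood centered on pixel (r, c)."""
    return sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def sobel_edges(img):
    """Euclidean combination of Sobel_x and Sobel_y, as in (79.1).

    Only interior pixels are computed; border pixels stay at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sx = correlate3x3(img, KX, r, c)
            sy = correlate3x3(img, KY, r, c)
            out[r][c] = math.sqrt(sx * sx + sy * sy)
    return out


# A 5x5 image with a vertical step between columns 1 and 2.
step = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_edges(step)
```

In the fuzzy variants described next, the square-root combination in `sobel_edges` is the step that is replaced by a fuzzy inference system.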


The individual Sobel operators are the main inputs 
to the type-1 fuzzy inference system (FIS1) and type-2 
fuzzy inference system (FIS2), and we have also con- 
sidered adding two more inputs, which are filters that 
improve the final edge image. The fuzzy variables used
in the Sobel+FIS1 and Sobel+FIS2 edge detectors
are shown in Fig. 79.2 and Fig. 79.3, respectively.

The use of the FIS2 [79.10, 11] provided images 
with better defined edges than the FIS1, which is a very 
important result in providing better inputs to the neural 
networks that will perform the recognition task. 

The fuzzy rules for both the FIS1 and FIS2 are the 
same and are shown below: 


1. If (dh is LOW) and (dv is LOW) then (y1 is HIGH)
2. If (dh is MIDDLE) and (dv is MIDDLE) then (y1 is LOW)
3. If (dh is HIGH) and (dv is HIGH) then (y1 is LOW)
4. If (dh is MIDDLE) and (hp is LOW) then (y1 is LOW)
5. If (dv is MIDDLE) and (hp is LOW) then (y1 is LOW)
6. If (m is LOW) and (dv is MIDDLE) then (y1 is HIGH)
7. If (m is LOW) and (dh is MIDDLE) then (y1 is HIGH)
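To make the rule base concrete, the following is a minimal type-1 sketch of how the seven rules could be evaluated for one pixel. Several things are assumptions made only for illustration: the inputs are taken as already normalized to [0, 1], the Gaussian membership parameters in `MF` are hypothetical, the AND is the minimum, and the output is a weighted average of two singleton gray tones rather than the defuzzification of the original FIS1. The variables hp and m stand for the two additional filter inputs mentioned in the text.

```python
import math


def gauss(x, center, sigma):
    """Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical membership parameters (center, sigma) on normalized inputs.
MF = {"LOW": (0.0, 0.2), "MIDDLE": (0.5, 0.2), "HIGH": (1.0, 0.2)}
# Assumed singleton gray tones for the output y1 (0 = black, 255 = white).
OUT = {"LOW": 0.0, "HIGH": 255.0}

# The seven rules listed above: (antecedent terms, consequent term of y1).
RULES = [
    ({"dh": "LOW", "dv": "LOW"}, "HIGH"),
    ({"dh": "MIDDLE", "dv": "MIDDLE"}, "LOW"),
    ({"dh": "HIGH", "dv": "HIGH"}, "LOW"),
    ({"dh": "MIDDLE", "hp": "LOW"}, "LOW"),
    ({"dv": "MIDDLE", "hp": "LOW"}, "LOW"),
    ({"m": "LOW", "dv": "MIDDLE"}, "HIGH"),
    ({"m": "LOW", "dh": "MIDDLE"}, "HIGH"),
]


def infer_pixel(inputs):
    """Gray tone of one output pixel from the inputs dh, dv, hp, and m."""
    num = den = 0.0
    for antecedent, consequent in RULES:
        # AND of the antecedent terms, taken as the minimum grade.
        strength = min(gauss(inputs[var], *MF[term])
                       for var, term in antecedent.items())
        num += strength * OUT[consequent]
        den += strength
    return num / den if den > 0 else OUT["HIGH"]


flat_region = infer_pixel({"dh": 0.0, "dv": 0.0, "hp": 1.0, "m": 1.0})
strong_edge = infer_pixel({"dh": 1.0, "dv": 1.0, "hp": 1.0, "m": 1.0})
```

With these assumed parameters, `flat_region` comes out near white and `strong_edge` near black, which matches the reasoning given for rules 1 and 3.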


The fuzzy rule base shown above infers the gray
tone of each pixel of the edge image with the following
reasoning: when the horizontal gradient dh and the vertical
gradient dv are both LOW, there is not enough difference
between the gray tones of the neighboring pixels,
so the output pixel belongs to a homogeneous
(non-edge) region and is HIGH, or near WHITE.
In the opposite case, when dh and dv are
both HIGH, there is enough difference
between the gray tones in the neighborhood, and the
output pixel is an EDGE.


79.2.2 Morphological Gradient Edge 
Detector Improved with Fuzzy Logic 


In the morphological gradient (MG), we calculated
the four gradients as in the traditional method [79.12,
13], and substituted the sum of gradients in (79.2) with
a fuzzy inference system, as shown in Fig. 79.4.


MG_edges = D_1 + D_2 + D_3 + D_4   (79.2)


The linguistic variables used in the MG+FIS1 and
MG+FIS2 edge detectors are shown in Fig. 79.5 and
Fig. 79.6, respectively.

The rules for both the FIS1 and FIS2 are the same 
and are shown below: 


1. If (D1 is HIGH) or (D2 is HIGH) or (D3 is HIGH) 
or (D4 is HIGH) then (E is BLACK) 

2. If (D1 is MIDDLE) or (D2 is MIDDLE) or (D3 is 
MIDDLE) or (D4 is MIDDLE) then (E is GRAY) 

3. If (D1 is LOW) and (D2 is LOW) and (D3 is LOW) 
and (D4 is LOW) then (E is WHITE) 
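The difference between the OR in rules 1 and 2 and the AND in rule 3 can be illustrated with a short sketch. It assumes gradients normalized to [0, 1], simple piecewise-linear memberships, max for OR, min for AND, and a weighted average of three singleton gray tones as output; all of these choices are hypothetical and only illustrate the rule structure.

```python
def low(x):
    """Assumed LOW membership on [0, 1]: 1 at 0, falling to 0 at 0.5."""
    return max(0.0, 1.0 - 2.0 * x)


def middle(x):
    """Assumed MIDDLE membership: triangle peaking at 0.5."""
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))


def high(x):
    """Assumed HIGH membership: 0 up to 0.5, rising to 1 at 1."""
    return max(0.0, 2.0 * x - 1.0)


# Assumed singleton gray tones for the output E.
TONE = {"BLACK": 0.0, "GRAY": 128.0, "WHITE": 255.0}


def mg_fuzzy_pixel(d):
    """Output gray tone for the four gradients d = (D1, D2, D3, D4)."""
    strength = {
        "BLACK": max(high(x) for x in d),    # rule 1: OR taken as max
        "GRAY": max(middle(x) for x in d),   # rule 2: OR taken as max
        "WHITE": min(low(x) for x in d),     # rule 3: AND taken as min
    }
    total = sum(strength.values())
    if total == 0.0:
        return TONE["WHITE"]
    return sum(strength[k] * TONE[k] for k in TONE) / total
```

For example, `mg_fuzzy_pixel([0, 0, 0, 1])` is black because a single HIGH gradient is enough to fire rule 1, while rule 3 fires only when all four gradients are LOW.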


After many experiments, we found that an edge exists
when any gradient D_i is HIGH, which means that
a difference of gray tones in any direction of the image
must produce a pixel with a BLACK value, or EDGE.
The same behavior occurs when any gradient D_i is
MIDDLE, which means that even when the differences
in the gray tones are not maximal, the pixel is an EDGE.
The only rule that finds a non-edge pixel is therefore
rule 3, where the output pixel is WHITE only when all
the gradients are LOW, which means a pixel belonging
to a homogeneous region.


Fig. 79.4 Morphological gradient edge detector enhanced
with fuzzy systems

Fig. 79.5 Membership functions of the variables for the
MG+FIS1 edge detector

Fig. 79.6 Membership functions of the variables for the
MG+FIS2 edge detector


79.3 Experimental Setup


The experiment consists of applying a neural recognition
system using each of the previously presented edge
detectors: Sobel, Sobel+FIS1, Sobel+FIS2, morphological
gradient (MG), morphological gradient+FIS1,
and morphological gradient+FIS2, and then comparing
the results.


79.3.1 General Algorithm Used
for the Experiments


1. Define the database folder.
2. Define the edge detector.
3. Detect the edges of each image as a vector and store
   it as a column in a matrix.
4. Calculate the recognition rate using the k-fold
   cross-validation method.
   a) Calculate the indices for training and test k-folds.
   b) Train the neural network k times, each time on the
      k − 1 training folds calculated previously.
   c) Test the neural network k times, once for each
      test fold calculated previously.
5. Calculate the mean rate for all the k-folds.
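Steps 4 and 5 can be sketched as follows. The fold sizes use (79.3) to (79.5) of Sect. 79.3.2; the contiguous assignment of samples to folds and the callback `train_and_test` (which would train the network on the training indices and return the recognition rate on the test indices) are hypothetical placeholders, not the authors' implementation. Note that the FERET row of Table 79.1 (m = 74, i = 222, t = 74) is reproduced by these formulas with k = 4.

```python
def fold_sizes(p, s, k):
    """Fold size m, training set size i, and test set size t.

    p: number of persons, s: samples per person, k: number of folds.
    Assumes that k divides s.
    """
    m = (s // k) * p      # (79.3)
    i = m * (k - 1)       # (79.4)
    t = m                 # (79.5)
    return m, i, t


def kfold_indices(s, k):
    """Yield (train, test) per-person sample indices for each fold."""
    per_fold = s // k
    for f in range(k):
        test = list(range(f * per_fold, (f + 1) * per_fold))
        train = [j for j in range(s) if j not in test]
        yield train, test


def mean_recognition_rate(train_and_test, s, k):
    """Train and test once per fold, then average the k rates."""
    rates = [train_and_test(train, test)
             for train, test in kfold_indices(s, k)]
    return sum(rates) / len(rates)


orl = fold_sizes(p=40, s=10, k=5)
yale = fold_sizes(p=38, s=10, k=5)
feret = fold_sizes(p=74, s=4, k=4)
```

Each network is trained k times in this loop, once for every choice of held-out fold.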


79.3.2 Parameters for the Image Databases


The experiments can be performed with benchmark
image databases used for identification purposes. This
is the case for face recognition applications, so we
used three of the most popular benchmark sets of
images: the ORL face database [79.14], the Cropped
Yale face database [79.15, 16], and the FERET face
database [79.17].

For the three databases, we defined the variable p
as the number of persons and s as the number of samples
for each person. The tests were made with the k-fold
cross-validation method, with k = 5 for the three databases.
We can generalize the calculation of the fold size m, or
number of samples in each fold, by dividing the number
of samples for each person s by the number of folds k and
then multiplying the result by the number of persons p (79.3).
The training set size i (79.4) is then the number of
samples in k − 1 folds, and the test set size t (79.5)
is the number of samples in only one fold.


m = (s/k) \cdot p   (79.3)
i = m (k - 1)       (79.4)
t = m               (79.5)


The total number of samples used for each person was
10 for the ORL and Yale databases; each of the 5 folds
then contains 2 samples per person, so the number of
samples for training for each person is 8 and for testing is 2.
For the experiments with the FERET face database, we used only
the samples of the 74 persons who have 4 frontal sample
images. The particular information for each database is
shown in Table 79.1.


Table 79.1 Particular information for the tested benchmark face databases

Database      Person      Samples     Fold      Training      Test
              number (p)  number (s)  size (m)  set size (i)  set size (t)
ORL           40          10          80        320           80
Cropped Yale  38          10          76        304           76
FERET         74          4           74        222           74


79.3.3 The Modular Neural Network


In previous experiments with neural networks for image
recognition, we found a general structure with acceptable
performance, even if it is not optimized. We
used the same structure for the multinet modular neural
networks, in order to establish a standard for comparison
across all the experiments [79.3, 18–23]. The general
structure for the monolithic neural network is indicated
below:

• Two hidden layers with 200 neurons.
• Learning algorithm: gradient descent with momentum
  and adaptive learning rate backpropagation.
• Error goal: 0.0001.


79.4 Experimental Results


In this section, we show the numerical results of the
experiments. Table 79.2 contains the results for the ORL
face database, Table 79.3 contains the results for the
Cropped Yale database, and Table 79.4 contains the
results for the FERET face database.

For a better appreciation of the results, we made
plots of the values presented in Tables 79.2–79.4. Even
though this work does not intend to compare the edge
detectors using the training time as a performance index,
it is interesting to note that the time necessary
to reach the error goal was measured for each experiment.


Table 79.2 Recognition rates for the ORL database of faces

Training set          Mean      Mean      Standard   Max
preprocessing method  time (s)  rate (%)  deviation  rate (%)
MG+FIS1               1.2694    89.25     4.47       95.00
MG+FIS2               1.2694    90.25     5.48       97.50
Sobel+FIS1            1.2694    87.25     3.69       [illegible]
Sobel+FIS2            1.2694    90.75     4.29       95.00


As we can see in Fig. 79.7, the lowest training
times are for the morphological gradient+FIS2 edge


Table 79.3 Recognition rates for the Cropped Yale database of faces

Training set          Mean      Mean      Standard   Max
preprocessing method  time (s)  rate (%)  deviation  rate (%)
MG+FIS1               1.76      68.42     29.11      100
MG+FIS2               1.07      88.16     21.09      100
Sobel+FIS1            1.17      79.47     26.33      100
Sobel+FIS2            1.1321    90        22.36      100




Table 79.4 Recognition rates for the FERET database of faces

Training set          Mean         Mean         Standard   Max
preprocessing method  time (s)     rate (%)     deviation  rate (%)
MG+FIS1               1.17         75.34        5.45       [illegible]
MG+FIS2               1.17         72.30        6.85       82.43
Sobel+FIS1            1.17         [illegible]  0.68       83.78
Sobel+FIS2            [illegible]  84.46        3.22       87.84


detector and Sobel+FIS2 edge detector. That is because
both edge detectors were improved with interval type-2
fuzzy systems and produce images with more homogeneous
areas, which means a high frequency of pixels
near the WHITE linguistic value.

However, the main advantage of the interval type-2
edge detectors is the recognition rates plotted in
Fig. 79.8, where we can see that the best mean
performance of the neural network was achieved when
it was trained with the data sets obtained with the
MG+FIS2 and Sobel+FIS2 edge detectors.

Figure 79.9 shows that the maximum recognition rates are
also better for the edge detectors improved with interval
type-2 fuzzy systems. The maximum recognition
rate may not be the best parameter for comparing the
performance of the neural networks depending on the
training set, but it is interesting to note that the maximum
recognition rate of 97.5% was achieved when
the neural network was trained with the ORL data set
preprocessed with MG+FIS2. This is important because
in a real-world system we can use this as the best
configuration for image recognition, expecting to obtain
good results.


Fig. 79.7 Training time for the compared edge detectors tested with
the ORL, Cropped Yale and FERET face databases

Fig. 79.8 Mean recognition rates for the compared edge
detectors with ORL, Cropped Yale and FERET face databases

Fig. 79.9 Maximum recognition rates for the compared
edge detectors with ORL, Cropped Yale and FERET face
databases


79.5 Conclusions


This chapter is one of the first efforts to develop a comparison
method for edge detectors as a function of their
performance in different types of recognition systems.
In this chapter, we show that Sobel and morphological
gradient edge detectors improved with type-2 fuzzy
logic have better performance than the type-1 fuzzy
edge detector and traditional methods in an image
recognition system based on neural networks.


References


79.1 O. Mendoza, P. Melin, G. Licea: A New Method for
Edge Detection in Image Processing Using Interval
Type-2 Fuzzy Logic, IEEE Int. Conf. Granular Comput.
(GRC) (2007)

79.2 O. Mendoza, P. Melin, G. Licea: Fuzzy Inference Systems
Type-1 and Type-2 for Digital Images Edges
Detection, Eng. Lett., Int. Ass. Eng. 15(1), 45-52
(2007)

79.3 O. Mendoza, P. Melin, G. Licea: Interval type-2 fuzzy
logic for edges detection in digital images, Int.
J. Intell. Syst. 24(11), 1115-1134 (2009)

79.4 H. Bustince, E. Barrenechea, M. Pagola, J. Fernandez:
Interval-valued fuzzy sets constructed from
matrices: Application to edge detection, Fuzzy Sets
Syst. 160(13), 1819-1840 (2009)

79.5 K. Revathy, S. Lekshmi, S.R. Prabhakaran Nayar:
Fractal-based fuzzy technique for detection of active
regions from solar, J. Solar Phys. 228, 43-53
(2005)

79.6 K. Suzuki, I. Horiba, N. Sugie, M. Nanki: Contour
extraction of left ventricular cavity from digital
subtraction angiograms using a neural edge detector,
Syst. Comput. 34(2), 55-69 (2003)

79.7 L. Hua, H.D. Cheng, M. Zhang: A high performance
edge detector based on fuzzy inference rules, Inf.
Sci. 177(21), 4768-4784 (2007)

79.8 M. Heath, S. Sarkar, T. Sanocki, K.W. Bowyer: A
robust visual method for assessing the relative
performance of edge-detection algorithms, IEEE
Trans. Pattern Anal. Mach. Intell. 19(12), 1338-1359
(1997)

79.9 Y. Yitzhaky, E. Peli: A method for objective edge
detection evaluation and detector parameter selection,
IEEE Trans. Pattern Anal. Mach. Intell. 25(8),
1027-1033 (2003)

79.10 J. Mendel: Uncertain Rule-Based Fuzzy Logic Systems:
Introduction and New Directions (Prentice-Hall,
Upper Saddle River 2000)

79.11 J.R. Castro, O. Castillo, P. Melin, A. Rodriguez-Diaz:
Building fuzzy inference systems with a new interval
type-2 fuzzy logic tool-box, Lect. Notes Comput.
Sci. 4750, 104-114 (2008)

79.12 A.N. Evans, X.U. Liu: Morphological gradient approach
for color edges detection, IEEE Trans. Image
Process. 15(6), 1454-1463 (2006)

79.13 F. Russo, G. Ramponi: Edge extraction by FIRE operators
Fuzzy Systems, Proc. 1st IEEE Conf. Evolutionary
Computation (ICEC), Orlando, Florida (1994)
pp. 249-253

79.14 AT & T Laboratories Cambridge, The ORL database 
of faces, http://www.cl.cam.ac.uk/research/dtg/ 
attarchive/facedatabase.html 

79.15 A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman: 
From few to many: Illumination cone models for 
face recognition under variable lighting and pose, 
IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 643- 
660 (2001) 

79.16 K.C. Lee, J. Ho, D. Kriegman: Acquiring linear sub- 
spaces for face recognition under variable lighting, 
IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684- 
698 (2005) 

79.17 P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss: The 
FERET evaluation methodology for face-recognition 
algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 
22(10), 1090-1104 (2000) 

79.18 O. Mendoza, P. Melin: The Fuzzy Sugeno Integral
As A Decision Operator in The Recognition of
Images with Modular Neural Networks. In: Hybrid
Intelligent Systems, Studies in Fuzziness and
Soft Computing, (Springer, Berlin, Heidelberg 2007)
pp. 299-310

79.19 O. Mendoza, P. Melin, G. Licea: A Hybrid Approach
for Image Recognition Combining Type-2 Fuzzy
Logic, Modular Neural Networks and the Sugeno Integral,
Inf. Sci. 179(13), 2078-2101 (2007)

79.20 O. Mendoza, P. Melin, G. Licea: Interval Type-2 Fuzzy
Logic for Module Relevance Estimation in Sugeno
Integration of Modular Neural Networks. In: Soft
Computing for Hybrid Intelligent Systems, Studies
in Computational Intelligence, Vol. 154, (Springer,
Berlin, Heidelberg 2008) pp. 115-127

79.21 O. Mendoza, P. Melin, G. Licea: A hybrid approach
for image recognition combining type-2 fuzzy logic,
modular neural networks and the Sugeno integral,
Inf. Sci. 179(3), 2078-2101 (2009)

79.22 O. Mendoza, P. Melin, O. Castillo: Interval type-2
fuzzy logic and modular neural networks for face
recognition applications, Appl. Soft Comp. J. 9(4),
1377-1387 (2009)

79.23 O. Mendoza, P. Melin, O. Castillo, G. Licea: Type-2
fuzzy logic for improving training data and response
integration in modular neural networks for
image recognition, Lect. Notes Comput. Sci. 4329,
604-612 (2007)




80. Fuzzy Controllers for Autonomous Mobile Robots 


Patricia Melin, Oscar Castillo 


This chapter addresses the tracking problem for
the dynamic model of a unicycle mobile robot.
A novel optimization method inspired by
chemical reactions is applied to solve this motion
problem by integrating a kinematic and a torque
controller based on fuzzy logic theory. Computer
simulations are presented confirming that this
optimization paradigm is able to outperform other
optimization techniques applied to this particular
robot application.


80.1 Fuzzy Control of Mobile Robots
80.2 The Chemical Optimization Paradigm
  80.2.1 Elements/Compounds
  80.2.2 Chemical Reactions
  80.2.3 Synthesis Reactions
  80.2.4 Decomposition Reactions
  80.2.5 Single-Substitution Reactions
  80.2.6 Double-Substitution Reactions
80.3 The Mobile Robot
80.4 Fuzzy Logic Controller
80.5 Experimental Results
80.6 Conclusions
References


80.1 Fuzzy Control of Mobile Robots


Optimization is an activity carried out in almost every
aspect of our life, from planning the best route
home from work to more sophisticated
approximations on the stock market, or the parameter
optimization for a wave soldering process used in the
manufacture of a printed circuit board assembly.
Optimization theory has therefore gained importance over
the last decades. From science to applied engineering (to name
a few), there is always something to optimize and, of
course, more than one way to do it.

In a generic definition, we may say that optimization
aims to find the best available solution among
a set of potential solutions in a defined search space.
For almost every problem there exists a solution, not
necessarily the best one, but we can always find an
approximation to the ideal solution, and while in
some cases or processes it is still common to use
our own experience to qualify a process, a part of
the research community has dedicated a considerable
amount of time and effort to finding robust optimization
methods for the optima of a vast range of
applications.

It is difficult to solve different problems by applying
the same methodology, and even the most robust
optimization approaches may be outperformed by other
optimization techniques, depending on the problem to
be solved.

When the complexity and the dimension of the 
search space make a problem unsolvable by a deter- 
ministic algorithm, probabilistic algorithms deal with 
this problem by going through a diverse set of possi- 
ble solutions or candidate solutions. Many metaheuris- 
tic algorithms can be considered probabilistic because 
they apply probability tools to solve a problem; meta- 
heuristic algorithms seek good solutions by mimicking 
natural processes or paradigms. Most of these novel op- 
timization paradigms that were inspired by nature were 
conceived by mere observation of an existing process 
and their main characteristics were embodied as com- 
putational algorithms. 

The importance of the optimization theory and its 
application has grown in the past few decades, from 
the well known genetic algorithm paradigm to parti- 
cle swarm optimization (PSO), ant colony optimization 




(ACO), harmony search, deoxyribonucleic acid (DNA)
computing, among others, and they were all introduced
with the expectation that they would improve the
results obtained with existing strategies.

There is no doubt that some optimization strategies
presented at some point were left behind
due to their complexity and poor performance.
Novel optimization paradigms should be able to perform
well in comparison with other optimization
techniques and must be easily adaptable to different
kinds of problems.

Optimization based on chemical processes is 
a growing field that has been satisfactorily applied to 
several problems. In [80.1] a DNA-based algorithm 
was introduced to solve the small hitting set problem. 
A catalytic search algorithm was explored in [80.2], 
where some physical laws such as mass and energy 
conservation were taken into account. In [80.3], the 
potential roles of energy in algorithmic chemistries 
were illustrated. An energy framework was introduced, 
which keeps the molecules within reasonable length 
bounds, allowing the algorithm to behave thermo- 
dynamically and kinetically, similarly to real chem- 
istry. A chemical reaction optimization was applied to 
a grid scheduling problem in [80.4], where molecules 
interact with each other aiming to reach the mini- 
mum state of free potential and kinetic energies. The 
main difference between these metaheuristics is the 
parameter representation, which can be explicit or 
implicit. 

In this paper, we introduce an optimization method 
inspired by chemical reactions and its application for 
the optimization of the tracking controller of the dy- 
namic model of the unicycle mobile robot. 

Different methods have been applied to solve motion
control problems, which motivates the application of
this chemical optimization algorithm. Kanayama
et al. [80.5] proposed a stable tracking control method
for a nonholonomic vehicle using a Lyapunov function.
Lee et al. [80.6] solved tracking control using backstepping,
and in [80.7] saturation constraints were used.
Furthermore, most reported designs rely on intelligent
control approaches such as fuzzy logic control [80.8–13]
and neural networks [80.14, 15].


However, the majority of the publications men- 
tioned above concentrated on kinematic models of 
mobile robots, which are controlled by the veloc- 
ity input, while less attention has been paid to the 
control problems of nonholonomic dynamic systems, 
where forces and torques are the true inputs. Bloch 
and Drakunov [80.16] and Chwa [80.17] used slid- 
ing mode control for the tracking control problem. 
Fierro and Lewis [80.18] proposed a dynamical exten- 
sion that makes the integration of kinematics and torque 
controller possible for a nonholonomic mobile robot. 
Fukao et al. [80.19] introduced an adaptive tracking 
controller for the dynamic model of mobile robots with 
unknown parameters using backstepping methodology, 
which has been recognized as a tool for solving several 
control problems [80.20, 21]. 

Motivated by this, herein a Mamdani fuzzy logic 
controller is introduced in order to drive the kinematic 
model to a desired trajectory in a finite time; consid- 
ering the torque as the real input, a chemical reaction 
optimization paradigm is applied and simulations are 
shown. 

Further publications [80.22—24] applied bio- 
inspired optimization techniques to find the parameters 
of the membership functions for the fuzzy tracking 
controller that solves the problem for the dynamic 
model of a unicycle mobile robot, using a fuzzy logic 
controller that provides the required torques to reach 
the desired velocity and trajectory inputs. 

In this paper, the main contribution is the represen- 
tation of the fuzzy controller in the chemical paradigm 
to search for the optimal parameters. Simulation results 
show that the proposed approach outperforms other 
nature-inspired computing paradigms, such as genetic 
algorithms, particle swarm, and ant colony optimiza- 
tion. 

The rest of this chapter is organized as follows. 
Section 80.2 illustrates the proposed methodology. Sec- 
tion 80.3 describes the problem formulation and con- 
trol objective. Section 80.4 describes the proposed 
fuzzy logic controller of the robot. Section 80.5 shows 
some experimental results of the tracking controller. 
In Sect. 80.6 some conclusions and future work are 
presented. 


80.2 The Chemical Optimization Paradigm 


The proposed chemical reaction algorithm is a metaheuristic
strategy that performs a stochastic search for
optimal solutions within a defined search space. In this
optimization strategy, every solution is represented as
an element (or compound), and the fitness or performance
of the element is evaluated in accordance with
the objective function. The general flowchart of the
algorithm is shown in Fig. 80.1.


Fig. 80.1 General flowchart of the chemical reaction algorithm
(blocks: START; Set initial parameters; Generate initial element pool;
Evaluate initial pool; Select elements to react; Apply reactions;
Evaluate results; Elitist reinsertion; Stopping criteria reached?)

The main difference with other optimization tech- 
niques [80.1—4] is that no external parameters are 
taken into account to evaluate the results, while 
other algorithms introduce additional parameters (ki- 
netic/potential energies, mass conservation, thermody- 
namic characteristics, etc.). This is a very straight- 
forward methodology that takes the characteristics of 
the chemical reactions (synthesis, decomposition, sub- 
stitution, and double-substitution) to find the optimal 
solution. 

This approach is a static population-based metaheuristic
that applies an abstraction of the chemical reactions
as intensifying (single- and double-substitution
reactions) and diversifying (synthesis and decomposition
reactions) mechanisms. The elitist reinsertion strategy
allows the permanence of the best elements, and thus
the average fitness of the entire element pool increases
with every iteration. The algorithm may trigger only
one reaction or all of them, depending on the nature of 
the problem to solve. For example, we may use only the 
decomposition reaction subroutine to find the minimum 
value of a mathematical function. 

The pseudocode for the chemical reaction algorithm 
is as follows: 


Algorithm 80.1 Chemical_Reaction_Algorithm
Input: problem_definition, objective_function, dimensions
 1: Assign values to variables: pool_size, trials,
    upper_boundary, lower_boundary, synthesis_rate,
    decomposition_rate, singlesubstitution_rate,
    doublesubstitution_rate
 2: Generate randomly Initial_Pool in interval
    [lower_boundary, upper_boundary]
 3: Evaluate Initial_Pool
 4: Identify best_solution
 5: while (stopping criteria not met) do
 6:   Perform Synthesis_Procedure; Get Synthesis_vector
 7:   Perform Decomposition_Procedure; Get Decomposition_vector
 8:   Perform SingleSubstitution_Procedure; Get SingleSubstitution_vector
 9:   Perform DoubleSubstitution_Procedure; Get DoubleSubstitution_vector
10:   Evaluate Synthesis_vector, Decomposition_vector,
      SingleSubstitution_vector, DoubleSubstitution_vector
11:   Apply elitist_reinsertion; Get improved_pool
12:   Update best_solution
13: end while
Output: best_solution
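A compact Python sketch of the main loop of Algorithm 80.1 is given below. It is written under several assumptions that are not specified in the text: the stopping criterion is a fixed number of trials, the whole pool is selected to react, the reaction rates are omitted, products are clipped to the boundaries, and elitist reinsertion is read as keeping the best pool_size elements among the old pool and the new products. Following the remark that a single reaction may be enough for minimizing a mathematical function, the demonstration uses only a decomposition-style reaction on the sphere function.

```python
import random


def chemical_reaction_search(objective, dims, lower, upper, reactions,
                             pool_size=20, trials=50, seed=1):
    """Minimize `objective` with a pool of candidate elements."""
    rng = random.Random(seed)
    # Steps 2-4: random initial pool, evaluated and sorted (best first).
    pool = sorted(([rng.uniform(lower, upper) for _ in range(dims)]
                   for _ in range(pool_size)), key=objective)
    for _ in range(trials):                  # step 5: fixed trial budget
        produced = []
        for reaction in reactions:           # steps 6-9: apply reactions
            produced.extend(reaction(pool, rng))
        # Keep every product inside [lower, upper] (assumed repair rule).
        produced = [[min(max(x, lower), upper) for x in e]
                    for e in produced]
        # Steps 10-11: evaluate products and reinsert elitistically.
        pool = sorted(pool + produced, key=objective)[:pool_size]
    return pool[0]                           # step 12 and output


def decomposition(pool, rng):
    """Split every element x into the two parts r*x and (1 - r)*x."""
    r = rng.random()
    return ([[r * x for x in e] for e in pool] +
            [[(1.0 - r) * x for x in e] for e in pool])


def sphere(x):
    """Test function with its minimum value 0 at the origin."""
    return sum(v * v for v in x)


best = chemical_reaction_search(sphere, dims=3, lower=-5.0, upper=5.0,
                                reactions=[decomposition])
```

Passing the reactions as a list mirrors the statement that the algorithm may trigger only one reaction or all of them, depending on the problem.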


All nature-inspired paradigms have their own way 
to encode candidate solutions. When these parameters 
are defined, a set of processes or procedures are applied 
to lead the population to an optimal result. The main 
components of this chemical reaction algorithm are de- 
scribed below. 


80.2.1 Elements/Compounds 


These are the basic components of the algorithm. Each 
element or compound represents a solution within the 
search space. The initial definition of elements and/or
compounds depends on the problem itself, and they can be
represented as binary numbers, integers, floating-point numbers, etc.
They interact with each other implicitly; that is, the 
definition of the interaction is independent of the real 
molecular structure. In this approach the potential and 
kinetic energies and other molecular characteristics are 
not taken into account. 




80.2.2 Chemical Reactions 


A chemical reaction is a process in which at least
one substance changes its composition and its set of
properties. In this approach, the chemical reactions behave
as intensifying (single- and double-substitution
reactions) and diversifying (synthesis and decomposition
reactions) mechanisms. The four chemical reactions
considered in this approach are the synthesis, decomposition,
single-substitution, and double-substitution reactions. The
objective of these operators is to explore or exploit new 
possible solutions within a slightly larger hypercube 
than the original elements/compounds, but within the 
previously specified range. 

The synthesis and decomposition reactions are used 
to diversify the resulting solutions; these procedures 
were shown to be highly effective and to rapidly lead 
the results to a desired value. They can be described as 
follows. 


80.2.3 Synthesis Reactions 


This is a reaction of two reactants to produce one 
product. By combining two (or more) elements, this 
procedure allows us to explore higher-valued solutions 
within the search space. The result can be described as
a compound (B + C → BC). The pseudocode for the
synthesis reaction procedure is as follows:


Algorithm 80.2 Synthesis_Procedure
Input: selected_elements, synthesis_rate
1: n = size(selected_elements)
2: i = floor(n/2)
3: for j = 1 to i − 1
4:   Synthesis = selected_elements_j + selected_elements_{j+1}
5:   j = j + 2
6: end for
Output: Synthesis_vector
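One possible reading of Algorithm 80.2 is that consecutive pairs of selected elements are added to form one compound each. The sketch below implements that reading for scalar elements; the pairing scheme is an interpretation of the loop, and synthesis_rate is left out because the pseudocode does not show how it is used.

```python
def synthesis(selected_elements):
    """Combine consecutive pairs (B, C) into one compound B + C.

    An unpaired last element, if any, does not react.
    """
    pairs = len(selected_elements) // 2
    return [selected_elements[2 * j] + selected_elements[2 * j + 1]
            for j in range(pairs)]


synthesis_vector = synthesis([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because two elements are summed, the products tend to be larger in magnitude than the reactants, which is consistent with the description of synthesis as a way to explore higher-valued solutions.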


80.2.4 Decomposition Reactions 


In this reaction, typically, only one reactant is given,
which allows a compound to be decomposed into
smaller instances (BC → B + C). The pseudocode for
the decomposition reaction procedure is as follows:


Algorithm 80.3 Decomposition_Procedure
Input: selected_elements, decomposition_rate
1: n = size(selected_elements)
2: Get randval randomly in interval [0, 1]
3: for i = 1 to n
4:   Deco1_i = selected_elements_i × randval
5:   Deco2_i = selected_elements_i × (1 − randval)
6:   i = i + 1
7: end for
Output: Decomposition_vector (Deco1, Deco2)


Table 80.1 Main elements of several nature-inspired paradigms

Paradigm  Parameter            Basic operations
          representation
GA        Genes                Crossover, mutation
ACO       Ants                 Pheromone
PSO       Particles            Cognitive, social coefficients
GP        Trees                Crossover, mutation (in some cases)
CRM       Elements, compounds  Reactions (combination, decomposition,
                               substitution, double-substitution)
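Algorithm 80.3 translates almost directly into Python. In this sketch the elements are scalars, one random value is drawn for the whole call as in step 2, and decomposition_rate is omitted because its use is not shown in the pseudocode; the optional `rng` argument is added here only to make the example reproducible.

```python
import random


def decomposition(selected_elements, rng=random):
    """Split each compound x into x*randval and x*(1 - randval)."""
    randval = rng.random()
    deco1 = [x * randval for x in selected_elements]
    deco2 = [x * (1.0 - randval) for x in selected_elements]
    return deco1, deco2


compounds = [4.0, -2.0, 10.0]
deco1, deco2 = decomposition(compounds, rng=random.Random(0))
```

The two parts of every compound add up to the original value, so each product is smaller in magnitude than its reactant.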


The single and double-substitution reactions allow 
the algorithm to search for optima around a previously 
found good solution and they are described below. 


80.2.5 Single-Substitution Reactions 


When a free element reacts with a compound of dif- 
ferent elements, the free element will replace one of 
the elements in the compound if the free element is 
more reactive than the element it replaces. A new com- 
pound and a new free element are produced. In the 
algorithm, a compound and an element are selected and 
a decomposition reaction is applied to the compound; 
two elements are generated from this operation. Then, 
one of the new generated elements is combined with 
the non-decomposed selected element (C + AB → AC
+ B). The pseudocode for the single-substitution
reaction procedure is as follows:


Algorithm 80.4 SingleSubstitution_Procedure
Input: selected_elements, singlesubstitution_rate
1: n = size(selected_elements)
2: i = floor(n/2)
3: a = selected_elements[1], selected_elements[2], ..., selected_elements[i]
4: b = selected_elements[i+1], selected_elements[i+2], ..., selected_elements[i×2]
5: Apply Decomposition_Procedure to a; get Deco1, Deco2
6: Apply Synthesis_Procedure (b + Deco1); get Synthesis_vector
Output: SingleSubstitution_vector (Synthesis_vector, Deco2)
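
A possible reading of the single-substitution procedure is sketched below in Python. It assumes that the first half of the selected pool plays the role of the compounds AB, the second half the free elements C, and that step 6 pairs each free element with the corresponding first decomposition part; these pairing details and the name `single_substitution` are assumptions, since the pseudocode leaves them open.

```python
import random

def single_substitution(selected_elements, rng=random):
    """Sketch of C + AB -> AC + B on tuples of floats (assumed representation)."""
    i = len(selected_elements) // 2
    a = selected_elements[:i]           # compounds to be decomposed (AB)
    b = selected_elements[i:2 * i]      # free elements (C)
    randval = rng.random()
    # Decomposition of every compound in a into two parts.
    deco1 = [tuple(x * randval for x in e) for e in a]
    deco2 = [tuple(x * (1.0 - randval) for x in e) for e in a]
    # Synthesis of each free element with the first part of a compound.
    synthesis_vector = [
        tuple(p + q for p, q in zip(c, d)) for c, d in zip(b, deco1)
    ]
    return synthesis_vector, deco2
```

In this reading the total "mass" is conserved: the new compound plus the released element equals the original compound plus the free element.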


80.2.6 Double-Substitution Reactions 


Double-substitution or double-replacement reactions, 
also called double-decomposition reactions or metathe- 
sis reactions, involve two ionic compounds, most often 
in aqueous solution. In this type of reaction, the cations 
simply swap anions; in the algorithm, a similar process 
to that in the previous reaction happens. The difference 
is that in this reaction both of the selected compounds 
are decomposed, and the resulting elements are combined
with each other (AB + CD → CB + AD). The
pseudocode for the double-substitution reaction procedure
is as follows:


Algorithm 80.5 DoubleSubstitution_Procedure
Input: selected_elements, doublesubstitution_rate
1: n = size(selected_elements)
2: i = floor(n/2)
3: a = selected_elements[1], selected_elements[2], ..., selected_elements[i]
4: b = selected_elements[i+1], selected_elements[i+2], ..., selected_elements[i×2]
5: Apply Decomposition_Procedure to a and b; get (Deco1, Deco2), (Deco1', Deco2')
6: Apply Synthesis_Procedure (Deco1 + Deco1'), (Deco2 + Deco2'); get Synthesis_vector, Synthesis_vector'
Output: DoubleSubstitution_vector (Synthesis_vector, Synthesis_vector')


In this chemical reaction algorithm we may trigger
only one reaction or all of them, depending on the
nature of the problem to be solved; e.g., we can apply only
the decomposition reaction subroutine to find the minimum
value of a mathematical function.

Throughout the execution of the algorithm, whenever
a new set of elements/compounds is created, an
elitist reinsertion criterion is applied, which retains
the best elements, and thus the average fitness of the
entire element pool improves over the iterations.

In order to have a better picture of the general
schema for this proposed chemical reaction algorithm,
a comparison with other nature-inspired paradigms is
shown in Table 80.1.


80.3 The Mobile Robot


Mobile robots are nonholonomic systems due to
the constraints imposed on their kinematics. The equations
describing the constraints cannot be integrated
symbolically to obtain explicit relationships between
robot positions in local and global coordinate frames.
Hence, control problems that involve them have attracted
attention in the control community in recent
years [80.25].


Fig. 80.2 Diagram of a wheeled mobile robot



The model considered is that of a unicycle mobile 
robot (see Fig. 80.2) that has two driving wheels fixed 
to the axis and one passive orientable wheel placed in 
front of the axis and normal to it [80.26]. 

The two fixed wheels are controlled independently 
by the motors, and the passive wheel prevents the robot 
from overturning when moving on a plane. 

It is assumed that the motion of the passive wheel
can be neglected in the dynamics of the mobile robot,
which is represented by the following set of equations
[80.18]


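$$
\dot{q} = \begin{bmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} v \\ w \end{bmatrix}, \qquad
M(q)\,\dot{\mathbf{v}} + V(q,\dot{q})\,\mathbf{v} + G(q) = \tau ,
\tag{80.1}
$$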


where $q = [x, y, \theta]^T$ is the vector of generalized
coordinates that describes the robot's position, $(x, y)$ are the
Cartesian coordinates, which denote the mobile center
of mass, and $\theta$ is the angle between the heading direction
and the x-axis (which is taken in counterclockwise
form); $\mathbf{v} = [v, w]^T$ is the vector of velocities, $v$ and $w$ are
the linear and angular velocities, respectively; $\tau \in R^{r}$ is the
input vector, $M(q) \in R^{n \times n}$ is a symmetric and positive-definite
inertia matrix, $V(q, \dot{q}) \in R^{n \times n}$ is the centripetal
and Coriolis matrix, and $G(q) \in R^{n}$ is the gravitational
vector. Equation (80.1) represents the kinematics or
steering system of a mobile robot.


Fig. 80.3 Tracking control structure


Notice that the no-slip condition imposes a nonholonomic
constraint described by (80.2), which
means that the mobile robot can only move in the
direction normal to the axis of the driving wheels.


$$ \dot{y}\cos\theta - \dot{x}\sin\theta = 0 . \tag{80.2} $$
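
To make the kinematic part of (80.1) and the constraint (80.2) concrete, the following Python sketch advances the unicycle pose by one explicit Euler step and evaluates the left-hand side of the no-slip condition. The Euler discretization, the step size, and the function names are assumptions for illustration only; the chapter itself uses a Simulink model.

```python
import math

def unicycle_step(q, v, w, dt):
    """One explicit Euler step of the kinematics q_dot = S(theta) [v, w]^T."""
    x, y, theta = q
    x_dot = v * math.cos(theta)
    y_dot = v * math.sin(theta)
    new_q = (x + dt * x_dot, y + dt * y_dot, theta + dt * w)
    return new_q, (x_dot, y_dot)

def no_slip_residual(theta, x_dot, y_dot):
    """Left-hand side of the no-slip condition; zero when it holds."""
    return y_dot * math.cos(theta) - x_dot * math.sin(theta)
```

Because the Cartesian velocities are generated from v and θ alone, the residual vanishes for any heading, i.e., the robot has no sideways velocity component.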


The control objective will be established as follows:
given a desired trajectory $q_d(t)$ and the orientation of
the mobile robot, we must design a controller that
applies an adequate torque $\tau$ such that the measured
positions $q(t)$ achieve the desired reference $q_d(t)$,
represented as (80.3)

$$ \lim_{t \to \infty} \| q_d(t) - q(t) \| = 0 . \tag{80.3} $$

To reach the control objective, the method is based
on the procedure of [80.18], and we derive a $\tau(t)$ for
a specific $v_c(t)$ that controls the steering system (80.1)
using a fuzzy logic controller (FLC). The general
structure of a tracking control system is presented in
Fig. 80.3.

The control is based on the procedure proposed by
Kanayama et al. [80.5] and Nelson and Cox [80.27]
to solve the tracking problem for the kinematic
model $v_c(t)$. Suppose that the desired trajectory $q_d$
satisfies (80.4)

$$
\dot{q}_d = \begin{bmatrix} \cos\theta_d & 0 \\ \sin\theta_d & 0 \\ 0 & 1 \end{bmatrix} \mathbf{v}_d .
\tag{80.4}
$$

Using the robot local frame (the moving coordinate
system x-y in Fig. 80.1), the error coordinates can be
defined as (80.5)

$$
e = T_e (q_d - q), \qquad
\begin{bmatrix} e_x \\ e_y \\ e_\theta \end{bmatrix} =
\begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_d - x \\ y_d - y \\ \theta_d - \theta \end{bmatrix} .
\tag{80.5}
$$

Moreover, the auxiliary velocity control input that
achieves tracking for (80.1) is given by (80.6)

$$
\mathbf{v}_c = f_c(e, \mathbf{v}_d), \qquad
\begin{bmatrix} v_c \\ w_c \end{bmatrix} =
\begin{bmatrix} v_d \cos e_\theta + k_1 e_x \\ w_d + v_d k_2 e_y + v_d k_3 \sin e_\theta \end{bmatrix} ,
\tag{80.6}
$$

where $k_1$, $k_2$, and $k_3$ are positive gain constants.

The first part of this work is to apply the proposed
method to obtain the values of $k_i$ (i = 1, 2, 3) to achieve
the optimal behavior of the controller, and the second
part is to optimize the fuzzy controller.


80.4 Fuzzy Logic Controller


The purpose of the fuzzy logic controller (FLC) is to
find a control input $\tau$ such that the current velocity
vector $\mathbf{v}$ is able to reach the velocity vector $\mathbf{v}_c$, and this is
denoted as

$$ \lim_{t \to \infty} \| \mathbf{v}_c - \mathbf{v} \| = 0 . \tag{80.7} $$
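
The reference velocity that the FLC has to track comes from the error transformation (80.5) and the auxiliary velocity law (80.6). A minimal Python sketch is given below; it assumes the Kanayama-type form $v_c = v_d \cos e_\theta + k_1 e_x$, $w_c = w_d + v_d k_2 e_y + v_d k_3 \sin e_\theta$, and the function names are hypothetical.

```python
import math

def tracking_error(q, q_d):
    """Pose error expressed in the robot local frame, e = T_e (q_d - q)."""
    x, y, theta = q
    x_d, y_d, theta_d = q_d
    dx, dy = x_d - x, y_d - y
    e_x = math.cos(theta) * dx + math.sin(theta) * dy
    e_y = -math.sin(theta) * dx + math.cos(theta) * dy
    e_theta = theta_d - theta
    return e_x, e_y, e_theta

def auxiliary_velocity(e, v_d, w_d, k1, k2, k3):
    """Auxiliary velocity command (v_c, w_c) for positive gains k1, k2, k3."""
    e_x, e_y, e_theta = e
    v_c = v_d * math.cos(e_theta) + k1 * e_x
    w_c = w_d + v_d * k2 * e_y + v_d * k3 * math.sin(e_theta)
    return v_c, w_c
```

When the pose error is zero, the command reduces to the desired velocities $(v_d, w_d)$; the gains only act while an error is present, which is why their values determine how quickly the error decays.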


The input variables of the FLC correspond to the velocity
errors obtained in (80.5) using the derivatives of
the position and angular errors (denoted as $e_v$ and $e_w$).
The initial membership functions (MF) are defined by
one triangular and two trapezoidal functions for each
variable involved. Figure 80.4 depicts the MFs in which
N, Z, P represent the fuzzy sets (negative, zero, and positive,
respectively) associated with each input and output
variable.


Fig. 80.4a,b Membership functions of (a) input $e_v$ and $e_w$,
and (b) output variables F and N


The rule set of the FLC contains nine rules,
which govern the input-output relationship of the
FLC, and this adopts the Mamdani-style inference
engine. We use the center of gravity method to realize
the defuzzification procedure. In Table 80.2 we
present the rule set, whose format is established as follows:


Rule i: If $e_v$ is G1 and $e_w$ is G2
then F is G3 and N is G4,


where G1 ... G4 are the fuzzy sets associated
with each variable and i = 1 ... 9. In this case, P denotes
positive, N denotes negative, and Z denotes
zero.


Table 80.2 Fuzzy rule set

e_v \ e_w   N     Z     P
N           N/N   N/Z   N/P
Z           Z/N   Z/Z   Z/P
P           P/N   P/Z   P/P


80.5 Experimental Results


Several tests of the chemical optimization paradigm
were run to assess the performance of the tracking controller.
First, we need to find the values of $k_i$ (i = 1, 2,
3) shown in (80.6), which will guarantee convergence
of the error e to zero.

To evaluate the constants obtained by the algorithm,
the mobile robot tracking system, which consists
of (80.5) and (80.6), was modeled using Simulink.
Figure 80.5 shows the closed loop for the tracking controller.

The conditions to evaluate each result, which
correspond to the final position error, are given
by (80.8):

$$ \left| \frac{1}{n} \sum_{i=1}^{n} \left( e_x + e_y + e_\theta \right) \right| . \tag{80.8} $$



For the first set of experiments only the decomposition
reaction mechanism was triggered and the decomposition
factor was varied; this factor is the number
of resulting elements after applying a decomposition
reaction to a given compound. Let x be the selected
compound and $x_i$ (i = 1, 2, ..., n) the resulting elements.
The only restriction is that the sum of all values
found in the decomposition must be equal to the value
of the original compound. This is shown in (80.9)

$$ \sum_{i=1}^{n} x_i = x . \tag{80.9} $$

Each experiment was executed 35 times and the test
parameters for each set of experiments can be observed
in Table 80.3.
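
The restriction that the decomposed parts add up to the original compound can be illustrated with a short Python sketch. How the individual parts are drawn is not stated in the text; the version below assumes a uniformly random split, and the name `decompose_value` is hypothetical.

```python
import random

def decompose_value(x, dec_factor, rng=random):
    """Split x into dec_factor parts whose sum equals x.

    Assumption: the split points are drawn uniformly at random in [0, 1).
    """
    cuts = sorted(rng.random() for _ in range(dec_factor - 1))
    bounds = [0.0] + cuts + [1.0]
    # Each part is x scaled by the width of one sub-interval of [0, 1].
    return [x * (hi - lo) for lo, hi in zip(bounds, bounds[1:])]
```

For a decomposition factor of 3, a compound with value 10 yields three elements that sum to 10 up to floating-point rounding.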


Fig. 80.5 Closed loop for the tracking controller system



Table 80.3 Parameters of the chemical reaction optimization

No.   Elements   Iterations   Dec. factor   Dec. rate
1     2          10           [illegible]   0.3
2     5          10           3             0.3
3     2          10           2             0.4
4     2          10           3             0.4
5     5          10           2             0.4
6     5          10           3             0.4
7     5          10           2             0.5
8     10         10           2             0.5


Table 80.4 Experimental results of the proposed method
for optimizing the values of the gains k1, k2, and k3

No.   Best error              Mean     k1            k2            k3
1     0.0086                  1.1568   [illegible]   [illegible]   [illegible]
2     4.79 × 10^[illegible]   0.1291   [illegible]   [illegible]   [illegible]
3     0.0025                  0.5809   36            328           88
4     0.0012                  0.5589   [illegible]   [illegible]   [illegible]
5     0.0035                  0.0480   185           29            3
6     [illegible]             0.0299   [illegible]   [illegible]   [illegible]
7     0.0066                  0.1440   29            15            0
8     0.0019                  0.1625   51            3             0


Fig. 80.6 Final position errors in x, y, and θ for experiment
number 6


Table 80.5 Comparison of the best results

Parameters                 Genetic algorithm   Chemical optimization algorithm
Individuals                5                   2
Iterations                 [illegible]         10
Crossover rate             0.8                 N/A
Mutation rate              0.1                 N/A
Synthesis rate             N/A                 0.2
Decomposition rate         N/A                 0.8
Substitution rate          N/A                 0.6
Double substitution rate   N/A                 0.6
k1, k2, k3                 43, 493, 195        36, 328, 88
Final error                0.006734            0.0025


Table 80.6 Parameters of the simulations for Type-1 FLC

Parameters         Value
Elements           10
Trials             15
Selection method   Stochastic universal sampling
k1                 117
k2                 226
k3                 137
Error              0.077178


Table 80.7 Parameters of the simulations for Type-2 FLC

Parameters         Value
Elements           10
Trials             10
Selection method   Stochastic universal sampling
k1                 117
k2                 226
k3                 137
Error              2.7736


The decomposition rate (Dec. rate) represents the
percentage of the pool that are candidates for the
decomposition and the decomposition factor (Dec. factor)
is the number of elements that each candidate is to be
decomposed into.

The selection strategy applied was stochastic universal
sampling, which uses a single random value to
sample all of the solutions by choosing them at evenly
spaced intervals. In the example, for a pool containing
five initial compounds, the vector length of decomposed
elements when the decomposition factor is 3 and the
decomposition rate is 0.4 will be six elements.
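
Both statements above can be sketched in Python. The first helper reproduces the arithmetic of the example (5 compounds × 0.4 = 2 candidates, each decomposed into 3 elements, giving 6), and the second shows a generic form of stochastic universal sampling. The function names, the rounding rule, and the use of non-negative selection weights (for a minimization problem these would have to be derived from the error values) are assumptions, not details given in the chapter.

```python
import random

def decomposed_vector_length(pool_size, dec_rate, dec_factor):
    """Number of elements produced by decomposing the candidate share of the pool."""
    candidates = round(pool_size * dec_rate)  # rounding rule is an assumption
    return candidates * dec_factor

def stochastic_universal_sampling(weights, n_select, rng=random):
    """Select n_select indices with one random offset and evenly spaced pointers."""
    total = sum(weights)
    step = total / n_select
    start = rng.uniform(0.0, step)            # the single random value
    pointers = [start + k * step for k in range(n_select)]
    chosen, idx, cumulative = [], 0, weights[0]
    for p in pointers:
        # Advance to the individual whose cumulative-weight slot contains p.
        while p > cumulative and idx < len(weights) - 1:
            idx += 1
            cumulative += weights[idx]
        chosen.append(idx)
    return chosen
```

Because the pointers are evenly spaced, an individual holding a share s of the total weight is selected either ⌊s·n⌋ or ⌈s·n⌉ times, which gives less spread than repeated roulette-wheel draws.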

By applying this criterion, the initial pool of ele- 
ments increased with every iteration. This is why the 
initial element pool was set to ten elements as the max- 
imum. Table 80.4 shows the results after applying the 
chemical optimization paradigm. 

As can be observed in Table 80.4, experiment number
6 seems to have the best result because it reached the
smallest final error among all experiments. Figure 80.6
shows the final position errors in x, y, and θ for experiment
number 6.


Fig. 80.7 Final position errors in x, y, and θ for experiment
number 3


Fig. 80.8 Position errors in x, y, and θ of best result by applying
GAs



By analyzing the graphical results of several sets of
experiments, we noticed that the control obtained for some
of them was smoother despite the average error value.
This was the case for experiment number 3, in which
the final error value was significantly higher than that
obtained in experiment number 6. Figure 80.7 shows
the final position errors in x, y, and θ for experiment
number 3.

Comparing both graphics, we can observe that the
average error obtained for θ is 0.0338 for experiment
number 6 and 0.0315 for experiment number 3. This
smoother control of the tracking system could make
a big difference in the complete dynamic system of the
mobile robot.

In previous work [80.28], the gain constant values 
were found by means of genetic algorithms. Table 80.5 
shows a comparison of the best results obtained with 
both algorithms, and we can observe that the result with 
the chemical optimization outperforms the GA in find- 
ing the best gain values. 

Figure 80.8 shows the result in Simulink for the 
experiment with the best overall result when applying 
GAs as the optimization method. 

Once we have found optimal values for the gain
constants, the next step is to find the optimal values
for the input/output membership functions of the
fuzzy controller. Our goal is that the linear and angular
velocity errors reach zero in the simulations. Table 80.6
shows the parameters of the simulations for Type-1
FLC.

Figure 80.9 shows the behavior of the chemical
optimization algorithm throughout the experiment.

Figure 80.10 shows the resulting input and output
membership functions found by the proposed optimization
algorithm.

Figure 80.11 shows the trajectory obtained when 
simulating the mobile control system including the ob- 
tained input and output membership functions. 


Fig. 80.9 Best simulation of experiments with the chemical
optimization method (vertical axis: log10(f(x)); horizontal
axis: generation; best individual = 0.077178)


Fig. 80.10a-d Resulting input membership functions:
(a) linear and (b) angular velocities and output (c) right
and (d) left torque


Figure 80.12 shows the best trajectory reached by 
the mobile robot when optimizing the input and output 
membership functions using genetic algorithms. 

A Type-2 FLC was developed using the param- 
eters of the membership functions found for Type-1 
FLC. The parameters searched with the chemical re- 
action algorithm were for the footprint of uncertainty 
(FOU). 

Table 80.7 shows the parameters used in the simula- 
tions and Fig. 80.13 shows the behavior of the chemical 
optimization algorithm throughout the experiment. 


Fig. 80.11 Trajectory obtained when applying the chemical
reaction algorithm


Fig. 80.12 Trajectory obtained using GAs


Figure 80.14 shows the resulting Type-2 input and 
output membership functions found by the proposed 
optimization algorithm and Fig. 80.15 shows the ob- 
tained trajectory reached by the mobile robot. 

As observed in Table 80.7, the final error obtained
is not smaller than the final error found for the Type-1
FLC. Despite this, the trajectory obtained, which is
shown in Fig. 80.15, is acceptable, taking into account
that the reference trajectory is a straight line. In
Fig. 80.16 we can observe an unacceptable trajectory,
which was found in the early attempts of optimization
for the Type-1 FLC applying this chemical reaction
algorithm. Here, we can observe that the parameters
found were not adequate to make the FLC follow the
desired trajectory.


Fig. 80.13 Behavior of the algorithm when optimizing the
Type-2 FLC (vertical axis: log10(f(x)); horizontal axis:
generation)


In order to test the robustness of the Type-1
and Type-2 FLC, we added an external signal given
by (80.10).


$$ F_{ext}(t) = \varepsilon \times \sin(\omega \times t) . \tag{80.10} $$


This represents an external force applied in a period 
of 10 s to the trajectory obtained that will make the mo- 
bile robot move out of its path. The idea of adding this 
disturbance is to measure the errors obtained with the 
FLC and to test the behavior of the mobile robot under 
perturbed torques. Table 80.8 shows the parameters for 
the simulations and the errors obtained during the run 
of the simulation. 
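
A minimal Python sketch of the disturbance signal is shown below. It assumes that the sinusoid ε sin(ωt) is simply added to both wheel torques; the chapter only states that the torques are perturbed, so the way the signal enters the loop and the function names are assumptions.

```python
import math

def external_disturbance(t, epsilon, omega):
    """Sinusoidal external force with amplitude epsilon and angular frequency omega."""
    return epsilon * math.sin(omega * t)

def perturbed_torque(tau, t, epsilon, omega):
    """Add the disturbance to every torque component (assumed coupling)."""
    d = external_disturbance(t, epsilon, omega)
    return tuple(tau_i + d for tau_i in tau)
```

Under this assumption the amplitude ε directly bounds the torque perturbation, which is why larger ε values in Table 80.8 correspond to harder test cases.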

Figure 80.17 shows the trajectories obtained for the
Type-1 FLC optimized with GAs.

Figure 80.18 shows the trajectories obtained for the
Type-1 FLC optimized with the chemical reaction
algorithm (CRA).

Figure 80.19 shows the trajectories obtained for the
Type-2 FLC optimized with the CRA method.

In Table 80.8 and Figs. 80.17 to 80.19 we can
observe that the Type-2 FLC was able to maintain
a more controlled trajectory despite the large error
found by the algorithm (e = 2.7736). For larger
epsilon (ε) values, it was difficult for the Type-1 FLCs
to stay on the path, and after a certain time the
controller was not able to return to the reference
trajectory.


Fig. 80.14a-d Resulting Type-2 input membership
functions, from top to bottom: (a) linear and (b) angular
velocities and output (c) right and (d) left torque


Fig. 80.15 Trajectory obtained for the mobile robot when
applying the chemical reaction algorithm to the Type-2
FLC


Fig. 80.16 Unacceptable trajectory resulting in early
optimization trials


Table 80.8 Simulation parameters and errors obtained under disturbed torques

ε      Velocity errors   Type-1 (GA)     Type-1 (CRA)               Type-2 (CRA)
0.05   Final error       4.0997          0.9815                     29.5115
       Average error     4.1209          1.5823                     26.6408
5      Final error       4.1059          0.9729                     29.52
       Average error     3.1695          1.8679                     26.1646
10     Final error       4.1045          0.9745                     29.5
       Average error     3.0985          1.7438                     24.9467
30     Final error       4.0912          0.9783                     29.51
       Average error     2.2632          1.9481                     24.6032
32     Final error       3273            0.9748                     29.52
       Average error     3.4667 × 10^3   2.8180                     24.6465
34     Final error       1.5705 × 10^4   566.8                      29.51
       Average error     1.1180 × 10^4   215.8198                   24.9211
40     Final error       2.534 × 10^4    3.5417 × 10^[illegible]    29.5
       Average error     186.0611        5.7492 × 10^3              23.8938
41     Final error       8839            3168                       685.1
       Average error     2.0268 × 10^4   0.0503 × 10^3              16.5257


Fig. 80.17a-c From left to right: trajectory obtained with the Type-1 FLC optimized with GAs. (a) ε = 30, (b) ε = 32,
(c) ε = 34


Fig. 80.18a-c From left to right: trajectory obtained with the Type-1 FLC optimized with CRA. (a) ε = 30, (b) ε = 32,
(c) ε = 34


Fig. 80.19a-c From left to right: trajectory obtained with the Type-2 FLC optimized with CRA. (a) ε = 30, (b) ε = 32,
(c) ε = 34


80.6 Conclusions 


In this chapter, we presented simulation results from an
optimization method that mimics chemical reactions
applied to the problem of tracking control. The goal
was to find the gain constants involved in the tracking
controller for the dynamic model of a unicycle mobile
robot. In the figures of the experiments we were
able to note the behavior of the algorithm and the
solutions found through all the iterations. Simulation results
show that the proposed optimization method is able to
outperform the results previously obtained by applying
a genetic algorithm optimization technique. The
optimal fuzzy logic controller obtained with the proposed
chemical paradigm is able to reach smaller error
values in less time than genetic algorithms. Also, the
Type-2 fuzzy controller was able to perform better in
the presence of disturbance in this problem despite the
large error obtained (e = 2.7736). The design of optimal
Type-2 fuzzy controllers is currently in progress.


References



80.1 N.-Y. Shi, C.-P. Chu: A molecular solution to the hitting-set problem in DNA-based supercomputing, Inf. Sci. 180, 1010-1019 (2010)

80.2 L. Yamamoto: Evaluation of a catalytic search algorithm, Proc. 4th Int. Workshop Nat. Inspired Coop. Strateg. Optim., NICSO 2010 (2010) pp. 75-87

80.3 T. Meyer, L. Yamamoto, W. Banzhaf, C. Tschudin: Elongation control in an algorithmic chemistry, Lect. Notes Comput. Sci. 5777, 273-280 (2010)

80.4 J. Xu, A.Y.S. Lam, V.O.K. Li: Chemical reaction optimization for the grid scheduling problem, IEEE Commun. Soc., ICC 2010 (2010) pp. 1-5

80.5 Y. Kanayama, Y. Kimura, F. Miyazaki, T. Noguchi: A stable tracking control method for a non-holonomic mobile robot, Proc. IEEE/RSJ Int. Workshop Intell. Robot. Syst., Osaka (1991) pp. 1236-1241

80.6 T.-C. Lee, C.H. Lee, C.-C. Teng: Tracking control of mobile robots using the backstepping technique, Proc. 5th Int. Conf. Contr. Automat. Robot. Vis., Singapore (1998) pp. 1715-1719

80.7 T.-C. Lee, K. Tai: Tracking control of unicycle-modeled mobile robots using a saturation feedback controller, IEEE Trans. Control Syst. Techn. 9(2), 305-318 (2001)

80.8 S. Bentalba, A. El Hajjaji, A. Rachid: Fuzzy control of a mobile robot: A new approach, IEEE Int. Conf. Control Appl., Hartford (1997) pp. 69-72

80.9 S. Ishikawa: A method of indoor mobile robot navigation by fuzzy control, Proc. Int. Conf. Intell. Robot. Syst., Osaka (1991) pp. 1013-1018

80.10 T.H. Lee, F.H.F. Leung, P.K.S. Tam: Position control for wheeled mobile robot using a fuzzy controller, 25th Annu. Conf. IEEE, San Jose (1999) pp. 525-528

80.11 S. Pawlowski, P. Dutkiewicz, K. Kozlowski, W. Wroblewski: Fuzzy logic implementation in mobile robot control, 2nd Workshop Robot Motion Control (2001) pp. 65-70

80.12 C.-C. Tsai, H.-H. Lin, C.-C. Lin: Trajectory tracking control of a laser-guided wheeled mobile robot, Proc. IEEE Int. Conf. Control Appl., Taipei (2004) pp. 1055-1059

80.13 S.V. Ulyanov, S. Watanabe, V.S. Ulyanov, K. Yamafuji, L.V. Litvintseva, G.G. Rizzotto: Soft computing for the intelligent robust control of a robotic unicycle with a new physical measure for mechanical controllability, Soft Comput. 2, 73-88 (1998)

80.14 R. Fierro, F.L. Lewis: Control of a nonholonomic mobile robot using neural networks, IEEE Trans. Neural Netw. 9(4), 589-600 (1998)

80.15 K.T. Song, L.H. Sheen: Heuristic fuzzy-neural network and its application to reactive navigation of a mobile robot, Fuzzy Sets Syst. 110(3), 331-340 (2000)

80.16 A.M. Bloch, S. Drakunov: Tracking in non-holonomic dynamic system via sliding modes, Proc. IEEE Conf. Decis. Control, Brighton (1991) pp. 1127-1132

80.17 D. Chwa: Sliding-mode tracking control of non-holonomic wheeled mobile robots in polar coordinates, IEEE Trans. Control Syst. Tech. 12(4), 633-644 (2004)

80.18 R. Fierro, F.L. Lewis: Control of a nonholonomic mobile robot: Backstepping kinematics into dynamics, Proc. 34th Conf. Decis. Control, New Orleans (1995) pp. 3805-3810

80.19 T. Fukao, H. Nakagawa, N. Adachi: Adaptive tracking control of a non-holonomic mobile robot, IEEE Trans. Robot. Autom. 16(5), 609-615 (2000)

80.20 A.R. Sahab, M.R. Moddabernia: Backstepping method for a single-link flexible-joint manipulator using genetic algorithm, IJICIC 7(7B), 4161-4170 (2011)

80.21 J. Yu, Y. Ma, B. Chen, H. Yu, S. Pan: Adaptive neural position tracking control for induction motors via backstepping, IJICIC 7(7B), 4503-4516 (2011)

80.22 L. Astudillo, O. Castillo, L. Aguilar: Intelligent control for a perturbed autonomous wheeled mobile robot: A type-2 fuzzy logic approach, Nonlinear Stud. 14(1), 37-48 (2007)

80.23 R. Martinez, O. Castillo, L. Aguilar: Optimization of type-2 fuzzy logic controllers for a perturbed autonomous wheeled mobile robot using genetic algorithms, Inf. Sci. 179(13), 2158-2174 (2009)

80.24 O. Castillo, R. Martinez-Marroquin, P. Melin, J. Soria: Comparative study of bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for an autonomous mobile robot, Stud. Comput. Intell. 256, 247-262 (2009)

80.25 I. Kolmanovsky, N.H. McClamroch: Developments in nonholonomic control problems, IEEE Control Syst. Mag. 15, 20-36 (1995)

80.26 G. Campion, G. Bastin, B. D'Andrea-Novel: Structural properties and classification of kinematic and dynamic models of wheeled mobile robots, IEEE Trans. Robot. Autom. 12(1), 47-62 (1996)

80.27 W. Nelson, I. Cox: Local path control for an autonomous vehicle, Proc. IEEE Conf. Robotics Autom. (1988) pp. 1504-1510

80.28 S. Oh, H. Jang, W. Pedrycz: A comparative experimental study of type-1/type-2 fuzzy cascade controller based on genetic algorithms and particle swarm optimization, Expert Syst. Appl. 38(9), 11217-11229 (2011)


81. Bio-Inspired Optimization Methods 


Fevrier Valdez 


Although graphic processing units (GPUs) have 
been traditionally used only for computer graphics, 
a recent technique called general-purpose com- 
puting on graphics processing units allows GPUs to 
perform numerical computations usually handled 
by the CPU (central processing unit). The advantage 
of using GPUs for general purpose computation is 
the performance speedup that can be achieved 
due to the parallel architecture of these devices. 
This chapter describes the use of bio-inspired optimization
methods such as particle swarm optimization
and genetic algorithms on GPUs to demonstrate
the performance that can be achieved using this
technology, primarily in comparison with CPUs.


81.1 Bio-Inspired Methods 


In this chapter we describe the optimization of a set of
mathematical functions using bio-inspired algorithms.
We use genetic algorithms (GAs), particle swarm
optimization (PSO), simulated annealing (SA), and pattern
search (PS) to optimize the functions. The main
idea is to compare these metaheuristic methods using
the CPU and GPUs. Several approaches have
been taken to optimize mathematical functions, see, for
example, [81.1-6]. Our approach, however, differs from
these approaches because we compare the execution
of the methods on CPUs and GPUs, with the aim of
obtaining the results quickly.


81.1 Bio-Inspired Methods
81.2 Bio-Inspired Optimization Methods
     81.2.1 Genetic Algorithms
     81.2.2 Particle Swarm Optimization
     81.2.3 Simulated Annealing
     81.2.4 Pattern Search
81.3 A Brief History of GPUs
     81.3.1 CUDA
81.4 Experimental Results
81.5 Conclusions
References


The main contribution of this work is the proposed 
approach for the implementation of bio-inspired opti- 
mization techniques on GPUs for optimization appli- 
cations. The approach is illustrated with mathematical 
function optimization, but could be applicable to other 
problems. 

The introduction to the proposed method is followed
by a description of bio-inspired methods in
Sect. 81.2. In Sect. 81.3, a brief history of GPUs
is presented, in Sect. 81.4 the experimental results
are shown, and in Sect. 81.5 the conclusions are
presented.


81.2 Bio-Inspired Optimization Methods 


To compare the performance on a CPU or a GPU, it is
necessary to evaluate the methods with optimization
problems. Some basic concepts of bio-inspired optimization
are needed to understand the differences in the
corresponding algorithms. Therefore, in this section we offer
a brief description of the bio-inspired optimization
methods used in this work. The methods used are
described in the following sections.




81.2.1 Genetic Algorithms 


Holland, from the University of Michigan, initiated his
work on genetic algorithms at the beginning of the
1960s. His first achievement was the publication of
Adaptation in Natural and Artificial Systems [81.7] in
1975.

He had two goals in mind: to improve the under- 
standing of the natural adaptation process and to design 
artificial systems having properties similar to natural 
systems [81.8]. 

The basic idea is as follows: the genetic pool of 
a given population potentially contains the solution, or 
a better solution, to a given adaptive problem. This so- 
lution is not active because the genetic combination on 
which it relies is split between several subjects. Only 
the association of different genomes can lead to the so- 
lution. 

Holland’s method is especially effective because it
not only considers the role of mutation, but also uses
genetic recombination (crossover) [81.9]. The essence
of the GA in both theoretical and practical domains has
been well demonstrated [81.1, 10]. The concept of applying
a GA to solve engineering problems is feasible
and sound. However, despite the distinct advantages of
a GA for solving complicated, constrained, and multiobjective
functions where other techniques may have
failed, the full power of the GA application is yet to
be exploited [81.11, 12].
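The interplay of selection, crossover, and mutation described above can be illustrated with a minimal sketch. It is not the implementation used in this chapter: the real-valued encoding, tournament selection, uniform crossover, Gaussian mutation, elitism, and all parameter values (population size, rates, bounds) are assumptions chosen only to keep the example short.

```python
import random


def sphere(x):
    """De Jong's sphere function; its global minimum is 0 at the origin."""
    return sum(v * v for v in x)


def genetic_algorithm(fitness, dim=4, pop_size=30, generations=100,
                      crossover_rate=0.9, mutation_rate=0.1,
                      bounds=(-5.12, 5.12), seed=0):
    """Minimize `fitness` with a small real-coded GA (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)

    for _ in range(generations):
        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) < fitness(b) else b

        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene comes from either parent.
                child = [g1 if rng.random() < 0.5 else g2
                         for g1, g2 in zip(p1, p2)]
            else:
                child = p1[:]
            # Gaussian mutation, clipped to the search bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=fitness)

    return best, fitness(best)


if __name__ == "__main__":
    solution, value = genetic_algorithm(sphere)
    print(solution, value)
```

Because of the elitism step, the best fitness in the population never gets worse from one generation to the next; crossover is what allows partial solutions held by different individuals to be combined.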


81.2.2 Particle Swarm Optimization 


Particle swarm optimization (PSO) is a population- 
based stochastic optimization technique that was devel- 
oped by Eberhart and Kennedy in 1995, inspired by the 
social behavior of bird flocking or fish schooling [81.3]. 

PSO shares many similarities with evolutionary 
computation techniques such as GAs [81.13]. The sys- 
tem is initialized with a population of random solu- 
tions and searches for optima by updating generations. 
However, unlike the GA, the PSO has no evolution 
operators such as crossover and mutation. In PSO, 
the potential solutions, called particles, fly through the 
problem space by following the current optimum parti- 
cles [81.14]. 

Each particle keeps track of the coordinates in the
problem space that are associated with the best solution
(fitness) it has achieved so far (the fitness value
is also stored). This value is called pbest. Another best
value that is tracked by the particle swarm optimizer
is the best value obtained so far by any particle in the
neighborhood of the particle. This location is called lbest.
When a particle takes the whole population as its topological
neighbors, the best value is a global best and
is called gbest.

The particle swarm optimization concept consists
of, at each time step, changing the velocity of (accelerating)
each particle toward its pbest and lbest locations
(the local version of PSO). Acceleration is weighted by
a random term, with separate random numbers being
generated for acceleration toward the pbest and lbest
locations.
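The velocity update of the local (lbest) version can be sketched as follows. This is an illustrative sketch rather than the code used for the experiments: the ring neighborhood, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, and the search bounds are assumptions that do not appear in the text.

```python
import random


def sphere(x):
    """De Jong's sphere function; its global minimum is 0 at the origin."""
    return sum(v * v for v in x)


def pso_lbest(fitness, dim=4, n_particles=20, iters=100,
              w=0.7, c1=1.5, c2=1.5, bounds=(-5.12, 5.12), seed=0):
    """Minimize `fitness` with a local-best PSO on a ring topology."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # best position of each particle
    pbest_val = [fitness(p) for p in pos]     # and its fitness value

    for _ in range(iters):
        for i in range(n_particles):
            # Ring neighborhood: the particle and its two adjacent particles.
            hood = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            lbest = pbest[min(hood, key=lambda k: pbest_val[k])]
            for d in range(dim):
                # Separate random numbers weight the pull toward pbest and lbest.
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (lbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val

    g = min(range(n_particles), key=lambda k: pbest_val[k])
    return pbest[g], pbest_val[g]


if __name__ == "__main__":
    print(pso_lbest(sphere))
```

Replacing the ring neighborhood by the whole swarm turns lbest into gbest and gives the global version of the method.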

In the past several years, PSO has been successfully
applied in many research and application areas. It has
been reported that PSO can obtain better results faster
and at lower cost than other methods [81.15].


81.2.3 Simulated Annealing 


Simulated annealing (SA) is a generic probabilistic metaheuristic for the
global optimization problem of applied mathematics,
namely locating a good approximation to the global op- 
timum of a given function in a large search space. It is 
often used when the search space is discrete (e.g., all 
tours that visit a given set of cities). For certain prob- 
lems, simulated annealing may be more effective than 
exhaustive enumeration provided that the goal is merely 
to find an acceptably good solution in a fixed amount of 
time, rather than the best possible solution. 

The name and inspiration come from annealing 
in metallurgy, a technique involving heating and con- 
trolled cooling of a material to increase the size of its 
crystals and reduce their defects. The heat causes the 
atoms to become unstuck from their initial positions 
(a local minimum of the internal energy) and wan- 
der randomly through states of higher energy; the slow 
cooling gives them more chances of finding configu- 
rations with lower internal energy than the initial one. 
By analogy with this physical process, each step of the 
SA algorithm replaces the current solution by a random 
nearby solution, chosen with a probability that depends 
both on the difference between the corresponding func- 
tion values and also on a global parameter T (called the 
temperature), which is gradually decreased during the 
process. The dependency is such that the current so- 
lution changes almost randomly when T is large, but 
increasingly downhill as T goes to zero [81.16]. 
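The role of the temperature can be made concrete with a short sketch. It assumes the commonly used Metropolis acceptance rule, in which a worse candidate is accepted with probability exp(-delta/T), and a geometric cooling schedule; the text does not state which acceptance rule or schedule was used, so both, together with the step size and parameter values, are assumptions.

```python
import math
import random


def simulated_annealing(f, x0, t0=1.0, cooling=0.95, steps=2000,
                        step_size=0.5, seed=0):
    """Minimize `f` starting from `x0` (illustrative sketch)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = x[:], fx
    t = t0
    for _ in range(steps):
        # Random nearby solution.
        cand = [v + rng.uniform(-step_size, step_size) for v in x]
        fc = f(cand)
        delta = fc - fx
        # Downhill moves are always accepted; uphill moves are accepted
        # with a probability that shrinks as the temperature t decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x[:], fx
        t = max(t * cooling, 1e-12)  # geometric cooling with a small floor
    return best, fbest


if __name__ == "__main__":
    start = [3.0, -2.0, 1.5]
    print(simulated_annealing(lambda x: sum(v * v for v in x), start))
```

When `t` is large, `exp(-delta / t)` is close to 1 and almost any move is accepted; as `t` approaches zero, only improving moves survive, which is the behavior described above.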


81.2.4 Pattern Search 


Pattern search (PS) is a family of numerical optimization
methods that do not require the gradient of
the problem to be optimized, and PS can hence
be used on functions that are not continuous or
differentiable. Such optimization methods are also
known as direct-search, derivative-free, or black-box
methods.
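One simple member of this family is a coordinate search that varies one parameter at a time and halves the step when no move improves the objective. The sketch below is an illustration under that assumption, not the PS variant used in the experiments; the function name, the initial step, and the tolerance are hypothetical.

```python
def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free coordinate search with step halving (sketch)."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            # Try a step up and a step down along coordinate i.
            for delta in (step, -step):
                cand = x[:]
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx = cand, fc
                    improved = True
                    break
        if not improved:
            step /= 2.0  # no coordinate move helped: refine the step
    return x, fx


if __name__ == "__main__":
    # A non-differentiable objective: sum of absolute deviations from 1.
    objective = lambda x: sum(abs(v - 1.0) for v in x)
    print(pattern_search(objective, [0.0, 0.0]))
```

Only function values are compared, so the objective does not need to be continuous or differentiable.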

The name pattern search was coined by Hooke and
Jeeves [81.17]. An early and simple PS variant is attributed
to Fermi and Metropolis, from when they worked at
the Los Alamos National Laboratory, as described by
Davidon [81.18], who summarized the algorithm as follows:

They varied one theoretical parameter at a time by
steps of the same magnitude, and when no such
increase or decrease in any one parameter further
improved the fit to the experimental data, they
halved the step size and repeated the process until
the steps were deemed sufficiently small.


81.3 A Brief History of GPUs


We have already looked at how central processors
evolved in both clock speeds and core count. In the
meantime, the state of graphics processing underwent
a dramatic revolution. In the late 1980s and early 1990s,
the growth in popularity of graphically driven operating
systems such as Microsoft Windows helped create a market
for a new type of processor. In the early 1990s,
users began purchasing 2-D display accelerators for
their personal computers. These display accelerators offered
hardware-assisted bitmap operations to assist in
the display and usability of graphical operating systems
[81.19]. From a parallel-computing standpoint,
NVIDIA's release of the GeForce 3 series in 2001 represents
arguably the most important breakthrough in
GPU technology. The GeForce 3 series was the computing
industry’s first chip to implement Microsoft’s then
new DirectX 8.0 standard. This standard required that
compliant hardware contain both programmable vertex
and programmable pixel shading stages. For the
first time, developers had some control over the exact
computations that would be performed on their
GPUs [81.19].


81.3.1 CUDA


In November 2006, NVIDIA unveiled the industry’s
first DirectX 10 GPU, the GeForce 8800 GTX. The
GeForce 8800 GTX was also the first GPU to be built
with NVIDIA’s CUDA architecture. This architecture
included several new components designed strictly for
GPU computing and aimed to alleviate many of the
limitations that prevented previous graphics processors
from being legitimately useful for general-purpose
computation [81.19].


81.4 Experimental Results


This section presents the experimental results obtained
with the optimization methods analyzed in this research.
The main contribution of this paper is to demonstrate
the advantages of using GPUs to calculate complex
processes.

To validate the proposed method we used a set of
five benchmark mathematical functions; all functions
were evaluated with different numbers of dimensions.
In this case, the experimental results were obtained with
32 dimensions.

Table 81.1 shows the definitions of the mathematical
functions used in this paper. The global minimum
of each test function is 0.

Table 81.1 Mathematical functions

Function                 Definition
De Jong’s                $f(x) = \sum_{i=1}^{n} x_i^2$
Rotated hyper-ellipsoid  $f(x) = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{i} x_j \Bigr)^2$
Rosenbrock’s valley      $f(x) = \sum_{i=1}^{n-1} \bigl[ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \bigr]$
Rastrigin’s              $f(x) = 10n + \sum_{i=1}^{n} \bigl( x_i^2 - 10 \cos(2\pi x_i) \bigr)$
Griewank’s               $f(x) = \sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\Bigl( \frac{x_i}{\sqrt{i}} \Bigr) + 1$

Tables 81.2 and 81.3 show the experimental results
for the benchmark mathematical functions used in this
research using the CPU and the GPU to process the GA.
The tables show the results of the evaluations for each
function with 32 dimensions; the average over 50 runs
and the best and worst values obtained can be seen
after execution of the method. The processing
time in seconds is also shown.

Table 81.2 Experimental results with 32 dimensions with GA on a CPU

Function                 Average      Best           Worst      Time (s)
De Jong’s                0.00094      1.14 × 10^-6   0.0056     1.883603
Rotated hyper-ellipsoid  0.05371      0.00228        0.53997    2.015548
Rosenbrock’s valley      3.14677173   3.246497       3.86201    3.001564
Rastrigin’s              82.35724     46.0085042     129.548    1.452212
Griewank’s               0.41019699   0.14192331     0.917367   2.548792

Table 81.3 Experimental results with 32 dimensions with GA on a GPU

Function                 Average      Best           Worst      Time (s)
De Jong’s                0.000084     1.14 × 10^-8   0.00040    0.360003
Rotated hyper-ellipsoid  0.005371     0.00228        0.53997    0.004590
Rosenbrock’s valley      2.325468     1.97548        3.86201    0.005594
Rastrigin’s              70.35724     41.54879       130.598    0.502254
Griewank’s               0.31019699   0.04192331     0.917367   0.920154

Table 81.4 Experimental results with 32 dimensions with PSO on a CPU

Function                 Average         Best            Worst           Time (s)
De Jong’s                [illegible]     3.40 × 10^-12   9.86 × 10^-11   2.5442154
Rotated hyper-ellipsoid  5.42 × 10^-11   1.93 × 10^-12   9.83 × 10^-11   1.2456487
Rosenbrock’s valley      3.2178138       3.1063          3.39178762      1.3659478
Rastrigin’s              34.169712       16.14508        56.714207       3.569871
Griewank’s               0.0114768       9.17 x10~°      0.09483         5.2654587

Table 81.5 Experimental results with 32 dimensions with PSO on the GPU

Function                 Average         Best            Worst           Time (s)
De Jong’s                [illegible]     2.40 x107 !?    9.86 × 10^-11   0.05040454
Rotated hyper-ellipsoid  4.20 × 10^-11   2.30 x10~?      9.83 × 10^-11   0.02045687
Rosenbrock’s valley      3.1071308       2.16020         3.39178762      0.03659470
Rastrigin’s              34.199999       15.14508        53.802564       0.05678710
Griewank’s               0.0201564       9.17 x10~®      0.094831        0.02654580

Table 81.6 Experimental results with 32 dimensions with SA on a CPU

Function                 Average   Best     Worst    Time (s)
De Jong’s                0.1210    0.0400   1.8926   3.0124
Rotated hyper-ellipsoid  0.9800    0.0990   7.0104   3.0215
Rosenbrock’s valley      1.2300    0.4402   10.790   229999)
Rastrigin’s              25.8890   20.101   33.415   3.2145
Griewank’s               0.9801    0.2045   5.5678   4.0555
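For readers who want to reproduce such measurements, the five benchmark functions of Table 81.1 can be written down compactly. The sketch below assumes the standard textbook forms of these functions (sphere, rotated hyper-ellipsoid, Rosenbrock, Rastrigin, Griewank); the function names are illustrative and the code is not the authors' CPU or GPU implementation.

```python
import math


def de_jong(x):
    """De Jong's (sphere) function: sum of squares."""
    return sum(v * v for v in x)


def rotated_hyper_ellipsoid(x):
    """Sum over i of the squared partial sum x_1 + ... + x_i."""
    total, partial = 0.0, 0.0
    for v in x:
        partial += v
        total += partial * partial
    return total


def rosenbrock(x):
    """Rosenbrock's valley; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    """Rastrigin's function; global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v)
                               for v in x)


def griewank(x):
    """Griewank's function; global minimum 0 at the origin."""
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i))
                  for i, v in enumerate(x, start=1))
    return s - p + 1.0


if __name__ == "__main__":
    n = 32  # number of dimensions used in the experiments
    origin, ones = [0.0] * n, [1.0] * n
    for fn, point in [(de_jong, origin), (rotated_hyper_ellipsoid, origin),
                      (rosenbrock, ones), (rastrigin, origin),
                      (griewank, origin)]:
        print(fn.__name__, fn(point))
```

Each function evaluates to 0 at its global minimizer, which gives a quick sanity check before timing any optimizer on it.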

Tables 81.4 and 81.5 show the experimental results
for the benchmark mathematical functions used in
this research using the CPU and the GPU to process
the PSO method. Table 81.4 shows the results of the
evaluations for each function with 32 dimensions
when processing is performed on a CPU; the average
over 50 runs and the best and worst values obtained
after execution of the method can be observed.
The processing time in seconds is also shown.
Table 81.5 shows similar information, but for the PSO
executed on the GPU. The differences between the two
tables are easy to appreciate: the solution quality is
comparable, but the processing times drop from roughly
1.2 to 5.3 s on the CPU to well under 0.1 s on the GPU,
so performance on the GPU is clearly
superior.
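The gain can be quantified as a speedup, the ratio of CPU time to GPU time. The sketch below uses the processing times reported in Tables 81.4 and 81.5; the dictionary layout and the helper name `speedups` are illustrative, and the GPU time for Rastrigin's function is read here as 0.05678710 s, which is an assumption about a digit group that is split in the table.

```python
# Processing times in seconds for PSO, taken from Table 81.4 (CPU)
# and Table 81.5 (GPU).
cpu_time = {
    "De Jong's": 2.5442154,
    "Rotated hyper-ellipsoid": 1.2456487,
    "Rosenbrock's valley": 1.3659478,
    "Rastrigin's": 3.569871,
    "Griewank's": 5.2654587,
}
gpu_time = {
    "De Jong's": 0.05040454,
    "Rotated hyper-ellipsoid": 0.02045687,
    "Rosenbrock's valley": 0.03659470,
    "Rastrigin's": 0.05678710,  # assumed reading of the table entry
    "Griewank's": 0.02654580,
}


def speedups(cpu, gpu):
    """Return CPU time divided by GPU time for every function."""
    return {name: cpu[name] / gpu[name] for name in cpu}


speedup = speedups(cpu_time, gpu_time)

if __name__ == "__main__":
    for name, s in speedup.items():
        print(f"{name}: {s:.1f}x")
```

A speedup above 1 means the GPU run was faster; for these PSO runs every function lies well above that threshold.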

Tables 81.6 and 81.7 show the experimental results
for the benchmark mathematical functions used
in this research using the CPU and the GPU to process
the SA. The tables show the results of the evaluations
for each function with 32 dimensions; the average over
50 runs and the best and worst values obtained after
execution of the method can be seen. The processing
time in seconds is also shown.

Table 81.7 Experimental results with 32 dimensions with SA on a GPU

Function                 Average   Best      Worst     Time (s)
De Jong’s                0.10100   0.06012   1.2699    1.000124
Rotated hyper-ellipsoid  0.81200   0.0891    7.1003    1.001015
Rosenbrock’s valley      1.31200   0.40002   10.1290   1.018787
Rastrigin’s              25.3256   21.100    S225      1.010145
Griewank’s               0.99010   0.3050    6.50678   1.000325

Table 81.8 Experimental results with 32 dimensions with PS on the CPU

Function                 Average   Best      Worst     Time (s)
De Jong’s                0.3528    0.2232    2.0779    4.2521
Rotated hyper-ellipsoid  16.2505   3.1667    25.782    6.2154
Rosenbrock’s valley      4.0568    3.0342    57765     5.2565
Rastrigin’s              31.4203   25.7660   33.9866   3.25654
Griewank’s               0.6897    0.0981    3.5061    2.1548

Table 81.9 Experimental results with 32 dimensions with PS on GPU

Function                 Average   Best      Worst     Time (s)
De Jong’s                0.5208    0.1232    2579      1.1021
Rotated hyper-ellipsoid  16.5005   3.6197    250182    2.1154
Rosenbrock’s valley      4.0588    3.00215   4.2565    2.5105
Rastrigin’s              31.5203   25.4530   33.9866   1.6054
Griewank’s               0.14970   0.00980   3.5061    1.4858

Tables 81.8 and 81.9 show the experimental results
for the benchmark mathematical functions used in this
research using the CPU and GPU to process the PS.
The tables show the results of the evaluations for each
function with 32 dimensions; the average over 50 runs
and the best and worst values obtained after execution
of the method can be seen. The processing
time in seconds is also shown.

Figure 81.1 compares the processing time on the GPU
with that on the CPU, using the best time obtained in
each of the experiments discussed in this paper. The blue
line represents the processing time on the CPU and the
brown line represents the processing time on the GPU. It is
clear that the best times are achieved when the algorithms
are executed on the GPU.

Fig. 81.1 Comparison of results between GPU and CPU (horizontal axis: experiment number; vertical axis: time (s); curves: CPU time and GPU time)




81.5 Conclusions


The analysis of the experimental results of the
bio-inspired methods considered in this paper, the
FPSO+FGA (FPSO: fuzzy particle swarm optimization;
FGA: fuzzy genetic algorithm), leads us to the
conclusion that, for the optimization of these benchmark
mathematical functions, execution on the GPU is
a good alternative, because good results are obtained
more easily and much faster than with
PSO, GA, SA, and genetic pattern search (GPS) on
the CPU [81.14], especially when the number of dimensions
is increased. This is because processing on
GPUs is faster than processing on CPUs. Also, the
experimental results obtained with the use of GPUs
in this research were compared with other similar
approaches [81.20, 21], and good results were achieved
quickly.


References


81.1 K.F. Man, K.S. Tang, S. Kwong: Genetic Algorithms:
Concepts and Designs (Springer, Berlin, Heidelberg
1999)

81.2 R.C. Eberhart, J. Kennedy: A new optimizer using
particle swarm theory, Proc. 6th Int. Symp. Micromach.
Hum. Sci., Nagoya (1995) pp. 39-43

81.3 J. Kennedy, R.C. Eberhart: Particle swarm optimization,
Proc. IEEE Int. Conf. Neural Netw., Piscataway
(1995) pp. 1942-1948

81.4 O. Montiel, O. Castillo, P. Melin, A. Rodriguez,
R. Sepulveda: Human evolutionary model: A new
approach to optimization, Inf. Sci. 177(10), 2075-2098
(2007)

81.5 D. Kim, K. Hirota: Vector control for loss minimization
of induction motor using GA-PSO, Appl. Soft
Comput. 8, 1692-1702 (2008)

81.6 H. Liu, A. Abraham, A.E. Hassanien: Scheduling
jobs on computational grids using a fuzzy particle
swarm optimization algorithm, Future Gener. Comput.
Syst. 26(8), 1336-1343 (2010)

81.7 D.B. Fogel: An introduction to simulated evolutionary
optimization, IEEE Trans. Neural Netw. 5(1), 3-14
(1994)

81.8 D. Goldberg: Genetic Algorithms (Addison Wesley,
Boston 1988)

81.9 C. Emmeche: Garden in the Machine. The Emerging
Science of Artificial Life (Princeton Univ. Press,
Princeton 1994) p. 114

81.10 J.H. Holland: Adaptation in Natural and Artificial
Systems (Univ. of Michigan Press, Ann Arbor
1975)

81.11 T. Bäck, D.B. Fogel, Z. Michalewicz (Eds.): Handbook
of Evolutionary Computation (Oxford Univ. Press,
Oxford 1997)


81.12 O. Castillo, F. Valdez, P. Melin: Hierarchical Genetic
Algorithms for topology optimization in fuzzy
control systems, Int. J. Gen. Syst. 36(5), 575-591
(2007)

81.13 O. Castillo, P. Melin: Hybrid intelligent systems for
time series prediction using neural networks, fuzzy
logic, and fractal theory, IEEE Trans. Neural Netw.
13(6), 1395-1408 (2002)

81.14 F. Valdez, P. Melin: Parallel evolutionary computing
using a cluster for mathematical function optimization,
Fuzzy Information Processing Society
(NAFIPS '07), San Diego (2007) pp. 598-602

81.15 P.J. Angeline: Using selection to improve particle
swarm optimization, Proc. 1998 IEEE World Congr.
Comput. Intell., Anchorage (1998) pp. 84-89

81.16 S. Kirkpatrick, C.J. Gelatt, M. Vecchi: Optimization
by simulated annealing, Science 220(4598), 671-680
(1983)

81.17 R. Hooke, T.A. Jeeves: Direct search solution of numerical
and statistical problems, J. Assoc. Comput.
Mach. (ACM) 8(2), 212-229 (1961)

81.18 W.C. Davidon: Variable metric method for minimization,
SIAM J. Optim. 1(1), 1-17 (1991)

81.19 J. Sanders, E. Kandrot: CUDA by Example: An Introduction
to General-Purpose GPU Programming
(Addison Wesley, Boston 2011)

81.20 M.O. Ali, S.P. Koh, K.H. Chong, A.S. Hamoodi: Design
a PID controller of BLDC motor by using hybrid
genetic-immune, Mod. Appl. Sci. 5(1), 75-85
(2011)

81.21 F. Valdez, P. Melin, O. Castillo: An improved evolutionary
method with fuzzy logic for combining particle
swarm optimization and genetic algorithms,
Appl. Soft Comput. 11(2), 2625-2632 (2011)


Acknowledgements 


A.4 Aggregation Functions on [0,1] 

by Radko Mesiar, Anna Kolesárová, 

Magda Komornikova 
This work was supported by grants APVV-0073-10,
VEGA 1/0143/11, and VEGA 1/0419/13.


A.5 Monotone Measures-Based Integrals 

by Erich Klement, Radko Mesiar 
This work was supported by grants APVV-0073-10,
VEGA 1/0171/12, and GAČR P-402/11/0378.


A.6 The Origin of Fuzzy Extensions 
by Humberto Bustince, Edurne Barrenechea, 
Javier Fernández, Miguel Pagola, Javier Montero 
The work has been supported by projects TIN2013- 
40765-P and TIN2012-32482 of the Spanish Ministry 
of Science. 


A.7 F-Transform 
by Irina Perfilieva 

This work relates to Department of the Navy Grant 
N62909-12-1-7039 issued by Office of Naval Research 
Global. The United States Government has a royalty- 
free license throughout the world in all copyrightable 
material contained herein. Additional support was also 
given by the European Regional Development Fund 
in the IT4Innovations Centre of Excellence project 
(CZ.1.05/1.1.00/02.0070). 


A.8 Fuzzy Linear Programming and Duality 

by Jaroslav Ramik, Milan Vlach 
This work was supported by the European Regional 
Development Fund in the IT4Innovations Centre of Ex- 
cellence project (CZ.1.05/1.1.00/02.0070). 


A.9 Basic Solutions 

of Fuzzy Coalitional Games 

by Tomáš Kroupa, Milan Vlach 
The work of Tomáš Kroupa was supported by the 
grant P402/12/1309 of Czech Science Foundation. The 
work of Milan Vlach was supported by the Czech 
Science Foundation project No. P402/12/G097 DYME 
Dynamic Models in Economics. 


B.10 Basics of Fuzzy Sets 

by János Fodor, Imre Rudas 
The authors have been supported in part by the Hun- 
garian Scientific Research Fund OTKA under contract 
K-106392. 


B.12 Fuzzy Implications: 

Past, Present, and Future 

by Michał Baczyński, Balasubramaniam Jayaram,

Sebastia Massanet, Joan Torrens
S. Massanet and J. Torrens acknowledge the support
by the Spanish Grants MTM2009-10320 and TIN2013-42795-P,
both with FEDER support. B. Jayaram would
like to acknowledge the partial support given by the
Department of Science and Technology, INDIA under the
project SERB/F/2862/2011-12.


B.16 An Algebraic Model of Reasoning 

to Support Zadeh's CWW 

by Enric Trillas 
This chapter is partially supported by the Foundation
for the Advancement of Soft Computing (Mieres, Asturias,
Spain), and by MICINN/Government of Spain,
under project TIN 2011-29827-C02-01. The author is
indebted to Professors Claudio Moraga (Mieres) and
Settimo Termini (Palermo), as well as to Dr. Maria G.
Navarro (Madrid), for their comments and advice on
the contents of this paper.


B.20 Application of Fuzzy Techniques 

to Autonomous Robots 

by Ismael Rodriguez Fdez, Manuel Mucientes, 

Alberto Bugarin Diz 
This work was supported by the Spanish Ministry of 
Economy and Competitiveness under grants TIN2011- 
22935 and TIN2011-29827-C02-02, and the Galician 
Ministry of Education under grant EM2014/012. I. 
Rodriguez-Fdez is supported by the Spanish Ministry of 
Education under the FPU national plan (AP2010-0627). 
This work was also partially supported by the Eu- 
ropean Regional Development Fund (ERDF/FEDER) 
under projects CN2012/151 and GRC2014/030 of the 
Galician Ministry of Education. 




C.21 Foundations of Rough Sets 

by Andrzej Skowron, Andrzej Jankowski, 

Roman Swiniarski 
This work was supported by the Polish National Science
Center, grants DEC-2011/01/B/ST6/03867,
DEC-2011/01/D/ST6/06981, DEC-2012/05/B/ST6/03215,
and DEC-2013/09/B/ST6/01568, as well as by the Polish
National Center for Research and Development
(NCBiR) under the grant SYNAT No. SP/I/1/77065/10
in the frame of the strategic scientific research
and experimental development program: Interdisciplinary
System for Interactive Scientific and Scientific-Technical
Information and the grant No. O ROB/0010/03/001
in the frame of the Defence and Security Programmes
and Projects: Modern Engineering Tools for
Decision Support for Commanders of the State Fire Service
of Poland during Fire and Rescue Operations in
Buildings.


D.32 Kernel Methods 
by Marco Signoretto, Johan Suykens 

• EU: The research leading to these results has received
  funding from the European Research Council
  under the European Union’s Seventh Framework
  Programme (FP7/2007-2013) / ERC AdG
  A-DATADRIVE-B (290923).

• Research Council KUL: GOA/10/09 MaNet, CoE
  PFV/10/002 (OPTEC), BIL12/11T; PhD/Postdoc
  grants

• Flemish Government:

  - FWO: projects: G.0377.12 (Structured systems),
    G.088114N (Tensor based data similarity);
    PhD/Postdoc grants

  - IWT: projects: SBO POM (100031); PhD/Postdoc
    grants

  - iMinds Medical Information Technologies SBO
    2014

• Belgian Federal Science Policy Office: IUAP P7/19
  (DYSCO, Dynamical systems, control and optimization,
  2012-2017)


D.36 Cognitive Architectures and Agents 
by Sébastien Hélie, Ron Sun

This research was supported by the ONR grant
N00014-08-1-0068 to the second author. Requests
for reprints should be addressed to Sébastien Hélie,
Department of Psychological Sciences, Purdue University,
West Lafayette, IN 47907-2081, e-mail:
shelie@purdue.edu; or to Ron Sun, Department of Cognitive
Science, Rensselaer Polytechnic Institute, e-mail:
rsun@rpi.edu.


D.40 Evolving Connectionist Systems: 

From Neuro-Fuzzy-, to Spiking- 

and Neuro-Genetic 

by Nikola Kasabov 
The work presented in this chapter is supported by the
Knowledge Engineering and Discovery Research Institute
(KEDRI, http://www.kedri.info) of the Auckland
University of Technology, New Zealand.


E.43 Genetic Programming 
by James McDermott, Una-May O'Reilly 

JMcD was funded for this research by the Irish Re- 
search Council for Science, Engineering, and Technol- 
ogy, co-funded by Marie Curie. U-MO’R acknowledges 
the support of the Li Ka Shing Foundation, General 
Electric, and the US Department of Energy. Thanks to 
Erik Hemberg for reading early drafts. 


E.45 Estimation of Distribution Algorithms 

by Martin Pelikan, Mark Hauschild, 

Fernando Lobo 
This project was sponsored by the National Sci- 
ence Foundation under grants ECS-0547013 and HS- 
1115352, by the University of Missouri in St. Louis 
through the High Performance Computing Collabora- 
tory sponsored by Information Technology Services, 
and by the University of Missouri Bioinformatics Con- 
sortium (UMBC). Any opinions, findings, and conclu- 
sions or recommendations expressed in this material are 
those of the author(s) and do not necessarily reflect the 
views of the National Science Foundation. 


E.46 Parallel Evolutionary Algorithms 
by Dirk Sudholt 

Part of this work was done while the author was 
a member of CERCIA, University of Birmingham, sup- 
ported by EPSRC grant EP/D052785/1. The research 
leading to these results has received funding from 
the European Union Seventh Framework Programme
(FP7/2007-2013) under grant agreement No. 618091
(SAGE).


E.49 Multi-Objective 

Evolutionary Algorithms 

by Kalyanmoy Deb 
This chapter is an updated version of a recent article by
the author: K. Deb: Recent developments in evolutionary
multi-objective optimization. In: Trends in Multiple
Criteria Decision Analysis, International Series in
Operations Research & Management Science, Vol. 142,
ed. by S. Greco, M. Ehrgott and J. R. Figueira (Springer,
New York 2010), pp. 339-368.


E.50 Parallel Multiobjective 

Evolutionary Algorithms 

by Francisco Luna, Enrique Alba 
This work has been partially funded by the Span- 
ish Ministry of Science and Innovation and FEDER 
under contracts TIN2008-06491-C04-01 (the MSTAR 
project) and TIN2011-28194 (the roadME project), and 
by the Andalusian Government under contract P07- 
TIC-03044 (the DIRICOM project). 


E.51 Many-Objective Problems: 

Challenges and Methods 

by Antonio López Jaimes, Carlos Coello Coello 
The second author acknowledges support from CONA- 
CyT project no. 103570. 


E.52 Memetic and Hybrid 

Evolutionary Algorithms 

by Jhon Amaya, Carlos Cotta Porras, 

Antonio Fernández Leiva 
This work was partially supported by the Spanish 
MICINN under project ANYSELF (TIN2011-28627- 
C04-01) and by Junta de Andalucia under project TIC- 
6083. 


E.54 Stochastic Local Search Algorithms: 

An Overview 

by Holger Hoos, Thomas Stützle
This work has been supported by the Meta-X, an ARC
project funded from the Scientific Research Directorate
of the French Community of Belgium. Holger H. Hoos
acknowledges support provided by a Discovery Grant
from the Natural Sciences and Engineering Research
Council of Canada (NSERC); Thomas Stützle acknowledges
support of the Belgian FNRS, of which he is
a research associate.


E.56 How to Create Generalizable Results 

by Thomas Bartz-Beielstein 
This work was kindly supported by the Federal Min- 
istry of Education and Research (BMBF) under the 
grants MCIOP (FKZ 17N0311) and CIMO (FKZ 
17002X11). 

In addition, the paper and the corresponding R code 
are based on Chiarandini and Goegebeur’s publication 
Mixed models for the analysis of optimization algo- 
rithms [81.3]. 


E.59 Modeling and Optimization 
of Machining Problems 
by Dirk Biermann, Petra Kersting, Tobias Wagner, 
Andreas Zabel 
This work was supported by the Deutsche Forschungsgemeinschaft
(DFG) by funding the Collaborative Research
Center Computational Intelligence (SFB 531).


E.64 Metaheuristic Algorithms 

and Tree Decomposition 

by Thomas Hammerl, Nysret Musliu, 

Werner Schafhauser 
The work was supported by the Austrian Science Fund
(FWF): P20704-N18 and P24814-N23. Moreover, the
research herein is partially conducted within the competence
network Softnet Austria II (www.soft-net.at,
COMET K-Projekt) and funded by the Austrian Federal
Ministry of Economy, Family and Youth (bmwfj), the
province of Styria, the Steirische Wirtschaftsförderungsgesellschaft
mbH. (SFG), and the city of Vienna in
terms of the center for innovation and technology (ZIT).


F.66 Swarm Intelligence in Optimization 

and Robotics 

by Christian Blum, Roderich Groß 
This work was supported by grant TIN2012-37930-C02-02
of the Spanish Government. In addition, C.
Blum acknowledges support from IKERBASQUE, the
Basque Foundation for Science. R. Groß acknowledges
support from the Engineering and Physical Sciences
Research Council (EPSRC, grant no. EP/K033948/1).


F.68 Ant Colony Optimization for the Minimum- 
Weight Rooted Arborescence Problem 
by Christian Blum, Sergi Mateo Bellido 
This work was supported by grant TIN2007-66523 
(FORMALISM) of the Spanish government. 


F.71 Fundamental Collective Behaviors 

in Swarm Robotics 

by Vito Trianni, Alexandre Campo 
Vito Trianni acknowledges support from the H*Swarm
project, funded within the EUROCORES Programme
EuroBioSAS of the European Science Foundation.
The project is partially supported by funds from the 
Italian CNR and the Belgian F.R.S.-FNRS. Alexan- 
dre Campo acknowledges support from the CoCoRo 
project, funded by the Information and Communication 
Technologies programme (call FP7-ICT-2009-6) of the 
European Commission under grant number 270382. 




About the Authors 


Enrique Alba    Chapter E.50

Universidad de Malaga
E.T.S.I. Informatica
Malaga, Spain
eat@lcc.uma.es

Enrique Alba is a Full Professor at the University of Málaga (Spain), where he received
his degree in Engineering in 1992 and his PhD in Computer Science in 1999. He is
an invited professor at INRIA, the University of Luxembourg, and the University of
Ostrava. He has published 80 articles in ISI-ranked journals, 40 papers in LNCS, over
250 refereed conference papers, 11 books, and 39 book chapters.


Jose M. Alonso    Chapter B.14

European Centre for Soft Computing
Cognitive Computing
Mieres, Spain
jose.alonso@softcomputing.es

Jose Alonso received his MSc (2003) and PhD (2007) degrees in Telecommunication
Engineering from the Technical University of Madrid, Spain. He has published
more than 60 papers in international journals and as book chapters. His main
research lines are soft computing, fuzzy logic, computing with perceptions, fuzzy
modeling, interpretable fuzzy systems, knowledge extraction and representation, and
development of free software tools.


Jhon Edgar Amaya    Chapter E.52

Universidad Nacional Experimental del Tachira
Dep. Electronic Engineering
San Cristobal, Venezuela
jedgar@unet.edu.ve

Jhon Edgar Amaya received his degree in Engineering in 1997 from
UNET and an MSc degree in Computation in 2003 from ULA. He
obtained his PhD degree from the University of Malaga in 2011. He is
an Associate Professor in the Department of Electronic Engineering at
UNET. His research interests include evolutionary computation and soft
computing applications in microelectronic devices.


Plamen P. Angelov    Chapter G.75

Lancaster University
School of Computing and Communications
Bailrigg, Lancaster, UK
p.angelov@lancaster.ac.uk

Plamen P. Angelov leads the Data Science Group at the School of
Computing and Communications, Lancaster University, UK. He has
authored or co-authored over 200 peer-reviewed publications in leading
journals, five patents, two research monographs, and several edited
books. His interests are computational intelligence and autonomous
system modeling, identification, and machine learning.


Dirk V. Arnold    Chapter E.44

Dalhousie University
Faculty of Computer Science
Halifax, Nova Scotia, Canada
dirk@cs.dal.ca

Dirk Arnold is a Professor in the Faculty of Computer Science at Dalhousie University.
His research interests include evolutionary computation, image processing, and
computer graphics and animation.


Anne Auger    Chapter E.44

University Paris-Sud Orsay
CR Inria
Orsay Cedex, France
anne.auger@inria.fr

Anne Auger is a permanent researcher at the French National Institute for Research
in Computer Science and Control (INRIA). She received her diploma (2001) and
PhD (2004) in Mathematics from the Paris VI University. Prior to joining INRIA,
she worked for 2 years (2004-2006) at ETH in Zurich. Her main research interest
is stochastic continuous optimization, including theoretical aspects and algorithm
designs.




Davide Bacciu Chapter D.31 


Davide Bacciu received a Laurea degree in Computer Science from 
the University of Pisa in 2003 and a PhD in Computer Science and 
Engineering from IMT Lucca in 2008. Currently, he is with the CI and 
Machine Learning Group, the University of Pisa. His research interests 
include machine learning, graphical models, neural networks, learning in 
structured domains, and machine vision. 


Universita di Pisa 
Dip. Informatica 
Pisa, Italy 
bacciu@di.unipi.it 


Michał Baczynski Chapter B.12

Michał Baczynski received his MSc and PhD degrees in Mathematics
from the University of Silesia, Poland in 1995 and 2000, respectively. He
received the Habilitation degree from the Polish Academy of Sciences
in 2010. His current research interests include fuzzy aggregation
operations, chiefly fuzzy implications, approximate reasoning, fuzzy
systems, and functional equations.

University of Silesia
Inst. Mathematics
Katowice, Poland
michal.baczynski@us.edu.pl


Edurne Barrenechea Chapter A.6

Edurne Barrenechea received the PhD degree in Computer Science from the Public
University of Navarra in 2005, where she is an Associate Professor in the Department
of Automatics and Computation. Her publications comprise more than 30 papers and
about 20 book chapters. Her research interests include fuzzy techniques for image
processing, and medical and industrial applications of soft computing techniques.

Universidad Publica de Navarra
Dep. Automatica y Computacion
Pamplona (Navarra), Spain
edurne.barrenechea@unavarra.es


Thomas Bartz-Beielstein Chapter E.56 For biographical profile, please see the section "About the Part Editors". 


Lubica Benuskova Chapter D.27


Lubica Benuskova is an Associate Professor at the Department of Computer Science 
at the University of Otago, Dunedin, New Zealand. She is also a Professor at 
Comenius University, Bratislava, Slovakia. Her research activities are mainly in the 
area of computational neuroscience. Currently she serves as a member of the Neural 
Networks Technical Committee of the IEEE/CIS.


University of Otago 

Dep. Computer Science 
Dunedin, New Zealand 
lubica@cs.otago.ac.nz 


Dirk Biermann Chapter E.59 


Dirk Biermann studied Mechanical Engineering and obtained his doctoral
degree from the University of Dortmund (now TU Dortmund University).
He has been Head of the Institute of Machining Technology (ISF)
since 2007. He is an associate member of the International Academy of
Production Engineering (CIRP) and of the German Academic Society for
Production Engineering (WGP).

TU Dortmund University
Dep. Mechanical Engineering
Dortmund, Germany
biermann@isf.de


Sašo Blažič Chapter G.75

Sašo Blažič received his BSc, MSc and PhD degrees in 1996, 1999,
and 2002, respectively, from the Faculty of Electrical Engineering,
University of Ljubljana, where he currently holds the position of Full
Professor. His research interests include adaptive, fuzzy, and predictive
control. Recently, he has focused on autonomous mobile systems,
mobile robotics, and satellite systems.

University of Ljubljana
Faculty of Electrical Engineering
Ljubljana, Slovenia
saso.blazic@fe.uni-lj.si


Christian Blum Chapters F.66, F.68 For biographical profile, please see the section "About the Part Editors". 




Andrea Bobbio Chapter F.69

Andrea Bobbio is Professor of Computer Science at the Universita del Piemonte
Orientale, Alessandria, Italy. He graduated from Politecnico di Torino in 1969, and
in 1971 he joined the Istituto Elettrotecnico Nazionale Galileo Ferraris di Torino. His
activity focuses on the modeling and analysis of the performance and reliability of
stochastic systems.

Universita del Piemonte Orientale
DiSit - Computer Science Section
Alessandria, Italy
andrea.bobbio@unipmn.it


Josh Bongard Chapter D.37

Josh Bongard is an Associate Professor at the University of Vermont. He is the
Director of the Morphology, Evolution, and Cognition Laboratory there, as well as
the Vice Chair of the UVM Complex Systems Center. He is the recipient of an NSF
PECASE award and has served as a Microsoft Research New Faculty Fellow.

University of Vermont
Dep. Computer Science
Burlington, USA
josh.bongard@uvm.edu


Piero P. Bonissone Chapter D.41

Piero Bonissone is a Coolidge Fellow and a Fellow of AAAI, IEEE, and
IFSA. He has published over 150 articles and has received 65 US patents.
Recently, he received the 2012 Fuzzy Systems Pioneer Award from
the IEEE Computational Intelligence Society.

Piero P. Bonissone Analytics, LLC
San Diego, USA
bonissone@gmail.com


Dario Bruneo Chapter F.69

Dario Bruneo is Assistant Professor in the Department of Engineering
at the University of Messina. His research activity focuses on the study
of distributed systems, in particular on the management of advanced
service provisioning, system modeling, and performance evaluation.
His current research topics include sensor networks, Internet of Things,
monitoring, and performance and reliability of complex systems.

Università di Messina
Dip. Ingegneria Civile, Informatica
Messina, Italy
dbruneo@unime.it


Alberto Bugarin Diz Chapter B.20

Alberto Bugarín is a full Professor of Artificial Intelligence at the Research Centre
for Information Technologies of the University of Santiago de Compostela (CiTIUS).
His research interests focus on modeling intelligent systems with uncertainty (fuzzy
rule-based systems) and their applications in adaptive business intelligence
(planning/scheduling), automatic building of linguistic descriptions of data, and mobile
robotics.

University of Santiago de Compostela
Research Centre for Information Technologies
Santiago de Compostela, Spain
alberto.bugarin.diz@usc.es


Humberto Bustince Chapter A.6

Humberto Bustince received his BSc degree in Physics from Salamanca University,
Spain, and his PhD degree in Mathematics from the Public University of Navarra. He
is a full Professor in the Department of Automatics and Computation at the Public
University of Navarra. He has authored more than 120 journal papers and more than
100 contributions to international conferences. He has also co-authored four books.

Universidad Pública de Navarra
Dep. Automática y Computación
Pamplona (Navarra), Spain
bustince@unavarra.es


Martin V. Butz Chapter E.47

Martin V. Butz is a full Professor of Cognitive Modeling at the University
of Tübingen, Department of Computer Science and Department of
Psychology. His research group focuses on how the brain develops
anticipatory representations of the body and the surrounding space for
controlling predictive interactions with the environment maximally
effectively.

University of Tübingen
Computer Science, Cognitive Modeling
Tübingen, Germany
martin.butz@uni-tuebingen.de




Alexandre Campo Chapter F.71

Alexandre Campo is a Postdoctoral Fellow at the Unit of Social
Ecology at the Université Libre de Bruxelles. He received his PhD
there in Applied Sciences in 2011. His research interests include the
study and design of complex systems applied to swarm robotics. He has
participated in several projects funded by the European Commission.

Université Libre de Bruxelles
Unit of Social Ecology
Brussels, Belgium
alexandre.campo@ulb.ac.be


Angelo Cangelosi Chapter D.37

Angelo Cangelosi is Professor of Artificial Intelligence and Cognition and directs
the Centre for Robotics and Neural Systems at Plymouth University, UK. His
main expertise is on language and cognitive modeling in humanoid robots. He has
coordinated UK and FP7 projects (RobotDoC ITN, ITALK, BABEL), and is the 2012
Chair of the IEEE Autonomous Mental Development Technical Committee.

Plymouth University
Centre for Robotics and Neural Systems
Plymouth, UK
A.Cangelosi@plymouth.ac.uk


Robert Carrese Chapter F.67

Robert Carrese graduated as an Aerospace Engineer and then received his PhD degree
in Aerodynamic Design and Optimization from RMIT University. Since 2012, he has
been a Senior Engineer at LEAP Australia Pty. Ltd. His research interests include the
development of evolutionary design processes for key aerodynamic deliverables and
mitigation of adverse aeroelastic and aeroacoustic effects.

LEAP Australia Pty. Ltd.
Clayton North, Australia
robert.carrese@leapaust.com.au


Ciro Castiello Chapter B.14

Ciro Castiello graduated in Informatics in 2001 and received his PhD
in Informatics in 2005. He is currently a researcher in the Informatics
Department at the University of Bari, Italy. His research interests
include soft computing techniques, inductive learning mechanisms,
and interpretability of fuzzy systems. He has published more than 50
peer-reviewed papers.

University of Bari
Dep. Informatics
Bari, Italy
ciro.castiello@uniba.it


Oscar Castillo Chapters G.78, G.80 For biographical profile, please see the section “About the Part Editors”.


Davide Cerotti Chapter F.69

Davide Cerotti obtained his degree in Computer Science from the
University of Piemonte Orientale, Italy and his PhD in Computer
Science in 2010 from the University of Turin, Italy. Currently he
is a postdoctoral scholar at the Politecnico di Milano, Italy. His
main research topics are Markovian agents, queueing networks, and
performance evaluation of large-scale distributed systems.

Politecnico di Milano
Dip. Elettronica, Informazione e Bioingegneria
Milano, Italy
davide.cerotti@polimi.it


Badong Chen Chapter D.30

Badong Chen received his PhD degree in Computer Science and Technology from
Tsinghua University, China, in 2008. He was a Postdoctoral Associate at the University
of Florida from 2010 to 2012. He is currently a Professor at Xi’an Jiaotong University,
China. His research interests are in signal processing and machine learning, and their
applications in cognition and neuroscience.

Xi'an Jiaotong University
Inst. Artificial Intelligence and Robotics
Xi'an, China
chenbd@mail.xjtu.edu.cn




Ke Chen Chapter D.28 


Ke Chen has been an academic staff member at The University of Manchester since 
2003. He has worked at Birmingham University, Peking University, Ohio State 
University, Kyushu Institute of Technology, and Tsinghua University. His main 
research interests lie in neural computation with an emphasis on deep and modular 
neural networks, machine learning, machine perception and their applications in 
intelligent systems. 


The University of Manchester 
School of Computer Science 
Manchester, UK 
chen@cs.manchester.ac.uk 


Davide Ciucci Chapter C.25

Davide Ciucci received a PhD in 2004 in Computer Science from the
University of Milan and the Habilitation (HdR) from the University of
Toulouse III in 2013. Since 2005, he has held a permanent research
position at the University of Milano-Bicocca. His research activity
concerns uncertainty management, with particular reference to rough sets,
many-valued logics, and ontologies.

University of Milano-Bicocca
Dep. Informatics, Systems and Communications
Milano, Italy
ciucci@disco.unimib.it


Carlos A. Coello Coello Chapter E.51 For biographical profile, please see the section “About the Part Editors”. 
Chris Cornelis Chapter C.26

Chris Cornelis has an MSc and a PhD degree in Computer Science from
Ghent University (Belgium). Currently, he is a Postdoctoral Fellow at
the University of Granada, supported by the Ramón y Cajal program,
as well as a Guest Professor at Ghent University. His current research
interests include fuzzy sets, rough sets, and machine learning.

Ghent University
Dep. Applied Mathematics and Computer Science
Ghent, Belgium
chriscornelis@ugr.es


Nikolaus Correll Chapter F.74

Nikolaus Correll has been an Assistant Professor of Computer Science at the
University of Colorado at Boulder since 2009. He obtained a PhD from EPFL in 2007
and did postdoctoral studies at MIT CSAIL. His research interests are modeling and
design of large-scale distributed swarming systems and smart materials.

University of Colorado at Boulder
Dep. Computer Science
Boulder, USA
ncorrell@colorado.edu


Carlos Cotta Porras Chapter E.52

Carlos Cotta received his MSc and PhD degrees in Computer Science from the
University of Málaga in 1994 and 1998, respectively. He has held a tenured professorship at
this university since 2001. His main research areas involve metaheuristic optimization,
in particular hybrid and memetic approaches, with a focus on both algorithmic and
applied aspects (particularly combinatorial optimization) and complex systems.

Universidad de Málaga
Dep. Lenguajes y Ciencias de la Computación
Málaga, Spain
ccottap@lcc.uma.es


Damien Coyle Chapter D.39 


Damien Coyle is a Senior Lecturer at the School of Computing and 
Intelligent Systems, University of Ulster. His research interests include 
brain-computer interfaces, computational intelligence and neuroscience, 
neuroimaging, and biosignal processing. He is a recipient of the IEEE 
Computational Intelligence Society’s Outstanding Doctoral Dissertation 
Award and the International Neural Network Society’s Young Investigator 
of the Year Award. 


University of Ulster 

Intelligent Systems Research Centre 
Derry, Northern Ireland, UK 
dh.coyle@ulster.ac.uk 




Guy De Tré Chapter B.19

Guy De Tré received his MSc in Computer Science in 1994 and
his PhD in Engineering in 2000 from Ghent University (Belgium).
He is Associate Professor in Fuzzy Information Processing in the
Department of Telecommunications and Information Processing at
Ghent University. His research is centred on soft computing techniques
for database modeling, flexible querying, and decision support.

Ghent University
Dep. Telecommunications and Information Processing
Ghent, Belgium
guy.detre@ugent.be


Kalyanmoy Deb Chapter E.49

Kalyanmoy Deb is a Koenig Endowed Chair Professor at Michigan State University.
His research interests are in evolutionary optimization and its application in
optimization, modeling, and machine learning. He has published 375 research papers
with h-index 85. He is on the Editorial Board of 18 major international journals.

Michigan State University
Dep. Electrical and Computer Engineering
East Lansing, USA
kdeb@egr.msu.edu


Clarisse Dhaenens Chapter E.61

Clarisse Dhaenens is Professor at the University of Lille. She obtained her PhD in
1998 from the Polytechnicum University of Grenoble (INPG). She became Associate
Professor in 1999 at the University of Lille and a full Professor in 2006. Her work deals
with operations research and combinatorial optimization, with applications in knowledge
discovery for bioinformatics and healthcare.

University of Lille
CRIStAL laboratory
Villeneuve d'Ascq Cedex, France
clarisse.dhaenens@univ-lille1.fr


Luca Di Gaspero Chapter E.62

Luca Di Gaspero graduated and received a PhD in Computer Science
from the University of Udine, where he is currently Senior Lecturer of
Information Technology. In 2011 he was Visiting Professor at Vienna
University of Technology, where he also received his Habilitation in 2014.
His research focus is boosting metaheuristic techniques by hybridizing
them with other optimization methods, mainly from the AI field.

Universita degli Studi di Udine
Dip. Ingegneria Elettrica, Gestionale e Meccanica
Udine, Italy
luca.digaspero@uniud.it


Didier Dubois Chapter A.3

Didier Dubois is a Research Advisor at IRIT, the Computer Science
Department of Paul Sabatier University, Toulouse, France. He has
co-authored 2 books, a monograph, and over 15 edited volumes on
uncertain reasoning and fuzzy sets. His interests range from knowledge
representation and reasoning to decision sciences and representation
and processing of imprecise information.

Université Paul Sabatier
IRIT - Equipe ADRIA
Toulouse Cedex 9, France
dubois@irit.fr


Antonio J. Fernandez Leiva Chapter E.52

Antonio J. Fernandez-Leiva received his PhD degree in Computer Science from
the University of Malaga in 2002 and later became an Associate Professor. In the
past he has worked in private companies as a computer engineer. His main areas of
research involve both the application of metaheuristic techniques to combinatorial
optimization and the employment of computational intelligence in games.

Universidad de Malaga
Dep. Lenguajes y Ciencias de la Computación
Málaga, Spain
afdez@lcc.uma.es


Javier Fernandez Chapter A.6

Javier Fernandez received his PhD in Mathematics from the University of the Basque
Country in 2003. He was a postdoc researcher at the CNRS and currently he is a
member of the GIARA research group at the Public University of Navarra. He has
authored more than 30 papers in JCR journals. His main research interests are fuzzy
sets and extensions, aggregation functions, image processing, and harmonic analysis.

Universidad Publica de Navarra
Dep. Automatica y Computación
Pamplona (Navarra), Spain
fcojavier.fernandez@unavarra.es




Martin H. Fischer Chapter D.37

Martin Fischer obtained a PhD from the University of Massachusetts in
1997 through graduate studies on motor control and visual attention. He
then worked at LMU Munich for 3 years before moving to the University of
Dundee in Scotland. In 2011 he became Professor of Cognitive Sciences
at the University of Potsdam in Germany, where he investigates embodied
numerical cognition.

University of Potsdam
Psychology Dep.
Potsdam OT Golm, Germany
martinf@uni-potsdam.de


Janos C. Fodor Chapter B.10

Janos Fodor is a Full Professor and the Rector of Obuda University,
Budapest, Hungary. He received his Master’s degree in 1981 and his PhD
in 1991, both in Mathematics. He is a Doctor of the Hungarian Academy
of Sciences and is pursuing research in mathematical foundations of
fuzzy logic, computational intelligence, and preference modeling. He
has published 2 monographs and over 250 scientific papers.

Obuda University
Dep. Applied Mathematics
Budapest, Hungary
fodor@uni-obuda.hu


Jairo Alonso Giraldo Chapter F.70

Jairo Giraldo is a PhD student in the Department of Electrical and Electronics
Engineering at Universidad de los Andes, Colombia. His research interests include
distributed control algorithms for power grids, multi-agent-based control, networked
systems, and cybersecurity and privacy. He is a member of the IEEE Control Systems
Society and the Power and Energy Society.

Universidad de los Andes
Dep. Electrical and Electronics Engineering
Bogotá, Colombia
ja.giraldo908@uniandes.edu.co


Siegfried Gottwald Chapter A.2

Siegfried Gottwald was Full Professor for Nonclassical and Mathematical Logic
at Leipzig University until 2008. He obtained a PhD in Mathematics in 1969. His
research areas include many-valued and fuzzy logic, fuzzy sets, fuzzy relations
and fuzzy control, and history of mathematics and logic. He is an IFSA Fellow and
received the Forschungspreis Technische Kommunikation of the SEL-Alcatel
foundation (1992).

Leipzig University
Inst. Philosophy
Leipzig, Germany
gottwald@uni-leipzig.de


Salvatore Greco Chapters C.22, C.24

Salvatore Greco is Professor at the University of Catania, Italy, and
part-time Professor at the University of Portsmouth, UK. He is an active
researcher in the area of rough set theory, multiple criteria decision aiding
(MCDA), and non-additive integrals. He received the Multiple Criteria
Decision Making Gold Medal in 2013.

University of Catania
Dep. Economics and Business
Catania, Italy
salgreco@unict.it


Marco Gribaudo Chapter F.69

Marco Gribaudo is a Senior Researcher at the Politecnico di Milano,
Italy. His current research interests are multi-formalism modeling,
queueing networks, mean-field analysis and spatial models.

Politecnico di Milano
Dip. Elettronica, Informazione e Bioingegneria
Milano, Italy
marco.gribaudo@polimi.it


Roderich Groß Chapter F.66 For biographical profile, please see the section “About the Part Editors”.




Jerzy W. Grzymala-Busse Chapter C.23

Jerzy W. Grzymala-Busse has been a Professor of Electrical Engineering and
Computer Science at the University of Kansas since August 1993. His research
interests include data mining, machine learning, knowledge discovery, expert systems,
reasoning under uncertainty, and rough set theory. He is the author, co-author, or editor
of 14 books, and has published over 300 articles.

University of Kansas
Dep. Electrical Engineering and Computer Science
Lawrence, USA
Jerzygb@ku.edu


Hani Hagras Chapter B.18

Hani Hagras is a Professor of Computational Intelligence, Director of the
Computational Intelligence Centre, Head of the Fuzzy Systems Research Group and Head of
the Intelligent Environments Research Group at the University of Essex, UK. He is a
Fellow of IEEE and of IET. He has authored more than 250 papers. He was awarded
the Outstanding Paper Award of the IEEE Computational Intelligence Society in 2004
and 2013.

University of Essex
The Computational Intelligence Centre
Colchester, UK
hani@essex.ac.uk


Heiko Hamann Chapter F.74 

Heiko Hamann is Assistant Professor at the Computer Science Depart- 
ment of the University of Paderborn. He spent his postdoctoral time at the 
University of Graz, Austria. He received his PhD from the University of 
Karlsruhe, Germany. The main focus of his research efforts is on swarm 
models, swarm robotics, and synthesis of robot controllers. 


University of Paderborn

Dep. Computer Science 

Paderborn, Germany 
heiko.hamann@uni-paderborn.de 


Thomas Hammerl Chapter E.64

Thomas Hammerl is working as a self-employed software developer
in Vienna, Austria. He received his Master’s degree from Vienna
University of Technology in 2009. While working on his thesis he did
research on metaheuristics, constraint satisfaction, and optimization.

Vienna, Austria
thomas.hammerl@gmail.com


Julie Hamon Chapter E.61

Julie Hamon graduated as a computer science and statistics engineer from the Ecole
Polytechnique Universitaire de Lille (Polytech’Lille), and then received her PhD in
Combinatorial Optimization and Statistics from Lille 1 University. She is currently a
biostatistician at Ingenomix, a company specializing in bovine genomic selection.
Her main research interests include statistical models applied to genomic data.

Ingenomix
Dep. Research and Development
Boisseuil, France
julie.hamon@ingenomix.fr


Nikolaus Hansen Chapter E.44

Nikolaus Hansen is a Senior Researcher at Inria, France. After studies in medicine and
mathematics he received his PhD in Civil Engineering from the Technical University
Berlin in 1998 and his Habilitation in Computer Science from the University Paris-Sud
in 2010. Before joining Inria he worked in applied artificial intelligence, evolutionary
computation, and genomics in Berlin, and in computational science at the ETH Zurich.

Université Paris-Sud
Machine Learning and Optimization Group (TAO)
Orsay Cedex, France
hansen@lri.fr


Mark W. Hauschild Chapter E.45 


University of Missouri-St. Louis 
Dep. Mathematics and Computer 
Science 

St. Louis, USA 
markhauschild@gmail.com 


Mark Hauschild received his PhD in Applied Mathematics from the 
University of Missouri in 2014. He is a Visiting Professor at the University 
of Missouri, where he teaches courses on AI. His research interests 
include estimation of distribution algorithms, genetic programming, and 
machine learning. He is currently a member of the Missouri Estimation 
of Distribution Algorithms Laboratory. 


Sebastien Hélie Chapter D.36

Sebastien Hélie is Assistant Professor in the Department of Psychological
Sciences at Purdue University. He received a PhD in Cognitive
Science from the Université du Québec à Montreal. He uses computational
cognitive neuroscience and neuroimaging methods to study
categorization, automaticity, rule learning, sequence learning, and skill
acquisition.

Purdue University
Dep. Psychological Sciences
West Lafayette, USA
shelie@purdue.edu


Jano I. van Hemert Chapter E.65

Dr Jano van Hemert (MSc 1998, PhD 2002; Leiden University, The Netherlands) is
a Senior Manager and the Academic Liaison at Optos plc. His main area of research
is in computer science and its applications. From 2005 until 2011 he was a visiting
researcher at the Human Genetics Unit in Edinburgh of the UK’s Medical Research
Council.

Optos
Dunfermline, UK
jano@vanhemert.co.uk


Holger H. Hoos Chapter E.54

Holger H. Hoos is a Professor of Computer Science and a Faculty Associate at the
Peter Wall Institute for Advanced Studies at the University of British Columbia
(Canada). He is a Fellow of the Association for the Advancement of Artificial
Intelligence (AAAI) and past President of the Canadian Artificial Intelligence
Association (CAIAC). His main research interests span empirical algorithmics,
artificial intelligence, bioinformatics, and computer music. His research has been
published in numerous books and journals.

University of British Columbia
Dep. Computer Science
Vancouver, Canada
hoos@cs.ubc.ca


Tania Iglesias Chapter B.11

Tania Iglesias received her BSc degree in Mathematics from the University
of Oviedo, Spain, in 2006 and her MSc degree from the University of
Granada, Spain, in 2013. She is now a technician in the Statistical
Consulting Unit of the University of Oviedo. Decision making in fuzzy
sets is her main topic of research.

University of Oviedo
Dep. Statistics and O.R.
Oviedo, Spain
iglesiasctania@uniovi.es


Giacomo Indiveri Chapter D.38

Giacomo Indiveri is a Professor at the Faculty of Science of the
University of Zurich, Switzerland. He obtained an MSc degree in
Electrical Engineering and a PhD degree in Computer Science from
the University of Genoa, Italy. His research interests lie in the study
of real and artificial neural processing systems, and in the hardware
implementation of neuromorphic cognitive systems.

University of Zurich and ETH Zurich
Inst. Neuroinformatics
Zurich, Switzerland
giacomo@ini.uzh.ch


Masahiro Inuiguchi Chapter C.26

Masahiro Inuiguchi received ME and DE degrees from Osaka Prefecture University
in 1987 and 1991. He worked as a Research Associate at Osaka Prefecture University
(1987-1992), Associate Professor at Hiroshima University (1992-1997) and Osaka
University (1997-2003). At present, he is a Full Professor at Osaka University. His
research interests include possibility theory, fuzzy mathematical programming, rough
sets, and their applications to decision making.

Osaka University
Dep. Systems Innovation, Graduate School of Engineering Science
Toyonaka, Osaka, Japan
inuiguti@sys.es.osaka-u.ac.jp




Hisao Ishibuchi Chapter G.77

Hisao Ishibuchi has been a Professor at Osaka Prefecture University since 1999. His
research interests include fuzzy rule-based systems, multiobjective optimization, and
evolutionary games. He was the IEEE CIS Vice-President for Technical Activities
for 2010-2013. Currently he is an IEEE CIS AdCom member (2014-2016), an IEEE
CIS Distinguished Lecturer (2015-2017), and the Editor-in-Chief of the IEEE CI
Magazine.

Osaka Prefecture University
Dep. Computer Science and Intelligent Systems, Graduate School of Engineering
Osaka, Japan
hisaoi@cs.osakafu-u.ac.jp


Emiliano Iuliano Chapter E.60

Emiliano Iuliano received the Laurea (MSc) degree and the Doctorate in
Aerospace Engineering from the University of Naples in 2004 and 2012,
respectively. He is currently a Senior Researcher at CIRA, the Italian
Aerospace Research Center. His research interests include CFD analysis,
aircraft aerodynamic design, surrogate-based optimization methods, and
aircraft in-flight icing.

CIRA, Italian Aerospace Research Center
Fluid Dynamics Lab.
Capua (CE), Italy
e.iuliano@cira.it


Julie Jacques Chapter E.61

Julie Jacques received her PhD degree in Computer Science from the
University of Lille, France, in 2013. She has been working in the
industry on applied research projects since 2008. Her research interests
are operational research and data mining, with applications to health.

Alicante LAB
Seclin, France
julie.jacques@alicante.fr


Andrzej Jankowski Chapter C.21

Andrzej Jankowski received his PhD from the University of Warsaw and is one of
the founders of the Polish-Japanese Institute of Information Technology. He currently
works for R&D and education AI projects at Warsaw University of Technology and
the University of Warsaw. He has unique experience in managing complex IT projects
in Central Europe and the USA.

Knowledge Technology Foundation
Warsaw, Poland
andrzej.adgam@gmail.com


Balasubramaniam Jayaram Chapter B.12

Balasubramaniam Jayaram received his MSc and PhD degrees in Mathematics from
the Sri Sathya Sai Institute of Higher Learning, India, in 1999 and 2004. He has
authored and co-authored more than 50 publications and a research monograph.
His current research interests include fuzzy aggregation operations, approximate
reasoning, clustering in high dimensions, and kernel learning methods.

Indian Institute of Technology Hyderabad
Dep. Mathematics
Hyderabad, India
jbala@iith.ac.in


Laetitia Jourdan Chapter E.61

Laetitia Jourdan is a full Professor of Computer Sciences at the University
of Lille 1/LIFL. She holds a PhD in Combinatorial Optimization from
the University of Lille 1 (France). Her areas of research are modeling
datamining tasks as combinatorial optimization problems, solving methods
based on metaheuristics, and incorporating learning in metaheuristics
and multiobjective optimization.

University of Lille 1
INRIA/UFR IEEA/laboratory CRIStAL/CNRS
Lille, France
laetitia.jourdan@univ-lille1.fr




Nikola Kasabov Chapter D.40

Nikola K. Kasabov obtained his Master’s and PhD degrees from the
Technical University of Sofia, Bulgaria. He is the Director and the
Founder of the Knowledge Engineering and Discovery Research Institute
(KEDRI) and Professor of Knowledge Engineering at the School
of Computing and Mathematical Sciences at Auckland University of
Technology, New Zealand. He has published over 450 papers, books,
and patents on informatics, computational intelligence, neural networks,
bioinformatics, and neuroinformatics.

Auckland University of Technology
KEDRI - Knowledge Engineering and Discovery Research Inst.
Auckland, New Zealand
nkasabov@aut.ac.nz

Petra Kersting Chapter E.59

Petra Kersting studied Computer Science and finished her dissertation in Mechanical
Engineering at TU Dortmund University. She is Head of the Division Simulation and
Optimization and holds the junior professorship Modeling methods for machining
processes. She is Research Affiliate of the International Academy of Production
Engineering (CIRP) and Dorothea-Erxleben Visiting Professor 2014/2015 at the
OvGU-Magdeburg.

TU Dortmund University
Dep. Mechanical Engineering
Dortmund, Germany
kersting@isf.de


Erich P. Klement Chapter A.5

Erich Peter Klement received his PhD degree from the University of Innsbruck. He is
Professor of Mathematics at the Johannes Kepler University in Linz, Austria and the
Head of the Fuzzy Logic Laboratory in Linz. He has (co-)authored 3 monographs and
80 journal papers and is a Member of the Editorial Boards of 16 journals.

Johannes Kepler University
Dep. Knowledge-Based Mathematical Systems
Linz, Austria
ep.klement@jku.at

Anna Kolesárová Chapter A.4

Anna Kolesárová received her MSc degree in Mathematics and Physics
from Comenius University in Bratislava and her PhD degree from the
Slovak Academy of Sciences. In 2008 she became a Full Professor at
the Slovak University of Technology in Bratislava. Her current research
interests include aggregation functions, with special stress on copulas,
measures and integrals, decision making, and fuzzy mathematics.

Slovak University of Technology in Bratislava
Faculty of Chemical and Food Technology
Bratislava, Slovakia
anna.kolesarova@stuba.sk


Magda Komornikova Chapter A.4 


Slovak University of Technology
Dep. Mathematics
Bratislava, Slovakia
magda@math.sk

Magda Komornikova graduated at Comenius University, Faculty of
Mathematics and Physics in 1973 and received her PhD from the same
faculty in 1979. Since 1990 she has been a member of the Department
of Mathematics at the Slovak University of Technology in Bratislava.
She has been a full professor since 2002. Her fields of interest are
measure theory, uncertainty modeling, copulas, time series analysis, and
aggregation and related operators.


Mark Kotanchek Chapter E.57 


Evolved Analytics LLC
Midland, USA
mark@evolved-analytics.com

Mark Kotanchek’s diverse academic background (Engineering Science BSc, Acoustics
MEng, Aerospace Engineering PhD, IEEE Senior Member) is consistent with the
diversity of his professional experience. He founded Evolved Analytics in 2005.


Robert Kozma Chapter D.33 


University of Memphis 

Dep. Mathematical Sciences 
Memphis, USA 
rkozma@memphis.edu 


Robert Kozma has MSc degrees from Moscow Power Engineering University and 
Eötvös University (Budapest, Hungary), respectively. He received his PhD from Delft 
University of Technology (The Netherlands). He is a First Tennessee University 
Professor and recipient of the INNS Gabor Award. His research includes robust
decision support for large-scale networks, autonomous robot control, sensor networks,
brain networks, and brain-computer interfaces.


Tomáš Kroupa Chapter A.9

Institute of Information Theory and Automation
Dep. Decision-Making Theory
Prague, Czech Republic
kroupa@utia.cas.cz

Tomáš Kroupa obtained his PhD in Mathematical Engineering from the
Czech Technical University in 2005. He is affiliated with the Institute of
Information Theory and Automation, the Academy of Sciences of the
Czech Republic. He has had a Senior Researcher position in the institute
since 2014. His area of research is game theory and many-valued logics.


Rudolf Kruse Chapter B.17 


University of Magdeburg
Faculty of Computer Science
Magdeburg, Germany
kruse@iws.cs.uni-magdeburg.de

Rudolf Kruse obtained his PhD in Mathematics in 1980 from the
University of Braunschweig. Since 1996 he has been a full professor
at the University of Magdeburg where he is leading the computational
intelligence research group. He is a Fellow of the International Fuzzy
Systems Association and a Fellow of the Institute of Electrical and
Electronics Engineers.


Tufan Kumbasar Chapter B.18 

Istanbul Technical University
Control Engineering Dep.
Maslak, Istanbul, Turkey
kumbasart@itu.edu.tr

Tufan Kumbasar is currently working as a Postdoctoral Research Fellow in the
Control Engineering Department of Istanbul Technical University. His research
interests lie predominantly in the area of control engineering, particularly with respect
to optimization, intelligent control, process control and mechatronic control methods,
and their applications. He is particularly interested in the area of type-2 fuzzy logic
systems, especially in their controller applications.


James T. Kwok Chapter D.29

Hong Kong University of Science and Technology
Dep. Computer Science and Engineering
Hong Kong, Hong Kong
jamesk@cse.ust.edu.hk

James Kwok received his PhD degree from the Hong Kong University of Science and
Technology in 1996. He then joined the Department of Computer Science, Hong Kong
Baptist University as an Assistant Professor. He is now a Professor in the Department
of Computer Science and Engineering.


Rhyd Lewis Chapter E.63


Cardiff University 
School of Mathematics 
Cardiff, UK 
lewisR9@cf.ac.uk 


Rhyd Lewis is a lecturer in operational research in the School of
Mathematics at Cardiff University. He holds a PhD in Computer Science
and Operational Research from Edinburgh Napier University and a degree
in Computing from the University of Wales, Swansea. Dr Lewis is the
Co-Founder and an Associate Editor of the International Journal of
Metaheuristics.


Xiaodong Li Chapter F.67

RMIT University
School of Computer Science and Information Technology
Melbourne, Australia
xiaodong.li@rmit.edu.au

Xiaodong Li received his PhD degree in Artificial Intelligence from
the University of Otago. He is currently an Associate Professor at
the School of Computer Science and Information Technology, RMIT
University. His research interests include evolutionary computation,
machine learning, multiobjective optimization, and swarm intelligence.


Paulo J.G. Lisboa Chapter D.31 

Liverpool John Moores University
Dep. Mathematics & Statistics
Liverpool, UK
p.j.lisboa@ljmu.ac.uk

Paulo Lisboa has a PhD in Theoretical Physics from the University of Liverpool. He is
a Professor in Industrial Mathematics at Liverpool John Moores University. He chairs
the Medical Analysis Task Force in the Data Mining Technical Committee of the IEEE
Computational Intelligence Society. His research focus is computational data analysis
for medical decision support, in particular with interpretable machine learning models.


Weifeng Liu Chapter D.30 


Weifeng Liu received his PhD degree in Electrical and Computer Engineering from 
the University of Florida in 2008. He is currently a senior researcher with Jump 
Trading, Chicago, IL. His research interests include machine learning, adaptive signal 
processing and their applications to e-commerce, business, and finance. 


Jump Trading 
Chicago, USA 
weifeng@ieee.org 


Fernando G. Lobo Chapter E.45


Fernando Lobo is Associate Professor at the University of Algarve,
Portugal. He received his PhD degree in 2000 from Universidade Nova de
Lisboa, Portugal, and during that period was a regular visitor at the Illinois
Genetic Algorithms Laboratory, UIUC. His major research interests are
evolutionary computation, and computers and accessibility.


Universidade do Algarve
Dep. Engenharia Electronica e Informatica
Faro, Portugal
fernando.lobo@gmail.com


Antonio López Jaimes Chapter E.51

CINVESTAV-IPN
Dep. Computación
México, Mexico
tonio.jaimes@gmail.com

Antonio López Jaimes received his BSc degree in Computer Science
in 2002 from the Autonomous Metropolitan University, Mexico, and
his MSc and PhD degrees in Computer Science from the National
Polytechnic Institute, Mexico in 2005 and 2011. His research interests
include evolutionary multiobjective optimization, parallel evolutionary
algorithms, and interactive optimization methods. He is currently a
Professor at the Autonomous Metropolitan University.


Francisco Luna Chapter E.50 

Centro Universitario de Mérida
Mérida, Spain
fluna@unex.es

Francisco Luna received his degree in Engineering and PhD in Computer Science
in 2002 and 2008, from the University of Malaga, Spain. Since 2013, he has been
Assistant Professor in the Universidad de Extremadura at the Campus Universitario
de Mérida. His current research interests include the design and implementation of
parallel and multi-objective metaheuristics, and their application to solving complex
problems in the domain of telecommunications and combinatorial optimization.


Luis Magdalena Chapter B.13 For biographical profile, please see the section “About the Part Editors”.


Sebastia Massanet Chapter B.12 


Sebastia Massanet received his BS, MSc, and PhD degrees in Mathematics from 
the University of the Balearic Islands (UIB) in 2008, 2009, and 2012 where he is 
currently an Assistant Professor in the Department of Mathematics and Computer 
Science. His current research interests are fuzzy sets theory and related fields such as 
fuzzy connectives, fuzzy implications, functional equations, and fuzzy mathematical 
morphology. 


University of the Balearic Islands 

Dep. Mathematics and Computer Science 
Palma de Mallorca, Spain 
s.massanet@uib.es 


Benedetto Matarazzo Chapter C.22

University of Catania
Dep. Economics and Business
Catania, Italy
matarazz@unict.it

Benedetto Matarazzo is Chairman of the degree course in Corporate Finance
at Catania University. He received the Gold Medal of International
Society of Multiple Criteria Decision Making in 2009. His main research
interests are in the fields of multiple criteria decision analysis, the rough
set approach to decision analysis, and preference modeling.


Sergi Mateo Bellido Chapter F.68 

Polytechnic University of Catalonia
Dep. Computer Architecture
Barcelona, Spain
sergim@ac.upc.edu

Sergi Mateo Bellido obtained his degree in Computer Science from
the Polytechnic University of Catalonia in 2011. Since then, he has
been working in the field of high performance computing at Barcelona
Supercomputing Center and recently at the Polytechnic University of
Catalonia. He is interested in algorithms, parallel programming models
and domain specific languages.


James McDermott Chapter E.43 

University College Dublin
Lochlann Quinn School of Business
Dublin 4, Ireland
jmmcd@jmmcd.net

James McDermott has research interests in evolutionary design and genetic programming.
He was an IRC/Marie Curie Post-Doctoral Fellow at Massachusetts Institute of
Technology and is now a lecturer and the Program Director in Business Analytics at
University College, Dublin. He was Co-Chair of EvoMUSART 2013 and 2014 and
Publication Chair of EuroGP 2015.


Patricia Melin Chapters G.79, G.80 For biographical profile, please see the section “About the Part Editors”.


Corrado Mencar Chapter B.14


University of Bari
Dep. Informatics
Bari, Italy
corrado.mencar@uniba.it


Corrado Mencar is Assistant Professor in the Department of Informatics, at the
University of Bari, Italy. His research interests are computational intelligence,
with a special emphasis on fuzzy logic, granular computing, neuro-fuzzy systems,
computational web intelligence, and intelligent data analysis. He has published more
than 60 peer-reviewed papers.


Radko Mesiar Chapters A.4, A.5 For biographical profile, please see the section “About the Part Editors”.


Ralf Mikut Chapter B.17 


Karlsruhe Institute of Technology (KIT) 
Inst. Applied Computer Science 
Eggenstein-Leopoldshafen, Germany 
ralf.mikut@kit.edu 


Ralf Mikut graduated in automatic control at TU Dresden and received 
his PhD degree from the University of Karlsruhe in 1999. He is Associate 
Professor at the Karlsruhe Institute of Technology. His research interests 
include fuzzy systems, computational intelligence, and data mining in 
biological and technical applications. 


Ali A. Minai Chapter D.35 

University of Cincinnati
School of Electronic & Computing Systems
Cincinnati, USA
ali.minai@uc.edu

Ali A. Minai is Professor in the School of Electronic and Computing
Systems at the University of Cincinnati. He received his PhD in
Electrical Engineering from the University of Virginia. His areas of
research include complex systems, neural networks, cognitive models,
computational neuroscience, and computational models of social
systems. He is a senior member of IEEE and the International Neural
Network Society, and a member of the Society for Neuroscience.


Sadaaki Miyamoto Chapter B.15

University of Tsukuba
Risk Engineering
Tsukuba, Japan
miyamoto@risk.tsukuba.ac.jp

Sadaaki Miyamoto is a Professor of the University of Tsukuba, Japan. His current
research interests include methodology for uncertainty modeling. In particular, he has
been working on data clustering algorithms and the theory of generalized bags. He has
published 3 books and over 300 research papers. In 2007, he became a Fellow of the
International Fuzzy Systems Association.


Christian Moewes Chapter B.17


Christian Moewes received his diploma in Computer Science from the University 
of Magdeburg, Germany in 2007. He has co-authored two textbooks, co-edited two 
proceedings, published three journal articles, and five peer-reviewed book chapters. 
Currently he focuses on the statistical analysis of dynamical brain networks by means 
of model-based methods. 


University of Magdeburg 
Faculty of Computer Science 
Magdeburg, Germany 
cmoewes@ovgu.de 


Javier Montero Chapter A.6

Complutense University, Madrid
Dep. Statistics and Operational Research
Madrid, Spain
monty@mat.ucm.es

Javier Montero is Professor at the Department of Statistics and Operational
Research, Complutense University of Madrid. He holds a PhD in
Mathematics. His research interests are aggregation operators, preference
representation, multicriteria decision aid, group decision making, system
reliability theory, image processing, and classification problems. He has
been the President of the European Association for Fuzzy Logic and
Technology.


Ignacio Montes Chapter B.11

University of Oviedo
Dep. Statistics and O.R.
Oviedo, Spain
imontes@uniovi.es

Ignacio Montes received his BSc degree in Mathematics from the
University of Oviedo, Spain, in 2009 and his MSc degree in 2010 also
from the University of Oviedo. He is now a member of the Department
of Statistics and Operational Research at the same university. Preference
modeling with imprecise elements is his main topic of research.


Susana Montes Chapter B.11 

University of Oviedo
Dep. Statistics and O.R.
Oviedo, Spain
montes@uniovi.es

Susana Montes received her MSc degree in Mathematics from the University of
Valladolid, Spain, in 1993, and her PhD degree from the University of Oviedo, Spain,
in 1998. She is a Professor of Statistics and Operational Research at the University
of Oviedo. She has published in international journals and international conference
proceedings.


Oscar H. Montiel Ross Chapter G.76 


Oscar Humberto Montiel Ross received his MSc in Digital Systems in 1996 from the 
Instituto Politécnico Nacional-CITEDI, an MSc from Tijuana Institute of Technology 
in 2000, and his PhD in 2006 from the Universidad Autónoma of Baja California in 
Tijuana, México, both in Computer Science. He has published about 74 contributions
in journals, book chapters, proceedings, and 4 books. His research interests include 
optimization, intelligent systems, and robotics. 


Mesa de Otay, Tijuana, Mexico 
oross@citedi.mx 


Manuel Mucientes Chapter B.20 


Manuel Mucientes is a Ramon y Cajal Research Fellow with the Research 
Centre for Information Technologies (CiTIUS) of the University of 
Santiago de Compostela. His current research interests are evolutionary 
algorithms, genetic fuzzy systems, motion planning and control in 
robotics, visual SLAM, web services, and process mining. 


University of Santiago de Compostela 
Research Centre for Information 
Technologies 

Santiago de Compostela, Spain 
manuel.mucientes@usc.es 


Nysret Musliu Chapter E.64

Vienna University of Technology
Inst. Information Systems
Vienna, Austria

Nysret Musliu is Privat Dozent and Senior Researcher at Vienna
University of Technology. He received his PhD in Computer Science
from Vienna University of Technology in 2001 and his MSc degree
from the University of Prishtina in 1996. His research interests include
problem solving and search, metaheuristics, machine learning and
optimization, constraint satisfaction, scheduling, and timetabling.


Yusuke Nojima Chapter G.77

Osaka Prefecture University
Dep. Computer Science and Intelligent Systems, Graduate School of Engineering

Yusuke Nojima received his BS and MSc degrees from Osaka Institute of Technology,
Japan, in 1999 and 2001, respectively, and his PhD from Kobe University, Hyogo,
Japan, in 2004. Since 2004, he has been with Osaka Prefecture University, Japan,
where he is currently an Associate Professor. His research interests include genetic
fuzzy systems, evolutionary multiobjective optimization, and parallel distributed
data mining.


Stefano Nolfi Chapter D.37 


Stefano Nolfi is Director of Research of the Institute of Cognitive Sciences and 
Technologies (CNR). He is one of the founders of evolutionary robotics. His research 
activities focus on the evolution and development of behavioral and cognitive skills in 
embodied agents. He has authored/co-authored more than 130 peer-reviewed scientific 
publications, including a monograph. 


Consiglio Nazionale delle Ricerche 
(CNR-ISTC) 

Inst. Cognitive Sciences and Technologies 
Roma, Italy 

stefano.nolfi@istc.cnr.it 


Una-May O'Reilly Chapter E.43

Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Lab.
Cambridge, USA
unamay@csail.mit.edu

Una-May O'Reilly leads the AnyScale Learning For All (ALFA) group at
Massachusetts Institute of Technology Computer Science and Artificial
Intelligence Laboratory. She received the EvoStar Award for Outstanding
Achievements in Evolutionary Computation in Europe in 2013 and serves
as Vice-Chair of ACM SigEVO.


Miguel Pagola Chapter A.6

Universidad Pública de Navarra
Dep. Automática y Computación
Pamplona (Navarra), Spain
miguel.pagola@unavarra.es

Miguel Pagola received his MSc and PhD degrees in Industrial Engineering
from the Public University of Navarra, in 2000 and 2008. He is
Associate Lecturer with the Department of Automatics and Computation
at UPNa. His research interests include fuzzy techniques for image
processing, fuzzy set theory, machine learning, and data mining. He is a
member of the European Society for Fuzzy Logic and Technology.


Lynne Parker Chapter F.72

University of Tennessee
Dep. Electrical Engineering and Computer Science
Knoxville, USA

Lynne Parker is a Professor in the Department of Electrical Engineering and Computer
Science at the University of Tennessee, Knoxville. She received her PhD degree from
the Massachusetts Institute of Technology. She previously worked for several years
as a full-time researcher at Oak Ridge National Laboratory. Her research focuses
on distributed intelligent robotics, human-robot interaction, sensor networks, and
machine learning.


Kevin M. Passino Chapter F.70


Kevin M. Passino is Professor of Electrical and Computer Engineering and Director 
of the Humanitarian Engineering Center at Ohio State University. He has been Vice- 
President of Technical Activities of IEEE Control Systems Society, an elected member 
of IEEE Control Systems Society Board of Governors, Program Chair of the 2001 
IEEE Conference on Decision and Control, and is a Distinguished Lecturer for the 
IEEE Society on Social Implications of Technology. 


The Ohio State University 

Dep. Electrical and Computer Engineering 
Columbus, USA 

passino@ece.osu.edu 


Martin Pelikan Chapter E.45


Martin Pelikan received his PhD in Computer Science from the University
of Illinois at Urbana-Champaign in 2002. He is now a software engineer
at Google. Previously, he was an Associate Professor at the University of
Missouri in St. Louis. His research in evolutionary computation focused
mainly on estimation of distribution algorithms (EDAs), efficiency
enhancement techniques, and scalability of EDAs.


Sunnyvale, USA 
martin@martinpelikan.net 


Irina Perfilieva Chapter A.7 

University of Ostrava
Inst. Research and Applications of Fuzzy Modeling
Ostrava, Czech Republic
Irina.Perfilieva@osu.cz

Irina Perfilieva received her PhD from Moscow State University and
is currently a Professor of Applied Mathematics at the University of
Ostrava, Czech Republic. Her research interests are fuzzy transforms
with applications to image processing and computer vision. She has
published more than 300 journal and conference papers and is co-author
of 5 books.


Henri Prade Chapter A.3

Université Paul Sabatier
IRIT - Equipe ADRIA
Toulouse Cedex 9, France
prade@irit.fr

Henri Prade is a Research Director at CNRS at Paul Sabatier University. He is co-author
of two monographs on fuzzy sets and possibility theory. His current research
interests are uncertainty and preference modeling, non-classical logics, approximate,
plausible and analogical reasoning with applications to artificial intelligence, and
information systems. He is an ECCAI fellow, an IFSA fellow, a 2001 highly-cited ISI
laureate, and received an IEEE pioneer award in 2002.


Mike Preuss Chapter E.58 

WWU Münster
Inst. Wirtschaftsinformatik
Münster, Germany
mike.preuss@tu-dortmund.de

Mike Preuss is Research Associate at ERCIS, University of Münster, Germany, and
the Chair of Algorithm Engineering at TU Dortmund, Germany, where he received his
PhD in 2013. His research interests focus on the field of evolutionary algorithms for
real-valued problems, namely on multimodal and multiobjective optimization, and on
computational intelligence methods for computer games.


José C. Principe Chapter D.30


José C. Principe is currently a Distinguished Professor of Electrical and
Biomedical Engineering at the University of Florida, Gainesville, USA. 
He is Founder and Director of the University of Florida Computational 
Neuro-Engineering Laboratory (CNEL). He is an IEEE Fellow and 
an AIMBE Fellow. He is involved in biomedical signal processing, 

in particular, the electroencephalogram (EEG) and the modeling and 
applications of adaptive systems. 


University of Florida 

Dep. Electrical and Computer 
Engineering 

Gainesville, USA 
principe@cnel.ufl.edu 


Domenico Quagliarella Chapter E.60 
CIRA, Italian Aerospace Research Center
Fluid Dynamics Lab.
Capua (CE), Italy
d.quagliarella@cira.it

Domenico Quagliarella received his MSc degree and his PhD in
Aerospace Engineering from the University of Naples, Italy. He
is currently Senior Researcher at CIRA. His research interests are
the application of hybrid multi-objective optimization methods to
aerodynamic and multidisciplinary design. He is the author of about 65
publications.


Nicanor Quijano Chapter F.70

Universidad de los Andes
Dep. Electrical and Electronics Engineering
Bogotá, Colombia
nquijano@uniandes.edu.co

Nicanor Quijano received his PhD degree in Electrical and Computer Engineering
from The Ohio State University in 2006. Since 2007 he has been with the Electrical
Engineering Department, Universidad de los Andes, Colombia, where he is the
Director of the Research Group on Control Systems. His research interests include
hierarchical and distributed optimization methods using bio-inspired and game-theoretical
techniques for dynamic resource allocation problems.


Jaroslav Ramík Chapter A.8

Silesian University in Opava
Dep. Informatics and Mathematics
Karviná, Czech Republic
ramik@opf.slu.cz

Prof. Jaroslav Ramík, PhD, is a Professor of Mathematics, Statistics, and Operations
Research at the Silesian University Opava, in Karviná, Czech Republic. His interests
include optimization methods in economics and decision making. He is the author of
6 books and more than 50 papers listed in WoS. He is also active in the Czech Society
for Operations Research and has served as its president.


Ismael Rodriguez Fdez Chapter B.20 


Ismael Rodriguez received his MSc in Computer Science in 2011 from 
the University of Santiago de Compostela. He is presently a PhD student 
at the Research Centre for Information Technologies of the University 
of Santiago de Compostela (CiTIUS). His current research interests are 
regression problems, evolutionary algorithms, and genetic fuzzy systems. 


University of Santiago de Compostela 
Research Centre for Information 
Technologies 

Santiago de Compostela, Spain 
ismael.rodriguez@usc.es 


Franz Rothlauf Chapter E.53 

Johannes Gutenberg University Mainz
Gutenberg School of Management and Economics
Mainz, Germany
rothlauf@uni-mainz.de

Franz Rothlauf received a Diploma in Electrical Engineering from
the University of Erlangen, Germany, a PhD in Information Systems
from the University of Bayreuth, Germany, and a Habilitation from the
University of Mannheim, Germany. He is a Professor for Information
Systems at the University of Mainz. His research activities include
planning and optimization, evolutionary computation, e-business, and
software engineering.


Jonathan E. Rowe Chapter E.42

University of Birmingham
School of Computer Science
Birmingham, UK
J.E.Rowe@cs.bham.ac.uk

Jonathan Rowe received a degree in Mathematics and PhD in Computer Science
from the University of Exeter. He has worked in the field of natural computation for
20 years. He joined the University of Birmingham in 2000 and is now the Head of
the School of Computer Science. He is Associate Editor for Theoretical Computer
Science and Natural Computing journals.


Imre J. Rudas Chapter B.10

Óbuda University
Dep. Applied Mathematics
Budapest, Hungary
rudas@uni-obuda.hu

Imre J. Rudas received his Master’s Degree in Mathematics in Budapest, and his
Doctor of Science degree from the Hungarian Academy of Sciences. He is a full
University Professor and Head of Óbuda University Research and Innovation Center.
His present areas of research activities are robotics and computational intelligence. He
has published 6 books and more than 690 scientific papers.


Günter Rudolph Chapter E.58

Technische Universität Dortmund
Fak. Informatik
Dortmund, Germany
guenter.rudolph@cs.tu-dortmund.de

Günter Rudolph (PhD in Computer Science, 1996) has been a Professor
of Computational Intelligence at the Department of Computer Science
at TU Dortmund University since 2005. His research interests include
the development and theoretical analysis of bio-inspired methods applied
to difficult optimization problems encountered in engineering sciences,
logistics, and economics.


Gabriele Sadowski Chapter E.58

Technische Universität Dortmund
Bio- und Chemieingenieurwesen
Dortmund, Germany
gabriele.sadowski@bci.tu-dortmund.de

Gabriele Sadowski (PhD in Physical Chemistry, 1991) has been a
Professor of Thermodynamics in the Department of Biochemical and
Chemical Engineering at TU Dortmund since 2001. Her research
interests include experimental investigation and thermodynamic modeling
of phase behavior in systems containing complex molecules, like
polymers, electrolytes, biomolecules, or pharmaceuticals.


Marco Scarpa Chapter F.69

Università di Messina
Dip. Ingegneria Civile, Informatica
Messina, Italy
mscarpa@unime.it

Marco Scarpa received his Bachelor degree in Computer Engineering from the
University of Catania, Italy and his PhD degree in Computer Science in 2000 from
the University of Turin, Italy. He is currently Associate Professor of Computer
Engineering at the University of Messina, Italy. His interests include performance
and reliability modeling of distributed and real time systems and algorithms for their
solution.


Werner Schafhauser Chapter E.64

XIMES
Vienna, Austria
schafhauser@ximes.com

Werner Schafhauser is a Senior Consultant and a software developer at XIMES in
Vienna where, amongst other things, he develops and applies optimization algorithms
to real scheduling problems. His research interests include metaheuristic optimization,
constraint satisfaction problems, structural decomposition methods, and scheduling.
He has a PhD in Computer Science from Vienna University of Technology.


Roberto Sepúlveda Cruz Chapter G.76

Mesa de Otay, Tijuana, Mexico
rsepulve@citedi.mx

Roberto Sepúlveda Cruz received his MSc from the Tijuana Institute of
Technology, México, and his PhD from the Universidad Autónoma of
Baja California, Tijuana, México, both in Computer Science in 1999 and
2006, respectively. His research interests include type-2 fuzzy systems,
intelligent systems, and robotics. He is a member of the International
Association of Engineers (IAENG).


Jennie Si Chapter D.34

Arizona State University
School of Electrical, Computer and Energy Engineering
Tempe, USA
si@asu.edu

Jennie Si received her BS and MSc degrees from Tsinghua University,
Beijing, China, and her PhD from the University of Notre Dame. She
has been on the faculty in the Department of Electrical Engineering
at Arizona State University since 1991. Her research focuses on
dynamic optimization using learning and neural network approximation
approaches, namely approximate dynamic programming. She has served
on the Executive Boards of several professional organizations.


Marco Signoretto Chapter D.32 

Katholieke Universiteit Leuven
Leuven, Belgium
marco.signoretto@esat.kuleuven.be

Marco Signoretto holds a PhD in Mathematical Engineering from Katholieke
Universiteit Leuven, Belgium; a Laurea Magistralis in Electronic Engineering
from the University of Padova, Italy; and an MSc in Methods for Management of
Complex Systems from the University of Pavia, Italy. His research interests include
mathematical modeling of structured data. His current work deals with methods based
on (convex) optimization, structure-inducing penalties, and spectral regularization.


Andrzej Skowron Chapter C.21


University of Warsaw
Faculty of Mathematics, Computer Science and Mechanics
Warsaw, Poland
skowron@mimuw.edu.pl


Andrzej Skowron is a Full Professor at the Institute of Mathematics at the University of 
Warsaw. He received his PhD and DSc (Habilitation) from the University of Warsaw, 
and the title of Professor in 1991. He is an ECCAI Fellow. His area of expertise 
includes reasoning with imperfect data and knowledge, soft computing methods, 
rough sets, granular computing, data mining, adaptive and autonomous systems, 
perception-based computing, and interactive computational systems. 


Igor Skrjanc Chapter G.75 

Igor Skrjanc is currently a Professor of Automatic Control with the 
Faculty of Electrical Engineering, the University of Ljubljana. His main
research interests include intelligent, predictive control systems and 
autonomous mobile systems. In 2009 he received the Humboldt Research 
Fellowship for Experienced Researchers. 


University of Ljubljana 

Faculty of Electrical Engineering 
Ljubljana, Slovenia 
igor.skrjanc@fe.uni-lj.si 


Roman Słowiński Chapters C.22, C.24 For biographical profile, please see the section “About the Part Editors”.


Guido Smits Chapter E.57

Dow Benelux BV
Core R&D
NM Hoek, The Netherlands
gfsmits@dow.com

Guido F. Smits is a Data Scientist at Dow Chemical Company. His
main area of interest and expertise is in innovative applications of
computational intelligence to new product design and optimization. He
has authored more than 70 papers and currently holds 15 patents. He
has a PhD from the University of Leiden, NL.


Ronen Sosnik Chapter D.39

Holon Institute of Technology (H.I.T.)
Electrical, Electronics and Communication Engineering
Holon, Israel
ronens@hit.ac.il

Ronen Sosnik received his MSc degree in Neuroscience and his Research Doctorate
in Neuroscience from Weizmann Institute of Science in 2000 and 2005. His research
interests include computational motor control, motor learning, and neural substrates
mediating the acquisition, representation, and generation of motion primitives.
Currently, he is devising innovative experimental paradigms and mathematical
methods for the construction of novel BCI systems.


Alessandro Sperduti Chapters D.27, D.31

University of Padova
Dep. Pure and Applied Mathematics
Padova, Italy
sperduti@math.unipd.it

Alessandro Sperduti has a PhD in Computer Science from the University of Pisa.
He has been Professor in Computer Science at the University of Padova since 2002
and Chair of the Data Mining and Neural Networks Technical Committees of IEEE
CIS. His research interests include machine learning, neural networks, learning in
structured domains, and data and process mining.




Kasper Støy Chapter F.73

IT University of Copenhagen
Copenhagen S, Denmark
ksty@itu.dk

Kasper Støy holds an Associate Professorship at The Maersk Mc-Kinney
Moller Institute, University of Southern Denmark (USD). His research
interests include design of modular robot systems and distributed control.
He has authored a monograph. He holds an MSc in Computer Science and
Physics, University of Aarhus, Denmark (1999) and a PhD in Computer
System Engineering, USD (2003).


Harrison Stratton Chapter D.34

Arizona State University & Barrow Neurological Institute
Phoenix, USA
Harrison.Stratton@asu.edu

Harrison Stratton obtained his BSc in Physics from Virginia Polytechnic
Institute and State University in 2008 and is currently completing his
PhD at Arizona State University and Barrow Neurological Institute.
His work focuses on the role of the endogenous cannabinoid system in
regulating changes of emotion, memory, and learning.


Thomas Stützle Chapter E.54

Université libre de Bruxelles (ULB)
IRIDIA, CP 194/6
Brussels, Belgium
stuetzle@ulb.ac.be

Thomas Stützle is Senior Research Associate of the Belgian FRS-FNRS at the
University of Brussels. He received his MSc from Universität Karlsruhe (TH) and
his PhD and Habilitation from the Technische Universität Darmstadt, Germany.
His interests lie in stochastic local search, swarm intelligence, methodologies for
engineering stochastic local search algorithms, multi-objective optimization, and
automatic configuration of algorithms.


Dirk Sudholt Chapter E.46

University of Sheffield
Dep. Computer Science
Sheffield, UK
d.sudholt@sheffield.ac.uk

Dirk Sudholt obtained Diploma and PhD degrees in Computer Science in 2004
and 2008, respectively, from the Technische Universität Dortmund, Germany. After
holding postdoc positions in Berkeley, California, and Birmingham, UK, he joined
the University of Sheffield, UK, as Lecturer in 2012. His research focuses on the
computational complexity of randomized search heuristics.


Ron Sun Chapter D.36

Rensselaer Polytechnic Institute
Cognitive Science Dep.
Troy, USA
rsun@rpi.edu

Ron Sun is Professor of Cognitive Sciences and Computer Science at
Rensselaer Polytechnic Institute. He received his PhD from Brandeis
University in 1992. His research interests center around the study of
cognition. He has published many papers in these areas, as well as ten
books.


Johan A. K. Suykens Chapter D.32

Katholieke Universiteit Leuven
Leuven, Belgium
Johan.suykens@esat.kuleuven.be

Johan A.K. Suykens is Professor at Katholieke Universiteit Leuven,
Belgium, where he obtained a degree in Electro-Mechanical Engineering
and a PhD in Applied Sciences. He is a senior IEEE member, has
co-authored and edited several books, and received many prestigious
awards.


Roman W. Swiniarski (deceased) Chapter C.21




El-Ghazali Talbi Chapter E.55

University of Lille
Computer Science CRISTAL
Villeneuve d'Ascq, France
el-ghazali.talbi@univ-lille1.fr

El-Ghazali Talbi received his Master’s and PhD in Computer Science from the Institut
National Polytechnique de Grenoble, France. He is a Full Professor at the University of
Lille and the Head of the Optimization Team of the Computer Science Laboratory. His
research interests are in the field of multi-objective optimization, parallel algorithms,
metaheuristics, combinatorial optimization, cluster and grid computing.


Lothar Thiele Chapter E.48

Swiss Federal Institute of Technology Zurich
Computer Engineering and Networks Lab.
Zurich, Switzerland
thiele@ethz.ch

Lothar Thiele received his Diploma and Dr.-Ing. degrees in Electrical Engineering
from the Technical University of Munich where he also received his Habilitation.
He joined ETH Zurich, Switzerland, as a Full Professor of Computer Engineering in
1994. His research interests include models, methods and software tools for the design
of embedded systems, embedded software and bioinspired optimization techniques.


Peter Tino Chapter D.27

University of Birmingham
School of Computer Science
Birmingham, UK
P.Tino@cs.bham.ac.uk

Peter Tino has a PhD in Computer Science (Slovak Academy of Sciences)
and is a Reader in Complex and Adaptive Systems at the University of
Birmingham, UK. He is a Vice-Chair of the Neural Networks Technical
Committee of IEEE CIS. His main research interests include dynamical
systems, machine learning, probabilistic modeling of structured data,
evolutionary computation, and fractal analysis.


Joan Torrens Chapter B.12

University of the Balearic Islands
Dep. Mathematics and Computer Science
Palma de Mallorca, Spain
jts224@uib.es

Joan Torrens received his BSc degree in Mathematics from Universitat
Autonoma de Barcelona (1981) and his PhD degree in Computer
Science from Universitat de les Illes Balears, Spain (1990). He is a Full
Professor in the Department of Mathematics and Computer Science at
Universitat de les Illes Balears. His research interests include fuzzy sets
theory. He has published over 100 papers in journals.


Vito Trianni Chapter F.71

Consiglio Nazionale delle Ricerche
Ist. Scienze e Tecnologie della Cognizione
Roma, Italy
vito.trianni@istc.cnr.it

Vito Trianni is a tenured researcher at ISTC-CNR, the Institute of Cognitive Sciences
and Technologies of the Italian National Research Council. He received his PhD in
Applied Sciences from the Université Libre de Bruxelles in 2006. He has thorough
expertise, both theoretical and experimental, in the study and design of self-organizing
behavior, especially applied to swarm robotics.


Enric Trillas Chapter B.16

European Centre for Soft Computing
Fundamentals of Soft Computing
Mieres, Spain
enric.trillas@softcomputing.es

Enric Trillas, an Emeritus Researcher at ECSC, received a PhD in Mathematics from
the University of Barcelona. From 1964-1984 he did research in ordered semigroups,
probabilistic and generalized metric spaces, and in 1975 he began working in fuzzy
logic. He has published over 400 papers, and published or edited several books. His
current interests lie in fuzzy logic and in the mathematical analysis of natural language
and commonsense reasoning.


Fevrier Valdez Chapter G.81

Tijuana Institute of Technology
Tijuana, Mexico
fevrier@tectijuana.mx

Fevrier Valdez is a Professor in the Computer Science Department
of Tijuana Institute of Technology. His research interests are bio-inspired
optimization methods, parallel computing, fuzzy logic and
neural networks. He has published several papers in journals, conference
proceedings, and as book chapters.




Nele Verbiest Chapter C.26

Ghent University
Dep. Applied Mathematics, Computer Science and Statistics
Ghent, Belgium
nele.verbiest@ugent.be

Nele Verbiest holds a Master’s degree in Mathematical Computer
Science and a PhD in Computer Science, both from Ghent University.
Her research interests include classification, evolutionary algorithms,
instance selection, feature selection, and fuzzy rough set theory.


Thomas Villmann Chapter D.31

University of Applied Sciences Mittweida
Dep. Mathematics, Natural and Computer Sciences
…mittweida.de

Thomas Villmann received his PhD in Computer Science in 1996 and his Venia legendi
in 2005 from Leipzig University. He has been a Full Professor of Computational
Intelligence at the University of Applied Sciences Mittweida, Germany since 2009.
His research focus includes the theory of prototype-based clustering and classification,
non-standard metrics, information theory, and their application in pattern recognition
for use in medicine, bioinformatics, remote sensing, and hyperspectral analysis.


Milan Vlach Chapters A.8, A.9

Charles University
Theoretical Computer Science and Mathematical Logic
Prague, Czech Republic
Milan.Vlach@mff.cuni.cz

Milan Vlach studied Mathematics at Charles University, Prague (1958-1960) and
Moscow State University (1960-1963). Since graduating from Moscow State
University (1960), he has been affiliated with Charles University. At present he is also
affiliated with the Institute of Information Theory and Automation, Czech Academy
of Sciences. His current area of interest includes game theory, fair division, and
optimization theory.


Ekaterina Vladislavleva Chapter E.57 


Evolved Analytics Europe BVBA 
Turnhout, Belgium 
katya@evolved-analytics.com 


Ekaterina Vladislavleva is CEO and Co-Founder of Evolved Analytics 
Europe, a Belgium-based advanced predictive analytics company — 
creator of DataStories.com. She received her PhD from CentER at Tilburg 
School of Economics (The Netherlands), her Doctorate in Engineering 
from Eindhoven University of Technology (The Netherlands) and MSc 
from Lomonosov Moscow State University (Russia). 


Tobias Wagner Chapter E.59

TU Dortmund University
Dep. Mechanical Engineering
Dortmund, Germany
wagner@isf.de

Tobias Wagner studied computer science and finished his Doctoral
degree in Mechanical Engineering at the University of Dortmund (now
TU Dortmund University). He has been a Postdoctoral Researcher
(Akademischer Rat) at the Institute of Machining Technology (ISF)
since 2013. His research interests include the empirical modeling and
optimization of machining processes and process chains.


Jun Wang Chapter D.33

The Chinese University of Hong Kong
Dep. Mechanical & Automation Engineering
Hong Kong, Hong Kong
jwang@mae.cuhk.edu.hk

Jun Wang is a Professor at the Chinese University of Hong Kong. He has been a
National Thousand-Talent Chair Professor at Dalian University of Technology since
2011. He received his BSc degree in Electrical Engineering and his MSc degree in
Systems Engineering from Dalian University of Technology, China and his PhD in
Systems Engineering from Case Western Reserve University, Cleveland, Ohio, USA.


Simon Wessing Chapter E.58 


Technische Universität Dortmund
Fak. Informatik 

Dortmund, Germany 
simon.wessing@tu-dortmund.de 


Simon Wessing has been a Research Associate in the Computational Intelligence 
Group, Technische Universität Dortmund since 2009. His research interest focuses 
on multimodal and global optimization. He develops new optimization algorithms for 
these problems based on evolutionary algorithms and other derivative-free methods. 




Wei-Zhi Wu Chapter C.26


Wei-Zhi Wu received his BSc degree in Mathematics from Zhejiang 
Normal University, his MSc degree from East China Normal University, 
and his PhD degree from Xi’an Jiaotong University (2002). He is currently 
a Professor of Mathematics in the School of Mathematics, Physics, and 
Information Science, Zhejiang Ocean University. His current research 
interests include approximate reasoning, rough sets, random sets, formal 
concept analysis, and granular computing. 


Zhejiang Ocean University 

School of Mathematics, Physics and 
Information Science 

Zhoushan, Zhejiang, China 
wuwz@zjou.edu.cn 


Lei Xu Chapter D.29

The Chinese University of Hong Kong
Dep. Computer Science and Engineering
Hong Kong, Hong Kong
lxu@cse.cuhk.edu.hk

Lei Xu is a Professor at the Chinese University of Hong Kong. He is
an IEEE Fellow, an IAPR Fellow, and an Academician of the European
Academy of Sciences. He has published well-cited papers on neural
networks, machine learning, and pattern recognition.


JingTao Yao Chapter C.25

University of Regina
Dep. Computer Science
Regina, Saskatchewan, Canada
jtyao@cs.uregina.ca

JingTao Yao is a Professor of Computer Science at the University of Regina. He
has published over 100 papers on granular computing, rough sets, data mining, and
Web-based support systems. He is the Elected Chair of the Steering Committee of the
International Rough Set Society.


Yiyu Yao Chapter C.24 For biographical profile, please see the section “About the Part Editors”. 

Andreas Zabel Chapter E.59

TU Dortmund University
Dep. Mechanical Engineering
Dortmund, Germany
zabel@isf.de

Andreas Zabel studied Computer Science and finished his Doctoral degree in
Mechanical Engineering at the University of Dortmund (now TU Dortmund University). He is
Chief Engineer of the Institute of Machining Technology (ISF). His research includes
the simulation of machining processes, modeling and analysis of tool wear, as well as
augmented and virtual reality for process planning.


Sławomir Zadrożny Chapter B.19


Sławomir Zadrożny is an Associate Professor (PhD 1994, DSc 2006) and
a Deputy Director for Research at the Systems Research Institute, Polish 
Academy of Sciences. His current interests include applications of fuzzy 
logic in database management systems, information retrieval, decision 
support, and data analysis. He has authored and co-authored about 200 
journal and conference papers. He is also a teacher at the Warsaw School 
of Information Technology. 


Polish Academy of Sciences 

Systems Research Inst. 

Warsaw, Poland 
Slawomir.Zadrozny@ibspan.waw.pl 


Zhigang Zeng Chapter D.33

Huazhong University of Science and Technology
Dep. Control Science and Engineering
Wuhan, China
zgzeng@hust.edu.cn

Zhigang Zeng is Professor in the Department of Control Science and
Engineering, Huazhong University of Science and Technology, Wuhan,
Hubei, China. He received his BSc degree in Mathematics from Hubei
Normal University, Huangshi, China, his MSc degree in Ecological
Mathematics from Hubei University, Wuhan, China, in 1993 and 1996,
and his PhD degree in Systems Analysis and Integration from Huazhong
University of Science and Technology, Wuhan, China, in 2003.




Yan Zhang Chapter C.25

University of Regina
Dep. Computer Science
Regina, Saskatchewan, Canada
zhang83y@cs.uregina.ca

Yan Zhang is currently a PhD student in the Department of Computer Science,
the University of Regina, Canada. Her main research involves rough sets, granular
computing, data analysis, and data mining. She has authored or co-authored more
than 15 technical papers in international journals and conference proceedings. She is
coauthor of two book chapters.


Zhi-Hua Zhou Chapter D.29

Nanjing University
National Key Lab. for Novel Software Technology
Nanjing, China
zhouzh@nju.edu.cn

Zhi-Hua Zhou received BSc, MSc, and PhD degrees in Computer Science from
Nanjing University, China, in 1996, 1998 and 2000. He joined the Department of
Computer Science and Technology of Nanjing University in 2001 and at present
he is a Professor and Deputy Director of the National Key Lab for Novel Software
Technology. His research interests include artificial intelligence, machine learning,
data mining, pattern recognition and multimedia information retrieval.




Detailed Contents 


List of Abbreviations ........ XLV

1 Introduction
Janusz Kacprzyk, Witold Pedrycz ........ 1
1.1 Details of the Contents ........ 2
1.1.1 Part A Foundations ........ 2
1.1.2 Part B Fuzzy Logic ........ 2
1.1.3 Part C Rough Sets ........ 2
1.1.4 Part D Neural Networks ........ 3
1.1.5 Part E Evolutionary Computation ........ 3
1.1.6 Part F Swarm Intelligence ........ 3
1.1.7 Part G Hybrid Systems ........ 4
1.2 Conclusions and Acknowledgments ........ 4

Part A Foundations 

2 Many-Valued and Fuzzy Logics
Siegfried Gottwald ........ 7
2.1 Basic Many-Valued Logics ........ 8
2.1.1 The Gödel Logics ........ 8
2.1.2 The Łukasiewicz Logics ........ 9
2.1.3 The Product Logic ........ 9
2.1.4 The Post Logics ........ 10
2.1.5 Algebraic Semantics ........ 10
2.2 Fuzzy Sets ........ 11
2.2.1 Set Algebra for Fuzzy Sets ........ 11
2.2.2 Fuzzy Sets and Many-Valued Logic ........ 12
2.2.3 t-Norms and t-Conorms ........ 12
2.3 t-Norm-Based Logics ........ 13
2.3.1 Basic Ideas ........ 13
2.3.2 Left and Full Continuity of t-Norms ........ 14
2.3.3 Extracting an Algebraic Framework ........ 15
2.4 Particular Fuzzy Logics ........ 16
2.4.1 The Logic BL of All Continuous t-Norms ........ 17
2.4.2 The Logic MTL of All Left Continuous t-Norms ........ 18
2.4.3 Extensions of MTL ........ 19
2.4.4 Logics of Particular t-Norms ........ 20
2.4.5 Extensions to First-Order Logics ........ 20
2.5 Some Generalizations ........ 21
2.5.1 Adding a Projection Operator ........ 21
2.5.2 Adding an Idempotent Negation ........ 22
2.5.3 Logics with Additional Strong Conjunctions ........ 22




2.5.4 Logics Without a Truth Degree Constant ........ 22
2.5.5 Logics with a Noncommutative Strong Conjunction ........ 22
2.6 Extensions with Graded Notions of Inference ........ 23
2.6.1 Pavelka-Style Approaches ........ 23
2.6.2 A Lattice Theoretic Approach ........ 24
2.7 Some Complexity Results ........ 25
2.8 Concluding Remarks ........ 27
References ........ 27

3 Possibility Theory and Its Applications: Where Do We Stand?
Didier Dubois, Henri Prade ........ 31
3.1 Historical Background ........ 32
3.1.1 G.L.S. Shackle ........ 32
3.1.2 D. Lewis ........ 33
3.1.3 L.J. Cohen ........ 33
3.1.4 L.A. Zadeh ........ 33
3.2 Basic Notions of Possibility Theory ........ 33
3.2.1 Possibility Distributions ........ 33
3.2.2 Specificity ........ 34
3.2.3 Possibility and Necessity Functions ........ 34
3.2.4 Certainty Qualification ........ 35
3.2.5 Joint Possibility Distributions ........ 35
3.2.6 Conditioning ........ 35
3.2.7 Independence ........ 36
3.2.8 Fuzzy Interval Analysis ........ 36
3.2.9 Guaranteed Possibility ........ 36
3.2.10 Bipolar Possibility Theory ........ 37
3.3 Qualitative Possibility Theory ........ 38
3.3.1 Possibility Theory and Modal Logic ........ 38
3.3.2 Comparative Possibility ........ 38
3.3.3 Possibility Theory and Nonmonotonic Inference ........ 39
3.3.4 Possibilistic Logic ........ 40
3.3.5 Ranking Function Theory ........ 41
3.3.6 Possibilistic Belief Networks ........ 42
3.3.7 Fuzzy Rule-Based and Case-Based Approximate Reasoning ........ 43
3.3.8 Preference Representation ........ 43
3.3.9 Decision-Theoretic Foundations ........ 44
3.4 Quantitative Possibility Theory ........ 45
3.4.1 Possibility as Upper Probability ........ 46
3.4.2 Conditioning ........ 46
3.4.3 Probability-Possibility Transformations ........ 47
3.5 Some Applications ........ 49
3.5.1 Uncertain Database Querying and Preference Queries ........ 49
3.5.2 Description Logics ........ 50
3.5.3 Information Fusion ........ 50
3.5.4 Temporal Reasoning and Scheduling ........ 51




3.5.5 Risk Analysis ........ 52
3.5.6 Machine Learning ........ 52
3.6 Some Current Research Lines ........ 53
References ........ 54


4 Aggregation Functions on [0,1]
Radko Mesiar, Anna Kolesárová, Magda Komorníková ........ 61
4.1 Historical and Introductory Remarks ........ 61
4.2 Classification of Aggregation Functions ........ 63
4.3 Properties and Construction Methods ........ 66
4.4 Concluding Remarks ........ 71
References ........ 72


5 Monotone Measures-Based Integrals
Erich P. Klement, Radko Mesiar ........ 75
5.1 Preliminaries, Choquet, and Sugeno Integrals ........ 76
5.2 Benvenuti Integral ........ 80
5.3 Universal Integrals ........ 82
5.4 General Integrals Which Are Not Universal ........ 84
5.5 Concluding Remarks, Application Fields ........ 86
References ........ 87


6 The Origin of Fuzzy Extensions
Humberto Bustince, Edurne Barrenechea, Javier Fernández,
Miguel Pagola, Javier Montero ........ 89
6.1 Considerations Prior to the Concept of Extension of Fuzzy Sets ........ 90
6.1.1 Brouwer's Intuitionistic Logic ........ 91
6.1.2 Łukasiewicz's Multivalued Logics ........ 91
6.1.3 Zadeh's Fuzzy Logic. First Generalization by Goguen ........ 91
6.2 Origin of the Extensions ........ 93
6.3 Type-2 Fuzzy Sets ........ 94
6.3.1 Type-2 Fuzzy Sets as a Lattice ........ 94
6.3.2 Remarks on the Notation ........ 95
6.3.3 A First Definition of Operations Between Type-2 Fuzzy Sets: Lattice-Based Approach ........ 95
6.3.4 Problems with the Lattice-Based Definitions. Operations Based on Zadeh's Extension Principle ........ 96
6.3.5 Second Definition of the Operations: Zadeh's Extension Principle Approach ........ 96
6.3.6 About Computational Efficiency ........ 98
6.3.7 Applications ........ 98
6.4 Interval-Valued Fuzzy Sets ........ 98
6.4.1 Two Interpretations of Interval-Valued Fuzzy Sets ........ 99
6.4.2 Shadowed Sets Are a Particular Case of Interval-Valued Fuzzy Sets ........ 100
6.4.3 Interval-Valued Fuzzy Sets Are a Particular Case of Type-2 Fuzzy Sets ........ 100
6.4.4 Some Problems with Interval-Valued Fuzzy Sets ........ 100
6.4.5 Applications ........ 100




6.5 Atanassov's Intuitionistic Fuzzy Sets or Bipolar Fuzzy Sets of Type 2 or IF Fuzzy Sets
6.5.1 Relation Between Interval-Valued Fuzzy Sets and Atanassov's Intuitionistic Fuzzy Sets: Two Different Concepts
6.5.2 Some Problems with the Intuitionistic Sets Defined by Atanassov
6.5.3 Applications
6.5.4 The Problem of the Name
6.6 Atanassov's Interval-Valued Intuitionistic Fuzzy Sets
6.7 Links Between the Extensions of Fuzzy Sets
6.8 Other Types of Sets
6.8.1 Probabilistic Sets
6.8.2 Fuzzy Multisets and n-Dimensional Fuzzy Sets
6.8.3 Bipolar Valued Set or Bipolar Set
6.8.4 Neutrosophic Sets or Symmetric Bipolar Sets
6.8.5 Hesitant Sets
6.8.6 Fuzzy Soft Sets
6.8.7 Fuzzy Rough Sets
6.9 Conclusions
References

7 F-Transform
Irina Perfilieva
7.1 Fuzzy Modeling
7.2 Fuzzy Partitions
7.2.1 Fuzzy Partition with the Ruspini Condition
7.2.2 Fuzzy Partitions with the Generalized Ruspini Condition
7.2.3 Generalized Fuzzy Partitions
7.3 Fuzzy Transform
7.3.1 Direct F-Transform
7.3.2 Inverse F-Transform
7.4 Discrete F-Transform
7.5 F-Transforms of Functions of Two Variables
7.6 F¹-Transform
7.7 Applications
7.7.1 Image Compression and Reconstruction
7.7.2 Image Fusion
7.7.3 F¹-Transform Edge Detector
7.8 Conclusions
References


8 Fuzzy Linear Programming and Duality
Jaroslav Ramík, Milan Vlach
8.1 Preliminaries
8.1.1 Linear Programming
8.1.2 Sets and Fuzzy Sets




8.2 Fuzzy Linear Programming ........ 135
8.3 Duality in Fuzzy Linear Programming ........ 137
8.3.1 Early Approaches ........ 137
8.3.2 More General Approach ........ 140
8.4 Conclusion ........ 143
References ........ 143


9 Basic Solutions of Fuzzy Coalitional Games
Tomáš Kroupa, Milan Vlach ........ 145
9.1 Coalitional Games with Transferable Utility ........ 146
9.1.1 The Core ........ 147
9.1.2 The Shapley Value ........ 147
9.1.3 Probabilistic Values ........ 149
9.2 Coalitional Games with Fuzzy Coalitions ........ 150
9.2.1 Multivalued Solutions ........ 151
9.2.2 Single-Valued Solutions ........ 153
9.3 … ........ 155
References ........ 156


Part B Fuzzy Logic 


10 Basics of Fuzzy Sets
János C. Fodor, Imre J. Rudas ........ 159
10.1 Classical Mathematics and Logic ........ 160
10.2 Fuzzy Logic, Membership Functions, and Fuzzy Sets ........ 160
10.3 Connectives in Fuzzy Logic ........ 161
10.3.1 Negations ........ 161
10.3.2 Triangular Norms and Conorms ........ 162
10.3.3 Fuzzy Implications ........ 166
10.4 Concluding Remarks ........ 168
References ........ 168


11 Fuzzy Relations: Past, Present, and Future
Susana Montes, Ignacio Montes, Tania Iglesias ........ 171
11.1 Fuzzy Relations ........ 172
11.1.1 Operations on Fuzzy Relations ........ 172
11.1.2 Specific Operations on Fuzzy Relations ........ 173
11.2 Cut Relations ........ 174
11.3 Fuzzy Binary Relations ........ 174
11.3.1 Reflexivity ........ 174
11.3.2 Irreflexivity ........ 175
11.3.3 Symmetry ........ 175
11.3.4 Antisymmetry ........ 175
11.3.5 Asymmetry ........ 175
11.3.6 Transitivity ........ 176
11.3.7 Negative Transitivity ........ 177
11.3.8 Semitransitivity ........ 177




11.3.9 Completeness
11.3.10 Linearity
11.4 Particular Cases of Fuzzy Binary Relations
11.4.1 Similarity Relation
11.4.2 Fuzzy Order
11.5 Present and Future of Fuzzy Relations
References


12 Fuzzy Implications: Past, Present, and Future
Michał Baczyński, Balasubramaniam Jayaram, Sebastia Massanet,
Joan Torrens
12.1 Fuzzy Implications: Examples, Properties, and Classes
12.2 Current Research on Fuzzy Implications
12.2.1 Functional Equations and Properties
12.2.2 New Classes and Generalizations
12.2.3 New Construction Methods
12.2.4 Fuzzy Implications in Nonclassical Settings
12.3 Fuzzy Implications in Applications
12.3.1 FLn – Fuzzy Logic in the Narrow Sense
12.3.2 Approximate Reasoning
12.3.3 Fuzzy Subsethood Measures
12.3.4 Fuzzy Control
12.3.5 Fuzzy Mathematical Morphology
12.4 Future of Fuzzy Implications
References


13 Fuzzy Rule-Based Systems
Luis Magdalena
13.1 Components of a Fuzzy Rule-Based System
13.1.1 Knowledge Base
13.1.2 Processing Structure
13.2 Types of Fuzzy Rule-Based Systems
13.2.1 Linguistic Fuzzy Rule-Based Systems
13.2.2 Variants of Mamdani Fuzzy Rule-Based Systems
13.2.3 Takagi-Sugeno-Kang Fuzzy Rule-Based Systems
13.2.4 Singleton Fuzzy Rule-Based Systems
13.2.5 Fuzzy Rule-Based Classifiers
13.2.6 Type-2 Fuzzy Rule-Based Systems
13.2.7 Fuzzy Systems with Implicative Rules
13.3 Hierarchical Fuzzy Rule-Based Systems
13.4 Fuzzy Rule-Based Systems Design
13.4.1 FRBS Properties
13.4.2 Designing FRBSs
13.5 Conclusions
References


Detailed Contents 1575 


14 Interpretability of Fuzzy Systems: 
Current Research Trends and Prospects 


Jose M. Alonso, Ciro Castiello, Corrado Mencar .......... 219
14.1 The Quest for Interpretability .......... 220
14.1.1 Why Is Interpretability So Important? .......... 221
14.1.2 A Historical Review .......... 222
14.2 Interpretability Constraints and Criteria .......... 224
14.2.1 Constraints and Criteria for Fuzzy Sets .......... 224
14.2.2 Constraints and Criteria for Fuzzy Partitions .......... 225
14.2.3 Constraints and Criteria for Fuzzy Rules .......... 226
14.2.4 Constraints and Criteria for Fuzzy Rule Bases .......... 226
14.3 Interpretability Assessment .......... 227
14.4 Designing Interpretable Fuzzy Systems .......... 229
14.4.1 Design Strategies for the Generation of a KB Regarding the Interpretability-Accuracy Trade-Off .......... 229
14.4.2 Design Decisions at Fuzzy Processing Level .......... 232
14.5 Interpretable Fuzzy Systems in the Real World .......... 233
14.6 Future Research Trends on Interpretable Fuzzy Systems .......... 234
14.7 Conclusions .......... 234
References .......... 235


15 Fuzzy Clustering - Basic Ideas and Overview 


Sadaaki Miyamoto .......... 239
15.1 Fuzzy Clustering .......... 239
15.2 Fuzzy c-Means .......... 239
15.2.1 Notation .......... 239
15.2.2 Fuzzy c-Means Algorithm .......... 240
15.2.3 A Natural Classifier .......... 240
15.2.4 Variations of Fuzzy c-Means .......... 241
15.2.5 Possibilistic Clustering .......... 242
15.2.6 Kernel-Based Fuzzy c-Means .......... 243
15.2.7 Clustering with Semi-Supervision .......... 244
15.3 Hierarchical Fuzzy Clustering .......... 245
15.4 Conclusion .......... 246
References .......... 247


16 An Algebraic Model of Reasoning to Support Zadeh's CWW 


Enric Trillas .......... 249
16.1 A View on Reasoning .......... 249
16.2 Models .......... 250
16.3 Reasoning .......... 251
16.3.1 A Remark on the Mathematical Reasoning .......... 252
16.3.2 A Remark on Medical Reasoning .......... 254
16.4 Reasoning and Logic .......... 254
16.5 A Possible Scheme for an Algebraic Model of Commonsense Reasoning .......... 255
16.6 Weak and Strong Deduction: Refutations and Conjectures in a BFA (With a Few Restrictions) .......... 260




16.7 Toward a Classification of Conjectures
16.8 Last Remarks
16.9 Conclusions
References


17 Fuzzy Control 


Christian Moewes, Ralf Mikut, Rudolf Kruse
17.1 Knowledge-Driven Control
17.2 Classical Control Engineering
17.3 Using Fuzzy Rules for Control
17.3.1 Mamdani–Assilian Control
17.3.2 Takagi–Sugeno Control
17.3.3 Fuzzy Logic-Based Controller
17.4 A Glance at Some Industrial Applications
17.4.1 Engine Idle Speed Control
17.4.2 Flowing Shift-Point Determination
17.5 Automatic Learning of Fuzzy Controllers
17.5.1 Transfer Passenger Analysis Based on FCM
17.6 Conclusion
References


18 Interval Type-2 Fuzzy PID Controllers 


Tufan Kumbasar, Hani Hagras
18.1 Fuzzy Control Background
18.2 The General Fuzzy PID Controller Structure
18.2.1 Type-1 Fuzzy PID Controllers
18.2.2 Interval Type-2 Fuzzy PID Controllers
18.3 Simulation Studies
18.4 Conclusion
References


19 Soft Computing in Database and Information Management 
Guy De Tré, Sławomir Zadrożny
19.1 Challenges for Modern Information Systems
19.2 Some Preliminaries
19.2.1 Relational Databases
19.2.2 Fuzzy Set Theory and Possibility Theory
19.3 Soft Computing in Information Modeling
19.3.1 Modeling of Imperfect Information – Basic Approaches
19.3.2 Modeling of Imperfect Information – Selected Advanced Approaches
19.4 Soft Computing in Querying
19.4.1 Flexible Querying of Regular Databases
19.4.2 Flexible Querying of Fuzzy Databases
19.5 Conclusions
References




20 Application of Fuzzy Techniques to Autonomous Robots 


Ismael Rodriguez Fdez, Manuel Mucientes, Alberto Bugarín Diz .......... 313
20.1 Robotics and Fuzzy Logic .......... 313
20.2 Wall-Following .......... 314
20.3 Navigation .......... 315
20.4 Trajectory Tracking .......... 317
20.5 Moving Target Tracking .......... 318
20.6 Perception .......... 319
20.7 Planning .......... 319
20.8 SLAM .......... 320
20.9 Cooperation .......... 320
20.10 Legged Robots .......... 321
20.11 Exoskeletons and Rehabilitation Robots .......... 322
20.12 Emotional Robots .......... 323
20.13 Fuzzy Modeling .......... 323
20.14 Comments and Conclusions .......... 324
References .......... 325


Part C Rough Sets 


21 Foundations of Rough Sets 




Andrzej Skowron, Andrzej Jankowski, Roman W. Swiniarski .......... 331
21.1 Rough Sets: Comments on Development .......... 331
21.2 Vague Concepts .......... 332
21.3 Rough Set Philosophy .......... 333
21.4 Indiscernibility and Approximation .......... 333
21.5 Decision Systems and Decision Rules .......... 336
21.6 Dependencies .......... 337
21.7 Reduction of Attributes .......... 337
21.8 Rough Membership .......... 338
21.9 Discernibility and Boolean Reasoning .......... 339
21.10 Rough Sets and Induction .......... 340
21.11 Rough Set-Based Generalizations .......... 340
21.12 Rough Sets and Logic .......... 343
21.13 Conclusions .......... 347
References .......... 347


22 Rough Set Methodology for Decision Aiding


Roman Słowiński, Salvatore Greco, Benedetto Matarazzo .......... 349
22.1 Data Inconsistency as a Reason for Using Rough Sets .......... 350
22.1.1 From Indiscernibility-Based Rough Sets to Dominance-Based Rough Sets .......... 351
22.2 The Need for Replacing the Indiscernibility Relation by the Dominance Relation when Reasoning About Ordinal Data .......... 351
22.3 The Dominance-based Rough Set Approach to Multi-Criteria Classification .......... 353


22.3.2 Variable Consistency Dominance-Based Rough Set Approach (VC-DRSA)
22.3.3 Stochastic Dominance-based Rough Set Approach
22.3.4 Induction of Decision Rules
22.3.5 Rule-based Classification Algorithms
22.4 The Dominance-based Rough Set Approach to Multi-Criteria Choice and Ranking
22.4.1 Differences with Respect to Multi-Criteria Classification
22.4.2 The Pairwise Comparison Table as Input Preference Information
22.4.3 Rough Approximation of Preference Relations
22.4.4 Induction of Decision Rules from Rough Approximations of Preference Relations
22.4.5 Application of Decision Rules to Multi-Criteria Choice and Ranking
22.5 Important Extensions of DRSA
22.6 DRSA to Operational Research Problems
22.7 Concluding Remarks on DRSA Applied to Multi-Criteria Decision Problems
References


23 Rule Induction from Rough Approximations 


Jerzy W. Grzymala-Busse
23.1 Complete and Consistent Data
23.1.1 Global Coverings
23.1.2 Local Coverings
23.1.3 Classification
23.2 Inconsistent Data
23.3 Decision Table with Numerical Attributes
23.4 Incomplete Data
23.4.1 Singleton, Subset, and Concept Approximations
23.4.2 Modified LEM2 Algorithm
23.4.3 Probabilistic Approximations
23.5 Conclusions
References


24 Probabilistic Rough Sets 


Yiyu Yao, Salvatore Greco, Roman Słowiński
24.1 Motivation for Studying Probabilistic Rough Sets
24.2 Pawlak Rough Sets
24.2.1 Rough Set Approximations
24.2.2 Construction of Rough Set Approximations
24.3 A Basic Model of Probabilistic Rough Sets
24.4 Variants of Probabilistic Rough Sets
24.4.1 Variable Precision Rough Sets
24.4.2 Parameterized Rough Sets
24.4.3 Confirmation-Theoretic Rough Sets
24.4.4 Bayesian Rough Sets




24.5 Three Fundamental Issues of Probabilistic Rough Sets
24.5.1 Decision-Theoretic Rough Set Model: Determining the Thresholds
24.5.2 Naive Bayesian Rough Set Model: Estimating the Conditional Probability
24.5.3 Three-Way Decisions: Interpreting the Three Regions
24.6 Dominance-Based Rough Set Approaches
24.7 A Basic Model of Dominance-Based Probabilistic Rough Sets
24.8 Variants of Probabilistic Dominance-Based Rough Set Approach
24.8.1 Variable Consistency Dominance-Based Rough Sets
24.8.2 Parameterized Dominance-Based Rough Sets
24.8.3 Confirmation-Theoretic Dominance-Based Rough Sets
24.8.4 Bayesian Dominance-Based Rough Sets
24.9 Three Fundamental Issues of Probabilistic Dominance-Based Rough Sets
24.9.1 Decision-Theoretic Dominance-Based Rough Set Model: Determining the Thresholds
24.9.2 Stochastic Dominance-Based Rough Set Approach: Estimating the Conditional Probability
24.9.3 Three-Way Decisions: Interpreting the Three Regions in the Case of Dominance-Based Rough Sets
24.10 Conclusions
References


25 Generalized Rough Sets 


JingTao Yao, Davide Ciucci, Yan Zhang
25.1 Definition and Approximations of the Models
25.1.1 A Framework for Generalizing Rough Sets
25.1.2 Binary Relation-Based Rough Sets
25.1.3 Covering-Based Rough Sets
25.1.4 Subsystem-Based Rough Sets
25.2 Theoretical Approaches
25.2.1 Logical Setting
25.2.2 Topology
25.3 Conclusion
References


26 Fuzzy-Rough Hybridization 


Masahiro Inuiguchi, Wei-Zhi Wu, Chris Cornelis, Nele Verbiest
26.1 Introduction to Fuzzy-Rough Hybridization
26.2 Classification- Versus Approximation-Oriented Fuzzy Rough Set Models
26.2.1 Classification-Oriented Fuzzy Rough Sets
26.2.2 Approximation-Oriented Fuzzy Rough Sets
26.2.3 Relations Between Two Kinds of Fuzzy Rough Sets
26.2.4 The Other Approximation-Oriented Fuzzy Rough Sets
26.2.5 Remarks




26.3 Generalized Fuzzy Belief Structures with Application in Fuzzy Information Systems
26.3.1 Belief Structures and Belief Functions
26.3.2 Belief Structures of Rough Approximations
26.3.3 Conclusion of This Section
26.4 Applications of Fuzzy Rough Sets
26.4.1 Applications in Machine Learning
26.4.2 Other Applications
References


Part D Neural Networks 


27 Artificial Neural Network Models 


Peter Tino, Lubica Benuskova, Alessandro Sperduti
27.1 Biological Neurons
27.2 Perceptron
27.3 Multilayered Feed-Forward ANN Models
27.4 Recurrent ANN Models
27.5 Radial Basis Function ANN Models
27.6 Self-Organizing Map
27.7 Recursive Neural Networks
27.8 Conclusion
References


28 Deep and Modular Neural Networks 


Ke Chen
28.1 Overview
28.2 Deep Neural Networks
28.2.1 Background and Motivation
28.2.2 Building Blocks and Learning Algorithms
28.2.3 Hybrid Learning Strategy
28.2.4 Relevant Issues
28.3 Modular Neural Networks
28.3.1 Background and Motivation
28.3.2 Tightly Coupled Models
28.3.3 Loosely Coupled Models
28.3.4 Relevant Issues
28.4 Concluding Remarks
References


29 Machine Learning 


James T. Kwok, Zhi-Hua Zhou, Lei Xu
29.1 Overview
29.2 Supervised Learning
29.2.1 Classification and Regression
29.2.2 Other Variants of Supervised Learning




29.3 Unsupervised Learning
29.3.1 Principal Subspaces and Independent Factor Analysis
29.3.2 Multi-Model-Based Learning: Data Clustering, Object Detection, and Local Regression
29.3.3 Matrix-Based Learning: Similarity Graph, Nonnegative Matrix Factorization, and Manifold Learning
29.3.4 Tree-Based Learning: Temporal Ladder, Hierarchical Mixture, and Causal Tree
29.4 Reinforcement Learning
29.4.1 Markov Decision Processes
29.4.2 TD Learning and Q-Learning
29.4.3 Improving Q-Learning by Unsupervised Methods
29.5 Semi-Supervised Learning
29.6 Ensemble Methods
29.6.1 Basic Concepts
29.6.2 Boosting
29.6.3 Bagging
29.6.4 Stacking
29.6.5 Diversity

29.7 Feature Selection and Extraction
References


30 Theoretical Methods in Machine Learning 


Badong Chen, Weifeng Liu, José C. Príncipe
30.1 Background Overview
30.2 Reproducing Kernel Hilbert Spaces
30.3 Online Learning with Kernel Adaptive Filters
30.3.1 Kernel Least Mean Square (KLMS) Algorithm
30.3.2 Kernel Recursive Least Squares (KRLS) Algorithm
30.3.3 Kernel Affine Projection Algorithms (KAPA)
30.4 Illustration Examples
30.4.1 Chaotic Time Series Prediction
30.4.2 Frequency Doubling
30.4.3 Noise Cancelation
30.4.4 Nonlinear Channel Equalization
30.5 Conclusion
References


31 Probabilistic Modeling in Machine Learning 


Davide Bacciu, Paulo J.G. Lisboa, Alessandro Sperduti, Thomas Villmann
31.1 Probabilistic and Information-Theoretic Methods
31.1.1 Information-Theoretic Methods
31.1.2 Probabilistic Models
31.2 Graphical Models
31.2.1 Bayesian Networks
31.2.2 Markov Networks


31.2.3 Inference




31.3 Latent Variable Models .......... 560
31.3.1 Latent Space Representation .......... 561
31.3.2 Learning with Latent Variables: The Expectation–Maximization Algorithm .......... 561
31.3.3 Linear Factor Analysis .......... 562
31.3.4 Mixture Models .......... 563

31.4 Markov Models .......... 565
31.4.1 Markov Chains .......... 566
31.4.2 Hidden Markov Models .......... 567
31.4.3 Related Models .......... 571
31.5 Conclusion and Further Reading .......... 572
References .......... 573

32 Kernel Methods

Marco Signoretto, Johan A. K. Suykens .......... 577
32.1 Background .......... 578
32.1.1 Summary of the Chapter .......... 578
32.1.2 Historical Background .......... 578
32.1.3 The Main Ingredients .......... 578
32.2 Foundations of Statistical Learning .......... 580
32.2.1 Supervised and Unsupervised Inductive Learning .......... 580
32.2.2 Semi-Supervised and Transductive Learning .......... 580
32.2.3 Bounds on Generalization Error .......... 580
32.2.4 Structural Risk Minimization and Regularization .......... 583
32.2.5 Types of Regularization .......... 584
32.3 Primal-Dual Methods .......... 586
32.3.1 SVMs for Classification .......... 586
32.3.2 SVMs for Function Estimation .......... 589
32.3.3 Main Features of SVMs .......... 590
32.3.4 The Class of Least-Squares SVMs .......... 590
32.3.5 Kernel Principal Component Analysis .......... 592
32.4 Gaussian Processes .......... 593
32.4.1 Definition .......... 594
32.4.2 GPs for Regression .......... 594
32.4.3 Bayesian Decision Theory .......... 596
32.5 Model Selection .......... 596
32.5.1 Cross-Validation .......... 596
32.5.2 Bayesian Inference of Hyperparameters .......... 596
32.6 More on Kernels .......... 597
32.6.1 Positive Definite Kernels .......... 597
32.6.2 Reproducing Kernels .......... 597
32.6.3 Equivalence Between the Two Notions .......... 598
32.6.4 Kernels for Structured Data .......... 599
32.7 Applications .......... 600
32.7.1 Text Categorization .......... 600
32.7.2 Time-Series Analysis .......... 600
32.7.3 Bioinformatics and Biomedical Applications .......... 600
References .......... 601




33 Neurodynamics 


Robert Kozma, Jun Wang, Zhigang Zeng .......... 607
33.1 Dynamics of Attractor and Analog Networks .......... 607
33.1.1 Phase Space and Attractors .......... 608
33.1.2 Single Attractors of Dynamical Systems .......... 609
33.1.3 Multiple Attractors of Dynamical Systems .......... 610
33.1.4 Conclusion .......... 611
33.2 Synchrony, Oscillations, and Chaos in Neural Networks .......... 611
33.2.1 Synchronization .......... 611
33.2.2 Oscillations in Neural Networks .......... 616
33.2.3 Chaotic Neural Networks .......... 623
33.3 Memristive Neurodynamics .......... 629
33.3.1 Memristor-Based Synapses .......... 630
33.3.2 Memristor-Based Neural Networks .......... 632
33.3.3 Conclusion .......... 634
33.4 Neurodynamic Optimization .......... 634
33.4.1 Neurodynamic Models .......... 635
33.4.2 Design Methods .......... 636
33.4.3 Selected Applications .......... 638
33.4.4 Concluding Remarks .......... 638
References .......... 639


34 Computational Neuroscience — Biophysical Modeling 
of Neural Systems 


Harrison Stratton, Jennie Si .......... 649
34.1 Anatomy and Physiology of the Nervous System .......... 649
34.1.1 Introduction to the Anatomy of the Nervous System .......... 650
34.1.2 Sensation – Environmental Signal Detection .......... 651
34.1.3 Associations – The Foundation of Cognition .......... 652
34.2 Cells and Signaling Among Cells .......... 652
34.2.1 Neurons – Electrically Excitable Cells .......... 652
34.2.2 Glial Cells – Supporting Neural Networks .......... 654
34.2.3 Transduction Proteins – Cellular Signaling Mechanisms .......... 654
34.2.4 Electrochemical Potential Difference – Signaling Medium .......... 654
34.3 Modeling Biophysically Realistic Neurons .......... 656
34.3.1 Electrical Properties of Neurons .......... 656
34.3.2 Early Empirical Models – Hodgkin and Huxley .......... 657
34.3.3 Compartmental Modeling – Anatomical Reduction .......... 659
34.3.4 Cable Equations – Connecting Compartments .......... 659
34.4 Reducing Computational Complexity for Large Network Simulations .......... 660
34.4.1 Reducing Computational Complexity – Large Scale Models .......... 660
34.4.2 Firing Rate-Based Dynamic Models .......... 661
34.4.3 Spike Response Model .......... 662
34.5 Conclusions .......... 662


References .......... 662




35 Computational Models of Cognitive and Motor Control 


Ali A. Minai .......... 665
35.1 Overview .......... 665
35.2 Motor Control .......... 667
35.2.1 Cortical Representation of Movement .......... 667
35.2.2 Synergy-based Representations .......... 668
35.2.3 Computational Models of Motor Control .......... 669
35.3 Cognitive Control and Working Memory .......... 670
35.3.1 Action Selection and Reinforcement Learning .......... 670
35.3.2 Working Memory .......... 671
35.3.3 Computational Models of Cognitive Control and Working Memory .......... 671
35.4 Conclusion .......... 674
References .......... 674


36 Cognitive Architectures and Agents


Sebastien Hélie, Ron Sun .......... 683
36.1 Background .......... 683
36.1.1 Outline .......... 684
36.2 Adaptive Control of Thought-Rational (ACT-R) .......... 685
36.2.1 The Perceptual-Motor Modules .......... 686
36.2.2 The Goal Module .......... 686
36.2.3 The Declarative Module .......... 686
36.2.4 The Procedural Memory .......... 686
36.2.5 Simulation Example: Learning Past Tenses of Verbs .......... 687
36.3 Soar .......... 688
36.3.1 Architectural Representation .......... 688
36.3.2 The Soar Decision Cycle .......... 688
36.3.3 Impasses .......... 688
36.3.4 Extensions .......... 689
36.3.5 Simulation Example: Learning in Problem Solving .......... 689
36.4 CLARION .......... 690
36.4.1 The Action-Centered Subsystem .......... 691
36.4.2 The Non-Action-Centered Subsystem .......... 691
36.4.3 The Motivational and Meta-Cognitive Subsystems .......... 692
36.4.4 Simulation Example: Minefield Navigation .......... 692
36.5 Cognitive Architectures as Models of Multi-Agent Interaction .......... 693
36.5.1 Extension .......... 694
36.6 General Discussion .......... 694
References .......... 695


37 Embodied Intelligence 


Angelo Cangelosi, Josh Bongard, Martin H. Fischer, Stefano Nolfi .......... 697
37.1 Introduction to Embodied Intelligence .......... 697
37.2 Morphological Computation for Body-Behavior Coadaptation .......... 698
37.2.1 The Counterintuitive Nature of Morphological Computation .......... 699
37.2.2 Evolution and Morphological Computation .......... 700




37.3 Sensory–Motor Coordination in Evolving Robots
37.3.1 Enabling the Discovery of Simple Solutions
37.3.2 Accessing and Generating Information Through Action
37.3.3 Channeling the Course of the Learning Process
37.4 Developmental Robotics for Higher Order Embodied Cognitive Capabilities
37.4.1 Embodied Cognition and Developmental Robots
37.4.2 Embodied Language Learning
37.4.3 Number and Space
37.5 Conclusion
References


Neuromorphic Engineering 
Giaconia MAEN «cass cock scindsces aie sedi died EEEE saleauseas 
Se. TM RING esnea entrmsice nea d namwarded «med voles Nadhntacmeacewed nea 
38.2 Neural and Neuromorphic COMPUTING............. cece eeee eee e ence ees 
38.3 The Importance of Fundamental Neuroscience ...............eeeeeeee 
38.4 Temporal Dynamics in Neuromorphic Architectures .................- 
38.5 Synapse and Neuron Circuits 22.00.00... ccc ene ee mee snetsteeeseeteiens 
38.5.1 Spike-Based Learning Circuits .............. cece ceee seen eee 
38.6 Spike-Based Multichip Neuromorphic Systems ...............eeeeeeee 
38.7 State-Dependent Computation in Neuromorphic Systems........... 
38:8 CONTRO asinine a ae 
Rere rnan a eE E E T EE EE 


Neuroengineering 
Damien Coyle, Ronen SASH secicsscsscrscriciniiskenrsesicerikcererinsiniess 
39.1 Overview – Neuroengineering in General
39.1.1 The Human Motor System
39.2 Human Motor Control
39.2.1 Motion Planning and Execution in Humans
39.2.2 Coordinate Systems Used to Acquire a New Internal Model
39.2.3 Spatial Accuracy and Reproducibility
39.3 Modeling the Motor System – Internal Motor Models
39.3.1 Forward Models, Inverse Models, and Combined Models
39.3.2 Adaptive Control Theory
39.3.3 Optimization Principles
39.3.4 Kinematic Features of Human Hand Movements and the Minimum Jerk Hypothesis
39.3.5 The Minimum Jerk Model, The Target Switching Paradigm, and Writing-like Sequence Movements
39.4 Sensorimotor Learning
39.4.1 Explicit Versus Implicit Motor Learning
39.4.2 Time Phases in Motor Learning




39.4.3 Effector Dependency
39.4.4 Coarticulation
39.4.5 Movement Cuing
39.5 MRI and the Motor System – Structure and Function
39.6 Electrocorticographic Motor Cortical Surface Potentials
39.7 MEG and EEG – Extra Cerebral Magnetic and Electric Fields of the Motor System
39.7.1 Sensorimotor Rhythms and Other Surrounding Oscillations
39.7.2 Movement-Related Potentials
39.7.3 Decoding Hand Movements from EEG
39.8 Extracellular Recording – Decoding Hand Movements from Spikes and Local Field Potential
39.8.1 Neural Coding Schemes
39.8.2 Single Unit Activity Correlates of Hand Motion Attributes
39.8.3 Local Field Potential Correlates of Hand Motion Attributes
39.9 Translating Brainwaves into Control Signals – BCIs
39.9.1 Pre-Processing and Feature Extraction/Selection
39.9.2 Classification
39.9.3 Unsupervised Adaptation in Sensorimotor Rhythms BCIs
39.9.4 BCI Outlook
39.10 Conclusion
References


40 Evolving Connectionist Systems: From Neuro-Fuzzy-, to Spiking- and Neuro-Genetic
Nikola Kasabov
40.1 Principles of Evolving Connectionist Systems (ECOS)
40.2 Hybrid Systems and Evolving Neuro-Fuzzy Systems
40.2.1 Hybrid Systems
40.2.2 Evolving Neuro-Fuzzy Systems
40.2.3 From Local to Transductive (Individualized) Learning and Modeling
40.2.4 Applications
40.3 Evolving Spiking Neural Networks (eSNN)
40.3.1 Spiking Neuron Models
40.3.2 Evolving Spiking Neural Networks (eSNN)
40.3.3 Extracting Fuzzy Rules from eSNN
40.3.4 eSNN Applications
40.4 Computational Neuro-Genetic Modeling (CNGM)
40.4.1 Principles
40.4.2 The NeuroCube Framework
40.4.3 Quantum-Inspired Optimization of eSNN and CNGM
40.4.4 Applications of CNGM
40.5 Conclusions and Further Directions
References




41 Machine Learning Applications
Piero P. Bonissone
41.1 Motivation
41.1.1 Building Computational Intelligence Object- and Meta-Models
41.1.2 Model Lifecycle
41.2 Machine Learning (ML) Functions
41.3 CI/ML Applications in Industrial Domains: Prognostics and Health Management (PHM)
41.3.1 Health Assessment and Anomaly Detection: An Unsupervised Learning Problem
41.3.2 Health Assessment – Diagnostics: A Semisupervised and Supervised Learning Problem
41.3.3 Health Assessment – Prognostics: A Regression Problem
41.3.4 Health Management – Fault Accommodation and Optimization
41.4 CI/ML Applications in Financial Domains: Risk Management
41.4.1 Automation of Insurance Underwriting: A Classification Problem
41.4.2 Mortgage Collateral Valuation: A Regression Problem
41.4.3 Portfolio Rebalancing: An Optimization Problem
41.5 Model Ensembles and Fusion
41.5.1 Motivations for Model Ensembles
41.5.2 Construction of Model Ensembles
41.5.3 Creating Diversity in the Model Ensembles
41.5.4 Lazy Meta-Learning: A Model-Agnostic Fusion Mechanism
41.6 Summary and Future Research Challenges
41.6.1 Future Research Challenges
References
Part E Evolutionary Computation

42 Genetic Algorithms
Jonathan E. Rowe
42.1 Algorithmic Framework
42.2 Selection Methods
42.2.1 Random Selection
42.2.2 Proportional Selection
42.2.3 Stochastic Universal Sampling
42.2.4 Scaling Methods
42.2.5 Rank Selection
42.2.6 Tournament Selection
42.2.7 Truncation Selection
42.3 Replacement Methods
42.3.1 Random Replacement




42.3.2 Inverse Selection
42.3.3 Replace Worst
42.3.4 Replace Parents
42.4 Mutation Methods
42.5 Selection-Mutation Balance
42.6 Crossover Methods
42.7 Population Diversity
42.8 Parallel Genetic Algorithms
42.9 Populations as Solutions
42.10 Conclusions
References


43 Genetic Programming
James McDermott, Una-May O'Reilly
43.1 Evolutionary Search for Executable Programs
43.2 History
43.3 Taxonomy of AI and GP
43.3.1 Placing GP in an AI Context
43.3.2 Taxonomy of GP
43.3.3 Representations
43.3.4 Population Models
43.4 Uses of GP
43.4.1 Symbolic Regression
43.4.2 Machine Learning
43.4.3 Software Engineering
43.4.4 Design
43.5 Research Topics
43.5.1 Bloat
43.5.2 GP Theory
43.5.3 Modularity
43.5.4 Open-Ended Evolution and GP
43.6 Practicalities
43.6.1 Conferences and Journals
43.6.2 Software
43.6.3 Resources and Further Reading
References


44 Evolution Strategies
Nikolaus Hansen, Dirk V. Arnold, Anne Auger
44.1 Overview
44.1.1 Symbols and Abbreviations
44.2 Main Principles
44.2.1 Environmental Selection
44.2.2 Mating Selection and Recombination
44.2.3 Mutation and Parameter Control
44.2.4 Unbiasedness
44.2.5 (μ/ρ +, λ) Notation for Selection and Recombination
44.2.6 Two Algorithm Templates




44.2.7 Recombination Operators
44.2.8 Mutation Operators
44.3 Parameter Control
44.3.1 The 1/5th Success Rule
44.3.2 Self-Adaptation
44.3.3 Derandomized Self-Adaptation
44.3.4 Nonlocal Derandomized Step-Size Control (CSA)
44.3.5 Addressing Dependences Between Variables
44.3.6 Covariance Matrix Adaptation (CMA)
44.3.7 Natural Evolution Strategies
44.3.8 Further Aspects
44.4 Theory
44.4.1 Lower Runtime Bounds
44.4.2 Progress Rates
44.4.3 Convergence Proofs
References


45 Estimation of Distribution Algorithms
Martin Pelikan, Mark W. Hauschild, Fernando G. Lobo
45.1 Basic EDA Procedure
45.1.1 Problem Definition
45.1.2 EDA Procedure
45.1.3 Simulation of an EDA by Hand
45.2 Taxonomy of EDA Models
45.2.1 Classification Based on Problem Decomposition
45.2.2 Classification Based on Local Distributions in Graphical Models
45.3 Overview of EDAs
45.3.1 EDAs for Fixed-Length Strings over Finite Alphabets
45.3.2 EDAs for Real-Valued Vectors
45.3.3 EDAs for Genetic Programming
45.3.4 EDAs for Permutation Problems
45.4 EDA Theory
45.5 Efficiency Enhancement Techniques for EDAs
45.5.1 Parallelization
45.5.2 Hybridization
45.5.3 Time Continuation
45.5.4 Using Prior Knowledge and Learning from Experience
45.5.5 Fitness Evaluation Relaxation
45.5.6 Incremental and Sporadic Model Building
45.6 Starting Points for Obtaining Additional Information
45.6.1 Introductory Books and Tutorials
45.6.2 Software
45.6.3 Journals
45.6.4 Conferences
45.7 Summary and Conclusions
References




46 Parallel Evolutionary Algorithms
Dirk Sudholt
46.1 Parallel Models
46.1.1 Master-Slave Models
46.1.2 Independent Runs
46.1.3 Island Models
46.1.4 Cellular EAs
46.1.5 A Unified Hypergraph Model for Population Structures
46.1.6 Hybrid Models
46.2 Effects of Parallelization
46.2.1 Performance Measures for Parallel EAs
46.2.2 Superlinear Speedups
46.3 On the Spread of Information in Parallel EAs
46.3.1 Logistic Models for Growth Curves
46.3.2 Rigorous Takeover Times
46.3.3 Maximum Growth Curves
46.3.4 Propagation
46.4 Examples Where Parallel EAs Excel
46.4.1 Independent Runs
46.4.2 Offspring Populations
46.4.3 Island Models
46.4.4 Crossover Between Islands
46.5 Speedups by Parallelization
46.5.1 A General Method for Analyzing Parallel EAs
46.5.2 Speedups in Combinatorial Optimization
46.5.3 Adaptive Numbers of Islands
46.6 Conclusions
46.6.1 Further Reading
References


47 Learning Classifier Systems
Martin V. Butz
47.1 Background
47.1.1 Early Applications
47.1.2 The Pitt and Michigan Approach
47.1.3 Basic Knowledge Representation
47.2 XCS
47.2.1 System Overview
47.2.2 When and How XCS Works
47.2.3 When and How to Apply XCS
47.2.4 Parameter Tuning in XCS
47.3 XCSF
47.4 Data Mining
47.5 Behavioral Learning
47.5.1 Reward-Based Learning with LCSs
47.5.2 Anticipatory Learning Classifier Systems
47.5.3 Controlling a Robot Arm with an LCS




47.6 Conclusions
47.7 Books and Source Code
References


48 Indicator-Based Selection
Lothar Thiele
48.1 Motivation
48.2 Basic Concepts
oF ae a Ma) C00 | | er ES 984 
B22 SOE WAICHTONS a ivisccscaiese vs cas ddwtaasaawe estas ss vas dias sacs 985 
48.3 Selection Schemes
48.3.1 Basic Search Algorithm
48.3.2 Exhaustive Selection
48.3.3 Steady State and Greedy Selection
48.3.4 Hierarchical Set Preferences
48.3.5 Using Binary Indicators
48.4 Preference-Based Selection
48.5 Concluding Remarks
References


49 Multi-Objective Evolutionary Algorithms
Kalyanmoy Deb
49.1 Preamble
49.2 Evolutionary Multi-Objective Optimization (EMO)
49.2.1 EMO Principles
49.2.2 A Posteriori MCDM Methods and EMO
49.3 A Brief Timeline for the Development of EMO Methodologies
49.4 Elitist EMO: NSGA-II
49.4.1 Sample Results
49.4.2 Constraint Handling in EMO
49.5 Applications of EMO
49.5.1 Spacecraft Trajectory Design
49.6 Recent Developments in EMO
49.6.1 Hybrid EMO Algorithms
49.6.2 Multi-Objectivization
49.6.3 Uncertainty-Based EMO
49.6.4 EMO and Decision-Making
49.6.5 EMO for Handling a Large Number of Objectives: Many-Objective EMO
49.6.6 Knowledge Extraction Through EMO
49.6.7 Dynamic EMO
49.6.8 Quality Estimates for EMO
49.6.9 Exact EMO with Run-Time Analysis
49.6.10 EMO with Meta-Models
49.7 Conclusions
References




50 Parallel Multiobjective Evolutionary Algorithms
Francisco Luna, Enrique Alba
50.1 Multiobjective Optimization and Parallelism
50.2 Parallel Models for Evolutionary Multi-Objective Algorithms
50.2.1 Specialized Models for Parallel EAs
50.2.2 General Models for Parallel Metaheuristics
50.3 An Updated Review of the Literature
50.3.1 Analysis by Year
50.3.2 Analysis of the Parallel Models
50.3.3 Review of the Software Implementations
50.3.4 Main Application Domains
50.4 Conclusions and Future Works
= R E E E E E E EE 1026 
50.4.2 Future Trends
References

51 Many-Objective Problems: Challenges and Methods
Antonio López Jaimes, Carlos A. Coello Coello
51.1 Background
51.2 Basic Concepts and Notation
51.2.1 Multiobjective Optimization Problems
51.2.2 Notions of Conflict Among Objectives
51.3 Sources of Difficulty to Solve Many-Objective Optimization Problems
51.3.1 Deterioration of the Search Ability
51.3.2 Effectiveness of Crossover Operators
51.3.3 Dimensionality of the Pareto Front
51.3.4 Visualization of the Pareto Front
51.4 Current Approaches to Deal with Many-Objective Problems
51.4.1 Preference Relations to Deal with Many-Objective Problems
51.4.2 Objective Reduction Approaches
51.4.3 Preference Incorporation Approaches
51.5 Recombination Operators and Mating Restrictions
51.6 Scalarization Methods
51.7 Conclusions and Research Paths
References


52 Memetic and Hybrid Evolutionary Algorithms
Jhon Edgar Amaya, Carlos Cotta Porras, Antonio J. Fernández Leiva
52.1 Overview
52.2 A Bird's View of Evolutionary Algorithms
52.3 From Hybrid Metaheuristics to Hybrid EAs
52.3.1 Hybridization Mechanisms
52.3.2 Hybrid EAs
52.4 Memetic Algorithms
52.5 Cooperative Optimization Models
52.6 Conclusions
References


53 Design of Representations and Search Operators
Franz Rothlauf
53.1 Representations
53.1.1 Genotypes and Phenotypes
53.1.2 Genotype and Phenotype Search Spaces
53.1.3 Benefits of Representations
53.1.4 Standard Genotypes
53.2 Search Operators
53.2.1 General Design Guidelines
53.2.2 Local Search Operators
53.2.3 Recombination Operators
53.2.4 Direct Representations
53.2.5 Standard Search Operators
53.3 Problem-Specific Design of Representations and Search Operators
53.3.4. ogc || |) ae eee eer 1072 
53.3.2 Biasing Search
53.4 Summary and Conclusions
References


54 Stochastic Local Search Algorithms: An Overview
Holger H. Hoos, Thomas Stützle
54.1 The Nature and Concept of SLS
54.2 Greedy Construction Heuristics and Iterative Improvement
54.3 Simple SLS Methods
54.3.1 Randomized Iterative Improvement
54.3.2 Probabilistic Iterative Improvement
54.3.3 Simulated Annealing
54.3.4 Tabu Search
54.3.5 Dynamic Local Search
54.4 Hybrid SLS Methods
54.4.1 Greedy Randomized Adaptive Search Procedures
54.4.2 Iterated Greedy Algorithms
54.4.3 Iterated Local Search
54.5 Population-Based SLS Methods
54.5.1 Ant Colony Optimization
54.5.2 Evolutionary Algorithms
54.6 Recent Research Directions
54.6.1 Combination of SLS Algorithms with Systematic Search Techniques
54.6.2 SLS Algorithm Engineering
54.6.3 Automatic Configuration of SLS Algorithms
References




55 Parallel Evolutionary Combinatorial Optimization
El-Ghazali Talbi
55.1 Motivation
55.2 Parallel Design of EAs
55.2.1 Algorithmic-Level Parallel Model
55.2.2 Iteration-Level Parallel Model
55.2.3 Solution-Level Parallel Model
55.2.4 Hierarchical Combination of the Parallel Models
55.3 Parallel Implementation of EAs
55.3.1 Parallel and Distributed Architectures
55.3.2 Dedicated Architectures
55.3.3 Parallel Programming Environments and Middlewares
55.3.4 Performance Evaluation
55.3.5 Main Properties of Parallel EAs
55.3.6 Algorithmic-Level Parallel Model
55.3.7 Iteration-Level Parallel Model
55.3.8 Solution-Level Parallel Model
55.4 Parallel EAs Under ParadisEO
55.5 Conclusions and Perspectives
References

56 How to Create Generalizable Results
Thomas Bartz-Beielstein
56.1 Test Problems in Computational Intelligence
56.2 Features of Optimization Problems
56.2.1 Problem Classes and Instances
56.2.2 Feature Extraction and Instance Generation
56.3 Algorithm Features
56.3.1 Factors and Levels
56.3.2 Example: Evolution Strategy
56.4 Objective Functions
56.5 Case Studies
56.5.1 Single Problem Designs: SASP and MASP
56.5.2 SAMP: Single Algorithm, Multiple Problems
56.5.3 MAMP: Multiple Algorithms, Multiple Problems
56.6 Summary and Outlook
References

57 Computational Intelligence in Industrial Applications
Ekaterina Vladislavleva, Guido Smits, Mark Kotanchek
57.1 Intelligence and Computation
57.2 Computational Modeling for Predictive Analytics
57.2.1 Business Analytics
57.2.2 Process Analytics
57.2.3 Research Analytics
57.3 Methods
57.4 Workflows




57.4.2 Model Development
57.4.3 Problem Analysis and Reduction
57.5 Examples
57.5.1 Hybrid Intelligent Systems for Process Analytics
57.5.2 Symbolic-Regression Workflow for Process Analytics
57.5.3 Sensory Evaluation Workflow for Research Analytics
57.6 Conclusions
References


58 Solving Phase Equilibrium Problems by Means of Avoidance-Based Multiobjectivization
Mike Preuss, Simon Wessing, Günter Rudolph, Gabriele Sadowski
58.1 Coping with Real-World Optimization Problems
58.2 The Phase-Equilibrium Calculation Problem
58.3 Multiobjectivization-Assisted Multimodal Optimization: MOAMO
58.3.1 Basics of Multiobjective Optimization
58.4 Solving General Phase-Equilibrium Problems
58.4.1 Ternary Liquid-Liquid Equilibrium: Water/Methanol/MMA
58.4.2 Three Phase Equilibria: Water/MMA and Water/Furfural
58.4.3 Obtaining the Phase Diagrams
58.5 Conclusions and Outlook
References


59 Modeling and Optimization of Machining Problems
Dirk Biermann, Petra Kersting, Tobias Wagner, Andreas Zabel
59.1 Elements of a Machining Process
59.2 Design Optimization
59.2.1 Optimal Design of Machines
59.2.2 Tool Optimization
59.2.3 Workpiece Layout Optimization
59.3 Computer-Aided Design and Manufacturing
59.3.1 Surface Reconstruction
59.3.2 Optimization of NC Paths
59.4 Modeling and Simulation of the Machining Process
59.4.1 Empirical Modeling
59.4.2 Physical Modeling for Simulation
59.5 Optimization of the Process Parameters
59.6 Process Monitoring
59.7 Visualization
59.8 Summary and Outlook
References


60 Aerodynamic Design with Physics-Based Surrogates 


Emiliano Iuliano, Domenico Quagliarella
60.1 The Aerodynamic Design Problem
60.1.1 Problem Approximation
60.2 Literature Review of Surrogate-Based Optimization




60.3 POD-Based Surrogates
60.3.1 Model Order Reduction
60.3.2 POD Theory and Solution
60.4 Application Example of POD-Based Surrogates
60.4.1 Parameterization and Design Space Definition
60.4.2 Design of Experiments
GUia Tow POD sc ca siee an sitedumsace E E EEEE 
60.4.4 Model Training, Validation, and Error Analysis
60.5 Strategies for Improving POD Model Quality: Adaptive Sampling
60.5.1 Rationale
60.5.2 Improvement of the Modal Basis
60.5.3 Improvement of the Modal Coefficients
60.6 Aerodynamic Shape Optimization by Surrogate Modeling
and Evolutionary Computing
60.6.1 Problem Definition
60.6.2 Optimization Strategies and Setup
60.6.3 Non-Adaptive Optimization Results
60.6.4 Adaptive Optimization Results
60.6.5 Optima Analysis
60.7 Conclusions
References


61 Knowledge Discovery in Bioinformatics 


Julie Hamon, Julie Jacques, Laetitia Jourdan, Clarisse Dhaenens
61.1 Challenges in Bioinformatics
61.2 Association Rules by Evolutionary Algorithm in Bioinformatics
61.2.1 Association Rules Discovery
61.2.2 Evolutionary Approaches for Association Rules
in Bioinformatics
61.3 Feature Selection for Classification and Regression
by Evolutionary Algorithm in Bioinformatics
61.3.1 Feature Selection
61.3.2 Evolutionary Approaches for Feature Selection
for Classification and Regression in Bioinformatics
61.4 Clustering by Evolutionary Algorithm in Bioinformatics
61.4.1 Clustering
61.4.2 Evolutionary Approaches for Clustering in Bioinformatics
61.5 Conclusion
References


62 Integration of Metaheuristics and Constraint Programming 


Luca Di Gaspero
62.1 Constraint Programming and Metaheuristics
62.2 Constraint Programming Essentials
62.2.1 Modeling
62.2.2 Solution Methods


CENAE “BM E TG E T E E N N anaes 




62.3 Integration of Metaheuristics and CP 1230
62.3.1 Local Search and CP 1230
62.3.2 Genetic Algorithms and CP 1233
62.3.3 ACO and CP 1233

62.4 Conclusions 1234
References 1235


63 Graph Coloring and Recombination

Rhyd Lewis 1239
63.1 Graph Coloring 1239
63.2 Algorithms for Graph Coloring 1240
63.2.1 EAs for Graph Coloring 1241
63.3 Setup 1244
63.3.1 Problem Instances 1245
63.4 Experiment 1 1246
63.5 Experiment 2 1249
63.6 Conclusions and Discussion 1251
References 1252


64 Metaheuristic Algorithms and Tree Decomposition 


Thomas Hammerl, Nysret Musliu, Werner Schafhauser 1255
64.1 Tree Decompositions 1256
64.1.1 Formal Definitions 1257
64.2 Generating Tree Decompositions by Metaheuristic Techniques 1258
64.2.1 Genetic Algorithms for Tree Decomposition 1258
64.2.2 Ant Colony Optimization for Tree Decomposition 1260
64.2.3 Iterated Local Search for Tree Decomposition 1263
64.2.4 Other Techniques for Tree Decomposition 1266
64.2.5 Comparison of Algorithms for Tree Decomposition 1266
64.2.6 Application of Tree Decomposition
in Metaheuristic Techniques 1267
64.3 Conclusion 1268
References 1269
65 Evolutionary Computation and Constraint Satisfaction 
Jano I. van Hemert 1271
65.1 Informal Introduction to CP 1271
65.2 Formal Definitions 1272
65.3 Solving CSP with Evolutionary Algorithms 1273
65.3.1 Direct Encoding 1273
65.3.2 Indirect Encoding 1273
65.3.3 General Techniques to Improve Performance 1274
65.4 Performance Indicators 1275
OSEI ACEM sc cansaenedsbesineea canaae tee ee nes RE N 1276 
Gos PMEU e ri me ooteseue ees seen dead E RE 1276 
65.5 Specific Constraint Satisfaction Problems 1277
65.5.1 Boolean Satisfiability Problem 1277
65.5.2 Graph Coloring 1278




65.5.3 Binary Constraint Satisfaction Problems 1278
65.5.4 Examination Timetabling 1282
65.6 Creating Rather than Solving Problems 1283
65.6.1 Evolving Binary Constraint Satisfaction
Problem Instances 1283
65.6.2 Evolving Boolean Satisfiability Problem Instances 1283
65.6.3 Further Investigations 1283
65.7 Conclusions and Future Directions 1284
References 1284


Part F Swarm Intelligence 


66 Swarm Intelligence in Optimization and Robotics 


Christian Blum, Roderich Groß 1291

Me TROT E T E E see 1291 

66.2 SI in Optimization 1292
66.2.1 Ant Colony Optimization 1292
66.2.2 Particle Swarm Optimization 1293
66.2.3 Artificial Bee Colony Algorithm 1294
66.2.4 Other SI Techniques for Optimization
and Management Tasks 1295
66.3 SI in Robotics: Swarm Robotics 1296

eee E E S S E 1296 

eE E E E demsmocedeia aides emeopeaiiney 1301 

66.4 Research Challenges 1302
References 1303


67 Preference-Based Multiobjective 
Particle Swarm Optimization for Airfoil Design 


Robert Carrese, Xiaodong Li 1311
67.1 Airfoil Design 1311
67.1.1 Airfoil Design Architecture 1311
67.1.2 Intelligent Optimization: PSO 1312
67.1.3 Multiobjective Optimization 1313
67.1.4 Surrogate Modeling 1316
67.2 Shape Parameterization and Flow Solver 1317
67.2.1 The PARSEC Parameterization Method 1317
67.2.2 Transonic Flow Solver 1318
67.3 Optimization Algorithm 1319
67.3.1 The Reference Point Method 1319
67.3.2 User-Preference Multiobjective PSO: UPMOPSO 1320
67.3.3 Kriging Modeling 1322
67.3.4 Reference Point Screening Criterion 1323
67.4 Case Study: Airfoil Shape Optimization 1323
67.4.1 Pre-Optimization and Variable Screening 1324
67.4.2 Optimization Results 1325




67.4.3 Post-Optimization and Trade-Off Visualization
67.4.4 Final Designs
67.5 Conclusion
References


68 Ant Colony Optimization for the Minimum-Weight Rooted 
Arborescence Problem 


Christian Blum, Sergi Mateo Bellido
68.1 Introductory Remarks
68.2 The Minimum-Weight Rooted Arborescence Problem
68.3 DP-Heur: A Heuristic Approach to the MWRA Problem
68.4 Ant Colony Optimization for the MWRA Problem
68.5 Experimental Evaluation
68.5.1 Benchmark Instances
68.5.2 Algorithm Tuning
68.5.3 Results
68.6 Conclusions and Future Work
References


69 An Intelligent Swarm of Markovian Agents 
Dario Bruneo, Marco Scarpa, Andrea Bobbio, Davide Cerotti,
Marco Gribaudo
69.1 Swarm Intelligence: A Modeling Perspective
69.2 Markovian Agent Models
69.2.1 Mathematical Formulation
69.3 A Consolidated Example: WSN Routing
69.3.1 A Swarm Intelligence Based Routing
69.3.2 The MAM Model
69.3.3 Numerical Results
69.4 Ant Colony Optimization
69.4.1 The MAM Model
69.4.2 Numerical Results for ACO Double Bridge Experiment
69.5 Conclusions
References


70 Honey Bee Social Foraging Algorithm for Resource Allocation 


Jairo Alonso Giraldo, Nicanor Quijano, Kevin M. Passino
70.1 Honey Bee Foraging Algorithm
70.1.1 Landscape of Foraging Profitability
70.1.2 Roles and Expedition of Bees
70.1.3 Dance Strength Determination
70.1.4 Explorer Allocation and Forager Recruitment
70.2 Application in a Multizone Temperature Control Grid
70.2.1 Hardware Description
70.2.2 Other Algorithms for Resource Allocation
70.3 Results
70.3.1 Experiment I: Maximum Uniform Temperature
70.3.2 Experiment II: Disturbance
70.3.3 Experiment III: Multiple Set Points




POS PSU Onana a a EE EEE O casts 1373 
70.5 Conclusions 1374
References 1374
71 Fundamental Collective Behaviors in Swarm Robotics 
Vito Trianni, Alexandre Campo 1377
71.1 Designing Swarm Behaviours 1378
71.2 Getting Together: Aggregation 1379
71.2.1 Variants of Aggregation Behavior 1379
71.2.2 Self-Organized Aggregation in Biological Systems 1379
71.2.3 Self-Organized Aggregation in Swarm Robotics 1380
71.2.4 Other Studies 1381
71.3 Acting Together: Synchronization 1381
71.3.1 Variants of Synchronization Behavior 1381
71.3.2 Self-Organized Synchronization in Biology 1382
71.3.3 Self-Organized Synchronization in Swarm Robotics 1382
71.3.4 Other Studies 1383
71.4 Staying Together: Coordinated Motion 1383
71.4.1 Variants of the Coordinated Motion Behavior 1384
71.4.2 Coordinated Motion in Biology 1384
71.4.3 Coordinated Motion in Swarm Robotics 1385
71.4.4 Other Studies 1385
71.5 Searching Together: Collective Exploration 1386
71.5.1 Variants of Collective Exploration Behavior 1386
71.5.2 Collective Exploration in Biology 1386
71.5.3 Collective Exploration in Swarm Robotics 1386
71.6 Deciding Together: Collective Decision Making 1388
71.6.1 Variants of Collective Decision Making Behavior 1388
71.6.2 Collective Decision Making in Biology 1389
71.6.3 Collective Decision Making in Swarm Robotics 1389
71.6.4 Other Studies 1390
71.7 Conclusions 1390
References 1391
72 Collective Manipulation and Construction
Lynne Parker 1395
72.1 Object Transportation 1395
72.1.1 Transport by Pushing 1396
72.1.2 Transport by Grasping 1397
72.1.3 Transport by Caging 1400
72.2 Object Sorting and Clustering 1401
72.3 Collective Construction and Wall Building 1402
72.4 Conclusions 1404
References 1404


73 Reconfigurable Robots 


Kasper Støy 1407
73.1 Mechatronics System Integration 1409




73.2 Connection Mechanisms 1410
73.3 Energy 1411
73.4 Distributed Control 1412
73.4.1 Communication 1412
73.4.2 Locomotion 1413
73.4.3 Self-Reconfiguration 1414
73.4.4 Manipulation 1416
73.5 Programmability and Debugging 1417
73.5.1 Iterative, Incremental Programming and Debugging 1417
73.5.2 Simulation 1417
73.5.3 Emerging Solutions 1418
73.6 Perspective 1418
73.7 Further Reading 1419
References 1419

74 Probabilistic Modeling of Swarming Systems
Nikolaus Correll, Heiko Hamann 1423
74.1 From Biological to Artificial Swarms 1423
74.2 The Master Equation 1424
74.3 Non-Spatial Probabilistic Models 1424
74.3.1 Collaboration 1424
74.3.2 Collective Decisions 1426
74.4 Spatial Models: Collective Optimization 1428
74.5 Conclusion 1431
References 1431


Part G Hybrid Systems 


75 A Robust Evolving Cloud-Based Controller 


Plamen P. Angelov, Igor Škrjanc, Sašo Blažič 1435
75.1 Overview of Some Adaptive and Evolving Control Approaches 1435
75.2 Structure of the Cloud-Based Controller 1437
75.3 Evolving Methodology for RECCO 1439
75.3.1 Online Adaptation of the Rule Consequents 1439
75.3.2 Evolution of the Structure: Adding New Clouds 1440
75.4 Simulation Study 1442
75.5 Conclusions 1447
References 1448

76 Evolving Embedded Fuzzy Controllers 

Oscar H. Montiel Ross, Roberto Sepúlveda Cruz 1451
A a e A E T 1452 
76.2 Type-1 and Type-2 Fuzzy Controllers 1454
76.3 Host Technology 1457
76.4 Hardware Implementation Approaches 1458
76.4.1 Multiprocessor Systems 1458
76.4.2 Implementations into FPGAs 1459




76.5 Development of a Standalone IT2FC
76.5.1 Development of the IT2 FT2KM Design Entity
76.6 Developing of IT2FC Coprocessors
76.6.1 Integrating the IT2FC Through Internal Ports
76.6.2 Development of IP Cores
76.7 Implementing a GA in an FPGA
76.7.1 GA Software Based Implementations
76.7.2 GA Hardware Implementations
76.8 Evolving Fuzzy Controllers
76.8.1 EAPR Flow for Changing the Controller Structure
76.8.2 Flexible Coprocessor Prototype of an IT2FC
76.8.3 Conclusion and Further Reading
References


77 Multiobjective Genetic Fuzzy Systems 

Hisao Ishibuchi, Yusuke Nojima
77.1 Fuzzy System Design
77.2 Accuracy Maximization
77.2.1 Types of Fuzzy Rules
77.2.2 Types of Fuzzy Partitions
77.2.3 Handling of High-Dimensional Problems
with Many Input Variables
77.2.4 Hybrid Approaches with Neural Networks
and Genetic Algorithms
77.3 Complexity Minimization
77.3.1 Decreasing the Number of Fuzzy Rules
77.3.2 Decreasing the Number of Antecedent Conditions
77.3.3 Other Interpretability Improvement Approaches
77.4 Single-Objective Approaches
77.4.1 Use of Scalarizing Functions
77.4.2 Handling of Objectives as Constraint Conditions
77.4.3 Minimization of the Distance to the Reference Point
77.5 Evolutionary Multiobjective Approaches
77.5.1 Basic Idea of Evolutionary Multiobjective Approaches
77.5.2 Various Evolutionary Multiobjective Approaches
77.5.3 Future Research Directions
76 SPIO seeen insda a toed naseletetedoales 
References


78 Bio-Inspired Optimization of Type-2 Fuzzy Controllers 
Oscar Castillo
78.1 Related Work in Type-2 Fuzzy Control
78.2 Fuzzy Logic Systems
78.2.1 Type-1 Fuzzy Logic Systems
78.2.2 Type-2 Fuzzy Logic Systems
78.3 Bio-Inspired Optimization Methods
78.3.1 Particle Swarm Optimization




78.3.2 Genetic Algorithms
78.3.3 Ant Colony Optimization
78.3.4 General Remarks About Optimization
of Type-2 Fuzzy Systems
78.4 General Overview of the Area and Future Trends
78.5 Conclusions
References


79 Pattern Recognition with Modular Neural Networks
and Type-2 Fuzzy Logic

Patricia Melin
79.1 Related Work in the Area
79.2 Overview of Fuzzy Edge Detectors
79.2.1 Sobel Edge Detector Improved with Fuzzy Logic
79.2.2 Morphological Gradient Edge Detector Improved
with Fuzzy Logic
79.3 Experimental Setup
79.3.1 General Algorithm Used for the Experiments
79.3.2 Parameters for the Images Databases
79.3.3 The Modular Neural Network
79.4 Experimental Results
79.5 Conclusions
References


80 Fuzzy Controllers for Autonomous Mobile Robots 


Patricia Melin, Oscar Castillo
80.1 Fuzzy Control of Mobile Robots
80.2 The Chemical Optimization Paradigm
BO.2.1 PSM MISMCOMPOUNS sci av cece rri de svedee decd ee eervevess 
80.2.2 Chemical Reactions
80.2.3 Synthesis Reactions
80.2.4 Decomposition Reactions
80.2.5 Single-Substitution Reactions
80.2.6 Double-Substitution Reactions
80.3 The Mobile Robot
80.4 Fuzzy Logic Controller
80.5 Experimental Results
80.6 Conclusions
References


81 Bio-Inspired Optimization Methods 


Fevrier Valdez
81.1 Bio-Inspired Methods
81.2 Bio-Inspired Optimization Methods
81.2.1 Genetic Algorithms
81.2.2 Particle Swarm Optimization
81.2.3 Simulated Annealing
81.2.4 Pattern Search




81.3 A Brief History of GPUs 1535
81.3.1 CUDA 1535
81.4 Experimental Results 1535
81.5 Conclusions 1538
References 1538
Acknowledgements 1539
About the Authors 1543
Detailed Contents 1569
Index 1605




Index 


(α, β)-maximal solution 136
(α, β)-minimal solution 140
(h, h’)-uniform generalized fuzzy 
partition 118 
(S,N)-implication 185 


A 
acceptance criterion 1265 
accuracy 
— maximization 1482 
—of approximation 335 
Actel Fusion FPGA 1467 


action selection 670 

action-centered subsystem (ACS) 
690 

active 

— categorical perception (ACP) 699 

-CMA 882 

— media technology (AMT) 343 

— perception 701 

—trail 554 

AdaBoost 515 

adapting boosting (Adaboost) 809 

adaptive 

— control of thought-rational 
(ACT-R) 685 

— control theory 734 

— genetic algorithm optimization 
library (ADGLIB) 1202 

—judgment 343 

1438 

— perturbation 1265 

— range MOGA (ARMOGA) 1001 

— representation genetic optimization 
technique (ARGOT) 1076 

—sampling 1199 

— variant of mBOA (amBOA) 913 

— vector integration to endpoint 
handwriting (AVITEWRITE) 669 

additive 

— approximation 986 

—generator 81 


— law 


additively decomposable function 
(ADF) 916 

address-event representation (AER) 
721 

adenosine triphosphate (ATP) 654 

adjoint pair 14 

adjointness condition 14 

advanced topic model 564 

aerodynamic shape optimization 
1201 

aesthetic design 856 

affine projection algorithm (APA) 
536 

agent 1054 

aggregated criteria 

aggregation 1379 

— behavior 1379 

— function 62 

— pheromone system (APS) 920 

AI context 848 

airfoil 

— design 1311 

— shape optimization 1323 

Akaike information criterion (AIC) 
1219 

algebra 

- BL- 16 

— Heyting 11 

- MTL- 19 

-t- 16 

algebraic 

—model 255 

— property 332 

—semantic 10 

algorithm 1240, 1266 

—engineering 1098 

— resource allocation 

-SLS 1085 

— tuning 1337 

algorithm feature 1130 

— evolution strategy 1130 

all pairs shortest path problem 841 

1062 

1364 


1167 


1366 


allele 
allocation 


all-to-all (A2A) 1025 

alternating-position crossover (AP) 
1260 

amodal 698 

amorphous computing 

amplification 1379 

amplitude 

— measure 612 

— modulation (AM) 627 

analog digital converter (ADC) 
1457 

analog network 607 

analysis of variance (ANOVA) 
1128 

— fixed factor 


1299 


1130 
analytics-as-a-service (AaaS) 813 
anatomical reduction 659 
Angelov—Yager (ANYA) 
annealing schedule 1092 
anomaly 

— detection (AD) 788 

— identification (AI) 788 
answer-set programming 41 


1436 


ant 
— based EMO 1001 
— behavior 1399 


— colony optimization (ACO) 914, 
942, 1096, 1112, 1226, 1255, 1292, 
1333, 1345, 1361, 1494, 1504, 
1518 

— colony system (ACS) 

antecedent condition 


1263, 1293 
1487 
anterior cingulate cortex (ACC) 

666 
anthropomorphic prosthetic limb 

753 
anticipated mean shift (AMS) 916 
anticipatory learning classifier 

system (ALCS) 962 
— system overview 974 
anytime algorithm 815 
application 962, 1267 
—environment 233 
—finance 233 




— health care 233 

— programming interface (API) 
1456 

—robotics 233 

—society 234 

application-specific integrated circuit 
(ASIC) 635, 1451 

approximate 

— implicant 339 

— linear dependency (ALD) 525 

— reasoning (AR) 43, 195, 343 

approximation 

— concept 379 

524 

— probabilistic 382 

— singleton 379 

—space 341 

— subset 379 

Archimedean t-norm 

archive 986-988 

arithmetic logic unit (ALU) 

arithmetic unit (ALU) 1459 

ARM processor 1467 

artificial 

— fish swarm algorithm (AFSA) 
1296 

— intelligence (AI) 331, 371, 473, 
845, 1050, 1225 


— error 


163 


1118 


— life 1300 

— neural network (ANN) 455, 633 

artificial bee colony (ABC) 1294, 
1333, 1362 

— algorithm 1294 


aspiration level 990 

asset—liability management (ALM) 
804 

assignment-based operator 1241 

association foundation of cognition 
652 

associative switch 488 

assortment 1066 

asymmetric/heterogeneous bipolarity 
302 


asynchronous 
— cellular EA 934 
—communication 1119 


Atanassov’s intuitionistic fuzzy sets 
103 

attainment indicator 

attractor 607, 722 

—dynamics 673 


1009 


attribute-concept value 379 
auto-associative neural network 
(AANN) 789 


auto-encoder 475 

— learning algorithm 476 
automated 

— decision-making 1009 

—innovization 1008 

— programming (AP) 847 


— relevance determination (ARD) 
548 
automatically defined function 


(ADF) 860 
autonomous 
— mental development 704 
-robot 313 
average ranking (AR) 1039 


axiom, bookkeeping 24 


back-propagation (BP) 474 

— through time (BPTT) 462 

backward 1354 

— gradient computation 476 

bag of little bootstrap (BLB) 809 

bagging 516, 808 

Bandler—Kohout subproduct (BKS) 
195 

bargaining scheme 152 

basal ganglia (BG) 666, 730 

base system builder (BSB) 1467 

basic 

— function cosine 114 

— fuzzy algebra (BFA) 257 

— k-means algorithm 240 

— t-norm logic BL 17

Baum—Welch algorithm 569 

Bayes model averaging (BMA) 517 

Bayesian 

— and nonparametric extension 572 

— decision theory 596 

—DRSA 402 

—fusion 489 

— information criterion (BIC) 506, 
910, 1219 

—network 545 

— optimization algorithm (BOA) 
905 

—regression 593 


— regularization 550 

—rough set 393 

— Yin-Yang (BYY) 505 

bee colony optimization (BCO) 
1362 

BEECLUST algorithm 1429

behavior 313, 314, 324

behavioral learning 973

behavior-based

—design 1299

— multiple robot system with host for
object manipulation (BeRoSH)
1397

belief function 51, 437

benchmark instance 1337

Benvenuti integral 76

bereitschafts potential (BP) 747

Berkeley data analytics stack
(BDAS) 813

beta rhythm 744

bias 1072, 1078

— search operator 1078

bias-variance analysis 486

biclustering 1220

bi-directional connection 477

bifurcation 618


binary 
— aggregation function 64 
— indicator 986 


— matrix factorization (BMF) 508 
— operation 12 
binary constraint satisfaction 


problem (BINCSP) 1278 
— average tightness 1280 
— density 1280 
— evolving 1283 
— Model E 1281
— parametric vector 1280
—tightness 1280 
bioinformatics 1212, 1219 
— oriented hierarchical evolutionary 
learning (BioHEL) 1214 
bio-inspired 1401 
— optimization method 1503 
biological 
— plausible 474 
— significance 611 


biophysical neural model 660 
biophysically realistic neuron 656 
bipolar 


— fuzzy set 103 




— information 301 

— possibility theory 37 

—query 304 

— satisfaction degree (BSD) 305 

-set 107 

bit error rate (BER) 541 

bit-flipping 1067 

bivalence 8 

bivariate marginal distribution 
algorithm (BMDA) 905 

black box complexity 825 

BL-algebra 16 

blind 

—bulldozing 1404 

— source separation (BSS) 547 

bloat 857, 1005 

block 

—decomposition 123 

— of an attribute-value pair 373 

blood brain barrier (BBB) 650 

body plan 699 

body-behavior coadaptation 698 

body-hub 706 

boids 1301 

Boltzmann 

— distribution 478, 555

—selection 830 

Boolean 

—reasoning 339 

— satisfaction problem 1277, 1283

boosting 491, 516, 809

bootstrap 491, 710, 808

borderline case 333

bottom-up learning 691

boundary region 334

bounded sum 163


box pushing 1396 
BP algorithm 482 
brain 616 


—communication rate 760 

— computer interface (BCI) 727, 
137 

— machine interface (BMI) 667, 
Jol 

— outlook 760 

— plasticity 729 

-rhythm 741 

brainstem motor center 730 

brainwave translation 754 

branch and bound (BnB) 1052 

—algorithm 1266 


Brouwer’s intuitionistic logic 91 
bug-fixing 854 

builder robot 1403 

building 

-block 475 

— computational intelligence 784 
bump attractor 674 


C 


cable equation 659 
caging 1396 
candidate solution 1086
Canny

— algorithm 128

— edge detector 127
canonical hyperplane 586
capacity 581


cardinality 1064 
Cartesian GP (CGP) 849 
case study 1133, 1323 


— multiple algorithms, multiple 
problems: MAMP 1137 

— single algorithm, multiple 
problems: SAMP 1133 

— single problem designs: SASP and 
MASP 1133 

case-based reasoner (CBR) 43, 
786, 803 

Cauchy definition of integral 75 

causal 

—relationship 553 

-tree 509 

cell 652, 933 

— saving (CS) 1198 

— signaling among cells 652 

cellular 

— automata (CA) 1294 

— MOEA (cMOEA) 1019 

— neural network (CNN) 629 

— robot (CEBOT) 1409 

— signaling mechanism 654 

cellular EA (CEA) 933, 1160 

— synchronous cellular EA 934 

cellular evolutionary algorithm 
(CEA) 1019 

cemetery formation 1295 

center of gravity (COG) 208, 272, 
1436 


center of set (cos) 1502 


central 

— nervous system (CNS) 473, 650, 
T2 

— pattern generator (CPG) 667 

— processing unit (CPU) 1017, 
1533 

centralized Pareto front (CPF) 1019

cerebellum 730

certain rule 376

certainty qualification 35, 431

chain 1398

— behavior 1388

— completeness 17

chained local optimization 1095

change/activate-deactivate (DC/AD)
1473

channel equalization 540

channeling of the learning process
701

chaos neural system 623

chaotic

— neural network 623

-time 538

characteristic

— function 35, 160

—relation 379

—set 379

— vector 1070

chemical

— optimization 1518

— optimization algorithm 1518

— reaction algorithm (CRA) 1518

choice problem 349

Cholesky factor 882


Choquet integral 68, 153 
chromosome 1062 
chunk 689 


cigar function 893 

cingulate motor area (CMA) 739
circuit design 855 

CKBot 1408 

CLARION 672 

clash 1241 

classical 

— approach 1214 

— control engineering 270 

— logic 203 

— measure 75 

classification 497, 757, 904, 1215 
— accuracy (CA) 1217 




— analysis and regression tree 
(CART) 786 

— ordinal 349 

— problem 798 

— rule-based algorithm 360 

—system 375 

classifier 340 

— adaptation 759 

class-shape transformation (CST) 
1192 

clique 1240 

clock (clk) 1461 

closure operator 24 


cloud 1440 

— based controller 1437 
cluster 446, 1219 
—graph 558 

— processor (COP) 1108 


— with semi-supervision 244 

— workstation (COW) 1113 

clustering 1218, 1301, 1401 

coalitional game 146, 150 

— fuzzy coalition 150 

coarse-grained 

— model 932 

— parallelism 1469 

coarticulation 738 

code growth 857 

co-evolution 852 

— genetic algorithm 840 

co-firing-based comprehensibility 
228 

cognition 683, 688 

cognitive 

—architecture 684 

— bootstrapping 710 

— control 666 

— implication 627 

— psychology 39 

— social simulation 693 

— system (CS1) 962 

Cohen 33 

Cohen-Grossberg (CG) 617 

coherence criterion (CC) 525 

collaboration 1424 

collaborative filtering (CF) 585, 
816 

collection of local factors 113 

collective 

— chaotic neural network 627 

— construction 1395 


— decision 1426 

— decision behavior 1388 
— decision biology 1389 
— decision making 1388 


— decision swarm robotic 1389 
— manipulation 1395 
collective exploration 1386 

— behavior 1386 

— biology 1386 


— swarm robotics 1386 

combination mechanism 488 

combinatorial optimization 825, 
1333 

—problem 1049 

combined model 734 

comma strategy 827 

committee machine 487 

common spatial pattern (CSP) 756 

commonsense reasoning (CR) 255 


communication 1298, 1412 
—system 638 
compact 


—flash (CF) 1471 

— genetic algorithm (CGA) 904 

comparable based AI model 
(AICOMP) 802 

comparative possibility 33, 38, 45 

compartmental modeling 659 

compatibility criteria 490 

complement of a fuzzy set 161 

complementary 

— crossover operator 1067 

— metal-oxide-semiconductor 
(CMOS) 629, 1457 

complete 

— data likelihood 561 

— F-transform-based fusion 
algorithm (CA) 126 

- NP- 25 

completeness 17 

—chain 17 

— chain general 21 

— general strong 18 

—standard 17 

— standard strong 18 

complex granule (c-granule) 343 

complexity 25 

— minimization 1487

compliance 991

component level model (CLM)
788, 1463


compositional rule of inference (CRI) 
195 

comprehensibility 221 

— postulate 221 

compression ratio 123 

computational 

— complexity 

—cost 736 

— efficiency 98 

— fluid dynamics (CFD) 1191, 1311 

— intelligence (CI) 1, 729, 771, 
783, 1127, 1143 

671, 673 

— principle 717 

— reducing complexity 660 


196, 660, 992 


— model 


computational modeling 1144 
— adaptation 1149 

— business analytics 1144 

— data collection 1149 
—method 1147 

— model development 1150 

— process analytics 1145 

— research analytics 1146 


1149 

computational neuro-genetic 
modeling (CNGM) 778 

— application 779 

— principle 778 

— quantum-inspired optimization 
779 

compute unified device architecture 
(CUDA) 1025, 1456, 1535 

computer-aided design (CAD) 
1176 

computer-assisted manufacturing 
(CAM) 1174 

computing research and education 
(CORE) 314 

computing with words (CWW) 
250, 345 

concept 371 

— approximation 381 

— completely covered by a rule set 
372 

conditional 


— workflow 


— independence 552 

— mixture model 485 

— possibility 35 

— preference network (CP net) 44 




— probability distribution (CPD) table 
554 

— random field 572 

conference 861, 920 

confidence interval 1137 

configurable logic block (CLB) 
1459 

configuration 1100

— offline 1100

—online 1100

—per-instance 1100

confirmation-theoretic rough set
392

conflict check 1276

confusion matrix 800

φ-conjugation 190

conjunction

— Łukasiewicz 9

— noncommutative 22

conjunctive normal form (CNF)
1277

connecting compartments 659

connection mechanism 1410

connectionist

—network 695

connective 17

connectivity 1077

— of search space 1077

connector 1410

—electro-static 1411

—magnetic 1411

— mechanical 1411

conorm 162

— triangular 13

CONRO 1409

consensus 1426

— decision making 1385

consistency measure 356

consolidated example 1349

constrained domination 1002

constrained optimization 995

— problem (COP) 1227

constraint 1064

— based local search (CBLS) 1230

—check 1276

— condition 1490

— handling 828

— programming (CP) 1050, 1225,
1271

— satisfaction problem (CSP) 1226,
1256


constraint satisfaction problem (CSP) 
1271 

— formal definition 1272

construction 1403 

— grammar 706 

— model ensemble 807 

construction heuristic 

— adaptive 1089 

1089 

constructive 

— learning 489 

— modularization learning 489 

context-adaptation-based index 228 

context-free grammar (CFG) 850 

continuity 14 


1272 
— solution 


1075 


— static 


continuous 

— Archimedean t-norm and t-conorm 
164 

— black-box optimization 871 

— t-norm 17, 20

continuous-time finite Markov chain 
(CTMC) 1346 

contractual service agreement (CSA) 
787 

contrapositivization 190 

contrastive 

—auto-encoder (CAE) 480 

— divergence (CD) 478 


control 

— policy 699 

— register (CR) 1472 
— signal 754 

— system 638 

— word (CW) 1472 
controller 317, 733 
— structure 1471 


controlling crossed genes (CCG) 
1042 


convergence 984, 987 
— factor (cf) 1335 
— linear 886 


convolution 483 
cooperation 320 
-robot 703 
cooperative 

— algorithmic level 
1056 

— optimization model 
— transport 1385 


1109 
— model 
1050 


1070 
1415 
— descent optimization 480 
—system 732 
coordinated motion 
—behavior 1384 
—biology 1384 

— swarm robotics 


coordinate 
— attractor 


1383 


1385 
coordination control 1299 
coprocessor 1458 

corpus callosum 738 


correlated mutations 881 
correlation 
—analysis 1077 


— feature selection (CFS) 1216 
corridor function 878 

cortex 717 

cortical representation 667 
corticospinal tract (CST) 739 


cost-sensitive learning 502 


coupled 
— map lattice (CML) 621 
— oscillator 1382 


coupling 484 
covariance matrix adaptation (CMA) 
881 
— evolution strategy (CMA-ES) 
881, 1163 
covariance matrix update 884
covariate shift
— adaptation 758
— minimization (CSM) 758
coverage-based genetic induction
(COGIN) 1453
covering
— global 372
— local 373
crawling 710
creating problem 1283
credit-assignment 489
—algorithm 490
crisp
— (precise set) 333
— belief structure 437
—relation 171
— rough approximation 439
— setting 427
critical demixing point 1161
cross
— correlation function (CCF) 613




— information potential (CIP) 525 
— power spectral density (CPSD) 
613 
crossed ridge function 970

crossover 826, 849

— arithmetic 1068

-cycle 1071 

— generalized order 

— geometrical 1068 

—intermediate 1068 

—n-point 1068 

—one-point 1068 

— operator 1259 

1071 

— partially mapped 

—point 1068 

— precedence preservative 

— uniform 1068 

cross-talk 484 

cross-validation 477, 596, 1218 

crowding 839 

1000 

crowdsourcing 814 

cultural evolution 706 

cumulative 

— path length control 880 

— step-size adaptation (CSA) 880, 
891 

current-mode 718 

α-cut of a fuzzy set 161

cut relation 174 

cycle crossover (CX) 


1071 


— order 
1071 


1071 


— distance 


1259 


D 


dance

— decision function 1363

— strength determination 1364
data

— clustering 505

— complete 371

— consistent 371

— dependent regularization 589
— driven design 220

— incomplete 378

— inconsistency rough set 350
— inconsistent 375

— mining 339, 447, 972, 978

— processing 638


— space adaptation (DSA) 759 
— word (DS) 1472 

database (DB) 793 

— management system 295 


— querying 49 

Davis-Putnam-Logemann-Loveland 
(DPLL) 1271 

D-core 337 

De Morgan triplet 163 

dead zone 1439 

Deb-Thiele-Laumanns-Zitzler 
(DTLZ) 1026 

decision 


— aiding rough set methodology 
349 

— class 336 

— graph BOA (dBOA) 906 

— maker (DM) 349, 990, 1038 

— maker (DM) preference 806 

— making 446 

-rule 336, 351 

—rule dominance-based 359 

— rule induction 359, 364 

— rule multi-criteria choice 
and ranking 365 

— rule true 336 

— rule truth degree 336 

—space 984, 985 

— support system (DSS) 797 

—system 336 

— system consistent 336 

— system inconsistent 336 

—table 377 

—theory 596 

decision-theoretic rough set (DTRS) 
388 

declarative memory 686 

decoder 475, 1061 

de-correlated component analysis 
(DCA) 504 

dedicated architectures 1114

deduction 255

—refutation 264


— strong 260 
-weak 260 
deep 


—architecture 474 

— belief network (DBN) 478 

— learning (DL) 474, 483, 492, 816 
— neural network (DNN) 474 
definable set 334, 389 


defuzzification 49, 206 


defuzzifier 1503 
degenerated representation 1076 
degree 


— of freedom (DOF) 322, 700, 762 

— of freedom (DOF) problem 667 

—provability 23 

Dempster rule of conditioning 46 

denoising auto-encoder (DAE) 475 

density estimation using Markov 
random fields algorithm (DEUM) 
911 

dependency 

—degree 337 

— structure matrix genetic algorithm 
(DSMGA) 910 

—tree EDA (dtEDA) 915 

depth perception 702 

derandomization 879 

derivation 850 

description logic 50 

design 855, 1327 

— and analysis of computer 
experiments (DACE) 1188 

— architecture 1311 

—method 636 

— optimization 1175 

— principle 1071 

desirability function 990 

desynchronization transition 614 

development 701 

— flow 1460 

developmental robotics 698 

deviance information criterion (DIC) 
1219 


diagnostic 788, 791 
dictyostelium 1408 
differential 


— evolution (DE) 852, 1001, 1162 

— pair integrator 718 

diffuse decision model 673 

diffusion 

-model 934 

— tensor imaging (DTI) 739 

digital signal processing (DSP) 
1457 

digital signal processor (DSP) 316 

DIMACS graph coloring instance 
1268 

direct 


— current (DC) 1366 




—encoding 1273 

—memory access (DMA) 1118 

—representation 1062, 1065 

direct representation 1070 

directed acyclic graph (DAG) 468 

direction adaptation 881 

discernibility relation 339 

discovery of simple solution 701 

discrete 

—F-transform 119 

—recombination 877 

disjunction 

— Łukasiewicz 9 

disjunctive normal form (DNF) 209 

displacement mutation operator 
(DM) 1260 

dissimilarity function 67 

distance 

— criterion 1166

— Manhattan 1063

— measure 1247

— metric 1319

— proportional step-size 887

— to the optimal solution 1276


distillation tower 1151 
distributed 

— architecture 1113 

— control 1412 


— evolutionary algorithm (dEA) 
932, 1019 


— memory architecture 1113 
— MOEA (dMOEA) 1019 
— Pareto front (DPF) 1019 


— resource evolutionary algorithm 
machine (DREAM) 1122 

distribution estimation using Markov 
random fields (DEUM) 920 

disturbance 1369 

diversity 517, 808, 984, 1078 

— measure 1247 

— promotion 484 

divide-and-conquer 484, 489 

divisibility condition 18 

divisible 15 


division 1295 
do not care condition 378 
domain 


— relational calculus (DRC) 296 
— specific programming language 
1418 


Dombi t-norm 166 


dominance 
—principle 351 
—ranking 989 


— relation 351, 985 

— resistant solution (DRS) 1036 

— without degrees of preference 
363 

dominance-based 


— consistency measure 356 
— decision rule 359 
— probabilistic rough set 399 


— rough membership 355-357 

dominance-based rough set 354 

— probabilistic 358 

— properties 355 

— rough approximation 354 

— stochastic 358 

— variable consistency 355 

dominance-based rough set approach 
(DRSA) 349, 398 

— extensions 366 

— multi-criteria decision problems 
367 

— operational research problems 
366 

— stochastic 358 

domination 996 

dopamine (DA) 671 

dorsal pathway 707 

dorsolateral prefrontal cortex 
(DLPFC) 685 

double bridge experiment 1357

drastic product 162

D-reduct 337

d-separation 554

dual pair 137-139

— Bector and Chandra 138

— fuzzy 138

—Ramik 141

— Rödder and Zimmermann 137

— Verdegay 139

dual problem 132

duality 131

— method 637

— theory 143

duality theorem 141

— first weak 141

— second weak 141

dynamic 607

— eSNN (deSNN) 777

— intrinsic primitive 736


— local search (DLS) 1093 

—model 661 

— neuro-fuzzy inference system 
(DENFIS) 774 

— optimization 1008 

— partial reconfiguration (DPR) 
1470 

— programming (DP) 1333

—system 609

dynamically reconfigurable robotic
system (DRRS) 1297

dynasearch 1091


E 


early 

— access partial reconfiguration 
(EAPR) 1471 

— empirical model 657 

— word learning 705 

echo state network (ESN) 464 

edge 

— detector 127, 1509 

— histogram based sampling 
algorithm (EHBSA) 915 

— histogram matrix (EHM) 915 

— recombination 838 

— recombination (ER) 1053

edge detection 1509

— operator 128

EEG data space adaptation 759

effect of the FOU 292

effectiveness 1276

effector dependency 737

efference copy 734

efficiency 1276

— enhancement technique 917

efficient global optimization (EGO)
1188

eigenvalue problem 592

ejection chain 1091

electric field 745

electrically excitable cell 652

electrochemical potential difference
654

electrocorticographic motor cortical
surface potential 741

electrocorticography (ECoG) 727

electroencephalogram (EEG) 614,
625, 741, 778




electroencephalography (EEG) 727 
electromyography (EMG) 751 
elementary 
— granule 334
—set 334
elimination ordering 1257
elitism 827

elitist 950
embedding system (ES) 1453
embodied cognition (EC) 698


embodiment 697 
— bias 705 
emergence 623, 670 


emergent macroscopic 625 
emigration policy 938 


emission distribution 567 


emotional robot 323 
empirical 
— modeling 1177 


— risk minimization (ERM) 581 

encoder 475 

encoding 1218 

endogenous strategy parameter 873

energy

— conservation relation 531

—function 478, 556

— functional 560

— reconfigurable robot 1411

engine idle speed control 276

engineering design 855

enhanced

— Karnik—Mendel (EKM) 1456

— Karnik—Mendel algorithm with
new initialization (EKMANI)
1456

— opposite directions searching
(EODS) 1456

— simple algorithm (ESA) 127

ensemble

—learning 491

—method 514

ensembling strategy 488

entity 1463

environmental

—selection 983

— signal detection 651

epigenetic robotics 704

— architecture (ERA) 669

epoch 477

epsilon

— approximation 986


— constraint method 990 


— indicator 986, 1009 

equilibrium 

— genetic algorithm (EGA) 904 

— point hypothesis 667 

— trajectory hypothesis 733 
equivalence 

—class 1065 

— of neighborhood definition 1067 
ergodicity 1066 


error backpropagation 458 

eSNN application 777 

essential infimum 872 

estimation 

524 

— of Bayesian network algorithm 
(EBNA) 905 

— of distribution programming (EDP) 
914 

— of Gaussian networks algorithm 
(EGNA) 912 

— of multivariate normal algorithm 
(EMNA) 905 

estimation of distribution algorithm 
(EDA) 853, 899, 967, 1052, 1070 

-model 903 

— procedure 900 

—theory 916 

e-threshold generated implication 
191 

Eulerian cycle problem 947 

evaluation 568 

— function 1088, 1218 

—problem 568 

event-related 

—desynchronization (ERD) 746 

— synchronization (ERS) 746 

evidence reasoning 489 

evidential support 37 

evolution 701, 967, 1440 

-path 880 

— strategy (ES) 849, 871, 1065, 
1130 

—window 889 

evolutionary 

— algorithm (EA) 784, 790, 825, 
827, 902, 929, 930, 995, 1017,
1049, 1061, 1097, 1107, 1130, 
1160, 1179, 1212, 1241, 1273, 
1313, 1366, 1452 


— error 


— algorithm (EA) hybridization 
1050 

— approach 1213 

— computation (EC) 845, 1159, 
1173, 1271, 1468 

— computing (EC) 784 

-cycle 1130 

— FRBS (EFRBS) 1452 

— local selection algorithm (ELSA) 
1216 

— programming (EP) 847, 849, 
1065 

—robotics 698, 1300 

evolutionary multi-objective 
algorithm (EMOA) 1160 

evolutionary multi-objective 
optimization (EMO) 995, 1311, 
1491 

— application 1002 

— constraint handling 1001

— decision-making 1006

— elitist approach 1000

—hybrid approach 1004

— hypervolume measure 1005

— KKT condition 997

— knowledge extraction 1008

—meta-modeling 1010

—non-elitist approach 999

— parallel computing 1007

— partial Pareto-optimal set 1007

— performance metrics 1009

— principle 997

— redundant objective 1007

— reliability-based optimization
1005

— run-time analysis 1009

— uncertainty handling 1005

evolving

— connectionist system (ECOS)
771

— fuzzy neural network (EFuNN)
774

— methodology 1439

— neuro-fuzzy system 772

— self-organized map (ESOM) 774

— spiking neural network (eSNN)
TIS

— Takagi—Sugeno system (ETS)
FES

exact algorithm 557

examination timetabling 1282




excess mean square error (EMSE) 
531 

exchange 

— mutation operator (EM) 1260

— property (EP) 184

executable graph 849

exhaustive selection 987

exogenous strategy parameter 873

exoskeleton 322

expectation

—(E) step 562

— maximization (EM) 485, 486,
504, 547

— maximization (EM) algorithm
561

— propagation 565

expected improvement (EI) 1188

expedition 1363

experimental

— evaluation 1337

—science 251 

—task 1166 

expert 

—network 485 

—system 684 

explicit knowledge 695 

exponential natural evolution strategy 
(xNES) 884 

exponentially weighted KRLS 
(EW-KRLS) 536 

extended 

— aggregation function 63 

— compact genetic algorithm (ECGA) 
903 

— compact genetic programming 
(ECGP) 914 

— kernel recursive least square 
(EX-KRLS) 525 

extended possibilistic 

—approach 307 

— truth value (EPTV) 298 

extension 

—method 69 

—of approximation 341 

— of fuzzy sets 90, 106 

—principle 96 

extra 

— cerebral magnetic field 745 

— degree of freedom 290 

extracellular recording 748 

extreme value (EV) 67 


F 


F¹-transform 121

— edge detector 127 

factor 1130 

— analysis (FA) 504, 562 

— fixed 1130 

factorial hidden Markov model 

factorized distribution algorithm 
(FDA) 905 

fair evolutionary multi-objective 
optimizer 1010 

FARC-HD method 101 

fast learning 737 

FATI (first aggregation then 
inference) 232 

fault 

— accommodation 797 

1120 


571 


— tolerance 
feasible 135 
—region 132 

—set 996 

— set in objective space 


132 


996 
— solution 
feature

— adaptation 759

— extraction 518, 755

-map 598 

—regression 759 

—selection 518, 548, 755, 1215 

—vector 528 

feedback control mechanism 732 

feed-forward 483 

FERET database 1514 

field programmable gate array 
(FPGA) 1114, 1451, 1453 

— fusion 1451 

— spartan 6 1451 

—virtex5 1451 

financial engineering 638 

fine-grained

— model 934

— parallelism 1469

finite 

— alphabet 908 

— impulse response (FIR) 528 
— mixture model (FMM) 485 
— sample bound 581, 582 

— state machine 850 


firefly algorithm (FA) 1296 


firing rate 749 

— based 661 

— based dynamic model 661 

first-order 

— fuzzy logics 21 

—logic 50 

—logics 20 

— plus-time delay model 291 

fish schooling 1296 

FITA (first inference then 
aggregation) 232 

fitness 853, 1366 


— assignment and diversity 
preservation (FA-DP) 1020 

— evaluation relaxation 919 

—function 828, 1064 

— landscape 1159 

— level method 950 

— sharing 839 

fixed bus 1471 

fixed-budget KRLS (FB-KRLS) 
536 

fixed-effects model 1133 

fixed-length string 908 

flashing fireflies 1295 

flawed variable 1280 

flexible coprocessor (FlexCo) 1472

flexible querying

— fuzzy database 307

— regular database 303

floating point unit (FPU) 1118

flocking 1385

flower patch 1362

flowing shift-point determination 278
FLP problem 140 
— dual 140 
— primal 140 
flying insect 702 


Fokker-Planck equation 1428 
footprint of uncertainty (FOU) 286, 
1454, 1501, 1526 
forager recruitment 1364

foraging 1301

— algorithm 1363

— profitability 1363
force closure 1397
forebrain 730
foreign exchange (FX) 1130
form closure 1397
formae 1065




formal 

— concept analysis (FCA) 53 
—frame 253 

formation control 1400 
forward 1354 


—algorithm 568 

— -backward algorithm 570 

—computation 476 

—model 734, 975 

— probability 568 

foundation of cognition 652 

FPID controller 292 

Fracta 1410

fractal prediction machine (FPM)
464

fractional anisotropy (FA) 739

Frank t-norm 165

FT2KM design entity 1462


F-transform 117 

— based fusion algorithm 126 
—component 118 

— compression advanced 124 


—higher degree 121 

— image compression (FTR) 122

— of a function of two variables 120

fugacity 1161

full multi-grid (FMG) 1318

function 1351

functional magneto-resonance
imaging (fMRI) 727, 778

FURIA algorithm 101

further reading 861, 1474

fusion 815

— operator 126

future

— research challenge 812

-trend 1505

-work 1343

fuzzification 206, 1462

fuzzifier 1501

fuzzy

—2-partition 115

— approximation model 113

— belief structure 437

— control 196, 269, 789

—c-regression model 242

—c-variety 241

-data 143

— decision tree (FDT) 101, 1452

— deduction system 24


— domain relational calculus (FDRC) 
308 

— dominance 1007 

— edge detector 1510 

— generic algorithm (FGA) 1538 

— hierarchical clustering 239 

—IF-THEN 194 

—IF-THEN rule 113 

— inference mechanism (FIM) 

— information system 437 

— instance based model (FIM) 793 

— integral 79 

— linear programming 137 

— logic-based controller 275 

— mathematical morphology (FMM) 
196, 197 

— mathematics 171 

— measure 68 

— model 221 

— modeling (FM) 113, 221, 323 

— nearest neighbor (FNN) 446 

— neural network (FNN) 756, 773 

— particle swarm optimization 
(FPSO) 1538 

— partition 116 

— pattern matching 49 

— preferences 303 

— proposition 193 

— quantity 132 

— query language (FQL) 303 

— relational database (GEFRED) 
300 

— semantics 25 

— setting 428 


196 


— soft sets 107 
— subsethood measure 196 
— supervisory control 789 


— support vector machine (FSVM) 
445 

—transform 113 

fuzzy binary relation 174

—antisymmetry 175

—asymmetry 175

— completeness 177

— fuzzy order 179

— irreflexivity 175

— linearity 179

— negative transitivity 177

— reflexivity 174

— semitransitivity 177

— similarity relation 179

— symmetry 175

— transitivity 176 

fuzzy clustering 239 

fuzzy c-means 239 

— algorithm 240, 279 

— kernel-based 243 

— kernel-based, numerical example 
244 

— notation 239 


— variations 241 

fuzzy controller 271, 1454 
— architecture 271 

— automatic learning 279 
— structure 276 


fuzzy database 298 
— possibilistic approach 299 

— similarity-based approach 300 
fuzzy implication 166, 184 

— application 193 
— characterization 185
— class 186

— classes and generalizations 189
— construction method 190, 191
— distributivity 187

— natural negation 184
— non-classical setting 192


— open problem 198 
— property 184 
— R-implication 189 


fuzzy interval 50 

—analysis 36 

fuzzy logic (FL) 91, 160, 161, 193, 
203, 220, 269, 784, 1173, 1452, 
1510 

— controller (FLC) 204, 285, 315, 
1522 

— operation 187 

— system (FLS) 315, 1499 

fuzzy partition 119, 205 

—coverage 225 

— distinguishability 225 

— interpretability constraint 225

— relation preservation 225

— Ruspini condition 114

— special elements 225

-type 1485

— uniform 114

fuzzy PID (FPID) 285

— control law 286

fuzzy PID (FPID) controller 285

— membership function (MF) 286




— scaling factor (SF) 286 

— structural parameter 286 

— structure 286 

— tuning parameter 286 

fuzzy relation 171, 207, 299 

—composition 173 

— cylindrical extension 173 

—domain 172 

— operation 172 

— projection 173 

—range 172 

fuzzy relational inference (FRI) 
195 

fuzzy rough 

—hybridization 426 

-set 107, 427 

fuzzy rule 43, 274, 275, 777 

— decreasing the number 1487 

— granular output 226 

-type 1482 

fuzzy rule base 229 

— average firing rules 226 

—compactness 226 

—completeness 226 

— logical view 226 

— semantic interpretability 226 

fuzzy rule for control 271 

— fuzzy logic-based controller 275 


— Mamdani-—Assilian control 271, 
274 
fuzzy rule-based (FRB) 1436 


— classification system (FRBCS) 
100 

— classifier (FRC) 212, 798 

fuzzy rule-based system (FRBS) 
203, 222, 1452 

— approximate 210 

—design 214 

—hierarchical 213 

— linguistic 209 

—Mamdani 209, 215 

— property 214 

— singleton 212 

— TSK 211, 215

—type 209 

—type-2 212 

fuzzy set (FS) 11, 89-91, 128, 159, 
286, 331, 1455 

— algebra 11 

— Atanassov’s interval-valued 
intuitionistic 105 


— Atanassov’s intuitionistic 89 
— continuity 225 
—convexity 225 
—crisp 134 
—extension 89 

— interpretability constraint 224 
—interval-valued 89 

— merging 231 

— normality 224 

— theory 203, 297 
—type-2 89 

fuzzy system (FS) 204 
— database (DB) 232 
— design 231 

— integration 222 

— interaction 222 

— interpretability 219 
— representation 234 
— rule-base (RB) 232 
— trust 222 

— validation 222 


G 


gait control table 1413 

GALE 972 

gamma-aminobutyric acid (GABA) 
654 

gas expansion behavior 1387

gating network 485

Gaussian

— filter 128

— radial basis function (RBF) 590

— RBM (GRBM) 480

Gaussian process (GP) 594, 1178

— function space view 594

— weight space view 594

gene 1062

— invariant genetic algorithm 832

— regulatory network (GRN) 778

gene/protein regulatory network
(GRN) 778

general

— algorithm 1512

— completeness 21

— integral 84

-remark 1505

— technique 1274

generalization 483


generalized 

— approximate cross-validation 
(GACV) 596 

— modus ponens (GMP) 194 

— possibilistic logic 53 

— rough set 413 

— Ruspini condition 115 

— T2FS (GT2) 1454 

general-purpose GPU (GPGPU) 
1456 

general-purpose input/output 
interface (GPIO) 1466 

generating 

— function 116 

— method 998 

— set adaptation 881 

generative 

— AI model (AIGEN) 802 

-model 477 

genetic 

— drift 1079 

— fuzzy system (GFS) 320 

— pattern search (GPS) 1538 

— programming (GP) 845, 913, 
1071, 1162 

— repair 876 

genetic algorithm (GA) 757, 825, 
847, 873, 962, 967, 995, 1162, 
1178, 1204, 1215, 1258, 1259, 
1361, 1452, 1468, 1504, 1533 

— gradient (GAGRAD) 101 

—niched 963 

— optimization toolbox (GAOT) 
790 

Genetic and Evolutionary 
Computation Conference 
(GECCO) 861, 920 

genetic fuzzy 

—clustering 1452 

—neural network 1452 

— system (GFS) 1486 

genome-wide association study 
(GWAS) 1211 

genotype 1062 

— binary 1064 

— continuous 1065 

— integer 1064 

— real-valued 1064 

—type 1064 


geographical space 1350 




geometric 
—crossover 1068 
— program 1403 


German Aerospace Center (DLR) 
280 

gesture 708 

Gibbs sampling 478, 559 

glial cell 654 

global 

— communication 1413

— density 1441

— leader 1321

— optimum 1159

— supervised learning 482

— workspace theory (GWT) 673

globus pallidus (GPi) 671

GM3M index 228

goal-directed arm movement 732

Gödel logic 8

GP theory 858

g-protein coupled receptor (GPCR)
654

GPU computing 483

graded certainty 32

gradient 1415

grammar model-based program
evolution (GMPE) 914

grammar-based genetic programming
(GGP) 1214

grammatical evolution (GE) 850

granular computing (GC) 331


—dominance cone 353 
granularity 1118 

graph 1070 

— coloring 1239-1241, 1278 

— coloring problem (GCP) 1256 
—representation 849 

graphical model 545, 717, 906 


graphics processing unit (GPU) 
483, 1019, 1108, 1451, 1533 


grasping 1396 
gray matter (GM) 731
greedy 


— construction heuristic 1089 

— partition crossover GPX 1243 

— randomized adaptive search 
procedure (GRASP) 1086, 1094, 
1241 

— selection 988-990 

— strategy 1053 


green fluorescent protein (GFP) 
653 

grouping genetic algorithm (GGA) 
1243 

growth curve 938 

guaranteed possibility 36 

guided local search 1093 

Gustafson method 242 


H 
Hamacher t-norm 166 
Hammersley 555 
hand
— motion attribute 751
—movement 747


hardware 

— architecture 1297 

— description 1366 

— description language (HDL) 1457 

— implementation approach 1458 

— internal configuration access point 
(HWICAP) 1471 

H-bridge 1467 

health management (HM) 788 

Hebbian 706 


hesitant sets 107 


heteroazeotrope point 1167 
heterochronic change 710 
heterogeneous 619, 1410 

— island model 933 

— NN ensemble 487 

— parallel architecture 1113 
heuristic 263, 1261 

— space search 1076 


Heyting algebras 11 

hidden 

— layer 475 

— Markov model (HMM) 509, 566 
—state 567 


— tree Markov model 572 
hierarchical 
— BOA (hBOA) 906 


— fuzzy clustering 245 

— information processing 474 

—mixture 509 

— probabilistic incremental program 
evolution (H-PIPE) 914 

—selection 989 

—system 196 


high-dimensional problem 1486 

higher 

— frequency band (HFB) 742 

— level decomposition 126 

— order cognitive capability 703 

— order Markov model 571 

high-friction ground 700 

high-performance computing (HPC) 
814, 1457 

Hilbert space (HS) 588 

hill climber 1054 

hill climbing 1089 

— with learning (HCwL) 908 

h-implication 190 

hindbrain 730 

Hodgkin 657 

Hodgkin—Huxley (HH) 616, 657 

homogeneous 1410 

— island model 933 

— NN ensemble 487

— parallel architecture 1113

homoiconicity 846

honey bee 1363

hoop 22

hormone-based control 1414

hormone-inspired communication
and control 1300
host technology 1457 
Hotelling’s T² 785


Hough transform (HT) 507 
human 

— centered design 234 

— competitive result 848 

— motor system 728, 730 
— robot interaction 324 


human hand movement 735 

— kinematic feature 735 
h-uniform Ruspini partition 117 
Huxley 657 

hybrid 

— algorithm 1333 

— communication 1413 


— intelligent system 1150 

— learning strategy 474 

— metaheuristics 1050 

— system 772 

hybridization 918, 1050, 1263 

hyperbolic tangent transfer function 
479 

hyper-cube framework (HCF) 1335

hyper-heuristics 1048


hypervolume 
— indicator 986, 1009 
— measure 1005 
hypothesis 

— testing 1136 

— verification 344 


hysteresis thresholding 128 


I

iCub 705

ideal point 990 

idempotent negation 22 

identity 

— graded 21 

— principle (IP) 184 

IF fuzzy set 103 

ignorance function 102 

image 

— analysis 1509 

— compression 122 

— database 1512 

— fusion 125 

— processing 122, 446 

— producing 1514 

— recognition system 

— reconstruction 122 

immigration policy 938 

impasse 688 

imperfect 

— information 295 

— knowledge 331 

implication 

— function 428 

—R-function 14 

implicative rule 213 

implicit knowledge 695 

importance weight 304 

imprecise membership degree 301 

improve performance 1274 

improving Q-learning 512 

inclusion function 341 

increasing population size (IPOP) 
885 

incremental 919 

— Bayesian optimization algorithm 
(BOA) 917 

— univariate marginal distribution 
algorithm (IUMDA) 908 

— update 1090 


1509 


independence relationship 554 

independent 

— algorithmic level 1109 

— and identically distributed (i.i.d.) 
1133 

— component analysis (ICA) 504, 
547, 756 

— factor analysis (IFA) 503 

— identically distributed (i.i.d.) 578 

indicator based (IB) 1023 

— evolutionary algorithm (IBEA) 
989, 1009 

— selection 983—985, 987, 988 

indirect encoding 1273 

— advantage 1275 

— disadvantage 1275 

indirect representation 1061 

indiscernibility relation 334, 371 

indiscernibility-based rough set 
approach (IRSA) 351 

individual 873 

induced chromosome element 
exchanger (ICE) 915 

induction decision rule 359 

inductive programming (IP) 848 

industrial domain 787 

inference 556, 1502 

— engine (IE) 1465, 1473 

— graded 23 

— mechanism 195 

— system 207 

inferior parietal 

— lobe (IPL) 740 

— lobule (IPL) 748 

infinite-valued logics 11 

infomax principle 547 

information 

— fusion 50 

— modeling 298 

— retrieval 447 

— system 295, 334 

information theoretic 

— learning (ITL) 525 

— method (ITL) 545 

informed individual 1385 

inhibitory feedback 618 

inner measure 75 


innovization 1008, 1180 
— automated 1008 
input/output 

— block (IOB) 1459 


— hidden Markov model (IO-HMM) 
eres 

—scaling 206 

input-data 131 

input-dependent 488 

input-independent 488 

insertion mutation operator (ISM) 
1260 

instance selection 445 

insurance 798 

integral 

— time absolute error (ITAE) 292 

— transform 113 

integration theory 75 

integrator 719 


intellectual property (IP) 1457 
— interface (IPIF) 1467 
intelligence 251 

— center 729 

intelligent 

— controller (IC) 1451 


— distribution agent (IDA) 673 

— optimization 1312 

interaction 346, 908 

— between ontogenetic and 
phylogenetic 710 


— via communication 1299 
— via environment 1298 

— via sensing 1298 
interactive 

— analysis 1179 


— computation 343 

— granular computing (IGR) 343 

— optimization 990 

— rough granular computing 343 

inter-island evolution 933 

intermittent brain chaos 627 

internal 

— configuration access point (ICAP) 
1471 

—model 732 

— motor model 733 

—port 1466 

International Neuroinformatics 
Coordinating Facility (INCF) 
662 

internet of things (IoT) 813 

interneuron (IN) 742 

interpretability 219 

— accuracy trade-off 223 

— assessment 227 




— challenges 219 

— constraint 224 

— definition 234 

— design decision 232 

— fuzzy system design 229 

— improvement 1488 

— semantic-based 228 

— structural based 227 

interpretable fuzzy system 229 

interrupt controller (IC) 1466 

interruption 1440 

intersection and union of fuzzy sets 
161 

intersection operations 12 

inter-spike interval (ISI) 750 

interval T2FC (IT2FC) 1451 

interval T2FS (IT2FS) 1454 

interval type-2 fuzzy 

— inference system (IT2-FIS) 1453 

— logic controller (IT2-FLC) 286 

— PID (IT2-FPID) 285 

— set (IT2-FS) 286, 288 

—system 1509 

interval type-2 fuzzy PID controller 
285, 288, 291 

— design method 290 

— design parameter 290

— design strategy 289 

— inference 288 

— internal structure 

— rule structure 289 

— type reduction 288 

interval type-2 membership function 
(IT2-MF) 289 

interval valued fuzzy set (IFVS) 37, 
98, 301 

intracortical microelectrode 

intra-island evolution 933 

invariance 874, 885 

inverse 

— kinematics 735 

— model 734, 975 

— selection 831 

inverse F-transform 118 

— of a function of two variables 121 

inversion 

— formula 118 

— mutation operator (IVM) 

IP core 1466 

island model 932 

IT2FC coprocessor 


288 


TIJ 


1260 


1466 


iterated 

— density estimation evolutionary 
algorithm (IDEA) 899 

— descent 1089 

— greedy (IG) 1094 

— greedy algorithm 1094

— local search (ILS) 1095, 1263

iteration-level parallel model 1111

iterative

— algorithm with stop condition
(IASC) 1456

— flattening 1094

— improvement 1089

iteratively re-weighted least squares
(IRLS) 486


J


jamming gripper 699 

java 

— distributed evolutionary algorithms 
library (JDEAL) 1122

— multi-criteria and multi-attribute 
analysis framework (JMAF) 360 

Java memetic algorithms framework
(MAFRA) 1122 

Jensen inequality 561 

joint 

— normal distribution 912 

— posterior probability 570 

joint distribution 553 

— universal decomposition 554 

journal 920 

judgment

— adaptive 346

— intuitive 346

— rational 346

jump problem 838


junction tree 558 

K 
k nearest neighbor (KNN) 1216 
Karnik—Mendel (KM) 289, 1455 


—algorithm 1462 

—type reduction 289, 1453 
Karush—Kuhn—Tucker (KKT) 588
Kempe chain 1251

kernel 117

— adaptive filter (KAF) 523


— affine projection algorithm (KAPA) 
523, 536 


—design 599 
— Fisher 599 
—graph 599 


— least mean square (KLMS) 523 

— maximum correntropy (KMC) 
534 

— Mercer 597 

— positive definite 597 

— principal component analysis 
(KPCA) 519 

— recursive least square (KRLS) 
323, 333 

— reproducing 597 

— trick 524, 579

— universal 599 

Kessel method 242 

Khepera robot 701 

1079 

kinematic extrinsic primitive 

323 

knapsack problem 

knee point 1006 

knowledge 252 

— base (KB) 204, 229 

— extraction 216, 995 

— gradualness 339 

— granularity 339 

— representation 964 

knowledge discovery 339, 1211 

— and data mining (KDD) 339 

kriging 53 

— modeling 1322 

Kullback—Leibler (KL) 759 


Kimura 
736 
kinematics 
1051 


— divergence 546 

Kuramoto model 617 

Kursawe (KUR) 1001
L 

labor 1295 


Lagrange method 637 

landscape generator 1132 
Langevin equation 1428 

language 17 

— learning 705 

Laplacean indifference principle 47 




large 

— neighborhood search (LNS) 

—network simulation 660 

large-scale 484 

—model 660 

— spiking neural network 716 

large-step Markov chain 1095 

latent 

— Dirichlet allocation (LDA) 565 

— space representation 561 

— variable 563 

— variable model 560 

lateralized readiness potential (LRP) 
747 

latin hypercube sampling (LHS) 
1192, 1323 

lattice 

— operation 190 

—ordering 16 

—residuated 15 

— structure 92 

lattice-based 

—approach 95 

—definition 96 


1230 


law of importation (LI) 187 
layer-wise greedy 
— learning procedure 480 


— unsupervised learning 474 

layout optimization 1175 

lazy 

—learning 786 

—meta-learning 809 

leading ones 

—problem 833 

— trailing zeroes (LOTZ) 

leakage 1438 

leaky integrate-and-fire (LIFM) 
Tila 

learnable 488 

learning 569, 720, 919 

— algorithm 475

— bound 968 

— classifier system (LCS) 847, 961, 
1214 

— FDA (LFDA) 905 

— from examples module, version 1 
(LEM1) 373 

— from examples module, version 2 
(LEM2) 373 

— from examples using rough sets 
(LERS) 373 


1009 


— problem 568 

—stage 737 

least 

— absolute shrinkage and selection 
operator (LASSO) 585, 786 

— distorted scene 125 

— mean square (LMS) 525 

— square (LS) 527, 591

— square approximation 120 

— square SVM (LS-SVM) 590 

leave-one-out (LOO) 596 


— cross-validation (LOOCV) 1218

left neutrality principle (NP) 184 

legged 

— locomotion 700 

— robot 321, 700

Levenberg-Marquardt 482

Lewis 33 

light emitting diode (LED) 1301, 
1417 

likelihood weighting 559 

limited 

— discrepancy search (LDS) 1232 


—memory 881 


linear 
— combination scheme 489 
— convergence 886 


— discriminant analysis (LDA) 519, 
T37 

— factor analysis 

— GP (LGP) 851 

— matrix inequality (LMI) 618 

linear programming 131 

— fuzzy 135 

linear-quadratic regulator (LQR) 
322 

linguistic 

— expression 299 

— fuzzy modeling (LFM) 223, 230 

— modifier 231 

— quantifier 304 

— term (LT) 1473 

— variable (LV) 204, 1473 

— variableovariable (LV) 1473 

linguistic-variable-term (LVT) 
1473 

Lin-Kernighan algorithm 1091 

liquid state machine (LSM) 464 

liquid-liquid equilibrium (LLE) 
1161 

LIST PARSE 673 


562 


local 

— communication 

— covering 373 

— distribution 906 

— factor analysis (LFA) 507 

— field potential (LFP) 625, 748 

— learning 775 

128 

— minimum 483 

— modeling 775 

— network (LAN) 

— optimum 1159 

— regression 505 

— rule 1415

— subspace 507 

local search (LS) 

— algorithm 1264 

— method 1052 

— operator 1066, 1067 

locality 1072, 1075, 1077 

— influence 1074 

locally-weighted projection 
regression algorithm (LWPR) 
970 

locational value (LOCVAL) 802 

locked-in syndrome 762 

locomotion 1413 

logic 254 

— based on RS 332 

— block (LB) 1459 

— classical 160 

— first-order fuzzy 20 

— flea 23 

— for data analysis (DAL) 421 

— product 11 

— programming (LP) 41, 1225 

— rational Pavelka 24 


1412 


— maximum 


1107 


1054, 1220, 1264 


— R-fuzzy 14 

— t-norm based 13 
—two-valued 160 
logical setting 420 
logics 

— fuzzy 7 

— Gödel 8 

— Łukasiewicz 8, 9
— many valued 7, 8
— Post 8, 10


logistic regression (LR) 786 
logit function 549 
LOLZ 945 


look-up table (LUT) 1471 




loosely coupled 

— MNN 484

—model 487 

Lorenz time series 538 

loss function 989 

lost value 378 

Lovász extension 69 

lower 

— approximation 334, 375 

— frequency band (LFB) 742 

— membership function (LMF) 
1454 

—runtime bounds 887 

low-level programming 852 

Łukasiewicz t-norm 162

—conjunction 9 

— disjunction 9 

— infinite valued 11 

—logics 9 

— multivalued logic 91 

— t-norm 69


M 


machine 1175 


— learning (ML) 52, 340, 444, 474,
545, 638, 783, 845

— to-machine (M2M) 813 

machining process 1174 

Mackey-Glass 538 

magnetic resonance imaging (MRI) 
738 

magnetoencephalogram (MEG) 
625 

magnetoencephalography (MEG) 
600, 727 

Main European Events on 
Evolutionary Computation 
(EvoStar) 920 

main vector adaptation 881 

majority 

— choice 1427 

—vote 487 

MAM model 1350 

Mamdani fuzzy system 222 

Mamdani—Assilian 

—control 271 

—rule evaluation 273 

MAMP analysis 1139 

management task 1295 


manifold learning 507 

manipulation 1402 

— reconfigurable robot 

manipulator 699 

man—machine learning dilemma 
(MMLD) 758 

many-objective 

— optimization problem (MOP) 
1033 

—problem 1006 

many-valued logic 

mapping 

—genotype-phenotype 1062 

—phenotype-fitness 1062 

marginal product model (MPM) 
905 

Markov 

—chain 478, 566 

—chain analysis 858 

— decision process (MDP) 510 

—model 545, 565 

—network 552 

— network EDA (MN-EDA) 911 

Markovian agent (MA) 1345 

— model (MAM) 1345 

mask-based crossover 837 

massively parallel machine (MPP) 
1113 

master equation 


1416 


8, 41 


1424 


master/slave (MS) 1020 

— model 931 

— MOEA (msMOEA) 1019 
matching function 197 
mathematical 


— fuzzy logic 18 

— morphology (MM) 197 

— reasoning 252 

matheuristic 1098 

mating selection 873, 983 
matrix inverse lemma 537 
matrix-based learning 507 
maximal 

— margin classifier 587 

— specificity 44 

maximization step 562 
maximum 

— a posteriori (MAP) 547, 585 
— cardinality search (MCS) 1266 
— correntropy criterion 534 

— entropy 874 

— likelihood (ML) 486, 546 


— ranking (MR) 1039 

— uniform temperature 

— value (MV) 208 

MAX-MIN ant system (MMAS) 
1263, 1335 

max-set of Gaussian landscape 
generator (MSG) 1129 

mean 

— absolute error (MAE) 812 

— interval 49 

— percentage error 

— square error (MSE) 

mean of maxima (MOM) 208 

— method 272 

measurable space 84 

mechatronics system integration 
1409 

medial temporal (MT) 752 

median operator 64 

medical reasoning 254 

membership 1462 

— function (MF) 134, 160, 204, 
297, 338, 1453 

—set 98 

memetic algorithm (MA) 
1056, 1097 

— mutation 


1372 


1196 


1048, 


1053 

— recombination 1053 
memory block 1466 
memristive neurodynamic 629 
memristor-based 

— neural network (MNN) 632 


— recurrent neural network (MRNN) 


633 
—synapse 630 
Mendel 1062 


Mercer theorem 598 

message passing interface (MPI) 
1025, 1115 

messenger RNA (mRNA) 849 

meta-cognitive subsystem (MCS) 
690 

metaheuristic (MH) 784, 1088, 
1253, 1333 


— classification 1048 
— hybridization 1048 
— technique 1258 


meta-model 784 


metamorphic robot 1410 


126, 482, 527 




meta-optimizing semantic 
evolutionary search (MOSES) 
914 

method of f-based partitions 950 


metric 1063 

—city-block 1063 
—Euclidean 1063 
—Hamming 1063 


Metropolis condition 1092 

Michigan approach 963, 1453 

microarray 1211 

microbial genetic algorithm 832 

microblaze 1458 

micro-GA 1001 

micro-robot 1302 

midbrain 730 

migration 932 

— frequency 932 

— interval 932 

—size 932 

migration topology 932 

— bidirectional ring 932 

—complete graph 932 

—grid graph 932 

—hypercube 932 

—random graph 932 

— scale-free graph 932 

—torus graph 932 

— unidirectional ring 932

min-degree heuristic 1262 

min-fill heuristic 1262 

mini-batch 482 

minimal 

—commitment assumption 49 

—complex 373 

— epistemic logic (MEL) 38 

— redundancy maximal relevance 
(mRMR) 1216 

— specificity 44 

— subset 988 

minimum 

— description length (MDL) 336, 
506, 910, 1214 

— jerk hypothesis 735 

—jerk model 736 

—length principle 339 

— t-norm 162

— weight rooted arborescence 
(MWRA) 1333, 1334 


minor 

— component analysis (MCA) 504 
— subspace analysis (MSA) 504 
missing information 300 


mixed 
— Bayesian optimization algorithm 
(mBOA) 906 


— effects model 1138 

mixture 911 

—model 563 

— of experts (MoE) 485, 672 

MLEM2 (modified LEM2 algorithm) 
ait 


Möbius transform 79 


modal 

— basis 1200 
— logic 38 
model 250 


— assumption validation 1136 

— based multiobjective evolutionary 
algorithm (MMEA) 920 

— building 1134 

— ensemble 783 

— ensembles fusion 807 

— generation 1141 

— lifecycle 786 

— of bipolarity 304 

—training 1194 

modeling 250, 1177 

modification function 197 

modified duality 139 

modular 

— neural network (MNN) 473, 1513 

— selection and identification for 
control (MOSAIC) 669, 672 

modular robot 1297 

— reconfigurable robot 

modularity 473, 860 

modularization 484 

modulus of continuity 119 

modus ponens 188 

molecule 1409 

momentum adaptation 881 

monoid 15 

monolithic 484 

monotone measure 75 

Moore neighborhood 935 

moral graph 557 

morphological computation 698 

morphological gradient (MG) 1511 

mortgage collateral valuation 801 


1407 


most probable 

—assignment 556 

— explanation (MPE) 556 

motion 

—execution 732 

— planning 732 

— primitive 736 

motivational subsystem (MS) 690 

motor 

— adaptation study 735 

— command 733 

— cortex (M1) 666, 730 

— imagery 728 

— observation 740 

— population vector 751 

— synergy 668 

motor control 666, 732 

— computational model 669 

motor learning 

— explicit 737 

— implicit 737 

— time phases 737 

motor system 

— electric field 745 

— extra cerebral magnetic field 745 

— function 738 

— structure 738 

movement 

— cuing 738 

— related cortical potentials (MRCP) 
747 

— related potential 747 

—time (MOT) 752 

moving target tracking 318 

MTL-algebras 19 

M-TRAN 1410 

multi-agent interaction 693 

multicore processor 1458 

multi-criteria choice and ranking 

— decision rule 365 

multi-criteria classification 353, 
361 

multi-criteria decision 

— aiding (MCDA) 349 

— analysis (MCDA) 1006 

— problem 349 

multi-deme model 932 

multidimensional scaling (MDS) 
786 

multifingered dexterous manipulator 
699 




multi-graded dominance 362 

multi-input single-output (MISO) 
195 

multi-instance learning (MIL) 500 

multi-instance, multi-label learning 
(MIML) 500 

multi-label classification 499 

multilayer perceptron (MLP) 474, 
549 

multilayered feed-forward ANN 
model 458 

multi-level model 1055 

multimemetic algorithm (MMA) 
1054 

multimodal optimization 

multiobjective 

— approach 1491, 1492 

— cost function 479 

— evolutionary (MOE) strategy 232 

— evolutionary algorithm (MOEA) 
1017, 1033, 1215 

— evolutionary algorithm (MOEA) 
based on decomposition 
(MOEA/D) 1043, 1492 

— messy GA (MOMGA) 

— optimization (MOO) 
1164, 1313, 1314 

— particle swarm optimization 
(MOPSO) 1311 

— problem (MOP) 

— strategy 223 

multiobjective genetic 

— algorithm (MOGA) 1000, 1041 

— fuzzy system (MoGFS) 1479 

multiobjective optimization (MOO) 

— problem (MOP) 996 

multiobjectivization 1160 

— assisted multimodal optimization 
(MOAMO) 1162 

multiple 

— algorithms and multiple problem 
instances (MAMP) 1133

610 

— classifier system 491 

— inputs-single output (MISO) 205, 
1501 

— kernel learning (MKL) 

— linear regression (MLR) 

—set point 1369 


1162 


1001 
839, 995, 


1313 


— attractor 


585 
1218 


multiple criteria decision 

— aiding (MCDA) 398 

— making (MCDM) 997, 998 

multiplicative approximation 986 

multiprocessor system (MPS) 1458 

multirecombination 875 

multi-response linear regression 
(MLR) 517 

multi-robot system 1297 

multi-task learning (MTL) 501, 585 

multivalued logic 91 

multivariate 

— adaptive regression splines 
(MARS) 785 

— interaction 910 

multi-view 489 


— learning 500 

— NN ensemble 489 
multizone temperature 1365 
music 856 

mutation 826, 849, 930 

— operator 876, 1259 

— rate 833


mutation-based search locality 
1075 

mutual information 548 

— maximizing input clustering 
(MIMIC) 905 


N

nano-robot 1302

naive 

— Bayes model 553 

— Bayesian rough set model 395 

National Aeronautics and Space 
Administration (NASA) 856 

natural 

— classifier 240 

— evolution strategy (NES) 882 

natural gradient 884 

—update 884 

navigation 315, 702 

n-dimensional fuzzy sets 106 

nearest neighbor heuristic 1087 

necessary conclusion 251 

necessity function 34 

negation 161, 190 

— fuzzy implications 167 


— idempotent 22 


negative 

— log-likelihood 478 

— matrix and tensor factorization 
(NMF) 547 

— pieces of information 37 

— slope (NS) 747 

negative correlation learning (NCL) 
485 

— algorithm 487 

neighborhood 

— model 934

— pruning 1090 

neocortex 717 

nervous system 649 

— anatomy 649 

— physiology 649

nest-building 1296 

net present value (NPV) 800 

network 


— connectionist 695 

— of workstation (NOW) 1108, 
1113 

—routing 638 

neural 


—coding schemes 749 

— computation (NC) 473 

—computing 716 

— information processing system 
(NIPS) 495 

neural input activity 

— high-frequency content 749 

— low-frequency content 749 

neural network (NN) 446, 486, 
GLL 719, 751, 771, 812, 1173, 
1215, 1486 

— ensemble 487 

neurocomputational model 674

neurocomputer 716

neurocomputing (NC) 716, 784

neurocube framework 778

neurodynamic

— model 635

— optimization 634

neuroengineering 727

neuro-evolution of augmenting
topologies (NEAT) 850

neuro-fuzzy

— inference system (NFI) 775

— system 1483

neuromorphic

— architecture 718


—computing 716 

—engineering 715 

neuron 650 

— ascending 731 

— circuit 719 

— descending 731 

— electrical properties 

—motor 731 

neuronal nuclei antibody (NeuN) 
653 

neuroprosthesis 728 

neurorehabilitation 727 

neuroscience 473, 717

neutral special implication 191 

niching 999, 1160 

nilpotent 

— minimum 162 

— t-norm 164

no free lunch (NFL) 1048 

— theorem (NFL) 860, 945, 1048, 
1160 

no interaction 908 

node histogram based sampling 
algorithm (NHBSA) 915 

noise 

— cancellation 540 

— reduction (NR) 540 

— removal 119 

non-action-centered subsystem 
(NACS) 690 

non-convex optimization 474, 487 

non-deterministic 

— information logic (NIL) 421 

— polynomial-time (NP) 1071 

nondominated 

— front 997 

— solution 989 

— sorting 989 

— sorting genetic algorithm 
(NSGA-II) 984, 990, 1000, 1040, 
1164, 1315, 1492 

— sorting particle swarm optimization 
(NSPSO) 1315 

non-Gaussian factor analysis (NFA) 
505 

non-learnable 

nonlinear 

— dynamical systems (NDS) 607 

— principal component (NLPCA) 
789 

nonlocal adaptation 880 


656 


488 


nonmaximum suppression 128 

nonmonotonic 

— inference 39 

— reasoning 31 

nonnegative matrix factorization 
(NMF) 507, 508 

non-saturating fuzzy implication 
188 

nonuniform rational B-splines 
(NURBS) 1176 

norm 

— triangular 12 

normal distribution 911 

— multivariate 876 

normalized LMS (NLMS) 528 


normed space of square-integrable 
function 121 


novelty criterion (NC) 525 
nuclear norm 585 

number 

— comparison 708 

706 

— learning 706 


— concept 


numerical 
— attribute 377 


— control (NC) 1174 
O

object 

— closure 1400 


— detection 505 
— manipulation 699 
1401 
— transportation 


— sorting 
1395 
objective 

— probability 47 

984 

984 

objective function 984, 1131 
1131 


— space 
— vector 


— randomly generated 
1166 
obstacle avoidance 


observation 
316 
offspring 873 

— population 945 


on-chip peripheral bus (OPB) 1468,
1471


one single algorithm 

— and multiple problems (SAMP) 
1133 

— and one single problem (SASP) 
1133 

one-level decomposition 

OneMax 943 

— problem 834 

one-point crossover 

one-stage optimization 

online 

— adaptation 1439 

— analytical processing (OLAP) 
303 

— kernel learning (OKL) 524 

—learning 978 

ontogenetic phenomena 
(development, learning, 
maturation) 710 

ontology 50 

open programming language (OPL) 
1229 

open-ended 

— cumulative learning 710 

—evolution 860 

— representation 860 

operations research (OR) 366, 787, 
1050, 1225 

operator 688 

optimal 

— collaboration rate 


129 


836 
1282 


1425 

— control strategy 733 

— design 1175 

— fusion rule 488 

—region 132 

988 

—solution 132, 1062 

— solution of FLP 137 

570 

— states problem 568 

— tuning function 751 

optimality 135 

—method 637 

optimistic possibilistic criterion 45 

optimization 67, 1178, 1295, 1505, 
1533 

— algorithm 

524 

— of type-2 fuzzy 

— principle 734 


— set 


— state 


1319 
— error 


1503 




— result 1325 

— time 943

optimization problem 1128, 1362 
— artificial test function 1129 

— feature extraction 1128 

— instance generation 1128 

— natural problem class 1129 

— problem class 1128 

— problem instance 1128 

order crossover (OX1) 1260 


order-based crossover (OX2) 1260 
ordered modular average (OMA) 
65 
ordered weighted 
— average (OWA) 63 
— maximum (OWMax) 65 
ordering property (OP) 184 
ordinal classification 
—problem 349 
— with monotonicity constraints 
392 
ordinal representation 1075 
ordinary differential equation (ODE) 
623 
original equipment manufacturer 
(OEM) 787 
ORL database 
oscillation 611 
oscillatory network 616 
outer measure 75 
over-complete representation 475 
overfitting 498, 1277 
overlapping subspace 
overshoot (OS) 292 
overview area 1505 


1513 


490 


P


pairwise interaction 909 
P-algebra 11 

parallel 

— algorithm 1107 

— and distributed GP (PDGP) 849 
— design 1108 

— evolutionary algorithm 
— genetic algorithm 839 
— implementation 1113 
— metaheuristic 1107 

— model (PM) 1020, 1109 
— platform (PP) 1020 


1107 


— problem solving in nature (PPSN) 
920 

— programming 1114 

parallelization 855, 917 

parameter 

—control 873 

—exogenous 1131 

— projection 1439 

—setting 1338 

parameterization method 

parameterized 

— dominance-based rough set 401 

—rough set 392 

parametric model 477 

parental centroid 874 

parents 873 

Pareto 

— archived evolution strategy (PAES) 
1000, 1001 

— compliance 

— dominance 


1317 


986 

44, 984, 1163 

— envelope based selection algorithm 
(PESA) 1001 

— front computation (PFC) 1020 

— optimal 984, 985, 990, 997 

— set approximation 985, 990 

— sorting evolutionary algorithm 
(PSEA) 805 

parietal cortex 730 

parity judgment 708 

Parkinson disease (PD) 738 

parsimonious behavioral strategy 
701 

partial 

— differential equation (PDE) 620, 
1428 

— least square (PLS) 1218 

— mutual information (PMI) 757 

— ordering 40, 996 

— reconfiguration (PR) 

partially reconfigurable 

— module (PRM) 1471 

— region (PRR) 1471

partially-mapped crossover (PMX) 
1053, 1259 

particle 

— swarm optimization (PSO) 852, 
1052, 1179, 1292, 1333, 1369, 
1493, 1517, 1533 

— swarm-based EMO 


1470 


1001 


particle swarm optimization (PSO) 
T37 

partition 944 

— based operator 

— function 555 

— into fuzzy sets 120 


1242 


—of-r 116 
— of-unity 115 
— problem 1239 


partitioning-based strategy 489, 
491 

path 

— following 318 


— length control 880 
—relinking 1097 

— selection 1426 
pattern 


— based clustering 507 

—classifier 318 

—recognition 1509 

— search (PS) 1533 

Pavelka-style 23 

Pawlak rough set 388 

peak signal-to-noise ratio (PSNR) 
123 

penalized empirical risk 
minimization 584 

penalty method 636 

perception 314, 345, 456, 1351 

— based computing (PBC) 343 


performance 1141 
— assessment 986 
—comparison 292 
—evaluation 1116 
— indicator 1275 
— metrics 1009 
peripheral 


— interface controller (PIC) 1366 

— nervous system (PNS) 650 

peri-stimulus-time histogram (PSTH) 
750 

permutation 1071 

— problem 915 

persistent vegetative state (PVS) 
762 

perspective 1123 

PERT network 52 

perturbation 1264 

perturbed chain statistical associating 
fluid theory (PC-SAFT) 1161 

pessimistic possibilistic criterion 45 




phase 

—diagram 1165 

— equilibrium calculation problem 
1161 

— equilibrium problem 

—lock value (PLV) 614 

—space 608 

— synchronization 613 

—transition 1279 

phenotype 1062 

pheromone 1262 

—deposition 1262 

— evaporation 1263 

—trail 1261 

—update 1336 

phylogenetic (evolutionary) factor 
710 


1165 


physical modeling 1178 
PID controller 285 

Pitt approach 963 
Pittsburgh approach 1453 
pivoting rule 1090 

— best improvement 1090 
— first improvement 1090 


place and route (PAR) 1471 

planning 319 

— domain definition language 
(PDDL) 1277 

plant 733 

plastic agent 703 

plausibility functions 

plus strategy 827 

POD model quality 

point 

— method 1319

—screening 1323 

pointing 708 

point-to-point arm movement 

pointwise approximation 118 

pollination 932 

polynomial-time randomized 
approximation scheme 944 

pooling 483 

population 825, 852, 873, 929, 985, 
1068 

— coding 667, 749 

— diversity 827, 838 

— model 852

— vector hypothesis 752 

population-based incremental 
learning (PBIL) 904 


437 


1199 


T39 


port map 1465 

portfolio optimization 783 

position-based crossover (POS) 
1260 

positive 

— boundary, negative regions 

— definite kernel 597 

— feedback 1379 

— pieces of information 37 

—region 337, 427 

positron emission tomography (PET) 
742 

possibilistic 

— approach 307 

— belief network 42 

— clustering 242 

— fusion 51 

— logic 40 

possibility 

— distribution 33-36, 297 

— function 34 

— measure 53 

possibility theory 31, 297 

— basic notion 33 


390 


possible 

— region 427 
— rule 376
post 

— logics 8 
—system 10 


posteriori approach 998 
post-processing adaptation 759 


posture 705 
potential function 555 
power 


—consumption 716 

— set algebra 12

— spectral density (PSD) 741 
precise fuzzy modeling (PFM) 223 


prediction 1137 
— interval 48 
predictive 

— analytics 1144 


— sparse decomposition (PSD) 479 

preexperimental planning 1165 

preference 984 

— information 990 

— information pairwise comparison 
table 361 

— order relation (POR) 

— query 49 


1039 


— refinement 986

— relation 984-987

— relation rough approximation 
362, 364 

— representation 43 

— statement 44 

prefrontal cortex (PFC) 666 

prelinearity condition 17 

premotor cortex (PMC) 666 

preorder 985 

pre-processing 755 

pre-training 483 

previous sensory state 

primal problem 132 

primal-dual estimation 579 

primary membership 1455 

prime implicant 339 

primitive connectives 17 

primitives’ modules 736 

principal component analysis (PCA) 
504, 592, 756, 786, 1007, 1188 

principal subspace analysis (PSA) 
503 

principal value (PV) 613 

principle of minimum redundancy 


709 


1241 
prior 
—knowledge 919 
— probability 566 
probabilistic 


— classifier vector machine (PCVM) 
aoe 

—computing (PC) 784 

—implication 192 

— incremental program evolution 
(PIPE) 914 

— iterative improvement (PII) 1091 

— latent semantic analysis (pLSA) 


563 
—method 545 
— model 548 


— model-building genetic algorithm 
(PMBGA) 899 

— modeling 545 

— PCA 563

— rough set 

— set 106 

— sum 163

— value 149 

probability 1351 

— amplification 931 


387 




— density function (pdf) 551 

— measure 47 

— possibility transformation 47 

— theory 31 

probably approximately correct 
(PAC) 809 

problem 

— analysis 1150 

— decomposition 904, 1068 

— definition 900 

— difficulty 859 

— instances 1132 

—job shop 1076 

problem-space 

— computational model 688 

— search 1051 

procedural memory 686 

process monitoring 1179 

processing 

— perception 220 

— structure 206 


processor local bus (PLB) 1467 
product 

—logic 9,11 

— of experts (PoE) 491 

— t-norm 162 


production compilation 687 

profitability landscape 1363 

prognostic 788 

— health management (PHM) 783 

program 

— distribution estimation with 
grammar model (PRODIGY) 905 

— evaluation and review technique 
(PERT) 52 

— evolution explicit learning (PEEL) 
914 

— memory 1473

programmable logic 

— array (PLA) 1453

— cell 1459 

programming by optimization (PbO)
1100 

progress rate 887 

— sphere function 889 

projection operator 21 

proof by contradiction 194 

propagation 941 

proper orthogonal decomposition 
(POD) 1185 


proportional 

— differential (PD) 1453 

— integral, and derivative (PID) 
components 270 

— integral-derivative (PID) 269, 
285, 317 

— selection 829

pulse width modulation (PWM) 
1366 

pure crossover 838 

pursuit-tracking task (PTT) 751 

pushing 1396 

pyramidal neuron 742 


Q 


Q-learning 511, 965 
QL-operation 185 


Q-Q plot 1140 
quad tree algorithm 124 
quadratic 


— assignment problem (QAP) 900 
— information potential (QIP) 525 
— programming (QP) 587 
qualitative 

— possibility theory 38 
— uncertainty 50
quantile-quantile (Q-Q) 1136 
quantitative possibility theory 45 
quantized KLMS (QKLMS) 533 
quantum-inspired 

— eSNN 779

— optimization 779 

quasiconcave function 134 
— semistrictly 134
quasi-possibilistic logic 40 
query 296 

querying 302 


quorum sensing 1389 
R 
Radcliffe 1065 


radial basis function (RBF) 280, 
489, 507, 525, 773, 1191 

— ANN model 464 

radial membership function 117 


radio frequency identification (RFID)
319


random 

— access memory (RAM) 
1457 

— binomial variable 833 

— decision 1424 

— forest 809 

— generation number (RGN) 

— replacement 831 

— subspace 809 

— walk 1386 

random-effects 

— design 1134 

— experiment 

— model 1135 

randomized 

— Hough transform (RHT) 507 

— iterative improvement (RII) 1091 

— linear programming (RLP) 805 

— local search 948 


1118, 


1469 


1134 


randomness 1424 
rank selection 830 
ranking 


— function theory 41 
— problem 349 


rate 
— code 749
— coding 749


rational analysis 685 

reaction time (RT) 732 

reactive search 1100 

read only memory (ROM) 

readability 221 

readiness potential (RP) 747 

reading direction 708 

real time (RT) 1454 

real time recurrent learning (RTRL) 
462 

real-coded BOA (rBOA) 905 

realization 478 

real-value 

— EDA 912

— encoding 1468 

— vector 911 

reasoning 203, 251 

— common sense 344 

— deductive 344 

— inductive 344 

— nonmonotonic 345 

recombination 875, 930, 1239 

— discrete 877 

— role of 875


1453 




recombination operator 1068, 1069 

— design guideline 1068 

reconfigurable robot 1407 

— chain-type 1409

— hybrid 1410

— lattice-type 1409

— motivation 1407

— programmability

— simulation 1417

reconstruction cost function 475 

recruitment 1416 

recurrent neural network (RNN) 
455, 756 

recursive 

— credit propagation 490 

— least square (RLS) 525, 970 

— neural network (RecNN) 455 

red fluorescent protein (RFP) 653 

reduced instruction set computer
(RISC) 1457 

reduced-order model (ROM) 

reduction 1150 

— stage 1464

redundancy 1076 

— definition 1078 

— locality 1077

— non-synonymous 1076

— non-uniform 1078 

— order 1078 

— synonymous 

— uniform 1078 

redundant representation 

reference direction 1006 

reference point 990, 1006, 1319 

— minimization of distance 1490 

refinement 986-989 

region 1320 

— of interest (ROI) 1042 

register transfer logic (RTL) 

regression 497, 962, 1216 

regularity model based multiobjective 
EDA (RM-MEDA) 920 

regularization 480, 583 

— network (RN) 538 

regularized 

— auto-encoder 480 

— multi-task learning (RMTL) 501 

— Siamese DNN 483 

regulator 734 

rehabilitation robot 322 

reinforced learning 496 


1417 


1188 


1076 


1076 


1460 


reinforcement learning (RL) 321, 
510, 670, 786, 854, 961, 1300 

rejection sampling 559 

related model 571 

a-relation on R 136

136 

relation schema 297 

relational 

— algebra 297 

297 

— database 296 

— genetic algorithm learner (REGAL) 
1453 

relative margin classifier 

relaxation of constraints 


— strict 


— calculus 


589 

1282 

relevancy transformation (RET) 71 

remaining useful life (RUL) 791 

remote 

— method invocation (RMI) 1115 

— procedure call (RPC) 1115 

replacement method 826 

representable aggregation function 
(RAF) 189 

representation 475, 849, 1061 

— benefit 1064 

— benefits 1064 

— learning 483 

— locality 1073 

— messy 1065

— problem-specific 

— redundant 1076

— variable-length 857 

reproducibility 733 

reproducing kernel 526, 598 

— Hilbert space (RKHS) 523, 578, 
597 


1075 


research 

— challenge 1302 
— direction 1097 
— resource 861 

— software 861 
reset (rst) 1461 


residuated lattice 15 
residuation operator 14 


resource allocation 1365 
respect 1066 
respectful crossover 837 


response threshold model 1299 
restricted Boltzmann machine 
(RBM) 477 


restricted maximum likelihood 
1136 

— estimator (REML) 

RET operator 71 

revision rule 38 

Reynolds-averaged Navier-Stokes 
(RANS) 1191, 1311 

R-fuzzy logics 14 

ribonucleic acid (RNA) 

ridge function 891 

R-implication 14, 185 

risk 

— analysis 52 

— management 783 

rival penalized competitive learning 
(RPCL) 506 

robot 

— arm 975

— control 848 

— operating system (ROS) 

robotic 

— manipulator 

— system 638

robust evolving cloud-based 
controller (RECCo) 1435 

robustness 592, 1412 

role-based control 1414 

root-mean-square error (RMSE) 
123 

— of prediction (RMSEP) 1217 

rough approximation preference 
relation 362 

rough membership 

— dominance-based 355-357 

— function 338 

rough mereology 342 

rough set (RS) 331 

— approximation 388 

— binary relation-based 415 

— covering-based 416 

— data inconsistency 350 

— dominance-based 351 

— indiscernibility-based 351 

— logic 421 

— methodology 349 

— probabilistic 355 

— stochastic dominance-based 358 

— theory 333 

— topology 422 

— variable consistency 355 


1136 


849 


1418 


316 




roulette wheel 

— algorithm 829

— selection (RWS) 

routing 1349 

ruin-and-recreate 

rule 371, 1501 

— base (RB) 205 

— consequent 1439

— consistent with the data set 372 

— constant introduction 24 

— induction 371 

— set complete for a data set 372 

— specificity 375 

— strength 375 

— support 375 

— utility 686 

rule-based classification algorithm 
360 

runtime 887 

— complexity 859 


S 


1215 


1094 


sampling 
— algorithm 558 
— method 992


satisfiability (SAT) 1088, 1277 

satisfiable standard 26 

s-bot 1398 

scalability 1119, 1412 

scalable 484 

scalarizing function 990, 1035, 
1489 

scaling up 710 

scatter search 1054, 1097 

scheduling 1119 

schema-based analysis 858 

school (SCH) 1282 

schooling 1385 

scientific computing 638 

score variable 592 

scramble mutation operator (SM) 
1260 

search 

— path 880

— preference 990 

search method 

1087 

1087 


— constructive 
— perturbative 


search operator 1065 

— bias 1078

— design guideline

— graph 1070

— high locality 1073 

— permutation 1071 

— standard 1070 

— subset 1070

— tree 1071

search space 1063 

— metric 1063

search-based software engineering 
855 

secondary membership 1455 

second-level decomposition 125 

selected application 638 

selection 930 

— method 826

— noise 879

self-adaptation 879 

self-assembly 1298, 1399 

self-desynchronized croaking 

self-modifying code 852

self-organization 1291 

self-organized 

— behavior 1302 

— criticality (SOC) 623 

— feature map (SOFM) 669 

self-organized aggregation 

— biological system 1379 

— swarm robotics 1380 

self-organizing 473 

— fuzzy neural network (SOFNN) 
757 

— map (SOM) 465, 705, 773, 1180, 
1326 

self-reconfigurable robot 

— system 1297

self-reconfiguration 1414 

self-regularization 529 

— property 535 

self-replicating machine 1297 

self-supervised learning 475 

semantic 10 

— cointension 221

— cointension-based index 228 

— web 50

semi-supervised 

— learning 480, 513 

— support vector machine (S3VM) 
514 


1065, 1066 


1296 


1408 


sensation 651 

sensing 1298 

sensorimotor 

— area (SMA) 746 

— learning 736 

— rhythm (SMR) 740, 757 

— unsupervised, 
redundancy-resolving control 
architecture (SURE-REACH) 
669 

sensorimotor-computer interface
763 

sensory 

— data 854

— evaluation workflow 

— information 701 

sensory motor 

— coordination 698 

— oscillation 727 

— rhythm 727

sequence 

— learning 668, 737

— transduction 572 

sequential 

— backward selection (SBS) 1217 

— forward selection (SFS) 1217 

— minimal optimization (SMO)
590 

— parameter optimization toolbox 
(SPOT) 1141 

serial reaction time (SRT) 737 

set 

— definable 371 

— elementary 371 

— indicator 983-986

set preference 984 

— algorithm for multiobjective 
optimization (SPAM) 987 

— relation 984, 989

set-based optimization 985 

settling time 292 

Shackle 32 

shadowed set 100 

shallow architecture 474 

shape parameterization 1317 

Shapley value 147, 153 

shared 


1152 


— memory 1113

— parallel architecture 
shelter selection 1426 
shorter term feedback 709 




SI optimization 1292 

SI robotic 1296 

sigma scaling 829 

sigmoid 458 

signal/image processing 638 

signaling 652 

— medium 654

signal-noise-ratio (SNR) 540 

silicon 

— cochlea 716

— neuron 716, 720

— synapse 719

similarity 

— graph 507

— relation 341

similarity based 

— approach 307 

— reasoning (SBR) 195, 691 

— rough set approach 343 

simple 

— ant colony optimization (S-ACO) 
1504 

— evolutionary multi-objective 
optimizer 1009 

— F-transform-based fusion 
algorithm (SA) 126 

— recurrent network (SRN) 461 

simple-inversion mutation operator 
(SIM) 1260 

S-implication 167 

simplified value at risk (SVaR) 805 

simulated 

— annealing (SA) 1052, 1092, 1533 

— binary crossover (SBX) 1001 

— evolution and learning (SEAL) 
920 

simulation 902, 1177 

— study 291, 1442

simultaneous localization and 
mapping (SLAM) 320 

single 

— attractor 609

— input single output (SISO) 195 

— nucleotide polymorphism (SNP) 
1211 

— source shortest path problem 
(SSSP) 954 

single unit (SU) 727 

— activity correlate 751 


single-objective 

— approach 1489 

— genetic algorithm (SOGA) 1203 

single-peak normal distribution 911 

single-receiver model 948 

single-source shortest path problem 
(SSSP) 942 

singleton approximation 380 

singular value decomposition (SVD) 
600, 1191 

situatedness 697 

slack variable 587 

sliding window KRLS (SW-KRLS) 
536 

slow learning 737 

smart cube 1402 

S-metric selection evolutionary 
multiobjective algorithm 
(SMS-EMOA) 1164, 1492 

— parameter 1168 

smoothing problem 569 

SMR-BCI control 740 

SNARC effect 706 

Sobel edge detector 

— fuzzy logic 1510 

social 

— amplification 1427 

— attraction 1427 

— odometry 1389

soft 

— competition 488

— computing (SC) 298, 784

— robotics 709 

softmax 485 

software (SW) 920, 1453 

— engineering 854

solution 984 

— component 

— construction 


1510 


1086 

1336 
— level parallel model 
sorites paradox 333
sorting 953 


1112 


source 
— code 978
— of imperfection 299 
SPAM 988 


sparse auto-encoder (SAE) 480 
sparse coding 479 

— neural gas (SCNG) 547
sparsification criterion 532 


spatial 
— accuracy 733 
— distribution 1424 


— representation 709 

spatial-numerical association of 
response codes 706 

spatiotemporal oscillation 619 

special fuzzy implication 188 

specificity 34 

spectral regularization 585 

speed-accuracy tradeoff 1388 

speedup 936 

— absolute speedup 936 

— efficiency 937 

— incremental efficiency 937 

— linear speedup 936 

— orthodox weak speedup 936 

— relative speedup 936

— single machine/panmixia 

— strong speedup 936 

— sublinear speedup 936

— superlinear speedup 936

— weak speedup 936 

sphere function 890 

spike 716 

— based learning 720 

— based learning circuit 

— count rate 750

— pattern association neuron (SPAN) 

— response model (SRM) 662, 775 

spike-timing dependent 

— learning (STDP) 631 

— plasticity (STDP) 721, 777

spiking 

— neural network 716 

— neuron model 775 

spinal cord 730 

sporadic model building 919 

stability 215 

stack-based GP 851 

stacked generalization 488 

stacking 517 


936 


720 


StaGe (SG) 1472 
stagnation measure 1263 
standard 


— completeness 17 

— deviation percentage error (SDPE) 
1196 

— GP (StdGP) 849 


— mutation operator 1067 




— negation 162

— reference dataset (SRD) 799 

starting point 920 

state posterior probability 569 

state-action-reward-state-action 
(SARSA) 962 

state-space model (SSM) 509 

static 

— partial reconfiguration (SPR) 
1470 

— region 1474

steady-state 874, 1425 

— genetic algorithm 826 

stepping stone model 933 

step-size 873, 878 

stereotyped kinematic pattern 

stick pulling 1398 

— experiment 1424 

stigmergy 1298, 1401 

stochastic 

— differential equation (SDE) 

— dominance rough set 358 

— dominance-based rough set 
approach 406 

— gradient descent 482 

— gradient descent algorithm 

— grammar-based genetic 
programming (SG-GP) 914 

— hill climbing with learning by 
vectors of normal distribution 
(SHCLVND) 911 

— neighborhood embedding (SNE) 
548 

— resonance (SR) 622

— universal sampling 829, 1215 

stochastic local search (SLS) 1085 

— algorithm 1087

— concept 1086

— engineering 1098

— method 1086, 1091

— nature 1086

strength 

— Pareto evolutionary algorithm 
(SPEA) 984, 1001, 1492 

— raw fitness (SRF) 1023 

strict 

— negation 161

— t-norm 164

strictly positive definite (SPD) 525 

string 1070 


736 


1428 


476 


strong 

— duality theorem 142 

— general completeness 18 

— negation 161

— standard completeness 18 

strongly typed GP (STGP) 849 

structural 

— complexity 484

— magnetic resonance imaging 
(sMRI) 727

— risk minimization (SRM) 580, 
583 

structure 

structured 

— data (SD) 469

— query language (SQL) 297, 339, 
813 

structure-learning mechanism 1441 

subjective possibility distribution 
48 

subpopulation 932 

subset 1070 

— approximation 380 

— size-oriented common feature 
(SSOCF) 1217 

— sum problem 825 

subspace-based function (SBF) 507 

subsystem-based rough sets 418 

subthreshold 718 

success 

— probability 931 

— rate 1276

— rule 878

Sugeno integral 76 

suitability function 1363 

sum-product message passing 558 

super agent 1055 

SuperBot 1410 

superior longitudinal fasciculus 
(SLF) 739 

supervised classifier system (UCS) 
972 

supervised learning 474, 496 

supplementary motor area (SMA) 
666, 738 

support vector 

— classification (SVC) 586 

— machine (SVM) 445, 523, 549, 
577, 757, 1148, 1216 

— regression 589

supporting neural network 654 


1437 


suppression through synchronization 
hypothesis 744 

surprise 533 

— criterion (SC) 525 

surrogate 

— based optimization (SBO) 

— based shape optimization 

— model (SM) 1195

— modeling 1201, 1316 

survival 

— implication 192

— modeling 550

swarm 

— density 1429 

— intelligence (SI) 1291, 1361 

swarm robotic 1296 

— inspection 1425 

Swarm-Bots 1398 

symbolic regression (SR) 

— workflow 1151 

symmetric 

— bipolar set 107 

— bivariate bipolarity 302 

— multiprocessor (SMP) 1113 

— univariate bipolarity 301 

synapse 456, 719

synaptic dynamics 719 

synchronization 611, 1381 

— behavior 1381 

— biology 1382 

— swarm robotics 

— transition 614 

synchronized cortical oscillation 
745 

synchronous 1119 

synchrony 611 

synergy-based representation 668 

system 1296 

systematic search technique 


1187 
1204 


812, 853 


1382 


1098 


T 


T1-FPID controller 291 

tabu search (TS) 1052, 1092, 1162, 
1245 

— aspiration criterion 1093 

tabu tenure 1092, 1245 

Takagi-Sugeno 

— control 274 


— controller 274 




Takagi-Sugeno-Kang (TSK) 208, 
318, 1438 

takeover time 930 

target 

— objective genetic algorithm 
(TOGA) 805 

— switching paradigm 736 

— tracking 318

task 1301 

— decomposition 490

tautology 

— general 26 

— positive 25 

— standard 26 

taxonomy 848, 903 

t-conorm 13, 163, 428 

— fuzzy implications 167 

temperature field 1429 

tempered expectation-maximization 
564 

temporal 

— code 749

— dynamics 718 

— factor analysis (TFA) 509 

— ladder 509 

— reasoning 51 

temporal difference (TD) 511 

— learning 511 

tensegrity 700 

termination condition 477 

terminological logic 50 

ternary liquid-liquid equilibrium 
1165 

test function 

test problem 

— DTLZ 1009

— KUR 1001

— ZDT 1001

thalamus (TN) 730, 742 

theoretical approaches 420 

three-way decision 397 

tightly coupled 

— MNN 484

— model 485

Tikhonov regularization 583 

time 

— continuation 918 

— delay neural network (TDNN) 
461 

— saving (TS) 1198 

time-dependent firing rate 


1131 


749, 750 


timescales of adaptation 710 
t-norm 


— algebra 17 
— fuzzy implications 167 
tolerance 


— based rough set approach 343 

— relation 341

tool optimization 

top-level design 

topology 421 

total 

— experiment time (TET) 752 

— order 986 

tournament selection 830 

trade-off solution 997 

traditional parallel model 1110 

trainable threshold gate array 
(TTGA) 632 

training data 483 

trajectory tracking 317 

transduction protein 654 

transductive 

— learning 583, 775 

— modeling 775 

— weighted neuro-fuzzy inference 
system (TWNFI) 775 

transfer 

— function 475, 482 

— learning 501 

— passenger analysis 280 

transition probability 566 

transmission 1066 

travel over the ice 700 

traveling salesman problem (TSP) 
826, 1053, 1071, 1075, 1086, 1087 

tree 849, 1071 

— adjoining grammar-guided genetic 
programming (TAG3P) 851 

— decomposition 1255, 1256 

tree-based learning 509 

treewidth 557 

triangular conorm 162 

triangular MF 291, 1454 

triangular norm 12, 61, 69, 162 

— parametric family 165 

triangular-shaped basic function 
118 

triangulated graph 

TRoPICALS 707 

truncated generalized Bell function 
(TGBF) 793 


1175 
1461 


1257 


truncation selection 830, 873 

truth 250 

— degree 8

— degree function 9 

— value gap 8

TSK fuzzy system 222 

tuple relational calculus (TRC) 296 
Turing completeness 846 
two-point crossover 836 


two-stage optimization 1282 
type 

— reducer (TR) 1455, 1502
— reduction 1455 

type-1 fuzzy 

— controller (T1FC) 1451 


— inference 287 

— logic system 1500 

— set (T1FS) 1455 

— system 1511

type-1 fuzzy PID (T1-FPID) 285 

type-1 fuzzy PID controller 285, 
287 

— design 287

— internal structure 287 

— rule structure 287 

type-2 fuzzy 

— control 1499, 1526

— controller (T2FC) 1451, 1500, 
1317, 1327, 1530 

— inference system 1511 

— set (T2FS) 94, 1454 

type-2 fuzzy logic 1499, 1515 

— control 319 

— system 1500

type-2 fuzzy PID controller 

— internal structure 288 

type-2 intelligent controller (T2IC) 
1451 

type-2 membership function (T2MF) 
1454 

type-3 fuzzy set 98 

types of sets 106 


U 




UCI repository 972 

unary 

— connective 21 

— indicator 985 

— propositional operator 21 




unbiased 

— crossover 837 
— mutation 833
unbiasedness 873 
uncertainty 300 

— function 341 
uncontrolled manifold 670 
underwriter (UW) 801 
unified theory 683 
uniform 

836, 875 

— cycle crossover (UCX) 
1369 


— crossover 

1053 

— temperature 

uninorm 168 

United States Environmental 
Protection Agency (US EPA) 
1129 

univariate marginal distribution 
algorithm (UMDA) 902 

universal 

— approximation 530 

— asynchronous receiver/transmitter 
(UART) 1466 

— integral 75 

— modeling language (UML) 848 

unloading waiting time 1364 

unmanned aerial vehicle (UAV) 
313 

unordered fuzzy rule induction 
algorithm (FURIA) 101 

unsupervised 

— adaption 757 

— feature learning 474 

— learning 480, 496 

— method 512

upper 

— approximation 334, 375 

— membership function (UMF) 
1454 

— motor neuron group 752 

— probability 46 

user 

— constraint file (UCF) 

— preference 990


1463 


user-preference multiobjective 
particle swarm optimization 
(UPMOPSO) 1320 

utility indicator 1009 


V 


vague concept 332 

vaguely quantified rough set (VQRS) 
445 

vagueness 332 

Vapnik-Chervonenkis (VC) 581

— bounds for classification 588 


variable 
— elimination 558
— screening 1324 


variable consistency (VC) 351 
variable neighborhood 


— descent (VND) 1090 

— search (VNS) 1052, 1090, 1268 

variable precision rough set 391, 
426 


— model (VPRSM) 335 

variance ratio criterion (VCR) 

variation operator 987 

variational 

— algorithm 559 

— Bayes (VB) method 562 

— parameter 560 

vector field 1400 

vector quantization (VQ) 533 

vector-evaluated genetic algorithm 
(VEGA) 999 

ventral pathway 707 

ventrolateral prefrontal cortex 
(VLPFC) 685 

vertex cover 948 

vertex elimination 

vertical slice 1454 

very high speed integrated circuit 
(VHSIC) 1460 

very large neighborhood search 
(VLNS) 1231 

very large scale integration (VLSI) 
635, 715, 716 

— network 720

VHSIC hardware description 
language (VHDL) 1460 

virtual 

— heading system (VHS) 

— pheromone 1390 

— potential field 1299 

visual attention 686 

visualization 1179, 1326 

Viterbi algorithm 571 

volatile parallel architecture 


1220 


1262 


1385 


1114 


voluntary movement 732 

von Neumann neighborhood 935 
vox populi 1389 

v-structure 554 


W


waggle run 1364 

walking 710 

— fish group (WFG) 

wall building 1402 

wall-following 315 

— behavior 314 

weak refinement 986 

weight 

— error power (WEP) 530 

— function 990 

— matrix 475 

weighted 

— aggregation function 71 

— arithmetic mean 70 

weighted-weighted nearest neighbor 
(WWKNN) 775 

wheeled robot 702 

whegs (wheel-leg hybrid) 700 

white matter (WM) 731 

wide area network (WAN) 

Wilson-Cowan (WC) 618

winner-take-all (WTA) 512, 722, 
1436 

wireless sensor network (WSN) 
1345 

wisdom 

— technology (WisTech) 346 

— web of things (W2T) 343 

working memory (WM) 670 

worst replacement 831 

writing-like sequence movement 
736 


1026 


1107 


WSN routing 1349 
Wu-Tan (WT) 1456 
X 


X classifier system (XCS) 961, 965 
— covering 965 

— data mining 972 

— functionality 967 

— parameter tuning 969 

— prediction array 965 




— reinforcement learning 973 

— rule evolution 966 

— subsumption 967 

— system overview 965, 966 

— theory 968

x-anticipatory classifier system 
(XACS) 975 

XCS for function approximation 
(XCSF) 961, 970 

— performance 970

— robot arm control 975

— system overview 970 

Xie-Beni cluster validity index (XB) 
1220 


Xilinx 


— Integrated Synthesis Environment
(ISE) 1463
— platform studio (XPS) 
— system generator (XSG) 
xNES 885 
XOR function 699 


Y 


1467 


1461 


Yager’s implication 185 
YALE database 1513 


Z 


Zadeh 33 

— extension principle 96 

Zador’s magnification law 547 
zeroth level classifier system (ZCS)


964 

Zitzler-Deb-Thiele (ZDT) 1000, 
1026 

Zonal Euler-Navier-Stokes (ZEN)
1202 




